.. _common_definitions:

Common Definitions
==================

Like any niche field, quantamental data analysis has its own terminology. This section covers the most common terms and definitions used within the Macrosynergy package.

The Quantamental Data Format (QDF)
----------------------------------

The Quantamental Data Format (QDF), also called a Quantamental DataFrame, is a tabular schema for storing and organizing quantamental data. It is a simple yet flexible format that is easy to use and to extend. The Macrosynergy package is built around the QDF, and all data is stored in this format. An example of the QDF is shown below:

==========  ===  ========  =====  =======  =======  =======
real_date   cid  xcat      value  grading  eop_lag  mop_lag
==========  ===  ========  =====  =======  =======  =======
2023-01-02  AUD  FXXR_NSA  3.8    1        0        10
2023-01-02  AUD  RIR_NSA   2.0    1        0        10
2023-01-02  AUD  EQXR_NSA  -0.1   1        0        10
2023-01-02  CAD  FXXR_NSA  0.1    2        0        10
2023-01-02  CAD  RIR_NSA   1.3    2        0        10
2023-01-02  CAD  EQXR_NSA  -0.2   2        0        10
==========  ===  ========  =====  =======  =======  =======

To get a better understanding of the QDF, please see our research: `How to build a quantamental system for investment management `__ and `Quantitative methods for macro information efficiency `__ on the `Macrosynergy Research `__ website.

Cross-Section Identifiers
-------------------------

A cross-section identifier is a 3-letter string that identifies a cross-section/currency area. In the package, these are commonly referred to as ``cid`` or ``cids``.

Examples:

- USD - United States Dollar
- FRA - France
- EUR - Euro Area
- GBP - British Pound

Extended Category Codes
-----------------------

These are strings identifying the category of a data series. In the package, these are commonly referred to as ``xcat`` or ``xcats``. Extended category codes denote base category tickers and their transformations.
For example, ``CPI_NSA`` would be a base category for non-seasonally adjusted headline CPI, and ``P1M1ML12`` would mean the % change of the latest month versus one year ago.

An extended category code contains the following parts:

- Base category code (e.g. ``CPI``)
- Adjustment (e.g. ``NSA``)
- Transformation (e.g. ``P1M1ML12``)

Combining these parts with underscores (``_``) gives the extended category code. In the example above, the extended category code would be ``CPI_NSA_P1M1ML12``.

JPMaQS Indicator Names
----------------------

These are strings that identify the data series in the JPMaQS dataset. Each one is unique to a time series published in the dataset. In the package, these are commonly referred to as ``ticker``/``tickers`` or ``indicator``. The ticker name is composed of the cross-section identifier (``cid``) and the extended category code (``xcat``), separated by an underscore (``_``).

For example:

- Cross-Section Identifier: ``USD``
- Base Category Code: ``CPI``
- Adjustment: ``NSA``
- Transformation: ``P1M1ML12``

Which makes:

- Extended Category Code: ``CPI_NSA_P1M1ML12``

and:

- Ticker: ``USD_CPI_NSA_P1M1ML12``

JPMaQS Metrics
==============

Grading
-------

This is an integer between 1 and 3 (inclusive) that specifies the quality of the data series. Grade 1 is the highest quality, and grade 3 is the lowest. Here, "quality" means the estimated proximity of the recorded value of an indicator to what the market saw at the related point in time.

End-of-Period (EOP) lags
------------------------

These are integers that specify, for each real-time date, the number of business days that have passed since the end of the concurrent observation period. For example, if a series reports monthly data and the real-time date is the last day of the month (the end of the period), the EOP lag is 0. On the next date in the series, the EOP lag is 1.

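
The EOP lag definition above can be illustrated with a small sketch. It is not the method used to compute the published ``eop_lag`` metric: the function names ``business_days_between`` and ``eop_lag`` are hypothetical, and the sketch assumes a plain Monday-to-Friday calendar with no holidays.

```python
from datetime import date, timedelta


def business_days_between(start: date, end: date) -> int:
    """Count Monday-to-Friday days in the half-open interval [start, end).

    Assumption: weekends are the only non-business days (no holiday calendar).
    """
    days = 0
    current = start
    while current < end:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days += 1
        current += timedelta(days=1)
    return days


def eop_lag(real_date: date, period_end: date) -> int:
    """Hypothetical helper: business days elapsed since the end of the period."""
    return business_days_between(period_end, real_date)


# Monthly series whose observation period ends on Tuesday, 2023-01-31.
period_end = date(2023, 1, 31)
lag_on_period_end = eop_lag(date(2023, 1, 31), period_end)
lag_next_day = eop_lag(date(2023, 2, 1), period_end)
lag_following_monday = eop_lag(date(2023, 2, 6), period_end)
```

Under these assumptions the lag is 0 on the last day of the period and grows by one with each subsequent business day, which matches the monthly example in the text.
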

Median-of-Period (MOP) lags
---------------------------

These are integers that specify, for each real-time date, the number of business days that have passed since the median date of the concurrent observation period. For example, if a series reports monthly data, the middle day of the month is the median of the period; for a value published on the first of the next month, the MOP lag would be approximately 11 or 12.

Lags
----

Used as ``lag`` or ``lags`` in the package, a "lag" refers to a time delay or the use of past values of a variable to explain its current or future value. Lags are applied in the native frequency of the data (i.e. monthly data is lagged by months, daily data is lagged by days). In simple terms, a lag represents the late arrival of a piece of information.

Slippage
--------

Used as ``slip`` in the package (see :func:`macrosynergy.management.utils.df_utils.apply_slip`), slippage describes the delay in acting on a signal or a piece of information. It can also be seen as the time taken to take a position. In the package, slippage is always measured in business days and is always applied before any resampling or lags.

JPMaQS Expressions on DataQuery
-------------------------------

The JPMaQS dataset is served via the `JPMorgan DataQuery `__ platform. The DataQuery platform allows users to search for, discover, and analyze financial data. DataQuery uses a consistent notation to represent its many datasets and the data series within them. The notation is as follows:

``DB(<DATASET_NAME>,<SERIES_IDENTIFIER>,<ATTRIBUTE>)``

Where:

- ``DATASET_NAME`` is the name of the dataset (``JPMAQS``).
- ``SERIES_IDENTIFIER`` is the unique identifier for the data series, which is the ticker name in the case of JPMaQS (e.g. ``USD_CPI_NSA_P1M1ML12``).
- ``ATTRIBUTE`` is the attribute of the data series that you want to retrieve, which is the metric name for JPMaQS (e.g. ``value``).

Which makes the full expression:

- ``DB(JPMAQS,USD_CPI_NSA_P1M1ML12,value)``
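
As a minimal sketch of how the naming rules fit together, the snippet below composes a ticker from a ``cid`` and an ``xcat``, wraps it in a DataQuery expression, and splits an expression back into its parts. The helper names ``build_ticker``, ``build_expression`` and ``parse_expression`` are hypothetical and written only for this illustration; they are not functions of the Macrosynergy package.

```python
from typing import Tuple

DATASET_NAME = "JPMAQS"


def build_ticker(cid: str, xcat: str) -> str:
    """Join a cross-section identifier and an extended category code."""
    return f"{cid}_{xcat}"


def build_expression(ticker: str, metric: str, dataset: str = DATASET_NAME) -> str:
    """Wrap a ticker and a metric name in the DB(...) notation."""
    return f"DB({dataset},{ticker},{metric})"


def parse_expression(expression: str) -> Tuple[str, str, str]:
    """Split a DB(...) expression into (cid, xcat, metric)."""
    if not (expression.startswith("DB(") and expression.endswith(")")):
        raise ValueError(f"Not a DataQuery expression: {expression!r}")
    _dataset, ticker, metric = expression[3:-1].split(",")
    # The cid is the part before the first underscore; the rest is the xcat.
    cid, xcat = ticker.split("_", 1)
    return cid, xcat, metric


ticker = build_ticker("USD", "CPI_NSA_P1M1ML12")
# One expression per metric column of the QDF.
expressions = [
    build_expression(ticker, metric)
    for metric in ("value", "grading", "eop_lag", "mop_lag")
]
parsed = parse_expression(expressions[0])
```

Splitting the ticker on the first underscore only relies on the ``cid`` itself never containing an underscore, which holds for the 3-letter identifiers described above.
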