macrosynergy.download.dataquery_file_api.file_loader

JPMaQS DataQuery File API - local parquet loader.

This module contains the local-cache loading utilities used by DataQueryFileAPIClient.load_data() and DataQueryFileAPIClient.download().

Notes

  • The loader assumes JPMaQS snapshot/delta parquet files follow a canonical schema (see JPMaQSParquetExpectedColumns). If files have schema drift (extra/missing columns or incompatible dtypes), loading may fail or columns may be coerced.

  • DataQueryFileAPIClient.delete_corrupt_files() performs a strict schema check (exact Polars schema equality) when deciding whether a parquet file is “corrupt”. This is intentional and can delete files that are readable but do not match the expected schema exactly.
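The difference between "readable" and "matches the expected schema exactly" can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the column names and dtype labels are hypothetical stand-ins (the real ones live in JPMaQSParquetExpectedColumns), and "strict" is assumed here to mean identical names, dtypes, and column order.

```python
# Hypothetical column names and dtype labels, for illustration only.
EXPECTED_SCHEMA = [
    ("real_date", "Date"),
    ("ticker", "String"),
    ("value", "Float64"),
]


def is_strict_match(schema, expected=EXPECTED_SCHEMA):
    """Return True only if names, dtypes, and column order all match."""
    return list(schema) == list(expected)


def describe_drift(schema, expected=EXPECTED_SCHEMA):
    """Summarise how a file's schema differs from the expected one."""
    found, want = dict(schema), dict(expected)
    shared = want.keys() & found.keys()
    return {
        "missing": sorted(want.keys() - found.keys()),
        "extra": sorted(found.keys() - want.keys()),
        "dtype_mismatch": sorted(n for n in shared if want[n] != found[n]),
    }


# A file with one extra column is still readable, but fails the strict check.
drifted = EXPECTED_SCHEMA + [("source_file", "String")]
print(is_strict_match(EXPECTED_SCHEMA))
print(is_strict_match(drifted))
print(describe_drift(drifted))
```

Under a strict check of this kind, the drifted file would be flagged even though every expected column is present, which is why it may be worth inspecting flagged files before deleting them.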

lazy_load_from_parquets(files_dir, file_format='parquet', tickers=None, metrics=None, start_date=None, end_date=None, min_last_updated=None, max_last_updated=None, dataframe_format='qdf', dataframe_type='pandas', categorical_dataframe=True, datasets=None, include_delta_files=True, delta_treatment='latest', since_datetime=None, to_datetime=None, include_file_column=True, catalog_file=None, warn_if_no_full_snapshots=False)

Lazily load JPMaQS parquet files from a specified directory. Tickers are matched by the exact names provided.

Notes

The datasets argument applies to the “effective dataset” (e-dataset), meaning that delta datasets (those ending with _DELTA) are treated as updates to their base dataset, not as separate datasets.
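The "effective dataset" rule can be sketched as a suffix-stripping helper. This is a minimal illustration, not the library's code: the helper names and the dataset names EXAMPLE_DATASET and OTHER_DATASET are hypothetical, and the sketch assumes that datasets holds base (non-delta) names.

```python
DELTA_SUFFIX = "_DELTA"


def effective_dataset(name: str) -> str:
    """Map a delta dataset name to its base dataset; leave others as-is."""
    if name.endswith(DELTA_SUFFIX):
        return name[: -len(DELTA_SUFFIX)]
    return name


def dataset_selected(file_dataset: str, datasets=None) -> bool:
    """Decide whether a file's dataset passes the `datasets` filter.

    Assumes `datasets` contains base dataset names (hypothetical behaviour).
    """
    if datasets is None:
        return True  # no filter requested
    return effective_dataset(file_dataset) in set(datasets)


# Both the base file and its delta pass a filter on the base name.
print(dataset_selected("EXAMPLE_DATASET", ["EXAMPLE_DATASET"]))
print(dataset_selected("EXAMPLE_DATASET_DELTA", ["EXAMPLE_DATASET"]))
print(dataset_selected("OTHER_DATASET_DELTA", ["EXAMPLE_DATASET"]))
```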

Vintage selection (to_datetime)

When JPMaQS removes older full snapshots, historical data may be reconstructible only from delta files. In monthly “large delta” regimes, the delta file for a given month is timestamped at month-end (or the previous business day), which can fall after an in-month to_datetime (e.g., to_datetime="2025-03-15").

In that case, the loader still selects the covering month-end delta file, so set max_last_updated <= to_datetime to exclude updates made after the requested vintage. DataQueryFileAPIClient.load_data() applies this default automatically when to_datetime is provided without max_last_updated.
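The interaction between the covering delta file and max_last_updated can be sketched as follows. This is a simplified model under stated assumptions, not the loader's actual logic: the helper names and row layout are hypothetical, and plain dates stand in for file timestamps.

```python
from datetime import date


def covering_delta(delta_dates, to_date):
    """Return the earliest delta file date on or after to_date, else None."""
    later = [d for d in sorted(delta_dates) if d >= to_date]
    return later[0] if later else None


def apply_vintage(rows, to_date, max_last_updated=None):
    """Drop rows updated after the vintage cutoff.

    When max_last_updated is not given, it defaults to to_date, mirroring
    the default described above for load_data().
    """
    cutoff = to_date if max_last_updated is None else max_last_updated
    return [r for r in rows if r["last_updated"] <= cutoff]


# Month-end delta files; the requested vintage falls inside March.
delta_dates = [date(2025, 2, 28), date(2025, 3, 31)]
to_date = date(2025, 3, 15)
selected = covering_delta(delta_dates, to_date)

# Hypothetical rows from the selected March delta file.
rows = [
    {"ticker": "EXAMPLE_TICKER", "last_updated": date(2025, 3, 10)},
    {"ticker": "EXAMPLE_TICKER", "last_updated": date(2025, 3, 20)},
]
kept = apply_vintage(rows, to_date)
print(selected, len(kept))
```

The March month-end file is selected because it is the only one that covers 15 March, and the cutoff then removes the row updated on 20 March, which lies beyond the requested vintage.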

rtype:

Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]

class JPMaQSParquetSchemaKind(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

TICKER = 'ticker'
QDF = 'qdf'