macrosynergy.download.dataquery_file_api.file_loader
JPMaQS DataQuery File API - local parquet loader.
This module contains the local-cache loading utilities used by DataQueryFileAPIClient.load_data() and DataQueryFileAPIClient.download().
Notes
The loader assumes JPMaQS snapshot/delta parquet files follow a canonical schema (see JPMaQSParquetExpectedColumns). If files have schema drift (extra/missing columns or incompatible dtypes), loading may fail or columns may be coerced.
DataQueryFileAPIClient.delete_corrupt_files() performs a strict schema check (exact Polars schema equality) when deciding whether a parquet file is “corrupt”. This is intentional and can delete files that are readable but do not match the expected schema exactly.
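The difference between a strict schema check and a merely readable file can be illustrated with a minimal, library-free sketch. The column names and dtype labels in `EXPECTED_SCHEMA` are hypothetical placeholders, not the contents of JPMaQSParquetExpectedColumns, and `is_strict_match` and `schema_drift` are illustrative helpers rather than functions of this module. Treating column order as significant is also an assumption of the sketch.

```python
# Hypothetical expected schema (column name -> dtype label). The real schema is
# defined by JPMaQSParquetExpectedColumns and may differ from this placeholder.
EXPECTED_SCHEMA = {
    "real_date": "Date",
    "ticker": "String",
    "value": "Float64",
    "last_updated": "Datetime",
}


def is_strict_match(schema: dict, expected: dict) -> bool:
    """Exact equality: same columns, same order, same dtype labels."""
    return list(schema.items()) == list(expected.items())


def schema_drift(schema: dict, expected: dict) -> dict:
    """Describe how `schema` deviates from `expected`."""
    return {
        "missing": [c for c in expected if c not in schema],
        "extra": [c for c in schema if c not in expected],
        "mismatched": {
            c: (schema[c], expected[c])
            for c in expected
            if c in schema and schema[c] != expected[c]
        },
    }


# A file whose "value" column is Float32 is still readable, but it fails the
# strict check and would therefore be treated as "corrupt".
drifted = dict(EXPECTED_SCHEMA, value="Float32")
print(is_strict_match(drifted, EXPECTED_SCHEMA))
print(schema_drift(drifted, EXPECTED_SCHEMA))
```

A report like the one from `schema_drift` can help decide whether a file flagged by the strict check is worth keeping a copy of before it is deleted.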
- lazy_load_from_parquets(files_dir, file_format='parquet', tickers=None, metrics=None, start_date=None, end_date=None, min_last_updated=None, max_last_updated=None, dataframe_format='qdf', dataframe_type='pandas', categorical_dataframe=True, datasets=None, include_delta_files=True, delta_treatment='latest', since_datetime=None, to_datetime=None, include_file_column=True, catalog_file=None, warn_if_no_full_snapshots=False)
Lazily load JPMaQS parquet files from a specified directory. Tickers are matched using the exact ticker names provided.
Notes
The datasets argument applies to the “effective dataset” (e-dataset), meaning that delta datasets (those ending with _DELTA) are treated as updates to their base dataset, not as separate datasets.
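The effective-dataset rule can be sketched as a suffix-stripping step applied before filtering. This is a minimal illustration, not the loader's actual code: `effective_dataset` and `filter_by_effective_dataset` are hypothetical helpers, the dataset names are made up, and normalizing the requested names as well is an assumption.

```python
DELTA_SUFFIX = "_DELTA"


def effective_dataset(dataset: str) -> str:
    """Map a delta dataset name to its base dataset; leave others unchanged."""
    if dataset.endswith(DELTA_SUFFIX):
        return dataset[: -len(DELTA_SUFFIX)]
    return dataset


def filter_by_effective_dataset(available: list, datasets: list) -> list:
    """Keep every available dataset whose effective dataset was requested."""
    wanted = {effective_dataset(d) for d in datasets}
    return [d for d in available if effective_dataset(d) in wanted]


# Hypothetical dataset names, for illustration only.
available = ["JPMAQS_EXAMPLE", "JPMAQS_EXAMPLE_DELTA", "JPMAQS_OTHER"]
print(filter_by_effective_dataset(available, ["JPMAQS_EXAMPLE"]))
```

Under this reading, requesting a base dataset also selects its delta files, which matches the description of deltas as updates rather than separate datasets.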
Vintage selection (to_datetime)
When JPMaQS removes older full snapshots, historical data may only be reconstructible from delta files. In monthly “large delta” regimes, the delta file for a given month is timestamped at month-end (or the previous business day), so its timestamp can fall after an in-month to_datetime (e.g., to_datetime="2025-03-15").
In that case the loader still selects the covering month-end delta file, which can contain updates made after the requested vintage. To exclude them, set max_last_updated to a value no later than to_datetime. DataQueryFileAPIClient.load_data() applies this default automatically when to_datetime is provided without max_last_updated.
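The interaction between to_datetime and max_last_updated can be sketched with plain Python. This is only an illustration of the rule described above, under stated assumptions: `resolve_max_last_updated` and `rows_in_vintage` are hypothetical helpers, and the row layout (dicts with a `last_updated` key) is a placeholder, not the loader's internal representation.

```python
from datetime import datetime
from typing import Optional


def resolve_max_last_updated(
    to_datetime: Optional[datetime],
    max_last_updated: Optional[datetime],
) -> Optional[datetime]:
    """Default the cap to to_datetime when only to_datetime is given."""
    if to_datetime is not None and max_last_updated is None:
        return to_datetime
    return max_last_updated


def rows_in_vintage(rows: list, max_last_updated: Optional[datetime]) -> list:
    """Drop rows updated after the cap; keep everything if there is no cap."""
    if max_last_updated is None:
        return list(rows)
    return [r for r in rows if r["last_updated"] <= max_last_updated]


# Rows from a month-end delta file (stamped after the requested vintage).
delta_rows = [
    {"ticker": "T1", "value": 1.0, "last_updated": datetime(2025, 3, 10)},
    {"ticker": "T1", "value": 2.0, "last_updated": datetime(2025, 3, 20)},
]

cap = resolve_max_last_updated(datetime(2025, 3, 15), None)
print(rows_in_vintage(delta_rows, cap))
```

With the cap defaulting to the requested vintage, only the row updated on 2025-03-10 survives, even though both rows come from the same month-end file.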
- rtype:
Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]