macrosynergy.download.dataquery_file_api

Client for downloading JPMaQS data files from the JPMorgan DataQuery File API.

This module provides the DataQueryFileAPIClient, a high-level wrapper for the JPMorgan DataQuery File API.

Note

This functionality is currently in BETA and is subject to significant changes without deprecation cycles.

Consumption & Examples

Before using the client, ensure your API credentials are set as environment variables:

export DQ_CLIENT_ID="your_client_id"
export DQ_CLIENT_SECRET="your_client_secret"

Example 1: Initialize the client and list all available JPMaQS files.

from macrosynergy.download import DataQueryFileAPIClient
import pandas as pd

client = DataQueryFileAPIClient()

# Fetch a DataFrame of all available files for the JPMaQS group
available_files_df = client.list_available_files_for_all_file_groups()
print("Available JPMaQS files:")
print(available_files_df.head())

Example 2: Download all new or updated files for the current day.

This is the recommended way to get a daily snapshot of all JPMaQS data, including full datasets, deltas, and metadata.

from macrosynergy.download import DataQueryFileAPIClient
client = DataQueryFileAPIClient(out_dir="./jpmaqs_data")

print(f"Downloading today's files to {client.out_dir}...")
client.download_full_snapshot()
print("Download complete.")

Example 3: Download all new or updated files for the day, and load data from them as a dataframe.

Here, the client compares the locally available files with the latest files available from the API, downloads any that are new or updated, and loads the data for the specified cids, xcats, tickers, and start_date/end_date. The resulting DataFrame is returned in the chosen format (quantamental or tickers) and type (pandas or polars).

from macrosynergy.download import DataQueryFileAPIClient

cids = ['AUD', 'CAD', 'USD', 'JPY']
xcats = ['EQXR_NSA', 'RIR_NSA']
start_date = '2000-01-01'

with DataQueryFileAPIClient(out_dir="./jpmaqs_data") as client:
    df = client.download(cids=cids, xcats=xcats, start_date=start_date)
    print(df.head())

Output:

   real_date  cid     xcat  value  eop_lag  mop_lag  grading        last_updated
0 2000-01-03  AUD  RIR_NSA  4.078      0.0     55.0     1.25 2024-07-25 07:27:22
1 2000-01-04  AUD  RIR_NSA  3.778      0.0     56.0     1.25 2024-07-25 07:27:22
2 2000-01-05  AUD  RIR_NSA  3.747      0.0     56.0     1.25 2024-07-25 07:27:22
3 2000-01-06  AUD  RIR_NSA  3.710      0.0     56.0     1.25 2024-07-25 07:27:22
4 2000-01-07  AUD  RIR_NSA  3.697      0.0     57.0     1.25 2024-07-25 07:27:22
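
The compare-then-download step described in Example 3 can be pictured roughly as follows. This is only an illustrative sketch: the helper `files_to_fetch` and the two filename-to-timestamp dictionaries are hypothetical and are not how DataQueryFileAPIClient necessarily tracks files internally.

```python
from datetime import datetime
from typing import Dict, List


def files_to_fetch(
    remote: Dict[str, datetime], local: Dict[str, datetime]
) -> List[str]:
    """Return remote filenames that are missing locally or newer than the local copy.

    Both arguments map a filename to a last-modified timestamp (hypothetical layout).
    """
    stale = [
        name
        for name, modified in remote.items()
        if name not in local or local[name] < modified
    ]
    return sorted(stale)


remote = {
    "A_20250414.parquet": datetime(2025, 4, 14, 6, 0),
    "B_20250414.parquet": datetime(2025, 4, 14, 6, 0),
    "C_20250414.parquet": datetime(2025, 4, 14, 6, 0),
}
local = {
    "A_20250414.parquet": datetime(2025, 4, 14, 7, 0),  # already up to date
    "B_20250414.parquet": datetime(2025, 4, 13, 7, 0),  # older than the remote copy
}
print(files_to_fetch(remote, local))  # ['B_20250414.parquet', 'C_20250414.parquet']
```

Only the outdated file and the file that is absent locally are selected, so repeated runs avoid re-downloading unchanged data.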

Example 4: Download all new or updated delta and metadata files since a specific date/time.

from macrosynergy.download import DataQueryFileAPIClient
import pandas as pd

# Pass out_dir by keyword: the first positional argument is client_id
client = DataQueryFileAPIClient(out_dir="./jpmaqs_data")
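# Format the cutoff as "YYYYMMDD", the string format documented for since_datetime
since_datetime = (pd.Timestamp.today() - pd.DateOffset(days=10)).strftime("%Y%m%d")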

client.download_full_snapshot(
    since_datetime=since_datetime,
    include_full_snapshots=False,
    include_metadata=True,
    include_delta=True,
)
print("Download complete.")

Example 5: Download a single, specific historical file.

from macrosynergy.download import DataQueryFileAPIClient
# Pass out_dir by keyword: the first positional argument is client_id
client = DataQueryFileAPIClient(out_dir="./jpmaqs_data")
# This specific filename can be found using the list_available_files... methods
target_filename = "JPMAQS_MACROECONOMIC_BALANCE_SHEETS_20250414.parquet"

print(f"Downloading {target_filename}...")
file_path = client.download_file(filename=target_filename)
print(f"File downloaded to: {file_path}")

Example 6: Check availability for a specific file-group.

from macrosynergy.download import DataQueryFileAPIClient
client = DataQueryFileAPIClient()
file_group_id = "JPMAQS_MACROECONOMIC_BALANCE_SHEETS"

available_files = client.list_available_files(file_group_id=file_group_id)

# print the earliest file's details
print(available_files.iloc[-1])

Example 7: Load “notification” metadata (missing updates & revisions).

JPMaQS publishes daily metadata notification JSON files that summarize:

  • Missing updates (“Missing Updates”)

  • Additional info about missing updates (“Additional information on missing updates”)

  • Changed historical values (“Changed historical values”)

The helpers below download the relevant metadata for the requested date (UTC, business-day window) if needed, and return the notifications as pandas DataFrames.

from macrosynergy.download import DataQueryFileAPIClient

with DataQueryFileAPIClient(out_dir="./jpmaqs_data") as client:
    missing_df = client.get_missing_data_notifications(date="2026-01-19")
    revisions_df = client.get_revisions_notifications(date="2026-01-19")

    print(missing_df.head())
    print(revisions_df.head())

class DataQueryFileAPIOauth(client_id, client_secret, resource='JPMC:URI:RS-06785-DataQueryExternalApi-PROD', auth_url='https://authe.jpmchase.com/as/token.oauth2', root_url='https://api-dataquery.jpmchase.com/research/dataquery-authe/api/v2', application_name='DataQueryFileAPI', proxies=None, verify=True, **kwargs)

Bases: JPMorganOAuth

A class to handle OAuth authentication for the JPMorgan DataQuery File API.

class DataQueryFileAPIClient(client_id=None, client_secret=None, out_dir=None, base_url='https://api-dataquery.jpmchase.com/research/dataquery-authe/api/v2', scope='JPMC:URI:RS-06785-DataQueryExternalApi-PROD', proxies=None, verify_ssl=True)

Bases: object

A client for accessing JPMaQS product files via the JPMorgan DataQuery File API.

This client provides an alternative distribution channel to the Fusion API for JPMaQS data. It is designed to list and download JPMaQS data files, which are available as full snapshots, daily deltas, and metadata files. The client handles authentication, API requests, and file downloads, including large file downloads using a segmented, concurrent approach.

Parameters:
  • client_id (Optional[str]) – Client ID for authentication. If not provided, it will be sourced from environment variables (DQ_CLIENT_ID or DATAQUERY_CLIENT_ID).

  • client_secret (Optional[str]) – Client Secret for authentication. If not provided, it will be sourced from environment variables (DQ_CLIENT_SECRET or DATAQUERY_CLIENT_SECRET).

  • out_dir (Optional[str]) – Default output directory for downloads. Can be overridden in download methods.

  • base_url (str) – The base URL for the DataQuery File API. Defaults to DQ_FILE_API_BASE_URL.

  • scope (str) – The API scope for authentication. Defaults to DQ_FILE_API_SCOPE.

  • proxies (Optional[Dict[str, str]]) – Optional proxies to use for HTTP requests. Defaults to None.

  • verify_ssl (bool) – If True, verifies SSL certificates for all requests. Defaults to True.
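
The credential fallback described for client_id and client_secret can be sketched as below. The helper `resolve_credentials` is hypothetical, and the lookup order between the DQ_* and DATAQUERY_* variables is an assumption; only the variable names themselves come from the parameter descriptions above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Prefer explicit arguments, then fall back to the documented variables."""
    env = os.environ if env is None else env
    # Assumed order: DQ_* first, then DATAQUERY_*.
    client_id = client_id or env.get("DQ_CLIENT_ID") or env.get("DATAQUERY_CLIENT_ID")
    client_secret = (
        client_secret
        or env.get("DQ_CLIENT_SECRET")
        or env.get("DATAQUERY_CLIENT_SECRET")
    )
    if not client_id or not client_secret:
        raise ValueError("Client ID and secret must be passed or set in the environment.")
    return client_id, client_secret


# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {"DQ_CLIENT_ID": "my_id", "DATAQUERY_CLIENT_SECRET": "my_secret"}
print(resolve_credentials(env=fake_env))  # ('my_id', 'my_secret')
```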

list_groups()

Lists all available data provider groups.

Returns:

A DataFrame containing details of available groups.

Return type:

pd.DataFrame

search_groups(keywords)

Searches for data provider groups that match the given keywords.

Parameters:

keywords (str) – Keywords to search for in group names and descriptions.

Returns:

A DataFrame of groups matching the search criteria.

Return type:

pd.DataFrame

list_group_files(group_id='JPMAQS', include_full_snapshots=True, include_delta=True, include_metadata=True)

Lists all file groups (datasets) for a specific data provider.

Parameters:
  • group_id (str) – The identifier for the data provider group, defaults to the JPMaQS group.

  • include_full_snapshots (bool) – If True, include full snapshot file groups in the result.

  • include_delta (bool) – If True, include delta file groups in the result.

  • include_metadata (bool) – If True, include metadata file groups in the result.

Returns:

A DataFrame listing the available file groups.

Return type:

pd.DataFrame

list_available_files(file_group_id=None, group_id='JPMAQS', start_date='20220101', end_date=None, convert_metadata_timestamps=True, include_unavailable=False)

Lists all available files for a specific file group within a date range.

Parameters:
  • file_group_id (Optional[str]) – The identifier for the file group (e.g. “JPMAQS_MACROECONOMIC_BALANCE_SHEETS”). If None, returns all files for the group_id. Defaults to None.

  • group_id (str) – The identifier for the data provider group.

  • start_date (str) – The start date for the search in “YYYYMMDD” format.

  • end_date (Optional[str]) – The end date for the search in “YYYYMMDD” format. Defaults to today.

  • convert_metadata_timestamps (bool) – If True, convert timestamp columns to datetime objects.

  • include_unavailable (bool) – If True, includes files that are listed but not currently available.

Returns:

A DataFrame of available files with their details.

Return type:

pd.DataFrame

list_available_files_for_all_file_groups(group_id='JPMAQS', start_date='20220101', end_date=None, include_full_snapshots=True, include_delta=True, include_metadata=True, convert_metadata_timestamps=True, include_unavailable=False)

Fetches and consolidates available files for all relevant file groups.

This method concurrently queries for available files across all specified file group types (full snapshots, deltas, metadata) for a given provider.

Parameters:
  • group_id (str) – The identifier for the data provider group.

  • start_date (str) – The start date for the search in “YYYYMMDD” format.

  • end_date (Optional[str]) – The end date for the search in “YYYYMMDD” format. Defaults to today.

  • include_full_snapshots (bool) – If True, query for full snapshot file groups.

  • include_delta (bool) – If True, query for delta file groups.

  • include_metadata (bool) – If True, query for metadata file groups.

  • convert_metadata_timestamps (bool) – If True, convert timestamp columns to datetime objects.

  • include_unavailable (bool) – If True, include files that are listed but not currently available.

Returns:

A consolidated DataFrame of all available files.

Return type:

pd.DataFrame

filter_available_files_by_datetime(since_datetime=None, to_datetime=None, include_full_snapshots=True, include_delta=True, include_metadata=True, include_unavailable=False)

Retrieves files whose ‘last-modified’ timestamp falls within a datetime window.

Parameters:
  • since_datetime (Optional[str]) – The start of the time window (inclusive). Format “YYYYMMDD” or “YYYYMMDDTHHMMSS”. Defaults to the start of the current day (UTC).

  • to_datetime (Optional[str]) – The end of the time window (inclusive). Format “YYYYMMDD” or “YYYYMMDDTHHMMSS”. Defaults to the current timestamp (UTC).

  • include_full_snapshots (bool) – If True, include full snapshot files in the search.

  • include_delta (bool) – If True, include delta files in the search.

  • include_metadata (bool) – If True, include metadata files in the search.

  • include_unavailable (bool) – If True, include files that are not currently available for download.

Returns:

A DataFrame of files modified within the specified time window.

Return type:

pd.DataFrame
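
The two accepted timestamp formats, the default window, and the inclusive bounds can be illustrated with the standard library alone. The helper names below are hypothetical and the sketch does not call the API; it only mirrors the rules stated in the parameter descriptions.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def parse_dq_timestamp(ts: str) -> datetime:
    """Parse 'YYYYMMDD' or 'YYYYMMDDTHHMMSS' into a UTC-aware datetime."""
    fmt = "%Y%m%dT%H%M%S" if "T" in ts else "%Y%m%d"
    return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)


def default_window(now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Start of the current UTC day up to the current UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return now.replace(hour=0, minute=0, second=0, microsecond=0), now


def within_window(stamps: List[str], since: datetime, to: datetime) -> List[str]:
    """Keep the timestamps that fall inside the inclusive window [since, to]."""
    return [s for s in stamps if since <= parse_dq_timestamp(s) <= to]


since = parse_dq_timestamp("20250414")
to = parse_dq_timestamp("20250414T120000")
stamps = ["20250413T235959", "20250414", "20250414T120000", "20250414T120001"]
print(within_window(stamps, since, to))  # ['20250414', '20250414T120000']
```

Both boundary values are kept because the window is inclusive at each end.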

check_file_availability(file_group_id=None, file_datetime=None, filename=None)

Checks if a specific file is available for download.

Provide either (file_group_id and file_datetime) or filename.

Parameters:
  • file_group_id (Optional[str]) – The identifier for the file group. Used together with file_datetime when filename is not given.

  • file_datetime (Optional[str]) – The file’s timestamp identifier.

  • filename (Optional[str]) – The full name of the file (e.g., “JPMAQS_GENERIC_RETURNS_20250501.parquet”).

Returns:

A DataFrame with the file’s availability status.

Return type:

pd.DataFrame
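
The filenames shown in this page (for example “JPMAQS_GENERIC_RETURNS_20250501.parquet”) appear to follow the pattern file_group_id + "_" + file_datetime + ".parquet". Assuming that pattern holds, the two ways of identifying a file can be converted into each other as sketched below; both helpers are hypothetical and not part of the client.

```python
from typing import Tuple


def build_filename(file_group_id: str, file_datetime: str, extension: str = "parquet") -> str:
    """Compose a filename from its group and timestamp (assumed naming pattern)."""
    return f"{file_group_id}_{file_datetime}.{extension}"


def split_filename(filename: str) -> Tuple[str, str]:
    """Split a filename into (file_group_id, file_datetime)."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    file_group_id, file_datetime = stem.rsplit("_", 1)  # timestamp is the last token
    return file_group_id, file_datetime


print(split_filename("JPMAQS_GENERIC_RETURNS_20250501.parquet"))
# ('JPMAQS_GENERIC_RETURNS', '20250501')
print(build_filename("JPMAQS_MACROECONOMIC_BALANCE_SHEETS", "20250414"))
# JPMAQS_MACROECONOMIC_BALANCE_SHEETS_20250414.parquet
```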

download_file(file_group_id=None, file_datetime=None, filename=None, out_dir=None, overwrite=False, qdf=False, as_csv=False, keep_raw_data=False, chunk_size=None, timeout=300.0, max_retries=3)

Downloads a single Parquet file to a specified directory.

This method can be called with either (file_group_id and file_datetime) or a filename. For large files, it automatically uses the SegmentedFileDownloader for a robust, multi-part download.

Parameters:
  • file_group_id (Optional[str]) – The identifier of the file group to download from.

  • file_datetime (Optional[str]) – The timestamp of the file to download.

  • filename (Optional[str]) – The full filename to download. Overrides file_group_id and file_datetime.

  • out_dir (Optional[str]) – The directory where the file will be saved. If None, the client’s default output directory is used.

  • overwrite (bool) – If True, overwrites the file if it already exists. Default is False.

  • qdf (bool) – If True, converts the DataFrame to a QuantamentalDataFrame. If False, files are saved as-is in the ticker-based Parquet format. Default is False.

  • as_csv (bool) – If True, saves the downloaded datasets as CSV files. Default is False, with Parquet as the default format.

  • keep_raw_data (bool) – If True, keeps the raw data files after conversion. Default is False.

  • chunk_size (Optional[int]) – The chunk size for streaming downloads (in bytes).

  • timeout (Optional[float]) – The timeout for the download request in seconds.

  • max_retries (int) – The number of retries for the entire file download.

Returns:

The full path to the downloaded file.

Return type:

str

delete_corrupt_files(out_dir=None, files=None)

Deletes corrupt files from the provided list based on file integrity checks.

Parameters:
  • out_dir (Optional[str]) – The directory to scan for corrupt files. If None, uses the client’s default output directory.

  • files (Optional[List[str]]) – A list of file paths to check for corruption. If None, scans all downloaded files in the specified output directory.

Returns:

A list of file paths that were identified as corrupt and deleted.

Return type:

List[str]
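
The integrity checks themselves are not described here. One cheap check that applies to any Parquet file is that it must start and end with the 4-byte magic marker PAR1. The sketch below uses only that check; `looks_like_parquet` and `delete_if_corrupt` are hypothetical helpers, and the client may well apply different or stricter checks.

```python
import os
import tempfile
from typing import List

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Shallow check: the file starts and ends with the Parquet magic bytes."""
    # Minimum layout: 4-byte header magic + 4-byte footer length + 4-byte footer magic.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def delete_if_corrupt(paths: List[str]) -> List[str]:
    """Delete files that fail the shallow check and return their paths."""
    deleted = []
    for path in paths:
        if not looks_like_parquet(path):
            os.remove(path)
            deleted.append(path)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.parquet")
    bad = os.path.join(tmp, "bad.parquet")
    with open(good, "wb") as fh:
        fh.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)  # passes the shallow check only
    with open(bad, "wb") as fh:
        fh.write(PARQUET_MAGIC + b"\x00" * 8)  # truncated: footer magic missing
    deleted = delete_if_corrupt([good, bad])
    print([os.path.basename(p) for p in deleted])  # ['bad.parquet']
```

A truncated download typically loses the footer, which is why checking the trailing bytes catches many partial files without parsing the whole file.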

download_multiple_files(filenames, out_dir=None, overwrite=False, qdf=False, as_csv=False, keep_raw_data=False, max_retries=3, n_jobs=None, chunk_size=None, timeout=300.0, show_progress=True)

Downloads a list of files concurrently with progress indication.

Parameters:
  • filenames (List[str]) – A list of full filenames to be downloaded.

  • out_dir (Optional[str]) – The directory to save the downloaded files. If None, the client’s default output directory is used.

  • overwrite (bool) – If True, overwrites files if they already exist. Default is False.

  • qdf (bool) – If True, converts the DataFrame to a QuantamentalDataFrame. If False, files are saved as-is in the ticker-based Parquet format. Default is False.

  • as_csv (bool) – If True, saves the DataFrame as a CSV file. Default is False.

  • keep_raw_data (bool) – If True, keeps the raw data files after conversion. Default is False.

  • max_retries (int) – The number of times to retry downloading the entire list of failed files.

  • n_jobs (Optional[int]) – The number of concurrent download jobs. If -1, all available cores are used.

  • chunk_size (Optional[int]) – The chunk size for streaming downloads (in bytes).

  • timeout (Optional[float]) – The timeout for each download request in seconds.

  • show_progress (bool) – If True, displays a progress bar for the downloads.

Return type:

None
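
The combination of concurrent downloads and retries limited to the failed files can be sketched with a thread pool. This is a simplified illustration, not the client's implementation: `download_with_retries` and `flaky_fetch` are hypothetical, and treating max_retries as extra rounds after one initial pass is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def download_with_retries(
    filenames: List[str],
    fetch: Callable[[str], None],
    max_retries: int = 3,
    n_jobs: int = 4,
) -> List[str]:
    """Run fetch concurrently and retry only the files that failed.

    Returns the filenames that still failed after the last retry round.
    """

    def attempt(name: str) -> bool:
        try:
            fetch(name)
            return True
        except Exception:
            return False

    pending = list(filenames)
    for _ in range(1 + max_retries):  # one initial pass plus the retry rounds
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            outcomes = list(pool.map(attempt, pending))
        pending = [name for name, ok in zip(pending, outcomes) if not ok]
    return pending


# Stand-in for a real download: "b.parquet" fails on its first call only.
calls: Dict[str, int] = {}


def flaky_fetch(name: str) -> None:
    calls[name] = calls.get(name, 0) + 1
    if name == "b.parquet" and calls[name] == 1:
        raise IOError("simulated network error")


print(download_with_retries(["a.parquet", "b.parquet"], flaky_fetch))  # []
```

Retrying only the failed subset keeps successful downloads from being repeated when one file hits a transient error.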

download_catalog_file(out_dir=None, add_dataset_column=False, as_csv=False, overwrite=False, keep_raw_data=False, timeout=300.0)

Return type:

str

get_datasets_for_indicators(tickers=None, cids=None, xcats=None, case_sensitive=False, out_dir=None)

Return type:

List[str]

list_downloaded_files(out_dir=None)

Return type:

DataFrame

get_revisions_notifications(date=None, normalize_headers=True)

Return “Changed historical values” notifications for a given date.

This loads daily JPMaQS metadata notification JSON(s) for the requested date and returns the table describing historical revisions. If no matching notification file(s) are found, an empty DataFrame is returned.

Parameters:
  • date (Optional[Union[pd.Timestamp, str]]) – Target date (UTC). Strings can be “YYYY-MM-DD”, “YYYYMMDD”, or ISO 8601. Defaults to today (UTC).

  • normalize_headers (bool) – If True, normalizes column names to lowercase snake_case and converts “(%)” to “pct”. Defaults to True.

Returns:

A DataFrame of revision notifications. Empty if none are found.

Return type:

pd.DataFrame
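
The header normalization described for normalize_headers (lowercase snake_case, with “(%)” converted to “pct”) could look roughly like the function below. `normalize_header` is a hypothetical stand-in; the exact rules used by the client may differ, and the column name "Change (%)" is invented for the example.

```python
import re


def normalize_header(name: str) -> str:
    """Convert a display header to lowercase snake_case, mapping '(%)' to 'pct'."""
    name = name.replace("(%)", " pct ")
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


for header in ["Missing Updates", "Changed historical values", "Change (%)"]:
    print(normalize_header(header))
# missing_updates
# changed_historical_values
# change_pct
```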

get_missing_data_notifications(date=None, normalize_headers=True)

Return missing-update notifications (with optional additional information).

This loads daily JPMaQS metadata notification JSON(s) for the requested date. It returns:

  • “Missing Updates” rows

  • left-joined with “Additional information on missing updates” when available

If only one of the two tables is available, that table is returned. If neither is available, an empty DataFrame is returned.

Parameters:
  • date (Optional[Union[pd.Timestamp, str]]) – Target date (UTC). Strings can be “YYYY-MM-DD”, “YYYYMMDD”, or ISO 8601. Defaults to today (UTC).

  • normalize_headers (bool) – If True, normalizes column names to lowercase snake_case and converts “(%)” to “pct”. Defaults to True.

Returns:

A DataFrame of missing-update notifications (optionally enriched).

Return type:

pd.DataFrame
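
The combination rule above (left join when both tables exist, otherwise whichever table is available, otherwise empty) is sketched here with plain lists of dictionaries instead of DataFrames. The join key "ticker" and the columns "status" and "comment" are hypothetical; the real notification tables may use different columns.

```python
from typing import Dict, List

Row = Dict[str, str]


def combine_notifications(missing: List[Row], extra: List[Row], key: str = "ticker") -> List[Row]:
    """Left-join extra onto missing; fall back to whichever table is non-empty."""
    if not missing:
        return [dict(row) for row in extra]
    if not extra:
        return [dict(row) for row in missing]
    extra_by_key = {row[key]: row for row in extra}
    # Every row of missing is kept; matching extra columns are added when present.
    return [{**row, **extra_by_key.get(row[key], {})} for row in missing]


missing = [
    {"ticker": "AUD_RIR_NSA", "status": "missing"},
    {"ticker": "USD_RIR_NSA", "status": "missing"},
]
extra = [{"ticker": "USD_RIR_NSA", "comment": "source delayed"}]
for row in combine_notifications(missing, extra):
    print(row)
# {'ticker': 'AUD_RIR_NSA', 'status': 'missing'}
# {'ticker': 'USD_RIR_NSA', 'status': 'missing', 'comment': 'source delayed'}
```

Unlike a DataFrame join, unmatched rows here simply lack the extra keys rather than carrying missing-value placeholders.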

download_full_snapshot(out_dir=None, since_datetime=None, to_datetime=None, file_datetime=None, overwrite=False, qdf=False, as_csv=False, keep_raw_data=False, chunk_size=None, timeout=300.0, include_full_snapshots=True, include_delta=True, include_metadata=True, file_group_ids=None, show_progress=True)

Downloads a complete snapshot of files based on specified criteria.

This method fetches a list of files modified within a given time window and then downloads them. It can be customized to download only specific file types or from a specific list of file groups.

Parameters:
  • out_dir (Optional[str]) – The directory where files will be saved. If None, the client’s default output directory is used.

  • since_datetime (Optional[str]) – Download files modified since this timestamp (inclusive). Defaults to the start of the current day (UTC) if file_datetime is not set.

  • to_datetime (Optional[str]) – Download files modified up to this timestamp (inclusive).

  • file_datetime (Optional[str]) – A specific file date to check for. Overrides since_datetime.

  • overwrite (bool) – If True, overwrites files if they already exist. Default is False.

  • qdf (bool) – If True, converts the DataFrame to a QuantamentalDataFrame. If False, files are saved as-is in the ticker-based Parquet format. Default is False.

  • as_csv (bool) – If True, saves the downloaded datasets as CSV files. Default is False, with Parquet as the default format.

  • keep_raw_data (bool) – If True, keeps the raw data files after conversion. Default is False.

  • chunk_size (Optional[int]) – The chunk size for streaming downloads (in bytes).

  • timeout (Optional[float]) – The timeout for each download request in seconds.

  • include_full_snapshots (bool) – If True, download full snapshot files.

  • include_delta (bool) – If True, download delta files.

  • include_metadata (bool) – If True, download metadata files.

  • file_group_ids (Optional[List[str]]) – A specific list of file groups to download from. If provided, only files from these groups will be downloaded.

  • show_progress (bool) – If True, displays a progress bar for downloads.

Return type:

None

download(tickers=None, cids=None, xcats=None, metrics=None, start_date=None, end_date=None, dataframe_format='qdf', dataframe_type='pandas', categorical_dataframe=True, include_delta_files=False, show_progress=True, out_dir=None, overwrite=False, qdf=False, keep_raw_data=False, as_csv=False)

Downloads the data for the specified indicators and date range and loads it as a DataFrame.

Parameters:
  • tickers (Optional[List[str]]) – A list of tickers to filter datasets. Each ticker must be in the standard format “CID_XCAT” used in JPMaQS.

  • cids (Optional[List[str]]) – A list of cross-sectional identifiers (CIDs) to filter datasets.

  • xcats (Optional[List[str]]) – A list of extended categories (XCATS) to filter datasets.

  • metrics (Optional[List[str]]) – A list of JPMaQS metrics to filter the data. Available metrics are “value”, “grading”, “eop_lag”, “mop_lag”, and “last_updated”. The available metrics are also defined in macrosynergy.constants.JPMAQS_METRICS. The default is None, in which case all metrics are returned.

  • start_date (Optional[str]) – The start date for the returned data in the ISO format “YYYY-MM-DD”. If None, data is returned from the earliest available date.

  • end_date (Optional[str]) – The end date for the returned data in the ISO format “YYYY-MM-DD”. If None, data is returned up to the latest available date.

  • dataframe_format (str) – The format of the returned DataFrame. Options are “qdf” for QuantamentalDataFrame or “tickers” for a standard DataFrame with tickers as columns. Default is “qdf”.

  • dataframe_type (str) – The type of DataFrame to return. Options are “pandas” for a pandas DataFrame, “polars” for a polars DataFrame, or “polars-lazy” for a polars LazyFrame. Default is “pandas”.

  • categorical_dataframe (bool) – If True and dataframe_type is “pandas”, the returned DataFrame will use categorical dtypes for object columns. Default is True.

  • include_delta_files (bool) – If True, delta files will be included in the download. Default is False.

  • show_progress (bool) – If True, displays a progress bar during downloads. Default is True.

  • out_dir (Optional[str]) – The output directory for downloaded files. If None, the default output directory of the DataQueryFileAPIClient instance is used.

  • overwrite (bool) – If True, overwrites files if they already exist. Default is False.

  • qdf (bool) – If True, each downloaded dataframe will be saved as a QuantamentalDataFrame, otherwise files are saved as-is in the ticker-based Parquet format. Default is False.

  • keep_raw_data (bool) – If True, keeps the raw data files after conversion. Default is False.

  • as_csv (bool) – If True, saves the downloaded datasets as CSV files. Default is False, with Parquet as the default format.

Returns:

A DataFrame containing the requested data.

Return type:

Union[pd.DataFrame, pl.DataFrame, pl.LazyFrame]
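
The “CID_XCAT” ticker format can be illustrated with two small helpers. They are hypothetical and independent of the client; in particular, expanding cids and xcats into their full cross product is an assumption about how the two lists are combined.

```python
from itertools import product
from typing import List, Tuple


def build_tickers(cids: List[str], xcats: List[str]) -> List[str]:
    """Form every CID_XCAT combination (assumed cross product)."""
    return [f"{cid}_{xcat}" for cid, xcat in product(cids, xcats)]


def split_ticker(ticker: str) -> Tuple[str, str]:
    """Split at the first underscore: the xcat itself may contain underscores."""
    cid, xcat = ticker.split("_", 1)
    return cid, xcat


print(build_tickers(["AUD", "USD"], ["EQXR_NSA", "RIR_NSA"]))
# ['AUD_EQXR_NSA', 'AUD_RIR_NSA', 'USD_EQXR_NSA', 'USD_RIR_NSA']
print(split_ticker("AUD_EQXR_NSA"))  # ('AUD', 'EQXR_NSA')
```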

pd_to_datetime_compat(ts, format='mixed', utc=True)

validate_dq_timestamp(ts, var_name=None, raise_error=True)

Validate a timestamp string for DataQuery API.

Return type:

bool

get_client_id_secret()

Retrieve client ID and secret from environment variables.

Return type:

Optional[Tuple[str, str]]

large_delta_file_datetimes(as_str=True)

Returns plausible file datetimes for large delta files. These files are typically generated on calendar month-ends and business month-ends, with an end-of-day timestamp (23:59:59).

Return type:

List[str]
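
For a single month, the candidate datetimes described above can be derived with the standard library as sketched below. `month_end_datetimes` is a hypothetical helper; it treats Monday to Friday as business days and ignores public holidays, and the "YYYYMMDDTHHMMSS" output format is borrowed from the timestamp format used elsewhere in this module.

```python
import calendar
from datetime import date, timedelta
from typing import List


def month_end_datetimes(year: int, month: int) -> List[str]:
    """Return the business and calendar month-end stamps at 23:59:59."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    business_end = last_day
    while business_end.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        business_end -= timedelta(days=1)
    # A set removes the duplicate when both dates coincide.
    days = sorted({business_end, last_day})
    return [d.strftime("%Y%m%d") + "T235959" for d in days]


print(month_end_datetimes(2025, 5))  # ['20250530T235959', '20250531T235959']
print(month_end_datetimes(2025, 4))  # ['20250430T235959']
```

May 2025 ends on a Saturday, so the business and calendar month-ends differ; April 2025 ends on a Wednesday, so there is a single candidate.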

class SegmentedFileDownloader(filename, url, headers, params, proxies=None, chunk_size=8192, segment_size_mb=8.0, timeout=300.0, api_delay=0.04, api_delay_margin=1.1, headers_timeout=30.0, max_concurrent_downloads=None, max_file_retries=3, verify_ssl=True, start_download=False, debug=False)

Bases: object

A utility class to manage the multi-part, concurrent download of a single large file.

log(msg, part_num=None, level=20)

Logs a message with downloader-specific context.

download(retries=None)

Orchestrates the entire file download process, including retries.

Return type:

Path

cleanup()

Removes the temporary directory and all downloaded parts.
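
A multi-part download first splits the file into byte ranges, one per segment. The sketch below shows that planning step for the default segment_size_mb of 8.0. It assumes the segments are requested with standard HTTP Range headers, which this page does not confirm; `segment_ranges` and `range_header` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def segment_ranges(total_bytes: int, segment_size_mb: float = 8.0) -> List[Tuple[int, int]]:
    """Split [0, total_bytes) into inclusive (start, end) byte ranges."""
    segment = int(segment_size_mb * 1024 * 1024)
    if total_bytes <= 0 or segment <= 0:
        return []
    return [
        (start, min(start + segment, total_bytes) - 1)
        for start in range(0, total_bytes, segment)
    ]


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build an HTTP Range header; both bounds are inclusive."""
    return {"Range": f"bytes={start}-{end}"}


parts = segment_ranges(20 * 1024 * 1024)  # a 20 MiB file
print(parts)
# [(0, 8388607), (8388608, 16777215), (16777216, 20971519)]
print(range_header(*parts[-1]))  # {'Range': 'bytes=16777216-20971519'}
```

Each range can then be fetched independently and written to its own part file, so a failed segment can be retried without restarting the whole download.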

convert_ticker_based_parquet_file_to_qdf_pl(filename, compression='zstd', as_csv=False, qdf=True, keep_raw_data=False)

Return type:

None

lazy_load_from_parquets(files_dir, file_format='parquet', tickers=None, cids=None, xcats=None, metrics=None, start_date=None, end_date=None, dataframe_format='qdf', dataframe_type='pandas', categorical_dataframe=True, datasets=None, include_delta_files=False, include_metadata_files=False)

Return type:

DataFrame

class JPMaQSParquetSchemaKind(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

TICKER = 'ticker'
QDF = 'qdf'