macrosynergy.download.dataquery_file_api.file_selector

class FileSelector(api_files_df, local_files_df, file_name_col='file-name', tickers=None, catalog_file=None, case_sensitive=False)

Bases: object

Helper class to reconcile API vs local file inventories.

refresh(*, api_files_df=None, local_files_df=None)

Refresh cached API and/or local inventories in-place.

This is intended for reusing a single FileSelector instance across multiple selection operations, for example when the client downloads files and the local inventory changes as a result.

Return type:

None

effective_snapshot_switchover_ts(*, file_group_ids, catalog_file_group_id=None)

Return the effective (per-request) earliest full-snapshot timestamp.

Notes

JPMaQS can remove older full snapshots over time. For a given set of datasets, the “switchover” is defined as the latest of the datasets’ earliest currently available full snapshots. If any dataset has no full snapshots at all, the method returns None.

Return type:

Optional[Timestamp]
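
The switchover rule described in the Notes can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the function name `switchover_ts`, the `full_snapshots` mapping, and the file-group ids "A", "B", "C" are hypothetical, and plain `datetime` objects stand in for pandas Timestamps.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


def switchover_ts(
    full_snapshots: Dict[str, List[datetime]],
    file_group_ids: Iterable[str],
) -> Optional[datetime]:
    """Latest of the per-dataset earliest full-snapshot timestamps.

    `full_snapshots` is a hypothetical mapping from file-group id to the
    timestamps of its currently available full snapshots.
    """
    earliest = []
    for group_id in file_group_ids:
        stamps = full_snapshots.get(group_id, [])
        if not stamps:
            # A dataset without any full snapshot leaves the switchover undefined.
            return None
        earliest.append(min(stamps))
    # Assumption of this sketch: an empty request also yields None.
    return max(earliest) if earliest else None


def utc(year: int, month: int, day: int) -> datetime:
    """Build a timezone-aware UTC datetime at midnight."""
    return datetime(year, month, day, tzinfo=timezone.utc)


inventory = {
    "A": [utc(2024, 1, 5), utc(2024, 2, 5)],
    "B": [utc(2024, 1, 20), utc(2024, 3, 1)],
    "C": [],  # no full snapshots available
}

# Earliest snapshots are 5 Jan (A) and 20 Jan (B); the later one wins.
print(switchover_ts(inventory, ["A", "B"]))  # 2024-01-20 00:00:00+00:00
# "C" has no full snapshots, so no switovers can be computed for the request.
print(switchover_ts(inventory, ["A", "C"]))  # None
```

Taking the maximum of the per-dataset minima gives the first point in time from which every requested dataset still has a full snapshot available.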

select_files_for_download(overwrite=False, since_datetime=None, to_datetime=None, file_group_ids=None, include_full_snapshots=True, include_delta_files=True, include_metadata_files=False, warn_if_no_full_snapshots=False, last_modified_col='last-modified', min_last_updated=None, max_last_updated=None)

Select the API file names required for a load vintage that are missing or outdated locally.

Return type:

List[str]
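
The idea of reconciling the two inventories can be sketched without the library. The example below is an illustration under stated assumptions, not the actual selection logic: each inventory is reduced to a hypothetical mapping from file name to last-modified time, a file counts as "outdated" when its local timestamp is older than the API's, and `overwrite=True` is assumed to re-select every API file. The file names are invented.

```python
from datetime import datetime, timezone
from typing import Dict, List


def files_to_download(
    api_files: Dict[str, datetime],
    local_files: Dict[str, datetime],
    overwrite: bool = False,
) -> List[str]:
    """Return API file names that are absent locally or older locally."""
    selected = []
    for name, api_modified in sorted(api_files.items()):
        local_modified = local_files.get(name)
        missing = local_modified is None
        # Assumption: "outdated" means the local copy predates the API copy.
        outdated = not missing and local_modified < api_modified
        if overwrite or missing or outdated:
            selected.append(name)
    return selected


def utc(year: int, month: int, day: int) -> datetime:
    """Build a timezone-aware UTC datetime at midnight."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical inventories keyed by file name.
api_files = {
    "group_a_full_20240101.parquet": utc(2024, 1, 2),
    "group_a_delta_20240102.parquet": utc(2024, 1, 3),
    "group_b_full_20240101.parquet": utc(2024, 1, 2),
}
local_files = {
    "group_a_full_20240101.parquet": utc(2024, 1, 2),   # up to date
    "group_a_delta_20240102.parquet": utc(2024, 1, 1),  # older than the API copy
}

print(files_to_download(api_files, local_files))
# ['group_a_delta_20240102.parquet', 'group_b_full_20240101.parquet']
```

The real method additionally filters by time window, file group, and file type through its keyword arguments; those filters are left out here to keep the comparison step visible.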

select_files_for_load(since_datetime=None, to_datetime=None, include_delta_files=True, warn_if_no_full_snapshots=False, min_last_updated=None, max_last_updated=None)

Select the local snapshot and delta files to load from disk. Rows without a valid file path are dropped.

Return type:

DataFrame

oldest_api_file_timestamp()

Return the oldest file timestamp present in the API inventory (UTC).

Assumes api_files_df is an unfiltered API inventory (full history), as provided by DataQueryFileAPIClient.list_available_files_for_all_file_groups().

Return type:

Optional[Timestamp]
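
As a rough illustration of what "oldest timestamp (UTC)" means when file timestamps carry different offsets, the sketch below normalises timezone-aware `datetime` values to UTC before taking the minimum. The helper name `oldest_timestamp_utc` and the sample timestamps are hypothetical, and the sketch assumes every input is timezone-aware; how the library itself parses timestamps is not shown in this reference.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def oldest_timestamp_utc(stamps: Iterable[datetime]) -> Optional[datetime]:
    """Return the earliest timestamp, expressed in UTC, or None if empty."""
    # Assumes timezone-aware inputs; naive datetimes would be read as local time.
    normalised = [ts.astimezone(timezone.utc) for ts in stamps]
    return min(normalised) if normalised else None


plus_two = timezone(timedelta(hours=2))
stamps = [
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 1, 0, tzinfo=plus_two),  # 2023-12-31 23:00 in UTC
]

print(oldest_timestamp_utc(stamps))  # 2023-12-31 23:00:00+00:00
print(oldest_timestamp_utc([]))      # None
```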