macrosynergy.management.utils.check_availability
Module for checking the availability of data in a Quantamental DataFrame. Includes functions for checking the start years and end dates of the series in a DataFrame, as well as for visualizing the results.
- check_availability(df, xcats=None, cids=None, start=None, start_size=None, end_size=None, start_years=True, missing_recent=True, use_last_businessday=True, title=None, xcat_labels=None)
Wrapper for visualizing start and end dates of a filtered DataFrame.
- Parameters:
df (pd.DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.
xcats (List[str]) – extended categories to be checked on. Default is all in the DataFrame.
cids (List[str]) – cross sections to be checked on. Default is all in the DataFrame.
start (str) – string representing earliest considered date. Default is None, which reverts to earliest date in the dataframe.
start_size (Tuple[float]) – tuple of floats with width/length of the start years heatmap. Default is None (format adjusted to data).
end_size (Tuple[float]) – tuple of floats with width/length of the end dates heatmap. Default is None (format adjusted to data).
start_years (bool) – whether to display a chart of starting years for each cross-section and indicator. Default is True (display start years).
missing_recent (bool) – whether to display a chart of the number of missing recent days for each cross-section and indicator. Default is True (display missing days).
use_last_businessday (bool) – boolean indicating whether or not to use the last business day before today as the end date. Default is True.
xcat_labels (dict) – dictionary with xcat labels. Default is None (no labels).
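The filtering implied by xcats, cids and start, and the end date implied by use_last_businessday, can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not the library's implementation: filter_qdf is a hypothetical helper, the sample data is made up, and a fixed reference date stands in for "today" so that the result is reproducible.

```python
import pandas as pd

# Made-up sample in the standardized long format ('cid', 'xcat', 'real_date').
df = pd.DataFrame(
    {
        "cid": ["AUD", "AUD", "USD", "GBP"],
        "xcat": ["FXXR", "EQXR", "FXXR", "FXXR"],
        "real_date": pd.to_datetime(
            ["1999-12-31", "2005-01-03", "2005-01-04", "2005-01-05"]
        ),
        "value": [0.1, 0.2, 0.3, 0.4],
    }
)


def filter_qdf(df, xcats=None, cids=None, start=None):
    """Hypothetical helper: keep only the requested categories,
    cross-sections and dates. None means 'no restriction'."""
    mask = pd.Series(True, index=df.index)
    if xcats is not None:
        mask &= df["xcat"].isin(xcats)
    if cids is not None:
        mask &= df["cid"].isin(cids)
    if start is not None:
        mask &= df["real_date"] >= pd.Timestamp(start)
    return df[mask]


filtered = filter_qdf(df, xcats=["FXXR"], cids=["AUD", "USD"], start="2000-01-01")

# Last business day before a reference date (here a Monday, so the
# result is the preceding Friday).
reference = pd.Timestamp("2024-03-11")
last_bday = reference - pd.offsets.BDay(1)
```

Only the USD row of FXXR survives the filter here, because the AUD observation predates the chosen start date.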
- missing_in_df(df, xcats=None, cids=None)
Print the requested cross-sections and categories that are missing from the DataFrame.
- Parameters:
df (QuantamentalDataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.
xcats (List[str]) – extended categories to be checked on. Default is all in the DataFrame.
cids (List[str]) – cross sections to be checked on. Default is all in the DataFrame.
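The kind of check this function reports can be sketched with set differences. The helper find_missing below is hypothetical and returns its findings rather than printing them; the exact output format of missing_in_df is not described above, so this is only an assumption about the underlying idea.

```python
import pandas as pd

# Made-up sample: EQXR is only available for AUD, CPI is absent entirely.
df = pd.DataFrame(
    {
        "cid": ["AUD", "AUD", "USD"],
        "xcat": ["FXXR", "EQXR", "FXXR"],
        "real_date": pd.to_datetime(["2020-01-01"] * 3),
        "value": [0.1, 0.2, 0.3],
    }
)


def find_missing(df, xcats, cids):
    """Hypothetical helper: requested categories absent from df, and for
    each category that is present, the requested cross-sections it lacks."""
    missing_xcats = sorted(set(xcats) - set(df["xcat"]))
    missing_cids = {}
    for xcat in sorted(set(xcats) & set(df["xcat"])):
        present = set(df.loc[df["xcat"] == xcat, "cid"])
        missing_cids[xcat] = sorted(set(cids) - present)
    return missing_xcats, missing_cids


missing_xcats, missing_cids = find_missing(
    df, xcats=["FXXR", "EQXR", "CPI"], cids=["AUD", "USD"]
)
```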
- check_startyears(df)
DataFrame with starting years across all extended categories and cross-sections.
- Parameters:
df (pd.DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.
- Returns:
DataFrame consisting of starting years for all series.
- Return type:
pd.DataFrame
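A table of starting years can be approximated in plain pandas by taking the earliest real_date per (cid, xcat) pair and pivoting categories into columns. This is a sketch of the idea with made-up data, assuming a cross-section by category layout; the library's own handling (for example of missing values) may differ.

```python
import pandas as pd

# Made-up sample: AUD has no EQXR series at all.
df = pd.DataFrame(
    {
        "cid": ["AUD", "AUD", "USD", "USD"],
        "xcat": ["FXXR", "FXXR", "FXXR", "EQXR"],
        "real_date": pd.to_datetime(
            ["2001-03-01", "2002-03-01", "1999-06-15", "2005-01-03"]
        ),
        "value": [0.1, 0.2, 0.3, 0.4],
    }
)

# Earliest date per series, reduced to its year, with cross-sections as
# rows and categories as columns. Missing series show up as NaN.
start_years = (
    df.groupby(["cid", "xcat"])["real_date"].min().dt.year.unstack("xcat")
)
```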
- check_enddates(df)
DataFrame with end dates across all extended categories and cross sections.
- Parameters:
df (pd.DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.
- Returns:
DataFrame consisting of end dates for all series.
- Return type:
pd.DataFrame
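The end-date table is the mirror image of the start-year table: take the latest real_date per series instead of the earliest. The following is a pandas-only sketch with made-up data, not the library's code, and the returned date format of check_enddates is an assumption here.

```python
import pandas as pd

# Made-up sample: USD FXXR stops one business day before AUD FXXR.
df = pd.DataFrame(
    {
        "cid": ["AUD", "AUD", "USD", "USD"],
        "xcat": ["FXXR", "FXXR", "FXXR", "EQXR"],
        "real_date": pd.to_datetime(
            ["2024-03-07", "2024-03-08", "2024-03-07", "2024-03-08"]
        ),
        "value": [0.1, 0.2, 0.3, 0.4],
    }
)

# Latest date per series, with cross-sections as rows and categories as
# columns. Series that do not exist show up as NaT.
end_dates = df.groupby(["cid", "xcat"])["real_date"].max().unstack("xcat")
```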
- business_day_dif(df, maxdate)
Number of business days between two respective business dates.
- Parameters:
df (pd.DataFrame) – DataFrame with cross-sections as rows and categories as columns. Each cell holds the end date (last available observation date) of the respective series.
maxdate (pd.Timestamp) – maximum release date found in the received DataFrame. In principle, all series should have values up until the respective business date. The difference will represent possible missing values.
- Returns:
DataFrame consisting of business day differences for all series.
- Return type:
pd.DataFrame
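One way to compute such a table of business-day gaps is NumPy's busday_count, which counts weekdays from a start date (inclusive) to an end date (exclusive). This is a sketch under assumptions: the input is taken to be a table of last available dates without missing cells, bdays_behind is a hypothetical helper, and the library may compute the difference differently.

```python
import numpy as np
import pandas as pd

# Made-up table of last available dates: cross-sections as rows,
# categories as columns.
end_dates = pd.DataFrame(
    {
        "EQXR": ["2024-03-01", "2024-03-08"],
        "FXXR": ["2024-03-08", "2024-02-29"],
    },
    index=pd.Index(["AUD", "USD"], name="cid"),
).apply(pd.to_datetime)

maxdate = pd.Timestamp("2024-03-08")  # a Friday


def bdays_behind(col):
    """Hypothetical helper: business days from each date in the column
    up to (but excluding) maxdate. Assumes the column contains no NaT."""
    counts = np.busday_count(
        col.values.astype("datetime64[D]"), np.datetime64(maxdate.date())
    )
    return pd.Series(counts, index=col.index)


bd_diff = end_dates.apply(bdays_behind)
```

A value of 0 means the series is up to date relative to maxdate; larger values indicate how many recent business days are potentially missing.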