macrosynergy.management.utils.check_availability

Module for checking data availability in a Quantamental DataFrame. Includes functions for checking the start years and end dates of the series in a DataFrame, as well as for visualizing the results.

check_availability(df, xcats=None, cids=None, start=None, start_size=None, end_size=None, start_years=True, missing_recent=True, use_last_businessday=True, title=None, title_fontsize=None, xcat_labels=None, sort_labels=False)

Wrapper for visualizing start and end dates of a filtered DataFrame.

Parameters:
  • df (pd.DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.

  • xcats (List[str]) – extended categories to check. Default is all in the DataFrame.

  • cids (List[str]) – cross-sections to check. Default is all in the DataFrame.

  • start (str) – string representing earliest considered date. Default is None, which reverts to earliest date in the dataframe.

  • start_size (Tuple[float]) – tuple of floats with width/length of the start years heatmap. Default is None (format adjusted to data).

  • end_size (Tuple[float]) – tuple of floats with width/length of the end dates heatmap. Default is None (format adjusted to data).

  • start_years (bool) – boolean indicating whether or not to display a chart of starting years for each cross-section and indicator. Default is True (display start years).

  • missing_recent (bool) – boolean indicating whether or not to display a chart of missing date numbers for each cross-section and indicator. Default is True (display missing days).

  • use_last_businessday (bool) – boolean indicating whether or not to use the last business day before today as the end date. Default is True.

  • title (str) – A string to be used as the title of the heatmap. If None, a default header will be used.

  • title_fontsize (int) – Font size for the title of the heatmap. Default is None (automatic sizing).

  • xcat_labels (dict) – dictionary with xcat labels. Default is None (no labels).

  • sort_labels (bool) – boolean indicating whether to sort the xcats in the heatmap alphabetically. The sorting is done based on the xcats list, with the labels from xcat_labels simply used for display (not regarded for sorting at all). Default is False (no sorting, ordered as provided in xcats).
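
The interaction between sort_labels and xcat_labels described above can be sketched in plain Python. This is a minimal illustration, not the library's code: display_rows is a hypothetical helper, the tickers and labels are made up, and falling back to the ticker when a label is missing is an assumption.

```python
from typing import Dict, List, Optional


def display_rows(
    xcats: List[str],
    xcat_labels: Optional[Dict[str, str]] = None,
    sort_labels: bool = False,
) -> List[str]:
    """Hypothetical helper: order rows by xcat ticker, then map to display labels."""
    # Sorting looks only at the xcat tickers, never at the labels.
    ordered = sorted(xcats) if sort_labels else list(xcats)
    if xcat_labels is None:
        return ordered
    # Assumption: a ticker without a label is shown as-is.
    return [xcat_labels.get(xcat, xcat) for xcat in ordered]


# Illustrative tickers and labels (not taken from any real dataset).
xcats = ["FXXR_NSA", "EQXR_NSA", "CPIH_SA_P1M1ML12"]
labels = {
    "FXXR_NSA": "FX returns",
    "EQXR_NSA": "Equity returns",
    "CPIH_SA_P1M1ML12": "Headline inflation",
}

unsorted_rows = display_rows(xcats, labels)
sorted_rows = display_rows(xcats, labels, sort_labels=True)
print(unsorted_rows)
print(sorted_rows)
```

With sort_labels=True the rows follow the alphabetical order of the tickers (CPIH..., EQXR..., FXXR...), so the displayed labels are not themselves in alphabetical order.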

missing_in_df(df, xcats=None, cids=None)

Print missing cross-sections and categories.

Parameters:
  • df (QuantamentalDataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.

  • xcats (List[str]) – extended categories to check. Default is all in the DataFrame.

  • cids (List[str]) – cross-sections to check. Default is all in the DataFrame.
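
A rough sketch of the kind of check described here, using only pandas and sets. find_missing is a hypothetical helper that returns the results instead of printing them, and the exact report layout of missing_in_df is not reproduced. The sample data is invented.

```python
import pandas as pd

# Illustrative long-format data with the required columns.
df = pd.DataFrame(
    {
        "cid": ["AUD", "AUD", "CAD", "GBP"],
        "xcat": ["FXXR", "EQXR", "FXXR", "FXXR"],
        "real_date": pd.to_datetime(["2024-01-02"] * 4),
        "value": [0.1, 0.2, 0.3, 0.4],
    }
)


def find_missing(df, xcats=None, cids=None):
    """Hypothetical helper: requested xcats absent from df, and missing cids per xcat."""
    # Default to everything present in the DataFrame.
    xcats = sorted(df["xcat"].unique()) if xcats is None else xcats
    cids = sorted(df["cid"].unique()) if cids is None else cids
    available_xcats = set(df["xcat"])
    missing_xcats = sorted(set(xcats) - available_xcats)
    missing_cids = {}
    for xcat in xcats:
        if xcat not in available_xcats:
            continue  # already reported as a missing category
        found = set(df.loc[df["xcat"] == xcat, "cid"])
        missing_cids[xcat] = sorted(set(cids) - found)
    return missing_xcats, missing_cids


missing_xcats, missing_cids = find_missing(
    df, xcats=["FXXR", "EQXR", "CPIH"], cids=["AUD", "CAD", "GBP"]
)
print("Missing xcats:", missing_xcats)
for xcat, absent in missing_cids.items():
    print(f"Missing cids for {xcat}:", absent)
```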

check_startyears(df)

DataFrame with starting years across all extended categories and cross-sections.

Parameters:

df (pd.DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.

Returns:

DataFrame consisting of starting years for all series.

Return type:

pd.DataFrame
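
One plausible way to derive such a table with plain pandas is to take the earliest real_date per (cid, xcat) pair and pivot the categories into columns. This is a sketch under the assumption that the result has cross-sections as rows and categories as columns; it is not presented as the library's implementation, and the data is invented.

```python
import pandas as pd

# Illustrative long-format data with the required columns.
df = pd.DataFrame(
    {
        "cid": ["AUD", "AUD", "AUD", "CAD", "CAD"],
        "xcat": ["FXXR", "FXXR", "EQXR", "FXXR", "EQXR"],
        "real_date": pd.to_datetime(
            ["2000-01-03", "2000-01-04", "2005-06-01", "1999-12-31", "2010-03-15"]
        ),
        "value": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
)

# Earliest observation date of every (cid, xcat) series.
first_dates = df.groupby(["cid", "xcat"])["real_date"].min()

# Keep only the year and move the categories into columns.
start_years = first_dates.dt.year.unstack("xcat")
print(start_years)
```

Replacing min() with max() and keeping the full date instead of the year gives the corresponding table of end dates.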

check_enddates(df)

DataFrame with end dates across all extended categories and cross-sections.

Parameters:

df (pd.DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.

Returns:

DataFrame consisting of end dates for all series.

Return type:

pd.DataFrame

business_day_dif(df, maxdate)

Number of business days between two respective business dates.

Parameters:
  • df (pd.DataFrame) – DataFrame with cross-sections as rows and categories as columns. Each cell in the DataFrame will correspond to the start date of the respective series.

  • maxdate (pd.Timestamp) – maximum release date found in the received DataFrame. In principle, all series should have values up until the respective business date. The difference will represent possible missing values.

Returns:

DataFrame consisting of business day differences for all series.

Return type:

pd.DataFrame
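
To make the idea of a business-day gap concrete, the sketch below counts Monday-to-Friday days between each cell of a date table and a reference date. It uses numpy.busday_count as a stand-in technique and ignores holidays; it is not the calculation that business_day_dif itself performs. business_days_behind is a hypothetical helper and the dates are invented.

```python
import numpy as np
import pandas as pd

# Last available date per cross-section (rows) and category (columns).
end_dates = pd.DataFrame(
    {
        "FXXR": pd.to_datetime(["2024-03-29", "2024-03-27"]),
        "EQXR": pd.to_datetime(["2024-03-29", "2024-03-22"]),
    },
    index=["AUD", "CAD"],
)
maxdate = pd.Timestamp("2024-03-29")  # a Friday


def business_days_behind(df, maxdate):
    """Hypothetical helper: weekdays from each date (inclusive) up to maxdate (exclusive)."""
    # busday_count expects day-resolution datetime64 values.
    begin = df.to_numpy().astype("datetime64[D]")
    end = maxdate.to_datetime64().astype("datetime64[D]")
    counts = np.busday_count(begin, end)
    return pd.DataFrame(counts, index=df.index, columns=df.columns)


lag = business_days_behind(end_dates, maxdate)
print(lag)
```

A value of 0 means the series is up to date relative to maxdate; larger values indicate how many recent business days may be missing.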

get_heatmap_row_order(xcats, xcat_labels=None)

Return type:

List[str]

visual_paneldates(df, size=None, use_last_businessday=True, title=None, row_order=None, title_fontsize=None)

Visualize panel dates with color codes.

Parameters:
  • df (pd.DataFrame) – DataFrame with cross-sections as rows and categories as columns.

  • size (Tuple[float]) – tuple of floats with width/length of displayed heatmap.

  • use_last_businessday (bool) – boolean indicating whether or not to use the last business day before today as the end date. Default is True.

  • title (str) – A string to be used as the title of the heatmap. If None, a default header will be used.

  • title_fontsize (int) – Font size for the title of the heatmap. Default is None (automatic sizing).

  • row_order (List[str]) – A list of strings specifying the order of rows in the heatmap. These rows correspond to the columns of the input DataFrame. If None, the default order used by Seaborn will be applied.
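
Because row_order refers to the columns of the input DataFrame, the heatmap presumably shows categories as rows and cross-sections as columns. Under that assumption, the data preparation could look like the sketch below (plain pandas, no plotting; the values and the chosen order are illustrative).

```python
import pandas as pd

# Input in the documented orientation: cross-sections as rows, categories as columns.
start_years = pd.DataFrame(
    {"EQXR": [2005, 2010], "FXXR": [2000, 1999]},
    index=pd.Index(["AUD", "CAD"], name="cid"),
)

# Desired order of heatmap rows, given as column names of the input.
row_order = ["FXXR", "EQXR"]

# Transpose so that categories become rows, then apply the requested order.
heatmap_data = start_years.T.reindex(row_order)
print(heatmap_data)
```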