macrosynergy.signal.signal_return_relations

Module for analysing and visualizing the relationship between signal and return series.

class SignalReturnRelations(df, rets=None, sigs=None, cids=None, sig_neg=None, cosp=False, start=None, end=None, blacklist=None, freqs='M', agg_sigs='last', fwin=1, slip=0, ms_panel_test=False, additional_metrics=None)

Bases: object

Class for analysing and visualizing signals and return series. It provides a comprehensive analysis of the relationship between signals and returns across different frequencies and aggregation methods, and can calculate and visualize the following metrics:

  • Accuracy

  • Balanced accuracy

  • Positive signal ratio

  • Positive return ratio

  • Positive precision

  • Negative precision

  • Pearson correlation

  • Pearson correlation p-value

  • Kendall correlation

  • Kendall correlation p-value

  • AUC (area under the receiver operating characteristic curve)

  • Macrosynergy Panel test
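The sign-based metrics above can be illustrated with a short standard-library sketch. It is not the class's implementation: it assumes that a value counts as positive when it is strictly greater than zero and uses the textbook definitions (balanced accuracy as the mean of the true positive and true negative rates, precision as the share of correct calls among positive or among non-positive signals). The function name and dictionary keys are hypothetical.

```python
def sign_metrics(signals, returns):
    """Sign-based signal/return metrics (illustrative sketch, not the library's code)."""
    if len(signals) != len(returns) or not signals:
        raise ValueError("signals and returns must be non-empty and of equal length")
    nan = float("nan")
    # Classify each observation by the signs of signal and return (> 0 is "positive").
    tp = fp = tn = fn = 0
    for sig, ret in zip(signals, returns):
        if sig > 0 and ret > 0:
            tp += 1
        elif sig > 0:
            fp += 1
        elif ret > 0:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    tpr = tp / (tp + fn) if tp + fn else nan  # share of positive returns called correctly
    tnr = tn / (tn + fp) if tn + fp else nan  # share of non-positive returns called correctly
    return {
        "accuracy": (tp + tn) / n,
        "balanced_accuracy": (tpr + tnr) / 2,
        "positive_signal_ratio": (tp + fp) / n,
        "positive_return_ratio": (tp + fn) / n,
        "positive_precision": tp / (tp + fp) if tp + fp else nan,
        "negative_precision": tn / (tn + fn) if tn + fn else nan,
    }


m = sign_metrics([1.0, -0.5, 2.0, -1.0], [0.3, 0.2, 0.1, -0.4])
print(m["accuracy"], m["positive_precision"])  # 0.75 1.0
```

Balanced accuracy differs from plain accuracy when positive and negative returns are unevenly represented, which is why both are reported.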

Parameters:
  • df (DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.

  • rets (str, List[str]) – one or several target return categories.

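  • sigs (str, List[str]) – one or several signal categories for which detailed relational statistics can be calculated.

  • cids (List[str]) – cross-sections to be considered. Default is None, in which case all cross-sections in the DataFrame are used.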

  • sig_neg (bool, List[bool]) – if set to True, the signal is put in negative terms for all analysis. If more than one signal is tested, sig_neg must be an ordered list of the same length as the signals, containing True for each signal that needs to be negated. Default is None, in which case no signal is negated.

  • cosp (bool) – if True, the comparative statistics are calculated only for the “communal sample periods”, i.e. periods and cross-sections that have values for all compared signals. Default is False.

  • start (str) – earliest date in ISO format. Default is None, in which case the earliest date available is used.

  • end (str) – latest date in ISO format. Default is None, in which case the latest date in the DataFrame is used.

  • blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.

  • freqs (str, List[str]) – letters denoting all frequencies at which the series may be sampled. This must be a selection of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. Default is ‘M’ only. The return series are always summed over the sample period; the signal series are aggregated according to agg_sigs.

  • agg_sigs (str, List[str]) – aggregation method applied to the signal values in down-sampling. The default is “last”. Alternatives are “mean”, “median” and “sum”. If a single aggregation type is chosen for multiple signal categories, it is applied to all of them.

  • fwin (int) – forward window of return category in base periods. Default is 1. This conceptually corresponds to the holding period of a position in accordance with the signal.

  • slip (int) – implied slippage of feature availability for the relationship with the target category. Default is 0. See macrosynergy.management.df_utils.apply_slip() for more information.

  • ms_panel_test (bool) – if True, the Macrosynergy Panel test is calculated. Default is False. Note that this is a very time-consuming operation and should only be used if the result is required.

  • additional_metrics (List[Callable]) – list of additional metrics to be calculated and added to the output table.
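A simplified sketch of how freqs, agg_sigs and fwin interact, using only the standard library. Everything here is an illustrative assumption rather than the class's code: a single cross-section, monthly down-sampling by calendar month, returns summed within each month, the signal aggregated with one of the methods listed for agg_sigs, and the signal of period t paired with the summed returns of periods t+1 to t+fwin. The helpers downsample_monthly and pair_forward are hypothetical, and the data are made up.

```python
import statistics
from datetime import date

# Signal aggregation methods named in the agg_sigs description.
AGGREGATORS = {
    "last": lambda xs: xs[-1],
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
}


def downsample_monthly(rows, agg_sig="last"):
    """rows: date-sorted (date, signal, return) tuples for one cross-section.
    Returns ((year, month), aggregated signal, summed return) per calendar month."""
    months = {}
    for day, sig, ret in rows:
        sigs, rets = months.setdefault((day.year, day.month), ([], []))
        sigs.append(sig)
        rets.append(ret)
    agg = AGGREGATORS[agg_sig]
    return [(key, agg(s), sum(r)) for key, (s, r) in sorted(months.items())]


def pair_forward(monthly, fwin=1):
    """Pair each period's signal with the summed returns of the next fwin periods."""
    return [
        (monthly[i][1], sum(ret for _, _, ret in monthly[i + 1 : i + 1 + fwin]))
        for i in range(len(monthly) - fwin)
    ]


rows = [
    (date(2024, 1, 15), 1.0, 0.5),
    (date(2024, 1, 31), 2.0, -0.2),
    (date(2024, 2, 29), -1.0, 0.4),
    (date(2024, 3, 28), 0.5, 0.1),
]
monthly = downsample_monthly(rows, agg_sig="last")
print(pair_forward(monthly, fwin=1))  # signal 2.0 paired with February's return, -1.0 with March's
```

With a larger fwin, each signal is matched with a longer cumulative return and the last fwin periods drop out because their forward window is incomplete.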

accuracy_bars(ret=None, sigs=None, freq=None, agg_sig=None, type='cross_section', title=None, title_fontsize=16, size=None, legend_pos='best', x_labels=None, x_labels_rotate=0)

Plot a bar chart of the overall and balanced accuracy metrics, with bars drawn per cross-section, per year or per signal as set by type.

Parameters:
  • ret (str, optional) – return category. Default is None, in which case the first return category will be used.

  • sigs (str, or List[str], optional) – signal category. Default is None, in which case all signals will be used.

  • freq (str, optional) – frequency to be used in analysis. Default is None, in which case the first frequency will be used.

  • agg_sig (str, optional) – aggregation method to be used in analysis. Default is None, in which case the first aggregation method will be used.

  • type (str, optional) – type of segment over which bars are drawn. Either “cross_section” (default), “years” or “signals”.

  • title (str, optional) – chart header. Default is None, in which case the default title is applied.

  • title_fontsize (int) – font size of chart header. Default is 16.

  • size (Tuple[float, float], optional) – 2-tuple of width and height of plot. Default is None, in which case the default size is applied.

  • legend_pos (str) – position of legend box. Default is ‘best’. See the documentation of matplotlib.pyplot.legend.

  • x_labels (Dict[str, str]) – dictionary of x-axis labels. Default is None.

  • x_labels_rotate (int) – rotation of x-axis labels. Default is 0.
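What accuracy_bars plots can be approximated by computing the two accuracy metrics separately for each segment. The sketch below is hypothetical (accuracy_by_segment is not part of the library) and assumes strictly positive values count as positive; a segment without positive returns, or without non-positive ones, gets a NaN balanced accuracy because one of the two rates is undefined.

```python
import math
from collections import defaultdict


def accuracy_by_segment(records):
    """records: (segment, signal, return) tuples; a segment is e.g. a cross-section or a year.
    Returns {segment: (accuracy, balanced_accuracy)}."""
    groups = defaultdict(list)
    for segment, sig, ret in records:
        groups[segment].append((sig > 0, ret > 0))
    result = {}
    for segment, pairs in groups.items():
        accuracy = sum(s == r for s, r in pairs) / len(pairs)
        hits_pos = [s for s, r in pairs if r]  # True where a positive return met a positive signal
        hits_neg = [not s for s, r in pairs if not r]  # True where a non-positive return met a non-positive signal
        tpr = sum(hits_pos) / len(hits_pos) if hits_pos else math.nan
        tnr = sum(hits_neg) / len(hits_neg) if hits_neg else math.nan
        result[segment] = (accuracy, (tpr + tnr) / 2)
    return result
```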

correlation_bars(ret=None, sigs=None, freq=None, type='cross_section', title=None, title_fontsize=16, size=None, legend_pos='best', x_labels=None, x_labels_rotate=0)

Plot correlation coefficients and their significance, with bars drawn per cross-section, per year or per signal as set by type.

Parameters:
  • ret (str, optional) – return category. Default is the first return category.

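  • sigs (str, List[str], optional) – signal category. Default is the first signal category.

  • freq (str, optional) – frequency to be used in analysis. Default is None, in which case the first frequency will be used.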

  • type (str, optional) – type of segment over which bars are drawn. Either “cross_section” (default), “years” or “signals”.

  • title (str, optional) – chart header. Default is None, in which case the default title will be applied.

  • title_fontsize (int) – font size of chart header. Default is 16.

  • size (Tuple[float, float], optional) – 2-tuple of width and height of plot. If None, the default size will be applied.

  • legend_pos (str) – position of legend box. Default is ‘best’. See matplotlib.pyplot.legend.

  • x_labels (Dict[str, str]) – dictionary of x-axis labels. Default is None.

  • x_labels_rotate (int) – rotation of x-axis labels. Default is 0.
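For reference, the two correlation coefficients shown by correlation_bars can be computed from first principles as below. This is a sketch, not the library's code: it computes Pearson's r and Kendall's tau-b (the tie-adjusted variant, which is also the default of scipy.stats.kendalltau) but no p-values, and it returns NaN when a series has no variation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else math.nan


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, adjusted for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both series: enters neither factor of the denominator
        if dx == 0:
            ties_x += 1  # tied in x only
        elif dy == 0:
            ties_y += 1  # tied in y only
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else math.nan


x, y = [1, 2, 3, 4], [2, 4, 5, 9]
print(round(pearson(x, y), 4), kendall_tau_b(x, y))  # 0.9648 1.0
```

Kendall's tau depends only on the ordering of values, so it is less sensitive than Pearson's r to a few extreme signal or return observations.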

static apply_slip(df, slip, cids, xcats, metrics)

Wrapper that calls the apply_slip() function defined in macrosynergy.management.df_utils.

Parameters:
  • df (DataFrame) – standardised DataFrame.

  • slip (int) – slip value to apply to df.

  • cids (List[str]) – list of cids in df to apply slip.

  • xcats (List[str]) – list of xcats in df to apply slip.

  • metrics (List[str]) – list of metrics in df to apply slip.

Return type:

DataFrame
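The effect of a slip can be sketched on a single date-ordered series. Under the assumption that slipping by k periods means a value observed at period t is only used from period t + k, the values move k positions later and the first k positions are left empty (None). slip_values is a hypothetical helper, not the df_utils function.

```python
from typing import List, Optional


def slip_values(values: List[float], slip: int) -> List[Optional[float]]:
    """Delay a date-ordered series by `slip` periods (hypothetical helper).

    The value observed at position t is placed at position t + slip; the first
    `slip` positions have no available value and are set to None.
    """
    if slip < 0:
        raise ValueError("slip must be a non-negative integer")
    kept = values[: max(len(values) - slip, 0)]
    return [None] * min(slip, len(values)) + list(kept)


print(slip_values([0.5, 1.0, -0.2, 0.3], 1))  # [None, 0.5, 1.0, -0.2]
```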

static is_list_of_strings(variable)

Tests whether a variable is a list of strings. This guards against a single string being treated as a list, since a string is itself iterable as a sequence of characters.

Parameters:

variable (Any) – variable to be tested.

Returns:

True if variable is a list of strings, False otherwise.

Return type:

bool
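A minimal standalone version of this check, for illustration only (is_list_of_str is a hypothetical name, and the library's handling of edge cases such as an empty list may differ):

```python
from typing import Any


def is_list_of_str(variable: Any) -> bool:
    """Return True only for a list whose items are all strings."""
    # Check the container type first: a plain str is iterable too, and iterating
    # it would yield single-character strings that pass the item check.
    return isinstance(variable, list) and all(isinstance(item, str) for item in variable)


print(is_list_of_str(["GDP", "CPI"]), is_list_of_str("GDP"))  # True False
```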

manipulate_df(xcats, freq, agg_sig)

Transforms the DataFrame into the format required for the analysis. It first reduces the DataFrame to data outside the blacklist periods that is relevant to the selected return and signal categories, then applies the slip. Finally, it converts the DataFrame to the format required for the analysis and checks whether any signals should be negated (see sig_neg).

Parameters:
  • xcats (List[str]) – list of xcats in df to apply slip.

  • freq (str) – frequency to be used in analysis.

  • agg_sig (str) – aggregation method to be used in analysis.
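The blacklist filtering and sign steps can be sketched on plain tuples. This is an illustration under assumptions, not the method's code: blacklist keys carrying several periods for one cross-section are assumed to look like “BRL_1” and “BRL_2”, each value is assumed to be a (start, end) pair of dates whose range, endpoints included, is removed, and the helpers base_cid, is_blacklisted and prepare are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Blacklist = Dict[str, Tuple[date, date]]


def base_cid(key: str) -> str:
    """Strip a numeric suffix: "BRL_2" -> "BRL"; keys without one are returned unchanged."""
    head, sep, tail = key.rpartition("_")
    return head if sep and tail.isdigit() else key


def is_blacklisted(cid: str, day: date, blacklist: Blacklist) -> bool:
    return any(
        base_cid(key) == cid and start <= day <= end
        for key, (start, end) in blacklist.items()
    )


def prepare(rows: List[Tuple[str, date, float]], blacklist: Blacklist, negate: bool):
    """Drop blacklisted (cid, date, signal) rows, then flip the signal sign if requested."""
    sign = -1.0 if negate else 1.0
    return [
        (cid, day, sign * sig)
        for cid, day, sig in rows
        if not is_blacklisted(cid, day, blacklist)
    ]
```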

map_pval(ret_vals, sig_vals)

Calculates the p-value using statsmodels MixedLM.

Parameters:
  • ret_vals (Series) – return values.

  • sig_vals (Series) – signal values.

Returns:

p-value of the MixedLM model.

Return type:

float
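As a hedged illustration of the kind of model involved, assume a random intercept per time period (the grouping actually used by the method is not stated here). A linear mixed model for the return r of cross-section i in period t on the signal s could then be written as below, with the reported p-value being that of the slope β:

```latex
r_{i,t} = \alpha + \beta \, s_{i,t} + u_t + \varepsilon_{i,t}, \qquad
u_t \sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{i,t} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under this assumption, the period effect u_t absorbs return movements common to all cross-sections in a period, so a small p-value for β indicates a relation that is not merely driven by such common shocks.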

calculate_single_stat(stat, ret=None, sig=None, type=None)

Calculates a single statistic for a given signal-return relation.

Parameters:
  • stat (str) – statistic to be calculated.

  • ret (str) – return category. Default is the first return category.

  • sig (str) – signal category. Default is the first signal category.

  • type (str) – type of segment over which the statistic is calculated. Either “panel” (default), “years” or “signals”.

Returns:

statistic value.

Return type:

float

summary_table(cross_section=False, years=False)

Generates a summary table for the signal-return relations.

Parameters:
  • cross_section (bool) – if True, the summary table will be generated for cross-sections.

  • years (bool) – if True, the summary table will be generated for years. Must be False if cross_section is True.

Returns:

summary table.

Return type:

DataFrame

signals_table(sigs=None)
cross_section_table()
yearly_table()
single_relation_table(ret=None, xcat=None, freq=None, agg_sigs=None, table_type=None)

Computes all the statistics for one specific signal-return relation.

Parameters:
  • ret (str) – single target return category. Default is first in target return list of the class.

  • xcat (str) – single signal category to be considered. Default is first in feature category list of the class.

  • freq (str) – letter denoting the single frequency at which the series will be sampled. This must be one of the frequencies selected for the class. If not specified, the frequency stored in the class is used.

  • agg_sigs (str) – aggregation method applied to the signal values in down-sampling.

  • table_type (str) – type of table to be returned. Either “summary”, “years” or “cross_section”.

Returns:

table with the statistics for the single signal-return relation.

Return type:

DataFrame

reindex_multindex_df(df, desired_order, var_type)
multiple_relations_table(rets=None, xcats=None, freqs=None, agg_sigs=None, signal_name_dict=None, return_name_dict=None)

Calculates all the statistics for each specified return and signal category, at each specified frequency and with each specified aggregation method. If none are specified, all categories, frequencies and aggregation methods stored in the class are used.

Parameters:
  • rets (str, List[str]) – target return categories.

  • xcats (str, List[str]) – signal categories to be considered.

  • freqs (str, List[str]) – letters denoting the frequencies at which the series are to be sampled. Each must be one of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. If not specified, the frequencies stored in the class are used.

  • agg_sigs (str, List[str]) – aggregation methods applied to the signal values in down-sampling.

single_statistic_table(stat, type='panel', rows=['xcat', 'agg_sigs'], columns=['ret', 'freq'], show_heatmap=False, title=None, title_fontsize=16, row_names=None, column_names=None, signal_name_dict=None, return_name_dict=None, min_color=None, max_color=None, figsize=(14, 8), annotate=True, round=5)

Creates a table showing the specified statistic for each combination of the row and column categories passed as arguments.

Parameters:
  • stat (str) – type of statistic to be displayed (this can be any of the column names of summary_table).

  • type (str) – type of the statistic displayed. This can be based on the overall panel (“panel”, default), an average of annual panels (“mean_years”), an average of cross-sectional relations (“mean_cids”), the positive ratio across years (“pr_years”) or the positive ratio across cross-sections (“pr_cids”).

  • rows (List[str]) – row indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“xcat”, “agg_sigs”], resulting in index strings that combine the signal category and the aggregation method, or only the signal category if just one aggregation is available.

  • columns (List[str]) – column indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“ret”, “freq”], resulting in column strings that combine the return category and the frequency, or only the return category if just one frequency is available.

  • show_heatmap (bool) – if True, the table is visualized as a heatmap. Default is False.

  • title (str, optional) – plot title. Default is None in which case the default title is used.

  • title_fontsize (int) – font size of title. Default is 16.

  • row_names (List[str]) – specifies the labels of rows in the heatmap. Default is None, in which case the indices of the generated DataFrame are used.

  • column_names (List[str]) – specifies the labels of columns in the heatmap. Default is None, in which case the columns of the generated DataFrame are used.

  • signal_name_dict (dict, optional) – dictionary mapping the signal names to the desired names in the heatmap. Default is None, in which case the signal names are used.

  • return_name_dict (dict, optional) – dictionary mapping the return names to the desired names in the heatmap. Default is None, in which case the return names are used.

  • min_color (float, optional) – minimum value of the color scale. Default is None, in which case the minimum value of the table is used.

  • max_color (float, optional) – maximum value of the color scale. Default is None, in which case the maximum value of the table is used.

  • figsize (Tuple[float, float]) – Tuple (w, h) of width and height of graph. Default is (14, 8).

  • annotate (bool) – if True, the values shown in the heatmap are annotated. Default is True.

  • round (int) – number of decimals to round the values to in the heatmap’s annotations. Default is 5.

Returns:

DataFrame with the specified statistic for each row and column.

Return type:

DataFrame
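The aggregated types can be illustrated with a statistic already computed per segment (per year for “mean_years” and “pr_years”, per cross-section for “mean_cids” and “pr_cids”). In this hypothetical sketch, with made-up numbers, the threshold for a “positive” segment is a parameter: 0 suits correlations, while 0.5, the no-skill level of accuracy metrics, is assumed here to be the natural choice for those.

```python
from statistics import mean
from typing import Dict, Tuple


def mean_and_positive_ratio(stat_by_segment: Dict[str, float], threshold: float = 0.0) -> Tuple[float, float]:
    """Average of a per-segment statistic and the share of segments above `threshold`."""
    if not stat_by_segment:
        raise ValueError("at least one segment is required")
    values = list(stat_by_segment.values())
    positive = sum(1 for v in values if v > threshold)
    return mean(values), positive / len(values)


yearly_corr = {"2019": 0.1, "2020": -0.05, "2021": 0.2, "2022": 0.15}
print(mean_and_positive_ratio(yearly_corr))  # mean of about 0.1, positive ratio 0.75
```

The mean can be dominated by a few strong segments, whereas the positive ratio shows how consistently the relation holds, which is why both views are offered.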

set_df_labels(rows_dict, rows, columns)

Creates two lists of strings that will be used as the row and column labels for the resulting dataframe.

Parameters:
  • rows_dict (dict) – dictionary containing the values for each of the xcat, ret, freq and agg_sigs categories.

  • rows (List[str]) – list of strings specifying which of the categories are included in the rows of the dataframe.

  • columns (List[str]) – list of strings specifying which of the categories are included in the columns of the dataframe.

get_rowcol(hash, rowcols)

Calculates which row/column the hash belongs to.

Parameters:
  • hash (str) – hash of the statistic.

  • rowcols (List[str]) – list of strings specifying which of the categories are in the rows/columns of the dataframe.