macrosynergy.signal.signal_return_relations
Module for analysing and visualizing signal and return series.
- class SignalReturnRelations(df, rets=None, sigs=None, cids=None, sig_neg=None, cosp=False, start=None, end=None, blacklist=None, freqs='M', agg_sigs='last', fwin=1, slip=0, ms_panel_test=False, additional_metrics=None)
Bases: object
Class for analysing and visualizing signals and return series. It analyses the relationship between signals and returns across different sampling frequencies and signal aggregation methods, and can calculate and visualize the following metrics:
Accuracy
Balanced accuracy
Positive signal ratio
Positive return ratio
Positive precision
Negative precision
Pearson correlation
Pearson correlation p-value
Kendall correlation
Kendall correlation p-value
AUC
Macrosynergy Panel test
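The sign-based metrics in this list can be illustrated with a small pure-Python sketch. It makes the following assumptions:

- A value above zero counts as positive, and zeros are treated as non-positive.
- The function name sign_metrics and its dictionary keys are hypothetical and are not part of the macrosynergy API.
- Balanced accuracy uses the usual definition: the average of the hit rates on positive and on non-positive returns.

```python
import math


def sign_metrics(signals, returns):
    """Sign-based metrics for paired signal/return observations.

    Assumption: values > 0 count as positive; zeros are treated as non-positive.
    """
    if len(signals) != len(returns) or not signals:
        raise ValueError("signals and returns must be non-empty and of equal length")
    pairs = [(s > 0, r > 0) for s, r in zip(signals, returns)]
    tp = sum(s and r for s, r in pairs)          # positive signal, positive return
    fp = sum(s and not r for s, r in pairs)      # positive signal, non-positive return
    tn = sum(not s and not r for s, r in pairs)  # non-positive signal and return
    fn = sum(not s and r for s, r in pairs)      # non-positive signal, positive return
    n = len(pairs)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else math.nan

    hit_rate_pos = ratio(tp, tp + fn)  # share of positive returns correctly signalled
    hit_rate_neg = ratio(tn, tn + fp)  # share of non-positive returns correctly signalled
    return {
        "accuracy": (tp + tn) / n,
        "balanced_accuracy": (hit_rate_pos + hit_rate_neg) / 2,
        "positive_signal_ratio": (tp + fp) / n,
        "positive_return_ratio": (tp + fn) / n,
        "positive_precision": ratio(tp, tp + fp),
        "negative_precision": ratio(tn, tn + fn),
    }
```

For signals [1, 1, 1, -1] and returns [1, 1, -1, -1], three of four signs match (accuracy 0.75). Positive precision is 2/3 because one of the three positive signals was followed by a negative return.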
- Parameters:
df (DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
rets (str, List[str]) – one or several target return categories.
sigs (str, List[str]) – one or several signal categories for which detailed relational statistics can be calculated.
sig_neg (bool, List[bool]) – if True, the signal is put in negative terms (negated) for all analysis. If more than one signal is tested, sig_neg must be an ordered list of the same length as the signals, containing True for each signal that needs to be negated. Default is None, i.e. no signal is negated.
cosp (bool) – if True, the comparative statistics are calculated only for the “communal sample periods”, i.e. periods and cross-sections that have values for all compared signals. Default is False.
start (str) – earliest date in ISO format. Default is None, in which case the earliest date available will be used.
end (str) – latest date in ISO format. Default is None, in which case the latest date in the DataFrame will be used.
blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.
freqs (str, List[str]) – letters denoting all frequencies at which the series may be sampled. This must be a selection of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. Default is only ‘M’. The return series will always be summed over the sample period. The signal series will be aggregated according to the values of agg_sigs.
agg_sigs (str, List[str]) – aggregation method applied to the signal values in down-sampling. The default is “last”. Alternatives are “mean”, “median” and “sum”. If a single aggregation type is chosen for multiple signal categories it is applied to all of them.
fwin (int) – forward window of return category in base periods. Default is 1. This conceptually corresponds to the holding period of a position in accordance with the signal.
slip (int) – implied slippage of feature availability for the relationship with the target category. Default is 0. See macrosynergy.management.df_utils.apply_slip() for more information.
ms_panel_test (bool) – if True, the Macrosynergy Panel test is calculated. Please note that this is a very time-consuming operation and should be used only if you require the result.
additional_metrics (List[Callable]) – list of additional metrics to be calculated and added to the output table.
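The down-sampling rule described under freqs and agg_sigs sums returns over each period and aggregates signals by the chosen method. It can be sketched for monthly frequency as below.

This is an illustration under stated assumptions, not the library's implementation:

- It works on plain Python lists rather than a standardized DataFrame.
- It assumes that, with fwin=1, each period's aggregated signal is paired with the following period's summed return.
- AGGREGATORS, downsample_monthly and pair_with_next_return are hypothetical names.

```python
from datetime import date
from statistics import mean, median

# Hypothetical mapping from agg_sigs names to aggregation functions.
AGGREGATORS = {
    "last": lambda values: values[-1],
    "mean": mean,
    "median": median,
    "sum": sum,
}


def downsample_monthly(dates, signals, returns, agg_sig="last"):
    """Group observations by calendar month: sum returns, aggregate signals with agg_sig."""
    if agg_sig not in AGGREGATORS:
        raise ValueError(f"unknown aggregation: {agg_sig}")
    buckets = {}
    # Sort chronologically so that "last" picks the final observation of each month.
    for day, sig, ret in sorted(zip(dates, signals, returns)):
        sigs, rets = buckets.setdefault((day.year, day.month), ([], []))
        sigs.append(sig)
        rets.append(ret)
    aggregate = AGGREGATORS[agg_sig]
    return {month: (aggregate(sigs), sum(rets)) for month, (sigs, rets) in buckets.items()}


def pair_with_next_return(monthly):
    """Pair each month's signal with the next month's summed return (fwin=1 assumed)."""
    months = list(monthly)
    return [(monthly[months[i]][0], monthly[months[i + 1]][1]) for i in range(len(months) - 1)]


obs_dates = [date(2024, 1, 2), date(2024, 1, 31), date(2024, 2, 15)]
monthly = downsample_monthly(obs_dates, [1.0, 2.0, 3.0], [0.1, 0.2, -0.3])
```

With "last", January's signal is the value observed on 31 January. Both January returns are summed into one monthly return.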
- accuracy_bars(ret=None, sigs=None, freq=None, agg_sig=None, view='cross_section', title=None, title_fontsize=16, size=None, legend_pos='best', x_labels=None, x_labels_rotate=0, return_fig=False, **kwargs)
Plot a bar chart of the overall and balanced accuracy metrics, with bars segmented by cross-section, year or signal (see view).
- Parameters:
ret (str, optional) – return category. Default is None, in which case the first return category will be used.
sigs (str, or List[str], optional) – signal category. Default is None, in which case all signals will be used.
freq (str, optional) – frequency to be used in analysis. Default is None, in which case the first frequency will be used.
agg_sig (str, optional) – aggregation method to be used in analysis. Default is None, in which case the first aggregation method will be used.
view (str, optional) – type of segment over which bars are drawn. Either “cross_section” (default), “years” or “signals”.
title (str, optional) – chart header. A default title will be applied if none is chosen.
title_fontsize (int) – font size of chart header. Default is 16.
size (Tuple[float, float], optional) – 2-tuple of width and height of plot. A default size will be applied if none is chosen.
legend_pos (str) – position of legend box. Default is ‘best’. See the documentation of matplotlib.pyplot.legend.
x_labels (Dict[str]) – dictionary of x-axis labels. Default is None.
x_labels_rotate (int) – rotation of x-axis labels. Default is 0.
- correlation_bars(ret=None, sigs=None, freq=None, type='cross_section', title=None, title_fontsize=16, size=None, legend_pos='best', x_labels=None, x_labels_rotate=0, return_fig=False)
Plot correlation coefficients and their significance, with bars segmented by cross-section, year or signal (see type).
- Parameters:
ret (str, optional) – return category. Default is the first return category.
sigs (str, List[str], optional) – signal category or categories. Default is the first signal category.
type (str, optional) – type of segment over which bars are drawn. Either “cross_section” (default), “years” or “signals”.
title (str, optional) – chart header. Default is None, in which case the default title will be applied.
title_fontsize (int) – font size of chart header. Default is 16.
size (Tuple[float, float], optional) – 2-tuple of width and height of plot. If None, the default size will be applied.
legend_pos (str) – position of legend box. Default is ‘best’. See matplotlib.pyplot.legend.
x_labels (Dict[str]) – dictionary of x-axis labels. Default is None.
x_labels_rotate (int) – rotation of x-axis labels. Default is 0.
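The Pearson and Kendall coefficients shown by correlation_bars can be computed by hand as in the sketch below. It implements Pearson's r and Kendall's tau-b (the tie-adjusted variant) with an O(n^2) pair loop.

- The function names are hypothetical.
- The sketch omits p-values. In practice, scipy.stats.pearsonr and scipy.stats.kendalltau return both the coefficient and its p-value.
- Whether SignalReturnRelations uses these exact variants internally is not stated here.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties in x and y."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied in both series enter neither count
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b undefined for these inputs")
    return (concordant - discordant) / denom
```

For x = [1, 2, 3, 4] and y = [1, 3, 2, 4], Pearson's r is 0.8. Five of the six pairs are concordant, so tau-b is (5 - 1) / 6.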
- static apply_slip(df, slip, cids, xcats, metrics)
Calls the apply_slip() function defined in macrosynergy.management.df_utils.
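Conceptually, a slip of k delays the availability of each feature value by k periods. A minimal sketch of that idea on (cid, date, value) tuples follows.

- It assumes that slip counts observations within each cross-section.
- It simply drops the first slip dates, which may differ from how macrosynergy.management.df_utils.apply_slip() treats them.
- lag_by_slip is a hypothetical name.

```python
def lag_by_slip(records, slip):
    """Lag feature values by `slip` observations within each cross-section.

    records: iterable of (cid, iso_date, value) tuples; ISO date strings sort chronologically.
    The first `slip` dates of each cross-section have no lagged value and are dropped here.
    """
    if slip < 0:
        raise ValueError("slip must be non-negative")
    by_cid = {}
    for cid, day, value in sorted(records):  # sorts by cid, then date
        by_cid.setdefault(cid, []).append((day, value))
    lagged = []
    for cid, rows in by_cid.items():
        for i in range(slip, len(rows)):
            # The value shown on date i is the one originally observed slip rows earlier.
            lagged.append((cid, rows[i][0], rows[i - slip][1]))
    return lagged
```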
- static is_list_of_strings(variable)
Tests whether a variable is a list of strings. This guards against a plain string, which is itself iterable, being mistaken for a list of single characters.
- Parameters:
variable (Any) – variable to be tested.
- Returns:
True if variable is a list of strings, False otherwise.
- Return type:
bool
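The check can be written as a one-line predicate. The sketch below is an assumption about the logic, not the library's source. Note that an empty list passes, because all() of an empty sequence is True.

```python
from typing import Any


def is_list_of_strings(variable: Any) -> bool:
    """Return True only for a list whose items are all str instances."""
    # A str is itself iterable, so test the container type before inspecting items.
    return isinstance(variable, list) and all(isinstance(item, str) for item in variable)
```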
- manipulate_df(xcats, freq, agg_sig)
Transforms the DataFrame into the format required for the analysis. It first reduces the DataFrame to data outside the blacklist periods that is relevant to the requested return and signal categories. It then applies the slip, converts the result to the analysis format and finally negates any signals flagged in sig_neg.
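Two of the steps listed above, blacklist filtering and sign negation, can be sketched on plain tuples. Slip and reshaping are left out.

- The sketch assumes ISO date strings, which compare chronologically.
- It assumes numbered blacklist keys such as "USD_1" or "USD2" refer to the cross-section "USD".
- filter_and_negate is a hypothetical helper.

```python
def filter_and_negate(records, blacklist, negate_xcats):
    """Drop blacklisted observations and negate flagged signal categories.

    records: iterable of (cid, xcat, iso_date, value).
    blacklist: {key: (iso_start, iso_end)}; trailing digits and an underscore are stripped
    from each key to recover the cross-section code (assumed numbering convention).
    negate_xcats: set of signal categories to put in negative terms.
    """
    periods = [(key.rstrip("0123456789").rstrip("_"), start, end)
               for key, (start, end) in blacklist.items()]
    kept = []
    for cid, xcat, day, value in records:
        if any(cid == bl_cid and start <= day <= end for bl_cid, start, end in periods):
            continue  # observation falls inside a blacklist period
        kept.append((cid, xcat, day, -value if xcat in negate_xcats else value))
    return kept
```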
- calculate_single_stat(stat, ret=None, sig=None, type=None)
Calculates a single statistic for a given signal-return relation.
- Parameters:
- Returns:
statistic value.
- Return type:
- summary_table(cross_section=False, years=False)
Generates a summary table for the signal-return relations.
- cross_section_table()
Deprecated method for cross-section table. Use single_relation_table instead. Shows a table of category values across cross-sections for a given date.
- yearly_table()
Deprecated method for yearly table. Use single_relation_table instead. Displays annual average values of selected categories across cross-sections.
- single_relation_table(ret=None, xcat=None, freq=None, agg_sigs=None, table_type=None)
Computes all the statistics for one specific signal-return relation.
- Parameters:
ret (str) – single target return category. Default is the first in the target return list of the class.
xcat (str) – single signal category to be considered. Default is the first in the feature category list of the class.
freq (str) – letter denoting the single frequency at which the series will be sampled. This must be one of the frequencies selected for the class. If not specified, the frequency stored in the class is used.
agg_sigs (str) – aggregation method applied to the signal values in down-sampling.
table_type (str) – type of table to be returned. Either “summary”, “years” or “cross_section”.
- Returns:
table with the statistics for the single signal-return relation.
- Return type:
DataFrame
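The difference between the table_type options can be illustrated with accuracy. A panel value pools every observation, while the "cross_section" and "years" views recompute the statistic within each group.

The sketch below is hypothetical: accuracy_by_segment is not a library function. It uses only accuracy, with values above zero counted as positive.

```python
def accuracy_by_segment(records, segment):
    """Accuracy for the whole panel and per segment.

    records: iterable of (cid, year, signal, return); segment: "cross_section" or "years".
    """
    position = {"cross_section": 0, "years": 1}[segment]
    records = list(records)

    def accuracy(pairs):
        # Share of observations where signal and return fall on the same side of zero.
        return sum((s > 0) == (r > 0) for s, r in pairs) / len(pairs)

    groups = {}
    for rec in records:
        groups.setdefault(rec[position], []).append((rec[2], rec[3]))
    table = {"Panel": accuracy([(rec[2], rec[3]) for rec in records])}
    for key in sorted(groups):
        table[key] = accuracy(groups[key])
    return table
```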
- multiple_relations_table(rets=None, xcats=None, freqs=None, agg_sigs=None, signal_name_dict=None, return_name_dict=None)
Calculates all the statistics for each specified return and signal category, at each frequency and with each aggregation method. If none are specified, it does this for all categories, frequencies and aggregation methods stored in the class.
- single_statistic_table(stat, type='panel', rows=['xcat', 'agg_sigs'], columns=['ret', 'freq'], show_heatmap=False, title=None, title_fontsize=16, row_names=None, column_names=None, signal_name_dict=None, return_name_dict=None, xcat_labels=None, freq_labels=None, agg_sigs_labels=None, min_color=None, max_color=None, figsize=(14, 8), annotate=True, round=3, pval_stat=None, round_pval=3, significance_threshold=0.9, xlabel=None, ylabel=None, collapse_constant_levels=False, axis_label_levels=None, footnote=None, footnote_fontsize=10)
Creates a table showing the specified statistic for each of the specified rows and columns.
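Conceptually, the table is a pivot of per-relation statistics: the rows keys form the row index and the columns keys form the column index. A minimal sketch with tuple keys follows.

pivot_statistic and its record layout are assumptions for illustration. Missing combinations are filled with None.

```python
def pivot_statistic(records, rows=("xcat", "agg_sigs"), columns=("ret", "freq")):
    """Arrange per-relation statistics into a rows x columns grid.

    records: iterable of dicts with keys "xcat", "agg_sigs", "ret", "freq" and "value".
    Returns (grid, row_keys, col_keys); row/column keys are tuples in first-seen order.
    """
    row_keys, col_keys, cells = [], [], {}
    for rec in records:
        row = tuple(rec[k] for k in rows)
        col = tuple(rec[k] for k in columns)
        if row not in row_keys:
            row_keys.append(row)
        if col not in col_keys:
            col_keys.append(col)
        cells[(row, col)] = rec["value"]
    grid = [[cells.get((row, col)) for col in col_keys] for row in row_keys]
    return grid, row_keys, col_keys
```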
- Parameters:
stat (str) – type of statistic to be displayed (this can be any of the column names of summary_table).
type (str) – type of the statistic displayed. This can be based on the overall panel (“panel”, default), an average of annual panels (“mean_years”), an average of cross-sectional relations (“mean_cids”), the positive ratio across years (“pr_years”) or the positive ratio across sections (“pr_cids”).
rows (List[str]) – row indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“xcat”, “agg_sigs”], which indexes rows by signal and aggregation method, or by signal alone if only one aggregation is available.
columns (List[str]) – column indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“ret”, “freq”], which indexes columns by return and frequency, or by return alone if only one frequency is available.
show_heatmap (bool) – if True, the table is visualized as a heatmap. Default is False.
title (str, optional) – plot title. Default is None, in which case the default title is used.
title_fontsize (int) – font size of title. Default is 16.
row_names (List[str]) – specifies the labels of rows in the heatmap. Default is None, in which case the indices of the generated DataFrame are used.
column_names (List[str]) – specifies the labels of columns in the heatmap. Default is None, in which case the columns of the generated DataFrame are used.
signal_name_dict (dict, optional) – dictionary mapping the signal names to the desired names in the heatmap. Default is None, in which case the signal names are used. Renamed values flow through to the auto axis label produced by the constant-level collapse described under ylabel.
return_name_dict (dict, optional) – dictionary mapping the return names to the desired names in the heatmap. Default is None, in which case the return names are used. Renamed values flow through to the auto axis label produced by the constant-level collapse described under xlabel.
xcat_labels (dict, optional) – Unified rename dictionary covering both signal and return xcats. Internally split by membership in self.sigs/self.rets and routed through signal_name_dict/return_name_dict; xcats not listed in the dict are kept verbatim. Mutually exclusive with the two legacy kwargs: pass either xcat_labels or signal_name_dict/return_name_dict, not both. Default is None (no rename).
freq_labels (dict, optional) – Mapping from frequency code ("M", "Q", …) to the display label used on the heatmap and in the auto axis label produced by the constant-level collapse. Frequencies not listed in the dict are kept verbatim. Default is None (raw codes are shown).
agg_sigs_labels (dict, optional) – Mapping from aggregation code ("last", "mean", …) to the display label used on the heatmap and in the auto axis label produced by the constant-level collapse. Aggregations not listed in the dict are kept verbatim. Default is None (raw codes are shown).
min_color (float, optional) – minimum value of the color scale. Default is None, in which case the minimum value of the table is used.
max_color (float, optional) – maximum value of the color scale. Default is None, in which case the maximum value of the table is used.
figsize (Tuple[float, float]) – Tuple (w, h) of width and height of graph. Default is (14, 8).
annotate (bool) – Default is True, where the values shown in the heatmap are annotated.
round (int) – number of decimals to round the primary statistic to in the heatmap annotations. Default is 3.
pval_stat (str, optional) – name of a p-value statistic, typically "kendall_pval", "pearson_pval" or "map_pval" (the Macrosynergy Panel test). When set, each heatmap cell shows the probability of significance, 1 - pval_stat, in brackets beneath the primary statistic. Default is None. When pval_stat="map_pval", the SignalReturnRelations instance must have been constructed with ms_panel_test=True.
round_pval (int) – number of decimals to round the bracketed probability of significance to in the heatmap annotations. Default is 3.
significance_threshold (float, optional) – probability-of-significance cutoff above which a cell’s annotation is rendered in black and bold. Compared directly against the bracketed value (1 - pval_stat), so 0.9 highlights cells whose probability of significance exceeds 0.9 (equivalently, raw p-value below 0.1). Only takes effect when pval_stat is set. Pass None to disable. Default is 0.9.
xlabel (str, optional) – Label drawn beneath the heatmap columns, useful for naming the target return (e.g. "Forward return (target)"). Default is None. When collapse_constant_levels=True and the caller leaves this None, any column-index levels whose values are constant across the table are auto-collapsed into this label (joined by " · "). See axis_label_levels to restrict which constant levels feed into the label.
ylabel (str, optional) – Label drawn beside the heatmap rows, useful for naming the feature (e.g. "Factor (feature)"). Default is None. When collapse_constant_levels=True and the caller leaves this None, any row-index levels whose values are constant across the table are auto-collapsed into this label (joined by " · "). For instance, a table whose rows iterate over one signal, one aggregation and several frequencies will display only the frequencies as y-tick labels and place "<signal> · <aggregation>" on the y-axis label. See axis_label_levels to restrict which constant levels feed into the label.
collapse_constant_levels (bool, optional) – When True, row/column index levels whose values are constant across the table are stripped from the tick labels and promoted to the corresponding axis label (joined by " · ") when the caller did not pass xlabel/ylabel (or row_names/column_names) explicitly. The returned DataFrame is unchanged in every case. Default is False (raw MultiIndex tuples appear as tick labels, matching the historical rendering). Must be True before axis_label_levels can be passed.
axis_label_levels (List[str], optional) – Subset of ["xcat", "ret", "freq", "agg_sigs"] naming the level keys eligible for promotion into the auto x/y-axis label. Constant levels not in this list still collapse from the tick labels but do not appear in the axis label. Only takes effect when collapse_constant_levels=True; raises ValueError otherwise. Default is None, which promotes every collapsed level into the label. Pass e.g. ["xcat", "ret"] to keep the auto-label limited to the signal/return identity and drop the aggregation/frequency suffix.
footnote (str, optional) – Free-text caption rendered below the heatmap. Useful for recording the significance test, panel scope or annotation legend (e.g. "Significance computed with the Macrosynergy panel test."). Multi-line strings are supported. Default is None (no footnote).
footnote_fontsize (int, optional) – Font size for the footnote text. Default is 10.
- Returns:
DataFrame with the specified statistic for each row and column.
- Return type:
DataFrame
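The constant-level collapse described under xlabel, ylabel and collapse_constant_levels can be sketched on a list of index tuples. collapse_levels is a hypothetical helper:

- A single renames mapping stands in for signal_name_dict, freq_labels and agg_sigs_labels.
- label_levels plays the role of axis_label_levels.

```python
def collapse_levels(keys, level_names, label_levels=None, renames=None, sep=" · "):
    """Strip index levels that are constant across `keys` and build an axis label from them.

    keys: list of tuples, one per row (or column); level_names: name of each tuple position.
    label_levels: optional subset of level_names allowed into the axis label.
    renames: optional display-name mapping applied to tick labels and the axis label.
    """
    renames = renames or {}
    positions = range(len(level_names))
    constant = [i for i in positions if len({key[i] for key in keys}) == 1]
    varying = [i for i in positions if i not in constant]
    ticks = [tuple(renames.get(key[i], key[i]) for i in varying) for key in keys]
    promoted = [i for i in constant if label_levels is None or level_names[i] in label_levels]
    axis_label = sep.join(str(renames.get(keys[0][i], keys[0][i])) for i in promoted)
    return ticks, axis_label
```

Rows that iterate over one signal, one aggregation and two frequencies keep only the frequencies as ticks. The signal and aggregation move into the axis label.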
- show_single_statistic_table(*args, **kwargs)
Return the single statistic table without rendering a heatmap.
Thin wrapper around single_statistic_table() that forces show_heatmap=False.
- Parameters:
stat (str) – type of statistic to be displayed (this can be any of the column names of summary_table).
type (str) – type of the statistic displayed. This can be based on the overall panel (“panel”, default), an average of annual panels (“mean_years”), an average of cross-sectional relations (“mean_cids”), the positive ratio across years (“pr_years”) or the positive ratio across sections (“pr_cids”).
rows (List[str]) – row indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“xcat”, “agg_sigs”], which indexes rows by signal and aggregation method, or by signal alone if only one aggregation is available.
columns (List[str]) – column indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“ret”, “freq”], which indexes columns by return and frequency, or by return alone if only one frequency is available.
title (str, optional) – plot title. Default is None, in which case the default title is used.
title_fontsize (int) – font size of title. Default is 16.
row_names (List[str]) – specifies the labels of rows in the heatmap. Default is None, in which case the indices of the generated DataFrame are used.
column_names (List[str]) – specifies the labels of columns in the heatmap. Default is None, in which case the columns of the generated DataFrame are used.
signal_name_dict (dict, optional) – dictionary mapping the signal names to the desired names in the heatmap. Default is None, in which case the signal names are used. Renamed values flow through to the auto axis label produced by the constant-level collapse described under ylabel.
return_name_dict (dict, optional) – dictionary mapping the return names to the desired names in the heatmap. Default is None, in which case the return names are used. Renamed values flow through to the auto axis label produced by the constant-level collapse described under xlabel.
xcat_labels (dict, optional) – Unified rename dictionary covering both signal and return xcats. Internally split by membership in self.sigs/self.rets and routed through signal_name_dict/return_name_dict; xcats not listed in the dict are kept verbatim. Mutually exclusive with the two legacy kwargs: pass either xcat_labels or signal_name_dict/return_name_dict, not both. Default is None (no rename).
freq_labels (dict, optional) – Mapping from frequency code ("M", "Q", …) to its display label. Frequencies not listed in the dict are kept verbatim. Default is None.
agg_sigs_labels (dict, optional) – Mapping from aggregation code ("last", "mean", …) to its display label. Aggregations not listed in the dict are kept verbatim. Default is None.
min_color (float, optional) – minimum value of the color scale. Default is None, in which case the minimum value of the table is used.
max_color (float, optional) – maximum value of the color scale. Default is None, in which case the maximum value of the table is used.
figsize (Tuple[float, float]) – Tuple (w, h) of width and height of graph. Default is (14, 8).
annotate (bool) – Default is True, where the values shown in the heatmap are annotated.
round (int) – number of decimals to round the primary statistic to in the heatmap annotations. Default is 3.
pval_stat (str, optional) – name of a p-value statistic, typically "kendall_pval", "pearson_pval" or "map_pval" (the Macrosynergy Panel test). When set, each heatmap cell shows the probability of significance, 1 - pval_stat, in brackets beneath the primary statistic. Default is None. When pval_stat="map_pval", the SignalReturnRelations instance must have been constructed with ms_panel_test=True.
round_pval (int) – number of decimals to round the bracketed probability of significance to in the heatmap annotations. Default is 3.
significance_threshold (float, optional) – probability-of-significance cutoff above which a cell’s annotation is rendered in black and bold. Compared directly against the bracketed value (1 - pval_stat), so 0.9 highlights cells whose probability of significance exceeds 0.9 (equivalently, raw p-value below 0.1). Only takes effect when pval_stat is set. Pass None to disable. Default is 0.9.
xlabel, ylabel, footnote, footnote_fontsize – forwarded to single_statistic_table(). They only affect the heatmap and are accepted here for API symmetry even though this wrapper renders no heatmap.
- Returns:
DataFrame with the specified statistic for each row and column.
- Return type:
DataFrame
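The type options aggregate per-segment statistics into one value. A minimal sketch follows, under these assumptions:

- “mean_years”/“mean_cids” average the yearly or cross-sectional values.
- “pr_years”/“pr_cids” count the share of segments above a threshold.
- The threshold argument is a hypothetical addition. A value of 0.0 suits correlations, whereas for accuracy-type statistics 0.5 may be the more natural reference.

```python
import math
from statistics import mean


def aggregate_segments(segment_stats, how, threshold=0.0):
    """Summarise per-segment statistics (e.g. one value per year or per cross-section).

    how="mean": average of the segment values, as assumed for "mean_years" / "mean_cids".
    how="pr": share of segments above `threshold`, as assumed for "pr_years" / "pr_cids".
    NaN segments are ignored.
    """
    values = [v for v in segment_stats.values() if not math.isnan(v)]
    if not values:
        return math.nan
    if how == "mean":
        return mean(values)
    if how == "pr":
        return sum(v > threshold for v in values) / len(values)
    raise ValueError(f"unknown aggregation: {how}")
```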
- plot_single_statistic_heatmap(*args, **kwargs)
Render the heatmap of the single statistic table.
Thin wrapper around single_statistic_table() that forces show_heatmap=True. The computed table itself is not returned.
- Parameters:
stat (str) – type of statistic to be displayed (this can be any of the column names of summary_table).
type (str) – type of the statistic displayed. This can be based on the overall panel (“panel”, default), an average of annual panels (“mean_years”), an average of cross-sectional relations (“mean_cids”), the positive ratio across years (“pr_years”) or the positive ratio across sections (“pr_cids”).
rows (List[str]) – row indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“xcat”, “agg_sigs”], which indexes rows by signal and aggregation method, or by signal alone if only one aggregation is available.
columns (List[str]) – column indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“ret”, “freq”], which indexes columns by return and frequency, or by return alone if only one frequency is available.
show_heatmap (bool) – should not be passed; this wrapper always forces show_heatmap=True and overrides any value supplied by the caller.
title (str, optional) – plot title. Default is None, in which case the default title is used.
title_fontsize (int) – font size of title. Default is 16.
row_names (List[str]) – specifies the labels of rows in the heatmap. Default is None, in which case the indices of the generated DataFrame are used.
column_names (List[str]) – specifies the labels of columns in the heatmap. Default is None, in which case the columns of the generated DataFrame are used.
signal_name_dict (dict, optional) – dictionary mapping the signal names to the desired names in the heatmap. Default is None, in which case the signal names are used. Renamed values flow through to the auto axis label produced by the constant-level collapse described under ylabel.
return_name_dict (dict, optional) – dictionary mapping the return names to the desired names in the heatmap. Default is None, in which case the return names are used. Renamed values flow through to the auto axis label produced by the constant-level collapse described under xlabel.
xcat_labels (dict, optional) – Unified rename dictionary covering both signal and return xcats. Internally split by membership in self.sigs/self.rets and routed through signal_name_dict/return_name_dict; xcats not listed in the dict are kept verbatim. Mutually exclusive with the two legacy kwargs: pass either xcat_labels or signal_name_dict/return_name_dict, not both. Default is None (no rename).
freq_labels (dict, optional) – Mapping from frequency code ("M", "Q", …) to its display label. Frequencies not listed in the dict are kept verbatim. Default is None.
agg_sigs_labels (dict, optional) – Mapping from aggregation code ("last", "mean", …) to its display label. Aggregations not listed in the dict are kept verbatim. Default is None.
min_color (float, optional) – minimum value of the color scale. Default is None, in which case the minimum value of the table is used.
max_color (float, optional) – maximum value of the color scale. Default is None, in which case the maximum value of the table is used.
figsize (Tuple[float, float]) – Tuple (w, h) of width and height of graph. Default is (14, 8).
annotate (bool) – Default is True, where the values shown in the heatmap are annotated.
round (int) – number of decimals to round the primary statistic to in the heatmap annotations. Default is 3.
pval_stat (str, optional) – name of a p-value statistic, typically "kendall_pval", "pearson_pval" or "map_pval" (the Macrosynergy Panel test). When set, each heatmap cell shows the probability of significance, 1 - pval_stat, in brackets beneath the primary statistic. Default is None. When pval_stat="map_pval", the SignalReturnRelations instance must have been constructed with ms_panel_test=True.
round_pval (int) – number of decimals to round the bracketed probability of significance to in the heatmap annotations. Default is 3.
significance_threshold (float, optional) – probability-of-significance cutoff above which a cell’s annotation is rendered in black and bold. Compared directly against the bracketed value (1 - pval_stat), so 0.9 highlights cells whose probability of significance exceeds 0.9 (equivalently, raw p-value below 0.1). Only takes effect when pval_stat is set. Pass None to disable. Default is 0.9.
xlabel (str, optional) – Label drawn beneath the heatmap columns. Default is None.
ylabel (str, optional) – Label drawn beside the heatmap rows. Default is None.
footnote (str, optional) – Free-text caption rendered below the heatmap. Useful for recording the significance test, panel scope, or annotation legend. Multi-line strings are supported. Default is None.
footnote_fontsize (int, optional) – Font size for the footnote text. Default is 10.
- Return type:
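The annotation and highlighting rule described under pval_stat, round, round_pval and significance_threshold can be sketched for a single cell. annotate_cell is hypothetical, and fixed-decimal formatting is an assumption about how the rounding is displayed.

```python
def annotate_cell(stat, pval, round_stat=3, round_pval=3, significance_threshold=0.9):
    """Return (annotation text, highlight flag) for one heatmap cell.

    The bracketed value is the probability of significance, 1 - pval. The cell is flagged
    for bold rendering when that probability exceeds significance_threshold (None disables).
    """
    prob = 1 - pval
    text = f"{stat:.{round_stat}f}\n({prob:.{round_pval}f})"
    highlight = significance_threshold is not None and prob > significance_threshold
    return text, highlight
```

A p-value of 0.04 yields a bracketed 0.960, which exceeds the default 0.9 cutoff and is highlighted. A p-value of 0.25 yields 0.750 and is not highlighted.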
- set_df_labels(rows_dict, rows, columns)
Creates two lists of strings that will be used as the row and column labels for the resulting dataframe.
- Parameters:
rows_dict (dict) – dictionary containing the values for each of the xcat, ret, freq and agg_sigs categories.
rows (List[str]) – list of strings specifying which of the categories are included in the rows of the dataframe.
columns (List[str]) – list of strings specifying which of the categories are included in the columns of the dataframe.
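One plausible way to build such labels is to take the Cartesian product of the values of the selected categories and join each combination into a string. The sketch below assumes a "/" separator and the hypothetical name make_labels. The actual label format used by set_df_labels may differ.

```python
from itertools import product


def make_labels(rows_dict, rows, columns, sep="/"):
    """Build row and column label strings from every combination of the chosen categories."""
    def labels_for(keys):
        # product() yields one tuple per combination, in the order the keys are listed.
        return [sep.join(combo) for combo in product(*(rows_dict[key] for key in keys))]

    return labels_for(rows), labels_for(columns)
```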