macrosynergy.signal.signal_return_relations#
Module for analysing and visualizing the relationship between signal and return series.
- class SignalReturnRelations(df, rets=None, sigs=None, cids=None, sig_neg=None, cosp=False, start=None, end=None, blacklist=None, freqs='M', agg_sigs='last', fwin=1, slip=0, ms_panel_test=False, additional_metrics=None)[source]#
Bases:
object
Class for analysing and visualizing the relationship between signals and returns across different frequencies and aggregation methods. It can be used to calculate and visualize the following metrics:
Accuracy
Balanced accuracy
Positive signal ratio
Positive return ratio
Positive precision
Negative precision
Pearson correlation
Pearson correlation p-value
Kendall correlation
Kendall correlation p-value
AUC
Macrosynergy Panel test
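The sign-based metrics in this list can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the function name, the dictionary keys and the treatment of zeros as negative are assumptions made for illustration only.

```python
def sign_metrics(signals, returns):
    """Sign-based hit statistics for paired signal and return observations."""
    # Reduce each observation to a boolean: True for a positive value.
    # Assumption: zeros count as negative; the library may treat them differently.
    sig_pos = [s > 0 for s in signals]
    ret_pos = [r > 0 for r in returns]
    pairs = list(zip(sig_pos, ret_pos))

    tp = sum(1 for s, r in pairs if s and r)          # positive signal, positive return
    tn = sum(1 for s, r in pairs if not s and not r)  # negative signal, negative return
    fp = sum(1 for s, r in pairs if s and not r)      # positive signal, negative return
    fn = sum(1 for s, r in pairs if not s and r)      # negative signal, positive return
    n = len(pairs)

    # No guard against empty classes: a sample without positive (or negative)
    # signals or returns would raise ZeroDivisionError in this sketch.
    return {
        "accuracy": (tp + tn) / n,
        "bal_accuracy": 0.5 * (tp / (tp + fn) + tn / (tn + fp)),
        "pos_sigr": (tp + fp) / n,   # positive signal ratio
        "pos_retr": (tp + fn) / n,   # positive return ratio
        "pos_prec": tp / (tp + fp),  # positive precision
        "neg_prec": tn / (tn + fn),  # negative precision
    }


signals = [0.5, -1.2, 0.3, 0.8, -0.4, 1.1]
returns = [0.2, -0.1, -0.3, 0.4, 0.6, 0.9]
metrics = sign_metrics(signals, returns)
```

Balanced accuracy averages the hit rates for positive and negative returns, so it is less flattering than plain accuracy when returns are mostly of one sign.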
- Parameters:
df (DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
rets (str, List[str]) – one or several target return categories.
sigs (str, List[str]) – one or several signal categories for which detailed relational statistics can be calculated.
sig_neg (bool, List[bool]) – if True, the signal is used in negative terms throughout the analysis. If more than one signal is tested, sig_neg must be an ordered list of the same length as the signals, containing a True for each signal that needs to be negative. Default is False.
cosp (bool) – If True the comparative statistics are calculated only for the “communal sample periods”, i.e. periods and cross-sections that have values for all compared signals. Default is False.
start (str) – earliest date in ISO format. Default is None in which case the earliest date available will be used.
end (str) – latest date in ISO format. Default is None in which case the latest date in the dataframe will be used.
blacklist (dict) – cross-sections with date ranges that should be excluded from the data frame. If one cross-section has several blacklist periods append numbers to the cross-section code.
freqs (str, List[str]) – letters denoting all frequencies at which the series may be sampled. This must be a selection of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. Default is only ‘M’. The return series will always be summed over the sample period. The signal series will be aggregated according to the values of agg_sigs.
agg_sigs (str, List[str]) – aggregation method applied to the signal values in down-sampling. The default is “last”. Alternatives are “mean”, “median” and “sum”. If a single aggregation type is chosen for multiple signal categories it is applied to all of them.
fwin (int) – forward window of return category in base periods. Default is 1. This conceptually corresponds to the holding period of a position in accordance with the signal.
slip (int) – Default is 0, implied slippage of feature availability for relationship with the target category. See macrosynergy.management.df_utils.apply_slip() for more information.
ms_panel_test (bool) – if True the Macrosynergy Panel test is calculated. Please note that this is a very time-consuming operation and should be used only if you require the result.
additional_metrics (List[Callable]) – list of additional metrics to be calculated and added to the output table.
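The down-sampling rule described for freqs and agg_sigs (returns summed over the period, signals aggregated by the chosen method) can be sketched with the standard library alone. The helper name, the tuple layout of the input and the monthly bucketing by ISO date prefix are assumptions for illustration; the class itself operates on the standardized DataFrame.

```python
from collections import defaultdict
from statistics import mean, median

# Aggregators mirroring the documented agg_sigs options.
AGGREGATORS = {
    "last": lambda values: values[-1],
    "mean": mean,
    "median": median,
    "sum": sum,
}


def downsample_monthly(daily, agg_sig="last"):
    """Collapse (iso_date, signal, return) rows to one row per calendar month."""
    buckets = defaultdict(lambda: {"sig": [], "ret": []})
    # ISO date strings sort chronologically, so "last" picks the latest value.
    for iso_date, sig, ret in sorted(daily):
        month = iso_date[:7]  # the "YYYY-MM" prefix identifies the month
        buckets[month]["sig"].append(sig)
        buckets[month]["ret"].append(ret)

    aggregate = AGGREGATORS[agg_sig]
    # Returns are always summed; only the signal aggregation is configurable.
    return {
        month: (aggregate(vals["sig"]), sum(vals["ret"]))
        for month, vals in buckets.items()
    }


daily = [
    ("2024-01-30", 1.0, 0.10),
    ("2024-01-31", 3.0, -0.05),
    ("2024-02-01", -2.0, 0.20),
    ("2024-02-02", 4.0, 0.05),
]
monthly_last = downsample_monthly(daily, agg_sig="last")
monthly_mean = downsample_monthly(daily, agg_sig="mean")
```

With “last” the January signal is the value observed on the final available day of the month, whereas “mean” averages all daily values in that month.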
- accuracy_bars(ret=None, sigs=None, freq=None, agg_sig=None, type='cross_section', title=None, title_fontsize=16, size=None, legend_pos='best', x_labels=None, x_labels_rotate=0)[source]#
Plot a bar chart of the overall and balanced accuracy metrics, with bars drawn per segment of the chosen type.
- Parameters:
ret (str, optional) – return category. Default is None, in which case the first return category will be used.
sigs (str, List[str], optional) – one or several signal categories. Default is None, in which case all signals will be used.
freq (str, optional) – frequency to be used in analysis. Default is None, in which case the first frequency will be used.
agg_sig (str, optional) – aggregation method to be used in analysis. Default is None, in which case the first aggregation method will be used.
type (str, optional) – type of segment over which bars are drawn. Either “cross_section” (default), “years” or “signals”.
title (str, optional) – chart header - default will be applied if none is chosen.
title_fontsize (int) – font size of chart header. Default is 16.
size (Tuple[float], optional) – 2-tuple of width and height of plot - default will be applied if none is chosen.
legend_pos (str) – position of legend box. Default is ‘best’. See the documentation of matplotlib.pyplot.legend.
x_labels (Dict[str]) – dictionary of x-axis labels. Default is None.
x_labels_rotate (int) – rotation of x-axis labels. Default is 0.
- correlation_bars(ret=None, sigs=None, freq=None, type='cross_section', title=None, title_fontsize=16, size=None, legend_pos='best', x_labels=None, x_labels_rotate=0)[source]#
Plot correlation coefficients and their significance, with bars drawn per segment of the chosen type.
- Parameters:
ret (str, optional) – return category. Default is the first return category.
sigs (str, List[str], optional) – one or several signal categories. Default is the first signal category.
type (str, optional) – type of segment over which bars are drawn. Either “cross_section” (default), “years” or “signals”.
title (str, optional) – chart header. Default is None, in which case the default title will be applied.
title_fontsize (int) – font size of chart header. Default is 16.
size (Tuple[float, float], optional) – 2-tuple of width and height of plot. If None, the default size will be applied.
legend_pos (str) – position of legend box. Default is ‘best’. See matplotlib.pyplot.legend.
x_labels (Dict[str]) – dictionary of x-axis labels. Default is None.
x_labels_rotate (int) – rotation of x-axis labels. Default is 0.
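For orientation, the two correlation coefficients that this chart displays can be computed by hand as in the sketch below. It is an illustration only, not the library's code path: the function names are hypothetical, p-values are omitted, and the Kendall variant shown is tau-a, which ignores tie corrections.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau_a(x, y):
    """Kendall rank correlation (tau-a): concordant minus discordant pair share."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both series move in the same direction
        elif product < 0:
            discordant += 1   # the series move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


signals = [1.0, 2.0, 3.0, 4.0]
returns = [1.0, 3.0, 2.0, 4.0]
pearson_r = pearson(signals, returns)
kendall_tau = kendall_tau_a(signals, returns)
```

Kendall's coefficient depends only on the ordering of observations, so it is less sensitive to outliers in returns than the Pearson coefficient.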
- static apply_slip(df, slip, cids, xcats, metrics)[source]#
Wrapper that calls the apply_slip() function defined in macrosynergy.management.df_utils.
- static is_list_of_strings(variable)[source]#
Function used to test whether a variable is a list of strings, so that a single string is not mistaken for a list of characters.
- Parameters:
variable (Any) – variable to be tested.
- Returns:
True if variable is a list of strings, False otherwise.
- Return type:
bool
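A check of this kind can be written in a few lines. The version below is a sketch of the described behaviour rather than the library's source, and it treats an empty list as a valid list of strings, which is an assumption.

```python
from typing import Any


def is_list_of_strings(variable: Any) -> bool:
    """Return True only for a list whose elements are all strings."""
    # A bare string is itself iterable, so test for a list first;
    # otherwise "EUR" would be inspected character by character.
    return isinstance(variable, list) and all(
        isinstance(item, str) for item in variable
    )


single = is_list_of_strings("EUR")           # a plain string, not a list
several = is_list_of_strings(["EUR", "USD"])  # a genuine list of strings
```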
- manipulate_df(xcats, freq, agg_sig)[source]#
Brings the DataFrame into the format required for the analysis. It first reduces the DataFrame to data outside the blacklist that is relevant to xcat and sig, then applies the slip. Finally, it converts the DataFrame to the desired format and checks whether any negative signs should be introduced.
- calculate_single_stat(stat, ret=None, sig=None, type=None)[source]#
Calculates a single statistic for a given signal-return relation.
- Parameters:
- Returns:
statistic value.
- Return type:
- summary_table(cross_section=False, years=False)[source]#
Generates a summary table for the signal-return relations.
- single_relation_table(ret=None, xcat=None, freq=None, agg_sigs=None, table_type=None)[source]#
Computes all the statistics for one specific signal-return relation:
- Parameters:
ret (str) – single target return category. Default is first in target return list of the class.
xcat (str) – single signal category to be considered. Default is first in feature category list of the class.
freq (str) – letter denoting single frequency at which the series will be sampled. This must be one of the frequencies selected for the class. If not specified uses the freq stored in the class.
agg_sigs (str) – aggregation method applied to the signal values in down-sampling.
table_type (str) – type of table to be returned. Either “summary”, “years”, “cross_section”.
- Returns:
table with the statistics for the single signal-return relation.
- Return type:
- multiple_relations_table(rets=None, xcats=None, freqs=None, agg_sigs=None, signal_name_dict=None, return_name_dict=None)[source]#
Calculates all the statistics for each specified return and signal category, at each specified frequency and aggregation method. If none are defined, this is done for all categories, frequencies and aggregation methods stored in the class.
- single_statistic_table(stat, type='panel', rows=['xcat', 'agg_sigs'], columns=['ret', 'freq'], show_heatmap=False, title=None, title_fontsize=16, row_names=None, column_names=None, signal_name_dict=None, return_name_dict=None, min_color=None, max_color=None, figsize=(14, 8), annotate=True, round=5)[source]#
Creates a table which shows the specified statistic for each row and column specified as arguments:
- Parameters:
stat (str) – type of statistic to be displayed (this can be any of the column names of summary_table).
type (str) – type of the statistic displayed. This can be based on the overall panel (“panel”, default), an average of annual panels (“mean_years”), an average of cross-sectional relations (“mean_cids”), the positive ratio across years (“pr_years”) or the positive ratio across sections (“pr_cids”).
rows (List[str]) – row indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“xcat”, “agg_sigs”], resulting in index strings that combine the feature category and the aggregation method, or the feature category alone if only one aggregation is available.
columns (List[str]) – column indices, which can be return categories, feature categories, frequencies and/or aggregations. The choice is made through a list of one or more of “xcat”, “ret”, “freq” and “agg_sigs”. The default is [“ret”, “freq”], resulting in index strings that combine the return category and the frequency, or the return category alone if only one frequency is available.
show_heatmap (bool) – if True, the table is visualized as a heatmap. Default is False.
title (str, optional) – plot title. Default is None in which case the default title is used.
title_fontsize (int) – font size of title. Default is 16.
row_names (List[str]) – specifies the labels of rows in the heatmap. Default is None, the indices of the generated DataFrame are used.
column_names (List[str]) – specifies the labels of columns in the heatmap. Default is None, the columns of the generated DataFrame are used.
signal_name_dict (dict, optional) – dictionary mapping the signal names to the desired names in the heatmap. Default is None, in which case the signal names are used.
return_name_dict (dict, optional) – dictionary mapping the return names to the desired names in the heatmap. Default is None, in which case the return names are used.
min_color (float, optional) – minimum value of the color scale. Default is None, in which case the minimum value of the table is used.
max_color (float, optional) – maximum value of the color scale. Default is None, in which case the maximum value of the table is used.
figsize (Tuple[float, float]) – Tuple (w, h) of width and height of graph. Default is (14, 8).
annotate (bool) – if True, the values are annotated in the heatmap. Default is True.
round (int) – number of decimals to round the values to on the heatmap’s annotations. Default is 5.
- Returns:
DataFrame with the specified statistic for each row and column.
- Return type:
DataFrame
- set_df_labels(rows_dict, rows, columns)[source]#
Creates two lists of strings that will be used as the row and column labels for the resulting dataframe.
- Parameters:
rows_dict (dict) – dictionary containing the values for each of the xcat, ret, freq and agg_sigs categories.
rows (List[str]) – list of strings specifying which of the categories are included in the rows of the dataframe.
columns (List[str]) – list of strings specifying which of the categories are included in the columns of the dataframe.
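The idea of deriving row and column labels from the selected categories can be sketched as follows. The helper name, the example category values and the “/” separator are hypothetical; the actual label format produced by the method is not specified here.

```python
from itertools import product


def build_labels(values_by_category, selected):
    """Join every combination of the selected categories' values into one label."""
    # product() enumerates all combinations in the order of `selected`.
    combos = product(*(values_by_category[name] for name in selected))
    return ["/".join(combo) for combo in combos]


# Hypothetical contents of a rows_dict-style mapping.
values_by_category = {
    "xcat": ["SIG_A", "SIG_B"],
    "ret": ["RET_X"],
    "freq": ["M", "Q"],
    "agg_sigs": ["last"],
}

row_labels = build_labels(values_by_category, ["xcat", "agg_sigs"])
column_labels = build_labels(values_by_category, ["ret", "freq"])
```

Each label corresponds to one combination of the chosen categories, so the resulting table has as many rows and columns as there are combinations on each axis.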