macrosynergy.learning.sequential.base_panel_learner#

Sequential learning over a panel.

class BasePanelLearner(df, xcats, cids=None, start=None, end=None, blacklist=None, freq='M', lag=1, xcat_aggs=['last', 'sum'], generate_labels=None, skip_checks=False)[source]#

Bases: ABC

run(name, models, outer_splitter, inner_splitters=None, hyperparameters=None, scorers=None, search_type='grid', normalize_fold_results=False, cv_summary='mean', include_train_folds=False, n_iter=100, split_functions=None, n_jobs_outer=-1, n_jobs_inner=1)[source]#

Run a learning process over a panel.

Parameters:
  • name (str) – Category name for the forecasted panel resulting from the learning process.

  • models (dict) – Dictionary of model names and compatible scikit-learn model objects.

  • outer_splitter (WalkForwardPanelSplit) – Outer splitter for the learning process.

  • inner_splitters (dict, optional) – Inner splitters for the learning process.

  • hyperparameters (dict, optional) – Dictionary of model names and hyperparameter grids.

  • scorers (dict, optional) – Dictionary of scikit-learn compatible scoring functions.

  • search_type (str) – Search type for hyperparameter optimization. Default is “grid”. Options are “grid”, “prior” and “bayes”. If no hyperparameter tuning is required, this parameter can be disregarded.

  • normalize_fold_results (bool) – Whether to normalize the scores across folds before combining them. Default is False. If no hyperparameter tuning is required, this parameter can be disregarded.

  • cv_summary (str or callable) – Summary function to use to combine scores across cross-validation folds. Default is “mean”. Options are “mean”, “median”, “mean-std”, “mean/std”, “mean-std-ge” or a callable function. If no hyperparameter tuning is required, this parameter can be disregarded.

  • include_train_folds (bool, optional) – Whether to calculate cross-validation statistics on the training folds in addition to the test folds. Default is False. If no hyperparameter tuning is required, this parameter can be disregarded.

  • n_iter (int) – Number of iterations for random or bayesian hyperparameter optimization. If no hyperparameter tuning is required, this parameter can be disregarded.

  • split_functions (dict, optional) – Dictionary of callables determining the number of cross-validation splits to add to the initial number, as a function of the number of iterations elapsed in the sequential learning process. If no hyperparameter tuning is required, this parameter can be disregarded.

  • n_jobs_outer (int, optional) – Number of jobs to run in parallel for the outer loop. Default is -1.

  • n_jobs_inner (int, optional) – Number of jobs to run in parallel for the inner loop. Default is 1. If no hyperparameter tuning is required, this parameter can be disregarded.

Returns:

List of dictionaries containing the results of the learning process.

Return type:

list
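As a rough illustration of the `cv_summary` options above (not the library's internal code), the sketch below combines a list of per-fold scores for a single hyperparameter choice. The fold scores are invented, and "mean-std-ge", which also involves training-fold scores, is omitted:

```python
import statistics

# Hypothetical: one score per cross-validation fold for a single
# hyperparameter choice.
fold_scores = [0.12, 0.08, 0.15, 0.05]

def summarise(scores, cv_summary="mean"):
    # Combine fold scores, mirroring the named cv_summary options
    # ("mean-std-ge" is omitted here).
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    if cv_summary == "mean":
        return mean
    if cv_summary == "median":
        return statistics.median(scores)
    if cv_summary == "mean-std":
        # Penalise instability across folds.
        return mean - std
    if cv_summary == "mean/std":
        # Reward a high score relative to its variability.
        return mean / std
    if callable(cv_summary):
        return cv_summary(scores)
    raise ValueError(f"Unknown cv_summary: {cv_summary}")

summarise(fold_scores)                       # mean of the four folds
summarise(fold_scores, cv_summary="median")
summarise(fold_scores, cv_summary=min)       # any callable is accepted
```

The risk-adjusted variants ("mean-std", "mean/std") favour hyperparameter choices that score consistently across folds over ones with a high but unstable average.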

store_split_data(pipeline_name, optimal_model, optimal_model_name, optimal_model_score, optimal_model_params, inner_splitters_adj, X_train, y_train, X_test, y_test, timestamp, adjusted_test_index)[source]#

Store predictive analytics for the training set (X_train, y_train).

Parameters:
  • pipeline_name (str) – Name of the sequential optimization pipeline.

  • optimal_model (RegressorMixin or ClassifierMixin or Pipeline) – Optimal model selected for the training set.

  • optimal_model_name (str) – Name of the optimal model.

  • optimal_model_score (float) – Score of the optimal model.

  • optimal_model_params (dict) – Hyperparameters of the optimal model.

  • inner_splitters_adj (dict) – Inner splitters for the learning process.

  • X_train (pd.DataFrame) – Input feature matrix for the training set.

  • y_train (pd.Series) – Target variable for the training set.

  • X_test (pd.DataFrame) – Input feature matrix for the test set.

  • y_test (pd.Series) – Target variable for the test set.

  • timestamp (pd.Timestamp) – Model retraining date.

  • adjusted_test_index (pd.MultiIndex) – Adjusted test index to account for lagged features.

Returns:

Dictionary containing predictive analytics.

Return type:

dict

get_optimal_models(name=None)[source]#

Returns the sequences of optimal models for one or more processes.

Parameters:

name (str or list, optional) – Label(s) of sequential optimization processes. Default is None, in which case all processes stored in the class instance are returned.

Returns:

Pandas dataframe of the optimal models and hyperparameters selected at each retraining date.

Return type:

pd.DataFrame

models_heatmap(name, title=None, cap=5, figsize=(12, 8), title_fontsize=None, tick_fontsize=None)[source]#

Visualize the optimal models used for signal calculation.

Parameters:
  • name (str) – Name of the sequential optimization pipeline.

  • title (str, optional) – Title of the heatmap. Default is None. This creates a figure title of the form “Model Selection Heatmap for {name}”.

  • cap (int, optional) – Maximum number of models to display. Default (and limit) is 5. The chosen models are the ‘cap’ most frequently occurring in the pipeline.

  • figsize (tuple, optional) – Tuple of floats or ints denoting the figure size. Default is (12, 8).

  • title_fontsize (int, optional) – Font size for the title. Default is None.

  • tick_fontsize (int, optional) – Font size for the ticks. Default is None.

Notes

This method displays the models selected at each date in time over the span of the sequential learning process. A binary heatmap is used to visualise the model selection process.
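As a rough illustration of the Notes above (not the library's internal code), the binary matrix underlying such a heatmap can be built from pairs of retraining dates and selected model names; the model names and dates below are invented:

```python
import pandas as pd

# Hypothetical model selected at each retraining date.
choices = pd.DataFrame({
    "real_date": pd.to_datetime(
        ["2020-01-31", "2020-02-28", "2020-03-31", "2020-04-30"]
    ),
    "model_type": ["ridge", "ridge", "lasso", "ridge"],
})

# Binary matrix: rows are models, columns are retraining dates, and a 1
# marks the model selected at that date -- the structure a heatmap draws.
selection = pd.crosstab(choices["model_type"], choices["real_date"])
print(selection)
```

Since exactly one model is selected per retraining date, each column of the matrix contains a single 1, and row sums give the selection frequency per model.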