macrosynergy.learning.sequential.base_panel_learner#

Sequential learning over a panel.

class BasePanelLearner(df, xcats, cids=None, start=None, end=None, blacklist=None, freq='M', lag=1, xcat_aggs=['last', 'sum'], generate_labels=None, skip_checks=False)[source]#

Bases: ABC

run(name, models, outer_splitter, inner_splitters=None, hyperparameters=None, scorers=None, search_type='grid', normalize_fold_results=False, cv_summary='mean', include_train_folds=False, n_iter=100, split_functions=None, n_jobs_outer=-1, n_jobs_inner=1)[source]#

Run a learning process over a panel.

Parameters:
  • name (str) – Category name for the forecasted panel resulting from the learning process.

  • models (dict) – Dictionary of model names and compatible scikit-learn model objects.

  • outer_splitter (WalkForwardPanelSplit) – Outer splitter for the learning process.

  • inner_splitters (dict, optional) – Inner splitters for the learning process.

  • hyperparameters (dict, optional) – Dictionary of model names and hyperparameter grids.

  • scorers (dict, optional) – Dictionary of scikit-learn compatible scoring functions.

  • search_type (str) – Search type for hyperparameter optimization. Default is “grid”. Options are “grid”, “prior” and “bayes”. If no hyperparameter tuning is required, this parameter can be disregarded.

  • normalize_fold_results (bool) – Whether to normalize the scores across folds before combining them. Default is False. If no hyperparameter tuning is required, this parameter can be disregarded.

  • cv_summary (str or callable) – Summary function to use to combine scores across cross-validation folds. Default is “mean”. Options are “mean”, “median”, “mean-std”, “mean/std”, “mean-std-ge” or a callable function. If no hyperparameter tuning is required, this parameter can be disregarded.

  • include_train_folds (bool, optional) – Whether to calculate cross-validation statistics on the training folds in addition to the test folds. Default is False. If no hyperparameter tuning is required, this parameter can be disregarded.

  • n_iter (int) – Number of iterations for random or bayesian hyperparameter optimization. If no hyperparameter tuning is required, this parameter can be disregarded.

  • split_functions (dict, optional) – Dictionary of callables determining the number of cross-validation splits to add to the initial number, as a function of the number of iterations elapsed in the sequential learning process. If no hyperparameter tuning is required, this parameter can be disregarded.

  • n_jobs_outer (int, optional) – Number of jobs to run in parallel for the outer loop. Default is -1.

  • n_jobs_inner (int, optional) – Number of jobs to run in parallel for the inner loop. Default is 1. If no hyperparameter tuning is required, this parameter can be disregarded.

Returns:

List of dictionaries containing the results of the learning process.

Return type:

list
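As a rough illustration of the `cv_summary` options above (not the library's internal code), the sketch below combines a list of per-fold scores for a single hyperparameter choice. The fold scores are invented, and "mean-std-ge", which also involves training-fold scores, is omitted:

```python
import statistics

# Hypothetical: one score per cross-validation fold for a single
# hyperparameter choice.
fold_scores = [0.12, 0.08, 0.15, 0.05]

def summarise(scores, cv_summary="mean"):
    # Combine fold scores, mirroring the named cv_summary options
    # ("mean-std-ge" is omitted here).
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    if cv_summary == "mean":
        return mean
    if cv_summary == "median":
        return statistics.median(scores)
    if cv_summary == "mean-std":
        # Penalise instability across folds.
        return mean - std
    if cv_summary == "mean/std":
        # Reward a high score relative to its variability.
        return mean / std
    if callable(cv_summary):
        return cv_summary(scores)
    raise ValueError(f"Unknown cv_summary: {cv_summary}")

summarise(fold_scores)                       # mean of the four folds
summarise(fold_scores, cv_summary="median")
summarise(fold_scores, cv_summary=min)       # any callable is accepted
```

The risk-adjusted variants ("mean-std", "mean/std") favour hyperparameter choices that score consistently across folds over ones with a high but unstable average.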

store_split_data(pipeline_name, optimal_model, optimal_model_name, optimal_model_score, optimal_model_params, inner_splitters_adj, X_train, y_train, X_test, y_test, timestamp, adjusted_test_index)[source]#

Store predictive analytics for the training set (X_train, y_train).

Parameters:
  • pipeline_name (str) – Name of the sequential optimization pipeline.

  • optimal_model (RegressorMixin or ClassifierMixin or Pipeline) – Optimal model selected for the training set.

  • optimal_model_name (str) – Name of the optimal model.

  • optimal_model_score (float) – Score of the optimal model.

  • optimal_model_params (dict) – Hyperparameters of the optimal model.

  • inner_splitters_adj (dict) – Inner splitters for the learning process.

  • X_train (pd.DataFrame) – Input feature matrix for the training set.

  • y_train (pd.Series) – Target variable for the training set.

  • X_test (pd.DataFrame) – Input feature matrix for the test set.

  • y_test (pd.Series) – Target variable for the test set.

  • timestamp (pd.Timestamp) – Model retraining date.

  • adjusted_test_index (pd.MultiIndex) – Adjusted test index to account for lagged features.

Returns:

Dictionary containing predictive analytics.

Return type:

dict

get_optimal_models(name=None)[source]#

Returns the sequences of optimal models for one or more processes.

Parameters:

name (str or list, optional) – Label(s) of sequential optimization processes. Default is None, in which case all processes stored in the class instance are returned.

Returns:

Pandas dataframe of the optimal models and hyperparameters selected at each retraining date.

Return type:

pd.DataFrame

models_heatmap(name, title=None, cap=5, figsize=(12, 8), title_fontsize=None, tick_fontsize=None)[source]#

Visualize the optimal models used for signal calculation.

Parameters:
  • name (str) – Name of the sequential optimization pipeline.

  • title (str, optional) – Title of the heatmap. Default is None. This creates a figure title of the form “Model Selection Heatmap for {name}”.

  • cap (int, optional) – Maximum number of models to display. Default (and limit) is 5. The chosen models are the ‘cap’ most frequently occurring in the pipeline.

  • figsize (tuple, optional) – Tuple of floats or ints denoting the figure size. Default is (12, 8).

  • title_fontsize (int, optional) – Font size for the title. Default is None.

  • tick_fontsize (int, optional) – Font size for the ticks. Default is None.

Notes

This method displays the models selected at each date in time over the span of the sequential learning process. A binary heatmap is used to visualise the model selection process.
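As a rough illustration of the Notes above (not the library's internal code), the binary matrix underlying such a heatmap can be built from pairs of retraining dates and selected model names; the model names and dates below are invented:

```python
import pandas as pd

# Hypothetical model selected at each retraining date.
choices = pd.DataFrame({
    "real_date": pd.to_datetime(
        ["2020-01-31", "2020-02-28", "2020-03-31", "2020-04-30"]
    ),
    "model_type": ["ridge", "ridge", "lasso", "ridge"],
})

# Binary matrix: rows are models, columns are retraining dates, and a 1
# marks the model selected at that date -- the structure a heatmap draws.
selection = pd.crosstab(choices["model_type"], choices["real_date"])
print(selection)
```

Since exactly one model is selected per retraining date, each column of the matrix contains a single 1, and row sums give the selection frequency per model.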