macrosynergy.learning.sequential.base_panel_learner
Sequential learning over a panel.
- class BasePanelLearner(df, xcats, cids=None, start=None, end=None, blacklist=None, freq='M', lag=1, xcat_aggs=['last', 'sum'], generate_labels=None, skip_checks=False)
Bases: ABC
- run(name, models, outer_splitter, inner_splitters=None, hyperparameters=None, scorers=None, search_type='grid', normalize_fold_results=False, cv_summary='mean', include_train_folds=False, n_iter=100, split_functions=None, n_jobs_outer=-1, n_jobs_inner=1)
Run a learning process over a panel.
- Parameters:
name (str) – Category name for the forecasted panel resulting from the learning process.
models (dict) – Dictionary of model names and compatible scikit-learn model objects.
outer_splitter (WalkForwardPanelSplit) – Outer splitter for the learning process.
inner_splitters (dict, optional) – Inner splitters for the learning process.
hyperparameters (dict, optional) – Dictionary of model names and hyperparameter grids.
scorers (dict, optional) – Dictionary of scikit-learn compatible scoring functions.
search_type (str) – Search type for hyperparameter optimization. Default is “grid”. Options are “grid”, “prior” and “bayes”. If no hyperparameter tuning is required, this parameter can be disregarded.
normalize_fold_results (bool) – Whether to normalize the scores across folds before combining them. Default is False. If no hyperparameter tuning is required, this parameter can be disregarded.
cv_summary (str or callable) – Summary function to use to combine scores across cross-validation folds. Default is “mean”. Options are “mean”, “median”, “mean-std”, “mean/std”, “mean-std-ge” or a callable function. If no hyperparameter tuning is required, this parameter can be disregarded.
include_train_folds (bool, optional) – Whether to calculate cross-validation statistics on the training folds in addition to the test folds. If no hyperparameter tuning is required, this parameter can be disregarded.
n_iter (int) – Number of iterations for random or Bayesian hyperparameter optimization. If no hyperparameter tuning is required, this parameter can be disregarded.
split_functions (dict, optional) – Dictionary of callables for determining the number of cross-validation splits to add to the initial number, as a function of the number of iterations passed in the sequential learning process. If no hyperparameter tuning is required, this parameter can be disregarded.
n_jobs_outer (int, optional) – Number of jobs to run in parallel for the outer loop. Default is -1.
n_jobs_inner (int, optional) – Number of jobs to run in parallel for the inner loop. Default is 1. If no hyperparameter tuning is required, this parameter can be disregarded.
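The string options for cv_summary can be read as simple reductions over the per-fold scores. The sketch below is only an illustration of that reading and not the library's implementation: summarise_folds is a hypothetical helper, the use of the population standard deviation is an assumption, and "mean-std-ge" is left out because this page does not define it.

```python
import statistics
from typing import Callable, Sequence, Union

Summary = Union[str, Callable[[Sequence[float]], float]]


def summarise_folds(scores: Sequence[float], cv_summary: Summary = "mean") -> float:
    """Collapse per-fold scores into one number (hypothetical helper)."""
    if callable(cv_summary):
        # A user-supplied callable receives the raw fold scores.
        return float(cv_summary(scores))
    if cv_summary == "mean":
        return statistics.fmean(scores)
    if cv_summary == "median":
        return float(statistics.median(scores))
    # Assumption: population standard deviation; the library may differ.
    std = statistics.pstdev(scores)
    if cv_summary == "mean-std":
        # Penalise candidates whose scores vary a lot across folds.
        return statistics.fmean(scores) - std
    if cv_summary == "mean/std":
        # Ratio of average score to dispersion; fails if all scores are equal.
        return statistics.fmean(scores) / std
    raise ValueError(f"Unsupported cv_summary in this sketch: {cv_summary!r}")


fold_scores = [1.0, 2.0, 3.0]
mean_score = summarise_folds(fold_scores, "mean")
penalised_score = summarise_folds(fold_scores, "mean-std")
worst_fold = summarise_folds(fold_scores, min)
```

Passing a callable such as min selects the candidate with the best worst-case fold, which is one way to favour stability over average performance.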
- Returns:
List of dictionaries containing the results of the learning process.
- Return type:
list
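As a rough illustration of split_functions, each callable maps the number of iterations passed so far to the number of cross-validation splits to add to a splitter's initial count. The sketch below uses plain Python only. The splitter name "expanding", the initial count of 5, the one-per-twelve-iterations rule, and the assumption that the dictionary is keyed by the same names as inner_splitters are all hypothetical choices made for the example.

```python
from typing import Callable, Dict


def one_extra_split_per_twelve(iterations: int) -> int:
    """Hypothetical rule: add one split for every 12 iterations passed."""
    return iterations // 12


# Assumption: keys mirror the names used in `inner_splitters`.
split_functions: Dict[str, Callable[[int], int]] = {
    "expanding": one_extra_split_per_twelve,
}

# Hypothetical initial number of splits for each inner splitter.
initial_splits: Dict[str, int] = {"expanding": 5}


def splits_at(iteration: int, splitter_name: str) -> int:
    """Initial split count plus the increment returned by the split function."""
    return initial_splits[splitter_name] + split_functions[splitter_name](iteration)


splits_at_start = splits_at(0, "expanding")
splits_after_30 = splits_at(30, "expanding")
```

A rule of this kind lets the number of folds grow as the training panel lengthens, so early retraining dates are not split more finely than their sample size supports.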
- store_split_data(pipeline_name, optimal_model, optimal_model_name, optimal_model_score, optimal_model_params, inner_splitters_adj, X_train, y_train, X_test, y_test, timestamp, adjusted_test_index)
Store predictive analytics for training set (X_train, y_train).
- Parameters:
pipeline_name (str) – Name of the sequential optimization pipeline.
optimal_model (RegressorMixin or ClassifierMixin or Pipeline) – Optimal model selected for the training set.
optimal_model_name (str) – Name of the optimal model.
optimal_model_score (float) – Score of the optimal model.
optimal_model_params (dict) – Hyperparameters of the optimal model.
inner_splitters_adj (dict) – Inner splitters for the learning process.
X_train (pd.DataFrame) – Input feature matrix for the training set.
y_train (pd.Series) – Target variable for the training set.
X_test (pd.DataFrame) – Input feature matrix for the test set.
y_test (pd.Series) – Target variable for the test set.
timestamp (pd.Timestamp) – Model retraining date.
adjusted_test_index (pd.MultiIndex) – Adjusted test index to account for lagged features.
- Returns:
Dictionary containing predictive analytics.
- Return type:
dict
- get_optimal_models(name=None)
Returns the sequences of optimal models for one or more processes.
- models_heatmap(name, title=None, cap=5, figsize=(12, 8), title_fontsize=None, tick_fontsize=None)
Visualize optimal models used for signal calculation.
- Parameters:
name (str) – Name of the sequential optimization pipeline.
title (str, optional) – Title of the heatmap. Default is None. This creates a figure title of the form “Model Selection Heatmap for {name}”.
cap (int, optional) – Maximum number of models to display. Default (and limit) is 5. The chosen models are the ‘cap’ most frequently occurring in the pipeline.
figsize (tuple, optional) – Tuple of floats or ints denoting the figure size. Default is (12, 8).
title_fontsize (int, optional) – Font size for the title. Default is None.
tick_fontsize (int, optional) – Font size for the ticks. Default is None.
Notes
This method displays the models selected at each date in time over the span of the sequential learning process. A binary heatmap is used to visualise the model selection process.
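The data behind such a binary heatmap can be sketched in plain Python: one row per model, one column per retraining date, and a 1 where that model was selected. This is a minimal illustration under stated assumptions, not the method's actual code. The helper selection_matrix, the example model names, and the dates are hypothetical, and exactly one model is assumed to be selected per date.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def selection_matrix(
    selections: Sequence[Tuple[str, str]], cap: int = 5
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a binary model-by-date matrix from (date, model_name) pairs."""
    # Keep only the `cap` most frequently selected models.
    counts = Counter(model for _, model in selections)
    top_models = [model for model, _ in counts.most_common(cap)]
    # ISO-formatted date strings sort chronologically.
    dates = sorted({date for date, _ in selections})
    # Assumption: one selected model per date.
    chosen = dict(selections)
    matrix = {
        model: [1 if chosen[date] == model else 0 for date in dates]
        for model in top_models
    }
    return dates, matrix


# Hypothetical selection history.
history = [
    ("2020-01-31", "ridge"),
    ("2020-02-28", "ridge"),
    ("2020-03-31", "lasso"),
    ("2020-04-30", "ridge"),
    ("2020-05-29", "lasso"),
    ("2020-06-30", "knn"),
]
dates, matrix = selection_matrix(history, cap=2)
```

With cap=2 the least frequent model is dropped, so the column for its date contains only zeros. The same effect would appear in a capped heatmap as a date with no highlighted cell.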