macrosynergy.learning.forecasting.model_systems

class BaseRegressionSystem(roll='full', min_xs_samples=2, data_freq=None)

Bases: BaseEstimator, RegressorMixin, ABC

fit(X, y)

Fit a regression on each cross-section of a panel, subject to availability.

Parameters:
  • X (pd.DataFrame) – Input feature matrix.

  • y (pd.Series, pd.DataFrame or np.ndarray) – Target variable.

Returns:

self – Fitted regression system object.

Return type:

BaseRegressionSystem

predict(X)

Make predictions over a panel dataset based on the trained cross-section-specific models.

Parameters:

X (pd.DataFrame) – Input feature matrix.

Returns:

predictions – Pandas series of predictions, multi-indexed by cross-section and date.

Return type:

pd.Series

roll_dates(roll, X_section, y_section, unique_dates)

Adjust dataset to be contained within a rolling window.

Parameters:
  • roll (int) – The lookback of the rolling window.

  • X_section (pd.DataFrame) – Input feature matrix for the cross-section.

  • y_section (pd.Series) – Target variable for the cross-section.

  • unique_dates (list) – List of unique dates in the cross-section.

Returns:

  • X_section (pd.DataFrame) – Input feature matrix for the cross-section, adjusted for the rolling window.

  • y_section (pd.Series) – Target variable for the cross-section, adjusted for the rolling window.
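
A rolling-window restriction of this kind might look like the sketch below. It assumes that the window keeps the observations falling on the last roll unique dates; the function roll_window and the tuple-based rows are hypothetical stand-ins for the pandas objects the method actually receives, and the "full" shortcut is included only for convenience.

```python
def roll_window(roll, rows, unique_dates):
    """Keep only the rows dated within the last `roll` unique dates.

    `rows` is a list of (date, x, y) tuples for one cross-section and
    `unique_dates` lists the distinct dates that appear in it.
    """
    if roll == "full" or roll >= len(unique_dates):
        return rows  # the window already covers the whole history
    cutoff = sorted(unique_dates)[-roll]  # earliest date still inside the window
    return [row for row in rows if row[0] >= cutoff]


rows = [
    ("2024-01-01", 0.1, 0.3),
    ("2024-01-02", 0.2, 0.1),
    ("2024-01-03", -0.4, -0.2),
    ("2024-01-04", 0.3, 0.5),
]
dates = [row[0] for row in rows]
recent = roll_window(2, rows, dates)
```

ISO-formatted date strings are used so that plain string comparison matches chronological order.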

abstract store_model_info(section, model)

Store necessary model information for explainability.

Parameters:
  • section (str) – The identifier of the cross-section.

  • model (RegressorMixin) – The fitted regression model.

Notes

Must be overridden.

abstract create_model()

Instantiate a regression model for a given cross-section.

Notes

Must be overridden.
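
The two abstract hooks suggest a template-method design: the base class drives the loop over cross-sections and the subclass decides which model to create and what to keep from it. The sketch below mirrors that idea with toy classes. ToySystem, MeanModel and MeanSystem are hypothetical names, and the way fit calls the hooks is an assumption rather than the library's actual control flow.

```python
from abc import ABC, abstractmethod
from statistics import fmean


class ToySystem(ABC):
    """Toy base class: loops over cross-sections and delegates to two hooks."""

    def fit(self, panel):
        for section, ys in panel.items():
            model = self.create_model()            # hook 1: build a fresh model
            model.fit(ys)
            self.store_model_info(section, model)  # hook 2: keep what matters
        return self

    @abstractmethod
    def create_model(self):
        """Return an unfitted model for one cross-section."""

    @abstractmethod
    def store_model_info(self, section, model):
        """Record information from the fitted model."""


class MeanModel:
    """Trivial model that memorises the mean of its training targets."""

    def fit(self, ys):
        self.mean_ = fmean(ys)
        return self


class MeanSystem(ToySystem):
    def __init__(self):
        self.means_ = {}

    def create_model(self):
        return MeanModel()

    def store_model_info(self, section, model):
        self.means_[section] = model.mean_


system = MeanSystem().fit({"AUD": [1.0, 2.0, 3.0], "CAD": [4.0, 6.0]})
```

Instantiating ToySystem directly raises a TypeError, which is the practical meaning of "must be overridden".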

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> BaseRegressionSystem

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class CorrelationVolatilitySystem(correlation_lookback='full', correlation_type='pearson', volatility_lookback='full', volatility_window_type='rolling', min_xs_samples=2, data_freq=None)

Bases: BaseRegressionSystem

Cross-sectional system of moving average models to estimate correlation and volatility components of a macro beta separately over a panel of financial contracts.

Parameters:
  • correlation_lookback (int or str, default="full") – The lookback of the rolling window for correlation estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer. If data_freq is None or ‘unadjusted’, it is expressed in units of the native dataset frequency; otherwise, it is expressed in units of the frequency specified in data_freq.

  • correlation_type (str, default='pearson') – The type of correlation to be calculated. Accepted values are ‘pearson’, ‘kendall’ and ‘spearman’.

  • volatility_lookback (int or str, default="full") – The lookback of the rolling window for volatility estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer. If data_freq is None or ‘unadjusted’, it is expressed in units of the native dataset frequency; otherwise, it is expressed in units of the frequency specified in data_freq.

  • volatility_window_type (str, default='rolling') – The type of window to use for the volatility calculation. Accepted values are ‘rolling’ and ‘exponential’.

  • min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.

  • data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, where it allows the underlying dataset frequency to be chosen by cross-validation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the one requested.

Notes

This class is specifically designed for market beta estimation based on the decomposition of the beta into correlation and volatility components in univariate analysis.

Separate estimators are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.

predict(X)

Make naive zero predictions over a panel dataset.

Parameters:

X (pd.DataFrame) – Input feature matrix.

Returns:

predictions – Pandas series of zero predictions, multi-indexed by cross-section and date.

Return type:

pd.Series

Notes

This method outputs zero predictions for all cross-sections and dates, since the CorrelationVolatilitySystem is solely used for beta estimation and no forecasting is performed.

store_model_info(section, beta)

Store the betas induced by the correlation and volatility estimators.

Parameters:
  • section (str) – The cross-section identifier.

  • beta (numbers.Number) – The beta estimate for the associated cross-section.

create_model()

Not used by the CorrelationVolatilitySystem class; it is implemented only because the base class declares it abstract.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> CorrelationVolatilitySystem

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class LADRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)

Bases: BaseRegressionSystem

Cross-sectional system of LAD regression models.

Parameters:
  • fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.

  • positive (bool, default=False) – Whether to enforce positive coefficients for each regression.

  • roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.

  • min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.

  • data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, where it allows the underlying dataset frequency to be chosen by cross-validation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the one requested.

Notes

Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.

This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.
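
To show why a least absolute deviations (LAD) fit can be attractive for beta estimation, the sketch below contrasts LAD and least-squares slopes in the simplest possible case: one feature and no intercept. In that case the LAD slope is a weighted median of the ratios y/x with weights |x|. The helper names are hypothetical and this closed form is a textbook shortcut, not a description of how LADRegressor is implemented.

```python
def lad_slope(xs, ys):
    """Slope b minimising sum(|y - b*x|) for a one-feature, no-intercept fit.

    Because |y - b*x| = |x| * |y/x - b|, the minimiser is a weighted median
    of the ratios y/x with weights |x|.
    """
    pairs = sorted((y / x, abs(x)) for x, y in zip(xs, ys) if x != 0)
    half = sum(weight for _, weight in pairs) / 2
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        if running >= half:
            return ratio
    raise ValueError("need at least one observation with non-zero x")


def ols_slope(xs, ys):
    """Least-squares slope for the same one-feature, no-intercept fit."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, -50.0]  # the last point is a gross outlier

lad = lad_slope(xs, ys)
ols = ols_slope(xs, ys)
```

On this data the LAD slope stays at the value implied by the four clean points, while the single outlier drags the least-squares slope below zero.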

create_model()

Instantiate a LAD regression model.

Returns:

A LAD regression model with the specified hyperparameters.

Return type:

LADRegressor

store_model_info(section, model)

Store the coefficients and intercepts of a fitted LAD regression model.

Parameters:
  • section (str) – The cross-section identifier.

  • model (LADRegressor) – The fitted LAD regression model.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LADRegressionSystem

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class LinearRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)

Bases: BaseRegressionSystem

Cross-sectional system of linear regression models for panel data.

Parameters:
  • fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.

  • positive (bool, default=False) – Whether to enforce positive coefficients for each regression.

  • roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.

  • min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.

  • data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, where it allows the underlying dataset frequency to be chosen by cross-validation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the one requested.

Notes

Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.

This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.

create_model()

Instantiate a linear regression model.

Returns:

A linear regression model with the specified hyperparameters.

Return type:

LinearRegression

store_model_info(section, model)

Store the coefficients and intercepts of a fitted linear regression model.

Parameters:
  • section (str) – The cross-section identifier.

  • model (LinearRegression) – The fitted linear regression model.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LinearRegressionSystem

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class RidgeRegressionSystem(fit_intercept=True, positive=False, alpha=1.0, tol=0.0001, solver='lsqr', roll='full', min_xs_samples=2, data_freq=None)

Bases: BaseRegressionSystem

Cross-sectional system of ridge regression models for panel data.

Parameters:
  • fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.

  • positive (bool, default=False) – Whether to enforce positive coefficients for each regression.

  • alpha (float, default=1.0) – L2 regularization hyperparameter. Greater values specify stronger regularization.

  • roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.

  • tol (float, default=1e-4) – The tolerance for termination.

  • solver (str, default='lsqr') – Solver to use in the computational routines. Options are ‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’ and ‘lbfgs’.

  • min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.

  • data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, where it allows the underlying dataset frequency to be chosen by cross-validation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the one requested.

Notes

Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.

This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest in quant analysis.
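
The effect of the alpha hyperparameter can be seen in the simplest ridge problem, with one feature and no intercept, where minimising sum((y - b*x)^2) + alpha * b^2 gives b = sum(x*y) / (sum(x*x) + alpha). The sketch below evaluates that closed form for a few alpha values. The function name and data are illustrative, and the closed form is used only for exposition; the class itself exposes iterative solvers through the solver parameter.

```python
def ridge_slope(xs, ys, alpha=1.0):
    """One-feature, no-intercept ridge slope: sum(x*y) / (sum(x*x) + alpha)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

# Slopes for increasingly strong L2 regularization.
slopes = {alpha: ridge_slope(xs, ys, alpha) for alpha in (0.0, 1.0, 14.0)}
```

With alpha equal to zero the least-squares slope is recovered, and larger alpha values shrink the slope towards zero, which is the sense in which greater values specify stronger regularization.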

create_model()

Instantiate a ridge regression model.

Returns:

A ridge regression model with the specified hyperparameters.

Return type:

Ridge

store_model_info(section, model)

Store the coefficients and intercepts of a fitted ridge regression model.

Parameters:
  • section (str) – The cross-section identifier.

  • model (Ridge) – The fitted ridge regression model.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> RidgeRegressionSystem

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

Submodules