macrosynergy.learning.forecasting.model_systems
- class BaseRegressionSystem(roll='full', min_xs_samples=2, data_freq=None)
Bases: BaseEstimator, RegressorMixin, ABC
- fit(X, y)
Fit a regression on each cross-section of a panel, subject to availability.
- Parameters:
X (pd.DataFrame) – Input feature matrix.
y (pd.Series, pd.DataFrame or np.ndarray) – Target variable.
- Returns:
self – Fitted regression system object.
- Return type:
BaseRegressionSystem
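The per-cross-section fitting described for fit can be sketched as follows. This is a minimal illustration on a toy panel, not the library's implementation: the helper fit_per_section, the index level names and the use of plain least squares are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy panel: two cross-sections with different true relationships.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
index = pd.MultiIndex.from_product(
    [["AUD", "USD"], dates], names=["cid", "real_date"]
)
x = np.tile(np.arange(6, dtype=float), 2)
X = pd.DataFrame({"x": x}, index=index)
y = pd.Series(
    np.concatenate([2.0 * x[:6], -1.0 * x[6:] + 3.0]), index=index, name="y"
)


def fit_per_section(X, y, min_xs_samples=2):
    """Hypothetical helper: fit one OLS line per cross-section."""
    params = {}
    for section, X_sec in X.groupby(level=0):
        if len(X_sec) < min_xs_samples:
            continue  # too little history: no model for this cross-section
        y_sec = y.reindex(X_sec.index)
        design = np.column_stack([np.ones(len(X_sec)), X_sec.to_numpy()])
        coefs, *_ = np.linalg.lstsq(design, y_sec.to_numpy(), rcond=None)
        params[section] = {"intercept": coefs[0], "coef": coefs[1:]}
    return params


params = fit_per_section(X, y)
```

Each cross-section ends up with its own intercept and coefficients, while a cross-section with fewer than min_xs_samples observations is simply skipped.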
- predict(X)
Make predictions over a panel dataset based on trained observation-specific models.
- Parameters:
X (pd.DataFrame) – Input feature matrix.
- Returns:
predictions – Pandas series of predictions, multi-indexed by cross-section and date.
- Return type:
pd.Series
- roll_dates(roll, X_section, y_section, unique_dates)
Adjust dataset to be contained within a rolling window.
- Parameters:
- Returns:
X_section (pd.DataFrame) – Input feature matrix for the cross-section, adjusted for the rolling window.
y_section (pd.Series) – Target variable for the cross-section, adjusted for the rolling window.
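A rolling-window adjustment of this kind can be sketched as below. The function trim_to_window is a hypothetical stand-in with the same argument order as roll_dates; it assumes the cross-section is indexed by date only and that the window keeps the last roll unique dates, which may differ from what the library does internally.

```python
import pandas as pd


def trim_to_window(roll, X_section, y_section, unique_dates):
    """Hypothetical helper: keep observations on the last `roll` unique dates."""
    if roll == "full" or roll >= len(unique_dates):
        return X_section, y_section  # nothing to trim
    cutoff = sorted(unique_dates)[-roll]  # earliest date inside the window
    mask = X_section.index >= cutoff
    return X_section[mask], y_section[mask]


dates = pd.date_range("2024-01-01", periods=10, freq="D")
X_section = pd.DataFrame({"x": range(10)}, index=dates)
y_section = pd.Series(range(10), index=dates, dtype=float)

X_roll, y_roll = trim_to_window(3, X_section, y_section, dates)
X_full, y_full = trim_to_window("full", X_section, y_section, dates)
```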
- abstract store_model_info(section, model)
Store necessary model information for explainability.
- Parameters:
section (str) – The identifier of the cross-section.
model (RegressorMixin) – The fitted regression model.
Notes
Must be overridden.
- abstract create_model()
Instantiate a regression model for a given cross-section.
Notes
Must be overridden.
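The division of labour between the two abstract methods can be illustrated with a toy hierarchy. Everything here (ToySystem, MeanModel, MeanSystem, fit_sections) is hypothetical and uses only the standard library; it mirrors the pattern of overriding create_model and store_model_info rather than the actual base class.

```python
from abc import ABC, abstractmethod


class ToySystem(ABC):
    """Hypothetical stand-in for a system base class; not the library's code."""

    def fit_sections(self, data):
        # `data` maps a cross-section identifier to a list of target values.
        for section, values in data.items():
            model = self.create_model()
            model.fit(values)
            self.store_model_info(section, model)
        return self

    @abstractmethod
    def create_model(self):
        """Return a fresh, unfitted model for one cross-section."""

    @abstractmethod
    def store_model_info(self, section, model):
        """Record whatever is needed from the fitted model."""


class MeanModel:
    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self


class MeanSystem(ToySystem):
    def __init__(self):
        self.means_ = {}

    def create_model(self):
        return MeanModel()

    def store_model_info(self, section, model):
        self.means_[section] = model.mean_


system = MeanSystem().fit_sections({"AUD": [1.0, 3.0], "USD": [10.0]})
```

Because both methods are abstract, instantiating ToySystem directly raises a TypeError; only subclasses that override both can be created.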
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaseRegressionSystem
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class CorrelationVolatilitySystem(correlation_lookback='full', correlation_type='pearson', volatility_lookback='full', volatility_window_type='rolling', min_xs_samples=2, data_freq=None)
Bases: BaseRegressionSystem
Cross-sectional system of moving average models to estimate correlation and volatility components of a macro beta separately over a panel of financial contracts.
- Parameters:
correlation_lookback (int or str, default="full") – The lookback of the rolling window for correlation estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer. If data_freq is None or ‘unadjusted’, it is expressed in units of the native dataset frequency; otherwise, it is expressed in units of the frequency specified in data_freq.
correlation_type (str, default='pearson') – The type of correlation to be calculated. Accepted values are ‘pearson’, ‘kendall’ and ‘spearman’.
volatility_lookback (int or str, default="full") – The lookback of the rolling window for volatility estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer. If data_freq is None or ‘unadjusted’, it is expressed in units of the native dataset frequency; otherwise, it is expressed in units of the frequency specified in data_freq.
volatility_window_type (str, default='rolling') – The type of window to use for the volatility calculation. Accepted values are ‘rolling’ and ‘exponential’.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, so that the data frequency itself can be cross-validated when estimating betas. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen frequency (for example, daily data resampled to weekly).
Notes
This class is specifically designed for market beta estimation based on the decomposition of the beta into correlation and volatility components in univariate analysis.
Separate estimators are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
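The decomposition referred to above rests on the standard univariate identity beta = corr(x, y) * std(y) / std(x), where x is the benchmark return and y the contract return. The sketch below checks that identity on simulated data with NumPy; the variable names and data are made up, and the class may combine its two components with different lookbacks or window types, in which case the product no longer equals the OLS slope.

```python
import numpy as np

rng = np.random.default_rng(0)
market = rng.normal(size=250)  # benchmark returns (x)
contract = 0.8 * market + rng.normal(scale=0.5, size=250)  # contract returns (y)

# Correlation component and volatility-ratio component, estimated separately.
corr = np.corrcoef(market, contract)[0, 1]
vol_ratio = np.std(contract, ddof=1) / np.std(market, ddof=1)
beta_decomposed = corr * vol_ratio

# OLS slope of contract returns on market returns, for comparison.
beta_ols = np.polyfit(market, contract, deg=1)[0]
```

When both components are computed over the same full sample, beta_decomposed and beta_ols agree up to floating-point error.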
- predict(X)
Make naive zero predictions over a panel dataset.
- Parameters:
X (pd.DataFrame) – Input feature matrix.
- Returns:
predictions – Pandas series of zero predictions, multi-indexed by cross-section and date.
- Return type:
pd.Series
Notes
This method outputs zero predictions for all cross-sections and dates, since the CorrelationVolatilitySystem is solely used for beta estimation and no forecasting is performed.
- store_model_info(section, beta)
Store the betas induced by the correlation and volatility estimators.
- Parameters:
section (str) – The cross-section identifier.
beta (numbers.Number) – The beta estimate for the associated cross-section.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CorrelationVolatilitySystem
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class LADRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)
Bases: BaseRegressionSystem
Cross-sectional system of LAD regression models.
- Parameters:
fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, so that the data frequency itself can be cross-validated when estimating betas. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen frequency (for example, daily data resampled to weekly).
Notes
Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.
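For readers unfamiliar with LAD (least absolute deviations) regression, the sketch below shows why it can be attractive for noisy return data: it minimizes the sum of absolute residuals, so a single outlier moves the fit far less than under least squares. The function lad_fit is a hypothetical helper based on iteratively reweighted least squares; the solver used inside the library may well differ.

```python
import numpy as np


def lad_fit(x, y, n_iter=100, eps=1e-8):
    """Hypothetical helper: approximate a LAD line with reweighted least squares."""
    design = np.column_stack([np.ones_like(x), x])
    coefs = np.linalg.lstsq(design, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        resid = np.abs(y - design @ coefs)
        # Weight 1/|residual| turns the squared loss into an absolute loss.
        sqrt_w = np.sqrt(1.0 / np.maximum(resid, eps))
        coefs = np.linalg.lstsq(
            design * sqrt_w[:, None], y * sqrt_w, rcond=None
        )[0]
    return coefs  # [intercept, slope]


x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[-1] += 50.0  # one large outlier at the end of the sample

ols = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]
lad = lad_fit(x, y)
```

On this toy sample the OLS slope is pulled well away from the true value of 2 by the outlier, while the LAD slope stays much closer to it.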
- create_model()
Instantiate a LAD regression model.
- Returns:
A LAD regression model with the specified hyperparameters.
- Return type:
LADRegressor
- store_model_info(section, model)
Store the coefficients and intercepts of a fitted LAD regression model.
- Parameters:
section (str) – The cross-section identifier.
model (LADRegressor) – The fitted LAD regression model.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LADRegressionSystem
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class LinearRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)
Bases: BaseRegressionSystem
Cross-sectional system of linear regression models for panel data.
- Parameters:
fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, so that the data frequency itself can be cross-validated when estimating betas. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen frequency (for example, daily data resampled to weekly).
Notes
Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.
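To make the effect of data_freq concrete, the sketch below downsamples daily returns for a single cross-section to weekly frequency with pandas. Summing returns within each week is an assumption made for illustration; the aggregation rule used by the library is not stated here and may differ.

```python
import numpy as np
import pandas as pd

# Ten business days of made-up daily returns, starting on a Monday.
dates = pd.bdate_range("2024-01-01", periods=10)
daily = pd.Series(np.full(10, 0.01), index=dates, name="ret")

# Aggregate to weekly frequency by summing the returns in each week.
weekly = daily.resample("W").sum()
```

After resampling, ten daily observations become two weekly ones, which is why parameters such as min_xs_samples and roll are then counted in weeks rather than days.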
- create_model()
Instantiate a linear regression model.
- Returns:
A linear regression model with the specified hyperparameters.
- Return type:
LinearRegression
- store_model_info(section, model)
Store the coefficients and intercepts of a fitted linear regression model.
- Parameters:
section (str) – The cross-section identifier.
model (LinearRegression) – The fitted linear regression model.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LinearRegressionSystem
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class RidgeRegressionSystem(fit_intercept=True, positive=False, alpha=1.0, tol=0.0001, solver='lsqr', roll='full', min_xs_samples=2, data_freq=None)
Bases: BaseRegressionSystem
Cross-sectional system of ridge regression models for panel data.
- Parameters:
fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
alpha (float, default=1.0) – L2 regularization hyperparameter. Greater values specify stronger regularization.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.
tol (float, default=1e-4) – The tolerance for termination.
solver (str, default='lsqr') – Solver to use in the computational routines. Options are ‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’ and ‘lbfgs’.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, so that the data frequency itself can be cross-validated when estimating betas. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen frequency (for example, daily data resampled to weekly).
Notes
Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.
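The role of the alpha hyperparameter can be seen from the closed-form ridge solution, (X'X + alpha I)^-1 X'y. The sketch below implements that formula directly with NumPy, without an intercept; ridge_coefs is a hypothetical helper and is not how the estimators in this class are solved (the default solver listed above is ‘lsqr’).

```python
import numpy as np


def ridge_coefs(X, y, alpha):
    """Hypothetical helper: closed-form ridge coefficients, no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.1, size=50)

weak = ridge_coefs(X, y, alpha=0.1)      # close to the least-squares fit
strong = ridge_coefs(X, y, alpha=100.0)  # coefficients shrunk towards zero
```

Larger values of alpha shrink the coefficient vector towards zero, which is the sense in which greater values specify stronger regularization.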
- create_model()
Instantiate a ridge regression model.
- Returns:
A ridge regression model with the specified hyperparameters.
- Return type:
Ridge
- store_model_info(section, model)
Store the coefficients and intercepts of a fitted ridge regression model.
- Parameters:
section (str) – The cross-section identifier.
model (Ridge) – The fitted ridge regression model.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RidgeRegressionSystem
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.