macrosynergy.learning.forecasting#

class LADRegressor(fit_intercept=True, positive=False, alpha=0, shrinkage_type='l1', tol=None, maxiter=None)[source]#

Bases: BaseEstimator, RegressorMixin

fit(X, y, sample_weight=None)[source]#

Learn LAD regression model parameters.

Parameters:

X (pd.DataFrame or np.ndarray) – Input feature matrix.
y (pd.Series or pd.DataFrame or np.ndarray) – Target vector associated with each sample in X.
sample_weight (np.ndarray, default=None) – Numpy array of sample weights to create a weighted LAD regression model.

predict(X)[source]#

Predict dependent variable using the fitted LAD regression model.

Parameters:: X (pd.DataFrame or np.ndarray) – Input feature matrix.
Returns:: y_pred – Numpy array of predictions.
Return type:: np.ndarray

Notes

If the model learning algorithm failed to converge, the predict method will return an array of zeros. This has the interpretation of no buy/sell signal being triggered based on this model.
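To make the least-absolute-deviations idea concrete, the following is a minimal sketch that approximates a LAD fit with iteratively reweighted least squares in NumPy. It is not the solver used by LADRegressor; the function name lad_fit_irls, the iteration count and the eps floor are assumptions made for illustration only.

```python
import numpy as np


def lad_fit_irls(X, y, n_iter=100, eps=1e-8):
    """Approximate the LAD solution by iteratively reweighted least squares.

    Hypothetical helper for illustration; not part of macrosynergy.
    """
    # Prepend a column of ones, the analogue of fit_intercept=True.
    A = np.column_stack([np.ones(len(X)), X])
    # Start from the ordinary least-squares solution.
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        resid = y - A @ beta
        # Weights of 1/|r| make the weighted squared loss mimic the absolute loss.
        w = 1.0 / np.maximum(np.abs(resid), eps)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return beta  # [intercept, slope, ...]


# Nine points on the line y = 2x + 1 plus one large outlier.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[-1] = 100.0

ols = np.linalg.lstsq(np.column_stack([np.ones(10), x]), y, rcond=None)[0]
lad = lad_fit_irls(x, y)
print("OLS slope:", ols[1])
print("LAD slope:", lad[1])
```

In this toy example the OLS slope is pulled towards the outlier, while the LAD slope stays close to the slope of the nine uncontaminated points.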

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LADRegressor#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LADRegressor#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class KNNClassifier(n_neighbors='sqrt', weights='uniform')[source]#

Bases: ClassifierMixin, BaseEstimator

fit(X, y)[source]#

Fit method.

Parameters:

X (pd.DataFrame or np.ndarray) – The input feature matrix.
y (pd.Series or np.ndarray) – The target variable.

Returns:

The fitted model.

Return type:

self

predict(X)[source]#

Predict method.

Parameters:: X (pd.DataFrame or np.ndarray) – The input feature matrix.
Returns:: The predicted values.
Return type:: np.ndarray

predict_proba(X)[source]#

Predict probability method.

Parameters:: X (pd.DataFrame or np.ndarray) – The input feature matrix.
Returns:: The predicted probabilities.
Return type:: np.ndarray

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KNNClassifier#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class SignWeightedLADRegressor(fit_intercept=True, positive=False, alpha=0, shrinkage_type='l1', tol=None, maxiter=None)[source]#

Bases: SignWeightedRegressor

set_params(**params)[source]#

Setter method to update the parameters of the SignWeightedLADRegressor.

Parameters:: **params (dict) – Dictionary of parameters to update.
Returns:: The SignWeightedLADRegressor instance with updated parameters.
Return type:: self

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SignWeightedLADRegressor#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class TimeWeightedLADRegressor(fit_intercept=True, positive=False, half_life=252, alpha=0, shrinkage_type='l1', tol=None, maxiter=None)[source]#

Bases: TimeWeightedRegressor

set_params(**params)[source]#

Setter method to update the parameters of the TimeWeightedLADRegressor.

Parameters:: **params (dict) – Dictionary of parameters to update.
Returns:: The TimeWeightedLADRegressor instance with updated parameters.
Return type:: self

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → TimeWeightedLADRegressor#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class SignWeightedLinearRegression(fit_intercept=True, positive=False, alpha=0, shrinkage_type='l1')[source]#

Bases: SignWeightedRegressor

set_params(**params)[source]#

Setter method to update the parameters of the SignWeightedLinearRegression.

Parameters:: **params (dict) – Dictionary of parameters to update.
Returns:: The SignWeightedLinearRegression instance with updated parameters.
Return type:: self

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SignWeightedLinearRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class TimeWeightedLinearRegression(fit_intercept=True, positive=False, half_life=252, alpha=0, shrinkage_type='l1')[source]#

Bases: TimeWeightedRegressor

set_params(**params)[source]#

Setter method to update the parameters of the TimeWeightedLinearRegression.

Parameters:: **params (dict) – Dictionary of parameters to update.
Returns:: The TimeWeightedLinearRegression instance with updated parameters.
Return type:: self

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → TimeWeightedLinearRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class NaiveRegressor[source]#

Bases: BaseEstimator, RegressorMixin

Equally weighted unbiased factor model.

Notes

Given a collection of factors that are theoretically positively correlated with a dependent variable, a plausible signal is a simple average of those factors. This is effectively a linear regression model with zero intercept and equal weights for all factors.

This is a useful benchmark model which works well when the factors are as uncorrelated as possible with one another, because it offers a layer of diversification on the underlying return drivers. When the user has strong priors, this is often a competitive model that is difficult to beat.

However, it is vital for the features to have been preprocessed to have a positive theoretical correlation with the target variable.

fit(X, y=None)[source]#

Fit method.

Parameters:

X (pd.DataFrame, pd.Series or np.ndarray) – The input feature matrix.
y (pd.DataFrame, pd.Series or np.ndarray) – The target variable.

Returns:

The fitted model.

Return type:

self

Notes

This method involves fully trusting one’s priors and thus requires no learning element. As a consequence, no training set information is needed.

predict(X)[source]#

Predict method.

Notes

The predictions are simply the average of the features across columns of the input feature matrix.
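As a small illustration of the note above, the sketch below computes the row-wise average of a feature matrix and checks that it matches a linear model with zero intercept and equal weights. The data are made up, and the code uses plain NumPy rather than the NaiveRegressor class itself.

```python
import numpy as np

# Three samples, three factors (illustrative values only).
X = np.array(
    [
        [0.2, -0.1, 0.5],
        [1.0, 0.4, -0.2],
        [-0.3, -0.3, 0.0],
    ]
)

# Average of the features across columns, one prediction per row.
preds = X.mean(axis=1)

# The same predictions written as a linear model with equal weights 1/p
# and no intercept.
weights = np.full(X.shape[1], 1.0 / X.shape[1])
preds_linear = X @ weights

print(preds)
```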

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NaiveRegressor#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class CorrelationVolatilitySystem(correlation_lookback='full', correlation_type='pearson', volatility_lookback='full', volatility_window_type='rolling', min_xs_samples=2, data_freq=None)[source]#

Bases: BaseRegressionSystem

Cross-sectional system of moving average models to estimate correlation and volatility components of a macro beta separately over a panel of financial contracts.

Parameters:

correlation_lookback (int or str, default="full") – The lookback of the rolling window for correlation estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer expressed in units of the native dataset frequency. If data_freq is set to a value other than None or ‘unadjusted’, it should instead be expressed in units of the frequency specified in data_freq.
correlation_type (str, default='pearson') – The type of correlation to be calculated. Accepted values are ‘pearson’, ‘kendall’ and ‘spearman’.
volatility_lookback (int or str, default="full") – The lookback of the rolling window for volatility estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer expressed in units of the native dataset frequency. If data_freq is set to a value other than None or ‘unadjusted’, it should instead be expressed in units of the frequency specified in data_freq.
volatility_window_type (str, default='rolling') – The type of window to use for the volatility calculation. Accepted values are ‘rolling’ and ‘exponential’.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen resampling frequency.

Notes

This class is specifically designed for market beta estimation based on the decomposition of the beta into correlation and volatility components in univariate analysis.
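The decomposition referred to here is the standard univariate identity: beta equals the correlation between the two series multiplied by the ratio of their volatilities. The sketch below checks that identity on simulated data. The series names and the simulated numbers are assumptions, and the code does not reproduce how CorrelationVolatilitySystem estimates each component internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical market and contract returns (simulated for illustration).
market = rng.normal(0.0, 1.0, size=500)
contract = 0.7 * market + rng.normal(0.0, 0.5, size=500)

# Correlation component and volatility component, estimated separately.
corr = np.corrcoef(contract, market)[0, 1]
vol_ratio = contract.std(ddof=1) / market.std(ddof=1)
beta_decomposed = corr * vol_ratio

# Univariate OLS slope over the same window, for comparison.
beta_ols = np.cov(contract, market, ddof=1)[0, 1] / market.var(ddof=1)

print(beta_decomposed, beta_ols)
```

When both components are estimated over the same window the product matches the OLS slope. Estimating them over different lookbacks, as the separate correlation_lookback and volatility_lookback parameters allow, generally gives a different value.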

Separate estimators are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.

predict(X)[source]#

Make naive zero predictions over a panel dataset.

Parameters:: X (pd.DataFrame) – Input feature matrix.
Returns:: predictions – Pandas series of zero predictions, multi-indexed by cross-section and date.
Return type:: pd.Series

Notes

This method outputs zero predictions for all cross-sections and dates, since the CorrelationVolatilitySystem is solely used for beta estimation and no forecasting is performed.

store_model_info(section, beta)[source]#

Store the betas induced by the correlation and volatility estimators.

Parameters:

section (str) – The cross-section identifier.
beta (numbers.Number) – The beta estimate for the associated cross-section.

create_model()[source]#

Redundant method for the CorrelationVolatilitySystem class.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CorrelationVolatilitySystem#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class LADRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)[source]#

Bases: BaseRegressionSystem

Cross-sectional system of LAD regression models.

Parameters:

fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be either an integer, expressed in units of the native dataset frequency, or the string ‘full’ to use the entire available history.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen resampling frequency.

Notes

Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.

This estimator is primarily intended for use within the context of market beta estimation, but can plausibly be used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.
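The idea of fitting one model per cross-section on a panel can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper fit_per_cross_section is hypothetical, the index level names cid and real_date are assumed, and an ordinary least-squares line stands in for the LAD fit to keep the example short.

```python
import numpy as np
import pandas as pd


def fit_per_cross_section(X, y, roll="full", min_xs_samples=2):
    """Fit one univariate OLS line per cross-section (first index level).

    Hypothetical helper; OLS is used here in place of LAD for brevity.
    """
    params = {}
    for section, X_sec in X.groupby(level=0):
        y_sec = y[y.index.get_level_values(0) == section]
        if roll != "full":
            # Keep only the most recent `roll` observations.
            X_sec, y_sec = X_sec.iloc[-roll:], y_sec.iloc[-roll:]
        if len(X_sec) < min_xs_samples:
            # Too little history: no model is fitted for this cross-section.
            continue
        slope, intercept = np.polyfit(
            X_sec.iloc[:, 0].to_numpy(), y_sec.to_numpy(), deg=1
        )
        params[section] = (intercept, slope)
    return params


# Toy panel with two cross-sections and six dates each.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
idx = pd.MultiIndex.from_product(
    [["AUD", "CAD"], dates], names=["cid", "real_date"]
)
t = np.arange(6.0)
X = pd.DataFrame({"x": np.tile(t, 2)}, index=idx)
y = pd.Series(np.concatenate([1.0 + 2.0 * t, -1.0 + 0.5 * t]), index=idx)

params = fit_per_cross_section(X, y, roll=4, min_xs_samples=2)
print(params)
```

Each cross-section receives its own intercept and slope, while roll and min_xs_samples are shared across the panel, mirroring the distinction between hyperparameters and model parameters described in the notes.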

create_model()[source]#

Instantiate a LAD regression model.

Returns:: A LAD regression model with the specified hyperparameters.
Return type:: LADRegressor

store_model_info(section, model)[source]#

Store the coefficients and intercepts of a fitted LAD regression model.

Parameters:

section (str) – The cross-section identifier.
model (LADRegressor) – The fitted LAD regression model.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LADRegressionSystem#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class LinearRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)[source]#

Bases: BaseRegressionSystem

Cross-sectional system of linear regression models for panel data.

Parameters:

fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be either an integer, expressed in units of the native dataset frequency, or the string ‘full’ to use the entire available history.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen resampling frequency.

create_model()[source]#

Instantiate a linear regression model.

Returns:: A linear regression model with the specified hyperparameters.
Return type:: LinearRegression

store_model_info(section, model)[source]#

Store the coefficients and intercepts of a fitted linear regression model.

Parameters:

section (str) – The cross-section identifier.
model (LinearRegression) – The fitted linear regression model.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LinearRegressionSystem#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class RidgeRegressionSystem(fit_intercept=True, positive=False, alpha=1.0, tol=0.0001, solver='lsqr', roll='full', min_xs_samples=2, data_freq=None)[source]#

Bases: BaseRegressionSystem

Cross-sectional system of ridge regression models for panel data.

Parameters:

fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
alpha (float, default=1.0) – L2 regularization hyperparameter. Greater values specify stronger regularization.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be either an integer, expressed in units of the native dataset frequency, or the string ‘full’ to use the entire available history.
tol (float, default=1e-4) – The tolerance for termination.
solver (str, default='lsqr') – Solver to use in the computational routines. Options are ‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’ and ‘lbfgs’.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the chosen resampling frequency.
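To illustrate the effect of the alpha parameter, the sketch below computes closed-form ridge coefficients in NumPy for several penalty strengths. It is a simplified stand-in rather than the solver configured through the solver argument; the helper ridge_coef and the simulated data are assumptions for illustration.

```python
import numpy as np


def ridge_coef(X, y, alpha):
    """Closed-form ridge coefficients on centred data.

    Hypothetical helper; centring leaves the intercept unpenalised.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Coefficient norm for increasingly strong L2 regularization.
alphas = (0.0, 1.0, 10.0, 100.0)
norms = [np.linalg.norm(ridge_coef(X, y, a)) for a in alphas]
for a, norm in zip(alphas, norms):
    print(f"alpha={a:>6}: ||coef|| = {norm:.3f}")
```

Larger values of alpha shrink the coefficient vector towards zero, which is the sense in which greater values specify stronger regularization.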

create_model()[source]#

Instantiate a ridge regression model.

Returns:: A ridge regression model with the specified hyperparameters.
Return type:: Ridge

store_model_info(section, model)[source]#

Store the coefficients and intercepts of a fitted ridge regression model.

Parameters:

section (str) – The cross-section identifier.
model (Ridge) – The fitted ridge regression model.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RidgeRegressionSystem#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class ModifiedLinearRegression(method, fit_intercept=True, positive=False, error_offset=0.01, bootstrap_method='panel', bootstrap_iters=1000, resample_ratio=1, analytic_method=None)[source]#

Bases: BaseModifiedRegressor

adjust_analytical_se(model, X, y, analytic_method=None)[source]#

Adjust the coefficients of the OLS linear regression model by an analytical standard error formula.

Parameters:

model (LinearRegression) – The underlying OLS linear regression model to be modified.
X (pd.DataFrame) – Input feature matrix.
y (pd.DataFrame or pd.Series) – Target vector associated with each sample in X.
analytic_method (str, default = None) – The analytic method used to calculate standard errors.

Returns:

intercept (float) – Adjusted intercept.
coef (np.ndarray) – Adjusted coefficients.

Notes

By default, the calculated standard errors use the usual standard error expression for OLS linear regression models under the assumption of multivariate normality, homoskedasticity and zero mean of the model errors. If analytic_method = “White”, the HC3 White estimator is used.
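The two standard error formulas named above can be sketched in NumPy as follows. The helper ols_standard_errors is hypothetical, and the final step, dividing each coefficient by its standard error plus error_offset, is an assumption about the form of the adjustment rather than something stated in this reference.

```python
import numpy as np


def ols_standard_errors(X, y, white=False):
    """Return OLS coefficients and standard errors (intercept first).

    Hypothetical helper for illustration; not part of macrosynergy.
    """
    A = np.column_stack([np.ones(len(X)), X])
    n, p = A.shape
    XtX_inv = np.linalg.inv(A.T @ A)
    beta = XtX_inv @ A.T @ y
    resid = y - A @ beta
    if white:
        # HC3: squared residuals scaled by 1 / (1 - h_ii)^2, h_ii = leverage.
        leverage = np.einsum("ij,jk,ik->i", A, XtX_inv, A)
        omega = resid**2 / (1.0 - leverage) ** 2
        cov = XtX_inv @ (A.T * omega) @ A @ XtX_inv
    else:
        # Usual expression under homoskedastic errors.
        sigma2 = resid @ resid / (n - p)
        cov = sigma2 * XtX_inv
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 0.5 + X @ np.array([1.0, -1.0]) + rng.normal(size=200)

beta, se = ols_standard_errors(X, y)
_, se_hc3 = ols_standard_errors(X, y, white=True)

# Assumed form of the adjustment: coefficient divided by (SE + offset).
error_offset = 0.01
adjusted = beta / (se + error_offset)
print(se, se_hc3, adjusted)
```

Because the simulated errors are homoskedastic, the usual and HC3 standard errors come out similar here; they would diverge when the error variance depends on the features.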

References

[1] https://online.stat.psu.edu/stat462/node/131/
[2] https://en.wikipedia.org/wiki/Heteroskedasticity-consistent_standard_errors

set_params(**params)[source]#

Setter method to update the parameters of the ModifiedLinearRegression.

Parameters:: **params (dict) – Dictionary of parameters to update.
Returns:: The ModifiedLinearRegression instance with updated parameters.
Return type:: self

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ModifiedLinearRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class ModifiedSignWeightedLinearRegression(method, fit_intercept=True, positive=False, error_offset=0.01, bootstrap_method='panel', bootstrap_iters=1000, resample_ratio=1, analytic_method=None)[source]#

Bases: BaseModifiedRegressor

adjust_analytical_se(model, X, y, analytic_method=None)[source]#

Adjust the coefficients of the SWLS linear regression model by an analytical standard error formula.

Parameters:

model (SignWeightedLinearRegression) – The underlying SWLS linear regression model to be modified.
X (pd.DataFrame) – Input feature matrix.
y (pd.DataFrame or pd.Series) – Target vector associated with each sample in X.
analytic_method (str, default = None) – The analytic method used to calculate standard errors.

Returns:

intercept (float) – Adjusted intercept.
coef (np.ndarray) – Adjusted coefficients.

Notes

The analytical parameter estimates for WLS are:

\[\hat{\beta}^{\text{WLS}} = (X^{\intercal}WX)^{-1}X^{\intercal}Wy\]

where:

X is the input feature matrix, possibly with a column of ones representing the choice of an intercept.
W is the positive-definite, symmetric weight matrix, a diagonal matrix with sample weights along the main diagonal.
y is the dependent variable vector.

Since W is a positive-definite, symmetric matrix, it has a square root equal to the diagonal matrix with square roots of the sample weights along the diagonal. Hence, the WLS estimator can be rewritten as:

\[\hat{\beta}^{\text{WLS}} = \left((W^{1/2}X)^{\intercal}(W^{1/2}X)\right)^{-1}(W^{1/2}X)^{\intercal}(W^{1/2}y)\]

This is precisely the OLS estimator for a rescaled matrix

\[\tilde {X} = W^{1/2}X\]

and a rescaled dependent variable

\[\tilde {y} = W^{1/2}y\]

Hence, the usual standard error estimate and White’s estimator can be applied based on a rescaling of the design matrix and associated target vector.
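The equivalence between weighted least squares and OLS on rescaled data can be checked numerically. The sketch below uses simulated data and arbitrary positive weights, both assumptions made for illustration; it solves the weighted normal equations directly and compares the result with an ordinary least-squares fit on the rescaled design matrix and target.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# Design matrix with an explicit column of ones for the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.3, 1.5, -0.7]) + rng.normal(size=n)

# Arbitrary positive sample weights (illustrative).
w = rng.uniform(0.5, 2.0, size=n)
W = np.diag(w)

# Weighted least squares from the normal equations: (X'WX)^{-1} X'Wy.
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# OLS on the rescaled matrix W^{1/2} X and rescaled target W^{1/2} y.
sqrt_w = np.sqrt(w)
beta_rescaled = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]

print(beta_wls)
print(beta_rescaled)
```

Since the two coefficient vectors agree, any standard error routine written for OLS can be reused by passing it the rescaled inputs.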

set_params(**params)[source]#

Setter method to update the parameters of the ModifiedSignWeightedLinearRegression.

Parameters:: **params (dict) – Dictionary of parameters to update.
Returns:: The ModifiedSignWeightedLinearRegression instance with updated parameters.
Return type:: self

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ModifiedSignWeightedLinearRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class ModifiedTimeWeightedLinearRegression(method, fit_intercept=True, positive=False, half_life=252, error_offset=0.01, bootstrap_method='panel', bootstrap_iters=1000, resample_ratio=1, analytic_method=None)[source]#

Bases: BaseModifiedRegressor

adjust_analytical_se(model, X, y, analytic_method)[source]#

Adjust the coefficients of the TWLS linear regression model by an analytical standard error formula.

Parameters:

model (TimeWeightedLinearRegression) – The underlying TWLS linear regression model to be modified.
X (pd.DataFrame) – Input feature matrix.
y (pd.DataFrame or pd.Series) – Target vector associated with each sample in X.
analytic_method (str, default = None) – The analytic method used to calculate standard errors.

Returns:

intercept (float) – Adjusted intercept.
coef (np.ndarray) – Adjusted coefficients.

Notes

The analytical parameter estimates for WLS are:

\[\hat{\beta}^{\text{WLS}} = (X^{\intercal}WX)^{-1}X^{\intercal}Wy\]

where:

X is the input feature matrix, possibly with a column of ones representing the choice of an intercept.
W is the positive-definite, symmetric weight matrix, a diagonal matrix with sample weights along the main diagonal.
y is the dependent variable vector.

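Since W is a positive-definite, symmetric matrix, it has a square root equal to the diagonal matrix with square roots of the sample weights along the diagonal. Hence, the WLS estimator can be rewritten as:

\[\hat{\beta}^{\text{WLS}} = ((W^{1/2}X)^{\intercal}(W^{1/2}X))^{-1}(W^{1/2}X)^{\intercal}(W^{1/2}y)\]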

This is precisely the OLS estimator for a rescaled matrix

\[\tilde {X} = W^{1/2}X\]

and a rescaled dependent variable

\[\tilde {y} = W^{1/2}y\]

Hence, the usual standard error estimate and White’s estimator can be applied based on a rescaling of the design matrix and associated target vector.
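As an illustration of the last point, the sketch below computes both the usual standard errors and a White-type estimate on the rescaled problem. It is not the library's code: the function name is hypothetical, the HC0 form of White's estimator is an assumption (the variant used by the package is not stated here), and the exponential time weights are only an assumed example of what the weight matrix might contain.

```python
import numpy as np

def rescaled_standard_errors(X, y, w):
    """Return (beta, usual_se, white_se) for WLS via the rescaled OLS problem."""
    sqrt_w = np.sqrt(w)
    Xt = sqrt_w[:, None] * X          # rescaled design matrix
    yt = sqrt_w * y                   # rescaled target
    XtX_inv = np.linalg.inv(Xt.T @ Xt)
    beta = XtX_inv @ Xt.T @ yt
    resid = yt - Xt @ beta
    n, k = Xt.shape
    # Usual OLS standard errors: sigma^2 (X'X)^{-1}
    sigma2 = resid @ resid / (n - k)
    usual_se = np.sqrt(np.diag(sigma2 * XtX_inv))
    # White (HC0) sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
    meat = Xt.T @ (resid[:, None] ** 2 * Xt)
    white_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, usual_se, white_se

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.1, 1.0, -0.5]) + rng.normal(size=n)
age = np.arange(n)[::-1]              # 0 = most recent sample
w = 0.5 ** (age / 252)                # assumed exponential decay, 252-period half-life
beta, usual_se, white_se = rescaled_standard_errors(X, y, w)
```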

set_params(**params)[source]#

Setter method to update the parameters of the ModifiedTimeWeightedLinearRegression.

Parameters:: **params (dict) – Dictionary of parameters to update.
Returns:: The ModifiedTimeWeightedLinearRegression instance with updated parameters.
Return type:: self

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ModifiedTimeWeightedLinearRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class ProbabilityEstimator(classifier)[source]#

Bases: BaseEstimator, MetaEstimatorMixin, ClassifierMixin

Meta estimator to create trading signals based on the probability of going long.

Parameters:: classifier (ClassifierMixin) – A scikit-learn classifier.

Notes

This class exposes the feature importances of the base estimator as its own and defines a create_signal method that returns the probability of going long in excess of 0.5. The SignalOptimizer class in this package takes this method into account.
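The signal described above can be sketched with a plain scikit-learn classifier. This is an illustration rather than the class's source: the simulated data and the encoding of "long" as class 1 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
# Assumed label encoding: 1 = long, 0 = short
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# Probability of the "long" class in excess of 0.5, so the signal lies in [-0.5, 0.5]
long_col = list(clf.classes_).index(1)
signal = clf.predict_proba(X)[:, long_col] - 0.5
```

A positive value indicates that the classifier considers a long position more likely than not, and the magnitude reflects its confidence.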

fit(X, y)[source]#

Fit the underlying classifier.

Parameters:

X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.

predict(X)[source]#

Predict the class labels for the provided data.

Parameters:: X (pd.DataFrame or numpy array) – Input feature matrix.
Returns:: y_pred – Numpy array of predictions.
Return type:: np.ndarray

create_signal(X)[source]#

Create a trading signal based on the probability of going long.

Parameters:: X (pd.DataFrame or numpy array) – Input feature matrix.
Returns:: y_pred – Numpy array of signals.
Return type:: np.ndarray

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ProbabilityEstimator#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class VotingClassifier(estimators, voting='hard', weights=None, n_jobs=None, flatten_transform=True, verbose=False)[source]#

Bases: VotingClassifier

Classification model that votes on the predictions of many classifiers.

Parameters:

estimators (list of (str, estimator) tuples) – List of (name, estimator) tuples that are used to fit the model.
voting ({'hard', 'soft'}, default='hard') – If ‘hard’, uses predicted class labels for majority rule voting. If ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.
weights (array-like of shape (n_estimators,), default=None) – Sequence of weights to assign to models. If None, models are weighted equally.
n_jobs (int, default=None) – The number of jobs to run in parallel for fit. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
flatten_transform (bool, default=True) – Affects shape of transform output only when voting=’soft’. If True, the transform method returns a matrix with shape (n_samples, n_classes*n_classifiers). If False, the shape is (n_classifiers, n_samples, n_classes).
verbose (bool, default=False) – If True, the time elapsed while fitting will be printed as model trains.

Notes

This class calculates feature importances as the average of the feature importances of the base estimators.
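A minimal sketch of this averaging, using scikit-learn's own VotingClassifier with tree-based members that expose feature_importances_. How the package treats other estimator types (for example linear models) or voting weights is not covered here, so the unweighted mean is an assumption.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=25, random_state=0)),
    ],
    voting="soft",
).fit(X, y)

# Unweighted mean of the fitted base estimators' importances
importances = np.mean(
    [est.feature_importances_ for est in ensemble.estimators_], axis=0
)
```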

fit(X, y, sample_weight=None, **fit_params)[source]#

Fit the estimators.

Parameters:

X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → VotingClassifier#

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → VotingClassifier#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class VotingRegressor(estimators, weights=None, verbose=False)[source]#

Bases: VotingRegressor

Regression model that averages the predictions of many regression models.

Parameters:

estimators (list of (str, estimator) tuples) – List of (name, estimator) tuples that are used to fit the model.
weights (array-like of shape (n_estimators,), default=None) – Sequence of weights to assign to models. If None, models are weighted equally.
verbose (bool, default=False) – If True, the time elapsed while fitting will be printed as model trains.

Notes

This class calculates feature importances as the average of the feature importances of the base estimators.

fit(X, y, **fit_params)[source]#

Fit the estimators.

Parameters:

X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → VotingRegressor#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class FIExtractor(estimator)[source]#

Bases: BaseEstimator, MetaEstimatorMixin, RegressorMixin

fit(X, y)[source]#

Fit the underlying estimator and store normalized feature importances.

Parameters:

X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.

predict(X)[source]#

Predict using the underlying estimator.

Parameters:: X (pd.DataFrame or numpy array) – Input feature matrix.
Returns:: y_pred – Numpy array of predictions.
Return type:: np.ndarray

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → FIExtractor#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class DataFrameTransformer(transformer, column_names=None)[source]#

Bases: BaseEstimator, TransformerMixin, MetaEstimatorMixin

Meta estimator to reconvert a transformed numpy array back to a multiindexed pandas DataFrame. This maintains the multi-indexed panel structure.

Parameters:: transformer (TransformerMixin) – A scikit-learn transformer with a fit and transform method.

Notes

Many scikit-learn compatible transformers convert pandas DataFrames to numpy arrays. This can be problematic when working with panel models that require knowledge of the panel structure. This class wraps around such transformers to ensure that the output is a pandas DataFrame, preserving the original index.

When no column names are provided, default names of the form “Factor_0”, “Factor_1”, etc. are used for the transformed DataFrame. If column names are provided, they will be used instead.
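The problem and the remedy can be sketched by hand with PCA. This is not the class's source code; the panel below is simulated, and the cid values and dates are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Small simulated panel, multi-indexed by cross-section and date
idx = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.date_range("2020-01-01", periods=6)],
    names=["cid", "real_date"],
)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(12, 3)), index=idx, columns=["x1", "x2", "x3"])

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)  # plain numpy array: the panel index is lost

# Re-attach the original index and use default factor names
Z_df = pd.DataFrame(
    Z, index=X.index, columns=[f"Factor_{i}" for i in range(Z.shape[1])]
)
```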

fit(X, y=None)[source]#

Fit the underlying transformer.

Parameters:

X (pd.DataFrame) – Pandas dataframe of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.

transform(X)[source]#

Transform the input data based on the underlying transformer, but return a pandas DataFrame instead of a numpy array.

Parameters:: X (pd.DataFrame or numpy array) – Input feature matrix.
Returns:: Transformed data as a pandas DataFrame, preserving the original index and using either provided column names or default names.
Return type:: pd.DataFrame

class GlobalLocalRegression(local_lambda=1, global_lambda=1, positive=False, fit_intercept=True, min_xs_samples=36)[source]#

Bases: BaseEstimator, RegressorMixin

Linear panel model with hierarchical shrinkage of country-specific (local) coefficients towards unknown global coefficients. Learning means that both country-specific and global coefficients are estimated from data.

Parameters:

local_lambda (float, default=1) – Regularization strength to pull local coefficients towards global coefficients.
global_lambda (float, default=1) – Regularization strength to pull global coefficients towards zero.
positive (bool, default=False) – Whether to constrain all coefficients to be positive. Default is False.
fit_intercept (bool, default=True) – Whether to fit an intercept term. Default is True.
min_xs_samples (int, default=36) – Minimum number of samples a group must have to contribute to the mean squared error component of the loss function.

Notes

A panel can be modelled from a global perspective, where time series of all countries are “pooled” or stacked together, meaning that samples from different countries are treated as independent. This is called a pooled regression. With one model fit on all countries’ data, this is a high-bias, low-variance model.

Alternatively, country-by-country regressions can be fit, with a separate model for each country. This is low-bias but high-variance, since each model sees less data.

A balance can be struck between these two extremes of the bias-variance trade-off. Introducing bias into the country-by-country models can lead to a potentially substantial reduction in variance. Mathematically, this fit is found by minimizing the mean squared residual of each country, averaged across countries, plus a term that penalizes deviation of country-specific coefficients from a global coefficient. The global coefficient is also penalized to prevent it from growing too large.

The loss function is as follows:

\[L(\{\beta_i\}_{i=1}^{C}, \beta) = \frac{1}{C} \sum_{i = 1}^{C} \left [ \frac{1}{n_{i}} \sum_{t=1}^{n_{i}} (y_{it} - x_{it}^{\intercal} \beta_{i})^2 \right ] + \lambda_{\text{local}} \sum_{i=1}^{C} ||\beta_i - \beta||_{2}^{2} + \lambda_{\text{global}} ||\beta||_{2}^{2}\]
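The loss can be evaluated directly from its definition. The sketch below is illustrative and not the class's loss method: the function name and the dictionary-based layout (one design matrix, target vector and coefficient vector per country) are assumptions, and the min_xs_samples filter is ignored.

```python
import numpy as np

def global_local_loss(X_by_cid, y_by_cid, betas, beta_global,
                      local_lambda=1.0, global_lambda=1.0):
    """Average per-country MSE plus local and global shrinkage penalties."""
    C = len(X_by_cid)
    # (1/C) * sum over countries of the country's mean squared residual
    mse = sum(
        np.mean((y_by_cid[c] - X_by_cid[c] @ betas[c]) ** 2) for c in X_by_cid
    ) / C
    # Pull each country's coefficients towards the global coefficients
    local_penalty = local_lambda * sum(
        np.sum((betas[c] - beta_global) ** 2) for c in X_by_cid
    )
    # Pull the global coefficients towards zero
    global_penalty = global_lambda * np.sum(beta_global ** 2)
    return mse + local_penalty + global_penalty

X_by_cid = {"A": np.array([[1.0], [1.0]]), "B": np.array([[1.0]])}
y_by_cid = {"A": np.array([1.0, 3.0]), "B": np.array([0.0])}
betas = {"A": np.array([2.0]), "B": np.array([0.0])}
loss_value = global_local_loss(X_by_cid, y_by_cid, betas, np.array([1.0]))
print(loss_value)  # 0.5 (MSE) + 2.0 (local) + 1.0 (global) = 3.5
```

Because each country's squared error is divided by its own sample count, countries with long histories do not dominate the fit.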

fit(X, y, sample_weight=None)[source]#

Fit the global-local model.

Parameters:

X (pd.DataFrame) – Input feature matrix, multi-indexed by cid and real_date.
y (pd.DataFrame or pd.Series) – Target vector associated with each sample in X, multi-indexed by cid and real_date.
sample_weight (np.ndarray, optional) – Sample weights for each sample in X. If provided, it should be a 1D array with the same length as the number of samples in X. If None, all samples are treated equally.

Returns:

Fitted estimator.

Return type:

self

loss(weights)[source]#

Loss function for the global-local regression model.

Parameters:: weights (np.ndarray) – Flattened array of weights, where the last n_features_ elements correspond to the global coefficients and the rest correspond to the local coefficients for each country.

loss_derivative(weights)[source]#

Derivative of the loss function with respect to the weights.

Parameters:: weights (np.ndarray) – Flattened array of weights, where the last n_features_ elements correspond to the global coefficients and the rest correspond to the local coefficients for each country.

predict(X)[source]#

Predict the target values for the given input data.

Parameters:: X (pd.DataFrame) – Input features for prediction.
Returns:: Predicted target values.
Return type:: np.ndarray

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GlobalLocalRegression#

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GlobalLocalRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class CountryByCountryRegression(estimator, min_xs_samples=32)[source]#

Bases: BaseEstimator, MetaEstimatorMixin, RegressorMixin

MetaEstimator to fit a scikit-learn-compatible regressor on each country’s data slice in a panel. If a country has fewer samples than min_xs_samples, a global model is used for the sake of prediction.

Parameters:

estimator (object) – A scikit-learn compatible regressor that will be cloned for each country.
min_xs_samples (int, default=32) – Minimum number of samples required for fitting a country-specific model. If a country has fewer samples, the global model will be used for predictions.

Notes

Country-by-country regressions model a panel through a “bottom-up” approach, treating each country as a separate regression problem. This is useful when a panel is particularly heterogeneous or each time series in the panel is long. Short time series result in a low-bias, high-variance model that tends to underperform a global forecasting model. Regularization on each country-specific model can help improve performance.
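The per-country fitting with a global fallback might look roughly as follows. This is a sketch under assumptions, not the class's implementation: the helper names are hypothetical, and fitting the global model on the full pooled panel is an assumed choice.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import Ridge

def fit_country_models(X, y, estimator, min_xs_samples=32):
    cids = X.index.get_level_values("cid")
    # Assumed: the fallback model is fit on the whole pooled panel
    global_model = clone(estimator).fit(X, y)
    local_models = {}
    for cid in cids.unique():
        mask = cids == cid
        if mask.sum() >= min_xs_samples:
            local_models[cid] = clone(estimator).fit(X[mask], y[mask])
    return global_model, local_models

def predict_country_models(X, global_model, local_models):
    cids = X.index.get_level_values("cid")
    preds = np.empty(len(X))
    for cid in cids.unique():
        mask = cids == cid
        # Countries without a local model fall back to the global model
        preds[mask] = local_models.get(cid, global_model).predict(X[mask])
    return preds

rng = np.random.default_rng(0)
sizes = {"AUD": 40, "NZD": 10}
index = pd.MultiIndex.from_tuples(
    [(cid, d) for cid, n in sizes.items()
     for d in pd.date_range("2020-01-01", periods=n)],
    names=["cid", "real_date"],
)
X = pd.DataFrame(rng.normal(size=(50, 2)), index=index, columns=["x1", "x2"])
y = pd.Series(X["x1"].to_numpy() + rng.normal(size=50), index=index)

global_model, local_models = fit_country_models(X, y, Ridge(alpha=1.0))
preds = predict_country_models(X, global_model, local_models)
```

In this example only AUD reaches the 32-sample threshold, so NZD predictions come from the pooled model.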

fit(X, y)[source]#

predict(X)[source]#

Predict the target values for the given input data.

Parameters:: X (pd.DataFrame) – Input features for prediction.
Returns:: Predicted target values.
Return type:: np.ndarray

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CountryByCountryRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class TimeWeightedWrapper(model, half_life)[source]#

Bases: BaseEstimator, RegressorMixin

Meta-estimator that applies time-based weighting to samples during model fitting.

Parameters:

model (BaseEstimator) – An instance of a scikit-learn compatible regression model.
half_life (float) – The half-life parameter for the exponential decay weighting.
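One plausible way to build such weights is shown below. The exact convention used by the wrapper is not documented here, so the function name, the measurement of age in distinct dates and the normalization to sum to one are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def half_life_weights(dates, half_life):
    """Weights that halve for every `half_life` distinct dates of age (assumed convention)."""
    dates = pd.DatetimeIndex(dates)
    unique_dates = dates.unique().sort_values()
    # Age 0 is the most recent date; older dates have larger ages
    age = (len(unique_dates) - 1) - unique_dates.get_indexer(dates)
    weights = np.power(0.5, age / half_life)
    return weights / weights.sum()

dates = pd.date_range("2024-01-01", periods=5)
w = half_life_weights(dates, half_life=2)

# Pass the weights to any estimator whose fit method accepts sample_weight
X = np.arange(5, dtype=float).reshape(-1, 1)
y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
model = LinearRegression().fit(X, y, sample_weight=w)
```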

fit(X, y)[source]#

Fit the underlying model with time weights applied.

Parameters:

X (pandas.DataFrame or np.ndarray) – The feature matrix.
y (pandas.Series or np.ndarray) – The target vector.

predict(X)[source]#

Predict using the underlying model.

Parameters:: X (pandas.DataFrame or np.ndarray) – The feature matrix.
Returns:: predictions – The predicted values.
Return type:: np.ndarray

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → TimeWeightedWrapper#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class PLSTransformer(n_components=2)[source]#

Bases: BaseEstimator, TransformerMixin

Extract PLS components from scikit-learn’s PLSRegression.

Parameters:: n_components (int, default=2) – Number of PLS components to extract.

fit(X, y)[source]#

Fit the PLS model to the data.

Parameters:

X (pd.DataFrame, pd.Series or np.ndarray) – The input feature matrix.
y (pd.DataFrame, pd.Series or np.ndarray) – The target variable.

Returns:

The fitted model.

Return type:

self

transform(X)[source]#

Transform the input data to the latent PLS space.

Parameters:: X (pd.DataFrame, pd.Series or np.ndarray) – The input feature matrix to be transformed.

class LinearMultiTargetRegression(fit_intercept=True, seemingly_unrelated=False, covariance_estimator='ewm', span=60, feature_selection=None)[source]#

Bases: BaseEstimator, RegressorMixin

Linear regression model with multiple targets, supporting seemingly unrelated regression (SUR) via feasible generalized least squares (FGLS).

Parameters:

fit_intercept (bool, default=True) – Whether to include an intercept term in the regression.
seemingly_unrelated (bool, default=False) – Whether to estimate the model as a seemingly unrelated regression (SUR).
covariance_estimator (Union[str, BaseEstimator], default="ewm") – Choice of covariance estimator. Options are “ml” for maximum likelihood, “ewm” for exponentially weighted moving covariance, or a custom scikit-learn compatible covariance estimator.
span (int, default=60) – Span parameter for exponentially weighted covariance estimation of residuals.
feature_selection (object, default=None) – A feature selection object inheriting from scikit-learn’s SelectorMixin base class in sklearn.feature_selection. If provided, feature selection is applied per target before fitting.
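The covariance options can be illustrated with the first stage of a generic FGLS procedure: fit each target by OLS, then estimate the residual covariance across targets. This is a sketch under assumptions, not the class's code; the simulated data and variable names are illustrative, and the second (GLS) stage is omitted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, k = 250, 2
X = rng.normal(size=(n, 3))
B = rng.normal(size=(3, k))
# Errors that are correlated across the two targets
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
Y = X @ B + E

# Stage 1: equation-by-equation OLS with an intercept
X1 = np.column_stack([np.ones(n), X])
B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)
resid = pd.DataFrame(Y - X1 @ B_hat, columns=["asset_1", "asset_2"])

# Maximum-likelihood covariance of residuals (divides by n)
cov_ml = np.cov(resid.to_numpy(), rowvar=False, bias=True)

# Exponentially weighted covariance: keep the matrix for the final date
cov_ewm = resid.ewm(span=60).cov().iloc[-k:].to_numpy()
```

The exponentially weighted estimate gives more influence to recent residuals, so it adapts faster when cross-asset correlations change.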

fit(X, y, sample_weight=None)[source]#

Fit the linear multi-target regression model.

Parameters:

X (pd.DataFrame) – Feature matrix of shape (n_samples, n_features). Should be multi-indexed by asset and real date.
y (pd.DataFrame) – Target matrix of shape (n_samples, n_assets). Should be multi-indexed by asset and real date.
sample_weight (array-like of shape (n_samples,), default=None) – Individual weights for each sample.

predict(X)[source]#

Predict method to return predictions for each asset.

Parameters:: X (pd.DataFrame) – Feature matrix of shape (n_samples, n_features). Should be multi-indexed by asset and real date.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LinearMultiTargetRegression#

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LinearMultiTargetRegression#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class MultiLayerPerceptron(n_inputs, n_latent, n_outputs, encoder_activation='tanh', head_activation='identity', fit_encoder_intercept=False, fit_head_intercept=True)[source]#

Bases: Module

Multi-layer perceptron models in PyTorch.

Parameters:

n_inputs (int) – Number of input features. Must be at least 1.
n_latent (Union[int, list[int]]) – Number of latent features in a single hidden layer or list specifying the size of each hidden layer.
n_outputs (int) – Number of output variables. Must be at least 1.
encoder_activation (str, optional) – Activation function for the encoder layers. Default is “tanh”. Other options include “relu” and “sigmoid”.
head_activation (str, optional) – Activation function for the head layers. Default is “identity” for no activation. Other options include “tanh”, “relu” and “sigmoid”.
fit_encoder_intercept (bool, optional) – Whether to fit intercepts in the encoder layers. Default is False.
fit_head_intercept (bool, optional) – Whether to fit intercepts in the output head. Default is True.

Notes

A multi-layer perceptron is a feed-forward neural network that learns a (hopefully) optimal representation of the feature set for a prediction task, or for a collection of tasks. The initial set is transformed into a new, “learnt”, collection of features. This is the “first hidden layer” of the network. Each learnt feature is the composition of the linear combination of initial features and a non-linear activation function. The choice of activation is currently “relu” ($f(x) = \max(0, x)$), “tanh” ($f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$), or “sigmoid” ($f(x) = \frac{1}{1 + e^{-x}}$). This new feature set can be further transformed in the same manner by creating a second hidden layer, and so on.

The part of the network that describes how the initial features are transformed into the final features (before mapping to the outputs) is called the “encoder”. The component that maps the final learnt features to the outputs is called the “projection head”. When multiple outputs are being modelled, this is usually referred to as having a “multi-head” architecture.
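The encoder and projection head can be sketched as a plain forward pass. The example uses NumPy rather than PyTorch to stay dependency-light and is not the class's implementation; the function name and the random weights are illustrative assumptions.

```python
import numpy as np

ACTIVATIONS = {
    "identity": lambda z: z,
    "relu": lambda z: np.maximum(0.0, z),
    "tanh": np.tanh,
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
}

def mlp_forward(x, encoder_weights, head_weight, head_bias,
                encoder_activation="tanh", head_activation="identity"):
    h = x
    # Encoder: each hidden layer is a linear map followed by an activation
    # (no intercepts, mirroring fit_encoder_intercept=False)
    for W in encoder_weights:
        h = ACTIVATIONS[encoder_activation](h @ W)
    # Projection head: maps the final learnt features to the outputs
    return ACTIVATIONS[head_activation](h @ head_weight + head_bias)

rng = np.random.default_rng(0)
n_inputs, n_latent, n_outputs = 4, [8, 3], 2
sizes = [n_inputs] + n_latent
encoder_weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
head_weight = rng.normal(size=(n_latent[-1], n_outputs))
head_bias = np.zeros(n_outputs)

out = mlp_forward(rng.normal(size=(5, n_inputs)), encoder_weights, head_weight, head_bias)
```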

What’s the advantage of a feedforward neural network over other models on tabular datasets? Structure and customizability. 32 neurons in a hidden layer means that 32 features are being learnt. I can shrink these features towards priors, if I have any beliefs. I can regularize network outputs to encourage smoothness (temporal regularization) and consistency with known relationships (spatial regularization). I can customize loss functions to optimize economically informed losses rather than generic distance metrics. I can penalize correlation against existing strategies, if so desired. People often refer to neural network flexibility in the context of learning an arbitrarily complex function. While this is true, I would use the word “flexibility” to refer to the ability to customize architectures and loss functions to suit a particular problem.

Future work#

Add dropout layers for regularization.
Support for skip connections.

forward(x)[source]#

Forward pass through the network.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, n_inputs).
Returns:: Output tensor of shape (batch_size, n_outputs).
Return type:: torch.Tensor

class TimeSeriesSampler(dataset, batch_size, shuffle=True, aggregate_last=True, drop_last=False)[source]#

Bases: Sampler

Batch sampler for datasets indexed by time, to ensure that batches are comprised of samples from contiguous time periods.

Parameters:

dataset (torch.utils.data.Dataset) – The PyTorch dataset to sample from.
batch_size (int) – Number of samples per batch.
shuffle (bool, optional) – Whether to shuffle the order of batches. Default is True.
aggregate_last (bool, optional) – Whether to aggregate the last batch with the previous one if it has length smaller than batch_size. Default is True.
drop_last (bool, optional) – Whether to drop the last batch if it has length smaller than batch_size. Default is False.
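The batching behaviour described by these parameters might look roughly like the sketch below. It assumes samples are already sorted by time, and that aggregate_last takes precedence over drop_last when both are set; neither assumption is confirmed by this documentation. The function name and the seed argument are hypothetical.

```python
import random


def contiguous_batches(n_samples, batch_size, shuffle=True,
                       aggregate_last=True, drop_last=False, seed=None):
    """Return batches of consecutive indices; only the batch ORDER is shuffled."""
    batches = [
        list(range(start, min(start + batch_size, n_samples)))
        for start in range(0, n_samples, batch_size)
    ]
    # Handle a short final batch (assumed precedence: aggregate, then drop).
    if len(batches) > 1 and len(batches[-1]) < batch_size:
        short = batches.pop()
        if aggregate_last:
            batches[-1].extend(short)  # merge into the previous batch
        elif not drop_last:
            batches.append(short)      # keep the short batch as-is
    if shuffle:
        random.Random(seed).shuffle(batches)
    return batches
```

Shuffling whole batches rather than individual samples keeps every batch a contiguous block of time periods.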

class MultiOutputSharpe(skip_validation=True, unbiased=True)[source]#

Bases: Module

Negative Sharpe ratio loss for multi-output regression problems.

Notes

When a neural network is designed so that the output can be interpreted as signals or portfolio weights for each output, a stylized Sharpe ratio can be calculated by multiplying the true returns by the respective signals or weights, before downsampling to portfolio returns. The Sharpe ratio, excluding trading frictions such as transaction costs, can be calculated over the batch.

Neural networks are most naturally formulated as minimization problems, so the negative Sharpe ratio is used as a loss function.
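A minimal sketch of such a loss on plain Python lists is given below. It assumes portfolio returns are the sum over outputs of signal times return, that no annualization is applied, and that `unbiased` selects the sample (n - 1) standard deviation; these are illustrative assumptions rather than the documented behaviour of this class.

```python
import statistics


def negative_sharpe(y_pred, y_true, unbiased=True):
    """Negative Sharpe ratio over a batch of shape (n_periods, n_outputs)."""
    # One portfolio return per period: signals/weights times realised returns.
    portfolio_returns = [
        sum(w * r for w, r in zip(weights, returns))
        for weights, returns in zip(y_pred, y_true)
    ]
    mean = statistics.fmean(portfolio_returns)
    if unbiased:
        std = statistics.stdev(portfolio_returns)   # divides by n - 1
    else:
        std = statistics.pstdev(portfolio_returns)  # divides by n
    return -mean / std


# Hypothetical batch: 3 periods, 2 outputs.
y_pred = [[1.0, -1.0], [0.5, 0.5], [1.0, 0.0]]
y_true = [[0.02, -0.01], [0.01, 0.03], [-0.01, 0.02]]
loss = negative_sharpe(y_pred, y_true)
```

Minimizing `loss` is equivalent to maximizing the batch Sharpe ratio of the implied portfolio.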

forward(y_pred, y_true)[source]#

Evaluate batch negative Sharpe ratio loss.

Parameters:

y_pred (torch.Tensor) – Predicted outputs (signals or portfolio weights).
y_true (torch.Tensor) – True outputs (returns).

class MultiOutputMCR(skip_validation=True, unbiased=True)[source]#

Bases: Module

Negative mean-concentration risk ratio loss for multi-output regression problems.

Notes

By mean-concentration risk ratio, we refer to the ratio of the mean return within a time period to the standard deviation of returns within that time period. This differs from a Sharpe ratio in that the Sharpe ratio is a temporal quantity, whereas this statistic is cross-sectional. Maximisation of such a statistic encourages positive returns at each time period whilst penalising dispersion in the cross-sectional return distribution. The goal is to prevent the model from concentrating returns in a small subset of the outputs.

This statistic can be calculated for each sample in a batch, and then averaged over the batch. Neural networks are most naturally formulated as minimization problems, so the negative mean-concentration risk ratio is used as a loss function.
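The calculation described above can be sketched as follows. The sketch assumes the return of each output is its signal times its realised return, and that `unbiased` selects the sample (n - 1) standard deviation; both are assumptions made for illustration, and the function name is hypothetical.

```python
import statistics


def negative_mcr(y_pred, y_true, unbiased=True):
    """Negative mean-concentration risk ratio, averaged over the batch."""
    spread = statistics.stdev if unbiased else statistics.pstdev
    ratios = []
    for weights, returns in zip(y_pred, y_true):
        # Cross-section of per-output returns for one time period.
        contributions = [w * r for w, r in zip(weights, returns)]
        ratios.append(statistics.fmean(contributions) / spread(contributions))
    return -statistics.fmean(ratios)


# Same mean return per period, spread evenly versus concentrated in one output.
even = negative_mcr([[1.0, 1.0, 1.0]], [[0.01, 0.02, 0.03]])
concentrated = negative_mcr([[1.0, 1.0, 1.0]], [[0.0, 0.0, 0.06]])
```

With equal mean returns, the concentrated cross-section yields the higher (worse) loss, which is the behaviour the statistic is meant to penalise.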

forward(y_pred, y_true)[source]#

Evaluate batch negative mean-concentration risk ratio loss.

Parameters:

y_pred (torch.Tensor) – Predicted outputs (signals or portfolio weights).
y_true (torch.Tensor) – True outputs (returns).

class MLPRegressor(n_latent, loss_func=MSELoss(), weight_decay=0.0001, reg_turnover=0, batch_size=16, learning_rate=0.0003, use_ts_sampler=True, encoder_activation='tanh', head_activation='identity', fit_encoder_intercept=False, fit_head_intercept=True, epochs=10000, patience=1000, train_pct=0.7, verbose=False, random_state=42, inverse_transform_preds=False)[source]#

Bases: BaseEstimator, RegressorMixin

Scikit-learn compatible multi-layer perceptron (MLP) regressor implemented in PyTorch.

This estimator wraps macrosynergy.learning.forecasting.torch.MultiLayerPerceptron and trains it via macrosynergy.learning.forecasting.torch.MLPTrainer, including optional scaling of inputs and targets using sklearn.preprocessing.StandardScaler.

Parameters:

n_latent (int) – Number of hidden units in the latent layer of the MLP.
loss_func (torch.nn.Module, optional) – Loss function used during training. Default is nn.MSELoss().
weight_decay (float, optional) – L2 regularization strength applied via the optimizer. Default is 1e-4.
reg_turnover (float, optional) – Additional turnover regularization penalty applied by the trainer. Default is 0.
batch_size (int, optional) – Batch size used during training. Default is 16.
learning_rate (float, optional) – Learning rate used by the optimizer. Default is 3e-4.
use_ts_sampler (bool, optional) – Whether to use time-series batch sampling during training. Default is True.
encoder_activation (str, optional) – Activation function for the encoder (hidden) component of the network. Default is “tanh”.
head_activation (str, optional) – Activation function for the head (output) component of the network. Default is “identity”.
fit_encoder_intercept (bool, optional) – Whether to include an intercept (bias term) in the encoder layers. Default is False.
fit_head_intercept (bool, optional) – Whether to include an intercept (bias term) in the output layer. Default is True.
epochs (int, optional) – Maximum number of training epochs. Default is 10000.
patience (int, optional) – Number of epochs to wait for improvement before early stopping. Default is 1000.
train_pct (float, optional) – Fraction of samples used for training (remainder used for validation). Default is 0.7.
verbose (bool, optional) – Whether to print training diagnostics. Default is False.
random_state (int, optional) – Random seed used for PyTorch initialization and training. Default is 42.
inverse_transform_preds (bool, optional) – Whether to inverse-transform predictions back to the original target scale using the fitted target scaler. Default is False.
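To make the interaction of train_pct, epochs and patience more concrete, here is a small sketch of a train/validation split and a patience-based stopping rule. It assumes the split is chronological and that "improvement" means a strictly lower validation loss; the trainer's actual logic is not documented here, and both helper names are hypothetical.

```python
def split_train_validation(n_samples, train_pct=0.7):
    """Assumed chronological split: the first train_pct of samples train, the rest validate."""
    cutoff = int(n_samples * train_pct)
    return list(range(cutoff)), list(range(cutoff, n_samples))


def stop_epoch_with_patience(validation_losses, patience):
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop early.
            return epoch, best_epoch
    # Ran through every epoch without triggering early stopping.
    return len(validation_losses), best_epoch


train_idx, val_idx = split_train_validation(100, train_pct=0.7)
stopped_at, best_at = stop_epoch_with_patience([1.0, 0.8, 0.9, 0.85, 0.95, 0.7], patience=3)
```

Under these assumptions, a large patience relative to epochs effectively disables early stopping.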

fit(X, y)[source]#

predict(X)[source]#

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLPRegressor#

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object
