macrosynergy.learning.forecasting#
- class LADRegressor(fit_intercept=True, positive=False, alpha=0, shrinkage_type='l1', tol=None, maxiter=None)[source]#
Bases:
BaseEstimator, RegressorMixin
- fit(X, y, sample_weight=None)[source]#
Learn LAD regression model parameters.
- Parameters:
X (pd.DataFrame or np.ndarray) – Input feature matrix.
y (pd.Series or pd.DataFrame or np.ndarray) – Target vector associated with each sample in X.
sample_weight (np.ndarray, default=None) – Numpy array of sample weights to create a weighted LAD regression model.
- predict(X)[source]#
Predict dependent variable using the fitted LAD regression model.
- Parameters:
X (pd.DataFrame or np.ndarray) – Input feature matrix.
- Returns:
y_pred – Numpy array of predictions.
- Return type:
np.ndarray
Notes
If the learning algorithm fails to converge, the predict method returns an array of zeros. This is interpreted as no buy/sell signal being triggered by this model.
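To illustrate what a least absolute deviations (LAD) fit optimises, the sketch below uses only NumPy and does not call LADRegressor itself. It assumes the standard LAD objective (the sum of absolute residuals) and the simplest intercept-only model, for which the minimiser is the sample median. The helper name `lad_loss` is hypothetical.

```python
import numpy as np

def lad_loss(y, c):
    """Sum of absolute deviations of y from a constant prediction c."""
    return np.abs(y - c).sum()

# Toy targets with one large outlier.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# For an intercept-only model, the LAD solution is the median,
# whereas the least-squares solution is the mean.
lad_fit = np.median(y)
ols_fit = np.mean(y)

# The median is barely moved by the outlier and has the smaller absolute loss.
print(lad_fit, ols_fit)
print(lad_loss(y, lad_fit), lad_loss(y, ols_fit))
```

The outlier drags the least-squares fit far from the bulk of the data, while the LAD fit stays put, which is the usual motivation for LAD regression on heavy-tailed targets.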
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LADRegressor #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LADRegressor #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class KNNClassifier(n_neighbors='sqrt', weights='uniform')[source]#
Bases:
ClassifierMixin, BaseEstimator
- fit(X, y)[source]#
Fit method.
- Parameters:
X (pd.DataFrame or np.ndarray) – The input feature matrix.
y (pd.Series or np.ndarray) – The target variable.
- Returns:
The fitted model.
- Return type:
self
- predict(X)[source]#
Predict method.
- Parameters:
X (pd.DataFrame or np.ndarray) – The input feature matrix.
- Returns:
The predicted values.
- Return type:
np.ndarray
- predict_proba(X)[source]#
Predict probability method.
- Parameters:
X (pd.DataFrame or np.ndarray) – The input feature matrix.
- Returns:
The predicted probabilities.
- Return type:
np.ndarray
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KNNClassifier #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class SignWeightedLADRegressor(fit_intercept=True, positive=False, alpha=0, shrinkage_type='l1', tol=None, maxiter=None)[source]#
Bases:
SignWeightedRegressor
- set_params(**params)[source]#
Setter method to update the parameters of the SignWeightedLADRegressor.
- Parameters:
**params (dict) – Dictionary of parameters to update.
- Returns:
The SignWeightedLADRegressor instance with updated parameters.
- Return type:
self
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') SignWeightedLADRegressor #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class TimeWeightedLADRegressor(fit_intercept=True, positive=False, half_life=252, alpha=0, shrinkage_type='l1', tol=None, maxiter=None)[source]#
Bases:
TimeWeightedRegressor
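The half_life parameter suggests exponentially decaying sample weights. This page does not state the exact formula, so the sketch below is an assumption rather than the library's implementation: it uses the common convention in which a sample's weight halves for every half_life periods of age, and normalises the weights to sum to one. The function name `half_life_weights` is hypothetical.

```python
import numpy as np

def half_life_weights(n_samples, half_life):
    """Exponentially decaying weights, newest sample last (assumed convention)."""
    # Age 0 for the most recent sample, n_samples - 1 for the oldest.
    ages = np.arange(n_samples - 1, -1, -1)
    raw = 0.5 ** (ages / half_life)
    return raw / raw.sum()

w = half_life_weights(n_samples=505, half_life=252)

# Ratio of the newest sample's weight to that of a sample 252 periods older.
print(w[-1] / w[-253])
```

Weights of this kind could be passed as sample_weight to an estimator whose fit method accepts it, such as LADRegressor.fit documented above.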
- set_params(**params)[source]#
Setter method to update the parameters of the TimeWeightedLADRegressor.
- Parameters:
**params (dict) – Dictionary of parameters to update.
- Returns:
The TimeWeightedLADRegressor instance with updated parameters.
- Return type:
self
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TimeWeightedLADRegressor #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class SignWeightedLinearRegression(fit_intercept=True, positive=False, alpha=0, shrinkage_type='l1')[source]#
Bases:
SignWeightedRegressor
- set_params(**params)[source]#
Setter method to update the parameters of the SignWeightedLinearRegression.
- Parameters:
**params (dict) – Dictionary of parameters to update.
- Returns:
The SignWeightedLinearRegression instance with updated parameters.
- Return type:
self
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') SignWeightedLinearRegression #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class TimeWeightedLinearRegression(fit_intercept=True, positive=False, half_life=252, alpha=0, shrinkage_type='l1')[source]#
Bases:
TimeWeightedRegressor
- set_params(**params)[source]#
Setter method to update the parameters of the TimeWeightedLinearRegression.
- Parameters:
**params (dict) – Dictionary of parameters to update.
- Returns:
The TimeWeightedLinearRegression instance with updated parameters.
- Return type:
self
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TimeWeightedLinearRegression #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class NaiveRegressor[source]#
Bases:
BaseEstimator, RegressorMixin
Equally weighted unbiased factor model.
Notes
Given a collection of factors that are theoretically positively correlated with a dependent variable, a plausible signal is a simple average of those factors. This is effectively a linear regression model with zero intercept and equal weights for all factors.
This is a useful benchmark model which works well when the factors are as uncorrelated as possible with one another, because it offers a layer of diversification on the underlying return drivers. When the user has strong priors, this is often a competitive model that is difficult to beat.
However, it is vital for the features to have been preprocessed to have a positive theoretical correlation with the target variable.
- fit(X, y=None)[source]#
Fit method.
- Parameters:
X (pd.DataFrame, pd.Series or np.ndarray) – The input feature matrix.
y (pd.DataFrame, pd.Series or np.ndarray) – The target variable.
- Returns:
The fitted model.
- Return type:
self
Notes
This method involves fully trusting one’s priors and thus requires no learning element. As a consequence, no training set information is needed.
- predict(X)[source]#
Predict method.
Notes
The predictions are simply the average of the features across columns of the input feature matrix.
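The averaging described in the note can be sketched with NumPy alone, without calling NaiveRegressor. The toy feature matrix is illustrative and is assumed to be already signed so that higher values are expected to go with higher targets.

```python
import numpy as np

# Three samples, two factors.
X = np.array([[0.2, 0.4],
              [-0.1, 0.3],
              [1.0, -1.0]])

# Equal-weight signal: the row-wise mean of the features,
# i.e. a linear model with zero intercept and weights 1 / n_features.
signal = X.mean(axis=1)
print(signal)
```

The last sample shows why sign conventions matter: two factors of opposite sign cancel, so a factor that has not been oriented correctly dilutes the signal.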
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') NaiveRegressor #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class CorrelationVolatilitySystem(correlation_lookback='full', correlation_type='pearson', volatility_lookback='full', volatility_window_type='rolling', min_xs_samples=2, data_freq=None)[source]#
Bases:
BaseRegressionSystem
Cross-sectional system of moving average models to estimate correlation and volatility components of a macro beta separately over a panel of financial contracts.
- Parameters:
correlation_lookback (int or str, default="full") – The lookback of the rolling window for correlation estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer expressed in units of the native dataset frequency or, if data_freq is set to a value other than None or ‘unadjusted’, in units of the frequency specified in data_freq.
correlation_type (str, default='pearson') – The type of correlation to be calculated. Accepted values are ‘pearson’, ‘kendall’ and ‘spearman’.
volatility_lookback (int or str, default="full") – The lookback of the rolling window for volatility estimation. If “full”, the entire cross-sectional history is used. Otherwise, this parameter should be an integer expressed in units of the native dataset frequency or, if data_freq is set to a value other than None or ‘unadjusted’, in units of the frequency specified in data_freq.
volatility_window_type (str, default='rolling') – The type of window to use for the volatility calculation. Accepted values are ‘rolling’ and ‘exponential’.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the frequency being resampled to.
Notes
This class is specifically designed for market beta estimation based on the decomposition of the beta into correlation and volatility components in univariate analysis.
Separate estimators are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
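In the univariate case, the decomposition referred to above is the identity beta = corr(x, y) · sd(y) / sd(x). The sketch below checks this identity on simulated data using NumPy only. It is not the library's implementation, and the variable names and simulated series are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
market = rng.normal(size=500)                               # hedging-instrument returns
contract = 0.8 * market + rng.normal(scale=0.5, size=500)   # contract returns

# Correlation and volatility components, estimated separately.
corr = np.corrcoef(market, contract)[0, 1]
vol_ratio = contract.std(ddof=1) / market.std(ddof=1)
beta_decomposed = corr * vol_ratio

# Slope of a univariate OLS regression with intercept, for comparison.
beta_ols = np.polyfit(market, contract, deg=1)[0]
print(beta_decomposed, beta_ols)
```

When both components use the same sample, the product matches the OLS slope. Estimating them separately is what allows different lookbacks for each, as the correlation_lookback and volatility_lookback parameters permit.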
- predict(X)[source]#
Make naive zero predictions over a panel dataset.
- Parameters:
X (pd.DataFrame) – Input feature matrix.
- Returns:
predictions – Pandas series of zero predictions, multi-indexed by cross-section and date.
- Return type:
pd.Series
Notes
This method outputs zero predictions for all cross-sections and dates, since the CorrelationVolatilitySystem is solely used for beta estimation and no forecasting is performed.
- store_model_info(section, beta)[source]#
Store the betas induced by the correlation and volatility estimators.
- Parameters:
section (str) – The cross-section identifier.
beta (numbers.Number) – The beta estimate for the associated cross-section.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') CorrelationVolatilitySystem #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class LADRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)[source]#
Bases:
BaseRegressionSystem
Cross-sectional system of LAD regression models.
- Parameters:
fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the frequency being resampled to.
Notes
Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.
- create_model()[source]#
Instantiate a LAD regression model.
- Returns:
A LAD regression model with the specified hyperparameters.
- Return type:
LADRegressor
- store_model_info(section, model)[source]#
Store the coefficients and intercepts of a fitted LAD regression model.
- Parameters:
section (str) – The cross-section identifier.
model (LADRegressor) – The fitted LAD regression model.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LADRegressionSystem #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class LinearRegressionSystem(fit_intercept=True, positive=False, roll='full', min_xs_samples=2, data_freq=None)[source]#
Bases:
BaseRegressionSystem
Cross-sectional system of linear regression models for panel data.
- Parameters:
fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the frequency being resampled to.
Notes
Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest for this problem.
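As a rough picture of what a system of regressions means, the sketch below fits one OLS model per cross-section on a toy panel, optionally restricted to the most recent roll observations. It uses NumPy rather than LinearRegressionSystem. The function name `fit_per_section`, the dictionary layout of the panel and the toy data are all hypothetical.

```python
import numpy as np

def fit_per_section(panel, roll="full", min_xs_samples=2):
    """Fit y = a + b * x by OLS separately for each cross-section.

    panel maps a cross-section name to a pair (x, y) of 1-D arrays.
    """
    params = {}
    for section, (x, y) in panel.items():
        if roll != "full":
            # Keep only the most recent `roll` observations.
            x, y = x[-roll:], y[-roll:]
        if len(x) < min_xs_samples:
            continue  # too little history: no model for this section
        slope, intercept = np.polyfit(x, y, deg=1)
        params[section] = (intercept, slope)
    return params

x = np.arange(10, dtype=float)
panel = {
    "AUD": (x, 1.0 + 2.0 * x),
    "CAD": (x, -1.0 + 0.5 * x),
    "NEW": (x[:1], np.array([3.0])),  # a single observation only
}
params = fit_per_section(panel, roll=5)
print(params)
```

The hyperparameters (here roll and min_xs_samples) are shared by every cross-section, while the fitted intercepts and slopes differ, which mirrors the distinction drawn in the notes above.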
- create_model()[source]#
Instantiate a linear regression model.
- Returns:
A linear regression model with the specified hyperparameters.
- Return type:
LinearRegression
- store_model_info(section, model)[source]#
Store the coefficients and intercepts of a fitted linear regression model.
- Parameters:
section (str) – The cross-section identifier.
model (LinearRegression) – The fitted linear regression model.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LinearRegressionSystem #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class RidgeRegressionSystem(fit_intercept=True, positive=False, alpha=1.0, tol=0.0001, solver='lsqr', roll='full', min_xs_samples=2, data_freq=None)[source]#
Bases:
BaseRegressionSystem
Cross-sectional system of ridge regression models for panel data.
- Parameters:
fit_intercept (bool, default=True) – Whether to fit an intercept for each regression.
positive (bool, default=False) – Whether to enforce positive coefficients for each regression.
alpha (float, default=1.0) – L2 regularization hyperparameter. Greater values specify stronger regularization.
roll (int or str, default = "full") – The lookback of the rolling window for the regression. This should be expressed in either integer units of the native dataset frequency, or as the string roll = ‘full’ to use the entire available history.
tol (float, default=1e-4) – The tolerance for termination.
solver (str, default='lsqr') – Solver to use in the computational routines. Options are ‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’ and ‘lbfgs’.
min_xs_samples (int, default=2) – The minimum number of samples required in each cross-section training set for a regression model to be fitted on that cross-section. If data_freq is None or ‘unadjusted’, this parameter is specified in units of the underlying dataset frequency. Otherwise, this parameter should be expressed in units of the frequency specified in data_freq.
data_freq (str, optional) – Training set data frequency for resampling. This is primarily to be used within the context of market beta estimation in the BetaEstimator class in macrosynergy.learning, allowing for cross-validation of the underlying dataset frequency for good beta estimation. Accepted strings are ‘unadjusted’ to use the native dataset frequency, ‘W’ for weekly, ‘M’ for monthly and ‘Q’ for quarterly. It is recommended to set this parameter to ‘W’, ‘M’ or ‘Q’ only when the native dataset frequency is higher than the frequency being resampled to.
Notes
Separate regression models are fit for each cross-section, but evaluation is performed over the panel. Consequently, the results of a hyperparameter search will choose a single set of hyperparameters for all cross-sections, but the model parameters themselves may differ across cross-sections.
This estimator is primarily intended for use within the context of market beta estimation, but can be plausibly used for return forecasting or other downstream tasks. The data_freq parameter is particularly intended for cross-validating market beta estimation models, since choosing the underlying data frequency is of interest in quant analysis.
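The effect of the alpha hyperparameter can be seen in a small NumPy sketch. It assumes the standard ridge objective ||y − Xb||² + alpha·||b||² with no intercept, for which the solution has the closed form (XᵀX + alpha·I)⁻¹Xᵀy. It does not call RidgeRegressionSystem, and the helper name `ridge_coefs` is hypothetical.

```python
import numpy as np

def ridge_coefs(X, y, alpha):
    """Closed-form ridge solution without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

weak = ridge_coefs(X, y, alpha=1.0)
strong = ridge_coefs(X, y, alpha=1000.0)

# Stronger regularization pulls the coefficient vector toward zero.
print(np.linalg.norm(weak), np.linalg.norm(strong))
```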
- create_model()[source]#
Instantiate a ridge regression model.
- Returns:
A ridge regression model with the specified hyperparameters.
- Return type:
Ridge
- store_model_info(section, model)[source]#
Store the coefficients and intercepts of a fitted ridge regression model.
- Parameters:
section (str) – The cross-section identifier.
model (Ridge) – The fitted ridge regression model.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RidgeRegressionSystem #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class ModifiedLinearRegression(method, fit_intercept=True, positive=False, error_offset=0.01, bootstrap_method='panel', bootstrap_iters=1000, resample_ratio=1, analytic_method=None)[source]#
Bases:
BaseModifiedRegressor
- adjust_analytical_se(model, X, y, analytic_method=None)[source]#
Adjust the coefficients of the OLS linear regression model by an analytical standard error formula.
- Parameters:
model (LinearRegression) – The underlying OLS linear regression model to be modified.
X (pd.DataFrame) – Input feature matrix.
y (pd.DataFrame or pd.Series) – Target vector associated with each sample in X.
analytic_method (str, default = None) – The analytic method used to calculate standard errors.
- Returns:
intercept (float) – Adjusted intercept.
coef (np.ndarray) – Adjusted coefficients.
Notes
By default, the calculated standard errors use the usual standard error expression for OLS linear regression models under the assumption of multivariate normality, homoskedasticity and zero mean of the model errors. If analytic_method = “White”, the HC3 White estimator is used.
References
[1] https://online.stat.psu.edu/stat462/node/131/
[2] https://en.wikipedia.org/wiki/Heteroskedasticity-consistent_standard_errors
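The two standard error estimators named in the notes can be sketched in NumPy as follows. This is an illustration of the textbook formulas, not the library's code: the function name `ols_standard_errors` is hypothetical, the `method="White"` switch merely mirrors the analytic_method value mentioned above, and the sketch stops at the standard errors because this page does not state how they are then used to adjust the coefficients.

```python
import numpy as np

def ols_standard_errors(X, y, method=None):
    """Standard errors of OLS coefficients; X must include any intercept column."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    if method == "White":
        # HC3: squared residuals scaled by (1 - leverage)^2.
        leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
        omega = resid**2 / (1.0 - leverage) ** 2
        cov = xtx_inv @ (X.T * omega) @ X @ xtx_inv
    else:
        # Classical estimator under homoskedastic errors.
        sigma2 = resid @ resid / (n - p)
        cov = sigma2 * xtx_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
x = rng.normal(size=300)
X = np.column_stack([np.ones_like(x), x])   # intercept column + one feature
y = 0.5 + 1.5 * x + rng.normal(size=300)

beta, se_classic = ols_standard_errors(X, y)
_, se_hc3 = ols_standard_errors(X, y, method="White")
print(beta, se_classic, se_hc3)
```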
- set_params(**params)[source]#
Setter method to update the parameters of the ModifiedLinearRegression.
- Parameters:
**params (dict) – Dictionary of parameters to update.
- Returns:
The ModifiedLinearRegression instance with updated parameters.
- Return type:
self
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ModifiedLinearRegression #
Request metadata passed to the score method. The options, default and routing caveats are identical to those documented under LADRegressor.set_score_request above.
- class ModifiedSignWeightedLinearRegression(method, fit_intercept=True, positive=False, error_offset=0.01, bootstrap_method='panel', bootstrap_iters=1000, resample_ratio=1, analytic_method=None)[source]#
Bases:
BaseModifiedRegressor
- adjust_analytical_se(model, X, y, analytic_method=None)[source]#
Adjust the coefficients of the SWLS linear regression model by an analytical standard error formula.
- Parameters:
model (SignWeightedLinearRegression) – The underlying SWLS linear regression model to be modified.
X (pd.DataFrame) – Input feature matrix.
y (pd.DataFrame or pd.Series) – Target vector associated with each sample in X.
analytic_method (str, default = None) – The analytic method used to calculate standard errors.
- Returns:
intercept (float) – Adjusted intercept.
coef (np.ndarray) – Adjusted coefficients.
Notes
The analytical parameter estimates for WLS are:
\[\hat{\beta}^{\text{WLS}} = (X^{\intercal}WX)^{-1}X^{\intercal}Wy\]
- where:
X is the input feature matrix, possibly with a column of ones representing the choice of an intercept.
W is the positive-definite, symmetric weight matrix, a diagonal matrix with sample weights along the main diagonal.
y is the dependent variable vector.
Since W is a positive-definite, symmetric matrix, it has a square root equal to the diagonal matrix with square roots of the sample weights along the diagonal. Hence, the WLS estimator can be rewritten as:
\[\hat{\beta}^{\text{WLS}} = \left((W^{1/2}X)^{\intercal}(W^{1/2}X)\right)^{-1}(W^{1/2}X)^{\intercal}(W^{1/2}y)\]
This is precisely the OLS estimator for a rescaled matrix
\[\tilde{X} = W^{1/2}X\]
and a rescaled dependent variable
\[\tilde{y} = W^{1/2}y\]
Hence, the usual standard error estimate and White’s estimator can be applied based on a rescaling of the design matrix and associated target vector.
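The equivalence described in the Notes can be checked numerically. The sketch below uses synthetic data and arbitrary positive weights (both are assumptions made for illustration) and compares the direct WLS estimator with OLS on the rescaled design matrix and target.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 features
y = X @ np.array([0.2, 1.0, -0.5]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)        # illustrative positive sample weights

# Direct WLS estimator: (X'WX)^{-1} X'Wy, with W = diag(w)
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# OLS on the rescaled problem: X_tilde = W^{1/2} X, y_tilde = W^{1/2} y
sqrt_w = np.sqrt(w)
X_tilde = sqrt_w[:, None] * X
y_tilde = sqrt_w * y
beta_ols_rescaled, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)

# Both routes give the same coefficients up to floating-point error
assert np.allclose(beta_wls, beta_ols_rescaled)
```

Because the two estimators coincide, any standard error routine written for OLS can be reused by passing it the rescaled matrix and target.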
- set_params(**params)[source]#
Setter method to update the parameters of the ModifiedSignWeightedLinearRegression.
- Parameters:
**params (dict) – Dictionary of parameters to update.
- Returns:
The ModifiedSignWeightedLinearRegression instance with updated parameters.
- Return type:
self
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ModifiedSignWeightedLinearRegression #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class ModifiedTimeWeightedLinearRegression(method, fit_intercept=True, positive=False, half_life=252, error_offset=0.01, bootstrap_method='panel', bootstrap_iters=1000, resample_ratio=1, analytic_method=None)[source]#
Bases:
BaseModifiedRegressor
- adjust_analytical_se(model, X, y, analytic_method)[source]#
Adjust the coefficients of the TWLS linear regression model by an analytical standard error formula.
- Parameters:
model (TimeWeightedLinearRegression) – The underlying TWLS linear regression model to be modified.
X (pd.DataFrame) – Input feature matrix.
y (pd.DataFrame or pd.Series) – Target vector associated with each sample in X.
analytic_method (str, default = None) – The analytic method used to calculate standard errors.
- Returns:
intercept (float) – Adjusted intercept.
coef (np.ndarray) – Adjusted coefficients.
Notes
The analytical parameter estimates for WLS are:
\[\hat{\beta}^{\text{WLS}} = (X^{\intercal}WX)^{-1}X^{\intercal}Wy\]
- where:
X is the input feature matrix, possibly with a column of ones representing the choice of an intercept.
W is the positive-definite, symmetric weight matrix, a diagonal matrix with sample weights along the main diagonal.
y is the dependent variable vector.
Since W is a positive-definite, symmetric matrix, it has a square root equal to the diagonal matrix with square roots of the sample weights along the diagonal. Hence, the WLS estimator can be rewritten as:
\[\hat{\beta}^{\text{WLS}} = \left((W^{1/2}X)^{\intercal}(W^{1/2}X)\right)^{-1}(W^{1/2}X)^{\intercal}(W^{1/2}y)\]
This is precisely the OLS estimator for a rescaled matrix
\[\tilde{X} = W^{1/2}X\]
and a rescaled dependent variable
\[\tilde{y} = W^{1/2}y\]
Hence, the usual standard error estimate and White’s estimator can be applied based on a rescaling of the design matrix and associated target vector.
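For a time-weighted model, the diagonal of W holds weights that favour recent observations. The sketch below assumes an exponential decay in which a weight halves every half_life periods; this scheme and the helper name `exponential_time_weights` are assumptions for illustration, not a statement of how the package computes its weights.

```python
import numpy as np

def exponential_time_weights(n_periods, half_life):
    """Weights that halve every `half_life` periods going back in time.

    Index 0 is the oldest observation, index n_periods - 1 the most recent.
    """
    age = np.arange(n_periods - 1, -1, -1)      # 0 for the latest observation
    weights = 0.5 ** (age / half_life)
    return weights / weights.sum()              # normalise to sum to one

w = exponential_time_weights(n_periods=505, half_life=252)

# Rescaling factors for the design matrix and target: the diagonal of W^{1/2}
sqrt_w = np.sqrt(w)
```

Multiplying each row of X and each element of y by the matching entry of `sqrt_w` gives the rescaled quantities used in the Notes.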
- set_params(**params)[source]#
Setter method to update the parameters of the ModifiedTimeWeightedLinearRegression.
- Parameters:
**params (dict) – Dictionary of parameters to update.
- Returns:
The ModifiedTimeWeightedLinearRegression instance with updated parameters.
- Return type:
self
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ModifiedTimeWeightedLinearRegression #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class ProbabilityEstimator(classifier)[source]#
Bases:
BaseEstimator
,MetaEstimatorMixin
,ClassifierMixin
Meta estimator to create trading signals based on the probability of going long.
- Parameters:
classifier (ClassifierMixin) – A scikit-learn classifier.
Notes
This class exposes the feature importances of the base estimator as its own. It also defines a create_signal method that returns the probability of going long in excess of 0.5. The SignalOptimizer class in this package takes this method into account.
- fit(X, y)[source]#
Fit the underlying classifier.
- Parameters:
X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.
- predict(X)[source]#
Predict the class labels for the provided data.
- Parameters:
X (pd.DataFrame or np.ndarray) – Input feature matrix.
- Returns:
y_pred – Numpy array of predictions.
- Return type:
np.ndarray
- create_signal(X)[source]#
Create a trading signal based on the probability of going long.
- Parameters:
X (pd.DataFrame or np.ndarray) – Input feature matrix.
- Returns:
y_pred – Numpy array of signals.
- Return type:
np.ndarray
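The signal described in the class Notes, the probability of going long in excess of 0.5, can be sketched as below. The helper `signal_from_probabilities` is hypothetical, and the example assumes that the long class is encoded as label 1 and that probabilities come in the column order of a scikit-learn style classes_ attribute.

```python
import numpy as np

def signal_from_probabilities(proba, classes, long_label=1):
    """Return P(long) - 0.5 for each row of a predict_proba-style array."""
    long_col = list(classes).index(long_label)   # column holding P(long)
    return np.asarray(proba)[:, long_col] - 0.5

# Illustrative probabilities for classes [-1, 1] (short, long)
proba = np.array([[0.30, 0.70],
                  [0.55, 0.45],
                  [0.50, 0.50]])
signal = signal_from_probabilities(proba, classes=[-1, 1])
# Positive values lean long, negative values lean short, zero is neutral.
```

Centring on 0.5 gives the signal a sign and a magnitude, so it can be used for position sizing and not only as a class label.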
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ProbabilityEstimator #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class VotingClassifier(estimators, voting='hard', weights=None, n_jobs=None, flatten_transform=True, verbose=False)[source]#
Bases:
VotingClassifier
Classification model that votes on the predictions of many classifiers.
- Parameters:
estimators (list of (str, estimator) tuples) – List of (name, estimator) tuples that are used to fit the model.
voting ({'hard', 'soft'}, default='hard') – If ‘hard’, uses predicted class labels for majority rule voting. If ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.
weights (array-like of shape (n_estimators,), default=None) – Sequence of weights to assign to models. If None, models are weighted equally.
n_jobs (int, default=None) – The number of jobs to run in parallel for fit. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
flatten_transform (bool, default=True) – Affects shape of transform output only when voting=’soft’. If True, the transform method returns a matrix with shape (n_samples, n_classes*n_classifiers). If False, the shape is (n_classifiers, n_samples, n_classes).
verbose (bool, default=False) – If True, the time elapsed while fitting will be printed as the model trains.
Notes
This class calculates feature importances as the average of the feature importances of the base estimators.
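The difference between the two voting modes can be seen on a single sample. The probabilities below are made up for illustration; the sketch applies the two rules described for the voting parameter and shows a case where they disagree.

```python
import numpy as np

# Predicted class probabilities from three hypothetical classifiers
# for one sample, over classes [0, 1].
probas = np.array([
    [0.45, 0.55],   # classifier A -> votes for class 1
    [0.40, 0.60],   # classifier B -> votes for class 1
    [0.90, 0.10],   # classifier C -> votes for class 0
])

# Hard voting: majority rule over each classifier's predicted label.
labels = probas.argmax(axis=1)
hard_vote = np.bincount(labels, minlength=2).argmax()   # class 1 (two votes to one)

# Soft voting: argmax of the summed probabilities.
soft_vote = probas.sum(axis=0).argmax()                 # class 0 (1.75 vs 1.25)
```

Here one very confident classifier outweighs two marginal ones under soft voting, which is why soft voting is only recommended when the predicted probabilities are well calibrated.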
- fit(X, y, sample_weight=None, **fit_params)[source]#
Fit the estimators.
- Parameters:
X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VotingClassifier #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VotingClassifier #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class VotingRegressor(estimators, weights=None, verbose=False)[source]#
Bases:
VotingRegressor
Regression model that averages the predictions of many regression models.
- Parameters:
estimators (list of (str, estimator) tuples) – List of (name, estimator) tuples that are used to fit the model.
weights (array-like of shape (n_estimators,), default=None) – Sequence of weights to assign to models. If None, models are weighted equally.
verbose (bool, default=False) – If True, the time elapsed while fitting will be printed as the model trains.
Notes
This class calculates feature importances as the average of the feature importances of the base estimators.
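Averaging feature importances across base estimators amounts to a column-wise mean. The sketch below uses made-up importance vectors and an unweighted mean; whether the package applies the ensemble weights to this average is not stated here, so the unweighted form is an assumption.

```python
import numpy as np

# Hypothetical per-model importances for the same three features.
importances = np.array([
    [0.6, 0.3, 0.1],   # model 1
    [0.2, 0.5, 0.3],   # model 2
])

# One importance per feature: the mean over models (rows).
ensemble_importances = importances.mean(axis=0)
```

If each model's importances sum to one, the averaged vector also sums to one.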
- fit(X, y, **fit_params)[source]#
Fit the estimators.
- Parameters:
X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VotingRegressor #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class FIExtractor(estimator)[source]#
Bases:
BaseEstimator
,MetaEstimatorMixin
,RegressorMixin
- fit(X, y)[source]#
Fit the underlying estimator and store normalized feature importances.
- Parameters:
X (pd.DataFrame or np.ndarray) – Pandas dataframe or numpy array of input features.
y (pd.Series or pd.DataFrame or np.ndarray) – Pandas series, dataframe or numpy array of targets associated with each sample in X.
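One common way to normalize feature importances is to scale their absolute values so that they sum to one. The sketch below shows that convention; it is an assumption about what "normalized" means here, and the helper `normalized_importances` is hypothetical.

```python
import numpy as np

def normalized_importances(raw):
    """Scale absolute raw importances (e.g. coefficients) to sum to one."""
    magnitudes = np.abs(np.asarray(raw, dtype=float))
    total = magnitudes.sum()
    if total == 0:
        return magnitudes            # all-zero input: nothing to normalise
    return magnitudes / total

# Example with signed coefficients: signs are dropped, shares are kept.
fi = normalized_importances([2.0, -1.0, 1.0])
```

Taking absolute values first means a large negative coefficient counts as much as a large positive one.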
- predict(X)[source]#
Predict using the fitted underlying estimator.
- Parameters:
X (pd.DataFrame or np.ndarray) – Input feature matrix.
- Returns:
y_pred – Numpy array of predictions.
- Return type:
np.ndarray
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') FIExtractor #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Subpackages#
- macrosynergy.learning.forecasting.bootstrap
- macrosynergy.learning.forecasting.ensemble
- macrosynergy.learning.forecasting.linear_model
LADRegressor
SignWeightedLADRegressor
TimeWeightedLADRegressor
SignWeightedLinearRegression
TimeWeightedLinearRegression
ModifiedLinearRegression
ModifiedSignWeightedLinearRegression
ModifiedTimeWeightedLinearRegression
- Subpackages
- macrosynergy.learning.forecasting.meta_estimators
- macrosynergy.learning.forecasting.model_systems
- macrosynergy.learning.forecasting.neighbors