macrosynergy.learning.preprocessing
- class BasePanelSelector
Bases:
BaseEstimator, SelectorMixin, ABC
Base class for statistical feature selection over a panel.
- fit(X, y=None)
Learn optimal features based on a training set pair (X, y).
- Parameters:
X (pandas.DataFrame) – The feature matrix.
y (pandas.Series or pandas.DataFrame, optional) – The target vector.
- abstract determine_features(X, y)
Determine mask of selected features based on a training set pair (X, y).
- Parameters:
X (pandas.DataFrame) – The feature matrix.
y (pandas.Series or pandas.DataFrame) – The target vector.
- Returns:
mask – Boolean mask of selected features.
- Return type:
np.ndarray
- transform(X)
Transform method to return only the selected features of the dataframe.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
- Returns:
X_transformed – The feature matrix with only the selected features.
- Return type:
pandas.DataFrame
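The split between fit, determine_features and transform can be illustrated with a minimal sketch. The class below is hypothetical and is not part of macrosynergy: it does not inherit from BaseEstimator or SelectorMixin, and its variance rule is only a stand-in for a real statistical test. The index level names `cid` and `real_date` are likewise assumptions about the panel layout.

```python
import pandas as pd


class VarianceThresholdSketch:
    """Hypothetical selector: keeps columns whose variance exceeds a threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def determine_features(self, X, y=None):
        # One boolean entry per column of X.
        return (X.var(ddof=0) > self.threshold).to_numpy()

    def fit(self, X, y=None):
        # The mask is learned once, on the training set only.
        self.mask_ = self.determine_features(X, y)
        return self

    def transform(self, X):
        # Column selection leaves the row MultiIndex untouched.
        return X.loc[:, self.mask_]


# Assumed panel layout: rows indexed by (cross-section, date).
index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-29"])],
    names=["cid", "real_date"],
)
X = pd.DataFrame(
    {
        "GROWTH": [1.0, -1.0, 2.0, -2.0, 1.5, -1.5],
        "FLAT": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
        "INFL": [0.5, -0.5, 1.0, -1.0, 2.0, -2.0],
    },
    index=index,
)

X_selected = VarianceThresholdSketch(threshold=0.5).fit(X).transform(X)
```

In this sketch the near-constant `FLAT` column is dropped, and the result is still a dataframe on the original index.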
- class LarsSelector(n_factors=10, fit_intercept=False)
Bases:
BasePanelSelector
- determine_features(X, y)
Create feature mask based on the LARS algorithm.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
y (pandas.Series or pandas.DataFrame) – The target vector.
- Returns:
mask – Boolean mask of selected features.
- Return type:
np.ndarray
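One plausible way to obtain such a mask is sketched below using scikit-learn's `Lars` estimator directly. This is an illustration, not the library's implementation: mapping `n_factors` onto `n_nonzero_coefs` is an assumption, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lars

# Synthetic data: only F0 and F3 drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"F{i}" for i in range(5)])
y = 3.0 * X["F0"] - 2.0 * X["F3"] + 0.1 * rng.normal(size=200)

# Assumption: n_factors caps the number of non-zero LARS coefficients.
n_factors = 2
lars = Lars(n_nonzero_coefs=n_factors, fit_intercept=False).fit(X, y)

# Features with a non-zero coefficient are treated as selected.
mask = lars.coef_ != 0
selected = list(X.columns[mask])
```

The mask has one boolean entry per column, so it can be used directly for column selection with `X.loc[:, mask]`.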
- class LassoSelector(n_factors=10, positive=False)
Bases:
BasePanelSelector
- determine_features(X, y)
Create feature mask based on the LASSO-LARS algorithm.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
y (pandas.Series or pandas.DataFrame) – The target vector.
- Returns:
mask – Boolean mask of selected features.
- Return type:
np.ndarray
- class MapSelector(n_factors=None, significance_level=0.05, positive=False)
Bases:
BasePanelSelector
- determine_features(X, y)
Create feature mask based on the Macrosynergy panel test.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
y (pandas.Series or pandas.DataFrame) – The target vector.
- Returns:
mask – Boolean mask of selected features.
- Return type:
np.ndarray
- class BasePanelScaler(type='panel')
Bases:
BaseEstimator, TransformerMixin, OneToOneFeatureMixin, ABC
Base class for scaling a panel of features in a learning pipeline.
- Parameters:
type (str, default="panel") – The panel dimension over which the scaling is applied. Options are “panel” and “cross_section”.
Notes
Learning algorithms can benefit from scaling each feature to a similar range. This ensures they consider each feature equally in the model training process. It can also encourage faster convergence of an optimization algorithm.
- fit(X, y=None)
Fit method to learn training set quantities for feature scaling.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
y (pandas.Series or pandas.DataFrame, default=None) – The target vector.
- Returns:
The fitted scaler.
- Return type:
self
- transform(X)
Transform method to scale the input data based on extracted training statistics.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
- Returns:
X_transformed – The feature matrix with scaled features.
- Return type:
pandas.DataFrame
- class PanelMinMaxScaler(type='panel')
Bases:
BasePanelScaler
Scale and translate panel features to lie within the range [0,1].
Notes
This class is designed to replicate scikit-learn’s MinMaxScaler() class, with the additional option to scale within cross-sections. Unlike the MinMaxScaler() class, dataframes are always returned, preserving the multi-indexing of the inputs.
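The difference between the two scaling dimensions can be sketched in plain pandas. The snippet does not call the library: it assumes that "panel" means statistics pooled over all rows and "cross_section" means statistics computed separately per cross-section, and it assumes a first index level named `cid`.

```python
import pandas as pd

# Assumed panel layout: rows indexed by (cross-section, date).
index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-29"])],
    names=["cid", "real_date"],
)
X = pd.DataFrame({"GROWTH": [0.0, 1.0, 2.0, 10.0, 20.0, 30.0]}, index=index)


def min_max(values):
    # Map the smallest value to 0 and the largest to 1.
    return (values - values.min()) / (values.max() - values.min())


# Panel-wide scaling: one minimum and maximum for the whole column.
panel_scaled = min_max(X)

# Cross-sectional scaling: a separate minimum and maximum per cid.
cross_section_scaled = X.groupby(level="cid").transform(min_max)
```

With pooled statistics the AUD values stay close to zero because CAD dominates the range, whereas per-cross-section scaling spreads each cross-section over the full [0, 1] interval. Both results keep the original multi-index.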
- extract_statistics(X, feature)
Determine the minimum and maximum values of a feature in the input matrix.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to extract statistics for.
- Returns:
statistics – List containing the minimum and maximum values of the feature.
- Return type:
list
- scale(X, feature, statistics)
Scale the ‘feature’ column in the design matrix ‘X’ based on the minimum and maximum values of the feature.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to scale.
statistics (list) – List containing the minimum and maximum values of the feature, in that order.
- Returns:
X_transformed – The scaled feature.
- Return type:
- class PanelStandardScaler(type='panel', with_mean=True, with_std=True)
Bases:
BasePanelScaler
Scale and translate panel features to have zero mean and unit variance.
- Parameters:
Notes
This class is designed to replicate scikit-learn’s StandardScaler() class, with the additional option to scale within cross-sections. Unlike the StandardScaler() class, dataframes are always returned, preserving the multi-indexing of the inputs.
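A minimal sketch of standard scaling with training-set statistics follows, written in plain pandas rather than with the class itself. The helper names `fit_statistics` and `apply_scaling` are hypothetical, the population standard deviation (`ddof=0`) is chosen to mirror scikit-learn's StandardScaler and may differ from the library's choice, and the reading of `with_mean` and `with_std` as "subtract the mean" and "divide by the standard deviation" is an assumption.

```python
import pandas as pd


def fit_statistics(X):
    # Per feature: [mean, standard deviation], in that order.
    return {col: [X[col].mean(), X[col].std(ddof=0)] for col in X.columns}


def apply_scaling(X, statistics, with_mean=True, with_std=True):
    scaled = X.copy()
    for col, (mean, std) in statistics.items():
        if with_mean:
            scaled[col] = scaled[col] - mean
        if with_std:
            scaled[col] = scaled[col] / std
    return scaled


train_index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-29", "2024-04-30"])],
    names=["cid", "real_date"],
)
train = pd.DataFrame({"GROWTH": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]}, index=train_index)

test_index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.to_datetime(["2024-05-31"])],
    names=["cid", "real_date"],
)
test = pd.DataFrame({"GROWTH": [9.0, 1.0]}, index=test_index)

# Statistics come from the training set and are reused on unseen data.
statistics = fit_statistics(train)
train_scaled = apply_scaling(train, statistics)
test_scaled = apply_scaling(test, statistics)
```

Reusing the training statistics on the held-out rows avoids leaking information from the test period into the scaling.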
- extract_statistics(X, feature)
Determine the mean and standard deviation of values of a feature in the input matrix.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to extract statistics for.
- Returns:
statistics – List containing the mean and standard deviation of values of the feature.
- Return type:
list
- scale(X, feature, statistics)
Scale the ‘feature’ column in the design matrix ‘X’ based on the mean and standard deviation values of the feature.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to scale.
statistics (list) – List containing the mean and standard deviation of values of the feature, in that order.
- Returns:
X_transformed – The scaled feature.
- Return type:
- class PanelPCA(n_components=None, kaiser_criterion=False, adjust_signs=False)
Bases:
BaseEstimator, TransformerMixin
- fit(X, y=None)
Fit method to determine an eigenbasis for the PCA.
- Parameters:
X (pandas.DataFrame) – Input feature matrix.
y (pandas.DataFrame, pandas.Series or np.ndarray, default=None) – Target variable.
Notes
The target variable y is only ever used to adjust the signs of the eigenvectors to ensure consistency of eigenvector signs when retrained over time. This does not affect the PCA itself.
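The note on eigenvector signs can be made concrete with a small NumPy sketch. It is not the library's code: the rule used here, flipping a component when its scores correlate negatively with y, is an assumed convention chosen for illustration, and the data are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.date_range("2020-01-01", periods=60, freq="B")],
    names=["cid", "real_date"],
)
X = pd.DataFrame(rng.normal(size=(120, 3)), index=index, columns=["A", "B", "C"])
y = X["A"] - X["B"] + 0.1 * rng.normal(size=120)

# Eigenbasis of the sample covariance matrix, largest eigenvalue first.
centered = X - X.mean()
cov = np.cov(centered.to_numpy(), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data, then apply the assumed sign convention.
scores = centered.to_numpy() @ eigvecs
for j in range(scores.shape[1]):
    if np.corrcoef(scores[:, j], y.to_numpy())[0, 1] < 0:
        eigvecs[:, j] *= -1
        scores[:, j] *= -1

components = pd.DataFrame(
    scores, index=X.index, columns=[f"PC{j + 1}" for j in range(scores.shape[1])]
)
```

Flipping the sign of an eigenvector leaves the eigenvalues and the spanned subspace unchanged, which is why the target only influences orientation and not the decomposition.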
- class ZnScoreAverager(neutral='zero', use_signs=False)
Bases:
BaseEstimator, TransformerMixin