macrosynergy.learning.preprocessing.scalers
- class BasePanelScaler(type='panel')
Bases: BaseEstimator, TransformerMixin, OneToOneFeatureMixin, ABC
Base class for scaling a panel of features in a learning pipeline.
- Parameters:
type (str, default="panel") – The panel dimension over which the scaling is applied. Options are “panel” and “cross_section”.
Notes
Learning algorithms can benefit from scaling each feature to a similar range. This prevents features with large magnitudes from dominating the fit of scale-sensitive models, and it can speed up the convergence of optimization algorithms.
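The difference between the two `type` options can be illustrated with plain pandas. This is a minimal sketch rather than the library's implementation: the index level names `cid` and `real_date`, the feature name `GROWTH`, and the use of min-max scaling are assumptions made for the example.

```python
import pandas as pd

# Toy panel: two cross-sections observed on three dates (names are illustrative).
index = pd.MultiIndex.from_product(
    [["AUD", "USD"], pd.date_range("2024-01-01", periods=3, freq="D")],
    names=["cid", "real_date"],
)
X = pd.DataFrame({"GROWTH": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]}, index=index)

# "panel": one set of statistics pooled over all cross-sections.
panel_min = X["GROWTH"].min()
panel_max = X["GROWTH"].max()
panel_scaled = (X["GROWTH"] - panel_min) / (panel_max - panel_min)

# "cross_section": separate statistics for each cross-section.
grouped = X["GROWTH"].groupby(level="cid")
cs_min = grouped.transform("min")
cs_max = grouped.transform("max")
cs_scaled = (X["GROWTH"] - cs_min) / (cs_max - cs_min)

print(panel_scaled)
print(cs_scaled)
```

With pooled statistics the AUD values stay close to zero because the USD values set the range. With per-cross-section statistics each cross-section spans the full range on its own.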
- fit(X, y=None)
Learn the training-set statistics used for feature scaling.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
y (pandas.Series or pandas.DataFrame, default=None) – The target vector.
- Returns:
The fitted scaler.
- Return type:
self
- transform(X)
Scale the input data using the statistics extracted from the training set.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
- Returns:
X_transformed – The feature matrix with scaled features.
- Return type:
pandas.DataFrame
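One way the fit and transform steps could fit together with the per-feature `extract_statistics` and `scale` methods of the subclasses is sketched below. The wiring is an assumption inferred from the method signatures on this page, not the library's source. The classes `SketchPanelScaler` and `SketchCentring` and the attribute `statistics_` are hypothetical, and only the pooled ("panel") case is shown.

```python
import pandas as pd


class SketchPanelScaler:
    """Hypothetical stand-in showing the fit/transform split; not library code."""

    def fit(self, X, y=None):
        # Learn statistics for every feature from the training data only.
        self.statistics_ = {col: self.extract_statistics(X, col) for col in X.columns}
        return self  # returning self allows scaler.fit(X).transform(X)

    def transform(self, X):
        # Reuse the stored training statistics; the input index is kept.
        columns = {col: self.scale(X, col, self.statistics_[col]) for col in X.columns}
        return pd.DataFrame(columns, index=X.index)

    def extract_statistics(self, X, feature):
        raise NotImplementedError

    def scale(self, X, feature, statistics):
        raise NotImplementedError


class SketchCentring(SketchPanelScaler):
    """Toy subclass: subtract the training mean of each feature."""

    def extract_statistics(self, X, feature):
        return [X[feature].mean()]

    def scale(self, X, feature, statistics):
        return X[feature] - statistics[0]


index = pd.MultiIndex.from_product(
    [["AUD", "USD"], pd.to_datetime(["2024-01-01", "2024-01-02"])],
    names=["cid", "real_date"],
)
X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 6.0], "b": [0.0, 0.0, 4.0, 4.0]}, index=index)

scaler = SketchCentring().fit(X)
X_scaled = scaler.transform(X)
print(X_scaled)
```

Keeping the statistics on the fitted object means that later calls to transform reuse the training quantities instead of recomputing them on new data.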
- class PanelMinMaxScaler(type='panel')
Bases: BasePanelScaler
Scale and translate panel features to lie within the range [0, 1].
Notes
This class is designed to replicate scikit-learn’s MinMaxScaler() class, with the additional option to scale within cross-sections. Unlike MinMaxScaler(), it always returns dataframes, preserving the multi-indexing of the inputs.
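The min-max rule behind this class can be sketched as two standalone functions that mirror the extract_statistics and scale signatures. This is an illustrative sketch under the assumption that the statistics are the pooled minimum and maximum. The feature name `CPI` and the data are made up, and the functions are not the library's methods.

```python
import pandas as pd


def extract_statistics(X, feature):
    # Minimum and maximum of the feature, in that order.
    return [X[feature].min(), X[feature].max()]


def scale(X, feature, statistics):
    # Map the training minimum to 0 and the training maximum to 1.
    low, high = statistics
    return (X[feature] - low) / (high - low)


train = pd.DataFrame({"CPI": [1.0, 2.0, 5.0]})
new = pd.DataFrame({"CPI": [0.0, 3.0, 9.0]})

stats = extract_statistics(train, "CPI")
train_scaled = scale(train, "CPI", stats)
new_scaled = scale(new, "CPI", stats)

print(train_scaled.tolist())  # [0.0, 0.25, 1.0]
print(new_scaled.tolist())    # [-0.25, 0.5, 2.0]
```

Because the statistics come from the training set, values in new data that fall outside the training range are mapped outside [0, 1].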
- extract_statistics(X, feature)
Determine the minimum and maximum values of a feature in the input matrix.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to extract statistics for.
- Returns:
statistics – List containing the minimum and maximum values of the feature.
- Return type:
list
- scale(X, feature, statistics)
Scale the ‘feature’ column in the design matrix ‘X’ based on the minimum and maximum values of the feature.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to scale.
statistics (list) – List containing the minimum and maximum values of the feature, in that order.
- Returns:
X_transformed – The scaled feature.
- Return type:
pandas.Series
- class PanelStandardScaler(type='panel', with_mean=True, with_std=True)
Bases: BasePanelScaler
Scale and translate panel features to have zero mean and unit variance.
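- Parameters:
type (str, default="panel") – The panel dimension over which the scaling is applied. Options are “panel” and “cross_section”.
with_mean (bool, default=True) – Whether to centre the data before scaling.
with_std (bool, default=True) – Whether to scale the data to unit variance.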
Notes
This class is designed to replicate scikit-learn’s StandardScaler() class, with the additional option to scale within cross-sections. Unlike StandardScaler(), it always returns dataframes, preserving the multi-indexing of the inputs.
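The standardisation rule, including the effect of the with_mean and with_std flags, can be sketched as follows. Several details are assumptions made for illustration: the flags are passed as function arguments here rather than read from the fitted object, the standard deviation uses `ddof=0` as scikit-learn's StandardScaler does (the library's choice may differ), and the feature name `XR` is made up.

```python
import pandas as pd


def extract_statistics(X, feature):
    # Mean and standard deviation, in that order.
    # ddof=0 (population standard deviation) is an assumption.
    return [X[feature].mean(), X[feature].std(ddof=0)]


def scale(X, feature, statistics, with_mean=True, with_std=True):
    mean, std = statistics
    out = X[feature]
    if with_mean:
        out = out - mean  # centre on the training mean
    if with_std:
        out = out / std   # rescale to unit standard deviation
    return out


X = pd.DataFrame({"XR": [2.0, 4.0, 6.0, 8.0]})
stats = extract_statistics(X, "XR")

z = scale(X, "XR", stats)
centred_only = scale(X, "XR", stats, with_std=False)

print(z.tolist())
print(centred_only.tolist())  # [-3.0, -1.0, 1.0, 3.0]
```

Switching off with_std leaves the feature in its original units, which can be useful when only the location of the feature needs adjusting.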
- extract_statistics(X, feature)
Determine the mean and standard deviation of values of a feature in the input matrix.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to extract statistics for.
- Returns:
statistics – List containing the mean and standard deviation of values of the feature.
- Return type:
list
- scale(X, feature, statistics)
Scale the ‘feature’ column in the design matrix ‘X’ based on the mean and standard deviation values of the feature.
- Parameters:
X (pandas.DataFrame) – The feature matrix.
feature (str) – The feature to scale.
statistics (list) – List containing the mean and standard deviation of values of the feature, in that order.
- Returns:
X_transformed – The scaled feature.
- Return type:
pandas.Series