macrosynergy.learning.preprocessing.scalers

class BasePanelScaler(type='panel')

Bases: BaseEstimator, TransformerMixin, OneToOneFeatureMixin, ABC

Base class for scaling a panel of features in a learning pipeline.

Parameters:

type (str, default="panel") – The panel dimension over which the scaling is applied. Options are “panel” (scale over the whole panel) and “cross_section” (scale within each cross-section).

Notes

Learning algorithms can benefit from scaling each feature to a similar range, so that features with large magnitudes do not dominate model training. Scaling can also speed up the convergence of an optimization algorithm.

fit(X, y=None)

Learn the training-set statistics used for feature scaling.

Parameters:
  • X (pandas.DataFrame) – The feature matrix.

  • y (pandas.Series or pandas.DataFrame, default=None) – The target vector.

Returns:

The fitted scaler.

Return type:

self

transform(X)

Scale the input data using the statistics extracted from the training set.

Parameters:

X (pandas.DataFrame) – The feature matrix.

Returns:

X_transformed – The feature matrix with scaled features.

Return type:

pandas.DataFrame

abstract extract_statistics(X, feature)

Determine the relevant statistics for feature scaling.

abstract scale(X, feature, statistics)

Scale the input data based on the relevant statistics.
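The division of labour between fit, transform and the two abstract methods can be illustrated with a minimal sketch. This is not the library's source: the class names SketchPanelScaler and SketchMinMaxScaler, the statistics_ attribute and the index level names are hypothetical, and the sketch leaves out the scikit-learn base classes and the type option.

```python
from abc import ABC, abstractmethod

import pandas as pd


class SketchPanelScaler(ABC):
    """Hypothetical stand-in for a panel scaler base class (not the library's code)."""

    def fit(self, X, y=None):
        # Learn one set of statistics per feature from the training panel.
        self.statistics_ = {
            feature: self.extract_statistics(X, feature) for feature in X.columns
        }
        return self

    def transform(self, X):
        # Reuse the training statistics; the input index is carried over unchanged.
        scaled = {
            feature: self.scale(X, feature, self.statistics_[feature])
            for feature in X.columns
        }
        return pd.DataFrame(scaled, index=X.index)

    @abstractmethod
    def extract_statistics(self, X, feature):
        """Return the statistics needed to scale one feature."""

    @abstractmethod
    def scale(self, X, feature, statistics):
        """Return the scaled feature as a pandas.Series."""


class SketchMinMaxScaler(SketchPanelScaler):
    """Concrete example: statistics are [minimum, maximum]."""

    def extract_statistics(self, X, feature):
        return [X[feature].min(), X[feature].max()]

    def scale(self, X, feature, statistics):
        minimum, maximum = statistics
        return (X[feature] - minimum) / (maximum - minimum)


# Small panel indexed by (cross-section, date); the level names are assumptions.
index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.to_datetime(["2024-01-31", "2024-02-29"])],
    names=["cid", "real_date"],
)
X_train = pd.DataFrame({"growth": [1.0, 3.0, 2.0, 5.0]}, index=index)

scaler = SketchMinMaxScaler().fit(X_train)
X_scaled = scaler.transform(X_train)
print(X_scaled)
```

A subclass only has to say which statistics to extract and how to apply them; fitting and transforming are inherited, so statistics learned on a training panel can be reused on unseen data.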

class PanelMinMaxScaler(type='panel')

Bases: BasePanelScaler

Scale and translate panel features to lie within the range [0,1].

Notes

This class is designed to replicate scikit-learn’s MinMaxScaler() class, with the additional option to scale within cross-sections. Unlike the MinMaxScaler() class, dataframes are always returned, preserving the multi-indexing of the inputs.

extract_statistics(X, feature)

Determine the minimum and maximum values of a feature in the input matrix.

Parameters:
  • X (pandas.DataFrame) – The feature matrix.

  • feature (str) – The feature to extract statistics for.

Returns:

statistics – List containing the minimum and maximum values of the feature.

Return type:

list

scale(X, feature, statistics)

Scale the ‘feature’ column in the design matrix ‘X’ based on the minimum and maximum values of the feature.

Parameters:
  • X (pandas.DataFrame) – The feature matrix.

  • feature (str) – The feature to scale.

  • statistics (list) – List containing the minimum and maximum values of the feature, in that order.

Returns:

X_transformed – The scaled feature.

Return type:

pandas.Series
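As an illustration of the min-max transformation and of the difference between the “panel” and “cross_section” options, the following sketch uses plain pandas. The helper min_max_scale is hypothetical, and treating the first index level (here named cid) as the cross-section is an assumption; the library's internals may differ.

```python
import pandas as pd


def min_max_scale(X, feature, statistics):
    # statistics holds [minimum, maximum], in that order.
    minimum, maximum = statistics
    return (X[feature] - minimum) / (maximum - minimum)


index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-29"])],
    names=["cid", "real_date"],
)
X = pd.DataFrame({"inflation": [2.0, 4.0, 6.0, 10.0, 20.0, 30.0]}, index=index)

# "panel": a single minimum and maximum for the whole panel.
panel_stats = [X["inflation"].min(), X["inflation"].max()]
panel_scaled = min_max_scale(X, "inflation", panel_stats)

# "cross_section": a separate minimum and maximum for each cross-section.
cross_section_scaled = pd.concat(
    [
        min_max_scale(
            group, "inflation", [group["inflation"].min(), group["inflation"].max()]
        )
        for _, group in X.groupby(level="cid", sort=False)
    ]
)

print(panel_scaled)
print(cross_section_scaled)
```

With panel-wide statistics only the global minimum and maximum map to 0 and 1, whereas with per-cross-section statistics every cross-section spans the full [0,1] range.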

class PanelStandardScaler(type='panel', with_mean=True, with_std=True)

Bases: BasePanelScaler

Scale and translate panel features to have zero mean and unit variance.

Parameters:
  • type (str, default="panel") – The panel dimension over which the scaling is applied. Options are “panel” (scale over the whole panel) and “cross_section” (scale within each cross-section).

  • with_mean (bool, default=True) – Whether to centre the data before scaling.

  • with_std (bool, default=True) – Whether to scale the data to unit variance.

Notes

This class is designed to replicate scikit-learn’s StandardScaler() class, with the additional option to scale within cross-sections. Unlike the StandardScaler() class, dataframes are always returned, preserving the multi-indexing of the inputs.

extract_statistics(X, feature)

Determine the mean and standard deviation of a feature in the input matrix.

Parameters:
  • X (pandas.DataFrame) – The feature matrix.

  • feature (str) – The feature to extract statistics for.

Returns:

statistics – List containing the mean and standard deviation of the feature.

Return type:

list

scale(X, feature, statistics)

Scale the ‘feature’ column in the design matrix ‘X’ using the mean and standard deviation of the feature.

Parameters:
  • X (pandas.DataFrame) – The feature matrix.

  • feature (str) – The feature to scale.

  • statistics (list) – List containing the mean and standard deviation of the feature, in that order.

Returns:

X_transformed – The scaled feature.

Return type:

pandas.Series
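The effect of with_mean and with_std can be sketched in plain pandas. The helper standard_scale is hypothetical, and the population standard deviation (ddof=0) is an assumption chosen to mirror scikit-learn's StandardScaler; the library may use a different estimator.

```python
import pandas as pd


def standard_scale(X, feature, statistics, with_mean=True, with_std=True):
    # statistics holds [mean, standard deviation], in that order.
    mean, std = statistics
    scaled = X[feature]
    if with_mean:
        scaled = scaled - mean  # centre the feature
    if with_std:
        scaled = scaled / std  # rescale to unit variance
    return scaled


index = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.date_range("2024-01-31", periods=4, freq="D")],
    names=["cid", "real_date"],
)
X = pd.DataFrame(
    {"growth": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]}, index=index
)

# Assumption: population standard deviation, as in scikit-learn's StandardScaler.
statistics = [X["growth"].mean(), X["growth"].std(ddof=0)]

z_scores = standard_scale(X, "growth", statistics)
centred_only = standard_scale(X, "growth", statistics, with_std=False)

print(z_scores)
print(centred_only)
```

Setting with_std=False leaves the feature in its original units but centred on zero, while with_mean=False rescales it without shifting it.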

Submodules