macrosynergy.learning.splitters.kfold_splitters

Panel K-Fold cross-validator classes.

class ExpandingKFoldPanelSplit(n_splits=5, min_n_splits=2)

Bases: KFoldPanelSplit

Time-respecting K-Fold cross-validator for panel data.

Parameters:

n_splits (int) – Number of folds, i.e. (training set, test set) pairs. Default is 5. Must be at least 2.

Notes

This splitter is a panel-data analogue of the TimeSeriesSplit splitter provided by scikit-learn.

The unique dates in the panel are divided into n_splits + 1 sequential, non-overlapping intervals, resulting in n_splits pairs of training and test sets. The i-th training set is the union of the first i intervals, and the i-th test set is the (i+1)-th interval, so every training set ends before its test set begins.
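The interval logic described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the names `split_into_intervals`, `expanding_panel_splits` and `panel_index` are hypothetical, and the rule that any leftover dates go to the earliest intervals is an assumption.

```python
from datetime import date, timedelta


def split_into_intervals(items, n_intervals):
    """Cut a sorted list into n_intervals contiguous, near-equal chunks.

    Assumption: leftover items are assigned to the earliest chunks.
    """
    base, extra = divmod(len(items), n_intervals)
    chunks, start = [], 0
    for k in range(n_intervals):
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def expanding_panel_splits(index, n_splits=5):
    """Yield (train_positions, test_positions) pairs.

    index: list of (cross_section, date) pairs, one per panel row.
    """
    # Splits are defined on the unique dates, not on individual rows.
    unique_dates = sorted({d for _, d in index})
    intervals = split_into_intervals(unique_dates, n_splits + 1)
    for i in range(1, n_splits + 1):
        train_dates = {d for chunk in intervals[:i] for d in chunk}
        test_dates = set(intervals[i])
        train = [pos for pos, (_, d) in enumerate(index) if d in train_dates]
        test = [pos for pos, (_, d) in enumerate(index) if d in test_dates]
        yield train, test


# Toy panel: two cross-sections observed on the same 12 days.
days = [date(2024, 1, 1) + timedelta(days=k) for k in range(12)]
panel_index = [(cid, d) for cid in ("AUD", "CAD") for d in days]

expanding = list(expanding_panel_splits(panel_index, n_splits=3))
for train, test in expanding:
    print(len(train), len(test))
```

Because the intervals are built from dates rather than rows, all cross-sections observed on a given date fall into the same fold.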

class RollingKFoldPanelSplit(n_splits=5, min_n_splits=2)

Bases: KFoldPanelSplit

Unshuffled K-Fold cross-validator for panel data.

Parameters:

n_splits (int) – Number of folds. Default is 5. Must be at least 2.

Notes

This splitter is a panel-data analogue of the KFold splitter provided by scikit-learn, with shuffle=False and with splits determined on the time dimension.

The unique dates in the panel are divided into n_splits sequential, non-overlapping intervals of equal size, resulting in n_splits pairs of training and test sets. The i-th test set is the i-th interval, and the i-th training set is the union of all other intervals. A training set can therefore contain dates later than its test set.
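A minimal sketch of this scheme in plain Python follows. It is not the library's implementation: `rolling_panel_splits` and `panel_index` are hypothetical names, and assigning leftover dates to the earliest intervals when the dates do not divide evenly is an assumption.

```python
from datetime import date, timedelta


def rolling_panel_splits(index, n_splits=5):
    """Yield (train_positions, test_positions) pairs.

    index: list of (cross_section, date) pairs, one per panel row.
    """
    unique_dates = sorted({d for _, d in index})
    base, extra = divmod(len(unique_dates), n_splits)
    start = 0
    for k in range(n_splits):
        # Assumption: leftover dates are assigned to the earliest intervals.
        size = base + (1 if k < extra else 0)
        test_dates = set(unique_dates[start:start + size])
        start += size
        test = [pos for pos, (_, d) in enumerate(index) if d in test_dates]
        # Everything outside the test interval is training data,
        # including dates that come after the test interval.
        train = [pos for pos, (_, d) in enumerate(index) if d not in test_dates]
        yield train, test


# Toy panel: two cross-sections observed on the same 10 days.
days = [date(2024, 1, 1) + timedelta(days=k) for k in range(10)]
panel_index = [(cid, d) for cid in ("AUD", "CAD") for d in days]

rolling = list(rolling_panel_splits(panel_index, n_splits=5))
for train, test in rolling:
    print(len(train), len(test))
```

Each row appears in exactly one test set, which is the property this scheme shares with an unshuffled KFold.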

class RecencyKFoldPanelSplit(n_splits=5, n_periods=252)

Bases: KFoldPanelSplit

Time-respecting K-Fold panel cross-validator that creates training and test sets based on the most recent samples in the panel.

Parameters:
  • n_splits (int) – Number of folds, i.e. (training set, test set) pairs. Default is 5. Must be at least 1.

  • n_periods (int) – Number of time periods in each test set, in units of the dataset's native frequency. Default is 252 (one year of daily data).

Notes

This splitter is similar to ExpandingKFoldPanelSplit, except that the sorted unique timestamps are not divided into equal intervals. Instead, the last n_periods * n_splits timestamps in the panel are divided into n_splits non-overlapping intervals of n_periods timestamps, each of which is used as a test set. The corresponding training set consists of all samples with timestamps earlier than its test set. Consequently, this is a K-Fold walk-forward cross-validator with test folds concentrated on the most recent information.
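The construction above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: `recency_panel_splits` and `panel_index` are hypothetical names, and raising an error when the panel is too short to leave any training data is an assumption of this sketch.

```python
from datetime import date, timedelta


def recency_panel_splits(index, n_splits=5, n_periods=252):
    """Yield (train_positions, test_positions) pairs.

    index: list of (cross_section, date) pairs, one per panel row.
    """
    unique_dates = sorted({d for _, d in index})
    # Position of the first date that belongs to any test set.
    tail_start = len(unique_dates) - n_splits * n_periods
    if tail_start <= 0:
        # Assumption of this sketch: refuse panels that leave no training data.
        raise ValueError("panel too short for n_splits * n_periods test dates")
    for k in range(n_splits):
        lo = tail_start + k * n_periods
        test_dates = set(unique_dates[lo:lo + n_periods])
        cutoff = unique_dates[lo]
        # Training data: every row strictly earlier than the test interval.
        train = [pos for pos, (_, d) in enumerate(index) if d < cutoff]
        test = [pos for pos, (_, d) in enumerate(index) if d in test_dates]
        yield train, test


# Toy panel: two cross-sections observed on the same 10 days.
days = [date(2024, 1, 1) + timedelta(days=k) for k in range(10)]
panel_index = [(cid, d) for cid in ("AUD", "CAD") for d in days]

recency = list(recency_panel_splits(panel_index, n_splits=3, n_periods=2))
for train, test in recency:
    print(len(train), len(test))
```

In this toy panel the test sets cover only the last six days, while the first four days appear in every training set and never in a test set.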