macrosynergy.learning.forecasting.torch
- class MultiLayerPerceptron(n_inputs, n_latent, n_outputs, encoder_activation='tanh', head_activation='identity', fit_encoder_intercept=False, fit_head_intercept=True)
Bases: Module
Multi-layer perceptron models in PyTorch.
- Parameters:
n_inputs (int) – Number of input features. Must be at least 1.
n_latent (Union[int, list[int]]) – Number of latent features in a single hidden layer or list specifying the size of each hidden layer.
n_outputs (int) – Number of output variables. Must be at least 1.
encoder_activation (str, optional) – Activation function for the encoder layers. Default is “tanh”. Other options include “relu” and “sigmoid”.
head_activation (str, optional) – Activation function for the head layers. Default is “identity” for no activation. Other options include “tanh”, “relu” and “sigmoid”.
fit_encoder_intercept (bool, optional) – Whether to fit intercepts in the encoder layers. Default is False.
fit_head_intercept (bool, optional) – Whether to fit intercepts in the output head. Default is True.
Notes
A multi-layer perceptron is a feed-forward neural network that learns a (hopefully) optimal representation of the feature set for a prediction task, or for a collection of tasks. The initial feature set is transformed into a new, “learnt”, collection of features. This is the “first hidden layer” of the network. Each learnt feature is a linear combination of the initial features passed through a non-linear activation function. The choice of activation is currently “relu” (\(f(x) = \max(0, x)\)), “tanh” (\(f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)), or “sigmoid” (\(f(x) = \frac{1}{1 + e^{-x}}\)). This new feature set can be further transformed in the same manner by creating a second hidden layer, and so on.
The part of the network that describes how the initial features are transformed into the final features (before mapping to the outputs) is called the “encoder”. The component that maps the final learnt features to the outputs is called the “projection head”. When multiple outputs are being modelled, this is usually referred to as having a “multi-head” architecture.
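The encoder and projection head described above can be sketched in plain PyTorch. This is an illustration only, not the library's source: `TinyMLP` and `ACTIVATIONS` are hypothetical names, and the way the intercept flags are wired to `nn.Linear(bias=...)` is an assumption based on the parameter descriptions.

```python
import torch
from torch import nn

# Hypothetical lookup (not part of macrosynergy): activation name -> module class.
ACTIVATIONS = {
    "relu": nn.ReLU,
    "tanh": nn.Tanh,
    "sigmoid": nn.Sigmoid,
    "identity": nn.Identity,
}


class TinyMLP(nn.Module):
    """Illustrative encoder + projection head, assumed structure only."""

    def __init__(self, n_inputs, n_latent, n_outputs,
                 encoder_activation="tanh", head_activation="identity",
                 fit_encoder_intercept=False, fit_head_intercept=True):
        super().__init__()
        # Accept either a single hidden width or a list of widths.
        widths = [n_latent] if isinstance(n_latent, int) else list(n_latent)
        layers = []
        in_features = n_inputs
        for width in widths:
            # Each hidden layer: linear combination, then non-linear activation.
            layers.append(nn.Linear(in_features, width, bias=fit_encoder_intercept))
            layers.append(ACTIVATIONS[encoder_activation]())
            in_features = width
        self.encoder = nn.Sequential(*layers)
        # The head maps the final learnt features to the outputs.
        self.head = nn.Sequential(
            nn.Linear(in_features, n_outputs, bias=fit_head_intercept),
            ACTIVATIONS[head_activation](),
        )

    def forward(self, x):
        return self.head(self.encoder(x))


torch.manual_seed(0)
model = TinyMLP(n_inputs=5, n_latent=[32, 16], n_outputs=3)
features = torch.randn(8, 5)      # 8 samples, 5 input features
latent = model.encoder(features)  # final learnt features, 16 per sample
outputs = model(features)         # one column per output
```

Keeping the encoder and head as separate submodules makes the learnt features directly accessible, which is what shrinking them towards priors or regularizing them would require.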
What’s the advantage of a feedforward neural network over other models on tabular datasets? Structure and customizability. 32 neurons in a hidden layer means that 32 features are being learnt. I can shrink these features towards priors, if I have any beliefs. I can regularize network outputs to encourage smoothness (temporal regularization) and consistency with known relationships (spatial regularization). I can customize loss functions to optimize economically informed losses rather than generic distance metrics. I can penalize correlation against existing strategies, if so desired. People often refer to neural network flexibility in the context of learning an arbitrarily complex function. While this is true, I would use the word “flexibility” to refer to the ability to customize architectures and loss functions to suit a particular problem.
Future work
Add dropout layers for regularization.
Support for skip connections.
- class TimeSeriesSampler(dataset, batch_size, shuffle=True, aggregate_last=True, drop_last=False)
Bases: Sampler
Batch sampler for datasets indexed by time, to ensure that batches are comprised of samples from contiguous time periods.
- Parameters:
dataset (torch.utils.data.Dataset) – The PyTorch dataset to sample from.
batch_size (int) – Number of samples per batch.
shuffle (bool, optional) – Whether to shuffle the order of batches. Default is True.
aggregate_last (bool, optional) – Whether to aggregate the last batch with the previous one if it has length smaller than batch_size. Default is True.
drop_last (bool, optional) – Whether to drop the last batch if it has length smaller than batch_size. Default is False.
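The batching behaviour described by these parameters can be sketched in pure Python. This is a sketch under stated assumptions, not the library's implementation: `contiguous_batches` and its `seed` argument are hypothetical, samples are assumed to be already sorted by time, and letting `aggregate_last` take precedence over `drop_last` when both are set is a guess.

```python
import random


def contiguous_batches(n_samples, batch_size, shuffle=True,
                       aggregate_last=True, drop_last=False, seed=0):
    """Split indices 0..n_samples-1 (assumed time-ordered) into contiguous batches."""
    batches = [
        list(range(start, min(start + batch_size, n_samples)))
        for start in range(0, n_samples, batch_size)
    ]
    if batches and len(batches[-1]) < batch_size:
        if aggregate_last and len(batches) > 1:
            # Merge the short final batch into the previous one.
            short = batches.pop()
            batches[-1].extend(short)
        elif drop_last:
            # Discard the short final batch.
            batches.pop()
    if shuffle:
        # Only the order of batches changes; each batch stays contiguous.
        random.Random(seed).shuffle(batches)
    return batches


merged = contiguous_batches(10, 4, shuffle=False)
dropped = contiguous_batches(10, 4, shuffle=False,
                             aggregate_last=False, drop_last=True)
kept = contiguous_batches(10, 4, shuffle=False,
                          aggregate_last=False, drop_last=False)
shuffled = contiguous_batches(10, 4, shuffle=True)
```

Shuffling whole batches rather than individual samples keeps the time ordering inside each batch, which matters for losses computed over a batch of consecutive periods.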
- class MultiOutputSharpe(skip_validation=True, unbiased=True)
Bases: Module
Negative Sharpe ratio loss for multi-output regression problems.
Notes
When a neural network is designed so that each output can be interpreted as a signal or portfolio weight, a stylized Sharpe ratio can be calculated by multiplying the true returns by the respective signals or weights, before downsampling to portfolio returns. This Sharpe ratio, which excludes trading frictions such as transaction costs, is calculated over the batch.
Neural networks are most naturally formulated as minimization problems, so the negative Sharpe ratio is used as a loss function.
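A minimal sketch of such a loss follows. It is not the library's code: `negative_sharpe` is a hypothetical function, rows of the batch are assumed to be time periods, portfolio returns are assumed to be the sum of signal-weighted returns across outputs, no annualization is applied, and `eps` is an added guard against division by zero.

```python
import torch


def negative_sharpe(signals, returns, unbiased=True, eps=1e-8):
    """Negative batch Sharpe ratio; inputs have shape (n_periods, n_outputs)."""
    # Signal-weighted returns, summed across outputs into one portfolio return per period.
    portfolio_returns = (signals * returns).sum(dim=1)
    mean = portfolio_returns.mean()
    std = portfolio_returns.std(unbiased=unbiased)
    # Negated so that minimizing the loss maximizes the Sharpe ratio.
    return -mean / (std + eps)


signals = torch.ones(4, 2, requires_grad=True)
returns = torch.tensor([[0.01, 0.01],
                        [0.03, 0.01],
                        [0.01, 0.01],
                        [0.03, 0.01]])
loss = negative_sharpe(signals, returns)
loss.backward()  # gradients flow back to the signals
```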
- class MultiOutputMCR(skip_validation=True, unbiased=True)
Bases: Module
Negative mean-concentration risk ratio loss for multi-output regression problems.
Notes
By mean-concentration risk ratio, we refer to the ratio of the mean return within a time period to the standard deviation of returns within that time period. This differs from a Sharpe ratio in that the Sharpe ratio is a temporal quantity, whereas this statistic is cross-sectional. Maximisation of such a statistic would encourage positive returns at each time period whilst penalising diversity in the cross-sectional return distribution. The goal is to prevent the model from concentrating returns in a small subset of the outputs.
This statistic can be calculated for each sample in a batch, and then averaged over the batch. Neural networks are most naturally formulated as minimization problems, so the negative mean-concentration risk ratio is used as a loss function.
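The per-sample calculation and batch averaging can be sketched as below. This is an illustration under assumptions, not the library's implementation: `negative_mcr` is a hypothetical function, each row is assumed to be one time period with at least two outputs, the per-output returns are assumed to be signal-weighted, and `eps` is an added guard against division by zero.

```python
import torch


def negative_mcr(signals, returns, unbiased=True, eps=1e-8):
    """Negative mean-concentration risk ratio; inputs have shape (n_periods, n_outputs)."""
    weighted = signals * returns
    # Cross-sectional statistics: computed across outputs within each period.
    mean = weighted.mean(dim=1)
    std = weighted.std(dim=1, unbiased=unbiased)
    # One ratio per period, averaged over the batch, then negated for minimization.
    return -(mean / (std + eps)).mean()


signals = torch.ones(2, 2)
# Same mean return per period (0.02), spread across outputs differently.
even = torch.tensor([[0.01, 0.03], [0.01, 0.03]])
concentrated = torch.tensor([[0.00, 0.04], [0.00, 0.04]])
loss_even = negative_mcr(signals, even)
loss_concentrated = negative_mcr(signals, concentrated)
```

With equal mean returns, the concentrated case has the larger cross-sectional standard deviation and therefore the higher (worse) loss.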
- class MLPTrainer(train_pct=0.8, batch_size=256, use_ts_sampler=False, learning_rate=0.001, weight_decay=0.0, epochs=50, loss_fn=MSELoss(), x_scaler=StandardScaler(with_mean=False), y_scaler=StandardScaler(with_mean=False), patience=5, reg_turnover=0.0, random_state=0, verbose=False)
Bases: object
Trainer utility for fitting a PyTorch regression model with time-based train/validation splitting, optional scaling, and early stopping.
- Parameters:
train_pct (float, optional) – Fraction of unique dates used for the training set (remainder used for validation). Default is 0.8.
batch_size (int, optional) – Batch size used by the training and evaluation dataloaders. Default is 256.
use_ts_sampler (bool, optional) – Whether to use a time-series batch sampler (contiguous batches by time order) instead of random shuffling. Default is False.
learning_rate (float, optional) – Learning rate used by the AdamW optimizer. Default is 1e-3.
weight_decay (float, optional) – Weight decay coefficient used by the AdamW optimizer, which applies it as decoupled weight decay rather than as an L2 term in the loss. Default is 0.0.
epochs (int, optional) – Maximum number of training epochs. Default is 50.
loss_fn (torch.nn.Module, optional) – Loss function used for optimization. Default is nn.MSELoss().
x_scaler (object or None, optional) – Feature scaler implementing fit and transform (e.g. StandardScaler). If None, no scaling is applied to inputs. Default is StandardScaler(with_mean=False).
y_scaler (object or None, optional) – Target scaler implementing fit and transform (e.g. StandardScaler). If None, no scaling is applied to targets. Default is StandardScaler(with_mean=False).
patience (int, optional) – Number of epochs without validation improvement tolerated before early stopping. Default is 5.
reg_turnover (float, optional) – Strength of an L1 penalty on successive prediction differences, intended to discourage excessive turnover. If 0, no turnover penalty is applied. Default is 0.0.
random_state (int, optional) – Random seed used for PyTorch initialization and training. Default is 0.
verbose (bool, optional) – Whether to print periodic training diagnostics. Default is False.
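Two of the ideas in this parameter list, splitting on unique dates and penalizing successive prediction differences, can be sketched as follows. Both helpers are hypothetical and are not the trainer's internals: rounding the number of training dates down, treating consecutive rows as consecutive periods, and averaging (rather than summing) the absolute differences are all assumptions.

```python
import torch


def split_by_date(dates, train_pct=0.8):
    """Return (train, validation) sample indices, split on unique sorted dates."""
    unique_dates = sorted(set(dates))
    n_train = int(len(unique_dates) * train_pct)  # assumed: round down
    train_dates = set(unique_dates[:n_train])
    train_idx = [i for i, d in enumerate(dates) if d in train_dates]
    valid_idx = [i for i, d in enumerate(dates) if d not in train_dates]
    return train_idx, valid_idx


def turnover_penalty(predictions, reg_turnover):
    """L1 penalty on differences between consecutive rows of predictions."""
    if reg_turnover == 0:
        return predictions.new_zeros(())
    # Assumed: rows are in time order, so row-to-row changes proxy for turnover.
    return reg_turnover * (predictions[1:] - predictions[:-1]).abs().mean()


# Five unique month-end dates, two samples (e.g. two cross-sections) per date.
dates = [d for d in ["2024-01-31", "2024-02-29", "2024-03-31",
                     "2024-04-30", "2024-05-31"] for _ in range(2)]
train_idx, valid_idx = split_by_date(dates, train_pct=0.8)

predictions = torch.tensor([[1.0], [3.0], [2.0]])
penalty = turnover_penalty(predictions, reg_turnover=0.1)
```

Splitting on dates rather than on rows keeps every sample from a given date on the same side of the split, so the validation set always lies after the training set in time.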