macrosynergy.learning.forecasting.torch#

class MultiLayerPerceptron(n_inputs, n_latent, n_outputs, encoder_activation='tanh', head_activation='identity', fit_encoder_intercept=False, fit_head_intercept=True, dropout_p=0)[source]#

Bases: Module

Multi-layer perceptron models in PyTorch.

Parameters:
  • n_inputs (int) – Number of input features. Must be at least 1.

  • n_latent (Union[int, list[int]]) – Number of latent features in a single hidden layer or list specifying the size of each hidden layer.

  • n_outputs (int) – Number of output variables. Must be at least 1.

  • encoder_activation (str, optional) – Activation function for the encoder layers. Default is “tanh”. Other options include “relu” and “sigmoid”.

  • head_activation (str, optional) – Activation function for the head layers. Default is “identity” for no activation. Other options include “tanh”, “relu” and “sigmoid”.

  • fit_encoder_intercept (bool, optional) – Whether to fit intercepts in the encoder layers. Default is False.

  • fit_head_intercept (bool, optional) – Whether to fit intercepts in the output head. Default is True.

  • dropout_p (float, optional) – Dropout probability for regularization. Default is 0 (no dropout). Must be between 0 and 0.5.

Notes

A multi-layer perceptron is a feed-forward neural network that learns a (hopefully) optimal representation of the feature set for a prediction task, or for a collection of tasks. The initial feature set is transformed into a new, “learnt” collection of features, which forms the “first hidden layer” of the network. Each learnt feature is a linear combination of the initial features passed through a non-linear activation function. The supported activations are currently “relu” (\(f(x) = \max(0, x)\)), “tanh” (\(f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)) and “sigmoid” (\(f(x) = \frac{1}{1 + e^{-x}}\)). This new feature set can be transformed further in the same manner by adding a second hidden layer, and so on.

The part of the network that describes how the initial features are transformed into the final features (before mapping to the outputs) is called the “encoder”. The component that maps the final learnt features to the outputs is called the “projection head”. When multiple outputs are being modelled, this is usually referred to as having a “multi-head” architecture.
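The encoder and projection head described above can be illustrated with a small dependency-free sketch. It uses plain Python lists rather than PyTorch tensors, and the weights, the helper names `dense` and `mlp_forward`, and the layer sizes are hypothetical choices made for illustration; it is not the class's actual implementation. It mirrors the default settings in the signature: “tanh” encoder layers without intercepts and an “identity” head with an intercept.

```python
import math

# Activation functions named in the notes above, plus the identity.
ACTIVATIONS = {
    "relu": lambda z: max(0.0, z),
    "tanh": math.tanh,
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "identity": lambda z: z,
}


def dense(x, weights, bias, activation):
    """One layer: a linear combination of the inputs, then an activation."""
    f = ACTIVATIONS[activation]
    return [
        f(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def mlp_forward(x, encoder_layers, head_weights, head_bias):
    """Encoder (tanh, no intercept) followed by a projection head (identity)."""
    h = x
    for weights in encoder_layers:
        # Each encoder layer produces a new set of learnt features.
        h = dense(h, weights, [0.0] * len(weights), "tanh")
    # The head maps the final learnt features to the outputs.
    return dense(h, head_weights, head_bias, "identity")


# Hypothetical weights: 3 inputs -> 2 latent features -> 2 outputs.
encoder_layers = [[[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]]
head_weights = [[1.0, -1.0], [0.5, 0.5]]
head_bias = [0.1, -0.1]

outputs = mlp_forward([1.0, 2.0, 3.0], encoder_layers, head_weights, head_bias)
```

Adding a second hidden layer amounts to appending another weight matrix to `encoder_layers`; the head is unchanged as long as the final latent size stays the same.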

What’s the advantage of a feed-forward neural network over other models on tabular datasets? Structure and customizability. A hidden layer with 32 neurons means that 32 features are being learnt. I can shrink these features towards priors, if I have any beliefs. I can regularize network outputs to encourage smoothness (temporal regularization) and consistency with known relationships (spatial regularization). I can customize loss functions to optimize economically informed losses rather than generic distance metrics. I can penalize correlation against existing strategies, if so desired. People often refer to neural network flexibility in the context of learning an arbitrarily complex function. While this is true, I would use the word “flexibility” to refer to the ability to customize architectures and loss functions to suit a particular problem.

The model supports dropout regularization, which randomly “drops out” (sets to zero) a fraction of the neurons during training. This prevents over-reliance on specific neurons and encourages the network to learn representations that remain useful when individual neurons are missing.
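As a rough illustration of the mechanism, the sketch below applies the common “inverted dropout” formulation to a list of activations: each value is zeroed with probability `p` during training and the survivors are rescaled by `1 / (1 - p)`, which is the behaviour of `torch.nn.Dropout`. Whether this class applies dropout in exactly this way, and at which layers, is an assumption; the function name `inverted_dropout` is hypothetical.

```python
import random


def inverted_dropout(activations, p, training=True, rng=random):
    """Zero each activation with probability p and rescale the survivors."""
    if not training or p == 0:
        # At inference time (or with p = 0) the activations pass through.
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)  # seeded generator so the example is repeatable
hidden = [0.2, -0.7, 1.1, 0.4]
dropped = inverted_dropout(hidden, p=0.5, rng=rng)
unchanged = inverted_dropout(hidden, p=0.5, training=False)
```

The rescaling keeps the expected value of each activation the same in training and inference, so no adjustment is needed when dropout is switched off.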

Future work#

  • Support for skip connections.

forward(x)[source]#

Forward pass through the network.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, n_inputs).

Returns:

Output tensor of shape (batch_size, n_outputs).

Return type:

torch.Tensor

class TimeSeriesSampler(dataset, batch_size, shuffle=True, aggregate_last=True, drop_last=False)[source]#

Bases: Sampler

Batch sampler for datasets indexed by time, ensuring that each batch consists of samples from contiguous time periods.

Parameters:
  • dataset (torch.utils.data.Dataset) – The PyTorch dataset to sample from.

  • batch_size (int) – Number of samples per batch.

  • shuffle (bool, optional) – Whether to shuffle the order of batches. Default is True.

  • aggregate_last (bool, optional) – Whether to aggregate the last batch with the previous one if it has length smaller than batch_size. Default is True.

  • drop_last (bool, optional) – Whether to drop the last batch if it has length smaller than batch_size. Default is False.
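The batching behaviour described by these parameters can be sketched on plain integer indices. This is a hypothetical stand-alone function, not the sampler's implementation, and it assumes that samples are already sorted by time so that consecutive indices are contiguous in time. It also assumes that when both `aggregate_last` and `drop_last` are set, merging takes precedence; the source does not say how the two interact.

```python
import random


def contiguous_batches(n_samples, batch_size, shuffle=True,
                       aggregate_last=True, drop_last=False, seed=None):
    """Split indices 0..n_samples-1 into batches of consecutive indices."""
    batches = [
        list(range(start, min(start + batch_size, n_samples)))
        for start in range(0, n_samples, batch_size)
    ]
    # Handle a final batch that is shorter than batch_size.
    if batches and len(batches[-1]) < batch_size:
        if aggregate_last and len(batches) > 1:
            # Assumption: merging takes precedence over dropping.
            last = batches.pop()
            batches[-1].extend(last)
        elif drop_last:
            batches.pop()
    if shuffle:
        # Only the order of batches changes; each batch stays contiguous.
        random.Random(seed).shuffle(batches)
    return batches


merged = contiguous_batches(10, batch_size=4, shuffle=False)
dropped = contiguous_batches(10, batch_size=4, shuffle=False,
                             aggregate_last=False, drop_last=True)
kept = contiguous_batches(10, batch_size=4, shuffle=False,
                          aggregate_last=False)
```

With ten samples and a batch size of four, the two leftover samples are either merged into the second batch, dropped, or kept as a short third batch, depending on the flags.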

class MultiOutputSharpe(skip_validation=True, unbiased=True)[source]#

Bases: Module

Negative Sharpe ratio loss for multi-output regression problems.

Notes

When a neural network is designed so that its outputs can be interpreted as signals or portfolio weights, a stylized Sharpe ratio can be calculated by multiplying the true returns by the respective signals or weights and then aggregating across outputs to obtain portfolio returns. The Sharpe ratio, which excludes trading frictions such as transaction costs, can then be calculated over the batch.

Neural network training is most naturally formulated as a minimization problem, so the negative Sharpe ratio is used as a loss function.

forward(y_pred, y_true)[source]#

Evaluate batch negative Sharpe ratio loss.

Parameters:
  • y_pred (torch.Tensor) – Predicted outputs (signals or portfolio weights).

  • y_true (torch.Tensor) – True outputs (returns).
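The arithmetic behind this loss can be sketched in plain Python on nested lists of shape (batch_size, n_outputs). The sketch rests on several assumptions that the source does not confirm: portfolio returns are formed by summing signal-weighted returns across outputs, no annualization or risk-free rate is applied, and `unbiased=True` selects the sample standard deviation (Bessel's correction). The function name `negative_sharpe` is hypothetical.

```python
import statistics


def negative_sharpe(y_pred, y_true, unbiased=True):
    """Negative batch Sharpe ratio from per-output signals and returns."""
    # Portfolio return per sample: sum of signal * return over the outputs.
    portfolio = [
        sum(w * r for w, r in zip(weights, returns))
        for weights, returns in zip(y_pred, y_true)
    ]
    mean = statistics.fmean(portfolio)
    # Assumption: "unbiased" switches between sample and population std.
    if unbiased:
        std = statistics.stdev(portfolio)
    else:
        std = statistics.pstdev(portfolio)
    return -mean / std


# Three time periods, two outputs, equal unit weights on each output.
y_pred = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y_true = [[1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]

loss = negative_sharpe(y_pred, y_true)
loss_biased = negative_sharpe(y_pred, y_true, unbiased=False)
```

Here the portfolio returns are 1, 2 and 3, so the mean is 2 and the sample standard deviation is 1. Because the ratio is computed over the batch, batches drawn from contiguous time periods give it a natural time-series interpretation.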

class MultiOutputMCR(skip_validation=True, unbiased=True)[source]#

Bases: Module

Negative mean-concentration risk ratio loss for multi-output regression problems.

Notes

By mean-concentration risk ratio, we refer to the ratio of the mean return within a time period to the standard deviation of returns within that time period. This differs from a Sharpe ratio in that the Sharpe ratio is a temporal quantity, whereas this statistic is cross-sectional. Maximisation of such a statistic would encourage positive returns at each time period whilst penalising dispersion in the cross-sectional return distribution. The goal is to prevent the model from concentrating returns in a small subset of the outputs.

This statistic can be calculated for each sample in a batch, and then averaged over the batch. Neural network training is most naturally formulated as a minimization problem, so the negative mean-concentration risk ratio is used as a loss function.

forward(y_pred, y_true)[source]#

Evaluate batch negative mean-concentration risk ratio loss.

Parameters:
  • y_pred (torch.Tensor) – Predicted outputs (signals or portfolio weights).

  • y_true (torch.Tensor) – True outputs (returns).
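A minimal sketch of the statistic described in the notes, again in plain Python on nested lists of shape (batch_size, n_outputs). It assumes that the “returns within a time period” are the signal-weighted returns of each output, and that `unbiased=True` selects the sample standard deviation; neither detail is confirmed by the source, and the function name `negative_mcr` is hypothetical.

```python
import statistics


def negative_mcr(y_pred, y_true, unbiased=True):
    """Negative mean-concentration risk ratio, averaged over the batch."""
    # Assumption: "unbiased" switches between sample and population std.
    spread = statistics.stdev if unbiased else statistics.pstdev
    ratios = []
    for weights, returns in zip(y_pred, y_true):
        # Return contributed by each output within this time period.
        pnl = [w * r for w, r in zip(weights, returns)]
        # Cross-sectional mean divided by cross-sectional dispersion.
        ratios.append(statistics.fmean(pnl) / spread(pnl))
    # Average over the batch and negate so the ratio can be minimized.
    return -statistics.fmean(ratios)


# Two time periods, three outputs, unit weights on each output.
y_pred = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
y_true = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]

loss = negative_mcr(y_pred, y_true)
```

Both periods have a cross-sectional ratio of 2 (mean 2 over deviation 1, then mean 4 over deviation 2), which shows that the statistic is invariant to the overall scale of returns and responds only to how evenly they are spread across outputs.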

Subpackages#