macrosynergy.learning.forecasting.torch.models

class MultiLayerPerceptron(n_inputs, n_latent, n_outputs, encoder_activation='tanh', head_activation='identity', fit_encoder_intercept=False, fit_head_intercept=True, dropout_p=0)

Bases: Module

Multi-layer perceptron models in PyTorch.

Parameters:
  • n_inputs (int) – Number of input features. Must be at least 1.

  • n_latent (Union[int, list[int]]) – Number of latent features in a single hidden layer, or a list specifying the size of each hidden layer.

  • n_outputs (int) – Number of output variables. Must be at least 1.

  • encoder_activation (str, optional) – Activation function for the encoder layers. Default is “tanh”. Other options include “relu” and “sigmoid”.

  • head_activation (str, optional) – Activation function for the head layers. Default is “identity” for no activation. Other options include “tanh”, “relu” and “sigmoid”.

  • fit_encoder_intercept (bool, optional) – Whether to fit intercepts in the encoder layers. Default is False.

  • fit_head_intercept (bool, optional) – Whether to fit intercepts in the output head. Default is True.

  • dropout_p (float, optional) – Dropout probability for regularization. Default is 0 (no dropout). Must be between 0 and 0.5.

Notes

A multi-layer perceptron is a feed-forward neural network that learns a (hopefully) optimal representation of the feature set for a prediction task, or for a collection of tasks. The initial feature set is transformed into a new, “learnt”, collection of features. This is the “first hidden layer” of the network. Each learnt feature is a linear combination of the initial features passed through a non-linear activation function. The available activations are currently “relu” (\(f(x) = \max(0, x)\)), “tanh” (\(f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)), and “sigmoid” (\(f(x) = \frac{1}{1 + e^{-x}}\)). This new feature set can be further transformed in the same manner by creating a second hidden layer, and so on.
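As a rough illustration of the paragraph above, the sketch below computes one hidden layer by hand in plain Python. The inputs, weights, and helper names (`ACTIVATIONS`, `hidden_layer`) are made up for the example and are not part of the library.

```python
import math

# The three activation functions named above.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": lambda x: (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x)),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def hidden_layer(features, weights, activation="tanh"):
    """Map initial features to learnt features (hypothetical helper).

    Each row of `weights` defines one learnt feature: a linear combination
    of the inputs passed through the activation. No intercept is added,
    which mirrors fit_encoder_intercept=False.
    """
    f = ACTIVATIONS[activation]
    return [f(sum(w * x for w, x in zip(row, features))) for row in weights]


# Made-up numbers: 3 initial features mapped to 2 learnt features.
x = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, 0.3], [-0.4, 0.0, 0.1]]
learnt = hidden_layer(x, W, activation="tanh")
```

Feeding `learnt` into a second call of `hidden_layer` with a new weight matrix would correspond to adding a second hidden layer.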

The part of the network that describes how the initial features are transformed into the final features (before mapping to the outputs) is called the “encoder”. The component that maps the final learnt features to the outputs is called the “projection head”. When multiple outputs are being modelled, this is usually referred to as having a “multi-head” architecture.
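The encoder/head split described above can be sketched in a few lines of PyTorch. `TinyMLP` below is a hypothetical stand-in written for this illustration; it is not the library's implementation, and it hard-codes a tanh encoder without intercepts and a linear head with an intercept, which correspond to the documented defaults.

```python
import torch
from torch import nn


class TinyMLP(nn.Module):
    """Hypothetical stand-in, not the library's MultiLayerPerceptron."""

    def __init__(self, n_inputs, n_latent, n_outputs):
        super().__init__()
        # Accept a single hidden size or a list of hidden sizes.
        sizes = [n_latent] if isinstance(n_latent, int) else list(n_latent)

        # Encoder: stacked (linear -> tanh) blocks without intercepts.
        layers = []
        width = n_inputs
        for size in sizes:
            layers += [nn.Linear(width, size, bias=False), nn.Tanh()]
            width = size
        self.encoder = nn.Sequential(*layers)

        # Projection head: linear map from the final learnt features
        # to the outputs, with an intercept.
        self.head = nn.Linear(width, n_outputs, bias=True)

    def forward(self, x):
        return self.head(self.encoder(x))


model = TinyMLP(n_inputs=8, n_latent=[32, 16], n_outputs=3)
x = torch.randn(5, 8)      # (batch_size, n_inputs)
latent = model.encoder(x)  # (batch_size, 16): the final learnt features
y = model(x)               # (batch_size, n_outputs)
```

With `n_outputs=3`, the single linear head produces three outputs from the same 16 learnt features, which is the shared-representation idea behind the multi-output case.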

What’s the advantage of a feed-forward neural network over other models on tabular datasets? Structure and customizability. 32 neurons in a hidden layer means that 32 features are being learnt. I can shrink these features towards priors, if I have any beliefs. I can regularize network outputs to encourage smoothness (temporal regularization) and consistency with known relationships (spatial regularization). I can customize loss functions to optimize economically informed losses rather than generic distance metrics. I can penalize correlation against existing strategies, if so desired. People often refer to neural network flexibility in the context of learning an arbitrarily complex function. While this is true, I would use the word “flexibility” to refer to the ability to customize architectures and loss functions to suit a particular problem.
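To make the point about custom losses concrete, here is a minimal sketch of one possible loss that adds a correlation penalty to mean squared error. The function name, the `weight` and `eps` arguments, and the squared-correlation form of the penalty are all assumptions chosen for illustration; nothing here is provided by the library.

```python
import torch


def mse_with_correlation_penalty(pred, target, benchmark, weight=0.1, eps=1e-8):
    """Hypothetical loss: MSE plus a penalty on squared correlation.

    `benchmark` stands for the signal of an existing strategy that the
    predictions should not simply replicate.
    """
    mse = torch.mean((pred - target) ** 2)
    # Pearson correlation between predictions and the benchmark signal.
    p = pred - pred.mean()
    b = benchmark - benchmark.mean()
    corr = (p * b).sum() / (p.norm() * b.norm() + eps)
    return mse + weight * corr ** 2


pred = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
target = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Benchmark identical to the predictions: the penalty is close to `weight`.
loss_same = mse_with_correlation_penalty(pred, target, pred.detach())

# Benchmark uncorrelated with the predictions: the penalty vanishes.
loss_orth = mse_with_correlation_penalty(
    pred, target, torch.tensor([1.0, -1.0, -1.0, 1.0])
)

# The loss is differentiable, so it can drive ordinary gradient-based training.
loss_same.backward()
```

Because the penalty is built from standard tensor operations, autograd handles it like any built-in loss.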

The model supports dropout, which regularizes a neural network by randomly “dropping out” (setting to zero) a fraction of the neurons during training. This prevents over-reliance on specific neurons and encourages the network to learn representations that remain useful when any individual neuron is removed.
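The behaviour of dropout can be seen with PyTorch's standard `nn.Dropout` layer. This is a small sketch for intuition; whether the library uses this exact layer internally is an assumption, and the value `p=0.5` and the all-ones input are chosen only to make the effect easy to see.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each entry is zeroed with probability p, and the
# survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
drop.train()
train_out = drop(x)

# Evaluation mode: dropout is disabled and the input passes through as-is.
drop.eval()
eval_out = drop(x)
```

In practice this means the module should be switched with `.train()` and `.eval()` around fitting and prediction, otherwise predictions will be randomly perturbed.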

Future work

  • Support for skip connections.

forward(x)

Forward pass through the network.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, n_inputs).

Returns:
  Output tensor of shape (batch_size, n_outputs).

Return type:
  torch.Tensor

Submodules