WiseTrader Toolbox

Network Architectures: MLP, LSTM & GRU

The toolbox can build several kinds of network. The default is a classic feed-forward network (an MLP); the others are sequence models designed to read a window of consecutive bars — the recurrent networks (LSTM and GRU) and the TCN and Transformer. You pick which one to build with a single call to SetNeuralNetworkType. This page covers the MLP and the two recurrent models; the two newer sequence models have their own page, TCN & Transformer.

Feed-forward (MLP)

An MLP — Multi-Layer Perceptron — is the standard neural network described in the introduction. Each bar is treated independently: you hand the network a set of inputs for that bar, the signal flows forward through the hidden layers, and out comes a prediction. The network has no memory of previous bars. If you want it to "see" history, you supply that history yourself as inputs — for example by feeding lagged values (Ref) of an indicator.
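
For instance, lagged copies of an indicator can be built with AFL's Ref function, where a negative offset looks back in time. The sketch below is illustrative only: the choice of RSI( 14 ) and of three lags is an assumption, and the variable names are hypothetical. How the arrays are passed to the toolbox's training functions is not shown here.

r     = RSI( 14 );        // indicator value on the current bar
rLag1 = Ref( r, -1 );     // value one bar ago
rLag2 = Ref( r, -2 );     // value two bars ago
rLag3 = Ref( r, -3 );     // value three bars ago
// r, rLag1, rLag2 and rLag3 would then be four separate inputs to the MLP.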

The MLP is the right default for almost everything. It is fast, robust, and supports every training algorithm and every feature in the toolbox — including the categorical input embeddings and LayerNorm, and the network-to-AFL export, which only the MLP supports.

Recurrent networks (LSTM & GRU)

Recurrent networks are built for sequences. Instead of looking at one bar at a time, they read a window of consecutive bars in order and carry an internal "memory" from one step to the next. This lets them learn patterns that depend on the order of recent bars — the shape of a move over the last N bars rather than just its current values. For time-series this can capture momentum and mean-reversion dynamics that an MLP would only see if you hand-engineered lagged inputs.

  • LSTM (Long Short-Term Memory) uses a richer gated cell that can hold information over longer windows. It is the more powerful of the two and a good first choice when you want a recurrent model.
  • GRU (Gated Recurrent Unit) is a streamlined cousin of the LSTM with fewer internal gates. It trains a little faster and often performs comparably, especially on shorter windows.

Note

The recurrent models are trained and run through the standard Train… / Run… functions (including the multi-input API). They also work through the Walk-Forward NeuralNetworkIndicator… functions, provided the look-back is at least the sequence length (see Neural Network Functions). The recurrent models do not support the network-to-AFL export feature.

When to use each for time-series

  • Start with the MLP. On the noisy, low-signal data typical of markets, a small MLP with well-chosen inputs is hard to beat and is the most robust choice. Use it unless you have a specific reason not to.
  • Consider LSTM/GRU when the sequence of recent bars matters and you have a reasonably large training window. Recurrent models have more moving parts and are more prone to overfitting on small datasets, so apply the anti-overfitting settings firmly.
  • Whichever you choose, the bottleneck in this regime is almost always overfitting, not the model type. Get your inputs and your test split right first.

Choosing the model: SetNeuralNetworkType

SetNeuralNetworkType(Type)

Type  Model
0     MLP (feed-forward), the default.
1     LSTM (recurrent).
2     GRU (recurrent).
3     TCN (temporal convolutional network). See TCN & Transformer.
4     Transformer (self-attention). See TCN & Transformer.

If you never call this function you get an MLP. Values outside 0–4 are rejected and the setting is left unchanged.
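
A brief sketch of the call in use. The comments only restate the rule above; the particular values are arbitrary.

SetNeuralNetworkType( 2 );   // select GRU
SetNeuralNetworkType( 7 );   // outside 0-4: rejected, so the type stays GRU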

Configuring recurrent networks: SetRecurrentParams

When you select LSTM or GRU you also need to tell the network how big its memory is and how long a window it reads. This is done with SetRecurrentParams, which takes three whole-number arguments. It is ignored for the MLP.

SetRecurrentParams(HiddenSize, Layers, SequenceLength)

Parameter       Description
HiddenSize      Number of hidden units in the recurrent cell, that is, the size of its memory. Larger values model more complex dynamics but overfit more easily. Must be at least 1.
Layers          Number of stacked recurrent layers. Must be at least 1; 1 is the usual choice. This builds a genuinely stacked multi-layer LSTM/GRU in which each layer feeds the next, so values above 1 add real depth.
SequenceLength  The window length: how many consecutive bars the network reads as one sequence before producing an output. Must be at least 1. This is the recurrent equivalent of "how much history to look at".

All three values must be 1 or greater; if any is invalid the call is ignored with a trace message.

SetNeuralNetworkType( 1 );          // LSTM
SetRecurrentParams( 16, 1, 20 );    // 16 hidden units, 1 layer, 20-bar window
SetLearningAlgorithm( 8 );          // AdamW suits recurrent training
SetLearningRate( 0.003 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );

Note

Recurrent models always use a linear output projection and are trained with the learning-rate-based optimizers (the Adam family, RMSProp, and so on). The RPROP-style algorithms and the layer and activation functions below apply only to the MLP. Likewise, the hidden-layer activations and learning-rate schedules used by the MLP do not apply to the recurrent cell, whose gate activations are fixed internally.
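
As a further illustration, a GRU with two stacked layers might be configured as shown here. This is only a sketch: the sizes and learning rate are illustrative assumptions rather than recommended values, and algorithm code 8 (AdamW) is reused from the LSTM example because it belongs to the Adam family.

SetNeuralNetworkType( 2 );          // GRU
SetRecurrentParams( 8, 2, 10 );     // 8 hidden units, 2 stacked layers, 10-bar window
SetLearningAlgorithm( 8 );          // AdamW, as in the LSTM example
SetLearningRate( 0.003 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );

// An invalid argument makes the whole call a no-op:
// SetRecurrentParams( 8, 0, 10 );  // Layers < 1: ignored with a trace message

No SetNetworkLayer… or SetNetworkWithActivationLayer… call appears here, because those functions describe MLP hidden layers rather than the recurrent cell.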

Setting layer sizes (MLP)

For an MLP you describe the hidden layers with one of two function families. Both replace any previously configured layers. The number in the function name is the number of hidden layers it defines (1 to 4).

Sizes only: SetNetworkLayer1…4

Use these when you are happy with the default activation (sigmoid) on every layer. Pass the neuron count for each hidden layer. Each count must be between 1 and 100.

Function          Arguments
SetNetworkLayer1  (n1): one hidden layer of n1 neurons.
SetNetworkLayer2  (n1, n2): two hidden layers.
SetNetworkLayer3  (n1, n2, n3): three hidden layers.
SetNetworkLayer4  (n1, n2, n3, n4): four hidden layers.
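
A short sketch of the sizes-only family. The neuron counts are arbitrary illustrations. Because each call replaces any previously configured layers, only the last call takes effect.

SetNetworkLayer1( 6 );        // one hidden layer of 6 neurons
SetNetworkLayer2( 10, 5 );    // replaces the above: two hidden layers, 10 then 5 neurons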

Sizes and activations: SetNetworkWithActivationLayer1…4

Use these when you want to choose the activation function for each layer — for example a Linear output for regression. The arguments are (neurons, activation) pairs for each hidden layer, followed by a single trailing argument for the output-layer activation. So each function takes 2 × (number of hidden layers) + 1 arguments.

Function                        Arguments (count)
SetNetworkWithActivationLayer1  (n1, a1, outAct); 3 args, one hidden layer.
SetNetworkWithActivationLayer2  (n1, a1, n2, a2, outAct); 5 args.
SetNetworkWithActivationLayer3  (n1, a1, n2, a2, n3, a3, outAct); 7 args.
SetNetworkWithActivationLayer4  (n1, a1, n2, a2, n3, a3, n4, a4, outAct); 9 args.

The activation codes (0–9) are listed and explained on the Activation Functions page. Neuron counts here may go up to 200.

// One hidden layer of 8 tanh neurons, linear output (typical regression setup):
SetNetworkWithActivationLayer1( 8, 1, 9 );

// Two hidden layers: 12 ReLU then 6 ReLU, sigmoid output (0..1, e.g. up/down):
SetNetworkWithActivationLayer2( 12, 7, 6, 7, 0 );

Tip

Smaller networks overfit less. Start with a single small hidden layer and only add neurons or layers if the network genuinely cannot fit the training data. See Accuracy & Avoiding Overfitting.
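
The advice above might translate into a progression like the following. This is a sketch under assumptions: the neuron counts are arbitrary, and the activation codes (1 for tanh, 9 for linear) are taken from the examples earlier on this page.

// Step 1: start with one small hidden layer (4 tanh neurons, linear output).
SetNetworkWithActivationLayer1( 4, 1, 9 );

// Step 2: only if step 1 cannot fit the training data, widen the layer...
// SetNetworkWithActivationLayer1( 8, 1, 9 );

// ...or add a second hidden layer (5 arguments: two pairs plus the output activation).
// SetNetworkWithActivationLayer2( 8, 1, 4, 1, 9 );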