TCN, Transformer & Advanced Options
The toolbox provides two more sequence models — a TCN and a Transformer — alongside the LSTM and GRU covered on the Architectures page. There are also three options you can layer on top of a network: ensembling (for any model), and LayerNorm and categorical input embeddings (for the MLP). This page covers all five functions.
You select the TCN and Transformer with SetNeuralNetworkType, as with every other model. Everything on this page trains and runs through the ordinary Train… / Run… functions. None of the five functions changes how you call training; each only adds a configuration step.
Like the LSTM and GRU, the TCN and Transformer are sequence models: they read a window of consecutive bars. They cannot be exported as an AFL formula (the network-to-AFL export is MLP-only), and they need their full window of history before they can predict.
TCN — temporal convolutional network
A TCN reads the recent window of bars with stacked one-dimensional convolutions. Each level looks at a small kernel of neighbouring time steps, and the dilation doubles every level (1, 2, 4, 8, …) so that with only a few levels the network can see a long stretch of history while staying cheap to train. The convolutions are causal — a bar only ever sees itself and earlier bars, never the future — so the output does not peek ahead.
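To make "causal" and "dilated" concrete, here is a minimal single-channel sketch in Python. It is not the toolbox's own code. The weight ordering (weights[0] applies to the current bar) and the zero padding before the first bar are assumptions of this sketch. The point is only that the output at bar t mixes bars t, t − d, t − 2d, … and never a later one.

```python
def causal_dilated_conv(x, weights, dilation):
    """One causal, dilated 1-D convolution over a single channel.

    out[t] = sum_j weights[j] * x[t - j * dilation], where weights[0] applies to
    the current bar. Indices before the first bar are treated as zero (left
    padding), so no output ever reads a later bar.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            src = t - j * dilation
            if src >= 0:  # skip the zero padding before bar 0
                acc += w * x[src]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    # Kernel of 2 at dilation 2: each output adds the current bar and the bar 2 back.
    print(causal_dilated_conv(x, [1.0, 1.0], dilation=2))
    # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

Changing x[5] alters the outputs from bar 5 onward but leaves bars 0 to 4 untouched, which is the no-peeking property described above.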
It is a good middle ground between the MLP and the recurrent models: it captures order-dependent shape over a window like an LSTM, but trains more like a feed-forward network. Pick it with SetNeuralNetworkType(3) and configure it with SetTcnParams.
SetTcnParams(Channels, KernelSize, Levels)
| Parameter | Description |
|---|---|
| Channels | Number of convolution channels (filters) per level — the width of the network. Must be at least 1. Default 16. |
| KernelSize | How many neighbouring time steps each convolution looks at. Must be at least 2 (a kernel of 1 has no temporal extent). Default 2. |
| Levels | Number of dilated convolution levels. Must be at least 1. More levels see further back. Default 4. |
The window length the TCN reads, its receptive field, is worked out for you from KernelSize and Levels (Channels does not affect it) as $1 + (\mathit{KernelSize} - 1)\,(2^{\mathit{Levels}} - 1)$. With the defaults (kernel 2, 4 levels) that is $1 + 1 \times (2^4 - 1) = 16$ bars. You do not set the window directly; you set the kernel and levels, and the receptive field follows.
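The formula corresponds to one causal convolution per level with dilations 1, 2, 4, …, where each level adds (KernelSize − 1) × dilation bars of look-back. The Python sketch below checks the formula empirically. It is illustrative only: the helper names are hypothetical, and all weights are set to 1. It nudges each past bar in turn and counts which ones change the newest output.

```python
def receptive_field(kernel_size, levels):
    """Bars seen by one causal conv per level, dilation doubling 1, 2, 4, ..."""
    return 1 + (kernel_size - 1) * (2 ** levels - 1)


def _stack_output(x, kernel_size, levels):
    """Run the conv stack with every weight = 1 and return the newest output."""
    for level in range(levels):
        d = 2 ** level
        x = [sum(x[t - j * d] for j in range(kernel_size) if t - j * d >= 0)
             for t in range(len(x))]
    return x[-1]


def measured_receptive_field(kernel_size, levels, length=200):
    """Count input bars whose change alters the newest output.

    `length` must exceed the receptive field for the count to be complete.
    """
    baseline = _stack_output([0.0] * length, kernel_size, levels)
    seen = 0
    for i in range(length):
        bumped = [0.0] * length
        bumped[i] = 1.0  # positive nudge; with positive weights it reaches the output or not at all
        if _stack_output(bumped, kernel_size, levels) != baseline:
            seen += 1
    return seen


if __name__ == "__main__":
    for k, lv in [(2, 4), (3, 3), (2, 6)]:
        print(k, lv, receptive_field(k, lv), measured_receptive_field(k, lv))
```

For the defaults both numbers agree at 16. Raising Levels by one roughly doubles the look-back, while raising KernelSize grows it linearly.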
SetNeuralNetworkType( 3 ); // TCN
SetTcnParams( 16, 2, 4 ); // 16 channels, kernel 2, 4 levels -> 16-bar receptive field
SetLearningAlgorithm( 8 ); // AdamW suits the sequence models
SetLearningRate( 0.003 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );
input1 = PercentDifference( Close, 1 );
input2 = RSI( 14 ) / 100;
target = Ref( PercentDifference( Close, 1 ), 1 );
TrainNeuralNetwork3( input1, input2, target, "MyTCN" );
RestoreDefaults();
Transformer — self-attention
A Transformer reads its window with multi-head self-attention: at each position the network learns which other bars in the window are most relevant and weighs them accordingly, rather than reading the window in a fixed order. It adds a sinusoidal positional encoding so it still knows where each bar sits in the window, and the attention is causal — every position attends only to itself and the past.
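The sketch below shows the two ingredients named above in plain Python. The first is the standard sinusoidal positional encoding from the original Transformer design. The second is one head of causal scaled dot-product attention. This is a conceptual illustration, not the toolbox's implementation, and its exact variant may differ. The learned query, key and value projections, the multiple heads and the feed-forward sub-layer are left out, and the encoded positions are used directly as queries, keys and values.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            rate = 10000 ** ((i - i % 2) / d_model)  # same frequency for each sin/cos pair
            row.append(math.sin(pos / rate) if i % 2 == 0 else math.cos(pos / rate))
        pe.append(row)
    return pe


def causal_attention_weights(q, k):
    """softmax(q[t] . k[s] / sqrt(d)) over s <= t; later positions get weight 0."""
    d = len(q[0])
    n = len(q)
    weights = []
    for t in range(n):
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]  # only the current and earlier bars
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in scores]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - t - 1))
    return weights


def causal_attention(q, k, v):
    """Each output row is the weighted mix of the value rows it may see."""
    w = causal_attention_weights(q, k)
    return [[sum(w[t][s] * v[s][j] for s in range(len(v))) for j in range(len(v[0]))]
            for t in range(len(q))]


if __name__ == "__main__":
    x = positional_encoding(seq_len=4, d_model=4)
    # Toy setup: the encodings themselves serve as queries, keys and values.
    for row in causal_attention_weights(x, x):
        print([round(w, 3) for w in row])
    print(causal_attention(x, x, x)[0])
```

Each row of the weight matrix sums to 1 and is zero beyond its own position, so a bar can only draw on itself and earlier bars.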
Pick it with SetNeuralNetworkType(4) and configure it with SetTransformerParams.
SetTransformerParams(DModel, NumHeads, NumBlocks, SeqLen)
| Parameter | Description |
|---|---|
| DModel | Model dimension — the width of the internal representation. Must be at least 1, and must be divisible by NumHeads. Default 16. |
| NumHeads | Number of attention heads. Must be at least 1 and must divide DModel evenly. Default 2. |
| NumBlocks | Number of stacked encoder blocks (depth). Must be at least 1. Default 2. |
| SeqLen | Window length — how many consecutive bars are read as one sequence. Must be at least 1. Default 16. |
The feed-forward sub-layer inside each block is sized automatically at 4 × DModel. If DModel is not divisible by NumHeads, the call is rejected with a trace message and the setting is left unchanged.
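In standard multi-head attention, each head works on a DModel / NumHeads slice of the representation, which is why the division must be exact. The following Python sketch mirrors the validation rule described above. The function and dictionary names are hypothetical, and the printed message stands in for the toolbox's trace message.

```python
DEFAULTS = {"d_model": 16, "num_heads": 2, "num_blocks": 2, "seq_len": 16}


def set_transformer_params(current, d_model, num_heads, num_blocks, seq_len):
    """Return new settings, or `current` unchanged if the request is invalid."""
    if min(d_model, num_heads, num_blocks, seq_len) < 1 or d_model % num_heads != 0:
        # Stand-in for the toolbox's trace message.
        print(f"rejected: SetTransformerParams({d_model}, {num_heads}, "
              f"{num_blocks}, {seq_len})")
        return current
    return {"d_model": d_model, "num_heads": num_heads,
            "num_blocks": num_blocks, "seq_len": seq_len}


def derived_sizes(cfg):
    """(width of each attention head, width of the feed-forward sub-layer)."""
    return cfg["d_model"] // cfg["num_heads"], 4 * cfg["d_model"]


if __name__ == "__main__":
    cfg = set_transformer_params(DEFAULTS, 16, 2, 2, 16)
    print(derived_sizes(cfg))                        # (8, 64)
    cfg = set_transformer_params(cfg, 16, 3, 2, 16)  # 16 is not divisible by 3
    print(derived_sizes(cfg))                        # still (8, 64)
```

The minimum check runs first, so a NumHeads of 0 is rejected before the modulo could divide by zero.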
SetNeuralNetworkType( 4 ); // Transformer
SetTransformerParams( 16, 2, 2, 16 ); // dModel 16, 2 heads, 2 blocks, 16-bar window
SetLearningAlgorithm( 8 );
SetLearningRate( 0.003 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );
input1 = PercentDifference( Close, 1 );
input2 = RSI( 14 ) / 100;
target = Ref( PercentDifference( Close, 1 ), 1 );
TrainNeuralNetwork3( input1, input2, target, "MyTransformer" );
RestoreDefaults();
The TCN and Transformer have many more parameters than a small MLP, so they overfit more easily on the noisy, low-signal data typical of markets. Reserve test data with SetPercentTestingData, use early stopping, and keep the model small. On most problems a well-tuned MLP is still the model to beat. Reach for these models when the sequence of recent bars genuinely matters and you have a large training window.
Ensembling: SetEnsembleSize
An ensemble trains several networks instead of one and averages their predictions. Each network starts from a different random seed, so it converges to a slightly different set of weights and makes partly different errors. Averaging them cancels out some of that noise and usually gives a steadier, more reliable prediction. This works with any model type: MLP, LSTM, GRU, TCN or Transformer.
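Why averaging helps can be seen in a small simulation. The sketch below is a toy model, not the toolbox: each hypothetical "member" is the true series plus its own independent noise. With fully independent errors, averaging k members cuts the mean squared error roughly k-fold. Real networks trained on the same data make correlated errors, so the gain in practice is smaller.

```python
import math
import random


def simulate_member(truth, noise_sd, rng):
    """A toy 'network': the true series plus its own seed-specific error."""
    return [y + rng.gauss(0.0, noise_sd) for y in truth]


def ensemble_average(members):
    """Average k prediction series bar by bar."""
    return [sum(vals) / len(vals) for vals in zip(*members)]


def mse(pred, truth):
    """Mean squared error between a prediction series and the truth."""
    return sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(7)
    truth = [math.sin(i / 20) for i in range(2000)]
    members = [simulate_member(truth, 1.0, rng) for _ in range(5)]
    print("single member MSE:", round(mse(members[0], truth), 3))
    print("5-member average MSE:", round(mse(ensemble_average(members), truth), 3))
```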
SetEnsembleSize(k)
| Parameter | Description |
|---|---|
| k | Number of networks in the ensemble. Must be at least 1. Default 1 (a single network — the unchanged behaviour). |
How it behaves depends on which function you call:
- Training (TrainNeuralNetwork… / TrainMultiInputNeuralNetwork): with k > 1 the toolbox trains k seed-varied networks and saves them as member files <file>.0, <file>.1, … <file>.{k−1} instead of the bare <file>. With k = 1 it writes the bare <file> exactly as before.
- Running (RunNeuralNetwork… / RunMultiInputNeuralNetwork): the run path finds the member files on disk and averages them automatically. You do not call SetEnsembleSize before running; the saved member files already tell it how many members there are.
- Walk-forward indicator (NeuralNetworkIndicator…): with k > 1 it trains k fresh networks per predicted bar and averages them, at k × the per-bar cost.
SetEnsembleSize( 5 ); // average 5 seed-varied networks
SetNeuralNetworkType( 0 ); // any model type works
input1 = PercentDifference( Close, 1 );
input2 = RSI( 14 ) / 100;
target = Ref( PercentDifference( Close, 1 ), 1 );
TrainNeuralNetwork3( input1, input2, target, "MyEnsemble" ); // saves MyEnsemble.0 … MyEnsemble.4
RestoreDefaults();
// Later — no SetEnsembleSize needed, the saved files know k:
pred = RunNeuralNetwork3( input1, input2, "MyEnsemble" );
Because ensemble members are stored as <file>.0 … <file>.{k−1}, avoid giving a plain single network a .<digit> suffix. A network literally called model.0 is indistinguishable from member 0 of an ensemble named model. Retraining with a smaller k (or k = 1) automatically deletes the now-stale member files.
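The naming rules above can be sketched in Python as follows. The helpers are hypothetical. They only model the convention described on this page: a bare name for k = 1, numbered members otherwise, and stale members removed on retrain. How the toolbox actually scans for members on disk is an assumption here; this version counts consecutive suffixes from .0 and stops at the first gap.

```python
def member_files(base, k):
    """Files written for an ensemble of size k: bare name when k == 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [base] if k == 1 else [f"{base}.{i}" for i in range(k)]


def discover_members(base, existing):
    """Count base.0, base.1, ... in `existing`, stopping at the first gap."""
    k = 0
    while f"{base}.{k}" in existing:
        k += 1
    return k


def stale_members(base, new_k, existing):
    """Numbered member files a retrain with new_k would leave behind (to delete)."""
    keep = set(member_files(base, new_k))
    prefix = base + "."
    return sorted(name for name in existing
                  if name.startswith(prefix) and name[len(prefix):].isdigit()
                  and name not in keep)


if __name__ == "__main__":
    on_disk = set(member_files("MyEnsemble", 5))
    print(discover_members("MyEnsemble", on_disk))  # 5
    print(stale_members("MyEnsemble", 3, on_disk))  # ['MyEnsemble.3', 'MyEnsemble.4']
    # The naming clash: a single network saved as "model.0" looks like member 0.
    print(discover_members("model", {"model.0"}))   # 1
```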
LayerNorm (MLP): SetLayerNorm
Layer Normalization rescales the activations inside each hidden layer so they stay in a stable range as training proceeds. On deeper MLPs this often makes training faster and steadier. It adds a small learnable scale and shift per neuron, trained alongside the ordinary weights.
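The arithmetic is short enough to write out. This Python sketch normalizes one layer's values to zero mean and unit variance, then applies the learned per-neuron gain and bias. Two details are assumptions, since this page does not specify them: the epsilon value, and whether the toolbox normalizes before or after the activation function.

```python
import math


def layer_norm(values, gain, bias, eps=1e-5):
    """Normalize one hidden layer to zero mean and unit variance, then apply the
    learned per-neuron scale (gain) and shift (bias)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    inv_std = 1.0 / math.sqrt(var + eps)  # eps guards against division by zero
    return [g * (v - mean) * inv_std + b for v, g, b in zip(values, gain, bias)]


if __name__ == "__main__":
    h = [2.0, 4.0, 6.0, 8.0]
    ones, zeros = [1.0] * 4, [0.0] * 4
    print(layer_norm(h, ones, zeros))
    # Scaling the layer by 100 leaves the normalized result (almost) unchanged.
    print(layer_norm([100 * v for v in h], ones, zeros))
```

Because the output no longer depends on the overall scale of the layer's values, later layers see inputs in a stable range even as earlier weights grow or shrink during training.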
SetLayerNorm(flag)
| Parameter | Description |
|---|---|
| flag | 0 = off (default), 1 = on. Any other value is rejected. |
LayerNorm is applied to every hidden layer; the output layer is never normalized. It is an MLP-only option: the sequence models normalize internally where they need to and ignore this setting.
While LayerNorm is on, the network-to-AFL export is disabled — the exported formula has no way to express the normalization.
Categorical input embeddings (MLP): SetCategoricalInput
Most inputs are numbers the network scales and reads directly. Sometimes an input is really a category — a day-of-week, a market regime, a discrete state — encoded as an integer ID. Feeding such an ID straight in as a number implies a false ordering (that category 3 is "between" 2 and 4). An embedding avoids that: the network maps each category ID to a short learned vector and trains those vectors along with the weights, discovering for itself how the categories relate.
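The following minimal Python sketch shows what the network sees. Each categorical channel is replaced by the row of its embedding table for that bar's ID, while numeric channels pass through. Several details are illustrative assumptions: the function names, the small random initialisation and the ID check. The toolbox's own scaling of numeric channels is not modelled.

```python
import random


def make_embedding_table(num_categories, embed_dim, rng):
    """One small random vector per category; training would adjust these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for _ in range(num_categories)]


def build_input_row(row, categorical, tables):
    """Turn one bar's raw channel values into the vector fed to the MLP.

    `categorical` maps channel index -> number of categories; those channels are
    replaced by their embedding vector (no scaling). Other channels pass through
    unchanged here; the toolbox's numeric scaling is not modelled.
    """
    out = []
    for ch, value in enumerate(row):
        if ch in categorical:
            cat_id = int(value)
            if cat_id != value or not 0 <= cat_id < categorical[ch]:
                raise ValueError(f"channel {ch}: {value} is not an ID in 0..{categorical[ch] - 1}")
            out.extend(tables[ch][cat_id])
        else:
            out.append(value)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    categorical = {0: 4}                           # like SetCategoricalInput( 0, 4, 2 )
    tables = {0: make_embedding_table(4, 2, rng)}  # EmbedDim = 2
    print(build_input_row([3, 0.55], categorical, tables))  # 2 embedding values + RSI/100
```

Categories 2 and 3 get unrelated vectors, so the network is free to learn that, say, regime 0 and regime 3 behave alike, which a single numeric input could not express.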
SetCategoricalInput(ChannelIndex, NumCategories, EmbedDim)
| Parameter | Description |
|---|---|
| ChannelIndex | Which input channel is categorical, by its zero-based position in the input order used by the AddNeuralNetworkInput / TrainNeuralNetwork… family. Must be at least 0. |
| NumCategories | How many distinct categories the channel holds; its array must carry integer IDs in 0 … NumCategories−1. Must be at least 2. |
| EmbedDim | Length of the learned vector each category maps to. Must be at least 1. |
Mark each categorical channel with its own call — the calls accumulate, so you can declare several categorical inputs. A channel declared this way bypasses the usual input scaling (its values are IDs, not magnitudes). This is an MLP-only option. The default is none declared — every input treated as numeric, the unchanged behaviour.
SetNeuralNetworkType( 0 ); // MLP
// Channel 0 is a regime ID in 0..3, mapped to a 2-dimensional embedding:
SetCategoricalInput( 0, 4, 2 );
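// regimeId can be any array of integer IDs 0..3; this definition is only an illustration:
// bit 0 = close above its 50-bar average, bit 1 = RSI above 50
regimeId = IIf( Close > MA( Close, 50 ), 1, 0 ) + 2 * IIf( RSI( 14 ) > 50, 1, 0 );
ClearNeuralNetworkInputs();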
AddNeuralNetworkInput( regimeId, 0 ); // channel 0 — categorical (IDs 0..3)
AddNeuralNetworkInput( RSI( 14 ) / 100, 0 ); // channel 1 — numeric as usual
AddNeuralNetworkOutput( Ref( PercentDifference( Close, 1 ), 1 ), 0 );
TrainMultiInputNeuralNetwork( "MyEmbedNet" );
RestoreDefaults();
While any categorical input is declared, the network-to-AFL export is disabled — an embedding lookup table can't be written as a formula.
Settings shared with the other models
The sequence models honour the same training settings as the MLP. SetBatchSize and the learning-rate schedule family (SetLearningRateSchedule, SetLRScheduleStep, SetLRStepDecayFactor, SetLRScheduleMinPercent, SetSGDRCycleMultiplier) are honoured by the LSTM, GRU, TCN and Transformer too, not just the MLP. So are SetLearningAlgorithm, SetLearningRate, SetMaximumEpochs, SetPercentTestingData, the scaling, gradient clipping, Huber delta and early stopping settings, SetSeed, SetMaximumThreads and SetEnsembleSize. See the Settings Reference for the full list.