TCN, Transformer & Advanced Options

The toolbox provides two more sequence models, a TCN and a Transformer, alongside the LSTM and GRU covered on the Architectures page. There are also three options you can layer on top of a network: ensembling (for any model), and LayerNorm and categorical input embeddings (for the MLP only). This page covers the five functions that configure them: SetTcnParams, SetTransformerParams, SetEnsembleSize, SetLayerNorm and SetCategoricalInput.

As with every other model, you select these with SetNeuralNetworkType and train/run them through the ordinary Train… / Run… functions. None of them changes how you call training — they only add a configuration step.

Note

Like the LSTM and GRU, the TCN and Transformer are sequence models: they read a window of consecutive bars. They cannot be exported as an AFL formula (the network-to-AFL export is MLP-only), and they need their full window of history before they can predict.

TCN — temporal convolutional network

A TCN reads the recent window of bars with stacked one-dimensional convolutions. Each level looks at a small kernel of neighbouring time steps, and the dilation doubles every level (1, 2, 4, 8, …) so that with only a few levels the network can see a long stretch of history while staying cheap to train. The convolutions are causal — a bar only ever sees itself and earlier bars, never the future — so the output does not peek ahead.

It is a good middle ground between the MLP and the recurrent models: it captures order-dependent shape over a window like an LSTM, but trains more like a feed-forward network. Pick it with SetNeuralNetworkType(3) and configure it with SetTcnParams.
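The causal, dilated convolutions described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the toolbox's implementation: it uses a single channel, no activation function and a fixed pair of weights, and the names `causal_dilated_conv` and `tcn_stack` are made up for this example.

```python
def causal_dilated_conv(x, weights, dilation):
    """One causal 1-D convolution over a list of floats.

    Output t mixes x[t], x[t - dilation], x[t - 2*dilation], ...
    Taps that would fall before the start of the series count as zero.
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            src = t - k * dilation      # the current bar or an earlier one
            if src >= 0:
                total += w * x[src]
        out.append(total)
    return out


def tcn_stack(x, weights, levels):
    """Apply `levels` convolutions in turn with dilation 1, 2, 4, ..."""
    for level in range(levels):
        x = causal_dilated_conv(x, weights, dilation=2 ** level)
    return x


series = [float(i) for i in range(20)]
base = tcn_stack(series, [0.5, 0.5], levels=4)

# Changing the last bar leaves every earlier output untouched (no look-ahead).
changed_last = series[:-1] + [999.0]
after_last = tcn_stack(changed_last, [0.5, 0.5], levels=4)
assert base[:-1] == after_last[:-1]
```

Because each output index only ever reads indices at or before itself, editing a later bar cannot change an earlier output, which is the property the final assertion checks.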

SetTcnParams(Channels, KernelSize, Levels)

Parameter	Description
Channels	Number of convolution channels (filters) per level — the width of the network. Must be at least 1. Default 16.
KernelSize	How many neighbouring time steps each convolution looks at. Must be at least 2 (a kernel of 1 has no temporal extent). Default 2.
Levels	Number of dilated convolution levels. Must be at least 1. More levels see further back. Default 4.

The window length the TCN reads, its receptive field, is worked out for you from KernelSize and Levels as 1 + (KernelSize − 1) × (2^Levels − 1); Channels does not affect it. With the defaults (kernel 2, 4 levels) that is 1 + 1 × 15 = 16 bars. You do not set the window directly; you set the kernel and levels and the receptive field follows.
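To see how the receptive field grows before committing to a configuration, the formula above can be evaluated for a few candidate settings. The helper names below are hypothetical and the snippet is plain Python arithmetic, not a toolbox call.

```python
def tcn_receptive_field(kernel_size, levels):
    """Bars seen by one output: 1 + (KernelSize - 1) * (2**Levels - 1)."""
    if kernel_size < 2 or levels < 1:
        raise ValueError("KernelSize must be >= 2 and Levels >= 1")
    return 1 + (kernel_size - 1) * (2 ** levels - 1)


def tcn_dilations(levels):
    """Dilation used at each level: 1, 2, 4, ..."""
    return [2 ** level for level in range(levels)]


for kernel_size, levels in [(2, 4), (2, 6), (3, 4)]:
    print(kernel_size, levels, tcn_dilations(levels),
          tcn_receptive_field(kernel_size, levels))
```

With kernel 2, six levels reach 64 bars; with kernel 3, four levels reach 31. Adding a level roughly doubles the window, while widening the kernel scales it linearly.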

SetNeuralNetworkType( 3 );        // TCN
SetTcnParams( 16, 2, 4 );         // 16 channels, kernel 2, 4 levels -> 16-bar receptive field
SetLearningAlgorithm( 8 );        // AdamW suits the sequence models
SetLearningRate( 0.003 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );

input1 = PercentDifference( Close, 1 );
input2 = RSI( 14 ) / 100;
target = Ref( PercentDifference( Close, 1 ), 1 );
TrainNeuralNetwork3( input1, input2, target, "MyTCN" );
RestoreDefaults();

Transformer — self-attention

A Transformer reads its window with multi-head self-attention: at each position the network learns which other bars in the window are most relevant and weighs them accordingly, rather than reading the window in a fixed order. It adds a sinusoidal positional encoding so it still knows where each bar sits in the window, and the attention is causal — every position attends only to itself and the past.

Pick it with SetNeuralNetworkType(4) and configure it with SetTransformerParams.
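The causal part of self-attention can be illustrated with a small Python sketch. It assumes the relevance scores are already given and uses one scalar value per bar; a real Transformer derives the scores from learned query and key projections and runs several heads in parallel, none of which is shown here. The function names are invented for this example.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with the future masked out.

    scores[i][j] is how relevant position j looks from position i.
    Row i of the result is non-zero only for j <= i and sums to 1.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]              # itself and the past only
        peak = max(visible)                       # subtract the max for stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


def attend(weights, values):
    """Weighted average of the values visible from each position."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


scores = [[0.0, 5.0, 5.0],
          [1.0, 0.0, 5.0],
          [0.0, 2.0, 1.0]]
weights = causal_attention_weights(scores)
mixed = attend(weights, [10.0, 20.0, 30.0])
```

The first position can only attend to itself, so its output equals its own value however large the scores for later bars are; the last position mixes all three bars according to its scores.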

SetTransformerParams(DModel, NumHeads, NumBlocks, SeqLen)

Parameter	Description
DModel	Model dimension — the width of the internal representation. Must be at least 1, and must be divisible by NumHeads. Default 16.
NumHeads	Number of attention heads. Must be at least 1 and must divide `DModel` evenly. Default 2.
NumBlocks	Number of stacked encoder blocks (depth). Must be at least 1. Default 2.
SeqLen	Window length — how many consecutive bars are read as one sequence. Must be at least 1. Default 16.

The feed-forward sub-layer inside each block is sized automatically at 4 × DModel. If DModel is not divisible by NumHeads the call is rejected with a trace message and the setting is left unchanged.
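The constraints above are easy to check ahead of time. The following Python sketch mirrors the rules stated on this page; the per-head width (DModel divided by NumHeads) is an assumption based on the usual multi-head layout, and raising an exception stands in for the toolbox's trace message. `transformer_layout` is a hypothetical helper, not a toolbox function.

```python
def transformer_layout(d_model=16, num_heads=2, num_blocks=2, seq_len=16):
    """Validate the four settings and return the sizes that follow from them."""
    for name, value in [("DModel", d_model), ("NumHeads", num_heads),
                        ("NumBlocks", num_blocks), ("SeqLen", seq_len)]:
        if value < 1:
            raise ValueError(f"{name} must be at least 1")
    if d_model % num_heads != 0:
        raise ValueError("DModel must be divisible by NumHeads")
    return {
        "head_dim": d_model // num_heads,   # assumption: standard even split
        "ffn_dim": 4 * d_model,             # feed-forward width stated on this page
        "num_blocks": num_blocks,
        "seq_len": seq_len,
    }


layout = transformer_layout(16, 2, 2, 16)
```

For the defaults this gives a feed-forward width of 64; a combination such as DModel 16 with 3 heads fails the divisibility check.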

SetNeuralNetworkType( 4 );          // Transformer
SetTransformerParams( 16, 2, 2, 16 ); // dModel 16, 2 heads, 2 blocks, 16-bar window
SetLearningAlgorithm( 8 );
SetLearningRate( 0.003 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );

input1 = PercentDifference( Close, 1 );
input2 = RSI( 14 ) / 100;
target = Ref( PercentDifference( Close, 1 ), 1 );
TrainNeuralNetwork3( input1, input2, target, "MyTransformer" );
RestoreDefaults();

Tip

The TCN and Transformer have many more parameters than a small MLP, so they overfit more easily on the noisy, low-signal data typical of markets. Reserve test data with SetPercentTestingData, use early stopping, and keep the model small. On most problems a well-tuned MLP is still the model to beat; reach for these when the order of recent bars genuinely matters and you have a long history to train on.

Ensembling: `SetEnsembleSize`

An ensemble trains several networks instead of one and averages their predictions. Each network starts from a different random seed and so settles in a slightly different place; averaging cancels part of that seed-to-seed noise, which usually gives a steadier prediction than any single member. It does not remove an error that all members share. This works with any model type: MLP, LSTM, GRU, TCN or Transformer.

SetEnsembleSize(k)

Parameter	Description
k	Number of networks in the ensemble. Must be at least 1. Default 1 (a single network — the unchanged behaviour).

How it behaves depends on which function you call:

Training (TrainNeuralNetwork… / TrainMultiInputNeuralNetwork): with k > 1 the toolbox trains k seed-varied networks and saves them as member files <file>.0, <file>.1, … <file>.{k−1} instead of the bare <file>. With k = 1 it writes the bare <file> exactly as before.
Running (RunNeuralNetwork… / RunMultiInputNeuralNetwork): the run path finds the member files on disk and averages them automatically. You do not call SetEnsembleSize before running — the saved artifact already knows how many members there are.
Walk-Forward indicator (NeuralNetworkIndicator…): with k > 1 it trains k fresh networks per predicted bar and averages them, at k× the per-bar cost.
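The file-naming rule and the averaging step described above can be sketched as follows. This is a plain Python illustration under the assumption that the average is an equal-weight mean; `member_files` and `ensemble_predict` are hypothetical names and do not exist in the toolbox.

```python
def member_files(name, k):
    """File names written by training for an ensemble of size k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return [name]                       # single network: the bare file name
    return [f"{name}.{i}" for i in range(k)]


def ensemble_predict(member_predictions):
    """Average per-bar predictions across members (equal weights assumed)."""
    k = len(member_predictions)
    return [sum(bar) / k for bar in zip(*member_predictions)]


files = member_files("MyEnsemble", 5)
avg = ensemble_predict([[0.10, -0.20], [0.30, 0.00], [0.20, -0.10]])
```

Here `files` lists MyEnsemble.0 through MyEnsemble.4, and `avg` holds one averaged value per bar across the three example members.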

SetEnsembleSize( 5 );             // average 5 seed-varied networks
SetNeuralNetworkType( 0 );        // any model type works
input1 = PercentDifference( Close, 1 );
input2 = RSI( 14 ) / 100;
target = Ref( PercentDifference( Close, 1 ), 1 );
TrainNeuralNetwork3( input1, input2, target, "MyEnsemble" );  // saves MyEnsemble.0 … MyEnsemble.4
RestoreDefaults();

// Later — no SetEnsembleSize needed, the saved files know k:
pred = RunNeuralNetwork3( input1, input2, "MyEnsemble" );

Warning

Because ensemble members are stored as <file>.0 … <file>.{k−1}, avoid naming a plain single network with a .<digit> suffix (a network literally called model.0 is indistinguishable from member 0 of an ensemble named model). Retraining with a smaller k (or k = 1) automatically deletes the now-stale member files.

LayerNorm (MLP): `SetLayerNorm`

Layer Normalization rescales the activations inside each hidden layer so they stay in a stable range as training proceeds. On deeper MLPs this often makes training faster and steadier. It adds a small learnable scale and shift per neuron, trained alongside the ordinary weights.

SetLayerNorm(flag)

Parameter	Description
flag	`0` = off (default), `1` = on. Any other value is rejected.

LayerNorm is applied to every hidden layer; the output layer is never normalized. It is an MLP-only option — the sequence models normalize internally where they need to, and ignore this setting.
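What the normalization does to one hidden layer can be shown with a short Python sketch of the standard Layer Normalization formula. The small constant `eps` and the function name are assumptions made for this example; the toolbox's exact numerics are not documented here.

```python
import math


def layer_norm(activations, gain, shift, eps=1e-5):
    """Normalize one layer's activations, then apply per-neuron gain and shift."""
    n = len(activations)
    mean = sum(activations) / n
    var = sum((a - mean) ** 2 for a in activations) / n
    scale = 1.0 / math.sqrt(var + eps)      # eps guards against division by zero
    return [g * (a - mean) * scale + b
            for a, g, b in zip(activations, gain, shift)]


hidden = [0.5, 12.0, -3.0, 4.5]
normed = layer_norm(hidden, gain=[1.0] * 4, shift=[0.0] * 4)
```

With gain 1 and shift 0 the output has mean close to 0 and variance close to 1 whatever the scale of the raw activations; during training the gain and shift are the learnable per-neuron parameters mentioned above.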

Note

While LayerNorm is on, the network-to-AFL export is disabled — the exported formula has no way to express the normalization.

Categorical input embeddings (MLP): `SetCategoricalInput`

Most inputs are numbers the network scales and reads directly. Sometimes an input is really a category — a day-of-week, a market regime, a discrete state — encoded as an integer ID. Feeding such an ID straight in as a number implies a false ordering (that category 3 is "between" 2 and 4). An embedding avoids that: the network maps each category ID to a short learned vector and trains those vectors along with the weights, discovering for itself how the categories relate.

SetCategoricalInput(ChannelIndex, NumCategories, EmbedDim)

Parameter	Description
ChannelIndex	Which input channel is categorical, by its position in the input order used by the `AddNeuralNetworkInput` / `TrainNeuralNetwork…` family. Must be at least 0.
NumCategories	How many distinct categories the channel holds; its array must carry integer IDs in `0 … NumCategories−1`. Must be at least 2.
EmbedDim	Length of the learned vector each category maps to. Must be at least 1.

Mark each categorical channel with its own call; the calls accumulate, so you can declare several categorical inputs. A channel declared this way bypasses the usual input scaling (its values are IDs, not magnitudes). This is an MLP-only option. By default no channel is declared categorical, so every input is treated as numeric, as before.

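SetNeuralNetworkType( 0 );        // MLP
// Channel 0 is a regime ID in 0..3, mapped to a 2-dimensional embedding:
SetCategoricalInput( 0, 4, 2 );

// Hypothetical regime ID for illustration only (any array of integer IDs in 0..3 will do):
// +1 when the close is above its 200-bar average, +2 when ATR is above its 100-bar average.
regimeId = IIf( Close > MA( Close, 200 ), 1, 0 ) + 2 * IIf( ATR( 14 ) > MA( ATR( 14 ), 100 ), 1, 0 );

ClearNeuralNetworkInputs();
AddNeuralNetworkInput( regimeId, 0 );             // channel 0: categorical (IDs 0..3)
AddNeuralNetworkInput( RSI( 14 ) / 100, 0 );      // channel 1: numeric as usual
AddNeuralNetworkOutput( Ref( PercentDifference( Close, 1 ), 1 ), 0 );
TrainMultiInputNeuralNetwork( "MyEmbedNet" );
RestoreDefaults();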
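Conceptually, an embedding is a lookup table with one row per category. The Python sketch below shows the lookup for the same shape as SetCategoricalInput( 0, 4, 2 ); it assumes the looked-up vector is simply concatenated with the numeric inputs before the first hidden layer, which is the common arrangement but is not confirmed for this toolbox. The table here is random; in training its entries would be learned. All names are hypothetical.

```python
import random


def make_embedding(num_categories, embed_dim, seed=0):
    """Small random table: one vector of length embed_dim per category ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for _ in range(num_categories)]


def build_mlp_input(category_id, numeric_inputs, table):
    """Replace the category ID by its vector, then append the numeric inputs."""
    if not 0 <= category_id < len(table):
        raise ValueError("category ID outside 0 .. NumCategories-1")
    return table[category_id] + list(numeric_inputs)


table = make_embedding(num_categories=4, embed_dim=2)
row = build_mlp_input(category_id=3, numeric_inputs=[0.55], table=table)
```

Under that assumption the network sees three values per bar: two embedding components for the regime plus the one numeric input. An ID outside 0..3 has no row in the table, which is why the IDs must stay within `0 … NumCategories−1`.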

Note

While any categorical input is declared, the network-to-AFL export is disabled — an embedding lookup table can't be written as a formula.

Settings shared with the other models

The sequence models honour the same training settings as the MLP. SetBatchSize and the learning-rate schedule family (SetLearningRateSchedule, SetLRScheduleStep, SetLRStepDecayFactor, SetLRScheduleMinPercent, SetSGDRCycleMultiplier) apply to the LSTM, GRU, TCN and Transformer as well as the MLP. So do SetLearningAlgorithm, SetLearningRate, SetMaximumEpochs, SetPercentTestingData, the scaling, gradient clipping, Huber delta and early stopping settings, SetSeed, SetMaximumThreads and SetEnsembleSize. See the Settings Reference for the full list.
