Tutorial 3: Advanced Techniques

The first two tutorials used feed-forward networks that look at each bar more or less on its own. This one steps up to a recurrent network, an LSTM, that reads a whole sequence of bars in order and carries memory from one bar to the next. The worked example uses it to anticipate the next price swing. Along the way we cover the modern training settings that matter for harder problems: which optimizer and loss to use, gradient clipping, and an honest account of which settings the recurrent engine does and does not honour. Weight initialisation gets a pointer to the page that covers it. If you have not met the recurrent models yet, the Architectures page introduces them.

Note

A ready-to-run copy of the worked example below is installed with the toolbox at WiseTraderToolbox\Neural Networks\Tutorial 3 - LSTM Swing.afl.

Why a recurrent network for swings

A swing is not a single-bar event — it is a shape that develops over many bars: momentum builds, stalls, then rolls over. A feed-forward network only sees that shape if you hand-engineer it into the inputs, feeding it dozens of lagged values and hoping it stitches the story together. A recurrent network is built for exactly this. It steps through the bars in order, updating an internal memory as it goes, so the order and the build-up are part of what it learns. You hand it a few features per bar, and it works out how a sequence of them tends to resolve.

The toolbox offers two recurrent cells, selected with SetNeuralNetworkType: the LSTM (type 1) and the lighter GRU (type 2). Both read sequences and carry memory; the GRU is a bit smaller and faster, the LSTM a bit more expressive. We use the LSTM here and show the one-line switch to a GRU later.

How the example is wired

Three things turn an ordinary multi-input training run into a recurrent one:

SetNeuralNetworkType(1) selects the LSTM engine.
SetRecurrentParams(HiddenSize, Layers, SequenceLength) sizes it. The sequence length is the look-back window — how many consecutive bars the network reads to make one prediction. For a swing, set it to roughly the span of the swing you care about; we use 20.
You still feed features with AddNeuralNetworkInput and train with TrainMultiInputNeuralNetwork — but you do not lag the inputs yourself. You add each feature once, as a plain per-bar array, and the engine slides the window over them. That is the whole point of using a recurrent model.
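
To picture what "the engine slides the window" means, here is a small Python sketch that builds the look-back windows by hand from plain per-bar feature rows. It is an illustration only, not toolbox code: make_windows is a hypothetical helper, the feature values are made up, and how the toolbox lays out its windows internally is an assumption.

```python
def make_windows(rows, seq_len):
    """Return one window per bar; bars without a full look-back get None.

    rows    -- list of per-bar feature tuples, oldest bar first
    seq_len -- number of consecutive bars in each window (the sequence length)
    """
    windows = []
    for i in range(len(rows)):
        if i < seq_len - 1:
            windows.append(None)                           # not enough history yet
        else:
            windows.append(rows[i - seq_len + 1 : i + 1])  # bars i-seq_len+1 .. i, in order
    return windows


# Six bars, two features per bar (made-up values for the illustration).
bars = [(0.1, 0.50), (0.3, 0.55), (-0.2, 0.48), (0.4, 0.60), (0.0, 0.58), (-0.1, 0.52)]
windows = make_windows(bars, seq_len=3)

print(windows[1])   # None: the first seq_len - 1 bars have no full window
print(windows[2])   # the first full window: bars 0, 1 and 2
```

Each feature appears once per bar, and the window for a bar is simply that bar plus the seq_len - 1 bars before it. That is why hand-lagging the inputs would only duplicate what the window already contains.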

The target is the percentage move Horizon bars ahead. Predicting the size of the move (a regression) suits the network's linear output, and its sign is the up/down swing call — so one model gives you both a magnitude and a direction. The features are the same stationary kind we have used throughout: a one-bar return, a momentum gauge and a volatility measure.
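
The target arithmetic can be checked on a handful of numbers. The Python sketch below mirrors the forward-move formula used in the worked example, (close Horizon bars ahead / close now - 1) * 100, on a made-up list of closes. The helper names forward_move_pct and direction are hypothetical and exist only for this illustration.

```python
def forward_move_pct(closes, horizon):
    """Percentage change from each bar's close to the close `horizon` bars later.

    The last `horizon` bars have no future close yet, so they get None.
    """
    moves = []
    for i in range(len(closes)):
        if i + horizon < len(closes):
            moves.append((closes[i + horizon] / closes[i] - 1) * 100)
        else:
            moves.append(None)
    return moves


def direction(move):
    """Swing call from the sign of a move: +1 up, -1 down, 0 flat or unknown."""
    if move is None or move == 0:
        return 0
    return 1 if move > 0 else -1


closes = [100.0, 102.0, 101.0, 99.0, 104.0]   # made-up closes
moves = forward_move_pct(closes, horizon=2)
calls = [direction(m) for m in moves]

print(moves)
print(calls)   # [1, -1, 1, 0, 0]
```

The last Horizon bars have no target because their future has not happened yet, so those bars cannot contribute training examples.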

The worked example

//  WiseTrader Toolbox - Neural Network Tutorial 3: LSTM Swing
//  --------------------------------------------------------------------------
//  Uses an LSTM (a recurrent network) to anticipate the next price swing. The
//  network reads a SEQUENCE of recent bars in order and carries memory from one
//  bar to the next, so it can learn the shape of a developing swing - momentum
//  building, stalling, then rolling over - instead of seeing each bar in
//  isolation.
//
//  We predict the percentage move Horizon bars ahead. The sign of that
//  prediction is the up/down swing call. We feed a few stationary per-bar
//  features; the LSTM forms the look-back windows itself, so you do NOT
//  hand-lag the inputs.
//
//  Recurrent models (LSTM/GRU) train and run ONLY through this multi-input
//  Train.../Run... path. Training runs only when you click the Train button, so
//  the formula is happy left on a chart; once a network is saved it plots the
//  predicted swing automatically. The button technique is explained in full
//  under "Running a Network on a Chart".
//  --------------------------------------------------------------------------

SetBarsRequired(99999, 99999);

Horizon = 5;        // swing horizon: predict the move 5 bars ahead
SEQ     = 20;       // look-back window the LSTM reads in order

// 1. Settings
SetNeuralNetworkType(1);                     // 1 = LSTM (2 = GRU, the lighter alternative)
SetRecurrentParams(16, 1, SEQ);             // 16 hidden units, 1 layer, sequence length SEQ
SetLearningAlgorithm(7);                     // Adam (the recurrent engine is Adam-based)
SetLearningRate(0.01);                       // recurrent training likes a small rate
SetErrorAlgorithm(2);                        // Huber loss: robust to fat-tailed market moves
SetHuberDelta(1.0);
SetGradientClipNorm(5);                      // keep the BPTT gradients from blowing up
SetScalingAlgorithm(0);                      // mean / standard-deviation scaling
SetPercentTestingData(25);                   // hold back the most recent 25% as unseen test data
SetEarlyStoppingPatience(40);
SetMaximumEpochs(400);
SetSeed(1);

netName = Name() + "_T3_LSTM_Swing";
netPath = "WiseTraderToolbox\\NeuralNetwork\\" + netName;

ClearNeuralNetworkInputs();

// 2. Has this symbol's network been trained yet? Ask the disk.
hasNet = False;
fh = fopen(netPath, "r");
if(fh)
{
    fclose(fh);                                  // close every handle fopen returns (or Error 53)
    hasNet = True;
}

// 3. The Train / Retrain button (a chart click triggers a recalc).
function DrawButton(Text, x1, y1, minW, h, colorFrom, colorTo)
{
    GfxSetOverlayMode(0);
    GfxSelectFont("Segoe UI", 9, 700);           // pick the font BEFORE measuring the caption
    GfxSetBkMode(1);
    padX = 12;                                    // space to leave on each side of the caption
    w    = Max(minW, GfxGetTextWidth(Text) + 2 * padX);   // widen the button if the label is long
    GfxGradientRect(x1, y1, x1 + w, y1 + h, colorFrom, colorTo);
    GfxSetTextColor(colorWhite);                 // white caption on the blue box - reads on any theme
    GfxDrawText(Text, x1, y1, x1 + w, y1 + h, 32|4|1);    // single line, centred horizontally + vertically
    return w;
}

BtnX = 5;
BtnY = 40;                                   // pushed down so it clears the chart title
BtnW = 170;
BtnH = 30;
BtnTop = ColorRGB(59, 130, 246);             // button gradient - lighter blue on top ...
BtnBot = ColorRGB(37, 99, 235);              // ... deepening to a darker blue at the bottom
LBClick = GetCursorMouseButtons() == 9;      // 9 = 8 (this chart just received the click) + 1 (left button)
MouseX  = Nz(GetCursorXPosition(1));          // pixel coordinates
MouseY  = Nz(GetCursorYPosition(1));

if(hasNet)
    BtnLabel = "Retrain network";
else
    BtnLabel = "Train network";

BtnW = DrawButton(BtnLabel, BtnX, BtnY, BtnW, BtnH, BtnTop, BtnBot);   // BtnW becomes the width actually drawn
CursorInBtn = MouseX >= BtnX AND MouseX <= BtnX + BtnW AND MouseY >= BtnY AND MouseY <= BtnY + BtnH;
BtnClicked  = CursorInBtn AND LBClick;

// 4. TRAIN - only on a click. Per-bar features (stationary, NOT hand-lagged);
//    the target is the percentage move Horizon bars ahead.
if(BtnClicked)
{
    ClearNeuralNetworkInputs();
    AddNeuralNetworkInput(ROC(C, 1), 0);            // one-bar return
    AddNeuralNetworkInput(RSI(14) / 100, 0);        // momentum
    AddNeuralNetworkInput((ATR(10) / C) * 100, 0);  // volatility as a percent of price
    fwdMove = (Ref(C, Horizon) / C - 1) * 100;
    AddNeuralNetworkOutput(fwdMove, 0);             // its sign is the swing direction

    fdelete(netPath);                            // drop any stale network so we retrain fresh
    TrainMultiInputNeuralNetwork(netName);
    StaticVarSetText("WTT_" + netName, "Training MSE = " + NeuralNetworkMSE +
                     "   Test MSE = " + TestingDataNeuralNetworkMSE);
    ClearNeuralNetworkInputs();
}

// 5. Re-read the disk after training: hasNet is true only if the save worked.
hasNet = False;
fh = fopen(netPath, "r");
if(fh)
{
    fclose(fh);
    hasNet = True;
}

// 6. Display. Running a trained network is cheap, so it runs on every recalc.
//    The first SEQ-1 bars cannot form a full window, so they come back empty.
if(hasNet)
{
    ClearNeuralNetworkInputs();                  // run set: same features, index 0, NO output
    AddNeuralNetworkInput(ROC(C, 1), 0);
    AddNeuralNetworkInput(RSI(14) / 100, 0);
    AddNeuralNetworkInput((ATR(10) / C) * 100, 0);

    predMove = RunMultiInputNeuralNetwork(netName);
    // Green bars = predicted up swing, red bars = predicted down swing.
    Plot(predMove, "Predicted " + Horizon + "-bar move %",
         IIf(predMove > 0, colorGreen, colorRed), styleHistogram | styleThick | styleOwnScale);

    Title = "Tutorial 3 - predicted " + Horizon + "-bar move = " + predMove + "%      " +
            StaticVarGetText("WTT_" + netName);
}
else
{
    // White text over a black outline, so it stays readable whether the chart
    // background is white or black.
    msg = "The neural network has not been trained yet.";
    pw  = Status("pxwidth");
    ph  = Status("pxheight");
    GfxSetOverlayMode(0);
    GfxSelectFont("Segoe UI", 12, 700);
    GfxSetBkMode(1);
    GfxSetTextColor(colorBlack);
    GfxDrawText(msg,  1, 0, pw + 1, ph,     32|1|4|16);
    GfxDrawText(msg, -1, 0, pw - 1, ph,     32|1|4|16);
    GfxDrawText(msg, 0,  1, pw,     ph + 1, 32|1|4|16);
    GfxDrawText(msg, 0, -1, pw,     ph - 1, 32|1|4|16);
    GfxSetTextColor(colorWhite);
    GfxDrawText(msg, 0,  0, pw,     ph,     32|1|4|16);

    Title = "Tutorial 3 - click Train to train the network";
}

// 7. Cleanup
EnableProgress();
RestoreDefaults();
ClearNeuralNetworkInputs();

Note

The first SequenceLength − 1 bars cannot form a full window, so the prediction comes back empty there and the plot simply starts a little later. That is expected, not an error.

Tip

The Train button is what lets this formula live on a chart: a chart indicator recalculates on every scroll, zoom and click, so training inline would retrain endlessly and freeze the pane. Gating training behind a click trains once and then just plots the saved network. The mechanism — the disk check, the click handling and the input-set sequencing — is explained step by step under Running a Network on a Chart.

Prefer a straight up/down label?

If you would rather train on the direction directly, swap the target for a 0/1 label and threshold the network's output at 0.5:

// Target: 1 if price is higher Horizon bars ahead, else 0
AddNeuralNetworkOutput(Ref(C, Horizon) > C, 0);
// ...after running:
up = predMove > 0.5;   // the recurrent output is a linear score, so 0.5 is the natural cut

Both framings work. We lead with the forward-return regression because the recurrent engine has a linear output: it produces an unbounded number, not a squashed 0–1 probability. With a 0/1 target that linear score still orders up versus down sensibly (cut it at 0.5), but it is not a calibrated probability — treat it as a lean, exactly as in Tutorial 2.

The advanced training settings

Harder problems are where the modern settings earn their keep. Here is what the example turns on and why.

Setting	Why
SetLearningAlgorithm(7)	Adam. The recurrent engine is trained with the Adam family of optimizers; Adam is the dependable default. A small learning rate (here 0.01) suits it.
SetErrorAlgorithm(2) + SetHuberDelta	Huber loss. Market moves have fat tails — the odd huge bar would dominate a plain squared-error loss and drag the fit around. Huber behaves like squared error for small residuals and like absolute error for large ones, so outliers stop bullying the network.
SetGradientClipNorm(5)	Caps the size of the gradient. Training a recurrent network unrolls it over the whole window, and the gradients can blow up; clipping keeps each step sane and stops the error jumping around.
SetRecurrentParams(16, 1, 20)	A small hidden size and a single layer. As always, smaller means less room to memorise noise — and recurrent networks have plenty of capacity, so resist the urge to make it big.
SetPercentTestingData / SetEarlyStoppingPatience	The same overfitting guards as before, and they apply here too: the trainer keeps the network that scores best on the held-out test data and stops when it stops improving.
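
Two of the rows above are easier to trust once you have seen the arithmetic. The Python sketch below uses the textbook definitions of Huber loss and of clipping a gradient by its norm. It is not the toolbox's source: treat it as an assumption that SetHuberDelta and SetGradientClipNorm follow these standard forms, and note that huber and clip_by_norm are hypothetical names.

```python
import math


def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic inside +/-delta, linear outside."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def clip_by_norm(grads, max_norm):
    """Rescale a gradient vector so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)                 # already small enough: leave it alone
    scale = max_norm / norm
    return [g * scale for g in grads]      # same direction, shorter step


print(huber(0.5))    # 0.125, the same as half the squared error
print(huber(4.0))    # 3.5, where half the squared error would be 8.0

clipped = clip_by_norm([30.0, 40.0], max_norm=5)   # original norm is 50
print(clipped)       # roughly [3.0, 4.0]: direction kept, norm cut to 5
```

A single outlier bar therefore adds to the loss in proportion to its size rather than its square, and an exploding gradient is shortened without changing where it points.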

Warning

Not every setting reaches the recurrent engine. The LSTM and GRU honour the optimizer choice, learning rate, loss and Huber delta, gradient clipping, the test split, early stopping, scaling and the seed. They do not use dropout, AdamW's weight decay, the learning-rate schedules, or minibatch mode — those apply to the feed-forward MLP. So for a recurrent model, control overfitting with a small hidden size, a short sequence, few features, the test split and early stopping, rather than by reaching for dropout or weight decay.
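
Since early stopping carries much of the load for a recurrent model, it helps to see the rule in miniature. The Python sketch below walks a made-up list of per-epoch test errors, remembers the best one, and stops once no improvement has arrived for patience epochs. The function name and the exact counting rule are assumptions made for illustration; the toolbox may count patience slightly differently.

```python
def run_early_stopping(test_errors, patience):
    """Walk per-epoch test errors; return (best_epoch, best_error, epochs_run)."""
    best_error = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch, err in enumerate(test_errors):
        epochs_run = epoch + 1
        if err < best_error:
            # A real trainer would snapshot the network weights here.
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                          # no improvement for `patience` epochs
    return best_epoch, best_error, epochs_run


# Made-up test errors: improving at first, then drifting upward (overfitting).
errors = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70, 0.80]
best_epoch, best_error, epochs_run = run_early_stopping(errors, patience=3)

print(best_epoch, best_error, epochs_run)   # 2 0.6 6
```

The network that is kept is the one from the best epoch, not the one from the last epoch, which is what makes the test split an effective brake on memorisation.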

A lighter alternative: GRU

If training feels slow or the LSTM seems to be overfitting the data, try a GRU. It is the same idea with a simpler cell; change one line:

SetNeuralNetworkType(2);   // 2 = GRU (everything else stays the same)

When you do have a lot of data: the MLP route

The schedule and minibatch settings that the recurrent engine ignores are genuinely useful on a large feed-forward training run — thousands of bars with many inputs. If you are training a big MLP rather than a recurrent net, the modern recipe is AdamW with a cosine learning-rate schedule and minibatches:

SetLearningAlgorithm(8);          // AdamW (Adam + weight decay)
SetLearningRate(0.003);
SetAdamWeightDecay(0.01);
SetLearningRateSchedule(2);       // cosine annealing
SetLRScheduleStep(500);           // anneal over the full epoch budget
SetBatchSize(64);                 // minibatches
EnableShuffleData();
SetMaximumEpochs(500);

The Training Algorithms page explains each of these, and Accuracy & Overfitting covers weight initialisation (SetWeightInit — Xavier or He), which can help a deep MLP start from sensible weights.
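
For a feel of what a cosine schedule does to the learning rate over the epoch budget, here is the usual cosine-annealing formula as a Python sketch. Whether the toolbox's schedule uses exactly this form, or a non-zero floor, is an assumption; cosine_lr and its min_lr parameter are hypothetical.

```python
import math


def cosine_lr(epoch, base_lr, total_epochs, min_lr=0.0):
    """Cosine annealing: start at base_lr and glide down to min_lr by total_epochs."""
    t = min(epoch, total_epochs) / total_epochs          # progress through the budget, 0..1
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))


# Same numbers as the recipe above: rate 0.003 annealed over 500 epochs.
for epoch in (0, 125, 250, 375, 500):
    print(epoch, round(cosine_lr(epoch, 0.003, 500), 6))
```

The rate stays close to its starting value early on, falls fastest around the midpoint, and flattens out near the end, so the final epochs make only small adjustments.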

Be honest about what this can do

Recurrent networks are powerful, and that cuts both ways. Predicting swings on noisy market data is genuinely hard, and a model with this much capacity will happily memorise the training set if you let it. A few habits keep you grounded:

Trust the test number, not the training number. A low training MSE with a much higher TestingDataNeuralNetworkMSE means the network has learned the past, not the future. Only the test number reflects unseen bars.
Avoid look-ahead. Every feature must be strictly backward-looking. Only the target reaches into the future, and only because the trained network never sees it at prediction time. An input that peeks ahead produces beautiful, worthless results.
Resist curve-fitting. If you try dozens of sequence lengths, hidden sizes and feature sets and keep the one with the best test score, you have quietly turned your test set into a second training set. Change few things, and keep a final slice of data you never looked at until the end.
Keep expectations realistic. A small, steady edge that holds up out of sample is worth far more than a stunning fit that falls apart in live trading. Favour robustness over a low training error every time.
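
The look-ahead habit in particular can be tested mechanically: a backward-looking feature must give the same value at a bar whether or not later bars exist. The Python sketch below applies that check to one honest feature and one deliberately leaky one. All names are hypothetical and the closes are made up; it illustrates the idea outside AmiBroker and is not a toolbox facility.

```python
def one_bar_return(closes, i):
    """Backward-looking: uses bars i-1 and i only."""
    if i < 1:
        return None
    return (closes[i] / closes[i - 1] - 1) * 100


def leaky_return(closes, i):
    """Deliberately wrong: peeks at bar i+1."""
    if i + 1 >= len(closes):
        return None
    return (closes[i + 1] / closes[i] - 1) * 100


def peeks_ahead(feature, closes):
    """True if the feature at any bar changes when later bars are removed."""
    for i in range(len(closes)):
        full = feature(closes, i)
        truncated = feature(closes[: i + 1], i)   # pretend bar i is the last bar
        if full != truncated:
            return True
    return False


closes = [100.0, 102.0, 101.0, 99.0, 104.0]   # made-up closes
print(peeks_ahead(one_bar_return, closes))    # False
print(peeks_ahead(leaky_return, closes))      # True
```

Any input that fails this kind of check will look brilliant in training and cannot be computed on the live bar.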

That is the end of the three tutorials. Together they cover the arc from a simple return prediction (Tutorial 1), through a practical direction classifier with real overfitting control (Tutorial 2), to a recurrent model for swings. The techniques here — recurrent cells, modern optimizers, robust losses and disciplined validation — are tools, not magic. Used with a clear head, they are a genuinely useful addition to your trading research.