Accuracy & Avoiding Overfitting
This is the most important page in the neural-network section. Getting a network to fit your training data is easy; getting it to predict new bars well is the whole challenge, and it comes down to controlling overfitting. The settings here matter far more to real-world results than your choice of optimizer or activation. In testing on a deliberately overfit-prone task, turning these on cut the out-of-sample error by about half versus the defaults.
What overfitting is, and how to spot it
Overfitting is when the network memorises the training data — including its random noise — instead of learning the real, repeatable pattern. It then looks excellent on the data it trained on and predicts poorly on new bars, which is the only thing that matters for trading.
You can see it directly. After training, the toolbox publishes two AFL variables:
- NeuralNetworkMSE: the error on the training data.
- TestingDataNeuralNetworkMSE: the error on the held-out test data (only produced when you reserve test data; see Step 1).
If the training error is low but the test error is much higher, the network is overfitting. A healthy network has the two numbers reasonably close together. (MSE means mean squared error; lower is better.)
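The comparison can be automated with a few lines of AFL. This is a minimal sketch, not part of the toolbox: it assumes both variables are plain numbers once training has finished, and the 1.5 ratio is an arbitrary illustrative threshold rather than a documented rule.

```afl
// Sketch only: place after the toolbox's training call has completed.
// Assumes both MSE variables are plain numbers and that test data was
// reserved with SetPercentTestingData (otherwise the test MSE is not set).
warnRatio = 1.5; // illustrative threshold chosen for this example, not a toolbox default

if( NeuralNetworkMSE > 0 )
{
    gapRatio = TestingDataNeuralNetworkMSE / NeuralNetworkMSE;
    printf( "Train MSE %g, test MSE %g, ratio %g\n", NeuralNetworkMSE, TestingDataNeuralNetworkMSE, gapRatio );
    if( gapRatio > warnRatio )
        printf( "Test error is well above training error: possible overfitting\n" );
}
```

A ratio near 1 means the network does about as well on unseen bars as on the bars it trained on.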
The accuracy recipe — do these in order
Step 1 — Reserve some data for testing (most important)
SetPercentTestingData(Percent)
This holds back the most recent Percent% of your data as unseen test data.
It does two things: it lets you see the test error (so you can detect
overfitting), and — crucially — it switches the trainer into keeping the network that
scores best on the test data rather than the one that scores best on
the training data. That single change roughly halved out-of-sample error in our tests.
Use 20–30.
SetPercentTestingData( 25 ); // keep the most recent 25% as unseen test data
Step 2 — Stop training when it stops improving (early stopping)
SetEarlyStoppingPatience(Epochs)
Stops training if the test error hasn't improved for the given number of epochs,
preventing the network from training so long that it starts memorising noise. Use
20–50. 0 disables it (the default). This uses the same
held-out test data as Step 1, so reserve test data first.
SetEarlyStoppingPatience( 30 ); // stop if test error hasn't improved for 30 epochs
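The patience rule, together with the keep-the-best-network behaviour from Step 1, can be pictured with a toy loop. This sketch does not call the toolbox: ToyTestError is a made-up error curve that bottoms out at epoch 40, and the exact epoch at which the real trainer stops may be counted slightly differently.

```afl
// Toy test-error curve (hypothetical): falls until epoch 40, then rises
// again as the network starts to memorise noise.
function ToyTestError( epoch )
{
    return 0.1 + ( ( epoch - 40 ) ^ 2 ) / 1600;
}

patience  = 30;       // same meaning as SetEarlyStoppingPatience( 30 )
maxEpochs = 300;      // same meaning as SetMaximumEpochs( 300 )
bestErr   = 1000000;  // larger than any error the toy curve produces
bestEpoch = 0;
stopEpoch = maxEpochs;

for( epoch = 1; epoch <= maxEpochs; epoch++ )
{
    testErr = ToyTestError( epoch );
    if( testErr < bestErr )
    {
        bestErr   = testErr;   // a real trainer would also snapshot the weights here
        bestEpoch = epoch;
    }
    else if( epoch - bestEpoch >= patience )
    {
        stopEpoch = epoch;     // no improvement for 'patience' epochs: give up
        break;
    }
}

printf( "Best epoch %g (test error %g), stopped at epoch %g\n", bestEpoch, bestErr, stopEpoch );
```

With this curve the loop keeps the epoch-40 result and stops at epoch 70, rather than running all 300 epochs.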
Step 3 — Add regularization (discourage memorizing)
Pick one or both of these. They push the network toward simpler, more general solutions.
SetDropoutRate(Rate)
Randomly ignores a fraction of the hidden neurons on each training step, so the network can't lean too heavily on any single one. Try 0.1–0.3. Dropout is applied only during training — predictions always run with the full network. Valid values are 0 (off, the default) up to but not including 1.
Weight decay (via AdamW) is the other main regularizer. The example below combines it with dropout:
SetDropoutRate( 0.2 ); // randomly ignore 20% of hidden units each step
SetLearningAlgorithm( 8 ); // AdamW (Adam with built-in weight decay)
SetAdamWeightDecay( 0.01 ); // gently shrink weights toward zero
SetLearningRate( 0.003 ); // the Adam family wants a SMALL learning rate
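To get a feel for how gentle a decay of 0.01 is, the sketch below works out the shrink applied per weight update. It assumes the standard decoupled AdamW form, in which each step multiplies a weight by (1 - learning rate x decay) in addition to the gradient update. The toolbox's exact formula is not documented here, and the step count is a made-up example value, so treat the numbers as illustrative.

```afl
learnRate = 0.003;   // value passed to SetLearningRate
decay     = 0.01;    // value passed to SetAdamWeightDecay
numSteps  = 10000;   // hypothetical number of weight updates

shrinkPerStep = 1 - learnRate * decay;     // 0.99997 under the assumption above
remaining     = shrinkPerStep ^ numSteps;  // fraction left if gradients were zero throughout

printf( "Per-step shrink factor %g, fraction left after %g steps: %g\n", shrinkPerStep, numSteps, remaining );
```

Under these assumptions, a weight that the data never reinforces fades to roughly three quarters of its size (about 0.74) over 10,000 updates, while weights that carry real signal are continually restored by the gradient.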
Step 4 — Right-size the network
Smaller networks overfit less. Start small and only grow if the network genuinely can't fit the training data at all.
SetNetworkWithActivationLayer1( 8, 1, 9 ); // one hidden layer of 8 tanh neurons, linear output
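One rough way to judge size is to count the adjustable weights and compare that with the number of training bars. The helper below is hypothetical and assumes a fully connected network with one bias per neuron. The input count of 10 is an example value, not something the toolbox reports.

```afl
// Hypothetical helper: weights in a one-hidden-layer, fully connected network.
function WeightCount( numInputs, numHidden, numOutputs )
{
    // each neuron has one weight per incoming connection plus one bias
    return ( numInputs + 1 ) * numHidden + ( numHidden + 1 ) * numOutputs;
}

inputCount = 10; // example number of input series

printf( "8 hidden neurons:  %g weights\n", WeightCount( inputCount, 8, 1 ) );  // 97
printf( "32 hidden neurons: %g weights\n", WeightCount( inputCount, 32, 1 ) ); // 385
```

Quadrupling the hidden layer roughly quadruples the number of values the network can use to memorise noise, which is why growing the network should be a last resort.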
Step 5 — Choose the optimizer and learning rate
Good defaults are iRPROP+ (code 4, robust, no tuning) or AdamW (code 8, modern, with regularization). The Adam family needs a small learning rate (~0.001–0.005); iRPROP+ ignores the learning rate entirely. See Training Algorithms.
Step 6 — Optional polish
- Better starting weights with SetWeightInit(1) (Xavier) or SetWeightInit(2) (He, for ReLU).
- A learning-rate schedule such as cosine annealing (SetLearningRateSchedule(2)).
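For reference, cosine annealing lowers the learning rate along half a cosine wave. The function below is a generic sketch of that curve, not the toolbox's code: the names are hypothetical, and it assumes the schedule runs from the starting rate down to a floor of zero over a fixed number of epochs.

```afl
// Generic cosine-annealing curve (illustrative; the toolbox's own formula may differ).
function CosineRate( epoch, period, rateMax, rateMin )
{
    piValue = 3.14159265358979;
    return rateMin + 0.5 * ( rateMax - rateMin ) * ( 1 + cos( piValue * epoch / period ) );
}

printf( "Epoch 0:   %g\n", CosineRate( 0, 300, 0.003, 0 ) );   // starts at the full rate
printf( "Epoch 150: %g\n", CosineRate( 150, 300, 0.003, 0 ) ); // halfway: half the rate
printf( "Epoch 300: %g\n", CosineRate( 300, 300, 0.003, 0 ) ); // ends near the floor
```

The rate falls slowly at first, fastest in the middle, and slowly again at the end, so late epochs make only small adjustments.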
The supporting settings in detail
Weight initialisation: SetWeightInit
Controls how the network's weights are first set before training begins. Good starting weights help training converge.
| Mode | Scheme |
|---|---|
| 0 | Uniform — the original scheme (the default). |
| 1 | Xavier / Glorot — well suited to sigmoid / tanh hidden layers. |
| 2 | He — designed to pair with ReLU / LeakyReLU hidden layers. |
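The textbook scales behind the two named schemes can be computed as follows. This is a sketch of the commonly published formulas (Glorot uniform limit and He normal standard deviation). Which variant the toolbox uses is an assumption, and the fan-in and fan-out values are examples.

```afl
// Textbook initialisation scales (assumed variants, not confirmed toolbox behaviour).
function XavierLimit( fanIn, fanOut )
{
    return sqrt( 6 / ( fanIn + fanOut ) ); // weights drawn uniformly from [-limit, +limit]
}

function HeStdDev( fanIn )
{
    return sqrt( 2 / fanIn );              // weights drawn with this standard deviation
}

printf( "Xavier limit, 10 inputs into 8 neurons: %g\n", XavierLimit( 10, 8 ) ); // about 0.577
printf( "He std dev, 10 inputs: %g\n", HeStdDev( 10 ) );                        // about 0.447
```

Both shrink the starting weights as a layer gets wider, which keeps early activations from saturating.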
Gradient clipping: SetGradientClipNorm
Caps the overall size of the weight-update gradient to the given value, which stabilises
training when the error jumps around or blows up. 0 (the default) turns it
off; 1–5 is a typical range when you need it.
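Clipping by norm rescales the whole gradient when it is too long and leaves it untouched otherwise. The sketch below shows the usual rule on a three-component example gradient. The function name is hypothetical and the toolbox's internals may differ.

```afl
// Scale factor for norm clipping: 1 when the gradient is short enough,
// otherwise clipNorm / gradNorm so the clipped gradient has length clipNorm.
function ClipScale( gradNorm, clipNorm )
{
    local factor;
    if( clipNorm <= 0 OR gradNorm <= clipNorm )
        factor = 1;                   // 0 means clipping is off, as documented
    else
        factor = clipNorm / gradNorm;
    return factor;
}

g1 = 3; g2 = 4; g3 = 12;                        // example gradient components
gradLen = sqrt( g1 ^ 2 + g2 ^ 2 + g3 ^ 2 );     // length is 13
factor  = ClipScale( gradLen, 5 );              // 5 / 13

printf( "Clipped gradient: %g, %g, %g\n", g1 * factor, g2 * factor, g3 * factor );
```

The direction of the update is preserved; only its size is capped, which is why clipping tames spikes without changing what the network is learning.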
The loss function: SetErrorAlgorithm
The loss (or error) function defines what "wrong" means while training — it is the quantity the optimizer drives down.
| Code | Loss | Use |
|---|---|---|
| 0 | Linear (plain MSE) | Standard regression. |
| 1 | Tanh (cross-entropy-like) | Classification, paired with a sigmoid output. This is the default. |
| 2 | Huber (robust) | Regression where outliers or fat tails would otherwise dominate; set the transition point with SetHuberDelta. |
The default loss is Tanh (code 1), which is meant for classification.
If you are doing plain regression (predicting a continuous value such as a future
return), set SetErrorAlgorithm(0) for linear MSE.
SetHuberDelta(Delta)
Sets the point at which Huber loss switches from squared to linear behaviour, used with
SetErrorAlgorithm(2). Must be greater than 0; 0.5–1.0 is
typical. The default is 1.0.
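The effect of the transition point is easiest to see numerically. The function below is the standard textbook Huber loss written in AFL. It is a sketch for illustration, and the toolbox's scaling may differ by a constant factor.

```afl
// Textbook Huber loss: quadratic for small errors, linear beyond delta.
function HuberLoss( residual, delta )
{
    local absErr;
    absErr = abs( residual );
    return IIf( absErr <= delta, 0.5 * absErr * absErr, delta * ( absErr - 0.5 * delta ) );
}

printf( "Small error 0.5:  Huber %g, half squared error %g\n", HuberLoss( 0.5, 1 ), 0.5 * 0.5 ^ 2 ); // 0.125 vs 0.125
printf( "Outlier error 5:  Huber %g, half squared error %g\n", HuberLoss( 5, 1 ), 0.5 * 5 ^ 2 );     // 4.5 vs 12.5
```

Small errors are penalised exactly as with squared error, while the outlier contributes 4.5 instead of 12.5. A smaller delta moves the switch-over earlier and discounts outliers even more.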
Copy-paste recipes
Regression / indicator value prediction (robust default):
SetLearningAlgorithm( 4 ); // iRPROP+
SetNetworkWithActivationLayer1( 8, 1, 9 ); // 8 tanh hidden, linear output
SetErrorAlgorithm( 0 ); // plain regression error
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );
Regression with modern regularization (when overfitting is a concern):
SetLearningAlgorithm( 8 ); // AdamW
SetLearningRate( 0.003 );
SetAdamWeightDecay( 0.01 );
SetDropoutRate( 0.2 );
SetNetworkWithActivationLayer1( 8, 1, 9 ); // tanh hidden, linear output
SetErrorAlgorithm( 0 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetLearningRateSchedule( 2 ); // cosine annealing
SetLRScheduleStep( 300 );
SetMaximumEpochs( 300 );
Classification (e.g. up vs down):
SetLearningAlgorithm( 8 ); // AdamW
SetLearningRate( 0.002 );
SetAdamWeightDecay( 0.01 );
SetNetworkWithActivationLayer1( 8, 1, 0 ); // tanh hidden, sigmoid output (0..1)
SetErrorAlgorithm( 1 ); // tanh / cross-entropy-like loss
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );
Troubleshooting
| Symptom | Try |
|---|---|
| Training error low, test error high (overfitting) | SetPercentTestingData, SetDropoutRate, AdamW + SetAdamWeightDecay, a smaller network, SetEarlyStoppingPatience. |
| Network won't fit even the training data (underfitting) | a bigger network, more epochs, a higher learning rate, or iRPROP+. |
| Training unstable / error jumps around or blows up | lower the learning rate, SetGradientClipNorm(1), or use iRPROP+. |
| Predicting unbounded values but output stuck in 0..1 | use a linear output activation (code 9) — see Activation Functions. |
| Results change every run | SetSeed(n) to fix the random seed. |
| Outliers / spikes dominate training | Huber loss: SetErrorAlgorithm(2) + SetHuberDelta(0.5). |