Accuracy & Avoiding Overfitting

This is the most important page in the neural-network section. Getting a network to fit your training data is easy; getting it to predict new bars well is the whole challenge, and it comes down to controlling overfitting. The settings here matter far more to real-world results than your choice of optimizer or activation. In testing on a deliberately overfit-prone task, turning these on cut the out-of-sample error by about half versus the defaults.

What overfitting is, and how to spot it

Overfitting is when the network memorises the training data — including its random noise — instead of learning the real, repeatable pattern. It then looks excellent on the data it trained on and predicts poorly on new bars, which is the only thing that matters for trading.

You can see it directly. After training, the toolbox publishes two AFL variables:

NeuralNetworkMSE — the error on the training data.
TestingDataNeuralNetworkMSE — the error on the held-out test data (only produced when you reserve test data; see Step 1).

If the training error is low but the test error is much higher, the network is overfitting. A healthy network has the two numbers reasonably close together. (MSE means mean squared error; lower is better.)
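
To make the comparison concrete, here is a minimal Python sketch, separate from the toolbox. The helpers `mse` and `overfit_ratio` are hypothetical names, the numbers are made up, and the threshold of 2 is only an illustrative rule of thumb, not a toolbox setting.

```python
def mse(targets, predictions):
    """Mean squared error: average of squared differences (lower is better)."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def overfit_ratio(train_mse, test_mse):
    """How many times larger the test error is than the training error."""
    return test_mse / train_mse


train_error = mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.0])   # close fit on training data
test_error = mse([4.0, 5.0], [3.0, 6.5])              # much worse on unseen data
ratio = overfit_ratio(train_error, test_error)
looks_overfit = ratio > 2.0                           # hypothetical threshold
```

A ratio near 1 corresponds to the "reasonably close together" case; a ratio far above 1 is the warning sign.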

The accuracy recipe — do these in order

Step 1 — Reserve some data for testing (most important)

SetPercentTestingData(Percent)

This holds back the most recent Percent% of your data as unseen test data. It does two things: it lets you see the test error (so you can detect overfitting), and — crucially — it switches the trainer into keeping the network that scores best on the test data rather than the one that scores best on the training data. That single change roughly halved out-of-sample error in our tests. Use 20–30.

SetPercentTestingData( 25 );   // keep the most recent 25% as unseen test data
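
The split itself is chronological, not random. The Python sketch below shows the idea under stated assumptions: `split_most_recent` is a hypothetical helper, bars are ordered oldest first, and rounding down the test count is a guess at the detail, not documented toolbox behaviour.

```python
def split_most_recent(bars, percent_testing):
    """Hold back the most recent percent_testing% of bars as test data."""
    if not 0 <= percent_testing < 100:
        raise ValueError("percent_testing must be in [0, 100)")
    n_test = int(len(bars) * percent_testing / 100)   # assumed: round down
    split_at = len(bars) - n_test
    return bars[:split_at], bars[split_at:]           # (older bars, newest bars)


bars = list(range(100))                  # stand-in for 100 bars, oldest first
train_bars, test_bars = split_most_recent(bars, 25)
```

Because the test bars all come after the training bars, the test error measures how the network does on data it could not have seen, which is the situation you face when trading.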

Step 2 — Stop training when it stops improving (early stopping)

SetEarlyStoppingPatience(Epochs)

Stops training if the test error hasn't improved for the given number of epochs, preventing the network from training so long that it starts memorising noise. Use 20–50. 0 disables it (the default). This uses the same held-out test data as Step 1, so reserve test data first.

SetEarlyStoppingPatience( 30 );  // stop if test error hasn't improved for 30 epochs
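
The patience rule can be sketched in a few lines of Python. This illustrates the general technique, not the toolbox's code: `early_stopping_epoch` is a hypothetical function, epochs are counted from 1, and the error values are invented.

```python
def early_stopping_epoch(test_errors, patience):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch test errors.

    patience == 0 disables early stopping, so every epoch is run.
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch, error in enumerate(test_errors, start=1):
        if error < best_error:
            best_error, best_epoch = error, epoch      # new best: remember it
        elif patience > 0 and epoch - best_epoch >= patience:
            return epoch, best_epoch                   # stalled for `patience` epochs
    return len(test_errors), best_epoch


errors = [0.9, 0.5, 0.4, 0.41, 0.43, 0.42, 0.45, 0.5]
stop_epoch, best_epoch = early_stopping_epoch(errors, patience=3)
```

Here the best test error occurs at epoch 3, so with a patience of 3 training stops at epoch 6. The network worth keeping is the one from the best epoch, which is the keep-the-best-on-test behaviour described in Step 1.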

Step 3 — Add regularization (discourage memorizing)

Pick one or both of these. They push the network toward simpler, more general solutions.

SetDropoutRate(Rate)

Randomly ignores a fraction of the hidden neurons on each training step, so the network can't lean too heavily on any single one. Try 0.1–0.3. Dropout is applied only during training — predictions always run with the full network. Valid values are 0 (off, the default) up to but not including 1.
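
For readers who want to see the mechanism, here is a minimal Python sketch of dropout on one layer's hidden activations. It assumes the common "inverted dropout" convention, in which surviving units are scaled up during training. Whether the toolbox scales this way is not stated here, and `apply_dropout` is a hypothetical helper.

```python
import random


def apply_dropout(activations, rate, training, rng=random):
    """Inverted dropout on one layer's hidden activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)           # prediction: full network, unchanged
    keep = 1.0 - rate
    # Zero each unit with probability `rate`; scale the survivors so the
    # expected activation stays the same as without dropout.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


hidden = [0.5, -0.2, 0.8, 0.1]
rng = random.Random(0)                     # fixed seed for a repeatable example
during_training = apply_dropout(hidden, 0.2, training=True, rng=rng)
during_prediction = apply_dropout(hidden, 0.2, training=False)
```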

SetDropoutRate( 0.2 );          // randomly ignore 20% of hidden units each step

Weight decay (via AdamW) is the other main regularizer:

SetLearningAlgorithm( 8 );      // AdamW (Adam with built-in weight decay)
SetAdamWeightDecay( 0.01 );     // gently shrink weights toward zero
SetLearningRate( 0.003 );       // the Adam family wants a SMALL learning rate
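
What "shrink weights toward zero" means can be shown with one AdamW update for a single weight. This Python sketch follows the standard published form of AdamW. The beta and epsilon values are the usual textbook defaults and are assumptions here, and `adamw_step` is a hypothetical helper, not a toolbox function.

```python
import math


def adamw_step(w, grad, m, v, t, lr=0.003, weight_decay=0.01,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a single weight; returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    adam_update = m_hat / (math.sqrt(v_hat) + eps)
    # Decoupled weight decay: the weight is shrunk directly, separately
    # from the gradient-based part of the update.
    w = w - lr * (adam_update + weight_decay * w)
    return w, m, v


# With a zero gradient, only the decay term acts on the weight.
w, m, v = adamw_step(w=1.0, grad=0.0, m=0.0, v=0.0, t=1)
```

With a zero gradient the weight moves from 1.0 to 1.0 × (1 − 0.003 × 0.01), a very small pull toward zero on every step. Weights that the data does not keep pushing back up gradually fade.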

Step 4 — Right-size the network

Smaller networks overfit less. Start small and only grow if the network genuinely can't fit the training data at all.

SetNetworkWithActivationLayer1( 8, 1, 9 );  // one hidden layer of 8 tanh neurons, linear output
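
One way to judge "small" is to count the adjustable weights. The Python sketch below assumes a fully connected network with one bias per neuron and a hypothetical 10 inputs. `count_parameters` is an illustrative helper, and the toolbox's internal layout may differ.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network, e.g. [inputs, hidden, outputs]."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


small = count_parameters([10, 8, 1])     # 10 hypothetical inputs, 8 hidden, 1 output
large = count_parameters([10, 64, 1])    # same inputs, 64 hidden neurons
```

Under these assumptions the 8-neuron network has 97 adjustable values and the 64-neuron one has 769. The more adjustable values relative to the number of training bars, the easier it is for the network to memorise noise.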

Step 5 — Choose the optimizer and learning rate

Good defaults are iRPROP+ (code 4, robust, no tuning) or AdamW (code 8, modern, with regularization). The Adam family needs a small learning rate (~0.001–0.005); iRPROP+ ignores the learning rate entirely. See Training Algorithms.

Step 6 — Optional polish

Better starting weights with SetWeightInit(1) (Xavier) or SetWeightInit(2) (He, for ReLU).
A learning-rate schedule such as cosine annealing (SetLearningRateSchedule(2)).
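
As an illustration of cosine annealing, the Python sketch below computes the learning rate over a single decay from the base rate down to a minimum. It uses the standard cosine formula; whether the toolbox decays to zero, restarts, or counts epochs this way is not stated here, and `cosine_annealed_rate` is a hypothetical helper.

```python
import math


def cosine_annealed_rate(epoch, total_epochs, base_rate, min_rate=0.0):
    """Learning rate at `epoch` (0-based) under a single cosine decay."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_rate + 0.5 * (base_rate - min_rate) * (1 + math.cos(math.pi * progress))


start = cosine_annealed_rate(0, 300, 0.003)      # full base rate
middle = cosine_annealed_rate(150, 300, 0.003)   # half the base rate
end = cosine_annealed_rate(300, 300, 0.003)      # decayed to min_rate
```

The rate falls slowly at first, fastest in the middle, and slowly again near the end, so late epochs make only fine adjustments.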

The supporting settings in detail

Weight initialisation: `SetWeightInit`

Controls how the network's weights are first set before training begins. Good starting weights help training converge.

Mode	Scheme
0	Uniform — the original scheme (the default).
1	Xavier / Glorot — well suited to sigmoid / tanh hidden layers.
2	He — designed to pair with ReLU / LeakyReLU hidden layers.
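
To show what the Xavier and He schemes change, the Python sketch below computes the range of starting weights for each, using the commonly published uniform variants. This is an assumption about the flavour: the toolbox may use a normal-distribution variant or different constants. The scheme names and helpers are hypothetical.

```python
import math
import random


def init_limit(scheme, fan_in, fan_out):
    """Half-width of a uniform range [-limit, +limit] for the starting weights."""
    if scheme == "xavier":
        return math.sqrt(6.0 / (fan_in + fan_out))   # Glorot uniform
    if scheme == "he":
        return math.sqrt(6.0 / fan_in)               # He uniform
    raise ValueError("unknown scheme: " + scheme)


def init_weights(scheme, fan_in, fan_out, rng):
    """One row of fan_in weights per neuron in the layer."""
    limit = init_limit(scheme, fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(42)                              # fixed seed, repeatable
weights = init_weights("xavier", fan_in=10, fan_out=8, rng=rng)
```

Both schemes scale the starting range to the layer's size, which keeps signals from shrinking or growing as they pass through the layers. He allows a wider range because ReLU zeroes roughly half of its inputs.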

Gradient clipping: `SetGradientClipNorm`

Caps the overall size of the weight-update gradient to the given value, which stabilises training when the error jumps around or blows up. 0 (the default) turns it off; 1–5 is a typical range when you need it.
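
Clipping by norm rescales the whole gradient rather than trimming individual values. A minimal Python sketch of the usual technique follows. `clip_by_norm` is a hypothetical helper, and treating the gradient as one flat list is a simplification.

```python
import math


def clip_by_norm(gradients, max_norm):
    """Scale the whole gradient vector so its L2 norm is at most max_norm."""
    if max_norm <= 0:
        return list(gradients)                 # 0 means clipping is off
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm:
        return list(gradients)                 # already small enough
    scale = max_norm / norm
    return [g * scale for g in gradients]      # same direction, shorter length


clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)   # norm 5 scaled down to norm 1
```

The direction of the update is preserved; only its length is capped, so one extreme bar cannot throw the weights far off course.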

The loss function: `SetErrorAlgorithm`

The loss (or error) function defines what "wrong" means while training — it is the quantity the optimizer drives down.

Code	Loss and when to use it
0	Linear (plain MSE) — standard regression.
1	Tanh (cross-entropy-like) — classification, paired with a sigmoid output. This is the default.
2	Huber (robust) — regression where outliers or fat tails would otherwise dominate; set the transition point with `SetHuberDelta`.

Warning

The default loss is Tanh (code 1), which is meant for classification. If you are doing plain regression (predicting a continuous value such as a future return), set SetErrorAlgorithm(0) for linear MSE.

SetHuberDelta(Delta)

Sets the point at which Huber loss switches from squared to linear behaviour, used with SetErrorAlgorithm(2). Must be greater than 0; 0.5–1.0 is typical. The default is 1.0.
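
The switch from squared to linear behaviour looks like this in the standard definition of Huber loss. The Python below is a sketch of that textbook formula for a single prediction error. The toolbox's exact scaling is not stated here, and `huber_loss` is a hypothetical helper.

```python
def huber_loss(error, delta=1.0):
    """Huber loss for one prediction error (prediction minus target)."""
    if delta <= 0:
        raise ValueError("delta must be greater than 0")
    size = abs(error)
    if size <= delta:
        return 0.5 * error * error             # squared region: small errors
    return delta * (size - 0.5 * delta)        # linear region: large errors


inside = huber_loss(0.5)      # within delta, treated like squared error
outlier = huber_loss(3.0)     # beyond delta, grows only linearly
```

For an error of 3.0 the squared form would give 4.5 while Huber gives 2.5. The gap widens quickly for larger errors, which is why spikes have less pull on training. A smaller delta makes the loss switch to linear sooner.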

Copy-paste recipes

Regression / indicator value prediction (robust default):

SetLearningAlgorithm( 4 );                  // iRPROP+
SetNetworkWithActivationLayer1( 8, 1, 9 );  // 8 tanh hidden, linear output
SetErrorAlgorithm( 0 );                     // plain regression error
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );

Regression with modern regularization (when overfitting is a concern):

SetLearningAlgorithm( 8 );                  // AdamW
SetLearningRate( 0.003 );
SetAdamWeightDecay( 0.01 );
SetDropoutRate( 0.2 );
SetNetworkWithActivationLayer1( 8, 1, 9 );  // tanh hidden, linear output
SetErrorAlgorithm( 0 );
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetLearningRateSchedule( 2 );               // cosine annealing
SetLRScheduleStep( 300 );
SetMaximumEpochs( 300 );

Classification (e.g. up vs down):

SetLearningAlgorithm( 8 );                  // AdamW
SetLearningRate( 0.002 );
SetAdamWeightDecay( 0.01 );
SetNetworkWithActivationLayer1( 8, 1, 0 );  // tanh hidden, sigmoid output (0..1)
SetErrorAlgorithm( 1 );                     // tanh / cross-entropy-like loss
SetPercentTestingData( 25 );
SetEarlyStoppingPatience( 30 );
SetMaximumEpochs( 300 );

Troubleshooting

Symptom	Try
Training error low, test error high (overfitting)	`SetPercentTestingData`, `SetDropoutRate`, AdamW + `SetAdamWeightDecay`, a smaller network, `SetEarlyStoppingPatience`.
Network won't fit even the training data (underfitting)	a bigger network, more epochs, a higher learning rate, or iRPROP+.
Training unstable / error jumps around or blows up	lower the learning rate, `SetGradientClipNorm(1)`, or use iRPROP+.
Predicting unbounded values but output stuck in 0..1	use a linear output activation (code 9) — see Activation Functions.
Results change every run	`SetSeed(n)` to fix the random seed.
Outliers / spikes dominate training	Huber loss: `SetErrorAlgorithm(2)` + `SetHuberDelta(0.5)`.
