While researching some Python functionality, I came across an interesting statistical method which seems like it might have an application in forex.

Currently, we have the option to create In Sample (IS) and Out of Sample (OOS) sets of data. As we know, the IS data is used to train the model and the OOS data is used to validate it, and the dataset can be split between the two at varying percentages. Although this reduces over-fitting, it doesn't eliminate it entirely.

Here, I think K Fold Cross Validation might help.

*This approach involves randomly dividing the set of observations into k groups, or folds, of approximately equal size. The first fold is treated as a validation set, and the method is fit on the remaining k − 1 folds.*

https://machinelearningmastery.com/k-fold-cross-validation/

You can see my attached graphic, but as an example: the data is split into 10 parts. In the first sweep, part 1 is held out for validation and the model is trained on parts 2-10. In the second sweep, part 2 is held out and parts 1 and 3-10 are used for training. This continues until every part has been used for validation once. The average performance over all the sweeps is then computed.
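To make the sweeps concrete, here is a rough sketch in plain Python (standard library only). Everything in it is made up for illustration: the helper names `k_fold_indices` and `cross_validate`, the toy `returns` list, and the stand-in "model" (the mean of the training returns, scored by mean absolute error on the held-out fold). A real strategy would plug in its own fitting and scoring functions.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Randomly split range(n_samples) into k folds of roughly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(data, k, fit, score, seed=0):
    """Hold out each fold once, fit on the other k - 1 folds, average the scores."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, held_out in enumerate(folds):
        train = [data[j] for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        validate = [data[j] for j in held_out]
        model = fit(train)
        scores.append(score(model, validate))
    return mean(scores), scores


# Hypothetical toy data: 20 made-up period returns.
returns = [0.010, -0.020, 0.015, 0.003, -0.007, 0.012, -0.001, 0.008, -0.011, 0.004,
           0.006, -0.004, 0.009, -0.013, 0.002, 0.011, -0.006, 0.005, -0.009, 0.007]


def fit_mean(train):
    # Stand-in "model": just the average training return.
    return mean(train)


def mae(model, validate):
    # Stand-in score: mean absolute error of the model on the held-out fold.
    return mean([abs(r - model) for r in validate])


avg_score, fold_scores = cross_validate(returns, k=10, fit=fit_mean, score=mae)
print(len(fold_scores), avg_score)
```

Each of the 10 folds is used for validation exactly once, and `avg_score` is the average over the 10 sweeps.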

I think this would give a much closer approximation of the efficacy of the strategy.

What do we think of this?

Nice idea Minch.

A similar idea can be applied to Monte Carlo testing, for example SPP for 'convergent strategies'. A curve-fit return stream is likely to plot as one of the better return streams in an array of possible permutations. The parameters of the 'average' return stream (the midpoint of the array) are therefore more likely to give a robust performer going forward. The K Fold Cross Validation technique you describe also approaches the problem from an 'average' performance angle. I note that Perry Kaufman also recommends a robustness approach that adopts a similar philosophy.
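As a rough illustration of picking the 'midpoint' of the array rather than the best performer, here is a small Python sketch. The function name `median_parameter_set`, the `lookback` parameter and the metric values are all hypothetical, and this is only one simple way to read "midpoint" (the median by a single performance metric), not a full SPP implementation.

```python
def median_parameter_set(results):
    """Return the (params, metric) pair at the midpoint of the metric-sorted array."""
    ranked = sorted(results, key=lambda item: item[1])
    # Lower median when the number of results is even.
    return ranked[(len(ranked) - 1) // 2]


# Hypothetical results: one (parameters, performance metric) pair per permutation.
results = [
    ({"lookback": 10}, 0.31),  # best performer, the most likely curve-fit candidate
    ({"lookback": 20}, 0.12),
    ({"lookback": 30}, 0.18),
    ({"lookback": 40}, 0.09),
    ({"lookback": 50}, 0.15),
]

params, metric = median_parameter_set(results)
print(params, metric)
```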
