Using Genetic Algorithms in Quantitative Trading

R-bloggers 2014-03-14

(This article was first published on The R Trader » R, and kindly contributed to R-bloggers)

The question one should always asked him/herself when using technical indicators is what would be an objective criteria to select indicators parameters (e.g., why using a 14 days RSI rather than 15 or 20 days?). Genetic algorithms (GA) are well suited tools to answer that question. In this post I’ll show you how to set up the problem in R. Before I proceed the usual reminder: What I present in this post is just a toy example and not an invitation to invest. It’s not a finished strategy either but a research idea that needs to be further researched, developed and tailored to individual needs.

What are genetic algorithms?

The best description of GA I came across comes from Cybernatic Trading a book by Murray A. Ruggiero. ”Genetic Algorithms were invented by John Holland in the mid-1970 to solve hard optimisation problems. This method uses natural selection, survival of the fittest”. The general process follows the steps below:

  1. Encode the problem into chromosomes
  2. Using the encoding, develop a fitness function for use in evaluating each chromosome’s value in solving a given problem
  3. Initialize a population of chromosomes
  4. Evaluate each chromosome in the population
  5. Create new chromosomes by mating two chromosomes. This is done by  muting and recombining two parents to form two children (parents are selected randomly but biased by their fitness)
  6. Evaluate the new chromosome
  7. Delete a member of the population that is less fit than the new chromosome and insert the new chromosome in the population.
  8. If the stop criteria is reached (maximum number of generations, fitness criteria is good enough…) then return the best chromosome alternatively go to step 4

From a trading perspective GA are very useful because they are good at dealing with highly nonlinear problems. However they exhibit some nasty features that are worth mentioning:

  • Over fitting: This is the main problem and it’s down to the analyst to set up the problem in a way that minimises this risk.
  • Computing time: If the problem isn’t properly defined, it can be extremely long to reach a decent solution and the complexity increases exponentially with the number of variables. Hence the necessity to carefully select the parameters.

There are several R packages dealing with GA, I chose to use the most common one: rgenoud

Data & experiment design

Daily closing prices for most liquid ETFs from Yahoo finance going back to January 2000. The in sample period goes from January 2000 to December 2010. The Out of sample period starts on January 2011.

The logic is as following: the fitness function is optimised  over the in sample period to obtain a set of optimal parameters for the selected technical indicators. The performance of those indicators is then evaluated  in the out of sample period. But before doing so the technical indicators have to be selected.

The equity market exhibits two main characteristics that are familiar to anyone with some trading experience. Long term momentum and short term reversal. Those features can be translated in term of technical indicators by: moving averages cross over and RSI. This represents a set of 4 parameters: Look-back periods for long and short term moving averages, look-back period for RSI and RSI threshold. The sets of parameters are the chromosomes. The other key element is the fitness function. We might want to use something like: maximum return or Sharpe ratio or minimum average Drawdown. In what follows, I chose to maximise the Sharpe ratio.

The R implementation is a set of 3 functions:

  1. fitnessFunction: defines the fitness function (e.g., maximum Sharpe ratio) to be used within the GA engine
  2. tradingStatistics: summary of trading statistics for the in and out of sample periods for comparison purposes
  3. genoud: the GA engine from the rgenoud package

The genoud function is rather complex but I’m not going to explain what each parameter means as I want to keep this post short (and the documentation is really good).

Results

In the table below I present for each instrument the optimal parameters (RSI look-back period, RSI threshold, Short Term Moving Average, and Long Term Moving Average) along with the in and out of sample trading statistics.

<!-- table.gridtable { font-family: verdana,arial,sans-serif; font-size:10px; color:#333333; border-width: 1px; border-color: #666666; border-collapse: collapse; } table.gridtable th { border-width: 1px; padding: 8px; border-style: solid; border-color: #666666; background-color: #dedede; } table.gridtable td { border-width: 1px; padding: 1px; border-style: solid; border-color: #666666; background-color: #ffffff; } -->

Instrument/Parameters In Sample Out Of Sample SPY c(31,62,32,76) total Return = 14.4% Number of trades = 60 Hit ratio = 60% total Return = 2.3% Number of trades = 8 Hit ratio = 50% EFA c(37,60,36,127) total Return = 27.6% Number of trades = 107 Hit ratio = 57% total Return = 2.5% Number of trades = 11 Hit ratio = 64% EEM c(44,55,28,90) total Return = 39.1% Number of trades = 85 Hit ratio = 58% total Return = 1.0% Number of trades = 17 Hit ratio = 53% EWJ c(44,55,28,90) total Return = 15.7% Number of trades = 93 Hit ratio = 54% total Return = -13.1% Number of trades = 31 Hit ratio = 45%

Before commenting the above results, I want to explain a few important points. To match the logic defined above, I bounded the parameters to make sure the look-back period for the long term moving average is always longer that the shorter moving average. I also constrained the optimiser to choose only the solutions with more than 50 trades in the in sample period (e.g;, statistical significance).

Overall the out of sample results are far from impressive. The returns are low even if the number of trades is small to make the outcome really significant. However there’s a significant loss of efficiency between in and out of sample period for Japan (EWJ) which very likely means over fitting.

Conclusion

This post is intended to give the reader the tools to properly use GA in a quantitative trading framework. Once again, It’s just an example that needs to be further refined. A few potential improvement to explore would be:

  • fitness function: maximising the Sharpe ratio is very simplistic. A “smarter” function would certainly improve the out of sample trading statistics
  • pattern: we try to capture a very straightforward pattern. A more in depth pattern research is definitely needed.
  • optimisation: there are many ways to improve the way the optimisation is conducted. This would improve both the computation speed and the rationality of the results.

The code used in this post is available on a Gist repository.

As usual any comments welcome

<!-- table.gridtable { font-family: verdana,arial,sans-serif; font-size:10px; color:#333333; border-width: 1px; border-color: #666666; border-collapse: collapse; } table.gridtable th { border-width: 1px; padding: 8px; border-style: solid; border-color: #666666; background-color: #dedede; } table.gridtable td { border-width: 1px; padding: 1px; border-style: solid; border-color: #666666; background-color: #ffffff; } -->

To leave a comment for the author, please follow the link and comment on his blog: The R Trader » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...