Preliminary Results with the Neural Network Ensemble Continued …

Please see the disclaimer. This blog does not contain investment advice.

This blog post continues the discussion on what might be termed 'stock market prediction' using machine learning / artificial intelligence. More precisely, it explores whether pattern recognition and optimisation algorithms can discover strategies that profitably exploit the volatility in share prices. So far – and as discussed in preceding posts – I've developed neural-network-based ensemble algorithms that vote each trading day. When a voting threshold is reached during the day, the algorithms simulate trading the stock. Successful strategies should, on average, increase the total number of shares held – and hopefully the total value of the holding – over time. The previous post used the example of a strategy for RIO.

This blog post extends the preliminary results to show some more (partially) successful strategies developed using the same set of input functions and dates for the training and back-test periods. As with RIO, trading costs are modelled as a 0.5% tax on purchases plus £10 per trade, on an initial £3k investment (I usually ignore the spread when dealing with a FTSE 100 company). The black line in each chart is the valuation of a buy-and-hold strategy; the coloured lines are for the ensemble strategy at different voting percentages (the second chart for each company is a ratio of total shares held).
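As an illustration of the cost assumptions, here is a minimal sketch of how such a cost model might be applied in a simulation; the function names and the exact accounting are illustrative assumptions rather than the actual simulator code.

```python
# Sketch of the assumed trading-cost model: 0.5% stamp duty on purchases plus a
# flat £10 fee per trade, starting from a £3,000 investment. Fractional shares
# are allowed here purely for simplicity.
STAMP_DUTY = 0.005   # 0.5% tax on purchases
DEAL_FEE = 10.0      # flat dealing fee per trade (£)

def buy(cash, price):
    """Spend all available cash on shares at `price`, net of fee and duty."""
    return (cash - DEAL_FEE) / (price * (1 + STAMP_DUTY))

def sell(shares, price):
    """Sell the entire holding at `price`, net of the dealing fee."""
    return shares * price - DEAL_FEE

cash = 3000.0
shares = buy(cash, price=30.0)    # e.g. buy at £30
cash = sell(shares, price=33.0)   # later sell at £33
print(round(cash, 2))             # value after one buy/sell cycle
```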

Of the companies tested thus far, it seems that the mining sector and the (highly correlated) emerging markets sector are the most likely to achieve a simulated lead over buy-and-hold. This may be partly because both sectors have experienced heavy share price losses over the recent past – any strategy that stays in cash at least some of the time while the price is falling is likely to beat buy and hold.

First up is the mining company AAL (using the same dates and input function as for RIO). The strategy could be characterised as performing well for the first six months and reasonably thereafter until about May of last year, when the value of the holding collapsed, albeit at a slower rate than the disastrous buy and hold (at the optimal 40% voting).

AAL_20160104

The AAL ratio plot…

AAL_Ratio_20160104

Currently the neural networks train, test and operate without interference. An alternative would be to impose a stop-loss model that the neural network has to learn to operate around during training. The danger is that in operation – without the benefit of hindsight – such a stop-loss is as likely to crystallise a loss, missing the subsequent upswing from a temporary downward spike, as it is to save the strategy from heavy and sustained losses. Instead I have used the gain or loss since the last trade as an input into the neural network. In this instance, 'knowing' that the most recent trade is in heavy loss hasn't persuaded the ensemble to sell…
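For illustration, this gain/loss input could be computed along the lines of the sketch below; the tanh scaling into (-1, 1) is my assumption, as the real input function may be scaled quite differently.

```python
# Sketch of the 'gain or loss since the last trade' network input. The tanh
# squashing into (-1, 1) is an illustrative assumption.
import math

def gain_since_last_trade(current_price, last_trade_price):
    """Fractional gain (+) or loss (-) relative to the last executed trade."""
    return (current_price - last_trade_price) / last_trade_price

def scaled_input(current_price, last_trade_price):
    """Squash the raw gain so it suits a sigmoid-based network."""
    return math.tanh(gain_since_last_trade(current_price, last_trade_price))

print(scaled_input(18.0, 30.0))   # a holding in heavy loss gives a strongly negative input
```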

Next up are results for BLT (below) – similar comments apply as for AAL, though the losses are slightly less extreme and the optimal voting is much higher at 55%.

BLT_20160104

The BLT Ratio plot…

BLT_Ratio_20160104

This is the last mining stock modelled this time, VED from the FTSE 250 (below). Only the 40% voting strategy would currently be profitable.

VED_20160104

VED ratio plot…

VED_Ratio_20160104

This is a fund manager, ADN (see below). The results are far more sensitive to the voting percentage: only 55% voting significantly beat buy-and-hold. If the 'break-out' of Jul/Aug 2013 could have been detected quickly, using automated monitoring of the performance at each voting percentage, the 55% voting would still have led to a significant gain over buy and hold.

ADN_20160104

ADN ratio plot…

ADN_Ratio_20160104

Results for ASHM – another fund manager in the emerging markets sector…

ASHM_20160104

ASHM ratio plot…

ASHM_Ratio_20160104

The software has so far proven unsuccessful – with these input functions and dates – in generating strategies that surpass buy and hold to a significant degree for other relatively volatile sectors. I am thinking primarily of the banks, where the models have performed very poorly. For example, this is the strategy generated for RBS (still with the same dates and input functions).

RBS_20160104

For some other companies/sectors, the neural networks appear to be performing a random walk, e.g. GSK

GSK_20160104

… or be inconsistent across a sector (e.g. in the oil & gas sector, strategies for BG tend to beat buy and hold; those for RDSB do not).

This disparity in performance between the different sectors must largely be down to whether the dynamics of the training data persist post-training, i.e. if we are to beat buy and hold we need the price dynamics – the underlying trends – to continue into the future (or use our judgement on the likely course of the near future to pick the most relevant price-history period for training). For example, with the RIO strategy of the previous post: if the share price were to experience a strong recovery, the neural networks trained from 2011 to 2012 are likely to perform poorly; ones trained during 2009-11 might perform significantly better due to the strong gains that characterised that period. Detecting this change in trend as it occurs and swapping to different sets of trained networks might be the key to developing a more universally successful solution.

Preliminary Results with the Neural Network Ensemble

Please see the disclaimer. This blog does not contain investment advice.

My previous post discussed the plan for developing volatility-trading strategies using artificial intelligence – starting with a stochastic neural network ensemble-based approach. I should point out that I began working on this problem a long time before starting this blog and the neural network based approach is now quite well developed.

At this stage I could go into more details on the development of the software but I prefer to leave that for later and jump into some preliminary results. So far I’ve managed to incorporate most of the desired features that I listed in my previous posts; the exception being the adaptive voting. Plus there’s more work to do on some aspects including the network inputs – developing new inputs and determining which are most significant.

To date I've generated strategies (neural-network voting ensembles) for about 40-50 companies (nearly all my testing thus far has been with the FTSE 250 plus a few from AIM). Perhaps obviously, the candidate shares with the most potential for beating buy and hold are those that are most volatile. A high beta (a volatility metric) is the closest readily available measure, though not ideal. Of the companies tested thus far, there is quite a lot of variability in the performance of the strategies (even at similar levels of volatility in the share price). What is more, this variation in performance across companies appears to be consistent across several different training periods (at least with data since 2011; I haven't done much testing with data prior to then). It is not yet clear why there should be such variability.

To recap – in case you haven't read the previous posts – the neural network ensemble votes each day and provides a lower and upper price band; the algorithm only recommends buying (or selling) if the price moves outside the band. The way in which the band is calculated (by summing probabilities) means that sometimes there is only an upper or only a lower price (typically if selling or buying, respectively). Trading cycles between a buying state (when fully in cash) and a selling state (when fully invested). I have not explored shorting.

For the first trials, I wanted to see how well the ensemble of networks performs over several years of data – data that comes immediately after the training and back-test periods. With these trials, the networks are trained from Jan 1 2011 to Jan 1 2012 and back tested from Jan 1 2012 to July 1 2012 (more precisely, the algorithms simulate trading from the first available trading day after the start date up to the last trading day before the end date – for both the training period and the back-test period). The ensemble is then tested on the new data at different voting percentages (from 1st July 2012 up to the present date). Note that the simulations have ignored dividends as this data is not included in my data feeds; this may be a significant omission for some companies.

Note also that all the results discussed are entirely auto-generated – I select the company(ies), set up the training and backtest dates (3 dates in total, since the backtest starts when training finishes), select the set of inputs (into the networks), optionally adjust a few parameters … and press the run button and wait a few hours. I want to avoid biasing the results, so I don't do any manual filtering/selection of networks post training. The software algorithms automatically save networks that meet the performance criteria. Once the software has finished the training process, the saved networks are ready to be tested in voting mode.

I typically use an initial simulated investment of £3K. The black plot in the charts (see figure 1) is the value of the investment over time for buy and hold (and thus reflects a fixed multiple of the share price). The coloured lines represent the simulated values of the investment when following the ensemble’s trading advice, for different voting percentages (subject to the usual caveats when simulating trading – data errors and insufficient volume may make some of the trades infeasible).

The second type of plot is a ratio plot (see figure 2), where the vertical value reflects the increase/decrease in the number of shares held relative to buy-and-hold e.g. a value of 2 would indicate approximately twice as many shares are now held relative to the initial investment. Note that the horizontal periods of each chart represent being invested in (a constant number of) shares; the positive slopes represent the periods in cash when shares are sold and repurchased at a lower price; the negative slopes are where shares are repurchased at a higher price (so not desirable). A vertical value less than 1 means the simulation now has fewer shares than it started with. A break in the chart up to the present date indicates that the simulation is in cash, awaiting an opportunity to repurchase the shares.
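For clarity, the quantity plotted on the vertical axis could be computed roughly as in the sketch below; the treatment of the in-cash periods is my interpretation of the plot description rather than the exact code.

```python
# Sketch of the ratio plotted in the second chart type: shares currently held
# (or, when in cash, the shares the cash could currently buy) divided by the
# number of shares bought with the initial investment. The handling of cash
# periods is an assumption based on the description of the plot.

def shares_ratio(initial_shares, holding_shares, cash, price):
    """Ratio of the current (effective) holding to the initial holding."""
    effective_shares = holding_shares if holding_shares > 0 else cash / price
    return effective_shares / initial_shares

# e.g. started with 100 shares; now in cash after selling high while the price fell
print(shares_ratio(initial_shares=100, holding_shares=0, cash=3600.0, price=18.0))  # 2.0
```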

The following plots are for the simulated trades of RIO. This is one of the most consistent and promising performers over the years tested thus far, with mostly positive gains over a spread of voting percentages.

Training Dates: Jan 1 2011 to Jan 1 2012

Backtest Dates: Jan 1 2012 to July 1 2012

Inputs: Set_9

These are the simulated results of running the ensemble of neural networks for data immediately after the training and backtest, so from July 1 2012 to present date (figure 1). The legend shows the voting percentage. It shows that 15% was optimal for this simulation (this is atypical – the best voting percentage is more typically in the 25-40% range), with a current value above the 4300 line, whereas buy and hold has a current value below the 2050 line.

Fig1_20160103

Figure 1

This is the corresponding ratio plot for the RIO simulation (figure 2) – it shows the ratio of the number of shares currently held relative to the number of shares at the initial investment – i.e. the gain relative to buy and hold. For a share that has lost a considerable proportion of its value (since the simulation started), it is not surprising that the multiple is greater than 1 for RIO over this period – any strategy that stays out of the market for periods on a falling share price will do better than buy and hold.

Fig2_20160103

Figure 2

The power of voting…

I find this next illustration interesting (figure 3). It shows a list of the individual networks (from the ensemble of 66 voting networks), listed in descending order of gain in total value. I developed this view of the data to see the relative performance of individual networks over different timeframes (because the networks are stochastic when run individually, I've treated a probability value greater than 0.5 as true, and false otherwise).

It shows that the best network (from Jul 12 to present) had a gain of 20%, as did the next best two, followed by 19% for the following two, etc. So a simulation with the best network has a current value of ca. 3600 (a 20% gain relative to the initial investment of 3000); the worst performing network has a current value of <1600 (a 45% loss which is even worse than buy and hold). Of course, this is with hindsight; all the networks did well during training and backtesting and we have no idea from the outset which network will be the best or worst into the future. Compare this to the 15% ensemble voting current value of >4300 – which is a better performance than even the best individual neural network out of the 66. Of course, the 15% voting value is also unknown from the outset, but at least this value of the voting percentage takes an early lead, at least in this example for RIO. The same cannot be said for the ordering of the individual networks – the ordering can change significantly over different simulation time-frames.

Fig3_20160103

Figure 3

Optimal Voting Percentage During Training & Backtesting

The obvious question is: could the 15% optimal voting percentage be predicted from the performance of the ensemble over the training and backtest period? With this example of RIO, the optimal performance comes again from a 15% voting rate (figure 4). Unfortunately, this seems to be a fluke – most of the simulations I've done do not have the same optimal percentage during and after the training & backtest period. (Note that the gain in share value in this chart is largely meaningless – the networks were 'evolved' to optimise their performance over the data from this period – although too great a performance could indicate over-fitting to the training data, which would be detrimental to future predictions.)

Fig4_20160103

Figure 4

RIO turns out to be one of the best performing simulations (in terms of beating buy-and-hold and potentially making a gain at optimal, or near-optimal, voting percentages). Whether this continues to be the case presumably depends on whether the training data of 2011 to 2012 continues to be a good model for future share price dynamics. The 2011 to 2012 period could be crudely characterised as an overall downward share price path with periods of volatile sideways movement – a trend that has broadly continued to the current day (perhaps more downward than sideways!). If this trend ceases in the future – e.g. if the share price were to recover strongly – it is unlikely the ensemble would do well, as it has not experienced such a scenario during training. One test is to look back before the training period: 2009-2011 was characterised by a strong share price recovery following the '08 crash, with the share price more than tripling in value. Running the simulation over the 09-11 period (figure 5) shows that the ensemble underperformed buy-and-hold at all voting percentages for most of the time.

Fig5_20160103

Figure 5

Building an ensemble from networks trained during different periods may mitigate this effect – or may just dilute the information contained in the most recent data. This is something I hope to look at further at some point – developing a heuristic or perhaps using a reinforcement learning algorithm – to adapt the voting weights of networks or ‘buckets’ of networks.

For the next posts, I'll add some results for other companies from the FTSE…

Trading strategies, neural networks and genetic algorithms…

In the first post, I wrote about some of the features I hoped to implement into software for (simulating) the trading of share price volatility.

The first approach I want to try is to use neural networks where the weights within each neural network are adapted or ‘trained’ by a genetic algorithm over the course of hundreds of ‘generations’. Each neural network (from now on referred to as ‘network’) will be a ‘strategy’, providing buy (or sell) prices for this day only (this is a daily model; however it could be operated over different time periods, assuming the data is available).

As a very brief description of my proposed approach: a set of neural networks are first trained to fix their internal state (their set of weights). Once trained, each network provides buy (or sell) prices and probabilities. Each network does this by taking a set of inputs that are pre-processed & scaled data, mostly derived from past price data (the price data of whatever share we are investigating, but it could be based on other price ratios, volumes, dates, indicators, as well as previous states of the strategy – more on this in a later post). The inputs are next multiplied by the weights, summed and pushed through ‘sigmoid’ functions. This is repeated through several layers of the network. The final layer is the output layer. The output is selected to be the aforementioned high and low prices (for buy or sell), together with probabilities for each price. So if the share price moves outside the high-low price band then a decision is made to buy (or sell), subject to the probability for that band.
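A minimal sketch of that forward pass is given below. The layer sizes, the number of outputs and the mapping from the outputs to a price band are illustrative assumptions; only the weights, sums and sigmoid structure is taken from the description above.

```python
# Minimal sketch of the forward pass: inputs multiplied by weights, summed,
# passed through sigmoids layer by layer; the final layer is read as a low/high
# price band plus a probability for each price. Layer sizes and the output
# scaling are illustrative assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, weight_matrices):
    """inputs: 1-D array of pre-processed features; weight_matrices: one per layer."""
    activation = np.asarray(inputs)
    for w in weight_matrices:
        activation = sigmoid(w @ activation)
    return activation  # final (output) layer

def price_band(outputs, open_price):
    """Map four outputs in (0, 1) to a band around today's open (assumed scaling)."""
    low = open_price * (0.90 + 0.10 * outputs[0])   # buy price in [90%, 100%] of open
    high = open_price * (1.00 + 0.10 * outputs[1])  # sell price in [100%, 110%] of open
    return low, high, outputs[2], outputs[3]        # plus a probability for each price

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 12)), rng.normal(size=(4, 8))]  # 12 inputs -> 8 -> 4
print(price_band(forward(rng.normal(size=12), weights), open_price=30.0))
```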

Together the selected set of networks operates as a team or 'ensemble', voting on the price band and probabilities. By ordering the networks by their prices and summing the probabilities, deterministic prices are derived for different levels of 'voting'.
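To make the 'summing of probabilities' concrete, here is a sketch for the buying case; the ordering convention and the stopping rule are my interpretation for illustration, not the exact implementation.

```python
# Sketch of deriving a deterministic buy price at a given voting percentage.
# Each network supplies (buy_threshold, probability); a network would buy once
# the price falls to its threshold, so thresholds are visited from highest to
# lowest and probabilities accumulated until the required share of the
# ensemble has 'voted'. This convention is an interpretation, not the exact code.

def ensemble_buy_price(votes, voting_percent, ensemble_size):
    """votes: list of (buy_threshold, probability) pairs, one per network."""
    needed = voting_percent / 100.0 * ensemble_size
    total = 0.0
    for price, prob in sorted(votes, key=lambda v: -v[0]):  # highest thresholds first
        total += prob
        if total >= needed:
            return price   # buy if the market falls to this price or below
    return None            # the vote never fills: no buy price today

votes = [(29.5, 0.8), (29.8, 0.6), (30.1, 0.9), (30.4, 0.3)]
print(ensemble_buy_price(votes, voting_percent=40, ensemble_size=4))  # needs 1.6 'votes'
```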

The software and the networks will operate in one of three modes:

Training Mode

During training there is a population of, say, one thousand networks with randomly generated weights. Each network runs through every simulated trading day of the training period (typically 1 or 2 years); on each simulated trading day the software decides whether to buy (or sell) based on the network's price and probability outputs. After all networks have traded and their performance has been assessed, a new generation of networks is created using a genetic algorithm (GA). The GA selects pairs of networks from the existing generation based on their 'fitness' (how successful they were at trading) and performs a 'mating' operation to create a new generation of 'child' networks. Over the course of many generations, the GA slowly evolves the networks' weights to maximise the 'objective function' – the success at trading during the training interval. During the above process, each network is also 'back-tested' to see how well it performs against data that it has not encountered during training – typically the 7-12 months that immediately follow the training period. Only the networks that are most successful during the backtest period are saved into the ensemble used for voting.
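A skeletal sketch of that generational loop is shown below; the selection and mating operators, the parameter values and the placeholder fitness function are assumptions made for illustration (the real fitness is the simulated trading performance over the training period).

```python
# Skeletal sketch of the GA training loop: fitness-proportional selection,
# uniform crossover of weight vectors and Gaussian mutation. Operators,
# parameters and the placeholder fitness are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
POP_SIZE, N_WEIGHTS, GENERATIONS, MUTATION_STD = 200, 100, 50, 0.05

def fitness(weights):
    """Stub: in the real system this simulates trading over the training period
    with a network built from `weights` and returns its success (e.g. final value)."""
    return -float(np.sum((weights - 0.5) ** 2))  # placeholder objective

def select_pair(population, scores):
    """Fitness-proportional ('roulette wheel') selection of two parents."""
    probs = scores - scores.min() + 1e-9
    probs = probs / probs.sum()
    i, j = rng.choice(len(population), size=2, replace=False, p=probs)
    return population[i], population[j]

def mate(parent_a, parent_b):
    """Uniform crossover of the two weight vectors plus Gaussian mutation."""
    mask = rng.random(N_WEIGHTS) < 0.5
    child = np.where(mask, parent_a, parent_b)
    return child + rng.normal(0.0, MUTATION_STD, N_WEIGHTS)

population = rng.normal(size=(POP_SIZE, N_WEIGHTS))
for _ in range(GENERATIONS):
    scores = np.array([fitness(ind) for ind in population])
    population = np.array([mate(*select_pair(population, scores)) for _ in range(POP_SIZE)])
```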

Backtest Voting Mode

In voting mode, all the saved networks operate over a specified (past) date interval using ‘voting’ at a predetermined percentage level. Typically, the voting percentage is incremented – e.g. by 5% from say 20% to 50% voting – in order to determine a historically optimal voting percentage (over whatever date range has been selected). Adaptive voting, where successful networks are reinforced at the cost of unsuccessful networks, is a more complex alternative approach. It is reasonable to expect the ensemble of networks to perform well over the date period that was used for training and for backtesting (as they were individually selected based on their ability within this period). What is unknown is how well the ensemble performs after the backtest period has finished – and what percentage of voting amongst the ensemble produces the best returns.
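The sweep itself is straightforward; a sketch with a stand-in simulator might look like this:

```python
# Sketch of sweeping the voting percentage in backtest voting mode to find the
# historically best level. `simulate_ensemble` is a stand-in for the real
# simulator and is purely an assumption here.

def best_voting_percentage(simulate_ensemble, low=20, high=50, step=5):
    """Run the ensemble at each voting level; return (best level, its value)."""
    results = {pct: simulate_ensemble(voting_percent=pct)
               for pct in range(low, high + 1, step)}
    best = max(results, key=results.get)
    return best, results[best]

# e.g. with a toy simulator whose value peaks at 35% voting:
print(best_voting_percentage(lambda voting_percent: 3000 - (voting_percent - 35) ** 2))
```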

Live Mode  

In the live mode, the daily open price of the share is the trigger to calculate the low and high prices for the buy (or sell) for today. To operate in the Live mode, the voting percentage must be fixed to some value. By sorting all the networks in order of price, the code sums the probabilities for each network until the specified voting percentage is reached. Actual live prices can be pulled in from live data streams and an email sent when a buy or sell decision is reached. Or the software could interact with a share broker's 'API' to trade automatically without human intervention. I doubt I'd want to go that far – maybe with a virtual portfolio.
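The daily live check itself might look something like the sketch below; the state handling and the alert are illustrative, and the code does not talk to any data feed or broker API.

```python
# Sketch of the live-mode check: given today's band from the ensemble at the
# fixed voting percentage, decide whether the latest price triggers an action.
# The state handling and the alert are illustrative assumptions.

def live_check(state, latest_price, buy_price, sell_price):
    """state is 'buying' (fully in cash) or 'selling' (fully invested)."""
    if state == "buying" and buy_price is not None and latest_price <= buy_price:
        return "BUY"
    if state == "selling" and sell_price is not None and latest_price >= sell_price:
        return "SELL"
    return None

action = live_check(state="buying", latest_price=29.6, buy_price=29.8, sell_price=None)
if action:
    print(f"ALERT: {action} signal triggered")   # e.g. send an email at this point
```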

Confidence

Which leads to my next thought – even if everything appears to work well, a key consideration is having confidence that all the software is working as expected. While some bugs are inevitable, there absolutely must not be any bias or foresight required in the process. Foresight arises when knowledge of future price data is required to derive today’s price and probability bands. What I call bias arises when the order in which code operates affects the outcome – e.g. if the ordering of the code implicitly assumes the daily high always comes before the daily low.

One way to be reasonably confident is to check that the network outputs generated for today in the 'live' mode are repeated tomorrow (and subsequently) in the Backtest Voting Mode. I could write a load more on this subject – together with the difficulties/feasibility of replicating the simulated returns in the 'real world' – but that's probably too much waffle for one sitting.

So what all this is leading to is, hopefully, a system that can recognise complex patterns in data and use this ability to suggest a price band at the start of each trading day (in the Live mode). If past data patterns prove to be a predictor of future data patterns then hopefully – and on average – there may be a long-term out-performance relative to a buy-and-hold approach.

Next post…

Algorithms, AI and Trading Volatility

Can algorithms help improve the performance of investing in the stock markets? With something as random and uncertain as stock prices, I thought a good place to start is the set of algorithms that are collectively labelled as ‘Artificial Intelligence’ (AI) though that won’t preclude the use of other algorithms or statistical/mathematical approaches if more appropriate.

Algorithms could help in deciding what to buy – if they can be used to collect, collate, filter and even analyse data gleaned from myriad financial websites and the like – looking for historic statistical correlations and patterns. They may also help in determining when to buy (and sell) a particular share. This is what I want to investigate first: price volatility, and whether it can be exploited profitably.

By exploiting volatility I mean repeatedly buying and selling a shareholding at short-term lows and highs respectively, beating a buy-and-hold strategy. Bettering buy and hold implies increasing the total number of shares held over time but does not necessarily mean increasing the value of the entire holding (i.e. making a profit). But if the approach can be applied to a basket of shares that are sufficiently diverse (i.e. their daily price movements are not too highly correlated) then hopefully it will also be profitable, at least over the medium/long term…

Strategies

Share prices are subject to tomorrow's events, which are largely unpredictable. So the objective is not to build algorithms that try to predict the immediate future; it is to use algorithms that can build strategies that, on average, lead to greater returns than a buy-and-hold strategy over a given timeframe.

For example, perhaps the simplest strategy is to repeatedly buy and sell a share at fixed prices, say buy at 10 and sell at 12. Within a given date range there will be two prices that maximise returns. Determining the two optimal prices for a period in the past is not entirely trivial – it is typically a global optimisation problem, meaning that there may be many solutions that are locally optimal. For example, in addition to the 10/12 buy/sell solution, there could be an alternative solution where we buy and sell at 10.5 and 11.5 – for a smaller profit each time but a greater frequency over the same time period, leading to broadly similar returns.
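A brute-force sketch of this simplest strategy is shown below, searching a small grid of price pairs over a made-up series; the grid resolution and the toy prices are assumptions for illustration.

```python
# Brute-force sketch of the 'fixed buy/sell price' strategy: simulate every
# pair of price levels on a past series and keep the pair that maximises the
# final value. The toy price series and grid are illustrative assumptions.
import itertools

def simulate_fixed(prices, buy_at, sell_at, cash=3000.0):
    shares = 0.0
    for p in prices:
        if shares == 0.0 and p <= buy_at:
            shares, cash = cash / p, 0.0      # buy with all cash
        elif shares > 0.0 and p >= sell_at:
            cash, shares = shares * p, 0.0    # sell the whole holding
    return cash + shares * prices[-1]         # mark any open holding to market

def best_pair(prices, levels):
    pairs = [(b, s) for b, s in itertools.product(levels, levels) if s > b]
    return max(pairs, key=lambda bs: simulate_fixed(prices, *bs))

prices = [11, 10, 12, 10.5, 11.5, 10, 12, 11]                    # toy oscillating series
print(best_pair(prices, levels=[x / 2 for x in range(20, 26)]))  # levels 10.0 .. 12.5
```

On this made-up series the narrower 10.5/11.5 pair happens to beat 10/12, echoing the point about multiple locally optimal solutions.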

A slightly more sophisticated strategy would be a 'chartist' approach of trading on price movements in and out of rolling-average bands – or on the value of a dimensionless indicator such as 'Williams %R' or 'RSI' (e.g. buy at 30 and sell at 70). Again, this will be an optimisation problem where we can derive optimal values over a particular time frame.
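As a concrete (and simplified) example of such an indicator rule, the sketch below uses a basic, non-smoothed RSI with the 30/70 thresholds mentioned above; the 14-day window and this particular RSI variant are assumptions.

```python
# Sketch of an indicator-threshold rule using a simple, non-smoothed RSI:
# buy when the RSI falls below 30, sell when it rises above 70. The 14-day
# window and this particular RSI variant are illustrative assumptions.

def rsi(prices, window=14):
    changes = [b - a for a, b in zip(prices[-window - 1:-1], prices[-window:])]
    avg_gain = sum(c for c in changes if c > 0) / window
    avg_loss = sum(-c for c in changes if c < 0) / window
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def threshold_signal(prices, buy_below=30.0, sell_above=70.0):
    value = rsi(prices)
    if value < buy_below:
        return "BUY"
    if value > sell_above:
        return "SELL"
    return None

prices = [30 - 0.3 * i for i in range(20)]   # steadily falling toy series
print(threshold_signal(prices))              # falling prices -> RSI near 0 -> 'BUY'
```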

A better approach might be one which can take in a plethora of indicators derived from the share price, volume, other shares and indices, exchange rates etc. and look for repeating patterns that are too complex to see visually.

While it is easy to look backwards and find a strategy that would have done well, is it possible to reliably derive strategies that will remain profitable for some time into the future? Can we build strategies that deal with uncertainty and adapt to the underlying dynamics of the market as it changes?

Features List

So if we assume that volatility can be exploited, what algorithms can be used? Before considering that, I'm going to list the features that might help exploit any data patterns – if they exist. It might not be feasible to implement them all (in the software that will be developed), not at first anyway.

  • The software will keep a pool of strategies – randomly generated at first – that will evolve over a series of generations during a training process.
  • Training will involve trading a strategy from the pool using historic share price data. The software will simulate trading using the strategy (including trading costs, taxes and spread in each trade), hopefully making a profit with each buy/sell cycle that can be used to purchase additional shares on the next cycle. If the strategy is successful, the number of shares held should increase, on average, as well as their total value. The strategies will be adapted over the course of many iterations (generations) in a way that improves performance over the training period.
  • Training of the strategy will take place from date D1 to date D2 (D2 after D1); the strategy's performance will then be tested over a period not used for training – typically D2 to D3 (where D3 is after D2).
  • Using a moving window between these dates (D1 and D2) should help reduce over-fitting to the data. The performance of the strategy will then be the average over the different training windows.
  • Perform many runs to build a set of strategies. Save the 'best' strategies for each run (best being a metric that is predominantly profit but probably also needs to be behaviour-based – e.g. the proportion of trades in the right direction, or risk-based penalties).
  • The strategy will be in the form of a software ‘model’ that receives inputs and that outputs a buy or sell decision. Inputs into the models will be derived from the past price time series for the share price, possibly past volume as well as other related time-series (other share prices, the benchmark index, volatility indices, commodity prices, exchange rates) or could be time-based (period of the year, period of the month, day of week) etc.
  • Stochastic strategies – strategies containing a probability output – will allow different outcomes (with a specified probability) for the same set of inputs. This may allow the strategy to better model the behaviour of the market, since broadly similar (known) inputs can lead to very different outcomes.
  • Use voting to combine the outcome of the multiple strategies. For stochastic strategies, the output probabilities will be summed to a deterministic value – (i.e. instead of counting yes/no votes, we will be counting probabilities (fractions between 0 and 1) of yes/no votes).
  • Adapt the proportion or weighting of individual strategies over time as the environment changes. i.e. rather than adapt the strategies themselves over time, add new strategies and up/down-vote successful/unsuccessful strategies. We could keep ‘buckets’ of strategies and in addition to adapting the content of each bucket, switch between the buckets as the dynamics of the market change (and hopefully the market repeats past behaviour). There are other approaches on the same theme, e.g. keep to one-strategy, one-vote but adapt the voting percentage over time.

Of course, even with most or all these features implemented, it is entirely possible that the resulting software could just be a very complex random number generator. If that’s the case, the simulated profits will quickly turn negative as trading costs eat into the initial capital (or the software does not trade, either not investing or buying and holding).

For future posts, I plan to discuss the software & algorithms that I have been developing and that implement most of the features discussed above…

Next post…