Friday, March 11, 2016

Trying Electrophysiological Analyses on Stock Market Data

In this post I'm going to analyze data from the stock market using some of the techniques that I typically use to understand brain data. I can't think of a very good reason for why this makes sense or why it should work... but I'm going to do it anyways and you can't tell me what to do.
What you get when you google "crazy stock market people"
A lot of the neuroscience analyses that I do involve analyzing how a signal changes over time. Specifically, I look at what voltage does inside or next to brains. Basically, I investigate how different parts of the brain interact with each other based on how their local voltages change over time.

This led me to think about how different stocks in the stock market might interact. Like voltage recordings in a brain, stock prices move up and down over time, and they are spread out over a wide range of industries in the economy. It also seems pretty likely that they interact with each other in interesting ways that I might be able to characterize using certain analyses. Let's see if I can do that.

Like always, my code for these analyses is on my github here: https://github.com/torbenator/ephys-for-stocks

Getting historical stock market data:

I found this helpful website for how to algorithmically download historical stock market data here: http://www.quantshare.com/sa-43-10-ways-to-download-historical-stock-quotes-data-for-free
Unfortunately, I couldn't get any of the techniques that they described to work except for one, so that was nice.

First problem: What data do I analyze?


There are lots of stocks and I don't know anything about stocks. But I saw a commercial for this thing called Sector SPDRs ("sector spiders", http://www.sectorspdr.com/sectorspdr/) when I was watching lacrosse once. From what I understand, this company organizes stocks into categories as a way of diversifying your portfolio. This seems like a reasonable way to split the data into a stock market version of "functional networks". I picked 5 random stocks from each category as a small sample to analyze. I could have picked more, but I got tired of googling their tickers and wanted to get going. Of the 50 I tried to download, I got 41: some stocks had changed names during the past year, which made my parsing method fail. Also, I could only download one year of data using my scraping method. Such is data science.


It's time for a picture. Here's what the data looks like:


First Analysis: How do stocks correlate with each other? 
i.e. when one stock moves up, does another one reliably move up with it, or down?

If we correlate all the data we get this:

From this visualization I can see that most stocks correlate positively with each other. This means that most stocks tend to move together - when one goes up, the other goes up as well. But, for some reason, some stocks seem to have really negative correlations across the board.


Here's another (better) way of visualizing the same data but squeezing the y-axis down:
Side-note for Python users: There are some ridiculous colors built into matplotlib (http://stackoverflow.com/questions/22408237/named-colors-in-matplotlib). Colors I used in this plot included 'lemonchiffon', 'peachpuff', and 'papayawhip'.
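For anyone who wants to try the correlation step at home, here is a minimal Python sketch. It assumes the closing prices sit in a NumPy array with one row per stock and one column per trading day; that layout and the helper name `correlation_matrix` are my own assumptions, not necessarily what the linked repo does. `np.corrcoef` treats each row as a variable and returns the Pearson correlation between every pair of rows:

```python
import numpy as np

def correlation_matrix(prices):
    """Pearson correlation between every pair of stocks.

    prices: 2-D array of shape (n_stocks, n_days), one price series per row
    (a layout assumed for this sketch).
    """
    prices = np.asarray(prices, dtype=float)
    # np.corrcoef treats each row as one variable -> (n_stocks, n_stocks) matrix
    return np.corrcoef(prices)

# Toy example with made-up prices: b moves exactly with a, c exactly against it
days = np.arange(10, dtype=float)
a = 100 + days
b = 50 + 2 * days
c = 200 - days
corr = correlation_matrix(np.vstack([a, b, c]))
print(np.round(corr, 2))
```

Because the toy prices are perfectly linear in each other, every off-diagonal entry comes out as exactly +1 or -1; real prices land somewhere in between.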

When I sort the stocks by their mean correlation with all other stocks, I can find the ones that move most in step with the rest of the sample.
The five stocks with the highest mean correlations are (in order):

Dentsply, Republic Services, Facebook, Nasdaq, and Northrop Grumman

Because they correlate most strongly with everything else in my sample, they are the best indicators of general market activity here. In other words, these stocks speak best for the overall trend. Something like Nasdaq makes sense because it's a reflection of a huge chunk of the economy, but who would have thought that Dentsply, a dental prosthetics company, would be the best indicator of market activity in this sample?
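The ranking by mean correlation can be sketched in a few lines. This assumes a square (stocks by stocks) correlation matrix plus a list of ticker labels in the same order; the function name, tickers, and numbers below are made up for illustration. The diagonal is dropped before averaging, since every stock correlates perfectly with itself and would otherwise inflate every mean equally:

```python
import numpy as np

def rank_by_mean_correlation(corr, tickers):
    """Sort tickers from highest to lowest mean correlation with all *other* stocks."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    # Drop the diagonal (self-correlation is always 1), leaving n - 1 values per row
    off_diag = corr[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    mean_corr = off_diag.mean(axis=1)
    order = np.argsort(mean_corr)[::-1]  # descending
    return [(tickers[i], float(mean_corr[i])) for i in order]

# Hypothetical tickers and correlations, for illustration only
tickers = ["AAA", "BBB", "CCC", "DDD"]
corr = np.array([
    [1.0, 0.8, 0.6, -0.2],
    [0.8, 1.0, 0.5, -0.3],
    [0.6, 0.5, 1.0, -0.4],
    [-0.2, -0.3, -0.4, 1.0],
])
ranking = rank_by_mean_correlation(corr, tickers)
print("most representative:", ranking[:2])
print("most contrarian:", ranking[-2:])
```

With 41 real stocks, `ranking[:5]` and `ranking[-5:]` would give the top-five and bottom-five lists.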

Conversely - what's up with those stocks that correlate negatively with everything?

The five stocks with the lowest mean correlations are (in order):

NiSource, Exxon, Newmont Mining, Cigna, and Phillips 66

I remember my finance friends telling me that gas prices were generally indicative of a growing economy; when people buy more gas it means they're building, manufacturing, and traveling. But in this sample the opposite happens: when the stocks of gas companies go up, most other stocks go down. Weird!
Perhaps this is due to my tiny sample size of stocks and extremely short time window. I bet with more data this would get cleaned up. Moving on.



Next question: What are the relationships between stocks within categories?
i.e. are technology stocks all highly correlated with each other while materials stocks aren't?



This figure is pretty similar to the last one but you'll notice that the correlations are generally higher.

Utilities, Financial Services and Energy were by far the least consistent, while Technology, Consumer Discretionary, and Real Estate were the most consistent. Nothing too crazy here but I guess it's pretty neat.
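One way to put a number on within-category consistency is the mean correlation over all pairs of stocks that share a category. This is my assumption about how to summarize it, not necessarily how the figure above was made; the category labels and the correlation matrix below are invented:

```python
import numpy as np

def within_category_consistency(corr, categories):
    """Mean pairwise correlation among stocks sharing a category label.

    corr: (n_stocks, n_stocks) correlation matrix; categories: one label per
    stock, in the same order as the rows of corr.
    """
    corr = np.asarray(corr, dtype=float)
    categories = np.asarray(categories)
    scores = {}
    for cat in np.unique(categories):
        idx = np.flatnonzero(categories == cat)
        if idx.size < 2:
            continue  # a lone stock has no within-category pairs
        sub = corr[np.ix_(idx, idx)]
        iu = np.triu_indices(idx.size, k=1)  # each pair once, diagonal excluded
        scores[str(cat)] = float(sub[iu].mean())
    return scores

# Made-up example: two "Tech" stocks that track each other, two looser "Energy" stocks
corr = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.3],
    [0.0, 0.1, 0.3, 1.0],
])
scores = within_category_consistency(corr, ["Tech", "Tech", "Energy", "Energy"])
ordered = sorted(scores, key=scores.get, reverse=True)  # most to least consistent
print(ordered, scores)
```

Using only the upper triangle means each pair is counted once and the trivial self-correlations never enter the average.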


Now onto some ~fancier~ analyses:


What does the power spectrum of stock data look like?

Spectral analyses are things that nerds use to understand periodic signals. And they're fucking rad. If you want a brief explanation of spectral analyses, go here: 
https://github.com/voytekresearch/tutorials/blob/master/Power%20Spectral%20Density%20and%20Sampling%20Tutorial.ipynb
This section might not make much sense if you aren't familiar with frequency domains.

The power spectra of one year of all my stock data look like this:
Power in the signal is concentrated at low frequencies and falls off with a 1/f slope. (Look up 1/f slopes if you don't know what those are. They are very cool.) However, there is also a spike at the high-frequency end. This means that all of the stocks in my sample generally change over large time windows, but day-to-day fluctuations are also very important. Because these peaks sit at either end of my frequency range, there is probably a lot of information at time scales I don't have access to: with one sample per trading day, the fastest fluctuation I can resolve has a period of two days, and with one year of data, the slowest has a period of about a year. Clearly a lot goes on within each day to influence a stock's value, but all of that information gets clumped into the tail here. Likewise, tracking a stock's value over a multi-decade window would provide better estimates of power at longer time scales.

Personally, I can conceptualize this observation better with the following figure where I (unorthodoxly for neuroscience) flipped the x axis. To me this makes these time windows a little more intuitive.
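A minimal sketch of how spectra like these can be computed with Welch's method (`scipy.signal.welch`), assuming one closing price per trading day, so frequencies come out in cycles per trading day and top out at 0.5. The random-walk prices are stand-in data, and `nperseg=128` is an arbitrary choice, not a setting taken from the repo. The last line re-expresses frequency as period in trading days, which is one way to read the spectra in terms of time windows:

```python
import numpy as np
from scipy.signal import welch

def stock_power_spectra(prices, nperseg=128):
    """Welch power spectral density of each price series (one row per stock).

    Assumes one sample per trading day (fs = 1), so frequencies are in
    cycles per trading day and the highest one is 0.5 (a two-day period).
    """
    prices = np.asarray(prices, dtype=float)
    nperseg = min(nperseg, prices.shape[-1])  # segments can't be longer than the data
    freqs, power = welch(prices, fs=1.0, nperseg=nperseg, axis=-1)
    return freqs, power

# Stand-in data: 5 random-walk "stocks" over roughly one trading year (252 days)
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=(5, 252)), axis=1)
freqs, power = stock_power_spectra(prices)

# Same spectra indexed by period (in trading days) instead of frequency; skip f = 0
periods_in_days = 1.0 / freqs[1:]
```

Longer segments give finer frequency resolution but fewer segments to average, which is the same data-length tradeoff that limits what one year of daily prices can show.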

After looking at the spectra of all of the data, it's clear that there is a lot of variation in the 1/f slope. Some stocks are flatter and some are steeper. This has real meaning because it tells us how much of a stock's change in value occurs over different ranges of time: a steeper slope means more of the movement happens over long time scales and less of it from day to day. Steeper slopes indicate that a stock is a more stable investment, while flatter slopes indicate that it is more volatile. I happen to have worked on an algorithm that automatically calculates the 1/f slope in these kinds of data in neuroscience, so I ran it on this, because I guess that's the point of this post.
Exponents of 1/f slopes in stock data 


Here we can clearly see that stocks have a wide range of slopes. Furthermore, there seem to be trends within certain stock categories. Consumer Discretionary stocks tend towards steeper slopes, indicating stability, while Utility stocks seem to have flatter slopes, indicating volatility.
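The algorithm mentioned above is more involved than this, but the basic idea of a 1/f exponent can be sketched as a straight-line fit of log power against log frequency: if power falls off as 1/f^χ, the fitted slope is -χ. This plain log-log fit is a simplified stand-in, not the algorithm described above, and the fitting range (0.02 to 0.2 cycles per day) and random-walk test data are arbitrary choices:

```python
import numpy as np
from scipy.signal import welch

def fit_one_over_f_exponent(freqs, power, f_min, f_max):
    """Fit log10(power) = b - chi * log10(f) over [f_min, f_max]; return chi.

    A plain least-squares line in log-log space (a simplified stand-in,
    not a full spectral-parameterization algorithm).
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    mask = (freqs >= f_min) & (freqs <= f_max)  # also excludes f = 0 when f_min > 0
    slope, _intercept = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return -slope  # larger chi = steeper spectrum = more power at slow time scales

# Test series: a long random walk, whose spectrum falls off roughly as 1/f**2
rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=5000))
freqs, power = welch(walk, fs=1.0, nperseg=256)
chi = fit_one_over_f_exponent(freqs, power, f_min=0.02, f_max=0.2)
print(f"estimated exponent: {chi:.2f}")
```

A random walk should come out near 2. Under this convention, a flatter spectrum (smaller exponent) means relatively more of the movement happens from day to day.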

Phase amplitude coupling in the stock market

Phase amplitude coupling is an analysis you can do on electrophysiological data to see how oscillations of different frequencies interact. Specifically, you can see how the phase of a slow oscillation influences the amplitude of a fast oscillation. (see https://github.com/voytekresearch/tutorials/blob/master/Phase%20Amplitude%20Coupling%20Tutorial.ipynb) 
We have pretty good reasons to believe this is worth studying in brains. I have no idea why it should make any sense for stock market data, but it makes pretty-looking figures. Here is the mean comodulogram (phase amplitude coupling calculated for many combinations of phase and amplitude frequencies) of all of the stocks:


Interestingly, there seems to be some phase amplitude coupling in the stock market! The amplitude of 2-3 month oscillations interacts really strongly with the phase of three-quarter-year oscillations. I have no intuition for why this would happen in reality. It probably happened because I duplicated (and left-right flipped) the data in order to have enough samples to run this analysis. I don't believe this graph, but I think it looks pretty cool.
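Here is a sketch of one common way to measure phase amplitude coupling: band-pass the signal twice, take the phase of the slow band and the amplitude envelope of the fast band with the Hilbert transform, and compute a normalized mean vector length. This is not necessarily the PAC measure behind the comodulogram above; the band edges, filter order, and synthetic signal are all made up. Bands this slow contain only a few cycles per year of daily data, so the test signal is much longer (4000 samples) than a year of real prices:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass(x, low, high, fs, order=2):
    """Zero-phase Butterworth band-pass filter (edges in the same units as fs)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def mean_vector_length(x, phase_band, amp_band, fs):
    """Normalized mean vector length: near 0 = no coupling, up to 1 = strong coupling."""
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs=fs)))  # slow-band phase
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs=fs)))        # fast-band envelope
    return np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)

def comodulogram(x, phase_bands, amp_bands, fs):
    """PAC for every (phase band, amplitude band) pair; rows = amplitude bands."""
    return np.array([[mean_vector_length(x, pb, ab, fs) for pb in phase_bands]
                     for ab in amp_bands])

# Synthetic test: the fast rhythm's amplitude follows the slow rhythm's phase (or not)
fs = 1.0  # one sample per "trading day"
t = np.arange(4000)
rng = np.random.default_rng(2)
slow = np.sin(2 * np.pi * 0.01 * t)
fast = np.sin(2 * np.pi * 0.1 * t)
noise = 0.1 * rng.normal(size=t.size)
coupled = slow + 0.5 * (1 + 0.8 * slow) * fast + noise
uncoupled = slow + 0.5 * fast + noise

pac_coupled = mean_vector_length(coupled, (0.005, 0.02), (0.07, 0.14), fs)
pac_uncoupled = mean_vector_length(uncoupled, (0.005, 0.02), (0.07, 0.14), fs)
cm = comodulogram(coupled, [(0.005, 0.02)], [(0.07, 0.14), (0.2, 0.3)], fs)
print(round(pac_coupled, 2), round(pac_uncoupled, 2))
```

The coupled signal should score well above the uncoupled one. Looping over a finer grid of bands, as `comodulogram` does, gives the kind of matrix a comodulogram plot shows.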


Conclusions and Future Directions

Do not take anything I did here seriously. Please. I do not claim to be a digital signal processing, financial, or even neuroscience expert. I ran a handful of hacky analyses on an extremely limited amount of data. This was mostly a proof of concept to see if I could find anything at all. That being said, there's something here, and these analyses could be helpful jumping-off points. One year of data on only 41 stocks simply isn't enough to characterize much, but it's clear that we can use neuroscience-inspired analyses to identify:

  • Stocks that are markers or outliers of general market or category activity 
  • Consistency among stocks within and between categories
  • The stability of a stock's value over time using spectral analyses


I am interested in pursuing this a little further. Does anyone know how to get more historical stock data without paying? Does anyone reading this know anything about finance and would you be willing to educate me on how to improve the nonsense that I ran here?



Again all data and scripts that I used to run this analysis are on my github here: https://github.com/torbenator/ephys-for-stocks
