Modeling Stock Market Behavior

In the finance world, there’s some debate about whether the daily closing prices of stock market indices convey useful information. Some financiers believe that the daily close reflects market trends and affects the probability of realizing a good return. Others disagree, claiming that day-to-day movements in the stock market are completely random and convey no useful information. If the skeptics are right, then the number of consecutive daily declines before the market rises again should follow a geometric distribution. In this post I’ll explain why a good fit of the geometric model would suggest that daily stock market fluctuations are random, and then test the model empirically.

Suppose the outcome of some event is binary, and success occurs with probability p. Failure must then occur with probability 1 – p. A geometric random variable models the number of independent trials needed to obtain the first success: it takes on the value k when the first success occurs on the kth trial. Because the trials are assumed to be independent, we can write the probability mass function of the random variable X as follows:

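$$P(X = k) = P(\text{failures on trials } 1, \ldots, k-1 \text{ and a success on trial } k) = P(\text{failures on trials } 1, \ldots, k-1) \cdot P(\text{success on trial } k)$$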

We used the independence assumption to rewrite the probability of the event “k – 1 failures and then a success on trial k” as the product of the probabilities of two groups of events, namely the k – 1 failures and the one success. Now we use the fact that each trial succeeds with probability p (and, again, the independence assumption) to write the following:

$$P(X = k) = (1-p)^{k-1}\,p, \qquad k = 1, 2, 3, \ldots$$
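As a quick numerical check of this formula, here is a minimal Python sketch. The function name `geometric_pmf` is hypothetical, chosen for illustration; it evaluates P(X = k) and confirms that the probabilities over k = 1, 2, … add up to essentially 1.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) = (1 - p)^(k - 1) * p: k - 1 failures, then a success on trial k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return (1 - p) ** (k - 1) * p


p = 0.3
print(geometric_pmf(1, p))            # success on the very first trial: just p
print(round(geometric_pmf(3, p), 4))  # 0.7 * 0.7 * 0.3 = 0.147
# Summing over many values of k gets arbitrarily close to 1.
print(round(sum(geometric_pmf(k, p) for k in range(1, 200)), 10))
```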

To model the behavior of the stock market as a geometric random variable, assume that on day 1 the market has fallen from the previous day’s close. We’ll call a fall in the closing price a “failure,” which occurs with probability 1 – p, and a rise a “success,” which occurs with probability p. Let the geometric random variable X be the number of declines, counting the day-1 decline, before the market rises; equivalently, X is the number of trading days after day 1 up to and including the first rise. For example, if the market rises on day 2, X takes on the value 1, because there was only one decline (the day-1 failure) before the rise (success). Similarly, if the market declines on days 2, 3, and 4 and rises on day 5, then it has declined on four occasions before rising, so X takes on the value 4. Keep in mind that the formulation stipulates a decline on day 1, so falls on days 2, 3, and 4 make a sequence of four failures, not three.
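To make the setup concrete, here is a minimal sketch of how a series of daily closes could be turned into observed values of X. The function name `decline_run_lengths` is hypothetical, and two handling choices are assumptions rather than necessarily what was done for the data below: an unchanged close is counted as a failure (not a rise), and a run of declines still in progress at the end of the series is dropped because its closing rise was never observed.

```python
def decline_run_lengths(closes):
    """Return one value of X for every run of declines that ends in a rise.

    X counts the declines in the run, including the initial (day 1) decline.
    Assumption: a close that is not higher than the previous close is a failure.
    """
    runs = []
    declines = 0
    for prev, cur in zip(closes, closes[1:]):
        if cur > prev:  # rise = success
            if declines > 0:  # only a rise that follows a decline completes a run
                runs.append(declines)
            declines = 0
        else:  # decline (or unchanged close, by assumption) = failure
            declines += 1
    # A trailing run with no closing rise is incomplete and is not recorded.
    return runs


# Made-up closes: fall, rise, fall, fall, fall, rise, rise
print(decline_run_lengths([100, 99, 101, 100, 98, 97, 99, 100]))  # [1, 3]
```

The second run matches the text’s convention: three consecutive declines (the first counting as “day 1”) followed by a rise give X = 3.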

To determine whether a geometric model fits the daily behavior of the stock market, we have to estimate the parameter p. The question we are addressing is whether the market’s daily price movements behave like a geometric random variable. The geometric distribution is really a family of distributions, one for each value of p between 0 and 1, so the model itself says nothing about how likely the market is to rise on a given day; it describes how X behaves once p is fixed. The value of p may be of interest for other questions, but here its job is to pin down a realistic geometric model that we can compare to empirical stock market data. If the data fit the geometric model, the implication is that the market rises and falls randomly, with a constant probability of success each day. This would suggest that daily closing quotes carry no information about future moves: whether today’s close is up or down does not depend on what the market did on previous days. One could say that, if this model fits, stock markets don’t “remember” yesterday. That sounds a lot like the memoryless property, P(X > s + t | X > s) = P(X > t), which characterizes the exponential distribution and, among discrete distributions, the geometric distribution itself. The two ideas are related but not identical: the memoryless property is a statement about the distribution of X, while “not remembering yesterday” here refers to the independence of daily moves, so we should be careful not to confuse them.
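The claim that independent daily moves with a constant probability of a rise produce geometric run lengths can be checked by simulation. The sketch below is illustrative only: the rise probability 0.55, the 200,000 simulated days, the seed, and the function name `simulate_decline_runs` are arbitrary choices, not values from this post.

```python
import random


def simulate_decline_runs(p_rise, n_days, seed=0):
    """Simulate independent daily moves and return the length of each run of declines."""
    rng = random.Random(seed)
    runs, declines = [], 0
    for _ in range(n_days):
        if rng.random() < p_rise:  # each day rises with the same probability, independently
            if declines > 0:
                runs.append(declines)
            declines = 0
        else:
            declines += 1
    return runs


p_rise = 0.55
runs = simulate_decline_runs(p_rise, 200_000)
for k in range(1, 6):
    simulated = runs.count(k) / len(runs)
    geometric = (1 - p_rise) ** (k - 1) * p_rise
    print(f"k={k}: simulated {simulated:.4f}  geometric {geometric:.4f}")
```

Up to sampling noise, the simulated proportions should land close to the geometric column.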

Once we have some empirical data, we’ll need to estimate the probability of success p, so let’s solve the general case first and compute an actual value from the data afterwards. There is no single way to estimate a parameter, but one good way is to use its maximum likelihood estimator. The idea is simple, though the computation is sometimes difficult: to estimate p by maximum likelihood, we find the value of p under which the observed sample is most likely to have occurred. In other words, we maximize the “likelihood” that the sample data come from a distribution with parameter p. To do this, we use the likelihood function, which, for independent observations, is the product of the probability density (or, for a discrete distribution, mass) function of the underlying distribution evaluated at each sample value:

$$L(\theta) = \prod_{i=1}^{n} f(k_i;\, \theta)$$

For our model, we just substitute the probability mass function of a geometric random variable for the generic pdf above and replace θ with p, the probability of success:

$$L(p) = \prod_{i=1}^{n} (1-p)^{k_i - 1}\, p = p^{n} (1-p)^{\sum_{i=1}^{n} k_i - n}$$
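Collapsing the product into p^n (1 – p)^(Σk – n) is just bookkeeping with exponents. The sketch below evaluates both forms on a small made-up sample to show they agree; the function names and the sample are hypothetical, for illustration only.

```python
import math


def likelihood_product(p, ks):
    """L(p) as a literal product of geometric probabilities, one factor per observation."""
    return math.prod((1 - p) ** (k - 1) * p for k in ks)


def likelihood_closed_form(p, ks):
    """The same likelihood after collecting exponents: p^n * (1 - p)^(sum(k) - n)."""
    n = len(ks)
    return p ** n * (1 - p) ** (sum(ks) - n)


sample = [1, 3, 1, 2, 1]  # made-up values of k
for p in (0.3, 0.5, 0.7):
    print(p, likelihood_product(p, sample), likelihood_closed_form(p, sample))
```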

To find the maximum likelihood estimate of p, we maximize the likelihood function with respect to p; that is, we take its derivative with respect to p and set it equal to 0. However, it’s computationally simpler to work with the natural logarithm of the likelihood function. This won’t affect the value of p that maximizes L(p): the natural logarithm is a strictly increasing function, so L(p) and ln L(p) reach their maximum at the same value of p. Sometimes you’ll hear of “log-likelihood functions,” and this is precisely what they are: the log of a likelihood function, taken to make the calculus easier.

$$\ln L(p) = n \ln p + \left(\sum_{i=1}^{n} k_i - n\right) \ln(1-p)$$
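To see numerically that taking logs does not move the maximizer, the sketch below evaluates both L(p) and ln L(p) on a grid of p values for the Dow Jones sample summarized later in this post (n = 128 observations whose k values sum to 221) and reports where each is largest. The grid spacing of 0.0001 is an arbitrary choice; a grid search stands in for the calculus here only as a check.

```python
import math

n, total_k = 128, 221  # sample size and sum of the k values from the Dow Jones table


def likelihood(p):
    return p ** n * (1 - p) ** (total_k - n)


def log_likelihood(p):
    return n * math.log(p) + (total_k - n) * math.log(1 - p)


grid = [i / 10_000 for i in range(1, 10_000)]  # p = 0.0001, ..., 0.9999
best_L = max(grid, key=likelihood)
best_logL = max(grid, key=log_likelihood)
print(best_L, best_logL, n / total_k)
```

Both searches land on the same grid point, the one closest to n / Σk.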

Differentiating this function is much easier than differentiating the likelihood function itself. Setting the derivative equal to 0 and solving for p gives:

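$$\frac{d}{dp}\ln L(p) = \frac{n}{p} - \frac{\sum_{i=1}^{n} k_i - n}{1-p} = 0$$

$$\hat{p} = \frac{1}{\bar{k}}, \qquad \bar{k} = \frac{1}{n}\sum_{i=1}^{n} k_i \tag{1}$$

$$\hat{p} = \frac{n}{\sum_{i=1}^{n} k_i} \tag{2}$$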
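For readers who want the intermediate algebra, here is one way to fill it in, together with a second-derivative check that the critical point is a maximum rather than a minimum:

$$
\begin{aligned}
\frac{n}{p} - \frac{\sum_{i=1}^{n} k_i - n}{1-p} = 0
\;&\Longrightarrow\; n(1-p) = p\left(\sum_{i=1}^{n} k_i - n\right) \\
\;&\Longrightarrow\; n - np = p\sum_{i=1}^{n} k_i - np \\
\;&\Longrightarrow\; \hat{p} = \frac{n}{\sum_{i=1}^{n} k_i}
\end{aligned}
$$

$$
\frac{d^2}{dp^2}\ln L(p) = -\frac{n}{p^2} - \frac{\sum_{i=1}^{n} k_i - n}{(1-p)^2} < 0 \quad \text{for } 0 < p < 1
$$

Since every k_i is at least 1, the second numerator is nonnegative, so the log-likelihood is concave and the critical point is its maximum.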

So our maximum likelihood estimate of p (the probability of success) is one divided by the sample average or, equivalently, n divided by the sum of all the k values in our sample. This is the value of p that is most consistent with the n observations k1, …, kn. Below is a table of the observed frequency of each k value, derived from daily closing data for the Dow Jones over a one-year period in 2006–2007.

Recall that the random variable X takes on the value k when the market rises on the kth trading day after the initial decline, that is, when the index has fallen k times in a row (counting day 1) before rising. For example, X takes on the value k = 1 on 72 occasions in our dataset, which means that 72 times the only decline was the day-1 decline itself: the market fell on day 1 (by definition) and rose on day 2. Similarly, X took on the value k = 2 on 35 occasions, so there were 35 runs of exactly two consecutive declines followed by a rise.

k    Observed frequency
1    72
2    35
3    11
4    6
5    2
6    2

We now have the data needed to compute p. There are 128 observations (values of k), so n = 128, and there are two ways to compute the estimate. First, we can take the sample mean of the dataset, as we normally would for a discrete random variable, and then apply formula 1 above:

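$$\bar{k} = \frac{1(72) + 2(35) + 3(11) + 4(6) + 5(2) + 6(2)}{128} = \frac{221}{128} \approx 1.7266, \qquad \hat{p} = \frac{1}{\bar{k}} = \frac{128}{221} \approx 0.5792$$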
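The same arithmetic as a short Python sketch, working directly from the observed frequency table. The dictionary layout is just one convenient representation, and `Fraction` keeps the results exact so the two formulas can be compared without rounding.

```python
from fractions import Fraction

# Observed frequency of each k value in the Dow Jones data
observed = {1: 72, 2: 35, 3: 11, 4: 6, 5: 2, 6: 2}

n = sum(observed.values())                         # number of observations
total_k = sum(k * f for k, f in observed.items())  # sum of all k values
mean_k = Fraction(total_k, n)                      # sample mean, kept exact

p_hat_1 = 1 / mean_k            # formula 1: reciprocal of the sample mean
p_hat_2 = Fraction(n, total_k)  # formula 2: n over the sum of the k values

print(n, total_k, float(mean_k), float(p_hat_1))
```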

The second formula yields the same result, since it computes 128/221 directly instead of taking the reciprocal of the sample mean. So we now have a maximum likelihood estimate for the parameter p, which we can use to model stock price movements as a geometric random variable. Let’s assume for the moment that the stock market can in fact be modeled this way. Given our estimate of p, with what frequency do we expect X to take on the values k = 1, 2, …? We’ll compute these expected frequencies first and then compare them to the empirical data.

$$P(X = 1) = (1-\hat{p})^{0}\,\hat{p} = \hat{p} \approx 0.5792$$

The probability that X takes on the value one is equal to the probability of success, which is to be expected since X = 1 corresponds to the situation in which success is realized on the day immediately following the initial failure.

$$P(X = 2) = (1-\hat{p})\,\hat{p} \approx (0.4208)(0.5792) \approx 0.2437$$

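The remaining probabilities are computed the same way, except that the last category collects every k of 6 or more, with probability P(X ≥ 6) = (1 – p̂)^5 (the first five trials all fail), so that the expected proportions sum to 1. Since we have 128 observations, we multiply each expected proportion by 128 to obtain an expected frequency. Then we can compare these to the observed frequencies and judge how well the model fits.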
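A sketch of that computation in Python, assuming (as the .0132 in the last row of the table below implies) that the final category collects every k of 6 or more:

```python
n = 128
p_hat = 128 / 221  # maximum likelihood estimate from the Dow Jones sample
q_hat = 1 - p_hat

rows = []
for k in range(1, 6):
    prob = q_hat ** (k - 1) * p_hat     # P(X = k)
    rows.append((str(k), prob))
rows.append(("6 or more", q_hat ** 5))  # P(X >= 6): the first five trials all fail

for label, prob in rows:
    print(f"{label:>9}  {prob:.4f}  {n * prob:6.2f}")
```

Computed at full precision, the k = 2 row comes out as 31.20 rather than 31.19; the difference comes only from rounding the proportion to .2437 before multiplying by 128.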

k           n     Expected proportion   Expected frequency
1           128   .5792                 74.14
2           128   .2437                 31.19
3           128   .1026                 13.13
4           128   .0432                 5.52
5           128   .0182                 2.32
6 or more   128   .0132                 1.69

Now that we know what we should expect if the geometric model is a valid representation of the stock market, let’s compare the expected frequencies to the observed frequencies:

k           Expected frequency   Observed frequency
1           74.14                72
2           31.19                35
3           13.13                11
4           5.52                 6
5           2.32                 2
6 or more   1.69                 2

The expected and observed frequencies agree closely in every category, so the geometric model appears to fit these data very well. This suggests that, at least over this one-year sample, daily fluctuations in the Dow Jones behave randomly: the index doesn’t ‘remember’ yesterday, the probability of a rise is roughly constant from day to day, and whether the market actually rises or falls on a given day is a matter of chance.
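The comparison above is made by eye. One standard way to summarize it in a single number, not used in the original analysis, is Pearson’s chi-square statistic: the sum over categories of (observed – expected)² / expected. The sketch below computes it for the six categories. The usual chi-square approximation is considered unreliable when expected counts fall below about 5, as they do for the last two categories here, so pooling the sparse tail (for example, everything with k ≥ 4) would be a reasonable refinement.

```python
observed = [72, 35, 11, 6, 2, 2]  # k = 1, 2, 3, 4, 5, and 6 or more
n = sum(observed)
p_hat = 128 / 221
q_hat = 1 - p_hat

# Expected counts under the fitted geometric model; the last cell is the tail P(X >= 6).
expected = [n * q_hat ** (k - 1) * p_hat for k in range(1, 6)] + [n * q_hat ** 5]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Degrees of freedom: categories - 1 - (one parameter, p, estimated from the data).
dof = len(observed) - 1 - 1
print(f"chi-square = {chi_sq:.3f} on {dof} degrees of freedom")
```

The statistic can then be compared with a chi-square distribution on that many degrees of freedom, for example via `scipy.stats.chi2.sf` if SciPy is available.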

About schapshow

Math & Statistics graduate who likes gymnastics, 90s alternative music, and statistical modeling.
