Suppose we’re sampling from a population with mean μ and variance σ^{2}. Formally, the independent random variables {X_{1}, X_{2}, …, X_{n}} comprise our sample of size n from the distribution of interest. The values observed in the sample will likely all be different, but they all come from the same distribution. The stipulation that sample observations are independent is important.

Though it’s often stated in other terms, the **Central Limit Theorem** is a statement about the *sum* of the independent random variables observed in the sample. What can we say about the quantity below?

$$S_n = X_1 + X_2 + \cdots + X_n$$

We could start by **standardizing** the quantity S_{n}; this is done in the same manner that one would standardize a test statistic while testing a hypothesis. To standardize the sum, we subtract the mean and divide by the standard deviation. The standardized quantity can be interpreted as the signed number of standard deviations that S_{n} lies from its mean. Some people call this a z score. We will make use of the fact that the sample random variables are independent in deriving the mean and variance of S_{n}, which is why the independence assumption is so important. So, our standardized sum of random variables in the sample is going to be of the form:

$$\frac{S_n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}}$$
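For reference, the mean and variance of S_{n} take only a line each to work out. This is a sketch that writes S_{n} = X_{1} + X_{2} + … + X_{n} and assumes only what was stated at the start: each X_{i} has mean μ and variance σ^{2}, and the X_{i} are independent.

$$
E(S_n) = \sum_{i=1}^{n} E(X_i) = n\mu,
\qquad
\mathrm{Var}(S_n) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n\sigma^2,
\qquad
\sqrt{\mathrm{Var}(S_n)} = \sigma\sqrt{n}
$$

The first result holds for any random variables, by linearity of expectation. The second is where independence does the work: it makes every covariance term between different X_{i}’s vanish, so the variances simply add.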

If we substitute in the expected value nμ and the standard deviation (√n)σ of S_{n}, we have the expression:

$$\frac{S_n - n\mu}{\sigma\sqrt{n}}$$

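The CLT is a statement about this standardized sum of observations from a population. Intuitively, the theorem states that as the sample size (n) grows arbitrarily large, the distribution of the standardized **sum** of the sample values (the X_{i}’s) tends towards the standard normal distribution. Mathematically, this means:

$$\lim_{n \to \infty} P\left(a \le \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx$$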

In the expression above, the sample values are {X_{1}, X_{2}, …, X_{n}}, the expected value of their **sum** is nμ, and the standard deviation of their sum is (√n)σ. In words, the expected value of the sum of all n sample observations is n times the population mean μ, and the standard deviation of the sum of all n sample observations is the square root of n times the population standard deviation σ. I know I’m beating a dead horse, but it’s important that the CLT is a statement about the **sum** of observations in a sample. Therefore, the quantity between the numbers a and b in the inequality is the standardized sum of sample observations, or a z score if you want to call it that. On the right side is the integral of the probability density function (pdf) of the standard normal distribution. Integrating any continuous pdf over a region (a, b) gives the probability of attaining a value greater than a and less than b, so the right-hand side of the equality above can be interpreted as the probability that a standard normal random variable takes on a value between the numbers a and b.
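To make the "integrate the pdf to get a probability" step concrete, here is a small sketch that integrates the standard normal pdf numerically and compares the result with the normal cdf from Python’s standard library. The bounds a = -1 and b = 2, the helper names, and the choice of the midpoint rule are all just assumptions for illustration.

```python
import math
from statistics import NormalDist


def std_normal_pdf(x: float) -> float:
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def integrate_midpoint(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over (a, b) with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


a, b = -1.0, 2.0  # hypothetical bounds, chosen only for illustration
by_integration = integrate_midpoint(std_normal_pdf, a, b)
by_cdf = NormalDist().cdf(b) - NormalDist().cdf(a)

print(f"integral of the pdf over ({a}, {b}): {by_integration:.6f}")
print(f"cdf(b) - cdf(a):                     {by_cdf:.6f}")
```

The two numbers should agree to several decimal places, which is all the right-hand side of the theorem is saying: it is the probability that a standard normal random variable lands between a and b.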

Putting everything together, the theorem states that if we take the sum of n independent observations from *any* distribution with a finite mean and variance and standardize it, the probability that this quantity lies in a given range can be approximated with the standard normal distribution, and the approximation improves as n increases. That is, the distribution of the standardized sum of sample values approaches the standard normal distribution as the size of the sample increases.
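One way to see this claim in action is a quick simulation. The sketch below is only an illustration under assumptions of my own choosing: the population is Exponential(1) (a skewed distribution with μ = 1 and σ = 1), the range is (-1, 1), and the function names are made up for this example. It estimates the probability that the standardized sum falls in the range and compares it with the normal probability.

```python
import math
import random
from statistics import NormalDist


def standardized_sum(n: int, rng: random.Random) -> float:
    """Draw n Exponential(1) values and return (S_n - n*mu) / (sigma*sqrt(n))."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    s_n = sum(rng.expovariate(1.0) for _ in range(n))
    return (s_n - n * mu) / (sigma * math.sqrt(n))


def prob_between(n: int, a: float, b: float, trials: int, rng: random.Random) -> float:
    """Estimate P(a <= standardized sum <= b) by simulation."""
    hits = sum(a <= standardized_sum(n, rng) <= b for _ in range(trials))
    return hits / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
a, b = -1.0, 1.0
normal_prob = NormalDist().cdf(b) - NormalDist().cdf(a)

estimates = {n: prob_between(n, a, b, trials=20_000, rng=rng) for n in (1, 5, 50)}
for n, p in estimates.items():
    print(f"n = {n:3d}: simulated {p:.3f}, normal {normal_prob:.3f}")
```

For n = 1 the simulated probability should sit well away from the normal value, since a single exponential draw is far from bell-shaped; by n = 50 the two should be close.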

The best way to illustrate this is probably with the uniform distribution. Consider rolling a die – it’s a trivial example, but it’s a familiar concept. The outcome of rolling a fair die is uniformly distributed over the integers 1 through 6, because the probability that you roll a 1, 2, 3, 4, 5, or 6 is 1/6, or approx. 0.167, each. It’s easy to show that the random variable X, where X corresponds to the dots showing after rolling one die, has a mean of E(X) = 3.5 and a standard deviation of approximately 1.7078 (the variance is 35/12).
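The single-die numbers are easy to check by hand, or with a few lines of code. The sketch below uses exact fractions and assumes a fair die with faces 1 through 6, for which the variance works out to 35/12 and the standard deviation to about 1.7078. (For comparison, a *continuous* uniform distribution on the interval (1, 6) would have a standard deviation of √(25/12) ≈ 1.4434, which is a different quantity.)

```python
import math
from fractions import Fraction

faces = range(1, 7)   # a fair six-sided die
p = Fraction(1, 6)    # each face is equally likely

mean = sum(p * x for x in faces)                    # E(X)
variance = sum(p * (x - mean) ** 2 for x in faces)  # Var(X) = E[(X - E(X))^2]
sd = math.sqrt(variance)

print(f"E(X) = {float(mean)}, Var(X) = {variance}, SD(X) = {sd:.4f}")
```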

Now let the random variable X be the *sum* of the dots showing when you roll two dice. X can take on a minimum of 2 (if you roll ‘snake eyes’) and a maximum of 12 (if you roll two sixes). X has a mean of 7 and a standard deviation of approximately 2.4152 (that is, √2 times the single-die standard deviation), and is distributed like the graph below:

To demonstrate the CLT, we want to increase the number of dice we roll when we compute the sum. Below is the probability mass function (pmf) for n = 3, that is, for rolling three dice and computing their sum.
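The exact distributions for two and three dice can also be tabulated directly. The sketch below simply enumerates every possible roll, which is fine for small n but grows as 6^n; the function name `dice_sum_pmf` and the crude text histogram are my own choices for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_pmf(n: int) -> dict[int, Fraction]:
    """Exact pmf of the sum of n fair dice, by enumerating all 6**n outcomes."""
    counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=n))
    total = 6 ** n
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


for n in (2, 3):
    pmf = dice_sum_pmf(n)
    print(f"n = {n}")
    for s, p in pmf.items():
        # One '#' per 1/216 of probability (216 = 6**3): a crude text histogram
        print(f"  {s:2d}  {float(p):.4f}  {'#' * int(p * 216)}")
```

With two dice the bars form a triangle peaking at 7; with three dice the sides already start to curve.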

And as we let n get larger and larger, the histogram looks more and more like the normal curve:
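To put a number on "looks a lot like the normal curve", the sketch below builds the distribution of the sum of n dice by repeated convolution (adding one die at a time) and measures the largest gap between it and a normal density with mean 3.5n and standard deviation √(35n/12). Comparing the probability of each integer sum with the density at that point is a choice I am making here; it is reasonable because the possible sums are spaced exactly one apart. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def dice_sum_pmf(n: int) -> dict[int, float]:
    """pmf of the sum of n fair dice, built by repeated convolution."""
    pmf = {0: 1.0}  # the sum of zero dice is 0 with probability 1
    for _ in range(n):
        nxt: dict[int, float] = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / 6.0
        pmf = nxt
    return pmf


def max_gap_to_normal(n: int) -> float:
    """Largest |pmf(s) - normal density at s| over all possible sums s."""
    mu, sigma = 3.5, math.sqrt(35.0 / 12.0)  # mean and sd of one fair die
    approx = NormalDist(mu=n * mu, sigma=sigma * math.sqrt(n))
    return max(abs(p - approx.pdf(s)) for s, p in dice_sum_pmf(n).items())


gaps = {n: max_gap_to_normal(n) for n in (1, 2, 5, 30)}
for n, gap in gaps.items():
    print(f"n = {n:2d}: largest gap between pmf and normal density = {gap:.5f}")
```

The gap should shrink steadily as n grows, which is the same story the histograms tell.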

So when we consider the sum of *more* rolls, the curve looks normal. This is the CLT working! In another post, I’ll talk about ways that we can test the normality of a given distribution and apply the theorem to the sample mean.