Before talking about the Central Limit Theorem (CLT), it’s necessary to distinguish between the probability distribution of the population of interest and the *sampling* distribution of the sample mean. Suppose we draw n independent observations from a population with mean μ and standard deviation σ. Each observation is a random variable, so the sample we draw from the population is a collection of n random variables {X_1, X_2, …, X_n}, all of which have the same probability distribution. From the observed sample values we compute the sample mean, X̄, as a way of approximating the true mean μ, which we don’t know.

The true mean μ is a **parameter**, because it is a characteristic of the entire population. It is not feasible (or even possible, in most cases) to determine the parameter exactly, so we develop mathematical functions to approximate it. The inputs to such a function are the sample values, the n observations described previously. So the sample mean is a *function* of the random variables we observe in a given sample. Any function of random variables is a random variable itself, so the sample mean is a random variable, complete with all the properties that random variables are endowed with. The sample mean, which we just said is a function of random variables, or, using proper terminology, a **statistic**, gives us a point estimate of the true population mean μ. It is an unbiased estimator of the population mean, so we know that the expected value of the sample mean is equal to the true population mean. But all this tells us is that our guesses will be “centered around” μ; by itself, the sample mean gives no indication of dispersion about the population parameter.
To address this concern, we have to appeal to the idea that the sample mean is a random variable, and is thus governed by a probability distribution that can tell us about its dispersion. We call the probability distribution of the sample mean the **sampling distribution of the sample mean** – but it’s really nothing more than the probability distribution of X̄.

Like I mentioned earlier, the sample mean is an **unbiased** estimator of μ. In other words, E(X̄) = μ. This is easy to prove. The expected value of each of the observations we draw from the population {X_1, X_2, …, X_n} is the unknown population parameter μ. Notice we’re NOT talking about the sample mean yet – each X_i is simply one of n independent observations from the population with mean μ and variance σ². Therefore:

E(X̄) = E[(X_1 + X_2 + … + X_n)/n] = (1/n)[E(X_1) + E(X_2) + … + E(X_n)] = (1/n)(nμ) = μ.

We’ve shown that the mean of the sampling distribution of the sample mean is the parameter μ. However, the same is not true for the standard deviation. Because the observations are independent:

Var(X̄) = Var[(X_1 + X_2 + … + X_n)/n] = (1/n²)(nσ²) = σ²/n, so SD(X̄) = σ/Sqrt(n).

So, for a normal population, the sample mean is distributed normally with a mean of μ and a standard deviation of σ *divided by* the square root of the sample size, n. The sample size is not how many sample means we take from the sampling distribution of the sample mean. Instead, it corresponds to the number of observations used to calculate each sample mean in the sampling distribution. A different number for n corresponds to a different sampling distribution of the sample mean, because it has a different variance. The standard deviation of the sampling distribution of the sample mean is usually called the **standard error** of the mean. Intuitively, it tells us by how much we deviate from the true mean, μ, on average.
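These two facts are easy to check numerically. Here’s a quick simulation sketch in Python (not from the original post; the values μ = 100, σ = 20, and n = 4 are borrowed from the TV example below): the mean of many simulated sample means should land near μ, and their standard deviation near σ/Sqrt(n) = 10.

```python
import math
import random

random.seed(42)

mu, sigma, n = 100, 20, 4   # population parameters and sample size
num_samples = 200_000       # number of sample means to simulate

# Draw many samples of size n from N(mu, sigma) and record each sample mean.
means = []
for _ in range(num_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

mean_of_means = sum(means) / num_samples
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in means) / num_samples
)

print(mean_of_means)  # close to mu = 100
print(sd_of_means)    # close to sigma / sqrt(n) = 10
```

The simulated standard deviation of the sample means approximates the standard error σ/Sqrt(n), not the population standard deviation σ.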
The previous information tells us that when we take a sample of size n from the population and compute the sample mean, we know two things: we expect, on average, to compute the true population mean, μ; and the average amount by which we will deviate from the parameter μ is σ/Sqrt(n), the standard error of the mean. Clearly, the standard error of the mean decreases as n increases, which implies that larger samples lead to greater precision (i.e. less variability) in estimating μ.

Suppose repair costs for a TV are normally distributed with a population mean of $100 and a standard deviation of $20. To simplify notation, we can write: Repair Cost ~ N(100, 20). The sampling distribution of the sample mean is the probability distribution of the sample means computed from all possible samples of a fixed size n. In this case, Average Cost ~ N(100, 20/Sqrt(n)). If you want to visualize the distribution of repair costs for the TV in your bedroom, you’d plot a normal curve with a mean of 100 and a standard deviation of 20. If, however, you’d like to visualize the *average* repair cost of the four TVs in your house, the correct curve would still be normal, but it would have a mean of 100 and a standard deviation (in this case, more correctly called a standard error) of 20/Sqrt(4). Now if you include your neighbor’s TVs as well, your sample size will be eight. The probability density function describing the average cost of the eight TV repairs is N(100, 20/Sqrt(8)). The three curves are plotted below.

To demonstrate the idea that the variance of the sampling distribution and the sample size are inversely related, consider the probability that a TV will require more than $110 worth of repairs. The probability that one TV requires repairs in excess of $110 (i.e. sample size = 1) is P(Z > (110 – 100)/20) = P(Z > 0.5) = 30.85%. The probability that the *average* repair cost of a sample of four TVs exceeds $110 is P(Z > (110 – 100)/(20/Sqrt(4))) = P(Z > 1) = 15.87%. For the sample of size n = 8, we get P(Z > 1.41) = 7.86%.
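The three tail probabilities can be reproduced with a few lines of Python (a sketch, not the post’s original calculation; it uses the example’s values μ = 100, σ = 20, and threshold $110):

```python
from statistics import NormalDist

mu, sigma = 100, 20
results = {}
for n in (1, 4, 8):
    se = sigma / n ** 0.5                 # standard error of the mean
    z = (110 - mu) / se                   # z-score of $110 for this n
    results[n] = 1 - NormalDist().cdf(z)  # P(average repair cost > $110)
    print(f"n={n}: {results[n]:.2%}")
```

Running this prints 30.85%, 15.87%, and 7.86% for n = 1, 4, and 8, matching the hand calculations above.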
What’s happening is that the value $110 becomes more ‘extreme’ relative to the distribution’s parameters as we increase the sample size. Since we’ve decreased the standard error by taking a sample of size 8 instead of 4, for example, an average of $110 is less likely to be observed.

We can use the sample size n in a more deliberate way if we consider its impact on the precision of the sample mean. We know a larger n corresponds to a tighter distribution of the sample mean around the true mean. Therefore, we can achieve a desired level of precision with our point estimator (the sample mean) by manipulating n. Since increasing n generally costs money in practice (think about surveying more respondents, etc.), we want to find the minimum value of n that will give us the desired condition in most cases. Let’s say I want the sample mean to be within 10 units of the true parameter with probability 0.97. To do this, I’ll have to decrease the variability by increasing n, and I’d like to find the minimum value that achieves this result, since I want to spend as little money as possible. In other words, I want to be 97% confident that the value I compute for the sample mean is within 10 units of the true mean, and I want the smallest possible n to do so. Here is the Mathematica input. Below, sample sizes are plotted against the probability of being within 10 units of the parameter μ. Solving for n gives 18.84, so n = 19 is the smallest sample size that gives us the desired precision. Below, the sampling distribution of the sample mean for n = 19 is plotted against the normal curve with mean $100 and standard deviation $20. In the next post, I’ll talk about how the properties of the sampling distribution of the sample mean relate to one of the most important theorems in all of mathematics, the Central Limit Theorem.
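The sample-size calculation can also be sketched in Python (an equivalent of the idea, not the post’s Mathematica input, assuming the setup above: σ = 20, a tolerance of 10 units, and 97% confidence). We want P(|X̄ − μ| ≤ 10) = 0.97, which means 10/(σ/Sqrt(n)) must equal the 98.5th percentile of the standard normal:

```python
import math
from statistics import NormalDist

sigma, tol, conf = 20, 10, 0.97

# Two-sided condition: P(|Xbar - mu| <= tol) = conf, so
# tol / (sigma / sqrt(n)) = z, where z is the (1 + conf)/2 quantile.
z = NormalDist().inv_cdf((1 + conf) / 2)  # about 2.17

n_exact = (z * sigma / tol) ** 2          # about 18.84
n_min = math.ceil(n_exact)                # smallest integer sample size

print(n_exact, n_min)  # 18.84, 19
```

This reproduces the result in the post: n = 18.84 exactly, so n = 19 is the smallest sample size that achieves the desired precision.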
