The Sampling Distribution of the Sample Mean

Before talking about the Central Limit Theorem (CLT), it’s necessary to distinguish between the probability distribution of the population of interest and the sampling distribution of the sample mean. Suppose we draw n independent observations from a population with mean μ and standard deviation σ. Each observation is a random variable, so the sample we draw is a collection of n random variables, all of which have the same probability distribution. From the observed sample values we can compute the sample mean, X̄, which we use to approximate the true mean μ, which we don’t know.
The true mean μ is a parameter, because it is a characteristic of the entire population. It is usually not feasible (or even possible) to determine a parameter exactly, so we develop mathematical functions whose purpose is to approximate it. The inputs to such a function are the sample values: the n observations described previously. So the sample mean is a function of the random variables we observe in a given sample. Any function of random variables is itself a random variable, so the sample mean is a random variable, complete with all the properties that random variables are endowed with.
The sample mean, which we just said is a function of random variables (or, using proper terminology, a statistic), gives us a point estimate of the true population mean μ. It is an unbiased estimator of the population mean, meaning that the expected value of the sample mean equals the true population mean. All this tells us is that our guesses will be “centered around” μ; unbiasedness alone gives us no indication of how widely the sample mean is dispersed about the population parameter. To address this concern, we appeal to the fact that the sample mean is a random variable, and is thus governed by a probability distribution that can tell us about its dispersion.
We call the probability distribution of the sample mean the sampling distribution of the sample mean, but it’s really nothing more than the probability distribution of X̄. As I mentioned earlier, the sample mean is an unbiased estimator of μ. In other words, E(X̄) = μ. This is easy to prove:
E(X̄) = E((X1 + X2 + … + Xn)/n) = (1/n) E(X1 + X2 + … + Xn)
The expected value of each of the observations we draw from the population {X1, X2, …, Xn} is the unknown population parameter μ. Notice we’re NOT talking about the sample mean right now: the last expression is built from the sum of n independent observations from the population with mean μ and variance σ². Therefore:
E(X̄) = (1/n)(μ + μ + … + μ) = (1/n)(nμ) = μ
We’ve shown that the mean of the sampling distribution of the sample mean is the parameter μ. However, the analogous statement is not true for the standard deviation, as demonstrated below:
Var(X̄) = Var((X1 + X2 + … + Xn)/n) = (1/n²)(Var(X1) + Var(X2) + … + Var(Xn)) = (1/n²)(nσ²) = σ²/n, so SD(X̄) = σ/Sqrt(n)
The variance of the sum splits into a sum of variances because the observations are independent.
So, the sample mean has a mean of μ and a standard deviation of σ divided by the square root of the sample size, n. If the population itself is normal, the sample mean is exactly normally distributed with these parameters. The sample size is not how many sample values we take from the sampling distribution of the sample mean. Instead, it is the number of observations used to calculate each sample mean in the sampling distribution. A different n corresponds to a different sampling distribution of the sample mean, because it has a different variance.
The standard deviation of the sampling distribution of the sample mean is usually called the standard error of the mean. Intuitively, it tells us how far a sample mean typically falls from the true mean, μ. So when we take a sample of size n from the population and compute the sample mean, we know two things: on average, we expect to compute the true population mean, μ; and the typical amount by which we will deviate from the parameter μ is σ/Sqrt(n), the standard error of the mean.
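The two facts above (the mean of the sample mean is μ, and its standard deviation is σ/Sqrt(n)) can be checked with a quick simulation. This is only a sketch under assumptions of my own choosing: a normal population with μ = 100 and σ = 20, 20,000 repeated samples, and a fixed seed so the run is repeatable. The function name is made up for illustration.

```python
import random
import statistics
from math import sqrt

# Illustrative population parameters (assumed for this sketch).
MU, SIGMA = 100.0, 20.0

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma); return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

results = {}
for n in (1, 4, 8):
    means = simulate_sample_means(MU, SIGMA, n, reps=20_000)
    # Empirical mean and standard deviation of the simulated sample means.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(
        f"n = {n}: mean of sample means = {results[n][0]:.2f}, "
        f"SD of sample means = {results[n][1]:.2f}, "
        f"sigma/sqrt(n) = {SIGMA / sqrt(n):.2f}"
    )
```

The simulated mean should sit close to 100 for every n, while the simulated standard deviation should track σ/Sqrt(n) and shrink as n grows. Note that n is the size of each sample, not the number of repetitions.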
Clearly, the standard error of the mean decreases as n increases, which implies that larger samples lead to greater precision (i.e. less variability) in estimating μ.
Suppose repair costs for a TV are normally distributed with a population mean of $100 and a standard deviation of $20. To simplify notation, we can write: Repair Cost ~ N(100, 20), where the second argument is the standard deviation. The sampling distribution of the sample mean is the probability distribution of all sample means computed from samples of a fixed size n. In this case, Average Repair Cost ~ N(100, 20/Sqrt(n)).
If you want to visualize the distribution of repair costs for the TV in your bedroom, you’d plot a normal curve with a mean of 100 and a standard deviation of 20. If, however, you’d like to visualize the average repair cost of the four TVs in your house, the correct curve would still be normal, but it would have a mean of 100 and a standard deviation (in this case, more correctly called a standard error) of 20/Sqrt(4) = 10. Now if you include your neighbor’s TVs as well, your sample size will be eight. The distribution of the average cost of the eight TV repairs is N(100, 20/Sqrt(8)), with a standard error of about 7.07. The three curves are plotted below.
[Figure: normal density curves for sample sizes n = 1, 4, and 8]
To demonstrate the idea that the variance of the sample mean and the sample size are inversely related, consider the probability that repairs cost more than $110. The probability that one TV requires repairs in excess of $110 (i.e. sample size = 1) is P(Z > (110 - 100)/20) = P(Z > 0.5) = 30.85%. The probability that the average repair cost of a sample of four TVs exceeds $110 is P(Z > (110 - 100)/(20/Sqrt(4))) = P(Z > 1) = 15.87%. For the sample of size n = 8, we get P(Z > (110 - 100)/(20/Sqrt(8))) = P(Z > 1.41) = 7.86%. What’s happening is that the value $110 becomes more ‘extreme’ relative to the distribution as we increase the sample size. Since the standard error is smaller for a sample of size 8 than for a sample of size 4, for example, an average of $110 or more is less likely to be observed.
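The three tail probabilities in the TV example can be reproduced with Python’s standard library. This is a minimal sketch, not the tool used for the original figures; the function name is my own, and it assumes the N(100, 20) repair-cost model from the example.

```python
from math import sqrt
from statistics import NormalDist

# Repair-cost model from the example: mean $100, standard deviation $20.
MU, SIGMA, THRESHOLD = 100.0, 20.0, 110.0

def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean of n observations > threshold) for a N(mu, sigma) population."""
    standard_error = sigma / sqrt(n)
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)

tail_probs = {n: prob_mean_exceeds(THRESHOLD, MU, SIGMA, n) for n in (1, 4, 8)}
for n, p in tail_probs.items():
    print(f"n = {n}: P(average cost > $110) = {p:.4f}")
```

The printed values come out to about 0.3085, 0.1587, and 0.0786, matching the percentages worked out by hand with the Z table.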
[Figure illustrating the probabilities above]
We can use the sample size n in a more useful way if we consider its impact on the precision of the sample mean. We know a larger n corresponds to a tighter distribution of the sample mean around the true mean. Therefore, we can achieve a desired level of precision with our point estimator (the sample mean) by manipulating n. Since increasing n generally costs money in practice (think about surveying more respondents, etc.), we usually want the minimum value of n that gives us the desired precision.
Let’s say I want the sample mean to be within 10 units of the true parameter with probability 0.97. To do this, I’ll have to decrease the variability by increasing n. In other words, I want the probability that the sample mean lands within 10 units of the true mean to be at least 97%, and I want the smallest n that achieves this, since I want to spend as little money as possible. Here is the Mathematica input:
[Figure: Mathematica input]
Below, sample sizes are plotted against the probability of being within 10 units of the parameter μ:
[Figure: probability of being within 10 units of μ as a function of n]
Solving for n gives 18.84, so n = 19 is the smallest sample size that gives us the desired precision. Below, the sampling distribution of the sample mean for n = 19 is plotted against the normal curve with mean $100 and standard deviation $20:
[Figure: sampling distribution of the sample mean for n = 19 versus the N(100, 20) population curve]
In the next post, I’ll talk about how the properties of the sampling distribution of the sample mean relate to one of the most important theorems in all of mathematics, the Central Limit Theorem.
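For readers without Mathematica, the sample-size calculation above can be sketched in Python. This is not the original Mathematica input, just one way to get the same answer under the same assumptions (normal population, σ = 20, margin of 10, probability 0.97). It uses the fact that P(|X̄ - μ| < d) = 2Φ(d·Sqrt(n)/σ) - 1, which can be solved for n. The function names are my own.

```python
from math import ceil, sqrt
from statistics import NormalDist

def min_sample_size(sigma, margin, prob):
    """Return (exact solution, smallest integer n) with P(|mean - mu| < margin) >= prob."""
    # Two-sided requirement: put (1 - prob)/2 in each tail of the standard normal.
    z = NormalDist().inv_cdf((1 + prob) / 2)
    n_exact = (z * sigma / margin) ** 2
    return n_exact, ceil(n_exact)

def coverage(sigma, margin, n):
    """P(|mean - mu| < margin) for a sample of size n from a normal population."""
    standard_error = sigma / sqrt(n)
    return 2 * NormalDist().cdf(margin / standard_error) - 1

n_exact, n_min = min_sample_size(sigma=20, margin=10, prob=0.97)
print(f"exact solution: n = {n_exact:.2f}, smallest integer n = {n_min}")
for n in (n_min - 1, n_min):
    print(f"n = {n}: P(within 10 of mu) = {coverage(20, 10, n):.4f}")
```

The exact solution is about 18.84, so rounding up gives n = 19. Checking n = 18 and n = 19 directly confirms that 18 falls just short of 0.97 while 19 clears it.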

About schapshow

Math & Statistics graduate who likes gymnastics, 90s alternative music, and statistical modeling.
