Suppose that you collected data

$latex \mathbf{X} = X_1, X_2, \ldots, X_n$

in order to **estimate** a **parameter** $latex \theta$. Let $latex f_\theta(x)$ be the joint **probability density function (PDF)** for $latex X_1, X_2, \ldots, X_n$.

Let

$latex t = T(\mathbf{X})$

be a **statistic** based on $latex \mathbf{X}$. Let $latex g_\theta(t)$ be the PDF for $latex T(\mathbf{X})$.

If the **conditional PDF**

$latex h_\theta(\mathbf{X}) = f_\theta(x) \div g_\theta[T(\mathbf{X})]$

is **independent** of $latex \theta$, then $latex T(\mathbf{X})$ is a **sufficient statistic** for $latex \theta$. In other words,

$latex h_\theta(\mathbf{X}) = h(\mathbf{X})$,

and $latex \theta$ does not appear in $latex h(\mathbf{X})$.

Intuitively, this means that $latex T(\mathbf{X})$ contains everything you need to estimate $latex \theta$, so knowing $latex T(\mathbf{X})$ (i.e., conditioning $latex f_\theta(x)$ on $latex T(\mathbf{X})$) is sufficient for estimating $latex \theta$.
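To make the definition concrete, here is a minimal numerical sketch. It is not from the original post: it assumes the data are i.i.d. Bernoulli($latex \theta$) and takes $latex T(\mathbf{X}) = X_1 + \ldots + X_n$, so the densities are probability mass functions rather than PDFs. The function names `joint_pmf`, `statistic_pmf` and `conditional_pmf` are illustrative only. Under those assumptions, the ratio $latex f_\theta(x) \div g_\theta[T(x)]$ should come out the same for every value of $latex \theta$.

```python
from math import comb


def joint_pmf(x, theta):
    """f_theta(x): joint PMF of i.i.d. Bernoulli(theta) observations x."""
    t = sum(x)
    return theta**t * (1 - theta) ** (len(x) - t)


def statistic_pmf(t, n, theta):
    """g_theta(t): PMF of T = X_1 + ... + X_n, which is Binomial(n, theta)."""
    return comb(n, t) * theta**t * (1 - theta) ** (n - t)


def conditional_pmf(x, theta):
    """h_theta(x) = f_theta(x) / g_theta(T(x))."""
    return joint_pmf(x, theta) / statistic_pmf(sum(x), len(x), theta)


# One assumed sample with n = 5 observations and T(x) = 3 successes.
x = (1, 0, 1, 1, 0)

# Evaluate the ratio at several (arbitrarily chosen) values of theta.
ratios = [conditional_pmf(x, theta) for theta in (0.2, 0.5, 0.9)]
print(ratios)
```

In this example, the powers of $latex \theta$ and $latex 1 - \theta$ cancel in the ratio, leaving $latex 1 \div \binom{5}{3}$ up to floating-point rounding, which is why the sum is a sufficient statistic for $latex \theta$ in the Bernoulli case.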

Often, $latex T(\mathbf{X})$ is a **summary statistic** of $latex X_1, X_2, \ldots, X_n$, such as their

- **sample mean**
- **sample median**
- **sample minimum**
- **sample maximum**

If such a summary…
