The **Maximum Likelihood Estimate** is one estimate of a parameter’s value. It answers the question: “Given my data, what’s the most likely value of an unknown parameter?” To use it, we compute the likelihood function: take a sample of $n$ observations $x_1, \dots, x_n$ from the same distribution, evaluate the pdf $p_X(x; \theta)$ at each of the $n$ observations, and multiply the $n$ results together.

***Note:*** *The expression $p_X(x; \theta)$ is just the distribution of the population from which we’re sampling. The parameter is usually written after a semicolon to emphasize that the distribution is characterized by the unknown parameter.*

The likelihood function is just the joint density of the random sample. Because the sample observations are independent and identically distributed (iid), the joint pdf of all $n$ observations is the product of the individual densities. This is the same principle that lets us multiply $P(A)$, $P(B)$, and $P(C)$ to find $P(A \cap B \cap C)$ when events $A$, $B$, and $C$ are independent. Suppose we take a sample of size $n = 3$ from the distribution, and the resulting values are $x_1$, $x_2$, and $x_3$. What’s the joint density of the three sample values, $p_X(x_1, x_2, x_3)$? By independence, it factors into the product of the individual densities:

$$p_X(x_1, x_2, x_3) = p_X(x_1)\, p_X(x_2)\, p_X(x_3)$$
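A quick numerical check of the $n = 3$ case may help. This is a minimal sketch that assumes, purely for illustration, an exponential distribution with rate $\lambda = 1$ and three made-up sample values; neither the model nor the data comes from the text. Independence is what licenses multiplying the three densities.

```python
import math

def exp_pdf(x: float, lam: float) -> float:
    # Assumed example model: exponential density with rate lam.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Hypothetical sample of size n = 3 and an assumed parameter value.
x1, x2, x3 = 0.5, 1.2, 2.0
lam = 1.0

# Independence lets the joint density factor into a product of marginals.
joint = exp_pdf(x1, lam) * exp_pdf(x2, lam) * exp_pdf(x3, lam)

# For this particular pdf the product simplifies to lam**3 * exp(-lam * (x1 + x2 + x3)).
print(joint, lam**3 * math.exp(-lam * (x1 + x2 + x3)))
```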

Generalizing to any $n$, the joint density of $n$ randomly drawn sample values is the product of the individual densities, so the likelihood function is nothing more than the joint pdf of the sample: a multivariate probability density function of the values taken on by the $n$ random variables in the sample.

$$L(\theta) = p_X(x_1, \dots, x_n; \theta) = \prod_{i=1}^{n} p_X(x_i; \theta)$$
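The same idea for a general sample can be sketched as a small function. It assumes an exponential model with rate `lam` (our choice, not the text’s) and a hypothetical four-value sample; the density is passed in as a function, so `likelihood` itself knows nothing about the model.

```python
import math
from typing import Callable, Sequence

def likelihood(theta: float, sample: Sequence[float],
               pdf: Callable[[float, float], float]) -> float:
    # Joint density of an iid sample: the product of the individual densities.
    return math.prod(pdf(x, theta) for x in sample)

def exp_pdf(x: float, lam: float) -> float:
    # Assumed example model: exponential density with rate lam.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

sample = [0.5, 1.2, 2.0, 0.8]  # hypothetical data

# Same data, different parameter values, different likelihoods.
for lam in (0.5, 1.0, 2.0):
    print(lam, likelihood(lam, sample, exp_pdf))
```

Holding the sample fixed and varying `lam` is exactly what it means for the likelihood to be a function of the parameter: here `lam = 1.0` gives a larger value than either `0.5` or `2.0`.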

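The likelihood function is a function of the unknown parameter $\theta$; once the sample is observed, the values $x_1, \dots, x_n$ are fixed. The **Maximum Likelihood Estimate** $\hat{\theta}$ for the unknown parameter is the parameter value that maximizes the likelihood function:

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} p_X(x_i; \theta)$$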
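Since $\hat{\theta}$ is just the maximizer of a function of $\theta$, one crude way to approximate it is a grid search: evaluate the likelihood at many candidate values and keep the best one. This sketch assumes an exponential model with rate `lam` and a made-up sample; the grid bounds and spacing are arbitrary choices.

```python
import math

def exp_likelihood(lam: float, sample: list[float]) -> float:
    # Assumed model: iid exponential observations with rate lam.
    return math.prod(lam * math.exp(-lam * x) for x in sample)

sample = [0.5, 1.2, 2.0, 0.8]  # hypothetical data

# Candidate values of lam: 0.001, 0.002, ..., 5.000 (arbitrary grid).
grid = [i / 1000 for i in range(1, 5001)]

# The grid point with the largest likelihood approximates the MLE.
lam_hat = max(grid, key=lambda lam: exp_likelihood(lam, sample))
print(lam_hat)
```

For this model the maximizer has the closed form $n / \sum x_i = 4 / 4.5 \approx 0.889$, so the grid answer should land within one grid step of it. Grid search scales poorly beyond one or two parameters; it is shown only because it makes “maximize over $\theta$” concrete.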

We typically use calculus to find this value: take the derivative of the likelihood function with respect to the unknown parameter, set it equal to 0, and solve for the parameter. Don’t forget to verify that the critical point you find is indeed a maximum, for example by checking that the second derivative is negative there, or that the first derivative changes sign from positive to negative.
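As a worked example of this recipe, assume (this model is our choice, not the text’s) iid exponential observations with density $p_X(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$, and write $S = \sum_{i=1}^{n} x_i$, assumed positive. Differentiating the likelihood directly, without logs:

$$
\begin{aligned}
L(\lambda) &= \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^{n} e^{-\lambda S}, \\
\frac{dL}{d\lambda} &= n\lambda^{n-1} e^{-\lambda S} - S\lambda^{n} e^{-\lambda S} = \lambda^{n-1} e^{-\lambda S}\,(n - \lambda S), \\
\frac{dL}{d\lambda} = 0 &\;\Longrightarrow\; \hat{\lambda} = \frac{n}{S} = \frac{1}{\bar{x}}.
\end{aligned}
$$

For $\lambda > 0$ the factor $\lambda^{n-1} e^{-\lambda S}$ is positive, so $dL/d\lambda$ is positive when $\lambda < n/S$ and negative when $\lambda > n/S$. The critical point is therefore a maximum, not a minimum.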

This will usually involve complicated, messy math, because differentiating a product of $n$ terms is tedious. To simplify, we often work with the logarithm of the likelihood function (the log-likelihood) and use properties of logs to turn the product into a sum. This won’t change our answer: the logarithm is strictly increasing, so taking the log of a positive function doesn’t change the point at which its maximum is achieved.
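Two practical consequences of switching to the log-likelihood can be checked numerically. This is a minimal sketch assuming an exponential model with rate `lam`; the small sample is made up and the large one is simulated with Python’s `random.expovariate`. It shows that (1) maximizing the likelihood and the log-likelihood over the same grid picks the same value, and (2) for a large sample the raw product underflows to `0.0` in floating point, while the log-likelihood, being a sum, stays finite.

```python
import math
import random

def likelihood(lam: float, sample: list[float]) -> float:
    # Product of exponential densities (assumed example model).
    return math.prod(lam * math.exp(-lam * x) for x in sample)

def log_likelihood(lam: float, sample: list[float]) -> float:
    # Log turns the product into a sum: n*log(lam) - lam*sum(sample).
    return len(sample) * math.log(lam) - lam * sum(sample)

grid = [i / 1000 for i in range(1, 5001)]

# Small hypothetical sample: both criteria pick the same grid point.
small = [0.5, 1.2, 2.0, 0.8]
best_L = max(grid, key=lambda lam: likelihood(lam, small))
best_logL = max(grid, key=lambda lam: log_likelihood(lam, small))

# Large simulated sample (true rate 2.0): the raw product underflows,
# while the log-likelihood remains a finite, comparable number.
random.seed(0)
large = [random.expovariate(2.0) for _ in range(10_000)]
lam_hat = len(large) / sum(large)  # closed-form maximizer n / sum(x) for this model

print(best_L, best_logL)
print(likelihood(lam_hat, large), log_likelihood(lam_hat, large))
```

Underflow is the main reason numerical software maximizes the log-likelihood rather than the likelihood itself, in addition to the simpler derivatives.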

The parameter value you end up with maximizes the probability (or, for continuous distributions, the joint density) of your observed sample values $x_1, \dots, x_n$. You could say it’s the value “most consistent” with the observed sample: the **Maximum Likelihood Estimate**.