The Binomial Distribution is a discrete probability distribution of the number of successes in n independent trials with a binary outcome and probability of success p. There are three important conditions that must be met for the binomial distribution to be used correctly:

- Trials are **independent**
- The probability of success p is **constant** for all trials
- Each trial has only **two possible outcomes** (binary)

One way to describe a Binomial random variable is in terms of **Bernoulli Trials**. A Bernoulli trial is a random experiment in which there are only two possible outcomes – success and failure. If we let success = 1 and failure = 0, then a Bernoulli Random Variable looks like this:

P(X = 1) = p

P(X = 0) = 1 – p

So really, a Binomial experiment is just a sequence of n repeated Bernoulli trials. Formally, a random variable X that counts the number of successes, k, in n trials has a **Binomial Distribution** with parameters n and p, written X ~ Bin(n, p). Let p = the probability that a success occurs at any given trial. Then the probability of k successes is:

P(X = k) = (n choose k) * p^{k} * (1 – p)^{n – k}

- The first part of the equation is “n choose k” or a combination of k objects from a population of n objects. This expression is the number of ways we can arrange k successes in n trials, where the order of the k successes is unimportant. This is called a Binomial Coefficient.

- The second part of the equation is the expression p raised to the power k, where p is the probability of success in any given trial. This expression takes advantage of the independence assumption – independent probabilities are multiplicative, so the probability of a success occurring k times is p*p*p*…*p (k times) = p raised to the k power.

- The third part of the equation is the probability of realizing n-k failures where trials are independent and the probability of failure is (1-p). Intuitively, if there are n trials and k of those are successes, the remaining trials, of which there are n-k, must result in failure.
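The three parts of the formula can be sketched in code. A minimal Python version, using the standard library's `math.comb` for the binomial coefficient (the values of n, p, and k below are illustrative, not from the text):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): (n choose k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The three parts of the formula, for an illustrative n = 10, p = 0.3, k = 4:
n, p, k = 10, 0.3, 4
ways = comb(n, k)           # "n choose k": ways to arrange k successes in n trials
succ = p**k                 # probability of k independent successes
fail = (1 - p)**(n - k)     # probability of n - k independent failures
print(ways * succ * fail)   # same value as binomial_pmf(4, 10, 0.3)

# Sanity check: the probabilities over all possible k sum to 1
print(sum(binomial_pmf(i, n, p) for i in range(n + 1)))
```

The sanity check is a useful habit: any valid probability mass assignment must sum to 1 over all outcomes.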

It might help to think backwards and ignore the binomial coefficient for a moment. If k independent events occur with probability p, and n – k independent events occur with probability (1 – p), then the probability that all mentioned events occur is the product (p^{k})*(1-p)^{n-k}. While what we’ve just stated is accurate for *one specific sequence* of k successes and n – k failures, we must consider that there are a number of combinations of outcomes that yield k successes and n – k failures. If we flip a coin three times and define success as landing on heads, there is more than one outcome for k = 2. We could have HHT, HTH, or THH. The exact number of outcomes that yield k successes and n – k failures is the binomial coefficient “n choose k”. That is why it multiplies the probability of success and failure.
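The counting argument above can be checked by brute force: enumerate every sequence of three tosses and confirm that exactly “3 choose 2” of them contain two heads. A small sketch:

```python
from itertools import product
from math import comb

n, k = 3, 2
# All 2^n possible sequences of heads (H) and tails (T)
sequences = [''.join(s) for s in product('HT', repeat=n)]
two_heads = [s for s in sequences if s.count('H') == k]

print(two_heads)                      # ['HHT', 'HTH', 'THH']
print(len(two_heads) == comb(n, k))   # True: the binomial coefficient counts them
```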

Ex) Three coins are tossed, each with probability p that the coin lands on heads (it may not be a fair coin, so p may not be 0.5). Since coin tosses are independent, we can write the probability of any specific sequence as the product of the per-toss probabilities, for example:

P(HHT) = p * p * (1 – p) = p^{2}(1 – p)

The table below shows the probabilities associated with each sequence:

| Sequence | # of Heads | Probability |
| --- | --- | --- |
| HHH | 3 | p^{3} |
| HHT | 2 | p^{2}(1 – p) |
| HTH | 2 | p^{2}(1 – p) |
| HTT | 1 | p(1 – p)^{2} |
| THH | 2 | p^{2}(1 – p) |
| THT | 1 | p(1 – p)^{2} |
| TTH | 1 | p(1 – p)^{2} |
| TTT | 0 | (1 – p)^{3} |

The table above gives the probability for each *outcome* – that is, a specific sequence of heads and tails. It would be incorrect to look at the row for HHT and conclude that the probability of obtaining 2 heads is p^{2}(1 – p). That is the probability of obtaining exactly the sequence HHT, not the more general event of obtaining two heads in any order.

But if our main interest is the **number** of heads that occur, the sequence HHT is the same as HTH. Note that there are **three** outcomes with exactly two heads, each with probability p^{2}(1 – p). This is where the “n choose k” term becomes necessary: exactly k successes may be achieved in a number of different ways. In the context of this question, the binomial probability is:

P(k heads) = (# ways to arrange k heads and n-k tails)*(prob. of any specific sequence w/ k heads and n-k tails)

P(k heads) = (# ways to arrange k heads and n-k tails)*p^{k} (1-p)^{n-k}
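The two lines above can be verified numerically: summing the probabilities of the individual sequences with k heads gives the same answer as the binomial formula. A sketch with an arbitrary illustrative choice of p = 0.6:

```python
from itertools import product
from math import comb

p, n, k = 0.6, 3, 2   # p = 0.6 is an illustrative value, not from the text

# Way 1: sum the probability of every specific sequence with exactly k heads
def seq_prob(seq, p):
    return p**seq.count('H') * (1 - p)**seq.count('T')

by_enumeration = sum(seq_prob(''.join(s), p)
                     for s in product('HT', repeat=n)
                     if s.count('H') == k)

# Way 2: (# ways to arrange k heads) * (prob. of any one such sequence)
by_formula = comb(n, k) * p**k * (1 - p)**(n - k)

print(by_enumeration, by_formula)   # both 0.432 = 3 * 0.6^2 * 0.4
```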

Implicit in the operations just performed is that a random variable was used to map outcomes in a sample space to the real numbers. Basically, we used a random variable to assign a real number (0, 1, 2, or 3 in this case) to each outcome (a sequence of H and T). A random variable is neither random nor a variable – it is a function that maps sample outcomes to the real numbers. This is a simplified definition that would not suffice in a mathematical statistics course, but it captures what is going on here. The outcomes were grouped in a way that facilitates problem solving – in this case, to give us the number of heads rather than a specific sequence.
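The “random variable as a function” idea can be made concrete: below, `X` is literally a function from outcomes to numbers, and grouping outcomes by its value recovers the head counts (a sketch; the names are my own choosing):

```python
from itertools import product

# A random variable is a function from sample outcomes to real numbers.
# Here X maps each sequence of tosses to its number of heads.
def X(outcome):
    return outcome.count('H')

sample_space = [''.join(s) for s in product('HT', repeat=3)]

# Group outcomes by the value X assigns them
grouped = {}
for outcome in sample_space:
    grouped.setdefault(X(outcome), []).append(outcome)

print(grouped)
# {3: ['HHH'], 2: ['HHT', 'HTH', 'THH'], 1: ['HTT', 'THT', 'TTH'], 0: ['TTT']}
```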

Just as the probability function assigns probabilities to sample outcomes in the sample space, a probability mass function assigns probabilities to the values that a (discrete) random variable takes on. Let X be the number of heads that occur in a sequence of 3 coin tosses. Then the probability mass function is:

P(X = k) = (3 choose k) * p^{k} * (1 – p)^{3 – k}, for k = 0, 1, 2, 3

For example:

P(X = 2) = (3 choose 2) * p^{2}(1 – p) = 3p^{2}(1 – p)

And here is a table of the probabilities for each value of X:

| k | P(X = k) |
| --- | --- |
| 0 | (1 – p)^{3} |
| 1 | 3p(1 – p)^{2} |
| 2 | 3p^{2}(1 – p) |
| 3 | p^{3} |
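Since the text leaves p general, here is a quick sketch that evaluates these probabilities for one illustrative choice, p = 0.5 (a fair coin):

```python
from math import comb

n = 3
p = 0.5   # illustrative value only; the discussion above leaves p general
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print(pmf)   # [0.125, 0.375, 0.375, 0.125]
```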

It should be clear now why independence of trials is such an important criterion in using the Binomial distribution. If trials are **not** independent, i.e., the outcome of one trial affects the probability of subsequent outcomes, then the binomial model is not applicable. A common example is sampling without replacement from a finite population, where each draw changes the makeup of what remains; in this situation, the **Hypergeometric Distribution** should be used instead.
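To make the contrast concrete, here is a sketch comparing the two models for drawing 5 cards from a 52-card deck containing 13 hearts; the scenario and the `hypergeom_pmf` helper are illustrative, not from the text:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k successes) when drawing n items without replacement from a
    population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Draw 5 cards from a 52-card deck with 13 hearts. Draws are dependent:
# each card removed changes the composition of the remaining deck.
N, K, n = 52, 13, 5
print(hypergeom_pmf(2, N, K, n))    # correct model (without replacement)
print(binomial_pmf(2, n, K / N))    # binomial approximation (pretends
                                    # each draw still has p = 13/52)
```

The two answers are close here because the deck is large relative to the draw, but they are not equal; the gap grows as the sample becomes a larger fraction of the population.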