Binomial Distribution Basics

The Binomial Distribution is a discrete probability distribution of the number of successes in n independent trials with a binary outcome and probability of success p.  There are three important conditions that must be met for the binomial distribution to be used correctly:

  • Trials are independent
  • The probability of success p is constant for all trials
  • Each trial has only two possible outcomes (binary)

One way to describe a Binomial random variable is in terms of Bernoulli Trials.  A Bernoulli trial is a random experiment in which there are only two possible outcomes – success and failure.  If we let success = 1 and failure = 0, then a Bernoulli Random Variable looks like this:

P(X = 1) = p

P(X = 0) = 1 – p
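To make this concrete, here is a minimal Python sketch of a Bernoulli trial. The helper name `bernoulli_trial`, the seed, and the values n = 10 and p = 0.3 are all made up for illustration. Repeating the trial and adding up the 1s counts the successes.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)   # fixed seed so the run is repeatable (arbitrary choice)
n, p = 10, 0.3           # illustrative values, not from the text

# n independent trials, each with the same success probability p
trials = [bernoulli_trial(p, rng) for _ in range(n)]
successes = sum(trials)  # because success = 1 and failure = 0, the sum counts successes

print(trials)
print(successes)
```

Coding success as 1 and failure as 0 is what makes the count a simple sum.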

So really a Binomial experiment is just a sequence of n repeated Bernoulli trials.  Formally, a random variable X that counts the number of successes, k, in n trials, has a Binomial Distribution with parameters n and p, written X~Bin(n, p).  Let p = Probability that success occurs at any given trial.  Then the probability of k successes is:

P(X = k) = C(n, k) * p^k * (1 – p)^(n – k),   for k = 0, 1, …, n

where C(n, k) = n! / (k! (n – k)!)

  • The first part of the equation is “n choose k” or a combination of k objects from a population of n objects.  This expression is the number of ways we can arrange k successes in n trials, where the order of the k successes is unimportant.  This is called a Binomial Coefficient.
  • The second part of the equation is p raised to the power k, where p is the probability of success in any given trial.  This expression relies on the independence assumption: probabilities of independent events multiply, so the probability of k particular trials all being successes is p*p*p*…*p (k times) = p raised to the k power.
  • The third part of the equation, (1 – p) raised to the power n – k, is the probability of realizing n – k failures where trials are independent and the probability of failure is (1 – p).  Intuitively, if there are n trials and k of those are successes, the remaining n – k trials must result in failure.
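The three parts described above translate almost line for line into code. The following is a small sketch, not a library routine: the function name `binomial_pmf` is ours, and `math.comb` assumes Python 3.8 or newer.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), assembled from the three parts."""
    ways = comb(n, k)                   # "n choose k": arrangements of k successes
    success_part = p ** k               # k independent successes
    failure_part = (1 - p) ** (n - k)   # n - k independent failures
    return ways * success_part * failure_part

# Two heads in three tosses of a fair coin: 3 * 0.25 * 0.5
print(binomial_pmf(2, 3, 0.5))  # 0.375

# The probabilities over k = 0..n should add up to 1 (up to rounding)
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
print(total)
```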

It might help to think backwards and ignore the binomial coefficient for a moment.  If k independent events each occur with probability p, and n – k independent events each occur with probability (1 – p), then the probability that all of these events occur is the product p^k * (1 – p)^(n – k).  That is accurate for one specific sequence of k successes and n – k failures, but there are several different sequences that yield k successes and n – k failures.  If we flip a coin three times and define success as landing on heads, there is more than one outcome for k = 2.  We could have HHT, HTH, or THH.  The exact number of sequences that yield k successes and n – k failures is the binomial coefficient “n choose k”.  That is why it multiplies the probability of any one such sequence.
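A quick way to convince yourself of the counting argument is to list every sequence and keep the ones with k heads. This sketch uses the coin example from the paragraph above (n = 3, k = 2). The variable names are ours.

```python
from itertools import product
from math import comb

n, k = 3, 2

# Every possible sequence of n tosses, e.g. ('H', 'T', 'H')
sequences = list(product("HT", repeat=n))

# Keep only the sequences with exactly k heads
with_k_heads = ["".join(s) for s in sequences if s.count("H") == k]

print(with_k_heads)                      # ['HHT', 'HTH', 'THH']
print(len(with_k_heads) == comb(n, k))   # True
```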

Ex) Three coins are tossed, each with probability p that the coin lands on heads (it may not be a fair coin, so p may not be 0.5).  Since coin tosses are independent, we can write:

P(HHT) = P(H) * P(H) * P(T) = p * p * (1 – p) = p^2 (1 – p)

The table below shows the probabilities associated with each sequence:

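Toss 1   Toss 2   Toss 3   Probability
T        T        T        (1 – p)^3
T        T        H        p (1 – p)^2
T        H        T        p (1 – p)^2
T        H        H        p^2 (1 – p)
H        T        T        p (1 – p)^2
H        T        H        p^2 (1 – p)
H        H        T        p^2 (1 – p)
H        H        H        p^3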

The chart above gives the probability of each outcome, that is, of each specific sequence of heads and tails.  It would be incorrect to pick out one sequence with two heads, say THH, and assume that its probability is the probability of obtaining 2 heads.  That value is the probability of obtaining exactly the sequence THH, not the more general event of obtaining two heads in any order.

But if our main interest is the number of heads that occur, the sequence HHT is the same as HTH.  Note, for example, that there are three outcomes with exactly two heads, each with probability p^2 (1 – p).  This is where the “n choose k” term becomes necessary.  Exactly k successes may be achieved in a number of different ways.  In the context of this question, the binomial probability is:

P(k heads) = (# ways to arrange k heads and n-k tails)*(prob. of any specific sequence w/ k heads and n-k tails)

P(k heads) = (# ways to arrange k heads and n-k tails) * p^k * (1 – p)^(n – k)

Implicit in the operations just performed is that a random variable was used to map outcomes in a sample space to the real numbers.  Basically, we used a random variable to assign a real number (0, 1, 2, or 3 in this case) to each outcome (a sequence of H and T).  A random variable is neither random nor a variable: it is a function that maps sample outcomes to the real numbers.  This is a simplified definition that will usually not suffice in a mathematical statistics course, but that’s basically what’s going on here.  The outcomes were grouped in a way that facilitates problem solving: here, the grouping gives us the number of heads rather than a specific sequence.
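The “random variable is a function” idea can be sketched directly in Python. Here `num_heads` plays the role of the random variable, and the loop groups sequence probabilities by the number it assigns. The function names are ours, and p = 0.5 is an assumed value chosen only so that the output is easy to check by hand.

```python
from collections import defaultdict
from itertools import product

def num_heads(outcome):
    """The random variable X: maps a sequence such as 'HTH' to a number."""
    return outcome.count("H")

def sequence_probability(outcome, p):
    """Probability of one specific sequence of independent tosses."""
    prob = 1.0
    for toss in outcome:
        prob *= p if toss == "H" else (1 - p)
    return prob

p = 0.5  # assumed for illustration; the text leaves p general

# Add up the probabilities of all sequences that X maps to the same number
distribution = defaultdict(float)
for seq in product("HT", repeat=3):
    outcome = "".join(seq)
    distribution[num_heads(outcome)] += sequence_probability(outcome, p)

print(dict(distribution))  # {3: 0.125, 2: 0.375, 1: 0.375, 0: 0.125}
```

Each of the eight sequences contributes its own probability, and the grouping by `num_heads` is what turns sequence probabilities into probabilities for the number of heads.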

Just as the probability function assigns probabilities to sample outcomes in the sample space, a probability mass function (the discrete counterpart of a probability density function) assigns probabilities to the values that a random variable takes on.  Let X be the number of heads that occur in a sequence of 3 coin tosses.  Then the probability mass function is:

P(X = k) = C(3, k) * p^k * (1 – p)^(3 – k),   for k = 0, 1, 2, 3

where C(3, k) is “3 choose k”.

For example:

P(X = 2) = C(3, 2) * p^2 * (1 – p)^1 = 3 p^2 (1 – p)

And here are the probabilities for each value of k, written out in terms of p:

k    P(X = k)
0    (1 – p)^3
1    3 p (1 – p)^2
2    3 p^2 (1 – p)
3    p^3

It should be clear now why independence of trials is such an important criterion in using the Binomial distribution.  If trials are not independent, i.e. the outcome of one trial affects the probability of subsequent outcomes, then the binomial model is not applicable.  A common case is sampling without replacement from a finite population, where each draw changes the makeup of what is left.  In that situation, the Hypergeometric Distribution should be used instead.
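To see how much the independence assumption matters, here is a rough comparison of drawing with replacement (binomial) against drawing without replacement (hypergeometric). The urn setup of N = 10 items with K = 4 “successes” and n = 3 draws is a made-up example, and both function names are ours. The hypergeometric formula used is the standard one, C(K, k) * C(N – K, n – k) / C(N, n).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes) in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, N, K, n):
    """P(k successes) when drawing n items without replacement
    from N items, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 10, 4, 3  # hypothetical urn: 10 items, 4 successes, 3 draws

for k in range(n + 1):
    with_replacement = binomial_pmf(k, n, K / N)
    without_replacement = hypergeometric_pmf(k, N, K, n)
    print(k, round(with_replacement, 4), round(without_replacement, 4))
```

The two columns differ because, without replacement, each draw changes the success probability for the next one, so the constant-p and independence conditions both fail.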

About schapshow

Math & Statistics graduate who likes gymnastics, 90s alternative music, and statistical modeling.
