Tag Archives: simulation

Simulating Waiting Times with R

In my last post, I used Mathematica to show the connection between events that follow a Poisson process (i.e. occur independently at a constant rate per unit time/space) and the waiting time between successive events, which is exponentially distributed.  By the memoryless property, the waiting time between successive events also happens to be the time until the first event occurs – making the exponential distribution a reasonable model of the time until failure of batteries, electronic components, machines, and so forth.  Using the exponential model, I showed that the probability that a component lasts longer than average (i.e. exceeds its expected lifetime of 15 months in the example) is only 36.79%, which tells us the distribution is skewed (i.e. the average and the median aren’t equal).  Below is the equation to compute this probability and below that are plots of the theoretical probability density function and cumulative density functions.

three

Rplot_pdf Rplot_cdf

In this post I’m going to simulate the previous result using R.  I provided the R code to generate the results but I’m a pretty bad programmer so I’d encourage you to look elsewhere for programming methodology etc. but nonetheless my code does produce the intended result.  Before starting I loaded the MASS library because it’s a quick way to make plots look less shitty if you’re not an R pro.

First, I simulated 1000 random draws from an exponential distribution with parameter 1/15 and assign the result to a variable called sample:

n <- 1000

sample <- rexp(n, rate=1/15)

Then I made a histogram of ‘sample’ using the truehist command (which you’ll need the MASS library to use:

Exp_Sim

Now I want to somehow count how many of the random waiting times are greater than the average, 15.  The graph gives us an idea – it doesn’t look more than a third – but it’s not precise.  To know for sure I create a list of only the values that exceeded 15 in the sample called upper_sample with the help of a logical vector that returns true if the corresponding value in ‘sample’ is greater than 15 and false otherwise:

above_avg <- sample > 15

upper_sample <- sample[above_avg]

truehist(upper_sample, main=”Sample Waiting Times > 15″)

upper_sample_dist

The number of elements in the upper_sample vector divided by the number of trials gives me the probability that a component lasts more than 15 hours, which I named psuccess:

psuccess <- length(upper_sample)/n
psuccess

There were 377 observations in my sample, so the simulated probability that a component lasts more than 15 hours is 37.7%, about 1% off from the true value.


www.openeuroscience.com/

Open source projects for neuroscience!

Systematic Investor

Systematic Investor Blog

Introduction to Data Science, Columbia University

Blog to document and reflect on Columbia Data Science Class

Heuristic Andrew

Good-enough solutions for an imperfect world

r4stats.com

"History doesn't repeat itself but it does rhyme"

My Blog

take a minute, have a seat, look around

Data Until I Die!

Data for Life :)

R Statistics and Programming

Resources and Information About R Statistics and Programming

Models are illuminating and wrong

A data scientist discussing his journey in the analytics profession

Xi'an's Og

an attempt at bloggin, nothing more...

Practical Vision Science

Vision science, open science and data analysis

Big Data Econometrics

Small posts about Big Data.

Simon Ouderkirk

Remote Work, Small Data, Digital Hospitality. Work from home, see the world.

rbresearch

Quantitative research, trading strategy ideas, and backtesting for the FX and equity markets

Statisfaction

I can't get no

The Optimal Casserole

No Line Is Ever Pointless

SOA Exam P / CAS Exam 1

Preparing for Exam P / Exam 1 thru Problem Solving

schapshow

Mathematical statistics for the layman.