Website is Moving!

My website is moving to, because apparently I’ve already purchased that domain.  There’s probably a way to transfer it to the address I’m on right now, but I’m really not tech savvy enough to attempt that.  Anyway, subscribe to if you’re interested.  My own domain and premium theme should make the content more visually appealing and better organized.  Thanks!

Simulating Waiting Times with R

In my last post, I used Mathematica to show the connection between events that follow a Poisson process (i.e. occur independently at a constant rate per unit time/space) and the waiting time between successive events, which is exponentially distributed.  By the memoryless property, the waiting time between successive events has the same distribution as the time until the first event occurs, which makes the exponential distribution a reasonable model of the time until failure of batteries, electronic components, machines, and so forth.  Using the exponential model, I showed that the probability that a component lasts longer than average (i.e. exceeds its expected lifetime of 15 months in the example) is only 36.79%, which tells us the distribution is skewed (i.e. the average and the median aren’t equal).  Below is the equation to compute this probability and below that are plots of the theoretical probability density function and cumulative distribution function.
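For reference, the 36.79% figure can be written out as follows, assuming the lifetime X is exponential with rate 1/15 per month (mean 15 months), as in the example. The median line is my own addition to make the skew concrete:

```latex
P(X > 15) = \int_{15}^{\infty} \frac{1}{15} e^{-x/15} \, dx = e^{-15/15} = e^{-1} \approx 0.3679

\text{median: } \; e^{-m/15} = \tfrac{1}{2} \;\Rightarrow\; m = 15 \ln 2 \approx 10.4 \text{ months} < 15 \text{ months (the mean)}
```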


[Figures: plots of the exponential PDF and CDF]

In this post I’m going to simulate the previous result using R.  I’ve included the R code that generates the results, but I’m a pretty bad programmer, so I’d encourage you to look elsewhere for programming methodology.  Nonetheless, my code does produce the intended result.  Before starting I loaded the MASS library because it’s a quick way to make plots look less shitty if you’re not an R pro.

First, I simulated 1000 random draws from an exponential distribution with rate parameter 1/15 and assigned the result to a variable called sample:

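library(MASS)  # provides truehist()

n <- 1000  # number of simulated waiting times

sample <- rexp(n, rate=1/15)  # the variable shares its name with base R's sample() function, which still works when called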

Then I made a histogram of ‘sample’ using the truehist command (which you’ll need the MASS library to use):
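The histogram call itself didn't survive in this post, so here is a minimal sketch of what it would look like. The seed and the plot title are my assumptions; only truehist and the rexp call come from the post.

```r
library(MASS)  # provides truehist()

set.seed(1)  # assumed seed, only so the sketch is repeatable

sample <- rexp(1000, rate = 1/15)  # 1000 simulated waiting times, mean 15

# Density-scaled histogram of the simulated waiting times (title is assumed)
truehist(sample, main = "Sample Waiting Times")
```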


Now I want to count how many of the random waiting times are greater than the average, 15.  The graph gives us an idea (it doesn’t look like much more than a third) but it’s not precise.  To know for sure, I created a vector called upper_sample containing only the values in the sample that exceeded 15, with the help of a logical vector that is TRUE if the corresponding value in ‘sample’ is greater than 15 and FALSE otherwise:

above_avg <- sample > 15

upper_sample <- sample[above_avg]

truehist(upper_sample, main="Sample Waiting Times > 15")


The number of elements in the upper_sample vector divided by the number of trials gives me the simulated probability that a component lasts more than 15 months, which I named psuccess:

psuccess <- length(upper_sample)/n

There were 377 observations above 15 in my sample, so the simulated probability that a component lasts more than 15 months is 37.7%, about 1 percentage point off from the true value of 36.79%.
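As a rough check on how close a simulation like this should land, here is a sketch that compares the simulated share with the exact tail probability from pexp. The seed and the names waits, p_sim, p_true and se are my own additions, not from the original run, so the simulated number won't be exactly 37.7%.

```r
set.seed(42)  # assumed seed; the original post did not set one

n <- 1000
waits <- rexp(n, rate = 1/15)  # simulated waiting times, mean 15

# Share of simulated waiting times above the mean
p_sim <- mean(waits > 15)

# Exact probability P(X > 15) for an exponential with rate 1/15, i.e. exp(-1)
p_true <- pexp(15, rate = 1/15, lower.tail = FALSE)

# Standard error of a simulated proportion based on n independent draws
se <- sqrt(p_true * (1 - p_true) / n)

print(c(simulated = p_sim, exact = p_true, std_error = se))
```

With n = 1000 the standard error is about 1.5 percentage points, so landing within 1 point of the true value is about what you'd expect.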

Fixed Income Investing at the Zero Lower Bound–Throw Away Your Fucking Finance Textbook

There’s a hopelessly confusing mess of information about fixed income investing in today’s macro environment, some of it grounded in fact, some of it appearing on Squawk Box and similar shows.  In this post, I’m not predicting the future or recommending any investment strategy, but instead explaining how and why some government securities did really well in 2014 even though plenty of pundits wrote them off as a surefire way to lose money that year.  And, I should add, said pundits weren’t really all that stupid in doing so, because an oversimplified analysis of the interest rate environment pre-2014 pointed in that direction.

Rates on the U.S. 10 year treasury went down in 2014, from about 3% to 2.17%, despite widespread predictions that monetary policy would inevitably cause rates to increase. If pundits everywhere predicted this, why did the opposite materialize?

Global growth slowed down in 2014, dragging inflation down with it. Greece had another…episode, which reintroduced the possibility of a ‘Grexit’, pushing rates down further. So, by 2014’s end, the 10 year treasury rate had decreased by about 85 basis points, moving in the opposite direction from rates on high-yield debt. In other words, credit spreads widened in 2014, meaning investors demanded more compensation in order to take on credit risk (i.e. lend to corporations via bonds).

Intuitively, widening credit spreads are the result of investors demanding more compensation in exchange for lending to corporations and other risky entities, because they perceive them to be increasingly susceptible to risk in comparison to the U.S. government. Why would this be the case? Usually because the economic outlook isn’t so good – investors fear corporations are less credit-worthy due to their prospects and demand to be compensated accordingly. Furthermore, investors tend to seek the safest securities during times like these – i.e. ‘flight to quality’ occurs – namely U.S. government securities, which pushes down rates. This increases the credit spread, pushing risk free rates further away from rates on other debt.

The scenario described above is why in 2014 the average return on the 10 yr. treasury was about 10% despite the consensus forecast that they would perform poorly. Remember, the price of a bond is inversely related to the rate of interest, which is why the widening credit spread and corresponding decrease in rates pushed up the value of 10 year treasuries held by investors. To see why, think about the discount factor associated with a 10 year treasury at 3.02% versus 2.17%. In the first case (i.e. at the beginning of 2014), a $1000 par value bond is worth $741. When the applicable rate of interest falls to 2.17%, however, the corresponding price is $805.87. Clearly, the value of the 10 year US treasuries was impacted favorably by the decrease in rates, benefitting the investors who held them throughout the year. Moral of the story: interest rates are fucked up and hard to predict, and Squawk Box is almost never right.
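The two prices quoted above can be reproduced, as far as I can tell, by treating the bond as a zero-coupon bond discounted with semiannual compounding. That is an assumption on my part (and zero_price is a hypothetical helper name), but it does recover the $741 and $805.87 figures:

```r
# Price of a zero-coupon bond: par discounted at a nominal annual yield.
# periods_per_year = 2 (semiannual compounding) is an assumption.
zero_price <- function(par, annual_yield, years, periods_per_year = 2) {
  par / (1 + annual_yield / periods_per_year)^(years * periods_per_year)
}

p_start <- zero_price(1000, 0.0302, 10)  # yield at the start of 2014
p_end   <- zero_price(1000, 0.0217, 10)  # yield at the end of 2014

print(round(c(start = p_start, end = p_end), 2))
```

Note that this holds the maturity at 10 years for both prices, so it isolates the effect of the rate change rather than modeling the return of an actual bond held through the year.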

Low Volatility ETFs

The hip new financial product fangirled by every personal finance columnist on the internet is the low volatility ETF.  It is pretty much exactly what it sounds like – an ETF that, while tracking whichever index/industry/etc. it is supposed to, attempts to limit the variability of returns.  You can think of it as a stock with a low beta that moves with the trend of the market but not as severely in either direction during business cycle booms and busts.   Methodologies vary, but techniques are employed to limit the variance of individual holdings as well as the correlation between them.  I analyzed the performance of the PowerShares Low Volatility S&P 500 ETF (SPLV) to see how it stacks up against the market as a whole.

Over the past four years, the S&P500 had both a significantly higher maximum and lower minimum return compared to the PowerShares Low Volatility Index.  The S&P experienced many more extreme returns (+/- 1% daily return), suggesting that returns on SPLV fluctuate less than the market.  The S&P also earned a lower average return with higher variance than SPLV.

Period 5/6/11 to 1/6/15

| Metric | S&P 500 | SPLV |
| --- | --- | --- |
| Max Daily Return | 4.63% | 3.75% |
| Min Daily Return | -6.90% | -5.18% |
| Returns less than -1% | 98 | 62 |
| Returns greater than 1% | 110 | 71 |
| Average Daily Return | 0.04% | 0.06% |
| Average Annual Return | 10.98% | 14.03% |
| Standard Deviation of Daily Return | 0.99% | 0.75% |
| Standard Deviation of Annual Return | 15.71% | 11.85% |

The table below is the same analysis for only the year 2014, during which the US equity market posted more gains.

Year 2014

| Metric | S&P 500 | SPLV |
| --- | --- | --- |
| Max Daily Return | 2.37% | 2.00% |
| Min Daily Return | -2.31% | -1.99% |
| Returns less than -1% | 19 | 14 |
| Returns greater than 1% | 19 | 13 |
| Average Daily Return | 0.04% | 0.06% |
| Average Annual Return | 10.70% | 15.80% |
| Standard Deviation of Daily Return | 0.72% | 0.60% |
| Standard Deviation of Annual Return | 11.34% | 9.55% |
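The daily and annual figures in both tables appear to be linked by the usual annualization rule of thumb: multiply a mean daily return by the number of trading days, and a daily standard deviation by its square root. The sketch below assumes 252 trading days a year and independent daily returns; the helper names are hypothetical.

```r
trading_days <- 252  # assumed number of trading days per year

# Scale a mean daily return to an annual figure (simple, non-compounded)
annualize_mean <- function(daily_mean) daily_mean * trading_days

# Scale a daily standard deviation to an annual one (assumes independent days)
annualize_sd <- function(daily_sd) daily_sd * sqrt(trading_days)

# A daily standard deviation of 0.99% comes out near the 15.71% annual
# figure reported for the S&P 500 over the full period
sp500_annual_sd <- annualize_sd(0.0099)

# A daily standard deviation of 0.75% comes out near the 11.85% reported for SPLV
splv_annual_sd <- annualize_sd(0.0075)

print(round(c(sp500 = sp500_annual_sd, splv = splv_annual_sd), 4))
```

The small gaps between these numbers and the table are what you'd expect from the daily figures being rounded to two decimals.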

The claim that the PowerShares Low Volatility ETF (SPLV) tracks the S&P with less variability in returns is corroborated by this simple analysis.  The graph of daily close prices and trading volume below points the same way: the S&P500 Index (Yellow) fluctuates around the steady-ish path followed by SPLV (Blue).  The ETF misses out on some gains during the summer months, but outperforms later in the year.

[Figure: daily close prices and trading volume, S&P500 Index (Yellow) vs. SPLV (Blue)]

Interestingly, the fund achieves its low volatility by being overweight in Healthcare and Financials, not the quintessentially low-risk sectors like Telecom or Utilities.



The Maximum Likelihood Estimate of an unknown parameter


The Maximum Likelihood Estimate is one estimate of a parameter’s value.  It basically answers the question: “Given my data, what’s the most likely value of an unknown parameter?”  To use it, we need to compute the likelihood function by taking a sample of n observations (x1, …, xn) from the same distribution, evaluating the pdf pX(x) at each of the n observations, and taking the product of the results for all n.

Likelihood function:

$$L(\theta) = \prod_{i=1}^{n} p_X(x_i; \theta)$$

Note: The expression pX(x; θ) is just the distribution of the population from which we’re sampling.  The parameter is usually added after a semicolon to emphasize the fact that the distribution is characterized by the unknown parameter.

The likelihood function is just the joint density of the random sample.  Since the sample observations are independent and identically distributed (iid), the joint pdf of all n sample observations is the product of the individual densities.  This is the same principle that lets us multiply P(A), P(B), and P(C) together to find P(A intersect B intersect C) when events A, B, and C are independent.  Suppose we take a sample of size n = 3 from the distribution, and the resulting values are x1, x2, and x3.  What’s the probability associated with the three sample values? That is, what’s the joint density of the three sample values, pX(x1, x2, x3)?
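For the n = 3 case, independence lets the joint density factor into the individual densities. Written out in the same notation (with the parameter θ shown explicitly, which is my choice of notation here):

```latex
p_X(x_1, x_2, x_3; \theta) = p_X(x_1; \theta) \, p_X(x_2; \theta) \, p_X(x_3; \theta) = \prod_{i=1}^{3} p_X(x_i; \theta)
```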


Generalizing this case to all n gives us the result that the joint density of n randomly drawn sample values is the product of the individual densities.  The likelihood function is nothing more than the joint pdf of the sample: a multivariate probability density function of the values taken on by the n random variables in the sample.

The likelihood function is a function of the unknown parameter.  The Maximum Likelihood Estimate for the unknown parameter is the parameter value that maximizes the likelihood function:
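In symbols, writing the estimate with a hat (a common convention, assumed here rather than taken from the original post):

```latex
\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \prod_{i=1}^{n} p_X(x_i; \theta)
```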

We use calculus to find this value, by first taking the derivative of the likelihood function with respect to the unknown parameter, then setting it equal to 0 and solving for the parameter.  Don’t forget to verify conditions so as to make sure you are indeed finding a maximum.


This will usually involve complicated, messy math.  To mitigate this, we sometimes work with the logarithm of the likelihood function and use properties of logs to simplify computations.  This won’t change our answer: the logarithm is a strictly increasing function, so taking the logarithm of the likelihood won’t change the point at which the maximum value is achieved.
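As a worked illustration of the log trick, take the exponential distribution with rate λ. This is my choice of example, not one from the original post. The product turns into a sum, and the derivative is easy:

```latex
L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i}

\ell(\lambda) = \ln L(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i

\frac{d\ell}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}

\frac{d^{2}\ell}{d\lambda^{2}} = -\frac{n}{\lambda^{2}} < 0 \quad \text{(so the critical point is a maximum)}
```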


The value of the parameter that you end up with maximizes the joint density (or, for discrete data, the probability) of your sample values x1, …, xn.  You could say it’s the value “most consistent” with the observed sample – the Maximum Likelihood Estimate.
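To see the idea numerically, here is a small R sketch that maximizes the log-likelihood of simulated exponential data and compares the result with the closed-form answer for that distribution, 1 divided by the sample mean. The exponential example, the seed, the sample size, the search interval and the names are all my assumptions.

```r
set.seed(1)  # assumed seed, for repeatability

x <- rexp(500, rate = 1/15)  # simulated sample; true rate is 1/15

# Log-likelihood of the sample as a function of the unknown rate
loglik <- function(rate) sum(dexp(x, rate = rate, log = TRUE))

# Numerically maximize over an assumed search interval for the rate
fit <- optimize(loglik, interval = c(1e-4, 1), maximum = TRUE, tol = 1e-8)

rate_numeric <- fit$maximum  # numerical MLE
rate_closed  <- 1 / mean(x)  # closed-form MLE for the exponential rate

print(c(numeric = rate_numeric, closed_form = rate_closed, true = 1/15))
```

The two estimates should agree to several decimal places, and both should sit close to (but not exactly at) the true rate of 1/15.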

The Game Theory of Soccer Penalty kicks

Nova workboard

It is reasonable to claim that when a soccer game has penalty kicks, these are among the most important moments of the game, and in that moment everyone tries to guess which side the striker will kick to. Indeed, penalty kicks are a type of game that game theory can solve.

To solve this game, we first need to think about the type of game, namely whether it is a simultaneous game, in which the players effectively make their decisions at the same time, or a sequential game, in which the players take alternate turns to make their choices. In penalty kicks the players move simultaneously, since the goalie cannot wait until the ball comes off the foot of the kicker to decide what to do. So, both players have to choose a side to play before the “game” starts. We will…


Trends in College Majors Chosen by Women

Interesting infographic courtesy of Plotly; I’m really surprised by the trend in computer science.  Everything I have heard about which fields are increasingly important in our economy seemed to suggest the opposite would be the case.  I’d be interested to hear some explanations.

[Infographic: College Trends]

Indiegogo Is Testing Optional Insurance Fees For Crowdfunded Products

I’m cautiously optimistic about how this innovation could help some legitimate projects raise funds! We’ll see soon enough.

Japan’s US$1.5 billion + United States’ US$3 billion to help developing countries leapfrog to renewables


Major news to help developing countries with the capital needed to go directly to wind and solar. By the way, the United States signed a treaty to do this back in the early 1990s, and never contributed a penny up to this point. Treaties have the force of federal law under the Constitution, don’t ya know, unless following it is something that’s optional ….


Mortgage Market Update from Calculated Risk

Calculated Risk is a blog that basically aggregates and analyzes up-to-date financial and economic data as it is released, particularly that which applies to the housing market.  The number of economic and financial metrics available on the internet is useful in some contexts but often feels more like a confusing, frustrating glut of information that makes it hard to answer a pithy question like “What is the rate of foreclosures like in the current housing market relative to pre-crisis times?”  Trying to get beyond this issue is where I’ve found Calculated Risk really useful: relevant data for a particular issue is laid out, cited, and analyzed clearly and in a timely fashion.

I was curious about the housing market after meeting a seemingly overzealous realtor on the train, and here’s what I found via Calculated Risk.


At the end of Q3 2014, the delinquency rate on 1 to 4 unit residential properties was 5.85% of all loans outstanding, down for the 6th consecutive quarter and the lowest rate since the end of 2007.  The delinquency rate does not include loans in foreclosure, though those are also at their lowest rate since the 4th quarter of ’07, at just under 2.5%.  Though foreclosures have come down from the stratospheric levels reached during their peak in 2010, they’re still more common than they were before the crisis.  Mortgages that are 30 and 60 days past due, on the other hand, have returned to approximately pre-crisis levels.


Mortgage Rates 

30-year fixed rate mortgage (FRM) rates are down 1 basis point (0.01 percentage points) from last week at 4.01%, roughly the same level as 2011 but lower than last year’s 4.46%.  Obviously there isn’t “one” mortgage rate.  The rate we’re talking about here is the one that applies to the most creditworthy borrowers in the best possible scenario for receiving a loan from the bank.  Though all other mortgages are based on this rate, it’s not exactly a rate one should expect to be offered by a bank.


The relatively small difference between a mortgage quoted at 4.01% and 4.45% has a surprisingly large financial impact on the 30 year FRM.  A $250,000, 30-yr. FRM at a 4.01% nominal annual rate compounded monthly (as is typically the case) necessitates a monthly payment of $1,194.98, whereas the same mortgage at 4.45% would require a monthly payment of $1,259.30.  With the higher payment, the borrower pays an additional $23,155 in interest over the term of the mortgage.
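The payment figures above follow from the standard level-payment annuity formula. Here is a sketch in R (monthly_payment is a hypothetical helper name; the monthly rate is taken as the nominal annual rate divided by 12, per the compounding assumption stated above):

```r
# Level monthly payment on a fully amortizing fixed-rate mortgage
monthly_payment <- function(principal, annual_rate, years) {
  r <- annual_rate / 12  # periodic (monthly) rate
  n <- years * 12        # total number of payments
  principal * r / (1 - (1 + r)^(-n))
}

pmt_low  <- monthly_payment(250000, 0.0401, 30)
pmt_high <- monthly_payment(250000, 0.0445, 30)

# Extra interest paid over the full term at the higher rate
extra_interest <- (pmt_high - pmt_low) * 360

print(round(c(low = pmt_low, high = pmt_high, extra = extra_interest), 2))
```

Since the principal is the same in both cases, the whole difference in total payments is extra interest.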

Another post talks about subdued refinancing activity, which I’d guess is the result of relatively static mortgage rates as it’s generally only financially viable to refinance when rates have changed significantly.  Banks could also be offering fewer refinancing options after the crisis, a reasonable assumption given their cautious resumption of lending post-crisis and the role that refinancing options played in exacerbating the housing bubble.  I’m purely speculating, though, and I’ll look into this more later.

Residential Prices

A widespread slowdown in the rate of housing price increases has been steadily taking hold since February of this year.  Residential prices aren’t decreasing, they’re just rising at a slower and slower rate each month, and now sit 20% below their 2006 peak.  This is not to say we should expect or even wish for housing prices to return to 2006 levels, which were clearly unsustainable.  Furthermore, though slow relative to preceding months, the (annualized) 6%+ increase experienced last month is still pretty strong and obviously outpaces inflation.


