Mathematical Statistics Lesson of the Day – Sufficient Statistics

The Chemical Statistician

Suppose that you collected data

$latex \mathbf{X} = X_1, X_2, \ldots, X_n$

in order to estimate a parameter $latex \theta$. Let $latex f_\theta(\mathbf{x})$ be the probability density function (PDF)* for $latex X_1, X_2, \ldots, X_n$.


Let

$latex t = T(\mathbf{X})$

be a statistic based on $latex \mathbf{X}$. Let $latex g_\theta(t)$ be the PDF for $latex T(\mathbf{X})$.

If the conditional PDF

$latex h_\theta(\mathbf{X}) = f_\theta(\mathbf{X}) \div g_\theta[T(\mathbf{X})]$

is independent of $latex \theta$, then $latex T(\mathbf{X})$ is a sufficient statistic for $latex \theta$. In other words,

$latex h_\theta(\mathbf{X}) = h(\mathbf{X})$,

and $latex \theta$ does not appear in $latex h(\mathbf{X})$.

Intuitively, this means that $latex T(\mathbf{X})$ contains all of the information in the sample about $latex \theta$: once you condition $latex f_\theta(\mathbf{x})$ on $latex T(\mathbf{X})$, what remains no longer depends on $latex \theta$, so knowing $latex T(\mathbf{X})$ is sufficient for estimating $latex \theta$.
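A small worked check may make the definition concrete. The example below is not from the original post: it assumes i.i.d. Bernoulli($latex \theta$) observations, where the "PDF" is a probability mass function, and takes $latex T(\mathbf{X}) = X_1 + \cdots + X_n$, whose distribution is Binomial($latex n, \theta$). Using exact fractions, it computes the ratio $latex f_\theta \div g_\theta$ for every possible sample and several values of $latex \theta$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def joint_pmf(x, theta):
    """f_theta(x): joint pmf of i.i.d. Bernoulli(theta) observations x."""
    t = sum(x)
    return theta**t * (1 - theta)**(len(x) - t)

def sum_pmf(t, n, theta):
    """g_theta(t): pmf of T = X_1 + ... + X_n, which is Binomial(n, theta)."""
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

def conditional_pmf(x, theta):
    """h_theta(x) = f_theta(x) / g_theta(T(x)) with T(x) = sum(x)."""
    return joint_pmf(x, theta) / sum_pmf(sum(x), len(x), theta)

n = 4
for theta in (Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)):
    for x in product((0, 1), repeat=n):
        # theta cancels: the ratio is 1 / C(n, t) for every theta.
        assert conditional_pmf(x, theta) == Fraction(1, comb(n, sum(x)))

print(conditional_pmf((1, 0, 1, 0), Fraction(1, 5)))  # 1/6
```

Because $latex \theta$ cancels out of the ratio, the sample sum is sufficient for $latex \theta$ in this Bernoulli model: given the number of successes, every arrangement of them is equally likely.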

Often, $latex T(\mathbf{X})$ is a summary statistic of $latex X_1, X_2, \ldots, X_n$, such as their

  • sample mean
  • sample median
  • sample minimum
  • sample maximum

If such a summary…


Level Payment vs. Sinking Fund Loans

Below is a document explaining how to derive formulas for the most basic level payment and sinking fund loans. It's a simple introduction; I’m currently working on a more detailed analysis of the benefits and drawbacks of various types of loans (including installment, variable rate, etc.) using empirical data and considering various scenarios, like the option to refinance and varying interest rates. I used the results from my post on annuity formulas to simplify the derivation, so if you’re confused about how I got from one step to the next, check there!

Level Payment and Sinking Fund Loans
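The derivation document itself isn't reproduced here. As a minimal sketch, assuming the standard textbook setup (an effective rate i per period, payments at the end of each period, and a sinking fund that earns its own rate j), the two loan structures can be compared numerically. The function names and the 100,000 / 30-year / 5% example are hypothetical, not taken from the document.

```python
def annuity_immediate_pv(n, i):
    """a_n: present value of n end-of-period payments of 1 at effective rate i."""
    v = 1 / (1 + i)  # one-period discount factor
    return (1 - v**n) / i

def annuity_immediate_fv(n, i):
    """s_n: value at time n of n end-of-period payments of 1 at effective rate i."""
    return ((1 + i)**n - 1) / i

def level_payment(loan, n, i):
    """Level payment P with P * a_n = loan; each payment covers interest plus principal."""
    return loan / annuity_immediate_pv(n, i)

def sinking_fund_outlay(loan, n, i, j):
    """Borrower's total cost per period under a sinking fund loan: interest i * loan
    paid to the lender, plus a level deposit into a fund earning rate j, sized so
    the fund accumulates to `loan` and repays the principal in one lump sum at time n."""
    deposit = loan / annuity_immediate_fv(n, j)
    return i * loan + deposit

# Hypothetical example: 100,000 borrowed over 30 years at 5% effective annual interest.
level = level_payment(100_000, 30, 0.05)
same_rate = sinking_fund_outlay(100_000, 30, 0.05, 0.05)   # fund earns the loan rate
lower_rate = sinking_fund_outlay(100_000, 30, 0.05, 0.03)  # fund earns only 3%
print(round(level, 2), round(same_rate, 2), round(lower_rate, 2))
```

When j equals i the two outlays match, because 1/a_n = i + 1/s_n; if the fund earns less than the loan rate, the sinking fund loan costs more each period.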

Analysis of the U.S. Output Gap by EconBrowser

I mentioned in my previous post that low inflation means substantial output gaps persist in many advanced economies. Econbrowser’s post analyzing the U.S. output gap is worth a read; it also addresses the downside risks arising from the composition of recent economic growth, as well as unjustified inflation concerns.

how to forecast an election using simulation: a case study for teaching operations research

Great post about simulating election results – I might try to adapt this to investigate how unlikely yesterday’s GOP sweep was when I get the chance.  Republicans now control the Senate after winning key races in Georgia, Colorado, Iowa, and a particularly tight race in North Carolina.

Punk Rock Operations Research

After extensively blogging about the 2012 Presidential election and analytical models used to forecast the election (go here for links to some of these old posts), I decided to create a case study on Presidential election forecasting using polling data. This blog post is about this case study. I originally developed the case study for an undergraduate course on math modeling that used Palisade Decision Tools like @RISK. I retooled the spreadsheet for my undergraduate course in simulation in Spring 2014 to not rely on @RISK. All materials are available in the Files tab.

The basic idea is that there are a number of mathematical models for predicting who will win the Presidential Election. The most accurate (and the most popular) use simulation to forecast the state-level outcomes based on state polls. The most sophisticated models like Nate Silver’s 538 model incorporate things such as poll biases, economic data, and momentum. I wanted to incorporate poll biases.
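The case study's spreadsheet isn't included in this excerpt. As a rough sketch of the general idea (simulate state-level outcomes from polls, with a poll bias that shifts every state together), here is a small Monte Carlo model. The states, electoral votes, margins, standard errors, and the 237 "safe" electoral votes are all hypothetical numbers chosen for illustration, and the single shared normal bias term is one assumed way to model poll bias, not necessarily the one used in the case study.

```python
import random

# Hypothetical inputs (NOT from the case study): state -> (electoral votes,
# candidate A's polled margin in points, poll standard error in points).
STATES = {
    "State 1": (29, 1.5, 3.0),
    "State 2": (18, -0.5, 3.0),
    "State 3": (10, 4.0, 3.5),
    "State 4": (6, -3.0, 4.0),
}
SAFE_FOR_A = 237   # hypothetical electoral votes treated as certain for A
NEEDED_TO_WIN = 270

def simulate_once(rng, bias_sd=2.0):
    """Simulate one election and return A's electoral votes."""
    # One shared polling bias shifts every state in the same direction,
    # so state outcomes are correlated rather than independent.
    bias = rng.gauss(0.0, bias_sd)
    votes_for_a = SAFE_FOR_A
    for electoral_votes, margin, sd in STATES.values():
        if margin + bias + rng.gauss(0.0, sd) > 0:
            votes_for_a += electoral_votes
    return votes_for_a

def win_probability(trials=20_000, seed=1, bias_sd=2.0):
    """Fraction of simulated elections in which A reaches 270 electoral votes."""
    rng = random.Random(seed)
    wins = sum(simulate_once(rng, bias_sd) >= NEEDED_TO_WIN for _ in range(trials))
    return wins / trials

print(win_probability())
```

Setting `bias_sd=0.0` makes the state errors independent; comparing the two runs shows how much a shared bias term spreads out the outcomes.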


Recent Developments in the World Economy

The first part of the IMF’s World Economic Outlook (WEO), which gives a broad overview of what’s happened since the previous WEO was released in April, is (very) briefly summarized in layman’s terms below. A technical note: any mention of rates of growth (positive or negative) refers to the annualized rate of growth of output, or GDP, in an economy (GDP isn’t the only measure of output, but it is the one used here). You can think of output, or GDP, as a measure of aggregate economic activity. We care about growth in GDP because it leads to more employment (to meet the needs of expanding economic activity) and, generally speaking, a higher standard of living. You can read a more thorough discussion of GDP growth here.
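To show what "annualized" means in practice, here is a minimal sketch that turns a one-quarter change in output into an annual rate by compounding. It assumes the compounding convention used for U.S. quarterly GDP figures (other statistical agencies often report year-over-year changes instead), and the GDP levels are hypothetical.

```python
def annualized_growth(gdp_prev, gdp_curr, periods_per_year=4):
    """Annualize a one-period (e.g. quarter-over-quarter) change in real GDP by compounding."""
    period_growth = gdp_curr / gdp_prev - 1
    return (1 + period_growth) ** periods_per_year - 1

# Hypothetical levels: output rises 1% in one quarter ...
print(annualized_growth(100.0, 101.0))  # about 0.0406, i.e. roughly 4.1% annualized
# ... or falls 0.5% in one quarter.
print(annualized_growth(100.0, 99.5))   # negative: about -2.0% annualized
```

This is why a seemingly small quarterly contraction can show up as a headline annualized decline several times larger.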

Global growth in the first half of 2014 was lower than the April WEO’s projection by 0.4%. That was the general trend, but the story varies by country:


  • Brazil – Negative growth so far this year (two consecutive quarters of contraction, which technically qualifies as a recession), due primarily to a lack of investment and confidence
  • France – No growth in output, reflecting fiscal imbalances and declining competitiveness
  • Italy – A small contraction in output in both Q1 and Q2, plus high unemployment (youth unemployment is at its historical peak), with problems stemming from tight financial conditions (basically no credit available and thus no investment either)
  • Russia – Lack of growth is, not surprisingly, a result of insufficient investment and confidence


  • China – Relatively strong growth in Q1 despite problems in the credit and housing markets, which Chinese officials successfully subdued (by lowering reserve requirements and easing credit for small and mid-size firms), setting up higher growth in the subsequent quarter
  • India – Stronger growth is resuming after a protracted downturn thanks primarily to much-needed investment
  • United Kingdom – Relatively strong growth (‘strong’ in comparison to what was expected in recent years, but considerably less than growth in China and India in raw numbers), and a strengthening labor market due to increased business investment

Investment is, unsurprisingly, prevalent in healthy economies and positively related to confidence. If you’re surprised investors are wary of putting money into Russian markets, then you must have been living under a rock while Russia invaded Ukraine, and if you’re surprised about Brazil, maybe you didn’t know that it’s run by a feckless imbecile who just (barely) survived reelection. Just as a lack of investment and confidence hampers growth, India proves that investor-friendly reforms spur investment, and the U.K. has recovered almost completely from the crisis thanks to business investment.

Those were the extremes – the rest of the world falls somewhere in the middle. The United States economy is strengthening, but expected growth has necessarily been revised downward to adjust for the surprising contraction in the first quarter, which largely reflected temporary factors (harsh weather, inventory accumulation in Q4 ’13, a decline in exports) that won’t affect the future much. In Japan, growth continues along a weak yet stable path, as the country’s enormous public debt inhibits its ability to grow much despite good signs elsewhere in the economy. Output nearly stalled in the euro area as (mostly periphery) countries struggle to emerge from the recession, while some are achieving modest growth (mainly Spain and Germany).

Inflation is below targets in advanced economies, which means they’re operating below their potential; meanwhile, inflation in emerging markets hasn’t changed. Monetary policy is easy (accommodative) in advanced economies and will remain so: the ECB is slated to implement new policies, including targeted credit easing, and the Fed has made clear that it will aim to keep rates low for some time despite having wrapped up its asset purchase program last month. In response to the Fed’s plans, financial conditions have eased and long-term interest rates have decreased a bit compared to the data in the April WEO. Risk premiums and volatility are both low in advanced economies, which has some worried that risk is underpriced; more on risk and its implications in a separate post.

So the global rate of growth, inflation, or any other aggregate metric doesn’t convey much useful information, because conditions are anything but uniform across countries. The story of the recovery is and will remain fragmented, with different problems and strengths shaping each market’s recovery. That being said, all economies can expect to adjust to a level of growth that pales in comparison to the growth of the early 2000s. Potential output, which has been revised downward for the past 3 years, is too low for the growth rates of old to materialize. This is due to the legacy of the recession in advanced economies, but growth-limiting structural issues also play a role in developing economies. For more on that, directly from the IMF, watch the short video linked below.

Deriving the Present Value and Future Value of an Annuity Immediate

Below is the derivation of the present and future value of a unit annuity immediate, i.e., a series of $1 cash flows that occur at equal intervals of time at the end of each period. I originally wrote this document as a review for myself in preparation for actuarial exam FM/2. The majority of questions on the exam, despite the wide array of topics covered, come down to solving for the value of some annuity. Granted, it likely won’t be a case as simple as the one below, but many problems about loans, bonds, yield rates, and even financial derivatives boil down to an annuity problem.
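The derivation document itself isn't reproduced on this page. For reference, here is a sketch of the standard geometric-series argument, assuming an effective interest rate i per period and discount factor v = 1/(1+i); the author's own steps and notation may differ.

```latex
% Unit annuity immediate: payments of 1 at times 1, 2, ..., n; v = 1/(1+i).
a_{\overline{n|}} = v + v^2 + \cdots + v^n
  = v \cdot \frac{1 - v^n}{1 - v}
  = \frac{1 - v^n}{i},
  \qquad \text{since } \frac{v}{1 - v} = \frac{1/(1+i)}{i/(1+i)} = \frac{1}{i}.

% Accumulated value at time n: each payment earns interest until time n.
s_{\overline{n|}} = (1+i)^{n-1} + (1+i)^{n-2} + \cdots + 1
  = \frac{(1+i)^n - 1}{(1+i) - 1}
  = \frac{(1+i)^n - 1}{i}
  = (1+i)^n \, a_{\overline{n|}}.
```

The last equality is a useful consistency check: the future value is just the present value carried forward n periods.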


Takeaways from the BLS jobs report

Ben Casselman at FiveThirtyEight provides a detailed breakdown of the BLS jobs report. 248,000 jobs were added in September, and figures for July and August were revised upward by almost 70,000. These data are the talking points you’ll hear on the news, but they’re deficient measures of labor market health on their own. Casselman delves into the BLS report to corroborate his stance that the report was, in fact, good news – something raw numbers of jobs added can’t do. (Side note: why is the font on BLS reports so awful? The color sucks too – it’s like a “my printer is almost out of ink” light grey.) Anyway, the good:

1) The number of people who gave up on looking for work because they didn’t think any was available is down considerably – less than 700,000 in September, compared to over a million back in 2010.

2) Layoffs are at a 10 year low.

3) The (slight) majority of the unemployed either voluntarily quit their job or (re)started the job search.

(1) and (3) show some confidence in the labor market. Fewer people think that a desirable job is totally unattainable given current labor market conditions, and more people are willing to voluntarily quit their jobs because they think better opportunities are out there.  These are good signs.  There are bad signs, too:

1) Many of the jobs added were in Retail, which tends to be low-paying. More desirable sectors added relatively few jobs.

2) There is still no wage growth.

3) Lots of people are working part time only because they can’t find full-time employment.

(1) is perhaps to be expected, and stems from an issue that has been brewing in the U.S. economy for a while: structural unemployment. The U.S. economy needs more people with the right skills in the right geographical areas before it can add a decent number of jobs in higher-paying sectors. (Many economists have echoed this train of thought, suggesting that structural unemployment is the driving force behind persistently high unemployment post-recession. One way to investigate this is to analyze the Beveridge Curve.)

(3) shows us that while employment has accelerated, many of those working are underemployed. (Part-time workers generally don’t receive benefits; recent legislation, which you can read about here, is starting to change this, however.) As the linked article explains, part of the increase in part-time employment could reflect better incentives for part-time work, not underemployment. Nevertheless, while incentives could have driven the work decision of a portion of part-time workers, many indicated that the only reason they are working part-time is that full-time employment is unavailable, corroborating the underemployment suggestion.

(2) is an issue I wrote about in a previous post and, I’d argue, the most important of the three. There will not be sustainable growth until wages grow, and the less than 2% wage growth of the past year simply won’t cut it. Furthermore, the lack of wage growth implies that there’s still plenty of slack in the labor market.

As a technical aside, below is what I mean by real wage growth, i.e. the wage growth that needs to occur before consumption can rebound and support a robust economy.  When we say real anything in economics, we mean inflation adjusted.  The real rate of wage growth is thus the inflation adjusted rate of growth of wages.  The raw, or nominal, rate of wage growth simply tells us by how much wages increased, ignoring the price level.  This is not all that useful, because wages affect consumption via the purchasing power of consumers – and if we don’t know what the inflation situation is like, we don’t know if consumers’ purchasing power increased, stayed the same, or decreased.

You could easily look up real wage growth (i.e. inflation adjusted wage growth), but for the sake of completeness here is how you can calculate the real growth in wages given the nominal rate of wage growth and a measure of inflation:

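$latex \mbox{real wage growth} = \frac{1 + \mbox{nominal wage growth}}{1 + \mbox{inflation}} - 1 \approx \mbox{nominal wage growth} - \mbox{inflation}$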

For the nominal rate of wage growth, you could use the % change in Average Hourly Earnings (published by the BLS and available through the St. Louis Fed’s FRED database), and for inflation you could use the CPI % change over the same period. These aren’t, however, the only metrics that will work; there are plenty of ways to quantify wages and inflation, each suited to a slightly different scenario.
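The calculation described above is easy to script. This is a minimal sketch: both rates are decimals measured over the same period (for example, the year-over-year % change in Average Hourly Earnings and in the CPI), and the 2.0% and 1.7% inputs are hypothetical, not actual BLS figures.

```python
def real_growth(nominal_rate, inflation_rate):
    """Exact real growth rate: deflate nominal growth by inflation over the same period."""
    return (1 + nominal_rate) / (1 + inflation_rate) - 1

def real_growth_approx(nominal_rate, inflation_rate):
    """First-order approximation, accurate when both rates are small."""
    return nominal_rate - inflation_rate

# Hypothetical inputs: wages up 2.0% over the year, CPI up 1.7% over the same year.
print(round(real_growth(0.020, 0.017), 5))         # 0.00295
print(round(real_growth_approx(0.020, 0.017), 5))  # 0.003
```

At rates this small the subtraction shortcut is close enough for a quick read; the exact formula matters more when inflation is high.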

The Poisson Distribution

A Blog on Probability and Statistics

Let $latex \alpha$ be a positive constant. Consider the following probability distribution:

$latex \displaystyle P(X=j)=\frac{e^{-\alpha} \alpha^j}{j!}, \qquad j=0,1,2,\cdots \qquad (1)$

The above distribution is said to be a Poisson distribution with parameter $latex \alpha$. The Poisson distribution is usually used to model the random number of events occurring in a fixed time interval. As will be shown below, $latex E(X)=\alpha$. Thus the parameter $latex \alpha$ is the rate of occurrence of the random events; it indicates on average how many events occur per unit of time. Examples of random events that may be modeled by the Poisson distribution include the number of alpha particles emitted by a radioactive substance counted in a prescribed area during a fixed period of time, the number of auto accidents in a fixed period of time or the number of losses arising from a group of insureds during a policy period.

Each of the above examples can…
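A quick numerical check of formula (1) and of the claim $latex E(X)=\alpha$: the sketch below evaluates the pmf directly and sums the series for the mean. The rate of 3.5 is an arbitrary illustrative value, and the infinite sums are truncated at 100 terms (an assumption that is ample for moderate $latex \alpha$ and keeps `math.factorial` results small enough to divide as floats).

```python
import math

def poisson_pmf(j, alpha):
    """P(X = j) = e^(-alpha) * alpha^j / j!  for j = 0, 1, 2, ..."""
    return math.exp(-alpha) * alpha**j / math.factorial(j)

def truncated_mean(alpha, terms=100):
    """Approximate E(X) = sum over j of j * P(X = j), keeping the first `terms` terms."""
    return sum(j * poisson_pmf(j, alpha) for j in range(terms))

alpha = 3.5  # illustrative rate, e.g. 3.5 events per unit of time
total_probability = sum(poisson_pmf(j, alpha) for j in range(100))
print(round(total_probability, 10), round(truncated_mean(alpha), 10))
```

Both printed values should match 1 and $latex \alpha$ to many decimal places, since the tail beyond 100 terms is negligible for a rate this small.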


A Post on Measuring Historical Volatility

I’ve reblogged a concise yet thorough explanation of how market volatility is calculated. The post makes very clear how input parameters (weighting, time frame, etc.) affect the validity of the volatility estimate as a forecast of future market movements (link). The phrase “Fat Tails” is often thrown around like a meaningless buzzword in financial media (Squawk Box, for example), but the concept is explained intuitively here. In a separate post, market data from the S&P 500 is used to demonstrate the decay factor’s effect on log returns (link).



Say we are trying to estimate risk on a stock or a portfolio of stocks. For the purpose of this discussion, let’s say we’d like to know how far up or down we might expect to see a price move in one day.

First we need to decide how to measure the upness or downness of the prices as they vary from day to day. In other words we need to define a return. For most people this would naturally be defined as a percentage return, which is given by the formula:

$latex (p_t - p_{t-1})/p_{t-1},$

where $latex p_t$ refers to the price on day $latex t$. However, there are good reasons to define a return slightly differently, namely as a log return:

$latex \log(p_t/p_{t-1})$

If you know your power series expansions, you will quickly realize there is not much difference between these two definitions for small returns; it’s only…
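The rest of the reblogged post is cut off here, so the following is only a sketch of the ideas it introduces, not its actual method. It computes both return definitions for a few hypothetical daily closing prices and adds one common exponentially weighted volatility estimate with a decay factor. The zero-mean assumption and the default decay of 0.94 (a value popularized by RiskMetrics for daily data) are assumptions; the post's own weighting may differ.

```python
import math

def pct_returns(prices):
    """Percentage returns (p_t - p_{t-1}) / p_{t-1} for consecutive prices."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Log returns log(p_t / p_{t-1}) for consecutive prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def ewma_volatility(returns, decay=0.94):
    """Exponentially weighted volatility of `returns` (oldest first), assuming a
    zero mean return. The newest return gets weight 1, the one before it `decay`,
    then decay**2, and so on; weights are normalized to sum to 1."""
    newest_first = returns[::-1]
    weights = [decay**k for k in range(len(newest_first))]
    variance = sum(w * r * r for w, r in zip(weights, newest_first)) / sum(weights)
    return math.sqrt(variance)

# Hypothetical daily closing prices.
prices = [100.0, 101.0, 99.5, 100.2, 102.0]
for pct, log_r in zip(pct_returns(prices), log_returns(prices)):
    # For small moves the two definitions differ by roughly r**2 / 2.
    print(f"{pct:+.5f}  {log_r:+.5f}")
print(ewma_volatility(log_returns(prices)))
```

One property that is easy to verify with this code: log returns add across days, so their sum equals the log of the last price over the first. A smaller decay factor makes the estimate react faster to recent moves.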


Guest post: New Federal Banking Regulations Undermine Obama Infrastructure Stance


This is a guest post by Marc Joffe, a former Senior Director at Moody’s Analytics, who founded Public Sector Credit Solutions in 2011 to educate the public about the risk – or lack of risk – in government securities. Marc published an open source government bond rating tool in 2012 and launched a transparent credit scoring platform for California cities in 2013. Currently, Marc blogs for Bitvore, a company which sifts the internet to provide market intelligence to municipal bond investors.

Obama administration officials frequently talk about the need to improve the nation’s infrastructure. Yet new regulations published by the Federal Reserve, FDIC and OCC run counter to this policy by limiting the market for municipal bonds.

On Wednesday, bank regulators published a new rule requiring large banks to hold a minimum level of high-quality liquid assets (HQLAs). This requirement is intended to protect banks during a financial crisis, and thus reduce the risk…

