A great taste in intervals

While the confidence interval wins the interval popularity sweepstakes, I've
seen it frequently misused, especially on blogs, sometimes in webapps, and even
occasionally in conference papers, in situations where it's not telling you the
right thing. A lot of the introductions to statistical intervals I can find are
a bit daunting and highly technical, so hopefully this brief, minimally
technical explanation will point someone toward the right keyword, convincing
them (you?) that they really might like a tolerance interval instead.
## Confidence intervals

## Prediction intervals

## Tolerance intervals

## Further reading

Confidence intervals are what you see quoted with things like political polls: with 95% confidence, 57±4% of voters approve of Obama's job performance. The important thing about confidence intervals is that they only measure sampling error. That ±4% is only there because the polling firm called a random subset of Americans. If they managed to contact every single American, the error would be ±0%, because they would have an exact count of how many Americans responded each way to the survey question. So, an important feature: as sample size approaches population size (or infinity, for an infinite population), the confidence interval's width approaches zero, because you no longer have sampling error.
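
To make the shrinking concrete, here is a minimal sketch using the textbook normal-approximation margin of error for a proportion, with the standard finite population correction. The population size and sample sizes are made up for illustration (the poll's actual sample size isn't given above), and `poll_margin` is just a hypothetical helper name.

```python
import math
from statistics import NormalDist

def poll_margin(p_hat, n, population=None, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        # Finite population correction: the factor is 0 when n == population.
        margin *= math.sqrt((population - n) / (population - 1))
    return margin

POPULATION = 1_000_000  # hypothetical electorate size, for illustration only

for n in (600, 60_000, 600_000, 1_000_000):
    margin = poll_margin(0.57, n, POPULATION)
    print(f"n = {n:>9,}: 57% ± {100 * margin:.2f}%")
```

Under these assumptions, a margin of roughly ±4% corresponds to a sample of about 600 people, and the margin falls to exactly zero once everyone in the population has been asked.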

But consider this example I recently ran across (slightly anonymized). Say
you've collected some data on traffic times. Should you report it as an average
with a confidence interval? You could, but you have to be careful about what you
think that means. If we say that a trip between some pair of points takes
44±9 minutes, where the ±9 is a 95% confidence interval, that means
something very specific: we have 95% confidence that the *average* travel
time is between 35 and 53 minutes. What this does *not* tell you, but
which was implied by this particular presentation, is that there will be a 95%
chance that you, driving that stretch tomorrow, will take between 35 and 53
minutes. That's because your travel time uncertainty isn't only due to sampling
error in the data used to make the prediction, but also to real variability,
since travel times vary trip to trip. If we sampled a ton of travel-time
data, the confidence interval would eventually collapse to near-zero, because
we would have a nearly exact estimate of the average travel time. But we still
wouldn't be able to exactly predict how long your specific trip tomorrow will
take, because trip times vary.
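
A quick simulation shows the gap between the two readings. This is only a sketch: it assumes trip times are normally distributed with a mean of 44 minutes and a standard deviation of 10 minutes (numbers invented for the example), and it uses the large-sample normal approximation for the confidence interval, which is a little too narrow for the smallest sample.

```python
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    center = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
    return center - half_width, center + half_width

def fraction_inside(values, low, high):
    """Share of individual values that land inside [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_trips(n):
    # Assumed population: trip times ~ Normal(mean 44 min, SD 10 min).
    return [rng.gauss(44, 10) for _ in range(n)]

future_trips = draw_trips(20_000)  # stand-in for "your trip tomorrow"

results = {}
for n in (20, 200, 20_000):
    low, high = mean_confidence_interval(draw_trips(n))
    results[n] = (high - low, fraction_inside(future_trips, low, high))
    print(f"n = {n:>6}: CI for the mean = [{low:.1f}, {high:.1f}] min, "
          f"contains {results[n][1]:.0%} of individual trips")
```

As the sample grows, the interval tightens around the average, and the share of individual trips it contains drops toward zero rather than toward 95%.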

The next most common interval is probably the prediction interval. This is
closer to what we want, and sure *sounds* like it's the right thing, but
it quite possibly isn't what you'll want either, and many uses of it are not
quite right.

A prediction interval has the following interpretation. A 95% prediction
interval is one where, if you sample some data, construct an interval from that
data, and then sample one new data point, there is a 95% chance that the
interval will contain that data point. What's crucial here is that this is 95%
of the time you *repeat the whole procedure*. If you have an iterated
process, this will have the expected interpretation. You sample some data
points, use them to predict a new data point; sample some more, use them for
another prediction; repeat. Then, 95% of the prediction intervals will contain
their paired observation. But any *single* prediction interval from that
iterated series may cover more or less than 95% of future samples; they only
cover 95% on average. Thus we only get the 95% prediction rate through this
iterative process, which in effect lets us use the "average" interval, rather
than picking any one interval.
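
The "95% on average, but not for any one interval" behavior is easy to see in simulation. The sketch below uses one particular construction, swapped in because it needs no distributional formulas: for a continuous distribution, the range [min, max] of n samples contains the next sample with probability (n-1)/(n+1), which is exactly 95% when n = 39. The normal population (mean 44, SD 10) is again an invented stand-in, used only so that each interval's true coverage can be computed.

```python
import random
from statistics import NormalDist, fmean

POPULATION = NormalDist(mu=44, sigma=10)  # hypothetical trip-time distribution
N = 39  # (N - 1) / (N + 1) = 0.95, so [min, max] is a 95% prediction interval

def prediction_interval(sample):
    """Distribution-free prediction interval: the sample range."""
    return min(sample), max(sample)

def true_coverage(low, high):
    """Fraction of the (known, simulated) population inside [low, high]."""
    return POPULATION.cdf(high) - POPULATION.cdf(low)

rng = random.Random(1)  # fixed seed so the run is repeatable
coverages = []
for _ in range(5000):
    sample = [rng.gauss(POPULATION.mean, POPULATION.stdev) for _ in range(N)]
    coverages.append(true_coverage(*prediction_interval(sample)))

average_coverage = fmean(coverages)
share_below_90 = sum(c < 0.90 for c in coverages) / len(coverages)
print(f"average coverage over all intervals: {average_coverage:.3f}")
print(f"lowest / highest single-interval coverage: "
      f"{min(coverages):.3f} / {max(coverages):.3f}")
print(f"share of intervals covering under 90%: {share_below_90:.1%}")
```

Averaged over the repetitions, the coverage comes out close to 95%, but individual intervals scatter on both sides of that figure, and a noticeable minority cover less than 90% of the population.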

But what if we really want one interval? In the travel-time example, we want to be able to collect some data, then give a single interval bracketing trip times: you'll take between X1 and X2 minutes. Any one prediction interval, taken on its own, could be quite far off, depending on the sample size and the distribution of data points.

A tolerance interval can be thought of as a prediction interval where we also want to have some confidence that the interval itself is a "good" one, because unlike in the iterated case, we're going to be keeping this one interval and reusing it a lot.

Therefore we now have two inputs: what percentage of the population we want to cover, and how high we want our confidence in the interval itself to be. For example, if we construct a (95%, 50%) tolerance interval, this will tell us, with 95% confidence, that at least half of car trips will fall within the interval. The two numbers can be varied independently to choose the desired coverage and confidence.
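
One simple way to see the two inputs at work is the non-parametric tolerance interval built from the sample range. For n samples from a continuous distribution, the confidence that [min, max] covers at least a fraction p of the population is 1 - n·p^(n-1) + (n-1)·p^n. The sketch below (helper names are mine, and the normal population is a made-up stand-in used only for the sanity check) finds the smallest sample size for a given pair and then checks the claim by simulation.

```python
import random
from statistics import NormalDist

def minmax_confidence(n, coverage):
    """Confidence that [min, max] of n samples covers >= `coverage` of the population."""
    return 1 - n * coverage ** (n - 1) + (n - 1) * coverage ** n

def min_sample_size(coverage, confidence):
    """Smallest n for which [min, max] is a (confidence, coverage) tolerance interval."""
    n = 2
    while minmax_confidence(n, coverage) < confidence:
        n += 1
    return n

n_needed = min_sample_size(coverage=0.50, confidence=0.95)
print(f"samples needed for a (95%, 50%) interval from [min, max]: {n_needed}")

# Sanity check by simulation on a hypothetical Normal(44, 10) population.
population = NormalDist(mu=44, sigma=10)
rng = random.Random(2)  # fixed seed so the run is repeatable
trials = 4000
good = 0
for _ in range(trials):
    sample = [rng.gauss(44, 10) for _ in range(n_needed)]
    covered = population.cdf(max(sample)) - population.cdf(min(sample))
    good += covered >= 0.50
good_fraction = good / trials
print(f"fraction of intervals covering at least half the population: {good_fraction:.3f}")
```

Raising either number pushes the required sample size up: asking for 95% coverage instead of 50%, at the same confidence, takes many more samples before the sample range qualifies.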

This is probably the interval you want to use if you're both: 1) using sampled data to make predictions; and 2) trying to capture the range of probable outcomes, such as the range of travel times a driver could expect.

Unlike a confidence interval, a tolerance interval has the expected behavior as our sample size approaches infinity. Sampling error goes to zero, but instead of the interval's width going to zero (as the confidence interval's does), the tolerance interval approaches the population percentiles. If we had a very large amount of data, so that sampling error was negligible, we could find the middle 50% of car trips by just looking at how long the 25th and 75th percentile trips in our data set took. The tolerance interval extends that natural procedure to cases where we don't have huge samples, so we can't necessarily trust that the 25th percentile of our data set is particularly close to the 25th percentile of the population.
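
This convergence can be checked with a distribution-free construction based on order statistics: use the k-th smallest and k-th largest sample values as endpoints, choosing the largest k that still gives the required confidence. The confidence that such an interval covers at least a fraction p of a continuous population equals P(Binomial(n, p) ≤ n - 2k). The function names below are hypothetical, and this is a sketch of one non-parametric method, not the only way to build a tolerance interval.

```python
import math

def smallest_binomial_quantile(n, p, confidence):
    """Smallest m with P(Binomial(n, p) <= m) >= confidence (log-space sum)."""
    log_p, log_q = math.log(p), math.log(1 - p)
    cumulative = 0.0
    for j in range(n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1) + j * log_p + (n - j) * log_q)
        cumulative += math.exp(log_term)
        if cumulative >= confidence:
            return j
    return n

def symmetric_tolerance_rank(n, coverage=0.5, confidence=0.95):
    """Largest k so that [k-th smallest, k-th largest] of n samples is a
    (confidence, coverage) tolerance interval; None if even [min, max] fails."""
    m = smallest_binomial_quantile(n, coverage, confidence)
    k = (n - m) // 2  # need n - 2k >= m
    return k if k >= 1 else None

rank_fractions = {}
for n in (20, 200, 2000):
    k = symmetric_tolerance_rank(n)
    rank_fractions[n] = k / n
    print(f"n = {n:>4}: use the {k}th smallest and {k}th largest values "
          f"(about the {k / n:.1%} and {1 - k / n:.1%} sample percentiles)")
```

With small samples the endpoints sit well outside the sample's 25th and 75th percentiles, as a hedge against sampling error, and they move inward toward those percentiles as n grows.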

There are a number of ways to actually calculate tolerance intervals, both parametric (e.g. assuming a normal distribution) and non-parametric, none covered here (sorry!). Many statistics packages have the functionality built in; for example, you might use the R package 'tolerance'.

A somewhat more technical introduction to these not-the-confidence-interval
intervals, also lamenting their underuse, but unfortunately not freely
available online, can be found in Stephen B. Vardeman's "What about the other
intervals?" (*The American Statistician*, vol. 46, no. 3,
pp. 193–197, 1992). If you really do want a prediction interval
instead, a nice overview aimed at statistics educators can be found in Scott
Preston's "Teaching
Prediction Intervals" (*Journal of Statistics Education*, vol. 8,
no. 3, 2000). An explanation of how to compute tolerance intervals for a
normal distribution can be found in the NIST *Engineering Statistics Handbook*.

Finally, an old but quite readable and practical explanation is in the 1960
textbook *Statistics
Manual*, which can now be had for pennies. Written by three researchers
at the U.S. Naval Ordnance Test Station, it points out the important difference
between having a 99% confidence interval for where the *average* bomb will
fall, versus having a bound on where 99% of bombs will fall!