90% of the T Distribution


William Sealy Gosset was great. He improved beer at Guinness by using the statistics that existed at the time. Not happy with that, he invented new statistics to brew even better beer. The things he invented are used all over the place now, but Guinness wanted to keep him a secret weapon, so they made him publish his results under the fake name Student.

One thing Gosset realised is that it is wrong to compute 90 % confidence intervals for the mean by taking the standard deviation of the sample and assuming a normal distribution, like-a-so:

\[\hat{\mu} \pm 1.645 \hat{\sigma}\]

When we do this we get too narrow a range, because while we recognise \(\hat{\mu}\) is just an approximation, we are assuming we know \(\sigma = \hat{\sigma}\) with certainty!

Gosset came up with correction tables based on the number of samples used in the estimation of the confidence interval, to account for our uncertainty in the estimation of \(\hat{\sigma}\). Here are some useful values, rounded to be easier to memorise:

Number of samples    Correction factor for 90 % interval
4                    1.5×
5                    1.3×
6–8                  1.2×
9–20                 1.1×

To use this table, count how many samples the estimation of the standard deviation is based on, multiply the estimation of the standard deviation \(\hat{\sigma}\) by the correction factor, and then multiply again by 1.645 to get a 90 % interval. If the number of samples is greater than 20, the naïve estimation of the standard deviation is good enough for a 90 % interval.

Thus, if we have 7 samples and these have led us to estimate a mean of 32 minutes with a standard deviation of 8 minutes, we should not think of the 90 % confidence interval as

\[32 \pm 8 \times 1.645\]

but rather as

\[32 \pm 8 \times 1.2 \times 1.645\]

Already with 7 samples, the naïve 90 % confidence interval is fairly close to the actual one, being only a factor of 1.2 too narrow. With fewer samples, the uncertainty in the standard deviation is larger, so we should estimate a correspondingly wider confidence interval. (A stronger confidence interval, like the 95 % or even 99 % interval, will be correspondingly much wider after the Student t correction.)
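The table lookup and the worked example can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and it assumes the table maps 4 samples to 1.5×, 5 samples to 1.3×, 6–8 to 1.2× and 9–20 to 1.1×.

```python
Z_90 = 1.645  # normal-distribution multiplier for a 90 % interval


def correction_factor(n):
    """Look up the rounded correction factor for n samples (assumed table)."""
    if n < 4:
        raise ValueError("the table does not cover fewer than 4 samples")
    if n == 4:
        return 1.5
    if n == 5:
        return 1.3
    if n <= 8:
        return 1.2
    if n <= 20:
        return 1.1
    return 1.0  # more than 20 samples: the naive interval is good enough


def interval_90(mean, sd, n):
    """Return (low, high) of the corrected 90 % interval."""
    half_width = sd * correction_factor(n) * Z_90
    return mean - half_width, mean + half_width


low, high = interval_90(mean=32, sd=8, n=7)
print(f"{low:.1f} to {high:.1f} minutes")
```

For the 7-sample example this gives roughly 16 to 48 minutes, where the naïve calculation would have said roughly 19 to 45.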

This is the table for 90 % intervals because that’s what I need most often. Gosset didn’t actually come up with any specific approximation table; he came up with the entire Student’s t distribution which lets us create any table of correction factors we need.
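To make that concrete, here is a rough sketch of how such correction factors can be derived: divide the t quantile (with one fewer degrees of freedom than the number of samples) by the matching normal quantile. It sticks to the Python standard library, so the t quantile is found by numerically integrating the density and bisecting. The function names, the step count and the search bound of 100 are choices made for this sketch, not anything from Gosset.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """CDF of Student's t at x >= 0, by Simpson's rule on [0, x]."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return norm * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so half the mass lies below zero.
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 <= p < 1 by bisection (assumes the answer is below 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def correction_factor(n, confidence=0.90):
    """Ratio of the t quantile (n - 1 degrees of freedom) to the normal one."""
    p = 0.5 + confidence / 2
    return t_quantile(p, n - 1) / NormalDist().inv_cdf(p)


for n in (4, 5, 7, 10, 20):
    print(n, round(correction_factor(n), 2))
```

The unrounded factors come out at roughly 1.43 for 4 samples, 1.30 for 5 and 1.18 for 7, which is where easy-to-memorise values like 1.5, 1.3 and 1.2 come from. Passing `confidence=0.95` or `0.99` produces the corresponding tables for stronger intervals.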

Variation from just two values

Although the above table is what you need for getting a 90 % confidence interval, we can also use a similar technique to get a sloppy estimation of the standard deviation based on just two samples. The sample standard deviation of two values is given by

\[\frac{\left(\mathrm{high} - \mathrm{low}\right)}{\sqrt{2}}\]

This massively underestimates the actual standard deviation, because it is based on just two values. But one standard deviation corresponds to a t score of 1.846, so we can multiply the above by that, and we get a better approximation of the standard deviation.

If we round the constant factors for convenience, we’ll find that the appropriate estimation of the standard deviation (corrected through the t distribution) is 1.3 times the distance between the two numbers we have. That’s incredibly useful in practice!
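The 1.3 can be checked with a short sketch. It assumes that "one standard deviation" means the probability a normal variable falls below +1σ, and it uses the fact that a t distribution with one degree of freedom (two samples) is the Cauchy distribution, whose quantile function is \(\tan(\pi(p - 1/2))\). The function name is made up for illustration.

```python
import math
from statistics import NormalDist


def two_value_factor():
    """Multiplier turning (high - low) into a t-corrected standard deviation."""
    # Probability of falling below +1 standard deviation under a normal curve.
    p = NormalDist().cdf(1.0)
    # Student's t with one degree of freedom is the Cauchy distribution,
    # whose quantile function has the closed form tan(pi * (p - 1/2)).
    t_score = math.tan(math.pi * (p - 0.5))
    # The sample standard deviation of two values is (high - low) / sqrt(2).
    return t_score / math.sqrt(2)


print(round(two_value_factor(), 2))
```

Computed this way the t score lands at about 1.84, a hair below the 1.846 quoted above, but the combined factor still rounds to 1.3 either way.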

Example of how to use it

I’m sure you’ve been in a situation where someone has asked something like “Is 49 litres a good result?”

You don’t know, of course, so you ask “Compared to what?”

Maybe they respond “Compared to 43 litres!”

That sounds impressive, but you don’t want me to chastise you, so you say, “That still tells me nothing because I don’t know the variation inherent in the process. Give me another typical result!”

They might then say “Uhh, 47 litres.”

Now you let your guard down and think, “Oh, 49 is above both the typical results. Very good!”

And then I chastise you!

So you turn on your brain instead.

You have received two typical numbers: 43 and 47. These don’t tell you much about the inherent variation, but they do tell you a little. The distance between them is four. If we multiply that by 1.3, we get our estimation of the standard deviation, which is something like 5 litres. That means 49 litres is less than one standard deviation away from the midpoint of 45 litres. That’s a normal result, not unusually good or bad.
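The same back-of-the-envelope reasoning as a small Python sketch, if you would rather not do it in your head. The function name is hypothetical, and the 1.3 is the rounded two-value factor from earlier.

```python
def deviations_from_midpoint(result, typical_a, typical_b):
    """How many estimated standard deviations a result lies from the midpoint."""
    if typical_a == typical_b:
        raise ValueError("two identical values say nothing about the variation")
    sd = 1.3 * abs(typical_a - typical_b)  # sloppy two-value estimate
    midpoint = (typical_a + typical_b) / 2
    return (result - midpoint) / sd


print(round(deviations_from_midpoint(49, 43, 47), 2))
```

For the litres example this comes out at about 0.77 standard deviations above the midpoint, so well within ordinary variation.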
