# Confidence Intervals clarified

Confidence intervals are very important in estimation and hypothesis testing, but understanding what they are and how to calculate them can be difficult. Part of the difficulty is in sorting out the difference between population and sample statistics. A very straightforward summary is provided by some course notes from a statistics course at Yale University in the USA. The original notes may be found at

http://www.stat.yale.edu/Courses/1997-98/101/confint.htm

A summary of the key points follows:

This discussion of confidence intervals involves two different means and three different standard deviations. The means are that of the population, [latex]\mu=\sum x/N[/latex], and that of the sample, [latex]\overline{x}[/latex] or [latex]M=\sum x/N[/latex]. The definitions are the same for both, except that [latex]N[/latex] is the population size in the first instance and the sample size in the second. The standard deviations have different definitions, and one has a different meaning.

The first two are the standard deviation of the population

[latex]\sigma=\sqrt{\dfrac{\sum\left(x-\mu\right)^{2}}{N}}[/latex]

and the standard deviation of the sample

[latex]s=\sqrt{\dfrac{\sum\left(x-M\right)^{2}}{N-1}}[/latex].
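As a minimal sketch of the two definitions above, the Python snippet below computes both standard deviations by hand and checks them against the standard library's `statistics.pstdev` (divide by [latex]N[/latex]) and `statistics.stdev` (divide by [latex]N-1[/latex]). The data values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical data, invented purely for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n

# Treating the data as the whole population: divide by N.
sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Treating the data as a sample: divide by N - 1.
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# The standard library implements the same two definitions.
assert math.isclose(sigma, statistics.pstdev(data))
assert math.isclose(s, statistics.stdev(data))

print(f"mean = {mean}, sigma = {sigma:.4f}, s = {s:.4f}")
```

For these values the sum of squared deviations is 32, so the population formula gives exactly 2 while the sample formula gives about 2.138. The [latex]N-1[/latex] divisor always makes [latex]s[/latex] slightly larger, and the gap narrows as [latex]N[/latex] grows.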

The sample standard deviation [latex]s[/latex] is a direct estimate of [latex]\sigma[/latex]. The last one is the standard deviation of the sample mean, [latex]\sigma_{M}=\sigma/\sqrt{N}[/latex], also known as the standard error of the mean. It measures how much sample means vary around the population mean because each is computed from a sample and not the whole population. Clearly [latex]\sigma_{M}[/latex] shrinks toward zero as the sample size [latex]N[/latex] grows, since a larger sample represents the population more closely.

Given that [latex]s[/latex] is an estimate of [latex]\sigma[/latex], [latex]\sigma_{M}\approx s/\sqrt{N}[/latex]. That is why the 95% confidence interval is [latex]M\pm1.96\sigma/\sqrt{N}[/latex], or equivalently [latex]M\pm1.96\sigma_{M}[/latex], if [latex]\sigma[/latex] is known, and [latex]M\pm t_{c}s/\sqrt{N}[/latex], or equivalently [latex]M\pm t_{c}\sigma_{M}[/latex] with [latex]\sigma_{M}[/latex] estimated by [latex]s/\sqrt{N}[/latex], if [latex]\sigma[/latex] is unknown and being estimated by [latex]s[/latex]. Here [latex]t_{c}[/latex] is the critical value from the [latex]t[/latex] tables for [latex]N-1[/latex] degrees of freedom.
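The two interval formulas can be sketched in Python as follows. This is an illustration under stated assumptions, not a general-purpose routine: the sample values and the "known" [latex]\sigma[/latex] of 0.2 are hypothetical, and the critical value 2.262 is the standard [latex]t[/latex]-table entry for a 95% interval with 9 degrees of freedom, hard-coded because the sample has [latex]N=10[/latex].

```python
import math
import statistics

# Hypothetical sample of N = 10 measurements, invented for illustration.
sample = [4.9, 5.1, 5.0, 4.8, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(sample)
m = statistics.mean(sample)    # sample mean M
s = statistics.stdev(sample)   # sample standard deviation (N - 1 divisor)

# Case 1: sigma is known (an assumed value, not derived from the sample).
sigma = 0.2
z_c = statistics.NormalDist().inv_cdf(0.975)   # about 1.96 for 95%
half_z = z_c * sigma / math.sqrt(n)
ci_known = (m - half_z, m + half_z)

# Case 2: sigma is unknown and estimated by s.
# t_c for 95% with N - 1 = 9 degrees of freedom, read from t tables.
t_c = 2.262
half_t = t_c * s / math.sqrt(n)
ci_unknown = (m - half_t, m + half_t)

print(f"M = {m:.3f}, s = {s:.4f}")
print(f"sigma known:   {ci_known[0]:.3f} to {ci_known[1]:.3f}")
print(f"sigma unknown: {ci_unknown[0]:.3f} to {ci_unknown[1]:.3f}")
```

With these numbers the [latex]t[/latex]-based interval is wider than the [latex]z[/latex]-based one, even though [latex]s[/latex] is smaller than the assumed [latex]\sigma[/latex]. That reflects the extra uncertainty from estimating [latex]\sigma[/latex] with a small sample. For a different sample size, [latex]t_{c}[/latex] must be looked up again for the new degrees of freedom.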
