C. Patrick Doncaster
Whenever you collect a sample
of measurements, you will want to summarise its defining characteristics. If
the data are approximately normally distributed around some central tendency,
and many types of biological data are, then three parametric statistics can
provide much of the essential information. The sample mean, $\bar{Y}$, tells you the average measurement from your sample;
the standard deviation (SD) tells you how much variation there is
in the data around the sample mean; the standard error (SE) indicates
the uncertainty associated with viewing the sample mean as an estimate of the
mean of the whole population, $\mu$.
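As a minimal sketch of these three statistics, the following Python snippet computes the mean, SD and SE of a sample using only the standard library. The function name `summarise` and the measurements are hypothetical, chosen for illustration; they are not data from this text.

```python
import math
import statistics


def summarise(sample):
    """Return (mean, SD, SE) for a sequence of measurements."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # statistics.stdev divides the sum of squares by n - 1 (sample SD).
    sd = statistics.stdev(sample)
    # The standard error shrinks with the square root of the sample size.
    se = sd / math.sqrt(n)
    return mean, sd, se


# Hypothetical measurements, for illustration only (not data from the text).
weights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd, se = summarise(weights)
print(f"mean = {mean:.3f}, SD = {sd:.3f}, SE = {se:.3f}")
```

The SD describes the spread of individual measurements, whereas the SE describes the precision of the mean, so the SE is always the smaller of the two for any sample of more than one observation.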
Parameter |
Description |
Example |
||
1. |
Variable |
A property that varies in a measurable way between subjects in a sample. |
Weight of seeds of the Princess Bean Phaseolus vulgaris (in: Samuels, M.L. 1991. Statistics for the Life Sciences. Macmillan). |
|
2. |
Sample |
A collection of individual observations selected by a specified procedure. In most cases the sample size is given by the number of subjects (i.e. each is measured once only). |
A sample of 25 Princess Bean seeds, selected at random from the total production of an arable field. |
WEIGHT (mg) |
3. |
Sample mean |
The sum of all observations in the sample, divided by the
size of the sample, N: $\bar{Y} = \sum_{i=1}^{N} Y_i / N$. The sample
mean is an estimate of the population mean, $\mu$. |
The sample mean $\bar{Y}$ is the sum of the 25 seed weights divided by 25. The sample comes from a population, the total production of the
field, which follows a normal distribution and has a mean $\mu$. |
|
4. |
Sum of squares, SS |
The squared distance between each data point (Yi) and the sample mean, summed for all N data points. |
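The sample sum of squares $SS = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$. With N = 25 and the variance given below, $SS = (N - 1)\,v = 24 \times 12{,}928 \approx 310{,}272$ mg². |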
|
5. |
Variance, v |
The variance in a normally distributed population is described by the average of N squared deviations from the mean. Variance usually refers to a sample, however, in which case it is calculated as the sum of squares divided by N-1 rather than N. |
The sample variance v = SS / (N - 1) = 12,928 mg² |
|
6. |
Sample standard deviation, s |
Describes the dispersion of data about the mean. It is
equal to the square root of the variance. For a large sample size from a normal distribution, about 95% of observations lie within $\bar{Y} \pm 1.96\,s$. |
The sample standard deviation $s = \sqrt{v} = \sqrt{12{,}928} \approx 113.7$ mg |
|
7. |
Normal distribution |
A bell-shaped frequency distribution of a continuous variable. The formula for the normal distribution contains two parameters: the mean, giving its location, and the standard deviation, giving the shape of the symmetrical 'bell'. This distribution arises commonly in nature when myriad independent forces, themselves subject to variation, combine additively to produce a central tendency. Many parametric statistics are based on the normal distribution because of this, and also its property of describing both the location (mean) and dispersion (standard deviation) of the data. Since dispersion is measured in squared deviations from the mean, it can be partitioned between sources, permitting the testing of statistical models. |
|
|
8. |
Standard error of the mean, SE |
Describes the uncertainty, due to sampling error, in the
mean of the data. It is calculated by dividing the standard deviation by the
square root of the sample size ($SE = s / \sqrt{N}$). |
The standard error of the mean $SE = s / \sqrt{N} = 113.7 / \sqrt{25} \approx 22.7$ mg |
|
9. |
Confidence interval for $\mu$ |
Regardless of the underlying distribution of data, the
sample means from repeated random samples of size N would have a distribution
that approached normal for large N, with 95% of sample means at $\mu \pm 1.96\,SE$. |
The 95% confidence intervals for $\mu$
from the sample of 25 Princess Bean seeds are at:
|
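The chain of calculations in the table can be sketched in Python, starting only from the two values the text reports for the bean sample: N = 25 and v = 12,928 mg². The sample mean is not restated here, so the helper `confidence_interval` (a hypothetical name) takes it as an argument. The critical value is an assumption: it is the large-sample normal value of about 1.96, not necessarily the one used for the original example.

```python
import math
from statistics import NormalDist

N = 25        # sample size, from the text
v = 12_928    # sample variance in mg^2, from the text

SS = v * (N - 1)         # sum of squares, recovered from v = SS / (N - 1)
s = math.sqrt(v)         # sample standard deviation, in mg
SE = s / math.sqrt(N)    # standard error of the mean, in mg

# Assumption: large-sample normal critical value (about 1.96).
z = NormalDist().inv_cdf(0.975)
half_width = z * SE      # distance from the sample mean to each 95% limit


def confidence_interval(sample_mean, se=SE, crit=z):
    """Return the (lower, upper) 95% limits around a given sample mean."""
    return sample_mean - crit * se, sample_mean + crit * se


print(f"SS = {SS} mg^2, s = {s:.1f} mg, SE = {SE:.2f} mg")
print(f"95% limits lie at the sample mean +/- {half_width:.1f} mg")
```

With only 25 observations, a critical value from the t distribution with 24 degrees of freedom (about 2.06) would give a slightly wider interval than the normal approximation used in this sketch.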