The link between error bars and statistical significance
By Dr. Harvey Motulsky
President, GraphPad Software
hmotulsky@graphpad.com
All contents are copyright © 1995-2002 by GraphPad Software, Inc. All rights reserved.
When you view data in a publication or presentation, you may be tempted to
draw conclusions about the statistical significance of differences
between group means by looking at whether the error bars overlap. Let's
look at two contrasting examples.
What can you conclude when standard error bars do not overlap?
When standard error (SE) bars do not overlap, you cannot be sure that the
difference between two means is statistically significant. Even though
the error bars do not overlap in experiment 1, the difference is not
statistically significant (P=0.09 by unpaired t test). This is also true when you compare proportions with a chi-square test.
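A minimal sketch of this situation, using made-up summary statistics (they are not the data from experiment 1) and assuming a standard pooled-variance unpaired t test. The helper names `t_pdf` and `two_sided_p` are illustrative; the P value is obtained by numerically integrating the t density so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 minus the area between -|t| and +|t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

# Hypothetical summary data: mean, standard error, sample size.
mean_a, se_a, n_a = 10.0, 1.0, 10
mean_b, se_b, n_b = 12.5, 1.0, 10

# SE bars overlap when the gap between means is no larger than SE_a + SE_b.
bars_overlap = abs(mean_b - mean_a) <= se_a + se_b

# Pooled-variance unpaired t test computed from the summary statistics.
var_a = se_a ** 2 * n_a  # SD^2 = SE^2 * n
var_b = se_b ** 2 * n_b
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean_b - mean_a) / se_diff
p_value = two_sided_p(t_stat, df)

print(f"SE bars overlap: {bars_overlap}")
print(f"t = {t_stat:.2f}, df = {df}, two-sided P = {p_value:.3f}")
```

With these numbers the SE bars are separated by half a unit, yet t is about 1.77 with 18 degrees of freedom, so P falls between 0.05 and 0.10.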
What can you conclude when standard error bars do overlap?
No surprises here. When SE bars overlap (as in experiment 2), you can be
sure the difference between the two means is not statistically
significant (P>0.05).
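One way to see why overlap rules out significance is the following sketch. It assumes the standard error of the difference is the square root of the sum of the two squared SEs, which is the case for equal sample sizes (and for the unequal-variance form of the t test); the symbols below are introduced here for illustration only.

```latex
% Overlapping SE bars mean the gap between the means is at most SE_1 + SE_2.
\begin{align*}
|\bar{x}_1 - \bar{x}_2| &\le SE_1 + SE_2 \\
(SE_1 + SE_2)^2 &\le 2\,(SE_1^2 + SE_2^2) \\
t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{SE_1^2 + SE_2^2}} &\le \sqrt{2} \approx 1.41
\end{align*}
```

Since the critical value of t for P=0.05 (two-tailed) is never below 1.96, a t ratio of at most 1.41 cannot reach significance under these assumptions.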
What if you are comparing more than two groups?
Post tests following one-way ANOVA account for multiple comparisons, so they yield higher P values than t
tests comparing just two groups. So the same rules apply. If two SE
error bars overlap, you can be sure that a post test comparing those
two groups will find no statistical significance. However, if two SE
error bars do not overlap, you can't tell whether a post test will, or
will not, find a statistically significant difference.
What if the error bars do not represent the SEM?
Error bars that represent the 95% confidence interval (CI) of a mean are
wider than SE error bars: about twice as wide with large sample sizes
and even wider with small sample sizes. If 95% CI error bars do not
overlap, you can be sure the difference is statistically significant
(P < 0.05). However, the converse is not true: you may or
may not have statistical significance when the 95% confidence intervals
overlap.
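To illustrate the second half of that statement, here is a small sketch with invented summary statistics in which the two 95% confidence intervals overlap and the unpaired t test is nevertheless significant. The critical values are the usual two-tailed table values for 9 and 18 degrees of freedom; equal sample sizes are assumed so that the SE of the difference is simply the root sum of squares of the two SEs.

```python
import math

# Two-tailed 95% critical values of t from standard tables.
T_CRIT_DF9 = 2.262   # used for each group's CI (n - 1 = 9)
T_CRIT_DF18 = 2.101  # used for the unpaired t test (n_a + n_b - 2 = 18)

# Hypothetical summary data: mean, standard error, sample size.
mean_a, se_a, n_a = 10.0, 1.0, 10
mean_b, se_b, n_b = 13.5, 1.0, 10

# 95% CI of each mean: mean +/- t_crit * SE.
ci_a = (mean_a - T_CRIT_DF9 * se_a, mean_a + T_CRIT_DF9 * se_a)
ci_b = (mean_b - T_CRIT_DF9 * se_b, mean_b + T_CRIT_DF9 * se_b)
cis_overlap = ci_a[1] >= ci_b[0] and ci_b[1] >= ci_a[0]

# Unpaired t test; with equal n the pooled SE of the difference
# equals sqrt(SE_a^2 + SE_b^2).
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
t_stat = abs(mean_b - mean_a) / se_diff
significant = t_stat > T_CRIT_DF18

print(f"CI A = ({ci_a[0]:.2f}, {ci_a[1]:.2f}), CI B = ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"CIs overlap: {cis_overlap}; t = {t_stat:.2f}; P < 0.05: {significant}")
```

Here the intervals share the range from roughly 11.2 to 12.3, but t is about 2.47, which exceeds the critical value of 2.101.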
Some graphs and tables show the mean with the
standard deviation (SD) rather than the SEM. The SD quantifies
variability, but does not account for sample size. To assess
statistical significance, you must take into account sample size as
well as variability. Therefore, observing whether SD error bars overlap
or not tells you nothing about whether the difference is, or is not,
statistically significant.
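The role of sample size can be sketched with invented numbers: the same two means and the same SD give very different t ratios depending on how many values are in each group. The function name `unpaired_t` is illustrative, and equal SDs and equal group sizes are assumed to keep the formula short.

```python
import math

def unpaired_t(mean_a, mean_b, sd, n):
    """t ratio for two groups with a common SD and n values per group."""
    se_diff = sd * math.sqrt(2 / n)
    return abs(mean_b - mean_a) / se_diff

# Hypothetical groups: means 10 and 12, SD 3 in both.
# The SD bars (7 to 13 and 9 to 15) overlap regardless of n.
t_small = unpaired_t(10.0, 12.0, 3.0, 5)    # 8 degrees of freedom
t_large = unpaired_t(10.0, 12.0, 3.0, 100)  # 198 degrees of freedom

print(f"n = 5 per group:   t = {t_small:.2f}")
print(f"n = 100 per group: t = {t_large:.2f}")
```

With five values per group t is about 1.05, far below the critical value of 2.306 for 8 degrees of freedom. With one hundred values per group t is about 4.71, well above the critical value of roughly 1.97. The SD error bars look identical in both cases.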
What if the groups were matched and analyzed with a paired t test?
All the comments above assume you are performing an unpaired t test. When you analyze matched data with a paired t
test, it doesn't matter how much scatter each group has; what matters
is the consistency of the changes or differences. Whether or not the
error bars for each group overlap tells you nothing about the P
value of a paired t test.
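A small sketch with made-up matched data shows how this can happen. The subjects differ widely from one another, so the error bars of the two groups overlap almost completely, but every subject changes by a similar amount. The variable names and values are illustrative only.

```python
import math
from statistics import mean, stdev

# Hypothetical matched measurements on five subjects.
before = [10.0, 20.0, 30.0, 40.0, 50.0]
after = [11.0, 21.5, 31.0, 42.0, 51.5]
n = len(before)

# Per-group SE bars, as they would be drawn on a graph of the two groups.
se_before = stdev(before) / math.sqrt(n)
se_after = stdev(after) / math.sqrt(n)
group_bars_overlap = abs(mean(after) - mean(before)) <= se_before + se_after

# Unpaired t ratio (equal n): ignores the matching.
unpaired_t = (mean(after) - mean(before)) / math.sqrt(se_before ** 2 + se_after ** 2)

# Paired t ratio: based only on the within-subject differences.
diffs = [a - b for a, b in zip(after, before)]
paired_t = mean(diffs) / (stdev(diffs) / math.sqrt(n))

T_CRIT_DF4 = 2.776  # two-tailed 95% critical t, 4 degrees of freedom

print(f"Group SE bars overlap: {group_bars_overlap}")
print(f"unpaired t = {unpaired_t:.2f}, paired t = {paired_t:.2f}")
print(f"Paired test significant at 0.05: {paired_t > T_CRIT_DF4}")
```

The unpaired t ratio is about 0.14, while the paired t ratio is about 7.5, far beyond the critical value for 4 degrees of freedom.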
What if the error bars represent the confidence interval of the difference between means?
This figure depicts two experiments, A and B. In each experiment, control
and treatment measurements were obtained. The graph shows the
difference between control and treatment for each experiment. A
positive number denotes an increase; a negative number denotes a
decrease. The error bars show 95% confidence intervals for those
differences. (Note that we are not comparing experiment A with
experiment B, but rather are asking whether each experiment shows
convincing evidence that the treatment has an effect.)
In experiment A, the 95% confidence interval for the difference between
the two means does not include zero. Therefore you can conclude that
the P value for the comparison must be less than 0.05 and that the
difference must be statistically significant (using the traditional
0.05 cutoff). The 95% confidence interval in experiment B includes
zero, so the P value must be greater than 0.05, and you can conclude
that the difference is not statistically significant.
This rule works for both paired and unpaired t
tests. Note that the confidence interval for the difference between the
two means is computed very differently for the two tests.
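The equivalence between "the 95% CI of the difference excludes zero" and "P < 0.05" can be sketched as follows. The differences, their standard errors, and the 18 degrees of freedom are invented for illustration and are not the values behind experiments A and B; how the SE of the difference is obtained depends on whether the test is paired or unpaired and is taken as given here.

```python
T_CRIT = 2.101  # two-tailed 95% critical t for 18 degrees of freedom (assumed df)

def ci_of_difference(diff, se_diff, t_crit):
    """95% CI of a difference between means: diff +/- t_crit * SE."""
    half_width = t_crit * se_diff
    return diff - half_width, diff + half_width

def excludes_zero(ci):
    low, high = ci
    return low > 0 or high < 0

# Hypothetical (difference, SE of difference) for two experiments.
experiments = {"A": (3.0, 1.0), "B": (1.5, 1.0)}

results = {}
for label, (diff, se_diff) in experiments.items():
    ci = ci_of_difference(diff, se_diff, T_CRIT)
    by_ci = excludes_zero(ci)               # decision from the interval
    by_t = abs(diff / se_diff) > T_CRIT     # decision from the t ratio
    results[label] = (by_ci, by_t)
    print(f"{label}: CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
          f"excludes zero: {by_ci}, |t| > critical: {by_t}")
```

The two decisions always agree, because the interval excludes zero exactly when the absolute difference exceeds the critical t times its standard error.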
The link between error bars and statistical significance is weaker than
many wish to believe. One rule is worth remembering: if two SE error
bars overlap, you can conclude that the difference is not statistically
significant, but the converse is not true.
