Pairwise Comparisons in SAS and SPSS

 

 

           This handout is for users of SAS or SPSS software who would like to use multiple comparison methods available in either software package to carry out pairwise comparisons. We will give a short description of available methods and, for the analyses listed below, a recommendation based on the comparisons of the procedures provided in the references listed at the bottom of the handout. Some of the methods, namely step-down Holm-Bonferroni and Holm-Sidak, are not directly available in SAS or SPSS, but can be easily implemented using results of appropriate SAS or SPSS procedures.

 

            Multiple comparison procedures are used to control the familywise error rate. For example, suppose that we have four groups and we want to carry out all pairwise comparisons of the group means. There are six such comparisons: 1 with 2, 1 with 3, 1 with 4, 2 with 3, 2 with 4 and 3 with 4. Such a set of comparisons is called a family. If we use, for example, a t-test to compare each pair at a certain significance level ALPHA, then the probability of a Type I error (incorrect rejection of the null hypothesis of equality of means) can be guaranteed not to exceed ALPHA only for each pairwise comparison separately, not for the whole family. To ensure that the probability of incorrectly rejecting the null hypothesis for any of the pairwise comparisons in the family does not exceed ALPHA, multiple comparison methods that control the familywise error rate (FWE) need to be used.
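            To get a feeling for how quickly the familywise error grows, the following short Python sketch counts the pairwise comparisons for four groups and computes the chance of at least one false rejection if the six tests were independent, each run at ALPHA = 0.05. The independence assumption and the function name are ours, for illustration only; pairwise comparisons that share a group are not independent, so the number is a rough guide rather than the exact FWE.

```python
from math import comb

def familywise_error_if_independent(n_groups, alpha):
    """Return (number of pairwise comparisons, FWE assuming independent tests)."""
    m = comb(n_groups, 2)             # all pairs of groups
    fwe = 1.0 - (1.0 - alpha) ** m    # chance of at least one false rejection
    return m, fwe

m, fwe = familywise_error_if_independent(4, 0.05)
print(m, round(fwe, 3))  # 6 0.265
```

            Under this simplified assumption, six unadjusted tests at level 0.05 give a familywise error of about 0.26 rather than 0.05.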

 

            Multiple comparison methods can be divided into two types: single-step methods, based on simultaneous confidence intervals that allow directional decisions (for example, the mean of group 1 is bigger than the mean of group 2), and stepwise (sequentially rejective) methods that are limited to hypothesis testing and, in most cases, do not produce simultaneous confidence intervals or lead to directional decisions. Stepwise methods are generally more powerful than the corresponding single-step procedures. Therefore, if hypothesis testing is the main goal of the analysis and confidence intervals are not needed, the stepwise methods are preferable.

 

            There are several tests for pairwise comparisons available in SAS as well as in SPSS. They are: LSD, Bonferroni, Sidak, Scheffe, REGWQ (Ryan-Einot-Gabriel-Welsch based on range), Tukey, Tukey-Kramer, Gabriel, Hochberg’s GT2, SNK (Student-Newman-Keuls), Duncan, Waller-Duncan and Dunnett. In addition, REGWF, which is the Ryan-Einot-Gabriel-Welsch test based on the ANOVA F, and Tukey’s-b test are available only in SPSS, while the simulation option for computing approximations to the exact p-values for pairwise comparisons is available only in SAS. SPSS also provides tests for pairwise comparisons in one-way ANOVA with unequal group variances. The available tests are: Tamhane’s T2, Dunnett’s T3, Games-Howell and Dunnett’s C.

 

Single-step tests

-          LSD

-          Bonferroni

-          Sidak

-          Scheffe

-          Tukey

-          Tukey-Kramer

-          Hochberg’s GT2

-          Gabriel

-          Dunnett

Step-down tests

-          REGWQ, REGWF, SNK (Student-Newman-Keuls), Duncan

-          Bonferroni-Holm

-          Sidak-Holm

Bayesian Approach

-          Waller-Duncan

Recommendations  

            1. Balanced one-way ANOVA, equal variances in groups

            2. Unbalanced one-way  ANOVA, equal variances in groups

            3. One-way ANOVA with unequal variances

            4. General balanced ANOVA

            5. General unbalanced fixed effect ANOVA

            6. Mixed and Repeated Measures ANOVA

            7. General case

References

 

 

Single-step tests

 

 

            The following single-step tests produce simultaneous confidence intervals in addition to pairwise comparisons: Bonferroni, Sidak, Scheffe, Tukey, Tukey-Kramer, Gabriel, Hochberg’s GT2 and Dunnett. Below are short descriptions of these tests and of the LSD test.

 

            LSD: The LSD (Least Significant Difference) test is a two-step test. First the ANOVA F test is performed. If it is significant at level ALPHA, then all pairwise t-tests are carried out, each at level ALPHA. If the F test is not significant, then the procedure terminates. The LSD test does not control the FWE.

 

            Bonferroni: The Bonferroni multiple comparison test is a conservative test, that is, its FWE is not exactly equal to ALPHA, but is less than ALPHA in most situations. It is easy to apply and can be used for any set of comparisons. To get the Bonferroni adjusted p-values, multiply the ordinary, unadjusted pairwise p-values (for example, t-test p-values for comparing two means) by the number of comparisons in the family and take the minimum of the result and 1. Even though the Bonferroni test controls the FWE, in many situations it may be too conservative and not have enough power to detect significant differences.
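            The Bonferroni adjustment can be applied by hand to p-values printed by either package. Below is a minimal Python sketch of the rule described above; the function name and the six p-values are made up for illustration.

```python
def bonferroni_adjust(p_values):
    """Multiply each unadjusted p-value by the family size and cap the result at 1."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]

# Hypothetical unadjusted p-values for six pairwise comparisons
raw = [0.001, 0.012, 0.030, 0.200, 0.450, 0.700]
adjusted = bonferroni_adjust(raw)
print([round(p, 3) for p in adjusted])  # [0.006, 0.072, 0.18, 1.0, 1.0, 1.0]
```

            A comparison is declared significant at level ALPHA when its adjusted p-value does not exceed ALPHA.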

 

            Sidak:  Sidak adjusted p-values are also easy to compute: the adjusted p-value is equal to 1 - (1 - p)^k, where p is the unadjusted p-value and k is the number of comparisons in the family. The Sidak test gives slightly smaller adjusted p-values than Bonferroni, but it guarantees strict control of the FWE only when the comparisons are independent, as are, for example, orthogonal contrasts.
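            As a small illustration, the Sidak formula can be evaluated with a few lines of Python. The function name and the example values are hypothetical; the sketch only shows the arithmetic and says nothing about whether the independence condition holds for your comparisons.

```python
def sidak_adjust(p_values):
    """Sidak adjusted p-value: 1 - (1 - p)^k, with k the number of comparisons."""
    k = len(p_values)
    return [1.0 - (1.0 - p) ** k for p in p_values]

# Six comparisons, each with unadjusted p = 0.01
adjusted = sidak_adjust([0.01] * 6)
print(round(adjusted[0], 4))  # 0.0585
```

            For the same input, the Bonferroni adjusted value would be 6 x 0.01 = 0.06, slightly larger than the Sidak value.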

 

            Scheffe:  The Scheffe test is used in ANOVA (balanced, unbalanced, with covariates). It controls the FWE for all possible contrasts, not only pairwise comparisons, and is therefore too conservative when pairwise comparisons are the only comparisons of interest.

 

            Tukey: The Tukey test is based on the studentized range distribution (standardized maximum difference between the means). For balanced one-way ANOVA, the FWE of the Tukey test is exactly equal to the assumed value of ALPHA. The Tukey test is also exact for balanced one-way ANOVA with correlated errors when the correlation structure is compound symmetry.

 

            Tukey-Kramer: The Tukey-Kramer test is an extension of the Tukey test to unbalanced designs. Unlike the Tukey test for balanced designs, it is not exact. The FWE of the Tukey-Kramer test may be less than ALPHA. It is less conservative for only slightly unbalanced designs and more conservative when the differences among sample sizes are bigger.

 

            Hochberg’s GT2:  The GT2 test is similar to Tukey, but the critical values are based on the studentized maximum modulus distribution instead of the studentized range. For balanced or unbalanced one-way ANOVA, its FWE does not exceed ALPHA. It is usually more conservative than the Tukey-Kramer test for unbalanced designs and it is always more conservative than the Tukey test for balanced designs.

 

            Gabriel:  Like the GT2 test, the Gabriel test is based on the studentized maximum modulus. It is equivalent to the GT2 test for balanced one-way ANOVA. For unbalanced one-way ANOVA, it is less conservative than GT2, but its FWE may exceed ALPHA in highly unbalanced designs.

 

            Dunnett:  Dunnett’s test is the test to use when the only pairwise comparisons of interest are comparisons with a control. It is an exact test, that is, its FWE is exactly equal to ALPHA, for balanced as well as unbalanced one-way designs.

 

 

Step-down tests

 

 

            The following tests are stepwise tests: REGWQ, REGWF, SNK (Student-Newman-Keuls), Duncan, Tukey’s-b. These tests do not provide confidence intervals. They just divide the means into possibly overlapping groups. Means within the same group are not significantly different; means from different groups are significantly different at the assumed level ALPHA. The Bonferroni-Holm and Sidak-Holm step-down tests also belong to this class of tests. They are not available in SAS or SPSS, but can be easily performed using the results printed by either software package.

 

            All tests listed above are step-down tests. They share a common testing scheme that consists of the following steps:

 

            -  First, the equality of all k means is tested at level ALPHA_k. If the test results in a rejection, then each subset of k-1 means is tested at level ALPHA_(k-1); otherwise, the procedure stops.

            -  In general, if the hypothesis of equality of a set of p means is rejected at level ALPHA_p, then each subset of p-1 means is tested at level ALPHA_(p-1); otherwise, the set of p means is considered not to differ significantly and none of its subsets is tested.

            - Continue in this manner until no subsets remain to be tested.

 

Significance levels ALPHA_k, ALPHA_(k-1), … depend on the number of comparisons and on the test used.
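            To make the dependence of the levels on the test concrete, the following Python sketch lists the level used for a subset of p means under three of the tests. The formulas are not given in this handout; they are the ones commonly stated for SNK, Duncan and REGWQ and should be checked against reference 1 before being relied on. The function name is hypothetical.

```python
def stepdown_levels(k, alpha=0.05, method="REGWQ"):
    """Level used to test a subset of p means, for p = k, k-1, ..., 2 (assumed formulas)."""
    levels = {}
    for p in range(k, 1, -1):
        if method == "SNK":
            levels[p] = alpha                               # same level at every step
        elif method == "Duncan":
            levels[p] = 1.0 - (1.0 - alpha) ** (p - 1)      # grows with subset size
        elif method == "REGWQ":
            # full level for p >= k-1, reduced level for smaller subsets
            levels[p] = alpha if p >= k - 1 else 1.0 - (1.0 - alpha) ** (p / k)
        else:
            raise ValueError("method must be 'SNK', 'Duncan' or 'REGWQ'")
    return levels

for name in ("SNK", "Duncan", "REGWQ"):
    print(name, {p: round(a, 4) for p, a in stepdown_levels(4, 0.05, name).items()})
```

            With four means, the Duncan levels for the larger subsets are well above 0.05 and the SNK level stays at 0.05 for every subset size, while REGWQ reduces the level for small subsets.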

 

 

            REGWQ, REGWF, SNK (Student-Newman-Keuls) and Duncan are step-down pairwise comparison procedures for balanced one-way ANOVA. Although these tests can be obtained in SAS as well as in SPSS for unbalanced designs (both software packages use the harmonic mean of the sample sizes as the common sample size), their use in unbalanced cases is not recommended. It is also not recommended to use the SNK (Student-Newman-Keuls) and Duncan tests, since they do not control the FWE. The REGWQ and REGWF tests are both conservative for balanced designs: their FWE does not exceed ALPHA. REGWF, based on F statistics (available only in SPSS), is more computationally intensive and somewhat more powerful than REGWQ, which is based on the studentized range distribution.

 

            Bonferroni-Holm: The biggest advantage of the Bonferroni-Holm step-down test is that it does not require any assumptions (model or distribution related) and therefore can be applied to any family of pairwise comparisons. It is a conservative test: its FWE does not exceed ALPHA. Here is how it works. Suppose that there are k pairwise comparisons of interest and the corresponding p-values, not adjusted for multiple comparisons, are p1, p2, … , pk. Order the p-values from the smallest to the largest, p(1), p(2), … , p(k), with the corresponding comparisons C(1), C(2), … , C(k). If p(1) > ALPHA/k, then stop, retain all hypotheses and conclude that there is no evidence of differences between means at significance level ALPHA. If p(1) <= ALPHA/k, then reject the hypothesis related to comparison C(1), conclude that the means in comparison C(1) are significantly different at level ALPHA, and go to the next step. The next step is to compare p(2) with ALPHA/(k-1). If p(2) > ALPHA/(k-1), then stop and retain all remaining hypotheses. If p(2) <= ALPHA/(k-1), then reject the hypothesis related to comparison C(2), conclude that the means in comparison C(2) are significantly different at level ALPHA, and go to the next step. The next step is to compare p(3) with ALPHA/(k-2). If p(3) > ALPHA/(k-2), then stop and retain all remaining hypotheses. If p(3) <= ALPHA/(k-2), then reject the hypothesis related to comparison C(3), conclude that the means in comparison C(3) are significantly different at level ALPHA, and go to the next step. Continue in this way until the procedure stops or until all p-values have been compared.
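            The Bonferroni-Holm steps described above can be sketched in Python as follows. This is an illustration of the procedure applied to unadjusted p-values copied from SAS or SPSS output; the function name and the p-values are hypothetical.

```python
def holm_stepdown(p_values, alpha=0.05):
    """Return True for each comparison whose hypothesis is rejected by Bonferroni-Holm."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # positions, smallest p first
    reject = [False] * k
    for step, i in enumerate(order):
        # compare p(step+1) with ALPHA/(k-step); stop at the first failure
        if p_values[i] > alpha / (k - step):
            break
        reject[i] = True
    return reject

# Hypothetical unadjusted p-values for six pairwise comparisons
raw = [0.030, 0.001, 0.450, 0.008, 0.200, 0.012]
print(holm_stepdown(raw))  # [False, True, False, True, False, True]
```

            In this example the single-step Bonferroni test (threshold 0.05/6, about 0.0083) would reject only the comparisons with p = 0.001 and p = 0.008, while the step-down version also rejects the one with p = 0.012.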

 

            Sidak-Holm: The testing procedure in the Sidak-Holm method is very similar to the Bonferroni-Holm method. The only difference is that, instead of comparing the ordered p-value p(j+1) with ALPHA/(k-j), the Sidak adjusted value 1 - (1 - p(j+1))^(k-j) is compared with ALPHA, for j = 0, 1, … , k-1. The Sidak-Holm test is slightly less conservative than Bonferroni-Holm, but its control of the FWE is guaranteed only for independent comparisons.
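            A Python sketch of the Sidak-Holm procedure, under the reading that the Sidak adjusted value of each ordered p-value is compared with ALPHA, is shown below. The function name and the three p-values are hypothetical and chosen so that the result differs from Bonferroni-Holm.

```python
def sidak_holm_stepdown(p_values, alpha=0.05):
    """Return True for each comparison whose hypothesis is rejected by Sidak-Holm."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # positions, smallest p first
    reject = [False] * k
    for j, i in enumerate(order):
        # Sidak adjustment with the number of hypotheses still in play, k - j
        sidak_adjusted = 1.0 - (1.0 - p_values[i]) ** (k - j)
        if sidak_adjusted > alpha:
            break
        reject[i] = True
    return reject

print(sidak_holm_stepdown([0.30, 0.0168, 0.02]))  # [False, True, True]
```

            With the same input, Bonferroni-Holm stops at the first step, because 0.0168 is larger than 0.05/3 (about 0.0167), and rejects nothing.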

 

 

Bayesian approach

 

            The Waller-Duncan test is different from all the tests mentioned above. It is based on a Bayesian approach and minimizes an additive loss function, which is a sum of loss functions for each pairwise comparison. The individual loss functions are linear, with the loss equal to the absolute value of the difference between means multiplied by a constant k0 if the null hypothesis was incorrectly accepted, or by a constant k1 if the alternative hypothesis was incorrectly accepted. The ratio K=k1/k0 is a measure of the relative seriousness of a Type I error versus a Type II error and it has to be specified instead of the significance level ALPHA. The values of K=50, 100 and 500 roughly correspond to ALPHA = 0.10, 0.05 and 0.01, respectively (see Multiple Comparison Procedures, by Y. Hochberg and A. Tamhane for details).

 

 

            In SPSS, the tests can be chosen as an option in the analysis of variance procedures: One-way ANOVA in the Compare Means menu and in the General Linear Model. Click on the Post Hoc button to select a test. In SAS, the tests are available in PROC GLM as options in the LSMEANS and MEANS statements, and in PROC MIXED in the LSMEANS statement. Examples of SAS code are given in the sections below.

 

Recommendations

 

1. Balanced one-way ANOVA, equal variances in groups

 

           

             If all pairwise comparisons of the means are tested and confidence intervals are also required, then the Tukey test is recommended. The Tukey test is exact for balanced one-way ANOVA, that is, the FWE is exactly equal to ALPHA, and it is more powerful than available alternatives.

 

            If comparisons with a control are the only ones needed and confidence intervals are required, then Dunnett’s test is recommended. The FWE of Dunnett’s test is exactly equal to ALPHA for pairwise comparisons with a control.

 

            If all pairwise comparisons of the means are tested, but confidence intervals are not required, then the REGWQ test is recommended. The FWE of the REGWQ does not exceed ALPHA and REGWQ is more powerful than the Tukey test when estimating the amount of  the difference between means (confidence intervals) is not required.

 

            SAS code:  Suppose that the name of the grouping variable is group with values 1, 2, 3 and 4, and the name of the variable containing measurements whose group means we want to compare is y.

 

-         SAS code to obtain Tukey test with confidence intervals:

 

      proc glm;
      class group;
      model y = group;
      /* Tukey test; cldiff requests confidence intervals for the differences */
      means group /cldiff tukey;
      run;

 

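     The above code prints 95% confidence intervals for all pairwise differences of the means. (With the lines option in place of cldiff, the means are listed with the same letter next to means that are not significantly different.) To print 99% confidence intervals, change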

 

      means group /cldiff tukey;

to

      means group /cldiff tukey alpha=0.01;

 

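If you want to get p-values for each comparison, replace the MEANS statement

      means group /cldiff tukey;

with an LSMEANS statement (the pdiff and adjust= options belong to LSMEANS, not to MEANS):

      lsmeans group /pdiff adjust=tukey;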

 

 

-         SAS code to obtain Dunnett’s test for comparisons with group=4:

 

      proc glm;
      class group;
      model y = group;
      /* group 4 is the control */
      means group /dunnett('4');
      run;

 

-         SAS code to obtain REGWQ test:

 

      proc glm;
      class group;
      model y = group;
      means group /regwq;
      run;

 

 

2. Unbalanced one-way  ANOVA, equal variances in groups

 

 

             If all pairwise comparisons of the means are tested, then the Tukey-Kramer test is recommended. The Tukey-Kramer test is not exact, but conservative for unbalanced one-way ANOVA, that is, the FWE does not exceed ALPHA (it may be less than ALPHA). The test is conservative because the critical value used in it is not exact, but an approximation to the exact value. The approximation is quite accurate for slightly unbalanced designs and becomes less accurate when the differences in sample sizes increase. The SIMULATE option in SAS (adjust=simulate in the LSMEANS statement) may provide a better approximation and therefore a less conservative test. The accuracy of the approximation increases with the number of samples used in the simulation.

 

            If comparisons with a control are the only ones needed, then Dunnett’s test is recommended. The FWE of Dunnett’s test is exactly equal to ALPHA for pairwise comparisons with a control.

 

            In SPSS, the Tukey-Kramer test is obtained for unbalanced design if the Tukey option is checked in the Post Hoc menu.

            In SAS, the following code can be used to obtain the Tukey-Kramer test (assuming group is the name of a group variable and y is a variable whose group means we want to compare):

 

      proc glm;
      class group;
      model y = group;
      /* with unequal group sizes the tukey option gives the Tukey-Kramer test */
      means group /tukey;
      run;

           

            The following program can be used to obtain the simulation test:

 

      proc glm;
      class group;
      model y = group;
      lsmeans group /pdiff cl adjust=simulate(nsamp=100000 seed=278912);
      run;

 

where NSAMP is the number of samples used in simulations and seed is the starting seed for the random number generation. Higher values of NSAMP result in a better test, but increase the computation time.

 

    The following SAS program may be used to obtain Dunnett’s test for comparisons with a control. In the program, the control group is group=4:

 

      proc glm;
      class group;
      model y = group;
      /* group 4 is the control */
      means group /dunnett('4');
      run;

 

           

 

3. One-way ANOVA with unequal variances

 

            In SAS no tests are available for pairwise comparisons for one-way anova when variances in groups are not equal.

            In SPSS, the following tests are available: Tamhane’s T2, Dunnett’s T3, Games-Howell and Dunnett’s C. None of these tests is exact. T2, T3 and C are conservative procedures, that is, for all of them the FWE does not exceed ALPHA. T2 is more conservative than T3; for large samples they are approximately equal. T3 is more conservative than C for large samples, while C is more conservative for smaller samples. The Games-Howell test is an extension of the Tukey-Kramer test to the case of unequal variances. It has higher power (narrower confidence intervals) than T2, T3 or C, but its FWE may exceed ALPHA. The Games-Howell test is most liberal (its FWE is most likely to exceed ALPHA) when the variances of the sample means, σ_i^2 / n_i, are approximately equal.

 

            Recommendation: Dunnett’s T3 or Dunnett’s C should be used for pairwise comparisons. T3 is recommended when sample sizes in groups are small, C is recommended when sample sizes are large.

 

4. General balanced ANOVA

 

            (i) Main effect models

 

            In general balanced ANOVA with main effects and no interactions, tests recommended in Section 1, One-Way balanced ANOVA, can be used. That is, the Tukey test is recommended for all pairwise comparisons if confidence intervals for mean differences are needed, and the step-down REGWQ test, when the confidence intervals are not required. For pairwise comparisons with a control, Dunnett’s test is recommended.

            To get the test in SPSS, click on Post Hoc, select the main effect of interest and the test. In SAS, the following programs can be used.

 

      proc glm;
      class group edu;
      model y = group edu;
      means group /cldiff tukey;
      run;

 

Replace the means statement with

 

  means group /dunnett('4');

 

to obtain Dunnett’s test for pairwise comparisons with the control group 4,

and  with

 

 means group /regwq;

 

to obtain the REGWQ test.

 

            (ii) Models with interactions

 

            Pairwise comparisons for main effects are not usually of interest when interactions are present in the model. If they are, then the tests described above in subsection (i) can be used. However, in many cases, different sets of comparisons related to the interactions may be of interest. SPSS does not provide any multiple comparison procedure for such comparisons. In SAS, multiple comparison tests can be easily obtained, without additional programming, only when either all pairwise comparisons among all combinations of the levels of the variables involved in the interaction, or only pairwise comparisons with a control, are of interest. For example, suppose that there are two class variables, group and edu, and their interaction in the model. If we want to carry out all pairwise comparisons among all combinations of levels of group and edu, then the Tukey test can be used, and the following program can be used to obtain it.

 

      proc glm;
      class group edu;
      model y = group edu group*edu;
      lsmeans group*edu /pdiff cl adjust=tukey;
      run;

 

            For comparisons with a control, Dunnett’s test controls the FWE exactly and can be obtained with the following program:

      proc glm;
      class group edu;
      model y = group edu group*edu;
      /* control cell: group=1, edu=3 */
      lsmeans group*edu /pdiff=control('1' '3') cl adjust=dunnett;
      run;

where the combination group=1, edu=3 is the control group.

 

            For other sets of comparisons, the Bonferroni-Holm step-down test with the t-test p-values can be used. It controls the FWE. However, it may be too conservative, and in some situations better, more powerful procedures may be available (see reference 2 for SAS macros), although not ready-made in SAS or SPSS.

 

 

5. General unbalanced fixed effect ANOVA

 

            SPSS does not have any FWE controlling procedures that can be used for pairwise comparisons in general unbalanced designs. SPSS users can run General Linear Model and request comparisons based on the t-test (no multiple comparison correction) by using COMPARE option in the EMMEANS statement (for comparisons related to interactions) or selecting Compare Main Effects (overall main effect comparisons) in the Options menu. Then the t-test p-values can be adjusted for multiple comparisons by applying the Bonferroni-Holm procedure. The procedure is easy to apply, but it may be overly conservative. It does not provide confidence intervals.
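            When working from a table of unadjusted t-test p-values, it is often convenient to compute Bonferroni-Holm adjusted p-values and compare them with ALPHA, which gives the same decisions as the step-by-step procedure. The Python sketch below shows one way to do this; the function name and the three p-values are hypothetical.

```python
def holm_adjusted_p(p_values):
    """Bonferroni-Holm adjusted p-values, returned in the original order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # positions, smallest p first
    adjusted = [0.0] * k
    running_max = 0.0
    for step, i in enumerate(order):
        candidate = min(1.0, (k - step) * p_values[i])
        # adjusted p-values must not decrease along the ordered list
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical unadjusted p-values copied from a pairwise comparisons table
print([round(p, 3) for p in holm_adjusted_p([0.004, 0.020, 0.015])])  # [0.012, 0.03, 0.03]
```

            A comparison is significant at level ALPHA when its adjusted p-value does not exceed ALPHA; in the example all three comparisons are significant at ALPHA = 0.05.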

 

            In SAS, the Dunnett-Hsu, Tukey-Kramer, GT2 and SIMULATE options are available in the LSMEANS statement. Since it is not known whether the Tukey-Kramer test controls the FWE for pairwise comparisons in general unbalanced designs (it is proven to control the FWE only in some cases), the more conservative GT2 test or the approximately exact SIMULATE option is recommended. For pairwise comparisons with a control, the approximately exact Dunnett-Hsu test, obtained by specifying adjust=dunnett in the LSMEANS statement, or the SIMULATE option is recommended. For example,

 

            proc glm;
            class drug disease;
            model y = drug disease drug*disease;
            lsmeans drug /pdiff cl adjust=gt2;
            lsmeans drug*disease /pdiff cl adjust=gt2;
            * or, to compare with a control level defined as level 1 for drug and level 2 for disease;
            lsmeans drug*disease /pdiff=control('1' '2') cl adjust=gt2;
            run;

 

            proc glm;
            class drug disease;
            model y = drug disease drug*disease;
            lsmeans drug /pdiff cl adjust=simulate(seed=198351 acc=0.001);
            lsmeans drug*disease /pdiff cl adjust=simulate(seed=198351 acc=0.001);
            * or, to compare with a control level defined as level 1 for drug and level 2 for disease;
            lsmeans drug*disease /pdiff=control('1' '2') cl adjust=dunnett;
            lsmeans drug*disease /pdiff=control('1' '2') cl adjust=simulate(seed=198351 acc=0.001 cvadjust);
            run;

            

            For sets of comparisons that do not include all pairwise comparisons or comparisons with a control, the Bonferroni-Holm step-down test with the t-test p-values can be carried out. The t-test p-values can be obtained with the adjust=t option. For example,

 

       lsmeans drug*disease /pdiff adjust=t;

 

            The Bonferroni-Holm test is easy to carry out, but may be too conservative. There are SAS macros available (2) that provide less conservative adjustments.

 

 

6. Mixed and Repeated Measures ANOVA

 

            In balanced ANOVA with one random factor and one fixed factor, the Tukey test controls the FWE exactly and can be used for all pairwise comparisons, and Dunnett’s test can be used for pairwise comparisons with a control. For example, suppose that subject is a random factor and trial a fixed factor. In SPSS, General Linear Model, Univariate, Custom Model with main effects only, select the Tukey (or Dunnett) test for trial in the Post Hoc menu. In SAS, the following program can be used to obtain the tests.

 

            proc mixed;
             class trial subject;
             model y = trial /ddfm=satterth;
             random subject;
             lsmeans trial /cl adjust=tukey;
             /* level 1 of trial is the control */
             lsmeans trial /pdiff=control('1') cl adjust=dunnett;
             run;

 

            For general mixed and repeated measures models, SPSS does not have any procedures that control the FWE. In SAS, the SIMULATE option is recommended for all pairwise comparisons. For pairwise comparisons with a control, adjust=dunnett, which gives the approximately exact Dunnett-Hsu test, or adjust=simulate can be used to control the FWE. For example,

 

            proc mixed;
             class id t trt;
             model y = trt x;
             repeated t /type=un subject=id;
             lsmeans trt /pdiff cl adjust=simulate(seed=18713 nsamp=200000);
             * or, to compare with a control level defined as level 1 of trt;
             lsmeans trt /pdiff=control('1') cl adjust=dunnett;
             lsmeans trt /pdiff=control('1') cl adjust=simulate(seed=121211 nsamp=200000);
             run;

 

            For sets of comparisons that do not include all pairwise comparisons or comparisons with a control, the Bonferroni-Holm step down test with the t-test p-values is the easiest option but may be too conservative. There are SAS macros available (2) that provide less conservative adjustments.      

 

7. General case

 

            In the general case, when no assumptions about the distribution or model are made, the following tests are recommended: the Bonferroni test, if confidence intervals in addition to the tests are required, and the Bonferroni-Holm step-down test, if confidence intervals are not needed. Both tests control the FWE conservatively (FWE <= ALPHA).

 

           

References

 

  1. Y. Hochberg, A. C. Tamhane, Multiple Comparison Procedures, John Wiley & Sons, 1987.
  2. P. H. Westfall, R. D. Tobias, D. Rom, R. D. Wolfinger, Y. Hochberg, Multiple Comparisons and Multiple Tests Using the SAS System, SAS Institute, 1999.
  3. SAS/STAT User’s Guide, Version 8, SAS Institute, 1999.