People seem to struggle with ANOVA in R, especially when there are factors with more than 2 category levels or within-subjects effects. The trick is to think of your ANOVA as the linear model that it is, and then all roads will unfurl before you. This is a step-by-step tutorial on how to conduct a mixed-effects ANOVA with contrast coding in R. The overall steps in the process are:

1. Load a package that can estimate mixed models.
2. Read in the data.
3. Contrast-code the categorical predictors.
4. Estimate the omnibus model and interpret the planned contrasts.
5. Test the overall main effects and the interaction with model comparisons.
6. If you want them, convert the resulting χ² statistics to F-statistics.
Mixed-effects ANOVA is a special case of linear mixed models (a.k.a. “multilevel models”). So, you need to load a package that can fit mixed models, the most common of which are nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team, 2015) and lme4 (Bates, Maechler, Bolker, and Walker, 2015). This example uses the nlme package, mainly because its mixed-model function, lme, provides degrees of freedom and p-values - a cringe-worthy reason to some people, though. Anyway, load it up:
library( nlme )
I am using an example dataset on the voting patterns of U.S. legislators from each of the 50 states, collected by Luke and Krauss (2004) and shared by Luke (2004). The researchers recorded the percentage of times that each U.S. legislator voted in favour of tobacco corporations when the vote was relevant to the tobacco industry, where higher values represent more pro-tobacco voting. They also recorded the state represented by each legislator and the political party with which the legislator was affiliated. I merged these data with information on the geographical region of each state, using the regions defined by the U.S. Census Bureau. Read in the data:
dataset <- read.table( url( "http://page-gould.com/r/ANOVA/Legislator_Data.csv" ), header=T, sep="," )
This is what the first 6 rows of those data look like:
legislator | state | voting | party | region |
---|---|---|---|---|
Murkowski | AK | 0.842 | Republican | West |
Stevens | AK | 0.846 | Republican | West |
Young | AK | 0.571 | Republican | West |
Sessions | AL | 0.872 | Republican | South |
Shelby | AL | 0.641 | Republican | South |
Aderholt | AL | 0.900 | Republican | South |
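If you want to confirm the structure for yourself, a quick look with base R (my own suggestion, nothing specific to this tutorial) does the trick:

# Peek at the imported data: first rows, variable types, and the design.
head( dataset )
str( dataset )
table( dataset$region, dataset$party )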
You may notice that our two predictors, party and region, are provided as words (e.g., “Republican”, “Northeast”). However, you can only converse with linear models using numbers, so the categorical predictors (i.e., party, region) need to be recoded numerically. If you have some nice a priori hypotheses about the ways in which the category levels will differ from one another, then you can use contrast codes to do this.
To “contrast code” a categorical variable that has k levels (e.g., for region, k = 4: Northeast, Midwest, South, West), you create k - 1 numeric variables called “contrast codes.” For each contrast code, you assign numbers to the category levels that represent the comparison you want to make between the levels. Follow these three rules when assigning these numbers to ensure that you partition the variance properly (a quick programmatic check of the rules appears after the contrast table below):
1. The difference between the minimum and maximum numbers used in a given contrast should equal 1.
2. The sum of the numbers used in a given contrast should equal 0.
3. For each pair of contrasts, the sum across category levels of the products of their codes should equal 0.
For a detailed description of contrasts - and really everything I describe here - check out West, Aiken, and Krull (1996).
Contrasts are the a priori form of “simple effects tests” or “simple slope analyses.” The table below shows the contrasts that I'd like to test for the legislator data. Contrast 1 (“between.group.contrast.1”) tests whether the South reliably votes more pro-tobacco than all other regions of the U.S. Contrast 2 (“between.group.contrast.2”) tests the hypothesis that the Northeast votes less pro-tobacco than the South, specifically, ignoring the Midwest and West regions in this comparison. Contrast 3 (“between.group.contrast.3”) tests whether the Midwest votes more pro-tobacco than the West, ignoring the Northeast and South. These are the numbers that we will ultimately substitute for the different levels of the region category, creating three new predictors in the process:
region | between.group.contrast.1 | between.group.contrast.2 | between.group.contrast.3
---|---|---|---
Northeast | -0.25 | -0.5 | 0.0
South | 0.75 | 0.5 | 0.0
Midwest | -0.25 | 0.0 | 0.5
West | -0.25 | 0.0 | -0.5
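If you want to check a set of codes against the three rules programmatically, here is a small sketch of my own (not part of the original tutorial); region.codes is a hypothetical matrix holding the values from the table above:

# Rule checks for a matrix of contrast codes (one column per contrast).
region.codes <- rbind(
  Northeast = c( -1/4, -1/2,    0 ),
  South     = c(  3/4,  1/2,    0 ),
  Midwest   = c( -1/4,    0,  1/2 ),
  West      = c( -1/4,    0, -1/2 ) )
apply( region.codes, 2, function( x ) max( x ) - min( x ) )  # Rule 1: should all be 1
colSums( region.codes )                                      # Rule 2: should all be 0
crossprod( region.codes )  # Rule 3: each off-diagonal entry is one pair's sum of products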
The categorical variable representing the within-group factor, party, needs to be contrast-coded, too. However, party only has two levels, thanks to the U.S.'s frustrating, dual-party politics. A 2-level category can be fully represented numerically with one contrast-coded variable. With my within-group contrast, I will test the hypothesis that Republican legislators voted more favourably towards tobacco companies than Democrats. To satisfy the coding rules above, I will use the values of -1/2 for Democrats and 1/2 for Republicans.
To test these contrasts, we need to create a new predictor variable for each contrast. You can use the ifelse function to code these contrasts, like so:
dataset <- within( dataset, {
  # Grouping variable for the random effects: legislators are nested within states
  group.id <- state
  # Outcome: proportion of pro-tobacco votes
  dependent.variable <- voting
  # Within-group contrast: Republican vs. Democrat
  within.group.contrast <- ifelse( party=="Republican", 1/2, ifelse( party=="Democrat", -1/2, NA ) )
  # Between-group contrast 1: South vs. all other regions
  between.group.contrast.1 <- ifelse( region=="Northeast", -1/4,
                              ifelse( region=="South", 3/4,
                              ifelse( region=="Midwest", -1/4,
                              ifelse( region=="West", -1/4, NA ) ) ) )
  # Between-group contrast 2: Northeast vs. South
  between.group.contrast.2 <- ifelse( region=="Northeast", -1/2,
                              ifelse( region=="South", 1/2,
                              ifelse( region=="Midwest", 0,
                              ifelse( region=="West", 0, NA ) ) ) )
  # Between-group contrast 3: Midwest vs. West
  between.group.contrast.3 <- ifelse( region=="Northeast", 0,
                              ifelse( region=="South", 0,
                              ifelse( region=="Midwest", 1/2,
                              ifelse( region=="West", -1/2, NA ) ) ) )
} )
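It never hurts to verify a recode. A quick optional check - my own addition, not part of the original tutorial - is to cross-tabulate each new code against the original labels; every label should map to exactly one contrast value:

# Every row of each table should have counts in exactly one column.
with( dataset, table( party, within.group.contrast ) )
with( dataset, table( region, between.group.contrast.1 ) )
with( dataset, table( region, between.group.contrast.2 ) )
with( dataset, table( region, between.group.contrast.3 ) )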
Now you've got your contrast codes. You have one contrast that captures the 2-level within-group factor, party, and you have three contrasts that capture the 4-level between-group factor, region. You want to test for the main effects of both party and region and the interaction between party and region. So, how do you combine the single contrast code for party with the three contrast codes for region?
The answer is: the distributive property. For each factor, think of its contrast codes as a set. The set of contrasts that represent one factor are added to the model together. Contrast codes from the same factor are never multiplied together, but they can definitely be multiplied by the other terms in the model. So, the properly specified model for our data will be:
dependent.variable =
within.group.contrast
+ between.group.contrast.1
+ between.group.contrast.2
+ between.group.contrast.3
+ within.group.contrast * between.group.contrast.1
+ within.group.contrast * between.group.contrast.2
+ within.group.contrast * between.group.contrast.3
As you see, the within-group contrast is multiplied separately by each of the three between-group contrasts, and the between-group contrasts are never multiplied by each other.
In R, it is super easy to correctly specify the interaction effects for a set of contrast codes. R will distribute terms in a formula based on parentheses. Any variables that are added together within parentheses will be distributed across any terms that are multiplied with the set outside of the parentheses.
For example, there are three contrast codes that represent the between-group factor of state's region, between.group.contrast.1, between.group.contrast.2, and between.group.contrast.3. If you add these variables to the model in one parenthetical statement, then R will distribute the effects correctly. Thus, you can specify all the effects enumerated above in R with this formula:
dependent.variable ~ within.group.contrast * ( between.group.contrast.1 + between.group.contrast.2 + between.group.contrast.3 )
In general, using parentheses when specifying linear models in R makes many things more efficient.
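If you ever doubt how R will expand a parenthesized formula, you can list the terms it produces; this is a standard base-R trick rather than anything from the original tutorial:

# term.labels enumerates every fixed effect R will estimate from the formula.
f <- dependent.variable ~ within.group.contrast *
  ( between.group.contrast.1 + between.group.contrast.2 + between.group.contrast.3 )
attr( terms( f ), "term.labels" )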
Once you've coded the data, you are ready to conduct your primary analysis by specifying a linear model. I will call this model the “omnibus model.” For the legislator example, the omnibus model predicts pro-tobacco voting as a function of legislator's political party, state's region, and the interaction of party and region. To account for the multiple sources of variance that come from mixed designs, a random intercept and a random slope for the within-group factor (i.e., political party) are estimated for each state.
omnibus.model <- lme(
dependent.variable ~
within.group.contrast
* ( between.group.contrast.1 + between.group.contrast.2 + between.group.contrast.3 )
, random = ~1 + within.group.contrast | group.id, data = dataset, method = "ML" )
I used the method="ML" argument in anticipation of the model comparisons in the next step: Likelihood Ratio Tests that compare models with different fixed effects require full maximum likelihood estimation, not the restricted maximum likelihood (REML) that lme uses by default.
Truly, that's all you have to do. Your contrasts test specific a priori hypotheses, and the t-tests, df, and p-values associated with the slopes of each contrast code in the omnibus model are the significance tests for those planned contrasts. The sign of a contrast's slope indicates whether or not you were correct about the hypothesized direction of the contrast. You can see the results of your planned contrasts by printing a summary of your omnibus model:
summary( omnibus.model )
 | Value | Std.Error | DF | t-value | p-value
---|---|---|---|---|---
(Intercept) | 0.488 | 0.014 | 473 | 34.573 | 0.000
within.group.contrast | 0.533 | 0.019 | 473 | 27.890 | 0.000
between.group.contrast.1 | 0.041 | 0.056 | 46 | 0.718 | 0.476
between.group.contrast.2 | 0.235 | 0.076 | 46 | 3.098 | 0.003
between.group.contrast.3 | 0.029 | 0.040 | 46 | 0.722 | 0.474
within.group.contrast:between.group.contrast.1 | -0.270 | 0.076 | 473 | -3.526 | 0.000
within.group.contrast:between.group.contrast.2 | 0.163 | 0.101 | 473 | 1.610 | 0.108
within.group.contrast:between.group.contrast.3 | -0.067 | 0.055 | 473 | -1.207 | 0.228
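For reporting, you do not have to transcribe that table by hand; the coefficients live in the summary object, in a slot nlme calls tTable:

# Extract the fixed-effect table as a numeric matrix, rounded for reporting.
round( summary( omnibus.model )$tTable, 3 )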
I will interpret each of the contrasts in the model shown above after I test the overall main effects of each category and their interaction.
Many researchers would actually stop here and report this model without testing whether or not the overall effect of each category was significant. That is totally kosher. However, if you want to get all the statistics that an ANOVA would give you, then you are going to find yourself wanting an overall test for the main effects of and interaction between the categories. That is, we have three contrast codes that together represent the between-group factor, but while the model output displayed above tests specific relationships between levels of the between-group factor, it does not have a test for the between-group factor as a whole. To conduct a test for the overall effects of each factor, compare the omnibus model to a reduced model that omits a set of slopes from the model.
You will likely want to know whether your categorical predictor is significant as a whole in addition to the results of the planned contrasts. The significance of categories in ANOVA is tested by comparing the omnibus model to a reduced model that omits the terms you are targeting using a Likelihood Ratio Test. For example, if you want to know the significance of state region, then you would compare the omnibus model to a model that does not include the main effects of the contrast codes for region. If these two models are significantly different, that suggests the predictors that were omitted in the reduced model explained a significant portion of the variance in the dependent variable.
To find out the overall effect of the between-group factor, we'll compare the omnibus model to a model that does not include the three between-group contrasts as main effects. Note that the interaction terms involving the between-group contrasts are still included in the model. I am also demonstrating the use of the colon in R formula objects. Like an asterisk, a colon is used to multiply terms together, except that the colon does not automatically include all lower-order terms in the model. Thus, it is really useful for reduced model comparison. In the following model, I used it to specify the interaction terms for the within-group contrast and between-group contrasts while only including a main effect for the within-group contrast:
between.group.main.effect.model <- lme(
dependent.variable ~
within.group.contrast
+ within.group.contrast
: ( between.group.contrast.1 + between.group.contrast.2 + between.group.contrast.3 )
, random = ~1 + within.group.contrast | group.id, data = dataset, method = "ML" )
Once you have run the reduced model, you can compare it to the omnibus model using the anova function:
between.group.main.effect.test <- anova( omnibus.model, between.group.main.effect.model )
## Model df AIC BIC logLik Test L.Ratio
## omnibus.model 1 12 -276 -224 150
## between.group.main.effect.model 2 9 -246 -208 132 1 vs 2 35.6
## p-value
## omnibus.model
## between.group.main.effect.model <.0001
There was a main effect difference in the pro-tobacco voting patterns of legislators from different regions of the U.S., χ²(3) = 35.584, p < .001.
To find out the overall effect of the within-group factor, we'll compare the omnibus model to a model that does not include the within-subjects contrast. I'm also taking this opportunity to demonstrate the use of the minus sign in an R formula. In the following model, I first specified the full factorial model by multiplying the within-group contrast by the between-group contrasts using an asterisk. However, by then subtracting the within-group contrast in the same formula, I omit the main effect for the within-group contrast.
within.group.main.effect.model <- lme(
dependent.variable ~
within.group.contrast
* ( between.group.contrast.1 + between.group.contrast.2 + between.group.contrast.3 )
- within.group.contrast
, random = ~1 + within.group.contrast | group.id, data = dataset, method = "ML" )
Now, compare the omnibus model to the reduced model that has no main effect for the within-group contrast using the anova function:
within.group.main.effect.test <- anova( omnibus.model, within.group.main.effect.model )
## Model df AIC BIC logLik Test L.Ratio
## omnibus.model 1 12 -276 -224 149.8
## within.group.main.effect.model 2 11 -153 -106 87.6 1 vs 2 124
## p-value
## omnibus.model
## within.group.main.effect.model <.0001
There was a significant difference between the average pro-tobacco voting patterns of legislators from the different political parties, χ²(1) = 124.401, p < .001.
Finally, to test the cross-level interaction of legislator's political party and state's region on pro-tobacco voting, compare the omnibus model to a reduced model that has the main effects of all between-group and within-group contrasts, but none of the interaction terms.
interaction.effect.model <- lme(
dependent.variable ~
within.group.contrast
+ ( between.group.contrast.1 + between.group.contrast.2 + between.group.contrast.3 )
, random = ~1 + within.group.contrast | group.id, data = dataset, method = "ML" )
interaction.effect.test <- anova( omnibus.model, interaction.effect.model )
## Model df AIC BIC logLik Test L.Ratio p-value
## omnibus.model 1 12 -276 -224 150
## interaction.effect.model 2 9 -265 -227 142 1 vs 2 16.3 0.001
There was indeed a significant interaction between party and region, χ²(3) = 16.3, p = 0.001. As this is a cross-level interaction, you would interpret it as suggesting that the political party of a legislator is more or less predictive of their pro-tobacco voting patterns depending on the region of the U.S. that they represent.
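To see that pattern concretely, here is a short sketch of my own (not from the original tutorial) that reconstructs the party slope implied for each region from the omnibus model's fixed effects and the contrast codes; region.codes repeats the hypothetical matrix defined earlier:

# Implied party slope per region: the overall party slope plus the region's
# codes multiplied by the corresponding interaction coefficients.
b <- fixef( omnibus.model )
region.codes <- rbind(
  Northeast = c( -1/4, -1/2,    0 ),
  South     = c(  3/4,  1/2,    0 ),
  Midwest   = c( -1/4,    0,  1/2 ),
  West      = c( -1/4,    0, -1/2 ) )
b[ "within.group.contrast" ] +
  region.codes %*% b[ c( "within.group.contrast:between.group.contrast.1",
                         "within.group.contrast:between.group.contrast.2",
                         "within.group.contrast:between.group.contrast.3" ) ]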
As you likely noticed, the resulting statistic of a Likelihood Ratio Test is a χ² statistic. You can report this χ² for the significance test, and most reviewers and editors should be cool with it, because it is the truest form for reporting the test results. Also, you really should only convert χ² to F when you have balanced designs, but the bias of the estimate isn't horrible when used in unbalanced designs - it all depends on what you are trying to do with your analysis.
All that being said, if you really want to report F-values, then a χ² can be converted to an F-statistic by dividing it by its degrees of freedom. In a model comparison, the degrees of freedom for the χ² statistic are the number of predictors that were omitted from the reduced model. For example, we omitted three predictors from the omnibus model when we tested the main effect of region, so the degrees of freedom for the resulting model comparison were 3. The χ²'s degrees of freedom are used as the numerator degrees of freedom for the converted F-value, df.numerator, and the degrees of freedom for the contrasts in the omnibus model are the denominator degrees of freedom, df.denominator. Once you have these values, then you can calculate an F-statistic for the Likelihood Ratio Test by plugging the relevant numbers into the following equation:

F( df.numerator, df.denominator ) = χ² / df.numerator
Because the conversion is only perfect when there are an equal number of observations within each group, it's a good idea to report the p-value associated with the F-statistic and not the χ² statistic from the Likelihood Ratio Test, as these will differ slightly when used with unbalanced designs.
Here is some R code that calculates the F-statistic for the interaction effect of party and region, to help you get started automating this conversion:

df.numerator <- abs( diff( interaction.effect.test$df ) )  # number of omitted predictors
df.denominator <- summary( omnibus.model )$tTable[ "within.group.contrast:between.group.contrast.1", "DF" ]
chi.square <- interaction.effect.test$L.Ratio[2]
f.statistic <- chi.square / df.numerator
p.value <- pf( f.statistic, df.numerator, df.denominator, lower.tail = FALSE )
paste( "F(", df.numerator, ", ", df.denominator, ") = ", round( f.statistic, 3 ), ", p = ", round( p.value, 3 ), sep="" )
## [1] "F(3, 473) = 5.433, p = 0.001"
The point of the code above is just to help you automate the χ²-to-F conversion in some way. The take-home message: don't do it by hand! Automate whatever you can. A reusable version might look like the sketch below.
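This helper - a sketch of my own with the hypothetical name lrt.to.f, not code from the original post - wraps the same steps into a function:

# Convert an anova() model comparison into the F-statistic described above.
# "comparison" is the anova() result, "omnibus" is the full model, and
# "term" names a contrast whose denominator DF should be used.
lrt.to.f <- function( comparison, omnibus, term ) {
  df.numerator <- abs( diff( comparison$df ) )
  df.denominator <- summary( omnibus )$tTable[ term, "DF" ]
  f.statistic <- comparison$L.Ratio[2] / df.numerator
  p.value <- pf( f.statistic, df.numerator, df.denominator, lower.tail = FALSE )
  paste0( "F(", df.numerator, ", ", df.denominator, ") = ",
          round( f.statistic, 3 ), ", p = ", round( p.value, 3 ) )
}
lrt.to.f( interaction.effect.test, omnibus.model, "within.group.contrast:between.group.contrast.1" )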
The model comparisons suggested that there were systematic differences in pro-tobacco voting by political party and region. The contrast codes estimated in the omnibus model show exactly how voting differed across party and region. Looking at the effect of within.group.contrast in the table above, it seems that Republicans indeed voted pro-tobacco more than Democrats in 1999, b = 0.533, SE = 0.019, t(473) = 27.89, p < .001. As tested by between.group.contrast.2, it also seems that legislators from southern states were more likely to vote pro-tobacco than legislators from northeastern states, b = 0.235, SE = 0.076, t(46) = 3.098, p = 0.003, but between.group.contrast.1 tells us that legislators from the South were not more pro-tobacco than the other regions of the U.S. combined, b = 0.041, SE = 0.056, t(46) = 0.718, p = 0.476 - just specifically more pro-tobacco than legislators from the Northeast. However, the negative slope for the interaction between within.group.contrast and between.group.contrast.1 suggests that political party was significantly less predictive of pro-tobacco voting for legislators from the South, b = -0.27, SE = 0.076, t(473) = -3.526, p < .001. In other words, political party affiliation is a stronger predictor of pro-tobacco voting outside of the Southern U.S. For a little context, tobacco farming is an agricultural mainstay of the U.S. South far more than of the other regions. So, it's somewhat heartening to know that allegiances to a political party are less predictive of voting records when the constituents have something riding on the outcome of the legislation.
C'est fini! This model tested the main effect of the within-group factor, party, the main effect of the between-group factor, region, and the interaction between party and region. We obtained statistics testing these three overall effects and then interpreted the planned contrasts. Report the results in the same form as I did here, although choose either χ² or F (i.e., do not report both, because redundant statistics create an illusion of replicability!). Believe it or not, every time you run an ANOVA, this is what is happening behind the scenes. When you embrace the linear-model nature of ANOVA, it is easy - perhaps even easier than a canned ANOVA routine - to analyze ANOVAs in R.
Of course, this web page was generated in R Markdown from an *.Rmd file to ensure I did not make some silly copy-and-paste error. To reproduce this page, download the markdown file, index.Rmd, and the CSS stylesheet, r_tutorial.css, to your working directory. Then type:

library( knitr )
knit2html( "index.Rmd", stylesheet = "r_tutorial.css" )
[1] D. A. Luke. Multilevel Modeling. SAGE Publications, 2004. DOI: 10.4135/9781412985147.

[2] D. Bates, M. Maechler, B. Bolker and S. Walker. lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-9. 2015.

[3] D. A. Luke and M. Krauss. “Where there's smoke there's money”. In: American Journal of Preventive Medicine 27.5 (Dec. 2004), pp. 363-372. DOI: 10.1016/j.amepre.2004.08.014.

[4] J. Pinheiro, D. Bates, S. DebRoy, D. Sarkar and R Core Team. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-120. 2015.

[5] S. G. West, L. S. Aiken and J. L. Krull. “Experimental Personality Designs: Analyzing Categorical by Continuous Variable Interactions”. In: Journal of Personality 64.1 (1996), pp. 1-48.