Coding Schemes
library(tidyverse)
library(lmerTest)
How you choose to code categorical variables changes how you can interpret the intercept and effects of those variables. My favourite tutorial on coding schemes explains things in detail. I’m just adding some concrete examples below.
First, I simulated a data frame of 100 raters rating 100 faces each (50 female, 50 male). Female faces get ratings with a mean of 6; male faces get ratings with a mean of 5 (I know, ratings are usually ordinal integers, but let’s pretend we used something like a slider). To simulate random effects, both raters and faces have random intercepts with SDs of 1, and each rating also gets residual error with an SD of 1.
set.seed(555) # for reproducibility; delete when running simulations
n_raters <- 100
n_faces <- 100
female_mean <- 6
male_mean <- 5
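# one row per rater, with a random intercept (SD = 1)
raters <- tibble(
  rater_id = 1:n_raters,
  rater_i = rnorm(n_raters)
)
# one row per face, with a random intercept (SD = 1); first half female, second half male
faces <- tibble(
  face_id = 1:n_faces,
  face_i = rnorm(n_faces),
  face_sex = rep(c("female", "male"), each = n_faces/2)
)
# cross every face with every rater, then add up the components of each rating
df <- expand.grid(
  face_id = faces$face_id,
  rater_id = raters$rater_id
) %>%
  left_join(faces, by = "face_id") %>%
  left_join(raters, by = "rater_id") %>%
  mutate(
    face_sex_i = ifelse(face_sex=="male", male_mean, female_mean),
    error = rnorm(nrow(.)), # residual error (SD = 1)
    rating = face_i + rater_i + face_sex_i + error
  )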
Calculate the means and SDs of the female and male faces.
face_sex | mean | SD |
---|---|---|
female | 5.940 | 1.767 |
male | 5.114 | 1.649 |
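The summary table above can be produced with a grouped summary. This is a minimal sketch, assuming a data frame with `face_sex` and `rating` columns like the simulated one; the helper name `summarise_ratings` and the toy data are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: mean and SD of `rating` for each level of `face_sex`
summarise_ratings <- function(data) {
  data %>%
    group_by(face_sex) %>%
    summarise(mean = mean(rating), SD = sd(rating), .groups = "drop")
}

# Toy data for illustration only (not the simulated ratings)
toy <- data.frame(
  face_sex = rep(c("female", "male"), each = 3),
  rating = c(5, 6, 7, 4, 5, 6)
)
toy_summary <- summarise_ratings(toy)
```

Calling `summarise_ratings(df)` on the simulated data should give one row per face sex with its mean and SD.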
Always graph your data to confirm you simulated it correctly.
df %>%
  ggplot(aes(face_sex, rating)) +
  geom_violin() +
  geom_boxplot(width=0.2)
Recode face sex using treatment, sum, or effect coding.
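df2 <- df %>%
  mutate(
    # treatment: male is the reference (0), female is the "treatment" (1)
    face_sex.tr = recode(face_sex, "female" = 1, "male" = 0),
    # sum: codes are centred on 0 and 2 units apart
    face_sex.sum = recode(face_sex, "female" = -1, "male" = 1),
    # effect: codes are centred on 0 and 1 unit apart
    face_sex.e = recode(face_sex, "female" = -0.5, "male" = 0.5)
  )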
Now we analyse the data using each of the 4 styles of coding. I’m just going to show the table of fixed effects.
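One way to pull out just the fixed-effects table is to take the `coefficients` element of the model summary. This is a sketch rather than the code used to make the tables here: the helper name `fixed_effects_table` is hypothetical, and the toy example uses a plain `lm()` fit so that it runs without any mixed-model packages. As far as I know, the summary of an `lmerTest::lmer()` fit has the same element, with extra `df` and p-value columns.

```r
# Hypothetical helper: extract the coefficient table from a model summary
# and round it for display
fixed_effects_table <- function(model, digits = 3) {
  round(summary(model)$coefficients, digits)
}

# Toy example with a plain linear model (no random effects)
toy <- data.frame(
  x = c(0, 0, 0, 1, 1, 1),
  y = c(4, 5, 6, 5, 6, 7)
)
toy_table <- fixed_effects_table(lm(y ~ x, data = toy))
```

For the models in this post, the call would look like `fixed_effects_table(m1)`.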
Categorical coding
m1 <- lmerTest::lmer(rating ~ face_sex +
                       (1 | face_id) +
                       (1 + face_sex | rater_id),
                     data = df2)
Term | Estimate | Std. Error | df | t value | Pr(>\|t\|) |
---|---|---|---|---|---|
(Intercept) | 5.940 | 0.174 | 173.360 | 34.080 | < .001 |
face_sexmale | -0.826 | 0.203 | 98.586 | -4.069 | < .001 |
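Note that the intercept coefficient is equal to the female mean (5.94) and the effect of face sex is how much lower the male mean is (5.114 - 5.94 = -0.826). R treats the character variable as a factor and uses the first level alphabetically (female) as the reference category.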
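To see which numbers R assigns to a categorical predictor, you can inspect the factor's contrasts. This is a small base-R sketch, assuming R's default contrast option for unordered factors (`contr.treatment`) has not been changed; the variable names are made up for illustration.

```r
# Character values become factor levels in alphabetical order
face_sex <- factor(c("female", "male", "female", "male"))
levels(face_sex)  # "female" "male"

# Default treatment contrasts: the first level (female) is coded 0
default_codes <- contrasts(face_sex)

# One option for switching the reference level to male
face_sex_m <- relevel(face_sex, ref = "male")
releveled_codes <- contrasts(face_sex_m)
```

In `default_codes` the single column is named `male` (female = 0, male = 1), which is why the coefficient in the table is labelled `face_sexmale`.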
Treatment coding
m.tr <- lmerTest::lmer(rating ~ face_sex.tr +
                         (1 | face_id) +
                         (1 + face_sex.tr | rater_id),
                       data = df2)
Term | Estimate | Std. Error | df | t value | Pr(>\|t\|) |
---|---|---|---|---|---|
(Intercept) | 5.114 | 0.172 | 169.515 | 29.720 | < .001 |
face_sex.tr | 0.826 | 0.203 | 98.611 | 4.069 | < .001 |
Treatment coding is the same as categorical coding, but gives you more control over what the reference category is. Here, the reference category is male and the “treatment” category is female, so the intercept coefficient is equal to the male mean (5.114) and the effect of face sex is how much more the female mean is (5.94 - 5.114 = 0.826).
Sum coding
m.sum <- lmerTest::lmer(rating ~ face_sex.sum +
                          (1 | face_id) +
                          (1 + face_sex.sum | rater_id),
                        data = df2)
Term | Estimate | Std. Error | df | t value | Pr(>\|t\|) |
---|---|---|---|---|---|
(Intercept) | 5.527 | 0.140 | 194.675 | 39.387 | < .001 |
face_sex.sum | -0.413 | 0.102 | 98.601 | -4.069 | < .001 |
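With sum coding, the intercept coefficient is equal to the mean of the two face-sex means (i.e., (5.94 + 5.114)/2 = 5.527), which is also the overall mean here because there are equal numbers of female and male faces. The effect of face sex is how far each face sex's mean lies from that midpoint: male faces (coded 1) are 0.413 below it and female faces (coded -1) are 0.413 above it (i.e., (5.114 - 5.94)/2 = -0.413).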
Effect coding
m.e <- lmerTest::lmer(rating ~ face_sex.e +
                        (1 | face_id) +
                        (1 + face_sex.e | rater_id),
                      data = df2)
Term | Estimate | Std. Error | df | t value | Pr(>\|t\|) |
---|---|---|---|---|---|
(Intercept) | 5.527 | 0.140 | 194.683 | 39.387 | < .001 |
face_sex.e | -0.826 | 0.203 | 98.604 | -4.069 | < .001 |
With effect coding, the intercept coefficient is the same as with sum coding and the effect of face sex is the difference between the two face sexes, male minus female (i.e., 5.114 - 5.94 = -0.826). Note that this coefficient is double that from the sum coding, because the two codes are 1 unit apart rather than 2.
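The relationships between the four sets of coefficients can be checked on a tiny example. This is a sketch under simplifying assumptions: it uses made-up toy ratings and plain `lm()` fits with no random effects, so it only illustrates how the fixed effects map onto the group means, not the mixed models above.

```r
# Toy data: three made-up ratings per face sex
toy <- data.frame(
  face_sex = rep(c("female", "male"), each = 3),
  rating = c(5, 6, 7, 3, 4, 5)
)
# Same numeric codes as in the recoding step above
toy$face_sex.tr  <- ifelse(toy$face_sex == "female", 1, 0)
toy$face_sex.sum <- ifelse(toy$face_sex == "female", -1, 1)
toy$face_sex.e   <- ifelse(toy$face_sex == "female", -0.5, 0.5)

f_mean <- mean(toy$rating[toy$face_sex == "female"])
m_mean <- mean(toy$rating[toy$face_sex == "male"])
mid    <- (f_mean + m_mean) / 2

# Fixed-effects-only fits, one per coding scheme: c(intercept, slope)
b_cat <- unname(coef(lm(rating ~ face_sex, data = toy)))
b_tr  <- unname(coef(lm(rating ~ face_sex.tr, data = toy)))
b_sum <- unname(coef(lm(rating ~ face_sex.sum, data = toy)))
b_e   <- unname(coef(lm(rating ~ face_sex.e, data = toy)))

# Each pair of coefficients expressed in terms of the two group means
checks <- c(
  categorical = isTRUE(all.equal(b_cat, c(f_mean, m_mean - f_mean))),
  treatment   = isTRUE(all.equal(b_tr,  c(m_mean, f_mean - m_mean))),
  sum         = isTRUE(all.equal(b_sum, c(mid, (m_mean - f_mean) / 2))),
  effect      = isTRUE(all.equal(b_e,   c(mid, m_mean - f_mean)))
)
```

In every case the slope is the difference between the group means divided by the distance between the two codes, which is why the sum-coded effect is half the effect-coded one.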