# Coding Schemes

```r
library(tidyverse)
library(lmerTest)
```

How you choose to code categorical variables changes how you can interpret the intercept and effects of those variables. My favourite tutorial on coding schemes explains things in detail. I’m just adding some concrete examples below.

First, I simulated a data frame of 100 raters rating 100 faces each. Female faces get ratings with a mean of 6; male faces get ratings with a mean of 5 (I know, ratings are usually ordinal integers, but let’s pretend we used something like a slider). To simulate random effects, both raters and faces have random intercepts with SDs of 1.

```r
set.seed(555)  # for reproducibility; delete when running simulations

n_raters <- 100
n_faces <- 100

female_mean <- 6
male_mean <- 5

# one row per rater, each with a random intercept (SD = 1)
raters <- tibble(
  rater_id = 1:n_raters,
  rater_i = rnorm(n_raters)
)

# one row per face, each with a random intercept (SD = 1); half female, half male
faces <- tibble(
  face_id = 1:n_faces,
  face_i = rnorm(n_faces),
  face_sex = rep(c("female", "male"), each = n_faces/2)
)

# cross every face with every rater, then build each rating from its components
df <- expand.grid(
  face_id = faces$face_id,
  rater_id = raters$rater_id
) %>%
  left_join(faces, by = "face_id") %>%
  left_join(raters, by = "rater_id") %>%
  mutate(
    face_sex_i = ifelse(face_sex == "male", male_mean, female_mean),
    error = rnorm(nrow(.)),
    rating = face_i + rater_i + face_sex_i + error
  )
```
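As a quick sanity check on this design (an added sketch, not part of the original simulation), the SD to expect within each face sex follows from the three independent components, each of which has an SD of 1 here:

```r
# SDs used in the simulation: rnorm() defaults to sd = 1 for each component
face_sd  <- 1  # random intercept for faces
rater_sd <- 1  # random intercept for raters
error_sd <- 1  # residual error

# Independent variances add, so the expected SD of ratings within one face sex is
expected_sd <- sqrt(face_sd^2 + rater_sd^2 + error_sd^2)
round(expected_sd, 3)  # 1.732
```

The observed SDs should land near this value, though they will wobble from sample to sample because each face sex has only 50 face intercepts.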

Calculate the means and SDs of the female and male faces.

| face_sex | mean  | SD    |
|:---------|------:|------:|
| female   | 5.940 | 1.767 |
| male     | 5.114 | 1.649 |
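The code for this summary is not shown in the post. A grouped summary along the following lines would likely produce it. The sketch uses a small made-up tibble called `toy` (a hypothetical stand-in so the example runs on its own); swapping in the simulated `df` should give the table above.

```r
library(dplyr)

# `toy` is a hypothetical stand-in for the simulated `df`
toy <- tibble(
  face_sex = rep(c("female", "male"), each = 3),
  rating   = c(5, 6, 7, 4, 5, 6)
)

# one row per face sex with the mean and SD of its ratings
sex_summary <- toy %>%
  group_by(face_sex) %>%
  summarise(
    mean = mean(rating),
    SD = sd(rating),
    .groups = "drop"
  )

sex_summary
```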

Always graph your data to confirm you simulated it correctly.

```r
df %>%
  ggplot(aes(face_sex, rating)) +
  geom_violin() +
  geom_boxplot(width = 0.2)
```

Recode face sex as numeric variables using treatment, sum, and effect coding.

```r
df2 <- df %>%
  mutate(
    face_sex.tr = recode(face_sex, "female" = 1, "male" = 0),
    face_sex.sum = recode(face_sex, "female" = -1, "male" = 1),
    face_sex.e = recode(face_sex, "female" = -0.5, "male" = 0.5)
  )
```

Now we analyse the data using each of the four coding schemes: R’s default categorical coding plus the three recoded variables. I’m just going to show the table of fixed effects.

## Categorical coding

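```r
m1 <- lmerTest::lmer(rating ~ face_sex +
                       (1 | face_id) +
                       (1 + face_sex | rater_id),
                     data = df2)
```

|              | Estimate | Std. Error | df      | t value | Pr(>\|t\|) |
|:-------------|---------:|-----------:|--------:|--------:|-----------:|
| (Intercept)  | 5.940    | 0.174      | 173.360 | 34.080  | 0          |
| face_sexmale | -0.826   | 0.203      | 98.586  | -4.069  | 0          |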

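Note that the intercept coefficient is equal to the female mean (5.94), because R makes the alphabetically first level (`female`) the reference category. The effect of face sex is how much lower the male mean is than the female mean (5.114 - 5.94 = -0.826).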

## Treatment coding

```r
m.tr <- lmerTest::lmer(rating ~ face_sex.tr +
                         (1 | face_id) +
                         (1 + face_sex.tr | rater_id),
                       data = df2)
```

|             | Estimate | Std. Error | df      | t value | Pr(>\|t\|) |
|:------------|---------:|-----------:|--------:|--------:|-----------:|
| (Intercept) | 5.114    | 0.172      | 169.515 | 29.720  | 0          |
| face_sex.tr | 0.826    | 0.203      | 98.611  | 4.069   | 0          |

Treatment coding is the same as categorical coding, but gives you more control over what the reference category is. Here, the reference category is `male` (coded 0) and the “treatment” category is `female` (coded 1), so the intercept coefficient is equal to the male mean (5.114) and the effect of face sex is how much higher the female mean is than the male mean (5.94 - 5.114 = 0.826).
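Another way to control the reference category, without creating a numeric variable, is to relevel the factor. The sketch below uses a small made-up data frame (`toy`, hypothetical) and a plain `lm()` with no random effects, purely to show how the intercept and the sign of the effect flip when the reference level changes.

```r
# Hypothetical toy data: three ratings per face sex
toy <- data.frame(
  face_sex = factor(rep(c("female", "male"), each = 3)),
  rating   = c(5, 6, 7, 4, 5, 6)
)

# Default treatment contrasts: "female" (first level alphabetically) is the reference
b_default <- coef(lm(rating ~ face_sex, data = toy))

# Make "male" the reference level instead
toy$face_sex_m <- relevel(toy$face_sex, ref = "male")
b_releveled <- coef(lm(rating ~ face_sex_m, data = toy))

b_default    # intercept = female mean; slope = male - female
b_releveled  # intercept = male mean; slope = female - male
```

With these toy values the female mean is 6 and the male mean is 5, so the first model's intercept is 6 with an effect of -1, and the releveled model's intercept is 5 with an effect of 1.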

## Sum coding

```r
m.sum <- lmerTest::lmer(rating ~ face_sex.sum +
                          (1 | face_id) +
                          (1 + face_sex.sum | rater_id),
                        data = df2)
```

|              | Estimate | Std. Error | df      | t value | Pr(>\|t\|) |
|:-------------|---------:|-----------:|--------:|--------:|-----------:|
| (Intercept)  | 5.527    | 0.140      | 194.675 | 39.387  | 0          |
| face_sex.sum | -0.413   | 0.102      | 98.601  | -4.069  | 0          |

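With sum coding, the intercept coefficient is equal to the overall mean ignoring face sex (i.e., (5.94 + 5.114)/2 = 5.527), and the effect of face sex is how far each face sex's mean lies above or below that overall mean. Because male is coded +1 and female -1, the coefficient is the male deviation from the overall mean (i.e., (5.114 - 5.94)/2 = -0.413); the female mean lies the same distance above it.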

## Effect coding

```r
m.e <- lmerTest::lmer(rating ~ face_sex.e +
                        (1 | face_id) +
                        (1 + face_sex.e | rater_id),
                      data = df2)
```

|             | Estimate | Std. Error | df      | t value | Pr(>\|t\|) |
|:------------|---------:|-----------:|--------:|--------:|-----------:|
| (Intercept) | 5.527    | 0.140      | 194.683 | 39.387  | 0          |
| face_sex.e  | -0.826   | 0.203      | 98.604  | -4.069  | 0          |

With effect coding, the intercept coefficient is the same as with sum coding, and the effect of face sex is how much the two face sexes differ from each other. Because male is coded +0.5 and female -0.5, the coefficient is the male mean minus the female mean (i.e., 5.114 - 5.94 = -0.826). Note that this coefficient is double that from the sum coding.
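The fixed effects from the three numeric codings can be recovered from the two group means alone. The sketch below is an illustration, not how `lmer()` estimates anything. It assumes a balanced design and ignores the random effects. For each scheme it solves `mean = intercept + slope * code` for both face sexes at once. The helper `solve_scheme()` is a made-up name.

```r
# Observed group means from the summary table
means <- c(female = 5.940, male = 5.114)

# Numeric codes used for each scheme in this post
codes <- list(
  treatment = c(female = 1,    male = 0),
  sum       = c(female = -1,   male = 1),
  effect    = c(female = -0.5, male = 0.5)
)

# Solve the 2 x 2 system  X %*% c(intercept, slope) = means  for one scheme
solve_scheme <- function(code) {
  X <- cbind(1, code)
  setNames(round(solve(X, means), 3), c("intercept", "slope"))
}

# One row per coding scheme
fixed <- t(sapply(codes, solve_scheme))
fixed
```

The rows should match the Estimate columns above: 5.114 and 0.826 for treatment coding, 5.527 and -0.413 for sum coding, and 5.527 and -0.826 for effect coding.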

##### Lisa DeBruine
###### Professor of Psychology
