nest() and irr::icc()
I’m going to use intra-class correlations to demonstrate how to run an analysis on subgroups of data (because I’m constantly forgetting exactly how to do it).
library(tidyverse)
library(irr)
Load the rating data for the open-source Face Research Lab London Set.
The data set contains 1-7 attractiveness ratings from 2513 raters for the 102 faces in the set (X001:X173).
london <- read_csv("https://ndownloader.figshare.com/files/8542045")
head(london)
## # A tibble: 6 x 105
## rater_sex rater_sexpref rater_age X001 X002 X003 X004 X005 X006 X007
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 female either 17 3 3 3 3 2 3 5
## 2 female either 17 5 2 3 2 1 5 6
## 3 female either 17.1 5 3 4 3 3 4 4
## 4 female either 17.1 4 6 5 5 3 4 5
## 5 female either 17.2 3 4 3 1 1 1 3
## 6 female either 17.3 6 5 5 3 7 5 6
## # … with 95 more variables: X008 <dbl>, X009 <dbl>, X010 <dbl>, X011 <dbl>,
## # X012 <dbl>, X013 <dbl>, X014 <dbl>, X016 <dbl>, X017 <dbl>, X018 <dbl>,
## # X019 <dbl>, X020 <dbl>, X021 <dbl>, X022 <dbl>, X024 <dbl>, X025 <dbl>,
## # X026 <dbl>, X027 <dbl>, X029 <dbl>, X030 <dbl>, X031 <dbl>, X032 <dbl>,
## # X033 <dbl>, X034 <dbl>, X036 <dbl>, X037 <dbl>, X038 <dbl>, X039 <dbl>,
## # X041 <dbl>, X042 <dbl>, X043 <dbl>, X044 <dbl>, X045 <dbl>, X061 <dbl>,
## # X062 <dbl>, X063 <dbl>, X064 <dbl>, X066 <dbl>, X067 <dbl>, X068 <dbl>,
## # X069 <dbl>, X070 <dbl>, X081 <dbl>, X082 <dbl>, X083 <dbl>, X086 <dbl>,
## # X087 <dbl>, X090 <dbl>, X091 <dbl>, X092 <dbl>, X094 <dbl>, X096 <dbl>,
## # X097 <dbl>, X099 <dbl>, X100 <dbl>, X101 <dbl>, X102 <dbl>, X103 <dbl>,
## # X104 <dbl>, X105 <dbl>, X107 <dbl>, X108 <dbl>, X112 <dbl>, X113 <dbl>,
## # X114 <dbl>, X115 <dbl>, X117 <dbl>, X118 <dbl>, X119 <dbl>, X120 <dbl>,
## # X121 <dbl>, X122 <dbl>, X123 <dbl>, X124 <dbl>, X125 <dbl>, X126 <dbl>,
## # X127 <dbl>, X128 <dbl>, X129 <dbl>, X130 <dbl>, X131 <dbl>, X132 <dbl>,
## # X134 <dbl>, X135 <dbl>, X136 <dbl>, X137 <dbl>, X138 <dbl>, X139 <dbl>,
## # X140 <dbl>, X141 <dbl>, X142 <dbl>, X143 <dbl>, X144 <dbl>, X172 <dbl>,
## # X173 <dbl>
To calculate the ICC for ratings, first we need to get the data into a format where each column represents a rater and each row represents a stimulus. Select just the columns with ratings, then transpose (t()) the data.
london %>%
  select(X001:X173) %>%
  t() %>%
  irr::icc()
## Single Score Intraclass Correlation
##
## Model: oneway
## Type : consistency
##
## Subjects = 102
## Raters = 2513
## ICC(1) = 0.24
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(101,256224) = 793 , p = 0
##
## 95%-Confidence Interval for ICC Population Values:
## 0.196 < ICC < 0.298
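To make the orientation concrete, here is a minimal sketch on a made-up table (three hypothetical raters and four hypothetical faces; the values and the names toy, F1 to F4 and icc1 are not from the London Set). It shows t() turning a raters-as-rows table into a stimuli-as-rows matrix, then computes a one-way, single-score ICC by hand from the between-stimulus and within-stimulus mean squares. That is my understanding of what the "oneway" / "Single Score" output above reports, so treat it as an illustration rather than a replacement for irr::icc().

```r
# Toy ratings: one row per rater, one column per face (made-up values)
toy <- data.frame(
  F1 = c(1, 2, 1),
  F2 = c(4, 5, 4),
  F3 = c(6, 7, 7),
  F4 = c(3, 3, 2)
)

# Transpose so rows are stimuli and columns are raters
ratings <- t(toy)
dim(ratings)  # 4 stimuli x 3 raters

n <- nrow(ratings)  # number of stimuli
k <- ncol(ratings)  # number of raters

# One-way ANOVA mean squares
stim_means <- rowMeans(ratings)
ms_between <- k * sum((stim_means - mean(ratings))^2) / (n - 1)
ms_within  <- sum((ratings - stim_means)^2) / (n * (k - 1))

# Single-score one-way ICC
icc1 <- (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
icc1
```

If the table were passed without transposing, the raters would be treated as the subjects, which answers a different question.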
But what if you want to do this for several subsets of the raters or stimuli? One solution is to run the code above once for each subset, adding a filter() step before the select().
london %>%
  filter(rater_sex == "male") %>%
  select(X001:X173) %>%
  t() %>%
  irr::icc()
## Single Score Intraclass Correlation
##
## Model: oneway
## Type : consistency
##
## Subjects = 102
## Raters = 955
## ICC(1) = 0.225
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(101,97308) = 279 , p = 0
##
## 95%-Confidence Interval for ICC Population Values:
## 0.183 < ICC < 0.281
london %>%
  filter(rater_sex == "female") %>%
  select(X001:X173) %>%
  t() %>%
  irr::icc()
## Single Score Intraclass Correlation
##
## Model: oneway
## Type : consistency
##
## Subjects = 102
## Raters = 1552
## ICC(1) = 0.253
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(101,158202) = 526 , p = 0
##
## 95%-Confidence Interval for ICC Population Values:
## 0.207 < ICC < 0.313
But what if you want to calculate ICCs for lots of subdivisions? It’s tedious and error-prone to do each one by hand, but you can group your data into the subdivisions, nest the ratings, and map them onto a function.
First, we have to write a function that takes the data and returns a table of the stats you’re interested in. The irr::icc() function returns a list, which won’t play well with nesting later, so we unlist() it, transpose it so it’s a row of values, not a column, turn it back into a tibble (transposing turns it into a matrix), and select just the columns you want.
my_icc <- function(data) {
  data %>%
    select(X001:X173) %>% # select just the rating columns
    t() %>%               # transpose so columns are raters and rows are stimuli
    irr::icc() %>%        # calculate the ICC
    unlist() %>%          # turn the output list into a vector
    t() %>%               # transpose this vector
    as_tibble() %>%       # turn the vector into a table
    select(               # select just the columns you want
      stimuli = subjects, # rename subjects to stimuli
      raters,
      icc = value,        # rename value to icc
      lbound,
      ubound
    ) %>%
    # fix column modes (unlisting turned them all into characters)
    mutate_at(vars(stimuli, raters), as.integer) %>%
    mutate_at(vars(icc:ubound), as.numeric)
}
Test the function on the whole dataset to check it gives you the right data.
my_icc(london)
## # A tibble: 1 x 5
## stimuli raters icc lbound ubound
## <int> <int> <dbl> <dbl> <dbl>
## 1 102 2513 0.240 0.196 0.298
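The type juggling inside my_icc() is easier to see on a tiny list. The sketch below uses a hypothetical list, res, that only loosely imitates the shape of the icc() output (it is not the real return value), and walks through the same unlist(), t(), as_tibble() steps to show why the columns come out as characters and need converting back.

```r
library(tibble)

# A hypothetical result list mixing text and numbers (not real icc() output)
res <- list(model = "oneway", subjects = 102L, value = 0.24)

# Mixed types are coerced to a single type: character
vec <- unlist(res)
class(vec)

# Transposing the named vector gives a 1 x 3 character matrix with column names
row <- t(vec)
dim(row)

# One-row tibble; every column is still <chr>
tbl <- as_tibble(row)

# Convert the numeric column back explicitly
tbl$value <- as.numeric(tbl$value)
tbl
```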
Then we can group our full dataframe. Here I’ve created a new column of age group and filtered out age/sex groups with fewer than 10 raters. After you group your data, use the nest() function to turn all the rest of the columns into a separate table for each group (stored in the column data). Then you can map these tables onto your my_icc function. Finally, unnest this new icc column to re-expand your table.
london_icc_grouped <- london %>%
  mutate(age_group = round(rater_age / 10)*10) %>% # round age to the nearest decade
  group_by(rater_sex, age_group) %>%  # group by rater sex and age group
  filter(n() >= 10) %>%               # remove groups with fewer than 10 raters
  nest() %>%                          # nest the rest of the columns
  mutate(icc = map(data, my_icc)) %>% # calculate ICC for each group
  unnest(icc) %>%                     # expand the tables returned to icc
  select(-data)                       # get rid of the data column
london_icc_grouped
## # A tibble: 10 x 7
## # Groups: rater_sex, age_group [10]
## rater_sex age_group stimuli raters icc lbound ubound
## <chr> <dbl> <int> <int> <dbl> <dbl> <dbl>
## 1 female 20 102 1035 0.253 0.207 0.313
## 2 female 30 102 317 0.257 0.211 0.319
## 3 female 40 102 123 0.264 0.216 0.327
## 4 female 50 102 54 0.255 0.206 0.319
## 5 female 60 102 20 0.271 0.215 0.342
## 6 male 20 102 478 0.211 0.171 0.265
## 7 male 30 102 253 0.252 0.206 0.312
## 8 male 40 102 119 0.217 0.175 0.274
## 9 male 50 102 74 0.267 0.218 0.332
## 10 male 60 102 27 0.245 0.194 0.311
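The same group, nest, map, unnest pattern works with any function that takes a table and returns a one-row tibble. Here is a minimal sketch on the built-in mtcars data, with a hypothetical my_summary() standing in for my_icc() so that it runs without downloading the ratings or loading irr.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for my_icc(): takes a table, returns a one-row tibble
my_summary <- function(data) {
  tibble(n = nrow(data), mean_mpg = mean(data$mpg))
}

mtcars_by_cyl <- mtcars %>%
  group_by(cyl) %>%                        # one group per cylinder count
  nest() %>%                               # remaining columns go into `data`
  mutate(stats = map(data, my_summary)) %>% # run the function on each nested table
  unnest(stats) %>%                        # expand the one-row results into columns
  select(-data) %>%                        # drop the nested tables
  ungroup()

mtcars_by_cyl
```

Because each call returns exactly one row, unnest() leaves one row per group, with the function's columns sitting next to the grouping columns.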