nest() and irr::icc()

I’m going to use intra-class correlations to demonstrate how to run an analysis on subgroups of data (because I’m constantly forgetting exactly how to do it).

library(tidyverse)
library(irr)

Load the rating data for the open-source Face Research Lab London Set. The data set contains 1-7 attractiveness ratings from 2513 raters for the 102 faces in the set (columns X001 to X173; the face IDs are not consecutive, which is why that range holds only 102 columns).

london <- read_csv("https://ndownloader.figshare.com/files/8542045")

head(london)
## # A tibble: 6 x 105
##   rater_sex rater_sexpref rater_age  X001  X002  X003  X004  X005  X006  X007
##   <chr>     <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 female    either             17       3     3     3     3     2     3     5
## 2 female    either             17       5     2     3     2     1     5     6
## 3 female    either             17.1     5     3     4     3     3     4     4
## 4 female    either             17.1     4     6     5     5     3     4     5
## 5 female    either             17.2     3     4     3     1     1     1     3
## 6 female    either             17.3     6     5     5     3     7     5     6
## # … with 95 more variables: X008 <dbl>, X009 <dbl>, X010 <dbl>, X011 <dbl>,
## #   X012 <dbl>, X013 <dbl>, X014 <dbl>, X016 <dbl>, X017 <dbl>, X018 <dbl>,
## #   X019 <dbl>, X020 <dbl>, X021 <dbl>, X022 <dbl>, X024 <dbl>, X025 <dbl>,
## #   X026 <dbl>, X027 <dbl>, X029 <dbl>, X030 <dbl>, X031 <dbl>, X032 <dbl>,
## #   X033 <dbl>, X034 <dbl>, X036 <dbl>, X037 <dbl>, X038 <dbl>, X039 <dbl>,
## #   X041 <dbl>, X042 <dbl>, X043 <dbl>, X044 <dbl>, X045 <dbl>, X061 <dbl>,
## #   X062 <dbl>, X063 <dbl>, X064 <dbl>, X066 <dbl>, X067 <dbl>, X068 <dbl>,
## #   X069 <dbl>, X070 <dbl>, X081 <dbl>, X082 <dbl>, X083 <dbl>, X086 <dbl>,
## #   X087 <dbl>, X090 <dbl>, X091 <dbl>, X092 <dbl>, X094 <dbl>, X096 <dbl>,
## #   X097 <dbl>, X099 <dbl>, X100 <dbl>, X101 <dbl>, X102 <dbl>, X103 <dbl>,
## #   X104 <dbl>, X105 <dbl>, X107 <dbl>, X108 <dbl>, X112 <dbl>, X113 <dbl>,
## #   X114 <dbl>, X115 <dbl>, X117 <dbl>, X118 <dbl>, X119 <dbl>, X120 <dbl>,
## #   X121 <dbl>, X122 <dbl>, X123 <dbl>, X124 <dbl>, X125 <dbl>, X126 <dbl>,
## #   X127 <dbl>, X128 <dbl>, X129 <dbl>, X130 <dbl>, X131 <dbl>, X132 <dbl>,
## #   X134 <dbl>, X135 <dbl>, X136 <dbl>, X137 <dbl>, X138 <dbl>, X139 <dbl>,
## #   X140 <dbl>, X141 <dbl>, X142 <dbl>, X143 <dbl>, X144 <dbl>, X172 <dbl>,
## #   X173 <dbl>
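
Since the face IDs skip numbers, it's worth remembering that a range like X001:X173 inside select() is positional: it takes every column from the first name to the last, whatever they are called. Here is a minimal sketch on a made-up table (the `toy` tibble and its values are invented for illustration, not part of the London Set):

```r
library(dplyr)

# Hypothetical toy data: two raters, three faces with non-consecutive IDs
toy <- tibble(
  rater_sex = c("female", "male"),
  X001 = c(3, 5),
  X003 = c(2, 4),
  X010 = c(6, 1)
)

# The range selects all columns positioned from X001 through X010
ratings <- toy %>% select(X001:X010)

names(ratings)
# "X001" "X003" "X010"
```

So the range only works as long as the rating columns sit next to each other in the table.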

To calculate the ICC for ratings, first we need to get the data into a format where each column represents a rater and each row represents a stimulus. Select just the columns with ratings, then transpose (t()) the data.

london %>%
  select(X001:X173) %>%
  t() %>%
  irr::icc()
##  Single Score Intraclass Correlation
## 
##    Model: oneway 
##    Type : consistency 
## 
##    Subjects = 102 
##      Raters = 2513 
##      ICC(1) = 0.24
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
## F(101,256224) = 793 , p = 0 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.196 < ICC < 0.298
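
If you want to convince yourself that the transpose is pointing the right way, a small made-up matrix helps. This is only a sketch with random ratings (the `toy` matrix is hypothetical, so the ICC value itself is meaningless); the point is that the Subjects and Raters counts in the result tell you which dimension was treated as which.

```r
library(irr)

set.seed(1)

# Hypothetical toy data: 4 raters (rows) rating 6 stimuli (columns),
# the same one-row-per-rater layout as `london`
toy <- matrix(
  sample(1:7, 24, replace = TRUE),
  nrow = 4,
  dimnames = list(paste0("rater", 1:4), paste0("X", 1:6))
)

# Transpose so rows are stimuli and columns are raters
res <- irr::icc(t(toy))

res$subjects  # 6 stimuli
res$raters    # 4 raters
```

If you forget the t(), the two counts swap, which is an easy thing to check in the printed output.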

But what if you want to do this for several subsets of the raters or stimuli? One solution is to run the code above several times, once for each subset, adding a filter() step to pick out the raters you want.

london %>%
  filter(rater_sex == "male") %>%
  select(X001:X173) %>%
  t() %>%
  irr::icc()
##  Single Score Intraclass Correlation
## 
##    Model: oneway 
##    Type : consistency 
## 
##    Subjects = 102 
##      Raters = 955 
##      ICC(1) = 0.225
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
## F(101,97308) = 279 , p = 0 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.183 < ICC < 0.281

london %>%
  filter(rater_sex == "female") %>%
  select(X001:X173) %>%
  t() %>%
  irr::icc()
##  Single Score Intraclass Correlation
## 
##    Model: oneway 
##    Type : consistency 
## 
##    Subjects = 102 
##      Raters = 1552 
##      ICC(1) = 0.253
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
## F(101,158202) = 526 , p = 0 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.207 < ICC < 0.313

But what if you want to calculate ICCs for lots of subdivisions? It’s tedious and error-prone to do each one by hand, but you can group your data into the subdivisions, nest the ratings, and map them onto a function.

First, we have to write a function that takes the data and returns a table of the stats we're interested in. The irr::icc() function returns a list, which won't play well with nesting later, so we unlist() it into a named vector, transpose it so it's a row of values rather than a column, turn it back into a tibble (transposing turns it into a matrix), and select just the columns we want.

my_icc <- function(data) {
  data %>%
    select(X001:X173) %>% # select just the rating columns
    t() %>%               # transpose so columns are raters and rows are stimuli
    irr::icc() %>%        # calculate the ICC
    unlist() %>%          # turn the output list into a vector
    t() %>%               # transpose this vector
    as_tibble() %>%       # turn the vector into a table 
    select(               # select just the columns you want
      stimuli = subjects, # rename subjects to stimuli     
      raters, 
      icc = value,        # rename value to icc
      lbound, 
      ubound
    ) %>%
    # fix column types (unlisting turned them all into characters)
    mutate(across(c(stimuli, raters), as.integer)) %>% 
    mutate(across(icc:ubound, as.numeric))
}
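
The reason the columns need fixing at the end is that unlist() has to squeeze numbers and strings into one vector, so everything is coerced to character. Here is a minimal sketch using a hand-made list as a stand-in for the irr::icc() output (`fake_icc` is hypothetical and holds only a few of the real elements):

```r
library(dplyr)

# Hypothetical stand-in for the list returned by irr::icc()
fake_icc <- list(subjects = 102L, raters = 2513L, model = "oneway", value = 0.24)

# Mixed types are coerced to a single character vector
flat <- unlist(fake_icc)
class(flat)
# "character"

fixed <- flat %>%
  t() %>%          # 1-row matrix; the vector names become column names
  as_tibble() %>%  # every column is still <chr> at this point
  select(stimuli = subjects, raters, icc = value) %>%
  mutate(
    across(c(stimuli, raters), as.integer),
    across(icc, as.numeric)
  )
```

After the last step, `fixed` has integer counts and a numeric icc again, which is what the last two lines of my_icc() do for the real output.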

Test the function on the whole dataset to check it gives you the right data.

my_icc(london)
## # A tibble: 1 x 5
##   stimuli raters   icc lbound ubound
##     <int>  <int> <dbl>  <dbl>  <dbl>
## 1     102   2513 0.240  0.196  0.298

Then we can group our full dataframe. Here I’ve created a new age group column by rounding each rater’s age to the nearest decade, and filtered out age/sex groups with fewer than 10 raters. After you group your data, use the nest() function to turn all the rest of the columns into a separate table for each group (stored in the column data). Then you can map these tables onto your my_icc function. Finally, unnest this new icc column to re-expand your table.

london_icc_grouped <- london %>%
  mutate(age_group = round(rater_age / 10)*10) %>% # round age to nearest decade
  group_by(rater_sex, age_group) %>%               # group by rater age and sex
  filter(n() >= 10) %>%                            # remove groups smaller than 10
  nest() %>%                                       # nest the rest of the columns
  mutate(icc = map(data, my_icc)) %>%              # calculate ICC for each group
  unnest(icc) %>%                                  # expand the tables returned to icc
  select(-data)                                    # get rid of the data column
  
london_icc_grouped
## # A tibble: 10 x 7
## # Groups:   rater_sex, age_group [10]
##    rater_sex age_group stimuli raters   icc lbound ubound
##    <chr>         <dbl>   <int>  <int> <dbl>  <dbl>  <dbl>
##  1 female           20     102   1035 0.253  0.207  0.313
##  2 female           30     102    317 0.257  0.211  0.319
##  3 female           40     102    123 0.264  0.216  0.327
##  4 female           50     102     54 0.255  0.206  0.319
##  5 female           60     102     20 0.271  0.215  0.342
##  6 male             20     102    478 0.211  0.171  0.265
##  7 male             30     102    253 0.252  0.206  0.312
##  8 male             40     102    119 0.217  0.175  0.274
##  9 male             50     102     74 0.267  0.218  0.332
## 10 male             60     102     27 0.245  0.194  0.311
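
The same group, nest, map, unnest pattern works with any function that returns a one-row table. As a sketch that runs without downloading anything, here it is on the built-in mtcars data, with a made-up summary function (`my_summary` is hypothetical and just stands in for my_icc):

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical summary function: takes one group's data, returns a one-row tibble
my_summary <- function(data) {
  tibble(n = nrow(data), mean_mpg = mean(data$mpg))
}

mtcars_grouped <- mtcars %>%
  group_by(cyl) %>%                      # one group per number of cylinders
  nest() %>%                             # remaining columns go into `data`
  mutate(stats = map(data, my_summary)) %>% # one-row tibble per group
  unnest(stats) %>%                      # expand the tibbles into columns
  select(-data)                          # drop the nested data

mtcars_grouped
```

The result has one row per value of cyl, with the columns n and mean_mpg next to the grouping column.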

Lisa DeBruine
Professor of Psychology

Lisa DeBruine is a professor of psychology at the University of Glasgow. Her substantive research is on the social perception of faces and kinship. Her meta-science interests include team science (especially the Psychological Science Accelerator), open documentation, data simulation, web-based tools for data collection and stimulus generation, and teaching computational reproducibility.
