Appendix E — PsyArXiv
E.1 Data
PsyArXiv data from Day 27.
E.2 Ngrams
Here is a function that will extract any number of ngram.
Code
ngrams <- function(data, col, n = 2, sw = stop_words$word) {
  word_cols <- paste0("word_", 1:n)
  data %>%
    unnest_tokens(ngram, {{col}}, token = "ngrams", n = n) %>% 
    mutate(ngram_id = row_number()) %>%
    separate_rows(ngram, sep = " ") %>%
    filter(!ngram %in% sw) %>%
    group_by(ngram_id) %>%
    filter(n() == n) %>%
    mutate(word = word_cols) %>%
    ungroup() %>%
    pivot_wider(names_from = word, values_from = ngram) %>%
    count(across(all_of(word_cols)), sort = TRUE)
}Look at the top 10 trigrams from a random 1000 entries.
| word_1 | word_2 | word_3 | n | 
|---|---|---|---|
| covid | 19 | pandemic | 24 | 
| randomized | controlled | trial | 7 | 
| autism | spectrum | disorder | 4 | 
| cross | lagged | panel | 3 | 
| pilot | randomized | controlled | 3 | 
| affective | task | switching | 2 | 
| borderline | personality | disorder | 2 | 
| cochlear | implant | listeners | 2 | 
| cognitive | reflection | test | 2 | 
| covid | 19 | lockdown | 2 | 
Calculate the bigrams for all titles.
E.3 Plot
Use the code from Day 17 to make a bigram plot.
Code
bigrams %>%
  slice_max(order_by = n, n = 20) %>%
  ggraph(layout = "kk") +
  geom_edge_link(aes(width = n), color = "white", show.legend = FALSE) +
  geom_node_label(aes(label = name), vjust = 0.5, hjust = 0.5,
                  fill = "#CA1A31", color = "white", 
                  label.padding = unit(.5, "lines"),
                  label.r = unit(.75, "lines")) +
  coord_cartesian(clip = "off") +
  theme_void() +
  theme(plot.margin = unit(rep(.5, 4), "inches"),
        plot.background = element_rect(fill = "#012C4C"))
E.4 Add Image
Add the PsyArXiv logo using magick to crop the wikipedia logo and grid to rasterize it so it can be added as an annotation.
