Teaching Reproducible Research

https://debruine.github.io/talks/teach-repro-2025/

Lisa DeBruine

Abstract

In this talk, I will give an overview of how the School of Psychology and Neuroscience transformed our undergraduate and postgraduate curriculum to prioritise data skills and reproducibility using open source software and teaching resources. While our ethos is agnostic to specific tools and coding languages, I will give specific examples of how we use R, RStudio, and RMarkdown/Quarto to instill transferable skills and good research practice supporting computational reproducibility and literate coding.

Why Code?

Values and Priorities

UofG Values

  • ⭐️ Ambition & Excellence
  • 🔎 Curiosity & Discovery
  • 🧭 Integrity & Truth
  • 🤝 Inclusive Community

“We value the quality of our research over its quantity”

“How research is done is as important as what is done”

Research Culture Priorities

  • Research Recognition
  • Collegiality
  • Research Integrity
  • Open Research
  • Career Development

Open Research

Sketch of a 'map' to open research. There is a rainbow over the Glasgow Uni Building. Each stripe is labelled: Open research has many benefits, Visibility, Transparency & reproducibility, Collaboration, Efficient use of funds, Credit for ideas, Public confidence.

UofG Open Research Video

Open Materials/Data/Code

Ideally, give open materials a permanent reference, like a DOI.

Reproducibility

Reproducible: same data, same analysis; Replicable: different data, same analysis; Robust: same data, different analysis; Generalisable: different data, different analysis

via the Turing Way
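
The four terms can be illustrated with a small simulation. This is only a sketch: the simulated data, the seeds, and the choice of a t-test and a Wilcoxon test as the "same" and "different" analyses are assumptions for illustration and do not come from the Turing Way.

```{r}
# Hypothetical data: two groups of 50 with a true mean difference of 5
simulate_data <- function(seed) {
  set.seed(seed)
  data.frame(
    group = rep(c("A", "B"), each = 50),
    score = c(rnorm(50, mean = 100, sd = 10),
              rnorm(50, mean = 105, sd = 10))
  )
}

# Two alternative analyses of the same question
analysis_1 <- function(dat) t.test(score ~ group, data = dat)$p.value
analysis_2 <- function(dat) wilcox.test(score ~ group, data = dat)$p.value

original <- simulate_data(seed = 1)  # stands in for the original data set
new_data <- simulate_data(seed = 2)  # stands in for a new sample

reproduced  <- analysis_1(original)  # same data, same analysis
replicated  <- analysis_1(new_data)  # different data, same analysis
robust      <- analysis_2(original)  # same data, different analysis
generalised <- analysis_2(new_data)  # different data, different analysis

# Reproducibility check: regenerating the same data and re-running
# the same analysis should give an identical result
identical(reproduced, analysis_1(simulate_data(seed = 1)))
```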

Error Detection

An analysis by Nuijten et al. (2016) of over 250K p-values reported in 8 major psych journals from 1985 to 2013 found that:

  • half the papers had at least one inconsistent p-value
  • 1/8 of papers had errors that could affect conclusions
  • errors more likely to be erroneously significant than not
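
A consistency check of this kind recomputes the p-value from the reported test statistic and degrees of freedom and compares it with the reported p-value. The sketch below is not the procedure used by Nuijten et al.; the function name `check_t_test`, the comparison at three decimal places, and the example values are all hypothetical.

```{r}
# Recompute a two-tailed p-value from a reported t statistic and df,
# then compare it with the reported p-value at the reported precision.
check_t_test <- function(t, df, reported_p, digits = 3) {
  computed_p <- 2 * pt(abs(t), df = df, lower.tail = FALSE)
  fmt <- paste0("%.", digits, "f")
  data.frame(
    t = t,
    df = df,
    reported_p = reported_p,
    computed_p = computed_p,
    # compare as formatted strings to avoid floating-point surprises
    consistent = sprintf(fmt, computed_p) == sprintf(fmt, reported_p)
  )
}

# Made-up reported results: t(28) = 2.048, p = .050 and t(28) = 1.70, p = .040
checks <- check_t_test(t = c(2.048, 1.70), df = c(28, 28),
                       reported_p = c(.050, .040))
checks
```

In the second made-up result the recomputed p-value is about .10, so a reported p = .040 would be flagged as an inconsistency that changes the conclusion at the conventional .05 threshold.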

Analysis Reproducibility

Of 35 articles published in Cognition with usable data (but no code), Hardwicke et al. (2018) found:

  • only 11 could be reproduced independently
  • 11 were reproducible with the original authors’ help
  • 13 were not reproducible even by the original authors

Code Reproducibility

Of 62 Registered Reports in psychology published from 2014–2018, 36 had data and analysis code, 31 could be run, and 21 reproduced all the main results (Obels et al., 2020).

Flowchart of sample: starting from a sampling frame of 188 papers, to 79 in the psychology domain, to 62 in the final data set, to 36 with data and code available, to 31 with runnable scripts, to 21 with reproducible results

Key Practices

Literate Coding

An approach to programming that focuses on the creation of a document containing a mix of human-readable narrative text and machine-readable computer code.

Quarto Logo

RStudio Logo

Jupyter Logo

Markdown

### Basic Markdown

Now I can make:

* headers
* paragraphs
* lists
* styled text:
    * *italics*, **bold**
    * ^superscript^, ~subscript~ 
    * ~~strikethrough~~
    * `verbatim code`
* [links](https://psyteachr.github.io)

Basic Markdown

Now I can make:

  • headers
  • paragraphs
  • lists
  • styled text:
    • italics, bold
    • superscript, subscript
    • strikethrough
    • verbatim code
  • links

Code Chunks

You can run and/or display code and/or its output.

```{r}
norm <- rnorm(1e5, mean = 100, sd = 10)
p <- pnorm(norm, mean = 100, sd = 10)
plot(norm, p)
```
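
Whether the code, its output, or both appear is controlled by chunk options. As a sketch using Quarto's `echo` and `eval` execution options, the chunk below runs the code and shows the plot but hides the code itself. The `set.seed()` call and its value are an addition for illustration, not part of the original slide.

```{r}
#| echo: false
#| eval: true
set.seed(8675309)  # arbitrary seed so the simulated values match on every render
norm <- rnorm(1e5, mean = 100, sd = 10)
p <- pnorm(norm, mean = 100, sd = 10)
plot(norm, p)
```

Setting `echo: true` with `eval: false` does the opposite: the code is displayed but not run.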

Figures

Create figures in markdown or code and reference by their label (e.g., @fig-penguins) to automatically number and link in text (Figure 1).

```{r}
#| label: fig-penguins
#| fig-cap: Penguin bill length by body mass
library(ggplot2)
ggplot(penguins, aes(x = body_mass, y = bill_len, colour = species)) +
  geom_point() +
  labs(x = "Body Mass (g)", y = "Bill Length (mm)")
```

Figure 1: Penguin bill length by body mass

Tables

Create tables in markdown or code and link them like figures (e.g., @tbl-mean for Table 1).

```{r}
#| label: tbl-mean
#| tbl-cap: Mean body measurements by species.
library(dplyr)
summarise(penguins,
          across(bill_len:body_mass, \(x) mean(x, na.rm = TRUE)),
          .by = species) |>
  mutate(across(-species, \(x) signif(x, 3)))
```
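Table 1: Mean body measurements by species.

| species   | bill_len | bill_dep | flipper_len | body_mass |
|-----------|----------|----------|-------------|-----------|
| Adelie    | 38.8     | 18.3     | 190         | 3700      |
| Gentoo    | 47.5     | 15.0     | 217         | 5080      |
| Chinstrap | 48.8     | 18.4     | 196         | 3730      |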

Flowcharts

See the Quarto guide for more diagramming tools.

```{mermaid}
%%| label: fig-mermaid
%%| fig-cap: A sample mermaid diagram
flowchart LR
  A[Bird] --> B(Penguin)
  B --> C{Locomotion}
  C --> D[Swim]
  C --> E[Waddle]
```
Figure 2: A sample mermaid diagram

Code Annotation

Add # <1> after lines of code and a list of annotations after the code block. Set code-annotations in the YAML header to below (default), hover or select (better for touchscreens).

```{r}
penguin_measures <- penguins |>          # <1>
  mutate(                                # <2>
    bill_ratio = bill_dep / bill_len,    # <3>
    bill_area  = bill_dep * bill_len     # <3>
  )                                      # <4>
```
1. Create a new data frame called penguin_measures from penguins, and then,
2. Start the mutate() function to add new columns.
3. Create new names for the new columns (bill_ratio and bill_area), and set them equal to functions of the existing columns bill_dep and bill_len.
4. Make sure to close the mutate() function.

Document Types

You can create any pandoc format with Quarto (see the full list).

My favourites are reports, websites, books, and presentations (this talk was written in Quarto!).

Code Review

The process of methodically and systematically checking over code, your own or someone else’s, after it has been written.

  • Is the code legible and clear?
  • Is the analysis reproducible?
  • Are other outputs reproducible?
  • Does the code do what was intended?
  • Does the code follow best practices?
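
For "Does the code do what was intended?", one option is to state the intent as explicit checks that a reviewer can re-run. The sketch below uses a small made-up data frame; the column names bill_len and bill_dep follow the penguin examples, but the values and the particular checks are hypothetical.

```{r}
# Made-up measurements standing in for a real data set
measures <- data.frame(
  bill_len = c(39.1, 46.5, 49.0),
  bill_dep = c(18.7, 14.3, 19.5)
)

# The step under review: add a ratio column
measures$bill_ratio <- measures$bill_dep / measures$bill_len

# Checks that encode what the step was meant to do;
# stopifnot() stops with an error if any of them fails
stopifnot(
  !anyNA(measures$bill_ratio),           # no missing values introduced
  all(measures$bill_ratio > 0),          # ratios of positive lengths are positive
  isTRUE(all.equal(measures$bill_ratio * measures$bill_len,
                   measures$bill_dep))   # the ratio inverts back to the depth
)
```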

Code Review Slides/Video

How to Get Started

Embedding Data Skills in Education

Promoting reproducibility and open science requires not only teaching relevant values and practices, but also providing the skills needed for reproducible data analysis. Improving students’ data skills will also enhance their employability within and beyond the academic context.

McAleer, P., Stack, N., Woods, H., DeBruine, L. M., Paterson, H. M., Nordmann, E., … Barr, D. J. (2022). Embedding Data Skills in Research Methods Education: Preparing Students for Reproducible Research. https://doi.org/10.31234/osf.io/hq68s

Challenges

  • Upskilling Staff / Support Systems
  • Curriculum Reform
    • More than just swapping out SPSS for R
    • One-off classes versus embedding data skills
  • Choosing Software
    • R, Python, Julia or MATLAB
    • IDEs, GitHub/GitLab
    • Quarto or RMarkdown/Jupyter

Resources

Thank You!
