Being able to simulate data allows you to prep analysis scripts for pre-registration, calculate power and sensitivity for analyses that don’t have empirical methods, create reproducible examples when your data are too big or confidential to share, enhance your understanding of statistical concepts, and create demo data for teaching and tutorials. In this talk, I will introduce the basics of simulation using the R package {faux}. We will focus on simulating data from a mixed design where trials are crossed with subjects, analysing this using {lme4}, understanding how the simulation parameters correspond to the output, and using simulation to calculate power.
Why Simulate Data?
Pre-Registration
Prep analysis scripts for pre-registration or registered reports
Power
Calculate power and sensitivity for analyses that don’t have empirical methods
Reproducible Examples
Create reproducible examples when your data are too big or confidential to share
Enhance Understanding
Enhance your understanding of statistical or other complex concepts
subj_n       <- 100           # number of subjects
item_n       <- 20            # number of items
gender_prob  <- c(20, 75, 5)  # gender category distribution
intercept    <- 100           # model intercept (mean for control condition)
cond_eff     <- 10            # condition effect size
error_sd     <- 20            # SD of trial-level error (residuals)
subj_sd      <- 8             # SD of subject-level intercepts
item_sd      <- 4             # SD of item-level intercepts
subj_cond_sd <- 5             # SD of subject-level condition effect size
item_cond_sd <- 15            # SD of item-level condition effect size
subj_cors    <- 0.5           # correlation between subject intercept and slope
item_cors    <- -0.5          # correlation between item intercept and slope
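As a rough illustration of how these parameters combine into a dependent variable, here is a minimal base R sketch. It is not the {faux} code used in the talk: the helper `sim_ranef()`, the column names (`u0s`, `u1s`, `u0i`, `u1i`, `sigma`) and the 0/1 coding of `cond` are assumptions made for this example. Gender is left out because the category labels are not given.

```r
set.seed(1)

# parameters (same values as the list above; gender_prob omitted)
subj_n       <- 100
item_n       <- 20
intercept    <- 100
cond_eff     <- 10
error_sd     <- 20
subj_sd      <- 8
item_sd      <- 4
subj_cond_sd <- 5
item_cond_sd <- 15
subj_cors    <- 0.5
item_cors    <- -0.5

# hypothetical helper: n pairs of random intercepts (u0) and slopes (u1)
# with the requested SDs and population correlation r
sim_ranef <- function(n, sd_int, sd_slope, r) {
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  data.frame(
    u0 = sd_int * z1,
    u1 = sd_slope * (r * z1 + sqrt(1 - r^2) * z2)
  )
}

subj <- data.frame(subj = seq_len(subj_n),
                   sim_ranef(subj_n, subj_sd, subj_cond_sd, subj_cors))
names(subj) <- c("subj", "u0s", "u1s")

item <- data.frame(item = seq_len(item_n),
                   sim_ranef(item_n, item_sd, item_cond_sd, item_cors))
names(item) <- c("item", "u0i", "u1i")

# fully crossed design: every subject sees every item in both conditions
# (assumed coding: cond = 0 for control, 1 for test)
lmem_dat <- expand.grid(subj = subj$subj, item = item$item, cond = 0:1)
lmem_dat <- merge(lmem_dat, subj, by = "subj")
lmem_dat <- merge(lmem_dat, item, by = "item")

# trial-level error, then the DV as the sum of fixed and random parts
lmem_dat$sigma <- rnorm(nrow(lmem_dat), 0, error_sd)
lmem_dat$dv <- with(
  lmem_dat,
  intercept + u0s + u0i + (cond_eff + u1s + u1i) * cond + sigma
)
```

With 100 subjects, 20 items and 2 conditions, this gives one row per subject-item-condition combination.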
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dv ~ cond + (1 + cond | subj) + (1 + cond | item)
   Data: lmem_dat

REML criterion at convergence: 35323.9

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.7821 -0.6676 -0.0080  0.6296  3.5944

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 subj     (Intercept)  51.79    7.196
          cond         13.58    3.685    0.48
 item     (Intercept)  15.14    3.891
          cond        225.11   15.004   -0.58
 Residual             368.84   19.205
Number of obs: 4000, groups:  subj, 100; item, 20

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)
(Intercept)  100.243      1.208  41.244   82.98  < 2e-16 ***
cond          10.872      3.429  19.442    3.17  0.00493 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
cond -0.426
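To see how the simulation parameters correspond to this output, it can help to line them up side by side. The sketch below copies the parameter values and the estimates printed above into a small table; the data frame and its column names are made up for this illustration.

```r
# simulation parameters next to the estimates printed in the model summary
comparison <- data.frame(
  parameter = c("intercept", "cond_eff",
                "subj_sd", "subj_cond_sd", "subj_cors",
                "item_sd", "item_cond_sd", "item_cors",
                "error_sd"),
  output_row = c("Fixed: (Intercept)", "Fixed: cond",
                 "subj (Intercept) Std.Dev.", "subj cond Std.Dev.", "subj Corr",
                 "item (Intercept) Std.Dev.", "item cond Std.Dev.", "item Corr",
                 "Residual Std.Dev."),
  simulated = c(100, 10, 8, 5, 0.5, 4, 15, -0.5, 20),
  estimated = c(100.243, 10.872, 7.196, 3.685, 0.48, 3.891, 15.004, -0.58, 19.205)
)

# how far each estimate landed from the value used to simulate the data
comparison$difference <- comparison$estimated - comparison$simulated

print(comparison)
```

The SD parameters map onto the Std.Dev. column (not Variance), and the estimates differ from the simulated values because this is a single random sample.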
Power simulation
Wrap dataset creation and analysis in a function that returns the values you care about as a data frame
Iterate this function and save the output as a data frame
Summarise the output (e.g., power, range of effect sizes)
Function Outline
sim_func <- function(iteration = 0, subj_n = 100, item_n = 20, cond_eff = 0) {
  # variables not in function arguments

  # simulate the data
  lmem_dat <- ...

  # run the analysis
  mod <- lmer(...)

  # return a table of fixed effects only
  broom.mixed::tidy(mod, effects = "fixed") |>
    mutate(iteration = iteration)
}
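The iterate and summarise steps might look something like the sketch below. Because the body of `sim_func()` is elided in the outline, this uses a hypothetical stand-in, `sim_func_demo()`, with the same arguments. To stay fast and dependency-free, it swaps the mixed model for a one-sample t-test on per-subject condition differences and ignores item-level random effects, so its power estimate is not comparable to the `lmer()` analysis. Only the iteration pattern is the point here.

```r
# hypothetical stand-in for sim_func(): same arguments, one row per iteration
sim_func_demo <- function(iteration = 0, subj_n = 100, item_n = 20, cond_eff = 0) {
  # variables not in function arguments (values from the parameter list)
  error_sd     <- 20
  subj_cond_sd <- 5

  # simulate each subject's mean test-minus-control difference across items
  subj_slope <- rnorm(subj_n, cond_eff, subj_cond_sd)
  noise_sd   <- sqrt(2 * error_sd^2 / item_n)
  diff       <- subj_slope + rnorm(subj_n, 0, noise_sd)

  # run the (stand-in) analysis: t-test instead of lmer()
  test <- t.test(diff)

  # return the values of interest as a one-row data frame
  data.frame(
    iteration = iteration,
    term      = "cond",
    estimate  = unname(test$estimate),
    p.value   = test$p.value
  )
}

set.seed(1)
n_sims <- 200

# iterate: lapply() passes each index as `iteration`; rbind stacks the rows
sims <- do.call(rbind, lapply(seq_len(n_sims), sim_func_demo, cond_eff = 10))

# summarise: power is the proportion of iterations with p < alpha
alpha <- 0.05
power_summary <- data.frame(
  n_sims  = nrow(sims),
  power   = mean(sims$p.value < alpha),
  est_min = min(sims$estimate),
  est_max = max(sims$estimate)
)

print(power_summary)
```

Swapping in the real `sim_func()` should only require changing the function name, plus filtering the fixed-effects table to the `cond` term before computing power.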