It is useful to be able to simulate data with a specified structure. The faux package provides some functions to make this process easier. See the vignettes for more details.


You can install the released version of faux from CRAN with:

install.packages("faux")

And the development version from GitHub with:

# install.packages("devtools")

Quick overview

Simulate data for a factorial design

See the Simulate by Design vignette for more details.

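library(faux)

between <- list(pet = c(cat = "Cat Owners", 
                        dog = "Dog Owners"))
# mu has four rows, so time needs four levels;
# the level names after "morning" are assumed here
within <- list(time = c("morning", 
                        "noon", 
                        "evening", 
                        "night"))
mu <- data.frame(
  cat    = c(10, 12, 14, 16),
  dog    = c(10, 15, 20, 25),
  row.names = within$time
)
df <- sim_design(within, between, 
                 n = 100, mu = mu, 
                 sd = 5, r = .5)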
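
As a rough check that a simulated data set has the requested structure, the sketch below builds a smaller, hypothetical 2 x 2 design (the factor names group and time, their levels, and the seed are illustrative assumptions, not part of the example above) and compares the observed cell means and within-subject correlations against the values passed to sim_design.

```r
library(faux)

set.seed(8675309)  # arbitrary seed so the sketch is reproducible

# hypothetical design: columns of mu are between-subject levels,
# rows are within-subject levels (same layout as the example above)
mu_small <- data.frame(
  A = c(10, 12),
  B = c(10, 15),
  row.names = c("pre", "post")
)

dat <- sim_design(
  within  = list(time = c("pre", "post")),
  between = list(group = c("A", "B")),
  n = 50, mu = mu_small, sd = 5, r = 0.5,
  plot = FALSE  # skip the design plot for this check
)

# observed cell means; these should land near the values in mu_small
aggregate(cbind(pre, post) ~ group, data = dat, FUN = mean)

# observed pre/post correlation within each group; should land near 0.5
sapply(split(dat, dat$group), function(d) cor(d$pre, d$post))
```

Because the data are sampled, the observed values will differ somewhat from the requested ones, and the gap should shrink as n grows.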

Default design plot

p1 <- plot_design(df)
p2 <- plot_design(df, "pet", "time")

cowplot::plot_grid(p1, p2, nrow = 2, align = "v")

Plot the data with different visualisations.

Simulate new data from an existing data table

See the Simulate from Existing Data vignette for more details.

new_iris <- sim_df(iris, 50, between = "Species") 

Simulated iris dataset
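
One way to see how closely the simulated table follows the original is to compare group means side by side. This is a minimal sketch using base R's aggregate; the choice of Sepal.Length as the column to compare and the object names are illustrative only.

```r
library(faux)

# simulate 50 new observations per species from the structure of iris
new_iris <- sim_df(iris, 50, between = "Species")

# mean sepal length per species in the original and the simulated data
orig_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
sim_means  <- aggregate(Sepal.Length ~ Species, data = new_iris, FUN = mean)

# put the two sets of means next to each other
merge(orig_means, sim_means, by = "Species",
      suffixes = c(".original", ".simulated"))
```

The simulated means are drawn from a sample, so they should be close to the originals but not identical.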

Simulate data for a mixed design

You can build up a cross-classified or nested mixed effects design using piped functions. See the contrasts vignette for more details.

# simulate 20 classes with 20 to 30 students per class
data <- add_random(class = 20) %>%
  add_random(student = sample(20:30, 20, replace = TRUE), 
             .nested_in = "class") %>%
  add_between(.by = "class", 
              school_type = c("private","public"), 
              .prob = c(5, 15)) %>%
  add_between(.by = "student",
              gender = c("M", "F", "NB"),
              .prob = c(.49, .49, .02))

school_type  gender    n
private      M        66
private      F        58
private      NB        3
public       M       173
public       F       175
public       NB        7
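
To make the nesting easier to inspect, here is a sketch of a much smaller, hypothetical design (4 classes of 5 students; the sizes and the object name small are assumptions for illustration). It calls the same functions step by step rather than in a pipe, which should be equivalent, and then tabulates the result with base R.

```r
library(faux)

# hypothetical small design: 4 classes, 5 students nested in each class
small <- add_random(class = 4)
small <- add_random(small, student = 5, .nested_in = "class")

# school_type varies between classes, so every student in a class shares it
small <- add_between(small, .by = "class",
                     school_type = c("private", "public"))

# one row per student
nrow(small)

# each class should appear under exactly one school type
table(small$class, small$school_type)
```

The same kind of tabulation, applied to school_type and gender in the larger design, gives a count table like the one above.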

Other simulation packages

I started this project as a collection of functions I was writing to help with my own work. It’s one of many, many simulation packages in R; here are some others. I haven’t used most of them, so I can’t vouch for them, but if faux doesn’t meet your needs, one of these might.

  • SimDesign: generate, analyse and summarise data from models or probability density functions
  • simstudy: Simulation of Study Data
  • simr: Power Analysis of Generalised Linear Mixed Models by Simulation
  • simulator: streamlines the process of performing simulations by creating a common infrastructure that can be easily used and reused across projects
  • lsasim: Simulate large scale assessment data
  • simmer: Trajectory-based Discrete-Event Simulation (DES)
  • parSim: Parallel Simulation Studies