
Produces a data table with the same distributions and correlations as an existing data table. Only numeric columns are returned, and all numeric variables are simulated from a continuous normal distribution (for now).

Usage

sim_df(
  data,
  n = 100,
  within = c(),
  between = c(),
  id = "id",
  dv = "value",
  empirical = FALSE,
  long = FALSE,
  seed = NULL,
  missing = FALSE,
  sep = faux_options("sep")
)

Arguments

data

the existing tbl

n

the number of samples to return per group

within

a list of the within-subject factor columns (if long format)

between

a list of the between-subject factor columns

id

the names of the column(s) for grouping observations

dv

the name of the DV (value) column

empirical

Should the returned data have these exact parameters? (versus be sampled from a population with these parameters)

long

whether to return the data table in long format

seed

DEPRECATED. Call set.seed() before running this function instead.

missing

simulate missing data?

sep

separator for factor levels

Value

a tbl

Details

See vignette("sim_df", package = "faux") for details.
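
As a rough illustration of the empirical argument, the sketch below simulates from iris with empirical = TRUE and compares the means and correlations of the simulated table with those of the original. It assumes the faux package is installed; the variable names (num_cols, sim and so on) are only for illustration and are not part of the package.

```r
library(faux)

# names of the numeric columns in the source table
num_cols <- names(iris)[sapply(iris, is.numeric)]

# empirical = TRUE: the simulated sample itself should reproduce the
# means and correlations of the source data, not just in expectation
sim <- sim_df(iris, n = 100, empirical = TRUE)
sim_num <- as.data.frame(sim)[, num_cols]

orig_means <- colMeans(iris[, num_cols])
sim_means  <- colMeans(sim_num)

orig_cors <- cor(iris[, num_cols])
sim_cors  <- cor(sim_num)

# largest absolute differences; both are expected to be close to zero
max(abs(orig_means - sim_means))
max(abs(orig_cors - sim_cors))
```

With the default empirical = FALSE, the same comparison would normally show small sampling differences, because the rows are drawn from a population with those parameters rather than forced to match them.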

Examples

iris100 <- sim_df(iris, 100)
iris_species <- sim_df(iris, 100, between = "Species")

# set the names of the within factors (and the separator character)
# if you want to return a long version
longdf <- sim_df(iris, 
                 between = "Species", 
                 within = c("type", "dim"),
                 sep = ".",
                 long = TRUE)
                 
# or if you are simulating data from a table in long format
widedf <- sim_df(longdf, 
                 between = "Species", 
                 within = c("type", "dim"),
                 sep = ".")
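
Because the seed argument is deprecated, reproducible output is obtained by calling set.seed() before sim_df(). The sketch below is a minimal illustration, assuming faux is installed; the seed value and the object names a and b are arbitrary.

```r
library(faux)

# set the random seed before each call instead of using the seed argument
set.seed(8675309)
a <- sim_df(iris, n = 50)

set.seed(8675309)
b <- sim_df(iris, n = 50)

# the two simulated tables should match when the seed is the same
isTRUE(all.equal(a, b))
```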