Simulate an existing dataframe

Produces a data table with the same distributions and correlations as an existing data table. Only returns numeric columns and simulates all numeric variables from a continuous normal distribution (for now).

Usage

sim_df(
  data,
  n = 100,
  within = c(),
  between = c(),
  id = "id",
  dv = "value",
  empirical = FALSE,
  long = FALSE,
  seed = NULL,
  missing = FALSE,
  sep = faux_options("sep")
)

Arguments

data: the existing tbl
n: the number of samples to return per group
within: a vector of the within-subject factor columns (if long format)
between: a vector of the between-subject factor columns
id: the names of the column(s) for grouping observations
dv: the name of the DV (value) column
empirical: Should the returned data have these exact parameters? (versus be sampled from a population with these parameters)
long: whether to return the data table in long format
seed: DEPRECATED; call set.seed() before running this function instead
missing: simulate missing data?
sep: separator for factor levels
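
Because seed is deprecated, reproducible output depends on fixing the random number generator state before each call. The following is a minimal sketch of that pattern, assuming the faux package is installed; the seed value and the names sim_a and sim_b are illustrative only.

```r
library(faux)

# seed is deprecated, so fix the RNG state with set.seed() before the call
set.seed(8675309)
sim_a <- sim_df(iris, n = 50, between = "Species")

# resetting the same seed should reproduce the same simulated table
set.seed(8675309)
sim_b <- sim_df(iris, n = 50, between = "Species")

# compare the two draws; all.equal() returns TRUE when they match
all.equal(sim_a, sim_b)
```

Since n is the number of samples per group, a call with between = "Species" is expected to return n rows for each species rather than n rows in total.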

Value

a tbl

Details

See vignette("sim_df", package = "faux") for details.

Examples

iris100 <- sim_df(iris, 100)
iris_species <- sim_df(iris, 100, between = "Species")

# set the names of the within factors (and the separator character)
# if you want to return a long version
longdf <- sim_df(iris, 
                 between = "Species", 
                 within = c("type", "dim"),
                 sep = ".",
                 long = TRUE)
                 
# or if you are simulating data from a table in long format
widedf <- sim_df(longdf, 
                 between = "Species", 
                 within = c("type", "dim"),
                 sep = ".")
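
The empirical argument can be checked by comparing summary statistics of the simulated table against the original. This is a sketch under the assumption that the simulated table keeps the original numeric column names; num_cols and exact are illustrative names, not part of the package.

```r
library(faux)

# the four numeric columns of iris (Species is a factor and is not simulated here)
num_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# empirical = TRUE asks for a sample whose parameters match the input exactly,
# rather than a sample drawn from a population with those parameters
exact <- sim_df(iris, n = 100, empirical = TRUE)

# compare column means and the correlation matrix with the original data;
# both comparisons should agree up to floating-point error
all.equal(colMeans(exact[num_cols]), colMeans(iris[num_cols]))
all.equal(cor(exact[num_cols]), cor(iris[num_cols]))
```

With the default empirical = FALSE, the same comparisons would generally show small differences, because the returned rows are then a random sample from a population with those parameters.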