Produces a data table with the same distributions and correlations as an existing data table. Only numeric columns are returned, and all numeric variables are simulated from a continuous normal distribution (for now).
Usage
sim_df(
  data,
  n = 100,
  within = c(),
  between = c(),
  id = "id",
  dv = "value",
  empirical = FALSE,
  long = faux_options("long"),
  seed = NULL,
  missing = FALSE,
  sep = faux_options("sep")
)
Arguments
- data
the existing tbl
- n
the number of samples to return per group
- within
a vector of the names of the within-subject factor columns (if the data are in long format)
- between
a vector of the names of the between-subject factor columns
- id
the names of the column(s) for grouping observations
- dv
the name of the DV (value) column
- empirical
Should the returned data have these exact parameters? (versus be sampled from a population with these parameters)
- long
whether to return the data table in long format
- seed
DEPRECATED. Call set.seed() before running this function instead.
- missing
simulate missing data?
- sep
separator for factor levels
Examples
iris100 <- sim_df(iris, 100)
iris_species <- sim_df(iris, 100, between = "Species")
# set the names of the within factors (and the separator character)
# if you want to return a long version
longdf <- sim_df(iris,
                 between = "Species",
                 within = c("type", "dim"),
                 sep = ".",
                 long = TRUE)
# or if you are simulating data from a table in long format
widedf <- sim_df(longdf,
                 between = "Species",
                 within = c("type", "dim"),
                 sep = ".")
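The following is a minimal sketch, not part of the original examples, of two arguments described above: the deprecated seed argument (replaced by a call to set.seed()) and empirical. It assumes the faux package is installed and that the simulated table keeps the numeric column names of iris; the variable names (sim_a, sim_b, sim_exact, num_cols) and the seed value are arbitrary choices for illustration.

```r
library(faux)

# Reproducibility: call set.seed() before sim_df() rather than
# passing the deprecated seed argument.
set.seed(8675309)
sim_a <- sim_df(iris, n = 50)
set.seed(8675309)
sim_b <- sim_df(iris, n = 50)
# With the same seed, the two simulated tables should match.
print(all.equal(sim_a, sim_b))

# empirical = TRUE: the returned data should have the parameters
# of the original table, not just be sampled from a population
# with those parameters.
num_cols <- names(iris)[sapply(iris, is.numeric)]
sim_exact <- sim_df(iris, n = 100, empirical = TRUE)

# Compare sample means and correlations with the original data.
print(all.equal(colMeans(sim_exact[num_cols]), colMeans(iris[num_cols])))
print(all.equal(cor(sim_exact[num_cols]), cor(iris[num_cols])))
```

With the default empirical = FALSE, the same comparisons would generally show small differences, since the data are then sampled from a population with the original parameters.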