Produces a data table with the same distributions and correlations as an existing data table. Only numeric columns are returned, and all numeric variables are simulated from a continuous normal distribution (for now).
sim_df(
  data,
  n = 100,
  within = c(),
  between = c(),
  id = "id",
  dv = "value",
  empirical = FALSE,
  long = FALSE,
  seed = NULL,
  missing = FALSE,
  sep = faux_options("sep")
)
data: the existing tbl
n: the number of samples to return per group
within: a list of the within-subject factor columns (if long format)
between: a list of the between-subject factor columns
id: the names of the column(s) for grouping observations
dv: the name of the DV (value) column
empirical: whether the returned data should have these exact parameters (versus being sampled from a population with these parameters)
long: whether to return the data table in long format
seed: DEPRECATED; call set.seed() before running this function instead
missing: whether to simulate missing data
sep: separator for factor levels
Returns a tbl containing the simulated data.
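The difference between exact parameters and parameters sampled from a population can be illustrated outside of faux. The following is a minimal sketch, assuming the MASS package is available; it uses MASS::mvrnorm(), which has its own empirical argument, and is not necessarily how sim_df() is implemented internally. The variable names (num, mu, sigma, exact, sampled) are chosen for illustration only.

```r
# Sketch only: contrasts "exact parameters" with "sampled from a population"
# using MASS::mvrnorm(); this is an assumption-laden illustration, not the
# sim_df() implementation.
library(MASS)

num   <- iris[, sapply(iris, is.numeric)]  # keep numeric columns only
mu    <- colMeans(num)                     # target means
sigma <- cov(num)                          # target covariance matrix

set.seed(1)  # arbitrary seed so the sketch is repeatable
exact   <- mvrnorm(n = 100, mu = mu, Sigma = sigma, empirical = TRUE)
sampled <- mvrnorm(n = 100, mu = mu, Sigma = sigma, empirical = FALSE)

# Largest deviation of the sample means from the targets:
# effectively zero for `exact`, small but non-zero for `sampled`.
max(abs(colMeans(exact) - mu))
max(abs(colMeans(sampled) - mu))

# Largest deviation of the sample covariances from the targets
max(abs(cov(exact) - sigma))
max(abs(cov(sampled) - sigma))
```

With empirical = TRUE the sample reproduces the target means and covariances up to floating-point error; with empirical = FALSE each draw only approximates them, and the approximation improves as n grows.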
iris100 <- sim_df(iris, 100)
iris_species <- sim_df(iris, 100, between = "Species")
# set the names of the within factors (and the separator character)
# if you want to return a long version
longdf <- sim_df(iris,
                 between = "Species",
                 within = c("type", "dim"),
                 sep = ".",
                 long = TRUE)
# or if you are simulating data from a table in long format
widedf <- sim_df(longdf,
                 between = "Species",
                 within = c("type", "dim"),
                 sep = ".")
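Because the seed argument is deprecated, reproducible simulations rely on calling set.seed() beforehand. The following is a minimal sketch, assuming the faux package is installed; the seed value and the names sim_a, sim_b and same are arbitrary choices for illustration.

```r
# Sketch: reproducible output via set.seed() instead of the deprecated `seed`
# argument. Guarded so it is skipped when faux is not installed.
if (requireNamespace("faux", quietly = TRUE)) {
  set.seed(8675309)  # arbitrary seed
  sim_a <- faux::sim_df(iris, 100, between = "Species")

  set.seed(8675309)  # reset to the same seed before the second call
  sim_b <- faux::sim_df(iris, 100, between = "Species")

  # The two simulated tables should be equal when the seed is reset
  same <- isTRUE(all.equal(sim_a, sim_b))
  print(same)
}
```

Resetting the seed immediately before each call matters: any other random draw between set.seed() and sim_df() would change the resulting data.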