Simulating Multiple Vectors
library(tidyverse)
library(faux)
I’m working on a package for simulations called faux. (Update: faux is now on CRAN!)
The first function, rnorm_multi, makes multiple normally distributed vectors with specified relationships and takes the following arguments:
- n = the number of samples (required)
- vars = the number of variables to return (default = 3)
- r = the correlations among the variables (can be a single number, a vars*vars matrix, a vars*vars vector, or a vars*(vars-1)/2 vector; default = 0)
- mu = a vector giving the means of the variables (numeric vector of length 1 or vars; default = 0)
- sd = the standard deviations of the variables (numeric vector of length 1 or vars; default = 1)
- varnames = optional names for the variables (string vector of length vars; default = NULL)
- empirical = logical. If TRUE, mu, sd and r specify the empirical, not the population, mean, SD and covariance (default = FALSE)
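To make the vars*(vars-1)/2 form of the correlation argument concrete, here is a minimal base-R sketch of how such a vector can be expanded into a full correlation matrix. The helper name `tri_to_cormat` is hypothetical and is not part of faux; the element order (AB, AC, BC) is assumed from Example 1, and faux's own internals may differ.

```r
# Hypothetical helper (not part of faux): expand the vars*(vars-1)/2
# unique correlations into a full symmetric correlation matrix.
tri_to_cormat <- function(tri) {
  # Solve k*(k-1)/2 = length(tri) for the number of variables k
  k <- (1 + sqrt(1 + 8 * length(tri))) / 2
  stopifnot(k == round(k))

  m <- diag(k)                           # 1s on the diagonal
  m[lower.tri(m)] <- tri                 # fills (2,1), (3,1), (3,2), ...
  m[upper.tri(m)] <- t(m)[upper.tri(m)]  # mirror into the upper triangle
  m
}

cormat <- tri_to_cormat(c(0.2, -0.5, 0.5))
cormat
```

Here `cormat[1, 2]` is rAB, `cormat[1, 3]` is rAC, and `cormat[2, 3]` is rBC.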
Example 1
The following example creates a 100-row data frame with 3 columns named A, B, and C, with means = 0, SDs = 1, and where rAB = 0.2, rAC = -0.5, and rBC = 0.5.
ex1 <- rnorm_multi(n = 100, vars = 3, r = c(0.2, -0.5, 0.5), varnames = c("A", "B", "C"))
Correlation Matrix of Sample Data
|   | A | B | C |
|---|---|---|---|
| A | 1.0000000 | -0.087499 | -0.1202283 |
| B | -0.0874990 | 1.000000 | 0.0157210 |
| C | -0.1202283 | 0.015721 | 1.0000000 |
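Sample correlations from a single draw will generally not equal the requested values exactly unless the parameters are treated as empirical. The sketch below illustrates that distinction with `MASS::mvrnorm`, which has its own `empirical` argument. It is a stand-in used for illustration, not a claim about how rnorm_multi is implemented; the seed and the object names `sigma`, `pop` and `emp` are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Target correlation matrix for A, B and C (SDs are 1, so this is
# also the covariance matrix)
sigma <- matrix(c( 1.0, 0.2, -0.5,
                   0.2, 1.0,  0.5,
                  -0.5, 0.5,  1.0),
                nrow = 3,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Parameters describe the population: the sample only approximates them
pop <- MASS::mvrnorm(100, mu = rep(0, 3), Sigma = sigma)

# Parameters describe the sample itself: means and correlations are exact
emp <- MASS::mvrnorm(100, mu = rep(0, 3), Sigma = sigma, empirical = TRUE)

round(cor(pop), 3)  # varies from draw to draw
round(cor(emp), 3)  # reproduces sigma
```

Calling `MASS::mvrnorm` with the namespace prefix, rather than `library(MASS)`, avoids masking `dplyr::select` when the tidyverse is loaded.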
Example 2
The following example calculates the correlation matrix, means, and SDs from the iris dataset and uses them to simulate a dataset of 100 rows with the same parameters.
dat <- select_if(iris, is.numeric)
iris_sim <- rnorm_multi(
  n = 100,
  vars = ncol(dat),
  r = cor(dat),
  mu = summarise_all(dat, mean) %>% t(),
  sd = summarise_all(dat, sd) %>% t(),
  varnames = names(dat)
)
Correlation Matrix of Original Data
|   | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|---|
| Sepal.Length | 1.0000000 | -0.1175698 | 0.8717538 | 0.8179411 |
| Sepal.Width | -0.1175698 | 1.0000000 | -0.4284401 | -0.3661259 |
| Petal.Length | 0.8717538 | -0.4284401 | 1.0000000 | 0.9628654 |
| Petal.Width | 0.8179411 | -0.3661259 | 0.9628654 | 1.0000000 |
Correlation Matrix of Sample Data
|   | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|---|
| Sepal.Length | 1.0000000 | -0.1591051 | 0.8491459 | 0.7544625 |
| Sepal.Width | -0.1591051 | 1.0000000 | -0.4527400 | -0.3513351 |
| Petal.Length | 0.8491459 | -0.4527400 | 1.0000000 | 0.9485627 |
| Petal.Width | 0.7544625 | -0.3513351 | 0.9485627 | 1.0000000 |
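The three inputs used in Example 2 (correlations, means, and SDs) jointly determine the covariance matrix of the data. As a rough base-R sketch of that relationship, the code below recomputes the same parameters as plain named vectors and checks that scaling the correlation matrix by the SDs recovers `cov(dat)`. The object names `r_mat`, `mu`, `sds` and `sigma` are illustrative only.

```r
# The four numeric columns of iris (base-R equivalent of select_if)
dat <- iris[sapply(iris, is.numeric)]

r_mat <- cor(dat)         # 4 x 4 correlation matrix
mu    <- colMeans(dat)    # named vector of means
sds   <- sapply(dat, sd)  # named vector of standard deviations

# Covariance = D %*% R %*% D, where D is the diagonal matrix of SDs
sigma <- diag(sds) %*% r_mat %*% diag(sds)

# TRUE: the three inputs carry the same information as the covariance matrix
all.equal(sigma, cov(dat), check.attributes = FALSE)
```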