Simulating Multiple Vectors

Tags: R, correlated data, simulation, faux

Published: 2018-12-22

library(tidyverse)
library(faux)

I’m working on a package for simulations called faux. (Update: faux is now on CRAN!)

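The first function, rnorm_multi, makes multiple normally distributed vectors with specified relationships. Its main arguments, all of which appear in the examples below, are n (the number of rows), vars (the number of variables), r (the correlations between them), mu (the means), sd (the standard deviations), and varnames (the variable names).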
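To generate correlated vectors, a function like this has to turn the correlations and SDs into a covariance matrix and then transform independent standard normal draws so they take on that covariance. The sketch below shows one standard way to do it in base R, using a Cholesky factor. It is an illustration under that assumption, not faux's actual implementation, and `sim_multi_norm` is a hypothetical name.

```r
# Hypothetical helper, not part of faux: simulate n rows of multivariate
# normal data from means (mu), SDs (sd), and a correlation matrix (r).
sim_multi_norm <- function(n, mu, sd, r) {
  k <- ncol(r)
  mu <- rep_len(mu, k)  # allow a single mean/SD for all variables
  sd <- rep_len(sd, k)
  # covariance matrix: scale each correlation by the two SDs involved
  sigma <- diag(sd, nrow = k) %*% r %*% diag(sd, nrow = k)
  # upper-triangular U with t(U) %*% U == sigma;
  # chol() stops with an error if sigma is not positive definite
  U <- chol(sigma)
  # independent standard normal draws, one column per variable
  z <- matrix(rnorm(n * k), nrow = n, ncol = k)
  # rows of z %*% U have covariance t(U) %*% U = sigma; then add the means
  x <- sweep(z %*% U, 2, mu, "+")
  colnames(x) <- colnames(r)
  as.data.frame(x)
}

# usage: two variables correlated at 0.5, with different means and SDs
set.seed(1)
r2 <- matrix(c(1, 0.5, 0.5, 1), nrow = 2,
             dimnames = list(c("x", "y"), c("x", "y")))
sim <- sim_multi_norm(10000, mu = c(0, 10), sd = c(1, 2), r = r2)
round(cor(sim), 2)
```

The Cholesky route is just one common choice; an eigendecomposition of the covariance matrix works as well and also tolerates matrices that are only positive semi-definite.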

Example 1

The following example creates a 100-row data frame with 3 columns named A, B, and C, with means of 0, SDs of 1, and correlations r_AB = 0.2, r_AC = -0.5, and r_BC = 0.5.

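# name the r argument explicitly so the correlations are not read as mu;
# they are given in upper-triangle order: AB, AC, BC
ex1 <- rnorm_multi(100, 3, r = c(0.2, -0.5, 0.5), varnames = c("A", "B", "C"))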
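In Example 1 the three correlations are passed as one vector in the order r_AB, r_AC, r_BC, that is, the upper triangle of the 3 × 3 correlation matrix read row by row. The base-R sketch below expands such a vector into the full matrix so it can be inspected and checked for positive definiteness, which a set of correlations needs before data can be simulated from it with a Cholesky factor. `upper_to_cor` is a hypothetical helper, and it assumes the row-by-row ordering shown in the example.

```r
# Hypothetical helper, not part of faux: expand a vector of pairwise
# correlations, given in upper-triangle row order, into a full matrix.
upper_to_cor <- function(v, varnames = NULL) {
  # the number of variables k solves k * (k - 1) / 2 == length(v)
  k <- (1 + sqrt(1 + 8 * length(v))) / 2
  if (k != round(k)) stop("length(v) must equal k * (k - 1) / 2")
  m <- diag(k)
  # lower.tri() is filled column by column, which visits the same pairs as
  # reading the upper triangle row by row: (1,2), (1,3), ..., (2,3), ...
  m[lower.tri(m)] <- v
  m[upper.tri(m)] <- t(m)[upper.tri(m)]  # mirror to make m symmetric
  dimnames(m) <- list(varnames, varnames)
  m
}

r_ex1 <- upper_to_cor(c(0.2, -0.5, 0.5), c("A", "B", "C"))
r_ex1

# all eigenvalues > 0 means the matrix is positive definite
all(eigen(r_ex1, symmetric = TRUE, only.values = TRUE)$values > 0)  # TRUE
```

The check passes for Example 1. A combination such as r_AB = 0.9, r_AC = 0.9, r_BC = -0.9 would fail it, because no three variables can have all of those correlations at once.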

Correlation Matrix of Sample Data

          A          B          C
A 1.0000000  0.0505608  0.0821478
B 0.0505608  1.0000000 -0.1303736
C 0.0821478 -0.1303736  1.0000000

Example 2

The following example calculates the correlation matrix, means, and SDs from the iris dataset and uses them to simulate a dataset of 100 rows with the same parameters.

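# keep only the numeric columns of iris
dat <- select(iris, where(is.numeric))

# simulate 100 rows using the observed correlations, means, and SDs
iris_sim <- rnorm_multi(
  n = 100,
  vars = ncol(dat),
  r = cor(dat),
  mu = colMeans(dat),    # one mean per column
  sd = sapply(dat, sd),  # one SD per column
  varnames = names(dat)
)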
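The simulated correlations above match the originals only in expectation, so they drift a little from sample to sample. If the goal were a dataset whose sample means, SDs, and correlations equal the iris values exactly, one alternative outside faux is `MASS::mvrnorm()` with `empirical = TRUE`, which takes a covariance matrix rather than a correlation matrix. A minimal sketch, assuming the MASS package (bundled with standard R installations) is available; it calls MASS with `::` so that MASS's `select()` does not mask dplyr's.

```r
# numeric columns of iris, as in Example 2 (base R, no dplyr needed)
dat <- iris[, sapply(iris, is.numeric)]

set.seed(8675309)
# empirical = TRUE rescales the draws so the sample mean vector and the
# sample covariance matrix equal mu and Sigma (up to floating-point error)
iris_exact <- as.data.frame(
  MASS::mvrnorm(n = 100, mu = colMeans(dat), Sigma = cov(dat), empirical = TRUE)
)

# largest gap between simulated and original correlations: floating-point noise
max(abs(cor(iris_exact) - cor(dat)))
```

Exact matching removes sampling variability, which is convenient for demonstrations but unrealistic for power simulations, where the spread across samples is the point.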

Correlation Matrix of Original Data

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

Correlation Matrix of Sample Data

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.0829264    0.8656481   0.8150830
Sepal.Width    -0.0829264   1.0000000   -0.4334276  -0.4177735
Petal.Length    0.8656481  -0.4334276    1.0000000   0.9700393
Petal.Width     0.8150830  -0.4177735    0.9700393   1.0000000
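To put a number on how close the two iris correlation matrices are, one can take their element-wise difference. The base-R snippet below simply re-enters the values printed above. The last line compares the largest gap with the rough large-sample standard error of a correlation, (1 - r^2) / sqrt(n); that approximation is added here for context and is not something faux reports.

```r
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# correlation matrices copied from the printed output above
orig <- matrix(c(
   1.0000000, -0.1175698,  0.8717538,  0.8179411,
  -0.1175698,  1.0000000, -0.4284401, -0.3661259,
   0.8717538, -0.4284401,  1.0000000,  0.9628654,
   0.8179411, -0.3661259,  0.9628654,  1.0000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

sim <- matrix(c(
   1.0000000, -0.0829264,  0.8656481,  0.8150830,
  -0.0829264,  1.0000000, -0.4334276, -0.4177735,
   0.8656481, -0.4334276,  1.0000000,  0.9700393,
   0.8150830, -0.4177735,  0.9700393,  1.0000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

diffs <- round(sim - orig, 4)
diffs
max(abs(diffs))  # 0.0516 (Sepal.Width vs Petal.Width)

# rough large-sample SE of that correlation with n = 100
round((1 - orig["Sepal.Width", "Petal.Width"]^2) / sqrt(100), 3)  # 0.087
```

The largest gap is smaller than this approximate standard error, which suggests the differences are consistent with ordinary sampling error for 100 rows.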