Simulating Multiple Vectors


I’m working on a package for simulations called faux. (Update: faux is now on CRAN!)

The first function, rnorm_multi, makes multiple normally distributed vectors with specified relationships and takes the following arguments:

  • n = the number of samples to generate (required)
  • vars = the number of variables to return (default = 3)
  • cors = the correlations among the variables (can be a single number, a vars*vars matrix, a vector of length vars*vars, or a vector of length vars*(vars-1)/2; default = 0)
  • mu = a vector giving the means of the variables (numeric vector of length 1 or vars; default = 0)
  • sd = the standard deviations of the variables (numeric vector of length 1 or vars; default = 1)
  • varnames = optional names for the variables (string vector of length vars; default = NULL)
  • empirical = logical. If TRUE, mu, sd and cors specify the empirical rather than the population means, SDs and correlations (default = FALSE)
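
To make the shortest form of cors concrete, here is a minimal base-R sketch of how a vector of length vars*(vars-1)/2 could be expanded into a full correlation matrix. The helper name cors_to_matrix is hypothetical and is not part of faux; the pair ordering (1-2, 1-3, ..., 2-3, ...) is an assumption chosen to match Example 1, where the three values are given as rAB, rAC, rBC.

```r
# Hypothetical helper (not part of faux): expand a vector of pairwise
# correlations into a symmetric correlation matrix.
cors_to_matrix <- function(cors, vars) {
  stopifnot(length(cors) == vars * (vars - 1) / 2)
  m <- matrix(0, nrow = vars, ncol = vars)
  # lower.tri() fills column by column: (2,1), (3,1), ..., (3,2), ...
  # which corresponds to the pairs 1-2, 1-3, ..., 2-3, ...
  m[lower.tri(m)] <- cors
  m <- m + t(m)   # mirror the lower triangle into the upper triangle
  diag(m) <- 1    # every variable correlates perfectly with itself
  m
}

cor_mat <- cors_to_matrix(c(0.2, -0.5, 0.5), vars = 3)
cor_mat
```

A matrix built this way is symmetric by construction, but it is not guaranteed to be a valid (positive definite) correlation matrix for arbitrary inputs, so a real implementation would need to check for that.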

Example 1

The following example creates a 100-row data frame with three columns named A, B, and C, with means = 0, SDs = 1, and where rAB = 0.2, rAC = -0.5, and rBC = 0.5.

library(faux)  # provides rnorm_multi()
ex1 <- rnorm_multi(100, 3, c(0.2, -0.5, 0.5), varnames = c("A", "B", "C"))

Correlation Matrix of Sample Data

           A         B          C
A  1.0000000 -0.087499 -0.1202283
B -0.0874990  1.000000  0.0157210
C -0.1202283  0.015721  1.0000000
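
For readers curious how correlated normal vectors like these can be produced, the following is a minimal base-R sketch using a Cholesky factor of the target correlation matrix. It is an illustration of one common technique, not necessarily what rnorm_multi does internally; the seed and the variable names z and sim are arbitrary choices for this example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 100

# Target correlations from Example 1: rAB = 0.2, rAC = -0.5, rBC = 0.5
cor_mat <- matrix(c( 1.0,  0.2, -0.5,
                     0.2,  1.0,  0.5,
                    -0.5,  0.5,  1.0),
                  nrow = 3,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# chol() returns an upper-triangular R with t(R) %*% R equal to cor_mat
R <- chol(cor_mat)

# Independent standard normal draws, one column per variable
z <- matrix(rnorm(n * 3), nrow = n)

# Post-multiplying by R gives columns whose population correlations
# are cor_mat (means 0, SDs 1)
sim <- as.data.frame(z %*% R)
names(sim) <- c("A", "B", "C")

round(cor(sim), 2)  # sample correlations; these vary around the targets
```

Because these are population parameters, the sample correlations from any single run of 100 rows will only approximate the targets.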

Example 2

The following example calculates the correlation matrix, means, and SDs from the iris dataset and uses them to simulate a dataset of 100 rows with the same parameters.

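library(dplyr)  # select_if(), summarise_all(), %>%
library(faux)   # rnorm_multi()

# keep only the numeric columns of iris
dat <- select_if(iris, is.numeric)

iris_sim <- rnorm_multi(
  n = 100,
  vars = ncol(dat),
  r = cor(dat),
  mu = summarise_all(dat, mean) %>% t(),
  sd = summarise_all(dat, sd) %>% t(),
  varnames = names(dat)
)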

Correlation Matrix of Original Data

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

Correlation Matrix of Sample Data

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1591051    0.8491459   0.7544625
Sepal.Width    -0.1591051   1.0000000   -0.4527400  -0.3513351
Petal.Length    0.8491459  -0.4527400    1.0000000   0.9485627
Petal.Width     0.7544625  -0.3513351    0.9485627   1.0000000
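
The simulated correlations above are close to, but not identical to, the originals because the parameters are treated as population values. The sketch below illustrates the idea behind an empirical option using mvrnorm() from the MASS package (a recommended package bundled with most R installations) rather than faux itself, so it is a stand-in technique and not a statement about how rnorm_multi is implemented. The seed and the name sim_exact are arbitrary.

```r
library(MASS)  # for mvrnorm()

# Numeric columns of iris, selected with base R
dat <- iris[, sapply(iris, is.numeric)]

set.seed(2)  # arbitrary seed
sim_exact <- mvrnorm(
  n = 100,
  mu = colMeans(dat),   # target means
  Sigma = cov(dat),     # target covariance matrix
  empirical = TRUE      # match the sample, not the population, parameters
)

# Largest absolute difference between simulated and original correlations;
# with empirical = TRUE this is zero up to floating-point error
max(abs(cor(sim_exact) - cor(dat)))
```

With empirical = FALSE the same call would give correlations that fluctuate from run to run, as in the sample matrix shown above.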
