# Loops

``````library(tidyverse)
library(tictoc)``````

This tutorial will introduce a few ways you can iterate your code. We’ll use the {tictoc} package to time each method to show how they differ.

Let’s say you want to do a power analysis by simulation. You’ll need to simulate some data, run the analysis, and record the relevant p-value. And you’ll need to repeat this procedure a number of times.

Here are all of the simulation parameters you will need for the examples.

``````# predicted data parameters
n1 <- 50
m1 <- 100
sd1 <- 10
n2 <- 45
m2 <- 105
sd2 <- 11

# critical alpha for calculating power
# doesn't always have to be 0.05 (justify your alpha)
alpha <- 0.05

# number of simulation replications
n_reps <- 10000``````

## One iteration

Your first task when iterating is to sort out the code for a single iteration. Once that is bug-free and doing what you want, you can repeat it.

``````# simulate data
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

# analyse it
test <- t.test(data1, data2)

# get the relevant p-value
test\$p.value``````
``##  0.000158984``

## For loop

Repeat the code above `n_reps` times using a for loop. You have to assign the variable `i` (or whatever you want to call it) to each item in the vector `1:n_reps`, but you don’t necessarily have to use `i` in the code. Here, we use it to add the p-value to the vector `p`.

``````tic("for loop")

p <- c()

for (i in 1:n_reps) {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

p[i] <- test\$p.value
}

# calculcate power as % < alpha
power <- mean(p < alpha)

toc()``````
``## for loop: 1.456 sec elapsed``

If you pre-allocate the vector `p`, this can speed up your loops. This means defining an empty vector with a length of `n_reps` before you start the loop. This saves the time it takes to delete the vector and make a new, larger one on each iteration. It doesn’t really make much difference for our simple code here.

``````tic("pre-allocate")

# pre-allocate vector for p-values
p <- vector("numeric", length = n_reps)

for (i in 1:n_reps) {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

p[i] <- test\$p.value
}

power <- mean(p < alpha)

toc()``````
``## pre-allocate: 1.399 sec elapsed``

## replicate

The `replicate()` function lets you iterate the exact same code and collect the output.

``````tic("replicate")

p <- replicate(n_reps, {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
})

power <- mean(p < alpha)

toc()``````
``## replicate: 1.499 sec elapsed``

You can enclose the relevant code in a named function and replicate that. This can be useful if you need to run different values.

``````tic("replicate-function")

# define a simulation function
sim_func <- function(n1, m1, sd1, n2, m2, sd2) {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
}

# repeat the sim function with different sd1 and m2 values
p <- replicate(n_reps, sim_func(50, 100, 9.5, 45, 102, 11))

power <- mean(p < alpha)

toc()``````
``## replicate-function: 1.676 sec elapsed``

## apply

The apply functions can be used to iterate over a vector or list. Here, we’re iterating over the vector `1:n_reps`. The function needs to have an argument `i` for these values, but we don’t actually need to use `i` in the function code. Here, we don’t assign the p-value to a vector `p` inside the function, but rather return the p-value from the function and the `sapply()` function simplifies this into a vector, which is then assigned to `p`.

``````tic("apply")
p <- sapply(1:n_reps, function(i) {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
})

power <- mean(p < alpha)

toc()``````
``## apply: 1.756 sec elapsed``

If you want to use the apply functions with a function that takes arguments, either the first argument should be `i` for the replication index, or you can repeat the first argument `n_reps` times.

``````tic("apply-function")

sim_func <- function(i, n1, m1, sd1, n2, m2, sd2) {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
}

# you can only iterate over 1 argument
# the rest must be single values
p <- sapply(1:n_reps, sim_func,
n1 = 50, m1 = 100, sd1 = 9.5,
n2 = 45, m2 = 102, sd2 = 11)

power <- mean(p < alpha)

toc()``````
``## apply-function: 1.687 sec elapsed``

You can also split up the data simulation and analysis steps into two different apply functions. In the first one, you iterate over the vector `1:n_reps` and in the second, you iterate over each item in the `data` list.

``````tic("apply2")

# simulate data as a list of lists
data <- lapply(1:n_reps, function(i) {
list(
data1 = rnorm(n1, m1, sd1),
data2 = rnorm(n2, m2, sd2)
)
})

# iterate over list items
p <- sapply(data, function(d) {
test <- t.test(d\$data1, d\$data2)

test\$p.value
})

power <- mean(p < alpha)

toc()``````
``## apply2: 2.098 sec elapsed``

If you use `sapply()` for the data simulation, it will simplify the result into a matrix. Then you need to use `apply()` to iterate over the columns of the matrix.

``````tic("apply3")

# simulate data as a matrix
data <- sapply(1:n_reps, function(i) {
list(
data1 = rnorm(n1, m1, sd1),
data2 = rnorm(n2, m2, sd2)
)
})

# iterate over columns
p <- apply(data, MARGIN = 2, function(d) {
test <- t.test(d\$data1, d\$data2)

test\$p.value
})

power <- mean(p < alpha)

toc()``````
``## apply3: 1.908 sec elapsed``

The `mapply()` function takes any number of arguments (the function goes first). You can use this to iterate over different values of the arguments, but this can get a little tricky to keep track of, so I like to organise each iteration in a row of a data frame first.

``````tic("mapply")

# make a data frame with 1 row for each replicate
params <- tidyr::crossing(
rep = 1:2000,
n1 = 50,
m1 = 100,
sd1 = 10,
n2 = 45,
m2 = 100:110,
sd2 = 11
)

sim_func <- function(n1, m1, sd1, n2, m2, sd2) {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
}

p <- mapply(sim_func,
n1 = params\$n1, m1 = params\$m1, sd1 = params\$sd1,
n2 = params\$n2, m2 = params\$m2, sd2 = params\$sd2)

toc()``````
``## mapply: 3.284 sec elapsed``
``````# add p to the params table and
# calculate power for each param combo
power <- params |>
mutate(p = p) |>
group_by(n1, n2, m1, m2, sd1, sd2) |>
summarise(power = mean(p < alpha),
.groups = "drop")

# plot power by m2 - m1
ggplot(power, aes(x = m2 - m1, y = power)) +
geom_line() +
geom_point() +
scale_x_continuous(name = "Raw Effect Size", breaks = 0:10) +
scale_y_continuous(limits = c(0, 1)) +
theme_minimal(base_size = 14)`````` ## purrr

The {purrr} package has iteration functions that work a lot like the apply functions, with some extra helpful features (most of which we won’t explore here).

The function `map_dbl()` is like `sapply()` in that it returns a vector, but more specific in that it requires the results of the function be a double.

``````tic("map")
p <- map_dbl(1:n_reps, function(i) {
data1 <- rnorm(n1, m1, sd1)
data2 <- rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
})

power <- mean(p < alpha)

toc()``````
``## map: 2.142 sec elapsed``

You can also split up the data simulation and analysis like above. The `map()` function returns a list.

``````tic("map-split")

# simulate data as a list of lists
data <- map(1:n_reps, function(i) {
list(
data1 = rnorm(n1, m1, sd1),
data2 = rnorm(n2, m2, sd2)
)
})

# iterate over list items
p <- map_dbl(data, function(d) {
test <- t.test(d\$data1, d\$data2)

test\$p.value
})

power <- mean(p < alpha)

toc()``````
``## map-split: 1.763 sec elapsed``

The `pmap()` function is really useful if you want to run a simulation across a lot of different parameters. Use `tidyr::crossing()` to make a data frame with one row for each replicate. Add `...` to the arguments in the function inside `pmap()` so it can ignore any unused columns in the `params` table (e.g., `rep`).

``````tic("pmap")

# make a data frame with 1 row for each replicate
params <- tidyr::crossing(
rep = 1:n_reps,
n1 = n1,
m1 = m1,
sd1 = sd1,
n2 = n2,
m2 = m2,
sd2 = sd2
)

# simulate data as a list of lists
data <- pmap(params, function(n1, m1, sd1, n2, m2, sd2, ...) {
list(
data1 = rnorm(n1, m1, sd1),
data2 = rnorm(n2, m2, sd2)
)
})

# iterate over list items
p <- map_dbl(data, function(d) {
test <- t.test(d\$data1, d\$data2)

test\$p.value
})

power <- mean(p < alpha)

toc()``````
``## pmap: 1.826 sec elapsed``

In this way, we could explore a range of values, such as how the results change as m2 varies from 100 to 110.

``````# make a data frame with 1 row for each replicate
params <- tidyr::crossing(
rep = 1:2000,
n1 = 50,
m1 = 100,
sd1 = 10,
n2 = 45,
m2 = 100:110,
sd2 = 11
)

p <- pmap_dbl(params, function(n1, m1, sd1, n2, m2, sd2,...) {
data1 = rnorm(n1, m1, sd1)
data2 = rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
})

# add p to the params table and
# calculate power for each param combo
power <- params |>
mutate(p = p) |>
group_by(n1, n2, m1, m2, sd1, sd2) |>
summarise(power = mean(p < alpha),
.groups = "drop")

# plot power by m2 - m1
ggplot(power, aes(x = m2 - m1, y = power)) +
geom_line() +
geom_point() +
scale_x_continuous(name = "Raw Effect Size", breaks = 0:10) +
scale_y_continuous(limits = c(0, 1)) +
theme_minimal(base_size = 14)`````` ## foreach

If you have a ton of iterations or each is slow, you might want to run the iterations in parallel. These functions use the {foreach} package.

First, set up the basic loop using the `foreach()` function and `%do%` syntax. We’ll up the number of replications to 1e5.

``library(foreach)``
``````##
## Attaching package: 'foreach'``````
``````## The following objects are masked from 'package:purrr':
##
##     accumulate, when``````
``n_reps <- 1e5``
``````tic("foreach")

sim_func <- function(i) {
data1 = rnorm(n1, m1, sd1)
data2 = rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
}

p <- foreach(i= 1:n_reps) %do% sim_func(i)

power <- mean(p < alpha)

toc()``````
``## foreach: 49.59 sec elapsed``

Than change `%do%` to `%dopar%` to take advantage of parallelisation if your computer has the capacity.

``````# set up parallelisation
library(doParallel)``````
``## Loading required package: iterators``
``## Loading required package: parallel``
``````registerDoParallel()
getDoParWorkers() # find out how many workers``````
``##  4``
``````tic("foreach-parallel")

sim_func <- function(i) {
data1 = rnorm(n1, m1, sd1)
data2 = rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
}

p <- foreach(i= 1:n_reps) %dopar% sim_func(i)

power <- mean(p < alpha)

toc()``````
``## foreach-parallel: 26.711 sec elapsed``

Alternatively, you can use `times()` like `replicate()`.

``````tic("times-parallel")

sim_func <- function() {
data1 = rnorm(n1, m1, sd1)
data2 = rnorm(n2, m2, sd2)

test <- t.test(data1, data2)

test\$p.value
}

p <- times(n_reps) %dopar% sim_func()

power <- mean(p < alpha)

toc()``````
``## times-parallel: 20.183 sec elapsed`` ##### Lisa DeBruine
###### Professor of Psychology

Lisa DeBruine is a professor of psychology at the University of Glasgow. Her substantive research is on the social perception of faces and kinship. Her meta-science interests include team science (especially the Psychological Science Accelerator), open documentation, data simulation, web-based tools for data collection and stimulus generation, and teaching computational reproducibility.