Codebooks in faux follow the Psych-DS 0.1.0 format, which is still in development.
When you simulate data with sim_design(), the codebook function reads some parameters of the design.
pet_data <- sim_design(
  between = list(pet = c(cat = "Cat Owners",
                         dog = "Dog Owners")),
  n = c(4, 6),
  dv = list(happy = "Happiness Score"),
  id = list(id = "Subject ID"),
  mu = c(10, 12),
  sd = 4,
  plot = FALSE
)
You can set up a codebook with the function codebook(). If you don’t specify the name, it defaults to the name of the data object (pet_data).
cb <- codebook(pet_data)
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
If you just type the codebook object into the console, you’ll see the info in JSON format like this.
cb
#> {
#> "@context": "https://schema.org/",
#> "@type": "Dataset",
#> "name": "pet_data",
#> "schemaVersion": "Psych-DS 0.1.0",
#> "variableMeasured": [
#> {
#> "@type": "PropertyValue",
#> "name": "id",
#> "description": "Subject ID",
#> "dataType": "string"
#> },
#> {
#> "@type": "PropertyValue",
#> "name": "pet",
#> "description": "pet",
#> "levels": {
#> "cat": "Cat Owners",
#> "dog": "Dog Owners"
#> },
#> "dataType": "string",
#> "levelsOrdered": false
#> },
#> {
#> "@type": "PropertyValue",
#> "name": "happy",
#> "description": "Happiness Score",
#> "dataType": "float"
#> }
#> ]
#> }
#>
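Because the default output is JSON text, it can be parsed back into an R list by any JSON reader. The sketch below assumes the jsonlite package is installed (it is not loaded elsewhere in this article) and uses a hand-copied fragment of the output above rather than the cb object itself; json_txt, parsed, and var_names are illustrative names.

```r
# Assumption: jsonlite is installed; it is not part of this article's setup
library(jsonlite)

# A hand-copied fragment of the JSON printed above
json_txt <- '{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "pet_data",
  "schemaVersion": "Psych-DS 0.1.0",
  "variableMeasured": [
    {"@type": "PropertyValue", "name": "id",
     "description": "Subject ID", "dataType": "string"},
    {"@type": "PropertyValue", "name": "happy",
     "description": "Happiness Score", "dataType": "float"}
  ]
}'

# simplifyVector = FALSE keeps variableMeasured as a list of lists
# instead of collapsing it into a data frame
parsed <- fromJSON(json_txt, simplifyVector = FALSE)

parsed$name

# Collect the name of every variable entry
var_names <- vapply(parsed$variableMeasured, function(v) v[["name"]], character(1))
var_names
```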
If you set return to “list”, you get the codebook in an R list that prints like this.
cb <- codebook(pet_data, return = "list")
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
cb
#> Codebook for pet_data (Psych-DS 0.1.0)
#>
#> Dataset Parameters
#>
#> * name: pet_data
#> * schemaVersion: Psych-DS 0.1.0
#>
#> Column Parameters
#>
#> * id (string): Subject ID
#> * pet (string)
#> * Levels
#> * cat: Cat Owners
#> * dog: Dog Owners
#> * Ordered: FALSE
#> * happy (float): Happiness Score
But the codebook is actually a nested list formatted like this:
str(cb)
#> List of 5
#> $ @context : chr "https://schema.org/"
#> $ @type : chr "Dataset"
#> $ name : chr "pet_data"
#> $ schemaVersion : chr "Psych-DS 0.1.0"
#> $ variableMeasured:List of 3
#> ..$ :List of 4
#> .. ..$ @type : chr "PropertyValue"
#> .. ..$ name : chr "id"
#> .. ..$ description: chr "Subject ID"
#> .. ..$ dataType : chr "string"
#> ..$ :List of 6
#> .. ..$ @type : chr "PropertyValue"
#> .. ..$ name : chr "pet"
#> .. ..$ description : chr "pet"
#> .. ..$ levels :List of 2
#> .. .. ..$ cat: chr "Cat Owners"
#> .. .. ..$ dog: chr "Dog Owners"
#> .. ..$ dataType : chr "string"
#> .. ..$ levelsOrdered: logi FALSE
#> ..$ :List of 4
#> .. ..$ @type : chr "PropertyValue"
#> .. ..$ name : chr "happy"
#> .. ..$ description: chr "Happiness Score"
#> .. ..$ dataType : chr "float"
#> - attr(*, "class")= chr [1:2] "psychds_codebook" "list"
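Since the structure is an ordinary nested list, base R list tools are enough to pull properties out of it. The following is a minimal sketch on a hand-built list that mirrors part of the str() output above; cb_list is an illustrative stand-in, not the object returned by codebook().

```r
# A hand-built list mirroring part of the structure shown above
cb_list <- list(
  name = "pet_data",
  variableMeasured = list(
    list(name = "id", description = "Subject ID", dataType = "string"),
    list(name = "pet", description = "pet",
         levels = list(cat = "Cat Owners", dog = "Dog Owners"),
         dataType = "string", levelsOrdered = FALSE),
    list(name = "happy", description = "Happiness Score", dataType = "float")
  )
)

# Pull one property out of every variable entry
var_names <- vapply(cb_list$variableMeasured, function(v) v[["name"]], character(1))
var_types <- vapply(cb_list$variableMeasured, function(v) v[["dataType"]], character(1))
names(var_types) <- var_names
var_types

# Find the entries that carry a "levels" property.
# [[ ]] matches names exactly, so "levelsOrdered" is never picked up by mistake.
has_levels <- vapply(cb_list$variableMeasured,
                     function(v) !is.null(v[["levels"]]), logical(1))
var_names[has_levels]
```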
Above you saw messages about the data type that codebook guesses for each column. You can override this by setting the values manually. Below, we’ll create a new column called followup consisting of 0 and 1 values, change the data type of the column from integer to boolean (T/F), and also set descriptions for id, pet, and followup. The id column had a description of “Subject ID” from the sim_design function, but properties set using vardesc will override this. You can also add unobserved levels to a factor.
pet_data$followup <- sample(0:1, nrow(pet_data), TRUE)
vardesc <- list(
  dataType = list(followup = "b"),
  description = c(id = "Pet ID",
                  pet = "Pet Type",
                  followup = "Followed up 2 weeks later"
  ),
  levels = list(pet = c(cat = "Cat Owners",
                        dog = "Dog Owners",
                        ferret = "Ferret Owners"),
                followup = c("0" = "No", "1" = "Yes")
  )
)
cb <- codebook(pet_data, name = "pets", vardesc, return = "list")
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
cb
#> Codebook for pets (Psych-DS 0.1.0)
#>
#> Dataset Parameters
#>
#> * name: pets
#> * schemaVersion: Psych-DS 0.1.0
#>
#> Column Parameters
#>
#> * id (string): Pet ID
#> * pet (string): Pet Type
#> * Levels
#> * cat: Cat Owners
#> * dog: Dog Owners
#> * ferret: Ferret Owners
#> * Ordered: FALSE
#> * happy (float): Happiness Score
#> * followup (bool): Followed up 2 weeks later
#> * Levels
#> * 0: No
#> * 1: Yes
#> * Ordered: FALSE
You can change the data type of a column in the codebook, but this won’t affect the data itself unless you set the return argument to “data”. This runs type conversion on each column and gives you a warning if a type can’t be converted.
Note how we had to change the levels for the variable followup because we’re converting its values to boolean (logical) values, and how the names in the levels vector have to be strings.
vardesc$levels$followup <- c("FALSE" = "No", "TRUE" = "Yes")
converted_data <- codebook(pet_data, "pets", vardesc, return = "data")
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
#> Converting pet from int to string
#> Converting followup from int to bool
head(converted_data)
#> id pet happy followup
#> 1 S01 cat 6.013671 TRUE
#> 2 S02 cat 12.887297 TRUE
#> 3 S03 cat 7.531165 TRUE
#> 4 S04 cat 18.117566 TRUE
#> 5 S05 dog 16.261664 TRUE
#> 6 S06 dog 15.948879 FALSE
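The renaming of the followup levels from "0"/"1" to "FALSE"/"TRUE" follows from how R itself converts integers to logical values. The sketch below uses only base R and is an illustration of that conversion, not the code that codebook() runs internally; the vector and the names old_levels and new_levels are made up for the example.

```r
followup <- c(0L, 1L, 1L, 0L)

# Integer 0/1 converts to logical FALSE/TRUE
followup_lgl <- as.logical(followup)
followup_lgl

# Level names are matched against the character form of the values,
# so after conversion they must read "FALSE"/"TRUE" rather than "0"/"1"
new_names <- as.character(sort(unique(followup_lgl)))
new_names

# Re-key an existing levels vector to the converted values
old_levels <- c("0" = "No", "1" = "Yes")
new_levels <- setNames(unname(old_levels), new_names)
new_levels
```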
The codebook is attached to the returned converted data as an attribute and can be accessed as follows.
attr(converted_data, "codebook")
#> Codebook for pets (Psych-DS 0.1.0)
#>
#> Dataset Parameters
#>
#> * name: pets
#> * schemaVersion: Psych-DS 0.1.0
#>
#> Column Parameters
#>
#> * id (string): Pet ID
#> * pet (string): Pet Type
#> * Levels
#> * cat: Cat Owners
#> * dog: Dog Owners
#> * ferret: Ferret Owners
#> * Ordered: FALSE
#> * happy (float): Happiness Score
#> * followup (bool): Followed up 2 weeks later
#> * Levels
#> * FALSE: No
#> * TRUE: Yes
#> * Ordered: FALSE
You can set variable parameters other than name, type, description, and levels. The variable parameters that Psych-DS currently supports are: “description”, “privacy”, “dataType”, “propertyID”, “minValue”, “maxValue”, “levels”, “ordered”, “na”, “naValues”, “alternateName”, and “unitCode”. See the technical specs for descriptions of these properties. You can also add custom parameters, but you will get a warning.
cb <- codebook(pet_data, vardesc = list(new_param = c(id = "YES")))
#> Warning in codebook(pet_data, vardesc = list(new_param = c(id = "YES"))): The following variable properties are not standard: new_param
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
#> followup set to dataType int
If you have a column that is an ordered factor, the codebook will look like this:
dat <- data.frame(
  initial = sample(LETTERS, 10)
)
dat$initial <- factor(dat$initial, levels = LETTERS, ordered = TRUE)
alevels <- paste("The letter", LETTERS)
names(alevels) <- LETTERS
codebook(dat, vardesc = list(levels = list(initial = alevels)), return = "list")
#> initial set to dataType string
#> Codebook for dat (Psych-DS 0.1.0)
#>
#> Dataset Parameters
#>
#> * name: dat
#> * schemaVersion: Psych-DS 0.1.0
#>
#> Column Parameters
#>
#> * initial (string)
#> * Levels
#> * A: The letter A
#> * B: The letter B
#> * C: The letter C
#> * D: The letter D
#> * E: The letter E
#> * F: The letter F
#> * G: The letter G
#> * H: The letter H
#> * I: The letter I
#> * J: The letter J
#> * K: The letter K
#> * L: The letter L
#> * M: The letter M
#> * N: The letter N
#> * O: The letter O
#> * P: The letter P
#> * Q: The letter Q
#> * R: The letter R
#> * S: The letter S
#> * T: The letter T
#> * U: The letter U
#> * V: The letter V
#> * W: The letter W
#> * X: The letter X
#> * Y: The letter Y
#> * Z: The letter Z
#> * Ordered: TRUE
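The Ordered: TRUE line reflects the column being an ordered factor. As a small base R sketch of what that means (independent of faux, with a shortened made-up vector), an ordered factor supports comparisons between its values, and the names of the levels vector line up with the factor levels:

```r
# A short ordered factor using the same levels as above
initial <- factor(c("C", "A", "B"), levels = LETTERS, ordered = TRUE)

# Ordered factors report TRUE here; plain factors report FALSE
is.ordered(initial)
is.ordered(factor(c("C", "A", "B")))

# Comparisons follow the order of the levels, so "C" > "A"
initial[1] > initial[2]

# setNames() builds the named levels vector in one step
alevels <- setNames(paste("The letter", LETTERS), LETTERS)

# The names of the descriptions match the factor levels exactly
identical(names(alevels), levels(initial))
```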
You can add extra parameters to the whole data set. Psych-DS supports the following: “license”, “author”, “citation”, “funder”, “url”, “sameAs”, “keywords”, “temporalCoverage”, “spatialCoverage”, “datePublished”, “dateCreated”. As with variable parameters, you can add custom parameters and will just get a warning.
codebook(pet_data, license = "CC-BY 3.0", author = "Lisa DeBruine", source = "faux")
#> Warning in codebook(pet_data, license = "CC-BY 3.0", author = "Lisa DeBruine", : The following dataset properties are not standard: source
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
#> followup set to dataType int
#> {
#> "@context": "https://schema.org/",
#> "@type": "Dataset",
#> "name": "pet_data",
#> "schemaVersion": "Psych-DS 0.1.0",
#> "license": "CC-BY 3.0",
#> "author": "Lisa DeBruine",
#> "source": "faux",
#> "variableMeasured": [
#> {
#> "@type": "PropertyValue",
#> "name": "id",
#> "description": "Subject ID",
#> "dataType": "string"
#> },
#> {
#> "@type": "PropertyValue",
#> "name": "pet",
#> "description": "pet",
#> "levels": {
#> "cat": "Cat Owners",
#> "dog": "Dog Owners"
#> },
#> "dataType": "string",
#> "levelsOrdered": false
#> },
#> {
#> "@type": "PropertyValue",
#> "name": "happy",
#> "description": "Happiness Score",
#> "dataType": "float"
#> },
#> {
#> "@type": "PropertyValue",
#> "name": "followup",
#> "description": "followup",
#> "dataType": "int"
#> }
#> ]
#> }
#>
You can also add parameters as lists.
dat <- sim_design(plot = FALSE)
author_list <- list(
  list(
    "@type" = "Person",
    "name" = "Melissa Kline"
  ),
  list(
    "@type" = "Person",
    "name" = "Lisa DeBruine"
  )
)
codebook(dat, return = "list",
         author = author_list,
         keywords = c("test", "demo"))
#> id set to dataType string
#> y set to dataType float
#> Codebook for dat (Psych-DS 0.1.0)
#>
#> Dataset Parameters
#>
#> * name: dat
#> * schemaVersion: Psych-DS 0.1.0
#> * author:
#> 1.
#> * @type: Person
#> * name: Melissa Kline
#> 2.
#> * @type: Person
#> * name: Lisa DeBruine
#> * keywords:
#> 1. test
#> 2. demo
#>
#> Column Parameters
#>
#> * id (string)
#> * y (float): value
You can run the codebook function on existing data that was not created in faux, but you will need to manually input column descriptions and factor levels.
vardesc <- list(
  description = list(
    mpg = "Miles/(US) gallon",
    cyl = "Number of cylinders",
    disp = "Displacement (cu.in.)",
    hp = "Gross horsepower",
    drat = "Rear axle ratio",
    wt = "Weight (1000 lbs)",
    qsec = "1/4 mile time",
    vs = "Engine",
    am = "Transmission",
    gear = "Number of forward gears",
    carb = "Number of carburetors"
  ),
  # min and max values can be set manually or from data
  # min and max are often outside the observed range
  minValue = list(mpg = 0, cyl = min(mtcars$cyl)),
  maxValue = list(cyl = max(mtcars$cyl)),
  dataType = list(
    cyl = "integer",
    hp = "integer",
    vs = "integer",
    am = "integer",
    gear = "integer",
    carb = "integer"
  ),
  # supply levels to mark factors
  levels = list(
    vs = c("0" = "V-shaped", "1" = "straight"),
    am = c("0" = "automatic", "1" = "manual")
  )
)
codebook(mtcars, "Motor Trend Car Road Tests",
         vardesc, return = "list")
#> mpg set to dataType float
#> disp set to dataType float
#> drat set to dataType float
#> wt set to dataType float
#> qsec set to dataType float
#> Codebook for Motor Trend Car Road Tests (Psych-DS 0.1.0)
#>
#> Dataset Parameters
#>
#> * name: Motor Trend Car Road Tests
#> * schemaVersion: Psych-DS 0.1.0
#>
#> Column Parameters
#>
#> * mpg (float): Miles/(US) gallon
#> * cyl (int): Number of cylinders
#> * disp (float): Displacement (cu.in.)
#> * hp (int): Gross horsepower
#> * drat (float): Rear axle ratio
#> * wt (float): Weight (1000 lbs)
#> * qsec (float): 1/4 mile time
#> * vs (int): Engine
#> * Levels
#> * 0: V-shaped
#> * 1: straight
#> * Ordered: FALSE
#> * am (int): Transmission
#> * Levels
#> * 0: automatic
#> * 1: manual
#> * Ordered: FALSE
#> * gear (int): Number of forward gears
#> * carb (int): Number of carburetors
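When a data set has many numeric columns, typing each minValue and maxValue by hand gets tedious. One possible approach, sketched here with base R only, is to compute the observed ranges for every column and then overwrite individual entries where a theoretical bound is known. The name vardesc_range is illustrative; the resulting list has the same named-list shape as the minValue and maxValue entries in the example above.

```r
# Observed range of every column in mtcars (all columns are numeric)
min_vals <- lapply(mtcars, min)
max_vals <- lapply(mtcars, max)

vardesc_range <- list(
  minValue = min_vals,
  maxValue = max_vals
)

# Override with a theoretical bound where one is known:
# miles per gallon cannot be below zero
vardesc_range$minValue$mpg <- 0

# Inspect two of the entries
str(vardesc_range$minValue[c("mpg", "cyl")])
```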
There is an experimental argument to edit the codebook interactively. It runs the codebook function, then asks you to confirm types and edit descriptions. Only run this in the console, not in an RMarkdown file or a script meant to be run non-interactively.
cb <- codebook(mtcars, interactive = TRUE)