10 Datatypes, Structures, and Base Functions
10.1 Datatypes
There are 5 basic datatypes in R: double, integer, complex, logical, and character.
There is also NULL.
In R, there is no “string” datatype. Instead it is “character”.
Examples:
What are the datatypes of the following?
typeof(3.14)
typeof(1L)
typeof(TRUE)
typeof("banana")
typeof(NULL)
10.2 Datatypes and classes
We can also use the class() function:
typeof(3.14)
[1] "double"
class(3.14)
[1] "numeric"
typeof(1L)
[1] "integer"
class(1L)
[1] "integer"
typeof(TRUE)
[1] "logical"
class(TRUE)
[1] "logical"
typeof("banana")
[1] "character"
class("banana")
[1] "character"
typeof(NULL)
[1] "NULL"
class(NULL)
[1] "NULL"
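The relationship between typeof() and class() can be sketched with a few extra values. This is a minimal illustration, and the variable name z is hypothetical rather than something used elsewhere in the chapter.

```r
# typeof() reports how R stores a value internally;
# class() reports how R treats the value when functions are applied to it.
z <- 2 + 3i          # a complex number, the one basic type not shown above
typeof(z)            # "complex"
class(z)             # "complex"

# Doubles and integers have different types, but both count as numbers.
is.numeric(3.14)     # TRUE
is.numeric(1L)       # TRUE
is.numeric("3.14")   # FALSE: the quotes make it a character value

# NULL is an empty object of length zero.
length(NULL)         # 0
```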
10.3 Data structures
The basic data structures used in R include:
- vectors
- matrices
- arrays
- factors
- dataframes
- lists
Vectors:
In R, a vector is an ordered collection of values of the same type, such as numbers or character strings. Vectors are created with:
- c() to concatenate elements
- rep() to repeat elements or patterns
- seq() or m:n to generate sequences
Most mathematical functions and operators can be applied to vectors without loops.
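As a rough sketch of what "without loops" means, the block below applies arithmetic to every element of a vector in one step. The vector temps_c and its values are made up for illustration.

```r
# Arithmetic is applied element by element, so no loop is needed.
temps_c <- c(12.5, 15.0, 9.5, 20.0)   # hypothetical temperatures in Celsius
temps_f <- temps_c * 9 / 5 + 32       # converts every element at once
temps_f                               # 54.5 59.0 49.1 68.0

# Functions and operators work the same way.
sqrt(c(1, 4, 9))                      # 1 2 3
c(1, 2, 3) + c(10, 20, 30)            # 11 22 33
```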
Examples:
u <- c(1, 1, 3, 2, 6, 1, 8, 4, 4)
u
v <- rep(4, 10)
v
w <- seq(1, 15, by=1) # seq(15)
w
x <- 1:20
x
What does the “c” stand for and why do we need it?
Vectors & Indices:
Select the elements from the vector:
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
u[1]
u[2]
u[3]
u[4]
u[5]
u[6]
u[7]
u[8]
u[9]
How to select multiple elements?
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
u[2:5]
10.4 Data structures: Vectors & indices
How to select the 2nd and 5th elements?
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
u[c(2,5)]
How to select the last element in a vector?
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
tail(u, 1)
Remove elements from a vector:
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
u[-1]
[1] 8 4 4 3 5 2 6 3
u[-2]
[1] 1 4 4 3 5 2 6 3
u[-3]
[1] 1 8 4 3 5 2 6 3
u[-4]
[1] 1 8 4 3 5 2 6 3
Remove multiple elements from a vector:
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
u[-1:-3]
[1] 4 3 5 2 6 3
u[c(-2,-5)]
[1] 1 4 4 5 2 6 3
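Besides positions, a vector can also be indexed by a condition. The sketch below is a minimal example of that idea using the same vector u. The name keep is an illustrative choice, not something defined elsewhere in the chapter.

```r
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)

# A comparison on a vector returns one TRUE/FALSE per element.
keep <- u > 3
keep            # FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE

# Using that logical vector as an index keeps only the TRUE positions.
u[keep]         # 8 4 4 5 6
u[u > 3]        # same result in one step

# length() gives the last position, an alternative to tail(u, 1).
u[length(u)]    # 3
```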
10.5 Data structures: Matrices
In R, a matrix is a two-dimensional data structure where all elements are of the same type.
What is the dimension of this matrix?
mat1 <- matrix(1:6, nrow = 2, ncol = 3)
print(mat1)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Select elements:
mat1 <- matrix(1:6, nrow = 2, ncol = 3)
print(mat1)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
mat1[2,3]
[1] 6
Select rows and columns:
mat1 <- matrix(1:6, nrow = 2, ncol = 3)
print(mat1)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
mat1[ ,3]
[1] 5 6
mat1[2, ]
[1] 2 4 6
Remove rows, and columns:
mat1 <- matrix(1:6, nrow = 2, ncol = 3)
print(mat1)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
mat1[ ,-3]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
mat1[-2, ]
[1] 1 3 5
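To answer the earlier question about the dimension of mat1 directly in code, dim() can be used. The sketch below also shows the byrow argument and drop = FALSE. The name mat2 is hypothetical and introduced only for this example.

```r
mat1 <- matrix(1:6, nrow = 2, ncol = 3)
dim(mat1)    # 2 3  (rows first, then columns)
nrow(mat1)   # 2
ncol(mat1)   # 3

# By default matrix() fills column by column; byrow = TRUE fills row by row.
mat2 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
mat2[1, ]    # 1 2 3
mat2[2, 3]   # 6

# Removing a row from a 2-row matrix returns a plain vector;
# drop = FALSE keeps the result as a 1 x 3 matrix instead.
one_row <- mat1[-2, , drop = FALSE]
dim(one_row) # 1 3
```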
10.6 Data structures: Arrays
Arrays can take values of any base data type and span any number of dimensions. However, all values must be of the same base data type. This allows for efficient calculation and matrix mathematics.
# Creating a 3-dimensional array
arr1 <- array(1:12, dim = c(2, 3, 2))
print(arr1)
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
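Array elements are selected the same way as matrix elements, with one index per dimension. The following is a small sketch using arr1 from the example above.

```r
arr1 <- array(1:12, dim = c(2, 3, 2))

dim(arr1)        # 2 3 2  (rows, columns, layers)
arr1[2, 1, 2]    # 8: row 2, column 1 of the second layer
arr1[, , 2]      # the whole second layer, a 2 x 3 matrix
arr1[1, , 1]     # 1 3 5: first row of the first layer
```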
10.7 The str() and class() functions
The str() function will compactly display the structure of an object.
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
mat1 <- matrix(1:6, nrow = 2, ncol = 3)
arr1 <- array(1:12, dim = c(2, 3, 2))
str(u)
 num [1:9] 1 8 4 4 3 5 2 6 3
str(mat1)
 int [1:2, 1:3] 1 2 3 4 5 6
str(arr1)
 int [1:2, 1:3, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
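The class() function is related to str() but answers a narrower question: it returns only the name of the object's class (its data structure), whereas str() also shows the element type, the dimensions, and the first few values.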
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
my_3d_array <- array(1:12, dim = c(2, 3, 2))
class(u)
[1] "numeric"
class(my_matrix)
[1] "matrix" "array"
class(my_3d_array)
[1] "array"
10.8 Converting character vectors to numeric
u <- c("1", "8", "4", "4", "3", "5", "2", "6", "3")
class(u)
[1] "character"
num.u <- as.numeric(u)
Always remember to check and verify:
u <- c("1", "8", "4", "4", "3", "5", "2", "6", "3")
class(u)
[1] "character"
num.u <- as.numeric(u)
class(num.u)
[1] "numeric"
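Checking the result matters most when some entries cannot be converted. The sketch below assumes a made-up vector, mixed, that contains one non-numeric entry. suppressWarnings() is used only to keep the example quiet and is not part of the conversion itself.

```r
mixed <- c("1", "8", "four", "3")   # hypothetical input with one bad entry

# as.numeric() cannot parse "four", so it returns NA there and issues a warning.
num.mixed <- suppressWarnings(as.numeric(mixed))
num.mixed                # 1 8 NA 3

# is.na() locates the entries that failed to convert.
which(is.na(num.mixed))  # 3
mixed[is.na(num.mixed)]  # "four"
```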
10.9 Converting numeric vectors to character
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
class(u)
[1] "numeric"
char.u <- as.character(u)
Always remember to check and verify:
u <- c(1, 8, 4, 4, 3, 5, 2, 6, 3)
class(u)
[1] "numeric"
char.u <- as.character(u)
class(char.u)
[1] "character"
char.u
[1] "1" "8" "4" "4" "3" "5" "2" "6" "3"
10.10 Factors
Factors are a data structure used for representing categorical data. They are particularly useful for storing data that falls into a fixed number of unique categories or levels, such as gender, education level, or survey responses.
car.col <- factor(c("red", "blue", "green", "blue", "red"))
car.col
[1] red blue green blue red
Levels: blue green red
Factors have a set of unique values called levels. These represent the different categories in the data.
car.col <- factor(c("red", "blue", "green", "blue", "red"))
str(car.col)
 Factor w/ 3 levels "blue","green",..: 3 1 2 1 3
car.col <- factor(c("red", "blue", "green", "blue", "red"))
levels(car.col)
[1] "blue" "green" "red"
Suppose we didn’t use “factor” when the vector was created, such as:
car.col <- c("red", "blue", "green", "blue", "red")
str(car.col)
 chr [1:5] "red" "blue" "green" "blue" "red"
To change the vector type from a “chr” (character) to a “factor”, do the following:
car.col.fac <- factor(car.col)
# Then verify the change has been made.
str(car.col.fac)
 Factor w/ 3 levels "blue","green",..: 3 1 2 1 3
Why change from a character vector to a factor vector?
A “factor” is a statistical data type used to store categorical variables.
When a vector is defined as a factor, each unique value is automatically assigned a level (category). This makes data analysis and plotting much easier.
10.11 Add a level to a factor
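# Append the new level to the existing ones. Assigning a completely new
# vector to levels() would rename the existing levels and change the data.
levels(car.col.fac) <- c(levels(car.col.fac), "yellow")
levels(car.col.fac)
[1] "blue" "green" "red" "yellow"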
10.12 Change the order of levels
car.col.fac <- factor(car.col.fac, levels=c("blue", "green", "red", "yellow"))
levels(car.col.fac)
[1] "blue" "green" "red" "yellow"
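After changing levels it is worth confirming that the underlying values are still what you expect. The sketch below assumes the new level is added by appending to levels(), which is one way to do it, and then inspects the factor with table(), as.character(), and as.integer().

```r
car.col.fac <- factor(c("red", "blue", "green", "blue", "red"))
levels(car.col.fac) <- c(levels(car.col.fac), "yellow")

# table() counts every level, including levels with no observations.
table(car.col.fac)          # blue 2, green 1, red 2, yellow 0

# The values themselves are unchanged by appending a level.
as.character(car.col.fac)   # "red" "blue" "green" "blue" "red"

# Internally a factor stores integer codes that point at the levels.
as.integer(car.col.fac)     # 3 1 2 1 3
```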
10.13 Dataframes
A dataframe is a two-dimensional, table-like data structure in R, where each column can contain different types of data (numeric, character, factor, etc.), similar to a spreadsheet.
We will learn how to create a dataframe from scratch, and also how to import datasets into our R session as a dataframe for data analysis.
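As a preview, a dataframe can be built from scratch with data.frame(). The sketch below uses a hypothetical dataframe called pets with made-up values, and it assumes R 4.0 or later, where character columns stay as character by default.

```r
# Each argument to data.frame() becomes one column;
# all columns must have the same length.
pets <- data.frame(
  name   = c("Milo", "Luna", "Rex"),   # character column
  age    = c(3, 5, 2),                 # numeric column
  indoor = c(TRUE, TRUE, FALSE)        # logical column
)

str(pets)          # one line per column, showing its type
pets$age           # 3 5 2
pets[2, "name"]    # "Luna" (row 2 of the name column)
nrow(pets)         # 3
ncol(pets)         # 3
```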
10.14 Lists
In R, a list is a flexible, one-dimensional data structure that can contain elements of different types and structures, including vectors, matrices, dataframes, functions, or even other lists.
A simple example:
my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(90, 85, 88),
  Info = data.frame(Course = c("Math", "Science"), Grade = c("A", "B"))
)
print(my_list)
$Name
[1] "Alice"

$Age
[1] 25

$Scores
[1] 90 85 88

$Info
   Course Grade
1    Math     A
2 Science     B
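Individual list elements can be pulled out by name. The sketch below uses a shortened version of my_list (without the Info dataframe) to show the difference between $, [[ ]], and [ ].

```r
my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(90, 85, 88)
)

my_list$Name           # "Alice"
my_list[["Scores"]]    # 90 85 88: the element itself
my_list["Scores"]      # a list of length 1 that contains the vector
mean(my_list$Scores)   # 87.66667
names(my_list)         # "Name" "Age" "Scores"
```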
10.15 Some base R functions
Try these important base R functions!
x <- c(42, 22, 31, 66, 11, 45, 39, 27, 25, 44)
How would we calculate the:
- length of x?
- sum of x?
- minimum value of x?
- maximum value of x?
- range of x?
- mean of x?
- standard deviation of x?
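One way to check your answers to the list above is sketched here. Each quantity has a base R function of its own.

```r
x <- c(42, 22, 31, 66, 11, 45, 39, 27, 25, 44)

length(x)   # 10
sum(x)      # 352
min(x)      # 11
max(x)      # 66
range(x)    # 11 66 (minimum and maximum together)
mean(x)     # 35.2
sd(x)       # sample standard deviation
```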
10.16 Missing data
What if your data contains NA (not available, missing data)?
s <- c(1, 1, 3, 2, 6, 1, 8, NA, 4, 4)
mean(s, na.rm=TRUE)
[1] 3.333333
round(mean(s, na.rm=TRUE))
[1] 3
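To see why na.rm = TRUE is needed, compare the result with and without it. The sketch below also uses is.na() to find and count the missing values.

```r
s <- c(1, 1, 3, 2, 6, 1, 8, NA, 4, 4)

mean(s)                  # NA: one missing value makes the result missing
is.na(s)                 # TRUE only at position 8
sum(is.na(s))            # 1 missing value
mean(s, na.rm = TRUE)    # 3.333333
length(s)                # 10: length() still counts the NA
```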
To get help with an R function such as round(), type the following:
> ?round
In RStudio, this code opens a Help window in the bottom-right corner of your screen by default. You can also type ‘help’, but remember to use parentheses around your search term:
> help(round)
10.17 Reflection questions
What’s the difference between a “character” and a “numeric” datatype in R?
What do you think NULL represents, and how might it differ from NA?
If you had a list of daily temperatures, how might vectors help you analyze them?
What’s the difference between u[2:5], u[c(2,5)], and u[-1:-3]?
Why is it important to know how to access and remove specific elements?
What does mat1[ ,3] mean? What about mat1[2, ]?
What might be a real-world scenario where a 3-dimensional array is more useful than a matrix?
If you’re unsure what type of structure a dataset is stored in, which function would you try first?
Why is it useful to define levels and order in a factor?
How are dataframes similar to and different from matrices?
Why is it helpful to know basic summary functions like mean(), sd(), or range()?
10.18 Exercises
- What is the type and class of each variable? Use typeof() and class() to find out.
x <- 5L
y <- 5.0
z <- "5"
Explain the difference between typeof() and class() in one sentence.
What type of object is created in each of the following lines? What function should you use to confirm your answer?
a <- c(10, 20, 30)
b <- matrix(1:6, nrow = 2)
c <- array(1:8, dim = c(2, 2, 2))
d <- factor(c("yes", "no", "yes", "maybe"))
e <- data.frame(name = c("Ali", "Fatima"), age = c(22, 23))
f <- list(score = 90, passed = TRUE, details = c("A", "B"))
Describe what c(), rep(), and seq() are used for and what the abbreviations stand for.
Compare these functions. What do they do and how do they differ?
c(1, 2, 3)
rep(1:2, times = 2)
seq(2, 10, by = 2)
- Create the vector grades <- c(80, 70, 85, 90, 75) and then:
- Use indexing to select the first value
- Select the last two values
- Remove the third value
- Select all values greater than 75
- Logan runs this code but doesn’t understand the difference between the two lines. Help him, and then explain when you would use tail() instead of indexing.
x <- c(2, 4, 6, 8, 10, 12)
tail(x, 3)
x[4:6]
- You want to create the following 3x2 matrix using matrix() and access different elements. Use this data:
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
Then write code to:
- Select the second row
- Select the first column
- Select the element in row 2, column 2
- Create a 3-dimensional array with array() using values 1 to 12 and dimensions 2x2x3. Then:
- Use dim() and str() to inspect it
- What is one key difference between the two outputs?
- Julia wants to compare str() and class() but uses them wrong. What is the mistake in this code? Fix the code and explain when to use each function.
df <- data.frame(name = c("A", "B"), age = c(21, 22))
str(df$name)
class(df$name)
- Which lines will return an error? And why?
as.numeric("5")
as.integer("apple")
as.character(10.5)
- You create this factor:
colors <- factor(c("red", "blue", "red", "green", "blue"))
Now:
- Add the level “yellow”
- Reorder the levels to: “blue”, “green”, “red”, “yellow”
What will the output of levels(colors) look like now?
When is it appropriate to create a factor? Give one short example of where using a factor is better than using a character vector.
Create a dataframe manually using this code, then make the following changes WITHOUT modifying the provided code (i.e., do not just edit the code chunk; add new lines).
df <- data.frame(name = c("Lina", "Omar", "Zoe"),
                 age = c(23, 25, 22),
                 city = c("Toronto", "Montreal", "Ottawa"))
- Change Omar’s city to “Vancouver”
- Add a new row with name = “Ali”, age = 26, city = “Calgary”
- Use str() on the updated dataframe
- Use this vector and apply the following functions with and without na.rm = TRUE:
x <- c(4, 7, NA, 2, 9)
- length(x)
- sum(x)
- range(x)
- mean(x)
Use ?mean and help(mean) to look up the function. What is the difference between the two?
What does “base R” refer to?
Which function belongs to base R and which does not?
library(ggplot2)
mean(c(1, 2, 3))
- A student runs the code below to select multiple items but gets an error instead. What’s wrong? Fix it so it selects the first and third values.
v <- c("apple", "banana", "cherry", "date")
v[1, 3]
- Write a line of code to remove the second and fourth elements from the vector. What does the final vector contain?
fruits <- c("apple", "banana", "cherry", "date")
- This matrix is created incorrectly. Fix the dimensions so it creates a matrix with 2 rows and 3 columns, filled row-wise. Then select the element in row 1, column 2.
m <- matrix(1:6, nrow = 3, ncol = 2, byrow = FALSE)
- Create a 2x2x2 array using values from 10 to 17 (inclusive). Then write code to:
- Print the array
- Extract the entire second matrix (along the third dimension)
- Extract the element in position [2, 1, 2]
- What does this code print, and why? Explain what each function is doing, then convert the factor a to a character vector.
a <- factor(c("low", "medium", "high", "low"))
levels(a)
str(a)
- Convert the following character vector into a numeric one. Then calculate its mean. What happens, and how can you fix it?
nums <- c("5", "7", "10", "apple", "12")
- You try to use this line to create a factor, but something goes wrong. What’s the issue, and how do you fix it?
f <- factor(1, 2, 3)
Hint: Run it and inspect the result using levels(f) and str(f).
- You are trying to filter values using > but something isn’t working. Run this code and fix it so it returns all values greater than 3.
x <- c("1", "3", "5", "7")
x > 3
- You see this line of code in a tutorial. What is it trying to do? Break it down and explain each part:
str(data.frame(x = 1:3, y = c(TRUE, FALSE, TRUE)))
- Jamal wants to create a sequence from 20 down to 0, decreasing by 5 each time. He tries seq(0, 20, 5) but it gives the wrong result. What is the correct code, and why didn’t his attempt work?