R Basics

Author

Affiliation

Dr Wei Miao

UCL School of Management

Published

September 27, 2024

Modified

October 21, 2024

1 Hello R

1.1 Bilingual arrangements at MSc BA

Primary language is Python
- Programming (MSIN00143), Business Strategy (MSIN0093), Machine Learning electives
Secondary language is R
- Marketing Analytics (MSIN0094), Statistical Foundations (MSIN0096)

1.2 A brief history of R

R project was initiated by Robert Gentleman and Ross Ihaka (Univ of Auckland) in 1991; both are statisticians, who later made the language open-source.
Since 1997, R has been developed by the R Core Team on CRAN.
As of January 2022, it has almost 20k contributed packages.
As of 2024, R is ranked 18th in the TIOBE index¹.

1.3 Why learn R?

Highly powerful data analytics and visualizations, including²
- Data wrangling (dplyr) and data visualization (ggplot)
- Statistics and Econometrics (major advantage of R over Python)
- Predictive analytics such as machine learning
Write beautiful reports, dissertations, presentations using Quarto
- Write your MSc dissertation
- Effortlessly build websites. I built and maintain my personal website and the marketing course website all in R.

1.4 One-One comparison with Python

As you will be learning Python in the programming course, it’s good to know the differences between R and Python. In addition to the general comparison below, I have also prepared a detailed side-by-side comparison of R and Python here.
It’s highly recommended that when you learn both languages at the same time, you should be able to compare them side-by-side often.

R versus Python
	R	Python
Language purpose	R is a statistical language specialized in the data analytics and visualization. Best for data science, may not be robust for production environment.	Python is a general-purpose language that is used for the deployment and development of various projects. Best for production environment.
Data analytics	R is better at statistical models and econometrics.	Python is better at machine learning due to support from PyTorch and TensorFlow.
IDEs (Integrated Development Environment)	RStudio	Many options such as Jupyter Notebook, Spyder, Pycharm, etc.
Targeted users	Primary users of R include data scientists and researchers in academia, who heavily rely on data analyses and visualization.	Primary users of python include developers and programmers.

1.5 A first look at the RStudio Interface

R is the programming language, and we need a “place” to write codes. This place is called an Integrated development environment (IDE).

RStudio is so far the best R IDE. And it’s interface consists of the following major panels (clockwise from top left):

script: (top left) where you do the coding
environment: (top right) a list of named objects that we have generated
history: (top right) the list of past commands that we have used
help: (bottom right) user manuals of functions available in R
package: (bottom right) a collection of ready-to-use packages written by others
console: (bottom left) where you can run commands interactively with R and see code outputs

1.6 Where to write R codes (I): Console

You can write codes interactively in the R console. See an example: Type the following code into your console and see what happens.
Code
```
print("Hello World")
```
```
[1] "Hello World"
```
Used for simple exploratory, unstructured tasks, where you don’t need to keep a record of codes.
- e.g., summary statistics; check variable values, etc.

1.7 Where to write R codes (II): `.qmd` script

Quarto³ files have a .qmd suffix. You can think of Quarto as Microsoft Word that can run R codes.
If you have experience with Jupyter Notebook, Quarto is the R equivalent of Jupyter Notebook, but just much more powerful.
Quarto can create dynamic contents with R, conveniently combining data analytics work with beautiful reporting.
Now, let’s create a new quarto file together! Name it “MyFavoriteShow.qmd” and save it to your download folder.

2 Introduction to Quarto

2.1 YAML header

You can think of YAML header as a MS Word template, which determines how your final report looks like (font, font size, color, margins, etc.).
The YAML header is always at the beginning of a document, separated from the main text by three dashes (---). YAML does not appear in the final report.

2.2 Authoring with normal texts

RStudio provides two ways to edit a quarto file (1) visual mode and (2) source mode.

RStudio’s visual editor offers a Microsoft-Word like experience for you to write R codes.
- Explore the rich formatting tools available for report authoring
If you are familiar with markdown syntax, you can use the source mode to write the report (optional; for advanced users only).

Exercise

Create a new quarto file from RStudio with the following level-1 and level-2 headers

Level 1: Slowhorse Season 4

Level 2: Episode 1: Identity Theft

Body: A London bombing puts Taverner under pressure. When River grows concerned for his grandfather, Louisa encourages him to go for a visit.

2.3 Coding with code blocks

In qmd files, we write R codes in so-called code chunks (sometimes referred to as code cells or code blocks) identified with {r}.
To insert a code chunk, click Insert ->Code Chunk -> R. You can also use the shortcut Ctrl + Alt + I or Cmd + Option + I.

Caveat

Ensure the first line remains {r} only and do not include any comments or code on this line.

You can run each code chunk interactively by clicking the green solid triangle (run current code chunk). RStudio executes the codes in the code chunk and displays the results.
See an example and try on your computer!

Code

print("R is the Best Language! Way better than Python! The battle is on!")

[1] "R is the Best Language! Way better than Python! The battle is on!"

Exercise

Insert the above R code block in your quarto file under any section.

2.4 Rendering a report

At the end, when the Quarto document (including codes and main texts) are ready, use the Render button in the RStudio IDE to render the file.

The rendered report will be in the same folder with your qmd file.

Exercise

Render your quarto file into a document and see how it looks like.

2.5 More learning resources for Quarto (Optional)

The available YAML fields vary based on document format
- Here for YAML fields for PDF documents
- Here for MS Word
- Here for HTML documents
Markdown syntax
- Markdown basics
- Markdown practice
Quarto (recommended to be reviewed after-class)
- Get started

3 Basics of R

3.1 Named objects

R is an object-oriented language, so we will be working on named objects.
We use the left arrow <- to create a named object, the keyboard shortcut for <- for windows users is Alt + -, or for mac users, Option + -.
The <- is an assignment operator, which assigns the R objects on the RHS to the name on the LHS.⁴
The below code creates a new object called ‘x’ in R; x is a numeric object; its value is 3.

Code

# create an object x with value 3
x <- 3

After an object is created, we can refer to the object by its name

Code

# print out x
x

[1] 3

We can also perform operations on the object

Code

# Question: hmmm, why does Wei choose these two numbers?
x^2

[1] 9

Code

x^3

[1] 27

Exercise

Insert a code block in your quarto file, which does the following:

Create an object with name ‘x’ with the formula of 2 + 2

3.2 Rules for naming object

For a variable to be valid, it should follow these rules

It should contain letters, numbers, and only dot or underscore characters.
It cannot start with a number (eg: 2iota), or a dot, or an underscore.

Code

# 2iota <- 2
# .iota <- 2
# _iota <- 2

It should not be a reserved word in R (eg: mean, sum, etc.).

Code

# mean <- 2

Tips

In the future, it’s good practice to use memorable names to name an object

For instance, use prefix “df_” or “data_” to name datasets.

3.3 Functions

A function usually takes (one or several) objects as input, run specific operations on the object(s) defined by the function, and then return an output.
For instance, an R’s built-in function sqrt() takes a number as input, and returns the square root of the number. Let’s use it on object x.

Code

sqrt(x)

[1] 1.732051

We will heavily rely on functions to conduct data analyses. For how to use a new function, search the function in RStudio’s help panel.

Exercise

Search and learn the usage of function “log()”.
Insert a code block in your quarto file to compute the logarithm of x.

3.4 Collection of functions: Packages

The base R already comes with many useful built-in functions to perform basic tasks, but as data scientists, we need more.

To perform certain tasks (such as a machine learning model), we can definitely write our own code from scratch, but it takes lots of (unnecessary) effort. Fortunately, many packages have been written by others for us to directly use.

To download a package, hit Tools -> Install Packages in RStudio, and type the package name in the pop-up window. Now, download the package praise.
To load the packages, we need to type library().

Code

library(praise)

Now that the package is loaded, you can use the functions in it. praise() is a function in the praise package.

Code

praise()

[1] "You are kryptonian!"

Tips

Packages need to be downloaded only once, but need to be loaded every time you restart the RStudio.

3.5 Comment codes

You can put a # before any code, to indicate that any codes after the # on the same line are your comments, and will not be run by R.

It’s a good practice to often comment your codes, so that you can help the future you to remember what you were trying to achieve.

Code

# print("Let's fund Wei for an iPhone 16 Pro Max as a birthday gift!")

Code

# Is x 1 or 2 below?
x <- 1 # +1

4 Data structures

4.1 Data types

4.1.1 Numeric

We can use R as a calculator for numeric objects

Code

# Numeric Vector
num2 <- 2.5
log(num2)

[1] 0.9162907

Code

num2^2

[1] 6.25

Code

exp(num2)

[1] 12.18249

4.1.2 Logical (TRUE, FALSE):

Logical objects are used to store logical values, such as TRUE and FALSE.

Code

num2 <- 2.5

# larger than 2?
num2 > 2

[1] TRUE

Code

# smaller than 2?
num2 < 2

[1] FALSE

Code

# equal to 2?
num2 == 2

[1] FALSE

Code

# not equal to 2?
num2 != 2

[1] TRUE

Sometimes, we may need to operation on multiple relational operations. We can use logical operators to combine multiple relational operations.

Code

T & F # and

[1] FALSE

Code

T | F # or

[1] TRUE

Code

!T # not

[1] FALSE

For instance, we may want to know if a number is between 3 and 8.

Code

num2 >= 3 & num2 <= 8

[1] FALSE

4.1.3 Character:

Characters are enclosed within a pair of quotation marks.
Single or double quotation marks can both work.
Even if a character may contain numbers, it will be treated as a character, and R will not perform any mathematical operations on it.

Code

str1 <- "1 + 1 = 2"

4.2 Check data types using class()

We can use class() to check the type of an object in R.

Code

a <- "1+1"
class(a)

[1] "character"

Code

b <- 1 + 1
class(b)

[1] "numeric"

This is very useful when we first load data from external databases, we need to make sure variables are of the correct data types.

4.3 Data type: conversion

Sometimes, data types of variables from raw data may not be what we want; we need to change the data type of a variable to the appropriate one.

See the following example:

a is a string, and we cannot use mathematical operations on it, or R will report errors.

Code

a <- "1"
class(a)

[1] "character"

Code

a + 1

Error in a + 1: non-numeric argument to binary operator

We can convert a to a numeric value. To convert from character to numeric, we use as.numeric()

Code

b <- as.numeric(a)
class(b)

[1] "numeric"

5 Vectors

Below are the complete list of objects in R.

Visualization of data structures .

We will use vector and matrix most frequently in this course.

5.1 Creating vectors

5.1.1 Creating vectors: c()

In R, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
Vector can be created using the function c() by listing all the values in the parenthesis, separated by comma ‘,’.
c() stands for “combine”.

Code

Income <- c(1, 3, 5, 10)
Income

[1]  1  3  5 10

Vectors must contain elements of the same data type. If they do not, R will automatically convert all elements into the same type, typically characters.

Code

x <- c(1, "intro", TRUE)
class(x)

[1] "character"

5.1.2 Checking the number of elements in a vector: length()

You can measure the length of a vector using the command length()

Code

x <- c("R", " is", " the", " best", " language")
length(x)

[1] 5

5.1.3 Creating numeric sequences: seq()

It is also possible to easily create sequences with patterns

use seq() to create sequence with fixed steps

Code

# use seq()
seq(from = 1, to = 2, by = 0.1)

 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

If the step is 1, there’s a convenient way using :

Code

1:5

[1] 1 2 3 4 5

5.1.4 Concatenate multiple vectors into one: c()

Sometimes, we may want to concatenate multiple vectors into one. For instance, we may have collected income data from two different sources, and we want to concatenate them into one vector.
We can use c() to concatenate different vectors; this is very commonly used to concatenate vectors.

Code

Income1 <- 1:3
Income2 <- c(10, 15)

Code

c(Income1, Income2)

[1]  1  2  3 10 15

Exercise

Create a sequence of {1,1,2,2,3,3,3}.

5.2 Indexing and subsetting

We put the index of elements we would like to extract in a square bracket [ ].

Code

# create a vector of income for 4 lecturers at UCL

income <- c(5000, 5500, 6000, 9000)

Extract a single element: use the index of the element

Code

# what is the income of the 3rd lecturer?
income[3]

[1] 6000

Extract multiple elements: use a vector of indices

Code

# what are the incomes of the 1st, 3rd, and 4th lecturers?
income[c(1, 3, 4)]

[1] 5000 6000 9000

5.3 Element-wise arithmetic operations

R is a vectorized language, which broadcasts operations to all elements in a vector. This behavior is also called element-wise operations, or broadcasting.

If you operate on a vector with a single number, the operation will be applied to all elements in the vector

Code

x <- c(1, 3, 8, 7)

Code

x + 2

[1]  3  5 10  9

Code

x * 2

[1]  2  6 16 14

Exercise

Create a geometric sequence {2,4,8,16,32} using seq().

5.4 Elementwise relational operations

Besides arithmetic operations, we can also perform relational operations on vectors.

Code

x <- c(1, 3, 8, 7)
x > 2

[1] FALSE  TRUE  TRUE  TRUE

We can also compare a vector with a vector, because R is vectorized

Code

incomeUCL <- c(6000, 4600, 7000, 9100, 10000)
incomeImperial <- c(5000, 4500, 6000, 9000, 10000)
incomeUCL > incomeImperial

[1]  TRUE  TRUE  TRUE  TRUE FALSE

5.5 Special relational operation: `%in%`

A special relational operation is %in% in R, which tests whether an element exists in the object.

Code

x <- c(1, 3, 8, 7)

3 %in% x

[1] TRUE

Code

2 %in% x

[1] FALSE

5.6 After-class exercise

Create a vector of 10 numbers from 1 to 10, and extract the 2nd, 4th, and 6th elements.
Create a vector of 5 numbers from 1 to 5, and check if 3 is in the vector.
Now the interest rate is 0.1, and you have 1000 pounds in your bank account. Calculate the amount in your bank account after 1 year, 2 years, and 3 years, respectively.

6 Matrices

6.1 Matrices: creating matrices

6.1.1 Creating matrices: `matrix()`

A matrix can be created using the command matrix()
- the first argument is the vector to be converted into matrix
- the second argument is the number of rows
- the last argument is the number of cols

Code

matrix(1:9, nrow = 3, ncol = 3)

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Important

R by default inserts elements vertically by columns.

R will fill in the matrix by column and discard the remaining extra elements once fully filled, with a warning message

Code

matrix(1:9, nrow = 3, ncol = 2)

Warning in matrix(1:9, nrow = 3, ncol = 2): data length [9] is not a
sub-multiple or multiple of the number of columns [2]

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

6.1.2 Creating matrices: inserting by row

However, we can ask R to insert by rows by setting the byrow argument.

Code

matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

6.1.3 Creating matrices: concatenate matrices `cbind()` and `rbind()`

We can use cbind() and rbind() to concatenate vectors and matrices into new matrices.

cbind() does the column binding

Code

a <- matrix(1:6, nrow = 2, ncol = 3)

a

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Code

cbind(a, a) # column bind

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    1    3    5
[2,]    2    4    6    2    4    6

rbind() does the row binding

Code

rbind(a, a) # row bind

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    1    3    5
[4,]    2    4    6

6.2 Matrices: indexing and subsetting

Matrices have two dimensions: rows and columns. Therefore, to extract elements from a matrix, we just need to specify which row(s) and which column(s) we want.

Code

x <- matrix(1:9, nrow = 3, ncol = 3)
x

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Extract the element in the 2nd row, 3rd column.
- use square bracket with a coma inside [ , ] to indicate subsetting; the argument before coma is the row index, and the argument after the coma is the column index.
  - 2 is specified for row index, so we will extract elements from the first row
  - 3 is specified for column index, so we will extract elements from the the second column
  - Altogether, we extract a single element in row 2, column 3.

Code

x[2, 3] # the element in the 2nd row, 3rd column

[1] 8

If we leave blank for a dimension, we extract all elements along that dimension.
- if we want to take out the entire first row
  - 1 is specified for the row index
  - column index is blank

Code

x[1, ] # all elements in the first row

[1] 1 4 7

Exercise

Extract all elements in the second column
Extract all elements in the first and third rows

6.3 Matrices: operations

6.3.1 Apply a math function to a matrix

Let’s use 3 matrices x, y, and z:

Code

x <- matrix(1:6, nrow = 3)
y <- matrix(1:6, byrow = T, nrow = 2)
x

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Code

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

Functions will be vectorized over all elements in a matrix

Code

z <- x^2
z

     [,1] [,2]
[1,]    1   16
[2,]    4   25
[3,]    9   36

6.3.2 Matrices’ operations: matrix addition and multiplication

If the two matrices are of the same dimensions, they can do element-wise operations, including element-wise addition and element-wise multiplication

Code

x + z # elementwise addition

     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42

Code

x * x

     [,1] [,2]
[1,]    1   16
[2,]    4   25
[3,]    9   36

If we want to perform the matrix multiplication as in linear algebra, we need to use %*%
- x and y must have conforming dimensions

Code

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Code

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

Code

x %*% y # matrix multiplication

     [,1] [,2] [,3]
[1,]   17   22   27
[2,]   22   29   36
[3,]   27   36   45

6.3.3 Matrices’ operations: inverse and transpose

We use t() to do matrix transpose

Code

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Code

t(x) # transpose

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

We use solve() to get the inverse of an matrix

Code

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Code

solve(t(x) %*% x) # inverse; must be on a square matrix

           [,1]       [,2]
[1,]  1.4259259 -0.5925926
[2,] -0.5925926  0.2592593

7 Data Frames

7.1 Data Frames: creating data.frame

7.1.1 Data Frames: create dataframe using `data.frame()`

You can think of data.frame as a spreadsheet in excel.

Code

df <- data.frame(
    id = 1:4,
    name = c("David", "Karima", "Anil", "Wei"),
    wage = rnorm(n = 4, mean = 10^5, sd = 10^3),
    male = c(T, T, T, T)
)
df

Data frames can also be created from external sources, e.g., from a csv file or database.

7.2 Data Frames: Basics

Each row stands for an observation; each column stands for a variable.
Each variable should have a unique name.
Each column must contain the same data type, but the different columns can store different data types.
- compare with matrix?
Each column must be of same length, because rows have the same length across variables.

7.3 Data Frames: check dimensions and variable types

You can verify the size of the data.frame using the command dim(); or nrow() and ncol()

Code

dim(df)

[1] 4 4

Code

nrow(df)

[1] 4

Code

ncol(df)

[1] 4

You can get the data type info using the command str()

Code

str(df)

'data.frame':   4 obs. of  4 variables:
 $ id  : int  1 2 3 4
 $ name: chr  "David" "Karima" "Anil" "Wei"
 $ wage: num  100023 98682 100462 100024
 $ male: logi  TRUE TRUE TRUE TRUE

Get the variables names of the data frame

Code

names(df)

[1] "id"   "name" "wage" "male"

8 Other data structures (Optional)

8.1 Arrays

We can use array() to generate a high-dimensional array
Just like vectors and matrices, arrays can include only data types of the same kind.
A 3D array is basically a combination of matrices each laid on top of other

Code

x <- 1:4
x <- array(data = x, dim = c(2, 3, 2))
x

, , 1

     [,1] [,2] [,3]
[1,]    1    3    1
[2,]    2    4    2

, , 2

     [,1] [,2] [,3]
[1,]    3    1    3
[2,]    4    2    4

8.2 Lists

A list is an R object that can contain anything. List is pretty useful when you need to store objects for latter use.

Code

x <- 1:2
y <- c("a", "b")
L <- list(numbers = x, letters = y)

8.3 Lists: indexing and subsetting

There are many ways to extract a certain element from a list.

by index
by the name of the element
by dollar sign $

Code

L[[1]] # extract the first element

[1] 1 2

Code

L[["numbers"]] # based on element name

[1] 1 2

Code

L$numbers # extract the element called numbers

[1] 1 2

After extracting the element, we can work on the element further:

Code

L$numbers[1:3] > 2

[1] FALSE FALSE    NA

9 Programming Basics: Flow Control

9.1 if/else

Sometimes, you want to run your code based on different conditions. For instance, if the observation is a missing value, then use the population average to impute the missing value. This is where if/else kicks in.

if (condition == TRUE) {
  action 1
} else if (condition == TRUE ){
  action 2
} else {
  action 3
}

Example 1:

Code

a <- 15

if (a > 10) {
    larger_than_10 <- TRUE
} else {
    larger_than_10 <- FALSE
}

larger_than_10

[1] TRUE

Example 2:

Code

x <- -5
if (x > 0) {
    print("x is a non-negative number")
} else {
    print("x is a negative number")
}

[1] "x is a negative number"

9.2 Loops

As the name suggests, in a loop the program repeats a set of instructions many times, until the stopping criteria is met.

Loop is very useful for repetitive jobs.

Code

for (i in 1:10) { # i is the iterator
    # loop body: gets executed each time
    # the value of i changes with each iteration
}

9.3 Nested loops

We can also nest loops inside other loops.

Code

x <- cbind(1:3, 4:6) # column bind
x

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Code

y <- cbind(7:9, 10:12) # row bind
y

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

Code

z <- x

for (i in 1:nrow(x)) {
    for (j in 1:ncol(x)) {
        z[i, j] <- x[i, j] + y[i, j]
    }
}

z

     [,1] [,2]
[1,]    8   14
[2,]   10   16
[3,]   12   18

9.4 User-Defined Functions

A function takes the argument as input, run some specified actions, and then return the result to us.

Functions are very useful. When we would like to test different ideas, we can combine functions with loops: We can write a function which takes different parameters as input, and we can use a loop to go through all the possible combinations of parameters.

9.4.1 User-defined function syntax

Here is how to define a function in general:

Code

function_name <- function(arg1, arg2 = default_value) {
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return() # the last line is to return some value
}

Example:

Code

magic <- function(x, y) {
    results <- x^2 + y

    return(results)
}

magic(2, 3)

[1] 7

9.4.2 Arguments

Default values: UDF can have default values for arguments by using arg = default_value. If the user does not provide a value for the argument when calling the UDF, the default value will be used.

Code

magic <- function(x, y = 1) {
    results <- x^2 + y

    return(results)
}

magic(2)

[1] 5

Missing values: If the user does not provide a value for an argument without a default value, R will throw an error.

Code

magic <- function(x, y) {
    results <- x^2 + y

    return(results)
}

magic(2)

Error in magic(2): argument "y" is missing, with no default

Order of arguments: The order of arguments matters when calling the UDF. If you want to provide a value for the second argument, you must provide a value for the first argument as well.

Code

magic <- function(y, x) {
    results <- x^2 + y

    return(results)
}

magic(3, 2)

[1] 7

9.4.3 Returned Value

We can return a value from a function using the return() function. The value returned can be of any data type.

Code

magic <- function(x, y) {
    result <- x^2 + y

    return(result)
}

magic(2, 3)

[1] 7

If the function does not have a return() statement, it will return the last value calculated in the function.

Code

magic <- function(x, y) {
    x + y
}

magic(2, 3)

[1] 5

9.5 Variable Scope

Variables created inside a function are local to the function, and cannot be accessed outside the function.

Code

magic <- function(x, y) {
    result <- x^2 + y

    return(result)
}

result

Error in eval(expr, envir, enclos): object 'result' not found

9.6 A comprehensive example

Task: write a function, which takes a vector as input, and returns the max value of the vector

Code

get_max <- function(input) {
    max_value <- input[1]
    for (i in 2:length(input)) {
        if (input[i] > max_value) {
            max_value <- input[i]
        }
    }

    return(max_value)
}

get_max(c(-1, 3, 2))

[1] 3

Exercise

Write your own version of which.max() function

Footnotes

TIOBE Programming Community index is a measure of programming language popularity.↩︎
There are many R-exclusive packages, such as the state-of-the-art causal machine learning library grf , which we will learn in the final week.↩︎
Why the name Quarto? “We wanted to use a name that had meaning in the history of publishing and landed on Quarto, which is the format of a book or pamphlet produced from full sheets printed with eight pages of text, four to a side, then folded twice to produce four leaves. The earliest known European printed book is a Quarto, the Sibyllenbuch, believed to have been printed by Johannes Gutenberg in 1452–53.”↩︎
You can also use equal sign =, but it’s recommended to stick with R’s tradition.↩︎

--- author: Dr Wei Miao date: "`r (lubridate::ymd('20240927'))`" date-format: long date-modified: "`r (lubridate::ymd('20241021'))`" institute: UCL School of Management toc: true toc-depth: 2 format: html: number-sections: true df-print: paged page-layout: full toc-depth: 2 code-line-numbers: false code-copy: hover title: "R Basics" pdf: toc: true toc-depth: 2 number-sections: true number-depth: 2 fontsize: 10pt code-line-numbers: true title: "R Basics" knitr: opts_chunk: echo: true warning: true message: true error: true execute: freeze: auto cache: true editor_options: chunk_output_type: inline --- # Hello R ## Bilingual arrangements at MSc BA - Primary language is Python - Programming (MSIN00143), Business Strategy (MSIN0093), Machine Learning electives - Secondary language is R - Marketing Analytics (MSIN0094), Statistical Foundations (MSIN0096) ## A brief history of R - R project was initiated by **R**obert Gentleman and **R**oss Ihaka (Univ of Auckland) in 1991; both are statisticians, who later made the language **open-source**. - Since 1997, R has been developed by the R Core Team on CRAN. - As of January 2022, it has almost 20k contributed packages. - As of 2024, R is ranked 18th in the TIOBE index[^1]. [^1]: TIOBE Programming Community index is a measure of programming language popularity. ## Why learn R? - Highly powerful data analytics and visualizations, including[^2] - Data wrangling (`dplyr`) and data visualization (`ggplot`) - Statistics and Econometrics (major advantage of R over Python) - Predictive analytics such as machine learning - Write beautiful reports, dissertations, presentations using `Quarto` - Write your MSc dissertation - Effortlessly build websites. I built and maintain my [personal website](http://miaowei.netlify.app) and the marketing course website all in R. [^2]: There are many R-exclusive packages, such as the state-of-the-art causal machine learning library `grf` , which we will learn in the final week. ## One-One comparison with Python - As you will be learning Python in the programming course, it's good to know the differences between R and Python. In addition to the general comparison below, I have also prepared a detailed side-by-side comparison of R and Python [here](R-ComparisonWithPython.qmd). - It's highly recommended that when you learn both languages at the same time, you should be able to compare them side-by-side often. +--------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+ | | R | Python | +============================================+==========================================================================================================================================+===============================================================================================================+ | Language purpose | R is a **statistical language** specialized in the **data analytics and visualization**. | Python is a **general-purpose language** that is used for the deployment and development of various projects. | | | | | | | Best for data science, may not be robust for production environment. | Best for production environment. | +--------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+ | Data analytics | R is better at **statistical models** and **econometrics**. | Python is better at **machine learning** due to support from PyTorch and TensorFlow. | +--------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+ | IDEs (Integrated Development Environment) | RStudio | Many options such as Jupyter Notebook, Spyder, Pycharm, etc. | +--------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+ | Targeted users | Primary users of R include **data** **scientists** and **researchers** in academia, who heavily rely on data analyses and visualization. | Primary users of python include **developers and programmers.** | +--------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+ : R versus Python ## A first look at the RStudio Interface R is the **programming language**, and we need a "place" to write codes. This place is called an **Integrated development environment (IDE)**. RStudio is so far the best R IDE. And it's interface consists of the following major panels (clockwise from top left): - ***script***: (top left) where you do the coding - ***environment***: (top right) a list of named objects that we have generated - ***history***: (top right) the list of past commands that we have used - ***help***: (bottom right) user manuals of functions available in R - ***package***: (bottom right) a collection of ready-to-use packages written by others - ***console***: (bottom left) where you can run commands interactively with R and see code outputs ![](images/r_Rstudio.png) ## Where to write R codes (I): Console - You can write codes *interactively* in the R console. See an example: Type the following code into your console and see what happens. ```{r} print("Hello World") ``` - Used **for simple exploratory, unstructured tasks**, where you don't need to keep a record of codes. - e.g., summary statistics; check variable values, etc. ## Where to write R codes (II): `.qmd` script - `Quarto`[^3] files have a `.qmd` suffix. You can think of Quarto as **Microsoft Word that can run R codes**. - If you have experience with Jupyter Notebook, Quarto is the R equivalent of Jupyter Notebook, but just much more powerful. - Quarto can create dynamic contents with R, conveniently combining data analytics work with beautiful reporting. - Now, let's create a new quarto file together! Name it "MyFavoriteShow.qmd" and save it to your download folder. [^3]: Why the name Quarto? "We wanted to use a name that had meaning in the history of publishing and landed on Quarto, which is the format of a book or pamphlet produced from full sheets printed with eight pages of text, four to a side, then folded twice to produce four leaves. The earliest known European printed book is a Quarto, the [Sibyllenbuch](https://en.wikipedia.org/wiki/Sibyllenbuch_fragment), believed to have been printed by [Johannes Gutenberg](https://en.wikipedia.org/wiki/Johannes_Gutenberg) in 1452--53." # Introduction to Quarto ## YAML header - You can think of YAML header as a MS Word template, which determines how your final report looks like (font, font size, color, margins, etc.). - The YAML header is always at the beginning of a document, separated from the main text by three dashes (`---`). YAML does not appear in the final report. ## Authoring with normal texts RStudio provides two ways to edit a quarto file (1) **visual mode** and (2) **source mode**. - RStudio's [visual editor](https://quarto.org/docs/visual-editor/) offers a Microsoft-Word like experience for you to write R codes. - Explore the rich formatting tools available for report authoring - If you are familiar with markdown syntax, you can use the **source mode** to write the report (optional; for advanced users only). ![Visual Mode versus Source Mode](images/quarto%20mode.png){fig-align="center"} ::: callout-note ### Exercise Create a new quarto file from RStudio with the following level-1 and level-2 headers Level 1: Slowhorse Season 4 Level 2: Episode 1: Identity Theft Body: A London bombing puts Taverner under pressure. When River grows concerned for his grandfather, Louisa encourages him to go for a visit. ::: ## Coding with code blocks - In qmd files, we write R codes in so-called **code chunks** (sometimes referred to as code cells or code blocks) identified with `{r}`. - To insert a code chunk, click `Insert` -\>`Code Chunk` -\> `R`. You can also use the shortcut `Ctrl + Alt + I` or `Cmd + Option + I`. ::: callout-warning ## Caveat Ensure the first line remains {r} only and do not include any comments or code on this line. ::: - You can run each code chunk interactively by clicking the green solid triangle (run current code chunk). RStudio executes the codes in the code chunk and displays the results. - See an example and try on your computer! ```{r} print("R is the Best Language! Way better than Python! The battle is on!") ``` ::: callout-note ### Exercise Insert the above R code block in your quarto file under any section. ::: ## Rendering a report At the end, when the Quarto document (including codes and main texts) are ready, use the **`Render`** button in the RStudio IDE to render the file. The rendered report will be in the same folder with your qmd file. ![](images/render_icon.png) ::: callout-note ### Exercise Render your quarto file into a document and see how it looks like. ::: ## More learning resources for Quarto (Optional) - The available YAML fields vary based on document format - [Here](https://quarto.org/docs/reference/formats/pdf.html) for YAML fields for PDF documents - [Here](https://quarto.org/docs/reference/formats/docx.html) for MS Word - [Here](https://quarto.org/docs/reference/formats/html.html) for HTML documents - Markdown syntax - [Markdown basics](https://quarto.org/docs/authoring/markdown-basics.html) - [Markdown practice](https://www.markdowntutorial.com/) - Quarto (recommended to be reviewed after-class) - [Get started](https://quarto.org/docs/get-started/) # Basics of R ## Named objects - R is an **object-oriented language**, so we will be working on **named objects**. - We use the **left arrow** `<-` to create a named object, the keyboard shortcut for `<-` for windows users is `Alt` + `-`, or for mac users, `Option` + `-`. - The `<-` is an assignment operator, which assigns the **R objects** on the RHS to the **name** on the LHS.[^4] - The below code creates a new object called 'x' in R; x is a numeric object; its value is 3. [^4]: You can also use equal sign `=`, but it's recommended to stick with R's tradition. ```{r} # create an object x with value 3 x <- 3 ``` - After an object is created, we can refer to the object by its name ```{r} # print out x x ``` - We can also perform operations on the object ```{r} # Question: hmmm, why does Wei choose these two numbers? x^2 x^3 ``` ::: callout-note ### Exercise Insert a code block in your quarto file, which does the following: - Create an object with name 'x' with the formula of 2 + 2 ::: ## Rules for naming object For a variable to be valid, it should follow these rules - It should contain **letters**, **numbers**, and only **dot** or **underscore** characters. - It *cannot* start with a number (eg: 2iota), or a dot, or an underscore. ```{r} # 2iota <- 2 # .iota <- 2 # _iota <- 2 ``` - It should not be a reserved word in R (eg: mean, sum, etc.). ```{r} # mean <- 2 ``` ::: callout-tip ### Tips In the future, it's good practice to use memorable names to name an object - For instance, use prefix "df\_" or "data\_" to name datasets. ::: ## Functions - A **function** usually takes (one or several) objects as input, run specific operations on the object(s) defined by the function, and then return an output. - For instance, an R's built-in function `sqrt()` takes a number as input, and returns the square root of the number. Let's use it on object `x`. ```{r} sqrt(x) ``` - We will heavily rely on functions to conduct data analyses. For how to use a new function, search the function in RStudio's `help` panel. ::: callout-note ### Exercise 1. Search and learn the usage of function "log()". 2. Insert a code block in your quarto file to compute the logarithm of x. ::: ## Collection of functions: Packages The base R already comes with many useful built-in functions to perform basic tasks, but as data scientists, we need more. To perform certain tasks (such as a machine learning model), we can definitely write our own code from scratch, but it takes lots of (unnecessary) effort. Fortunately, many **packages** have been written by others for us to directly use. - To download a package, hit `Tools` -\> `Install Packages` in RStudio, and type the package name in the pop-up window. Now, download the package `praise`. - To load the packages, we need to type `library()`. ```{r} library(praise) ``` - Now that the package is loaded, you can use the functions in it. `praise()` is a function in the `praise` package. ```{r} praise() ``` ::: callout-tip ### Tips - Packages need to be downloaded only once, but need to be loaded every time you restart the RStudio. ::: ## Comment codes You can put a `#` before any code, to indicate that any codes after the `#` on the same line are your comments, and will not be run by R. It's a good practice to often comment your codes, so that you can help the future you to remember what you were trying to achieve. ```{r} # print("Let's fund Wei for an iPhone 16 Pro Max as a birthday gift!") ``` ```{r} # Is x 1 or 2 below? x <- 1 # +1 ``` # Data structures ## Data types ### **Numeric** - We can use R as a calculator for numeric objects ```{r} # Numeric Vector num2 <- 2.5 log(num2) num2^2 exp(num2) ``` ### **Logical** (TRUE, FALSE): - Logical objects are used to store logical values, such as `TRUE` and `FALSE`. ```{r} num2 <- 2.5 # larger than 2? num2 > 2 # smaller than 2? num2 < 2 # equal to 2? num2 == 2 # not equal to 2? num2 != 2 ``` - Sometimes, we may need to operation on multiple relational operations. We can use **logical operators** to combine multiple relational operations. ```{r} T & F # and T | F # or !T # not ``` - For instance, we may want to know if a number is between 3 and 8. ```{r} num2 >= 3 & num2 <= 8 ``` ### **Character**: - Characters are enclosed within a pair of quotation marks. - Single or double quotation marks can both work. - Even if a character may contain numbers, it will be treated as a character, and R will not perform any mathematical operations on it. ```{r} str1 <- "1 + 1 = 2" ``` ## Check data types using class() We can use `class()` to check the type of an object in R. ```{r} a <- "1+1" class(a) ``` ```{r} b <- 1 + 1 class(b) ``` This is very useful when we first load data from external databases, we need to make sure variables are of the correct data types. ## Data type: conversion Sometimes, data types of variables from raw data may not be what we want; we need to change the data type of a variable to the appropriate one. See the following example: - `a` is a string, and we cannot use mathematical operations on it, or R will report errors. ```{r} #| error: true a <- "1" class(a) a + 1 ``` - We can convert `a` to a numeric value. To convert from character to numeric, we use `as.numeric()` ```{r} b <- as.numeric(a) class(b) ``` # Vectors Below are the complete list of objects in R. ![](images/r_data_structures.png){#fig-R-datastructure alt="Visualization of data structures"}. We will use vector and matrix most frequently in this course. ## Creating vectors ### Creating vectors: c() - In R, a **vector** is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc. - Vector can be created using the function `c()` by listing all the values in the parenthesis, separated by comma ','. - c() stands for "combine". ```{r} Income <- c(1, 3, 5, 10) Income ``` - Vectors must contain elements of the same data type. If they do not, R will automatically convert all elements into the same type, typically characters. ```{r} x <- c(1, "intro", TRUE) class(x) ``` ### Checking the number of elements in a vector: length() You can measure the length of a **vector** using the command `length()` ```{r} x <- c("R", " is", " the", " best", " language") length(x) ``` ### Creating numeric sequences: seq() It is also possible to easily create sequences with patterns - use `seq()` to create sequence with fixed steps ```{r} # use seq() seq(from = 1, to = 2, by = 0.1) ``` - If the step is 1, there's a convenient way using `:` ```{r} 1:5 ``` ### Concatenate multiple vectors into one: c() - Sometimes, we may want to concatenate multiple vectors into one. For instance, we may have collected income data from two different sources, and we want to concatenate them into one vector. - We can use `c()` to concatenate different vectors; this is very commonly used to concatenate vectors. ```{r} Income1 <- 1:3 Income2 <- c(10, 15) ``` ```{r} c(Income1, Income2) ``` ::: callout-note ### Exercise Create a sequence of {1,1,2,2,3,3,3}. ::: ## Indexing and subsetting We put the **index** of elements we would like to extract in a **square bracket `[ ]`**. ```{r} # create a vector of income for 4 lecturers at UCL income <- c(5000, 5500, 6000, 9000) ``` - Extract a single element: use the index of the element ```{r} # what is the income of the 3rd lecturer? income[3] ``` - Extract multiple elements: use a vector of indices ```{r} # what are the incomes of the 1st, 3rd, and 4th lecturers? income[c(1, 3, 4)] ``` ## Element-wise arithmetic operations R is a **vectorized** language, which broadcasts operations to all elements in a vector. This behavior is also called element-wise operations, or broadcasting. - If you operate on a vector with a single number, the operation will be applied to all elements in the vector ```{r} x <- c(1, 3, 8, 7) ``` ```{r} x + 2 ``` ```{r} x * 2 ``` ::: callout-note ### Exercise Create a geometric sequence {2,4,8,16,32} using seq(). ::: ## Elementwise relational operations - Besides arithmetic operations, we can also perform relational operations on vectors. ```{r} x <- c(1, 3, 8, 7) x > 2 ``` - We can also compare a vector with a vector, because R is vectorized ```{r} incomeUCL <- c(6000, 4600, 7000, 9100, 10000) incomeImperial <- c(5000, 4500, 6000, 9000, 10000) incomeUCL > incomeImperial ``` ## Special relational operation: `%in%` - A special relational operation is `%in%` in R, which tests whether an element exists in the object. ```{r} x <- c(1, 3, 8, 7) 3 %in% x 2 %in% x ``` ## After-class exercise 1. Create a vector of 10 numbers from 1 to 10, and extract the 2nd, 4th, and 6th elements. 2. Create a vector of 5 numbers from 1 to 5, and check if 3 is in the vector. 3. Now the interest rate is 0.1, and you have 1000 pounds in your bank account. Calculate the amount in your bank account after 1 year, 2 years, and 3 years, respectively. # Matrices ## Matrices: creating matrices ### Creating matrices: `matrix()` - A matrix can be created using the command `matrix()` - the first argument is the vector to be converted into matrix - the second argument is the number of rows - the last argument is the number of cols ```{r} matrix(1:9, nrow = 3, ncol = 3) ``` ::: callout-important R by default inserts elements **vertically** by **columns**. ::: - R will fill in the matrix by column and discard the remaining extra elements once fully filled, with a warning message ```{r} #| warning: true matrix(1:9, nrow = 3, ncol = 2) ``` ### Creating matrices: inserting by row However, we can ask R to insert by rows by setting the `byrow` argument. ```{r} matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE) ``` ### Creating matrices: concatenate matrices `cbind()` and `rbind()` We can use `cbind()` and `rbind()` to concatenate vectors and matrices into new matrices. - `cbind()` does the column binding ```{r} a <- matrix(1:6, nrow = 2, ncol = 3) a cbind(a, a) # column bind ``` - `rbind()` does the row binding ```{r} rbind(a, a) # row bind ``` ## Matrices: indexing and subsetting Matrices have two dimensions: rows and columns. Therefore, to extract elements from a matrix, we just need to specify which row(s) and which column(s) we want. ```{r} x <- matrix(1:9, nrow = 3, ncol = 3) x ``` - Extract the element in the 2nd row, 3rd column. - use **square bracket** with a coma inside `[ , ]` to indicate subsetting; the argument before coma is the row index, and the argument after the coma is the column index. - 2 is specified for row index, so we will extract elements from the first row - 3 is specified for column index, so we will extract elements from the the second column - Altogether, we extract a single element in row 2, column 3. ```{r} x[2, 3] # the element in the 2nd row, 3rd column ``` - If we leave blank for a dimension, we extract all elements along that dimension. - if we want to take out the entire first row - 1 is specified for the row index - column index is blank ```{r} x[1, ] # all elements in the first row ``` ::: callout-note ### Exercise 1. Extract all elements in the second column 2. Extract all elements in the first and third rows ::: ## Matrices: operations ### Apply a math function to a matrix Let's use 3 matrices `x`, `y`, and `z`: ```{r} x <- matrix(1:6, nrow = 3) y <- matrix(1:6, byrow = T, nrow = 2) x y ``` - Functions will be vectorized over all elements in a matrix ```{r} z <- x^2 z ``` ### Matrices' operations: matrix addition and multiplication - If the two matrices are of the same dimensions, they can do element-wise operations, including element-wise addition and element-wise multiplication ```{r} x + z # elementwise addition ``` ```{r} x * x ``` - If we want to perform the matrix multiplication as in linear algebra, we need to use `%*%` - x and y must have conforming dimensions ```{r} x y x %*% y # matrix multiplication ``` ### Matrices' operations: inverse and transpose - We use `t()` to do matrix transpose ```{r} x t(x) # transpose ``` - We use `solve()` to get the inverse of an matrix ```{r} x solve(t(x) %*% x) # inverse; must be on a square matrix ``` # Data Frames ## Data Frames: creating data.frame ### Data Frames: create dataframe using `data.frame()` - You can think of `data.frame` as a spreadsheet in excel. ```{r} df <- data.frame( id = 1:4, name = c("David", "Karima", "Anil", "Wei"), wage = rnorm(n = 4, mean = 10^5, sd = 10^3), male = c(T, T, T, T) ) df ``` - Data frames can also be created from external sources, e.g., from a csv file or database. ## Data Frames: Basics - Each row stands for an `observation`; each column stands for a `variable`. - Each variable should have a **unique** name. - Each column must contain the same data type, but the different columns can store different data types. - compare with matrix? - Each column must be of same length, because rows have the same length across variables. ## Data Frames: check dimensions and variable types - You can verify the size of the `data.frame` using the command `dim()`; or `nrow()` and `ncol()` ```{r} dim(df) nrow(df) ncol(df) ``` - You can get the data type info using the command `str()` ```{r} str(df) ``` - Get the variables names of the data frame ```{r} names(df) ``` # Other data structures (Optional) ## Arrays - We can use `array()` to generate a high-dimensional array - Just like vectors and matrices, arrays can include only data types of the same kind. - A 3D array is basically a combination of matrices each laid on top of other ```{r arrays, include=T, echo=T} x <- 1:4 x <- array(data = x, dim = c(2, 3, 2)) x ``` ## Lists A list is an R object that can contain anything. List is pretty useful when you need to store objects for latter use. ```{r} x <- 1:2 y <- c("a", "b") L <- list(numbers = x, letters = y) ``` ## Lists: indexing and subsetting There are many ways to extract a certain element from a list. - by index - by the name of the element - by dollar sign `$` ```{r} L[[1]] # extract the first element L[["numbers"]] # based on element name L$numbers # extract the element called numbers ``` After extracting the element, we can work on the element further: ```{r} L$numbers[1:3] > 2 ``` # Programming Basics: Flow Control ## if/else Sometimes, you want to run your code based on different conditions. For instance, if the observation is a missing value, then use the population average to impute the missing value. This is where `if/else` kicks in. ``` if (condition == TRUE) { action 1 } else if (condition == TRUE ){ action 2 } else { action 3 } ``` Example 1: ```{r} a <- 15 if (a > 10) { larger_than_10 <- TRUE } else { larger_than_10 <- FALSE } larger_than_10 ``` Example 2: ```{r} x <- -5 if (x > 0) { print("x is a non-negative number") } else { print("x is a negative number") } ``` ## Loops As the name suggests, in a loop the program repeats a set of instructions many times, until the stopping criteria is met. Loop is very useful for repetitive jobs. ```{r} for (i in 1:10) { # i is the iterator # loop body: gets executed each time # the value of i changes with each iteration } ``` ## Nested loops We can also nest loops inside other loops. ```{r} x <- cbind(1:3, 4:6) # column bind x y <- cbind(7:9, 10:12) # row bind y z <- x for (i in 1:nrow(x)) { for (j in 1:ncol(x)) { z[i, j] <- x[i, j] + y[i, j] } } z ``` ## User-Defined Functions A function takes the argument as input, run some specified actions, and then return the result to us. Functions are very useful. When we would like to test different ideas, we can combine functions with loops: We can write a function which takes different parameters as input, and we can use a loop to go through all the possible combinations of parameters. ### User-defined function syntax Here is how to define a function in general: ```{r} function_name <- function(arg1, arg2 = default_value) { # write the actions to be done with arg1 and arg2 # you can have any number of arguments, with or without defaults return() # the last line is to return some value } ``` Example: ```{r} magic <- function(x, y) { results <- x^2 + y return(results) } magic(2, 3) ``` ### Arguments - Default values: UDF can have default values for arguments by using `arg = default_value`. If the user does not provide a value for the argument when calling the UDF, the default value will be used. ```{r} magic <- function(x, y = 1) { results <- x^2 + y return(results) } magic(2) ``` - Missing values: If the user does not provide a value for an argument without a default value, R will throw an error. ```{r} #| error: true magic <- function(x, y) { results <- x^2 + y return(results) } magic(2) ``` - Order of arguments: The order of arguments matters when calling the UDF. If you want to provide a value for the second argument, you must provide a value for the first argument as well. ```{r} magic <- function(y, x) { results <- x^2 + y return(results) } magic(3, 2) ``` ### Returned Value - We can return a value from a function using the `return()` function. The value returned can be of any data type. ```{r} magic <- function(x, y) { result <- x^2 + y return(result) } magic(2, 3) ``` - If the function does not have a `return()` statement, it will return the last value calculated in the function. ```{r} magic <- function(x, y) { x + y } magic(2, 3) ``` ## Variable Scope - Variables created inside a function are local to the function, and cannot be accessed outside the function. ```{r} #| echo: true magic <- function(x, y) { result <- x^2 + y return(result) } result ``` ## A comprehensive example Task: write a function, which takes a vector as input, and returns the max value of the vector ```{r} get_max <- function(input) { max_value <- input[1] for (i in 2:length(input)) { if (input[i] > max_value) { max_value <- input[i] } } return(max_value) } get_max(c(-1, 3, 2)) ``` ::: callout-note ### Exercise Write your own version of `which.max()` function :::

1 Hello R

1.1 Bilingual arrangements at MSc BA

1.2 A brief history of R

1.3 Why learn R?

1.4 One-One comparison with Python

1.5 A first look at the RStudio Interface

1.6 Where to write R codes (I): Console

1.7 Where to write R codes (II): .qmd script

2 Introduction to Quarto

2.1 YAML header

2.2 Authoring with normal texts

2.3 Coding with code blocks

2.4 Rendering a report

2.5 More learning resources for Quarto (Optional)

3 Basics of R

3.1 Named objects

3.2 Rules for naming object

3.3 Functions

3.4 Collection of functions: Packages

3.5 Comment codes

4 Data structures

4.1 Data types

4.1.1 Numeric

4.1.2 Logical (TRUE, FALSE):

4.1.3 Character:

4.2 Check data types using class()

4.3 Data type: conversion

5 Vectors

5.1 Creating vectors

5.1.1 Creating vectors: c()

5.1.2 Checking the number of elements in a vector: length()

5.1.3 Creating numeric sequences: seq()

5.1.4 Concatenate multiple vectors into one: c()

5.2 Indexing and subsetting

5.3 Element-wise arithmetic operations

5.4 Elementwise relational operations

5.5 Special relational operation: %in%

5.6 After-class exercise

6 Matrices

6.1 Matrices: creating matrices

6.1.1 Creating matrices: matrix()

6.1.2 Creating matrices: inserting by row

6.1.3 Creating matrices: concatenate matrices cbind() and rbind()

6.2 Matrices: indexing and subsetting

6.3 Matrices: operations

6.3.1 Apply a math function to a matrix

6.3.2 Matrices’ operations: matrix addition and multiplication

6.3.3 Matrices’ operations: inverse and transpose

7 Data Frames

7.1 Data Frames: creating data.frame

7.1.1 Data Frames: create dataframe using data.frame()

7.2 Data Frames: Basics

7.3 Data Frames: check dimensions and variable types

8 Other data structures (Optional)

8.1 Arrays

8.2 Lists

8.3 Lists: indexing and subsetting

9 Programming Basics: Flow Control

9.1 if/else

9.2 Loops

9.3 Nested loops

9.4 User-Defined Functions

9.4.1 User-defined function syntax

9.4.2 Arguments

9.4.3 Returned Value

9.5 Variable Scope

9.6 A comprehensive example

Footnotes

1.7 Where to write R codes (II): `.qmd` script

5.5 Special relational operation: `%in%`

6.1.1 Creating matrices: `matrix()`

6.1.3 Creating matrices: concatenate matrices `cbind()` and `rbind()`

7.1.1 Data Frames: create dataframe using `data.frame()`