Side-by-Side Comparison between R, Python, and Julia

Author: Dr Wei Miao

Affiliation: UCL School of Management

Published: September 16, 2024

Modified: September 16, 2024

Tip

This tutorial is designed for those who are familiar with either R, Python or Julia, and would like to learn another language.

In this tutorial, I will compare the basics of R, Python, and Julia side by side. We will cover the basic syntax, data types, and functionalities.

If you discover any mistakes or outdated content in this tutorial, please let me know. I will be very grateful for your feedback.

Code

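library(reticulate)   # run Python chunks from R
use_condaenv("base")  # use the Python installation of the "base" conda environment
library(JuliaCall)    # run Julia chunks from R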

1 Language Basics

1.1 Assignment of variables

Caution

In R and Python, assignment operations do not print the assigned object by default.

Julia, by contrast, does print the assigned object by default. If you put a semicolon ; at the end of the line, Julia will not print it.

Code

# create an object x with value 3
x <- 3
x

[1] 3

Code

# create an object x with value 3
x = 3
x

Code

# create an object x with value 3
x = 3; # the ; suppresses the output

1.2 Comment codes

You can put a # anywhere on a line. Everything after the # on that line is a comment and will not be run by R.

It is good practice to comment your code often, so that your future self can remember what you were trying to achieve.

Code

# Is x 1 or 2 below?
x <- 1 # +1

Same as R: everything after a # on the same line is a comment and will not be run by Python.

Code

# Is x 1 or 2 below?
x = 1 # +1

Same as R and Python: everything after a # on the same line is a comment and will not be run by Julia.

Code

# Is x 1 or 2 below?

x = 1 # +1

1.3 Rules for naming objects

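For a variable name to be valid in R, it should follow these rules:

It can contain only letters, numbers, dots (.), and underscores (_).
It cannot start with a number (e.g., 2iota) or an underscore (e.g., _iota). It can start with a dot only if the dot is not followed by a number (e.g., .2iota is invalid, while .iota is valid but hidden from ls()).

Code

# 2iota <- 2   # invalid: starts with a number
# .2iota <- 2  # invalid: dot followed by a number
# _iota <- 2   # invalid: starts with an underscore

It cannot be a reserved word in R (e.g., if, else, for, function, TRUE, NULL). Names of existing functions such as mean or sum are allowed, but reusing them makes code confusing and is best avoided.

Code

# if <- 2     # invalid: reserved word
# mean <- 2   # allowed, but best avoided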

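For a variable name to be valid in Python, it should follow these rules:

It can contain only letters, numbers, and underscores (_). Dots are not allowed.
It cannot start with a number (e.g., 2iota). It can start with an underscore (e.g., _iota), which by convention marks a variable as internal.

Code

# 2iota = 2   # invalid: starts with a number
# .iota = 2   # invalid: dots are not allowed
_iota = 2     # valid

It cannot be a reserved keyword in Python (e.g., if, for, def, class, lambda). Names of built-in functions such as sum are allowed, but reusing them hides the built-in function and is best avoided.

Code

# lambda = 2  # invalid: reserved keyword
# sum = 2     # allowed, but hides the built-in sum()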
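Python can check the two mechanical rules for you. The sketch below uses the built-in str.isidentifier() method and the standard-library keyword module. The helper name is_valid_name is hypothetical, and the candidate names are the examples from above.

```python
import keyword


def is_valid_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name (hypothetical helper)."""
    # isidentifier() checks the character rules; iskeyword() rejects reserved keywords
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ["iota", "2iota", ".iota", "_iota", "lambda", "sum"]:
    print(f"{name!r:10} {is_valid_name(name)}")
```

Note that "sum" passes the check even though it shadows the built-in sum(). This check cannot detect that kind of problem.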

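Julia's rules are closer to Python's. Names can contain letters, numbers, underscores, and ! (but not dots). They cannot start with a number, may start with an underscore, and cannot be reserved keywords such as if, end, or function.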

2 Packages and Functions

Base R already comes with many useful built-in functions to perform basic tasks, but as data scientists, we need more.

To perform certain tasks (such as fitting a machine learning model), we could write our own code from scratch, but that takes a lot of unnecessary effort. Fortunately, many packages written by others are available for us to use directly.

To install a package, click Tools -> Install Packages in RStudio and type the package name in the pop-up window (or run install.packages("dplyr") in the console). Now, install the package dplyr.
To load a package, use library().

Code

library(dplyr)

Now that the package is loaded, you can use the functions in it. filter() is a function in the dplyr package that can be used to filter data.

Code

data(iris)  # load the built-in iris dataset
iris %>%
  filter(Species == "setosa")

Python has a similar concept: third-party packages, which are loaded as modules.

To install a package, you can use pip install in the terminal, or !pip install in a Jupyter Notebook. You can also install packages with Anaconda Navigator.

Code

# !pip install pandas

To load a module, you can use import. Now that the module is loaded, you can use the functions in it.

Code

import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv') # load iris

iris[iris['species'] == 'setosa']

    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
1            4.9          3.0           1.4          0.2  setosa
2            4.7          3.2           1.3          0.2  setosa
3            4.6          3.1           1.5          0.2  setosa
4            5.0          3.6           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
6            4.6          3.4           1.4          0.3  setosa
7            5.0          3.4           1.5          0.2  setosa
8            4.4          2.9           1.4          0.2  setosa
9            4.9          3.1           1.5          0.1  setosa
10           5.4          3.7           1.5          0.2  setosa
11           4.8          3.4           1.6          0.2  setosa
12           4.8          3.0           1.4          0.1  setosa
13           4.3          3.0           1.1          0.1  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
16           5.4          3.9           1.3          0.4  setosa
17           5.1          3.5           1.4          0.3  setosa
18           5.7          3.8           1.7          0.3  setosa
19           5.1          3.8           1.5          0.3  setosa
20           5.4          3.4           1.7          0.2  setosa
21           5.1          3.7           1.5          0.4  setosa
22           4.6          3.6           1.0          0.2  setosa
23           5.1          3.3           1.7          0.5  setosa
24           4.8          3.4           1.9          0.2  setosa
25           5.0          3.0           1.6          0.2  setosa
26           5.0          3.4           1.6          0.4  setosa
27           5.2          3.5           1.5          0.2  setosa
28           5.2          3.4           1.4          0.2  setosa
29           4.7          3.2           1.6          0.2  setosa
30           4.8          3.1           1.6          0.2  setosa
31           5.4          3.4           1.5          0.4  setosa
32           5.2          4.1           1.5          0.1  setosa
33           5.5          4.2           1.4          0.2  setosa
34           4.9          3.1           1.5          0.2  setosa
35           5.0          3.2           1.2          0.2  setosa
36           5.5          3.5           1.3          0.2  setosa
37           4.9          3.6           1.4          0.1  setosa
38           4.4          3.0           1.3          0.2  setosa
39           5.1          3.4           1.5          0.2  setosa
40           5.0          3.5           1.3          0.3  setosa
41           4.5          2.3           1.3          0.3  setosa
42           4.4          3.2           1.3          0.2  setosa
43           5.0          3.5           1.6          0.6  setosa
44           5.1          3.8           1.9          0.4  setosa
45           4.8          3.0           1.4          0.3  setosa
46           5.1          3.8           1.6          0.2  setosa
47           4.6          3.2           1.4          0.2  setosa
48           5.3          3.7           1.5          0.2  setosa
49           5.0          3.3           1.4          0.2  setosa
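Boolean indexing works, but pandas also has DataFrame.query(), which takes the condition as a string. This reads closer to dplyr's filter(). The small data frame below is a hypothetical stand-in with the same column names as iris, so the sketch runs without downloading anything.

```python
import pandas as pd

# Hypothetical three-row data frame with iris-style column names
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "species": ["setosa", "versicolor", "virginica"],
})

# Boolean indexing, as above
setosa1 = df[df["species"] == "setosa"]

# query(): the condition is written as a string, similar to filter(species == "setosa")
setosa2 = df.query("species == 'setosa'")

print(setosa2)
```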

Julia has a similar concept of packages.

To install a package, you can use Pkg.add() in the Julia terminal.

Code


using Pkg

Pkg.add("DataFrames")
Pkg.add("CSV")

To load a package, you can use using. Now that the package is loaded, you can use the functions in it.

Code


using DataFrames, CSV

iris = CSV.File(download("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")) |> DataFrame;

# Filter the DataFrame where species is "setosa"
setosa_data = iris[iris.species .== "setosa", :];

# Display the first few rows of the filtered data
first(setosa_data, 5)

5×5 DataFrame
 Row │ sepal_length  sepal_width  petal_length  petal_width  species
     │ Float64       Float64      Float64       Float64      String15
─────┼────────────────────────────────────────────────────────────────
   1 │          5.1          3.5           1.4          0.2  setosa
   2 │          4.9          3.0           1.4          0.2  setosa
   3 │          4.7          3.2           1.3          0.2  setosa
   4 │          4.6          3.1           1.5          0.2  setosa
   5 │          5.0          3.6           1.4          0.2  setosa

3 Arithmetic, Logical, and Relational Operations

3.1 Arithmetic operations

Code

# arithmetic operations
x <- 3 
x + 1 # addition

[1] 4

Code

x - 1 # subtraction

[1] 2

Code

x * 2 # multiplication

[1] 6

Code

x / 2 # division

[1] 1.5

Code

x^2 # square

[1] 9

Code

x %% 2 # remainder

[1] 1

Code

x %/% 2 # integer division

[1] 1

Code

# math operations
log(x)  # natural logarithm

[1] 1.098612

Code

exp(x)  # exponential

[1] 20.08554

Code

sqrt(x) # square root

[1] 1.732051

Code

log10(x) # log base 10

[1] 0.4771213

Code

round(x/2) # round

[1] 2

Code

floor(x/2) # floor

[1] 1

Code

ceiling(x/2) # ceiling

[1] 2

Code

# arithmetic operations
x = 3
x + 1 # addition

Code

x - 1 # subtraction

Code

x * 2 # multiplication

Code

x / 2 # division

1.5

Code

x ** 2 # square

Code

x % 2 # remainder

Code

x // 2 # integer division

Code

# math operations
import math
math.log(x)  # natural logarithm

1.0986122886681098

Code

math.exp(x)  # exponential

20.085536923187668

Code

math.sqrt(x) # square root

1.7320508075688772

Code

math.log10(x) # log base 10

0.47712125471966244

Code

round(x/2) # round

Code

math.floor(x/2) # floor

Code

math.ceil(x/2) # ceiling
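The round() above rounds halves to the nearest even integer ("banker's rounding"). That is why round(1.5) and round(2.5) both give 2. R's round() and Julia's default round() also round halves to even, so round(0.5) is 0 in all three languages. If you need halves rounded away from zero, one option in Python is the standard-library decimal module. Here is a sketch:

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round(): halves go to the nearest even integer
print(round(0.5), round(1.5), round(2.5))  # 0 2 2

# Decimal with ROUND_HALF_UP: halves go away from zero
# (values are given as strings to avoid binary floating-point surprises)
half_up = [int(Decimal(v).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
           for v in ("0.5", "1.5", "2.5")]
print(half_up)  # [1, 2, 3]
```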

Code


# arithmetic operations

x = 3

Code


x + 1 # addition

Code


x - 1 # subtraction

Code


x * 2 # multiplication

Code


x / 2 # division

1.5

Code


x ^ 2 # square

Code


x % 2 # remainder

Code


div(x, 2) # integer division

Code


# math operations

log(x)  # natural logarithm

1.0986122886681098

Code


exp(x)  # exponential

20.085536923187668

Code


sqrt(x) # square root

1.7320508075688772

Code


log10(x) # log base 10

0.47712125471966244

Code


round(x/2) # round

2.0

Code


floor(x/2) # floor

1.0

Code


ceil(x/2) # ceiling

2.0
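The three languages agree for positive numbers, but they differ when the dividend is negative:

- R's %% and %/%, and Python's % and //, follow the sign of the divisor and round the quotient down (floor).
- Julia's % (the same as rem()) follows the sign of the dividend, and div() truncates toward zero.
- Julia's mod() and fld() give the R/Python behaviour.

The Python sketch below shows both conventions. math.fmod() and math.trunc() stand in for Julia's % and div().

```python
import math

a, b = -7, 2

# Floor convention (R's %% and %/%, Python's % and //): a == b * (a // b) + a % b
print(a % b, a // b)                       # 1 -4

# Truncation convention (Julia's % / rem() and div())
print(math.fmod(a, b), math.trunc(a / b))  # -1.0 -3
```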

3.2 Relational operations

Code

# relational (comparison) operations
x <- 3
x > 2 # greater than

[1] TRUE

Code

x < 2 # less than

[1] FALSE

Code

x == 2 # equal to

[1] FALSE

Code

x != 2 # not equal to

[1] TRUE

Code

# relational (comparison) operations
x = 3
x > 2 # greater than

True

Code

x < 2 # less than

False

Code

x == 2 # equal to

False

Code

x != 2 # not equal to

True

Code


# relational (comparison) operations

x = 3

Code


x > 2 # greater than

true

Code


x < 2 # less than

false

Code


x == 2 # equal to

false

Code


x != 2 # not equal to

true

3.3 Logical operations

Caution

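R: Boolean values are TRUE and FALSE. T and F are shortcuts, but they are ordinary variables that can be overwritten, so TRUE and FALSE are safer.
Python: Boolean values are True and False (case-sensitive).
Julia: Boolean values are true and false (lowercase).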

Code

T & F # and

[1] FALSE

Code

T | F # or

[1] TRUE

Code

!T # not

[1] FALSE

Code

True & False # and

False

Code

True | False # or

True

Code

not True # not

False

Code


true & false # and

false

Code


true | false # or

true

Code


!true # not

false
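In Python, & and | are bitwise operators that happen to work on True and False. For single Boolean values, the keywords and, or, and not are more idiomatic. They also short-circuit, just like && and || in R and Julia. On NumPy arrays, however, you need the element-wise &, |, and ~, with parentheses around each comparison. A small sketch:

```python
import numpy as np

x = 3

# Idiomatic scalar logic with keywords
print(x > 2 and x < 5)        # True

# `and` short-circuits: the right-hand side is never evaluated, so no ZeroDivisionError
print(False and (1 / 0 > 0))  # False

# Element-wise logic on arrays needs &, |, ~ and parentheses,
# because & binds more tightly than the comparison operators
income = np.array([1, 3, 5, 10])
mask = (income > 2) & (income < 8)
print(mask)                   # [False  True  True False]
print(income[mask])           # [3 5]
```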

4 Vectors

4.1 Creating vectors

In R, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector can be created with the function c() by listing all the values inside the parentheses, separated by commas.
c() stands for “combine”.

Code

Income <- c(1, 3, 5, 10)
Income

[1]  1  3  5 10

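Vectors must contain elements of the same data type. If you mix types, R automatically converts all elements to the most flexible type present (logical, then integer, then numeric, then character), so in the example below every element becomes a character string.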

Code

Income <- c(1, 3, 5, "10")
Income

[1] "1"  "3"  "5"  "10"

In Python, a list is an ordered collection whose elements can be of different data types. Like an R vector, it is often used to store a variable of a dataset, such as the income of a group of people or the final grades of students.
A list is created with square brackets [], listing the values separated by commas.

Code

Income = [1, 3, 5, 10]
Income

[1, 3, 5, 10]

A list can contain elements of different data types, and Python does not convert them.

Code

Income = [1, 3, 5, "10"]
Income

[1, 3, 5, '10']

If you want a collection in which all elements share the same data type, like an R vector, use a numpy array.

Code

import numpy as np
Income = np.array([1, 3, 5, 10])
Income

array([ 1,  3,  5, 10])
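Like an R vector, a NumPy array holds a single data type, and mixing types triggers conversion. For example, including the string "10" turns the whole array into strings, mirroring the R example above:

```python
import numpy as np

income = np.array([1, 3, 5, "10"])
print(income)             # ['1' '3' '5' '10']
print(income.dtype.kind)  # U, i.e. a Unicode string dtype
```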

In Julia, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector is created with square brackets [], listing the values separated by commas.

Code


Income = [1, 3, 5, 10]

4-element Vector{Int64}:
  1
  3
  5
 10

A vector can also contain elements of different data types. Note that the element type then becomes Any rather than Int64, and Julia does not convert the elements.

Code


Income = [1, 3, 5, "10"]

4-element Vector{Any}:
 1
 3
 5
  "10"

4.2 Indexing and subsetting

Caution

R, Python, and Julia have different indexing rules.

In R and Julia, the index starts from 1.
In Python, the index starts from 0.

To extract an element from a vector, we put its index in square brackets [ ].

Code

Income <- c(1, 3, 5, 10)
Income[1] # extract the first element

[1] 1

If we want to extract multiple elements, we can use a vector of indices.

Code

Income[c(1,3)] # extract the first and third elements

[1] 1 5

To extract an element from a list, we put its index in square brackets [ ].

Code

Income = [1, 3, 5, 10]
Income[0] # extract the first element

If we want to extract multiple elements, we can use a slice.

Code

Income[0:3] # extract the first three elements (indices 0, 1, 2; the stop index 3 is excluded)

[1, 3, 5]
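Because a Python slice start:stop excludes the stop index, Income[0:3] returns the first three elements rather than the first and third. To pick specific positions from a plain list, you can use a step or a list comprehension. Negative indices count from the end, whereas in R a negative index drops that element:

```python
income = [1, 3, 5, 10]

print(income[0:3])                   # [1, 3, 5]: indices 0, 1, 2
print(income[0:3:2])                 # [1, 5]: every second element from index 0
print([income[i] for i in [0, 2]])   # [1, 5]: the first and third elements
print(income[-1])                    # 10: the last element
```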

With a numpy array, we can pass a list of indices, similar to R (but the indices still start from 0).

Code

Income = np.array([1, 3, 5, 10])
Income[0] # extract the first element

Code

Income[[0,2]] # extract the first and third elements

array([1, 5])

To extract an element from a vector, we put its index in square brackets [ ].

Code


Income = [1, 3, 5, 10];

Income[1] # extract the first element

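If we want to extract multiple elements, we can use a range (both end points are included) or a vector of indices such as Income[[1, 3]].

Code


Income[1:3] # extract the first three elements (the end index 3 is included)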

3-element Vector{Int64}:
 1
 3
 5

4.3 Creating numeric sequences with fixed steps

It is also easy to create sequences that follow a pattern.

Use seq() to create a sequence with a fixed step.

Code

# use seq()
seq(from = 1, to = 2, by = 0.1)

 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

If the step is 1, there is a convenient shortcut, :

Code

1:5

[1] 1 2 3 4 5

In base Python, we can use range() to create a sequence of integers with a fixed step.

Code

# from 1 up to, but not including, 6, with step 1
list(range(1, 6)) # range() returns a range object, so we convert it to a list

[1, 2, 3, 4, 5]

Use np.arange() to create a sequence with a non-integer step. Unlike seq(), it excludes the end point.

Code

np.arange(1, 2, 0.1)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])
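To include the end point, as seq() does, np.linspace() is usually the better choice for non-integer steps. The NumPy documentation recommends it because floating-point steps in np.arange() can make the number of elements hard to predict. np.linspace() takes the number of points rather than the step:

```python
import numpy as np

# 11 evenly spaced points from 1 to 2, matching seq(from = 1, to = 2, by = 0.1)
seq = np.linspace(1, 2, num=11)
print(seq)
print(len(seq))  # 11
```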

In Julia, start:stop creates a sequence with step 1, and start:step:stop uses any other step (e.g., 1:0.1:2). Both end points are included.

Code

1:5

1:5

However, the result is not an integer vector but a lazy UnitRange{Int64} object. Use collect(1:5) to convert it into a Vector{Int64}.

Code


typeof(1:5)

UnitRange{Int64}

4.4 Combine multiple vectors into one: c()

Sometimes, we may want to combine multiple vectors into one. For instance, we may have collected income data from two different sources, and we want to combine them into one vector.
We can use c() to combine different vectors. This is one of the most common uses of c().

Code

Income1 <- 1:3 
Income2 <- c(10, 15)

Code

c(Income1,Income2)

[1]  1  2  3 10 15

In Python, we can use the + operator to concatenate lists.

Code

Income1 = [1, 2, 3]
Income2 = [10, 15]

Code

Income1 + Income2

[1, 2, 3, 10, 15]

For numpy arrays, we can use np.concatenate() to concatenate arrays.

Code

Income1 = np.array([1, 2, 3])
Income2 = np.array([10, 15])

Code

np.concatenate((Income1, Income2))

array([ 1,  2,  3, 10, 15])

In Julia, we can use the vcat() function to concatenate vectors.

Code


Income1 = [1, 2, 3];

Income2 = [10, 15]; 

vcat(Income1, Income2)

5-element Vector{Int64}:
  1
  2
  3
 10
 15

4.5 Replicating elements

We can use the rep() function to replicate elements in a vector.

Code

rep(1:3, times = 2) # replicate 1:3 twice

[1] 1 2 3 1 2 3

Code

rep(1:3, each = 2) # replicate each element in 1:3 twice

[1] 1 1 2 2 3 3

We can use the * operator to replicate elements in a list.

Code

[1, 2, 3] * 2 # replicate 1:3 twice

[1, 2, 3, 1, 2, 3]
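Base Python has no direct counterpart to rep(..., each = 2), but a nested list comprehension does the job:

```python
x = [1, 2, 3]

times = x * 2                            # like rep(x, times = 2)
each = [v for v in x for _ in range(2)]  # like rep(x, each = 2)

print(times)  # [1, 2, 3, 1, 2, 3]
print(each)   # [1, 1, 2, 2, 3, 3]
```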

For numpy arrays, we can use np.tile() to replicate elements.

Code

np.tile([1, 2, 3], 2) # replicate 1:3 twice

array([1, 2, 3, 1, 2, 3])

Code

np.repeat([1, 2, 3], 2) # replicate each element in 1:3 twice

array([1, 1, 2, 2, 3, 3])

We can use the repeat() function to replicate elements in a vector.

Code


repeat([1, 2, 3], 2) # replicate 1:3 twice

6-element Vector{Int64}:
 1
 2
 3
 1
 2
 3

Code


repeat([1, 2, 3], inner = 2) # replicate each element in 1:3 twice

6-element Vector{Int64}:
 1
 1
 2
 2
 3
 3

4.6 Maximum and minimum

We can use the max() and min() functions to find the maximum and minimum values in a vector.

Code

Income <- c(1, 3, 5, 10)

max(Income) # maximum

[1] 10

Code

min(Income) # minimum

[1] 1

We can use the max() and min() functions to find the maximum and minimum values in a list.

Code

Income = [1, 3, 5, 10]

max(Income) # maximum

Code

min(Income) # minimum

For numpy arrays, we can use np.max() and np.min() to find the maximum and minimum values.

Code

Income = np.array([1, 3, 5, 10])

np.max(Income) # maximum

Code

np.min(Income) # minimum

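We can use the maximum() and minimum() functions to find the maximum and minimum values in a vector. (Julia's max() and min() compare their arguments, e.g., max(3, 10), rather than the elements of a collection.)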

Code


Income = [1, 3, 5, 10];

maximum(Income) # maximum

Code


minimum(Income) # minimum

4.7 Sum and mean

We can use the sum() and mean() functions to find the sum and mean values in a vector.

Code

Income <- c(1, 3, 5, 10)

sum(Income, na.rm = T) # sum and remove missing values

[1] 19

Code

mean(Income, na.rm = T) # mean and remove missing values

[1] 4.75

In base Python, sum() works on a list, but there is no built-in mean() function, so we use np.mean() from numpy.

Code

Income = [1, 3, 5, 10]

sum(Income) # sum

Code

np.mean(Income) # mean

4.75
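If you prefer not to depend on NumPy for a plain list, the standard-library statistics module provides mean() and median():

```python
import statistics

income = [1, 3, 5, 10]
print(sum(income))                # 19
print(statistics.mean(income))    # 4.75
print(statistics.median(income))  # 4.0
```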

For numpy arrays, we can use np.sum() and np.mean() to find the sum and mean values.

Code

Income = np.array([1, 3, 5, 10])

np.sum(Income) # sum

Code

np.mean(Income) # mean

4.75

sum() is built into Julia, while mean() comes from the Statistics standard library, so we load that first.

Code


using Statistics # provides mean()

Income = [1, 3, 5, 10];

sum(Income) # sum

Code


mean(Income) # mean

4.75

4.8 Missing values

Caution

In R, missing values are represented by NA.
In Python, missing numeric values are usually represented by np.nan ("not a number"). pandas also uses None and pd.NA.
In Julia, missing values are represented by missing.

In R, missing values are represented by NA.

Code

Income <- c(1, 3, 5, NA)

sum(Income, na.rm = T) # sum and remove missing values

[1] 9

Code

mean(Income, na.rm = T) # mean and remove missing values

[1] 3

In Python, missing values are represented by np.nan.

Code

Income = [1, 3, 5, np.nan]

np.nansum(Income) # sum and remove missing values

9.0

Code

np.nanmean(Income) # mean and remove missing values

3.0
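The nan-prefixed functions matter because NaN propagates. np.sum() on the same list returns nan, just as sum() in R returns NA without na.rm = TRUE. NaN is also not equal to itself, so use np.isnan(), the counterpart of R's is.na(), to find missing entries:

```python
import numpy as np

income = [1, 3, 5, np.nan]

print(np.sum(income))      # nan: the missing value propagates
print(np.nansum(income))   # 9.0
print(np.nan == np.nan)    # False
print(np.isnan(income))    # [False False False  True]
```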

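In Julia, missing values are represented by missing. To take the sum or mean while removing missing values, wrap the vector in skipmissing():

Code


using Statistics # provides mean()

Income = [1, 3, 5, missing];

sum(skipmissing(Income)) # sum and remove missing values: 9

mean(skipmissing(Income)) # mean and remove missing values: 3.0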

4.9 Element-wise arithmetic operations

Caution

R supports element-wise operations on vectors by default.
Python lists do not support element-wise operations. You need numpy arrays for that.
Julia uses the dot syntax (e.g., .+ and .*) for element-wise operations. Some operations, such as adding two vectors of the same length or multiplying a vector by a number, also work without the dot. However, adding a number to a vector (Income + 2) raises an error.

If you operate on a vector with a single number, the operation is applied to every element in the vector.

Code

Income <- c(1, 3, 5, 10)

Income + 2 # element-wise addition

[1]  3  5  7 12

Code

Income * 2 # element-wise multiplication

[1]  2  6 10 20

However, base Python lists do not support element-wise operations. Adding a number raises an error, and multiplying by a number repeats the list.

Code

Income = [1, 3, 5, 10]

Income + 2 # fails: lists do not support element-wise addition

TypeError: can only concatenate list (not "int") to list

Code

Income * 2 # repeats the list; not element-wise multiplication

[1, 3, 5, 10, 1, 3, 5, 10]
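If you want to stay with plain lists, a list comprehension applies an operation to each element:

```python
income = [1, 3, 5, 10]

plus_two = [v + 2 for v in income]   # element-wise addition
times_two = [v * 2 for v in income]  # element-wise multiplication

print(plus_two)   # [3, 5, 7, 12]
print(times_two)  # [2, 6, 10, 20]
```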

For numpy arrays, the behavior is the same as R.

Code

Income = np.array([1, 3, 5, 10])

Income + 2 # element-wise addition

array([ 3,  5,  7, 12])

Code

Income * 2 # element-wise multiplication

array([ 2,  6, 10, 20])

As in R, an operation between a vector and a single number is applied to every element. In Julia, you mark the operation as element-wise with the dot syntax, e.g., .+ and .*.

Code


Income = [1, 3, 5, 10];

Income .+ 2 # element-wise addition

4-element Vector{Int64}:
  3
  5
  7
 12

Code


Income .* 2 # element-wise multiplication

4-element Vector{Int64}:
  2
  6
 10
 20

4.10 Vector multiplication

If two vectors have the same length, they can be combined element-wise, for example with addition or multiplication. (If the lengths differ, R recycles the shorter vector and warns when the longer length is not a multiple of the shorter one.)

Code

Income1 <- c(1, 3, 5, 10)

Income2 <- c(2, 4, 6, 8)

Income1 + Income2 # element-wise addition

[1]  3  7 11 18

Code

Income1 * Income2 # element-wise multiplication

[1]  2 12 30 80

For numpy arrays, np.add() and np.multiply() perform element-wise addition and multiplication. They are equivalent to the + and * operators.

Code

Income1 = np.array([1, 3, 5, 10])

Income2 = np.array([2, 4, 6, 8])

np.add(Income1, Income2) # element-wise addition

array([ 3,  7, 11, 18])

Code

np.multiply(Income1, Income2) # element-wise multiplication

array([ 2, 12, 30, 80])

In Julia, two vectors of the same length can be combined element-wise with the dot operators .+ and .*.

Code


Income1 = [1, 3, 5, 10];

Income2 = [2, 4, 6, 8];

Income1 .+ Income2 # element-wise addition

4-element Vector{Int64}:
  3
  7
 11
 18

Code


Income1 .* Income2 # element-wise multiplication

4-element Vector{Int64}:
  2
 12
 30
 80
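The examples above are all element-wise. If by vector multiplication you mean the inner (dot) product, you can compute it in each language:

- R: sum(Income1 * Income2)
- Julia: dot(Income1, Income2), from LinearAlgebra
- NumPy: the @ operator

A NumPy sketch:

```python
import numpy as np

income1 = np.array([1, 3, 5, 10])
income2 = np.array([2, 4, 6, 8])

print(income1 * income2)  # [ 2 12 30 80]: element-wise product
print(income1 @ income2)  # 124: inner (dot) product, 2 + 12 + 30 + 80
```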

4.11 Max and min of two vectors

We can use the pmax() and pmin() functions to find the element-wise maximum and minimum values of two vectors.

Code

Income1 <- c(1, 3, 5, 10)

Income2 <- c(2, 4, 6, 8)

pmax(Income1, Income2) # element-wise maximum

[1]  2  4  6 10

Code

pmin(Income1, Income2) # element-wise minimum

[1] 1 3 5 8

We can use the np.maximum() and np.minimum() functions to find the element-wise maximum and minimum values of two numpy arrays.

Code

Income1 = np.array([1, 3, 5, 10])

Income2 = np.array([2, 4, 6, 8])

np.maximum(Income1, Income2) # element-wise maximum

array([ 2,  4,  6, 10])

Code

np.minimum(Income1, Income2) # element-wise minimum

array([1, 3, 5, 8])
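For plain Python lists, zip() pairs up the elements so that the built-in max() and min() can be applied pair by pair, much like pmax() and pmin():

```python
income1 = [1, 3, 5, 10]
income2 = [2, 4, 6, 8]

pair_max = [max(a, b) for a, b in zip(income1, income2)]  # like pmax()
pair_min = [min(a, b) for a, b in zip(income1, income2)]  # like pmin()

print(pair_max)  # [2, 4, 6, 10]
print(pair_min)  # [1, 3, 5, 8]
```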

In Julia, we broadcast the max() and min() functions with a dot (max. and min.) to find the element-wise maximum and minimum values of two vectors.

Code


Income1 = [1, 3, 5, 10];

Income2 = [2, 4, 6, 8];

max.(Income1, Income2) # element-wise maximum

4-element Vector{Int64}:
  2
  4
  6
 10

Code


min.(Income1, Income2) # element-wise minimum

4-element Vector{Int64}:
 1
 3
 5
 8

5 Character and String

5.1 Creating strings

Strings (called characters in R) are enclosed within a pair of quotation marks.
Single or double quotation marks both work.
Even if a string contains numbers, it is treated as text, and R will not perform any mathematical operations on it.

Code

str1 <- "1 + 1 = 2"

Strings are enclosed within a pair of quotation marks.
Single or double quotation marks can both work.

Code

str1 = "1 + 1 = 2"

In Julia, single quotation marks (') are used for defining individual characters. Double quotation marks (") are used for defining strings.

Code


character1 = '1'

'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)

Code

str1 = "1 + 1 = 2"

"1 + 1 = 2"

5.2 Concatenating strings

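We can use the paste() function to concatenate strings. By default, paste() inserts a space between its arguments, which is why the output below contains two spaces. Use paste0() to concatenate without a separator.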

Code

str1 <- "1 + 1 = "
str2 <- "2"

paste(str1, str2)

[1] "1 + 1 =  2"

We can use the + operator to concatenate strings.

Code

str1 = "1 + 1 = "
str2 = "2"

str1 + str2

'1 + 1 = 2'
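Python has other options besides +:

- The str.join() method mimics paste() by putting a separator between the pieces.
- f-strings let you embed values, including numbers that + could not concatenate to a string directly.

```python
str1 = "1 + 1 = "
str2 = "2"

print(str1 + str2)             # 1 + 1 = 2
print(" ".join([str1, str2]))  # 1 + 1 =  2  (separator added, like paste())

result = 1 + 1
print(f"1 + 1 = {result}")     # 1 + 1 = 2
# str1 + result would raise TypeError: a str cannot be added to an int
```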

We can use the * operator to concatenate strings.

Code


str1 = "1 + 1 = "

"1 + 1 = "

Code


str2 = "2"

"2"

Code


str1 * str2

"1 + 1 = 2"

5.3 Checking the number of elements in a vector: length()

You can measure the length of a vector using the command length()

Code

x <- c('R',' is', ' the', ' best', ' language')
length(x)

[1] 5

You can measure the length of a list using the command len()

Code

x = ['R',' is', ' the', ' best', ' language']

len(x)

For numpy arrays, the shape attribute gives the size of each dimension. len() also works and returns the length of the first dimension.

Code

x = np.array(['Python',' is', ' the', ' best', ' language'])

x.shape

(5,)
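Be careful not to confuse the number of elements with the number of characters. In Python, len() on a single string counts its characters, which is what nchar() does in R. In Julia, length() on a string also counts characters.

```python
x = ['R', ' is', ' the', ' best', ' language']

print(len(x))               # 5: number of elements in the list
print(len(x[4]))            # 9: number of characters in ' language', including the space
print([len(s) for s in x])  # [1, 3, 4, 5, 9]
```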

You can measure the length of a vector using the command length()

Code


x = ["Julia", " is", " the", " best", " language"]

5-element Vector{String}:
 "Julia"
 " is"
 " the"
 " best"
 " language"

Code


length(x)

5.4 Special relational operation: `%in%`

A special relational operation is %in% in R, which tests whether an element exists in the object.

Code

x <- c(1,3,8,7) 

3 %in% x

[1] TRUE

Code

2 %in% x

[1] FALSE

In Python, we can use the in operator to test whether an element exists in the object.

Code

x = [1, 3, 8, 7]

3 in x

True

Code

2 in x

False
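Python's in tests one element at a time. R's %in% is vectorised: c(1, 2, 3) %in% x returns one TRUE/FALSE per element. The NumPy counterpart is np.isin(). In Julia, broadcasting in with a dot, as in in.([1, 2, 3], Ref(x)), does the same.

```python
import numpy as np

x = [1, 3, 8, 7]

print(3 in x)                 # True
print(np.isin([1, 2, 3], x))  # [ True False  True]
```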

In Julia, we can use the in operator to test whether an element exists in the object.

Code


x = [1, 3, 8, 7];

3 in x

true

6 Matrices

6.1 Matrices: creating matrices

Caution

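When creating an R matrix with matrix(), the elements are filled column by column. This is called column-major order (set byrow = TRUE to fill row by row).

When creating a Python matrix with np.array() from a list of lists, each inner list becomes a row, so the elements are filled row by row. This is called row-major order.

In Julia, a matrix literal such as [1 2 3; 4 5 6] is also written row by row. However, Julia stores matrices in column-major order, so reshape(1:9, 3, 3) fills by column, as in R.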

A matrix can be created using the function matrix():
- the first argument is the vector to be converted into a matrix
- nrow is the number of rows
- ncol is the number of columns

Code

matrix(1:9, nrow = 3, ncol = 3)

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

A matrix can be created with the numpy function np.array(). Its argument is a list of lists, and each inner list becomes one row of the matrix.

Code

import numpy as np

np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
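To reproduce R's column-major filling in NumPy, reshape a sequence with order="F" (Fortran order, i.e. column-major). The default order="C" fills by row, like matrix(1:9, nrow = 3, byrow = TRUE) in R:

```python
import numpy as np

# Column-major, like matrix(1:9, nrow = 3, ncol = 3) in R
m_col = np.arange(1, 10).reshape((3, 3), order="F")

# Row-major (the NumPy default), like matrix(1:9, nrow = 3, byrow = TRUE)
m_row = np.arange(1, 10).reshape((3, 3))

print(m_col)
print(m_row)
```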

In base Julia, a matrix can be created with square brackets []. Separate elements within a row by spaces, and separate rows by semicolons ;.

Code


[1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

6.2 Creating matrices: combine matrices

We can use cbind() and rbind() to concatenate vectors and matrices into new matrices.

cbind() does the column binding

Code

a <- matrix(1:6, nrow = 2, ncol = 3)

a

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Code

cbind(a, a) # column bind

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    1    3    5
[2,]    2    4    6    2    4    6

rbind() does the row binding

Code

rbind(a, a) # row bind

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    1    3    5
[4,]    2    4    6

We can use np.concatenate() to concatenate arrays.

Code

a = np.array([[1, 2, 3], [4, 5, 6]])

a

array([[1, 2, 3],
       [4, 5, 6]])

Code

np.concatenate((a, a), axis = 1) # column bind

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

Code

np.concatenate((a, a), axis = 0) # row bind

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
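np.hstack() and np.vstack() are shorthand for the two concatenations above. To bind 1-D vectors as columns, as cbind(1:3, 4:6) does in R, np.column_stack() is convenient:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.hstack((a, a)).shape)  # (2, 6), same as axis = 1
print(np.vstack((a, a)).shape)  # (4, 3), same as axis = 0

# Each 1-D input becomes one column of a 3 x 2 matrix
b = np.column_stack(([1, 2, 3], [4, 5, 6]))
print(b)
```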

We can use the hcat() and vcat() functions to concatenate matrices.

Code


a = [1 2 3; 4 5 6]

2×3 Matrix{Int64}:
 1  2  3
 4  5  6

Code


hcat(a, a) # column bind

2×6 Matrix{Int64}:
 1  2  3  1  2  3
 4  5  6  4  5  6

Code


vcat(a, a) # row bind

4×3 Matrix{Int64}:
 1  2  3
 4  5  6
 1  2  3
 4  5  6

6.3 Matrices: indexing and subsetting

Matrices have two dimensions: rows and columns. Therefore, to extract elements from a matrix, we just need to specify which row(s) and which column(s) we want.

Code

x <- matrix(1:9, nrow = 3, ncol = 3)
x

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Extract the element in the 2nd row, 3rd column.
- use square brackets with a comma inside, [ , ], to indicate subsetting; the index before the comma is the row index, and the index after the comma is the column index.
  - 2 is specified for the row index, so we extract elements from the second row
  - 3 is specified for the column index, so we extract elements from the third column
  - Altogether, we extract a single element in row 2, column 3.

Code

x[2,3] # the element in the 2nd row, 3rd column

[1] 8

If we leave an index blank, we extract all elements along that dimension.
- to take out the entire first row:
  - 1 is specified for the row index
  - the column index is left blank

Code

x[1,] # all elements in the first row

[1] 1 4 7

Code

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Extract the element in the 2nd row, 3rd column. Since Python indices start from 0, these are row index 1 and column index 2.

Code

x[1,2] # the element in the 2nd row, 3rd column

Unlike R, we cannot leave an index blank. Instead, : selects all elements along that dimension.

Code

x[0,:] # all elements in the first row

array([1, 2, 3])
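The same : notation selects columns and blocks. Remember that slices exclude their end index and that positions start from 0:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_col = x[:, 0]      # first column, like x[, 1] in R
block = x[0:2, 1:3]      # rows 1-2 and columns 2-3, like x[1:2, 2:3] in R
rows_1_3 = x[[0, 2], :]  # rows 1 and 3, like x[c(1, 3), ] in R

print(first_col)  # [1 4 7]
print(block)
print(rows_1_3)
```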

Code


x = [1 2 3; 4 5 6; 7 8 9];

Extract the element in the 2nd row, 3rd column.

Code


x[2,3] # the element in the 2nd row, 3rd column

As in Python, and unlike R, we use : to extract all elements along that dimension.

Code


x[1,:] # all elements in the first row

3-element Vector{Int64}:
 1
 2
 3

6.4 Matrices: check dimensions and variable types

You can check the size of a matrix using dim(), or nrow() and ncol().

Code

x <- matrix(1:9, nrow = 3, ncol = 3)

dim(x)

[1] 3 3

Code

nrow(x)

[1] 3

Code

ncol(x)

[1] 3

You can inspect the data type and structure using str().

Code

str(x)

 int [1:3, 1:3] 1 2 3 4 5 6 7 8 9

You can verify the size of the matrix using the shape attribute

Code

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x.shape

(3, 3)

You can get the data type info using the dtype attribute

Code

x.dtype

dtype('int64')

You can check the size of the matrix using the size() function. Its element type is given by eltype(), which returns Int64 here.

Code


x = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

Code


size(x)

(3, 3)

6.5 Matrices: special operations

6.5.1 Creating a diagonal matrix

We can use the diag() function to create a diagonal matrix.

Code

diag(1:3)

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3

We can use the np.diag() function to create a diagonal matrix.

Code

np.diag([1, 2, 3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

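We can use the diagm() function from the LinearAlgebra standard library to create a diagonal matrix. The pair 0 => [1, 2, 3] places the vector on the main diagonal (offset 0).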

Code

using LinearAlgebra
diagm(0 => [1, 2, 3])

3×3 Matrix{Int64}:
 1  0  0
 0  2  0
 0  0  3

6.5.2 Creating an identity matrix

If we pass a single number n to diag(), it creates an n × n identity matrix.

Code

diag(3)

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

We can use the np.eye() function to create an identity matrix.

Code

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

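In Julia, I from the LinearAlgebra standard library represents the identity, and I(3) creates a 3×3 Diagonal identity matrix. Use Matrix{Float64}(I, 3, 3) if you need a dense matrix.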

Code


I(3)

3×3 Diagonal{Bool, Vector{Bool}}:
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1

6.6 Matrices’ operations: matrix addition and multiplication

If two matrices have the same dimensions, they can be combined element-wise, for example with addition or multiplication.

Code

set.seed(123)

x = matrix(rnorm(9), nrow = 3, ncol = 3)

z = matrix(rnorm(9), nrow = 3, ncol = 3)

x + z   # elementwise addition

           [,1]      [,2]       [,3]
[1,] -1.0061376 0.4712798  2.2478293
[2,]  0.9939043 0.2399705 -0.7672108
[3,]  1.9185221 1.1592239 -2.6534700

Code

x * x   # elementwise multiplication

           [,1]        [,2]      [,3]
[1,] 0.31413295 0.004971433 0.2124437
[2,] 0.05298168 0.016715318 1.6003799
[3,] 2.42957161 2.941447909 0.4717668

To perform matrix multiplication as in linear algebra, we need to use %*%. The matrices must have conforming dimensions: the number of columns of x must equal the number of rows of y.

Code

x

           [,1]       [,2]       [,3]
[1,] -0.5604756 0.07050839  0.4609162
[2,] -0.2301775 0.12928774 -1.2650612
[3,]  1.5587083 1.71506499 -0.6868529

Code

y = matrix(rnorm(9), nrow = 3, ncol = 3)
x %*% y # matrix multiplication

           [,1]       [,2]       [,3]
[1,] -0.9186059 -0.2861301  0.6175429
[2,]  1.1282999  0.8396152 -1.1340507
[3,]  1.0157790 -1.5987826 -4.4424790

If two matrices have the same dimensions, we can perform element-wise operations on them, including element-wise addition (+) and element-wise multiplication (*).

Code

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x + y # elementwise addition

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])

Code

x * y # elementwise multiplication

array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])

To perform matrix multiplication as in linear algebra, we need to use @. The matrices must have conforming dimensions: the number of columns of x must equal the number of rows of y.

Code

x @ y # matrix multiplication

array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])
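To make the conforming-dimensions rule concrete, here is a small sketch with non-square matrices (the matrices a and b are made up for illustration). The inner dimensions must match for @, while * needs matching or broadcast-compatible shapes:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
b = np.array([[1, 0], [0, 1], [1, 1]])  # shape (3, 2)

# (2, 3) @ (3, 2) works because the inner dimensions (3 and 3) match
c = a @ b
print(c.shape)  # (2, 2)

# element-wise * needs matching (or broadcast-compatible) shapes, so a * b
# would fail here; a @ a also fails because (2, 3) @ (2, 3) has
# mismatched inner dimensions (3 vs 2)
matmul_failed = False
try:
    a @ a
except ValueError as err:
    matmul_failed = True
    print("matmul error:", err)
```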

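If two matrices have the same dimensions, we can perform element-wise operations on them. In Julia, + already works element-wise on same-size matrices, but * performs matrix multiplication, so element-wise multiplication requires the dotted operator .*. Using the dotted forms (.+, .*) makes the element-wise intent explicit.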

Code


x = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

Code


y = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

Code


x .+ y # elementwise addition

3×3 Matrix{Int64}:
  2   4   6
  8  10  12
 14  16  18

6.7 Matrices’ operations: inverse and transpose

We use t() to transpose a matrix.

Code

x = matrix(rnorm(9), nrow = 3, ncol = 3)
x

           [,1]       [,2]      [,3]
[1,]  0.1533731  0.4264642 0.8781335
[2,] -1.1381369 -0.2950715 0.8215811
[3,]  1.2538149  0.8951257 0.6886403

Code

t(x) # transpose

          [,1]       [,2]      [,3]
[1,] 0.1533731 -1.1381369 1.2538149
[2,] 0.4264642 -0.2950715 0.8951257
[3,] 0.8781335  0.8215811 0.6886403

When given a single square matrix, solve() returns its inverse.

Code

x

           [,1]       [,2]      [,3]
[1,]  0.1533731  0.4264642 0.8781335
[2,] -1.1381369 -0.2950715 0.8215811
[3,]  1.2538149  0.8951257 0.6886403

Code

solve(t(x) %*% x) # inverse; the matrix must be square and non-singular

          [,1]      [,2]      [,3]
[1,]  417.2893 -803.5341  299.4938
[2,] -803.5341 1548.5735 -577.2074
[3,]  299.4938 -577.2074  215.6665

We use the .T attribute to transpose a matrix.

Code

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Code

x.T # transpose

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
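With a square matrix it is easy to miss that .T swaps rows and columns. The sketch below (with an illustrative 2 × 3 matrix) shows the shape change, the equivalent np.transpose() function, and why a product like a.T @ a is always square:

```python
import numpy as np

# a non-square 2x3 matrix
a = np.array([[1, 2, 3], [4, 5, 6]])

at = a.T         # .T swaps rows and columns
print(at.shape)  # (3, 2)

# np.transpose() gives the same result as .T
print(np.array_equal(np.transpose(a), at))  # True

# transposing twice returns the original matrix
print(np.array_equal(at.T, a))  # True

# (3, 2) @ (2, 3) is always square (3x3 here), which is why expressions
# like x.T @ x can be passed to an inverse function
gram = a.T @ a
print(gram.shape)  # (3, 3)
```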

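We use np.linalg.inv() to get the inverse of a matrix. Be careful with this particular x: its rows are linearly dependent (the third row equals twice the second row minus the first), so x.T @ x is singular and has no true inverse. As the output below shows, NumPy does not raise an error here because of floating-point rounding, and the huge entries are numerical noise rather than a meaningful inverse.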

Code

np.linalg.inv(x.T @ x) # inverse; the matrix must be square and non-singular

array([[ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14],
       [-1.12589991e+15,  2.25179981e+15, -1.12589991e+15],
       [ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14]])
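Because the x above is singular, the sketch below uses a small invertible matrix instead (chosen for illustration) to show what a successful inverse looks like, how to check it, and how np.linalg.matrix_rank() can flag a singular matrix. It also shows np.linalg.solve(), which is generally recommended over multiplying by an explicit inverse when solving a linear system:

```python
import numpy as np

# a small non-singular (invertible) matrix, chosen for illustration
A = np.array([[2.0, 1.0], [1.0, 3.0]])

A_inv = np.linalg.inv(A)
print(A_inv)  # approximately [[0.6, -0.2], [-0.2, 0.4]]

# sanity check: A times its inverse should be (numerically) the identity
print(np.allclose(A @ A_inv, np.eye(2)))  # True

# the 3x3 matrix from the example above is singular: its rank is 2, not 3,
# so it has no true inverse
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.linalg.matrix_rank(x))  # 2

# to solve A v = b, np.linalg.solve is usually preferred over inv(A) @ b
b = np.array([3.0, 4.0])
v = np.linalg.solve(A, b)
print(v)  # approximately [1, 1]
```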

We use transpose() to transpose a matrix.

Code


x = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

Code


transpose(x) # transpose

3×3 transpose(::Matrix{Int64}) with eltype Int64:
 1  4  7
 2  5  8
 3  6  9

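We use inv() to get the inverse of a matrix. Be careful: this x is singular (its third row equals twice the second row minus the first), so transpose(x) * x has no true inverse; the huge entries below come from floating-point rounding, not a meaningful inverse.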

Code


inv(transpose(x) * x) # inverse; the matrix must be square and non-singular

3×3 Matrix{Float64}:
  5.6295e14  -1.1259e15   5.6295e14
 -1.1259e15   2.2518e15  -1.1259e15
  5.6295e14  -1.1259e15   5.6295e14

7 Programming Basics: Flow Control

Indentation Difference

In R, the code block is enclosed by curly braces {}. Indentation is not necessary and does not affect the code execution.
In Python, the code block is defined by indentation. Indentation is necessary and affects the code execution.
In Julia, a code block starts with a keyword such as if, for, while or function and is closed by end. Indentation does not affect the code execution.

7.1 if/else

Sometimes, you want to run different code depending on a condition. For instance, if an observation is a missing value, you might replace it with the average. This is where if/else comes in.

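if (condition1) {
  # action 1: runs when condition1 is TRUE
} else if (condition2) {
  # action 2: runs when condition1 is FALSE and condition2 is TRUE
} else {
  # action 3: runs when neither condition is TRUE
}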

Example 1:

Code

a <- 15

if (a > 10) {
  larger_than_10 <- TRUE
} else {
  larger_than_10 <- FALSE
}

larger_than_10

[1] TRUE

Example 2:

Code

x <- -5
if (x >= 0) {
  print("x is a non-negative number")
} else {
  print("x is a negative number")
}

[1] "x is a negative number"

Example 1:

Code

a = 15

if a > 10:
    larger_than_10 = True
else:
    larger_than_10 = False

larger_than_10

True

Example 2:

Code

x = -5

if x >= 0:
    print("x is a non-negative number")
else:
    print("x is a negative number")

x is a negative number
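The if/else template above has an else if branch; in Python it is spelled elif. A minimal sketch that separates the zero case explicitly (describe_sign is a hypothetical helper name):

```python
def describe_sign(x):
    # Python spells "else if" as elif
    if x > 0:
        return "positive"
    elif x == 0:
        return "zero"
    else:
        return "negative"

print(describe_sign(-5))  # negative
print(describe_sign(0))   # zero
```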

Example 1:

Code


a = 15

Code


if a > 10
    larger_than_10 = true
else
    larger_than_10 = false
end

true

Code


larger_than_10

true

Example 2:

Code


x = -5

-5

Code


if x >= 0
    println("x is a non-negative number")
else
    println("x is a negative number")
end

x is a negative number

7.2 Loops

Caution

Loops in R and Python are relatively slow, because each iteration is executed by the interpreter. Therefore, code should be written in vector or matrix form to take advantage of vectorization wherever possible.

In contrast, Julia compiles code to efficient machine code, so loops are typically fast. Code readability can therefore be prioritized over vectorization.
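A minimal sketch of the difference between a Python loop and a vectorized NumPy call computing the same sum of squares. Both give the same answer; the vectorized version avoids running the loop body in the interpreter for every element. Actual speedups depend on your machine, so no timings are shown:

```python
import numpy as np

values = np.arange(1, 100_001, dtype=np.int64)  # 1, 2, ..., 100,000

# loop version: Python executes the body once per element
total_loop = 0
for v in values:
    total_loop += int(v) ** 2

# vectorized version: one expression, with the looping done inside NumPy
total_vec = int((values ** 2).sum())

print(total_loop == total_vec)  # True
```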

As the name suggests, a loop makes the program repeat a set of instructions many times, until a stopping criterion is met.

Loops are very useful for repetitive tasks.

Code

for (i in 1:10){ # i is the iterator
  # loop body: gets executed each time
  # the value of i changes with each iteration
}

Example:

Code

for (i in 1:5){
  print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Code

for i in range(1, 6): # range(1, 6) yields 1, 2, 3, 4, 5; the stop value is excluded
    print(i)
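The examples above use for loops over a fixed range. When the number of repetitions is not known in advance, a while loop repeats until its stopping criterion is met, and break exits a loop early. A small sketch (the doubling example is made up for illustration):

```python
# a while loop keeps repeating until its condition becomes False,
# i.e. until the stopping criterion is met
value = 1
steps = 0
while value <= 100:  # stop once value exceeds 100
    value *= 2
    steps += 1

print(value, steps)  # 128 7

# a for loop can also stop early with break
for i in range(1, 6):
    if i == 3:
        break  # leave the loop as soon as i reaches 3
    print(i)  # prints 1, then 2
```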

Code


for i in 1:5
    println(i)
end

7.3 User-Defined Functions

A function takes arguments as input, runs some specified actions, and then returns the result to us.

Functions are very useful. When we want to test different ideas, we can combine functions with loops: we write a function that takes different parameters as input, and then use a loop to go through all possible combinations of parameters.

Here is how to define a function in general:

Code

function_name <- function(arg1, arg2 = default_value){
  # write the actions to be done with arg1 and arg2
  # you can have any number of arguments, with or without defaults
  return(result) # return the computed value; without return(), R returns the last evaluated expression
}

Example:

Code

magic <- function(x, y){
  return(x^2 + y)
}

magic(1,3)

[1] 4

Here is how to define a function in general:

Code

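def function_name(arg1, arg2=None):
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    # note: default values are evaluated when the function is defined,
    # so a default such as default_value must already exist at that point
    result = arg1  # placeholder for the actual computation
    return result  # return the computed value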

Example:

Code

def magic(x, y):
    return x**2 + y

magic(1, 3)
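To illustrate combining a function with a loop over parameter combinations, as described at the start of this section, here is a sketch that redefines magic() with a default value for y (the default of 0 is an illustrative choice) and uses itertools.product to visit every (x, y) pair:

```python
from itertools import product

def magic(x, y=0):
    # same formula as above; y now has a default value of 0 (illustrative)
    return x**2 + y

print(magic(2))       # 4, uses the default y = 0
print(magic(2, y=3))  # 7, y passed as a keyword argument

# combine the function with a loop over every (x, y) combination
results = {}
for x, y in product([1, 2], [0, 3]):
    results[(x, y)] = magic(x, y)

print(results)  # {(1, 0): 1, (1, 3): 4, (2, 0): 4, (2, 3): 7}
```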

Here is how to define a function in general:

Code


function function_name(arg1, arg2 = default_value)
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return result # return the computed value; without return, the last evaluated expression is returned
end

function_name (generic function with 2 methods)

Example:

Code


function magic(x, y)
    return x^2 + y
end

magic (generic function with 1 method)

Code


magic(1, 3)

8 A comprehensive exercise

Task: write a function that takes a vector as input and returns the maximum value of the vector.

Code

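get_max <- function(input){
  max_value <- input[1]
  # seq_along(input)[-1] gives 2, ..., length(input), and is empty for a length-1 input
  for (i in seq_along(input)[-1]) {
    if (input[i] > max_value) {
      max_value <- input[i]
    }
  }

  return(max_value)
}

get_max(c(-1,3,2))

[1] 3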

Code

def get_max(input):
    max_value = input[0]
    for i in range(1, len(input)):
        if input[i] > max_value:
            max_value = input[i]
    return max_value

get_max([-1, 3, 2])
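A variant sketch that loops over the elements directly rather than over indices, and raises a clear error on empty input instead of failing with an IndexError on input[0] (get_max_iter is a hypothetical name). In practice, Python's built-in max() already does this job:

```python
def get_max_iter(values):
    # refuse empty input explicitly
    if len(values) == 0:
        raise ValueError("get_max_iter() needs at least one value")
    max_value = values[0]
    # iterate over the remaining elements directly instead of over indices
    for v in values[1:]:
        if v > max_value:
            max_value = v
    return max_value

print(get_max_iter([-1, 3, 2]))  # 3
print(max([-1, 3, 2]))           # 3, the built-in max gives the same answer
```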

Code


function get_max(input)
    max_value = input[1]
    for i in 2:length(input)
        if input[i] > max_value
            max_value = input[i]
        end
    end
    return max_value
end

get_max (generic function with 1 method)

Code


get_max([-1, 3, 2])

9 Conclusion about R and Python

Below are the most common mistakes to make when switching between R and Python:

In R, the index starts from 1; in Python, the index starts from 0.
In R, missing values are represented by NA; in Python, missing numeric values are usually represented by np.nan (in NumPy and pandas), while None is also used for general missing objects.
In R, the code block is enclosed by curly braces {}; in Python, the code block is defined by indentation.
In R, the : operator creates a sequence with a step of 1 that includes both ends (1:5); in Python, the range() function creates such a sequence but excludes the stop value (range(1, 6) yields 1 to 5).
In R, the c() function is used to combine vectors; in Python, the + operator is used to combine lists.
In R, the rep() function is used to replicate elements in a vector; in Python, the * operator is used to replicate elements in a list.
In R, the %in% operator is used to test whether an element exists in the object; in Python, the in operator is used to test whether an element exists in the object.
In R, the %*% operator is used to perform matrix multiplication; in Python, the @ operator is used to perform matrix multiplication.
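The Python side of the pitfalls listed above can be checked in a few lines. A minimal sketch (the values are arbitrary) that also shows np.isnan(), since np.nan never compares equal to itself:

```python
import numpy as np

v = [10, 20, 30]
print(v[0])               # 10: Python indexing starts at 0 (R's v[1])
print(list(range(1, 4)))  # [1, 2, 3]: the stop value 4 is excluded (R's 1:3)
print([1, 2] + [3])       # [1, 2, 3]: + concatenates lists (R's c())
print([0] * 3)            # [0, 0, 0]: * repeats a list (R's rep(0, 3))
print(20 in v)            # True: membership test for a single value (R's %in%)

m = np.array([[1, 2], [3, 4]])
print(m @ m)              # matrix product (R's %*%); m * m is element-wise
print(np.nan == np.nan)   # False: NaN never equals itself
print(np.isnan(np.nan))   # True: use np.isnan to detect missing values
```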
