Side-by-Side Comparison between R, Python, and Julia

Author: Dr Wei Miao
Affiliation: UCL School of Management
Published: September 16, 2024
Modified: September 16, 2024

Tip

This tutorial is designed for those who are familiar with either R, Python or Julia, and would like to learn another language.

In this tutorial, I will compare the basics of R, Python, and Julia side by side. We will cover the basic syntax, data types, and functionalities.

If you discover any mistakes or outdated content in this tutorial, please let me know. I will be very grateful for your feedback.

Code
library(reticulate)
use_condaenv("base")
library(JuliaCall)

1 Language Basics

1.1 Assignment of variables

Caution

In R and Python, assignment operations do not print the assigned object by default.

Julia, however, prints the assigned object by default. To suppress the output, put a semicolon ; at the end of the line.

Code
# create an object x with value 3
x <- 3
x
[1] 3
Code
# create an object x with value 3
x = 3
x
3
Code
# create an object x with value 3
x = 3; # the ; suppresses the output
x
3

1.2 Commenting code

You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by R.

It is good practice to comment your code often, so that the future you can remember what you were trying to achieve.

Code
# Is x 1 or 2 below?
x <- 1 # +1

Same as R. You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by Python.

Code
# Is x 1 or 2 below?
x = 1 # +1

Same as R and Python. You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by Julia.

Code
# Is x 1 or 2 below?

x = 1 # +1
1

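1.3 Rules for naming objects

For a variable name to be valid in R, it should follow these rules

  • It should contain only letters, numbers, dots, and underscores.

  • It cannot start with a number (eg: 2iota) or an underscore (eg: _iota). It may start with a dot, as long as the dot is not followed by a number (eg: .iota is valid, .2iota is not).

Code
# 2iota <- 2  # invalid: starts with a number
# _iota <- 2  # invalid: starts with an underscore
# .iota <- 2  # valid, but hidden from ls() by default
  • It should not be a reserved word in R (eg: if, else, for, TRUE, NULL). Names of built-in functions such as mean and sum are not reserved, but reusing them for your own objects is confusing and best avoided.
Code
# mean <- 2  # runs without error, but reuses the name of a built-in function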

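For a variable name to be valid in Python, it should follow these rules

  • It should contain only letters, numbers, and underscores. Dots are not allowed.

  • It cannot start with a number (eg: 2iota). It may start with an underscore (eg: _iota), which by convention marks a name as internal.

Code

# 2iota = 2  # invalid: starts with a number

# .iota = 2  # invalid: names cannot contain a dot

# _iota = 2  # valid
  • It should not be a reserved keyword in Python (eg: if, for, class). Names of built-in functions such as sum are not reserved, but reassigning them hides the built-in.
Code

# mean = 2  # valid: mean is neither a keyword nor a built-in in Python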

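Similar to Python. A name may contain letters, numbers, and underscores (plus ! and many Unicode characters), cannot start with a number, cannot contain a dot, and cannot be a reserved keyword (eg: if, for, function).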
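
The Python naming rules can be checked programmatically. The sketch below is only an illustration: the helper is_valid_python_name and the list of candidate names are my own, while str.isidentifier() and keyword.iskeyword() come from the standard library.

```python
import keyword


def is_valid_python_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() checks the character rules (letters, digits,
    # underscores, no leading digit); keyword.iskeyword() rejects
    # reserved words such as `if` or `for`.
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical candidate names, chosen to mirror the examples above
candidates = ["iota", "_iota", "2iota", ".iota", "mean", "for"]
results = {name: is_valid_python_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Only 2iota, .iota, and for are rejected; _iota and mean are accepted as names.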

2 Packages and Functions

Base R already comes with many useful built-in functions to perform basic tasks, but as data scientists, we need more.

To perform certain tasks (such as fitting a machine learning model), we can definitely write our own code from scratch, but it takes lots of (unnecessary) effort. Fortunately, many packages have been written by others for us to directly use.

  • To download a package, hit Tools -> Install Packages in RStudio and type the package name in the pop-up window, or run install.packages("dplyr") in the console. Now, download the package dplyr.

  • To load the packages, we need to type library().

Code
library(dplyr)
  • Now that the package is loaded, you can use the functions in it. filter() is a function in the dplyr package that can be used to filter data.
Code
data(iris)  # load built in iris
iris %>%
  filter(Species == "setosa")

Python has a similar concept. Strictly speaking, what you install is a package and what you import is a module, but the two terms are often used interchangeably.

  • To install a module, you can use pip install in the terminal, or !pip install in Jupyter Notebook. You can also install a module in the Anaconda Navigator.
Code
# !pip install pandas 
  • To load a module, you can use import. Now that the module is loaded, you can use the functions in it.
Code
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv') # load iris

iris[iris['species'] == 'setosa']
    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
1            4.9          3.0           1.4          0.2  setosa
2            4.7          3.2           1.3          0.2  setosa
3            4.6          3.1           1.5          0.2  setosa
4            5.0          3.6           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
6            4.6          3.4           1.4          0.3  setosa
7            5.0          3.4           1.5          0.2  setosa
8            4.4          2.9           1.4          0.2  setosa
9            4.9          3.1           1.5          0.1  setosa
10           5.4          3.7           1.5          0.2  setosa
11           4.8          3.4           1.6          0.2  setosa
12           4.8          3.0           1.4          0.1  setosa
13           4.3          3.0           1.1          0.1  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
16           5.4          3.9           1.3          0.4  setosa
17           5.1          3.5           1.4          0.3  setosa
18           5.7          3.8           1.7          0.3  setosa
19           5.1          3.8           1.5          0.3  setosa
20           5.4          3.4           1.7          0.2  setosa
21           5.1          3.7           1.5          0.4  setosa
22           4.6          3.6           1.0          0.2  setosa
23           5.1          3.3           1.7          0.5  setosa
24           4.8          3.4           1.9          0.2  setosa
25           5.0          3.0           1.6          0.2  setosa
26           5.0          3.4           1.6          0.4  setosa
27           5.2          3.5           1.5          0.2  setosa
28           5.2          3.4           1.4          0.2  setosa
29           4.7          3.2           1.6          0.2  setosa
30           4.8          3.1           1.6          0.2  setosa
31           5.4          3.4           1.5          0.4  setosa
32           5.2          4.1           1.5          0.1  setosa
33           5.5          4.2           1.4          0.2  setosa
34           4.9          3.1           1.5          0.2  setosa
35           5.0          3.2           1.2          0.2  setosa
36           5.5          3.5           1.3          0.2  setosa
37           4.9          3.6           1.4          0.1  setosa
38           4.4          3.0           1.3          0.2  setosa
39           5.1          3.4           1.5          0.2  setosa
40           5.0          3.5           1.3          0.3  setosa
41           4.5          2.3           1.3          0.3  setosa
42           4.4          3.2           1.3          0.2  setosa
43           5.0          3.5           1.6          0.6  setosa
44           5.1          3.8           1.9          0.4  setosa
45           4.8          3.0           1.4          0.3  setosa
46           5.1          3.8           1.6          0.2  setosa
47           4.6          3.2           1.4          0.2  setosa
48           5.3          3.7           1.5          0.2  setosa
49           5.0          3.3           1.4          0.2  setosa

Julia has a similar concept of packages.

  • To install a package, you can use Pkg.add() in the Julia terminal.
Code

using Pkg

Pkg.add("DataFrames")
Pkg.add("CSV")
  • To load a package, you can use using. Now that the package is loaded, you can use the functions in it.
Code

using DataFrames, CSV

iris = CSV.File(download("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")) |> DataFrame;

# Filter the DataFrame where species is "setosa"
setosa_data = iris[iris.species .== "setosa", :];

# Display the first few rows of the filtered data
first(setosa_data, 5)
5×5 DataFrame
 Row │ sepal_length  sepal_width  petal_length  petal_width  species
     │ Float64       Float64      Float64       Float64      String15
─────┼────────────────────────────────────────────────────────────────
   1 │          5.1          3.5           1.4          0.2  setosa
   2 │          4.9          3.0           1.4          0.2  setosa
   3 │          4.7          3.2           1.3          0.2  setosa
   4 │          4.6          3.1           1.5          0.2  setosa
   5 │          5.0          3.6           1.4          0.2  setosa

3 Arithmetic, Logical, and Relational Operations

3.1 Arithmetic operations

Code
# arithmetic operations
x <- 3 
x + 1 # addition
[1] 4
Code
x - 1 # subtraction
[1] 2
Code
x * 2 # multiplication
[1] 6
Code
x / 2 # division
[1] 1.5
Code
x^2 # square
[1] 9
Code
x %% 2 # remainder
[1] 1
Code
x %/% 2 # integer division
[1] 1
Code
# math operations
log(x)  # natural logarithm
[1] 1.098612
Code
exp(x)  # exponential
[1] 20.08554
Code
sqrt(x) # square root
[1] 1.732051
Code
log10(x) # log base 10
[1] 0.4771213
Code
round(x/2) # round
[1] 2
Code
floor(x/2) # floor
[1] 1
Code
ceiling(x/2) # ceiling
[1] 2
Code
# arithmetic operations
x = 3
x + 1 # addition
4
Code
x - 1 # subtraction
2
Code
x * 2 # multiplication
6
Code
x / 2 # division
1.5
Code
x ** 2 # square
9
Code
x % 2 # remainder
1
Code
x // 2 # integer division
1
Code
# math operations
import math
math.log(x)  # natural logarithm
1.0986122886681098
Code
math.exp(x)  # exponential
20.085536923187668
Code
math.sqrt(x) # square root
1.7320508075688772
Code
math.log10(x) # log base 10
0.47712125471966244
Code
round(x/2) # round
2
Code
math.floor(x/2) # floor
1
Code
math.ceil(x/2) # ceiling
2
Code

# arithmetic operations

x = 3
3
Code

x + 1 # addition
4
Code

x - 1 # subtraction
2
Code

x * 2 # multiplication
6
Code

x / 2 # division
1.5
Code

x ^ 2 # square
9
Code

x % 2 # remainder
1
Code

div(x, 2) # integer division
1
Code

# math operations

log(x)  # natural logarithm
1.0986122886681098
Code

exp(x)  # exponential
20.085536923187668
Code

sqrt(x) # square root
1.7320508075688772
Code

log10(x) # log base 10
0.47712125471966244
Code

round(x/2) # round
2.0
Code

floor(x/2) # floor
1.0
Code

ceil(x/2) # ceiling
2.0
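
With a positive x the three languages agree, but two details are worth checking before porting code. To my understanding, R's %% and %/% and Python's % and // are floor-based, whereas Julia's % and div() truncate toward zero, so the results differ for negative numbers. All three round ties to the nearest even number by default. The sketch below shows both behaviours in Python, using math.fmod and math.trunc as stand-ins for the truncating variants.

```python
import math

# Floor-based remainder and integer division (Python's % and //)
floor_rem = -3 % 2    # remainder takes the sign of the divisor
floor_div = -3 // 2   # rounds toward negative infinity

# Truncation-based counterparts: the remainder takes the sign of the dividend
trunc_rem = math.fmod(-3, 2)
trunc_div = math.trunc(-3 / 2)

# Ties are rounded to the nearest even integer ("banker's rounding")
ties = [round(v) for v in (0.5, 1.5, 2.5)]

print(floor_rem, floor_div)  # 1 -2
print(trunc_rem, trunc_div)  # -1.0 -1
print(ties)                  # [0, 2, 2]
```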

3.2 Relational operations

Code
# relational (comparison) operations
x <- 3
x > 2 # larger than
[1] TRUE
Code
x < 2 # smaller than
[1] FALSE
Code
x == 2 # equal to
[1] FALSE
Code
x != 2 # not equal to
[1] TRUE
Code
# relational (comparison) operations
x = 3
x > 2 # larger than
True
Code
x < 2 # smaller than
False
Code
x == 2 # equal to
False
Code
x != 2 # not equal to
True
Code

# relational (comparison) operations

x = 3
3
Code

x > 2 # larger than
true
Code

x < 2 # smaller than
false
Code

x == 2 # equal to
false
Code

x != 2 # not equal to
true

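3.3 Logical operations

Caution
  • R: Boolean values are TRUE and FALSE (T and F are shorthand aliases).
  • Python: Boolean values are True and False (case-sensitive). The operators & and | are bitwise operators that also work on Booleans; the idiomatic logical operators are and, or, and not.
  • Julia: Boolean values are true and false (lowercase).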
Code
T & F # and
[1] FALSE
Code
T | F # or
[1] TRUE
Code
!T # not
[1] FALSE
Code
True & False # and
False
Code
True | False # or
True
Code
not True # not
False
Code

true & false # and
false
Code

true | false # or
true
Code

!true # not
false
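
One Python pitfall when combining comparisons: & and | bind more tightly than >, < and ==, so an expression written in the R style can silently mean something else. The sketch below illustrates this with an arbitrary value x = 1; the variable names are mine.

```python
x = 1

# & binds tighter than the comparisons, so this is parsed as
# x > (2 & x) < 5, i.e. the chained comparison 1 > 0 < 5.
surprising = x > 2 & x < 5

# Parentheses, or the keyword `and`, give the intended meaning.
with_parens = (x > 2) & (x < 5)
with_and = x > 2 and x < 5

print(surprising)   # True
print(with_parens)  # False
print(with_and)     # False
```

The same precedence issue applies to numpy arrays, where and cannot be used and the parentheses are therefore needed.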

4 Vectors

4.1 Creating vectors

  • In R, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.

  • A vector can be created using the function c() by listing all the values in the parentheses, separated by commas.

  • c() stands for “combine”.

Code
Income <- c(1, 3, 5, 10)
Income
[1]  1  3  5 10
  • Vectors must contain elements of the same data type. If they do not, R automatically converts the elements to a common type (usually character).
Code
Income <- c(1, 3, 5, "10")
Income
[1] "1"  "3"  "5"  "10"
  • In Python, a list is a collection of elements that may be of different data types. It is often used to store a variable of a dataset. For instance, a list can store the income of a group of people, the final grades of students, etc.

  • A list can be created using square brackets [] by listing all the values in the brackets, separated by commas.

Code
Income = [1, 3, 5, 10]
Income
[1, 3, 5, 10]
  • List can contain elements of different data types.
Code
Income = [1, 3, 5, "10"]
Income
[1, 3, 5, '10']
  • If you want a collection whose elements all share the same numeric data type, you can create an array with the numpy package.
Code
import numpy as np
Income = np.array([1, 3, 5, 10])
Income
array([ 1,  3,  5, 10])
  • In Julia, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.

  • A vector can be created using square brackets [] by listing all the values in the brackets, separated by commas.

Code

Income = [1, 3, 5, 10]
4-element Vector{Int64}:
  1
  3
  5
 10
  • A vector can contain elements of different data types. However, note that the element type then becomes Any rather than Int64.
Code

Income = [1, 3, 5, "10"]
4-element Vector{Any}:
 1
 3
 5
  "10"

4.2 Indexing and subsetting

Caution

R, Python, and Julia have different indexing rules.

  • In R and Julia, the index starts from 1.
  • In Python, the index starts from 0.
  • To extract an element from a vector, we put the index of the element in a square bracket [ ].
Code
Income <- c(1, 3, 5, 10)
Income[1] # extract the first element
[1] 1
  • If we want to extract multiple elements, we can use a vector of indices.
Code
Income[c(1,3)] # extract the first and third elements
[1] 1 5
  • To extract an element from a list, we put the index of the element in a square bracket [ ].
Code
Income = [1, 3, 5, 10]
Income[0] # extract the first element
1
  • If we want to extract multiple elements, we can use a slice.
Code
Income[0:3] # extract the first three elements (the end index 3 is excluded)
[1, 3, 5]
  • With a numpy array, we can pass a list of indices, much as in R (but still counting from 0).
Code
Income = np.array([1, 3, 5, 10])
Income[0] # extract the first element
1
Code
Income[[0,2]] # extract the first and third elements
array([1, 5])
  • To extract an element from a vector, we put the index of the element in a square bracket [ ].
Code

Income = [1, 3, 5, 10];

Income[1] # extract the first element
1
  • If we want to extract multiple elements, we can use a slice.
Code

Income[1:3] # extract the first three elements (both ends are included)
3-element Vector{Int64}:
 1
 3
 5
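
A base Python list does not accept a list of indices the way an R vector or a numpy array does. The sketch below shows a few standard-library alternatives for picking non-adjacent elements; the variable names are mine. Note also that a negative index counts from the end in Python, whereas in R a negative index drops that element.

```python
from operator import itemgetter

income = [1, 3, 5, 10]
positions = [0, 2]  # first and third elements, counting from 0

# A list comprehension picks arbitrary positions
picked = [income[i] for i in positions]

# operator.itemgetter does the same and returns a tuple
picked_alt = list(itemgetter(*positions)(income))

# Negative indices count from the end; a slice step skips elements
last = income[-1]
every_other = income[::2]

print(picked)       # [1, 5]
print(picked_alt)   # [1, 5]
print(last)         # 10
print(every_other)  # [1, 5]
```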

4.3 Creating numeric sequences with fixed steps

It is also possible to easily create sequences with patterns

  • use seq() to create sequence with fixed steps
Code
# use seq()
seq(from = 1, to = 2, by = 0.1)
 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
  • If the step is 1, there’s a convenient way using :
Code
1:5
[1] 1 2 3 4 5
  • In base Python, we can use range() to create sequence with fixed steps
Code
# from 1 to 6, with step 1
list(range(1, 6)) # range() returns a range object, we need to convert it to a list
[1, 2, 3, 4, 5]
  • With numpy, use np.arange() to create a sequence with a fixed step. Note that, unlike R's seq(), the end point (here 2) is excluded.
Code
np.arange(1, 2, 0.1)
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])
  • In Julia, we can use 1:5 to create a sequence with step 1. For other step sizes, the syntax is start:step:stop (eg: 1:0.1:2).
Code

1:5
1:5
  • However, the Julia object is not an integer vector but a UnitRange{Int64} object. Use collect(1:5) to turn it into a vector.
Code

typeof(1:5)
UnitRange{Int64}
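
Python's range() only accepts integers. If you want a fractional step that includes the end point, as seq(from = 1, to = 2, by = 0.1) does in R, one option without numpy is to count the steps with integers and scale them. This is only a sketch; the names start, stop, and step are mine, and the rounding to one decimal is specific to this step size.

```python
start, stop, step = 1.0, 2.0, 0.1

# Count the steps as an integer to avoid accumulating floating-point error
n_steps = round((stop - start) / step)

# Include the end point by iterating over n_steps + 1 values
seq = [round(start + i * step, 1) for i in range(n_steps + 1)]

print(seq)
```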

4.4 Combine multiple vectors into one: c()

  • Sometimes, we may want to combine multiple vectors into one. For instance, we may have collected income data from two different sources, and we want to combine them into one vector.

  • We can use c() to combine different vectors; this is very commonly used to concatenate vectors.

Code
Income1 <- 1:3 
Income2 <- c(10, 15) 
Code
c(Income1,Income2)
[1]  1  2  3 10 15
  • In Python, we can use the + operator to concatenate lists.
Code
Income1 = [1, 2, 3]
Income2 = [10, 15]
Code
Income1 + Income2
[1, 2, 3, 10, 15]
  • For numpy arrays, we can use np.concatenate() to concatenate arrays.
Code
Income1 = np.array([1, 2, 3])
Income2 = np.array([10, 15])
Code
np.concatenate((Income1, Income2))
array([ 1,  2,  3, 10, 15])
  • In Julia, we can use the vcat() function to concatenate vectors.
Code

Income1 = [1, 2, 3];

Income2 = [10, 15]; 

vcat(Income1, Income2)
5-element Vector{Int64}:
  1
  2
  3
 10
 15

4.5 Replicating elements

  • We can use the rep() function to replicate elements in a vector.
Code
rep(1:3, times = 2) # replicate 1:3 twice
[1] 1 2 3 1 2 3
Code
rep(1:3, each = 2) # replicate each element in 1:3 twice
[1] 1 1 2 2 3 3
  • We can use the * operator to replicate elements in a list.
Code
[1, 2, 3] * 2 # replicate 1:3 twice
[1, 2, 3, 1, 2, 3]
  • For numpy arrays, we can use np.tile() to replicate elements.
Code
np.tile([1, 2, 3], 2) # replicate 1:3 twice
array([1, 2, 3, 1, 2, 3])
Code
np.repeat([1, 2, 3], 2) # replicate each element in 1:3 twice 
array([1, 1, 2, 2, 3, 3])
  • We can use the repeat() function to replicate elements in a vector.
Code

repeat([1, 2, 3], 2) # replicate 1:3 twice
6-element Vector{Int64}:
 1
 2
 3
 1
 2
 3
Code

repeat([1, 2, 3], inner = 2) # replicate each element in 1:3 twice
6-element Vector{Int64}:
 1
 1
 2
 2
 3
 3
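
For base Python lists, the * operator covers the "times" case but there is no direct counterpart of rep(each = 2). The sketch below shows two standard-library ways to get it; the variable names are mine.

```python
from itertools import chain, repeat

values = [1, 2, 3]

# Repeat the whole list twice (like rep(1:3, times = 2))
whole = values * 2

# Repeat each element twice (like rep(1:3, each = 2))
each = [v for v in values for _ in range(2)]

# The same result with itertools
each_alt = list(chain.from_iterable(repeat(v, 2) for v in values))

print(whole)     # [1, 2, 3, 1, 2, 3]
print(each)      # [1, 1, 2, 2, 3, 3]
print(each_alt)  # [1, 1, 2, 2, 3, 3]
```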

4.6 Maximum and minimum

  • We can use the max() and min() functions to find the maximum and minimum values in a vector.
Code
Income <- c(1, 3, 5, 10)

max(Income) # maximum
[1] 10
Code
min(Income) # minimum
[1] 1
  • We can use the max() and min() functions to find the maximum and minimum values in a list.
Code
Income = [1, 3, 5, 10]

max(Income) # maximum
10
Code
min(Income) # minimum
1
  • For numpy arrays, we can use np.max() and np.min() to find the maximum and minimum values.
Code
Income = np.array([1, 3, 5, 10])

np.max(Income) # maximum
10
Code
np.min(Income) # minimum
1
  • We can use the maximum() and minimum() functions to find the maximum and minimum values in a vector.
Code

Income = [1, 3, 5, 10];

maximum(Income) # maximum
10
Code

minimum(Income) # minimum
1

4.7 Sum and mean

  • We can use the sum() and mean() functions to find the sum and mean values in a vector.
Code
Income <- c(1, 3, 5, 10)

sum(Income, na.rm = T) # sum and remove missing values
[1] 19
Code
mean(Income, na.rm = T) # mean and remove missing values
[1] 4.75
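  • We can use the built-in sum() function to find the sum of a list. Base Python has no built-in mean() function, so we use np.mean() from numpy (or statistics.mean() from the standard library).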
Code
Income = [1, 3, 5, 10]

sum(Income) # sum
19
Code
np.mean(Income) # mean
4.75
  • For numpy arrays, we can use np.sum() and np.mean() to find the sum and mean values.
Code
Income = np.array([1, 3, 5, 10])

np.sum(Income) # sum
19
Code
np.mean(Income) # mean
4.75
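  • We can use the sum() and mean() functions to find the sum and mean values in a vector. mean() comes from the Statistics standard library, which must be loaded first.
Code

using Statistics # provides mean()

Income = [1, 3, 5, 10];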

sum(Income) # sum
19
Code

mean(Income) # mean
4.75
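
If you prefer not to depend on numpy for a simple mean in Python, the standard library's statistics module is enough. A minimal sketch, with variable names of my own choosing:

```python
import statistics

income = [1, 3, 5, 10]

total = sum(income)                  # built-in sum
average = statistics.mean(income)    # standard-library mean
manual = sum(income) / len(income)   # the same value computed by hand

print(total)    # 19
print(average)  # 4.75
print(manual)   # 4.75
```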

4.8 Missing values

Caution
  • In R, missing values are represented by NA.

  • In Python, missing values are represented by np.nan.

  • In Julia, missing values are represented by missing.

  • In R, missing values are represented by NA.
Code
Income <- c(1, 3, 5, NA)

sum(Income, na.rm = T) # sum and remove missing values
[1] 9
Code
mean(Income, na.rm = T) # mean and remove missing values
[1] 3
  • In Python, missing values are represented by np.nan.
Code
Income = [1, 3, 5, np.nan]

np.nansum(Income) # sum and remove missing values
9.0
Code
np.nanmean(Income) # mean and remove missing values
3.0
  • In Julia, missing values are represented by missing. To take the sum or mean while ignoring missing values, wrap the vector in skipmissing().
Code

Income = [1, 3, 5, missing];

sum(skipmissing(Income)) # sum and remove missing values
9
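
np.nan is an ordinary floating-point NaN, so the same idea can be sketched in base Python with math.nan. The built-in sum() does not skip missing values; they have to be filtered out first. The variable names below are mine.

```python
import math

income = [1, 3, 5, math.nan]

# A NaN propagates through ordinary arithmetic
naive_total = sum(income)

# Drop the missing values first, then aggregate
observed = [v for v in income if not math.isnan(v)]
total = sum(observed)
mean = total / len(observed)

print(naive_total)  # nan
print(total)        # 9
print(mean)         # 3.0
```

Because NaN is not equal to itself, test for it with math.isnan() rather than ==.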

4.9 Element-wise arithmetic operations

Caution
  • R by default supports element-wise operations on vectors.
  • Python by default does not support element-wise operations on lists. You need to use numpy arrays (or a list comprehension) to do element-wise operations.
  • Julia requires the dot (broadcasting) syntax, such as .+ and .*, for element-wise operations on arrays.
  • If you operate on a vector with a single number, the operation will be applied to all elements in the vector
Code
Income <- c(1, 3, 5, 10)

Income + 2 # element-wise addition
[1]  3  5  7 12
Code
Income * 2 # element-wise multiplication
[1]  2  6 10 20
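  • However, base Python does not support element-wise operations on lists: + with a number raises an error, and * with an integer repeats the list instead of multiplying its elements.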
Code
Income = [1, 3, 5, 10]

Income + 2 # element-wise addition
TypeError: can only concatenate list (not "int") to list
Code
Income * 2 # element-wise multiplication
[1, 3, 5, 10, 1, 3, 5, 10]
  • For numpy arrays, the behavior is the same as R.
Code
Income = np.array([1, 3, 5, 10])

Income + 2 # element-wise addition
array([ 3,  5,  7, 12])
Code
Income * 2 # element-wise multiplication
array([ 2,  6, 10, 20])
  • In Julia, applying an operation between a vector and a single number to every element requires the dot (broadcasting) syntax: put a . before the operator, as in .+ and .*. Without the dot, Income + 2 raises a MethodError.
Code

Income = [1, 3, 5, 10];

Income .+ 2 # element-wise addition
4-element Vector{Int64}:
  3
  5
  7
 12
Code

Income .* 2 # element-wise multiplication
4-element Vector{Int64}:
  2
  6
 10
 20
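
If numpy is not available, a list comprehension gives the same element-wise result on a plain Python list. A minimal sketch, with variable names of my own choosing:

```python
income = [1, 3, 5, 10]

# Apply the operation to every element explicitly
plus_two = [v + 2 for v in income]
times_two = [v * 2 for v in income]

print(plus_two)   # [3, 5, 7, 12]
print(times_two)  # [2, 6, 10, 20]
```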

4.10 Vector multiplication

  • If two vectors have the same length, arithmetic operators work on them element by element, including element-wise addition and element-wise multiplication
Code
Income1 <- c(1, 3, 5, 10)

Income2 <- c(2, 4, 6, 8)

Income1 + Income2 # element-wise addition
[1]  3  7 11 18
Code
Income1 * Income2 # element-wise multiplication
[1]  2 12 30 80
  • For numpy arrays, we can use np.add() and np.multiply() to do element-wise addition and multiplication. The + and * operators give the same results.
Code
Income1 = np.array([1, 3, 5, 10])

Income2 = np.array([2, 4, 6, 8])

np.add(Income1, Income2) # element-wise addition
array([ 3,  7, 11, 18])
Code
np.multiply(Income1, Income2) # element-wise multiplication
array([ 2, 12, 30, 80])
  • In Julia, two vectors of the same length can be combined element by element with the dot syntax, including element-wise addition (.+) and element-wise multiplication (.*)
Code

Income1 = [1, 3, 5, 10];

Income2 = [2, 4, 6, 8];

Income1 .+ Income2 # element-wise addition
4-element Vector{Int64}:
  3
  7
 11
 18
Code

Income1 .* Income2 # element-wise multiplication
4-element Vector{Int64}:
  2
 12
 30
 80
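
For plain Python lists, zip() pairs up the elements of two lists, which is enough for element-wise arithmetic and for the dot product. This is only a sketch with variable names of my own. Be aware that zip() silently stops at the shorter list, whereas R recycles the shorter vector.

```python
income1 = [1, 3, 5, 10]
income2 = [2, 4, 6, 8]

# Pair up elements position by position
sums = [a + b for a, b in zip(income1, income2)]
products = [a * b for a, b in zip(income1, income2)]

# The dot product is the sum of the element-wise products
dot = sum(products)

print(sums)      # [3, 7, 11, 18]
print(products)  # [2, 12, 30, 80]
print(dot)       # 124
```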

4.11 Max and min of 2 vectors

  • We can use the pmax() and pmin() functions to find the element-wise maximum and minimum values of two vectors.
Code
Income1 <- c(1, 3, 5, 10)

Income2 <- c(2, 4, 6, 8)

pmax(Income1, Income2) # element-wise maximum
[1]  2  4  6 10
Code
pmin(Income1, Income2) # element-wise minimum
[1] 1 3 5 8
  • We can use the np.maximum() and np.minimum() functions to find the element-wise maximum and minimum values of two numpy arrays.
Code
Income1 = np.array([1, 3, 5, 10])

Income2 = np.array([2, 4, 6, 8])

np.maximum(Income1, Income2) # element-wise maximum
array([ 2,  4,  6, 10])
Code
np.minimum(Income1, Income2) # element-wise minimum
array([1, 3, 5, 8])
  • We can broadcast the max() and min() functions with the dot syntax, max.() and min.(), to find the element-wise maximum and minimum values of two vectors.
Code

Income1 = [1, 3, 5, 10];

Income2 = [2, 4, 6, 8];

max.(Income1, Income2) # element-wise maximum
4-element Vector{Int64}:
  2
  4
  6
 10
Code

min.(Income1, Income2) # element-wise minimum
4-element Vector{Int64}:
 1
 3
 5
 8
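
Without numpy, the built-in map() can apply max() or min() pairwise across two Python lists. A small sketch, with variable names of my own choosing:

```python
income1 = [1, 3, 5, 10]
income2 = [2, 4, 6, 8]

# map() feeds one element from each list to max()/min() at a time
elementwise_max = list(map(max, income1, income2))
elementwise_min = list(map(min, income1, income2))

# For comparison: the single largest value across both lists
overall_max = max(income1 + income2)

print(elementwise_max)  # [2, 4, 6, 10]
print(elementwise_min)  # [1, 3, 5, 8]
print(overall_max)      # 10
```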

5 Character and String

5.1 Creating strings

  • Characters are enclosed within a pair of quotation marks.

  • Single or double quotation marks can both work.

  • Even if a string contains numbers, it is treated as text, and R will not perform any mathematical operations on it.

Code
str1 <- "1 + 1 = 2"
  • Strings are enclosed within a pair of quotation marks.

  • Single or double quotation marks can both work.

Code
str1 = "1 + 1 = 2"
  • In Julia, single quotation marks (') are used for defining individual characters. Double quotation marks (") are used for defining strings.
Code

character1 = '1'
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
Code
str1 = "1 + 1 = 2"
"1 + 1 = 2"

5.2 Concatenating strings

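  • In R, we can use the paste() function to concatenate strings. paste() inserts a space between the pieces by default; use paste0() to concatenate without a separator.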
Code
str1 <- "1 + 1 = "
str2 <- "2"

paste(str1, str2)
[1] "1 + 1 =  2"
  • In Python, we can use the + operator to concatenate strings.
Code
str1 = "1 + 1 = "
str2 = "2"

str1 + str2
'1 + 1 = 2'
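
Besides +, Python offers str.join() and f-strings. The sketch below mirrors the difference between joining with and without a separator; the variable names are mine.

```python
str1 = "1 + 1 = "
str2 = "2"

# Join without a separator (same result as str1 + str2)
no_sep = "".join([str1, str2])

# Join with a space separator, similar to R's paste()
with_space = " ".join([str1, str2])

# An f-string can embed a computed value directly
answer = 1 + 1
formatted = f"1 + 1 = {answer}"

print(repr(no_sep))      # '1 + 1 = 2'
print(repr(with_space))  # '1 + 1 =  2'
print(repr(formatted))   # '1 + 1 = 2'
```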
  • In Julia, we can use the * operator to concatenate strings.
Code

str1 = "1 + 1 = "
"1 + 1 = "
Code

str2 = "2"
"2"
Code

str1 * str2
"1 + 1 = 2"

5.3 Checking the number of elements in a vector: length()

  • You can measure the length of a vector using the command length()
Code
x <- c('R',' is', ' the', ' best', ' language')
length(x)
[1] 5
  • You can measure the length of a list using the command len()
Code
x = ['R',' is', ' the', ' best', ' language']

len(x)
5
  • For numpy arrays, you can use the shape attribute to get the shape of the array.
Code
x = np.array(['Python',' is', ' the', ' best', ' language'])

x.shape
(5,)
  • You can measure the length of a vector using the command length()
Code

x = ["Julia", " is", " the", " best", " language"]
5-element Vector{String}:
 "Julia"
 " is"
 " the"
 " best"
 " language"
Code

length(x)
5

5.4 Special relational operation: %in%

  • A special relational operation is %in% in R, which tests whether an element exists in the object.
Code
x <- c(1,3,8,7) 

3 %in% x
[1] TRUE
Code
2 %in% x
[1] FALSE
  • In Python, we can use the in operator to test whether an element exists in the object.
Code
x = [1, 3, 8, 7]

3 in x
True
Code
2 in x
False
  • In Julia, we can use the in operator to test whether an element exists in the object.
Code

x = [1, 3, 8, 7];

3 in x
true
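
R's %in% is vectorised: c(3, 2) %in% x returns one result per element on the left-hand side. Python's in tests one value at a time, so a comprehension is needed to check several values. A small sketch, where queries and lookup are my own names:

```python
x = [1, 3, 8, 7]
queries = [3, 2]

# One membership test per query value
found = [q in x for q in queries]

# For many lookups, converting to a set first makes each test faster
lookup = set(x)
found_fast = [q in lookup for q in queries]

print(found)       # [True, False]
print(found_fast)  # [True, False]
```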

6 Matrices

6.1 Matrices: creating matrices

Caution

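When creating an R matrix using matrix(), the elements are filled column by column. This is called column-major order.

When creating a Python matrix using np.array() from a list of lists, each inner list becomes one row, so the elements are laid out row by row. This is called row-major order.

Julia arrays, like R's, are stored in column-major order, although the [1 2 3; 4 5 6] literal syntax is written row by row.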

  • A matrix can be created using the command matrix()
    • the first argument is the vector to be converted into matrix
    • the second argument is the number of rows
    • the last argument is the number of columns
Code
matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
  • A matrix can be created with the np.array() function from the numpy package. The argument is a list of lists, where each inner list is one row of the matrix
Code
import numpy as np

np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
  • A matrix can be created in base Julia using square brackets [], with spaces separating the elements within a row and semicolons ; separating rows.
Code

[1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9
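
The difference between filling by row and filling by column can be seen with plain nested lists. The sketch below arranges the numbers 1 to 9 both ways; the names by_row and by_col are mine, and the nested lists only stand in for a real matrix type.

```python
values = list(range(1, 10))  # 1, 2, ..., 9
nrow, ncol = 3, 3

# Row-major fill: consecutive values go along a row
by_row = [values[r * ncol:(r + 1) * ncol] for r in range(nrow)]

# Column-major fill: consecutive values go down a column
by_col = [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]

print(by_row)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(by_col)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The second result matches what matrix(1:9, nrow = 3, ncol = 3) shows in R.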

6.2 Creating matrices: combine matrices

We can use cbind() and rbind() to concatenate vectors and matrices into new matrices.

  • cbind() does the column binding
Code
a <- matrix(1:6, nrow = 2, ncol = 3)

a
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Code
cbind(a, a) # column bind
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    1    3    5
[2,]    2    4    6    2    4    6
  • rbind() does the row binding
Code
rbind(a, a) # row bind
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    1    3    5
[4,]    2    4    6
  • We can use np.concatenate() to concatenate arrays.
Code
a = np.array([[1, 2, 3], [4, 5, 6]])

a
array([[1, 2, 3],
       [4, 5, 6]])
Code
np.concatenate((a, a), axis = 1) # column bind
array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
Code
np.concatenate((a, a), axis = 0) # row bind
array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
  • We can use the hcat() and vcat() functions to concatenate matrices.
Code

a = [1 2 3; 4 5 6]
2×3 Matrix{Int64}:
 1  2  3
 4  5  6
Code

hcat(a, a) # column bind
2×6 Matrix{Int64}:
 1  2  3  1  2  3
 4  5  6  4  5  6
Code

vcat(a, a) # row bind
4×3 Matrix{Int64}:
 1  2  3
 4  5  6
 1  2  3
 4  5  6

6.3 Matrices: indexing and subsetting

Matrices have two dimensions: rows and columns. Therefore, to extract elements from a matrix, we just need to specify which row(s) and which column(s) we want.

Code
x <- matrix(1:9, nrow = 3, ncol = 3)
x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
  • Extract the element in the 2nd row, 3rd column.
    • use square brackets with a comma inside [ , ] to indicate subsetting; the argument before the comma is the row index, and the argument after the comma is the column index.
      • 2 is specified for the row index, so we will extract elements from the second row
      • 3 is specified for the column index, so we will extract elements from the third column
      • Altogether, we extract a single element in row 2, column 3.
Code
x[2,3] # the element in the 2nd row, 3rd column
[1] 8
  • If we leave blank for a dimension, we extract all elements along that dimension.
    • if we want to take out the entire first row
      • 1 is specified for the row index
      • column index is blank
Code
x[1,] # all elements in the first row
[1] 1 4 7
Code
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
  • Extract the element in the 2nd row, 3rd column.
Code
x[1,2] # the element in the 2nd row, 3rd column
6
  • If we leave blank for a dimension, we extract all elements along that dimension.
Code
x[0,:] # all elements in the first row
array([1, 2, 3])
Code

x = [1 2 3; 4 5 6; 7 8 9];
  • Extract the element in the 2nd row, 3rd column.
Code

x[2,3] # the element in the 2nd row, 3rd column
6
  • Unlike in R, a dimension cannot be left blank; we need to use : to extract all elements along that dimension.
Code

x[1,:] # all elements in the first row
3-element Vector{Int64}:
 1
 2
 3

6.4 Matrices: check dimensions and variable types

  • You can verify the size of the matrix using the command dim(); or nrow() and ncol()
Code
x <- matrix(1:9, nrow = 3, ncol = 3)

dim(x)
[1] 3 3
Code
nrow(x)
[1] 3
Code
ncol(x)
[1] 3
  • You can get the data type info using the command str()
Code
str(x)
 int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
  • You can verify the size of the matrix using the shape attribute
Code
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x.shape
(3, 3)
  • You can get the data type info using the dtype attribute
Code
x.dtype
dtype('int64')
  • You can verify the size of the matrix using the size() function
Code

x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9
Code

size(x)
(3, 3)

6.5 Matrices: special operations

6.5.1 Creating a diagonal matrix

  • We can use the diag() function to create a diagonal matrix.
Code
diag(1:3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
  • We can use the np.diag() function to create a diagonal matrix.
Code
np.diag([1, 2, 3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
  • We can use the diagm() function to create a diagonal matrix.
Code
using LinearAlgebra
diagm(0 => [1, 2, 3])
3×3 Matrix{Int64}:
 1  0  0
 0  2  0
 0  0  3

6.5.2 Creating an identity matrix

  • We can use the diag() function to create an identity matrix.
Code
diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
  • We can use the np.eye() function to create an identity matrix.
Code
np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
  • We can use I() from the LinearAlgebra standard library to create an identity matrix.
Code

I(3)
3×3 Diagonal{Bool, Vector{Bool}}:
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1

6.6 Matrices’ operations: matrix addition and multiplication

  • If the two matrices are of the same dimensions, they can do element-wise operations, including element-wise addition and element-wise multiplication
Code
set.seed(123)

x = matrix(rnorm(9), nrow = 3, ncol = 3)

z = matrix(rnorm(9), nrow = 3, ncol = 3)

x + z   # elementwise addition
           [,1]      [,2]       [,3]
[1,] -1.0061376 0.4712798  2.2478293
[2,]  0.9939043 0.2399705 -0.7672108
[3,]  1.9185221 1.1592239 -2.6534700
Code
x * x # elementwise multiplication (x with itself)
           [,1]        [,2]      [,3]
[1,] 0.31413295 0.004971433 0.2124437
[2,] 0.05298168 0.016715318 1.6003799
[3,] 2.42957161 2.941447909 0.4717668
  • If we want to perform the matrix multiplication as in linear algebra, we need to use %*%
    • x and y must have conforming dimensions
Code
x
           [,1]       [,2]       [,3]
[1,] -0.5604756 0.07050839  0.4609162
[2,] -0.2301775 0.12928774 -1.2650612
[3,]  1.5587083 1.71506499 -0.6868529
Code
y = matrix(rnorm(9), nrow = 3, ncol = 3)
x %*% y # matrix multiplication
           [,1]       [,2]       [,3]
[1,] -0.9186059 -0.2861301  0.6175429
[2,]  1.1282999  0.8396152 -1.1340507
[3,]  1.0157790 -1.5987826 -4.4424790
  • If two matrices have the same dimensions, we can apply element-wise operations to them, including element-wise addition and element-wise multiplication
Code
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x + y # elementwise addition
array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])
Code
x * y # elementwise multiplication
array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])
  • If we want to perform the matrix multiplication as in linear algebra, we need to use @
    • x and y must have conforming dimensions
Code
x @ y # matrix multiplication
array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])
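The difference between the two kinds of multiplication is easier to see with non-square matrices. The following is a minimal sketch under assumed inputs (the arrays a and b and the flag elementwise_ok are made up for illustration): @ only needs the inner dimensions to match, whereas * needs the same (or broadcastable) shapes.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
b = np.array([[1, 0], [0, 1], [2, 2]])    # shape (3, 2)

product = a @ b    # (2, 3) @ (3, 2) gives a (2, 2) matrix
print(product)

try:
    a * b          # shapes (2, 3) and (3, 2) cannot be combined elementwise
    elementwise_ok = True
except ValueError as err:
    elementwise_ok = False
    print("elementwise multiplication failed:", err)
```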
  • If two matrices have the same dimensions, we can apply element-wise operations to them, including element-wise addition and element-wise multiplication. It’s recommended to prefix the operator with a dot (for example .+ or .*) to indicate an element-wise operation
Code

x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9
Code

y = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9
Code

x .+ y # elementwise addition
3×3 Matrix{Int64}:
  2   4   6
  8  10  12
 14  16  18

6.7 Matrices’ operations: inverse and transpose

  • We use t() to transpose a matrix
Code
x = matrix(rnorm(9), nrow = 3, ncol = 3)
x
           [,1]       [,2]      [,3]
[1,]  0.1533731  0.4264642 0.8781335
[2,] -1.1381369 -0.2950715 0.8215811
[3,]  1.2538149  0.8951257 0.6886403
Code
t(x) # transpose
          [,1]       [,2]      [,3]
[1,] 0.1533731 -1.1381369 1.2538149
[2,] 0.4264642 -0.2950715 0.8951257
[3,] 0.8781335  0.8215811 0.6886403
  • We use solve() to get the inverse of a matrix
Code
x
           [,1]       [,2]      [,3]
[1,]  0.1533731  0.4264642 0.8781335
[2,] -1.1381369 -0.2950715 0.8215811
[3,]  1.2538149  0.8951257 0.6886403
Code
solve(t(x)%*%x) # inverse; must be on a square matrix
          [,1]      [,2]      [,3]
[1,]  417.2893 -803.5341  299.4938
[2,] -803.5341 1548.5735 -577.2074
[3,]  299.4938 -577.2074  215.6665
  • We use the .T attribute to transpose a matrix
Code
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
Code
x.T # transpose
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
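  • We use np.linalg.inv() to get the inverse of a matrix. Note that the example matrix x is singular (its rows are linearly dependent), so the huge values in the output below are floating-point artifacts rather than a true inverse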
Code
np.linalg.inv(x.T @ x) # inverse; must be on a square matrix
array([[ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14],
       [-1.12589991e+15,  2.25179981e+15, -1.12589991e+15],
       [ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14]])
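Before inverting a matrix, it can help to check that it is actually invertible. The sketch below is an illustration rather than part of the original example: it uses np.linalg.matrix_rank() to compare the rank with the number of rows, and then inverts a made-up 2×2 matrix a that is known to be invertible.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rank = np.linalg.matrix_rank(x)   # a rank below 3 means x has no inverse
print(rank)                       # 2

a = np.array([[2.0, 1.0], [1.0, 3.0]])   # hypothetical invertible matrix
a_inv = np.linalg.inv(a)

# A valid inverse multiplied by the original gives the identity matrix
print(np.allclose(a @ a_inv, np.eye(2)))   # True
```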
  • We use transpose() to transpose a matrix
Code

x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9
Code

transpose(x) # transpose
3×3 transpose(::Matrix{Int64}) with eltype Int64:
 1  4  7
 2  5  8
 3  6  9
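  • We use inv() to get the inverse of a matrix. Note that the example matrix x is singular (its rows are linearly dependent), so the huge values in the output below are floating-point artifacts rather than a true inverse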
Code

inv(transpose(x) * x) # inverse; must be on a square matrix
3×3 Matrix{Float64}:
  5.6295e14  -1.1259e15   5.6295e14
 -1.1259e15   2.2518e15  -1.1259e15
  5.6295e14  -1.1259e15   5.6295e14

7 Programming Basics: Flow Control

Indentation Difference
  • In R, the code block is enclosed by curly braces {}. Indentation is not necessary and does not affect the code execution.

  • In Python, the code block is defined by indentation. Indentation is necessary and affects the code execution.

  • In Julia, a code block starts with a keyword such as if or for and is closed by end. Indentation does not affect the code execution.

7.1 if/else

Sometimes, you want to run your code based on different conditions. For instance, if the observation is a missing value, then use the population average to impute the missing value. This is where if/else kicks in.

if (condition_1) {
  # action 1
} else if (condition_2) {
  # action 2
} else {
  # action 3
}

Example 1:

Code
a <- 15

if (a > 10) {
  larger_than_10 <- TRUE
} else {
  larger_than_10 <- FALSE
}

larger_than_10  
[1] TRUE

Example 2:

Code
x <- -5
if (x >= 0) {
  print("x is a non-negative number")
} else {
  print("x is a negative number")
}
[1] "x is a negative number"
Code
a = 15

if a > 10:
    larger_than_10 = True
else:
    larger_than_10 = False

larger_than_10
True

Example 2:

Code
x = -5

if x >= 0:
    print("x is a non-negative number")
else:
    print("x is a negative number")
x is a negative number
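The imputation scenario mentioned at the start of this section can be sketched with the same if/else pattern. This is a minimal illustration under assumed data: the function impute() and the list observations are hypothetical names, and missing values are represented by NaN.

```python
import math

def impute(value, population_mean):
    """Return population_mean when value is missing (NaN), otherwise value."""
    if math.isnan(value):
        return population_mean
    else:
        return value

observations = [2.0, float("nan"), 4.0]   # made-up data with one missing value

# Compute the average over the observed (non-missing) values only
observed = [v for v in observations if not math.isnan(v)]
population_mean = sum(observed) / len(observed)

imputed = [impute(v, population_mean) for v in observations]
print(imputed)   # [2.0, 3.0, 4.0]
```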
Code

a = 15
15
Code

if a > 10
    larger_than_10 = true
else
    larger_than_10 = false
end
true
Code

larger_than_10
true

Example 2:

Code

x = -5
-5
Code

if x >= 0
    println("x is a non-negative number")
else
    println("x is a negative number")
end
x is a negative number

7.2 Loops

Caution

Loops in both R and Python are comparatively slow. Therefore, code should be written in matrix form to utilize vectorization as much as possible.

In contrast, Julia is very efficient at loops. Thus code readability should be prioritized over vectorization.

As the name suggests, in a loop the program repeats a set of instructions many times, until the stopping criterion is met.

Loops are very useful for repetitive jobs.

Code
for (i in 1:10){ # i is the iterator
  # loop body: gets executed each time
  # the value of i changes with each iteration
}

Example:

Code
for (i in 1:5){
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Code
for i in range(1, 6):
    print(i)
1
2
3
4
5
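To illustrate the caution about loops, the sketch below computes a sum of squares twice: once with an explicit loop and once with a vectorized NumPy expression. It is only an illustration with made-up names (values, total_loop, total_vectorized). Both versions give the same answer, and the vectorized form is the one usually preferred for large arrays.

```python
import numpy as np

values = np.arange(1, 6)   # array([1, 2, 3, 4, 5])

# Loop version: add the squares one element at a time
total_loop = 0
for v in values:
    total_loop += v ** 2

# Vectorized version: square the whole array, then sum it
total_vectorized = int(np.sum(values ** 2))

print(total_loop, total_vectorized)   # 55 55
```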
Code

for i in 1:5
    println(i)
end
1
2
3
4
5

7.3 User-Defined Functions

A function takes arguments as input, runs some specified actions, and then returns the result to us.

Functions are very useful. When we would like to test different ideas, we can combine functions with loops: We can write a function which takes different parameters as input, and we can use a loop to go through all the possible combinations of parameters.

Here is how to define a function in general:

Code
function_name <- function(arg1, arg2 = default_value){
  # write the actions to be done with arg1 and arg2
  # you can have any number of arguments, with or without defaults
  return(result) # the last line returns a value; result is a placeholder
}

Example:

Code
magic <- function(x, y){
  return(x^2 + y)
}

magic(1,3)
[1] 4

Here is how to define a function in general:

Code
def function_name(arg1, arg2=default_value):
    # default_value is a placeholder; replace it with an actual value before running
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return # the last line is to return some value

Example:

Code
def magic(x, y):
    return x**2 + y

magic(1, 3)
4
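The idea of combining a function with a loop over parameter combinations can be sketched as follows. This is only an illustration: the parameter lists x_values and y_values are made up, and itertools.product() from the standard library generates every (x, y) pair.

```python
from itertools import product

def magic(x, y):
    return x**2 + y

x_values = [1, 2]   # hypothetical candidate values for x
y_values = [0, 3]   # hypothetical candidate values for y

# Evaluate magic() on every combination of the two parameters
results = {}
for x, y in product(x_values, y_values):
    results[(x, y)] = magic(x, y)

print(results)   # {(1, 0): 1, (1, 3): 4, (2, 0): 4, (2, 3): 7}
```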

Here is how to define a function in general:

Code

function function_name(arg1, arg2 = default_value)
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return # the last line is to return some value
end
function_name (generic function with 2 methods)

Example:

Code

function magic(x, y)
    return x^2 + y
end
magic (generic function with 1 method)
Code

magic(1, 3)
4

8 A comprehensive exercise

Task: write a function that takes a vector as input and returns the maximum value of the vector

Code
get_max <- function(input){
  max_value <- input[1]
  for (i in 2:length(input) ) {
    if (input[i] > max_value) {
      max_value <- input[i] # update the running maximum
    }
  }
  
  return(max_value)
}

get_max(c(-1,3,2))
[1] 3
Code
def get_max(input):
    max_value = input[0]
    for i in range(1, len(input)):
        if input[i] > max_value:
            max_value = input[i]
    return max_value

get_max([-1, 3, 2])
3
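One possible refinement, shown here only as a sketch, is to guard against empty input and to compare the result with Python's built-in max(). The function name get_max_safe and the error message are assumptions made for this illustration.

```python
def get_max_safe(values):
    """Like get_max, but raises a clear error for empty input."""
    if len(values) == 0:
        raise ValueError("values must contain at least one element")
    max_value = values[0]
    for value in values[1:]:
        if value > max_value:
            max_value = value
    return max_value

print(get_max_safe([-1, 3, 2]))                      # 3
print(get_max_safe([-1, 3, 2]) == max([-1, 3, 2]))   # True
```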
Code

function get_max(input)
    max_value = input[1]
    for i in 2:length(input)
        if input[i] > max_value
            max_value = input[i]
        end
    end
    return max_value
end
get_max (generic function with 1 method)
Code

get_max([-1, 3, 2])
3

9 Conclusion about R and Python

Below are the easiest mistakes to make when you are switching between R and Python:

  • In R, the index starts from 1; in Python, the index starts from 0.

  • In R, missing values are represented by NA; in Python, missing values are represented by np.nan.

  • In R, the code block is enclosed by curly braces {}; in Python, the code block is defined by indentation.

  • In R, the : operator is used to create a sequence with a step of 1, and both end points are included; in Python, the range() function is used to create a sequence with a step of 1, and the end point is excluded (range(1, 6) gives 1 to 5).

  • In R, the c() function is used to combine vectors; in Python, the + operator is used to combine lists.

  • In R, the rep() function is used to replicate elements in a vector; in Python, the * operator is used to replicate elements in a list.

  • In R, the %in% operator is used to test whether an element exists in the object; in Python, the in operator is used to test whether an element exists in the object.

  • In R, the %*% operator is used to perform matrix multiplication; in Python, the @ operator is used to perform matrix multiplication.
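The Python side of the points above can be checked with a short script. This is a minimal sketch with made-up variable names (values, m), intended only as a quick reminder.

```python
import numpy as np

values = [10, 20, 30]
print(values[0])            # 10: the first element has index 0
print(list(range(1, 4)))    # [1, 2, 3]: the end point 4 is excluded
print([1, 2] + [3])         # [1, 2, 3]: + concatenates lists
print([0] * 3)              # [0, 0, 0]: * replicates a list
print(20 in values)         # True: membership test with in
print(np.isnan(np.nan))     # True: np.nan marks a missing value

m = np.array([[1, 2], [3, 4]])
print(m @ m)                # matrix multiplication with @
```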