Speed Up Matrix Operations

Author
Affiliation

Wei Miao

UCL School of Management

Published

August 27, 2022

If your codes include lots of vector/matrix operations, especially those computation intensive tasks such as matrix inverse,1 you will probably feel that the default R is slow. Your intuition is likely correct! The reason is that the default R distributions from CRAN make use of the default BLAS/LAPACK implementation for linear algebra operations. The purpose of using such reference BLAS libraries by the development team is good: These implementations are built to be stable and cross platform compatible. However, the price that comes with stability is the sacrifice of speed. Is there a way to tweak your R settings to significantly boost the running speed? The answer is yes, and the steps are actually quite simple. We will review the steps to install highly optimized libraries and benchmark their performance.

1 What is BLAS?

BLAS, which is short for Basic Linear Algebra Subprograms, is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. In short, it’s a library that can do many basic matrix operations. Because the BLAS is efficient, portable, and widely available, they are commonly used in the development of high quality linear algebra software.

LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers.

These default libraries are aimed for high stability and compatibility. And the cost is the speed, because these libraries are not optimized for your specific computers. Therefore, to overcome this bottleneck, we can switch the default linear algebra libraries to more efficient ones depending on your hardware.

2 For Apple Silicon CPU Macs

vecLib is a part of Apple’s Accelerate framework2, which provides an optimized BLAS implementation for Mac hardware.

Although in recent MacOS updates, vecLib is no long provided with the MacOS, the R binaries distributed by CRAN do provide vecLib BLAS, just that vecLib is not used by default. Next, I will show you how to switch to vecLib.

2.1 How to Switch to VecLib BLAS

On MacOS, the R’s framework path is /Library/Frameworks/R.framework/Resources/lib. To go to the folder, open terminal on your Mac, and enter the following command:

Code
cd /Library/Frameworks/R.framework/Resources/lib
ls -l
total 16336
-rwxrwxr-x  1 root     admin  3988192 24 Jun 11:57 libR.dylib
drwxrwxr-x  3 root     admin       96 23 Apr  2022 libR.dylib.dSYM
-rwxrwxr-x  1 root     admin   193440 24 Jun 11:57 libRblas.0.dylib
drwxrwxr-x  3 root     admin       96 23 Apr  2022 libRblas.0.dylib.dSYM
lrwxr-xr-x  1 weimiao  admin       48 28 Aug 21:10 libRblas.dylib -> /opt/homebrew/opt/openblas/lib/libopenblas.dylib
drwxrwxr-x  3 root     admin       96 23 Apr  2022 libRblas.dylib.dSYM
-rwxrwxr-x  1 root     admin   154464 24 Jun 11:57 libRblas.vecLib.dylib
drwxrwxr-x  3 root     admin       96 23 Apr  2022 libRblas.vecLib.dylib.dSYM
-rwxrwxr-x  1 root     admin  1624976 24 Jun 11:57 libRlapack.0.dylib
lrwxr-xr-x  1 weimiao  admin       46 28 Aug 21:11 libRlapack.dylib -> /opt/homebrew/opt/openblas/lib/liblapack.dylib
drwxrwxr-x  3 root     admin       96 23 Apr  2022 libRlapack.dylib.dSYM
-rw-rw-r--  1 root     admin   157792 24 Jun 11:57 libgcc_s.1.1.dylib
-rwxrwxr-x  1 root     admin  1865904 24 Jun 11:57 libgfortran.5.dylib
-rwxrwxr-x  1 root     admin   367040 24 Jun 11:57 libquadmath.0.dylib

which will change (c) the directory (d) to R’s framework folder. In this folder, you will see a few files by hitting ls:

  • libRblas.0.dylib is the default BLAS library.
  • libRblas.vecLib.dylib is the more efficient vecLib BLAS, which we are going to switch to.
  • libRblas.dylib is of alias file type. This means it’s kind of like a shortcut and points to a another file we set. By default, libRblas.dylib is pointed to libRblas.0.dylib, so R uses the default BLAS library. All we need to do is to relink the libRblas.0.dylib to libRblas.vecLib.dylib, such that R will use the vecLib.

To do so, type the following command in terminal:

Code
ln -sf libRblas.vecLib.dylib libRblas.dylib

If vecLib has issues on your computer3 and you would like to switch back to the default BLAS/LAPACK, simply link the libRblas.dylib back to libRblas.0.dylib, by entering the following command in terminal:

Code
ln -sf libRblas.0.dylib libRblas.dylib

2.2 Performance Comparison

We will compare the performance before and after switching to vecLib using the following R codes. The code generates a 1000 by 1000 matrix, and obtains the inverse of that matrix. I use microbenchmark package to run the inverse operation for 100 times, and capture the performance metrics.

Code
set.seed(888)
nd = 1000
a <- matrix(rnorm(nd^2), nd, nd)
library(microbenchmark)
mb <- microbenchmark(
  solve(a),
  times = 100,
  unit = "ms"
)

When I use the default BLAS/LAPACK, the benchmark result is as follows:

Unit: milliseconds
     expr     min      lq     mean   median       uq      max neval
 solve(a) 383.953 388.517 400.0499 392.0261 402.7341 629.8773   100

After we switch to vecLib, the benchmarks are as follows. As can be seen, the speed has increased dramatically!

Unit: milliseconds
     expr     min       lq     mean   median       uq      max neval
 solve(a) 27.6078 30.08605 31.93745 30.75814 31.26055 53.32608   100

Footnotes

  1. For instance, in my research projects that involve structural modelling, I often need to write my own codes in matrix form to compute the value functions, policy functions, equlibrium computation, etc.↩︎

  2. Apple’s Accelerate framework provides high-performance, energy-efficient computations on the CPU by leveraging its vector-processing capability. For more details, refer to Apple’s documentation here.↩︎

  3. As warned by CRAN, “Although fast, it is not under our control and may possibly deliver inaccurate results.”↩︎