UQ RCC HPC Documentation

High Performance Computing guides and policies for the University of Queensland.


Using R on Bunya - Making the move from Laptop to HPC

This document was the basis of a presentation to The University of Queensland R Users Group meetup on April 24, 2024 by Dr David Green (UQ RCC).

Abstract

In this session, we will trace the steps taken by the many UQ researchers using R who have wanted, or needed, stronger (or more numerous) computers on which to run their codes.

There are tremendous benefits to porting your R computations to the high-performance computing (HPC) environment.

But there are also some obstacles to overcome, especially around ease of use, and there are a few gotchas.

Some recent innovations on the Bunya HPC system are addressing some of those obstacles.

Outline

Your R code

Imagine the scenario where your R code

Laptop limitations

Enter Bunya, for comparison

                       This Laptop          Bunya Phase 1          Bunya Phase 2          Bunya Phase 3 (soon)
CPU                    Intel Core i7-8650U  AMD EPYC 7643 (Milan)  AMD EPYC 9454 (Genoa)
Speed                  2.11 GHz             2.30 GHz               2.75 GHz
Sockets                1                    2                      2
Cores                  4                    96                     96
Logical Processors     8                    192                    192
RAM                    16 GB                2.0 TB                 1.5 TB
Number of Devices      1 (usually)          62                     33
Total number of cores  4                    5952                   3168

And that’s not all of Bunya!!

HPC 101

Structure of Bunya HPC

Schematic of an HPC Cluster

The vast majority of work on HPCs is done via a batch system

#There are this many jobs running
[davidg@bunya1 ~]$ squeue | grep " R " | wc -l
801

#and this many are queued
[davidg@bunya1 ~]$ squeue | grep " PD " | wc -l
470
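The two pipelines above can be folded into a single pass over the queue. This is a minimal sketch, not an official tool: `count_by_state` is a hypothetical helper, the sample input is made up, and it assumes the default squeue column layout (state in the fifth column) and job names without spaces.

```shell
# count_by_state: hypothetical helper, not part of Slurm.
# Reads squeue-style text on stdin and prints "<state> <count>" per state.
# ASSUMPTION: default squeue layout (ST is column 5), no spaces in job names.
count_by_state() {
  awk 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Made-up sample input, for illustration only.
# On Bunya you would run: squeue | count_by_state
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
101 general sim1 alice R 1:00 1 bun010
102 general sim2 alice PD 0:00 1 (Priority)
103 general sim3 bob R 5:12 1 bun011'

printf '%s\n' "$sample" | count_by_state
```

Slurm can also do the filtering itself: `squeue -h -t RUNNING | wc -l` drops the header line (`-h`) and keeps only running jobs (`-t`), which avoids matching " R " elsewhere in a line.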

We often want to know how busy Bunya is. The sinfo command tells us the state of all the nodes in a partition.

[davidg@bunya1 ~]$ sinfo -p general
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
general      up 14-00:00:0      1  down$ bun009
general      up 14-00:00:0      1  drain bun023
general      up 14-00:00:0     39    mix bun[008,010-011,026,034,045-050,060,062,083-086,088,090,092-095,097,099-105,108-115]
general      up 14-00:00:0     54  alloc bun[006-007,012-022,024-025,027-033,035-044,051-059,061,063-067,087,089,091,096,098,106-107]


[davidg@bunya1 ~]$ sinfo -p gpu_cuda
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu_cuda     up 7-00:00:00      1  down$ bun003
gpu_cuda     up 7-00:00:00     16    mix bun[004-005,068,071-082,116]
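To turn a listing like this into a quick "how busy is it?" summary, the NODES column can be summed per state. The sketch below reuses the `sinfo -p general` output shown above (with the node lists shortened); `node_totals` is a hypothetical helper and it assumes the column order shown in that output.

```shell
# node_totals: hypothetical helper, not part of Slurm.
# Sums the NODES column (4) per STATE (5) from sinfo output on stdin,
# skipping the header line, and adds a grand total.
node_totals() {
  awk 'NR > 1 { n[$5] += $4; total += $4 }
       END { for (s in n) print s, n[s]; print "total", total }' | LC_ALL=C sort
}

# Input copied from the `sinfo -p general` output above; node lists shortened to "...".
# On Bunya you would run: sinfo -p general | node_totals
general='PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
general      up 14-00:00:0      1  down$ bun009
general      up 14-00:00:0      1  drain bun023
general      up 14-00:00:0     39    mix ...
general      up 14-00:00:0     54  alloc ...'

printf '%s\n' "$general" | node_totals
```

For the snapshot above this reports 95 nodes in the general partition, 54 of them fully allocated.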

The node states will usually be one of

There is the possibility of interactive use BUT …

Interactive use must still be mediated by the batch system to fairly share the resources.

This ensures that the computation takes place on a compute node.

Installing R packages must not be done on the login nodes. Getting in early ;-)
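A small guard can help avoid installing packages in the wrong place. This is only a sketch under a stated assumption: `is_login_node` is a hypothetical helper, and it assumes login nodes are named bunya1, bunya2 and so on while compute nodes are named like bun048, as the shell prompts in this document suggest.

```shell
# is_login_node: hypothetical check based on the host name.
# ASSUMPTION: login nodes are called bunya1, bunya2, ... and compute nodes
# bun006, bun048, ... Adjust the pattern if the naming differs.
is_login_node() {
  case "${1:-$(uname -n)}" in
    bunya[0-9]*) return 0 ;;
    *)           return 1 ;;
  esac
}

if is_login_node; then
  echo "This looks like a login node: start an interactive or batch job before installing R packages."
else
  echo "This does not look like a login node (judging by the host name)."
fi
```

The optional argument makes the check easy to try out, for example `is_login_node bun048`.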

Interactive graphical usage of R can be achieved using

So, yes, you can run RStudio. Indeed, two ways!

The implications for your R code

Some of the benefits of moving to HPC are access to potentially

Some of the challenges:

Using R on Bunya

How to activate R in your session

[uqdgree5@bun048 R]$ module purge

[uqdgree5@bun048 R]$ module load r/4.3.3-gfbf-2023a

[uqdgree5@bun048 R]$ which Rscript
/sw/auto/rocky8c/epyc3/software/R/4.3.3-gfbf-2023a/bin/Rscript

[uqdgree5@bun048 R]$ cat demo.R
1+1
2+2
[uqdgree5@bun048 R]$

[uqdgree5@bun048 R]$ Rscript demo.R
[1] 2
[1] 4
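The same steps can be wrapped in a batch job. The sketch below writes a minimal Slurm job script to a file; the resource requests are illustrative values and the account string `a_example` is a placeholder, so treat them as assumptions rather than Bunya recommendations.

```shell
# Write a minimal Slurm job script that repeats the interactive steps above.
# ASSUMPTIONS: the resource values are illustrative and "a_example" is a
# placeholder account string; replace both with values that suit your project.
cat > demo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=r-demo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --partition=general
#SBATCH --account=a_example

module purge
module load r/4.3.3-gfbf-2023a
Rscript demo.R
EOF

echo "Wrote demo_job.sh; submit it with: sbatch demo_job.sh"
```

By default, Slurm writes the output of the job (here the two `[1]` lines) to a file named `slurm-<jobid>.out` in the directory you submitted from.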

Where is it going to search for R packages?

[uqdgree5@bun048 R]$ module purge

[uqdgree5@bun048 R]$ module load r/4.2.1-foss-2022a

[uqdgree5@bun048 R]$ which R
/sw/auto/rocky8c/epyc3/software/R/4.2.1-foss-2022a/bin/R

[uqdgree5@bun048 R]$ R

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> .libPaths()
[1] "/home/uqdgree5/R/x86_64-pc-linux-gnu-library/4.2"
[2] "/sw/auto/rocky8c/epyc3/software/R/4.2.1-foss-2022a/lib64/R/library"
> q()
Save workspace image? [y/n/c]: n
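Note that the personal library path ends in the major.minor version of R (4.2 above), so packages you install under one R module are not picked up after loading a module with a different minor version. The sketch below prints the path R would be expected to use for each of the two modules shown in this document; `user_lib_for` is a hypothetical helper and it assumes the default personal library layout seen in the output above.

```shell
# user_lib_for: hypothetical helper that prints the personal library path for
# a given R version, following the pattern in the .libPaths() output above.
# ASSUMPTION: the default personal library layout on x86_64 Linux is in use.
user_lib_for() {
  minor="${1%.*}"    # "4.2.1" -> "4.2"
  printf '%s/R/x86_64-pc-linux-gnu-library/%s\n' "$HOME" "$minor"
}

for v in 4.2.1 4.3.3; do
  lib="$(user_lib_for "$v")"
  if [ -d "$lib" ]; then
    echo "R $v: personal library exists at $lib"
  else
    echo "R $v: no personal library yet at $lib"
  fi
done
```

R only lists the personal library in `.libPaths()` once the directory exists, so a missing first entry usually means nothing has been installed for that version yet.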

Remember the mention of EPYC3 and EPYC4 CPUs?

You don’t need to re-build any of the following packages

[davidg@bunya1 ~]$ ls /sw/auto/rocky8c/epyc3/software/R/4.2.1-foss-2022a/lib64/R/library
abc                   doMC                ineq                      parallelly          scatterplot3d
abc.data              doParallel          influenceR                parallelMap         scs
abe                   doRNG               infotheo                  ParamHelpers        sctransform
abind                 doSNOW              ini                       parsedate           SDMTools
acepack               dotCall64           inline                    party               seewave
adabag                downloader          intergraph                partykit            segmented
ade4                  dplyr               interpretR                pastecs             selectr
ADGofTest             dr                  intrinsicDimension        patchwork           sem
admisc                drgee               inum                      pbapply             semPLS
aggregation           DRR                 ipred                     pbivnorm            semTools
AICcmodavg            drugCombo           irace                     pbkrtest            sendmailR

<SNIP>                <SNIP>              <SNIP>                    <SNIP>              <SNIP>

dismo                 HWxtest             origami                   Rvmmin              xtable
distillery            hypergeo            orthopolynom              RWeka               xts
distr                 ica                 osqp                      RWekajars           yaImpute
distrEx               IDPmisc             outliers                  s2                  yaml
distributional        idr                 packrat                   sampling            yulab.utils
DistributionUtils     ids                 pacman                    sandwich            zeallot
diveRsity             ie2misc             pammtools                 sass                zip
dlm                   igraph              pamr                      SBdecomp            zoo
DMCfun                image.binarization  pan                       scales
doc2vec               imager              parallel                  scam
docstring             imagerExtra         parallelDist              scatterpie
[davidg@bunya1 ~]$
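Before installing a package yourself, it is worth checking whether it is already in that central library. A minimal sketch, assuming the library path from the listing above; `has_pkg` is a hypothetical helper that relies on the fact that each installed R package is a directory inside its library.

```shell
# has_pkg: hypothetical helper. An installed R package is a directory inside
# a library, so a directory test is enough for a quick check.
has_pkg() {    # usage: has_pkg <library-dir> <package-name>
  [ -d "$1/$2" ]
}

# Path copied from the listing above; it only exists on Bunya.
site_lib=/sw/auto/rocky8c/epyc3/software/R/4.2.1-foss-2022a/lib64/R/library

for pkg in dplyr igraph zoo; do
  if has_pkg "$site_lib" "$pkg"; then
    echo "$pkg: already provided centrally"
  else
    echo "$pkg: not found in $site_lib"
  fi
done
```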

Access Methods for R

Command line interactive batch job

Regular batch jobs

Bunya-on-Demand


Parameter Sweeps

#How parameter sweeps get big quickly ... 1,000,000 combinations
for (i in 0:999){
  for (j in 0:99){
    for (k in 0:9){
      do_something(i,j,k)
    }
  }
}

Even if you do this on a single Bunya node, using library(parallel), at most 96 combinations can run at the same time (one per core).
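To put numbers on that, here is the arithmetic for the triple loop above on one 96-core node. It is a sketch that assumes every combination takes about the same time.

```shell
# Loop sizes come from the R example above; 96 is the core count of a Bunya node.
combos=$(( 1000 * 100 * 10 ))
cores=96

# Ceiling division: how many back-to-back "rounds" of 96 parallel runs are needed.
rounds=$(( (combos + cores - 1) / cores ))

echo "$combos combinations on $cores cores need at least $rounds rounds"
```

So even with every core busy, each core still works through more than ten thousand combinations one after another.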

It is often more effective to break it into smaller pieces and use a high throughput computing (HTC) approach. Simplest would look like

Job Arrays

Bunya has support for “Job Arrays” which make sweeping over integer parameter values straightforward.
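One possible way to map the triple loop onto a job array is to give each array task one value of i and keep the j and k loops inside the task. The sketch below shows that mapping; `combos_for_task` and `sweep.R` are hypothetical names, and the default task ID of 0 is only there so the snippet also runs outside Slurm.

```shell
# combos_for_task: hypothetical helper that prints the "i j k" triples handled
# by one array task. The task ID is used as i; j and k are looped inside the task.
combos_for_task() {
  i="$1"
  j=0
  while [ "$j" -le 99 ]; do
    k=0
    while [ "$k" -le 9 ]; do
      echo "$i $j $k"
      k=$(( k + 1 ))
    done
    j=$(( j + 1 ))
  done
}

# Inside a job submitted with `sbatch --array=0-999 ...`, Slurm sets
# SLURM_ARRAY_TASK_ID. The default of 0 is only for running outside Slurm.
task="${SLURM_ARRAY_TASK_ID:-0}"
echo "task $task covers $(combos_for_task "$task" | wc -l) combinations"

# In a real job each triple could be passed to R, for example:
#   combos_for_task "$task" | while read -r i j k; do Rscript sweep.R "$i" "$j" "$k"; done
# (sweep.R is a hypothetical script that reads commandArgs(trailingOnly = TRUE))
```

Starting R once per triple adds start-up overhead, so in practice it may be better to pass only the task ID to R and run the j and k loops inside the R script.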

Job arrays can be adapted to sweep non-integer parameters, too.
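A common way to do this is to keep the parameter combinations in a text file, one per line, and use the array task ID as a line number. The sketch below illustrates the idea; the parameter values, `nth_line`, and `sweep.R` are all made up for the example.

```shell
# Made-up parameter table: one combination per line (a number and a method name).
cat > params.txt <<'EOF'
0.001 lasso
0.01 lasso
0.1 ridge
EOF

# nth_line: hypothetical helper that prints line N (1-based) of a file.
nth_line() {    # usage: nth_line <N> <file>
  sed -n "${1}p" "$2"
}

# Submit with e.g. `sbatch --array=1-3 ...`. The default of 1 is only so the
# sketch also runs outside Slurm.
task="${SLURM_ARRAY_TASK_ID:-1}"
params="$(nth_line "$task" params.txt)"
echo "task $task -> $params"

# In a real job the line would be handed to R, for example:
#   Rscript sweep.R $params
# ($params is left unquoted on purpose so each field becomes its own argument;
#  sweep.R is a hypothetical script)
```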

Job arrays on Bunya are limited to 1,000 elements, but you can submit more than one to the batch system.
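When there are more work items than one array can hold, each array can be given an offset. The sketch below only prints the sbatch commands rather than submitting anything; `chunk_commands`, the `OFFSET` variable and `sweep_job.sh` are hypothetical, and it assumes the 1,000 limit applies to the index range of each array.

```shell
# chunk_commands: hypothetical helper that prints one sbatch command per chunk.
# OFFSET is a made-up variable name; the job script (sweep_job.sh, also
# hypothetical) would compute its item number as OFFSET + SLURM_ARRAY_TASK_ID.
chunk_commands() {    # usage: chunk_commands <total-items> <array-limit>
  total="$1"
  limit="$2"
  offset=0
  while [ "$offset" -lt "$total" ]; do
    size=$(( total - offset ))
    if [ "$size" -gt "$limit" ]; then
      size="$limit"
    fi
    echo "sbatch --array=0-$(( size - 1 )) --export=ALL,OFFSET=$offset sweep_job.sh"
    offset=$(( offset + limit ))
  done
}

# 2500 is a made-up number of work items, for illustration.
chunk_commands 2500 1000
```

Keeping every array's indices in the range 0 to 999 and carrying the rest in the offset avoids asking Slurm for index values above the limit.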

Nimrod

UQ RCC will be adding a tool called Nimrod to Bunya later this year.

That will make it easier to sweep over combinations of parameters that are not integers.

Nimrod can sweep over parameter values that are

In summary

The possibilities that arise from combining R with HPC are well worth the learning curve!


A recording of this presentation is available online

This document was created using RStudio 2024.04.0 Build 735 (R version 4.4.0) and knitr on Windows 11.

