17  High-performance computing (HPC)

Adapted by UCD-SeRG team from original by Anna Nguyen, Jade Benjamin-Chung, and Gabby Barratt Heitmann

When you need to run a script that requires a large amount of RAM, works with large files, or uses parallelization, UC Davis provides several high-performance computing (HPC) resources.

17.1 UC Davis Computing Resources

17.1.1 Available Resources

UC Davis HPC Clusters:

  • Farm Cluster (hpc.ucdavis.edu): UC Davis’s primary HPC cluster providing shared computing resources for research

PHS Shared Compute Environments: For lab members affiliated with the School of Public Health Sciences (PHS), additional shared computing environments are available. These environments provide secure, HIPAA-compliant computing resources suitable for working with sensitive health data.

  • Shiva (shiva.ucdavis.edu): SLURM-based cluster for computational work
  • Mercury (mercury.ucdavis.edu): RStudio GUI computing environment

For detailed information about PHS shared compute environments, including access procedures, security guidelines, and usage policies, please refer to the PHS Shared Compute Environments Guide.

Contact lab leadership for assistance with:

  • Requesting access to computing resources
  • Choosing the appropriate computing environment for your project
  • Setting up your computing environment

17.2 Getting started with SLURM clusters

To access a UC Davis HPC cluster, log in using SSH from a terminal. For example, to access shiva:

ssh USERNAME@shiva.ucdavis.edu

You will be prompted to enter your UC Davis credentials and may need to complete two-factor authentication.
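If you connect frequently, you can set up an SSH key so you do not have to retype your password for every session. This is an optional sketch using standard OpenSSH tools; whether key-based authentication is permitted (and whether two-factor authentication still applies) depends on the cluster’s configuration:

# generate a key pair (accept the defaults, or set a passphrase)
ssh-keygen -t ed25519

# copy the public key to the cluster; you will be prompted for your password once
ssh-copy-id USERNAME@shiva.ucdavis.edu

# subsequent logins can then use the key instead of a password
ssh USERNAME@shiva.ucdavis.edu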

Once you log in, you can move to your home directory on the command line by entering cd $HOME and list its contents with ls. You can create subfolders within this directory using the mkdir command. For example, you could make a “code” subdirectory and clone a GitHub repository there using the following code:

cd $HOME
mkdir code
git clone https://github.com/jadebc/covid19-infections.git

17.2.1 One-Time System Set-Up

To keep installed packages consistent across different nodes, you will need to explicitly set the path to your R library directory.

Open your ~/.Renviron file (vi ~/.Renviron) and append the following line:

Note: Once you open the file using vi [file_name], press i to enter insert mode and make edits. After you finish, press Esc to leave insert mode, then type :wq to save and close the file.

R_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0.2

Alternatively, run an R script with the following code on the cluster:

r_environ_file_path = file.path(Sys.getenv("HOME"), ".Renviron")
if (!file.exists(r_environ_file_path)) file.create(r_environ_file_path)

cat("\nR_LIBS=~/R/x86_64-pc-linux-gnu-library/4.0.2",
    file = r_environ_file_path, sep = "\n", append = TRUE)
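R only adds directories to its library search path if they already exist, so create the library directory once from the shell (the path below assumes R 4.0.2, matching the example above):

mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.0.2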

To install packages that rely on compiled C++ code, you’ll need to set the correct compiler options in your R environment.

Open the Makevars file (vi ~/.R/Makevars) and append the following lines:

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

Alternatively, create an R script with the following code, and run it on the cluster:

dotR = file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)

M = file.path(dotR, "Makevars")
if (!file.exists(M)) file.create(M)

cat("\nCXX14FLAGS=-O3 -march=native -mtune=native -fPIC",
    "CXX14=g++",
    file = M, sep = "\n", append = TRUE)

17.3 Moving files to the cluster

The $HOME directory is a good place to store code and small test files. Save large files to the $SCRATCH directory or other designated storage areas. Check with the UC Davis HPC documentation for specific quotas and retention policies. It’s best to create a bash script that records the file transfer process for a given project. See example code below:

# note: the following steps should be done from your local machine
# (not after ssh-ing into the cluster)

# securely transfer folders from Box to cluster home directory
# note: the -r option is for folders and is not needed for files
scp -r "Box/project-folder/folder-1/" USERNAME@shiva.ucdavis.edu:/home/users/USERNAME/

# securely transfer folders from Box to your cluster scratch directory
scp -r "Box/project-folder/folder-2/" USERNAME@shiva.ucdavis.edu:/scratch/users/USERNAME/

# securely transfer folders from Box to shared scratch directory
scp -r "Box/project-folder/folder-3/" USERNAME@shiva.ucdavis.edu:/scratch/group/GROUPNAME/

17.4 Installing packages on the cluster

When you begin working on a cluster, you will most likely encounter problems with installing packages. To install packages, log in to the cluster on the command line and open a development node. Do not attempt to do this in RStudio Server, as you will have to redo it for every new session you open.

ssh USERNAME@shiva.ucdavis.edu

sdev

You should only have to install packages once. The cluster may require that you specify the repository from which the package is downloaded. You may also need to add an additional argument to install.packages to prevent the package library from being left locked after installation:

install.packages(<PACKAGE NAME>, repos="https://cran.r-project.org", 
                  INSTALL_opts = "--no-lock")

In order for some R packages to work on clusters, it is necessary to load specific software modules before running R. These must be loaded each time you want to use the package in R. For example, for spatial and random effects analyses, you may need the modules/packages below. These modules must also be loaded on the command line prior to opening R in order for package installation to work.

module --force purge # remove any previously loaded modules, including math and devel
module load math
module load math gmp/6.1.2
module load devel
module load gcc/10
module load system
module load json-glib/1.4.4
module load curl/7.81.0
module load physics
module load physics udunits geos
module load physics gdal/2.2.1 # for R/4.0.2
module load physics proj/4.9.3 # for R/4.0.2
module load pandoc/2.7.3

module load R/4.0.2

R # Open R in the Shell window to install individual packages or test code
Rscript install-packages.R # Alternatively, run a package installation script in the Shell window

Figuring out the issues with some packages will require some trial and error. If you are still encountering problems installing a package, you may have to install other dependencies manually by reading through the error messages. If you try to install a dependency from CRAN and it isn’t working, it may instead be available as a system module. You can search for it using the module spider command:

module spider DEPENDENCY NAME

You can also reach out to UC Davis HPC support for help. Visit hpc.ucdavis.edu for support information.

17.5 Testing your code

Both of the following ways to test code on a cluster are recommended for making small changes, such as editing file paths and making sure the packages and source files load. You should write and test the functionality of your script locally, only testing on the cluster once major bugs are out.

17.5.1 The command line

There are two main ways to explore and test code on computing clusters. The first is best for users who are comfortable working on the command line and editing code in base R. Even if you are not comfortable yet, this is probably the better way to learn, because these commands transfer to any cluster that uses Slurm.

Typically, you will want to initially test your scripts by initiating a development node using the command sdev. This will allocate a small amount of computing resources for 1 hour. You can access R via command line using the following code.

# open development node
sdev

# Load all the modules required by the packages you are using
module load MODULE NAME  

# Load R (default version)*
module load R 

# initiate R in command line
R

*Note: for collaboration purposes, it’s best for everyone to work with one version of R. Check what version is being used for the project you are working on. Some packages only work with some versions of R, so it’s best to keep it consistent.
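To check which R versions the cluster provides before settling on one with your collaborators, you can list the available modules (module names and versions vary by cluster; 4.0.2 below is just an example):

# list the R modules installed on the cluster
module avail R

# load a specific version rather than the default
module load R/4.0.2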

17.5.2 RStudio Server

For RStudio GUI computing, UC Davis provides mercury.ucdavis.edu. It is accessed through a web browser and provides an RStudio interface; you will be prompted to authenticate with your UC Davis credentials. This is the best way to work with R for people who are not comfortable accessing and editing base R from a shell application.

Note that mercury does not have SLURM, so it’s best suited for interactive work and smaller computations. For large-scale computations requiring SLURM job scheduling, use shiva.ucdavis.edu instead.

When using RStudio Server, you can test your code interactively. However, do NOT use the RStudio Server’s Terminal to install packages and configure your environment for SLURM-based clusters, as you will likely need to re-do it for every session/project. For SLURM clusters, use the command line approach described earlier.

17.5.3 Filepaths & configuration on the cluster

In most cases, you will want to test that the file paths work correctly on the cluster. You will likely need to add code to the configuration file in the project repository that specifies cluster-specific file paths. Here is an example:

# set cluster-specific file paths
if(Sys.getenv("LMOD_SYSHOST")!=""){
  
  cluster_path = paste0(Sys.getenv("HOME"), "/project-name/")
  
  data_path = paste0(cluster_path, "data/")
  results_path = paste0(cluster_path, "results/")
}

17.6 Storage & group storage access

17.6.1 Individual storage

There are multiple places to store your files on computing clusters. Each user has their own $HOME directory as well as a $SCRATCH directory. These are directories that can be accessed via the command line once you’ve logged in to the cluster:

cd $HOME 
cd /home/users/USERNAME # Alternatively, use the full path

cd $SCRATCH
cd /scratch/users/USERNAME # Full path

You can also navigate to these using the File Explorer if available through a web interface.

$HOME typically has a volume quota (e.g., 15 GB). $SCRATCH typically has a larger volume quota (e.g., 100 TB), but files here may get deleted after a certain period of inactivity. Thus, use $SCRATCH for test files, exploratory analyses, and temporary storage. Use $HOME for long-term storage of important files and more finalized analyses.

Check with the UC Davis HPC documentation for specific storage options and quotas.
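To see how much space you are using before you hit a quota, you can check directory sizes from the command line (cluster-specific quota commands vary, so this sketch only uses the generic du and sort utilities):

# summarize how much space your home and scratch directories use
du -sh $HOME
du -sh $SCRATCH

# list subdirectories of scratch from smallest to largest
du -sh $SCRATCH/* | sort -h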

17.6.2 Group storage

The lab may have shared $GROUP_HOME and $GROUP_SCRATCH directories to store files for collaborative use. These typically have larger quotas and may have different retention policies. You can access these via the command line or navigate to them using the File Explorer:

cd $GROUP_HOME
cd /home/groups/GROUPNAME

cd $GROUP_SCRATCH
cd /scratch/groups/GROUPNAME

However, saving files to group storage can be tricky. You can try using the scp commands from the section “Moving files to the cluster” to see whether you have permission to add files to group directories. Read the next section to ensure any directories you create have the right permissions.
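A quick way to confirm write access before transferring large files is to create and then remove a small test file in the shared directory, as sketched below:

# test whether you can write to the group scratch directory
touch $GROUP_SCRATCH/permission-test.txt && echo "write access OK"
rm $GROUP_SCRATCH/permission-test.txt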

17.6.3 Folder permissions

Generally, when we put folders in $GROUP_HOME or $GROUP_SCRATCH, it is so that we can collaborate on an analysis within the research group, so multiple people need to be able to access the folders. If you create a new folder in $GROUP_HOME or $GROUP_SCRATCH, please check the folder’s permissions to ensure that other group members are able to access its contents. To check the permissions of a folder, navigate to the level above it, and enter ls -l. You will see output like this:

drwxrwxrwx 2 jadebc jadebc  2204 Jun 17 13:12 myfolder

Please review a guide to Unix file permissions to learn how to interpret the code on the left side of this output and how to change folder permissions. In order to ensure that all users and group members are able to access a folder’s contents, you can use the following command:

chmod ugo+rwx FOLDER_NAME
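If you prefer to grant access to group members only (rather than all users), and you want new files created inside the folder to inherit the folder’s group, one common Unix pattern is the following sketch; confirm it matches your group’s conventions before applying it:

# give the owner and group full access, but remove access for other users
chmod ug+rwx,o-rwx FOLDER_NAME

# set the setgid bit so files created inside inherit the folder's group
chmod g+s FOLDER_NAME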

17.7 Running big jobs

Once your test scripts run successfully, you can submit an sbatch script for larger jobs. These are text files with a .sh suffix. Use a text editor like Sublime to create such a script. Documentation on sbatch options is available from Slurm Workload Manager (“Slurm Workload Manager: Sbatch Documentation,” n.d.). Here is an example of an sbatch script with the following options:

  • job-name=run_inc: Job name that will show up in the SLURM system
  • begin=now: Requests to start the job as soon as the requested resources are available
  • dependency=singleton: Jobs can begin after all previously launched jobs with the same name and user have ended.
  • mail-type=ALL: Receive all types of email notification (e.g., when job starts, fails, ends)
  • cpus-per-task=16: Request 16 processors per task. The default is one processor per task.
  • mem=64G: Request 64 GB memory per node.
  • output=00-run_inc_log.out: Create a log file called 00-run_inc_log.out that contains information about the Slurm session
  • time=47:59:00: Set maximum run time to 47 hours and 59 minutes. If you don’t include this option, the cluster will end the job after its default time limit (often 2 hours, but this varies by cluster).

The file analysis.out will contain the log file for the R script analysis.R.

#!/bin/bash

#SBATCH --job-name=run_inc
#SBATCH --begin=now
#SBATCH --dependency=singleton
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --output=00-run_inc_log.out
#SBATCH --time=47:59:00

cd $HOME/project-code-repo/2-analysis/

module purge 

# load R version 4.0.2 (required for certain packages)
module load R/4.0.2

# load gcc, a C++ compiler (required for certain packages)
module load gcc/10

# load software required for spatial analyses in R
module load physics gdal
module load physics proj

R CMD BATCH --no-save analysis.R analysis.out

To submit this job, save the code in the chunk above in a script called myjob.sh and then enter the following command into terminal:

sbatch myjob.sh 

To check on the status of your job, enter the following code into terminal:

squeue -u $USER
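A few other Slurm commands are useful once a job has been submitted (JOBID is a placeholder for the ID that sbatch prints, and sacct requires job accounting to be enabled on the cluster):

# show the status of a specific job
squeue -j JOBID

# cancel a job you no longer need
scancel JOBID

# review resource usage for a completed job (if accounting is enabled)
sacct -j JOBID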