Using the HPC queuing systems¶
Slurm is the primary queuing system available on the Wildwood HPC infrastructure. If you have used the CQLS compute infrastructure for some time, you may be comfortable using the SGE commands to start and stop jobs, check on progress, and identify available compute resources. This guide will help you become more accustomed to the Slurm commands and hopefully reduce downtime when switching your workflows over to Slurm.
Note
SGE is available for legacy pipelines on specific compute nodes, but we recommend moving over to Slurm whenever possible. `hpcman` commands will continue to support both SGE and Slurm whenever possible.
Command overview¶
Purpose | SGE | Slurm | hpcman | Notes |
---|---|---|---|---|
Non-interactive job submission | `qsub` | `sbatch` | `hqsub` | `SGE_Batch` and `SGE_Array` previously worked for this purpose |
Interactive jobs | `qrsh` | `salloc` | N/A | `srun --pty $SHELL` is also acceptable |
Terminating jobs | `qdel` | `scancel` | N/A | |
Monitoring job status | `qstat` | `squeue` | `hqstat` | |
Checking job details | `qstat -j $JOBID` | `scontrol show job $JOBID` | N/A | `scontrol show job` is more informative than Slurm's `sstat` command |
Getting available compute resources | `qstat -f` | `sinfo -Nl` | `hqavail` | |
Submitting jobs¶
We recommend most users transition from `SGE_Batch` and `SGE_Array` workflows to using `hqsub` for job submission. `hqsub` is part of the `hpcman` software developed at the CQLS to help manage HPC environments, software, and jobs. Under the hood, `hqsub` can submit scripts to both the SGE and Slurm queueing systems, using `qsub` and `sbatch`, respectively.

Advanced users and those who previously wrote their own `qsub` scripts should find migration to `sbatch` relatively painless. See the Rosetta Stone of Workload Managers for more information.
Tip
When translating the number of cores/CPUs from SGE to Slurm, the flag to use is `-c` (`--cpus-per-task`) rather than `-n` (`--ntasks`).
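As a rough sketch, a minimal `sbatch` script might look like the following (the job name, partition, resource values, and command are placeholders; adjust them for your workload):

```bash
#!/usr/bin/env bash
#SBATCH -J myjob              # job name (placeholder)
#SBATCH -p core               # partition to submit to
#SBATCH -c 4                  # CPUs per task (note: -c, not -n)
#SBATCH --mem=8G              # memory request
#SBATCH -o myjob_%j.out       # stdout log; %j expands to the job ID
#SBATCH -e myjob_%j.err       # stderr log

# placeholder command; pass the allocated CPU count through to your program
my_program --threads "$SLURM_CPUS_PER_TASK" input.dat
```

Submit the script with `sbatch myjob.sh` (or via `hqsub`, which wraps the same mechanism).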
Interactive jobs¶
In order to check out an interactive session on a node using Slurm, instead of using `qrsh`, users can use the `salloc` command (or `srun --pty $SHELL`). You can also specify a queue (called a partition in Slurm, and for the remainder of this document) and/or a specific node.
Here's an example:
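```bash
# sketch: request an interactive session with 8 CPUs on the core partition, on node chrom1
salloc -c 8 -p core -w chrom1
```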
I have checked out 8 CPUs (`-c` flag), on the core partition (`-p` flag), on the node chrom1 (`-w` flag).
Tip
Unlike SGE, Slurm does not limit job submission to a submit host. Because of this, you can check out a node interactively, then submit jobs to/from that node (using `sbatch` or `hqsub`). You can monitor job outputs and resource usage more directly using this job submission paradigm.
See the Slurm documentation on interactive jobs for more details.
Terminating jobs¶
Users have a few options for filtering and/or selecting jobs to terminate using `scancel` compared to the `qdel` command of SGE. The general protocol is the same, with `scancel $JOBID` canceling a single job by the specified job ID.

In general, I suggest always providing the `-v` flag to `scancel`, as `scancel` provides no user feedback by default.
Option | Purpose |
---|---|
`--me` | Restrict canceling to your own jobs |
`-t STATE` | Cancel jobs in a particular STATE, i.e. pending, running, or suspended |
`-p PARTITION` | Restrict to the specified partition |
`-w NODE` | Restrict to the specified node |
`-n NAME` | Cancel jobs with the specified name |
So, as an example, if I wanted to verbosely terminate all of my pending jobs in the all.q partition, I would run:
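```bash
# verbosely cancel my own pending jobs in the all.q partition
scancel -v --me -t pending -p all.q
```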
Job status¶
The `squeue` command prints out information about all running jobs on the infrastructure. `squeue` acts as a replacement for the `qstat` command. In general, the `hqstat` command should suffice for most of your needs in terms of job status monitoring.
Tip
Use the `hqstat --watch` command to watch job status over time. Use ctrl+c to cancel the watch.
Job details¶
While we previously could get additional information about job details using the `qstat -j $JOBID` command, the closest command to replicate this functionality in Slurm is the `scontrol show job $JOBID` command.
Tip
To monitor job status and details programmatically, you can use either the `squeue --json -j $JOBID` or `scontrol show job --json $JOBID` commands. The outputs are nearly identical.
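For example, assuming `jq` is available, you can pull fields out of that JSON directly (the exact field names vary between Slurm versions):

```bash
# pretty-print the full record for the job; drill into individual fields
# (e.g. '.jobs[0].job_state') as needed
scontrol show job --json $JOBID | jq '.jobs[0]'
```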
Node status and availability¶
In order to monitor node availability, we previously provided the `SGE_Avail` command, which ran `qstat -j` and `qhost` and aggregated the results in a table. In Slurm, these details can be gathered using the `sinfo` command, and are wrapped using the `hqavail` command of `hpcman`. To find out more information about a specific node or partition, you can use `scontrol show node $NODE` or `scontrol show partition $PARTITION`, respectively. Add the `--json` flag to either of those commands for programmatic access using JSON.
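For instance, reusing the node and partition names from the interactive-job example above:

```bash
# details for a single node and a single partition
scontrol show node chrom1
scontrol show partition core
```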
Accounts and partitions¶
In Slurm, membership of users in Slurm accounts, which are unrelated to your Linux groups, is what controls access to the Slurm partitions. To see which accounts you have access to, you can run this command:
➜ sacctmgr show user -s davised format=User,DefaultAccount%15,Account%15
User Def Acct Account
---------- --------------- ---------------
davised core grace
davised core dmplx
davised core jackson
davised core cqls_gpu
davised core core
davised core cqls
davised core ceoas
Note
The `-s` flag is required to show the different associations between users and accounts. If you are missing access to a partition that you think you should have access to, you can see which accounts are allowed for a partition using `scontrol show partition $PARTITION`.
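For example, to see which accounts are allowed on a partition (the partition name here is a placeholder):

```bash
# the AllowAccounts field lists the Slurm accounts permitted on the partition
scontrol show partition core | grep -i allow
```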
Guidance for fair use of the queueing system¶
As we migrate our workflows to Slurm, we want to ensure everyone has access to the compute resources they need to complete their research projects. Please be mindful of how many resources your jobs are using at any given time on the infrastructure, especially on shared partitions.
Our primary goal for fair use should be that jobs that can complete within 24-48h are provided the resources to do so. Lack of resources should not inhibit jobs from finishing in a timely manner.
Here are some general guidelines that you can follow when submitting your jobs:
- Use array jobs to group your submission of related jobs when processing multiple samples at once. If you are submitting tens to thousands of jobs at a time in a loop, please convert your scripts to submit an array job.
- When using array jobs, control the concurrency (`-b` flag of `hqsub`). Concurrency is the setting that controls the maximum number of tasks in that array that will run at once. Multiply CPUs × concurrency to see the potential CPU usage of your array job, and leave space on the partition for other folks to use. (A sketch of the equivalent `sbatch` syntax follows this list.)
- When using multiple CPUs (`-p` flag of `hqsub`), make sure to set the CPU usage in your command as well. Most programs will not automatically use the number of CPUs provided by Slurm.
- Use the local drives (/scratch) when possible. Using more CPUs does not always lead to an increase in compute speed, and not all programs support using multiple CPUs. Often, using the local drives can lead to reduced runtime due to the speed-up in program I/O. If the CPUs are waiting for data, then providing more CPUs will never speed up your processing.
- If you have a processing job that will require the majority of a partition's resources, submit the jobs during lower use times, i.e. after hours or on weekends. This will ensure jobs are moving through the queuing system more quickly during the work day.
- Use the departmental partitions rather than lab partitions for most of your jobs, and only use lab partitions for high priority jobs so that priority queuing can work.
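For reference, the same concurrency cap expressed directly in `sbatch` terms looks like the sketch below (`hqsub`'s own flags differ, and `process_sample.sh` is a placeholder script):

```bash
# 100-task array, at most 10 tasks running at once (the %10 suffix), 4 CPUs per task:
# at most 10 * 4 = 40 CPUs of the partition are in use at any one time
sbatch --array=1-100%10 -c 4 process_sample.sh
```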
Priority queuing¶
In order to facilitate our shared goals of fair use, we have enabled the Slurm priority plugin. The weights for the priority are currently:
➜ sprio -w
JOBID PARTITION PRIORITY SITE AGE FAIRSHARE JOBSIZE PARTITION
Weights 1 1000 2000 1000 10000
As you can see, we have implemented a system where the partition factor carries more weight than the default. In this way, you can choose which partition to submit your jobs to in order to control their priority.
The departmental partitions (e.g. `bpp`) will have a lower priority than the lab-specific partitions within that department. In general, your standard/lower-priority jobs should be targeted to the departmental partition, with your high-priority jobs targeted to your lab-specific partition.
We have set the preempt mode to `GANG/SUSPEND`, meaning that lower-priority jobs may be suspended or fail to start if higher-priority jobs are already scheduled to run. Suspended jobs will remain in memory on the node so that they can later be resumed (so their memory will not be freed). The CPUs and the remaining memory available on the machine will be available for the higher-priority job.
Please let us know, or contact me (Ed) over email, Slack, Teams, etc., if these settings appear to be working (or not!).
Note
If your jobs list a reason of `(PRIORITY)` in `squeue`, it means priority queuing is affecting when the job starts or stops.
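One way to check is to print the scheduling reason for your own jobs, or to break down the priority factors for a specific job:

```bash
# job ID, partition, state, and scheduling reason for your jobs
squeue --me -o "%.12i %.12P %.10T %r"

# priority factor breakdown for a single job
sprio -j $JOBID
```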