
Using local drives on the compute nodes

The Wildwood HPC cluster has local drives mapped to the /scratch volume. This is a change from the old infrastructure, where local drives were mapped as /data. We made this change for two primary reasons. First, /scratch is the more intuitive name: files stored in /scratch should be considered temporary and cleaned up somewhat routinely, while /data should be considered non-temporary. Second, the /data volume was previously used for different purposes on different types of machines. On webservers, the /data drive stores the resources needed to run the websites hosted on the device, so it was unavailable for user writes in some cases and caused software error messages, including on the login and file nodes (previously shell and files, now shell-hpc and files-hpc). On compute nodes, the /data drive was expected to be available for writing temporary outputs. We now follow this general rule for local disk mounting:

| Volume   | Purpose                    | Notes                                                                                                                      |
| -------- | -------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| /tmp     | Small OS temporary space   | Used for OS temp work, and as temp space on login and file nodes                                                            |
| /data    | Non-temporary data storage | Used for storing web-server resources. On compute nodes with secondary disks, leftover OS-drive space is mapped as /data.   |
| /scratch | Temporary processing space | Used for temporarily storing processing output. Linked as /data if no secondary drive is available.                         |

Note

/scratch should be used preferentially as processing space, especially for new pipelines and procedures. /data may still be present, and may even point to the same physical drive, to ease the transition from the old layout.
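
If you are unsure what a particular node provides, a quick check such as the sketch below will show which of these volumes are mounted and whether /scratch is writable there (standard df and shell test commands; the exact mounts vary by node type):

# Show the filesystems backing the temp and data volumes on this node
df -h /tmp /data /scratch 2>/dev/null

# Confirm that /scratch is writable before pointing a job at it
if [ -w /scratch ]; then
        echo "/scratch is writable on $(hostname)"
else
        echo "no writable /scratch here; fall back to /data or /tmp"
fi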

Dynamic $TMPDIR settings to facilitate usage of /scratch

If you are using the updated dotfiles, you may have already noticed that $TMPDIR is set dynamically for you, depending on which drive is available. On login and file-transfer nodes, your $TMPDIR will be /tmp, because /data is used for hosting websites and other data on these nodes, and /scratch is unavailable to discourage accidental data processing there. If you log in to a compute node with salloc, you should find that $TMPDIR automatically updates to /scratch.

This happens because the updated configuration files include this small section of code, which finds the first writable directory and updates $TMPDIR on login:

/local/cqls/etc/profile.d/tmpdir.sh
# Point compilers and System facilities to use the tmpfs for temp files
#
if [ -w "/scratch" ]; then
        TMPDIR=/scratch
elif [ -w "/data" ]; then
        TMPDIR=/data
else
        TMPDIR=/tmp
fi
export TMPDIR
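
A quick way to see this in action is to compare the value on a login node and inside an interactive allocation. The salloc resource options below are only examples; request whatever your work actually needs:

# On a login or file-transfer node (shell-hpc, files-hpc):
echo "$TMPDIR"        # -> /tmp

# Start an interactive session on a compute node, then check again:
salloc --cpus-per-task=1 --mem=2G --time=00:30:00
echo "$TMPDIR"        # -> /scratch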

The problem with Slurm batch jobs and $TMPDIR

One change we have had to overcome in moving from SGE to Slurm is how each queuing system inherits the submitting environment. SGE, even for batch jobs, would re-load the user's environment, so the dynamic $TMPDIR logic above would find the appropriate $TMPDIR on the compute node. Slurm batch jobs submitted with sbatch do not re-load the user's environment; they inherit the submitting environment, where $TMPDIR is /tmp.
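
For example, a minimal batch script like the sketch below (the file name, job name, and time limit are placeholders) will report /tmp in its output when submitted from a login node, because the profile.d logic above is not re-run for the job:

tmpdir_check.sh
#!/bin/bash
#SBATCH --job-name=tmpdir-check
#SBATCH --time=00:05:00

# $TMPDIR is inherited from the submitting shell on the login node,
# so this prints /tmp even though /scratch exists on the compute node.
echo "TMPDIR is: $TMPDIR"

Submitting it with sbatch tmpdir_check.sh and checking the job's output file will show the inherited value.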

A solution, using hqsub

Starting in version 1.5.0, hqsub has an autotmp setting, enabled by default, that runs the automatic $TMPDIR code shown above on the compute node, so $TMPDIR is updated before any data processing occurs. This setting can be disabled with --no-autotmp for an individual run, or you can set the HQSUB_AUTOTMP environment variable to False or 0 in your shell config file to disable the feature permanently.
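
In practice that looks something like the sketch below; the trailing submission arguments are elided, while --no-autotmp and HQSUB_AUTOTMP are the controls described above:

# Disable autotmp for a single submission only
hqsub --no-autotmp ...

# Disable the feature for all future submissions (e.g. in your ~/.bashrc)
export HQSUB_AUTOTMP=False    # 0 also works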

In this way, interactive jobs started with salloc and batch/array jobs started with hqsub will have the same expected $TMPDIR settings.

Using the --local-drive option of hqsub

The --local-drive option of hqsub also benefits from the autotmp setting, assuming the feature has not been disabled. You can still specify a non-dynamic prefix for hqsub --local-drive using the --local-prefix option, but by default both --local-drive pertask and --local-drive shared will work as expected with the dynamic $TMPDIR updates.
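
For example, sketches of both forms are shown below; the remaining submission arguments are elided, and /data is only an illustrative non-dynamic prefix:

# Per-task local space created under the dynamic $TMPDIR (/scratch on compute nodes)
hqsub --local-drive pertask ...

# Shared local space pinned to an explicit prefix instead of the dynamic $TMPDIR
hqsub --local-drive shared --local-prefix /data ...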