Using local drives on the compute nodes
The Wildwood HPC cluster has local drives mapped to the /scratch volume. This is a change from the old infrastructure,
where local drives were mapped as /data. We made this change for two primary reasons. First, /scratch makes more
intuitive sense as a name: files stored in /scratch should be considered temporary and should be cleaned up fairly
regularly, while /data should be considered non-temporary. Second, the /data volume was previously used for different
purposes on different types of machines. On web servers, the /data drive is used to store the resources needed to run
the websites hosted on that machine. As a result, the /data drive was unavailable to users for writing in some cases,
including on the login and file nodes (previously shell and files, now shell-hpc and files-hpc), which caused software
error messages. On compute nodes, the /data drive was expected to be available for writing temporary outputs. We now
follow this general rule for local disk mounting:
volume | purpose | notes |
---|---|---|
/tmp | Small OS temporary space | Used for OS temp work, and temp space on login and file nodes |
/data | Non-temporary data storage | Used for storing web-server resources. On compute nodes with secondary disks, leftover OS-drive space is mapped as /data |
/scratch | Temporary processing space | Used for temporarily storing processing output. Linked as /data if no secondary drive is available. |
Note
Prefer /scratch as processing space, especially for new pipelines and procedures. /data may be present, and may even
point to the same physical drive to reduce legacy issues.
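Because files in /scratch are meant to be temporary, one way to keep the volume tidy is for each job to work in its own subdirectory and remove it on exit. The following is a minimal sketch, not a site-provided script: it assumes /scratch is writable on the node and falls back to /tmp otherwise, and the names scratch_root and workdir are purely illustrative.

```shell
# Sketch: work in a private directory under /scratch and clean it up on exit.
# Assumption: /scratch is writable on compute nodes; fall back to /tmp otherwise.
set -eu

scratch_root=/scratch
[ -w "$scratch_root" ] || scratch_root=/tmp

# Create a unique working directory for this run.
workdir=$(mktemp -d "$scratch_root/${USER:-job}.XXXXXX")

# Remove the working directory when the script exits, even on error.
trap 'rm -rf "$workdir"' EXIT

# Placeholder for real processing: write an intermediate file.
echo "intermediate result" > "$workdir/output.txt"
```

Any results worth keeping should be copied to non-temporary storage before the script ends, since the trap deletes everything in the working directory.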
Dynamic $TMPDIR settings to facilitate usage of /scratch
If you are using the updated dotfiles, you may have already noticed that $TMPDIR is set dynamically for you, depending
on which drive is available. On login and file-transfer nodes, your $TMPDIR will be /tmp. This is because /data is used
for hosting websites and other data on these nodes, and /scratch is unavailable to discourage accidental data
processing there. If you log in to a compute node with salloc, you should find that your $TMPDIR variable
automatically updates to /scratch.
This happens because the updated configuration files contain this short section of code, which finds a writable
directory and updates $TMPDIR on login:
# Point compilers and System facilities to use the tmpfs for temp files
#
if [ -w "/scratch" ]; then
    TMPDIR=/scratch
elif [ -w "/data" ]; then
    TMPDIR=/data
else
    TMPDIR=/tmp
fi
export TMPDIR
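To check which directory this logic would choose on a given node, the same test can be wrapped in a small function. This is only a sketch: the function name pick_tmpdir is hypothetical and is not part of the site dotfiles.

```shell
# Hypothetical helper (not part of the site dotfiles): print the first
# writable directory among the arguments, defaulting to /tmp.
pick_tmpdir() {
    for dir in "$@"; do
        if [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    printf '%s\n' /tmp
}

# Same preference order as the snippet above: /scratch, then /data.
TMPDIR=$(pick_tmpdir /scratch /data)
export TMPDIR
echo "TMPDIR is $TMPDIR"
```

Running this on a login node and again inside an salloc session should show the different values described above.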
The problem with Slurm batch jobs and $TMPDIR
One difference we have had to work around in moving from SGE to Slurm is how each queuing system inherits from the
submitting environment. SGE, even for batch jobs, would reload a user's environment, so the dynamic $TMPDIR setting
shown above would find the appropriate $TMPDIR on the compute node. Slurm batch jobs submitted using sbatch do not
reload the user's environment; they inherit from the submit environment, where $TMPDIR is set to /tmp.
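For a plain sbatch script, one possible workaround is to repeat the detection at the top of the batch script, so that it runs on the compute node rather than on the login node. This is a sketch under that assumption; the job name and the final command are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=tmpdir-demo   # placeholder job name

# Re-run the detection on the compute node; the value inherited from the
# submitting shell (typically /tmp) is overwritten here.
if [ -w "/scratch" ]; then
    TMPDIR=/scratch
elif [ -w "/data" ]; then
    TMPDIR=/data
else
    TMPDIR=/tmp
fi
export TMPDIR

# Placeholder workload: report which directory was chosen.
echo "Running on $(uname -n) with TMPDIR=$TMPDIR"
```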
A solution, using hqsub
Starting in version 1.5.0, hqsub has an autotmp setting, on by default, that runs the automatic $TMPDIR code shown
above on the compute node, so that $TMPDIR is updated before any data processing occurs. This setting can be disabled
for an individual run using --no-autotmp, or you can set the HQSUB_AUTOTMP environment variable to False or 0 in your
shell config file to disable the feature permanently.
In this way, interactive jobs started with salloc and batch/array jobs started with hqsub will have the same expected
$TMPDIR settings.
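If you do want to opt out permanently, the environment variable described above can be exported from a shell config file. A minimal sketch; the choice of ~/.bashrc is an assumption, so use whichever file your shell actually reads.

```shell
# In ~/.bashrc (assumed location): disable hqsub's automatic $TMPDIR handling.
# Per the text above, either False or 0 turns the feature off.
export HQSUB_AUTOTMP=0
```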
Using the --local-drive option of hqsub
The --local-drive option of hqsub also benefits from the updated autotmp setting, assuming the feature has not been
disabled. You can still specify a non-dynamic prefix for your hqsub --local-drive using the --local-prefix option,
but by default, --local-drive pertask and --local-drive shared will work as expected with the dynamic $TMPDIR updates.