hpcman queue submit¶
The syntax differences between SGE_Batch and SGE_Array have been resolved: hpcman queue submit uses the same syntax regardless of the submitted job type.
Tip
Use hqsub
as an alias for hpcman queue submit
so you don't have to type so much!
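One way to set up the alias (a minimal sketch assuming a bash login shell; tcsh users would use alias hqsub 'hpcman queue submit' instead):
$ echo "alias hqsub='hpcman queue submit'" >> ~/.bashrc
$ source ~/.bashrc
$ hqsub --help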
Comparison of settings¶
Parameter | hpcman queue submit | SGE_Batch | SGE_Array | Note
---|---|---|---|---
command | Positional argument or STDIN | -c 'COMMAND' | -c "FILENAME" or STDIN | -c not used in hpcman
# commands | Any number of commands for both batch and array jobs | One command; no '\n' or ';' supported | One command per task |
processors/threads | -p or -P or --procs | -P | -P | Priority cannot be set in hpcman. Path is set with --path. $NPROCS can be used to set procs in the program as well.
runname | -r, can be auto set with -r '*'; also will be prompted if missing | -r, will be prompted if missing | -r or automatically generated | If the runname exists already, you will be prompted to overwrite. Can use --force to make this happen (like in SGE_Array).
queue | -q | -q | -q | Note: -q is required in hpcman queue submit! Can use -q '*' to use all available queues.
conda/current shell | $PATH is preserved in submit script | $PATH is partially preserved | $PATH is preserved in submit script | You can activate a conda env prior to submission, and it will remain active.
Array jobs | Specifying -t array will convert the job to an array job | Use $SGE_TASK_ID to manually generate an array job | Automatically converts all jobs to array jobs |
Interactive menu | Will prompt for a missing runname or to overwrite an existing one | Fully interactive menu | No interactive menu |
shell | Only bash is supported | bash or tcsh is supported | Only bash is supported |
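For example, since $PATH is preserved in the submit script, a conda environment activated before submission remains active inside the job (a sketch; the env name blast-env and runname are hypothetical):
$ conda activate blast-env
$ hpcman queue submit 'blastp -version' -q micro -r sge.conda-test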
While SGE_Batch and SGE_Array will still be available, future work will focus on making hpcman queue submit work better with the queue, integrating with SGE and Slurm, and integrating with our users' workflows.
Anatomy of a queue submit script¶
Tip
Use the --dry-run option to generate the submit script and directory without executing the qsub command. You can then edit the submit script by hand and submit it yourself, if desired.
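For example (a sketch, assuming the runname sge.blastp-test; the script lands in RUNNAME/RUNNAME.sh as shown below):
$ hpcman queue submit 'blastp -version' -q micro -r sge.blastp-test --dry-run
$ nano sge.blastp-test/sge.blastp-test.sh # edit as desired
$ qsub sge.blastp-test/sge.blastp-test.sh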
$ cat sge.blastp-test/sge.blastp-test.sh
#!/usr/bin/env bash
set -eo pipefail # (1)!
#
# This file generated by hpcman queue submit
#
# Export all environment variables
#$ -V # (2)!
#
# Use current working directory
#$ -cwd
#
# Use bash as the executing shell
#$ -S /bin/bash
#
# Set the job name
#$ -N sge.blastp-test
#
# Set the queue name
#$ -q micro
#
# Output files for stdout and stderr
#$ -o sge.blastp-test
#$ -e sge.blastp-test
#
export NPROCS=8 # (3)!
# Request processors
#$ -pe thread 8
#
# Set the memory limit(s)
#$ -l mem_free=4.0G
#
# Set filesize limit
#$ -l h_fsize=500.0G
#
# Set PATH variable and submit directory
export PATH=/home/cgrb/davised/.config/nvm/versions/node/v16.19.0/bin:`...` # (4)!
#
submitdir=$(pwd)
echo "##hpcman.jobs={'runid':'$JOB_ID','runname':'$JOB_NAME','host':'$(/bin/hostname -s)','wd':'$(pwd)','taskid':'$SGE_TASK_ID'}" >> /dev/stderr # (5)!
echo " Started on: $(/bin/hostname -s)"
echo " Started at: $(/bin/date)"
/usr/bin/time -f "\\tMemory (kb): %M\\n\\t# SWAP (freq): %W\\n\\t# Waits (freq): %w\\n\\tCPU (percent): %P\\n\\tTime (seconds): %e\\n\\tTime (hh:mm:ss.ms): %E\\n\\tSystem CPU Time (seconds): %S\\n\\tUser CPU Time (seconds): %U" \
bash $submitdir/sge.blastp-test/sge.blastp-test.commands # (6)!
echo -e '\tFull Command: bash sge.blastp-test/sge.blastp-test.commands' >> /dev/stderr
echo " Finished at: $(/bin/date)"
1. Script fails if any part of the bash script fails.
2. Does not export the current $PATH! You must set $PATH separately.
3. Use $NPROCS in your commands to sync the number of CPU cores.
4. The full $PATH is saved here to allow conda env activation and other env var modifications to be preserved.
5. Your job submission information is saved here. You can view it in the .e job file (STDERR). In the future, this will also be cached in your home directory.
6. Your command is copied into a new file and run, so that multiple commands can be submitted, even in a batch job.
Examples¶
Batch jobs¶
Tip
Use the --watch
flag to confirm that your job was successfully submitted and started properly.
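The flag can be added to any submission; a minimal sketch, where sleep stands in for a real command:
$ hpcman queue submit 'sleep 30' -q micro -r sge.watch-test --watch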
$ hpcman queue submit 'blastp -db nr -outfmt 7 -query /nfs1/CGRB/databases/test-data/ACE2-Hsapiens.fasta \
-num_threads $NPROCS -max_target_seqs 50 -out ACE2-Hsapiens_vs_nr.tab' \
-q micro \
-p 8 \
-r sge.blastp-test
🎉 Successfully submitted job 159058 sge.blastp-test to queue micro, logging job number, timestamp, and runname to .hpcman.jobnums
$ SGE_Batch -c 'blastp -db nr -outfmt 7 -query /nfs1/CGRB/databases/test-data/ACE2-Hsapiens.fasta \
-num_threads 8 -max_target_seqs 50 -out ACE2-Hsapiens_vs_nr.tab' \
-q micro \
-P 8 \
-r sge.blastp-test-SGE_Batch
* Beginning the Data run
RunID = sge.blastp-test-SGE_Batch
Dir = sge.blastp-test-SGE_Batch
* Your job 159059 ("sge.blastp-test-SGE_Batch") has been submitted
Array jobs¶
I generally recommend using a submit.sh script that uses echo to print the desired commands to stdout, and then piping those commands to hpcman queue submit (or SGE_Array). Here is an example of that:
$ cat submit_blast.sh
#!/usr/bin/env bash
# USAGE (1)
# blastp [-h] [-help] [-import_search_strategy filename]
# [-export_search_strategy filename] [-task task_name] [-db database_name]
# [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
# [-negative_gilist filename] [-negative_seqidlist filename]
# [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
# [-negative_taxidlist filename] [-ipglist filename]
# [-negative_ipglist filename] [-entrez_query entrez_query]
# [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
# [-subject subject_input_file] [-subject_loc range] [-query input_file]
# [-out output_file] [-evalue evalue] [-word_size int_value]
# [-gapopen open_penalty] [-gapextend extend_penalty]
# [-qcov_hsp_perc float_value] [-max_hsps int_value]
# [-xdrop_ungap float_value] [-xdrop_gap float_value]
# [-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
# [-soft_masking soft_masking] [-matrix matrix_name]
# [-threshold float_value] [-culling_limit int_value]
# [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
# [-subject_besthit] [-window_size int_value] [-lcase_masking]
# [-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis]
# [-num_descriptions int_value] [-num_alignments int_value]
# [-line_length line_length] [-html] [-sorthits sort_hits]
# [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
# [-num_threads int_value] [-mt_mode int_value] [-ungapped] [-remote]
# [-comp_based_stats compo] [-use_sw_tback] [-version]
for faa in faa/*; do # (3)!
out=$( basename $faa )_vs_all.tsv
echo blastp -db ./db/all.faa -outfmt 7 -out $out -query $faa -num_threads '$NPROCS' # (2)!
done
1. Put commented usage in here so you can reference it, if needed!
2. Note that you can use '$NPROCS' to get the value substituted at runtime; variables in single quotes will not be resolved by the submitting shell.
3. The faa directory has ~100 fasta format files. We use a bash glob here to iterate through them.
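A quick illustration of the quoting behavior in plain bash:
$ NPROCS=8
$ echo blastp -num_threads '$NPROCS' # single quotes: not expanded here
blastp -num_threads $NPROCS
$ echo blastp -num_threads "$NPROCS" # double quotes: expanded immediately
blastp -num_threads 8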
$ bash ./submit_blast.sh | head
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_HIMB114.faa_vs_all.tsv -query faa/alpha_proteobacterium_HIMB114.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_HIMB59.faa_vs_all.tsv -query faa/alpha_proteobacterium_HIMB59.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_HIMB5.faa_vs_all.tsv -query faa/alpha_proteobacterium_HIMB5.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_MED-G102.faa_vs_all.tsv -query faa/alpha_proteobacterium_MED-G102.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_MED-G103.faa_vs_all.tsv -query faa/alpha_proteobacterium_MED-G103.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_MED-G104.faa_vs_all.tsv -query faa/alpha_proteobacterium_MED-G104.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA240-E13.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA240-E13.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA288-E13.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA288-E13.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA288-G21.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA288-G21.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA288-N07.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA288-N07.faa -num_threads $NPROCS
Looks good! Now, let's submit.
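A minimal version of the submission (a sketch; the full command with the local-drive options appears in the pertask example below) uses '-' to take STDIN as input and -t array to make an array job:
$ bash ./submit_blast.sh | hpcman queue submit - -t array -q fast -p 2 -r sge.blast_array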
Local drives¶
hpcman queue submit
gives options to use the local drive of each machine.
graph LR
A[vaughan] --->|Submit| B[compute node 1]
A[vaughan] --->|Submit| C[compute node 2]
B ---> Bdata[(/data)]
A ---- nfs1[(/nfs1)]
A ---- nfs2[(/nfs2)]
A ---- nfs3[(/nfs3)]
A ---- nfsN[(/nfsN)]
A ---- local[(/local)]
B ---- nfs1[(/nfs1)]
B ---- nfs2[(/nfs2)]
B ---- nfs3[(/nfs3)]
B ---- nfsN[(/nfsN)]
B ---- local[(/local)]
C ---> Cdata[(/data)]
C ---- nfs1[(/nfs1)]
C ---- nfs2[(/nfs2)]
C ---- nfs3[(/nfs3)]
C ---- nfsN[(/nfsN)]
C ---- local[(/local)]
Here are the options for local drives:
╭─ Local drive ──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --local-drive [pertask|shared] Use the local drive for the submitted job. If set to │
│ pertask, uses $TMPDIR (None) as the prefix for the │
│ tempdir (`mktemp -d -p`). If set to shared, uses │
│ $TMPDIR/$USER/$DIRNAME (None/davised/hpcman) as the │
│ tempdir, where $DIRNAME is the name of the current │
│ directory. │
│ [default: None] │
│ --local-prefix PATH Override the default prefix of the tempdir. See the │
│ help for --local-drive. │
│ [default: None] │
│ --mirror-type [link|copy] Type of mirroring to local drive. Only used if │
│ --local-drive is set. │
│ [default: (link)] │
│ --copy-results --no-copy-results Copy final results back to submission directory. │
│ Defaults to True if --local-drive pertask and False │
│ if --local-drive shared. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The best-supported options are --local-drive pertask, --mirror-type link, and --copy-results. Using these three options will generate a unique temp directory for each task on the node that the job lands on, symlink all of the files in the submission directory into the temp directory, and, after the job is complete, copy any newly generated files in that directory back to the submission directory.
I need to do some more testing with the --local-drive shared option, where a single directory is generated on each node. Using --copy-results with --local-drive shared will generally cause problems, so --no-copy-results is the default when --local-drive shared is used. Generally, if you are running multiple tasks on the same node, --local-drive shared lets you copy everything over a single time, e.g. using --mirror-type copy.
I think a better way to handle this option would be to submit a job solely for copying the inputs over, and to hold job submission until that copy job is done. An example of this is below.
What is this, magic?¶
No, it's not magic. Here are the relevant sections of the job submission script that allow this:
$ cat sge.blast_array/sge.blast_array.sh
#!/usr/bin/env bash
...
submitdir=$(pwd)
# Generate local dir per task
mkdir -p /data
workdir=$(mktemp -d -p /data)
cd $workdir
cp -ans $submitdir/* . # (1)!
echo "##hpcman.jobs={'runid':'$JOB_ID','runname':'$JOB_NAME','host':'$(/bin/hostname -s)','wd':'$(pwd)','taskid':'$SGE_TASK_ID'}" >> /dev/stderr
echo " Started on: $(/bin/hostname -s)"
echo " Started at: $(/bin/date)"
arraycmd=$(sed "$SGE_TASK_ID q;d" $submitdir/sge.blast_array/sge.blast_array.commands)
echo "#!/usr/bin/env bash" > $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
echo $arraycmd >> $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
chmod u+x $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
/usr/bin/time -f "\\tMemory (kb): %M\\n\\t# SWAP (freq): %W\\n\\t# Waits (freq): %w\\n\\tCPU (percent): %P\\n\\tTime (seconds): %e\\n\\tTime (hh:mm:ss.ms): %E\\n\\tSystem CPU Time (seconds): %S\\n\\tUser CPU Time (seconds): %U" \
bash $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
echo -e "\tFull Command: sge.blast_array/sge.blast_array.command.$SGE_TASK_ID" >> /dev/stderr
if [ ! -z $workdir ]; then
echo "copying results from $workdir to $submitdir and replacing with symlinks" >> /dev/stderr
rsync --ignore-existing --remove-source-files -av $workdir/* $submitdir/ >> /dev/stderr # (2)!
# Generate symlinks, but send the error message about existing files to /dev/null and return true.
cp -ans $submitdir/* . 2> /dev/null || true # (3)!
fi
echo " Finished at: $(/bin/date)"
1. Makes a symlink for each file in $submitdir.
2. Copies the results from the $workdir to $submitdir.
3. Replaces the results with symlinks after copying them, cleaning up the local space.
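The sed "$SGE_TASK_ID q;d" idiom is what selects one command per task: sed quits (and prints) at line $SGE_TASK_ID and deletes every other line. A quick demonstration outside the queue:
$ printf 'echo task one\necho task two\necho task three\n' > demo.commands
$ SGE_TASK_ID=2
$ sed "$SGE_TASK_ID q;d" demo.commands
echo task two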
Example using --local-drive pertask and --mirror-type link¶
This type of job will generate a unique temp directory for each task.
Here is a grep
output showing each unique temp directory:
$ bash ./submit_blast.sh | hpcman queue submit - -t array -q fast -p 2 --local-drive pertask --watch -r sge.blast_array
$ grep hpcman.jobs sge.blast_array/*.e* | head
sge.blast_array/sge.blast_array.e157846.1:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.Vdc18ZXTAB','taskid':'1'}
sge.blast_array/sge.blast_array.e157846.10:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.50cgW7bSp3','taskid':'10'}
sge.blast_array/sge.blast_array.e157846.100:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.eSn3y6mBHJ','taskid':'100'}
sge.blast_array/sge.blast_array.e157846.11:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.ZNWNxzDG2c','taskid':'11'}
sge.blast_array/sge.blast_array.e157846.12:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.4wnuSjFkQi','taskid':'12'}
sge.blast_array/sge.blast_array.e157846.13:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.lLu9KqvzDZ','taskid':'13'}
sge.blast_array/sge.blast_array.e157846.14:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.m9w6JdBIiY','taskid':'14'}
sge.blast_array/sge.blast_array.e157846.15:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.ydxZZDGpAR','taskid':'15'}
sge.blast_array/sge.blast_array.e157846.16:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.BV8M0NvrUH','taskid':'16'}
sge.blast_array/sge.blast_array.e157846.17:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.qGXSQ9Ltbi','taskid':'17'}
When each task completes, it copies the results back to the submit directory using rsync:
$ ls *.tsv | head
alpha_proteobacterium_HIMB114.faa_vs_all.tsv
alpha_proteobacterium_HIMB5.faa_vs_all.tsv
alpha_proteobacterium_HIMB59.faa_vs_all.tsv
alpha_proteobacterium_MED-G102.faa_vs_all.tsv
alpha_proteobacterium_MED-G103.faa_vs_all.tsv
alpha_proteobacterium_MED-G104.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA240-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-G21.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-N07.faa_vs_all.tsv
Example using --local-drive shared and --mirror-type copy¶
This will copy the inputs to the specified directory:
$ hpcman queue submit 'echo Files copied' --local-drive shared \
--mirror-type copy \
-q fast@chrom1 \ # (1)!
--local-prefix /data/davised/local-drive-shared \
-r sge.copy_chrom1
1. Specifying -q QUEUE@NODE for this option makes sense, to control which node the job ends up on. If you submit to a queue without specifying the node, you'll have to check which node the copy job lands on, and specify only that node for the processing jobs.
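If you did submit without pinning a node, you can check where the copy job landed with standard SGE tools before submitting the processing jobs:
$ qstat -u $USER # the queue column shows the instance, e.g. fast@chrom1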
Then, we can specify the processing job while holding for the copy job to finish:
$ bash ./submit_blast.sh | hpcman queue submit - -t array \
-p 2 \
--watch \
--local-drive shared \
--mirror-type copy \
-q fast@chrom1 \ # (2)!
--local-prefix /data/davised/local-drive-shared \
-r sge.blast_chrom1 \
--hold-auto # (1)!
1. This will hold the processing job until the copy is finished.
2. Make sure this matches the QUEUE@NODE from the copy!
Then, we can queue up a job to copy the results off the local space:
$ hpcman queue submit 'echo Results copied' --local-drive shared \
--mirror-type copy \
-q fast@chrom1 \
--local-prefix /data/davised/local-drive-shared \
--copy-results \ # (1)!
-r sge.copy_results_chrom1 \
--hold-auto
1. Make sure you supply --copy-results, because it is disabled by default when using --local-drive shared.
Let's take a look on chrom1
to see if the jobs are actually running in the specified directory.
$ qrsh -q fast@chrom1
$ cd /data/davised/local-drive-shared
$ ls *.tsv | head
alpha_proteobacterium_HIMB114.faa_vs_all.tsv
alpha_proteobacterium_HIMB5.faa_vs_all.tsv
alpha_proteobacterium_HIMB59.faa_vs_all.tsv
alpha_proteobacterium_MED-G102.faa_vs_all.tsv
alpha_proteobacterium_MED-G103.faa_vs_all.tsv
alpha_proteobacterium_MED-G104.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA240-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-G21.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-N07.faa_vs_all.tsv
Looks good! This mode will be useful for jobs that need to read and write a lot to the local /data
drives.