hpcman queue submit¶
The syntax differences between SGE_Batch and SGE_Array have been resolved: hpcman queue submit uses the same syntax regardless of the submitted job type.
Tip
Use hqsub
as an alias for hpcman queue submit
so you don't have to type so much!
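One way to set up the alias (a minimal sketch assuming a bash login shell; tcsh users would use alias hqsub 'hpcman queue submit' instead):
$ echo "alias hqsub='hpcman queue submit'" >> ~/.bashrc
$ source ~/.bashrc
$ hqsub --help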
Comparison of settings¶
Parameter | hpcman queue submit | SGE_Batch | SGE_Array | Note
---|---|---|---|---
command | Positional argument or STDIN | -c 'COMMAND' | -c "FILENAME" or STDIN | -c not used in hpcman
# commands | Any number of commands for both batch and array jobs | One command; no '\n' or ';' supported | One command per task |
processors/threads | -p or -P or --procs | -P | -P | Priority cannot be set in hpcman. Path is set with --path. $NPROCS can be used to set procs in the program as well.
runname | -r, can be auto set with -r '*'; also will be prompted if missing | -r, will be prompted if missing | -r or automatically generated | If the runname exists already, you will be prompted to overwrite. Can use --force to make this happen (like in SGE_Array).
queue | -q | -q | -q | Note: -q is required in hpcman queue submit! Can use -q '*' to use all available queues.
conda/current shell | $PATH is preserved in submit script | $PATH is partially preserved | $PATH is preserved in submit script | You can activate a conda env prior to submission, and it will remain active.
Array jobs | Specifying -t array will convert the job to an array job | Use $SGE_TASK_ID to manually generate an array job | Automatically converts all jobs to array jobs |
Interactive menu | Will prompt for a missing runname or to overwrite an existing one | Fully interactive menu | No interactive menu |
shell | Only bash is supported | bash or tcsh is supported | Only bash is supported |
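For example, since $PATH is preserved in the submit script, a conda environment activated before submission remains active inside the job (a sketch; the env name blast-env and runname are hypothetical):
$ conda activate blast-env
$ hpcman queue submit 'blastp -version' -q micro -r sge.conda-test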
While SGE_Batch and SGE_Array will still be available, future work will focus on making hpcman queue submit work better with the queue, integrating with SGE and Slurm, and integrating with our users' workflows.
Anatomy of a queue submit script¶
Tip
Use the --dry-run option to generate the submit script and directory without executing the qsub command. You can then edit the submit script by hand and submit it yourself, if desired.
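For example (a sketch, assuming the runname sge.blastp-test; the script lands in RUNNAME/RUNNAME.sh as shown below):
$ hpcman queue submit 'blastp -version' -q micro -r sge.blastp-test --dry-run
$ nano sge.blastp-test/sge.blastp-test.sh # edit as desired
$ qsub sge.blastp-test/sge.blastp-test.sh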
$ cat sge.blastp-test/sge.blastp-test.sh
#!/usr/bin/env bash
set -eo pipefail # (1)!
#
# This file generated by hpcman queue submit
#
# Export all environment variables
#$ -V # (2)!
#
# Use current working directory
#$ -cwd
#
# Use bash as the executing shell
#$ -S /bin/bash
#
# Set the job name
#$ -N sge.blastp-test
#
# Set the queue name
#$ -q micro
#
# Output files for stdout and stderr
#$ -o sge.blastp-test
#$ -e sge.blastp-test
#
export NPROCS=8 # (3)!
# Request processors
#$ -pe thread 8
#
# Set the memory limit(s)
#$ -l mem_free=4.0G
#
# Set filesize limit
#$ -l h_fsize=500.0G
#
# Set PATH variable and submit directory
export PATH=/home/cgrb/davised/.config/nvm/versions/node/v16.19.0/bin:`...` # (4)!
#
submitdir=$(pwd)
echo "##hpcman.jobs={'runid':'$JOB_ID','runname':'$JOB_NAME','host':'$(/bin/hostname -s)','wd':'$(pwd)','taskid':'$SGE_TASK_ID'}" >> /dev/stderr # (5)!
echo " Started on: $(/bin/hostname -s)"
echo " Started at: $(/bin/date)"
/usr/bin/time -f "\\tMemory (kb): %M\\n\\t# SWAP (freq): %W\\n\\t# Waits (freq): %w\\n\\tCPU (percent): %P\\n\\tTime (seconds): %e\\n\\tTime (hh:mm:ss.ms): %E\\n\\tSystem CPU Time (seconds): %S\\n\\tUser CPU Time (seconds): %U" \
bash $submitdir/sge.blastp-test/sge.blastp-test.commands # (6)!
echo -e '\tFull Command: bash sge.blastp-test/sge.blastp-test.commands' >> /dev/stderr
echo " Finished at: $(/bin/date)"
1. Script fails if any part of the bash script fails.
2. Does not export the current $PATH! You must set $PATH separately.
3. Use $NPROCS in your commands to sync the number of CPU cores.
4. The full $PATH is saved here to allow conda env activation and other env var modifications to be preserved.
5. Your job submission information is saved here. You can view it in the .e job file (STDERR). In the future, this will also be cached in your home directory.
6. Your command is copied into a new file and run, so that multiple commands can be submitted, even in a batch job.
Examples¶
Batch jobs¶
Tip
Use the --watch
flag to confirm that your job was successfully submitted and started properly.
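The flag can be added to any submission; a minimal sketch, where sleep stands in for a real command:
$ hpcman queue submit 'sleep 30' -q micro -r sge.watch-test --watch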
$ hpcman queue submit 'blastp -db nr -outfmt 7 -query /nfs1/CGRB/databases/test-data/ACE2-Hsapiens.fasta \
-num_threads $NPROCS -max_target_seqs 50 -out ACE2-Hsapiens_vs_nr.tab' \
-q micro \
-p 8 \
-r sge.blastp-test
🎉 Successfully submitted job 159058 sge.blastp-test to queue micro, logging job number, timestamp, and runname to .hpcman.jobnums
$ SGE_Batch -c 'blastp -db nr -outfmt 7 -query /nfs1/CGRB/databases/test-data/ACE2-Hsapiens.fasta \
-num_threads 8 -max_target_seqs 50 -out ACE2-Hsapiens_vs_nr.tab' \
-q micro \
-P 8 \
-r sge.blastp-test-SGE_Batch
* Beginning the Data run
RunID = sge.blastp-test-SGE_Batch
Dir = sge.blastp-test-SGE_Batch
* Your job 159059 ("sge.blastp-test-SGE_Batch") has been submitted
Array jobs¶
I generally recommend using a submit.sh script that uses echo to print the desired commands to stdout, and then piping those commands to hpcman queue submit (or SGE_Array). Here is an example of that:
$ cat submit_blast.sh
#!/usr/bin/env bash
# USAGE (1)
# blastp [-h] [-help] [-import_search_strategy filename]
# [-export_search_strategy filename] [-task task_name] [-db database_name]
# [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
# [-negative_gilist filename] [-negative_seqidlist filename]
# [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
# [-negative_taxidlist filename] [-ipglist filename]
# [-negative_ipglist filename] [-entrez_query entrez_query]
# [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
# [-subject subject_input_file] [-subject_loc range] [-query input_file]
# [-out output_file] [-evalue evalue] [-word_size int_value]
# [-gapopen open_penalty] [-gapextend extend_penalty]
# [-qcov_hsp_perc float_value] [-max_hsps int_value]
# [-xdrop_ungap float_value] [-xdrop_gap float_value]
# [-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
# [-soft_masking soft_masking] [-matrix matrix_name]
# [-threshold float_value] [-culling_limit int_value]
# [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
# [-subject_besthit] [-window_size int_value] [-lcase_masking]
# [-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis]
# [-num_descriptions int_value] [-num_alignments int_value]
# [-line_length line_length] [-html] [-sorthits sort_hits]
# [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
# [-num_threads int_value] [-mt_mode int_value] [-ungapped] [-remote]
# [-comp_based_stats compo] [-use_sw_tback] [-version]
for faa in faa/*; do # (3)!
out=$( basename $faa )_vs_all.tsv
echo blastp -db ./db/all.faa -outfmt 7 -out $out -query $faa -num_threads '$NPROCS' # (2)!
done
1. Put commented usage in here so you can reference it, if needed!
2. Note that you can use '$NPROCS' to get the value substituted at runtime; variables in single quotes will not be resolved by the submitting shell.
3. The faa directory has ~100 fasta format files. We use a bash glob here to iterate through them.
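A quick illustration of the quoting behavior in plain bash:
$ NPROCS=8
$ echo blastp -num_threads '$NPROCS' # single quotes: not expanded here
blastp -num_threads $NPROCS
$ echo blastp -num_threads "$NPROCS" # double quotes: expanded immediately
blastp -num_threads 8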
$ bash ./submit_blast.sh | head
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_HIMB114.faa_vs_all.tsv -query faa/alpha_proteobacterium_HIMB114.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_HIMB59.faa_vs_all.tsv -query faa/alpha_proteobacterium_HIMB59.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_HIMB5.faa_vs_all.tsv -query faa/alpha_proteobacterium_HIMB5.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_MED-G102.faa_vs_all.tsv -query faa/alpha_proteobacterium_MED-G102.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_MED-G103.faa_vs_all.tsv -query faa/alpha_proteobacterium_MED-G103.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_MED-G104.faa_vs_all.tsv -query faa/alpha_proteobacterium_MED-G104.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA240-E13.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA240-E13.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA288-E13.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA288-E13.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA288-G21.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA288-G21.faa -num_threads $NPROCS
blastp -db ./db/all.faa -outfmt 7 -out alpha_proteobacterium_SCGC_AAA288-N07.faa_vs_all.tsv -query faa/alpha_proteobacterium_SCGC_AAA288-N07.faa -num_threads $NPROCS
Looks good! Now, let's submit.
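A minimal version of the submission (a sketch; the full command with the local-drive options appears in the pertask example below) uses '-' to take STDIN as input and -t array to make an array job:
$ bash ./submit_blast.sh | hpcman queue submit - -t array -q fast -p 2 -r sge.blast_array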
Local drives¶
hpcman queue submit
gives options to use the local drive of each machine.
graph LR
A[vaughan] --->|Submit| B[compute node 1]
A[vaughan] --->|Submit| C[compute node 2]
B ---> Bdata[(/data)]
A ---- nfs1[(/nfs1)]
A ---- nfs2[(/nfs2)]
A ---- nfs3[(/nfs3)]
A ---- nfsN[(/nfsN)]
A ---- local[(/local)]
B ---- nfs1[(/nfs1)]
B ---- nfs2[(/nfs2)]
B ---- nfs3[(/nfs3)]
B ---- nfsN[(/nfsN)]
B ---- local[(/local)]
C ---> Cdata[(/data)]
C ---- nfs1[(/nfs1)]
C ---- nfs2[(/nfs2)]
C ---- nfs3[(/nfs3)]
C ---- nfsN[(/nfsN)]
C ---- local[(/local)]
Here are the options for local drives:
╭─ Local drive ──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --local-drive [pertask|shared] Use the local drive for the submitted job. If set to │
│ pertask, uses $TMPDIR (None) as the prefix for the │
│ tempdir (`mktemp -d -p`). If set to shared, uses │
│ $TMPDIR/$USER/$DIRNAME (None/davised/hpcman) as the │
│ tempdir, where $DIRNAME is the name of the current │
│ directory. │
│ [default: None] │
│ --local-prefix PATH Override the default prefix of the tempdir. See the │
│ help for --local-drive. │
│ [default: None] │
│ --mirror-type [link|copy] Type of mirroring to local drive. Only used if │
│ --local-drive is set. │
│ [default: (link)] │
│ --copy-results --no-copy-results Copy final results back to submission directory. │
│ Defaults to True if --local-drive pertask and False │
│ if --local-drive shared. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The best-supported options are --local-drive pertask, --mirror-type link, and --copy-results. Using these three options will generate a unique temp directory for each task on the node that the job lands on, symlink all of the files in the submission directory into the temp directory, and, after the job is complete, copy any newly generated files in that directory back to the submission directory.
I need to do some more testing with the --local-drive shared option, where a single directory is generated on each node. Using --copy-results with --local-drive shared will generally cause problems, so --no-copy-results is the default when --local-drive shared is used. Generally, if you are running multiple tasks on the same node, --local-drive shared lets you copy everything over a single time, e.g. using --mirror-type copy.
I think a better way to handle this option would be to submit a job solely for copying the inputs over, and to hold job submission until that copy job is done. An example of this is below.
What is this, magic?¶
No, it's not magic. Here are the relevant sections of the job submission script that allow this:
$ cat sge.blast_array/sge.blast_array.sh
#!/usr/bin/env bash
...
submitdir=$(pwd)
# Generate local dir per task
mkdir -p /data
workdir=$(mktemp -d -p /data)
cd $workdir
cp -ans $submitdir/* . # (1)!
echo "##hpcman.jobs={'runid':'$JOB_ID','runname':'$JOB_NAME','host':'$(/bin/hostname -s)','wd':'$(pwd)','taskid':'$SGE_TASK_ID'}" >> /dev/stderr
echo " Started on: $(/bin/hostname -s)"
echo " Started at: $(/bin/date)"
arraycmd=$(sed "$SGE_TASK_ID q;d" $submitdir/sge.blast_array/sge.blast_array.commands)
echo "#!/usr/bin/env bash" > $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
echo $arraycmd >> $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
chmod u+x $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
/usr/bin/time -f "\\tMemory (kb): %M\\n\\t# SWAP (freq): %W\\n\\t# Waits (freq): %w\\n\\tCPU (percent): %P\\n\\tTime (seconds): %e\\n\\tTime (hh:mm:ss.ms): %E\\n\\tSystem CPU Time (seconds): %S\\n\\tUser CPU Time (seconds): %U" \
bash $submitdir/sge.blast_array/sge.blast_array.command.$SGE_TASK_ID
echo -e "\tFull Command: sge.blast_array/sge.blast_array.command.$SGE_TASK_ID" >> /dev/stderr
if [ ! -z $workdir ]; then
echo "copying results from $workdir to $submitdir and replacing with symlinks" >> /dev/stderr
rsync --ignore-existing --remove-source-files -av $workdir/* $submitdir/ >> /dev/stderr # (2)!
# Generate symlinks, but send the error message about existing files to /dev/null and return true.
cp -ans $submitdir/* . 2> /dev/null || true # (3)!
fi
echo " Finished at: $(/bin/date)"
1. Makes a symlink for each file in $submitdir.
2. Copies the results from the $workdir to $submitdir.
3. Replaces the results with symlinks after copying them, cleaning up the local space.
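The sed "$SGE_TASK_ID q;d" idiom is what selects one command per task: sed quits (and prints) at line $SGE_TASK_ID and deletes every other line. A quick demonstration outside the queue:
$ printf 'echo task one\necho task two\necho task three\n' > demo.commands
$ SGE_TASK_ID=2
$ sed "$SGE_TASK_ID q;d" demo.commands
echo task two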
Example using --local-drive pertask and --mirror-type link¶
This type of job will generate a unique temp directory for each task.
Here is a grep
output showing each unique temp directory:
$ bash ./submit_blast.sh | hpcman queue submit - -t array -q fast -p 2 --local-drive pertask --watch -r sge.blast_array
$ grep hpcman.jobs sge.blast_array/*.e* | head
sge.blast_array/sge.blast_array.e157846.1:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.Vdc18ZXTAB','taskid':'1'}
sge.blast_array/sge.blast_array.e157846.10:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.50cgW7bSp3','taskid':'10'}
sge.blast_array/sge.blast_array.e157846.100:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.eSn3y6mBHJ','taskid':'100'}
sge.blast_array/sge.blast_array.e157846.11:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.ZNWNxzDG2c','taskid':'11'}
sge.blast_array/sge.blast_array.e157846.12:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.4wnuSjFkQi','taskid':'12'}
sge.blast_array/sge.blast_array.e157846.13:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.lLu9KqvzDZ','taskid':'13'}
sge.blast_array/sge.blast_array.e157846.14:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.m9w6JdBIiY','taskid':'14'}
sge.blast_array/sge.blast_array.e157846.15:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.ydxZZDGpAR','taskid':'15'}
sge.blast_array/sge.blast_array.e157846.16:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.BV8M0NvrUH','taskid':'16'}
sge.blast_array/sge.blast_array.e157846.17:##hpcman.jobs={'runid':'157846','runname':'sge.blast_array','host':'chrom1','wd':'/data/tmp.qGXSQ9Ltbi','taskid':'17'}
When each task completes, it copies the results back to the submit directory using rsync:
$ ls *.tsv | head
alpha_proteobacterium_HIMB114.faa_vs_all.tsv
alpha_proteobacterium_HIMB5.faa_vs_all.tsv
alpha_proteobacterium_HIMB59.faa_vs_all.tsv
alpha_proteobacterium_MED-G102.faa_vs_all.tsv
alpha_proteobacterium_MED-G103.faa_vs_all.tsv
alpha_proteobacterium_MED-G104.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA240-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-G21.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-N07.faa_vs_all.tsv
Example using --local-drive shared and --mirror-type copy¶
This will copy the inputs to the specified directory:
$ hpcman queue submit 'echo Files copied' --local-drive shared \
--mirror-type copy \
-q fast@chrom1 \ # (1)!
--local-prefix /data/davised/local-drive-shared \
-r sge.copy_chrom1
1. Specifying -q QUEUE@NODE for this option makes sense, to control which node the job ends up on. If you submit to a queue without specifying the node, you'll have to check which node the copy job lands on, and specify only that node for the processing jobs.
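If you did submit without pinning a node, you can check where the copy job landed with standard SGE tools before submitting the processing jobs:
$ qstat -u $USER # the queue column shows the instance, e.g. fast@chrom1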
Then, we can specify the processing job while holding for the copy job to finish:
$ bash ./submit_blast.sh | hpcman queue submit - -t array \
-p 2 \
--watch \
--local-drive shared \
--mirror-type copy \
-q fast@chrom1 \ # (2)!
--local-prefix /data/davised/local-drive-shared \
-r sge.blast_chrom1 \
--hold-auto # (1)!
1. This will hold the processing job until the copy is finished.
2. Make sure this matches the QUEUE@NODE from the copy!
Then, we can queue up a job to copy the results off the local space:
$ hpcman queue submit 'echo Results copied' --local-drive shared \
--mirror-type copy \
-q fast@chrom1 \
--local-prefix /data/davised/local-drive-shared \
--copy-results \ # (1)!
-r sge.copy_results_chrom1 \
--hold-auto
1. Make sure you supply --copy-results, because it is disabled by default when using --local-drive shared.
Let's take a look on chrom1
to see if the jobs are actually running in the specified directory.
$ qrsh -q fast@chrom1
$ cd /data/davised/local-drive-shared
$ ls *.tsv | head
alpha_proteobacterium_HIMB114.faa_vs_all.tsv
alpha_proteobacterium_HIMB5.faa_vs_all.tsv
alpha_proteobacterium_HIMB59.faa_vs_all.tsv
alpha_proteobacterium_MED-G102.faa_vs_all.tsv
alpha_proteobacterium_MED-G103.faa_vs_all.tsv
alpha_proteobacterium_MED-G104.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA240-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-E13.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-G21.faa_vs_all.tsv
alpha_proteobacterium_SCGC_AAA288-N07.faa_vs_all.tsv
Looks good! This mode will be useful for jobs that need to read and write a lot to the local /data
drives.