New User 'onboarding' / FAQs¶
Assumptions¶
You have experience using the command line; if not, see Training below.
Accounts¶
How do I log in? (SSH)¶
When using macOS or Linux-based operating systems, you should have access to a terminal and the ssh program that will let you connect to the infrastructure. On Windows, you will need to use the Windows Subsystem for Linux (WSL) or PuTTY.
What server do I login to?¶
You should log in to shell-hpc.cqls.oregonstate.edu using the information provided upon signing up. To eliminate the need to use Duo, you can set up SSH keys that help confirm your identity.
How do I change my password?¶
The passwd program allows you to change your password. You’ll have to enter your current (even temporary) password before entering your new password twice. If you have SSH keys set up, entering your password to log in is not required. Your password is still useful for signing in to https://gitlab.cqls.oregonstate.edu
What if I forget my password?¶
You have to submit a support ticket at https://shell.cqls.oregonstate.edu/support/
What am I allowed to do on the shell server?¶
On shell-hpc.cqls.oregonstate.edu, you can edit text files, submit jobs to our queuing system (Slurm and/or SGE), and do basic text processing. Jobs requiring lots of processors and/or memory will be killed.
What am I NOT allowed to do on the shell server? Why?¶
Most processing jobs will be killed. This is so that everyone has equal access for logging in to shell-hpc.cqls and so that processing jobs do not slow down the shell machine. If all processors on the shell-hpc.cqls machine were used, users would have a difficult time logging in, and currently logged-in users would have difficulty submitting jobs to the queuing system. If your command on shell-hpc.cqls gets killed, please submit the job using hqsub, or check out an interactive node using salloc.
What is the default shell?¶
The default shell is tcsh for legacy users. Users can request a change of default shell to bash by submitting a ticket.
New users will be set up with bash shell.
What is a shell?¶
The shell is a command-line interface between you (the user) and the computer or server. The shell interprets what you type as commands so that the computer or server understands what you want to do.
Do I have a quota for my $HOME directory?¶
Users have a 75GB quota for their home directories.
How do I check my quota?¶
Use the quota -s command to see your current usage and quota.
What do I do if I go over my quota?¶
If you exceed your quota, you will need to remove files until you are under the 75GB limit.
What should I store in my $HOME directory?¶
Minimal configuration and other files should be stored in your $HOME directory. All processing should be done on networked filesystem drives and the local /scratch drives of the processing machines.
How do I edit my $PATH variable and save it across log-ins?¶
The exact changes you need to make depend on your $SHELL (either bash, tcsh, or zsh). I suggest making the change temporarily first, and then changing your configuration file (~/.bashrc for bash, ~/.cshrc for tcsh, and ~/.zshrc for zsh).
You'll need the full path to the directory that contains the program(s) you want to add to your $PATH. To temporarily add the programs to your $PATH, run the appropriate command below (export for bash, setenv for tcsh):
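# bash/zsh (the directory path below is a placeholder; use your own)
export PATH="/full/path/to/programs:${PATH}"
# tcsh
setenv PATH /full/path/to/programs:${PATH}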
After you test out the new command, you can add those lines to your config file:
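# append (>>) so you don't overwrite the file; single quotes keep ${PATH} literal
echo 'export PATH="/full/path/to/programs:${PATH}"' >> ~/.bashrc
# tcsh equivalent
echo 'setenv PATH /full/path/to/programs:${PATH}' >> ~/.cshrc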
Keys to doing this properly are:
- Make sure to use the >> append redirect so you don't overwrite your config (> will overwrite the file).
- Make sure to use single quotes ' for your echo command, otherwise the ${PATH} variable will get expanded unnecessarily.
You can also just edit the appropriate file with a text editor (vim, emacs, nano) as you feel comfortable.
After you add the changes to your config files, the updated $PATH will get loaded on every new log-in shell.
File Transfers¶
What server should I use to transfer files via SFTP/SCP?¶
You must use files-hpc.cqls.oregonstate.edu for file transfers. File transfers are disabled on shell-hpc.cqls.oregonstate.edu.
What is SFTP?¶
SFTP stands for Secure File Transfer Protocol. SFTP allows for file transfers, both to and from the infrastructure, using the same security provided by ssh.
What is SCP?¶
SCP stands for Secure Copy Protocol. The scp program allows secure file transfers between the infrastructure and your own computer.
Can I use FTP?¶
FTP (File Transfer Protocol) is insecure and CQLS servers will not host files over FTP. However, you can use the ftp program from files-hpc.cqls.oregonstate.edu to transfer files externally, e.g. to NCBI, if necessary.
Why can’t I transfer files via SFTP to shell-hpc.cqls.oregonstate.edu?¶
The shell-hpc.cqls.oregonstate.edu machine should be used for submitting processing jobs only, not file transfers.
Can I transfer files using a Windows drive share to the infrastructure?¶
TBD
I want to publish data via the web, how can I do this?¶
TBD
Can I access my files via the web?¶
TBD
Storage¶
What is ZFS / NFS?¶
ZFS is a file system and volume management technology that scales indefinitely and emphasizes zero data loss. We use ZFS on our networked file system (NFS) drives.
What is DFS / Quobyte?¶
DFS is a distributed file system. Its use has been discontinued at the CQLS.
What is stored on NFS, what should I be using it for?¶
All data and outputs should be stored on the NFS. We recommend using the /scratch drives, which are specific to each compute node, for doing the analysis and then copying the results back onto an NFS location.
Does my lab have NFS space?¶
You may have access to NFS; you will need to ask the post-doc/professor in your lab who will know.
I accidentally deleted a file; are my files backed up?¶
The $HOME directory is backed up. Each NFS location may or may not be backed up, depending on whether your lab pays for storage backup. Contact Support if you need to start backups on your space or if you require recovery.
What is tape backup?¶
Tape backup is long-term storage backup for recovery after some disaster. All sequencing runs are copied to tape prior to deletion. Each lab is still responsible for copying and maintaining raw sequencing data; tape backup is used for emergencies only and is not guaranteed.
How do I get more space added to our NFS space?¶
Please contact Support to purchase more NFS disk space.
Batch Processing¶
What is batch processing?¶
A batch process is one that can run without human interaction. When we submit processes in a non-interactive mode on the infrastructure, we are submitting batch processes.
What is SGE?¶
SGE is Son of Grid Engine, which is a queuing system. SGE allows us to submit batch jobs to different compute nodes across the infrastructure, such that each job runs when resources permit.
What is Slurm?¶
Slurm is a newer queuing system that has (mostly) replaced SGE at the CQLS.
What is a Slurm partition (SGE queue)?¶
We have different partitions (aka queues) available on the infrastructure so that each lab may have different resources available at any given time.
What partitions are available to me?¶
Not all labs/colleges have access to the same resources. You can see which resources are available to you by running hqavail.
How do I submit jobs?¶
There are multiple ways to submit jobs. A single job can be run with the hqsub command. More information about queueing systems can be found here.
How do I check out a compute node for interactive use?¶
Use the salloc command.
What is Slurm (or SGE) Array?¶
hqsub allows easy submission of array jobs, which are most commonly used when a user has a command that they want to run on many individual inputs (10s-1000s). Instead of submitting hundreds or thousands of jobs, you can submit a single array job and control how many tasks are running at once.
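As a generic illustration with plain Slurm (not the hqsub interface; the program and file names are placeholders), an array of 100 tasks with at most 10 running at once might look like:
# %10 caps the number of simultaneously running tasks at 10
sbatch --array=1-100%10 --wrap 'my_program input_${SLURM_ARRAY_TASK_ID}.fa'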
How do I check the status of jobs?¶
The hqstat command allows you to see which jobs are running. You can look at a single job by running scontrol show job $JOBID.
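The standard Slurm commands work as well; for example (the job ID below is a placeholder):
# list your own queued/running jobs
squeue -u $USER
# detailed information for one job
scontrol show job 123456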
How do I kill jobs?¶
You kill jobs with the scancel $JOBID command.
How can I use multiple CPUs?¶
Make sure your program has the capability of using multiple CPUs; examine the help file for your program for more details. To check out multiple CPUs using hqsub, use the -p flag and provide the number of CPUs you want to check out. You can use the $NPROCS variable in your hqsub commands, e.g. blastp -num_threads $NPROCS, to make sure they are synced.
For an interactive job, you can salloc -c $NUM where $NUM is the number of CPUs to check out.
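For example, a sketch requesting 8 CPUs (the query and database names are placeholders):
# batch: -p requests 8 CPUs and hqsub sets $NPROCS to match
hqsub -p 8 'blastp -query proteins.fa -db mydb -num_threads $NPROCS -out hits.tsv'
# interactive: check out 8 CPUs on a compute node
salloc -c 8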
How do I check out GPUs?¶
See which partitions have GPUs available to you (hqavail --gpu) and submit a job to the appropriate partition. You can use hqsub --opts='--gpus=$NUM' where $NUM is the number of GPUs you want to reserve.
Note
A built-in GPU flag will be added to hqsub after some testing.
For an interactive job, you can salloc -p $GPU_QUEUE --gpus $NUM.
How do I know how long my bioinformatics job will run?¶
We suggest running a small test dataset through your pipeline(s) to determine an expected amount of processing time/resource utilization. The user is responsible for ensuring that their jobs are only using the amount of resources requested in the queuing system. Please monitor your jobs (you can salloc to the same node that your job is running on to check the health of the machine using e.g. htop) to ensure everything is going as requested.
How does my lab obtain more processing resources?¶
Most colleges have access to shared computing resources; if you are a member of a college where you think you should have access to machines and they are not available, please submit a support request. If you or your college does not have resources available, we have machines available to rent for up to 6 months. If you have needs beyond that, you can email Support to discuss other options, including current costs of purchasing machines.
Another lab has asked me to collaborate with them, but I cannot access their files or compute resources, what do I do?¶
Email Support and cc the appropriate collaborators to get access to their files.
Support Tickets¶
How do I follow up to obtain further support?¶
To request general support, use the support form. Please use the 'cgrb-support' option for general questions and the 'cgrb-software' option for software requests.
How do I accurately describe my issue?¶
Please provide information regarding:
- What machine you are having an issue with (use qstat to see the node)
- What software you are trying to run
- What your expected output is, and what the observed output is
- A copy/paste of any error messages you may have
- How to reproduce the issue
- What you may have tried to resolve the issue
- Any links to the software or software help pages that might help
If you are submitting a software install request, please provide a link to the GitHub page or other source material.
How do I check on the status of my support ticket?¶
Please follow up by emailing Support for support requests, and Ed Davis for software requests.
Training¶
How can I learn more about using the command line?¶
We offer 'Intro to Unix/Linux' and 'Command-line data analysis' courses - see the Workshops page for more information. We also offer one-on-one training for an hourly fee; please email the bioinformatics team.
Software¶
Conda¶
How do I get conda set up?¶
Information about conda can be found here.
How can I use pixi?¶
See information about configuring here and using pixi here.
Why is my pixi saying 'Set PIXI_CACHE_DIR to continue.'?¶
You probably need to set up your configuration.
Tip
Make sure you exec $SHELL to reload your shell and enable the settings.
If you see a directory name when you echo $PIXI_CACHE_DIR, pixi should be working.
If you don't want to use hpcman to automatically manage your pixi cache dir, you can manually set it to a directory you own on an NFS drive to continue, e.g. export PIXI_CACHE_DIR=/nfs/dept/lab/user/opt/pixi/cache.
We require PIXI_CACHE_DIR to be set because otherwise the default cache home directory will fill up with conda packages, which can take up considerable space.
If this has already happened, clean your cache:
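# a sketch, assuming a recent pixi release; see pixi clean --help to confirm
pixi clean cache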
How do I fix my broken login/configs?¶
The raw configuration files can be found here:
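/local/cqls/etc/inits/bashrc
/local/cqls/etc/inits/cshrc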
You can make a backup of your current file and then copy the raw configuration files into your home directory. You can also remove a ~/.tcshrc file if it's present.
mv ~/.bashrc ~/.bashrc.bak
mv ~/.cshrc ~/.cshrc.bak
rm -f ~/.tcshrc
cp /local/cqls/etc/inits/bashrc ~/.bashrc
cp /local/cqls/etc/inits/cshrc ~/.cshrc
Then log out and log back in. If your configuration seems fixed, you can add some of the modifications from your backups to the newly refreshed config files.
Where can I learn more about conda env activation?¶
See this link from the conda documentation.
For most conda environments on our infrastructure, I run the scripts in /local/cluster/conda/conda_*_setup.sh to resolve version mismatches.
R isn't working in my conda env, why?¶
You likely have $R_LIBS
or $R_LIBS_USER
set and the R is pulling libraries from your home directory or other
location that are incompatible with the R environment. You can manually unset those env vars or go to the base env of
your conda directory and run bash /local/cluster/conda/conda_R_setup.sh
to automatically set up the appropriate env
vars on conda activate
and conda deactivate
.
For a copy/paste option if the conda env is active:
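# a sketch of the manual option: clear the R library env vars for this session
unset R_LIBS R_LIBS_USER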
See above for more information
Python is not working or has version mismatches¶
You may need to unalias python (unalias python); you can add that to your ~/.bashrc or ~/.cshrc files as well. You will have to fully type out /local/cluster/bin/python2 or /local/cluster/bin/python3, or add /local/cluster/bin to your $PATH upstream of /usr/bin, e.g. export PATH=/local/cluster/bin:${PATH}, so you don't have to type it out fully.
Your python is probably pulling from your .local install. You can manually unset those env vars or go to the base env of your conda directory and run bash /local/cluster/conda/conda_python_setup.sh to automatically set up the appropriate env vars on conda activate and conda deactivate.
For a copy/paste option if the conda env is active:
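# a sketch of the manual option: ignore ~/.local packages for this session
export PYTHONNOUSERSITE=1
unset PYTHONPATH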
See above for more information
Perl is not working or has version mismatches¶
Your perl is probably pulling from your PERL5LIB env var. You can manually unset those env vars or go to the base env of your conda directory and run bash /local/cluster/conda/conda_perl_setup.sh to automatically set up the appropriate env vars on conda activate and conda deactivate.
For a copy/paste option if the conda env is active:
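# a sketch of the manual option: clear the Perl library path for this session
unset PERL5LIB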
See above for more information
Software is not working due to a mismatch in linked libraries (lib.so missing)¶
The compiler on our infrastructure is old (gcc 4.8.5), and does not provide the most up-to-date linked libraries. The conda LD_LIBRARY_PATH is not getting set properly. You can manually set your LD_LIBRARY_PATH to include $CONDA_PREFIX/lib, or you can go to the base env of your conda directory and run bash /local/cluster/conda/conda_perl_setup.sh to automatically set up the appropriate env vars on conda activate and conda deactivate. For a copy/paste option if the conda env is active:
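# a sketch of the manual option: put the active env's libs first on the search path
export LD_LIBRARY_PATH="${CONDA_PREFIX}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"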
See above for more information
What software do I use for…¶
Adapter trimming¶
For automated adapter trimming, we currently recommend fastp. fastp is a good option for situations where you have adapters on the 3’ end of reads due to read-through of short inserts into the sequencing adapter. For trimming of primer sequences or other custom sequences, we suggest using bbduk.sh or cutadapt.
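As a minimal sketch (file names are placeholders; see fastp --help for the full option list):
# paired-end adapter trimming; fastp auto-detects standard adapters
fastp -i R1.fastq.gz -I R2.fastq.gz -o R1.trim.fastq.gz -O R2.trim.fastq.gz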
Short read alignment¶
bwa mem
Spliced alignment¶
STAR or hisat2
Long read alignment¶
minimap2
Genome assembly (Illumina)¶
SPAdes
Genome assembly (long read)¶
flye or nextdenovo
Genome annotation (prokaryote)¶
Bakta
RNA-Seq quantification¶
salmon
Differential gene expression analysis¶
deseq2
Pairwise sequence alignment¶
blast or diamond
Multiple sequence alignment¶
mafft --auto
Phylogenetic tree construction¶
IQ-TREE; fasttree can be useful for preliminary analysis
Orthologous group calculation (Prokaryote)¶
PIRATE for cultured organisms; PPanGGOLiN for MAGs/SAGs
anvio is useful for pangenome analysis as well
Principal component analysis or other ordination/dimensional reduction¶
Using R: vegan handles all types of ordinations. For nonmetric multidimensional scaling (NMDS), use the metaMDS function. For principal component analysis (PCA) use the rda function with no constraints. For principal coordinate analysis (PCoA), use the wcmdscale function. For constrained ordinations, use capscale (or dbRDA, constrained versions of PCoA) or rda (constrained PCA).
How do I…¶
Figure out how to run a program¶
- Use -h, e.g. $program -h
- Use --help, e.g. $program --help
- Use help, e.g. $program help
- Use man (this works for system-installed things like cat, mkdir, ls), e.g. man $program
- Use tldr (works for common programs: awk, sed, tar, wget), e.g. tldr $program
- Examine the script with less, e.g. which $program; less -S /local/cluster/bin/$program (Note: does not work for compiled software)
- Search the program name on Google
- Search the program name on the updates website https://software.cqls.oregonstate.edu/updates/tags
You can also try helpme for some curated help data.
Display a formatted markdown file on the command line¶
Use glow
Download reads from NCBI¶
Use prefetch and fasterq-dump. See here for more info.
Link to SRA toolkit under construction.
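A minimal sketch (the accession is an example):
# fetch the run from SRA, then convert it to FASTQ
prefetch SRR000001
fasterq-dump --split-files SRR000001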
Download genomes from NCBI¶
You can use the data-hub to get genome data. Use files-hpc.cqls.oregonstate.edu for downloads.
You can also use the get_assemblies program. Try this:
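# the exact flags vary by version; start with the built-in help
get_assemblies --help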
Generate a BLASTDB¶
Use makeblastdb -in INPUT.fasta -dbtype [nucl|prot] to generate your BLASTDB.
Please submit using hqsub 'makeblastdb ...'.
Do a BLASTN or BLASTP search¶
Use blastp or blastn. Use the -help flag for options. Do not use blastall as it is old and now unsupported.
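For example (the file and database names are placeholders):
# tabular (-outfmt 6) nucleotide search against a local database
blastn -query contigs.fa -db mydb -outfmt 6 -out hits.tsv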
Miscellaneous¶
My terminal output is garbled¶
Run reset. This should reset the output on your screen and you should be able to continue as normal.
Do my sequences have adapters?¶
Index sequences will not be included in the raw reads. Adapters could be on the 3' end of reads, depending on the library prep type. Please check the FastQC reports sent by Matthew with each sequencing run to determine if your reads have adapters. Use the fastp program to remove them.
What is going on with my 16S sequencing results?¶
See this page for some information regarding the 16S preps provided at the CQLS.