Submitting data to the NCBI Sequence Read Archive (SRA)¶
Pre-processing steps¶
- Collect fastq.gz files for each sample
- Rename fastq.gz files from long name to short name per sample
- Get biosample data early (especially if it's from collaborators)
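The renaming step above can be scripted. The following is a minimal sketch, assuming the long names follow the common Illumina pattern `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`; that pattern, the helper names `short_name` and `rename_reads`, and the example file names are assumptions for illustration, not something this page specifies.

```python
import re
from pathlib import Path

# Assumed long-name pattern: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
LONG_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$"
)


def short_name(long_name: str) -> str:
    """Return <sample>_<read>.fastq.gz, or the input unchanged if it does not match."""
    match = LONG_NAME.match(long_name)
    if match is None:
        return long_name
    return f"{match['sample']}_{match['read']}.fastq.gz"


def rename_reads(directory: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and, if dry_run is False, perform) the renames in `directory`."""
    plan = []
    for path in sorted(Path(directory).glob("*.fastq.gz")):
        new = short_name(path.name)
        if new != path.name:
            plan.append((path.name, new))
            if not dry_run:
                path.rename(path.with_name(new))
    return plan
```

Calling `rename_reads("reads")` with the default `dry_run=True` only reports the planned renames, which makes it easy to check the short names before touching any files.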
Links¶
- https://submit.ncbi.nlm.nih.gov/subs/sra
- https://www.ncbi.nlm.nih.gov/sra/docs/submitfiles
- Sample attributes page
- Biosample batch page
Basic Checklist¶
- Project Title
- Public Description for Project
- Grant Funding
Start a New Submission¶
The example that I will provide is for a host-associated 16S sequencing study.
Go to the submission portal link above (https://submit.ncbi.nlm.nih.gov/subs/sra) and start a submission. Fill out the project title, description, grant funding, and departmental information as requested.
If you are following along with host-associated 16S data, choose "Packages for metagenome submitters", then "MIMS Environmental/Metagenome" from the GSC MIxS section on the right. Otherwise, fill out the form with your organism and follow along.
Choose "Upload a file using Excel or text format (tab-delimited) that includes the attributes for each of your BioSamples" and download the template.
Minimum Biosample Checklist¶
Note: columns marked with delete should be completely deleted
Columns marked with * are required
env_broad_scale and env_local_scale take a value for samples from environmental sources, but may be not applicable for samples from host sources.
- *sample_name
- sample_title - delete
- bioproject_accession - delete
- *organism - mouse metagenome (or other host metagenome)
- *collection_date - YYYY-mm-dd as one acceptable format
- *env_broad_scale - not applicable
- *env_local_scale - animal-associated environment [ENVO:01001002] (or other host-environment)
- *env_medium - fecal material [ENVO:00002003] (or other host tissue)
- *geo_loc_name - USA: Oregon
- *host - Mus musculus (or other species binomial)
- *lat_lon - 44.566 N 123.283 W
- genetic_mod
- host_common_name - mouse (or other species common name)
- host_diet
- host_genotype
- host_sex - male or female
- host_subject_id
- host_taxid - 10090 or other host taxid
- misc_param
- perturbation - experimental or control group
- neg_cont_type - kit or water
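The checklist above can be written out as a tab-delimited file instead of being typed by hand. This is a rough sketch, assuming the mouse fecal example values from the checklist are shared by every sample; the function name `biosample_tsv` and the sample records in the usage note are hypothetical, and the header spellings should be copied from the template you downloaded rather than trusted from here.

```python
import csv
import io
from datetime import datetime

# Values shared by every sample, taken from the example checklist above.
BIOSAMPLE_SHARED = {
    "*organism": "mouse metagenome",
    "*env_broad_scale": "not applicable",
    "*env_local_scale": "animal-associated environment [ENVO:01001002]",
    "*env_medium": "fecal material [ENVO:00002003]",
    "*geo_loc_name": "USA: Oregon",
    "*host": "Mus musculus",
    "*lat_lon": "44.566 N 123.283 W",
    "host_common_name": "mouse",
    "host_taxid": "10090",
}

# Columns that differ from sample to sample.
BIOSAMPLE_PER_SAMPLE = [
    "*sample_name",
    "*collection_date",
    "host_sex",
    "host_subject_id",
    "perturbation",
]


def biosample_tsv(samples: list[dict]) -> str:
    """Return tab-delimited BioSample attributes, one row per sample dict."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(BIOSAMPLE_PER_SAMPLE + list(BIOSAMPLE_SHARED))
    for sample in samples:
        # Raises ValueError if the date does not parse as year-month-day.
        datetime.strptime(sample["*collection_date"], "%Y-%m-%d")
        per_sample = [sample.get(col, "") for col in BIOSAMPLE_PER_SAMPLE]
        writer.writerow(per_sample + list(BIOSAMPLE_SHARED.values()))
    return out.getvalue()
```

For example, `biosample_tsv([{"*sample_name": "mouse01", "*collection_date": "2021-03-05", "host_sex": "male"}])` returns a header line plus one row; columns missing from a sample dict are left empty.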
Minimum SRA metadata¶
- sample_name - must match biosample name submitted above
- library_ID - can match sample_name
- title - 16S metabarcoding of Mus musculus: feces
- library_strategy - AMPLICON
- library_source - METAGENOMIC
- library_selection - PCR
- library_layout - paired
- platform - ILLUMINA
- instrument_model - Illumina MiSeq
- design_description - choose from below
  - Earth Microbiome Project 16S PCR protocol
  - Illumina 16S PCR protocol
- filetype - fastq
- filename - sample_name_R1.fastq.gz
- filename2 - sample_name_R2.fastq.gz
- filename3 - delete
- filename4 - delete
- assembly - delete
- fasta_file - delete
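Because sample_name must match the BioSample names and the file names follow from it, the SRA metadata rows can be generated from a list of sample names. The sketch below assumes the example values from the list above and the short file names `<sample_name>_R1.fastq.gz` and `<sample_name>_R2.fastq.gz`; the function name `sra_tsv` and its arguments are hypothetical.

```python
import csv
import io

# Column order follows the list above, with the "delete" columns left out.
SRA_COLUMNS = [
    "sample_name", "library_ID", "title", "library_strategy",
    "library_source", "library_selection", "library_layout", "platform",
    "instrument_model", "design_description", "filetype",
    "filename", "filename2",
]


def sra_tsv(sample_names, biosample_names,
            design_description="Illumina 16S PCR protocol"):
    """Return tab-delimited SRA metadata, one row per sample name."""
    # Every SRA sample_name must already exist in the BioSample sheet.
    missing = sorted(set(sample_names) - set(biosample_names))
    if missing:
        raise ValueError(f"not in BioSample sheet: {', '.join(missing)}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=SRA_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for name in sample_names:
        writer.writerow({
            "sample_name": name,
            "library_ID": name,  # allowed to match sample_name
            "title": "16S metabarcoding of Mus musculus: feces",
            "library_strategy": "AMPLICON",
            "library_source": "METAGENOMIC",
            "library_selection": "PCR",
            "library_layout": "paired",
            "platform": "ILLUMINA",
            "instrument_model": "Illumina MiSeq",
            "design_description": design_description,
            "filetype": "fastq",
            "filename": f"{name}_R1.fastq.gz",
            "filename2": f"{name}_R2.fastq.gz",
        })
    return out.getvalue()
```

Raising an error on unmatched names is a design choice here: it surfaces a mismatch before upload, which is cheaper than finding it in the web form.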
Uploading data using ftp¶
- Expand the FTP instructions
- Log in to files.cqls.oregonstate.edu using ssh
- Navigate to the directory containing the reads
- Connect to NCBI over ftp using given credentials:
  ftp -i ftp-private.ncbi.nlm.nih.gov
  The -i flag allows multiple transfers without confirming
- Change to the directory given to you on the website
- mkdir a new directory for this submission
- Use mput to upload multiple files at once:
  mput *.fastq.gz
- Wait until the files are available to select in the web interface
- Choose the directory in the "Select preload folder" dialog and click continue
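The same upload can be scripted rather than typed into an interactive ftp session. The sketch below uses Python's standard `ftplib` in place of the `ftp` client described above, so it is a substitute technique, not the procedure from this page; the function name `upload_reads` and its argument names are hypothetical, and the credentials and account directory must come from the FTP instructions in the submission portal.

```python
from ftplib import FTP
from pathlib import Path


def upload_reads(ftp: FTP, account_dir: str, submission_dir: str,
                 local_dir: str = ".") -> list[str]:
    """Mirror the cd / mkdir / mput steps on an already logged-in connection."""
    ftp.cwd(account_dir)      # the directory given to you on the website
    ftp.mkd(submission_dir)   # a new directory for this submission
    ftp.cwd(submission_dir)
    uploaded = []
    for path in sorted(Path(local_dir).glob("*.fastq.gz")):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


# Usage (placeholders must be replaced with the values NCBI gives you):
# with FTP("ftp-private.ncbi.nlm.nih.gov") as ftp:
#     ftp.login(user="<username>", passwd="<password>")
#     upload_reads(ftp, "<account directory>", "<new directory name>")
```

The function takes an open connection instead of creating one, so the login details stay out of the helper and the upload logic can be exercised against a stand-in object.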