Submitting data to the NCBI Sequence Read Archive (SRA)
Pre-processing steps
- Collect fastq.gz files for each sample
- Rename fastq.gz files from long name to short name per sample
- Get biosample data early (especially if it's from collaborators)
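The renaming step above can be scripted. The following is a minimal sketch, assuming the long names follow the common Illumina pattern `<sample>_S<number>_L<lane>_R<read>_001.fastq.gz`; that pattern and the helper names `short_name` and `rename_reads` are illustrative assumptions, so adjust the regular expression to match your own files.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumption: long names look like <sample>_S<number>_L<lane>_R<read>_001.fastq.gz
LONG_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$"
)


def short_name(long_name: str) -> Optional[str]:
    """Return <sample>_R1.fastq.gz or <sample>_R2.fastq.gz, or None if the name does not match."""
    match = LONG_NAME.match(long_name)
    if match is None:
        return None
    return f"{match.group('sample')}_{match.group('read')}.fastq.gz"


def rename_reads(directory: str, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Plan (and, when dry_run is False, perform) renames for matching files in `directory`."""
    plan = []
    for path in sorted(Path(directory).glob("*.fastq.gz")):
        new_name = short_name(path.name)
        if new_name is None:
            continue  # already short, or an unexpected pattern: leave it untouched
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        plan.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return plan
```

Calling `rename_reads(".")` only reports the planned renames; pass `dry_run=False` once the plan looks right.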
Links
- https://submit.ncbi.nlm.nih.gov/subs/sra
- https://www.ncbi.nlm.nih.gov/sra/docs/submitfiles
- Sample attributes page
- Biosample batch page
Basic Checklist
- Project Title
- Public Description for Project
- Grant Funding
Start a New Submission
The example that I will provide is for a host-associated 16S sequencing study.
Go to the link above and start a submission. Fill out the project title, description, grant funding, and departmental information as requested.
Choose "Packages for metagenome submitters". If you are following along with host-associated 16S data, then choose "MIMS Environmental/Metagenome" from the "GSC MIxS" section on the right. Otherwise, fill out the form with your organism and follow along.
Choose "Upload a file using Excel or text format (tab-delimited) that includes the attributes for each of your BioSamples" and download the template.
Minimum Biosample Checklist
Note: columns marked with "delete" should be completely deleted. Columns marked with * are required. env_broad_scale and env_local_scale will have a value if the samples are from environmental sources, but are "not applicable" for samples from host sources.
- *sample_name
- sample_title - delete
- bioproject_accession - delete
- *organism - mouse metagenome (or other host metagenome)
- *collection_date - YYYY-mm-dd as one acceptable format
- *env_broad_scale - not applicable
- *env_local_scale - animal-associated environment [ENVO:01001002] (or other host environment)
- *env_medium - fecal material [ENVO:00002003] (or other host tissue)
- *geo_loc_name - USA: Oregon
- *host - Mus musculus (or other species binomial)
- *lat_lon - 44.566 N 123.283 W
- genetic_mod
- host_common_name - mouse (or other species common name)
- host_diet
- host_genotype
- host_sex - male or female
- host_subject_id
- host_taxid - 10090 (or other host taxid)
- misc_param
- perturbation - experimental or control group
- neg_cont_type - kit or water
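For more than a handful of samples, the attributes file can be generated rather than typed by hand. This is a rough sketch that writes a tab-delimited table using the column names and example values from the checklist above; the exact header spelling should be checked against the downloaded template, and `COLUMNS`, `DEFAULTS`, `build_row`, and `to_tsv` are hypothetical names introduced here for illustration.

```python
import csv
import io
from typing import Dict, List

# Columns kept from the checklist above; a leading '*' marks a required column.
COLUMNS = [
    "*sample_name", "*organism", "*collection_date", "*env_broad_scale",
    "*env_local_scale", "*env_medium", "*geo_loc_name", "*host", "*lat_lon",
    "host_common_name", "host_sex", "host_subject_id", "host_taxid",
    "perturbation",
]

# Example values from the checklist that are shared by every sample in this study.
DEFAULTS = {
    "*organism": "mouse metagenome",
    "*env_broad_scale": "not applicable",
    "*env_local_scale": "animal-associated environment [ENVO:01001002]",
    "*env_medium": "fecal material [ENVO:00002003]",
    "*geo_loc_name": "USA: Oregon",
    "*host": "Mus musculus",
    "*lat_lon": "44.566 N 123.283 W",
    "host_common_name": "mouse",
    "host_taxid": "10090",
}


def build_row(sample: Dict[str, str]) -> Dict[str, str]:
    """Merge per-sample values over the shared defaults and check required columns."""
    row = {column: "" for column in COLUMNS}
    row.update(DEFAULTS)
    row.update(sample)
    missing = [c for c in COLUMNS if c.startswith("*") and not row[c]]
    if missing:
        raise ValueError(f"missing required attributes: {', '.join(missing)}")
    return row


def to_tsv(rows: List[Dict[str, str]]) -> str:
    """Render rows as tab-delimited text; unknown column names raise ValueError."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `to_tsv([build_row({"*sample_name": "mouse01", "*collection_date": "2020-01-15"})])` gives a header line plus one sample line that can be pasted into the template.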
Minimum SRA metadata
- sample_name - must match the BioSample name submitted above
- library_ID - can match sample_name
- title - 16S metabarcoding of Mus musculus: feces
- library_strategy - AMPLICON
- library_source - METAGENOMIC
- library_selection - PCR
- library_layout - paired
- platform - ILLUMINA
- instrument_model - Illumina MiSeq
- design_description - choose from below
  - Earth Microbiome Project 16S PCR protocol
  - Illumina 16S PCR protocol
- filetype - fastq
- filename - sample_name_R1.fastq.gz
- filename2 - sample_name_R2.fastq.gz
- filename3 - delete
- filename4 - delete
- assembly - delete
- fasta_file - delete
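Because sample_name must match the BioSample table and the filenames must match the uploaded reads, a quick consistency check before submitting can save a rejected upload. The sketch below builds one metadata row per sample from the values listed above and reports mismatches; `sra_row` and `check_rows` are hypothetical helpers, and the default design_description is simply the first of the two options.

```python
from typing import Dict, Iterable, List


def sra_row(
    sample_name: str,
    design_description: str = "Earth Microbiome Project 16S PCR protocol",
) -> Dict[str, str]:
    """Build one SRA metadata row using the example values from the list above."""
    return {
        "sample_name": sample_name,
        "library_ID": sample_name,
        "title": "16S metabarcoding of Mus musculus: feces",
        "library_strategy": "AMPLICON",
        "library_source": "METAGENOMIC",
        "library_selection": "PCR",
        "library_layout": "paired",
        "platform": "ILLUMINA",
        "instrument_model": "Illumina MiSeq",
        "design_description": design_description,
        "filetype": "fastq",
        "filename": f"{sample_name}_R1.fastq.gz",
        "filename2": f"{sample_name}_R2.fastq.gz",
    }


def check_rows(
    rows: Iterable[Dict[str, str]],
    biosample_names: Iterable[str],
    available_files: Iterable[str],
) -> List[str]:
    """Return human-readable problems; an empty list means the rows are consistent."""
    known = set(biosample_names)
    files = set(available_files)
    problems = []
    for row in rows:
        name = row["sample_name"]
        if name not in known:
            problems.append(f"{name}: no matching BioSample")
        for column in ("filename", "filename2"):
            if row[column] not in files:
                problems.append(f"{name}: missing file {row[column]}")
    return problems
```

A typical call passes the sample names from the BioSample table and the result of listing the read directory, for example `check_rows(rows, names, os.listdir("."))`.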
Uploading data using ftp
- Expand the FTP instructions
- Log in to files.cqls.oregonstate.edu using ssh
- Navigate to the directory containing the reads
- Connect to NCBI over ftp using the given credentials: ftp -i ftp-private.ncbi.nlm.nih.gov
  - The -i flag allows multiple transfers without confirming each one
- Change to the directory given to you on the website
- mkdir a new directory for this submission and cd into it
- Use mput to upload multiple files at once: mput *.fastq.gz
- Wait until the files are available to select in the web interface
- Choose the directory in the "Select preload folder" dialog and click continue
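The interactive session above can also be scripted. The following sketch is an alternative to typing mput by hand, using Python's standard ftplib module instead of the ftp client; `upload_reads` is a hypothetical helper, and the account directory, submission directory name, and credentials are placeholders to be taken from the FTP instructions on the submission page.

```python
from ftplib import FTP
from pathlib import Path
from typing import List


def upload_reads(
    ftp: FTP, account_dir: str, submission_dir: str, local_dir: str = "."
) -> List[str]:
    """Create `submission_dir` under `account_dir` and upload every *.fastq.gz file."""
    ftp.cwd(account_dir)      # the directory given to you on the website
    ftp.mkd(submission_dir)   # a new directory for this submission
    ftp.cwd(submission_dir)
    uploaded = []
    for path in sorted(Path(local_dir).glob("*.fastq.gz")):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


# Example usage (placeholders are not real values):
# with FTP("ftp-private.ncbi.nlm.nih.gov") as ftp:
#     ftp.login(user="<username>", passwd="<password>")
#     upload_reads(ftp, "<your given directory>", "<new submission directory>")
```

As with the manual upload, the files still take some time to appear in the web interface after the transfer finishes.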