All done in EVE cluster
Sometimes you may wish to download already published data. Here we will download data from the European Nucleotide Archive (ENA) for individuals of a European bat species, Myotis escalerai. The data come from a paper published a couple of years ago and can be browsed in the ENA under project PRJEB29086. The raw data are 5 libraries of ddRADseq data. I have provided the first library for you in the /ddRAD-seq_workshop/data/Exercises_1-2/raw/ folder (BAT_01_NoIndex_L008_R1_001.fastq.gz, BAT_01_NoIndex_L008_R2_001.fastq.gz). Because these are paired-end sequencing data, each library has 2 files: a forward reads file (R1) and a reverse reads file (R2). The ENA stores them bzip2-compressed (.fastq.bz2), so after downloading they need to be decompressed and then gzipped (.fastq.gz), as this is the format that Stacks needs.
Try and download one or two of the following remaining libraries (both R1 and R2 files) to practice yourself:
BAT_02_NoIndex_L005_R1_001.fastq.gz
BAT_02_NoIndex_L005_R2_001.fastq.gz
BAT_03_NoIndex_L004_R1_001.fastq.gz
BAT_03_NoIndex_L004_R2_001.fastq.gz
BAT_04_NoIndex_L005_R1_001.fastq.gz
BAT_04_NoIndex_L005_R2_001.fastq.gz
BAT_05_NoIndex_L006_R1_001.fastq.gz
BAT_05_NoIndex_L006_R2_001.fastq.gz
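The download URLs for these libraries appear to follow a regular pattern on the ENA FTP server: a directory named after the first six characters of the run accession, then the accession itself, then the file name (stored as .fastq.bz2 rather than .fastq.gz). A minimal sketch of that pattern is below. The helper name ena_url is hypothetical, and the layout is an assumption inferred from the URLs used in this workshop's download script, not a documented ENA rule.

```shell
#!/bin/bash
# Hypothetical helper: build the ENA FTP URL for one submitted file.
# Assumed layout (inferred from the workshop URLs):
#   ftp://ftp.sra.ebi.ac.uk/vol1/run/<first 6 chars of accession>/<accession>/<file>
# $1 = run accession (e.g. ERR2832480), $2 = file name on the server
ena_url() {
    prefix=$(printf '%s' "$1" | cut -c1-6)
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/run/%s/%s/%s\n' "$prefix" "$1" "$2"
}

# Example: print the forward and reverse URLs for library 2 (run ERR2832480)
for r in R1 R2; do
    ena_url ERR2832480 "BAT_02_NoIndex_L005_${r}_001.fastq.bz2"
done
```

Each printed URL can then be passed to wget. Check the accession for each library against the ENA project page before relying on it.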
This is all done in the EVE cluster. As with all of the EVE work we will do, it can be submitted via sbatch using the script template (01_download_data.sh) we provide, but be sure to modify the email address and output paths to match your own details.
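One way to set your own email address without opening an editor is sketched below. The function name set_mail_user, the output file name my_download_data.sh and the example address are placeholders, and the sketch assumes the address contains no | or & characters, which sed would treat specially.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of a job script with its
# "#SBATCH --mail-user=" line replaced by a new address.
# $1 = path to the job script, $2 = your email address
set_mail_user() {
    sed "s|^#SBATCH --mail-user=.*|#SBATCH --mail-user=$2|" "$1"
}

# Example usage on the cluster (placeholder names, shown as comments):
# set_mail_user 01_download_data.sh your.name@example.org > my_download_data.sh
# sbatch my_download_data.sh     # submit the job to the queue
# squeue -u "$USER"              # check whether it is pending or running
```

Writing the result to a new file leaves the provided template untouched, so you can always start again from it.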
The lines to change are in the script header (here and in all other scripts used on EVE):
#!/bin/bash
#SBATCH --job-name=download_ENA_data
#SBATCH --mail-user=christopher_david.barratt@uni-leipzig.de
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT
#SBATCH --output=/work/%u/ddRAD-seq_workshop/job_logs/%x-%j.log
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=15G
#SBATCH --time=04:00:00
# download ENA Data
cd /work/$USER/ddRAD-seq_workshop/data/Exercises_1-2/
cd ./raw
# Myotis escalerai bat data example download: 134 individuals, paired-end ddRADseq data, SbfI and NlaIII enzymes. Barcodes are inline adapters
#https://www.ebi.ac.uk/ena/browser/text-search?query=PRJEB29086
# I've commented out all except library 1 here, try downloading different files
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832479/BAT_01_NoIndex_L008_R1_001.fastq.bz2
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832479/BAT_01_NoIndex_L008_R2_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832480/BAT_02_NoIndex_L005_R1_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832480/BAT_02_NoIndex_L005_R2_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832481/BAT_03_NoIndex_L004_R1_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832481/BAT_03_NoIndex_L004_R2_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832482/BAT_04_NoIndex_L005_R1_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832482/BAT_04_NoIndex_L005_R2_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832483/BAT_05_NoIndex_L006_R1_001.fastq.bz2
#wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR283/ERR2832483/BAT_05_NoIndex_L006_R2_001.fastq.bz2
# these are bzip2ed, so need to be unzipped and then gzipped for Stacks
ml GCCcore/10.2.0
ml bzip2/1.0.8
# decompress the downloaded .fastq.bz2 files to .fastq
bzip2 -d ./*.bz2
# recompress as .fastq.gz
gzip ./*.fastq
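After the job finishes, a quick sanity check is to confirm that the forward and reverse files of a library contain the same number of reads. The sketch below is not part of the provided script. The function name count_reads is hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
#!/bin/bash
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# Assumes each read occupies exactly 4 lines.
# $1 = path to a .fastq.gz file
count_reads() {
    lines=$(gzip -cd "$1" | wc -l)
    echo $((lines / 4))
}

# Example usage for library 1 (shown as comments, run inside the raw folder):
# r1=$(count_reads BAT_01_NoIndex_L008_R1_001.fastq.gz)
# r2=$(count_reads BAT_01_NoIndex_L008_R2_001.fastq.gz)
# if [ "$r1" -eq "$r2" ]; then echo "OK: $r1 read pairs"; else echo "Mismatch: $r1 vs $r2"; fi
```

A mismatch between R1 and R2 usually points to an interrupted download or a failed conversion, in which case the affected file is best downloaded again.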