Estimating genetic diversity and population information from short read (ddRAD-seq) type data

Date & time: June 20-21 2022 (09:00-15:00)

Location: German Centre for Integrative Biodiversity Research (iDiv), Puschstrasse 4, 04103, Leipzig. Room: Beehive (Ground floor)

Teachers: Chris Barratt & Laura Mendez Cuellar, sDiv and Evolution and Adaptation (iDiv)

Course summary | Course website

All powerpoint slides available here

Schedule

Day 1 (20th June): 09:00 - 15:00

Brief introductions and research topics of course leaders and course participants (2-3 mins each)
Introductory lecture on population genomics and ddRAD-seq and which kinds of questions can be answered with these data
Hands-on session for downloading publicly available genomic data (e.g. from the European Nucleotide Archive) - using Myotis escalerai data (a European bat species)
Stacks 2 lecture (main processing steps)
Hands-on session for each of the major processing steps required for the Stacks pipeline (process_radtags, denovo_map) - using Hyphaene coriacea data (an African palm)
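
As a rough preview of the download session, the sketch below shows one way to look up FASTQ file locations through the ENA Portal API "filereport" endpoint. It is a minimal illustration, not the course's own script: the accession PRJEB00000 is a hypothetical placeholder, the function names are invented for this example, and the choice of the ftp:// prefix is an assumption. The block only builds the query URL and parses a TSV response, so it runs without network access.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API endpoint that reports the files attached to an accession.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Build a filereport query for the sequencing runs under an accession."""
    params = {
        "accession": accession,          # study or run accession (placeholder below)
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_fastq_urls(tsv_text):
    """Map each run accession to its FASTQ URLs.

    The fastq_ftp column holds ';'-separated paths without a scheme;
    prefixing them with ftp:// is an assumption made for this sketch.
    """
    urls = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        urls[row["run_accession"]] = ["ftp://" + p for p in paths]
    return urls


# Hypothetical accession, for illustration only.
query_url = build_filereport_url("PRJEB00000")
```

The text returned by fetching query_url (for example with urllib.request.urlopen) can be passed to parse_fastq_urls, and each resulting URL downloaded to the cluster before running process_radtags.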

Day 2 (21st June): 09:00 - 15:00

Hands-on sessions to analyse population structure, phylogeny and genetic diversity - using Leptopelis flavomaculatus data (an African treefrog):

Admixture (pop structure)
sNMF (pop structure)
DAPC (pop structure)
RAxML (phylogenetic relationships)
F-stats (FST, genetic diversity)
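
To give a feel for what FST measures, here is a minimal sketch of the textbook definition FST = (HT - HS) / HT for a single biallelic SNP, computed from per-population allele frequencies with equal population weights. This is an illustrative assumption about the simplest form of the statistic; the software used in the course applies its own estimators (with sample-size corrections), so values will not match exactly. The function name is hypothetical.

```python
def fst_biallelic(freqs):
    """Unweighted FST for one biallelic SNP.

    freqs: frequency of one allele in each population (values in [0, 1]).
    Returns (HT - HS) / HT, or 0.0 when the SNP is fixed everywhere (HT == 0).
    """
    n_pops = len(freqs)
    p_bar = sum(freqs) / n_pops
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / n_pops
    return (h_t - h_s) / h_t


# Identical frequencies give no differentiation; fixed differences give 1.
no_structure = fst_biallelic([0.5, 0.5])
fixed_difference = fst_biallelic([0.0, 1.0])
```

Values near 0 indicate that populations share similar allele frequencies, while values approaching 1 indicate strong differentiation.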

Before the course, please:

Bring your own computer and know how to use it
Ensure you have working access to the UFZ EVE cluster and have a basic understanding of the job queuing system (i.e. how to submit jobs)
Install a file transfer client (e.g. FileZilla, Cyberduck) for moving files to and from the cluster - see here
Install R (at least version 4) and RStudio on your computer
Read the course literature (see below)

Please contact us if you have any questions!

Objectives

Understand best practices for planning and executing a population genomics project based on ddRAD-seq (short read) type data
Learn how to download genomic sequence data from the European Nucleotide Archive
Familiarise yourself with the Stacks 2 bioinformatic pipeline to process raw reads
Be able to generate your own output files after Stacks processing for downstream analyses of population structure, phylogeny and genetic diversity
Run downstream analyses (Admixture, sNMF, DAPC, RAxML, F-statistics), understand what each one does and learn how to interpret the outputs

Background

The cost of generating genomic data has fallen drastically in recent years. This has led to the large-scale adoption of ddRAD-seq as a method for de novo Single Nucleotide Polymorphism (SNP) discovery in non-model organisms, where thousands of molecular markers can be used to answer ecological, evolutionary and conservation questions that could not previously be addressed.

In this course you’ll learn how to plan a ddRAD-seq style project effectively, get to know the data types, and learn how to analyse them and interpret the outputs. There’s also the chance to analyse your own data (if you already have it) and earn an extra 0.5 ECTS.

Literature

Andrews et al. 2016. Harnessing the power of RADseq for ecological and evolutionary genomics

Rochette et al. 2019. Stacks 2: Analytical methods for paired-end sequencing improve RADseq-based population genomics

Paris et al. 2017. Lost in parameter space: a road map for stacks

Rochette and Catchen 2017. Deriving genotypes from RAD-seq short-read data using Stacks

Cerca et al. 2021. Removing the bad apples: A simple bioinformatic method to improve loci-recovery in de novo RADseq data for non-model organisms

Datasets

Barratt et al. 2018. Vanishing refuge? Testing the forest refuge hypothesis in coastal East Africa using genome-wide sequence data for seven amphibians

Razgour et al. 2019. Considering adaptive genetic variation in climate change vulnerability assessment reduces species range loss projections

Mendez et al. (in prep)