Detecting selection in protein-coding sequences
This session addresses the following: given a multiple sequence alignment of protein-coding sequences and a corresponding phylogeny, what can we ask about natural selection?
Note: Newer analyses (like BUSTED, aBSREL, and RELAX) are only available via the development Datamonkey webserver.
These papers provide excellent overviews of phylogenetic codon models, including the technical details:
- Investigating Protein-Coding Sequence Evolution with Probabilistic Codon Substitution Models
- Trends in Substitution Models of Molecular Evolution
- For the most thorough treatment, this book is the definitive overview of molecular evolution, including codon models: Computational Molecular Evolution
You can download slides for today’s session here.
We will investigate selection using two example datasets:
- A dataset of 10 CD2 mammalian orthologs.
- A dataset containing Borna disease virus sequences from both endogenous and closely-related “free-living” viruses.
Protip: This Python script is a useful wrapper for preparing your data for HyPhy input. It is written in Python 2 and requires that the aligner MAFFT and the phylogenetic reconstruction program FastTree are installed and available in your PATH.
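For orientation, here is a minimal sketch of what an align-then-build-tree wrapper of this kind might look like. It is not the linked script: it is written in Python 3, and the function names (`build_commands`, `run_pipeline`) and output file names are assumptions made for illustration. It also aligns nucleotide sequences directly, which can break the reading frame; a wrapper intended for codon models would more likely align the translated proteins and back-translate to codons.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(fasta):
    """Build the MAFFT and FastTree command lines for one FASTA file.

    Output names (<stem>.aln.fasta, <stem>.nwk) are illustrative choices,
    not a convention of HyPhy or of the linked script.
    """
    fasta = Path(fasta)
    alignment = fasta.with_name(fasta.stem + ".aln.fasta")
    tree = fasta.with_name(fasta.stem + ".nwk")
    # mafft writes the alignment to stdout; --auto lets it pick a strategy.
    mafft_cmd = ["mafft", "--auto", str(fasta)]
    # FastTree writes a Newick tree to stdout; -nt marks nucleotide input.
    fasttree_cmd = ["FastTree", "-nt", str(alignment)]
    return mafft_cmd, fasttree_cmd, alignment, tree


def run_pipeline(fasta):
    """Align sequences, then infer a tree; return (alignment, tree) paths."""
    # Fail early if either external tool is missing from PATH.
    for tool in ("mafft", "FastTree"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")
    mafft_cmd, fasttree_cmd, alignment, tree = build_commands(fasta)
    with open(alignment, "w") as out:
        subprocess.run(mafft_cmd, stdout=out, check=True)
    with open(tree, "w") as out:
        subprocess.run(fasttree_cmd, stdout=out, check=True)
    return alignment, tree
```

Calling `run_pipeline("cd2.fasta")` (a hypothetical file name) would write `cd2.aln.fasta` and `cd2.nwk` next to the input, provided both tools are installed. Keeping command construction separate from execution makes the commands easy to inspect before anything is run.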
Note that you can use this online widget to visualize and annotate phylogenies.
| Method | Input data | Datamonkey version | Result page(s)* |
|---|---|---|---|
| BUSTED | CD2 | test.datamonkey.org | Subset FG Results and All FG Results |
*Note: Some of these links will expire a few days after this workshop. If a link no longer works, please re-run the analysis.