Functional Ecological Genomics 2017

Materials for Stephanie Spielman's sessions

Detecting selection in protein-coding sequences


This session will address the question: Given a multiple sequence alignment containing protein-coding sequences and a corresponding phylogeny, what kinds of questions can we ask about natural selection?

We will investigate the signatures of natural selection in protein-coding sequence data using HyPhy, via the Datamonkey webserver.

Note: Newer analyses (like BUSTED, aBSREL, and RELAX) are only available via the development Datamonkey webserver.


These papers provide excellent overviews (with technical details as well) of phylogenetic codon models:


You can download slides for today’s session here.


We will investigate selection using two example datasets:

  1. A dataset of 10 CD2 mammalian orthologs.
  2. A dataset containing Borna disease virus sequences from both endogenous and closely related “free-living” viruses.

Protip: This Python script is a useful wrapper for prepping your data for HyPhy input. It is written in Python 2 and requires that the aligner mafft and the phylogenetic reconstruction program FastTree are installed and available on your PATH.
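For orientation, here is a minimal sketch of what an align-then-build-tree wrapper can look like. It is not the script linked above: it is written for Python 3, and the function names and output file names are hypothetical. It assumes the executables are called `mafft` and `FastTree` on your system, and it aligns the input sequences directly, whereas a codon-aware workflow would typically align the translated proteins and then map the nucleotides back onto that alignment.

```python
import shutil
import subprocess
from pathlib import Path


def mafft_command(fasta_in):
    """Build a MAFFT command; MAFFT writes the alignment to stdout."""
    return ["mafft", "--auto", str(fasta_in)]


def fasttree_command(alignment, nucleotide=True):
    """Build a FastTree command; FastTree writes a Newick tree to stdout."""
    cmd = ["FastTree"]
    if nucleotide:
        cmd.append("-nt")  # treat the alignment as nucleotide data
    cmd.append(str(alignment))
    return cmd


def run_to_file(cmd, out_path):
    """Run cmd and save its stdout to out_path; fail early if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on your PATH")
    with open(out_path, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)


def prep_for_hyphy(fasta_in, out_dir="."):
    """Align the sequences, then infer a tree from the alignment.

    Output file names here are illustrative only.
    """
    out_dir = Path(out_dir)
    stem = Path(fasta_in).stem
    alignment = out_dir / f"{stem}.aln.fasta"
    tree = out_dir / f"{stem}.tre"
    run_to_file(mafft_command(fasta_in), alignment)
    run_to_file(fasttree_command(alignment), tree)
    return alignment, tree
```

Calling `prep_for_hyphy("my_sequences.fasta")` (a placeholder file name) would write an alignment and a Newick tree next to each other, which are the two inputs the analyses below expect.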

Note that you can use this online widget to visualize and annotate phylogenies.


Method    Input data    Datamonkey Version    Result page(s)*
BUSTED    CD2                                 Subset FG Results and All FG Results
aBSREL    CD2                                 Results
RELAX     Borna                               Results
FEL       CD2                                 Results
MEME      CD2                                 Results

*Note: Some of these links will expire a few days after this workshop, so re-run the analyses yourself if you need the results later.