Methods
General policy: preferred methods will be suggested, but no method will be excluded.
Alignments
- Recent papers benchmarking alignment methods:
- Recent papers benchmarking alignment filtering methods:
- How to decide which alignment is best? Any recommendations? Who, when? Order methods by preference
- Recent review-editorial with recommendations on the choice of an appropriate alignment method:
Alignment programs
BAli-Phy |
VERY slow, VERY good, not feasible for >40 sequences. Simultaneous alignment and phylogeny reconstruction |
PAGAN |
Better than PRANK, and faster |
PRANK |
Good but slow; suited to small datasets. Provides posterior probability scores for each position |
M-Coffee |
Good, slow, provides confidence scores for each position based on agreement of different aligners |
MAFFT |
Pretty good and fast; suited to large datasets |
FSA |
VERY slow, easy to use |
hmmer3 |
Relatively fast and relatively easy to use; produces local alignments only |
ProGraphMSA |
Very fast PRANK-like alignment based on a graph representation. Includes a context-sensitive feature (uses the idea from Biegert A and Soding J (2009) Sequence context-specific profiles for homology searching. Proc Natl Acad Sci USA). It should perform especially well for more divergent sequences and for sequences with repeats. Not yet published, but currently under review |
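The only hard size limit given in the table above is the one for BAli-Phy (not feasible for more than 40 sequences). As a minimal sketch, that check could be automated before choosing an aligner. The function names and the `BALI_PHY_MAX_SEQUENCES` constant are hypothetical helpers, not part of any of the listed programs, and the code assumes the input is plain FASTA in which every record starts with `>`.

```python
# Threshold taken from the BAli-Phy note in the table above.
BALI_PHY_MAX_SEQUENCES = 40


def count_fasta_sequences(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def bali_phy_feasible(n_sequences, limit=BALI_PHY_MAX_SEQUENCES):
    """Return True if the dataset is within the size limit quoted above."""
    return n_sequences <= limit


if __name__ == "__main__":
    example = ">seq1\nACGT\n>seq2\nAC-T\n".splitlines()
    n = count_fasta_sequences(example)
    print(n, bali_phy_feasible(n))
```

For real files, the lines can be supplied by iterating over an open file handle instead of a list.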
Alignment filtering
Alignment filtering programs
GUIDANCE |
May be run with PAGAN, PRANK or MAFFT to give a confidence score for each position, based on robustness to guide-tree uncertainty. Increases run time by a factor equal to the number of bootstrap iterations (100 by default; may be reduced to 30) |
GBlocks |
Very strict, even under the less stringent options; faster but less accurate than GUIDANCE. Returns wrong column positions if columns consisting of 100% gaps remain in the input, because it does NOT count them at all. Can work on codons directly. X characters are not treated as gaps. Does not seem to return exit code 0 when it runs properly: it always exits with code 1, whether it succeeds or fails. Source code not available |
TrimAl |
Very fast, easy to use. Column numbering starts at 0, so column 143 is the 144th column. Columns containing gaps are kept whatever the option used (except nogaps). Cannot work directly on codons. X characters are not treated as gaps |
AliScore / AliCut |
fast, not easy to use |
NOISY |
fast |
BMGE |
fast; can work on codons |
MaxAlign |
Fast, easy to use. Removes sequences that disrupt the alignment, NOT columns |
AL2CO |
Fast, easy to use. Requires Clustal format as input |
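Several of the caveats in the table above concern column bookkeeping and exit codes, and can be worked around with a few lines of glue code. The sketch below is one possible approach, not something shipped with any of the programs: `drop_all_gap_columns` removes 100%-gap columns (the case reported to confuse GBlocks) while remembering the original positions, `zero_to_one_based` converts 0-based column numbers as reported for TrimAl, and `output_written` judges success by the presence of a non-empty output file rather than by the exit code. All names are hypothetical, and the alignment is assumed to be held in memory as a dict of equal-length strings with `-` as the gap character.

```python
import os


def drop_all_gap_columns(aln, gap="-"):
    """Remove columns that are gaps in every sequence.

    Returns the cleaned alignment and a list giving, for each kept column,
    its 0-based index in the original alignment.
    """
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    kept = [i for i in range(n_cols)
            if any(seq[i] != gap for seq in aln.values())]
    cleaned = {name: "".join(seq[i] for i in kept)
               for name, seq in aln.items()}
    return cleaned, kept


def zero_to_one_based(columns):
    """Convert 0-based column numbers to 1-based positions."""
    return [c + 1 for c in columns]


def output_written(path):
    """Treat a run as successful only if it left a non-empty output file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    aln = {"a": "AC-GT", "b": "A--GT", "c": "TC-G-"}
    cleaned, kept = drop_all_gap_columns(aln)
    print(cleaned)                   # third column removed from every sequence
    print(zero_to_one_based(kept))   # original positions, counted from 1
```

The `kept` list makes it possible to map any column reported on the cleaned alignment back to the original one, whichever filtering program is used afterwards.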
Tree reconstruction
It was suggested to estimate gene phylogenies with ML and BI methods only.
All methods listed below handle gap positions as unknown characters.
Are you sure, or should we confirm in the comment section below for each method?
Tree reconstruction programs
CodonPhyML |
Includes about 50 different codon models for fast phylogeny reconstruction with ML. According to our current results (unpublished), codon models fit protein-coding DNA data significantly better and infer better trees than amino acid and DNA models. This can be explained by the fact that the evolution of proteins is primarily driven by selection (negative and positive): codon models explicitly allow for selection, whereas amino acid and DNA models do not. CodonPhyML will be available from SourceForge on publication; for now, please contact maria.anisimova@inf.ethz.ch if you would like to try it |
MrBayes |
None |
PhyML |
None |
RAxML |
None |
Gene synteny
Databases