Methods

I selected gene families that are 'complete' in the sense that all or nearly all Arabidopsis members of each gene family are represented (judged by HMM searches of the complete set of predicted proteins in the genome). The families also have consistent domain arrangements: PFAM searches indicate that all members of a family have similar domain arrangements, sometimes differing by the addition of a peripheral domain, but all containing the same core domain or domains.
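The domain-consistency criterion above can be illustrated with a small sketch. This is not the script used for this site. It assumes the domain hits for each protein have already been parsed into a list of domain names, and the gene and domain names below are hypothetical placeholders.

```python
from typing import Dict, List, Set


def core_domains(arrangements: Dict[str, List[str]]) -> Set[str]:
    """Return the domains that occur in every member of the family."""
    domain_sets = [set(domains) for domains in arrangements.values()]
    if not domain_sets:
        return set()
    return set.intersection(*domain_sets)


def peripheral_domains(arrangements: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each member, list the domains that are not part of the shared core."""
    core = core_domains(arrangements)
    return {
        gene: [d for d in domains if d not in core]
        for gene, domains in arrangements.items()
    }


# Hypothetical family: three members share one core domain,
# and one member carries an additional peripheral domain.
family = {
    "gene_1": ["CoreDom"],
    "gene_2": ["CoreDom"],
    "gene_3": ["CoreDom", "ExtraDom"],
}

print(core_domains(family))
print(peripheral_domains(family))
```

A family with an empty core set would fail the criterion, since its members would share no domain at all.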

Arabidopsis protein sequences are from the 2003 TIGR release 4.0. EST sequences are from the June 2003 TIGR Gene Index releases.

All amino acid alignments were made using T-Coffee (default parameters). An HMM was then built from each alignment (non-default parameters, to cause the model to include fewer indel regions: --archpri .7 --fast --gapmax .3), and the sequences were realigned to the HMM, with indel sites falling outside of the HMM removed. Trees were calculated using parsimony (ProtPars in the Phylip package; gaps were coded as non-informative, and where several most-parsimonious trees were found, one was chosen at random). Maximum likelihood branch lengths were calculated for the parsimony topology using TreePuzzle, again coding gaps as non-informative. See the notes on the multi-species trees and genomic positions at those sub-pages.
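The effect of excluding indel-rich regions can be sketched as a simple column filter. This is only an illustration of the idea, not the HMM-based procedure described above. It assumes that a column counts as an indel site when more than a given fraction of sequences have a gap there, and it borrows the 0.3 threshold from the --gapmax .3 setting as an assumption. The function names and example sequences are hypothetical.

```python
from typing import Dict

GAP_CHARS = "-."  # characters treated as gaps in the alignment


def gap_fraction(column: str) -> float:
    """Fraction of characters in one alignment column that are gaps."""
    return sum(ch in GAP_CHARS for ch in column) / len(column)


def drop_gappy_columns(
    alignment: Dict[str, str], max_gap_fraction: float = 0.3
) -> Dict[str, str]:
    """Keep only the columns whose gap fraction is at most max_gap_fraction."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i
        for i in range(length)
        if gap_fraction("".join(row[i] for row in rows)) <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical four-sequence alignment; the third column is 75% gaps.
example = {
    "seq_1": "MK-LV",
    "seq_2": "MKALV",
    "seq_3": "MK-LV",
    "seq_4": "M--LV",
}

print(drop_gappy_columns(example))
```

In this example the second column (25% gaps) is kept and the third column (75% gaps) is dropped. A profile HMM makes this decision while building the model rather than as a separate filtering pass.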

How was this site generated? Here's the script.

Other Software




The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.