Evolution of the NBS-LRR
Disease Resistance Gene Family
This page contains supplementary data and figures for the following manuscript:
Diversity, Distribution, and Ancient
Taxonomic Relationships within the TIR and non-TIR NBS-LRR Resistance Gene Subfamilies.
Journal of Molecular Evolution 54(4): 548-562
Please contact Steven Cannon (cann0010@tc.umn.edu) before using the data.
For ongoing work on Arabidopsis NBS-LRRs, by the "Functional and Comparative Genomics of Disease Resistance Gene Homologs" project, see
A 383-sequence Fitch-Margoliash tree
based on protein distances for NBS domains from 383 plant NBS-LRR RGAs.
See the next paragraph for more information. Bars on the right refer to major clades discussed in the manuscript.
The "verbose" version of the tree above.
Protein distances were calculated using PAM matrices, implemented in ProtDist in Phylip (Felsenstein, 2000).
The tree was computed using Fitch in Phylip. All sequences span the region between the P-loop and GLPL motifs.
Bars in the right column indicate clades with at least 70% bootstrap support based on 100 neighbor-joining trees.
Colors indicate plant family (see legend). For sequence naming conventions, see legend.
Notice the very uneven distribution of family-specific sequences in various clades.
For example, Arabidopsis (and a small number of other Brassica sequences) dominate the multi-family clade 47,
while few sequences from Fabaceae or Poaceae are found in this clade. Two sister clades (labeled A and B,
and with an asterisk and note) were added after the main tree-run. Placement was made using parsimony and
distance trees based on a representative subset of the sequences shown in this tree.
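As a reminder of what a Fitch-Margoliash tree optimizes, the Python sketch below computes the weighted least-squares misfit between observed pairwise distances and the distances implied by a candidate tree. It is illustrative only and is not the Phylip code: the function name, the taxon labels, and the distances are hypothetical, and the exponent of 2 in the denominator is assumed to match the usual Fitch-Margoliash weighting.

```python
def fm_sum_of_squares(observed, fitted, power=2.0):
    """Weighted least-squares misfit between two sets of pairwise distances.

    observed: dict mapping (taxon_a, taxon_b) to the observed distance
    fitted:   dict with the same keys, giving the path length between the
              two taxa on a candidate tree
    power:    exponent of the observed distance in the weight (assumed 2)
    """
    total = 0.0
    for pair, d_obs in observed.items():
        if d_obs == 0:
            # Identical sequences would give a zero denominator; skip them.
            continue
        total += (d_obs - fitted[pair]) ** 2 / d_obs ** power
    return total


# Hypothetical three-taxon example (not taken from the data on this page).
observed = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.0}
fitted = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 5.0}
print(fm_sum_of_squares(observed, fitted))
```

A lower value means the tree's branch lengths reproduce the distance matrix more closely; a tree search would compare this value across candidate topologies.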
Data for the big Fitch-Margoliash tree:
Parsimony and maximum likelihood "Scaffold tree."
Maximum parsimony tree with maximum likelihood branch length calculations, based on 400 amino acid
positions spanning the NBS domain. "Scaffold" refers to the fact that most of the sequences used to
calculate this tree exist as full-length predicted proteins -- serving as references (in the manuscript)
for comparisons to shorter sequences. Sequences were gathered using the SAM-T99 hidden Markov model alignment
and database search tool. Distances are in PAM units, as calculated by the Puzzle program (Strimmer and
von Haeseler, 1999), based on a maximum parsimony topology calculated using ProtPars in Phylip (Felsenstein, 2000).
Bootstrap values are percentages of 1000 neighbor-joining bootstrap replicates.
Bootstrap values at or above 70% are shown.
Bootstrap values in parentheses are for a similar tree without inclusion of the partial-length pine sequences in the respective subtrees.
Bars on the right represent multi-family clades discussed in the text.
Counts are numbers of sequences from the indicated plant taxa.
Counts for Fabaceae are primarily from Medicago and Glycine, and counts for Brassicaceae are from Arabidopsis.
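The bootstrap percentages and the 70% display cutoff described above can be illustrated with a short Python sketch. It assumes the clades have already been extracted from each replicate tree as sets of sequence names; the function names and the toy replicates are hypothetical, and this is not the procedure used by Phylip itself.

```python
from collections import Counter


def clade_support(replicate_clades):
    """Percentage of replicates in which each clade appears.

    replicate_clades: one entry per bootstrap replicate, each entry being
    an iterable of clades, and each clade an iterable of sequence names.
    """
    counts = Counter()
    for clades in replicate_clades:
        # Build a set first so a clade is counted once per replicate.
        counts.update({frozenset(clade) for clade in clades})
    n_replicates = len(replicate_clades)
    return {clade: 100.0 * k / n_replicates for clade, k in counts.items()}


def well_supported(support, threshold=70.0):
    """Keep only clades whose support is at or above the threshold."""
    return {clade: pct for clade, pct in support.items() if pct >= threshold}


# Four hypothetical replicates over sequences named a, b, c, d.
replicates = [
    [{"a", "b"}],
    [{"a", "b"}, {"c", "d"}],
    [{"a", "b"}],
    [{"c", "d"}],
]
support = clade_support(replicates)
print(well_supported(support))
```

In this toy input the clade {a, b} appears in 3 of 4 replicates (75%) and is kept, while {c, d} appears in 2 of 4 (50%) and is dropped.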
Data for the "scaffold tree":
non-TIR subtree with legume and some other sequences.
The tree was calculated using maximum parsimony, with maximum likelihood branch length calculations, for
NBS domains (between the P-loop and GLPL motifs) from representative sequences and most available legume RGHs.
Figure 4A shows non-TIR sequences, and Fig. 4B shows TIR sequences. The maximum parsimony tree was calculated using
ProtPars in Phylip (Felsenstein, 2000), with branch lengths calculated using maximum likelihood (PUZZLE program,
Strimmer and von Haeseler, 1999). Bootstrap values are percentages of 1000 neighbor-joining bootstrap replicates.
Bootstrap values are shown if at or above 68%. Bars on the right represent multi-family and legume-specific clades discussed in the text.
Counts are numbers of
legume sequences from Glycine, Medicago, and other legume genera, by sequence clade.
Where counts are higher than in the figure, additional sequences have been identified in these clades on the basis of distance scores using
regions other than the P-loop-GLPL region used in Fig. 4.
The sequences used in this count are downloadable below.
TIR subtree (companion to the one above).
A colored and simplified view of a similar data set.
This compares placement of sequences from Glycine, Medicago, and Arabidopsis.
Data for these "legume and other" trees:
Data for calculations of Ka:Ks in Arabidopsis, described in Cannon et al. (above).
Arabidopsis RGH sequences are those identified in the
Functional and Comparative Genomics of Disease
Resistance Gene Homologs database,
from the MIPS Arabidopsis Genome Initiative database.
Protein sequences were aligned using T-Coffee, v. 1.37 (Notredame et al., 2000),
using the multiple alignment default settings, and trimmed to include the N-terminal and NBS domains,
through the RNBS-D motif (described in Meyers et al., 1999).
Indel regions were removed (shown below both with and without removals).
TIR and non-TIR sequences were aligned separately.
Phylogenetic analysis was carried out as described for Figs. 3 and 4.
Nucleotide sequences were also aligned relative to the
protein sequences using a Perl program, TranslateAlign.pl (courtesy of Dan Kortshak).
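The two alignment-processing steps described above (removing indel regions, and aligning nucleotide sequences relative to the protein alignment) can be sketched in Python as follows. This is not TranslateAlign.pl and makes no claim about how that program works. The function names are hypothetical, "indel region" is assumed here to mean any alignment column containing a gap in at least one sequence, and the coding sequence is assumed to have exactly three nucleotides per aligned residue (no stop codon, no check that the codons translate to the given residues).

```python
def drop_gap_columns(aligned_seqs, gap="-"):
    """Remove every column that contains a gap in any sequence.

    aligned_seqs: list of equal-length aligned sequences (strings).
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length)
            if all(seq[i] != gap for seq in aligned_seqs)]
    return ["".join(seq[i] for i in keep) for seq in aligned_seqs]


def thread_codons(aligned_protein, cds, gap="-"):
    """Lay a coding sequence onto a gapped protein sequence.

    Each residue takes the next codon from cds; each protein gap
    becomes three nucleotide gap characters.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("cds length must be three times the residue count")
    codons = iter(cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(gap * 3 if ch == gap else next(codons)
                   for ch in aligned_protein)


# Hypothetical toy inputs, not sequences from the alignments on this page.
print(thread_codons("M-K", "ATGAAA"))        # ATG---AAA
print(drop_gap_columns(["AC-T", "A-GT"]))    # ['AT', 'AT']
```

Threading codons onto the protein alignment keeps the reading frame intact, which is what a codon-based comparison such as Ka:Ks requires.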
Download the following protein and nucleotide alignments and tree (344 Kb):
Download
a big Excel file comparing partial-length sequences to scaffold sequences
Please contact Steven Cannon (cann0010@tc.umn.edu) for details.