Genome Evolution

Du, C, Swigonova Z, Messing J.  2006.  Retrotranspositions in orthologous regions of closely related grass species. BMC evolutionary biology. 6:62.
BACKGROUND: Retrotransposons are commonly occurring eukaryotic transposable elements (TEs). Among these, long terminal repeat (LTR) retrotransposons are the most abundant TEs and can comprise 50-90% of the genome in higher plants. By comparing the orthologous chromosomal regions of closely related species, the effects of TEs on the evolution of plant genomes can be studied in detail. RESULTS: Here, we compared the composition and organization of TEs within five orthologous chromosomal regions among three grass species: maize, sorghum, and rice. We identified a total of 132 full or fragmented LTR retrotransposons in these regions. As a percentage of the total cumulative sequence in each species, LTR retrotransposons occupy 45.1% of the maize, 21.1% of the rice, and 3.7% of the sorghum regions. The most common elements in the maize retrotransposon-rich regions are the copia-like retrotransposons with 39% and the gypsy-like retrotransposons with 37%. Using the contiguous sequence of the orthologous regions, we detected 108 retrotransposons with intact target duplication sites and both LTR termini. Here, we show that 74% of these elements inserted into their host genome less than 1 million years ago and that many retroelements expanded in size by the insertion of other sequences. These inserts were predominantly other retroelements; however, several of them were also fragmented genes. Unforeseen was the finding of intact genes embedded within LTR retrotransposons. CONCLUSION: Although the abundance of retroelements between maize and rice is consistent with their different genome sizes of 2,364 and 389 Mb respectively, the content of retrotransposons in sorghum (790 Mb) is surprisingly low. In all three species, retrotransposition is a very recent activity relative to their speciation. While it was known that genes re-insert into non-orthologous positions of plant genomes, they appear to re-insert also within retrotransposons, potentially providing an important role for retrotransposons in the evolution of gene function.
Swigonova, Z, Bennetzen JL, Messing J.  2005.  Structure and evolution of the r/b chromosomal regions in rice, maize and sorghum. Genetics. 169:891-906.
The r1 and b1 genes of maize, each derived from the chromosomes of two progenitors that hybridized >4.8 million years ago (MYA), have been a rich source for studying transposition, recombination, genomic imprinting, and paramutation. To provide a phylogenetic context to the genetic studies, we sequenced orthologous regions from maize and sorghum (>600 kb) surrounding these genes and compared them with the rice genome. This comparison showed that the homologous regions underwent complete or partial gene deletions, selective retention of orthologous genes, and insertion of nonorthologous genes. Phylogenetic analyses of the r/b genes revealed that the ancestral gene was amplified independently in different grass lineages, that rice experienced an intragenomic gene movement and parallel duplication, that the maize r1 and b1 genes are descendants of two divergent progenitors, and that the two paralogous r genes of sorghum are almost as old as the sorghum lineage. Such sequence mobility also extends to linked genes. The cisZOG genes are characterized by gene amplification in an ancestral grass, parallel duplications and deletions in different grass lineages, and movement to a nonorthologous position in maize. In addition to gene mobility, both maize and rice regions experienced recent transposition (<3 MYA).
Swigonova, Z, Lai J, Ma J, Ramakrishna W, Llaca V, Bennetzen JL, Messing J.  2004.  Close split of sorghum and maize genome progenitors. Genome research. 14:1916-23.
It is generally believed that maize (Zea mays L. ssp. mays) arose as a tetraploid; however, the two progenitor genomes cannot be unequivocally traced within the genome of modern maize. We have taken a new approach to investigate the origin of the maize genome. We isolated and sequenced large genomic fragments from the regions surrounding five duplicated loci from the maize genome and their orthologous loci in sorghum, and then we compared these sequences with the orthologous regions in the rice genome. Within the studied segments, we identified 11 genes that were conserved in location, order, and orientation. We performed phylogenetic and distance analyses and examined the patterns of estimated times of divergence for sorghum and maize gene orthologs and also the time of divergence for maize orthologs. Our results support a tetraploid origin of maize. This analysis also indicates contemporaneous divergence of the ancestral sorghum genome and the two maize progenitor genomes about 11.9 million years ago (Mya). On the basis of a putative conversion event detected for one of the genes, tetraploidization must have occurred before 4.8 Mya, and therefore, preceded the major maize genome expansion by gene amplification and retrotransposition.
Clark, RM, Linton E, Messing J, Doebley JF.  2004.  Pattern of diversity in the genomic region near the maize domestication gene tb1. Proceedings of the National Academy of Sciences of the United States of America. 101:700-7.
Domesticated maize and its wild ancestor (teosinte) differ strikingly in morphology and afford an opportunity to examine the connection between strong selection and diversity in a major crop species. The tb1 gene largely controls the increase in apical dominance in maize relative to teosinte, and a region of the tb1 locus 5' to the transcript sequence was a target of selection during maize domestication. To better characterize the impact of selection at a major "domestication" locus, we have sequenced the upstream tb1 genomic region and systematically sampled nucleotide diversity for sites located as far as 163 kb upstream to tb1. Our analyses define a selective sweep of approximately 60-90 kb 5' to the tb1 transcribed sequence. The selected region harbors a mixture of unique sequences and large repetitive elements, but it contains no predicted genes. Diversity at the nearest 5' gene to tb1 is typical of that for neutral maize loci, indicating that selection at tb1 has had a minimal impact on the surrounding chromosomal region. Our data also show low intergenic linkage disequilibrium in the region and suggest that selection has had a minor role in shaping the pattern of linkage disequilibrium that is observed. Finally, our data raise the possibility that maize-like tb1 haplotypes are present in extant teosinte populations, and our findings also suggest a model of tb1 gene regulation that differs from traditional views of how plant gene expression is controlled.
Song, R, Llaca V, Linton E, Messing J.  2001.  Sequence, regulation, and evolution of the maize 22-kD alpha zein gene family. Genome research. 11:1817-25.
We have isolated and sequenced all 23 members of the 22-kD alpha zein (z1C) gene family of maize. This is one of the largest plant gene families that has been sequenced from a single genetic background and includes the largest contiguous genomic DNA from maize with 346,292 bp to date. Twenty-two of the z1C members are found in a roughly tandem array on chromosome 4S forming a dense gene cluster 168,489-bp long. The twenty-third copy of the gene family is also located on chromosome 4S at a site approximately 20 cM closer to the centromere and appears to be the wild-type allele of the floury-2 (fl2) mutation. On the basis of an analysis of maize cDNA databases, only seven of these genes appear to be expressed including the fl2 allele. The expressed genes in the cluster are interspersed with nonexpressed genes. Interestingly, some of the expressed genes differ in their transcriptional regulation. Gene amplification appears to be in blocks of genes explaining the rapid and compact expansion of the cluster during the evolution of maize.
Bradeen, JM, Timmermans MC, Messing J.  1997.  Dynamic genome organization and gene evolution by positive selection in geminivirus (Geminiviridae). Molecular biology and evolution. 14:1114-24.
Geminiviruses (Geminiviridae) are a diverse group of plant viruses differing from other known plant viruses in possessing circular, single-stranded DNA. Current classification divides the family into three subgroups, defined in part by genome organization, insect vector, and plant host range. Previous phylogenetic assessments of geminiviruses have used DNA and/or amino acid sequences from the replication-associated and coat protein genes and have relied predominantly on distance analyses. We used amino acid and DNA sequence data from the replication-associated and coat protein genes from 22 geminivirus types in distance and parsimony analyses. Although the results of our analyses largely agree with those reported previously, we could not always predict viral relationships based on genome organization, plant host, or insect vector. Loss of correlation of these traits with phylogeny is likely due to improved sampling of geminivirus types. Unrooted parsimony trees suggest multiple independent origins for the monopartite genome. Genome organization is therefore a dynamic character. Estimates of nonsynonymous and synonymous nucleotide substitutions for extant and inferred ancestral sequences were used to evaluate hypotheses that the replication-associated and coat protein sequences evolve to accommodate plant host and insect vector specificities, respectively. Results suggest that plant host specificity does not solely direct replication-associated protein evolution but that coat protein sequence does evolve in response to insect vector specificity. Genome organization and, possibly, plant host specificity are not reliable taxonomic characters.
Heidecker, G, Chaudhuri S, Messing J.  1991.  Highly clustered zein gene sequences reveal evolutionary history of the multigene family. Genomics. 10:719-32.
We have determined the nucleotide sequences of zein cDNA clones ZG14, ZG15, and ZG35. The three clones have 95 to 98% homology to the previously published sequence of clone A20, and 84% homology to sequences of the zein subfamily A30. Comparison of all sequences of the A30 and A20 subfamilies highlights the following features: the 5' nontranslated regions are 68 and 57 nucleotides in length for the A20- and A30-like mRNAs, respectively, and contain at least three repeats of the consensus sequence ACGAACAAta/gG; the majority of these genes are highly clustered as judged from pulsed-field gel electrophoresis of high molecular weight maize DNA. Furthermore, we discuss a model for the evolution of the multigene family which stresses the special importance of unequal crossing-over and gene conversion in this system.