Genome Evolution

Calvino, M., Messing J.  2013.  Discovery of MicroRNA169 gene copies in genomes of flowering plants through positional information. Genome Biol Evol. 5:402-17.
Expansion and contraction of microRNA (miRNA) families can be studied in sequenced plant genomes through sequence alignments. Here, we focused on miR169 in sorghum because of its implications in drought tolerance and stem-sugar content. We were able to discover many miR169 copies that have escaped standard genome annotation methods. A new miR169 cluster was found on sorghum chromosome 1. This cluster is composed of the previously annotated sbi-MIR169o together with two newly found MIR169 copies, named sbi-MIR169t and sbi-MIR169u. We also found that a miR169 cluster on sorghum chr7 consisting of sbi-MIR169l, sbi-MIR169m, and sbi-MIR169n is contained within a chromosomal inversion of at least 500 kb that occurred in sorghum relative to Brachypodium, rice, foxtail millet, and maize. Surprisingly, synteny of chromosomal segments containing MIR169 copies with linked bHLH and CONSTANS-LIKE genes extended from Brachypodium to dicotyledonous species such as grapevine, soybean, and cassava, indicating a strong conservation of linkages of certain flowering and/or plant height genes and microRNAs, which may explain linkage drag of drought and flowering traits and would have consequences for breeding new varieties. Furthermore, alignment of rice and sorghum orthologous regions revealed the presence of two additional miR169 gene copies (miR169r and miR169s) on sorghum chr7 that formed an antisense miRNA gene pair. Both copies are expressed and target different sets of genes. Synteny-based analysis of microRNAs among different plant species should lead to the discovery of new microRNAs in general and contribute to our understanding of their evolution.
Xu, JH, Bennetzen JL, Messing J.  2012.  Dynamic gene copy number variation in collinear regions of grass genomes. Mol Biol Evol. 29:861-71.
A salient feature of genomes of higher organisms is the birth and death of gene copies. An example is the alpha prolamin genes, which encode seed storage proteins in grasses (Poaceae) and represent a medium-size gene family. To better understand the mechanism, extent, and pace of gene amplification, we compared prolamin gene copies in the genomes of two different tribes in the Panicoideae, the Paniceae and the Andropogoneae. We identified alpha prolamin (setarin) gene copies in the diploid foxtail millet (Paniceae) genome (490 Mb) and compared them with orthologous regions in diploid sorghum (730 Mb) and ancient allotetraploid maize (2,300 Mb) (Andropogoneae). Because sequenced genomes of other subfamilies of Poaceae like rice (389 Mb) (Ehrhartoideae) and Brachypodium (272 Mb) (Pooideae) do not have alpha prolamin genes, their collinear regions can serve as "empty" reference sites. A pattern emerged, where genes were copied and inserted into other chromosomal locations followed by additional tandem duplications (clusters). We observed both recent (species-specific) insertion events and older ones that are shared by these tribes. Many older copies were deleted by unequal crossing over of flanking sequences or damaged by truncations. However, some remain intact with active and inactive alleles. These results indicate that genomes reflect only a snapshot of the gene content of a species and are far less static than conventional genetics has suggested. Nucleotide substitution rates for active alpha prolamin genes were twice as high as for low copy number beta, gamma, and delta prolamin genes, suggesting that gene amplification accelerates the pace of divergence.
Messing, J, Bennetzen J.  2008.  Grass Genome Structure and Evolution. Genome Dynamics. 4:41-56.
Messing, J.  2009.  The Polyploid Origin of Maize. The Maize Handbook: Domestication, Genetics, and Genome. :221-238.
Xu, JH, Messing J.  2009.  Amplification of prolamin storage protein genes in different subfamilies of the Poaceae. Theor Appl Genet.
Prolamins are seed storage proteins in cereals and represent an important source of essential amino acids for feed and food. Genes encoding these proteins resulted from dispersed and tandem amplification. While previous studies have concentrated on protein sequences from different grass species, we now can add a new perspective to their relationships by asking how their genes are shared by ancestry and copied in different lineages of the same family of species. These differences are derived from alignment of chromosomal regions, where collinearity is used to identify prolamin genes in syntenic positions, also called orthologous gene copies. New or paralogous gene copies are inserted in tandem or new locations of the same genome. More importantly, one can detect the loss of older genes. We analyzed chromosomal intervals containing prolamin genes from rice, sorghum, wheat, barley, and Brachypodium, representing different subfamilies of the Poaceae. The Poaceae, commonly known as the grasses, includes three major subfamilies, the Ehrhartoideae (rice), Pooideae (wheat, barley, and Brachypodium), and Panicoideae (millets, maize, sorghum, and switchgrass). Based on chromosomal position and sequence divergence, it becomes possible to infer the order of gene amplification events. Furthermore, the loss of older genes in different subfamilies seems to permit a faster pace of divergence of paralogous genes. Change in protein structure affects their physical properties, subcellular location, and amino acid composition. On the other hand, regulatory sequence elements and corresponding transcriptional activators of new gene copies are more conserved than coding sequences, consistent with the tissue-specific expression of these genes.
Salse, J, Abrouk M, Bolot S, Guilhot N, Courcelle E, Faraut T, Waugh R, Close TJ, Messing J, Feuillet C.  2009.  Reconstruction of monocotelydoneous proto-chromosomes reveals faster evolution in plants than in animals. Proc Natl Acad Sci U S A. 106:14908-13.
Paleogenomics seeks to reconstruct ancestral genomes from the genes of today's species. The characterization of paleo-duplications represented by 11,737 orthologs and 4,382 paralogs identified in five species belonging to three of the agronomically most important subfamilies of grasses, that is, Ehrhartoideae (rice), Panicoideae (sorghum, maize), and Pooideae (wheat, barley), permitted us to propose a model for an ancestral genome with a minimal size of 33.6 Mb structured in five proto-chromosomes containing at least 9,138 predicted proto-genes. It appears that only four major evolutionary shuffling events (alpha, beta, gamma, and delta) explain the divergence of these five cereal genomes during their evolution from a common paleo-ancestor. Comparative analysis of ancestral gene function with rice as a reference indicated that five categories of genes were preferentially modified during evolution. Furthermore, alignments between the five grass proto-chromosomes and the recently identified seven eudicot proto-chromosomes indicated that additional very active episodes of genome rearrangements and gene mobility occurred during angiosperm evolution. If one compares the pace of primate evolution of 90 million years (233 species) to 60 million years of the Poaceae (10,000 species), change in chromosome structure through speciation has accelerated significantly in plants.
Xu, JH, Messing J.  2008.  Organization of the prolamin gene family provides insight into the evolution of the maize genome and gene duplications in grass species. Proc Natl Acad Sci U S A. 105:14330-5.
Zea mays, commonly known as corn, is perhaps the most greatly produced crop in terms of tonnage and a major food, feed, and biofuel resource. Here we analyzed its prolamin gene family, encoding the major seed storage proteins, as a model for gene evolution by syntenic alignments with sorghum and rice, two genomes that have been sequenced recently. Because a high-density gene map has been constructed for maize inbred B73, all prolamin gene copies can be identified in their chromosomal context. Alignment of respective chromosomal regions of these species via conserved genes allows us to identify the pedigree of prolamin gene copies in space and time. Its youngest and largest gene family, the alpha prolamins, arose about 22-26 million years ago (Mya) after the split of the Panicoideae (including maize, sorghum, and millet) from the Pooideae (including wheat, barley, and oats) and Oryzoideae (rice). The first dispersal of alpha prolamin gene copies occurred before the split of the progenitors of maize and sorghum about 11.9 Mya. One of the two progenitors of maize gained a new alpha zein locus, absent in the other lineage, to form a nonduplicated locus in maize after allotetraploidization about 4.8 Mya. But dispersed copies gave rise to tandem duplications through uneven expansion and gene silencing of this gene family in maize and sorghum, possibly because of maize's greater recombination and mutation rates resulting from its diploidization process. Interestingly, new gene loci in maize represent junctions of ancestral chromosome fragments and sites of new centromeres in sorghum and rice.
Xu, J-H, Messing J.  2008.  Diverged Copies of the Seed Regulatory Opaque-2 Gene by a Segmental Duplication in the Progenitor Genome of Rice, Sorghum, and Maize. Mol Plant. 1:760-769. doi:10.1093/mp/ssn038.
Comparative analyses of the sequence of entire genomes have shown that gene duplications, chromosomal segmental duplications, or even whole genome duplications (WGD) have played prominent roles in the evolution of many eukaryotic species. Here, we used the ancient duplication of a well-known transcription factor in maize, encoded by the Opaque-2 (O2) locus, to examine the general features of divergences of chromosomal segmental duplications in a lineage-specific manner. We took advantage of contiguous chromosomal sequence information in rice (Oryza sativa, Nipponbare), sorghum (Sorghum bicolor, Btx623), and maize (Zea mays, B73) that were aligned by conserved gene order (synteny). This analysis showed that the maize O2 locus is contained within a 1.25 million base-pair (Mb) segment on chromosome 7, which was duplicated approximately 56 million years ago (mya) before the split of rice and maize 50 mya. The duplicated region on chromosome 1 is only half the size and contains the maize OHP gene, which does not restore the o2 mutation although it encodes a protein with the same DNA and protein binding properties in endosperm. The segmental duplication is not only found in rice, but also in sorghum, which split from maize 11.9 mya. A detailed analysis of the duplicated regions provided examples for complex rearrangements including deletions, duplications, conversions, inversions, and translocations. Furthermore, the rice and sorghum genomes appeared to be more stable than the maize genome, probably because maize underwent allotetraploidization and then diploidization.
Wei, F, Coe E, Nelson W, Bharti AK, Engler F, Butler E, Kim H, Goicoechea JL, Chen M, Lee S et al.  2007.  Physical and Genetic Structure of the Maize Genome Reflects Its Complex Evolutionary History. PLoS Genet. 3:e123.
Maize (Zea mays L.) is one of the most important cereal crops and a model for the study of genetics, evolution, and domestication. To better understand maize genome organization and to build a framework for genome sequencing, we constructed a sequence-ready fingerprinted contig-based physical map that covers 93.5% of the genome, of which 86.1% is aligned to the genetic map. The fingerprinted contig map contains 25,908 genic markers that enabled us to align nearly 73% of the anchored maize genome to the rice genome. The distribution pattern of expressed sequence tags correlates to that of recombination. In collinear regions, 1 kb in rice corresponds to an average of 3.2 kb in maize, yet maize has a 6-fold genome size expansion. This can be explained by the fact that most rice regions correspond to two regions in maize as a result of its recent polyploid origin. Inversions account for the majority of chromosome structural variations during subsequent maize diploidization. We also find clear evidence of ancient genome duplication predating the divergence of the progenitors of maize and rice. Reconstructing the paleoethnobotany of the maize genome indicates that the progenitors of modern maize contained ten chromosomes.