Genome Structure

Abrouk, M, Murat F, Pont C, Messing J, Jackson S, Faraut T, Tannier E, Plomion C, Cooke R, Feuillet C et al.  2010.  Palaeogenomics of plants: synteny-based modelling of extinct ancestors. Trends Plant Sci. 15:479-87.
In the past ten years, international initiatives have led to the development of large sets of genomic resources that allow comparative genomic studies between plant genomes at a high level of resolution. Comparison of map-based genomic sequences revealed shared intra-genomic duplications, providing new insights into the evolution of flowering plant genomes from common ancestors. Plant genomes can be presented as concentric circles, providing a new reference for plant chromosome evolutionary relationships and an efficient tool for gene annotation and cross-genome markers development. Recent palaeogenomic data demonstrate that whole-genome duplications have provided a motor for the evolutionary success of flowering plants over the last 50-70 million years.
Murat, F, Xu JH, Tannier E, Abrouk M, Guilhot N, Pont C, Messing J, Salse J.  2010.  Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution. Genome Res. 20:1545-57.
The comparison of the chromosome numbers of today's species with common reconstructed paleo-ancestors has led to intense speculation of how chromosomes have been rearranged over time in mammals. However, similar studies in plants with respect to genome evolution as well as molecular mechanisms leading to mosaic synteny blocks have been lacking due to relevant examples of evolutionary zooms from genomic sequences. Such studies require genomes of species that belong to the same family but are diverged to fall into different subfamilies. Our most important crops belong to the family of the grasses, where a number of genomes have now been sequenced. Based on detailed paleogenomics, using inference from n = 5-12 grass ancestral karyotypes (AGKs) in terms of gene content and order, we delineated sequence intervals comprising a complete set of junction break points of orthologous regions from rice, maize, sorghum, and Brachypodium genomes, representing three different subfamilies and different polyploidization events. By focusing on these sequence intervals, we could show that the chromosome number variation/reduction from the n = 12 common paleo-ancestor was driven by nonrandom centric double-strand break repair events. It appeared that the centromeric/telomeric illegitimate recombination between nonhomologous chromosomes led to nested chromosome fusions (NCFs) and synteny break points (SBPs). When intervals comprising NCFs were compared in their structure, we concluded that SBPs (1) were meiotic recombination hotspots, (2) corresponded to high sequence turnover loci through repeat invasion, and (3) might be considered as hotspots of evolutionary novelty that could act as a reservoir for producing adaptive phenotypes.
Paterson, AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A et al.  2009.  The Sorghum bicolor genome and the diversification of grasses. Nature. 457:551-6.
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
Messing, J.  2009.  Synergy of two reference genomes for the grass family. Plant Physiol. 149:117-24.
Nelson, W, Luo M, Ma J, Estep M, Estill J, He R, Talag J, Sisneros N, Kudrna D, Kim H et al.  2008.  Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains. BMC Genomics. 9:621.
BACKGROUND: Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. RESULTS: A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. CONCLUSION: MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. Although the types and natures of epigenetic boundaries are barely understood at this time, MSLL technology flags both approximate boundaries and methylated genes that deserve additional investigation. MSLL and HMPR sequences provide a valuable resource for maize genome annotation, and are a uniquely valuable complement to any plant genome sequencing project. In order to make these results fully accessible to the community, a web display was developed that shows the alignment of MSLL, HMPR, and other gene-rich sequences to the BACs; this display is continually updated with the latest ESTs and BAC sequences.
Messing, J, Dooner HK.  2006.  Organization and variability of the maize genome. Current Opinion in Plant Biology. 9:157-63.
With a size approximating that of the human genome, the maize genome is about to become the largest plant genome yet sequenced. Contributing to that size are a whole-genome duplication event and a retrotransposition explosion that produced a large amount of repetitive DNA. This DNA is greatly under-represented in cDNA collections, so analysis of the maize transcriptome has been an expedient way of assessing the gene content of maize. Over 2 million maize cDNA sequences are now available, making maize the third most widely studied organism, behind mouse and man. To date, the sequencing of large-sized DNA clones has been largely driven by the genetic interests of different investigators. The recent construction of a physical map that is anchored to the genetic map will aid immensely in the maize genome-sequencing effort. However, studies showing that the repetitive DNA component is highly polymorphic among maize inbred lines point to the need to sample vertically a few specific regions of the genome to evaluate the extent and importance of this variability.
Matsumoto, T, Wu JZ, Kanamori H, Katayose Y, Fujisawa M, Namiki N, Mizuno H, Yamamoto K, Antonio BA, Baba T et al.  2005.  The map-based sequence of the rice genome. Nature. 436:793-800.
Messing, J, Bharti AK, Karlowski WM, Gundlach H, Kim HR, Yu Y, Wei F, Fuks G, Soderlund CA, Mayer KF et al.  2004.  Sequence composition and genome organization of maize. Proceedings of the National Academy of Sciences of the United States of America. 101:14349-54.
Zea mays L. ssp. mays, or corn, one of the most important crops and a model for plant genetics, has a genome approximately 80% the size of the human genome. To gain global insight into the organization of its genome, we have sequenced the ends of large insert clones, yielding a cumulative length of one-eighth of the genome with a DNA sequence read every 6.2 kb, thereby describing a large percentage of the genes and transposable elements of maize in an unbiased approach. Based on the accumulative 307 Mb of sequence, repeat sequences occupy 58% and genic regions occupy 7.5%. A conservative estimate predicts approximately 59,000 genes, which is higher than in any other organism sequenced so far. Because the sequences are derived from bacterial artificial chromosome clones, which are ordered in overlapping bins, tagged genes are also ordered along continuous chromosomal segments. Based on this positional information, roughly one-third of the genes appear to consist of tandemly arrayed gene families. Although the ancestor of maize arose by tetraploidization, fewer than half of the genes appear to be present in two orthologous copies, indicating that the maize genome has undergone significant gene loss since the duplication event.
Song, R, Messing J.  2003.  Gene expression of a gene family in maize based on noncollinear haplotypes. Proceedings of the National Academy of Sciences of the United States of America. 100:9055-60.
Genomic regions of nearly every species diverged into different haplotypes, mostly based on point mutations, small deletions, and insertions that do not affect the collinearity of genes within a species. However, the same genomic interval containing the z1C gene cluster of two inbred lines of Zea mays significantly lost their gene collinearity and also differed in the regulation of each remaining gene set. Furthermore, when inbreds were reciprocally crossed, hybrids exhibited an unexpected shift of expression patterns so that "overdominance" instead of "dominance complementation" of allelic and nonallelic gene expression occurred. The same interval also differed in length (360 vs. 263 kb). Segmental rearrangements led to sequence changes, which were further enhanced by the insertion of different transposable elements. Changes in gene order affected not only z1C genes but also three unrelated genes. However, the orthologous interval between two subspecies of rice (not rice cultivars) was conserved in length and gene order, whereas changes between two maize inbreds were as drastic as changes between maize and sorghum. Given that chromosomes could conceivably consist of intervals of haplotypes that are highly diverged, one could envision endless breeding opportunities because of their linear arrangement along a chromosome and their expression potential in hybrid combinations ("binary" systems). The implication of such a hypothesis for heterosis is discussed.
Song, R, Llaca V, Messing J.  2002.  Mosaic organization of orthologous sequences in grass genomes. Genome Res. 12:1549-55.
Although comparative genetic mapping studies show extensive genome conservation among grasses, recent data provide many exceptions to gene collinearity at the DNA sequence level. Rice, sorghum, and maize are closely related grass species, once sharing a common ancestor. Because they diverged at different times during evolution, they provide an excellent model to investigate sequence divergence. We isolated, sequenced, and compared orthologous regions from two rice subspecies, sorghum, and maize to investigate the nature of their sequence differences. This study represents the most extensive sequence comparison among grasses, including the largest contiguous genomic sequences from sorghum (425 kb) and maize (435 kb) to date. Our results reveal a mosaic organization of the orthologous regions, with conserved sequences interspersed with nonconserved sequences. Gene amplification, gene movement, and retrotransposition account for the majority of the nonconserved sequences. Our analysis also shows that gene amplification is frequently linked with gene movement. Analyzing an additional 2.9 Mb of genomic sequence from rice not only corroborates our observations, but also suggests that a significant portion of grass genomes may consist of paralogous sequences derived from gene amplification. We propose that sequence divergence started from hotspots along chromosomes and expanded by accumulating small-scale genomic changes during evolution.