Genome Structure

Messing, J, Llaca V.  1998.  Importance of anchor genomes for any plant genome project. Proceedings of the National Academy of Sciences of the United States of America. 95:2017-20.
Progress in agricultural and environmental technologies is hampered by a slower rate of gene discovery in plants than animals. The vast pool of genes in plants, however, will be an important resource for insertion of genes, via biotechnological procedures, into an array of plants, generating unique germ plasms not achievable by conventional breeding. It just became clear that genomes of grasses have evolved in a manner analogous to Lego blocks. Large chromosome segments have been reshuffled and stuffer pieces added between genes. Although some genomes have become very large, the genome with the fewest stuffer pieces, the rice genome, is the Rosetta Stone of all the bigger grass genomes. This means that sequencing the rice genome as anchor genome of the grasses will provide instantaneous access to the same genes in the same relative physical position in other grasses (e.g., corn and wheat), without the need to sequence each of these genomes independently. (i) The sequencing of the entire genome of rice as anchor genome for the grasses will accelerate plant gene discovery in many important crops (e.g., corn, wheat, and rice) by several orders of magnitudes and reduce research and development costs for government and industry at a faster pace. (ii) Costs for sequencing entire genomes have come down significantly. Because of its size, rice is only 12% of the human or the corn genome, and technology improvements by the human genome project are completely transferable, translating in another 50% reduction of the costs. (iii) The physical mapping of the rice genome by a group of Japanese researchers provides a jump start for sequencing the genome and forming an international consortium. Otherwise, other countries would do it alone and own proprietary positions.
Llaca, V, Messing J.  1998.  Amplicons of maize zein genes are conserved within genic but expanded and constricted in intergenic regions. The Plant journal : for cell and molecular biology. 15:211-20.
The 78,101 base pair long sequence of a cluster of 22-kDa alpha zein genes in the maize inbred BSSS53 was determined. Each zein gene is contained within a repeat unit that varies in length. If such a repeat, or amplicon, is aligned along the entire sequence, a 10.5-fold sequence amplification is delineated. Because of insertions and deletions in intergenic regions, many of the zein genes are spaced over different distances. Only three out of 10 zein-related sequences have an intact open reading frame, indicating an unusual large number of genes unable to contribute to the accumulation of normal-size 22-kDa zein proteins. It is proposed that the seven remaining zein-related sequences be considered gene reserves because of their potential to be restored by gene conversion. Intergenic insertions in the cluster range from 1098 to 14,896 base pairs. Although they are composed of transposable element sequences, they also contain additional open reading frames, two of them showing homology to rice cDNA sequences. The average amplicon is 4423 base pairs long, with the sequence surrounding each zein gene more than 90% conserved. Coincidently, the size of the amplicon is equivalent to the average gene density (one gene within 4640 bp) in the Arabidopsis thaliana genome, one of the smallest in plants. Successive steps of amplification and insertion of DNA might explain to a certain degree how genome size variation has been generated in plants.
International Brachypodium Initiative.  2010.  Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 463:763-8.
Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.
Goettel, W, Messing J.  2009.  Change of gene structure and function by non-homologous end-joining, homologous recombination, and transposition of DNA. PLoS Genet. 5:e1000516.
An important objective in genome research is to relate genome structure to gene function. Sequence comparisons among orthologous and paralogous genes and their allelic variants can reveal sequences of functional significance. Here, we describe a 379-kb region on chromosome 1 of maize that enables us to reconstruct chromosome breakage, transposition, non-homologous end-joining, and homologous recombination events. Such a high-density composition of various mechanisms in a small chromosomal interval exemplifies the evolution of gene regulation and allelic diversity in general. It also illustrates the evolutionary pace of changes in plants, where many of the above mechanisms are of somatic origin. In contrast to animals, somatic alterations can easily be transmitted through meiosis because the germline in plants is contiguous to somatic tissue, permitting the recovery of such chromosomal rearrangements. The analyzed region contains the P1-wr allele, a variant of the genetically well-defined p1 gene, which encodes a Myb-like transcriptional activator in maize. The P1-wr allele consists of eleven nearly perfect P1-wr 12-kb repeats that are arranged in a tandem head-to-tail array. Although a technical challenge to sequence such a structure by shotgun sequencing, we overcame this problem by subcloning each repeat and ordering them based on nucleotide variations. These polymorphisms were also critical for recombination and expression analysis in presence and absence of the trans-acting epigenetic factor Ufo1. Interestingly, chimeras of the p1 and p2 genes, p2/p1 and p1/p2, are framing the P1-wr cluster. Reconstruction of sequence amplification steps at the p locus showed the evolution from a single Myb-homolog to the multi-gene P1-wr cluster. It also demonstrates how non-homologous end-joining can create novel gene fusions. Comparisons to orthologous regions in sorghum and rice also indicate a greater instability of the maize genome, probably due to diploidization following allotetraploidization.
Bruggmann, R, Bharti AK, Gundlach H, Lai J, Young S, Pontaroli AC, Wei F, Haberer G, Fuks G, Du C et al.  2006.  Uneven chromosome contraction and expansion in the maize genome. Genome research. 16:1241-51.
Maize (Zea mays or corn), both a major food source and an important cytogenetic model, evolved from a tetraploid that arose about 4.8 million years ago (Mya). As a result, maize has extensive duplicated regions within its genome. We have sequenced the two copies of one such region, generating 7.8 Mb of sequence spanning 17.4 cM of the short arm of chromosome 1 and 6.6 Mb (25.6 cM) from the long arm of chromosome 9. Rice, which did not undergo a similar whole genome duplication event, has only one orthologous region (4.9 Mb) on the short arm of chromosome 3, and can be used as reference for the maize homoeologous regions. Alignment of the three regions allowed identification of syntenic blocks, and indicated that the maize regions have undergone differential contraction in genic and intergenic regions and expansion by the insertion of retrotransposable elements. Approximately 9% of the predicted genes in each duplicated region are completely missing in the rice genome, and almost 20% have moved to other genomic locations. Predicted genes within these regions tend to be larger in maize than in rice, primarily because of the presence of predicted genes in maize with larger introns. Interestingly, the general gene methylation patterns in the maize homoeologous regions do not appear to have changed with contraction or expansion of their chromosomes. In addition, no differences in methylation of single genes and tandemly repeated gene copies have been detected. These results, therefore, provide new insights into the diploidization of polyploid species.
Haberer, G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B et al.  2005.  Structure and architecture of the maize genome. Plant physiology. 139:1612-24.
Maize (Zea mays or corn) plays many varied and important roles in society. It is not only an important experimental model plant, but also a major livestock feed crop and a significant source of industrial products such as sweeteners and ethanol. In this study we report the systematic analysis of contiguous sequences of the maize genome. We selected 100 random regions averaging 144 kb in size, representing about 0.6% of the genome, and generated a high-quality dataset for sequence analysis. This sampling contains 330 annotated genes, 91% of which are supported by expressed sequence tag data from maize and other cereal species. Genes averaged 4 kb in size with five exons, although the largest was over 59 kb with 31 exons. Gene density varied over a wide range from 0.5 to 10.7 genes per 100 kb and genes did not appear to cluster significantly. The total repetitive element content we observed (66%) was slightly higher than previous whole-genome estimates (58%-63%) and consisted almost exclusively of retroelements. The vast majority of genes can be aligned to at least one sequence read derived from gene-enrichment procedures, but only about 30% are fully covered. Our results indicate that much of the increase in genome size of maize relative to rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) is attributable to an increase in number of both repetitive elements and genes.
Lai, J, Li Y, Messing J, Dooner HK.  2005.  Gene movement by Helitron transposons contributes to the haplotype variability of maize. Proceedings of the National Academy of Sciences of the United States of America. 102:9068-73.
Different maize inbred lines are polymorphic for the presence or absence of genic sequences at various allelic chromosomal locations. In the bz genomic region, located in 9S, sequences homologous to four different genes from rice and Arabidopsis are present in line McC but absent from line B73. It is shown here that this apparent intraspecific violation of genetic colinearity arises from the movement of genes or gene fragments by Helitrons, a recently discovered class of eukaryotic transposons. Two Helitrons, HelA and HelB, account for all of the genic differences distinguishing the two bz locus haplotypes. HelA is 5.9 kb long and contains sequences for three of the four genes found only in the McC bz genomic region. A nearly identical copy of HelA was isolated from a 5S chromosomal location in B73. Both the 9S and 5S sites appear to be polymorphic in maize, suggesting that these Helitrons have been active recently. Helitrons lack the strong predictive terminal features of other transposons, so the definition of their ends is greatly facilitated by the identification of their vacant sites in Helitron-minus lines. The ends of the 2.7-kb HelB Helitron were discerned from a comparison of the McC haplotype sequence with that of yet a third line, Mo17, because the HelB vacant site is deleted in B73. Maize Helitrons resemble rice Pack-MULEs in their ability to capture genes or gene fragments from several loci and move them around the genome, features that confer on them a potential role in gene evolution.
Lai, J, Dey N, Kim CS, Bharti AK, Rudd S, Mayer KF, Larkins BA, Becraft P, Messing J.  2004.  Characterization of the maize endosperm transcriptome and its comparison to the rice genome. Genome research. 14:1932-7.
The cereal endosperm is a major organ of the seed and an important component of the world's food supply. To understand the development and physiology of the endosperm of cereal seeds, we focused on the identification of genes expressed at various times during maize endosperm development. We constructed several cDNA libraries to identify full-length clones and subjected them to a twofold enrichment. A total of 23,348 high-quality sequence-reads from 5'- and 3'-ends of cDNAs were generated and assembled into a unigene set representing 5326 genes with paired sequence-reads. Additional sequencing yielded a total of 3160 (59%) completely sequenced, full-length cDNAs. From 5326 unigenes, 4139 (78%) can be aligned with 5367 predicted rice genes and by taking only the "best hit" be mapped to 3108 positions on the rice genome. The 22% unigenes not present in rice indicate a rapid change of gene content between rice and maize in only 50 million years. Differences in rice and maize gene numbers also suggest that maize has lost a large number of duplicated genes following tetraploidization. The larger number of gene copies in rice suggests that as many as 30% of its genes arose from gene amplification, which would extrapolate to a significant proportion of the estimated 44,027 candidate genes of its entire genome. Functional classification of the maize endosperm unigene set indicated that more than a fourth of the novel functionally assignable genes found in this study are involved in carbohydrate metabolism, consistent with its role as a storage organ.
Lai, J, Ma J, Swigonova Z, Ramakrishna W, Linton E, Llaca V, Tanyolac B, Park YJ, Jeong OY, Bennetzen JL et al.  2004.  Gene loss and movement in the maize genome. Genome research. 14:1924-31.
Maize (Zea mays L. ssp. mays), one of the most important agricultural crops in the world, originated by hybridization of two closely related progenitors. To investigate the fate of its genes after tetraploidization, we analyzed the sequence of five duplicated regions from different chromosomal locations. We also compared corresponding regions from sorghum and rice, two important crops that have largely collinear maps with maize. The split of sorghum and maize progenitors was recently estimated to be 11.9 Mya, whereas rice diverged from the common ancestor of maize and sorghum approximately 50 Mya. A data set of roughly 4 Mb yielded 206 predicted genes from the three species, excluding any transposon-related genes, but including eight gene remnants. On average, 14% of the genes within the aligned regions are noncollinear between any two species. However, scoring each maize region separately, the set of noncollinear genes between all four regions jumps to 68%. This is largely because at least 50% of the duplicated genes from the two progenitors of maize have been lost over a very short period of time, possibly as short as 5 million years. Using the nearly completed rice sequence, we found noncollinear genes in other chromosomal positions, frequently in more than one. This demonstrates that many genes in these species have moved to new chromosomal locations in the last 50 million years or less, most as single gene events that did not dramatically alter gene structure.