Amplification of prolamin storage protein genes in different subfamilies of the Poaceae.
Theor Appl Genet. Abstract
Prolamins are seed storage proteins in cereals and represent an important source of essential amino acids for feed and food. Genes encoding these proteins resulted from dispersed and tandem amplification. While previous studies have concentrated on protein sequences from different grass species, we can now add a new perspective to their relationships by asking how their genes are shared by ancestry and copied in different lineages of the same family of species. These differences are derived from alignments of chromosomal regions, where collinearity is used to identify prolamin genes in syntenic positions, also called orthologous gene copies. New, or paralogous, gene copies are inserted in tandem or at new locations in the same genome. More importantly, one can detect the loss of older genes. We analyzed chromosomal intervals containing prolamin genes from rice, sorghum, wheat, barley, and Brachypodium, representing different subfamilies of the Poaceae. The Poaceae, commonly known as the grasses, include three major subfamilies: the Ehrhartoideae (rice), Pooideae (wheat, barley, and Brachypodium), and Panicoideae (millets, maize, sorghum, and switchgrass). Based on chromosomal position and sequence divergence, it becomes possible to infer the order of gene amplification events. Furthermore, the loss of older genes in different subfamilies seems to permit a faster pace of divergence of paralogous genes. Changes in protein structure affect the proteins' physical properties, subcellular location, and amino acid composition. On the other hand, regulatory sequence elements and corresponding transcriptional activators of new gene copies are more conserved than coding sequences, consistent with the tissue-specific expression of these genes.
Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution.
Genome Res. 20:1545-57. Abstract
The comparison of the chromosome numbers of today's species with common reconstructed paleo-ancestors has led to intense speculation about how chromosomes have been rearranged over time in mammals. However, similar studies in plants with respect to genome evolution, as well as the molecular mechanisms leading to mosaic synteny blocks, have been lacking due to a lack of relevant examples of evolutionary zooms from genomic sequences. Such studies require genomes of species that belong to the same family but have diverged enough to fall into different subfamilies. Our most important crops belong to the family of the grasses, where a number of genomes have now been sequenced. Based on detailed paleogenomics, using inference from n = 5-12 grass ancestral karyotypes (AGKs) in terms of gene content and order, we delineated sequence intervals comprising a complete set of junction break points of orthologous regions from rice, maize, sorghum, and Brachypodium genomes, representing three different subfamilies and different polyploidization events. By focusing on these sequence intervals, we could show that the chromosome number variation/reduction from the n = 12 common paleo-ancestor was driven by nonrandom centric double-strand break repair events. It appeared that centromeric/telomeric illegitimate recombination between nonhomologous chromosomes led to nested chromosome fusions (NCFs) and synteny break points (SBPs). When intervals comprising NCFs were compared in their structure, we concluded that SBPs (1) were meiotic recombination hotspots, (2) corresponded to high sequence turnover loci through repeat invasion, and (3) might be considered as hotspots of evolutionary novelty that could act as a reservoir for producing adaptive phenotypes.