ABSTRACT
The competitiveness of peanuts in domestic and global markets has been threatened by losses in productivity and quality that are attributed to diseases, pests, environmental stresses and allergy or food safety issues. Narrow genetic diversity and a deficiency of polymorphic DNA markers have severely hindered construction of dense genetic maps and quantitative trait loci (QTL) mapping, both of which are needed to deploy linked markers in marker-assisted peanut improvement. The U.S. Peanut Genome Initiative (PGI) was launched in 2004, and expanded to a global effort in 2006 to address these issues through coordination of international efforts in genome research beginning with molecular marker development and improvement of map resolution and coverage. Ultimately, a peanut genome sequencing project was launched in 2012 by the Peanut Genome Consortium (PGC). We review the progress in accelerated development of peanut genomic resources, such as generation of expressed sequence tags (ESTs) (252,832 ESTs as of December 2012 in the public NCBI EST database), development of molecular markers (over 15,518 SSRs), and construction of peanut genetic linkage maps, in particular for cultivated peanut. Several consensus genetic maps have been constructed, and there are examples of recent international efforts to develop high density maps. An international reference consensus genetic map was developed recently with 897 marker loci based on 11 published mapping populations. Furthermore, a high-density integrated consensus map of cultivated peanut and wild diploid relatives also has been developed, which was enriched further with 3693 marker loci on a single map by adding information from five new genetic mapping populations to the published reference consensus map.
The world's population is predicted to reach nine billion by 2050 (Nature Editorial, 2010), which means greater demand for food, hence, a continuing need to produce improved cultivars of crop plants. Advances in food production will require greater efforts in agricultural research to increase crop yield with improved genetics for plant protection from biotic and abiotic stresses. Agricultural biotechnology is one tool that holds great promise for sustainability of agricultural production and feeding an ever increasing population.
Peanut (Arachis hypogaea L.), or groundnut, is one of the major economically important legumes that is cultivated worldwide for its ability to grow in semi-arid environments with relatively low inputs of chemical fertilizers. On a global basis, peanut also is a major source of protein and vegetable oil for human nutrition, containing about 28% protein, 50% oil and 18% carbohydrates. Peanut is cultivated in more than 100 countries in Asia, Africa and the Americas, grown mostly by resource-limited farmers of the semi-arid regions. India and China together produce almost two-thirds of the world's peanuts, and the U.S. produces about 6% (Guo et al., 2012a). Nearly two-thirds of global production is crushed for oil and the remaining third is consumed as food. Peanut plays important roles in food and nutritional security along with improving the livelihood of resource-poor farmers. Peanut production has a significant role in sustainable agriculture, global food security and nutrition, fuel and energy, and enhanced agricultural productivity.
Genetics and genomics have substantial potential to enhance sustainable peanut production. The major contribution of these technologies for peanut will likely be improved disease resistance, oil quality, and enhanced productivity. Those attributes may be achieved more effectively by using genomic biotechnology to exploit the genetic resources preserved in germplasm collections, maximizing the genetic potential in plant breeding and genetics programs. Superior varieties will maximize desirable genetic traits and provide growers with cultivars that are locally adapted and highly productive. Genomics involves the study of the complete genetic makeup of plants, through mapping, sequencing, and functional studies to identify genes that regulate, control and modify trait expression. Together, plant breeding, genetics, and genomics are powerful approaches to enhance sustainability of agriculture. However, genomic research in peanut lags behind that of other crops due to the shortage of essential genomic infrastructure, tools, and resources. In addition, the cultivated peanut is an allotetraploid with a large genome size, which greatly complicates interpretation of genomic data because of genome duplication.
Recognizing the challenges and importance of this crop, the peanut research community established the Peanut Genome Initiative (PGI). This review focuses on updates and accomplishments of the PGI in three areas: (1) a brief chronology of recent efforts in peanut genomics; (2) recent developments in molecular markers; and (3) recent advances in genetic linkage maps in diploid wild relatives and cultivated peanut.
Brief History of the Peanut Genome Initiative
In light of the challenges and opportunities facing cool and warm season legume crops, the international research community has cooperated to develop new genomic technologies for legume crop improvement. These efforts were initiated at a meeting at Hunt Valley, MD on 30–31 July 2001. Twenty-six legume scientists with knowledge of structural and functional genomics, DNA markers, transformation, bioinformatics, and legume crop improvement participated in a workshop hosted by the United Soybean Board, the National Peanut Foundation, the USA Dry Pea and Lentil Council, and the USDA-ARS to develop a strategy to advance genomics research across five economically important legume species. The group of scientists published the U.S. Legume Crops Genomics White Paper (Boerma et al., 2001) that outlined six areas where progress was needed across all species: (i) genome sequencing of strategic legume species, (ii) physical map development and refinement, (iii) functional analysis, (iv) development of DNA markers for comparative mapping and breeding, (v) characterization and utilization of legume biodiversity, and (vi) development of legume data resources. This meeting was followed by a workshop in Santa Fe, NM, in 2003, where nearly 50 legume researchers and funding agency representatives met to develop a plan for cross-legume genomics research and an action plan for legume research (Gepts et al., 2005). The peanut scientific community participated in both workshops. These scientists published the status of genomic resources for each legume crop, including peanut, in the book, Legume Crop Genomics (Wilson et al., 2004) under the auspices of the U.S. Legume Crop Genome Initiative (LCGI).
On 22–23 March 2004, U.S. and international peanut scientists participated in a workshop hosted by the Peanut Foundation/American Peanut Council in Atlanta, GA (Fig. 1). A National Strategic Plan for the Peanut Genome Initiative (Wilson, 2006b) was developed that outlined six objectives for the years 2004–2008: (1) improve the utility of genetic tools for peanut genomic research and develop useful molecular markers and genetic maps for peanut, (2) improve the efficacy of technology for gene manipulation in genomes and develop useful transformation methods for functional genomic research in peanut, (3) develop a framework for assembling the peanut genetic blueprint and locate abundant and rarely expressed genes, using genetic and physical approaches to integrate diverse data types, (4) improve knowledge of gene identification and regulation, (5) provide bioinformatics management of peanut biological information resources, and (6) determine the allergenic potential of peanut proteins. An action plan summarized in the white paper National Program Action Plan for the Peanut Genome Initiative soon followed (Wilson, 2006a); and in 2006 an assessment of costs associated with genomic research was presented in the Biotech Peanut White Paper “Benefits and Issues” (Valentine et al., 2006).
In 2006, the PGI sought to expand its mission through outreach to the international peanut research community. The foundation for this effort was established in November 2006 in Guangzhou, China at the International Conference on Aflatoxin Management and Genomics where delegates from nine nations voted to maintain an open dialog to explore opportunities for cooperative research, and to take steps toward achieving that goal with annual meetings. A PGI proposal was accepted to host the second conference of the international peanut research community on 24–26 October 2007 in Atlanta, GA. This meeting, Advances in Arachis through Genomics & Biotechnology: An International Strategic Planning Workshop, was another step toward bringing members of the international peanut community together to foster research collaboration on high priority issues. The International Strategic Plan for the Peanut Genome Initiative 2008–2012: Improving Crop Productivity, Protection, and Product Safety & Quality was developed at this workshop.
Since then the tradition of excellence that was established in Guangzhou has been upheld at subsequent meetings including Advances in Arachis through Genomics & Biotechnology (AAGB-2008) at International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) in India, AAGB-2009 in Mali, and AAGB-2011 in Brazil. The Peanut Foundation and the American Peanut Council on behalf of the peanut research community discussed pursuing a peanut whole genome sequencing project on 12 July 2010, at Clearwater, FL, leading to the Peanut Genome Project Inaugural Meeting in Atlanta, 8 December 2010. On 12 January 2011, at the Plant and Animal Genome Conference, San Diego, CA, the executive committee tentatively decided to sequence four peanut cultivars and 200 recombinant inbred line (RIL) progenies derived from these parents (Qin et al., 2012) in collaboration with Chinese peanut researchers. The official launch of the International Peanut Genome Mapping Project was discussed at AAGB-2011 in Brasilia, Brazil, and the International Peanut Genomic Research Initiative - Strategic Plan for 2012–2016 was developed (Fig. 2).
Sequencing and assembly strategies were discussed and adopted on 28 March 2012 in Atlanta (Fig. 1). The Peanut Genome Consortium (PGC) is an extension of the International Peanut Genome Initiative (IPGI), which has more than 135 members representing 79 institutions from 20 countries, and is embodied by a coalition of international scientists and stakeholders engaged in the Peanut Genome Project (PGP). The chronology and progress of that effort were documented in the International Peanut Mapping Project. AAGB-2013 will be held in Zhengzhou, China, on 17–21 June 2013 and will be a forum to foster and align research activities with the International Peanut Genomic Research Initiative - Strategic Plan for 2012–2016 (Fig. 2).
Recent Development in Molecular Markers
In the early 1990s, peanut proteins were used as markers to identify genetic diversity among cultivated peanut and its wild diploid relatives; however, limited polymorphisms were detected (Lacks and Stalker, 1993). A greater level of polymorphism was discovered among wild species as compared to cultivated peanut (Lanham et al., 1994) suggesting domestication was a bottleneck. Advances in other marker types, such as randomly amplified polymorphic DNA (RAPD) (Halward and Stalker, 1991; Halward et al., 1992), restriction fragment length polymorphism (RFLP) (Halward and Stalker, 1991; Kochert et al., 1996; Gimenes et al., 2002; Burow et al., 2009), amplified fragment length polymorphism (AFLP) (Herselman et al., 2004), simple sequence repeat (SSR) (He et al., 2003; Gimenes et al., 2007; Cuc et al., 2008; Liang et al., 2009), sequence-related amplified polymorphism (SRAP) (Wang et al., 2010), single strand conformational polymorphism (SSCP) (Nagy et al., 2010), and single nucleotide polymorphism (SNP) (Nagy et al., 2012), soon replaced the early exploration with proteins.
During the past two decades, much effort has been made to develop genetic and genomic tools in cultivated peanut, such as construction of BAC libraries (Yuksel and Paterson, 2005; Guimarães et al., 2008), cDNA libraries (Luo et al., 2005; Proite et al., 2007; Guo et al., 2008, 2009; Koilkonda et al., 2012), RNAseq using next generation sequencing technology (Guimaraes et al., 2012; Zhang et al., 2012) and development of DNA markers (see reviews of Feng et al., 2012; Pandey et al., 2012; Zhao et al., 2012; Varshney et al., 2013) (Table 1). Among the various molecular markers investigated to date, simple sequence repeats (SSRs) have emerged as one of the preferred DNA marker systems for conducting genetic and genomic studies in cultivated peanut. In recent years, a relatively large number of EST sequences have been made available in the National Center for Biotechnology Information (NCBI) public database for Arachis, including the cultivated species and its wild relatives, particularly after the historic 2004 Atlanta Genomics Workshop where ESTs were identified as a priority for marker development (Feng et al., 2012). As of December 2012, the international peanut research community had deposited 252,832 ESTs in the public NCBI EST database, and this resource has been providing the tools needed for genome-scale experiments before completion of the whole genome sequencing project (Payton et al., 2009; Guo et al., 2011). To date, 15,518 SSRs have been developed by various research groups, although some are likely to be redundant (Table 1).
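To make concrete what mining SSRs from EST sequences involves, the following is a minimal Python sketch of a microsatellite scan. It is an illustration only and not the pipeline used by any of the groups cited here: the function name `find_ssrs`, the motif length range (2 to 6 bp), and the minimum of five tandem repeats are all assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Thresholds are illustrative assumptions, not values from the cited studies.
    """
    seq = seq.upper()
    # A motif of min_unit..max_unit bases followed by at least
    # (min_repeats - 1) further copies of the same motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip runs of a single base (e.g. "AA" repeated), which are
        # homopolymers rather than di- to hexanucleotide repeats.
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical sequence carrying an (AT)5 and an (AG)6 repeat.
example = "GGC" + "AT" * 5 + "CCT" + "AG" * 6 + "TT"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Primer pairs would then be designed against the sequence flanking each hit; whether a given SSR is polymorphic can only be established by testing it on a panel of genotypes.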
Several reviews have recently summarized progress in peanut genetics and genomics tool and resource development (see reviews of Feng et al., 2012; Pandey et al., 2012; Zhao et al., 2012; Varshney et al., 2013). Feng et al. (2012) reported on EST progress and application (Fig. 1). Pandey et al. (2012) and Varshney et al. (2013) reported that the last five years have witnessed accelerated expansion of genomic resources such as development of molecular markers, genetic and physical maps, generation of ESTs, development of mutant resources, and functional genomics platforms that facilitate the identification of QTLs and discovery of genes associated with tolerance/resistance to abiotic and biotic stresses and agronomic traits. Molecular breeding efforts have been initiated for several traits for development of superior genotypes (Chu et al., 2011; Pasupuleti et al., 2013). In summary, 15,518 SSR markers were generated during the past decade, particularly in the last five years. It is anticipated that these informative markers will be useful to accelerate molecular genetics and breeding in cultivated peanut.
Recently, Zhao et al. (2012) compiled a list of 9274 SSRs in both cultivated and wild peanut species and integrated various research reports of peanut DNA polymorphism into a single platform. Zhao et al. (2012) also identified 1343 (14.5%) of these as detecting polymorphisms in a panel of eight cultivated peanut genotypes.
Next-generation sequencing (NGS) technology has provided a powerful approach for analyzing the transcriptome. Although it is still a challenge to assemble a whole complex genome using NGS, the technology has facilitated cost-effective, large-scale generation of ESTs. Since the reports of Pandey et al. (2012) and Zhao et al. (2012), there have been two publications using NGS to generate EST-SSRs (Zhang et al., 2012; Guimaraes et al., 2012) (Table 1) and one using it for SNP development (Nagy et al., 2012). Zhang et al. (2012) used Illumina HiSeq™ 2000 to analyze the transcriptomes of the immature seeds of three peanut varieties with different oil contents. A total of 26.1–27.2 million paired-end reads were generated and 59,077 unigenes were assembled with an N50 of 823 bp. In addition, 3919 microsatellite markers were developed from the unigene library, and 160 PCR primers for SSR loci were used for validation. Guimaraes et al. (2012) used 454 GS FLX Titanium to generate a total of 7.4 × 10^5 raw sequence reads covering 211 Mbp of two A-genome species (A. stenosperma and A. duranensis). High quality reads were assembled into 7723 contigs for A. stenosperma and 12,792 for A. duranensis. This data set was used to design a total of 2325 EST-SSRs, of which a subset of 584 amplified in both species and 214 were shown to be polymorphic using ePCR.
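For readers unfamiliar with the N50 statistic quoted for the unigene assembly, the short Python sketch below shows how it is conventionally computed: the sequence lengths are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the summed length. The function name `n50` and the example lengths are hypothetical; the assembly software used in the cited study may apply its own conventions.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, not data from any cited assembly.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

An N50 of 823 bp therefore means that at least half of the assembled bases lie in unigenes of 823 bp or longer.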
Nagy et al. (2012) reported a high-density genetic map of the diploid species A. duranensis based on SNPs mined from de novo EST sequence generated using Sanger and 454 GS-FLX technologies. More than one million EST sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. In addition, 1236 EST-SNP markers were developed by mining this dataset between two A. duranensis accessions, and an additional 300 SNP markers were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map (Nagy et al., 2012).
Markers have been developed for peanut based on Miniature Inverted-Repeat Transposable Elements (MITEs) (Shirasawa et al., 2012a) where 504 primer pairs were designed against both flanking sequences of each AhMITE1. Of the primer pairs designed, 240 and 171 generated single and double DNA bands, respectively. The double bands likely were derived from homoeologous regions in the A and B genomes. Of the 411 primer pairs that produced amplicons, 169 showed polymorphisms between the four tested peanut lines.
Recent Advancement in Genetic Linkage Maps
One of the major applications for molecular markers is construction of genetic linkage maps which is required for QTL studies. A more detailed linkage map of all chromosomes and with sufficient markers is necessary for QTL analysis and marker-assisted breeding. Over the past few years an international effort has resulted in progress for developing genetic maps in diploid wild relatives and cultivated peanut (Pandey et al., 2012; Varshney et al., 2013).
The narrow genetic base of cultivated peanut has provided a substantial obstacle to genetic mapping using only cultivated peanuts. Therefore, the initial maps were made using crosses involving wild species. Subsequently, mapping in cultivated × cultivated crosses has advanced considerably (Table 2). The first genetic linkage map of peanut was developed using an F2 population of a cross between A-genome diploids A. stenosperma and A. cardenasii (Halward et al., 1993). The first interspecific hybrid peanut map showing introgression of wild species into the cultivated peanut was developed from A. hypogaea × A. cardenasii using RFLP and RAPD markers (Garcia et al., 1995). A map was later developed using a synthetic tetraploid constructed from Florunner × the synthetic amphidiploid TxAG-6 {A. batizocoi × [A. cardenasii × A. diogoi]}4x (Burow et al., 2001).
The first partial linkage map for A. hypogaea was constructed using an F2 population (Herselman et al., 2004), which had five linkage groups with 12 AFLP markers spanning 139 cM of the genome. The genetic maps of cultivated peanut published by Hong et al. (2008) and Varshney et al. (2009) were the first maps with reasonable numbers of markers. Hong et al. (2008) tested 1048 SSR primer pairs and mapped 131 SSR loci onto 20 linkage groups for a total length of 670 cM in a RIL population derived from the cultivars Yueyou 13 and Zhenzhuhei. Varshney et al. (2009) screened 1145 SSR markers and mapped 135 loci onto 22 linkage groups spanning 1271 cM in a RIL population developed from two parental genotypes, TAG 24 and ICGV 86031. Later a composite map containing 175 SSR markers in 22 linkage groups was developed from three cultivated crosses (Hong et al., 2010). Another composite map was constructed with 101 SSR markers in 17 linkage groups from four populations in China (Zhang, 2011).
The SSR-based cultivated genetic map with 135 marker loci developed by Varshney et al. (2009) was then further saturated to 191 SSR loci (Ravi et al., 2011). Two new partial genetic maps with 56 (TAG 24 × GPBD 4) and 45 (TG 26 × GPBD 4) marker loci (Khedikar et al., 2010; Sarvamangala et al., 2011) were constructed covering a genetic distance of 462.24 and 657.9 cM, respectively. These two maps were then saturated with enhanced genome coverage up to 188 (1922.4 cM) and 181 (1963 cM) marker loci, respectively, along with construction of a consensus map based on these two populations segregating for foliar disease resistance with 225 SSR loci and a total map distance of 1152.9 cM (Sujay et al., 2012). In addition to these maps, two additional genetic maps based on RIL populations segregating for traits related to drought tolerance, namely ICGS 76 × CSMG 84-1 (119 SSR loci) and ICGS 44 × ICGS 76 (82 SSR loci), were developed with genome coverage of 2208.2 cM and 831.4 cM, respectively (Gautami et al., 2012a). Since the above three populations (TAG 24 × ICGV 86031, ICGS 76 × CSMG 84-1 and ICGS 44 × ICGS 76) were segregating for drought tolerance related traits, a consensus map (2840.8 cM) with 293 SSR loci was developed. All the parental lines were cultivated genotypes, except for GPBD 4, which is predominantly cultivated with some A. cardenasii parentage derived through doubling the chromosome number of triploid interspecific hybrids to produce hexaploids and then selfing through several generations to recover 40-chromosome progenies (Gowda et al., 2002; Smartt et al., 1978).
More recently, Qin et al. (2012) screened a total of 4576 markers and identified 260 and 181 polymorphic markers, respectively, for the two RIL populations Tifrunner × GT-C20 (T population) and SunOleic 97R × NC94022 (S population). Individual genetic maps were constructed for T and S populations with 236 and 172 marker loci, respectively. An integrated map was then constructed with 324 marker loci covering 1352 cM genetic distance (Qin et al., 2012). In addition to the SSRs mentioned above, Wang et al. (2012) have also used 1152 SSRs mined from 36,314 BAC-end sequences (BES) and constructed a genetic map with a total of 318 markers covering 1674.4 cM in a single mapping population. For the creation of the highest density map of a single population of cultivated peanut to date, Shirasawa et al. (2012b) reported the use of in silico analysis of DNA sequence data from the parental lines, which increased the efficiency of polymorphic marker development by more than three-fold. In total, 926 (34.2%) of 2702 markers showed polymorphisms between parental lines of the mapping population. Linkage analysis of the 926 markers along with 253 polymorphic markers selected from 4449 published markers generated 21 linkage groups covering 2166.4 cM with 1114 loci (Shirasawa et al., 2012b).
In spite of these efforts, development of composite maps based on two or three individual maps has not sufficiently integrated genetic information across studies. An international effort was initiated to place the maximum number of markers on the same genetic map through integrating markers from all individual genetic maps published to date. Marker information from one backcross (BC) population (Foncéka et al., 2009) was also included in the development of a reference consensus map along with 10 other individual genetic maps developed from RIL populations. The first international reference consensus genetic map was constructed with 897 marker loci. These 897 marker loci (895 SSRs and 2 CAPS) could be mapped on 20 linkage groups spanning a total map distance of 3607.97 cM with an average map density of 3.94 cM. More interestingly, this reference consensus genetic map was divided into 203 bins of 20 cM, each of which carries one to 20 loci, with an average of four marker loci per bin. Furthermore, soon after the first dense consensus map was published by Gautami et al. (2012b), another joint international research effort resulted in a much improved consensus genetic map based on 16 mapping populations. The mapping information from five new genetic maps was utilized to improve the earlier consensus map from 897 marker loci to 3693 marker loci spanning 2651 cM of the genome and 20 linkage groups (Shirasawa et al., 2013). These dense consensus maps will have greater impact on peanut genetic studies and improvement because of potential applications such as aligning genetic and physical maps, QTL analysis, studying genetic background effects on QTL expression, comparative mapping, and other genetic and molecular breeding research in peanut.
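The division of a map into fixed-width bins can be illustrated with a short Python sketch. It assumes, hypothetically, that marker positions on one linkage group are available as a list of cM values; the function names and the example positions are invented for illustration and are not taken from the consensus map itself.

```python
from collections import Counter


def bin_loci(positions_cm, bin_size_cm=20.0):
    """Count marker loci per bin; bin k covers [k * size, (k + 1) * size) cM."""
    counts = Counter(int(pos // bin_size_cm) for pos in positions_cm)
    return dict(sorted(counts.items()))


def mean_loci_per_bin(bin_counts):
    """Average number of loci across the bins that contain at least one locus."""
    return sum(bin_counts.values()) / len(bin_counts)


# Hypothetical marker positions (cM) on a single linkage group.
positions = [0.0, 5.2, 19.9, 20.0, 41.5]
counts = bin_loci(positions)
print(counts)
print(mean_loci_per_bin(counts))
```

Bins of this kind make it straightforward to pick a small, evenly spaced subset of markers for genome-wide coverage, for example one locus per bin.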
Summary
Lack of sufficient molecular markers, genetic linkage maps, and comparative genome sequences for peanut severely hampers peanut genetic improvement efforts as well as marker-trait association studies and functional validation. The international peanut community has come a long way toward achieving significant accomplishments in marker and genetic map development. As SSRs are the markers of choice for many genetic mapping studies, the mapped SSR loci are useful not only for trait-gene/marker association and QTL analysis but also for allocating associated QTLs to map-based cloning of functional gene(s) where numerous ESTs (that could be potential candidate genes) are currently placed on the map. With the development of next-generation SNP markers and completion of whole genome sequences for cultivated peanut and wild relatives anticipated in the near future, the “Orphan Legume Genome Whose Time Has Come” will soon be realized, as reported in this American Peanut Research and Education Society 2012 symposium. One demonstration of the power of marker-assisted breeding in peanut is the conversion of peanut cultivar ‘Tifguard’ (Holbrook et al., 2008) into ‘high oleic Tifguard’ in 26 months (Chu et al., 2011). Additionally, the collaborative and coordinated efforts of the international peanut community since 2004 have contributed to development of large-scale genomic resources and tools to tap into the rich resource of germplasm collections for improvement of peanut for sustainable production, quality, pest resistance and water use efficiency. With the establishment of NGS technology platforms and cost reduction for DNA sequencing, whole genome sequencing and re-sequencing will become a routine task for crop research and improvement in the near future (Varshney and May, 2012). The main issue will then be analyzing the data and translating the information into peanut breeding and improvement through discovery of the genes governing important traits and the molecular markers associated with them.
Acknowledgments
We would like to express our appreciation to U.S. Peanut Industries and the leadership of the Peanut Foundation and American Peanut Council, particularly to Howard Valentine for his efforts to make this PGI and PGC a reality. We also thank Dr. Howard Shapiro of Mars Inc. for initiating discussions with BGI and opening the way for collaborations with Chinese researchers. We are grateful for the financial support from USDA-ARS, Peanut Foundation, Georgia Peanut Commission and U.S. National Peanut Board, Mars Inc., Chinese genome collaborators (Henan Academy of Ag. Sci., Oil Crops Research Institute of Chinese Academy of Ag. Sci., and Hi-Tech Center of Shandong Academy of Ag. Sci.), National Fund of the Basic and Strategic Research in Agriculture (NFBSRA) of the Indian Council of Agriculture Research (ICAR), and the Generation Challenge Programme of CGIAR.