Context.—DNA sequencing is critical to identifying many human genetic disorders caused by DNA mutations, including cancer. Pyrosequencing is less complex, involves fewer steps, and has a superior limit of detection compared with Sanger sequencing. The fundamental basis of pyrosequencing is that pyrophosphate is released when a deoxyribonucleotide triphosphate is added to the end of a nascent strand of DNA. Because deoxyribonucleotide triphosphates are sequentially added to the reaction and because the pyrophosphate concentration is continuously monitored, the DNA sequence can be determined.
Objective.—To demonstrate the fundamental principles of pyrosequencing.
Data Sources.—Salient features of pyrosequencing are demonstrated using the free software program Pyromaker (http://pyromaker.pathology.jhmi.edu), through which users can input DNA sequences and other pyrosequencing parameters to generate the expected pyrosequencing results.
Conclusions.—We demonstrate how mutant and wild-type DNA sequences result in different pyrograms. Using pyrograms of established mutations in tumors, we explain how to analyze the pyrogram peaks generated by different dispensation sequences. Further, we demonstrate some limitations of pyrosequencing, including how some complex mutations can be indistinguishable from single base mutations. Pyrosequencing is the basis of the Roche 454 next-generation sequencer and many of the same principles also apply to the Ion Torrent hydrogen ion-based next-generation sequencers.
A variety of methods are available for sequencing DNA, but Sanger sequencing and pyrosequencing are 2 of the most commonly used today. Although both Sanger and Maxam-Gilbert DNA sequencing were invented in 1977, Sanger sequencing has become the most widely used method of DNA sequencing.1,2 The Sanger method is also known as terminator sequencing because DNA fragments of varying lengths are synthesized by incorporating both nucleotides and dideoxy terminators (deoxyribonucleotide triphosphates [dNTPs] and dideoxynucleotide triphosphates [ddNTPs], respectively). Random incorporation of the ddNTPs causes chain termination that produces DNA fragments of every possible length. In a more recent adaptation, each ddNTP (A, C, T, or G) carries a unique, fluorescent molecule, such that the extension products are both terminated and labeled with the appropriate fluorophore.3 Terminated products must be purified from unincorporated ddNTPs, and the fragments are subsequently separated by size using capillary electrophoresis, in which the terminal nucleotide of each fragment is detected by fluorescence at wavelengths unique to each of the terminators. Polymerase chain reaction (PCR) amplification and DNA sequencing reactions are most commonly run separately, although they can be combined into a single reaction.4,5 Read lengths have increased for Sanger sequencing, and 800-base reads can now be achieved routinely.6
Pyrosequencing is designated as a sequence-by-synthesis technique because DNA synthesis is monitored in real time. It is based on the pioneering and elegant basic science work of Pål Nyrén, PhD, who first demonstrated in 1987 that DNA polymerization can be monitored by measuring pyrophosphate production, which can be detected by light.7 Edward Hyman, PhD, capitalized on Dr Nyrén's work to invent pyrosequencing 1 year later,8 although it took several more years to be fully commercialized and more widely implemented.9,10 After an oligonucleotide is annealed to the template strand of DNA to be sequenced, a DNA polymerase synthesizes DNA by extending the 3′ end of the nascent strand using the information encoded in the template strand. During pyrosequencing, dNTPs are sequentially dispensed into the chamber containing the template with the primer and DNA polymerase bound. When the correct complementary dNTP is injected and added by the polymerase, inorganic pyrophosphate (PPi) is released during the condensation reaction as shown below, where n is the number of nucleotides in the nascent strand and H+ is a hydrogen ion8,11:

$$(\mathrm{DNA})_n + \mathrm{dNTP} \rightarrow (\mathrm{DNA})_{n+1} + \mathrm{PP_i} + \mathrm{H^+} \tag{1}$$
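Through a sequence of 2 further reactions, the released pyrophosphate is converted into adenosine triphosphate (ATP), which the enzyme luciferase then uses to oxidize luciferin to oxyluciferin, producing light:

$$\mathrm{PP_i} + \mathrm{APS} \xrightarrow{\text{ATP sulfurylase}} \mathrm{ATP} + \mathrm{SO_4^{2-}} \tag{2}$$

$$\mathrm{ATP} + \text{luciferin} + \mathrm{O_2} \xrightarrow{\text{luciferase}} \mathrm{AMP} + \mathrm{PP_i} + \text{oxyluciferin} + \mathrm{CO_2} + h\nu \tag{3}$$

where APS is adenosine 5′-phosphosulfate, $\mathrm{SO_4^{2-}}$ is sulfate, AMP is adenosine monophosphate, $\mathrm{CO_2}$ is carbon dioxide, and $h\nu$ is light.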
Thus, light emission is proportional to the amount of pyrophosphate produced, which is directly proportional to the number of nucleotides added. Whether or not a given dispensed dNTP is incorporated, apyrase degrades the excess dNTPs via the following reaction before the next dNTP is dispensed12,13:

$$\mathrm{dNTP} \xrightarrow{\text{apyrase}} \mathrm{dNMP} + 2\,\mathrm{P_i} \tag{4}$$

where dNMP is deoxynucleoside monophosphate, and Pi is inorganic phosphate.
Before the actual pyrosequencing, the region of interest is first amplified via PCR using a reverse primer that is biotinylated (Figure 1, A). This allows firm immobilization of the PCR products onto beads coated with streptavidin through extremely tight streptavidin-biotin binding (Figure 1, B).14 Because the beads are magnetic, they can be immobilized against the wall of a tube by the application of a magnetic field. This allows purification of the template strand (biotinylated, bottom strand) of the PCR product after denaturation and washing (Figure 1, C). The strand attached to the bead becomes the template strand for primer binding and undergoes the 4-enzyme pyrosequencing reaction when the correct dNTP is added (deoxycytidine triphosphate [dCTP]; Figures 1, D, and 2). Thus, the complementary, nascent DNA strand is synthesized on the template strand attached to the bead. An advantage of using the antisense strand as the template strand is that the DNA sequence produced is the sense sequence. The DNA polymerase used in pyrosequencing is the Klenow fragment (bacterial DNA polymerase I with the exonuclease function deleted). This enzyme is used because it was found empirically to decrease spurious background signals.15,16
Pyrosequencing is fundamentally different from Sanger sequencing in that bioluminescence results from strand elongation in real time, whereas, with Sanger sequencing, fluorescence is detected as a separate step after chain termination. Pyrosequencing, with its low coefficient of variation, is inherently more quantitative: Different dNTPs generate similar peak heights following single incorporation events, and the peak height generated from incorporating 3 dNTPs is 3 times the signal from incorporating one of the same dNTPs.17 It also has a superior limit of detection (∼5% versus ∼20% for Sanger) of mutant alleles, but the read length is shorter, typically 100 to 400 bases.17,18 A comparison of pyrosequencing, Sanger sequencing, and next-generation sequencing is shown in the Table.
In 2005, Rothberg and colleagues19 developed the first “massively parallel” “next-generation sequencer” based on the pyrosequencing reaction. Single DNA molecules are first “painted” onto the surface of individual beads. They are then clonally amplified in an emulsion PCR, where each molecule is clonally amplified in its own aqueous bead-containing droplet, separated from its neighbors by oil. The amplicon-coated beads are then deposited into picoliter-sized wells and pyrosequenced in parallel. In another method of next-generation sequencing, produced by the company Ion Torrent Systems Inc (Guilford, Connecticut), the released hydrogen ion (equation 1) is measured, instead of the pyrophosphate. When a nucleotide is incorporated, the hydrogen ion released is detected by a corresponding drop in pH in micron-sized wells designated as “the world's smallest pH meters.”19
Given the widespread and growing use of pyrosequencing in the clinical molecular diagnostic laboratory, there is a parallel need to train new users—technologists, residents, fellows, pathologists, and physicians from other disciplines—in basic pyrosequencing and pyrogram interpretation. In this article, we demonstrate the fundamental principles of pyrosequencing, specifically showing how (1) a peak is generated only when the correct nucleotide is dispensed and incorporated, (2) peak heights are proportional to the number of nucleotides incorporated, (3) different dispensation sequences can produce different pyrograms, (4) suboptimal dispensation sequences can mask mutations, and (5) nonoptimized dispensation sequences can produce overly complex pyrograms. We show these concepts via theoretical pyrograms generated by Pyromaker (http://pyromaker.pathology.jhmi.edu, accessed October 1, 2012, Johns Hopkins University School of Medicine, Baltimore, Maryland).20 A tutorial demonstrating these concepts is posted on the Pyromaker Web site that also contains the free Pyromaker software. Other reviews, primarily focused on the biochemistry of pyrosequencing, have been published elsewhere.11,13,21,22
MATERIALS AND METHODS
Pyromaker (http://pyromaker.pathology.jhmi.edu) is an R script that accepts user inputs through a Web-page interface. These inputs include the wild-type DNA sequence, the mutant DNA sequence, the dispensation order, the percentage of mutant-bearing cells (eg, cancer cells), and whether the mutation (or single nucleotide polymorphism) is present in the heterozygous or homozygous state. It then determines the response as various dNTPs are dispensed and tracks the position of the polymerase on each respective strand of DNA. It simulates real pyrograms in that the left edge of the peak is nearly vertical, whereas the right edge of the peak contains a tail, most likely indicating that the enzymes that convert pyrophosphate to light are not instantaneous. In pyrosequencing, the dispensation order is defined as the order in which individual dNTPs are sequentially injected into the chamber containing the pyrosequencing reaction, and they are represented on the x-axis as single letters (eg, C for dCTP, etc).
In pyrosequencing, the natural deoxyadenosine triphosphate (dATP) results in false signals because, like ribose adenosine 5′-triphosphate (rATP), it is a substrate for luciferase.13 Accordingly, the dATP analog, deoxyadenosine α-thio triphosphate (dATP-α-S), is used in lieu of dATP, but that produces a higher peak than the other dNTPs, which needs to be considered when comparing homopolymers of A that are equal in length to the other nucleotides.23 This relative peak height (A versus C, G, or T) is incorporated into Pyromaker and can be appreciated in many of the pyrograms shown below (eg, Figure 3).
Presence or Absence of Peaks Following Dispensation of a dNTP Indicates the Sequence
If we imagine the sequencing template and primer shown in Figure 3 (shaded box, left), the extension of the primer will generate 5′–ACGT–3′. Because sequencing is monitored in real time, with time on the x-axis of the graph, the first 3 nucleotides dispensed (deoxythymidine triphosphate [dTTP], deoxyguanosine triphosphate [dGTP], and dCTP) do not elongate the growing strand because the first complementary nucleotide is dATP. Thus, there are no peaks on the pyrogram for these dispensed dNTPs (abbreviated T, G, and C in the graph) because they could not be incorporated into the growing strand. However, when dATP is dispensed, it is incorporated, pyrophosphate is released, and light is emitted. The light is detected by a charge-coupled device sensor and is represented by peaks in the pyrogram.21 The subsequent injection of dCTP also results in incorporation and light, producing the extended product shown (shaded box, right). When the following 2 bases, dTTP and dATP, are dispensed, they cannot be used to extend the nascent DNA strand, whereas peaks are seen when dGTP and dTTP are dispensed. Accordingly, the sequence of DNA can be determined from the light pattern that results from serially dispensing dNTPs into a chamber containing polymerase bound to a primer-bearing DNA template molecule.
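The walk-through above can be reproduced with a minimal Python sketch. It is an illustration only and not Pyromaker's code: the function name `simulate_pyrogram` is hypothetical, and the dispensation order TGCACTAGT is read off the order in which the text describes the dNTPs being dispensed. The sketch assumes ideal chemistry (complete incorporation, complete apyrase cleanup) and ignores peak shape.

```python
def simulate_pyrogram(nascent_sequence, dispensation_order):
    """Return the number of bases incorporated at each dispensation.

    nascent_sequence: bases the polymerase will add, 5'->3' (e.g. "ACGT").
    dispensation_order: injected dNTPs as single letters (e.g. "TGCA").
    Both names are illustrative assumptions, not Pyromaker identifiers.
    """
    position = 0  # index of the next base to be synthesized
    peaks = []
    for base in dispensation_order:
        incorporated = 0
        # A dispensed dNTP is used for as long as it matches the next base,
        # so a homopolymer run is consumed within a single dispensation.
        while (position < len(nascent_sequence)
               and nascent_sequence[position] == base):
            position += 1
            incorporated += 1
        peaks.append(incorporated)
    return peaks


dispensation = "TGCACTAGT"
peaks = simulate_pyrogram("ACGT", dispensation)
for base, height in zip(dispensation, peaks):
    print(base, "peak" if height else "no peak")
```

Reading only the dispensations that produced a peak, in order, recovers the nascent sequence ACGT.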
Peak Heights Are Proportional to the Number of Nucleotides Incorporated
The height of the peak is proportional to the number of identical bases of a homopolymeric run as they are “simultaneously” incorporated into the elongating DNA strand during a single dispensation.13 The pyrogram of codons 12, 13, and 14 of KRAS demonstrates the proportionality between peak height and homopolymer length (Figure 4). The first G peak of codon 12 is twice as high as the subsequent T peak because the G peak represents the light produced from the extension of 2 dGTPs (2×), whereas the T peak represents the incorporation of only one dTTP (1×) into the elongating strand. This is also seen with codon 13 (GGC), but not for codon 14 (GTA), where all nucleotides are present as single incorporation events. Pyrosequencing is more accurate at detecting a difference between low numbers of mononucleotide bases, such as one dATP versus 2, as opposed to the difference between 8 and 9 dATPs. This is an inherent limitation of pyrosequencing.
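The proportionality described above can be sketched in Python by collapsing the sequence into homopolymer runs; each run corresponds to one peak whose ideal relative height equals the run length. This is a simplified illustration, not part of Pyromaker: the helper name `homopolymer_runs` is hypothetical, the dispensation order is assumed to present each base exactly when it is needed, and the higher signal of the dATP analog is not modeled.

```python
from itertools import groupby


def homopolymer_runs(sequence):
    """Collapse a sequence into (base, run_length) pairs.

    Under ideal conditions each pair corresponds to one pyrogram peak whose
    relative height equals run_length.
    """
    return [(base, len(list(run))) for base, run in groupby(sequence)]


# Codons 12, 13, and 14 of wild-type KRAS, as given in the text.
kras_codons_12_to_14 = "GGT" + "GGC" + "GTA"
for base, length in homopolymer_runs(kras_codons_12_to_14):
    print(f"{base}: {length}x")
```

The first G run has length 2 and the following T run has length 1, matching the 2× and 1× peaks described for codon 12.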
Dispensation Order Is Important
The dispensation sequence for pyrosequencing can affect how the pyrogram will appear. Different sequences will produce nonidentical pyrograms for the same simple mutation, defined as a single base substitution. Two options for dispensation sequences exist: cyclic, or optimized based on the unique order of the bases in the region of DNA being sequenced.10 An optimized sequence is more efficient at detecting mutations and facilitates a longer read length, a faster readout, and fewer out-of-phase shifts. Of course, for unknown target sequencing, such as whole genome sequencing, the only option is cyclic dispensation.
In comparing 2 pyrograms that detect the same mutation, Figure 5, A, uses a sequence that is optimized for KRAS, whereas Figure 5, B, uses a cyclic dispensation sequence (A, G, C, and T repeated in that order). Note how the 2 different dispensation sequences create different pyrograms for this simple codon 12b KRAS mutation (GGT→GAT). The optimized dispensation sequence generates a cleaner pyrogram with an easily identifiable mutant peak at the second dATP dispensed (Figure 5, A). On the other hand, the use of a cyclic dispensation sequence results in a complex pyrogram with 6 novel peaks, making the pyrogram extremely difficult to analyze (Figure 5, B). The presence of additional peaks indicates that the wild-type and mutant elongating strands are out of phase with one another (discussed below).
Suboptimal Dispensation Can Mask Mutations
When creating an optimized dispensation sequence, it is important that the sequence does not mask any mutations that might be present. Pyrosequencing is unable to produce pyrograms that distinguish between some simple and complex mutations, if the dispensation order is not optimal. A complex mutation consists of 2 or more base substitutions, not necessarily consecutive. In clinical pyrosequencing, the percentage of mutant cells is not known with complete accuracy, whereas with Pyromaker, the user can define a precise percentage of mutant cells. Thus, a sample with a complex mutation and a sample with a simple mutation in the same codon can produce identical pyrograms when the percentage of cancer cells is not identical between the 2 samples. For example, the complex mutation in codon 12 of KRAS, GGT to AAT, results in a pyrogram (Figure 6, A) that is identical to the pyrogram for the simple codon 12a KRAS mutation, GGT to AGT (Figure 6, B). This is because the percentage of tumor cells is 25% with the AAT mutation and 50% with the AGT mutation. An alternative dispensation order could be designed to distinguish these mutations, and that highlights the importance of knowing all the mutations that may occur in a given region when optimizing the dispensation sequence.
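The masking effect can be checked numerically with a short Python sketch. It rests on several assumptions that are not stated in the text: the mutations are heterozygous in diploid cells with no copy number changes, so the mutant allele fraction is half the tumor cell fraction; peak heights are ideal; and the two dispensation orders (`MASKING_ORDER` and `RESOLVING_ORDER`) are hypothetical orders chosen for illustration, not the orders used in Figure 6. All function and variable names are likewise illustrative.

```python
def incorporations(sequence, dispensation_order):
    """Bases incorporated per dispensation for a single template."""
    position, peaks = 0, []
    for base in dispensation_order:
        start = position
        while position < len(sequence) and sequence[position] == base:
            position += 1
        peaks.append(position - start)
    return peaks


def mixed_pyrogram(wild_type, mutant, dispensation_order, tumor_fraction):
    """Ideal peak heights for a mixture of tumor and normal cells.

    Assumes a heterozygous mutation in diploid cells, so the mutant allele
    fraction is tumor_fraction / 2 (an assumption of this sketch).
    """
    f = tumor_fraction / 2
    wt = incorporations(wild_type, dispensation_order)
    mt = incorporations(mutant, dispensation_order)
    return [(1 - f) * w + f * m for w, m in zip(wt, mt)]


WILD_TYPE = "GGTGGCGTA"   # KRAS codons 12 to 14
SIMPLE = "AGTGGCGTA"      # codon 12a, GGT to AGT
COMPLEX = "AATGGCGTA"     # codon 12, GGT to AAT

MASKING_ORDER = "AGTGCGTA"     # hypothetical: A dispensed before the first G
RESOLVING_ORDER = "GAGTGCGTA"  # hypothetical: G dispensed both before and after A

masked_simple = mixed_pyrogram(WILD_TYPE, SIMPLE, MASKING_ORDER, 0.50)
masked_complex = mixed_pyrogram(WILD_TYPE, COMPLEX, MASKING_ORDER, 0.25)
print(masked_simple == masked_complex)

resolved_simple = mixed_pyrogram(WILD_TYPE, SIMPLE, RESOLVING_ORDER, 0.50)
resolved_complex = mixed_pyrogram(WILD_TYPE, COMPLEX, RESOLVING_ORDER, 0.25)
print(resolved_simple == resolved_complex)
```

Under these assumptions the first order gives the same A and G peak heights for both samples, because halving the mutant allele fraction is offset by the second adenine. The second order separates them, because only the AGT allele incorporates a G after the A.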
Phase Affects Ease of Interpretation
Phase is defined as the relative positions of the DNA polymerase molecules on 2 different DNA template strands and is affected by dispensation order. The polymerase molecules are in phase when they are aligned at the same base location on both the wild-type and the mutant sequences and are out of phase when they are not aligned. When the molecules are out of phase, it suggests that 2 dissimilar molecule species are being simultaneously sequenced because of either a mutation or a polymorphism. In contrast to cyclic dispensation, an optimized dispensation sequence is more effective at keeping the mutant, such as codon 12b KRAS, and wild-type molecules in phase at the base locations that are identical in both samples while allowing the base locations that are truly dissimilar to be out of phase (Figure 5, A).
For codons 12, 13, and 14 of wild-type KRAS and the 12b KRAS mutant (G→A), both molecules are elongated when dGTP is dispensed, but the nascent strands are out of phase because the strand replicating the wild-type allele incorporates 2 dGTPs, whereas the one replicating the mutant allele incorporates only one dGTP (Figures 5, A, and 7, A). The subsequent addition of dATP brings the molecules back in phase because the polymerase replicating the wild-type allele does not advance, whereas the one for the mutant advances by one nucleotide. Thereafter, the polymerases replicating the wild-type and mutant alleles remain in phase for the rest of the pyrogram. Because the molecules are in phase for most of the pyrogram, it is easy to identify the simple mutation by the peak at the second dATP dispensed. A detailed, base-by-base extension of this pyrogram shows the location of in-phase and out-of-phase molecules as sequencing occurs (Figure 7, A).
In contrast, using an AGCT cyclic dispensation sequence to identify the same simple codon 12b KRAS mutation generates a more complex pyrogram (Figures 5, B, and 7, B). Compared with the pyrogram obtained with the optimized dispensation sequence, it contains more peaks of varying heights because the wild-type and mutant molecules are continuously out of phase; because the polymerases incorporate nucleotides in an unsynchronized manner, many of the peaks represent nucleotides incorporated by only one allele. Therefore, out-of-phase sequencing products can lead to complex pyrograms with more ambiguous peaks.
A detailed analysis of the pyrogram in Figure 5, B, depicting base-by-base extension demonstrates how a cyclic dispensation sequence causes the polymerase to remain out of phase (Figure 7, B). The cyclic dispensation sequence employed is AGCT. The first nucleotide dispensed, dATP, is not incorporated into either nascent strand, whereas the next one, dGTP, is incorporated twice on the strand for the wild-type allele and only once on the strand for the mutant allele, causing the polymerase molecules to go out of phase. The dCTP produces no peak, whereas the dTTP is incorporated by the elongating nascent strand from the wild-type allele, moving the polymerase that is replicating the wild-type allele out of phase with the polymerase replicating the mutant allele by 2 nucleotides. The second dATP dispensed is incorporated by the nascent strand for the mutant allele, yet the polymerase molecules remain out of phase. Because the dispensation is cyclic, once the sequencing products get out of phase, they are usually unable to get back in phase for the rest of the sequencing. This shows how a cyclic dispensation sequence can generate suboptimal pyrograms for even a simple mutation, especially one located at the beginning of the DNA sequence.
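The base-by-base phase analysis can be sketched in Python by tracking the polymerase position on each strand after every dispensation. This is an illustrative sketch, not Pyromaker's code: the names `extend` and `trace_phase` are hypothetical, "in phase" is taken to mean that both nascent strands have the same length, and the `OPTIMIZED` order is an assumed order chosen to be consistent with the description in the text (mutant peak at the second dATP), not necessarily the one used in Figure 5, A.

```python
def extend(sequence, position, base):
    """Advance the polymerase while the dispensed base matches the next base."""
    while position < len(sequence) and sequence[position] == base:
        position += 1
    return position


def trace_phase(wild_type, mutant, dispensation_order):
    """Return (base, wt_position, mutant_position, in_phase) per dispensation."""
    wt_pos = mt_pos = 0
    trace = []
    for base in dispensation_order:
        wt_pos = extend(wild_type, wt_pos, base)
        mt_pos = extend(mutant, mt_pos, base)
        trace.append((base, wt_pos, mt_pos, wt_pos == mt_pos))
    return trace


WILD_TYPE = "GGTGGCGTA"  # KRAS codons 12 to 14
MUTANT_12B = "GATGGCGTA"  # codon 12b, GGT to GAT

CYCLIC = "AGCT" * 3       # three cycles of the AGCT order from the text
OPTIMIZED = "AGATGCGTA"   # hypothetical optimized order (assumption)

for label, order in (("cyclic", CYCLIC), ("optimized", OPTIMIZED)):
    print(label)
    for base, wt_pos, mt_pos, in_phase in trace_phase(WILD_TYPE, MUTANT_12B, order):
        state = "in phase" if in_phase else "out of phase"
        print(f"  {base}: wild-type at {wt_pos}, mutant at {mt_pos} ({state})")
```

With the cyclic order, the strands separate at the first dGTP and stay apart for all 3 cycles shown. With the assumed optimized order, they separate at the first dGTP and realign at the second dATP.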
In this article, we demonstrate the unique features of pyrosequencing, a sequence-by-synthesis method of DNA sequencing, with the advantage of real-time analysis. Even though the pyrosequencing read length has an upper limit of approximately 400 bases, it is more quantitative and has a superior limit of detection (5%) compared with conventional Sanger sequencing. It reports the incorporation of nucleotides base by base by converting the production of pyrophosphate into light. The amount of pyrophosphate produced is proportional to the number of identical bases incorporated in a homopolymer, represented by the height of the peaks in pyrograms. Pyromaker is a free online software tool that generates pyrograms from DNA sequences entered by users.
In this article, we used Pyromaker to demonstrate the salient fundamentals of pyrosequencing. Some principles include (1) the incorporation of a correct nucleotide to generate a peak, (2) the proportionality between peak heights and the number of nucleotides incorporated, (3) the importance of dispensation sequence, and (4) the limitations of pyrosequencing, such as the masking of mutations and the generation of overly complex pyrograms. Despite its limitations, pyrosequencing employs unique features that make it advantageous over other sequencing methods for certain situations, and we support its continued and improved use in biomedicine.
The primary application of pyrosequencing is short reads for which a limit of detection superior to that of Sanger sequencing is preferred. Pyrosequencing in a clinical molecular diagnostics laboratory requires technologists capable of performing high-complexity testing and a pyrosequencing instrument (commonly PyroMark Q24 and PyroMark Q96, Qiagen, Germantown, Maryland; Roche 454 and GS Junior, Roche Diagnostics, Indianapolis, Indiana). One common application is sequencing oncogenes in tumors, which are DNA mixtures of malignant and normal stromal cells, where the tumor cell percentage can be low, for example, only 10%. Pyrosequencing is a valuable tool to resolve ambiguous Sanger sequencing results, such as differentiating between one dinucleotide substitution and 2 adjacent single-base substitutions, and between cis- and trans- configurations of closely juxtaposed mutations. Additional applications include human genetics testing and promoter methylation analysis using the relative degree of methylation between different CpG (cytosine–phosphate–guanine) sites.
The Pyromaker software is useful for other applications. We previously used it to demonstrate the expected pyrograms for all reported KRAS mutations, and these can serve as reference patterns to compare with clinical results.20 We also demonstrated its utility as a tool to help resolve complex “uninterpretable” clinical pyrograms using 2 methods. In the hypothesis-testing method, virtual pyrograms are generated for different hypothesized mutations, and those patterns are matched to the unknown actual pyrogram. In the iterative method, the user starts with the wild-type sequence and titrates in the minimum mutation to create the first change seen in the pyrogram and then iteratively titrates in additional mutations to eventually reproduce the experimental result.
Pyrosequencing is an especially valuable tool for oncogene detection in tumors, for methylation analysis, for identifying ambiguous Sanger sequencing mutations, and for confirming mutations identified using next-generation sequencing methods.
We thank Chris Gocke, MD, Marija Debeljak, BS, Alexis Norris, BS, and Katie Beierl, BS, from Johns Hopkins University School of Medicine for helpful discussions.
The authors have no relevant financial interest in the products or companies described in this article.