Context.—Monitoring minimal residual disease by quantitative reverse transcription polymerase chain reaction has proven clinically useful, but as yet there are no Food and Drug Administration–approved tests. Guidelines have been published that provide important information on validation of such tests; however, no practical examples have previously been published.
Objective.—To provide an example of the design and validation of a quantitative reverse transcription polymerase chain reaction test.
Design.—We describe the approach used by an individual laboratory for the development and validation of a laboratory-developed quantitative reverse transcription polymerase chain reaction test for BCR-ABL1 fusion transcripts.
Results.—Elements of design and analytic validation of a laboratory-developed quantitative molecular test are discussed using quantitative detection of BCR-ABL1 fusion transcripts as an example.
Conclusions.—Validation of laboratory-developed quantitative molecular tests requires careful planning and execution to adequately address all required analytic performance parameters. How these are addressed depends on the potential for technical errors and confidence required for a given test result. We demonstrate how one laboratory validated and clinically implemented a quantitative BCR-ABL1 assay that can be used for the management of patients with chronic myelogenous leukemia.
For hematologic malignancies, the term minimal residual disease (MRD) can be defined as the detectable disease burden that is below the level of detection of available conventional diagnostic methods, including cytomorphology, chromosome analysis, and fluorescence in situ hybridization. Detection of MRD requires a limit of detection (LOD) of 1 malignant cell in a background of 1000 normal cells or better. Several studies have demonstrated the clinical utility of monitoring MRD.1–5 Presently, the 2 primary methods used for MRD detection in hematologic malignancies are flow cytometry for leukemia-associated aberrant phenotypes and quantitative detection of certain molecular biomarkers, most commonly performed by quantitative polymerase chain reaction. The molecular targets can include clonal gene rearrangements, recurrent gene mutations, or aberrant gene expression, including fusion gene transcripts. For acute myelogenous leukemia and acute lymphoblastic leukemia, MRD can be monitored by flow cytometry or by quantitative detection of various molecular biomarkers, each with advantages and disadvantages. For chronic myelogenous leukemia (CML), options are generally limited to the quantitative detection of BCR-ABL1 fusion transcripts. Despite certain methodologic challenges, reverse transcription with quantitative polymerase chain reaction (qRT-PCR) for BCR-ABL1 has proven to be clinically useful and is now routinely offered by many clinical laboratories. Although several excellent guidelines have been published on the validation of laboratory tests in general and of this test in particular,6–8 the goal of this paper is to provide practical information on the development and validation of a quantitative laboratory-developed PCR assay, using qRT-PCR for BCR-ABL1 as an example. The steps involved in the validation of any clinical test include (1) familiarization and planning to define the intended use and requirements of the test, (2) generation of validation data to document that the test meets predetermined performance criteria, and (3) implementation of the test in line with its intended use.
FAMILIARIZATION AND PLANNING
An important first step in adding a new assay to a clinical laboratory menu is to define the intended use of the test, which in turn will define the expected performance requirements.6 For a laboratory-developed test, this requires a review of the literature to identify the clinical and analytic criteria, including clinical utility, appropriate types of specimens, and available techniques. Early in the planning stage, the director should also discuss the proposed test with the potential ordering physicians to better understand the intended use, clinical indications for testing, required turnaround time, optimal reporting format, and estimated test volume.
In the case of CML, the intended use of MRD detection is to identify patients with a suboptimal response to therapy or loss of response. A review of the literature indicates that patients with a 3-log reduction in transcript levels from the laboratory median baseline had better 5-year event-free survival than patients who did not achieve this reduction.4,5 A 3-log reduction corresponds to 1 malignant cell in a background of 1000 normal cells and has been called a “major molecular response.” 4 A recent review recommends an LOD of at least a 4-log reduction for monitoring MRD in CML.7
The options for testing MRD in CML are limited to qRT-PCR to detect the BCR-ABL1 fusion transcript. There are no Food and Drug Administration–approved tests currently available for BCR-ABL1 qRT-PCR, although a few research-use-only kits and analyte-specific reagents are available. Therefore, a laboratory that offers this test is responsible for developing the test and establishing all required analytic performance characteristics.
The test is intended to determine the relative number of BCR-ABL1 RNA transcripts as compared with the number of transcripts of a control gene. The test involves reverse transcription of RNA transcripts to complementary DNA, followed by amplification using polymerase chain reaction and concurrent real-time detection of the accumulating amplicon using a fluorescent reporter molecule. For clinical testing, fluorescent detection is typically done with various types of sequence-specific probes (eg, hydrolysis or hybridization probes). The cycle number at which the fluorescence exceeds the background threshold is called the crossing point or crossing threshold. The crossing threshold is inversely related to the logarithm of the initial target concentration. A standard curve that plots the crossing threshold value against the known copy number or dilution factor of BCR-ABL1 transcript can be generated by diluting a reference material (eg, cell line RNA or complementary DNA) or plasmids of known copy number containing the fusion transcript and performing qRT-PCR analysis. The corresponding concentration of BCR-ABL1 transcript in an unknown patient sample can then be determined from the standard curve.
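For illustration, the sketch below fits such a standard curve and interpolates an unknown sample. It is a minimal example in Python; the Ct values, copy numbers, and dilution series are hypothetical, and the assay itself does not prescribe any particular software.

```python
# A minimal sketch of standard-curve quantification; all Ct values and
# copy numbers below are hypothetical, for illustration only.
import numpy as np

# Known standards: copies per reaction and the observed crossing
# threshold (Ct) for a 10-fold dilution series of reference material.
std_copies = np.array([1e6, 1e5, 1e4, 1e3, 1e2])
std_ct = np.array([18.1, 21.5, 24.9, 28.3, 31.7])

# Ct is linear in log10(copies): Ct = slope * log10(copies) + intercept.
slope, intercept = np.polyfit(np.log10(std_copies), std_ct, 1)

# PCR efficiency implied by the slope: E = 10**(-1/slope) - 1.
efficiency = 10 ** (-1.0 / slope) - 1.0
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")

def copies_from_ct(ct: float) -> float:
    """Interpolate the copy number of an unknown sample from its Ct."""
    return 10 ** ((ct - intercept) / slope)

print(f"{copies_from_ct(26.0):.0f} copies")  # hypothetical patient sample
```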
The complexity of this test and the lability of RNA make this assay challenging. Recent studies have shown considerable variability in BCR-ABL1 MRD results among laboratories even when testing the same samples.9,10 Technical variations that may account for this variability include the amount and integrity of RNA extracted as well as the efficiency of reverse transcription and polymerase chain reaction amplification. In an attempt to correct for this, most laboratories normalize BCR-ABL1 transcripts to control gene transcripts to generate a ratio. Ideally, the control gene that is used should be expressed at a constant level regardless of sample type, treatment regimen, or disease status. In addition, the ideal control gene should be expressed and degraded in a manner similar to the target transcript, BCR-ABL1. Although there is no ideal control gene, recent studies have identified ABL1, BCR, and GUSB as acceptable control genes.11–14
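As a minimal sketch of this normalization step (the copy numbers are hypothetical and would in practice be interpolated from each gene's standard curve):

```python
# A minimal sketch of control-gene normalization; the copy numbers are
# hypothetical stand-ins for values read off each gene's standard curve.
def normalized_ratio(bcr_abl1_copies: float, control_copies: float) -> float:
    """Express the BCR-ABL1 level as a percentage of control-gene transcripts."""
    return 100.0 * bcr_abl1_copies / control_copies

# Example: 450 BCR-ABL1 copies against 52,000 ABL1 copies -> ~0.87%
print(f"{normalized_ratio(450, 52_000):.3f}%")
```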
Sample types commonly used for MRD detection in CML are peripheral blood and bone marrow. Validation data should be available for each specimen type that will be accepted for analysis. Although BCR-ABL1 transcript levels in blood and bone marrow may be concordant for the majority of patients, for some patients there are significant differences between peripheral blood and bone marrow transcript levels.10,15,16 Therefore, it is recommended that only one specimen source be used for monitoring response to therapy for any given patient.15,16 Peripheral blood is preferred because of the less invasive nature of specimen acquisition.
Currently there are no calibration standards and no gold standard test available for comparison of results between laboratories. Most clinical laboratories report results in one or more ways. For a given sample, the ratio of BCR-ABL1 transcript to control gene transcript is measured. This ratio is usually reported as the change relative to the ratio obtained from the patient's diagnostic (baseline) sample, to the ratio obtained from the patient's most recent sample, and/or to the laboratory median value of 30 or more patient samples at diagnosis. Recently there has been a concerted effort to establish an international scale (IS) for reporting, similar to the international normalized ratio standardization for prothrombin time.17 This would allow the results from different laboratories to be more easily compared. The IS concept was introduced in 2003 with the publication of results of the International Randomized Study of Interferon Versus STI157.4 At that time the testing centers used the median from the same 30 samples at diagnosis as the baseline. The BCR-ABL1 transcript to control gene transcript ratio was then reported as the change relative to that baseline. Using this approach to normalization, it was shown that a 3-log reduction was associated with a significantly better clinical outcome. Subsequently, an international consortium met in Bethesda and proposed the IS, and recommended the use of 1 of 3 control genes (ABL1, BCR, GUSB). These genes were chosen because they were in common use, showed the least variability in transcript levels across samples, and degraded at approximately the same rate as BCR-ABL1.7 The IS categorizes levels of response as (1) a complete cytogenetic response, which corresponds to an approximately 2-log reduction or 1% residual disease, (2) a major molecular response, which corresponds to a 3-log reduction or 0.1% residual disease, and (3) a complete molecular response, which corresponds to a 4-log reduction or 0.01% residual disease. Presently, the only mechanism by which a laboratory can use the IS for reporting is by exchanging samples with an IS reference laboratory. However, recently a panel was developed comprising 4 different dilution levels of lyophilized cell lines. The expectation is that manufacturers will use these to develop secondary reference materials so that the IS standardization process might be simplified.18
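The sketch below shows how a laboratory-specific conversion factor and the IS response categories fit together arithmetically; the conversion factor and measured ratio are hypothetical, not values from this validation.

```python
# A minimal sketch of IS reporting; the conversion factor and measured
# ratio below are hypothetical illustrations.
import math

LAB_CONVERSION_FACTOR = 0.8  # hypothetical; obtained by sample exchange

def to_is(lab_ratio_percent: float) -> float:
    """Convert a laboratory BCR-ABL1/control ratio (%) to the IS."""
    return lab_ratio_percent * LAB_CONVERSION_FACTOR

def log_reduction(is_percent: float) -> float:
    """Log reduction from the standardized IS baseline of 100%."""
    return math.log10(100.0 / is_percent)

is_ratio = to_is(0.11)        # hypothetical lab-measured ratio of 0.11%
lr = log_reduction(is_ratio)  # ~3.1-log reduction
# On the IS: ~1% = complete cytogenetic response, 0.1% = major molecular
# response (MMR), 0.01% = complete molecular response.
print(f"IS = {is_ratio:.3f}%, {lr:.2f}-log reduction, MMR: {lr >= 3.0}")
```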
ANALYTIC VALIDATION
The second step in the validation process is to write the validation protocol, generate the data, and summarize the results. The Clinical Laboratory Improvement Amendments of 1988 define the performance characteristics that need to be addressed for a laboratory-developed test. These include accuracy, precision, reportable range, reference range (normal range), analytic sensitivity (LOD and limit of quantification [LOQ]), and analytic specificity (including interfering substances), as well as any other parameter that may be considered important. The Molecular Pathology Checklist of the College of American Pathologists (CAP) also requires clinical performance characteristics to be addressed (www.cap.org; accessed 1/5/2011).
The validation protocol should consider all required performance characteristics and establish acceptance criteria before generating validation data. Careful preparation of the validation protocol will ensure that all performance characteristics are addressed in an efficient manner. For this assay, the validation scheme was designed to assess the required performance characteristics during a 20-day period (to meet the recommendations of CLSI document EP5-A2).19 The validation plan is illustrated in Figure 1.
Controls were prepared as follows: total RNA from diagnostic patient samples was pooled and serially diluted into total RNA from normal blood samples. Dilutions were made to cover the expected analytic measurement range, LOD, and LOQ (ie, dilutions from 10⁰ to 10⁻⁵). Repeated measurements of these concentrations were used to establish bias, precision, linearity, LOD, and LOQ (as described in CLSI documents EP9-A2, EP5-A2, EP6-A, and EP17-A).19–22
Samples for validation were collected through splitting samples and/or sample exchanges with other laboratories. Thirty diagnostic samples were used to establish the laboratory's median value for diagnostic samples. Patient follow-up samples were included and also exchanged with a laboratory that uses the IS to establish a conversion factor for future reporting using the IS. Reporting using the IS was initiated after the conversion factor was verified with another set of 25 samples 6 months later. The controls and samples were extracted and tested together as unknowns.
All samples and controls were tested in duplicate after RNA extraction because of the variability associated with reverse transcription.16 Each day, 8 samples were tested in duplicate (both BCR-ABL1 and control gene). At the end of 20 days, there were data on 22 negative samples and 20 controls at the 10⁻⁴ and 10⁻⁵ dilutions, which were used to establish the LOD and LOQ of the assay. The data obtained from the 10 controls at dilutions of 10⁰, 10⁻¹, 10⁻², 10⁻³, and 10⁻⁴ (at least 5 analyses of each dilution from each technologist) were used to establish bias, precision, and linearity. After completing the validation, controls were subsequently used as quality controls to assess the ongoing performance of clinical runs. Data on a total of 80 clinical samples were obtained: 30 diagnostic samples (for establishing the median value of diagnostic samples), 22 fusion-undetectable samples, and 28 follow-up samples corresponding to approximately 10% or less residual disease (for exchange with the IS reference laboratory).
ACCURACY
For quantitative tests, accuracy is defined as the closeness of agreement between the measured value and the value that is accepted either as a conventional true value or an accepted reference value.23 Accuracy is a measure of the total analytic error (TEa), which is the combination of systematic error (SE, also known as bias) and random error (RE, usually represented by the coefficient of variation, CV).23 For the BCR-ABL1 test, SE and RE were evaluated by measuring a series of control samples generated by diluting total RNA extracted from diagnostic blood samples with total RNA extracted from normal blood samples. Six control samples with the following dilutions were analyzed: 10⁰, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴, and 10⁻⁵. Systematic error was determined by measuring the log reduction from the laboratory diagnostic median value at each of the 6 dilutions ranging from 10⁰ to 10⁻⁵. Random error was estimated by calculating the variation of repeated measurements as discussed below. Total analytic error was estimated by combining SE with RE to find the (2-sided) 95% confidence interval relative to the expected value at each level. The Deming linear regression is shown in Figure 2. Visual inspection shows that the data are linear (the regression line is nearly parallel to the equality line). This was confirmed by performing an F test (CLSI document EP10-A2).24 Estimates of bias (SE), imprecision (RE), and TEa are shown in Table 1. The expected value of each control sample (log reduction) is determined by the dilution factor. The difference between the mean observed log reduction and the expected log reduction is the bias (SE). The standard deviation about the observed mean is the imprecision (RE). The TEa was calculated as the bias ± F × SD, where F is a factor between 1.65 and 1.96 (see the Jennings et al supplemental material file at http://www.archivesofpathology.org in the January 2012 folder). The coefficient of TEa (total error as a percentage of the expected value) was calculated by dividing the TEa by the expected value. As shown, the 95% confidence limit for TEa ranges from 0.15 to 0.47 log values. This indicates that a change of up to half a log value could be expected from analytic variation alone (at least at the lower concentrations). These findings are consistent with the assertion that changes of half a log or more are required before a change can be attributed to biological variation.25
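The following sketch reproduces the bias, imprecision, and TEa arithmetic for a single control level, using hypothetical replicate measurements and F = 1.96 for the 2-sided 95% bound.

```python
# A minimal sketch of the bias/imprecision/TEa calculation for one
# control level; the replicate log-reduction values are hypothetical.
import numpy as np

expected = 3.0  # a 10^-3 dilution should show a 3-log reduction
observed = np.array([3.05, 2.91, 3.10, 2.98, 3.12,
                     2.95, 3.08, 2.89, 3.03, 3.01])

bias = observed.mean() - expected  # systematic error (SE)
sd = observed.std(ddof=1)          # random error (RE, imprecision)
F = 1.96                           # 2-sided 95% factor (text: 1.65-1.96)
tea_95 = abs(bias) + F * sd        # total analytic error bound at 95%
print(f"bias = {bias:+.3f}, SD = {sd:.3f}, TEa95 = {tea_95:.3f} log")
```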
The results of the 30 diagnostic samples and 22 fusion-undetectable samples were compared to results on the same samples obtained from chromosome analysis and/or fluorescence in situ hybridization. The qualitative results were 100% concordant with the comparative tests (data not shown). The average and median values of the diagnostic samples are shown in Table 2. The results of the 28 follow-up samples were compared to those from the IS reference laboratory. As shown in Figure 3, the quantitative results correlated well with the reference laboratory results, although there was a constant bias through the range of measurements.
A conversion factor that corrects for this constant bias was determined using a Bland-Altman bias plot26 (Figure 4). After correction using the conversion factor, an acceptable bias requires that 95% of samples fall within 5-fold, and the majority of samples within 2-fold, of the expected value from the reference laboratory.27 If a laboratory's results fail to meet this requirement, then additional samples should be sent to the reference laboratory for testing. If a laboratory's results meet this requirement, as in this example, the laboratory then exchanges 25 more samples after approximately 6 months to verify that the conversion factor remains accurate.
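A minimal sketch of deriving and checking such a conversion factor on the log scale follows; the paired ratios are hypothetical (the actual data appear in Figures 3 and 4).

```python
# A minimal sketch, on the log scale, of deriving an IS conversion factor
# and checking fold-agreement; the paired ratios below are hypothetical.
import numpy as np

lab = np.array([1.2, 0.45, 0.11, 0.032, 0.009])     # this laboratory, %
ref = np.array([0.95, 0.38, 0.085, 0.027, 0.0075])  # IS reference lab, %

# The mean log difference estimates a constant multiplicative bias,
# in the spirit of a Bland-Altman analysis of log-transformed results.
conversion_factor = 10 ** (np.log10(ref) - np.log10(lab)).mean()

# After conversion, 95% of samples should fall within 5-fold, and the
# majority within 2-fold, of the reference laboratory's values.
converted = lab * conversion_factor
fold = np.maximum(converted / ref, ref / converted)
print(f"conversion factor = {conversion_factor:.3f}")
print(f"within 2-fold: {(fold <= 2).mean():.0%}, "
      f"within 5-fold: {(fold <= 5).mean():.0%}")
```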
PRECISION
Precision of measurement is defined as the “closeness of agreement between independent test results obtained under stipulated conditions,” 23 and is a measure of random analytic error. As discussed above, precision data were obtained from repeated measures of 6 control samples (undiluted [10⁰] to 10⁻⁵ dilutions). The tests were run by 2 technologists over multiple days to include multiple sources of analytic variation (ie, between technologists, runs, and days). The precision data are shown in Table 3. As illustrated, the mean and standard deviation by technologist do not differ significantly, as verified by the Student t test and F test, respectively.
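A sketch of the between-technologist comparison follows, with hypothetical replicates; the t test compares means and the F test compares variances, as in the text.

```python
# A minimal sketch of the between-technologist comparison; the replicate
# log-reduction values for one control level are hypothetical.
import numpy as np
from scipy import stats

tech1 = np.array([3.02, 2.95, 3.08, 2.99, 3.05])
tech2 = np.array([2.98, 3.06, 2.93, 3.04, 3.01])

# Student t test for equality of means.
t_stat, t_p = stats.ttest_ind(tech1, tech2)

# 2-sided F test for equality of variances.
f_stat = tech1.var(ddof=1) / tech2.var(ddof=1)
dof1, dof2 = len(tech1) - 1, len(tech2) - 1
f_p = 2 * min(stats.f.cdf(f_stat, dof1, dof2), stats.f.sf(f_stat, dof1, dof2))

print(f"t test P = {t_p:.2f}, F test P = {f_p:.2f}")  # P > .05: no difference
```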
REPORTABLE RANGE
The reportable range is defined as the range of values “over which the acceptability criteria for the method have been met; that is, where errors due to nonlinearity, imprecision, or other sources are within defined limits.” 21 In other words, the reportable range is the range through which total error is within acceptable limits. As shown in Table 1, the TEa at 95% CI (TEa95) is within 0.15 to 0.47 log values through 4-log reductions. These values are within the allowable error for this test (which is <0.5 log), and therefore the reportable range is 10⁰ to 10⁻⁴ (0- to 4-log reductions) from the diagnostic median.
REFERENCE RANGE
The reference range is the range of values in a normal or reference population. It should be noted that even in a normal population the BCR-ABL1 fusion transcript can be detected by using nested polymerase chain reaction.28 However, at the LOD for this assay, BCR-ABL1 fusion transcript would not be detectable in a normal population. Therefore, the reference range would be “undetectable” with a statement of the LOD.
ANALYTIC SENSITIVITY (LOD AND LOQ)
The LOD is the lowest amount of analyte that is distinguishable from background (a negative sample) with a stated level of confidence. The level of confidence is usually set at 95%, which indicates that at least 95% of the time the given concentration can be distinguished from a negative sample. For this test, the LOD was determined by repeated measures of highly dilute samples (10⁻⁴ and 10⁻⁵ dilutions) and comparison of these with known fusion-negative samples. The 10⁻⁵ dilution was consistently negative, whereas the 10⁻⁴ dilution was consistently positive, indicating an LOD of 10⁻⁴ (a 4-log reduction) from the diagnostic median.
The LOQs are the lowest and highest concentrations of analyte that can be quantitatively determined with acceptable TEa. The LOQ was estimated as the dilution that could be quantified with a TEa of half a log or less relative to its expected value. The lower LOQ and the LOD are both 10⁻⁴ (a 4-log reduction) from the diagnostic median.
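The sketch below illustrates both determinations with hypothetical replicate data: a hit-rate check for the LOD and a TEa check against the 0.5-log limit for the LOQ.

```python
# A minimal sketch of the LOD and LOQ checks; all replicate results are
# hypothetical stand-ins for the validation data described in the text.
import numpy as np

# LOD: the lowest dilution detected with the stated (95%) confidence.
detected_1e4 = np.ones(20, dtype=bool)   # 10^-4: detected in 20/20 runs
detected_1e5 = np.zeros(20, dtype=bool)  # 10^-5: consistently negative
print(f"10^-4 hit rate: {detected_1e4.mean():.0%}")  # 100% -> at the LOD
print(f"10^-5 hit rate: {detected_1e5.mean():.0%}")  # below the LOD

# LOQ: the lowest dilution quantifiable with a TEa of 0.5 log or less.
expected = 4.0                                   # 10^-4 -> 4-log reduction
observed = np.array([4.1, 3.8, 4.2, 3.9, 4.05])  # hypothetical replicates
tea_95 = abs(observed.mean() - expected) + 1.96 * observed.std(ddof=1)
print(f"TEa95 at 10^-4: {tea_95:.2f} log (acceptable if <= 0.5)")
```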
ANALYTIC SPECIFICITY
To assess the analytic specificity of the test, 22 leukemia samples known to be negative for the BCR-ABL1 fusion transcript (by chromosome analysis and/or fluorescence in situ hybridization) were tested. All 22 samples were negative (100% specificity; 95% CI, 85%–100%). Interfering substances may affect amplification efficiency. However, this assay is designed to assess target amplification as compared with a control gene, and any effect on efficiency from interfering substances is expected to affect target and control gene amplification equally.
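An exact (Clopper-Pearson) binomial calculation on 22 of 22 negative samples reproduces the approximately 85% to 100% interval; this sketch assumes that interval method, which the paper does not state explicitly.

```python
# A minimal sketch of an exact (Clopper-Pearson) binomial confidence
# interval for specificity; the method choice is an assumption.
from scipy.stats import beta

n, x = 22, 22  # samples tested, samples correctly negative
alpha = 0.05
lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
print(f"specificity = {x / n:.0%}, 95% CI: {lower:.1%}-{upper:.1%}")
```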
RECOMMENDED STANDARDS, CONTROLS, AND CALIBRATORS
BCR-ABL1 to BCR ratios and log reduction values for clinical specimens and controls are determined using a standard curve generated from complementary DNA dilutions derived from a cell line. Controls are run to ensure the ongoing accuracy of the method. Recent studies suggest that it does not make a difference whether complementary DNA, total RNA, or plasmid DNA is used as the reference material to generate a standard curve.9,10 A standard curve is highly recommended but not essential, because laboratories can calculate the relative concentration of target to control gene if the amplification efficiency is known or can be estimated. If the amplification efficiency is estimated, it is important that the efficiencies of the target and control gene be close.29 Also, it is necessary to periodically verify the TEa and analytic measurement range (called calibration verification) at least once every 6 months and with any significant change in the test system (such as new reagent lots). This verification can be performed by running 2 controls that span the clinical decision threshold (eg, 10⁻² and 10⁻⁴). It should be noted that controls must have a biological matrix appropriate for the clinical specimens.
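As a sketch of such efficiency-based quantification without a standard curve, the following implements one common efficiency-corrected calculation (in the spirit of the Pfaffl method); the Ct values and efficiencies are hypothetical, with target and control efficiencies deliberately close, as the text requires.

```python
# A minimal sketch of efficiency-corrected relative quantification without
# a standard curve (Pfaffl-style); Ct values and efficiencies are
# hypothetical, and the paper cites this approach only in general terms.
e_target, e_control = 1.94, 1.96  # amplification factor per cycle (~97%)

def relative_ratio(ct_target: float, ct_control: float,
                   ct_target_cal: float, ct_control_cal: float) -> float:
    """Target/control ratio of a sample relative to a calibrator sample."""
    return ((e_target ** (ct_target_cal - ct_target))
            / (e_control ** (ct_control_cal - ct_control)))

# A target Ct 7 cycles later than the calibrator, with an unchanged
# control gene, corresponds to roughly a 2-log (~1%) level.
print(f"{relative_ratio(31.0, 24.0, 24.0, 24.0):.4f}")
```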
Controls should be processed with the patient samples to ensure the performance of each step in the process. Because this test reports both a qualitative and a quantitative result, a minimum of 3 controls are included with each run.16 A negative control is always necessary to detect problems with contamination and false-positive reactions. Positive controls at 2 concentrations (a high control and a low control at or near the LOD) are necessary to detect false-negative results as well as loss of precision and linearity. Controls can be used for calibration verification if they have the appropriate biological matrix and are external to the system (ie, not part of the kit). Reference standards are currently being offered by commercial companies and may prove useful in this regard.
The original controls developed for the validation may be used for both the high and low controls as well as for calibration verification. These, along with the negative control, may be used to assess the analytic performance of all the steps of the test with the exception of the extraction step. Extraction is assessed by the amplification of the reference gene (eg, BCR) as well as by other methods to assess purity, quantity, and integrity (eg, spectrophotometry, gel electrophoresis). For this assay, a run is rejected if a control value is more than 3 SD from the expected mean, or a control value is more than 2 SD from the expected mean for 2 consecutive runs.
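The stated run-rejection rule can be expressed directly in code; this sketch is a hypothetical implementation of that rule, with illustrative control values, means, and SDs.

```python
# A minimal sketch of the stated run-rejection rule: reject if a control
# is more than 3 SD from its mean, or more than 2 SD from its mean on
# 2 consecutive runs; the mean, SD, and values below are hypothetical.
def reject_run(recent_values: list[float], mean: float, sd: float) -> bool:
    """recent_values holds the control's results, oldest first."""
    z = [(v - mean) / sd for v in recent_values]
    if abs(z[-1]) > 3:                                     # >3 SD once
        return True
    if len(z) >= 2 and abs(z[-1]) > 2 and abs(z[-2]) > 2:  # >2 SD twice
        return True
    return False

# Low control: mean 4.0 log reduction, SD 0.15 (hypothetical)
print(reject_run([4.10, 4.35], mean=4.0, sd=0.15))  # False: >2 SD only once
print(reject_run([4.35, 4.32], mean=4.0, sd=0.15))  # True: >2 SD twice
```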
CLINICAL SENSITIVITY/SPECIFICITY
The studies described above address mainly the analytic performance characteristics of the assay. The Clinical Laboratory Improvement Amendments do not require laboratories to assess the clinical performance characteristics of an assay, such as clinical sensitivity and specificity. However, the Clinical Laboratory Improvement Amendments do require clinical laboratories to have a clinical consultant who can “render opinions to the laboratory's clients concerning the diagnosis, treatment, and management of patient care” (http://wwwn.cdc.gov/clia/regs/subpart_m.aspx#493.1455, accessed 2/19/2011). This implies that the clinical consultant must understand the clinical performance characteristics and clinical significance of the tests offered by the laboratory. In addition, the CAP Molecular Pathology Checklist requires that laboratories establish the clinical performance characteristics of a laboratory-developed test, either by performing their own studies or, if that is not easily feasible, by citing studies that have addressed the clinical performance characteristics and clinical significance of the test. For this validation, several large published studies were cited demonstrating that the presence of the BCR-ABL1 fusion transcript in the appropriate clinical context is diagnostic of CML or that a major molecular response (≥3-log reduction in the relative number of BCR-ABL1 transcripts) is associated with an excellent prognosis.4,5,30
IMPLEMENTATION
Implementation involves several steps to incorporate the test into the workflow of the laboratory.6 The written procedure was completed by adding the conversion factor calculations needed to report on the IS, guidance for analytic and clinical interpretation, and references. The procedure, together with worksheets and report templates, was reviewed and signed by the director. Clinicians were educated, participation in proficiency testing was established, and the accrediting organization was informed that testing was now available. In addition, plans were made to send 25 additional samples to the reference laboratory for verification of the IS conversion factor at 6 months and every subsequent year until reference materials made from calibrators standardized against the IS become available.
In summary, we provided a detailed description of how one laboratory validated and clinically implemented a quantitative BCR-ABL1 assay that can be used for the management of patients with CML. The approach to clinical test validation described here should be useful to other laboratories that need to establish the analytic performance characteristics for other quantitative molecular tests.
References
Author notes
From the Department of Pathology and Laboratory Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois (Drs Jennings and Smith); the Department of Laboratory Medicine, Mayo Foundation, Rochester, Minnesota (Dr Halling); the Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City (Dr Persons); and the Department of Pathology, The University Health Network, Toronto, Ontario, Canada (Dr Kamel-Reid).
The authors have no relevant financial interest in the products or companies described in this article.