We report the outcomes of patient-specific quality assurance (PSQA) for spot-scanning proton therapy (SSPT) treatment plans by disease site.
We analyzed quality assurance outcomes for 309 SSPT plans. The PSQA measurements consisted of 2 parts: (1) an end-to-end test in which the beam was delivered at the prescribed gantry angle and (2) dose plane measurements made from gantry angle 270°. The HPlusQ software was used for gamma analysis of the dose planes using dose-tolerance and distance-to-agreement levels of 2%, 2 mm and 3%, 3 mm, respectively. Passing was defined as a gamma score <1 in at least 90% of the pixels.
The overall quality assurance measurement passing rate was 96.2% for the gamma index criteria of 3%, 3 mm but fell to 85.3% when the criteria were tightened to 2%, 2 mm. The passing rate was dependent on the treatment site. With the 3%, 3 mm criteria, the passing rate was 95% for head-and-neck treatment plans and 100% for prostate plans. No significant difference was found between passing rates for multi-field and single-field optimized plans. The passing rate was 94.8% ± 0.6% for fields with range shifters and 99.0% ± 0.6% for those without (P = .002). Most low gamma index scores were due to steep dose gradients transverse to the measured plane. A less frequent cause of failures was an apparent systematic overestimation of the calculated dose at depths proximal to the spread-out Bragg peak.
A comprehensive PSQA program serves to ensure the safety of a specific treatment plan and acts as a check on the entire treatment system. We propose that the 3%, 3 mm with 90% pixel passing rate is a reasonable action level for 2-dimensional comparisons of dose planes in SSPT, although more restrictive tolerance levels would be appropriate for prostate treatment plans.
The number of centers offering spot-scanning proton therapy (SSPT) is increasing rapidly. In 2013 fewer than 10 centers offered SSPT. By 2020 more than 20 centers will likely possess the capacity for SSPT. The expansion of SSPT is motivated by its potential for highly conformal dose distributions, improved tumor control, and decreased toxicity. These conformal dose distributions are achieved by precisely positioning individual beam pulses, referred to as “spots.” Spots are positioned longitudinally by varying the beam energy and laterally by varying the field strength of steering magnets within the nozzle. Because of the high conformality and the longitudinal degree of freedom for depth dose modulation, which does not exist in x-ray beam therapy, small deviations in beam delivery or approximations in the dose calculation algorithm may lead to clinically significant differences between the delivered and planned dose distributions. Given these unique characteristics, the standardized quality assurance (QA) procedures designed for other modalities, such as intensity-modulated radiation therapy, may not be appropriate for SSPT.
Because SSPT is a relatively new treatment modality, QA procedures have not yet been standardized, and little information on SSPT QA is available in the literature. Jäkel et al reported on treatment planning system (TPS) QA for the scanned ion beam at Gesellschaft für Schwerionenforschung, and Lomax et al reported treatment planning and verification results from the Paul Scherrer Institute. However, QA tools and practices have changed substantially since those reports. Zhu et al reported specifically on results for prostate treatment plans, and Mackin et al reported on a QA software tool and compared dose calculations for the Eclipse TPS with an in-house algorithm. However, to our knowledge, no study has reported on patient-specific QA (PSQA) results for SSPT in a large sample of patients with diverse tumor types.
The purpose of this study was to assess the outcomes of the SSPT PSQA program at our institution for the year 2013 by treatment site, by plan optimization technique, and by whether a range shifter was used.
Materials and Methods
From January 1 through December 31, 2013, we performed our PSQA procedure for 309 patient treatment plans, all of which were included in the present study. The disease sites included central nervous system (CNS), head and neck (HN), thoracic and gastrointestinal system (thoracic/GI), and prostate. The PSQA for these patients consisted of 2711 dose plane measurements for 748 treatment fields. The fields were delivered by the Hitachi synchrotron-based PROBEAT Proton Beam Therapy System (Hitachi America, Ltd, Tarrytown, New York), which is capable of delivering up to 94 different beam energies ranging from 72.5 to 221.8 MeV, with a proton range of 4 to 30.6 g/cm². The full width at half maximum of the proton beam spots in air ranged from 3.4 to 1.2 cm for the lowest to the highest beam energies.
All treatment plans were developed with the Eclipse TPS, version 8.9 (Varian Medical Systems, Inc, Palo Alto, California), using images from a 16-slice computed tomography scanner (LightSpeed RT 16, GE Healthcare, Waukesha, Wisconsin). The Eclipse analytical proton pencil beam dose algorithm was commissioned using a combination of measured and Monte Carlo simulated integral depth dose profiles and in-air lateral profiles. For fields using a range shifter, the same analytical model was used but commissioned with independent data measured or simulated with the 6.7 g/cm² range shifter in place. Plans were developed using single-field optimization (SFO), where each field provides full target coverage individually, or multi-field optimization (MFO), where full target coverage is provided by all the fields in combination.
Patient-Specific QA Procedure
The core PSQA procedure consisted of 2 sets of measurements. For the first set, treatments were delivered at the prescribed gantry angles and measured with a MatriXX ion chamber array (IBA Dosimetry, Schwarzenbruck, Germany) at a 2-, 4-, or 5-cm water-equivalent depth. The ion chamber array and the acrylic build-up material were held perpendicular to the beam using a custom couch attachment. We termed these measurements “end-to-end” because they provide some verification for all system components, from dose calculation by the TPS to data transfer to and from the record and verify system, and through beam delivery at the prescribed gantry angle. The second set of measurements, consisting of 2 or 3 additional depth measurements for each field, was delivered with the gantry at 270° and measured using an ion chamber array with a Plastic Water (CIRS, Norfolk, Virginia) phantom. This PSQA procedure is covered in more detail by Mackin et al. For prostate treatment plans, the PSQA procedure is somewhat different in that the end-to-end test measures the dose using a 0.04-cm³ cylindrical ion chamber (Model CC04, IBA Dosimetry) in a water tank rather than an ion chamber array. These water tank measurements are described in detail by Zhu et al and are therefore not included in this study.
The measured dose planes were compared with the dose planes calculated by Eclipse using the gamma index with dose tolerance and distance-to-agreement criteria of 2%, 2 mm and 3%, 3 mm, respectively. A passing gamma score was defined as a gamma score <1 for ≥90% of the test pixels. We abbreviate these 2 gamma index passing levels as 2%, 2 mm; >90% passing and 3%, 3 mm; >90% passing. For all gamma index calculations, the dose planes were normalized to the maximum dose in the measured planes, and test pixels included in the QA results were required to have at least 10% of the normalization dose. Our clinical action level required all measured dose planes to have a passing gamma score unless the plane was measured near the distal end of range. When the clinical action level is exceeded, the medical physics team reviews the QA results for the field in question and may reject the plan, request replanning, or approve the plan for treatment after notifying the physician and adding to the patient's QA report an explanation for the low score.
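The comparison described above can be illustrated with a minimal brute-force sketch of a 2-dimensional global gamma analysis. This is not the software used in this study; the function names, the uniform pixel spacing argument, and the exhaustive search over the whole plane are assumptions made for illustration. Only the normalization to the maximum measured dose, the 10% dose threshold, and the 90% passing level are taken from the text.

```python
import numpy as np


def gamma_pass_fraction(measured, calculated, spacing_mm,
                        dose_tol=0.03, dta_mm=3.0, low_dose_cut=0.10):
    """Fraction of test pixels with gamma < 1 (global normalization).

    measured, calculated: 2-D dose planes on the same grid (assumption).
    spacing_mm: pixel spacing, assumed uniform in both directions.
    """
    measured = np.asarray(measured, dtype=float)
    calculated = np.asarray(calculated, dtype=float)
    if measured.shape != calculated.shape:
        raise ValueError("dose planes must share one grid")

    norm_dose = measured.max()  # normalize to the maximum measured dose
    if norm_dose <= 0:
        raise ValueError("measured plane has no positive dose")
    dose_crit = dose_tol * norm_dose

    rows, cols = np.indices(measured.shape)
    y_mm = rows * spacing_mm
    x_mm = cols * spacing_mm

    # Test pixels must carry at least 10% of the normalization dose.
    test_pixels = np.argwhere(measured >= low_dose_cut * norm_dose)
    gammas = np.empty(len(test_pixels))
    for k, (iy, ix) in enumerate(test_pixels):
        # Squared distance and dose terms, each scaled by its criterion.
        dist_sq = ((y_mm - y_mm[iy, ix]) ** 2
                   + (x_mm - x_mm[iy, ix]) ** 2) / dta_mm ** 2
        dose_sq = ((calculated - measured[iy, ix]) / dose_crit) ** 2
        gammas[k] = np.sqrt((dist_sq + dose_sq).min())
    return float(np.mean(gammas < 1.0))


def plane_passes(pass_fraction, required=0.90):
    """A plane passes when at least 90% of test pixels have gamma < 1."""
    return pass_fraction >= required
```

Calling `gamma_pass_fraction(m, c, 1.0, dose_tol=0.02, dta_mm=2.0)` evaluates the same pair of planes against the tighter criteria. The exhaustive search is quadratic in the number of pixels, so a practical tool would likely restrict the search to a neighborhood of each test pixel and interpolate the calculated plane.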
Treatment Field Characteristics
We compared the number of monitor units (MUs), number of energy layers, number of spots, beam ranges, and the spread-out Bragg peak (SOBP) length for CNS, HN, prostate, and thoracic/GI sites. An MU is based on the amount of charge collected by the dose monitor in the delivery nozzle while delivering a reference field. The number of energy layers and beam spots are simply counts of the distinct beam energies and positions used to deliver the treatment field. The beam range and SOBP were calculated by Eclipse as the distal 90% falloff of the highest energy layer and the difference between the ranges of the highest and lowest energy layers, respectively. We also looked at the gamma index passing percentage for the 4 treatment sites, grouped by the residual range of the measurement. Here, we defined residual range as the difference between the beam range and the measurement depth. We used residual range because the dose deposition characteristics depend on the average energy of the protons, which is closely correlated with the residual range of the beam.
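The definitions of SOBP length and residual range given above reduce to simple differences, and the grouping by residual range can be sketched as a binning step. In the sketch below, the function names and the bin edges are hypothetical; only the 2-cm lower cutoff and the (2 cm, 5 cm) group appear in this report, and whether the bin boundaries are open or closed is an assumption.

```python
from bisect import bisect_left


def sobp_length(highest_layer_range_cm, lowest_layer_range_cm):
    """SOBP length: range of highest energy layer minus range of lowest."""
    return highest_layer_range_cm - lowest_layer_range_cm


def residual_range(beam_range_cm, measurement_depth_cm):
    """Residual range: beam range minus measurement depth."""
    return beam_range_cm - measurement_depth_cm


def residual_range_group(rr_cm, edges_cm=(2.0, 5.0, 10.0, 20.0)):
    """Return the (low, high] bin holding rr_cm, or None if excluded.

    edges_cm is a hypothetical set of bin edges. Planes with a residual
    range at or below the first edge are excluded from the comparison.
    """
    if rr_cm <= edges_cm[0]:
        return None
    i = bisect_left(edges_cm, rr_cm)  # index of first edge >= rr_cm
    low = edges_cm[i - 1]
    high = edges_cm[i] if i < len(edges_cm) else float("inf")
    return (low, high)
```

For example, a plane measured at a depth of 17.5 cm in a field with a 21.0-cm beam range has a residual range of 3.5 cm and would fall in the (2 cm, 5 cm) group under these assumed edges.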
Comparison of PSQA Measurements for SFO and MFO
Given the increased modulation relative to SFO fields, measured doses for MFO fields might be particularly sensitive to measurement uncertainties. In addition, the higher modulation might increase the dose calculation uncertainty. Therefore, it is important to determine if there are differences in the PSQA results for SFO and MFO fields. To compare the SFO and MFO treatment fields and the corresponding PSQA results, we again used the variables MU, range, energy layers, number of spots, and the gamma index passing percentage grouped by residual range. Because prostate treatment fields all used SFO, those fields were not relevant to the comparison of SFO and MFO and were removed.
Range Shifter Effect on PSQA Results
Understanding the differences in QA results caused by range shifters would aid future efforts to improve beam modeling and QA procedures. More importantly, large differences might be clinically relevant when the use of a range shifter is optional. To investigate these differences, we applied the same methodology used for comparing SFO with MFO field measurements. First, we removed the prostate field measurements because range shifters are not used for prostate treatments. Next, we divided the data into range shifter (RS) and no range shifter (NRS) groups. We compared the RS and NRS measurements, again using the variables MU, range, energy layers, and number of spots. We also looked at how the gamma index passing percentage differed between the 2 groups. We grouped by measurement depth, instead of residual range, to better study the modeling of the dose from secondary radiation due to scattering in the range shifter.
Table 1 summarizes the PSQA results for the study treatment plans by disease site. As shown in the table, SFO was used for 100% of the prostate treatment plans and 75% of the CNS treatment plans. Most head and neck (68%) and thoracic/GI (58%) treatment plans used MFO. The numbers of MUs, spots, and energy layers used for treatment fields varied widely for all disease sites except prostate, for which there was little variation among plans.
The box and whisker plots in Figure 1 show the distributions of beam spots per field, energy layers per field, number of beam spots, and nominal beam range. The differences between the prostate treatment fields and the treatment fields at other sites are apparent. As predicted, compared with the fields at the other 3 sites, prostate fields had longer range but fewer spots and fewer energy layers. Moreover, the prostate treatment fields had less variance in these characteristics. For example, the number of beam spots for prostate fields ranged from 1,044 to 2,557, whereas the number of spots used for thoracic/GI fields ranged from 646 to 26,738.
Table 2A summarizes the gamma index passing percentages for 1851 dose planes measured as part of the PSQA. The dose planes were required to have residual-range values >2 cm. For the 2%, 2 mm and the 3%, 3 mm criteria, 85.3% and 96.2% of the dose planes, respectively, had passing percentages greater than 90%. Thus, the clinical action level was reached for 3.8% of measured dose planes.
Figure 2A shows the results for the 2%, 2 mm gamma index criteria for the 4 treatment sites grouped by the residual range. With the exception of prostate, each treatment site had at least one residual range group in which >25% of the measured values fell below the 90% passing rate threshold. The problems disappeared when the gamma criteria were relaxed to 3%, 3 mm (Figure 2B). In fact, only the range of values for the (2 cm, 5 cm) HN data group and several outliers fell below the 90% passing rate threshold for 3%, 3 mm.
Figures 3A–D compare the characteristics of the SFO and MFO fields. Because all the prostate treatment plans used SFO, they were excluded from this comparison. The plots have notches that indicate the 90% confidence interval of the median. The non-overlapping notches for beam spots per field, the energy layers per field, the beam range, and the field cross-section at isocenter indicate that the median values of these parameters differ significantly between MFO and SFO fields. Compared with the SFO fields, the MFO fields tended to have a longer range, a larger field cross-section at isocenter, more spots, and more energy layers. It should be noted that if differences exist in the gamma index passing rate, they may be due to these differences in the field characteristics rather than in the optimization method. Figures 3E–H show the gamma index passing percentages for the SFO and MFO dose planes grouped by residual range.
Table 2B summarizes the gamma index passing percentages for SFO and MFO treatment fields. A 2-sample test for equality of proportions indicated that differences between SFO and MFO gamma index passing rates were not significant (P = .366 and P = .761 for 2%, 2 mm and 3%, 3 mm, respectively).
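For readers unfamiliar with this kind of comparison, the following is a minimal sketch of one common form of a 2-sample test for equality of proportions: the pooled z-test without continuity correction. The specific variant used to produce the P values reported here is not stated, so the result of this sketch may differ from them. The function name is hypothetical, and the counts in the usage note are illustrative placeholders, not study data.

```python
import math


def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Pooled two-sided z-test for equality of two proportions.

    pass_a, pass_b: number of passing dose planes in each group.
    n_a, n_b: total number of dose planes in each group.
    Returns (z, p_value). No continuity correction is applied.
    """
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    if variance == 0.0:
        raise ValueError("pooled proportion is 0 or 1; test is undefined")
    z = (p_a - p_b) / math.sqrt(variance)
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

With made-up counts, `two_proportion_z_test(90, 100, 90, 100)` returns a z statistic of 0 and a P value of 1, because the two groups have identical passing proportions; a P value below the chosen significance level would instead suggest that the passing rates differ.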
Figures 4A–D compare the characteristics of treatment fields that do (labeled RS) and do not (labeled NRS) use range shifters. The NRS fields tended to have longer ranges. Otherwise, RS and NRS had similar numbers of MU per field, numbers of beam spots per field, and numbers of energy layers per field.
We compared the gamma index passing percentages for RS and NRS fields grouped by measurement depth, as shown in Figures 4E–H. For RS fields, the dose calculation must properly model the effects of multiple scattering and secondary particle production in the range shifter. Multiple scattering primarily affects the distal dose falloff, whereas the lower-energy secondary particles affect the shallow dose. Low gamma index passing percentages occurred more frequently for greater measurement depths. These results suggest that the dose algorithm is effectively modeling the dose from the secondary radiation and is less effectively modeling effects of the range shifter on the distal falloff.
Table 2B shows that for the 2%, 2 mm gamma criteria, the percentage of dose planes with gamma index passing rates of at least 90% was 81.9% ± 1.0% and 86.1% ± 2.0% for RS and NRS fields, respectively, suggesting (2-sample test for equality of proportions, P = .08) that NRS fields may have better agreement with measurement than RS fields have. For the 3%, 3 mm criteria, the passing rates increased to 94.8% ± 0.6% and 99.0% ± 0.6% for RS and NRS fields, respectively. In this case, the evidence of higher passing rates for NRS fields than for RS fields is much stronger (2-sample test for equality of proportions, P = .002).
The results of this study suggest some general guidelines for SSPT PSQA. We found that the dose planes were >3 times more likely to fall below the 90% of pixels passing criterion when the 3%, 3 mm gamma index criteria were tightened to 2%, 2 mm. Distinguishing between treatment-site groups, we found that prostate treatment field characteristics are consistent from patient to patient and that these characteristics are generally distinct from those of treatment fields from the other sites (CNS, HN, and thoracic/GI). All of the prostate dose planes measured in the SOBP during PSQA had better than 98% gamma index pass rates with the 2%, 2 mm gamma index criteria. Because prostate treatment fields can be expected to have a beam range of 21–29 cm and to have fewer than 3000 beam spots and 30 energy layers, these values may be useful for detecting anomalies during PSQA. Conversely, the fields for the other 3 treatment sites (CNS, HN, and thoracic/GI) showed widely varying and overlapping characteristics, suggesting that these case types can be approached using a common PSQA procedure. Importantly, we found no indication that the SFO/MFO or RS/NRS groups require special treatment in PSQA. In summary, we would recommend a consistent PSQA procedure and gamma index acceptance criteria for all nonprostate treatment plans.
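One way the expected prostate field characteristics could be used for anomaly detection is a simple range check, sketched below. The function name and the message strings are hypothetical, and the sketch assumes that "fewer than 3000 beam spots and 30 energy layers" means fewer than 30 energy layers; the limits themselves are the values quoted above and would need to be confirmed locally before any clinical use.

```python
def prostate_field_anomalies(beam_range_cm, n_spots, n_energy_layers):
    """Return a list of reasons a prostate field looks atypical.

    An empty list means the field lies within the expected values:
    beam range of 21-29 cm, fewer than 3000 beam spots, and fewer
    than 30 energy layers (interpretation of the limits is assumed).
    """
    flags = []
    if not 21.0 <= beam_range_cm <= 29.0:
        flags.append("beam range outside 21-29 cm")
    if n_spots >= 3000:
        flags.append("3000 or more beam spots")
    if n_energy_layers >= 30:
        flags.append("30 or more energy layers")
    return flags
```

A flagged field is not necessarily wrong; the check only marks it for closer review by the physics team.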
What the gamma index acceptance criteria for SSPT should be remains an open question. The criteria of 3%, 3 mm; >90% passing are commonly used for intensity-modulated radiation therapy PSQA [10–13] and seem to be a reasonable starting point for SSPT as well. However, a more ideal approach might be to set the acceptance criteria at a level that ensures that the dose calculation and dose delivery uncertainties are small relative to the range, setup, and anatomic uncertainties. Because these ideal acceptance criteria are unknown and may not be reasonably achievable, we can more practically base the acceptance criteria on our current ability to calculate, deliver, and measure dose combined with a reasonable PSQA failure rate.
The results from this study provide guidelines for this practical approach. The 2%, 2 mm; >90% passing criteria are too restrictive because they produce a >15% PSQA failure rate for nonprostate fields. Instead, we recommend the 3%, 3 mm; >90% passing criteria, which are consistent with the intensity-modulated radiation therapy literature and yield a PSQA failure rate of approximately 5% for nonprostate fields. We also include the 2%, 2 mm gamma index results in our QA process to provide additional quantitative information for evaluating the QA results.
Only 84 of the 1851 dose planes fell below the 3%, 3 mm; >90% passing gamma index criteria. An attempt was made to understand the cause of each of the 84 cases. The steep longitudinal dose gradients of proton beams make 2-dimensional, lateral dose comparisons problematic. For this reason, dose planes measured within 2 cm of the end of the beam range were excluded from the gamma comparisons. However, upon further review, the longitudinal dose gradient was still the cause of 70 of the 84 low gamma scores. The Eclipse dose calculation, which systematically gives a higher dose value than the measurement in regions proximal to the SOBP, seemed to account for 9 of the low gamma scores. Two planes seemed to have measurement errors and were remeasured, bringing the agreement to within tolerance. Finally, for the remaining 3 low gamma scores, the cause was unclear. In these cases, the treatment plans were rejected and had to be replanned.
A comprehensive PSQA program serves to ensure the safety of a specific treatment plan. Such a PSQA program also serves as a check on the entire treatment system, as a major failure in any aspect is likely to result in a delivered dose distribution that deviates from the plan and a corresponding low gamma index passing percentage. Careful and consistent monitoring of PSQA results can uncover small discrepancies or accidental modifications to the delivery system or to the TPS. In fact, Dong et al reported such a discovery after noticing a 3% variation in intensity-modulated radiation therapy PSQA results. We believe the additional check of the delivery system provided by PSQA is complementary to the systematic and consistent checks provided by machine QA procedures.
The PSQA measurements have shown that the planned and delivered doses consistently agree within tolerance levels. We conclude that the 3%, 3 mm; >90% passing gamma criteria are a reasonable action level for 2-dimensional comparisons of dose planes in SSPT. More restrictive tolerance levels would be reasonable for prostate treatment plans.
ADDITIONAL INFORMATION AND DECLARATIONS
Conflicts of Interest: The authors have no conflicts to disclose.
Acknowledgments: The authors would like to thank Diane Hackett of MD Anderson's Scientific Publications Department for her suggestions for and careful review of this manuscript.