NRG Oncology Survey of Monte Carlo Dose Calculation Use in US Proton Therapy Centers

Purpose/Objective(s): Monte Carlo (MC) dose calculation has appeared in primary commercial treatment-planning systems and various in-house platforms. Dual-energy computed tomography (DECT) and metal artifact reduction (MAR) techniques complement MC capabilities. However, no publications have yet reported how proton therapy centers implement these new technologies, and a national survey is required to determine the feasibility of including MC and companion techniques in cooperative group clinical trials.
Materials/Methods: A 9-question survey was designed to query key clinical parameters: scope of MC utilization, validation methods for heterogeneities, clinical site-specific imaging guidance, proton range uncertainties, and how implants are handled. A national survey was distributed to all 29 operational US proton therapy centers on 13 May 2019.
Results: We received responses from 25 centers (86% participation). Commercial MC was most commonly used for primary plan optimization (16 centers) or primary dose evaluation (18 centers), while in-house MC was used more frequently for secondary dose evaluation (7 centers). Based on the survey, MC was used infrequently for gastrointestinal, genitourinary, gynecologic, and extremity sites compared with other, more heterogeneous disease sites (P < .007). Although many centers had published DECT research, only 3/25 centers had implemented DECT clinically, either in the treatment-planning system or to override implant materials. Most centers (64%) treated patients with metal implants on a case-by-case basis, with a variety of methods reported. Twenty-four centers (96%) used MAR images and overrode the surrounding tissue artifacts; however, there was no consensus on how to determine metal dimension, material density, or stopping powers.
Conclusion: The use of MC for primary dose calculation and optimization was prevalent and, therefore, likely feasible for clinical trials. There was consensus to use MAR and override tissues surrounding metals but no consensus about how to use DECT and MAR for human tissues and implants. Development and standardization of these advanced technologies are strongly encouraged for vendors and clinical physicists.


Introduction
The number of proton therapy centers has increased rapidly in recent years. The proton pencil beam scanning technique has been implemented in almost all new proton centers, and it is expected to be the future trend in proton therapy [1]. The dose calculation algorithm plays an important role in the accuracy and quality of proton beam therapy. Analytic dose engines, such as pencil beam superposition convolution (PSC), have the advantage of fast calculation speed and are widely implemented in commercial treatment-planning systems (TPSs), such as Eclipse (Varian Medical Systems, Palo Alto, California), RayStation (RaySearch Laboratories, Stockholm, Sweden), and Pinnacle (Philips, Amsterdam, Netherlands). However, Taylor et al [2] reported that proton centers using an analytic dose engine had low passing rates on the Imaging and Radiation Oncology Core (IROC) lung phantom. In contrast, Monte Carlo (MC) dose calculation has been found to improve dose calculation accuracy in low-density heterogeneities and, correspondingly, has a favorable IROC lung phantom pass rate. Hence, the National Cancer Institute recommends using MC in its sponsored lung-related clinical trials. Similarly, MC has been reported to be advantageous in highly heterogeneous patients and is recommended for patients with a large metal implant [3][4][5][6][7].
Since 2018, MC for pencil beam scanning techniques has been implemented by major commercial TPSs. The Eclipse TPS currently offers MC for final dose calculation, but plan optimization is still based on analytical methods. The RayStation TPS offers MC for both plan optimization and final dose calculation; however, if quick optimization is desired, analytical PSC is an option for plan optimization. Commissioned TPS MC engines have been reported to improve dosimetric accuracy over PSC [8][9][10]. Besides the aforementioned commercial MCs, in-house MCs have been widely used for clinical dose calculations. These include TOPAS [11] (TOPAS MC Inc, Boston, Massachusetts), MCsquare [4] (Université Catholique de Louvain, Belgium), and others, ranging from early in-house development at the Paul Scherrer Institute [5] to general-purpose MC (Geant4/Gate [12] and Geant4/FLUKA [13]) to graphics processing unit (GPU)-based MC (gPMC at Massachusetts General Hospital in Boston [14] and the in-house MC at the Mayo Clinic, Phoenix, Arizona [15]).
The advantages of MC over PSC have motivated clinical trial sponsors to consider requiring MC in all future proton therapy clinical trials. In 2018, the NRG Oncology Medical Physics Subcommittee established a working group to examine the feasibility of implementing MC in all clinical trials involving proton therapy. The working group comprised radiation oncologists and a therapeutic medical physicist with expertise in proton therapy. The working group designed a survey for National Clinical Trial Network (NCTN) members to query the current clinical practice and distributed it to proton therapy centers in the United States. The goal of this survey was to inform clinical site-specific practice guidelines for NRG Oncology-sponsored clinical trials.
MC is intrinsically different from PSC and requires specific considerations for implementation. For example, whereas the analytical solution converts computed tomography (CT) Hounsfield units (HUs) directly to stopping power, the MC method requires elemental composition and mass density to determine stopping power and scattering cross-sections. Heterogeneity is typically placed into 2 categories: human tissues, described by stoichiometric calibration in single-energy CT [16], and artificial implants, which can cast artifacts over surrounding tissues and have HUs that cannot be represented by those of human tissues. Guidelines are desired for both users and vendors regarding the definition of artificial implants, including geometry and elemental composition.
Recent developments in dual-energy CT (DECT) [17][18][19][20][21] and CT metal artifact reduction (MAR) [22][23][24] show potential to reduce range uncertainty and enable treatment of patients with large implants [25]. Guidelines are needed on the use of tissue-mimicking phantoms for the validation of proton therapy dose calculation and on how to properly use DECT and MAR images to achieve the most accurate patient dose distribution, leading to the best patient outcome and safety. Hence, a further purpose of this survey was to find out how these complementary imaging techniques were used along with the practice of MC.
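The difference between the two lookups described above can be illustrated with a minimal Python sketch. It is not taken from any TPS or from the surveyed centers: the calibration nodes, HU bins, material labels, and densities are hypothetical placeholder values chosen only to make the example run, and the function names `rsp_from_hu` and `material_from_hu` are introduced here for illustration. An analytic engine is assumed to interpolate HU directly to relative stopping power, whereas an MC engine is assumed to first assign a material label (standing in for an elemental composition) and a mass density.

```python
from bisect import bisect_right

# Hypothetical calibration nodes (HU, relative stopping power); illustrative only.
HU_TO_RSP = [(-1000, 0.001), (0, 1.0), (1000, 1.55), (3000, 2.50)]

# Hypothetical HU bins: (lower HU bound, material label, mass density in g/cm^3).
HU_BINS = [
    (-1000, "air", 0.0012),
    (-900, "lung", 0.26),
    (-100, "soft tissue", 1.03),
    (200, "bone", 1.50),
]


def rsp_from_hu(hu):
    """Analytic-style lookup: linear interpolation between calibration nodes."""
    xs = [x for x, _ in HU_TO_RSP]
    hu = min(max(hu, xs[0]), xs[-1])  # clamp to the calibrated HU range
    i = min(bisect_right(xs, hu), len(xs) - 1)
    (x0, y0), (x1, y1) = HU_TO_RSP[i - 1], HU_TO_RSP[i]
    return y0 + (y1 - y0) * (hu - x0) / (x1 - x0)


def material_from_hu(hu):
    """MC-style lookup: pick the HU bin, return (material label, mass density)."""
    lowers = [lo for lo, _, _ in HU_BINS]
    i = max(bisect_right(lowers, hu) - 1, 0)  # values below the first bin map to it
    _, name, density = HU_BINS[i]
    return name, density


if __name__ == "__main__":
    for hu in (-950, 50, 500):
        print(hu, round(rsp_from_hu(hu), 3), material_from_hu(hu))
```

A metal implant is the case where neither table is adequate: its HU may saturate or fall outside the tissue calibration, which is why the implant geometry and material usually have to be defined explicitly rather than looked up.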

Methods and Materials
Given the novelty and importance of MC to clinical trials, NRG Oncology formed a working group to develop a practice guideline on the application of MC to national cooperative group clinical trials. The overall goal was to improve dose calculation accuracy for patients without artificial materials and to enable safe treatment of patients with artificial materials. Such materials have been associated with inferior local control when proton therapy is delivered alone [26], and mixed photon/proton therapy has been recommended to mitigate proton dose inaccuracies [27,28].
As no publications to date have reported how proton therapy centers implement MC and complementary imaging technologies such as DECT and MAR, a current practice pattern assessment is required to determine the feasibility of including them in clinical trials. To address the issue, a survey was designed to query key clinical parameters: scope of MC utilization, validation methods in homogeneous and heterogeneous phantoms, clinical site-specific imaging guidance, proton range uncertainties, and how metal implants are handled.
The survey data address 2 questions: first, how widely available MC is and whether it has been implemented in current proton therapy centers; second, which types of cancer are being optimized and calculated with MC. This information indicates whether the field is ready to implement MC for clinical trials and which disease sites may show clinical advantages with MC.
The full version of the survey questions is included in the Supplemental Appendix. Questions 1, 2, and 4 surveyed the type and use of MC. Questions 3 and 7 surveyed the need for MC and values of range uncertainty for specific disease sites. Questions 5 and 8 covered the commissioning and validation of MC, including heterogeneous validation and patient quality assurance methods. Question 6 surveyed site-specific imaging methods (eg, DECT, MAR, magnetic resonance imaging, others). Question 9 covered different issues encountered treating patients with implants: material, composition, dimension, range uncertainties, and mitigation strategies. Pull-down lists of values and frequency (always, often, sometimes, never) were used along with free-form text boxes to allow some users to input unique answers or clarify the answers given.
The survey was distributed to all 29 operational US proton therapy centers on 13 May 2019 following European and NRG Oncology precedents [29][30][31]. The survey was modified slightly for clarification and redistributed on 18 October 2019. The survey was distributed by IROC, which monitors proton therapy centers that participate in NCTN protocols.
Two-sample t-tests were performed by using MATLAB and Statistics Toolbox Release 2018b (The MathWorks, Inc, Natick, Massachusetts). A P value < .05 was considered statistically significant.
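For readers who want to see what a two-sample t-test computes, the following Python sketch implements the pooled-variance (Student) form using only the standard library. It is not the authors' MATLAB analysis: the equal-variance assumption, the numerical integration used for the P value, the function names, and the example counts are all assumptions made here for illustration, and the counts are invented rather than taken from the survey.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sample_t_test(a, b, steps=2000):
    """Pooled-variance two-sample t-test; returns (t, df, two-sided P value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    # Simpson's rule (steps must be even) for the area under the pdf on [0, |t|].
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    p = max(0.0, 1.0 - 2.0 * area)  # two-sided tail probability
    return t, df, p


if __name__ == "__main__":
    # Invented counts of centers using MC per disease site (NOT survey data).
    heterogeneous_sites = [20, 21, 19, 18]
    homogeneous_sites = [12, 10, 11, 13]
    t, df, p = two_sample_t_test(heterogeneous_sites, homogeneous_sites)
    print(f"t = {t:.3f}, df = {df}, significant at .05: {p < 0.05}")
```

With only 4 disease sites per group, the test has 6 degrees of freedom, so a small P value requires a large and consistent difference between the groups.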

MC Implementation and Availability in Proton Centers
The overall MC availability in proton centers and the specific implementation of commercial and in-house systems are shown in Figure 1. In total, 25 centers responded (86% participation rate). Of those, 23 centers (92%) reported having at least 1 MC system. Of the centers, 12 had 1 MC system, 3 had 2, 5 had 3 and 4 had 5 MC systems. Only 2 centers did not report use of MC. The commercial MC systems RayStation and Eclipse were used in 16 and 6 centers, respectively. Most of the in-house MC systems were freely available software: TOPAS (n = 9), Geant4 (n = 2), and MCsquare (n = 6). The other 6 in-house MC systems reported were 4 GPU-based MCs [14,15] and 2 developmental MCs.
Figure 2 shows the general usage of MC systems in proton centers. Primary dose evaluation was mostly done with commercial MC systems rather than in-house MC systems (18 vs 4 centers). Specifically, RayStation was used by 15 centers, Eclipse was used by 3 centers, and the in-house MC systems used by 4 centers were TOPAS/MCsquare, GPU-based MC [14,15], and developmental MC, respectively. Of note, the Eclipse MC system currently does not support plan optimization. MC plan optimization was performed in 16 centers using RayStation and in 3 centers using in-house MC (2 MCsquare and 1 GPU-based MC [15]). Linear energy transfer (LET) relative biological effectiveness (RBE) evaluation was done in different forms: half of the 8 centers used RayStation and the others used in-house MC for primary evaluation; 6 centers used in-house MC systems for secondary LET/RBE evaluation.
Figure 3 shows the use of MC in specific disease sites. Significantly more centers (P = .007) used MC for both dose calculation and plan optimization when treating brain/central nervous system (CNS), head and neck, lung, and breast cancers than for gastrointestinal, genitourinary (GU), gynecologic (GYN), and extremity cancers. This is potentially because the latter group of disease sites has more homogeneous tissue density than the first group. Primary LET/RBE evaluation was used more for brain/CNS tumors than for all the other disease sites (P < .001), potentially due to risks to serial organs, such as optic structures, brain stem, and spinal cord.

MC for Modeling Accessories
MC was always or sometimes used to model range shifters in 60% and 20% of proton centers, respectively. Apertures were used in 16 centers, and MC was used to model apertures at most of these centers (69%). Two proton centers also used MC for patient-specific bolus and adaptive aperture (eg, multi-leaf collimators).

Commissioning and Validation Methods for Heterogeneities
Most centers used electron density phantoms for stoichiometric calibration (84%). IROC phantoms were widely used for heterogeneity validation (92%). Ten centers (40%) and 7 centers (28%) used real animal tissues for stoichiometric calibration and heterogeneity validation, respectively. Other than IROC and electron density phantoms, there was no consensus over the use of heterogeneous phantoms. Figure 4 shows the number of proton centers that used various imaging methods to reduce uncertainties from heterogeneity for brain/CNS, head and neck, lung, breast, gastrointestinal, GU, GYN, and extremity cancers. Metal artifact reduction (eg, iterative metal artifact reduction and orthopedic metal artifact reduction) was the imaging technique most proton centers used to reduce artifacts (76%). DECT was rarely implemented for clinical use (3/25). Magnetic resonance imaging was used by <44% of centers to reduce uncertainty from heterogeneities. Seven centers (<28%) used positron emission tomography/CT, proton radiography, and prompt gamma to reduce uncertainty from tissue heterogeneity.

Range Uncertainty in Treatment Planning
Most centers (92%) used range uncertainties of 3% or 3.5% for treating all disease sites except lung. For lung cancer, 7 centers (28%) used a larger range uncertainty of 4% or 5%.
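To make these percentages concrete, the short Python sketch below converts a relative range uncertainty into an absolute margin for a given beam range. The 150-mm range is an arbitrary example, the function name `range_margin_mm` is introduced here, and the optional fixed additive term (`fixed_mm`, defaulting to zero) is an assumption of this sketch rather than something reported in the survey.

```python
def range_margin_mm(beam_range_mm, relative_uncertainty=0.035, fixed_mm=0.0):
    """Absolute range margin: relative uncertainty times range, plus an optional fixed term.

    The fixed term is a hypothetical parameter of this sketch (default 0).
    """
    if beam_range_mm < 0 or relative_uncertainty < 0 or fixed_mm < 0:
        raise ValueError("range, relative uncertainty, and fixed term must be non-negative")
    return relative_uncertainty * beam_range_mm + fixed_mm


if __name__ == "__main__":
    example_range_mm = 150.0  # arbitrary illustrative beam range
    for label, fraction in (("3%", 0.03), ("3.5%", 0.035), ("5%", 0.05)):
        margin = range_margin_mm(example_range_mm, fraction)
        print(f"{label} of {example_range_mm:.0f} mm -> {margin:.2f} mm margin")
```

For this example range, moving from 3.5% to 5% adds a little over 2 mm of margin, which shows how the larger lung setting translates into extra distal and proximal coverage.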

Criteria for Patient-Specific Quality Assurance
For plans that did not pass the robustness evaluation, most centers (96%) required a replan before patient-specific quality assurance (QA). Most centers (84%) used a >90% to 95% gamma passing rate with 3-mm distance to agreement and 3% dose accuracy as patient-specific QA criteria. If the patient QA was outside criteria, the most common action (taken by 84% of centers) was to renormalize the plan monitor units according to the patient QA results. The second most common action if the patient QA was outside criteria was to redo the treatment plan (68% of centers). Many centers (40%) also reported that they would hold patient treatment if the patient QA was outside 3%/3 mm.
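The gamma passing rate referred to above combines a dose-difference criterion and a distance-to-agreement criterion into one index per measured point; a point passes when its gamma value is at most 1. The Python sketch below is a minimal 1-dimensional, brute-force version under stated assumptions: global normalization to the maximum calculated dose, no interpolation between calculated points, and no low-dose threshold. Clinical QA software typically works in 2 or 3 dimensions and differs in these details, and the function name and profiles here are invented for illustration.

```python
import math


def gamma_pass_rate(positions_mm, measured, calculated, dta_mm=3.0, dose_frac=0.03):
    """Percent of measured points with gamma <= 1 (1D, global normalization)."""
    dose_tol = dose_frac * max(calculated)  # global dose criterion, e.g. 3% of max
    passed = 0
    for xm, dm in zip(positions_mm, measured):
        # Gamma is the minimum combined distance/dose deviation over all calculated points.
        gamma = min(
            math.sqrt(((xc - xm) / dta_mm) ** 2 + ((dc - dm) / dose_tol) ** 2)
            for xc, dc in zip(positions_mm, calculated)
        )
        passed += gamma <= 1.0
    return 100.0 * passed / len(measured)


if __name__ == "__main__":
    xs = [float(i) for i in range(11)]  # positions on a 1-mm grid
    calc = [100.0] * 11                 # invented flat calculated profile
    meas = [100.0] * 10 + [110.0]       # one measured point is 10% high
    rate = gamma_pass_rate(xs, meas, calc)
    print(f"gamma pass rate (3%/3 mm): {rate:.1f}%")
```

In this invented example, the single point that is 10% high cannot be rescued by the distance criterion because the calculated profile is flat, so 10 of 11 points pass.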

Procedure to Handle Metal Implants
Most centers (96%) did not have an absolute maximum size of metal implants to determine the feasibility of treatment and instead evaluated case by case. When proton beams had to pass through metal implants, 71% of centers used a larger range uncertainty margin to account for the additional uncertainty. All proton centers (100%) considered using more beams from different angles to smooth out the dose perturbations caused by metal implants. Most centers (81%) used MC methods to improve dose accuracy with metal implants in the beamline. All proton centers except 1 (96%) would override the artifacts in the tissue surrounding the metal implants. However, for the metal implant itself, there was no consensus on precisely determining metal dimension or assigning material density and stopping powers.

Discussion
A majority of proton therapy centers were using MC (commercial or in-house) for various purposes in their clinics. MC was used for treatment planning, plan optimization, LET/RBE evaluation, and secondary dose calculations. The survey responses showed that MC use for some disease sites, such as GU and GYN, was lower than for other disease sites. While some proton centers were not treating GYN cancers with proton therapy (currently there are no NCTN clinical trials for GYN cancer that allow proton therapy), GU diseases, such as prostate cancer, were being studied in multiple active phase III trials [32,33]. Perhaps one reason for less use of MC in these disease sites is the relative tissue homogeneity of the anatomy compared with more heterogeneous disease sites, such as lung tumors. MC was also predominantly used in brain/CNS, head and neck, and breast cancers, probably due to spine implants, dental implants, and breast implants, respectively. However, from a resource perspective, it is cumbersome to maintain one TPS for some disease sites and a second for others. As many centers have already implemented MC for clinical use, it would be ideal for clinical trial data collection and consistency for all proton centers to use the same treatment-planning algorithm. Given this, we encourage proton centers to consider using MC for all disease sites. However, this may slow down the clinic because MC dose engines are generally more time consuming than analytic ones.
One of the important issues identified in this survey was the contrast between the lack of consensus on the use of DECT and the consensus on using MAR for artificial materials. Despite the promise of DECT, it was surprising that so few proton centers had implemented it clinically. We speculate that this is partly due to a lack of adequate support for DECT in commercial TPSs. Furthermore, current DECT uses a 2-parameter model [34] to extract material composition, which is adequate for HU accuracy of human tissues but might not be sufficient for artificial materials. For patients with large and high-Z artificial materials, MC is desired. However, current DECT techniques cannot provide the mass density and material composition that are needed for MC. Furthermore, a pixelated proton counting detector has been introduced to validate derived material information for anthropomorphic phantoms [35]. DECT and TPS vendors are encouraged to provide a solution to extract and implement material information for human tissues and artificial materials so that MC can be properly used in clinics. Proton centers with DECT capabilities are encouraged to evaluate DECT for MC-based treatment planning purposes.
The survey had some design limitations. For example, the survey combined the imaging techniques of positron emission tomography/CT, proton radiography, and prompt gamma into a single option when querying methods used to minimize patient-specific heterogeneity uncertainty. Therefore, it is impossible to distinguish which method is available at individual centers. Another limitation was with the question regarding patient-specific QA. The survey used the broad term "patient-specific QA," which can be interpreted in many ways, for example, as a comparison of the TPS-calculated dose with detector measurements, with the dose from a second monitor unit check, or with the dose recalculated from log files.

Conclusion
Based on the survey results, the use of MC for primary dose calculation and optimization was available and feasible for patients without artificial materials in most proton therapy centers. For comparability, consistency, and accuracy in clinical trials, we encourage proton centers to commission or adopt MC for clinical use. For patients with metal implants, there is consensus to use metal artifact reduction and to override tissues surrounding metals. However, there is no consensus on how DECT should be used, in terms of either virtual monoenergetic images or the extraction of material information for artificial and human tissues. Development and standardization of these advanced technologies are strongly encouraged for vendors and clinical physicists alike.