We propose and simulate a model-based methodology to incorporate heterogeneous treatment benefit of proton therapy (PrT) versus photon therapy into randomized trial designs. We use radiation-induced pneumonitis (RP) as an exemplar. The aim is to obtain an unbiased estimate of how predicted difference in normal tissue complications probability (ΔNTCP) converts into clinical outcome on the patient level.
ΔNTCP data from in silico treatment plans for photon therapy and PrT for patients with locally advanced lung cancer as well as randomly sampled clinical risk factors were included in simulations of trial outcomes. The model used at point of analysis of the trials was an iQUANTEC model. Trial outcomes were examined with Cox proportional hazards models, both in case of a correctly specified model and in a scenario where there is discrepancy between the dose metric used for ΔNTCP and the dose metric associated with the “true” clinical outcome, that is, when the model is misspecified. We investigated how outcomes from such a randomized trial may feed into a model-based estimate of the patient-level benefit from PrT, by creating patient-specific predicted benefit probability distributions.
Simulated trials showed benefit in accordance with that expected when the NTCP model was equal to the model for simulating outcome. When the model was misspecified, the benefit changed and we observed a reversal when the driver of outcome was high-dose dependent while the NTCP model was mean-dose dependent. By converting trial results into probability distributions, we demonstrated large heterogeneity in predicted benefit, and provided a randomized measure of the precision of individual benefit estimates.
The design allows for quantifying the benefit of PrT referral, based on the combination of NTCP models, clinical risk factors, and traditional randomization. A misspecified model can be detected through a lower-than-expected hazard ratio per predicted ΔNTCP.
Radiation oncology has—as have many other technology-driven medical disciplines—struggled to generate level I evidence demonstrating the clinical benefits of technologic advances. This is also the case for proton therapy , where the superiority over standard photon therapy remains controversial [2–4], even for widely accepted treatments such as pediatric malignancies [5–7]. It has been suggested that purely relying on evidence created from randomized controlled trials (RCTs) may not be the optimal way of identifying patients (or patient groups) likely to benefit from proton therapy [8–11]. An alternative methodology for development of data-based treatment strategies involves prediction of individual patient benefit using normal tissue complication probability (NTCP) models . The health care payers in some European countries have accepted NTCP model predictions as a basis for reimbursement of the cost of proton therapy [8, 12]. Still, selection of patients, based on outcome prediction models, suffers from risk of bias or inaccurate results if the underlying model lacks in accuracy or generalizability.
From an evidence-based medicine perspective, proton and photon therapies should be compared in RCTs, as would be required for any novel drug, except in indications where the benefit is deemed to be so large that randomization as a method for avoiding bias may not be necessary [13, 14]. At the time of writing, a number of clinical trials of proton therapy are indeed in progress in a wide range of indications . The challenge of this approach is that simple head-to-head comparison ignores the expected heterogeneity of treatment effect: Some patients may expect large benefits from proton therapy in terms of toxicity reduction, while others may have relatively little to gain (or might even be disadvantaged). Or to put it differently: an RCT of proton versus photon therapy may provide a methodologically correct answer to the wrong question—whether protons are uniformly better than photons in a defined population of patients—while missing out on the more relevant question—who, if any, among the patients will have a clinically meaningful benefit from proton therapy compared with photon therapy.
Here, we propose and simulate an approach where the heterogeneous treatment benefit of proton therapy predicted by a model is incorporated into the trial design. This idea also extends into the recurrent discussions regarding personalized medicine , where technology-driven improvements have the potential to play a major role. We show how such a randomized trial may provide a randomized estimate of the patient-level benefit from proton therapy, using comparative dose planning and taking clinical risk factors for toxicity into account.
As a proof-of-concept, we demonstrate the design and interpretation of a hypothetical trial of definitive radiation therapy for locally advanced non–small cell lung cancer (NSCLC), with radiation-induced pneumonitis (RP) as the primary endpoint. The model used at point of analysis of the trials is an individualized QUANTEC (iQUANTEC) model taking clinical risk factors into account [17, 18].
Materials and Methods
Twenty consecutive patients with locally advanced NSCLC treated with definitive radiation therapy and concomitant chemotherapy at our center, from January 1, 2015, and onwards, were selected as dose-planning cases, to provide realistic estimates of interpatient heterogeneity in dose metric differences. This retrospective study was approved by the Danish Health Authority, approval No. 3-3013-817/1, in accordance with Danish law. All patients were treated in free breathing with volumetric modulated arc therapy. Robust proton dose plans were generated, and clinical photon plans and generated proton dose plans fulfilled clinical constraints. See Supplementary Materials for additional information on patient data, dose planning, and clinical constraints.
Dose-Volume Histogram Data
For each dose plan (photon and proton) the mean lung dose (MLD) and the volumes receiving at least 10 Gy, 20 Gy, 30 Gy, 40 Gy, and 60 Gy (V10Gy, V20Gy, V30Gy, V40Gy, and V60Gy) were retrieved for the lungs (total lung minus gross tumor volume). Mean heart dose and V35Gy to esophagus were also retrieved. Data were retrieved with Eclipse Scripting API (version 13.6) using in-house developed software (Visual Studio Community Edition 2015).
We considered the setting of a randomized trial of proton versus photon therapy for locally advanced NSCLC, with rate of symptomatic RP as primary endpoint and with a 1:1 allocation ratio to the 2 trial arms and 300 patients per arm. Figure 1 shows the overall simulation setup.
We assumed the risk of RP could be predicted by the dose plan MLD and calculated an individual patient risk factor, NTCPsum, for RP, using an individualized QUANTEC model (iQUANTEC) . This model uses the dose-response relationship for symptomatic RP found in the QUANTEC review , but integrates additional clinical risk factors identified in a literature-based meta-analysis and their associated odds ratios (ORs)  (see Figure 2). We assumed a logistic dose-response relationship and estimated the OR for RP compared to a reference (see below) for a patient with a specific MLD and a set of clinical risk factors X1, X2..., as
A summarized individual patient risk was estimated by using the logarithm of the OR, NTCPsum = ln(OR), as this is additive in changes in risk factors.
We simulated outcomes by using a Weibull hazard function, h(t), parameterized by the shape and scale parameters ρ and λ:
Individual changes in the estimated risk of event (RP) were modeled by adjusting the scale parameter λrisk = λexp(NTCPsum) = λOR. Further, we required a patient with baseline clinical risk factors and average MLD with photon therapy (17.0 Gy) to have a probability of freedom from RP at 2 years of 85%, which in turn defines ρ = 0.6 and λ = 0.11. Finally, we assumed follow-up evenly distributed over a 2-year period.
For simulation of trial outcomes, we used bootstrap resampling from the 20 patients with dual planning. For each sample, we used the photon dose plan MLD for risk estimation based on dose and randomly sampled clinical risk factors as identified by Vogelius and Bentzen , with prevalences as summarized in the study of Appelt et al : preexisting pulmonary comorbidity, OR 2.27, prevalence 0.26; mid or inferior tumor location, OR 1.87, prevalence 0.44; current smoker, OR 0.62, prevalence 0.28; age >63 years, OR 1.66, prevalence 0.50.
To include the effect of the experimental treatment (proton therapy), we calculated the individual change in risk (ie, ΔNTCP) resulting only from the reduction in NTCP by using the difference in MLD between photon and proton therapy. We randomly selected patients for the experimental arm and adjusted their individual risk factor according to: NTCPsum.proton = NTCPsum.photon + ΔNTCP. Event times were simulated by sampling from the corresponding Weibull distribution (ie, with λrisk, based on either NTCPsum.proton or NTCPsum.photon, as appropriate), and censoring times were randomly sampled from a uniform distribution over the interval of 0 to 2 years.
Once the simulated trial dataset was created, we used the Cox proportional hazards model to analyze the data. The ΔNTCP and NTCPsum.photons were always included as covariates in the Cox model (Figure 1).
Model misspecification refers to the scenario where the user is making NTCP and ΔNTCP estimates (eg, in the model-based selection process) based on a certain parametrization (eg, MLD), but where the driver of toxicity is another metric (eg, lung V40Gy).
To study the effect of model misspecification, we changed the dose-volume model used in the simulation of outcome data while keeping the analysis of trial data the same. The assumed alternative driver dose-response models were based on dose cutoffs (VXX-driven models) and were taken from literature data . These models did not account for clinical risk factors and were not modified further. Fitting a logistic model to the data set provided estimates of γ50 and D50 (See Supplementary Materials). As Willner et al  only included predictive parameters up to V40Gy, the data were extrapolated to V60Gy in order to test a dose metric that correlated differently with MLD in the photon and proton plans (Figure 3). Supplementary Materials tabulate γ50 and D50 estimates for the different models; see Figure 2 for graphical versions of all models.
For both the main analysis (Trial Simulation) and the model misspecification we simulated 1000 trials each with 2 × 300 patients and report the resulting estimate of the logarithm of the hazard ratio [log(HR)] per ΔNTCP from the Cox model. Log(HR) per ΔNTCP is calculated by assuming the QUANTEC model for MLD (receiver model) regardless of which model was used to drive the simulated trial outcomes. To assess the effect of uncertainty associated with delivery of proton therapy on the trial outcome, we also included a worst-case proton plan MLD, selected from the uncertainty calculations performed during the planning process (using uncertainties of 0.5-cm isocenter shift and 3.5% calibration curve error). The worst-case proton plan was defined as the uncertainty plan with the highest MLD and was used as alternative driver dose-response models.
Application of Trial Results in Selecting Patients for Referral
Finally, we wanted to illustrate how the outcome of a trial could be used to support clinical decisions for future patients. The most frequent log(HR) per ΔNTCP, from 1000 simulations with a correctly specified model, was used for illustration. The trial yielded a log(HR) per ΔNTCP (based on a randomized comparison) for protons versus photons and we converted this result into an estimate of the expected benefit in 4 illustrative cases: the patient with the highest ΔMLD and a patient with approximately median ΔMLD, in both cases assuming no clinical risk factors or all clinical risk factors present (note that “current smoker” has a protective effect, ie, OR < 1). The effect size estimate of the Cox proportional hazards model could be used to provide a probability distribution of absolute benefit from proton therapy at the individual patient level, using the basic properties of the Cox model: HR, where HR = exp(β*ΔNTCP) and β [=log(HR)] were assumed normally distributed according to the confidence interval of the simulated trial. This approach naturally combines the model prediction (NTCP) and the result of randomization, as log(HR) is an estimate with ΔNTCP as covariable in strict randomization.
All simulations and analysis of outcome data were performed in RStudio (RStudio: Integrated Development for R, RStudio Team , RStudio Inc, Boston, Massachusetts).
Dose Plan Comparison
The mean ΔMLD was 4.3 Gy when comparing the robust planned proton dose plans with the clinical photon dose plans (Figure 3). The proton dose plans had a lower MLD and a lower V20Gy for all patients. However, V40Gy and V60Gy were higher with protons in an increasing proportion of cases. Evaluating the worst-case proton plans, the mean ΔMLD was 3.2 Gy.
The simulated trials favored proton therapy when an input MLD-based model was used (correctly specified model), and progressively decreased the predicted benefit of protons when the input model was changed from V10Gy towards V60Gy (Figure 4). Assuming the worst-case MLD (MLD-robustness in Figure 4) from the proton uncertainty plans had a limited effect on the trial outcome.
Use in Future Patients
We finally assumed the completion of a trial with a resulting estimate of the value of log(HR) for ΔNTCP. If the model was well specified, the most frequent trial result was used: log(HR) per ΔNTCP = −7.77 (95% confidence limit: −16.6 to 1.05). This result was converted into probability distributions of predicted benefit, shown for 4 illustrative cases in Figure 5, detailing how a patient with all risk factors in combination with a large ΔMLD reduction, comparing photons to protons, will have a reduced probability of developing RP, namely, from 49% to 9%. Additionally, a patient with the same number of risk factors, but median ΔMLD between the 2 modalities, will only have a 31% to 15% probability reduction.
Our trial simulation incorporates heterogeneity of treatment effect and quantified risk at the individual patient level into a randomized comparison of 2 treatment strategies. The aim is to obtain an unbiased estimation of how a difference in model-predicted ΔNTCP converts into a clinical outcome on the patient level.
Figure 5 shows how outcomes from randomized trial results can be used to quantify the individual benefits of proton versus photon therapy and thereby support the decision to refer to proton therapy on a rational basis. This allows moving from simple hypothesis testing in a comparative effectiveness trial towards individualized estimates of benefit. Arguably, the cases presented in Figure 5 reflect subjective decisions already made by treating physicians, but the estimate of the HR per ΔNTCP quantifies the relative merits of the 2 radiation modalities and provides decision support for patients, caregivers, and policy makers. A supportive decision-making tool based on the data from this article has been developed in RStudio and is available online at: https://protontrialsimulation.shinyapps.io/trial_webb/.
The trial design presented here still requires strict randomization. A possible elaboration on the present approach could be to use adaptive Bayesian designs. In such a design patients with a large predicted benefit are randomly assigned with higher weighting to proton therapy and when a predefined limit on the magnitude of benefit is exceeded, the patient can be allocated to the according arm without randomization at all.
A major strength of the presented design is that future selection of patients does not rely on the model being correctly specified. The estimate of HR per ΔNTCP adjusts the predicted benefit to the actual clinical data and thus reduces the risk of wrongly allocating future patients to protons if, for example, the V60Gy model would be a better predictor of NTCP. In other words, where the model-based approach relies on nonrandomized follow-up studies after implementation to identify model misspecification, the current proposal detects model misspecification from randomized data through the HR per ΔNTCP from a trial. Another strength of the design is that an uncertainty in the MLD did not affect the trial outcome (Figure 4).
It should be acknowledged that it is not ideal only to consider NTCP of a single endpoint in benefit estimations. Here we focused on a single endpoint for simplicity, but further studies should look at several NTCP endpoints, preferably at a more comprehensive NTCP profile as pointed out by others , but also include verification that the tumor control probability is not affected by choice of modality.
To illustrate this point in the lung cancer radiation therapy setting, we considered acute esophagitis and heart complications, both of which are of high relevance. We provide NTCP estimates for acute esophagus  and heart toxicity  in the Supplementary Materials. A reduction in NTCP for the heart when using protons compared to photons was observed, but we did not see a reduction in the risk of acute esophagitis (Supplementary Materials).
Evaluating only a reduction in MLD, the expected difference in RP in an RCT should favor intensity-modulated proton therapy. A recent study from MD Anderson Cancer Center (Houston, Texas) on an RCT comparing modulated photon therapy and passive scattering proton therapy in lung cancer  indicated that there was no difference in either MLD or RP rates between the 2 arms. However, the authors noted a difference in RP between early and late inclusion in the proton group, with no difference in MLD. The fact that the late proton group had both a significant smaller tumor volume and lower delivered dose might explain the lower RP rate. This suggests that higher prescribed doses further elucidate the risk of a higher-than expected dose metric from the lung and might itself be associated with the probability of RP induction when using proton therapy.
Our proposed trial design quantifies the benefit of referral to proton therapy, based on the combination of NTCP models and traditional randomization. The proposed method behaves well under the investigated types of NTCP model misspecification and provides quantitative benefit estimates on the individual patient level, which may have greater clinical utility than the information derived from a traditionally designed RCT.
ADDITIONAL INFORMATION AND DECLARATIONS
Conflicts of Interest: Ivan R. Vogelius, PhD, DMSc, has received grants and educational fees from Varian Medical Systems. Johannes A. Langendijk, MD, PhD, received honoraria from IBA for a consulting and advisory role and from IBA Speakers' Bureau, paid to the UMCG Research B.V. No other author declares conflicts of interest.
Acknowledgments: This work was supported by the Danish Cancer Society (grant R125-A7989) and by a donation from the Kirsten and Freddy Johansen Foundation (Rigshospitalet's International KFJ Award). Ane L. Appelt, PhD, is also supported by Yorkshire Cancer Research Academic Fellowship funding (grant L389AA).
Jonas Scherman, PhD, and Ane L Appelt, PhD, contributed equally to this work.