Incorporating NTCP into Randomized Trials of Proton Versus Photon Therapy

Purpose: We propose and simulate a model-based methodology to incorporate heterogeneous treatment benefit of proton therapy (PrT) versus photon therapy into randomized trial designs. We use radiation-induced pneumonitis (RP) as an exemplar. The aim is to obtain an unbiased estimate of how predicted difference in normal tissue complications probability (ΔNTCP) converts into clinical outcome on the patient level. Materials and Methods: ΔNTCP data from in silico treatment plans for photon therapy and PrT for patients with locally advanced lung cancer as well as randomly sampled clinical risk factors were included in simulations of trial outcomes. The model used at point of analysis of the trials was an iQUANTEC model. Trial outcomes were examined with Cox proportional hazards models, both in case of a correctly specified model and in a scenario where there is discrepancy between the dose metric used for ΔNTCP and the dose metric associated with the “true” clinical outcome, that is, when the model is misspecified. We investigated how outcomes from such a randomized trial may feed into a model-based estimate of the patient-level benefit from PrT, by creating patient-specific predicted benefit probability distributions. Results: Simulated trials showed benefit in accordance with that expected when the NTCP model was equal to the model for simulating outcome. When the model was misspecified, the benefit changed and we observed a reversal when the driver of outcome was high-dose dependent while the NTCP model was mean-dose dependent. By converting trial results into probability distributions, we demonstrated large heterogeneity in predicted benefit, and provided a randomized measure of the precision of individual benefit estimates. Conclusions: The design allows for quantifying the benefit of PrT referral, based on the combination of NTCP models, clinical risk factors, and traditional randomization. A misspecified model can be detected through a lower-than-expected hazard ratio per predicted ΔNTCP.


Introduction
Radiation oncology has-as have many other technology-driven medical disciplines-struggled to generate level I evidence demonstrating the clinical benefits of technologic advances. This is also the case for proton therapy [1], where the superiority over standard photon therapy remains controversial [2][3][4], even for widely accepted treatments such as pediatric malignancies [5][6][7]. It has been suggested that purely relying on evidence created from randomized controlled trials (RCTs) may not be the optimal way of identifying patients (or patient groups) likely to benefit from proton therapy [8][9][10][11]. An alternative methodology for development of data-based treatment strategies involves prediction of individual patient benefit using normal tissue complication probability (NTCP) models [8]. The health care payers in some European countries have accepted NTCP model predictions as a basis for reimbursement of the cost of proton therapy [8,12]. Still, selection of patients, based on outcome prediction models, suffers from risk of bias or inaccurate results if the underlying model lacks in accuracy or generalizability.
From an evidence-based medicine perspective, proton and photon therapies should be compared in RCTs, as would be required for any novel drug, except in indications where the benefit is deemed to be so large that randomization as a method for avoiding bias may not be necessary [13,14]. At the time of writing, a number of clinical trials of proton therapy are indeed in progress in a wide range of indications [15]. The challenge of this approach is that simple head-to-head comparison ignores the expected heterogeneity of treatment effect: Some patients may expect large benefits from proton therapy in terms of toxicity reduction, while others may have relatively little to gain (or might even be disadvantaged). Or to put it differently: an RCT of proton versus photon therapy may provide a methodologically correct answer to the wrong question-whether protons are uniformly better than photons in a defined population of patients-while missing out on the more relevant question-who, if any, among the patients will have a clinically meaningful benefit from proton therapy compared with photon therapy.
Here, we propose and simulate an approach where the heterogeneous treatment benefit of proton therapy predicted by a model is incorporated into the trial design. This idea also extends into the recurrent discussions regarding personalized medicine [16], where technology-driven improvements have the potential to play a major role. We show how such a randomized trial may provide a randomized estimate of the patient-level benefit from proton therapy, using comparative dose planning and taking clinical risk factors for toxicity into account.
As a proof-of-concept, we demonstrate the design and interpretation of a hypothetical trial of definitive radiation therapy for locally advanced non-small cell lung cancer (NSCLC), with radiation-induced pneumonitis (RP) as the primary endpoint. The model used at point of analysis of the trials is an individualized QUANTEC (iQUANTEC) model taking clinical risk factors into account [17,18].

Dose Planning
Twenty consecutive patients with locally advanced NSCLC treated with definitive radiation therapy and concomitant chemotherapy at our center, from January 1, 2015, and onwards, were selected as dose-planning cases, to provide realistic estimates of interpatient heterogeneity in dose metric differences. This retrospective study was approved by the Danish Health Authority, approval No. 3-3013-817/1, in accordance with Danish law. All patients were treated in free breathing with volumetric modulated arc therapy. Robust proton dose plans were generated, and clinical photon plans and generated proton dose plans fulfilled clinical constraints. See Supplementary Materials for additional information on patient data, dose planning, and clinical constraints.

Dose-Volume Histogram Data
For each dose plan (photon and proton) the mean lung dose (MLD) and the volumes receiving at least 10 Gy, 20 Gy, 30 Gy, 40 Gy, and 60 Gy (V 10Gy , V 20Gy , V 30Gy , V 40Gy , and V 60Gy ) were retrieved for the lungs (total lung minus gross tumor volume). Mean heart dose and V 35Gy to esophagus were also retrieved. Data were retrieved with Eclipse Scripting API (version 13.6) using in-house developed software (Visual Studio Community Edition 2015).
We considered the setting of a randomized trial of proton versus photon therapy for locally advanced NSCLC, with rate of symptomatic RP as primary endpoint and with a 1:1 allocation ratio to the 2 trial arms and 300 patients per arm. Figure 1 shows the overall simulation setup.
We assumed the risk of RP could be predicted by the dose plan MLD and calculated an individual patient risk factor, NTCP sum , for RP, using an individualized QUANTEC model (iQUANTEC) [18]. This model uses the dose-response relationship for symptomatic RP found in the QUANTEC review [19], but integrates additional clinical risk factors identified in a literature-based meta-analysis and their associated odds ratios (ORs) [20] (see Figure 2). We assumed a logistic doseresponse relationship and estimated the OR for RP compared to a reference (see below) for a patient with a specific MLD and a set of clinical risk factors X 1 , X 2 ..., as A summarized individual patient risk was estimated by using the logarithm of the OR, NTCP sum ¼ ln(OR), as this is additive in changes in risk factors.
We simulated outcomes by using a Weibull hazard function, h(t), parameterized by the shape and scale parameters q and k: Individual changes in the estimated risk of event (RP) were modeled by adjusting the scale parameter k risk ¼ kexp(NTCP sum ) ¼ kOR. Further, we required a patient with baseline clinical risk factors and average MLD with photon therapy (17.0 Gy) to have a probability of freedom from RP at 2 years of 85%, which in turn defines q ¼ 0.6 and k ¼ 0.11. Finally, we assumed follow-up evenly distributed over a 2-year period.
For simulation of trial outcomes, we used bootstrap resampling from the 20 patients with dual planning. For each sample, we used the photon dose plan MLD for risk estimation based on dose and randomly sampled clinical risk factors as identified by Vogelius and Bentzen [20], with prevalences as summarized in the study of Appelt et al [18]: preexisting pulmonary comorbidity, OR 2.27, prevalence 0.26; mid or inferior tumor location, OR 1.87, prevalence 0.44; current smoker, OR 0.62, prevalence 0.28; age .63 years, OR 1.66, prevalence 0.50.
To include the effect of the experimental treatment (proton therapy), we calculated the individual change in risk (ie, DNTCP) resulting only from the reduction in NTCP by using the difference in MLD between photon and proton therapy. We randomly selected patients for the experimental arm and adjusted their individual risk factor according to: NTCP sum.proton ¼ NTCP sum.photon þ DNTCP. Event times were simulated by sampling from the corresponding Weibull distribution (ie, with k risk , Figure 1. Schematics of the trial simulation setup. Twenty patients with dual planning provided a distribution of dose plans. These dose plans were input into a simulation model, which used either the iQUANTEC/QUANTEC model based on mean lung dose or a V XX model to simulate a baseline risk of toxicity corresponding to a given dose distribution. These data were forwarded to a simulation stage, where the dose plans were sampled with replacement for up to 2 3 300 simulated trial patients. Each patient also had a randomly generated set of clinical risk factors (smoking, pulmonary comorbidity, and age), and the combination of the radiation dose and clinical risk factors could then be used to generate a patient-specific Weibull distribution of event probability versus follow-up time. Simulated binary survival data were generated from the corresponding probabilities. The result was a 2 3 300 patient data structure simulating clinical outcomes in a trial. This ''in silico trial dataset'' was analyzed according to our suggested strategy: using the complication probability of an individualized QUANTEC NTCP model as a covariate in a Cox model (see figure for the exact expression used in the model). Abbreviations: HR, hazard ratio; MLD, mean lung dose; NTCP, normal tissue complication probability. based on either NTCP sum.proton or NTCP sum.photon , as appropriate), and censoring times were randomly sampled from a uniform distribution over the interval of 0 to 2 years.
Once the simulated trial dataset was created, we used the Cox proportional hazards model to analyze the data. The DNTCP and NTCP sum.photons were always included as covariates in the Cox model (Figure 1).

Model Misspecification
Model misspecification refers to the scenario where the user is making NTCP and DNTCP estimates (eg, in the model-based selection process) based on a certain parametrization (eg, MLD), but where the driver of toxicity is another metric (eg, lung V 40Gy ).
To study the effect of model misspecification, we changed the dose-volume model used in the simulation of outcome data while keeping the analysis of trial data the same. The assumed alternative driver dose-response models were based on dose cutoffs (V XX -driven models) and were taken from literature data [21]. These models did not account for clinical risk factors and were not modified further. Fitting a logistic model to the data set provided estimates of c 50 and D 50 (See Supplementary Materials). As Willner et al [21] only included predictive parameters up to V 40Gy , the data were extrapolated to V 60Gy in order to test a dose metric that correlated differently with MLD in the photon and proton plans ( Figure 3). Supplementary Materials tabulate c 50 and D 50 estimates for the different models; see Figure 2 for graphical versions of all models.
For both the main analysis (Trial Simulation) and the model misspecification we simulated 1000 trials each with 2 3 300 patients and report the resulting estimate of the logarithm of the hazard ratio [log(HR)] per DNTCP from the Cox model. Log(HR) per DNTCP is calculated by assuming the QUANTEC model for MLD (receiver model) regardless of which model was used to drive the simulated trial outcomes. To assess the effect of uncertainty associated with delivery of proton therapy on the trial outcome, we also included a worst-case proton plan MLD, selected from the uncertainty calculations performed during the planning process (using uncertainties of 0.5-cm isocenter shift and 3.5% calibration curve error). The worst-case proton plan was defined as the uncertainty plan with the highest MLD and was used as alternative driver dose-response models.

Application of Trial Results in Selecting Patients for Referral
Finally, we wanted to illustrate how the outcome of a trial could be used to support clinical decisions for future patients. The most frequent log(HR) per DNTCP, from 1000 simulations with a correctly specified model, was used for illustration. The trial yielded a log(HR) per DNTCP (based on a randomized comparison) for protons versus photons and we converted this result into an estimate of the expected benefit in 4 illustrative cases: the patient with the highest DMLD and a patient with approximately median DMLD, in both cases assuming no clinical risk factors or all clinical risk factors present (note that ''current smoker'' has a protective effect, ie, OR , 1). The effect size estimate of the Cox proportional hazards model could be  [19] and Appelt et al [18]. Alternative dose-risk models (right) for the dose-volume histogram parameters used in simulation of outcomes (first dark grey box in Figure 1) from Willner et al [21]. Abbreviation: MLD, mean lung dose. used to provide a probability distribution of absolute benefit from proton therapy at the individual patient level, using the basic properties of the Cox model: NTCP sum:protons ¼ NTCP sum:photons HR , where HR ¼ exp(b*DNTCP) and b [¼log(HR)] were assumed normally distributed according to the confidence interval of the simulated trial. This approach naturally combines the model prediction (NTCP) and the result of randomization, as log(HR) is an estimate with DNTCP as covariable in strict randomization.
All simulations and analysis of outcome data were performed in RStudio (RStudio: Integrated Development for R, RStudio Team [2015], RStudio Inc, Boston, Massachusetts).

Dose Plan Comparison
The mean DMLD was 4.3 Gy when comparing the robust planned proton dose plans with the clinical photon dose plans ( Figure 3). The proton dose plans had a lower MLD and a lower V 20Gy for all patients. However, V 40Gy and V 60Gy were higher with protons in an increasing proportion of cases. Evaluating the worst-case proton plans, the mean DMLD was 3.2 Gy.

Trial Simulation
The simulated trials favored proton therapy when an input MLD-based model was used (correctly specified model), and progressively decreased the predicted benefit of protons when the input model was changed from V 10Gy towards V 60Gy (Figure 4). Assuming the worst-case MLD (MLD-robustness in Figure 4) from the proton uncertainty plans had a limited effect on the trial outcome.

Use in Future Patients
We finally assumed the completion of a trial with a resulting estimate of the value of log(HR) for DNTCP. If the model was well specified, the most frequent trial result was used: log(HR) per DNTCP ¼ À7.77 (95% confidence limit: À16.6 to 1.05). This result was converted into probability distributions of predicted benefit, shown for 4 illustrative cases in Figure 5, detailing how a patient with all risk factors in combination with a large DMLD reduction, comparing photons to protons, will have a reduced probability of developing RP, namely, from 49% to 9%. Additionally, a patient with the same number of risk factors, but median DMLD between the 2 modalities, will only have a 31% to 15% probability reduction.

Discussion
Our trial simulation incorporates heterogeneity of treatment effect and quantified risk at the individual patient level into a randomized comparison of 2 treatment strategies. The aim is to obtain an unbiased estimation of how a difference in modelpredicted DNTCP converts into a clinical outcome on the patient level. Figure 5 shows how outcomes from randomized trial results can be used to quantify the individual benefits of proton versus photon therapy and thereby support the decision to refer to proton therapy on a rational basis. This allows moving from simple On the x-axis is log(HR) per DNTCP of the Cox proportional hazards model. As expected, when the MLD model is used to generate data, the outcomes favor the modelpredicted DNTCP [log(HR) , 0]. When the model is misspecified, the outcome favors protons even more for V 10Gy as the underlying model, and progressively moves towards favoring photons for the V 60Gy model as input (ie, HR per DNTCP favors the plan with highest NTCP in the model). Comparing with Figure 3, this illustrates that the trials will reject a benefit of treating with protons if V 60Gy is the underlying driver of RP, as desired. *MLDþrobustness assumes the worst MLD taken from proton uncertainty plans, using uncertainties of 0.5-cm isocenter shift and 3.5% calibration curve error. Abbreviations: DVH, dosevolume histogram; HR, hazard ratio; MLD, mean lung dose; NTCP, normal tissue complication probability; RP, radiation-induced pneumonitis. hypothesis testing in a comparative effectiveness trial towards individualized estimates of benefit. Arguably, the cases presented in Figure 5 reflect subjective decisions already made by treating physicians, but the estimate of the HR per DNTCP quantifies the relative merits of the 2 radiation modalities and provides decision support for patients, caregivers, and policy makers. A supportive decision-making tool based on the data from this article has been developed in RStudio and is available online at: https://protontrialsimulation.shinyapps.io/trial_webb/.
The trial design presented here still requires strict randomization. A possible elaboration on the present approach could be to use adaptive Bayesian designs. In such a design patients with a large predicted benefit are randomly assigned with higher weighting to proton therapy and when a predefined limit on the magnitude of benefit is exceeded, the patient can be allocated to the according arm without randomization at all.
A major strength of the presented design is that future selection of patients does not rely on the model being correctly specified. The estimate of HR per DNTCP adjusts the predicted benefit to the actual clinical data and thus reduces the risk of wrongly allocating future patients to protons if, for example, the V 60Gy model would be a better predictor of NTCP. In other words, where the model-based approach relies on nonrandomized follow-up studies after implementation to identify model misspecification, the current proposal detects model misspecification from randomized data through the HR per DNTCP from a trial. Another strength of the design is that an uncertainty in the MLD did not affect the trial outcome ( Figure 4).
It should be acknowledged that it is not ideal only to consider NTCP of a single endpoint in benefit estimations. Here we focused on a single endpoint for simplicity, but further studies should look at several NTCP endpoints, preferably at a more comprehensive NTCP profile as pointed out by others [11], but also include verification that the tumor control probability is not affected by choice of modality.  To illustrate this point in the lung cancer radiation therapy setting, we considered acute esophagitis and heart complications, both of which are of high relevance. We provide NTCP estimates for acute esophagus [22] and heart toxicity [23] in the Supplementary Materials. A reduction in NTCP for the heart when using protons compared to photons was observed, but we did not see a reduction in the risk of acute esophagitis (Supplementary Materials).
Evaluating only a reduction in MLD, the expected difference in RP in an RCT should favor intensity-modulated proton therapy. A recent study from MD Anderson Cancer Center (Houston, Texas) on an RCT comparing modulated photon therapy and passive scattering proton therapy in lung cancer [4] indicated that there was no difference in either MLD or RP rates between the 2 arms. However, the authors noted a difference in RP between early and late inclusion in the proton group, with no difference in MLD. The fact that the late proton group had both a significant smaller tumor volume and lower delivered dose might explain the lower RP rate. This suggests that higher prescribed doses further elucidate the risk of a higher-than expected dose metric from the lung and might itself be associated with the probability of RP induction when using proton therapy.

Conclusion
Our proposed trial design quantifies the benefit of referral to proton therapy, based on the combination of NTCP models and traditional randomization. The proposed method behaves well under the investigated types of NTCP model misspecification and provides quantitative benefit estimates on the individual patient level, which may have greater clinical utility than the information derived from a traditionally designed RCT.