Cost-Effectiveness Models of Proton Therapy for Head and Neck: Evaluating Quality and Methods to Date

Purpose Proton beam therapy (PBT) is associated with less toxicity relative to conventional photon radiotherapy for head-and-neck cancer (HNC). Upfront delivery costs are greater, but PBT can provide superior long-term value by minimizing treatment-related complications. Cost-effectiveness models (CEMs) estimate the relative value of novel technologies (such as PBT) as compared with the established standard of care. However, the uncertainties of CEMs can limit interpretation and applicability. This review serves to (1) assess the methodology and quality of pertinent CEMs in the existing literature, (2) evaluate their suitability for guiding clinical and economic strategies, and (3) discuss areas for improvement among future analyses.
Materials and Methods PubMed was queried for CEMs specific to PBT for HNC. General characteristics, modeling information, and methodological approaches were extracted for each identified study. Reporting quality was assessed via the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) 24-item checklist, whereas methodologic quality was evaluated via the Philips checklist. The Cooper evidence hierarchy scale was employed to analyze parameter inputs referenced within each model.
Results At the time of study, only 4 formal CEMs specific to PBT for HNC had been published (2005, 2013, 2018, 2020). The parameter inputs among these Markov cohort models generally referenced older literature, excluded many clinically relevant complications, and applied numerous hypothetical assumptions for toxicity states, incorporating inputs from theoretical complication-probability models because of limited availability of direct clinical evidence. Case numbers among study cohorts were low, and the structural design of some models inadequately reflected the natural history of HNC. Furthermore, cost inputs were incomplete and referenced historic figures.
Conclusion Contemporary CEMs are needed to incorporate modern estimates for toxicity risks and costs associated with PBT delivery, to provide a more accurate estimate of value, and to improve their clinical applicability with respect to PBT for HNC.


Methodological Quality
The Philips checklist [22] was selected as the tool to measure methodological quality because it was specifically designed for the assessment of modeling studies and is recommended by both the National Institute for Health and Care Excellence (NICE; London, England) [23] and the Cochrane Collaboration (London, England) [24]. Note that this checklist was not intended to comprehensively standardize the decision model-development process, functioning more as a series of consolidated good practice statements to consider in appraising decision models.

Assessment of CEM Parameters
A hierarchy scale developed by Cooper et al [25] was used to assess the quality of parameter inputs from data sources referenced in the CEM studies. The scale ranks the quality of evidence from 1 (best quality) to 6 (lowest quality) for studies of clinical effect and safety, medical resource use, health care costs, and utility scores.

Summary of CEM Studies (Tables 1-5)
Lundkvist et al [26]
The Swedish Markov cohort model simulated several histologies, including 300 patients with HNC, comparing PBT with conventional radiotherapy (CRT) from a societal perspective. This simplified disease model had only 3 health states: healthy, chronic adverse event conditions, and death. Acute mucositis and xerostomia, as well as chronic xerostomia, were considered but only for their contribution to mortality risk. PBT was simulated to contribute an additional 1.02 QALYs at the cost of €3 887 (US $3 693), which translated to €3 800 (US $3 693) per QALY in 2002. The authors concluded that PBT was cost effective, assuming a willingness-to-pay level of €55 000 (US $52 250).
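The relationship between the incremental cost, the incremental QALYs, and the reported cost per QALY can be illustrated with a short calculation. The following is a minimal sketch, not the authors' model: the function names are our own, and the only inputs are the figures quoted above (an incremental cost of €3 887, a gain of 1.02 QALYs, and a willingness-to-pay level of €55 000).

```python
def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("this sketch only handles a positive QALY gain")
    return delta_cost / delta_qaly


def is_cost_effective(delta_cost: float, delta_qaly: float, wtp: float) -> bool:
    """True when the ICER does not exceed the willingness-to-pay threshold."""
    return icer(delta_cost, delta_qaly) <= wtp


# Figures quoted in the text for the Lundkvist et al model (EUR, 2002).
lundkvist_icer = icer(3887.0, 1.02)
lundkvist_ok = is_cost_effective(3887.0, 1.02, 55000.0)

print(f"ICER: EUR {lundkvist_icer:,.0f} per QALY")
print(f"Cost effective at EUR 55 000 per QALY: {lundkvist_ok}")
```

The ratio comes out at roughly €3 800 per QALY, consistent with the rounded figure reported by the authors and far below the stated threshold.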

Ramaekers et al [27]
The Dutch Markov cohort model simulated 25 patients with stage III-IV HNCs (oral cavity, laryngeal, and pharyngeal tumors) from the Dutch health care perspective, comparing intensity-modulated PBT (IMPT) with IMRT. This model better reflected the natural history of cancer by including the following health states: disease free without toxicity, disease free with toxicity, locoregional recurrence, distant progression, and death. However, toxicities focused solely on grade 2 or greater dysphagia and xerostomia, with parameter inputs based on normal tissue complication probability (NTCP) models and comparative plan studies, not empiric clinical data.
In addition to evaluating IMPT for the whole study population, a subset cohort (PBT "if efficient") was analyzed in which patients were stratified to the most cost-effective modality under a willingness-to-pay threshold of €80 000 (US $106 400) in 2010. Employing this case-by-case strategy, the authors concluded that PBT could be cost effective but yielded a mere 0.043 QALYs gained at the additional cost of €2 612 (US $3 474) (translating to an ICER of €60 278 [US $80 170] per QALY). However, on sensitivity analysis, which differentiated disease and survival outcomes among PBT versus IMRT, the latter dominated with more QALYs at lower cost when considering all patients.
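One way to picture a case-by-case selection strategy of this kind is through the net monetary benefit of PBT for each patient. The sketch below is an assumption on our part and is not taken from the Ramaekers et al model: the patient-level estimates are hypothetical, and the net-monetary-benefit rule is only one plausible way to assign patients to the more cost-effective modality. Only the €80 000 threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class PatientEstimate:
    delta_qaly: float  # QALYs gained with PBT versus IMRT (hypothetical)
    delta_cost: float  # extra cost of PBT versus IMRT in EUR (hypothetical)


def net_monetary_benefit(p: PatientEstimate, wtp: float) -> float:
    """Monetary value of the QALY gain minus the extra cost of PBT."""
    return wtp * p.delta_qaly - p.delta_cost


def choose_modality(p: PatientEstimate, wtp: float) -> str:
    """Assign PBT only when its net monetary benefit is positive."""
    return "PBT" if net_monetary_benefit(p, wtp) > 0 else "IMRT"


WTP = 80_000.0  # willingness-to-pay threshold quoted in the text (EUR)

# Three hypothetical patients with different expected benefits from PBT.
patients = [
    PatientEstimate(delta_qaly=0.10, delta_cost=5_000.0),
    PatientEstimate(delta_qaly=0.01, delta_cost=6_000.0),
    PatientEstimate(delta_qaly=0.20, delta_cost=12_000.0),
]
choices = [choose_modality(p, WTP) for p in patients]
print(choices)
```

Under these illustrative inputs, the patient with a very small expected QALY gain is assigned to IMRT, which is the intuition behind restricting PBT to patients in whom it is efficient.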
Sher et al [28]
The American Markov model compared PBT to IMRT for stage IVA oropharyngeal carcinoma, structuring the model on the natural cancer history of a single HNC patient, upon which various sensitivity analyses were applied. Separate model inputs were applied to simulate HPV+ versus HPV− subpopulations, with external calibration of disease-related outcomes against independently published data. Toxicity endpoints (acute dysgeusia, late grade ≥2 xerostomia, percutaneous endoscopic gastrostomy [PEG]-tube placement, and dental complications) once again incorporated hypothetical modeling inputs from retrospective clinical studies. The estimated benefit with PBT assumed a symmetric triangular distribution for reduction for each toxicity endpoint (ranging from 0% to 50%), with a "best-case scenario" defined as a maximal 50% improvement. With significant limitations, the authors concluded that PBT, in general, was not cost effective from either the payer or societal perspectives (at a threshold of US $100 000 in 2016) and would only be cost effective from the payer perspective under favorable conditions for young HPV+ patients.

Li et al [20]
The Chinese Markov cohort model compared IMPT versus IMRT for treating paranasal sinus and nasal cavity cancers from a Chinese health care perspective. The model adopted a simple structure of only 3 health states: no cancer, alive with cancer (including recurrent, metastatic, or residual disease), and death. Interestingly, no acute or long-term side effects were included because the authors assumed similar toxicity outcomes among IMPT and IMRT; however, the modalities were assumed to yield different disease control outcomes (in favor of IMPT). Evaluation started with the base-case assessment of a 47-year-old patient, along with stratified analyses for different age subgroups, altogether indicating cost effectiveness for patients 56 years or younger (including the base case), at a threshold of $30 828 specific to China.
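For readers less familiar with the mechanics of a Markov cohort model, a 3-state structure such as the one described for Li et al can be sketched in a few lines. This is an illustration only: the state names follow the text, but the transition probabilities, utilities, cycle length (1 year), and time horizon are all hypothetical values chosen by us and are not taken from any of the reviewed studies.

```python
STATES = ("no_cancer", "alive_with_cancer", "dead")

# Hypothetical annual transition probabilities; each row sums to 1.
TRANSITIONS = {
    "no_cancer": {"no_cancer": 0.85, "alive_with_cancer": 0.10, "dead": 0.05},
    "alive_with_cancer": {"no_cancer": 0.00, "alive_with_cancer": 0.70, "dead": 0.30},
    "dead": {"no_cancer": 0.00, "alive_with_cancer": 0.00, "dead": 1.00},
}

# Hypothetical utility weights (QALYs accrued per year spent in each state).
UTILITY = {"no_cancer": 0.9, "alive_with_cancer": 0.6, "dead": 0.0}


def run_cohort(start, cycles):
    """Return (undiscounted QALYs per patient, final state distribution)."""
    dist = dict(start)
    qalys = 0.0
    for _ in range(cycles):
        # Accrue utility for the cohort's current distribution over states.
        qalys += sum(dist[s] * UTILITY[s] for s in STATES)
        # Move the cohort forward by one cycle.
        dist = {t: sum(dist[s] * TRANSITIONS[s][t] for s in STATES) for t in STATES}
    return qalys, dist


start = {"no_cancer": 1.0, "alive_with_cancer": 0.0, "dead": 0.0}
qalys, final = run_cohort(start, cycles=30)
print(f"QALYs per patient over 30 cycles: {qalys:.2f}")
print({state: round(share, 3) for state, share in final.items()})
```

Comparing two modalities then amounts to running the same structure with modality-specific transition probabilities and costs. This is why the choice of health states, and the evidence behind each probability, largely determines the result.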
Reporting Quality (Supplemental Material Table S1)
The 4 CEMs generally adhered to the CHEERS checklist criteria [21], although each lacked certain components. For example, only the Sher et al [28] study indicated both (1) study comparators, and (2) evaluation type within their title. The CHEERS checklist also requires inclusion of objectives, perspective, setting, methods, results, and conclusions in the abstract, with at least one component overlooked by each study. Overall, methodology reporting would have been stronger if references were provided justifying the rationale for model design, as well as input selection (such as base case population, time horizon, and discount rate). Specifically, the Lundkvist et al [26] study provided no support for use of the Markov cohort methods, and the Ramaekers et al [27] study failed to represent uncertainties among model distributions. The Sher et al [28] study lacked both criteria and failed to explore the actual generalizability of their findings (or lack thereof). The Li et al [20] study did not report mean estimated outcomes or costs for each radiation therapy and lacked recognition and discussion of previous studies [26,27].

Methodological Quality (Supplemental Material Table S2)
In terms of structural design, each CEM simplified its model with numerous assumptions to the extent of inadequately reflecting the comprehensive natural history of HNC. Across all 4 CEMs, the duration of treatment effect was left ambiguous (a major limitation given the significant effect on calculated QALYs, ICERs, and conclusions), with varying cycle times applied to each model, for which only the Ramaekers et al [27] study provided justification (and extrapolated the short-term effect as an alternative assumption tested on sensitivity analysis).
The Sher et al [28] study constructed its model based on the history of a 65-year-old patient with HNC, on top of which various sensitivity models were applied. All CEMs except Lundkvist et al [26] provided justification for model structure, although the Ramaekers et al [27] model provided no reference, and the Li et al [20] model was based only on an older published study. All 4 CEMs directed appropriate attention toward significant parameters; however, exact methods of data identification, selection, and quality assessment were unreported. In the setting of multiple data sources, no systematic approaches were applied toward evidence synthesis, and all CEMs appeared to lack internal consistency checks. With respect to cost inputs, the Lundkvist et al [26] and Sher et al [28] studies presented evaluations from the societal perspective but focused mainly on the cost of building and/or financing a proton facility, omitting other essential elements for comprehensive analysis. Per the Second Panel on Cost-Effectiveness in Health and Medicine [29], the societal perspective should (1) cover all parties affected, and (2) include all significant costs incurred (direct and indirect). Direct cost estimates among these studies failed to consider out-of-pocket patient expenses, for example, and only the Lundkvist et al [26] study considered direct nonmedical costs (eg, transportation, accommodations, among others) associated with acquisition of provider services. The most important nonmedical cost, lost productivity, was poorly captured by existing CEMs as well.
Finally, none of the CEMs clearly specified the primary decision maker for analysis.

CEM Parameter Assessment (Supplemental Material Table S3)
Dating back to 2005, the Lundkvist et al [26] study generated the simplest model with the literature sources available at the time. Survival rates were estimated from the Swedish cancer registry, with the relative mortality risk of PBT based purely on assumptions. The Li et al [20] study, from 2020, assumed differential probabilities of eradicating disease in the setting of paranasal sinus and nasal cavity cancers [30]. In contrast, both the Ramaekers et al [27] and Sher et al [28] studies (in 2013 and 2018, respectively) generally considered PBT and IMRT to be analogous with respect to disease and/or survival outcomes (although the Ramaekers et al [27] study also explored the alternative scenario in the sensitivity analysis). In Ramaekers et al [27], disease progression probabilities were based on a meta-analysis of randomized controlled trials examining arms comprising radiation with concomitant chemotherapy. Cancer-related mortalities from locoregional recurrence and distant metastases were from a single randomized controlled trial (Radiation Therapy Oncology Group [RTOG], Philadelphia, Pennsylvania; RTOG 9610) and a prospective study, respectively. The Sher et al [28] study used more-recent data of RTOG 0129 and RTOG 0522 for disease progression and mortality rates, whereas the Li et al [20] study incorporated probabilities and rates from systematic review and meta-analysis data [30].
With respect to complications, the Lundkvist et al [26] study completely excluded toxicity considerations for HNC. Similarly, the Li et al [20] study assumed no differential outcomes in irradiation-induced acute and late toxicities between IMPT and IMRT for paranasal sinus and nasal cavity cancers. Both the Ramaekers et al [27] and Sher et al [28] studies attempted to apply clinically relevant data from NTCP modeling studies, but again, these theoretical models estimate adverse-event probabilities based on dosimetric plan comparisons and are beset by their own uncertainties in the absence of clinical validation. The Sher et al [28] study did also incorporate a couple of small retrospective series available at the time, but still entailed significant, unsupported assumptions for relative toxicity risks. Further issues include the poor quality of utility or disutility values across each of these studies. Although the utility value in the Lundkvist et al [26] study was referenced to several studies, the actual value associated with them was not readily traceable. The Ramaekers et al [27] study incorporated a very indirect measure of utilities from EuroQol 5-Dimension (EQ-5D; EuroQol Research Foundation, Rotterdam, the Netherlands) data among HNC patients, whereas the Sher et al [28] study provided direct measurements derived from healthy subjects via the standard gamble method. Utility values in Li et al [20] stemmed from cross-sectional data.
Finally, regarding costs, medical resource use across models was mainly based on expert opinion or quantities reported among procedural guidelines, in lieu of direct observation, whereas preexisting literature sources were cited for radiotherapy treatment and cancer-related costs. Two studies employed activity-based costing (Ramaekers et al [27] and Sher et al [28]), citing government-issued fee schedules as data sources for unit costs. Both Lundkvist et al [26] and Sher et al [28] also evaluated the societal perspective, estimating the investment cost for a proton therapy center: calculations in the former were based on very outdated literature [26], whereas the latter [28] failed to specify relevant sources (raising questions of validity). Of all the studies, only Lundkvist et al [26] included patient-level transportation and hotel accommodation costs (although even those figures were based on assumptions). Cost estimates from a proton center and cancer center in China were used in Li et al [20].

Discussion
In this modern era of value-based health care, formal economic evaluations can be useful tools to compare novel interventions and to provide preliminary rationale for management decisions [16]. However, as demonstrated in this review, CEMs are not without their limitations. These hypothetical simulations are largely bound by the quality of available evidence for reference as inputs. Although quantitative in nature, subjective choices in design, methodology, and parameters can drastically affect the quality and applicability of findings. These comments aside, the referenced authors are to be applauded for their innovative efforts during the early era of PBT, when few, if any, high-quality studies existed in the literature. In general, CEMs are notoriously difficult to conduct because of their comprehensive nature and difficulties with simultaneously accounting for each outcome and variable of significance. The purpose of this review is to evaluate the methodology of existing models and to identify future directions for improvement. In its simplest form, the health care value equation is defined as quality (or outcomes) over cost, and our suggestions involve both components.
CEM Development: General Best Practices (Table 6)
Future CEMs should follow the recommendations of the Second Panel on Cost-Effectiveness in Health and Medicine [29], which suggests inclusion of both a health care sector and a societal perspective. For either perspective, all economic and clinical effects of interventions in the impact inventory should be taken into consideration to ensure comprehensive coverage of all consequences for each party. To ensure transparency, all model parameters, assumptions, analyses, and structural decisions should be clearly outlined and supported with appropriate rationale. For critical model inputs that will significantly influence results or model validity, parameter selection should be informed via formal evidence synthesis. Finally, the panel recommends that identical discount rates for both costs and health consequences be explored within sensitivity analyses, in contrast to the heterogeneous rates applied by Ramaekers et al [27].
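The effect of applying one discount rate to both costs and health consequences, and of varying that rate in sensitivity analysis, can be sketched as follows. All inputs are hypothetical: the yearly incremental costs and QALYs (PBT minus IMRT) and the rates of 0%, 3%, and 5% are illustrative choices of ours, not values from the reviewed models or from the panel's report.

```python
def present_value(yearly_values, rate):
    """Discount a stream of yearly values; year 0 is left undiscounted."""
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(yearly_values))


# Hypothetical incremental streams over 5 years: a higher upfront cost for PBT,
# followed by savings and QALY gains from avoided toxicity in later years.
delta_costs = [20000.0, -1500.0, -1500.0, -1000.0, -1000.0]
delta_qalys = [0.00, 0.05, 0.05, 0.04, 0.04]


def icer_at(rate):
    """ICER with the same discount rate applied to costs and QALYs."""
    return present_value(delta_costs, rate) / present_value(delta_qalys, rate)


for rate in (0.0, 0.03, 0.05):
    print(f"discount rate {rate:.0%}: ICER = {icer_at(rate):,.0f} per QALY")
```

Because the benefits of reduced toxicity accrue years after the upfront treatment cost, higher discount rates raise the ICER in this example, which illustrates why the choice and consistency of discount rates should be reported explicitly.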

PBT Outcomes: Focus on Toxicity Parameters
Disease-specific outcomes, such as distant metastases and progression-free survival, are presumably similar for PBT and IMRT across multiple disease sites and histologies; thus, no significant concerns were observed for such outcomes. However, the referenced models inadequately represented the most important difference (and major benefit) of PBT over IMRT: decreased toxicities. Lundkvist et al [26] failed to incorporate any toxicity data for HNC because of the paucity of clinical literature at the time, whereas Li et al [20] assumed similar complication rates for paranasal sinus and nasal cavity cancers (despite data indicating decreased toxicities with PBT over IMRT [31,32]). The other two Markov models were flawed in their choice of toxicity states (with the exclusion of several clinically relevant complications) and suboptimal quality of input values for their included endpoints.
The Ramaekers et al [27] study, similarly because of the lack of data, based parameter inputs on comparative planning studies and NTCP models, which attempt to predict the incidence of treatment-related complications [33]. Admittedly, this was an innovative approach to compare outcomes in the absence of patient-level data [34]; however, those extrapolations of radiobiologic theory are beset by numerous assumptions, which culminate in significant uncertainties and concerns of external validity. Furthermore, only dysphagia and xerostomia were included as toxicity states, with the notable omission of PEG-tube placement (a common and clinically significant treatment-related complication). The NTCP models also underestimated the benefit of PBT versus IMRT with respect to dysphagia (as 18% versus 23%), in contrast to published clinical studies. Follow-up was also quite limited at 1 year, underestimating the long-term value of PBT.
Sher et al [28] improved upon prior models by incorporating additional toxicity endpoints (late xerostomia, acute dysgeusia, PEG-tube placement, and dental complications) and extending the follow-up period for calculation. However, their parameter inputs were again informed by modeling studies and the limited number of small, retrospective series available at the time of study. The benefit of PBT in PEG-tube placement was underestimated as a mere 25% reduction versus IMRT (odds ratio, 0.75), in contrast to published clinical data on patients with oropharyngeal and nasopharyngeal cancer [10,11] (although, admittedly, this endpoint is dependent upon clinician preference and the relative volume of oropharyngeal mucosa in the field). Furthermore, both the Sher et al [28] and Ramaekers et al [27] models focused on grade 2 or greater toxicities, implying moderate symptoms that only modestly interfere with function and may not require intervention. In contrast, grade 3 or greater toxicities (severe symptoms that interfere with daily activity and merit intervention) would arguably be a superior endpoint given their larger impact on both quality and cost. Although some late grade-2 toxicities are clinically significant, appropriate follow-up periods are necessary after treatment to adequately capture them.
At no fault of the authors, pertinent clinical data were quite limited during their period of publication (2005-18), but since then, several robust studies have been published with patient-level data that must be incorporated in future CEMs. For example, a prospective, comparative analysis of IMPT versus IMRT for oropharyngeal cancer demonstrated several benefits of PBT in acute toxicities and patient-reported outcomes (PROs) [10]: notably, significant reductions in PEG-tube use (20% versus 46%), posttreatment hospitalizations (9% versus 31%), and narcotic requirements. Decreased cough, dysgeusia, and dysphagia were also documented on PROs, whereas providers observed less pain and mucositis with IMPT. On an even larger scale, a retrospective comparative effectiveness study of 1483 patients (29% HNCs) [13] linked PBT with a two-thirds reduction in grade 3 or greater adverse events associated with hospital admission (11.5% versus 28%; relative risk [RR], 0.31; 95% confidence interval [CI], 0.15-0.66) and found less detriment to performance status as compared with IMRT (RR, 0.51; 95% CI, 0.37-0.71). To the earlier point emphasizing Common Terminology Criteria for Adverse Events severity, a lower (yet still significant) benefit was observed with grade 2 or greater toxicities (RR, 0.78; 95% CI, 0.65-0.93) as well.
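The percentage reductions implied by these relative risks follow directly from the relation reduction = 1 − RR. The short sketch below applies that relation to the values quoted above; the function name is our own, and the calculation is only a restatement of the published estimates, not a reanalysis.

```python
def relative_risk_reduction(rr: float) -> float:
    """Proportional reduction in risk implied by a relative risk below 1."""
    return 1.0 - rr


# Grade 3 or greater adverse events: RR 0.31 (95% CI, 0.15-0.66).
grade3_point = relative_risk_reduction(0.31)
# The upper RR bound gives the smallest reduction, and vice versa.
grade3_low = relative_risk_reduction(0.66)
grade3_high = relative_risk_reduction(0.15)

# Grade 2 or greater toxicities: RR 0.78.
grade2_point = relative_risk_reduction(0.78)

print(f"Grade >=3: {grade3_point:.0%} reduction "
      f"(range {grade3_low:.0%} to {grade3_high:.0%})")
print(f"Grade >=2: {grade2_point:.0%} reduction")
```

An RR of 0.31 corresponds to a reduction of about 69%, which matches the "two-thirds" description, whereas the grade 2 or greater estimate corresponds to a reduction of about 22%.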
As a final note on outcome parameters, both newer studies already demonstrate the oversight of prior CEMs for several clinically significant and costly endpoints, such as unplanned hospital admissions, pain and corresponding narcotic use, cough, and performance status decline. Other considerations include fatigue, osteoradionecrosis, endocrine complications, fibrosis, esophagitis, and oral mucositis. Furthermore, none of the aforementioned CEMs incorporated PRO data, which can supplement Common Terminology Criteria for Adverse Events assessments to more granularly capture quality-of-life differences [10,12]. Future CEMs should strive to identify and include such endpoints among their model parameters to provide a more comprehensive demonstration of the true benefit of PBT.

PBT Costs: Toxicity Management, Treatment Expenses, and Indirect Benefits
Health care costs in the United States are difficult to measure because of varying stakeholder perspectives (those of patients, payers, providers, and society), each of which merits evaluation via distinct scenario analyses. Given their often-conflicting interests, those stakeholders place widely varying emphases on different cost contributors toward decision-making. For example, although fixed costs for research and technology development should always be considered [29], the actual cost of building and financing a proton facility may be more relevant to some groups than others (ie, providers and society, but not patients or payers), given that payers do not directly factor those investment costs into their reimbursement policies.
However, for investment and operational expense of RT delivery, technologic advancements have actually led to dramatic reductions in PBT equipment cost over time [35,36]. Within the past few years, the massive industrial-sized facilities of the past have transitioned toward smaller, compact units that are readily incorporable within existing medical campuses and are associated with significantly decreased capital investment: from $100-$250 million historically, to now, as low as $25-$30 million per center [35,37,38]. Accompanied by optimized delivery efficiency [39], these innovations have simultaneously lowered the threshold for operational sustainability [35,37,38], resulting in broader provider adoption and patient access. These same technologic adaptations have also been accompanied by improvements in plan robustness [40], with continued benefits to patient outcomes (and treatment quality). Future CEMs should incorporate contemporary cost figures reflecting such changes, which will certainly enhance the value of PBT relative to historic models.
Comprehensive cost estimates also need to consider direct, as well as indirect, contributors: direct costs are associated with the actual medical services delivered to patients by providers, whereas indirect costs result from disability and productivity loss from disease-related (or treatment-related) morbidity. Optimizing toxicity endpoints, as previously mentioned, would enhance direct estimates by more accurately measuring the cost of treatment-related complications. Indirect costs, on the other hand, are poorly reflected by traditional economic evaluations because of difficulties associated with their measurement. However, the expenses incurred from treatment-related disability and productivity loss are significant contributors to financial toxicity [41], with broad societal implications. These measures are also particularly relevant to PBT because lower treatment-related toxicities could help maintain performance status and work productivity among patients with HNC [13] (many of whom are young working-aged adults). Thus, although challenging, future studies should attempt to incorporate such endpoints, starting, perhaps, with basic work-productivity assessments.
On a final note, CEMs should transition from hypothetical cost estimates toward incorporating real-world figures for cost inputs, just as recommended for clinical outcome and toxicity parameters. From the payer perspective, actual reimbursement records present validated cost data that can differ dramatically from model estimates [42,43]. For example, our proton therapy center collaborated with stakeholders on an insurance-coverage pilot for PBT, attempting to address patient-access barriers associated with insurance prior authorization [42,44,45]. That entailed a comprehensive cost-of-care analysis evaluating total medical charges among case-matched PBT versus IMRT patients, demonstrating no significant differences in average medical costs [42], a surprising result (among a prospective clinical cohort) that diverged significantly from anticipated cost estimates. Similarly, from the provider perspective, time-driven, activity-based methodologies can measure the actual cost of treatment delivery by methodically quantifying resource use associated with a full care cycle; and from the patient perspective, prospective surveys are useful instruments to assess financial toxicity [41]. Incorporating any of these real-world data sources would bolster the external validity of future CEMs.

Conclusions
In summary, we identified limitations of the existing CEMs and outlined several areas for improvement among future models. Given their increasing relevance in the modern health care era, economic evaluations should strive to incorporate the highest-quality evidence for parameter inputs (particularly with phase II/III randomized trial data highly anticipated [9,46]). Collectively, these suggestions will help minimize the uncertainties associated with CEMs, thus providing a more valid and applicable picture of the true value of PBT.