Abstract
Dual-energy computed tomography (DECT) has been used to derive relative stopping power (RSP) maps by obtaining the energy dependence of photon interactions. The DECT-derived RSP maps could potentially be compromised by image noise levels and the severity of artifacts when using physics-based mapping techniques. This work presents a noise-robust learning-based method to predict RSP maps from DECT for proton radiation therapy.
The proposed method uses a residual attention cycle-consistent generative adversarial network to bring DECT-to-RSP mapping close to a 1-to-1 mapping by introducing an inverse RSP-to-DECT mapping. To evaluate the proposed method, we retrospectively investigated 20 head-and-neck cancer patients with DECT scans in proton radiation therapy simulation. Ground truth RSP values were assigned by calculation based on chemical compositions and acted as learning targets in the training process for DECT datasets; they were evaluated against results from the proposed method using a leave-one-out cross-validation strategy.
The predicted RSP maps showed an average normalized mean square error of 2.83% across the whole body volume and an average mean error less than 3% in all volumes of interest. With additional simulated noise added in DECT datasets, the proposed method still maintained a comparable performance, while the physics-based stoichiometric method suffered degraded inaccuracy from increased noise level. The average differences from ground truth in dose volume histogram metrics for clinical target volumes were less than 0.2 Gy for D95% and Dmax with no statistical significance. Maximum difference in dose volume histogram metrics of organs at risk was around 1 Gy on average.
These results strongly indicate the high accuracy of RSP maps predicted by our machine-learning–based method and show its potential feasibility for proton treatment planning and dose calculation.
Introduction
Proton radiation therapy has been one of the emerging treatment modalities that may have better clinical outcomes for a wide range of patients due to favorable dosimetric properties related to the Bragg peak and virtually no exit dose compared with photon radiation therapy [1–4]. The calculation of proton dose based on computed tomography (CT) simulation images requires the conversion from the Hounsfield unit (HU) numbers to the relative stopping power (RSP) for different materials [5–8].
One of the currently implemented methods is to calibrate RSP based on the HU number on tissue characterization phantoms with known atomic compositions and electron densities. A 1-to-1 relationship between HU and RSP can then be established by a piecewise linear function. However, a direct HU-RSP calibration may introduce ambiguity since tissues with different combinations of atomic composition and electron density may have the same attenuation, which may cause inaccuracy in determining radiation absorption properties for proton dose calculations in treatment planning [9]. Moreover, the approximation of real tissue with tissue substitutes in phantom also introduces error due to the differences in chemical composition [10].
Recently, dual-energy CT (DECT) has been introduced to radiation therapy simulation for its ability to provide material-specific information by differentiating the energy dependence of photoelectric and Compton interactions of different materials [11–14]. Parametric maps, such as RSP, electron density (ρe), effective atomic number Zeff, and mean excitation energy (I), can be derived from DECT images in a voxel-wise manner using physical equations [15, 16]. However, the physics-based method can be very sensitive to the noise and artifacts on the DECT images since the overlapping of the 2 energy spectra would turn the system into an ill-posed problem. On the other hand, in addition to the noise and artifacts present in single-energy CT, DECT suffers extra noise and artifacts caused by non-ideal dual-energy dataset acquisition schemes, such as patient motion artifacts in 2-sequential-scan DECT [17, 18] and cross scatter artifacts in dual-source CT [19, 20]. Meanwhile, twin-beam DECT (TBCT) has been introduced into radiation therapy simulation for its good temporal coherence, full field of view, and low hardware complexity and cost, while it has poorer energy spectra separation compared with other DECT modalities [17, 21, 22]. The strong overlapping of energy spectra of linear attenuation coefficient among different materials would lead to significant noise magnification from the acquired projection to the results of material differentiation [12, 23–25]. The physical derivation does not accommodate these non-idealities and would magnify the noise and artifacts on the DECT images to the derived parametric maps that directly lead to uncertainty and inaccuracy.
Inspired by the success of the development of machine learning in recent years, novel methods have been developed to convert between images presenting similar anatomy but different modalities [26–35]. Due to the data-driven properties, learning-based methods are expected to be more robust to noise. These algorithms have also been introduced for parametric map generation from DECT [9]. In the study by Su et al [9], models are trained by a large number of pairs of DECT images and corresponding parametric maps using different algorithms, and then predict parametric maps from a new DECT image input. However, the training model is based on phantom of tissue substitutes but is used for predicting clinical patient datasets. The approximation of real tissue by tissue substitutes still exists, and the learning on phantom with piecewise values on simple geometry may neglect the heterogeneities and complexities present in real patient datasets. Thus, an appropriate training model with ground truth based on human scans is of interest.
In this work, we propose a novel machine-learning–based method to predict RSP maps from DECT for proton radiation therapy. The aim of this study was to provide an alternative approach to the physics-based method with more noise robustness. In our method, we integrated a residual attention architecture into cycle-consistent generative adversarial networks (cycle-GANs). Compared with other machine-learning–based methods, the advantages include automatic extraction of deep features from DECT images and use of residual blocks to force the model to focus on the differences between RSP and DECT. Moreover, a deep attention strategy was integrated into the network architecture to highlight the informative features that would well represent the difference between RSP and DECT images. Third, cycle-GANs were used to let the DECT-to-RSP mapping be close to a 1-to-1 mapping by introducing an inverse RSP-to-DECT mapping. We retrospectively investigated 20 head-and-neck cancer patients with twin-beam DECT scan acquired during CT simulation and treated with proton radiation therapy. Ground-truth RSP values were assigned by calculation based on chemical compositions [9, 10, 36]. These ground-truth RSP maps acted as learning targets in the training process for DECT datasets and were evaluated against results from the proposed method using a leave-one-out cross-validation strategy. The accuracy of predicted RSP maps by the proposed method was quantified with multiple quality and dosimetric metrics. All results generated by the proposed method were compared with the physics-based stoichiometric method.
Materials and Methods
Workflow
Figure 1 outlines the schematic workflow of our prediction method. The DECT is able to provide 2 CT image datasets acquired with low- and high-energy spectra. For a given series of low and high DECT images and their corresponding RSP maps, the RSP maps were used as the regression target. The use of both energy images gives the model higher fidelity since the CT values at 2 energy levels demonstrate 2 different energy dependences of material attenuation, which indicates material information. During training, 96 × 96 × 32 voxel patches were extracted from low- and high-energy images by a sliding window with an overlap of 80 × 80 × 22 between any 2 neighboring patches. Furthermore, data augmentations were used to enlarge the training data variation. Then, prepared patches were fed into a 3-dimensional (3D) deep learning framework as a 2-channel input and mapped to a 1-channel output patch, which was correlated to training target, that is, RSP image. To enforce a 1-to-1 mapping, a cycle-GAN–based framework was used to introduce an inverse mapping, which took the RSP map as 1-channel input and mapped it to low- and high-energy CT images as 2-channel output. In order to learn the specific differences among these 3 datasets, 6 residual blocks were used as short skip connections in the end-to-end U-Net–based generator architecture. To further highlight significant features that can fully represent these 3 image datasets, 3 attention gates were used for the other 3 long skip connections as shown in the generator architecture in Figure 2. To judge the realism of the predicted maps, a fully convolutional network architecture-based discriminator was used to discriminate the true image from a predicted image. After training, the paired patches of low and high DECT images were extracted from a new arrival patient's DECT dataset and are fed into the trained networks to obtain the predicted RSP patches. Finally, by using patch fusion, the RSP map of a new arrival patient's DECT was predicted.
The schematic flowchart of the proposed method. The first row shows the training stage. The second row shows the prediction stage. DECT, dual-energy computed tomography; RSP, relative stopping power.
The generator and discriminator network. Wg and Wh are two linear transformations in attention gates. Deonv, Deconvolution; PReLU, Parametric Rectified Linear Unit; bn, Batch normalization; Conv, Convolution.
Network Architecture
Figure 2 shows the network architecture of both generators and discriminators used in the proposed method. Both generators share same architecture with different trainable weights. Both discriminators (discriminator of RSP and discriminator of DECT) share same architecture with different trainable weights. We used a traditional fully convolutional network to implement the discriminators [34]. The generator architecture (of both DECT-to-RSP and RSP-to-DECT) is an end-to-end U-Net, including encoding and decoding paths. The encoding path is composed of 3 convolution layers with a stride size of 2 to reduce the feature maps size and several further convolution layers with stride size of 1. The decoding path is composed of 3 deconvolution layers to enlarge the feature map size and match the size of input image (end-to-end), several further convolution layers with stride size of 1, and a tanh layer to perform the regression. To collect features extracted from encoding path and decoding path with same feature map size, 1 short skip connection and 3 long skip connections were used to concatenate equal-sized feature map that is extracted from encoding and decoding convolution layers, respectively. The short connection included 6 residual blocks. The aim of using residual block is to lead the network focusing on the difference between source and target images. The details of implementation can be shown in Harm et al [37]. As shown in Figure 2, the long skip connection was implemented by concatenating these feature maps extracted from the layer of encoding path with same-sized feature maps extracted from the layer of decoding path. Attention gate could capture the most relevant semantic (segment) information without enlarging the receptive field [31]. Since in this work, the target RSP image is close to a semantic image, we proposed integrating attention gate into the long skip connection to highlight the semantic features from feature maps extracted from the previous layer of encoding path. The details of the implementation of attention gate can be found in our previous work [31].
Loss Function
Data Acquisition
In this study, we retrospectively analyzed the images from 20 patients with squamous cell carcinoma in the head and neck region. The 20 patients included 13 men and 7 women ranging in age from 25 to 89 years. The locations of tumors in these 20 patients included larynx, buccal mucosa, tongue, etc, and 12 of them underwent excisions. Each patient had CT simulation by TBCT in DECT mode with a 110-second delay after 100 mL Omnipaque 300 iodine contrast injected at 2.5 mL/s. Their targets and organs-at-risk (OARs) were delineated by physicians. Institutional review board approval was obtained with no informed consent required for this Health Insurance Portability and Accountability Act–compliant retrospective analysis.
The DECT images were acquired by a Siemens SOMATOM Definition Edge twin-beam CT scanner (Erlangen, Germany) at 120 kVp with the patient in treatment position (pitch: 0.45, rotation time: 0.5 seconds, scan time: around 30 seconds, CTDIvol: around 20 mGy, reconstruction algorithm: SAFIRE, reconstruction kernel: Q30f, tube current ranges from 500 to 650 mA, and metal artifact correction was in use). The 120 kVp x-rays were split into high and low spectra by 0.6 mm tin and 0.05 mm gold filters, yielding high-energy and low-energy scans for reconstruction, respectively. Composed images were also reconstructed from raw projection dataset by disregarding spectral differences. All the above images were reconstructed by Siemens Syngo CT VA48A with spacing 0.98 × 0.98 × 1.5 mm3 and 512 × 512 pixels of each slice.
The ground truth RSP map of each patient was created by manually classifying the composed DECT images into different materials and assigning corresponding RSPs. In this study, we differentiated 7 types of material as listed in Table 1, each of which has RSP calculated using the chemical compositions published in Su et al [9] and ICRP Publication 23 [36]. Equations and details of calculation were done as previously reported [10]. Note that although the learning target is bulk-assigned RSP maps, the predicted RSP maps are still patient-specific with continuous values.
Ground-truth relative stopping power of different materials calculated based on chemical compositions published in ICRP Publication 23 [36].
![Ground-truth relative stopping power of different materials calculated based on chemical compositions published in ICRP Publication 23 [36].](https://allen.silverchair-cdn.com/allen/content_public/journal/theijpt/7/3/10.14338_ijpt-d-20-00020.1/2/m_i2331-5180-7-3-46-t01.png?Expires=1616880254&Signature=e4vmO4IbSe5nfz4NeQOvdFmjwvkgDWl08AfpprdwE-PBIKOsOMD869IW38tpNo9gg~OsNEOKI~YAWO3XCPXPpgyNw5vcSl3j1dpmAuVA89V1pZ~qMTIO1MUf6Oo3aUH55OCUSbVi1-GJiqPhguw-DYiCJJ996kFF18ZTN-B36nWpB2gzP9~Q-DwrAeEOM9J51AcNVBkeE-rpS3LVLGqsjZNoP-XZOy9SDRCiyp~ybBDGJHx-0YY8NSseDc2ONQPw4m~2TEXoncSLm2-pd7tsyJHhACAxgzIZ9OksRJm0S1pAu2LWrR2zB6SdrrpxXP5wVp9-uH8l0wGPGbWTXzFm8Q__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
The cohort of 20 patients was used to evaluate our method using leave-one-out cross-validation. For each test patient, the model was trained by the remaining 19 patients. The model is initialized and retrained for next test patient by training another group of 19 patients. The training datasets and testing datasets were separated and independent during each study. For our training, we used data augmentation and 3D patch-based method to increase training data variation. Flipping, rotation, scaling, and rigid warping were used to enlarge the data size by 72 times. Patch size was set to 96 × 96 × 32.
For comparison, a physics-based dual-energy stoichiometric method was implemented to generate RSP maps as well. We used the Gammex RMI 467 electron density phantom (Sun Nuclear, Melbourne, FL) with the chemical compositions of inserted materials specified by the manufacturer. The RSP value of each rod was calculated from the known chemical compositions using the Bethe equations as reported [10]. The rods were randomly placed throughout the phantom for 5 different scans so an average HU value, relatively independent of positioning, could be obtained. The same CT scanning protocol as used for patient scans was used during the phantom scan. The HU was measured on each rod at both high- and low-energy images and calibrated with the calculated RSP values using equations reported [9]. The calibration was then applied on patient DECT images to generate physics-based RSP maps.
To investigate the performance of the proposed method at different noise levels, we simulated additional noise on the original DECT image datasets. On each pixel, random noise was added with a probability distribution of Gaussian with mean = 0 and standard deviation = 0.5%, 1%, 2%, and 5% of the value of that pixel + 1000 HU, respectively.
To further demonstrate the feasibility of dose calculation using the predicted RSP by the proposed method in proton treatment planning, we compared the difference between dose maps calculated on the ground truth RSP and the predicted RSP using the same plan parameters. Of the 20 patients, 19 had clinical proton treatment plans using pencil beam scanning and optimized with multi-field optimization technique. All plans had 4 to 5 beams and were robustly optimized with 3-mm setup uncertainty in all directions, which was covered by the 3.5% range uncertainty. The average prescription dose to clinical target volumes (CTVs) was 60.2 Gy among all patients. The structures and original treatment plans were duplicated onto the RSP maps of ground truth or prediction for dose calculation using the same algorithm (proton convolution superposition) in Eclipse 13.7 (Varian Medical Systems, Palo Alto, California). For each plan, we visually checked the similarity of the dose distributions calculated on ground-truth RSP and predicted RSP. Quantitatively, clinically relevant dose volune histogram (DVH) metrics were extracted for comparison of dose to CTVs and relevant OARs. For plans with multiple CTVs, the DVH of each CTV was counted separately.
In addition to the patient study, we also included a phantom study to evaluate our method in a controlled condition. We scanned an anthropomorphic phantom (CIRS Adult Male 701) with same scanning protocol for patients with head and neck cancer. Half of the slices (about 150 slices) were trained, and the rest were tested in an alternating pattern. The training slices and testing slices were alternating. The same data augmentation and 3D patch-based method was used to increase training data variation. The phantom mainly contained 3 materials, that is, soft tissue, lung, and bone, and the ground-truth RSP of each material was calculated based on the material information found in the manual.
Image-Quality Metrics



Results
Figure 3 shows the anthropomorphic phantom results of our proposed method as well as ground truth. Most RSP differences were in lung and at the boundary of 2 materials. In such case with simple geometry and piece-wise constant RSP distribution, the proposed method achieved NMSE of 0.60% within the patient body, and ME/NMSE of –2.20%/4.08%, 0.31%/0.55%, and –0.10%/0.43% for lung, soft tissue, and bone, respectively.
Axial views of phantom on (a) high-energy computed tomography images, (b) low-energy tomography images, and (c) its ground-truth relative stopping power (RSP) maps, and (d) RSP maps produced by the proposed method, respectively. (e) The difference maps of (d)–(c). The window level of RSP maps (c and d) is [0, 2].
Axial views of phantom on (a) high-energy computed tomography images, (b) low-energy tomography images, and (c) its ground-truth relative stopping power (RSP) maps, and (d) RSP maps produced by the proposed method, respectively. (e) The difference maps of (d)–(c). The window level of RSP maps (c and d) is [0, 2].
In Figure 4, the quality of RSP maps is shown using a side-by-side comparison with ground-truth and physics-based results from 1 patient without additional simulated noise. By the proposed method, the RSP maps maintained comparable resolution, contrast, and most of the details as the ground truth. Errors were observed due to misclassification between adipose and muscle, as well as between spongiosa bone and cortical bone, in addition to boundaries between different materials (column g). Compared with the results from the proposed method, physics-based results showed a larger error in lungs (e3). It also showed degraded quality caused by noise and artifacts. For example, a large discrepancy in (e3) can be seen around the corresponding region of streaking artifacts in (a3) and (b3).
Axial views of 1 patient without simulated noise. Rows (1), (2), and (3) are 3 different slices. Columns (a), (b), (c), (d), and (f) are the high-energy computed tomography images, low-energy computed tomography images, ground-truth relative stopping power (RSP) maps, physics-based RSP map, and maps produced by the proposed method, respectively. Columns (e) and (g) are the difference maps of (d)–(c) and (f)–(c), respectively. The yellow dotted lines on (a) indicate the positions of profiles displayed in Figure 6. The window level of RSP maps (a, b, c, d, and f) is [0, 2].
Axial views of 1 patient without simulated noise. Rows (1), (2), and (3) are 3 different slices. Columns (a), (b), (c), (d), and (f) are the high-energy computed tomography images, low-energy computed tomography images, ground-truth relative stopping power (RSP) maps, physics-based RSP map, and maps produced by the proposed method, respectively. Columns (e) and (g) are the difference maps of (d)–(c) and (f)–(c), respectively. The yellow dotted lines on (a) indicate the positions of profiles displayed in Figure 6. The window level of RSP maps (a, b, c, d, and f) is [0, 2].
Similarly, an exemplary result with additional 5% simulated noise is also shown in Figure 5. RSP maps generated by the proposed method and their corresponding profiles (Figure 6, C2) remained close to the results without additional noise, while those generated by the physics method were highly affected by the noise and demonstrated severe inaccuracies (column b). The averages of measured NMSE within the patient body at different noise levels among all 20 patients are summarized in Table 2. The proposed method maintained a comparable accuracy when noise increased from 0% to 5%, while the physics-based method showed severe degradation in performance.
Axial views of 1 patient with 5% additional simulated noise. Rows (1), (2), and (3) are 3 different slices. Columns (a), (b), (c), and (e) are the high-energy computed tomography images, low-energy computed tomography images, Relative stopping power maps by the physics method and by the proposed method, respectively. Columns (d) and (f) are the difference maps of (c) and (e) from ground truth in Figure 4 column (c), respectively. The yellow dotted lines on (a) indicate the positions of profiles displayed in Figure 6. The window level of relative stopping power maps (a, b, c, and e) is [0, 2].
Axial views of 1 patient with 5% additional simulated noise. Rows (1), (2), and (3) are 3 different slices. Columns (a), (b), (c), and (e) are the high-energy computed tomography images, low-energy computed tomography images, Relative stopping power maps by the physics method and by the proposed method, respectively. Columns (d) and (f) are the difference maps of (c) and (e) from ground truth in Figure 4 column (c), respectively. The yellow dotted lines on (a) indicate the positions of profiles displayed in Figure 6. The window level of relative stopping power maps (a, b, c, and e) is [0, 2].
Comparison of relative stopping power (RSP) map profiles. The positions of profiles (a), (b), and (c) are indicated by yellow dotted lines in Figure 3 or Figure 4 (a1), (a2), and (a3), respectively. Rows (1) and (2) are the profiles of results in Figure 3 (without simulated noise) and 4 (5% additional simulated noise), respectively.
Comparison of relative stopping power (RSP) map profiles. The positions of profiles (a), (b), and (c) are indicated by yellow dotted lines in Figure 3 or Figure 4 (a1), (a2), and (a3), respectively. Rows (1) and (2) are the profiles of results in Figure 3 (without simulated noise) and 4 (5% additional simulated noise), respectively.
Mean ± standard deviations of normalized mean square error within patient body at different noise levels among all 20 patients.

The averages of measured NMSE and ME within different VOIs at different noise levels among all 20 patients are summarized in Table 3. With several exceptions of lower but comparable NMSEs in adipose and muscle at the lowest noise level, overall, our method outperformed the physics-based method in NMSE and ME. When the noise level increased, physics-based results were more affected in accuracy, while the accuracy of the proposed method was successfully maintained. Thus, the advantage of the proposed method over the physics-based method was more prominent at higher noise levels, which indicates greater robustness to noise.
Mean ± standard deviation of mean error and normalized mean square error within different VOIs at different noise level among all 20 patients.

Figure 7 compares the calculated proton dose maps at selected axial, sagittal, and coronal planes of a patient as an example in the case of no additional noise simulated. The dose maps calculated on the ground truth qualitatively appeared to be more similar on those of the predicted RSP maps by the proposed method than by the physics-based method. Most dose errors of the proposed method were around the distal end of the beams, while that of the physics-based method happened at all the high dose-gradient regions to a much wider and severe extent. The gamma index maps (3%/3 mm) also showed more failed areas in physics-based results than our results. The gamma passing rates (threshold = 10%) for the physics-based method were 79.67%, 83.03%, and 91.43% for the axial, coronal, and sagittal planes at iso-center, and 97.81%, 97.75%, and 98.92% for the proposed method at the corresponding planes.
Dose distribution calculated on (a) ground-truth and predicted relative stopping power by the (b) physics-based method and (d) proposed method from dual-energy computed tomography datasets of 1 patient without additional simulated noise presented in 3 orthogonal views (1), (2), and (3). The dose difference maps of (a) verses (b) and (a) versus (d) are shown in (c) and (e), respectively.
Dose distribution calculated on (a) ground-truth and predicted relative stopping power by the (b) physics-based method and (d) proposed method from dual-energy computed tomography datasets of 1 patient without additional simulated noise presented in 3 orthogonal views (1), (2), and (3). The dose difference maps of (a) verses (b) and (a) versus (d) are shown in (c) and (e), respectively.
The differences in DVH metrics of clinically relevant OARs between ground truth and predicted RSP among all patients are summarized in Table 4. For the proposed method, the mean differences were less than 0.2 Gy for CTVs with no statistical significance. A 2-tailed paired t-test was performed to compare the mean values of each DVH metric between ground truth and predicted RSP maps. It was shown that DVH metrics had no significant difference in all OARs except esophagus and brainstem with overestimation less than 1.2 Gy. Compared with the results by the physics-based method, the results by our method had less variation in most of the DVH metrics.
Discussion
In this study, we proposed a novel machine-learning based method to predict RSP maps from DECT for proton radiation therapy. As an alternative to the current physics-based method, our proposed method aims to provide accurate RSP values with more resistance to noise. We evaluated the accuracy of predicted RSP maps using our method in the context of patients with head-and-neck cancer. The RSP maps showed an average NMSE of 2.83% across the whole body volume and average ME less than 3% in all VOIs. With additional simulated noise added in DECT datasets, the proposed method still maintained comparable performance, while the physics-based method suffered from degraded inaccuracy with increased noise level. Based on the statistical analysis of comparative DVH metrics of dose maps calculated on ground truth and predicted RSP maps among 19 pencil beam scanning proton treatment plans, we showed that the average differences in DVH metrics for CTVs were less than 0.2 Gy for D95% and Dmax with no statistical significance. Maximum difference in DVH metrics of OARs was around 1 Gy on average. These results strongly indicated the high accuracy of RSP maps predicted by our machine-learning–based method and showed potential feasibility for proton treatment planning and dose calculation.
In this study, we proposed a method to generate RSP maps by learning from DECT image datasets and the corresponding ground-truth RSP. Note that the patients' true RSP values were unavailable. The ground truth of this study was then assumed by assigning calculated RSP values to manually segmented regions of materials on patient DECT images, where the calculation was based on the known chemical composition. Such a material assignment method is used in current proton and photon studies when the required material information cannot be readily or accurately derived from CT simulation images [9, 38, 39]. Although the learning target was bulk-assigned RSP maps, the predicted RSP maps were still patient-specific with continuous values, preserved fine image textures and contrasts. Potential error may have been caused by the inaccuracy in segmentation and inter- and intrapatient variation in composition, which could be a limitation in the implementation of this study. Another limitation is that the training and testing images in the anthropomorphic phantom study, although from different slices, were still very similar in terms of geometry and material. Testing on a different anthropomorphic phantom with different materials and reasonable geometry variation would be helpful in investigating the fundamental correlation between learning algorithm and physical artifacts in DECT images under a well-controlled condition.
However, it is worth noting that this study did not aim to evaluate the absolute error of the predicted RSP from patients' true RSP values which were unavailable. Instead, we demonstrated the performance of the proposed method by evaluating its discrepancy of the prediction with its training dataset. Such performance should remain at a similar level when the training dataset is selected differently. With the prediction performance maintained, a better knowledge of RSP in the training stage, such as more and finer types of materials segmented and assigned on a patient, differentiation of the same type of tissue among diverse demographics [40], or estimation from animal tissue models [41, 42], would directly lead the predicted results to be closer to physical reality.
Machine-learning–based methods are relatively new for DECT RSP map prediction. Only a few existing studies have relied on patient data, rather than phantom data. The RSP prediction accuracy by our method was competitive to others. Su et al [9] investigated the performance of historical centroid, random forest, and artificial neutral networks in solving this problem. Models were trained on tissue substitutes and tested on patient data. Results showed that MEs were around 5% for all VOIs in the abdominal region compared with calculated RSP values as ground truth. Our study moved a step further in integrating a deep attention strategy and cycle-GAN into the mapping from DECT to RSP, directly training on patient datasets, and demonstrating the dosimetric feasibility.
In this study, we used a twin-beam scanner for DECT acquisition. Note that the proposed method does not specify the scan scheme; thus, it is applicable to other DECT modalities. We presented it in the context of TBCT in order to demonstrate the noise robustness of the proposed method. TBCT has inferior energy separation compared with other DECT modalities, which leads to a higher sensitivity to artifacts and noise on DECT image datasets [22]. It potentially explains why the results by the physics-based method demonstrated larger error and higher noise level than previously reported [15], where DECT images were acquired by 2 sequential scans at 2 different energy levels, which have a larger separation between the 2 energy spectra than the twin-beam scan scheme used in this study. However, TBCT has unique advantages over other DECT modalities for radiation therapy simulation in good temporal coherence, full field of view, and low hardware complexity and cost. Thus, our method overcomes the drawback of TBCT in RSP map generation by its insensitivity to noise, thus facilitating the clinical use of TBCT in radiation therapy workflow. Moreover, the recently developed multispectral CT, which allows concurrent identification of multiple materials with increased accuracy [43], is also compatible with the proposed method by adding more input channels for CT images at multiple energy levels.
We investigated the performance of the proposed method at different noise levels and compared with the physics-based method. The DECT images without any simulated noise represents the most likely noise level in clinical practice. With simulated noise, the DECT images may be much noisier than usual, while it may still happen when the patient is large or a low-mAs scanning protocol is preferred. In such cases, the advantage in noise robustness of the proposed method is desired. In addition to the noise robustness investigated in this study, deep learning-based methods may have an advantages over physics-based methods in other aspects. For example, a major struggle in physics-based methods is the systematic uncertainty in reconstructed images, which includes, but is not limited to, beam-hardening artifact, scatter artifact, CT number variations on reconstruction algorithms, and patient size [44–47]. Deep learning-based methods, on the other hand, are expected to be less affected by these uncertainties, as long as the training datasets are representative and the corresponding training target (RSP maps) reflects the true RSP distributions. It can be attributed to the data-driven approach used in deep learning that focuses on images features, unlike the sole dependence on pixel values in the physics-based method. Given these potential merits, deep learning-based methods may overcome the current obstacles in physics-based proton RSP estimation, while the specific improvement needs further investigations.
In the present study, we found that the dose difference in columns (c) and (f) of Figure 7 mostly happened at the distal end of the beam. It is consistent with current studies about range uncertainties of proton beams caused by the error in RSP maps [39]. The overestimation in dose could have resulted from the underestimation of muscle RSP with less compensation from the overestimation of adipose RSP. Future studies could include investigations in determining the range deviations to the ground truth from the systematic prediction error on each material.
Deep learning-based methods usually involve large computational costs to train a model. We implemented the proposed algorithm with Python 3.7 (Python Software Foundation, Wilmington, DE) and TensorFlow (Company, City, State) as in-house software on a NVIDIA Tesla V100 GPU (Santa Clara, CA) with 32 GB of memory. The Adam gradient optimizer with a learning rate of 2e-4 was used for optimization. In the present study, the training stage required ∼30 GB and ∼16 hours for the training datasets of 19 patients and ∼2 minutes for each patient in the testing stage.
We limited our study to the head-and-neck region. Patients with head-and-neck cancer have high anatomical complexity and variability between patients. The tumor shape, size, and location can vary greatly for different patients, and it is common to see the tumor changing the exterior body shape, which is challenging for the learning-based method. A comprehensive evaluation with a larger population of patients with diverse anatomical abnormalities should be included in future studies during the model training. Different testing and training datasets from other centers would also be necessary to validate the clinical utility of our method. Moreover, the proposed method can be applied to other treatment sites of clinical importance for proton therapy, which would be of great interest for expanding this work to the clinic.
ADDITIONAL INFORMATION AND DECLARATIONS
Conflicts of Interest: The authors have no relevant conflicts of interest to disclose.
Funding: This research was supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R01CA215718, and Emory Winship Pilot Grant.
Ethical Approval: All patient data have been collected under internal review board approved protocol.