Recent data suggest that low-risk submucosally invasive (pT1) colonic adenocarcinomas (ie, completely resected tumors that lack high-grade morphology, tumor budding, and lymphovascular invasion) can be considered cured by endoscopic resection, provided that the depth of submucosal invasion is less than 1000 μm. Hence, the pathologists' assessment of the depth of submucosal invasion may guide further management (ie, surveillance versus colectomy).
This study aimed to assess interobserver concordance among gastrointestinal pathologists in measuring submucosal depth of invasion in colonic endoscopic resections.
Six gastrointestinal pathologists from 5 academic centers independently measured the greatest depth of submucosal invasion in micrometers on 52 hematoxylin-eosin–stained slides from colonic endoscopic specimens with pT1 adenocarcinomas, per published guidelines (round 1 scoring). Two separate measurements (round 2 scoring) were subsequently performed by each pathologist following a consensus meeting, (1) from the surface of the lesion and (2) from the muscularis mucosae, and pathologists were asked to choose their (3) “real-life (best)” assessment between the first 2 measurements. Interobserver agreement was assessed by the intraclass correlation coefficient (ICC) and Cohen κ statistics.
Round 1 had poor ICC (0.43; 95% CI, 0.31–0.56). Round 2 agreement was good when measuring from the surface (ICC = 0.83; 95% CI, 0.76–0.88) but moderate (ICC = 0.59; 95% CI, 0.47–0.70) when measuring from the muscularis mucosae and became poor (ICC = 0.49; 95% CI, 0.36–0.61) for the best-assessment measurement.
Our findings indicate that clearer and reproducible guidelines are needed if clinical colleagues are to base important management decisions on pathologists' estimate of the depth of submucosal invasion in colonic endoscopic resections.
Widespread screening colonoscopy procedures and advances in definitive endoscopic therapy call for precise and reproducible criteria for the evaluation of superficially invasive colorectal adenocarcinoma (CRC) in endoscopic resections. Submucosa-invasive CRC (pT1) is defined as invasion beyond the muscularis mucosae, but limited to the submucosa. A subset of these tumors are at low risk for nodal metastases and can be effectively managed by endoscopic resection, sparing patients the morbidity and mortality associated with colectomy.1–5 The pathologic criteria for cure of an endoscopically resected pT1 colonic adenocarcinoma include clear margins, along with low-grade histologic features, that is, lack of high-grade morphology or tumor budding, and no lymphovascular invasion.3,5–16 Recently, it has been suggested that, in the absence of the aforementioned features, depth of submucosal invasion 1000 μm or more may be an indication for colectomy, whereas tumors that invade less than 1000 μm can be managed conservatively with surveillance.17
The project study by the Japanese Society for Cancer of the Colon and Rectum (JSCCR) reported that nodal metastases occurred in 12.5% of CRCs with a submucosal depth of invasion 1000 μm or greater.5,17,18 This also implies, however, that surgery may be unnecessary for approximately 90% of patients in this group. Of note, a subset of these patients also had other high-risk factors. Indeed, high-risk features are summative in their risk for lymph node metastases, and hence should be considered in conjunction with other patient factors when deciding subsequent management.4,17 The 2019 JSCCR guidelines for the treatment of CRC state that in the absence of other risk factors for lymph node metastasis (ie, other than the depth of submucosal invasion), the incidence of nodal metastasis is 1.3% (95% CI, 0%–2.4%) when the submucosal depth of invasion is 1000 μm or more.17
Pathologists are increasingly asked by clinical colleagues to report the depth of submucosal invasion along with other well-established high-risk features when evaluating endoscopically resected pT1 adenocarcinomas. Existing guidelines5 advocate that pathologists perform a direct measurement, in micrometers, of tumor thickness below the level of the recognizable muscularis mucosae. If the muscularis mucosae is obscured or destroyed by tumor, then the recommendation is to measure from the surface of the lesion.5 In cases of ulcerated adenocarcinoma, the recommendation is to measure from the ulcer base.5 For pedunculated lesions with a tangled muscularis mucosae, depth of submucosal invasion is measured as the distance between the point of deepest invasion and the reference line, which is defined as the boundary between the tumor head and the stalk.5 Other authorities3,4,10,18,19 recommend a similar approach. Given the importance of depth of invasion in managing pT1 CRC, this study aimed to assess interobserver agreement among pathologists with subspecialty interest in gastrointestinal (GI) pathology when asked to measure submucosal depth of invasion in colonic endoscopic resection specimens with pT1 colonic adenocarcinoma by adhering to these guidelines.
DESIGN
The respective institutional review boards of all the participating authors' institutions approved the study. Six pathologists with subspecialty interest in GI pathology from 5 different academic centers retrospectively collected hematoxylin-eosin (H&E)–stained glass slides from colonic endoscopic resections for this study. Slides from well-oriented colonic polypectomy, endoscopic submucosal dissection, and endoscopic mucosal resection specimens that were diagnosed as pT1 colonic adenocarcinoma by the contributing pathologist were included in the study. Piecemeal, improperly oriented, or tangentially sectioned specimens were excluded. Submucosa-invasive (pT1) colonic adenocarcinoma was diagnosed in the endoscopic specimens by the presence of angular and infiltrative glands identified in a desmoplastic stroma and/or associated with muscularized vessels. Intramucosal adenocarcinoma (a term discouraged in the lower GI tract) was used by some pathologists to denote an absence of definitive extension of neoplastic glands through muscularis mucosae in their assessment. The slides in this study represented an unselected sample of cases meeting inclusion criteria. All study pathologists independently reviewed all the study slides. Five of the 6 pathologists had at least 6 years of clinical sign-out experience, and one pathologist had 1 year of clinical sign-out experience. For the purpose of this study, the slides were reviewed solely for the depth of submucosal invasion in micrometers (no other histopathologic parameters were assessed).
All 6 study pathologists were aware of the existing guidelines prior to undertaking this study. Four of the 6 study pathologists had been reporting submucosal depth in their routine clinical practice prior to the study, 1 (reviewer 2) was providing this information upon request, and 1 (reviewer 4) had not reported this measurement. Guidelines were still provided for review prior to round 1 scoring to ensure that standard criteria were used.4,5 The pathologists were asked to individually measure the maximal depth of submucosal invasion for each case, applying the guidelines according to their own interpretation. Each measurement was documented via a computer-captured photomicrograph of the H&E slide. The pathologists used their available measuring software (Olympus's CellSens imaging software for 5 reviewers and Nikon's NIS-Elements imaging software for 1 reviewer) for their Olympus or Nikon cameras that were properly calibrated for each objective. All the pathologists took one image using their preferred objective that allowed them to demonstrate the entire invasive tumor depth.
Three months after round 1 scoring, a consensus meeting among all the pathologists was held via an online videoconferencing platform. Selected cases (n = 8) demonstrating disagreement when a cutoff of 1000 μm was applied (ie, same case scored as ≥1000 μm and <1000 μm by different pathologists) were reassessed together, possible reasons for discrepancies were discussed, and the recommended guidelines were revisited among the authors. After the consensus meeting, round 2 scoring was performed, wherein 2 separate measurements were recorded by each pathologist on each slide. The first measurement was from the surface of the lesion to the greatest depth of submucosal invasion, and the second measurement was from the muscularis mucosae (whether interpreted as intact or obscured) to the greatest depth of submucosal invasion. When the muscularis mucosae was interpreted as obscured or not assessable, its deepest point was identified by locating the neighboring intact muscle and drawing an imaginary line following the curve of the muscularis mucosae (ie, where it should have been located). Each pathologist then chose the “best (real-life)” measurement from the above 2 options (ie, the one the pathologist would report in clinical practice). This was an attempt to assess agreement as to whether a given case showed muscularis mucosae destruction, and should therefore be measured from the surface, or had an intact muscularis mucosae that could be used as a landmark. Hence, the best (real-life) measurement was also considered a surrogate for whether the pathologist thought that the muscularis mucosae was assessable. A photograph displaying the measurements for round 2 was taken by each reviewer, again using the preferred objective. The pathologists were blinded to each other's results for both rounds of scoring.
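The round 2 record described above can be pictured as a small data structure. The following is only an illustrative sketch: the class name `Round2Reading`, its field names, and the example depths are hypothetical and are not taken from the study's data or software. It shows how the best (real-life) choice doubles as a surrogate for muscularis mucosae assessability, and how a reading can be flagged when the 2 reference points fall on opposite sides of the 1000-μm cutoff.

```python
from dataclasses import dataclass

CUTOFF_UM = 1000  # depth threshold (micrometers) discussed in the text


@dataclass(frozen=True)
class Round2Reading:
    """One pathologist's round 2 record for one slide (hypothetical structure)."""

    surface_um: float       # depth measured from the surface of the lesion
    muscularis_um: float    # depth measured from the muscularis mucosae
    best_is_surface: bool   # True if the surface value was chosen as "best (real-life)"

    @property
    def best_um(self) -> float:
        """The measurement the pathologist would report in practice."""
        return self.surface_um if self.best_is_surface else self.muscularis_um

    @property
    def muscularis_assessable(self) -> bool:
        """Surrogate: choosing the muscularis-based value implies the
        muscularis mucosae was judged assessable."""
        return not self.best_is_surface

    @property
    def straddles_cutoff(self) -> bool:
        """True when the 2 reference points give values on opposite sides
        of the cutoff, so the choice of landmark changes the category."""
        return (self.surface_um >= CUTOFF_UM) != (self.muscularis_um >= CUTOFF_UM)


# Made-up example values, for illustration only.
reading = Round2Reading(surface_um=1850.0, muscularis_um=720.0, best_is_surface=False)
print(reading.best_um, reading.muscularis_assessable, reading.straddles_cutoff)
```

Keeping both measurements alongside the choice, rather than only the reported value, is what allows disagreement about the landmark to be separated from disagreement about the measurement itself.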
Interobserver agreement was assessed by the intraclass correlation coefficient (ICC) for continuous variables and by the Cohen κ statistic for categorical variables. When conducting a reliability study, it is recommended to obtain at least 30 heterogeneous samples and involve at least 3 raters whenever possible. Under such conditions, ICC values less than 0.5 are indicative of poor reliability, values of 0.5 to 0.75 indicate moderate reliability, values of 0.75 to 0.9 indicate good reliability, and values greater than 0.90 indicate excellent reliability.20 Cohen suggested the κ results be interpreted as follows: values of 0 or less as indicating no agreement, 0.01 to 0.20 as none to slight, 0.21 to 0.40 as fair, 0.41 to 0.60 as moderate, 0.61 to 0.80 as substantial, and 0.81 to 1.00 as almost perfect agreement.21
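As a rough illustration of the ICC and its interpretation bands, the sketch below computes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1), from a cases-by-raters table. The choice of this ICC form is an assumption, since the specific form used is not stated here, and the function names `icc_2_1` and `icc_band` as well as the example depths are hypothetical. Values falling exactly on a boundary (0.5, 0.75) are assigned to the higher band, which is also an assumption.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per case, each holding one value per rater.
    Assumed form; the ICC variant used in the study is not specified.
    """
    n = len(scores)        # number of cases
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_band(icc):
    """Map an ICC value to the reliability bands quoted in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Made-up depths in micrometers (3 cases x 3 raters), for illustration only.
example = [
    [900.0, 1400.0, 1100.0],
    [2500.0, 2300.0, 3100.0],
    [600.0, 1200.0, 800.0],
]
print(icc_band(icc_2_1(example)))
print(icc_band(0.43), icc_band(0.83))  # bands for two ICC values reported in this study
```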
RESULTS
The 6 participating pathologists cumulatively collected 52 H&E-stained slides from 40 colonic endoscopic specimens originally diagnosed as submucosally invasive (pT1) adenocarcinoma by the sign-out pathologist. For the sake of increasing the power of the study (number of study cases), different slide(s) of the same lesion were considered separate measurable test cases in some resection specimens that showed submucosal tumor in more than one section (22 slides from 10 endoscopic specimens; 1 slide each from 30 endoscopic specimens).
Round 1 Scoring
The scoring results for all 6 reviewers are tabulated in Table 1. Overall agreement among the 6 reviewers was poor (ICC = 0.43; 95% CI, 0.31–0.56). Lack of agreement can be seen by the variation around the line of agreement in each plot (Figure 1). Of the 52 cases, 19 cases (36.5%) were scored as 1000 μm or more by all 6 reviewers, 1 case (2%) was scored as less than 1000 μm by all reviewers, and 32 cases (61.5%) showed disagreement when measurements were categorized as less than 1000 μm or 1000 μm or more (ie, a case scored variably as <1000 μm and ≥1000 μm by different reviewers) (Figures 2, A through F, and 3, A through F). The resultant agreement at a cutoff point of 1000 μm was only fair (Cohen κ = 0.23; 95% CI, 0.07–0.39) among the 6 reviewers.
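The three-way split of cases at the 1000-μm cutoff (all reviewers at or above, all below, or discordant) can be sketched as follows. This is a minimal illustration only; the function name `cutoff_concordance` and the example depths are hypothetical and do not reproduce the study data.

```python
from collections import Counter

CUTOFF_UM = 1000  # depth threshold (micrometers) used for categorization


def cutoff_concordance(depths_by_case):
    """Count cases by whether all raters fall on the same side of the cutoff.

    `depths_by_case` holds one list per case with one depth per rater.
    """
    counts = Counter()
    for depths in depths_by_case:
        at_or_above = [d >= CUTOFF_UM for d in depths]
        if all(at_or_above):
            counts["all_at_or_above"] += 1
        elif not any(at_or_above):
            counts["all_below"] += 1
        else:
            counts["discordant"] += 1
    return counts


# Made-up depths for 3 cases scored by 3 raters, for illustration only.
example = [
    [1200.0, 1500.0, 2100.0],  # all raters at or above the cutoff
    [400.0, 900.0, 650.0],     # all raters below the cutoff
    [950.0, 1000.0, 1300.0],   # raters disagree about the category
]
print(cutoff_concordance(example))
```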
Online Consensus Meeting
Possible reasons for variations in measurements were discussed. The most common source of discrepancies was the interpretation of the muscularis mucosae, with some pathologists interpreting it as intact or identifiable and assessable and hence measuring from the muscularis mucosae, whereas to others it appeared distorted or obliterated, and hence measuring was performed from the surface of the lesion for the same case (Figures 2, A through F, and 3, A through F). In cases wherein the muscularis mucosae was splayed by tumoral infiltration, there was also disagreement over whether one should consider the deepest or most intact (thickest) portion as the point of reference (Figure 3, A through F). Cases with mucinous differentiation proved challenging, with some authors measuring to the deepest extent of mucin pools and others measuring to the level of tumor cells in the mucin pools (Figure 4, A through F). We also found that the authors differed in choosing the microscopic field with the greatest depth of invasion in some cases, adding to the interobserver variation. We developed recommendations to address these scenarios (Table 2) and applied them in round 2 scoring.
Round 2 Scoring
The round 2 scoring results for the 6 reviewers for the measurements from the surface and muscularis mucosae are tabulated in Table 3. The scoring result of each author's best (real-life) assessment, in which the author either measured from the surface (muscularis mucosae interpreted as obliterated or nonassessable) or measured from the muscularis mucosae (interpreted as intact), is tabulated in Table 1 and compared with the round 1 assessment.
The overall estimate of agreement when measuring from the surface among the 6 reviewers was good (ICC = 0.83; 95% CI, 0.76–0.88). The percentage of measurements from the surface of 1000 μm or more did not vary much among the scorers, ranging from 88% to 100% (Table 3). However, there were insufficient measurements of less than 1000 μm (only 2 of 312) when measuring from the surface to reliably evaluate κ at a cutoff point of 1000 μm. The overall estimate of agreement for measurement from muscularis mucosae among the 6 reviewers was moderate (ICC = 0.59; 95% CI, 0.47–0.70). The percentage of measurements from the muscularis mucosae of 1000 μm or more ranged from 63% to 85% (Table 3). Overall, when the measurements from the muscularis mucosae were categorized at a cutoff point of 1000 μm, the estimate of agreement among the 6 reviewers was only fair (Cohen κ = 0.37; 95% CI, 0.18–0.56). The overall estimate of agreement for the reviewer's best (real-life) measurements among the 6 reviewers indicated poor agreement (ICC = 0.49; 95% CI, 0.36–0.61). Lack of agreement can be seen by the variation around the line of agreement in each plot and in photomicrographs (Figures 5, 6, A through F, and 7, A through F). The percentage of measurements for the best assessment 1000 μm or more ranged from 71% to 94% (Table 1). Overall, the Cohen κ estimate of agreement among the 6 reviewers when the best-assessment measurements were categorized at a cutoff point of 1000 μm was only moderate (0.41; 95% CI, 0.16–0.66). The comparisons between round 1 and round 2 measurements are tabulated in Table 4.
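To make the categorical analysis concrete, the sketch below dichotomizes depths at 1000 μm and computes the Cohen κ for one pair of raters, then maps the result to the interpretation bands quoted in the statistical methods. It is a sketch under stated assumptions: how κ was summarized across all 6 reviewers is not described here, so only the 2-rater case is shown, and the function names and example depths are hypothetical. The undefined case (both raters placing every case in the same category) returns NaN, which mirrors why κ could not be reliably evaluated for surface measurements.

```python
import math

CUTOFF_UM = 1000  # depth threshold (micrometers)


def dichotomize(depths):
    """True where the depth is at or above the cutoff."""
    return [d >= CUTOFF_UM for d in depths]


def cohen_kappa(a, b):
    """Cohen kappa for two raters with binary labels (lists of bool)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa = sum(a) / n  # proportion rated "at or above" by rater A
    pb = sum(b) / n  # proportion rated "at or above" by rater B
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return math.nan  # no variation in either rater: kappa is undefined
    return (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Map kappa to the agreement bands quoted in the text."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Made-up depths for 4 cases from 2 raters, for illustration only.
rater_a = dichotomize([1500.0, 1200.0, 700.0, 400.0])
rater_b = dichotomize([1300.0, 900.0, 800.0, 500.0])
kappa = cohen_kappa(rater_a, rater_b)
print(kappa, kappa_band(kappa))
```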
Agreement Between Each Pathologist's Scores for Rounds 1 and 2
The agreement for each pathologist between round 1 and round 2 (best assessment) measurements ranged from 0.41 (poor; reviewer 6) to 0.77 (good; reviewer 3); both of these reviewers had at least 6 years of clinical experience. The agreement values for other reviewers were 0.45 (reviewer 4), 0.59 (reviewer 2), 0.60 (reviewer 1), and 0.75 (reviewer 5). Of note, because round 2 measurements were undertaken after discussion and establishment of a consensus approach, this was not considered true intraobserver reliability.
DISCUSSION
With advances in endoscopic management for early invasive colonic adenocarcinoma being made for both pedunculated and nonpedunculated lesions, many recent studies have looked into pathologic criteria for curative endoscopic therapy, that is, patients who can be spared major surgery in favor of subsequent endoscopic surveillance. These criteria include complete resection, that is, negative deep margin with adequate clearance (usually 1–2 mm), low-grade tumor morphology with no lymphovascular invasion, and low tumor budding. Recently, depth of submucosal invasion less than 1000 μm has been added as another favorable histologic feature.* The adverse risk factors, when present, are summative in their risk of nodal metastases, and these patients warrant a subsequent colectomy with lymph node resection.4,26,27
The importance of submucosal depth has been recognized for some time. Haggitt et al28 showed that the depth of submucosal invasion was associated with nodal metastases in pedunculated malignant polyps, and Kikuchi et al29 and Kudo30 demonstrated similar findings in sessile polyps. However, these levels are difficult to apply in endoscopic specimens, particularly for sessile lesions, as the full submucosal thickness may not be visualized and the muscularis propria is not present as a landmark for accurate subdivision of submucosa. Multiple studies have reported a positive correlation between depth of submucosal invasion in endoscopic specimens (including malignant polyps) and nodal metastases. A Japanese study showed that the rate of nodal metastases was 0% if the submucosal depth of invasion was less than 1000 μm in the author's cohort of 724 nonpedunculated submucosal invasive CRCs.18 Another multi-institutional Japanese study including 806 submucosal invasive CRCs showed submucosal invasion of 1000 μm or more to be an independent predictor of nodal metastases by multivariate analysis (odds ratio, 5.56; 95% CI, 2.14–19.10).10 Some other, mainly Japanese, studies have shown similar results when using a cutoff depth of submucosal invasion 1000 μm or more.7,10,31 Recently, the depth of submucosal invasion was studied in a North American cohort consisting of 116 surgically resected pT1 colorectal carcinomas, and depth of submucosal invasion of 1000 μm or more was significantly associated with lymph node metastasis on univariate analysis (P = .04), although it was not an independent predictor of lymph node metastasis on multivariate analysis. 
These authors also reported that depth of submucosal invasion 1000 μm or more was significantly more common among lymph node–positive compared with lymph node–negative tumors (81% versus 60%, P = .04).11 However, the literature has been divided on the cutoff for submucosal depth, with some other studies proposing 2000 μm as a more reasonable cutoff.6,7 Nonetheless, pathologists are now increasingly being expected to include the depth of submucosal invasion in their pathology reports. But, for the clinical guidelines to be effective in any decision-making process, they need to be reproducible in addition to being accurate.
In this study, we found substantial interobserver variability among pathologists with subspecialty interest in GI pathology when measuring the depth of invasion of pT1 colonic adenocarcinoma in endoscopic specimens using the recommended guidelines, even after pathologists revisited the guidelines in a consensus meeting. The results show that the pathologists in this study had a good interobserver agreement when they had to measure from the surface of the lesion to the greatest depth of submucosal invasion. The interobserver agreement decreased and became moderate when they had to measure from the muscularis mucosae (whether interpreted as obliterated or intact). However, when pathologists had to interpret and use the guidelines in a “real-life” setting, the interobserver agreement became poor. The guidelines state that the depth of invasion should be measured from the muscularis mucosae, when it is intact or when it is possible to identify or estimate the muscularis mucosae, and that “when it is not possible to identify or estimate the location of the muscularis mucosae” or in case of muscularis mucosae deformity, the tumor should be measured “from the surface of the lesion.”5 This former clause of “possible to identify or estimate the location of the muscularis mucosae” may be a reason for interobserver variability in the application of guidelines, as pathologists likely apply different criteria to assess the intactness of the muscularis mucosae. The guidelines do state that the phrase “possible to identify or to estimate” means that there is no “deformity” of the muscularis mucosae as a result of submucosal invasion. 
Although judging whether there is a deformity is not always straightforward, if a desmoplastic reaction is present around the muscularis mucosae, it is assumed to be deformed.5 However, in the authors' experience, submucosal invasion is almost always associated with some deformity of the muscularis mucosae and desmoplasia, especially in the region of submucosal invasion. Measuring from the surface, although the most objective and reproducible method in our study, results in most measurements exceeding 1000 μm and would likely result in overtreatment with associated morbidity in a substantial proportion of patients. Although most papers that correlated depth of invasion with nodal metastases do state the recommended guidelines,11,18 details about intraobserver reproducibility and numbers of pathologists performing the assessments are generally lacking. Indeed, varying recommendations for the cutoff of submucosal depth of invasion for endoscopic management of pT1 colonic adenocarcinomas may, in part, reflect different standards and methods used to measure the depth itself. Another reason for discordance we found was that in some cases of mucinous adenocarcinoma, some authors measured to the deepest extent of mucin, whereas others measured to the deepest tumor cells floating in the mucin pools. However, it was agreed upon in the meeting that given that all of these are untreated tumors, it would be best to measure to the deepest extent of mucin, as mucin is part of the neoplastic process.32 Lastly, we also found that the pathologists were not uniform in selecting the focus with the deepest invasion on the same slide, adding to the discordance in measurements among pathologists.
In this study, we did not attempt to separate pedunculated from sessile lesions when measuring the depth of invasion, as this information is frequently missing from endoscopic reports. Furthermore, current guidelines, including the JSCCR guidelines,5,17 state that depth of submucosal invasion should be measured from the lower border of the muscularis mucosae when it can be identified, regardless of the “macroscopic type” or “polyp configuration.” Only in pedunculated lesions with “tangled muscularis mucosae” do they advocate giving the depth of invasion from the Haggitt line (defined as the boundary between the tumor head and the stalk).5,10,18,19 All the authors in this paper used these recommended guidelines when measuring the depth of invasion in this study. In our study, there were occasional cases that were diagnosed as pT1 CRC by most (≥4) study pathologists, including the contributing pathologist, but as intramucosal adenocarcinoma by others, in keeping with prior reports of interobserver variability regarding the presence of submucosal invasion in challenging cases.33,34
Although many studies correlate the depth of invasion with nodal metastases and clinical outcomes for pT1 colonic adenocarcinomas, few have examined agreement among pathologists when measuring the depth of invasion. The interobserver agreement has ranged from poor to good in prior reports.33,35,36 These studies either used Haggitt levels of invasion and not the actual depth in micrometers,33 or they stratified the polyps into low risk or high risk based on a cutoff of 2000 μm instead of 1000 μm.35,36 In one study, the mean depth of invasion among 4 reviewers ranged from 4.1 to 4.6 mm, presumably reflecting a cohort of more advanced pT1 cancers. When the authors tried to dichotomize these cases into Ueno high risk on the depth of invasion (ie, depth of 2 mm or greater), the agreement was moderate (κ = 0.49).35 No prior studies have examined the reasons for discordance or attempted to reconcile differences of opinions among participating pathologists, nor have they provided photomicrographs to demonstrate how measurements were performed. A strength of our study was that we included 2 rounds of measurements, 1 before and 1 after the meeting. Additionally, 2 separate measurements were recorded in round 2 as an attempt to assess the reason for discordance among pathologists, with representative photomicrographs highlighting the possible reasons for discordance. A limitation of our study was that the specimens were all collected, grossed, and processed at different institutions; however, we believe that this workflow is similar to what many centers use when patients transfer care among institutions. By circulating glass slides rather than digital slides or photomicrographs, we sought to simulate current standard pathology practice. 
We did not use ancillary studies, such as smooth muscle immunostains, as their use has not been mentioned or advocated in the current guidelines.4,5 This study is focused on the reproducibility of the depth of invasion, as this histopathologic parameter has most recently been added to the risk assessment of pT1 colonic adenocarcinomas. The reproducibility (or lack thereof) of other high-risk features has been studied elsewhere.37–39 Of note, of a total of 52 cases, only 1 case in round 1 scoring and 3 cases in round 2 best-assessment scoring were scored as less than 1000 μm by all 6 reviewers. Hence, the study is underpowered to specifically comment upon the reproducibility of measurement for exclusively small cancers (ie, <1000 μm).
Our findings indicate poor interobserver agreement among pathologists with subspecialty interest in GI pathology when measuring the depth of invasion in colonic endoscopic specimens with submucosally invasive adenocarcinoma. Clearly, more concrete and reproducible guidelines are needed before this parameter is widely applied to make important management decisions. Herein, we have defined the histologic features that most significantly contribute to poor reproducibility, namely, ambiguity in histologic landmarks that define the more superficial reference point for measurement. Future studies may focus on identifying a single, reproducible point of reference and correlating outcomes with depth from this point. Given the limitations of the current system, pathologists should consider commenting on the subjective nature of these measurements in challenging cases, particularly for cases without other high-risk features and wherein use of different reference points (surface versus muscularis mucosae) would lead to measurements that fall on opposite sides of the 1000-μm threshold.
Author notes
The authors have no relevant financial interest in the products or companies described in this article.
Portions of this study were presented as a poster at the United States and Canadian Academy of Pathology 2021 annual virtual meeting.