Context.—

The histopathologic criteria for idiopathic pulmonary fibrosis were revised in the American Thoracic Society/European Respiratory Society/Japan Respiratory Society/Latin American Thoracic Association guidelines in 2011. However, the evidence of diagnosis based on the guidelines needs further investigation.

Objective.—

To examine whether the revised histopathologic criteria for idiopathic pulmonary fibrosis improved interobserver agreement among pathologists and the predicted prognosis in patients with interstitial pneumonia.

Design.—

Twenty, consecutive, surgical lung-biopsy specimens from cases of interstitial pneumonia were examined for histologic patterns by 11 pathologists without knowledge of clinical and radiologic data. Diagnosis was based on American Thoracic Society/European Respiratory Society guidelines of 2002 and 2011. Pathologists were grouped by cluster analysis, and interobserver agreement and association with patient prognosis were compared for the diagnoses of each cluster.

Results.—

The generalized κ coefficient of diagnosis for all pathologists was 0.23. When the diagnoses were divided into 2 groups according to the 2011 guidelines, usual interstitial pneumonia (UIP)/probable UIP (the UIP group) or possible/not UIP (the non-UIP group), the κ improved to 0.37. The pathologists were subdivided into 2 clusters, 1 of which showed an association between UIP group diagnosis and patient prognosis (P < .05).

Conclusions.—

Agreement about pathologic diagnosis of interstitial pneumonia is low; however, results after division into UIP and non-UIP groups provided favorable agreement. The cluster analysis revealed that 1 of the 2 clusters provided high interobserver agreement and predicted patient prognosis.

After publication of the 2002 American Thoracic Society/European Respiratory Society (ATS/ERS) classification of idiopathic interstitial pneumonia (IIP),1 interobserver variability in pathologic diagnosis remained a problem.2–4 Some validation studies on the reproducibility of IIP diagnosis revealed significant disagreement among pathologists. In 2011, the diagnostic guidelines for idiopathic pulmonary fibrosis (IPF) were revised, and the histopathologic criteria for usual interstitial pneumonia (UIP) were established.5 The criteria set 4 levels of certainty: (1) definite, (2) probable, (3) possible, and (4) not. The new guidelines provided the opportunity to standardize the pathologic diagnosis of IIP; therefore, we assessed their reproducibility. In addition, we studied the correlation between pathologic diagnosis and disease prognosis.

MATERIALS AND METHODS

Patient Selection

We selected 20, consecutive, chronic and fibrosing, interstitial pneumonia cases that underwent video-assisted thoracoscopic surgery from a single institute between September 2008 and May 2009. Purely cellular disease was not included. Clinical data immediately before the video-assisted thoracoscopic surgery biopsy, including patient age, sex, background systemic disease, medication history, occupational exposure, type of treatment, smoking history, pulmonary function tests, and survival status were obtained. Pulmonary function test data included predicted vital capacity, predicted forced vital capacity, and predicted diffusing capacity of carbon monoxide.

Data Collection

Five North and South American, one Saudi Arabian, and 5 Japanese pulmonary pathologists reviewed whole slide images of the surgical lung specimens stained with hematoxylin-eosin (IntelliSite Ultra Fast Scanner, Philips, Amsterdam, the Netherlands). The reviewers were blinded to clinical and radiologic data. They were asked to classify the histopathologic pattern according to the 2002 ATS/ERS statement on IIP1  as follows: UIP, nonspecific interstitial pneumonia, organizing pneumonia, diffuse alveolar damage, respiratory bronchiolitis, desquamative interstitial pneumonia, and other specific diseases. The reviewers were then asked to distribute the diagnoses into 4 patterns (UIP, probable UIP, possible UIP, and not UIP) based on the histopathologic criteria in the 2011 IPF guidelines.5 

Statistical Analysis

The overall agreement and interobserver agreement between 2 pathologists were analyzed by measuring the κ coefficient. The agreement was categorized as poor, fair, moderate, good, or very good, according to κ values less than 0.20, 0.20 to 0.39, 0.40 to 0.59, 0.60 to 0.79, or more than 0.80, respectively.6  Histologic diagnoses were collapsed into 2 groups: definite/probable UIP (the UIP group) or possible/not UIP (the non-UIP group). Then, hierarchic cluster analysis, in which distance was defined by the Ward method, was performed with the 2 collapsed patterns. The clusters with κ values more than 0.6 among pathologists were considered favorable clusters and were applied to survival analysis. The association between diagnosis and survival for each observer and cluster was evaluated with Kaplan-Meier curves. P < .05 was considered significant.
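The pairwise κ and the agreement categories described above can be illustrated with a minimal Python sketch. It assumes the standard Cohen formula for 2 raters and treats each category boundary as inclusive at its lower end (for example, exactly 0.80 is read as very good), which the text does not specify. The function names and the example ratings are hypothetical, and the study itself was analyzed in JMP, not with this code.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Map a kappa value to the categories used in the text."""
    if kappa < 0.20:
        return "poor"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "good"
    return "very good"  # assumption: 0.80 itself counts as very good


# Hypothetical ratings of 6 cases by 2 pathologists (not study data).
rater_a = ["UIP", "UIP", "NSIP", "NSIP", "UIP", "NSIP"]
rater_b = ["UIP", "NSIP", "NSIP", "NSIP", "UIP", "UIP"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2), agreement_band(kappa))
```

In this invented example the raters agree on 4 of 6 cases, but half of that agreement is expected by chance, so κ falls in the fair range.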

All statistical analyses were performed with JMP Pro 11.2.0 software (SAS Institute Inc, Cary, North Carolina).
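As an illustration of the survival estimate behind the Kaplan-Meier curves, the following Python sketch computes the product-limit estimate for a single group. It is not the JMP procedure used in the study. The function name and the follow-up data are hypothetical, and a test for comparing 2 curves (the text does not name one) is deliberately left out.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient (for example, months)
    events -- 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up of 6 patients (not study data).
months = [12, 30, 30, 45, 57, 78]
died = [1, 1, 0, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(t, round(s, 3))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at months in which a death occurs.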

RESULTS

Patient Characteristics

A clinical summary of all cases is shown in Table 1. There were 13 men and 7 women, aged 31 to 75 years (median, 62 years). There were 4 current smokers, 9 ex-smokers, and 7 never smokers. Follow-up ranged from 1 to 78 months after video-assisted thoracoscopic surgery, with a median of 57 months. Seven patients died during follow-up. Patients 2, 6, 11, and 20 experienced acute exacerbation. Patients 3, 5, 6, 8, 10, 12, 19, and 20 were treated with pirfenidone, and patients 5, 8, and 20 were treated with pirfenidone alone. Patients 2, 3, 6, 7, 10 to 14, 16, 17, and 19 were treated with steroids and cyclosporine A. Patients 1, 4, 9, 15, and 18 were under observation and did not receive any treatment during follow-up (Table 1).

Table 1. 

Clinical Data of 20 Cases Analyzed in the Current Study


Two patients developed an underlying connective tissue disease: systemic sclerosis in patient 7 and rheumatoid arthritis in patient 17. Patients 6 and 13 had a suspicion of connective tissue disease, but neither progressed to a definitive connective tissue disease within the follow-up period. Patient 18 had a history of exposure to birds and was diagnosed with chronic hypersensitivity pneumonia by multidisciplinary discussion diagnosis (MDD). Three cases had a history of occupational exposure (asbestos, machine fluid, and cotton), but none was considered to have pneumoconiosis after MDD.

Patient 10 received a lung transplant 52 months after initial evaluation. Patient 11 was initially included in the study because of a histopathologic diagnosis of fibrosis. However, the patient died within 1 month after presentation, and the case was classified as acute interstitial pneumonia based on MDD. Therefore, this case was not included in further survival analyses.

Agreement of Pathologic Diagnosis

We obtained video-assisted thoracoscopic surgery biopsy samples from segments S5, S8, and S9 in 17 cases, segments S5 and S8 in 2 cases, and segment S8 in 1 case. All diagnostic data based on the 2002 ATS/ERS classification, obtained from 9 pathologists, are shown in Table 2. Some pathologists used diagnoses of organizing pneumonia, desquamative interstitial pneumonia, respiratory bronchiolitis, or diffuse alveolar damage. However, these diagnoses were uncommon, accounting for only 14 of 180 diagnoses (8%), and none was agreed upon by more than 3 pathologists. The overall κ coefficient for the 20 cases and 9 pathologists was 0.23, which was considered fair (Table 3). Among the 36 pairwise combinations of pathologists, none (0%) demonstrated very good or good agreement, 7 (19%) demonstrated moderate agreement, 13 (36%) demonstrated fair agreement, and 16 (44%) demonstrated poor agreement. All diagnostic data based on the 2011 guidelines, obtained from 11 pathologists, are shown in Table 4. For diagnostic agreement, the overall κ was 0.19 (Table 5). Among the 55 pairwise combinations of pathologists, none (0%) demonstrated very good agreement, 1 (2%) demonstrated good agreement, 4 (7%) demonstrated moderate agreement, 20 (36%) demonstrated fair agreement, and 30 (54%) demonstrated poor agreement. When diagnoses were collapsed into 2 groups (definite/probable UIP or possible/not UIP), the overall κ improved to 0.37 (Table 6). With the collapsed diagnoses, none of the 55 combinations (0%) demonstrated very good agreement, 7 (13%) demonstrated good agreement, 16 (29%) demonstrated moderate agreement, 14 (25%) demonstrated fair agreement, and 18 (33%) demonstrated poor agreement. The levels of agreement were not affected by the numbers of slides or sites. One case of chronic hypersensitivity pneumonia and 2 cases of connective tissue disease–associated interstitial lung disease were included. Interobserver agreement in those cases was high for non-UIP diagnosis: 7 not UIP and 3 possible UIP for chronic hypersensitivity pneumonia; 10 not UIP and 1 possible UIP for rheumatoid arthritis–associated interstitial lung disease; 8 not UIP and 3 possible UIP for systemic sclerosis–associated interstitial lung disease.
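The effect of collapsing the 4 guideline patterns into 2 groups can be sketched in Python as follows. The sketch assumes a Fleiss-type κ as the overall multirater coefficient; the text reports a generalized κ without giving the formula, so this choice is an assumption. The ratings are invented for illustration and are not taken from Tables 2 through 6.

```python
from collections import Counter
from math import comb

# Collapse rule stated in the text: definite/probable UIP vs possible/not UIP.
COLLAPSE = {
    "UIP": "UIP group",
    "probable UIP": "UIP group",
    "possible UIP": "non-UIP group",
    "not UIP": "non-UIP group",
}


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings is a list of cases, each a list of labels,
    with the same number of raters for every case."""
    n_cases, n_raters = len(ratings), len(ratings[0])
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number of raters")
    totals = Counter()
    agreement = 0.0
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # Proportion of agreeing rater pairs within this case.
        agreement += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement / n_cases
    p_e = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return 1.0  # every rating used the same single label
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical ratings: 4 cases, 3 pathologists (not study data).
four_level = [
    ["UIP", "probable UIP", "UIP"],
    ["not UIP", "possible UIP", "not UIP"],
    ["probable UIP", "UIP", "possible UIP"],
    ["not UIP", "not UIP", "possible UIP"],
]
two_level = [[COLLAPSE[label] for label in case] for case in four_level]

print(round(fleiss_kappa(four_level), 2), round(fleiss_kappa(two_level), 2))
print(comb(9, 2), comb(11, 2))  # number of pathologist pairs: 36 and 55
```

In this invented example most disagreements are between UIP and probable UIP or between possible UIP and not UIP, so they disappear after collapsing and κ rises. The last line shows where the 36 and 55 pairwise combinations come from.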

Table 2. 

All Pathologic Diagnoses Scored by 9 Pathologists

Table 3. 

κ Values of all Combinations Among 9 Pulmonary Pathologists Based on 2002 American Thoracic Society/European Respiratory Society Classification Indicating Fair Interobserver Agreement

Table 4. 

All Guideline Diagnoses Based on 2011 Idiopathic Pulmonary Fibrosis Guidelines Scored by 11 Pathologists

Table 5. 

κ Values of All Combinations Among 11 Pulmonary Pathologists Based on 2011 Idiopathic Pulmonary Fibrosis Guidelines Indicating Poor Interobserver Agreement

Table 6. 

κ Values of All Combinations Among 11 Pulmonary Pathologists Based on the Collapsed Diagnosis Indicating Fair Interobserver Agreement


Cluster Analysis and Association With Disease Progression

Cluster analysis is a common statistical method to identify groups of individuals or objects that are similar to each other but different from individuals in other groups and is useful for determining the groups that are meaningful. We applied hierarchic, unsupervised cluster analysis for our research to determine meaningful diagnoses, using the collapsed diagnoses of the 2011 IPF guidelines. The diagnoses given by 11 pathologists were applied. The dendrogram showed 2 clusters (Figure 1). Cluster 1 showed high interobserver agreement (κ = 0.65); however, the agreement inside cluster 2 was low (κ = 0.25). The Kaplan-Meier curves created by consensus diagnosis of cluster 1 showed significant survival differences between UIP and non-UIP groups (Figure 2; P < .01). The diagnoses given by 5 of the 6 pathologists (83%) showed similar significant survival differences between UIP and non-UIP cases. The remaining pathologist (17%) showed a similar trend, but it did not reach statistical significance (P = .06). In contrast, the diagnoses given by only 1 of 5 pathologists (20%) showed a survival difference between UIP and non-UIP cases in cluster 2 (P < .01). Kaplan-Meier curves for all pathologists are shown in Figure 3, A through F (pathologists A through D, J, and K), and Figure 4, A through E (pathologists E through I).
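The grouping of pathologists by their collapsed diagnoses can be illustrated with a small Python sketch of agglomerative clustering under Ward's criterion. It assumes each pathologist is represented by a binary vector (1 = UIP group, 0 = non-UIP group, 1 entry per case) and that merges minimize the increase in within-cluster sum of squares; the exact settings used in JMP are not stated in the text. The rater labels P1 through P5 and their diagnoses are hypothetical.

```python
def ward_clusters(vectors, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    vectors    -- dict mapping a rater name to a list of numbers
    n_clusters -- number of clusters at which merging stops
    Returns a sorted list of sorted member-name lists.
    """
    # Each cluster is (member names, centroid).
    clusters = [([name], list(vec)) for name, vec in vectors.items()]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                sq_dist = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Increase in within-cluster sum of squares if i and j merge.
                cost = len(mi) * len(mj) / (len(mi) + len(mj)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        members = mi + mj
        centroid = [
            (len(mi) * a + len(mj) * b) / len(members) for a, b in zip(ci, cj)
        ]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, centroid))
    return sorted(sorted(m) for m, _ in clusters)


# Hypothetical collapsed diagnoses for 6 cases (not study data).
diagnoses = {
    "P1": [1, 1, 0, 0, 1, 0],
    "P2": [1, 1, 0, 0, 1, 1],
    "P3": [1, 1, 0, 0, 0, 0],
    "P4": [0, 0, 1, 1, 0, 1],
    "P5": [0, 1, 1, 1, 0, 1],
}
print(ward_clusters(diagnoses, 2))
```

Raters who make similar calls across the cases end up in the same cluster, after which agreement within each cluster can be measured separately, as was done for clusters 1 and 2.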

Figure 1. 

Cluster dendrogram of 2 collapsed diagnoses. Pathologists are arrayed at the left side. They are divided into clusters 1 and 2.

Figure 2. Survival in the usual interstitial pneumonia (UIP) and non-UIP groups of patients using Kaplan-Meier curves. The mortality rate during follow-up was significantly higher in the UIP group of cluster 1 diagnoses.


Figure 3. 

A through F, Kaplan-Meier curves for each pathologist in cluster 1. Survival in the usual interstitial pneumonia (UIP) group was significantly worse than it was in the non-UIP group diagnosed by pathologists A, B, C, K, and J (P < .01 for all).


Figure 4. 

A through E, Kaplan-Meier curves for each pathologist in cluster 2. Only the curve of pathologist H shows a survival difference between the usual interstitial pneumonia (UIP) and non-UIP groups (P < .01).


DISCUSSION

The results of this study show that agreement among pathologists in the diagnosis of IIP may be low, and that the 2011 IPF guidelines in their current form failed to increase the level of agreement. The most frequent reasons for this disagreement were inconsistent judgments of airway-centered changes and of cellular interstitial pneumonia away from honeycomb areas. Case 4 was classified as UIP by 6 pathologists (55%), possible UIP by 1 pathologist (9%), and not UIP by 4 pathologists (36%). Not UIP was diagnosed by 4 pathologists because of a predominant airway-centered change (Figure 5, A through D). The original MDD diagnosis of case 4 was IPF; however, the clinical course was stable during a 78-month follow-up. Considering this favorable clinical behavior, the diagnosis of not UIP may be reasonable. Airway-centered fibrosis involves the peripheral zones of the lobules as it progresses, and judging the disease distribution inside the lobule is often difficult when the fibrosis is extensive. Interobserver agreement regarding disease distribution may therefore be low, and it needs to be investigated before disease distribution is applied in practice as a criterion. Case 20 had low diagnostic agreement because of inconsistent judgments about an alternative diagnosis of nonspecific interstitial pneumonia; the case was diagnosed as UIP by 2 pathologists (18%), probable UIP by 3 pathologists (27%), possible UIP by 2 pathologists (18%), and not UIP by 4 pathologists (36%) (Figure 6, A through D). The patient died from acute exacerbation of the disease 67 months into follow-up. This unfavorable clinical outcome suggests that the judgment of UIP was appropriate. From the viewpoint of standardization, the criteria of the 2011 IPF guidelines in their present form may have limitations, and additional reproducible histologic criteria and/or biomarkers may be needed.

Figure 5

Case with low interobserver agreement. This case was classified as usual interstitial pneumonia (UIP) by 6 pathologists (55%) and as non-UIP by 5 pathologists (45%). The major reason for making a diagnosis of non-UIP was a judgment of predominantly airway-centered change. A, Low magnification shows patchy, dense fibrosis. B and C, There are areas showing a mixture of peripheral, accentuated fibrosis (arrowheads) and airway-centered fibrosis (arrows). D, Some areas show hyalinized fibrosis of uncertain distribution (hematoxylin-eosin, original magnifications ×5 [A], ×20 [B and C], and ×50 [D]).


Figure 6

Another case with low interobserver agreement. This case was classified as usual interstitial pneumonia (UIP) by 5 pathologists (45%) and as non-UIP by 6 pathologists (55%). The major reason for making a diagnosis of non-UIP was a judgment in favor of an alternative diagnosis of nonspecific interstitial pneumonia. A, Major histologic findings are diffuse, dense fibrosis with microscopic honeycombing. B through D, A mixture of fibrotic areas and normal-looking areas (arrows) is shown. Fibroblastic foci are inconspicuous (hematoxylin-eosin, original magnifications ×10 [A through C] and ×20 [D]).


Our results suggest that combining the diagnoses of both UIP and probable UIP into the category pathologic UIP is useful for increasing interobserver agreement and predicting patient survival. At the same time, pathologic diagnoses of both possible UIP and not UIP could be combined into not UIP, consistent with the current guidelines. In addition, we used cluster analysis to establish which clusters had high interobserver agreement and were able to predict prognostic differences between UIP and non-UIP groups. We suggest that the diagnoses given by cluster 1 were ideal for the present cases. The images of the cases along with the consensus diagnoses of cluster 1 are available at http://nagasaki-pathology.jp/ip-cases-for-agreement-study. We believe that sharing these cases with pathologists worldwide is important for standardization of pathologic diagnosis for IIPs.

The diagnostic process in this study differed from clinical practice, in which adequate clinical and radiologic data are available at the time of diagnosis. Interobserver agreement should be better in clinical practice because MDD is the gold standard for the diagnosis of interstitial pneumonia. The low reproducibility of the pathologic diagnosis may therefore be corrected by the overall multidisciplinary discussion. It may be important to repeat the same study with clinical and radiologic data.

Our study had several limitations. First, the sample size was small; therefore, further studies are required to validate our results. Second, our study did not involve general surgical pathologists who were not specialized in lung pathology. Third, we did not analyze each histologic criterion to predict patient survival. Fourth, we did not evaluate the reasons for the inconsistency between probable and possible UIP.

In conclusion, we have indicated that the current IPF guidelines in their present form are not being sufficiently implemented, but that combining UIP and probable UIP into a single diagnosis of UIP provides favorable agreement. Cluster analysis based on the collapsed diagnoses revealed a cluster of pathologists with high interobserver agreement and better prediction of poor patient prognosis. We believe that this study provides a practical method for standardizing the diagnosis of diffuse lung disease.

This study was partly funded by the study group on Diffuse Pulmonary Disorders, Scientific Research/Research on Intractable Diseases from the Ministry of Health, Labour and Welfare of Japan.

References

1. American Thoracic Society, European Respiratory Society. American Thoracic Society/European Respiratory Society international multidisciplinary consensus classification of the idiopathic interstitial pneumonias: this joint statement of the American Thoracic Society (ATS), and the European Respiratory Society (ERS) was adopted by the ATS board of directors, June 2001 and by the ERS Executive Committee, June 2001 [published correction appears in Am J Respir Crit Care Med. 2002;165(3):426]. Am J Respir Crit Care Med. 2002;165(2):277–304.
2. Flaherty KR, King TE Jr, Raghu G, et al. Idiopathic interstitial pneumonia: what is the effect of a multidisciplinary approach to diagnosis? Am J Respir Crit Care Med. 2004;170(8):904–910.
3. Nicholson AG, Addis BJ, Bharucha H, et al. Inter-observer variation between pathologists in diffuse parenchymal lung disease. Thorax. 2004;59(6):500–505.
4. Lettieri CJ, Veerappan GR, Parker JM, et al. Discordance between general and pulmonary pathologists in the diagnosis of interstitial lung disease. Respir Med. 2005;99(11):1425–1430.
5. Raghu G, Collard HR, Egan JJ, et al; ATS/ERS/JRS/ALAT Committee on Idiopathic Pulmonary Fibrosis. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. Am J Respir Crit Care Med. 2011;183(6):788–824.
6. Gwet KL. Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. 4th ed. Gaithersburg, MD: Advanced Analytics; 2014:311–341.

Author notes

The authors have no relevant financial interest in the products or companies described in this article.