In Reply.—We appreciate Dr Qu taking the time to reply to our letter.1 We agree with many, if not most, of the author's points. In particular, we strongly agree with the statement that “given the importance of the issue and the broad impact of its solution, judicious caution, careful design, and stringent validation must be exercised before reliable, meaningful, and, above all, correct conclusions can be drawn.”
The topic of data reporting and data extraction in pathology reports is important and has a significant impact on the daily practice of pathologists, clinicians, and the care of their patients. As such, the topic deserves far more study than it has received, and we remain puzzled that so many pathologists and pathology organizations seem to regard it as not worth exploring. We have known for more than a decade that using a checklist can lead to a more complete surgical pathology report,2–4 but there are many ways to incorporate checklists into surgical pathology. As such, it is unfortunate that it is difficult to find accurate or validated data to support the decision by the College of American Pathologists (CAP) to offer pathologists only 2 specific methods to implement synoptic reporting (manual or the electronic Cancer Checklist [eCC]). This is particularly problematic, since one of these methods (manual) has been repeatedly shown to be unreliable in a busy practice setting,5–7 and the other is expensive, difficult to customize, and uses formatting features that have been repeatedly shown to be inferior for patient care.6 Designing their own Web site for synoptic reporting is likely beyond the resources of most pathologists.8 In the absence of any useful guidance in this area from the CAP, the current letter1 is part of a series of publications5,6,8–16 in which we are trying to build a body of data from which validated and accurate solutions can be designed. Having said that, we do believe that there are both specific and broader points in this reply that need to be addressed.
Specifically, there are many ways to extract data from narrative reports using natural language parsing. Two of the most important decisions to be made before embarking on such a task are what is going to be extracted and how structured the narrative report should be. Dr Qu assumes we were trying to extract every possible feature from an entirely unstructured report. This was never our intention. We were very specific in our letter about what data we were going to try to extract and how the narrative text was formatted. Without a doubt our narrative reports have a structure, which our clinicians strongly prefer. Many, if not most, surgical pathology reports have a structure. But this structure is very different from the structure that the CAP demands for synoptic reporting, and the data show that, unlike adding synoptic reports to surgical pathology reports, adding this structure does not increase amendment rates.5,7 While it is certainly possible that in the future there will be natural language parsers that can easily extract all the data Dr Qu suggests from entirely unstructured reports, such technology does not currently exist, nor is it necessary. The point of the letter was that pathologists have a choice about extracting data from prostate biopsy reports. While they certainly can choose the additional work of adding a separate synoptic report to all their biopsies and increasing their amendment rate, they can also achieve the same goal by simply applying a limited set of structural rules to their current reports, which many, if not most, pathologists have already done anyway. In particular, many laboratories have already added structure to denote how many biopsies are in a specimen. “Uniform tissue sampling mode” is not required. If one wishes to extract diagnoses that do not require Gleason grading (including other tumors and suspicious but not diagnostic cases), one simply has to choose a different set of rules to extract these diagnoses.
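To make the idea of "a limited set of structural rules" concrete, the sketch below shows one hypothetical way such rules could be encoded as simple pattern matches. It is purely illustrative and is not the authors' actual rule set or software: the part-letter convention, regular expressions, and field names are all assumptions chosen for the example.

```python
import re

# Hypothetical rule set for a lightly structured narrative prostate biopsy
# report: each specimen part is prefixed with a letter ("A.", "B.", ...),
# and diagnosis lines follow the site line. Not the authors' implementation.
PART = re.compile(r"^([A-Z])\.\s+(.*)$")
GLEASON = re.compile(r"Gleason (?:score )?(\d)\s*\+\s*(\d)", re.IGNORECASE)
BENIGN = re.compile(r"\bbenign\b", re.IGNORECASE)

def extract(report_text):
    """Return one record per lettered part, with any Gleason score found."""
    records = []
    current = None
    for line in report_text.splitlines():
        m = PART.match(line.strip())
        if m:
            current = {"part": m.group(1), "site": m.group(2),
                       "gleason": None, "benign": False}
            records.append(current)
        elif current is not None:
            g = GLEASON.search(line)
            if g:
                current["gleason"] = (int(g.group(1)), int(g.group(2)))
            if BENIGN.search(line):
                current["benign"] = True
    return records

report = """A. Prostate, right base, biopsy:
   Adenocarcinoma, Gleason score 3+4=7.
B. Prostate, left apex, biopsy:
   Benign prostatic tissue."""

for record in extract(report):
    print(record)
```

Extracting a different class of diagnoses (eg, suspicious but not diagnostic cases) would, as the letter notes, simply mean swapping in a different set of patterns; the per-part scaffolding stays the same.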
Negation is a problem that natural language parsing has already addressed and is well suited to handle. Nevertheless, we strongly agree with the author that regardless of which data extraction method one chooses, the reliability of the data extracted should be checked by using a routine auditing mechanism, such as the one we used in this study. Indeed, in our hospital system, there were far more errors in extracted data (and in the actual surgical pathology report) from the hospitals using the eCC from CAP (related to difficulties in implementing the eCC in the hospital laboratory information system) than from the hospitals using natural language parsing on a Web-based tool (E.W.G., unpublished data, October 2019).
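The standard approach to negation in clinical text is trigger-phrase detection in the style of the NegEx algorithm (Chapman et al). The fragment below is a deliberately simplified sketch of that principle, not the parser used in our study; real implementations use much larger trigger sets and explicit scope-termination rules.

```python
# Simplified NegEx-style negation check: a concept is treated as negated
# when a negation trigger phrase appears shortly before it. Illustrative
# only; the trigger list and 40-character window are arbitrary assumptions.
NEG_TRIGGERS = ["no evidence of", "negative for", "without", "free of", "no "]

def is_negated(sentence, concept):
    """True if `concept` appears within a short window after a negation trigger."""
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return False  # concept not mentioned at all
    window = s[max(0, idx - 40):idx]  # look back ~40 characters for a trigger
    return any(trigger in window for trigger in NEG_TRIGGERS)

print(is_negated("No evidence of carcinoma in twelve cores.", "carcinoma"))  # True
print(is_negated("Adenocarcinoma, Gleason score 3+3=6.", "carcinoma"))       # False
```

Even with such a simple check, the point stands that negation handling is a solved design problem; the harder and more important step, as noted above, is auditing the extracted output against the source reports.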
More broadly, our goal is, first and foremost, to produce the best reports for our clinicians and patients and only once that is achieved, to try to extract data from the report for other uses. We strongly agree with Dr Qu that the best way to achieve our goals is by studying the issue and collecting data that can be analyzed and used to create methods of report generation whose accuracy and reliability can be measured and validated. In our opinion, this is a topic where we should all be on the same page as we work to do what is best for pathologists, patients, and clinicians. There remain significant opportunities for improvement in the formatting of surgical pathology reports to achieve these goals. We hope our fellow pathologists will help us find these opportunities.
Author notes
The authors have no relevant financial interest in the products or companies described in this article.