To the Editor.—In the report “Do Synoptic Reports Add Value in Prostate Needle Biopsies?” Drs Renshaw and Gould1  described a software parsing method to extract diagnostic data from the pathology report of 100 prostatic core biopsies and concluded that “prostate needle biopsy reports are easily amenable to structured data extraction without the need for a separate synoptic report.”

Their conclusion, however, rests almost exclusively on 2 conditions: (1) a uniform tissue sampling mode (ie, one core per biopsy) and (2) a simple dichotomy of diagnosis (ie, prostatic adenocarcinoma versus other). First, the tissue sampling mode depends largely on the urologist who obtains the biopsy, which in turn often depends on clinical needs. Pathologists do not control this variable. Second, in addition to prostatic adenocarcinoma, other malignancies such as small cell (neuroendocrine) carcinoma, lymphoma, and urothelial carcinoma (each accounting for <1% of prostatic malignancy) also occur in the prostate2,3 but are not graded with the Gleason grade. How does the reported method parse out cases with these tumors when using “Gleason” as the sole identifier? Moreover, pathology reports frequently reflect the continuum of our diagnostic interpretation, not a presumed ideal dichotomy of pathobiology. Statements such as “Findings suspicious for but not diagnostic of (adenocarcinoma)” or “A minute focus of prostatic adenocarcinoma, volume insufficient for grading” do exist but are unlikely to be captured by the described method. More importantly, is a data set without these common elements representative of actual pathology reports?
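To make this concern concrete, the following is a minimal sketch under stated assumptions, not the method of Drs Renshaw and Gould, whose implementation is not reproduced here. It assumes a hypothetical extractor, `flags_cancer_by_gleason`, that treats the word “Gleason” as the sole indicator of malignancy. The report lines are illustrative; two are the phrasings quoted above and the others are invented for this sketch.

```python
import re


def flags_cancer_by_gleason(diagnosis: str) -> bool:
    """Hypothetical extractor: report is called carcinoma only if it mentions 'Gleason'."""
    return re.search(r"\bgleason\b", diagnosis, flags=re.IGNORECASE) is not None


# Illustrative diagnosis lines (assumed wording, not taken from any real data set).
reports = [
    "Prostatic adenocarcinoma, Gleason score 3 + 4 = 7",
    "Small cell (neuroendocrine) carcinoma",
    "A minute focus of prostatic adenocarcinoma, volume insufficient for grading",
    "Findings suspicious for but not diagnostic of adenocarcinoma",
]

# Lines that a Gleason-only rule would fail to flag.
missed = [r for r in reports if not flags_cancer_by_gleason(r)]

for line in missed:
    print("not flagged:", line)
```

Under these assumptions, only the first line is flagged; the ungraded malignancy, the ungradable focus, and the equivocal finding all fall through.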

Far more challenging, but essential to such a data-extraction project, is handling linguistic variability and syntactic variation, especially in negation statements.4,5 How does their method distinguish “Perineural invasion is present” from “No perineural invasion is identified” or “Negative for perineural invasion”? Although they reported 100% accuracy for their parsing results, the results should be treated with considerable caution unless a sound validation schema is applied. A rudimentary human-machine cross-validation of a very small cohort (100 biopsies in this case) may be acceptable for testing programming skill, but it is not adequate even for a simple proof of concept, let alone for testing the architecture design, programming algorithm, validation strategy, and results.
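The negation problem can be illustrated with a short sketch, offered only as an assumption-laden example and not as the authors' method. The function names and the cue list are hypothetical. The first function matches the finding by keyword alone; the second adds a deliberately crude rule that treats the finding as absent if any negation cue appears in the same sentence.

```python
import re

# Assumed, deliberately short list of negation cues; real reports use many more.
NEGATION_CUES = re.compile(r"\b(no|not|negative for|without|absent)\b", re.IGNORECASE)
FINDING = re.compile(r"perineural invasion", re.IGNORECASE)


def naive_has_finding(sentence: str) -> bool:
    """Keyword match only: ignores negation entirely."""
    return FINDING.search(sentence) is not None


def negation_aware_has_finding(sentence: str) -> bool:
    """Crude rule: the finding counts as present only if no negation cue
    occurs anywhere in the same sentence."""
    if FINDING.search(sentence) is None:
        return False
    return NEGATION_CUES.search(sentence) is None


sentences = [
    "Perineural invasion is present",
    "No perineural invasion is identified",
    "Negative for perineural invasion",
]

for s in sentences:
    print(s, "|", naive_has_finding(s), "|", negation_aware_has_finding(s))
```

The keyword-only check reports the finding as present in all three sentences, whereas the cue-based check separates the first from the other two. Even this rule would misfire on a sentence that negates a different finding, which is why validation on a large and varied set of reports matters.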

With these points still to be clarified, the key question becomes: Can the authors' conclusion hold if their method requires an impractical uniformity of tissue sampling pattern, can identify only one type of tumor in the reports, cannot handle syntactic variation, and is not based on a representative data set?

I do believe that the issue addressed by the authors is very important. It would be valuable to find a pragmatic solution that is also applicable to reporting core biopsies of other organs and sites; otherwise, the value of such a project would be extremely limited. In other words, the value of such a strategy or method is directly proportional to its applicability to other data-mining scenarios. Given the importance of the issue and the broad impact of its solution, judicious caution, careful design, and stringent validation must be exercised before reliable, meaningful, and, above all, correct conclusions can be drawn.

The author thanks Mark A. Micale, PhD, for his careful proofreading of and constructive input to this commentary.

1. Renshaw AA, Gould EW. Do synoptic reports add value in prostate needle biopsies? Arch Pathol Lab Med. 2019;143(8):910–911.
2. Nadal R, Schweizer M, Kryvenko ON, Epstein JI, Eisenberger MA. Small cell carcinoma of the prostate. Nat Rev Urol. 2014;11(4):213–219.
3. Bostwick DG, Iczkowski KA, Amin MB, Discigil G, Osborne B. Malignant lymphoma involving the prostate: report of 62 cases. Cancer. 1998;83(4):732–738.
4. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. Evaluation of negation phrases in narrative clinical reports. Proc AMIA Symp. 2001:105–109.
5. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18(5):544–551.

Author notes

The author has no relevant financial interest in the products or companies described in this article.