To the Editor.—Most non–small cell lung cancer (NSCLC) patients are diagnosed at advanced stages. The 5-year survival rate of patients with advanced lung cancer is less than 20%, which makes lung cancer the leading cause of cancer-related deaths worldwide. Accurately predicting the prognosis of these patients is therefore extremely important in clinical practice, and identifying a reliable predictor of disease severity remains a challenge for laboratories. In a recent issue of the Archives of Pathology & Laboratory Medicine, Sun et al1 investigated the association between circulating tumor cells, circulating tumor-derived endothelial cells and their subtypes, and the prognosis of NSCLC patients. Based on univariate and multivariate analyses, they found that small-size circulating tumor cells are a reliable prognostic indicator and a probable predictor of disease severity in NSCLC patients. In this letter, however, we raise some statistical concerns about this study that may affect its results or conclusions.

According to 2 previous studies, a basic statistical rule for predictive logistic regression analysis is that at least 10 outcome events are required for each variable.2,3 In the study by Sun et al,1 Table 3 analyzed 12 variables in the univariate analysis of survival in all patients, which would require 120 deaths. However, there were only 3 deaths in the study by Sun et al.1 With such a large gap between the 3 observed deaths and the 120 required, the results cannot be considered statistically reliable, because these predictive logistic regression models were severely overfitted.
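The arithmetic behind this concern can be illustrated with a minimal sketch. It uses only the figures quoted above (12 variables, 3 deaths, and a threshold of 10 events per variable); the function names `required_events` and `events_per_variable` are our own hypothetical helpers and are not taken from Sun et al or from the cited guidance.

```python
def required_events(n_variables: int, events_per_variable_rule: int = 10) -> int:
    """Minimum number of outcome events implied by an events-per-variable rule."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_variables * events_per_variable_rule


def events_per_variable(n_events: int, n_variables: int) -> float:
    """Observed number of outcome events available per candidate variable."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_events / n_variables


# Figures quoted in this letter (assumed here to be the relevant inputs).
n_variables = 12  # variables analyzed in Table 3
n_deaths = 3      # deaths observed in the cohort

needed = required_events(n_variables)
observed_epv = events_per_variable(n_deaths, n_variables)

print(f"Deaths required under the 10-events-per-variable rule: {needed}")
print(f"Deaths observed: {n_deaths}")
print(f"Observed events per variable: {observed_epv:.2f}")
```

The rule of 10 events per variable is a rough guide rather than a formal sample size calculation, so this sketch only shows the size of the shortfall and does not replace a dedicated calculation.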

Moreover, we were curious about the criteria used to select the variables for the predictive logistic regression analyses in the cohort study by Sun et al.1 Selecting variables at random leads to inaccurate results. Potential predictors are usually identified by comparing the deceased group with the surviving group, and these potential predictors are then entered into the subsequent predictive logistic regression analysis, so that more accurate results can be obtained.

In this letter, we point out a statistical pitfall that can lead to inaccurate results. Nevertheless, we congratulate Sun et al and have great respect for their outstanding work.

1. Sun Q, Li W, Yang D, Lin PP, Zhang L, Guo H. The presence of small-size circulating tumor cells predicts worse prognosis in non–small cell lung cancer patients [published online April 18, 2024]. Arch Pathol Lab Med.
2. Pavlou M, Ambler G, Seaman SR, et al. How to develop a more accurate risk prediction model when there are few events. BMJ. 2015;351:h3868.
3. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441.

Competing Interests

The authors have no relevant financial interest in the products or companies described in this article.