The goal of the lymphocytosis diagnosis approach is its classification into benign or neoplastic categories. Nevertheless, a nonnegligible percentage of laboratories fail in that classification.
To design and develop a machine learning model by using objective data from the DxH 800 analyzer, including cell population data, leukocyte and absolute lymphoid counts, hemoglobin concentration, and platelet counts, besides age and sex, with classification purposes for lymphocytosis diagnosis.
A total of 1565 samples were included from 10 different lymphoid categories grouped into 4 diagnostic categories: normal controls (458), benign causes of lymphocytosis (567), neoplastic lymphocytosis (399), and spurious causes of lymphocytosis (141). The data set was distributed in a 60-20-20 scheme for training, testing, and validation stages. Six machine learning models were built and compared, and the selection of the final model was based on the minimum generalization error and 10-fold cross validation accuracy.
The selected neural network classifier rendered a global 10-class classification validation accuracy corresponding to 89.9%, which, considering the aforementioned 4 diagnostic categories, presented a diagnostic impact accuracy corresponding to 95.8%. Finally, a prospective proof of concept was performed with 100 new cases with a global diagnostic accuracy corresponding to 91%.
The proposed machine learning model was feasible, with a high benefit-cost ratio, as the results were obtained within the complete blood count with differential. Finally, the diagnostic impact with high accuracies in both model validation and proof of concept encourages exploration of the model for real-world application on a daily basis.
The authors have no relevant financial interest in the products or companies described in this article.