Conventional karyotype analysis, which provides comprehensive cytogenetic information, plays a significant role in the diagnosis and risk stratification of hematologic neoplasms. The main limitations of this approach include long turnaround time and laboriousness. Therefore, we developed an integral R-banded karyotype analysis system for bone marrow metaphases, based on deep learning.
To evaluate the performance of the internal models and the entire karyotype analysis system for R-banded bone marrow metaphase.
A total of 4442 sets of R-banded normal bone marrow metaphases and karyograms were collected. Accordingly, 4 deep learning–based models for different analytic stages of karyotyping, including denoising, segmentation, classification, and polarity recognition, were developed and integrated as an R-banded bone marrow karyotype analysis system. Five-fold cross validation was performed on each model. The whole system was implemented by 2 strategies of automatic and semiautomatic workflows. A test set of 885 metaphases was used to assess the entire system.
The denoising model achieved an intersection-over-union (IoU) of 99.20% and a Dice similarity coefficient (DSC) of 99.58% for metaphase acquisition. The segmentation model achieved an IoU of 91.95% and a DSC of 95.79% for chromosome segmentation. The accuracies of the segmentation, classification, and polarity recognition models were 96.77%, 98.77%, and 99.93%, respectively. The whole system achieved an accuracy of 93.33% with the automatic strategy and an accuracy of 99.06% with the semiautomatic strategy.
Both the internal models and the entire system performed well. This deep learning–based karyotype analysis system has potential for clinical application.
Cytogenetic information, along with cytomorphology, immunophenotyping, and molecular genetics, is critical to the diagnosis and prognosis of acute myeloid leukemia, myelodysplastic syndrome, and other hematologic malignancies.1–5 Conventional karyotype analysis is an important cytogenetic investigation that is commonly applied in hematology and oncology. However, the karyotyping process is laborious. Routine image processing to generate a karyogram comprises multiple steps, including denoising, segmentation, classification, polarity recognition, and karyotype interpretation. Moreover, the accuracy and quality of the karyotyping results rely highly on skilled and experienced cytogeneticists.
With artificial intelligence (AI), especially deep neural networks (DNNs), showing great promise in computer-aided medical image processing and the standardization of image interpretation,6,7 using integral multistep DNN models may be a feasible way of improving the efficiency of karyotype analysis and realizing the standardization of clinical karyotype analysis.
Bone marrow is the ideal tissue for karyotype analysis in most hematologic malignancies. Both G- and R-banding are common banding techniques in hematologic cytogenetic investigations.8 Although G-banding is routinely used to identify normal and abnormal chromosomes in hereditary and neoplastic conditions, R-banding may have advantages because it darkly stains the ends of chromosome arms, which are frequently involved in hematologic malignancies. High-quality, large-scale data sets with annotations are indispensable in developing highly accurate and robust DNN models. Except for the few G-banded chromosome data sets9,10 and the fluorescent R-banded chromosome data set,11 metaphase or chromosome data sets derived from bone marrow have been rarely reported. A conventional R-banding data set is required for the development of a relative full-process karyotype analysis system.
Recently, several groups tried to improve the process of karyotype analysis by using DNN. Algorithmic research usually comprises 2 aspects: chromosome separation9,12–26 and chromosome classification.10,27–42 Chromosome separation is the preprocessing and generation of individual chromosomes from metaphase images or chromosome clusters (overlapping or touching chromosome clusters), including detection12,13 and segmentation.9,14–26 Chromosome classification involves the sorting of chromosomes into 24 classes containing 22 autosomes (chromosome Nos. 1–22) and the sex chromosomes (X and Y). Some researchers established classification models that can simultaneously recognize the chromosome polarity,27 while others applied models to chromosome classification in hematologic cytogenetics with large-scale data sets.11,43 At present, although several AI-assisted karyotype analysis systems are used, such as MetaSystems (Germany) and ASI (Israel) karyotyping software, conclusions based on a comprehensive evaluation of the individual parts and the entire workflow are rare, especially in the field of R-banded karyotype analysis in hematologic disease.
In this study, we first established an R-banded bone marrow metaphase data set with annotations for the different stages of the entire workflow. Subsequently, we constructed a multistep DNN-based R-banded bone marrow metaphase karyotype analysis system, including denoising, segmentation, classification, and polarity recognition. Then, the accuracy of the entire system was tested through real cases to evaluate its potential application in clinical practice.
MATERIALS AND METHODS
This study was approved by the institutional review board of the State Key Laboratory of Medical Genomics (Shanghai, China) and conducted in compliance with the Declaration of Helsinki. The images of the data set were retrospectively obtained and fully anonymized and de-identified.
Data Set Establishment
A total of 4442 R-banded bone marrow metaphases from 1435 patients with a normal karyotype between 2018 and 2021 were retrospectively collected from Ruijin Hospital, Shanghai Jiao Tong University School of Medicine (Shanghai, China). Images from different stages of karyotyping for bone marrow cell metaphases were obtained, including the original microscopic images (Supplemental Figure 1, A; see the supplemental digital content containing 2 figures and 3 tables at https://meridian.allenpress.com/aplm in the August 2024 table of contents), the corresponding denoised images (Supplemental Figure 1, B), the segmented images (Supplemental Figure 1, C), and the karyograms (Supplemental Figure 1, D). A karyogram is a diagram in which the segmented chromosomes from a metaphase are arranged according to their classes and polarities.44 The R-banded metaphases were captured at ×630 (×63 objective and ×10 ocular) by a CoolCube 1 camera (MetaSystems) attached to an AXIO IMAGER Z2 microscope (Carl Zeiss, Germany). The associated karyograms were created with the Ikaros system (MetaSystems). The original microscopic, denoised, and segmented metaphases and the karyograms were exported by the Ikaros system and reviewed by 2 experienced cytogeneticists.
On the basis of the aforementioned images, the R-banded bone marrow metaphase data set consists of 4 associated subsets, namely, the original metaphase, denoising, segmentation, and classification and polarity recognition subsets. The denoising subset comprises the original microscopic metaphase images annotated with minimum bounding rectangles of all chromosomes within a single metaphase. The segmentation subset consists of metaphase images in which the chromosomes were manually annotated with the polygonal annotation tool Labelme, a Python package (https://doi.org/10.5281/zenodo.5711226). The classification and polarity recognition subset comprises chromosome images cropped from karyograms with both classification and polarity information.
Deep Learning Models
Four DNN models were developed: the denoising, segmentation, classification, and polarity recognition models (Figure 1, A through D). To develop the denoising model, we first chose a 2-stage object detection model, Faster R-CNN (Region-based Convolutional Neural Network),45 to detect and crop the distribution region of metaphase in the originally captured microscopic image. The tiny background noises were then filtered by using morphology-based methods. The original microscopic metaphase images were taken as the input to the denoising model, and the denoised metaphase output was used for further chromosome segmentation. For the segmentation model, an instance segmentation method, Cascade Mask R-CNN,46 was exploited to obtain individual chromosomes. The instance segmentation method can not only achieve pixel-level segmentation on images but also deal with complex touching or overlapping chromosomes. The segmentation model took the denoised metaphases as input and output individual segmented chromosomes. For chromosome classification, we introduced an end-to-end combinatorial optimization-based method. This method was developed in our previous work and introduces a grouping-guided feature interaction module (GFIM) to refine the relative longitudinal information of chromosomes from the same cell and a deep assignment module to reassign the chromosomes with 4 bidirectional recurrent neural network blocks.47 During the establishment of the classification model, the image size of each chromosome was normalized within the corresponding karyotype to preserve the chromosome length information. Horizontal flipping was adopted for data augmentation in the training process of the classification model. The classification model used the segmented chromosome images as input and exported the predicted karyotype with chromosome classification information. 
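The length-preserving normalization mentioned above can be illustrated with a minimal sketch. It assumes that each chromosome crop is described only by its height and width in pixels and that all crops from one karyotype are resized by a single shared factor, so that relative lengths survive the resizing. The function name, the target height of 224 pixels, and the rounding rule are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

def normalize_within_karyotype(
    sizes: List[Tuple[int, int]], target_height: int = 224
) -> List[Tuple[int, int]]:
    """Scale all (height, width) crops of one cell by one shared factor.

    The longest chromosome is mapped to target_height (an assumed value);
    every other chromosome keeps its length relative to the longest one.
    """
    if not sizes:
        return []
    longest = max(height for height, _ in sizes)
    scale = target_height / longest
    return [
        (max(1, round(height * scale)), max(1, round(width * scale)))
        for height, width in sizes
    ]

# Hypothetical crop sizes from a single metaphase (not data from the study).
sizes = [(400, 80), (200, 60), (100, 50)]
scaled = normalize_within_karyotype(sizes)
print(scaled)
```

Resizing every crop independently to a fixed size would discard the length cue, which is why a shared per-cell factor is used in this sketch.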
The polarity recognition model used a typical CNN model for a 2-category prediction task, which uses the segmented chromosome with chromosome classification information as input and outputs the identification of the long and short arms of the chromosome. Finally, the segmented chromosomes within a metaphase were assigned on the karyogram according to their chromosome classes and oriented with short arms upward.
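The final arrangement step can be sketched as follows. This is a simplified illustration, assuming that each chromosome image is a list of pixel rows, that the class label is a string ("1" to "22", "X", or "Y"), and that the polarity prediction is encoded as a Boolean meaning "the short arm currently points downward". The encoding and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of pixel intensities, top row first

CLASS_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]

def orient_short_arm_up(image: Image, short_arm_is_down: bool) -> Image:
    """Flip the image vertically when the short arm points downward."""
    return image[::-1] if short_arm_is_down else image

def build_karyogram(
    chromosomes: List[Tuple[str, bool, Image]]
) -> Dict[str, List[Image]]:
    """Group oriented chromosome images by class, in karyogram order."""
    karyogram: Dict[str, List[Image]] = {label: [] for label in CLASS_ORDER}
    for label, short_arm_is_down, image in chromosomes:
        karyogram[label].append(orient_short_arm_up(image, short_arm_is_down))
    return karyogram

# Toy input: two tiny "images" with assumed labels and polarity predictions.
chromosomes = [("X", True, [[1], [2], [3]]), ("1", False, [[4], [5]])]
karyogram = build_karyogram(chromosomes)
print(karyogram["X"][0])
```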
All the networks were implemented with the PyTorch deep learning library.48 The networks were trained on NVIDIA Tesla A100 graphics processing units. For chromosome classification and polarity recognition model training, we used cross entropy loss with an initial learning rate of 0.0001. The chromosome classification model was trained for 40 000 iterations, and the polarity recognition model was trained for 20 epochs. The denoising model and the segmentation model were implemented by using the MMDetection toolbox49; both models were trained for 15 epochs.
Evaluation Metrics
The performance of the denoising and segmentation models was evaluated in terms of the intersection-over-union (IoU) and the Dice similarity coefficient (DSC), and the performance of the classification and polarity recognition models was evaluated in terms of accuracy and the F1 score. Moreover, precision, recall, and the F1 score were used to precisely assess the classification result for each chromosome. For the integral system, the performance of both automatic and semiautomatic strategies was evaluated in terms of the classification accuracy for each chromosome.
The precision, recall, accuracy, and F1 score values range from 0 to 1, where 1 indicates the best performance.
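For reference, the conventional definitions of these metrics are given below. They are the standard formulas and are assumed, rather than confirmed by the text, to be the ones used in this study. Here $A$ denotes the predicted region and $B$ the annotated region, and $TP$, $FP$, $FN$, and $TN$ denote the numbers of true-positive, false-positive, false-negative, and true-negative predictions for a given class.

```latex
\begin{align}
\mathrm{IoU} &= \frac{|A \cap B|}{|A \cup B|}, &
\mathrm{DSC} &= \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}}, \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, &
\mathrm{Recall} &= \frac{TP}{TP + FN}, \\
\mathrm{F1} &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, &
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{align}
```

The identity between the DSC and the IoU holds for a single pair of regions; values averaged over many images satisfy it only approximately. For a multiclass task, accuracy is usually computed as the number of correctly classified chromosomes divided by the total number of chromosomes.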
The mean value and SD of the abovementioned metrics are provided to evaluate the stability of the system and were calculated from the 5-fold cross validation results.
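As a small illustration of this summary step, the sketch below computes the mean and SD of one metric over 5 folds. The per-fold values are hypothetical, and the use of the sample SD (denominator n − 1) is an assumption, because the text does not state which form was used.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

def summarize_folds(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample SD) of one metric over cross-validation folds."""
    return mean(values), stdev(values)

# Hypothetical per-fold accuracies in percent (not data from the study).
fold_accuracy = [98.0, 99.0, 98.5, 98.5, 98.5]
fold_mean, fold_sd = summarize_folds(fold_accuracy)
print(f"{fold_mean:.2f} ± {fold_sd:.2f}")
```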
Data Visualization
Uniform manifold approximation and projection (UMAP)50 was applied to visualize the extracted features of the classification model.
RESULTS
Structure of the Integral R-Banded Bone Marrow Karyotyping Analysis System
To establish the full-process R-banded bone marrow karyotype analysis system, 4 internal AI models were developed for the various stages of the karyotyping workflow on the basis of DNNs. These 4 internal AI models include (1) a denoising model to find and locate the metaphases within the original captured microscopic images, (2) a segmentation model to separate individual chromosomes, (3) a classification model for the 24 classes (1–22, X, Y) of chromosomes, and (4) a polarity recognition model to adjust the position of the chromosome arms in the proposed karyogram. Following the karyotype analysis workflow, each model processed the input and fed the results into the next model (Figure 1, A through D). Meanwhile, the models also output relevant processed plots for the cytogeneticist’s reference.
Performance of Internal Models
To evaluate the performance of the internal models of the karyotype analysis system, the image data sets were randomly divided into 5 subsets. The original metaphase, denoising, segmentation, and classification and polarity recognition subsets were all divided according to the original metaphase image distribution. Each model was trained and evaluated through 5-fold cross validation. Each time, 4 folds were used for training and validation. The remaining fold was used for testing (Supplemental Table 1). The denoised, segmented, classified, and polarity-oriented results produced by the proposed models are displayed in Figure 2, A through D.
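A fold assignment keyed by the original metaphase can be sketched as follows, so that every derived item (denoised image, segmentation annotation, chromosome crop) inherits the fold of its source metaphase. This is a minimal illustration; the identifier format, the random seed, and the function names are hypothetical, and the actual fold sizes are those reported in Supplemental Table 1.

```python
import random
from typing import List, Tuple

def make_folds(metaphase_ids: List[str], k: int = 5, seed: int = 0) -> List[List[str]]:
    """Shuffle the metaphase identifiers and deal them into k folds."""
    ids = sorted(metaphase_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]

def split_for_fold(folds: List[List[str]], test_index: int) -> Tuple[List[str], List[str]]:
    """Use one fold for testing and the remaining folds for training and validation."""
    test = folds[test_index]
    train_val = [m for i, fold in enumerate(folds) if i != test_index for m in fold]
    return train_val, test

# Hypothetical identifiers for the 4442 metaphases.
metaphase_ids = [f"metaphase_{i:04d}" for i in range(4442)]
folds = make_folds(metaphase_ids)
train_val, test = split_for_fold(folds, test_index=0)
print(len(train_val), len(test))
```

Splitting by metaphase rather than by individual chromosome keeps chromosomes from the same cell out of both the training and the test data of one fold.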
Denoising Model
The denoising model, which was the first model in the proposed system, aimed to identify and isolate the metaphase within the original microscopic images while filtering out debris. Herein, the original microscopic metaphase was processed by the denoising model, and the denoised metaphase is the output (Figure 2, A). The mean IoU of the denoising model was 99.20%, and the mean DSC was 99.58% (Table 1), indicating that most of the intact metaphases were denoised without losing any chromosomes.
Segmentation Model
In the segmentation step, the segmentation model based on DNN was trained to segment individual chromosomes. Given that chromosomes were randomly scattered on the slide, sometimes they touched or overlapped into chromosome clusters, requiring the segmentation of either the individual chromosomes or the chromosome clusters for further classification. The proposed segmentation model accomplished the segmentation for individual chromosomes on the denoised metaphases (Figure 2, B). The mean IoU and DSC of the segmentation model were 91.95% and 95.79%, respectively (Table 1). The mean accuracy of the segmentation model was 96.77% (Table 1).
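The relation between the overlap metrics and the segmentation accuracy can be illustrated with a short sketch. It assumes that each mask is a set of pixel coordinates, that every predicted chromosome has already been matched to one annotated chromosome, and that a chromosome counts as correctly segmented when its IoU reaches the 70% threshold stated in the Discussion. The matching step, the use of "greater than or equal to", and the function names are assumptions.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]

def iou(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union of two pixel sets."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0

def dsc(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient of two pixel sets."""
    total = len(pred) + len(truth)
    return 2 * len(pred & truth) / total if total else 1.0

def accuracy_at_threshold(pairs: List[Tuple[Mask, Mask]], threshold: float = 0.70) -> float:
    """Fraction of matched chromosomes whose IoU reaches the threshold."""
    if not pairs:
        return 0.0
    hits = sum(1 for pred, truth in pairs if iou(pred, truth) >= threshold)
    return hits / len(pairs)

# Toy masks: a 10 x 4 annotated chromosome and two hypothetical predictions.
truth = {(r, c) for r in range(10) for c in range(4)}
pred_good = {(r, c) for r in range(1, 10) for c in range(4)}  # misses one row
pred_cut = {(r, c) for r in range(5) for c in range(4)}       # only half found
print(iou(pred_good, truth), dsc(pred_good, truth))
print(accuracy_at_threshold([(pred_good, truth), (pred_cut, truth)]))
```

In this toy case the half-length prediction has an IoU of 0.5 and is counted as a segmentation error, which mirrors the situation in which a chromosome is cut into 2 parts.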
Classification Model
After segmentation, the individual chromosomes were assigned into 24 classes (1–22, X, and Y) by the classification model (Figure 2, C). The F1 score and the accuracy rate were 98.58% and 98.77% (Table 1), respectively. The evaluation metrics of each class of chromosomes are listed in Table 2. The mean recall, precision, and F1 score for all the chromosome classes were 98.68%, 98.67%, and 98.67%, respectively. The evaluation values of most of the classes of chromosomes reached more than 98%, while the Y chromosome had the lowest precision, recall, and F1 score (95.96%, 95.46%, and 95.70%, respectively).
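Per-class values of this kind can be derived from a confusion matrix, as in the sketch below. It assumes the convention that rows are true classes and columns are predicted classes. The 3-class matrix is a toy example and does not reproduce Table 2.

```python
from typing import Dict, List, Tuple

def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, F1)}.

    confusion[i][j] is the number of chromosomes of true class i
    that were predicted as class j.
    """
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp   # predicted k, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

# Toy confusion matrix (hypothetical counts, not data from the study).
labels = ["18", "21", "Y"]
confusion = [
    [95, 0, 5],
    [0, 98, 2],
    [4, 1, 45],
]
metrics = per_class_metrics(confusion, labels)
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(metrics)
print(metrics["Y"], macro_f1)
```

A class with few samples, such as chromosome Y in this toy matrix, is affected more strongly by a handful of errors than a class with many samples.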
We applied the UMAP dimension reduction tool50 to visualize the extracted features of the classification model (Figure 3). The UMAP plot showed the proximity of data points on a 2-dimensional plot. Chromosomes 14 and 15, chromosomes 21 and 22, and chromosomes 18 and Y were near each other. These chromosomes are similar in terms of size, shape, and banding pattern, consistent with their similar extracted features in the model.
Polarity Recognition Model
The polarity recognition step aimed to distinguish between the p arm (short arm) and q arm (long arm) of a chromosome and orient the p arms upward in the karyograms (Figure 2, D). The mean F1 score and accuracy of the polarity recognition model were 99.90% and 99.93%, respectively (Table 1).
Performance of the Entire Process of the Integral Karyotyping Analysis System
The aforementioned internal models formed the integral karyotype analysis system. Two strategies for implementing the integral karyotype analysis system were proposed: an automatic strategy and a semiautomatic strategy. The automatic strategy processed the original bone marrow R-banded metaphase through the system without manual adjustment (Supplemental Figure 2, A). That is, the original R-banded bone marrow metaphase images were used as input, and the karyograms were created directly. Internally, the output of each model was delivered sequentially to the next model. The semiautomatic strategy introduced manual adjustments after each primary processing of the internal models. The output was manually rectified to eliminate errors and then fed to the next model (Supplemental Figure 2, B). To assess the performance of the entire workflow of the system for both strategies, the image sets were divided into the training, validation, and test sets at a 3:1:1 ratio (Supplemental Table 2). The classification accuracy of the entire workflow was analyzed.
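The difference between the 2 strategies can be sketched as a pipeline with an optional review hook after each stage. This is only a schematic illustration: the stages below are placeholder functions that append a tag, not the actual models, and all names are hypothetical.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Stage = Callable[[Any], Any]
Review = Callable[[str, Any], Any]

def run_system(
    image: Any,
    stages: Sequence[Tuple[str, Stage]],
    review: Optional[Review] = None,
) -> Any:
    """Pass the data through each stage in order.

    With review=None the run is automatic; otherwise the output of every
    stage is handed to the review callback (the manual adjustment) before
    it is fed to the next stage.
    """
    data = image
    for name, stage in stages:
        data = stage(data)
        if review is not None:
            data = review(name, data)
    return data

def make_stage(tag: str) -> Stage:
    """Placeholder stage that records that it has run."""
    return lambda data: data + [tag]

STAGES = [
    ("denoising", make_stage("denoised")),
    ("segmentation", make_stage("segmented")),
    ("classification", make_stage("classified")),
    ("polarity recognition", make_stage("oriented")),
]

automatic = run_system([], STAGES)

reviewed: List[str] = []

def manual_review(name: str, data: Any) -> Any:
    """Stand-in for a cytogeneticist's correction; here it only logs the stage."""
    reviewed.append(name)
    return data

semiautomatic = run_system([], STAGES, review=manual_review)
print(automatic, reviewed)
```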
The final results of the 2 strategies are compared in Table 3. The average classification accuracy of the system with the automatic strategy was 93.33%. Except for chromosomes 15, 18, 20, 21, 22, and Y, the classification accuracy of each chromosome exceeded 90%. The classification performance of the semiautomatic strategy was based on well-segmented chromosomes. The proposed system with the semiautomatic strategy presented an average classification accuracy of 99.06%. Except for chromosomes 14, 15, 18, and Y, the accuracy of each class of chromosomes exceeded 98% (Table 3). The comparison of the classification results of the 2 strategies indicated that the average accuracy of classification increased by 5.73 percentage points (from 93.33% to 99.06%) after manual adjustment. In particular, the classification accuracy of chromosome Y increased drastically by 22.56%.
The confusion matrices within the correctly segmented chromosomes of the 2 strategies depicted the specific relationship between the misclassified chromosome pairs in Figure 4, A and B. With the semiautomatic strategy, low accuracies could be seen in chromosomes 15, 14, Y, and 18. Mutual misclassifications could be observed between chromosomes 14 and 15 and chromosomes 18 and Y (Figure 4, A). With the automatic strategy, chromosomes Y, 18, 20, 15, and 14 had the top 5 lowest accuracies. Chromosome Y was often mistaken as 18, 21, 20, or 22, while chromosomes 14 and 15 also presented a mutually misclassified pattern (Figure 4, B). According to the confusion matrices, the common misclassification pairs of both strategies were chromosomes 14 and 15 and chromosomes 18 and Y, which were consistent with the results presented in the UMAP plot (Figure 3) and the natural features in terms of size, length, and banding pattern. These misclassification pairs accounted for 29.50% and 15.78% of the errors in the semiautomatic and automatic strategy confusion matrices, respectively. The major difference between the 2 strategies was the input of the classification model: one was the direct output of the segmentation model, and the other was the segmented chromosome after manual adjustment.
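The share of errors attributable to specific chromosome pairs can be computed from a confusion matrix as sketched below. The assumption here is that a pair's errors are counted in both directions (for example, 14 predicted as 15 and 15 predicted as 14) and divided by all off-diagonal entries. The 4-class matrix is a toy example and does not reproduce Figure 4.

```python
from typing import List, Sequence, Tuple

def pair_error_share(
    confusion: List[List[int]],
    labels: List[str],
    pairs: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of all misclassifications that fall within the given pairs.

    confusion[i][j] is the number of chromosomes of true class i
    predicted as class j; off-diagonal entries are errors.
    """
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    total_errors = sum(confusion[i][j] for i in range(n) for j in range(n) if i != j)
    pair_errors = sum(
        confusion[index[a]][index[b]] + confusion[index[b]][index[a]]
        for a, b in pairs
    )
    return pair_errors / total_errors if total_errors else 0.0

# Toy confusion matrix (hypothetical counts, not data from the study).
labels = ["14", "15", "18", "Y"]
confusion = [
    [90, 6, 1, 0],
    [5, 92, 0, 1],
    [0, 1, 95, 3],
    [1, 0, 4, 40],
]
share = pair_error_share(confusion, labels, [("14", "15"), ("18", "Y")])
print(f"{share:.2%}")
```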
DISCUSSION
AI, especially DNNs, has shown great promise in various image-processing tasks. A data set with professional annotation is a prerequisite of a well-established deep learning model. Given that the R-banded karyotype on bone marrow has a utility in hematologic malignancies, we established an R-banded bone marrow metaphase data set with annotation in this study for the proposed integral R-banded karyotype analysis system, which may contribute to the cytogenetic diagnostic practice.
The integral system we built mainly focused on chromosome separation and classification. The proposed denoising and segmentation models were used to accomplish chromosome separation. Previous studies used object detection methods12,13 and semantic segmentation9,14,15,19,51 for chromosome separation. DNN-based chromosome detection methods aim to enumerate chromosomes rather than produce karyograms.12,13 Semantic segmentation models mainly deal with pairs of overlapping chromosomes.14 With a reconstruction strategy, a semantic segmentation model was applied to recognize overlapping and nonoverlapping chromosomal parts and reconstruct intact chromosomes.19 However, in the practical application of clinical chromosome segmentation, chromosomes are located randomly within a metaphase on the slide, requiring a segmentation model capable of solving complicated situations for each metaphase. Of note, semantic segmentation models cannot simultaneously solve all the aforementioned problems. Recently, the instance segmentation method was developed for chromosome separation and achieved good results.16,20,22–24 Liu et al25 established RC-Net for chromosome instance segmentation in a data set containing 985 Giemsa-banding chromosome images. An accuracy of 98.06% was achieved when the IoU threshold was 75%. In our study, we applied an instance segmentation method (ie, Cascade Mask R-CNN46) in our segmentation model and achieved good results on a large-scale data set, with an accuracy of 96.77% when the IoU threshold was 70%, an IoU of 91.95%, and a DSC of 95.79%. These results enabled the direct feeding of the segmented chromosomes into the subsequent classification model to finally produce karyograms.
Regarding chromosome classification, deep learning models often incorporate prior knowledge to improve performance. The constant number of chromosomes in a normal karyotype was used in a classification distribution strategy,27 which inspired other researchers to develop related models.28,39 In addition, the assignment of chromosomes to 7 groups (A–G) according to the Denver system was implemented to reduce the intergroup misclassification.39 The proposed model demonstrated the usefulness of emphasizing the relative length information within a metaphase to resolve the heterogeneity of chromosome images. The transformer is a recently proposed neural network architecture that has been reported to improve the performance of image classification.52 In our classification model, the transformer module was introduced as the GFIM to extract comparative longitudinal information between different chromosome classes. Moreover, the deep assignment module refined the classification result according to the comparative information extracted by the GFIM. By taking the relative length information as prior knowledge, favorable results were obtained, with the classification model achieving a per-image accuracy of 98.77% and an F1 score of 98.58%. This optimized classification model47 was also shown to outperform other models, such as the ResNet50 baseline,53 the Hungarian algorithm–based method,54 DeepMOT,55 and the state-of-the-art method of chromosome classification, Varifocal-Net.27
The performance of the entire system was evaluated on the test set according to the 2 strategies. With the automatic strategy, the karyotype system achieved a classification accuracy of 93.33%, and with the semiautomatic strategy, the classification accuracy of the system reached 99.06%. The main accuracy loss was caused by the segmentation and classification processes. For the segmentation part, the chromosome classes with an accuracy lower than 95% were chromosomes 1, 21, 22, and Y (Supplemental Table 3). With R-banding, chromosome 1 has prominent pale-staining regions around the centromere, which might cause it to be cut into 2 parts. Chromosomes 21, 22, and Y are small and might be easily missed. With manual adjustment, mainly of the segmentation, the classification accuracy improved by 5.73 percentage points. We attribute this improvement to the cytogeneticists' manual correction of the output of the segmentation model. For the classification part, the 2 most frequently misclassified chromosome pairs, chromosomes 14 and 15 and chromosomes 18 and Y, are morphologically similar, and their misclassification could not be improved by manual adjustment. Both chromosomes 14 and 15 are medium-sized acrocentric chromosomes with similar banding patterns, and misclassification between chromosomes 14 and 15 has been noted in other studies.11,27 Misclassification of chromosome Y is of particular interest because chromosome Y is polymorphic in size. When chromosome Y is large, it is more often misclassified as chromosome 18. In contrast, a small Y is more often misclassified as chromosome 21 or 22. In our observation, the confusion between chromosome Y and chromosome 18 seemed to be more prominent, which might be due to the size and morphologic characteristics of chromosome Y in our data cohort. These chromosome pairs were also close to each other in the UMAP plot, suggesting that the features extracted by the classification model are meaningful and reliable.
Consistent with the results of a DNN classification model trained on a large-scale data set,11 some misclassifications, such as those between chromosomes 14 and 15 and between chromosomes 18 and Y, cannot be entirely eliminated by increasing the number of images in the data set. The classification accuracy of chromosome Y is relatively low, owing to its polymorphic size and the imbalance of training data, as the Y chromosome appears only in male metaphases. Regarding the sources of error in the entire workflow, segmentation errors can be alleviated by either manual adjustment or further methodologic improvement, whereas classification errors still need to be addressed through improved deep learning algorithms. Although the result of the automatic strategy was inferior to that of the semiautomatic strategy, a fully automatic karyotype analysis system may be realized in the near future with algorithm improvement and data set enlargement. Moreover, multicenter validation is required, and the identification of chromosomal structural aberrations also needs to be addressed.
In conclusion, an R-banded bone marrow metaphase data set of 4442 metaphases was constructed in this study, and 4 internal models for the sequential steps of chromosome image analysis were then developed. Subsequently, an integrated R-banded karyotyping analysis system was developed. The proposed system achieved favorable results with both the automatic and semiautomatic strategies. We hope that this deep learning–based karyotype analysis system can serve as a diagnostic aid for cytogeneticists and improve the karyotyping workflow in cases of hematologic diseases. However, the G-banding technique is also widely used, and therefore the application of an R-banded karyotype analysis system in commercial and hospital-based laboratories might be limited. A more comprehensive automatic karyotype analysis system for both G- and R-banded chromosomes should be established in future work.
References
Author notes
Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the August 2024 table of contents.
Wang, Xia, J. Yang, and B. Chen contributed equally to this work, with Wang and Xia considered co-first authors.
This study was supported by the National Natural Science Foundation of China Grant (81670137), Shanghai Municipal Education Commission-Gaofeng Clinical Medicine Grant Support grant (20152501), State Key Laboratory of Medical Genomics Support Grant (201802), and Sjtu Trans-med Awards Research (2022102).
The authors have no relevant financial interest in the products or companies described in this article.