To compare the accuracy and computational efficiency of two of the latest deep-learning algorithms for automatic identification of cephalometric landmarks.
A total of 1028 cephalometric radiographic images were selected as learning data to train the You-Only-Look-Once version 3 (YOLOv3) and Single Shot Multibox Detector (SSD) methods. The target labels comprised 80 landmarks. After the deep-learning process, the algorithms were tested using a new test data set composed of 283 images. Accuracy was determined by measuring the point-to-point error and successful detection rate and was visualized by drawing scattergrams. The computational time of both algorithms was also recorded.
The YOLOv3 algorithm outperformed SSD in accuracy for 38 of 80 landmarks. The other 42 of 80 landmarks did not show a statistically significant difference between YOLOv3 and SSD. Error plots of YOLOv3 showed not only a smaller error range but also a more isotropic tendency. The mean computational time spent per image was 0.05 seconds and 2.89 seconds for YOLOv3 and SSD, respectively. YOLOv3 showed approximately 5% higher accuracy compared with the top benchmarks in the literature.
Between the two latest deep-learning methods applied, YOLOv3 seemed to be more promising as a fully automated cephalometric landmark identification system for use in clinical practice.
INTRODUCTION
The use of machine-learning techniques in the field of medical imaging is rapidly evolving.1,2 Attempts to apply machine-learning algorithms in orthodontics are also increasing. Some of the major applications currently used are automated diagnostics,1 data mining,3 and landmark detection.4,5 Inconsistency in landmark identification has been known to be a major source of error in cephalometric analyses. The diagnostic value of the analysis depends on the accuracy and the reproducibility of landmark identification.6,7 The most recent studies in orthodontics, however, still rely on conventional cephalometric analysis depending on human tasks.4,8–11 A completely automated approach has thus gained attention with the aim of alleviating human error due to the analyst's subjectivity and reducing the tediousness of the task.12–19
Since the first introduction of an automated landmark identification method in the mid-1980s,20 numerous artificial intelligence techniques have been suggested. However, in the past, the various approaches did not seem to be accurate enough for use in clinical practice.15 Rapidly evolving newer algorithms and increasing computational power provide improved accuracy, reliability, and efficiency. Recent approaches for fully automated cephalometric landmark identification have shown significant improvement in accuracy and are raising expectations for daily use of these automatic techniques.12,16,18 Recently, an advanced machine-learning method called “deep learning” has been attracting attention.14 However, the first step toward applying this latest method to automated cephalometric analysis has only recently been taken.12
Previously available automated landmark detection solutions focused on a limited set of skeletal landmarks (fewer than 20), limiting their application either in determining precise anatomical structures or in providing soft tissue information.12,16–18 Cephalometric landmarks are not used solely for cephalometric analysis of skeletal characteristics. A much greater number of both skeletal and soft tissue landmarks are necessary for evaluation, treatment planning, and predicting treatment outcomes. It has repeatedly been emphasized that, when a greater number of anatomic landmark locations are used, a more accurate prediction of treatment outcome will result.8,9,21–24 To apply automatic cephalometrics in clinical practice effectively, computational performance would also be an important factor, especially when the system has to deal with a large number of landmarks to be identified. Previous research revealed that the systems based on the random forest method detected 19 landmarks in several seconds.18 Recently, one of the deep-learning methods, You-Only-Look-Once (YOLO), was shown to require a shorter time for detecting objects.25 A comparison among the latest machine-learning algorithms in terms of computational efficiency might be of interest to clinical orthodontists.
The purpose of this study was to compare the accuracy and computational performance of two of the latest machine-learning methods for automatic identification of cephalometric landmarks. This study applied two different algorithms in identifying 80 landmarks: (1) the YOLO version 3 (YOLOv3)–based method with modification25,26 and (2) the Single Shot Multibox Detector (SSD)–based method.27 The null hypothesis was that there would be no difference in accuracy and computational performance between the two automated landmark identification systems.
MATERIALS AND METHODS
A total of 1311 lateral cephalometric radiograph images were selected and downloaded from the Picture Archiving and Communication System server (INFINITT Healthcare Co Ltd, Seoul, Korea) at Seoul National University Dental Hospital, Seoul, Korea. In later stages, 1028 images were randomly selected as learning data, and the remaining 283 images served as new test data. Images of patients with growth capacity, fixed orthodontic appliances, large dental prostheses, and/or surgical bone plates were all included. The exclusion criteria were limited to extremely poor-quality images, which made landmark identification practically impossible. The institutional review board for the protection of human subjects at Seoul National University School of Dentistry and Seoul National University Dental Hospital reviewed and approved the research protocol (institutional review board Nos. S-D 2018010 and ERI 19007).
Manual Identification of Cephalometric Landmarks
On each of the 1311 lateral cephalometric images, a total of 80 landmarks were manually identified by a single examiner with more than 28 years of clinical orthodontic experience: two vertical reference points located on the free-hanging metal chain on the right side, 46 skeletal landmarks, and 32 soft tissue landmarks (Figure 1). A modified version of a commercial cephalometric analysis software program (V-Ceph version 8, Osstem Implant Co Ltd, Seoul, Korea) was used to digitize the records for the 80 landmarks. Among them, 27 were arbitrary landmarks used to render smooth line drawings of anatomic structures, and 53 were conventional landmarks that have been well accepted in clinical orthodontic practice (Table 1).
Two Deep-Learning Systems
Two systems were built on a server running Ubuntu 18.04.1 LTS with a Tesla V100 GPU acceleration card (NVIDIA Corp, Santa Clara, Calif). One system was based on YOLOv3,26 and the other was based on SSD.27 The learning data (N = 1028) were used to train the two machine-learning algorithms. Manually recorded location data of the 80 landmarks served as standardized inputs in this learning process.
Each target image was resized to 608 × 608 pixels from the original size of 1670 × 2010 pixels for optimal deep learning. One millimeter was equal to 6.7 pixels. During learning, each image, along with its corresponding landmark labels, was passed through a convolutional neural network (CNN) architecture for both YOLOv3 and SSD.
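The coordinate bookkeeping implied by this resizing can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the image is stretched (not padded or cropped) to the network size, so the horizontal and vertical scale factors differ, and it assumes the 6.7 pixels per millimeter figure refers to the original image. All function names are hypothetical.

```python
# Sizes taken from the text; the stretch-to-fit mapping is an assumption.
ORIG_W, ORIG_H = 1670, 2010   # original image size in pixels
NET_W, NET_H = 608, 608       # network input size in pixels
PX_PER_MM = 6.7               # assumed to apply to the original image


def to_network(x, y):
    """Map a landmark from original-image pixels to network-input pixels."""
    return x * NET_W / ORIG_W, y * NET_H / ORIG_H


def to_original(x, y):
    """Map a predicted landmark back to original-image pixels."""
    return x * ORIG_W / NET_W, y * ORIG_H / NET_H


def px_to_mm(distance_px):
    """Convert a distance in original-image pixels to millimeters."""
    return distance_px / PX_PER_MM
```

Under these assumptions, errors would be measured after mapping predictions back with `to_original`, because the two axes are scaled by different factors in the resized image.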
Test Procedures and Comparisons Between the Two Systems
To compare the accuracy and computational efficiency of the two systems, 283 test images that were not included in the learning data were used. The accuracy of the two systems was reported as point-to-point errors, calculated as the absolute distance between the ground truth position and the corresponding automatically identified landmark. To visualize and evaluate errors, two-dimensional scattergrams and 95% confidence ellipses based on the chi-square distribution28–30 were depicted for each landmark. To follow the format of previous accuracy reports, thereby making analogous comparisons with previous results possible, the successful detection rates (SDRs) for 2-, 2.5-, 3-, and 4-mm ranges were calculated for the 19 landmarks that were previously used in the literature.12 Computational performance was reported as the mean running time required to identify the 80 landmarks of an image under this study's laboratory conditions. The differences in the test errors between YOLOv3 and SSD were compared with t-tests at a significance level of .05 with the Bonferroni correction of alpha errors. All statistical analyses were performed in the R language (Vienna, Austria).31
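The two accuracy measures described above can be illustrated with a short sketch. It is not the study's analysis code (which was written in R). It assumes that positions are given in original-image pixels at 6.7 pixels per millimeter and that a detection counts as successful when its error is less than or equal to the threshold. The function names are hypothetical.

```python
import math

PX_PER_MM = 6.7  # pixel-to-millimeter scale reported in the text


def point_to_point_error_mm(truth, pred, px_per_mm=PX_PER_MM):
    """Euclidean distance between two (x, y) pixel positions, in mm."""
    return math.dist(truth, pred) / px_per_mm


def successful_detection_rates(errors_mm, thresholds_mm=(2.0, 2.5, 3.0, 4.0)):
    """Fraction of errors at or below each threshold (assumed to use <=)."""
    n = len(errors_mm)
    return {t: sum(e <= t for e in errors_mm) / n for t in thresholds_mm}
```

For example, a prediction displaced 13.4 pixels horizontally from the ground truth corresponds to an error of about 2 mm, which falls at the edge of the narrowest SDR range.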
RESULTS
YOLOv3 outperformed SSD in accuracy for 38 of 80 landmarks. The other 42 of 80 landmarks did not show statistically significant differences between the two methods. None of the landmarks was found to be more accurately identified by the SSD method (Figure 2).
Among the scattergrams, the porion and condylion points are provided as representative plots in Figure 3. The figure shows that YOLOv3 has not only smaller ellipses but also a more homogeneous distribution of detection errors irrespective of direction. The latter can be seen in the more circular shape of the YOLOv3 ellipses, whereas the SSD ellipses are flattened (Figure 3).
The mean time spent in identification and visualization of the 80 landmarks per image was 0.05 and 2.89 seconds for YOLOv3 and SSD, respectively. When compared with the top benchmark in the literature to date,12 YOLOv3 showed approximately 5% higher SDR in all ranges (Figure 4).
DISCUSSION
The present study was performed to investigate which of two recent deep-learning methods would produce more accurate results in automatically identifying cephalometric landmarks. Although automatic cephalometric landmark identification has been a topic of interest, until the mid-2000s the developed algorithms did not seem accurate enough for clinical purposes.15 More recently, annual global competitions revealed impressive improvements in the accuracy of automated cephalometric landmark identification.12,17,18 In fact, recent approaches based on deep-learning algorithms showed accuracy comparable with that of an experienced orthodontist.16,18 The results of the present study demonstrated that YOLOv3 was more accurate than SSD. Furthermore, the accuracy results of the present study showed that YOLOv3 was better than the other top benchmarks to date.12,17,18 Among the previous literature, the most accurate result was produced after applying CNNs, which identified 19 landmarks.12 The present study identified considerably more: 80 landmarks that could readily be extrapolated for clinical use in predicting treatment outcomes.8,21–24 For clinical purposes, data from cephalometric landmark identification could readily be extended even to predict and visualize soft tissue changes after treatment. For the aforementioned purposes, the previous international competitions dealing with 19 landmarks17,18 might not meet the clinical needs in orthodontic practice.
Applications of deep-learning models to technology in general are becoming reality.14 Papers focusing on one of them, the CNN, have been rapidly accumulating.1,2,12 Regarding automated cephalometric landmark identification, efforts to apply CNNs have begun relatively recently. In 2016, two novel algorithms aimed at real-time object detection in test images were introduced, namely, YOLO and SSD.25,27 YOLO uses a CNN to reduce the spatial dimension of the detection box. It performs a linear regression to make bounding box predictions. The purported advantages of YOLO are fast computation and generalization. In the case of SSD, the size of the detecting box is usually fixed and used for simultaneous size detection. Therefore, the purported advantage of SSD is the simultaneous detection of objects of various sizes. However, in landmark identification of cephalometric radiographs, the size of the detecting box is generally fixed. This was conjectured to be one reason for the poorer detection performance of SSD. A well-known limitation of both YOLO and SSD was that their accuracy was inferior to that of other methods when the objects were small. However, the latest version of YOLO (YOLOv3) was claimed to improve its accuracy to the level of other preexisting methods while keeping the aforementioned advantages.26
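Because object detectors predict boxes rather than points, a landmark has to be presented to them as a box. One way this could be done is sketched below, assuming a fixed-size square box centered on each landmark and the normalized `class x_center y_center width height` text format commonly used for YOLO training labels. The box size of 38 pixels and the function names are hypothetical and are not taken from the study. Boxes that extend beyond the image border are not clipped in this sketch.

```python
def landmark_to_yolo_label(class_id, x, y, img_w, img_h, box_px=38):
    """Encode a landmark as a fixed-size box centered on (x, y).

    Values are normalized by image width and height; box_px is a
    hypothetical box side length in pixels.
    """
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        class_id, x / img_w, y / img_h, box_px / img_w, box_px / img_h
    )


def box_center_to_landmark(x_center, y_center, img_w, img_h):
    """Recover the landmark position (in pixels) from a normalized box center."""
    return x_center * img_w, y_center * img_h
```

With this encoding, each of the 80 landmarks would be one class, and the predicted landmark position is simply the center of the detected box.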
Some of the landmarks are prone to error in the vertical direction, while others show greater errors in the horizontal direction.15,28 Hence, evaluating the accuracy based only on the linear distance might not be informative enough. Therefore, two-dimensional scattergrams and 95% confidence ellipses of 80 landmarks were depicted. As shown in Figure 3, YOLOv3 was revealed to have ellipses with smaller sizes and more circular shapes. In other words, YOLOv3 was not just more accurate but also resulted in a more isotropic shape of error patterns than did SSD. This feature might be another advantage of YOLOv3.
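The size and shape of such an ellipse can be computed from the error vectors of one landmark, as in the following sketch. It assumes the errors are approximately bivariate normal and uses the fact that the chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - level). The function name and the axis-ratio measure of isotropy are illustrative choices, not the study's procedure.

```python
import math


def confidence_ellipse(errors, level=0.95):
    """Return (semi_major, semi_minor, angle_rad) for (dx, dy) error vectors."""
    n = len(errors)
    mx = sum(dx for dx, _ in errors) / n
    my = sum(dy for _, dy in errors) / n
    # Sample covariance matrix entries (n - 1 in the denominator).
    sxx = sum((dx - mx) ** 2 for dx, _ in errors) / (n - 1)
    syy = sum((dy - my) ** 2 for _, dy in errors) / (n - 1)
    sxy = sum((dx - mx) * (dy - my) for dx, dy in errors) / (n - 1)
    # Eigenvalues of the 2 x 2 covariance matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    # Chi-square quantile with 2 degrees of freedom.
    scale = -2.0 * math.log(1.0 - level)
    # Orientation of the major axis relative to the x axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.sqrt(scale * lam_major), math.sqrt(scale * lam_minor), angle
```

A ratio of the minor to the major semi-axis close to 1 indicates a nearly circular, isotropic error pattern, whereas a ratio near 0 indicates errors concentrated along one direction.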
The computational time of an automated cephalometric landmark identification system might be a concern to clinicians. The mean time spent per image was 0.05 seconds for YOLOv3 and 2.89 seconds for SSD under this study's laboratory conditions. Even with an extensive number of landmarks to be identified, both algorithms showed excellent speed. The application of artificial intelligence in automated cephalometric landmark identification may lessen the burden and alleviate human errors. By gathering radiographic data automatically, the YOLOv3 method may also help reduce human tasks and the time required for both research and clinical purposes.
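A mean running time per image of the kind reported here could be measured with a simple wall-clock harness such as the one below. This is only a sketch: `dummy_detect` is a hypothetical stand-in for a trained detector, and real measurements would also depend on hardware, image loading, and visualization, which are not modeled.

```python
import time


def mean_seconds_per_image(detect, images, repeats=1):
    """Average wall-clock seconds that detect(image) takes per image."""
    start = time.perf_counter()
    for _ in range(repeats):
        for image in images:
            detect(image)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(images))


def dummy_detect(image):
    """Hypothetical detector stub that returns 80 placeholder landmarks."""
    return [(0.0, 0.0)] * 80
```

For reference, the reported means of 2.89 and 0.05 seconds correspond to a ratio of 2.89 / 0.05 = 57.8, that is, roughly a 58-fold difference per image.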
One strength of the present study was that it included the largest number of learning (n = 1028) and test (n = 283) images ever investigated. A limitation of the present study was that intra- and interexaminer reliability statistics and reproducibility comparisons were not included and remain necessary. A future study is envisioned to determine whether automated cephalometric landmark identification may perform better than orthodontic clinicians.
CONCLUSIONS
YOLOv3 outperformed SSD in accuracy and computational time. YOLOv3 also demonstrated a more isotropic form of detection errors than SSD did. YOLOv3 seems to be a promising method for use as an automated cephalometric landmark identification system.
This study was partly supported by grant 05-2018-0018 from the Seoul National University Dental Hospital Research Fund and the Technology Development Program (grant S2538233) funded by the Ministry of Small and Medium Enterprises and Startups, the Korean Government.
Some of the coauthors have a conflict of interest. The final form of the machine-learning system was developed by computer engineers of DDH Inc (Seoul, Korea), which is expected to own the patent in the future. Among the coauthors, Hansuk Kim and Soo-Bok Her are shareholders of DDH Inc. Youngsung Yu and Girish Srinivasan are employees of DDH Inc. The other authors do not have a conflict of interest.
The first two authors contributed equally to this study.