Abstract
Objective: The purpose of the present study was to evaluate the reliability of a system that automatically recognizes anatomic landmarks and adjacent structures on lateral cephalograms, using landmark-dependent criteria unique to each landmark.
Materials and Methods: The system was used to examine 65 lateral cephalograms. The system-identified area of the anatomic structure surrounding each landmark and the system-identified position of each landmark were compared with norms defined by confidence ellipses with α = .01. These ellipses were derived from scattergrams of 100 estimates obtained according to the method reported by Baumrind and Frantz. Anatomic structure recognition was considered successful when the system-identified area overlapped the norm area, and landmark identification was considered successful when the system-identified point was located within the norm area. Success rates were calculated for all landmarks based on these criteria.
Results: The system successfully identified all specified anatomic structures in all the images and determined the positions of the landmarks with a mean success rate of 88% (range, 77%–100%).
Conclusion: With the incorporation of the rational assessment criteria provided by confidence ellipses, the proposed system was confirmed to be reliable.
INTRODUCTION
Lateral cephalograms are essential in contemporary orthodontic diagnosis and treatment planning. Anatomic landmark identification based on the visual assessment of the lateral cephalogram remains a task that requires specially trained clinicians. A fully automated clinical examination of cephalograms would reduce the workload during routine clinical service and would provide orthodontists with more time for optimum treatment planning. Various methods such as a knowledge-based technique with edge tracking,1–4 model-based approaches,5–8 pattern-matching techniques,9–11 and combined algorithms12–16 have been developed and are available. However, most of these methods have not been adopted in clinical practice.17
Recently, a system has been developed that recognizes general grayscale images using an automated psychologic brain model.18 This model is a hardware-friendly algorithm that achieves real-time recognition by recalling a set of modeled data that is described mathematically by a finite number of traits and stored in the system in advance. The system employs a new technique called the projected principal edge distribution (PPED) to extract features from an image, and it has been shown to perform robustly in recognizing images, including cephalograms.19,20 Although these experiments suggest that the system recognizes images effectively, it remains uncertain whether it can detect conventionally used landmarks with high precision. Meanwhile, a previous study21 documented topographic variations in humans' subjective judgments of cephalometric landmarks, with the shape and size of the variation unique to each landmark. A mathematical formulation of these landmark-dependent variations would help researchers objectively evaluate the reliability of an automatic cephalogram recognition system.
The purpose of the present study was to examine the reliability of a system that automatically recognizes anatomic landmarks on lateral cephalograms, together with the surrounding anatomic structures in which they are located, using landmark-dependent criteria unique to each landmark.
MATERIALS AND METHODS
System Overview
The system employed in the present study, which performs automatic recognition of anatomic landmarks and their surrounding anatomic structures on lateral cephalograms, is shown in Figure 1. A detailed description of the system has been reported elsewhere.1,19 Briefly, the system performs two major tasks: the “knowledge-generation” (system learning) phase and the “recognition” phase. In the knowledge-generation phase, image data extracted from learning samples are converted into PPED vectors of 64 variables that characterize the contours of the anatomic structures.1,18,19 From these vectors, template vectors, ie, the principal information for identifying the landmarks, are generated for each landmark using a generalized Lloyd algorithm22 and are stored in the system as its knowledge. During the recognition phase, the system scans the input film pixel by pixel, matching the PPED vectors generated from the film against the template vectors stored in the system. The best-matching position is recognized as the landmark position.
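The PPED representation is only summarized above. As a rough illustration of the general idea of a projected edge distribution, the sketch below assigns each pixel of a square block (assumed here to be 64 × 64) to one of four edge orientations: horizontal, vertical, or one of the two diagonals. A pixel is assigned only when its local intensity difference is strong enough. The edge flags for each orientation are then counted along lines of that orientation and pooled into 16 bins, giving 4 × 16 = 64 values. The two-pixel difference kernels, the median threshold, and the function names are simplifying assumptions, not the published PPED implementation.

```python
import numpy as np


def _pool(profile, n_bins):
    # Sum a 1-D edge-count profile into n_bins roughly equal, contiguous chunks
    return np.array([chunk.sum() for chunk in np.array_split(profile, n_bins)])


def pped_vector(block, n_bins=16):
    """Simplified PPED-style feature: 4 orientations x n_bins values (64 by default)."""
    img = np.asarray(block, dtype=float)
    n_rows, n_cols = img.shape
    p = np.pad(img, 1, mode="edge")
    # Absolute intensity differences across each pixel in four directions
    grads = np.stack([
        np.abs(p[2:, 1:-1] - p[:-2, 1:-1]),  # vertical change   -> horizontal edge
        np.abs(p[1:-1, 2:] - p[1:-1, :-2]),  # horizontal change -> vertical edge
        np.abs(p[2:, 2:] - p[:-2, :-2]),     # change along "\"  -> edge along "/"
        np.abs(p[2:, :-2] - p[:-2, 2:]),     # change along "/"  -> edge along "\"
    ])
    strongest = grads.argmax(axis=0)          # keep only the dominant orientation
    magnitude = grads.max(axis=0)
    edge = magnitude > np.median(magnitude)   # hypothetical threshold choice

    rows, cols = np.indices((n_rows, n_cols))
    # Index of the line along which edges of each orientation are counted
    line_index = [rows, cols, rows + cols, rows - cols + (n_cols - 1)]
    n_lines = [n_rows, n_cols, n_rows + n_cols - 1, n_rows + n_cols - 1]

    parts = []
    for k in range(4):
        mask = edge & (strongest == k)
        profile = np.bincount(line_index[k][mask], minlength=n_lines[k])
        parts.append(_pool(profile, n_bins))
    return np.concatenate(parts)
```

For a block containing a single horizontal step edge, only the first 16 values (horizontal edges) are nonzero, and they are concentrated in the bins covering the rows where the step occurs.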
Data
Pretreatment lateral cephalograms were obtained from 465 Japanese orthodontic patients (147 male and 318 female patients; mean age, 22 years 6 months; age range, 10 years 9 months to 60 years) who had visited the university dental hospital between the years 1998 and 2003. Patients were enrolled consecutively. Criteria for selection were permanent dentition, no congenital anomalies, and no missing teeth or metallic restorations. Digital lateral head films of each patient, with a magnification ratio of 1:1.1, were taken with the teeth in habitual maximum intercuspation position and the lips in repose. The sample was divided into two groups at random: 400 cephalograms for system learning and 65 cephalograms for testing the system's performance.
One of the authors (CT) marked a pencil cross directly on the top right and bottom left corners of each film for use as reference points. The films were then digitized with a scanner (ES8500, EPSON, Tokyo, Japan) at a resolution of 300 dots per inch to provide an image dataset (2320 × 2960 pixels, 1 pixel = 0.085 mm; hereafter referred to as raw image data). The films were then traced in pencil on acetate paper overlaid on the films. Twenty anatomic landmarks23–25 (Table 1) were identified visually, cross-marked by one of the authors, and double-checked by another author. The degree of certainty for each landmark was recorded on a three-point subjective scale: 1 = absolutely correct; 2 = probably correct; and 3 = difficult to recognize. All procedures were performed on a light box with identical writing tools in an air-conditioned, darkened room.
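The stated pixel size follows from the scanning resolution. At 300 dots per inch, one pixel spans 25.4/300 ≈ 0.0847 mm, which rounds to 0.085 mm, and a 2320 × 2960-pixel image covers about 196 × 251 mm. The helpers below reproduce that arithmetic. The optional division by the 1.1 magnification ratio is included only for illustration, because the text does not state whether coordinates were corrected for magnification. The function names are hypothetical.

```python
MM_PER_INCH = 25.4


def pixel_pitch_mm(dpi):
    # Width of one scanned pixel in millimetres
    return MM_PER_INCH / dpi


def image_extent_mm(width_px, height_px, dpi):
    # Physical size of the scanned film area covered by the image
    pitch = pixel_pitch_mm(dpi)
    return width_px * pitch, height_px * pitch


def anatomical_mm(film_mm, magnification=1.1):
    # Assumption: convert a film distance to anatomical size using the 1:1.1 ratio
    return film_mm / magnification
```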
To obtain data from the tracings, each film with its overlaid tracing was digitized with the same scanner and settings to produce the “traced image data.” Using a mouse, one of the authors identified the positions of the reference points and anatomic landmarks on both the raw image data and the traced image data on a computer monitor (17-inch LCD monitor, 1701FP, Dell, Round Rock, Tex). The images were viewed both at actual size and magnified to 200%. The reference points on the two types of images were mathematically superimposed to provide coordinate values for the landmarks on the raw image data.
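The text does not specify how the superimposition was computed. Each film has exactly two reference crosses, so one natural choice is a two-point similarity transform (rotation, uniform scale, and translation), which two point pairs determine exactly. The minimal sketch below assumes that transform and uses complex arithmetic. The function name is hypothetical.

```python
def fit_two_point_similarity(src_pts, dst_pts):
    """Return a function mapping src-image coordinates to dst-image coordinates.

    src_pts, dst_pts: the two reference points, eg, [(x1, y1), (x2, y2)],
    located on the traced image and on the raw image, respectively.
    """
    s1, s2 = (complex(x, y) for x, y in src_pts)
    d1, d2 = (complex(x, y) for x, y in dst_pts)
    a = (d2 - d1) / (s2 - s1)  # encodes rotation and uniform scale
    b = d1 - a * s1            # translation

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform
```

The resulting transform can be applied to every landmark digitized on the traced image to obtain its coordinates on the raw image. A least-squares fit would be needed if more than two reference points were available.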
The raw image data and the resulting position data were then used for system learning. Fifteen template vectors were generated as system knowledge and stored in the system according to a previously reported method.20 Landmark positions with a certainty score of 2 or 3 were excluded from the dataset used in the learning (knowledge-generation) phase.
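A generalized Lloyd algorithm alternates between two steps: assigning each training vector to its nearest codeword, and moving each codeword to the mean of the vectors assigned to it. The sketch below is a plain Lloyd iteration with random initialization that builds 15 templates from PPED vectors after dropping positions with certainty scores of 2 or 3. This is one common variant; the published system may initialize differently, eg, by codeword splitting. The function names and the Euclidean distance are assumptions.

```python
import numpy as np


def generalized_lloyd(vectors, n_templates=15, n_iter=50, seed=0):
    """Plain Lloyd/k-means codebook design with random initial codewords."""
    rng = np.random.default_rng(seed)
    x = np.asarray(vectors, dtype=float)
    codebook = x[rng.choice(len(x), size=n_templates, replace=False)].copy()
    for _ in range(n_iter):
        # Nearest-codeword assignment (squared Euclidean distance)
        d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Centroid update; a codeword with no members keeps its old value
        new = codebook.copy()
        for k in range(n_templates):
            members = x[labels == k]
            if len(members):
                new[k] = members.mean(axis=0)
        if np.allclose(new, codebook):
            break
        codebook = new
    return codebook


def build_templates(pped_vectors, certainty, n_templates=15):
    # Keep only positions scored 1 (absolutely correct) for learning
    keep = np.asarray(certainty) == 1
    return generalized_lloyd(np.asarray(pped_vectors)[keep], n_templates)
```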
Confidence Ellipses
To evaluate the reliability of the system's performance, scattergrams of manual landmark identification errors were obtained from 10 orthodontists who each identified every landmark on 10 cephalograms, according to the method reported by Baumrind and Frantz21 (for details, see Appendix). For each landmark, confidence ellipses with a confidence limit α were developed from the scattergram using equation (1).
where CHI2 is the function that gives the one-tailed (upper-tail) probability α of the chi-square distribution with 2 degrees of freedom; x and y are the coordinate values; σx and σy are the standard deviations of x and y, respectively; and ρ is the correlation coefficient between x and y.26 The value of α is 1.0 when the input position coincides with the best estimate and approaches zero as the input position moves away from the best estimate.
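Assume the scatter is bivariate normal and centered on the best estimate. Under that assumption, a form of equation (1) consistent with the definitions above is the following, in which the bracketed term is the squared Mahalanobis distance q from the best estimate. For 2 degrees of freedom the upper-tail probability has the closed form exp(−q/2), so the α = .01 boundary corresponds to q = −2 ln 0.01 ≈ 9.21.

```latex
\alpha = \mathrm{CHI2}\!\left( \frac{1}{1-\rho^{2}}
  \left[ \frac{x^{2}}{\sigma_x^{2}} - \frac{2\rho\, x y}{\sigma_x \sigma_y} + \frac{y^{2}}{\sigma_y^{2}} \right],\ 2 \right),
\qquad
\mathrm{CHI2}(q, 2) = e^{-q/2}
```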
The confidence ellipses with α = .01 for the landmarks are shown in Figure 2A,B. The parameters of these ellipses (the angle between the x-axis and the semimajor axis, and the lengths of the semimajor and semiminor axes) are provided in Table 2. The semiminor axis averaged 2.68 mm (range, 1.35–4.22 mm), and the semimajor axis averaged 4.66 mm (range, 2.99–10.20 mm). Nasion had the shortest semiminor axis.
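The chi-square distribution with 2 degrees of freedom has upper-tail probability exp(−q/2). As a result, both α for a given point and ellipse parameters of the kind listed in Table 2 can be computed in closed form from σx, σy, and ρ. In the minimal sketch below, which assumes bivariate normality, each semi-axis is the square root of q = −2 ln α multiplied by the corresponding eigenvalue of the covariance matrix, and the angle is that of the major eigenvector. The function names are illustrative, and the input values used in the check are hypothetical rather than taken from Table 2.

```python
import math


def mahalanobis_sq(x, y, sx, sy, rho):
    # Squared Mahalanobis distance from the best estimate (origin)
    return (x * x / sx**2 - 2 * rho * x * y / (sx * sy) + y * y / sy**2) / (1 - rho**2)


def alpha_at(x, y, sx, sy, rho):
    # Upper-tail chi-square probability with 2 df: exactly exp(-q / 2)
    return math.exp(-mahalanobis_sq(x, y, sx, sy, rho) / 2.0)


def ellipse_params(sx, sy, rho, alpha=0.01):
    """Return (angle_deg, semi_major, semi_minor) of the alpha-level ellipse."""
    q = -2.0 * math.log(alpha)          # 2-df chi-square quantile, about 9.21 at .01
    cxx, cyy, cxy = sx**2, sy**2, rho * sx * sy
    mean = (cxx + cyy) / 2
    half_diff = math.hypot((cxx - cyy) / 2, cxy)
    lam_major, lam_minor = mean + half_diff, mean - half_diff
    angle = 0.5 * math.atan2(2 * cxy, cxx - cyy)  # direction of the major axis
    return math.degrees(angle), math.sqrt(q * lam_major), math.sqrt(q * lam_minor)
```

A point lying at the end of either semi-axis returned by `ellipse_params` gives `alpha_at` ≈ .01, which confirms that the two functions describe the same boundary.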
Figures 2A and 2B. Confidence ellipses obtained for cephalometric landmarks. Black points indicate coordinate values of landmarks identified by 10 orthodontists on 10 cephalograms. Black lines designate confidence ellipses with α = .01. Origin indicates the best estimate; x-axis, the line through the origin parallel to the line S-N; and y-axis, the line through the origin perpendicular to the x-axis.
System Performance Test
After learning the image characteristics of the 400 cephalograms, the automatic recognition system was used to identify landmarks on the 65 cephalograms reserved for testing its performance. Landmark positions with a certainty score of 3 were excluded from the dataset for this task.
First, for each landmark, the system computed the most probable location of the anatomic structure surrounding the landmark (hereafter referred to as the “search area”), defined as the smallest rectangle that included the first 50 candidate positions of the landmark in the 1/16-downscaled target image. If the fiducial zone designated by the confidence ellipse with α = .01 overlapped the search area, recognition of the anatomic structure was judged successful.
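The two steps of this structure-level check can be sketched as follows. The sketch assumes that the system produces a score map in which lower values mean a closer match, and that the ellipse and rectangle are expressed in the same coordinate frame and units. The overlap test maps the ellipse to a unit circle. The rectangle then becomes a parallelogram, which overlaps the circle exactly when it contains the circle's center or one of its edges comes within distance 1 of that center. The `scale` factor is left as a parameter because “1/16 downscaled” could refer to pixel count (a factor of 4 per side) or to each side (a factor of 16). All names are hypothetical.

```python
import math
import numpy as np


def search_area(score_map, scale, k=50):
    """Smallest rectangle (x0, y0, x1, y1) holding the k best-scoring positions
    of a downscaled score map, mapped back to full-resolution pixels."""
    best = np.argsort(score_map, axis=None)[:k]   # lower score = closer match (assumed)
    rows, cols = np.unravel_index(best, score_map.shape)
    return (int(cols.min()) * scale, int(rows.min()) * scale,
            (int(cols.max()) + 1) * scale, (int(rows.max()) + 1) * scale)


def _seg_dist_to_origin(p, q):
    # Shortest distance from the origin to the segment p-q
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else max(0.0, min(1.0, -(p[0] * dx + p[1] * dy) / length_sq))
    return math.hypot(p[0] + t * dx, p[1] + t * dy)


def ellipse_overlaps_rect(center, semi_major, semi_minor, angle_deg, rect):
    x0, y0, x1, y1 = rect
    th = math.radians(angle_deg)
    cos_t, sin_t = math.cos(th), math.sin(th)
    pts = []
    for x, y in [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]:
        dx, dy = x - center[0], y - center[1]
        # Rotate by -angle, then scale the axes so the ellipse becomes a unit circle
        pts.append(((dx * cos_t + dy * sin_t) / semi_major,
                    (-dx * sin_t + dy * cos_t) / semi_minor))
    edges = list(zip(pts, pts[1:] + pts[:1]))
    # Is the ellipse center inside the transformed rectangle?
    crosses = [(b[0] - a[0]) * (-a[1]) - (b[1] - a[1]) * (-a[0]) for a, b in edges]
    if all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses):
        return True
    return any(_seg_dist_to_origin(a, b) <= 1.0 for a, b in edges)
```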
Second, for each landmark, the system computed the most probable position of the landmark by examining the target image at its original resolution within the search area. Landmark identification was judged successful when the system-identified point was located within the confidence ellipse with α = .01.
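The fine search can be sketched as an exhaustive scan of the search area at full resolution that keeps the position whose feature vector lies closest to any stored template. In this minimal sketch, `feature_at` is a hypothetical callable standing in for PPED extraction around a pixel. The Manhattan distance is an assumption, because the text does not name the matching metric.

```python
import numpy as np


def nearest_template_distance(vec, templates):
    # Manhattan (L1) distance to the closest stored template (metric assumed)
    diffs = np.asarray(templates, dtype=float) - np.asarray(vec, dtype=float)
    return np.abs(diffs).sum(axis=1).min()


def fine_search(feature_at, templates, rect):
    """Scan every pixel (x, y) with x0 <= x < x1 and y0 <= y < y1 and
    return the best-matching position and its distance."""
    x0, y0, x1, y1 = rect
    best_pos, best_dist = None, np.inf
    for y in range(y0, y1):
        for x in range(x0, x1):
            d = nearest_template_distance(feature_at(x, y), templates)
            if d < best_dist:
                best_pos, best_dist = (x, y), d
    return best_pos, best_dist
```

Restricting the scan to the search area found at low resolution is what keeps this brute-force step affordable. The returned position is then judged against the landmark's confidence ellipse.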
The success rates for recognition of each landmark, and of the anatomic structure surrounding it, were defined as the proportion of all samples that the system recognized successfully. Success rates were calculated for all landmarks. All procedures were carried out on a workstation (Sun Blade 2000, Sun Microsystems, Palo Alto, Calif).
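As a worked illustration of these definitions, the sketch below takes a films × landmarks table of outcomes (1 = success, 0 = failure, NaN = excluded, eg, certainty score 3). It returns each landmark's success rate over its evaluable films, together with the number of misidentified landmarks per film, which is the quantity summarized in the Results. The table layout and names are assumptions. The mean success rate is taken here as the unweighted mean of the per-landmark rates.

```python
import numpy as np


def success_rates(outcomes):
    """outcomes: (n_films, n_landmarks) array of 1, 0, or NaN (excluded)."""
    s = np.asarray(outcomes, dtype=float)
    per_landmark = np.nanmean(s, axis=0)   # excluded cases drop out of the denominator
    misidentified = (s == 0).sum(axis=1)   # NaN == 0 is False, so exclusions are not counted
    return per_landmark, misidentified
```

From these outputs, `per_landmark.mean()` gives the mean success rate and `(misidentified == 0).mean()` gives the fraction of cephalograms with no misidentified landmarks.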
RESULTS
The system successfully recognized all anatomic structures surrounding all landmarks (sella turcica, nasofrontal junction, infraorbital area, mandibular symphysis, etc) within a search area averaging 20.5 × 20.5 mm (range, 11.9 × 11.9 mm to 27.7 × 27.7 mm). The positions of the 20 cephalometric landmarks identified by the system are given in Figure 3. The mean success rate for identifying the landmark positions was 88%, with a range of 77% to 100%. The system demonstrated a 100% success rate in recognizing Me and success rates above 90% in recognizing point B, Ptm, Pog, Ba, L1_C, U1_C, and Gn. On the other hand, the success rates for N, Go, S, L1, and PNS were under 80%, with the lowest success rate (77%) for N (Table 3).
Figure 3. Example of the positions of the 20 cephalometric landmarks identified by the system.
Among the cephalograms employed for the system's performance test, 12% had no misidentified points, whereas 75% had fewer than three misidentified points, with a maximum of six.
DISCUSSION
For humans, recalling past memories (experiences) in immediate response to a sensory input is assumed to be the very basis of recognition. Based on this postulate, the psychologic brain model employed in the present study compresses an image into a PPED vector and then searches for the most similar vector among the template vectors stored in the system.1,18,19 Table 4 gives an overview of earlier systems for automatic landmark recognition on cephalograms and compares them with the present system.
Evaluating the accuracy of landmark identification by such systems, in other words, whether a system's definition of a landmark position is clinically acceptable, has been a critical issue in testing their performance reliability. Three major evaluation methods have been employed so far. In the first method, an individual orthodontist visually judges whether the system's recommendation is acceptable.1 This method, which is also used to evaluate whether an orthodontic resident's reading of a cephalogram is acceptable, is limited in that it inevitably incorporates intrajudge and interjudge variation. The second approach reports mean recognition errors, ie, the mean distance between the point provided by one or more orthodontists and the point determined by the system.6–8,14 The third method examines whether the system-identified landmark is located within a circle with a 2-mm radius.2,3,7,9–12,15,16 This approach is meaningful in that it provides an objective judgment of whether the system's recommendation is correct. However, it leaves room for argument as to whether a 2-mm circle is reasonable for all cephalometric landmarks, given that they are located in varying anatomic structures. In the present study, confidence ellipses were developed from scattergrams representing topographic variations in experts' subjective judgments of cephalometric landmarks and were used as landmark-dependent criteria to assess system performance.
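The difference between a fixed 2-mm circle and a landmark-dependent ellipse can be made concrete. The sketch below computes the radial error used in mean-error reporting, the 2-mm circle test, and an ellipse test. It applies these to one hypothetical landmark whose ellipse is elongated horizontally, with a semimajor axis of 3.0 mm and a semiminor axis of 1.4 mm; these are illustrative values, not taken from Table 2. A 2.5-mm horizontal deviation fails the circle test but passes the ellipse test, whereas a 1.8-mm vertical deviation passes the circle test but fails the ellipse test.

```python
import math


def radial_error(point, ref):
    # Euclidean distance in mm between the system point and the reference point
    return math.dist(point, ref)


def within_circle(point, ref, radius_mm=2.0):
    return radial_error(point, ref) <= radius_mm


def within_ellipse(point, ref, semi_major, semi_minor, angle_deg):
    # Express the deviation in the ellipse's own axes, then apply the ellipse equation
    t = math.radians(angle_deg)
    dx, dy = point[0] - ref[0], point[1] - ref[1]
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0


def mean_radial_error(points, refs):
    return sum(radial_error(p, r) for p, r in zip(points, refs)) / len(points)


# Hypothetical landmark: best estimate at the origin, ellipse aligned with the x-axis
ref = (0.0, 0.0)
horizontal_miss = (2.5, 0.0)
vertical_miss = (0.0, 1.8)
```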
The confidence ellipses for sella (the center of a pseudo-ellipse) and nasion (a junction between two bones) were relatively small because of the simple definitions of these landmarks and the good contrast around them. The large vertical variation in the distribution for Ptm is well known empirically but was tolerable because cephalometric analysis uses only the horizontal position of Ptm, to determine the posterior limit of the maxilla. By the same token, the greater horizontal variances in the distributions for ANS and PNS were tolerable because cephalometric analysis uses only their vertical positions, to determine the orientation of the palatal plane. Because these factors are likely to cause errors and variation, we employed confidence ellipses to assess whether the automatic recognition system determined the anatomic landmarks on cephalograms correctly. The ellipses express, in terms of statistical probability, the domain that specialists regard as clinically “correct”; the fiducial zones thus established were therefore considered suitable for assessing the performance of the automatic recognition system.
Identification of sella, nasion, gonion, L1, and PNS achieved success rates in the range of 77% to 80%. The relatively lower success rates for sella and nasion were associated with narrow confidence ellipses, whose semiminor axes measured 1.5 mm and 1.4 mm, respectively. These smaller zones made the judgment more stringent, resulting in lower success rates for the system's recommendations than for other anatomic landmarks. When the system misidentified PNS, the point in most cases lay within the confidence ellipse horizontally but outside the fiducial zone vertically. The semiminor axis of the zone considered “correct” for PNS, whose direction is almost vertical, was only 1.6 mm long; this appeared to lower the rate of successful identification by the system.
In a case with excessive negative overjet, L1, the lower incisor tip, was incorrectly recognized as positioned posterior to the upper incisor tip. In a case with reduced overbite, L1 was placed inferior to the fiducial zone. The relatively lower success rates for L1 were caused by the great variation in the spatial relationship between the upper and lower incisors. Similarly, the system's placement of gonion varied greatly because the landmark lies in an obscure image area where the pharyngeal region, the cervical vertebrae, and the mandibular angle overlap in complex ways. The lower success rates for these anatomic landmarks may be explained by the presumption that the test cephalograms included patients whose dentoskeletal and soft tissue relationships were not represented among the cephalograms used for system learning. In a pilot study with sella, nasion, and orbitale, which are landmarks with fewer dentoskeletal/soft tissue relationship patterns, 400 cephalograms were considered the optimum number for teaching the system. For gonion, L1, and PNS, whose locations vary greatly, however, recognition performance could be improved by increasing the amount of image data available for building the system's knowledge base.
Table 5 gives the mean recognition errors in the present study, ie, the distances between the coordinate values determined by our system and the corresponding fiducial coordinates. It also compares them with three previous studies that reported mean errors for 13 major landmarks (sella, nasion, orbitale, porion, ANS, point A, point B, pogonion, menton, gnathion, gonion, U1, and L1).5,7,14 The total mean error of landmark identification in the present study was the smallest and was similar to that reported by Liu et al.14 For sella and porion, the errors in the present study were larger than those reported by Liu et al, whereas our mean errors for orbitale and point A were significantly smaller. In addition, our system showed smaller mean errors than all three previous reports for nasion, orbitale, ANS, point A, pogonion, menton, gnathion, U1, and L1.5,7,14 Because our system showed a relatively low success rate (78%) for sella, the edge-based technique employed by Liu et al might improve its performance for this landmark. For gonion, which also showed a relatively low success rate (78%), the error reported by Saad et al5 was smaller than that in the present study. This implies that our system's performance for gonion could be improved by using a model-based technique.
Finally, it should be noted that the system correctly recognized the positions of all 20 anatomic landmarks in 12% of the test cephalograms. The system misidentified three or fewer landmark positions in 75% of the cephalograms tested. The maximum number of misidentified landmark positions per cephalogram was six, which occurred in only 4% of the test cephalograms. In addition, all anatomic features were correctly recognized in the present study. This suggests that the current system could reliably perform the task of automatically extracting, magnifying, and displaying the image region near each landmark of an input cephalogram. Thus, an interface that uses our system to search for anatomic features automatically could help reduce the workload in clinical practice and increase educational efficiency for orthodontic residents. In summary, the results of the performance test suggest that the proposed system is effective and has potential for clinical application.
CONCLUSIONS
The fiducial zones established by the panel of experienced orthodontists are considered valid for evaluation of the ability of the automatic recognition system to recognize anatomic features.
With the incorporation of the rational assessment criteria provided by confidence ellipses, the proposed system was confirmed to be reliable. The system successfully recognized anatomic features surrounding all the landmarks. The mean success rate for identifying the landmark positions was 88% with a range of 77% to 100%.
Acknowledgments
This study was partially supported by the Ministry of Education, Science, Sports, and Culture under Grant-in-Aid for Scientific Research (No. 14370695, No. 14370696, No. 16791284), by the Japan Science and Technology Agency (JST) in the program of Core Research for Evolutional Science and Technology (CREST), and by the 21st-century Centers of Excellence (COE) program “Creation of Frontier Bio-dentistry.” The authors are also deeply indebted to Dr Nellie Kremenak for meaningful and productive advice in manuscript preparation.
REFERENCES
APPENDIX
Ten lateral cephalograms were selected at random from the learning samples. A panel of 10 dentists (7 men and 4 women; age range, 30 years 1 month to 41 years 2 months), each with more than 5 years of clinical experience in orthodontics, was selected. Each panel member traced the anatomic contours in pencil on acetate paper overlaid on each film and visually identified and marked 20 anatomic landmarks. All procedures were performed on a light box with identical writing tools in an air-conditioned, darkened room.
Let V(m, c, e) denote the coordinate values determined by judge e for cephalometric landmark m on cephalogram c, where c = 1, 2, …, 10; e = 1, 2, …, 10; and m = 1 (S), 2 (N), …, 20 (Ptm). The following computations were made to generate a scattergram SG(m).
Step 1. The mean coordinate value V(m, c, *) was calculated by equation (2) and defined as the “best estimate” position of landmark m on cephalogram c.
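The best estimate is described as the mean of the 10 judges' coordinates, so equation (2) is presumably the arithmetic mean over judges, applied separately to the x and y coordinates:

```latex
V(m, c, *) = \frac{1}{10} \sum_{e=1}^{10} V(m, c, e)
```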
Step 2. A new coordinate system V′ was developed, where V(m, c, *) was the origin, and a line through the origin and parallel to the line S-N connecting the mean coordinate values V(1, c, *) and V(2, c, *) for S and N, respectively, was chosen as the x-axis. The y-axis was a line perpendicular to the x-axis through the origin.
Step 3. The set of coordinate values V(m, c, e) was projected onto the coordinate system V′ for each landmark m to generate a scattergram SG(m) consisting of 100 points.
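Steps 1 to 3 can be sketched as follows, assuming the coordinates are held in an array indexed as [landmark, cephalogram, judge, (x, y)] with zero-based indices, so that S and N are landmarks 0 and 1. The sign of the y-axis, here a 90° counterclockwise rotation of the S-N direction in array coordinates, is an assumption, because the text does not specify it. The helper `ellipse_inputs` then estimates σx, σy, and ρ for equation (1) from the pooled points. All names are hypothetical.

```python
import numpy as np


def scattergram(V, m, s_idx=0, n_idx=1):
    """Pooled deviations of landmark m in each film's S-N-aligned frame (Steps 1-3)."""
    V = np.asarray(V, dtype=float)             # shape: (landmarks, films, judges, 2)
    best = V.mean(axis=2)                      # Step 1: best estimate V(m, c, *)
    points = []
    for c in range(V.shape[1]):
        sn = best[n_idx, c] - best[s_idx, c]
        x_axis = sn / np.linalg.norm(sn)             # Step 2: parallel to S-N
        y_axis = np.array([-x_axis[1], x_axis[0]])   # perpendicular (sign assumed)
        dev = V[m, c] - best[m, c]                   # judges' offsets from the origin
        points.append(np.column_stack([dev @ x_axis, dev @ y_axis]))
    return np.vstack(points)                   # Step 3: films x judges points (100 here)


def ellipse_inputs(sg):
    """Sample standard deviations and correlation coefficient used in equation (1)."""
    sx, sy = sg.std(axis=0, ddof=1)
    rho = np.corrcoef(sg[:, 0], sg[:, 1])[0, 1]
    return sx, sy, rho
```

Because each film's deviations are rotated into its own S-N frame before pooling, differences in head orientation between films do not inflate the scatter.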