ABSTRACT
Conventional two-dimensional (2D) cephalometric radiography is an integral part of orthodontic patient diagnosis and treatment planning. One must be mindful of its limitations, as it is a 2D representation of a three-dimensional (3D) object. Issues with projection errors, landmark identification, and measurement inaccuracies impose significant limitations, which may now be overcome with the advent of cone-beam computed tomography (CBCT). A systematic review of the reliability of different 3D cephalometric landmarks in CBCT imaging was conducted.
Electronic database searches were administered until October 2017 using PubMed, MEDLINE via OvidSP, EBMR and EMBASE via OvidSP, Scopus, and Web of Science. Google Scholar was used as an adjunctive search tool.
Thirteen articles considering CBCT scans of human subjects from preexisting data sets were selected and reviewed. Most of the studies had methodological limitations and were of moderate quality. Because of their heterogeneity, key data from each could not be combined and were reported qualitatively. Overall, in 3D, midsagittal plane landmarks demonstrated greater reliability compared with bilateral landmarks. Only a small number of dental landmarks were reported, although most were recommended for use.
Further research is required to evaluate the reliability of 3D cephalometric landmarks when evaluating 3D craniofacial complexes.
INTRODUCTION
Cephalometric radiography is a standardized radiographic technique employed to provide a better understanding of an individual's craniofacial structures in three planes of space: anteroposteriorly (AP), vertically, and transversely. Landmarks routinely used in two-dimensional (2D) lateral cephalometric analyses are chosen based on their ability to be reliably identified.1 Distances/angles between these landmarks are measured and then compared with one or various sets of standardized norms that provide an indication of relationships shared within the craniofacial complex of an individual at a given time. Radiographic findings are then compared with clinical findings. Issues with image distortion and superimposition of bilateral structures may pose significant limitations to the interpretation of these data.2,3 Sometimes, a 2D posteroanterior (PA) cephalogram is a valuable adjunct to routine orthodontic diagnosis and treatment planning, as it provides information in the transverse direction in particular and eliminates superimposition of certain bilateral structures, which eases detection of potential facial asymmetries. Even though chosen landmarks may be easily identified and reproducible, it is imperative to question their true meaningfulness, as this transverse dimension is often unaccounted for without additional imaging.
Limitations once imposed by 2D may now be overcome: volumetric data contained within voxels of a single 360° cone-beam computed tomography (CBCT) scan are instrumental in reconstructing and understanding skeletal, dental, and soft tissue drape relationships in three dimensions (3D).3 The accuracy of landmark identification and placement is now enhanced, as each landmark occupies a specific location along a coordinate system of x-y-z axes. As such, theoretical discrepancies between 2D and 3D cephalometric analyses may be attributable to the fact that measurements are made between two lines in the former, whereas CBCT imaging affords the possibility for measurements to be made between two planes.4
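The effect of losing one axis can be illustrated with a minimal sketch. It assumes the axis convention used later in this review (x mediolateral, y anteroposterior, z vertical) and uses made-up landmark coordinates; the function names are hypothetical, and radiographic magnification and distortion are deliberately ignored.

```python
import math

# Hypothetical landmark coordinates in millimetres (illustrative values only).
# Assumed axis convention: x = mediolateral, y = anteroposterior, z = vertical.
landmarks = {
    "gonion_right": (-45.0, 10.0, -60.0),
    "menton": (0.0, 75.0, -95.0),
}


def distance_3d(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(a, b)


def distance_sagittal_projection(a, b):
    """Distance after dropping the x (mediolateral) coordinate.

    This mimics, in a simplified way, how a lateral view flattens the
    transverse axis; it is not a model of a real cephalogram.
    """
    return math.dist(a[1:], b[1:])


d3 = distance_3d(landmarks["gonion_right"], landmarks["menton"])
d2 = distance_sagittal_projection(landmarks["gonion_right"], landmarks["menton"])
print(f"3D distance: {d3:.1f} mm, sagittal projection: {d2:.1f} mm")
```

Whenever the two landmarks differ in their mediolateral position, the projected distance is shorter than the true 3D distance, which is one way a bilateral-to-midline measurement can be misrepresented in 2D.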
While most “old” 2D landmarks are reliable for use in 3D cephalometric analyses, specific nerve foramina in the maxilla and mandible provide better landmarks in 3D imaging.5 These include mental foramina and infraorbital foramina, which are more reliable and reproducible than others. However, the obliquity of infraorbital foramina and oral incisive foramina tends to pose challenges as it can make locating their center point difficult.5
A prior systematic review (SR) examining the reliability and reproducibility of 3D cephalometric landmarks using CBCT was published in 2014 by Lisboa et al.6 Their search ended much earlier, in October 2014; hence, the decision was made to further explore this area based on the increasing popularity of CBCT imaging and the growing number of scientific studies on the subject. As such, the search period for the current SR was broader, spanning from 1998 (first introduction of CBCT into dentistry) to October 2017. The databases searched were also more extensive than those previously considered. At least 17 additional articles, published between October 2014 and October 2017, were reviewed in the second selection phase. The purpose of this SR was to investigate the available scientific literature to evaluate the reliability of different 3D cephalometric landmarks in CBCT imaging.
MATERIALS AND METHODS
Protocol and Registration
This SR followed as closely as possible the methodology detailed by the PRISMA guidelines7 for the transparent reporting of SRs and meta-analyses.
Eligibility Criteria
An extensive search of available scientific literature was carried out electronically, with only those studies that examined the reliability of 3D cephalometric anatomic landmarks using CBCT considered for review. No language or study restrictions were placed. Unpublished materials were not excluded.
Information Sources
Databases searched included PubMed, MEDLINE via OvidSP, EBMR and EMBASE via OvidSP, Scopus, and Web of Science. To ensure that a wide range of academic literature was well represented, Google Scholar was used as an adjunctive search tool to discover other scholarly sources that may have existed. The first 100 relevant hits were evaluated from this “gray literature” and considered for inclusion.
Search
Strategic design was developed through consultation with a health sciences research librarian using appropriate keywords and their combinations. The full electronic search strategy for each database is illustrated in Table 1.
Study Selection
Evaluation of selected articles was staged in a two-step process to determine eligibility. First, each individual article title and abstract was screened by two reviewers (Dr Sam, Dr Currie) independently. The aim of this step was to ensure each article pertained to the following topics: 3D imaging, anatomic landmarks, cephalometric analysis, and accuracy and/or reliability of findings. Next, decisions for final eligibility were made based on full-text assessments by the same reviewers. They were not blinded to the authors nor results of the studies. Any disagreements between reviewers were resolved by discussion or by introduction of a third reviewer (Dr Lagravére-Vich) to mediate when deemed necessary.
Data Collection
Data collection was done in duplicate. Key features of eligible articles were documented by each reviewer. Statistical results and conclusions of every study were also retrieved.
Risk of Bias Among Included Studies
Individual articles then underwent a methodological quality scoring, adapted from a process described in a previous related study with modifications based on a research methodology series for reliability articles.8,9 Each criterion for judgment was open to discussion among reviewers with the aim of limiting the risk of bias and serving as a baseline for assessments. The way in which points were awarded is detailed in Table 2. Each article received a grading score and was then categorized per its overall quality of evidence and strength of its recommendations. Articles were categorized into groupings based on the methodological quality/magnitude of scoring: excellent or high (76% or more), good or moderate (51%–75%), and poor or low (50% or less). It must be noted that this is a nonvalidated assessment tool.
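The grouping described above can be sketched as a small function. This is only an illustration: the function name is hypothetical, the scoring items themselves are those of Table 2 and are not reproduced, and rounding the percentage to the nearest whole number before applying the cut-offs is an assumption made here because the text does not say how fractional percentages were handled.

```python
def quality_category(points_awarded, points_possible):
    """Map a methodological score to the three groupings used in this review."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    percent = 100.0 * points_awarded / points_possible
    # Assumption: round half up to a whole percentage before categorizing.
    whole_percent = int(percent + 0.5)
    if whole_percent >= 76:
        return "excellent or high"
    if whole_percent >= 51:
        return "good or moderate"
    return "poor or low"


# Hypothetical scores, for illustration only.
for awarded, possible in [(19, 25), (15, 20), (10, 20)]:
    print(awarded, possible, quality_category(awarded, possible))
```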
Synthesis of Results
A meta-analysis was not justifiable for this topic as studies were very diverse, both in study design and report of relevant findings. However, it may be possible to complete one in the future if reliability measures of only a limited number of landmarks are combined for applicable included studies.
RESULTS
Study Selection
A final total of 13 articles satisfied the selection criteria and were included in this review. A detailed outline of the selection process, from identification through articles included, is illustrated in Figure 1.10
In comparison to the previously published SR,6 this follow-up gained three additional articles11–13 and excluded four considered in the previous review. The search of the previous study ended much earlier (October 2014) and combined reliability studies using both human patient scans and dry human skulls. Thus, any discrepancy between the prior SR and the present one reflects these differences. One of the three additional articles11 retrieved by this review yielded excellent or high methodological quality scoring. Considering it was one of only four included articles to obtain this scoring overall, it had the potential to offer very useful insight into this area of study. The second12 introduced new landmarks and measurements to shift the traditional 2D cephalometric analysis paradigm toward a novel 3D one. Lastly, the third offered insights into the use of landmark-based superimposition in 3D.14
Study Characteristics
Selected articles were published between 2008 and 2017 in several diverse medical/dental journals. All were written in English apart from two: one was in French and another in Korean. These articles were obtained, although English versions were not accessible at the time and the decision was made for them to be excluded. All were retrospective and cross-sectional in nature (data collected before the research project). Summary characteristics of the included articles are described in detail in Table 3.1–3,5,11–19
The aim of all studies was to investigate first the reliability (intra- and/or interrater measurements) of anatomic landmarks in 3D cephalometric analysis, reported statistically with one or more of the following: intraclass correlation coefficients (ICCs), Bland-Altman testing, mean error and standard deviations, 3D scatterplots, and Pearson correlation coefficient.
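As an illustration of one of the statistics listed above, the following sketch computes the Bland-Altman bias and 95% limits of agreement for two sets of repeated measurements. The measurement values are invented, the function name is hypothetical, and the conventional 1.96 multiplier is assumed; the included studies may have applied the method differently.

```python
import statistics


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement
    are bias +/- 1.96 times the sample standard deviation of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical repeated measurements (mm) of one distance on four scans.
session_1 = [10.2, 11.0, 9.8, 10.5]
session_2 = [10.0, 11.3, 9.9, 10.1]
bias, lower, upper = bland_altman(session_1, session_2)
print(f"bias = {bias:.2f} mm, limits of agreement = [{lower:.2f}, {upper:.2f}] mm")
```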
The methodological assessment tool used is outlined in Table 2. The summary of the scores imparted to reliability articles is found in Table 4.1–3,5,11–19 In general, weaknesses included inadequate description of sample characteristics of subjects (eg, sex, age, inclusion criteria, exclusion criteria, specific database used), no justification or calculation for sample sizes, and lack of explanation regarding dealing with confounders such as exclusion criteria and employment of randomization.
Risk of Bias Within Studies
A possible source of bias within each article was based on timing of records. As all studies were retrospective and it is ethically unsound to expose patients to radiation solely for research purposes, investigators were reliant on the use of preexisting data sets for subject populations. As this review was interested in reliability, it became problematic if a study utilized a set not representative of the spectrum of individuals to generalize findings in a research or clinical context. Although a few studies mentioned the use of randomization, few described how, and none reported using sequence generation within their data set to ensure randomization was somewhat reflected when extracting their subject sample.
Results of Individual Studies
Because of the heterogeneity of studies, specific characteristics of each and key data are reported in Table 3. Notable statistical results, as detailed in Tables 5 through 17, encompass a summary of pertinent statistical reliability measures for various landmarks listed by included studies. Typically, intraexaminer reliability was higher than interexaminer reliability in landmark identification. Skeletal landmarks presented similar reliabilities compared with dental ones; variability was dependent on challenges a specific location posed.
Table. Summary of Notable Statistical Results for Schlicher et al. (2012)16
In general, midsagittal plane landmarks tended to demonstrate better consistency in identification compared with bilateral landmarks. The ease of locating landmarks along midlines may come naturally to most clinicians, as manipulation and interpretation of CBCT sagittal views closely resemble those of 2D lateral cephalograms. Midsagittal plane landmarks recommended for use in 3D included Sella, basion, nasion, anterior nasal spine, A-point, B-point, pogonion, gnathion, and menton. Bilateral landmarks demonstrating variable consistency in identification included those on the condyles, orbitale, porion, and lingula. Identification was further complicated for landmarks located along broad curvatures or with indistinct boundaries, which were more difficult to locate and thus more prone to error. Dental landmarks demonstrating the greatest consistency were incisor crown tips, tooth root apices, and defined points on teeth. Some nontraditional landmarks recommended for use were infraorbital foramina, mental foramina, and possibly frontozygomatic sutures. Novel 3D landmarks, namely maxillary and mandibular centroid landmarks, also showed favorable reliability.
Synthesis of Results
A meta-analysis was not possible. Methodologies of the selected studies were highly heterogeneous, posing a challenge to the consideration of combining results together. In addition, not all studies evaluated the same landmarks, making the comparison more challenging. Some of these were traditionally used cephalometric landmarks, whereas others were nontraditional in nature.
Risk of Bias Across Studies
The more observers involved in measurements of a single study, the greater the potential for measurement error due to differences in individual expertise. Also, as some authors were involved in more than one study of this sort, it is possible that the same preexisting database of patient CBCT scans was used across multiple studies. If this were the case, it could pose a significant problem, as it would artificially inflate the reliability values.
Additional Analysis
No additional analyses were performed.
DISCUSSION
Summary of Evidence
Bilateral landmarks, including midramus, orbitale, ramus point, and sigmoid notch, demonstrated more consistent identification in 3D than 2D. This is likely explained by the 2D limitation of structural superimposition being overcome. Each left and right side of a landmark could be evaluated independently, in a specific location in all three planes of space, without any other structures impeding its interpretation. Because observers are less accustomed to landmarking along a transverse axis than in the sagittal plane familiar from 2D, bilateral landmarks tended to show more variability than those located in the midline. De Oliveira et al.19 found that two bilateral landmarks demonstrated poor reliability in one of the three axes: the ramus in the y-coordinate and the condylion in the z-coordinate. Many bilateral landmarks are located along broad curvatures and pose a challenge for the eye to detect the most prominent point or depression of the structure at hand. Landmark identification error may differ among the axes, and as such, certain landmarks were useful in detecting changes in one axis but not another.2 Landmarks that demonstrated considerable variability in the x-coordinate were not suitable for use in width (transverse) measurements of the dentofacial complex. For example, condylion, orbitale, and porion demonstrated statistically greater variability in the mediolateral direction, or x-axis, in multiplanar reconstruction views (MPRV) and may not be suitable for use in taking width measurements. A possible explanation for bilateral landmarks showing more variability in the x-axis is the inadequacy of landmark definition in this dimension.1
A main limitation of 2D imaging is that a 3D object is reduced to two planes of space. More precise landmark identification was obtained with most MPRV in 3D than in 2D cephalograms. Of these landmarks, Sella demonstrated the lowest variability of 0.7 mm, whereas soft tissue pogonion showed the highest variability of 2.6 mm.1 This was in agreement with another study that found a high reproducibility (ICC >0.9) of all measurements traditionally employed in 2D cephalometric analyses.3 In contrast, another study found that intraobserver identification for only specific landmarks was greater in 3D than in 2D.18 This could be explained by the fact that this study included a total of 27 landmarks, which was more than included in the prior two studies. In fact, the higher number of landmarks analyzed afforded a benefit, as meaningful errors were more likely noted. It is important to realize that ease of identifiability of points does not necessarily translate to meaningful implications. This may provide an artificial sense of reliability.
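For readers less familiar with the ICC values quoted above, the following sketch computes one common form, the two-way random-effects, absolute-agreement, single-measurement coefficient often written ICC(2,1). Which ICC form each included study used is not stated here, so this choice is an assumption, as are the function name and the invented measurements.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater.

    Uses the two-way ANOVA mean squares: between subjects (rows),
    between raters (columns), and residual error.
    """
    n = len(ratings)       # number of subjects (e.g., scans)
    k = len(ratings[0])    # number of raters or sessions
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and columns")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical measurements (mm) of one distance on five scans by two observers.
measurements = [
    [42.1, 42.4],
    [38.7, 38.5],
    [45.0, 45.6],
    [40.2, 40.1],
    [43.3, 43.9],
]
print(f"ICC(2,1) = {icc_2_1(measurements):.3f}")
```

Because this form penalizes systematic offsets between raters as well as random scatter, it is a stricter measure than a Pearson correlation computed on the same data.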
Sometimes a discrepancy between the reliability of identifying left vs right structures is apparent. The manifestation may be attributed purely to the individual examiner's systematic error. Another hypothetical and plausible explanation is that this could be the neuropsychological linkage between left- and right-handedness and its effect on preferences of the human brain. Right-handed artists have been shown to prefer their subjects on the right with light sources from the left. Left-handed artists tend to demonstrate the opposite trend.20 Extrapolating, this may imply some influence of handedness in an evaluator's spatial orientation of CBCT scans and their identification of 3D landmarks.
There was a recognizable trend that midline landmarks such as Sella and A-point showed the same consistency, if not greater, in landmark identification as in 2D. In contrast to bilateral structures, this may actually be facilitated by the familiarity of observers with radiographic interpretation in the sagittal plane, used in lateral head films. The MPRV display in 3D software provides an avenue for limiting the magnitude of superimposition of multiple structures, as slices can be set to a particular thickness when investigating an area of interest.
Three-dimensional objects occupy a specific location on an x-y-z coordinate system. Although the maximum mean difference was minimal, one study noted that the y-coordinate was more reliable than the x- and z-coordinates among observers. The least reliable landmark identifications in these axes were as follows: condylion in the x-coordinate or mediolateral direction, ramus point closely followed by tuberosity in the y-coordinate or anteroposterior direction, and condylion in the z-coordinate or caudal-cranial direction.19 In contrast, Chien et al.18 highlighted that some difficulty arose in determining the best estimates of the y-locations for gonion, L1 tip, Sella, and U1 tip. Difficulty arose locating the y-location of structures such as gonion, midramus, and ramus point, since a precise vertical position must be established along these broadly curved structures in 2D and 3D. Most of these inaccuracies were linked to a line parallel to its curvature. Specifically for 3D, erroneous measures may be attributed to the inappropriate use of surface display shading used by the operator.18
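The axis-by-axis comparison described above can be sketched as follows. The coordinates are invented, the function names are hypothetical, and mean absolute difference between two observers is used here as a simple stand-in for the reliability measures reported by the individual studies.

```python
from statistics import mean

AXES = ("x", "y", "z")  # assumed: mediolateral, anteroposterior, caudal-cranial


def per_axis_error(observer_a, observer_b):
    """Mean absolute difference per axis between two observers.

    Each argument is a list of (x, y, z) coordinates in mm for the same
    landmark on the same scans, in the same order.
    """
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    return {
        axis: mean(abs(a[i] - b[i]) for a, b in zip(observer_a, observer_b))
        for i, axis in enumerate(AXES)
    }


def least_reliable_axis(errors):
    """Return the axis with the largest mean absolute difference."""
    return max(errors, key=errors.get)


# Hypothetical placements of one bilateral landmark on two scans.
observer_a = [(50.0, 10.0, 5.0), (51.0, 11.0, 6.0)]
observer_b = [(51.5, 10.2, 5.4), (49.8, 11.1, 6.3)]
errors = per_axis_error(observer_a, observer_b)
print(errors, "least reliable axis:", least_reliable_axis(errors))
```

Reporting error per axis rather than as a single 3D distance makes it possible to judge whether a landmark is acceptable for, say, vertical measurements even when it is unsuitable for transverse ones.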
Most traditionally used cephalometric landmarks were reproducible both in 2D and 3D imaging modalities. Since 3D has the enhanced ability to fulfill the precision of a third dimension, it makes one wonder if there were also nontraditional landmarks that are reproducible for use in 3D analyses. Using 42 newly defined anatomic landmarks, Naji et al.5 concluded that the mean differences of all measurements were less than 1.4 mm. Moreover, if a center coordinate point was chosen using the x-, y-, and z-coordinates to locate a specific landmark, the analysis of its reliability among evaluators was maximized, and these differences were more impactful clinically. In fact, bilateral mental foramina, dens axis, bilateral transversium atlas, bilateral inferior hamulus, right infraorbital foramen, medial right condyle, and lateral left condyle showed 0.5 mm or even less of a difference. However, one should be mindful of its application, as not all nontraditional landmarks should be routinely used.5
Since 3D cephalometric landmarking is still a new concept, thresholds for clinically significant landmark variability are not firmly established. Variability of less than 0.5 mm is unlikely to be clinically significant, whereas variability between 0.5 mm and 1.0 mm may be. This differs from 2D, in which variability of less than 1.0 mm is unlikely to have clinical significance. Thus, if linear and angular measurements were taken using these landmarks, their clinical implications may be considered to be reduced.
When evaluating the effects of software MPRV vs 3D-virtual reconstruction view (3D-VRV) for anatomic landmark identification, two included studies offered valuable insight. MPRV has been shown to be more highly reliable than 3D-VRV when considering these two types of visualizations independently.11 However, most software used to import and view DICOM image formats of CBCT scans have the capability for simultaneous viewing of both modalities.
It was an interesting finding that viewing MPRV and 3D-VRV together did not improve the precision of identifying the upper right and left central incisors and that MPRV alone demonstrated consistency in accurate landmark identification. A reasonable explanation for this is that the root apex of the mandibular incisors is typically difficult to identify in the sagittal view because of the superimposition of the root apices of the anterior teeth.
Limitations
Since CBCT technology has only recently been developed and integrated into routine dental practice, all selected studies were from no earlier than 2009. Although there have been prior attempts to synthesize a single document conveying all research done in understanding applications of landmark identification in 3D techniques, studies covered a broad range of topics and should not be automatically unified. As such, to narrow in on the area of interest, more rigid exclusion criteria were used than in prior reviews.
One exclusion criterion, applied after the synthesis of selected articles, was the use of human dry skulls. This was because the soft tissue attenuation of facial structures could not be fully accounted for, despite researchers' best attempts with fluid-filled units.
All selected studies underwent a methodological quality assessment carried out by a single examiner. There is no gold standard methodological quality assessment tool for reliability studies at this time. This posed difficulties when trying to emphasize the relative weight of certain studies on the overall conclusions.
CONCLUSIONS
Midsagittal plane landmarks demonstrated the highest reliability, followed by bilateral landmarks.
Landmarks with the lowest reliability included those marked on the condyle and other anatomic structures with prominent curvatures without definitive boundaries.
Only a small number of dental landmarks were reported, with many demonstrating good to excellent reliability.