To compare the efficacy and efficiency of treatment with clear aligners (CAT) vs fixed appliances (FAT) in adolescents with Class I and II moderate to severe malocclusions.
One operator’s (Garfinkle) cases from 2014 to 2019, started at age 12–18 years, with pre- and posttreatment records were identified and used according to an institutional review board–approved protocol. Records were measured by two calibrated, blinded investigators, aided by software (OrthoCAD [Cadent, Fairview, N.J.], Dolphin Imaging & Management Solutions [Chatsworth, Calif]). Discrepancy index (DI) and cast radiograph evaluation (CRE) scores, treatment duration, number of scheduled and emergency visits, and reported appliance and interarch elastic wear compliance were compared between groups using Wilcoxon rank sum and Fisher’s exact tests. Cephalometric superimpositions were completed to evaluate craniofacial growth and dental changes.
Records from 72 cases met the criteria and were included. For the 47 CAT and 25 FAT cases, mean DI (21 ± 5 and 24 ± 8, respectively; P = .20) and CRE (35 ± 10 and 34 ± 9, respectively; P = .90) scores were not significantly different. Other case attributes and reported appliance and interarch elastic wear compliance were also not significantly different. CAT vs FAT cases had significantly shorter treatment durations (24 ± 6 vs 27 ± 5 months; P = .01) and fewer scheduled visits (16 ± 5 vs 24 ± 4; P < .01), but emergency visit numbers were not significantly different (2 ± 2 vs 3 ± 2; P = .08).
In adolescents with Class I and II malocclusions and moderate to severe DI scores, on average, CAT vs FAT cases were completed 3 months faster with eight fewer visits, but treatment efficacy was not significantly different.
INTRODUCTION
Commercial clear aligner systems were introduced in 1999 and have since gained popularity. Initial studies comparing clear aligner treatment (CAT) with fixed appliance treatment (FAT) were limited due to lack of provider experience, CAT attachments,1 and objective measures of case difficulty and outcome to compare other modalities.2 Recent systematic reviews have included studies using the discrepancy index (DI) and cast-radiograph evaluation (CRE) to determine case difficulty and outcomes, respectively. These reviews found that CAT had a shorter mean treatment time by 6.3 months3 and was good for mild-to-moderate malocclusions and nonextraction treatment,4 whereas FAT had better outcomes for occlusal contacts, torque control,3 extraction space closure, and correcting vertical and antero-posterior discrepancies.4 Notably, the full array of malocclusions that are treated currently with CAT should be studied.5 That is, few studies have compared CAT vs FAT effectiveness for cases of moderate to severe difficulty or for different classes of malocclusion; even fewer used DI and CRE scores for objective measures.
One retrospective study6 that compared CAT and FAT in adolescents with mild malocclusions defined by similar DI scores of 12 ± 5 and 12 ± 5, respectively, showed that CAT vs FAT cases (N = 26 per group) had significantly lower (more ideal) CRE scores (30 ± 8 vs 37 ± 8; P = .01), fewer scheduled visits (14 ± 4 vs 19 ± 4; P = .0001), fewer emergency visits (1 ± 1 vs 4 ± 2; P = .0001), and shorter treatment durations (17 ± 6 vs 23 ± 4 months; P = .0001). It is unclear if the results of this study translate to more difficult cases or differing classes of malocclusions.
The current retrospective study aimed to address these information gaps. Thus, the null hypotheses were that in adolescents with Class I and II malocclusions and moderate to severe DI scores, there were no differences between CAT and FAT in (1) CRE scores; (2) appliance wear compliance, treatment duration, and numbers of scheduled and emergency visits; and (3) CRE scores within the CAT and FAT groups when comparing Class I and Class II malocclusions.
MATERIALS AND METHODS
Data were collected per an institutional review board–approved protocol from the 2014 to 2019 records of one experienced operator’s (Garfinkle) cases treated by CAT (Invisalign, Align Technology, San Jose, Calif) or FAT (0.022 × 0.028-inch Damon Q, Ormco, Orange, Calif), as chosen by patients. Pre- and posttreatment records consisted of digital models, panoramic radiographs, and lateral cephalograms. Charted data regarding each patient’s age, sex, treatment duration (months), use of and compliance with (patient-reported percentage of time/day) interarch elastics and/or Class II–correction appliances (Carriere Motion Appliance [CMA], Henry Schein Orthodontics, Carlsbad, Calif), extractions, and number of scheduled and emergency visits, were recorded.
The inclusion criteria for cases were (1) complete set of pre- and posttreatment records, (2) 12–18 years of age at treatment start, and (3) moderate to severe malocclusion determined by DI scores ≥16. Exclusion criteria for cases were prior orthodontic treatment; Class III malocclusion; diagnosed syndrome or craniofacial anomaly; teeth that were congenitally missing, extracted for nonorthodontic reasons, or impacted; and treatment ceased before completion. Target sample sizes were 26 per group based on the previous study that used similar approaches and demonstrated significant differences in CRE scores.6 Hence, cases were selected from those most recently completed, working backward in time. All cases were given an identification number, and the investigators were blinded to group assignment.
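The screening rules above can be expressed as a simple eligibility filter. The following is a minimal sketch only: the record structure and every field name are hypothetical and were not part of the study's data collection, and treating "12–18 years" as inclusive numeric bounds is an assumption.

```python
from dataclasses import dataclass

DI_THRESHOLD = 16  # moderate to severe malocclusion, DI >= 16 (from the text)


@dataclass
class CaseRecord:
    # All field names are hypothetical, chosen for illustration only.
    age_at_start: float
    di_score: int
    has_complete_records: bool
    prior_orthodontic_treatment: bool = False
    class_iii: bool = False
    syndrome_or_craniofacial_anomaly: bool = False
    # Congenitally missing, extracted for nonorthodontic reasons, or impacted teeth
    missing_or_impacted_teeth: bool = False
    treatment_ceased_early: bool = False


def is_eligible(case: CaseRecord) -> bool:
    """Return True if the case meets all inclusion and no exclusion criteria."""
    included = (
        case.has_complete_records
        and 12 <= case.age_at_start <= 18  # inclusive bounds are an assumption
        and case.di_score >= DI_THRESHOLD
    )
    excluded = any(
        [
            case.prior_orthodontic_treatment,
            case.class_iii,
            case.syndrome_or_craniofacial_anomaly,
            case.missing_or_impacted_teeth,
            case.treatment_ceased_early,
        ]
    )
    return included and not excluded


# Hypothetical example records
example_included = CaseRecord(age_at_start=14.5, di_score=21, has_complete_records=True)
example_low_di = CaseRecord(age_at_start=14.5, di_score=15, has_complete_records=True)
```

Because the DI score is needed for the third inclusion criterion, such a filter could only be applied in full after the pretreatment records had been measured.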
Records were assessed by two of us (CE, WW). Pretreatment digital models and lateral cephalograms were evaluated using digital model software (OrthoCAD version 220.127.116.11, Cadent, Fairview, N.J.) and cephalometric software (Dolphin Imaging & Management Solutions, Imaging 11.95, Chatsworth, Calif) to calculate the DI score for each case. Measurements of subspinale–nasion–supramentale (ANB), mandibular plane–Sella–Nasion (MPSN), and incisor long axis–mandibular plane (IMP) angles were retraced and remeasured if they differed in value by more than 2.7, 4.7, and 7.5 degrees, respectively, based on investigations of reliability and reproducibility of cephalometric measurements.7,8 Posttreatment digital models and panoramic radiographs were evaluated to calculate CRE scores. All DI and CRE score measurements were rounded to the nearest millimeter. To assess inter- and intraexaminer reliabilities, five CAT and five FAT cases were randomly selected and remeasured by both of the investigators ≥1 week after the initial assessments.
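The retrace rule for the cephalometric angles amounts to a threshold check on paired measurements. The sketch below assumes the comparison is between two measurements of the same angle (for example, one per investigator); the text does not state which pair of values was compared, and the function and variable names are hypothetical.

```python
# Maximum tolerated difference in degrees before retracing (values from the text)
REMEASURE_THRESHOLDS_DEG = {"ANB": 2.7, "MPSN": 4.7, "IMP": 7.5}


def angles_to_remeasure(first, second):
    """Return the angle names whose two measurements differ by more than the threshold.

    `first` and `second` are dicts mapping angle name to degrees.
    """
    return [
        name
        for name, limit in REMEASURE_THRESHOLDS_DEG.items()
        if abs(first[name] - second[name]) > limit
    ]


# Hypothetical measurements for one case
measurement_a = {"ANB": 4.0, "MPSN": 32.0, "IMP": 95.0}
measurement_b = {"ANB": 7.0, "MPSN": 35.0, "IMP": 101.0}
flagged = angles_to_remeasure(measurement_a, measurement_b)
```

In this hypothetical case only ANB exceeds its limit, so only that angle would be retraced.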
One of us (Chou) conducted superimpositions using the pre- and posttreatment lateral cephalograms following American Board of Orthodontics guidelines.9 The superimpositions were reviewed, final results were determined by consensus of three of us (Chou, Nickel, Iwasaki), and outcomes were then assessed qualitatively.
Data and Statistical Analyses
The mean, standard deviation, and median were calculated for each group for the continuous variables of age (years), treatment duration (months), number of scheduled and emergency visits (total and non–CMA-associated emergency visits), reported CMA and interarch elastic wear compliance (percentage of time), and total and component scores for DI and CRE. Percentages of cases within each group were calculated for sex, type of malocclusion, Class II treatment with CMA, extractions, and interarch elastics.
Between-group differences in medians for all of the continuous variables were analyzed using Wilcoxon rank sum tests. Proportions of boys and girls; cases treated with and without CMA, extractions, and interarch elastics; and cases with CRE scores <30 were analyzed using Fisher’s exact tests, with significance defined as P < .05 and power (1 − β) ≥ 0.85. Intraclass correlation coefficients (ICC) and 95% confidence intervals were estimated using a mean rating (k = 2) based on absolute agreement and two-way mixed-effects modeling to assess intra- and interinvestigator reliabilities. Reliability levels were defined as poor, <0.5; moderate, 0.5–0.75; good, 0.75–0.90; and excellent, >0.90.10 All analyses were conducted using statistical software (R [R Foundation for Statistical Computing, Vienna, Austria]).
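To illustrate the between-group comparison of continuous variables, the following is a minimal Python sketch of a two-sided Wilcoxon rank sum test. It is not the study's analysis code (the analyses were run in R). It uses the normal approximation with a tie correction and no continuity correction, which is an assumption and may give P values that differ slightly from exact methods. The example data are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (W, z, two-sided P) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = midranks(combined)
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n + 1) / 2
    tie_counts = {}
    for value in combined:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t**3 - t for t in tie_counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:  # all observations identical: no evidence of a difference
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p_two_sided


# Hypothetical treatment durations in months (not study data)
cat_months = [18, 20, 22, 24, 25, 30]
fat_months = [23, 26, 27, 28, 31, 33]
w_stat, z_stat, p_value = wilcoxon_rank_sum(cat_months, fat_months)
```

For small samples without ties, statistical packages typically report an exact P value instead, so results from this sketch should be read as approximate.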
RESULTS
Records of 558 cases with treatment starts from May 2014 to August 2019 at ages 12–18 years were screened. Main reasons for case exclusion were congenitally missing teeth (27), treatment ceased before completion (19), and impacted canines (14). After DI scores were calculated, 72 cases (47 CAT and 25 FAT) met the criteria and were included. Pretreatment DI scores for the CAT (21 ± 5) and FAT (24 ± 8) groups were not significantly different (P = .20; Table 1). For CAT vs FAT cases, age at treatment start, proportions of boys and girls and Class I and Class II cases, and compliance with CMA and interarch elastic wear were also not significantly different (Table 2). However, CAT vs FAT showed a significantly higher proportion of Class II cases treated with CMA (97% vs 57%; P < .01) and significantly lower proportions of cases treated with extractions (0% vs 24%; P < .01) and interarch elastics (79% vs 100%; P = .01; Table 2).
Regarding efficacy, total CRE scores for the CAT and FAT groups were 35 ± 10 and 34 ± 9, respectively, and not significantly different (P = .90; Table 3). However, comparing CRE components for CAT vs FAT showed that occlusal contact scores were significantly worse (9 ± 5 vs 5 ± 3; P < .01), whereas overjet scores were significantly better (6 ± 4 vs 8 ± 3; P < .01), as were root angulation scores (3 ± 2 vs 4 ± 2; P = .02; Table 3). Comparisons of Class I vs Class II outcomes for both CAT and FAT showed no significant differences between components of or total CRE scores (Tables 4 and 5, respectively; all P ≥ .20). The numbers and percentages of cases with CRE scores <30 for CAT and FAT groups were 15/47 (32%) and 8/25 (32%) and were not significantly different (P = 1.00).
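The comparisons of proportions can be checked from the reported counts. The sketch below implements a two-sided Fisher's exact test in plain Python and applies it to two 2 × 2 tables taken from the text: cases with CRE scores <30 (15 of 47 CAT vs 8 of 25 FAT) and cases treated with extractions (0 of 47 CAT vs 6 of 25 FAT). It is an illustration, not the study's code. The two-sided rule used here, summing all tables no more probable than the observed one, is a common convention but is stated as an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, total)


# Rows: CAT, FAT. Columns: CRE < 30, CRE >= 30 (counts from the text)
p_cre_under_30 = fisher_exact_two_sided(15, 32, 8, 17)

# Rows: CAT, FAT. Columns: extraction, nonextraction (counts from the text)
p_extractions = fisher_exact_two_sided(0, 47, 6, 19)
```

Under these assumptions the first table gives a P value of 1.00 and the second a P value below .01, in line with the values reported in the text.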
Regarding efficiency (Table 6), for CAT vs FAT cases, treatment durations were significantly shorter (24 ± 6 vs 27 ± 5 months; P = .01) as were the numbers of scheduled visits (16 ± 5 vs 24 ± 4; P < .01), but the numbers of emergency visits were not significantly different (2 ± 2 vs 3 ± 2; P = .08). Segregating by malocclusion type showed that Class I cases in the CAT vs FAT group had significantly fewer emergency visits (1 ± 1 vs 3 ± 2; P = .04).
ICC were not computed for the DI and CRE components of lateral open bite and interproximal contacts, respectively, because of the many zero values for these measurements and the limitations of the ICC formula to account for true zero values. For all other DI component scores, intrainvestigator reliabilities were good–excellent (ICC ≥0.89). For CRE scores, intrainvestigator reliabilities for buccolingual inclination, occlusal contacts, occlusal relation, and root angulations were excellent (ICC ≥0.90), good for alignment and marginal ridges (ICC = 0.78–0.85), and moderate for overjet (ICC = 0.67). All components of DI and CRE scores showed ICC ≥0.99, indicating excellent interinvestigator reliabilities.
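For readers unfamiliar with the reliability statistic, the following sketch computes an absolute-agreement, average-measures ICC from a two-way layout and maps it to the reliability levels defined in the Methods. It is an illustration under stated assumptions, not the study's R code: the formula follows the standard mean-squares form for this ICC type, confidence intervals are omitted, the ratings are hypothetical, and assigning boundary values (0.75 to "good", 0.90 to "good") is an assumption because the published ranges overlap at their limits.

```python
def icc_absolute_agreement_mean(ratings):
    """ICC for absolute agreement, mean of k ratings, from an n x k table.

    `ratings` is a list of n rows (cases), each with k measurements.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def reliability_level(icc):
    """Map an ICC to the levels in the text; boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical scores for three cases measured twice (not study data)
example_ratings = [[1, 2], [2, 3], [3, 4]]
example_icc = icc_absolute_agreement_mean(example_ratings)
example_level = reliability_level(example_icc)
```

When every score in the table is identical, for example all zeros, the denominator is zero and the function raises `ZeroDivisionError`, which is one way to see why components dominated by zero values are not suited to this statistic.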
DISCUSSION
The objectives of this study were to compare the efficacy and efficiency of CAT vs FAT in adolescents with Class I and II moderate to severe malocclusions. No significant differences were noted between the CAT and FAT groups in DI scores, CRE scores, or proportions of boys and girls, nor between malocclusion types within the CAT and FAT groups, and the two treatment groups had equal percentages of cases with a CRE score <30. However, the CAT group had significantly shorter treatment durations by 3 months and fewer scheduled visits by eight when compared with the FAT group. Class I cases in the CAT group also had significantly fewer emergency visits: one compared with three for the FAT group.
Pretreatment variables evaluated (Table 1) were not significantly different between the two groups, but some treatment-related variables differed. No cases were treated with extractions in the CAT group compared with six cases in the FAT group, the percentage of cases treated with interarch elastics was significantly smaller in the CAT group, and the percentage of Class II cases treated with CMA was significantly larger in the CAT group. Patients chose the treatment modalities, so these differences likely reflected operator preferences.
The results of the lateral cephalogram superimpositions showed some general differences in outcomes; for example, IMPA was maintained in the CAT group for Class I and II malocclusion cases (Figures 1 and 2), except in one case (#8) where no mandibular growth was noted and CMA was used (Figure 2), whereas the FAT group showed a general increase in IMPA (Figures 3 and 4), which could be at least partly attributed to the −3 degree torque prescribed by the mandibular incisor bracket in the system used. The superimpositions of the Class I cases also indicated that the FAT group had more cases of mandibular growth in addition to more mandibular incisor proclination (Figure 3). The mandibular changes were likely from normal growth, as no orthodontic growth modification was used in the Class I cases. The increased mandibular incisor proclination seen in the FAT cases could likely be attributed to resolving crowding by proclination as well as the prescribed bracket torque.
Compared with the previous study on adolescents with mild malocclusions, reflected by DI scores of 12 for both groups,6 the current study involved adolescents with moderate to severe malocclusions, as reflected by much larger average DI scores of ≥21. In the previous study, the average total CRE scores for CAT vs FAT in mild malocclusions were significantly better (30 ± 8 vs 37 ± 8; P = .01), as were component scores for alignment, occlusal relations, and overjet. The current study demonstrated no significant differences in total CRE scores for adolescents with moderate to severe malocclusions, but similarly, overjet scores were significantly better in the CAT vs FAT cases.
CRE component scores (Table 3) in the CAT vs FAT group in this study were significantly larger for occlusal contacts and smaller for overjet and root angulations. The difference in occlusal contacts is likely due to occlusal coverage of clear aligners, which prevents teeth from settling into occlusion as they could with fixed appliances.1,11,12 The use of digital software in the CAT cases has the potential for improved visualization, evaluation, and planning13 and could have contributed to better overjet scores compared with the FAT cases. Better root angulation scores for the CAT vs FAT group contradict findings of previous studies.1,4 Interestingly, the FAT cases treated with extractions averaged two points in root angulation, whereas the average for total FAT cases was four points. Many points were from distally tipped mandibular second molars, which were likely an adverse effect of leveling the mandibular arch using a wire with a reverse curve of Spee.
When evaluating treatment efficiency (Table 6), previous studies found that the CAT had a shorter treatment time than the FAT by 4–6 months for mild-to-moderate cases.14,15 Specifically, when compared with the previous study on adolescents with mild malocclusions, where CAT vs FAT had significantly shorter treatment durations by 6 months, fewer scheduled visits by five, and fewer emergency visits by three, the current study similarly showed significantly shorter treatment durations and fewer scheduled visits by 3 months and eight visits, respectively, whereas the numbers of emergency visits were not significantly different for moderate to severe malocclusion cases. The difference in number of scheduled visits in the current study could be due in part to the differences in scheduled visit intervals, which were 8–12 weeks for CAT and 6–10 weeks for FAT cases. Although the numbers of emergency visits were not different between groups in the current study, differences in the nature of the emergency visits between CAT and FAT could contribute differently to treatment duration. That is, emergency visits for CAT cases commonly involved broken attachments or lost aligners, which would not cause serious delays in treatment if tooth positions could be approximately maintained by an available aligner. However, emergency visits for FAT were commonly associated with debonded brackets, which could result in the loss of targeted tooth positions and the potential to extend treatment duration. CAT compliance was also considered as a factor in the current study, but the numbers of cases excluded due to cessation of treatment before completion in the two groups were not significantly different. In addition, the uneven distribution of extraction cases was considered, but when those cases were excluded, the average FAT treatment time decreased by only 1 month and the number of scheduled visits remained the same.
Overall, the average numbers of emergency visits per case were small (≤3), so although there were significantly fewer emergency visits in Class I malocclusion cases treated with CAT compared with FAT, this may not be clinically important.
There were several limitations associated with the current study. First, it is important to note that these cases were treated by one provider, so operator bias may limit the generalizability of the results. In addition, the number of cases that met the inclusion criteria in the FAT group fell just below the number identified by the previous study and power analysis.6 There were almost twice as many cases in the CAT group as in the FAT group, and categorical variables such as CMA, elastic wear, and extractions could not be properly matched between groups. In addition to modifying the inclusion and exclusion criteria, future studies could involve cases from multiple providers and practices, for example, using a practice-based research network model, to increase the sample sizes, account for operator bias, and improve the generalizability of the results. Cases could also be identified over a shorter time period, which could reduce variations in the CAT systems used compared with the 4.5 years for the current study, which encompassed two aligner system modifications by the manufacturer (Invisalign G5 January 2015 and Invisalign G6 October 2016).
CONCLUSIONS
In adolescents with Class I and II malocclusions and moderate to severe DI scores, there were no significant differences between the CAT and FAT groups in efficacy, in terms of CRE scores, or in appliance wear compliance.
However, CAT was more efficient in terms of treatment duration by 3 months and scheduled visits by eight compared with FAT.
There were no significant differences in CRE scores within the CAT and FAT groups when comparing Class I and Class II malocclusions.
Superimpositions indicated greater lower incisor proclination in the FAT group compared with the CAT group.
This work was based in part on a master of science in orthodontics thesis. The contributions of Wen Wu and Christopher Elkhal as calibrated and blinded measurers and Kim Theesen who prepared the figures are gratefully acknowledged.
Assistant Professor, Division of Orthodontics, Department of Oral and Craniofacial Sciences, School of Dentistry, Oregon Health & Science University, Portland, Ore, USA.
Professor and Advanced Education Program Director, Division of Orthodontics, Department of Oral and Craniofacial Sciences, School of Dentistry, Oregon Health & Science University, Portland, Ore, USA.
Professor, Oregon Health & Science University-Portland State University School of Public Health, Oregon Health & Science University, Portland, Ore, USA.
Adjunct Associate Professor, Division of Plastic and Reconstructive Surgery, School of Medicine, Oregon Health & Science University, Portland, Ore, USA.
Professor, Division of Orthodontics, and Chair, Department of Oral and Craniofacial Sciences, School of Dentistry, Oregon Health & Science University, Portland, Ore, USA.