ABSTRACT
To evaluate gingival phenotype (GP) and thickness (GT) using visual, probing, and ultrasound (US) methods and to assess the accuracy and consistency of clinicians to visually identify GP.
The GP and GT of maxillary and mandibular anterior teeth in 29 orthodontic patients (mean age 25 ± 7.5 years) were assessed using probing and US by a single examiner. General dentist and dental specialist assessors (n = 104) were shown intraoral photographs of the patients, including six repeated images, and asked to identify the GP via a questionnaire.
An increasing trend in GT values across the thin, medium, and thick biotype probe categories was found, though this was not statistically significant (P = .188). Comparison of the probing method with determinations of GT made by US yielded slight agreement (κ = 0.12). Using the visual method, agreement between assessors’ first and second GP determinations ranged from poor to moderate (κ = 0.29 to κ = 0.53).
The probe method is sufficient in differentiating between different categories of GP. However, further research is required to assess the sensitivity of the probe method in recognizing phenotypes in the most marginal of cases. Assessors using the visual method lack the ability to identify GP accurately and consistently among themselves.
INTRODUCTION
Maintaining the integrity of the gingival tissues is essential for all facets of dentistry to ensure ideal, long-term clinical outcomes.1 Gingival thickness (GT) and phenotype (GP) are considered useful predictors for the likelihood of achieving favorable esthetic and functional outcomes.2,3
Periodontal phenotype is composed of distinctive anatomic characteristics. Its definition includes GT, GP, keratinized tissue width, thickness of buccal bone, and tooth dimension.4,5 Phenotype is most commonly defined using a binary classification that considers GT ≤1 mm as thin and >1 mm as thick.1,3,6–9 However, the values defining the different phenotype classifications remain a matter of debate.1,2,10,11
It has been suggested that thin and thick phenotypes respond differently to orthodontic, periodontal, surgical, and restorative treatments.1,3,10,12–18 Individuals with thin phenotypes may respond poorly and be prone to the development of gingival recession after excessive orthodontic movements,7,19–21 implant surgery,6,13,22 crown lengthening and root coverage procedures,23,24 nonsurgical periodontal therapy,25 and prosthodontic treatment.22 This is in contrast to thick phenotypes that have shown greater soft tissue resilience and an increased likelihood to have presence of a papilla between an implant and adjacent tooth.12 Therefore, the need for precise determination of GP and GT prior to commencing treatment is crucial in maintaining the long-term health and stability of gingival tissues.
There continues to be no favored or recognized method for determining GP, especially one that can be used repeatedly and reliably to make appropriate classifications. Two methods widely used by clinicians are visual assessment (where different morphological characteristics such as tooth shape and size, contour of the gingiva, width of keratinized tissue, and papilla height are evaluated) and probing (where the transparency of the probe through the gingival tissues is assessed).9,12,26 However, both methods are subjective and offer no empirical measurement.1–3,10,11,23,27 Still, they remain in common use during routine clinical examinations.3,12,23
Alternatively, direct methods of measuring true GT provide actual numerical information and are the most objective methods.1,14,27 However, techniques such as bone sounding and calipers are invasive and require local anesthetic.1,3,7,8,13,14,27–34 In recent years, there has been increasing interest in the use of direct, but noninvasive, methods such as computed tomography and ultrasound (US). Muller and colleagues conducted numerous studies using US to assess GT, which they deemed to have remarkable validity and repeatability.8,9,15–18,35
This study evaluated these common indirect methods and assessed two hypotheses:
There would be no difference in the GP classification afforded by the probing method compared to a direct measurement of GT;
Clinicians could accurately identify GP using the visual assessment method with no difference in the first and second instances of classification.
MATERIALS AND METHODS
The Human Ethics Committee of the University of Western Australia approved this study with relevant patient information and consent documents (reference number: RA/4/20/5449).
Twenty-nine pre-orthodontic patients from the Department of Orthodontics at the Oral Health Centre of Western Australia (OHCWA) were recruited on a volunteer basis from February to July 2019. Patients were included if they were 18 years or over with good oral hygiene. The exclusion criteria were:
Decay, crowns, or fillings of the maxillary and mandibular anterior teeth;
Gingivitis or periodontitis;
Pregnant or lactating;
Smoker; or
Current or past use of any medications known to cause gingival enlargement (calcium antagonists, cyclosporin A, phenytoin).
Visual Assessment
Standardized intraoral photographs of the patients’ anterior teeth and surrounding tissues in optimal occlusion were taken by the same examiner (JK), then cropped and prepared to a standardized format. Any identifiable characteristics were removed to ensure anonymity. The photographs were collated in a web-based questionnaire using Qualtrics Survey Software (Qualtrics, Provo, UT, USA) and distributed via email as a direct web-linked survey. Assessors were recruited on a volunteer basis and comprised general and specialist dentists. Information on how to determine GP using only visual information in the photographs was provided before starting the questionnaire. The assessors submitted an overall GP determination as either thin, medium, or thick. Six duplicate photographs were also inserted randomly to measure intrarater reliability. None of the assessors had been informed of this double scoring.
Probe Transparency
GP was assessed using a Colorvue Biotype Probe (CBP) (Hu-Friedy Mfg. Co., LLC Chicago, IL, USA). It features three colors: white, green, and blue, each representing thin, medium, and thick GP, respectively. The CBP was inserted into the gingival sulcus at the midlabial aspect of each maxillary and mandibular anterior tooth with minimal pressure. Depending on visibility of the colors through the labial gingiva, a GP classification for each tooth was made. All measurements were performed by a single examiner (JK), who had been calibrated against a periodontist (LAM).
Measuring GT
US (Philips Affiniti 70G) was carried out by a dentomaxillofacial radiologist (JA) using a hockey-stick shaped transducer (10 mm × 30 mm; Koninklijke Philips NV, USA) with a frequency of 7–15 MHz to measure the GT of the maxillary and mandibular anterior teeth of each patient. A tasteless gel pad (Aquaflex; Parker Laboratories Inc, Fairfield, NJ, USA) was used as the medium, covering the labial surface of the teeth and gingiva. Prior to this study, this machine was validated against another direct method (transgingival probing).
US images captured at each tooth showed a buccolingual cross section of the enamel, gingiva, and alveolar bone (Figure 1). A total of 348 images were used to measure GT using a perpendicular line drawn from the mucogingival surface to the summit of the alveolar bone crest. Measurements were performed three times, averaged per tooth, and taken to the nearest 0.01 mm. All measurements were performed by a single examiner (JK), calibrated against the radiologist.
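The per-tooth averaging described above can be illustrated with a minimal Python sketch. The function name and the example readings are hypothetical and are not taken from the study data; the sketch only shows the arithmetic of averaging repeated readings and reporting to the nearest 0.01 mm.

```python
from statistics import mean


def mean_gt_mm(readings_mm):
    """Average repeated GT readings for one tooth, rounded to 0.01 mm.

    `readings_mm` is a list of repeated measurements (three in the
    protocol described in the text); the values used below are made up.
    """
    if not readings_mm:
        raise ValueError("at least one reading is required")
    return round(mean(readings_mm), 2)


# Hypothetical triplicate readings for a single tooth
tooth_gt = mean_gt_mm([0.91, 0.95, 0.93])
```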
Buccolingual cross-section of the alveolar bone, gingiva, and enamel produced by US from which GT was measured. GT indicates gingival thickness; US, ultrasound.
Intraexaminer Repeatability
Intraexaminer reproducibility was analyzed in 10 randomly selected patients, who were re-examined with the probing method and whose US images were remeasured 1 week after the initial assessment.
Statistical Analysis
Sample size calculation was based on intraclass correlation coefficient (ICC) in which each tooth was considered as a replicate unit. Therefore, considering an alpha level of 0.05, a power of 0.80, and 0.50 as the minimal acceptable level of ICC, a sample size of approximately 233 teeth (19 subjects) was necessary.36
Data were analyzed using the R environment (R Foundation for Statistical Computing). GT for each patient was compared with the probe classifications using analysis of variance (ANOVA). A weighted kappa coefficient (κ) was used to assess the agreement of the probe method with US as well as its intra- and inter-rater reliability. κ was also calculated for each assessor and used as the response variable in a multivariate linear regression to investigate whether assessor accuracy was related to demographic information. Additionally, κ was calculated to assess intrarater reliability of the assessors using the visual method.
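As an illustration of the ANOVA comparison, the following Python sketch computes a one-way F statistic for GT values grouped by probe category. It is not the authors' R code: the function name and the example values are hypothetical, and obtaining a P value would additionally require the F distribution (available in a statistics package), which is omitted here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> values."""
    samples = [values for values in groups.values() if values]
    k = len(samples)                       # number of non-empty groups
    n = sum(len(values) for values in samples)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and n > number of groups")

    grand_mean = mean(x for values in samples for x in values)
    # Variation of group means around the grand mean
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(v)) ** 2 for v in samples for x in v)
    if ss_within == 0:
        raise ValueError("F is undefined when within-group variance is zero")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mean GT values (mm) per patient, grouped by probe category
example = {
    "thin": [0.70, 0.82, 0.95],
    "medium": [0.78, 0.90, 1.01],
    "thick": [0.85, 0.97, 1.10],
}
f_stat, df_b, df_w = one_way_anova_f(example)
```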
For continuous variables, ICC was used to validate the US machine with transgingival probing as the reference standard and to assess intra- and inter-rater reliability for the US method.
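The weighted κ used for the categorical comparisons can be sketched as follows. The text does not state which weighting scheme was applied, so linear weights are assumed here; the function name, category labels as strings, and example ratings are likewise illustrative only.

```python
from collections import Counter

CATEGORIES = ["thin", "medium", "thick"]  # ordered from thinnest to thickest


def weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Cohen's kappa with linear disagreement weights (an assumption)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)

    # Joint counts of paired ratings and each rater's marginal counts
    joint = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    row = Counter(index[a] for a in ratings_a)
    col = Counter(index[b] for b in ratings_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed += weight * joint[(i, j)] / n
            expected += weight * row[i] * col[j] / (n * n)

    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


# Hypothetical paired classifications of the same teeth by two methods
probe = ["thin", "thin", "medium", "thick", "medium"]
ultrasound = ["thin", "medium", "medium", "medium", "thick"]
kappa = weighted_kappa(probe, ultrasound)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate less agreement than expected by chance.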
RESULTS
Of the 29 preorthodontic patients, there were 19 females and 10 males. The mean age was 25 ± 7.9 years with a range of 18–45 years. The majority of patients were Asian (62%) followed by Caucasian (38%). A total of 55% of patients were Class I, 27% were Class III, and 17% were Class II.
The US machine used in this study showed good agreement (ICC = 0.85) with transgingival probing. The intrarater reliability of the examiner using the US and probing methods was excellent (ICC = 0.9 and ICC = 0.97, respectively). The interrater reliability for the US and probing methods was also excellent (ICC = 0.98 and κ = 0.95, respectively). A summary of the GP determinations is shown in Figure 2.
GP determinations grouped into maxillary and mandibular anterior teeth, incisors, and canines. GP indicates gingival phenotype.
Probing Method
The counts and percentages of the number of thin, medium, and thick phenotypes classified by US and probing are summarized in Table 1.
Frequency and Percentages of Gingival Phenotype Determinations by Ultrasound, Probing, and Visual Methods

Table 2 shows counts of GP determinations made by both methods. The ANOVA comparing the probing and US methods found no statistically significant relationship (P = .188) (Figure 3). When GT values were categorized into thin, medium, and thick phenotype groups and then compared to the same probe classification groups, there was slight agreement between the US and probing methods (κ = 0.12).
Correlation of Probing Classification and Ultrasound Phenotype Classifications. More Thin (60%) and Medium (60%) Phenotypes Were Correctly Identified by the Probe Compared to Thick (26%) Phenotypes

An increasing trend was observed between the mean GT and the ordered probe classification groups; however, this relationship was not significant (P = .188).
Figure 4 details the proportion of each phenotype recorded by probing arranged in GT of 0.05-mm increments. For mean thicknesses of 0.7–0.75 mm and 0.85–0.9 mm, 50% of probing determinations were recorded as thin.
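The grouping of determinations by GT increment can be sketched in Python as below. This is an illustrative reconstruction, not the authors' procedure: the function name, the half-open bin convention (lower edge included), and the example records are all assumptions.

```python
import math
from collections import Counter, defaultdict


def proportions_by_gt_bin(records, width_mm=0.05):
    """Group (gt_mm, category) pairs into GT bins and return proportions.

    Returns {bin_lower_edge_mm: {category: proportion_within_bin}}.
    Bins are assumed to be half-open, e.g. 0.70 <= GT < 0.75.
    """
    counts = defaultdict(Counter)
    for gt_mm, category in records:
        # The small epsilon guards against floating-point results such as
        # 0.70 / 0.05 evaluating to just under 14.
        bin_index = math.floor(gt_mm / width_mm + 1e-9)
        counts[bin_index][category] += 1

    result = {}
    for bin_index in sorted(counts):
        total = sum(counts[bin_index].values())
        # Two decimals are adequate for 0.05-mm increments
        lower_edge = round(bin_index * width_mm, 2)
        result[lower_edge] = {
            category: n / total for category, n in counts[bin_index].items()
        }
    return result


# Hypothetical (mean GT in mm, probe determination) pairs
records = [
    (0.70, "thin"), (0.72, "medium"), (0.74, "thin"), (0.74, "thick"),
    (0.88, "thin"),
]
by_bin = proportions_by_gt_bin(records)
```

With these made-up records, half of the determinations in the 0.70 to 0.75 mm bin are thin, which mirrors the kind of summary reported in the text.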
Mean GT grouped into thin, medium, and thick phenotypes as determined by probing.
Visual Assessment
A total of 104 assessors participated in the web-based questionnaire. Most assessors were general dentists (62%) and male (61%), and the largest age group was 30 to 39 years (39%). A higher proportion of orthodontists (14%) than of other specialists (9%) or general dentists (8%) made more correct phenotype identifications using the visual method. No demographic variable was statistically significant.
Table 3 details the counts and percentages of the overall GP determination made by assessors using visual assessment. The agreement of each assessor to identify GP correctly ranged from disagreement (κ = −0.23) to fair agreement (κ = 0.35). Table 4 shows counts of GP determinations made by the US and visual methods.
Correlation of Visual Classification and Ultrasound Phenotype Classifications. Visual Assessment Correctly Identified 60% of Thin and 21% of thick Phenotypes. Visual Assessment Identified 15 Medium Phenotypes, of Which Five Were Considered Correct

Figure 5 shows the proportion of each phenotype recorded by assessors using the visual method arranged in GT of 0.05-mm increments. For patients with mean GT of 0.7–0.75 mm and 0.85–0.9 mm, the responses were comparable.
Mean GT grouped into thin, medium, and thick phenotypes as determined by assessors using visual assessment.
The first and second GP determinations by each assessor for the six repeated patient images are summarized in Tables 5–10.
DISCUSSION
Comparison of the probing method to determinations of GT made by US yielded slight agreement (κ = 0.12). This was consistent with other studies that compared direct and indirect methods.1,31 For instance, Alves et al. compared the probe with transgingival probing and CT methods and found slight agreement in both cases (κ = 0.19, 0.12).1
In contrast, the findings of this study differed from those of Kan et al., in which probe transparency was not significantly different from direct caliper measurements.3 However, their measurements were taken from a single tooth rather than from multiple points of measurement, which may have contributed to the differing results.
An increasing trend in the GT values of thin, medium, and thick phenotype probe categories was found, though this was not statistically significant. This suggests that the probe has difficulty determining phenotypes of borderline thickness, especially in the range from 0.7 to 0.9 mm. Although this was shown with only a small sample of 29 patients in this study, a similar conclusion was reached by Kan et al., in which the ability of the probe to correctly identify GP was questionable for GT values between 0.6 and 1.2 mm.3 Alves et al. also found this with a smaller sample size.1 Interestingly, the mean GT found in this study similarly ranged from 0.58 to 1.22 mm (mean = 0.9 mm; SD = 0.32 mm). Fischer et al. also noted that the probe had difficulty discriminating between marginal cases of thin and thick phenotypes with mean values between 0.53 and 0.62 mm.32 These results suggest that, although the probing method is sufficient in identifying different GP, it may not be sufficiently discriminatory to overcome subjectivities of the user, especially in borderline phenotype cases of similar GT.
This study found clinicians unable to identify GP accurately or consistently when using the visual method. This was also evident in the spread of different GP determinations among assessors for the same patient. Previous studies have also demonstrated poorer ability to identify GP visually, compared to other indirect and direct methods.1,3 Figure 4 further illustrates a similar division of all three phenotype determinations, which was most markedly distinct across the 0.7–0.9 mm range. This suggests risk of misinterpretation of thin phenotypes for thick phenotypes, which may have a significant impact on treatment planning and, eventually, the final outcome.
Understanding each assessor’s rationale for their determination of phenotype by visual assessment might have revealed partialities to certain teeth or morphological features. If more assessors were predisposed to a certain feature than others, then this may explain the poor agreement found with the visual method.
For all repeated images in the questionnaire, there were changes in GP determination by the assessors. The agreement for each repeated image ranged from poor to moderate, indicating internal inconsistency during phenotype determination. This was demonstrated in another study evaluating clinician determinations of phenotype by visual assessment alone.10 There was also no significant pattern observed to suggest clinicians might reliably determine one phenotype over another.
Thus, visual examination alone may not be a satisfactory technique for accurate diagnosis of GP or sufficient as a predictor of gingival esthetics after orthodontic treatment.
Limitations
This study classified GT into three categories to correspond with the categories of the CBP used in this study. Although a third category provides an extra level of precision, determining GT threshold values for each phenotype was difficult because few studies have employed a three-category classification system.14,32 Without accepted, standardized threshold values, the nonsystematic approach to deciding these values will remain a key drawback.
Most previous studies have homogenized ethnicities by sampling only Caucasians.3,9,14 In this study, the majority identified as Asian. It is conceivable that degrees of gingival color pigmentation may have influenced clinician subjectivity during probing and visual assessment.
Although it remains common in clinical practice to assess individual sites to make an overall GP determination for the whole dentition, this study found that GP differed between teeth. Thus, there is a possibility of overestimating GP and overlooking the thin phenotype, which is at greatest risk of mucogingival problems.
CONCLUSIONS
Within the limitations of the present study, the following can be concluded:
The probe method is sufficient in differentiating between different categories of GP. However, further research is required to assess the sensitivity of the probe method in recognizing phenotypes in the most marginal of cases.
Assessors using the visual method lack the ability to identify GP accurately and consistently among themselves.
ACKNOWLEDGMENTS
The authors wish to thank The Australian Society of Orthodontists Foundation for Research and Education for the support in funding this project.
REFERENCES
Author notes
Graduate Student, Orthodontics, Dental School, The University of Western Australia, Nedlands, Western Australia, Australia.
Former Associate Professor and Discipline Lead, Dentomaxillofacial Radiology, Dental School, The University of Western Australia, Nedlands, Western Australia, Australia.
Senior Lecturer, Orthodontics, Dental School, The University of Western Australia, Nedlands, Western Australia, Australia.
Associate Professor and Discipline Lead, Periodontics, Dental School, The University of Western Australia, Nedlands, Western Australia, Australia.
Professor and Head, Division of Health and Medical Sciences, School of Population and Global Health, The University of Western Australia, Nedlands, Western Australia, Australia.
E. Preston Hicks Endowed Professor of Orthodontics and Oral Health Research, College of Dentistry, University of Kentucky, KY, USA; and Professor of Microbiology and Molecular Genetics, College of Medicine; and Clinical Professor, Director of the Craniofacial Genetics Program, Division of Oral Development and Behavioural Sciences, School of Dentistry, The University of Western Australia, Nedlands, Western Australia, Australia.
Discipline Lead, Orthodontics, Dental School, The University of Western Australia, Nedlands, Western Australia, Australia.