To examine the orthodontic patient experience with braces compared with Invisalign by means of a large-scale Twitter sentiment analysis.
A custom data collection program was created that collected tweets containing the words “braces” or “Invisalign” for a period of 5 months. A hierarchical Naïve Bayes sentiment analysis classifier was developed to sort the tweets into five categories: positive, negative, neutral, advertisement, or not applicable. Each category was then analyzed for specific content.
A total of 419,363 tweets applicable to orthodontics were collected. Users posted significantly more positive tweets (61%) than negative tweets (39%; P < .0001). There was no significant difference in the distribution of positive and negative sentiment between braces and Invisalign tweets (P = .4189). Positive orthodontics-related tweets often highlighted gratitude for a great smile accompanied by selfies. Negative orthodontic tweets frequently focused on pain.
Twitter users expressed more positive than negative sentiment about orthodontic treatment with no significant difference in sentiment between braces and Invisalign tweets.
Communication plays a critical role in health care. Providers seek to improve patient care by connecting with patients and understanding their experiences. Traditionally, health care providers gathered this information through surveys, reviews, and word of mouth. In the past decade, communication methods have rapidly changed with the growth of social media.
Founded in 2006, Twitter is an online, fast-paced microblog wherein users share posts in 140 characters or less. Traditional blogs allow for longer, more static content, while microblogs like Twitter focus on shorter, more frequent posts. With 320 million active monthly users, Twitter has grown exponentially and become a primary method of multipurpose communication throughout the world.1
People use Twitter every day for communication, information, and entertainment. However, people primarily utilize Twitter to express their current thoughts and feelings. Kelly2 categorized the content of Twitter posts and found that 41% of Twitter posts are “pointless babble” and another 38% of tweets are conversations between users. News, information, spam, and self-promotion made up the remaining 21% of the posts. Eighty percent of users now access Twitter through their mobile device, allowing people to tweet in the moment.1
These written thoughts and feelings posted as tweets are unsolicited and publicly available. As a result, Twitter is a unique source of data. Traditional surveys often introduce recall bias and are difficult to conduct on a large scale. Twitter data are collected in real time, free from recall bias.3 With millions of tweets posted per day, the data source is vast.
Twitter data are best analyzed on a large scale with sentiment analysis.4,5 Sentiment analysis, often referred to as opinion mining, is a method to extract and characterize subjective information. Twitter sentiment analysis has been employed to study many fields, from stock market indicators to political election predictions.6–8 Companies seek ways to mine Twitter for consumer feedback about their goods and services.9 Twitter's immense information source is largely untapped in orthodontics.
Heaivilin et al.10 found that the public uses Twitter to broadcast experiences and thoughts about dental pain in real time. Their Twitter results were similar to traditional surveys about dental pain, potentially validating Twitter as a data source in the dental field. Henzell et al.11 analyzed 131 orthodontics-related tweets and found that orthodontic patients use social media sites such as Twitter to convey positive and negative feelings about their treatment.
The current literature regarding the patient experience with braces compared with Invisalign is sparse and conflicting. Miller et al.12 compared the two treatment methods and found that Invisalign patients experienced less discomfort, pain, and analgesic use during their first week of orthodontic treatment than patients with traditional appliances. However, Shalish et al.13 found no statistically significant differences in pain levels, analgesic use, or speech dysfunctions between patients treated with Invisalign and traditional appliances. Patients wearing traditional braces reported more oral sores and food accumulation but similar levels of sleep and daily life disturbances. Given the increasing popularity of clear aligners, further research is needed to investigate other aspects of the patient experience such as esthetics and treatment satisfaction. Twitter provides a new and exciting medium in which to examine the impact of orthodontic treatment on everyday life.
The aim of this study was to examine the orthodontic patient experience with braces compared with Invisalign by means of a large-scale Twitter analysis. The null hypothesis was that there is no difference in sentiment between tweets about braces and tweets about Invisalign.
MATERIALS AND METHODS
The Virginia Commonwealth University Institutional Review Board granted an exemption for this study because no individual human subject was enrolled and no identifiable information was collected. Tweets were collected over a 5-month period from April through September 2015. All tweets were publicly accessible from Twitter's database. Inclusion criteria consisted of any tweet that contained either of the two keywords, “braces” or “Invisalign.” Each tweet was classified into one of five categories: positive, negative, neutral, advertisement, or not applicable. Applicable tweets were defined as pertaining to orthodontics and written in the English language.
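The inclusion criteria can be illustrated with a short keyword filter. This is a minimal sketch, not the study's collection program: the function name and the whole-word, case-insensitive matching rule are assumptions, since the text states only that a tweet had to contain “braces” or “Invisalign.”

```python
import re

# Keywords taken from the stated inclusion criteria.
KEYWORDS = ("braces", "invisalign")

# Whole-word, case-insensitive matching is an assumption of this sketch.
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def meets_inclusion_criteria(tweet_text: str) -> bool:
    """Return True if the tweet mentions 'braces' or 'Invisalign'."""
    return _PATTERN.search(tweet_text) is not None


# Hypothetical example tweets.
print(meets_inclusion_criteria("Got my Braces off today!"))
print(meets_inclusion_criteria("She embraces change"))
```

A keyword match alone says nothing about whether a tweet concerns orthodontics, which is why applicability was decided later by the classifier.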
The software programs for this project consisted of two sections: data collection and data interpretation. The data collection program was written to interact with Twitter's servers and continuously collect all tweets that met the inclusion criteria. A second program was written to interpret the entire collected database. Each tweet was classified into one of the five previously listed categories by machine-learning sentiment analysis. The program was constructed using a Hierarchical Naïve Bayes classifier, the preferred method for Twitter sentiment analysis.4 Naïve Bayes classifiers are probabilistic classifiers that break down a block of text into a group of independent words and classify the text into a category based on the text's similarity to precategorized texts.14 Thus, the context-aware program “learns” from the precategorized texts.
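The Naïve Bayes idea described above can be sketched in a few lines. This is an illustrative bag-of-words version with add-one smoothing, not the study's program: the class name, tokenizer, smoothing choice, and the four training tweets are all hypothetical, and the sketch does not reproduce any context-aware features.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over independent words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of texts
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def log_score(self, text, label):
        # Log prior plus the sum of log word likelihoods for this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total_words = sum(self.word_counts[label].values())
        for word in tokenize(text):
            if word not in self.vocab:
                continue  # words never seen in training are ignored
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def classify(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Hypothetical precategorized tweets standing in for the training corpus.
training = [
    ("thank you for my new smile", "positive"),
    ("love my smile braces off", "positive"),
    ("my braces hurt so much pain", "negative"),
    ("hate the pain from braces", "negative"),
]
clf = NaiveBayes()
clf.train(training)
print(clf.classify("so much pain"))
print(clf.classify("thank you smile"))
```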
Naïve Bayes classifiers require manual classification of a number of tweets to act as reference material to “train” the program. In this study, an independent reviewer manually sorted 3784 tweets into one of the five categories. These preclassified tweets, referred to as a corpus, were used to achieve two objectives: to train the program and to test it. From the corpus, 71% (2706 tweets) were used to train the classifier on which words and features were most representative of each category. The other 1078 tweets in the corpus were used to test agreement between the independent reviewer and the program.
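The split of the corpus into training and testing subsets can be sketched as follows. The text gives only the subset sizes, so the seeded random shuffle used here is an assumption, and integer ids stand in for the hand-labeled tweets.

```python
import random


def split_corpus(corpus, n_train, seed=0):
    """Shuffle a copy of the corpus and split it into train and test parts."""
    shuffled = list(corpus)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder corpus: integer ids stand in for the 3784 hand-labeled tweets.
corpus = list(range(3784))
train_set, test_set = split_corpus(corpus, n_train=2706)

print(len(train_set), len(test_set))
print(round(100 * len(train_set) / len(corpus), 1))  # training share, in percent
```

Keeping the test tweets out of training is what makes the later agreement check meaningful.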
Text classifiers are most effective when classifying text into one of two categories. The program sorted tweets into the five categories in a specific sequence (Figure 1). This method is known as hierarchical classification.15 The first classifier determined whether the text was applicable to orthodontics. Examples of nonapplicable tweets included posts such as “Britain braces for election gridlock” or tweets about knee braces. If the tweet was classified as applicable, the text advanced to the second classifier, which determined whether the tweet was an advertisement. If it was not an advertisement, the text was sent to the third classifier, which determined whether the tweet was neutral or expressed a strong sentiment. If the tweet was not neutral, it advanced to the fourth and final classifier, which determined whether the tweet expressed a positive or a negative sentiment.
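The four-stage sequence described above can be expressed as a cascade of binary decisions. In this sketch the four stage functions are hypothetical keyword rules standing in for trained binary classifiers; only the order of the stages comes from the text.

```python
def classify_hierarchically(text, is_applicable, is_advertisement,
                            is_neutral, is_positive):
    """Run up to four binary decisions in sequence and return one category."""
    if not is_applicable(text):
        return "not applicable"
    if is_advertisement(text):
        return "advertisement"
    if is_neutral(text):
        return "neutral"
    return "positive" if is_positive(text) else "negative"


def contains_any(text, words):
    lowered = text.lower()
    return any(word in lowered for word in words)


# Hypothetical stand-ins for the four trained binary classifiers.
def is_applicable(text):
    return not contains_any(text, ("election", "knee"))


def is_advertisement(text):
    return contains_any(text, ("special offer", "free consultation"))


def is_neutral(text):
    return not contains_any(text, ("thank", "love", "hate", "pain", "hurts"))


def is_positive(text):
    return contains_any(text, ("thank", "love"))


def categorize(text):
    return classify_hierarchically(text, is_applicable, is_advertisement,
                                   is_neutral, is_positive)


for tweet in ("Britain braces for election gridlock",
              "Special offer on Invisalign this month",
              "Getting braces on Tuesday",
              "My braces hurt, so much pain",
              "Thank you for my new smile"):
    print(categorize(tweet), "<-", tweet)
```

Each stage only sees the tweets that passed the previous one, so every classifier solves a two-category problem.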
Next, the corpus of 3784 tweets was analyzed and evaluated for specific content. Frequently used words were incorporated into tables of indicator words and ratios. These indicator ratios showed how likely a specific word was to cause a tweet to be sorted into a certain category, offering insight into the content of each category.14
RESULTS
Agreement between the independent reviewer and the program was examined by testing 1078 common tweets. The agreement was found to be almost perfect (κ = 0.97).18
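For readers unfamiliar with the κ statistic, the sketch below computes chance-corrected agreement between two sets of labels. It assumes κ here denotes Cohen's kappa, which the text does not state explicitly, and the eight label pairs are hypothetical rather than study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both raters labeled independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from a human reviewer and a program.
reviewer = ["pos", "pos", "neg", "neg", "neu", "neu", "pos", "neg"]
program = ["pos", "pos", "neg", "neg", "neu", "neu", "pos", "neu"]
print(round(cohens_kappa(reviewer, program), 3))
```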
Over a 5-month period, a total of 477,054 tweets were collected that contained the word braces or Invisalign, of which 419,363 were applicable to orthodontics. Many more tweets contained the word braces (96%) than Invisalign (4%).
Figure 2 is a flow chart of all collected tweets and their classification. Tweets not applicable to orthodontics made up 12% (57,691) of all collected tweets and were excluded. Among the applicable tweets, advertisements made up 8% (34,819). The remaining 92% of the tweets applicable to orthodontics were assumed to be from orthodontic patients or people interested in orthodontics. Next, 53,677 tweets were classified as neutral and filtered out. The remaining subset contained 330,867 positive and negative tweets about the orthodontic experience.
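The counts in this flow can be checked with simple arithmetic. The sketch below uses only the figures reported in the text; the variable names are illustrative.

```python
# Counts reported in the text.
collected = 477_054
not_applicable = 57_691
advertisements = 34_819
neutral = 53_677

applicable = collected - not_applicable   # tweets applicable to orthodontics
non_ads = applicable - advertisements     # applicable tweets that are not ads
polarized = non_ads - neutral             # positive and negative tweets

print(applicable)                                  # 419363
print(polarized)                                   # 330867
print(round(100 * not_applicable / collected))     # 12
print(round(100 * advertisements / applicable))    # 8
```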
Table 1 shows a breakdown of the applicable and nonapplicable indicator words among the 3784 braces and Invisalign tweets that were analyzed for specific content. The ratios show how likely a tweet containing a given word was to be sorted into a particular category. For example, Table 1 shows that “weather” had a 76.2:1 ratio of not applicable to applicable. If the word weather appeared in a tweet, the tweet was 76.2 times more likely to be classified as not applicable than applicable. Other tweets that were not applicable to orthodontics contained the words “severe,” “#suspenders,” “#menswear,” or “#fashion.” In contrast, a tweet containing the word “teeth” was 47.4 times more likely to be classified as applicable to orthodontics. Other tweets applicable to orthodontics included the words “off” and “want.”
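One way to obtain a ratio of this kind is to compare how often a word occurs in tweets of each category. The formula below is an assumption, since the text does not give the exact calculation; the smoothing term and the six example tweets are likewise hypothetical.

```python
def indicator_ratio(word, texts_a, texts_b, smoothing=0.5):
    """Ratio of the smoothed rates at which `word` occurs in two categories."""
    def rate(texts):
        hits = sum(1 for text in texts if word in text.lower().split())
        # Smoothing avoids division by zero when a word is absent from a category.
        return (hits + smoothing) / (len(texts) + 2 * smoothing)
    return rate(texts_a) / rate(texts_b)


# Hypothetical hand-labeled tweets.
not_applicable = [
    "florida braces for severe weather",
    "city braces for weather chaos",
    "knee braces sale",
]
applicable = [
    "my teeth hurt braces",
    "braces off today",
    "got braces teeth look great",
]

print(indicator_ratio("weather", not_applicable, applicable))
print(indicator_ratio("teeth", applicable, not_applicable))
```

A ratio well above 1 marks a word as a strong indicator for the first category; a ratio near 1 marks a word that does not discriminate.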
There was a significant difference in the proportion of advertisements between Invisalign and braces tweets (P < .0001), with 33% of Invisalign tweets being classified as advertisements and only 7% of braces tweets classified as such. Despite this difference in proportion, a greater number of braces advertisements (28,879) were collected than were Invisalign advertisements (5940). Table 2 displays the total counts and percentages for each category.
Table 3 presents the indicator words for the advertisement tweets broken down between braces and Invisalign tweets. The indicator words listed in the table were much more frequently found in tweets that were classified as advertisements than as nonadvertisements. In contrast to Table 1, the indicator words in Table 3 have a braces column and an Invisalign column to compare and contrast the advertising differences between the two groups. Braces ads often contained the words “smile,” “straight,” and “traditional,” while the Invisalign tweets included the words “offer,” “whitening,” and “alternative.”
Sentiment was then analyzed after separating tweets into the two main categories of braces and Invisalign. There was no significant difference in the distribution of positive and negative tweets for braces compared with Invisalign (P = .4189), as 38% of Invisalign tweets were classified as negative and 39% of braces tweets were classified as negative.
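A comparison of this kind can be sketched as a test on a 2 × 2 table of counts. The text does not name the statistical test, so the Pearson chi-square test below is an assumption, and the counts are hypothetical values chosen only to mirror similar proportions in both groups; they are not the study's data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are braces and Invisalign, columns positive and negative.
chi2, p_value = chi_square_2x2(610, 390, 62, 38)
print(round(chi2, 4), round(p_value, 3))
```

With near-identical proportions in the two rows, the statistic is small and the p-value large, which is the pattern expected when the null hypothesis is not rejected.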
Among all braces and Invisalign tweets that expressed polarity, there were significantly more positive tweets than negative (P < .0001), as 61% of polarized tweets were positive and 39% were negative. Table 4 displays the total counts and percentages for each category. Table 5 shows a breakdown of indicator words for the positive and negative tweets. Common positive words included “thank,” “#smile,” and “#selfie,” while common negative words included “hate,” “pain,” “hurts,” and “food.”
DISCUSSION
Strengths of this study include accurate classification of a large volume of tweets and quantitative analyses of each category of tweets. Eke19 raised concern about the use of Twitter for research, worrying that because context is not taken into account when extracting specific words, such a method could result in low predictive values. To reduce this risk, the Naïve Bayes classifying technique utilized “context-aware” machine learning. As a result, sorting by the program had nearly perfect agreement with the manual human sorting. A limitation of Twitter studies is the inability to gather demographic information, as users' demographics are not linked to their profile.
Align Technology (San Jose, Calif) advertises Invisalign treatment as offering an improved patient experience over traditional braces, emphasizing comfort, convenience, and an improved lifestyle.20 The results of this study did not find a significant difference in the distribution of sentiment between the braces and Invisalign tweets. Therefore, the null hypothesis that there is no difference in sentiment between tweets about braces and tweets about Invisalign was not rejected.
Most of the tweets expressed positive sentiments regarding both major treatment modalities—braces and Invisalign. Many users expressed gratitude for their orthodontic treatment, using the word thank to express appreciation for their new smile. Many of the positive #selfie tweets were accompanied by photographs of patients showing their new smile upon completion of treatment. This finding demonstrates that appliance removal defines a significant day in the life of patients. Social media can be employed to capture these important moments and connect a practice to the community.
The negative tweets gave insight into the experiences that patients found irritating. The most frequent complaint about treatment was pain. Most of the negative tweets containing the word food focused on pain from chewing rather than on the restrictions on sticky and hard foods. Other Twitter users bemoaned the challenges of wearing “rubber bands.” “Lisps” developed from Invisalign aligners were another objection, while others said they were “sick” of braces and their “ugly” appearance. Other sources of frustration were “retainers” and broken appliances. Orthodontic providers need to have a thorough understanding of these common negative reactions to treatment in order to improve the orthodontic experience.
Interestingly, the word off was almost always applicable to orthodontics at a ratio of 46.4:1, but off was not found in the list of positive or negative indicator words. While some users excitedly tweeted about getting their braces off, others complained that their orthodontist refused to remove their braces.
Orthodontic advertisements on Twitter emphasized smiles. Ads often contained the word “offer.” Some of these tweets stated the services offered by the office, and others announced special offers to begin orthodontic treatment. Ads for braces sometimes attempted to attract new patients by showcasing “clear” braces. Invisalign was often marketed as an “alternative” to braces, and some offices offered whitening along with Invisalign treatment. Other providers used Twitter to distribute practical information such as hours of operation and practice Web site links.
This study analyzed Twitter posts about orthodontics to gain a better understanding of the patient experience. On average, over 2700 orthodontic tweets were collected each day. Most of the tweets about orthodontics were positive, and there was no significant difference in the sentiment distribution between tweets about braces versus tweets about Invisalign.
CONCLUSIONS
Twitter users shared more positive than negative sentiment about their orthodontic experiences.
There was no significant difference in the distribution of positive and negative sentiment between tweets about braces and tweets about Invisalign.
Negative orthodontic tweets frequently focused on pain.
Positive orthodontic tweets often highlighted gratitude for a great smile accompanied by selfies.
This study was supported by the Southern Association of Orthodontists and the Virginia Orthodontic Education and Research Foundation. The authors report no conflict of interest with Align Technology.