ABSTRACT
Objectives: To evaluate the utility and efficiency of four voice-activated, artificial intelligence–based virtual assistants (Alexa, Google Assistant, Siri, and Cortana) in addressing commonly asked patient questions in orthodontic offices.
Materials and Methods: Two orthodontists, an orthodontic resident, an oral and maxillofacial radiologist, and a dental student used a standardized list of 12 questions to query and evaluate the four most common commercial virtual assistant devices. A modified Likert scale, on which lower scores indicated greater efficiency and utility, was used to evaluate their performance.
Results: Google Assistant had the lowest (best) mean score, followed by Siri, Alexa, and Cortana. Google Assistant scored significantly lower than Alexa and Cortana. With the exception of Amazon Alexa, there was significant variability in response scores among the evaluators.
Conclusions: The commercially available virtual assistants tested in this study differed significantly both in how they responded to individual users and in their performance on common orthodontic queries. An intelligent virtual assistant with evidence-based responses specifically curated for orthodontics may be a good solution to address these issues. The investigators in this study unanimously agreed that such a device would provide value to patients and clinicians.
INTRODUCTION
Artificial intelligence (AI) can amplify human capabilities and increase productivity. The capabilities of AI range from simple reasoning to human-like cognitive abilities. Currently, AI serves roles in industries such as manufacturing, transport, energy, financial services, advertisement, management, and health care.1 One facet of AI technology that has evolved substantially in recent years is voice-activated virtual assistant technology. Voice assistants such as Amazon's (Seattle, Wash) Alexa, Google Assistant (Google, Mountain View, Calif), Microsoft's (Redmond, Wash) Cortana, and Apple's (Cupertino, Calif) Siri are popular software programs designed to simulate human conversation on the front-end while being supported by a large database on the back-end to deliver information to consumers.2 The intuitive manner of interacting with technical devices without the need for tactile contact makes verbal communication the new interface to technology.3
In the health care industry, there has been an increase in the use of voice-activated AI technology. AI systems can assist clinicians by providing up-to-date information from journals, textbooks, and clinical practices to improve patient care.4 The investigated uses for these systems include diagnostics, health promotion, counseling, and triage, as well as health care professional training.5 In dentistry, AI technologies have not been integrated to the same extent as in other areas of health care. This is likely attributable to (1) limited data availability, structure, and comprehensiveness; (2) a lack of methodological rigor and standards in development; and (3) practical questions about the usefulness, ethics, and responsibility of these technologies.6
The use of AI-based virtual assistant technology has not been documented in orthodontics; a search of databases such as Scopus, Google Scholar, and PubMed yielded no studies examining these technologies in the field. Orthodontics has nonetheless experienced enormous changes in the past few decades, gradually working toward a fully digital workflow. The emergence of new technologies has allowed the field to expand access to care, better educate patients about their treatment planning, and reduce costs.7 Given the near-universal implementation and established utility of such technologies in recent years, the utility and potential benefit of virtual assistants in orthodontic offices should be considered.
There is a lack of knowledge about the utility of current iterations of AI-based virtual assistants in orthodontics. Therefore, the objective of this novel study was to evaluate the utility and efficiency of the following four most widely disseminated, voice-activated, AI-based virtual assistants in addressing commonly asked patient questions in orthodontic offices: Alexa, Google Assistant, Siri, and Cortana.
MATERIALS AND METHODS
Two orthodontists (S Yadav [SY] and M Upadhyay [MU]), an orthodontic resident (L Cardarelli [LC]), an oral and maxillofacial radiologist (A Tadinada [AT]), and a dental student (A Perez-Pino [AP]) posed a questionnaire of 12 frequently asked orthodontic questions to the following four commercially available, voice-based assistants: Alexa, Siri, Google Assistant, and Cortana. The questions were chosen to represent those most frequently asked by patients visiting orthodontic offices and were developed from discussions with experienced orthodontists as well as recurring questions in website searches. The evaluators queried the four devices and rated each response using a five-point modified Likert scale: (1) the device responded with adequate information; (2) the device responded but did not provide adequate information; (3) the device did not directly answer the question but provided a list of accurate websites addressing it; (4) the device did not directly answer the question but provided a list of inaccurate websites; and (5) the device did not know the response to the question. The ratings were designed to reflect the utility and efficiency of the devices, with (5) being the worst rating. The results of the queries were recorded in a Microsoft Excel sheet for comparative evaluation, and one-way analysis of variance (ANOVA) with post hoc analyses was used to compare mean scores, as detailed below. The sample of questions used is provided in Table 1. Subsequently, the orthodontist investigators (SY, MU, and LC) were asked a yes/no question about the value of having an orthodontic-specific curated virtual assistant in orthodontic offices; their responses were recorded in the Results and examined in the Discussion.
Following the queries, multiple one-way ANOVA and post hoc analyses were performed. The first analysis compared the combined mean scores for the questions among the four devices; lower scores indicated greater efficiency and utility. The second analysis compared the combined mean scores of all the questions among the five investigators to determine whether each device's responses varied across the individual investigators.
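To make the statistical comparison concrete, the following is a minimal sketch, in Python, of the kind of analysis described above. The scores are hypothetical placeholders rather than the study's data, and Tukey HSD is assumed as the post hoc test, since the specific post hoc method and software are not reported here.

    import pandas as pd
    from scipy.stats import f_oneway
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # One modified Likert score (1 = best, 5 = worst) per question per device.
    # These values are hypothetical, for illustration only.
    scores = pd.DataFrame({
        "device": ["Google"] * 12 + ["Siri"] * 12 + ["Alexa"] * 12 + ["Cortana"] * 12,
        "score": ([1, 1, 2, 1, 1, 3, 1, 1, 2, 1, 1, 2]
                  + [1, 2, 2, 1, 1, 3, 2, 1, 2, 1, 2, 1]
                  + [2, 3, 2, 1, 3, 2, 2, 3, 2, 3, 2, 2]
                  + [5, 3, 3, 5, 2, 3, 3, 5, 2, 3, 3, 2]),
    })

    # One-way ANOVA comparing mean scores across the four devices.
    groups = [g["score"].values for _, g in scores.groupby("device")]
    f_stat, p_value = f_oneway(*groups)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

    # Post hoc pairwise comparisons (Tukey HSD assumed, alpha = 0.05).
    print(pairwise_tukeyhsd(scores["score"], scores["device"], alpha=0.05))

The same approach extends to the second analysis by grouping the scores by investigator rather than by device.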
RESULTS
Of all the devices, Google Assistant had the lowest mean score (1.47), significantly lower than the scores of Alexa and Cortana (Table 2). Siri, Alexa, and Cortana followed, with mean scores of 1.62, 2.28, and 3.25, respectively; Siri's score was significantly lower than Cortana's (Table 2). With the exception of Alexa, there was significant variability within each device in how it responded to the individual investigators (Tables 3 through 6). For instance, with Google Assistant and Siri, investigator MU had significantly different mean scores than most of the other investigators (Tables 3 and 5). For Cortana, there was significant variability between investigator AP and investigators SY and LC (Table 4). In contrast, there was no significant variability in Alexa's responses to the investigators' queries (Table 6).
At the end of the rating sessions, the investigators were asked the following question: “While responses varied across the devices and users, would you think there is value in having a device that provides well curated, evidence-based responses to commonly asked questions by the patients in an orthodontic office?” The investigators unanimously answered “yes” to this inquiry.
DISCUSSION
AI is a field of computer science that seeks to simulate intelligent human behavior through computer systems. In recent years, AI has been able to perform and automate tasks that many would not have imagined possible years ago. An example is the use of conversational intelligent virtual assistants (IVAs) in home, commercial, and medical settings. Among the most prominent are Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and the Google Assistant.8 These technologies are changing the way the public at large seeks information.9 A distinction of these devices compared with traditional search engines is that they often provide a single, concise answer, whereas traditional search engines return a myriad of results from which the user must synthesize a conclusion. Although traditional text-based search is still the most common modality for addressing inquiries, this distinction, together with their touch-free nature, is the main reason IVAs will likely become the new interface for retrieving information.10 However, this distinction also carries potential disadvantages in the current state of the technology. A search engine provides a list of resources that users can weigh against one another, whereas an IVA may provide a concise answer that may or may not come from an accurate source. In this study, the Google Assistant provided some answers that stemmed from dental practice websites; these offices may have paid to have their information conveyed on the device.
The efficient nature of virtual assistants could be useful in the orthodontic clinical setting. For patients, an orthodontic-specific IVA may lead to better treatment outcomes and compliance by providing access to evidence-based information without the need to speak to an orthodontist. For providers, it could lead to a more efficient workflow, freeing time for communication with patients and parents and for diagnosis and treatment planning, as well as providing financial benefits from higher patient volume. This could reduce excess patient and provider time while achieving better clinical outcomes.11 Given the lack of confidence that graduating dental students report on the subjects of malocclusion and space management, general dentists refer patients to orthodontists to have these concerns addressed. With the growing need for orthodontic care in the United States, more efficient means of addressing patient concerns will certainly be needed.12
Siri is the voice-based IVA created by Apple.13 In this study, Siri had the second lowest mean score (1.62) for its effectiveness and efficiency in answering orthodontic frequently asked questions, a significantly better score than Microsoft's Cortana (1.62 vs 3.25) (Table 2). This contrasted with a study in the field of oral and maxillofacial radiology, in which Siri obtained the worst score among the four virtual assistants.14 In the current study, there was also significant variability in how Siri responded to individual queries: investigator MU (a board-certified orthodontist) had significantly different scores from the other investigators. This could be attributed to several factors. The first is accent variability or pronunciations that the device was not able to recognize; the investigators were instructed to ask the questions verbatim as written on the questionnaire to limit permutations, but accents may still have introduced such permutations. The second is machine learning, which refers to how computers learn from previously collected data.15 All users of these devices have different information-seeking habits, and this may have influenced the variation in query responses among investigators.
Microsoft's Cortana performs functions such as setting reminders and answering user questions via the Bing search engine on the back-end. Of all the devices queried in this study, Cortana had the highest (least favorable) score (3.25) (Table 2). It was significantly worse than Google Assistant (1.47) and Siri (1.62) (Table 2). For many of the questions, it simply responded with "Sorry, I don't know the answer to this one. But I am learning." In addition, there was significant variability between some of the investigators in terms of how it responded to the questions (Table 4).
Google Assistant was launched by Google in 2016.16 It had the lowest (best) mean score (1.47), significantly lower than Alexa (2.28) and Cortana (3.25) (Table 2). For many of the questions, it provided accurate and concise responses. However, there was some significant variability among investigators: similar to Siri, scoring differed significantly between MU and the other investigators (Table 5). Again, this could be attributed to differences in accents, pronunciation, machine learning, and/or other unaccounted-for factors. The hardware used to pose the questions may also produce different responses, since this IVA can run on a stand-alone device or a smartphone. All of the investigators in this study queried this IVA on their respective Apple devices, with the exception of AP, who used a Google Nest.
Amazon's Alexa was released in 2014 and was the first voice-activated IVA linked to a stand-alone home device rather than integrated into an existing device such as a smartphone.17 However, it is also available as an app for iOS and Android devices. Alexa had the third lowest score (2.28) in this study (Table 2) and performed significantly worse than Google Assistant (Table 2). It is worth noting that Alexa is developed primarily to streamline shopping, not to provide medical information. Certain questions, such as "Do I have to wear a retainer forever?", prompted the device to provide clothing suggestions, likely because of the word "wear." Interestingly, Alexa was the only device for which no significant variability was observed among investigators (Table 6).
IVAs respond to queries based on the database that supports the front-end. Devices such as Google Assistant draw on a database that provides the user with generic information about a vast array of topics, and the robust Google search engine proved effective at responding to most of the frequently asked orthodontic questions. However, many of the responses were derived from specific private practice websites rather than research-validated sources. Siri had similar results. Alexa and Cortana proved significantly worse than Google Assistant (Table 2): Alexa is designed mainly for shopping and task completion, while Cortana remains on Windows computers mainly as a utility for responding to queries and handling tasks, reminders, and lists. The design focus of each device therefore helps explain its performance in this study.
These findings were consistent with previous research by Miner et al., who found that IVAs responded inconsistently and incompletely to questions regarding mental health and interpersonal violence.9 In the current study, with the exception of Alexa, the IVAs were not consistent among investigators. The findings also concurred with Alagha and Helbing, who found that Siri and Google Assistant responded to vaccine safety queries with more accuracy than Alexa.18
The use of IVA devices in the orthodontic setting is inevitable. According to Sezgin et al., the use of these technologies in health care accelerated as a result of the COVID-19 pandemic, likely because of the touch-free and efficient nature of IVAs. The implementation of IVAs reduces the impact of delayed care and lightens providers' burden of completing routine tasks.19 Despite this widespread use in medicine, there is currently no technology specifically designed for orthodontics. Although some of the IVAs investigated showed some utility in orthodontics, their inconsistency in providing accurate and concise information among investigators needs to be addressed. There is a need for either an orthodontics module within an existing IVA or an IVA specifically curated for the field. For instance, such technology could be implemented into the phone services of each practice, with practice-specific answers on topics such as appointment scheduling, office hours, types of treatment offered, and patient management during treatment (eg, "What can I eat with my braces?"). Another use could be a stand-alone device in the waiting room preloaded with well-curated, evidence-based responses that deliver accurate and reliable information to patients. Implementation of this technology would likely increase the efficiency of orthodontic practices, much as it has for their medical counterparts. The orthodontists involved in this study unanimously agreed that such technology would provide value to orthodontic offices.
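As a rough illustration of the curated module envisioned above, the following is a minimal sketch, in Python, of a waiting-room question-answering routine backed by a clinician-approved answer bank. The questions, answers, matching method, and similarity threshold are all hypothetical stand-ins; a real system would add a speech front-end and a vetted, evidence-based knowledge base.

    from difflib import SequenceMatcher

    # Curated answer bank: canonical patient questions paired with
    # clinician-approved responses (illustrative content only).
    FAQ = {
        "what can i eat with my braces":
            "Avoid hard, sticky, and chewy foods; soft foods are safest.",
        "do i have to wear a retainer forever":
            "Long-term retainer wear is usually recommended to keep teeth from shifting.",
        "how long will my treatment take":
            "Treatment length varies; your orthodontist will give you an estimate.",
    }

    def answer(query: str) -> str:
        """Return the curated answer whose canonical question best matches the query."""
        normalized = query.lower().strip("?! .")
        best = max(FAQ, key=lambda q: SequenceMatcher(None, q, normalized).ratio())
        if SequenceMatcher(None, best, normalized).ratio() < 0.6:
            # Fall back rather than guess, so only vetted answers are ever given.
            return "I don't have a vetted answer for that; please ask the front desk."
        return FAQ[best]

    print(answer("What can I eat with my braces?"))

Because such a device only returns answers from the curated bank, it would avoid the unvetted-source problem observed with the general-purpose assistants in this study.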
This study had several limitations. First, the sample consisted of five evaluators with varied dental backgrounds; given the varied responses observed among the virtual assistants, it would be of interest to repeat the study with a larger sample and evaluate how variation across the devices is affected. In addition, all of the individuals involved in this study work in the field of dentistry, so their information-seeking habits may have primed the devices to generate higher-quality responses to these questions. Also, a standardized list of questions was used; in reality, patients and providers may word their questions differently, influencing the responses they receive. Future studies should therefore use a broader range of questions and a larger sample of investigators. Despite these limitations, there is a promising future for the use of these devices in the field of orthodontics, and their implementation is inevitable.
CONCLUSIONS
The widely disseminated virtual assistants tested showed significant variability in responses among the study's investigators.
There were significant differences in the virtual assistant performance scores when responding to common orthodontic queries.
An intelligent virtual assistant with evidence-based responses specifically curated for orthodontics may be a good solution to address these issues.
The investigators in this study unanimously agreed that such a device would provide value to patients and clinicians.
REFERENCES
Author notes
Dental Student, School of Dental Medicine, University of Connecticut, Farmington, Conn, USA.
Professor and Chair, Henry and Anne Cech Professor of Orthodontics, Department of Growth and Development, UNMC College of Dentistry.
Associate Professor, Division of Orthodontics, University of Connecticut Health Center, Farmington, Conn, USA.
Resident, School of Dental Medicine, Department of Orthodontics, University of Connecticut, Farmington, Conn, USA.
Associate Dean for Graduate Research, Education, and Training, University of Connecticut Health Center, Farmington, Conn, USA.