Artificial intelligence (AI) is a broad term describing any machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. To date, most areas of medicine have been affected by AI, and this trend will likely continue with technological advances.1 AI chatbot technologies, including Chat Generative Pre-Trained Transformer (ChatGPT) and similar generative AI systems, have gained attention for their ability to generate text almost indistinguishable from human-written text.2 GPT-3 and GPT-4, the large language models powering ChatGPT, enable a chat interface that allows users to refine their requests or converse with the program, much like an online human-to-human chat. How generative AIs have been used, combined with their expanding capabilities, has raised alarm for some regarding their influence in various fields.3 ChatGPT’s claimed capabilities include the ability to write code in more than 20 programming languages, analyze and summarize blocks of text, and write text on any topic,4 though the fidelity and full range of these capabilities have yet to be rigorously assessed. Examples of generative AI’s text generation capabilities include a letter of recommendation (LOR) for a medical student applying for residency or that same student’s personal statement submitted as part of their application.
Like many new technologies, the adoption of generative AIs has outpaced the ethics and regulation surrounding them. Since its release in November 2022, there have been numerous reports of ChatGPT’s use in settings ranging from undergraduate education to the courtroom.5 Concerns regarding AI have been raised by many, including the CEO of OpenAI, the developer of ChatGPT, who testified before the US Senate in May 2023 urging regulation of AI.6 In June 2023, the National Institutes of Health prohibited the use of ChatGPT for peer review, citing concerns about breaches of confidentiality.7 In 2023, Nature published principles outlining the use and attribution of generative AIs in its journal.8 Organizations such as the Association of American Medical Colleges (AAMC) are considering appropriate chatbot use,9 and the Electronic Residency Application Service (ERAS) now requires applicants to certify that their personal statement is not “the product of artificial intelligence.”10
Knowledge about how AI chatbots will affect both sides of the application process is slowly emerging, as the technology can influence how applicants prepare applications, how letter writers prepare LORs, and how programs assess and filter applicants. From an applicant’s perspective, this influence is most likely to take the form of assistance: using the chatbot to create and edit a personal statement, draft emails to letter writers or program directors, and proofread various components of the application. Tables 1 and 2 show examples of ChatGPT-4’s answers to prompts asking for different levels of assistance, ranging from authoring a new statement to editing an existing one. Generative AIs may also assist applicants in other ways, such as recommending which programs to apply to, providing mock interview questions and feedback on responses, and suggesting a rank order list.
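To make the proofreading scenario above concrete, the sketch below shows one way such a request could be made programmatically rather than through the chat interface. It is a minimal, illustrative example only, assuming the publicly documented OpenAI Python client; the model name, system prompt, and placeholder draft are hypothetical and do not represent a recommended or endorsed workflow.

```python
# Minimal sketch: asking a chat model to proofread (not rewrite) a draft
# personal statement. Assumes the OpenAI Python client (openai>=1.0) and an
# API key available in the OPENAI_API_KEY environment variable; the model
# name and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

draft = "..."  # applicant's own draft personal statement goes here

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "You are a copy editor. Point out grammatical errors and "
                "awkward phrasing in the text, but do not rewrite it or "
                "add new content."
            ),
        },
        {"role": "user", "content": draft},
    ],
)

# Print the model's list of suggested corrections for the applicant to review
print(response.choices[0].message.content)
```

Whether this kind of limited, suggestion-only use is materially different from asking the chatbot to author the statement outright is exactly the line-drawing question discussed below.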
From a program standpoint, there may be a dangerous temptation to use AI to review applications and provide summaries and assessments, screen applicants based on defined criteria, draft emails to applicants, and even write interview questions. While some programs already use metric-based criteria in the interview invitation and ranking process, AI-based recommendations pose a potential risk given the lack of transparency in how these systems reach their decisions and the documented discriminatory hiring practices that have resulted from relying solely on AI.11 Though many of the above are possible uses of AI, it will ultimately be up to applicants and programs to determine the utility and appropriateness of each one in their respective contexts.
There is a risk that the uncertainty regarding AI’s usage and impact will translate into reflexive attempts to limit its effect on the application process. The AAMC has already prohibited the use of AI to produce personal statements, though the phrase “the product of artificial intelligence” does not clarify whether any use of the technology is acceptable. Another risk relates to our ability (or inability) to differentiate AI-generated text from human-generated text. While numerous software programs claim to accomplish this, their accuracy has not been fully assessed. How would an applicant defend against a false accusation that text they wrote was AI-generated? Although a desire to limit the use of this new technology is a common knee-jerk reaction, it is important to consider where to draw the line along the spectrum of AI capabilities. It may be obvious that a fully AI-generated personal statement should not be allowed; however, would it be acceptable for a generative AI to proofread and offer suggestions on an already-written personal statement? Most word processors, editors, and software programs (eg, Microsoft Word, Gmail, Grammarly) already automatically offer grammatical corrections and style aids that help proofread text. These products can also autocomplete sentences, which is similar to how ChatGPT synthesizes text. Some applicants, while not using AI, have elected to hire a proofreader to edit and offer suggestions on their personal statement, resulting in a similar outcome. Concrete answers to these questions are hard to find, and there is unlikely to be unanimous agreement on what is acceptable. As the technology grows in sophistication and utility, it will be important for stakeholders and institutions to be thoughtful and innovative in any policies written to guide the use of AI in the application process.
At their core, many attempts to limit AI stem not from an aversion to or outright hatred of the technology, but from concerns about trust. When a program director reads an applicant’s personal statement, they trust the text was written by the applicant, even if some applicants incorporate revisions or suggestions from those close to them, such as a family member, partner, or mentor. If that trust is broken, program directors may be less likely to believe the stories of inspiration or hardship applicants put forth. The same applies to LORs and other aspects of the residency and fellowship application. A small study of plastic surgery residents and faculty found no detectable difference between AI-generated and human-written personal statements in judgments of originality, authenticity, readability, and overall quality.12 Given these results, this group and others have suggested that the personal statement component of residency applications should be reevaluated.12,13 Applicants also seek a fair, transparent, and equitable process. If programs utilize AI in application review, interviewing, or ranking, any misinterpretation of data or introduction of bias has the potential to erode overall trust in the process. Trust will likely serve as a useful litmus test when deciding which AI capabilities are acceptable going forward. If an AI capability does not threaten the trust between applicants and programs, it is likely more acceptable than one with a higher risk of creating mistrust.
No work to date has assessed the actual impact of AI-generated text on residency and fellowship applications. The presence of synthetic text (whether or not programs are aware of it) may have little influence on programs’ decisions to invite an applicant for an interview or where to place them on a rank list. Alternatively, synthetic text may in some situations improve an applicant’s odds by catching typographical errors or helping produce a clear and convincing personal statement. However, some program directors may find synthetic personal statements inauthentic and too generic. There may also be specialty-specific differences in how AI-generated text affects recruitment. Our research group aims to study these factors to determine the extent of use and the potential benefits for institutions, programs, and different specialties.
It may be premature to develop proscriptive policies given the lack of data on the technology’s effects. In the interim, we urge applicants and program leadership to familiarize themselves with generative AI technologies so that they are informed of the technology’s potential influence. Programs should also consider assessing the effects of the technology on this and future recruitment seasons to determine how significant an issue it is for a given institution, program, and specialty. This assessment could take the form of anonymous post-Match surveys, which programs already use regularly, with questions asking whether generative AIs were used during the application process and, if so, how. Additionally, conversations with already-matched medical students could provide more detailed feedback on how AI was used during the recruitment season or in other areas of medical education. This rapidly evolving technology is being used by students and will continue to be used for the foreseeable future, so ongoing research will be essential to understand its benefits and risks.