After decades of frustration with long “AI Winters,” various business industries are witnessing the arrival of AI's “Spring,” with its massive and compelling benefits. Auditing will also evolve with the application of AI. Recently, there has been a progressive evolution of technology aimed at creating “artificially intelligent” devices. Although this evolution has been permeated with false starts and exaggerated claims, there is some convergence on the fact that substantive progress has been obtained in the last few years with the adoption of deep learning in conjunction with much faster machines and dimensionally larger storage spaces (and samples). The area of auditing has lagged business adoption in the past (Oldhouser 2016), but is prime for partial automation due to its labor intensiveness and range of decision structures. Several accounting firms have disclosed substantive investments in the AI fields. This paper proposes various areas of AI-related research to examine where this emerging technology is most promising. Moreover, this paper raises a series of methodological and evolutionary research questions aiming to study the AI-driven transformation of today's world of audit into the assurance of the future.
Over the last few decades there has been a progressive evolution of technology aimed at creating “artificially intelligent” systems. The conceptualization of artificial intelligence and its usefulness are the subject of discussion in academia and business practice. The introduction of revolutionary technologies eventually brings basic changes to processes and the reorganization of entire industries. Industries, and now auditing, are making substantive investments in these domains.
Background and Technological Process Reframing
Although this evolution has been permeated with false starts and exaggerated claims, there is some agreement on the fact that substantive progress has been obtained in the last few years. This was aided by the adoption of deep learning in conjunction with much faster machines, dimensionally larger storage spaces, and consequently very large data populations. Several aspects of hardware and software capabilities combined with statistics and modeling have been used to change system capabilities, creating some resemblance of intelligent functions that have been traditionally attributed only to human beings.
Wikipedia defines artificial intelligence (AI) as:
intelligence exhibited by machines. In computer science, an ideal “intelligent” machine is a flexible rational agent that perceives its environment and takes actions that maximize its chance of success at some goal. Colloquially, the term “artificial intelligence” is applied when a machine mimics “cognitive” functions that humans associate with other human minds, such as “learning” and “problem solving.” As machines become increasingly capable, facilities once thought to require intelligence are removed from the definition. For example, optical character recognition is no longer perceived as an example of “artificial intelligence” having become a routine technology. Capabilities still classified as AI include advanced Chess and Go systems and self-driving cars. (https://en.wikipedia.org/wiki/Artificial_intelligence)
The area of auditing has lagged behind business in technology adoption in the past (Oldhouser 2016), but is prime for partial automation due to its labor intensiveness and range of decision structures. Furthermore, several progressively developing technologies can serve as motivators of automation as well as change auditing methodology, either through new technological capabilities or through changes to the cost/benefit of executing certain functions. The phenomenon that we will call Technological Process Reframing (TPR) can be defined as the reconsideration of the methods and processes in an area of endeavor as a consequence of the advent of a disruptive technology. Standards and regulations tend to have a delaying effect, which is highly noticeable in the auditing area. Table 1 illustrates technologies that eventually may bring audit into TPR.
The first research question (RQ) to be asked is:
How will the field of artificial intelligence change the audit process through TPR?
Industry Is Adopting AI on a Large Scale
There is a fast-rising trend among giant technology firms such as Google, Microsoft Corporation, and Baidu, Inc. to improve their AI activities. The level of expenditures these companies allocated to AI-related deals quadrupled from 2010 to 2015, reaching approximately $8.5 billion (The Economist 2016a). This level of resource allocation to the AI field is unprecedented, in contrast to the little investment in AI during the 1980s and 1990s, which was mostly attributed to the failure of AI technologies to meet initial expectations. However, recent AI deals (mostly initiated by Google) revived the interest in the AI field. For instance, DeepMind Technologies Limited, the startup company that developed AlphaGo (a computer program capable of winning at the complex game Go), was recently acquired by Google for approximately $600 million. Facebook, Inc. followed Google's lead by establishing its own AI research lab, going so far as to enlist the services of a renowned academic from New York University to manage it (The Economist 2016a).
International Business Machines Corporation's (IBM) Watson has been mainly applied in healthcare, where it has been most successful. The medical field is a highly dynamic area of research, with new studies continuously adding to the medical literature. The resulting information load is exacerbated by the complex and dynamic patient-related information that physicians need to process and subsequently incorporate in their diagnoses. Watson is not affected by the human limitations of processing excessive amounts of information. Consequently, it has the capacity to combine massive amounts of textual information from the literature with patients' records, and even image data, to develop and evaluate hypotheses. In fact, Watson proved to be highly effective and efficient in the identification of cancers, and even in recommending specific treatments. It is the combination of large training datasets, such as images of cancer, and strong algorithms that enables Watson to correctly identify cancer. Currently, numerous academic medical institutes, such as Mayo Clinic and MD Anderson Cancer Center, are partnering with IBM to develop healthcare systems that can assist healthcare providers in examining and understanding patients' cases and providing a diagnosis and a recommendation for patient-specific treatment (Power 2015).
Public Accounting Firms Are Investing Large Amounts in Deep Learning
Recognizing the massive potential in AI, the Big 4 accounting firms are delving into the field. Jon Raphael, the chief innovation officer at Deloitte Touche Tohmatsu Limited (Deloitte), states that, with the effective implementation of cognitive technologies, the audit process will become “smarter, more insightful, and more efficient. This is the future of the audit profession, and the users of financial statements deserve it” (Raphael 2015). KPMG, in March 2016, announced that it would work with IBM Watson to apply cognitive computing technology to its professional services offerings. The idea is that the auditor will use Watson to analyze massive volumes of financial data to detect anomalies. For example, “by scaling human skills and judgment through the application of cognitive technology across a bank's commercial mortgage loan portfolio, auditors gained a more detailed and comprehensive understanding of the bank's credit files and potential audit exceptions based on loan grading” (KPMG 2016). At the same time, Deloitte is collaborating with Kira Systems Inc., a contract analysis system, to create cognitive models that examine large numbers of complex documents, extract and structure textual information for better analysis, and assist auditors with the difficult task of document reviewing (Deloitte 2016). The other big accounting firms are showing similar interest in the AI field. While Ernst & Young (EY) has been providing software that models human behavior since 2015 (EY 2016), PricewaterhouseCoopers employs AI techniques such as DeNovo within its own operations. This tool, for instance, helps analysts and clients evaluate the disruptive potential and future use of a particular financial technology (MIT Technology Review 2016). Since industries and big accounting firms have launched numerous AI projects, the costs and benefits of these investments should be considered.
How do we analyze the cost and benefits of the investment in AI?
How can we make auditors, who lack data mining and AI knowledge and skill, master AI techniques and tools?
AI IN AUDITING: DEFINITION AND HISTORY
Operationally, for the audit area, we will define artificial intelligence as a hybrid set of technologies supplementing and changing the audit. Audit procedures are a direct consequence of available technologies. The advent of computers changed the scope and the methods of examination. The advent of analytics will change the time scope of the audit (more proactive than reactive), the efficiencies, and the cost and benefit of the work. The advent of AI will embed human-like activities into automation. In general, it is thought that technology applied to audit allows activities to be performed more effectively and more efficiently. It must be pointed out that in the audit domain technology can totally change what is done, in addition to the efficiency considerations above. For example, the advent of computers allows for full population testing, but it is a different process than manual document examination. As an illustration, the application of AI to contract analysis will eventually allow the full examination of contract populations and extraction of their features (Deloitte 2016; PwC 2016).
The History of AI
As early as the 17th century, Thomas Hobbes proposed the initial idea of Artificial Intelligence. He pointed out that the behavior of a human being could possibly be understood in mechanical terms, and symbols (e.g., numbers, graphs, calculations, and statistics) could be used as synonymous substitutes for longer expressions to solve problems (Hobbes 1651; CLEVERISM 2015). In 1955, McCarthy, Minsky, Rochester, and Shannon (2006) initiated one of the first AI research projects. The objective was to enable machines to use language (in terms of abstraction and concepts) to solve problems and improve themselves. Subsequently, scientists implemented different approaches to build artificial intelligence. Unfortunately, after the initial fascination with AI research came the “AI Winters,” as the research did not achieve solid results due to technology limitations. Dreyfus (1965) divided AI into four areas: (1) game playing, (2) language translation and learning, (3) pattern recognition, and (4) problem solving. Twenty-seven years later he indicated that much of the promise of AI had not been realized (Dreyfus 1992). Recently, we have observed a resurgence of AI enabled by improvements in infrastructure speed, availability, and scale, as well as the innovation of cloud computing and the emergence of new data storage and processing technology such as Apache™ Hadoop®. Deep learning, a new frontier in AI focusing on computational models (deep neural networks) for information representation, has the capacity to automatically extract features from unstructured or semi-structured data like images, speech, text, video, etc.
AI in Auditing: Expert Systems and Artificial Neural Networks
While the current literature on AI is enormous and ranges from algorithmic essays (e.g., Courbariaux, Hubara, Soudry, El-Yaniv, and Bengio 2016) to a wide set of applications in many areas of endeavor (e.g., Zhang, Zhao, and LeCun 2015; Silver et al. 2016), there is little extant research on AI in auditing. Furthermore, a vast majority of those now-aged publications are centered on expert systems. Expert systems have been frequently mentioned as potential elements in the audit process (e.g., Bedard and Graham 1994; Graham, Damens, and Van Ness 1991) and tax planning (e.g., Shpilberg, Graham, and Schatz 1986). Gillett (1993) designed an audit expert system to assist the auditor in tailoring audit programs and described the first steps of the long implementation process (Vasarhelyi 1993). Moreover, from 1989 to 2005, six volumes of a book series were published, covering a variety of applications of expert systems and discussing the value that expert systems created for accounting and auditing (Vasarhelyi and Kogan 1989, 1994a, 1994b, 1998; Vasarhelyi and O'Leary 2005; Vasarhelyi, Bonson, and Hoitash 2005).
The British Computer Society Specialist Group on Expert Systems defined an expert system as “the embodiment within a computer of a knowledge-based component, from an expert skill, in such a form that the system can offer intelligent advice or take an intelligent decision about a processing function. A desirable additional characteristic, which many would consider fundamental, is the capability of the system, on demand, to justify its own line of reasoning in a manner directly intelligible to the enquirer. The style adopted to attain these characteristics is rule-based programming” (Connell 1987, 221).
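The two characteristics in this definition, rule-based advice and the ability to justify the line of reasoning, can be illustrated with a minimal sketch. The rules, thresholds, and field names below are hypothetical and chosen only for illustration; they are not drawn from any audit standard or from the systems cited in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    name: str                                   # identifier used in the justification
    condition: Callable[[Dict[str, float]], bool]
    advice: str


# Hypothetical knowledge base: two illustrative rules with made-up thresholds.
RULES: List[Rule] = [
    Rule("R1-large-entry-after-close",
         lambda f: f["amount"] > 100_000 and f["posted_after_close"] == 1,
         "Flag journal entry for manual review"),
    Rule("R2-weak-controls",
         lambda f: f["control_score"] < 0.5,
         "Increase substantive testing"),
]


def advise(facts: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (advice, rule name) for every rule whose condition holds."""
    return [(r.advice, r.name) for r in RULES if r.condition(facts)]


facts = {"amount": 250_000, "posted_after_close": 1, "control_score": 0.8}
for advice, rule_name in advise(facts):
    # The rule name is the system's justification of its own reasoning.
    print(f"{advice} (because rule {rule_name} fired)")
```

Because each piece of advice is returned together with the rule that produced it, the enquirer can always ask why a recommendation was made, which is the property the definition singles out as fundamental.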
When applied to auditing, an effective expert system brings numerous benefits such as automatic understanding of audit task processes, as well as increased knowledge and knowledge transferability (Omoteso 2012; Lombardi and Dull 2016). The application of expert systems to the accounting, auditing, and tax domains started in the 1980s (Michaelsen 1982; Dungan 1983). Early expert system research was conceptual and mainly for demonstration purposes. Thanks to the early research, accounting practitioners explored the feasibility and potential of expert systems and showed great interest in this area. Public accounting firms made great investments to build expert systems (with larger knowledge bases) to support audit planning, compliance testing, substantive testing, risk assessment, and decision making (Brown 1991).
In order to develop more sophisticated systems, case studies investigated real-world activities, and practitioners were interviewed about their experiences in expert systems implementation (e.g., Brown 1991). As the technology became increasingly mature, research started to shed light on approaches for knowledge acquisition from the human experts, human interface, and implementation (Gray, Chiu, Liu, and Li 2014). Another line of research focused on the strengths and limitations of expert systems and the impact of expert systems on auditing (e.g., O'Leary 2009; Arnold, Collier, Leech, and Sutton 2004; Eining and Dorr 1991; Gray, McKee, and Mock 1991). Subsequently, expert system research tended to be more empirical and centered on more disaggregated topics. Examples include the reliance on expert systems based on a laboratory experiment (Swinney 1999), the design of a fuzzy logic expert system for materiality assessment (Rosner, Comunale, and Sexton 2006), and the introduction of decision rules for the evaluation of the entity's going-concern status (Murphy 2008). Expert systems research in the accounting and auditing domains follows life cycle stages similar to those of the generic industry life cycle (Gray et al. 2014). This research grew in popularity and peaked in the 1986–1998 timeframe, and has progressively disappeared from the literature since 1999. On the other hand, some research interests shifted toward traditional artificial neural networks (ANN).
The traditional ANNs simulate the structure of biological neural networks and aim to loosely mimic the way humans receive and process input information. However, compared with neural networks in human brains, the traditional ANN, which consists of one input layer, one or two hidden layers, and an output layer, is too simplistic and can only be used for supervised learning. As a result, the traditional ANNs are applied in limited areas, including assessing management fraud (Green and Choi 1997), forecasting fraudulent financial reporting (Bell and Carcello 2000; Lin, Hwang, and Becker 2003), predicting going-concern status (Koh 2004), and supporting the issuance of qualified opinions (Pourheydari, Nezamabadi-pour, and Aazami 2012).
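The structure described above (one input layer, a single hidden layer, one output unit, trained with labeled examples) can be sketched in a few lines. This is a toy illustration under stated assumptions: the two binary "red flag" inputs, the labels, the layer sizes, and the learning rate are all hypothetical, and the code is not the model used in any of the cited studies.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyANN:
    """One input layer, one hidden layer, one sigmoid output unit."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One supervised update (gradient descent on squared error)."""
        h, y = self.forward(x)
        d_out = (y - target) * y * (1 - y)
        for j, hj in enumerate(h):
            d_hid = d_out * self.w2[j] * hj * (1 - hj)  # uses w2 before its update
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical labeled data: label 1 if either red-flag indicator is present.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
net = TinyANN(n_in=2, n_hidden=3)
before = mean_squared_error(net, data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t)
after = mean_squared_error(net, data)
print(f"MSE before training: {before:.4f}, after: {after:.4f}")
```

The network can only adjust its weights because every example carries a label, which is the supervised-learning restriction noted above.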
The Comparison and Combination of Expert Systems and Artificial Neural Networks
Although the research stream of expert systems is no longer the leading research of AI in the auditing domain, and the traditional ANNs cannot be applied to conduct complicated audit analytics tasks (especially Big Data analytics), a combination of the idea of expert systems and ANNs could facilitate the implementation of modern AI in auditing. An example is deep learning. While past research in expert systems used experienced individuals to extract the rules and algorithms to deal with sequencing, prioritizing, and resolving conflicting rules, deep learning uses its multiple layers within a neural network to derive the parameters of the network. For example, just as an expert system has a large knowledge base and rules, a deep learning model, using a deep neural network with deeper hierarchical structure than that of a traditional artificial neural network, has its training dataset and extracts features from those samples. The difference lies in the fact that for the deep learning model the “knowledge base” training dataset is much larger, with higher dimensionality and multiple structures. As a result, a well-trained deep learning model is able to analyze unstructured or semi-structured data without human intervention. With the help of deep learning, numerous audit tasks such as reviewing contracts, processing paper work, and analyzing financial statements can be automated. Therefore, the comparison and combination of these methods may be highly promising research.
What are the differences and similarities between expert system and deep learning methodologies?
What can be learned from the extant research in expert systems to support the application of deep learning to auditing?
Is it appropriate to directly employ existing deep learning models (such as Watson AlchemyAPI) trained with nonfinancial data to analyze financial content?
A CONCEPTUALIZATION OF AI FOR AUDITING
In general, what can be expected from artificial intelligence for auditing in a reasonable period of time is a composite of functionalities, drawn from many disciplines and applications, that complement audit functions of many types and increase the competencies and effectiveness of the assurance function. This can be compared to the self-driving car, which relies on a variety of techniques and intelligence such as radar, LIDAR, GPS, odometry, and computer vision, as well as deep learning systems that are capable of analyzing sensory data to distinguish between different cars and objects on the road (Lassa 2012; Zhu, Miao, Hu, and Qing 2014). Cars have been developed and launched that warn of obstacles, improve safety in a collision, autonomously manage velocity, drive autonomously on highways, or are fully autonomous. For this purpose, there is a constant aggregation of different technologies that result in a sense of intelligence, as there is an incremental increase of functionalities, some of which are superior to features of human intelligence. For example, the size of databases, the accuracy of structured data retrieval, the speed of perception to reaction, and the ability for accurate and rapid computation with large numbers are all features that when combined create some form of intelligence. The discussion on whether a device is intelligent is largely a matter of definition. What matters, certainly in this context, are capabilities, process performance, and economics.
The development of cheap and reliable sensing devices that possess capabilities of vision, smell, sound detection, voice recognition, motion detection, face recognition, etc., opened the doors to a wide range of production functionalities that are also usable in the assurance function. Archives of these measures can serve as confirmatory evidence of the performance of tasks, secondary evidence of levels achieved, or as confirmation of flows. Many of these applications are still outside the confines of current technological competence, but are likely to develop rapidly over time with the expansion of technology. For example, inventory items tagged with radio frequency identification (RFID) chips and tagged in the trajectories of acquisition of goods or manufacture can be used for production control, supply chain management, and can be stored as audit trails of inventory usage. Face and voice recognition software and archives can serve as supporting evidence for cyber security, or more prosaically as authorization and separation of duties controls and meta-controls.
The term meta-control pertains to the concept of control/evaluation/verification of controls and potentially the development of capabilities within an assurance framework that do not exist today, but can be achieved (through TPR) with new methods and technologies. The traditional audit is retroactive in nature mainly due to the information processing capabilities of early accounting. The assurance of these numbers was originally motivated by third parties relying on these measures and needing to be reassured of the quality of the measurements. Once a third party performs a function there is a potential need for assurance. The advent of automatic measurement, fast processing, and better analytic methods improves the possibility of predicting the “normal” outcomes (Kuenkaikaew 2013; Abbott 2014) and of immediately verifying that the actual values correspond to the predictions. This, in turn, creates a new form of control/assurance, much of which has not yet been explored and incorporated into the assurance processes. This different form of controls (or meta-controls) can be argued not to be auditing but rather a form of more advanced assurance. These will evolve with time into the toolset of assurance, but are not currently part of the “traditional” audit.
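As a rough illustration of predicting a "normal" outcome and immediately verifying the actual value against it, the sketch below derives an expected range from historical values and checks each new measurement as it arrives. The mean plus or minus k standard deviations band is a deliberately simple stand-in for the predictive models discussed in the cited work, and the sales figures are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def expected_range(history: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Predict the 'normal' range as mean +/- k sample standard deviations."""
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s


def verify(actual: float, history: List[float], k: float = 3.0) -> bool:
    """True when the actual value falls inside the predicted normal range."""
    low, high = expected_range(history, k)
    return low <= actual <= high


# Hypothetical daily sales totals (in thousands) for one store.
daily_sales = [100.0, 104.0, 98.0, 101.0, 97.0, 103.0, 99.0]

for todays_value in (102.0, 140.0):
    status = "consistent with prediction" if verify(todays_value, daily_sales) else "exception"
    print(todays_value, status)
```

A value flagged as an exception would be routed to an auditor at the time it occurs rather than discovered retroactively, which is what distinguishes this form of control from the traditional audit.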
What are the changes in audit conceptualization that will be facilitated by sensing, archiving, and predictive technologies?
Exogenous Measurement and Quality of Measurement
Another issue emerging with meta-processes is the quality of measurement. Organizations often resort to direct physical counts of inventories, product sales, and worked hours. A traditional audit verifies documents and records retroactively but another form of verification outside of the traditional audit with a different (not necessarily worse) quality of measurement is emerging. Some of these exogenous measures are presented in Table 2.
Rapid Detection of Phenomena
Auditing is not currently capable of rapidly helping operations in the detection of anomalies, poor measurement, violations of security (cyber), etc. Furthermore, the evolution of continuous audit (Vasarhelyi and Halper 1991) raised many conceptual questions analogous to the meta-process issues raised above. It is difficult to conceive that a more current and precise detection of anomalies, perhaps in time with the process chain to avoid the downstream transmission of early detected faults, is not beneficial to third parties (stakeholders) as well as the management of the firm. Although this creates conceptual difficulties, such as the definition of lines of defense, and although meta-control is not part of what is currently defined as audit, it has to motivate a certain degree of TPR.
What are the lines of defense in the modern continuous and (partially) intelligent audit?
What are the modules of the modern (intelligent) assurance process?
Integration of Evidence
The environment of the emerging assurance process (Kozlowski 2016) is one of big endogenous and exogenous data, automatic audit analytics being applied to a continuous data flow, audit by exception (ABE) (Vasarhelyi and Halper 1991; Issa 2013; Appelbaum, Kogan, and Vasarhelyi 2016), and a wide set of “modern audit assertions” relative to risks, controls, and data being frequently examined. It also includes auditors who, with a wide set of tools, examine the exceptions that are continually found and extract exogenous data from the environment as secondary (and tertiary) evidence serving management, assurance, and third parties. Consequently, a much wider, more frequent, and conflicting set of evidence will constantly emerge that will eventually have to be automatically evaluated and exposed/negotiated with management, leading to a set of assurances of many types and frequencies. This environment will confront the current rules of evidence, the type of data to be accepted as evidence, the need for constant usage of peer data, the need for accounting and disclosure at a much more disaggregate level, etc.
Rather than focusing on the limited information provided by financial statements, auditors will be able to take advantage of textual data from social networks, video recordings, captured imagery, and sensor data (e.g., GPS locational data, RFID data), and combine the extracted features with accounting and financial information. The various functions of deep learning allow auditors to automate a number of tasks, such as reviewing source documents (e.g., bank checks, deposit slips, sales invoices), processing paperwork, and analyzing conference calls, emails, press releases, and news, and to extract metadata from them, all of which could be additional supporting evidence used to supplement traditional financial attributes. These functions serve financial statement analysis, which is a comprehensive task. When auditors analyze financial reports, the machine scans and identifies each account and its balance and links these numbers to the related supporting evidence automatically, thus enabling the detection of irregularities.
In a recent study, weather data were collected from open sources and utilized to predict sales revenue, and the approach was subsequently compared to traditional auditing methods (Yoon 2016). This is one form of sensor data that is continuously generated and collected, and is readily accessible, but remains largely underutilized by companies, auditors, and researchers alike. Other examples include RFID tags to track the cost of goods sold as well as the level of inventory, and GPS locational data to identify goods in transit.
In order to explore text data, Liu and Moffitt (2016) examine the association between SEC comment letters and the probability of a company's 10-K restatement using text mining. Bochkay and Levine (2013) examine how the use of text analytics on MD&A information from SEC filings can improve earnings forecasts. The text analysis is based on the “bag-of-words” method, without consideration of word sequence. Sun and Vasarhelyi (2016) explore textual analysis with deep learning. Their study utilizes the sentiment features of management earnings conference calls generated by AlchemyLanguage API, a deep learning textual analyzer, as predictors for material weakness in internal control over financial reporting and provides evidence that those sentiment features provide incremental information for the prediction task. Accounting firms are investing significant resources in order to analyze textual information. Contract analysis is yet another example of how AI can efficiently analyze large documents, a task that has been reserved to humans until now (Yan and Moffitt 2016).
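The limitation of the "bag-of-words" method mentioned above, that it ignores word sequence, can be seen in a minimal sketch. The tokenizer and the two example sentences are hypothetical and serve only to illustrate the point; they are not taken from the cited studies.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count word occurrences, discarding word order entirely."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Two hypothetical disclosures with opposite meanings but identical words.
a = "revenue increased while costs decreased"
b = "costs increased while revenue decreased"

print(bag_of_words(a))
print(bag_of_words(a) == bag_of_words(b))  # True: the two bags are identical
```

Because the two sentences map to the same representation, any model built on these counts alone cannot tell them apart, which is one motivation for the sequence-aware deep learning analyzers discussed in the text.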
The evolution of assurance processes can be broken down into process/automation changes and progressive automation of judgments in a scenario of constantly increasing testable “modern assertions” and many different forms and timings of opinions.
What are these more detailed “modern assertions?”
What data will be evidence?
What is an adequate way to taxonomize audit judgments that is appropriate for intelligent automation?
To what degree can audit judgment be automated?
Data for Deep Learning in Audit
Generally, to avoid overfitting and to ensure high classification accuracy, a large amount of training data is required, as deep learning employs a deep neural network with large numbers of parameters and many hidden layers. Unfortunately, due to statutory limitations, auditors do not have an ocean of data like that available to Google or Facebook. The auditor has an ethical (or legal) obligation to maintain the confidentiality of client information and is prohibited from disclosing any confidential client information without the consent of the client (AICPA SAS No. 103; AICPA Code of Professional Conduct and Bylaws, Rule 301 [ET Section 301.01]). This obligation raises a large set of uncertainties and barriers for an audit firm. The multiple dimensions that make audits unique, and the decisions artisanal, may be a major difficulty in this work. As a result, the investigation of the “ideal” training dataset to achieve the target accuracy for different analysis tasks is imperative.
In fact, there are no heuristics for determining the optimal size of data for deep learning applications. The amount of training data for deep learning depends on different aspects of the experiment, including: (1) the nature of the data (e.g., are they image or text?); (2) the quality of the data (e.g., are they representative or biased?); (3) the dimension of the data; (4) the complexity of the problem (e.g., how different are the classes to be separated: telling black images from white ones is far easier than distinguishing the face of a particular person); (5) the architecture of the neural network implemented; and (6) the techniques used for the experiment. A number of approaches can be used to reduce the amount of data required (ResearchGate 2016). Sander Dieleman, a member of the winning group of the National Data Science Bowl competition, posted the group's work in the blog “Classifying plankton with deep neural networks.” The group classified images of plankton with deep learning using only 30,000 examples for 121 classes. The author discusses the question of “how much data do deep neural networks require?” as follows: “A challenge with this competition was the size of the dataset: about 30,000 examples for 121 classes. Several classes had fewer than 20 examples in total. Deep learning approaches are often said to require enormous amounts of data to work well, but recently this notion has been challenged, and our results in this competition also indicate that this is not necessarily true. Judicious use of techniques to prevent overfitting such as dropout, weight decay, data augmentation, pre-training, pseudo-labeling and parameter sharing, has enabled us to train very large models with up to 27 million parameters on this dataset” (Dieleman et al. 2015).
One project of Lab41 (2016) conducts an experiment on the tradeoff between data size and model performance for sentiment analysis. It compares the performance of five models on subsets of a training dataset of Amazon.com, Inc. reviews: the full 3 million, 500 thousand, 100 thousand, 50 thousand, and 25 thousand. The result shows that the final test accuracy does not decrease as much as they expected with smaller data. Only the model with the smallest training data does not provide acceptable prediction results (Lab41 2016). They claim that “the famed ‘data hunger' of deep learning applies less strongly to text classification problems like sentiment analysis than to computer vision or more complicated NLP tasks like machine translation” (Lab41 2016).
Even in the image recognition area, it is not always true that huge samples are necessary for deep learning, and a learning curve could help explain this (Lab41 2016). Cho et al. (2016) develop six deep learning models trained on different numbers of medical CT images of six body parts (the training data size for each model is 5, 10, 20, 50, 100, and 200, respectively), and then test each trained model on 1,000 new images of each body class (brain, neck, shoulder, chest, abdomen, pelvis), for a total of 6,000 CT images. The results (Figure 1) show a learning curve for the classification of CT images, from which the training set size needed to reach the desired accuracy of 99.5 percent is estimated at 4,092 images per body class.
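The learning-curve idea can be sketched as follows: fit a curve to the error rates observed at a few small training sizes, then extrapolate to the size needed for a target accuracy. The inverse power law form error = a * n^(-b) is an assumption made here for illustration and is not necessarily the functional form used by Cho et al. (2016); the error rates in the example are invented and are not their results.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[int], errors: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(error) = log(a) - b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope


def required_size(a: float, b: float, target_error: float) -> int:
    """Smallest n with a * n**(-b) <= target_error, i.e. n = (a / target)**(1/b)."""
    return math.ceil((a / target_error) ** (1.0 / b))


# Invented error rates at the training sizes mentioned in the text.
sizes = [5, 10, 20, 50, 100, 200]
errors = [0.40, 0.30, 0.21, 0.14, 0.10, 0.07]

a, b = fit_power_law(sizes, errors)
n_needed = required_size(a, b, target_error=0.005)  # 99.5 percent accuracy
print(f"fitted a={a:.3f}, b={b:.3f}; estimated training size per class: {n_needed}")
```

The estimate is only as good as the assumed curve shape, so an extrapolation far beyond the largest observed training size should be treated as a planning figure rather than a guarantee.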
On the other hand, one can use cross-validation to examine the misclassification rate to determine whether the data size needs to be increased. Although many techniques can be used to help reduce the required data size, the currently available data for auditors is not sufficient for developing insightful data analysis. Audit firms, legal departments, regulators, and clients will have to cooperate to open the doors to find contracts, leases, and data events necessary for the usage of this technology.
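As a minimal sketch of the cross-validation check mentioned above, the following example estimates a misclassification rate with k-fold cross-validation. To stay self-contained it uses a one-feature nearest-centroid classifier as a stand-in for a deep learning model; the classifier, the function names, and the data are all hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_centroids(xs, ys):
    """Stand-in model: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def cv_misclassification_rate(xs, ys, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the errors."""
    errors = 0
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = train_centroids(train_x, train_y)
        errors += sum(predict(model, xs[i]) != ys[i] for i in fold)
    return errors / len(xs)

# Hypothetical one-feature records: class 0 lies near 0, class 1 near 10.
features = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
labels = [0] * 20 + [1] * 20
rate = cv_misclassification_rate(features, labels, k=5)
print(f"cross-validated misclassification rate: {rate:.3f}")
```

If the rate keeps falling as more records are added to the training folds, that is an indication that the data size is still a binding constraint.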
Are audit populations a large enough sample for deep learning?
How can the deep learning of financial statement fraud identification be performed if the known number of fraudulent cases is very limited?
HOW WILL AI AFFECT THE AUDITING PROFESSION?
Companies and organizations are generating and collecting large amounts of data on a continuous basis, from points of sale, to shipment tracking information, as well as inventory counts in real time. In addition, information from exogenous sources, in the form of social media and news feeds to name a few, is readily accessible and available for analysis. It is, in fact, the application of AI to this type of Big Data that is expected to take the auditing profession a step forward. With such large databases, traditional audit procedures become less effective and efficient, which necessitates a rethinking of the way audits are conducted (Dai and Vasarhelyi 2016).
A number of studies in the social sciences literature have found that humans perform poorly in complex tasks that require the collection and aggregation of large amounts of information from multiple sources (Kleinmuntz 1990; Iselin 1988; Benbasat and Taylor 1982). It has been well documented in the accounting and auditing literature that exposure to large amounts of information can lead to increased ambiguity, information overload, and difficulty identifying relevant information and patterns and, consequently, to suboptimal audit judgment (Driver and Mock 1975; Chewning and Harrell 1990; Stocks and Harrell 1995; Alles, Kogan, and Vasarhelyi 2008; Alles, Brennan, Kogan, and Vasarhelyi 2006; Brown-Liburd, Issa, and Lombardi 2015). This problem is exacerbated by the unstructured nature of Big Data and the high level of complexity and ill structure involved in certain audit tasks, such as the evaluation of internal controls. Hence, the new methodologies can assist auditors in overcoming the aforementioned limitations.
The auditing profession is standards driven, making it impractical for the profession to adopt any new technology or methodology if not required or approved by the standard-setting boards. The profession will face the challenge of adjusting the current auditing standards in order for the adoption of such a disruptive technology to prevail. An example is continuous auditing, where the adoption reluctance of external auditors seems to be driven by the current auditing standards. The standards are still based on traditional auditing procedures, which were effective when the sizes of databases were small, but became ineffective in today's real-time digital economy. These standards will have to allow and even encourage auditors to take advantage of AI in order to provide a higher level of assurance more frequently, if not in real time. In addition to improving audit effectiveness through the integration of new types of evidence, AI application in auditing can significantly improve audit efficiency. Rather than manually examining a sample of transactions, auditors can take advantage of AI methodologies to examine complete populations of transactions in a much shorter time. Instead of spending their time on manual labor, auditors will be able to put their professional skills to better use on high-value tasks by focusing their efforts on the interpretation of the results produced by AI. This issue is expected to escalate as more data are continuously generated and collected, and demand for more frequent audits increases (Vasarhelyi, Alles, and Williams 2010).
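The contrast between sampling and full-population examination can be made concrete with a small calculation. The sketch below assumes a simple random sample drawn without replacement and computes the probability that the sample contains none of the anomalous records in the population (a hypergeometric probability). The function name and the population figures are hypothetical.

```python
from math import comb

def prob_sample_misses_all(population, anomalies, sample_size):
    """Probability that a simple random sample (without replacement)
    contains none of the anomalous records in the population."""
    # All anomalies must fall among the records that were NOT sampled.
    return comb(population - sample_size, anomalies) / comb(population, anomalies)

# Hypothetical population: 100,000 transactions, 10 of them anomalous.
population, anomalies = 100_000, 10
for sample_size in (60, 1_000, 10_000, population):
    p = prob_sample_misses_all(population, anomalies, sample_size)
    print(f"sample of {sample_size:>7}: P(no anomaly sampled) = {p:.4f}")
```

Only when the sample size equals the population size does the probability of missing every anomaly fall to zero, which is the case that full-population AI analysis aims for.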
What It Will Enable
The increasing maturity of AI technologies, more specifically deep learning technology, such as visual recognition, textual analysis, natural language processing, and audio processing, provides unlimited potential and inspiration for its application to auditing.5
Deep Learning in Image Recognition
Empowered by deep convolutional neural networks (CNN) and the availability of very large amounts of sensor data, visual recognition techniques are capable of recognizing object categories (such as a “car” or a “building”), or “image classification,” as well as detecting the precise position of a certain object in the image, or “object detection” (He, Zhang, Ren, and Sun 2015). As a result, visual recognition techniques are able to “understand” the content of an image taken by a drone or video captured by surveillance cameras, automatically identify the object and subject (including human faces) in the image, and subsequently organize and classify each image into a predefined logical class. Simultaneously, each image is linked to searchable tags based on its visual content, so that the system can effectively and efficiently locate images with specific content or concepts, as well as related images, at the confidence level requested by the user (Rosenberg 2013). Such techniques can facilitate the automation of asset and inventory checks and fraud detection. For example, a deep neural network can learn from the abundant existing product images and then analyze inventory images captured by automated drones, replacing physical inventory checks (Appelbaum and Nehmer 2016). Another example involves the analysis of security footage from parking lots, where a deep learning model can be trained to count customers and, consequently, predict the revenues of a company. It is important to note, however, that these types of evidence should be regarded as supplemental evidence, rather than a replacement for financial information, in order to provide higher audit effectiveness.
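The tagging and retrieval step described above can be sketched without the recognition model itself. The example below assumes that some image classifier has already produced (image, tag, confidence) triples; it builds a searchable index and returns the images that carry a given tag at or above a requested confidence level. The file names, tags, and scores are hypothetical.

```python
from collections import defaultdict

def build_tag_index(predictions):
    """predictions: iterable of (image_id, tag, confidence) triples
    produced by some image classifier (not shown here)."""
    index = defaultdict(list)
    for image_id, tag, confidence in predictions:
        index[tag].append((confidence, image_id))
    for tag in index:
        index[tag].sort(reverse=True)  # most confident first
    return dict(index)

def find_images(index, tag, min_confidence):
    """Return images tagged with `tag` at or above the requested confidence."""
    return [image_id for confidence, image_id in index.get(tag, [])
            if confidence >= min_confidence]

# Hypothetical classifier output for three drone images of a warehouse.
predictions = [
    ("drone_001.jpg", "pallet", 0.97),
    ("drone_002.jpg", "pallet", 0.62),
    ("drone_002.jpg", "forklift", 0.91),
    ("drone_003.jpg", "empty shelf", 0.88),
]
index = build_tag_index(predictions)
print(find_images(index, "pallet", min_confidence=0.9))
print(find_images(index, "pallet", min_confidence=0.5))
```

Raising the confidence threshold trades recall for precision, which is the choice an auditor would make when deciding how much follow-up work the image evidence should trigger.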
Deep Learning in Language Analysis
Deep learning in conjunction with linguistic analysis is able to automatically analyze text, including both HTML/text documents and webpages. Unlike the prevalent “bag-of-words”-based text mining method that requires time-consuming data preprocessing steps,6 deep learning does not require a human user to preprocess the text. The returned value of the textual analysis includes the authors, keywords, concepts, relationships among those concepts, involved concepts and, more importantly, the attitude (sentiment) and other emotions (such as anger, joy, disgust, sadness, etc.) for targeted phrases, entities, or keywords.
The extracted metadata could constitute informative attributes for audit analytics. This task is conducted by learning from the significantly large amounts of textual data (for example, AlchemyLanguage API is trained on billions of webpages) using a complicated computational process and running it through the deep hierarchical neural network involving multiple hidden layers and nodes. In fact, the trained text analysis model is capable of understanding the meaning and even the context of the given text, which in turn can be used for pattern identification (IBM 2016). With this technology, the machine will extract attributes from conference call transcripts, management discussion and analysis sections of 10-Ks, and other text data, and the extracted attributes are informative factors for audit analytics (Sun and Vasarhelyi 2016; Sun et al. 2016).
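For contrast with the deep learning approach, the following minimal sketch shows the kind of manual preprocessing that a “bag-of-words” representation involves: lowercasing, tokenizing, removing stop words, and counting terms. The stop-word list and the sample sentence are illustrative assumptions, not taken from any of the cited tools.

```python
import re
from collections import Counter

# Illustrative stop-word list; real text mining pipelines use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "we", "our"}

def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, and count the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

# Hypothetical sentence from a conference call transcript.
sentence = "We expect the revenue growth to continue, and revenue is strong."
bag = bag_of_words(sentence)
print(bag.most_common(3))
```

The resulting counts discard word order entirely, which is the contextual information that the deep hierarchical models described above are able to retain.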
Deep Learning in Natural Language Classification
Besides understanding the meaning of the text, deep learning can be applied to classify it. To do this, the training text must contain labels indicating what category the text file belongs to. The trained deep learning model interprets the intent behind text and returns a corresponding classification with associated confidence levels. This technique is able to function in a number of auditing scenarios. For instance, we can develop a model to classify earnings conference call transcripts into two groups: “fraudulent” and “non-fraudulent.” To train the model, each conference call transcript in the dataset is labeled either as “fraudulent” or “non-fraudulent.” The trained model will be able to classify the future transcripts into “fraudulent” and “non-fraudulent” categories with varying confidence levels. The output from the classification process can subsequently be used to trigger a follow-up audit action. For example, transcripts classified as “fraudulent” may warrant a more careful examination by auditors who possess the skills to identify problematic items. Similarly, auditors could rely on the language classification model to analyze forum posts, comments, and conversations from social media like Twitter and Facebook to obtain supplemental audit evidence. To illustrate, tweets can be classified as events, news, or opinions and the categories can be further classified as criticism or praise.
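The train, classify, and follow-up workflow described above can be sketched as follows. To keep the example self-contained, a multinomial naive Bayes classifier with Laplace smoothing stands in for the deep learning model; it is a much simpler technique and is used here only to show the shape of the workflow. The training snippets, the 0.7 threshold, and the function names are hypothetical, and the toy vocabulary should not be read as real indicators of fraud.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_docs):
    """labeled_docs: list of (text, label). Returns a naive Bayes model."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return (label, confidence); confidence is the posterior probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    peak = max(log_scores.values())
    exp_scores = {label: math.exp(s - peak) for label, s in log_scores.items()}
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / sum(exp_scores.values())

def needs_follow_up(label, confidence, threshold=0.7):
    """Trigger a follow-up audit action for confident 'fraudulent' labels."""
    return label == "fraudulent" and confidence >= threshold

# Hypothetical labeled snippets standing in for full call transcripts.
training = [
    ("one time adjustment restated revenue unusual reserve", "fraudulent"),
    ("restated earnings adjustment aggressive recognition", "fraudulent"),
    ("steady growth stable margins consistent cash flow", "non-fraudulent"),
    ("consistent revenue growth stable demand", "non-fraudulent"),
]
model = train(training)
label, confidence = classify(model, "unusual adjustment restated reserve")
print(label, round(confidence, 3), needs_follow_up(label, confidence))
```

In practice the threshold would be set by weighing the cost of unnecessary follow-up work against the cost of a missed problematic transcript.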
Deep Learning in Speech Recognition
Speech, in the form of streamed or recorded audio like phone calls, interview conversations, and management presentations, presents another type of audit evidence. Deep learning speech recognition techniques bridge the gap between the spoken word and its written form. They analyze the grammar and language structure of the speech and convert the audio signal into a transcript. This is a difficult task, as speech may contain errors, accents, and environmental noise. In this way, speech can be transformed into searchable and analyzable data.
What Will It Make Less Efficient?
AI is a disruptive technology that is expected to change the way audits are conducted. The increasing use of drones (Appelbaum and Nehmer 2016) and intelligent software agents is gradually replacing humans in the workforce. Consequently, it does not come as a surprise that certain current practices will become obsolete. The currently existing technology is capable of collecting and analyzing entire databases of quantitative information of companies, rendering traditional sampling less effective. Existing auditing standards that require certain labor-intensive procedures will need to be updated to encourage companies and accounting firms to take advantage of AI in their audit procedures. This would decrease the probability that fraudulent activities, manipulations, and misstatements elude auditors. This is especially true when companies run the analytical models more frequently, if not continuously. As a result, the company will have a competitive advantage over external auditors. In order to counterbalance this, external auditors will need to be more involved with the companies' systems, which would negatively impact auditor independence. To understand this, consider a scenario where a company is running a continuous control monitoring system. Such a system can act as a meta-control monitoring all the underlying internal controls. External auditors will face the challenge of either relying on such a system, possibly compromising their independence, or continuing to resort to sampling methodologies, and running the risk of lagging behind their client from an information perspective. To put it differently, if external auditors go with the latter, then they will be at a great disadvantage, as the client will have access to much more information.
If they agree to rely on the continuous control monitoring system as a meta-control, thus considering its examination sufficient, then they will be able to provide a better assurance, however at the expense of their independence. Current auditing standards emphasize the concept of auditor independence (Peterson 2016). However, if less independence can result in better assurance, then should the standards be modified in that direction? Future research needs to investigate this matter.
Another change AI can bring is the obsolescence of manual preprocessing and examinations of certain documents (e.g., contracts). Using various AI methodologies, such as text mining and DNN, these procedures will be replaced by automated AI analytics that are likely to produce more accurate and more efficient results. Another aspect that can be impacted by AI is the training provided to new auditors. Some of the skill sets that are developed today will become inefficient in the AI era. Training to run sampling techniques for example, will give way to learning various AI methodologies. Accounting curricula will need to be adapted to accommodate the new requirements of the future auditor (Peterson 2016). Accounting firms will hire more data scientists to train auditors to take advantage of AI more efficiently and effectively. AI experts used to be hired by academic institutes. Nowadays there is an increasing trend of tech firms and accounting firms hiring some of the best AI experts (The Economist 2016a).
How will AI affect the training necessary for auditors to take advantage of AI technologies?
Should the standards become more lenient in regard to auditor independence?
THE FORMALIZATION OF AUDIT THROUGH AUTOMATION
Auditing can be discussed, as in many other automation efforts, in two major components: first the repetitive automatable work and, second, judgments of many types. It may be argued that the wide variety of corporate measurement methods of lines of business or business processes, of cultural environments, and of international constraints and practices creates a multidimensional context where automation and systematization cannot be performed. It is better to take this consideration as an explanation of the nature of the historical evolution of auditing, with professional individuality and judgment being used to deal with the enormous variance of the context. In other words, auditing became a process where the lack of repetitive systematic reasonable sub-processes led to the reliance on judgments and the adoption of the mentality of a “profession” not a production process.
Automation through Audit Sub-Processes/Audit as a Production Line
The inclusion of technology has dramatically changed the nature of audit work; however, surprisingly little work has been published in the line of production engineering (Parker and Lewis 1995) to rationalize and systematize that work. Before devices that make decisions can be easily incorporated into the audit production line, substantive organization and formalization must be performed. Internally, large CPA firms have progressively developed methodologies in this direction, which they consider proprietary and part of their competitive advantage.
As obstacles to the wider rationalization and formalization of the assurance processes, many intrinsic issues must be considered:
Bill-by-hour: Although the rigid bill-by-hour approach has been progressively replaced by more comprehensive pricing strategies, the incentives for automation and reduction of the labor involved are significantly less than in other industries.
Rigidity of the standards: The standards were derived from processes in use when technology was very different, which has led to anachronistic manual processes. Since the Sarbanes-Oxley Act of 2002, they have been aggressively enforced through tight examination of anachronistic rules.
Formalization of audit steps: It is possible to conceive that certain parts of the audit could be standardized and automated through the adoption of audit data standards (AICPA 2015a, 2015b, 2015c, 2015d), standardized audit planning, formalization of assertions, adoption of quantitative schemata of weighting evidence, closer integration with corporate monitoring processes, etc.
With regard to the last point, it helps to revisit the AI-enabled auditing process from a production line perspective, analogous to production assembly lines. In such a process, the output of one phase becomes the input of the subsequent one. Below is a general audit process comprising seven phases (Louwers, Ramsay, Sinason, and Strawser 2015). AI can help automate this process and transform it into a highly efficient and effective audit production line. The proposed automated phases are presented below:
Pre-Planning Phase: This phase involves acquiring initial knowledge of the client and their industry. AI can collect, aggregate, and examine Big Data from various exogenous sources. Next, AI will incorporate client's organization structures and their operational methods, as well as their accounting and financial systems. AI then estimates the initial risk level associated with that client.
Contracting Phase: Using the output from the previous phase (i.e., initial risk level), AI estimates the number of hours the engagement would require and calculates audit fees. Subsequently, AI refers to a database of previously analyzed contracts, and automatically generates a client-specific engagement letter. Both auditor and client sign the AI-prepared contract.
Understanding Internal Controls and Identifying Risk Factors: This step is important for the planning of all aspects of the audit engagement. Using text mining and image recognition techniques, AI analyzes client-provided flowcharts, narratives, and filled questionnaires. Rather than physically taking a tour, or a walkthrough, drones can be used to capture video footage, which can be analyzed to identify any anomalies. At this stage, AI uses pattern recognition and visualization methods to identify risk factors. Finally, all this information is aggregated to identify fraud and illegal-act risk factors.
Control Risk Assessment: This phase involves the examination of the client's internal control system design and implementation. An AI-based continuous control monitoring system examines the complete population of records to identify any control violations and reports them. In the case of a high number of violations, a ranking system can be implemented to prioritize the identified violations based on their level of riskiness (Issa 2013; Issa and Kogan 2014; Issa, Kogan, and Brown-Liburd 2016). Moreover, AI runs process mining (Jans et al. 2014) on the complete population to ensure that the internal control system is not only properly designed, but also configured and implemented correctly. Logs are generated automatically to ensure the integrity of the data and to prevent the tampering with such audit evidence. This step can be repeated as frequently as desired, which allows for a minimal response time if any control violations are found.
Substantive Tests: The difference between traditional auditing and AI-enabled auditing is most pronounced in this phase. Data provenance and data quality are examined as they are collected, eventually in real time. Rather than running a periodical test of details on a sample of the transactions, AI can examine 100 percent of the population on a continuous basis. The same idea applies to the test of details of balances. This continuous and comprehensive test of details decreases the likelihood of an abnormal record passing undetected. Moreover, by running continuously, the time it takes to identify such an anomaly is significantly decreased. The concept of exception prioritization that was discussed in the previous phase can also be applied to the substantive tests to help address large numbers of exceptions (Issa 2013; Issa and Kogan 2014; Issa et al. 2016). The incorporation of pattern recognition, visualization, benchmarks, and outlier detection methods on top of analytical procedures can greatly increase audit effectiveness.
Evaluation of Evidence: This phase will be included in the previous phase, due to the importance of ensuring data quality prior to running the substantive tests.
Audit Report: The final step in the audit process is the issuance of a verdict based on the findings from the previous steps. In traditional audits, the auditor issues a categorical opinion (clean, qualified, adverse, etc.) to the client company.
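The full-population testing and exception prioritization described in the Control Risk Assessment and Substantive Tests phases can be illustrated with a minimal sketch. It scores every transaction with a modified z-score based on the median and the median absolute deviation, flags those above a cutoff, and ranks the flagged records from most to least extreme. This is a generic outlier detection technique chosen for illustration, not the ranking method of the cited studies; the 3.5 cutoff is a conventional choice and the transaction data are hypothetical.

```python
from statistics import median

def rank_exceptions(transactions, threshold=3.5):
    """Score every record in the population and return the identifiers
    of flagged records, riskiest (most extreme) first."""
    amounts = [t["amount"] for t in transactions]
    center = median(amounts)
    mad = median(abs(x - center) for x in amounts)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against; nothing can be ranked
    scored = []
    for t in transactions:
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * abs(t["amount"] - center) / mad
        if score > threshold:
            scored.append((score, t["id"]))
    return [tid for score, tid in sorted(scored, reverse=True)]

# Hypothetical population of ten payments; two are far from the rest.
amounts = [100, 102, 98, 101, 99, 103, 97, 100, 5000, 750]
transactions = [{"id": f"T{i:03d}", "amount": a}
                for i, a in enumerate(amounts, start=1)]
flagged = rank_exceptions(transactions)
print(flagged)
```

Because every record is scored, the procedure can be rerun as often as new transactions arrive, and the ranked list lets the auditor address the most extreme exceptions first.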
Table 3 below illustrates the comparison between an AI-enabled process and the traditional auditing process:
What are the phases of the audit that can be divided into a series of automatable production processes?
Would a different organization of the audit process be more appropriate to the AI-enabled audit (AIEA)?
Judgment as a Complement to Production Processes
Auditing is a profession whose activities are widely based on judgments, as opposed to well-defined and repetitive tasks, which are more prone to automation. The auditing profession has historically organized its structures hierarchically with many lower-level employees performing repetitive low-level tasks (ticking, extending) in verifying manual documents, and their hierarchical superiors examining (reviewing) these tasks and drawing conclusions, which are typically judgmental. With the advent of computers, in particular laptops that are taken to engagements, the manual nature of the work changed: ten-key calculating machines are no longer used for totals and extensions, and in general there is more reliance on the mathematical accuracy of documents.
The progressive usage of networking in computing brought in shared electronic workpapers and an avalanche of supporting attachment evidence. This leads to a different form of cooperation and joint work by auditors. Seniors, managers, and partners now can review workpapers remotely and cooperate while these are being developed.
What are the subcategories of audit judgments?
Which of these can be formalized?
Which of these can be supported by expert systems/neural networks/deep learning methodologies?
How does the evolution of technology and its adoption affect the audit process? Is there a substantive amount of TPR?
WORKFORCE REPLACEMENT OR SUPPLEMENTATION?
Concerns about the negative impact of automation on employment have been raised numerous times in the past. Societies are uncomfortable with radical changes that are the result of disruptive technologies. From the first industrial revolution and the steam engine, to Edison's invention of electric lighting, workers worried about their futures, fearing replacement by a machine. A significant proportion of those workers in fact lost their jobs, but that mass job loss was counterbalanced by an even bigger mass job creation (The Economist 2014). In the past, automation has always brought in the destruction of certain occupations, only to give rise to even more new and different jobs that did not exist in the past (The Economist 2016b). The same concerns are coming back to life now with fear of increased computerization of company systems, namely in AI and robotics and aided by the internet (The Economist 2014). Several companies like Google are developing auto-piloted cars with increasing success. Technology has the potential to either assist or replace the human user. The question is: Which one will it be—workforce replacement or workforce supplementation?
To illustrate, consider the example of navigational technology in vehicles. GPS is one type of navigation technology that can be viewed as an assistant to the human driver. The GPS will recommend a certain route, but it is the driver who makes the decision whether to follow or ignore the recommendation. In this scenario, the assistant (GPS) provides the driver with a recommendation for a route. On the other hand, navigation technology can be a replacement of the workforce altogether, which is the case of auto-piloting vehicles. Other examples of work force replacement as opposed to supplementation include Amazon's delivery drones, which are still in the testing phase. Walmart Stores Inc. recently announced the introduction of smart shopping carts in the near future. The initial role of such carts is to assist the customer in locating the desired products. However, AI can evolve such technology to also collect additional sensory data (Soper and Pettypiece 2016). The carts can potentially replace the employee who monitors inventory levels and checks for empty shelves, even picking up and delivering products within the stores. Similarly, self-checkout kiosks are spreading in large stores like Home Depot Product Authority, LLC and Target Corporation. This system has already replaced a significant number of cashiers in those stores.
A recent study examines the likelihood of computerization and automation of 702 occupations in the U.S. The study finds that over 45 percent of current occupations are susceptible to automation within the next 10 to 20 years (Frey and Osborne 2013). The likely replacement of accountants and auditors ranks very high, with a probability of 94 percent (Figure 2).
The accounting profession has already been impacted by computerization, with the ubiquity of tax preparation software. As a result, it is not surprising to see an increasing concern within the auditing profession about the impact of automation brought by AI on employment. Automation of a task usually increases productivity, as computers and machines are much faster and more efficient than humans. In addition to that, they are less expensive, driving overall cost down (Pacific Standard 2015). The main question that needs to be answered by future research is: Will automation of certain audit procedures through the application of AI create more jobs than it destroys? The factor that most determines the risk of automation is whether a task is routine or not, rather than whether it is manual or white collar. Work can be divided into four different types: non-routine manual, routine manual, routine cognitive, and non-routine cognitive (The Economist 2016b). Routine tasks are performed in a similar way day in day out, as opposed to non-routine tasks, which change from day to day. Cognitive tasks require the utilization of our brains, while manual tasks involve physical activity. In cases of routine tasks, AI can significantly facilitate their automation. This is the case of tax preparation and form filling, where automation has replaced humans.
On the other hand, it is very difficult at this point to automate cognitive non-routine tasks that would require fast adaptation. In such cases, AI can act as an assistant to the auditor by facilitating the performance of the task, while leaving the decision up to the auditor. The risk of computerization of the accounting and auditing professions, however, partially emanates from the possibility of breaking down complex tasks into smaller routine tasks, which can in turn be automated. Until recently, driving a car was considered to be a very complex task that could not be automated (Levy and Murnane 2005). However, several companies such as Google and Tesla Motors are currently developing auto-piloted cars with increasing success. Will the auditing profession encounter the same kind of automation? Or will they be assisted by AI? Auditors can supplement their judgment capacities with AI analytical power to reach a better-informed decision, and consequently provide a higher level of assurance. Initially, auditors will probably utilize AI in their analysis of Big Data. Intelligent software agents can continuously collect and aggregate data from various sources to provide the auditor with additional evidence and information that they can incorporate in their judgments. This is analogous to the GPS scenario. Eventually, AI will potentially replace auditors in various automated tasks, and become the self-piloted auditing robots.
Will automation cause workforce replacement or supplementation in the auditing field?
The substantive success achieved by deep learning in visual recognition, natural language processing, game playing, etc., has brought about the Spring resurgence of AI. AI has been thought of as a potentially valuable tool for Big Data analytics (Najafabadi et al. 2015; Chen and Lin 2014). The complexity and the repeatability of audit tasks, the multiple structures of source documents and data, and the requirement of professional judgments have made auditing lag behind business in the adoption of emerging technologies for a long time (Oldhouser 2016). AI has been brought to transportation, healthcare, security, home/service, and many other industries. Similarly, the Big 4 accounting firms are cooperating with AI systems providers to make these systems serve auditing purposes. This paper proposes areas where a broad range of AI-related research can be performed, and draws conclusions as to where this area of emerging technologies is most promising. It indicates that several progressively developing technologies, including scanning and OCR, electronic records, cloud, blockchain, smart contracts, large data stores, and plentiful computing, can serve as motivators of Technological Process Reframing (TPR) for auditing. It reviews the history of AI in auditing and analyzes the components of the conceptualization of AI for auditing: sensing, archiving, and predictive technologies, meta-controls/meta-processes, exogenous measurement, rapid detection of phenomena, integration of evidence, and the data for deep learning in auditing. The potential opportunities for the applications of deep learning, in the domains of image recognition, textual analysis, and speech processing, to auditing are explained.
The paper also discusses areas that will become obsolete because of the implementation of AI. Audit automation can be achieved with the support of AI through audit sub-processes/audit as a production line, with the judgment work as a complement to production processes. Finally, the paper raises the questions “will the auditing profession face the same type of automation? Or will it be assisted by AI?” and concludes that AI will potentially replace auditors in various automated tasks, and is capable of automatically designing the entire audit plan based on the situation of the client and the existing evidence, self-correcting mistakes, and continuously improving the audit process. Along with its discussion, the paper raises a series of methodological and evolutionary questions as to how today's world of audit will be transformed into the assurance of the future. Table 4 summarizes the research questions that were raised in this paper.
A limitation of this discussion paper is that it mainly centers on the feasibility side of the application of AI to auditing, with little discussion of the necessity side, which is of equal importance. Future research could shed light on why it is necessary for auditing to implement AI and what factors lead auditing to evolve. In addition, the paper does not explore the relationship between deep learning and traditional data mining techniques, such as logistic regression, decision trees, Support Vector Machines, and so on. While it is well known that deep learning techniques outperform the traditional data mining techniques for Big Data analysis, we do not know whether deep learning is more effective and less costly for structured data analysis.
After decades of frustration with its “Winters,” AI's Spring has finally come, providing massive and compelling benefits to various business industries. Auditing will also evolve with the application of AI. AI is not a technology for the future, but a reality (LTP 2016). It would be wise for the auditing profession to actively embrace the opportunities AI brings and rethink how the audit will benefit from progressively increasing execution by machines. Moreover, the profession should examine how the role of the auditor will change with process automation, and harness its potential to make auditing more effective, smarter, and easier.
—Miklos A. Vasarhelyi
Rutgers, The State University of New Jersey, Newark
As shown in Figure 1, the curve fitted by the weighted least squares estimator fits the data points much better at large sample sizes (100 and 200). The classification accuracy increased rapidly from training size 5 to 50, while the accuracy did not increase significantly from training size 100 to 200. After that, the learning curve reached a steady state and did not change much in accuracy regardless of training size. The learning curve predicted 98 percent classification accuracy for the training data size of 1,000 per body class, with the observed actual accuracy at 97.25 percent.
The following deep learning techniques are provided by IBM Watson and Google.
The authors are grateful for the suggestions of Andrea Rosario, Helen Brown-Liburd, Alexander Kogan, and Deniz Appelbaum, and for the research assistance of Jamie Freiman.