Changes in the audit profession after Sarbanes-Oxley, including mandatory audits of internal control over financial reporting and PCAOB oversight and inspection of audit work, have potentially changed the nature and extent of audit sampling in the largest accounting firms. In our study, “Behind the Numbers: Insights into Large Audit Firm Sampling Policies” (Christensen, Elder, and Glover 2015), we administered an extensive, open-ended survey to the national offices of the Big 4 and two other international accounting firms regarding their firm's audit sampling policies. We find variation among the largest firms' policies in their use of different sampling methods and in inputs used in the sampling applications that could result in different sample sizes. We also provide evidence of some of the sampling topics firms find most problematic, as well as changes to firms' policies regarding revenue testing due to PCAOB inspections. Our evidence provides important insights into current sampling policies, which may be helpful to audit firms in evaluating their sampling inputs and overall sampling approaches.
Over the last two decades there have been significant changes in audit approaches, including federally mandated audits of internal control over financial reporting for large public companies as a result of the Sarbanes-Oxley Act of 2002 (SOX). These changes could alter the nature and extent of audit sampling. Our recently published study, “Behind the Numbers: Insights into Large Audit Firm Sampling Policies” (Christensen, Elder, and Glover 2015), seeks to provide insights into the current state of audit sampling. To do so, we posed open-ended questions to the national sampling experts at the Big 4 and two other international accounting firms about the sampling policies and practices currently in place at each firm. In this summary, we focus on important differences among the firms. For a more detailed discussion, see Christensen et al. (2015).
Our analysis of the firms' sampling approaches highlights important similarities and differences among the firms' policies. For both tests of controls and tests of details, the firms are divided between statistical and nonstatistical sampling. This variation differs from earlier periods, when almost all firms followed either statistical approaches (Akresh 1980) or nonstatistical approaches (Sullivan 1992). We also report differences in the sampling inputs used by firms, which can result in different sample sizes regardless of whether the firm follows a statistical or nonstatistical sampling approach.1 Depending on the level of assurance obtained from other audit procedures, differences in sample sizes raise the possibility that different levels of assurance are obtained to support audit opinions. Interestingly, most firms use identical sampling approaches and parameters for public and private clients despite the differences in business and engagement risk. We also report differences in the error projection methods used and in how firms respond to identified errors and misstatements. Finally, we show that some firms now rely more heavily on substantive testing using sampling when testing revenue (i.e., testing a sample of individual revenue transactions) than on other substantive procedures, such as analytical procedures.
Our study provides evidence on current sampling practices and identifies important differences in sampling policies among the largest audit firms. These findings provide insights into sampling policies and procedures that are important to better understand the application of audit sampling in the current audit environment. This evidence may also be helpful to audit firms in evaluating their sampling inputs and overall sampling approaches.
TESTS OF CONTROLS
Sampling in Tests of Controls: Application and Parameters
While sampling is not required to test many types of controls, firms replied that sampling is frequently used for tests that involve inspection or re-performance of manual controls, but is less frequently used to test controls that operate at the entity level or that are automated. When deciding to use sampling in tests of controls, auditors choose between statistical and nonstatistical sampling approaches. According to auditing standards, auditors selecting a nonstatistical approach should arrive at a sample size that is “comparable to the sample size resulting from an efficient and effectively designed statistical sample, considering the same sampling parameters” (AICPA 2011, §530.A14; PCAOB 2003, §350.23). While either method is acceptable under auditing standards, statistical sampling requires a statistically acceptable selection method (i.e., random selection, but not haphazard selection) and allows the auditor to quantify sampling risk in evaluating the results of testing. The six participating firms are evenly divided between the two approaches. Based on survey responses, firm guidelines appear either to explicitly require statistical methods or, when nonstatistical methods are permitted, to include guidance based on statistical theory so that these methods arrive at a sample size and conclusion similar to what a statistical method would have produced.2 Our survey did not address why a firm chose a statistical or nonstatistical approach.
Once the firm decides on the general approach (i.e., statistical versus nonstatistical), the sample size is calculated based on a set of inputs: desired confidence level, expected deviation rate, and tolerable deviation rate. Table 1, which is reproduced from our original study (Christensen et al. 2015), reports the typical values used by each firm for these key inputs, as reported by the respondents.
The range of 90–95 percent confidence is consistent with audit firms providing a high level of assurance (Christensen, Glover, and Wood 2012; AICPA 2012, §3.42), which AS 5 (PCAOB 2007) requires for integrated audits. Levels of confidence below 90 percent, such as those reported by Respondent 1, could be used for non-integrated audits. Responses consistently indicated that engagement teams typically plan for zero deviations when calculating sample size for control tests. Regarding tolerable deviation rates, two respondents indicated 10 percent as a standard tolerable deviation rate, whereas the remaining respondents provided ranges, including 6 to 9.5 percent, 6 to 10 percent, and 5 to 10 percent.
Based on the inputs reflected in Table 1, sample sizes range from 22 (0 expected deviations, 10 percent tolerable deviation rate, 90 percent confidence) to 59 (0 expected deviations, 5 percent tolerable deviation rate, 95 percent confidence).3 While comparisons of sample sizes between firms are incomplete without the fuller context of the other audit procedures performed, differences in the sample-size inputs reported by the firms could result in substantially different sample sizes.
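The endpoints of this range can be reproduced with a simple binomial attribute model. The sketch below is an illustration under stated assumptions, not any firm's methodology: it assumes a binomial distribution, treats the planned number of deviations as a whole number, and uses a hypothetical helper name, `attribute_sample_size`.

```python
import math


def attribute_sample_size(confidence: float, tolerable_rate: float,
                          planned_deviations: int = 0) -> int:
    """Smallest sample size n for which finding no more than
    `planned_deviations` deviations still supports, at the stated confidence,
    the conclusion that the population deviation rate is below the tolerable
    rate. Hypothetical helper; assumes a binomial model."""
    alpha = 1.0 - confidence  # risk of overreliance on the control
    n = planned_deviations + 1
    while True:
        # Probability of observing <= planned_deviations deviations if the
        # true deviation rate were exactly the tolerable rate.
        prob = sum(
            math.comb(n, j) * tolerable_rate**j * (1 - tolerable_rate) ** (n - j)
            for j in range(planned_deviations + 1)
        )
        if prob <= alpha:
            return n
        n += 1


# Zero planned deviations, as the respondents reported.
for confidence, tolerable in [(0.90, 0.10), (0.95, 0.10), (0.90, 0.05), (0.95, 0.05)]:
    print(f"{confidence:.0%} confidence, {tolerable:.0%} tolerable: "
          f"n = {attribute_sample_size(confidence, tolerable)}")
# n = 22, 29, 45, and 59, respectively
```

With zero planned deviations the condition reduces to (1 − T)^n ≤ 1 − CL, so n is the smallest integer at or above ln(1 − CL) / ln(1 − T). Planning for one or more deviations raises the required sample size.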
Sample Selection Process
After determining sample size, the engagement team selects the items from the population to test. A variety of sample selection methods exist, including random, haphazard, stratified, and systematic selection. Three respondents stated that random or systematic selection methods are preferred and encouraged, but haphazard selection is allowed. Of the five firms that permit haphazard selection, only one noted that such samples are penalized with larger sample sizes. It is important to note that haphazard selection is permitted by auditing standards (AICPA 2011, §530.A17; PCAOB 2003, §350.24) and the Audit Guide, Audit Sampling (AICPA 2012). However, with programs such as Microsoft Excel, selecting a random sample is straightforward, and there is some evidence that auditors may struggle to select unbiased samples using non-random methods (e.g., Hall, Higson, Pierce, Price, and Skousen 2012).
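As a rough analogue of the spreadsheet approach, the sketch below draws a simple random sample and a systematic sample (fixed interval after a random start) from a population of hypothetical control-instance numbers. Python's standard `random` module stands in for Excel here; the function names and the population are illustrative assumptions.

```python
import random


def random_selection(population, n, seed=None):
    """Simple random selection without replacement (every item equally likely)."""
    rng = random.Random(seed)
    return sorted(rng.sample(population, n))


def systematic_selection(population, n, seed=None):
    """Systematic selection: a random start within the first interval, then
    every `interval`-th item. Assumes the population is not ordered in a
    pattern that lines up with the interval."""
    if not 0 < n <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    interval = len(population) / n
    start = random.Random(seed).random() * interval
    # min() guards against floating-point rounding at the end of the list.
    return [population[min(int(start + i * interval), len(population) - 1)]
            for i in range(n)]


# Hypothetical population: 2,000 numbered instances of a manual control.
population = list(range(1, 2001))
print(random_selection(population, 25, seed=2015))
print(systematic_selection(population, 25, seed=2015))
```

Fixing the seed makes the selection reproducible for documentation, which is one practical advantage of tool-based selection over haphazard selection.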
Evaluation of Results and Resolution of Deviations
When sample results indicate control deviations, engagement teams are faced with three options: (1) expand testing of the control, (2) test compensating or redundant controls, or (3) conclude that the control is ineffective, evaluate the severity of the control failure, and revise the nature, timing, and/or extent of planned substantive testing accordingly.
Two respondents indicated that if expanding testing of the control is deemed appropriate, the sample size can be doubled; if no additional deviations are found in the larger sample, the auditor can conclude that the control is operating effectively. However, another respondent indicated that it is more common to modify planned substantive tests and noted that “we typically do not expand our sample because it is likely that we will continue to discover deviations in the expanded sample.” When the control in question has failed, several respondents noted the importance of identifying compensating controls. As one respondent put it, “[I]f these controls cannot be found or are found to not be effective, substantive testing will be expanded.” These responses suggest that firms differ in how they prefer to respond to deviations identified during controls testing.
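One way to see why doubling the sample can support reliance is to compute the binomial upper limit on the deviation rate. The sketch below assumes hypothetical parameters (90 percent confidence, a 10 percent tolerable rate, an initial sample of 22, and one deviation found). It illustrates the statistical logic only and is not presented as any respondent firm's evaluation procedure; `upper_deviation_limit` is a hypothetical helper.

```python
import math


def upper_deviation_limit(n: int, deviations: int, confidence: float) -> float:
    """Binomial upper confidence limit on the population deviation rate,
    found by bisection. Hypothetical helper for illustration."""
    alpha = 1.0 - confidence

    def prob_at_most(p: float) -> float:
        # P(X <= deviations) for X ~ Binomial(n, p); decreasing in p.
        return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j)
                   for j in range(deviations + 1))

    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if prob_at_most(mid) > alpha:
            lo = mid  # this rate is still plausible, so the limit lies higher
        else:
            hi = mid
    return hi


tolerable = 0.10
for n, deviations in [(22, 0), (22, 1), (44, 1)]:
    limit = upper_deviation_limit(n, deviations, 0.90)
    verdict = "supports reliance" if limit < tolerable else "does not support reliance"
    print(f"n = {n}, deviations = {deviations}: upper limit {limit:.1%} ({verdict})")
```

Under these assumptions, one deviation in the original 22 items pushes the upper limit well above 10 percent; if the doubled sample of 44 turns up no further deviations, the limit falls back below the tolerable rate.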
SUBSTANTIVE TESTS OF DETAILS
Sampling in Substantive Testing: Application and Parameters
While AS 5 has dramatically altered auditors' use of sampling in tests of controls, other changes, such as PCAOB inspections, also have the potential to affect the application of sampling in substantive testing. Our study reports that sampling is commonly used for balances and procedures that cannot be efficiently addressed through specific identification testing, such as accounts receivable confirmations, inventory price testing, loan and deposit confirmations, and inventory test counts. Regarding the choice between statistical and nonstatistical sampling, four of the six firms emphasized statistical sampling methods, with monetary unit sampling (MUS) being the dominant method used in practice.
As summarized in Table 2, which is taken from our original study, most respondents focused on three key inputs to determine sample size: required confidence level, tolerable misstatement, and expected misstatement.4 The required confidence levels varied both within and between firms, although the high end of the confidence range is consistently at or near 95 percent. The desired level of assurance from sampling is affected by the assessed account risk as well as the assurance provided by other tests. For example, Respondent 1 indicated that a confidence level of 30 percent would be deemed appropriate “when analytical procedures are effective and inherent and control risk are assessed as being low,” but 95 percent is appropriate when “the assertion subject to testing includes significant risks, control risk is high, and analytical procedures are ineffective.”
As indicated in Table 2, the firms differed in the extent to which they planned for misstatements in tests of details sampling, which can substantially affect the calculated sample size. In addition, all respondents indicated that tolerable misstatement is set equal to or less than performance materiality. As with tests of controls, statistical and nonstatistical approaches are designed to yield similar sample sizes. However, differences in planning inputs such as those reported in Table 2 can result in significant differences in sample sizes, regardless of the sampling approach followed.5
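The effect of the confidence level and of planned misstatement on an MUS sample size can be sketched with a widely used formula, n = BV × R / (TM − EM × EF), where BV is the population book value, R a reliability factor, TM tolerable misstatement, EM expected misstatement, and EF an expansion factor. This is an assumption-laden illustration: R is taken from the Poisson distribution with zero errors, EF = 1.6 is an illustrative value rather than one reported by the firms, and the dollar amounts are hypothetical.

```python
import math


def mus_sample_size(book_value: float, tolerable: float, confidence: float,
                    expected: float = 0.0, expansion_factor: float = 1.0) -> int:
    """MUS sample size n = BV * R / (TM - EM * EF).
    R is the Poisson reliability factor for zero errors, -ln(1 - confidence)."""
    reliability = -math.log(1.0 - confidence)
    allowance = tolerable - expected * expansion_factor
    if allowance <= 0:
        raise ValueError("expected misstatement leaves no room for sampling risk")
    return math.ceil(book_value * reliability / allowance)


# Hypothetical account: $5,000,000 book value, $250,000 tolerable misstatement.
book_value, tolerable = 5_000_000, 250_000
print(mus_sample_size(book_value, tolerable, 0.30))  # low-risk setting: 8
print(mus_sample_size(book_value, tolerable, 0.95))  # significant-risk setting: 60
print(mus_sample_size(book_value, tolerable, 0.95,
                      expected=50_000, expansion_factor=1.6))  # 89
```

Moving from 30 to 95 percent confidence raises the sample from 8 to 60 items, and planning for $50,000 of expected misstatement raises it further to 89. This mirrors the sensitivity to planning inputs described above.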
Sample Selection Process
Sample items for tests of details can be selected by one of several methods including specific identification, stratification, random selection, haphazard selection, or systematic selection. Unique to tests of details, all respondents indicated that firm guidance either explicitly requires or encourages that all items greater than tolerable misstatement are selected for specific identification testing. This approach is consistent with guidance in the 2012 Audit Guide, Audit Sampling because these items can present high risk and are therefore tested separately from the items selected by applying sampling (AICPA 2012, paras. 4.11 and 4.18).
Regarding the selection of items that are not separately tested, three respondents indicated that systematic or random selection is used when the sample size is calculated using statistical methods, and haphazard selection (with some penalty) is used when nonstatistical methods are used.6 On the other hand, three other respondents indicated that various methods are allowed, but that no penalties are levied for the use of haphazard selection. Therefore, while haphazard selection is used across all participating firms, some firms impose a larger sample size for haphazard selection of nonstatistical samples and other firms do not.
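The two-step selection described here (specifically identify items above tolerable misstatement, then sample the remainder) can be sketched as follows. The sampling step uses systematic monetary-unit selection with a random start, one common way to implement MUS. The ledger, the threshold, and the function name are hypothetical, and the sketch assumes all balances are positive.

```python
import random


def split_and_select(balances, tolerable, n, seed=None):
    """Return (key_items, sampled_ids).

    key_items: items whose balance exceeds tolerable misstatement, tested 100%.
    sampled_ids: items hit by a systematic monetary-unit selection over the
    remaining population (random start within the first sampling interval).
    Assumes every balance is positive.
    """
    key_items = {k: v for k, v in balances.items() if v > tolerable}
    remainder = [(k, v) for k, v in balances.items() if v <= tolerable]
    total = sum(v for _, v in remainder)
    interval = total / n
    start = random.Random(seed).random() * interval
    targets = [start + j * interval for j in range(n)]  # selected dollar units

    sampled, cumulative, t = [], 0.0, 0
    for item_id, value in remainder:
        cumulative += value
        # An item is selected when a target dollar falls within its range;
        # an item larger than the interval can be hit more than once.
        while t < n and targets[t] < cumulative:
            if not sampled or sampled[-1] != item_id:
                sampled.append(item_id)
            t += 1
    return key_items, sampled


# Hypothetical receivables ledger with two balances above a $100,000 threshold.
ledger = {f"C{i:03d}": 1_000 + 37 * i for i in range(200)}
ledger.update({"BIG1": 300_000, "BIG2": 450_000})
key, sample = split_and_select(ledger, tolerable=100_000, n=30, seed=7)
print(sorted(key))  # ['BIG1', 'BIG2']
print(len(sample), sample[:5])
```

Because selection probability is proportional to dollar value, larger remaining balances are more likely to be sampled, while balances above the threshold never depend on chance.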
Evaluation of Results and Resolution of Misstatements
We asked respondents whether firm policy requires a projection of identified misstatements to the population and, if so, what projection method is typically used. All respondents indicated that projection of errors is generally required by firm policy. The two methods most commonly referenced were ratio projection, which applies the misstatement ratio observed in the sample (misstatement divided by sample book value) to the entire population, and difference projection, which projects the average misstatement per sample item to all items in the population. One respondent indicated that both methods are used for each misstatement, and the larger of the two projected amounts is used. Another respondent indicated that the ratio method is preferred per firm guidance, but difference projection may be used if the misstatements relate more to the occurrence of transactions than to their volume or dollar value.
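The two projection methods, and one respondent's practice of using the larger result, can be sketched as follows. All figures are hypothetical: a 2,000-item population with a $4,000,000 book value, and a 50-item sample with a $150,000 book value containing $3,000 of misstatement. The function names are illustrative.

```python
def ratio_projection(sample_misstatement, sample_book_value, population_book_value):
    """Apply the sample's misstatement-to-book-value ratio to the population."""
    return sample_misstatement * population_book_value / sample_book_value


def difference_projection(sample_misstatement, sample_size, population_size):
    """Apply the average misstatement per sample item to every population item."""
    return sample_misstatement * population_size / sample_size


ratio = ratio_projection(3_000, 150_000, 4_000_000)   # 80,000
difference = difference_projection(3_000, 50, 2_000)  # 120,000
print(f"ratio: {ratio:,.0f}  difference: {difference:,.0f}  "
      f"larger: {max(ratio, difference):,.0f}")
```

The methods diverge here because the sampled items are larger on average ($3,000) than the population items ($2,000), so the per-item difference method projects more.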
While firm policy generally requires error projection, we also asked respondents how frequently they believe that misstatements are treated as anomalies and thus are not projected to the full population. One respondent indicated that firm policy explicitly prohibits this treatment, whereas another stated that isolation of errors occurs less than half of the time sampling is applied in substantive testing and that when it does occur, no consultation outside the engagement team is necessary. A third respondent identified a policy somewhere in between the first two. Taken together, responses indicate a fairly wide range of policies regarding error projection and isolation of misstatements.
Further discussion with respondents indicated that, consistent with prior research (e.g., Burgstahler and Jiambalvo 1986; Elder and Allen 1998) and PCAOB inspection reports (PCAOB 2008), engagement teams have difficulty understanding how to treat misstatements identified during testing when sampling is used. For example, one respondent said, “[T]eams sometimes fail to project an error because the sample error is relatively small, and they fail to recognize that a projected error coupled with sampling risk might result in a material misstatement.” Similarly, another respondent stated that “most auditors cannot manually recalculate the projection and do not understand which errors cause the large projection of an error.” Respondents' comments suggest that additional training in the logic underlying sampling and/or sampling templates (see Durney, Elder, and Glover 2014) may help improve auditors' ability to correctly project errors.
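The respondent's point about sampling risk can be illustrated with a Stringer-style upper misstatement bound for MUS. This bound adds an allowance for sampling risk (basic precision plus an incremental allowance) to the projected misstatement. It is a commonly taught evaluation method, not necessarily what any respondent firm uses. The sketch assumes overstatement errors only, misstated items smaller than the sampling interval, and Poisson-based factors. The amounts are hypothetical: a $5,000,000 book value, $250,000 tolerable misstatement, and 60 items (an interval of about $83,333), with one item overstated by 5 percent of its book value. The helper names are illustrative.

```python
import math


def poisson_upper_limit(k: int, confidence: float) -> float:
    """Upper confidence limit on the Poisson mean given k observed errors."""
    alpha = 1.0 - confidence

    def cdf(lam: float) -> float:
        return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

    lo, hi = 0.0, 1.0
    while cdf(hi) > alpha:  # widen the bracket until it contains the limit
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if cdf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def mus_upper_misstatement(taints, sampling_interval, confidence):
    """Stringer-style bound split into basic precision, projected misstatement,
    and incremental allowance (overstatements only; taints = error / book value)."""
    ranked = sorted(taints, reverse=True)  # largest taint gets the largest factor
    limits = [poisson_upper_limit(k, confidence) for k in range(len(ranked) + 1)]
    basic_precision = limits[0] * sampling_interval
    projected = sum(t * sampling_interval for t in ranked)
    incremental = sum((limits[i + 1] - limits[i] - 1.0) * t * sampling_interval
                      for i, t in enumerate(ranked))
    return basic_precision, projected, incremental


interval = 5_000_000 / 60
bp, pm, ia = mus_upper_misstatement([0.05], interval, 0.95)
print(f"projected {pm:,.0f}; upper bound {bp + pm + ia:,.0f} vs tolerable 250,000")
```

The projected misstatement is only about $4,200, yet the upper bound exceeds tolerable misstatement. This happens because the sample was planned with no expected misstatement, so basic precision alone is roughly $249,600.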
PCAOB versus AICPA Guidance
We asked respondents whether their firm has different sampling policies for audits performed under PCAOB auditing standards and those performed under AICPA auditing standards.7 Whereas two of the six respondents stated that different control testing policies exist for integrated and non-integrated audits, none of the firms indicated differences in the overall sampling approaches when performing tests of details. This similarity in sampling approaches across entities subject to very different regulatory regimes is somewhat surprising given that higher assurance levels may be required for public companies as auditors seek to reduce litigation and regulatory risk through additional audit effort (Badertscher, Jorgensen, Katz, and Kinney 2014; DeFond and Zhang 2014).
In recent years, the PCAOB has increasingly focused on revenue testing in the inspection and standard-setting process (Hanson 2013; Rand 2012). We asked respondents about their use of audit sampling in testing revenue and if the sampling policy for revenue is the same as for other accounts. One respondent stated that while substantive analytical procedures are permitted when testing revenue, auditors on PCAOB engagements are “required to also perform tests of details and the minimum sample size is 25.” Another firm “now strongly encourages test of details of the revenue account.” Two respondents stated that the use of sampling when testing revenue accounts is not uncommon, but that their firms do not have specific sampling policies for revenue. Finally, one firm's expert said, “[W]e do not typically use sampling to provide substantive evidence for income statement related accounts.” While in the past many firms may have relied in part on substantive analytical procedures to obtain assurance over revenue, based on these responses it appears that most participating firms now also use sampling in the testing of revenue (see Glover, Prawitt, and Drake  for a recent commentary on the regulatory impact on auditing revenue).
CONCLUSION AND LIMITATIONS
The standard PCAOB audit report's reference to examining evidence on a test basis speaks to the importance of sampling in an audit of financial statements and internal control over financial reporting. Given the regulatory changes brought about by the Sarbanes-Oxley Act of 2002 and the creation of the PCAOB, our study posed open-ended questions about firm-specific sampling policies and practices to the leading sampling expert at each of the Big 4 and two other large international firms. While we do not discuss all results in detail in this summary, Table 3 provides a comprehensive review of similarities and differences among the firms' approaches, along with their implications for practice.
We find that sampling methods differ significantly among the largest auditing firms: some emphasize statistical methods, while others use nonstatistical methods. Somewhat surprisingly, we find that each firm generally applies the same sampling method and sampling parameters to audits of both its private and public clients. Further, firms frequently use different inputs to these sampling models, potentially resulting in substantially different sample sizes. This variation in sampling approaches and inputs appears to differ from previous periods, and the variety of approaches is notable given the highly regulated auditing environment and PCAOB criticism of sampling in areas such as revenue (PCAOB 2014).
Nonstatistical methods are allowed under AICPA and PCAOB auditing standards. Although firms that use nonstatistical sampling were clear that their methodology was designed to result in sample sizes and sample evaluations that are similar to those determined using statistical sampling, additional guidance may be needed to ensure that conclusions reached using nonstatistical methods are similar to those reached using statistical methods. Due to the identified differences in sample size inputs, firms should also evaluate whether sample sizes are sufficient to achieve the level of assurance desired by the test. Finally, firms also often select samples haphazardly, and auditors may need additional guidance to increase the likelihood that representative samples are selected.
Additionally, we find differences among firms regarding the response to identified errors and misstatements. Sampling experts inform us that responding to and resolving identified misstatements is one of the biggest hurdles that audit engagement teams at all firms face when using sampling techniques, and auditors have struggled to resolve errors effectively in the past (PCAOB 2008). Additional training and use of templates may help auditors project errors and evaluate sampling risk. In particular, firms appear to differ in the extent to which they allow identified errors to be treated as anomalies. While ISA 530 (IFAC 2009) notes that some misstatements may be anomalies, AU-C 530 paragraph .13 indicates that “the auditor should project the results of audit sampling to the population” (AICPA 2011). The AICPA Audit Guide, Audit Sampling (AICPA 2012, paras. 4.101–4.104) provides guidance on when it may be appropriate not to project an error and on the documentation necessary to support this decision. We recommend that AICPA and PCAOB auditing standards specifically address the treatment and documentation of anomalies. Finally, we present evidence that some firms have significantly changed their approach to revenue testing due to PCAOB inspections, relying more heavily on testing individual transactions selected by sampling than on other substantive procedures, such as analytical procedures.
Given the limited evidence on firms' sampling policies after the Sarbanes-Oxley Act, our study provides insights into sampling policies and procedures that are important for practitioners, researchers, educators, and regulators to better understand the application of audit sampling in the current audit environment.
Audit sampling is “[T]he selection and evaluation of less than 100 percent of the population of audit relevance such that the auditor expects the items selected (the sample) to be representative of the population and, thus, likely to provide a reasonable basis for conclusions about the population. In this context, representative means that evaluation of the sample will result in conclusions that, subject to the limitations of sampling risk, are similar to those that would be drawn if the same procedures were applied to the entire population” (AICPA 2011, §530.05; emphasis in the original). A full sampling application includes the following three stages: (1) the determination of sample size, (2) sample item selection, and (3) evaluation of results. A sampling approach is deemed nonstatistical if any one of the three stages is not consistent with statistical theory. For example, haphazard selection or judgmental evaluation of results would render a sampling application nonstatistical.
Regardless of whether statistical or nonstatistical sampling is used, if the attribute sample size is appropriate given the statistical sample size planning parameters and the selection technique is statistically based (e.g., random selection), the results of a sample will be acceptable (i.e., provide the desired level of confidence and precision for sampling risk) whenever the observed sample deviation rate does not exceed the expected deviation rate used in planning the sample. Conversely, a larger-than-expected sample deviation rate indicates that the sample results did not achieve the desired objective. This relationship between observed and expected error does not always hold when testing monetary values.
Other factors were also mentioned, including extent of evidence from other procedures, risk of material misstatement, and audit posting threshold.
Respondents indicated that typical sample sizes ranged from 1 to 200 items, with most falling between 10 and 100 items. One respondent indicated a predetermined maximum limit, and only then in “limited low risk circumstances in testing revenue.” Most respondents indicated their firm has established nonstatistical minimum sample sizes (e.g., a minimum of 5 or 10) to be used for small populations.
PCAOB and AICPA auditing standards are similar in their requirements. However, audits performed under PCAOB auditing standards are subject to PCAOB inspections, while audits performed under AICPA auditing standards are subject to AICPA peer review requirements. PCAOB audits include integrated audits of the financial statements and internal control over financial reporting for accelerated filers, and financial statement audits for other issuers. Audits performed under AICPA auditing standards are mostly financial statement audits, although audits of financial institutions with assets above $1 billion ($500 million before 2005) also include an audit of internal control under the FDIC Improvement Act of 1991. Audits of governmental entities and nonprofits whose federal grant expenditures exceed reporting thresholds (currently $750,000) are also required to have a single audit that includes testing of internal controls and federal grant compliance, in addition to the audit of the financial statements.
We thank the sampling experts from the six participating audit firms for their time and participation in this study. Brant E. Christensen acknowledges funding from the Deloitte Foundation and Steven M. Glover acknowledges funding from the K. Fred Skousen Endowed Professorship.