During functional analysis (FA), therapists arrange contingencies between potential reinforcers and problem behavior. It is unclear whether this fact, in and of itself, facilitates problem behavior's acquisition of new (false-positive) functions. If problem behavior can come under the control of contingencies contrived between it and known reinforcers for which there is no direct history, then outcomes of reinforcer analysis (RA) should perfectly predict FA outcomes. This study evaluated the degree to which RA outcomes corresponded with FA outcomes for eight participants referred to a university-based outpatient clinic for problem behavior. For 75% (6 of 8) of participants, correspondence was imperfect. These findings appear to support the construct validity of contemporary interpretations of FA data.
Current publication practices (Beavers et al., 2013; Hanley et al., 2003), conventional wisdom (Hanley, 2012; Iwata & Dozier, 2008; Mace, 1994), and a growing research base (e.g., Hurl et al., 2016) suggest optimal intervention outcomes for problem behavior are most likely when pre-intervention assessments include functional analysis (FA; Iwata et al., 1994). During FAs, experimenters systematically establish and abolish the reinforcing value of programmed consequences (often isolated by stimulus class [e.g., attention, tangibles] or process [e.g., escape]) to probe the evocative and reinforcing effects these manipulations have on problem behavior. Conditions that consistently produce elevated rates of responding, relative to no-consequence control conditions, are purported to shed light on problem behavior's reinforcement history and are used to implicate its function(s).
Despite majority consensus about the superiority of FA to alternative approaches to function identification, few practitioners actually employ this technology in their day-to-day practice (Oliver et al., 2015; Roscoe et al., 2015). One reason for this may be that some stakeholders raise concerns about various aspects of FA methodology (e.g., Hastings & Noone, 2005; Martin et al., 1999; Solnick & Ardoin, 2010; Sturmey, 1995).
Potential Concerns With FA Methodology and Interpretation
During FAs, practitioners deliberately evoke and reinforce problem behavior. Because reinforcement, by definition, increases the future probability of behavior (Catania, 2013; Cooper et al., 2007), a valid concern may be that FAs are counter-therapeutic and might intensify the nature of existing problem behavior in naturalistic settings (by increasing its rate or magnitude), or might shape or establish new functions for problem behavior (by delivering reinforcers not commonly provided in naturalistic settings). To evaluate the former, Shabani et al. (2013) collected data on problem behavior in each of four participants' classrooms during the 10 min immediately prior to, and following, FA sessions and compared pre/post rates. Although results were inconsistent across participants (with evidence suggestive of generalization in some cases and contrast in others), differences were generally negligible. Call et al. (2012) and Call et al. (2017) conducted similar experiments and produced similar results.
These studies shed light on the ways FAs may, or may not, influence baseline rates of problem behavior in naturalistic settings. However, they do not evaluate whether FAs are instructional by nature. That is, results from studies like Call et al. (2012), Call et al. (2017), and Shabani et al. (2013) cannot prove or disprove that functions identified through FAs are the product of artificially contrived contingencies arranged during the assessment process. In fact, the results of other research (e.g., Galiatsatos & Graff, 2003; Rooker et al., 2011; Shirley et al., 1999) lead us to question the validity of functions identified in certain conditions of the analysis.
For example, Galiatsatos and Graff (2003) and Shirley et al. (1999) found tangibles could maintain problem behavior during FAs, even though tangibles were never delivered for problem behavior during an extended series of descriptive assessments conducted in participants' typical settings. Similarly, Shirley et al. and Rooker et al. (2011) both found highly preferred tangible deliveries not observed in typical settings could increase rates of automatically-maintained problem behavior, relative to conditions in which no tangibles (or tangibles that did follow problem behavior in typical settings) were delivered. Finally, Rooker et al. demonstrated contingent tangibles were more effective at establishing and maintaining high rates of an arbitrary task than either contingent attention or contingent escape, suggesting tangibles were more likely to establish new functions for a target response than other reinforcers commonly programmed into FA conditions. The conglomerate of findings across these studies led Rooker et al. to conclude that tangible conditions should be omitted from most FAs of problem behavior and, when included, results from this condition should be interpreted with caution.
Importantly, across these studies, highly preferred tangibles were identified via preference assessments. By contrast, the specific qualities of attention and escape were never systematically evaluated. Although this aligns with how consequences are often selected for FAs (cf. Slocum et al., 2018), it represents a procedural inconsistency that allows for alternative interpretations of data published on this topic. For example, it may be that the contingent delivery of any highly preferred consequence, irrespective of stimulus class (e.g., tangible or attention) or process (e.g., positive or negative), would have been equally likely to establish functional relations that did not exist prior to the analysis. If functions identified via FAs are artifacts of learning that has occurred during the analysis (i.e., if FAs generate “false positives,” in the sense that functional relations did not exist prior to the analysis), it would be hard to refute claims that FA procedures are flawed and outcomes invalid (Martin et al., 1999). By extension, similar criticisms could be levied against the validity of intervention outcomes in studies that used FA conditions as baseline for treatment evaluations (e.g., Sturmey, 1995).
The prospect of FA creating the histories it is merely meant to detect has far-reaching implications. For example, if FAs do not reliably examine pre-assessment histories, then practitioners should be cautious about attributing historical significance to FA outcomes. Relatedly, if FA methodology only evaluates the reinforcing effect of programmed consequences on target responses, then FAs are no different than reinforcer analyses (RA) and should not carry a special distinction. By extension, the practice of targeting and reinforcing problem behavior during FAs would be difficult to justify because the strategy would not add value to assessment outcomes and would be unnecessarily taxing on the social validity of behavior-change technology.
Similar to FAs, reinforcer analyses (RA) are experiments designed to evaluate whether programmed contingencies can maintain elevated rates, magnitudes, or allocations of responding across time (e.g., Buchmeier et al., 2018; Call et al., 2012; Cote et al., 2007; DeLeon & Iwata, 1996; Hagopian et al., 2000; Hodos, 1961; Quick et al., 2018; Tarbox et al., 2007). Unlike FAs, RAs do not purport to probe pre-experimental learning histories. In fact, experimenters will often provide instructions, forced-choice exposures, or both before initiating the analysis to ensure that negative outcomes implicate ineffective contingencies rather than insufficient experience (cf. Buchmeier et al., 2018; Call et al., 2012; Cote et al., 2007; DeLeon & Iwata, 1996; Hagopian et al., 2000; Hodos, 1961; Quick et al., 2018; Tarbox et al., 2007).
Pre-Experimental History With Reinforcers
Certainly, frequent contact with powerful reinforcers can facilitate a response's acquisition of new functions (cf. Rooker et al., 2011). However, during FAs, participants are not instructed to engage in problem behavior and forced-choice exposures to programmed contingencies do not occur. Rather, therapists arrange contingencies between potentially reinforcing consequences and problem behavior and probe to see whether manipulated discriminative stimuli (SD) and establishing operations (EO) evoke problem behavior. If a participant is not already inclined to engage in problem behavior following EO onset, then problem behavior should never contact programmed consequences and new learning should not occur. As the goal of FA is not to teach, but rather to examine what participants have already learned (by arranging the conditions most likely to evoke demonstrations of that learning), FAs should only produce functional relations when those relations existed prior to analysis.
Rooker et al. (2011) and Shirley et al. (1999) both identified circumstances under which FAs do not live up to this ideal by demonstrating that it is possible to teach new relations during the analysis. However, Rooker et al. and Shirley et al. targeted automatically maintained problem behavior. That is, they evaluated the effect superimposed contingencies of arbitrary reinforcement would have on the rates of problem behavior demonstrated to occur frequently in the absence of social consequences, thus guaranteeing frequent contact between it and superimposed consequences and therefore creating the very history the analysis was only meant to detect.
It is unclear whether arbitrary contingencies of reinforcement superimposed onto problem behavior with socially mediated functions would be equally likely to implicate false-positive functions during FAs. This is because experimenters can more precisely control socially-mediated problem behavior (cf. responding in control conditions in Jessel et al., 2016; Jessel et al., 2018). Increased control likely decreases the potential for within-analysis experiences with historically irrelevant contingencies and may decrease the probability of producing false-positive outcomes during FAs of socially-mediated problem behavior (although, see Jessel et al., 2014). Thus, it remains unclear whether FAs of socially-mediated problem behavior can or do implicate false-positive functions.
To date, little empirical work has been done on this topic. Perhaps one reason for this stems from the difficulties associated with confirming or negating extra-experimental contact with programmed consequences. Although Rooker et al. (2011) and Shirley et al. (1999) used descriptive assessment (DA) results as comparisons against which to evaluate the ecological validity of tangible functions identified during FAs, tangible delivery is a fairly discriminable event and is relatively easy to quantify. By contrast, various qualities and magnitudes of attention and escape can be sufficient to maintain behavior (cf. Kelly et al., 2014; Lerman et al., 2002) and may be harder to detect during uncontrolled observations. Furthermore, individuals' histories are often complex (Beavers & Iwata, 2011) and schedules of reinforcement in the natural environment can be lean and unpredictable (Call et al., 2017). When behavior occurs with any degree of variability, even DAs that include extended-duration observations (e.g., 60 min) are unlikely to generate representative samples of real-world circumstances (Tiger et al., 2013) or reliably isolate the antecedent and consequent events responsible for maintaining problem behavior (Pence et al., 2009).
Thus, we concluded it is problematic to suggest functions implicated during analogue FAs are “false” because target consequences are not observed (or correctly quantified) during formal DAs. This conclusion presents a problem when evaluating the validity of contemporary interpretations of FA data because there does not appear to be an acceptable standard for confirming or negating pre-experimental reinforcement histories, outside of the analysis itself.
Construct validation describes the process by which investigators defend inferences about data when attributed meaning extends beyond obtained results (Cronbach & Meehl, 1955). To engage in this process, investigators propose nomologicals that theoretically connect constructs to observed outcomes and inform falsifiable hypotheses that can later be tested. When correspondence between predicted and obtained outcomes is high, there is evidence that interpretations of data are correct and that assessments measure what they are intended to measure (Yoder et al., 2018). Although construct validation is typically done through statistical analysis of data obtained through group design (cf. Yoder et al.), the concept of divergent construct validity may hold relevance to the current issue. Divergent validity is demonstrated when a relevant variable is not associated with outcomes with which it should not be associated.
For example, we can state that persistent patterns of behavior are a product of experience with reinforcement. From this, we can hypothesize that contingencies of reinforcement that have been learned will control behavior whereas contingencies of reinforcement that have not been learned will not (even though they could, given sufficient experience). We might also say that both FAs and RAs are designed to empirically demonstrate the control that reinforcement can have over behavior but only FAs are designed to detect pre-experimental experiences with reinforcement. Said differently, in order to establish a functional relation during an RA, the programmed contingency need only be reinforcing. However, to establish a functional relation during FA, the programmed contingency needs to be reinforcing and the individual would theoretically need to have experienced the contingency prior to the FA.
As problem behavior is unlikely to be maintained by all known reinforcers (for indirect evidence of this, see discussion of the prevalence of multiply controlled problem behavior in Beavers & Iwata, 2011), not every functional relation established during RAs should be replicated during FAs. Therefore, if we conduct both analyses and compare outcomes across a number of participants, we can evaluate (indirectly) the validity of contemporary interpretations of FA data in ways that are analogous to how others have assessed divergent construct validity.
Specifically, if RAs and FAs are functionally equivalent, then RA results should consistently and perfectly predict FA results across participants. By contrast, if RAs and FAs each identify what they purport to identify (i.e., reinforcing contingencies in the case of the former and experience with said contingencies in the case of the latter), then RA results should serve as unreliable predictors of FA results. Each time the outcomes of RAs and FAs do not perfectly correspond, we accumulate evidence that they measure different things and demonstrate that something more than a simple contingency of reinforcement is needed to establish functional relations between reinforcers and target responses during FAs (presumably pre-experimental experience).
Thus, the purpose of this study was to compare results of RAs with results of FAs when implementers and systematically identified consequences were held constant across assessments. We propose that imperfect correspondence between assessments across participants would contribute divergent construct validity to contemporary interpretations of FA outcomes.
Participants and Setting
We included the first eight individuals referred for services at a university-based outpatient clinic who (a) engaged in problem behavior hypothesized to be socially mediated during pre-assessment interviews, (b) for whom informed consent could be obtained, (c) who completed RAs and FAs with matched tangible, attention, and escape conditions, and (d) for whom acceptable interobserver agreement (IOA) and procedural fidelity scores could be obtained. Participant ages, parent-reported diagnoses, and researcher-generated ratings of communication skills are displayed in Table 1. All preference assessments, RAs, and FAs were conducted in a clinical room equipped with table, chairs, condition-specific materials, and a one-way mirror.
The primary dependent variable during RAs was the rate of occurrence of an arbitrary behavior. The primary dependent variable during FAs was the rate of occurrence of problem behavior. Arbitrary behaviors targeted during RAs were individualized for each participant and ranged from wiping tables to clapping hands. To be targeted, arbitrary responses needed to be: (a) free operant, (b) easily performed by the participant, and (c) not automatically maintained. Similarly, problem behavior was individualized for each participant, based on caregiver report. Arbitrary and problem behaviors selected for each participant are listed in Table 1. Definitions are available in this study's supplemental materials section.
Using handheld computers with data-collection software, observers trained with the video-based protocol outlined by Dempsey et al. (2012) collected frequency data on arbitrary responses during RAs and frequency data on problem behavior during FAs.
A second observer simultaneously but independently collected data on all dependent variables for no less than 25% of total sessions conducted during each analysis (i.e., RA and FA) for each participant. We compared primary and reliability data and calculated IOA by dividing RA and FA sessions into 10-s bins, dividing agreements about the occurrence of each dependent variable within each bin by the sum of agreements and disagreements, and multiplying by 100. Bin scores were then averaged to generate session IOA scores. Average IOA for each participant is reported in Table 2.
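The bin-by-bin IOA calculation described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the software used in this study. In particular, the sketch assumes that, within each 10-s bin, "agreements" are the responses both observers recorded (the smaller of the two counts) and "disagreements" are the difference between counts, and that bins in which both observers recorded no responses are scored as 100% agreement. The function names (`bin_counts`, `session_ioa`) and the timestamps are hypothetical.

```python
from typing import List, Sequence


def bin_counts(timestamps: Sequence[float], session_s: int, bin_s: int = 10) -> List[int]:
    """Count recorded responses in consecutive bins of bin_s seconds."""
    n_bins = -(-session_s // bin_s)  # ceiling division
    counts = [0] * n_bins
    for t in timestamps:
        if 0 <= t < session_s:
            counts[int(t // bin_s)] += 1
    return counts


def session_ioa(primary: Sequence[float], reliability: Sequence[float],
                session_s: int, bin_s: int = 10) -> float:
    """Mean of per-bin agreement percentages for one session.

    Assumption: agreements = min(count_a, count_b); disagreements = |count_a - count_b|.
    Assumption: a bin in which neither observer recorded a response scores 100.
    """
    p = bin_counts(primary, session_s, bin_s)
    r = bin_counts(reliability, session_s, bin_s)
    scores = []
    for a, b in zip(p, r):
        if a == b:
            scores.append(100.0)
        else:
            agreements = min(a, b)
            disagreements = abs(a - b)
            scores.append(100.0 * agreements / (agreements + disagreements))
    return sum(scores) / len(scores)


# Hypothetical response timestamps (in seconds) from a 2-min session
primary_obs = [1.0, 12.5, 13.0, 45.2]
reliability_obs = [1.4, 12.9, 47.0]
print(round(session_ioa(primary_obs, reliability_obs, session_s=120), 1))
```

In this hypothetical session the observers disagree in only one of twelve bins, so the session score stays high even though total counts differ.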
Observers used yes/no checklists to evaluate therapist fidelity to programmed procedures during both RAs and FAs. We calculated fidelity scores by dividing the number of “yes” (marked when a procedure was implemented as programmed) by the sum of “yes” and “no” and multiplying by 100. Average fidelity scores for each participant are shown in Table 2.
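As a worked illustration of the fidelity calculation, the following Python sketch computes the percentage of checklist items marked "yes." The function name `fidelity_score` and the example checklist are hypothetical, and the sketch assumes every checklist item was scored either "yes" or "no."

```python
from typing import Sequence


def fidelity_score(checklist: Sequence[bool]) -> float:
    """Percentage of checklist items marked 'yes' (True).

    Assumption: each item is scored yes (True) or no (False); no items are skipped.
    """
    if not checklist:
        raise ValueError("checklist must contain at least one scored item")
    yes = sum(1 for item in checklist if item)
    no = len(checklist) - yes
    return 100.0 * yes / (yes + no)


# Hypothetical checklist: three procedures implemented as programmed, one not
print(fidelity_score([True, True, False, True]))
```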
All data for this study were obtained during the initial weeks of service provision for each participant, prior to the onset of intervention. All participants attended two appointments per week and each appointment lasted two hours. All participants progressed through the same general sequence of events. First, we conducted open-ended functional-assessment interviews (Iwata et al., 2013; O'Neill et al., 1997) with primary caregivers to operationalize problem behavior and identify an array of consequences that might plausibly follow problem behavior in naturalistic settings. We then organized these consequences into the classes of social interaction commonly arranged during FAs (i.e., attention, tangible, & escape).
Next, we incorporated variables from the above-mentioned categories into a series of preference and demand assessments to identify the consequence most likely to function as reinforcement for each class of social interaction. Then, we conducted RAs to test whether contingent access to, or escape from, identified events would function as reinforcement. Finally, we conducted FAs of problem behavior. With the exclusion of programmed breaks, the entirety of each appointment with each participant was dedicated to obtaining data from these assessments, which were conducted in the order and fashion described above and below.
To facilitate discrimination between conditions during RAs and FAs (Conners et al., 2000), we assigned a specific therapist (i.e., a graduate student in Special Education and Applied Behavior Analysis), who wore a specific colored shirt (e.g., red, blue, white), to each condition tested. We always conducted RAs prior to FAs, manipulated identical qualities and durations of antecedent and consequent events within and across the conditions of both analyses, and used the same schedule-correlated stimuli (e.g., therapists and colors) to signal the same active contingencies during both analyses. We did this to ensure that negative functions implicated during FAs were not artifacts of insufficient exposure to SDs, EOs, or experience with consequences. That is, we arranged our experiment in this way to capitalize on learning that occurred during RAs in a way that would increase the probability of achieving perfect correspondence across analyses. Notwithstanding, we used different control techniques (i.e., reversal [RA] and multielement [FA]) to minimize the possibility of response induction attributable to similarities in experimental design.
Because we presumed that any response would quickly come under the control of contingent access to a reinforcer if explicit instructions were provided, or if experience with relevant contingencies was ensured, we provided both contingency reviews and forced-choice exposures during RAs of arbitrary responses but provided neither during FAs of problem behavior. Likewise, we conducted RAs using 2-min sessions and FAs using 5-min sessions. We did this to demonstrate it was possible to establish the reinforcing effect of programmed consequences in less time than what was allotted during FAs. Assuming non-correspondences between analyses would be due to positive functional relations established during RAs but not FAs, we made these modifications to address potential concerns about whether better correspondences would have occurred if FAs had been carried out for longer periods of time.
To identify each participant's highest preferred tangible item from arrays identified during interviews, therapists either conducted a multiple stimulus without replacement (MSWO; DeLeon & Iwata, 1996) or a paired stimulus (Fisher et al., 1992) preference assessment. To identify participants' highest preferred form of attention, therapists identified four potentially preferred social interactions using information from parent reports and informal observations and pitted them against each other in discrete-trial paired-stimulus social-interactions preference assessments adapted from procedures described by Clay et al. (2013). Likewise, to identify a low-probability demand for each participant (from which escape might function as a reinforcer), we identified five to ten potential low-probability demands using information from parent reports and informal observations and conducted discrete-trial demand assessments adapted from procedures outlined by Roscoe et al. (2009). Highest-preferred tangibles, highest-preferred attention, and lowest-probability demands for each participant are listed in Table 1.
RA procedures were adapted from procedures described by DeLeon and Iwata (1996). We conducted tangible, attention, and escape RAs (in that order) with each participant. Session duration was 2 min. During sessions, clinical rooms contained participants and condition-specific therapists. Prior to initiating RAs, therapists selected a single arbitrary response to reinforce across all RAs for each participant. Arbitrary responses needed to fall within participants' current repertoires (e.g., elbow touch, cup flip) and could not be automatically maintained. If a target response persisted across sessions during the first baseline condition of the tangible RA (suggesting automatic control), data from those sessions were discarded, a new response was identified, and a new RA was initiated. Only materials required to engage in target responses and those required to deliver programmed consequences were present during RA sessions.
Each RA entailed A (baseline) conditions and B (Fixed Ratio [FR] 1) conditions. During baseline conditions, therapists did not interact with participants nor did they react to targeted responses. Prior to the first session of each FR1 condition, participants were given 30-s access to programmed consequences (i.e., highly preferred tangibles, highly preferred attention, or no demands). At session onset, therapists denied participants access to highly preferred tangible items (tangible RA), removed their attention (attention RA), or began presenting low-probability demands using a two-step prompting hierarchy (vocal, manual guidance; 5-s inter-prompt interval) (escape RA). Following each target response, therapists produced 15-s access to programmed consequences (i.e., tangible, attention, or escape). During FR1 conditions of escape RAs, compliance produced brief praise and a new demand. If problem behavior occurred, it did not produce programmed consequences, nor did it delay reinforcement for target responses (anecdotally, problem behavior rarely emerged during RAs and did not occur contiguously with target responding).
Prior to the first session of every condition (baseline or FR1), therapists conducted contingency reviews concurrently with forced-choice exposures to programmed consequences (i.e., no consequence [baseline] or 15-s access to reinforcement [FR1]). Contingency reviews entailed verbal descriptions of programmed consequences for target responses (i.e., “when you do this, nothing happens” [baseline]. Or, “when you do this, this happens” [FR1]). Because therapists and caregivers both questioned whether a single contingency review would be sufficient to teach Becky and Diana about programmed consequences, therapists also provided within-session vocal reminders of relevant contingencies (i.e., “remember, when you do x, you get y”) on a fixed-time 30-s schedule during both baseline and FR1 conditions.
Therapists conducted a minimum of three sessions of each condition and implemented an ABAB reversal design for any RA in which rates of target responding increased in FR1 conditions relative to rates in baseline conditions. Therapists conducted a minimum of five sessions in FR1 conditions before concluding that a demonstration of effect had not occurred and discontinuing an RA.
FA sessions were 5 min in duration. During FAs, every participant was exposed to traditional test (i.e., tangible, attention, escape) and control (play) conditions. A select subset of participants were also exposed to idiosyncratic test conditions (e.g., social avoidance, divided attention in a mixed-gender group context) implicated during interviews or informal interactions (Hanley et al., 2014; Schlichenmeyer et al., 2013). During play conditions, no demands were presented, participants were given continuous access to highly preferred tangibles, and attention was delivered at least once every 30 s. Prior to every test session, participants were provided with 30-s access to putative reinforcers. At session onset, reinforcers were removed, relevant EOs presented, and problem behavior produced 15-s access to programmed consequences. If responses targeted during RAs occurred during FAs, they did not produce programmed consequences, nor did they delay reinforcement for problem behavior (anecdotally, target responses rarely emerged during FAs and did not occur contiguously with problem behavior).
With the exception of responses targeted, session duration, and experimental design, tangible, attention, and escape FA sessions were identical to tangible, attention, and escape RA sessions. Likewise, ignore sessions were identical to baseline RA sessions. Although we did not compare idiosyncratic FA conditions to RA conditions, we included them in the analysis to demonstrate that functions could be identified in cases in which problem behavior was insensitive to consequences programmed into tangible, attention, or escape conditions.
We interpreted preference assessment, RA, and FA results for each participant using visual analysis and summarized our findings in Table 1. We identified a case of correspondence when consequences that controlled target responding during RAs (i.e., produced three demonstrations of effect [as evidenced by deviations from level, trend or variability in responding following condition onset]) also controlled problem behavior during FAs (as determined by criteria described in Roane et al., 2013), and vice versa. We identified a case of non-correspondence if a consequence controlled responding during one analysis but not the other. Across participants, RA outcomes corresponded with FA outcomes for 50% (4 of 8) of tangible conditions, 50% (4 of 8) of attention conditions, and 62.5% (5 of 8) of escape conditions.
Importantly, when evaluating correspondence, we only compared conditions that manipulated identical parameters and qualities of consequences. Thus, if escape from demands (identified from previous demand assessments) did not function as a reinforcer during either analysis but escape from social interactions (identified during parent interviews and included as an idiosyncratic test condition during Becky's FA) functioned as a reinforcer during the FA (see Becky's data in Figure 1), we counted this as a case of correspondence (even though “escape” was implicated during the FA but not during the RA).
To inform discussions of construct validity, we considered cases of correspondence and non-correspondence against the full functional profile of each participant's problem behavior. Specifically, we proposed that cases of “perfect correspondence” (i.e., participants for whom RAs and FAs never produced instances of non-correspondence) contributed evidence to support assumptions of functional equivalence between RAs and FAs. By contrast, we proposed that cases of “imperfect correspondence” (i.e., participants for whom RAs and FAs produced at least one instance of non-correspondence) contributed evidence to support assumptions of functional asymmetry. Thus, using logic inspired by work on divergent construct validity, we proposed that a high percentage of perfect correspondences would challenge the construct validity of contemporary interpretations of FA data and a high percentage of imperfect correspondences would support it.
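The classification logic described above can be sketched in Python as follows. This is a minimal illustration rather than a procedure used in the study: in practice, each outcome was determined by visual analysis, whereas the sketch simply takes those outcomes as true/false inputs. The names `condition_correspondence`, `classify`, and `percent_imperfect` are hypothetical. The entries for Ian and Becky reflect the outcomes reported for them in this study; the entry labeled "P3" is invented solely to show a case of imperfect correspondence.

```python
from typing import Dict, Tuple

CONDITIONS = ("tangible", "attention", "escape")
Outcomes = Dict[str, bool]  # True = consequence controlled responding


def condition_correspondence(ra: Outcomes, fa: Outcomes) -> Dict[str, bool]:
    """A condition corresponds when RA and FA outcomes match (both True or both False)."""
    return {c: ra[c] == fa[c] for c in CONDITIONS}


def classify(ra: Outcomes, fa: Outcomes) -> str:
    """'perfect' if no condition shows non-correspondence, otherwise 'imperfect'."""
    return "perfect" if all(condition_correspondence(ra, fa).values()) else "imperfect"


def percent_imperfect(profiles: Dict[str, Tuple[Outcomes, Outcomes]]) -> float:
    """Percentage of participants with at least one instance of non-correspondence."""
    labels = [classify(ra, fa) for ra, fa in profiles.values()]
    return 100.0 * labels.count("imperfect") / len(labels)


profiles = {
    # All consequences functioned as reinforcers in both analyses
    "Ian": ({c: True for c in CONDITIONS}, {c: True for c in CONDITIONS}),
    # No matched consequence functioned as a reinforcer in either analysis
    "Becky": ({c: False for c in CONDITIONS}, {c: False for c in CONDITIONS}),
    # Hypothetical participant: tangible reinforced the arbitrary response but not problem behavior
    "P3": ({"tangible": True, "attention": True, "escape": False},
           {"tangible": False, "attention": True, "escape": False}),
}

for name, (ra, fa) in profiles.items():
    print(name, classify(ra, fa))
```

Note that only matched tangible, attention, and escape conditions enter the comparison, mirroring the decision to exclude idiosyncratic FA conditions when evaluating correspondence.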
When considered in this way, obtained results generally supported the construct validity of contemporary interpretations of FA data. For 25% (2 of 8) of participants (i.e., Becky and Ian), RA results predicted FA results (Figure 1), either by indicating that tangible, attention, and escape would function as reinforcers for problem behavior (Ian), or that they would not (Becky). For 75% (6 of 8) of participants (i.e., Adam, Carrie, Diana, Earl, George, and Heather), RA results did not predict FA results (Figure 2), either because they erroneously predicted tangible, attention, or escape would function as a reinforcer for problem behavior (Adam, Carrie, Earl, George, and Heather), or because they erroneously predicted attention and escape would not (Diana).
Before initiating either analysis, we hypothesized that Heather's self-injurious behavior (SIB) served a different function than her aggression and we conducted separate FAs of each (shown in the bottom right panels of Figure 2). During the FA of SIB, responding did not demonstrate sensitivity to programmed consequences. Conversely, Heather's aggression quickly came under the control of contingent attention.
The purpose of this study was to evaluate the likelihood of analogue FAs shaping new functions of problem behavior during the assessment process. Using logic akin to that used in studies of divergent construct validity, we proposed that imperfect correspondence between RAs and FAs would demonstrate that the practice of contriving contingencies between problem behavior and known reinforcers, in and of itself, is likely insufficient to establish new functional relations during FAs (i.e., to produce “false positives”). After completing matched RA and FA conditions for eight participants, we found perfect correspondence between RA and FA outcomes for only 25% of cases, which represented the extremes of what was possible (i.e., all consequences functioned as reinforcers [Ian], or no consequences functioned as reinforcers [Becky]). We interpret this as low and propose that our findings indirectly validate conclusions drawn about behavior function derived from traditional FAs of problem behavior conducted in analogue settings.
This study's limitations should be noted. First, we did not track rates of problem behavior during RAs or rates of arbitrary responding during FAs, so it is impossible to determine whether non-targeted behavior occurred during either analysis. Notwithstanding, we do not see this possibility as a major threat to the study's findings. Had RA targets occurred during FAs, we would have treated them like precursors to problem behavior, allowed them to contact extinction, and expected an escalation to culminate in the occurrence of problem behavior (e.g., Smith & Churchill, 2002). Conversely, if problem behavior occurred during RAs, it is possible that extinction suppressed its occurrence to a level from which it never recovered during the matched FA condition(s), potentially providing an alternative explanation for false-positive functions identified by RAs. However, clear functional relations between problem behavior and maintaining consequences were obtained in the first iterations of non-synthesized FA-test conditions during 66.7% (6 of 9) of our FAs, a percentage that falls well above what might be expected (i.e., 40–50%) given previous work on the topic (Hagopian et al., 2013; Lambert et al., 2017; Slaton et al., 2017). This suggests that problem behavior was at strength at the onset of FAs.
Another limitation of our study was that RAs and FAs differed in several ways. For example, we did not attempt to match problem-behavior-response effort or duration when identifying arbitrary responses during RAs. Furthermore, we used different control techniques across assessments (i.e., withdrawal [RA] and multielement [FA]). Finally, our population was fairly heterogeneous, and it is unknown whether a more homogeneous sample would have produced greater correspondence. Future researchers may wish to explore whether specific participant characteristics (e.g., age, diagnosis, language ability, sophistication of problem-solving repertoires) moderate the probability of assessment correspondence.
Notwithstanding these limitations, our results provide evidence suggesting that FA-of-problem-behavior outcomes are a function of more complex variables than the simple arrangement of contingencies between problem behavior and known reinforcers. Although our data do not allow us to examine why, one of two explanations seems plausible. First, participants may simply never have learned that problem behavior would produce programmed consequences during the FA. Alternatively, it is possible that certain response topographies are less sensitive to some forms of reinforcement than others. In both cases, it does not appear as though problem behavior is likely to come under the control of contingencies of reinforcement that have been contrived but not experienced. We interpret these results as evidence that supports the construct validity of FA methodology. That is, we suggest that low correspondence between RA results and FA results lends validity to the assumption that functional relations identified during FAs implicate important aspects of individuals' idiosyncratic learning histories.
As a point of discussion, our results appear to contribute to a growing body of evidence that cautions practitioners against using preference- or reinforcer-assessment results, or FAs of alternative behavior, as replacements for FAs of problem behavior. For example, Schieltz et al. (2010), LaRue et al. (2011), and Lambert et al. (2018) all compared the results of FAs of socially appropriate behavior under motivational control (i.e., “mands”) with FAs of problem behavior and found varying degrees of correspondence (75% in LaRue et al., 20% in Schieltz et al., and 50% and 0% in Lambert et al.). Similarly, Berg et al. (2007) compared preference assessment results with FAs of problem behavior and obtained 75% correspondence. Our own findings (i.e., 25% perfect correspondence) tell a similar story. Taken together, these findings suggest that FAs of alternative behavior (e.g., arbitrary responses, mands, choice responses) provide unreliable predictions of the functions of problem behavior.
Perhaps the most obvious reason for this is that the reinforcers responsible for maintaining problem behavior represent only a small subset of the total reinforcers available in a given context. Because different responses produce different consequences and because access to problem behavior's functional reinforcers would not necessarily abolish alternative reinforcers (for example, see the behavioral-economics concepts of commodity substitutes and complements [Madden, 2000]), it follows that alternative responses (e.g., choice responses, mands) could be evoked by EOs that do not evoke problem behavior and that problem behavior's abolishing operations might not abolish those alternative responses.
From a practical standpoint, our findings may be useful because they underscore the related but distinct purposes of FAs and RAs. Specifically, RAs are designed to evaluate the potential reinforcing impact of contingent consequences. Although FAs are capable of identifying powerful reinforcers, they are more likely than RAs to “miss” them because analysis is constrained to the subset of consequences hypothesized to be functionally related to problem behavior. Thus, RAs are the more appropriate analysis when practitioners need to identify consistently effective instructional consequences (e.g., when working in skill-acquisition paradigms). Conversely, although RAs and preference assessments will, on occasion, accurately predict functions of problem behavior (cf. Ian's results) and can be useful for identifying consequences for inclusion in the experimental conditions of FAs (e.g., Slocum et al., 2018), FAs are the more appropriate assessment when awareness of pre-experimental histories is necessary to inform treatment components whose efficacy depends on function.
The potential adverse repercussions of assuming equivalence between RA results and FA results may be best typified by Heather's case. Not only did RA results fail to predict the function of either topography of problem behavior (i.e., SIB and aggression), but we also did not have a clear method for identifying the topography to which the implicated function should be assigned. Had we not thought to conduct separate FAs for SIB and aggression, we might have assumed that both shared the same function. Likewise, had we equated RA results with FA results, we might have assumed that either or both had a tangible function. All of these assumptions would have been wrong, and function-dependent treatment components (e.g., extinction) informed by them would have been unlikely to generate favorable therapeutic outcomes (Iwata et al., 1994).
A few additional points may be worth discussing. First, we anticipated that the most likely disagreement between analyses would occur when RAs identified a reinforcer that did not maintain problem behavior during FAs (i.e., RAs would implicate a false-positive function). Thus, we arranged our experiment in such a way as to decrease the possibility of this occurring. That is, we always conducted RAs before FAs, and FA sessions (5 min) were 2.5 times as long as RA sessions (2 min). Consequently, participants had always experienced programmed consequences multiple times before any FA began. Furthermore, in cases in which RAs identified reinforcers that did not maintain problem behavior during FAs, there was clear evidence that exposure to programmed EOs could evoke high rates of independent responding across a time frame (i.e., 2 min) that was much briefer than that allotted during FA sessions (i.e., 5 min). As a result, it would be hard to argue that lack of responding during FA sessions was due to insufficient exposure to EOs (as is sometimes the case; cf. Kahng et al., 2001).
Because we anticipated that RAs would only implicate false-positive functions for problem behavior, we were surprised to see that they implicated a false negative for Diana. That is to say, Diana would not clap to escape from low-probability demands, but she would aggress. Although we cannot draw strong conclusions from her data because RA sessions were considerably shorter than FA sessions (and Diana may have eventually clapped with more time), it is worth noting that RA sessions were conducted in quick succession (thus, Diana was exposed to 10 min of near-continuous demands). It is also worth noting that Diana was reminded she could clap to escape every 30 s (and had experienced this contingency during forced-choice exposures) and that these same demands evoked aggression during the first escape session of her FA (unfortunately, we did not track aggression during her escape RA and cannot say whether aggression occurred, and perhaps competed with clapping, during that assessment). Thus, although any conclusions about her results would be speculative, the possibility that reinforcers might maintain some topographies of behavior but not others (in spite of instructions and direct experience with relevant contingencies) suggests a certain inflexibility in operant membership that might further protect against invalid conclusions of problem-behavior function drawn from FA data. This could be worth exploring in future research.
It is worth noting that tangible functions of problem behavior were identified in only 37.5% (3 of 8) of cases and were identified for fewer than half of the participants for whom sensitivity to tangible reinforcement was demonstrated during RAs (i.e., 7 participants). Tangible functions were less prevalent than escape functions (4 of 8 cases) but were more prevalent than attention functions (2 of 8 cases). In a review of multiply controlled problem behavior, Beavers and Iwata (2011) noted a relatively high percentage of confirmed tangible functions (83.7%) among the FAs analyzed when they considered the number of confirmed functions (41 cases) against the number of opportunities to evaluate function (49 FAs that included a tangible condition). Beavers and Iwata (2011) and Rooker et al. (2011) both presented two hypotheses to account for this. One was that researchers only included tangible conditions in published FAs when there was good reason to believe that problem behavior served a tangible function. The other was that tangible conditions are more likely than other FA conditions to produce false-positive functions.
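As a quick arithmetic check on the proportions cited above, the following sketch recomputes each percentage from the counts given in the text. The helper name `percent` and the variable names are assumptions introduced for this illustration; the counts themselves are taken directly from the paragraph above.

```python
# Illustrative arithmetic check; helper and variable names are assumptions.

def percent(count, total):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Functions identified by FA in the present study (counts from the text).
tangible_fa = percent(3, 8)
escape_fa = percent(4, 8)
attention_fa = percent(2, 8)

# Tangible functions among participants whose RA showed sensitivity
# to tangible reinforcement (3 of 7).
tangible_given_ra = percent(3, 7)

# Confirmed tangible functions in the Beavers and Iwata (2011) review
# (41 confirmed functions across 49 FAs with a tangible condition).
review_tangible = percent(41, 49)

for label, value in [
    ("Tangible (FA)", tangible_fa),
    ("Escape (FA)", escape_fa),
    ("Attention (FA)", attention_fa),
    ("Tangible FA function given tangible RA reinforcer", tangible_given_ra),
    ("Tangible (review)", review_tangible),
]:
    print(f"{label}: {value}%")
```

The 3-of-7 figure makes the "fewer than half" statement concrete, and the gap between it and the review's figure is the contrast the surrounding discussion draws on.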
Although the findings of Rooker et al. (2011) and Shirley et al. (1999) suggest the latter may be the case for automatically maintained problem behavior (but see Dozier et al., 2013, for contradictory findings), our own results (both in terms of general non-correspondence between tangible RA and FA outcomes and in terms of the moderate percentage of cases in which tangible functions were identified) suggest a more plausible account is that published FAs have only included tangible conditions when tangible functions have been suspected and have not included them otherwise. To the extent that this is true, including tangible conditions in FAs of problem behavior may not be as detrimental to the validity of FA results as previously indicated. However, because we did not explicitly design our study to test either hypothesis, additional research on the topic appears to be warranted.
We thank Rachel Mottern, Andrea Perkins, Casey Chauvin, Gillian Cattey, Kate Chazin, Kristen Stankiewicz, Lilly Stiff, Sara Sheffler, Sarah Reynolds, Sarah Shaw, and Naomi Parikh for their assistance collecting and summarizing data for this study and for coordinating and delivering high-quality therapeutic services for participants following this study's conclusion. We also thank the Treatment and Research Institute for Autism Spectrum Disorders for providing clinic space and Jessica Torelli and Paul Yoder for their feedback on an earlier draft of this manuscript.