Walsh and Kastner (2006) have written a detailed article setting out their concerns about the integrity and interpretation of the data in Conroy, Spreat, Yuskauskas, and Elks (2003). I have tried to avoid simply repeating the critique that they have made. I was a reviewer of their original submission and had access to the database this Journal supplied. I replicated the analyses described in Conroy et al. (2003) and in the submitted critique and reported on the accuracy of the data contained in both papers in my review to this Journal's editor. I also formed my own view about the criticisms made by Walsh and Kastner (2006) and similarly reported these. I will be brief in describing my independent verification of the thrust of their criticisms as set out in the published article because many of the points of departure I had with the submitted critique have been dealt with in its revision. In my opinion, the analyses of the supplied database reported by Walsh and Kastner (2006) are accurate.
Rather than unduly re-emphasizing points made in Walsh and Kastner (2006), I will comment on what Conroy et al.'s (2003) study might tell us, how this might fit with a more general understanding of the outcomes of deinstitutionalization, and where the research agenda on residential supports should be heading. Conroy et al.'s conclusions are very positive. They interpreted their findings as demonstrating significant benefit, almost without qualification. They stated:
This study adds to the growing body of empirical literature that attests to the benefits of community living. … Members of the Hissom Focus Class, who were transferred … to small, community-based homes … evidenced increased skills, were more integrated into their communities and families, and received more services. (p. 271)
In the remainder of this commentary, I discuss whether such unequivocal conclusions are justified, and what alternative conclusions might be reached if they are not. I focus on the three outcome areas highlighted above: change in behavior, community integration and contact with families, and receipt of services. I end by suggesting the need to move on from a research emphasis on comparing institutional and community-based provision to one directed towards understanding the determinants of quality of life outcome within community settings.
Status of the Data and Appropriate Analytic Methods
Anomalies in the Data
One of the problems in assessing what reliance might be put on the findings of the Conroy et al. (2003) study is that there are anomalies in the published results and the supplied dataset that cannot be resolved. I replicated the analyses of the adaptive behavior scores as described by Conroy et al. together with those based on 31 of the 32 behavioral items (see Walsh & Kastner, 2006) and was not able to reproduce or divine the basis of the adaptive behavior score results stated on p. 267 of the published article or two thirds of the factor means and SDs provided in Table 2 (p. 268). In doing so, I independently discovered that the apparent adaptive behavior scale summary variables in the dataset were not the sum of the individual items. Moreover, although some agreement with what is published can be found by accepting that the factor analysis was performed on 31 rather than 32 items (i.e., one can replicate the proportions of variance explained (p. 267) and Table 1 (p. 268) of the 2003 article), I also discovered that the apparent factor summary scores within the dataset sum to the totals of the 32 behavioral items at both time points. It is not possible to resolve such confusions without further input from the investigators.
There are difficulties documented by Walsh and Kastner (2006) in relation to other variables. It is difficult to see how a 16-item scale measuring challenging behavior scored as described can produce a maximum score of 100. Measures of productive use of time and receipt of services were arbitrarily constrained to a two-digit maximum when the sample lived in the institution but not after they moved to the community. Certain high values postmove were also extreme and would require explanation as to how they could possibly arise. The range of values for the family contact measure postmove (up to and including 27) exceeded the maximum possible (18), according to Conroy et al.'s statement in the Methods section of the article (p. 266). Like Walsh and Kastner (2006), I could not understand how the way of analyzing the family contact measure described by Conroy et al. (2003) could be applied to the data in the dataset, and I could not replicate their analysis.
There is inconsistent reporting within Conroy et al. (2003) of the psychometric properties of the measures used. Validity is rarely discussed. Evidence of internal consistency, test–retest reliability, and interrater reliability is variously provided. Walsh and Kastner (2006) raised a number of points of criticism in relation to this. My main concern here is to emphasize the importance of interrespondent agreement. In a pre–post evaluation of altered provision arrangements, such as that undertaken by Conroy et al., change in personnel acting as respondents is almost inevitable, particularly as any continuity across the two types of provision will also be affected by staff turnover. Being able to gauge whether the apparent change between arrangements over time is more or less than the scale of difference between respondents within the same provision arrangements at the same time is helpful in interpreting whether the altered provision arrangements have really made a difference. Knowledge of interrespondent reliability is required.
Conroy et al. (2003) did acknowledge that third-party informants differed between the two data-collection points and recognized this as a potential problem (p. 266). However, they implied that this is not an actual problem in their study because “in general interrater reliability on the components of the questionnaire appears satisfactory” (Fullerton, Douglass, & Dodder, 1999, p. 266). Neither Conroy et al. nor Fullerton et al. stated what they meant by the term interrater—whether it referred to agreement between two people independently rating an interview with the same respondent or agreement between two informants being independently interviewed. My understanding is that it would be the former. What little description of procedure Fullerton et al. (1999) provided is consistent with this interpretation. If so, Conroy et al. (2003) were incorrect to suggest that satisfactory interrater reliability is relevant to assessing the scale of the problem due to disagreement between informants. Their potential problem remains a real problem.
In my own research using the Adaptive Behavior Scale (Felce & Perry, 1996), I have found that differences in scores arising from interviewing two respondents closely together in time can be as large as typical change over time. In that paper we concluded that:
There is the problem of the unreliability in the accounts given by different respondents which renders the interpretation of small-scale changes virtually impossible. Unreliability in this study was not greater than that found in the original development of the scale. It must be considered to some extent as an endemic property of any interview scale. (p. 113)
Moreover, if this is true of assessments of adaptive behavior, the problem may be even greater in relation to challenging behavior because these assessments typically have lower reliability statistics than do scales of adaptive behavior. Conroy et al. therefore needed to interpret their behavioral findings with greater caution.
Use of Parametric Statistical Tests
In general, Conroy et al. (2003) used parametric statistical tests without demonstrating that the data were normally distributed. In fact, Kolmogorov-Smirnov tests for normality in relation to most of the measures reported are highly significant, indicating skewed data. This was true for adaptive and challenging behavior, productive uses of time, and receipt of developmentally oriented therapy and services. Nonparametric tests should have been employed.
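The analytic point at issue can be illustrated with a minimal Python sketch (this is not the authors' procedure, and the data are invented): screen a paired measure for normality and fall back to a nonparametric test when it is skewed. Availability of numpy and scipy is assumed.

```python
# A sketch, with invented data, of choosing between a paired t test and
# its nonparametric counterpart after screening for normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Skewed, zero-heavy "hours of service" style scores, as such measures often are
pre = rng.exponential(scale=5.0, size=254)
post = pre + rng.exponential(scale=2.0, size=254)

def looks_normal(values, alpha=0.05):
    """Kolmogorov-Smirnov test of standardized values against N(0, 1).
    (Estimating the parameters from the data biases the test, but this
    is adequate for a screening sketch.)"""
    z = (values - values.mean()) / values.std(ddof=1)
    return stats.kstest(z, "norm").pvalue > alpha

diffs = post - pre
if looks_normal(diffs):
    stat, p = stats.ttest_rel(post, pre)   # parametric route
else:
    stat, p = stats.wilcoxon(post, pre)    # nonparametric route
```

With skewed differences such as these, the screen fails and the Wilcoxon route is taken, which is the choice argued for above.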
Are Any Conclusions Possible?
Reanalyzing the Data
As stated above, there are anomalies in the data that cannot be satisfactorily resolved. Ultimately, therefore, any conclusion has to be treated with caution. In the absence of any clarification from the investigators, it is perhaps useful to see what might be concluded nonetheless. The following discussion is based on the data in the dataset being accurate. Walsh and Kastner (2006) did raise some questions over this.
Conroy et al. (2003) reported an approximately 6-point difference in adaptive behavior scores between the two data-collection points (1990: M = 41.5, SD = 28.8; 1995: M = 47.3, SD = 29.5). Walsh and Kastner (2006) reported that the summary adaptive behavior scores in the dataset produced different means and SDs (1990: M = 34.9, SD = 24.2; 1995: M = 41.2, SD = 25.7). I have also found that summing the individual adaptive behavior items in the dataset at each point in time and summarizing these produces a third set of means and SDs (1990: M = 44.7, SD = 30.9; 1995: M = 53.2, SD = 33.1). Although such disagreement is clearly not what would be expected from a carefully conducted analysis, the three analyses do point to the same conclusion. Moreover, this conclusion is reinforced by the newfound consistency in factor score differences reported by Walsh and Kastner (2006). Analysis of the differences over time in these alternative total and factor scores using an appropriate nonparametric statistical test, a Wilcoxon signed-ranks test, shows them to be highly significant, all zs over 5.8 and all ps < .001. Overall, one might conclude, as Conroy et al. did, that adaptive behavior scores were significantly higher in 1995 than they were in 1990. Whether this might be due to systematic or chance differences in the way respondents reported behavior or to the passage of time would still need to be considered before assuming that it was due to the move from institutional to community living.
Reanalysis of the challenging behavior measures by nonparametric methods also produced similar results to the parametric analysis reported by Conroy et al. (2003). Using a Wilcoxon signed-ranks test, I found that the frequency of challenging behavior in 1995 was significantly lower than in 1990, z = −2.71, p < .01, and the severity of challenging behavior in 1995 was significantly lower than in 1990, z = −4.85, p < .001. Again, whether such differences could be attributed to reporting bias, interrespondent unreliability, or maturation would need to be considered.
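For readers who want the mechanics behind z values of this kind, the following Python sketch, using invented scores, computes the normal-approximation z for a Wilcoxon signed-ranks test. It omits tie and continuity corrections, so it is illustrative rather than a match for any published value.

```python
# Sketch of the Wilcoxon signed-ranks z statistic (normal approximation,
# no continuity or tie correction), of the kind quoted in the text.
import numpy as np
from scipy import stats

def wilcoxon_z(before, after):
    """Signed-ranks z: rank the absolute differences (zeros dropped),
    sum the positive ranks, and standardize against the null mean and SD."""
    d = np.asarray(after, float) - np.asarray(before, float)
    d = d[d != 0]                       # drop zero differences
    n = len(d)
    ranks = stats.rankdata(np.abs(d))   # mid-ranks for tied magnitudes
    w_pos = ranks[d > 0].sum()          # sum of positive ranks
    mean_w = n * (n + 1) / 4
    sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_pos - mean_w) / sd_w

# Uniform one-point improvement across five hypothetical people
z = wilcoxon_z([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

A positive z indicates scores rising over time; reversing the two arguments flips its sign, which is why direction of change must be read alongside the statistic.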
Similar reanalysis of the significance of the differences reported by Conroy et al. (2003) on productive uses of time (Table 4, p. 269) also produced results that agreed with the parametric analysis. However, as reported by Walsh and Kastner (2006), there was a different conclusion in relation to receipt of 2 of the 9 developmentally oriented therapies and services examined. Receipt of homemaker services did not significantly differ and receipt of occupational therapy was significantly lower in 1995 than in 1990. (These differences from the original article are found even if 1995 values above 99 are not recoded to that value.)
In general, with the exceptions noted above, reanalyzing the data more appropriately did not alter the majority of significant differences highlighted, albeit that one might appropriately be more cautious in assuming, in the absence of any control over alternative explanations, that they were attributable to the changed provision arrangements.
Interpreting the Scale of Change
Demonstrating that a difference between two measurements is statistically significant establishes a reasonable degree of confidence that the difference is real rather than an artifact of inherent variation. It does not imply that the difference is large or significant in any other way (e.g., clinically significant or socially valid). One general criticism of Conroy et al. (2003) is that they relied solely on the statistical significance of differences between means to convey the sense of scale of change. Although I would emphasize that an attention to statistical significance is important, in that one does not have a difference worth reporting without it, statistical significance alone cannot convey a complete understanding and interpretation of the data. Measures of effect size, descriptions of how the data were distributed, and analysis of how many people's scores on a particular indicator improved, stayed the same, or deteriorated are all useful in helping readers appreciate the meaning of significant changes.
Walsh and Kastner (2006) have demonstrated that, notwithstanding the statistical significance of differences, effect sizes for the changes in adaptive and challenging behavior reported are small. In my analysis of the challenging behavior data, I found that improvement was limited to about half the sample. In terms of frequency of challenging behavior, 132 people improved, 85 deteriorated, and 37 were unchanged. In terms of severity, 143 people improved, 62 deteriorated, and 49 stayed the same. There can be little doubt that clinically significant levels of challenging behavior remained after the move to the community.
Findings in relation to productive uses of time and receipt of developmentally oriented therapy and services provide a further illustration that statistical significance can mask what might otherwise be viewed as rather mixed or marginal change. For the majority of indicators, the median number of hours of service input was zero at both points in time. This means that generally over half of the sample did not receive the particular service input at all, even under the better provision arrangement. Moreover, the number of people whose service input did not change was also usually large. The numbers of ties for prevocational services, sheltered employment, supported employment, competitive employment, homemaker services, occupational therapy, physical therapy, psychotherapy, psychiatry, and audiology were 133, 146, 188, 244, 221, 151, 145, 158, 228, and 213, respectively. Only with respect to habilitation training, communication training, and nursing were ties in the minority (7, 67, and 56, respectively). Overall, there was a trend towards greater sheltered, supported, or competitive employment after the move to the community, but these changes involved only 43%, 26%, or 4% of the sample, respectively. Using nonparametric tests, I found that receipt of habilitation training and psychotherapy increased; receipt of occupational therapy, physical therapy, audiology, and nursing services decreased; and receipt of homemaker, psychiatric, and communication training services did not significantly differ between 1990 and 1995. The three changes affecting the majority of the sample highlighted above were equally distributed between being significantly positive, nonsignificant, and significantly negative.
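The descriptive breakdown used in the preceding paragraphs (counts of who improved, deteriorated, or stayed the same, alongside an effect size) can be sketched in Python as follows; all numbers are invented and the function name is hypothetical.

```python
# Toy illustration: report individual-level change alongside a simple
# matched-pairs effect size (Cohen's d on the paired differences).
import numpy as np

def change_summary(before, after, higher_is_better=True):
    d = np.asarray(after, float) - np.asarray(before, float)
    if not higher_is_better:
        d = -d                          # so positive always means improvement
    sd = d.std(ddof=1)
    return {
        "improved": int((d > 0).sum()),
        "deteriorated": int((d < 0).sum()),
        "unchanged": int((d == 0).sum()),
        "effect_size_d": float(d.mean() / sd) if sd else 0.0,
    }

# e.g., frequency-of-challenging-behavior style scores, where lower is better
summary = change_summary([5, 3, 0, 7, 2, 2], [4, 3, 1, 2, 2, 0],
                         higher_is_better=False)
```

Reporting the three counts next to the test statistic makes it immediately visible when a "significant" mean change conceals a large minority who deteriorated or did not change at all.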
A relatively large effect was reported on a minimal measure of community integration, the proportion of people who had some community experience in the week prior to data collection. Doubt has been voiced as to whether community presence, in itself, represents any measure of integration (see Cummins & Lau, 2003) or whether heightened use of community amenities leads to change in other, more relevant indicators, such as expanded social networks or relationships (see Robertson et al., 2001). Therefore, although the effect size may be large, the measure may provide such a weak indicator of community integration that the overall social significance of the change is small. (I have not been able to assess the status of the family contact results. Conroy et al., 2003, stated that people who had no family contact were excluded from the analysis, but it was not clear how to identify such people in the dataset.)
Possible Conclusions in the Light of Existing Literature
Earlier, I restated the conclusion reached by Conroy et al. (2003) that people moving from an institution to supported living in the community evidenced increased skills, were more integrated into their communities and families, and received more services. In relation to increased skills, one might accept that there was a small increase in adaptive behavior, with a degree of caution because informants differed across data-collection points and there was no control for maturation. Such a conclusion would be consistent with the existing literature, in which researchers have usually, but not always, found a step change in adaptive behavior scale scores following deinstitutionalization (see Emerson & Hatton, 1996; Felce, 2000; Kim, Larson, & Lakin, 2001; Young, Sigafoos, Suttie, Ashman, & Grevell, 1998). However, the few investigators who have focused on longer term development once individuals have moved into the community have tended to report a plateau effect (see Felce & Emerson, 2001), which suggests that the reported improvements in adaptive behavior upon moving may reflect the impact of increased opportunities to display existing skills in new ways rather than skill acquisition, a point discussed by Walsh and Kastner (2006). There is no evidence that community settings are inherently better environments for promoting developmental growth than the institutions that they replaced.
In a similar vein, there is little convincing evidence that community settings occasion less challenging behavior or are better at promoting alternatives to challenging behavior over time (Emerson & Hatton, 1996; Felce, 2000). Conroy et al. (2003) claimed improvement in challenging behavior following the move to the community but acknowledged that the change was modest (p. 271). The fact that challenging behavior scores deteriorated for about a quarter to a third of the sample serves to underline the marginal nature of change. Any claim that deinstitutionalization significantly alters the difficulties that challenging behavior presents to the delivery of high quality service support would greatly exceed what has been demonstrated.
In relation to community integration, one might accept that people had greater community presence (e.g., more did at least some activities in the community) after moving to the community than while living in an institution. This result is, perhaps, hardly surprising because it is a familiar finding in the deinstitutionalization literature (see Emerson & Hatton, 1996; Felce, 2000). Indeed, results of postdeinstitutionalization research on alternative community settings also generally support the proposition that participation in community activities is more likely if housing arrangements are more normative. After controlling for the effects of participant characteristics, researchers have found increased participation in community-based activities in (a) community-based residences when compared to campus or cluster housing (Emerson, 2004), (b) supported living or semi-independent living arrangements when compared to group homes (Emerson et al., 2001; Howe, Horner, & Newton, 1998; Stancliffe & Keane, 2000), (c) settings with a less institutional milieu (Felce, Lowe, & Jones, 2002), and (d) more homelike settings (Egli, Feurer, Roper, & Thompson, 2002). However, how socially significant this outcome is remains open to question (see Cummins & Lau, 2003).
It is difficult to reach even a tentative conclusion in relation to family contact for the reasons already mentioned. In the United Kingdom deinstitutionalization literature (Felce, 2000), increased family contact has been found following a move to the community in some studies but not in others. Distance between the individuals and their family has been implicated as an obstacle to contact, and research has shown that, among other factors, living near family is associated with increased contact (Felce, 2000). However, Conroy et al. (2003) did not establish that the distance between the individuals studied and their families decreased. Nor is it an inevitable result of deinstitutionalization that this will be the case; it will depend on the criteria by which new homes are found for these people.
In relation to receipt of services, the conclusion of Conroy et al. (2003) appears entirely overstated. Most people's receipt of services was unaltered, and the number of services that increased was approximately equal to the number of services that decreased. Other changes concerning health care were acknowledged by Conroy et al. as small and, although medication practices appeared to change over the 5 years, it is difficult to attribute these necessarily to the change in residential accommodation because there is no control for the passage of time. The documented changes might well have arisen from the recent general emphasis on reducing the use of antipsychotic medication.
Overall, it would seem that two reactions to the critique of Conroy et al. (2003) up to this point are possible. The first might be termed the purist line, in which uncertainty over the integrity of the data and demonstrated problems in analytic rigor lead readers to discount the study altogether until a definitive analysis is produced. The second, the pragmatic line, recognizes a lack of confidence in the precise findings but considers that, having been subject to scrutiny and reanalysis, some of the original conclusions can be defended, albeit in considerably more modest terms. It may well be that the people studied experienced greater opportunities to exercise skills following their move and that this is reflected in a small increase in adaptive behavior scores. Equally, it is plausible that more undertook some activities in the community after their move than before. Beyond that, uncertainty grows. In neither case would accepting these conclusions alter the corpus of findings on the impact of deinstitutionalization.
Precision of the Research Question
Up to this point, I have been careful to discuss the findings presented by Conroy et al. (2003) in relation to the impact of the movement of the individuals studied from an institution to supported living in the community because this was the actual focus of their study. As Walsh and Kastner (2006) explained, despite the title “The Hissom Closure Outcomes Study,” this was not a study of the impact of deinstitutionalization on Hissom Focus Class Members but, rather, a selective evaluation of a defined move for just under half of their number. I do not criticize Conroy et al. for selecting a subgroup for study. Indeed, I think it entirely legitimate to evaluate the movement of a group of people from one defined setting type (institution) to another (supported living) and to select those individuals who experienced this change of environment. Moreover, I would support their decision to be selective. Although the outcome for the entire class is an important evaluative concern for those responsible for local service provision, the question is of limited interest for the readership of an international journal because it is too particular. A comparison between two defined types of environment is more generalizable.
My criticism of Conroy et al. (2003) in this respect concerns failures to (a) explain the nature of the sample, (b) establish a clear research purpose, and (c) interpret the data consistently in line with that purpose. In the final paragraph of the introduction (p. 265), the authors informed readers that there were 520 Focus Class Members. Two sentences later they stated that “our purpose … is to describe various outcome indicators for persons who lived in Hissom in 1990 and in small supported arrangements in the community approximately 5 years later.” The first section of the Method (p. 265) is then entitled “Characteristics of the Focus Class Members” and, subsequently, subject descriptors are given for 254 individuals. No explanation is given as to why 520 had become 254. A simple statement of sample attrition and the different deinstitutionalization end points for Focus Class Members would have positioned the evaluation accurately. According to Conroy (1996), full data were collected for 382 of the 520 Focus Class Members, many of whom did not move to supported living but to private ICFs/MR, larger group homes, back to their family, or to other public institutions. Hence, it is clear that the sample studied in the published article was a selected subset.
If one assumes that the 254 individuals included were all of the Focus Class Members who moved to supported living for whom full data were available, the statement of research purpose quoted above is accurate. However, any precision in this respect is repeatedly undermined by reference to the Focus Class and the closure of Hissom. There are numerous imprecise statements in the concluding few paragraphs of the 2003 article. Conroy et al. summed up their study by framing it incorrectly as a “study of the closure of Hissom Memorial Center” (p. 272). They went on to state that “one irrefutable outcome of the closure of Hissom Memorial Center is the fact that the closure was accomplished entirely by movement of persons to community-based living situations” (p. 272) and that
The closure of Hissom was unique not only because of its complete reliance on community living arrangements, but also because of the size of the living arrangements that were selected. As noted above, practically all of the Focus Class Members live in homes by themselves or with just one or two roommates. (pp. 272–273)
Both of these latter statements appear to be false, contradicted by the additional detail provided in Conroy (1996), in which he clearly stated that some individuals leaving Hissom but not included in this study did move to facilities other than supported living arrangements. These facilities included private ICFs/MR, larger group homes, or other public institutions. It is very likely that these would include settings in which more than 3 people lived together.
The statements above illustrate imprecision that misrepresents what has been achieved by the particular deinstitutionalization initiative, at least if the Conroy (1996) report is accurate. The significance of the claim that deinstitutionalization was comprehensively achieved via the provision of small group living presumably lies in the fact that such an achievement would be atypical of deinstitutionalization experience generally. Although much progress towards replacing outmoded institutions has been made in many high-income countries, evidence is still weak that community-based services have been provided that comprehensively offer services to all those requiring support. Alongside the development of normative supported accommodation in the community, other new settings may have institutional features in a new form. For example, in the United Kingdom, the early movement to bring children out of institutional care was accompanied by an expansion of residential schools. Not all adults who left institutional care in the United Kingdom now live in a house within a residential area to which they have some personal tie. Some people continue to be excluded from typical community living, often those whose behavior or needs for support challenge inadequate quality provision. I am, of course, much less familiar with experience in the United States, but I would expect that a somewhat similar situation might exist there too. The United States still has a broad range of residential provision. According to Braddock et al. (2005), 21% of the 492,385 persons in out-of-home placements in 2004 lived in settings for 16 or more persons, 11% lived in group homes or ICFs/MR for 7 to 15 persons, and 68% lived in settings for 6 or fewer persons, approximately evenly divided between supported living and a number of other arrangements, including ICFs/MR, group homes, and foster or host homes. Every state had some individuals living in larger settings; in only 7 states did 90% or more of those placed out of home live in settings for 6 or fewer persons.
If Conroy et al. (2003) have overstated the comprehensiveness by which supported living met the needs of Hissom Focus Class Members, I believe that they have done no service to those people who remain or tend to be excluded for one reason or another from a fulfilling life alongside their fellow citizens. Accurate knowledge about the competencies of community programs to meet the needs of all persons who depend on them is needed to maintain adequate scrutiny and legitimate criticism of them. It is important not to overinflate the impact of recent reform. It is service users who bear the cost of overinflated conclusions: they live with the actual consequences of program arrangements, not the documented ones.
Towards a Postinstitutional Research Agenda
Although it is important to ensure that published evaluations can stand up to critical scrutiny, I also feel that the continued emphasis on the evaluation of deinstitutionalization, which is reflected in both Conroy et al. (2003) and Walsh and Kastner (2006), detracts from what I would argue is a more important current research agenda, namely, the investigation of the determinants of outcome within postinstitutional provision. I think that we now have sufficient evidence that deinstitutionalization has, on the whole and in some modest ways, improved the well-being of people with intellectual disabilities in out-of-home placements (see Emerson et al., in press). However, the continued advocacy of new paradigms by which to conceptualize the provision of support, and of particular working methods to support individuals better, testifies to the considerable room left for further improvement. The variability in outcome within postinstitutional community provision of ostensibly similar kind is great and, at the extremes, may overlap with even the low baselines set within institutional services (see Felce & Perry, in press). Understanding the determinants of quality of life outcome within community provision and ensuring that all programs provide for individuals equally well constitute a pressing current research agenda.
Unfortunately, the deinstitutionalization evaluation literature contributes little to our understanding of what program characteristics result in good outcomes for users (see Felce & Perry, in press, for elaboration of this position). It is essential that the research community become interested in variation in outcome and not just in the comparison of measures of central tendency. It is essential that they establish the integrity of the independent variables under scrutiny (i.e., provision models or support arrangements) by describing potentially important program characteristics, such as their size, groupings, architectural design, material quality, location, neighborhood, staffing characteristics, working methods, staff training, and staff performance. It is essential that they design well-controlled comparisons, conduct experimental studies, and use sophisticated analytic procedures to attempt to isolate the relationships between individual program characteristics and particular quality of life outcomes. Categorization of service types by label (e.g., ICF/MR, group home, supported living) without elaboration is inadequate. It assumes a homogeneity across a range of independent variables that is mistaken. There are sufficient studies illustrating the diversity among similarly categorized settings to now count such classifications as meaningless. Typifying settings by a few key adjectives is a commonplace practice (the absence of description of the two types of setting under investigation within the Conroy et al., 2003, article illustrates the point), but it is one that needs to end. Developing a greater understanding of which factors within which types of provision arrangements promote which quality of life outcomes is essential if people with intellectual disabilities are to have the opportunity to lead more fulfilling and desirable lives than they do currently.
Author: David Felce, PhD, Professor, Welsh Centre for Learning Disabilities, Centre for Health Sciences Research, Cardiff University, Neuadd Meirionnydd, Heath Park, Cardiff CF14 4YS. firstname.lastname@example.org