NHLBI funded seven projects as part of the Disparities Elimination through Coordinated Interventions to Prevent and Control Heart and Lung Disease Risk (DECIPHeR) Initiative. The projects were expected to collaborate with community partners to (1) employ validated theoretical or conceptual implementation research frameworks, (2) include implementation research study designs, (3) include implementation measures as primary outcomes, and (4) inform our understanding of mediators and mechanisms of action of the implementation strategy. Several projects focused on late-stage implementation strategies that optimally and sustainably delivered two or more evidence-based multilevel interventions to reduce or eliminate cardiovascular and/or pulmonary health disparities and to improve population health in high-burden communities. Projects that were successful in the three-year planning phase transitioned to a four-year execution phase. NHLBI formed a Technical Assistance Workgroup during the planning phase to help awardees refine study aims, strengthen research designs, detail analytic plans, and use valid sample size methods. This paper highlights methodological and study design challenges encountered during this process. Important lessons learned included (1) the need for greater emphasis on implementation outcomes, (2) the need to clearly distinguish between intervention and implementation strategies in the protocol, (3) the need to address clustering due to randomization of groups or clusters, (4) the need to address the cross-classification that results when intervention agents work across multiple units of randomization in the same arm, (5) the need to accommodate time-varying intervention effects in stepped-wedge designs, and (6) the need for data-based estimates of the parameters required for sample size estimation.
Introduction
Key indicators of cardiovascular and pulmonary health outcomes continue to vary markedly by race, ethnicity, sex and/or gender, geographic location, and socioeconomic status (SES).1,2 These factors account for a substantial proportion of preventable death and disability (e.g., ref. 3).
To address this issue, the National Heart, Lung, and Blood Institute (NHLBI) launched the Disparities Elimination through Coordinated Interventions to Prevent and Control Heart and Lung Disease Risk (DECIPHeR) initiative in September 2019, soliciting applications for UG3 awards for clinical centers (Centers)4 and for a U24 award for a research coordinating center (RCC).5 NHLBI funded seven centers and one RCC in September 2020.6 The centers were funded using three-year UG3 planning awards; following an administrative review, successful centers transitioned to a four-year UH3 implementation award in September 2023. The RCC was funded for seven years to support the clinical centers.
The DECIPHeR projects were expected to collaborate with community partners to employ validated theoretical or conceptual implementation research frameworks, include implementation research study designs, include implementation measures as primary outcomes, and inform our understanding of mediators and mechanisms of action of the implementation strategy. Several focused on late-stage implementation strategies that optimally and sustainably delivered two or more proven-effective, evidence-based, multilevel interventions to reduce or eliminate cardiovascular and/or pulmonary health disparities, and that promote and improve population health in high-burden communities.
During the UG3 phase, the NHLBI recognized that there were opportunities to provide the clinical centers with guidance on a variety of complex design and analytic issues facing these implementation studies. NHLBI created a Technical Assistance (TA) Workgroup composed of methodologists from the NIH Office of Disease Prevention and the NHLBI Office of Biostatistics Research who had extensive experience with those issues, the methodologists from the RCC and the clinical centers, and staff from the NHLBI Center for Translation Research and Implementation Science. The TA Workgroup was charged with assisting the centers and the RCC with refining study aims, research designs, analytic plans, and appropriate sample size determinations.
The TA Workgroup was established in February 2022 and met with each center’s team six to seven times through October 2022. For each meeting, a center submitted requested sections of the study protocol. The TA Workgroup used a cooperative group model to consider design, analytic, and sample size issues that were identified for each center. Following a robust discussion, the Workgroup chair provided written comments and centers edited the previously submitted sections and prepared additional sections as directed by the TA Workgroup. Through this process each center completed protocol sections that described the design and analysis plans for the UH3 research. These sections fed into the full draft of the protocol. The TA Workgroup reviewed the protocols and submitted feedback in December 2022 and March 2023. The centers submitted final protocols to NHLBI for administrative review in April 2023.
This paper reviews the methodological and study design challenges identified and the important lessons learned during the TA Workgroup’s activities during the UG3 planning phase. They included (1) the need for greater emphasis on implementation outcomes, (2) the need to clearly distinguish between intervention and implementation strategies, (3) the need to address clustering due to randomization of groups or clusters, (4) the need to address the cross-classification that results when intervention agents work across multiple units of randomization in the same study arm, (5) the need to accommodate time-varying intervention effects in stepped-wedge designs, and (6) the need for data-based estimates of the parameters required for sample size estimation. We discuss each of these lessons learned in detail below.
Lessons Learned
(1) The need for greater emphasis on implementation outcomes
Effectiveness-Implementation hybrid designs study both health outcomes and implementation outcomes and were introduced by Curran et al.7 These authors defined a Hybrid Type 1 design as “testing a clinical intervention while gathering information on its delivery during the effectiveness trial and/or on its potential for implementation in a real-world situation” (p. 4). They defined a Hybrid Type 2 design as “simultaneous testing of a clinical intervention and an implementation intervention/strategy” (p. 5). They defined a Hybrid Type 3 design as “testing an implementation intervention/strategy while observing/gathering information on the clinical intervention and related outcomes” (p. 6).
These hybrid designs are not complete research designs and instead characterize the study’s emphasis on health outcomes and implementation outcomes. Recognizing that, Curran et al. recently dropped “design” in favor of “study” and discussed the research designs commonly used with each study type.8 They stressed the assessment of the performance of an implementation strategy or strategies on an intervention’s reach, adoption, fidelity, and/or other implementation outcomes.
The original DECIPHeR applications proposed a variety of designs: five of the seven described their studies as Hybrid Type 2 studies, one as a Hybrid Type 3, and the seventh did not specify a hybrid type. Close examination indicated that most of the studies were Hybrid Type 1 studies, placing heavy emphasis on examining health outcomes. All centers identified a single or a cluster of health outcomes and in the applications, these health outcomes were the focus of the aims, statistical models, and power calculations. Implementation aims were often described in only general terms and experimental arms and outcomes were not clearly defined.
Once the focus on health outcomes was fully recognized, NHLBI reminded the centers that the projects should focus on implementation outcomes and Hybrid Type 3 designs as specified in the Funding Opportunity Announcement. This led several centers to change their research design. For example, Tulane began with a Hybrid Type 1 study, comparing a community-health worker led multifaceted intervention against an enhanced usual care arm that did not receive the intervention. At the end of the TA consultation process, Tulane was able to characterize their study as a Hybrid Type 3 study, with the same intervention delivered in two arms but using different implementation strategies and with an implementation outcome as the primary outcome. As another example, UCLA began with a stepped-wedge group-randomized trial (SWGRT) with three sequences, five periods, and a health outcome as the primary outcome. At the end of the TA consultation process, UCLA was able to characterize their study as a Hybrid Type 3 parallel group-randomized trial (GRT) design. In year 1, three arms receive the intervention using different implementation strategies; in year 2, sites cross over to a different implementation strategy than originally assigned so that the investigators can evaluate the relative merits of various implementation sequences on implementation outcomes.
In the final protocols, six of the DECIPHeR centers planned to do Hybrid Type 3 studies, with one center conducting a Hybrid Type 2 study. All centers had well-specified implementation outcomes. Most centers planned a primary analysis that compared the effects of two sets of implementation strategies on a single, primary implementation outcome. These were 2-arm studies with an estimand that was relevant to implementation. A careful modeling plan and power calculations were completed for the primary analysis and mock tables in the format suggested by the CONSORT guidelines were prepared for the display of future results.
(2) The need to clearly distinguish between intervention and implementation activities in the protocol
With the increased focus on implementation outcomes, it became clear that the centers needed to carefully distinguish between clinical intervention activities and implementation activities as well as differentiate between standard implementation activities and enhanced implementation activities.9 In addition, the centers decided to include the Standards for Reporting Implementation Studies (StaRI) guidelines10,11 in descriptions of their studies. The StaRI guidelines prompt researchers to clearly distinguish between intervention strategies designed to impact effectiveness and implementation strategies designed to promote the implementation. Use of the StaRI guidelines was expected to increase the transparency and accuracy in the reporting of implementation studies similar to the role of the CONSORT guidelines for reporting randomized clinical trials (RCTs).12
Experts in intervention and implementation strategies from the RCC joined the TA Workgroup once this issue was identified and played a critical role in this discussion. The components of the clinical intervention had to be clearly delineated, as well as the personnel who would deliver those components; those components were expected to affect the health outcomes for the study. The implementation strategies also had to be clearly described, distinguishing between the standard and enhanced implementation strategies;13,14 all of those components were expected to affect the implementation outcomes of the study, though they could also affect the health outcomes.
As an example, Tulane began the project with a single intervention delivered using a single implementation strategy. As such, it was difficult to distinguish the effects of the intervention from the effects of the implementation strategy. Moreover, the distinctions between which activities were part of the intervention and which activities were part of the implementation strategy were not always clear. At the end of the TA process, those activities were distinct, and in a Hybrid Type 3 design, the same intervention was delivered in both arms using two different implementation strategies.
Many of the centers used multilevel interventions as their evidence-based intervention and struggled with distinguishing the differences between the intervention components designed to impact system-level determinants of their health outcome from implementation strategies focusing on system-level approaches to increase uptake of the evidence-based intervention. As an example, Northwestern used the Kaiser Bundle as their evidence-based intervention. The Kaiser Bundle includes many components that focus on change at the system level, for example clinic-wide adoption of a comprehensive hypertension registry and the development and sharing of performance metrics between clinicians.15 Distinguishing between these system level intervention activities and activities occurring within the clinics to encourage their adoption of the Kaiser Bundle required thoughtful discussions within the TA Workgroup. By the end of the TA process, Northwestern had clearly articulated a plan for a Hybrid Type 3 study where they will compare the reach of a community-adapted Kaiser Bundle in adults with hypertension in community health clinics randomized to receive the Kaiser Bundle with non-supported implementation activities as compared with clinics receiving Practice Facilitation (the implementation strategy) to support implementation over a two-year period.
(3) The need to address clustering due to randomization of groups or clusters or delivery of interventions to groups or clusters
As the TA Workgroup assessed the details of the proposed designs, it became clear that most of the DECIPHeR studies had planned a parallel GRT. A parallel GRT randomizes groups or clusters rather than individuals to study arms and outcomes are measured in participants from each group.16–22 In the parallel GRT, there is no crossover of groups or clusters to a different study arm during the trial. The parallel GRT is the best comparative design available when there is a good reason for randomization of groups rather than individuals.16,21 The usual reasons for selecting this design are concern for contamination across arms or use of a group-level intervention.
The key feature of the parallel GRT is the randomization of groups to study arms. Outcomes on participants from the same group or cluster are expected to be positively correlated as a result of common exposures, shared experience, or participant interaction.23 This correlation violates the assumption of independence of errors that underlies the familiar analytic methods for RCTs.16–22 This correlation is often measured by the intraclass correlation (ICC), which characterizes the association between outcomes measured on individuals from the same cluster. Five of the seven DECIPHeR centers settled on a parallel GRT design for their Hybrid Type 2 or 3 studies.
One center settled on a SWGRT for its Type 3 Hybrid study. The key feature of the SWGRT is the crossover of each group or cluster from the control arm to the intervention arm, usually in a random order and on a staggered schedule.24 The observations from participants from the same group or cluster will be positively correlated as in a parallel GRT; however, the impact of the ICC is reduced in the SWGRT because the groups or clusters are crossed with study arms rather than being nested within study arms. Unlike the parallel GRT, the time-varying intervention status is confounded with calendar time since early study times are control periods while later study times are intervention periods.24,25 Like other repeated-measures designs, the effect of the intervention may vary depending on how much time has passed since the intervention was introduced25,26 and the pattern of correlation over time can be complex.27–29
One center used an individually randomized group treatment study design (IRGT). An IRGT differs from the usual RCT in that the method by which the intervention is delivered creates a level of ICC that otherwise would not exist.30 Correlated observations can result if participants receive at least some of their intervention in a group format (e.g., attend the same weight loss class), if participants share the same intervention agent (e.g., have the same instructor, therapist, or surgeon), or if participants interact with one another in some other way that is related to the method in which the intervention is delivered (e.g., through a virtual chat room created for participants in the same study arm). The IRGT is the best comparative design available if randomization of individuals is possible but it is necessary to deliver at least some of the intervention in a group format or through a shared intervention agent.21 The key feature of the IRGT is that the method by which the intervention is delivered generates some level of correlation among outcomes taken on groups of participants within the same study arm, creating the same type of ICC seen in GRTs. As in a GRT, investigators adopting an IRGT design must account for the ICC in the sample size calculation to avoid an invalid power calculation and in the analysis to avoid an inflated type I error rate.30–35 If the method of intervention delivery creates multiple overlapping groups or if the group structure changes over time, the situation is even more complicated, further increasing the risk of invalid inference, such as an inflated type I error rate, if the investigators do not account for these features of the study design.36–39
The TA sessions helped the centers understand the implications of their plans for randomization. That was particularly true for sample size methods, where the TA Workgroup recommended sample size tools available on the Research Methods Resources website and other specialized tools for sample size calculations for clustered designs.40 For example, Northwestern began with a non-randomized design comparing one intervention area (South Chicago) to one comparison area (West Chicago). This was an example of a one-group-per-arm design that confounds the effect of the intervention with the effect of the community; there is no valid analysis for that design.41 At the end of the TA consultation, Northwestern had decided to employ a two-arm GRT comparing the same intervention delivered using two different implementation strategies with a valid analytic plan and used the Research Methods Resources sample size calculator to estimate the required sample size.
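To illustrate why clustering matters for sample size, the following minimal sketch applies the familiar design effect, 1 + (m - 1) × ICC, for equal-sized groups of m participants. It is not the method used by the Research Methods Resources calculator or by any DECIPHeR center; the function names and all numeric inputs are hypothetical, and the sketch ignores the degrees-of-freedom correction that matters when the number of groups per arm is small.

```python
import math


def design_effect(m: float, icc: float) -> float:
    """Variance inflation for equal-sized groups of m members: 1 + (m - 1) * ICC."""
    return 1.0 + (m - 1.0) * icc


def clusters_per_arm(n_individual: float, m: int, icc: float) -> int:
    """Groups per arm needed to match an individually randomized trial
    that would require n_individual participants per arm (hypothetical helper)."""
    inflated_n = n_individual * design_effect(m, icc)
    return math.ceil(inflated_n / m)


# Illustrative values only (not taken from any DECIPHeR protocol).
deff = design_effect(m=50, icc=0.02)
groups = clusters_per_arm(n_individual=200, m=50, icc=0.02)
print(f"design effect = {deff:.2f}, groups per arm = {groups}")
```

Even a small ICC roughly doubles the required sample size in this illustration, which is why an analysis or power calculation that ignores the ICC can be badly misleading.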
(4) The need to address the cross-classification that results when intervention agents work across multiple units of randomization in the same arm
Interactions between participants and intervention agents such as patient navigators or care delivery staff are common in behavioral interventions and were expected in the DECIPHeR studies. These interactions pose no problem for design, analysis, and power so long as the agents work within a single unit of randomization. However, when the same agents interact with participants in more than one unit of randomization, even if in the same arm, the observations taken on those participants may become correlated, threatening the independence of the units of randomization. In those situations, it is important to document the interactions among intervention agents and participants so that such cross-classification can be reflected in the analysis.42–44 This issue surfaced for several of the DECIPHeR studies.
For example, Colorado randomized nurses to study arms with each nurse working with a different school or small group of schools. Colorado also employed asthma navigators; each navigator worked with several nurses within the same study arm. As such, participants were cross-classified by nurses and navigators and it was important to reflect both sources of clustering in the design, sample size, and analytic plans. That was accomplished by including separate random effects for nurses and navigators in the analytic model and by using the most recent Kenward-Roger degrees of freedom for testing effects.45
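As a sketch of what separate random effects for two crossed sources of clustering can look like, one common form of a cross-classified mixed model is shown below. It assumes a continuous outcome measured once per participant; the notation is ours for illustration and is not taken from the Colorado protocol.

```latex
% Y_{ijk}: outcome for participant i served by nurse j and navigator k
% A_{j}:   study-arm indicator for nurse j (the unit of randomization)
Y_{ijk} = \beta_0 + \beta_1 A_{j} + u_j + v_k + e_{ijk},
\qquad
u_j \sim N(0, \sigma^2_u), \quad
v_k \sim N(0, \sigma^2_v), \quad
e_{ijk} \sim N(0, \sigma^2_e)
```

Here $u_j$ and $v_k$ are the nurse and navigator random effects, so that $\sigma^2_u$ and $\sigma^2_v$ each contribute to the correlation among participants, and $\beta_1$ is the effect of interest.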
(5) The need to accommodate time-varying intervention effects in stepped-wedge designs
An intervention effect may change over time for many reasons, including a learning curve (effects get stronger over time) or intervention fatigue (effects get weaker over time). Standard analytic methods for SWGRTs do not explicitly acknowledge the possibility of time-varying treatment effects and often assume that the impact of intervention is immediate and sustained. When effects do change with time, special analytic methods are required to protect the type I error rate and the validity of inference regarding the effect estimate;46,47 standard methods may yield surprisingly biased estimates of the average treatment effect and generate an invalid type I error rate.
This issue was identified in the literature only in July 2022 and so was unknown to the investigators and to the TA Workgroup until that time. When the paper by Kenny et al.46 was published, only the NYU study was planning to use a SWGRT design and there was time to adapt the design, analytic, and sample size methods, allowing NYU to propose methods appropriate for a SWGRT in which the intervention effect was expected to change over time. The issue was so new that NYU had to develop novel methods for sample size for their study.48
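The difference between an immediate, sustained effect and a time-varying effect can be seen in how the intervention is coded. The sketch below is illustrative only and is not the NYU analysis: it assumes a standard stepped wedge in which sequence s (counting from 0) crosses over at period s + 1, and the function name is hypothetical. A single 0/1 treatment indicator treats every intervention period alike, whereas coding exposure time (periods since crossover) lets the effect differ by time on intervention.

```python
def stepped_wedge_exposure(n_sequences: int, n_periods: int) -> list[list[int]]:
    """Exposure time for each (sequence, period) cell of a standard stepped wedge.

    Assumes sequence s (0-based) crosses over at period s + 1, so exposure is
    0 in control periods and 1, 2, ... in successive intervention periods.
    """
    return [
        [max(0, period - seq) for period in range(n_periods)]
        for seq in range(n_sequences)
    ]


# Hypothetical design: 3 sequences observed over 4 periods.
exposure = stepped_wedge_exposure(n_sequences=3, n_periods=4)

# Immediate-and-sustained coding collapses exposure time to a 0/1 indicator.
treated = [[int(e > 0) for e in row] for row in exposure]

for seq, (t_row, e_row) in enumerate(zip(treated, exposure)):
    print(f"sequence {seq}: treated={t_row} exposure={e_row}")
```

Note that in this layout the longest exposure times are observed only in the earliest sequences, which is one reason a misspecified effect curve can distort the estimated average effect.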
(6) The need for data-based estimates of the parameters required for sample size estimation
All of the major textbooks on the design and analysis of group- or cluster-randomized trials recommend data-based estimates for the parameters required for sample size estimation in those trials.16–22 The CONSORT Statements for GRTs, IRGTs, and SWGRTs call for investigators to report the key parameter, the ICC.49–51 Such estimates have become increasingly available, but usually for health-related outcomes. The DECIPHeR studies used implementation measures as their primary outcome, such as reach, fidelity, and adoption.52 Parameter estimates for those variables were often not available and the DECIPHeR centers had to estimate them from data from other sources.
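As one example of how an ICC can be estimated from existing data, the sketch below implements the standard one-way ANOVA estimator, which allows for unequal group sizes. It is a minimal illustration rather than the procedure any DECIPHeR center used; the function name and the toy data are hypothetical.

```python
def anova_icc(groups: list[list[float]]) -> float:
    """One-way ANOVA estimator of the ICC from per-group lists of outcomes."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group mean squares.
    ss_between = sum(n * (mu - grand_mean) ** 2 for n, mu in zip(sizes, means))
    ss_within = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average group size; equals the common size when groups are balanced.
    m0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (m0 - 1) * ms_within)


# Toy data: three hypothetical clinics with three participants each.
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
icc_hat = anova_icc(toy)
print(f"estimated ICC = {icc_hat:.3f}")
```

Estimates from small pilot samples can be imprecise and can even be negative, so it is prudent to examine a range of plausible ICC values in the power calculation.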
Summary
The DECIPHeR Initiative includes seven research projects supported by biphasic, milestone-driven UG3/UH3 activity codes and a U24-supported research coordinating center. The goal of the research projects is to develop and test strategies for eliminating health disparities in heart and lung diseases and related risk factors. All seven research projects moved into the UH3 phase with validated implementation research frameworks, implementation research study designs, and implementation measures as primary research outcomes. To maximize overall rigor and reproducibility, a TA Workgroup was established to support the research projects during the UG3 planning phase. This group partnered with the study investigators in a cooperative group process of review and revision of the UH3 study protocols that lasted over a year. Several methodological and study design challenges encountered during the planning phase were addressed. Six important lessons learned during the planning phase have been highlighted in this article. We hope these methodological and study design challenges and related lessons learned provide valuable insights for avoiding common pitfalls and improving the design of future implementation studies.
Acknowledgments
The authors wish to acknowledge the participation of Drs. Paul Cotton, Michelle Freemer, Jennifer Curry, Paula Einhorn, and Nishadi Rajapakse in several of the TA Workgroup meetings. Dr. Cotton is Director of the Office of Extramural Research Activities at NIMHD. Dr. Freemer is a staff member in the Airway Biology and Disease Branch, NHLBI. Dr. Curry is a staff member in the Center for Translation Research and Implementation Science, NHLBI. Dr. Einhorn is a staff member in the Division of Cardiovascular Sciences, NHLBI. Dr. Rajapakse is a staff member in the Division of Diabetes, Endocrinology, & Metabolic Diseases, NIDDK.
This research was supported in part by the Intramural Research Program of the NIH and NHLBI. The opinions expressed in this article are the authors' own and do not reflect the views of the National Heart, Lung, and Blood Institute, the National Institutes of Health, the Department of Health and Human Services, or the United States government.