Despite advances in our understanding of the disease, significant therapeutic gaps remain for pulmonary arterial hypertension (PAH). Indeed, no cure exists yet for this devastating disease, and very few innovative therapies beyond the traditional pathways of endothelial dysfunction have reached late clinical trial phases in PAH. While there are inherent limitations to the currently available animal models of PAH, the delayed translation of innovative therapies to the clinic may also relate to flawed preclinical research methodologies. The present article discusses the limitations and flaws in the design of preclinical pulmonary hypertension (PH) trials and outlines opportunities to create preclinical studies with improved predictive value in identifying key mechanisms involved in PAH development and progression and in guiding early phase drug development in PAH patients. The implementation of rigorous study design will need support not only from researchers, peer reviewers, and editors, but also from academic institutions, funding agencies, and animal ethics authorities.
The number of scientific publications related to pulmonary arterial hypertension (PAH) has increased exponentially over the last decades, leading to significant advances in our understanding of its pathophysiology1 and management,2 allowing the delay of clinical worsening,3 and likely improving survival.4 However, long-term prognoses of PAH patients can be further improved.5–7 Significant translational and therapeutic gaps between preclinical research and improved patient outcomes thus persist, as very few innovative therapies have reached late clinical trial phases in PAH.8–11 This translational gap is not unique to PAH. Since the beginning of the millennium, the number of new drugs approved yearly by health authorities has declined12 despite marked increases in total research and development expenditures.13 Clinical drug development is notoriously arduous, with fewer than 5% of high impact basic science discoveries14 and fewer than 10% of Phase 1 development paths12 ultimately being approved by health authorities. Several reasons may explain this phenomenon, including higher regulatory efficacy hurdles and increased complexity and cost of clinical trials.15 There are also inherent limitations to the currently available in vitro and animal models, which imperfectly mimic the full spectrum of the human disease.16 Moreover, it has been proposed that these failures may also stem from flaws in study design, implementation, and analysis, which ultimately weaken confidence in the ability of preclinical studies to identify promising therapeutic targets.17 Bias in study design, analytical methods, and reporting practices may indeed compromise scientific validity18 and data reproducibility,19 and ultimately jeopardize translation to human studies.
Given the limited financial resources, the persistent medical need for improved therapy in PAH, and the restricted study population available for clinical trials, there is a need for reducing the number of false positive signals in preclinical studies and for optimizing the development of innovative therapeutic targets through performance of clinical trials based on more robust experimental data.
In research, bias occurs when “a systematic error is introduced into sampling or testing by selecting or encouraging one outcome or answer over others.”20 Bias can cause estimates of association to be either larger or smaller than the true association; in some cases, bias can even cause a perceived association that is directly opposite of the true association. Importantly, bias is relatively independent of both study power and statistical significance in contrast to imprecision, which relates to a random error. Thus, studies may produce precise but biased results because of flaws in study design and execution. Conversely, a study may be free of significant bias but yield an incorrect effect estimate due to low statistical power. While some degree of bias is nearly always present in a study, researchers should make every effort to identify, quantify, and/or eliminate bias through proper study design and data analysis, and to acknowledge its occurrence when unavoidable.
IDENTIFYING AND LIMITING BIAS IN IN-VITRO PAH PRECLINICAL STUDIES
Access to high-quality human samples from PAH patients and appropriate controls represents an invaluable resource for improving our understanding of PAH and validating emerging hypotheses. However, human PAH tissues and cells have been most commonly obtained at the time of lung transplantation from patients with long-standing disease that may no longer be representative of mechanisms accounting for PAH development or progression, introducing a significant selection bias (Figure 1). More importantly, most experimental studies using human samples are performed with small sample sizes due to the scarcity of specimens. Given that PAH is a heterogeneous disease in terms of background genetic defects, concomitant diseases predisposing to PAH, as well as genetic variations influencing response to therapy,11, 21 limited sample size may easily amplify the effects of selection bias and lead to erroneous conclusions (Figure 2A). Collaborative studies allowing the exploration of promising targets and the real interindividual heterogeneity in a larger number of samples are therefore essential (Figure 2B).22 The creation of structured PAH networks (eg, the Pulmonary Hypertension Breakthrough Initiative or the International Consortium for Genetic Studies in PAH) and biobank facilities dedicated to harvesting and preserving explanted lung tissues, facilitating access to human tissue, and ensuring homogeneity in tissue processing is thus warranted, since human tissues are currently underexploited in PAH experimental research.
The choice of the cells and tissues to which PAH samples are compared is also crucially important. Indeed, there are often systematic differences between the groups being compared, known as confounding, so much so that differences in signaling pathways or outcomes may result from these differences rather than actual pathobiological abnormalities. Minimizing these inherent differences is thus essential. Control samples should ideally be matched for age/sex and for their underlying disease. In many cases, PAH researchers have relied on lung tissue resected for cancer. However, special attention is required to obtain tissue sufficiently distant from the tumor, which may significantly influence the phenotype and genotype of neighboring cells.23 In all cases, equivalent tissue specimens should be collected from the same organ areas. This is particularly important within the lungs as distal versus more proximal pulmonary arteries may significantly differ phenotypically. In addition, the same handling and processing must be applied to all samples. Taken together, careful selection of control tissues that most likely represent healthy lungs/tissues is crucial.
IDENTIFYING AND LIMITING BIAS IN IN-VIVO PAH PRECLINICAL STUDIES
Despite the importance of scientific results obtained from animal models, most of these studies have been hampered by the fact that these models do not entirely encompass the typical features of human PAH. This may explain why animal models are frequently considered poor predictors of whether an experimental drug can become an effective treatment. Sometimes, though, the real reason is that confirmatory preclinical studies were not rigorously designed. Accordingly, the statistical and methodological rigor should be adapted, and in many ways, confirmatory studies should resemble clinical trials. Indeed, only the most rigorously conducted trials can completely exclude bias as an alternate explanation for the promising results observed following an intervention. Thoughtful subject eligibility criteria, sample size estimation, randomization and treatment allocation concealment, blinding, standardized outcome assessment, proper data handling, and transparent reporting methods have profoundly improved the validity of clinical trial results over the years.24 Such improvements are also essential in confirmatory preclinical research.
Matching Models to Human Manifestations of PAH
Recruiting a study population representative of future patients to be treated while minimizing confounding effects is the first step of an appropriately designed prospective study. Despite the limitations of current animal models previously discussed,16, 25 a detailed characterization and reporting of animal traits at baseline and appropriate controls using animal characteristics that are representative of the human disease should be promoted for a better standardization of the experimental design, enhanced reproducibility, and greater predictive ability. Currently, variations in disease induction and the potential for persistent and unrecognized confounders, including considerable inconsistencies in animals' ages and weights, how pulmonary hypertension (PH) is induced, and when the intervention is initiated and terminated,26 represent important sources of bias in PH preclinical research. Care must thus be taken to prespecify eligibility criteria before animals are enrolled. In confirmatory preclinical studies, it is also reasonable to randomize animals to novel therapies only once irreversible PH is expected to be fully established and after its presence has been confirmed (eg, by echocardiography). In addition, the rationale for choosing models should be stated,27, 28 and performing studies using more than one model and across different animal strains is encouraged. Ultimately, large animal models may share some common features of the human disease and are often the last step before translating novel drug candidates to clinical trials.29 While the need to include women is now a well-established requirement in clinical trials,30 analogous standards have not been equally enforced in preclinical stages of research. The vast majority of preclinical PH studies still use male rodents only,26, 31 and extrapolating experimental findings to both sexes when a single sex is studied could disadvantage women by biasing our understanding of disease processes toward male-predominant patterns.
This is especially problematic in PAH, where there is a significant female predominance in humans.32 The landscape of clinical trials in PAH also dramatically changed over the last decade, and future compounds will almost necessarily be tested on top of currently available therapies in clinical trials leading to drug approval.2, 33 Although new targets can be alternatives to the currently approved therapies in humans, the demonstration of additive or synergistic effects of novel therapeutic targets now appears desirable for confirmatory preclinical studies.
Randomization and Allocation Concealment
The starting point for an unbiased interventional study is the use of a mechanism that ensures that the same sorts of participants receive each intervention. Even an apparently homogeneous group of animals may have inherent differences when the intervention is introduced. Thus, processes need to be considered to allow proper balance between groups and, as for humans, random animal allocation generally minimizes bias and balances characteristics that may influence response to treatment if properly done in a large enough sample. Techniques used to implement the allocation sequence (ie, allocation concealment) are also essential to avoid selection bias being introduced by selecting animals based on the upcoming intervention assignment. There is indeed empirical evidence from preclinical research34, 35 that inadequate generation or inadequate concealment of the allocation sequence leads to exaggerated estimates of intervention effects. Therefore, researchers should ideally report measures of successful randomization and allocation concealment.
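To make the distinction between sequence generation and allocation concealment concrete, the following is a minimal Python sketch, not a validated randomization tool. It assumes two arms, permuted blocks of two, and hypothetical animal identifiers; the names `permuted_block_sequence` and `ConcealedAllocator` are illustrative only. The sequence is generated in advance, and each assignment is revealed only after a given animal has been enrolled.

```python
import random


def permuted_block_sequence(n_animals, arms=("vehicle", "treatment"), seed=2024):
    """Build an allocation sequence from shuffled blocks so arms stay balanced."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the sequence auditable
    sequence = []
    while len(sequence) < n_animals:
        block = list(arms)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_animals]


class ConcealedAllocator:
    """Holds the sequence and reveals one assignment per enrolled animal."""

    def __init__(self, sequence):
        self._sequence = list(sequence)
        self._assigned = {}  # animal id -> arm, filled in enrollment order

    def enroll(self, animal_id):
        """Reveal the next assignment only once the animal is enrolled."""
        if animal_id in self._assigned:
            raise ValueError(f"{animal_id} is already enrolled")
        if len(self._assigned) >= len(self._sequence):
            raise RuntimeError("allocation sequence exhausted")
        arm = self._sequence[len(self._assigned)]
        self._assigned[animal_id] = arm
        return arm

    def log(self):
        """Return a copy of the assignments for the study report."""
        return dict(self._assigned)


animals = [f"rat_{i:02d}" for i in range(1, 13)]  # hypothetical animal ids
allocator = ConcealedAllocator(permuted_block_sequence(len(animals)))
for animal in animals:
    allocator.enroll(animal)
print(allocator.log())
```

In practice, the sequence would be held by someone not involved in selecting or handling the animals, so that eligibility decisions cannot be influenced by the upcoming assignment.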
OTHER POTENTIAL BIAS IN PAH PRECLINICAL STUDIES
Blinding of Outcome Assessment
Blinding refers to the process by which the study personnel are kept unaware of intervention allocations. Lack of blinding in clinical trials is associated with exaggerated estimates of intervention effects,36 especially when the outcome of interest is subjective.37 Importantly, many apparently objective outcome measures in preclinical PAH studies remain subject to interpretation. Unconscious bias can thus creep into evaluation of unblinded experiments even when performed by scientists of high integrity. Although blinding the investigator administering the treatment/intervention may not be possible in all instances, blinded assessment of imaging, hemodynamics, and histological outcomes is almost universally possible through independent team members performing outcome ascertainment.
Study Readouts and Interstudy Standardization
Even with rigorous attention to study design, studies may not have translational validity if the endpoint specified is not valid or is not measured using robust techniques. Importantly, outcome measures should match the clinical realm using relevant measures (eg, comprehensive hemodynamics in in-vivo studies). Secondary readouts are generally used to provide supportive information (eg, to ensure hemodynamic endpoints are correlated with histological, anatomic, and biochemical findings postmortem) or exploratory, hypothesis-generating information. Obviously, the exploratory nature of some experiments makes sample size calculation impossible or meaningless. Conversely, the importance of prespecified sample size calculation, referred to as a power calculation, in confirmatory experiments cannot be overemphasized, although it is rarely performed in preclinical PH studies.26, 31 Using human samples or subjecting laboratory animals to experimentation is only justifiable if there is a realistic chance that the study will yield useful information. Importantly, an inadequately small sample will result in an inconclusive study, whereas an unnecessarily large sample will accrue excessive cost. Many researchers are thus tempted to perform interim analyses to subsequently increase the sample size as necessary. However, interim analyses enhance the risk of false positive results due to multiple analyses. Therefore, the primary endpoint of preclinical confirmatory PH studies and the effect size of the intervention for which the study is powered must be decided before the study begins, and both should be provided in the methods section of confirmatory experiments. Similarly, empirical work has confirmed marked heterogeneity in the methodologies used to assess study outcomes in preclinical PH studies,26 including pulmonary hemodynamics, markers of right ventricular function, and pulmonary remodeling.26 The majority of in-vivo studies also fail to appropriately monitor for toxicity.
Obviously, these elements cannot be unilaterally dictated but require a consensus process to take place, with experts in the field agreeing on best practice, as has been previously developed for preclinical in-vivo evaluation of pharmacologically active drugs in other fields.38–40
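The prespecified power calculation discussed in this section can be illustrated with a short Python sketch. It uses the standard normal approximation for a two-sided comparison of two group means, n per group = 2 × ((z for 1 − α/2 + z for power) / d)², where d is the standardized effect size (difference in means divided by the common standard deviation). The effect sizes shown are hypothetical, and the normal approximation slightly underestimates the sample size required by a t test, so this is a rough guide rather than a substitute for a statistician's calculation.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means.

    effect_size is the standardized difference (Cohen's d), which must be
    prespecified by the investigators; the values used below are hypothetical.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_power = standard_normal.inv_cdf(power)          # 1 - type II error
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.5, 1.0, 1.5):  # hypothetical standardized effect sizes
    print(f"d = {d}: {animals_per_group(d)} animals per group")
```

The steep rise in required sample size as the assumed effect shrinks shows why the effect size for which the study is powered needs to be stated before the study begins.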
Multiplicity, Interim Analyses, and P Value Adjustments
Because multiple readouts are necessary to fully evaluate pathophysiological pathways and the effects of interventions, multiple endpoints are frequently measured. However, conducting multiple tests of significance progressively increases the probability that a null hypothesis is rejected when the null hypothesis is actually true (ie, false positive result). Consideration must be given to controlling the risk of false positive conclusions, and adjustment for multiplicity will typically be necessary, especially for confirmatory studies.41–43 Similarly, interim analyses, frequently used to incorporate what is learned during the course of a study, increase the risk of falsely rejecting the null hypothesis. Researchers should thus avoid unplanned interim analyses, and preliminary results should be presented without formal statistical analyses unless nominal P values have been adjusted accordingly. Collaboration with a statistician at the design stage and throughout analyses is thus crucial, and the selected procedure must be prespecified in the statistical analysis plan before undertaking any analyses of the data.
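As an illustration of how quickly the false positive risk grows with the number of tests, and of one possible adjustment, the sketch below computes the familywise error rate for independent tests, 1 − (1 − α)^m, and applies the Holm step-down procedure. Both the independence assumption and the choice of the Holm method are assumptions made for this example; the P values are hypothetical, and the appropriate procedure for a given study should be selected with a statistician and prespecified.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def holm_adjust(p_values):
    """Return Holm step-down adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical nominal P values for four endpoints of one experiment
nominal = [0.01, 0.04, 0.03, 0.20]
print(f"FWER for 5 independent tests: {familywise_error_rate(5):.3f}")
print("Holm-adjusted:", [round(p, 3) for p in holm_adjust(nominal)])
```

With these hypothetical values, only the smallest nominal P value remains below .05 after adjustment, whereas three of the four would have been declared significant without it.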
Handling of Missing Data
Attrition and exclusions frequently occur in preclinical PH studies when animals die or are withdrawn from the experiments or assessment does not provide relevant data. The risk of bias from incomplete outcome data depends on several factors, including the amount and distribution of missing data across intervention arms and the reasons for missing outcome data. Researchers should consider using a flow diagram showing the number of animals in intervention and control groups at each experimental step from randomization to outcome assessment. A timeline of experimentation is also desirable to inform whether all animals within each experimental group were analyzed together. For confirmatory studies, an intention-to-treat analysis may be considered as potentially the least biased way to estimate intervention effects in randomized trials.44 However, true intention-to-treat analyses generally require imputation, which can also lead to serious biases unless conservative methods are used. Thus, where imputation is used, both the per protocol and the intention-to-treat analyses should be presented, and the methods and assumptions for imputing data should be defined a priori and appropriately described.
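The counts needed for such a flow diagram can be tallied directly from the experimental records. The sketch below is one possible way to do so in Python; the record layout, animal identifiers, outcome values, and reasons for missing data are all hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter

# Hypothetical records: outcome is None when no usable measurement exists
records = [
    {"id": "rat_01", "arm": "vehicle", "outcome": 52.0, "reason": None},
    {"id": "rat_02", "arm": "vehicle", "outcome": 58.5, "reason": None},
    {"id": "rat_03", "arm": "vehicle", "outcome": None, "reason": "died before assessment"},
    {"id": "rat_04", "arm": "vehicle", "outcome": 61.0, "reason": None},
    {"id": "rat_05", "arm": "treatment", "outcome": 41.0, "reason": None},
    {"id": "rat_06", "arm": "treatment", "outcome": None, "reason": "failed catheterization"},
    {"id": "rat_07", "arm": "treatment", "outcome": 38.5, "reason": None},
    {"id": "rat_08", "arm": "treatment", "outcome": 44.0, "reason": None},
]


def attrition_summary(rows):
    """Count randomized, analyzed, and missing animals per arm, with reasons."""
    summary = {}
    for row in rows:
        arm = summary.setdefault(
            row["arm"],
            {"randomized": 0, "analyzed": 0, "missing_reasons": Counter()},
        )
        arm["randomized"] += 1
        if row["outcome"] is None:
            arm["missing_reasons"][row["reason"]] += 1
        else:
            arm["analyzed"] += 1
    return summary


summary = attrition_summary(records)
for arm_name, counts in summary.items():
    missing = dict(counts["missing_reasons"])
    print(f"{arm_name}: randomized={counts['randomized']}, "
          f"analyzed={counts['analyzed']}, missing={missing}")
```

Reporting these counts per arm, together with the reasons, lets readers judge whether missing data are balanced across groups or could plausibly be related to the intervention.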
Interpretation of the Results
The high pressure to find low P values, combined with a common misunderstanding of how to correctly interpret P values, frequently distorts the interpretation of significant results.45 A low P value is considered strong evidence against the null hypothesis. However, a P value of .05 is frequently incorrectly interpreted as meaning that there is a 95% chance that the observed difference is true, rather than indicating that a difference at least as large as the one observed would occur 5% of the time if the null hypothesis were true. Previous studies estimated that a P value of .05 corresponds to a false positive rate of at least 23% (and typically close to 50%).46 Thus, a single statistically significant hypothesis test often provides insufficient evidence to confidently discard the null hypothesis, and study replication, especially by independent investigators, enhances the confidence that study results are true findings. Therefore, investing more time in replicating results (those of others as well as our own) and synthesizing data through systematic reviews and meta-analyses should be incentivized.47, 48
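The gap between a significance threshold and the probability that a significant finding is false can be illustrated with a simple calculation. The Python sketch below is not the method behind the estimates cited above; it uses the simpler framing in which the proportion of significant results that are false positives depends on the significance level, the statistical power, and the prior probability that the tested hypothesis is true. The prior probabilities shown are hypothetical.

```python
def false_positive_risk(prior_true, power=0.80, alpha=0.05):
    """Share of 'significant' results that are false positives.

    prior_true is the assumed fraction of tested hypotheses that are
    actually true; it is unknown in practice and hypothetical here.
    """
    if not 0 <= prior_true <= 1:
        raise ValueError("prior_true must lie between 0 and 1")
    false_positives = alpha * (1 - prior_true)  # true nulls wrongly rejected
    true_positives = power * prior_true         # real effects detected
    return false_positives / (false_positives + true_positives)


for prior in (0.5, 0.1):  # hypothetical prior probabilities
    risk = false_positive_risk(prior)
    print(f"prior = {prior}: false positive risk = {risk:.2f}")
```

Under these assumptions, when only 1 in 10 tested hypotheses is true, roughly a third of results significant at the .05 level are false positives even with 80% power, which supports the case for independent replication.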
Reporting and Publication Bias
Reporting biases arise when the dissemination of research findings is influenced by the nature and direction of results. While publication bias (occurring when entire studies are not published, are published in obscure journals, are rarely cited, or are inappropriately indexed in databases) is the most obvious form of reporting bias, within-study publication bias may be one of the most substantial biases affecting results from individual studies,49 with analyses showing statistically significant findings or large effect sizes being more likely to be reported than uninteresting or unwelcome findings. Reporting bias almost inevitably leads to major overstatements of efficacy,50 including in preclinical PH research.26 Intriguingly, selective submission by the authors rather than selective acceptance by the reviewers may predominantly contribute to publication and reporting bias.51, 52 Nonetheless, some journals indirectly contribute to this phenomenon by relegating less interesting findings to the supplement section. Publication and selective reporting biases also prevent others from learning about negative study results (which editors should be willing to publish, even in high-impact journals), with implications for animal ethics and research funding. To minimize publication and reporting bias, study preregistration was developed for clinical trials, limiting researchers' ability to modify the planned experimental design and analysis afterwards. As a result, the International Committee of Medical Journal Editors now considers for publication only those clinical trials that were registered before the start of patient recruitment.53 In preclinical studies, preregistration in a public repository at the study inception is a debated issue. Indeed, while a finding is more convincing when it was predicted, breakthrough findings have been made through exploration with limited a priori hypotheses.
ADAPTING STATISTICAL AND METHODOLOGICAL RIGOR TO THE PROGRAMMATIC PURPOSE OF RESEARCH
In exploratory research, investigators seek to provide a better understanding of the pathophysiological processes and to identify key cellular and molecular signaling pathways/targets involved in disease development. In confirmatory investigation, by contrast, detailed and reproducible information on efficacy, dosing, and toxicity of potential drug candidates is required to decide whether the drug could be tested in clinical trials. The statistical and methodological rigor should thus be adapted according to the nature of the study (Figure 3). Nonetheless, even at an exploratory stage, significant attention should be paid to identifying, avoiding, and acknowledging potential bias.
A CALL FOR CHANGES IN PRECLINICAL PH STUDIES
Scientific irreproducibility is a growing concern among academics and in the general population.55 Biases and poorly designed preclinical studies likely contribute to experimental irreproducibility, wasted resources, and erroneous conclusions.56 In response to these issues, the National Institutes of Health (NIH) proposed a set of guidelines and funding policies as minimum reporting requirements to promote rigor, reproducibility, and transparency of preclinical research; these have been endorsed by prominent academic societies and scientific journals, with an editorial commitment to compliance.57–59 These include guidelines and checklists to improve methodology and reporting.57, 58 Multicenter preclinical studies,47, 60 systematic reviews, and meta-analyses are also advocated.48, 61 Practical solutions to improve preclinical research quality and research translation have also been specifically proposed in PH preclinical studies.16, 54 Implementing such requirements will involve a paradigm shift for scientists, their institutions, journals, and funding agencies.
In preclinical research, methodological sources of potential bias and imprecision are prevalent and frequently overlooked by researchers, potentially contributing to the significant discordance between preclinical and clinical results. Although this problem is not unique to PAH, concerted efforts to address it are needed for more effective translation of preclinical research findings into sustainable improvements in patient outcomes. These efforts include rigorous study designs, methodological standardization, appropriate data interpretation, and statistical analysis plans, as well as transparent reporting of preclinical studies.
Disclosure: No funding was received to write this manuscript. TS and FP have no conflicts of interest to declare. SB received consultant fees from Actelion Pharmaceuticals and research grants from Janssen. SP has received research grants from Actelion Pharmaceuticals, AstraZeneca, and Resverlogix, and has received speaker fees from Actelion Pharmaceuticals. RP is a scientist of the Heart and Stroke Foundation. OB is a scientist of the Fonds de Recherche en Santé du Québec and has received research grants from the Fondation Québécoise en Santé Respiratoire (FQSR) and Boehringer Ingelheim.