In part 2 of this clinical commentary, we highlight the conceptual and methodologic pitfalls evident in current training-load–injury research. These limitations make these studies unsuitable for determining how to use new metrics such as acute workload, chronic workload, and their ratio for reducing injury risk. The main overarching concerns are the lack of a conceptual framework and reference models that do not allow for appropriate interpretation of the results to define a causal structure. The lack of any conceptual framework also gives investigators too many degrees of freedom, which can dramatically increase the risk of false discoveries and confirmation bias by forcing the interpretation of results toward common beliefs and accepted training principles. Specifically, we underline methodologic concerns relating to (1) measure of exposures, (2) pitfalls of using ratios, (3) training-load measures, (4) time windows, (5) discretization and reference category, (6) injury definitions, (7) unclear analyses, (8) sample size and generalizability, (9) missing data, and (10) standards and quality of reporting. Given the pitfalls of previous studies, we need to return to our practices before this research influx began, when practitioners relied on traditional training principles (eg, overload progression) and adjusted training loads based on athletes' responses. Training-load measures cannot tell us whether the variations are increasing or decreasing the injury risk; we recommend that practitioners still rely on their expert knowledge and experience.
In the first part of this viewpoint, we presented a framework for using training load (TL) for injury prevention. In essence, TL measures can be used to check whether the athlete actually completed the planned TL. This highlights the central role of practitioners and their decisions in the injury-prevention process. Unfortunately, the evidence available is insufficient for providing practical recommendations on how to quantitatively modify the TL to reduce the injury risk. Practitioners should rely on traditional training principles such as overload progression and adjust the TL based on the athletes' responses.1,2 One reason isolated TL measures cannot be used to prevent injury is the lack of investigation into the causal relationships between measures of TL exposure and injury (see part 1). Also, conceptual and methodologic concerns suggest that prior findings may have been influenced by HARKing (hypothesizing after the results were known), P hacking (running several analyses to find significant results), selective reporting, and confirmation bias. These problems can increase the chance of false discoveries and unsupported claims.
Here we present a series of conceptual and methodologic topics that should be considered when interpreting the results of any studies before practical applications are extracted and given too much credit.
LACK OF A REFERENCE CONCEPTUAL FRAMEWORK AND MODEL: SHARK ATTACKS
Although researchers described their aim as investigating (occasionally predicting) the association between TL and injury,3–7 their conclusions and recommendations were commonly reported as if causality had been assessed. Such erroneous interpretations are understandable. In part 1, we used the example of the popular spurious correlation between shark attacks and the amount of ice cream sold. In this example, it is easy to see how association wrongfully interpreted as causality can produce confusion and ultimately the wrong recommendation, as it would mean that if we sell less ice cream, fewer shark attacks will occur. The same has happened with TL and injury studies—that is, if we modify the TL, the assumption was that we could reduce or increase injuries, which has never actually been tested. The exclusion of a causal association is the consequence of our previous knowledge of and expertise about that phenomenon. With potential predictors or prognostic factors that are conceptually or empirically linked to injury occurrence, the absence of a framework leads to the belief that any measure of TL reflects some underlying mechanism of injury. Indeed, injuries occur while training. The conclusion that an association between shark attacks and ice cream is unreasonable is based on our knowledge, and determining the role and suitability of a measure of exposure and the TL as a proxy of an injury mechanism should also depend on expert knowledge. For this reason, a conceptual framework is necessary to avoid overinterpretation of results. In other words, we need a plausible, explicit conceptual framework to understand the factors that may influence injuries, the proxy measures of these factors, and whether these proxies exist and can be measured with acceptable accuracy. 
Not only is this necessary when we want to manipulate the prognostic factors to influence the occurrence of the event, but it also may be required for associational and descriptive-only studies that are used to develop a theory or etiologic models. Indeed, a spurious association may provide misleading information on which a theory may be wrongly based. Whatever the goal, the use of any measure available to “try and see” is not a methodologically appropriate way to proceed, given that it may generate several outcomes because of misclassification of the predictor, selection bias, mixing of effects (confounding), intervention effects, and heterogeneity.8 We recently proposed a conceptual framework9 to describe the causal structure of acute and overuse injuries, which has some components in common with the one proposed by Bertelsen et al10 for running injuries. These are only first steps before attempting to define the proxies for testing hypothesized causal paths and the single links forming the framework. Previously proposed “frameworks” are generic and none were detailed enough to derive the proxy measures or help define a causal structure.11 Although these may be useful in providing generic models for injuries, their utility for developing studies to examine the role of TL in injury occurrence is very limited.
No specific framework or causal structure has been used in previous studies, and the only reference “model” mentioned in the literature1 behind certain common metrics (ie, acute load, chronic load, and their ratio) is the Banister impulse-response model. Figure 1 shows how the original model was oversimplified by the use of time decays as arbitrary values for acute and chronic TL, and an additive model was transformed into a ratio without justification.12 The lack of physiological, biological, and computational rationales to support these metrics as causal factors in injury has been recently discussed.13,14 Furthermore, new evidence has called into question the etiologic roles of chronic load and the acute : chronic workload ratio (ACWR).9 Therefore, it is important to underline that the absence of a strong etiologic rationale for selected TL metrics, such as the ACWR, may be a sign that the previously reported relationship between TL and injury is coincidental or due to statistical artifact. Thus, results should never be accepted as established but should be scrutinized and challenged with purposeful studies or by examining the potential methodologic problems. The domino effect of the ACWR is the proliferation of new metrics,15,16 all characterized by the lack of any physiological explanation. This just adds confusion and increases researchers' degrees of freedom, thereby leading to P hacking (ie, fishing expeditions) and the risk of false associations. The consequences of these bad practices have been extensively presented in the literature.17,18 Furthermore, this emphasis on TL has caused an overreliance on a single factor among an array of factors that may contribute to overuse injuries.19,20 For example, inappropriate proposals have been generated, such as renaming overuse injuries TL errors.21,22 In summary, in science, conceptual frameworks matter!
METHODOLOGIC CONCERNS
The lack of any conceptual framework has another unfortunate consequence: it gives researchers too many degrees of freedom, which, when combined with suboptimal analyses and data handling, can dramatically increase the risk of false discoveries and confirmation bias to support common training practices, such as the overload progression principle.
Measure of Exposures
Researchers are free to select whatever they want as measures of exposure in their statistical model. For example, other than the acute and chronic TL, they can use the ACWR, week-to-week variation, ratio between low or high chronic TL, and different categories. Calculating associations from multiple combinations4,5,23 of TL measures inflates the risk of type I errors. This was acknowledged by 1 set of authors24 who calculated 350 hazard ratios using varying time windows (7–28 days) and TL measures (rolling and exponentially weighted averages, ACWR training monotony, and strain).
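To make this inflation concrete, the short calculation below is a minimal sketch. It assumes that every test is independent and that no true association exists; neither assumption holds exactly for correlated TL metrics, so the figures are illustrative only. The function names and the 0.05 threshold are our assumptions, not values taken from the cited studies.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n independent tests
    when every null hypothesis is true (an idealized assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of false positives under the same assumptions."""
    return n_tests * alpha


# 350 matches the number of hazard ratios mentioned in the text;
# 1 and 10 are included only for comparison.
for m in (1, 10, 350):
    print(
        f"{m:>3} tests: P(at least 1 false positive) = "
        f"{familywise_error_rate(m):.4f}, "
        f"expected false positives = {expected_false_positives(m):.1f}"
    )
```

Under these simplifying assumptions, 10 tests already give roughly a 40% chance of at least 1 false positive, and 350 tests make at least 1 false positive a near certainty.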
Pitfalls of Using Ratios
As mentioned previously, the most common measure of TL is the ACWR. However, the physiological rationale for choosing this ratio is unknown, and it is not supported by the Banister model. Furthermore, the ratio is well known to be problematic.25–30 Lolli et al31 observed that the ACWR failed to normalize the numerator for the denominator even after controlling for the mathematical coupling.32 The ratio is indeed used to control for a denominator variable (chronic workload) that is assumed to influence the numerator (acute workload), which is the variable we consider important.26 This failure adds unnecessary “noise,” increases the risk of artifact,33–35 and makes the results difficult to interpret, as the practitioner cannot determine whether the acute or chronic TL is driving the ACWR. Despite this fundamental flaw, this problem has been completely ignored, and few researchers33–35 focused on the coupling problem raised by Lolli et al.32 Unfortunately, these investigators failed in at least 2 aspects: first, they did not use a within-participants analysis, as should be done for ACWR because this metric is calculated at the within-participants level over a time series, and second, they did not understand that the ratio is the problem. Indeed, Lolli et al32 demonstrated that even when uncoupled, the ratio does not normalize.
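The coupling issue can be illustrated numerically. The sketch below assumes a 1-week acute window and a 4-week chronic window (one common choice, not the only one used in the literature) and hypothetical weekly loads in arbitrary units. It shows only the mathematical coupling, not the normalization failure reported by Lolli et al: when the acute week is also part of the chronic average, the ratio can never reach 4, whatever the acute load.

```python
def coupled_acwr(weekly_loads):
    """Acute = most recent week; chronic = mean of the last 4 weeks,
    INCLUDING the acute week (mathematically coupled)."""
    acute = weekly_loads[-1]
    chronic = sum(weekly_loads[-4:]) / 4
    return acute / chronic


def uncoupled_acwr(weekly_loads):
    """Acute = most recent week; chronic = mean of the 3 weeks
    BEFORE the acute week (uncoupled)."""
    acute = weekly_loads[-1]
    chronic = sum(weekly_loads[-4:-1]) / 3
    return acute / chronic


# Hypothetical weekly loads (arbitrary units), oldest first.
doubling = [1000, 1000, 1000, 2000]
extreme = [1000, 1000, 1000, 100000]

for label, loads in (("doubling", doubling), ("extreme", extreme)):
    print(
        f"{label}: coupled = {coupled_acwr(loads):.2f}, "
        f"uncoupled = {uncoupled_acwr(loads):.2f}"
    )
```

The same week of training yields different ratios under the 2 definitions, which is one reason the exact calculation needs to be reported.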
It was also shown that when the training schedule was not taken into account, false relationships between ACWR and injury were generated. For example, Bornn et al36 created a 1000-season simulation of TL and injury using data from 2 elite sports (Serie A soccer and National Football League football). The simulation was constructed so that the ACWR was to have no relationship with injury (ie, injury was simulated to simply be a function of the training demands on a given day). When the simulation was statistically evaluated using “common approaches,” the ACWR outside of the proposed “sweet spot” (0.8–1.3)37 had a significant relationship with injury (6.8% increase in injury in Serie A soccer and 10.5% increase in injury in National Football League football)—similar to what has been reported in previous literature.3,6 Interestingly, this relationship existed despite the simulation's being designed to have no relationship at all. Once the training day was considered in the model, representing a function of the underlying training schedule, the relationship between the ACWR and injury no longer existed. Such results suggest that relationships with the ACWR in earlier studies might be nothing more than statistical artifacts. Specifically, the statement “ACWR predicts injuries” may be better stated as “ACWR predicts the upcoming TL.”
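The mechanism can be seen in a deterministic toy example. This is not the simulation of Bornn et al; it is a minimal sketch under assumptions we chose for illustration: a schedule alternating heavy and light weeks, a daily injury probability that depends only on that day's load, a 7-day acute window, and a 28-day chronic window. All names and numbers are hypothetical.

```python
HEAVY, LIGHT = 600, 200   # hypothetical daily loads (arbitrary units)
RISK_PER_AU = 1e-5        # assumed: risk depends ONLY on the load of the day


def day_load(day):
    """Alternating heavy and light weeks (14-day cycle)."""
    return HEAVY if (day // 7) % 2 == 0 else LIGHT


def acwr_before(day, loads):
    """ACWR from the 7 and 28 days preceding `day` (day itself excluded)."""
    acute = sum(loads[day - 7:day])
    chronic = sum(loads[day - 28:day]) / 4
    return acute / chronic


def category(acwr):
    """Categories around the proposed 0.8 to 1.3 'sweet spot'."""
    if acwr < 0.8:
        return "low"
    if acwr > 1.3:
        return "high"
    return "sweet spot"


loads = [day_load(d) for d in range(70)]
rows = [
    (category(acwr_before(d, loads)), loads[d], loads[d] * RISK_PER_AU)
    for d in range(28, 70)
]


def mean_risk(rows, cat, load=None):
    """Mean daily injury probability in an ACWR category,
    optionally restricted to days with a given load."""
    risks = [r for c, l, r in rows if c == cat and (load is None or l == load)]
    return sum(risks) / len(risks)


for cat in ("low", "sweet spot", "high"):
    print(
        f"{cat:>10}: crude = {mean_risk(rows, cat):.4f}, "
        f"heavy days = {mean_risk(rows, cat, HEAVY):.4f}, "
        f"light days = {mean_risk(rows, cat, LIGHT):.4f}"
    )
```

In this toy schedule the crude risk differs across ACWR categories only because each category contains a different mix of heavy and light days. Once the day's load is held fixed, the categories are identical, which mirrors the point that the ACWR can act as a proxy for the upcoming TL.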
The idea that new metrics such as ACWR are strongly and consistently associated with injury is not actually supported by evidence. Even in the first 2 studies, on which the famous U-shape model presenting the sweet spot was based, Hulin et al3,4 did not show a U shape between the ACWR and subsequent injury. The U shape was produced after unpublished data from the Australian Football League (AFL) were combined,37 but other authors have noted completely different relationships (negative linear,38 positive linear,7 inverted U shape,39 etc). Given these discrepancies, a request for retraction of the U-shape model was submitted.40 Although the retraction was refused (because the model was presented as illustrative), the errors were confirmed and, at minimum, suggested that the U-shape model should be dismissed. Furthermore, although illustrative, this model has been published more than 7 times in scientific journals, including in 2 consensus statements, 1 from the International Olympic Committee.11,37,41–45 The latter presented the model as validated, and a request for clarification is still pending (https://bjsm.bmj.com/content/50/17/1030.responses).
The conceptual and methodologic inadequacy of the ACWR (ie, to “normalize” the acute by the chronic load) has even been indirectly acknowledged by the proponents of the ACWR and other authors3,4,46,47 who arbitrarily removed the training weeks in which the chronic load was 1 or 2 standard deviations below the mean from the ACWR analysis. This exclusion was considered necessary to limit “spikes” in the ACWR, as small absolute increases in acute TL at low chronic TL values would result in very high ACWR values. Because a ratio is used when one believes the numerator (ie, acute TL) is the important factor but wants to control for differences in the denominator (ie, chronic TL),26 manipulating the denominator by eliminating data (ie, low chronic TL values) should not be necessary. Furthermore, removing lower TL values from the chronic TL measure to limit spikes creates an unrealistic scenario that does not represent the actual TL exposure.
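A worked example shows why low chronic values produce such spikes. The numbers below are hypothetical and the helper name is ours; the sketch simply applies the same absolute increase in acute TL at 2 different chronic TL levels.

```python
def acwr(acute, chronic):
    """Plain ratio of acute to chronic load (arbitrary units)."""
    return acute / chronic


ABSOLUTE_INCREASE = 300  # hypothetical increase in weekly load

results = {}
for chronic in (300, 3000):
    acute = chronic + ABSOLUTE_INCREASE
    results[chronic] = acwr(acute, chronic)
    print(f"chronic = {chronic:>4}, acute = {acute:>4}, ACWR = {results[chronic]:.2f}")
```

The identical 300-unit increase is labeled a large spike at a low chronic TL and a trivial change at a high chronic TL, so the ratio, not the training itself, drives the classification.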
Finally, in a recent study,48 investigators showed that the ACWR (ie, the ratio) actually generated statistical artifacts. The concerns raised in this section should warn researchers and practitioners to avoid adopting into their practice methods that have not been properly scrutinized by independent investigators. Additionally, the methodologic concerns we have raised can be used to avoid repeating similar mistakes in the future.
Training-Load Measures
Training load can be assessed using various methods and devices. Again, without a conceptual framework, researchers can use whatever is available. Investigators have used measures of internal TL such as the session rating of perceived exertion and measures of external TL such as GPS and inertial sensors, while ignoring (or not considering) that these measure different TL constructs.49 Furthermore, different TL indicators have also shown different relationships with injury risk, as demonstrated, for example, by Bowen et al5,23 or Jaspers et al.50 One obvious limitation of GPS is that it cannot be used to quantify any indoor training (eg, gym training). Why most authors who used GPS failed to quantify or acknowledge this point is unknown. We contend that it is highly unlikely that the teams studied did not complete any gym sessions, as preventive exercise programs are fundamental training activities. Because these programs may reduce the risk of injuries,51 this should be considered, at minimum, a potential effect-measure modifier. Conversely, such programs may be thought of as confounders because activity completed in the gym can affect the previous or next field training. Although we acknowledge that not all contextual factors can be measured, gym training is a part of the TL and not a contextual factor. Finally, much of the literature is vague as to how to account for nontraining days (eg, days off) in calculating the ACWR. Lack of clarity in reporting makes it difficult to understand the influence of nontraining days on the chronic load, as these days would technically be listed as a TL of 0.52
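The effect of the nontraining-day convention is easy to quantify. The sketch below uses a hypothetical 4-week block of daily session-RPE loads (arbitrary units) and compares a chronic daily average that counts days off as 0 with one that averages training days only. Both conventions and all values are assumptions for illustration.

```python
# Hypothetical week: 4 training days and 3 days off (recorded as 0).
week = [500, 0, 600, 400, 0, 700, 0]
loads = week * 4  # 28-day chronic window

# Convention A: days off count as a load of 0.
mean_all_days = sum(loads) / len(loads)

# Convention B: days off are dropped before averaging.
training_days = [x for x in loads if x > 0]
mean_training_days = sum(training_days) / len(training_days)

print(f"chronic daily load, zeros included: {mean_all_days:.1f}")
print(f"chronic daily load, zeros excluded: {mean_training_days:.1f}")
print(f"ratio between conventions: {mean_training_days / mean_all_days:.2f}")
```

With identical training, the 2 conventions give chronic values that differ by a factor of 1.75 here, so any ratio built on the chronic load changes accordingly.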
During soccer, the match activities of the highest-level professional teams are commonly measured using semiautomated camera systems. Combining data coming from different player-tracking systems is then necessary. Indeed, in a recent study of Premier League players, Bowen et al5 used both GPS and camera-system data to represent the same load indicator, referring to a study by Buchheit et al53 to support the interchangeability. However, Buchheit et al53 not only used a different GPS device but they also found a moderate systematic error after applying necessary customized correction algorithms (which reduced systematic bias). An additional concern is that some matches in team sports, in which GPS is frequently used during competition, are also played indoors, where GPS cannot be used. For instance, in AFL, some matches are played indoors, and the activity profile of certain official matches is measured using a radio-frequency identification–based, active real-time location system and not GPS (which is used in the other matches). However, none of the authors who examined the AFL mentioned whether the analyzed data were combined from the 2 systems or the radio-frequency identification measures were excluded. Collectively, these examples indicate that data from different player-tracking systems must be analyzed with care, as systematic measurement differences between systems may create artifacts or influence the results.
Time Windows
Researchers have used a plethora of time windows without providing any justification. They appear to have applied an exploratory trial-and-error approach, in which various combinations may be considered “just to see what happens.” Investigators have used 1 to 8 weeks as the indicator of chronic (or cumulative) TL and 1 to 2 weeks for acute TL and added ACWRs calculated with as many time windows.46,47,54 Carey et al55 designed a study aimed at defining the best time windows. They explored 336 combinations and determined that the results differed depending on the windows (and the TL measure). Indeed, for moderate-speed running, the “best” window was 3 days for acute load and 21 days for chronic load, whereas for total distance, the “best” windows were 6 and 24 days, respectively. Results of these studies are difficult to reconcile with a logical framework, especially in the absence of a reference conceptual model. Moreover, this adaptation of the time windows to the available data is methodologically unreasonable and a form of overfitting.56
Another concern is the lack of consistency in the methods used to calculate acute weekly TL. Figure 2 shows 4 ways to calculate the acute load using the session rating of perceived exertion as the TL measure. We have presented these methods using Monday-to-Sunday blocks, which is common in many studies.3–5,23,57,58 Each method clearly produces different acute load figures (from 1340 to 3450 arbitrary units), which in turn affect other derived measures of changes such as the ACWR. Method 4 in Figure 2 is the most common,3–5,23,38,50,59–65 which implies an important assumption: the load immediately preceding an injury is not a relevant risk factor. The time lags in the literature between injury occurrence and the first week or days used to calculate the acute or chronic load typically ranged from 1 day to 1 week (subsequent-week injuries).3–5,23,38,47,50,59–65 This means that the authors, for reasons not reported, considered the TL completed in the days immediately before the injury as not influencing the injury risk. Also of concern is that the model combined athletes who experienced an injury immediately after the weekly load used for the calculations and athletes who might have been injured as much as 7 days later. This frequent use of time lags is worrying not only for the lack of biological and plausible explanations but also because, if those studies are correct, from a practical point of view, this would mean that no matter what the athlete does after that week, the injury cannot be prevented. Furthermore, when no time lag has been specified, it is often not possible to know how or whether the TL until the moment of the injury was entered in the analysis (method 2). Method 1 (usually referred to as “current-week injuries”) is influenced by when the injury occurred but has been used in only a few studies.3,4 Again, some of these methods are confusing, illogical, and often not properly explained or justified. 
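How much the chosen convention matters can be shown with a small sketch. It does not reproduce Figure 2 or its 4 methods; it contrasts 3 hypothetical conventions for the "acute load" attached to an injury, using invented daily loads in which day 0 is a Monday and the injury occurs on the Thursday of the second week.

```python
# Hypothetical daily loads (arbitrary units) for 2 Monday-to-Sunday weeks.
loads = [400, 0, 500, 300, 0, 600, 0,     # week 1
         700, 0, 800, 650, 0, 500, 0]     # week 2
INJURY_DAY = 10                           # Thursday of week 2 (0-based index)


def previous_calendar_week(loads, injury_day):
    """Sum of the full Monday-to-Sunday block before the injury week
    (the injury is treated as a 'subsequent-week' event)."""
    week_start = (injury_day // 7) * 7
    return sum(loads[week_start - 7:week_start])


def rolling_7_days(loads, injury_day):
    """Sum of the 7 days immediately preceding the injury day."""
    return sum(loads[injury_day - 7:injury_day])


def current_week_to_date(loads, injury_day):
    """Sum from Monday of the injury week up to the day before the injury."""
    week_start = (injury_day // 7) * 7
    return sum(loads[week_start:injury_day])


for fn in (previous_calendar_week, rolling_7_days, current_week_to_date):
    print(f"{fn.__name__}: {fn(loads, INJURY_DAY)}")
```

The same injury is paired with 3 different acute loads, and only 2 of the conventions include the sessions completed in the days just before the injury.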
We recommend that, when practitioners read these articles, they ask themselves whether the researchers' choices were reasonable and supported by any evidence.
Discretization and Reference Category
Most authors66–71 have categorized continuous measures of exposure and ratios. The limitations of discretization have been reported in the literature. Nevertheless, investigators who used this approach commonly did not explain the reasons for creating those specific categories, which is important considering their influence on the results. No clear rationale describes why researchers have used between 2 (dichotomization)72 and 7 categories of ACWR and up to 11 bins.47,61,68 Also unknown is why authors have discretized the data using group and not individual values. Such an approach may cause some players to be underrepresented in certain categories and overrepresented in others, which affects the interpretability of the results. Authors54,63,73,74 have analyzed the data using different methods of categorization in the same study (eg, z-score or quartile). Furthermore, the creation of discrete variables may lead to additional problems. For example, Carey et al66 showed that 3 discretization approaches producing different categories each yielded false discoveries in 16 to 21 of 100 simulations (assuming no relation with injury risk). At least 1 of the 3 discretization methods showed a false discovery 42 times out of 100. Another major concern highlighted by Carey et al66 was that freedom of choice in selecting a reference category almost doubled the risk of a false discovery. In most, if not all, TL-injury studies, neither the categories nor the references have been clearly justified. The main justification for such approaches appears to arise from authors who simply copied the methods and statistical approaches of previous researchers.
Furthermore, the categories and their meaning should be considered in the interpretation, and we are afraid that most did not consider the implications of the selected reference categories. For instance, Fanchini et al,74 in a study of football players, demonstrated an increased injury risk at a “high” ACWR (>1.26). Specifically, the injury risk was higher for an ACWR >1.26 versus an ACWR <0.78. Thus, the risk was higher in those with an increased load compared with a decreased load. When athletes with an ACWR >1.26 were compared with those who increased their acute load from 2% to 26% (ACWR = 1.02–1.26), the risk was not higher. In other words, if we want to apply these findings assuming a causal relationship that doesn't exist, we should avoid spikes only if the players' TL decreased (ACWR < 1) during the preceding week. But once the ACWR is >1.02, even doubling or tripling TL would not increase the injury risk. This finding is exactly the same as in the first study using the ACWR: Hulin et al3 showed that at ACWR >1.5 and ACWR >2.0, the injury rate was higher than with an ACWR <0.99 but not different than with an ACWR between 1 and 1.49.
Injury Definitions
Various classification systems are available, and the International Olympic Committee75 has recently presented a consensus document on standardizing injury data collection. However, the possibility that a researcher will arbitrarily select an injury definition without providing an appropriate rationale to justify the choice (again because of the lack of a reference framework and etiologic model) is another source of degrees of freedom that can change both the study results and the interpretation.76 Injuries have been defined and used in many ways, including contact, combined contact and noncontact, match loss, both training and match loss, complaints requiring medical attention, complaints requiring training session modification, and combined upper and lower body injuries, often with unspecified severities or days lost. In addition, some investigations involved self-reported injuries, whereas others relied on the medical staff to record injuries. The specific reasons or mechanisms for selecting a particular definition that clearly determines the number of predicted events have never been presented. Some researchers used definitions based on complaints, whose nature may underlie different, if any, TL-related injury mechanisms. Even more concerning were studies that included noncontact injuries such as bruises, hematomas, or cramps with no theoretical link to TL errors.5,23 Moreover, other authors5,7 who used TL measures derived from locomotor activities included neck and upper extremity injuries. Although we acknowledge that some investigators5,23 transparently reported these details, most have not provided this information, making interpretations challenging. Furthermore, researchers5,23,77 have combined noncontact and contact injuries to increase sample sizes or applied the same measures of exposure commonly used for noncontact injuries to contact injuries. Both choices are questionable.
The classification systems frequently used in professional sport settings were developed mainly for injury surveillance. To determine associations with prognostic factors, and given the different responsiveness of tissues and structures, either more specific classifications may be needed or, at minimum, associations should be examined for specific injuries and specific tissues, as was done in 1 study.65 Instead, the use of broad injury categories as outcomes assumes common qualitative and quantitative links between all these injuries and TL measures, which is not reasonable.
Unclear Analyses
The use of suboptimal statistical analyses in TL-injury studies has been reported in the literature.78,79 However, some authors appear to have intermingled various methods for calculating injury risk, relative risk, and likelihood of risk, which makes interpretation very difficult for the reader. For example, in their seminal papers, Hulin et al3,4,61 calculated the injury risk as “the number of injuries sustained relative to the number of exposures to each workload classification.”4(p233) They reported this as both injury risk (in tables) and likelihood of risk (in figures). Yet this is not an injury risk but rather an injury rate.80 They also calculated the relative risks using these rates, which means they actually calculated rate ratios. The researchers provided values for what they labeled injury risk in tables but highlighted significance levels using P values determined from odds ratios (with logistic regression as a post hoc test). This method is inappropriate because the P value referred to the odds ratio and not the injury risk. These authors recently replicated the same errors.33 This is not only confusing but also questionable statistical reporting.3,4,61 Unfortunately, the methods from these studies were used as a reference for later studies, and the inaccuracies were replicated by others.5,23,58,81,82
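The distinction between these quantities can be made explicit with a small worked example. The counts below are hypothetical and the class is ours; the sketch uses the standard epidemiologic definitions (risk as the proportion of athletes injured, rate as injuries per unit of exposure, odds as risk divided by its complement) to show that the 3 resulting ratios are not interchangeable.

```python
from dataclasses import dataclass


@dataclass
class WorkloadCategory:
    athletes: int          # athletes observed in this category
    injured_athletes: int  # athletes with at least 1 injury
    injuries: int          # all injuries, including repeat injuries
    exposures: int         # player-weeks spent in this category

    def risk(self) -> float:
        """Proportion of athletes injured (a probability)."""
        return self.injured_athletes / self.athletes

    def rate(self) -> float:
        """Injuries per player-week of exposure (not a probability)."""
        return self.injuries / self.exposures

    def odds(self) -> float:
        """Odds of injury, derived from the risk."""
        p = self.risk()
        return p / (1 - p)


# Hypothetical counts for illustration only.
high = WorkloadCategory(athletes=40, injured_athletes=10, injuries=12, exposures=200)
reference = WorkloadCategory(athletes=40, injured_athletes=6, injuries=6, exposures=300)

risk_ratio = high.risk() / reference.risk()
rate_ratio = high.rate() / reference.rate()
odds_ratio = high.odds() / reference.odds()

print(f"risk ratio = {risk_ratio:.2f}")
print(f"rate ratio = {rate_ratio:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```

With these invented counts the 3 ratios differ noticeably, so labeling a rate ratio as a relative risk, or attaching a P value from an odds ratio to it, describes a different quantity from the one reported.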
Further, although confidence intervals have often been reported, uncertainty has rarely been discussed or considered.47 The investigators38,47 who found extremely large confidence intervals at the extremes of the TL-injury relationship (ie, outside the “sweet spot”) suggested that few injuries occurred in these ranges (also indicating sparse data bias) and the results were compatible with any kind of association.
Finally, we should always consider that a common characteristic of these studies is that the analysis was performed on data from athletes who sustained repeated injuries and, therefore, injury recurrence should have been taken into account. Although some researchers attempted to use statistical methods to handle correlated outcomes within individuals (eg, mixed models and generalized estimating equations), a consistent reporting strategy has yet to be adopted. Inconsistent reporting of such complex models makes their interpretation and comparison of outcomes across studies challenging for the practitioner. Recurrence is an important factor when analyzing data in injury research; simply ignoring it is not the best approach.83 Also, most exposures, outcomes, confounders, and moderators are time-varying variables. Nielsen et al79,84 provided gentle introductions to this topic, and interested readers are advised to refer to these articles.
Sample Size and Generalizability
Because most TL-injury researchers examined data from single teams in a myriad of sports, the studies have low generalizability and should probably be viewed as case studies. Accordingly, the findings cannot be confidently extended to other teams or populations. Most of the investigations also lacked adequate sample sizes,85 a problem that was amplified when the samples were split into subcategories (eg, a high ACWR combined with low or high chronic load).4,5,23,46 Adequate sample size should not be calculated using only the events-per-variable approach because this method has limitations,86,87 but authors often do not report any justification for the chosen sample size. Even worse, some researchers reported a post hoc power analysis, which is well known to be inappropriate.88 Moreover, no authors reported the number of injuries for each category, and we are concerned that some categories had as few as 1 or 2 injuries. In addition, the number of athletes who sustained repeated injuries was rarely supplied.
Missing Data
The literature contains extensive explanations89,90 of how to handle missing data, but these have been ignored by many TL-injury investigators. To be imputed, data should be demonstrated to be (or reasonably assumed to be) missing completely at random or at least at random. Today, single imputation is an infrequent practice, and when possible, multiple imputation is suggested as an alternative to complete-case analysis.91 In sport science, single imputation is quite common, with mean group values typically used as the imputed values. Unfortunately, this approach has many limitations, and other solutions (eg, substitution, hot and cold deck imputation, regression) should be considered.89–93 Even more concerning are extreme imputation strategies, such as estimating the data of an entire season using the data of subsequent seasons,5 which have been used in some TL-injury work and clearly introduce severe bias. To improve on these practices, we suggest that researchers report the proportion of missing data, assess whether the values were missing at random, describe how data imputations were made, and provide a justification of the method. A sensitivity analysis to show the effect of imputations would also be desirable and good practice.
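One well-known limitation of group-mean imputation is that it artificially shrinks variability. The sketch below illustrates this with hypothetical session loads (arbitrary units) and 3 assumed missing sessions; it is not drawn from any of the cited datasets.

```python
from statistics import fmean, stdev

# Hypothetical observed session loads; 3 further sessions are missing.
observed = [300, 450, 500, 650, 800]
N_MISSING = 3

# Single imputation: every missing value is replaced by the group mean.
group_mean = fmean(observed)
imputed = observed + [group_mean] * N_MISSING

print(f"mean, observed only:  {fmean(observed):.1f}")
print(f"mean, after imputing: {fmean(imputed):.1f}")
print(f"SD, observed only:    {stdev(observed):.1f}")
print(f"SD, after imputing:   {stdev(imputed):.1f}")
```

The mean is unchanged, but the standard deviation drops because the imputed values carry no variability, which makes the data look more precise than they are.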
Standards and Quality of Reporting
Several international guidelines are available for standardizing reporting in scientific studies (www.equator-network.org). These guidelines have been developed to promote standardization, transparency, and high quality of reporting. Such information is necessary to provide readers and other researchers pertinent information about the rationale, design, methods, analysis, and interpretation. These guidelines also allow for accurate evaluation of the risk of bias. Unfortunately, as described earlier, the studies on TL and injuries are heterogeneous in their design and methods. Reporting of methods is also unclear in many studies, which makes interpretation and comparison of findings difficult. For example, rigorously developed and well-established reporting guidelines94,95 recommend presenting both the absolute and relative risk because the relative risk alone can be misleading and influence interpretation and subsequent decisions.96 Relative risk can overestimate the effect of the association between exposure and outcome. For this reason, both relative and absolute risk (eg, risk differences) must be supplied.96,97 Unfortunately, with few exceptions,98 the authors of TL-injury studies commonly provide only the relative risk. Based on the absolute risk differences reported by Colby et al,98 even when trying to contextualize these numbers, the absolute changes in the injury risk appear to be very small or negligible. Other than designing adequate studies, it is important to clearly report each study's methods. Given the increasing interest and research in injury prediction, the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis99 is a valuable tool whose implementation would improve the reporting so that the risk of bias can be assessed and the methods replicated.
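A brief numerical sketch shows why the relative risk alone can mislead. The 2 scenarios below are hypothetical; the function names are ours. Both scenarios have the same relative risk but very different absolute risk differences.

```python
def relative_risk(risk_exposed: float, risk_reference: float) -> float:
    """Ratio of the risk in the exposed group to the risk in the reference group."""
    return risk_exposed / risk_reference


def risk_difference(risk_exposed: float, risk_reference: float) -> float:
    """Absolute difference in risk between the 2 groups."""
    return risk_exposed - risk_reference


# Hypothetical (exposed, reference) injury risks.
scenarios = {
    "rare outcome": (0.02, 0.01),
    "common outcome": (0.40, 0.20),
}

for label, (exposed, reference) in scenarios.items():
    rr = relative_risk(exposed, reference)
    rd = risk_difference(exposed, reference)
    print(f"{label}: relative risk = {rr:.1f}, risk difference = {rd:.2f}")
```

A doubling of risk corresponds to 1 additional injury per 100 athletes in the first scenario and 20 per 100 in the second, which is why both measures need to be reported.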
CONCLUSIONS
The lack of any TL conceptual frameworks or reference models has facilitated the search and selection of results that align with traditionally accepted training principles and common sense, thus biasing our view and interpretation of the results. The bias in this field is unfortunately very evident. A recent review100 supported use of the ACWR (session rating of perceived exertion) but focused on only 2 of 13 identified studies. The authors failed to advise readers that the associations were in opposite directions, with Hulin et al3 reporting a higher injury risk at high ACWR (>1.5) and Jaspers et al50 showing a lower risk (ACWR >1.12). This kind of biased interpretation is regrettably quite common. The retrospective nature of the analysis and the several sources of researchers' degrees of freedom facilitate false discoveries, P hacking, selective reporting, and HARKing. Unfortunately, based on the aforementioned methodologic concerns and the inability of studies to examine potential causal associations, practitioners should be cautious when deriving practical recommendations.
Although practitioners may be surprised and feel discouraged by our critical analysis, some good can come from our critique. Given the limitations presented in this 2-part series, we recommend that practitioners still rely on their clinical experience and intuition, combined with logical training principles and knowledge of physiological mechanisms and stimulus-inducing adaptations. Although theoretically undertraining, “excessive” training, or both may be reasonably (but generically) considered predisposing factors to injury, this cannot be ascertained and quantified based on the literature. Training load-injury researchers should slow down and focus instead on producing higher-quality studies, even if this means many fewer studies (see Table for recommendations), including exploring more fundamental topics such as injury mechanisms in order to define frameworks that can be used to develop appropriate studies that are conducted according to established epidemiologic methods.
Note: During the preparation of this manuscript, new evidence and methodologic concerns have been presented.13,14,48 In light of these arguments, the Australian Institute of Sport has recently released a communication advising that the ACWR not be used as an indicator of injury risk (http://subscribe.ausport.gov.au/t/r-2CAD3E99D6C6144D2540EF23F30FEDED).