To summarize and evaluate the current diagnostic accuracy of clinical measures used to diagnose Achilles tendon injuries.
A literature search of MEDLINE, CINAHL, and EMBASE databases was conducted with key words related to diagnostic accuracy and Achilles tendon injuries.
Original research articles investigating Achilles tendon injuries against an acceptable reference standard were included.
Three studies met the inclusion criteria. Quality assessment was conducted using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. DerSimonian-Laird random-effects models were used to pool sensitivity (SN), specificity (SP), and diagnostic odds ratios with their 95% confidence intervals (CIs).
The SN and negative likelihood ratio (−LR) values for Achilles tendon rupture measures ranged from 0.73 (95% CI = 0.65, 0.81) and 0.30 (95% CI = 0.23, 0.40) to 0.96 (95% CI = 0.93, 0.99) and 0.04 (95% CI = 0.02, 0.10), respectively, whereas SP and positive likelihood ratio (+LR) values ranged from 0.85 (95% CI = 0.72, 0.98) and 6.29 (95% CI = 2.33, 19.96) to 0.93 (95% CI = 0.84, 1.00) and 13.71 (95% CI = 3.54, 51.24), respectively, with the highest SN and SP both reported in the calf-squeeze test. The SN and −LR values for Achilles tendinopathy measures ranged from 0.03 (95% CI = 0.00, 0.08) and 0.97 (95% CI = not reported) to 0.89 (95% CI = 0.75, 0.98) and 0.19 (95% CI = not reported), whereas SP and +LR values ranged from 0.58 (95% CI = 0.38, 0.77) and 2.12 (95% CI = not reported) to 1.00 (95% CI = 1.00, 1.00) and infinity, respectively, with the highest SN and SP reported for morning stiffness and palpation for crepitus. Pooled analyses demonstrated similar diagnostic properties in all 3 clinical measures (arc sign, palpation, and Royal London Hospital test), with SN and −LR ranging from 0.42 (95% CI = 0.23, 0.62) and 0.68 (95% CI = 0.50, 0.93), respectively, for the arc sign, to 0.64 (95% CI = 0.44, 0.81) and 0.48 (95% CI = 0.29, 0.80), respectively, for palpation. Pooled SP and +LR ranged from 0.81 (95% CI = 0.65, 0.91) and 3.15 (95% CI = 1.61, 6.18), respectively, for palpation, to 0.88 (95% CI = 0.74, 0.96) SP for the arc sign and 3.84 (95% CI = 1.69, 8.73) +LR for the Royal London Hospital test.
Most clinical measures for Achilles tendon injury have greater diagnostic than screening capability.
Clinical measures of Achilles tendon injuries have not been well investigated.
For Achilles tendon injuries, currently available clinical measures are stronger in their diagnostic than screening properties.
The squeeze test was a useful diagnostic measure for an Achilles tendon tear, although we recommend caution because this finding was demonstrated in only 1 study with a high risk of bias.
The Achilles tendon is the largest and most frequently torn tendon in the human body.1–3 Achilles tendon injuries are among the most common sport-related injuries.3–8 The accurate diagnosis of an Achilles tendon injury, such as Achilles tendinopathy and, to a lesser degree, Achilles tendon tear, is not always clear and straightforward.1,5,9–12 The differential diagnosis of an Achilles tendon injury includes but is not limited to retrocalcaneal bursitis, os trigonum, tarsal tunnel syndrome, posterior tibialis tendon rupture, arthritic conditions, plantar fasciitis, and stress fracture.9
Sensitivity (SN) and specificity (SP) are accuracy properties used for both screening and diagnostic accuracy tests. The closer the SN is to 100% in the presence of a test with a negative result, the stronger the ability of that clinical measure to rule out the potential for a particular diagnosis. The closer the SP is to 100% in the presence of a test with a positive result, the stronger the ability of that clinical measure to rule in the potential for a particular diagnosis.
Diagnostic ultrasound and magnetic resonance imaging (MRI) have traditionally been considered the criterion reference standards to diagnose Achilles tendon injuries.1,2,9,10 However, this testing can be costly and may not result in an accurate diagnosis.12–15 Because supporting evidence is limited, the American Academy of Orthopaedic Surgeons' clinical practice guideline was inconclusive regarding the routine use of MRI for diagnosing acute Achilles tendon tears.15–17 In addition, the McKinsey Global Institute18 reported that diagnostic imaging with MRI and computed tomography contributes to $26.5 billion in unnecessary health care costs annually.
Clinical measures, such as subjective reports of pain and stiffness and commonly described objective clinical tests (palpation of a gap in the Achilles tendon, calf-squeeze test, palpation for tenderness of the Achilles tendon, and the arc sign), are increasingly used by, and accessible to, practicing clinicians to assist with the diagnosis of Achilles tendon dysfunction. In fact, recent findings2 suggested that a comprehensive clinical examination incorporating such measures outperformed MRI in diagnostic accuracy for Achilles tendon rupture. Additionally, delayed treatment for Achilles tendon dysfunction leads to poorer outcomes.19–24 Therefore, if these clinical measures are truly comparable with diagnostic imaging, the cost-effectiveness of the standard imaging techniques for detecting Achilles tendon dysfunction may be questioned. However, the efficacy of clinical measures relative to commonly accepted diagnostic imaging methods has not been determined.
To our knowledge, a comprehensive review of the diagnostic accuracy of clinical measures (both subjective [eg, reports of pain and stiffness] and objective [eg, palpation for tenderness and a gap in the tendon, arc sign, calf-squeeze test, single-legged heel raise, and hop test]) for Achilles tendon injury does not exist. Clinicians examining and treating patients with Achilles tendon conditions need a clear understanding of the utility of these various clinical measures. Therefore, the purpose of our study was to systematically evaluate the diagnostic accuracy of clinical measures used to diagnose Achilles tendon injuries.
We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines during the searching and reporting phases of this review. The PRISMA statement includes a 27-item checklist that is designed to be used as a basis for reporting systematic reviews of randomized trials,25 but it can also be applied to multiple forms of research methods.26
A systematic, computerized search of the literature in the MEDLINE, CINAHL, and EMBASE databases was conducted on April 1, 2013. The MeSH search terms for MEDLINE are listed in Table 1, with limits for English language and humans. Three reviewers (C.B., E.S., K.R.) independently performed the search. Because computerized search results for diagnostic accuracy data frequently omit many relevant articles,27 the reference lists of all selected publications were checked to retrieve relevant publications that were not identified in the computerized search. The gray literature, which included publications, posters, abstracts, and conference proceedings, was also hand searched. Two reviewers (C.B., E.S.) searched the reference lists and gray literature. To identify relevant articles, titles and abstracts of all identified citations were independently screened. Full-text articles were retrieved if the abstract provided insufficient information to establish eligibility or if the article passed the first eligibility screening.
Articles examining clinical measures for Achilles tendon conditions were eligible if they met all of the following criteria: (1) included individuals with Achilles tendon pain, (2) included at least 1 clinical Achilles tendon examination measure, (3) used an acceptable reference standard, (4) reported the results in sufficient detail to allow reconstruction of contingency tables, and (5) were written in English.
An article was excluded if (1) the condition was associated with an injury located elsewhere (eg, knee joint) that referred pain to the Achilles tendon region, (2) insufficient detail was provided to calculate diagnostic accuracy, (3) the clinical measures were performed under any form of anesthesia or on cadavers, (4) specialized instrumentation not readily available to all clinicians was used, or (5) clinical measures were performed on infants or toddlers.
All criteria were independently applied by 2 reviewers (K.P., K.R.) to the full text of the articles that passed the first eligibility screening. An independent reviewer (M.R.) verified inclusion of all articles in the review. Disagreements among the reviewers were discussed and resolved during a consensus meeting (K.P., K.R., M.R.).
Risk of Bias and Quality Assessment
Two reviewers (C.B., E.S.) independently reviewed each full-text article and scored it with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.28 Disagreements among the reviewers were discussed during a consensus meeting and, if unresolved, settled by an independent third reviewer (M.R.). The QUADAS-2 is a quality-assessment tool composed of 4 domains: patient selection, index test, reference standard, and flow and timing. The risk of bias is assessed in each domain; the first 3 domains are also assessed for applicability with a low, high, or unclear rating. Applicability in the QUADAS-2 refers to whether certain aspects of an individual study match or do not match the review question. Unlike the QUADAS-1, the QUADAS-2 does not use a comprehensive quality score; instead, it uses an overall judgment of low, high, or unclear risk. An overall rating of low risk of bias or low concern regarding applicability requires the study to be ranked as low on all relevant domains. A high or unclear rating in 1 or more domains may require that the study be rated as at risk of bias or as having concerns regarding applicability.
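The overall-judgment rule described above can be illustrated with a minimal Python sketch. The function name, the domain keys, and the example ratings are hypothetical and are not taken from the QUADAS-2 tool or from the included studies; the sketch also treats the "may require" wording as a strict rule, which is a simplifying assumption.

```python
ALLOWED_RATINGS = {"low", "high", "unclear"}


def overall_judgment(domain_ratings):
    """Return 'low' only when every assessed domain is rated low.

    Assumption: any 'high' or 'unclear' domain rating makes the study
    'at risk', a strict reading of the rule described in the text.
    """
    ratings = [rating.lower() for rating in domain_ratings.values()]
    unknown = set(ratings) - ALLOWED_RATINGS
    if unknown or not ratings:
        raise ValueError("ratings must be 'low', 'high', or 'unclear'")
    return "low" if all(rating == "low" for rating in ratings) else "at risk"


# Hypothetical risk-of-bias ratings for one study (not data from this review).
example_study = {
    "patient_selection": "high",
    "index_test": "unclear",
    "reference_standard": "low",
    "flow_and_timing": "low",
}
print(overall_judgment(example_study))
```

The same function could be applied separately to the 3 applicability domains, because the text describes the same rule for applicability concerns.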
Data Extraction and Analysis
Three reviewers (M.R., K.P., K.R.) independently extracted information and data regarding study population, setting, special test performance, injury, diagnostic reference standard, and numbers of true positives, false positives, false negatives, and true negatives for calculation of SN, SP, positive likelihood ratio (+LR), and negative likelihood ratio (−LR) when not provided. Sensitivity is defined as the percentage of people who test positive for a specific condition among a group of people who have the condition. Specificity is the percentage of people whose test results are negative for a specific condition among a group of people who do not have the condition. A +LR is the ratio of the probability of a positive test result in people with the condition to the probability of a positive test result in people without the condition. A +LR identifies the strength of a test in determining the presence of a finding and is calculated by the formula SN/(1 − SP). A −LR is the ratio of the probability of a negative test result in people with the condition to the probability of a negative test result in people without the condition, and it is calculated by the formula (1 − SN)/SP. The higher the +LR and the lower the −LR, the more the posttest probability is altered. The diagnostic odds ratio (DOR) is a single indicator, independent of prevalence, that represents the ratio of the odds of positivity in those with disease relative to the odds of positivity in those without disease. The values for the DOR range from 0 to infinity, with a value of 1 indicating no test discrimination and higher values indicating better discrimination.29 Posttest probability can be altered to a minimal degree with +LRs of 1 to 2 or −LRs of 0.5 to 1, to a small degree with +LRs of 2 to 5 or −LRs of 0.2 to 0.5, to a moderate degree with +LRs of 5 to 10 or −LRs of 0.1 to 0.2, and to a large and almost conclusive degree with +LRs greater than 10 or −LRs less than 0.1. Pretest probability is defined as the probability of the target condition before a diagnostic test result is known. It represents the probability that a specific patient, with a specific past history, presenting to a specific clinical setting, with a specific symptom complex, has a specific condition.30
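As an illustration of these definitions, the following minimal Python sketch computes SN, SP, +LR, −LR, and DOR from the 4 cells of a contingency table. The function name, the `Accuracy` container, and the example counts are hypothetical; the counts are not data from any included study.

```python
import math
from typing import NamedTuple


class Accuracy(NamedTuple):
    sn: float   # sensitivity
    sp: float   # specificity
    plr: float  # positive likelihood ratio, SN / (1 - SP)
    nlr: float  # negative likelihood ratio, (1 - SN) / SP
    dor: float  # diagnostic odds ratio, (TP * TN) / (FP * FN)


def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute accuracy measures from a 2 x 2 contingency table.

    A zero denominator is reported as infinity rather than raising.
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least 1 person with and 1 without the condition")
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    plr = sn / (1 - sp) if sp < 1 else math.inf
    nlr = (1 - sn) / sp if sp > 0 else math.inf
    dor = (tp * tn) / (fp * fn) if fp * fn > 0 else math.inf
    return Accuracy(sn, sp, plr, nlr, dor)


# Hypothetical counts: 50 people with the condition and 50 without.
result = diagnostic_accuracy(tp=40, fp=5, fn=10, tn=45)
print(result)
```

With these hypothetical counts, SN is 40/50 and SP is 45/50, so the +LR and −LR follow directly from the formulas SN/(1 − SP) and (1 − SN)/SP given above.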
Studies were statistically pooled when ≥2 studies examined the same index test and diagnosis with the same reference standard. DerSimonian-Laird31 random-effects models, which consider both between-studies and within-study heterogeneity, were used to produce summary estimates of SN, SP, +LR, −LR, and DORs. An I² value of >50% and a Cochran Q P value of <.10 were the criteria to indicate significant between-studies heterogeneity of SN and SP and of likelihood ratios, respectively. We did not formally test for publication bias because of the low power of the tests with limited included studies.32 No significant threshold effects were found using Spearman correlation coefficients. When a cell was empty, we added 0.5 to all 4 cells as suggested by Cox.33 All analyses were conducted in Meta-DiSc (version 1.4; Informer Technologies, Inc, Dallas, TX) by 1 author (A.G.), who was blinded to the results of the search, inclusion and exclusion criteria, and study quality.34
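The pooling approach can be sketched as follows. This minimal Python example applies the standard DerSimonian-Laird estimator to log DORs, with the 0.5 continuity correction for empty cells, and reports Cochran Q and I². It is an illustrative sketch only and does not reproduce the Meta-DiSc implementation; the function names and the 2 contingency tables are hypothetical.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return (log DOR, variance); add 0.5 to all 4 cells if any cell is empty."""
    cells = [tp, fp, fn, tn]
    if any(cell == 0 for cell in cells):
        cells = [cell + 0.5 for cell in cells]
    tp, fp, fn, tn = cells
    return math.log((tp * tn) / (fp * fn)), sum(1 / cell for cell in cells)


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total
    # Cochran Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-studies variance
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2}


# Hypothetical (TP, FP, FN, TN) tables for 2 studies of the same index test.
tables = [(12, 3, 8, 27), (6, 0, 4, 11)]
effects, variances = zip(*(log_dor(*table) for table in tables))
summary = dersimonian_laird(effects, variances)
low = math.exp(summary["pooled"] - 1.96 * summary["se"])
high = math.exp(summary["pooled"] + 1.96 * summary["se"])
print(math.exp(summary["pooled"]), (low, high), summary["i2"])
```

Pooling on the log scale and exponentiating the result is a common choice for ratio measures; pooled SN and SP would need their own effect and variance definitions, which are not shown here.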
Selection of Studies
The systematic search through MEDLINE, CINAHL, and EMBASE netted 1512 abstracts, and 3 additional papers were identified through an extensive hand search. In total, 1437 titles were initially retained after duplicates were removed. Abstract and full-text review reduced the number of acceptable papers to 3,1,9,10 and the 2 articles9,10 investigating Achilles tendinopathy qualified for meta-analysis (Figure 1). This review included 219 participants across the 3 studies and the investigation of 14 clinical measures. The sample sizes of the studies were 174,1 14,10 and 21.9 Two of the studies1,10 investigated physical examination measures only, 1 for Achilles tendon tear1 and the other for Achilles tendinopathy,10 whereas the third study9 examined both subjective and physical examination measures (Table 2).
The individual items for the risk of bias and applicability concerns are provided in Table 3. Patient selection and index test use were common reasons for the high risk of bias and applicability concerns. The κ value between testers for the overall bias score using QUADAS-2 was 0.59 (95% confidence interval [CI] = 0.20, 0.98), with this point estimate reflecting moderate agreement.35 No disagreement persisted between reviewers after the consensus meeting.
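For readers unfamiliar with the κ statistic, the following minimal Python sketch computes an unweighted Cohen κ for 2 raters. The source does not state which κ variant was used, so the unweighted form is an assumption, and the ratings shown are hypothetical rather than the reviewers' actual QUADAS-2 scores.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa: agreement between 2 raters beyond chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[cat] * counts_b[cat] for cat in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical overall bias ratings from 2 raters for 4 items.
rater_1 = ["low", "high", "high", "low"]
rater_2 = ["low", "high", "low", "low"]
print(cohens_kappa(rater_1, rater_2))
```

Here the raters agree on 3 of 4 items, but half of that agreement is expected by chance given their marginal frequencies, which is why κ is lower than the raw agreement.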
Results of Individual Diagnostic Clinical Measures
Three studies met the inclusion criteria: 1 study for Achilles tendon rupture1 and 2 studies for Achilles tendinopathy.9,10 Two studies were classified as high bias1,10 and 1 as low bias9 per QUADAS-2. The characteristics of each study included in the review are listed in Table 2. The diagnostic accuracy of clinical measures for Achilles tendon injuries is given in Table 4, the pooled diagnostic properties of the clinical measures for Achilles tendon conditions are shown in Table 5, and the studied clinical measures are described in Table 6.
Achilles Tendon Rupture
One study qualified for inclusion based on the diagnosis of Achilles tendon rupture.1 This study was classified as high bias per QUADAS-2. Four clinical measures for this diagnosis (palpation, calf-squeeze test, Matles test, and Copeland test; Table 4) were included. The SN values for Achilles tendon rupture measures ranged from 0.73 (95% CI = 0.65, 0.81) to 0.96 (95% CI = 0.93, 0.99), whereas SP values ranged from 0.85 (95% CI = 0.72, 0.98) to 0.93 (95% CI = 0.84, 1.00), with the highest SN and SP both reported in the calf-squeeze test. Palpation for a gap in the Achilles tendon had an SN of 0.73 (95% CI = 0.65, 0.81), an SP of 0.89 (95% CI = 0.71, 0.97), a +LR of 6.64 (95% CI = 2.32, 19.91), and a −LR of 0.30 (95% CI = 0.23, 0.40). When the gap test result was positive, the posttest probability of a diagnosis of an Achilles tendon tear for that patient was altered to a moderate degree. When the palpation test result was negative, the posttest probability of such a diagnosis was only altered to a small degree. The calf-squeeze test had an SN of 0.96 (95% CI = 0.93, 0.99), an SP of 0.93 (95% CI = 0.75, 0.99), a +LR of 13.71 (95% CI = 3.54, 51.24), and a −LR of 0.04 (95% CI = 0.02, 0.10). Therefore, when this test result was positive, the posttest probability of an Achilles tendon tear was altered to a large and almost conclusive degree. Additionally, when this test result was negative, it also altered the posttest probability of not having an Achilles tendon tear to the same degree. The reported diagnostic values for the Matles test included an SN of 0.88 (95% CI = 0.78, 0.94) and an SP of 0.85 (95% CI = 0.66, 0.95), which resulted in a +LR of 6.29 (95% CI = 2.33, 19.96) and a −LR of 0.14 (95% CI = 0.07, 0.25). A positive result on the Matles test altered the posttest probability of a diagnosis of an Achilles tendon tear to a moderate degree; a negative result altered the probability of a diagnosis of not having an Achilles tendon tear to the same degree. 
The author reported only the SN value (0.78; 95% CI = 0.49, 0.94) for the Copeland test; therefore, +LR and −LR were not quantifiable.1 Thus, we could not compute the posttest probability of ruling out the diagnosis of an Achilles tendon tear for the Copeland test.
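To make the shift in posttest probability concrete, the following minimal Python sketch converts a pretest probability to a posttest probability through odds and labels the size of the shift with the thresholds given in the Methods. The likelihood ratios are the calf-squeeze and palpation values reported above; the pretest probability of 0.50, the function names, and the handling of boundary values (a ratio of exactly 2, 5, or 10 is placed in the higher category) are assumptions made for illustration.

```python
import math


def posttest_probability(pretest, lr):
    """Convert pretest probability to posttest probability via odds."""
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be between 0 and 1")
    post_odds = pretest / (1 - pretest) * lr
    return post_odds / (1 + post_odds)


def shift_category(lr):
    """Label the shift; the -LR thresholds are reciprocals of the +LR ones."""
    if lr < 0:
        raise ValueError("likelihood ratios cannot be negative")
    strength = math.inf if lr == 0 else max(lr, 1 / lr)
    if strength > 10:
        return "large"
    if strength >= 5:
        return "moderate"
    if strength >= 2:
        return "small"
    return "minimal"


PRETEST = 0.50  # hypothetical pretest probability, not taken from the studies
reported = {
    "calf-squeeze +LR": 13.71,
    "calf-squeeze -LR": 0.04,
    "palpation +LR": 6.64,
    "palpation -LR": 0.30,
}
for label, lr in reported.items():
    print(label, shift_category(lr), round(posttest_probability(PRETEST, lr), 3))
```

A different pretest probability would change the posttest probabilities but not the category labels, because the labels depend only on the likelihood ratios.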
Achilles Tendinopathy
Two studies qualified for inclusion based on the diagnosis of Achilles tendinopathy.9,10 One study10 was ranked as high bias, whereas the other study9 was ranked as low bias. Ten clinical measures (arc sign, palpation, Royal London Hospital test, self-report of pain, self-report of morning stiffness, palpation of tendon thickening, palpation of crepitus, stretch on passive dorsiflexion with knee joint in flexion, single-legged heel raise, and hop test; Table 4) for Achilles tendinopathy were included. Three of these measures qualified for meta-analysis: the arc sign, palpation, and the Royal London Hospital test (Table 5). The individual SN values for Achilles tendinopathy measures ranged from 0.03 (95% CI = 0.00, 0.08; palpation for crepitus)9 to 0.89 (95% CI = 0.75, 0.98; self-report of morning stiffness),9 whereas the individual SP values ranged from 0.58 (95% CI = 0.38, 0.77; self-report of morning stiffness)9 to 1.00 (95% CI = 1.00, 1.00; palpation of crepitus and arc sign).9 Composite testing (combining palpation, the arc sign, and the Royal London Hospital test)10 did not improve the diagnostic accuracy of the test and in fact was less diagnostically accurate (SN = 0.59 [95% CI = 0.47, 0.74]; SP = 0.83 [95% CI = 0.76, 0.89]; +LR = 3.47 [CI not reported (NR)]; and −LR = 0.29 [CI NR]). For the subjective measures, self-reports of pain (SN = 0.78 [95% CI = 0.58, 0.94]) and morning stiffness (SN = 0.89 [95% CI = 0.75, 0.98]) demonstrated a stronger screening value than diagnostic value (SP range, 0.58–0.77).9 Both of these measures, with +LRs/−LRs of 3.39/0.29 and 2.12/0.19, respectively, are only able to alter posttest probability to a small to moderate degree. Palpation for tendon thickening (SP = 0.90 [95% CI = 0.83, 0.97]; +LR = 5.9 [CI NR]) and crepitus (SP = 1.0 [95% CI = 1.00, 1.00]; +LR = infinity) function more strongly as diagnostic than as screening measures (SN range = 0.03–0.59). 
With +LRs of 5.9 and infinity, respectively, these measures will alter posttest probability for this diagnosis to a moderate to large, and almost conclusive, degree. All of the tendon-loading measures were much stronger as diagnostic measures (SP range = 0.87–0.93; +LR = 1.0–3.31) than as screening (SN range = 0.13–0.43; −LR = 1.0–0.66) measures. The shift in posttest probability for a diagnosis of Achilles tendinopathy is still altered only between a very small degree (passive dorsiflexion) and a small degree (single-legged heel raise and hop tests).
The diagnostic properties and total sample sizes of the 2 studies included in the meta-analysis are provided in Table 4. Both groups9,10 investigated palpation, the arc sign, and the Royal London Hospital test; all tests were for Achilles tendinopathy. Pooled analyses demonstrated similar diagnostic properties in all 3 clinical measures, with SN ranging from 0.42 (95% CI = 0.23, 0.62) for the arc sign to 0.64 (95% CI = 0.44, 0.81) for palpation. Pooled SP ranged from 0.81 (95% CI = 0.65, 0.91) for palpation to 0.88 (95% CI = 0.74, 0.96) for the arc sign. The DOR ranged from 5.49 (95% CI = 1.61, 18.71) for the arc sign to 7.41 (95% CI = 2.31, 23.74) for the Royal London Hospital test. No significant heterogeneity was found between studies.
To our knowledge, this is the first systematic review investigating the diagnostic accuracy of both subjective and objective orthopaedic clinical measures for Achilles tendon conditions. Only 3 studies met our inclusion criteria. Although all 14 examination measures affected posttest probability to some degree, we found inconsistencies in the study methods. As reported in previous systematic reviews of individual orthopaedic measures,37–44 the measures investigated in this review should not serve as the sole means of screening (ruling out) or diagnosing (ruling in) an Achilles tendon condition.
The diagnostic values of clinical measures for Achilles tendon tear have moderate to high diagnostic capability and are better used for diagnosis (SP and +LR values) than for screening (SN and −LR values).1 The calf-squeeze test demonstrated the strongest SN, SP, +LR, and −LR, indicating that it is currently the best test to both screen for and confirm a diagnosis of Achilles tendon rupture. Both the +LR and −LR altered to a significant and almost conclusive degree the posttest probability of the diagnosis of Achilles tendon tear.30 Palpation of a tendon gap demonstrated a moderate capacity (+LR 6.64) to confirm an Achilles tendon rupture but a moderate to low SN, as well as a −LR (0.30), which altered the posttest diagnosis probability to a small degree, indicating that this test should not be used in isolation for screening purposes. The reliability of these clinical measures was not reported. The only study1 investigating these tests demonstrated a high risk of bias, as well as study design concerns that included nonstandardized use of the reference standard, reference standard results that were interpreted with knowledge of the index test results, 13 surgeons performing the tests without an assessment of intrarater or interrater reliability, and lack of routine use of intertester blinding. Each of these methodologic concerns could limit the clinical utility of the diagnostic accuracy values gleaned from this study. Strengths of this study included a large sample size (149 men and 25 women), use of a control group with possible other ankle injuries, and patient selection applicability. The patients in this study ranged from 38 to 48 years old.1 Achilles tendon ruptures are most typically observed in men in the fourth to fifth decades of life,45 making this study demographically applicable.
Most Achilles tendinopathy clinical measures demonstrated greater diagnostic than screening capabilities. A total of 5 of the 14 clinical tests demonstrated strong SP, including crepitus, the arc sign, the Royal London Hospital test, single-legged heel raise, and tendon thickening.9,10 Despite this relatively high SP, we suggest caution in applying any of these measures as individual clinical assessments, given that morning stiffness and palpation were the only tests with moderate SN. The +LR of these 5 tests ranged from 3.06 (alteration of posttest diagnosis probability to a small degree) to infinity (almost conclusive posttest probability alteration). Considering the meta-analysis of 3 of these tests, +LRs ranging from 3.15 to 3.84 would again alter posttest diagnosis probability of Achilles tendinopathy only to a small degree. The reliability (κ) for measures investigating Achilles tendinopathy was determined for both intrarater and interrater reliability. Intrarater reliability ranged from poor to fair agreement (palpation [range = 0.27–0.72],10 arc sign [range = 0.55–0.72]10) to excellent agreement (Royal London Hospital test [range = 0.60–0.89],10 self-report of pain [0.81],9 self-report of morning stiffness [0.88],9 palpation [0.96],9 and arc sign [0.81]9). Interrater reliability ranged from poor to fair agreement (crepitus [0.02],9 Royal London Hospital test [0.37],9 stretch on passive dorsiflexion [0.14],9 and single-legged heel raise [0.26])9 to substantial to excellent agreement (palpation [range = 0.72–0.85],10 self-report of pain [0.75],9 self-report of morning stiffness [0.79],9 palpation [0.74],9 and arc sign [0.77]9). Although intrarater and interrater reliability were variable for the study by Maffulli et al,10 and the same measure in the 2 studies9,10 also showed variability, most of the clinical measures demonstrated substantial agreement.
We found no significant heterogeneity between studies for the measures qualifying for meta-analysis. The results of the study by Maffulli et al10 are nongeneralizable because of the distinct patient population (10 male athletes already on the waiting list at a special clinic for Achilles tendon surgical exploration). The study by Hutchison et al,9 the only study in this review demonstrating a low risk of bias, had methodologic concerns worthy of mention, including a limited number of participants (n = 21), risk of selection bias (1 of the 3 groups included colleagues of the authors), and the use of ultrasonography as a criterion reference. Although ultrasonography is considered an acceptable criterion reference, the limited evidence regarding its diagnostic accuracy compared with the longer-held standard of MRI is unconvincing (SN = 0.50 and SP = 0.81).12 The most appropriate clinical reference standard for the diagnosis of Achilles tendinopathy may need further investigation.
Currently, copious debate exists regarding the most effective treatment for Achilles tendinopathy.46–52 This discussion relies on the accurate determination of the existence of the condition. If future researchers focus on the diagnostic accuracy of the most appropriate clinical measures for Achilles tendon injuries and stratify patients within an applicable treatment-based classification, some of the current disparity may be eliminated.
Accordingly, the use of the clinical measures in this review should be tempered. The extent of their clinical utility would be strengthened with additional investigations, larger sample sizes, less bias, and consistent comparison with the strongest criterion reference(s) (Table 3). Reliance on the use of these measures as stand-alone screening or diagnostic measures is not suggested.
This study is not without limitations, including limiting the search strategy to only those articles written in English, selecting a study that did not investigate SP for the Copeland test,1 not comparing patient-inclusion and -exclusion criteria across the studies, relying on the general diagnosis of Achilles tendon pathology, and looking at single measures rather than clustered measures. The phrase Achilles tendon pathology does not differentiate tendinitis from tendinosis, for example. Therefore, the reader may not be clear as to whether these measures are assessments of tendinitis versus tendinosis, although the study characteristics listed in Table 2 may be helpful. Clustering of findings includes using 2 or more clinical measures and statistically combining their diagnostic accuracy values to determine whether the combination of testing findings improves their accuracy. One study10 did cluster the findings of 3 individual measures.
In general, Achilles tendon tear measures have stronger diagnostic accuracy properties than do measures investigating Achilles tendinopathy. The calf-squeeze test has the strongest diagnostic properties of all measures investigated, with a +LR of 13.71 and a −LR of 0.04, giving it the ability to rule in or rule out an Achilles tendon tear to a large and almost conclusive degree. The diagnostic properties of the different measures for ruling in and ruling out Achilles tendinopathy are quite variable but overall demonstrate a stronger diagnostic than screening capability. Because only 1 study demonstrated low bias, further high-quality studies investigating these measures are suggested.