Health care professionals have reported and used a multitude of special tests to evaluate patients with shoulder injuries. Because of the vast array of tests, educators of health care curriculums are challenged to decide which tests should be taught.
To survey experienced shoulder specialists to identify the common clinical tests used to diagnose 9 specific shoulder injuries and to determine if a core battery of tests should be taught to allied health professionals.
Descriptive survey administered via e-mail.
Of 131 active members of the American Shoulder and Elbow Surgeons, 71 responded to the survey.
Respondents were asked to complete a survey documenting their use of clinical tests during a shoulder examination. They answered yes or no to indicate their use of 122 different tests for diagnosing 9 shoulder conditions.
The average number of tests used for all pathologic conditions was 30 ± 9. The anterior apprehension and cross-body adduction tests were used by all respondents. At least 1 test was used for each of the 9 conditions listed (range = 1–7), and at least 50% of respondents used 25 tests. The tests were reviewed for valid diagnostic accuracy via the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool. High diagnostic value and a large amount of QUADAS variability have been reported in the literature for 16 of the 25 tests.
A small percentage (20%) of clinical tests is being used by most examiners. The 25 most common tests identified from this survey may serve as a foundation for the student's knowledge base, with the clear understanding that multiple clinical tests are used by some of the most experienced clinicians dealing with shoulder injuries.
Clinical experience and familiarity sometimes outweighed diagnostic accuracy in this sample.
The 25 most commonly used special tests may be a useful foundation for the knowledge base of athletic training students.
Many clinical tests have been reported and used by health care professionals in evaluating patients with shoulder problems. These specific tests are developed through clinical practice to reproduce signs or symptoms of pain, weakness, or instability by stressing anatomical tissues to rule in or rule out a specific condition. Because of the multitude of tests, deciding which tests to teach is challenging for educators of health care curriculums. Some educators may attempt to teach all tests, whereas others teach only specific tests based on familiarity or personal preference. Regardless of the approach, test validity and frequency of use in clinical practice should be important factors as the health care field continues to advocate evidence-based clinical practice.
The typical approach to establish the validity of a clinical test uses a clinical cohort study in which the specific test is compared with a reference standard, such as the presence of injured tissue (eg, labral or rotator cuff injury) confirmed by visualization during a surgical procedure.1 The usefulness of a test is determined by comparing the clinical finding (a positive or negative test) with the reference standard; this comparison is quantified by calculating the sensitivity, specificity, predictive values, and likelihood ratios for each test.2,3 These values provide helpful insight into how well the special test rules in or rules out the lesion of interest. However, the manner in which the study was conducted provides further information to the educator regarding the validity of the reported results. The quality of these diagnostic studies can be examined using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS), which was derived using a Delphi approach; a panel of 9 experts in diagnostic accuracy studies identified the items selected from 3 systematic reviews that should be incorporated into the assessment tool.4 Multiple rounds of rating, review, and discussion of the items took place until the final tool of 14 questions was developed. This instrument has been used in a recent systematic review of shoulder clinical tests.2
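As a minimal sketch of how these diagnostic values follow from comparing a clinical test with a reference standard, the Python example below computes them from a 2 x 2 table. The class and function names and the counts are hypothetical and chosen only for illustration; they are not data from this study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a clinical test with a reference standard."""
    true_pos: int   # test positive, lesion confirmed by the reference standard
    false_pos: int  # test positive, lesion absent
    false_neg: int  # test negative, lesion confirmed
    true_neg: int   # test negative, lesion absent


def diagnostic_values(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, predictive values, and likelihood ratios.

    Assumes every row and column of the table contains at least one patient;
    otherwise a ZeroDivisionError is raised.
    """
    sensitivity = t.true_pos / (t.true_pos + t.false_neg)
    specificity = t.true_neg / (t.true_neg + t.false_pos)
    ppv = t.true_pos / (t.true_pos + t.false_pos)
    npv = t.true_neg / (t.true_neg + t.false_neg)
    # A perfectly specific test has an unbounded positive likelihood ratio.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # A test with zero specificity has an unbounded negative likelihood ratio.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts for illustration only.
example = diagnostic_values(TwoByTwo(true_pos=30, false_pos=10, false_neg=20, true_neg=40))
```

With these hypothetical counts, the test detects 30 of 50 confirmed lesions and is negative in 40 of 50 shoulders without the lesion.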
Beyond knowing the diagnostic accuracy of a special test, it is reasonable for health care professionals to know how commonly the test is used in clinical practice. Because most clinical shoulder examination tests described in the orthopaedic literature have been developed by orthopaedic surgeons, it is logical to investigate how frequently they use these tests in their clinical practices. The American Shoulder and Elbow Surgeons (ASES) is a subspecialty society of leading national and international orthopaedic surgeons who specialize in shoulder and elbow surgery. This group's use of clinical shoulder tests is likely to represent the exemplar for health care professionals. Therefore, the purpose of our study was to combine a survey of orthopaedic surgeons who specialize in the shoulder with a qualitative rating of the available evidence for diagnostic accuracy of shoulder tests in order to help health care educators decide which shoulder tests are appropriate to teach to students.
Active members of the ASES were asked to complete a survey regarding the clinical shoulder tests used during the clinical examination of the shoulder. The survey was distributed via e-mail through the ASES home office. Before distribution, the survey was reviewed and approved by the Lexington Clinic Institutional Review Board. The survey was also reviewed for clarity by 2 orthopaedic surgeons who were not ASES members but frequently treated shoulder injuries. Consent for participation and release of results was assumed upon the voluntary completion and submission of the survey by each respondent.
Basic demographic information was obtained, including number of years in practice, age, and number of shoulder patients seen per year. Respondents were also asked to identify the types of shoulder injuries they treated in their practices based on the following categories: (1) sports medicine-athletic, (2) sports medicine-recreational, (3) industrial, (4) osteoarthritis, and (5) general shoulder practice. A simple binary survey was developed that included 72 different shoulder tests, chosen based on literature support or common clinical practice,5,6 which were divided into 9 sections based on the condition each test was designed to detect (Appendix). The respondent was instructed to check yes (test is currently used) or no (test is not currently used) for each test. In addition, the respondent could write in other tests he or she used to detect specific injuries. Respondents were instructed to return the completed survey within 4 weeks of initial receipt, either by e-mail or fax to the authors. An e-mail was sent to ASES members at 2 and 4 weeks after the initial distribution, reminding them to complete the survey by the deadline.
Once completed surveys were submitted, we performed a frequency analysis to identify the commonly used tests and the average number of tests used by each respondent. We considered a test used by 50% or more of respondents to be commonly used and carried out a literature search through the MEDLINE, CINAHL, and SPORTDiscus databases to determine if valid evidence existed to support the use of each such test. The search terms were the name of the individual shoulder test, in addition to the terms shoulder, clinical test, and special test. The abstracts generated from this search were reviewed by 2 authors and manually selected for comprehensive review.3 In some instances, review of a complete article yielded additional relevant references to be evaluated. Each article was reviewed by 2 authors (A.D.S. and T.S.) independently for quality and rated according to the QUADAS assessment tool.4 The QUADAS tool comprises 14 yes or no questions designed to rate a diagnostic study for validity. A score of 10 or more has been suggested to indicate a higher-quality study.2 From the existing literature, we used analyses of each clinical test based on its original reported use for sensitivity and specificity to determine the justification for use of each test.
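The frequency analysis and the 2 cutoffs described above (use by 50% or more of respondents, and a QUADAS score of 10 or more out of 14) can be sketched in Python as follows. This is an illustrative reconstruction, not the software used in the study; the function names, test labels, and yes or no answers are hypothetical.

```python
from __future__ import annotations


def usage_share(answers: list[bool]) -> float:
    """Fraction of respondents who answered yes for a single test."""
    return sum(answers) / len(answers)


def commonly_used(responses: dict[str, list[bool]], threshold: float = 0.5) -> dict[str, float]:
    """Keep only tests used by at least `threshold` of respondents."""
    return {
        name: usage_share(answers)
        for name, answers in responses.items()
        if answers and usage_share(answers) >= threshold
    }


def mean_tests_per_respondent(responses: dict[str, list[bool]]) -> float:
    """Average number of yes answers per respondent across all tests.

    Assumes every test has one answer from every respondent.
    """
    n_respondents = len(next(iter(responses.values())))
    total_yes = sum(sum(answers) for answers in responses.values())
    return total_yes / n_respondents


def is_higher_quality(quadas_score: int, cutoff: int = 10) -> bool:
    """Apply the suggested cutoff to a QUADAS score between 0 and 14."""
    if not 0 <= quadas_score <= 14:
        raise ValueError("QUADAS scores range from 0 to 14")
    return quadas_score >= cutoff


# Hypothetical answers from 4 respondents (True = test is currently used).
survey = {
    "test A": [True, True, False, True],
    "test B": [False, False, True, False],
    "test C": [True, False, True, False],
}
common = commonly_used(survey)
```

In this hypothetical data set, tests A and C meet the 50% criterion and test B does not.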
Of 131 active ASES members, 71 responded to the survey request, for a 54% response rate (age = 52 ± 9.2 years, years in practice = 21.2 ± 11, total number of shoulder patients seen per year = 1701.3 ± 1305.2). Each respondent described the distribution of shoulder injuries he or she treated: 39 (55%) treated patients from all categories, 54 (76%) treated sports medicine-athletic injuries, 56 (79%) treated sports medicine-recreational injuries, 45 (63%) treated industrial injuries, 55 (77%) treated osteoarthritis, and 60 (85%) treated general shoulder injuries.
The total number of special tests each surgeon used for each condition ranged from 7 to 19 (Table 1). The survey originally contained 72 clinical shoulder tests. Another 50 tests were added by write in, resulting in a total of 122 tests. A sample survey along with the frequency and percentage of responses is shown in the Appendix. The average number of tests used for all injury categories by each surgeon was 30 ± 9 tests, with at least 1 test being used for each of the 9 categories listed. Only 2 tests—anterior apprehension for anterior instability and the cross-body adduction test for acromioclavicular joint injury—were used by all respondents. Twenty-five tests (20%) were used by at least 50% of the respondents. For these 25 tests (Table 2), we searched the literature to locate diagnostic accuracy results and then qualitatively reviewed these with the QUADAS tool.
The electronic and hand-searched literature review identified 31 papers that reported sensitivity and specificity for 16 of the 25 most commonly used clinical tests. No data were available for the remaining 9 tests. The 2 reviewers agreed on the QUADAS scores for 21 of the 31 articles after independent review. When the reviewers did not agree, they conducted a second, nonindependent review.
No minimum level of acceptance for either sensitivity or specificity has been advocated in the literature. These values can be useful in deriving likelihood ratios, which can help determine the probability that a patient does (positive ratio) or does not (negative ratio) have a particular injury or condition.7 To illustrate the wide range of variability between the survey results and the existing literature, we list the range and median value for sensitivity and specificity and the positive likelihood ratio, negative likelihood ratio, and QUADAS scores by pathologic category in Tables 3–5.
Our purpose was to document which clinical shoulder tests were being used by orthopaedic shoulder specialists and at what frequency. The ASES members used a wide variety of special tests in evaluating a patient's shoulder injury; frequency of use equaled or exceeded 50% for only 25 of 122 tests. The lack of supporting diagnostic accuracy data for a special test did not preclude use of the test clinically, which likely indicates that clinical experience and familiarity outweighed diagnostic accuracy when this specialized group of orthopaedic surgeons decided which tests to use. However, in educating aspiring health care professionals with limited clinical experience, the reverse may be an appropriate starting point. In 7 of 9 pathologic categories, special tests did not have consistently high diagnostic values (Tables 3–5). Therefore, a clinician may be forced to rely on multiple tests to determine the presence of injury, and this possibility may explain the relatively small percentage of tests used collectively by the respondents.
Based on the multitude of self-reported clinical tests, our findings suggest a trend toward using variations of established clinical tests. For example, the original survey contained 11 tests to detect labral injury; however, respondents wrote in 7 tests, nearly doubling the total to 18. This could be due to differences in surgeons' residency training and personal preferences for certain special tests as well as anthropometric differences between clinicians and patients. Athletic training education curriculums are designed to teach students an initial evaluation process that comprises multiple components, one of which is the application of special tests in the classroom and laboratory settings. However, orthopaedic surgeons learn the most from their attending physicians during training on clinical patients, not laboratory partners. The variety of clinical tests reported as used in this study could reflect mentoring doctors' preferences and not necessarily the diagnostic accuracy of a specific test. Although variations of clinical tests are common and perhaps beneficial, it is important to understand that the diagnostic accuracy of a modified test may not correlate with standard methodologic practice.
Rotator Cuff and Impingement Tests
Rotator cuff testing was being used most often and with great variation compared with all other testing. Seven rotator cuff tests were used by more than 50% of respondents, with at least 1 test designed to assess the integrity or function of each individual rotator cuff muscle. These tests include static (isometric) and dynamic muscle testing maneuvers and lag signs.18,20 A lag sign assesses a rotator cuff muscle's ability to sustain a shortened end position.20 A positive sign results when a “lag,” or inability to maintain the position of the arm at specific end ranges of motion, occurs. Whether stress tests are more useful than lag signs in helping to make a clinical diagnosis of rotator cuff injury is unknown. The many tests used by these surgeons to assess rotator cuff injury may reflect a high prevalence of rotator cuff injury in the general population.57 Another possible reason for the high rate of use is the anatomical structure of the rotator cuff: injury can affect 1 or more of the 4 tendons surrounding the humerus. Various clinical tests have been developed over the years to help clinicians detect injury to specific rotator cuff tendons. The existing literature has consistently suggested that these tests are clinically useful with ample specificity; however, they are more useful when combined with the patient's history and chief complaints.18,20,30 Therefore, the 7 tests reported as being used should complement the subjective components of the examination process.
Diagnostic values for shoulder impingement tests have been investigated in multiple studies.25–28 These studies, which provided QUADAS scores ranging between 6 and 12 (median = 9.5), have reported moderate to high diagnostic values, with the Neer14 and Hawkins-Kennedy15 tests being consistently more sensitive than specific. To elicit or reproduce the painful symptoms, impingement tests require the application of slight overpressure once the humerus is positioned to narrow the subacromial space. The degrees of freedom of the rotator interval tissue are reduced in these testing positions, and when the tissue is stressed during the testing maneuver, a painful response can occur. This may be why the literature has consistently shown these tests to be more sensitive than specific, although the validity of the studies has varied. In addition, the multiple diagnoses associated with symptoms related to impingement syndrome (eg, internal derangement, instability, bony alterations)58 give credence to the idea that impingement is a physical finding rather than a specific diagnosis. Combining impingement tests may be more beneficial to the clinician in determining if shoulder impingement is present.25,27,28 Although the study results differed regarding which tests should be used or combined, Michener et al28 most recently generalized that 3 or more positive tests out of 5 (Hawkins-Kennedy, Neer, painful arc, empty can, and external-rotation resistance) can be useful in confirming the presence of impingement. The information from our survey data suggests that the majority of ASES surgeons used at least 3/5 tests (Hawkins-Kennedy, Neer, and Jobe [empty-can] tests). Diagnostic accuracy criteria provide some evidence to support their use; therefore, health care professionals should be instructed in at least 3 of these tests.
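The 3-or-more-of-5 rule attributed to Michener et al28 is simply a count of positive findings. The Python sketch below illustrates that counting logic only; the function and key names are hypothetical, and the example is not a clinical decision tool.

```python
from __future__ import annotations

# The 5 tests named in the text; the key spellings are assumptions.
IMPINGEMENT_TESTS = (
    "hawkins_kennedy",
    "neer",
    "painful_arc",
    "empty_can",
    "external_rotation_resistance",
)


def meets_three_of_five(findings: dict[str, bool], minimum_positive: int = 3) -> bool:
    """Return True when at least `minimum_positive` of the 5 tests are positive."""
    missing = [name for name in IMPINGEMENT_TESTS if name not in findings]
    if missing:
        raise ValueError(f"missing findings for: {', '.join(missing)}")
    positives = sum(bool(findings[name]) for name in IMPINGEMENT_TESTS)
    return positives >= minimum_positive


# Hypothetical examination findings (True = positive test).
exam = {
    "hawkins_kennedy": True,
    "neer": True,
    "painful_arc": False,
    "empty_can": True,
    "external_rotation_resistance": False,
}
rule_met = meets_three_of_five(exam)
```

Requiring all 5 findings before counting keeps an incomplete examination from being read as a negative result.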
Acromioclavicular Joint Injury
The cross-body adduction test41 was the single test reported by all surgeons as being used to detect acromioclavicular joint injury. The lone diagnostic study of this test is high quality (QUADAS score = 11), even though the diagnostic evidence is moderate, and its authors59 suggested that the test not be used in isolation but as one of several clinical tests. Tests that were highly sensitive were not equally specific, and those that were highly specific were not equally sensitive. However, our results show that only 45% used another test (the active compression [O'Brien] test) with the cross-body adduction test. Thus, more than half of the respondents used the cross-body adduction maneuver alone, even though the literature recommends not doing so, which could indicate that the physicians' clinical experience outweighed the literature. According to the literature recommendations, various acromioclavicular joint tests could be taught in educational programs. Yet it is difficult to specify which tests should be taught. Only 1 of 11 tests surveyed was used beyond the 50% threshold. The acromioclavicular shear test, which is described in commonly used physical examination textbooks,1,5 did not appear to be used by the physicians surveyed, and we found no supporting clinical utility information in the literature. This result may prompt educators to cease advocating use of the test in the clinical setting.
The Speed39 and Yergason tests40 were reported as the 2 tests used for biceps injury assessment (90% and 85%, respectively). Although both tests have been examined frequently in the literature,42,53–55 the high rate of use in this study is not consistent with the finding of most studies that neither test has strong clinical value for detecting biceps injury. This result is similar to that for the cross-body adduction test, in that a high rate of use does not coincide with the literature findings. However, the Speed test was originally used for diagnosing tenosynovitis, whereas the Yergason test was used for diagnosing long head of the biceps subluxation,39,40 even though neither condition was investigated when these tests were evaluated in the past. Both the Speed and Yergason maneuvers have been assessed for use in detecting biceps injury, tendinopathy, or other glenohumeral-specific injury (eg, labral injury) rather than their original reported uses.42,54,55 This suggests that the existing diagnostic values, although valid (ie, the median QUADAS score was 10), are more closely identified with conditions other than the actual condition each test was designed to detect, leaving each test's clinical utility for detecting tenosynovitis and subluxating biceps tendon, respectively, unknown.
Other clinical tests without reported diagnostic values are those used to assess scapular dysfunction. Scapular dysfunction tests are actually designed to detect a physical finding rather than a pathologic problem. According to the results of this survey, 2 tests are most commonly used to identify scapular dysfunction: the wall pushup and scapular retraction tests.37 Scapular dysfunction is a nonspecific response to a painful condition in the shoulder, rather than a specific response to certain glenohumeral injuries60; therefore, positive findings on either of these tests do not indicate injury. Because the existing tests used to identify scapular dysfunction are qualitative in nature and not associated with a specific injury, it is difficult to calculate a diagnostic value, especially a value that can be verified by a gold standard such as arthroscopy or other invasive means.
The diagnostic values for posterior and multidirectional instability tests have not been extensively evaluated. This survey revealed that 4 tests were used to evaluate posterior instability: the jerk test,13 load-and-shift test,10 posterior apprehension test,12 and posterior drawer test.11 Of these, only the jerk test has had diagnostic accuracy investigated and reported61; the remaining 3 maneuvers were widely used by more than half of respondents (65% to 83%). Posterior instability is rare, occurring in only 2% to 5% of those with shoulder instability,62 and this clinical diagnosis does not appear to be aided by special tests. The values associated with the jerk test indicate posterior instability resulting from a posterior-inferior labral injury, whereas the load-and-shift, posterior apprehension, and posterior drawer tests evaluate the integrity of the shoulder capsule.
Multidirectional instability, which is more common than posterior instability, may have an exclusive diagnostic test compared with posterior instability, as determined by this survey. Two tests reported as being used to assess multidirectional instability were the sulcus sign35 (n = 69, 97%) and the Gagey (hyperabduction) test36 (n = 39, 55%). We found no diagnostic accuracy studies in the literature for either test, but multiple authors36,63,64 have examined the anatomical basis in cadavers and observed that the inferior glenohumeral ligament is better stressed with humeral abduction than when the arm rests at the side of the body. This result is in contrast to our survey findings in that the sulcus sign maneuver with the arm down against the trunk of the body received almost complete consensus. Only a few more than half of the respondents used the Gagey test, which is performed in the preferred position with the arm abducted, suggesting that clinical experience or personal preference favors use of the sulcus sign.
Anterior instability is the most common form of instability.13 The anterior apprehension test was first introduced by Rowe and Zarins in 1981.8 A positive test occurs when apprehension and pain result from passive movement of the affected shoulder into maximal external rotation in humeral abduction. Multiple reports and texts advocate use of this test.22,23 Diagnostic studies22,23 showed that the sensitivity (median = 0.63) and specificity (median = 0.98) were variable, but both sets of authors concluded that the test can be helpful in identifying anterior instability when a patient reports apprehension and not pain during the maneuver. However, the supporting evidence is not strengthened by a high QUADAS score (median = 8.5), suggesting that the anterior apprehension test may be a viable option for assessing anterior instability, but a definitive, evidence-based recommendation is difficult to make.
Glenoid Labral Injury
A total of 18 tests were used to detect labral injury in the clinical setting. Of those, only 1 test, the active compression test, was used by more than 50% of the surgeons. The active compression test (commonly called the O'Brien test) was designed to detect labral injury and was originally reported as both highly sensitive (100%) and highly specific (98%) for the detection of labral injury.38 Subsequent authors42,43–50 have not been able to replicate these high diagnostic values and instead reported median sensitivity and specificity of 0.62 and 0.51, respectively. Upon closer examination, the QUADAS score for the original report was 5, whereas the scores for the other reports ranged between 7 and 12 (median = 9). A recent systematic review3 conducted exclusively on labral physical examination tests showed that validity was lacking in multiple diagnostic studies, implying that the previous literature's usefulness in advocating the use of any specific labral test is limited.
The low-moderate diagnostic values reported for the active compression test may be a result of the maneuver's technique. One of the primary mechanisms of labral injury is the peel-back lesion, which occurs when the long head of the biceps tenses the labrum beyond its anatomical limit as the arm is abducted and externally rotated.65 The arm position and load application of the O'Brien test are not ideal for reproducing the peel-back position because the arm is forward flexed and internally rotated. Internal rotation of the arm in the forward-flexed position before resistance winds the long head of the biceps brachii and places tension on the superior aspect of the labrum, which may be the crucial component in eliciting a positive test. Although the active compression test may not be ideal for replicating the peel-back lesion, it still elicits moderate results in detecting labral disease, probably due to tension on the biceps. As with the tests used for detecting rotator cuff injury, impingement, and acromioclavicular joint injury, when the active compression test is combined with other labral tests, a clinician can determine the presence of labral injury.42 It should be noted that very few labral tests have been evaluated to the extent of the active compression test, so it would be premature to recommend specific tests for use in conjunction with the maneuver at this time.
Effect on Education
The disparity between the wide array of clinical tests contained in this survey and the few tests that were reported as being commonly used could have a negative effect on health care education programs because no directive currently advises which tests should be taught in any specific curriculum. Education programs autonomously determine which tests should be taught based on each program's respective accreditation guidelines, clinical standards, and educational competencies. For example, the National Athletic Trainers' Association's educational competencies do not specifically state that all shoulder tests need to be taught or evaluated but instead state that a student should be able to “apply appropriate stress tests for ligamentous or capsular stability, soft tissue and muscle, and fractures” when assessing an upper extremity injury.66 It is difficult to discern which tests are considered “appropriate” due to the vagueness of the term. To satisfy educational standards, curriculums typically err on the side of caution and teach most, if not all, existing clinical shoulder tests; however, our survey results indicate that shoulder surgeons commonly use a small group of tests.
Diagnostic values such as sensitivity and specificity can be helpful to the educator attempting to decide which clinical tests should be taught. Yet no level of acceptance for either measure has been advocated in the literature. These values can be useful in deriving likelihood ratios, which can aid in identifying the probability that a patient does (positive ratio) or does not (negative ratio) have a particular injury or condition.7 Jaeschke et al7 noted that, in order to consider a test “acceptable” in either ruling in or ruling out a specific injury or condition, the minimum ranges should be 2–5 for positive likelihood ratios and 0.5–0.2 for negative likelihood ratios. For the 16 tests examined, varying degrees of clinical utility have been reported, with the median diagnostic values of some tests falling below what is considered acceptable. Conversely, other tests' median values are at or above the established level of acceptance, showing that educators' use of critical evaluation tools such as the QUADAS can help them make informed decisions about the application or instruction of clinical shoulder tests through critiques of the existing literature.
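To illustrate how likelihood ratios are derived from sensitivity and specificity and how they shift the probability of a condition, the Python sketch below uses the median values reported for the active compression test (sensitivity = 0.62, specificity = 0.51). Three points are assumptions made for illustration: likelihood ratios computed from median values only approximate those of any single study, the pretest probability of 0.50 is hypothetical, and the minimum ranges are read here as a positive ratio of at least 2 and a negative ratio of at most 0.5.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (positive, negative) likelihood ratios.

    Assumes 0 < specificity < 1 so that neither ratio divides by zero.
    """
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability through odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def meets_minimum(lr_pos: float, lr_neg: float) -> tuple:
    """Check each ratio against the assumed minimum cutoffs (2 and 0.5)."""
    return lr_pos >= 2.0, lr_neg <= 0.5


# Median values reported in the text for the active compression test.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.62, specificity=0.51)
rules_in, rules_out = meets_minimum(lr_pos, lr_neg)

# Hypothetical pretest probability of 0.50, for illustration only.
after_positive = post_test_probability(0.50, lr_pos)
after_negative = post_test_probability(0.50, lr_neg)
```

Under these assumptions, neither ratio reaches the minimum cutoff, and a positive or negative result moves the hypothetical 50% pretest probability only modestly.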
Our results reinforce the recent clinical preference toward using multiple tests to assess the presence of a specific condition. Special testing, however, is only one component of the comprehensive clinical examination, and clinical decision making should not be based solely on the findings of these tests. Typically, the subjective examination (ie, mechanism of injury and description, localization, and duration of pain) logically directs the clinician to begin thinking about specific diagnoses, which often lead to performing selective objective tests. Although we focused on one aspect of the examination process, all the assessment components are valuable, and the information obtained from this study may help to make the clinical examination of the shoulder more efficient by refining the special-testing segment of the process.
At this time, making recommendations as to which tests should be taught would be premature because of the discrepancy between the tests that have evidence supporting their use and the tests that have no such documentation. For those tests with literature support (16/25), validity varies. Therefore, we recommend that instructors, to maintain cross-discipline consistency, should at minimum teach students the 25 clinical shoulder tests reported as commonly used by the physicians responding to this survey. In addition, we advise the use of critical appraisal tools to determine if the supporting literature is of sufficient quality that an instructor should teach or refrain from teaching a particular clinical shoulder test. The critical appraisal tools can be used as guides for designing and standardizing the methods by which well-conducted studies are performed, which will, in turn, produce consistently reliable results.
We are the first to document the frequency of use of clinical shoulder tests by shoulder surgeons. However, certain limitations must be noted. First, although we asked if a number of tests were or were not used, we did not ask why each answer was selected. One of our primary observations was that a number of tests are being used that lack literature support. The survey should have included questions on whether each test was being used or not used because of the presence or absence of literature support or because of personal experience. Nevertheless, our goal was to establish a baseline of use, which was achieved.
This survey was not designed to assess differences among clinicians' skills, but not all clinicians have the same amount of experience performing a clinical shoulder examination, and they may have different examination methods (eg, palpation skills, special testing techniques). Also, not all clinicians uniformly employ the gamut of clinical shoulder tests. Patients who present with multiple ailments may prompt the clinician to select specific tests following an algorithmic approach that might have affected a respondent's selection.
Another limitation is that only 1 group of orthopaedic surgeons was surveyed. Yet, as a result of their membership in ASES, these individuals are considered experts regarding the evaluation and treatment of orthopaedic shoulder conditions. Therefore, we believed this group best illustrated the gold standard for practicing physicians. Although it would be prudent to eventually survey athletic training course instructors, clinical athletic trainers, and team physicians who are not shoulder specialists exclusively, it would not benefit the immediate objective of establishing a baseline for using clinical tests.
Lastly, in an attempt to verify if the clinical tests being used by shoulder surgeons had supporting evidence, we performed a literature review. As is the case with any literature review, some evidence may have been unintentionally overlooked. However, limiting the inclusion criteria to studies that examined clinical tests for their originally reported conditions rather than other injuries, as well as reporting at least sensitivity and specificity, helped to minimize this occurrence.
With this information as a baseline, educators can be guided in creating curriculums and teaching students. These data provide a reasonable estimate of the tests clinicians are commonly using for 9 major shoulder diagnoses and can serve researchers interested in furthering evidence-based practice through clinical validity tests. Future efforts should concentrate on determining the true clinical utility of the most commonly performed tests to see if they are indeed the tests that should be advocated and taught.
This study serves as a baseline for both clinicians and educators in providing a description of which tests are being used most by experienced shoulder surgeons. A total of 25 shoulder special tests have been identified by at least 50% of the responding orthopaedic surgeons as frequently used when evaluating patients with common musculoskeletal conditions of the shoulder. Given the close working relationship between physicians and athletic trainers, these 25 tests may serve as a foundation of standardized knowledge for educators. Instructors certainly have the discretion to teach any test they choose, but they should educate students regarding each test's level of diagnostic validity. They should also consider the test's original description and intended purpose but be cautious when evidence for the diagnostic accuracy of a particular clinical test is lacking.