There are more than ten classification systems currently used in the staging of hallux rigidus, which results in inconsistent radiographic interpretation and treatment. The reliability of hallux rigidus classification systems has rarely been tested. We sought to evaluate the intraobserver and interobserver reliabilities of three commonly used classifications for hallux rigidus.
Twenty-one plain radiograph sets were presented to ten American College of Foot and Ankle Surgeons board-certified foot and ankle surgeons. Each physician classified each radiograph set according to the Regnauld, Roukis, and Hattrup and Johnson classification systems on the basis of clinical experience and knowledge. The two-way mixed single-measure consistency intraclass correlation coefficient was used to calculate intrarater and interrater reliabilities.
The mean ± SD intrarater reliability of individual sets was fair to good for the Roukis (0.62 ± 0.19) and Hattrup and Johnson (0.62 ± 0.28) classification systems and borderline between poor and fair to good for the Regnauld system (0.43 ± 0.24). The interrater reliability of the mean classification was excellent for all three classification systems.
Reliable and reproducible classification systems are essential for guiding treatment and estimating prognosis in hallux rigidus. Herein, the Roukis classification system had the best intrarater reliability. Although many classification systems exist for hallux rigidus, the present results indicate that the three systems evaluated are reliable and reproducible.
Hallux limitus and hallux rigidus are among the most common pathologic disorders encountered by podiatric physicians, affecting 2.5% of the adult population.1 Hallux limitus is an arthritic condition that decreases motion at the first metatarsophalangeal joint; it tends to be more frequent in men, and 44% of affected patients are older than 80 years. The decrease in sagittal plane motion leads to pain, stiffness, and gait disturbances.2,3 The condition is progressive and often results in hallux rigidus, a complete lack of motion at the first metatarsophalangeal joint. Although no single underlying cause of hallux limitus has been identified, it has been attributed to hypermobility of the first ray, pes planus, a long first metatarsal, primus elevatus, iatrogenic complications, inflammatory diseases such as gout and rheumatoid arthritis, and genetics.2,4,5
Hallux rigidus and hallux limitus can be diagnosed with radiographs and clinical evaluation.2,6,7 The degree of joint space narrowing and the presence of osteophytes and first metatarsal elevation help determine the severity of the disorder. Numerous radiographic classification systems have been proposed to grade hallux rigidus.8-14 The three most commonly used are the Regnauld, Hattrup and Johnson, and Roukis classification systems. The Regnauld classification has three grades: grade I is defined as functional hallux limitus; grade II as joint adaptation with flattening of the first metatarsal head and pain at end range of motion; and grade III as arthrosis with severe flattening of the first metatarsal head, osteophytes, asymmetrical joint space narrowing, and erosions.8,11 The Hattrup and Johnson classification also has three grades: grade I is characterized by mild-to-moderate osteophyte formation with no joint space involvement; grade II by moderate osteophyte formation, joint space narrowing, and subchondral sclerosis; and grade III by increased osteophyte formation and loss of joint space.9 The Roukis classification was the first grading system to be applied prospectively; it is similar to the Regnauld classification but adds a stage IV, defined as less than 10° of range of motion with loose bodies and obliteration of the first metatarsophalangeal joint space.10
Surgeons rely heavily on radiographs and classification systems to confirm the clinical diagnosis and aid in surgical planning.6 The objective of this study was to assess the interobserver and intraobserver reliability of the Regnauld, Hattrup and Johnson, and Roukis classification systems for hallux rigidus. In particular, we examined the consistency of these three commonly used classification systems and discuss the implications for the treatment of hallux rigidus.
Materials and Methods
We selected the Regnauld, Roukis, and Hattrup and Johnson hallux rigidus classification systems for study (Table 1). Twenty-one plain radiograph sets of the foot (three radiograph packets, each containing seven sets of radiographs) were randomly selected from physicians' electronic systems based on the presence of hallux limitus deformity. Each set comprised three standard views (weightbearing anteroposterior, medial oblique, and lateral) (Fig. 1). Each radiograph packet contained the same seven sets of radiographs, with one of the three classification systems attached.
Ten randomly selected American College of Foot and Ankle Surgeons (ACFAS) board-certified foot and ankle surgeons were instructed to classify each packet of radiographs objectively according to its assigned classification system on the basis of their clinical experience and knowledge. Each physician received a handout of the instructions. The process was repeated three times with each physician to determine whether each physician classified each set of radiographs consistently and to reduce potential bias. All of the surgeons were familiar with the Regnauld, Roukis, and Hattrup and Johnson classification systems (Table 1).
Physician intrarater and interrater reliabilities were assessed using the two-way mixed single-measure consistency intraclass correlation coefficient (ICC).14,15 The ICC estimates the proportion of variance that is reliable, that is, attributable to true differences between the rated radiograph sets rather than to measurement error, and typically ranges from 0 to 1.15 Specifically, ICC[3,1] (the reliability of a single rating) and ICC[3,k] (the reliability of the mean of k ratings) are reported. The ICC was interpreted according to the reliability guidelines of Fleiss16: less than 0.40, poor reliability; 0.40 to 0.75, fair to good reliability; and greater than 0.75, excellent reliability. Analyses were conducted using R version 3.1.3 (Vienna, Austria).12-14
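Both ICC forms named above can be computed from a two-way analysis of variance on an n × k table of grades, with n radiograph sets as rows and k raters (for interrater reliability) or repeated rating sessions (for intrarater reliability) as columns. Under the consistency definition, ICC[3,1] = (MS_R - MS_E) / (MS_R + (k - 1) MS_E) and ICC[3,k] = (MS_R - MS_E) / MS_R, where MS_R is the between-sets mean square and MS_E is the residual mean square. The Python sketch below is a minimal illustration under stated assumptions, not the R code used in this study: the function names `icc3` and `fleiss_label` are hypothetical, grades are assumed to be coded as numbers (for example, grade I = 1), and the table is assumed to be complete.

```python
from __future__ import annotations

from statistics import fmean


def icc3(ratings: list[list[float]]) -> tuple[float, float]:
    """Return (ICC[3,1], ICC[3,k]) using the two-way mixed, consistency definition.

    ratings: n rows (radiograph sets) x k columns (raters or rating sessions).
    Hypothetical helper for illustration only.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into sets (rows), raters (columns), and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("all sets received identical mean grades; ICC is undefined")

    # Column (rater) effects are removed from the error term, so a constant
    # offset between raters does not lower the consistency ICC.
    icc_single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    icc_mean = (ms_rows - ms_err) / ms_rows
    return icc_single, icc_mean


def fleiss_label(icc: float) -> str:
    """Fleiss cutoffs: <0.40 poor, 0.40-0.75 fair to good, >0.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


# Illustrative 6 x 4 table (not data from this study).
table = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
         [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
single, mean_of_k = icc3(table)
print(round(single, 2), fleiss_label(single))        # 0.71 fair to good
print(round(mean_of_k, 2), fleiss_label(mean_of_k))  # 0.91 excellent
```

The example also shows why the mean-of-k form is higher than the single-rating form: averaging k ratings shrinks the error term, which is consistent with the mean classifications reaching excellent reliability while individual ratings were fair to good.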
The 21 radiograph sets were assessed by the ten ACFAS board-certified foot and ankle surgeons (nine men and one woman), who had 6 to 7 years of postgraduate education and 5 to more than 20 years of experience. Overall, the mean ± SD intrarater reliability of individual sets was fair to good for the Roukis and Hattrup and Johnson classification systems (0.62 ± 0.19 and 0.62 ± 0.28, respectively), whereas that for the Regnauld system was borderline between poor and fair to good (0.43 ± 0.24). The mean ± SD intrarater reliability of the mean classification across the three sets was excellent for the Roukis and Hattrup and Johnson classification systems (0.81 ± 0.13 and 0.78 ± 0.23, respectively) and fair to good for the Regnauld system (0.65 ± 0.24). Intrarater and interrater reliabilities are shown in Table 2.
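To make the description of the Regnauld result as borderline concrete, the sketch below maps a reported mean ± SD onto the Fleiss cutoffs at the mean and at one SD on either side. This band is a descriptive spread only, not a confidence interval. The helper names `band_labels` and `summarize_iccs` are hypothetical, the use of the sample (n - 1) standard deviation is an assumption because the text does not specify it, and only the 0.43 ± 0.24 and 0.62 ± 0.19 summaries come from the results above.

```python
from __future__ import annotations

from statistics import fmean, stdev


def fleiss_label(icc: float) -> str:
    """Fleiss cutoffs: <0.40 poor, 0.40-0.75 fair to good, >0.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


def band_labels(mean: float, sd: float) -> list[str]:
    """Fleiss labels at mean - 1 SD, at the mean, and at mean + 1 SD (hypothetical helper)."""
    return [fleiss_label(v) for v in (mean - sd, mean, mean + sd)]


def summarize_iccs(per_rater_iccs: list[float]) -> tuple[float, float, list[str]]:
    """Mean and SD of per-physician ICCs plus their Fleiss band (hypothetical helper)."""
    mean = fmean(per_rater_iccs)
    sd = stdev(per_rater_iccs)  # assumption: sample (n - 1) standard deviation
    return mean, sd, band_labels(mean, sd)


# Reported intrarater summaries for individual sets.
print(band_labels(0.43, 0.24))  # Regnauld: ['poor', 'fair to good', 'fair to good']
print(band_labels(0.62, 0.19))  # Roukis: ['fair to good', 'fair to good', 'excellent']
```

For the Regnauld system, the mean of 0.43 sits just above the 0.40 cutoff, and one SD below it (0.19) falls in the poor range, which is the sense in which its intrarater reliability straddles the two categories.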
Interrater reliability was lower for set 3 than for sets 1 and 2 for all classification systems (Table 2). The Roukis classification system attained consistently higher interrater reliability than the Regnauld or Hattrup and Johnson classification systems across sets 1, 2, and 3. Specifically, the interrater reliability of individual physicians was fair to good for the Roukis classification system in all three sets, whereas for the Regnauld and Hattrup and Johnson classification systems it was fair to good for sets 1 and 2 and poor for set 3. The lower reliability for set 3 may be attributable to subjective interpretation, variability in radiographic views, and differences in the physicians' professional backgrounds and experience. Interrater reliability of the mean classification across the ten physicians was excellent for all three classification systems, with the highest reliability attained by the Roukis classification system.
We evaluated the intrarater and interrater reliability of three commonly used hallux rigidus classification systems for grading foot radiographs in a sample of ten ACFAS board-certified surgeons. In doing so, we assessed whether particular classification systems exhibit greater consistency in grading hallux rigidus. The present results show that the Regnauld, Roukis, and Hattrup and Johnson classification systems are consistent, with little variability.
Hallux rigidus of the first metatarsophalangeal joint is one of the most common conditions affecting the foot. A multitude of classification systems for hallux rigidus has been proposed since 1930.17 Previous research has discussed the relative strengths and weaknesses of the various classification systems, including their reliance solely on radiographic findings without accounting for clinical findings.17 To our knowledge, only one other study, by Pate et al,18 has compared the reliability and intraobserver agreement of these classification systems. They recommended that radiographic grading systems for hallux rigidus be used with caution because intraobserver reliability was only 75%. In the present study, the Roukis and Hattrup and Johnson classification systems obtained higher intrarater reliability than the Regnauld system. With respect to interrater reliability, the Roukis classification system attained higher reliability than the Regnauld and Hattrup and Johnson classification systems.
The classification systems evaluated herein are useful for grading the severity of hallux rigidus and for directing treatment or evaluating prognosis. For example, a Roukis grade I would lead to either conservative treatment or a cheilectomy, Watermann, or Youngswick-Austin procedure, whereas a Roukis grade IV would more often lead to an implant or arthrodesis.12,19 Conservative treatments for hallux rigidus include icing, nonsteroidal anti-inflammatory therapy, shoe modification, corticosteroid injections, physical therapy, and orthotic devices.18 The goal of surgical treatment is to relieve pain and preserve mobility of the first metatarsophalangeal joint. The present study offers a quantitative comparison of the relative strengths of the classification systems based on interrater and intrarater reliability. This information may help clinicians plan treatment according to the severity of each patient's disease.
Limitations of this study include the relatively small number of observers and radiographs used. Future research with larger sample sizes may help validate the conclusions identified herein. In addition, the present study focused on the Roukis, Regnauld, and Hattrup and Johnson classification systems, which were chosen because of their widespread use. Other classification systems, such as those of Hanft, Coughlin, Drag, Oloff, and Jacobs, may also warrant evaluation.20,21
The present data show consistent reliability and reproducibility for the three classification systems, with the Roukis classification having the best intrarater reliability. Currently, there is no gold standard among hallux rigidus classification systems, which makes it difficult for physicians to compare results obtained with different classification systems.6,21 We conclude that each of the three classification systems evaluated is a reliable and reproducible tool for evaluating radiographs and grading hallux rigidus. Larger randomized prospective studies may help confirm the conclusions observed herein.
Financial Disclosure: None reported.
Conflict of Interest: None reported.
Department of Podiatry, West Houston Medical Center, Houston, TX.
Baylor College of Medicine, Houston, TX.