Dear Editor:

Minimal clinically important difference (MCID) estimates are often used to interpret change scores from measurement instruments. Researchers debate how MCID values should be estimated. In a recent paper, Tenan et al^{1} recommended adjusting for baseline severity in the analysis to avoid biased MCID estimates due to regression to the mean (RTM). They stated that anchored MCID estimation can be biased by RTM due to repeated measurements. They also stated that including baseline severity as a covariate in the analysis (the authors used baseline covariate-adjusted receiver operating characteristic [ROC] analysis) averts this bias. No proof or justification was offered to support these statements. In this letter, we argue that adjusting for baseline severity is bound to introduce bias, instead of warding it off.

*Regression to the mean* refers to change that occurs due to random fluctuations.^{2} Following a relatively high (or low) observation of a randomly fluctuating construct (eg, physical fitness), a repeated measurement will likely demonstrate a more moderate observation. Extreme values tend to regress to the mean. Therefore, RTM expresses itself as a negative correlation between baseline and change scores.
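A minimal sketch can illustrate this negative correlation. It assumes a simple measurement model that is not taken from any cited paper: each person has a stable underlying level, and every observation adds an independent random fluctuation. The function name and all parameter values are hypothetical.

```python
import numpy as np

def rtm_correlation(n=100_000, sd_true=10.0, sd_noise=10.0, seed=0):
    """Correlation between baseline and change when nothing truly changes.

    Assumed model (illustrative only): observed = stable level + random fluctuation.
    """
    rng = np.random.default_rng(seed)
    true_level = rng.normal(50.0, sd_true, n)        # stable underlying level
    t1 = true_level + rng.normal(0.0, sd_noise, n)   # baseline observation
    t2 = true_level + rng.normal(0.0, sd_noise, n)   # repeated observation
    change = t2 - t1                                 # pure fluctuation, no real change
    return np.corrcoef(t1, change)[0, 1]

r = rtm_correlation()
print(f"corr(baseline, change) = {r:.2f}")
```

Under this assumed model with equal standard deviations for the stable level and the fluctuation, the theoretical correlation is -0.5, although no one's underlying level changed.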

However, such a negative correlation is not necessarily a sign of RTM. Real reasons may explain why more severely affected patients improve more than less severely affected patients, eg, a treatment might be more effective in more severely affected patients.

The MCID is the change score deemed a minimal improvement that is considered important (we limit the discussion to improvement). The assessment of the improvement is based on an external criterion, namely, the anchor, which is often a single question that asks patients to rate their perceived change. It is assumed that patients have their own minimally important change thresholds, and it seems reasonable to consider the mean of the individual thresholds as the MCID to be estimated.^{3,4}

Tenan et al simulated several datasets (n = 5000 patients) consisting of a baseline score (T1; mean ± SD = 45 ± 20) and a follow-up score (T2; mean ± SD = 65 ± 20), with variable correlations between T1 and T2. A binary anchor variable was added based on a Bernoulli distribution such that a positive outcome was more likely when the change from T1 to T2 was ≥20. It should be noted that the authors thus simulated the *true MCID* (defined as the average individual minimally important change threshold) as 20, independent of the baseline score. Indeed, a standard ROC analysis, using the anchor as the state variable and the change score as the test variable, yielded 20 as the MCID estimate.
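The following sketch reproduces the spirit of such a simulation; it is not the authors' code. The anchor model is an assumption: each patient receives an individual threshold drawn from a normal distribution with mean 20, which amounts to a Bernoulli anchor whose success probability rises with the change score. The names `simulate` and `youden_cutoff`, the threshold SD of 10, and the default correlation of 0.11 are illustrative choices.

```python
import numpy as np

def simulate(n=5000, rho=0.11, mcid=20.0, sd_threshold=10.0, seed=1):
    """Baseline (45 +/- 20), follow-up (65 +/- 20), and a binary anchor."""
    rng = np.random.default_rng(seed)
    cov = 20.0**2 * np.array([[1.0, rho], [rho, 1.0]])
    t1, t2 = rng.multivariate_normal([45.0, 65.0], cov, size=n).T
    change = t2 - t1
    # Assumption: individual thresholds ~ N(mcid, sd_threshold), independent of baseline
    threshold = rng.normal(mcid, sd_threshold, n)
    improved = change >= threshold
    return t1, change, improved

def youden_cutoff(score, state):
    """Cutoff maximizing sensitivity + specificity - 1 ('score >= cutoff' is positive)."""
    order = np.argsort(score)
    s = score[order]
    y = state[order].astype(float)
    n_pos, n_neg = y.sum(), (1.0 - y).sum()
    pos_below = np.concatenate(([0.0], np.cumsum(y)[:-1]))        # improved below cutoff
    neg_below = np.concatenate(([0.0], np.cumsum(1.0 - y)[:-1]))  # not improved below cutoff
    sensitivity = (n_pos - pos_below) / n_pos
    specificity = neg_below / n_neg
    return s[np.argmax(sensitivity + specificity - 1.0)]

t1, change, improved = simulate()
print(f"proportion improved: {improved.mean():.2f}")
print(f"unadjusted ROC-based MCID estimate: {youden_cutoff(change, improved):.1f}")
```

Because the mean change equals the simulated mean threshold, about half of the patients are improved, and the unadjusted estimate should land near 20, up to sampling noise.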

Next, because the authors believed that MCID estimates can be biased by RTM and that accounting for the baseline score avoids this bias, they performed baseline covariate-adjusted ROC analysis. This analysis resulted in MCID estimates that were correlated with the baseline score, dependent on the correlation between T1 and T2. At this point, the authors believed their adjusted results (showing baseline-dependent MCID estimates) to be closer to the truth than their unadjusted results (which reflected the baseline-independent MCID values they had actually simulated). Why the authors came to this conclusion is a mystery to us. However, we do understand why baseline adjustment may lead to baseline-dependent MCID estimates (and this has nothing to do with RTM).

To clarify what happens when adjusting for the baseline score, we repeated the simulation in a different way. We adjusted for the baseline score by performing standard ROC analyses on baseline-stratified subgroups. We simulated a sample, similar to the first sample of the authors, but 5 times larger (n = 25 000). Then we split the sample into 5 subgroups based on quintiles of the baseline score. The correlation between the baseline score and the follow-up score was 0.11, and the correlation between the baseline score and the change score was –0.66. The results of the subgroup analyses are shown in the Table.
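The two reported correlations are linked by a short calculation. Assuming, as in the simulation design, that T1 and T2 share the same standard deviation $\sigma$ and have correlation $\rho$ (the symbols are introduced here for illustration), the correlation between the baseline score and the change score is

$$
\operatorname{corr}(T_1,\, T_2 - T_1)
= \frac{\rho\sigma^2 - \sigma^2}{\sigma \cdot \sigma\sqrt{2(1-\rho)}}
= -\sqrt{\frac{1-\rho}{2}} .
$$

For $\rho = 0.11$ this gives $-\sqrt{0.445} \approx -0.67$, in line with the value of –0.66 observed in the simulated sample. Whenever $\rho < 1$ and the standard deviations are equal, this correlation is negative by construction.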

Due to the stratification, subgroups 1 through 5 showed increasing mean baseline scores. The MCID estimates mirrored the results of the authors' analysis in their Figure 3A; lower baseline scores were associated with higher MCID estimates and vice versa. Because of the negative correlation between the baseline and change scores, the mean change score was higher in subgroups with lower baseline scores and lower in subgroups with higher baseline scores. Given the simulated minimally important change threshold of 20, this resulted in greater proportions of patients who improved in the subgroups with lower baseline scores and smaller proportions who improved in the subgroups with higher baseline scores.
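A stratified analysis of this kind might be sketched as follows. This is not the code used for the Table, and its numbers will differ from it: the anchor model (individual thresholds drawn from a normal distribution with mean 20 and SD 10, independent of baseline) and the function names are assumptions made for illustration.

```python
import numpy as np

def youden_cutoff(score, state):
    """Cutoff maximizing sensitivity + specificity - 1 ('score >= cutoff' is positive)."""
    order = np.argsort(score)
    s = score[order]
    y = state[order].astype(float)
    n_pos, n_neg = y.sum(), (1.0 - y).sum()
    pos_below = np.concatenate(([0.0], np.cumsum(y)[:-1]))
    neg_below = np.concatenate(([0.0], np.cumsum(1.0 - y)[:-1]))
    j = (n_pos - pos_below) / n_pos + neg_below / n_neg - 1.0
    return s[np.argmax(j)]

def stratified_mcid(n=25_000, rho=0.11, mcid=20.0, sd_threshold=10.0,
                    n_strata=5, seed=2):
    """Per baseline quantile group: mean baseline, mean change,
    proportion improved, and ROC-based (Youden) MCID estimate."""
    rng = np.random.default_rng(seed)
    cov = 20.0**2 * np.array([[1.0, rho], [rho, 1.0]])
    t1, t2 = rng.multivariate_normal([45.0, 65.0], cov, size=n).T
    change = t2 - t1
    # Assumption: the true threshold averages `mcid` in every subgroup
    improved = change >= rng.normal(mcid, sd_threshold, n)
    edges = np.quantile(t1, np.linspace(0.0, 1.0, n_strata + 1)[1:-1])
    stratum = np.searchsorted(edges, t1)  # 0 = lowest baseline group
    rows = []
    for k in range(n_strata):
        m = stratum == k
        rows.append((t1[m].mean(), change[m].mean(), improved[m].mean(),
                     youden_cutoff(change[m], improved[m])))
    return rows

for k, (b, c, p, est) in enumerate(stratified_mcid(), start=1):
    print(f"subgroup {k}: baseline {b:5.1f}  change {c:5.1f}  "
          f"improved {p:4.2f}  ROC cutoff {est:5.1f}")
```

In this sketch the true mean threshold is 20 in every subgroup, so any trend in the ROC cutoff across subgroups reflects the differing proportions improved rather than a baseline-dependent MCID.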

The cause of this baseline dependency of the MCID lies in the fact that the optimal ROC cutoff point (which defines the MCID value) depends on the prevalence of the condition (here, the proportion of improved patients).^{5,6} The optimal ROC cutoff point (Youden criterion) is the cutoff that classifies improved and not-improved patients with the least (weighted) misclassification. In large samples with normally distributed scores, this cutoff is characterized by equality of sensitivity and specificity. However, if the prevalence increases, the sensitivity of a given cutoff increases while its specificity decreases, and the opposite occurs if the prevalence decreases.^{5} Therefore, in (sub)samples with greater proportions improved, the optimal ROC cutoff point will be higher, whereas in (sub)samples with smaller proportions improved, the optimal ROC cutoff point will be lower. Only if the proportion improved is 0.5 does the ROC-based MCID estimate reflect the true MCID (as shown in subgroup 3 in the Table).^{7} In other words, the simulation study that Tenan et al performed actually showed that the ROC-based MCID depends on the proportion improved, even though the authors did not recognize it as such. In a previous paper,^{7} we demonstrated this phenomenon extensively.
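One way to make this dependence explicit is the following derivation, offered as a sketch. It assumes that the change score $X$ has continuous densities $f_1$ and $f_0$ (with distribution functions $F_1$ and $F_0$) in the improved and not-improved groups, and that $\pi$ denotes the proportion improved; this notation is introduced here and does not come from the cited papers. Classifying $X \ge c$ as improved, the Youden index is

$$
J(c) = \mathrm{Se}(c) + \mathrm{Sp}(c) - 1 = \bigl(1 - F_1(c)\bigr) + F_0(c) - 1 = F_0(c) - F_1(c),
$$

which is maximized where

$$
\frac{dJ}{dc} = f_0(c^*) - f_1(c^*) = 0 .
$$

By Bayes' rule, the probability of being improved at that cutoff is

$$
P(\text{improved} \mid X = c^*) = \frac{\pi f_1(c^*)}{\pi f_1(c^*) + (1 - \pi) f_0(c^*)} = \pi .
$$

Under these assumptions, the Youden cutoff is the change score at which the probability of being improved equals the proportion improved in the (sub)sample. If individual thresholds are symmetrically distributed around the true MCID and independent of the change score, that probability is 0.5 at the true MCID and increases with the change score, so $c^*$ coincides with the true MCID only when $\pi = 0.5$ and lies above it when $\pi > 0.5$.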

The bottom line is that, generally, ROC analysis is not a good method for estimating the MCID. Better methods include the adjusted predictive modeling method^{7} and a method based on item response theory.^{8} If one is concerned about the MCID being baseline dependent, simply stratifying on the baseline score or, for that matter, baseline covariate adjustment is not a good idea, but solutions do exist.^{9}

## REFERENCES

1. *J Athl Train*
2. *Int J Epidemiol*
3. *Expert Rev Pharmacoecon Outcomes Res*
4. *BMC Med Res Methodol*
5. *Stat Med*
6. *J Clin Epidemiol*
7. *J Clin Epidemiol*
8. *Value Health*
9. *Qual Life Res*