Context.—Standards have been developed for establishing reference intervals, but little is known about how intervals are determined in practice, interlaboratory variation in intervals, or errors that occur while setting reference intervals.

Objectives.—To determine (1) methods used by clinical laboratories to establish reference intervals for 7 common analytes, (2) variation in intervals, and (3) factors that contribute to establishment of “outlier” intervals.

Design.—One hundred sixty-three clinical laboratories provided information about their reference intervals for potassium, calcium, magnesium, thyroid-stimulating hormone, hemoglobin, platelet count, and activated partial thromboplastin time.

Results.—Approximately half the laboratories reported conducting an internal study of healthy individuals to validate reference intervals for adults. Most laboratories relied on external sources to establish reference intervals for pediatric patients. There was slight variation in intervals used by the central 80% of study laboratories, but some laboratories outside the central 80% had surprisingly low and high limits for their reference intervals. In some cases the intervals used by 2 laboratories had no overlap. For example, one laboratory considered a hemoglobin of 13.8 g/dL in a woman to be “low” while another considered the same value to be “high.” Three percent of reference intervals contained a limit that qualified as an “outlier” using standard statistical tests; we could not identify any practice associated with adoption of outlier intervals.

Conclusions.—Many laboratories adopt reference intervals from manufacturers without on-site testing of healthy individuals. Reference intervals used by facilities that forgo on-site testing are not statistically different from intervals validated with on-site studies.

Laboratory test results are commonly compared to a reference interval before caregivers make physiological assessments, medical diagnoses, or management decisions.

The importance of reference intervals is underscored by US regulation: the Clinical Laboratory Improvement Amendments of 1988 require that laboratories that introduce an unmodified, US Food and Drug Administration–cleared or approved nonwaived test system “verify that the manufacturer's reference intervals (normal values) are appropriate for the laboratory's patient population.”1 Laboratories that modify US Food and Drug Administration–approved tests or develop their own assays are required to establish reference intervals for those assays. Regulations also specify that reference intervals be included in laboratory reports or made available upon request to individuals who order tests.

Reference intervals are of 2 types.2 The most common type has been termed health-associated and is derived from a reference sample of persons who are in good health.3 For example, the reference interval normally reported for serum potassium is health-associated. The central 95% of the healthy adult population tested by many laboratories has serum potassium levels between 3.5 and 5.1 mEq/L, and these limits define the serum potassium reference interval. The other type of reference interval has been termed decision-based and defines specific medical decision limits that clinicians use to diagnose or manage patients. For example, a serum total cholesterol of more than 200 mg/dL defines the level at which diet and exercise are first recommended by most authorities to lower cholesterol in otherwise healthy adults.4 Similarly, the International Normalized Ratio range of 2.0 to 3.0 defines what some authors consider appropriate long-term anticoagulation in patients without atrial fibrillation or dilatation who have St Jude Medical bileaflet prosthetic aortic valves.5 Reference intervals that incorporate medical decision limits are often defined with clinical trials and adopted by laboratories from the medical literature. In this study we investigated the process clinical laboratories use to establish health-associated reference intervals. These intervals are also popularly known as reference ranges, normal values, normal ranges, biological reference intervals, and expected values.
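
The central 95% that defines a health-associated interval can be estimated nonparametrically from ranked results or, when results are roughly Gaussian, parametrically as the mean ± 1.96 SD. The sketch below shows both. As we understand it, CLSI C28-A2 favors the nonparametric approach with at least 120 reference individuals; the rank convention r = p(n + 1) with linear interpolation used here is an assumption (one common choice), and the function names and example data are hypothetical.

```python
import statistics


def nonparametric_interval(values, lower_p=0.025, upper_p=0.975):
    """Central 95% interval from ranked results, using rank r = p * (n + 1)."""
    data = sorted(values)
    n = len(data)

    def value_at(p):
        # 1-based (possibly fractional) rank, clamped to the observed data
        rank = min(max(p * (n + 1), 1.0), float(n))
        whole = int(rank)
        frac = rank - whole
        if whole >= n:
            return data[-1]
        # Linear interpolation between neighboring ranked results
        return data[whole - 1] + frac * (data[whole] - data[whole - 1])

    return value_at(lower_p), value_at(upper_p)


def parametric_interval(values, k=1.96):
    """Mean +/- k SD; appropriate only if the results are roughly Gaussian."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return mean - k * sd, mean + k * sd


# Hypothetical results from 119 reference individuals (not study data)
results = list(range(1, 120))
print(nonparametric_interval(results))  # (3.0, 117.0): ranks 3 and 117
```

With 119 results the limits fall exactly on ranks 3 and 117. Setting k=2 in the parametric version reproduces the mean ± 2 SD convention that some laboratories use.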

Health-associated reference intervals vary from laboratory to laboratory. For example, in a survey of 525 clinical laboratories the lower limit of the laboratory's total serum calcium reference interval ranged from 8.3 to 8.8 mg/dL (10th and 90th percentiles of laboratories) and the upper limit from 10.2 to 10.7 mg/dL.6 Some of this variation may have been due to differences among laboratories in clinical service needs, analytic platforms, populations of healthy individuals, or analytic imprecision that was present when reference intervals were determined. Yet some of the variation may have resulted from different approaches laboratories used to establish their reference intervals, as the process for establishing health-associated reference intervals has historically been poorly defined.

No single authoritative source specifies the process that clinical laboratories should use to establish health-associated reference intervals. The document that most closely approaches an authoritative source is the Clinical and Laboratory Standards Institute (CLSI; formerly NCCLS) document C28-A2.3 The CLSI document is based on the seminal work of Solberg and colleagues,7–12 who served in the 1980s on the Expert Panel on Theory of Reference Values of the International Federation of Clinical Chemistry and the Standing Committee on Reference Values of the International Council for Standardization in Haematology.

This CLSI standard recommends one approach for establishing reference intervals of a newly developed or modified analytical test system and a second, abbreviated, 20-specimen approach for validating the transfer of reference intervals among comparable analytical platforms. The more involved approach is applicable to instrument manufacturers first determining reference intervals for their test systems or to laboratories that develop their own assays or modify commercially available assays. The less involved approach allows clinical laboratories using unmodified US Food and Drug Administration–approved assays to validate reference intervals supplied by the manufacturers of their analytic instruments.
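
The abbreviated transference approach can be sketched as a simple decision rule. The thresholds below (2 or fewer of 20 results outside the limits: adopt; 3 or 4: test a second set of 20; 5 or more: re-examine) reflect how the CLSI rule is commonly summarized and are an assumption to be checked against the current CLSI text. The binomial calculation shows why the rule tolerates a few results outside: even a perfectly correct central-95% interval leaves each healthy result a 5% chance of falling outside. Function names are hypothetical.

```python
from math import comb


def transference_decision(n_outside, n_tested=20):
    """Hypothetical encoding of the abbreviated 20-specimen rule (thresholds assumed)."""
    if n_tested != 20:
        raise ValueError("this sketch only encodes the 20-specimen version")
    if not 0 <= n_outside <= n_tested:
        raise ValueError("n_outside must be between 0 and n_tested")
    if n_outside <= 2:
        return "adopt"          # at most 10% of healthy results outside the limits
    if n_outside <= 4:
        return "test 20 more"   # borderline: repeat with a second set of 20
    return "re-examine"         # interval or method may not be transferable


def prob_accept(n=20, max_outside=2, p_outside=0.05):
    """Binomial probability that a correct central-95% interval passes on the first set."""
    return sum(
        comb(n, i) * p_outside**i * (1 - p_outside) ** (n - i)
        for i in range(max_outside + 1)
    )


print(transference_decision(1))  # adopt
print(round(prob_accept(), 3))   # 0.925
```

In other words, a correct manufacturer's interval would be sent for retesting roughly 1 time in 13 on the first set of 20 specimens.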

The current version of the Centers for Medicare and Medicaid Services (CMS) Survey Procedures and Interpretive Guidelines for Laboratories and Laboratory Services indicates a “laboratory must evaluate an appropriate number of specimens to verify the manufacturer's claims for normal values or, as applicable, the published reference ranges.”13 The CMS provides no guidance about what constitutes an “appropriate” number of specimens, when age-specific and sex-specific intervals need to be established, or how data from tested healthy individuals are to be used to verify a manufacturer's claims. CMS agents judge each facility's efforts to validate reference intervals on a case-by-case basis during laboratory inspections.

Little is known about how laboratories establish health-associated reference intervals in practice. During 2004, 1.3% of laboratories enrolled in the College of American Pathologists (CAP) Laboratory Accreditation Program were cited by inspectors for failing to validate reference intervals properly, but the specific types of omissions were not documented and it is not clear whether inspectors approached this problem in a consistent manner. A survey of 500 laboratories conducted by the CAP in 2001 found that 390 (78% of laboratories) adopted manufacturers' published values for their reference intervals.2 Approximately one fifth of laboratories reported receiving help from manufacturers in validating a manufacturer's reference interval, either in the form of statistical consultation, test materials, or procedures. The survey did not collect information about the proportion of laboratories that adopted manufacturers' reference intervals without testing any healthy individuals.

The aims of the present study were to (1) describe methods actually used by clinical laboratories to establish reference intervals for common analytes, (2) document interlaboratory variation in the reference intervals used in practice, and (3) identify institutional factors and practices that influenced the reference intervals a laboratory adopted.

We considered the aims of this study important for 2 reasons. First, validating reference intervals can consume considerable laboratory time and resources, so methods that require less effort but still produce reliable intervals may appeal to clinical laboratories operating with tightly constrained resources. Second, we were concerned that, through oversight or conceptual error, some laboratories may have adopted unusual reference intervals that posed patient safety risks. We wished to examine how often “outlier” reference intervals (defined in the “Materials and Methods”) were found in laboratories, and whether any particular laboratory practices predisposed to the adoption of outlier intervals.

Study Design

The study was conducted according to the Q-Probes study format previously described, which relies on a convenience sample of clinical laboratories that subscribe to the CAP Q-Probes benchmarking program.14 After refinement of a standardized data collection instrument, CAP Q-Probes subscribers were mailed data collection instructions in late 2005.

Participants were asked to provide their laboratories' low and high values for reference intervals for 7 analytes: potassium, total calcium, magnesium, thyroid-stimulating hormone (TSH), hemoglobin, platelet count, and activated partial thromboplastin time (aPTT). Adult values (assuming a 32-year-old male inpatient) and pediatric values (assuming an 8-year-old male inpatient) were collected. In addition, hemoglobin reference intervals were collected for female adult and pediatric inpatients.

For each analyte, the following additional information was collected: unit of measure, primary specimen type, analytic instrument manufacturer, year the reference interval was originally established, year of the most recent revalidation of the reference interval, and year the primary instrument was placed into service. The methods the laboratory used to determine reference intervals for each analyte were also ascertained. Participants were asked whether they tested “healthy individuals” as part of the process they used to determine reference intervals, but no explicit definition of a healthy individual was provided.

Reference intervals for point-of-care instruments and capillary (fingerstick or earlobe) blood collections were excluded. In addition, intervals that were in the process of being reviewed were excluded, as were intervals for secondary laboratory testing sites and analyzers (if more than one testing site or analyzer was in use and reference intervals differed between sites or analyzers).

Participants were also queried about several institutional characteristics: occupied bed size, teaching status, pathology resident training status, government affiliation, institution location, institution type, CAP inspection status, and inspection status by the Joint Commission on Accreditation of Healthcare Organizations.

Laboratory Characteristics

Participants from a total of 163 institutions submitted data. Most of the institutions (97%) were located in the United States, and the remainder were located in Canada (2), Australia (1), Lebanon (1), and South Korea (1). Approximately 31% of participating institutions were teaching hospitals and 15% had a pathology residency program. Within the past 2 years, the CAP inspected 78% of the study laboratories. Hospital or laboratory inspections were conducted by the Joint Commission on Accreditation of Healthcare Organizations at 66% of participating institutions. Table 1 displays characteristics of participating institutions.

Table 1. Characteristics of Participating Laboratories*

For chemistry assays the majority of institutions reported that they most commonly tested serum, rather than plasma. Ninety-one institutions (56.5%) most commonly tested for potassium using serum; 94 (59.1%) used serum for calcium; 92 (57.9%) used serum for magnesium; and 120 (78.9%) used serum to test for TSH.

Statistical Analysis

Where required, calcium and magnesium reference interval data were standardized to milligrams per deciliter (if they had been reported in millimoles per liter or milliequivalents per liter). Before statistical tests for association were performed, values were screened for outliers. Several participating institutions did not answer all of the questions about demographic characteristics, institutional practices, or reference intervals for particular analytes or age groups. These institutions were excluded only from tabulations and analyses that required the missing data elements. All statistical analyses were performed using SAS v9.1 (SAS Institute Inc, Cary, NC).
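
The unit standardization step is simple arithmetic, sketched below. The study does not state which conversion factors it used; the approximate atomic masses here (calcium 40.08 g/mol, magnesium 24.305 g/mol) and the function name are assumptions. Because both ions are divalent, 1 mmol equals 2 mEq.

```python
# Approximate atomic masses in g/mol (numerically equal to mg/mmol)
MOLAR_MASS_G_PER_MOL = {"calcium": 40.08, "magnesium": 24.305}
CHARGE = {"calcium": 2, "magnesium": 2}  # both divalent: 1 mmol = 2 mEq


def to_mg_per_dl(value, unit, analyte):
    """Convert a calcium or magnesium concentration to mg/dL."""
    if unit == "mg/dL":
        return value
    if unit == "mmol/L":
        mmol_per_l = value
    elif unit == "mEq/L":
        mmol_per_l = value / CHARGE[analyte]
    else:
        raise ValueError(f"unsupported unit: {unit}")
    # mmol/L x mg/mmol gives mg/L; divide by 10 to get mg/dL
    return mmol_per_l * MOLAR_MASS_G_PER_MOL[analyte] / 10


print(round(to_mg_per_dl(2.5, "mmol/L", "calcium"), 2))   # 10.02
print(round(to_mg_per_dl(1.7, "mEq/L", "magnesium"), 2))  # 2.07
```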

The low and high values for the reference intervals were tested for associations with the institutions' demographic and practice variables in Tables 1 and 2, as well as with the analytic platform used by the laboratory. Individual associations were first tested using the nonparametric Kruskal-Wallis test. Variables with significant associations (P < .10) were then entered into a forward selection regression model. All variables retained in the final model were significant at the .05 level.
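
The Kruskal-Wallis screening step compares groups (for example, laboratories grouped by instrument manufacturer) using ranks rather than raw limits. The sketch below computes the H statistic with the usual tie correction; H is then compared with a chi-square distribution with k − 1 degrees of freedom for k groups. The study used SAS, so this pure-Python version and its names are illustrative only.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    if ties == n ** 3 - n:
        raise ValueError("all values are identical; H is undefined")
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical upper limits for 3 instrument groups (not study data)
print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 2))  # 7.2
```

For 3 groups (2 degrees of freedom) the upper-tail chi-square probability has the closed form exp(−H/2), about .027 for this example.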

Table 2. Source of Reference Intervals*

We used multivariate analysis of variance to perform a joint analysis of the dependent variables: low and high adult reference intervals and low and high pediatric reference intervals. This approach simultaneously tests whether the mean low/high vectors are statistically different for the analyte-specific predictor variables. Since the cell counts for the predictor variables were not equal, the significance level was set at .01.

We investigated outliers/atypical reference intervals using several techniques. First, we screened reference interval limits for outliers using 2 tests: the Tukey procedure (in which an outlier is defined as any value less than the first quartile − 1.5 × interquartile range [IQR] or greater than the third quartile + 1.5 × IQR) and a 2-pass/3 SD procedure (in which an outlier is defined as any value that fell more than 3 standard deviations from the mean, either during a first pass with all data included or during a second pass in which outliers identified during the first pass were excluded). We also used a logit regression model to examine whether unusual reference interval limits (those in the upper or lower decile) were associated with any of the demographic and practice variables listed in Tables 1 and 2. For the logit regression, the significance level was set at .05.
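
Both outlier screens can be sketched in a few lines. Quartile conventions differ between software packages (SAS offers several), so the `method="inclusive"` choice below is an assumption and may flag slightly different values than the study's SAS analysis; the data are hypothetical lower potassium limits, not study data.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


def two_pass_3sd_outliers(values, k=3.0):
    """Values > k SD from the mean on pass 1 (all data) or pass 2 (pass-1 outliers removed)."""
    def flag(reference):
        mean = statistics.mean(reference)
        sd = statistics.stdev(reference)
        return {i for i, v in enumerate(values) if abs(v - mean) > k * sd}

    first = flag(values)
    kept = [v for i, v in enumerate(values) if i not in first]
    second = flag(kept) if len(kept) >= 2 else set()
    return [values[i] for i in sorted(first | second)]


# Hypothetical adult potassium lower limits (mmol/L) from 20 laboratories
limits = [3.4] * 5 + [3.5] * 9 + [3.6] * 5 + [2.0]
print(tukey_outliers(limits))         # [2.0]
print(two_pass_3sd_outliers(limits))  # [2.0]
```

The second pass matters because a single extreme value inflates the first-pass SD and can mask other, milder outliers.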

When Are Reference Intervals Established?

A number of institutions reported that they did not know the year that their original reference interval had been established or the date of their most recent reference interval revalidation. Nine participants (5.5%) could not provide the year of their most recent revalidation of aPTT reference intervals, while 30 participants (18.4%) did not know the most recent year that potassium reference intervals were revalidated. Among laboratories that reported the year of their most recent revalidation, most had revalidated reference intervals within the past 5 years. Excluding aPTT, approximately two thirds reported that they revalidated their intervals in the same year that a new analyzer was purchased (Table 1). Nevertheless, in some laboratories and for some analytes the most recent revalidation of a reference interval had not occurred for more than 10 years, and one participant indicated the laboratory had last revalidated several reference intervals in 1983, more than 22 years before the study was conducted and more than 8 years before the laboratory's current analytic instrument was placed in service.

Methods Used to Establish Reference Intervals

Approximately half the participants reported that they conducted an internal study of healthy individuals to help establish chemistry and hematology reference intervals for adults, and half relied exclusively upon external sources (manufacturers' inserts, published literature, intervals used at other laboratories, or medical staff recommendations). For aPTT, 130 facilities (82.3%) reported conducting an internal study of healthy individuals to establish reference intervals. These data are shown in Table 2.

Most laboratories relied on external sources to establish reference intervals for pediatric patients (Table 2). Approximately one fourth of participants indicated they performed an internal study on healthy individuals to establish chemistry reference intervals for pediatric patients, and approximately 10% of facilities conducted internal studies to establish hematology and aPTT pediatric reference intervals.

One hundred thirty-six (84%) of 162 laboratories reported that they received some assistance from instrument manufacturers in establishing reference intervals. Of laboratories that received assistance from manufacturers, 111 (82%) reported receiving statistical support, 91 (67%) received consultative support, 55 (40%) followed a procedure for establishing reference intervals that had been provided by the manufacturer, and 27 (20%) received specimens for testing from the manufacturer.

We asked participants several detailed questions about the process they used to establish potassium reference intervals for adults. Of the respondents who indicated they had conducted an internal reference interval study of healthy individuals, half (65 laboratories) had tested specimens from between 21 and 50 healthy individuals, and one quarter (32 laboratories) had tested more than 100 specimens. Of the sites that tested healthy individuals to help establish reference intervals, 56 sites (43%) set their reference intervals using the mean of the tested reference population plus or minus 2 standard deviations, while the remaining 73 sites (57%) used test results to “verify” an external reference interval obtained from the manufacturer, a published textbook, or some other source. These data are shown in Table 3. Laboratories that set their reference intervals from the results of internal testing (mean ± 2 SD) did not tend to test more healthy individuals than laboratories that used internal testing to “verify” an external interval (Table 4; chi-square = 2.27; P = .32).
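
The Pearson chi-square test of independence behind the Table 4 comparison can be sketched as follows. The reported statistic of 2.27 with P = .32 is consistent with 2 degrees of freedom (for example, 2 methods × 3 sample-size categories), for which the upper-tail probability is exactly exp(−x/2); the category layout of Table 4 is an assumption here, and the example table is hypothetical.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_df2(stat):
    """Upper-tail probability; exp(-x/2) is exact only for 2 degrees of freedom."""
    return math.exp(-stat / 2)


print(round(chi_square_p_df2(2.27), 2))  # 0.32, matching the reported P value
```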

Table 3. Methods Used to Establish Potassium Reference Interval

Table 4. Number of Healthy Individuals Tested for Potassium and Method Used to Establish Reference Intervals*

Interlaboratory Variation in Reference Intervals

There was slight variation in reference intervals among the central 80% of study laboratories (Table 5). Upper limits of reference intervals tended to vary slightly more from laboratory to laboratory than lower limits did. There was no dramatic difference between the amount of interlaboratory variation in adult reference intervals and pediatric reference intervals, even though adult reference intervals were more often validated by testing specimens from healthy individuals and pediatric reference intervals were most often obtained from external sources. Reference intervals for the 7 study analytes showed similar levels of interlaboratory variation.

Table 5. Variation in Reference Intervals of Selected Analytes Among 163 Laboratories

Outside of the central 80% of laboratories there was more substantial variation in reference intervals. A few laboratories had surprisingly low and high limits for their reference intervals. For example, one laboratory reported that its reference interval for adult potassium extended down to 3.0 mmol/L, while another reported that the upper limit of its potassium reference interval extended up to 5.7 mmol/L. In some cases the reference intervals used by 2 laboratories did not overlap, and the upper limit of one laboratory's reference interval was lower than the lower limit of another's. For example, one laboratory considered an aPTT of 30 seconds to be “low” while another considered an aPTT of 30 seconds to be “high.” Using the Tukey procedure, 40 (3.1%) of 1271 adult reference intervals contained at least 1 limit that was an outlier. The 2-pass/3 SD procedure identified the same total number of outliers. There were no significant differences in the fraction of outlier reference interval limits among the analytes we studied.
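
The non-overlap situation described above is easy to state precisely: two closed intervals fail to overlap when one interval's upper limit is below the other's lower limit, and any value between them is then “high” for one laboratory and “low” for the other. The intervals below are hypothetical, not the study laboratories' values.

```python
def classify(value, interval):
    """Flag a result against a (low, high) reference interval."""
    low, high = interval
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "within"


def intervals_overlap(a, b):
    """True when two closed intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical aPTT intervals in seconds (not taken from study data)
lab_a = (32.0, 45.0)
lab_b = (21.0, 29.0)
print(classify(30.0, lab_a), classify(30.0, lab_b))  # low high
print(intervals_overlap(lab_a, lab_b))               # False
```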

Institutional Factors and Practices Associated With Reference Intervals

We examined all of the institutional factors and practices in Tables 1 and 2 to elucidate variables that were associated with reference intervals. For several analytes, particular instrument manufacturers were associated with higher or lower reference intervals. Tables 6 and 7 illustrate this relationship, showing statistically significant associations between instrument manufacturer and the median value of the upper and lower limits of adult reference intervals, respectively. Tables 8 and 9 show the same relationship for the upper and lower limits of pediatric reference intervals. Interestingly, specimen type did not affect the potassium reference intervals used by participants, even though the potassium concentration in plasma is lower than in serum.15 The method used to establish a reference interval (adoption from manufacturer versus on-site testing of a healthy population) was not associated with the interval in use.

Table 6. Relationships Between Instrument Manufacturer and Adult Reference Interval Upper Limit*

Table 7. Relationships Between Instrument Manufacturer and Adult Reference Interval Lower Limit*

Table 8. Relationships Between Instrument Manufacturer and Pediatric Reference Interval Upper Limit*

Table 9. Relationships Between Instrument Manufacturer and Pediatric Reference Interval Lower Limit*

We examined whether any of the characteristics in Tables 1 and 2 were associated with the use of aberrant adult reference interval limits (in the upper or lower decile). Institutions that established their reference intervals before 2001 were 3.1 times more likely to have aberrant high aPTT limits than were institutions that established their intervals during 2001 to 2005 (P = .02). Institutions that placed test instruments in service before 2002 were 1.3 times more likely to have aberrant low TSH limits than were institutions that placed instruments in service in 2002 to 2005 (P = .01). No other factor was statistically associated with aberrant reference interval limits.
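
Effect sizes such as “3.1 times more likely” from a logit model are odds ratios, exp(β). For a single binary predictor, exp(β) equals the odds ratio from the corresponding 2 × 2 table, sketched below with a Wald 95% confidence interval. An odds ratio is a ratio of odds rather than of probabilities; the two are close when the outcome is uncommon and diverge as it becomes more common. The counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2 x 2 table.

    a = exposed, aberrant limit      b = exposed, typical limit
    c = unexposed, aberrant limit    d = unexposed, typical limit
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_wald(10, 20, 5, 40)
print(round(or_, 2))  # 4.0
```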

To our knowledge, this is the first large survey describing how reference intervals are actually established by clinical laboratories “in the field.” We found that approximately half of all laboratories test healthy adults to establish reference intervals, but few test healthy children. Some of the laboratories that test healthy adults calculate their own reference intervals from their findings, but most use in-laboratory testing to “validate” reference intervals supplied by manufacturers.

Although different approaches were used to establish reference intervals, no particular approach was associated with the reference interval that laboratories ultimately established. In other words, there was no discernible difference between the reference intervals of laboratories that adopted manufacturers' reference intervals without further testing, laboratories that tested healthy subjects to validate manufacturer-supplied intervals, and laboratories that tested healthy subjects to calculate their own reference intervals.

Since the approach laboratories used to determine reference intervals did not appear to influence the interval that was ultimately established, we question the conventional wisdom that it is always necessary for laboratories to validate manufacturers' intervals with on-site testing before adopting the interval locally. If a manufacturer adequately describes the process it used to establish a health-associated reference interval, and the laboratory is using the manufacturer's analytic platform in accordance with the manufacturer's instructions and is testing a similar population, validation of a manufacturer's interval might consist of nothing more than the laboratory medical director making a professional determination that the supplied interval is appropriate for the local laboratory. This approach is contemplated in the CLSI standard,3 but not in the current version of the CMS survey procedures.13 

Klein and Junge16 sound a note of caution about adopting manufacturers' reference intervals without local on-site testing of healthy individuals. They are concerned that preanalytic procedures used by manufacturers (such as specimen collection and storage procedures) may not be adequately duplicated by every laboratory, creating the need for local testing of healthy individuals to ensure that manufacturers' intervals are locally applicable. We recognize that adopting manufacturers' reference intervals without on-site validation poses some risk. However, our study suggests that most laboratories that report pediatric reference intervals are already assuming this risk: little testing of healthy children takes place in laboratories, even though specimen collection procedures for pediatric patients vary a great deal from institution to institution. Establishing reference intervals for “special fluids” poses problems similar to those seen in pediatrics, because special fluid samples are rarely collected from healthy individuals without clinical suspicion of disease, and these reference intervals tend to be adopted from external sources without on-site validation.17 Reference intervals for children and special fluids can be set by multiple cooperating laboratories,18 but differences between test systems make this approach somewhat difficult to implement.19

While we do not believe local testing of healthy patients is required before adopting a well-documented reference interval from a manufacturer, there is nothing wrong with a local laboratory performing on-site testing of healthy individuals using the abbreviated 20-specimen CLSI procedure to validate a manufacturer's reference interval. If no facility performs on-site validation of manufacturers' reference intervals, the entire laboratory community will be relying on manufacturers and their regulators to ensure that initial reference interval studies are performed correctly. We must also point out that a decision by a laboratory director to forgo on-site testing of healthy individuals for the purpose of establishing a reference interval does not relieve the laboratory of its obligation to adequately calibrate a new instrument and test the instrument's analytic accuracy and precision before placing it into service.

We found that for most institutions, interlaboratory variability in the reference interval values for our study analytes was fairly low. The type of test instrument in use was the main source of reference interval variability that we could identify, explaining much of the variation in reference intervals. Variation in reference intervals among analytic platforms is probably appropriate, given that major instrument platforms show analytic bias relative to one another.20–22 

Were the reference intervals used by most study participants accurate? Without access to the patient populations being tested by each of our study laboratories, we cannot answer this question directly. However, we collected indirect evidence that the potassium reference intervals used by many participants were at least somewhat inaccurate. We found no significant differences in the potassium reference intervals established by sites that primarily tested plasma and those that primarily tested serum, despite the known difference between plasma and serum potassium values (approximately 0.3 mmol/L, depending on the analytic test system). Only 28 sites “converted” results for 1 specimen type to the equivalent level from the other type before reporting, and only 17 sites established different reference intervals for each specimen type. For the remaining sites, potassium reference intervals were likely to have been inaccurate for at least 1 of the 2 specimen types.

What are the consequences of establishing health-associated reference intervals that are slightly inaccurate? Our study was not designed to address this question explicitly. Health-associated reference intervals generally do not define medical decision limits that call for action. Five percent of the healthy population—by definition—falls outside the health-associated normal range, and for most analytes there is no evidence that intervening to bring analyte concentrations into the central 95% of the healthy population will make healthy outliers any healthier. In our personal experiences, health-associated reference intervals are often ignored by physicians, who make management decisions using personal “rules of thumb,” using established decision limits relevant to a patient's particular clinical context, or after making reference to a patient's previous values for the same analyte. But small inaccuracies in a laboratory's reference intervals may have a subtle impact on demand for repeat or follow-up testing. Small degrees of analytic bias or imprecision can have significant impact on provider behavior for some analytes,23 and a similar relationship may exist for reference intervals.
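
The point that 5% of healthy results fall outside a health-associated interval by definition compounds across a test panel. Under the simplifying assumption that results for different tests are independent (they usually are not, so this is only an illustration), the chance that a healthy person has at least 1 of n results flagged is 1 − 0.95^n.

```python
def prob_any_outside(n_tests, p_inside=0.95):
    """Chance a healthy person has at least one of n independent results outside."""
    return 1 - p_inside ** n_tests


for n in (1, 5, 20):
    print(n, round(prob_any_outside(n), 2))
# 1 0.05
# 5 0.23
# 20 0.64
```

This is one mechanism by which even small shifts in interval limits could change how often results are flagged and followed up.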

Activated partial thromboplastin time and TSH may represent situations where small aberrations in health-associated reference intervals impact clinical care. Some heparin dosing protocols target a therapeutic aPTT range of 1.5 to 2.5 times the midpoint of the reference interval, which means a laboratory's choice of a reference interval will influence heparin dosing.24 In the case of TSH there is some controversy about the definition of hypothyroidism, and some authors consider any patient with a TSH above the upper limit of the health-associated reference interval to be hypothyroid. Where this view prevails, the choice of a reference interval will influence the frequency with which hypothyroidism is diagnosed.
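
The heparin example can be made concrete. Assuming a protocol that targets 1.5 to 2.5 times the midpoint of the aPTT reference interval, as described above, two laboratories with different hypothetical reference intervals arrive at different therapeutic targets for the same patient. The function name and intervals are illustrative only.

```python
def heparin_target_range(reference_interval, low_factor=1.5, high_factor=2.5):
    """Therapeutic aPTT range as multiples of the reference-interval midpoint."""
    low, high = reference_interval
    midpoint = (low + high) / 2
    return low_factor * midpoint, high_factor * midpoint


# Two hypothetical aPTT reference intervals (seconds)
print(heparin_target_range((25.0, 35.0)))  # (45.0, 75.0)
print(heparin_target_range((28.0, 40.0)))  # (51.0, 85.0)
```

A 4-second shift in the interval midpoint moves the lower therapeutic target by 6 seconds and the upper target by 10 seconds.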

We found that a few laboratories in our study had adopted atypical/outlier reference intervals. Overall, 3.1% of reference intervals contained a limit that qualified as an outlier using either of 2 standard statistical tests for identifying outliers. This percentage of outliers is much higher than would be predicted on the basis of chance or from normal variation in the results of on-site testing of healthy individuals. However, we could not identify any institutional practices that were broadly associated with the adoption of aberrant reference intervals. Some outliers may have resulted from participants incorrectly completing the data collection forms or being confused about the units they were reporting.

The clinical implications of using highly atypical/outlier reference intervals are of much greater concern than those of using slightly inaccurate reference intervals. Test results are "framed" by reference intervals, and aberrant frames can bias decision-making.25 Our study was not designed to determine whether a laboratory's adoption of aberrant/outlier reference intervals had clinical consequences for the patients it serves, but this question deserves further study.

Several limitations of this study should be acknowledged:

  • First, data from study participants were self-reported, and we could not independently validate all of the data submitted.

  • Second, in the outpatient/outreach arena in the United States, approximately 30% of testing is performed by commercial laboratories that were not represented in this study. The approach to establishing reference intervals and the intervals in use by commercial laboratories may be different from those observed in this investigation.

  • Third, institutions installing new test platforms commonly perform a comparison study with the platform being replaced, in which a regression slope and intercept are calculated from split specimens tested by each platform. This practice may help laboratories appraise the “reasonableness” of reference intervals supplied by the manufacturer of a new test system, even if no on-site testing of healthy individuals is performed.

  • Fourth, the definition of a “healthy” individual may have varied among the study laboratories, accounting for some of the differences we observed in reference intervals.

  • Finally, the 163 participants in this study may not be representative of hospital-based laboratories in general. Their willingness to take part might reflect heightened concern about their own reference intervals or, alternatively, an unusually strong commitment to establishing reference intervals properly. As a result, our findings may not accurately represent the methods generally used to establish reference intervals in hospital-based laboratories and may overstate or understate how often outlier/atypical reference intervals are adopted in practice.
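
One way a method-comparison study might be used to appraise a manufacturer's interval is sketched below. The sketch rests on several assumptions. Ordinary least squares is used for simplicity, although method-comparison studies often prefer Deming or Passing-Bablok regression because both platforms have measurement error. The old platform's interval is taken as trustworthy. All specimen values, intervals, and function names are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def translate_interval(old_low, old_high, slope, intercept):
    """Map reference limits from the old platform's scale onto the new platform's."""
    return intercept + slope * old_low, intercept + slope * old_high


# Hypothetical split-specimen results (mmol/L): old platform (x) vs new platform (y).
old = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
new = [3.11, 3.62, 4.13, 4.64, 5.15, 5.66, 6.17]

slope, intercept = ols_fit(old, new)
expected_low, expected_high = translate_interval(3.5, 5.1, slope, intercept)

# Hypothetical manufacturer-supplied interval for the new platform.
mfr_low, mfr_high = 3.5, 5.3
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"old interval mapped to new platform: {expected_low:.2f}-{expected_high:.2f}")
print(f"manufacturer interval: {mfr_low:.2f}-{mfr_high:.2f}")
```

Here a hypothetical old interval of 3.5 to 5.1 mmol/L maps to about 3.62 to 5.25 mmol/L on the new platform. A laboratory director could weigh that result against the manufacturer's 3.5 to 5.3 mmol/L when judging whether the supplied interval is reasonable.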

Despite these limitations, we believe our data and generally accepted laboratory practices support several recommendations:

  • First, laboratories should document how their reference interval for each analyte is established, even when "validating" a local reference interval consists of nothing more than the laboratory director making a professional determination that an instrument manufacturer's interval is applicable to the local population. In conformance with regulatory requirements, this documentation should be retained until at least 2 years after the reference interval is no longer in use.

  • Second, the methods used by manufacturers to determine reference intervals should be adequately described in instrument literature and reagent inserts so that laboratory directors may make informed decisions about whether to adopt a manufacturer's intervals for their own institutions. Conducting local reference interval studies consumes time and money, and manufacturers that provide adequate documentation about their own studies will help conserve their customers' resources.

  • Finally, as part of implementing a new test system, laboratories should compare their chosen reference interval with the reference intervals recommended by manufacturers, published in the medical literature, or used by other laboratories. The adoption of outlier (and most likely inaccurate) reference intervals appears to occur with some frequency. Any extreme reference interval (for example, an interval that differs markedly from the manufacturer's or that has limits above the 95th percentile or below the 5th percentile of other laboratories' limits) should prompt a reexamination of the process used to establish the local interval and, as needed, consultation with the instrument manufacturer.
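
The percentile screen in the last recommendation is easy to automate. The sketch assumes that the laboratory has a list of the corresponding limits used by peer laboratories, for example, from an interlaboratory survey. The peer values, the function name `screen_limit`, and the choice of the "inclusive" percentile method in Python's `statistics.quantiles` are illustrative assumptions.

```python
import statistics


def screen_limit(local_limit, peer_limits):
    """Flag a local reference limit lying outside the 5th-95th percentile of peer limits."""
    cuts = statistics.quantiles(peer_limits, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]  # n=20 yields the 5th, 10th, ..., 95th percentiles
    return (local_limit < p5 or local_limit > p95), (p5, p95)


# Hypothetical upper limits of the hemoglobin interval for women (g/dL) at 21 peer labs.
peer_upper_limits = [15.0, 15.2, 15.5, 15.5, 15.6, 15.8, 15.8, 16.0, 16.0, 16.0, 16.0,
                     16.0, 16.1, 16.2, 16.2, 16.4, 16.5, 16.5, 16.8, 17.0, 17.2]

for local in (13.6, 16.0):
    flagged, (p5, p95) = screen_limit(local, peer_upper_limits)
    print(f"local upper limit {local}: peer 5th-95th = {p5:.1f}-{p95:.1f}, reexamine={flagged}")
```

With these hypothetical peer limits, the 5th and 95th percentiles are 15.2 and 17.0 g/dL. A local upper limit of 13.6 g/dL would therefore be flagged for reexamination, whereas 16.0 g/dL would not.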

1. Clinical Laboratory Improvement Amendments of 1988 (CLIA), 42 CFR §493.1253(b)(1)(ii) (2003).
2. Valenstein P, ed. Quality Management in Clinical Laboratories: Promoting Patient Safety Through Risk Reduction and Continuous Improvement. Chicago, Ill: College of American Pathologists; 2005:99–104.
3. How to Define and Determine Reference Intervals in the Clinical Laboratory; Approved Guideline—Second Edition. Wayne, Pa: Clinical and Laboratory Standards Institute; 2000. NCCLS document C28-A2.
4. Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). Bethesda, Md: National Institutes of Health, US Dept of Health and Human Services; 2002. NIH publication 02-5215.
5. Acar J, Iung B, Boissel JP, et al. AREVA: multicenter randomized comparison of low-dose versus standard dose anticoagulation in patients with mechanical prosthetic heart valves. Circulation. 1996;94:2107–2112.
6. Howanitz PJ, Cembrowski GS. Postanalytical quality improvement: a College of American Pathologists Q-Probes study of elevated calcium results in 525 institutions. Arch Pathol Lab Med. 2000;124:504–510.
7. Solberg HE. Approved recommendation (1986) on the theory of reference values. Part 1. The concept of reference values. Clin Chim Acta. 1987;167:111–118.
8. PetitClerc C, Solberg HE. Approved recommendation (1987) on the theory of reference values. Part 2. Selection of individuals for the production of reference values. J Clin Chem Clin Biochem. 1987;25:639–644.
9. Solberg HE, PetitClerc C. Approved recommendation (1988) on the theory of reference values. Part 3. Preparation of individuals and collection of specimens for the production of reference values. Clin Chim Acta. 1988;177:S1–S2.
10. Solberg HE, Stamm D. Approved recommendation on the theory of reference values. Part 4. Control of analytical variation in the production, transfer, and application of reference values. Eur J Clin Chem Clin Biochem. 1991;29:531–535.
11. Solberg HE. Approved recommendations (1987) on the theory of reference values. Part 5. Statistical treatment of collected reference values. Determination of reference limits. J Clin Chem Clin Biochem. 1987;25:645–656.
12. Dybkaer R, Solberg HE. Approved recommendations (1987) on the theory of reference values. Part 6. Presentation of observed values related to reference values. J Clin Chem Clin Biochem. 1987;25:657–662.
13. Clinical Laboratory Improvement Amendments (CLIA), Interpretive Guidelines for Laboratories, Appendix C, Survey Procedures and Interpretive Guidelines for Laboratories and Laboratory Services. Baltimore, Md: Centers for Medicare and Medicaid Services, US Dept of Health and Human Services. Available at: http://www.cms.hhs.gov/CLIA/03_Interpretive_Guidelines_for_Laboratories.asp. Accessed May 2, 2006.
14. Howanitz PJ. Quality assurance measurements in department of pathology and laboratory medicine. Arch Pathol Lab Med. 1990;114:1131–1135.
15. Solberg HR. Establishment and use of reference values. In: Burtis CA, Ashwood ER, Bruns DE, eds. Tietz Textbook of Clinical Chemistry and Molecular Diagnostics. 4th ed. Philadelphia, Pa: Elsevier Saunders; 2006:425–448.
16. Klein G, Junge W. Creation of the necessary analytical quality for generating and using reference intervals. Clin Chem Lab Med. 2004;42:851–857.
17. Dhondt JL. Difficulties in establishing reference intervals for special fluids: the example of 5-hydroxyindoleacetic acid and homovanillic acid in cerebrospinal fluid. Clin Chem Lab Med. 2004;42:833–841.
18. Rustad P, Felding P, Lahti A. Proposal for guidelines to establish common biological reference intervals in large geographical areas for biochemical quantities measured frequently in serum and plasma. Clin Chem Lab Med. 2004;42:783–791.
19. Klee GG. Clinical interpretation of reference intervals and reference limits: a plea for assay harmonization. Clin Chem Lab Med. 2004;42:752–757.
20. Klee GG, Killeen AA. College of American Pathologists 2003 fresh frozen serum proficiency testing studies. Arch Pathol Lab Med. 2005;129:292–293.
21. Steele BW, Wang E, Palmer-Toy DE, Killeen AA, Elin RJ, Klee GG. Total long-term within-laboratory precision of cortisol, ferritin, thyroxine, free thyroxine, and thyroid-stimulating hormone assays based on a College of American Pathologists fresh frozen serum study: do available methods meet medical needs for precision? Arch Pathol Lab Med. 2005;129:318–322.
22. Ross JW, Miller WG, Myers GL, Praestgaard J. The accuracy of laboratory measurements in clinical chemistry: a study of 11 routine chemistry analytes in the College of American Pathologists chemistry survey with fresh frozen serum, definitive methods, and reference methods. Arch Pathol Lab Med. 1998;122:587–608.
23. Gallaher MP, Mobley LR, Klee GG, Schryver P. The Impact of Calibration Error in Medical Decision Making: Final Report. Gaithersburg, Md: National Institute of Standards and Technology; 2004.
24. Heparin sodium injection, USP PDR information. Physicians' Desk Reference. Montvale, NJ: Thomson PDR; 2006.
25. Tversky A, Kahneman D. The framing of decisions and the psychology of choice. Science. 1981;211:453–458.

The authors have no relevant financial interest in the products or companies described in this article.

Author notes

Reprints: Paul N. Valenstein, MD, Department of Pathology, St Joseph Mercy Hospital, Ann Arbor, MI 48106-0995