Urine specific gravity (USG) should be measured at room temperature (20°C), but the temperature of the sample is not always considered.
To evaluate the effect of sample temperature on the measurement accuracy of a digital refractometer (DIG), manual optical refractometer (MAN), and hydrometer (HYD).
Descriptive laboratory study.
Urine specific gravity.
Experiment 1 (24 Brix (°Bx) samples) showed that measurements via the DIG and MAN did not differ from the reference, but the HYD provided lower or inconsistent values compared with °Bx, although its readings remained highly correlated with the °Bx solutions (r ≥ 0.89). The overall diagnostic ability of elevated USG cutoff values (≥1.020, ≥1.025, ≥1.030) was high for all tools (area under the curve >0.92). The number of misclassified samples increased from 0 to 2 at a cutoff of 1.020 to 1 to 3 at cutoffs of 1.025 and 1.030. Bland-Altman analysis showed that the DIG 5°C underreported slightly without reporting bias (r = −0.344, P = .13); all other plots for the DIG, MAN, and HYD showed considerably larger underreporting at higher concentrations (r = −0.21 to −0.97 with P < .02) at all temperatures. The outcomes of experiment 2 (33 fresh urine samples) using DIG 20°C as the standard demonstrated only negligible differences between the DIG and MAN at all temperatures but larger differences using the HYD.
All tools showed reporting bias compared with the °Bx solutions, which can affect the classification of low and high urine concentration at higher USG cutoff values, especially with a sample temperature of 37°C.
Urine specific gravity is preferably measured at 20°C, but temperature of the sample is often not standardized in practical settings on or around the field.
Refractometers showed reporting bias compared with the Brix solutions used as a reference standard for urine. In particular, values near a clinically relevant dehydration level (≥1.030) tended to be underreported, and this underreporting was amplified by a 37°C sample temperature. The underreporting was in the range of 0.001 to 0.003 depending on the tool, temperature, and concentration.
The tested digital and manual optical refractometer measures tended to be similar when Brix solutions and urine samples at 5°C and 20°C were analyzed. The hydrometer displayed a high degree of misreporting at all temperatures and should not be used to measure urine specific gravity.
Athletes should monitor their hydration status to prevent dehydration, heat-related illnesses, and fatigue.1 However, no unambiguous standard for measuring hydration status in a field situation exists.2 Therefore, combining at least 2 independent measurements to assess hydration status has been recommended.2 The National Athletic Trainers' Association1 advocated educating athletes to monitor acute differences in hydration status from body mass changes and additional assessments of chronic hydration status using urine concentration.
Although urine osmolality is seen as a more precise measurement, it may lack feasibility in a field setting.3 The simple low-cost measurement of urine specific gravity (USG)4,5 has consistently shown a good correlation with urine osmolality.6 In practice, USG is often measured as a marker of hydration status and used for athlete education.7 As the results affect the hydration advice athletes receive based on their classification (low or high urine concentration), an accurate assessment of hydration status is important.
Multiple factors may influence the quality and outcome of USG measurements, as refractometers, hydrometers (HYDs), and dipsticks may under- or overreport values.8–10 Most of the comparisons among refractometers have been reported by researchers11–15 using canine and feline urine and showed no clear consensus on whether optical or manual versus digital refractometers (DIGs) could be used interchangeably. In humans, investigators have reported that manual refractometers and DIGs displayed good correlation8,16 but that manual refractometers and DIGs may provide different values, especially when USG is high.16
Handling of the urine sample may influence the measurement. In clinical settings, the samples may be cooled and stored for multiple hours before they are measured,17 which is in contrast to athletic populations, whose samples are often measured directly after collection.18–21 The temperature differences of the samples (5°C to approximately 37°C) may affect measurement precision, as liquid density is influenced by temperature. However, manufacturers claim their refractometers automatically control for sample temperatures.16 The notion is that sample temperature has little or no bearing on the accuracy of the measurement reading, as the small amount of the sample needed immediately equilibrates to the temperature of the instrument. Yet, to date, no authors have provided detailed results that indicated sample temperature did not affect USG.
Our aim was to evaluate the reliable temperature range for a DIG, manual optical refractometer (MAN), and HYD by using reference Brix solutions and a representative sample range of urine samples at temperatures ranging from 5°C to 37°C against USG cutoff values that help to define stages of euhydration and underhydration in clinical sport nutrition practice.
METHODS
Design
We developed a quantitative study to evaluate the accuracy of different tools for measuring USG: DIG (model PEN-S.G.; Atago Co, Ltd), MAN (model RHC-200ATC; WMicro), and HYD (urinometer; C&A Scientific). All devices were operated according to the manufacturer's instructions, and we conducted the study in 2 parts. For experiment 1, we prepared aqueous sugar solutions across a range of degrees Brix (°Bx; 1°Brix = 1 g of sucrose in 100 g of solution) as an independent standard for specific gravity. For experiment 2, we used urine samples that were collected during a previous study that was approved by the Arizona State University Institutional Review Board (No. STUDY00007260).
Measurements of Specific Gravity
For experiment 1, in which we validated USG meters using Brix solutions, we prepared 24 validation solutions by mixing 500 mL of distilled water with laboratory-grade sucrose (crystal sucrose; Sargent Welch) weighed on a precision scale with a resolution of 0.0001 g (AL204; Mettler Toledo). We considered these solutions, expressed in °Bx, to be valid independent replacements for urine samples for objectively assessing USG, measured with a refractometer, over the range from 1.000 to 1.040 (Table 1).
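Because 1°Bx corresponds to 1 g of sucrose per 100 g of solution, the sucrose mass needed for a target concentration follows from a simple mass balance. The sketch below is illustrative only: it assumes that 500 mL of distilled water weighs approximately 500 g, and the function name is hypothetical rather than part of the study protocol.

```python
def sucrose_mass_for_brix(target_brix: float, water_mass_g: float = 500.0) -> float:
    """Grams of sucrose to dissolve in water_mass_g of water to reach target_brix.

    Degrees Brix is defined as Bx = 100 * m / (m + W), where m is the sucrose
    mass and W the water mass. Solving for m gives m = Bx * W / (100 - Bx).
    """
    if not 0 <= target_brix < 100:
        raise ValueError("target_brix must be in the range [0, 100)")
    return target_brix * water_mass_g / (100.0 - target_brix)


# Example: sucrose needed for a 5 degree Brix solution in ~500 g of water
# (assumption: 500 mL of distilled water is treated as 500 g).
mass_g = sucrose_mass_for_brix(5.0)
print(round(mass_g, 4))  # 26.3158
```

Note that the sucrose is added to the water, so the denominator of the Brix definition is the total solution mass rather than the water mass alone.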
For experiment 2, in which we compared the USG measurements of urine samples, we used 33 fresh urine samples (collected as part of another study) with USGs that ranged from 1.002 to 1.033 in a normal distribution (mean = 1.018 ± 0.009 based on DIG measurements). The fresh urine samples were measured at 20°C on the day of collection and then refrigerated at 5°C. Within 5 days of collection, the samples were measured again, first at 5°C and then after warming to 37°C on the same day.
For both experiments, the samples were prepared separately for measurement with the refractometers (2 × 15 mL in 45-mL transparent plastic freestanding centrifuge tubes; Evergreen Scientifics) and HYD (1 × 45 mL in glass Erlenmeyer flasks; Thermo Fisher Scientific). Measurements for both experiments were obtained in the following order: samples were cooled in the refrigerator (5°C) and then the closed samples were heated to 20°C and to 37°C. During experiment 1, we used an oven (model Heratherm OGS100; Thermo Fisher Scientific) and a Traceable Sentry Thermometer (Thermo Fisher Scientific) with a resolution of 1°C and accuracy of ±1°C to measure sample temperature. During experiment 2, we used a warm-water bath at 20 ± 1°C and 37 ± 1°C and a Traceable Long-Stem Digital ULTRA thermometer with a resolution of 0.1°C and accuracy of ±0.2°C. Each urine sample was measured at 3 temperatures on the same day within an 8-hour period. Before use, we calibrated both refractometers and the HYD using distilled water at 20°C. We measured all samples in duplicate and used the average of the 2 measurements for analysis when they were not more than 0.005 apart. When the difference between the first 2 measurements was >0.005, we measured the sample a third time and used the median of the 3 measurements for analysis. Results were rounded to the thousandths.
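A minimal sketch of the replicate rule follows, assuming that a third reading is taken only when the first 2 readings differ by more than 0.005 and that the median of the 3 readings is then used. The function name and signature are hypothetical and are not taken from the study.

```python
from statistics import mean, median
from typing import Optional


def usg_for_analysis(first: float, second: float,
                     third: Optional[float] = None,
                     tolerance: float = 0.005) -> float:
    """Return the USG value used for analysis, rounded to the thousandths."""
    # A tiny epsilon guards against binary floating-point noise at the boundary.
    if abs(first - second) <= tolerance + 1e-9:
        # Duplicates agree: use their average.
        value = mean([first, second])
    else:
        # Duplicates disagree: a third reading is required and the median is used.
        if third is None:
            raise ValueError("readings differ by more than the tolerance; "
                             "a third reading is needed")
        value = median([first, second, third])
    return round(value, 3)


print(usg_for_analysis(1.020, 1.022))         # duplicates agree, average is used
print(usg_for_analysis(1.010, 1.020, 1.018))  # duplicates disagree, median is used
```

Python's built-in rounding operates on binary floating-point values, so a value that falls exactly halfway between 2 thousandths may not round the way a hand calculation would.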
Data Analysis
We performed separate analyses between all tools for the 2 experiments. As the HYD data were not distributed normally, we tested the median differences for all outcomes at different temperatures using Mann-Whitney U tests. Spearman correlations (including 95% CIs based on the Fisher Z transformation) were conducted to determine the relationships between each tool's measurements and the reference measurements (°Bx) or the selected standard (DIG). The Bland-Altman analysis was used to determine mean estimation bias, direction, and 95% limits of agreement for the refractometers and HYD versus the standard. A level of P ≤ .05 indicated significance for both analyses. Additionally, we calculated receiver operating characteristics to assess the diagnostic capability of the DIG, MAN, and HYD for identifying high urine concentration (USG ≥1.020) based on the area under the curve (AUC) and sensitivity and specificity. An AUC of ≥0.90 is considered excellent; 0.80 to 0.89, good; and 0.70 to 0.79, fair when sensitivity and specificity are preferably >0.80.22 Finally, we evaluated the diagnostic validities of the equipment by comparing the refractometers with the accepted USG reference value of ≤1.020 as the upper limit classification of euhydration, and accuracy was depicted in a contingency table for each measurement temperature against the set reference.
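The Bland-Altman quantities named above can be illustrated with a short sketch. It assumes the conventional 95% limits of agreement (bias ± 1.96 SD of the differences) and uses a Pearson correlation between the differences and the pairwise means as the indicator of reporting bias; the source does not state which correlation coefficient was used for this step. The function name and the example values are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(tool, reference):
    """Mean bias, 95% limits of agreement, and proportional-bias correlation."""
    diffs = [t - r for t, r in zip(tool, reference)]
    means = [(t + r) / 2 for t, r in zip(tool, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Pearson correlation between differences and pairwise means (assumption):
    # a negative r means the tool underreports more as concentration rises.
    d_bar, m_bar = mean(diffs), mean(means)
    sxy = sum((d - d_bar) * (m - m_bar) for d, m in zip(diffs, means))
    sxx = sum((d - d_bar) ** 2 for d in diffs)
    syy = sum((m - m_bar) ** 2 for m in means)
    r = sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
    return {"bias": bias, "limits": limits, "r": r}


# Hypothetical readings for illustration only
reference = [1.005, 1.010, 1.015, 1.020, 1.025, 1.030]
tool = [1.005, 1.010, 1.014, 1.019, 1.023, 1.027]
result = bland_altman(tool, reference)
print(round(result["bias"], 4), round(result["r"], 2))
```

In this made-up example the tool agrees with the reference at low concentrations and reads progressively lower at high concentrations, which produces both a negative mean bias and a strongly negative correlation.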
We performed an a posteriori sample-size calculation using G*Power (version 3.0.10; Heinrich-Heine-Universität Düsseldorf) based on the mean difference of −0.0075 between the standard (°Bx reference solutions) and HYD 37°C for experiment 1 and the mean difference (+0.0049) between the DIG 20°C and HYD 37°C for experiment 2.23 For experiment 1, we used α = .05, power = 0.80, 2 tails, and a calculated strong effect size (d = 1.09), which resulted in a minimum sample size of 9. Repeating this calculation for experiment 2 with α = .05, power = 0.80, and a calculated medium effect size (d = 0.71) resulted in a required sample size of 18 measurements. The sample sizes of 24 (experiment 1) and 33 (experiment 2) amply exceeded the sizes required for a power of 80%.
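The reported minimum sample sizes can be approximated without G*Power. The sketch below is not the G*Power algorithm, which uses the noncentral t distribution; it assumes a paired (one-sample) 2-tailed t test, which the source does not state explicitly, and applies a normal approximation with a small-sample correction term. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def paired_t_sample_size(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a 2-tailed paired (one-sample) t test with effect size d.

    Uses n = ((z_alpha + z_beta) / d)^2 + z_alpha^2 / 2, a normal approximation
    with a correction term for the use of the t distribution.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    n = ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 2
    return ceil(n)


print(paired_t_sample_size(1.09))  # 9
print(paired_t_sample_size(0.71))  # 18
```

Under these assumptions the approximation returns the same minimum sample sizes as those reported for the 2 experiments.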
RESULTS
Experiment 1: Reporting the Accuracy of the USG Meters Against °Bx Reference Solutions
We compared the outcomes of the DIG, MAN, and HYD with the °Bx reference solutions. The measurements per reference solution are shown in Table 2. The USG cutoffs (1.020, 1.025, and 1.030) represent commonly used cutoff values in the literature5,21 for hydration status (euhydration, dehydration, and severe dehydration, respectively). At a higher cutoff value (ie, 1.025 or 1.030), a substantial number of measurements were classified below the cutoff for all tools. As indicated by color coding in Table 2, the number of samples misclassified as lower than the cutoff increased from 0 to 2 at 1.020 to 1 to 3 at 1.025 and 1.030 USG. All other results are presented in Table 3. The measured median (interquartile range) USG value for the reference °Bx solution was 1.025 (1.020–1.030). No median differences were seen between the °Bx solution and the DIG and MAN at any temperature or for the HYD 5°C (P > .05), reflecting similar reporting for averages. Only HYD 20°C (1.019 [1.022–1.030], P = .040) and HYD 37°C (1.017 [1.013–1.022], P = .005) demonstrated clearly lower values than those of the reference.
Ranking for all measurements of the DIG, MAN, and HYD with the °Bx solution can be considered strong for all temperatures, with r = 0.99 to 1.00, except for the HYD 5°C, for which we calculated a slightly lower correlation of r = 0.89.
Bland-Altman analysis revealed that the DIG and MAN consistently underreported at all temperatures, with mean differences ranging from −0.001 to −0.003. The DIG 5°C underreported slightly without reporting bias for low and high measurements (r = −0.344, P = .13); all other plots for DIG, MAN, and HYD exhibited considerably larger underreporting biases at higher concentrations (P ≤ .012) at all temperatures except for the HYD 5°C, which displayed overreporting bias at lower values.
The overall diagnostic ability of all tools for measuring high USG values was excellent, with an AUC from 0.977 to 1.00, except for HYD 5°C, which demonstrated a good AUC of 0.867 for identifying a USG threshold of 1.020. Sensitivity was high in all cases, with slightly lower values for the HYD 5°C (0.750) and HYD 20°C (0.875); specificity based on a cutoff value of 1.020 was optimal at 1.000 in all cases.
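The distinction between AUC and cutoff-based sensitivity can be made concrete with a small sketch. The AUC is computed here with the rank-based (Mann-Whitney) formulation, which is one standard way to obtain it and is not necessarily the software routine used in the study. The function names and the example values are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive sample outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # Ties count as half a win; both classes must be present.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, cutoff=1.020):
    """Sensitivity and specificity when readings >= cutoff are called 'high'."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference values and tool readings for illustration only
reference = [1.010, 1.015, 1.018, 1.020, 1.024, 1.028, 1.030, 1.034]
tool = [1.010, 1.015, 1.017, 1.019, 1.023, 1.026, 1.028, 1.031]
labels = [r >= 1.020 for r in reference]  # truly high urine concentration

print(roc_auc(tool, labels))                  # 1.0
print(sensitivity_specificity(tool, labels))  # (0.8, 1.0)
```

In this made-up example the tool ranks every sample correctly, so the AUC is perfect, yet slight underreporting pushes 1 truly high sample below the fixed 1.020 cutoff and lowers sensitivity while specificity stays at 1.0.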
Experiment 2: Comparison of Refractometers
We compared the MAN and HYD outcomes with the DIG measurement at 20°C that was selected as the standard (Table 4). The measured median (interquartile range) USG value for the standard DIG at 20 °C was 1.021 (1.010–1.024). No differences were seen in comparison with the DIG 5°C (1.021 [1.009–1.024], P = .93), DIG 37°C (1.020 [1.009–1.024], P = .81), MAN 5°C (1.020 [1.011–1.025], P = .71), MAN 20°C (1.021 [1.010–1.025], P = .81), or MAN 37°C (1.021 [1.010–1.025], P = .71). Although not different from the selected standard DIG 20°C, the median reported values for HYD 5°C (1.018 [1.008–1.022], P = .20) and HYD 20°C (1.017 [1.008–1.021], P = .20) were lower. The only significant difference was between the standard and HYD 37°C (−0.0049, P = .008). All correlations with the DIG 20°C can be considered strong at all temperatures, as they ranged from 0.97 to 1.00. This result indicated that DIG measurements at 5°C and 37°C and MAN and HYD measurements at all temperatures can be ranked as consistently similar to the DIG 20°C standard.
Bland-Altman plots showed that the HYD data consistently underreported hydration status at all temperatures. At 37°C, the measurements of all tools showed reporting bias, as did those of the MAN 5°C, which was indicated by moderate correlations between the differences between methods and the means of both methods (r = 0.37 to 0.52, P ≤ .034). No difference was observed for DIG 5°C or MAN 20°C; the absolute differences between the DIG 20°C standard and DIG 37°C (−0.001), MAN 5°C (0.001), and MAN 37°C (0.001) measurements were small, but the HYD differences were considerably larger: −0.002 for the HYD 5°C and HYD 20°C and −0.006 for the HYD 37°C.
The overall diagnostic ability of all tools to classify low and high urine concentrations, based on receiver operating characteristics, was excellent, and all were able to identify a USG threshold of 1.020. The AUC scores for the DIG and MAN were ≥0.98 and the HYD scores ranged from 0.92 to 0.96. Sensitivity and specificity for the DIG and MAN at all temperatures were >0.85. For the HYD, sensitivity ranged from 0.75 to 0.80 and specificity from 0.85 to 0.90.
DISCUSSION
When we compared the °Bx solution with the USG measurements, reporting bias and a misclassification of samples at higher cutoff points (1.025 and 1.030) were evident, especially for the samples measured at 37°C. Overall, the HYD displayed the highest degree of misreporting at all temperatures. Experiment 2, in which we measured urine samples using the 20°C DIG as the standard, showed that the DIG and MAN did not differ in average values, ranking of outcomes, or the receiver operating characteristic analysis, whereas the HYD differed clearly from the selected standard, as reflected in its greater bias.
We suggest that the DIG and MAN performed similarly, which contradicts the earlier findings of Minton et al,16 who, using Bland-Altman analyses, suggested an underestimation bias for 2 types of DIGs compared with a MAN as USG values increased. In the current study, the HYD underreported in both experiments versus the selected standards. Rudinsky et al15 compared 4 refractometers and noted measurement differences of ±0.002. We observed that the urine measurement differences were smaller (±0.001). Athletic trainers, dietitians, and physicians expect consistent USG results, regardless of the refractometer used; yet it is important to understand that refractometers may report differently.15 To our knowledge, only 1 other group of researchers investigated the effect of temperature on mean outcomes for USG. Jin et al9 compared MAN and HYD measurements at 4°C, 20°C, and 37°C. They obtained similar results, with no effect of temperature on mean outcomes of the refractometer but described underreporting of the HYD at 37°C. We added to this previous report by validating the refractometers and HYD in different ways (median difference, ranking of outcomes, magnitude of underestimation, and accuracy of reporting) using both urine samples and an independent °Bx reference solution. The inclusion of an independent marker (°Bx) was critical, as the Bland-Altman analysis revealed that all tools tended to underreport when USG increased.
Despite the suggestions of some manufacturers, we did not correct for temperature differences with any of the tools. The HYDs are known to be temperature sensitive,24 and for each 3°C that the sample deviates from the standard temperature (20°C), 0.001 should be added to or subtracted from the original reading. Considering that the HYD measurements showed large internal variations (Table 2), we recommend that HYDs should not be used to detect severe dehydration, as the tool underreported even with the temperature correction.
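The HYD temperature correction can be written as a one-line rule. The sketch below assumes the usual direction of the correction, adding 0.001 per 3°C when the sample is warmer than the calibration temperature and subtracting when it is cooler, and applies it proportionally rather than in whole 3°C steps. The function name is hypothetical, and this correction was not applied to the data in this study.

```python
def hydrometer_corrected_usg(reading: float, sample_temp_c: float,
                             calibration_temp_c: float = 20.0) -> float:
    """Apply a 0.001 correction per 3 degrees C of deviation from calibration.

    Assumption: warmer samples are less dense and read low, so the correction
    is added above the calibration temperature and subtracted below it.
    """
    return reading + (sample_temp_c - calibration_temp_c) / 3.0 * 0.001


# Example: a hypothetical HYD reading of 1.017 taken at 37 degrees C
print(round(hydrometer_corrected_usg(1.017, 37.0), 3))  # 1.023
```

At 37°C the correction amounts to roughly 0.006, which is of the same order as the HYD underreporting observed at that temperature.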
As illustrated by the results in Table 2 and shown statistically in the Bland-Altman analysis in Table 3, all tools except the HYD 5°C misreported at higher USG concentrations. Although the absolute difference in misclassification increased only by 1 to 2 cases per set of measurements, the relevance of misreporting at high USG values must be considered, as this is the point at which athletes are classified into groups with a low or high urine concentration. Some sports, such as boxing, wrestling, and other weight-class–bound sports, use specific cutoffs as part of the weigh-in process.4,19,25 “Cutting weight” may influence the fluid balance, resulting in many athletes displaying USG values around 1.030.4,19,25 In other sports, such as cycling, lacrosse, soccer, and triathlon, many athletes present high USG values.5,26,27 For example, Armstrong et al5 determined that 25% of all collected morning urine samples exceeded 1.029. Montazer et al28 found that 38% of construction workers had USG values larger than 1.025. As high USG values around frequently used cutoff values for dehydration (1.026) and severe dehydration (1.030)10 are described in a wide range of high-risk individuals, the misreporting of refractometers is likely to affect the proper classification of hydration status among these populations.
We believe that one of the strengths of this study was our design, which was a combined approach with measurements of reference °Bx solutions as an independent marker and urine samples covering the same range of measurement values. This enabled us to investigate the accuracy of USG measurement over the spectrum from 1.000 to 1.040 that is normally used for measuring human urine. Another strength is that we used only fresh urine samples measured within 5 days of collection during experiment 2. The USG in fresh urine remains stable for at least 7 days when stored at 5°C, and freezing the samples negatively alters USG measurements.27
A limitation is that we did not correct for large molecules in the urine, such as high levels of protein and glucose, which may influence USG readings.24 However, our use of an independent marker (°Bx) was a strength; if the urine samples had been dried to determine the precise weight of the total solids,14,29 large molecules would have affected the calculated density of the urine sample based on the dry matter obtained.
This study is clinically relevant, as the assessment of chronic hydration status via USG is often used in athlete education.1 A correct diagnosis requires accurate measurements. As we have shown here, values near clinically relevant dehydration (≥1.030) tended to be underreported, and this underreporting was amplified by a 37°C sample temperature. At lower cutoff values, such as 1.020 and 1.026, which are often used in the literature,5 there is also a chance of underreporting. The underreporting is likely to be 0.001 to 0.003, depending on the tool, temperature, and concentration. Although these small differences may not be clinically relevant, they may have a large effect when professionals use the measurement to classify athletes as being euhydrated versus underhydrated. These differences would also affect the hydration advice given, with more individuals being classified on the lower side of the cutoff (toward being hydrated), whereas a substantial number should be classified as underhydrated. Therefore, despite the manufacturers' instructions about the temperature-correction abilities of their devices, we suggest that measurement temperatures should be standardized at 5°C or 20°C and that measurements at approximately 37°C should be discouraged. Finally, the DIG was the most time-efficient device studied, as it could be dipped directly into a sample to obtain a measurement, whereas the MAN required an additional action of pipetting a small drop of urine on a transparent glass plate or lens that then needed to be closed. The HYD was the most time-consuming and least accurate measurement. The specific HYD we used had only 1 scale printed on the side and often kept spinning in the glass jar after it was dropped in the sample, making it more difficult to obtain a good reading.
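The effect of a small, constant underreporting on classification can be shown with a brief sketch. The USG values below are hypothetical and are not data from this study; the bias of −0.003 is the upper end of the underreporting range reported here, applied uniformly as a simplifying assumption. The function names are also hypothetical.

```python
def is_high(usg: float, cutoff: float) -> bool:
    """True when a reading, rounded to the thousandths, is at or above the cutoff."""
    return round(usg, 3) >= cutoff


def count_missed(true_values, bias: float, cutoff: float) -> int:
    """Count samples that are truly at or above the cutoff but read below it."""
    return sum(
        1
        for value in true_values
        if is_high(value, cutoff) and not is_high(value + bias, cutoff)
    )


# Hypothetical true USG values for illustration only
true_usg = [1.018, 1.021, 1.024, 1.026, 1.029, 1.031, 1.032, 1.035]

for cutoff in (1.020, 1.025, 1.030):
    print(cutoff, count_missed(true_usg, bias=-0.003, cutoff=cutoff))
```

Only samples that lie within 0.003 above a cutoff change class, so the number of missed cases depends on how many individuals cluster just above the cutoff that is used.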
In conclusion, the HYD produced unreliable results and should not be used. The DIG and MAN provided comparable results at multiple levels against an independent standard, and no clear differences were detected for medians, correlations, or receiver operating characteristics for the USG cutoff value of 1.020, regardless of sample temperature. Furthermore, USG reporting on individual urine samples may be slightly biased when urine concentration increases toward or above the threshold value of 1.025, particularly at higher sample temperatures. Thus, USG measurements should be standardized for temperature and preferably measured consistently at 5°C or room temperature (20°C) using a DIG or MAN.
ACKNOWLEDGMENTS
Stavros Kavouras, PhD, has served as a scientific consultant for Quest Diagnostics, Standard Process, and Danone Research and has active grants with Danone Research and Standard Process. We thank research nurse Ginger Hook, RN, CDE, in the College of Health Solutions Nutrition Laboratories for her support during this study.