Urine color (Uc) is used to assess urine concentration when laboratory techniques are not feasible.
To compare the accuracy of Uc scoring using 4 light conditions and 2 scoring techniques with a 7-color Uc chart. Additionally, to assess the results' generalizability, a subsample was compared with scores obtained from fresh samples.
Descriptive laboratory study.
A total of 178 previously frozen urine samples were scored, and 78 samples were compared with their own fresh outcomes.
Urine color and accuracy for classifying urine samples were calculated using receiver operating characteristic analysis, allowing us to compare the diagnostic capacity against a 1.020 urine specific gravity cutoff and to define the optimal Uc cutoff value.
Urine color was different among light conditions (P < .01), with the highest accuracy (80.3%) of correct classifications of low or high urine concentrations occurring at the brightest light condition. Lower light intensity scored 1.5 to 2 shades darker on the 7-color Uc scale than bright conditions (P < .001), but no further practical differences in accuracy occurred between scoring techniques. The Uc of frozen samples was 0.5 to 1 shade darker than freshly measured Uc (P < .004), but the values were moderately correlated (r = 0.64). A Bland-Altman plot showed that reporting bias mainly affected darker Uc without affecting the diagnostic ability of the method.
Urine color scoring, accuracy, and Uc cutoff values were affected by lighting condition but not by scoring technique, with greater accuracy and a 1-shade-lower Uc cutoff value at the brightest light (ie, light-emitting diode flashlight).
Urine color scoring and accuracy were affected by light condition. Assessing urine samples in bright light conditions with intensity >1600 lux resulted in the greatest accuracy.
In addition to light, other factors, such as container material and volume, should be factored in when defining the optimal urine color cutoff value.
Light condition will likely influence urine color scoring in many situations, even if athletes assess their urine straight from the toilet or urinal.
A high urine concentration has been suggested as a biomarker for detecting underhydration.1 This can occur without the perception of thirst or a change in plasma osmolality.1 Evidence is growing for the long-term health benefits from fluid intake that results in a 24-hour urine concentration of ≤500 mOsm/kg2 and a urine specific gravity (USG) value of ≤1.012.3 As spot morning urine samples generally have a greater concentration than a full 24-hour urine sample,4 the cutoff value for well-hydrated status potentially lies somewhat higher. Therefore, concentrations of 700 mOsm/kg5 or 1.020 USG4 have often been reported. The most accurate way to assess urine concentration involves laboratory-based techniques. Still, urine color (Uc) has been suggested as an appropriate proxy measurement in an applied setting for assessing the hydration status of athletes.6 Measuring Uc has several advantages: the method is inexpensive and noninvasive, does not require technical expertise, and gives immediate results.7 Although urine osmolality is often proposed as the standard for assessing urine concentration,8 USG displayed similar correlations with Uc (r ≅ 0.80)9,10 and 5% better accuracy for classifying samples with a high versus low urine concentration compared with urine osmolality.
Urine color charts have been valuable in hydration assessment and education in many settings, including clinical, athletic, and household settings.11,12 However, the main disadvantage of using the current common Uc charts is their lack of sensitivity (∼80% sensitivity accuracy).13 Certain variables, such as vitamins and other bioactive substances, as well as protein, can influence Uc14 and concentration.15 On the other hand, the accuracy of correctly predicting urine concentration based on Uc can be improved substantially when using spectrophotometry, resulting in an analysis with 97.4% sensitivity,16 showing that assessing light-based urine concentration with high accuracy is possible.
Authors3,9,13,17 have reported a wide range of average Ucs for different populations, from 3 to 6. Apart from the diameter (volume) and materials used in the urine container, the type of light and light intensity will influence perceived Uc.10 The Beer-Lambert law states that light absorbance is equal to the product of the absorption coefficient, the concentration of the solution the light is passing through, and the length of the light path through the solution, which increases with the container's diameter. Despite researchers' reports3,10,18 that urine samples were scored in a well-lit room, not much information is available about the actual light conditions present and how the urine sample was positioned against the light for its color to be scored.
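The Beer-Lambert relation can be sketched in a few lines of Python to show why the light path through the container matters. This is a minimal illustration only: the absorption coefficient, concentration, and path lengths below are hypothetical placeholder values, not measurements from this study, and the function names are ours.

```python
def absorbance(epsilon, path_length_cm, concentration):
    """Beer-Lambert law: A = epsilon * path length * concentration."""
    return epsilon * path_length_cm * concentration


def transmitted_fraction(a):
    """Fraction of incident light that passes through: T = 10 ** (-A)."""
    return 10 ** (-a)


# Hypothetical values, chosen only for illustration.
EPSILON = 0.5        # absorption coefficient, L/(mol*cm) (assumed)
CONCENTRATION = 0.2  # mol/L (assumed)

narrow = absorbance(EPSILON, 2.5, CONCENTRATION)  # 2.5 cm light path (assumed)
wide = absorbance(EPSILON, 5.0, CONCENTRATION)    # 5.0 cm light path (assumed)

print(f"narrow tube: A = {narrow:.2f}, T = {transmitted_fraction(narrow):.2f}")
print(f"wide tube:   A = {wide:.2f}, T = {transmitted_fraction(wide):.2f}")
```

With the concentration held constant, doubling the light path doubles the absorbance, so less light is transmitted and the same urine would be expected to look darker in a wider container.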
As previous investigators have not clearly defined the light conditions on which the suggested Uc cutoff values have been based, health professionals, such as athletic trainers, coaches, and sport dietitians, would benefit from more information reporting the effect of different light conditions on Uc scoring. Such insight will help further standardize the Uc scoring process, potentially leading to better athlete self-classification of urine samples with low or high urine concentration. Therefore, the objective of our study was to compare the effect of different light conditions on Uc scoring. The aims were as follows: (a) to report Uc scores for 4 light conditions using a 7-color Uc chart and 2 scoring techniques, producing 8 Uc scoring combinations; (b) to evaluate the diagnostic ability and the optimal Uc cutoff value for each of the 8 scoring combinations to assess Uc; and (c) as the samples used were previously frozen, to perform a sensitivity analysis comparing the results of a subsample (n = 78) before and after freezing to determine if the results from aims (a) and (b) can be generalized to freshly collected and measured Uc.
To evaluate Uc scoring differences among 4 light conditions, we used a 7-color Uc chart and 2 Uc scoring techniques. Three research technicians individually scored each sample. Then the calculated median Uc scores were compared with the measured USG of initially fresh samples to identify the accuracy of correctly classifying urine samples with a low versus high urine concentration for each condition. Urine samples were classified as having a low urine concentration with a USG cutoff value <1.020 or a high urine concentration equal to or above this value.4 To ensure generalizability of the data, as frozen samples tend to display a slightly darker color, a multilevel comparison was performed in a subset of the samples measured, with Uc and USG assessed before and after freezing. All 3 research technicians were involved in previous studies of Uc scoring and were experienced in comparing color of urine with multiple Uc charts while using both of the scoring techniques described in this Methods section.
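The per-sample bookkeeping described above (a median of the 3 technicians' scores and a USG-based reference classification) could be sketched as follows. The function names and the example scores are hypothetical; only the 1.020 cutoff and the "equal to or above" rule come from the text.

```python
from statistics import median

USG_CUTOFF = 1.020  # cutoff stated in the text


def median_uc(technician_scores):
    """Median of the technicians' Uc scores (1-7 scale) for one sample."""
    return median(technician_scores)


def classify_usg(usg, cutoff=USG_CUTOFF):
    """Return 'high' at or above the cutoff and 'low' below it."""
    return "high" if usg >= cutoff else "low"


# Hypothetical sample: three technicians scored 2, 3, and 5.
sample_uc = median_uc([2, 3, 5])
sample_class = classify_usg(1.022)
print(sample_uc, sample_class)
```

Using the median rather than the mean keeps a single outlying score from shifting the value assigned to a sample.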
The 178 urine samples were initially collected between 2018 and 2019 during 3 studies approved by the Institutional Review Board at Arizona State University: STUDY00010071, STUDY00008336, and STUDY00007260. Urine samples were anonymized before the project was started. Apart from the original measured USG value for each sample as described earlier by our laboratory team,19 no other data were carried over to this study. Samples were collected from a healthy athletic population as a spot urine sample and stored in 30 mL tubes at −20°C for 1 to 18 months.
During each scoring session, the frozen samples were covered with clear film and thawed until they reached room temperature (20°C) for scoring. Four scoring stations were created, 1 per light condition. At each station, the technicians used 2 techniques to score the samples. After a batch of approximately 30 samples was scored, the samples rotated while the technicians remained at their station using the same scoring method and light type before moving over to the next light condition. Figure 1 illustrates all scoring combinations.
The 7-Color Uc Chart
Two scoring techniques were used: the traditional sample-over-chart method, which required the technician to hold the sample while comparing it with the color chart, and the Uc scoring box method that was developed to standardize the distance between the eye and the urine sample, which could potentially affect scoring accuracy. Urine samples were prepared in 30-mL transparent plastic centrifuge tubes (freestanding Evergreen model; Caplugs). Each urine sample was covered using clear Parafilm (Bemis Company, Inc) to seal the sample and prevent color distortion, allowing the scoring of samples without the original green tube cap. Each urine sample was inverted 3 times before being scored.
The Sample-Over-Chart Method
During this method, the technicians slid each separate urine sample over the white part of the Uc chart and compared the sample with each color on the 7-color Uc chart.9 A decision was then made about the color of the sample, and the score was noted.
The Uc Box Scoring Method
The second method involved a color scoring box constructed in our laboratory.21 This box was created to standardize the distance from the observer to the sample (36 cm [14 in]) and the way the urine sample was positioned in relation to the light for each light condition, as previously reported.20 The technicians looked through the box while comparing the color they perceived with the colors on the 7-color Uc chart positioned directly beside the box. A decision was made about the color of the sample, and the score was noted.
Light conditions were selected based on practical relevance. The light intensity was different for each light condition but similar for scoring techniques within each light condition. Each light condition intensity was measured using a foot-candle lux meter (model 407026; Extech Instruments) at the tungsten/daylight setting. The light-emitting diode (LED) light conditions represent a combination of lights (as the background lights were not turned off).
The restroom halogen light was part of the testing facility. The lights were built into the ceiling 180 cm (6 ft) apart in a square, at the corners of the 2 tables that created the testing station. The light intensity for this condition was 224 lux, measured at the center of the testing station, with the surface of the tables 180 cm from the ceiling.
The laboratory space had fluorescent office light. The two 3-light fluorescent parabolic troffers were built into the ceiling 180 cm apart, matching the long end of the testing stations, ensuring that the measurements were obtained directly under the light source. The light intensity for this condition was 402 lux, measured at the center of the testing station, with the surface of the tables 180 cm from the ceiling.
The LED Panel
For the sample-over-chart method, the 28-W LED panel light set to full white (model NL480; Neewer), providing 1666 lux, was placed at 12 cm (7.5 in) on the left side of the sample. For Uc box scoring, the light was placed on the left side of the 30-mL urine sample from the scorer's perspective. The LED panel analysis was conducted in the restroom facility with the halogen ceiling lights switched on.
The LED Flashlight
The flashlight contained 6 LEDs (Ozark Trail), providing 1848 lux when covered with a single layer of white masking tape to create a filter. The light was projected directly from underneath the 30-mL centrifuge tube for both scoring methods. During the sample-over-chart method, the technicians wore blue laboratory gloves and held the urine sample directly on the flashlight. For the color box scoring, the flashlight was built into the box to light the sample from underneath when it was placed in the center of the box on top of the flashlight. The LED flashlight analysis was performed in the office space with the fluorescent ceiling lights switched on.
The urine concentration from the freshly measured samples was reported as the median and interquartile range (IQR). To address aim (a), to evaluate the scored Uc using different lights and techniques, we reported the median, IQR, mean, and SD for each scoring condition. Differences were assessed using the Friedman and Wilcoxon signed rank tests.
To address aim (b), investigating the diagnostic ability under the 8 conditions to distinguish between low and high urine concentrations based on the correct classification of Uc, we calculated receiver operating characteristic (ROC) curves. The Uc scores were optimally fitted against the USG values that were initially measured in fresh urine samples. The best Uc cutoff value for distinguishing urine samples with a low versus high urine concentration (<1.020 USG cutoff value) was determined from the area under the curve (AUC) using the max approach for sensitivity and specificity.
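The cutoff search in this paragraph can be illustrated with a short sketch. It assumes that the "max approach" means choosing the Uc cutoff that maximizes the sum of sensitivity and specificity (a Youden-style criterion), which the text does not spell out. The AUC is computed as the probability that a high-concentration sample receives a darker score than a low-concentration one, with ties counted as one-half. All function names and data are hypothetical.

```python
def sens_spec(uc_scores, is_high, cutoff):
    """Sensitivity and specificity when Uc > cutoff is called 'high'.

    Assumes the data contain at least one high and one low sample.
    """
    tp = fn = tn = fp = 0
    for uc, high in zip(uc_scores, is_high):
        predicted_high = uc > cutoff
        if high and predicted_high:
            tp += 1
        elif high:
            fn += 1
        elif predicted_high:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(uc_scores, is_high, candidates=range(1, 7)):
    """Cutoff with the largest sensitivity + specificity (assumed criterion)."""
    return max(candidates, key=lambda c: sum(sens_spec(uc_scores, is_high, c)))


def auc(uc_scores, is_high):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [u for u, h in zip(uc_scores, is_high) if h]
    neg = [u for u, h in zip(uc_scores, is_high) if not h]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: Uc scores and whether fresh USG was >= 1.020.
uc = [1, 2, 2, 4, 3, 4, 5, 6]
high = [False, False, False, False, True, True, True, True]

cutoff = best_cutoff(uc, high)
area = auc(uc, high)
print(f"Uc <= {cutoff} classified as low; AUC = {area:.3f}")
```

Here a sample is called low when its Uc is at or below the chosen cutoff, matching the "Uc ≤ 3" style of reporting used in the Results.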
To address aim (c), assessing the generalizability of the study outcomes based on frozen samples before analysis, we compared data from a subsample (n = 78) measured before and after freezing. The original USG value of the freshly measured samples was reported and correlated against the frozen samples using the Spearman correlation, including 95% CIs, using the Fisher Z transformation. The mean difference between the frozen and fresh samples was analyzed using the Wilcoxon signed rank test and Spearman correlation. Further, a Bland-Altman plot was produced to assess the agreement between scores for individual samples, comparing the Uc scores from 1 to 7 to evaluate the outcomes of freshly scored urine samples against frozen samples. To assess whether the Bland-Altman results were biased for scoring Uc lighter, similar to, or darker between fresh and frozen urine samples, we calculated an additional Spearman correlation coefficient. This was done by correlating the Bland-Altman results from the y axis (the difference between the reported Uc outcomes) against the results of the x axis (the means of both outcomes); reporting bias was present when the correlation was significant. The level of correlation provided more information about the direction of this bias. Finally, the agreement level was calculated, defined as M(difference) ± 1.96 SD(difference). Statistical significance was set for all analyses at P ≤ .05.
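Two of the calculations named above, the Fisher Z confidence interval and the Bland-Altman agreement limits, can be sketched with the Python standard library. The sketch assumes the common standard error 1/sqrt(n − 3) for the transformed correlation; the software used in the study may apply a different variant for Spearman coefficients, so the output need not match the published intervals exactly. The paired scores are hypothetical.

```python
import math
from statistics import mean, stdev


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via the Fisher Z transformation."""
    z = math.atanh(r)              # transform r to the Z scale
    se = 1.0 / math.sqrt(n - 3)    # assumed standard error on the Z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def limits_of_agreement(first, second):
    """Bland-Altman mean difference and M(difference) +/- 1.96 SD(difference)."""
    diffs = [b - a for a, b in zip(first, second)]
    m = mean(diffs)
    sd = stdev(diffs)              # sample standard deviation
    return m - 1.96 * sd, m, m + 1.96 * sd


# Correlation and sample size as reported for fresh versus frozen Uc.
ci_low, ci_high = fisher_ci(0.64, 78)

# Hypothetical paired Uc scores for five samples.
fresh = [1, 2, 2, 3, 4]
frozen = [2, 2, 3, 4, 4]
loa_low, bias, loa_high = limits_of_agreement(fresh, frozen)

print(f"CI: {ci_low:.2f} to {ci_high:.2f}")
print(f"bias = {bias:.2f}, limits = {loa_low:.2f} to {loa_high:.2f}")
```

A positive bias in this sketch means the frozen score was darker than the fresh score for the same sample.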
The original median USG value measured in the 178 fresh samples was 1.018 (IQR = 1.012–1.027), with 42% of the samples at or above the suggested USG cutoff value of 1.020. The outcome of aim (a), scoring Uc using different lights and techniques, is reported in Table 1. The average of the combined median values per scoring technique resulted in a difference for Uc among all light types (P < .001). The conditions with lower light intensity, such as halogen and fluorescent, scored 1.0 to 1.5 shades darker than the LED light conditions. The average Uc for fluorescent light was 3.5 (2.0–6.0); for halogen light, 3.0 (2.0–6.0); the LED panel, 2.0 (1.0–5.5); and the LED flashlight, 2.0 (1.0–4.0). Despite the higher light intensity, Uc was darkest under fluorescent light, before halogen and the brighter LED conditions. Scoring technique, ie, sample-over-chart versus scoring box, influenced the reported Uc under each light type (P ≤ .01), with the exception of halogen light (P = .91).
The AUC, calculated to investigate the diagnostic ability of the 8 measured conditions as formulated for aim (b), slightly increased with brighter light conditions. The lowest AUC for the sample-over-chart and Uc scoring-box techniques, respectively, was 0.82 and 0.84 for fluorescent light, followed by halogen light (0.84 and 0.86), LED light panel (0.87 and 0.84), and LED flashlight (0.86 and 0.87; Table 1).
The accuracy of correctly classified urine samples for low or high urine concentration increased incrementally, similar to the reported AUC, starting with fluorescent light (76% for both scoring techniques), followed by halogen light (77% for the sample-over-chart and 76% for the scoring box), LED light panel (79% for the sample-over-chart and 78% for the scoring box), and ending with the highest values for the LED flashlight (79% for the sample-over-chart and 80% for the scoring box). The rate of false-positive (FP) values (as a result of a low Uc when concentration was actually above the selected USG cutoff) was inversely associated with light intensity: the highest percentage was for halogen light with the lowest light intensity (20% and 17% for the sample-over-chart and box-scoring techniques, respectively) versus LED flashlight with the highest light intensity (12% and 15% for the sample-over-chart and the Uc box scoring techniques, respectively). The ROC-based Uc cutoff value for fluorescent light was 1 shade darker (Uc ≤ 4) than the suggested best fit for Uc cutoff for all other light types (Uc ≤ 3) for classifying low versus high urine concentration.
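As a sketch of how the accuracy and FP percentages above could be tallied for one scoring condition, the function below counts correct classifications and cases with a low Uc score despite a USG at or above 1.020. It assumes both rates are expressed as a share of all samples, which the text does not state; the function name and data are hypothetical.

```python
def accuracy_and_fp(uc_scores, usg_values, uc_cutoff, usg_cutoff=1.020):
    """Return (accuracy, FP rate) as fractions of all samples.

    FP follows the definition in the text: Uc at or below the cutoff
    while USG is at or above its cutoff.
    """
    correct = fp = 0
    for uc, usg in zip(uc_scores, usg_values):
        uc_high = uc > uc_cutoff
        usg_high = usg >= usg_cutoff
        if uc_high == usg_high:
            correct += 1
        elif usg_high:
            fp += 1  # light-looking urine that is actually concentrated
    n = len(uc_scores)
    return correct / n, fp / n


# Hypothetical scores and USG values for five samples, with Uc <= 3 as low.
accuracy, fp_rate = accuracy_and_fp(
    [1, 2, 3, 4, 5],
    [1.005, 1.021, 1.015, 1.025, 1.030],
    uc_cutoff=3,
)
print(f"accuracy = {accuracy:.0%}, FP = {fp_rate:.0%}")
```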
Difference Between Fresh and Frozen Urine Samples
Aim (c) was to investigate the generalizability of the outcomes between fresh and frozen urine samples. The median USG for the 78 freshly measured samples was 1.018 (IQR = 1.012–1.023). The median Uc score was 2, with a larger IQR in the frozen condition (+2 Uc shades) than in the freshly measured samples (P = .004), indicating a larger number of darker-scored urine samples in the frozen condition.
Correlations between Uc and USG were somewhat stronger for frozen samples (r = 0.74; 95% CI = 0.61, 0.83; P < .001) than fresh samples (r = 0.59; 95% CI = 0.42, 0.85; P < .001) as shown in Table 2, but the samples were correlated moderately for Uc when fresh versus frozen Uc scores were compared (r = 0.64; 95% CI = 0.48, 0.76; P < .001).
Based on the Bland-Altman plot comparing the Uc scores of fresh and frozen urine samples on an individual level, 48% of the reported Uc scores were similar for both conditions (Figure 2), and 75% of the reported Uc scores stayed within a ±1 Uc shade difference. When correlating the results of the y and x axes of the Bland-Altman plot, we detected a small amount of reporting bias (r = −0.45, P < .001). The nature of this bias was expressed at the midsection of the 7-color Uc chart, with slightly darker Uc scores (∼13%) in the frozen condition, indicating a higher level of underreporting in comparison with freshly scored samples.
The AUC was higher for frozen samples (0.88) than for fresh samples (0.77). This resulted in less accuracy in correctly classifying fresh samples (64.4%) versus frozen samples (83.6%). The ROC-based Uc cutoff value for frozen samples was 1 shade darker (Uc ≤ 2) than the suggested best fit for freshly measured urine samples (cutoff = 1) used to classify low versus high urine concentration.
The key insight of our study was that under brighter LED light conditions, the diagnostic ability to discriminate urine samples with low versus high urine concentration was greater than in the halogen and fluorescent light conditions. Light intensity influenced perceived Uc, but despite differences for Uc among all light conditions, the 1-shade Uc cutoff value difference between the bright (1666–1848 lux) and darker (224–402 lux) light conditions was the most important practical finding. Additionally, Uc was reported differently between the scoring techniques, ie, assessing the sample while holding it against the chart versus assessing the sample using a standardized Uc scoring box, for the fluorescent and the LED light conditions but not for halogen. Although Uc was reported differently, no practical differences were evident for the AUC and accuracy of correctly classified urine samples or Uc cutoff value between scoring techniques.
Regarding objective (a), to evaluate the scored Uc using different lights and scoring techniques, the median scores ranged from 2 to 4, with the lowest and highest IQR values (1 and 6, respectively) fitting well within earlier reported data. Authors of previous studies described average Ucs of 3 ± 1 (reference 10) and of 3 ± 2 up to 6 ± 1 (reference 9). Earlier researchers who addressed Uc scoring validation tested the urine samples in a well-lit room3,10,18 and may have used different container sizes, both of which may affect Uc scoring. The question is which of our conditions matched the definition of a well-lit room, as the light intensity varied substantially among the fluorescent, halogen, and LED lights. We suggest that future publications specify the type of light, as wavelength may influence how Uc is perceived, as well as the light intensity and the container volume, as both may affect Uc scoring accuracy. Despite the significant difference between scoring techniques under fluorescent and LED light conditions, no clear differences were seen for reported Uc or the diagnostic ability of the scoring technique, suggesting that more rigid scoring conditions, such as using a separate Uc scoring box, do not necessarily result in better performance than the traditional sample-over-chart method.
Objective (b) was to investigate the diagnostic ability of the 8 measurement conditions. We showed fair to good diagnostic ability based on an overall high AUC ranging from 0.77 to 0.87, which was comparable with values (0.73–0.82) previously reported by athletes,6 but other scores have been similar3 or higher, ranging from 0.85 to 0.92.22 On average, regardless of the scoring technique, the 2 brighter light conditions resulted in a Uc cutoff value that was at least 1 shade lower than under the standard fluorescent and halogen lights found in traditional office and restroom spaces. The LED flashlight conditions also resulted in 3% to 4% better accuracy for classifying low versus high urine concentration and a lower number of FP-classified urine samples. When the goal is to detect high urine concentration, a substantial ∼5% fewer FP cases in the brighter light conditions is important; a reported low Uc when the actual urine concentration is above the selected 1.020 USG value will not prompt an individual to adjust fluid intake, which defeats the sole purpose of the hydration assessment.
Interestingly, the ROC-based best-fit Uc cutoff value was similar for halogen and the 2 LED conditions, whereas under fluorescent light (with a higher light intensity than halogen light), the optimal Uc cutoff value was 1 shade darker. The visible spectrum of light ranges from 360 to 830 nm.23 Light types overlap in color range but differ in color tone. For example, LED light covers a blueish range of 395 to 530 nm; fluorescent light, a green to yellow range of 480 to 570 nm; and halogen, a range of 650 to 950 nm, starting as more reddish and eventually exceeding human perception. Although our study was not designed to differentiate among different intensities within light conditions, it seems likely that wavelength, in addition to light intensity, can influence how Uc is perceived. This is especially consequential because spectrophotometric analysis also demonstrated that, while Uc was reported as a darker color with increasing concentration, individual values of tristimulus colorimetry (CIE L*a*b*) scores changed, indicating a notable polynomial color trend along the green-red axis, with a green hue in slightly dehydrated urine.24 Therefore, although Uc tends to darken as concentration increases, subtle differences in how the color is constructed in combination with the selected light condition may influence how the urine is perceived by the human eye.
The goal of aim (c), comparing the results of frozen versus originally scored fresh outcomes, was to generalize these findings to freshly scored urine samples. Freezing tends to result in Uc that is 0.5 to 0.6 (95% CI = 0.3, 0.8) shades darker,25 possibly as a result of the freeze-thaw method,26 consistent with our protocol. At the same time, we noticed sedimentation,25 which dissolved when the tubes were slowly inverted. Despite the mean Uc difference between fresh and frozen urine samples, the correlation between Uc scores was fair. The reporting bias in the Bland-Altman plot was caused by Uc samples that would likely have been classified above the suggested Uc cutoff value for underhydration in both fresh and frozen conditions. Therefore, the actual result for classifying urine samples of low versus high urine concentration would have been the same for fresh and frozen samples, suggesting that our results can be generalized toward Uc scoring of fresh samples.
The strengths of this study were the relatively large sample size and multiple technicians, ensuring objective Uc scoring under the 4 light conditions. Scoring under the LED flashlight condition, which was the most accurate, could be easily accomplished at low cost with only slight modifications, ie, applying a single layer of white masking tape or painter's tape to block out the visual blue light, making the method accessible for a large population. However, our work also had limitations. The sensitivity analysis was based on only 1 light condition (LED panel) and the Uc scoring-box technique because this was the only method used to score the subsample of fresh urine samples before freezing. Another limitation was that the Uc scoring was compared with USG and not with urine osmolality, although USG has correlated strongly (up to r = 0.97) with urine osmolality.20 An additional factor was our use of randomly selected spot urine samples provided at different times of the day. However, previous data from our laboratory indicated that, when urine collections were stratified by morning versus other times of the day, Uc scoring accuracy was not affected.6 As we did not control for activity, Uc and urine concentration could have been influenced by acute rehydration, leading to a mismatch between Uc and urine concentration due to acute dilution of the urine. The results of this analysis are based on urine samples from a healthy active population and may not be generalizable to patient populations with illnesses that affect their Uc. Finally, we used 30-mL tubes, whereas for urine collection, 90-mL containers are often used. Based on Uc comparisons in our laboratory, Uc differs a full shade between those volumes, resulting in a 1-shade-darker Uc when scoring the larger container.
The practical importance of these findings for athletic trainers and other sport medicine practitioners lies in the fact that the accurate classification of samples with high versus low urine concentration can be improved by modifying light conditions. Normal light conditions in well-lit rooms will probably be around 200 to 400 lux. We observed that scoring urine samples under very bright light (>1600 lux) resulted in greater accuracy of correctly classifying urine samples and fewer FP scores. A smaller number of FP scores is vital, as a reported light Uc when the actual urine concentration is high is unlikely to result in increased fluid intake.
The assessment of hydration status via Uc can be an important educational tool for athletes,11 but even in applied settings, when laboratory analysis is not strictly necessary, athletes should use the most accurate method available. We found that 30-mL urine samples lit from underneath by a small LED flashlight yielded the highest number of correctly classified samples when a modified Uc cutoff value was used. Previous researchers20 identified that the cutoff value for fresh urine samples scored using bright LED light was 1; therefore, samples with a Uc score ≥2 indicated a USG ≥1.020.
In conclusion, the accuracy of Uc scoring was affected by light conditions, while the scoring techniques we evaluated did not seem to lead to practical differences in Uc scoring accuracy. These results can be translated to an everyday suggestion that Uc cutoff values need to be adjusted to light conditions to optimize scoring accuracy.
We thank student helpers Margeret Matzinger and Josh Boeckman.