ABSTRACT
Vicens-Miquel, M.; Tissot, P.E.; Colburn, K.F.A.; Williams, D.D.; Starek, M.J.; Pilartes-Congo, J.; Kastl, M.; Stephenson, S., and Medrano, F.A., 2025. Machine-learning predictions for total water levels on a sandy beach.
Coastal inundation can significantly impact beach management and conservation efforts, particularly as the frequency of such events increases due to relative sea-level rise. Therefore, improved and timely models are needed to accurately predict the potential degree of coastal inundation. Existing numerical models and observational platforms were investigated, and their capabilities and limitations were highlighted. Despite advancements, accurate coastal inundation forecasting remains a challenge, primarily because tide gauge water-level measurements are approximately a meter below the actual total water level, which includes wave run-up. To address this issue, a machine-learning approach is proposed to predict total water levels on a beach along the central Texas Gulf Coast, incorporating local and regional metocean data and wave run-up. This approach involves installing cameras in the study area to record images and videos every 30 minutes. Additionally, beach-profile surveys were conducted approximately every 2 weeks to measure the elevation across the berm using conventional surveying methods. These surveys facilitated the creation of digital elevation models, which, combined with the imagery, allowed the wet/dry shoreline elevation to be extracted. This elevation marks the most landward point reached by water on the beach and, because it combines tides, surge, and wave run-up, is defined in this paper as the total water level. Combining this technique with tide gauge and wave buoy data produced a 1-year dataset that served as input for a multilayer perceptron model. Two versions are presented: a high-performance model and an operation-ready model. Both satisfy the National Oceanic and Atmospheric Administration criterion that the central frequency of 15 cm exceeds 90% for both 24- and 48-hour predictions during nonfrontal months.
The success of this novel machine-learning application for predicting total water levels, including wave run-up, is presented; however, model performance during the frontal months will require additional improvements.
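The NOAA central frequency criterion cited above can be illustrated with a short Python sketch. It assumes the usual definition of central frequency as the percentage of prediction errors whose magnitude falls within a threshold (here 15 cm, with an inclusive bound); the function names are hypothetical and this is not code from the study.

```python
import numpy as np


def central_frequency(predicted, observed, threshold_m=0.15):
    """Percentage of errors with |predicted - observed| <= threshold_m."""
    errors = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return 100.0 * float(np.mean(np.abs(errors) <= threshold_m))


def meets_noaa_criterion(predicted, observed, threshold_m=0.15, target_pct=90.0):
    """True if the central frequency at threshold_m reaches target_pct."""
    return central_frequency(predicted, observed, threshold_m) >= target_pct
```

In use, `predicted` and `observed` would be paired total water levels (in meters) at a given lead time, such as 24 or 48 hours.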
INTRODUCTION
Accurately predicting the occurrence and degree of coastal inundation has many positive implications for addressing geological, biological, economic, and societal problems, such as protecting biodiversity and habitat and supporting beach management in coastal areas. These predictions are essential from an economic perspective, as preparation efforts prompted by accurate predictions can potentially mitigate property damage and personal losses from an inundation event. Beyond private property, accurate predictions can support actions to prevent damage to public infrastructure, such as highways and roads (Kim, Keum, and Han, 2019), which could otherwise disrupt commerce and put travelers at risk. At present, those charged with beach and coastal management generally lack an understanding of inundation because few datasets document the actual total water level and the frequency of the conditions responsible. These examples highlight the importance of improving predictive tools beyond their present limitations, which requires overcoming the acknowledged challenges of predicting coastal inundation accurately. To address these challenges, an interdisciplinary group of researchers has been assembled to integrate the most relevant spatiotemporal data and metocean forcings into the models.
Water-level prediction models have existed for centuries and have evolved alongside advancing technologies, including tidal computations, numerical models, and more recently artificial intelligence (AI). These models traditionally predict average water levels without including the wave run-up. This is sufficient for open-water predictions or where wave impact is minimal; however, many shorelines experience wave run-up on beaches, significantly extending the area of wet beach or inundation beyond what average water levels suggest. This research shows that total water levels consistently exceed average water levels, typically by 1 m at the study location. This makes predicting coastal inundation, or total water levels, more challenging.
The wave run-up is the maximum vertical height above the still water level that the water reaches on a beach or structure due to wave action (Andadari and Magdalena, 2019; Mendes, Andriolo, and Neves, 2022; USACE, 2002). It is influenced by various factors such as wave characteristics, coastal morphology, surface roughness, porosity, and vegetation (Manousakas et al., 2022; Phillips et al., 2017), and it is a valuable factor in determining the extent and severity of inundation (Da Silva et al., 2020). Wave run-up can be further decomposed into superelevation of the mean water level due to the wave action and the swash uprush (USACE, 2002). Various studies have been conducted to understand and model wave run-up phenomena. Chen et al. (2000) provided a review of wave run-up on beaches and coastal structures, emphasizing the need for further research on two-dimensional shoreline run-up. This review highlights the evolving nature of research in this field and the relevance of advancing our understanding beyond traditional one-dimensional models.
Stockdon et al. (2006) introduced an empirical parameterization for extreme run-up developed from diverse field experiments, offering a valuable tool for predicting run-up on natural beaches. This parameterization considers setup at the shoreline, significant swash, and the influence of incident and infragravity frequency bands. Synolakis (1987) conducted laboratory experiments to support theoretical models of the run-up of solitary waves, contributing to the foundational understanding of wave run-up dynamics. Whittaker et al. (2017) explored the optimization of wave group run-up on beaches, emphasizing the influence of wave characteristics on significant wave run-up. More recently, Beuzen, Goldstein, and Splinter (2019) calibrated a machine-learning model to predict LIDAR-derived wave run-up on a beach, using the results to generate ensemble model predictions of dune erosion with uncertainty. Tarwidi et al. (2023) compared machine-learning methods to predict wave run-up on sloping beaches in laboratory conditions with high accuracy, reaching a correlation coefficient of 0.99. Kim and Lee (2024) compared machine-learning methods to predict wave run-up using several field observations and laboratory experiments, showing that an XGBoost method can predict wave run-up height more accurately than empirical formulas while using only two inputs.
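To make the empirical approach concrete, the sketch below implements the widely cited general form of the Stockdon et al. (2006) 2% exceedance run-up, R2 = 1.1 [0.35 βf (H0 L0)^1/2 + (H0 L0 (0.563 βf² + 0.004))^1/2 / 2], where H0 is deep-water significant wave height, L0 = g Tp² / 2π is the linear deep-water wavelength, and βf is the foreshore slope. It is shown only for illustration: the present study does not use this formula, and the separate expression Stockdon et al. give for highly dissipative beaches is omitted.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def deepwater_wavelength(tp_s):
    """Linear deep-water wavelength L0 = g * Tp^2 / (2 * pi)."""
    return G * tp_s ** 2 / (2.0 * math.pi)


def stockdon_r2(h0_m, tp_s, beta_f):
    """2% exceedance run-up (m) above still water, general Stockdon et al. (2006) form."""
    hl = h0_m * deepwater_wavelength(tp_s)
    setup = 0.35 * beta_f * math.sqrt(hl)                   # wave setup at the shoreline
    swash = math.sqrt(hl * (0.563 * beta_f ** 2 + 0.004))   # incident + infragravity swash
    return 1.1 * (setup + swash / 2.0)
```

For example, `stockdon_r2(1.0, 8.0, 0.05)` gives roughly 0.6 m; adding such a run-up estimate to a still-water level is one conventional way to approximate a total water level, in contrast with the direct machine-learning prediction proposed here.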
These previous models showcase the ability of machine learning to enhance predictive capabilities. They successfully model wave run-up explicitly (i.e., directly as a parameter), whereas the proposed approach predicts total water level, which includes wave run-up implicitly. Total water-level predictions require the combined modeling of average water level and wave run-up.
At present, no general AI model can directly predict coastal inundation for specific locations, although existing methods can forecast inundation risk (Aucelli et al., 2017). One existing approach predicts inundation using shoreline topography, historical water levels, and relative sea-level rise, which then guide the creation of coastal inundation maps that identify the areas most vulnerable to inundation (Aucelli et al., 2017). However, this method cannot be applied across a variety of locations and cannot accommodate onshore forcing events, particularly the influence of wave run-up. Additionally, in a controlled laboratory setting, Roberts, Wang, and Kraus (2010) found a direct correlation between the significant wave height of breaking waves and the distance of wave run-up on the beach, suggesting that significant wave height can be used as an indicator of when inundation may occur. To address this gap in capability, various numerical models and observational platforms used for coastal inundation prediction and monitoring can be applied, each with distinct capabilities and limitations, which are described in the following paragraphs.
The National Water Prediction Service (NWPS) is a division of the National Weather Service (NWS) that provides forecasts for water resources, hydrology, and associated water-related risks across the United States. It also issues experimental flood inundation maps, which estimate the extent of inland river flooding and coastal flooding (National Water Prediction Center, 2023); however, these models are not tailored to predict total water levels on beaches resulting from changes in metocean variables such as waves and water levels. As a result, they may not accurately forecast the maximum extent of wave run-up on beaches.
The NWS has developed a numerical model called the Sea, Lake and Overland Surges from Hurricanes (SLOSH) model (National Hurricane Center and Central Pacific Hurricane Center, 2023). The model was designed to estimate the height of storm surges caused by historical, hypothetical, or predicted hurricanes by considering hurricane parameters such as atmospheric pressure, size, forward speed, and track. SLOSH uses a wind-field model to generate the storm surge. The model resolution is variable but typically too coarse for an individual beach, and the model does not directly predict key components of total water level, such as wave run-up.
The Hybrid Coordinate Ocean Model (HYCOM) is a computational tool used to simulate oceanic circulation and related processes (HYCOM, 2023). It combines z-coordinate and isopycnic coordinate systems to depict ocean dynamics accurately, particularly in regions with complex terrain or changing water characteristics. HYCOM can forecast various oceanic parameters, including ocean heat content, currents, salinity, sea-surface height, and sea-surface temperature; however, it is not designed to predict total water levels on beaches.
The National Oceanic and Atmospheric Administration (NOAA) maintains the Extratropical Surge and Tide Operational Forecast System (ESTOFS) model. ESTOFS is a broadly used operational water-level model; however, it has limitations because it does not include waves, steric effects, or other forcings (NOAA, 2023; Ocean Prediction Center, 2023). Although ESTOFS can predict the tidal and surge components that contribute to total water level, it cannot predict the critical wave run-up component.
NOAA has also developed the Northern Gulf of Mexico Operational Forecast System (NGOFS2). This numerical modeling system forecasts oceanographic variables, including water temperature, salinity, currents, and sea level in the northern Gulf of Mexico (NOAA Tides and Currents, 2023a). NGOFS2 uses oceanographic models, computational algorithms, and observational data to generate operational forecasts that facilitate navigation, search and rescue operations, coastal management, and environmental monitoring. Although NGOFS2 does not directly forecast coastal inundation or wave run-up, its sea-level and ocean current forecasts could serve as valuable inputs for other models designed to predict such phenomena.
Some models can predict inundation, including a hybrid of hydrodynamic and wave models such as Advanced CIRCulation (ADCIRC) and Simulating Waves Nearshore (SWAN; Dietrich et al., 2012; Luettich, Westerink, and Scheffner, 1992). ADCIRC simulates tidal circulation and storm-surge propagation, aiding flood prediction and long-term coastal planning for hurricane resilience. With advances in computational capabilities, ADCIRC now runs operationally in real time within the Coastal Emergency Risks Assessment (CERA) framework (CERA, 2023; University of North Carolina at Chapel Hill, 2023; U.S. Climate Resilience Toolkit, 2023). Meanwhile, SWAN predicts wind-generated waves in coastal areas, contributing to inundation estimates by forecasting wave characteristics. Despite these advances, the combined models have limitations, such as insufficient specificity for individual beaches and no direct prediction of wave run-up, a crucial parameter for forecasting total water levels on beaches.
The dynamic nature of specific beaches can be modeled using XBeach, a widely used open-source numerical model designed for simulating hydrodynamics, morphodynamics, and wave-induced inundation processes along coastlines (Roelvink et al., 2009). Despite its value in simulating various scenarios, it is computationally expensive. Seok and Suh (2018) discussed the potential to run XBeach operationally to predict beach erosion with a 12-hour lead time by coupling it with real-time ADCIRC outputs and SWAN spectral data as boundary conditions. This coupling makes real-time implementation a substantial challenge, and to our knowledge it has yet to be operationalized. Other operational challenges include the need for periodic updates of the bathymetry and beach morphology of the study area. By contrast, the model proposed in this paper, although limited to a single location, is more straightforward to deploy and is already running in real time.
Historically, cameras have been used to monitor the extent of coastal inundation, an approach that captures the effects of wave run-up (Salmon, Bryan, and Coco, 2007). The use of cameras for monitoring the nearshore environment continues to grow through platforms designed for in situ coastal observations, including those of the Web Camera Observation Network (Southeast Coastal Ocean Observing Regional Association [SECOORA]–WebCOOS, 2023; SECOORA–IOOS, 2023). Although these applications cannot predict total water levels, they capture real-time imagery, which is valuable for quantifying beach inundation events. However, they have inherent limitations, such as the distance of the platform from the water and camera or video constraints that prevent conditions from being recorded clearly during darkness or when the lens is obscured by blowing sand, condensation, or other factors. Additionally, these programs do not monitor beach elevation or create digital elevation models (DEMs) from the imagery, both of which are essential for developing a total water level model.
To bridge these gaps in modeling, a novel machine-learning approach is proposed for predicting total water level on the Gulf-facing beach adjacent to Horace Caldwell Pier in Port Aransas, Texas. This approach combines local and regional metocean data with elevation data from beach profile surveys. Average water level and wave measurements from nearby tide gauges and offshore wave buoys serve as input for the machine learning model, which forecasts total water levels for the upcoming 12, 24, 48, and 72 hours.
This work makes four unique research contributions. (1) To the authors' knowledge, this is the first machine-learning model capable of predicting total water levels, including wave run-up. (2) The camera-based observation dataset combined with frequent beach surveys is original and rare. The high-frequency images allow the location of the wet/dry shoreline on the beach to be delineated, and the proposed method allows its elevation to be estimated. The resulting time series of total water-level measurements includes wave run-up and serves both as the target for the AI predictive model and for its evaluation. (3) The overall dataset itself is unique, combining DEM survey data with camera observations to produce a total water-level time series, along with colocated metocean variables: average water levels, wind, and offshore waves. (4) The machine-learning total water-level model is capable of operating in real time. In addition to these four research contributions, the real-time model is available from Coastal Dynamics Lab (2024).
Study Area
The research was conducted on the beach adjacent to Horace Caldwell Pier in Port Aransas, Texas, which is directly on the Gulf of Mexico. It is a small segment of the Gulf-facing beach located south of the Aransas Ship Channel South Jetty on Mustang Island, Texas. The study area measures approximately 54.86 × 85.34 m and is a part of the I.B. Magee Beach Park, managed by Nueces County Coastal Parks. Vehicular traffic is prohibited (Texas Natural Resources Code, 2024), maintaining a more natural state of the beach morphology compared with other beaches along Mustang Island. The study area is located along the widest beach segment on the island, other than the fillet north of Packery Channel, which is located nearly 21 km to the south. The berm ranges from 250-m wide near the jetty to 50- to 100-m wide near the pier. A maintained driving lane and a nonpaved parking area lie between the dune line and the study area.
Analysis focused on the section of the berm that extends seaward from the parking area across the central berm, ending at the water's edge during data collection (Figure 1). The broader beach segment from the pier to the south jetty has remained relatively stable since the 1950s, with erosion events occurring mainly during hurricanes and tropical storms, as documented during Hurricane Harvey (2017; Gibeaut et al., 2001; Morton and Pieper, 1977; Williams and Turner, 2022). Along the study area, longshore sediment transport is primarily toward the NE, especially during strong SE winds that are prevalent for most of the year. In contrast, longshore sediment transport toward the SW is limited by the sheltering influence of the 1900-m-long jetties that extend approximately 1000 m seaward of the 2023 shoreline position (Williams and Turner, 2022). The natural beach morphology of the study area is maintained because of infrequent mechanical sand redistribution and the absence of vehicular traffic. Because this segment of the beach is unusually wide, park staff maintain a small trench at the southern end of the study area to minimize pooling after rain and inundation periods and to maintain pier access.
To correctly interpret model results within the study area, it was necessary to account for site-specific characteristics and recognize potential differences compared with other beach locations. The study area is located immediately adjacent to a pier and within 1000 m of a jetty, experiences a microtidal range, and is protected from vehicular traffic. It is also notable that this beach segment had relatively few significant foreshore erosion events where sand was redistributed landward over a 13-month study period, as described by Vicens-Miquel, Williams, and Tissot (2024).
The presence of a pier structure on the beach and in the nearshore influences ocean wave behavior, causing wave shadowing and turbulence due to wave refraction. These effects alter the wave energy distribution along the beach and can result in localized erosion or accretion (Ludka et al., 2015; Miller and Dean, 2004; Pianca, Holman, and Siegle, 2015; Plant et al., 1999; Splinter et al., 2014). Similarly, piers disrupt sediment transport, typically leading to variable levels and trends of accretion in the immediate vicinity of the pier that influence berm width and shoreline position. This accretion can be transient and is frequently concentrated on one side of the pier, with erosion or limited change on the other, underscoring the necessity of understanding the sediment dynamics of each beach segment, which vary by location (Splinter et al., 2014).
The proximity of jetties can significantly alter localized coastal processes, including sediment transport within a zone of influence. This influence depends on the length of the jetty, which provides shelter from waves and longshore currents, affecting the magnitude and primary direction of sediment transport while promoting sand impoundment. Studies have demonstrated the influence of jetties on sediment transport, particularly their sheltering effect on adjacent beaches, where they limit sand loss and restrict sediment transport from sources updrift of the jetty (Fox and Davis, 1978; Garel, Sousa, and Ferreira, 2015; Hunter, Richmond, and Alpha, 1983; Knight and Burningham, 2003).
The unique tidal dynamics in the study area distinguish it from other coastal settings. Although the Texas coast has a limited tidal range on the order of 0.4 m (NOAA Tides and Currents, 2024), metocean forcings, particularly strong winds during cold-front passages and wind directed from the south during the spring and summer seasons, can lead to water-level changes larger than the tidal range. These events can lead to coastal flooding or expose significant portions of the foreshore and berm, particularly during seasonal lows in water level. This exposure allows for strong onshore wave action to impact areas typically submerged during high tide, thus potentially focusing wave energy across a broader range of the foreshore and berm as water levels fluctuate during storm events (Coco et al., 2004).
Although the state of Texas permits vehicular traffic on beaches, this study focuses on a protected area where public driving is prohibited. This minimizes the impact of vehicles and of the mechanical sand redistribution inherent to maintaining vehicular access, which makes changes in beach morphology easier to interpret. Although lifeguards and beach managers occasionally drive on these beach segments, the limits on vehicular traffic reduce anthropogenic influences that could otherwise be reflected in the data, providing a clearer understanding of natural beach dynamics. The findings along the study area are representative of those along adjacent segments where driving is also prohibited.
METHODS
The methodology employed in this study is illustrated in Figure 2, presenting a systematic workflow further described in later sections. Various steps were conducted concurrently, including the continuous automatic imagery collection process (“Automated Beach Imagery Collection” section) and the frequent beach-profile surveys (“Survey Methodology” section).
Beach-profile surveys, generally conducted approximately every 2 weeks and referred to as frequent surveys, were combined with imagery collected at 30-minute intervals to identify morphological changes within the study area. Key metocean variables correlated with inundation events were identified as the drivers of these events for the study area: water level, significant wave height, and dominant wave period (Vicens-Miquel, Williams, and Tissot, 2024). These drivers then served as inputs to the coastal inundation AI model. Additionally, the frequent surveys were used to generate DEMs, which, combined with camera imagery, enabled computation of the wet/dry shoreline elevation, the target for predicting coastal inundation.
Frequent Beach Surveys
Frequent beach surveys, combined with camera monitoring of the study area, were necessary for understanding the frequency and impact of inundation events and their underlying causes. Monitoring involved 29 beach-profile surveys conducted between February 2023 and January 2024, spaced on average 2 weeks apart. The survey data were used to generate DEMs of the study area, which, combined with camera imagery, allowed the extraction of the wet/dry shoreline elevation, the target for the machine-learning model.
Survey Methodology
The beach-profile surveys were conducted using a Trimble R10 Global Navigation Satellite System (GNSS) multifrequency receiver with broadcast corrections from Trimble virtual reference stations using the G3 CMRx NAD83 2011 model (Trimble, 2009). According to Trimble (2024) documentation, the horizontal and vertical accuracies (RMS) are 2 and 3 cm, respectively. This system was used to measure topographic spot elevations across the beach through real-time kinematic (RTK) positioning. To ensure consistent data acquisition, a systematic 3.05 × 3.05-m (10 × 10-ft) grid was adopted, illustrated in Figure 3. Horizontal (x, y) coordinates were referenced to NAD 1983 State Plane Texas South, and elevations were referenced to NAVD88 (converted from NAD 1983 ellipsoid heights using GEOID12B). Although all survey transects originated from the same point, their lengths varied because each transect ended at the waterline, whose position depended on the water level at the time of the survey.
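The grid layout and the height conversion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: local coordinates in meters with transects spaced alongshore and nodes spaced cross-shore, each transect truncated at its own waterline offset (a hypothetical input), and the standard relation H = h − N between orthometric height H, ellipsoid height h, and geoid undulation N. The undulation value must come from GEOID12B; the numbers used in any example are illustrative only.

```python
import numpy as np

FT_TO_M = 0.3048          # international foot; 10 ft = 3.048 m
SPACING_M = 10 * FT_TO_M  # survey grid spacing


def grid_points(n_transects, waterline_offsets_m, spacing_m=SPACING_M):
    """Local (x, y) survey nodes: x alongshore, y cross-shore from the landward start.

    Each transect stops at its own (hypothetical) waterline offset, so
    transects can have different numbers of nodes.
    """
    pts = []
    for i in range(n_transects):
        x = i * spacing_m
        # small epsilon so an offset that is an exact multiple keeps its last node
        n_nodes = int(np.floor(waterline_offsets_m[i] / spacing_m + 1e-9)) + 1
        pts.extend((x, j * spacing_m) for j in range(n_nodes))
    return np.array(pts)


def ellipsoid_to_orthometric(h_ellipsoid_m, geoid_undulation_m):
    """Orthometric height H = h - N (ellipsoid height minus geoid undulation)."""
    return np.asarray(h_ellipsoid_m, dtype=float) - np.asarray(geoid_undulation_m, dtype=float)
```

The vertical accuracy of the result is bounded by both the receiver accuracy and the accuracy of the geoid model, so the conversion does not improve on the 3-cm RMS quoted above.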
Drivers of Coastal Inundation and Morphological Changes in the Study Area
Understanding the coastal processes that force coastal inundation and morphological change in the study area was essential for identifying the key metocean parameters that control inundation events. Datasets documenting changes in metocean parameters were used to identify the frequency of these events and whether erosion occurred, which in turn influenced the DEM for the study area.
In a focused study conducted at the same study area as this paper (July 2022 to October 2023), morphological changes were characterized by examining three full and four partial beach-inundation events (Vicens-Miquel, Williams, and Tissot, 2024). Changes were assessed across 25 beach profiles, typically measured on a bimonthly schedule. Beyond these seven identified inundation events, additional events may have occurred overnight and been missed because the cameras could not capture images of the beach in darkness. The study revealed that the maximum dominant wave period, maximum average wave period, maximum significant wave height, and average water levels were the main factors driving inundation. Additionally, over the 16-month study period, the maximum dominant wave period was identified as the primary driver of foreshore erosion events involving landward sand redistribution.
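The drivers listed above are window statistics (maxima of wave parameters and a mean water level), so a sketch of how such features might be computed from a metocean time series may help. This uses pandas rolling windows; the column names and the 24-hour window are hypothetical choices for illustration, not the feature definitions used by the model in this paper.

```python
import pandas as pd

# Hypothetical column names for regularly sampled metocean records.
COLUMNS = ["dominant_period_s", "mean_period_s", "hs_m", "water_level_m"]


def driver_features(df: pd.DataFrame, window=pd.Timedelta(hours=24)) -> pd.DataFrame:
    """Rolling driver features for the window ending at each timestamp.

    `df` must have a sorted DatetimeIndex; the time-based window covers
    (t - window, t]. The 24-hour default is an assumption.
    """
    return pd.DataFrame({
        "max_dominant_period_s": df["dominant_period_s"].rolling(window).max(),
        "max_mean_period_s": df["mean_period_s"].rolling(window).max(),
        "max_hs_m": df["hs_m"].rolling(window).max(),
        "mean_water_level_m": df["water_level_m"].rolling(window).mean(),
    })
```

Time-based windows tolerate gaps in buoy or gauge records better than fixed row counts, because each window is defined by elapsed time rather than by the number of samples.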
DEM Generation
The DEMs were generated from the GNSS survey grid data in ArcGIS Pro following a consistent series of steps. Each DEM had a spatial resolution of approximately 35 cm/pixel and covered a total area of 3900 m². Because numerous surveys were conducted, a Python script was developed to automate DEM creation for each survey; the script was executed within a Jupyter Notebook integrated into the software. The code is available in the project's GitHub repository for reproducibility and ease of reference.
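Conceptually, each DEM is an interpolation of scattered survey points onto a regular raster. The sketch below illustrates that step with SciPy's `griddata` (linear interpolation over a Delaunay triangulation) at the 35-cm resolution stated above. The interpolation method used in the ArcGIS Pro workflow is not specified here, so linear interpolation is a stand-in, and the function name is hypothetical.

```python
import numpy as np
from scipy.interpolate import griddata


def make_dem(x, y, z, resolution_m=0.35, method="linear"):
    """Interpolate scattered survey points onto a regular raster.

    Returns cell-centre coordinates along x and y and a 2-D elevation array
    indexed as dem[row (y), col (x)]. Cells outside the convex hull of the
    survey points are NaN.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    xi = np.arange(x.min(), x.max(), resolution_m)
    yi = np.arange(y.min(), y.max(), resolution_m)
    gx, gy = np.meshgrid(xi, yi)
    dem = griddata((x, y), z, (gx, gy), method=method)
    return xi, yi, dem
```

Because survey transects end at a varying waterline, the convex hull, and therefore the valid DEM extent, changes from survey to survey; cells left as NaN mark where no survey support exists.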
The precision of the DEMs was evaluated by creating multiple DEMs from the same survey grid, each with one grid point (i.e., one ground measurement) removed. The DEM elevation at the removed grid point was then compared with the elevation at the same x and y coordinates in the original DEM containing all points. This process was repeated 20 times, each time removing a different grid point, with the removed points well distributed across the study area. Across these 20 DEMs, the mean absolute error (MAE) was 0.82 cm and the root mean squared error (RMSE) was 1.51 cm. This evaluation was conducted on only one randomly selected DEM because all DEMs were generated using the same procedure; thus, the DEM error is expected to be comparable across all DEMs.
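The leave-one-out check described above can be sketched as follows. For each selected point, the surface is rebuilt without that point and its interpolated elevation is compared with the full-data surface at the same coordinates; MAE and RMSE summarize the differences. SciPy linear interpolation stands in for the ArcGIS Pro workflow, and evaluating the interpolant directly at the point (rather than sampling a raster cell) is a simplifying assumption.

```python
import numpy as np
from scipy.interpolate import griddata


def leave_one_out_errors(x, y, z, test_indices, method="linear"):
    """MAE and RMSE between DEMs built with and without each test point."""
    pts = np.column_stack([np.asarray(x, float), np.asarray(y, float)])
    z = np.asarray(z, dtype=float)
    errors = []
    for k in test_indices:
        keep = np.ones(len(z), dtype=bool)
        keep[k] = False
        target = pts[k:k + 1]  # shape (1, 2)
        z_without = griddata(pts[keep], z[keep], target, method=method)[0]
        z_full = griddata(pts, z, target, method=method)[0]
        errors.append(z_without - z_full)
    errors = np.array(errors)
    mae = float(np.nanmean(np.abs(errors)))
    rmse = float(np.sqrt(np.nanmean(errors ** 2)))
    return mae, rmse
```

Test points should be interior to the survey grid; removing a point on the hull boundary leaves the interpolant undefined there, which this sketch reports as NaN and skips in the averages.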
Automated Beach Imagery Collection
The image collection process for this project captured one photograph of the study area every 30 minutes from 1 February 2023 to 31 January 2024. The camera was an Amcrest 4K Outdoor Security IP Turret Power over Ethernet (PoE) Camera IP67 IP8M-T2599EW, which has a 4K (8-megapixel, 3840 × 2160) resolution and a wide 125° viewing angle, ensuring coverage of the entire study area.
Camera and Other Hardware
The objective of the camera installation was to establish a system that automatically captures images at 30-minute intervals and makes them available online. This required an on-site computer to manage the camera, along with a network switch and a hotspot to provide internet connectivity to both the camera and the computer. The components used were the Zed Box Nvidia Jetson for computing, a TRENDnet TE-GP051 network switch, a Sierra Wireless AirLink RV50X modem for internet connection, and a WD Elements 2TB external drive for imagery backup.
The Jetson computer was a small Linux system equipped with a graphics processing unit and was used to manage the data collection process. The Command Run ON (CRON) scheduler launched a Python script that drove the collection process. This script handled multiple tasks, including collecting imagery, compressing files, uploading them to Amazon Web Services (AWS), and creating on-site backups for added data security. The WD Elements drive was a portable hard disk drive (HDD) connected to the Zed-Box, serving as a local backup. A script transferred the day's imagery to the HDD every night. Additionally, before a dataset was pulled from AWS, a script scanned the on-site backup for images missing from AWS and uploaded any it found, ensuring data integrity before the dataset was finalized.
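The gap check described above amounts to comparing three sets: the scheduled capture times, the images in the on-site backup, and the images already in cloud storage. The sketch below assumes a hypothetical file-naming convention (`YYYYMMDD_HHMM.jpg`) and takes the local and remote listings as plain lists of names; how those listings are obtained (local directory scan, AWS object listing) is left out. For the 1 February 2023 to 31 January 2024 period, the schedule contains 365 × 48 = 17,520 captures.

```python
from datetime import datetime, timedelta

# Hypothetical file-naming convention, e.g. "20230201_0030.jpg".
NAME_FORMAT = "%Y%m%d_%H%M.jpg"


def expected_names(start, end, step=timedelta(minutes=30)):
    """File names for every scheduled capture in [start, end)."""
    names = []
    t = start
    while t < end:
        names.append(t.strftime(NAME_FORMAT))
        t += step
    return names


def reconcile(expected, local_names, remote_names):
    """Return (to_upload, lost).

    to_upload: images in the on-site backup but missing from cloud storage.
    lost: scheduled images found in neither location (true data gaps).
    """
    local, remote = set(local_names), set(remote_names)
    to_upload = sorted(local - remote)
    lost = [n for n in expected if n not in local and n not in remote]
    return to_upload, lost
```

A hypothetical crontab entry such as `*/30 * * * * python3 capture.py` would trigger captures on the same 30-minute grid that `expected_names` assumes.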
A cellular modem, Sierra Wireless AirLink, supplied internet connectivity to the Zed-Box computer. This enabled the regular uploading of data packets to AWS for storage and facilitated access to the Zed-Box via Secure Shell for necessary maintenance tasks. All images were securely stored on a WD Elements 2TB external drive, serving as a backup to prevent data loss in the event of any upload failures to AWS. This setup ensured data integrity, allowing images to be subsequently uploaded to AWS (2023) should an upload fail initially. An alert system was in place to track upload failures and promptly notify the administrator, preventing any potential data loss and ensuring the reliability of the backup system for imagery.
The objective of this configuration was to develop a cost-effective solution that is easily replicable. The overall expenditure for the setup amounted to $1961, which can be itemized as follows: $119 for the Amcrest camera, $1299 for the Jetson device, $509 for the modem, and $34 for the cables (items purchased in 2022). Replacing the Jetson with a Raspberry Pi 5/8GB would further lower the cost from $1961 to $750.
Installation and Organization of Instrumentation
Instrumentation was installed at the Nueces County learning center building on Horace Caldwell Pier in Port Aransas, Texas. The installation included two distinct weatherproof boxes (Figure 4). The first box, located inside the building, contained essential computational and communication components, such as the Nvidia Jetson computer, Sierra Wireless AirLink cellular router, WD Elements 2TB external drive, and the TRENDnet TE-GP051 network switch. The second box was situated outside the building to protect the camera from various potential weather and anthropogenic hazards.
The outdoor box was designed to protect the camera against theft, corrosion from the beach environment, and other environmental factors, improving the longevity of the equipment. The enclosure also kept dirt and salt off the camera lens, and its protective plastic cover was cleaned regularly to keep the imagery clear. Moreover, the sturdy installation of the outdoor box ensured camera stability, which was critical for this research because images collected between surveys, which lacked targets, could only be georeferenced if the camera view remained unchanged.
The Amcrest camera and the Jetson computer were connected via PoE cables to the network switch for internet access. The Jetson computer then managed the camera array through a Python camera driver program, which performed the following functions: (1) capturing an image every 30 minutes, (2) compressing the collected imagery and storing it on the backup external drive, (3) uploading the compressed imagery to AWS using the AWS software development kit, and (4) overseeing an alert system. The alert system sent a notification if the camera was undetected, failed to capture images, or experienced difficulties uploading the data to AWS.
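The four driver functions can be sketched as a single acquisition cycle. This is a minimal illustration under stated assumptions, not the project's driver: `run_cycle`, `capture`, `upload`, and the `.jpg.gz` file naming are hypothetical. In practice `upload` might wrap the AWS SDK for Python (boto3) `upload_file` call, and a scheduler would trigger the cycle every 30 minutes. Each compressed image is written to the backup drive first and stays in a pending list until an upload succeeds, which mirrors the backup-and-retry behavior described above.

```python
import gzip
from pathlib import Path
from typing import Callable, List


def run_cycle(capture: Callable[[], bytes],
              upload: Callable[[Path], None],
              alert: Callable[[str], None],
              backup_dir: Path,
              stamp: str,
              pending: List[Path]) -> List[Path]:
    """Run one acquisition cycle and return the files whose upload is still pending."""
    try:
        image = capture()
    except Exception as exc:  # camera undetected or capture failed
        alert(f"capture failed at {stamp}: {exc}")
    else:
        # Compress and write to the backup drive before any upload attempt.
        path = backup_dir / f"{stamp}.jpg.gz"
        path.write_bytes(gzip.compress(image))
        pending = pending + [path]

    still_pending = []
    for path in pending:  # retry earlier failures together with the new image
        try:
            upload(path)
        except Exception as exc:  # e.g., no cellular connectivity
            alert(f"upload failed for {path.name}: {exc}")
            still_pending.append(path)
    return still_pending
```

Keeping the pending list outside the function lets the caller persist it between cycles, so a temporary loss of connectivity only delays uploads rather than losing images.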
Method to Compute the Wet/Dry Shoreline Elevation
The steps to extract the wet/dry shoreline elevation followed the methodology proposed by Vicens-Miquel et al. (2022). The process involved georeferencing the camera imagery, delineating the wet/dry shoreline, overlaying the wet/dry shoreline on the DEM, extracting the elevation of each shoreline point, and calculating the mean elevation of the wet/dry shoreline after eliminating outliers. Applying this process to a time series of georeferenced images yields one of the first total water-level time series.
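The final averaging step can be illustrated with a short sketch. The outlier rule used here, a Tukey fence at 1.5 times the interquartile range, is an assumption for illustration only; the text does not specify which rule was applied, and `shoreline_elevation` is a hypothetical helper.

```python
import statistics
from typing import Sequence


def shoreline_elevation(elevations: Sequence[float], k: float = 1.5) -> float:
    """Mean DEM elevation (m) along a delineated wet/dry shoreline after dropping outliers.

    Outliers are removed with a Tukey k*IQR fence (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(elevations, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [z for z in elevations if low <= z <= high]
    return statistics.fmean(kept)
```

For example, a shoreline sampled at 0.59, 0.60, 0.61, 0.62, 0.63, and 1.50 m would return 0.61 m, because the isolated 1.50 m point falls outside the fence.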
Georeferencing the Camera Imagery
The cameras automatically capture oblique images every 30 minutes, 24 hours per day; however, most images lacked targets because targets could not be maintained on the beach continuously. When the survey teams conducted beach surveys approximately every 2 weeks, targets were placed on the beach, the center of each target was measured using RTK GNSS, and the array of targets was captured in the camera imagery. Georeferencing these images is challenging because of the cameras' 8-MP resolution combined with the paucity of clearly visible permanent features on a sandy beach, which makes it difficult to identify static objects for georeferencing. To ensure accurate georeferencing, various transformations were evaluated, and the projective transformation was selected because it provided the most accurate georeferencing (Figure 5). Throughout this study, the positions of the targets significantly influenced georeferencing accuracy. Beyond the halfway point along the shore in the study area, the camera resolution made it difficult to identify the target centers precisely, resulting in a notable drop in georeferencing accuracy. Further investigation revealed minimal elevation variations along the shore. Given these considerations, six survey target locations were selected to maximize georeferencing accuracy in the areas where the wet/dry shoreline predominantly occurs. All but five of the wet/dry shorelines identified in the 1-year dataset were located within the area defined by the targets; the wet/dry shoreline locations for these five cases were still estimated, although less precisely. The locations of the targets are illustrated in Figure 5.
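For reference, a projective transformation maps image pixel coordinates (u, v) to ground coordinates (X, Y) as X = (h11 u + h12 v + h13)/(h31 u + h32 v + 1) and Y = (h21 u + h22 v + h23)/(h31 u + h32 v + 1). At least four targets are needed to solve for the eight unknowns, and six targets give an overdetermined least-squares fit. The sketch below assumes NumPy and uses hypothetical function names; the study itself performed georeferencing in ArcGIS Pro, whose internal solver may differ.

```python
import numpy as np


def fit_projective(pixels: np.ndarray, ground: np.ndarray) -> np.ndarray:
    """Least-squares 3x3 projective transform (h33 fixed to 1) from >= 4 point pairs."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(pixels, ground):
        # Each target gives two equations, linear in the eight unknowns.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    h, *_ = np.linalg.lstsq(np.asarray(rows, float), np.asarray(rhs, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)


def apply_projective(H: np.ndarray, pixels: np.ndarray) -> np.ndarray:
    """Map pixel coordinates of shape (N, 2) to ground coordinates of shape (N, 2)."""
    homog = np.column_stack([pixels, np.ones(len(pixels))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

With large pixel coordinates, normalizing the points before fitting (Hartley normalization) would improve numerical conditioning.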
Because of the lack of visible permanent matching features, the image-to-image registration technique was applied within the ArcGIS Pro software by importing control points (ESRI, 2023). This technique enables georeferencing new images based on a previously georeferenced image while aligning the pixels of the respective images. It is applicable in scenarios without consistent targets, where the camera remains stationary, and both images share identical pixel locations.
Given the static nature of the camera setup, a single image was selected as the reference. As part of the selection process, all images that included targets were georeferenced. The RMSEs computed relative to the RTK GNSS measurements were then compared, and the image with the lowest RMSE was selected as the reference. These results were influenced by factors such as lighting conditions and the cleanliness of the camera box.
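The reference-image selection reduces to choosing the candidate with the lowest RMSE. This sketch assumes a horizontal (x, y) RMSE of the georeferenced target centers against the RTK GNSS positions; the text does not state which coordinates entered the RMSE, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def rmse(predicted: List[Point], surveyed: List[Point]) -> float:
    """Horizontal RMSE between georeferenced target centers and RTK GNSS positions."""
    sq = [(px - sx) ** 2 + (py - sy) ** 2
          for (px, py), (sx, sy) in zip(predicted, surveyed)]
    return math.sqrt(sum(sq) / len(sq))


def select_reference(candidates: Dict[str, List[Point]], surveyed: List[Point]) -> str:
    """Name of the candidate image whose target centers best match the survey."""
    return min(candidates, key=lambda name: rmse(candidates[name], surveyed))
```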
Of note, the effectiveness of this method hinges on camera stability; any camera movement can significantly affect georeferencing accuracy. Visible targets placed every 2 weeks during surveys served as checkpoints to verify the accuracy of the z-coordinate extraction. Because the purpose of georeferencing is to extract the wet/dry shoreline elevation, z-coordinate accuracy is paramount; although x- and y-coordinate accuracies are relevant, the primary focus lies on z-coordinate accuracy. An essential consideration in this approach is the need for a fair comparison between survey shot and target center elevations. Using the surveyed elevations directly would introduce bias because they inherently differ from DEM elevations. Hence, DEM elevations were used for both survey shots and target centers to ensure an equitable comparison. For the evaluation, the x and y coordinates of each survey shot were used to locate it on the DEM and extract its z coordinate. Similarly, after the imagery intended for evaluation was georeferenced, the center of each target was located and its elevation extracted from the DEM. The error was evaluated for all surveys except those in the first 3 months of the dataset because some target positions changed after this initial period.
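Reading both the survey-shot and the target-center elevations from the same DEM can be sketched as follows. Bilinear interpolation on a north-up grid is an assumption (ArcGIS Pro may sample the raster differently), and `sample_dem` and `z_error` are hypothetical helpers. Because both elevations come from one surface, the resulting z error reflects only the horizontal offset between the georeferenced target center and the surveyed point, scaled by the local DEM slope.

```python
import math
from typing import Sequence, Tuple


def sample_dem(dem: Sequence[Sequence[float]], x0: float, y0: float,
               cell: float, x: float, y: float) -> float:
    """Bilinearly interpolate a north-up DEM at map coordinate (x, y).

    dem[row][col] is the elevation at cell center (x0 + col*cell, y0 - row*cell)."""
    c = (x - x0) / cell
    r = (y0 - y) / cell
    c0, r0 = int(math.floor(c)), int(math.floor(r))
    if not (0 <= r0 < len(dem) - 1 and 0 <= c0 < len(dem[0]) - 1):
        raise ValueError("point outside the DEM interior")
    fc, fr = c - c0, r - r0
    top = dem[r0][c0] * (1 - fc) + dem[r0][c0 + 1] * fc
    bottom = dem[r0 + 1][c0] * (1 - fc) + dem[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr


def z_error(dem, x0: float, y0: float, cell: float,
            shot_xy: Tuple[float, float], target_xy: Tuple[float, float]) -> float:
    """Target-center elevation minus survey-shot elevation, both read from the DEM."""
    return (sample_dem(dem, x0, y0, cell, *target_xy)
            - sample_dem(dem, x0, y0, cell, *shot_xy))
```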
Table 1 presents the MAE and RMSE for the x, y, and z coordinates. The primary focus of this research is ensuring optimal georeferencing accuracy along the z coordinate, which is essential for determining the wet/dry shoreline elevation. The error statistics in Table 1 reveal a much smaller error in the z coordinate than in the x and y coordinates, with a maximum MAE of 1.6 cm compared with 22 and 38 cm, respectively. This discrepancy between the z accuracy and the x and y accuracies was expected. Because of the limited camera resolution, difficulties in locating the target centers mostly affect the x and y accuracies, whereas the relatively flat beach profile means that these horizontal errors produce only small changes in elevation, leading to a low overall error in the z coordinate. The proposed georeferencing approach captures wet/dry shoreline elevations within a 2-cm margin. These results affirm the success of the image-to-image georeferencing method, facilitated by the relatively flat beach terrain with minor slope variations near the targets.
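The statistics in Table 1 follow the usual definitions, MAE = mean(|e|) and RMSE = sqrt(mean(e²)), computed separately for each axis. The sketch also encodes the flat-beach argument: on a locally planar surface, a horizontal error d produces a vertical error of roughly d times the slope. Function names are hypothetical, and the slope used in the usage note is illustrative rather than a value measured in this study.

```python
import math
from typing import Dict, Sequence


def axis_errors(errors: Sequence[float]) -> Dict[str, float]:
    """MAE and RMSE of signed errors (m) along one axis."""
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }


def implied_z_error(horizontal_error_m: float, slope: float) -> float:
    """Approximate vertical error on a locally planar surface: |horizontal error| * slope."""
    return abs(horizontal_error_m) * slope
```

For instance, with a hypothetical 2% beach slope, a 0.30-m horizontal error implies only about 0.006 m of vertical error.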
Wet/Dry Shoreline Delineation
The wet/dry shoreline locations in the study imagery were identified and labeled manually following the same process described in Vicens-Miquel et al. (2022). In that work, the model successfully detected the wet/dry shoreline for multiple independent testing datasets. The success of the model predictions, including the close agreement between manually delineated and AI-predicted wet/dry shoreline locations, validates the entire process, including the manually labeled ground truth. In this research, the manual delineation was further validated by the consistency between the total water levels resulting from it and the water levels measured at the nearby tide gauge. Figure 6 displays this comparison, which shows a high level of consistency between both time series. As the dataset grows, it will become possible to calibrate a machine-learning model to delineate the wet/dry shoreline automatically while accounting for seasonal and interannual variability.
Computing the Wet/Dry Shoreline Elevation
The wet/dry shoreline elevation was computed following the methodology proposed by Vicens-Miquel et al. (2022). Given the large number of images, a Python script was executed within a Jupyter Notebook in ArcGIS Pro software to automate the process. The script, which encompasses the steps described above, is available in the project's GitHub repository for reproducibility.
Wet/Dry Shoreline Elevation Time Series (Machine-Learning Target)
Figure 6 illustrates the time series of the wet/dry shoreline elevations alongside the Port Aransas water levels and harmonic predictions. The wet/dry shoreline elevation changes more slowly than water levels, especially during outgoing tides, when the delineated wet/dry shoreline remains at its tidal-cycle maximum for a few hours. The high correlation between successive, similar wet/dry shoreline elevations while other metocean variables are changing poses a challenge to model calibration. To address this challenge, only four images per day were used instead of all images captured every 30 minutes. These images were organized into two alternating daily sets: one set comprised images taken at 1500, 1700, 1900, and 2100 coordinated universal time (UTC), whereas the following day's set included images taken at 1400, 1800, 2000, and 2200 UTC. UTC was used to avoid the effect of daylight saving time changes, because UTC is 5 or 6 hours ahead of local (Central) time, depending on the time of year. These specific times were selected because natural light was available on the beach. In addition, varying the hours between days minimized biases related to specific times of the day, resulting in a more diverse dataset. Using images at 2-hour intervals also resolved the correlation issues while still capturing the variability of the wet/dry shoreline elevation associated with the daily tide (Figure 6). If it rained, which was infrequent during the study period, the position of the wet/dry shoreline could not be determined; thus, images taken during rain events were removed from the dataset.
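The alternating daily schedule and the rain filter can be sketched as follows. Assumptions: timestamps are timezone-aware UTC datetimes, the two sets alternate by day parity counted from a chosen start date, and rain-affected images are supplied as a set of timestamps; `select_images` is a hypothetical helper.

```python
from datetime import date, datetime
from typing import Iterable, List, Set

SET_A = (15, 17, 19, 21)  # UTC hours used on one day
SET_B = (14, 18, 20, 22)  # UTC hours used on the following day


def select_images(stamps: Iterable[datetime], rainy: Set[datetime],
                  start: date) -> List[datetime]:
    """Keep four on-the-hour images per day, alternating hour sets, minus rain events."""
    keep = []
    for ts in stamps:
        day = ts.date()
        hours = SET_A if (day - start).days % 2 == 0 else SET_B
        if ts.minute == 0 and ts.hour in hours and ts not in rainy:
            keep.append(ts)
    return keep
```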
Analyzing Figure 6 in detail, the overall low-frequency (seasonal) and the higher-frequency (daily) water-level ranges can be compared. The lowest daily average harmonic predictions are found in early July at about −0.1 m, whereas the highest are found in late October at about 0.35 m. The lowest daily average water-level measurements are found in late July, also at about −0.1 m, whereas the highest are found in late September at about 0.65 m. The water-level measurement range is larger than the tidal range because of the impact of metocean forcing not included in harmonic predictions. The dynamic of the wet/dry shoreline elevation is somewhat different, with daily lows of around 0.5 m reached throughout the year in early February, late April, mid-May, then throughout December and January. The wet/dry shoreline elevation average daily highs were reached during the periods of mid-February to early May, during the impact of Tropical Storm Harold on 22 August, and during the November–January period. Outside the impact of Tropical Storm Harold, the highs and the lows of the wet/dry shoreline elevation were reached during the frontal season and are event or wave driven.
During the late fall to midspring season, referred to as the frontal season (i.e., January, February, March, April, May, October, November, December), frontal systems pass off the coast, bringing large barometric pressure drops, strong winds, and a shift in wind direction from out of the SE to out of the north. The passage of these cold fronts leads to substantial differences in water levels and even larger differences in wet/dry shoreline elevations, the latter because of the larger waves. These periodic, large wave events give way to periods of lower wind speeds and milder wave climates over the summer months (i.e., June, July, August, September), labeled as the nonfrontal season, in the absence of tropical storm activity. The peak in the elevation of the wet/dry shoreline was observed on 22 August 2023 and was attributed to Tropical Storm Harold, which made landfall as a depression on North Padre Island at approximately 1500 UTC. An increase in water level coincided with Harold's landfall, during which peak wind gusts exceeded 20 m/s, accompanied by a slight decrease in air temperature and barometric pressure.
Reduced variability over the summer months is expected, as conditions, excluding hurricanes, are generally consistent. The summer months are typically dominated by S-to-SE winds, with no cold fronts and no astronomical tidal peaks. In contrast, much more variability occurs in metocean and astronomical parameters from October to February, a period typically characterized by two extremes in the tidal cycle, with highs in October and lows in February.
For the wet/dry shoreline, the annual variability is split between the frontal and nonfrontal seasons, whereas the water levels and harmonic predictions follow a distinct seasonal pattern. Typically, the seasonal high occurs at the start of the winter season, in October or early November. This is followed by a seasonal low in January to February, a secondary high in late April into May, and finally a secondary seasonal low in July (NOAA Tides and Currents, 2023b).
Physics-Based, Machine-Learning Model Inputs
The initial inputs for the model were based on the study conducted by Vicens-Miquel, Williams, and Tissot (2024), which identified the metocean variables most highly correlated with morphological changes in the same study area. These variables included the offshore dominant wave period, the average wave period, the significant wave height, and water levels from NOAA's nearby tide gauge at Port Aransas (NOAA Tides and Currents, accessed 2023c). To prevent data gaps due to buoy downtime, and given that the two Texas offshore buoys are relatively close to each other, wave data were taken as the maximum of the National Data Buoy Center (NDBC) 42019 (NDBC, 2023a) and NDBC 42020 (NDBC, 2023b) buoy values or, when only one buoy reported, as its sole value. Other metocean variables, such as wind speed and direction, were evaluated for this application but did not improve performance, so they were not included. The wet/dry shoreline elevation served as the target for the model.
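The buoy-merging rule can be sketched in a few lines. The function name is hypothetical, and treating NaN as a missing report is an assumption about how gaps are encoded.

```python
import math
from typing import Optional


def merged_wave_value(b42019: Optional[float], b42020: Optional[float]) -> Optional[float]:
    """Maximum of the two buoy values, the sole value if only one exists, else None."""
    present = [v for v in (b42019, b42020) if v is not None and not math.isnan(v)]
    return max(present) if present else None
```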
Seasonal fluctuations in the wet/dry shoreline elevation with recurring peaks and lows were identified (Figure 6). To optimize model performance, the change in the elevation of the wet/dry shoreline was predicted rather than directly predicting its elevation. This process eliminated any enduring trends or seasonal variations in the original signal, enabling a more precise capture of temporal dependencies and consequently enhancing model performance.
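Predicting the change rather than the level amounts to differencing the target and then adding the predicted change back to the last observed elevation. The sketch assumes that the differencing lag equals the forecast lead time expressed in samples, which the text does not state; the helper names are hypothetical.

```python
from typing import List, Sequence


def to_changes(elev: Sequence[float], lag: int) -> List[float]:
    """Change in wet/dry shoreline elevation over `lag` samples: y[t] = z[t] - z[t - lag]."""
    return [elev[t] - elev[t - lag] for t in range(lag, len(elev))]


def to_levels(last_observed: float, predicted_change: float) -> float:
    """Convert a predicted change back into an absolute elevation."""
    return last_observed + predicted_change
```

Because a slow seasonal trend shifts consecutive elevations by similar amounts, it largely cancels in the differences, which is the motivation given above.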
To better capture this new signal, which relies on metocean data, four inputs were used: dominant wave period, average wave period, wave height, and average water levels. Maximum values may not have accurately reflected the variable’s behavior over either the previous or following 30 minutes, as they could have been outliers. For inundation to occur, metocean variables must persist over time rather than show only one short-term increase in the wave height or period. Wave and water-level inputs were constructed by considering hourly measurements over the last 12 hours and computing the average of the four largest measurements during that time. Computing the 12-hour average also accommodated offshore buoy data, considering their estimated 4- to 12-hour transit time to the coast.
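The windowed input can be sketched as follows, assuming a gap-free hourly series and a 12-hour window that ends at, and includes, the current hour; `window_feature` is a hypothetical name.

```python
import heapq
from typing import Sequence


def window_feature(hourly: Sequence[float], t: int, window: int = 12, top: int = 4) -> float:
    """Mean of the `top` largest hourly values in the `window` hours ending at index t."""
    if t + 1 < window:
        raise ValueError("not enough history for the window")
    recent = hourly[t + 1 - window: t + 1]
    return sum(heapq.nlargest(top, recent)) / top
```

Averaging the four largest values rather than taking the single maximum reduces the influence of one spurious reading while still emphasizing persistent energetic conditions.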
Machine-Learning Model
Machine-learning models rely on large, diverse datasets spanning significant periods to produce the most accurate results; in this research, the dataset was small, comprising fewer than 1000 rows. Consequently, the options for machine-learning architectures were limited. One architecture that stands out in such cases is the multilayer perceptron (MLP), which generally performs well with limited data, making it a good option for this research problem. Additionally, when given time-windowed inputs such as those described above, MLPs can capture temporal dependencies within the data. This temporal aspect was relevant in this context, where the relationship between metocean variables and coastal inundation is influenced by dynamic, time-varying factors.
The model architecture in this study comprised one hidden layer with four neurons and one output layer. It used sigmoid activation functions and applied L2 regularization with a regularization value of 0.0001 to prevent overfitting. The model employed a batch size of 512 and the Adam optimizer with a learning rate of 0.0001. Early stopping was implemented with a patience of 35 epochs, restoring the best weights for optimal performance.
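To make the configuration concrete, the sketch below implements the described network in plain NumPy: one hidden layer of four sigmoid neurons, L2 regularization of 0.0001 on the weights, Adam with a learning rate of 0.0001, a batch size of 512, and early stopping with a patience of 35 epochs that restores the best weights. It is not the authors' code. The linear output activation, the weight initialization, applying L2 to both weight matrices, and monitoring validation MSE without the L2 term are assumptions, as are the class and function names.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyMLP:
    """n_in inputs -> 4 sigmoid hidden units -> 1 linear output (hypothetical)."""

    def __init__(self, n_in, n_hidden=4, l2=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.l2 = l2
        self.p = {
            "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1)),
            "b2": np.zeros(1),
        }

    def predict(self, X):
        h = _sigmoid(X @ self.p["W1"] + self.p["b1"])
        return (h @ self.p["W2"] + self.p["b2"]).ravel()

    def loss(self, X, y):
        """Mean squared error, excluding the L2 penalty."""
        return float(np.mean((self.predict(X) - y) ** 2))

    def grads(self, X, y):
        """Gradients of MSE + l2 * sum(W**2) with respect to every parameter."""
        h = _sigmoid(X @ self.p["W1"] + self.p["b1"])
        err = (h @ self.p["W2"] + self.p["b2"]).ravel() - y
        d_out = (2.0 / len(y)) * err[:, None]             # dLoss/dOutput
        d_hid = (d_out @ self.p["W2"].T) * h * (1.0 - h)  # back through the sigmoid
        return {
            "W2": h.T @ d_out + 2.0 * self.l2 * self.p["W2"],
            "b2": d_out.sum(axis=0),
            "W1": X.T @ d_hid + 2.0 * self.l2 * self.p["W1"],
            "b1": d_hid.sum(axis=0),
        }


def fit(model, X, y, X_val, y_val, lr=1e-4, batch_size=512, patience=35,
        max_epochs=10_000, beta1=0.9, beta2=0.999, eps=1e-7, seed=0):
    """Mini-batch Adam with early stopping that restores the best weights."""
    rng = np.random.default_rng(seed)
    m = {k: np.zeros_like(a) for k, a in model.p.items()}
    v = {k: np.zeros_like(a) for k, a in model.p.items()}
    best_loss, best_params, wait, step = np.inf, None, 0, 0
    for _ in range(max_epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            g = model.grads(X[idx], y[idx])
            step += 1
            for k in model.p:  # Adam update with bias correction
                m[k] = beta1 * m[k] + (1 - beta1) * g[k]
                v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
                m_hat = m[k] / (1 - beta1 ** step)
                v_hat = v[k] / (1 - beta2 ** step)
                model.p[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        val_loss = model.loss(X_val, y_val)
        if val_loss < best_loss:
            best_loss, wait = val_loss, 0
            best_params = {k: a.copy() for k, a in model.p.items()}
        else:
            wait += 1
            if wait >= patience:
                break
    model.p = best_params  # restore the best weights seen on validation data
    return best_loss
```

In the study's setting, `X` would hold the four windowed wave and water-level inputs and `y` the change in wet/dry shoreline elevation; a framework such as Keras offers the same ingredients through its `Dense` layers, `l2` regularizer, `Adam` optimizer, and `EarlyStopping` callback.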
To assess the predictive capabilities of the models across different months, variations in performance between months were examined, and any relationship between model performance and the underlying physical processes was analyzed. To achieve this, a training strategy was employed that used 1 month for testing, the 2 preceding months for validation, and the remaining 9 months for training. A k-fold cross-validation approach with k = 12 ensured that each month served once as an independent testing set. Furthermore, to evaluate model robustness, the model for each combination of independent testing set and lead time was calibrated five times, and the median value was retained.
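The month-based folds and the median over repeated calibrations can be sketched as follows. Wrapping around the calendar year for the validation months (e.g., November and December when January is the test month) is an assumption, and `train_and_score` stands in for a hypothetical routine that trains one model with a given seed and returns its test score.

```python
import statistics
from typing import Callable, Dict, List


def month_folds() -> List[Dict[str, List[int]]]:
    """One fold per calendar month: test on month m, validate on the two
    preceding months (wrapping around the year), train on the other nine."""
    folds = []
    for m in range(1, 13):
        val = [(m - 2) % 12 + 1, (m - 3) % 12 + 1]  # previous month, then the one before
        train = [k for k in range(1, 13) if k != m and k not in val]
        folds.append({"test": [m], "val": val, "train": train})
    return folds


def robust_score(train_and_score: Callable[[int], float], repeats: int = 5) -> float:
    """Median score over `repeats` calibrations that differ only in random seed."""
    return statistics.median([train_and_score(seed) for seed in range(repeats)])
```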
RESULTS
Two machine-learning models were developed to predict total water levels in real-time operational scenarios. The first model, a high-performance model, requires real-time measurements of wet/dry shoreline elevations that are not yet available, but it is realistic to expect their availability in the future. The second model, designed for present real-time operational use, relies on currently available inputs. Although it offers lower performance, it is already running in real time and publicly available.
Performance comparisons between the models are presented in Tables 2–5, with Figure 7a–d illustrating the predictions and observed measurements for all k folds. Table 2 and Figure 7a show results for the present operational model alone because the high-performance model cannot make 12-hour predictions, given its reliance on past wet/dry shoreline measurements. These tables also illustrate substantial performance differences between months. Based on their performance and differences in water-level dynamics, the months were classified as frontal months (January, February, March, April, May, October, November, December), which have lower performance, and nonfrontal months (June, July, August, September), which have higher performance.
Table 6 summarizes mean performance for each forecast time, distinguishing between frontal and nonfrontal months. During the frontal season, the high-performance model achieved CF (15 cm) values of 74%, 65%, and 58% for 24, 48, and 72 hours, respectively, whereas the present operational model performed lower, at 52%, 46%, 46%, and 41% for 12, 24, 48, and 72 hours, respectively. For the high-performance model, CF (15 cm) was 22, 28, and 29 percentage points higher during nonfrontal months than during frontal months for 24, 48, and 72 hours, respectively.
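Central frequency, CF (X), is the percentage of predictions whose absolute error is within ±X; the NOAA criterion referenced in this study requires CF (15 cm) of at least 90%. A minimal sketch follows, with hypothetical function names and errors expressed in meters.

```python
from typing import Sequence


def central_frequency(pred: Sequence[float], obs: Sequence[float], tol_m: float = 0.15) -> float:
    """Percentage of predictions whose absolute error is within tol_m meters."""
    within = sum(abs(p - o) <= tol_m for p, o in zip(pred, obs))
    return 100.0 * within / len(obs)


def meets_noaa_cf15(pred: Sequence[float], obs: Sequence[float]) -> bool:
    """True if CF (15 cm) is at least 90%."""
    return central_frequency(pred, obs, 0.15) >= 90.0
```

For example, errors of 0.10, −0.10, 0.20, and 0.05 m give CF (15 cm) = 75% and CF (30 cm) = 100%.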
DISCUSSION
Predicting total water levels is inherently challenging because of the need to account for the combined effects of various metocean variables such as waves, tides, and meteorological conditions. This complexity is further amplified when predictions are made in real time, necessitating real-time observations of waves, water levels, and wet/dry shoreline elevation. This research compares two models developed to address these challenges.
The first model is a high-performance model designed to maximize predictive accuracy by using real-time measurements of total water levels, which are not currently available but are anticipated soon. The second model is a real-time operational model that offers lower performance than the high-performance model but operates in real time using currently accessible inputs. The primary difference between these two models lies in the target variable used. The high-performance model predicts the change in wet/dry shoreline elevation rather than its absolute value, whereas the second model predicts the absolute total water level without requiring real-time total water-level measurements.
Results in Tables 2–6 illustrate that both the high-performance model and the operational model perform significantly better during the nonfrontal months than during the frontal months. This superior performance is attributed to the relative stability and predictability of environmental conditions during the nonfrontal months compared with the high variability of frontal passages during the frontal months.
During nonfrontal months, typically encompassing the summer season (i.e., June, July, August, September), weather patterns are more stable and predictable. Less variability in temperature, precipitation, and wind patterns occurs, leading to more consistent water levels and wave conditions. The reduced variability in environmental factors results in more stable wet/dry shoreline elevations. This stability allows the models to make more accurate predictions because the input data (e.g., water levels, wave conditions) exhibit less fluctuation. In contrast, during frontal months, the study area experiences frequent transitions between air masses of different temperatures and humidity, typically bringing abrupt changes in weather conditions. Frontal passages are characterized by significant variability in wind speed and direction, precipitation, and atmospheric pressure. These rapid changes cause fluctuations in water levels and wave heights. The high variability associated with frontal passages makes it more challenging for the models to predict total water levels accurately. The rapid changes in environmental conditions lead to larger errors in the models' predictions because they must account for sudden shifts in inputs. Thus, the increased complexity and variability during frontal months pose significant challenges to the models' performance.
Another factor contributing to the performance differences between the frontal and nonfrontal months is the limited size of the dataset. The dataset used for model training comprises slightly more than 1000 cases spanning 12 months. This is challenging because the model is tested on months that are absent from the training dataset, so the specific seasonal conditions of these months are not represented in the training data. A dataset covering 24 months or more would provide a more robust training foundation, ensuring that seasonal conditions are better represented in the training, validation, and testing datasets.
A constraint of this research is the reliance on wave measurements from offshore buoys, which can be out of service for extended periods because of the logistical challenges of conducting offshore site visits. In particular, the wave information used by this model comes from offshore buoys 168 km (NDBC 42019) and 104 km (NDBC 42020) away from the study area, with several hours of lag for the wave train between these locations. However, these offshore wave observations are the only wave data available for the study area because no nearshore wave measurements are available. Information from the nearby tide gauge is also subject to downtime, but land-based instruments typically experience fewer and shorter downtimes. To mitigate these challenges, measurements could be supplemented or replaced by numerical predictions, such as the high-resolution wave and water-level forecasts provided by the NWS Nearshore Wave Prediction System (NWPS). This system would then operate similarly to model output statistics (Annett, Glahn, and Lowry, 1972; Glahn, 1991). Additionally, the flexibility of AI would allow offshore wave measurements to be combined with nearshore wave predictions. Although this method has yet to be tested, it could lead to better predictions. Accurate nearshore wave characteristics would be more relevant to the adjacent beach, reducing uncertainties caused by wave directions and the impact of nearshore winds.
Despite these constraints, Table 6 shows that for 12-hour predictions, the present operational model meets NOAA's standards for all nonfrontal months. For 24-hour predictions, the high-performance model meets the standard for all nonfrontal months, whereas the operational model meets it for three of the four months. For 48-hour predictions, the present operational model still meets the standard for one-half of the months. For 72-hour predictions, the high-performance model meets this standard for one-half of the months. These results represent a significant success for this research because they indicate that the model can provide actionable information in real time. Performance during frontal months would likely improve significantly by incorporating predictors related to the nearshore wave climate, including measurements that are not presently available or operational numerical predictions. Expanding data collection through new or existing monitoring stations could further enhance the model's operational readiness.
CONCLUSION
This research introduces the first machine-learning model for total water-level predictions that includes wave run-up. This accomplishment was made possible by installing a set of cameras capturing imagery every 30 minutes, supplemented by beach surveys conducted approximately every 2 weeks. The combination of camera imagery and DEMs created from survey data enabled the extraction of wet/dry shoreline elevations and the creation of one of the first total water-level time series. Two models were calibrated and tested: an operation-ready model based on existing real-time predictors and a high-performance model that will be implemented once measurements of wet/dry shoreline elevations are available in real time. Both models perform considerably better during the summer months of June, July, August, and September than during the more dynamic winter frontal season. The differences are notable, with CF (30 cm) reaching 99% and CF (15 cm) 96% during nonfrontal months for the high-performance model, whereas the corresponding values for the operational model were 99% and 90.1% for 24-hour predictions. During the frontal season, performance decreases to 95% for CF (30 cm) and 74% for CF (15 cm) for the high-performance model and further drops to 88% for CF (30 cm) and 46% for CF (15 cm) for the operational model for 24-hour predictions. The differences in performance between the frontal and nonfrontal seasons are similar for 48- and 72-hour predictions. Both models (i.e., the high-performance and operational models) achieved a CF (15 cm) of at least 90% across all nonfrontal independent testing datasets for 24-hour forecasts, meeting NOAA standards for operational water-level predictions. The creation of this model is a significant advancement because, to our knowledge, it is the first machine-learning-based total water-level predictive model.
Although its performance is presently not up to operational standards during the frontal season, the operational model was shown to provide useful information as compared with average water-level predictions. In addition, the simplicity of its inputs being only past measurements makes it relatively easy to implement operationally compared with other models that require metocean variable predictions as inputs.
ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under Grant No. RISE-2019758 within the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We would like to acknowledge the efforts of the survey and data collection teams: Judy Millien, Yasmin Himsieh, Wyatt Miller, Spencer Berglund, and Beto Estrada. We also acknowledge the support of Scott Cross with the Nueces County Parks System for hosting the instrument ensemble on Horace Caldwell Pier, which made this research possible.
Code and Data Availability
All code and data discussed in the paper are available in the following GitHub repository: https://github.com/conrad-blucher-institute/ML-Total-Water-Levels-Predictions.