## 2017-335 ABSTRACT

Major oil spills in the United States often result in some natural resource damages (NRD), which arise from injuries to natural resources and losses of their services. Other things being equal, larger spills lead to larger NRD. Previous research on NRD settlements using multiple-regression analysis identified several factors that predicted variations in damages, other than the number of gallons spilled (Dunford and Lynes 2014). Those factors included the geographic location of the spill, whether threatened and endangered species were injured, whether recreation closures occurred, and whether unvalued compensatory restoration was part of the settlement. The current paper extends the previous research by adding NRD settlements in the past three years, and by exploring a new multivariate statistical method. Specifically, the new method employs a binary division of sample observations to create tree-like regression models, which are used to determine the correlations among the settlement characteristics. The new approach is non-parametric, which means that it imposes no assumptions about the underlying distribution of the predictor variables. The model identifies multiple combinations of key explanatory variables, providing more insights on observed differences in NRD settlement amounts.

## BACKGROUND

The Oil Pollution Act of 1990 allows federal and state government agencies, acting as Trustees for natural resources on behalf of the public, to collect monetary damages from parties responsible for oil spills in the United States for the concomitant natural resource injuries. The monetary damages are usually based on the cost of projects to restore the lost or impaired services resulting from injuries to natural resources.^{1} Oversight and monitoring costs for the restoration projects are often included in settlement amounts.

In this paper we develop a statistical model that explains a substantial portion of the variation in NRD settlement amounts for past oil spills. The next section of our paper describes the sources of the NRD settlement data, explains the challenges in determining the NRD amount for oil spill settlements, and presents the NRD settlement data. Then, the following section provides our statistical model. The penultimate section presents the results of our statistical model, while the final section describes future research plans.

## NRD SETTLEMENT DATA

We obtained information for our analysis of NRD settlements from a variety of sources. The consent decrees accompanying settlements were our preferred source for settlement amounts. When consent decrees were not available, we used NRD documents such as Damage Assessment and Restoration Plans for settlement amounts. We also used *Federal Register* notices, press releases, and newspaper articles for settlement amounts, when necessary.

We encountered several difficulties in developing our NRD settlements database. The main difficulty was in isolating the NRD amount in “global” settlements, which often included reimbursement for response costs, penalties and fines, and other non-NRD elements. In several settlements, assessment costs were combined with a portion of response costs or other non-NRD costs. Even when assessment costs were not combined with non-NRD costs, the settlement documents often excluded earlier payments of assessment costs. (For example, sometimes the responsible party had paid for some of the Trustees’ assessment costs in an earlier stage of the assessment, so those costs were not part of the settlement amount.) Furthermore, the assessment costs in NRD settlements only reflect the Trustees’ assessment costs. The assessment costs incurred by the responsible party are not included in the settlement amount. Therefore, we excluded assessment costs from our measure of NRD amounts, focusing exclusively on primary restoration costs, compensatory restoration costs, compensable values, and oversight/monitoring costs for restoration projects. Furthermore, we excluded payments for unspecified support of governmental and non-governmental programs (e.g., oil spill prevention programs), whenever possible, under the assumption that these payments were made in lieu of fines or penalties.

We also found that some of the NRD settlements included projects that the responsible party was going to implement for which there was no cost estimate in the settlement. For example, the responsible parties on the 1994 Tampa Bay oil spill purchased some coastal property and developed a mangrove marsh on that property as part of the NRD settlement, but the settlement documents do not include a dollar value for that compensatory restoration project. Therefore, the Tampa Bay settlement amount in our database understates the full cost of that NRD settlement.

Finally, different sources of information on NRD settlements sometimes did not agree on the amount of the settlement or other characteristics of the spill (e.g., the amount of oil spilled). In such instances we assumed that the consent decree or the NRD document from the trustees (if we did not have the consent decree) was the most reliable source of the information in dispute. When a source reported a range for the amount of oil spilled, we used the lower end of that range.

Our analysis includes every oil spill having an NRD settlement, except for five spills having no estimate of the spill amount and three “anomalous” NRD settlements. For example, we excluded the NRD settlements for both the *Exxon Valdez* ($0.9 billion paid over 10 years) and the *Deepwater Horizon* ($8.8 billion paid over 15 years) oil spills, because they are one or two orders of magnitude larger than the next largest NRD settlement. We also excluded the $0 NRD settlement for the *Mega Borg* oil spill, which involved almost five million gallons of oil but caused no measurable environmental damages. In addition, we excluded NRD settlements for small oil spills in Washington that were determined using that state’s NRD formula, as well as three spills outside Washington for which we had no information on the amount of oil spilled. Finally, we excluded NRD settlements for “chronic” oil spills occurring over decades (e.g., the diluent leaks in the Guadalupe, California, oil field). After these exclusions, our database includes NRD settlements for 91 oil spills ranging from 422 to 3,800,000 gallons.

Table 1 provides summary information on the oil spills and NRD settlements in our database. For the purposes of this paper we have listed the oil spills alphabetically by the name of the spill. The NRD settlements (excluding assessment costs) average $4.35 million in 2016 dollars after adjusting for inflation, and range from $0 to $60.0 million.^{2} The total for all of the NRD settlements in Table 1 is about $396 million in 2016 dollars.
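The inflation adjustment and the summary figures above can be illustrated with a short sketch. The paper does not state which price index was used, so the index values and the example settlement below are hypothetical placeholders rather than data from Table 1. Only the final consistency check uses figures reported in the text.

```python
# Hypothetical price-index values (placeholders, NOT real index figures).
PRICE_INDEX = {1995: 150.0, 2005: 195.0, 2016: 240.0}


def to_2016_dollars(amount, year, index=PRICE_INDEX, base_year=2016):
    """Scale a nominal amount by the ratio of the base-year index to the settlement-year index."""
    return amount * index[base_year] / index[year]


# A hypothetical $1.0 million settlement reached in 1995.
adjusted = to_2016_dollars(1.0, 1995)
print(round(adjusted, 2))  # 1.6 with the placeholder index values above

# Consistency check on the reported figures: 91 settlements averaging
# $4.35 million should total roughly $396 million.
print(round(91 * 4.35))  # 396
```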

## STATISTICAL MODEL

### Variables Included in Our Statistical Model

Our statistical model uses a multiple-regression approach to explain variations in inflation-adjusted NRD settlement amounts (less assessment costs) in millions of dollars for oil spills in the United States (the “dependent” variable) as a function of the magnitude of several “explanatory” variables. In general, the explanatory variables are the characteristics of an oil spill that might affect the magnitude of natural resource damages, including:

- amount of oil spilled,
- type of oil spilled,
- year of the settlement,
- number of years between the spill and the settlement,
- season in which the spill occurred,
- geographic region in which the spill occurred,
- whether the spill occurred in saltwater or freshwater,
- whether threatened or endangered species were injured by the spill,
- whether the spill closed a recreation area,
- whether unvalued compensatory restoration costs are part of the settlement,
- the number of trustees, and
- whether one or more of the trustees were tribes.

Other factors that might affect the magnitude of NRD settlements may include:

- the duration of the cleanup,
- the geographical extent of the spill,
- the number and type of injured threatened or endangered species,
- the duration of recreation area closures, and
- the availability of similar substitute recreation areas for the closed recreation areas.

Unfortunately, we could not find information on these factors for most of the oil spills in our database. Consequently, we could not include these factors in our statistical analysis. Table 2 describes the explanatory variables that were significant in our statistical model.

### Regression Tree Prediction Model to Determine Significant Variable Splits

As noted above, a wide variety of spill characteristics may affect NRD settlement amounts, but the parameters that are significant drivers of the final settlement amount may not be readily apparent. Our multivariate statistical analysis uses a binary division of the sample to create tree-like regression models (Kampichler et al. 2010) that identify the significant parameters driving settlement amounts. Binary regression trees are a non-parametric technique that is not limited by the normality requirements of linear regression, so the data can be used in their original distribution without transforming any non-normal variables (Lemon, 2003).
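One common way to formalize a single binary division is sketched below. It assumes the tree chooses each split to minimize the within-group sum of squared deviations, which is the usual criterion for regression trees. The notation is ours and does not appear in the paper.

$$
(j^{*}, s^{*}) = \arg\min_{j,\,s} \left[ \sum_{i:\, x_{ij} \le s} \left( y_i - \bar{y}_L \right)^2 + \sum_{i:\, x_{ij} > s} \left( y_i - \bar{y}_R \right)^2 \right]
$$

Here $y_i$ is the inflation-adjusted NRD settlement amount for spill $i$, $x_{ij}$ is the value of explanatory variable $j$ for that spill, $s$ is a candidate threshold, and $\bar{y}_L$ and $\bar{y}_R$ are the mean settlement amounts in the two resulting groups. For a categorical variable such as region, the threshold $s$ is replaced by a division of the categories into two sets.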

We used the binary regression-tree statistical method as an investigation tool to identify significant splits within the compiled dataset of NRD settlements. The binary regression-tree method revealed relationships between multiple variables within the data that might not be highlighted using other generalized multiple-regression models. We chose the binary regression-tree method because it highlights the underlying relationships of the prediction model and identifies splits within the compiled dataset. The continuing development and widespread use of this statistical method within the fields of health sciences, social sciences, statistics, computer science, environmental sciences and other fields speaks to the general acceptance of this method as an exploratory regression tool.

We used the statistical program R (R Core Team, 2016) with the package ‘tree’ (Ripley, 2016) for the regression-tree statistical analysis to create a prediction model of NRD settlement amounts. We used all of the variables listed in Table 2 as predictor variables, with the NRD settlement amounts adjusted to 2016 dollars as the response variable. To build a regression tree, we first split the sample into two groups using the predictor variable that produced the most significant division of the data. We then applied this process again, treating each new group as its own sample and finding the variable that best divided it into two new groups. We repeated this process recursively until a minimum group size was reached or a subgroup could no longer be subdivided (Therneau et al. 2011). Finally, we pruned the initial tree by recursively removing its weakest splits (Ripley, 2016) to prevent the tree from overfitting the data.
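The recursive splitting described above can be sketched in a few lines of Python. This is not the R code used for the analysis. It is a minimal illustration that assumes numeric predictors and a sum-of-squares split criterion, and it omits pruning. The data are synthetic, and the two hypothetical predictors are the base-10 log of gallons spilled and a 0/1 flag for injured threatened or endangered species.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, min_leaf):
    """Return the (feature index, threshold) pair with the lowest combined SSE, or None."""
    best, best_cost = None, sse(targets)
    for j in range(len(rows[0])):
        for s in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[j] <= s]
            right = [y for r, y in zip(rows, targets) if r[j] > s]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # enforce the minimum group size
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, s), cost
    return best


def grow(rows, targets, min_leaf=2):
    """Recursively split the sample until no admissible split reduces the SSE."""
    split = best_split(rows, targets, min_leaf)
    if split is None:
        return {"predict": mean(targets), "n": len(targets)}
    j, s = split
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= s]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > s]
    return {
        "feature": j,
        "threshold": s,
        "left": grow([r for r, _ in left], [y for _, y in left], min_leaf),
        "right": grow([r for r, _ in right], [y for _, y in right], min_leaf),
    }


def predict(node, row):
    """Follow the splits down to a terminal node and return its mean."""
    while "predict" not in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["predict"]


# Synthetic data: [log10(gallons), T&E species flag] -> settlement in $ millions.
rows = [[3.0, 0], [3.5, 0], [4.0, 0], [4.5, 0],
        [5.0, 1], [5.5, 1], [6.0, 1], [6.5, 1]]
targets = [0.1, 0.2, 0.3, 0.4, 5.0, 6.0, 7.0, 8.0]

tree = grow(rows, targets, min_leaf=2)
print(tree["feature"], tree["threshold"])
print(predict(tree, [6.2, 1]))
```

Each terminal node predicts the mean settlement of the observations that reach it. A production analysis would also handle categorical predictors such as region and would prune the grown tree.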

## RESULTS

### Regression Tree for Predicting NRD Settlement Amounts

The prediction regression tree created using the R package ‘tree’ contained four significant splits using the variables Gallons, Region, and T&E Species, as described in Table 3. Figure 1 depicts the regression tree, which created five unique groups using the significant variables. The bottom of Figure 1 shows the group size and a boxplot for each terminal-node group, with the first quartile, median, and third quartile shown in the boxplots.

### Trends Revealed by the Regression Tree

The significant splits within the dataset are easily seen in Figure 1. For example, when a threatened or endangered species is present within the area impacted by a spill, the NRD settlement amount is 5 times larger than when no threatened or endangered species is present. A similar pattern appears across regions of the United States: spills that occur on either the Atlantic or Pacific coast have NRD settlement amounts that are 8 times larger than those for comparable spills in the Gulf of Mexico. Table 4 identifies each of the terminal nodes and shows the variable combinations that create each of the unique splits, the predicted NRD settlement amounts, and the range of values within each group.

### Prediction Power of the Regression Tree

A previous multiple-regression analysis of NRD settlement amounts using an older version of the dataset had an R squared of 0.78 (Dunford and Lynes, 2014), so it explained 78% of the variation in the settlement amounts. The relatively simple nature of the regression-tree model results in a weaker fit to the NRD settlement amounts, with an adjusted R squared of 0.39, as shown in Figure 2. However, the slope and intercept of the regression line were both highly significant, and the regression-tree model clearly shows the thresholds at which differences in the circumstances of a spill event influence NRD settlement amounts. For example, the injury of a threatened or endangered species and the geographic location of the spill each significantly affect the NRD settlement amount.
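For readers who want to reproduce this kind of fit statistic, the sketch below computes R squared and adjusted R squared from observed and predicted values. The paper does not give its formulas, so the standard definitions are assumed here, and the four data points are invented for illustration.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R squared for using k predictors with n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Invented values in $ millions (not from the paper's dataset).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(observed, predicted)
print(round(r2, 2))                               # 0.8
print(round(adjusted_r_squared(r2, n=4, k=1), 2))  # 0.7
```

An R squared of 0.78 therefore corresponds to 78% of the variation being explained, as stated above.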

## FUTURE RESEARCH

The use of the binary regression tree for creating a prediction model is one of many different machine learning algorithms that could provide insights on how different variables influence NRD settlement amounts. In the next phase of our research we are planning to explore other potential algorithms. Additionally, we are going to investigate alternatives for combining the insights gained from the regression-tree model and the ordinary-least-squares regression model from the previous statistical analysis of NRD settlement amounts by Dunford and Lynes (2014). In particular, the previous analysis revealed a highly non-linear relationship between the NRD settlement amounts and the explanatory variables. In contrast, the regression-tree approach identifies differences in the relationship between the NRD settlement amounts and the explanatory variables below versus above threshold levels of key explanatory variables. The threshold levels for the splits may be useful in creating new interaction variables that can be included in the non-linear regression approach. The goal will be to increase the predictive power of the non-linear regression model by incorporating the interaction variables created from the results of the regression-tree model.
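As an illustration of how split thresholds might be turned into interaction variables, the sketch below adds two indicator terms to a spill record. The field names and the 100,000-gallon threshold are hypothetical placeholders. They are not the thresholds reported in Table 3, and this is only one possible construction.

```python
# Hypothetical threshold in gallons (placeholder, NOT a value from Table 3).
GALLONS_THRESHOLD = 100_000


def add_interactions(spill, gallons_threshold=GALLONS_THRESHOLD):
    """Return a copy of the spill record with threshold-based indicator terms."""
    large = 1 if spill["gallons"] > gallons_threshold else 0
    out = dict(spill)
    out["large_spill"] = large
    # The interaction term equals 1 only for large spills that injured T&E species.
    out["large_spill_x_te_species"] = large * spill["te_species"]
    return out


# A hypothetical spill record with illustrative field names.
example = add_interactions({"gallons": 250_000, "te_species": 1})
print(example["large_spill"], example["large_spill_x_te_species"])  # 1 1
```

The new columns could then be included alongside the original explanatory variables when refitting the earlier regression model.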

There are two primary purposes for predicting NRD settlements from a statistical model. First, Trustees can use such a model to determine the appropriate level of effort for assessing NRD. In particular, it is inefficient for assessment costs to exceed the NRD amount (e.g., it is inefficient to spend $2 million in assessment costs to determine that the NRD amount is $1 million). Trustees could use such a statistical model to estimate a range for NRD shortly after a spill, which would help them in determining the appropriate effort for assessing the NRD. Second, Potentially Responsible Parties (PRPs) for oil spills could use the statistical model shortly after a spill to determine an amount to reserve for an NRD settlement at a later time. As the assessment proceeds the PRPs can refine the initial reserve amount as warranted.

It is not possible for a statistical model to explain all the variation in NRD settlement amounts, which means that a final NRD settlement could fall outside a 90% confidence interval from a statistical model, for example. However, the likelihood of such an outcome is low (i.e., less than 10% in our example). Therefore, a statistical model should be a useful tool for both Trustees and PRPs in predicting NRD amounts shortly after an oil spill occurs.

## REFERENCES

^{1} In some instances the responsible party implements one or more of the restoration projects rather than give the cost of the project(s) to the Trustees for them to implement. In many of these instances, the cost of these restoration projects is not included in the NRD settlement, which means that the settlement amount underestimates the total cost of the settlement. As explained later, we take this into account in our statistical model.

^{2} Thirteen oil spills in our database (one-seventh) have $0 NRD settlements after excluding assessment costs. In all 13 of these oil spills the responsible party implemented the restoration projects required by the settlement, so the Trustees did not receive any payment for natural resource damages other than assessment costs. The impact of these settlements is quantified by the Unvalued Compensatory Restoration variable in our statistical model.