## ABSTRACT

This paper presents various empirical strategies used to analyze the effect of school facilities on student outcomes and discusses the strengths and weaknesses of the methods. A key challenge in studies of student outcomes is that outcomes are affected by many factors, many of which are correlated with each other. Moreover, some factors are difficult to measure and cannot be observed in data. Hence, it is difficult to avoid omitted variables bias, and the estimated correlations can thus often not be interpreted as causal effects. It is important to be aware of how difficult it is to move from a correlation to a causal effect. Researchers who wrongfully draw causal inferences risk misleading policy makers into allocating resources to the wrong factors.

## Introduction

After labor, public buildings are probably the second most important input factor in the production of public services. Hence, it is of both practical and academic interest to evaluate the correlation between building conditions and the quality of public services. Due to the availability of rich data on student outcomes, empirical researchers have given school building conditions particular attention.

This paper discusses some of the challenges related to empirical investigations of the effects of school building conditions on student outcomes. The main challenge in studies of student outcomes is that several factors affect outcomes and that these factors are correlated with each other. Moreover, some factors are difficult to measure and cannot be observed in data. Consequently, it is difficult to avoid omitted variables bias, and the estimated correlations can thus often not be interpreted as causal effects. It is vital that empirical researchers are aware of how difficult it is to move from a correlation to an actual causal effect. Researchers who wrongfully draw causal inferences risk misleading policy makers into allocating resources to the wrong factors.

The paper proceeds with a discussion of how different data structures can be used to study this question and of some of the alternative empirical strategies used in the literature. In particular, I will discuss papers that use survey data on building conditions, and papers that study policy interventions that lead to changes in school investment levels.

The inevitable conclusion is that studies using settings where policy interventions lead to natural experiments in school investments are clearly superior to studies based on survey data in terms of causal interpretation (internal validity). This is due to the advanced empirical techniques natural experiments allow researchers to use. However, since these studies must rely on policy interventions, they can only be conducted in particular settings. This might raise questions regarding the generalizability (external validity) of the results.

A key advantage of surveys is that they can provide researchers with direct measures of building quality. Even if one cannot causally quantify how much a student's test score will improve by improving the school's building condition, it is obviously interesting to know if students in poor school buildings systematically underperform relative to those in good buildings. If such a correlation is established as a “stubborn fact”, one can proceed to a discussion about whether the building conditions cause the achievement, or if building conditions are a symptom rather than a cause. Another advantage of survey data is that one can collect data for a wide sample of the population relatively easily, which increases the generalizability of the results.

## Background

### The school as a producer of student outcomes

The ultimate goal of a school is to “produce” learning, typically measured in terms of test scores or grades.^{1} To help structure the discussion, it is useful to formulate a school production function as

\[ y = f(k, j, p) \tag{1} \]

Equation (1) gives student outcomes (*y*) as a function of several factors that are known to be important for student outcomes. First, *k* includes the schools' resources, broadly and generally defined. This includes all sorts of resources, including school facilities. Second, *j* captures the students' individual characteristics and family background. Third, *p* denotes peer effects, i.e., how the characteristics and background of students affect other students in the same area.

Policy makers can often affect *k* directly, while the other factors can mostly only be addressed indirectly and often only in the longer run. For example, it takes time to affect peer effects by making poor and poorly educated neighborhoods affluent and well educated. In order to spend society's resources in the best way, policy makers thus need to be informed about which factors are most effective in stimulating student achievement. This has generated a vast empirical literature, which largely uses variations of ordinary least squares (OLS) to investigate these questions.^{2}

In order to keep the technical discussion simple, assume now that building conditions are the only resource input \((k_i)\) in school *i*. If we further assume that this resource is the only variable affecting student outcomes \((y_i)\), we can estimate a simple OLS regression (i.e., OLS with only one explanatory variable)^{3}

\[ y_i = \beta_0 + \beta_1 k_i + u_i \tag{2} \]

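where \(u_i\) is a stochastic error term and the \(\beta\)s are the parameters to be estimated. The estimated OLS coefficient for \(k_i\) can be written as

\[ \widehat{\beta_1} = \beta_1 + \frac{\widehat{\mathrm{Cov}}(k_i, u_i)}{\widehat{\mathrm{Var}}(k_i)} \tag{3} \]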

The estimated coefficient \(\widehat{\beta_1}\) is equal to the true parameter \(\beta_1\), plus an additional term. The fraction has the empirical covariance between the variable \(k_i\) and the error term \(u_i\) in the numerator, and the empirical variance of \(k_i\) in the denominator. Taking expectations in (3) yields

\[ E(\widehat{\beta_1}) = \beta_1 + \frac{\mathrm{Cov}(k_i, u_i)}{\mathrm{Var}(k_i)} \tag{4} \]

From (4) it is clear that the expected value of the OLS estimator is equal to the true parameter (i.e., OLS is unbiased) if and only if the covariance between the variable and the error term is equal to zero. More compactly, the requirement for unbiasedness is that the conditional expectation of the error term given the variable is equal to zero, i.e.,

\[ E(u_i \mid k_i) = 0 \tag{5} \]

A main problem with estimating equation (2) is that the assumption that one single resource input is the only variable affecting student outcomes is obviously not correct. Many variables that are not directly tied to resource inputs (e.g., student characteristics and peer effects) are well known to affect student achievement. Moreover, \(k_i\) is not a single variable in reality, but rather a large set of different resource inputs. In order to illustrate the problem in the simplest possible way, assume that the true model for student outcomes is

\[ y_i = \beta_0 + \beta_1 k_i + \beta_2 z_i + u_i \tag{6} \]

where \(k_i\) measures the school facilities while \(z_i\) is “the only other variable that matters”. The expected value of the estimator from the simple OLS equation can then be written as

\[ E(\widehat{\beta_1}) = \beta_1 + \beta_2 \delta \tag{7} \]

The \(\delta\) in (7) is the regression coefficient obtained by estimating the regression

\[ z_i = \delta_0 + \delta k_i + e_i \]

If a variable that is correlated with both the outcome (\(\beta_2 \ne 0\)) and building conditions (\(\delta \ne 0\)) is omitted from the regression, the estimate for the effect of buildings captures some of the effect of the omitted variable. This is the essence of omitted variables bias, and can be generalized to cases with many omitted variables, such as individual characteristics, family background and peer effects.
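
The mechanics of omitted variables bias can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the parameter values, the sample size and the function names are hypothetical and not taken from any of the studies discussed. It simulates outcomes from a model with two explanatory variables, regresses the outcome on \(k\) alone, and compares the estimated slope with \(\beta_1 + \beta_2 \delta\) for a positive and a negative \(\delta\).

```python
import random


def slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_short_regression(beta1, beta2, delta, n=20000, seed=1):
    """Simulate y = beta1*k + beta2*z + u with z = delta*k + e, then
    regress y on k alone, so that z is omitted.
    All parameter values are hypothetical."""
    rng = random.Random(seed)
    k = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [delta * ki + rng.gauss(0.0, 1.0) for ki in k]
    y = [beta1 * ki + beta2 * zi + rng.gauss(0.0, 1.0) for ki, zi in zip(k, z)]
    return slope(k, y)


# The true effect of buildings is 0.5 in both cases; only the sign of delta differs.
upward = simulate_short_regression(beta1=0.5, beta2=1.0, delta=0.6)
downward = simulate_short_regression(beta1=0.5, beta2=1.0, delta=-0.6)
print(f"delta > 0: estimated slope = {upward:.2f}, beta1 + beta2*delta = 1.10")
print(f"delta < 0: estimated slope = {downward:.2f}, beta1 + beta2*delta = -0.10")
```

With the same true effect in both runs, the sign of \(\beta_2 \delta\) alone decides whether the simple regression overstates or understates the effect of buildings.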

In order to be able to formulate a model that gives an unbiased estimator, one has to think about the underlying mechanisms in the school production function. That is, one must think about how resources, including school facilities, interact with the other factors that determine student outcomes.

As an example, how do parents respond when a school increases its resources? Different factors can be either substitutes or complements. The factors are substitutes if parents respond to more resources in school by assuming that it becomes less important that they help their children with homework. If parents respond this way, one risks underestimating the effect of school buildings, since the estimator could capture the negative effect that lower effort at home has on outcomes. The factors are complements if more resources in school have a stimulating effect on parents and actually make them help their children even more. This would mean that one risks overestimating the effect, as the estimator could capture the positive effect of more effort at home.

Another likely effect arises from the fact that resourceful parents are able to evaluate the quality of schools and the resource situation in them. Hence, one may also see a sorting effect where the most resourceful families flock to the most resource-rich schools. Similarly, the best teachers may be drawn to schools with more resources. If this kind of sorting goes on, the OLS estimator for school buildings could pick up some of the positive effects on outcomes of a good peer group and good teachers, pushing towards overestimating the effect.

One must also consider how policy makers might respond to poor school building conditions. If policy makers perceive school buildings as important, they may compensate schools with poor buildings by increasing their other resources. If they respond in this way, outcomes in schools with poor buildings are lifted by the additional spending on other resources, and one thus risks underestimating the effect of buildings.

The mechanisms discussed above reveal a problem in addition to the bias itself. Not only is there a very high risk that OLS will suffer from omitted variables bias, but since the mechanisms pull in opposite directions it is also very difficult to say in which direction the estimates are biased.

### Related literature

Earlier literature suggests several plausible mechanisms through which school building conditions may affect student outcomes. Some suggest that improving environmental conditions may bring substantial gains to student outcomes by reducing distractions and missed school days (Earthman, 2002; Mendell and Heath, 2005). Moreover, in a study of Finnish schoolchildren, Taskinen et al. (1997) find results that indicate a relationship between poor indoor climate and asthma and other respiratory problems. Hence, poor environmental conditions can even have direct consequences for students' health. In addition, Buckley et al. (2005) propose that better school building conditions can benefit teachers by improving their morale and reducing absenteeism and turnover, giving an indirect effect from building conditions on student outcomes.

Even though the literature suggests several plausible mechanisms, some empirical investigations find little or no effect of building conditions on student outcomes. Hopland (2012) studies the link between school building conditions and student outcomes in Norwegian schools using survey data on school building conditions and student outcomes on national tests.^{4} He finds a tendency for students in schools with poor building conditions to perform worse than students in schools with buildings in good condition. He concludes, however, that the link is weak and in most cases not statistically significant. Hopland claims that these findings probably reflect that Norwegian school buildings are ‘too good to matter’. In other words, the difference between schools reported to be in ‘good’ or ‘poor’ condition is not big enough to have a significant impact on outcomes. Small or zero effects are also found in other wealthy countries (Cellini et al., 2010; Hopland, 2013). However, in a study that focuses explicitly on a poor school district, Neilson and Zimmerman (2014) identify a significantly positive effect of investment in school facilities on student outcomes. This makes sense, since it is reasonable to assume that the effect of investment in school facilities is stronger in poor than in rich districts, because of the poorer initial condition of the facilities.

In a follow-up of the Hopland (2012) study, Hopland (2014) questions how useful technical measures of facility conditions are in studies of student outcomes. He thus performs an empirical investigation of the correlation between the technical measure used by Hopland (2012) and student satisfaction with school facilities. It turns out that the correlation is statistically significant, but quite low. Inspired by that result, Hopland and Nyhus (2015) study the relationship between student satisfaction with the buildings and exam results and find a modest, yet significant correlation.^{5} This indicates that purely technical measures do not capture some important aspects of a building from a user perspective.

## Data structures and examples of empirical strategies

### Cross-sectional and panel data

A cross-sectional dataset consists of one observation per cross-sectional unit (e.g., students, schools or local governments), while a panel dataset includes several observations of the same cross-sectional unit over time. Studies that use survey data with direct measures of building conditions often rely on cross-sectional data (e.g., Hopland 2012; 2013). The reason is that building conditions develop slowly, so even if surveys were repeated over several years, there would likely be limited within-school variation in building conditions.

An alternative is to use public accounts on investment in facilities (Cellini et al., 2010; Neilson and Zimmerman, 2014). Since investments vary substantially over time, one can obtain meaningful variation within schools in panel data. Although investment is only an indirect measure of building conditions, there are strong empirical arguments in favor of using panel data rather than cross-sectional data.

In order to understand this, assume that a true cross-sectional model is again given by (6), but that the variable \(z_i\) for some reason is unobservable. This kind of unobservable heterogeneity is a very real concern in empirical work, as it is in practice impossible to obtain proper quantitative measures of all the characteristics that make cross-sectional units different. In this case, the researcher has no other option than to estimate the mis-specified equation (2) and thus end up with the bias derived in (7).

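If a researcher collects a panel with student outcomes and investment data over several years, the model can be extended with investment \(I_{it}\) in school \(i\) in period \(t\) as follows

\[ y_{it} = \beta_0 + \beta_1 k_i + \beta_2 z_i + \beta_3 I_{it} + \alpha_i + \epsilon_{it} \tag{8} \]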

In equation (8) it is assumed that building conditions \((k_i)\) and the unobservable heterogeneity \((z_i)\) are constant over time. The unobservable heterogeneity can be thought of as inherent characteristics of a given school that change very slowly and are notoriously difficult to measure, such as teaching culture or peer effects.

The error term in (8) is decomposed into two components: the time-invariant \(\alpha_i\) and \(\epsilon_{it}\), which varies both across schools and over time. For simplicity, assume that \(E(\epsilon_{it} \mid k_i, I_{it}) = 0\), so that only the time-invariant component of the error term can bias the estimators. That is, the problem to be solved is that \(E(\alpha_i \mid k_i, I_{it}) \ne 0\). Importantly, this can be solved by using the data's panel structure.

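Start by defining school-specific averages over the \(n\) observed periods for all the variables and error term components as follows

\[ \bar{y}_i = \frac{1}{n}\sum_{t=1}^{n} y_{it}, \quad \bar{I}_i = \frac{1}{n}\sum_{t=1}^{n} I_{it}, \quad \bar{\epsilon}_i = \frac{1}{n}\sum_{t=1}^{n} \epsilon_{it}, \quad \bar{k}_i = k_i, \quad \bar{z}_i = z_i, \quad \bar{\alpha}_i = \alpha_i \tag{9} \]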

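The next step is to re-formulate the regression equation (8) in terms of the school-specific averages found in (9)

\[ \bar{y}_i = \beta_0 + \beta_1 k_i + \beta_2 z_i + \beta_3 \bar{I}_i + \alpha_i + \bar{\epsilon}_i \tag{10} \]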

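Subtracting (10) from (8) gives an equation where variables are measured as deviations from their school-specific averages

\[ y_{it} - \bar{y}_i = \beta_3 \left( I_{it} - \bar{I}_i \right) + \left( \epsilon_{it} - \bar{\epsilon}_i \right) \tag{11} \]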

This is a practical example of the widely used within-groups transformation, and OLS estimation of (11) gives the within-groups (also known as fixed effects) estimator. The essential point is that in equation (11) the unobservable heterogeneity and the problematic error term component disappear, as they are constant over time. The estimator of the coefficient for investments, \(\beta_3\), is thus unbiased. Importantly, this example also shows a main drawback of using survey data on building conditions. As building conditions are very stable over time, \(k_i\) is not included in (11), and this transformation cannot be used in analyses that rely on surveys that give building conditions at a single point in time.

It is important to note that this stylized example is a simplification and that one could have unobserved characteristics that vary over time and give rise to omitted variables bias even when using the within-groups transformation. However, the transformation does clearly reduce the problem.
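
The within-groups transformation can be sketched in a few lines of code. The example below is an illustration under assumptions, not a reproduction of any study: it simulates a hypothetical panel in which a time-invariant school effect raises both investment and outcomes, and compares pooled OLS with OLS on school-demeaned data. All names and parameter values are made up for the illustration.

```python
import random


def slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_panel(beta3=0.3, n_schools=500, n_years=6, seed=2):
    """Hypothetical panel of (school, investment, outcome) rows. The school
    effect alpha is unobserved and correlated with investment."""
    rng = random.Random(seed)
    rows = []
    for school in range(n_schools):
        alpha = rng.gauss(0.0, 1.0)  # time-invariant school effect
        for _ in range(n_years):
            invest = 0.8 * alpha + rng.gauss(0.0, 1.0)
            outcome = beta3 * invest + alpha + rng.gauss(0.0, 0.5)
            rows.append((school, invest, outcome))
    return rows


def demean_by_school(rows):
    """Within-groups transformation: subtract each school's own averages."""
    totals = {}
    for school, invest, outcome in rows:
        t = totals.setdefault(school, [0.0, 0.0, 0])
        t[0] += invest
        t[1] += outcome
        t[2] += 1
    x, y = [], []
    for school, invest, outcome in rows:
        sum_invest, sum_outcome, count = totals[school]
        x.append(invest - sum_invest / count)
        y.append(outcome - sum_outcome / count)
    return x, y


rows = simulate_panel()
pooled = slope([r[1] for r in rows], [r[2] for r in rows])
within = slope(*demean_by_school(rows))
print(f"true effect = 0.30, pooled OLS = {pooled:.2f}, within-groups = {within:.2f}")
```

In this setup the pooled estimate absorbs part of the school effect, while demeaning removes everything that is constant within a school. The sketch also makes the drawback visible: any variable that does not vary within a school is demeaned to zero and cannot be estimated.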

### Public accounting data and exogenous investment decisions

The studies by Cellini et al. (2010) and Neilson and Zimmerman (2014) use investment data, combined with exogenous political variation. Hence, they are able to use the panel data techniques discussed above. Moreover, in order to obtain proper identification of the causal effects, they use their particular data structures to further isolate exogenous variation in the data.

#### Regression discontinuity design

Cellini et al. (2010) study the effects from investment in school facilities in California. Californian school districts can issue general obligation bonds to finance the construction, improvement, and maintenance of school facilities. Proposed bond measures must be approved in local referenda. Importantly, districts that approve bond issues are likely to differ on both observable and unobservable dimensions from those that do not, and controlling properly for this is the main empirical challenge in their paper. In order to do so, they use a method that takes into account that districts in which bonds pass or fail by very narrow margins are likely to be quite similar on average.

This quasi-experiment allows them to use a regression discontinuity design (RDD). A detailed explanation of their methodology is beyond the scope of this discussion, but a short introduction to the methodology is sufficient to gain a working understanding of the principles behind the technique.

Let \(V\) and \(v_0\) denote the vote share and threshold value, respectively. Since Cellini et al. work with election data, the threshold value is where the vote is split evenly, and the idea is that random variation determines whether the vote ends with 50 percent plus one rather than 50 percent minus one in favor of investment. An obvious problem is that by comparing observations within such a narrow bandwidth, one will have few observations and likely obtain extremely imprecise estimates. By expanding the bandwidth, one obtains better precision, but risks introducing bias as the observations become more and more different. Where to set the cut-off is in the end a judgment call, and one should make sure to check that the conclusions are not sensitive to small changes in the bandwidth.
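
The bandwidth trade-off can be made concrete with a small simulation. The sketch below is hypothetical throughout: the data-generating process, the function names and the parameter values are assumptions chosen for illustration, and a plain difference in means is used as a deliberately simple estimator rather than the method of Cellini et al. (2010). Outcomes rise smoothly with the vote share and jump by 1.0 at the threshold.

```python
import random


def simulate_votes(n=20000, jump=1.0, trend=2.0, v0=0.5, seed=3):
    """Hypothetical districts: the outcome rises smoothly with vote share v
    and jumps by `jump` when the measure passes (v >= v0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        v = rng.random()
        passed = 1 if v >= v0 else 0
        y = trend * (v - v0) + jump * passed + rng.gauss(0.0, 0.5)
        data.append((v, y))
    return data


def jump_estimate(data, v0, bandwidth):
    """Difference in mean outcomes just above and just below the threshold,
    using only observations within `bandwidth` of v0."""
    above = [y for v, y in data if v0 <= v < v0 + bandwidth]
    below = [y for v, y in data if v0 - bandwidth <= v < v0]
    estimate = sum(above) / len(above) - sum(below) / len(below)
    return estimate, len(above) + len(below)


data = simulate_votes()
for h in (0.02, 0.10, 0.25):
    est, n_obs = jump_estimate(data, v0=0.5, bandwidth=h)
    print(f"bandwidth={h:.2f}  observations={n_obs:5d}  estimated jump={est:.2f}")
```

In this setup the narrow bandwidth uses few observations, while the wide bandwidth uses many but attributes part of the smooth trend to the jump. Controlling for the vote share in a regression is one way to reduce that bias.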

If outcomes would otherwise vary smoothly with the vote share, an effect of school investment \((k_i)\) on student outcomes \((y_i)\) should show up as a jump in outcomes at the threshold. The magnitude of this jump can then be used to measure the treatment effect. Figure 1 gives a graphical illustration of what this would look like in a hypothetical example where there is an effect of investment on outcomes.

There are two main categories of RDD, sharp and fuzzy design. In a sharp RDD, everyone above the threshold value is treated, while no one below the threshold receives the treatment. The treatment effect can then be estimated by the OLS equation

\[ y_i = \beta_0 + \beta_1 k_i + \beta_2 T_i + u_i \tag{12} \]

In (12), the variable \(T\) is a dummy variable that is equal to 1 for observations above the threshold, i.e., \(T = 1\) if \(V \ge v_0\) and \(T = 0\) if \(V < v_0\). Note that this formulation assumes that only the level of outcomes jumps at the vote threshold, while the slope of the effect of investments does not change. It is possible to allow for a change also in the slope by extending the regression with an interaction variable \(k \times T\).

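In a fuzzy design, the probability of treatment increases substantially when crossing the threshold, but there might be untreated units above the threshold and treated units below. In this case, one can estimate the effect using a two-step procedure (two-stage least squares), where one first regresses treatment status on an indicator for crossing the threshold

\[ T_i = \gamma_0 + \gamma_1 \mathbf{1}(V_i \ge v_0) + \gamma_2 k_i + e_i \tag{13} \]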

From (13) one obtains the estimated probability of treatment, \(\overline{T}_i\), and can proceed to estimating

\[ y_i = \beta_0 + \beta_1 k_i + \beta_2 \overline{T}_i + u_i \tag{14} \]

Equations (12) and (14) are identical, with the small but important difference that one has to use an estimated probability of treatment rather than a direct treatment variable in (14).
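
The two-step logic of the fuzzy design can be sketched with simulated data. Everything in the example is assumed for illustration: crossing the threshold raises the treatment probability from 0.2 to 0.8, the true treatment effect is 1.0, and, to keep the sketch short, outcomes do not otherwise depend on the vote share. The first step estimates the treatment probability on each side of the threshold, and the second step regresses the outcome on that estimated probability.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mean_x, mean_y = mean(x), mean(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_fuzzy(n=20000, effect=1.0, v0=0.5, seed=4):
    """Hypothetical rows of (above threshold, treated, outcome)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = rng.random()
        above = 1 if v >= v0 else 0
        # Crossing the threshold raises the treatment probability from 0.2 to 0.8.
        treated = 1 if rng.random() < (0.8 if above else 0.2) else 0
        y = effect * treated + rng.gauss(0.0, 0.5)
        rows.append((above, treated, y))
    return rows


rows = simulate_fuzzy()
above = [r[0] for r in rows]
treated = [r[1] for r in rows]
outcome = [r[2] for r in rows]

# Step 1: estimated treatment probability on each side of the threshold.
p_above = mean([t for a, t in zip(above, treated) if a == 1])
p_below = mean([t for a, t in zip(above, treated) if a == 0])
t_hat = [p_above if a == 1 else p_below for a in above]

# Step 2: regress the outcome on the estimated treatment probability.
two_step = slope(t_hat, outcome)

# For comparison: the raw jump in outcomes at the threshold.
raw_jump = slope(above, outcome)
print(f"raw jump = {raw_jump:.2f}, two-step estimate = {two_step:.2f}, true effect = 1.00")
```

Because only part of the sample changes treatment status at the threshold, the raw jump understates the effect, and the two-step estimate rescales it by the jump in treatment probability. Note that standard errors from a manual second step are not valid as they stand; dedicated two-stage least squares routines correct them.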

The main advantage of the regression discontinuity method is that it provides strong causal identification of the treatment effects and thus strong internal validity. The main downside is that it by construction relies on a limited sample that bunches around a chosen threshold. Hence, it is not obvious that the results can be generalized. Moreover, since the method only works in settings where natural experiments create discontinuities, it is often not available to empirical researchers.

#### Difference-in-differences

Neilson and Zimmerman (2014) study a school construction program in a poor urban district in Connecticut. Since the investments took place at different times across areas, and the timing was given exogenously, they can construct a difference-in-differences (DiD) design and obtain strong identification of the causal effects of investment on outcomes. Again, it is beyond the scope of this discussion to present their empirical strategy in full detail; my aim is only to give a basic conceptual understanding of the DiD framework.

The idea behind DiD is that two similar groups follow the same trend until one of the groups receives a treatment (treatment group) whereas the other does not (control group). In this setting \({T_{it}}\) is a dummy equal to one if the school district is in the treatment group, while \({A_t}\) is a dummy equal to one after the treatment has occurred. In Neilson and Zimmerman, the treatment group has received school investments, while the control-group schools have not. Figure 2 illustrates graphically what this could look like in a case where school investment does affect student outcomes, and it is assumed that the two groups did follow a common trend prior to the treatment.

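Technically, one can estimate the DiD effect using OLS from the equation

\[ y_{it} = \beta_0 + \beta_1 T_{it} + \beta_2 A_t + \beta_3 \left( T_{it} \times A_t \right) + u_{it} \tag{15} \]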

The treatment effect is given by the coefficient for the interaction variable defined as the product of the treatment-group and after-treatment dummies. Table 1 explains this in a simple and intuitive way.
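
The role of the interaction term can be illustrated by computing the four group means directly. The sketch below uses simulated data with assumed values only (a true treatment effect of 0.4, a level difference between groups and a common time trend); the names and numbers are hypothetical. In a regression containing both dummies and their interaction, the interaction coefficient equals the difference of differences computed here.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_did(effect=0.4, n_per_cell=2000, seed=5):
    """Hypothetical outcomes for the four cells (treatment group?, after?)."""
    rng = random.Random(seed)
    cells = {}
    for treated in (0, 1):
        for after in (0, 1):
            # Group level difference, common time trend, and treatment effect.
            level = 1.0 + 0.5 * treated + 0.3 * after + effect * treated * after
            cells[(treated, after)] = [
                level + rng.gauss(0.0, 0.5) for _ in range(n_per_cell)
            ]
    return cells


cells = simulate_did()
m = {key: mean(values) for key, values in cells.items()}

change_treated = m[(1, 1)] - m[(1, 0)]  # before/after change, treatment group
change_control = m[(0, 1)] - m[(0, 0)]  # before/after change, control group
did = change_treated - change_control

print("group      before  after  change")
print(f"treatment  {m[(1, 0)]:6.2f} {m[(1, 1)]:6.2f} {change_treated:7.2f}")
print(f"control    {m[(0, 0)]:6.2f} {m[(0, 1)]:6.2f} {change_control:7.2f}")
print(f"difference-in-differences = {did:.2f} (true effect 0.40)")
```

The change in the control group estimates the common trend, and subtracting it from the change in the treatment group leaves the treatment effect, provided the two groups would have followed the same trend without the treatment.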

As with RDD, a well-designed DiD secures high internal validity. However, generalizability can also be an issue for DiD, since it relies on a treatment group and a control group chosen to be as similar to the treatment group as possible, rather than on a broad population. Moreover, since DiD also requires policies that create natural experiments, it is often not available to researchers.

### Survey data on building conditions

The studies that most directly aim to measure effects of building conditions on student outcomes use survey data where either school leaders, students, or local governments provide a quantitative measure of the condition of their school buildings. Since building conditions in the absence of large construction programs develop slowly, such studies tend to observe building conditions at a single point in time. Studies such as Hopland (2012, 2013) are thus limited to cross-sectional regression techniques, with the shortcomings discussed in the section on cross-sectional and panel data. Hence, despite the intuitive appeal of using direct measures of building conditions in this type of study, studies relying on such data are clearly inferior to the studies using investment data in terms of causal interpretation.

The survey studied by Hopland and Nyhus (2015) was repeated annually, and the student responses varied enough over time to allow the authors to use school fixed effects in the estimation. Hence, some of the omitted variables problems that likely plague the estimates in the pure cross-sectional studies are less troublesome for this study.

However, one should still be cautious when interpreting the results of this study. It is well-known that the use of self-reported data is not unproblematic (see, e.g., Podsakoff and Organ, 1986). One problem is that successful students might simply be more satisfied in general, and thus be more prone to report that they are satisfied with the physical learning environment. This is potentially a very serious issue, as it turns the direction of the causality around: The level of outcomes might explain the reported satisfaction rather than the other way around. A second issue is that student perceptions of building conditions can be noisy since students may have low abilities to evaluate building conditions.

Hence, studies using surveys of student satisfaction are also clearly inferior to the studies using investment data in terms of causal interpretation, even if the surveys allow for the use of panel methods. It is less obvious how studies using student satisfaction compare with those using technical measures of building conditions, though, as both strategies have severe, albeit different, flaws that render their results less than satisfying for causal interpretation.

However, despite these caveats, studies using survey data can be useful and relevant. Even if one cannot quantify how much a student's test score will improve by improving the school's building condition, it is obviously interesting to know if students in poor school buildings systematically underperform relative to those in good buildings. If such a correlation is established as a “stubborn fact”, one can next move on to a discussion about whether the building conditions cause the outcomes, or if building conditions are a symptom rather than a cause. Another important advantage of survey data is that one can collect data for a wide sample of the population relatively easily, which increases the generalizability of the results.

## Concluding remarks

This paper has discussed some of the alternative empirical strategies in studies of the relationship between school facilities and student outcomes. While some use data that directly measure building conditions, typically survey data, others use public accounting data to study the level of investment in school facilities. Both data structures have their advantages and drawbacks. Studies using refined panel data techniques can circumvent problems with unobservable characteristics and thus identify unbiased estimates of the causal effect. However, as these studies must rely on very particular settings, one may question their external validity. Moreover, relevant settings are often not available.

The estimators produced by studies using survey data on building conditions are clearly less than satisfying in terms of causal interpretation. However, even though the coefficients cannot be given a causal interpretation, it is obviously interesting to know if students in poor school buildings systematically underperform relative to those in good buildings. Another important advantage of survey data is that one can collect data for a wide sample of the population relatively easily, which increases the generalizability of the results.

The insights from this paper should interest policy makers and facility managers, as they highlight the challenges related to assessing how and to what extent buildings and their conditions contribute to the production of services within them. Knowledge about these effects is of great importance when making decisions about investment and maintenance, in order to make sure that resources are allocated as efficiently as possible.

## References

^{1} The literature has used several measures of student outcomes, e.g., exam results, grade average, and international test results.

^{2} See Hanushek (1986) for an overview of the early literature, and Burgess (2016) for a more recent review of the literature.

^{3} For a more detailed discussion of OLS and its applications, see Wooldridge (2006) or similar textbooks.

^{4} Borge and Hopland (2017) and Hopland and Kvamsdal (2019) document that poor condition of public buildings, including schools, is an important concern in Norwegian local governments.

^{5} Hopland and Nyhus (2016) find a positive correlation between satisfaction with school facilities and self-reported effort level in school. This can potentially partly explain the positive effect of building conditions on student outcomes.