Have improvements in ozone air quality reduced ozone uptake into plants?

Peak levels of ozone (O3)—quantified by concentration metrics such as accumulated O3 exposure over a threshold of 40 ppb (AOT40) and the sigmoidal-weighted cumulative exposure (W126)—have decreased over large parts of the United States and Europe in the last several decades. Past studies have suggested that these improvements in AOT40 and W126 indicate reductions in plant injury, even though it is widely recognized that O3 flux into leaves, not ambient O3 concentration, is the cause of plant damage. Using a new dataset of O3 uptake into plants derived from eddy covariance flux towers, we test whether AOT40, W126, or summer mean O3 are useful indicators of trends in the cumulative uptake of O3 into leaves, which is the phytotoxic O3 dose (POD or PODy, where y is a detoxification threshold). At 32 sites in the United States and Europe, we find that the AOT40 and W126 concentration metrics decreased over 2005–2014 at most sites: 25 and 28 sites, respectively. POD0, however, increased at a majority (18) of the sites. Multiple statistical tests demonstrate that none of the concentration metrics—AOT40, W126, and mean O3—are good predictors of POD0 temporal trends or variability (R2 ≤ 0.15). These results are insensitive to using a detoxification threshold (POD3). The divergent trends for O3 concentration and plant uptake are due to stomatal control of flux, which is shaped by environmental variability and plant factors. As a result, there has been no widespread, clear improvement in POD over 2005–2014 at the sites we can assess. Decreases in concentration metrics, therefore, give an overly optimistic and incomplete picture of the direction and magnitude of O3 impacts on vegetation. Because of this lack of relation between O3 flux and concentration, flux metrics should be preferred over concentration metrics in assessments of plant injury from O3.

Several metrics are used to quantify surface O 3 and its impacts. For human health, the maximum daily average

RESEARCH ARTICLE
Have improvements in ozone air quality reduced ozone uptake into plants? over 8 hours (MDA8), a concentration metric, is widely used to predict respiratory injury (EPA, 2011;McDonnell et al., 2012;Turner et al., 2016;Fleming et al., 2018). For assessing plant impacts, some metrics quantify O 3 concentration in ambient air while others quantify the flux of O 3 into leaf tissue through the stomata. The most widely used concentration metrics are the accumulated O 3 exposure over a threshold of 40 ppb (AOT40) and sigmoidalweighted cumulative exposure (W126) indices, both of which give greater weight to high concentrations (Lefohn and Runeckles, 1987;Hůnová et al., 2003;Avnery et al., 2011;Lefohn et al., 2018;Mills et al., 2018a). Although correlations between these concentration metrics and plant injuries have been reported, the flux of O 3 through stomata is a better predictor of plant damage because it reflects the physiological dose to tissues within the leaves (Musselman et al., 2006;Mills et al., 2011a,b;Braun et al., 2014;Büker et al., 2015;CLRTAP, 2017). The phytotoxic O 3 dose (POD) metric integrates the stomatal flux over a growing season or other designated time period. The related POD y metric integrates flux that exceeds a threshold (y nmol O 3 m -2 s -1 ) that can be detoxified by the plant (POD y , Mills et al., 2011a,b;CLRTAP, 2017;Mills et al., 2018a). When stomata are closed, high ambient O 3 concentrations may not injure plants. Conversely, when stomata are open wide, large fluxes and resulting injuries can occur at low O 3 concentrations. For these reasons, flux-based metrics are generally preferred, where they are available, and critical POD y levels have been determined for many plant species (Mills et al., 2011b;CLRTAP, 2017). Indeed, several studies have found that impacts predicted from modeled stomatal uptake differ in size and pattern from impacts predicted from concentration metrics (Mills et al., 2011a;2018c;Tang et al., 2013). Despite the advantages of POD and related flux metrics over concentration metrics, however, many plant impact studies continue to use concentration metrics because O 3 concentration data are much more widely available than stomatal O 3 flux data (Fuhrer et al., 1997;Musselman et al., 2006;Van Dingenen et al., 2009;Avnery et al., 2011;Mills et al., 2011a;Braun et al., 2014;Holmes, 2014;Lefohn et al., 2018;Mills et al., 2018a).
Across large parts of the United States and Europe, surface O 3 air quality has improved in recent decades, according to many concentration metrics (Cooper et al., 2014;Chang et al., 2017;Lefohn et al., 2017;Fleming et al., 2018;Lefohn et al., 2018;Mills et al., 2018a), after deteriorating for much of the 20 th century (Vingarzan, 2004;Shindell et al., 2006;Parrish et al., 2012;Cooper et al., 2014). These recent O 3 improvements resulted from policies and technology that reduced emissions of O 3 precursors, particularly nitrogen oxides, carbon monoxide, and volatile organic compounds (EPA, 2003;Council of the European Union and European Parliament, 2008;EPA, 2011;EEA, 2016;EPA, 2016). The Tropospheric Ozone Assessment Report (TOAR) concluded that these O 3 declines reduced the potential risk of damage to crops and other vegetation in these regions, while recognizing that climate, soil, and plant controls on stomatal conductance also determine the risk of damage (Mills et al., 2018a, hereafter TOAR-Vegetation). While the TOAR-Vegetation report used concentration metricsprincipally AOT40 and W126-because long-term O 3 flux data are very sparse, the report recommended that stomatal uptake metrics be used in future risk assessments Mills et al., 2018a).
Although some studies use empirical stomatal models to calculate O 3 flux metrics and predict reductions in crop yield (Emberson et al., 2000;Mills et al., 2011a;Büker et al., 2012;Grünhage et al., 2012;Tang et al., 2013;Emberson et al., 2013;Mills et al., 2018bMills et al., , 2018c, there has been little analysis of decadal or longer trends in POD or whether those trends match the trends in concentration metrics . POD and concentration metrics can be well correlated, at least on short time scales, under conditions where stomatal variability is limited, such as containers with a single plant species or irrigated and fertilized fields (Cieslik, 2004;Karlsson et al., 2004;Gonzalez-Fernandez et al., 2010;Matyssek et al., 2010;González-Fernández et al., 2014). Under less controlled, natural conditions, however, weather, hydrology, and climate can drive substantial changes in conductance on time scales from minutes to years (Emberson et al., 2000;Büker et al., 2012;Keenan et al., 2013;Clifton et al., 2017). This environmental variability may disrupt the relationship between POD and O 3 concentration. The widespread and well-documented reductions in AOT40 and W126 in the United States and Europe may, therefore, misrepresent the benefits for plant health because of the influence of stomata on POD. As a result, there is a need to test whether O 3 flux into vegetation (POD) covaries with concentration metrics (AOT40 and W126) on multi-year time scales.
This paper quantifies temporal variability and trends of O 3 uptake into vegetation and compares them to variability and trends in O 3 concentration over a decade. While past work has documented that POD has low spatial correlation with AOT40 and W126 (Mills et al., 2011a;Ducker et al., 2018), we specifically test temporal trends and relationships. We use a new dataset of O 3 fluxes in the United States and Europe (Ducker et al., 2018), which covers the period 2005-2014 when previous studies documented declines in O 3 concentration metrics (Chang et al., 2017;Lefohn et al., 2017;Lefohn et al., 2018;Mills et al., 2018a). We will show that POD trends differ significantly from the concentration metrics-specifically AOT40, W126, and mean O 3 -and we examine the implications of those divergent trends for vegetation health.

Ozone data and methods
We analyze trends in stomatal O 3 uptake and O 3 concentration in the SynFlux dataset. As described by Ducker et al. (2018), SynFlux calculates stomatal conductance and other components of O 3 deposition velocity from measurements at eddy covariance flux towers (Pastorello et al., 2017), with some additional information from remote sensing. The method uses observed water vapor flux, heat flux, momentum flux, leaf area, and standard meteorology variables. Direct measurements of O 3 flux are not needed for SynFlux. The stomatal conductance and deposition velocity are then combined with a gridded dataset of O 3 mole fractions to estimate stomatal fluxes of O 3 into vegetation around the flux tower. The O 3 dataset, described by Schnell et al. (2014), is a weighted interpolation of about 4000 of air quality monitoring stations and has horizontal resolution of 1° and temporal resolution of 1 hour. At sites with O 3 flux measurements, the gridded O 3 dataset reproduces 60-90% of observed daily O 3 variability (R 2 = 0.6 -0.9) with mean bias of 5-10 ppb (Ducker et al., 2018). At a broader range of sites, Schnell et al. (2014) estimated gridded O 3 errors to be 6-9 ppb (rms). Since O 3 errors at a particular site and time affect all concentration and flux metrics simultaneously, the metric vs. metric comparisons shown here are insulated from inaccuracies in the O 3 dataset. SynFlux reproduces approximately 90% of the day-to-day variability (R 2 = 0.9) in stomatal O 3 uptake at flux measurement sites with a mean bias of 20% or less that can mostly be explained by the O 3 concentration bias (Ducker et al., 2018).
We examine trends in O 3 concentration and POD in the summer growing season over the ten-year period 2005-2014. This decade has the greatest number of sites in the SynFlux dataset and longer periods would significantly reduce the number of sites in the analysis. All SynFlux sites with at least eight years of observations in the period are used, which results in 32 qualifying sites: 10 in the United States and 22 in Europe. These sites are listed in Table S1. The sites sample ecosystem types that are widely distributed in the United States and Europe. Of the 32 sites, 21 are forests (10 needleleaf, 7 broadleaf, and 4 mixed), 5 are grassland, 2 are crops, 3 are savanna or shrubland, and 1 is wetland.
At each SynFlux site, we calculate POD 0 , POD 3 , AOT40, W126, and daytime mean O 3 for each summer growing season, defined as June-September. POD y is the cumulative daytime stomatal flux above a threshold y nmol m -2 s -1 during these months, which is contained in the SynFlux dataset described by Ducker et al. (2018). Recommended thresholds vary by species, so we use POD 3 , a predictor for damage in several vegetation types (Mills et al., 2011a;Büker et al., 2015;CLRTAP, 2017), to test whether the threshold affects our results. We integrate the stomatal flux over times when the sun is at least four degrees above the horizon. When some stomatal flux data are missing, the integral of available fluxes in each month is scaled up by the fraction of missing data; the scaled monthly integrals are summed to obtain POD y for the summer growing season. Months with fewer than 100 observations are discarded from the analysis because of the greater uncertainty in the monthly integral. When one of the four growing season months is missing, we scale up the remaining months in the same way to get total POD y for the growing season; if two or more months are missing, the POD y is treated as missing for that year. The AOT40 and W126 metrics are calculated from the gridded O 3 dataset following previously documented methods (Ducker et al., 2018;Lefohn et al., 2018). Unlike some studies, we do not apply a three-year running mean, so our W126 and AOT40 values describe O 3 concentrations in the summer growing season of a single year. Gaps in the O 3 concentration data, although rare, are treated similarly to POD y , by scaling up the AOT40 or W126 by the fraction of missing data. All metrics are calculated over the same growing period, June-September, at all 32 sites. Accumulating the O 3 metrics over site-specific growing months could alter the metric values or their trends, but is less likely to affect whether POD y and concentration metrics have consistent temporal variability, which is the main focus of this work. In cases where POD y is missing for a particular site and year, the AOT40, W126, and mean O 3 are also discarded for consistent analysis of trends and variability. For some analyses, we remove mean spatial differences between sites by computing anomalies, x x  , where x represents a metric value at a particular site and x is its 10-year mean at that site. To compare fractional changes among metrics with different units, we also normalize values using / 1 x x x ¢   . Figure S1 shows time series of all normalized metrics.
We estimate the linear trends in O 3 metrics at each site using ordinary least squares regression: x = a + bt, where x represents a metric value, t is time, and a and b are fitted parameters. To test if two metrics have the same trend, we normalize the metrics, as described above, and add interaction effects to the regression model: x' = a + bt + αC + βCt, where C is a categorical variable for metric type (C = 0 for metric one, C = 1 for metric two) and α and β are fitted parameters expressing the differences in the intercept and slope, respectively, for the two metrics. If β is significantly different from zero, then the metrics have different trends. We will use this approach to compare pairwise the trends in POD 0 , POD 3 , AOT40, and W126, thus highlighting the relationship between the flux and concentration metrics. In addition to standard p-values of regression coefficients, we use Fisher's combined probability test, a meta-analysis method, to assess whether an ensemble of p-values collectively provide evidence of an effect (Fisher, 1934). We also assess the temporal co-variability of metrics in three additional ways. First, we pool the anomaly data from all sites and years (n = 299) and compute the coefficient of determination (R 2 ) for each pair of metrics. Any correlation among the anomalies is strictly from temporal co-variability since mean spatial differences have been removed. Second, we calculate the coefficients of determination site-by-site for each pair of metrics (m = 8-10 years) and then average the resulting R 2 values across sites (n = 32). Finally, we correlate the temporal trends, described above, for each pair of metrics (n = 32 sites). Analyses are performed in Python using the statsmodels module for statistical tests (Seabold and Perktold, 2010). We use the graphical format of Cooper et al. (2014) and Mills et al. (2018a) to visualize trend results in Figures 1 and 2.  Our analysis methods differ from some past trend studies in that we use gridded O 3 data rather than original  station measurements. Nevertheless, our concentration trend patterns are very similar to those reported by others (Cooper et al., 2014;Lefohn et al., 2017;Mills et al., 2018a), despite each of those studies using different averaging methods (daytime or 24 hour) and examining different ranges of years. They generally show that summer mean O 3 in Europe has decreased or held steady since the 1990s, while concentrations in the United States decreased in the eastern United States and were steady or rising across most of the western and central United States. Like us, those earlier studies also found more consistent downward trends in AOT40 and W126 than mean O 3 in all of these regions except the western and central United States (Fleming et al., 2018;Mills et al., 2018a). The TOAR-Vegetation analysis suggests that statistically significant declines of W126 and AOT40 are more widespread than we report (45-50% of TOAR-Vegetation sites in the United States and Europe had declines; Mills et al., 2018a), but this apparent difference is explained by the larger fraction of TOAR-Vegetation sites in areas recovering from severe historical O 3 pollution, like the eastern United States and California, and the longer period of the TOAR-Vegetation analysis (1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014). Overall, the comparison shows that our trend results for concentration metrics using SynFlux and gridded O 3 fields are consistent with TOAR-Vegetation results using station O 3 observations. Trends in POD 0 are distinctly different from trends in mean O 3 , AOT40, and W126, as seen in Figures 1, 2, and S1. Unlike all of the concentration metrics, POD 0 increased at more than half (18 of 32) of the sites, although 4 had POD 0 declines with p < 0.05. The sign or direction of the POD 0 (and POD 3 ) trends also disagree with the concentration trends about as often as they agree. Of the 28 sites with decreasing W126, 16 have increasing POD 0 . Of the 25 sites with decreasing AOT40, 14 have increasing POD 0 . Similar patterns appear in the multi-site mean trends. For AOT40, W126, and mean O 3 the multi-site mean trends are downward (p = 0.001, p ≤ 0.0001, and p = 0.06, respectively) while the mean POD 0 trend is upward (p = 0.9).

Trends in O 3 uptake and concentration metrics
The discrepancies between POD 0 and concentration trends occur in nearly all vegetation types examined. The POD 0 trends have opposite sign to W126 and AOT trends at roughly half of the forest sites (5 of 10 needleleaf, 4 of 7 broadleaf, 3 of 4 mixed forest), grassland sites (2 of 5 sites), and shrubland sites (1 of 2). At the one wetland site all metrics have the same trend sign, which is consistent with stomatal flux correlating with O 3 concentration in a moisture-rich environment that promotes stomatal opening (CLRTAP, 2017). Conversely, at both crop sites, POD 0 and concentration trends have opposite sign. While the crop sites used irrigation, which relieves water stress, they also rotated crops in some years, which would increase the interannual variability in stomatal conductance and POD 0 while having less effect on O 3 concentrations, thereby partly decoupling O 3 concentration and uptake. Trend disagreements also occur across most of the examined geographical region and climate types, with no clear pattern. While the small number of sites for some vegetation types (particularly non-forests), regions, and climates make it difficult to determine if O 3 concentration and flux preferentially decouple in certain environments, it is clear that discrepancies between POD y and concentration trends are widespread.
Some differences in trends should be expected because of statistical fitting errors in each slope estimate stemming from errors and uncertainty in the measurements that underlie each metric. Nevertheless, if stomatal conductance and deposition velocity were steady, the normalized O 3 concentration and normalized POD should have similar trends. Using a regression model of normalized data with interaction effects (Section 2), we find that the fractional trends in POD are indeed different (p < 0.05 level) from the fractional trends in other metrics at 7 sites, regardless of which concentration metric is chosen. Many more sites have trend differences with marginal or low significance (0.05 < p < 0.2) (9 for AOT40 and mean O 3 , 11 for W126). The statistical strength of these results may be underestimated due to the random errors in SynFlux (section 2), which inflate the POD 0 variance and diminish the significance (p value) of differences from concentration metrics. Mean biases do not affect the POD 0 trends, so they should not affect the relationship to concentration metrics. In an aggregate meta-analysis (Fisher's combined probability test), the skew towards zero of all 32 p-values gives strong evidence that the normalized POD 0 trends systematically differ from the concentration trends (p ≪ 0.001), for all normalized concentration metrics. The divergent trends for POD 0 and concentration metrics are likely explained through stomatal conductance, which is driven by many factors such as weather, hydrology, and climate. Rising stomatal conductance increases O 3 uptake into plants, while decreasing ambient O 3 concentration through dry deposition (Solberg et al., 2008;Emberson et al., 2013;Kavassalis and Murphy, 2017). Regardless of the causes, however, the divergent trends indicate that the common AOT40 and W126 metrics have limited utility for tracking changes in O 3 impacts on vegetation.
While the O 3 trends in our 10-year study period are consistent with past studies that investigated other periods, as shown above, the trends for any given site and metric can vary depending on which years are analyzed. Individual years with extreme or missing data can sometimes discernably affect trend estimates, which may contribute to a few apparently large differences between some neighboring sites (e.g. POD 0 trends in Italy). Calculating trends over a longer time period could reduce the influence of individual years, but at the expense of having fewer sites in this analysis. However, the occurrence of an extreme value in one metric but not another (e.g. W126 vs. POD y ) at a single site is still a meaningful indicator that those metrics have different temporal variability. In addition, the regression model that tests differences in trends between metrics accounts for the greater uncertainty in the individual trends that result from extreme values, yet we still find statistically significant different trends across metrics.
Another measure of the usefulness of concentration metrics is their temporal correlation with POD, shown Figures  3 and 4 and Table S2. Mills et al. (2011a) recognized that the spatial pattern of POD can be quite different from the O 3 concentration pattern, particularly when considering sites with contrasting climate and vegetation. Ducker et al. (2018) further showed that their spatial correlations are very low (R 2 ≤ 0.05 for POD 0 vs. AOT40, W126, or mean O 3 ). We quantify the temporal correlation between POD y and other metrics in three ways (Table S2). The first approach, seen in Figure 3, which pools data from all sites, reveals no meaningful correlation between POD 0 and any of the concentration metrics (R 2 < 0.01) while all pairs of concentration metrics are strongly correlated (R 2 = 0.7 -0.9). Results are unchanged when using POD 3 in place of POD 0 (R 2 < 0.01, Figure S2). The second, site-by-site approach also shows that the concentration metrics are closely correlated with each other (R 2 = 0.72 -0.84 averaged across sites; Table S2) but not with POD 0 (R 2 = 0.13 -0.14). Both of these approaches mean that none of the concentration metrics can predict well the interannual variability of POD 0 . Finally, the linear trends shown in Figure 4, which are fitted to data at each individual site, indicate that POD 0 trends are essentially unrelated to trends in any concentration metric (R 2 ≤ 0.05; Figure 4; Table S2), while all of the concentration metric trends have considerable common variability (R 2 = 0.5 -0.9). The trend correlations are again similarly weak when using POD 3 (R 2 ≤ 0.1; Figure S3). This means that neither AOT40 trend nor W126 trend has any skill in predicting the POD y trend.
While it is already widely recognized that variations in stomatal conductance complicate the relationship between O 3 concentrations and stomatal uptake (Musselman et al., 2006), these results go a step further. Our results show that the conductance changes under common environmental conditions are sufficiently large and important that W126 and AOT40 trends are poor predictors of POD y trends. AOT40 and W126 might still be useful for assessing ozone extremes for other applications, however. Thus, the widespread decreases of W126 and AOT40 in large parts of the United States and Europe, while favorable, are not robust indicators for improved plant uptake or health. In fact, we have shown that there has been no widespread improvement in POD y at sites in these regions.

Conclusions
By many metrics, O 3 air quality has improved in large parts of the United States and Europe over the last two decades in response to policies and technological improvements that reduced emissions of O 3 precursors. Past work and our results show that there are downward trends in mean O 3 , AOT40, and W126 metrics at a majority of sites that we studied in the eastern United States and Europe. These metrics are widely used to assess the impacts of O 3 and their declines have been interpreted as indicating reduced O 3 damage to vegetation. While POD is known to be a better predictor of the physiological O 3 dose than ambient O 3 concentration, its use has been limited by data availability.
We use the SynFlux dataset to report decadal trends in POD for the first time and find that POD does not follow the same trends as the O 3 concentration metrics commonly used to assess vegetation injury. POD trends have mixed increases and decreases across the United States and Europe, in contrast to the predominant decrease in concentration metrics at the sites we examined. Many sites have simultaneous decreasing AOT40 and W126 while POD is increasing. Using multiple statistical approaches, we show that the multi-year trends and temporal variability in POD differ significantly and systematically from the concentration metrics. The results are not affected by POD y threshold choices (y = 0 or 3 nmol m -2 s -1 ). Past work showed that concentration metrics have low spatial correlation with POD and, here, we add that there is also  little temporal correspondence. Thus, AOT40 and W126 are not robust predictors of trends in plant injuries from O 3 . Rather, the widespread decreases of AOT40 and W126 in the United States and Europe in recent decades give an overly optimistic view of changing plant injury risk in recent years. If all else were equal, reduced concentrations would lead to less plant injury, but, in reality, stomatal conductance and its variability-driven by meteorology, hydrology, and climate-is an equally important control on POD. The analysis here further supports the recommendations of TOAR-Vegetation and others that future studies of plant damage and economic losses should avoid relying primarily on AOT40 or W126 and make greater effort to account for stomatal activity and stomatal flux. This is particularly important when considering the combined effects of climate variability and change in combination with evolving surface O 3 concentrations.