Estimates of ground-level ozone concentrations have been improved through data fusion of observations and atmospheric chemistry models. Our previous global ozone estimates for the Global Burden of Disease study corrected for bias uniformly across continents and then corrected near monitoring stations using the Bayesian Maximum Entropy (BME) framework for data fusion. Here, we use the Regionalized Air Quality Model Performance (RAMP) framework to correct model bias over a much larger spatial range than BME can, accounting for the spatial inhomogeneity of bias and nonlinearity as a function of modeled ozone. RAMP bias correction is applied to a composite of 9 global chemistry-climate models, based on the nearest set of monitors. These estimates are then fused with observations using BME, which matches observations at measurement stations, with the influence of observations declining with distance in space and time. We create global ozone maps for each year from 1990 to 2017 at fine spatial resolution. RAMP is shown to create unrealistic discontinuities due to the spatial clustering of ozone monitors, which we overcome by applying a weighting for RAMP based on the number of monitors nearby. Incorporating RAMP before BME has little effect on model performance near stations, but strongly increases R2 by 0.15 at locations farther from stations, shown through a checkerboard cross-validation. Corrections to estimates differ based on location in space and time, confirming heterogeneity. We quantify the likelihood of exceeding selected ozone levels, finding that parts of the Middle East, India, and China are most likely to exceed 55 parts per billion (ppb) in 2017. About 96% of the global population was exposed to ozone levels above the World Health Organization guideline of 60 µg m−3 (30 ppb) in 2017. Our annual fine-resolution ozone estimates may be useful for several applications including epidemiology and assessments of impacts on health, agriculture, and ecosystems.
1. Introduction
Ground-level ozone is a pervasive air pollutant that detrimentally affects human health and vegetation. Ozone can cause a wide range of health problems in humans and has been associated with premature mortality from daily exposures (Bell et al., 2004; Di et al., 2017; U.S. Environmental Protection Agency [U.S. EPA], 2020) and from long-term chronic exposure (Jerrett et al., 2009; Turner et al., 2016; U.S. EPA, 2020). Ozone concentrations above roughly 35 parts per billion (ppb) are associated with higher respiratory and cardiovascular mortality, with every 10 ppb increase in ozone concentration increasing all-cause mortality by 2% (Turner et al., 2016). Ambient ozone is estimated to have caused about 365,000 deaths globally in 2019 or 0.65% of all global deaths (GBD 2019 Risk Factors Collaborators, 2020). Recently, the World Health Organization (WHO) established new guidelines for ambient ozone of 100 µg m−3 (50 ppb) for 8-h average ozone and of 60 µg m−3 (30 ppb) for the annual metric used in this study (WHO, 2021). These guidelines suggest that ozone affects health even at concentrations frequently observed in relatively unpolluted regions. Unlike other air pollutants, ozone is purely secondary, which is created through photochemical reactions involving nitrogen oxides (NOx), volatile organic compounds (VOCs), carbon monoxide, and methane in the presence of sunlight. Ozone concentrations are typically higher in the daytime and during summer months.
Understanding of ozone impacts on human health and plants has been limited in part by our understanding of ground-level ozone distributions in space and time. Estimates of surface ozone distributions rely on monitoring station observations and chemical transport models, but both have limitations. While the United States, Europe, and Japan have dense monitoring networks that began prior to 1990 and China recently created a large network, station observations of ozone elsewhere are extremely limited (Schultz et al., 2017; Fleming et al., 2018; Lu et al., 2020). Models can help fill in these gaps in space and time but have biases (Cooper et al., 2014; Young et al., 2018; Archibald et al., 2020).
In our previous work, we conducted a data fusion of ozone observations and multiple global atmospheric models, in 2 phases, to estimate global ground-level ozone concentrations at fine spatial resolution (Chang et al., 2019; DeLang et al., 2021). These ozone maps were used to estimate global premature deaths from exposure to ambient ozone in the Global Burden of Diseases, Injuries, and Risk Factors (GBD) 2017 and 2019 Studies (GBD 2017 Risk Factors Collaborators, 2018; GBD 2019 Risk Factors Collaborators, 2020). GBD conducts a comparative risk assessment to estimate the global health burden caused by various risk factors from 1990 to the present.
Prior to GBD 2017, ozone in previous GBD studies was estimated solely by a single model with no observational bias correction (Brauer et al., 2016). The first study to combine ozone observations and output from global atmospheric chemistry models applied the new M3Fusion method to correct model bias, improving global ozone estimates from purely observation- or model-based approaches (Chang et al., 2019), in support of GBD 2017. M3Fusion bias corrects and combines multiple chemical transport models by finding an optimal linear combination of models for each world region, using weighting based on performance when compared to available observations. This multimodel composite was then corrected within 2° of a monitoring station using a spatial interpolation of observations, creating fine resolution output.
To support GBD 2019, we then improved this method using a novel combination of Bayesian Maximum Entropy (BME) along with M3Fusion (DeLang et al., 2021). BME is a framework for nonlinear geostatistics that performs the fusion of data from multiple sources (Serre and Christakos, 1999; Christakos et al., 2001; Christakos et al., 2004). BME used observations to correct the M3Fusion multimodel composite smoothly in both space and time, so that ozone estimates match the observations at station locations. The influence each station exerts diminishes over time and space based on a calculated spatiotemporal covariance function. BME had been used before on smaller scales to fuse ozone observations and models (Christakos et al., 2004; de Nazelle et al., 2010; Xu et al., 2016), but DeLang et al. (2021) were the first to apply it globally. The ability of observations in BME to influence estimates across time was also shown to be useful in informing earlier years before new stations were added (DeLang et al., 2021).
DeLang et al. (2021) showed that ozone estimates improved markedly over purely model- or observation-based estimates and produced a global fine resolution (0.1°) dataset for each year from 1990 to 2017. However, like Chang et al. (2019) and DeLang et al. (2021) used the M3Fusion method, which uses linear model bias corrections that are homogeneous across continents. BME then corrected estimates based on nearby observations, allowing for spatial nonhomogeneous bias corrections. The spatial scale over which BME has an influence is not bounded, but it is determined by the spatial covariance function based on observations from which we find that BME ozone bias corrections are small beyond about 0.5° from a monitoring location (DeLang et al., 2021). Here, we implement a second method that allows spatial nonhomogeneous and nonlinear bias corrections that have influence beyond 0.5° from a monitoring location.
Previous research has shown that air pollution model performance and biases are nonhomogenous (vary by location) and nonlinear (vary as a nonlinear function of the model estimate) (Reyes et al., 2017). For example, chemical transport models generally overpredict fine particulate matter (PM2.5) when predicting high (>25 µg m−3) values and underpredict elsewhere (Reyes et al., 2017). Model errors for ozone stem from uncertainties in inputs, especially emissions of ozone precursors (NOx and VOCs) from anthropogenic and natural sources, and in model processes including chemistry, model resolution, transport, and deposition (Young et al., 2018). Previous model evaluations have found that models have errors that vary by season and latitude (von Kuhlman et al., 2003), reflecting uncertainties in emissions inputs and in physical and chemical processes within the models. These errors could lead to overestimates in some locations and underestimates in others, so that model performance is heterogeneous (Liang and Jacobson, 2003). For example, emission estimates used in an atmospheric model may be too high or too low for a nation, and if monitors are present, it may be possible to identify and correct biases over that nation. Ozone production is also known to change nonlinearly with emissions (Cohan et al., 2005). Consequently, model-simulated ozone concentrations exhibit a nonlinear bias with respect to observations. The Community Multiscale Air Quality model, for example, has been shown to overestimate maximum 8-h average ozone levels where observations are below 35 ppb and underestimate where observations are above 85 ppb (Appel et al., 2007).
The goal of this study is to improve the work of DeLang et al. (2021) to map global surface ozone concentrations each year from 1990 to 2017 at fine spatial resolution by adding a nonlinear and heterogeneous bias correction before BME data fusion and then evaluate the improvement in performance. We use the Constant Air Quality Model Performance (CAMP) method, which allows for nonlinear bias correction as a function of the modeled ozone concentration, and the Regionalized Air Quality Model Performance (RAMP) method (Reyes et al., 2017), which applies CAMP regionally at individual points based on the nearest observations. CAMP and RAMP corrections are applied to the M3Fusion multimodel composite prior to BME data fusion. While BME is not restricted spatially, ozone’s steep covariance curve means that observations have little influence beyond about 0.5° from a monitoring station (DeLang et al., 2021). In contrast, CAMP and RAMP corrections can influence estimates over a larger spatial range. Specifically, we aim to use regional differences in model underestimation/overestimation to correct the M3Fusion results regionally and increase the fidelity of our estimation in areas with sparse or no ozone observation stations, and we apply CAMP and RAMP here to improve estimates in particular beyond about 0.5° from a monitor. These geostatistical methods differ from other recent efforts to map ozone concentrations based on machine learning (Seltzer et al., 2020; Kleinert et al., 2021; Sun et al., 2021; Betancourt et al., 2022; Liu et al., 2022), and so there is value in further developing geostatistical methods for comparison with these recent advancements.
Both CAMP and RAMP bias correct models by comparing observed and modeled values at collocated points in space and time. CAMP applies a nonlinear correction as a function of the modeled ozone concentration, assuming that nonlinear function is constant across the study region. RAMP improves on this by giving each model grid cell its own nonlinear bias correction based on nearby observations; RAMP is both nonlinear and nonhomogeneous spatially. Here, the RAMP method (Reyes et al., 2017) is applied globally for the first time, with each model grid cell correction based on a unique area that includes the nearest points in space/time. These areas are much smaller than the continental regions used in M3Fusion, allowing us to better correct biases in the M3Fusion multimodel composite at points far from observations, while BME then applies corrections near observations. In applying RAMP at a global scale, we also make a novel modification of the RAMP method because station observations are sparse in some regions and clustered in others. This modification prevents sharp spatial changes in corrections when transitioning between 2 different regions with dense observation networks. The CAMP- and RAMP-corrected estimates are then each used as global background ozone levels (the global offset) for BME data fusion with observations, as was done by DeLang et al. (2021) with the uncorrected M3Fusion multimodel composite.
We aim for the improved global ozone database to be useful for researchers in a variety of fields, including atmospheric science, epidemiology and health impact assessment, and ecosystem and crop impacts. As this is the first global application of CAMP and RAMP, we document the methods and results to inform future global data fusion efforts for ozone and other air pollutants. Finally, by further developing geostatistical methods for global ozone estimates in this study, we also aim to provide ozone estimates that can be compared with recent machine learning applications in future studies.
2. Methods
2.1. Ozone observations and model estimates
We use the ozone season daily maximum 8-h mixing ratio (OSDMA8) as the annual ozone metric, as it is used for calculating health outcomes from ozone pollution by GBD (GBD 2019 Risk Factors Collaborators, 2020) using concentration–mortality relationships from Turner et al. (2016) and other studies. OSDMA8 is the maximum 6-month running mean of monthly averages of the daily 8-h maximum mixing ratios. Since ozone is usually highest in the summer, each defined year includes up to March of the following year to capture the Southern Hemisphere summer. All reported ozone values, including observations, modeled values, and estimates, are OSDMA8 values.
Ground-level ozone measurements are taken from databases compiled by the Tropospheric Ozone Assessment Report (TOAR) and the Chinese National Environmental Monitoring Center (CNEMC) (Schultz et al., 2017; Lu et al., 2020). The TOAR database is the largest collection of global hourly surface ozone concentrations and spans 1970–2015. To support this project, some national datasets were extended for 2015–2017 (DeLang et al., 2021). While observations are dense in North America, Europe, Japan, and South Korea, they are sparse to nonexistent elsewhere (Figure S1). CNEMC provides surface ozone observations in China beginning in 2013, and data through 2017 are used in this study (Lu et al., 2020). Both datasets were quality controlled with the same algorithm developed for the TOAR database. The number of observation locations in the combined dataset is least in 1990 (with 1,190) while 2015 has the most (4,999).
We used surface concentration output from 9 atmospheric chemistry models to create our M3Fusion multimodel composite (DeLang et al., 2021). Models include 4 from the chemistry-climate model initiative (CCMI; Morgenstern et al., 2017) that simulate 1990–2010, 2 additional CCMI models that extend the simulation beyond 2010, 2 CMIP6 models that cover years after 2010 (Collins et al., 2017), and MERRA2-GMI that covers 1990–2017 (Strode et al., 2019; Anderson et al., 2021). The compilations of observations and models used here are the same as used by DeLang et al. (2021).
2.2. Data fusion methods
M3Fusion was used to evaluate model performance and create a bias-corrected multimodel composite for each year 1990–2017 (Chang et al., 2019). This method finds a linear combination of the 9 models in each year and continental domain that minimizes the mean square error (MSE) compared to interpolated observations, creating a single bias-corrected global composite for each year. This is the same composite used by DeLang et al. (2021), and it reflects model expectations about variations in ozone due to emissions, geography, meteorology, and chemistry. However, M3Fusion does not capture the nonlinearity of model performance with respect to model value, nor how model performance varies within a continental domain (heterogeneity), both of which we address using RAMP.
The CAMP method (de Nazelle et al., 2010) is a precursor to RAMP that bias corrects for nonlinear model performance but does not account for nonhomogeneity as a single correction applies globally. It matches each observation point with the model estimate at that location. These matched pairs are then binned by the model estimate, and an average of model estimates and observations is set for each bin. The M3Fusion composite is then corrected by interpolating between these values. Since CAMP is closely related to RAMP, we describe the method in depth in the following paragraphs. While CAMP works well for local applications in a single year, RAMP allows us to account for the heterogeneity in model performance at a global scale by performing the model correction based on the nearest observations only.
RAMP is a method to visualize and evaluate model performance that can be used to bias correct models (Reyes, 2016; Reyes et al., 2017). The correction accounts for nonlinear and nonhomogenous model performance (de Nazelle et al., 2010), in which the RAMP correction is not limited to a linear function with respect to model value, and it may correct differently in different geographic regions. Here, we apply RAMP to the M3Fusion composites so that we address residual nonlinear and nonhomogeneous biases. While previous studies have used RAMP to bias correct model estimates of air pollutants (de Nazelle et al., 2010; Xu et al., 2016; Reyes et al., 2017), none has done so at a global scale.
We describe RAMP by letting be the M3Fusion multimodel composite prediction of ozone at space/time coordinate where s is the spatial location in longitude/latitude degrees, and t is the time in years. Let be the ozone observation (TOAR or CNEMC) at space/time monitoring points . M3Fusion predictions are available throughout our entire global study domain, whereas observations are only available at certain locations. We match each observation with the underlying model prediction , so that () are the paired observation-model values. We let be the space/time region around p containing the N = 250 spatially closest stations in years t, t − 1, and t + 1 (1990 does not use t − 1 and 2017 does not use t + 1). After testing other numbers of stations, we found that N = 250 is ideal, with enough stations to maintain consistent patterns and prevent outliers from having significant effects, while giving a narrow enough spatial range to correlate with local trends. As we use 3 years, contains up to 750 collocated () pairs. We sort these pairs by increasing model value and stratify them in 10 bins corresponding to increasing model decile values . Then, we calculate the average observed value λ1 for model decile value in region as
where is the number of paired observed/modeled values () for which is in the kth decile of modeled values, and is the jth observation in these pairs.
The abovementioned steps follow those outlined by Reyes et al. (2017); in this article, we further improve RAMP by ensuring that the slope between λ1s does not become negative, or in other words, that the λ1 RAMP curve for any is monotonically increasing. To do this, we define the mean value of all observed values in as λmean. We compare λmean with , the λ1 in the fifth decile bin. If , we set . We then compare the fifth and fourth bin in the same way, and so on, ensuring that , by setting them as equal when necessary. We do the same for bins k = 6 through 10, first comparing bin 6 to λmean and setting the value of equal to . This is a novel improvement to Reyes et al. (2017) as it maintains the ordinality of estimates from the original model with the same .
By plotting with respect to , we obtain the RAMP curve at location p showing how the average observation changes with respect to model values. Figure 1 visualizes the nonlinear performance of the M3Fusion composite, and by changing location p, we can see how that performance varies spatially and where it is nonlinear. This visualization can, for example, be used to detect regions where the M3Fusion composite overpredicts high ozone values and underpredicts low ozone values. These plots also allow us to correct the model value by interpolating along and selecting new model values based on the value of λ1 evaluated at the original M3Fusion value. Therefore, the RAMP-corrected model value is .
A novel challenge posed by the implementation of the RAMP method at the global scale is that station locations are clustered in some countries or continents (e.g., the United States, China, Japan, and Europe) and are sparse in large areas in between. Previous applications of RAMP had more uniform distributions of observations over a smaller scale (Reyes et al., 2017). As a result, globally the region containing the N = 250 stations closest to p can change dramatically over a short distance, for example, when shifting from a domain dominated by European observations to one dominated by China. This abrupt change in can result in a discontinuity in the RAMP-corrected value . To reduce this discontinuity, we introduce the RAMP–M3Fusion weighted-average calculated as
where and are the RAMP and M3Fusion values, respectively, and is the weight for RAMP at location p. We want a weight that is high when a large fraction of the N stations used to construct the RAMP curve are close to p and low when this fraction is low. We therefore set the weight using the following equation:
where is the number of stations used to calculate the RAMP curve at location p (i.e., 250), is the number of these stations that are within a radius q of p, and is the fraction of RAMP stations (between 0 and 1) that are within q degrees of p (Figure 2). We chose a radius q = 25°, so that the RAMP weight allows RAMP to exert influence beyond the range of BME without extending into areas lacking representative observations. Areas that are more than 25° away from these station clusters, like the area at the midpoint between China and Europe, will have a RAMP weight close to 0 and , thereby mitigating any RAMP discontinuity. We call the global output of values as weighted RAMP (wRAMP).
BME data fusion is then applied after RAMP correction to fuse model prediction and observations, using the approach described by DeLang et al. (2021). Each BME estimate uses a different background assumption for global ozone levels at every grid cell, which we call the global offset, based on either the M3Fusion composite, CAMP-corrected M3Fusion, or wRAMP-corrected M3Fusion. This global offset is corrected using BME so that the final BME estimate matches observed values at each station location. Observations are treated as “hard data,” meaning they are perfectly accurate, neglecting measurement uncertainty. Each station exerts an influence based on the difference between the station estimate and the global offset, which decreases as the space/time distance from observations increases, eventually matching the offset prediction away from observations. The rate at which this influence falls is based on a derived covariance function. BME has been used previously for the fusion of ozone observations and models (Christakos, 2000; Christakos et al., 2004; de Nazelle et al., 2010), though only once before at the global scale (DeLang et al., 2021). While these papers provide the details of BME, we give here the main BME steps.
The fundamental step in BME data fusion is the definition of an offset function at all points p across the study space/time domain. Here, we set equal to either (M3Fusion), (RAMP), or (wRAMP). We calculate the offset-removed observations as
where are the observations at point . We define as a homogeneous/stationary space/time random field (STRF) with realizations . is a STRF representing the residual uncertainty and variability that is left in the offset-removed observations, and therefore its covariance function changes with the offset considered (either M3Fusion, RAMP or wRAMP). Finally, we define the STRF representing the ozone concentration as the sum of the residual field and the offset, that is,
We implement BME on the residual STRF to obtain the BME estimate and associated estimation error variance of at estimation points across a global estimation grid. The general knowledge base characterizing consists of a mean assumed to equal 0 within the estimation neighborhood, and a covariance function obtained from a variogram analysis (see Supplementary information for details on the covariance model and its parameters). The site-specific knowledge consists of the offset-removed observations treated as hard data (data with no assumed uncertainty). We numerically implement BME using the BMElib library written in the MATLAB programming language (Serre and Christakos, 1999; Christakos et al., 2001), and as shown by DeLang et al. (2021), in this case, the BME posterior probability density function of is Gaussian with a mean and associated error variance equal to that of simple kriging. Finally, the BME estimate of , representing ozone at the estimation point, and associated variance are obtained as
and , where is the (M3Fusion, RAMP or wRAMP) offset at the estimation point. Estimation points are set on a 0.5° grid, giving a final BME estimation at 0.5° resolution. All data fusion methods are the same globally, as we do not attempt to alter the methods for particular regions, although the effects of the methods differ in space and time reflecting observations and model output.
The variance estimated here is a statistical measure of uncertainty in the BME framework, but the variance does not quantify the uncertainty fully as it omits other sources of uncertainty. In this study, measurements are assumed to be accurate, ignoring measurement uncertainty, and the lack of measurements over large world regions hinders our understanding of ozone in those regions. Models involve additional uncertainties in model physical and chemical processes and in emission inventories and other model inputs. Finally, there is uncertainty in the BME framework as other data fusion methods may also be used.
2.3. Cross validation
Leave-one-out cross-validation (LOOCV) was done by removing each observation one at a time and using various estimation methods to evaluate our ability to predict this observation based on the remaining data. LOOCV was performed by predicting ozone at each 0.5° grid cell containing an observation point and comparing it with the observations in the grid cell, following the same protocols as DeLang et al. (2021). This was done for M3Fusion, CAMP, and wRAMP both before and after data fusion with BME. For LOOCV of BME, BME was used to estimate each removed point in turn, and the aggregated errors were used to calculate MSE and the Pearson correlation coefficient squared (R2) that is bounded between 0 and 1. For LOOCV on the offsets, the difference between the offset and observation point at each station location was used.
Whereas LOOCV tests the ability to predict based on nearby clustered observations, we use checkerboard cross-validation (CBCV) to better test each estimation method, especially farther from nearby observations. This method derives from the radius-based validation methods of Xu et al. (2016) and Cleland et al. (2020). In CBCV, we create a “checkerboard” of boxes over the world with each box having a side length s latitude and 2*s longitude. For each box, we remove all observed values within the box and use BME to reestimate the ozone values at the location y*(pi) of the removed observations within the box, using only observations outside of it. The validation error is defined as , which is then used to calculate R2 values to quantify error for each observation in every box. We test CBCV with s ranging from 0.5° to 50°. BME relies on observations to make corrections within the covariance range (less than about 1° for ozone), so as s increases, observed values will have a smaller influence on correction. CBCV simulates the effect of sparse observations, while still having observations to validate the estimate. As most of the world lacks dense observation networks, the ability to correct away from observations is valuable to global estimations of air pollution.
3. Results
The M3Fusion multimodel composite (Chang et al., 2019; DeLang et al., 2021) (Figure S1) is used here as the basis for CAMP and RAMP bias corrections based on the TOAR and CNEMC observations (Figure 1). Using CAMP, we see that M3Fusion performance is nonlinear, tending to overpredict ozone where it estimates high values and slightly underpredict low values (Figure S2). This has been demonstrated for individual models in previous studies of surface ozone (de Nazelle et al., 2010), but we are not aware that it has been demonstrated previously at the global scale. Using RAMP, we confirm that M3Fusion bias varies at a finer scale than the continental regions used in M3Fusion, supporting the value of RAMP’s localized (nonhomogenous) bias correction.
When applying RAMP bias correction to each model grid point, corrections vary each year and at each location, but the largest changes generally occur where the M3Fusion estimate is above about 55 ppb or below 35 ppb. While the M3Fusion composite generally overestimates when it predicts high ozone and underestimates where it predicts low, this is not true for all regions. Figure 3a shows an area where the model consistently underpredicts ozone, and the RAMP correction has a steeper slope at high values. Figure 3b shows a nearby region in the same year where M3Fusion overpredicts ozone at all but the lowest levels, and the ozone estimate at the specific point is lowered by the RAMP correction. The locations of Figure 3a and b are relatively close to one another, showing that the M3Fusion bias varies at finer spatial scales than continental. Figure 4 shows the heterogeneity in model performance and bias correction globally. While some areas, like the Americas, primarily have a RAMP correction in a single direction, others like northern Africa and eastern Europe have regions which are corrected upward and bordering regions corrected downward.
Figure 3b also shows a region where model performance is nonlinear with respect to estimations, overpredicting high values and underpredicting low values. In these areas, the M3Fusion bias does not vary linearly with respect to the M3Fusion estimate, and therefore our correction is not a linear function. This shows the value of RAMP over a linear bias correction, as a linear correction could not replicate these nonlinear curves. Figure 3c shows an example region where M3Fusion consistently overestimates ozone. These RAMP curves show the trends in model performance in the region, as a function of modeled concentration, as well as correcting the individual points (the pink star) based on this evaluation.
At a global scale, RAMP creates “streaks” where the observations used to correct the model change from being dominated by one region (eastern Europe) to another far away region (Japan and South Korea) over a short spatial distance (Figure 5). This situation arises because there are no/few local observations for the RAMP correction in this area. To avoid these discontinuities and to avoid RAMP bias corrections based only on measurements far away, we weight RAMP (Figure 2) to allow a smooth transition between regions, using weights for RAMP and M3Fusion that vary spatially and temporally (Figure 6). wRAMP heavily favors RAMP over M3Fusion in areas with high density of observations, and RAMP maintains some influence up to 25° from any station. This distance is long enough to give RAMP an influence in areas not reached by the BME correction, but short enough that it creates smooth transitions between regions and lessens the discontinuities seen in pure RAMP.
Using wRAMP as our global offset and station observations as hard data for BME, we obtain yearly estimates of global ozone and variance at 0.5° resolution (Figure 6). The ozone estimates match observations at any space/time location with an observation, as the observations are assumed to be accurate. The influence of observations decreases as a function of space/time distance as the estimate moves further from an observation, based on the derived covariance (see Supplementary Materials equations and Figure S3). The influence of an observation over multiple years in BME is valuable when correcting areas with inconsistent observations. DeLang et al. (2021) explore the significance of the temporal factor in more detail. Variance is strongly influenced by proximity to observations, which are the only source of hard data in the BME estimate. Variance drops to 0 at measurement locations and quickly rises as distance from stations increases, since measurements are assumed to be accurate. Therefore, variance is low in Europe, North America, Japan, South Korea, and in some years parts of China, and high elsewhere. As variance approaches 60 ppb2, the BME estimation approaches wRAMP.
While in our previous work, the BME data fusion of DeLang et al. (2021) improved markedly over the M3Fusion composite, Figure 7 shows the difference between our BME estimate using the wRAMP-corrected model as our global offset and the BME estimate of DeLang et al. (2021) which uses the M3Fusion composite as the global offset. The 2 methods differ most at distances more than 0.5° from stations, because BME matches observations exactly at measurement locations, with that influence approaching 0 beyond 0.5°. But the influence of wRAMP generally drops off with distance from stations because of the weighting used, and wRAMP has no influence beyond 25° from a station. Thus, wRAMP often has the greatest influence in regions between monitors, even in regions that are densely monitored. Whether RAMP increases or decreases estimates varies in time and space, and even nearby areas can have different signs of the correction. Changes in specific regions also vary year to year. General trends include decreases in the Korean peninsula, large changes in China once local data become available in 2014, overall increases in eastern China prior to 2014, increases in the northeastern United States in most years, slight increases in southeastern Europe, and overall better model performance in inland United States and European Union (EU) than on the coasts indicated by smaller corrections in those regions. In regions with few observation stations, especially those further than 25° from the nearest observation, changes caused by including wRAMP are often less than 0.2 ppb in individual grid cells (Table S1). Larger changes are observed in regions with more monitors. In Europe in 2000, for example, the population-weighted average ozone concentration decreases by 0.205 ppb (about 0.5% of the regional population-weighted mean ozone) with some grid cells changing by more than 3 ppb (Table S1; Figure 7). Since there are compensating positive and negative changes, we also report a population-weighted mean of the absolute value of changes, which is 0.863 ppb (about 2% of the mean). Similarly, in East Asia in 2017, the population-weighted mean ozone increases by 0.217 ppb (about 0.4%), some grid cells change by more than 3 ppb, and the average absolute change is 0.897 ppb (about 1.7%).
We evaluate our results relative to DeLang et al. (2021) and other methods by using LOOCV and CBCV, for 7 cases:
Simple Model Mean: An average of all models used in M3Fusion where each is given equal weight.
M3Fusion: Multimodel composite of 9 models using the M3Fusion method.
CAMP: CAMP-corrected M3Fusion composite.
wRAMP: RAMP-corrected M3Fusion composite, weighted based on proximity to observations.
BME using M3Fusion as Offset: BME data fusion using the M3Fusion multimodel composite as the global offset, matching the results of DeLang et al. (2021).
BME using CAMP as Offset: BME data fusion using CAMP as the global offset.
BME using wRAMP as Offset: BME data fusion using wRAMP as the global offset.
In LOOCV, M3Fusion, CAMP, and wRAMP all improve performance over the simple model mean (Table S2), and CAMP and wRAMP provide clear improvements with reduced MSE and improved R2 over M3Fusion. Using BME, LOOCV performance does not differ whether M3Fusion, CAMP, and wRAMP are used as the global offset. This result is expected because most observations are clustered, and BME predicts accurately when observations are close together, similar to kriging on the observations. In an LOOCV, removing a single monitor in many cases leaves a sufficient number of nearby stations to accurately predict ozone.
Methods other than LOOCV are therefore important to test the improvement in ozone estimates when wRAMP bias correction is used before BME data fusion, and, for this, we use CBCV. In particular, CBCV more meaningfully tests each method’s ability to estimate ozone where there is not a dense network of observations (Figure 8). At small box sizes, CBCV approximates LOOCV and all BME methods have similar R2 (though a smaller R2 than in LOOCV since CBCV removes observations in all years). As the box size increases, R2 for wRAMP decreases less than the other cases, maintaining a minimum of 0.45 at large box sizes. It also does not experience the dramatic performance drop-off that CAMP and M3Fusion have at about 4°. CAMP also has consistently higher R2 than M3Fusion at box sizes greater than 4°. BME with wRAMP shows great improvement in estimating where observations have been removed compared to the base M3Fusion BME estimates. That is, even when all nearby measurements are removed (at large box sizes) wRAMP is beneficial, by using information from monitors farther away to inform the bias correction.
Using BME estimates and variance, we evaluate the likelihood that ozone values exceed specific thresholds. Specifically, using our best estimates using BME with wRAMP as the global offset, we analyze the likelihood that a given location surpasses selected ozone levels (Figure 9), where 30 ppb corresponds to the WHO guideline (WHO, 2021). Note that we do not estimate the likelihood of exceeding daily 8 h standards, such as the U.S. National Ambient Air Quality Standards for ozone, which cannot be estimated directly from OSDMA8. Areas with ozone estimates near the threshold and areas with high variance (few observations) are most likely to fall in the uncertain range. Certainty in exceeding or not exceeding a given value comes from extreme estimates and/or dense observations. For example, very few parts of the world are definitively below 45 ppb in 2017, but only areas with high estimates (central Africa, India, the Middle East, and parts of China) and areas with dense sensor networks (EU and western United States) are clearly above it. Similarly, comparing the likelihood of exceedance with our ozone estimates, we see some areas which have the same level of estimated ozone but have different likelihood of exceeding thresholds due to the difference in nearby observations (and therefore variance). For example, the hotspots in southern Africa are estimated to be above 65 ppb, but we are less than 90% certain that this area exceeds 55 ppb. Meanwhile the hotspot centered around Beijing, which has nearby observations, is above 55 ppb with near certainty, and even above 65 ppb with 90% certainty in some areas.
We estimate that 96% of the world’s population was exposed to ambient air that exceeds the new WHO guideline for ozone in 2017 (Figure 10), and that this percentage has increased from about 91% in 1990. Here, we use 2019 population data from GBD 2019 for all years, so all changes are due purely to ozone changes not population changes. Among individual world regions, North America, Europe, Russia, and South Central Asia have very nearly 100% of their population exceeding the guideline. Regions with relatively less ozone pollution, Oceania and South America, have over 50% of the population exceeding the guideline in 2017.
Finally, following DeLang et al. (2021), we use global population data from GBD 2019 to analyze annual population-weighted ozone in different regions (Figure S6). Trends in regions and years with sparse observations are less certain. Although there are small differences in individual years and regions, trends overall follow the same pattern as for DeLang et al. (2021). This reflects both compensating positive and negative bias corrections within regions by wRAMP (Figure 7; Table S1), and the fact that the population-weighted averages plotted here are weighted toward large urban areas, some of which have dense monitoring networks in which BME estimates ozone accurately whether or not wRAMP is used. Asia has a large upward trend, which along with a large increase in Africa drives an overall upward global trend in population weighted ozone. North America and Europe trend downward, though the European trend is much weaker. Russia begins to trend down in 2010, while South America and Oceania fluctuate but have no clear trend. TOAR observational studies support the downward trend in North America from 2000 to 2014 (Chang et al., 2017), and a study of CNEMC observations supports the increase in East Asia based on observational trends in China (Lu et al., 2020).
4. Conclusions
Here, we improve upon the global ozone estimates of Chang et al. (2019) and DeLang et al. (2021) by providing an additional bias correction step to the M3Fusion model composite before BME data fusion. This RAMP correction provides a local, nonlinear, nonhomogenous model bias correction, in which each point receives a different bias correction based on the M3Fusion composite performance at the nearest stations. Using this corrected model as the basis for BME data fusion leads to improvements in ozone estimates in regions beyond the distance over which observations have a substantial influence in BME (about 0.5° from monitoring stations), but less than the 25° limit in the weighting scheme used for RAMP. Including RAMP therefore can help at intermediate distances to fill in values between monitors that are not very closely spaced. We found that performance of M3Fusion varies by space/time location and is often nonlinear, making RAMP the ideal tool to further improve this composite. This method also takes full advantage of TOAR and CNEMC observations, as it allows them to both directly correct estimates locally through time with BME data fusion and inform model corrections at a larger regional scale through M3Fusion and RAMP. Our final estimates provide yearly fine resolution global ozone estimates for 1990–2017, involving a data fusion of surface observations from global monitoring stations and 9 chemistry-climate models. We also provide estimates of variance generated by BME, which may be useful for some further applications, but we caution that the variance estimated here does not represent the full uncertainty in ozone concentration.
The RAMP method demonstrates that model performance and biases have local variations, even after a uniform continental bias correction is applied in the M3Fusion multimodel composite. RAMP therefore improves estimates over M3Fusion or the global CAMP in accounting for heterogeneous model performance. RAMP also shows that model performance is nonlinear with respect to observations in many areas, which often manifests as an overprediction at one extreme and an underprediction at the other. Overall, the multimodel composite is better at estimating ozone values near the average (often 40–55 ppb) and poorer at the extremes. RAMP’s ability to account for nonlinear model performance allows greater corrections where M3Fusion predictions are worse.
As this is the first application of RAMP at a global scale, we find that RAMP alone creates “streaks” where the observations being used to inform the correction change over a short distance, showing the difficulty of using a single method over areas both rich and sparse in data. RAMP could potentially encounter this issue for any dataset where there are 2 or more heavily clustered regions of observations separated by areas with sparse observations. Therefore, we choose to weight RAMP to create smooth transitions between regions, giving a much greater weight to the multimodel composite when corrections are far from observations, while very close to observations corrections are incorporated by BME data fusion. In areas with a more uniform distribution of observations, such as those from previous studies using RAMP (Xu et al., 2016; Reyes et al., 2017), weighting RAMP would not be necessary. Weighting RAMP by distance from observations avoids such streaks, and limits RAMP’s influence in regions very far from monitors. It also allows a smooth transition between the regional bias corrections of RAMP, and M3Fusion, which bias corrects within each continent. In future work on different scales, different weighting schemes for RAMP may be chosen, including no down-weighting at all in regional applications where there are many monitors.
Overall, RAMP is more accurate than CAMP and M3Fusion at estimating global ozone. When used in conjunction with BME, RAMP does not appreciably improve estimates in LOOCV and within 1° of another station. BME alone can correct the model within the range where observations co-vary with each other, especially if it can draw on observations at the same location in other years. The advantage of RAMP is seen in the CBCV, which is greatest beyond about 0.5° from the nearest monitoring station but less than 25°. The improvements of CAMP over M3Fusion shows the value of a nonlinear model correction alone, while RAMP’s improvements over CAMP show the value of accounting for regional heterogeneity in model performance. Changes in ozone estimates due to including RAMP before BME data fusion are as large as 3 ppb in individual grid cells and absolute values of changes are up to 1 ppb when averaged over regions where ozone monitors are dense. These changes are smaller (<0.2 ppb in individual grid cells) in regions far from monitors, where the weighting put on RAMP goes to near 0.
Because the BME method provides both ozone estimates and the associated variance, we can evaluate the confidence that ozone is above or below selected values. We find that most of the world’s population lives in areas very likely above 35 ppb in OSDMA8 and even above 45 ppb. Some regions estimated to have the highest ozone, including much of India, are very likely above 55 ppb. In the case of India, model estimates suggest high ozone that may be above 65 ppb, but the lack of ground observations decreases confidence in this region. Regions with high modeled ozone but low confidence in results because of the distance from observations can be among those prioritized for increased monitoring. While RAMP improves estimation far from monitors, additional monitoring capacity in regions currently lacking monitors would be valuable for improving ozone estimates. Currently much of the world’s population lives in low- and middle-income nations, which are located far from ozone monitors. In these regions, large populations are likely exposed to high and growing ozone levels. Based on our analysis, enhanced surface ozone monitoring in these regions would have the greatest impact for improving our understanding of global scale exposure to ozone pollution.
Future work should aim to update these methods using newer atmospheric models, including results from chemical reanalyses, and to the forthcoming TOAR-II measurement database (https://toar-data.org/surface-data), which will include new monitoring sites in populated regions such as India. This analysis focuses on a single annual metric relevant for health, but this analysis could be reproduced monthly and could be reproduced for other ozone metrics such as those relevant for plants. These methods are also applicable to other air pollutants. Finally, these methods offer an interesting counterpoint to recent machine learning estimates (Seltzer et al., 2020; Kleinert et al., 2021; Sun et al., 2021; Betancourt et al., 2022; Liu et al., 2022). Machine learning has advantages of incorporating a wide range of datasets as predictors of ozone and in finding nonlinear relationships that can be difficult to capture in the geostatistical methods used here. However, machine learning has possibilities of spurious results—especially where relationships found over densely monitored regions are assumed to hold globally—and machine learning results can be difficult to understand and interpret. The geostatistical methods used here can be adapted to incorporate a wider range of input datasets, including machine learning estimates, and using machine learning and geostatistical methods in combination may further improve global ozone estimates.
Data accessibility statement
Input data from TOAR are publicly available through the TOAR web interface (https://join.fz-juelich.de/accounts/login/) and through PANGAEA (https://doi.pangaea.de/10.1594/PANGAEA.876108). Modeling output from CCMI used as input to this data fusion project can be obtained from CCMI (http://blogs.reading.ac.uk/ccmi/). MERRA2-GMI output is publicly available (https://acd-ext.gsfc.nasa.gov/Projects/GEOSCCM/MERRA2GMI/). Codes for the M3Fusion method to create a multi-model composite were made available through the journal publication by Chang et al. (2019) (https://gmd.copernicus.org/articles/12/955/2019/). BME codes are publicly available through the BME library (https://mserre.sph.unc.edu/BMElib_web/index.htm). Files containing full gridded results for our annual ozone maps produced by data fusion in this project, and their variance, are available to the public through Zenodo (https://zenodo.org/record/7573881).
Supplemental files
The supplemental files for this article can be found as follows:
SuppInfo_Final2.Pdf
Funding
The authors acknowledge funding from the NASA Health and Air Quality Applied Sciences Team (NNX16AQ30G), NASA (80NSSC23K0930), and the National Institute for Occupational Safety and Health (T42-OH008673). ORC and KLC were supported by NOAA cooperative agreements NA17OAR4320101 and NA22OAR4320151. MD was supported by the Japan Society for the Promotion of Science (JP20K04070). The MERRA-2 GMI Replay simulation was supported by the NASA Modeling, Analysis and Prediction (MAP) program, with computer resources provided by the NASA Center for Climate Information.
Competing interests
The authors have declared that no competing interests exist.
Author contributions
Contributed to conception and design: JSB, MND, KLC, MLS, ORC, MB, JJW.
Contributed to acquisition of data: JSB, MND, KLC, ORC, MGS, SS, XL, LZ, MD, BJ, CAK, JFL, ML, JL, VM, SAS, KS, ST, LZ.
Contributed to analysis and interpretation of data: JSB, MND, KLC, MLS, ORC, HW, JJW.
Drafted and/or revised the article: JSB, MND, KLC, MLS, ORC, HW, JJW.
Approved the submitted version for publication: All authors.
References
How to cite this article: Becker, JS, DeLang, MN, Chang, K-L, Serre, ML, Cooper, OR, Wang, H, Schultz, MG, Schröder, S, Lu, X, Zhang, L, Deushi, M, Josse, B, Keller, CA, Lamarque, J-F, Lin, M, Liu, J, Marécal, V, Strode, SA, Sudo, K, Tilmes, S, Zhang, L, Brauer, M, West, JJ. 2023. Using Regionalized Air Quality Model Performance and Bayesian Maximum Entropy data fusion to map global surface ozone concentration. Elementa: Science of the Anthropocene 11(1). DOI: https://doi.org/10.1525/elementa.2022.00025
Domain Editor-in-Chief: Detlev Helmig, Boulder AIR LLC, Boulder, CO, USA
Associate Editor: Ian Faloona, University of California Davis, Davis, CA, USA
Knowledge Domain: Atmospheric Science
Part of an Elementa Special Feature: Tropospheric Ozone Assessment Report (TOAR)