Comprehensive evaluation of the Copernicus Atmosphere Monitoring Service (CAMS) reanalysis against independent observations: Reactive gases



Introduction
The Copernicus Atmosphere Monitoring Service (CAMS; http://atmosphere.copernicus.eu) is a component of the European Union's Earth Observation Programme Copernicus. The service is designed to meet the needs of policy makers and stakeholders for data and information on environmental issues such as climate change, air pollution, and other atmospheric hazards such as volcanic eruptions. The CAMS core services include, among others, the daily production of near-real-time (NRT) analyses and forecasts of global atmospheric composition (AC), European air quality products from an ensemble of regional models, and solar and ultraviolet (UV) radiation products. CAMS also produces global reanalysis data sets of reactive trace gases, greenhouse gases, and aerosol concentrations. These retrospective analyses of AC are useful for air quality and climate studies (e.g., Bechtold et al., 2009; Benedetti et al., 2014), solar spectral irradiance studies (e.g., Mueller and Träger-Chatterjee, 2014; Polo et al., 2017), monitoring of stratospheric composition (e.g., Lefever et al., 2015; Errera et al., 2019), and as boundary conditions for regional models (e.g., Schere et al., 2012; Giordano et al., 2015; Im et al., 2015). Within the CAMS preparatory project, Monitoring Atmospheric Composition and Climate (MACC), a 10-year reanalysis was produced (Inness et al., 2013). Since its release in 2013, about 3,000 users have downloaded the MACC reanalysis, which covers the years 2003-2012. The MACC reanalysis has not been extended because of major changes in the Integrated Forecasting System (IFS) model configuration (Flemming et al., 2015). After the release of a test reanalysis for reactive gases and aerosols (CAMS Interim Reanalysis; Flemming et al., 2017), the CAMS reanalysis was produced, covering the years from 2003 onward (Inness et al., 2019). The MACC reanalysis suffered from known inconsistencies in the assimilated data, which led to drifts in the carbon monoxide (CO) and ozone (O3) model fields and limited its use for reliable trend studies.
An important part of the CAMS service is the provision of independent quality assurance information to CAMS users. A dedicated validation team produces updated evaluations of the CAMS forecast products every 3 months, based on a multitude of independent observational data sets. Descriptions of these validation activities and their results are given in Eskes et al. (2015), Cuevas et al. (2015), and Wagner et al. (2015). During the production of the CAMS reanalysis, a series of validation reports was produced to monitor the stability of the data sets (Bennouna et al., 2020). All validation reports are publicly available and can be downloaded from the CAMS quality assurance webpages at https://atmosphere.copernicus.eu/quality-assurance. Inness et al. (2019) provide a comprehensive description of the CAMS modeling system used for the CAMS reanalysis and, in this context, also present selected initial comparisons with observations for the period 2003-2016 to demonstrate improvements over previous reanalysis runs, that is, the CAMS interim reanalysis (Flemming et al., 2017) and the MACC reanalysis (Inness et al., 2013). Wang et al. (2020) present additional validation results from comparisons with airborne field campaign data for the period 2003-2016. An intercomparison study of tropospheric O3 reanalysis products for the same period has been conducted by Huijnen et al. (2020).
This article presents the full evaluation results from the CAMS validation team against independent observational data, covering 16 years of reanalysis (2003-2018). We evaluate the CAMS reanalysis of reactive gases, namely O3, CO, nitrogen dioxide (NO2), and formaldehyde (HCHO). The focus is on the temporal consistency and stability of the model and on the quantification of seasonal and interannual biases. To assess the impact of data assimilation and the influence of emissions, results from a separate model run with data assimilation switched off (hereafter referred to as the "control run") are included and analyzed. Improvements and shortcomings of the CAMS reanalysis relative to the previous MACC reanalysis are likewise quantified and discussed. The article is structured as follows: Section 2 provides an overview of the reanalysis model system, the validation data, and methods. Section 3 discusses the validation results, and Section 4 presents the conclusions.
2. Description of the CAMS reanalysis system, validation data, and methods

The CAMS reanalysis model system
The CAMS reanalysis consists of 3D time-consistent AC fields, including aerosols and chemical species. The meteorological model is based on IFS cycle 42R1, with interactive O3 and aerosols feeding its radiation scheme, 60 hybrid sigma/pressure (model) levels in the vertical up to a top level at 0.1 hPa, and a horizontal resolution of approximately 80 km (Inness et al., 2019). For the CAMS reanalysis, the IFS includes the modified Carbon Bond 2005 (CB05) tropospheric chemistry scheme (Williams et al., 2013), which was originally developed for the TM5 chemistry transport model (CTM; Huijnen et al., 2010). Stratospheric O3 is computed with the same Cariolle scheme (Cariolle and Teyssèdre, 2007) as in the meteorological IFS production, while stratospheric NOx is constrained through a climatological HNO3/O3 ratio at 10 hPa. Inness et al. (2015, 2019) provide a detailed description of the data assimilation for chemical trace gases, and Benedetti et al. (2009) for aerosols. Table 1 lists the data sets used in the assimilation system, and Figure S1 displays a time series of the data assimilated in the CAMS reanalysis. Anthropogenic reactive gas emissions are based on MACCity (Granier et al., 2011), with wintertime CO emissions scaled up over Europe and the United States (Stein et al., 2014). Monthly mean biogenic emissions are derived from hourly calculations with the Model of Emissions of Gases and Aerosols from Nature (MEGAN) driven by NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalyzed meteorology (Sindelarova et al., 2014). NRT fire emissions are taken from the Global Fire Assimilation System (GFAS) v1.2 (Kaiser et al., 2012). Table S1 lists the major differences between the MACC and CAMS reanalysis data sets. To assess the impact of data assimilation, our evaluations also include the CAMS control run. The control run uses the same settings as the CAMS reanalysis, except that data assimilation is switched off. It consists of 24-h cycling forecasts and uses the meteorological fields from the CAMS reanalysis. More detailed documentation of the CAMS reanalysis model setup can be found on the CAMS Confluence webpage. The CAMS 3D reanalysis products are stored as 3-hourly fields. The data are publicly available from the CAMS Atmosphere Data Store.
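Because the products are stored as 3-hourly fields, model values generally have to be matched in time to observations that fall between output steps. The text does not state how the CAMS validation algorithms do this; the sketch below assumes simple linear interpolation in time between neighboring 3-hourly outputs, and the function name `interpolate_to_obs_times` is hypothetical.

```python
import numpy as np


def interpolate_to_obs_times(model_hours, model_values, obs_hours):
    """Linearly interpolate a model time series to observation times.

    model_hours: increasing output times (e.g. 0, 3, 6, ... h for 3-hourly fields)
    model_values: model values at those times (e.g. O3 in ppb)
    obs_hours: observation times in the same units as model_hours
    """
    model_hours = np.asarray(model_hours, dtype=float)
    model_values = np.asarray(model_values, dtype=float)
    obs_hours = np.asarray(obs_hours, dtype=float)
    # Refuse to extrapolate beyond the model output window
    if obs_hours.min() < model_hours[0] or obs_hours.max() > model_hours[-1]:
        raise ValueError("observation time outside model output window")
    # np.interp requires model_hours to be increasing
    return np.interp(obs_hours, model_hours, model_values)
```

For example, with 3-hourly values of 30, 36, and 33 ppb at 0, 3, and 6 h, observations at 1.5 h and 4.5 h receive interpolated model values of 33 and 34.5 ppb. Raising an error instead of extrapolating is a design choice of this sketch.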

Validation data and metrics
All data sets used in our validations are listed in Tables 2 and 3. As we use a wide range of different observations, more comprehensive descriptions of the individual observational data sets and validation algorithms are provided in the supplement (Section S1) and in Eskes et al. (2018). Validation metrics are listed in Table 4. More detailed information on, and a discussion of, the use of the respective validation metrics can be found in Eskes et al. (2015) and Wagner et al. (2015).
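Table 4 is not reproduced in this section. As a sketch only, assuming the standard definitions used in the CAMS validation reports (Eskes et al., 2015; Wagner et al., 2015), with f the reanalysis values and o the observed values, the modified normalized mean bias (MNMB) and the correlation coefficient used throughout this article can be computed as follows.

```python
import numpy as np


def mnmb(f, o):
    """Modified normalized mean bias in percent, bounded by -200% and +200%.

    MNMB = (2 / N) * sum((f_i - o_i) / (f_i + o_i)), expressed in percent.
    """
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    return 100.0 * 2.0 * np.mean((f - o) / (f + o))


def correlation(f, o):
    """Pearson correlation: mean product of anomalies divided by sigma_f * sigma_o."""
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    # np.std uses 1/N by default, consistent with the 1/N mean of the anomaly products
    return np.mean((f - f.mean()) * (o - o.mean())) / (f.std() * o.std())
```

For instance, `mnmb([40, 60], [40, 40])` returns approximately 20 (%). Because each difference is normalized by f + o, an overestimation by a factor k and an underestimation by the same factor give MNMBs of equal magnitude and opposite sign, which is also why strongly underestimated fields can reach values such as -150%.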

Stratospheric ozone
In the stratosphere, O3 is validated with vertical profile observations from satellites and sondes as well as with partial column observations from the Network for the Detection of Atmospheric Composition Change (NDACC).
Figure 1 shows the results of the evaluation with Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), Microwave Limb Sounder (MLS), and Atmospheric Chemistry Experiment-Fourier-Transform Spectrometer (ACE-FTS) satellite data, averaged over all longitudes and over the three latitude bands most relevant for stratospheric O3: Antarctic (90°S-60°S), Tropics (30°S-30°N), and Arctic (60°N-90°N). In the upper stratosphere (3-10 hPa; top row), the absolute bias is generally below 10% for all instruments, except in the north polar region in 2003-2004, where larger biases appear. This is related to the degraded quality of the assimilated SCIAMACHY and MIPAS data in 2003 and to the lack of MLS O3 profile data for assimilation until the beginning of August 2004 (Inness et al., 2019). Since mid-2004, a negative bias against MLS (maximum 4%), ACE-FTS (maximum 10%), and MIPAS (maximum 12%) has been systematically present in this layer. All limb data sets show that this O3 deficit has a seasonal component: the negative biases are more pronounced in summer than in winter (see Figure S2). The seasonal biases with respect to MLS are much smaller than those with respect to MIPAS. This indicates that the CAMS reanalysis is much more strongly constrained by MLS than by MIPAS and that the seasonal patterns mostly reflect differences between MIPAS and MLS. Seasonal biases between MIPAS and MLS have already been published for the midlatitudes (Errera et al., 2019) and are shown here for the tropical and polar regions. While such interinstrument biases deserve further investigation, they are beyond the scope of this article.
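Figure 1 aggregates the comparisons by latitude band and pressure layer. The aggregation procedure is not described here; the sketch below assumes that collocated model-observation pairs are binned into the three latitude bands and three pressure layers named in the text and that the relative bias is 100 × (model - obs)/obs. The function name and the handling of bin edges are illustrative choices, not the procedure used for the figure.

```python
import numpy as np

# Latitude bands (degrees north) and pressure layers (hPa) named in the text
BANDS = {"Antarctic": (-90.0, -60.0), "Tropics": (-30.0, 30.0), "Arctic": (60.0, 90.0)}
LAYERS_HPA = {"upper": (3.0, 10.0), "middle": (10.0, 30.0), "lower": (30.0, 70.0)}


def binned_relative_bias(lat, pres_hpa, model, obs):
    """Mean relative bias (%) per (band, layer); NaN where a bin has no data.

    Each layer includes its top pressure and excludes its bottom pressure,
    so a value at exactly 10 hPa is counted in the middle layer.
    """
    lat = np.asarray(lat, dtype=float)
    pres_hpa = np.asarray(pres_hpa, dtype=float)
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rel = 100.0 * (model - obs) / obs
    result = {}
    for band, (lat_lo, lat_hi) in BANDS.items():
        in_band = (lat >= lat_lo) & (lat <= lat_hi)
        for layer, (p_top, p_bottom) in LAYERS_HPA.items():
            in_bin = in_band & (pres_hpa >= p_top) & (pres_hpa < p_bottom)
            result[(band, layer)] = rel[in_bin].mean() if in_bin.any() else np.nan
    return result
```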
In the middle stratosphere (10-30 hPa; middle row), the bias after 2004 is generally within ±5% for all limb-scanning satellite data sets. This is in very good agreement with a recent comparison of ACE-FTS with MLS and MIPAS observations, which also reports biases within 5% in the middle stratosphere (Sheese et al., 2017). In the lower stratosphere (30-70 hPa; bottom row), the spread of the biases against the different instruments is larger. The bias against MLS and ACE-FTS is positive (3%-7%) in the Tropics. In the polar regions, the bias against MLS is low (<1.5%), except during the O3 hole events (September-October in the 90°S-60°S latitude band), when O3 abundances are approximately 3% larger than in the MLS data set; that is, O3 depletion is slightly underestimated. The biases against MIPAS and ACE-FTS are negative but always remain below 10% and 5%, respectively. […] where the limited availability of such data has a negative impact on the validation results. The bias against ACE-FTS is stable over time, except for a slight drift after 2013 in the lower tropical stratosphere (<1% over 5 years). The deseasonalized biases against both MLS and ACE-FTS increase by approximately 2% over the last year (2018) in the polar middle stratosphere and in all three regions of the lower stratosphere. This feature continues in 2019 and is likely related to the switch from reprocessed to NRT MLS V4 data.
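The text does not specify how the biases were deseasonalized. One simple approach, assumed here, is to subtract the multiyear mean bias of each calendar month and add back the overall mean, so that a step change such as the 2018 increase stands out against the removed seasonal cycle. The function `deseasonalize` is hypothetical.

```python
import numpy as np


def deseasonalize(monthly_bias, start_month=1):
    """Remove the mean annual cycle from a monthly bias series.

    monthly_bias: consecutive monthly mean biases (%), NaN for missing months
    start_month: calendar month (1-12) of the first value
    The overall mean is added back so the result stays on the bias scale.
    """
    x = np.asarray(monthly_bias, dtype=float)
    month_index = (np.arange(x.size) + start_month - 1) % 12  # 0 = January
    result = np.full_like(x, np.nan)
    for m in range(12):
        sel = month_index == m
        if np.any(np.isfinite(x[sel])):
            # Subtract this calendar month's multiyear mean bias
            result[sel] = x[sel] - np.nanmean(x[sel])
    return result + np.nanmean(x)
```

With two otherwise identical years that differ by a constant 2 percentage points, the seasonal cycle is removed entirely and the output shows only a 2-point jump between the years.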
Figure S3 gives a global overview of the agreement between the CAMS and MACC reanalyses and the observations from the limb-scanning instruments, averaged over the whole period. The correlation between the reanalysis and the observations is very good (>0.8 over the pressure range 2-200 hPa, or over the altitude range 10-45 km for ACE-FTS).
The CAMS reanalysis setup does not include explicit modeling of stratospheric chemistry. The stratospheric O3 profile is constrained using the Cariolle parametrization (e.g., Cariolle and Teyssèdre, 2007). In practice, this leads to considerable biases in the stratospheric profile in the control run when comparing with […] S2) and NDACC data (Figure S8, Table S3) in the stratosphere, reflecting uncertainties in the Cariolle O3 parametrization. Large positive biases of up to 40% appear over Antarctica and the Arctic, and modified normalized mean biases (MNMBs) of up to 20% over the Northern midlatitudes. For NDACC stations in the Southern midlatitudes and over the Tropics, the control run shows negative biases of up to -15% (Figure S8). The assimilation of O3 total columns and stratospheric profiles in the CAMS reanalysis successfully compensates for the lack of explicit stratospheric chemistry and thus proves very effective in constraining stratospheric O3. The seasonal and interannual evaluation with O3 sonde data shows a very stable and consistent performance of the CAMS reanalysis for all years and seasons (Figures S4-S7), with MNMBs mostly smaller than 5%. Only over the Arctic is a very small change in bias from negative to positive visible from 2017 onward, which likewise appears in the control run.

Tropospheric ozone
In the troposphere, our validations rely on sonde observations (Figure 4; Table S4) and In-service Aircraft for a Global Observing System (IAGOS) measurements (Figure 5; Table S5).
For the CAMS reanalysis, MNMBs with respect to sonde observations are mostly within 10% in all regions. Larger positive biases appear over the Tropics and Antarctica.
The control run underestimates O3 in all regions except the Tropics, with negative MNMBs of 10%-20% in magnitude, whereas over the Tropics O3 is overestimated with MNMBs of up to 20% (Figure 4; Figures S9-S11). This persistent overestimation of tropical O3 could be related to an overestimation of precursor emissions (fire and biogenic emissions) in this region (see Section 3.4). In the high latitudes and Northern midlatitudes, the control run shows a seasonal pattern in the bias, with larger negative biases during winter and spring.
Data assimilation effectively increases tropospheric O3, which improves the MNMBs of the CAMS reanalysis relative to the control run in most regions. The seasonality in the biases is likewise largely eliminated by data assimilation. In the Tropics, however, data assimilation is less effective in improving the biases of the CAMS reanalysis relative to the control run.
Two periods with larger MNMBs are noticeable in the time series of the CAMS reanalysis; they affect the long-term consistency and are related to data assimilation issues: […] Tropics, we find larger differences between the control run and the CAMS reanalysis after these changes in 2013 (Figure S11).
The evaluation with IAGOS data shows MNMBs between -5% and 25% over North America, between -10% and 20% over Europe, between -10% and 40% over East Asia, and of +35% over India (Figure 5). O3 is mostly overestimated over the more polluted metropolitan sites, especially over India and East Asia (see Figures S12-S15). The control run has larger positive MNMBs over India and partly over East Asia, but lower MNMBs over Europe and North America (see also Figure S16). Over India and East Asia, biases increase during July-September, revealing that the model has difficulty simulating the low O3 values during the Asian monsoon season (Figure S16). For the CAMS reanalysis, the seasonal pattern in the biases is almost absent over Europe and North America and reduced over India and East Asia, and the biases are more constant over the years, although partly more positive than those of the control run.

Surface ozone
Modeled surface O3 is compared to Global Atmosphere Watch (GAW) data (Figure 7), International Arctic Systems for Observing the Atmosphere (IASOA) network […]. Figure 7 shows the time series of MNMBs for surface O3 calculated from model and GAW observational data, averaged regionally over stations in four latitude zones. Biases for surface O3 are generally larger than for O3 in the free troposphere but remain within ±30%. The largest biases for both the CAMS reanalysis and the control run appear during Arctic spring (MNMBs of up to 80%).
A closer look at Arctic stations from the IASOA network (Figure S17, Table S6) reveals that the High Arctic coastal stations (Alert, Barrow, and Villum Research Station) are influenced by O3 depletion events during Arctic spring (MAM: March to May). The halogen chemistry driving these events is not represented in the simulations, so the model is unable to capture the low concentrations measured in spring at these sites. The European Arctic IASOA sites (Esrange, Karasjok, Oulanka, Pallas, and Tustervatn), which are located inland, are not affected by O3 depletion events and thus show smaller biases during spring (see Figure S17). Our evaluations show that the impact of data assimilation is rather small at the surface, which is reflected in almost identical biases for the CAMS reanalysis and the control run (Figure 7; Figures S17-S19). Differences between the control run and the CAMS reanalysis appear in high-latitude regions (Figure 9; Figure S20), where data assimilation increases surface O3. This partly reduces the negative bias of the control run but also partly leads to an overestimation of modeled O3, especially during Arctic and Antarctic spring (MAM and SON, respectively; SON = September to November).
For Antarctica (Figure 9), the changes in the assimilation system described for tropospheric O3 are visible even at the surface as a distinct shift in bias from negative to positive during DJF and MAM 2012/2013, whereas the control run remains stable.
In the Northern midlatitudes (Figure 7), the interannual time series show a constant seasonal pattern in the biases, with negative biases during DJF and MAM and larger positive biases during JJA and SON. This seasonal pattern in surface O3 biases is very common in global CTMs and has been discussed in various previous studies (e.g., Ordóñez et al., 2010; Val Martin et al., 2014; Wagner et al., 2015).
For Europe, we additionally investigated these seasonal variations of MNMBs for different latitude zones using EMEP data sets (Figure 8). The seasonal variability of the CAMS reanalysis biases has been separated into nighttime (0-3 UTC) and midday (12-15 UTC) values (UTC: Coordinated Universal Time). For Northern Europe, and to a lesser extent for Southern Europe, the overestimation in summer and the underestimation in winter are stronger for nighttime O3 than for daytime O3. This likely means that nocturnal O3 destruction processes in the boundary layer (such as NOx titration) are not reproduced correctly in the model. The global model has difficulty resolving such regional subgrid processes (see, e.g., Wagner et al., 2015). For stations above 1,000 m, only positive biases are present throughout the year, with no differences between nighttime and daytime. The overestimation reaches a maximum in October (MNMB of 15%).
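The separation into nighttime and midday biases can be sketched as follows. This is an illustration only: it assumes hourly collocated model and EMEP values, treats the 0-3 and 12-15 UTC windows as inclusive hour ranges, and uses the standard MNMB definition (Eskes et al., 2015); the function `monthly_mnmb_by_window` is hypothetical.

```python
import numpy as np

# Inclusive hour windows in UTC, following the 0-3 and 12-15 UTC split in the text
WINDOWS_UTC = {"night": (0, 3), "midday": (12, 15)}


def mnmb(f, o):
    """Modified normalized mean bias in percent."""
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    return 200.0 * np.mean((f - o) / (f + o))


def monthly_mnmb_by_window(hour_utc, month, model, obs):
    """Return {window: {month: MNMB}}; NaN where a month has no data in the window."""
    hour_utc = np.asarray(hour_utc)
    month = np.asarray(month)
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    out = {}
    for name, (h_start, h_end) in WINDOWS_UTC.items():
        in_window = (hour_utc >= h_start) & (hour_utc <= h_end)
        out[name] = {}
        for m in range(1, 13):
            sel = in_window & (month == m)
            out[name][m] = mnmb(model[sel], obs[sel]) if sel.any() else np.nan
    return out
```

Values outside both windows (for example at 07 UTC) are simply ignored, so a larger nighttime than midday MNMB in a given month isolates the nocturnal part of the bias discussed above.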
There is a drift in the interannual time series of seasonal MNMBs toward larger positive biases in the Northern midlatitudes (Figure 10). Similar drifts are found in the evaluation with EMEP data for Northern Europe (Figure S18) and with IAGOS surface data over Europe and East Asia (Figures S19 and S21). Park et al. (2020) likewise note that the CAMS reanalysis does not adequately capture O3 trends over East Asia. In the Tropics, a drift is visible for JJA (Figure S22).
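One way to quantify such drifts, assumed here and not necessarily the procedure behind Figure 10, is to compute one MNMB per season and year and fit a least-squares linear trend to each seasonal series. In this sketch, December is assigned to the following year's DJF season (a common convention, but an assumption), and the function name is hypothetical.

```python
import numpy as np

# Meteorological seasons; December is counted toward the following year's DJF
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def seasonal_mnmb_drift(years, months, model, obs):
    """Return {season: least-squares slope of the yearly seasonal MNMB, in % per year}."""
    years = np.asarray(years, dtype=int)
    months = np.asarray(months, dtype=int)
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    season_year = np.where(months == 12, years + 1, years)
    season = np.array([SEASON_OF_MONTH[int(m)] for m in months])
    slopes = {}
    for s in ("DJF", "MAM", "JJA", "SON"):
        yrs, values = [], []
        for y in np.unique(season_year[season == s]):
            sel = (season == s) & (season_year == y)
            f, o = model[sel], obs[sel]
            yrs.append(float(y))
            values.append(200.0 * np.mean((f - o) / (f + o)))  # seasonal MNMB (%)
        # np.polyfit returns [slope, intercept] for a degree-1 fit
        slopes[s] = np.polyfit(yrs, values, 1)[0] if len(yrs) >= 2 else np.nan
    return slopes
```

A positive JJA slope, for example, would correspond to a drift toward larger positive summer biases as described for the Tropics.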
Although further investigations and sensitivity analyses are needed to confirm this, unrealistic trends in the emissions are likely responsible for the drifts (see Section 3.4). In a recent study, Gaubert et al. (2020) show that running CTMs with biased CO and volatile organic compound (VOC) emissions can lead to poorly modeled O3.
Compared to the previous MACC reanalysis, we see large improvements in O3 for the CAMS reanalysis (see […]). The overestimation of O3 below about 15 hPa completely disappears in the CAMS reanalysis due to a better setup of the variational bias correction scheme, which is now applied only to total column O3 retrievals and not to profiles from MLS or MIPAS (Inness et al., 2013, 2019). The absence of the drift also leads to improvements in the free troposphere and at the surface. Apart from the drift, the MACC reanalysis also shows large negative biases (MNMBs down to -150%) in the high latitudes (Figure 7). This is a known issue of the former coupled IFS-MOZART-3 (MOZART: Model for OZone And Related chemical Tracers) CTM used in MACC (see, e.g., Wagner et al., 2015). The improvements from MACC to CAMS are mostly related to changes in the chemistry module, that is, the replacement of the coupled model system (IFS and MOZART-3 CTM) used for the MACC reanalysis by the online-coupled model C-IFS (with CB05 from the TM5 CTM) used for the CAMS reanalysis (Flemming et al., 2015). As a result, MNMBs, especially in high-latitude regions, are considerably smaller and more stable. Furthermore, the seasonality of O3 is better captured.

Carbon monoxide
Modeled CO is compared to Measurements of Pollution in the Troposphere (MOPITT) and Infrared Atmospheric Sounding Interferometer (IASI) total column satellite measurements (Figures 11 and 12; Figures S23 and S24), NDACC partial column measurements (Figure 13; Figure S25 and Table S9), IAGOS aircraft data (Figure 14; Figures S26 and S27 and Tables S10 and S11), and GAW CO surface observations (Figure 15; Figures S28 and S29 and Table S12).
Figure 11 shows MOPITT total column values as a function of latitude and time, together with the biases of the CAMS reanalysis and the control run. Observed CO total columns are slightly underestimated by the CAMS reanalysis in all regions, with MNMBs mostly within ±10%. Larger MNMBs (up to 20%) appear over tropical regions, especially during 2012-2015.
The control run has larger CO in all regions, especially during the winter season. The largest positive MNMBs (up to 50%) appear over the Southern Hemisphere (SH) from November to May. For later years (2012 onward), the overestimation also extends into the low latitudes of the Northern Hemisphere. This effect in the control run also appears in the validation with NDACC data in the SH (Figure 13) and has been described by Flemming et al. (2017) for the CAMS interim reanalysis. In their study, the authors suggest that the overestimation of CO in the SH points to deficiencies in the simulation of the global chemical loss and production of CO as well as to problems with large-scale transport. According to Flemming et al. (2017), an overestimation of the GFAS biomass burning emissions for Central Africa, Maritime Southeast Asia, and South America could also contribute seasonally, to a minor extent. Within our validation, we find that biogenic VOC emissions (MEGAN-MACC; Sindelarova et al., 2014; Sindelarova, 2018), as a chemical source of CO, may also play a considerable role in the overestimation of CO in the SH in the control run, as discussed below. In a comparison of available isoprene emission data sets, Sindelarova (2018) shows that the MEGAN-MERRA biogenic emissions (used in the CAMS reanalysis and control run) are about 1.5-2 times higher and show larger year-to-year differences than other available data sets. Additionally, the TM5 model appears to be more sensitive to changes in VOC (NMVOC) rates, as described by Zeng et al. (2015). In combination with larger biogenic emissions, this effect might contribute to the variation and magnitude of the CO biases in the control run. Data assimilation reduces total column CO in the CAMS reanalysis compared to the control run, and positive biases remain only for stations in the high latitudes of the SH (see Figure 13). […] 2015) and will not be further addressed in this article. For Europe and the United States, biases of the CAMS reanalysis against total column data remain stable over the entire period, with a seasonal variation showing a larger underestimation during summer (up to -12%) and a smaller underestimation during winter (up to -5%). A similar seasonal pattern appears over East Asia and South Africa, whereas over North Africa it is reversed. For IASI data, the seasonal pattern is stronger and MNMBs are generally larger (up to -18% in summer and +8% in winter). The control run, however, has larger and more variable CO concentrations in all regions. In comparison with MOPITT, this partly leads to an overestimation of CO during the winter season (Europe, United States, and East Asia).
The variability in the time series of control run biases closely resembles the annual variability of the control run CO burden, with low total column CO in 2008 (a La Niña year) and high total column CO in 2015 (an El Niño year with high fire activity). The main drivers of the spatial and temporal variability of the CO burden are wildfire and anthropogenic emissions (Flemming and Inness, 2019).
An overestimation of fire emissions could explain the larger CO in the SH and the maxima during the El Niño year 2015. However, it does not explain the variability observed in other years, such as the increasing biases between 2011 and 2015 in the control run.
Various studies (Hassler et al., 2016; Elguindi et al., 2020) describe an inaccurate representation of CO emission trends in the MACCity inventory for Europe and the United States. Hassler et al. (2016) show that the downward trends in vehicle CO emissions in U.S. cities after 2007 are not captured correctly in the MACCity inventory. Elguindi et al. (2020) show large uncertainties for regions like […] Besides anthropogenic and fire emissions, another source of CO is the oxidation of biogenic VOC emissions, as mentioned before. The annual variability of CO biases in the control run closely resembles the annual variation of the MEGAN-MACC isoprene emissions (see Sindelarova et al., 2014; Sindelarova, 2018; Figure 3), with low values during the La Niña year 2008 and an increase in the following years up to the El Niño year 2015/2016. Further investigations, including sensitivity tests, are needed to disentangle the specific influence of the different emission sources on the variability and drifts of the biases in the control run.
For most regions shown in Figure 12, the biases remain stable for the CAMS reanalysis, which clearly illustrates how well the assimilated MOPITT data constrain modeled CO and thus prevent drifts caused by the emissions. This underlines the importance of a stable and consistent satellite CO product for assimilation. Only over East Asia does a visible drift in bias remain in the CAMS reanalysis (Figure 12).
The big fire events over Siberia in 2003, 2008, and 2012 are captured well by the model (Figure S23). In the control run, the magnitude of these events is mostly slightly overestimated, which points to an overestimation of the fire emissions during these events. Autumn and winter 2015/2016 is the period with the largest positive biases. Especially over Asia, autumn 2015 is an exceptional season, with biases increased by up to +10%. September and October 2015 were marked by a strong El Niño event, which intensified during the dry season over large regions of Indonesia. During these months, the largest fire emissions in Indonesia since 1997 were recorded, based on the GFED emissions time series (Huijnen et al., 2016b). The overestimation of emissions from December to March is strongly present in the control run, especially for South Asia, but is greatly reduced by the data assimilation.
Figure 13 and Table S9 show relative differences between modeled and Fourier-transform infrared (FTIR) partial columns (NDACC) for different sites and latitudes. The results for the control run show the same large positive error in the SH and the drifts toward positive biases from 2011 onward in the Northern midlatitudes as described above for the validation with satellite data. The relative differences for the stations in the SH show a significant positive trend: Maido: +0.7%, Lauder: +0.2%, and Wollongong: +1.1% (Figure S25). A negative trend at the Antarctic station Arrival Heights is likely related to the observations, which drop in 2009 (Figure S25). For FTIR stations in the United States and Europe, the validation reveals an underestimation of modeled CO partial columns, with values fluctuating around -5% (smaller for Toronto). For Jungfraujoch, Zugspitze, and Toronto, the largest underestimation of CO appears during the summer months (JJA), when CO is low, with relative biases up to 4% lower than during the winter months. For three Arctic sites (Eureka, Ny-Ålesund, and Thule), CO partial columns are slightly underestimated by the model, on the order of -5%, which is close to the reported systematic measurement uncertainty. For all three Arctic sites, the strongest underestimation (up to -10%) occurs at the end of winter (January to February, with only few measurements due to the polar night) and in early spring (March to April). For three sites in the low latitudes of the NH (Izaña, Mauna Loa, and Altzomoni) and five sites in the SH, the results show that data assimilation mostly corrects the positive biases of the control run. For some sites (Alzomi and […] S11) and GAW surface observations (Figure 15; Figures S28 and S29) shows that CO mixing ratios in the free troposphere and at the surface are mostly underestimated by the CAMS reanalysis and the control run from 2003 until 2013 for both Europe and East Asia. For later years, the control run partly overestimates CO in the free troposphere. For North America, the validation at the surface shows partly positive MNMBs of up to 20%-50% (Figure 15; Figure S26). Traffic emissions have been scaled during the winter season for North America and Europe (Stein et al., 2014), which improves the biases for the winter period in both regions (Figures S27-S29). However, during the summer season, surface CO is still too low in Europe, which points either to emissions in the MACCity inventory being too low (as also reported by Hassler et al., 2016) or to CO sinks being too strong. The strong overestimation of modeled surface CO over India points to an overestimation of emissions in this region (Figure S26).
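The summer-winter contrast described for the FTIR and surface comparisons can be expressed as the difference between the mean JJA and DJF relative differences. The sketch assumes relative differences defined as 100 × (model - obs)/obs and a month number for each collocated pair; both function names are hypothetical.

```python
import numpy as np


def relative_difference(model, obs):
    """Relative difference in percent: 100 * (model - obs) / obs."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * (model - obs) / obs


def summer_minus_winter(months, model, obs):
    """Mean JJA minus mean DJF relative difference, in percentage points.

    A negative value means the relative difference is lower (more negative) in summer.
    Returns NaN if either season has no data.
    """
    months = np.asarray(months, dtype=int)
    rel = relative_difference(model, obs)
    summer = np.isin(months, [6, 7, 8])
    winter = np.isin(months, [12, 1, 2])
    if not summer.any() or not winter.any():
        return np.nan
    return rel[summer].mean() - rel[winter].mean()
```

For a site with a relative difference of -5% in January and -9% in July, `summer_minus_winter` returns about -4 percentage points, i.e., a stronger underestimation in summer.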
Compared to the MACC reanalysis, the CAMS reanalysis is more consistent over time and has reduced biases (Figures 13-15; Tables S9-S12). For NDACC stations in the high northern latitudes (Figure 13), the MACC reanalysis has a negative trend in bias from 2008 onward (biases of up to -20%). This is related to discontinuities in the assimilation of CO satellite data in the MACC reanalysis, namely the start of IASI data assimilation in April 2008, which leads to a decrease of CO, especially in the high northern latitudes (see also Flemming et al., 2017; Inness et al., 2019). The distinct drop in bias in April 2008 is also noticeable in the validation at the surface with GAW data for North America (Figure 15). The CAMS reanalysis, which assimilates only MOPITT data for CO, is more consistent here and thus shows a significant improvement. For tropical regions in the Northern Hemisphere (Izaña, Mauna Loa, and Altzomoni), however, the CAMS reanalysis has an increased bias, which is not observed in the MACC reanalysis. The validation with GAW surface data shows that the MACC reanalysis has a negative offset for surface CO mixing ratios in Europe and North America (see Figure 15, GAW; Figure S26, IAGOS), which is likely related to the unrealistically low wintertime road traffic emissions that were still unscaled in the MACC reanalysis (Inness et al., 2013, 2019).

Nitrogen dioxide
Tropospheric NO2 columns have been validated with SCIAMACHY and GOME-2 satellite data (Figures 16-18; […] shown in the time series, the MACC reanalysis is closer to satellite-retrieved HCHO columns than the CAMS reanalysis here. For the regions North Africa and Indonesia, which are dominated by biogenic and pyrogenic sources, the reanalysis runs show a positive offset compared to satellite retrievals. The seasonality agrees with the retrievals for Indonesia and is overestimated for North Africa. For September and October 2015 over Indonesia, satellite retrievals and simulations both show a distinct maximum, which is, however, much more pronounced in the simulations. September, October, and November 2015 were strong El Niño months (e.g., NOAA El Niño webpage), which caused droughts and increased fire activity in Indonesia. The fire emissions used by the CAMS reanalysis seem to be largely overestimated for this El Niño year, resulting in an overestimation of up to a factor of 1.8 compared to the observations. A similar overestimation was also reported for the CAMS NRT product, for which it was shown that this is not due to the cloud flagging applied to the satellite and model data (Huijnen et al., 2016a). Comparison with the MACC reanalysis shows that the CAMS reanalysis has lower MNMBs for Indonesia and East Asia during the summer period (Figure S31), likely related to the differences in the fire emissions used (GFED/GFAS v0 in MACC and GFAS v1.2 in CAMS). For other regions and periods, the CAMS reanalysis shows larger MNMBs.
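A statement such as an overestimation "by up to a factor of 1.8" compares model and satellite columns over identical samples. The sketch below is illustrative only: it assumes a per-pixel cloud fraction and a hypothetical screening threshold of 0.3 (not taken from the text), and applies the same cloud mask to both data sets before forming the ratio of the means.

```python
import numpy as np


def masked_mean_ratio(model_col, sat_col, cloud_fraction, max_cloud=0.3):
    """Ratio of mean model column to mean satellite column over common clear pixels.

    Columns can be in any common unit (e.g. 1e15 molecules cm^-2).
    max_cloud is a hypothetical screening threshold used for illustration only.
    """
    model_col = np.asarray(model_col, dtype=float)
    sat_col = np.asarray(sat_col, dtype=float)
    cloud_fraction = np.asarray(cloud_fraction, dtype=float)
    # The same mask is applied to both data sets so that they are sampled identically
    keep = (cloud_fraction <= max_cloud) & np.isfinite(model_col) & np.isfinite(sat_col)
    if not keep.any():
        raise ValueError("no pixels pass the cloud screening")
    return model_col[keep].mean() / sat_col[keep].mean()
```

Masking both data sets identically ensures that the ratio reflects differences in the columns themselves rather than differences in sampling.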

Conclusions
CAMS provides its users with a variety of products in the field of air quality and AC. Besides the NRT forecasts, there is also a large and growing interest in long-term retrospective analysis (reanalysis) data sets. After the release of the MACC reanalysis in 2013 and an interim test product in 2015, CAMS has now produced a new reanalysis data set (CAMS reanalysis), which is freely available to the public.
We have comprehensively validated the reactive gas species (O3, CO, NO2, and HCHO) of this new product for the period 2003-2018 against multiple independent observations. For reanalysis data sets, the temporal stability of the model results is crucial, for example, for trend studies of chemical species. We therefore focused on the long-term consistency shown in the time series of biases and on seasonal and interseasonal changes in the biases. To evaluate the impact of data assimilation on the long-term quality of the results, we compared the reanalysis with a control run without assimilated data. Finally, improvements and shortcomings of the CAMS reanalysis relative to the previous MACC reanalysis are quantified and discussed.
Our evaluations show that the CAMS reanalysis reproduces O3 with MNMBs mostly within ±10% in the stratosphere and troposphere of the northern midlatitudes compared to sonde observations and satellite instruments. Larger biases (up to +38%) appear over the high latitudes, the Tropics, and generally for surface O3. Total column CO over Europe, the United States, East Asia, and North Africa is reproduced with MNMBs mostly within ±10%. Larger MNMBs appear over East Asia and for surface CO, reaching up to +40%. The CAMS reanalysis performs reasonably well regarding the magnitude and seasonality of NO2 in comparison with SCIAMACHY and GOME-2 NO2 satellite retrievals. Stronger shipping signals appear than in the satellite observations, NO2 in boreal fire regions is overestimated in summer, and NO2 over the pollution hotspots of Central Europe is underestimated in winter. Modeled HCHO columns mostly agree well with SCIAMACHY and GOME-2 satellite observations. For regions dominated by biogenic emissions with some anthropogenic input (East Asia and Eastern United States), the CAMS reanalysis reproduces absolute values and seasonality but fails to match the maxima of the satellite retrievals for individual years. The seasonality over East Asia is generally underestimated, with differences of up to approximately 1 × 10¹⁵ molecule cm⁻². For regions where biogenic and pyrogenic sources dominate the HCHO columns (North Africa and Indonesia), the CAMS reanalysis shows a positive offset compared to satellite retrievals. Concerning long-term consistency, our results show that the CAMS model system mostly provides a stable and accurate representation of the global distribution of reactive gases over time. However, the comparison with the control run without data assimilation reveals some shortcomings in the model and emissions: The lack of explicit modeling of stratospheric chemistry leads to large biases for stratospheric O3. For tropospheric and surface O3, the model shows seasonal patterns in the biases in midlatitude and high-latitude regions, with larger negative MNMBs during winter and spring. For the Arctic, large positive biases appear during O3 depletion events in spring. Furthermore, the control run strongly overestimates CO in the SH, likely because of overestimated fire and biogenic emissions together with shortcomings in the simulation of the global loss, production, and large-scale transport of CO. Overestimations of HCHO concentrations likewise suggest that fire emissions are overestimated over boreal regions, Indonesia, Africa, and East Asia, especially during years with high fire activity such as the strong El Niño event of 2015/2016.
Finally, we also found positive drifts in the interannual time series of biases for various species (CO, O3, and NO2) in the control run, likely triggered by unrealistic emission trends, especially after 2010. Data assimilation successfully constrains stratospheric and tropospheric O3 and CO and thus ensures the long-term consistency and stability of the CAMS reanalysis. However, it is less effective near the surface and for short-lived species such as NO2.
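A drift in a bias time series, such as those found for the control run or the deseasonalized biases shown in Figure 2, can be quantified by first removing the mean annual cycle and then fitting a straight line. The sketch below assumes that deseasonalization means subtracting the monthly climatology of the bias and that the drift is an ordinary least-squares slope; the function names and the synthetic input are hypothetical, not the authors' procedure or data.

```python
import numpy as np


def deseasonalize(monthly_bias):
    """Remove the mean annual cycle from a monthly bias series.

    monthly_bias: array of shape (n_years, 12). The climatology is the mean of
    each calendar month over all years (missing months given as NaN are ignored).
    """
    b = np.asarray(monthly_bias, dtype=float)
    climatology = np.nanmean(b, axis=0)  # shape (12,)
    return b - climatology               # broadcasts over the years


def drift_over(anomalies, years=5.0):
    """Least-squares slope of the flattened monthly anomalies, scaled to `years`."""
    y = np.asarray(anomalies, dtype=float).ravel()
    t = np.arange(y.size) / 12.0         # time in years since the first month
    ok = np.isfinite(y)
    slope = np.polyfit(t[ok], y[ok], 1)[0]  # bias units per year
    return years * slope


# Synthetic bias series (not CAMS results): a 3 % seasonal cycle plus a
# 0.1 %/yr drift over six years.
year, month = np.meshgrid(np.arange(6), np.arange(12), indexing="ij")
bias = 3.0 * np.sin(2.0 * np.pi * month / 12.0) + 0.1 * (year + month / 12.0)
anomalies = deseasonalize(bias)
print(f"drift over 5 years: {drift_over(anomalies):.2f} %")
```

The recovered drift (about 0.49 % over 5 years versus the imposed 0.50 %) is slightly low because the within-year part of a linear drift is absorbed into the monthly climatology; for multi-year drifts of the size discussed here this effect is small.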
Our evaluations of the long-term stability of the CAMS reanalysis show that the consistency in the quality of the model results is also substantially affected by limitations in the availability of high-quality data for assimilation and by changes in the assimilated satellite data sets. Especially during the first years, data of degraded quality and the lack of O3 profile data worsened the validation results for O3 and NO2. Modifications in the assimilation system in 2012/2013 cause jumps in the interannual seasonal time series of biases for tropospheric and surface O3, especially over high-latitude regions. For trend analysis, these effects related to changes in the data assimilation need to be considered and removed.
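One common way to account for such a jump in a trend analysis is to add a step regressor at the known date of the change, so that the level shift is not mistaken for a trend. The sketch below is a minimal ordinary least-squares version under stated assumptions: the function name, the January 2013 break date (an illustrative choice within the 2012/2013 period named above), and the synthetic series are all hypothetical.

```python
import numpy as np


def trend_with_step(t_years, bias, t_break):
    """Fit bias = a + b * (t - t0) + c * H(t - t_break) by ordinary least squares.

    H is a step function (0 before t_break, 1 from t_break on). Returns the
    drift b (bias units per year) and the jump c at the break.
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(bias, dtype=float)
    step = (t >= t_break).astype(float)
    design = np.column_stack([np.ones_like(t), t - t[0], step])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], coef[2]


# Synthetic monthly series 2003-2018 (not CAMS results): 0.2 %/yr drift plus a
# 5 % jump in January 2013, standing in for a change in the assimilated data.
t = 2003.0 + np.arange(16 * 12) / 12.0
bias = 0.2 * (t - 2003.0) + 5.0 * (t >= 2013.0)
drift, jump = trend_with_step(t, bias, t_break=2013.0)
naive = np.polyfit(t - 2003.0, bias, 1)[0]  # straight line that ignores the jump
print(f"drift {drift:.2f} %/yr, jump {jump:.2f} %, naive slope {naive:.2f} %/yr")
```

On this noise-free input the step model recovers the imposed drift and jump, while the naive straight-line fit attributes part of the jump to the trend and overestimates the slope.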
Compared to the MACC reanalysis, the CAMS reanalysis has systematically lower biases, better correlation, and a weaker seasonal pattern for O3, especially in the free troposphere and at the surface. Aside from the improved data assimilation, it is especially the change in the chemistry module, that is, the online-coupled IFS model combined with the different chemistry treatment of the CB05 chemical mechanism, that reduces the biases for tropospheric and surface O3. The largest improvement in the magnitude of the biases is thus found over the Arctic and Antarctic regions. For CO, the comparison with the MACC reanalysis shows that the scaling of the wintertime road traffic emissions and a more consistent data assimilation improve the results. For NO2, the CAMS reanalysis better reproduces wintertime NO2 over East Asia. For South Africa, however, the CAMS reanalysis underestimates observed NO2 more strongly during SH summer and autumn. For HCHO, improvements relative to the MACC reanalysis can be seen over Indonesia, but for North Africa and the Eastern United States, the MACC reanalysis shows smaller biases.
For next-generation CAMS reanalyses, challenges in the data assimilation will include the integration of more species (e.g., HCHO) and additional sensors (O3 profile data turned out to have a crucial impact) while ensuring good long-term stability of the results. New sensors such as Sentinel-5P offer promising prospects. Our results for the control run suggest that deficiencies in the model's chemistry and transport schemes need to be investigated and improved further, especially in combination with the emission data sets, to remove large zonal errors such as the overestimation of CO in the SH as well as drifts and seasonal patterns in the biases of the control run. Given the large impact of anthropogenic, fire, and biogenic emissions on CO, more care should be taken to investigate and consolidate emission rates and trends. Simple scaling approaches, such as the one applied to the wintertime traffic emissions, could be replaced by more sophisticated approximations available from recent bottom-up and top-down inventories. A more comprehensive stratospheric chemistry scheme could improve the model results in the stratosphere. Furthermore, including small-scale local processes, such as halogen chemistry, could help to reduce local sources of error in the model system, such as the large biases for surface O3 over the Arctic in springtime.

The CAMS reanalysis data set, which was produced by CAMS rather than within this study, is available from the Atmosphere Data Store at https://ads.atmosphere.copernicus.eu (accessed 4 May 2021).

Supplemental files
The supplemental files for this article can be found in a composite file (docx).

Figure 2
Figure 2 presents the deseasonalized time series of the normalized mean biases based on the monthly climatologies shown in Figure S2. The importance of high-quality assimilation data is most evident for the year 2003, when the limited availability of such data has a negative impact on the validation results. The bias against ACE-FTS is stable over time, except for a slight drift after 2013 in the lower tropical stratosphere (but <1% over 5 years). The deseasonalized biases with respect to both MLS and ACE-FTS increase by approximately 2% over the last year (2018) in the polar middle stratosphere and in all three regions of the lower stratosphere. This feature continues in 2019 and is likely related to the switch from reprocessed to NRT MLS V4 data. Figure S3 gives a global overview of the agreement between the CAMS and MACC reanalyses and the observations by the limb-scanning instruments, averaged over the whole period. The correlation between the reanalysis and the observations is very good (above 0.8 for the pressure range 2-200 hPa, or the altitude range 10-45 km for ACE-FTS). The CAMS reanalysis setup does not include explicit modeling of stratospheric chemistry. The stratospheric O3 profile is constrained using the Cariolle parametrization (e.g., Cariolle and Teyssèdre, 2007). In practice, this leads to considerable biases in the stratospheric profile observed in the control run when comparing with

Figure 5. Time series of monthly MNMBs (%) from the comparison against In-service Aircraft for a Global Observing System O3 aircraft data in the free troposphere for the period 2003-2018. Top left: North America, top right: Europe, bottom left: East Asia, bottom right: India. The plots show averages over various airports in the free troposphere (350-850 hPa) for the CAMS reanalysis (red), control run (blue), and MACC reanalysis (green). MNMBs = modified normalized mean biases; CAMS = Copernicus Atmosphere Monitoring Service; MACC = Monitoring Atmospheric Composition and Climate; O3 = ozone. DOI: https://doi.org/10.1525/elementa.2020.00171.f5

Figure S23 displays observed time series of total column CO from MOPITT and IASI in comparison with the CAMS reanalysis and control run over different regions. Figures 12 and S24 show the resulting time series of MNMBs. The differences in total column CO between MOPITT and IASI are discussed by, for example, Illingworth et al. (2011) and George et al. (

Figure 11. Measurements of Pollution in the Troposphere (MOPITT) V7 CO total column (upper panel) as a function of latitude and time from January 2003 to December 2018. Relative biases between MOPITT V7 and the CAMS reanalysis (lower panel, left) and between MOPITT V7 and the control run (lower panel, right). CAMS = Copernicus Atmosphere Monitoring Service; CO = carbon monoxide. DOI: https://doi.org/10.1525/elementa.2020.00171.f11
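A latitude-time view of relative biases like the lower panels of Figure 11 can be built by zonally averaging model and observations on a common grid and comparing the zonal means. The sketch below is a minimal version under stated assumptions: the sign convention 100 × (model - obs)/obs and the function name are hypothetical, and it omits the application of MOPITT averaging kernels that a real comparison would normally require.

```python
import numpy as np


def zonal_relative_bias(model, obs):
    """Relative bias in percent of zonal means, 100 * (model - obs) / obs.

    model, obs: arrays shaped (time, lat, lon) on a common grid. Cells where
    either field is missing are dropped from both zonal means so that model
    and observations sample the same locations. Returns shape (time, lat).
    """
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    both = np.isfinite(m) & np.isfinite(o)
    m_zonal = np.nanmean(np.where(both, m, np.nan), axis=2)
    o_zonal = np.nanmean(np.where(both, o, np.nan), axis=2)
    return 100.0 * (m_zonal - o_zonal) / o_zonal


# Synthetic check (not MOPITT or CAMS data): a uniform 10 % overestimation,
# with one missing observation that must not bias the result.
obs = np.full((3, 4, 5), 2.0e18)   # e.g. total column CO in molecule cm-2
model = 1.1 * obs
obs[0, 0, 0] = np.nan
model[0, 0, 0] = 1.0e30            # would dominate the zonal mean if not masked
print(zonal_relative_bias(model, obs).round(6))
```

Masking both fields with the same validity mask keeps a missing observation from leaving an unmatched model value in the zonal mean, which would otherwise show up as a spurious bias at that latitude and time.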

Figure 12 .Figure 13 .
Figure 12. Time series of MNMBs (%) from the validation with Measurements of Pollution in the Troposphere v7 CO total columns over selected regions for the years from 2003 to 2018. First row left: the United States, first row right: Europe, second row left: East Asia, second row right: South Asia, third row left: North Africa, third row right: South Africa, last row left: Alaska fire region, last row right: Siberian fire region (CAMS reanalysis: red; control run: blue). MNMBs = modified normalized mean biases; CAMS = Copernicus Atmosphere Monitoring Service; CO = carbon monoxide. DOI: https://doi.org/10.1525/elementa.2020.00171.f12

Figure 14 .Figure 15 .
Figure 14. Time series of monthly MNMBs (%) from the comparison against In-service Aircraft for a Global Observing System CO aircraft data in the free troposphere for the period 2003-2018. Top left: North America, top right: Europe, bottom left: East Asia, bottom right: India. The plots show averages over various airports in the free troposphere (350-850 hPa; CAMS reanalysis: red, control run: blue, and MACC reanalysis: green). MNMBs = modified normalized mean biases; CAMS = Copernicus Atmosphere Monitoring Service; MACC = Monitoring Atmospheric Composition and Climate; CO = carbon monoxide. DOI: https://doi.org/10.1525/elementa.2020.00171.f14

Figure 16. Global map comparisons of satellite-retrieved and model-simulated seasonally averaged tropospheric NO2 columns (molecule cm⁻²). From top to bottom: DJF 2017/2018, MAM 2018, JJA 2018, and SON 2018. The difference between the CAMS reanalysis and GOME-2 is shown in the left column, GOME-2 in the middle column, and the CAMS reanalysis in the right column. GOME-2 data were gridded to model resolution (i.e., 0.75° × 0.75°). Model data were treated with the same reference sector subtraction approach as the satellite data. SON = September to November; DJF = December to February; MAM = March to May; JJA = June to August; NO2 = nitrogen dioxide; CAMS = Copernicus Atmosphere Monitoring Service; GOME = Global Ozone Monitoring Experiment. DOI: https://doi.org/10.1525/elementa.2020.00171.f16
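The Figure 16 caption notes that the model columns were treated with the same reference sector subtraction as the satellite data. In that kind of approach, the column over a remote reference sector, where tropospheric NO2 is assumed negligible, serves as the background for each latitude and is subtracted from every grid cell in that latitude row. The sketch below is a minimal version of that idea; the remote-Pacific sector bounds (180°W-150°W), the function name, and the synthetic field are hypothetical illustrations, not the settings used for the retrievals.

```python
import numpy as np


def reference_sector_correction(column, lon, sector=(-180.0, -150.0)):
    """Subtract, row by row, the mean column inside a reference longitude sector.

    column: 2-D array (lat, lon), e.g. NO2 columns in molecule cm-2.
    lon:    1-D longitudes in degrees east, in [-180, 180).
    sector: (west, east) bounds of the reference sector; the remote-Pacific
            default is a hypothetical choice for illustration only.
    """
    col = np.asarray(column, dtype=float)
    lon = np.asarray(lon, dtype=float)
    in_sector = (lon >= sector[0]) & (lon < sector[1])
    reference = np.nanmean(col[:, in_sector], axis=1, keepdims=True)
    return col - reference  # per-latitude reference broadcast over all longitudes


# Synthetic field on a 0.75 degree grid (not model or satellite data): a
# latitude-dependent offset plus one polluted patch in the third row.
lon = np.arange(-180.0, 180.0, 0.75)
offset = np.array([1.0e15, 0.8e15, 1.5e15])[:, None]
field = np.repeat(offset, lon.size, axis=1)
hotspot = (lon >= 0.0) & (lon < 15.0)
field[2, hotspot] += 5.0e15
tropospheric = reference_sector_correction(field, lon)
print(tropospheric[:, hotspot].max(axis=1))
```

Applying the identical subtraction to model and satellite fields keeps the comparison consistent. Because a reference value is removed, clean regions whose true column lies below the sector value can end up slightly negative, so negative regional averages are a possible outcome of this kind of processing.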

Figure 18. Modified normalized mean bias (MNMB) from the comparison of time series of tropospheric NO2 columns from SCIAMACHY and GOME-2 to model results. Results are derived from daily averages over a specific region. Negative daily averages of the retrievals have been flagged in the calculation of the MNMB only. NO2 = nitrogen dioxide; SCIAMACHY = Scanning Imaging Absorption Spectrometer for Atmospheric ChartographY; GOME = Global Ozone Monitoring Experiment. DOI: https://doi.org/10.1525/elementa.2020.00171.f18

Tropospheric HCHO columns have been validated with SCIAMACHY and GOME-2 satellite data (Figures 19-21; Figure S31). Global monthly mean map comparisons (see Figure 19 for an example of seasonal averages for 2018) with satellite retrievals from SCIAMACHY and GOME-2 show that the magnitude of oceanic and continental background values and the overall spatial distribution are well represented by the CAMS reanalysis. Compared to SCIAMACHY and GOME-2 satellite retrievals, there is an

Figure 19. Global map comparisons of satellite-retrieved and model-simulated seasonally averaged tropospheric HCHO columns (molecule cm⁻²). Satellite-retrieved values in the region of the South Atlantic anomaly are not valid and are therefore masked out (white boxes in all images except those that show model results only). HCHO = formaldehyde. DOI: https://doi.org/10.1525/elementa.2020.00171.f19

Figure 20. Comparison of time series of tropospheric HCHO columns from SCIAMACHY (up to April 2012) and GOME-2 (from April 2012 onward) with model results (HCHO columns: black, CAMS reanalysis: red, control: blue, and MACC reanalysis: green). The switch from SCIAMACHY to GOME-2 in April 2012 is indicated by the vertical black dashed lines. The regions differ from those used for NO2 shown in Figure 17 to better focus on HCHO hotspots: East Asia: 25-40°N, 110-125°E; Eastern United States: 30-40°N, 75-90°W; Northern Africa: 0-15°N, 15°W-25°E; and Indonesia: 5°S-5°N, 100-120°E. Negative satellite-retrieved values over the Eastern United States are due to a lack of data during the Northern Hemisphere winter months for this region. HCHO = formaldehyde; SCIAMACHY = Scanning Imaging Absorption Spectrometer for Atmospheric ChartographY; GOME = Global Ozone Monitoring Experiment; CAMS = Copernicus Atmosphere Monitoring Service; MACC = Monitoring Atmospheric Composition and Climate; NO2 = nitrogen dioxide. DOI: https://doi.org/10.1525/elementa.2020.00171.f20
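The regional time series in Figure 20 rest on averaging gridded columns over the boxes listed in the caption and, for the MNMB statistics, on how days with negative retrieval averages are handled. The sketch below encodes the caption's boxes (western and southern bounds as negative numbers) and computes a regional mean plus a daily MNMB that skips negative or missing retrieval averages. Assumptions to flag: the cos(latitude) area weighting and inclusive box edges are illustrative choices, the skip rule is taken from the Figure 18 caption for NO2 and is only assumed to carry over to HCHO, and all names and inputs are hypothetical.

```python
import numpy as np

# Region boxes from the Figure 20 caption as (lat_min, lat_max, lon_min, lon_max),
# in degrees north and degrees east (western longitudes are negative).
HCHO_REGIONS = {
    "East Asia": (25.0, 40.0, 110.0, 125.0),
    "Eastern United States": (30.0, 40.0, -90.0, -75.0),
    "Northern Africa": (0.0, 15.0, -15.0, 25.0),
    "Indonesia": (-5.0, 5.0, 100.0, 120.0),
}


def regional_mean(field, lat, lon, box):
    """Cos(latitude)-weighted mean of a (lat, lon) field inside a box, skipping NaNs."""
    lat_min, lat_max, lon_min, lon_max = box
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    rows = (lat >= lat_min) & (lat <= lat_max)
    cols = (lon >= lon_min) & (lon <= lon_max)
    sub = np.asarray(field, dtype=float)[np.ix_(rows, cols)]
    weights = np.broadcast_to(np.cos(np.deg2rad(lat[rows]))[:, None], sub.shape)
    ok = np.isfinite(sub)
    return float(np.sum(sub[ok] * weights[ok]) / np.sum(weights[ok]))


def mnmb_skip_negative(model_daily, retrieval_daily):
    """MNMB over paired daily regional means; days with a negative or missing
    retrieval average are skipped (for the MNMB only, not for plotting)."""
    f = np.asarray(model_daily, dtype=float)
    o = np.asarray(retrieval_daily, dtype=float)
    keep = np.isfinite(f) & np.isfinite(o) & (o >= 0.0)
    return 2.0 * np.mean((f[keep] - o[keep]) / (f[keep] + o[keep]))


# Usage on a synthetic, uniform 0.75 degree field (not satellite or model data).
lat = np.arange(-90.0, 90.75, 0.75)
lon = np.arange(-180.0, 180.0, 0.75)
field = np.full((lat.size, lon.size), 8.0e15)  # molecule cm-2
for name, box in HCHO_REGIONS.items():
    print(name, regional_mean(field, lat, lon, box))
print(mnmb_skip_negative([2.0, 1.5, 0.4], [1.8, 1.5, -0.2]))
```

Skipping negative retrieval days keeps the denominator f + o of the MNMB away from zero or negative values, where the statistic would become meaningless; the time series themselves can still show such days, as noted for the Eastern United States in winter.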
Table 5 lists all acronyms.

TABLE 5. (continued)
WOUDC: World Ozone and Ultraviolet Radiation Data Centre
page 22 of 31 Wagner et al: Evaluation of the Copernicus Atmosphere Monitoring Service reanalysis