Multi-model intercomparisons of air quality simulations for the KORUS-AQ campaign

The Korea-United States Air Quality (KORUS-AQ) field study was conducted during May–June 2016 to understand the factors controlling air quality in South Korea. Extensive aircraft and ground network observations from the campaign offer an opportunity to address issues in current air quality models and reduce model-observation disagreements. This study examines these issues using model evaluation against the KORUS-AQ observations and intercomparisons between models. Six regional and two global chemistry transport models using identical anthropogenic emissions participated in the model intercomparison study and were used to conduct air quality simulations focusing on ozone (O 3 ), aerosols, and their precursors for the campaign. Using the KORUSv5 emissions inventory, which has been updated from KORUSv1, the models successfully reproduced observed nitrogen oxides (NO x ) and volatile organic compounds mixing ratios in surface air, especially in the Seoul Metropolitan Area, but showed systematic low biases for carbon monoxide (CO), implying possible missing CO sources in the inventory in East Asia. Although the DC-8 aircraft-observed O 3 precursor mixing ratios were well captured by the models, simulated O 3 levels were lower than the observations in the free troposphere in part due to too low stratospheric O 3 influxes, especially in regional models. During the campaign, the synoptic meteorology played an important role in determining the observed variability of PM 2.5 (PM diameter ≤ 2.5 μm) concentrations in South Korea. The models successfully simulated the observed PM 2.5 variability with a significant contribution from inorganic sulfate-nitrate-ammonium aerosols, but failed to reproduce that of organic aerosols, causing a large inter-model variability.
From the model evaluation, we find that an ensemble of model results, incorporating individual models with differing strengths and weaknesses, performs better than most individual models at representing observed atmospheric compositions for the campaign. Ongoing model development and evaluation, in close collaboration with emissions inventory development, are needed to improve air quality forecasting.


Introduction
An international air quality field study, Korea-United States Air Quality (KORUS-AQ), which was jointly hosted by the Korean National Institute of Environmental Research (NIER) and U.S. National Aeronautics and Space Administration (NASA), occurred in South Korea during May-June 2016, to understand the factors controlling air quality across urban, rural, and coastal interfaces (Crawford et al., n.d.). During the campaign, extensive surface and aircraft observations of gas and aerosol species were conducted using various instruments (Crawford et al., n.d.). In addition, a number of 3-D chemistry transport models (CTMs) were used on a daily basis to produce up to 5-day air quality forecasts for planning aircraft observations in the peninsula and nearby oceans during the campaign.
Although the forecasts were valuable for identifying pollution plumes and other features targeted for observational sampling, the forecasts sometimes differed between models and failed to capture the observed magnitudes of aerosols, ozone (O 3 ) and their precursors. A number of explanations are possible for the differences between models as well as between models and observations, including differences in the emissions used (or from reality), meteorology, representation of chemistry and aerosol formation, and other processes affecting atmospheric composition. In this study, we will address these issues using model evaluation against extensive surface and aircraft observations from the campaign as well as intercomparisons between models.
Model intercomparison studies of regional air quality models have been extensively performed in East Asia since Carmichael et al. (2001) initiated the Model Intercomparison Study of long-range transport and sulfur deposition in East Asia (MICS-Asia). The first intercomparison study mainly focused on regional model capabilities in simulating source-receptor (S-R) relationships for sulfur deposition in East Asia. MICS-Asia II was further expanded by including nitrogen compounds, O 3 , and aerosols as key species for regional acid deposition with the use of observations from the Acid Deposition Monitoring Network in East Asia (EANET) to evaluate participating models (Carmichael and Ueda, 2008). Recently, MICS-Asia III updated the intercomparison of regional simulations for O 3 , aerosols, and their precursors in East Asia (Chen et al., 2019; Li et al., 2019) with mechanistic analyses to explain the diversity of model performances in reproducing observed species concentrations as well as deposition fluxes in East Asia (Itahashi et al., 2020; Tan et al., 2020; Tao et al., 2020).
The MICS-Asia initiative has provided a number of important scientific findings that enhance our understanding of regional and local air pollution problems in East Asia, including S-R relationships of key species such as O 3 , aerosols, and their precursors. Scientific findings and quantitative analyses of models' capabilities in reproducing observations from the MICS-Asia initiative have been critical to improving our ability to predict air quality by contributing to advances in models (Carmichael and Ueda, 2008). However, these previous studies focused on the evaluation of models using observations mostly from a surface network (EANET). The lack of observations in the free troposphere leaves a gap in model evaluation and intercomparison. Our intercomparison study focuses on model evaluation using extensive aircraft observations of major air pollutants and their precursors to address the formation of inorganic aerosols by gas/particle phase partitioning, chemical formation of organic aerosols, and several other issues raised from the campaign.
During the KORUS-AQ campaign, observations of major air pollutants in South Korea, including tropospheric O 3 , fine particulate matter (PM), and their precursors, showed several high episodes exceeding their air quality standards (NIER and NASA, 2017). For example, maximum 8-h average O 3 mixing ratios in surface air frequently exceeded the Korean 8-h air quality standard of 60 ppbv during the campaign (Peterson et al., 2019). High O 3 levels were observed in airborne and O 3 sonde observations, showing campaign average values of 75.8 ppbv at 700 hPa (Miyazaki et al., 2019). These high O 3 levels were accompanied by high precursor mixing ratios, especially aromatic volatile organic compounds (VOCs), which accounted for a large portion of total OH reactivity in the Seoul Metropolitan Area (SMA; Simpson et al., 2020). PM 2.5 (PM diameter ≤ 2.5 μm) concentrations in surface air are typically high in South Korea in winter (Jeong and Park, 2017). Therefore, high PM 2.5 episodes and their driving factors were not the prime interest of the campaign. However, daily values at surface sites reached 90 μg/m 3 during May 26-31, when frontal passages were associated with foreign inflow of pollutants to the Korean peninsula (Peterson et al., 2019). The exceedance of the 24-h national standard of 35 μg/m 3 occurred not only during the transport period but also during several other episodes under stagnant conditions during the campaign.
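As a rough illustration of how an exceedance of the 8-h O 3 standard mentioned above can be checked, the following Python sketch computes a maximum daily 8-h average from hourly mixing ratios. The function names and the choice to use only 8-h windows that fall entirely within one day are assumptions made for illustration; regulatory definitions also include data-completeness rules that are not reproduced here.

```python
KOREAN_8H_STANDARD_PPBV = 60.0  # 8-h O3 standard quoted in the text


def mda8(hourly_ppbv):
    """Maximum 8-h running mean of one day's hourly O3 mixing ratios (ppbv).

    Assumption: only windows lying entirely inside the given day are used.
    """
    if len(hourly_ppbv) < 8:
        raise ValueError("need at least 8 hourly values")
    windows = [
        sum(hourly_ppbv[i:i + 8]) / 8.0
        for i in range(len(hourly_ppbv) - 7)
    ]
    return max(windows)


def exceeds_standard(hourly_ppbv, standard=KOREAN_8H_STANDARD_PPBV):
    """True when the day's maximum 8-h average is above the standard."""
    return mda8(hourly_ppbv) > standard


if __name__ == "__main__":
    # Synthetic day: 8 polluted afternoon hours embedded in a cleaner background.
    day = [30.0] * 10 + [80.0] * 8 + [30.0] * 6
    print(mda8(day), exceeds_standard(day))
```

The synthetic day is made up for the example and does not represent campaign data.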
Observed aerosol chemical composition averaged at six ground sites (Bangnyung, Bulkwang, Olympic park, Gwangju, Ulsan, and Jeju) showed that inorganic salts and carbonaceous aerosols contributed 53% and 47% of PM 2.5 , respectively. Similar chemical compositions of PM were also shown in airborne observations over the SMA within the boundary layer, in which 55% and 43% of PM 1 (PM diameter ≤ 1 μm) were composed of inorganic and carbonaceous aerosols, respectively. Organic aerosols were the main component of carbonaceous aerosols with a minor contribution of black carbon aerosols (<8%; Jordan et al., 2020).
Besides the abovementioned findings, there are many additional post-mission analyses that help improve our understanding of the various factors contributing to air quality in South Korea. With the basic understanding of the campaign provided by observations, we here use an updated KORUS emissions inventory from Woo et al. (n.d.) as an input to several air quality models to simulate gas and aerosol species and further evaluate the emissions, chemistry, and physical processes that affect model performance. The model evaluation and multi-model intercomparison give us an opportunity to better understand the differences among models, which can be used as a stepping stone to improve our scientific understanding on the issues associated with air quality simulations in the Korean peninsula and broadly in East Asia.

Art. 9(1) page 2 of 29 Park et al: Multi-model inter-comparisons for the KORUS-AQ campaign

Method
Six regional and two global CTMs participated in this MICS ( (Figure 1), available data coverage differed. GEOS-Chem used a nested domain with 0.25° × 0.3125° (latitude × longitude) spatial resolution for East Asia with boundary conditions from a 2° × 2.5° global simulation. CAM-Chem was used to conduct a global simulation with 0.47° × 0.63° spatial resolution. Also note that the Iowa and UCLA WRF-Chem used the smallest domain with the finest horizontal resolution (4 km), which is nested within an outer domain with a 20 km horizontal resolution covering East Asia. CMAQ also used a nested simulation (9 km) with an outer domain of 27 km resolution covering East Asia. Hourly mean simulated gas and aerosol concentrations were used for the multi-model intercomparisons and comparisons with observations at surface sites.

Model description
Tables 1 and 2 summarize the participating models with brief descriptions of their driving meteorology, boundary conditions, wildfire emissions inventories, biogenic emissions schemes, and grid resolutions of the simulations. We allowed all models to choose their own options for meteorological fields and natural emissions. Therefore, despite the use of the same anthropogenic emissions, a difference in total emissions exists between models, which is discussed in Section 2.2. Table 3 summarizes gas-phase chemistry mechanisms, aerosol thermodynamics/microphysics modules, and secondary organic aerosol (SOA) schemes used in the models. As shown in the table, substantial differences in terms of the number of chemical species and complexity of chemical schemes exist between models. For example, the GEOS-Chem Tropchem mechanism has the largest number of kinetic reactions (713) and chemical species (220), while other schemes typically have 100-300 reactions, and some hydrocarbon mechanisms, such as the one used in Iowa WRF-Chem, were even more simplified for computational efficiency. Although complex gas-phase chemistry is embedded within GEOS-Chem, the Tropchem mechanism, along with SAPRC99 used in CAMx, lacks detailed aromatic chemistry compared to other hydrocarbon mechanisms such as the MOZART and RACM-based schemes used in CAM-Chem and WRF-Chem (Carter, 2000; Ahmadov et al., 2012; Knote et al., 2014; Emmons et al., 2020). As for the inorganic aerosol thermodynamics calculation, three different modules were used: ISORROPIA, Multicomponent Equilibrium Solver for Aerosols (MESA), and Model for an Aerosol Reacting System (MARS). ISORROPIA computes gas-particle partitioning of nitric acid and ammonia (Fountoukis and Nenes, 2007) and MESA solves the solid-liquid equilibria within each sectional aerosol size bin (Zaveri et al., 2008). MARS calculates the thermodynamic equilibrium of sulfate, nitrate, and ammonium (Grell et al., 2005).
For aerosol microphysics simulations, most models except for GEOS-Chem used either a sectional or modal approach. WRF-Chem used either the Model for Simulating Aerosol Interactions and Chemistry (Zaveri et al., 2008) or Modal Aerosol Dynamics Model for Europe (Ackermann et al., 1998). CAMx employed the Coarse/Fine scheme with two static modes. CAM-Chem and CMAQ used the Modal Aerosol Model with 4 modes (Liu et al., 2016) and AERO5 (Foley et al., 2010), respectively, and GEOS-Chem used a bulk aerosol scheme.
Three types of SOA schemes were used in the models: a simplified scheme developed by Hodzic and Jimenez (2011), the 2-product approach (Odum et al., 1996), and the volatility basis set (VBS) approach (Donahue et al., 2006; Stanier et al., 2008). In the atmosphere, SOA can be formed by the oxidation of parent hydrocarbons, creating oxygenated semi-volatile compounds that either partition to the particle phase or undergo continuous oxidation. However, due to the complexity of various parent hydrocarbons, the simplified SOA scheme assumes that a lumped SOA precursor is emitted from the same sources as CO, and it is converted irreversibly to SOA with a fixed lifetime (approximately 1 day; Cubison et al., 2011; Hodzic and Jimenez, 2011; Hayes et al., 2015; Kim et al., 2015; Shrivastava et al., 2017). The 2-product approach, which was developed by Odum et al. (1996), requires speciated parent hydrocarbons that undergo oxidation and produce two SOA surrogates that partition (gas to particle) using yields from chamber experiments. Finally, the VBS approach (Donahue et al., 2006; Stanier et al., 2008) is the most complex scheme, which divides the semi-volatile oxidation products into several bins according to their volatilities to represent the continuous oxidation process of SOA.

Table note: The third column indicates the total number of vertical levels and the number of levels below 1.5 km inside parentheses.

Table 3. Gas-phase chemistry mechanisms, aerosol thermodynamics/microphysics modules, and SOA schemes used in the models.
Model | Gas-phase chemistry | Aerosol thermodynamics + microphysics | SOA precursors + SOA scheme
GEOS-Chem | Tropchem (Mao et al., 2012; Fisher et al., 2016; Marais et al., 2016) | ISORROPIA II (Fountoukis and Nenes, 2007) + Bulk scheme | Anthropogenic and biomass burning precursors scaled by CO emission + biogenic precursors scaled by isoprene, monoterpene emissions + simplified SOA with no VBS
CAM-Chem (NCAR) | MOZART-T1 | N/A + MAM4 (Liu et al., 2016) | Aromatics, isoprene, terpenes, S/IVOCs + MAM4 with 5-bin VBS
CAMx (Ajou) | SAPRC99 (Carter, 2000) | ISORROPIA I (Nenes et al., 1998) + CF scheme | Aromatics, isoprene, terpenes + the original Secondary Organic Aerosol Partitioning (SOAP) (Strader et al., 1999)
WRF-Chem (NCAR) | MOZART-4 + updates from Knote et al. (2014) | MESA + MOSAIC (Zaveri et al., 2008) | Anthropogenic and biomass burning precursors scaled by CO emission + simplified SOA with no VBS + 2-product approach for biogenic SOA (Shrivastava et al., 2011)
WRF-Chem (PNU) | RACM-ESRL | MARS (Grell et al., 2005) + MADE (Ackermann et al., 1998) | Alkanes, alkenes, aromatics, cresol, isoprene, terpenes
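To make the simplified SOA scheme described in this subsection more concrete, the sketch below treats the lumped precursor as decaying irreversibly to SOA by first-order kinetics with a fixed e-folding lifetime of about 1 day. This is only an idealized box calculation under stated assumptions: the function names and the `scale_factor` argument are hypothetical, and transport, deposition, and continuing emissions, which the participating models do handle, are ignored.

```python
import math


def lumped_precursor_emission(co_emission, scale_factor):
    """Lumped SOA precursor emitted from the same sources as CO.

    `scale_factor` (precursor emitted per unit CO) is a hypothetical
    parameter; the text does not give a value.
    """
    return co_emission * scale_factor


def simplified_soa(precursor_0, hours, lifetime_hours=24.0):
    """Irreversible first-order conversion of precursor to SOA.

    Returns (remaining precursor, SOA formed) after `hours`, using the
    approximately 1-day lifetime quoted in the text as the default.
    """
    remaining = precursor_0 * math.exp(-hours / lifetime_hours)
    return remaining, precursor_0 - remaining


if __name__ == "__main__":
    precursor = lumped_precursor_emission(co_emission=100.0, scale_factor=0.05)
    for t in (6, 24, 72):
        left, soa = simplified_soa(precursor, t)
        print(t, round(left, 3), round(soa, 3))
```

After one lifetime, a fraction 1 - 1/e (about 63%) of the emitted precursor has been converted, which is why such a scheme produces SOA downwind of CO sources without tracking individual parent hydrocarbons.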
In this intercomparison, we do not intend to judge individual schemes used in models or whether one scheme performs better than others. Instead, we try to understand the issues present with current schemes and configurations used in air quality simulations and gain some insights to improve them based on the findings.

Emissions
All models used an identical anthropogenic emissions inventory (KORUSv5), which was developed by Konkuk University for the campaign. Details on the inventory can be found elsewhere (Woo et al., n.d.). The inventory includes area, point, mobile, and ship emissions of species such as carbon monoxide (CO), nitrogen oxides (NO x = NO + NO 2 ), ammonia (NH 3 ), sulfur dioxide (SO 2 ), and VOCs. Annual total emissions of individual species from the KORUS version 1 (v1) and KORUS version 5 (v5) inventories are summarized for East China and South Korea in Table 4. There are significant changes in the emissions of a few species from v1 to v5 inventories. For example, the anthropogenic CO and NO x emissions in South Korea were increased by factors of 2.5 and 1.4, respectively, but they were decreased by about 25% in East China. A more dramatic decrease of up to 70% is shown for SO 2 in East China. On the other hand, anthropogenic VOCs emissions were generally increased in both East China and South Korea from v1 to v5, especially for aromatic species, which were shown to be higher by factors of 1.5-2.3 in the v5 inventory compared to those of v1. One thing to note is that participating models used their own emission processors for chemical speciation and for imposing diurnal variation of species emissions, which could result in differences in air quality simulations between the models despite the use of the same emissions inventory (Goldberg et al., 2019).
After the campaign, various modeling analyses using 3-D CTMs and 0-D box models, combined with remote sensing and in situ measurements, were applied to constrain the anthropogenic emissions in South Korea. Miyazaki et al. (2019) and Goldberg et al. (2019) used a top-down method accompanying satellite retrievals and CTMs, and Oak et al. (2019) conducted model evaluation against airborne data to examine bottom-up emission estimates for South Korea. These studies suggested 40-50% and 83% increases in NO x and CO emissions in South Korea, respectively. Updates in KORUSv5 emissions were in part based on these studies. All the rationales and supporting information for the updates of the KORUS inventory are available in Woo et al. (n.d.).
Participating models used either Model of Emissions of Gases and Aerosols from Nature (MEGAN) versions 2.04 or 2.1 (Guenther et al., 2012) for biogenic emissions of isoprene, terpenes, and other VOCs. However, each model used its own vegetation map and meteorology to determine the emissions, so there were differences in the resulting isoprene emissions among the models. Different biomass burning emissions inventories were also used in the participating models. Wildfire emissions were generally much lower than anthropogenic emissions over South Korea during the campaign (Tang et al., 2019), but there existed a short period (May 18) when the influence of Siberian fires reached the Korean peninsula (Peterson et al., 2019). Among the three different biomass burning inventories (GFED4, FINNv1.5, QFEDv2.4) used in the models, GFED4 had the largest CO emissions in South Korea (0.187, 0.039, and 0.068 Tg/yr for GFED4, FINNv1.5, and QFEDv2.4, respectively). CO emissions from each inventory in eastern China and eastern Russia ranged from 0.8 to 1.6 Tg/yr and 33 to 74 Tg/yr, with GFED4 also being the largest and FINNv1.5 being the smallest in both regions. Emissions of primary organic aerosols (POA) and black carbon (BC) were also smallest in FINNv1.5, by factors of 2-10 for POA and 3-12 for BC, compared to those of GFED4 and QFEDv2.4 in the eastern Eurasia domain. POA and BC emissions from QFEDv2.4 were especially high compared to other inventories in eastern Russia. However, this difference had a minimal influence on the intermodel variability of PM simulations in South Korea among the participating models because of limited transboundary transport from Siberia into the peninsula.

Note: Individual emissions for NO and NO 2 are provided in the inventories and the sum is used to represent NO x .

Observations
We used observations from surface sites and the DC-8 aircraft during the campaign to evaluate the air quality simulations. Figure 2 shows site locations of the AirKorea network managed by NIER, affiliated with the Korean Ministry of Environment, which conducts regular measurements of six regulatory air pollutants including CO, O 3 , NO 2 , SO 2 , PM 2.5 , and PM 10 (PM diameter ≤ 10 μm). This monitoring network provides hourly volume mixing ratios of gaseous pollutants and PM mass concentrations across the country, particularly focusing on urban air quality to which a large population is exposed. Therefore, most sites are situated in or around the major cities. However, because the AirKorea observations do not provide information on the chemical composition of PM, we used submicron particulate species (PM 1 ) measurements conducted by the Korea Institute of Science and Technology (KIST), located in Seoul. Chemical compositions of non-refractory PM 1 including sulfate (SO 4 2-), nitrate (NO 3 -), ammonium (NH 4 +), and organic aerosol (OA) were observed and analyzed using a high-resolution time-of-flight aerosol mass spectrometer (HR-ToF-AMS; DeCarlo et al., 2006) and the positive matrix factorization (PMF) method by Kim et al. (2018). BC mass concentrations were also measured using a multi-angle absorption photometer (Petzold and Schönlinner, 2004) by KIST.
During the KORUS-AQ campaign, 20 research flights (RFs) of the NASA DC-8 aircraft were performed in the daytime (08-16 local standard time [LST]) and conducted extensive measurements of ambient concentrations of trace gases and aerosols over South Korea with high spatial and temporal resolutions (NIER and NASA, 2017; Crawford et al., n.d.). In order to focus on general characteristics during the campaign, we excluded observations from two flights (RF7 and RF18) that were designed for frontal cloud profiling and point source surveying. We also excluded observations near Daesan (Figure 2) from all flights to avoid influences from large power plant point sources (36.4-37.15°N, 126-126.88°E) in the comparisons below. Here we used 60-s averaged airborne observations of gaseous species including CO, O 3 , NO x , HNO 3 , H 2 O 2 , PAN, ANs (= peroxy + alkyl nitrates), VOCs, and HCHO for model evaluation. Table 5 summarizes instruments used for the measurement of each species.
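The screening described above (dropping RF7 and RF18 and removing samples inside the Daesan box) could be expressed along the following lines. This is a minimal sketch: the `Sample` record and function names are hypothetical, and the box edges are simply the coordinates quoted in the text, treated as inclusive.

```python
from dataclasses import dataclass

EXCLUDED_FLIGHTS = {7, 18}  # RF7 and RF18, per the text
# (lat_min, lat_max, lon_min, lon_max) of the Daesan exclusion box, in degrees
DAESAN_BOX = (36.4, 37.15, 126.0, 126.88)


@dataclass
class Sample:
    """One 60-s averaged airborne observation (hypothetical record layout)."""
    flight: int
    lat: float
    lon: float
    value: float


def in_box(lat, lon, box):
    """True when the point lies inside the inclusive lat/lon box."""
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def screen(samples):
    """Keep samples from retained flights that lie outside the Daesan box."""
    return [
        s for s in samples
        if s.flight not in EXCLUDED_FLIGHTS
        and not in_box(s.lat, s.lon, DAESAN_BOX)
    ]


if __name__ == "__main__":
    data = [
        Sample(flight=5, lat=37.5, lon=127.0, value=210.0),   # kept
        Sample(flight=7, lat=37.5, lon=127.0, value=190.0),   # excluded flight
        Sample(flight=9, lat=36.9, lon=126.4, value=400.0),   # inside Daesan box
    ]
    print([s.value for s in screen(data)])
```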
For airborne observations of non-refractory PM 1 , we used 60-s averaged HR-ToF-AMS (DeCarlo et al., 2006) observations of particulate phase SO 4 2-, NO 3 -, NH 4 +, and OA conducted by University of Colorado at Boulder. Detailed descriptions of AMS measurement methods and PMF analysis used to determine the aerosol composition can be found in Nault et al. (2018). BC mass concentrations within the size range of 100-500 nm were measured using the Humidified Dual-Single Particle Soot Photometer (HD-SP2) by the NOAA Chemical Sciences Laboratory.

Model evaluation
We here conduct model evaluations by using various observations from the surface network and DC-8 aircraft during the KORUS-AQ campaign. This evaluation allows us to better understand issues with models used for the air quality simulations. Relative contributions to air quality in South Korea by local and transboundary pollution influences were predominantly determined by synoptic meteorology (Peterson et al., 2019). The campaign period can be grouped into four distinct synoptic regimes: dynamic weather (May 1-16), stagnant period (May 17-22), transport period (May 24-31), and blocking Figure 3, all the simulated results with different spatial resolutions were first regridded to 0.5° resolution, then averaged to obtain a single 2-D gridded data set to represent the ensemble model.
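The regrid-then-average step described above can be sketched as follows. The text does not state which regridding method was used, so simple box averaging onto 0.5° cells is an assumption here, as are the function names and the representation of each model as a list of (latitude, longitude, value) grid-cell centres. Box averaging only makes sense for models whose native grid is finer than the target grid; coarser models would need interpolation instead.

```python
import math
from collections import defaultdict


def regrid_to_boxes(cells, res=0.5):
    """Average (lat, lon, value) cells into res-degree boxes.

    Returns {(lat_index, lon_index): mean value}; a box index i covers
    [i * res, (i + 1) * res).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in cells:
        key = (math.floor(lat / res), math.floor(lon / res))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def ensemble_mean(regridded_models):
    """Arithmetic mean across models for every box covered by any model."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for grid in regridded_models:
        for key, value in grid.items():
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    model_a = [(37.1, 127.1, 40.0), (37.3, 127.3, 60.0)]  # two fine cells, one box
    model_b = [(37.2, 127.2, 30.0)]                        # one cell, same box
    grids = [regrid_to_boxes(m) for m in (model_a, model_b)]
    print(ensemble_mean(grids))
```

Averaging after regridding gives each model equal weight in every box regardless of its native resolution, which is the property the single 2-D ensemble field relies on.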
For comparisons with the surface and aircraft observations in Figures 4 and 7-11, we used individual model results with different spatial resolutions by sampling simulated values from the closest grid box to the observed location. Model results were also sampled along the DC-8 flight track at the observed hour so that simulated values are coherent in time and space with observations. Here the ensemble model is the arithmetic mean of all models with no regridding or interpolation. For model evaluation, we use three statistical metrics: normalized mean bias (NMB), Pearson correlation coefficient (R), and root mean square error (RMSE), which are defined below.

$$
\mathrm{NMB} = \frac{\sum_{i=1}^{N} (M_i - O_i)}{\sum_{i=1}^{N} O_i} \times 100\%, \qquad
R = \frac{\sum_{i=1}^{N} (M_i - \overline{M})(O_i - \overline{O})}{\sqrt{\sum_{i=1}^{N} (M_i - \overline{M})^2}\,\sqrt{\sum_{i=1}^{N} (O_i - \overline{O})^2}}, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (M_i - O_i)^2}
\tag{1}
$$

where the overbar denotes the sample mean, and $M_i$ and $O_i$ are the $i$-th simulated and observed values of the $N$ paired samples.

The spatial distributions of observed O 3 mixing ratios are generally well captured by the model ensemble (R = 0.55), which clearly shows negative spatial correlations between simulated NO 2 and O 3 . The models tend to underestimate the observations in part owing to spatial resolutions that are too coarse to simulate concentrated plumes from urban areas where most sites are located. The simulated low biases appear to be smaller for secondary pollutants relative to the primary; this is especially evident for CO, which was known to be severely underestimated in CTMs in South Korea (Huang et al., 2018; Tang et al., 2019; Gaubert et al., 2020; Lee et al., 2020). This low bias in simulated CO (NMB = -47%) indicates that there are additional missing sources of CO in the emissions inventory even though the domestic anthropogenic CO emissions have increased by a factor of 2.5 in the KORUSv5 inventory relative to that of the KORUSv1 inventory. The model ensemble reproduced observed SO 2 mixing ratios with fair spatial correlation (R = 0.74), capturing high values from major power plants in the Daesan, Yeosu, and Ulsan areas, but in general showed an underestimation (NMB = -27%).
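The sampling and scoring procedure described in this section (nearest grid box, then NMB, R, and RMSE) can be illustrated with a short Python sketch. It uses the standard textbook forms of the three metrics and assumes a regular latitude/longitude grid; the function names and the tiny synthetic field are made up for the example and are not taken from any participating model.

```python
import math


def nearest_index(axis, value):
    """Index of the grid-cell centre on `axis` closest to `value`."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def sample_model(field, lats, lons, obs_lat, obs_lon):
    """Value of a 2-D field (indexed [lat][lon]) in the box nearest the observation."""
    return field[nearest_index(lats, obs_lat)][nearest_index(lons, obs_lon)]


def nmb(model, obs):
    """Normalized mean bias in percent."""
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / sum(obs)


def pearson_r(model, obs):
    """Pearson correlation coefficient of paired model and observed values."""
    n = len(obs)
    m_bar = sum(model) / n
    o_bar = sum(obs) / n
    cov = sum((m - m_bar) * (o - o_bar) for m, o in zip(model, obs))
    var_m = sum((m - m_bar) ** 2 for m in model)
    var_o = sum((o - o_bar) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


def rmse(model, obs):
    """Root mean square error, in the units of the inputs."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


if __name__ == "__main__":
    lats = [36.0, 36.5, 37.0]
    lons = [126.0, 126.5, 127.0]
    field = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(sample_model(field, lats, lons, 36.6, 126.9))
    model, obs = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(nmb(model, obs), pearson_r(model, obs), rmse(model, obs))
```

In the synthetic pair, the model is exactly half the observation everywhere, so the correlation is perfect even though the bias is large; this is why the three metrics are reported together.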
We can see large discrepancies in PM 10 concentrations between the model ensemble and the observations, indicating a large uncertainty in simulating primary dust aerosols in models (Huneeus et al., 2011; Jeong and Park, 2018). Figure 4 shows the time series of daytime O 3 and 24-h mean PM 2.5 concentrations averaged at AirKorea surface sites located in Seoul, Busan, Incheon, Gwangju, Yeosu, and Gangwon. The four different synoptic regimes are indicated in colored shadings. Simulated results from the participating models using KORUSv1 and KORUSv5 emissions are shown, although not all participating model results with the KORUSv1 emissions were available. From this comparison, a few notable points are found as follows: 19-24, due to underestimations of O 3 precursors (i.e., NO x , aromatics) and OA. The models are much closer to observations with the KORUSv5 emissions, showing smaller biases (-7% and -22%, respectively) between the ensemble model and observations, although intermodel variability is larger with the KORUSv5 emissions relative to that with the KORUSv1 emissions.

Evaluation for aircraft observations
The DC-8 collected observations on 20 flights during the campaign. Each flight was designed based on the air quality forecasts to examine characteristics of air pollutants either emitted domestically or transported from outside South Korea. We divided individual flight observations and model results into four different synoptic regimes as discussed earlier to separate local and foreign influences. For model evaluation of vertical profiles (Figures 7-9) using DC-8 observations, we used data collected within the SMA region (37-37.6°N, 126.6-127.7°E), and statistics for individual models are summarized in Table 6. A total of 55 missed approaches were repeatedly conducted over this region, which provided more representative information on the atmospheric conditions during the campaign. Figure 5 shows simulated O 3 mixing ratios from the model ensemble using KORUSv5 emissions for different synoptic regimes compared with DC-8 observations averaged below 1.5 km. DC-8 observations and the ensemble mean of the model results that were sampled along the DC-8 flight track were both regridded and interpolated to 0.25° horizontal resolution to avoid overlapping. Both observed and simulated O 3 mixing ratios showed highest values for the transport period and relatively low values during the dynamic weather when frequent clouds and precipitation suppressed photochemical production. We can also see relatively high observed O 3 levels during the stagnant and blocking periods throughout the peninsula. However, the ensemble model tends to underestimate the DC-8 observations by 24% and 7% during the two periods, although no significant biases were noted in the model evaluation against the surface observations above. We further discuss this issue in Section 3.2.1. Figure 6 compares simulated PM 1 concentrations from the model ensemble and DC-8 observations below 1.5 km averaged for each synoptic regime.
As was discussed for Figure 5, we can also see clear differences in PM 1 concentrations in South Korea depending on the synoptic meteorology such that the transport period showed the highest PM 1 concentrations, whereas the lowest PM 1 concentrations were shown in the dynamic weather period, suggesting that the meteorology plays an important role in determining air quality in South Korea. The ensemble model generally captures the spatial distribution of observed PM 1 during the dynamic (R = 0.47) and transport (R = 0.53) periods. We see a strong west-east gradient of PM 1 concentrations during the stagnant period, which is consistently shown in both DC-8 observations and the model ensemble. Predominant easterlies transported air pollutants to the Yellow Sea during this period. During the blocking period, we can see more concentrated PM 1 concentrations mostly around the SMA with high emissions from mobile and industrial point sources. The ensemble model tends to pick up the local emissions to some extent, but generally underestimates observations, which calls for a more in-depth analysis of the chemical components of PM, which is discussed in Section 3.2.4.

Here we selected a few important species including CO, O 3 , NO x , toluene, isoprene, and HCHO, which are essential for understanding O 3 chemistry in the peninsula. As discussed above, individual flight observations and coherently sampled model results were averaged for four periods based on the synoptic patterns during the campaign. CO is a primary pollutant and a good indicator for pollutant emissions from fossil and biofuel burning. In the surface evaluation, the ensemble model underestimated observed CO mixing ratios in South Korea (Figure 3). Most models show an underestimation in comparison to the DC-8 observations, especially in the boundary layer (<1.5 km) regardless of synoptic patterns, indicating that the current emissions inventory may have a missing source of CO in the peninsula.
This low bias in the boundary layer is largest in the transport period, when transboundary transport of Chinese emissions heavily influenced the peninsula, and the bias implies that the KORUSv5 emissions may also have some missing sources for CO in China. Gaubert et al. (2020) used MOPITT satellite retrievals to optimize emissions in the CAM-Chem model, starting with the KORUSv5 inventory, and suggested 33% and 80% increases in CO emissions for central and northern China, respectively.
Simulated ensemble CO mixing ratios in the free troposphere do not show any significant biases from the observations, but individual models show different signs in biases from the observed profile. We find that two WRF-Chem simulations by Iowa and UCLA, which used identical chemical boundary conditions generated from the MOZART-4 global model, show higher CO mixing ratios in the free troposphere relative to the observations. Considering the relatively long lifetime of CO, the large variability of simulated CO mixing ratios in the free troposphere could be related to lateral boundary conditions of CO, especially for regional models (Tang et al., 2007). CO profiles from the MOZART-4 boundary conditions were found to be higher than the boundary conditions used in other models by at least 40 ppbv in the free troposphere (Figure S1).
We also find that the planetary boundary layer (PBL) heights in the models could result in notable differences in simulated CO mixing ratios. Figure S2 compares the simulated versus the lidar-derived diurnally varying PBL heights at Seoul National University averaged for the campaign. Some models (CAM-Chem, CAMx, PNU WRF-Chem) overestimate daytime PBL heights relative to the lidar-derived value by 39-60% and simulate lower surface CO mixing ratios by approximately 20% compared to the models with low daytime PBL heights (NCAR WRF-Chem, Iowa WRF-Chem, UCLA WRF-Chem, CMAQ). In addition, biogenic and biomass burning sources and the chemical formation of CO, along with different loss rates by OH oxidation, could contribute to the variability of simulated CO mixing ratios among the models.

Tropospheric O 3 , especially in surface air, is an important air pollutant. In East Asia, seasonal O 3 levels are highest in spring (Li et al., 2007). During the campaign, we observed that synoptic meteorology had a distinct influence on O 3 variations in the boundary layer with relatively low O 3 levels during the dynamic weather and high O 3 levels during the stagnant and transport periods (Peterson et al., 2019). Despite this variation, the observed mean O 3 mixing ratios from the DC-8 are always higher than 60 ppbv, which is the 8-h average O 3 air quality criterion in South Korea. This is somewhat different from the surface observations in Figure 4, where the observed daytime O 3 mixing ratios in surface air often, but not always, exceeded 60 ppbv. We will discuss this disparity in Section 4. Figure 7 compares observed versus simulated O 3 profiles up to 4 km averaged for different synoptic regimes. We find that the model ensemble is systematically lower than the DC-8 observations throughout the whole troposphere.
The magnitudes of low biases in the model ensemble slightly vary with different periods but exist for the whole campaign period. We find that simulated O 3 profiles are also dependent on the lateral boundary conditions. The PNU and UCLA WRF-Chem simulations used the same chemical mechanism (RACM-ESRL) with similar model configurations, but they tend to be on opposite extremes of free tropospheric O 3 , driven by different lateral O 3 boundary conditions. O 3 boundary condition profiles used in the UCLA WRF-Chem were approximately 10 ppbv higher throughout the troposphere than in the PNU WRF-Chem (Figure S1). Also, as the NCAR WRF-Chem used CAM-Chem results for their boundary conditions, similar profiles are shown in the free troposphere between the two models. Figure 7 also shows comparisons of important O 3 precursors including NO x and VOCs in South Korea. Simpson et al. (2020) showed using VOCs observations during the KORUS-AQ campaign and their OH reactivity that isoprene and toluene are two of the most important VOCs in the SMA. The comparisons of the model ensemble with observed profiles of O 3 precursors, including NO x , toluene, and isoprene, during the KORUS-AQ campaign do not show any significant systematic biases (-10 to 12%) in the SMA (Figure 7, Table 6). However, a close investigation shows discrepancies between the models and the observations for the O 3 precursors. We will address this issue in detail in Section 4.
), nitrate (NO 3 - ), and ammonium (NH 4 + ) aerosols, for four different periods during the campaign. Chemical components of PM 1 were measured onboard the DC-8 using the AMS and HD-SP2 and used here for the model evaluation. We defined simulated and observed PM 1 as the sum of the above five chemical components. Natural and anthropogenic dust and sea salt aerosols were not considered in this work.
During the campaign, there were a few days with high soil dust concentrations transported from the Gobi Desert and nearby arid regions in China. However, not all models simulated natural soil dust aerosols, and therefore, this work does not include soil dust aerosols in the evaluation.
BC is one of the important PM components in East Asia and is mainly emitted from incomplete combustion (Bond et al., 2013). BC emissions from East Asia amount up to 30% of global anthropogenic BC emissions (Bond et al., 2004), with 91% originating from China (Zhang et al., 2009). Major sources of BC are fossil fuel and biofuel use in East Asia. Although the majority of Chinese BC emissions is from residential and industrial sectors, emissions from transportation are dominant in South Korea. In particular, diesel vehicle registrations in South Korea were 47.1% of total vehicles in 2014, and on-road diesel emissions accounted for roughly 60% of total BC emissions from transportation in 2015 (Anenberg et al., 2019). From KORUSv1 to KORUSv5, there was an 80% increase in domestic BC emissions and a 35% decrease in Chinese BC emissions.
We find that the model ensemble tends to overestimate observed BC concentrations, especially in the boundary layer for all four periods (18-44%). The discrepancy slightly varies with different synoptic patterns and is largest during the blocking period. The ensemble model using KORUSv1 emissions showed a 34% underestimation of BC in the boundary layer, which implies a possible overestimation of BC emissions in South Korea in the KORUSv5 inventory.
Simulated PBL heights also play an important role in simulated BC concentrations. In particular, the intermodel variability of BC concentrations in the boundary layer reflects that of PBL heights as shown in Figure S2. For example, the regional models simulating relatively high BC concentrations (NCAR WRF-Chem, Iowa WRF-Chem, UCLA WRF-Chem, CMAQ) show relatively lower PBL heights during the daytime compared to other models. Similarly, the models with high PBL heights, PNU WRF-Chem, CAM-Chem, and CAMx, simulate low BC concentrations. Although the variation of PBL heights among the models does not entirely explain the inter-model variability of simulated BC concentrations, the effect of PBL heights on BC is more noticeable than that on CO.
All models simulated both POA and SOA explicitly. The sum of POA and SOA concentrations was defined as the OA mass concentration, which was compared with the observed AMS OA collected by the DC-8. All models used the same anthropogenic POA emissions but different biomass burning emissions, which accounted for less than 20% of total South Korean POA emissions during the campaign. We find in the comparison that simulated total OA concentrations in most models fall within the observed 10-90 percentile range but with systematic low biases, which have been addressed in previous studies (Heald et al., 2005; Heald et al., 2011). Figure 8 reveals that the large OA variability among the models is mainly due to different treatments of SOA formation from its precursor species, which are summarized in Table 3.
Inorganic sulfate-nitrate-ammonium (SO 4 2- , NO 3 - , NH 4 + ) aerosols were the largest contributor to PM 2.5 concentrations during the campaign in South Korea. It appears that the model ensemble generally well reproduces the profiles of observed inorganic aerosols for the stagnant and transport periods but shows overestimation during the dynamic weather period and underestimation during the blocking period. The discrepancy for the dynamic period could be associated with the treatment of wet scavenging processes in models. During the blocking period, observed SO 4 2- is relatively well captured by the ensemble model (NMB = -8%), but NO 3 - and NH 4 + aerosols are underestimated by 38% and 26%, respectively. One thing to note is that the discrepancy between the model ensemble and the observations is much smaller than the inter-model variability.
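The normalized mean bias (NMB) values quoted here are not defined in this section. As a minimal sketch, assuming the conventional definition NMB = 100 x sum(model - obs) / sum(obs), the metric could be computed as follows. The function name and the sample values are hypothetical and are not campaign data.

```python
from typing import Sequence


def normalized_mean_bias(model: Sequence[float], obs: Sequence[float]) -> float:
    """Return NMB in percent, assuming NMB = 100 * sum(model - obs) / sum(obs)."""
    if len(model) != len(obs):
        raise ValueError("model and obs must have the same length")
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("sum of observations is zero; NMB is undefined")
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / total_obs


# Hypothetical paired concentrations (ug m-3), for illustration only.
obs_values = [4.0, 6.0, 10.0]
sim_values = [3.0, 5.0, 7.0]
nmb = normalized_mean_bias(sim_values, obs_values)  # negative value = low bias
```

A negative NMB indicates that the simulation is lower than the observations on average, which is the sense in which the low biases above are reported.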
During the dynamic weather period, inorganic components of PM 1 are overestimated and OA is underestimated by the ensemble model, resulting in a slight overestimation (1%) of observed PM 1 concentration in the boundary layer. The ensemble model captures the enhancement of observed DC-8 PM 1 concentrations during the transport period. From this, we can infer that the models were able to simulate the meteorological conditions that caused eastward transport of pollutants to the Korean peninsula.
Overall, the model performances for the stagnant and blocking periods show underestimations of PM 1 (-33% and -41%, respectively) below 1.5 km. During the stagnant period, underestimation of PM 1 is mostly due to low biases in simulated OA, whereas the underestimation of OA and inorganic aerosols both play a larger part during blocking. We also sampled AirKorea surface PM 2.5 observations from the closest points along the DC-8 flight tracks below 1 km so that they are coherent with the DC-8 observations in both time and space. Figure 8d shows that collocated AirKorea ground observations of PM 2.5 during the blocking period show a large difference from the DC-8 observations at 0.5 km, where the aircraft observation is much higher than that in surface air. This is likely caused by the mismatch between the aircraft flight tracks versus the PM 2.5 monitoring stations (Figure 3e) located in major industrial source regions in the SMA, which is further discussed in Section 3.3.
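The collocation of surface stations with the flight track described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the nearest station is chosen by great-circle distance and matched on the same hour, which may differ from the procedure actually used. The station coordinates are ground-site locations quoted elsewhere in this article, used here as stand-ins for AirKorea monitors, and all sample values are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return the name of the station closest to (lat, lon)."""
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))


def collocate(samples, stations, surface_pm25, max_alt_km=1.0):
    """Pair each aircraft sample below max_alt_km with the nearest station's same-hour value."""
    pairs = []
    for hour, lat, lon, alt_km, value in samples:
        if alt_km >= max_alt_km:
            continue  # keep only samples below the altitude cutoff
        key = (nearest_station(lat, lon, stations), hour)
        if key in surface_pm25:
            pairs.append((value, surface_pm25[key]))
    return pairs


# Stand-in station locations (degrees N, degrees E).
stations = {
    "olympic_park": (37.519, 127.122),
    "taehwa": (37.280, 127.227),
    "snu": (37.46, 126.95),
}
# Hypothetical aircraft samples: (hour, lat, lon, altitude in km, concentration).
samples = [
    (9, 37.50, 127.10, 0.5, 30.0),
    (9, 37.30, 127.20, 2.0, 12.0),
    (10, 37.45, 126.96, 0.3, 25.0),
]
# Hypothetical hourly surface values keyed by (station, hour).
surface_pm25 = {("olympic_park", 9): 22.0, ("snu", 10): 28.0}
pairs = collocate(samples, stations, surface_pm25)
```

Each element of `pairs` holds an aircraft value and the matched surface value, so the two data sets can be averaged over the same times and places.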

Evaluation for reactive nitrogen partitioning
Total reactive nitrogen (NO y ), defined as the sum of NO x and its oxidation products (NO y = NO x + HNO 3 + particulate nitrate + PAN + ANs + etc.), is also an important species in tropospheric chemistry. NO y partitioning between NO x and its reservoirs has an important implication for O 3 and aerosol formation (Zellweger et al., 2003). Models were able to reproduce observed NO y mixing ratios but failed to accurately simulate its partitioning among reactive nitrogen compounds. In this section, we compare the observed and simulated vertical profiles of NO y species partitioning for different synoptic conditions during the campaign. Figure 9 shows the simulated and observed profiles of each nitrogen compound ratio to NO y , including NO x , HNO 3 , particulate nitrate (pNO 3 ), PAN, and ANs. First, we find that the model ensemble overestimates the observed NO y mixing ratios in the boundary layer by 7% during the whole campaign (Figure S3) but well captures the gradual increase of observed NO y from May 1 to June 10 (not shown).
The observed NO x /NO y ratio is in the range of 0.1-0.8, which is largest at the surface and generally decreases with altitude. The models relatively well capture the observed ratio in the low troposphere but generally underestimate it in the free troposphere (>3 km), implying too rapid conversion of NO x to its oxidation products, particularly HNO 3 .
We find that the discrepancy between the models and the observations is quite large for the HNO 3 /NO y ratio. Most models tend to overestimate the observed HNO 3 fractions (0.1-0.3) by factors of 2-3. This high bias is in general large in the free troposphere, which was also found in the model evaluation against the observations from the TRACE-P and ACE-Asia campaigns in East Asia (Carmichael et al., 2003; Tang et al., 2004). Figure S3 shows that simulated HNO 3 mixing ratios show systematic high biases compared to HNO 3 measured using the California Institute of Technology Chemical Ionization Mass Spectrometer (CIT-CIMS) instrument, which is the measurement used in Figure 9. The CIT-CIMS measures gas phase HNO 3 through selective ion chemical ionization with an uncertainty of 40% (Crounse et al., 2006). In situ HNO 3 was also measured by the University of New Hampshire Soluble Acidic Gases and Aerosols (SAGA) instrument onboard the DC-8 (Dibb et al., 2003). The SAGA HNO 3 measurements are likely to have a considerable enhancement due to the contamination by submicron aerosol nitrate (McNaughton et al., 2009). When simulated HNO 3 is compared to the SAGA measurements, the NMB of the ensemble model is -24%, compared with 159% for the comparison with the CIT-CIMS HNO 3 .
A large fraction of observed NO y during KORUS-AQ is composed of organic nitrogen compounds (PAN and ANs). In particular, the observed PAN fraction reaches up to 0.5 in the free troposphere, indicating a large availability of oxidized VOCs. The model ensemble tends to underestimate the observed PAN fractions with large intermodel variability. We also find that the models in general underestimate the ANs fractions. The ensemble model underestimates ANs by 39%: simulated ANs make up only 3% of total NO y , whereas observed ANs contribute up to 6% of observed total NO y . In particular, models with relatively coarse resolutions showed large underestimations of ANs (Figure S3). The underestimation of the organic nitrogen fraction may imply the importance of reactive VOCs in NO y partitioning and NO x -HO x recycling. For example, more detailed aromatic chemistry mechanisms increased the conversion of NO x to organic reservoirs such as PAN and ANs through the oxidation of aromatic VOCs (Oak et al., 2019). As different VOC chemistry mechanisms are used in each model, large inter-model variabilities are noticeable in simulated organic nitrogen fractions.
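The partitioning ratios discussed in this section are each species divided by total NO y . A minimal sketch follows, assuming NO y is approximated by the sum of the five species shown in Figure 9 (the definition above also includes other minor terms). The function name and the mixing ratios are hypothetical.

```python
def noy_fractions(species_ppbv):
    """Return each species' fraction of NOy, with NOy taken as the sum of the inputs."""
    total = sum(species_ppbv.values())
    if total <= 0:
        raise ValueError("total NOy must be positive")
    return {name: value / total for name, value in species_ppbv.items()}


# Hypothetical mixing ratios (ppbv), for illustration only.
sample = {"NOx": 4.0, "HNO3": 2.0, "pNO3": 1.5, "PAN": 2.0, "ANs": 0.5}
fractions = noy_fractions(sample)
```

Applying the same function to observed and simulated mixing ratios at each altitude bin gives the paired fraction profiles that are compared here.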

Evaluation for aerosol compositions
Variations in the aerosol chemical composition during the campaign were distinct among different synoptic patterns, which were examined by previous studies to explain the elevations in PM loadings. Kim et al. (2018) compared the temporal variations of inorganic/organic aerosols and their precursor gases at the KIST ground site in Seoul. They found that the organic portion rapidly increased during May 20-24 under stagnant conditions with high temperature and strong radiation, and that high levels of VOCs made a favorable environment for secondary formation of OA. They also concluded that during the high loading episode (May 26-31), when inorganics dominated the relative composition, direct eastward transport from SO 2 source regions was combined with local secondary formation of nitrate. Jordan et al. (2020) also provided a detailed analysis of aerosol observations and revealed that the majority of fine PM (PM 2.5 or PM 1 ) in South Korea was composed of secondary aerosols including inorganic sulfate-nitrate-ammonium and OA. Information on PM chemical composition is important for developing an efficient air quality policy to reduce PM concentrations in South Korea. Here we evaluate the models focusing on this aspect by comparing the simulations against the AMS observations aboard the DC-8 and in surface air during the campaign. Figure 10 compares the simulated and observed PM 1 chemical compositions along the DC-8 flight tracks (Figure 6) below 1.5 km over the SMA. We here defined the hydrocarbon-like component of OA as POA, and the oxidized component as SOA, based on PMF analysis by Nault et al. (2018). The observed PM 1 mass concentrations vary with the meteorological regimes; as discussed earlier, the highest PM 1 concentration occurred in the transport period and the lowest concentration was observed in the dynamic weather period (Figure S4).
This observed variability is generally captured by the model ensemble, but it fails to reproduce high levels of PM concentrations in the blocking period, during which DC-8 observations were considerably affected by domestic emissions from major industrial complexes located in the southern part of the SMA (Figure S4). The observed SOA concentrations and toluene, one of the main parent hydrocarbons, show relatively high values in this area, where PM 1 concentrations are also high. Although these local sources are included in the present inventory, top-down estimations using HCHO show that toluene emissions in the KORUSv5 inventory are underestimated from these facilities (Fried et al., n.d.; Kwon et al., n.d.).
As to the chemical composition evaluation, we find a few issues as follows:
1. The models in general overestimate POA and underestimate SOA throughout the whole campaign (Figure 10).
2. Enhancements of the organic fraction during the stagnant and blocking periods are not captured by most models, especially for SOA.
3. Models that use the simplified SOA scheme (M1, M4, M6) that scales CO emissions to estimate SOA precursors generally underestimate local SOA production, which is in part due to CO underestimation.
4. Models that use the VBS scheme (M2, M5, M7) generally simulate higher SOA concentrations than those of other schemes, resulting in closer agreement with observations, but still underpredict SOA concentrations during the stagnant and blocking periods.
5. The models successfully reproduce increases of the inorganic fraction (especially sulfate) during the transport period.
6. When the models appear to be lower than the observations, it is mainly because of the OA underestimation.
We conducted a similar PM chemical composition evaluation using observations in surface air. Figure 11 compares the observed and simulated PM 1 chemical composition at the KIST ground site (Figure 2) in Seoul during the campaign. For OA speciation, we defined POA as the sum of hydrocarbon-like OA and cooking OA, and SOA as the sum of oxidized OA, according to Kim et al. (2018). The observations show a temporal variability similar to that of the aircraft observations, being higher in the transport period relative to other periods. Underestimation of SOA during the stagnant period, when local secondary production was likely to dominate OA concentrations, is shown in both the airborne (Figure 10b) and surface (Figure 11b) data. Nault et al. (2018) and Kim et al. (2018) showed that local emissions and chemistry during the campaign resulted in intense formation of SOA in Seoul. The influence of Siberian wildfires on May 18 also brought a mixture of aged smoke plumes with additional SOA precursors to Korea (Peterson et al., 2019). This implies that the current SOA schemes in the models tend to underestimate SOA formation from precursor emissions.
One notable difference from the DC-8 observations is that the PM 1 concentration observed at the surface in the blocking period is much lower than that observed by the DC-8. Figure S5 shows that similar temporal variations of surface PM at the KIST site are also observed at the Olympic park ground site, which is located in southeast Seoul. As shown in Figure S4, the KIST and Olympic park sites are located in the northern part of the SMA, where the influence of industrial emissions is relatively small. The models tend to overestimate the ground observations during the blocking period, mainly driven by the overestimation of inorganic aerosols. We find in particular that the nitrate overestimation is the main reason for the model overestimation of the surface observations. The lack of nitrate and ammonium aerosols in CAM-Chem results in an overall underestimation of aerosol mass in comparison to both ground and aircraft observations, indicating the importance of including nitrate aerosol formation in air quality simulations.

Discussion
As was discussed in the surface air evaluation (Figure 4b), the models showed capability in reproducing observed daytime O 3 levels during the campaign with a high correlation coefficient (R = 0.83) and no significant bias (NMB = 4%) between the model ensemble using the KORUSv5 emissions and the AirKorea observations. However, the model ensemble showed a systematic low bias in simulating DC-8 O 3 profiles throughout the whole troposphere regardless of the synoptic patterns. This underestimation was also shown at the lowest altitude, which seems inconsistent with the result of the surface air evaluation. For comparison, we plotted averaged AirKorea surface O 3 mixing ratios, which were coherently sampled with DC-8 observations below 1 km to examine the gradient of O 3 mixing ratios from the surface to the lowest DC-8 observations. As shown in Figure 7, a sharp O 3 gradient from the surface to the lowest DC-8 altitude is found in the observations and the model ensemble, although the simulated gradient is less than that of the observations. Figure S6 compares O 3 sonde measurements at Olympic park and Taehwa research forest, a downwind site located 30 km from Olympic park. O 3 profiles from sonde measurements show sharp declines (approximately 30-40%) from 2 km down to the surface, which were also reported in a case study during summer 2016 at Hangzhou, China, showing a 40% decrease within the boundary layer (Su et al., 2017). Observation-constrained box model simulations for the KORUS-AQ period show that O 3 formation, especially through the reaction between HO 2 and NO, decreases and the O 3 destruction rate increases at lower altitudes near the surface, driven by the concentrated NO and VOCs that react with O 3 , resulting in an overall decrease of net O 3 production (Figure S6c). As the box model only considered photochemical loss of O 3 , combined effects of wet or dry deposition of O 3 and its precursors could further lower surface O 3 levels.
Among the models, we found that CAM-Chem reproduced observed O 3 profiles relatively well with no significant low biases above the boundary layer, compared to other models (Table 6). In order to understand a possible cause for the low bias in the model ensemble, we used the CAM-Chem stratospheric O 3 tracer to estimate stratospheric O 3 contributions in the troposphere during the campaign (Figure S7). Stratospheric O 3 influx varies with different synoptic regimes and shows generally high contributions during stagnant and blocking periods associated with persistent high pressure systems. We also find significant stratospheric O 3 contributions even to the surface air (15-30 ppbv) from CAM-Chem simulations, which might be overestimated because this stratospheric tracer does not account for rapid titration by high NO x mixing ratios in surface air and dry deposition processes. Because the model top heights are approximately 20 km for the regional models, they did not simulate stratospheric chemistry, so the stratospheric influx of O 3 was determined by the boundary and initial conditions used. Therefore, this additional O 3 source may not be accurately represented in some models depending on the synoptic regimes, which show systematic low biases in simulated O 3 mixing ratios compared with the observations in spring, when frequent intrusions of stratospheric air into the troposphere occur. Nonetheless, Kim and Lee (2010) suggested that springtime O 3 maxima in Korea were more likely to be caused by photochemistry than stratospheric intrusions. We also found that the simulated stratospheric O 3 using a linearized first-order reaction scheme (Linoz) in GEOS-Chem showed similar values to that of CAM-Chem, but showed lower levels in the troposphere, indicating that additional O 3 sources were absent in the model. Schroeder et al. (2020) used an observation-constrained photochemical box model to quantify O 3 production sensitivities to various precursor VOCs in Korea.
Souri et al. (2020) used a box model to evaluate the HCHO/NO x column ratio obtained from a remote sensing instrument during KORUS-AQ, which can be used as a proxy for classifying O 3 production regimes. These studies emphasized the role of reactive VOCs, especially aromatics, in O 3 production in the major metropolitan areas in South Korea, which were clearly found to be VOC-limited. Therefore, differences in the aromatic chemistry schemes among models are expected to result in considerable differences in O 3 simulation. Although no systematic biases in the model ensemble were found for reactive VOCs, such as toluene and isoprene, as shown in Figure 7, the simulated NO x concentration below 1.5 km was higher by 11% relative to the observations during the campaign. NO x was overestimated by 19% during the stagnant and blocking periods, when local chemistry had a larger influence on O 3 production. Oak et al. (2019) investigated the O 3 production sensitivity to aromatic chemistry in Korea during KORUS-AQ using a 3-D CTM and found that aromatic VOCs oxidation drives more NO to NO 2 conversion, resulting in significant changes in O 3 photochemistry in the boundary layer during 13-16 LST. They found that including detailed aromatic chemistry in the model resulted in a decrease of simulated NO x mixing ratios by 20% while increasing O 3 by 13% below 1.5 km. Therefore, we expect that the biases below 1.5 km in the simulated NO x (11%) and O 3 (-18%) in the model ensemble can be reduced by including detailed aromatic chemical mechanisms (Knote et al., 2014; Porter et al., 2017). Among the models, CAM-Chem and NCAR WRF-Chem used the most complex aromatic chemistry scheme (56 kinetic reactions) from MOZART-T1 described in Knote et al. (2014) and showed smaller model biases (-15%) compared to the ensemble model (-18%) for O 3 below 1.5 km during the whole campaign.

Conclusions
An international air quality field study, KORUS-AQ, which was jointly hosted by the Korean NIER and the U.S. NASA, occurred during May-June 2016, to understand the factors controlling air quality in South Korea (Crawford et al., n.d.). Extensive aircraft and ground network observations in the peninsula and nearby oceans were available from the campaign for which a number of 3-D CTMs were also used on a daily basis to produce up to 5-day air quality forecasts for planning aircraft observations.
Although the forecasts were valuable for identifying pollution plumes and other features targeted for observational sampling, the forecasts sometimes differed between models and were not able to capture the observed magnitudes of aerosols, O 3 , and their precursors. This study addressed the issues associated with air quality simulations using model evaluation against extensive surface and aircraft observations from the campaign as well as intercomparisons between models.
Six regional and two global CTMs participated in the MICS and were used to conduct air quality simulations focusing on O 3 , aerosols, and their precursor species for the campaign using the KORUSv5 anthropogenic emissions inventory. The participating models chose their own options for emissions from biomass burning and biogenic sources.
Relative to the KORUSv1 inventory developed for the campaign, considerable changes were made in the latest KORUSv5 inventory. For example, the anthropogenic CO and NO x emissions in South Korea were increased by factors of 2.5 and 1.4, respectively, but they were decreased by about 25% in East China. A more dramatic decrease up to 70% was shown for SO 2 in East China. On the other hand, anthropogenic VOCs emissions were generally increased in both East China and South Korea from v1 to v5, especially for aromatic species, which were shown to be higher by factors of 1.5-2.3 in v5 inventory compared to those of v1.
Our model evaluation focusing on surface air simulations revealed that the models, using the KORUSv5 emissions inventory, successfully reproduced the observed spatial and temporal variabilities of both O 3 and PM 2.5 concentrations in surface air in South Korea. However, we found a significant low bias for simulated CO mixing ratios in the peninsula and nearby oceans, implying possible missing CO sources in the inventory in East Asia. Peterson et al. (2019) showed that synoptic meteorology played a critical role in determining characteristics of air quality in South Korea and broadly East Asia during the KORUS-AQ campaign. Following Peterson et al. (2019), we grouped DC-8 observations for four distinct synoptic regimes: dynamic weather (May 1-16), stagnant period (May 17-22), transport period (May 24-31), and blocking meteorology (June 1-10), during which we conducted the model evaluation by comparing the simulated versus observed profiles of species concentrations primarily in the SMA.
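The grouping of observations by synoptic regime can be expressed as a simple date lookup. The sketch below uses the date ranges exactly as listed above and the campaign year of 2016. The names `REGIMES` and `regime_for` are hypothetical, and dates outside the listed ranges (for example May 23) are left unassigned.

```python
from datetime import date

# Regime windows as listed in the text (inclusive start and end dates).
REGIMES = [
    ("dynamic", date(2016, 5, 1), date(2016, 5, 16)),
    ("stagnant", date(2016, 5, 17), date(2016, 5, 22)),
    ("transport", date(2016, 5, 24), date(2016, 5, 31)),
    ("blocking", date(2016, 6, 1), date(2016, 6, 10)),
]


def regime_for(day):
    """Return the regime name for a date, or None if the date is in no listed window."""
    for name, start, end in REGIMES:
        if start <= day <= end:
            return name
    return None


example = regime_for(date(2016, 5, 20))
```

Tagging each observation and its paired model sample with the same regime label keeps the period averages of models and observations consistent.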
Although the DC-8 aircraft observations of O 3 precursors were well captured by the models, simulated O 3 mixing ratios were lower than the observations in the troposphere regardless of synoptic regimes. The low O 3 bias in the models was in part owing to too low stratospheric O 3 influxes, especially in the middle and upper troposphere and in part owing to insufficient chemical production of O 3 because of a simple representation of the chemistry of aromatic VOCs in most models.
Unlike O 3 , the synoptic meteorology played an important role in determining the observed variability of PM concentrations in South Korea. Highest PM 1 concentrations observed onboard DC-8 occurred in South Korea during the transport period driven by the transboundary transport from China, whereas the lowest values were shown in the dynamic weather period. Despite the underestimation of PM 1 concentrations in the models, we found that the models generally reproduced the observed variability of the PM 1 mass concentrations depending on the synoptic regimes. Each chemical component comprising PM 1 in the models, however, showed some discrepancies from the observations. Observed chemical compositions of both PM 1 and PM 2.5 concentrations from the DC-8 and the surface network, respectively, showed a significant contribution by inorganic sulfate-nitrate-ammonium (SO 4 2- , NO 3 - , NH 4 + ) aerosols followed by carbonaceous aerosols including organic and BC aerosols. The large inorganic aerosol contribution was in general captured by the models. However, the ensemble of models tends to overestimate observed BC concentrations, especially in the boundary layer for all four synoptic periods (18-44%) with the KORUSv5 inventory, implying a possible overestimation of BC emissions in South Korea. We find in this study that the systematic low biases of simulated OA concentrations addressed in previous literature were also shown in the participating models with a smaller degree of discrepancy from the observations, but with large OA variability among the models mainly due to different treatments of SOA formation from its precursor species. Model deficiencies in simulating organic nitrogen compounds (e.g., ANs) in the boundary layer were also revealed in the model evaluation against DC-8 observations.
The underestimation of the organic nitrogen fraction out of total reactive nitrogen (NO y ) might imply the importance of reactive VOCs in NO y partitioning and O 3 formation through NO x -HO x recycling.
From the model evaluation, we found that an ensemble of model results, incorporating individual models with differing strengths and weaknesses, performs better than most individual models at representing observed atmospheric composition for the campaign. Ongoing model development and evaluation, in close collaboration with emissions inventory development, are needed to improve air quality forecasting.
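One way an ensemble can outperform its members is through compensating errors. The sketch below illustrates this with an unweighted multi-model mean, which is an assumption about how the ensemble is formed, and with two hypothetical models that have opposite biases. The values are invented for illustration, and such compensation is not guaranteed in general.

```python
from statistics import mean


def ensemble_mean(model_values):
    """Unweighted multi-model mean at each sample point."""
    lengths = {len(series) for series in model_values.values()}
    if len(lengths) != 1:
        raise ValueError("all models must provide the same number of samples")
    return [mean(column) for column in zip(*model_values.values())]


def mean_abs_error(sim, obs):
    """Mean absolute difference between simulated and observed values."""
    return mean(abs(s - o) for s, o in zip(sim, obs))


# Hypothetical observations and two models with opposite biases (ppbv).
obs = [50.0, 60.0, 70.0]
models = {
    "model_a": [40.0, 55.0, 60.0],  # biased low
    "model_b": [62.0, 66.0, 78.0],  # biased high
}
ens = ensemble_mean(models)
errors = {name: mean_abs_error(series, obs) for name, series in models.items()}
errors["ensemble"] = mean_abs_error(ens, obs)
```

In this constructed case the ensemble error is smaller than that of either member because the low and high biases partly cancel.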

Data accessibility statement
Observational data from KORUS-AQ used in this study can be downloaded in the International Consortium for Atmospheric Research on Transport and Transformation (ICARTT) format through the data archive website (https://www-air.larc.nasa.gov/cgi-bin/ArcView/korusaq). Hourly surface observations from the AirKorea network can be downloaded online through http://www.airkorea.or.kr/web. All data from the participating models for this study are also available through the KORUS-AQ data archive website in the ICARTT format for aircraft data and the netCDF format for hourly surface data.

Supplemental files
The supplemental files for this article can be found as follows:
Figure S1. Comparison of average vertical profiles of CO and O 3 mixing ratios from the lateral boundary conditions used in each model.
Figure S2. Comparison of mean diurnal profiles of observed (lidar) and simulated PBL height at Seoul National University (SNU; 37.46°N, 126.95°E) during the whole campaign period.
Figure S3. Comparison of simulated and observed mean vertical profiles of NO y and its components (HNO 3 , PAN, ANs) in the SMA (37-37.6°N, 126.6-127.7°E) during different synoptic conditions.
Figure S4. Model ensemble (background) and observed (overlaid circles) a) PM 1 , b) secondary organic aerosol (SOA) concentrations, c) toluene, and d) HCHO mixing ratios along the DC-8 flight track averaged below 1.5 km in the SMA during the blocking period (June 1-10).
Figure S5. Comparison of simulated and observed mean PM 2.5 chemical compositions in surface air at the Olympic park ground site (37.519°N, 127.122°E) for different synoptic regimes during the campaign.
Figure S6. Comparison of O 3 sonde observations at a) Olympic park, b) Taehwa research forest (37.280°N, 127.227°E), and c) 0-D photochemical box model simulations along the DC-8 flight track in the SMA below 2 km.
Figure S7. Simulated CAM-Chem (magenta) O 3 mixing ratios during KORUS-AQ with stratospheric O 3 (blue) contributions.