Validation report for the 2010 Air Quality Assessment Report
MACC-II Deliverable D_113.3: Validation report for the 2010 Air Quality Assessment Report

Date: 05/2013
Lead Beneficiary: INERIS (#17)
Nature: R
Dissemination level: PU
Grant agreement n°283576
File: MACCII_EVA_DEL_D_113.3_AQ2010-Valid_May2013_INERIS.docx/.pdf

Work-package: 113 (EVA, Assessment reports production and routine validation)
Deliverable: D_113.3
Title: Validation reports for 2010
Nature: R
Dissemination: PU
Lead Beneficiary: INERIS (#17)
Date: 05/2013
Status: Final version
Authors: L. Rouïl et al. (INERIS)
Approved by: V.-H. Peuch (ECMWF)
Contact: info@gmes-atmosphere.eu

This document has been produced in the context of the MACC-II project (Monitoring Atmospheric Composition and Climate - Interim Implementation). The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7 THEME [SPA.2011.1.5-02]) under grant agreement n° 283576. All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability. For the avoidance of all doubts, the European Commission has no liability in respect of this document, which merely represents the authors' view.

Evaluation Report of the Air quality assessments in Europe for 2010

Edited by Laurence ROUÏL (INERIS)

With contributions from the MACC regional modelling teams:
CERFACS: Emmanuele Emili
CNRS/LISA: Matthias Beekman, Gilles Foret
FMI: Mikhail Sofiev, Julius Vira
KNMI: Henk Eskes
INERIS: Frédérik Meleux, Anthony Ung
Météo France: Virginie Marécal
Met.no: Alvaro Aldebenito, Anna Carlin-Benedictow
RIUUK: Hendrik Elbern, Elmar Friese, Achim Strunk
SMHI: Lennart Robertson
TNO: Arjo Segers

April 2013

Table of contents
1. Rationale
2. Methodology
2.1 Observation datasets
2.2 Performance indicators
2.3 Models list
3. Ozone simulations and re-analyses
4. Nitrogen Dioxide simulations and re-analyses
5. PM10 simulations and re-analyses
6. PM2.5 simulations and re-analyses
7. Conclusions

List of figures
Figure 1. Location of the NO2 AIRBASE stations selected for data assimilation (red dots) and validation (green dots) processes
Figure 2. Location of the O3 AIRBASE stations selected for data assimilation (red dots) and validation (green dots) processes
Figure 3. Location of the PM10 AIRBASE stations selected for data assimilation (red dots) and validation (green dots) processes
Figure 4. Taylor diagram representing the performance of the MACC-II/EVA models to simulate the summer daily max of ozone (2010); data assimilated results correspond to the models noted with an “a” index
Figure 5. Taylor diagram representing the performance of the MACC-II/EVA models to simulate the summer daily mean of ozone (2010); data assimilated results correspond to the models noted with an “a” index
Figure 6. Statistical scores of the “assimilated ensemble” model results against the AIRBASE validation dataset for the ozone daily maximum over summer 2010: bias (a), correlation coefficient (b), root mean square error (c)
Figure 7. Statistical scores of the “raw ensemble” model results against the AIRBASE validation dataset for the ozone daily maximum over summer 2010: bias (a), correlation coefficient (b), root mean square error (c)
Figure 8. Ozone daily peaks simulated by the MACC-II/EVA models over summer 2010, for various station typologies: rural (top), suburban (middle), urban (bottom)
Figure 9. MACC regional model scores for predicting the daily ozone peak over the year 2010 across European sub-regions at rural stations: (a) bias, (b) RMSE, (c) correlation coefficient
Figure 10. MACC regional model scores for predicting the daily ozone peak over the year 2010 across European sub-regions at suburban stations: (a) bias, (b) RMSE, (c) correlation coefficient
Figure 11. MACC regional model scores for predicting the daily ozone peak over the year 2010 across European sub-regions at urban stations: (a) bias, (b) RMSE, (c) correlation coefficient
Figure 12. Number of days (observed in 2010) when the ozone information regulatory threshold (180 µg/m3 hourly) was exceeded. Classification by European sub-regions
Figure 13. Number of days (observed in 2010) when the ozone alert regulatory threshold (240 µg/m3 hourly) was exceeded. Classification by European sub-regions
Figure 14. Capacity of the “assimilated ensemble” model (ENSa) to reproduce the number of days when the ozone information threshold was exceeded
Figure 15. Contingency graphs for the prediction of exceedances of the information threshold in 2010 by the MACC-II/EVA models. Rural sites (top), suburban sites (middle) and urban sites (bottom)
Figure 16. Bias calculated for the NO2 daily mean in 2010 by the data assimilation systems LOTOS-EUROSa (left), SILAMa (right) and EURADa (bottom)
Figure 17. RMSE calculated for the NO2 daily mean in 2010 by the data assimilation systems LOTOS-EUROSa (left), SILAMa (right) and EURADa (bottom)
Figure 18. Correlation coefficient calculated for the NO2 daily mean in 2010 by the data assimilation systems LOTOS-EUROSa (left), SILAMa (right) and EURADa (bottom)
Figure 19. Bias of the MACC-II/EVA models in predicting the daily mean of NO2 concentrations in 2010 for various station typologies: rural (top), suburban (middle), urban (bottom)
Figure 20. RMSE of the MACC-II/EVA models in predicting the daily mean of NO2 concentrations in 2010 for various station typologies: rural (top), suburban (middle), urban (bottom)
Figure 21. Correlation coefficient of the MACC-II/EVA models in predicting the daily mean of NO2 concentrations in 2010 for various station typologies: rural (top), suburban (middle), urban (bottom)
Figure 22. Taylor diagram representing the performance of the MACC-II/EVA models to simulate the PM10 daily mean (2010); data assimilated results correspond to the models noted with an “a” index
Figure 23. Bias, RMSE and correlation coefficient of the data assimilated ensemble for the prediction of the PM10 annual average in 2010
Figure 24. Performance indicators of the MACC-II/EVA models, sub-region by sub-region, for the prediction of the PM10 daily mean, 2010, rural stations
Figure 25. Performance indicators of the MACC-II/EVA models, sub-region by sub-region, for the prediction of the PM10 daily mean, 2010, suburban stations
Figure 26. Performance indicators of the MACC-II/EVA models, sub-region by sub-region, for the prediction of the PM10 daily mean, 2010, urban stations
Figure 27. Various model scores to simulate the PM10 daily mean at urban sites, 2010: bias (top), RMSE (middle) and correlation coefficient (bottom)
Figure 28. Number of days of exceedance of the PM10 daily average threshold (50 µg/m3) observed in 2010, sorted by European sub-regions: EUW: Western, EUC: Central, EUN: Northern, EUS: Southern, EUE: Eastern
Figure 29. Number of days of exceedance of the PM10 daily average threshold (50 µg/m3) in 2010 predicted by the MACC-II/EVA data assimilated ensemble model, sorted by European sub-regions: EUW: Western, EUC: Central, EUN: Northern, EUS: Southern, EUE: Eastern
Figure 30. Number of days of exceedance of the PM10 daily average threshold (50 µg/m3) in 2008 predicted by the “data assimilated ensemble” model
Figure 31. Contingency indicators for the prediction of exceedances of the daily PM10 limit value at urban stations by the MACC-II/EVA models: good predictions, false alerts and non-detections for the year 2010
Figure 32. Bias between PM2.5 observed and modelled daily means for the year 2010 for the MACC-II/EVA models and for various station typologies: rural (top), suburban (middle), urban (bottom)
Figure 33. RMSE between PM2.5 observed and modelled daily means for the year 2010 for the MACC-II/EVA models and for various station typologies: rural (top), suburban (middle), urban (bottom)
Figure 34. Correlation coefficient between PM2.5 observed and modelled daily means for the year 2010 for the MACC-II/EVA models and for various station typologies: rural (top), suburban (middle), urban (bottom)

Glossary
AIRBASE: European Air Quality database (http://airclimate.eionet.europa.eu/databases/airbase/)
Analyses: Maps of air pollutant concentration fields issued from numerical model results combined with up-to-date available observation data to improve their accuracy in the vicinity of measurement points. In MACC-II, they are produced routinely on a daily basis.
AOT40: Accumulated Ozone over the 40 ppb Threshold
AQD: Air Quality Directive
Assessments: Quantitative evaluation of air quality fields based on validated data and numerical model results
CERFACS: Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (France)
Data assimilation: Mathematical process to incorporate observations into a numerical model of a physical system
EAN: European Aeroallergen Network
EDA: Air quality data assimilation sub-project in the MACC and MACC-II projects
EEA: European Environment Agency
ENS: Air quality forecasting and analysis sub-project in the MACC and MACC-II projects
Ensemble model: Combination of the results of various models. This can be a simple median, or a weighted average resulting from an analysis of the models' behaviour over past periods. The models building up the ensemble can correspond to different systems (multi-model approach) or to the same modelling system fed with different input datasets.
EVA: Air quality validated assessments sub-project in the MACC and MACC-II projects
FMI: Finnish Meteorological Institute
KNMI: Royal Netherlands Meteorological Institute
LISA: Laboratoire Interuniversitaire des Systèmes Atmosphériques (France)
Météo France: French Weather Service
Met.no: Norwegian Meteorological Institute (Norway)
Raw model data: Model results directly issued from the modelling chain, without any post-treatment process
Re-analyses: Maps of air pollutant concentration fields issued from numerical model results combined with validated observation data to improve their accuracy in the vicinity of measurement points
RIUUK: Rhenish Institute for Environmental Research at the University of Cologne (Germany)
RMSE: Root Mean Square Error. It measures the typical magnitude of the model prediction error (it equals the standard deviation of the error only when the bias is zero). A smaller value indicates better model performance.
SMHI: Swedish Meteorological and Hydrological Institute (Sweden)
SOMO35: Accumulated dose of ozone concentrations over a threshold of 35 ppb
TNO: Netherlands Organisation for Applied Scientific Research (NL)
VOC: Volatile Organic Compound
WHO: World Health Organization
WMO: World Meteorological Organization

1. Rationale

The Copernicus/MACC-II project aims to deliver a number of services dedicated to global atmospheric composition and to air quality monitoring in Europe. Air quality issues are covered by two services (http://www.copernicus-atmosphere.eu/services/aqac/):

- The so-called ENS services focus on routine, near-real-time products (forecasts and near-real-time analyses). Forecasts of ozone, nitrogen dioxide and particulate matter (PM10 and PM2.5) concentrations throughout Europe, up to four days ahead, are available every day.
  Daily analyses of the same variables (simulated fields improved with observations through data assimilation techniques) are also provided. They are available on the Copernicus atmosphere services website (http://macc-raq.gmes-atmosphere.eu/som_regrid_ens3D.php).
- The so-called EVA services relate to the detailed analysis of past situations using validated material from observation networks and modelling. They provide a posteriori validated air quality assessments for Europe, based on re-analysed air pollutant concentration fields: simulations of past years are performed and “improved” through the assimilation of available validated in-situ and satellite observations. They are available on the Copernicus atmosphere services website (http://www.gmes-atmosphere.eu/services/raq/raq_reanalysis/).

The so-called MACC-II regional air quality assessment reports describe, on a yearly basis, the state and the evolution of background concentrations of air pollutants in European countries. Special care is given to pollutants influenced by long-range transport, which are well captured by European-scale modelling systems: ozone, nitrogen dioxide and particulate matter (PM10 and PM2.5). Specific pollution episodes that occurred during the year are also examined. This work results from a service that uses both observations (in-situ and satellite) and model results to produce assessments. Both sources of information (modelling and measurements) are combined by the MACC-II scientists to produce high-quality maps of air pollutant concentrations and patterns.

Because the MACC-II/EVA assessment reports aim to address the concerns of policy and decision makers, reliability, quality assurance and accuracy must be ensured. The model versions implemented and their capacities must be objectively evaluated and described in a transparent and traceable way.
This is the objective of the Quality Assurance plans elaborated within MACC-II for each re-analysis chain. This kind of information is necessary to correctly interpret the model results and their variability, and to assess the uncertainties of the maps and air pollution fields delivered by the service.

The European air quality assessments provided by MACC-II/EVA are built on seven state-of-the-art chemistry transport models run operationally by decentralised modelling teams. The list and the current configuration of the modelling systems are given in the annex. Moreover, the multi-model functionalities developed in the MACC-II “regional cluster” allow “ensemble” model estimates to be derived. These combine the results of the various models to obtain an estimate with better skill than the individual models. The combination can be a simple median or a more sophisticated average, weighted by coefficients that depend on the model, the geographical location, the simulated period, and so on. The former option (the simple median) was still used for 2010.

The present document is the MACC-II/EVA evaluation report on European air quality assessments for the year 2010. It provides a number of commented performance indicators that allow the quality of the model results used in the assessment report to be evaluated. Building confidence in the MACC-II/EVA re-analysis system is not the only purpose of this analysis: it also aims at establishing keys for a better understanding of the model results.

Notice: At this stage, it should be noted that not all the expected capabilities of the MACC programme were used to prepare the 2010 assessment report. Therefore, some results were obtained exclusively from raw simulations of past situations, and others from numerical data combined with observations according to various data assimilation approaches (“re-analyses”).
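The simple median ensemble described above can be sketched as follows. This is a minimal illustration, assuming the member results have already been interpolated onto the same stations (or grid cells) and time steps and flattened into equal-length lists; the function name `ensemble_median` and the input layout are hypothetical, not the MACC-II implementation. Missing member values (`None`) are simply skipped.

```python
from statistics import median
from typing import Dict, List, Optional


def ensemble_median(
    members: Dict[str, List[Optional[float]]],
) -> List[Optional[float]]:
    """Point-by-point median of several model results (hypothetical helper).

    `members` maps a model name to its values, all aligned on the same
    stations/times. Missing values (None) are ignored; a point where every
    member is missing yields None.
    """
    series = list(members.values())
    if not series:
        raise ValueError("at least one ensemble member is required")
    n = len(series[0])
    if any(len(s) != n for s in series):
        raise ValueError("all members must have the same length")

    result: List[Optional[float]] = []
    for i in range(n):
        available = [s[i] for s in series if s[i] is not None]
        # With an even number of available members, statistics.median
        # returns the mean of the two middle values.
        result.append(median(available) if available else None)
    return result


# Usage: three members, two time steps (made-up values in µg/m3).
print(ensemble_median({
    "CHIMERE": [100.0, 120.0],
    "EMEP": [110.0, 118.0],
    "EURAD": [90.0, 130.0],
}))  # [100.0, 120.0]
```

A weighted variant would replace the median by a weighted average whose coefficients depend on the model, location or period. The median is shown here because it is the option stated for 2010, and it is insensitive to a single outlying member.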
The performances of all model configurations were evaluated by INERIS and are reported here. For some pollutants, too few members with a re-analysis process (individual data assimilated model results) were available for the “ensemble” concept to remain relevant. In such cases, one individual model can be significantly better than the ensemble and is consequently selected to illustrate the assessment report.

Content of the report

This report addresses the capacity of state-of-the-art chemistry transport models to predict the air quality indicators considered in the re-analysis process. The evaluation discussed in the present document establishes objective and quantitative criteria to assess model skill and performance. The variability of the model responses is also an indicator of the uncertainty of the modelling and re-analysis approaches. Geographical areas where all models perform reasonably well, with the same trends, can be considered “well described” by the modelling systems. Conversely, more caution is required in sub-regions where the model results spread out significantly.

The next section describes the evaluation methodology that has been adopted. The following sections provide, for each pollutant, a synthesis of the statistical performance scores calculated for the various models involved in MACC-II/EVA. They are presented on comprehensive maps or as time series and histograms.

2. Methodology

2.1 Observation datasets

The evaluation work focuses on ozone, nitrogen dioxide, PM10 and PM2.5 concentration fields. Sufficiently representative and relevant observation data are now available for these pollutants, allowing performance indicators to be calculated.
The AIRBASE database from the European Environment Agency (EEA) (http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-6) gathers all validated observations reported by the European regulatory air quality monitoring networks up to the year 2010. These data were used for both the re-analysis and the evaluation processes. For ozone, NO2 and PM10, the same station sets cannot be used for both validation and data assimilation (it would not make sense to evaluate re-analyses against the same observations that were assimilated in the models). The set of observations available in AIRBASE was therefore split into two subsets: one for DA and one for validation (about one third of the total number of stations). Stations were assigned randomly, while ensuring a homogeneous spatial coverage for each station typology (rural, suburban and urban). Table 1 presents the number of stations selected in each category, and their locations are mapped in Figures 1 to 3.

| Pollutant | Rural DA | Rural validation | Suburban DA | Suburban validation | Urban DA | Urban validation | Total |
|---|---|---|---|---|---|---|---|
| O3 | 281 | 144 | 262 | 124 | 360 | 166 | 1345 |
| NO2 | 209 | 105 | 251 | 120 | 449 | 212 | 1336 |
| PM10 | 162 | 74 | 211 | 94 | 455 | 203 | 1198 |
| PM2.5 | 34 | 20 | 49 | 21 | 68 | 34 | 226 |

Table 1: Number of stations selected in the AIRBASE database for 2010 for DA and validation

Figure 1. Location of the NO2 AIRBASE stations selected for data assimilation (red dots) and validation (green dots) processes

Figure 2. Location of the O3 AIRBASE stations selected for data assimilation (red dots) and validation (green dots) processes

Figure 3. Location of the PM10 AIRBASE stations selected for data assimilation (red dots) and validation (green dots) processes

2.2 Performance indicators

The model performances are evaluated on the basis of classical statistical indicators that objectively measure the gap between the model results (raw data or re-analyses) and the observations at the available stations; bias, root mean square error (RMSE) and correlation coefficient are the most classical. Comparison of observed and modelled averages is generally considered as well. The behaviour of the performance indicators naturally depends on the station typology and on the pollutant considered: the models used in the MACC-II/EVA systems run at the European scale, with a spatial resolution of about 20 km in the best case. Consequently, for pollutants that are largely influenced by local sources (NO2, and PM in some situations), these regional models cannot reproduce the hot spots monitored by traffic or industrial stations, and performance indicators are not assessed at such stations. Difficulties can even be encountered at urban stations. Conversely, for pollutants characterised by a long residence time in the atmosphere and large impacted areas (typically ozone, and PM in some cases), performance indicators evaluated at all types of stations (except traffic and industrial sites) make sense.

The definitions of the performance indicators used in this report are recalled below. They are commonly used in evaluation processes¹:

• Bias indicates, on average, whether the simulations under- or over-predict the measured concentrations. Here, negative values indicate under-prediction and positive values over-prediction; values close to 0 are the best:

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)$$

where N is the number of observations, P_i refers to the predictions and O_i to the observations. It is expressed in µg/m3.
• Root Mean Square Error (RMSE) gives information about the skill of the model in predicting the overall magnitude of the observations. It should be as low as possible:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^2}$$

where N, P_i and O_i are defined as for the bias. It is expressed in µg/m3.

• Correlation measures whether predictions and observations vary together (i.e. at the same time and/or place). The closer the correlation coefficient is to one, the better the predicted variations match the observed ones:

$$r = \frac{\mathrm{cov}(P, O)}{\sqrt{\mathrm{var}(P)\,\mathrm{var}(O)}}$$

where P and O denote the series of predictions P_i and observations O_i (i = 1 to N). This is a dimensionless number.

Taylor diagrams summarise, in a single quadrant, several statistical indicators for several models: the angular position of each point gives the correlation coefficient, its distance from the origin gives the standard deviation, and its distance from the observation reference point gives the centred RMSE (the RMSE once the bias has been removed). They are therefore a very instructive way of presenting an overview of the relative performances of a set of models, and are often used in model intercomparison exercises.

For indicators related to threshold values, for instance the number of days or hours when a certain concentration level is exceeded, “contingency tables” giving the percentages of good predictions (GP), false alarms (FA) and missed events (ME) are estimated. These concepts come from the weather and air quality forecasting communities. Although these indicators are very strict and do not objectively represent the intrinsic model performance (because of the threshold cut-off effect, a result close to the threshold can fall arbitrarily into one category or the other), they provide useful information for comparing the behaviour of various models in different geographical regions. GP, FA and ME are expressed in percent (%).

¹ Chang J.C. and Hanna S.R., 2004. Air quality model performance evaluation. Meteorol. Atmos. Phys. 87, 167–196.
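The three indicators defined above can be computed directly from paired predictions P_i and observations O_i. The following is a minimal sketch using only the Python standard library, assuming the pairs have already been matched by station and time and that missing values have been removed; the function names are illustrative, not taken from the MACC-II evaluation tools.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], obs: Sequence[float]) -> int:
    """Return N after checking that the paired series are usable."""
    if len(pred) != len(obs) or len(obs) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return len(obs)


def bias(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean of (P_i - O_i): negative = under-prediction, positive = over-prediction."""
    n = _check(pred, obs)
    return sum(p - o for p, o in zip(pred, obs)) / n


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Square root of the mean squared difference (same units as the data)."""
    n = _check(pred, obs)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)


def correlation(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson r = cov(P, O) / sqrt(var(P) * var(O)); dimensionless.

    The 1/N normalisation cancels between numerator and denominator, so it
    is omitted. Raises ZeroDivisionError if either series is constant.
    """
    n = _check(pred, obs)
    mp = sum(pred) / n
    mo = sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Usage: a model that is uniformly 1 µg/m3 too high.
P = [2.0, 4.0, 6.0]
O = [1.0, 3.0, 5.0]
print(bias(P, O), rmse(P, O), correlation(P, O))  # 1.0 1.0 1.0
```

The example shows why the three scores are complementary: a constant offset gives a non-zero bias and RMSE, yet a perfect correlation.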
Several representations of the models' skills are proposed. Maps with coloured patches at the locations of the stations selected in AIRBASE for the evaluation process show, through their colour scale, how the model performs. Taylor diagrams provide a wider overview of the model performances: they represent at a glance the classical statistical scores that characterise model performance against observations (correlation coefficient, standard deviation and centred RMSE). Histograms of model performances sorted by station typology and by European sub-region (Western, Northern, Southern, Central, Eastern) are proposed as well.

2.3 Models list

The models involved in this evaluation are those run operationally by the MACC regional air quality modelling teams. A short summary of the characteristics of their systems is given in the annex. For easier reading, the models discussed in the next sections are listed in the table below:

| Model | Origin |
|---|---|
| CHIMERE | France (CNRS/INERIS) |
| EMEP | Norway (met.no) |
| EURAD | Germany (RIUUK) |
| LOTOS-EUROS | The Netherlands (KNMI/TNO) |
| MATCH | Sweden (SMHI) |
| MOCAGE | France (Météo France) |
| SILAM | Finland (FMI) |

3. Ozone simulations and re-analyses

Figure 4 and Figure 5 show Taylor diagrams for each individual model and for the ensemble models: raw simulation results (“ENS”) and data assimilated ensemble results (“ENSa”). They relate to the ozone daily maximum and the ozone daily mean over the 2010 summer period, respectively. The station typology is distinguished for a more comprehensive analysis. The benefits of the data assimilation process are significant for the correlation coefficient (+0.05 to +0.07) and for the root mean square error (−5 µg/m3). The standard deviation of the assimilated model results also improved significantly, the observed reference being 33 and 27 µg/m3 for the daily maximum and the daily mean respectively.
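The statistics read on the Taylor diagrams of Figures 4 and 5 are linked. With σ_P and σ_O the standard deviations of predictions and observations, R the correlation coefficient and E′ the centred RMSE, the diagram relies on E′² = σ_P² + σ_O² − 2σ_Pσ_O R, and the total error decomposes as RMSE² = Bias² + E′². The sketch below computes these quantities and checks both identities numerically. It is illustrative only: population (1/N) statistics are assumed, and the function name `taylor_statistics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def taylor_statistics(pred: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Quantities plotted on a Taylor diagram, using 1/N (population) statistics."""
    n = len(obs)
    if n == 0 or len(pred) != n:
        raise ValueError("pred and obs must be non-empty and of equal length")
    mp = sum(pred) / n
    mo = sum(obs) / n
    std_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    std_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    # Correlation coefficient (undefined if either series is constant).
    r = sum((p - mp) * (o - mo) for p, o in zip(pred, obs)) / (n * std_p * std_o)
    # Centred RMSE: RMSE of the anomalies, i.e. with the mean bias removed.
    crmse = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for p, o in zip(pred, obs)) / n)
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)
    return {"std_pred": std_p, "std_obs": std_o, "r": r,
            "crmse": crmse, "bias": mp - mo, "rmse": rmse}


# Usage with a small made-up series (µg/m3), not data from this report.
s = taylor_statistics([95.0, 120.0, 140.0, 110.0], [100.0, 130.0, 150.0, 105.0])
# Law-of-cosines relation that gives the Taylor diagram its geometry.
assert math.isclose(s["crmse"] ** 2,
                    s["std_pred"] ** 2 + s["std_obs"] ** 2
                    - 2 * s["std_pred"] * s["std_obs"] * s["r"])
# Decomposition of the total RMSE into a bias part and a centred part.
assert math.isclose(s["rmse"] ** 2, s["bias"] ** 2 + s["crmse"] ** 2)
print({k: round(v, 2) for k, v in s.items()})
```

Because the bias does not appear on a standard Taylor diagram, it has to be reported separately, as done with the maps and histograms in this report.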
In all cases, the assimilated ensemble provided the best results, which are very satisfactory considering the state of the art: correlation coefficients were about 0.95, except for the rural daily mean (0.90). RMSE ranged between 10 and 15 µg/m3, except for the rural daily mean, for which it was slightly higher than 15 µg/m3. The slightly lower performance for the rural daily mean can most likely be explained by the way models simulate night-time ozone levels, which are generally too high. These results show a significant improvement compared to the previously assessed years.

Figure 4. Taylor diagram representing the performance of the MACC-II/EVA models to simulate the summer daily max of ozone (2010); data assimilated results correspond to the models noted with an “a” index

Figure 5. Taylor diagram representing the performance of the MACC-II/EVA models to simulate the summer daily mean of ozone (2010); data assimilated results correspond to the models noted with an “a” index

An in-depth analysis of the “assimilated model results” can be carried out by considering the spatial distribution of the statistical indicators over Europe. Figure 6 presents maps of bias, correlation coefficient and RMSE for the “ENSa” model results for summer 2010. The correlation coefficient is excellent, with values larger than 0.9 in most cases; only a few stations in Italy, Portugal and Central Europe show poor performances. For most stations, RMSE ranges between 5 and 15 µg/m3, which is very good. Performances decrease for stations around the Mediterranean area and in Central Europe, although in the latter case very few stations are available for validation. The ENSa model is therefore well suited to predicting high values and exceedances of the regulatory thresholds.

For comparison, Figure 7 shows the same panel of information for the ensemble model results without data assimilation, for the same indicator (ozone daily maximum, summer 2010).
It highlights where data assimilation improved the model results the most: clearly in Western Europe, but also in the difficult regions (Mediterranean coast and Central Europe).

Figure 6. Statistical scores of the “assimilated ensemble” model results against the AIRBASE validation dataset for the ozone daily maximum over summer 2010: bias (a), correlation coefficient (b), root mean square error (c)

Figure 7. Statistical scores of the “raw ensemble” model results against the AIRBASE validation dataset for the ozone daily maximum over summer 2010: bias (a), correlation coefficient (b), root mean square error (c)

The multi-model approach developed in the MACC-II system for regional air quality modelling is of high interest for qualifying the uncertainty of the results. The uncertainty of the ozone concentration assessments can be approached by considering the range of variability of the model results. Figure 8 shows the ozone daily peaks simulated by the EVA models in summer 2010 at rural, suburban and urban monitoring sites in Europe. The consistency between the various models is good, whatever the site typology: differences do not exceed 15 µg/m3 and the temporal correlation with the observed time series (in green) is high.

Figure 8. Ozone daily peaks simulated by the MACC-II/EVA models over summer 2010, for various station typologies: rural (top), suburban (middle), urban (bottom)

Further investigation was conducted on how the models (in raw simulation and data assimilation modes) predict the hourly concentrations and the daily peak. Statistical scores established for the various station typologies and detailed sub-region by sub-region² over the year 2010 are given in Figure 9, Figure 10 and Figure 11. These figures show a rather high variability of the model performances, depending on both the station typology and the sub-region.
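The sub-region by sub-region and typology by typology scores can be obtained by grouping the paired values before computing the indicators. A minimal sketch, assuming each record carries a station identifier, a sub-region code (EUW, EUC, EUS, EUN, EUE), a station typology and one predicted/observed pair; the record layout and the `min_stations` cut-off are hypothetical choices for illustration. Groups with too few distinct stations are skipped, since a score computed on one or two stations says little about a whole sub-region.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (station_id, subregion, typology, predicted, observed): hypothetical layout.
Record = Tuple[str, str, str, float, float]


def scores_by_group(
    records: Iterable[Record], min_stations: int = 2
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Bias and RMSE per (sub-region, typology), skipping sparse groups."""
    pairs = defaultdict(list)
    stations = defaultdict(set)
    for station, subregion, typology, p, o in records:
        key = (subregion, typology)
        pairs[key].append((p, o))
        stations[key].add(station)

    scores = {}
    for key, values in pairs.items():
        if len(stations[key]) < min_stations:
            continue  # too few stations for a meaningful regional score
        n = len(values)
        scores[key] = {
            "n_stations": len(stations[key]),
            "bias": sum(p - o for p, o in values) / n,
            "rmse": math.sqrt(sum((p - o) ** 2 for p, o in values) / n),
        }
    return scores


# Usage with made-up values (µg/m3).
records = [
    ("s1", "EUW", "rural", 10.0, 8.0),
    ("s1", "EUW", "rural", 12.0, 12.0),
    ("s2", "EUW", "rural", 9.0, 11.0),
    ("s3", "EUE", "urban", 5.0, 1.0),  # single station: skipped by default
]
print(scores_by_group(records))
```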
Model performances were quite satisfactory and consistent with the state of the art. As expected the best results are obtained for Northern Europe stations while more difficulties were highlighted for Southern stations. Complexity of the meteorological patterns and topography, uncertainties on some sources (for instance biogenic sources) and uncertainties related to some chemical mechanisms could explain this “well-known” 2 EUW = Western Europe, EUC= Central Europe, EUS= Southern Europe, EUN= Northern Europe, EUE= Eastern Europe 19 limitation of the current modelling systems. It should be noted the good consistency between model performances (with satisfactory performances) for stations located in western and central Europe Lack of stations prevented from achieving the evaluation for suburban and urban locations in eastern Europe. The benefits of data assimilation are generally clearly established when we compare the “ensemble” of raw simulations with the “ensemble” of “DA simulations”. Considering individual models, CHIMERE performed (considering both raw results and DA results) the best. Inconsistencies can be noted in EURAD results with some discrepancies that could occurred in the data assimilated results. (a) (b) 20 (c) Figure 9. MACC regional model scores for predicting daily ozone peak over the year 2010 throughout European sub-regions (a) Bias (b) RMSE (c) Correlation coefficient at rural stations (a) 21 (b) (c) Figure 10. MACC regional model scores for predicting daily ozone peak over the year 2010 throughout European sub-regions (a) Bias (b) RMSE (c) Correlation coefficient at suburban stations 22 (a) (b) (c) Figure 11. 
MACC regional model scores for predicting the daily ozone peak over the year 2010 throughout European sub-regions at urban stations: (a) bias, (b) RMSE, (c) correlation coefficient.

The capacity of each model, and of each model version (raw simulation or data assimilation mode), to predict situations when the regulatory thresholds (information level: 180 µg/m3; alert level: 240 µg/m3) are exceeded was assessed, especially during the “summer period” (April to October). Figure 12 and Figure 13 show the number of days, classified by sub-region, when exceedances of the information and alert thresholds respectively were reported in the AIRBASE database. Several ozone peaks occurred in summer 2010, especially in July, and the highest values mainly concerned Western and Central Europe. The ability of the MACC-II/EVA “assimilated ensemble” model (ENSa) to reproduce these peaks is illustrated in Figure 14. Although the episodes were detected qualitatively, the number of exceedances observed in July, especially in Central Europe, was underestimated. This is not surprising, since almost all the models, and the Ensemble, showed a negative bias in these areas.

Figure 12. Number of days (observed in 2010) when the ozone information threshold (180 µg/m3, hourly) was exceeded, classified by European sub-region.

Figure 13. Number of days (observed in 2010) when the ozone alert threshold (240 µg/m3, hourly) was exceeded, classified by European sub-region.

Figure 14. Capacity of the “assimilated Ensemble” model (ENSa) to reproduce the number of days when the ozone information threshold was exceeded.

The capacity of the MACC-II/EVA models to predict exceedances of the threshold values can be assessed, although special caution is recommended when interpreting such results.
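The contingency classes used for threshold exceedances (good predictions, false alerts, non-detections) can be sketched as below for paired daily ozone peaks: a daily peak above 180 µg/m3 means at least one hourly value above the information threshold. This is a hypothetical helper, not the code used for the report, and counting an exceedance as a value strictly above the threshold is an assumption.

```python
from collections import Counter

INFORMATION_THRESHOLD = 180.0  # µg/m3, hourly ozone (information level)
ALERT_THRESHOLD = 240.0        # µg/m3, hourly ozone (alert level)


def contingency(model_peaks, obs_peaks, threshold=INFORMATION_THRESHOLD):
    """Classify paired daily ozone peaks (maximum hourly value of each day).

    An exceedance is assumed to be a value strictly above the threshold."""
    counts = Counter()
    for m, o in zip(model_peaks, obs_peaks):
        predicted, observed = m > threshold, o > threshold
        if predicted and observed:
            counts["good prediction"] += 1
        elif predicted:
            counts["false alert"] += 1
        elif observed:
            counts["non detection"] += 1
        else:
            counts["correct negative"] += 1
    return counts
```

With `contingency([185.0, 179.5, 182.0, 150.0], [190.0, 181.0, 175.0, 140.0])`, each of the four classes is counted once. The second day is a non-detection although model and observation differ by only 1.5 µg/m3, which illustrates the threshold effect.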
Indeed, it is very difficult to deal with the threshold effect: a difference of only one µg/m3 above or below the threshold (which is smaller than the intrinsic model uncertainty) can be responsible for a bad mark. Contingency graphs are given below for all the involved models and various station typologies. In general, the data assimilation process tends to reduce the number of non-detections (pink bars) and to increase the number of good predictions (blue bars). In some cases (for instance the EURAD model) it can increase the number of false alerts (red bars). The balance between these classes of events is generally consistent whatever the station typology.

Figure 15. Contingency graphs for the prediction of exceedances of the information threshold in 2010 by the MACC-II/EVA models: rural sites (top), suburban sites (middle), urban sites (bottom).

4. Nitrogen Dioxide simulations and re-analyses

For the year 2010, only three of the six teams assimilated NO2 observations operationally. RIUUK (Rhenish Institute for Environmental Research at the University of Cologne) assimilated NO2 ground-level observations from the AIRBASE database and also NO2 columns retrieved from satellite observations (OMI, GOME-2, SCIAMACHY). The KNMI/TNO consortium assimilated ground-level concentrations from AIRBASE and satellite observations from IASI. FMI assimilated NO2 in-situ data from AIRBASE.

Figure 16, Figure 17 and Figure 18 present the statistical scores (bias, RMSE and correlation coefficient respectively) of the EURADa, SILAMa and LOTOS-EUROSa data assimilation systems for reproducing NO2 daily mean values over the year 2010. The scores are clearly better for EURAD, whatever the indicator, and the geographical consistency of the EURADa scores is remarkable as well. One should note the low RMSE (lower than 5 µg/m3 at many locations) obtained with the EURAD system.
Its superiority can be explained by the maturity of the data assimilation system that underpins the operational chain, and by the fact that not only in-situ data from the AIRBASE stations but also satellite information are assimilated. Earth observations should help reproduce the geographical distribution of air pollution patterns. The geographical areas where the models are less satisfactory are generally the same for all systems: Italy, the Alps and mountainous areas, and, for SILAM and LOTOS-EUROS, Eastern and Central Europe. All models perform correctly in Western Europe. These results show a significant improvement of the scores compared with those obtained in previous years, reflecting progress across the whole MACC-II regional modelling chains. In 2008, RMSE ranged from 10 µg/m3 (in Germany and Central Europe) to 40 µg/m3 (in Italy and Eastern Europe); in the current system, the best values are below 5 µg/m3 and the worst do not exceed 30 µg/m3. Because NO2 in ambient air is mainly influenced by local sources, European-wide models with a limited resolution (20 km in the best case) perform less well for NO2 than for other pollutants.

Figure 16. Bias calculated for the NO2 daily mean in 2010 by the data assimilation systems LOTOS-EUROSa (left), SILAMa (right) and EURADa (bottom).

Figure 17. RMSE calculated for the NO2 daily mean in 2010 by the data assimilation systems LOTOS-EUROSa (left), SILAMa (right) and EURADa (bottom).

Figure 18. Correlation coefficient calculated for the NO2 daily mean in 2010 by the data assimilation systems LOTOS-EUROSa (left), SILAMa (right) and EURADa (bottom).

A more detailed model-by-model analysis of the daily mean is given in Figure 19 to Figure 21 for the different statistical indicators. The analysis of model performance is consistent for all station typologies.
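The gain brought by data assimilation can be quantified station by station by comparing the scores of a model's raw run with those of its data assimilated run. The sketch below is a minimal illustration with hypothetical names; it assumes the per-station (bias, RMSE, correlation) triples have already been computed, and it is not the procedure used to produce the figures.

```python
def da_gain(raw_scores, da_scores):
    """raw_scores, da_scores: dict station -> (bias, rmse, r) for the raw and the
    data assimilated run of the same model.

    Returns station -> (|bias| reduction, RMSE reduction, r increase); positive
    values mean the assimilated run scores better. Stations missing from
    either run are skipped."""
    gains = {}
    for station in sorted(raw_scores.keys() & da_scores.keys()):
        bias_raw, rmse_raw, r_raw = raw_scores[station]
        bias_da, rmse_da, r_da = da_scores[station]
        gains[station] = (abs(bias_raw) - abs(bias_da), rmse_raw - rmse_da, r_da - r_raw)
    return gains
```

With hypothetical numbers, a station moving from (bias -15, RMSE 20 µg/m3, r 0.40) to (5, 12 µg/m3, 0.75) gets gains of (10, 8, 0.35): the absolute bias shrinks even though its sign changes.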
The data assimilated system (EURADa) always gave better results for all indicators than the other codes and the “ensemble”: it gains one or two points on the correlation coefficient and 3 to 5 µg/m3 on the RMSE. The EURAD system improved its performance compared with previous years. The results were much more disappointing for LOTOS-EUROS: its data assimilation chain did not seem to improve on the individual model results obtained without data assimilation. The raw simulation results are quite consistent from one model to another: CHIMERE, EMEP, EURAD and LOTOS-EUROS have more or less the same basic behaviour, which is consistent with the state of the art.

Figure 19. Bias of the MACC-II/EVA models in predicting the daily mean NO2 concentrations in 2010 for various station typologies: rural (top), suburban (middle), urban (bottom).

Figure 20. RMSE of the MACC-II/EVA models in predicting the daily mean NO2 concentrations in 2010 for various station typologies: rural (top), suburban (middle), urban (bottom).

Figure 21. Correlation coefficient of the MACC-II/EVA models in predicting the daily mean NO2 concentrations in 2010 for various station typologies: rural (top), suburban (middle), urban (bottom).

5. PM10 simulations and re-analyses

Figure 22 presents the Taylor diagram for each individual model and for the ensemble models: raw simulation results (“ENS”) and data assimilated ensemble results (“ENSa”). They relate to the PM10 daily mean over the year 2010, and station typologies are distinguished for a more comprehensive analysis. The data assimilated ensemble gave good results, even if they seem to have been slightly degraded by LOTOS-EUROSa, whose performance was lower than that of the other models. Nevertheless, the correlation coefficient is higher than 0.8 and the RMSE lower than 12 µg/m3, which is very good according to the state of the art. Figure 23 details the geographical distribution of these scores for the data assimilated ensemble.
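A Taylor diagram places each model according to its standard deviation and its correlation R with the observations; the distance to the observation point is then the centred RMS difference E', through the law-of-cosines relation E'^2 = σ_m^2 + σ_o^2 - 2 σ_m σ_o R. With the same 1/n normalisation, the total error satisfies RMSE^2 = bias^2 + E'^2. The sketch below computes these quantities; it is a hypothetical helper, not the plotting tool used for Figure 22.

```python
import math


def taylor_statistics(model, obs):
    """Quantities displayed on a Taylor diagram, using 1/n (population) moments.

    Returns (std_model, std_obs, correlation, centred RMS difference)."""
    n = len(obs)
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    anom_m = [m - mean_m for m in model]
    anom_o = [o - mean_o for o in obs]
    std_m = math.sqrt(sum(a * a for a in anom_m) / n)
    std_o = math.sqrt(sum(b * b for b in anom_o) / n)
    r = sum(a * b for a, b in zip(anom_m, anom_o)) / (n * std_m * std_o)
    # Centred RMS difference: the RMSE after removing each series' own mean
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_m, anom_o)) / n)
    return std_m, std_o, r, crmsd
```

Because the diagram shows E' rather than the total RMSE, a model with a large bias can still sit close to the observation point; the bias has to be read from a separate indicator.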
The scores were best in Western Europe and worst in Eastern Europe and in mountainous areas. This can be explained by the complexity (in meteorological terms) of such areas, but also by uncertainties in the emission inventories (especially in the eastern part of Europe). The gain from the data assimilation process is illustrated and quantified by comparing the CHIMERE raw simulation results with the CHIMERE data assimilated results: the bias is reduced by 10 to 20 µg/m3 in absolute value, and the correlation coefficient increases by almost 0.3 to 0.4. The bias becomes positive at almost all stations (5 to 10 µg/m3). The greatest difficulties concerned the prediction of annual concentrations in Southern Europe, and the best results were obtained for Western Europe. Correlation coefficients were highly variable from one site to another and from one model to another, ranging from 0.85 in some excellent situations (with the data assimilated systems) to 0.1 in the worst ones.

Figure 22. Taylor diagram representing the performance of the MACC-II/EVA models in simulating the PM10 daily mean (2010); data assimilated results correspond to the models noted with an “a” index.

Figure 23. Bias, RMSE and correlation coefficient of the data assimilated Ensemble for the prediction of the PM10 annual average in 2010.

An in-depth analysis can be conducted by considering the scores for predicting the PM10 daily mean at various station typologies and in the different geographical regions (Figure 24 to Figure 26). Except for LOTOS-EUROS (a point that should be further investigated), data assimilation significantly improved the model results. It should be noted that in rural areas it led the CHIMERE model to overestimate PM10 concentrations, which is generally unexpected. As for NO2, the raw results of the CHIMERE and EURAD models behave consistently, but the EURAD data assimilated system had a more significant impact on PM10 concentrations.
It remarkably improved its performance at urban sites: 2 to 4 more points on the correlation coefficient, and an RMSE decrease of 10 to 15 µg/m3 at Eastern and Southern urban locations (known to be difficult to capture). The improvement is highly significant, and these results are very encouraging, including for developing the use of Earth observation data in data assimilated systems. The performance of the CHIMERE and EURAD DA systems is generally very good: RMSE ranged from below 10 µg/m3 (rural and suburban stations) to 15 to 20 µg/m3 (urban stations, with a maximum reached in Southern Europe), and the correlation coefficient ranged from around 0.4 in the worst case to 0.6 to 0.8 in the best ones. This is a significant improvement compared with the reports of previous years. Whatever the model set-up, the results were generally best for Western and Northern locations, while Southern and Eastern Europe is the area where the results were most uncertain. Finally, Figure 27 gives an overview of these results for urban typologies at the European scale. The added value of the Ensemble is clearly highlighted, together with the consistent behaviour of the CHIMEREa and EURADa models, whose scores were very good compared with the state of the art.

Figure 24. Performance indicators (panels a to c) of the MACC-II/EVA models, sub-region by sub-region, for the prediction of the PM10 daily mean, 2010, rural stations.

Figure 25. Performance indicators (panels a to c) of the MACC-II/EVA models, sub-region by sub-region, for the prediction of the PM10 daily mean, 2010, suburban stations.

Figure 26. Performance indicators (panels a to c) of the MACC-II/EVA models, sub-region by sub-region, for the prediction of the PM10 daily mean, 2010, urban stations.

Figure 27.
Scores of the various models for simulating the PM10 daily mean at urban sites, 2010: bias (top), RMSE (middle) and correlation coefficient (bottom).

The last part of the analysis deals with the prediction of situations when the regulatory threshold value (50 µg/m3 for the PM10 daily mean) is exceeded. Figure 28 shows the number of days when such exceedances occurred in 2010. Western and Central Europe were mainly concerned, and winter 2010 (January, February and December) was particularly rich in such events. Figure 29 shows the same indicator modelled by the data assimilated Ensemble model (ENSa): one third of the days of exceedance were correctly modelled. The missed events concerned the summer period and some exceedances that occurred in winter in Western and Central Europe.

It is important to note that the proposed simulations did not account for the impact of the huge forest fires that occurred in Russia during the first half of August. The impact of forest fires on atmospheric ozone and PM concentrations is well known, and summer ozone and PM10 concentrations in Central and Eastern Europe are likely to have been influenced by these fire emissions. This contribution is clearly missing from the proposed simulations and can explain the model discrepancies observed in the summer period. In the next reports, the MACC-II modelling chains will account for forest fire emissions provided by the dedicated sub-project, and a significant improvement of model performance is expected from this new functionality.

Figure 30 shows time series of the number of days exceeding the PM10 regulatory limit value (daily mean), as predicted by all the MACC-II/EVA data assimilated models and as observed. The correlation between the two is excellent, which shows the ability of the models to predict episodes, even if their magnitude is underestimated.
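The PM10 exceedance counts discussed for Figures 28 to 31 can be sketched as below: daily means are computed from hourly values, days above 50 µg/m3 are flagged, and observed and modelled exceedance days are compared as sets. The 18-valid-hour requirement for a daily mean is an assumption (no data-capture rule is stated above), the function names are hypothetical, and the hit rate is only one possible reading of "one third of the days of exceedance correctly modelled".

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Iterable, Optional

PM10_DAILY_LIMIT = 50.0  # µg/m3, limit value for the PM10 daily mean
MIN_VALID_HOURS = 18     # assumption: 75 % data capture for a valid daily mean


def daily_means(hourly: Iterable[tuple[datetime, Optional[float]]]) -> dict[date, float]:
    """Average the valid hourly values of each day; days with too few hours are dropped."""
    by_day = defaultdict(list)
    for timestamp, value in hourly:
        if value is not None:
            by_day[timestamp.date()].append(value)
    return {day: sum(v) / len(v) for day, v in by_day.items() if len(v) >= MIN_VALID_HOURS}


def exceedance_days(daily: dict[date, float]) -> set[date]:
    """Days whose daily mean is strictly above the limit value."""
    return {day for day, value in daily.items() if value > PM10_DAILY_LIMIT}


def detection_summary(observed_days: set[date], modelled_days: set[date]):
    """Set-based contingency counts and two derived ratios for exceedance days."""
    good = len(observed_days & modelled_days)
    false_alerts = len(modelled_days - observed_days)
    non_detections = len(observed_days - modelled_days)
    # Share of observed exceedance days that the model reproduced
    hit_rate = good / len(observed_days) if observed_days else float("nan")
    # Share of modelled exceedance days that were not observed
    false_alarm_ratio = false_alerts / len(modelled_days) if modelled_days else float("nan")
    return good, false_alerts, non_detections, hit_rate, false_alarm_ratio
```

For instance, observed exceedance days {5 January, 8 January, 20 December} against modelled days {5 January, 1 February} give one good prediction, one false alert, two non-detections, a hit rate of 1/3 and a false alarm ratio of 1/2.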
The graphs of Figure 30 demonstrate very encouraging performance of the DA systems, although some events still seem to be underestimated (summer period).

Figure 28. Number of days of exceedance of the PM10 daily average threshold (50 µg/m3) observed in 2010, sorted by European sub-region: EUW: Western, EUC: Central, EUN: Northern, EUS: Southern, EUE: Eastern.

Figure 29. Number of days of exceedance of the PM10 daily average threshold (50 µg/m3) in 2010 predicted by the MACC-II/EVA data assimilated ensemble model, sorted by European sub-region: EUW: Western, EUC: Central, EUN: Northern, EUS: Southern, EUE: Eastern.

Figure 30. Number of days of exceedance of the PM10 daily average threshold (50 µg/m3) in 2008 predicted by the “data assimilated ensemble” model.

Histograms of contingency indicators (Figure 31) also give a good representation of the improvements expected from the data assimilation systems, with a significantly reduced number of non-detections (except for LOTOS-EUROS) and an increased number of good predictions. Interestingly, the number of false alerts, although it tends to increase, remains fairly stable.

Figure 31. Contingency indicators for the prediction of exceedances of the daily PM10 limit value at urban stations by the MACC-II/EVA models: good predictions, false alerts and non-detections for the year 2010.

6. PM2.5 simulations and re-analyses

In the previous evaluation reports (2007 to 2009), the modelled PM2.5 concentrations had not been assessed in depth because of a lack of observation data. The number of PM2.5 stations in Europe has increased with the implementation of the air quality directive, so a formal assessment is now more relevant, and a first attempt has been made with the 2010 re-analyses. Figure 32, Figure 33 and Figure 34 show the usual statistical score indicators obtained, against the available observations, by each model that computed PM2.5 concentrations.
Because of the limited number of stations, these results must be interpreted with caution. The consistency of the models' behaviour, whatever the station typology, should be noted. EURADa (data assimilation chain) is the only system that overestimated PM2.5 concentrations over Europe. Its performance was quite encouraging, with a correlation coefficient of about 0.6 and an RMSE of about 10 to 15 µg/m3; however, this RMSE reflects lower-quality results than those obtained for the other pollutants. One should note the promising results provided by the CHIMERE model, even for the raw simulation results.

Figure 32. Bias between PM2.5 observed and modelled daily means for the year 2010 for the MACC-II/EVA models and for various station typologies: rural (top), suburban (middle), urban (bottom).

Figure 33. RMSE between PM2.5 observed and modelled daily means for the year 2010 for the MACC-II/EVA models and for various station typologies: rural (top), suburban (middle), urban (bottom).

Figure 34. Correlation coefficient between PM2.5 observed and modelled daily means for the year 2010 for the MACC-II/EVA models and for various station typologies: rural (top), suburban (middle), urban (bottom).

7. Conclusions

This report provides an extensive analysis of the performance of the MACC-II/EVA modelling systems (simulation and re-analysis platforms) in predicting the concentrations of the regulated air pollutants (O3, NO2, PM10, PM2.5) in 2010. Daily mean values, daily maximum values (ozone), annual means and indicators related to situations when regulatory thresholds are exceeded were investigated. For the first time, it was possible to carry out a similar analysis for PM2.5. Station typologies and European sub-regions were distinguished for a more comprehensive interpretation. It is interesting to note how the data assimilated systems improved the representation of the air pollution patterns, at least for EURAD, SILAM and CHIMERE.
The LOTOS-EUROS situation needs further investigation, as its data assimilated results seem less improved than those achieved with the other models. The EURAD data assimilation chain, which integrates satellite information (at least for NO2), obtained the best results in many situations, demonstrating the potential added value of Earth observations for air quality issues. In any case, this analysis demonstrates the added value of providing operational re-analyses of air quality fields for policy decisions and air quality management. The performance of the MACC-II/EVA models is very promising for the simulation platform, with capacities in line with the state of the art and even better in some cases. However, the models need to be improved in Southern and Eastern Europe, which are the most difficult regions to simulate. This well-known situation justifies the current research projects: the uncertainties in emissions and the limitations of the models in reproducing the dynamical and chemical processes in these areas are well known and should be reduced in the coming years. It must also be noted that, in general, few stations are available in these parts of Europe for both validation and evaluation, which can make the interpretation of the performance results more difficult. Nevertheless, the scores established in this report help build confidence in the MACC-II/EVA assessment reports for air quality in Europe. The mapped indicators can be considered relevant, with a controlled uncertainty. For the regions where the models perform less well (Southern and Eastern Europe), the general patterns are correctly represented: episodes are generally predicted, but their intensity is underestimated.
In the next steps, more DA systems will be run and the models should improve overall (integration of forest fires, progress in the parametrisations...), so the scores presented in this report are expected to improve further. As already mentioned, for some indicators the overall performance of the modelling systems is better than in the assessment exercises of previous years. Local situations in sensitive areas (Italy, the Balkans, Eastern Europe) should be investigated in more depth.

ANNEX: methodologies and assumptions

Models: The models that provided raw simulations and data assimilated fields of air pollutant concentrations for the year 2010 are those involved in the MACC-II regional cluster dedicated to the provision of air quality information (in near-real time and in delayed mode) at the European scale. Seven models, running operationally on their own modelling platforms, have performed re-analyses for establishing yearly assessment reports since the end of the MACC project (October 2011). These models are described in the QA/QC dossiers available and regularly updated on the MACC website (http://www.gmesatmosphere.eu/documents/deliverables/r-ens/). It is important to note that differences can occur between the modelling chain run routinely for the provision of daily air quality forecasts and near-real-time analyses, and the modelling chain used for re-analysis calculations. The latter requires larger computational resources (computing time and storage space) to deal with a whole year on an hourly basis. Some teams had not completed the development of their data assimilation system and in these cases reported only raw simulation values. The tables below give an overview of each data assimilation system developed by the regional air quality modelling partners in MACC, and of its status at the time when the 2010 runs were performed. This synthesis can facilitate the interpretation of some results reported in this report and in the validation report.
Model | DA process | Pollutants concerned | Data sources | Operational production of data assimilated fields
CHIMERE | Optimal interpolation: kriging of observation data with CHIMERE as external drift | O3, PM10 | AIRBASE | Yes; significant improvement
CHIMERE | Ensemble Kalman filter | O3 | AIRBASE; under evaluation: O3 partial tropospheric columns (IASI) | Not yet; comparison with OI needed
EMEP | 3D-VAR | NO2 | OMI NO2 column | Yes, but the data assimilation chain was not operated (not yet operational)
EURAD | Intermittent 3D-VAR | O3, NO2, NO, CO, SO2, PM10 | AIRBASE in-situ measurements; MOZAIC airborne in-situ measurements; NO2 tropospheric column retrievals from OMI, GOME-2, SCIAMACHY; MOPITT tropospheric CO profiles | Yes
LOTOS-EUROS | Ensemble Kalman filter | O3, NO2, PM10, AOD, SO2, SO4 | AIRBASE; OMI (observation operator developed and used for the 2009 report) | Yes; significant improvement, but further evaluation is needed
MATCH | 3D-VAR with transform into spectral space | O3, NO2 | AIRBASE | System not fully operational; no model outputs provided for the 2010 assessment report
MOCAGE | 3D-VAR | O3 | Ozone in-situ data (AIRBASE) | Yes, since summer 2010; online evaluation available
SILAM | 3D-VAR, 4D-VAR | O3, NO2, SO2 | AIRBASE in-situ | Yes, operational

Table 1: Synthesis of the current data assimilation capacities developed in the regional air quality models involved in MACC.
They must be operational by the end of the MACC project.

Model | Data provided | Quality checking
CHIMERE | O3 and PM10 with OI for the whole year and the requested episodes; raw simulation results for NO2 and PM2.5 | Significant improvement of the simulation results
EMEP | Only raw simulations provided; DA under development | Not applicable for the DA chain (still under development)
EURAD | O3, NO2, NO, CO, SO2, PM10 (surface observations), MOZAIC, NO2 tropospheric column retrievals from OMI, GOME-2, SCIAMACHY and MOPITT CO profiles | Significant improvement of the simulation results
LOTOS-EUROS | Ozone, PM10, NO2 and PM2.5 re-analyses at 30 km resolution | Significant improvement of the simulation results
MATCH | Neither re-analyses nor raw simulations provided | Not applicable
MOCAGE | O3 re-analyses provided; raw simulation results for the other compounds | Significant improvement
SILAM | O3, NO2, PM10 re-analyses provided | Significant improvement of the model results

Table 2: Brief summary of the model configurations used for the 2010 assessment report.

Assumptions on input data: emission data, meteorological re-analyses and boundary conditions were provided by the other MACC-II sub-projects or by the MACC-II partners. The meteorological re-analyses were provided by ECMWF. The emission inventory used for running the models is the high-resolution inventory for the year 2009 provided by TNO within the “Emissions” MACC sub-project. Finally, the boundary conditions for the gaseous compounds come from the Global “reactive gases” sub-project (MOZART re-analyses).
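Table 1 lists optimal interpolation, variational schemes (3D-VAR, 4D-VAR) and Kalman filters. For a linear observation operator and Gaussian errors, the 3D-VAR analysis equals the best linear unbiased estimate (BLUE), which optimal interpolation computes directly. The sketch below shows that textbook update for a single observation on a 1-D grid, x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b), with a Gaussian-shaped background error covariance. It is an illustration under these assumptions only: the covariance model, function name, parameters and values are hypothetical, and it is not the CHIMERE scheme, which Table 1 describes as kriging with the model as external drift.

```python
import math


def oi_single_observation(background, grid_x, obs_x, obs_value,
                          sigma_b, sigma_o, length_scale):
    """BLUE / optimal interpolation update of a 1-D background field
    with one observation located exactly on a grid point.

    Assumed background error covariance between points at distance d:
        B(d) = sigma_b**2 * exp(-0.5 * (d / length_scale)**2)
    Observation error variance: R = sigma_o**2.
    """
    k = grid_x.index(obs_x)  # H selects grid point k (raises ValueError if absent)
    innovation = obs_value - background[k]  # y - H x_b
    denominator = sigma_b ** 2 + sigma_o ** 2  # H B H^T + R for one observation
    analysis = []
    for x, xb in zip(grid_x, background):
        # Element of B H^T: covariance between this point and the observed point
        cov = sigma_b ** 2 * math.exp(-0.5 * ((x - obs_x) / length_scale) ** 2)
        analysis.append(xb + cov / denominator * innovation)
    return analysis
```

With a uniform 40 µg/m3 background, an observation of 60 µg/m3 and equal background and observation error standard deviations, the analysis at the observed point is 50 µg/m3, halfway between the two, and the correction decays with distance from the observation.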