TY - JOUR
T1 - A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO2 concentration in France 2005–2022
AU - Barbalat, Guillaume
AU - Hough, Ian
AU - Dorman, Michael
AU - Lepeule, Johanna
AU - Kloog, Itai
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2024/9/15
Y1 - 2024/9/15
N2 - Understanding and managing the health effects of Nitrogen Dioxide (NO2) requires high resolution spatiotemporal exposure maps. Here, we developed a multi-stage multi-resolution ensemble model that predicts daily NO2 concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200 m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO2 total column density from Ozone Monitoring Instrument satellite; (2) predicting daily NO2 concentrations at a 1 km spatial resolution using a large set of potential predictors such as predictions obtained from stage 1, land-cover and road traffic data; and (3) predicting residuals from stage 2 models at a 200 m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performances of our ensemble models were overall very good, with a ten-fold cross-validated R2 for the 1 km model of 0.83, and of 0.69 for the 200 m model. All three basis learners participated in the ensemble predictions to various degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO2 concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one basis learner fails in a specific area or at a particular time, by relying on the other learners. To the best of our knowledge, this is the first study aiming to predict NO2 concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO2 health effects in epidemiological studies.
AB - Understanding and managing the health effects of Nitrogen Dioxide (NO2) requires high resolution spatiotemporal exposure maps. Here, we developed a multi-stage multi-resolution ensemble model that predicts daily NO2 concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200 m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO2 total column density from Ozone Monitoring Instrument satellite; (2) predicting daily NO2 concentrations at a 1 km spatial resolution using a large set of potential predictors such as predictions obtained from stage 1, land-cover and road traffic data; and (3) predicting residuals from stage 2 models at a 200 m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performances of our ensemble models were overall very good, with a ten-fold cross-validated R2 for the 1 km model of 0.83, and of 0.69 for the 200 m model. All three basis learners participated in the ensemble predictions to various degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO2 concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one basis learner fails in a specific area or at a particular time, by relying on the other learners. To the best of our knowledge, this is the first study aiming to predict NO2 concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO2 health effects in epidemiological studies.
KW - 200 m resolution
KW - Daily predictions
KW - Decision-tree
KW - Nitrogen dioxide
KW - Spatio-temporal blocking
KW - Spatio-temporal modeling
UR - http://www.scopus.com/inward/record.url?scp=85194864858&partnerID=8YFLogxK
U2 - 10.1016/j.envres.2024.119241
DO - 10.1016/j.envres.2024.119241
M3 - Article
C2 - 38810827
AN - SCOPUS:85194864858
SN - 0013-9351
VL - 257
JO - Environmental Research
JF - Environmental Research
M1 - 119241
ER -