A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO2 concentration in France 2005–2022

Guillaume Barbalat, Ian Hough, Michael Dorman, Johanna Lepeule, Itai Kloog

Research output: Contribution to journal › Article › peer-review

Abstract

Understanding and managing the health effects of nitrogen dioxide (NO2) requires high-resolution spatiotemporal exposure maps. Here, we developed a multi-stage, multi-resolution ensemble model that predicts daily NO2 concentrations across continental France from 2005 to 2022. Innovations of this work include daily predictions at a 200 m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained through three cascading modeling stages: (1) predicting NO2 total column density from Ozone Monitoring Instrument (OMI) satellite data; (2) predicting daily NO2 concentrations at a 1 km spatial resolution using a large set of potential predictors, such as the stage 1 predictions, land-cover data and road traffic data; and (3) predicting the residuals of the stage 2 models at a 200 m resolution in large urban areas. Stages 2 and 3 used a generalized additive model to ensemble the predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Overall, the ensemble models showed very good cross-validated performance, with a ten-fold cross-validated R² of 0.83 for the 1 km model and 0.69 for the 200 m model. All three base learners contributed to the ensemble predictions, to degrees that varied across time and space. In sum, our multi-stage approach predicted daily NO2 concentrations with relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values when one base learner fails in a specific area or at a particular time, because the ensemble can rely on the other learners. To the best of our knowledge, this is the first study to predict NO2 concentrations in France at such a high spatiotemporal resolution, over such a large spatial extent, and with such a long temporal coverage. The exposure estimates are available for investigating the health effects of NO2 in epidemiological studies.
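The abstract does not say how the spatio-temporal blocks were defined. As a rough illustration of the general idea only, the sketch below assigns each monitoring observation to a block formed by a square spatial cell and a fixed-length time window, then deals whole blocks into cross-validation folds, so that observations close in space and time are never split between training and test data. The cell size (`cell_km`), window length (`days_per_block`), the use of projected x/y coordinates in km, and the function names `assign_blocked_folds` and `split_indices` are hypothetical choices for this sketch, not the authors' settings.

```python
import random


def assign_blocked_folds(records, cell_km=50.0, days_per_block=30, n_folds=10, seed=0):
    """Assign a cross-validation fold to each (x_km, y_km, day) record.

    Hypothetical sketch: blocks are square spatial cells of side `cell_km`
    crossed with consecutive windows of `days_per_block` days. Every record
    in a block receives the same fold, so no block is split between
    training and test data.
    """
    # Map each record to its (cell column, cell row, time window) block.
    blocks = [
        (int(x // cell_km), int(y // cell_km), int(day // days_per_block))
        for x, y, day in records
    ]
    # Shuffle the distinct blocks reproducibly, then deal them round-robin into folds.
    unique_blocks = sorted(set(blocks))
    random.Random(seed).shuffle(unique_blocks)
    fold_of_block = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [fold_of_block[b] for b in blocks]


def split_indices(folds, k):
    """Return (train, test) record indices when fold k is held out."""
    train = [i for i, f in enumerate(folds) if f != k]
    test = [i for i, f in enumerate(folds) if f == k]
    return train, test


if __name__ == "__main__":
    # Toy records: (x in km, y in km, day index since the start of the study period).
    records = [(1.0, 2.0, 0), (10.0, 20.0, 5), (60.0, 2.0, 0), (1.0, 2.0, 40), (120.0, 130.0, 400)]
    folds = assign_blocked_folds(records, n_folds=3, seed=1)
    for k in range(3):
        print(k, split_indices(folds, k))
```

Here the first two toy records share a 50 km cell and a 30-day window, so they always land in the same fold. When nearby stations or consecutive days are strongly correlated, blocked splits like this tend to give less optimistic error estimates than random splits; larger blocks reduce leakage but leave fewer, coarser units to distribute across folds.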

Original language: English
Article number: 119241
Journal: Environmental Research
Volume: 257
State: Published, 15 Sep 2024

Keywords

  • 200 m resolution
  • Daily predictions
  • Decision-tree
  • Nitrogen dioxide
  • Spatio-temporal blocking
  • Spatio-temporal modeling

ASJC Scopus subject areas

  • Biochemistry
  • General Environmental Science
