Spatial prediction, like other supervised learning tasks, requires some criterion for a predictor's quality. Typical data-splitting schemes, such as holdouts and $k$-fold cross-validation, ignore the fact that training data are usually not available at the locations where predictions are being made. These common schemes thus yield biased estimates of a predictor's performance, which in turn may lead to choosing suboptimal predictors. In this contribution, we borrow ideas from the domain adaptation machine-learning literature to propose the importance-weighted source risk (IWSR). IWSR is a principled approach for weighting the prediction risk, which allows the practitioner to explicitly state the target locations for prediction. IWSR essentially consists of down-weighting training locations and up-weighting target locations. We show that, unlike the usual (unweighted) empirical risk, IWSR is an unbiased estimator of the prediction error. We use this risk estimator both to learn a model in the empirical risk minimization framework and to evaluate existing predictors. We show the superiority of this weighted risk using both simulated data and an empirical control: air-temperature prediction in France.
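The reweighting idea can be illustrated with a minimal sketch. It assumes the standard importance-weighting form from the covariate-shift literature, in which each training loss is multiplied by the ratio of target to source location density; the paper's exact estimator may differ. The function name `importance_weighted_risk`, the one-dimensional location axis, the two densities, and the loss function are all hypothetical choices made for illustration. Both densities are taken as known here, whereas in practice the weights would have to be estimated.

```python
import numpy as np

def importance_weighted_risk(losses, target_density, source_density):
    """Average per-sample losses, reweighted by the target/source density ratio."""
    losses = np.asarray(losses, dtype=float)
    weights = (np.asarray(target_density, dtype=float)
               / np.asarray(source_density, dtype=float))
    return float(np.mean(weights * losses))

# Toy setup (all assumptions): locations lie on [0, 1].
# Training locations follow the density p_s(x) = 1.5 - x (denser near 0),
# while predictions are wanted uniformly over [0, 1], so p_t(x) = 1.
rng = np.random.default_rng(0)
n = 200_000
u = rng.uniform(size=n)
# Inverse-CDF sampling from p_s: solve 1.5*x - 0.5*x**2 = u for x in [0, 1].
x = (3.0 - np.sqrt(9.0 - 8.0 * u)) / 2.0

source_density = 1.5 - x
target_density = np.ones_like(x)

# Hypothetical per-location loss that grows away from the well-sampled region.
losses = x ** 2

# Unweighted empirical risk: estimates the risk under the training distribution
# (analytically 1/4 for this toy setup).
unweighted = float(np.mean(losses))
# Weighted risk: estimates the risk under the target distribution
# (analytically 1/3 for this toy setup).
weighted = importance_weighted_risk(losses, target_density, source_density)

print(f"unweighted risk: {unweighted:.3f}")
print(f"weighted risk:   {weighted:.3f}")
```

In this toy setup the unweighted average understates the error at the target locations because poorly sampled regions, where the loss is largest, contribute too little. The density ratio is bounded here, which keeps the variance of the weighted estimate small; with heavily mismatched source and target distributions the weights can become large and the estimate noisy.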
|Number of pages|9|
|Journal|IEEE Transactions on Geoscience and Remote Sensing|
|State|Published - 1 Jun 2021|
- Geospatial analysis
- machine learning algorithms
- remote sensing