TY - GEN
T1 - Limit Distributions for Smooth Total Variation and χ²-Divergence in High Dimensions
AU - Goldfeld, Ziv
AU - Kato, Kengo
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Statistical divergences are ubiquitous in machine learning as tools for measuring discrepancy between probability distributions. As these applications inherently rely on approximating distributions from samples, we consider empirical approximation under two popular f-divergences: the total variation (TV) distance and the χ²-divergence. To circumvent the sensitivity of these divergences to support mismatch, the framework of Gaussian smoothing is adopted. We study the limit distributions of √n δ_TV(P_n ∗ N_σ, P ∗ N_σ) and n χ²(P_n ∗ N_σ ‖ P ∗ N_σ), where P_n is the empirical measure based on n independently and identically distributed (i.i.d.) observations from P, N_σ := N(0, σ²I_d), and ∗ stands for convolution. In arbitrary dimension, the limit distributions are characterized in terms of a Gaussian process on ℝ^d with a covariance operator that depends on P and the isotropic Gaussian density of parameter σ. This, in turn, implies optimality of the n^{-1/2} expected value convergence rates recently derived for δ_TV(P_n ∗ N_σ, P ∗ N_σ) and χ²(P_n ∗ N_σ ‖ P ∗ N_σ). These strong statistical guarantees promote empirical approximation under Gaussian smoothing as a potent framework for learning and inference based on high-dimensional data.
AB - Statistical divergences are ubiquitous in machine learning as tools for measuring discrepancy between probability distributions. As these applications inherently rely on approximating distributions from samples, we consider empirical approximation under two popular f-divergences: the total variation (TV) distance and the χ²-divergence. To circumvent the sensitivity of these divergences to support mismatch, the framework of Gaussian smoothing is adopted. We study the limit distributions of √n δ_TV(P_n ∗ N_σ, P ∗ N_σ) and n χ²(P_n ∗ N_σ ‖ P ∗ N_σ), where P_n is the empirical measure based on n independently and identically distributed (i.i.d.) observations from P, N_σ := N(0, σ²I_d), and ∗ stands for convolution. In arbitrary dimension, the limit distributions are characterized in terms of a Gaussian process on ℝ^d with a covariance operator that depends on P and the isotropic Gaussian density of parameter σ. This, in turn, implies optimality of the n^{-1/2} expected value convergence rates recently derived for δ_TV(P_n ∗ N_σ, P ∗ N_σ) and χ²(P_n ∗ N_σ ‖ P ∗ N_σ). These strong statistical guarantees promote empirical approximation under Gaussian smoothing as a potent framework for learning and inference based on high-dimensional data.
UR - http://www.scopus.com/inward/record.url?scp=85090403211&partnerID=8YFLogxK
U2 - 10.1109/ISIT44484.2020.9174101
DO - 10.1109/ISIT44484.2020.9174101
M3 - Conference contribution
AN - SCOPUS:85090403211
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 2640
EP - 2645
BT - 2020 IEEE International Symposium on Information Theory, ISIT 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 2020 IEEE International Symposium on Information Theory, ISIT 2020
Y2 - 21 July 2020 through 26 July 2020
ER -