Tree Density Estimation

Laszlo Gyorfi, Aryeh Kontorovich, Roi Weiss

Research output: Contribution to journal › Article › peer-review


We study the problem of estimating the density $f(\mathbf{x})$ of a random vector $\mathbf{X}$ in $\mathbb{R}^d$. For a spanning tree $T$ defined on the vertex set $\{1, \dots, d\}$, the tree density $f_T$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_T$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to\infty} \int |f_n(\mathbf{x}) - f_{T^*}(\mathbf{x})|\,d\mathbf{x} = 0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb{E}\left\{\int |f_n(\mathbf{x}) - f_{T^*}(\mathbf{x})|\,d\mathbf{x}\right\} = O(n^{-1/4})$, a dimension-free rate.
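The tree-identification step can be illustrated with a minimal sketch. It relies on the classical Chow-Liu observation that minimizing the Kullback-Leibler divergence between $f$ and $f_T$ over spanning trees is equivalent to maximizing the sum of pairwise mutual informations along the edges of $T$, so Kruskal's algorithm run on those weights in decreasing order returns an optimal tree. The function name `max_spanning_tree` and the numbers in `example_weights` are hypothetical and chosen only for illustration; the abstract does not specify how the pairwise quantities are estimated, so this is not a reproduction of the authors' estimator.

```python
from itertools import combinations


def max_spanning_tree(d, weight):
    """Kruskal's algorithm for a maximum-weight spanning tree on vertices 0..d-1.

    `weight` maps each pair (i, j) with i < j to a pairwise weight, assumed
    here to be an estimate of the mutual information between coordinates
    i and j (an assumption of this sketch, not taken from the abstract).
    """
    parent = list(range(d))  # union-find forest, one root per component

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Consider the heaviest edges first.
    edges = sorted(combinations(range(d), 2), key=lambda e: weight[e], reverse=True)

    tree = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == d - 1:
                break
    return tree


# Hypothetical pairwise weights for d = 4 coordinates (made-up numbers).
example_weights = {
    (0, 1): 0.9, (0, 2): 0.1, (0, 3): 0.2,
    (1, 2): 0.8, (1, 3): 0.3, (2, 3): 0.7,
}
tree = max_spanning_tree(4, example_weights)
```

With these illustrative weights the selected edges form the path 0-1-2-3. Sorting the $d(d-1)/2$ candidate edges dominates the cost, so the tree search itself stays cheap even when $d$ is large.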

Original language: English
Pages (from-to): 1168-1176
Number of pages: 9
Journal: IEEE Transactions on Information Theory
Issue number: 2
State: Published - 1 Feb 2023


  • Density estimation
  • Kruskal's algorithm
  • consistency
  • rate of convergence
  • tree identification

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

