Abstract
Identifying the latent variables that govern a problem, and the relationships among them, from measurements in the observed world is important for causal discovery. This identification can be made by analyzing the constraints that the latents impose on the measurements. We introduce the concept of pairwise cluster comparison (PCC) to identify causal relationships from clusters, and a two-stage algorithm, called LPCC, that learns a latent variable model (LVM) using PCC. First, LPCC learns the exogenous and the collider latents, as well as their observed descendants, by utilizing pairwise comparisons between clusters in the measurement space that may explain latent causes. Second, LPCC learns the non-collider endogenous latents and their children by splitting these latents from their previously learned latent ancestors. LPCC is not limited to linear or latent-tree models and does not make assumptions about the distribution. Using simulated and real-world datasets, we show that LPCC's accuracy improves with the sample size, that it can learn large LVMs, and that it learns accurately compared with state-of-the-art algorithms.
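The idea of comparing clusters pairwise can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that cluster centroids have already been obtained by some clustering method, that the observed variables are discrete, and that variables whose values always change together across cluster pairs are candidates for sharing a latent cause. The function names, the `tol` threshold, and the example centroids are all hypothetical.

```python
from itertools import combinations


def pairwise_cluster_comparisons(centroids, tol=0.5):
    """For each pair of centroids, flag (1/0) which observed variables differ.

    `tol` is an assumed threshold for deciding that a variable "changed".
    """
    comparisons = {}
    for (i, a), (j, b) in combinations(enumerate(centroids), 2):
        comparisons[(i, j)] = tuple(int(abs(x - y) > tol) for x, y in zip(a, b))
    return comparisons


def group_co_changing(comparisons, n_vars):
    """Group variables whose change pattern is identical across all comparisons."""
    groups = {}
    for v in range(n_vars):
        signature = tuple(flags[v] for _, flags in sorted(comparisons.items()))
        groups.setdefault(signature, []).append(v)
    return list(groups.values())


# Hypothetical centroids: 4 clusters over 4 binary observed variables.
centroids = [
    (0, 0, 0, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (1, 1, 1, 1),
]
pccs = pairwise_cluster_comparisons(centroids)
groups = group_co_changing(pccs, n_vars=4)
print(groups)
```

In this toy input, variables 0 and 1 always change together, as do variables 2 and 3, so the sketch places them in two groups. The second stage described in the abstract (splitting non-collider endogenous latents from their ancestors) is not covered here.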
Original language | English |
---|---|
Pages (from-to) | 33-48 |
Number of pages | 16 |
Journal | Journal of Machine Learning Research |
Volume | 25 |
State | Published - 1 Dec 2012 |
Event | 4th Asian Conference on Machine Learning, ACML 2012 - Singapore, Singapore |
Event duration | 4 Nov 2012 → 6 Nov 2012 |
Keywords
- Clustering
- Graphical models
- Learning latent variable models
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence