Abstract
This paper focuses on estimating clustering validity using logistic regression. In many applications it is important to estimate the quality of a clustering; for example, when clustering speech segments, one must decide whether the clustered data can be used for speaker verification. For clustering short speaker segments, the common cluster validity criteria are average cluster purity (ACP), average speaker purity (ASP), and K, the geometric mean of the two measures. In practice, true labels are not available for evaluation, so these measures have to be estimated from the clustering itself. In this paper, mean-shift clustering with PLDA scoring is applied to cluster short speaker segments represented as i-vectors. Different statistical parameters are then estimated on the clustered data and used to train a logistic regression model to estimate ACP, ASP and K. It was found that logistic regression can be a good predictor of the actual ACP, ASP and K, and yields reasonable information regarding the clustering quality.
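The three validity measures named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the speaker clustering literature, which the abstract itself does not spell out: with n_ij the number of segments in cluster i spoken by speaker j, ACP averages n_ij² / n_i over all segments, ASP averages n_ij² / n_j, and K is the square root of their product. The function name `purity_scores` and its arguments are hypothetical, and true speaker labels are assumed to be available, as they would be when building training targets for the regression.

```python
from collections import Counter
from math import sqrt


def purity_scores(cluster_ids, speaker_ids):
    """Return (ACP, ASP, K) for one clustering of labelled segments.

    Assumed definitions (not taken from the abstract):
      ACP = (1/N) * sum_ij n_ij^2 / n_i   (n_i = size of cluster i)
      ASP = (1/N) * sum_ij n_ij^2 / n_j   (n_j = segments of speaker j)
      K   = sqrt(ACP * ASP)
    """
    if len(cluster_ids) != len(speaker_ids) or len(cluster_ids) == 0:
        raise ValueError("need two non-empty label sequences of equal length")

    n_total = len(cluster_ids)
    joint = Counter(zip(cluster_ids, speaker_ids))  # n_ij
    per_cluster = Counter(cluster_ids)              # n_i
    per_speaker = Counter(speaker_ids)              # n_j

    acp = sum(n * n / per_cluster[c] for (c, _), n in joint.items()) / n_total
    asp = sum(n * n / per_speaker[s] for (_, s), n in joint.items()) / n_total
    return acp, asp, sqrt(acp * asp)


# Two speakers with two segments each, all merged into a single cluster.
acp, asp, k = purity_scores([0, 0, 0, 0], ["a", "a", "b", "b"])
print(acp, asp, k)
```

In this toy case every speaker stays within one cluster, so ASP is 1.0, while the single cluster mixes both speakers equally, so ACP is 0.5. A perfect clustering gives 1.0 for all three measures.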
| Original language | English |
|---|---|
| Pages (from-to) | 3577-3581 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2017-August |
| State | Published - 1 Jan 2017 |
| Externally published | Yes |
| Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden. Duration: 20 Aug 2017 → 24 Aug 2017 |
Keywords
- Cluster validity
- I-vectors
- Logistic Regression
- Mean-shift
- PLDA
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation