TY - GEN

T1 - Learning Distance Functions using Equivalence Relations

AU - Bar-Hillel, Aharon

AU - Hertz, Tomer

AU - Shental, Noam

AU - Weinshall, Daphna

PY - 2003/12/1

Y1 - 2003/12/1

N2 - We address the problem of learning distance metrics using side-information in the form of groups of "similar" points. We propose to use the RCA algorithm, a simple and efficient algorithm for learning a full-rank Mahalanobis metric (Shental et al., 2002). We first show that RCA obtains the solution to an interesting optimization problem, founded on an information-theoretic basis. If the Mahalanobis matrix is allowed to be singular, we show that Fisher's linear discriminant followed by RCA is the optimal dimensionality reduction algorithm under the same criterion. We then show how this optimization problem is related to the criterion optimized by another recent algorithm for metric learning (Xing et al., 2002), which uses the same kind of side information. We empirically demonstrate that learning a distance metric using the RCA algorithm significantly improves clustering performance, similarly to the alternative algorithm. Since the RCA algorithm is much more efficient and cost-effective than the alternative, as it relies only on closed-form expressions of the data, it appears to be the preferable choice for learning full-rank Mahalanobis distances.

AB - We address the problem of learning distance metrics using side-information in the form of groups of "similar" points. We propose to use the RCA algorithm, a simple and efficient algorithm for learning a full-rank Mahalanobis metric (Shental et al., 2002). We first show that RCA obtains the solution to an interesting optimization problem, founded on an information-theoretic basis. If the Mahalanobis matrix is allowed to be singular, we show that Fisher's linear discriminant followed by RCA is the optimal dimensionality reduction algorithm under the same criterion. We then show how this optimization problem is related to the criterion optimized by another recent algorithm for metric learning (Xing et al., 2002), which uses the same kind of side information. We empirically demonstrate that learning a distance metric using the RCA algorithm significantly improves clustering performance, similarly to the alternative algorithm. Since the RCA algorithm is much more efficient and cost-effective than the alternative, as it relies only on closed-form expressions of the data, it appears to be the preferable choice for learning full-rank Mahalanobis distances.

KW - Clustering

KW - Feature selection

KW - Learning from partial knowledge

KW - Semi-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=1942517347&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:1942517347

SN - 1577351894

T3 - Proceedings, Twentieth International Conference on Machine Learning

SP - 11

EP - 18

BT - Proceedings, Twentieth International Conference on Machine Learning

A2 - Fawcett, T.

A2 - Mishra, N.

T2 - Proceedings, Twentieth International Conference on Machine Learning

Y2 - 21 August 2003 through 24 August 2003

ER -