Efficient classification for metric data

    Research output: Contribution to journal, Article, peer-review

    60 Scopus citations

    Abstract

    Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of nearest neighbor search. The algorithm's generalization performance is guaranteed via the fat-shattering dimension of Lipschitz classifiers, and we present experimental evidence of its superiority to some common kernel methods. As a by-product, we offer a new perspective on the nearest neighbor classifier, which yields significantly sharper risk asymptotics than the classic analysis.
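As a rough illustration of the Lipschitz-extension idea the abstract builds on, the following Python sketch implements the classical exact extension for labels in {-1, +1}. It takes the midpoint of the McShane and Whitney extensions and uses string edit distance as the metric. This is an assumption-laden toy, not the paper's algorithm. The paper computes an approximate extension with approximate nearest-neighbor search, whereas this sketch scans every training point per query, so its cost does not benefit from low doubling dimension. The function names (edit_distance, consistent_lipschitz_constant, lipschitz_classifier) and the toy data are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # type of a point in the metric space


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def consistent_lipschitz_constant(points: Sequence[P], labels: Sequence[int],
                                  dist: Callable[[P, P], float]) -> float:
    """Smallest L for which the +/-1 labelling of the sample is L-Lipschitz:
    labels differ by 2, so L = 2 / (distance of the closest opposite-label pair)."""
    gap = min(dist(p, q)
              for p, yp in zip(points, labels)
              for q, yq in zip(points, labels) if yp != yq)
    if gap == 0:
        raise ValueError("identical points carry opposite labels")
    return 2.0 / gap


def lipschitz_classifier(points: Sequence[P], labels: Sequence[int],
                         dist: Callable[[P, P], float],
                         lip: float) -> Callable[[P], int]:
    """Sign of the midpoint of the McShane and Whitney extensions (hypothetical helper)."""
    def classify(x: P) -> int:
        d = [dist(x, p) for p in points]  # brute force, no nearest-neighbor index
        upper = min(y + lip * di for y, di in zip(labels, d))  # McShane extension
        lower = max(y - lip * di for y, di in zip(labels, d))  # Whitney extension
        return 1 if (upper + lower) / 2 >= 0 else -1
    return classify


# Toy usage with hypothetical data: short strings under edit distance.
train = ["cat", "car", "cart", "dog", "dot", "dig"]
y = [1, 1, 1, -1, -1, -1]
lip = consistent_lipschitz_constant(train, y, edit_distance)
classify = lipschitz_classifier(train, y, edit_distance, lip)
print(classify("cab"), classify("dog"))  # prints: 1 -1
```

Both extensions are L-Lipschitz and agree with the labels on the sample, so their midpoint does too. Smaller Lipschitz constants correspond to larger margins, which is why the sketch uses the smallest constant consistent with the sample.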

    Original language: English
    Article number: 6867374
    Pages (from-to): 5750-5759
    Number of pages: 10
    Journal: IEEE Transactions on Information Theory
    Volume: 60
    Issue number: 9
    State: Published - 1 Jan 2014

    Keywords

    • Classification
    • Lipschitz function
    • doubling dimension
    • metric space

    ASJC Scopus subject areas

    • Information Systems
    • Computer Science Applications
    • Library and Information Sciences
