A fuzzy-based algorithm for web document clustering

Menahem Friedman, Abraham Kandel, Moti Schneider, Mark Last, Bracha Shapira, Yuval Elovici, Omer Zaafrany

Research output: Contribution to conference › Paper › peer-review

8 Scopus citations

Abstract

Most existing methods of document clustering are based on a model that assumes a fixed-size vector representation of key terms or key phrases within each document. This assumption is not realistic in large and diverse document collections such as the World Wide Web. We propose a new Fuzzy-based Document Clustering Method (FDCM) to cluster documents that are represented by variable-length vectors. Each vector element consists of two fields: the first identifies a key phrase (its name) in the document, and the second denotes the frequency associated with this key phrase within that document. A new averaging method is defined for calculating the cluster centroid, and a membership function is developed for relating new documents to existing clusters. The proposed approach is described in detail, and we show how it is implemented in a real-world application from the area of Web monitoring.
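
The representation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the FDCM formulas, so everything here is an assumption chosen for illustration: documents are modelled as dictionaries mapping key phrase to frequency, the hypothetical `centroid` function takes a plain arithmetic mean over the union of key phrases, and the hypothetical `membership` function uses cosine similarity as a stand-in for the paper's membership function.

```python
from collections import Counter
from math import sqrt


def centroid(docs):
    """Average key-phrase frequencies over the union of phrases in docs.

    Assumption: a plain arithmetic mean, where a phrase missing from a
    document counts as frequency 0. This is not the paper's averaging method.
    """
    totals = Counter()
    for doc in docs:
        totals.update(doc)  # adds each phrase's frequency to the running total
    n = len(docs)
    return {phrase: total / n for phrase, total in totals.items()}


def membership(doc, center):
    """Degree in [0, 1] to which doc belongs to the cluster with this centroid.

    Assumption: cosine similarity over shared key phrases, used here only as
    a placeholder for the membership function developed in the paper.
    """
    shared = set(doc) & set(center)
    dot = sum(doc[p] * center[p] for p in shared)
    norm_doc = sqrt(sum(v * v for v in doc.values()))
    norm_center = sqrt(sum(v * v for v in center.values()))
    if norm_doc == 0 or norm_center == 0:
        return 0.0
    return dot / (norm_doc * norm_center)


# Two documents with different numbers of key phrases (variable-length vectors).
cluster = [
    {"fuzzy logic": 2, "clustering": 1},
    {"clustering": 3, "web mining": 1},
]
center = centroid(cluster)
new_doc = {"clustering": 1}
score = membership(new_doc, center)
```

Because each document carries only the key phrases it actually contains, no fixed vocabulary or vector length has to be agreed on in advance, which is the property the abstract emphasises.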

Original language: English
Pages: 524-527
State: Published - 1 Jan 2004
Event: NAFIPS 2004 - Annual Meeting of the North American Fuzzy Information Processing Society: Fuzzy Sets in the Heart of the Canadian Rockies - Banff, Alta, Canada
Duration: 27 Jun 2004 to 30 Jun 2004

Conference

Conference: NAFIPS 2004 - Annual Meeting of the North American Fuzzy Information Processing Society: Fuzzy Sets in the Heart of the Canadian Rockies
Country/Territory: Canada
City: Banff, Alta
Period: 27/06/04 to 30/06/04

ASJC Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)
