TY - GEN
T1 - A new approach for fuzzy clustering of web documents
AU - Friedman, Menahem
AU - Schneider, Moti
AU - Last, Mark
AU - Zaafrany, Omer
AU - Kandel, Abraham
PY - 2004/12/1
Y1 - 2004/12/1
N2 - Most existing methods of document clustering are based on the classical vector-space model, which represents each document by a fixed-size vector of key terms or key phrases. In large and diverse document collections such as the World Wide Web, this approach suffers from a tremendous computational overload, since the constant size of the term vector equals the total number of key terms in all documents. We propose a new fuzzy-based approach to clustering documents that are represented by vectors of variable size. Each entry in a vector consists of two fields. The first field is the name of a key phrase in the document, and the second denotes an importance weight associated with this key phrase within the particular document. We describe the proposed approach in detail and show how it is implemented in a real-world application from the area of web monitoring.
AB - Most existing methods of document clustering are based on the classical vector-space model, which represents each document by a fixed-size vector of key terms or key phrases. In large and diverse document collections such as the World Wide Web, this approach suffers from a tremendous computational overload, since the constant size of the term vector equals the total number of key terms in all documents. We propose a new fuzzy-based approach to clustering documents that are represented by vectors of variable size. Each entry in a vector consists of two fields. The first field is the name of a key phrase in the document, and the second denotes an importance weight associated with this key phrase within the particular document. We describe the proposed approach in detail and show how it is implemented in a real-world application from the area of web monitoring.
UR - http://www.scopus.com/inward/record.url?scp=11144335450&partnerID=8YFLogxK
U2 - 10.1109/FUZZY.2004.1375752
DO - 10.1109/FUZZY.2004.1375752
M3 - Conference contribution
AN - SCOPUS:11144335450
SN - 0780383532
T3 - IEEE International Conference on Fuzzy Systems
SP - 377
EP - 381
BT - 2004 IEEE International Conference on Fuzzy Systems - Proceedings
T2 - 2004 IEEE International Conference on Fuzzy Systems - Proceedings
Y2 - 25 July 2004 through 29 July 2004
ER -