TY - GEN
T1 - Cluster ranking with an application to mining mailbox networks
AU - Bar-Yossef, Ziv
AU - Guy, Ido
AU - Lempel, Ronny
AU - Maarek, Yoëlle S.
AU - Soroka, Vladimir
PY - 2006/12/1
Y1 - 2006/12/1
N2 - We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present C-Rank: a new cluster ranking algorithm. Given a network with arbitrary pairwise similarity weights, C-Rank creates a list of overlapping clusters and ranks them by their integrated cohesion. We provide extensive theoretical and empirical analysis of C-Rank and show that it is likely to have high precision and recall. Our experiments focus on mining mailbox networks. A mailbox network is an egocentric social network, consisting of contacts with whom an individual exchanges email. Ties among contacts are represented by the frequency of their co-occurrence on message headers. C-Rank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths. We demonstrate the effectiveness of C-Rank on the Enron data set, consisting of 130 mailbox networks.
AB - We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present C-Rank: a new cluster ranking algorithm. Given a network with arbitrary pairwise similarity weights, C-Rank creates a list of overlapping clusters and ranks them by their integrated cohesion. We provide extensive theoretical and empirical analysis of C-Rank and show that it is likely to have high precision and recall. Our experiments focus on mining mailbox networks. A mailbox network is an egocentric social network, consisting of contacts with whom an individual exchanges email. Ties among contacts are represented by the frequency of their co-occurrence on message headers. C-Rank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths. We demonstrate the effectiveness of C-Rank on the Enron data set, consisting of 130 mailbox networks.
UR - http://www.scopus.com/inward/record.url?scp=77956224601&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2006.35
DO - 10.1109/ICDM.2006.35
M3 - Conference contribution
AN - SCOPUS:77956224601
SN - 0769527019
SN - 9780769527017
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 63
EP - 74
BT - Proceedings - Sixth International Conference on Data Mining, ICDM 2006
T2 - 6th International Conference on Data Mining, ICDM 2006
Y2 - 18 December 2006 through 22 December 2006
ER -