Title
Distributional Similarity Model for Multi-modality Clustering in Social Media
Abstract
User Generated Content (UGC) has become the fastest growing sector of the WWW. Data mining from UGC presents challenges not typically found in text mining from documents. UGC can be semi-structured, and its content can be very short and informal, carrying relatively little content, much like a chat or an email conversation. In addition, UGC can be viewed as multi-modality data. These characteristics pose significant challenges and open research questions. To cluster UGC data, we can construct multiple contingency tables of modalities and employ the multi-way distributional clustering (MDC) algorithm. However, a contingency table that summarizes the co-occurrence statistics of two modalities is not a robust representation of the information entropy between those modalities in UGC data. In this paper, we propose a novel similarity measurement, called the Distributional Similarity Model (DSM), to solidify the graph model in the MDC algorithm so that it can deal with the unique characteristics of UGC data.
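As a rough illustration of the contingency tables mentioned in the abstract, the sketch below counts co-occurrences between two modalities and computes their mutual information. It is not the DSM measure or the MDC algorithm, neither of which is specified here. The function names and the toy (author, word) pairs are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def contingency_table(pairs):
    """Count how often each (x, y) value pair from two modalities co-occurs."""
    return Counter(pairs)


def mutual_information(table):
    """Mutual information (in bits) of the joint distribution in the table."""
    total = sum(table.values())
    row = Counter()  # marginal counts for the first modality
    col = Counter()  # marginal counts for the second modality
    for (x, y), n in table.items():
        row[x] += n
        col[y] += n
    mi = 0.0
    for (x, y), n in table.items():
        p_xy = n / total
        p_x = row[x] / total
        p_y = col[y] / total
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical toy data: each pair is (author, word) taken from a short post.
posts = [
    ("alice", "camera"),
    ("alice", "lens"),
    ("alice", "camera"),
    ("bob", "recipe"),
    ("bob", "oven"),
]
table = contingency_table(posts)
mi = mutual_information(table)
```

With so few observations per cell, the estimate depends heavily on individual counts, which is one way to see why short, sparse UGC makes raw co-occurrence statistics fragile.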
Year: 2007
DOI: 10.1109/WI-IATW.2007.105
Venue: Web Intelligence/IAT Workshops
Keywords: data mining, multiple contingency table, ugc data, cluster ugc data, user generated content, text mining, mdc algorithm, social media, distributional similarity model, contingency table, multi-modality clustering, multi-modality data, internet, user interfaces
Field: Modalities, User-generated content, Data mining, Social media, Information retrieval, Computer science, Contingency table, Cluster analysis, User interface, Entropy (information theory), The Internet
DocType: Conference
ISBN: 0-7695-3028-1
Citations: 3
PageRank: 1.18
References: 7
Authors: 4
Name                 Order   Citations   PageRank
Donahue C. M. Sze    1       6           1.55
Tak-chung Fu         2       407         21.29
Fu Lai Chung         3       1534        86.72
Robert W. P. Luk     4       554         55.57