Abstract | ||
---|---|---|
Social media sites such as Flickr, YouTube, and Facebook host substantial amounts of user-contributed materials (e.g., photographs, videos, and textual content) for a wide variety of real-world events. These range from widely known events, such as the presidential inauguration, to smaller, community-specific events, such as annual conventions and local gatherings. By identifying these events and their associated user-contributed social media documents, which is the focus of this paper, we can greatly improve local event browsing and search in state-of-the-art search engines. To address our problem of focus, we exploit the rich "context" associated with social media content, including user-provided annotations (e.g., title, tags) and automatically generated information (e.g., content creation time). We form a variety of representations of social media documents using different context dimensions, and combine these dimensions in a principled way into a single clustering solution, where each document cluster ideally corresponds to one event, using a weighted ensemble approach. We evaluate our approach on a large-scale, real-world dataset of event images, and report promising performance with respect to several baseline approaches. Our preliminary experiments suggest that our ensemble approach identifies events, and their associated images, more effectively than the state-of-the-art strategies on which we build. |
Year | Venue | Keywords |
---|---|---|
2009 | WebDB | social media,document clustering,search engine |
Field | DocType | Citations |
Data mining,World Wide Web,Social media,Search engine,Information retrieval,Document clustering,Computer science,Exploit,Content creation,Cluster analysis | Conference | 40 |
PageRank | References | Authors |
1.53 | 20 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hila Becker | 1 | 717 | 30.57 |
Mor Naaman | 2 | 4783 | 318.39 |
L. Gravano | 3 | 5668 | 855.47 |