Generating Labeled Datasets of Twitter Users. - Citegraph

Paper Info

Title
Generating Labeled Datasets of Twitter Users.

Abstract
In this paper we present a simple yet powerful approach to generating labeled datasets of Twitter users. Our focus falls on sensitive personal details shared as background information in tweets. Such tweets avoid the focus of the user's attention and also tend to resist the vast amounts of humor, wishes, or hypothetical thinking typical of tweets. Our approach combines the selection of search queries with a follow-up semi-supervised filtering of indicative messages. We create datasets in several unrelated domains and prove that all sorts of target groups can be built with minimal manual annotator effort. The generated datasets include separate groups of users with specific characteristics: pet ownership, blood pressure, diabetes, and psychotropic medicine usage, for which, to our knowledge, manually labeled data was previously not available. Our search-based approach is also used to generate a cross-domain corpus, matching Twitter users with their Yelp profiles.
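The two-stage idea in the abstract (search queries first, then semi-supervised filtering of indicative messages) can be illustrated with a minimal sketch. This record does not give the paper's actual queries, features, or classifier, so everything below is an assumption made for illustration: the seed pattern `SEED_QUERY`, the helper names, the `margin` threshold, and the use of a simple word-count self-training loop as a stand-in for the authors' filter.

```python
import re
from collections import Counter

# Hypothetical seed query: a first-person possessive phrase about a pet.
SEED_QUERY = re.compile(r"\bmy (dog|cat)\b", re.IGNORECASE)


def select_candidates(tweets, query=SEED_QUERY):
    """Stage 1: keep only the tweets that match the search query."""
    return [t for t in tweets if query.search(t)]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def score(text, pos_counts, neg_counts):
    """Positive when the words look more like indicative tweets than not."""
    return sum(pos_counts[w] - neg_counts[w] for w in set(tokenize(text)))


def self_training_filter(candidates, seed_labels, rounds=3, margin=2):
    """Stage 2 (illustrative stand-in): grow a small hand-labeled seed set
    by repeatedly adopting the confidently scored unlabeled candidates."""
    labeled = dict(seed_labels)  # tweet text -> True (indicative) / False
    for _ in range(rounds):
        pos_counts, neg_counts = Counter(), Counter()
        for text, is_indicative in labeled.items():
            target = pos_counts if is_indicative else neg_counts
            target.update(set(tokenize(text)))
        newly_labeled = {}
        for text in candidates:
            if text in labeled:
                continue
            s = score(text, pos_counts, neg_counts)
            if abs(s) >= margin:  # adopt only confident predictions
                newly_labeled[text] = s > 0
        if not newly_labeled:
            break
        labeled.update(newly_labeled)
    return [t for t in candidates if labeled.get(t)]


# Made-up example tweets; the first two carry manual seed labels.
tweets = [
    "Walking my dog before work, then off to the vet",
    "I wish my dog could talk lol",
    "Took my cat to the vet before work today",
    "lol I wish my cat paid rent",
    "Nice weather today",
]
seed_labels = {tweets[0]: True, tweets[1]: False}

candidates = select_candidates(tweets)
kept = self_training_filter(candidates, seed_labels)
```

In this toy run the background-information tweets are kept, while the wishful and humorous ones are filtered out, using only two manual labels. A real pipeline would need a proper classifier and far more data.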

Year	DOI	Venue
2017	10.1145/3099023.3099048	UMAP (Adjunct Publication)
Field	DocType	Citations
Personal details, World Wide Web, Computer science, Labeled data	Conference	0
PageRank	References	Authors
0.34	10	3

Authors (3 rows)

Name	Order	Citations	PageRank
Yasen Kiprov	1	15	4.94
Pepa Gencheva	2	29	8.87
Ivan Koychev	3	57	6.16
