Creating Extended Gender Labelled Datasets of Twitter Users. - Citegraph

Paper Info

Title
Creating Extended Gender Labelled Datasets of Twitter Users.

Abstract
The gender of a Twitter user is not known a priori when analysing Twitter data, because user registration does not include gender information. This paper proposes an approach for creating extended gender labelled datasets of Twitter users. The process starts by creating a smaller database of active Twitter users and manually labelling their gender. It continues by extracting features from the unstructured information found in each user profile and by creating a gender classification model. The model is then applied to a larger dataset, providing automatic labels and corresponding confidence scores, which can be used to identify the most accurately labelled users. The resulting databases can be further enriched with additional information extracted, for example, from the profile picture and from the user location. The proposed approach was successfully applied to English and Portuguese users, leading to two large datasets containing more than 57K labelled users each.
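
The pipeline described in the abstract (label a small seed set, train a classifier, apply it to a larger set, keep the most confident labels) can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not name the classifier or the features, so the naive Bayes model over whitespace tokens, the function names (`tokens`, `train`, `predict`, `extend_labels`), the 0.7 threshold, and the toy profiles are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokens(profile_text):
    # Assumption: lowercased whitespace tokens stand in for the paper's
    # (unspecified here) profile feature extraction.
    return profile_text.lower().split()


def train(labelled):
    """Fit naive Bayes counts from (profile_text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled:
        class_counts[label] += 1
        word_counts[label].update(tokens(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return (label, confidence); confidence is the best class posterior."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in tokens(text):
            if w in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    # Normalise the log scores into posteriors.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    best = max(exp, key=exp.get)
    return best, exp[best] / z


def extend_labels(model, unlabelled, threshold=0.7):
    """Auto-label profiles, keeping only those at or above the threshold."""
    kept = []
    for text in unlabelled:
        label, confidence = predict(model, text)
        if confidence >= threshold:
            kept.append((text, label, confidence))
    return kept


# Toy, made-up seed set standing in for the manually labelled users.
seed = [
    ("maria proud mom", "F"),
    ("ana mother of two", "F"),
    ("john father and husband", "M"),
    ("pedro dad football fan", "M"),
]
model = train(seed)
extended = extend_labels(model, ["maria mom", "pedro football", "news updates"])
```

With this toy data, the profile with no informative tokens ("news updates") receives a confidence equal to the class prior and is dropped, which mirrors the idea of using confidence scores to keep only the more reliable automatic labels.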

Year	DOI	Venue
2016	10.1007/978-3-319-40581-0_56	Communications in Computer and Information Science
Keywords	Field	DocType
Gender classification,Twitter users,Gender database,Text mining	World Wide Web,User profile,Computer science,A priori and a posteriori	Conference
Volume	ISSN	Citations
611	1865-0929	1
PageRank	References	Authors
0.37	12	3

Authors (3 rows)

Name	Order	Citations	PageRank
Marco Vicente	1	2	0.73
Fernando Batista	2	115	21.04
João Paulo Carvalho	3	110	17.52