Abstract |
---|
This paper presents an approach to creating a topic-focused dataset of short texts (social data) using classification. With the growing use of the Internet, social networks have become the most advanced tool for information sharing among communities. Communities from different backgrounds use globally renowned social networks, often using and promoting their own cultures and languages. Such information exchange therefore turns social networks into multilingual information hubs. A number of behavioral and demographic analytical studies that use data from social networks have been reported, but most of them are performed on English. In this study, we focus on developing a topic-oriented bilingual dataset that can be used as a corpus for further analytical studies. The languages covered are English and Roman-Urdu, which is used by about 8 million active social network users. The main contribution is a bilingual classifier, which is used to create a dataset of classified English and Roman-Urdu tweets. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1007/978-3-319-08979-9_40 | Lecture Notes in Artificial Intelligence |
Keywords | Field | DocType |
---|---|---|
Bi-Lingual Classification, Twitter Dataset, Language Resources | Data science, Social network, Computer science, Information exchange, Classifier (linguistics), Information sharing, The Internet | Conference |
Volume | ISSN | Citations |
---|---|---|
8556 | 0302-9743 | 2 |
PageRank | References | Authors |
---|---|---|
0.43 | 3 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Iqra Javed | 1 | 7 | 1.55 |
Hammad Afzal | 2 | 41 | 11.31 |