Collecting and Analysing Chats and Tweets in SoNaR

Paper Info

Abstract
This paper describes a collection of chats and tweets from the Netherlands and Flanders. The chats and tweets are part of the freely available SoNaR corpus, a 500-million-word text corpus of the Dutch language. Recruitment, metadata, anonymisation and IPR issues are discussed. To illustrate differences in language use between the text types and across other parameters (such as gender and age), simple text analysis in the form of unigram frequency lists is carried out. Furthermore, a website is presented with which users can retrieve their own frequency lists.
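
As an illustration of the unigram frequency lists mentioned in the abstract, the following is a minimal sketch and not the paper's actual tooling. The function name `unigram_frequencies`, the crude `\w+` tokenizer, and the toy Dutch messages are all assumptions made for this example; none of them come from SoNaR.

```python
import re
from collections import Counter


def unigram_frequencies(texts):
    """Count lowercased word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Assumption: \w+ is a crude tokenizer that splits on whitespace
        # and punctuation; a real corpus pipeline would use a proper one.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Hypothetical toy messages grouped by text type (not SoNaR data).
samples = {
    "chat": ["hoi hoe gaat het", "het gaat goed"],
    "tweet": ["goed nieuws vandaag", "het is vandaag mooi weer"],
}

# One frequency list per text type, so the lists can be compared.
frequency_lists = {
    kind: unigram_frequencies(messages) for kind, messages in samples.items()
}

for kind, counts in frequency_lists.items():
    print(kind, counts.most_common(3))
```

Grouping messages by another metadata field (for example gender or age band) instead of text type would give the per-parameter lists the abstract refers to, provided that metadata is available for each message.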

Year	Venue	Keywords
2012	LREC 2012 - Eighth International Conference on Language Resources and Evaluation	SoNaR, social media, chats, tweets, corpus collection, corpus analysis
Field	DocType	Citations
Metadata, Text mining, Information retrieval, Computer science, Text types, Text corpus, Speech recognition, Sonar, Artificial intelligence, Natural language processing	Conference	4
PageRank	References	Authors
0.70	1	1

Authors (1 row)

Name	Order	Citations	PageRank
Eric Sanders	1	138	27.90