Title
Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics.
Abstract
This paper examines and evaluates the current use of the Web-as-Corpus (WaC) paradigm in Chinese corpus linguistics. I argue that the unstable notion of wordhood in Chinese, and the resulting diversity of approaches to implementing word segmentation systems, pose great challenges for those who want to build web-scale corpora. Two lexical measures are proposed to illustrate the issues, and methodological discussions are provided.
Year
2014
Venue
LREC 2014 - Ninth International Conference on Language Resources and Evaluation
Keywords
Corpus evaluation, word segmentation, Web as Corpus
Field
Computer science, Speech recognition, Text segmentation, Artificial intelligence, Corpus linguistics, Natural language processing, Linguistics, Big data
DocType
Conference
Citations
0
PageRank
0.34
References
3
Authors
1
Name
Order
Citations
PageRank
Shu-kai Hsieh 14721.47