Title
New word identification in social network text based on time series information
Abstract
Unlike languages widely used in Western countries, such as English or French, Chinese has no spaces between words, so text must be segmented before any higher-level processing. New word identification is an important problem in segmentation, especially when the targets are social network texts, which contain more abbreviations and other non-standard forms. Several methods have been proposed to detect Chinese new words, but most of them treat the corpus as a static set and ignore time domain information. In contrast, we regard our social network corpus as a text series spread along the timeline and design a new kind of feature, called dynamic features, which reflect how a string's statistical features vary over time. Experimental results on a dataset crawled from the largest microblogging application in China show that this method can significantly improve the performance of Chinese new word identification.
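The abstract does not say how the dynamic features are computed, so the following Python sketch is only an illustration of the general idea: split a timestamped corpus into time slices, track a candidate string's relative frequency per slice, and summarize how that frequency changes. The function names, the equal-width slicing, and the particular summary statistics (variance and largest slice-to-slice jump) are all assumptions made for this example, not the authors' method.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every character n-gram in one text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def frequency_series(posts, candidate, num_slices):
    """Relative frequency of `candidate` in each equal-width time slice.

    `posts` is a list of (timestamp, text) pairs with numeric timestamps.
    The slicing scheme is an assumption, not taken from the paper.
    """
    times = [t for t, _ in posts]
    start, end = min(times), max(times)
    # Fall back to a width of 1.0 when all timestamps are identical.
    width = (end - start) / num_slices or 1.0
    hits = [0] * num_slices
    totals = [0] * num_slices
    n = len(candidate)
    for t, text in posts:
        # Clamp so the latest post falls into the last slice.
        idx = min(int((t - start) / width), num_slices - 1)
        counts = ngram_counts(text, n)
        hits[idx] += counts[candidate]
        totals[idx] += sum(counts.values())
    return [h / tot if tot else 0.0 for h, tot in zip(hits, totals)]


def dynamic_features(series):
    """Hypothetical dynamic features: mean, variance, largest upward jump."""
    mean = sum(series) / len(series)
    variance = sum((x - mean) ** 2 for x in series) / len(series)
    max_jump = max((b - a for a, b in zip(series, series[1:])), default=0.0)
    return {"mean": mean, "variance": variance, "max_jump": max_jump}
```

A string whose frequency is near zero in early slices and rises sharply later would show a large `max_jump`, which is one plausible signal that it is a newly coined word rather than a long-established one. Such values could be appended to the static statistical features of a candidate string before classification.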
Year
2014
DOI
10.1109/CSCWD.2014.6846904
Venue
CSCWD
Keywords
time domain,western countries,microblogging application,social network text,social network,string statistical features,time series information,social network corpus,segmentation process,chinese language,chinese new words,english,internet,new word identification,segmentation targets,time domain information,natural language processing,social networking (online),text analysis,french,time series,text series,entropy,feature extraction,vectors
Field
Time domain,Social media,Social network,Segmentation,Computer science,Microblogging,Artificial intelligence,Natural language processing,Time line
DocType
Conference
Citations
0
PageRank
0.34
References
7
Authors
3
Name          Order   Citations   PageRank
Meng Wang     1       4           1.41
Lanfen Lin    2       78          4.70
Feng Wang     3       20          2.34