Abstract | ||
---|---|---|
Unlike languages such as English or French, written Chinese does not separate words with spaces, so text must be segmented before further processing. New word identification is an important problem in segmentation, especially for social network texts, which contain more abbreviations and other non-standard forms. Several methods have been proposed to detect Chinese new words, but most treat the corpus as a static set and ignore time-domain information. In contrast, we regard our social network corpus as a text series spread along a timeline and design a new kind of feature, called dynamic features, that reflects how a string's statistical features vary over time. Experimental results on a dataset crawled from the largest microblogging application in China show that this method can significantly improve Chinese new word identification. |
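
The abstract does not spell out how the dynamic features are computed, so the following is only an illustrative sketch of the general idea: slice a timestamped corpus into time windows, measure a candidate string's relative frequency in each window, and summarize how that statistic changes over time. The function names (`window_frequencies`, `dynamic_features`), the equal-width windows, and the chosen summaries (mean, standard deviation, largest jump) are all assumptions made for illustration, not the authors' feature set.

```python
from statistics import mean, pstdev


def window_frequencies(posts, candidate, num_windows):
    """Relative frequency of `candidate` per time window (hypothetical helper).

    `posts` is a list of (timestamp, text) pairs with numeric timestamps.
    Frequency is occurrences of the candidate divided by characters in the window.
    """
    if not posts:
        return [0.0] * num_windows
    times = [t for t, _ in posts]
    start, end = min(times), max(times)
    span = (end - start) or 1  # avoid division by zero for a single timestamp
    counts = [0] * num_windows
    chars = [0] * num_windows
    for t, text in posts:
        # Map the timestamp to a window index; the last timestamp goes in the last window.
        idx = min(int((t - start) / span * num_windows), num_windows - 1)
        counts[idx] += text.count(candidate)
        chars[idx] += len(text)
    return [c / n if n else 0.0 for c, n in zip(counts, chars)]


def dynamic_features(freqs):
    """Summaries of how a per-window statistic varies over time (assumed choices)."""
    jumps = [b - a for a, b in zip(freqs, freqs[1:])]
    return {
        "mean": mean(freqs),
        "std": pstdev(freqs),                # spread across windows
        "max_jump": max(jumps, default=0.0), # largest rise between adjacent windows
    }


if __name__ == "__main__":
    posts = [(0, "aaXY"), (1, "bbbb"), (2, "XYXY"), (3, "XYcc")]
    freqs = window_frequencies(posts, "XY", num_windows=2)
    print(freqs, dynamic_features(freqs))
```

A string whose frequency rises sharply between windows would score a large `max_jump` here, which is one plausible signal of a newly emerging word; a classifier could take such values alongside static statistics.
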
Year | DOI | Venue |
---|---|---|
2014 | 10.1109/CSCWD.2014.6846904 | CSCWD |
Keywords | Field | DocType |
time domain,western countries,microblogging application,social network text,social network,string statistical features,time series information,social network corpus,segmentation process,chinese language,chinese new words,english,internet,new word identification,segmentation targets,time domain information,natural language processing,social networking (online),text analysis,french,time series,text series,entropy,feature extraction,vectors | Time domain,Social media,Social network,Segmentation,Computer science,Microblogging,Artificial intelligence,Natural language processing,Time line | Conference |
Citations | PageRank | References |
0 | 0.34 | 7 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Meng Wang | 1 | 4 | 1.41 |
Lanfen Lin | 2 | 78 | 24.70 |
Feng Wang | 3 | 20 | 2.34 |