Title
DGeoSegmenter: A dictionary-based Chinese word segmenter for the geoscience domain.
Abstract
The growing number of geoscience reports creates both challenges and opportunities for data analysis and knowledge discovery. Because written Chinese has no spaces between words, text must first be segmented into semantically and syntactically meaningful words, a task known as Chinese word segmentation (CWS). CWS is a crucial first step in natural language processing (NLP). Although available generic segmenters can process geoscience reports, their performance degrades dramatically because they lack sufficient domain knowledge. Hence, developing effective segmenters for geoscience text remains a challenge and requires more work.
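To make the CWS problem concrete, the following is a minimal sketch of dictionary-based segmentation scored with a unigram language model, the general technique named in the title and keywords. It is not the DGeoSegmenter implementation: the function name `segment`, the toy dictionaries, the probability values, and the `unknown_logp` penalty are all assumptions chosen for illustration.

```python
import math


def segment(text, unigram_probs, max_word_len=4, unknown_logp=-20.0):
    """Split text into the word sequence with the highest unigram log-probability.

    unigram_probs maps dictionary words to toy probabilities (hypothetical values).
    Single characters missing from the dictionary get a fixed penalty so that
    every input can still be segmented.
    """
    n = len(text)
    # best[i] = (best score for text[:i], start index of the last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in unigram_probs:
                logp = math.log(unigram_probs[word])
            elif len(word) == 1:
                logp = unknown_logp  # out-of-dictionary single character
            else:
                continue  # multi-character strings must be dictionary words
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the stored start indices backwards to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy "generic" dictionary (illustrative scores, not normalized corpus estimates).
generic = {"地层": 0.02, "中": 0.05, "发育": 0.02, "石灰": 0.01, "岩": 0.01}
# The same dictionary extended with one geoscience term, "石灰岩" (limestone).
domain = {**generic, "石灰岩": 0.01}

text = "地层中发育石灰岩"
generic_result = segment(text, generic)
domain_result = segment(text, domain)
print(generic_result)
print(domain_result)
```

With the generic dictionary, the term for limestone is split into two pieces; once the domain term is added, it is kept as a single word. This mirrors, on a toy scale, the abstract's point that generic segmenters degrade without domain knowledge.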
Year: 2018
DOI: 10.1016/j.cageo.2018.08.006
Venue: Computers & Geosciences
Keywords: Chinese word segmentation, Geoscience reports, Unigram language model, Natural language processing
Field: Market segmentation, Domain knowledge, Computer science, Earth science, Text segmentation, Knowledge extraction, Artificial intelligence, Deep learning, Language model
DocType: Journal
Volume: 121
ISSN: 0098-3004
Citations: 1
PageRank: 0.35
References: 28
Authors: 4
Name
Order
Citations
PageRank
Qinjun Qiu120.72
Zhong Xie23412.55
Liang Wu3335.49
Wenjia Li425128.60