Abstract | ||
---|---|---|
The growing number of geoscience reports creates both challenges and opportunities for data analysis and knowledge discovery. Because written Chinese has no spaces between words, text must first be segmented into semantically and syntactically meaningful words, a task known as Chinese word segmentation (CWS). CWS is a crucial first step in natural language processing (NLP). Although available generic segmenters can process geoscience reports, their performance degrades dramatically because they lack sufficient domain knowledge. Hence, developing effective segmenters remains a challenge and requires more work. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1016/j.cageo.2018.08.006 | Computers & Geosciences |
Keywords | Field | DocType |
Chinese word segmentation, Geoscience reports, Unigram language model, Natural language processing | Market segmentation, Domain knowledge, Computer science, Earth science, Text segmentation, Knowledge extraction, Artificial intelligence, Deep learning, Language model | Journal |
Volume | ISSN | Citations |
121 | 0098-3004 | 1 |
PageRank | References | Authors |
0.35 | 28 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qinjun Qiu | 1 | 2 | 0.72 |
Zhong Xie | 2 | 34 | 12.55 |
Liang Wu | 3 | 33 | 5.49 |
Wenjia Li | 4 | 251 | 28.60 |