Abstract | ||
---|---|---|
With the emergence of the mobile Internet, the Internet of Things, and cloud computing, the field of information security is developing rapidly. As a result, a constant stream of compound-words describing new concepts and new technologies has arisen. However, existing dictionaries do not incorporate these new compound-words in time, so the words cannot be identified correctly. To solve this problem, this paper presents an SVM-based compound-word recognition method for information security. The method builds on the output of an existing word segmentation system. It constructs an adjacent atom-word digraph according to statistical co-occurrence features and lexical rules. Next, it produces a compound-word candidate set through depth-first traversal of the digraph under the longest match principle. It then filters the candidate set using an SVM classifier with the help of a domain contrast corpus and a computer dictionary. We use this method to identify new compound-words in 2200 vulnerability description texts, achieving a precision of 82.25% and a recall of 77.44%. The results show that our method can effectively identify new compound-words in information security from a large-scale corpus. |
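
The candidate-generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the co-occurrence features or lexical rules, so a plain adjacency-count threshold (the hypothetical parameter `min_count`) stands in for them, and the longest match principle is approximated by starting the depth-first traversal only at atom-words that have no predecessor in the digraph. All function names are assumptions made for illustration, and the SVM filtering stage is not shown.

```python
from collections import defaultdict


def build_digraph(sentences, min_count=2):
    """Add an edge a -> b when atom-word b directly follows a at least
    min_count times in the segmented corpus (assumed stand-in for the
    paper's co-occurrence features and lexical rules)."""
    counts = defaultdict(int)
    for tokens in sentences:
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            graph[a].add(b)
    return dict(graph)


def maximal_paths(graph, start):
    """Depth-first traversal from start; keep only paths that cannot be
    extended further without revisiting an atom-word."""
    paths = []
    stack = [(start, (start,))]
    while stack:
        node, path = stack.pop()
        nexts = [n for n in sorted(graph.get(node, ())) if n not in path]
        if not nexts:
            if len(path) > 1:
                paths.append(path)
            continue
        for n in nexts:
            stack.append((n, path + (n,)))
    return paths


def candidate_set(graph):
    """Collect maximal paths that begin at atom-words with no incoming
    edge, so shorter suffixes of a longer candidate are not emitted."""
    targets = {b for successors in graph.values() for b in successors}
    sources = sorted(a for a in graph if a not in targets)
    found = []
    for source in sources:
        found.extend(maximal_paths(graph, source))
    return found


if __name__ == "__main__":
    corpus = [
        ["remote", "buffer", "overflow", "vulnerability"],
        ["buffer", "overflow", "vulnerability", "found"],
        ["buffer", "overflow", "attack"],
    ]
    for candidate in candidate_set(build_digraph(corpus)):
        print(" ".join(candidate))
```

In a fuller pipeline, each candidate would then be turned into a feature vector (for example, its frequency in the domain corpus versus the contrast corpus) and passed to an SVM classifier for filtering; those features are not detailed in the abstract, so they are left out here.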
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/FSKD.2013.6816310 | FSKD |
Keywords | Field | DocType |
domain contrast corpus,svm-based compound-word recognition method,longest match principle,word processing,statistical analysis,pattern classification,word segmentation system,statistical co-occurrence features,svm classifier,svm,depth first traversal,mobile internet,lexical rules,directed graphs,computer dictionary,atom-word digraph,compound-word,cloud computing,information security,support vector machines,vulnerability description texts,security of data,internet of things,filtering,feature extraction,dictionaries,compound word | Data mining,Computer science,Compound,Artificial intelligence,Digraph,Pattern recognition,Support vector machine,Information security,Filter (signal processing),Feature extraction,Text segmentation,Machine learning,Cloud computing | Conference |
Citations | PageRank | References |
0 | 0.34 | 3 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shixian Li | 1 | 118 | 7.32 |
Lei Zhang | 2 | 59 | 5.60 |
Bo Han | 3 | 9 | 3.60 |
Tingrui Lei | 4 | 0 | 0.34 |
Qing Wang | 5 | 345 | 76.64 |
Tao Peng | 6 | 0 | 0.34 |
Peng Cao | 7 | 64 | 21.46 |