Title
A SVM-based compound-word recognition method in information security
Abstract
With the emergence of the mobile Internet, the Internet of Things, and cloud computing, the field of information security is developing rapidly. As a result, a constant stream of compound words describing new concepts and technologies has arisen. However, existing dictionaries do not add these new compound words in time, so the words cannot be identified correctly. To solve this problem, this paper presents an SVM-based compound-word recognition method for information security. The method builds on the output of an existing word segmentation system. It constructs a digraph of adjacent atom words according to statistical co-occurrence features and lexical rules. Next, it produces a compound-word candidate set by depth-first traversal of the digraph under the longest-match principle. It then filters the candidate set using an SVM classifier with the help of a domain contrast corpus and a computer dictionary. We use this method to identify new compound words in 2200 vulnerability description texts. It achieves a precision of 82.25% and a recall of 77.44%. The results show that the method can effectively identify new compound words in information security from a large-scale corpus.
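The candidate-generation steps described in the abstract (adjacent atom-word digraph, then depth-first traversal with longest match) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: the function names `build_digraph` and `longest_paths`, the raw co-occurrence threshold `min_count`, the stopword set standing in for the lexical rules, and the toy sentences. The SVM filtering stage, the domain contrast corpus, and the computer dictionary are left out.

```python
from collections import Counter, defaultdict


def build_digraph(segmented_sentences, min_count=2, stopwords=frozenset()):
    """Add an edge a -> b when atom word b directly follows a often enough.

    min_count is a stand-in for the statistical co-occurrence features and
    stopwords is a stand-in for the lexical rules; both are assumptions.
    """
    pair_counts = Counter()
    for words in segmented_sentences:
        pair_counts.update(zip(words, words[1:]))
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_count and a not in stopwords and b not in stopwords:
            graph[a].add(b)
    return graph


def longest_paths(graph):
    """Depth-first traversal that keeps only paths that cannot be extended."""
    has_incoming = {b for successors in graph.values() for b in successors}
    # Start from words with no predecessor; fall back to all nodes if the
    # graph consists only of cycles.
    starts = [a for a in graph if a not in has_incoming] or list(graph)
    results = set()

    def dfs(path):
        extended = False
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in path:  # keep paths simple so cycles terminate
                extended = True
                dfs(path + [nxt])
        if not extended and len(path) > 1:
            results.add(tuple(path))  # longest match: emit only maximal paths

    for start in sorted(starts):
        dfs([start])
    return results


# Toy input: sentences as they might come out of a word segmentation system.
sentences = [
    ["a", "buffer", "overflow", "attack", "was", "found"],
    ["the", "buffer", "overflow", "attack", "failed"],
    ["sql", "injection", "was", "reported"],
    ["blind", "sql", "injection", "is", "common"],
]
graph = build_digraph(sentences, min_count=2, stopwords={"a", "the", "was", "is"})
candidates = longest_paths(graph)
print(sorted(candidates))
```

In this sketch a path through the digraph may join edges observed in different sentences, so the output is only a candidate set; a later filtering stage, such as the SVM classifier the abstract mentions, would decide which candidates to keep.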
Year
2013
DOI
10.1109/FSKD.2013.6816310
Venue
FSKD
Keywords
domain contrast corpus,svm-based compound-word recognition method,longest match principle,word processing,statistical analysis,pattern classification,word segmentation system,statistical co-occurrence features,svm classifier,svm,depth first traversal,mobile internet,lexical rules,directed graphs,computer dictionary,atom-word digraph,compound-word,cloud computing,information security,support vector machines,vulnerability description texts,security of data,internet of things,filtering,feature extraction,dictionaries,compound word
Field
Data mining,Computer science,Compound,Artificial intelligence,Digraph,Pattern recognition,Support vector machine,Information security,Filter (signal processing),Feature extraction,Text segmentation,Machine learning,Cloud computing
DocType
Conference
Citations
0
PageRank
0.34
References
3
Authors
7
Name | Order | Citations PageRank
Shixian Li | 1 | 1187.32
Lei Zhang | 2 | 595.60
Bo Han | 3 | 93.60
Tingrui Lei | 4 | 00.34
Qing Wang | 5 | 34576.64
Tao Peng | 6 | 00.34
Peng Cao | 7 | 6421.46