Title
Identifying long tail term from large-scale candidate pairs for big data-oriented patent analysis.
Abstract
Patents are an important and valuable type of scientific and technical big data. This paper presents how to mine massive patent texts to obtain valuable information and knowledge from the large-scale candidates extracted from those patents. First, we propose a patent term extraction method that uses co-occurrence in the abstract and first-claim sections of patent records. It has three steps: (1) we extract candidate strings according to our definition of a term; (2) we propose an assumption for verifying whether a candidate string is a qualified term, based on the co-occurrence of terms in the abstract and first claim; and (3) we use term frequency-inverse document frequency (TF-IDF) or mutual information to rank and select candidate terms. Second, we propose a new method to obtain valuable long tail terms from patents. To this end, (1) we first build long tail term-common term pairs as the candidate set; (2) we then evaluate the value of each candidate pair; and (3) finally, to demonstrate our method, we give an example from our results. This study provides a new perspective on extracting terms from the free text of patent records and also proposes a new method to obtain valuable long tail terms to aid information analysis with massive patent texts. Copyright © 2016 John Wiley & Sons, Ltd.
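The term extraction steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact co-occurrence assumption or the scoring formula, so the sketch assumes that a candidate qualifies when it appears in both the abstract and the first claim of at least one record, and it uses a smoothed TF-IDF variant. The record fields `abstract` and `first_claim`, the function names, and plain substring matching are all hypothetical choices made for illustration.

```python
import math


def filter_by_cooccurrence(records, candidates):
    """Keep candidates found in both the abstract and the first claim
    of at least one record (assumed reading of the co-occurrence check)."""
    kept = []
    for cand in candidates:
        if any(cand in r["abstract"] and cand in r["first_claim"] for r in records):
            kept.append(cand)
    return kept


def rank_by_tfidf(records, terms):
    """Rank terms by total frequency times a smoothed inverse document frequency."""
    n_docs = len(records)
    # Treat abstract plus first claim as one document per patent record.
    texts = [r["abstract"] + " " + r["first_claim"] for r in records]
    scores = {}
    for term in terms:
        tf = sum(text.count(term) for text in texts)      # total occurrences
        df = sum(1 for text in texts if term in text)     # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0     # smoothed IDF (illustrative variant)
        scores[term] = tf * idf
    # Highest score first; ties keep their input order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A typical call would pass the output of `filter_by_cooccurrence` into `rank_by_tfidf` and keep the top-scoring terms. Mutual information, which the abstract names as an alternative ranking criterion, could replace the scoring line without changing the rest of the sketch.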
Year: 2016
DOI: 10.1002/cpe.3792
Venue: Concurrency and Computation: Practice and Experience
Keywords: term extraction, long tail term, patent analysis, scientific big data
Field: Data mining, Information retrieval, Computer science, Mutual information, Patent analysis, Big data
DocType: Journal
Volume: 28
Issue: 15
ISSN: 1532-0626
Citations: 3
PageRank: 0.40
References: 18
Authors: 4
Name            Order  Citations  PageRank
Peng Qu         1      10         2.52
Junsheng Zhang  2      203        25.16
Changqing Yao   3      22         6.71
Wen Zeng        4      9          2.85