Identifying long tail term from large-scale candidate pairs for big data-oriented patent analysis. - Citegraph

Paper Info

Title
Identifying long tail term from large-scale candidate pairs for big data-oriented patent analysis.

Abstract
Patents are an important and valuable type of scientific and technical big data. This paper presents how to mine massive patent texts to obtain valuable information and knowledge from large-scale candidate sets. First, we propose a patent term extraction method using co-occurrence in the abstract and first-claim sections of patent records. There are three steps: (1) we extract candidate strings according to our definition of a term; (2) we propose an assumption to verify whether a candidate string is a qualified term by using the co-occurrence of terms in the abstract and first claim; and (3) we use term frequency-inverse document frequency (TF-IDF) or mutual information to rank and select candidate terms. Second, we propose a new method to obtain valuable long tail terms from patents. To this end, (1) we build long tail term-common term pairs as the candidate set; (2) we evaluate each candidate pair's value; and (3) we demonstrate the method with an example from our results. This study provides a new perspective on extracting terms from the free text of patent records and also proposes a new method to obtain valuable long tail terms to aid information analysis with massive patent texts. Copyright © 2016 John Wiley & Sons, Ltd.
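
The verification and ranking steps described in the abstract can be illustrated with a rough sketch. The abstract does not give the paper's term definition, its co-occurrence assumption, or its exact TF-IDF formula, so everything below is an assumption made for illustration: the function names are hypothetical, matching is plain case-insensitive substring search, and the score is total term count multiplied by log(N / document frequency).

```python
import math


def cooccurring_candidates(abstract, first_claim, candidates):
    """Keep candidates found in BOTH the abstract and the first claim.

    Assumption: case-insensitive substring matching stands in for the
    paper's (unspecified) co-occurrence test.
    """
    a, c = abstract.lower(), first_claim.lower()
    return [t for t in candidates if t.lower() in a and t.lower() in c]


def tf_idf_scores(docs, terms):
    """Score each term as (total count over docs) * log(N / doc frequency).

    Assumption: this is one common TF-IDF variant, not necessarily the
    one used in the paper.
    """
    n = len(docs)
    lowered = [d.lower() for d in docs]
    scores = {}
    for term in terms:
        counts = [d.count(term.lower()) for d in lowered]
        df = sum(1 for k in counts if k > 0)
        # A term that never occurs gets a score of zero.
        scores[term] = sum(counts) * math.log(n / df) if df else 0.0
    return scores


def rank_terms(docs, terms):
    """Return terms sorted from highest to lowest score."""
    scores = tf_idf_scores(docs, terms)
    return sorted(terms, key=lambda t: scores[t], reverse=True)
```

Under this variant, a term that appears in every document scores zero, so ranking pushes ubiquitous terms down and rarer ones up, which is presumably one reason TF-IDF suits the selection step.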

Year	DOI	Venue
2016	10.1002/cpe.3792	Concurrency and Computation: Practice and Experience
Keywords	Field	DocType
term extraction,long tail term,patent analysis,scientific big data	Data mining,Information retrieval,Computer science,Mutual information,Patent analysis,Big data	Journal
Volume	Issue	ISSN
28	15	1532-0626
Citations	PageRank	References
3	0.40	18
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Peng Qu	1	10	2.52
Junsheng Zhang	2	203	25.16
Changqing Yao	3	22	6.71
Wen Zeng	4	9	2.85
