Large-scale, parallel automatic patent annotation - Citegraph

Paper Info

Title
Large-scale, parallel automatic patent annotation

Abstract
When researching new product ideas or filing new patents, inventors need to retrieve all relevant pre-existing know-how and/or to exploit and enforce patents in their technological domain. However, this process is hindered by the lack of richer metadata, which, if present, would allow more powerful concept-based search to complement the current keyword-based approach. This paper presents our approach to automatic patent enrichment, tested in large-scale, parallel experiments on USPTO and EPO documents. It starts by defining the metadata annotation task and examines its challenges. The text analysis tools are presented next, including details on automatic annotation of sections, references and measurements. The key challenges encountered were dealing with ambiguities and errors in the data; creation and maintenance of large, domain-independent dictionaries; and building an efficient, robust patent analysis pipeline, capable of dealing with terabytes of data. The accuracy of automatically created metadata is evaluated against a human-annotated gold standard, with results of over 90% on most annotation types.

Year	DOI	Venue
2008	10.1145/1458572.1458574	PaIR
Keywords	Field	DocType
text analysis tool,automatic patent enrichment,parallel automatic patent annotation,richer metadata,new product idea,metadata annotation task,robust patent analysis pipeline,automatic annotation,current keyword-based approach,new patent,annotation type,gold standard,text analysis,parallel,gate,information extraction	Metadata,Annotation,Information retrieval,Terabyte,Computer science,Image retrieval,Exploit,Information extraction,Patent analysis,New product development	Conference
Citations	PageRank	References	Authors
7	0.70	5	8

Authors (8 rows)

Name	Order	Citations	PageRank
Milan Agatonovic	1	175	8.08
Niraj Aswani	2	189	11.21
Kalina Bontcheva	3	2538	211.33
Hamish Cunningham	4	2426	255.41
Thomas Heitz	5	7	0.70
Yaoyong Li	6	393	26.55
Ian Roberts	7	207	17.68
Valentin Tablan	8	1359	119.57
