Mining Patents Using Molecular Similarity Search - Citegraph

Paper Info

Title
Mining Patents Using Molecular Similarity Search

Abstract
Text analytics is becoming an increasingly important tool in biomedical research. While advances continue to be made in the core algorithms for entity identification and relation extraction, a need for practical applications of these technologies has arisen. We developed a system that allows users to explore the US Patent corpus using molecular information. The core of our system consists of three main technologies: a high-performing chemical annotator that identifies chemical terms and converts them to structures, a similarity search engine based on the emerging IUPAC International Chemical Identifier (InChI) standard, and a set of on-demand data mining tools. By leveraging this technology, we were able to rapidly identify and index 3,623,248 unique chemical structures from 4,375,036 US Patents and Patent Applications. Using this system, a user may go to a web page, draw a molecule, search for related Intellectual Property (IP), and analyze the results. Our results prove that this is a far more effective way of identifying IP than traditional keyword-based approaches.
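The abstract names an InChI-based similarity search engine but does not state which similarity measure it uses. The Python sketch below shows one way such a search could work, under stated assumptions: each structure is represented by its standard InChI string, structures are compared by the Jaccard similarity of their character trigrams, and each patent is scored by its best-matching structure. The trigram/Jaccard measure, the `build_index` and `search` helpers, the 0.25 cutoff, and the `patent-A`-style identifiers are hypothetical choices for illustration, not the authors' implementation. The InChI strings are the standard ones for ethanol, benzene, 1-propanol, and methanol.

```python
"""Minimal sketch of InChI-based similarity search over patents.

ASSUMPTION: character-trigram Jaccard similarity on InChI strings is an
illustrative stand-in; the paper's actual similarity measure is not given here.
"""
from __future__ import annotations

from collections import defaultdict

PREFIX = "InChI=1S/"  # version prefix shared by all standard InChIs


def ngrams(inchi: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of an InChI, minus its shared prefix."""
    body = inchi[len(PREFIX):] if inchi.startswith(PREFIX) else inchi
    return {body[i:i + n] for i in range(len(body) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_index(patents: dict[str, list[str]]) -> dict[str, set[str]]:
    """HYPOTHETICAL helper: invert a patent -> InChIs mapping into InChI -> patent IDs."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for patent_id, inchis in patents.items():
        for inchi in inchis:
            index[inchi].add(patent_id)
    return dict(index)


def search(query: str, index: dict[str, set[str]],
           threshold: float = 0.25) -> list[tuple[str, float]]:
    """HYPOTHETICAL helper: rank patents by their best structure's similarity to the query.

    The 0.25 cutoff is an arbitrary illustrative value. This is a linear scan
    that compares the query against every indexed structure.
    """
    query_grams = ngrams(query)
    best: dict[str, float] = {}
    for inchi, patent_ids in index.items():
        score = jaccard(query_grams, ngrams(inchi))
        if score < threshold:
            continue
        for patent_id in patent_ids:
            # Keep the highest-scoring structure seen for each patent.
            best[patent_id] = max(score, best.get(patent_id, 0.0))
    # Highest score first; ties broken by patent ID for deterministic output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical patents containing real standard InChIs.
    ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    patents = {
        "patent-A": [ETHANOL, "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"],  # ethanol, benzene
        "patent-B": ["InChI=1S/C3H8O/c1-2-3-4/h4H,2-3H2,1H3"],        # 1-propanol
        "patent-C": ["InChI=1S/CH4O/c1-2/h2H,1H3"],                    # methanol
    }
    for patent_id, score in search(ETHANOL, build_index(patents)):
        print(f"{patent_id}\t{score:.2f}")
```

Querying with ethanol ranks `patent-A` first (exact match, score 1.0) and `patent-B` (1-propanol) second, while methanol and benzene fall below the cutoff. A corpus of millions of structures would call for pruning candidates, for example with an inverted index from n-grams to structures, rather than this full scan; that design choice is likewise an assumption, not something the abstract specifies.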

Year: 2007
Venue: Pacific Symposium on Biocomputing 2007
Keywords: chemical similarity, data mining, patents, search engine, InChI
Field: Data science, Web page, Biology, Information retrieval, Identifier, Chemical nomenclature, Search analytics, Intellectual property, Bioinformatics, Nearest neighbor search, Relationship extraction, Goto
DocType: Conference
ISSN: 2335-6936
Citations: 16
PageRank: 2.05
References: 9
Authors: 5

Authors

Name	Order	Citations	PageRank
James Rhodes	1	42	11.56
Stephen Boyer	2	151	10.50
Jeffrey Kreulen	3	209	33.59
Ying Chen	4	134	17.07
Patricia Ordonez	5	25	3.63
