Title
BibPro: A Citation Parser Based on Sequence Alignment Techniques
Abstract
The dramatic increase in the number of academic publications has led to a growing demand for efficient organization of these resources to meet researchers’ specific needs. As a result, a number of network services have compiled databases from the public resources scattered over the Internet. However, publications in different conferences and journals follow different citation formats, so the problem of accurately extracting metadata from a publication string has also attracted a great deal of attention in recent years. In this paper, we extend our previous work to propose a new tool called BibPro for extracting metadata from citation strings by using a gene sequence alignment tool. The main enhancement of BibPro over our previous tool is that BibPro does not need knowledge databases (e.g., an author name database) to generate feature indices for citation strings. Instead, only the order of punctuation marks in a citation string is used to represent its format. In addition, BibPro employs the Basic Local Alignment Search Tool (BLAST) to find the most similar citation formats in the database and then uses the Needleman-Wunsch algorithm to choose the best-fit citation format as the extraction template. Our experimental results show that, in terms of precision and recall, BibPro outperforms other existing systems (e.g., INFOMAP and ParaCite), and that BibPro scales well.
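The two ideas named in the abstract, representing a citation's format by the order of its punctuation marks and picking a template by Needleman-Wunsch alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scoring values (match 1, mismatch -1, gap -1), and the use of Python's ASCII-only `string.punctuation` are assumptions made for illustration, and the BLAST pre-filtering stage is omitted, so every template is scored directly.

```python
import string


def punctuation_signature(citation: str) -> str:
    """Return the ASCII punctuation marks of a citation, in order of appearance.

    Assumption: the format is approximated by punctuation only; the paper's
    actual encoding of a citation format may differ.
    """
    return "".join(ch for ch in citation if ch in string.punctuation)


def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Global alignment score of a and b (scoring values are illustrative)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def best_template(citation: str, templates: list[str]) -> str:
    """Pick the stored punctuation template that aligns best with the citation."""
    signature = punctuation_signature(citation)
    return max(templates, key=lambda t: needleman_wunsch_score(signature, t))
```

In this sketch a template is just a stored punctuation string; a real system would also have to record which metadata field sits between each pair of marks so that the chosen template can drive extraction.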
Year
2008
DOI
10.1109/WAINA.2008.125
Venue
AINA Workshops
Keywords
digital library,new tool,sequence alignment techniques,citation string,gene sequence alignment tool,similar citation format,data,publication string,citation extraction,author name database,different citation format,citation parser,different conference,sequence alignment,best-fit citation format,knowledge databases,hidden markov models,text analysis,data mining,citation analysis,information analysis,support vector machines,meta data,internet,needleman wunsch algorithm,scattering,indexation
Field
Metadata,Information retrieval,Computer science,Precision and recall,Citation,Citation analysis,Needleman–Wunsch algorithm,Digital library,Parsing,The Internet
DocType
Conference
Citations
7
PageRank
0.52
References
13
Authors
4
Name              Order  Citations  PageRank
Chien-Chih Chen   1      111        20.42
Kai-Hsiang Yang   2      119        14.19
Hung-Yu Kao       3      517        45.20
Jan-Ming Ho       4      950        106.64