Title
Assigning Schema Labels Using Ontology And Heuristics
Abstract
Bioinformatics data is growing at a phenomenal rate. Besides the exponential growth of individual databases, the number of data depositories is increasing too. Because of the complexity of biological concepts, bioinformatics data usually has complex data structures and cannot be easily captured with the relational model. As a result, various flat-file formats have been used. Although easy for humans to interpret, flat-file formats lack standards and are hard to recognize automatically, so manually written parsers are widely used to extract data from them. This has limited the readiness of the data for data consuming programs, such as integration systems. This paper presents a data mining based approach for automatically assigning schema labels to the attributes in a flat-file biological dataset. In conjunction with our prior work on semi-automatically identifying the delimiters and automatically generating parsers, automatic schema labeling offers a novel and practical solution for integrating biological datasets on-the-fly. Our approach to schema labeling is based on unsupervised learning and represents each attribute by the data values that occur most frequently in it. We combine the use of a biological ontology with heuristics, and we deal with noise in the datasets by using cutoff functions. Detailed experimental results from three datasets demonstrate the effectiveness of data mining for biological applications.
Year: 2006
DOI: 10.1109/BIBE.2006.253344
Venue: BIBE
Keywords: feature representation, automatically generating parsers, data consuming program, data depositories, biological concept, assigning schema, biological ontology, bioinformatics data, flat-file biological dataset, data structures, data consuming programs, automatically assigning schema labels, data extraction, delimiters, heuristic programming, biology computing, biological datasets on-the-fly, data depository, heuristics, ontologies (artificial intelligence), complex data structure, data value, data mining, grammars, integration systems, data mining based approach, unsupervised learning, manually written parsers, supervised learning, complex data, relational model, integrable system
Field: Data warehouse, Ontology, Data stream mining, Computer science, Unsupervised learning, Heuristics, Artificial intelligence, Bioinformatics, Parsing, Relational model, Schema (psychology), Machine learning
DocType: Conference
ISBN: 0-7695-2727-2
Citations: 1
PageRank: 0.36
References: 30
Authors: 3
Name            Order  Citations  PageRank
Xuan Zhang      1      110        18.58
Ruoming Jin     2      1637       91.73
Gagan Agrawal   3      2058       209.59