Title
Multi-label Associative Classification of Medical Documents from MEDLINE
Abstract
Providing convenient access to scientific documents has become a difficult problem because of the large and constantly increasing number of incoming documents and the extensive manual work associated with their storage, description, and classification. Users therefore need intelligent search and classification capabilities to find the information they require. This is especially true for repositories of scientific medical articles, given their extensive use, large size, high volume of new documents, and well-maintained structure. This research aims to provide an automated method for classifying articles into the structure of medical document repositories, supporting the extensive manual work currently performed. The proposed method classifies articles from the largest medical repository, MEDLINE, using state-of-the-art data mining technology. The method is based on a novel associative classification technique that considers recurrent items and, most importantly, the multi-label characteristic of the MEDLINE data. In large-scale experiments on 350,000 documents, several classification algorithms are compared, including both recurrent and non-recurrent associative classification. The algorithms can assign each medical document to several classes (multi-label classification) and are characterized by relatively high accuracy. We also investigate different measures of classification quality and point out the pros and cons of each. Based on the experimental results, we show that recurrent-item-based associative classification demonstrates superior performance, and we propose three alternative setups that allow the user to obtain different desired classification qualities.
Year
2005
DOI
10.1109/ICMLA.2005.47
Venue
ICMLA
Keywords
novel associative classification technique, associative classification, different classification algorithm, non-recurrent associative classification, medical documents, desired classification quality, multi-label classification, classification capability, multi-label associative classification, classification quality, extensive manual work, recurrent item, information retrieval systems, data mining, classification
Field
Library classification, Medical documents, Associative property, Information retrieval, Computer science, Web query classification, Document handling, Statistical classification, MEDLINE
DocType
Conference
ISBN
0-7695-2495-8
Citations
11
PageRank
0.65
References
19
Authors
3
Name, Order, Citations, PageRank
Rafal Rak, 1, 382, 18.30
Lukasz Kurgan, 2, 139, 5.89
Marek Reformat, 3, 736, 54.02