Title
Use of Domain Knowledge for Dimension Reduction - Application to Mining of Drug Side Effects.
Abstract
High dimensionality of datasets can impair the execution of most data mining programs and lead to the production of numerous and complex patterns that are difficult for experts to interpret. Dimension reduction of datasets is therefore an important research direction, in which domain knowledge plays an essential role. We present a new approach for reducing the dimensions of a dataset by exploiting semantic relationships between terms of an ontology structured as a rooted directed acyclic graph. Terms are clustered using the recently described IntelliGO similarity measure, and the resulting term clusters are then used as descriptors for data representation. The strategy is applied to a set of drugs and their side effects collected from the SIDER database, where side effects are described by terms from the MedDRA terminology. Hierarchical clustering of about 1,200 MedDRA terms into an optimal set of 112 term clusters yields a reduced data representation. Two data mining experiments are then conducted to illustrate the advantage of this reduced representation.
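The cluster-then-project idea in the abstract can be illustrated with a small sketch. It assumes a precomputed pairwise similarity between terms. The similarity table below is invented for illustration and only stands in for IntelliGO scores. Average linkage with a fixed number of clusters is an assumed choice, not necessarily the paper's clustering criterion. The names `average_linkage`, `reduce_by_clusters`, `SIM`, `DRUG_TERMS` and the drugs `drugA`/`drugB` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical term similarities (stand-ins for IntelliGO scores, not real values).
SIM: Dict[FrozenSet[str], float] = {
    frozenset(("Nausea", "Vomiting")): 0.9,
    frozenset(("Headache", "Migraine")): 0.8,
    frozenset(("Nausea", "Headache")): 0.1,
    frozenset(("Nausea", "Migraine")): 0.1,
    frozenset(("Vomiting", "Headache")): 0.1,
    frozenset(("Vomiting", "Migraine")): 0.2,
}
TERMS: List[str] = ["Nausea", "Vomiting", "Headache", "Migraine"]

# Hypothetical drug -> side-effect annotations (not SIDER data).
DRUG_TERMS: Dict[str, Set[str]] = {
    "drugA": {"Nausea", "Vomiting"},
    "drugB": {"Vomiting", "Migraine"},
}


def average_linkage(terms: List[str], sim: Dict[FrozenSet[str], float],
                    n_clusters: int) -> Dict[str, int]:
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest mean pairwise similarity until n_clusters remain."""
    clusters: List[List[str]] = [[t] for t in terms]

    def mean_sim(a: List[str], b: List[str]) -> float:
        return sum(sim[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j after merging is safe.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: mean_sim(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    # Map every term to the index of its final cluster.
    return {t: k for k, members in enumerate(clusters) for t in members}


def reduce_by_clusters(drug_terms: Dict[str, Set[str]],
                       term_to_cluster: Dict[str, int]) -> Dict[str, Set[int]]:
    """Describe each drug by cluster ids instead of individual terms;
    terms missing from the mapping are dropped."""
    return {drug: {term_to_cluster[t] for t in terms if t in term_to_cluster}
            for drug, terms in drug_terms.items()}


mapping = average_linkage(TERMS, SIM, n_clusters=2)
reduced = reduce_by_clusters(DRUG_TERMS, mapping)
print(mapping)   # {'Nausea': 0, 'Vomiting': 0, 'Headache': 1, 'Migraine': 1}
print(reduced)   # {'drugA': {0}, 'drugB': {0, 1}}
```

Here four terms collapse into two descriptors. At the scale reported in the abstract, the same projection would describe each drug with at most 112 cluster descriptors instead of about 1,200 individual terms.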
Year
2011
Venue
KDIR 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL
Keywords
Dimension reduction, Clustering, Semantic similarity, Drug side effects
Field
Data mining, Dimensionality reduction, Domain knowledge, Computer science, Artificial intelligence, Drug side effects, Machine learning
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
8