Abstract | ||
---|---|---|
With the increasing use of data mining tools and techniques, we envision that a Knowledge Discovery and Data Mining System (KDDMS) will have to support and optimize for the following scenarios: 1) Sequence of Queries: A user may analyze one or more datasets by issuing a sequence of related complex mining queries, and 2) Multiple Simultaneous Queries: Several users may be analyzing a set of datasets concurrently, and may issue related complex queries. This paper presents a systematic mechanism to optimize for the above cases, targeting the class of mining queries involving frequent pattern mining on one or multiple datasets. We present a system architecture and propose new algorithms for this purpose. We show the design of a knowledgeable cache which can store the past query results from queries on multiple datasets. We present algorithms which enable the use of the results stored in such a cache to further optimize multiple queries. We have implemented and evaluated our system with both real and synthetic datasets. Our experimental results show that our techniques can achieve a speedup of up to a factor of 9, compared with systems that do not support caching or optimize for multiple queries. |
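
The abstract's idea of reusing stored query results can be illustrated with a minimal sketch. This is not the paper's algorithm or architecture: it only shows the simplest kind of reuse, where a frequent-itemset query at a higher support threshold is answered by filtering a cached result mined at a lower threshold on the same dataset. The names `mine_frequent` and `PatternCache` are hypothetical, and the brute-force miner stands in for a real one.

```python
from itertools import combinations


def mine_frequent(transactions, min_count):
    """Brute-force frequent itemset miner (illustrative stand-in only).

    Returns {itemset tuple: support count} for itemsets occurring in at
    least `min_count` transactions. Exponential in transaction length.
    """
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}


class PatternCache:
    """Hypothetical cache of past frequent-itemset query results.

    Assumption: a dataset name always refers to the same transactions.
    """

    def __init__(self):
        self._store = {}  # dataset name -> (min_count, {itemset: count})
        self.hits = 0
        self.misses = 0

    def query(self, name, transactions, min_count):
        cached = self._store.get(name)
        # A result mined at a lower (or equal) threshold contains every
        # itemset that is frequent at the higher one, so filtering suffices.
        if cached is not None and cached[0] <= min_count:
            self.hits += 1
            return {s: c for s, c in cached[1].items() if c >= min_count}
        # Otherwise mine from scratch and keep the more general result.
        self.misses += 1
        result = mine_frequent(transactions, min_count)
        self._store[name] = (min_count, result)
        return result
```

In this sketch, a query sequence with increasing thresholds on one dataset touches the data only once; handling multiple datasets or concurrent queries, as the paper does, would need considerably more machinery.
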
Year | DOI | Venue |
---|---|---|
2005 | 10.1145/1133890.1133893 | MDM@KDD |
Field | DocType | ISBN |
Query optimization, Data mining, Data stream mining, Computer science, Cache, Complex data type, Artificial intelligence, Knowledge extraction, Systems architecture, Machine learning, Speedup | Conference | 1-59593-216-X |
Citations | PageRank | References |
3 | 0.36 | 29 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ruoming Jin | 1 | 1637 | 91.73 |
Kaushik Sinha | 2 | 244 | 17.81 |
Gagan Agrawal | 3 | 2058 | 209.59 |