Abstract | ||
---|---|---|
Traditional data mining approaches look for patterns in a single table, while multi-relational data mining aims to identify patterns that involve multiple tables. In recent years, the most common mining techniques have been extended to the multi-relational context, but few are designed for data stored according to the multi-dimensional model, in particular the star schema. A star schema consists of a huge central fact table linking a set of small dimension tables. Joining all the tables before mining may not be feasible because of the massive number of records usually involved. This work proposes a method for mining frequent patterns in data that follows a star schema without materializing the join between the tables. Extending the FP-Growth algorithm, it constructs an FP-Tree for each dimension and then combines them through the records in the fact table to form a super FP-Tree. This tree is then mined with FP-Growth to find all frequent patterns. The paper presents a case study on bibliographic data, comparing the efficiency and scalability of our algorithm against FP-Growth. |
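
The central idea of the abstract, counting pattern support without materializing the join, can be illustrated with a minimal Python sketch. This is not the authors' FP-Tree-based algorithm: it only shows how item supports might be obtained by walking the fact table and looking up dimension records on demand. The toy tables, item labels, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy star schema: two small dimension tables keyed by id,
# and a fact table whose rows are (author_id, venue_id) foreign keys.
authors = {1: {"country=PT"}, 2: {"country=US"}}
venues = {10: {"type=journal"}, 11: {"type=conference"}}
facts = [(1, 10), (1, 11), (2, 10), (1, 10)]


def item_supports(dimension, fact_keys):
    """Support of each dimension item, weighted by how many fact rows
    reference the dimension record that contains it."""
    key_counts = Counter(fact_keys)
    supports = Counter()
    for key, items in dimension.items():
        for item in items:
            supports[item] += key_counts[key]
    return supports


def pair_supports(fact_rows, dim_a, dim_b):
    """Support of cross-dimension item pairs, computed row by row from
    the fact table; no joined table is ever stored."""
    counts = Counter()
    for a_key, b_key in fact_rows:
        for x in dim_a[a_key]:
            for y in dim_b[b_key]:
                counts[(x, y)] += 1
    return counts


author_support = item_supports(authors, [a for a, _ in facts])
venue_support = item_supports(venues, [v for _, v in facts])
cross_support = pair_supports(facts, authors, venues)
```

In this sketch each dimension is scanned once and the fact table is traversed once per query, so memory use stays proportional to the dimension tables rather than to the joined relation. A full implementation along the lines of the abstract would store these counts in per-dimension FP-Trees instead of flat counters.
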
Year | DOI | Venue |
---|---|---|
2011 | 10.1142/S0218488511007350 | INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS |
Keywords | Field | DocType |
Pattern mining, multi-relational data mining, star schema, FP-Growth | Data mining, Data stream mining, Fact table, Star schema, Computer science, Stars, Schema (psychology), A* search algorithm | Journal |
Volume | Issue | ISSN |
19 | Supplement-1 | 0218-4885 |
Citations | PageRank | References |
0 | 0.34 | 13 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Andreia Silva | 1 | 24 | 3.56 |
Cláudia Antunes | 2 | 161 | 16.57 |