Abstract | ||
---|---|---|
From a data mining perspective, sequence classification means building a classifier from frequent sequential patterns. However,
mining for a complete set of sequential patterns on a large dataset can be extremely time-consuming and the large number of
patterns discovered also makes the pattern selection and classifier building very time-consuming. The fact is that, in sequence
classification, it is much more important to discover discriminative patterns than a complete pattern set. In this paper,
we propose a novel hierarchical algorithm to build sequential classifiers using discriminative sequential patterns. Firstly,
we mine for the sequential patterns which are the most strongly correlated to each target class. In this step, an aggressive
strategy is employed to select a small set of sequential patterns. Secondly, pattern pruning and a serial coverage test are
applied to the mined patterns. The patterns that pass the serial test are used to build the sub-classifier at the first level
of the final classifier. Thirdly, the training samples that cannot be covered are fed back to the sequential pattern mining
stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached, or
all samples are covered. The patterns generated in each loop form the sub-classifier at each level of the final classifier.
Within this framework, the search space can be reduced dramatically while good classification performance is still achieved.
The proposed algorithm is tested in a real-world business application for debt prevention in the social security area. The novel
sequence classification algorithm proves effective and efficient at predicting debt occurrences based on customer
activity sequence data. |
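
The three-step loop described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the use of rule confidence as the interestingness measure, the top-k selection standing in for the "aggressive strategy", the relaxation of the confidence threshold as the "updated parameters", and the toy `debt` / `no_debt` sequences.

```python
from itertools import combinations


def is_subsequence(pattern, seq):
    """Return True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_top_patterns(samples, min_conf, max_len=2, top_k=2):
    """Toy miner (assumption): rank patterns by confidence toward a class.

    Returns at most `top_k` (pattern, label) rules per class whose
    confidence is at least `min_conf`, strongest first.
    """
    candidates = set()
    for seq, _ in samples:
        for n in range(1, max_len + 1):
            candidates.update(combinations(seq, n))

    scored = []
    for pat in candidates:
        labels = [lab for seq, lab in samples if is_subsequence(pat, seq)]
        for lab in set(labels):
            support = labels.count(lab)
            conf = support / len(labels)
            if conf >= min_conf:
                scored.append((-conf, -support, pat, lab))

    rules, per_class = [], {}
    for _, _, pat, lab in sorted(scored):
        if per_class.get(lab, 0) < top_k:
            rules.append((pat, lab))
            per_class[lab] = per_class.get(lab, 0) + 1
    return rules


def build_hierarchical_classifier(samples, start_conf=1.0, min_conf=0.6, step=0.2):
    """Build one sub-classifier (rule list) per loop until all samples
    are covered or the confidence threshold falls below `min_conf`."""
    levels, remaining, conf = [], list(samples), start_conf
    while remaining and conf >= min_conf - 1e-9:
        kept = []
        for pat, lab in mine_top_patterns(remaining, conf):
            # Serial coverage test: keep a rule only if it correctly covers
            # at least one sample that earlier rules left uncovered.
            left = [(s, l) for s, l in remaining
                    if not (l == lab and is_subsequence(pat, s))]
            if len(left) < len(remaining):
                kept.append((pat, lab))
                remaining = left
        if kept:
            levels.append(kept)
        conf -= step  # assumed parameter update: relax the threshold
    return levels


def predict(levels, seq, default=None):
    """Apply sub-classifiers level by level; the first matching rule wins."""
    for rules in levels:
        for pat, lab in rules:
            if is_subsequence(pat, seq):
                return lab
    return default


# Toy activity sequences (hypothetical data, for illustration only).
samples = [
    (("a", "b", "c"), "debt"),
    (("a", "c"), "debt"),
    (("b", "d"), "no_debt"),
    (("d", "b"), "no_debt"),
    (("b", "e"), "no_debt"),
]
levels = build_hierarchical_classifier(samples)
print(len(levels))                    # 2
print(predict(levels, ("a", "b")))    # debt
print(predict(levels, ("b", "e")))    # no_debt
```

On this toy data the first level keeps only the rules for `a` and `d`; the sample `("b", "e")` stays uncovered, is fed back at a relaxed threshold, and yields a second-level rule for `b`. Because levels are consulted in order, `("a", "b")` is still labelled `debt` even though it also matches the second-level rule.
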
Year | DOI | Venue |
---|---|---|
2009 | 10.1007/s11390-009-9288-2 | J. Comput. Sci. Technol. |

Keywords | DocType | Volume |
---|---|---|
sequential pattern mining, sequence classification, coverage test, interestingness measure | Journal | 24 |

Issue | ISSN | Citations |
---|---|---|
6 | 1860-4749 | 6 |

PageRank | References | Authors |
---|---|---|
0.51 | 23 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Huaifeng Zhang | 1 | 240 | 18.84 |
Yanchang Zhao | 2 | 233 | 20.01 |
Longbing Cao | 3 | 2212 | 185.04 |
Chengqi Zhang | 4 | 3636 | 274.41 |
Hans Bohlscheid | 5 | 40 | 3.71 |