Abstract |
---|
Over the past decade, ever more applications have come with large data sets. However, existing algorithms are not guaranteed to scale well to large data. Averaged n-Dependence Estimators (AnDE) allows flexible learning from out-of-core data by varying the value of $n$ (the number of super-parents), which makes it especially appropriate for learning from large data. In this paper, we propose a sample-based attribute selection technique for AnDE. It requires one additional pass through the training data, during which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. The use of a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers performance superior or comparable to that of typical in-core Bayesian network classifiers. |
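The efficiency claim in the abstract rests on a property of count-based Bayesian classifiers such as AnDE: leave-one-out cross validation needs no retraining, because a held-out instance can be removed from a trained model simply by subtracting its counts. The sketch below is illustrative only, not the authors' code: it uses naive Bayes (AnDE with $n = 0$) as a stand-in for full AnDE, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_counts(X, y, n_classes, n_vals):
    """One pass over (discretized) data: class counts and per-attribute
    class-conditional counts. AnDE would additionally count super-parent
    value combinations; naive Bayes (n = 0) is used here for brevity."""
    n, d = X.shape
    class_counts = np.zeros(n_classes)
    joint = [np.zeros((n_classes, v)) for v in n_vals]  # joint[a][c, value]
    for i in range(n):
        c = y[i]
        class_counts[c] += 1
        for a in range(d):
            joint[a][c, X[i, a]] += 1
    return class_counts, joint

def loo_log_posterior(xi, yi, class_counts, joint, attrs, n_vals):
    """Score one instance with its own counts subtracted out, so no
    retraining is needed for leave-one-out cross validation."""
    n_classes = len(class_counts)
    n = class_counts.sum() - 1  # training size minus the held-out instance
    scores = np.empty(n_classes)
    for c in range(n_classes):
        cc = class_counts[c] - (c == yi)            # remove instance from its class
        scores[c] = np.log((cc + 1) / (n + n_classes))  # smoothed class prior
        for a in attrs:
            cnt = joint[a][c, xi[a]] - (c == yi)    # remove its attribute counts
            scores[c] += np.log((cnt + 1) / (cc + n_vals[a]))
    return scores

def loo_error(X, y, class_counts, joint, attrs, n_vals):
    """Zero-one leave-one-out error of the model restricted to `attrs`."""
    wrong = sum(loo_log_posterior(X[i], y[i], class_counts, joint,
                                  attrs, n_vals).argmax() != y[i]
                for i in range(len(X)))
    return wrong / len(X)

# Usage: greedy forward selection of attributes on a synthetic sample.
# Counts are gathered once; every candidate attribute subset is then
# scored by the cheap leave-one-out error above.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 8))
y = (X[:, 0] + X[:, 1] > 2).astype(int)
n_vals = [3] * 8
class_counts, joint = fit_counts(X, y, 2, n_vals)

selected, remaining, best_err = [], list(range(8)), 1.0
while remaining:
    err, a = min((loo_error(X, y, class_counts, joint, selected + [a], n_vals), a)
                 for a in remaining)
    if err >= best_err:
        break
    best_err, selected = err, selected + [a]
    remaining.remove(a)
print("selected attributes:", selected, "LOOCV error:", best_err)
```

Full AnDE would replace the per-attribute counts with joint counts over each combination of $n$ super-parents and average the resulting sub-models, but the leave-one-out count subtraction works identically; running the selection loop on a sample rather than the whole data set is what keeps the extra pass cheap.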
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/TKDE.2016.2608881 | IEEE Transactions on Knowledge and Data Engineering |
Keywords | Field | DocType |
Bayes methods,Training,Training data,Information technology,Australia,Memory management | Training set,Data mining,Data set,Feature selection,Computer science,Memory management,Bayesian network,Artificial intelligence,Cross-validation,Machine learning,Estimator,Scalability | Journal
Volume | Issue | ISSN
---|---|---|
29 | 1 | 1041-4347
Citations | PageRank | References
---|---|---|
1 | 0.35 | 25
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shenglei Chen | 1 | 18 | 4.05 |
Ana M. Martínez | 2 | 47 | 5.78 |
Geoffrey I. Webb | 3 | 99 | 12.05 |
LiMin Wang | 4 | 816 | 48.41 |