Abstract
---
Ever-increasing data quantities make ever more urgent the need for highly scalable learners that have good classification performance. An out-of-core learner with excellent time and space complexity, along with high expressivity (that is, the capacity to learn very complex multivariate probability distributions), is therefore extremely desirable. This paper presents such a learner. We propose an extension to the k-dependence Bayesian classifier (KDB) that discriminatively selects a sub-model of a full KDB classifier. It requires only one additional pass through the training data, making it a three-pass learner. Our extensive experimental evaluation on 16 large data sets reveals that this out-of-core algorithm achieves competitive classification performance, and substantially better training and classification time, than state-of-the-art in-core learners such as random forest and linear and non-linear logistic regression.
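For context on the base model the abstract extends: a k-dependence Bayesian classifier lets each attribute depend on the class plus up to k other attributes, ranked by (conditional) mutual information. Below is a minimal in-core Python sketch of a standard KDB-style classifier — not the paper's discriminative, three-pass, out-of-core algorithm; all names and the smoothing choices here are illustrative assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in nats for two parallel sequences of discrete values."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())

def cond_mutual_information(xs, ys, cs):
    """I(X;Y|C): mutual information within each class slice, class-weighted."""
    n = len(cs)
    total = 0.0
    for c_val, c_count in Counter(cs).items():
        idx = [i for i in range(n) if cs[i] == c_val]
        total += (c_count / n) * mutual_information(
            [xs[i] for i in idx], [ys[i] for i in idx])
    return total

class KDB:
    """Illustrative KDB sketch: each attribute depends on the class plus
    up to k higher-ranked attributes (not the paper's implementation)."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        self.n = len(y)
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        cols = list(zip(*X))
        self.values = [set(col) for col in cols]
        # Rank attributes by mutual information with the class.
        mi = [mutual_information(col, y) for col in cols]
        order = sorted(range(len(cols)), key=lambda i: -mi[i])
        # Parents: up to k higher-ranked attributes with the highest
        # conditional mutual information with this attribute given the class.
        self.parents = {}
        for pos, i in enumerate(order):
            earlier = sorted(
                order[:pos],
                key=lambda j: -cond_mutual_information(cols[i], cols[j], y))
            self.parents[i] = earlier[:self.k]
        return self

    def predict(self, x):
        best, best_lp = None, -math.inf
        for c in self.classes:
            # Laplace-smoothed log prior.
            lp = math.log((self.class_counts[c] + 1) / (self.n + len(self.classes)))
            for i, pars in self.parents.items():
                # Training rows matching the class and this attribute's parents.
                rows = [t for t, yt in zip(self.X, self.y)
                        if yt == c and all(t[j] == x[j] for j in pars)]
                num = sum(1 for t in rows if t[i] == x[i]) + 1
                den = len(rows) + len(self.values[i])
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

The paper's contribution, per the abstract, is selecting a sub-model of the full KDB discriminatively in a single extra pass over the data; the sketch above instead counts directly from stored training rows, which is only viable in-core.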
Year | Venue | Keywords
---|---|---
2016 | Journal of Machine Learning Research | scalable Bayesian classification, feature selection, out-of-core learning, big data
Field | DocType | Volume
---|---|---
Data mining, Feature selection, Computer science, Probability distribution, Artificial intelligence, Classifier (linguistics), Random forest, Pattern recognition, Naive Bayes classifier, Bayesian network, Big data, Machine learning, Scalability | Journal | 17
ISSN | Citations | PageRank
---|---|---
1532-4435 | 1 | 0.35

References | Authors
---|---
0 | 4
Name | Order | Citations | PageRank |
---|---|---|---
Ana M. Martínez | 1 | 47 | 5.78 |
Geoffrey I. Webb | 2 | 3130 | 234.10 |
Shenglei Chen | 3 | 18 | 4.05 |
Nayyar Abbas Zaidi | 4 | 91 | 9.88 |