Title
A Robust Mean Shift-Based Approach to Effectively Clustering Incomplete Educational Data.
Abstract
Data clustering is one of the popular tasks recently used in the educational data mining arena for grouping similar students by several aspects such as study performance, behavior, skill, etc. Many well-known clustering algorithms such as k-means, expectation-maximization, spectral clustering, etc. were employed in the related works for educational data clustering. None of them has taken into consideration the incompleteness of the educational data gathered in an academic credit system. If just a few records have missing values, we might ignore them in the mining task. However, as there are a large number of missing values, ignoring them may lead to the data insufficiency and ineffectiveness of the mining task. On the other hand, many research works have recently proposed several general-purpose solutions to incomplete data clustering. The main difficulties with these solutions for reuse are how to appropriately determine a pre-specified number of the clusters for each data set in a particular application domain and the non-arbitrary shapes of the resulting clusters. Hence, we focus on two parts: the first one that resolves the incomplete data clustering task in the education domain and the second one that proposes a robust effective approach to the aforementioned clustering task. Our resulting solution to clustering incomplete educational data is a mean shift-based clustering algorithm named iMS_nps using the nearest prototype strategy. iMS_nps can also overcome the aforementioned difficulties. In addition, experimental results have shown that the clusters from our proposed algorithm have better cluster quality as compared to some existing approaches.
Year
DOI
Venue
2015
10.1109/ACOMP.2015.14
ACOMP
Keywords
Field
DocType
incomplete data clustering, parameter-free data clustering, robustness, nearest prototype, educational data mining
Data mining,Canopy clustering algorithm,Fuzzy clustering,CURE data clustering algorithm,Data stream clustering,Correlation clustering,Computer science,Determining the number of clusters in a data set,Constrained clustering,Artificial intelligence,Cluster analysis,Machine learning
Conference
Citations 
PageRank 
References 
0
0.34
18
Authors
3
Name
Order
Citations
PageRank
Thi Ngoc Chau Vo1498.68
Phan Huu Loc200.34
Vo Thi Ngoc Tran320.70