Discovering Highly Informative Feature Set over High Dimensions - Citegraph

Paper Info

Title
Discovering Highly Informative Feature Set over High Dimensions

Abstract
For many textual collections, the number of features is overly large and the features can be highly redundant. It is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory provides one tool for obtaining such a feature collection. In this paper, our main contribution is improving the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of the candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. We test our method through experiments on real-world data sets, showing that our proposal is very efficient.
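
The abstract describes entropy-driven forward selection but gives no algorithmic detail. As a rough illustration only, the sketch below shows a naive greedy baseline that adds, at each step, the feature that most increases the empirical joint entropy of the selected set. It is not the authors' method: it includes none of the paper's data structures or pruning, and the names `joint_entropy` and `greedy_forward_selection`, as well as the row-of-tuples data layout, are assumptions made for this example.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns."""
    # Count how often each combination of values occurs across the rows.
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_forward_selection(rows, k):
    """Select up to k feature indices by greedy joint-entropy maximization.

    Naive baseline: every candidate is re-scored from scratch at each
    step, which is exactly the cost an efficient method would try to avoid.
    """
    remaining = list(range(len(rows[0])))
    selected = []
    for _ in range(min(k, len(remaining))):
        # Ties are broken in favor of the lowest feature index.
        best = max(remaining, key=lambda f: joint_entropy(rows, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: column 1 duplicates column 0, column 3 is constant.
rows = [(0, 0, 0, 0), (0, 0, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0)]
selected = greedy_forward_selection(rows, 2)
```

On this toy input the redundant copy (column 1) and the constant column (column 3) add no entropy once column 0 is chosen, so the second pick falls on column 2.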

Year	DOI	Venue
2012	10.1109/ICTAI.2012.149	ICTAI
Keywords	Field	DocType
textual collection,high-dimensional unlabeled data,informative feature set selection,informative feature,pruning strategy,feature selection,feature collection,data structures,data structure,forward selection step,information theory,informative collection,feature extraction,high dimensions,discovering highly informative feature,unsupervised,high dimensional data,data structure design,entropy,text analysis,real-world data set,candidate feature,heuristic theory	Data mining,Data set,Dimensionality reduction,Feature selection,Computer science,Artificial intelligence,Information theory,Data structure,Clustering high-dimensional data,Pattern recognition,Feature (computer vision),Feature extraction,Machine learning	Conference
Volume	ISSN	ISBN
1	1082-3409	978-1-4799-0227-9
Citations	PageRank	References
1	0.36	8
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Chongsheng Zhang	1	16	4.05
Florent Masseglia	2	408	43.08
Xiangliang Zhang	3	728	87.74