Abstract | ||
---|---|---|
In this paper, we consider a novel problem, term filtering with bounded error, which reduces the term (feature) space by eliminating terms with no information loss or with bounded information loss. Unlike existing work, the obtained term space provides a complete view of the original term space. More interestingly, it allows several important questions to be answered, such as: 1) how different terms interact with each other and 2) how the filtered terms can be represented by the other terms. We investigate the term filtering problem theoretically, link it to the Geometric Covering By Discs problem, and prove that it is NP-hard. We present two novel approaches for lossless and lossy term filtering with bounds on the introduced error. Experimental results on multiple text mining tasks validate the effectiveness of the proposed approaches. |
Year | DOI | Venue |
---|---|---|
2010 | 10.1109/ICDM.2010.131 | ICDM |
Keywords | Field | DocType |
geometric covering, lossy term filtering, different terms interact, np-hardness, information filtering, lossless term filtering, original term space, feature space reduction, discs problem, term space, lossy term, computational complexity, filtered term, term filtering, multiple text mining tasks, bounded error, data mining, term space reduction, novel approach, information loss, feature space, filtering, correlation, measurement, text mining | Data mining, Information loss, Lossy compression, Computer science, Filter (signal processing), Filtering problem, Correlation, Artificial intelligence, Bounded error, Machine learning, Computational complexity theory, Bounded function | Conference |
ISSN | ISBN | Citations |
1550-4786 | 978-0-7695-4256-0 | 0 |
PageRank | References | Authors |
0.34 | 30 | 4 |
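
The abstract frames term filtering as a covering problem: every filtered term must lie within a bounded error of some retained term. The sketch below illustrates that idea only and is not one of the paper's two approaches. It assumes each term is a numeric vector (for example, a column of a term-document matrix), measures error with Euclidean distance, and uses a greedy set-cover heuristic, which is a natural fallback given that the exact problem is NP-hard. The function name `greedy_term_filter`, the parameter `eps`, and the toy vectors are all hypothetical.

```python
import math


def greedy_term_filter(term_vectors, eps):
    """Greedy covering sketch (an assumed heuristic, not the paper's method).

    term_vectors: dict mapping term -> tuple of floats (assumed representation).
    eps: error bound; eps = 0 keeps only exact duplicates merged (lossless case).
    Returns (kept_terms, representative), where representative maps every
    term to a kept term at Euclidean distance <= eps.
    """
    terms = sorted(term_vectors)  # sorted for deterministic tie-breaking

    def dist(a, b):
        return math.dist(term_vectors[a], term_vectors[b])

    # cover[t] is the set of terms that t could represent within the bound.
    cover = {t: {u for u in terms if dist(t, u) <= eps} for t in terms}

    uncovered = set(terms)
    kept = []
    representative = {}
    while uncovered:
        # Pick the term whose disc of radius eps covers the most uncovered terms.
        best = max(terms, key=lambda t: len(cover[t] & uncovered))
        newly_covered = cover[best] & uncovered
        kept.append(best)
        for u in newly_covered:
            representative[u] = best
        uncovered -= newly_covered
    return kept, representative


# Toy data: three near-synonyms and one unrelated term (hypothetical values).
vectors = {
    "car": (1.0, 0.0, 1.0),
    "auto": (1.0, 0.0, 1.0),
    "automobile": (1.0, 0.1, 1.0),
    "tree": (0.0, 1.0, 0.0),
}

lossless_kept, lossless_rep = greedy_term_filter(vectors, eps=0.0)
lossy_kept, lossy_rep = greedy_term_filter(vectors, eps=0.2)
print(lossless_kept, lossy_kept)
```

With `eps=0.0` only terms with identical vectors are merged, which corresponds to the lossless setting; raising `eps` trades a bounded amount of error for a smaller term space. The `representative` mapping is one simple way to answer the abstract's second question, namely how a filtered term can be expressed through the retained ones.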