Title
A Clustering-Based Approach for Integrating Document-Category Hierarchies
Abstract
E-commerce applications generate and consume a tremendous amount of online information, which is typically available as textual documents. Organizations and individuals generally use category sets or hierarchies to organize, archive, and access their documents. At the same time, they continually acquire relevant documents from various Internet sources, each of which may organize its documents in a category set or hierarchy that differs from the one used by the acquiring organization or individual. Integrating source documents organized in one category hierarchy into the existing category hierarchy of the acquiring organization or individual has therefore become an important issue in the e-commerce era. Existing category-integration techniques are designed mainly to integrate document catalogs that are organized nonhierarchically (i.e., as flat sets). In this paper, we propose a clustering-based category-hierarchy integration (CHI) technique, which extends the clustering-based category-integration (CCI) technique. Our empirical evaluation results show that the proposed CHI technique appears to improve the effectiveness of category-hierarchy integration over that attained by nonhierarchical category-integration techniques, particularly in homogeneous and comparable scenarios.
Year
2008
DOI
10.1109/TSMCA.2007.914758
Venue
IEEE Transactions on Systems, Man, and Cybernetics, Part A
Keywords
proposed chi technique, existing category hierarchy, category-integration technique, pattern clustering, clustering-based approach, document management, integrating document-category hierarchies, category hierarchy, clustering-based category-integration, textual document category hierarchy integration, category-hierarchy integration, flat set, category set, internet, document clustering, document-category integration, nonhierarchical category-integration technique, electronic commerce, text analysis, clustering-based category-hierarchy integration, e-commerce application, text mining, taxonomy integration, helium, e commerce, taxonomy, technology management, classification algorithms, organizations, business, finance
Field
Data science, Information retrieval, Document clustering, Homogeneous, Document management system, Computer science, Source document, Hierarchy, Statistical classification, Cluster analysis, The Internet
DocType
Journal
Volume
38
Issue
2
ISSN
1083-4427
Citations
9
PageRank
0.51
References
31
Authors (2)
Name | Order | Citations | PageRank
Tsang-Hsiang Cheng | 1 | 141 | 12.02
Chih-ping Wei | 2 | 743 | 74.20