Title
Distributed classification of textual documents on the grid
Abstract
Efficient access to information, integration of information from various sources, and turning this information into knowledge are currently major challenges in life science research. However, a large fraction of this information is available only from scientific articles stored in huge document databases in free text format, or from the Web, where it is available in semi-structured format. Text mining provides methods (e.g., classification and clustering) that can automatically extract relevant knowledge patterns from free text data. Including Grid text-mining services in a Grid-based knowledge discovery system can significantly support the problem-solving processes built on such a system. The motivation for the research presented in this paper is to use the Grid's computational, storage, and data access capabilities for text mining tasks, and for text classification in particular. Text classification methods are time-consuming, so utilizing the Grid infrastructure can bring significant benefits. Implementing text mining techniques in a distributed environment allows us to access different geographically distributed data collections and to perform text mining tasks in a parallel/distributed fashion.
Year
2006
DOI
10.1007/11847366_73
Venue
HPCC
Keywords
textual document, free text format, free text data, text mining task, grid computational, text mining technique, text classification mining method, text classification, grid infrastructure, text mining, grid text-mining service, knowledge discovery, distributed environment, data access, data collection, structure formation, grid computing
Field
Information integration, Co-occurrence networks, Concept mining, Grid computing, Noisy text analytics, Information retrieval, Computer science, Information access, Knowledge extraction, Free Text Format
DocType
Conference
Volume
4208
ISSN
0302-9743
ISBN
3-540-39368-4
Citations
4
PageRank
0.57
References
6
Authors
4
Name              Order  Citations  PageRank
Ivan Janciak      1      80         8.93
Martin Sarnovsky  2      9          3.26
A Min Tjoa        3      2445       465.02
Peter Brezany     4      276        43.73