Automated Dataset Construction from Web Resources with Tool Kayur

Paper Info

Title
Automated Dataset Construction from Web Resources with Tool Kayur

Abstract
Many text mining tools cannot be applied directly to documents available on web pages. Tools exist for fetching and preprocessing textual data, but combining them into one working tool chain can be time-consuming. The preprocessing task is even more labor-intensive if documents are located on multiple remote sources with different storage formats. In this paper, we propose a simplified data preparation process for cases where data come from a wide range of web resources. We developed an open-source tool, called Kayur, that greatly reduces the time and effort required for routine data preprocessing steps, allowing users to proceed quickly to the main task of data analysis. The datasets generated by the tool are ready to be loaded into a data mining workbench, such as WEKA or Carrot2, to perform classification, feature prediction, and other data mining tasks.
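The abstract says only that Kayur's output can be loaded into a workbench such as WEKA; it does not describe the tool's internals or its exact output format. As a rough illustration of that final step, here is a minimal Python sketch, assuming labelled HTML pages as input and WEKA's ARFF text format as output. The function names `html_to_text`, `fetch_text`, and `to_arff`, the single `text` string attribute, and the nominal `class` attribute are hypothetical choices for this example, not Kayur's API.

```python
"""Hypothetical sketch: turn labelled web documents into a WEKA ARFF file.

This is NOT Kayur's code; it only illustrates the kind of output the
abstract describes (a dataset loadable into WEKA). All names here are
illustrative assumptions.
"""
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip markup and join the remaining text fragments with spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(url: str) -> str:
    """Download a page and return its visible text (needs network access)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


def _quote(value: str) -> str:
    """Quote a value for ARFF, escaping backslashes, quotes and control chars."""
    # Backslash must be escaped first so later escapes are not doubled.
    for raw, esc in (("\\", "\\\\"), ("'", "\\'"), ("\n", "\\n"),
                     ("\r", "\\r"), ("\t", "\\t")):
        value = value.replace(raw, esc)
    return f"'{value}'"


def to_arff(relation: str, docs: list[tuple[str, str]]) -> str:
    """Build an ARFF dataset with one string attribute and a nominal class."""
    labels = sorted({label for _, label in docs})
    lines = [
        f"@relation {_quote(relation)}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(_quote(l) for l in labels) + "}",
        "",
        "@data",
    ]
    lines += [f"{_quote(text)},{_quote(label)}" for text, label in docs]
    return "\n".join(lines) + "\n"
```

Keeping each document as a single ARFF string attribute leaves tokenization to the workbench; in WEKA, for instance, the `StringToWordVector` filter can turn such a column into word features before classification.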

Year: 2017
DOI: 10.1109/CANDAR.2016.0029
Venue: 2016 Fourth International Symposium on Computing and Networking (CANDAR)
Keywords: automation, information extraction, natural language processing, web content mining
DocType: Journal
Volume: 7
Issue: 2
ISSN: 2379-1888
ISBN: 978-1-5090-2656-2
Citations: 0
PageRank: 0.34
References: 10
Authors: 3

Authors

Name	Order	Citations	PageRank
Alexander Kohan	1	0	0.34
Mitsuharu Yamamoto	2	0	0.34
Cyrille Artho	3	588	44.46