Title
Networked mining of atomic and molecular data from electronic journal databases on the internet
Abstract
Several atomic and molecular data centers around the world maintain research databases for use in fusion plasma simulations, hadron therapy, modelling of the universe and other areas. Among the data center activities, the collection of experimental and theoretical results from around the world is of major importance. It comprises the identification, relevance assessment and retrieval of journal articles, followed by data extraction, data mining, format conversion and data input. The process still relies largely on working groups of specialists and part-time human labor, in spite of the recent modernization of journal publishing, especially the electronic journals newly available by subscription and the free-access online abstract databases. This work focuses on automating the above procedure to the maximum extent possible. In particular, we design a download robot that, in the first stage, performs query search and abstract retrieval for candidate relevant articles over the internet, followed by full-text retrieval (PDF format), text extraction and a deterministic relevance judgement. As a demonstration, we have also developed a bibliography database for electron-molecule collisions that automatically updates its contents over the internet at regular time intervals. The present work is part of the project for an evolutional data collecting system, supported by a JSPS project that involves several research institutes.
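The abstract does not say how the deterministic relevance judgement is made. One plausible reading is a fixed rule over the extracted text, sketched below. The term groups, the function names `normalize` and `is_relevant`, and the all-groups-must-match rule are illustrative assumptions, not the authors' method.

```python
import re

# Hypothetical term groups for electron-molecule collisions.
# These are assumptions for illustration, not taken from the paper.
TERM_GROUPS = [
    ("electron",),
    ("molecule", "molecular"),
    ("collision", "scattering", "cross section"),
]


def normalize(text):
    """Lowercase and collapse whitespace so that line breaks left by
    PDF text extraction do not split multi-word terms."""
    return re.sub(r"\s+", " ", text.lower())


def is_relevant(text, term_groups=TERM_GROUPS):
    """Return True if every term group has at least one term in the text.

    The rule is deterministic: the same text always gets the same verdict.
    Matching is by plain substring, so "electronic" also matches "electron".
    """
    flat = normalize(text)
    return all(any(term in flat for term in group) for group in term_groups)
```

Under these assumptions, `is_relevant(abstract_text)` could act as a cheap first-stage filter, with the same function applied again to the extracted full text.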
Year: 2005
DOI: 10.1007/978-3-540-31970-2_13
Venue: DNIS
Keywords: data center activity, data mining, data input, data extraction, networked mining, full-text retrieval, jsps project, molecular data, evolutional data, electronic journal databases, deterministic relevance judgement, abstract retrieval, working group, data collection, data center
Field: Information system, Data mining, Data collection, Computer science, Data conversion, Information extraction, Data extraction, Data center, Database, The Internet, Electronic publishing
DocType: Conference
Volume: 3433
ISSN: 0302-9743
ISBN: 3-540-25361-0
Citations: 0
PageRank: 0.34
References: 5
Authors: 4
Name            Order  Citations  PageRank
Lukas Pichl     1      11         6.78
Manabu Suzuki   2      0          0.68
Kazuyuki Joe    3      0          0.34
Akira Sasaki    4      7          7.22