Title
Cloud computing data capsules for non-consumptive use of texts
Abstract
As digital data sources grow in number and size, they present an opportunity for computational investigation by means of text mining, natural language processing (NLP), and other text analysis techniques. In this paper we propose a virtual machine (VM) framework and methodology for non-consumptive text analysis. In the remote VM model, a VM is configured with software and tooling for text analysis. When the analysis is completed, the VM is wiped and its resources are released for other users to share. Our approach extends the VM by turning it into a data capsule that prevents leakage of copyrighted content in the event that the VM is compromised. The HathiTrust Research Center Data Capsules has seen early use against the HathiTrust repository of digitized books from university libraries nationwide.
Year
2014
DOI
10.1145/2608029.2608031
Venue
ScienceCloud@HPDC
Keywords
distributed systems, large-scale text mining, cloud computing, data capsules, non-consumptive use
Field
Research center, Virtual machine, Computer science, Software, Digital data, Database, Cloud computing
DocType
Conference
Citations
14
PageRank
1.05
References
8
Authors
5
Name               Order  Citations  PageRank
Jiaan Zeng         1      14         1.39
Guangchen Ruan     2      14         1.05
Alexander Crowell  3      14         1.05
Ataul Prakash      4      1712       202.35
Beth Plale         5      14         1.05