Abstract |
---|
As digital data sources grow in number and size, they present an opportunity for computational investigation through text mining, natural language processing (NLP), and other text analysis techniques. In this paper we propose a virtual machine (VM) framework and methodology for non-consumptive text analysis. Under a remote VM model, the VM is configured with software and tooling for text analysis; when the analysis is complete, the VM is wiped and its resources are released for other users to share. Our approach extends the VM by turning it into a data capsule that prevents leakage of copyrighted content in the event that the VM is compromised. The HathiTrust Research Center Data Capsule has seen early use on the HathiTrust repository of digitized books from university libraries nationwide. |
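
The provision, configure, wipe, and release cycle described in the abstract can be pictured as a small state machine. The sketch below is illustrative only: `ResourcePool`, `AnalysisVM`, `VMState`, and their methods are hypothetical names, the "disk" is just an in-memory dictionary, and the data capsule's leakage protection is deliberately not modeled. It shows only that a VM slot is drawn from a shared pool, used, wiped, and returned.

```python
from __future__ import annotations

from enum import Enum, auto


class VMState(Enum):
    """Hypothetical lifecycle states for a per-user analysis VM."""
    PROVISIONED = auto()
    CONFIGURED = auto()
    RELEASED = auto()


class ResourcePool:
    """Hypothetical shared pool of VM slots that users draw from and return to."""

    def __init__(self, slots: int) -> None:
        self.free = slots

    def acquire(self) -> None:
        if self.free == 0:
            raise RuntimeError("no free VM slots")
        self.free -= 1

    def release(self) -> None:
        self.free += 1


class AnalysisVM:
    """Sketch of one VM's life: provision, configure tooling, use, then wipe and release."""

    def __init__(self, pool: ResourcePool) -> None:
        pool.acquire()  # take a slot from the shared pool
        self.pool = pool
        self.state = VMState.PROVISIONED
        self.tools: list[str] = []
        self.disk: dict[str, str] = {}  # stands in for the VM's local storage

    def configure(self, tools: list[str]) -> None:
        if self.state is not VMState.PROVISIONED:
            raise RuntimeError(f"cannot configure in state {self.state.name}")
        self.tools = list(tools)
        self.state = VMState.CONFIGURED

    def write(self, path: str, data: str) -> None:
        # Analysis output may only be written while the VM is configured and in use.
        if self.state is not VMState.CONFIGURED:
            raise RuntimeError(f"cannot write in state {self.state.name}")
        self.disk[path] = data

    def release(self) -> None:
        if self.state is VMState.RELEASED:
            return  # idempotent: never return the same slot twice
        # Wipe local state before returning the slot so the next user inherits nothing.
        self.disk.clear()
        self.tools.clear()
        self.state = VMState.RELEASED
        self.pool.release()
```

For example, with `pool = ResourcePool(slots=1)`, creating `AnalysisVM(pool)` takes the only slot, and calling `release()` on it clears its storage and makes the slot available to the next user. Making `release()` idempotent keeps a repeated call from inflating the pool.
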
Year | DOI | Venue |
---|---|---|
2014 | 10.1145/2608029.2608031 | ScienceCloud@HPDC |
Keywords | Field | DocType |
---|---|---|
distributed systems, large-scale text mining, cloud computing, data capsules, non-consumptive use | Research center, Virtual machine, Computer science, Software, Digital data, Database, Cloud computing | Conference |
Citations | PageRank | References |
---|---|---|
14 | 1.05 | 8 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jiaan Zeng | 1 | 14 | 1.39 |
Guangchen Ruan | 2 | 14 | 1.05 |
Alexander Crowell | 3 | 14 | 1.05 |
Atul Prakash | 4 | 1712 | 202.35 |
Beth Plale | 5 | 14 | 1.05 |