Title
A Cloud-Hosted MapReduce Architecture for Syntactic Parsing
Abstract
Syntactic parsing is a time-consuming task in natural language processing, particularly when a large number of text files must be processed. Parsing algorithms are conventionally designed to run sequentially on a single machine and, as a consequence, fail to benefit from the high-performance and parallel computing resources available in the cloud. We designed and implemented a scalable cloud-based architecture that supports parallel and distributed syntactic parsing of large datasets. The architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parser builds a map in which the syntactic trees of the same input file share the same key, and collects them into a single file containing the sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design runs 7 times faster than a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language datasets consisting of a large number of files.
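The keying scheme described in the abstract (trees from the same input file share one key and are collected into a single output) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names, the naive split on periods, the tab-separated output format, and the `toy_parse` stand-in for a real constituency or dependency parser are all assumptions made for illustration, and a real deployment would run the map and reduce steps on a cluster framework rather than in one Python process.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[int, str, str]  # (sentence index, sentence, tree)


def toy_parse(sentence: str) -> str:
    """Hypothetical stand-in for a real syntactic parser."""
    return "(S " + " ".join(f"(TOK {word})" for word in sentence.split()) + ")"


def map_phase(file_name: str, text: str,
              parse: Callable[[str], str] = toy_parse) -> Iterator[Tuple[str, Record]]:
    """Emit one (file_name, record) pair per sentence; the file name is the key."""
    # Assumption: sentences are separated by periods (a real system would
    # use a proper sentence splitter).
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for index, sentence in enumerate(sentences):
        yield file_name, (index, sentence, parse(sentence))


def shuffle(pairs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Group emitted records by key, as a MapReduce framework would."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for key, record in pairs:
        grouped[key].append(record)
    return grouped


def reduce_phase(records: List[Record]) -> str:
    """Merge all records of one file into a single sentence/tree listing."""
    # Sorting by sentence index restores the original sentence order.
    return "\n".join(f"{sentence}\t{tree}" for _, sentence, tree in sorted(records))


def run(files: Dict[str, str]) -> Dict[str, str]:
    """Run map, shuffle, and reduce over a dict of file name -> text."""
    pairs = [pair for name, text in files.items() for pair in map_phase(name, text)]
    return {name: reduce_phase(records) for name, records in shuffle(pairs).items()}
```

Calling `run({"a.txt": "Dogs bark. Cats sleep."})` returns one entry per input file, each holding that file's sentences paired with their trees. Because the file name is the key, the per-sentence parsing in the map step can be spread across workers without losing track of which output file each tree belongs to.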
Year
2019
DOI
10.1109/SEAA.2019.00024
Venue
2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)
Keywords
cloud deployment, natural language processing (NLP), syntactic parsing
DocType
Conference
ISSN
1089-6503
ISBN
978-1-7281-3422-2
Citations
0
PageRank
0.34
References
12
Authors
4
Name | Order | Citations | PageRank
Yonas Woldemariam | 1 | 0 | 0.34
Stefan Pletschacher | 2 | 216 | 20.78
Christian Clausner | 3 | 44 | 8.49
Julian M. Bass | 4 | 0 | 1.01