Title
Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages
Abstract
This paper evaluates the challenges involved in shallow parsing of Dravidian languages, which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string, with morpho-phonemic changes at the point of concatenation. This phenomenon, known as Sandhi, in turn complicates the identification of individual words. Shallow parsing is the task of identifying correlated groups of words in a raw sentence. The current work studies the effect of Sandhi on building shallow parsers for Dravidian languages by evaluating its effect on Malayalam, one of the main languages of the Dravidian family. We provide an in-depth analysis of the effect of Sandhi on developing a robust shallow parser pipeline, with experimental results showing how sensitive the individual components of the shallow parser are to the accuracy of a Sandhi splitter. Our work can serve as a guide for building robust text processing systems for Dravidian languages.
Year: 2016
Venue: ACL (Student Research Workshop)
DocType: Conference
Volume: P16-3
Citations: 1
PageRank: 0.43
References: 5
Authors: 2
Name                  Order  Citations  PageRank
Devadath V V          1      1          0.43
Dipti Misra Sharma    2      262        45.90