Abstract | ||
---|---|---|
We present a framework for generating natural language descriptions from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-stage pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps. |
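
The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' system: the real pipeline relies on monolingual corpora and off-the-shelf NLP tools, whereas the sketch below uses a single fixed template. The `(subject, relation, object)` tuple format, the function names `canonicalize`, `simple_sentence`, `combine`, and `describe`, and the pairwise "and" compounding rule are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed canonical form: (subject, relation, object). Hypothetical, not from the paper.
Triple = Tuple[str, str, str]


def canonicalize(row: Dict[str, str], subject_key: str) -> List[Triple]:
    """Stage (i): turn one table row into atomic (subject, relation, object) entries."""
    subject = row[subject_key]
    return [(subject, key, value) for key, value in row.items() if key != subject_key]


def simple_sentence(triple: Triple) -> str:
    """Stage (ii): one simple sentence per atomic entry, using a fixed toy template."""
    subject, relation, obj = triple
    return f"{subject}'s {relation} is {obj}."


def combine(sentences: List[str], subject: str, pronoun: str = "its") -> str:
    """Stage (iii): co-reference replacement, then pairwise sentence compounding."""
    mention = f"{subject}'s"
    # Keep the first mention; replace later mentions of the subject with a pronoun.
    replaced = sentences[:1]
    for sentence in sentences[1:]:
        if sentence.startswith(mention):
            sentence = pronoun + sentence[len(mention):]
        replaced.append(sentence)
    # Join consecutive sentences two at a time with "and".
    merged = []
    for i in range(0, len(replaced), 2):
        pair = replaced[i:i + 2]
        text = pair[0] if len(pair) == 1 else pair[0].rstrip(".") + " and " + pair[1]
        merged.append(text[0].upper() + text[1:])
    return " ".join(merged)


def describe(row: Dict[str, str], subject_key: str) -> str:
    """Run the three toy stages on a single row."""
    triples = canonicalize(row, subject_key)
    return combine([simple_sentence(t) for t in triples], row[subject_key])


row = {"name": "The article", "volume": "45", "issue": "4", "year": "2019"}
print(describe(row, "name"))
# The article's volume is 45 and its issue is 4. Its year is 2019.
```

Keeping each stage as a separate function mirrors the modular design the abstract argues for: any one stage could be swapped for a learned component without touching the others.
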

Year | DOI | Venue |
---|---|---|
2019 | 10.1162/coli_a_00363 | Hosted Content |

Field | DocType | Volume |
---|---|---|
Natural language generation, Computer science, Natural language, Artificial intelligence, Natural language processing, Data model, Scalability | Journal | 45 |

Issue | ISSN | Citations |
---|---|---|
4 | 0891-2017 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Anirban Laha | 1 | 21 | 4.39 |
Parag Jain | 2 | 9 | 4.53 |
Abhijit Mishra | 3 | 8 | 4.51 |
Karthik Sankaranarayanan | 4 | 28 | 9.36 |