| Title |
|---|
| Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data |
| Abstract |
|---|
| Natural language generation (NLG) is a critical component of conversational systems, owing to its role in formulating correct and natural text responses. Traditionally, NLG components have been deployed using template-based solutions. Although neural network solutions recently developed in the research community have been shown to provide several benefits, deploying such model-based solutions has been challenging due to high latency, correctness issues, and large data requirements. In this paper, we present approaches that have helped us deploy data-efficient neural NLG solutions for conversational systems in production. We describe a family of sampling and modeling techniques that attain production quality with lightweight neural network models using only a fraction of the data that would otherwise be necessary, and provide a thorough comparison among them. Our results show that domain complexity dictates the appropriate approach for achieving high data efficiency. Finally, we distill the lessons from our experimental findings into a list of best practices for production-level NLG model development, and present them in a brief runbook. Importantly, the end products of all of these techniques are small sequence-to-sequence models (about 2 MB) that we can reliably deploy in production. |
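For a sense of what a 2 MB sequence-to-sequence model implies: at float32 precision, that budget corresponds to roughly 500K parameters. The sketch below is a minimal, hypothetical PyTorch encoder-decoder dimensioned to land near that size; the vocabulary size, embedding width, and hidden size are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Hypothetical lightweight encoder-decoder sized to land near ~2 MB
    at float32. All dimensions are illustrative assumptions, not the
    authors' actual architecture."""

    def __init__(self, vocab_size=2000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)  # shared source/target embeddings
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)        # per-step vocabulary logits

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.embedding(src_ids))          # encode source tokens
        dec_out, _ = self.decoder(self.embedding(tgt_ids), h)  # teacher-forced decoding
        return self.out(dec_out)

model = TinySeq2Seq()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} params ~= {n_params * 4 / 2**20:.2f} MB (float32)")
```

Running the snippet reports roughly 535K parameters, i.e. about 2.0 MB at 4 bytes per parameter, which matches the model footprint the abstract cites.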
| Year | Venue | DocType |
|---|---|---|
| 2020 | COLING | Conference |

| Volume | Citations | PageRank |
|---|---|---|
| 2020.coling-industry | 0 | 0.34 |

| References | Authors |
|---|---|
| 0 | 12 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Ankit Arun | 1 | 0 | 1.01 |
| Soumya Batra | 2 | 0 | 1.01 |
| Vikas S. Bhardwaj | 3 | 8 | 2.86 |
| Ashwini Challa | 4 | 0 | 0.68 |
| Pinar Donmez | 5 | 0 | 1.01 |
| Peyman Heidari | 6 | 0 | 1.01 |
| Hakan Inan | 7 | 8 | 1.68 |
| Shashank Jain | 8 | 0 | 1.01 |
| Anuj Kumar | 9 | 19 | 11.09 |
| Shawn Mei | 10 | 0 | 0.34 |
| Karthik Mohan | 11 | 153 | 7.44 |
| M. White | 12 | 12 | 5.03 |