Title: Limits of Detecting Text Generated by Large-Scale Language Models
Abstract: Some consider large-scale language models that can generate long and coherent pieces of text dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
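As a brief illustration (a sketch in assumed notation, not reproduced from the paper itself): writing P for the true human-language distribution and Q for the language model, the detection problem described in the abstract is the binary hypothesis test

\[
H_0 : x_1^n \sim P \ \text{(genuine)} \qquad \text{vs.} \qquad H_1 : x_1^n \sim Q \ \text{(generated)},
\]

for which the Neyman-Pearson detector thresholds the normalized log-likelihood ratio

\[
\frac{1}{n} \log \frac{Q(x_1^n)}{P(x_1^n)} \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \tau.
\]

The achievable error exponents of such a test are governed by the divergences $D(P \| Q)$ and $D(Q \| P)$, and the connection to perplexity follows from the identity $H(P, Q) = H(P) + D(P \| Q)$ together with $\mathrm{PPL}(Q) = 2^{H(P, Q)}$ (cross-entropy in bits per token): a model whose perplexity approaches the entropy rate of human language makes the divergence, and hence the achievable error exponent, small.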
Year: 2020
DOI: 10.1109/ITA50056.2020.9245012
Venue: 2020 Information Theory and Applications Workshop (ITA)
Keywords: large-scale language model output detection, language generation performance, human language, maximum likelihood language models, text detection, k-order Markov approximations, error probabilities, semantic side information
DocType: Conference
ISSN: 2641-8150
ISBN: 978-1-7281-8825-6
Citations: 0
PageRank: 0.34
References: 12
Authors: 3
Name                    Order  Citations  PageRank
Varshney Lav R.         1      0          0.34
Nitish Shirish Keskar   2      325        16.71
Richard Socher          3      67702      30.61