Abstract
---

Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
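The hypothesis-testing formulation in the abstract can be illustrated with a minimal sketch: a Neyman-Pearson style likelihood-ratio test that decides whether a token sequence was produced by a "human" source or a "model" source. The two unigram distributions below are hypothetical stand-ins, not the paper's actual language models, and the threshold is arbitrary.

```python
import math

# Hypothetical token distributions standing in for the two hypotheses:
# H0 = genuine (human) text, H1 = model-generated text.
human = {"a": 0.5, "b": 0.3, "c": 0.2}
model = {"a": 0.4, "b": 0.4, "c": 0.2}


def log_likelihood(seq, dist):
    """Log-probability of a token sequence under an i.i.d. unigram model."""
    return sum(math.log(dist[tok]) for tok in seq)


def classify(seq, threshold=0.0):
    """Likelihood-ratio test: decide 'genuine' when the log-likelihood
    ratio log P_human(seq) - log P_model(seq) exceeds the threshold."""
    llr = log_likelihood(seq, human) - log_likelihood(seq, model)
    return "genuine" if llr > threshold else "generated"


def perplexity(seq, dist):
    """Perplexity = exp(-average log-likelihood); lower means the
    distribution fits the sequence better. The abstract relates the
    achievable error exponents of the test to this quantity."""
    return math.exp(-log_likelihood(seq, dist) / len(seq))
```

For example, `classify(["a", "a", "b"])` returns `"genuine"` because `"a"` is more likely under the human distribution, while `classify(["b", "b", "a"])` returns `"generated"`. Real detectors would replace the unigram models with full (e.g. k-order Markov or neural) language models, but the decision rule has the same shape.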
| Year | DOI | Venue |
|---|---|---|
| 2020 | 10.1109/ITA50056.2020.9245012 | 2020 Information Theory and Applications Workshop (ITA) |
| Keywords | DocType | ISSN |
|---|---|---|
| large-scale language model output detection, language generation performance, human language, maximum likelihood language models, text detection, k-order Markov approximations, error probabilities, semantic side information | Conference | 2641-8150 |
| ISBN | Citations | PageRank |
|---|---|---|
| 978-1-7281-8825-6 | 0 | 0.34 |
| References | Authors |
|---|---|
| 12 | 3 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Varshney Lav R. | 1 | 0 | 0.34 |
| Nitish Shirish Keskar | 2 | 325 | 16.71 |
| Richard Socher | 3 | 6770 | 230.61 |