Title
Investigating Esperanto's statistical proportions relative to other languages using neural networks and Zipf's law
Abstract
Esperanto is a constructed natural language, intended to be an easy-to-learn lingua franca. Zipf's law models the statistical proportions of various phenomena in human ecology, including natural languages. Given Esperanto's artificial origins, one wonders how "natural" it appears, relative to other natural languages, in the context of Zipf's law. To explore this question, we collected a total of 283 books from six languages: English, French, German, Italian, Spanish, and Esperanto. We applied Zipf-based metrics to our corpus to extract distributions of words, word distances, word bigrams, word trigrams, and word lengths for each book. Statistical analyses show that Esperanto's statistical proportions are similar to those of other languages. We then trained artificial neural networks (ANNs) to classify books according to language. The ANNs achieved high accuracy rates (86.3% to 98.6%). Subsequent analysis identified German as having the most unique proportions, followed by Esperanto, Italian, Spanish, English, and French. Analysis of misclassified patterns shows that Esperanto's statistical proportions most closely resemble those of German and Spanish, and least resemble those of French and Italian.
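The word-level metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the authors computed their Zipf-based metrics, so everything here is an assumption made for illustration: the function names rank_frequency and zipf_slope are hypothetical, the tokenizer is a simple Unicode letter matcher, and the metric is taken to be the slope and R^2 of a least-squares line fitted to log(frequency) against log(rank). An ideal Zipfian distribution would give a slope near -1.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent.

    Assumption: a word is any run of Unicode letters, lowercased.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_slope(counts):
    """Fit log(count) against log(rank) by ordinary least squares.

    Returns (slope, r_squared). Assumed metric, not taken from the paper.
    """
    if len(counts) < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # A flat line (all counts equal) is fitted perfectly by slope 0.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, r_squared


if __name__ == "__main__":
    sample = "la hundo kaj la kato kaj la birdo"
    counts = rank_frequency(sample)
    print(counts, zipf_slope(counts))
```

Under the same assumptions, the fit could be repeated per book for the other distributions (bigrams, trigrams, word lengths), and the resulting slope and R^2 values used as input features for a classifier.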
Year
2006
Venue
Artificial Intelligence and Applications
Keywords
classification, neural network, Zipf's law, artificial neural networks, natural language processing
Field
Zipf's law, Computer science, Trigram, Human ecology, Lingua franca, Natural language, Bigram, Artificial intelligence, Natural language processing, Esperanto grammar, German
DocType
Conference
ISBN
0-88986-556-6
Citations
3
PageRank
0.57
References
4
Authors
4
Name              Order   Citations   PageRank
Bill Manaris      1       204         21.81
Luca Pellicoro    2       30          2.52
George Pothering  3       31          3.29
Harland Hodges    4       3           0.57