Statistical digram and trigram analysis of Turkish in terms of coverage and entropy for possible language and speech based applications - Citegraph

Paper Info

Title
Statistical digram and trigram analysis of Turkish in terms of coverage and entropy for possible language and speech based applications

Abstract
In this study, two frameworks, based on digrams and trigrams, are built for complete coverage of the Turkish language. In addition, character, digram and trigram entropy values for Turkish, English and Spanish are compared. An examination of meaningful Turkish texts shows that 3 major digram clusters constitute slightly more than 60% of Turkish texts. Similarly, 3 major trigram clusters cover almost 40% of Turkish texts. The statistics show that 391 (of 841 theoretical) digrams and 3,396 (of 24,389 theoretical) trigrams are sufficient for 99% coverage of Turkish. These results could serve as a general roadmap for researchers who want to reach high coverage quickly in Turkish language and speech based applications. As one application, the results could lead to a general framework for setting prioritization rules in duration modeling for concatenative text-to-speech synthesis systems.
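The coverage and entropy figures in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function names, the letter-only tokenization (which joins n-grams across word boundaries), and the simple lowercasing are all assumptions made for illustration. The theoretical totals quoted above are consistent with the 29-letter Turkish alphabet, since 29^2 = 841 and 29^3 = 24,389.

```python
import math
from collections import Counter

# The Turkish alphabet has 29 letters, which matches the theoretical
# totals in the abstract: 29**2 = 841 digrams, 29**3 = 24389 trigrams.
ALPHABET_SIZE = 29


def ngram_counts(text, n):
    """Count overlapping character n-grams over letters only.

    Assumption: non-letters are dropped, so n-grams span word boundaries.
    Note that str.lower() does not apply Turkish casing rules (I -> ı),
    so real Turkish text would need locale-aware normalization first.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(
        "".join(letters[i:i + n]) for i in range(len(letters) - n + 1)
    )


def entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical n-gram distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngrams_for_coverage(counts, target):
    """Smallest number of most-frequent n-grams whose occurrences make up
    at least `target` (a fraction, e.g. 0.99) of all occurrences."""
    total = sum(counts.values())
    covered = 0
    for rank, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered >= target * total:
            return rank
    return len(counts)


if __name__ == "__main__":
    sample = "abab"  # toy input, not real Turkish data
    digrams = ngram_counts(sample, 2)
    print(dict(digrams))
    print(entropy_bits(digrams))
    print(ngrams_for_coverage(digrams, 0.99))
    print(ALPHABET_SIZE ** 2, ALPHABET_SIZE ** 3)
```

Run on a large Turkish corpus, `ngrams_for_coverage(counts, 0.99)` would give the kind of figure the abstract reports (391 digrams, 3,396 trigrams), although the exact values depend on the corpus and the preprocessing.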

Year	Venue	Keywords
2010	Aalborg	entropy,natural language processing,speech processing,speech synthesis,text analysis,english,spanish,turkish language based applications,turkish texts,concatenative text-to-speech synthesis systems,coverage,digram distributions,digram entropy values,duration modeling,speech based applications,statistical digram analysis,statistical trigram analysis,trigram entropy values,electronic publishing,signal processing
Field	DocType	ISSN
Turkish,Computer science,Trigram,Prioritization,Natural language processing,Artificial intelligence,Electronic publishing	Conference	2219-5491
Citations	PageRank	References
0	0.34	3
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Ibrahim Baran Uslu	1	0	0.34
Asim Egemen Yilmaz	2	6	2.86
H. G. Ilk	3	17	6.13
