First large-scale information retrieval experiments on Turkish texts - Citegraph

Paper Info

Title
First large-scale information retrieval experiments on Turkish texts

Abstract
We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which has been created for this study, contains 95.5 million words, 408,305 documents, 72 ad hoc queries and has a size of about 800MB. All documents come from the Turkish newspaper Milliyet. We implement and apply simple to sophisticated stemmers and various query-document matching functions and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
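
The prefix-truncation approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `truncate_stem` and `stem_tokens`, the whitespace tokenization, and the sample words are assumptions made for illustration only. Input is assumed to be already lowercased, since Turkish-aware case folding (dotted and dotless I) is outside the scope of this sketch.

```python
from typing import List


def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first `prefix_len` characters of `word`.

    Words shorter than `prefix_len` are returned unchanged.
    The default of 5 follows the prefix length reported in the abstract.
    """
    return word[:prefix_len]


def stem_tokens(text: str, prefix_len: int = 5) -> List[str]:
    """Split `text` on whitespace and truncate each token.

    Whitespace tokenization is a simplifying assumption; a real
    indexing pipeline would also handle punctuation and case.
    """
    return [truncate_stem(token, prefix_len) for token in text.split()]


if __name__ == "__main__":
    # Inflected forms of "kitap" (book) collapse to the same index term.
    for word in ["kitap", "kitaplar", "kitaplarımız", "ev"]:
        print(word, "->", truncate_stem(word))
```

Under this scheme, different inflected forms of a word map to the same index term whenever they share their first five characters, which is why such a simple method can work for an agglutinative language like Turkish, where suffixes are attached after the stem.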

Year	2006
DOI	10.1145/1148170.1148288
Venue	SIGIR
Keywords	million word, lemmatizer-based stemmer, turkish text, trec-like test collection, sophisticated stemmers, better effectiveness, effective retrieval environment, test bed, large-scale information retrieval experiment, prefix length, large-scale turkish information retrieval, turkish newspaper, information retrieval, stemming, lemmatizer
Field	Lemmatisation, Data mining, Turkish, Query language, Computer science, Prefix, Newspaper, Natural language processing, Artificial intelligence, Wireless ad hoc network, Text processing, Information retrieval, Information technology
DocType	Conference
ISBN	1-59593-369-7
Citations	5
PageRank	0.50
References	5
Authors	6

Authors

Name	Order	Citations	PageRank
Fazli Can	1	581	94.63
Seyit Kocberber	2	64	4.58
Erman Balcik	3	26	1.54
Cihan Kaynak	4	26	1.54
H. Cagdas Ocalan	5	5	0.50
Onur M. Vursavas	6	26	1.54
