A Comparative Study Of Topic Identification On Newspaper And E-Mail - Citegraph

Paper Info

Title
A Comparative Study Of Topic Identification On Newspaper And E-Mail

Abstract
This paper presents several statistical methods for topic identification on two kinds of textual data: newspaper articles and e-mails. Five methods are tested on these two corpora: topic unigrams, cache model, TFIDF classifier, topic perplexity, and weighted model. Our work aims to study these methods by confronting them with very different data. This study is very fruitful for our research. Statistical topic identification methods depend not only on a corpus, but also on its type. One of the methods achieves a topic identification rate of 80% on a general newspaper corpus but does not exceed 30% on the e-mail corpus. Another method gives the best result on e-mails, but does not behave the same way on a newspaper corpus. We also show in this paper that almost all our methods achieve good results in retrieving the first two manually annotated labels.

Year	2001
DOI	10.1109/SPIRE.2001.989770
Venue	EIGHTH SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS
Keywords	routing, statistical analysis, information retrieval, language model, testing, natural languages, speech recognition
Field	tf–idf, Information retrieval, Cache, Computer science, Newspaper, Natural language, Artificial intelligence, Natural language processing, Text categorization, Vocabulary, Language model, Statistical analysis
DocType	Conference
Citations	8
PageRank	0.61
References	5
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Brigitte Bigi	1	336	27.76
Armelle Brun	2	138	21.49
Jean-Paul Haton	3	380	65.42
Kamel Smaïli	4	120	25.18
Imed Zitouni	5	612	46.39
