An extensive empirical study of feature terms selection for text summarization and categorization - Citegraph

Paper Info

Title
An extensive empirical study of feature terms selection for text summarization and categorization

Abstract
The ever-increasing availability of online textual databases and the growth of the Internet have necessitated intensive research on automatic text summarization within the Natural Language Processing (NLP) community. Researchers and students working on a research project constantly face the problem that it is almost impossible to read most of the newly published papers. The goal of extraction-based text summarization is sentence selection. One way to obtain the sentences is to score each sentence on a set of feature terms (sentence ranking) and then select the best-scoring ones. Broad indexing and speedy search alone are not enough for effective retrieval. Categorized data are easy for users to browse when the data are well organized. In the first stage, each document is preprocessed: sentence segmentation, tokenization, stop word removal, case folding, lemmatization, and stemming. We then apply important features, sentence filtering features, and data compression features, and finally calculate a score for each sentence. We propose text summarization based on an HMM tagger to improve the quality of the summary. The documents are also categorized by creating impressions. We compared our results with the Copernicus summarizer, the Great summarizer, and the Microsoft Word 2007 summarizer, among others. The proposed system is tested with four similarity measures: Cosine, Jaccard, Jaro-Winkler, and Sørensen. The results show that the feature terms method produced the best-quality summaries. Our text categorization approach is validated with Naïve Bayesian, Decision Tree Induction, KNN, and SVM approaches.
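
As a rough illustration of three of the similarity measures named in the abstract, the sketch below compares a candidate summary with a reference text. It is not the authors' implementation: the function names, the regex tokenizer, and the choice to compute cosine over raw term-frequency vectors and Jaccard and Sørensen over token sets are all assumptions made for this example. Jaro-Winkler is left out because it operates on character sequences rather than token sets.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Case-fold the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    # Return 0.0 when either text has no tokens, to avoid dividing by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Jaccard similarity: shared distinct tokens over all distinct tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def sorensen_dice(a, b):
    """Sørensen (Dice) similarity: twice the shared tokens over the sum of set sizes."""
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0
```

For example, `jaccard("the cat sat on the mat", "the cat lay on the mat")` compares two five-word vocabularies that share four words, giving 4/6, while `sorensen_dice` on the same pair gives 8/10.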

Year	DOI	Venue
2012	10.1145/2393216.2393317	CCSEIT
Keywords	Field	DocType
sentence segmentation,automatic text summarization,ranking sentence,feature terms selection,copernicus summarizer,extensive empirical study,data compression feature,categorized data,text categorization approach,sentences selection,text summarization,online textual data base,natural language processing,term frequency	Text graph,Tokenization (data security),Automatic summarization,Naive Bayes classifier,Computer science,Search engine indexing,Artificial intelligence,Natural language processing,Sentence,Word processing,Stop words	Conference
Citations	PageRank	References
2	0.36	5
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Suneetha Manne	1	2	1.04
S. Sameen Fatima	2	2	0.70
