Comparing frequency of word occurrences in abstracts and texts using two stop word lists. - Citegraph

Paper Info

Title
Comparing frequency of word occurrences in abstracts and texts using two stop word lists.

Abstract
Retrieval tests have assumed that the abstract is a true surrogate of the entire text. However, the frequency of terms in abstracts has never been compared to that of the articles they represent. Even though many sources are now available in full text, many still rely on the abstract for retrieval. A total of 1,138 articles with their abstracts were downloaded from the Journal of the American Medical Association, the New England Journal of Medicine, the British Medical Journal, and the Lancet. Based on two stop word lists, one long and one short, content-bearing words were extracted from the articles and their abstracts, and the frequency of each word was counted in both sources. Each article and its abstract were tested using a chi-squared test to determine whether the words in the abstract occurred as frequently as would be expected. Between 96% and 98% of the abstracts tested were not significantly different from random samples of the articles they represented. In these four journals, the abstracts are lexical, as well as intellectual, surrogates for the articles they represent.
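
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stop word list, the tokenizer, and the function names below are hypothetical, and the handling of abstract words that never occur in the article (they are ignored here) is a simplifying assumption. The sketch counts content-bearing words in both texts and computes a chi-squared goodness-of-fit statistic, treating the article's word distribution as the expected distribution for the abstract.

```python
import re
from collections import Counter

# Hypothetical short stop word list; the two lists used in the paper are not given here.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "were", "for", "with"}


def content_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, keep alphabetic tokens, and count the non-stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def chi_squared_abstract_vs_article(abstract, article, stop_words=STOP_WORDS):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit comparison.

    Expected abstract counts are the article's relative word frequencies
    scaled to the number of abstract tokens that fall in the article's vocabulary.
    """
    article_counts = content_words(article, stop_words)
    abstract_counts = content_words(abstract, stop_words)

    # Simplifying assumption: abstract words absent from the article are ignored,
    # because their expected count would be zero.
    observed = {word: abstract_counts.get(word, 0) for word in article_counts}
    n_abstract = sum(observed.values())
    n_article = sum(article_counts.values())
    if n_abstract == 0 or n_article == 0:
        raise ValueError("no shared content words to compare")

    statistic = 0.0
    for word, article_count in article_counts.items():
        expected = n_abstract * article_count / n_article
        statistic += (observed[word] - expected) ** 2 / expected

    return statistic, len(article_counts) - 1


if __name__ == "__main__":
    article = "aspirin reduced pain aspirin reduced fever"
    abstract = "aspirin reduced pain"
    stat, dof = chi_squared_abstract_vs_article(abstract, article)
    print(f"chi-squared = {stat:.3f}, degrees of freedom = {dof}")
```

If SciPy is available, a p-value can be obtained from the returned pair with `scipy.stats.chi2.sf(statistic, dof)`. With real articles, many expected counts will be small, so rare words would likely need to be pooled or dropped before the chi-squared approximation is trusted; how the paper handled this is not stated in the abstract.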

Year	Venue	Keywords
2001	JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION	chi square distribution,bibliometrics
Field	DocType	Issue
Information retrieval,Computer science,Natural language processing,Bibliometrics,Artificial intelligence,Vocabulary,Stop words	Conference	SUPnan
ISSN	Citations	PageRank
1067-5027	3	0.55
References	Authors
0	7

Authors (7 rows)

Name	Order	Citations	PageRank
K Su	1	3	0.55
James E. Ries	2	13	3.60
G M Peterson	3	3	0.55
Mary Ellen Sievert	4	5	1.80
Timothy B. Patrick	5	34	14.41
David E. Moxley	6	5	1.30
L D Ries	7	3	0.55