Title
Comparing frequency of word occurrences in abstracts and texts using two stop word lists.
Abstract
Retrieval tests have assumed that the abstract is a true surrogate of the entire text. However, the frequency of terms in abstracts has never been compared with that of the articles they represent. Even though many sources are now available in full text, many still rely on the abstract for retrieval. A total of 1,138 articles with their abstracts were downloaded from the Journal of the American Medical Association, the New England Journal of Medicine, the British Medical Journal, and the Lancet. Based on two stop word lists, one long and one short, content-bearing words were extracted from the articles and their abstracts, and the frequency of each word was counted in both sources. Each article and its abstract were tested using a chi-squared test to determine whether the words in the abstract occurred as frequently as would be expected. Between 96% and 98% of the abstracts tested were not significantly different from random samples of the articles they represented. In these four journals, the abstracts are lexical, as well as intellectual, surrogates for the articles they represent.
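The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the stop word list, the tokenizer, the restriction to words that occur in the article, and the Wilson-Hilferty approximation for the 5% critical value are all assumptions made here, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical short stop word list; the two lists used in the study are not reproduced here.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "were", "for", "with"}


def content_word_counts(text, stop_words=STOP_WORDS):
    """Count lowercase alphabetic tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def chi_squared_statistic(abstract_counts, article_counts):
    """Goodness-of-fit statistic treating the abstract as a sample of the article.

    Only words that occur in the article contribute (an assumption of this
    sketch), so every expected count is positive.
    Returns (statistic, degrees_of_freedom).
    """
    vocab = list(article_counts)
    n_abstract = sum(abstract_counts[w] for w in vocab)
    n_article = sum(article_counts.values())
    if n_abstract == 0 or len(vocab) < 2:
        raise ValueError("need at least two article words and one shared token")
    stat = 0.0
    for w in vocab:
        # Expected count if abstract words were drawn at the article's rates.
        expected = n_abstract * article_counts[w] / n_article
        stat += (abstract_counts[w] - expected) ** 2 / expected
    return stat, len(vocab) - 1


def critical_value_05(df):
    """Approximate 5% critical value of chi-squared (Wilson-Hilferty)."""
    z = 1.6449  # upper 5% point of the standard normal distribution
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


if __name__ == "__main__":
    article = content_word_counts("heart heart heart disease disease risk")
    abstract = content_word_counts("heart disease")
    stat, df = chi_squared_statistic(abstract, article)
    # An abstract is flagged as different only if stat exceeds the critical value.
    print(stat, df, stat > critical_value_05(df))
```

With real articles, many expected counts will be very small, so a practical analysis would probably need to pool or drop rare words before applying the test.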
Year: 2001
Venue: JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
Keywords: chi square distribution, bibliometrics
Field: Information retrieval, Computer science, Natural language processing, Bibliometrics, Artificial intelligence, Vocabulary, Stop words
DocType: Conference
Issue: SUP
ISSN: 1067-5027
Citations: 3
PageRank: 0.55
References: 0
Authors: 7
Name                 Order  Citations  PageRank
K Su                 1      3          0.55
James E. Ries        2      13         3.60
G M Peterson         3      3          0.55
Mary Ellen Sievert   4      5          1.80
Timothy B. Patrick   5      34         14.41
David E. Moxley      6      5          1.30
L D Ries             7      3          0.55