Title
Automatic Indexing and Creating Semantic Networks for Agricultural Science Papers in the Polish Language
Abstract
This paper presents an automatic indexing system based on text analysis, in which words are grouped and reduced to their dictionary forms. The system, developed with the help of an inflection dictionary of the Polish language, is designed to store and retrieve scientific papers on agriculture. During the analysis, auxiliary words such as pronouns and conjunctions were omitted. Words not present in the inflection dictionary were used to create a dictionary of new terms, from which agricultural terms were extracted; these terms could then be located in the AGROVOC thesaurus. For each analyzed paper, a set of concepts with assigned weights was created, and an "artificial sentence" was generated on the basis of the frequency of occurrence of the dictionary forms of words in the text and the words' grammatical categories. The artificial sentences, together with the sets of terms, were used to find relationships between the papers stored in the system, and these relationships are used in the algorithm that searches for articles matching a query. The accuracy of the results was observed to depend on the number of words in a paper: for papers of at least a thousand words, the probability of misidentifying the content did not exceed 5%, whereas for short texts such as abstracts it was much higher, approximately 23%. The results obtained with the presented system are more accurate than those obtained with standard search engines. The method can also be applied to other natural languages with extensive inflection systems. The presented solution continues the work carried out under grant [N N310 038538].
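The abstract describes the indexing pipeline only at a high level. The minimal Python sketch below illustrates one way such a pipeline could work, under loudly stated assumptions: the tiny inflection dictionary `INFLECTION`, the stop list `AUXILIARY`, the rule used to build the "artificial sentence" (most frequent lemma per grammatical category, in a fixed order), relative-frequency concept weights, and the use of cosine similarity to relate papers and rank them for a query are all hypothetical illustrations, not the authors' published method. The AGROVOC lookup step is omitted.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy inflection dictionary: inflected form -> (dictionary form, category).
# A real system would use a full Polish inflection dictionary.
INFLECTION = {
    "pszenica": ("pszenica", "noun"), "pszenicy": ("pszenica", "noun"),
    "uprawa": ("uprawa", "noun"), "uprawy": ("uprawa", "noun"),
    "plon": ("plon", "noun"), "plony": ("plon", "noun"),
    "ozima": ("ozimy", "adj"), "ozimej": ("ozimy", "adj"),
    "wysokie": ("wysoki", "adj"),
    "daje": ("dawać", "verb"), "dają": ("dawać", "verb"),
}

# HYPOTHETICAL stop list of auxiliary words (pronouns, conjunctions, prepositions).
AUXILIARY = {"i", "oraz", "lub", "to", "on", "ona", "w", "na", "z"}

def analyze(text):
    """Reduce words to dictionary forms, skip auxiliary words, collect unknown words."""
    lemmas, categories, new_terms = Counter(), {}, Counter()
    for token in re.findall(r"\w+", text.lower()):  # \w matches Polish letters in Python 3
        if token in AUXILIARY:
            continue
        entry = INFLECTION.get(token)
        if entry is None:
            new_terms[token] += 1  # candidate for the dictionary of new terms
            continue
        lemma, category = entry
        lemmas[lemma] += 1
        categories[lemma] = category
    return lemmas, categories, new_terms

def concept_weights(lemmas):
    """ASSUMPTION: weight of a concept = its relative frequency in the paper."""
    total = sum(lemmas.values())
    return {lemma: count / total for lemma, count in lemmas.items()} if total else {}

def artificial_sentence(lemmas, categories, order=("adj", "noun", "verb")):
    """HYPOTHETICAL rule: most frequent lemma of each category (ties broken alphabetically)."""
    words = []
    for category in order:
        candidates = [lemma for lemma in lemmas if categories[lemma] == category]
        if candidates:
            words.append(min(candidates, key=lambda lemma: (-lemmas[lemma], lemma)))
    return " ".join(words)

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (swapped-in technique)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, papers):
    """Rank stored papers by similarity of their concept weights to the query's."""
    q = concept_weights(analyze(query)[0])
    scored = [(cosine(q, concept_weights(analyze(text)[0])), name)
              for name, text in papers.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

# HYPOTHETICAL sample "papers" for illustration.
PAPERS = {
    "doc1": "Pszenica ozima daje wysokie plony i uprawa pszenicy to podstawa.",
    "doc2": "Uprawy pszenicy ozimej oraz plony w Polsce.",
}
```

For example, `artificial_sentence(*analyze(PAPERS["doc1"])[:2])` yields the lemma sequence `"ozimy pszenica dawać"`, the unknown word `podstawa` lands in the new-terms counter, and `search("pszenica ozima daje", PAPERS)` ranks `doc1` above `doc2`. Cosine similarity over weighted concept sets was chosen here only because it is a simple, standard way to compare such sets.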
Year
2013
DOI
10.1109/COMPSACW.2013.63
Venue
Computer Software and Applications Conference Workshops
Keywords
creating semantic networks, agricultural science papers, thousand word, text analysis, artificial sentence, scientific paper, automatic indexing, extensive inflection system, dictionary form, polish language, inflection dictionary, automatic indexing system, auxiliary word, new term, natural languages, speech, electronic publishing, computational linguistics, semantic networks, knowledge extraction, semantics, dictionaries, pattern matching, agriculture, indexing, search engines, semantic network, speech recognition, natural language processing
Field
Grammatical category, Information retrieval, Computer science, Search engine indexing, Semantic network, Lemma (morphology), Natural language, Machine-readable dictionary, Natural language processing, Artificial intelligence, Sentence, Automatic indexing
DocType
Conference
Citations
2
PageRank
0.45
References
3
Authors
2
Name | Order | Citations | PageRank
Piotr Wrzeciono | 1 | 2 | 1.47
Waldemar Karwowski | 2 | 120 | 31.49