Title
A generic method for multi word extraction from Wikipedia
Abstract
This paper presents a generic method for multiword expression extraction from Wikipedia. The method exploits the properties of this specific encyclopedic genre in its HTML format and relies on the intention of article authors to link to other articles. The relevant links were processed by applying local regular grammars within the NooJ development environment. We tested the method on the Croatian version of Wikipedia and present the results obtained.
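The link-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper applies local regular grammars in NooJ, whereas this sketch substitutes Python's standard `html.parser` and a simple word-count filter. The class name `LinkTextCollector`, the function `extract_multiword_candidates`, and the assumption that article links have an `href` starting with `/wiki/` and containing no colon are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect anchor texts of internal article links that span two or more words."""

    def __init__(self):
        super().__init__()
        self._in_link = False   # True while inside a qualifying <a> element
        self._parts = []        # text fragments of the current link
        self.candidates = []    # multiword anchor texts found so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Assumption: article links look like /wiki/Name, and a colon
            # marks a non-article page (file, category, and so on).
            if href.startswith("/wiki/") and ":" not in href:
                self._in_link = True
                self._parts = []

    def handle_data(self, data):
        if self._in_link:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            # Normalise whitespace, then keep only multiword anchor texts.
            text = " ".join("".join(self._parts).split())
            if len(text.split()) >= 2:
                self.candidates.append(text)


def extract_multiword_candidates(html):
    """Return multiword anchor texts of internal links, in document order."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Calling `extract_multiword_candidates` on the HTML of an article returns candidate expressions only; a real pipeline would still need the grammar-based filtering that the abstract describes.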
Year
2008
DOI
10.1109/ITI.2008.4588490
Venue
ITI
Keywords
wikipedia,croatian,encyclopaedias,word processing,nooj development environment,encyclopedic genre,multiword expression extraction,hypermedia markup languages,html format,multi word expressions,generic method,multi word extraction,testing,electronic publishing,artificial neural networks,dictionaries,information services,html,space technology,internet,functional analysis,encyclopedias,computational linguistics,encoding,filtering,development environment
Field
Information system,Rule-based machine translation,Information retrieval,Computer science,Computational linguistics,Artificial intelligence,Natural language processing,Encyclopedia,Multiword expression,Word processing,Electronic publishing,The Internet
DocType
Conference
ISSN
1330-1012 E-ISBN: 978-953-7138-13-4
ISBN
978-953-7138-13-4
Citations
1
PageRank
0.36
References
1
Authors
2
Name | Order | Citations | PageRank
Božo Bekavac | 1 | 1 | 0.36
Marko Tadić | 2 | 80 | 15.61