Abstract | ||
---|---|---|
When seeking information on the Web, Wikipedia is an essential source: its English version features nearly four million articles. Studies show that it is the most frequently plagiarized information source, so when KOPI, a new translational plagiarism checker, was created, it was necessary to find a way to add this vast source of information to its database. Because it is impossible to download the whole Wikipedia database in an easy-to-handle format such as HTML or plain text, and all the available MediaWiki converters have some flaws, a MediaWiki XML dump to plain text converter was written. It runs every time a new database dump appears on the site, and the resulting text version is published for everybody to use. |
Year | Venue | DocType |
---|---|---|
2012 | ERCIM NEWS | Journal |
Volume | Issue | ISSN |
2012 | 89 | 0926-4981 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Máté Pataki | 1 | 24 | 4.15 |
Miklós Vajna | 2 | 0 | 0.34 |
Csaba Attila Marosi | 3 | 72 | 5.26 |