Paper Info

Title
Publishing the Trove Newspaper Corpus.

Abstract
The Trove Newspaper Corpus is derived from the National Library of Australia's digital archive of newspaper text. The corpus is a snapshot of the NLA collection, taken in 2015 to be made available for language research as part of the Alveo Virtual Laboratory, and contains 143 million articles dating from 1806 to 2007. This paper describes the work we have done to make this large corpus available as a research collection, facilitating access to individual documents and enabling large-scale processing of the newspaper text in a cloud-based environment.

Year	Venue	Keywords
2016	LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION	newspaper,corpus,linked data
Field	DocType	Citations
Computer science,Linked data,Newspaper,Natural language processing,Artificial intelligence,Publishing	Conference	0
PageRank	References	Authors
0.34	3	2

Authors (2 rows)

Name	Order	Citations	PageRank
Steve Cassidy	1	35	4.99
Stephen Cassidy	2	0	0.34

Cited by (0 rows)

References (3 rows)
