OCRd text from Trove digitised journals

Harvested: 5 August 2021

I harvested metadata and OCRd text from Trove's digitised periodicals:

  • 1,163 periodicals had OCRd text available for download
  • OCRd text was downloaded from 51,928 periodical issues
  • About 10 GB of text was downloaded

As well as downloading the text files, the harvesting process generated a CSV file for each periodical that captures details of each available issue with OCRd text. For each issue, the CSV file lists:

Column     Description
---------  -------------------------------------------------------------
title      the title of the periodical
id         the Trove object identifier for the issue
details    additional issue details, such as volume and issue number
pages      the number of pages in the issue
text_file  the name of the file containing the OCRd text from this issue
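The per-periodical CSV files can be read with Python's standard csv module. The sketch below is a minimal example, assuming each CSV has exactly the columns listed above and that the OCRd text files sit together in a single folder. That folder layout, the helper names (iter_issues, load_issue_text, summarise) and the sample rows are hypothetical and for illustration only; they are not part of the harvesting code and are not real Trove data.

```python
import csv
import io
import tempfile
from pathlib import Path
from typing import Iterator, TextIO


def iter_issues(csv_file: TextIO) -> Iterator[dict]:
    """Yield one dict per issue row, converting 'pages' to an int."""
    for row in csv.DictReader(csv_file):
        # Assumed columns: title, id, details, pages, text_file
        row["pages"] = int(row["pages"]) if row["pages"] else 0
        yield row


def load_issue_text(row: dict, text_dir: Path) -> str:
    """Read an issue's OCRd text; text_dir is an assumed (hypothetical) location."""
    return (text_dir / row["text_file"]).read_text(encoding="utf-8")


def summarise(csv_file: TextIO) -> dict:
    """Count the issues and total pages listed in one periodical's CSV."""
    issues = list(iter_issues(csv_file))
    return {"issues": len(issues), "pages": sum(r["pages"] for r in issues)}


if __name__ == "__main__":
    # Made-up sample rows for illustration only, not real Trove data.
    sample = (
        "title,id,details,pages,text_file\n"
        "Example Periodical,example-id-1,Vol. 1 No. 1,12,issue-1.txt\n"
        "Example Periodical,example-id-2,Vol. 1 No. 2,8,issue-2.txt\n"
    )
    print(summarise(io.StringIO(sample)))  # {'issues': 2, 'pages': 20}

    # Write a throwaway text file so load_issue_text has something to read.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "issue-1.txt").write_text("Sample OCRd text", encoding="utf-8")
        first = next(iter_issues(io.StringIO(sample)))
        print(load_issue_text(first, Path(tmp)))  # Sample OCRd text
```

With a downloaded CSV, pass an open file instead of the StringIO sample, e.g. open(path, newline="", encoding="utf-8"), as the csv module documentation recommends. Summing the summarise results across every periodical's CSV is one way to cross-check the issue totals reported above.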

Download from CloudStor

Other options

  • The complete collection of text files for all the periodicals can also be browsed online


Cite as

Sherratt, Tim. (2019). GLAM-Workbench/trove-journals (version v0.1.0). Zenodo.

