OCRd text from Trove digitised journals
Harvested: 5 August 2021
I harvested metadata and OCRd text from Trove's digitised periodicals:
- 1,163 periodicals had OCRd text available for download
- OCRd text was downloaded from 51,928 periodical issues
- About 10 GB of text was downloaded
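The totals above can be recomputed from a local copy of the harvest. The sketch below is a minimal example that assumes a layout this page does not describe: one sub-directory per periodical, with one `.txt` file per issue somewhere beneath it. The function name `summarise_harvest` is hypothetical.

```python
from pathlib import Path


def summarise_harvest(root: Path) -> dict[str, int]:
    """Count periodicals, issues and bytes of OCRd text under `root`.

    Hypothetical layout: root/<periodical>/.../<issue>.txt, one text file
    per issue. Text files sitting directly in `root` are ignored.
    """
    text_files = [
        p for p in root.rglob("*.txt")
        if p.is_file() and len(p.relative_to(root).parts) > 1
    ]
    # Each periodical is identified by its top-level directory under root
    periodicals = {p.relative_to(root).parts[0] for p in text_files}
    total_bytes = sum(p.stat().st_size for p in text_files)
    return {
        "periodicals": len(periodicals),
        "issues": len(text_files),
        "bytes": total_bytes,
    }


# Example usage (path is hypothetical):
# stats = summarise_harvest(Path("trove-journals-text"))
# print(f"{stats['issues']:,} issues, {stats['bytes'] / 1e9:.1f} GB")
```

Counting files rather than CSV rows keeps the check independent of the per-periodical CSVs, so it reflects what was actually downloaded.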
As well as downloading the text files, the harvesting process generated a CSV file for each periodical that captures details of each available issue with OCRd text. For each issue, the CSV file lists:
Column | Description |
---|---|
title | the title of the periodical |
id | the Trove object identifier for the issue |
details | additional issue details such as volume and issue number |
pages | the number of pages in the issue |
text_file | the name of the file containing the OCRd text from this issue |
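Because each CSV row describes one issue and names its text file, a periodical's CSV can be joined to the downloaded text using Python's standard `csv` module. This is a minimal sketch, not the harvester's own code: the function name `iter_issues`, the `texts_dir` parameter (where the text files are stored locally), and UTF-8 encoding are all assumptions.

```python
import csv
from pathlib import Path
from typing import Iterator


def iter_issues(csv_path: Path, texts_dir: Path) -> Iterator[dict]:
    """Yield each issue in a periodical's CSV together with its OCRd text.

    Columns: title, id, details, pages, text_file.
    `texts_dir` is a hypothetical location for the downloaded text files.
    """
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            issue = dict(row)
            # 'pages' is read as a string; convert it, allowing for blanks
            issue["pages"] = int(row["pages"]) if row["pages"] else None
            text_path = texts_dir / row["text_file"]
            # Missing files yield None rather than raising an error
            issue["text"] = (
                text_path.read_text(encoding="utf-8")
                if text_path.exists()
                else None
            )
            yield issue
```

Using a generator means only one issue's text is held in memory at a time, which matters when working through a collection of around 10 GB.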
Other options
- The complete collection of text files for all the periodicals can be browsed here
Related resources
Getting help
Cite as
Sherratt, Tim. (2019). GLAM-Workbench/trove-journals (version v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.3545216