
OCRd text from Trove digitised journals

This dataset contains OCRd text and metadata harvested from digitised periodicals in Trove.

The zip file contains a directory for each periodical, named using its title and identifier, e.g. 14th-company-magazine-nla.obj-15956697. Each directory contains a CSV-formatted list of issues (issues.csv) and a subdirectory named texts that contains a text file of OCRd text for each issue. The text files are named using the issue's date and identifier, e.g. 1918-06-14-nla.obj-15967449.txt. If text was successfully downloaded from an issue, the issues.csv file will include the name of the text file in the text_file column.
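The layout described above can be walked with a few lines of Python. This is a minimal sketch, not part of the dataset's own tooling: it assumes the zip has already been extracted, that the files are UTF-8 encoded, and that the only column relied on is text_file. The function names split_name and iter_issue_texts are made up for illustration.

```python
import csv
from pathlib import Path


def split_name(name):
    """Split a directory or file name into (prefix, identifier).

    '1918-06-14-nla.obj-15967449.txt' gives the date and the issue identifier;
    '14th-company-magazine-nla.obj-15956697' gives the title slug and the
    periodical identifier. Assumes identifiers always start with 'nla.obj-'.
    """
    stem = name[:-4] if name.endswith(".txt") else name
    prefix, sep, number = stem.rpartition("-nla.obj-")
    if not sep:
        # No identifier found: return the name unchanged with an empty id.
        return stem, ""
    return prefix, "nla.obj-" + number


def iter_issue_texts(periodical_dir):
    """Yield (csv_row, text) for each issue that has a harvested text file."""
    periodical_dir = Path(periodical_dir)
    # UTF-8 is an assumption about how the CSV and text files were saved.
    with open(periodical_dir / "issues.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            text_file = (row.get("text_file") or "").strip()
            if not text_file:
                # Empty text_file means no OCRd text was downloaded.
                continue
            path = periodical_dir / "texts" / text_file
            if path.exists():
                yield row, path.read_text(encoding="utf-8")
```

For example, list(iter_issue_texts("14th-company-magazine-nla.obj-15956697")) would collect every issue of that periodical that has text, skipping rows where the text_file column is empty.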

Files

trove-periodicals.zip

date harvested: 2024-03-12
format: application/zip
file size: 3.7 GB


Context of creation

date harvested: 2024-03-12
notebook: Download the OCRd text for ALL the digitised periodicals in Trove!


Cite as

Sherratt, Tim. (2024). GLAM-Workbench/trove-journals (version v2.2.0). Zenodo. https://doi.org/10.5281/zenodo.13744407