OCRd text from Trove digitised journals
This dataset contains OCRd text and metadata harvested from digitised periodicals in Trove.
The zip file contains a directory for each periodical, named using its title and identifier, e.g. 14th-company-magazine-nla.obj-15956697. Each directory contains a CSV-formatted list of issues (issues.csv) and a subdirectory named texts that contains a text file of OCRd text for each issue. The text files are named using the issue's date and identifier, e.g. 1918-06-14-nla.obj-15967449.txt. If text was successfully downloaded from an issue, the issues.csv file will include the name of the text file in the text_file column.
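As a rough illustration of how this layout might be used, the following Python sketch pairs each row of a periodical's issue list with its OCRd text. It assumes that the zip file has already been extracted, that issues.csv sits directly inside the periodical's directory, that the files are UTF-8 encoded, and that the text_file cell is empty when no text was downloaded. The function name iter_issue_texts is hypothetical and is not part of the dataset or its harvesting notebook.

```python
import csv
from pathlib import Path


def iter_issue_texts(periodical_dir):
    """Yield (issue_row, text) for each issue that has a text file.

    Hypothetical helper: assumes issues.csv is in the periodical's
    directory and that the files are UTF-8 encoded.
    """
    periodical_dir = Path(periodical_dir)
    issues_path = periodical_dir / "issues.csv"
    with open(issues_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.DictReader(csv_file):
            # Assumption: text_file is empty if no text was downloaded.
            file_name = (row.get("text_file") or "").strip()
            if not file_name:
                continue
            text_path = periodical_dir / "texts" / file_name
            # Skip rows whose text file is missing on disk.
            if text_path.is_file():
                yield row, text_path.read_text(encoding="utf-8")
```

For example, calling list(iter_issue_texts("14th-company-magazine-nla.obj-15956697")) from inside the extracted folder would collect every issue of that periodical for which text was harvested. Column names other than text_file are not described here, so it is worth inspecting the header row of issues.csv before relying on them.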
Files
trove-periodicals.zip
date harvested | 2024-03-12 |
format | application/zip |
file size | 3.7 GB |
Context of creation
date harvested | 2024-03-12 |
notebook | Download the OCRd text for ALL the digitised periodicals in Trove! |
Getting help
Cite as
Sherratt, Tim. (2024). GLAM-Workbench/trove-journals (version v2.2.0). Zenodo. https://doi.org/10.5281/zenodo.13744407