Download the OCRd text for ALL the digitised journals in Trove!
Using the code and data from other notebooks in this repository, you can download the OCRd text from every digitised journal in Trove. If you're going to try this, you'll need lots of patience and lots of disk space. Needless to say, don't try this on a cloud service like Binder. Fortunately, you don't have to do it yourself, as I've already run the harvest and made all the text files available. See below for details. I repeat, you probably don't want to do this yourself. The point of this notebook is really to document the methodology used to create the repository.
- OCRd text from Trove digitised journals
- List of journals with OCRd text
- CSV formatted list of journals with OCRd text
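As a rough illustration of the overall approach (not the actual harvesting code, which lives in the other notebooks in this repository), the sketch below loops through a CSV list of journals and saves one text file per row, skipping anything already on disk so that a long-running harvest can be restarted. It is simplified to one file per journal. The `id` column name, the `harvest_all` function, and the `download_text` callable are all hypothetical placeholders; substitute whatever the real CSV and download code provide.

```python
import csv
from pathlib import Path


def harvest_all(journals_csv, output_dir, download_text):
    """Save OCRd text for every journal listed in an open CSV file.

    journals_csv  -- open file object for the CSV list of journals
    output_dir    -- directory where the text files will be written
    download_text -- hypothetical callable: takes a journal id, returns text
    Assumes the CSV has an `id` column (an illustrative name only).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved, skipped = 0, 0
    for row in csv.DictReader(journals_csv):
        target = output_dir / f"{row['id']}.txt"
        # Skip files saved by an earlier run so the harvest is resumable
        if target.exists():
            skipped += 1
            continue
        text = download_text(row["id"])
        # Only write a file if some text actually came back
        if text:
            target.write_text(text, encoding="utf-8")
            saved += 1
    return saved, skipped
```

Because existing files are skipped, an interrupted run can simply be started again without downloading the same text twice, which matters when the full harvest takes a long time.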
Sherratt, Tim. (2022). GLAM-Workbench/trove-journals (version v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7039919