Download the OCRd text for ALL the digitised journals in Trove!

Using the code and data from other notebooks in this repository, you can download the OCRd text from every digitised journal. If you're going to try this, you'll need lots of patience and lots of disk space. Needless to say, don't try this on a cloud service like Binder. Fortunately, you don't have to do it yourself, as I've already run the harvest and made all the text files available. See below for details. I repeat, you probably don't want to do this yourself. The point of this notebook is really to document the methodology used to create the repository.

Run live on ARDC Binder

Other options

Additional documentation

Getting help

Cite as

Sherratt, Tim. (2022). GLAM-Workbench/trove-journals (version v1.0.0). Zenodo.