Download the OCRd text for ALL the digitised journals in Trove!
Using the code and data from other notebooks in this repository, you can download the OCRd text from every digitised journal. If you're going to try this, you'll need lots of patience and lots of disk space. Needless to say, don't try this on a cloud service like Binder. Fortunately you don't have to do it yourself, as I've already run the harvest and made all the text files available. See below for details. I repeat, you probably don't want to do this yourself. The point of this notebook is really to document the methodology used to create the repository.
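Because a harvest of this size is likely to be interrupted at some point, it helps if the download loop can be restarted without repeating work. The following is a minimal sketch of that idea, not the code used to build the repository. The function name `harvest_texts`, the `id` column, and the `fetch_text` callable are all hypothetical: `fetch_text` stands in for whatever function actually retrieves the OCRd text for a single item.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def harvest_texts(
    rows: Iterable[dict],
    fetch_text: Callable[[str], str],
    output_dir: Path,
) -> List[str]:
    """Save one text file per row, skipping files that already exist.

    `rows` is any iterable of dicts with an "id" key (a hypothetical
    column name). `fetch_text` is a placeholder for the function that
    downloads the OCRd text for one item.

    Returns the ids that were fetched during this run.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for row in rows:
        item_id = row["id"]
        out_path = output_dir / f"{item_id}.txt"
        if out_path.exists():
            # Already saved by an earlier run, so don't download it again.
            continue
        text = fetch_text(item_id)
        if text:
            # Only write a file when some text was actually returned.
            out_path.write_text(text, encoding="utf-8")
            fetched.append(item_id)
    return fetched
```

The rows could come from `csv.DictReader` applied to a list of journals, with the column name adjusted to match the real file. Running the function a second time over the same rows only fetches items whose text files are missing, so a stalled harvest can pick up where it left off.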
Other options
Related resources
- OCRd text from Trove digitised journals
- List of journals with OCRd text
- CSV formatted list of journals with OCRd text
Additional documentation
Getting help
Cite as
Sherratt, Tim. (2022). GLAM-Workbench/trove-journals (version v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7039919