Analyse rates of OCR correction
The full text of newspaper articles in Trove is extracted from page images using Optical Character Recognition (OCR). The accuracy of the OCR process is influenced by a range of factors including the font and the quality of the images. Many errors slip through. Volunteers have done a remarkable job in correcting these errors, but it's a huge task. This notebook explores the scale of OCR correction in Trove.
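As a rough illustration of what "scale of OCR correction" means in practice, the sketch below computes the share of articles that have at least one correction from a pair of counts. It is not the notebook's own code: the function name `correction_rate` and the sample counts are hypothetical, and real figures would have to come from Trove itself.

```python
def correction_rate(corrected: int, total: int) -> float:
    """Return the share of articles with at least one correction (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total must be a positive number of articles")
    if not 0 <= corrected <= total:
        raise ValueError("corrected must be between 0 and total")
    return corrected / total


# Hypothetical counts per year, NOT real Trove figures: (corrected, total)
sample_counts = {1900: (50, 1000), 1950: (20, 800)}

# Compute the correction rate for each year in the sample
rates = {year: correction_rate(c, t) for year, (c, t) in sample_counts.items()}

for year, rate in sorted(rates.items()):
    print(f"{year}: {rate:.1%} of articles have corrections")
```

Comparing rates rather than raw counts matters because the number of digitised articles varies widely between years, so a year with many corrections may still have a low proportion corrected.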
Using this notebook
To run this notebook using the ARDC Binder service you'll need to log in using an account from an Australian university or research organisation. If you don't have an account, try MyBinder instead.
The MyBinder service doesn't require any authentication, but it can be slow to start and will sometimes fail when busy. If you have a login at an Australian university, you'll probably get better results with ARDC Binder.
Binder is great for experimentation and quick tasks, but some projects need a dedicated, persistent environment. Other options are described in the "Run these notebooks" section.
Related datasets
Additional documentation
Getting help
Cite as
Sherratt, Tim. (2024). GLAM-Workbench/trove-newspapers (version v2.0.0). Zenodo. https://doi.org/10.5281/zenodo.4724339