OCR corrections in Trove newspapers

Updated: 27 June 2024

Users can correct OCR errors in Trove's digitised newspapers. To help reveal patterns in that correction activity, this dataset records the number of articles with corrections, broken down by publication year, Trove category, and newspaper title.

Files

There are three files in the dataset:

  • corrections_by_year.csv – number of articles corrected in each publication year
  • corrections_by_category.csv – number of articles corrected in each Trove category
  • corrections_by_title.csv – number of articles corrected in each newspaper

The files are in CSV format and contain the following fields.

corrections_by_year.csv

  • term – the publication year
  • total_results – the number of articles with corrections
  • total_articles – the total number of articles
  • proportion – the proportion of articles with corrections

Download from GitHub
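
As a rough illustration of how the fields listed above fit together, the sketch below parses rows in this layout and recomputes the proportion. The sample rows are made-up values, not real Trove figures, and the relationship proportion = total_results / total_articles is an assumption inferred from the field descriptions. The helper names load_rows and recompute_proportion are hypothetical.

```python
import csv
import io

# Made-up sample rows in the documented column layout.
# These are NOT real Trove figures.
SAMPLE_CSV = """term,total_results,total_articles,proportion
1900,250,1000,0.25
1901,100,2000,0.05
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "term": row["term"],
                "total_results": int(row["total_results"]),
                "total_articles": int(row["total_articles"]),
                "proportion": float(row["proportion"]),
            }
        )
    return rows


def recompute_proportion(row):
    """Assumption: proportion is corrected articles divided by all articles."""
    return row["total_results"] / row["total_articles"]


rows = load_rows(SAMPLE_CSV)
for row in rows:
    # Print the stored value next to the recomputed one for comparison.
    print(row["term"], row["proportion"], recompute_proportion(row))
```

To try this on the real file, the contents of a downloaded corrections_by_year.csv could be read and passed to load_rows in place of SAMPLE_CSV. The same approach should apply to corrections_by_category.csv, which lists the same four fields.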

corrections_by_category.csv

  • term – the category name
  • total_results – the number of articles with corrections
  • total_articles – the total number of articles
  • proportion – the proportion of articles with corrections

Download from GitHub

corrections_by_title.csv

  • id – the Trove identifier of the newspaper title
  • title – the name of the newspaper
  • articles_with_corrections – the number of articles with corrections
  • total_articles – the total number of articles from the newspaper in Trove
  • percentage_with_corrections – the percentage of articles with corrections

Download from GitHub
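
One possible use of the per-title fields is to rank newspapers by how much of their content has been corrected. The sketch below does this with made-up rows. The newspaper names and numbers are invented for illustration, the function name top_titles is hypothetical, and the code assumes percentage_with_corrections is stored as a plain number.

```python
import csv
import io

# Invented sample rows in the documented column layout.
# The titles and figures are NOT real Trove data.
SAMPLE_CSV = """id,title,articles_with_corrections,total_articles,percentage_with_corrections
1,Example Gazette,50,200,25.0
2,Sample Herald,30,1000,3.0
3,Demo Advertiser,90,300,30.0
"""


def top_titles(text, n=2):
    """Return the n titles with the highest percentage of corrected articles."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Sort from the highest percentage to the lowest.
    rows.sort(key=lambda r: float(r["percentage_with_corrections"]), reverse=True)
    return [(r["title"], float(r["percentage_with_corrections"])) for r in rows[:n]]


ranked = top_titles(SAMPLE_CSV)
for title, percentage in ranked:
    print(title, percentage)
```

Ranking by percentage rather than by articles_with_corrections keeps large newspapers from dominating the list simply because they have more articles.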

Additional documentation

Getting help

Cite as

Sherratt, Tim. (2024). GLAM-Workbench/trove-newspapers-corrections (version v2.0). Zenodo. https://doi.org/10.5281/zenodo.12517273