
Trove newspapers with non-English language content

Updated: 7 July 2024

This dataset contains information about newspapers published in languages other than English that have been digitised and made available through Trove. Data about the languages present in each newspaper was generated by harvesting a sample of articles from each newspaper using the Trove API, and then running language detection software over the OCRed text of each article.
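The final aggregation step can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: it assumes each sampled article has already been assigned a language code by some detection tool, and the function name is hypothetical. It counts the detected languages for one newspaper and converts the counts into proportions of the sample.

```python
from collections import Counter


def language_proportions(detected: list[str]) -> dict[str, float]:
    """Return the proportion of sampled articles detected in each language.

    `detected` is a hypothetical list holding one language code per sampled
    article, as produced by whatever language detection tool was used.
    """
    total = len(detected)
    if total == 0:
        return {}
    counts = Counter(detected)
    # Divide each language's article count by the total sample size,
    # listing the most common language first
    return {lang: count / total for lang, count in counts.most_common()}


# Example: a sample of five articles from one newspaper
sample = ["de", "de", "de", "en", "de"]
print(language_proportions(sample))  # {'de': 0.8, 'en': 0.2}
```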

Files

newspapers_non_english.csv

The dataset contains the following columns:

Column          Contents
id              newspaper id
title           newspaper title
language        language code
proportion      proportion of sampled articles in this language
number          number of articles sampled
language_full   full language name
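A minimal sketch of reading the CSV with Python's standard csv module and picking out the language with the highest proportion for each newspaper. It relies only on the column names listed above and assumes the file starts with a header row using those names; the function name is hypothetical.

```python
import csv
from typing import Iterable


def main_language_by_newspaper(lines: Iterable[str]) -> dict[str, tuple[str, float]]:
    """Map each newspaper id to (language_full, proportion) for its largest language.

    `lines` can be an open file object or any iterable of CSV text lines
    that starts with the header row.
    """
    best: dict[str, tuple[str, float]] = {}
    for row in csv.DictReader(lines):
        proportion = float(row["proportion"])
        current = best.get(row["id"])
        # Keep the row with the highest proportion seen so far for this newspaper
        if current is None or proportion > current[1]:
            best[row["id"]] = (row["language_full"], proportion)
    return best


# Usage with the downloaded file:
# with open("newspapers_non_english.csv", newline="", encoding="utf-8") as f:
#     print(main_language_by_newspaper(f))
```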

Download from GitHub

non-english-newspapers.md

This is a Markdown-formatted list created by grouping the dataset by newspaper title. It includes details of the main languages in each newspaper.
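Grouping the CSV rows by title can be sketched with the standard library as below. The output layout (a heading per title followed by one bullet per language with its percentage, sorted from most to least common) is an assumption for illustration, not necessarily the format of the published list, and the function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable


def markdown_list(lines: Iterable[str]) -> str:
    """Group CSV rows by newspaper title and render a simple Markdown list."""
    groups: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row["title"]].append((row["language_full"], float(row["proportion"])))

    output = []
    for title in sorted(groups):
        output.append(f"## {title}")
        # List languages from most to least common within the sample
        for language, proportion in sorted(groups[title], key=lambda item: item[1], reverse=True):
            output.append(f"- {language}: {proportion:.0%}")
        output.append("")  # blank line between newspapers
    return "\n".join(output)


# Usage with the downloaded file:
# with open("newspapers_non_english.csv", newline="", encoding="utf-8") as f:
#     print(markdown_list(f))
```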

View list


Cite as

Sherratt, Tim. (2022). GLAM-Workbench/trove-newspapers (version v1.3.4). Zenodo. https://doi.org/10.5281/zenodo.6746078