Trove newspapers

Assorted experiments and examples working with Trove’s digitised newspapers


Tips, tools, and examples

QueryPic Deconstructed

QueryPic is a tool I created many years ago to visualise searches in Trove's digitised newspapers. It shows you the number of articles each year that match your query — instead of a page of search results, you see the complete result set. You can look for patterns and trends across time. This is a deconstructed, extended, and hackable version of QueryPic.

Visualise Trove newspaper searches over time

This notebook helps you zoom out and explore how the number of Trove newspaper articles in your search results varies over time by using the decade and year facets. We then combine this approach with other search facets to see how we can slice a set of results up in different ways to investigate historical changes.
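As a rough sketch of the idea, the function below builds one set of request parameters per decade, asking for the year facet each time. The parameter names (`zone`, `facet`, `l-decade`, `n`, `encoding`) are modelled on version 2 of the Trove API and should be treated as assumptions, as should the function name `decade_params`. Check the current API documentation before relying on them.

```python
def decade_params(query, start_year, end_year):
    """Build one set of request parameters per decade between two years.

    Decades are written as the first three digits of the year,
    so '190' stands for 1900 to 1909 (an assumption based on API v2).
    """
    params_list = []
    for decade in range(start_year // 10, end_year // 10 + 1):
        params_list.append(
            {
                "q": query,
                "zone": "newspaper",
                "facet": "year",          # ask for counts per year
                "l-decade": str(decade),  # limit results to one decade
                "n": 0,                   # we only want facets, not articles
                "encoding": "json",
            }
        )
    return params_list


# One request per decade from 1900 to 1925
for params in decade_params("wragge", 1900, 1925):
    print(params["l-decade"])
```

Each set of parameters would then be sent to the API (with your own API key) and the year counts from each response combined into a single series for charting.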

Chart showing search results over time

Visualise the total number of newspaper articles in Trove by year and state

Trove currently includes more than 200 million digitised newspaper articles published between 1803 and 2015. In this notebook we explore how those newspaper articles are distributed over time, and by state.

Chart showing number of newspaper articles by state and year

Map Trove newspaper results by state

Uses the Trove state facet to create a choropleth map that visualises the number of search results per state.

Map Trove newspaper results by place of publication

Uses the Trove title facet to find the number of results per newspaper, then merges the results with a dataset of geolocated newspapers to map where articles were published.
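The merge step might look something like the sketch below. It assumes you already have the facet counts as a dictionary keyed by title identifier, and the geolocated dataset as a dictionary of `(latitude, longitude)` pairs keyed the same way. The function name, the data structures, and the sample values are all invented for illustration.

```python
def merge_counts_with_places(title_counts, title_places):
    """Attach coordinates to each newspaper's result count.

    Titles without a known location are skipped, since they can't be mapped.
    """
    merged = []
    for title_id, total in title_counts.items():
        place = title_places.get(title_id)
        if place is None:
            continue  # no coordinates for this title
        latitude, longitude = place
        merged.append(
            {
                "title_id": title_id,
                "total": total,
                "latitude": latitude,
                "longitude": longitude,
            }
        )
    return merged


# Made-up identifiers and coordinates, for illustration only
counts = {"t1": 120, "t2": 45, "t3": 7}
places = {"t1": (-33.87, 151.21), "t2": (-37.81, 144.96)}
print(merge_counts_with_places(counts, places))
```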

Screen capture of heatmap

Map Trove newspaper results by place of publication over time

Adds a time dimension to the examples in the previous notebook to create an animated heatmap.

Analyse rates of OCR correction

The full text of newspaper articles in Trove is extracted from page images using Optical Character Recognition (OCR). The accuracy of the OCR process is influenced by a range of factors including the font and the quality of the images. Many errors slip through. Volunteers have done a remarkable job in correcting these errors, but it's a huge task. This notebook explores the scale of OCR correction in Trove.

Finding non-English newspapers in Trove

There are a growing number of non-English newspapers digitised in Trove. However, if you're only searching using English keywords, you might never know that they're there. I thought it would be useful to generate a list of non-English newspapers, but it wasn't quite as straightforward as I thought.

Today’s news yesterday

Uses the date index and the firstpageseq parameter to find articles from exactly 100 years ago that were published on the front page. It then selects one of the articles at random and downloads and displays an image of the front page.
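Here's a minimal sketch of how the search query could be put together. The `date` index and `firstpageseq` parameter come from the description above, but the exact range syntax (a one-day range ending on the target date, with `T00:00:00Z` timestamps) is an assumption, and the function names are hypothetical.

```python
from datetime import date, timedelta


def century_ago(today):
    """Return the date exactly 100 years before the given date."""
    try:
        return today.replace(year=today.year - 100)
    except ValueError:
        # 29 February has no counterpart if the earlier year isn't a leap year
        return today.replace(year=today.year - 100, day=28)


def front_page_query(today):
    """Build a query for front-page articles published 100 years ago."""
    target = century_ago(today)
    day_before = target - timedelta(days=1)
    return (
        f"date:[{day_before.isoformat()}T00:00:00Z "
        f"TO {target.isoformat()}T00:00:00Z] firstpageseq:1"
    )


print(front_page_query(date.today()))
```

The resulting string would be passed as the search query, and one article picked at random from the results.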

Create a Trove OCR corrections ticker

Uses the has:corrections parameter to get the total number of newspaper articles with OCR corrections, then displays the results, updating every five seconds.

Save a Trove newspaper article as an image

Sometimes you want to be able to save a Trove newspaper article as an image. Unfortunately, the Trove web interface doesn't make this easy. The 'Download JPG' option actually loads an HTML page, and while you could individually save the images embedded in the HTML page, often articles are sliced up in ways that make the whole thing hard to read and use. This notebook grabs the page on which an article was published, and then crops the page image to the boundaries of the article. The result is a complete, intact image which presents the article as it was originally published. And if the article is split across multiple pages, you'll get one image per page.
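The cropping step can be sketched as follows, assuming you already have the pixel position of each block of the article on a page as `(left, top, width, height)` values. How those positions are obtained isn't shown here, and the function name is hypothetical. The function returns a `(left, top, right, bottom)` box, which is the form expected by Pillow's `Image.crop()`.

```python
def bounding_box(boxes):
    """Find the smallest box that contains every block of an article.

    Each input box is (left, top, width, height) in page pixels.
    The result is (left, top, right, bottom).
    """
    if not boxes:
        raise ValueError("At least one box is needed")
    left = min(box[0] for box in boxes)
    top = min(box[1] for box in boxes)
    right = max(box[0] + box[2] for box in boxes)
    bottom = max(box[1] + box[3] for box in boxes)
    return (left, top, right, bottom)


# Two blocks of one article on the same page (made-up coordinates)
print(bounding_box([(100, 200, 300, 400), (150, 650, 500, 100)]))
```

With the page image loaded in Pillow, something like `page_image.crop(bounding_box(boxes))` would then give you the article as a single image.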

Download a page image

The Trove web interface doesn’t provide a way of getting high-resolution page images from newspapers. This simple app lets you download page images as complete, high-resolution JPG files.

Generate an article thumbnail

Generate a nice square thumbnail image for a newspaper article.

Make composite images from lots of Trove newspaper thumbnails

This notebook starts with a search in Trove's newspapers. It uses the Trove API to work its way through the search results. For each article it creates a thumbnail image using the code from the thumbnail notebook above. Once this first stage is finished, you have a directory full of thumbnails. The next stage takes all those thumbnails and pastes them one by one into a BIG image to create a composite, or mosaic.
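The second stage mostly comes down to working out where each thumbnail goes. The sketch below assumes square thumbnails of a fixed size laid out in rows from left to right. The function names and the layout are assumptions, not necessarily what the notebook does.

```python
def grid_positions(count, thumb_size, columns):
    """Return the (x, y) paste position for each thumbnail in a grid."""
    return [
        ((index % columns) * thumb_size, (index // columns) * thumb_size)
        for index in range(count)
    ]


def canvas_size(count, thumb_size, columns):
    """Return the (width, height) of an image big enough for the grid."""
    rows = -(-count // columns)  # round up to a whole number of rows
    return (columns * thumb_size, rows * thumb_size)


# Five 200 pixel thumbnails in three columns
print(canvas_size(5, 200, 3))
print(grid_positions(5, 200, 3))
```

With Pillow you could create a blank image of `canvas_size(...)` and then call `paste()` once per thumbnail using the positions returned by `grid_positions(...)`.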

Create 'scissors and paste' messages from Trove newspaper articles

Scissors and paste message - help trapped inside trove

When you search for a term in Trove's digitised newspapers and click on an individual article, you'll see your search terms are highlighted. If you look at the code you'll see the highlighted box around the word includes its page coordinates. That means that if we search for a word, we can find where it appears on a page, and by cropping the page to those coordinates we can create an image of an individual word. By combining these images we can create scissors and paste style messages!

Create large composite images from snipped words

Trove words - composite image

This is a variation of the 'scissors & paste' notebook that extracts words from Trove newspaper images and compiles them into messages. In this notebook, you can harvest multiple versions of a list of words and compile them all into one big image.

Upload Trove newspaper articles to Omeka-S

This notebook steps through the process of uploading Trove newspaper articles to your own Omeka-S instance via the API. As well as uploading the article metadata, it attaches image(s) and PDFs of the articles, and creates a linked record for the publishing newspaper. The source of the articles can be a Trove search, a Trove list, a Zotero collection, or just a list of article ids.

Harvest Australian Women's Weekly covers (or the front pages of any newspaper)

Somewhat confusingly, the Australian Women's Weekly is in with Trove's digitised newspapers and not the rest of the magazines. There are notebooks in the GLAM Workbench's journals section to help harvest all of a journal's covers as images, so I thought I should do the same for the Weekly. This notebook can be easily adjusted to download the front pages of any digitised newspaper.

Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. Let's find out how many, and which newspapers they were published in.

Get a list of Trove newspapers that doesn't include government gazettes

The Trove API includes an option to retrieve details of digitised newspaper titles. Version 2 of the API added a separate option to get details of government gazettes. However, the original newspaper/titles request actually returns both the newspaper and gazette titles, so there's no way of getting just the newspaper titles. This notebook explains the problem and provides a simple workaround.
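One plausible workaround is to request both lists and remove anything that appears in the gazette list from the combined list. The sketch below assumes each title is a dictionary with an `id` field, and that both lists have already been retrieved from the API. The function name, the field name, and the sample records are invented for illustration.

```python
def newspapers_only(all_titles, gazette_titles):
    """Remove gazettes from a list containing both newspapers and gazettes."""
    gazette_ids = {title["id"] for title in gazette_titles}
    return [title for title in all_titles if title["id"] not in gazette_ids]


# Made-up records, for illustration only
combined = [
    {"id": "1", "title": "Example Newspaper"},
    {"id": "2", "title": "Example Government Gazette"},
    {"id": "3", "title": "Another Example Newspaper"},
]
gazettes = [{"id": "2", "title": "Example Government Gazette"}]
print(newspapers_only(combined, gazettes))
```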

Data and images

CSV formatted list of Australian Women's Weekly issues, 1933 to 1982

Harvested: 26 July 2020

This file includes metadata for 2,566 issues of the Australian Women's Weekly from 1933 to 1982. Fields:

  • issue_id: issue identifier
  • date: issue date (YYYY-MM-DD)
  • url: issue url
  • page_id: identifier of the first page in this issue
  • image_file: file name of downloaded image of front page.

Australian Women's Weekly front covers, 1933 to 1982

Harvested: 26 July 2020

Using the notebook above, images of the front covers of Australian Women's Weekly issues on Trove were downloaded. Harvest details:

Trove newspapers with non-English language content

Created: 11 January 2021

Markdown formatted list of newspapers with non-English content, created using the notebook above.