You'll need an API key to work with DigitalNZ data.
Tips, tools, and examples¶
Build a DigitalNZ API search query¶
This notebook creates a form that you can use to experiment with the DigitalNZ search API.
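As a rough illustration of what such a form might assemble behind the scenes, here is a minimal sketch that builds a search URL from a few inputs. The endpoint, the parameter names (api_key, text, per_page, page) and the and[field][] filter syntax are assumptions based on version 3 of the API, so check them against the current DigitalNZ documentation. The function name build_search_url is made up for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint for version 3 of the DigitalNZ records API.
API_URL = "https://api.digitalnz.org/v3/records.json"


def build_search_url(api_key, text, per_page=20, page=1, filters=None):
    """Return a search URL; `filters` maps a field name to a required value."""
    params = [
        ("api_key", api_key),
        ("text", text),
        ("per_page", per_page),
        ("page", page),
    ]
    # Assumed filter syntax: and[field][]=value
    for field, value in (filters or {}).items():
        params.append((f"and[{field}][]", value))
    return f"{API_URL}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("YOUR_API_KEY", "kiwi", filters={"category": "Images"})
print(url)
```

Building the URL separately from sending the request makes it easy to inspect a query before spending any of your API quota on it.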
Getting some top-level data from the DigitalNZ API¶
This notebook pokes around at the top level of DigitalNZ, mainly using facets to generate some collection overviews and summaries.
Harvest facet data from DigitalNZ¶
This notebook explores what facets are available from the DigitalNZ API and demonstrates how to harvest data from them. It generates a summary of all available facets, as well as saving the full set of values from each facet as a CSV file.
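The harvesting loop might look something like the sketch below. It is not the notebook's code: fetch_page is a hypothetical callable standing in for a real API request, and the sample values are made up so the example runs without a key. In a real harvest, fetch_page would presumably ask the API for one page of values for the named facet.

```python
def harvest_facet(fetch_page, facet, max_pages=1000):
    """Collect value -> count pairs for one facet, page by page.

    `fetch_page(facet, page)` is a hypothetical callable that returns a dict
    of facet values and counts for that page, or an empty dict when done.
    """
    values = {}
    for page in range(1, max_pages + 1):
        page_values = fetch_page(facet, page)
        if not page_values:
            break
        values.update(page_values)
    return values


def summarise(harvested):
    """Return (facet, number of distinct values, sum of counts) rows."""
    return [
        (facet, len(values), sum(values.values()))
        for facet, values in harvested.items()
    ]


# Stand-in for real API calls: two pages of made-up values, then nothing.
FAKE_PAGES = {
    1: {"Images": 120, "Newspapers": 300},
    2: {"Audio": 15},
}


def fake_fetch(facet, page):
    return FAKE_PAGES.get(page, {})


harvested = {"category": harvest_facet(fake_fetch, "category")}
summary = summarise(harvested)
print(summary)
```

The max_pages cap is only a safety net so that a misbehaving fetch function cannot loop forever.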
Select a random(ish) record from DigitalNZ¶
The DigitalNZ API doesn't provide a random sort option. You can jump to a randomly selected page of results, but you can't go any deeper than 100,000 pages into a results set (that's 1,000,000 records if you set the per_page value to 100). So we need to find some way of filtering the results until there are fewer than 1,000,000 records, then we can grab a random page and record. This notebook examines the available facets, then uses them to reduce the size of the results set until it's possible to select a random record. It provides a series of examples of retrieving random records using different filters and facets.
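Once a results set is small enough, choosing the record comes down to a little arithmetic. The sketch below is one way to do it, not necessarily the notebook's approach: it picks a random record offset, then converts that to a page number and a position on the page. The function name random_position is made up, and the paging limits are passed in as arguments rather than hard-coded.

```python
import random


def random_position(result_count, per_page, max_pages, rng=random):
    """Pick a random record and return (page number, index within page).

    Pages are numbered from 1 and the index from 0. Raises ValueError if
    the results set is empty or too large to reach within `max_pages`.
    """
    if result_count < 1:
        raise ValueError("no results to choose from")
    if result_count > per_page * max_pages:
        raise ValueError("too many results; add another filter or facet")
    # Choosing an offset first gives every record the same chance,
    # even when the last page is only partly full.
    offset = rng.randrange(result_count)
    return offset // per_page + 1, offset % per_page


page, index = random_position(2345, per_page=100, max_pages=100_000)
print(f"Request page {page} and take the record at position {index}")
```

If the ValueError for an oversized results set is raised, that is the cue to apply another filter and try again with the smaller total.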
Find results by country in DigitalNZ¶
Many items in DigitalNZ include location information. This can include a country, but as far as I can see there's no direct way to search for results relating to a particular country using the API. You can, however, search for geocoded locations using bounding boxes. This notebook shows how you can use this to search for countries.
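A bounding box search could be sketched as follows. The parameter name geo_bbox and its north, west, south, east ordering are assumptions about the API, so confirm them in the DigitalNZ documentation before relying on them. The coordinates for Australia are rough, whole-degree values chosen only for illustration, and bbox_param is a made-up helper name.

```python
def bbox_param(north, west, south, east):
    """Format a bounding box as 'north,west,south,east' (assumed order)."""
    if not -90 <= south <= north <= 90:
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be between -180 and 180")
    return f"{north},{west},{south},{east}"


# Very rough box around Australia, for illustration only.
AUSTRALIA = {"north": -10, "west": 113, "south": -44, "east": 154}

params = {
    "api_key": "YOUR_API_KEY",  # placeholder, not a working key
    "text": "",
    "geo_bbox": bbox_param(**AUSTRALIA),  # assumed parameter name
}
print(params["geo_bbox"])
```

A box is only an approximation of a country's shape, so results near the edges can belong to neighbouring countries or to the sea.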
Visualising open collections in DigitalNZ¶
The usage facet tells you what you can do with a record. A usage value of 'Use commercially' indicates that the record is 'open', according to the open licence definitions. So by harvesting data from the usage facet, we can explore how much of DigitalNZ is open. This notebook assembles data relating to the usage status of each primary_collection associated with a content_partner. It then attempts to visualise the data in a suitably colourful burst of fireworks!
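The 'how much is open' calculation can be sketched as below. The collection names and counts are invented for the example, and open_share is a hypothetical helper. The sketch divides by the collection's total record count rather than by the sum of usage counts, on the assumption that a single record may carry more than one usage value.

```python
OPEN_VALUE = "Use commercially"


def open_share(total_records, usage_counts):
    """Return the fraction of records whose usage includes OPEN_VALUE."""
    if total_records == 0:
        return 0.0
    return usage_counts.get(OPEN_VALUE, 0) / total_records


# Made-up rows: (content_partner, primary_collection, total, usage counts).
collections = [
    ("Example Library", "Example Photos", 200,
     {"Share": 120, "Use commercially": 50}),
    ("Example Museum", "Example Objects", 80,
     {"All rights reserved": 80}),
]

shares = {
    (partner, collection): open_share(total, usage)
    for partner, collection, total, usage in collections
}
for (partner, collection), share in shares.items():
    print(f"{partner} / {collection}: {share:.0%} open")
```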
Visualise a search in Papers Past¶
Start with some keywords you want to search for in Papers Past, then create a simple visualisation showing how the results are distributed over time and by newspaper.
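Before anything can be charted, the search results need to be grouped by year and newspaper. A minimal sketch of that step, assuming each article is a dict with hypothetical date (ISO format) and newspaper keys; the real field names in the API response may differ, and the sample articles are made up:

```python
from collections import Counter


def count_by_year_and_newspaper(articles):
    """Count articles for each (year, newspaper) pair."""
    counts = Counter()
    for article in articles:
        year = int(article["date"][:4])  # assumes dates start with YYYY
        counts[(year, article["newspaper"])] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"date": "1901-03-14", "newspaper": "Example Times"},
    {"date": "1901-07-02", "newspaper": "Example Times"},
    {"date": "1902-01-20", "newspaper": "Example Herald"},
]

counts = count_by_year_and_newspaper(sample)
for (year, newspaper), total in sorted(counts.items()):
    print(year, newspaper, total)
```

The resulting counts can be handed to whichever charting library you prefer.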
Harvest data from Papers Past¶
This notebook lets you harvest large amounts of data from Papers Past (via DigitalNZ) for further analysis. It saves the results as a CSV file that you can open in any spreadsheet program. It currently includes the OCRd text of all the newspaper articles.
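Saving harvested records as CSV might be sketched like this. The field names in FIELDS are hypothetical examples rather than a confirmed list of what the API returns, and the sample record is made up. List values are joined with a pipe character so that each record stays on one row.

```python
import csv
import io

# Hypothetical field names; adjust to match the records you harvest.
FIELDS = ["id", "title", "date", "fulltext"]


def write_records_csv(records, fileobj, fields=FIELDS):
    """Write one CSV row per record, keeping only the named fields."""
    writer = csv.DictWriter(fileobj, fieldnames=fields)
    writer.writeheader()
    for record in records:
        row = {}
        for field in fields:
            value = record.get(field, "")
            if isinstance(value, list):
                # Flatten lists so each record stays on a single row.
                value = "|".join(str(item) for item in value)
            row[field] = value
        writer.writerow(row)


# Made-up record, for illustration only.
records = [
    {"id": 1, "title": "Example article", "date": ["1900-01-01"],
     "fulltext": "Some OCRd text", "unused": "dropped"},
]

buffer = io.StringIO()
write_records_csv(records, buffer)
print(buffer.getvalue())
```

To write to disk instead of memory, pass a file opened with newline="" and an explicit encoding such as utf-8.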
Data harvested from facets¶
Harvested: 22 January 2021
The repository includes CSV-formatted versions of the data harvested by the 'Harvest facet data' notebook above. If you want to do something with this data, you might want to run a fresh harvest to make sure it's up to date. The files are saved here to give an overview of the available facets and the range of values in each.
Summary of facets:
usage facets (this data was assembled by the 'Visualising open collections' notebook):
Sherratt, Tim. (2019, November 17). GLAM-Workbench/digitalnz (Version v0.1.0). Zenodo. http://doi.org/10.5281/zenodo.3544729