Find and explore PowerPoint presentations from a specific domain

Works with IA

Title slide from presentation

This notebook helps you find, download, and explore all the presentation files captured from a particular domain, like defence.gov.au. It works through a series of processing steps: harvest capture data; remove duplicates from the capture data and download the files; convert the PowerPoint files to PDFs; extract screenshots and text from the PDFs; save the metadata, screenshots, and text into an SQLite database; and open the SQLite database in Datasette for exploration. Here's an example of the SQLite database created by harvesting PowerPoint files from the defence.gov.au domain, running in Datasette on Glitch.
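
To make the deduplication and database steps more concrete, here is a minimal sketch in Python. It is not the notebook's own code. The record fields (timestamp, original, digest), the sample values, the table name presentations, and the functions remove_duplicates and save_metadata are all assumptions made for illustration. The sketch treats captures with the same content digest as copies of the same file, keeps the earliest capture of each, and writes the result to an SQLite database using only the standard library.

```python
import sqlite3

# Hypothetical capture records, loosely modelled on rows from a web archive
# capture index. The field names and values are invented for illustration.
captures = [
    {"timestamp": "20120305000000", "original": "http://defence.gov.au/a.ppt", "digest": "AAA"},
    {"timestamp": "20100101000000", "original": "http://www.defence.gov.au/a.ppt", "digest": "AAA"},
    {"timestamp": "20110615000000", "original": "http://defence.gov.au/b.pptx", "digest": "BBB"},
]


def remove_duplicates(records):
    """Keep only the earliest capture for each unique content digest."""
    unique = {}
    # Timestamps are fixed-width digit strings, so string order is date order.
    for record in sorted(records, key=lambda r: r["timestamp"]):
        # setdefault only stores the first record seen for each digest.
        unique.setdefault(record["digest"], record)
    return list(unique.values())


def save_metadata(records, db_path=":memory:"):
    """Write capture metadata to an SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS presentations "
        "(digest TEXT PRIMARY KEY, timestamp TEXT, url TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO presentations (digest, timestamp, url) "
        "VALUES (:digest, :timestamp, :original)",
        records,
    )
    conn.commit()
    return conn


unique_captures = remove_duplicates(captures)
conn = save_metadata(unique_captures)
for row in conn.execute("SELECT digest, timestamp, url FROM presentations ORDER BY timestamp"):
    print(row)
```

Passing a file path instead of ":memory:" would produce a database file that could then be opened in Datasette. The download, PDF conversion, and screenshot steps are left out because they depend on external tools.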

Cite as

Sherratt, Tim & Jackson, Andrew. (2022). GLAM-Workbench/web-archives (version v1.1.0). Zenodo. https://doi.org/10.5281/zenodo.6450762

Section sponsor

The Web Archives section of the GLAM Workbench is sponsored by the British Library.