Find and explore PowerPoint presentations from a specific domain
Works with IA
This notebook helps you find, download, and explore all the presentation files captured from a particular domain, such as defence.gov.au. It works through a series of processing steps: harvest capture data; remove duplicates from the capture data and download the files; convert the PowerPoint files to PDFs; extract screenshots and text from the PDFs; save the metadata, screenshots, and text to an SQLite database; and open the SQLite database in Datasette for exploration.
Here's an example of the SQLite database created by harvesting PowerPoint files from the defence.gov.au domain, running in Datasette on Glitch.
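To give a feel for the first, second, and fifth steps (harvesting capture data, removing duplicates, and saving metadata to SQLite), here is a minimal sketch. It is not the notebook's own code. It assumes the Internet Archive's CDX API as the source of capture data, uses Python's built-in sqlite3 module rather than whatever database tooling the notebook relies on, and works on made-up sample records so that it runs without network access. The function names, table layout, and sample URLs are all hypothetical.

```python
import sqlite3
from urllib.parse import urlencode

# Internet Archive CDX endpoint (assumed data source for this sketch)
CDX_API = "http://web.archive.org/cdx/search/cdx"
# One common PowerPoint MIME type; a real harvest would likely check others too
PPT_MIMETYPE = "application/vnd.ms-powerpoint"


def build_cdx_query(domain, mimetype=PPT_MIMETYPE):
    """Build a CDX query URL for captures of one MIME type across a domain."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains of the domain
        "filter": f"mimetype:{mimetype}",
        "output": "json",
    }
    return f"{CDX_API}?{urlencode(params)}"


def remove_duplicates(captures):
    """Keep the earliest capture for each content digest."""
    unique = {}
    for capture in sorted(captures, key=lambda c: c["timestamp"]):
        # setdefault only stores the first (earliest) capture per digest
        unique.setdefault(capture["digest"], capture)
    return list(unique.values())


def save_captures(captures, db_path=":memory:"):
    """Save capture metadata to a hypothetical 'captures' table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS captures "
        "(digest TEXT PRIMARY KEY, timestamp TEXT, original TEXT, mimetype TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO captures "
        "VALUES (:digest, :timestamp, :original, :mimetype)",
        captures,
    )
    conn.commit()
    return conn


# Made-up records shaped like CDX results (not real captures)
sample_captures = [
    {"digest": "AAA", "timestamp": "20120305000000",
     "original": "http://example.gov.au/a.ppt", "mimetype": PPT_MIMETYPE},
    {"digest": "AAA", "timestamp": "20100101000000",
     "original": "http://example.gov.au/a.ppt", "mimetype": PPT_MIMETYPE},
    {"digest": "BBB", "timestamp": "20110607000000",
     "original": "http://example.gov.au/b.ppt", "mimetype": PPT_MIMETYPE},
]

query_url = build_cdx_query("defence.gov.au")
unique_captures = remove_duplicates(sample_captures)
conn = save_captures(unique_captures)
```

Deduplicating on the digest rather than the URL means the same file captured at several times, or from several addresses, is only downloaded and processed once. Passing a file path as `db_path` instead of `:memory:` would produce a database file that could then be opened in Datasette.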
Sherratt, Tim & Jackson, Andrew. (2022). GLAM-Workbench/web-archives (version v1.1.0). Zenodo. https://doi.org/10.5281/zenodo.6450762
The Web Archives section of the GLAM Workbench is sponsored by the British Library.