Find and explore PowerPoint presentations from a specific domain
Works with: Internet Archive (IA)
This notebook helps you find, download, and explore all the presentation files captured from a particular domain, like defence.gov.au. It includes a series of processing steps to: harvest capture data; remove duplicates from capture data and download files; convert PowerPoint files to PDFs; extract screenshots and text from the PDFs; save metadata, screenshots, and text into an SQLite database; and open the SQLite database in Datasette for exploration. Here's an example of the SQLite database created by harvesting PowerPoint files from the defence.gov.au domain, running in Datasette on Glitch.
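The first and last of these steps can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's own code: it assumes the capture data comes from the Internet Archive's CDX API (using its `url`, `matchType`, `filter`, and `output` parameters), that each capture is a dictionary with the CDX fields `timestamp`, `original`, and `digest`, and that duplicates are files sharing a `digest`. The function names and the `files` table are hypothetical, and the standard library's `sqlite3` stands in for whatever database tooling the notebook actually uses.

```python
import sqlite3
from urllib.parse import urlencode

# Internet Archive CDX API endpoint (assumed source of capture data)
CDX_API = "https://web.archive.org/cdx/search/cdx"

# Common MIME types for PowerPoint files (.ppt and .pptx)
PPT_MIMETYPES = [
    "application/vnd.ms-powerpoint",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
]


def build_cdx_url(domain, mimetype):
    """Build a CDX query url for captures of one MIME type across a domain."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains
        "filter": f"mimetype:{mimetype}",
        "output": "json",
    }
    return f"{CDX_API}?{urlencode(params)}"


def dedupe_captures(captures):
    """Keep only the earliest capture for each unique content digest."""
    unique = {}
    for capture in sorted(captures, key=lambda c: c["timestamp"]):
        unique.setdefault(capture["digest"], capture)
    return list(unique.values())


def download_url(capture):
    """Wayback url for the archived file; `id_` requests the original bytes."""
    return (
        f"https://web.archive.org/web/"
        f"{capture['timestamp']}id_/{capture['original']}"
    )


def save_captures(conn, captures):
    """Save capture metadata into a (hypothetical) `files` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(digest TEXT PRIMARY KEY, timestamp TEXT, original TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO files VALUES (:digest, :timestamp, :original)",
        captures,
    )
    conn.commit()


# Made-up sample captures: two share a digest, so one is a duplicate
sample = [
    {"timestamp": "20100101000000", "original": "http://example.gov.au/a.ppt", "digest": "AAA"},
    {"timestamp": "20090101000000", "original": "http://example.gov.au/a.ppt", "digest": "AAA"},
    {"timestamp": "20110101000000", "original": "http://example.gov.au/b.ppt", "digest": "BBB"},
]
unique = dedupe_captures(sample)
conn = sqlite3.connect(":memory:")
save_captures(conn, unique)
row_count = conn.execute("SELECT count(*) FROM files").fetchone()[0]
```

Saving to a file path rather than `:memory:` would produce a database file that can be opened with `datasette <file>.db` for browsing.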
Other options
Additional documentation
Getting help
Cite as
Sherratt, Tim; Jackson, Andrew & Bickford, Jake. (2023). GLAM-Workbench/web-archives (version v1.2.0). Zenodo. https://doi.org/10.5281/zenodo.7898218
Section sponsor
The Web Archives section of the GLAM Workbench is sponsored by the British Library.