This notebook helps you find, download, and explore all the presentation files captured from a particular domain. It works through a series of processing steps to:

- harvest capture data
- remove duplicates from the capture data and download the files
- convert PowerPoint files to PDFs
- extract screenshots and text from the PDFs
- save metadata, screenshots, and text into an SQLite database
- open the SQLite database in Datasette for exploration

Here's an example of the SQLite database created by harvesting PowerPoint files from the domain, running in Datasette on Glitch.
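The deduplication and database steps can be illustrated with a minimal sketch. It assumes each capture record is a dict using the field names returned by a Wayback CDX query (`original`, `timestamp`, `digest`, `mimetype`), and that two captures with the same `digest` hold the same file. The helper names `dedupe_captures` and `save_to_sqlite`, the `files` table, and the sample URLs are hypothetical illustrations, not the notebook's own code.

```python
import sqlite3


def dedupe_captures(captures):
    """Keep only the earliest capture of each distinct file (hypothetical helper).

    Assumes captures sharing a `digest` value have identical content.
    """
    seen = set()
    unique = []
    # Sort by timestamp so the first capture kept for each digest is the earliest one.
    for capture in sorted(captures, key=lambda c: c["timestamp"]):
        if capture["digest"] not in seen:
            seen.add(capture["digest"])
            unique.append(capture)
    return unique


def save_to_sqlite(captures, db_path=":memory:"):
    """Store capture metadata (plus any extracted text) in a hypothetical `files` table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
            digest TEXT PRIMARY KEY,
            url TEXT,
            timestamp TEXT,
            mimetype TEXT,
            text TEXT
        )"""
    )
    # Named placeholders are filled from each dict; records without text get an empty string.
    conn.executemany(
        "INSERT OR REPLACE INTO files VALUES (:digest, :original, :timestamp, :mimetype, :text)",
        [{**c, "text": c.get("text", "")} for c in captures],
    )
    conn.commit()
    return conn


# Hypothetical sample data: the same .ppt captured twice, plus one .pptx.
captures = [
    {"original": "http://example.com/a.ppt", "timestamp": "20100101000000",
     "digest": "AAA", "mimetype": "application/vnd.ms-powerpoint"},
    {"original": "http://example.com/a.ppt", "timestamp": "20120101000000",
     "digest": "AAA", "mimetype": "application/vnd.ms-powerpoint"},
    {"original": "http://example.com/b.pptx", "timestamp": "20110101000000",
     "digest": "BBB",
     "mimetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation"},
]

unique = dedupe_captures(captures)
conn = save_to_sqlite(unique)
print(conn.execute("SELECT url, timestamp FROM files ORDER BY timestamp").fetchall())
```

Keying on the content digest rather than the URL means a file that changed between captures is kept as a separate version, while repeated captures of an unchanged file are collapsed. Passing a file path such as `presentations.db` instead of `":memory:"` would produce a database that could then be opened with `datasette presentations.db`.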

