Harvesting collections of text from archived web pages

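Works with the Australian Web Archive (AWA), Internet Archive (IA), National Library of New Zealand (NLNZ), UK Web Archive (UKWA), and UK Government Web Archive (UKGWA)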

This notebook helps you assemble datasets of text extracted from all available captures of archived web pages. You can then feed these datasets to the text analysis tool of your choice to analyse changes over time.
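
The notebook's own code is not reproduced on this page, so the following is only a minimal sketch of the general idea, assuming the Internet Archive's public CDX API and the Wayback Machine's `id_` URL modifier for unmodified captures. The function names (`cdx_query_url`, `parse_cdx_rows`, `capture_url`, `html_to_text`, `harvest_texts`) are hypothetical, the text extraction is deliberately crude, and other archives would need different endpoints.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Internet Archive only; other web archives use other endpoints.
CDX_API = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url):
    """Build a CDX query listing successful, distinct captures of a page."""
    params = {
        "url": page_url,
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and errors
        "collapse": "digest",        # drop consecutive identical captures
    }
    return f"{CDX_API}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Turn CDX JSON (a header row followed by data rows) into dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def capture_url(capture):
    """URL of the archived page as captured, without Wayback rewriting."""
    return f"https://web.archive.org/web/{capture['timestamp']}id_/{capture['original']}"


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring script and style contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Very rough HTML-to-text conversion, one text fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def harvest_texts(page_url):
    """Yield (timestamp, text) for each capture. Requires network access."""
    with urlopen(cdx_query_url(page_url)) as response:
        captures = parse_cdx_rows(json.load(response))
    for capture in captures:
        with urlopen(capture_url(capture)) as response:
            html = response.read().decode("utf-8", errors="replace")
        yield capture["timestamp"], html_to_text(html)
```

Calling `harvest_texts("http://example.org/")` would yield one timestamped text per capture, which could then be written to files named by timestamp for use in a text analysis tool.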

Cite as

Sherratt, Tim; Jackson, Andrew & Bickford, Jake. (2023). GLAM-Workbench/web-archives (version v1.2.0). Zenodo. https://doi.org/10.5281/zenodo.7898218

Section sponsor

The Web Archives section of the GLAM Workbench is sponsored by the British Library.