Harvest of unique urls from the domain

Harvested in April 2022

This is a dataset of (mostly unique) urls in the domain, harvested using the Internet Archive's CDX API. It is saved as newline-delimited JSON, with one JSON object per line. Each JSON object contains the following fields:

| Field | Description |
|-------|-------------|
| urlkey | The domain of the url in SURT (Sort-friendly URI Reordering Transform) format |
| timestamp | Time and date when the url was captured |
| original | The archived url |
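
Because the file is large, it may be easier to read it one line at a time than to load it all at once. The following is a minimal sketch of that approach, not the method used to create the dataset. The helper names `read_captures` and `parse_timestamp` are hypothetical, the sample records are made up for illustration, and the timestamp is assumed to use the 14-digit `YYYYMMDDhhmmss` form typical of CDX output.

```python
import io
import json
from datetime import datetime
from typing import Iterator, TextIO


def read_captures(lines: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def parse_timestamp(value: str) -> datetime:
    """Convert a capture timestamp to a datetime.

    Assumption: the value is a 14-digit YYYYMMDDhhmmss string.
    """
    return datetime.strptime(value, "%Y%m%d%H%M%S")


# Made-up sample records standing in for the real file.
sample = io.StringIO(
    '{"urlkey": "com,example)/", "timestamp": "20220401123000", '
    '"original": "http://example.com/"}\n'
    '{"urlkey": "com,example)/about", "timestamp": "20210615080910", '
    '"original": "http://example.com/about"}\n'
)

records = list(read_captures(sample))
years = [parse_timestamp(r["timestamp"]).year for r in records]
```

To work with the downloaded file, pass an open file handle to `read_captures` in place of `sample` and process the records inside a loop, rather than collecting them into a list, so that memory use stays constant.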

Download from CloudStor (75.7 GB)

Additional documentation

Getting help

Cite as

Sherratt, Tim; Jackson, Andrew & Bickford, Jake. (2023). GLAM-Workbench/web-archives (version v1.2.0). Zenodo.

Section sponsor

The Web Archives section of the GLAM Workbench is sponsored by the British Library.