Exploring subdomains in the whole of gov.au
Works with the Internet Archive (IA)
Most of the notebooks in this repository work with small slices of web archive data. In this notebook we'll scale things up to try to find every subdomain that has existed within the gov.au domain. As in other notebooks, we'll get the data by querying the Internet Archive's CDX API. The only real difference is that harvesting all the data will take some hours. Once we have the data we'll do some analysis and visualise the domain hierarchy as a dendrogram.
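As a rough illustration of the kind of CDX query involved, here is a minimal sketch using only the Python standard library. The function names (`build_cdx_url`, `fetch_page`) and the particular parameter choices (`matchType=domain`, `collapse=urlkey`, the `fl` field list, paging with `page`) are assumptions made for this example, not necessarily the settings the notebook itself uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CDX endpoint of the Internet Archive's Wayback Machine.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain="gov.au", page=0):
    """Build a CDX query url for one page of results (hypothetical helper)."""
    params = {
        "url": domain,
        "matchType": "domain",    # include the domain and all of its subdomains
        "collapse": "urlkey",     # one row per unique url rather than per capture
        "fl": "urlkey,original",  # only return the fields we need
        "output": "json",
        "page": page,             # results are paged; a full harvest loops over pages
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def fetch_page(domain="gov.au", page=0):
    """Fetch one page of results; this makes a live network request."""
    with urlopen(build_cdx_url(domain, page)) as response:
        rows = json.load(response)
    # With output=json the first row holds the field names, so drop it.
    return rows[1:] if rows else []
```

Calling `fetch_page()` repeatedly with increasing `page` values would approximate a full harvest. For a domain the size of gov.au that means a very large number of requests, which is why the harvest takes hours.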
- Harvest of unique urls from the gov.au domain
- Unique subdomains of gov.au split into components
- Unique subdomains of gov.au in SURT format
- Circular dendrograms of gov.au subdomains
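The subdomain outputs listed above rest on two simple transformations: reversing a hostname's components into SURT-style order, and nesting those components into a hierarchy that a dendrogram can be drawn from. The sketch below shows one way to do both. The helper names (`host_to_surt`, `build_tree`) and the sample hostnames are illustrative assumptions, and only the domain portion of the SURT form is produced.

```python
def host_to_surt(host):
    """Reverse a hostname into comma-separated, SURT-style domain order."""
    return ",".join(reversed(host.lower().split(".")))


def build_tree(hosts):
    """Nest hostnames into a dict keyed by component, top-level domain first."""
    tree = {}
    for host in hosts:
        node = tree
        # Walk from the top-level domain down to the leftmost label.
        for part in reversed(host.lower().split(".")):
            node = node.setdefault(part, {})
    return tree


# Illustrative hostnames only, not harvested data.
sample_hosts = ["naa.gov.au", "www.naa.gov.au", "nla.gov.au"]
surts = sorted(host_to_surt(h) for h in sample_hosts)
tree = build_tree(sample_hosts)
```

Sorting the SURT strings groups related subdomains together, since hosts that share a parent also share a prefix. The nested dictionary mirrors the parent-child structure that a dendrogram displays.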
Sherratt, Tim & Jackson, Andrew. (2022). GLAM-Workbench/web-archives (version v1.1.0). Zenodo. https://doi.org/10.5281/zenodo.6450762
The Web Archives section of the GLAM Workbench is sponsored by the British Library.