Convert an HTML finding aid to JSON
While I think the finding aids are created and stored as EAD-encoded XML files, they are delivered as HTML. To reassemble the finding aid hierarchy in a form that supports analysis, we therefore have to scrape the HTML and make a few assumptions about the content.
This notebook scrapes data from the HTML of a finding aid, saving the hierarchy of series, sub-series, and items as a list of nested objects. The results can be saved as a JSON file.
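To give a rough sense of the approach, here is a minimal sketch that turns a fragment of HTML into a list of nested objects and serialises it as JSON. It is not the notebook's own code. The sample markup, the `FindingAidParser` class, the `parse_finding_aid` function, and the assumption that headings (`h2` to `h4`) mark series and sub-series while `li` elements mark items are all hypothetical; a real finding aid may be structured quite differently.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup: real finding aid HTML may use different tags and classes.
SAMPLE_HTML = """
<h2>Series 1. Correspondence</h2>
<h3>Subseries 1.1. Letters received</h3>
<ul><li>Item 1. Letter, 1901</li><li>Item 2. Letter, 1902</li></ul>
<h2>Series 2. Diaries</h2>
<ul><li>Item 3. Diary, 1910</li></ul>
"""


class FindingAidParser(HTMLParser):
    """Build a nested hierarchy, assuming heading depth reflects series depth."""

    HEADINGS = {"h2": 2, "h3": 3, "h4": 4}

    def __init__(self):
        super().__init__()
        self.root = []           # top-level series
        self.stack = []          # (heading level, node) pairs currently open
        self.current_tag = None  # tag whose text is being collected
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "li":
            self.current_tag = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current_tag:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current_tag:
            return
        # Collapse runs of whitespace in the collected text.
        text = " ".join("".join(self.buffer).split())
        self.current_tag = None
        if tag == "li":
            # Attach the item to the most recently opened series or sub-series.
            if self.stack:
                self.stack[-1][1]["items"].append(text)
            return
        level = self.HEADINGS[tag]
        node = {"title": text, "items": [], "children": []}
        # Close any open nodes at the same or a deeper level.
        while self.stack and self.stack[-1][0] >= level:
            self.stack.pop()
        if self.stack:
            self.stack[-1][1]["children"].append(node)
        else:
            self.root.append(node)
        self.stack.append((level, node))


def parse_finding_aid(html):
    """Return the hierarchy as a list of nested dictionaries."""
    parser = FindingAidParser()
    parser.feed(html)
    parser.close()
    return parser.root


hierarchy = parse_finding_aid(SAMPLE_HTML)
json_text = json.dumps(hierarchy, indent=2)
print(json_text)
```

The same `json_text` string could be written to a file to keep the results. Using a stack of open headings keeps the sketch short, but it relies entirely on the assumed heading structure.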
Sherratt, Tim. (2023). GLAM-Workbench/trove-unpublished (version v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7690276