Create a database to search across each line of text in a series of volumes
The code in this notebook was used to create the NSW Post Office Directories search interface, which lets you search across 54 volumes from 1886 to 1950. The same code, with minor modifications, could be used to index any series of publications where it would be useful to search by line (rather than by Trove's default 'article'), such as lists, directories and gazetteers, turning them into searchable databases.
The notebook assembles information about each volume in a series. It downloads OCRd text for each volume page by page, and matches it with a page identifier so that you can construct links back to Trove. The text and metadata are loaded into a SQLite database and the text is indexed. This database can be viewed using Datasette.
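The load-and-index step can be sketched as follows. This is a minimal illustration, not the notebook's own code: it uses only Python's built-in `sqlite3` module and assumes your SQLite build includes the FTS5 extension. The table names (`lines`, `lines_fts`), the helper functions `build_db` and `search`, and the sample volumes, page identifiers and text are all hypothetical placeholders.

```python
import sqlite3

# Hypothetical sample data: (volume title, year, page identifier, OCR text of the page).
# The page identifiers are made-up placeholders, not real Trove ids.
pages = [
    ("Example Directory", 1886, "nla.obj-0000000001",
     "Smith John, baker\nJones Mary, draper"),
    ("Example Directory", 1887, "nla.obj-0000000002",
     "Brown Ann, grocer\n\nSmith Peter, farrier"),
]


def build_db(pages, path=":memory:"):
    """Split each page into lines, store them with metadata, and index the text."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE lines ("
        "id INTEGER PRIMARY KEY, volume TEXT, year INTEGER, "
        "page_id TEXT, line_num INTEGER, text TEXT)"
    )
    for volume, year, page_id, ocr in pages:
        # One row per non-blank line; line_num records its position on the page.
        rows = [
            (volume, year, page_id, num, line.strip())
            for num, line in enumerate(ocr.splitlines(), start=1)
            if line.strip()
        ]
        db.executemany(
            "INSERT INTO lines (volume, year, page_id, line_num, text) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    # Full-text index over the text column (assumes FTS5 is available).
    db.execute(
        "CREATE VIRTUAL TABLE lines_fts "
        "USING fts5(text, content='lines', content_rowid='id')"
    )
    db.execute("INSERT INTO lines_fts (rowid, text) SELECT id, text FROM lines")
    db.commit()
    return db


def search(db, query):
    """Return (year, page_id, text) for every line matching the query."""
    return db.execute(
        "SELECT lines.year, lines.page_id, lines.text "
        "FROM lines_fts JOIN lines ON lines.id = lines_fts.rowid "
        "WHERE lines_fts MATCH ? ORDER BY lines.year",
        (query,),
    ).fetchall()


db = build_db(pages)
results = search(db, "smith")
for year, page_id, text in results:
    print(year, page_id, text)
```

Because each row keeps its page identifier, a search result can be turned into a link back to the page in Trove. Passing a file path instead of `":memory:"` writes the database to disk, where a tool such as Datasette can open it.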
Sherratt, Tim. (2022). GLAM-Workbench/trove-journals (version v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7039919