Getting started with the PROV API¶

The Public Record Office Victoria's Public API provides data about its archival holdings in a machine-readable format. This makes it possible to use, analyse, and visualise the collection in new ways. You can read more about the API in this blog post.

For an overview of the data currently available through the PROV API, see the PROV Data Dashboard.

This notebook attempts to document the basic functionality of the API. You might also find it useful to consult Solr's query guide.

  • A simple API request
  • Using an 'empty' query to get everything
  • The different types of entities in API results
  • Identifiers and links
  • Search facets
  • Controlling the way results are delivered
  • Retrieving a random result
  • Harvest a complete set of results
  • Constructing queries
    • Boolean operators
    • Text searches
    • Filter results by using fields
    • Filter by date
    • Find digitised records
    • Find records about people
    • Find an individual record
    • Find related entities
    • Filter by related entities

New to Jupyter notebooks? See Using Jupyter notebooks in the GLAM Workbench for an introduction.

In [1]:
import random

import pandas as pd
import requests
from IPython.display import Image, display

A simple API request¶

Making an API request is easy. The base url for API requests is https://api.prov.vic.gov.au/search/query. No authentication is required, and the only mandatory parameter is q which you use to pass along your search query. Here's an example of a simple search for records containing the word 'ostrich'.

In [2]:
api_url = "https://api.prov.vic.gov.au/search/query"

params = {
    "q": "ostrich",
}
response = requests.get(api_url, params=params)
print(f"API request url: {response.url}")
data = response.json()
API request url: https://api.prov.vic.gov.au/search/query?q=ostrich

The number of matching search results is contained in the response -> numFound field.

In [3]:
print(f"There are {data['response']['numFound']:,} results.")
There are 10 results.

The full search results can be found in response -> docs. By looping through all the results you can display their titles.

In [4]:
for result in data["response"]["docs"]:
    print(result["title"])
D501 Head close-up of ostrich
D502 Head close-up of ostrich
110
1004/312 Marinda McOstrich Jaffray: Will; Grant of probate
1004/312 Marinda McOstrich Jaffray: Grant of probate
M OSTRICH
553055
M OSTRICH
MATT OSTRICHE
V332 [Daryl Somers and Ozzie Ostrich on the children's show 'Hey, Hey' It's Saturday']

Here's the first result in full.

In [5]:
data["response"]["docs"][0]
Out[5]:
{'category': 'Item',
 'entity': 'Record',
 '_id': 'B7BE47C9-5613-11EB-BE8C-6757FF78D049',
 'timestamp': 1622643613,
 'identifier.PROV_ACM.id': 'VPRS 14517/P0001/5/155',
 'identifier.PID.id': 'B7BE47C9-5613-11EB-BE8C-6757FF78D049',
 'title': 'D501 Head close-up of ostrich',
 'consignment_id': 'P0001',
 'start_dt': '1753-01-01T00:00:00Z',
 'end_dt': '3000-12-31T00:00:00Z',
 'date_range': ['[1753 TO 3000]'],
 'date_range.not_described': ['[1753 TO 3000]'],
 'description.subject': ['OSTRICHES'],
 'description.aggregate': 'Subject : OSTRICHES',
 'presentation_text': 'Subject : OSTRICHES',
 'jurisdictional_coverage': ['Victoria'],
 'rights_statement': ['Open Public Records Act 1973'],
 'rights_status': ['Open'],
 'item_discrete': 'No',
 'format': 'Physical',
 'medium': ['Polyester Negative'],
 'location': ['North Melbourne', 'Online'],
 'access_restriction': 'No',
 'status': 'Published',
 'citation': 'VPRS 14517/P0001/5, D501',
 'citation_sort': '14517P00010000130000001550',
 'is_part_of_series.id': ['VPRS14517'],
 'is_part_of_series.title': ['Negatives of Photographs [Publications Branch]'],
 'series_id': '14517',
 'parents.ids': ['VPRS14517', 'D6BD4E47-F7E6-11E9-AE98-87DB74D2147D'],
 'parents.titles': ['Negatives of Photographs [Publications Branch]',
  '[Not Set]'],
 'agencies.titles': ['Education Department'],
 'agencies.ids': ['VA714'],
 'agencies.date_ranges': ['[1873 TO 1985]'],
 'resp_agency_title': ['Department of Education'],
 'resp_agency_title_facet': ['Department of Education'],
 'resp_agency_id': ['3098'],
 'is_part_of_item.PID': ['D6BD4E47-F7E6-11E9-AE98-87DB74D2147D'],
 'is_part_of_item.title': ['D332-D667'],
 'iiif-manifest': 'https://images.prov.vic.gov.au/manifests/B7/BE/47/C9/-5613-11EB-BE8C-6757FF78D049/images/manifest.json',
 'iiif-thumbnail': 'https://images.prov.vic.gov.au/loris/B7%2FBE%2F47%2FC9%2F-5613-11EB-BE8C-6757FF78D049%2Fimages%2F1%2Ffiles%2F14517-00013-D0501.tif/full/!200,200/0/default.jpg',
 'control_symbol_labels': ['Registration Number'],
 'control_symbol_values': ['D501'],
 'record_form': ['Photograph or Image'],
 'box_number_sort': [13],
 '_version_': 1816058689549762560}

While no authentication is required to make a request, there is a limit on the number of requests you can make within a given time period. You can see the details in the response headers.
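
Here's a minimal sketch of one way to pick the rate-limit details out of the response headers. The header names aren't documented in this notebook, so the hypothetical helper `rate_limit_headers` simply assumes they contain 'ratelimit' or are named 'Retry-After'. That's a common convention, but it may not match what the PROV API actually sends. The sample headers are made up for illustration.

```python
def rate_limit_headers(headers):
    """Return only the headers that look like rate-limit information.

    Assumption: the relevant header names contain 'ratelimit'
    (e.g. 'X-RateLimit-Remaining') or are 'Retry-After'.
    """
    selected = {}
    for name, value in headers.items():
        key = name.lower().replace("-", "")
        if "ratelimit" in key or key == "retryafter":
            selected[name] = value
    return selected


# Made-up headers for illustration only
sample_headers = {
    "Content-Type": "application/json",
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "99",
}
print(rate_limit_headers(sample_headers))
```

With a real response you'd call `rate_limit_headers(response.headers)`. If it returns an empty dictionary, print all of `response.headers` and look for the relevant names yourself.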

Using an 'empty' query to get everything¶

If you want information on all the records available through the PROV API you can set q to either * or *:* to get everything. For example, you can find out the total number of records available from the PROV API.

In [7]:
api_url = "https://api.prov.vic.gov.au/search/query"

params = {"q": "*"}
response = requests.get(api_url, params=params)
print(f"API request url: {response.url}")
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
API request url: https://api.prov.vic.gov.au/search/query?q=%2A
There are 10,147,135 results.

The different types of entities in API results¶

API search results are not all the same. They include a mix of functions, agencies, series, consignments, items, images, and relationships between entities. You can find out more about the different entities that PROV uses to describe its collection in the PROV Archival Control Model.

You can identify what type of entity each record relates to by looking at the category field. For example, the category field in the record above has the value 'Item'.

The category field has one of the following values:

  • Agency
  • Consignment
  • Function
  • Image
  • Item
  • relatedEntity
  • Series

Using the category field you can filter your search to include only certain types of records.

While some fields – like _id, category, and title – are present in all API results, other fields vary according to the type of entity.
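
To see how the fields vary, you could group a list of results by category and compare the field names. The sketch below uses two trimmed-down, made-up records rather than real API results, and the helper names `fields_by_category` and `common_fields` are hypothetical.

```python
from collections import defaultdict


def fields_by_category(docs):
    """Map each category to the set of field names seen in its records."""
    fields = defaultdict(set)
    for doc in docs:
        fields[doc.get("category", "unknown")].update(doc.keys())
    return dict(fields)


def common_fields(docs):
    """Return the field names present in every record."""
    if not docs:
        return set()
    return set.intersection(*(set(doc) for doc in docs))


# Trimmed-down sample records for illustration only
sample_docs = [
    {"_id": "a", "category": "Item", "title": "An item", "is_part_of_series.id": ["VPRS13"]},
    {"_id": "b", "category": "Series", "title": "A series", "how_to_use": ["..."]},
]
print(sorted(common_fields(sample_docs)))
```

In practice you'd pass in `data["response"]["docs"]` from a search that returns a mix of categories.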

Identifiers and links¶

Each API record has a hash identifier in the _id field. These can be used to retrieve a specific API record. But the different entities in PROV's archival control model have other identifiers that are important to understand and recognise as they're used to create relationships between entities.

Functions, agencies, and series¶

Identifiers for functions, agencies, and series have a standard format defined by PROV's Archival Control Model. These identifiers can be found in the identifier.PROV_ACM.id field of an API result.

entity | id format | example | field
-------|-----------|---------|------
function | VF [number] | VF 375 | identifier.PROV_ACM.id
agency | VA [number] | VA 3744 | identifier.PROV_ACM.id
series | VPRS [number] | VPRS 851 | identifier.PROV_ACM.id and series_id (number only)

You can use these identifiers to construct urls that point to the entity's page on the PROV website. The format is https://prov.vic.gov.au/archive/[identifier]. This works with or without a space between the letter prefix and the number, but it's probably safer to remove the space, for example:

https://prov.vic.gov.au/archive/VPRS851

These identifiers are used to create relationships between entities. However, depending on the relationship and the field, the form of the identifier might change – sometimes there's a space between the letter prefix and the number, sometimes there's not, and sometimes it's the numeric value only. For example, in the item record displayed above, the series_id field contains only the numeric part of the identifier of the series that contains the item, while the full identifier (without a space) is in the is_part_of_series.id field. These differences can be important if you're using these fields to retrieve information about related entities.
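
Because the same identifier can appear in three forms, it can help to normalise it once and derive the others. This is a minimal sketch; `identifier_forms` is a hypothetical helper, and it only handles the VF, VA, and VPRS prefixes described above.

```python
import re


def identifier_forms(identifier):
    """Return the spaced, compact, and numeric forms of a PROV ACM identifier.

    Accepts identifiers such as 'VPRS 851' or 'VPRS851'.
    """
    match = re.fullmatch(r"\s*(VF|VA|VPRS)\s*(\d+)\s*", identifier, re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a function, agency, or series identifier: {identifier!r}")
    prefix, number = match.group(1).upper(), match.group(2)
    compact = f"{prefix}{number}"
    return {
        "spaced": f"{prefix} {number}",  # form used in identifier.PROV_ACM.id
        "compact": compact,  # form used in is_part_of_series.id
        "numeric": number,  # form used in series_id
        "web_url": f"https://prov.vic.gov.au/archive/{compact}",
    }


print(identifier_forms("VPRS 851"))
```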

Items¶

Items can be identified in a number of ways. As well as the hash identifier in the _id field, there's a barcode number, and control_symbol_values which can be combined with the series number to create a reference to the item. It seems that the hash identifier is used to create relationships between items and other entities. For example, images are linked back to items by including the hash identifier in the is_part_of_item.PID field.

The hash identifiers can also be used to construct urls that point to an item's page on the PROV website. The format is https://prov.vic.gov.au/archive/[hash identifier]. For example, the item record above has an _id value of B7BE47C9-5613-11EB-BE8C-6757FF78D049, so the website url is:

https://prov.vic.gov.au/archive/B7BE47C9-5613-11EB-BE8C-6757FF78D049

Images¶

Images add metadata to pages within digitised items. They have an _id value that you can use to retrieve an API record, but there's no way to build a url that will display a web page containing the image metadata. The best you can do is construct a link to the digitised page. To do this you need the id of the parent item from the image's is_part_of_item.PID field and the canvas _number of the page. Then you can construct a url with the format: https://prov.vic.gov.au/archive/[parent item id]?image=[canvas number + 1]. For example:

https://prov.vic.gov.au/archive/29AF80BD-F7F0-11E9-AE98-6FA01622F08F?image=304
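
The two url patterns for items and images can be sketched as a pair of small functions. The names `item_url` and `image_page_url` are hypothetical, and the code assumes the canvas number is a zero-based integer, which is why 1 is added to it.

```python
ARCHIVE_URL = "https://prov.vic.gov.au/archive"


def item_url(item_id):
    """Build the web page url for an item from its hash identifier."""
    return f"{ARCHIVE_URL}/{item_id}"


def image_page_url(parent_item_id, canvas_number):
    """Build a link to a digitised page within its parent item.

    Assumption: canvas_number is zero-based, so the website's image
    parameter is canvas_number + 1.
    """
    return f"{item_url(parent_item_id)}?image={int(canvas_number) + 1}"


print(image_page_url("29AF80BD-F7F0-11E9-AE98-6FA01622F08F", 303))
```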

Search facets¶

Search facets tell you the number of results per value in a given field. To include facets in your API results you need to set the facet parameter to true and use facet.field to specify the name of the field you're interested in.

For example, if you wanted to know the number of results in each of the different categories described above:

  • set q to * to return everything
  • set facet to true
  • set facet.field to category

The facet counts are included in the API results at facet_counts -> facet_fields -> [name of the specified field].

In [8]:
params = {
    "q": "*",  # an empty query to get everything
    "facet": "true",
    "facet.field": "category",
}
response = requests.get("https://api.prov.vic.gov.au/search/query", params=params)
data = response.json()
values = data["facet_counts"]["facet_fields"]["category"]
print(values)
['Item', 6337827, 'Image', 3613751, 'relatedEntity', 151117, 'Consignment', 23771, 'Series', 17095, 'Agency', 3252, 'Function', 322]

The facet counts are just an array of paired value names and result counts. To convert the list into something more structured you can do this.

In [9]:
facets = [
    {"category": values[i], "count": values[i + 1]}
    for i in range(0, len(values), 2)
    if values[i + 1] > 0
]
pd.DataFrame(facets)
Out[9]:
category count
0 Item 6337827
1 Image 3613751
2 relatedEntity 151117
3 Consignment 23771
4 Series 17095
5 Agency 3252
6 Function 322

Controlling the way results are delivered¶

There are a number of parameters you can add to your API request to change the way the results are delivered.

  • wt: the encoding of the results – either JSON (the default) or XML
  • rows: the number of results to include (default is 10)
  • start: the result number to start from (default is 0)
  • sort: order in which to sort the results
  • fl: specify the fields you want to include in the results

For example, to increase the number of results returned by your API request set the rows parameter to 100.

In [41]:
params = {"q": "rabbits", "rows": 100}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(
    f"This request delivers {len(data['response']['docs'])} of {data['response']['numFound']:,} results."
)
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=100
This request delivers 100 of 365 results.

By default, search results are sorted by their relevance score. You can change this using the sort parameter. You need to supply both the name of a field to sort on and a sort order – either ascending or descending. For example, to sort by title from A to Z, you'd set sort to title asc.

In [42]:
params = {"q": "rabbits", "rows": 1, "sort": "title asc"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(data["response"]["docs"][0]["title"])
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=1&sort=title+asc
'55/37/5

Changing asc to desc will reverse the order of the results.

In [43]:
params = {"q": "rabbits", "rows": 1, "sort": "title desc"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(data["response"]["docs"][0]["title"])
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=1&sort=title+desc
Wire Netting Advances Files [SAMPLE ONLY RETAINED]

If you only want specific fields, you can supply a list of required field names using the fl parameter.

In [44]:
params = {"q": "rabbits", "rows": 1, "fl": "_id,title"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(data["response"]["docs"][0])
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=1&fl=_id%2Ctitle
{'_id': '3589932B-F1AF-11E9-AE98-C70783C3C724', 'title': "Rabbit Inspector's Reports"}
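
If you find yourself combining these parameters often, you could wrap them in a small function. This is only a sketch: `build_params` is a hypothetical helper, and it just assembles the dictionary you'd pass to `requests.get()`. The standard library's `urlencode` is used to preview the url without sending a request.

```python
from urllib.parse import urlencode

API_URL = "https://api.prov.vic.gov.au/search/query"


def build_params(q, rows=None, start=None, sort=None, fl=None):
    """Assemble request parameters, leaving out any that are not set."""
    params = {"q": q}
    if rows is not None:
        params["rows"] = rows
    if start is not None:
        params["start"] = start
    if sort is not None:
        params["sort"] = sort
    if fl:
        # fl takes a comma-separated list of field names
        params["fl"] = ",".join(fl)
    return params


params = build_params("rabbits", rows=1, fl=["_id", "title"])
# Preview the url without sending a request
print(f"{API_URL}?{urlencode(params)}")
```

You can then make the request as before with `requests.get(API_URL, params=params)`.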

Retrieving a random result¶

You can use the start and rows parameters to retrieve a random result:

  • first run a query and find the total number of results
  • select a random number within the range provided by the total number of results
  • set start to the random number and rows to 1

The cell below displays a randomly-selected photograph. It's similar to the code used by PROVBot to share PROV photos through the Fediverse.

In [14]:
params = {
    "q": 'iiif-manifest:[* TO *] AND record_form:"Photograph or Image"',
    "rows": 1,
}

# Get total number of results
response = requests.get("https://api.prov.vic.gov.au/search/select", params=params)
data = response.json()
total_results = data["response"]["numFound"]

# Set a random start point within the range of total results
params["start"] = random.randrange(0, total_results)

# Retrieve random result
response = requests.get("https://api.prov.vic.gov.au/search/select", params=params)
data = response.json()
item = data["response"]["docs"][0]

print(item["title"])
display(Image(url=item["iiif-thumbnail"]))
D642 Coarse cloddy soil

Harvest a complete set of results¶

You can loop through a complete set of results by updating the start value after each request. You'll know you've reached the end of the result set when the docs list is empty.

The code below just saves the harvested results into a list named harvested_results. Alternatively, you could write the harvested results to a file.

In [15]:
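# To limit the harvest to a category, add a filter such as
# category:Item to the q value, as described under 'Filter results by using fields'
params = {
    "q": "rabbits",
    "rows": 100,
}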

start = 0
harvested_results = []

# Continue in this loop while there are results to harvest
while True:
    # Update the start parameter
    params["start"] = start
    response = requests.get(api_url, params=params)
    data = response.json()
    results = data["response"]["docs"]
    # Add the results from this request to the harvested results
    harvested_results += results
    # Get the number of results returned by the current request
    num_docs = len(results)
    # Add the number of results from this request to the start value
    start += num_docs
    # There are no more results, so stop the harvest
    if num_docs == 0:
        break

print(f"Harvested {len(harvested_results)} results.")
Harvested 365 results.
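
If you'd rather write the harvested results to a file, one option is newline-delimited JSON, with one record per line. The sketch below is an assumption about how you might do it, not part of the API. The helper names `save_ndjson` and `load_ndjson` are hypothetical, and the sample records are made up.

```python
import json
import tempfile
from pathlib import Path


def save_ndjson(results, path):
    """Write each result as one JSON object per line and return the count."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as ndjson_file:
        for result in results:
            ndjson_file.write(json.dumps(result, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_ndjson(path):
    """Read a newline-delimited JSON file back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as ndjson_file:
        return [json.loads(line) for line in ndjson_file if line.strip()]


# Round-trip a small made-up sample in a temporary directory
sample_results = [{"_id": "a", "title": "First"}, {"_id": "b", "title": "Second"}]
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.ndjson"
    saved = save_ndjson(sample_results, sample_path)
    reloaded = load_ndjson(sample_path)
print(f"Saved {saved} results.")
```

To save a real harvest you'd call something like `save_ndjson(harvested_results, "rabbits.ndjson")`.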

Constructing queries¶

Boolean operators¶

You can use standard Boolean operators, such as AND, OR, and NOT, to combine query terms. You can also use brackets to group parts of complex queries. See the Solr documentation for more.
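
When queries get complicated, it can be easier to build them from parts than to type the brackets by hand. This is a minimal sketch using two hypothetical helpers, `all_of` and `any_of`. They only assemble the query string; they don't check that it's valid.

```python
def all_of(*clauses):
    """Join clauses with AND and wrap the group in brackets."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Join clauses with OR and wrap the group in brackets."""
    return "(" + " OR ".join(clauses) + ")"


query = all_of("rabbits", any_of("category:Item", "category:Series"))
print(query)
```

The resulting string can be used as the q value in a request.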

Text searches¶

To search for words or phrases across multiple fields, just add them to the q parameter. If you include multiple keywords, they'll be treated as if they were connected by an OR operator. So a q value of murray river is the same as murray OR river.

In [16]:
params = {
    "q": "murray river",
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=murray+river
There are 64,746 results.
In [17]:
params = {
    "q": "murray OR river",
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=murray+OR+river
There are 64,746 results.

If you want only records containing both keywords, use the AND operator.

In [18]:
params = {
    "q": "murray AND river",
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=murray+AND+river
There are 2,584 results.

To treat the keywords as a phrase, enclose them in quotes, eg "murray river".

In [19]:
params = {
    "q": '"murray river"',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=%22murray+river%22
There are 1,746 results.

Text fields are stemmed, so you generally don't need to worry about plurals and other word forms. For example, a search for box will be the same as a search for boxes, mine will match mining, and engine will match engineer. There doesn't seem to be any way to search for an exact string, so there'll always be a bit of fuzziness.

You can also use wildcards, fuzzy matches, and proximity searches. See the Solr documentation for more information. For example, a search for "gold mining" will return records that include the phrase "gold mining" (or "gold mine" because of word stemming). A search for "gold mining"~10 will find records where gold and mining (or mine) occur within 10 words of each other.

In [20]:
params = {
    "q": '"gold mining"',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=%22gold+mining%22
There are 3,910 results.
In [21]:
params = {
    "q": '"gold mining"~10',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=%22gold+mining%22~10
There are 4,563 results.
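
Phrase and proximity searches both depend on getting the quotes right, so a small helper can save some fiddling. This is a sketch; `phrase` is a hypothetical name, and escaping embedded quotes with a backslash follows the usual Solr convention.

```python
def phrase(words, within=None):
    """Quote words as a phrase, optionally as a proximity search."""
    # Escape backslashes and double quotes so they can't end the phrase early
    escaped = words.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    if within is not None:
        quoted += f"~{int(within)}"
    return quoted


print(phrase("gold mining"))
print(phrase("gold mining", within=10))
```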

Filter results by using fields¶

You can filter search results by specifying the value of a particular field. To do this you add [field]:[value] to the q query string. For example, if you only want items in your results, you can ask for records where the category field includes the value Item. You can also combine multiple field values using the AND and OR operators:

  • to include only items: category:Item
  • to include either items or images: category:Item OR category:Image

Let's ask for items only.

In [22]:
params = {"q": "category:Item"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")

for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=category%3AItem

There are 6,337,827 results.

211/374 Leslie A Lamb: Will; Grant of probate
215/936 Ellen Cahill: Will; Grant of probate
215/981 Florence M Lovegrove: Will; Grant of probate
211/107 Amelia Hawking: Will; Grant of probate
215/980 William F Finchett: Will; Grant of probate
215/979 George Wilson: Will; Grant of probate
211/102 Bernard F Cragen: Will; Grant of probate
211/221 Jonathan Coulson: Will; Grant of probate
215/978 William E S Ockenden: Will; Grant of probate
215/959 Otto Holst: Will; Grant of probate

And now compare the number of results to a request for items or images.

In [23]:
params = {"q": "category:Item OR category:Image"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AItem+OR+category%3AImage

There are 9,951,578 results.

Here are some commonly-used fields, with their range of possible values, that can be used to filter your results:

  • category – possible values:
    • Agency
    • Consignment
    • Function
    • Image
    • Item
    • Series
    • relatedEntity
  • format – possible values:
    • Digital
    • Physical
  • record_form – possible values:
    • Card
    • Data
    • Document
    • File
    • Map, Plan, or Drawing
    • Moving Image
    • Object
    • Photograph or Image
    • Sound Recording
    • Volume
    • Website
  • location – possible values:
    • Ballarat
    • Beechworth
    • Bendigo
    • Geelong
    • North Melbourne
    • Online
  • rights_status – possible values:
    • Closed
    • Closed Record and Open Metadata
    • Not set
    • Open

For example, you might be interested in volumes held in Ballarat:

In [24]:
params = {"q": "record_form:Volume AND location:Ballarat"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")

for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=record_form%3AVolume+AND+location%3ABallarat

There are 6,976 results.

1966 - 1873
Book 46, 16.04.1920 - 17.11.1920,
Book 47, 17.11.1920 - 29.07.1921,
Book 48, 01.08.1921 - 28.02.1922,
Book 51, 13.04.1923 - 29.09.1923,
Book 52, 01.10.1923 - 24.04.1924,
Book 53, 28.04.1924 - 02.10.1924,
Book 54, 08.10.1924 - 27.03.1925,
Book 55, 27.03.1925 - 25.09.1925,
Book 56, 25.09.1925 - 08.03.1926,
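
Many of the values listed above contain spaces or commas, so they need to be enclosed in double quotes when used in a query. Here's a minimal sketch of a helper that always quotes the value. `field_filter` is a hypothetical name, and the code assumes that quoting single-word values is harmless.

```python
def field_filter(field, value):
    """Build a field:"value" clause, quoting the value so spaces are safe."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


query = " AND ".join(
    [
        field_filter("record_form", "Map, Plan, or Drawing"),
        field_filter("location", "Ballarat"),
    ]
)
print(query)
```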

Filter by date¶

Most records include a start and end date – start_dt and end_dt. You can ask for records within a specific date range by using a range query. For example, if you wanted all records with a start date between 1920 and 1949 you'd add start_dt:[1920-01-01 TO 1949-12-31] to the q query string.

In [25]:
params = {"q": "start_dt:[1920-01-01 TO 1949-12-31]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=start_dt%3A%5B1920-01-01+TO+1949-12-31%5D

There are 1,076,668 results.

You can use an asterisk instead of a date if the range is open-ended. For example, to ask for all records with a start date on or after 1 January 1920 you'd add start_dt:[1920-01-01 TO *] to the query.

In [26]:
params = {"q": "start_dt:[1920-01-01 TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=start_dt%3A%5B1920-01-01+TO+%2A%5D

There are 3,808,984 results.
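
Range queries like these can be generated from Python dates. In this sketch, the hypothetical helper `date_range` uses None to stand for an open end of the range.

```python
from datetime import date


def date_range(field, start=None, end=None):
    """Build a range clause; None means that end of the range is open (*)."""

    def fmt(value):
        return "*" if value is None else value.isoformat()

    return f"{field}:[{fmt(start)} TO {fmt(end)}]"


print(date_range("start_dt", date(1920, 1, 1), date(1949, 12, 31)))
print(date_range("start_dt", date(1920, 1, 1)))
```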

Find digitised records¶

Digitised items have an associated IIIF manifest that describes all the digitised images attached to the item. To find digitised items, filter your search to only include records with a value in the iiif-manifest field. You can do this by using an open-ended range query: iiif-manifest:[* TO *].

In [27]:
params = {"q": "iiif-manifest:[* TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=iiif-manifest%3A%5B%2A+TO+%2A%5D

There are 925,524 results.

The iiif-manifest values are urls that you can request to get information about digitised images in a standard JSON format. See the IIIF documentation for more information on using IIIF manifests.

In [28]:
print(data["response"]["docs"][0]["iiif-manifest"])
https://images.prov.vic.gov.au/manifests/0135/5021/14/images/manifest.json

Some digitised pages are also described in Image records. Details of these pages will be included in the IIIF manifest of their parent item, but the Image record attaches some additional metadata, such as the name of a person mentioned on the page. To filter your results to include only image records, add category:Image to your query.

In [29]:
params = {"q": "category:Image"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AImage

There are 3,613,751 results.

Digitised items and images both include a thumbnail url in the iiif-thumbnail field. To find either type of record, add iiif-thumbnail:[* TO *] to your query.

In [30]:
params = {"q": "iiif-thumbnail:[* TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=iiif-thumbnail%3A%5B%2A+TO+%2A%5D

There are 4,547,296 results.

In [31]:
Image(url=data["response"]["docs"][0]["iiif-thumbnail"])
Out[31]:
[Thumbnail image of the first result]

Here's an alternative query you could use to find both digitised items and images.

In [32]:
params = {"q": "(iiif-thumbnail:[* TO *] AND category:Item) OR (category:Image)"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=%28iiif-thumbnail%3A%5B%2A+TO+%2A%5D+AND+category%3AItem%29+OR+%28category%3AImage%29

There are 4,547,296 results.
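
The iiif-thumbnail urls end with the pattern used by the IIIF Image API: region, size, rotation, then quality and format (for example /full/!200,200/0/default.jpg). That suggests you can ask for a different size by changing the size segment. The sketch below makes that swap. `resize_iiif_image` is a hypothetical helper, and whether the image server will actually deliver larger sizes is an assumption you'd need to test.

```python
def resize_iiif_image(url, max_size):
    """Replace the size segment of a IIIF image url with !max_size,max_size."""
    # The last four path segments are region, size, rotation, and quality.format
    base, region, _size, rotation, quality_format = url.rsplit("/", 4)
    return "/".join([base, region, f"!{max_size},{max_size}", rotation, quality_format])


thumbnail = (
    "https://images.prov.vic.gov.au/loris/"
    "B7%2FBE%2F47%2FC9%2F-5613-11EB-BE8C-6757FF78D049%2Fimages%2F1%2Ffiles%2F"
    "14517-00013-D0501.tif/full/!200,200/0/default.jpg"
)
print(resize_iiif_image(thumbnail, 800))
```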

Find records about people¶

Some records have been indexed with the names of people related to that record. Names are attached to individual files, such as wills or inquests, and to specific pages within volumes or registers, such as passenger lists. You can find records containing particular names by using a text search. If you want to find any records that include names you can filter on the family_name field using an open-ended range query: family_name:[* TO *].

In [33]:
params = {"q": "family_name:[* TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=family_name%3A%5B%2A+TO+%2A%5D

There are 6,097,812 results.

Names can also be included in other fields. I've added more than 6 million named records in the PROV collection to the GLAM Name Index Search.

Find an individual record¶

There's no separate endpoint for retrieving individual entity records via the API. Instead you have to construct a search for a value that's unique to the entity, such as its identifier.

This is pretty straightforward for functions, agencies, and series, as you can search for their standard identifiers in the identifier.PROV_ACM.id field. The identifier has to have a space between the letter prefix and the number, and should be enclosed in double quotes. For example, to retrieve details of the series VPRS 13:

In [34]:
params = {"q": 'identifier.PROV_ACM.id:"VPRS 13" AND category:Series'}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
display(data["response"]["docs"][0])
https://api.prov.vic.gov.au/search/query?q=identifier.PROV_ACM.id%3A%22VPRS+13%22+AND+category%3ASeries

There are 1 results.

{'category': 'Series',
 'entity': 'Record',
 '_id': '13912344-F1A4-11E9-AE98-91984FD5C262',
 'timestamp': 1714524468,
 'identifier.PROV_ACM.id': 'VPRS 13',
 'series_id': '13',
 'citation': 'VPRS 13',
 'citation_sort': '00013',
 'identifier.PID.id': '13912344-F1A4-11E9-AE98-91984FD5C262',
 'title': 'Inwards Shipping Index [Refer to Microfilm Copy VPRS 3504]',
 'date_range': ['1900'],
 'start_dt': '1900',
 'start_dt_qual': '?',
 'end_dt': '1900',
 'how_to_use': ['** Further research is required to determine the exact purpose and context of this series **<br/><br/>This series comprises an alphabetical index to shipping arrivals at Victorian ports. Monitoring of shipping arrivals for customs and immigration purposes was undertaken by the Victorian Government from 1839 until responsibility for these functions passed to the Commonwealth Government in 1901 and 1924 respectively.<br/><br/>Ships have been entered in lexicographical (ie. strict alphabetical) order. Each arrival of the ship is then listed in chronological order by date of arrival.<br/><br/>Entries in the volumes include the following details:<br/>name of vessel<br/>tonnage<br/>master<br/>port of embarkation<br/>date of arrival.<br/><br/>This Index covers the period 1839 to 1900. It is assumed that the Index must have been compiled sometime around or after 1900 in order that the correct alphabetical order could be determined. Evidence in the Index suggests that it was compiled from an existing index.<br/><br/>For the period 1901 to 1924 consult VPRS 3503 which is a microfilm copy of a self-indexing chronological record of ship arrivals. Note that the ships listed in VPRS 3503 are not in strict alphabetical order.<br/>'],
 'resp_agency_title': ['Department of Transport (known as Ministry of Transport 1951 to 1992)'],
 'resp_agency_title_facet': ['Department of Transport (known as Ministry of Transport 1951 to 1992)'],
 'resp_agency_id': ['673'],
 'format': 'Physical',
 'rights_status': ['Open'],
 'location': ['North Melbourne'],
 'contents.date_range': ['[1839 TO 1900]'],
 'contents.start_dt': [1839],
 'contents.end_dt': [1900],
 'series_in_custody.date_range': ['1900'],
 'series_in_custody.start_dt': [1900],
 'series_in_custody.end_dt': [1900],
 'responsible_agents.resp_agency_id': [673],
 'responsible_agents.title': ['Department of Transport (known as Ministry of Transport 1951 to 1992)'],
 'responsible_agents.date_ranges': ['[1983 TO 1996]'],
 'responsible_agents.start_dt': [1983],
 'responsible_agents.end_dt': [1996],
 'creating_agents.creating_agency_id': [606],
 'creating_agents.title': ['Department of Trade and Customs'],
 'creating_agents.date_ranges': ['1900'],
 'creating_agents.start_dt': [1900],
 'creating_agents.end_dt': [1900],
 'status': 'Published',
 '_version_': 1816062577834196992}
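
The lookup above can be generalised to functions and agencies. This is a sketch only: `lookup_query` is a hypothetical helper that inserts the required space, adds the quotes, and chooses the category from the identifier's prefix.

```python
import re

# Prefixes of the standard identifiers and the category each belongs to
CATEGORIES = {"VF": "Function", "VA": "Agency", "VPRS": "Series"}


def lookup_query(identifier):
    """Build a query for one function, agency, or series.

    Accepts the identifier with or without a space, eg 'VPRS13' or 'VPRS 13'.
    """
    match = re.fullmatch(r"\s*(VF|VA|VPRS)\s*(\d+)\s*", identifier, re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a function, agency, or series identifier: {identifier!r}")
    prefix, number = match.group(1).upper(), match.group(2)
    return f'identifier.PROV_ACM.id:"{prefix} {number}" AND category:{CATEGORIES[prefix]}'


print(lookup_query("VPRS13"))
```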

There are a number of identifiers associated with items, so it's really a matter of what information you have about the item that you can use to construct a search. The hash identifier in the _id field seems to be used in documenting relationships between items, and between images and items, so you might want to use it to look for further information.

In [35]:
params = {"q": '_id:"B7BE47C9-5613-11EB-BE8C-6757FF78D049" AND category:Item'}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=_id%3A%22B7BE47C9-5613-11EB-BE8C-6757FF78D049%22+AND+category%3AItem

There are 1 results.

Find related entities¶

The richness of the PROV Archival Control Model lies in the network of relationships between entities. These relationships are documented in fields within individual records, and in separate relatedEntity records.

Items¶

Items are part of series. The identifier of an item's parent series can be found in the following fields:

  • series_id (numeric value only, eg: 7591)
  • is_part_of_series.id (no space between prefix and number, eg: VPRS7591)
  • parents.ids (no space between prefix and number, eg: VPRS7591)

Items can also be part of another item. The identifier of a parent item can be found in the following field:

  • parents.ids (hash identifier)

Items are part of series, and series are created and controlled by agencies. Item records include agency identifiers in the following fields:

  • resp_agency_id (numeric value only, eg: 2620)
  • agencies.ids (no space between prefix and number, eg: VA2620)
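
The fields listed above can be gathered from an item record with a small helper. This is a minimal sketch rather than part of the API: the function names and the sample record are hypothetical (the sample reuses the example values from the lists above and a placeholder parent hash), and it assumes these fields might arrive as either single values or lists.

```python
def as_list(value):
    """Return value as a list, whether it is a scalar, a list, or missing."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def related_identifiers(item):
    """Collect the series, agency, and parent item identifiers from an item record."""
    parents = as_list(item.get("parents.ids"))
    return {
        "series": as_list(item.get("is_part_of_series.id")),
        "agencies": as_list(item.get("agencies.ids")),
        # parents.ids can hold series identifiers and item hashes, so anything
        # that doesn't start with VPRS is assumed to be a parent item
        "parent_items": [p for p in parents if not str(p).startswith("VPRS")],
    }


# Hypothetical record for illustration only, not real API output
sample_item = {
    "series_id": 7591,
    "is_part_of_series.id": ["VPRS7591"],
    "parents.ids": ["VPRS7591", "EXAMPLE-PARENT-ITEM-HASH"],
    "agencies.ids": ["VA2620"],
}
print(related_identifiers(sample_item))
```

In the notebook you could pass any record from data["response"]["docs"] to related_identifiers() instead of the sample.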

Images¶

Images are part of items. The identifier of an image's parent item can be found in the is_part_of_item.PID field.
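
A query for the images attached to an item might look something like the sketch below. It rests on two untested assumptions: that image records use the category value Image, and that is_part_of_item.PID holds the item's hash identifier. The image_query() helper is hypothetical, and the identifier is the one used in the _id search earlier in this notebook.

```python
def image_query(item_pid):
    """Build a query for image records whose parent item has this identifier."""
    # The value is quoted, following the _id search shown earlier
    return f'category:Image AND is_part_of_item.PID:"{item_pid}"'


# Assumption: the PID value matches the item's hash identifier
params = {"q": image_query("B7BE47C9-5613-11EB-BE8C-6757FF78D049")}
print(params["q"])
```

The resulting params can be passed to requests.get(api_url, params=params) in the same way as the other examples. If the search returns nothing, check an image record to see what its is_part_of_item.PID value actually looks like.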

Series¶

Series are created and controlled by agencies. Series records include agency identifiers in the following fields:

  • resp_agency_id (numeric value only, eg: 2620)
  • responsible_agents.resp_agency_id (numeric value only, eg: 2620)
  • creating_agents.creating_agency_id (numeric value only, eg: 2620)

As well as being linked to agencies, series can be related to other series. These relationships are recorded in relatedEntity records, for example:

  • Creating agency
  • Responsible agency
  • Controlled series
  • Controlling series
  • Previous series
  • Subsequent series

Agencies¶

Agencies create and control series, perform functions, and are related to other agencies. These relationships are recorded in relatedEntity records, for example:

  • Created series
  • Responsible series
  • Primary responsible function
  • Secondary responsible function
  • Subordinate agency
  • Superior agency
  • Subsequent agency
  • Previous agency

Functions¶

Functions are performed by agencies, and are related to other functions. These relationships are recorded in relatedEntity records, for example:

  • Primary responsible agency
  • Secondary responsible agency
  • Related function
  • Broader function
  • Narrower function

Where there are related identifiers in fields such as series_id, you can use these identifiers to construct a search that will return the related entity, as described in the previous section.

The relatedEntity records are richer and more complex. As well as linking identifiers, they provide extra context about the relationship, such as the date range when it was active. The two linked identifiers are in the entity_id and related_entity_id fields. Which identifier goes where depends on the direction of the relationship. For example, to find the primary functions of the 'Superintendent, Port Phillip District' (VA 473), you'd search relatedEntity records for entity_id:VA473 and relationship:"Primary responsible function".

In [36]:
params = {
    "q": 'category:relatedEntity AND entity_id:VA473 AND relationship:"Primary responsible function"',
    "rows": 100,
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} functions.\n")
for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=category%3ArelatedEntity+AND+entity_id%3AVA473+AND+relationship%3A%22Primary+responsible+function%22&rows=100

There are 21 functions.

Crown lands (public)
Crown lands (government)
Armed forces command
Education
Goldfields administration and mining
Library, State
Botanic gardens
Crown solicitor's services
Census and statistics
Finance
General superintendence
Police
Ports and harbours
Immigration (nineteenth century)
Postal services
Buildings, government (design and construction)
Roads and bridges
Health, public
Customs
Aboriginal affairs
Prisons and youth training centres

Here's the first of these records.

In [37]:
data["response"]["docs"][0]
Out[37]:
{'category': 'relatedEntity',
 'status': 'Published',
 '_id': 'VA473:VF309:2690:primaryresponsibilityfor',
 'timestamp': 1614239338,
 'entity_id': 'VA473',
 'related_entity_id': 'VF309',
 'sort_id': 309,
 'title': 'Crown lands (public)',
 'relationship': 'Primary responsible function',
 'relationship_date_range': ['[1839 TO 1851]'],
 'relationship_start_dt': 1839,
 'relationship_end_dt': 1851,
 '_version_': 1816039025116446725}
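
The date fields in a relatedEntity record make it easy to summarise when a relationship was active. Here's a minimal sketch using the record above as sample data. The summarise_relationship() helper is hypothetical, and the '?' fallback assumes that some records might lack a start or end date.

```python
def summarise_relationship(doc):
    """Describe a relatedEntity record in a single line."""
    start = doc.get("relationship_start_dt")
    end = doc.get("relationship_end_dt")
    # Assumption: dates may be missing, so fall back to '?'
    span = f"{'?' if start is None else start} to {'?' if end is None else end}"
    return (
        f"{doc['entity_id']} -> {doc['related_entity_id']} "
        f"({doc['title']}): {doc['relationship']}, {span}"
    )


# Fields copied from the record shown above
sample = {
    "entity_id": "VA473",
    "related_entity_id": "VF309",
    "title": "Crown lands (public)",
    "relationship": "Primary responsible function",
    "relationship_start_dt": 1839,
    "relationship_end_dt": 1851,
}
print(summarise_relationship(sample))
```

To summarise every relationship in a set of results, loop over data["response"]["docs"] and call summarise_relationship() on each record.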

To go in the other direction and find all the agencies with primary responsibility for the function 'Crown lands (public)' (VF 309), you'd search for entity_id:VF309 and relationship:"Primary responsible agency".

In [38]:
params = {
    "q": 'category:relatedEntity AND entity_id:VF309 AND relationship:"Primary responsible agency"',
    "rows": 100,
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} agencies.\n")
for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=category%3ArelatedEntity+AND+entity_id%3AVF309+AND+relationship%3A%22Primary+responsible+agency%22&rows=100

There are 14 agencies.

Superintendent, Port Phillip District
Department of Conservation and Natural Resources
Department of Crown Lands and Survey, Geelong Division
Department of Conservation, Forests and Lands
Department of Environment and Primary Industries
Department of Conservation and Environment
Police Magistrate Port Phillip District
Department of Natural Resources and the Environment
Department of Sustainability and Environment
Colonial Secretary's Office
Crown Lands Department
Department of Environment, Land, Water and Planning
Department of Energy, Environment and Climate Action 
Department of Crown Lands and Survey
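
The two searches above differ only in the identifier and the relationship label, so they could be wrapped in a small helper. This is just a sketch: related_entity_params() is a hypothetical name, and urlencode is used here only to preview the request url without calling the API.

```python
from urllib.parse import urlencode

API_URL = "https://api.prov.vic.gov.au/search/query"


def related_entity_params(entity_id, relationship, rows=100):
    """Build request parameters for a relatedEntity search in one direction."""
    query = (
        f"category:relatedEntity AND entity_id:{entity_id} "
        f'AND relationship:"{relationship}"'
    )
    return {"q": query, "rows": rows}


# Agency to functions, then function to agencies
functions_of_agency = related_entity_params("VA473", "Primary responsible function")
agencies_for_function = related_entity_params("VF309", "Primary responsible agency")

print(f"{API_URL}?{urlencode(functions_of_agency)}")
print(f"{API_URL}?{urlencode(agencies_for_function)}")
```

Either dictionary can be passed straight to requests.get(api_url, params=...) to retrieve the results.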

Filter by related entities¶

As described above, the API records of items, images, and series contain links to related entities. You can use these relationships to filter your searches. For example, if you want to limit your search for items to those in series VPRS 460, you'd add series_id:460 to your query.

In [39]:
params = {"q": "category:Item AND series_id:460"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AItem+AND+series_id%3A460

There are 74,409 results.

You could also use the is_part_of_series.id field, but the value needs to include the letter prefix: VPRS460.

In [40]:
params = {"q": "category:Item AND is_part_of_series.id:VPRS460"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AItem+AND+is_part_of_series.id%3AVPRS460

There are 74,409 results.
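
Because the two fields expect different forms of the same identifier, it can help to generate both queries from a series label such as 'VPRS 460'. The series_filters() helper below is a hypothetical sketch that assumes labels always have the form 'VPRS' followed by a number.

```python
import re


def series_filters(label):
    """Build item queries for a series label such as 'VPRS 460'."""
    match = re.fullmatch(r"\s*VPRS\s*(\d+)\s*", label, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a VPRS identifier: {label!r}")
    number = match.group(1)
    return {
        # series_id takes the numeric value only
        "series_id": f"category:Item AND series_id:{number}",
        # is_part_of_series.id needs the prefix, with no space
        "is_part_of_series.id": f"category:Item AND is_part_of_series.id:VPRS{number}",
    }


queries = series_filters("VPRS 460")
for field, query in queries.items():
    print(f"{field}: {query}")
```

Both queries should return the same set of items, as the two searches above demonstrate.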


Created by Tim Sherratt for the GLAM Workbench.