Getting started with the PROV API¶
The Public Record Office Victoria's Public API provides data about its archival holdings in a machine readable format. This makes it possible to use, analyse, and visualise the collection in new ways. You can read more about the API in this blog post.
For an overview of the data currently available through the PROV API, see the PROV Data Dashboard.
This notebook attempts to document the basic functionality of the API. You might also find it useful to consult Solr's query guide.
- A simple API request
- Using an 'empty' query to get everything
- The different types of entities in API results
- Identifiers and links
- Search facets
- Controlling the way results are delivered
- Retrieving a random result
- Harvest a complete set of results
- Constructing queries
New to Jupyter notebooks? See Using Jupyter notebooks in the GLAM Workbench for an introduction.
import random
import pandas as pd
import requests
from IPython.display import Image, display
A simple API request¶
Making an API request is easy. The base url for API requests is https://api.prov.vic.gov.au/search/query
. No authentication is required, and the only mandatory parameter is q
which you use to pass along your search query. Here's an example of a simple search for records containing the word 'ostrich'.
api_url = "https://api.prov.vic.gov.au/search/query"
params = {
"q": "ostrich",
}
response = requests.get(api_url, params=params)
print(f"API request url: {response.url}")
data = response.json()
API request url: https://api.prov.vic.gov.au/search/query?q=ostrich
The number of matching search results is contained in the response
-> numFound
field.
print(f"There are {data['response']['numFound']:,} results.")
There are 10 results.
The full search results can be found in response
-> docs
. By looping through all the results you can display their titles.
for result in data["response"]["docs"]:
    print(result["title"])
D501 Head close-up of ostrich D502 Head close-up of ostrich 110 1004/312 Marinda McOstrich Jaffray: Will; Grant of probate 1004/312 Marinda McOstrich Jaffray: Grant of probate M OSTRICH 553055 M OSTRICH MATT OSTRICHE V332 [Daryl Somers and Ozzie Ostrich on the children's show 'Hey, Hey' It's Saturday']
Here's the first result in full.
data["response"]["docs"][0]
{'category': 'Item', 'entity': 'Record', '_id': 'B7BE47C9-5613-11EB-BE8C-6757FF78D049', 'timestamp': 1622643613, 'identifier.PROV_ACM.id': 'VPRS 14517/P0001/5/155', 'identifier.PID.id': 'B7BE47C9-5613-11EB-BE8C-6757FF78D049', 'title': 'D501 Head close-up of ostrich', 'consignment_id': 'P0001', 'start_dt': '1753-01-01T00:00:00Z', 'end_dt': '3000-12-31T00:00:00Z', 'date_range': ['[1753 TO 3000]'], 'date_range.not_described': ['[1753 TO 3000]'], 'description.subject': ['OSTRICHES'], 'description.aggregate': 'Subject : OSTRICHES', 'presentation_text': 'Subject : OSTRICHES', 'jurisdictional_coverage': ['Victoria'], 'rights_statement': ['Open Public Records Act 1973'], 'rights_status': ['Open'], 'item_discrete': 'No', 'format': 'Physical', 'medium': ['Polyester Negative'], 'location': ['North Melbourne', 'Online'], 'access_restriction': 'No', 'status': 'Published', 'citation': 'VPRS 14517/P0001/5, D501', 'citation_sort': '14517P00010000130000001550', 'is_part_of_series.id': ['VPRS14517'], 'is_part_of_series.title': ['Negatives of Photographs [Publications Branch]'], 'series_id': '14517', 'parents.ids': ['VPRS14517', 'D6BD4E47-F7E6-11E9-AE98-87DB74D2147D'], 'parents.titles': ['Negatives of Photographs [Publications Branch]', '[Not Set]'], 'agencies.titles': ['Education Department'], 'agencies.ids': ['VA714'], 'agencies.date_ranges': ['[1873 TO 1985]'], 'resp_agency_title': ['Department of Education'], 'resp_agency_title_facet': ['Department of Education'], 'resp_agency_id': ['3098'], 'is_part_of_item.PID': ['D6BD4E47-F7E6-11E9-AE98-87DB74D2147D'], 'is_part_of_item.title': ['D332-D667'], 'iiif-manifest': 'https://images.prov.vic.gov.au/manifests/B7/BE/47/C9/-5613-11EB-BE8C-6757FF78D049/images/manifest.json', 'iiif-thumbnail': 'https://images.prov.vic.gov.au/loris/B7%2FBE%2F47%2FC9%2F-5613-11EB-BE8C-6757FF78D049%2Fimages%2F1%2Ffiles%2F14517-00013-D0501.tif/full/!200,200/0/default.jpg', 'control_symbol_labels': ['Registration Number'], 'control_symbol_values': ['D501'], 
'record_form': ['Photograph or Image'], 'box_number_sort': [13], '_version_': 1816058689549762560}
While no authentication is required to make a request, there is a limit on the number of requests you can make within a given time period. You can see the details in the response headers.
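The names of the rate-limit headers aren't listed here, so this is only a minimal sketch: it filters a set of headers for any name that mentions 'limit', which is an assumption about how the API labels them. The function name rate_limit_headers and the example header values are made up for illustration.

```python
def rate_limit_headers(headers):
    """Return the headers whose names mention 'limit' (case-insensitive)."""
    # The header names used by the PROV API are an assumption here,
    # so match loosely rather than looking for specific names.
    return {
        name: value
        for name, value in headers.items()
        if "limit" in name.lower()
    }


# Made-up header values so the example runs offline.
example_headers = {
    "Content-Type": "application/json",
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "99",
}
print(rate_limit_headers(example_headers))
```

In the notebook you could pass response.headers from any of the requests above instead of example_headers.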
Using an 'empty' query to get everything¶
If you want information on all the records available through the PROV API you can set q
to either *
or *:*
to get everything. For example, you can find out the total number of records available from the PROV API.
api_url = "https://api.prov.vic.gov.au/search/query"
params = {"q": "*"}
response = requests.get(api_url, params=params)
print(f"API request url: {response.url}")
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
API request url: https://api.prov.vic.gov.au/search/query?q=%2A There are 10,147,135 results.
The different types of entities in API results¶
API search results are not all the same. They include a mix of functions, agencies, series, consignments, items, images, and relationships between entities. You can find out more about the different entities that PROV uses to describe its collection in the PROV Archival Control Model.
You can identify what type of entity each record relates to by looking at the category
field. For example, the category
field in the record above has the value 'Item'.
The category
field has one of the following values:
Agency
Consignment
Function
Image
Item
relatedEntity
Series
Using the category
field you can filter your search to include only certain types of records.
While some fields – like _id
, category
, and title
– are present in all API results, other fields vary according to the type of entity.
Identifiers and links¶
Each API record has a hash identifier in the _id
field. These can be used to retrieve a specific API record. But the different entities in PROV's archival control model have other identifiers that are important to understand and recognise as they're used to create relationships between entities.
Functions, agencies, and series¶
Identifiers for functions, agencies, and series have a standard format defined by PROV's Archival Control Model. These identifiers can be found in the identifier.PROV_ACM.id
field of an API result.
entity | id format | example | field |
-------|-----------|---------|---------|
function | VF [number] | VF 375 | identifier.PROV_ACM.id |
agency | VA [number] | VA 3744 | identifier.PROV_ACM.id |
series | VPRS [number] | VPRS 851 | identifier.PROV_ACM.id and series_id (number only) |
You can use these identifiers to construct urls that point to the entity's page on the PROV website. The format is https://prov.vic.gov.au/archive/[identifier]
. This works with or without a space between the letter prefix and the number, but it's probably safer to remove the space, for example:
https://prov.vic.gov.au/archive/VPRS851
These identifiers are used to create relationships between entities. However, depending on the relationship and the field, the form of the identifier might change – sometimes there's a space between the letter prefix and the number, sometimes there's not, and sometimes it's the numeric value only. For example, in the item record displayed above, the series_id
field contains the numeric part of the identifier of the series that contains the item, but the full identifier (without a space) is included in the is_part_of_series.id
field. These differences can be important if you're using these fields to retrieve information about related entities.
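To make the different identifier forms concrete, here's a small sketch that converts between them and builds a website url. The helper names (split_identifier, identifier_forms, archive_url) are hypothetical, and the code assumes identifiers always consist of a VF, VA, or VPRS prefix followed by digits.

```python
import re

ARCHIVE_URL = "https://prov.vic.gov.au/archive/"


def split_identifier(identifier):
    """Split an identifier like 'VPRS 851' or 'VPRS851' into ('VPRS', '851')."""
    match = re.fullmatch(r"(VF|VA|VPRS)\s*(\d+)", identifier.strip())
    if match is None:
        raise ValueError(f"Unrecognised identifier: {identifier!r}")
    return match.group(1), match.group(2)


def identifier_forms(identifier):
    """Return the three forms of an identifier that appear in API fields."""
    prefix, number = split_identifier(identifier)
    return {
        "spaced": f"{prefix} {number}",  # as in identifier.PROV_ACM.id
        "compact": f"{prefix}{number}",  # as in is_part_of_series.id
        "number": number,  # as in series_id
    }


def archive_url(identifier):
    """Build a PROV website url, removing any space from the identifier."""
    prefix, number = split_identifier(identifier)
    return f"{ARCHIVE_URL}{prefix}{number}"


print(identifier_forms("VPRS 851"))
print(archive_url("VPRS 851"))
```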
Items¶
Items can be identified in a number of ways. As well as the hash identifier in the _id
field, there's a barcode
number, and control_symbol_values
which can be combined with the series number to create a reference to the item. It seems that the hash identifier is used to create relationships between items and other entities. For example, images are linked back to items by including the hash identifier in the is_part_of_item.PID
field.
The hash identifiers can also be used to construct urls that point to an item's page on the PROV website. The format is https://prov.vic.gov.au/archive/[hash identifier]
. For example, the item record above has an _id
value of B7BE47C9-5613-11EB-BE8C-6757FF78D049
, so the website url is:
https://prov.vic.gov.au/archive/B7BE47C9-5613-11EB-BE8C-6757FF78D049
Images¶
Images add metadata to pages within digitised items. They have an _id
value that you can use to retrieve an API record, but there's no way to build a url that will display a web page containing the image metadata. The best you can do is construct a link to the digitised page. To do this you need the id of the parent item from the image's is_part_of_item.PID
and the canvas _number
of the page. Then you can construct a url with the format: https://prov.vic.gov.au/archive/[parent item id]?image=[canvas number + 1]
. For example:
https://prov.vic.gov.au/archive/29AF80BD-F7F0-11E9-AE98-6FA01622F08F?image=304
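The url format above can be wrapped in a small helper. This is just a sketch: image_page_url is a hypothetical name, and the canvas number of 303 is inferred from the example url rather than taken from a real image record.

```python
def image_page_url(parent_item_id, canvas_number):
    """Build a link to a digitised page on the PROV website."""
    # The website's image parameter is the canvas number plus one.
    page = int(canvas_number) + 1
    return f"https://prov.vic.gov.au/archive/{parent_item_id}?image={page}"


print(image_page_url("29AF80BD-F7F0-11E9-AE98-6FA01622F08F", 303))
```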
Search facets¶
Search facets tell you the number of results per value in a given field. To include facets in your API results you need to set the facet
parameter to true
and use facet.field
to specify the name of the field you're interested in.
For example, if you wanted to know the number of results in each of the different categories described above:
- set q to * to return everything
- set facet to true
- set facet.field to category
The facet counts are included in the API results at facet_counts
-> facet_fields
-> [name of the specified field].
params = {
"q": "*", # an empty query to get everything
"facet": "true",
"facet.field": "category",
}
response = requests.get("https://api.prov.vic.gov.au/search/query", params=params)
data = response.json()
values = data["facet_counts"]["facet_fields"]["category"]
print(values)
['Item', 6337827, 'Image', 3613751, 'relatedEntity', 151117, 'Consignment', 23771, 'Series', 17095, 'Agency', 3252, 'Function', 322]
The facet counts are just an array of paired value names and result counts. To convert the list into something more structured you can do this.
facets = [
{"category": values[i], "count": values[i + 1]}
for i in range(0, len(values), 2)
if values[i + 1] > 0
]
pd.DataFrame(facets)
  | category | count |
---|---|---|
0 | Item | 6337827 |
1 | Image | 3613751 |
2 | relatedEntity | 151117 |
3 | Consignment | 23771 |
4 | Series | 17095 |
5 | Agency | 3252 |
6 | Function | 322 |
Controlling the way results are delivered¶
There are a number of parameters you can add to your API request to change the way the results are delivered.
- wt: the encoding of the results – either JSON (the default) or XML
- rows: the number of results to include (default is 10)
- start: the result number to start from (default is 0)
- sort: order in which to sort the results
- fl: specify the fields you want to include in the results
For example, to increase the number of results returned by your API request set the rows
parameter to 100
.
params = {"q": "rabbits", "rows": 100}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(
f"This request delivers {len(data['response']['docs'])} of {data['response']['numFound']:,} results."
)
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=100 This request delivers 100 of 365 results.
By default, search results are sorted by their relevance score. You can change this using the sort
parameter. You need to supply both the name of a field to sort on and a sort order – either ascending or descending. For example, to sort by title
from A to Z, you'd set sort
to title asc
.
params = {"q": "rabbits", "rows": 1, "sort": "title asc"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(data["response"]["docs"][0]["title"])
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=1&sort=title+asc '55/37/5
Changing asc
to desc
will reverse the order of the results.
params = {"q": "rabbits", "rows": 1, "sort": "title desc"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(data["response"]["docs"][0]["title"])
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=1&sort=title+desc Wire Netting Advances Files [SAMPLE ONLY RETAINED]
If you only want specific fields, you can supply a list of required field names using the fl
parameter.
params = {"q": "rabbits", "rows": 1, "fl": "_id,title"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(data["response"]["docs"][0])
https://api.prov.vic.gov.au/search/query?q=rabbits&rows=1&fl=_id%2Ctitle {'_id': '3589932B-F1AF-11E9-AE98-C70783C3C724', 'title': "Rabbit Inspector's Reports"}
Retrieving a random result¶
You can use the start
and rows
parameters to retrieve a random result:
- first run a query and find the total number of results
- select a random number within the range provided by the total number of results
- set start to the random number and rows to 1
The cell below displays a randomly-selected photograph. It's similar to the code used by PROVBot to share PROV photos through the Fediverse.
params = {
"q": 'iiif-manifest:[* TO *] AND record_form:"Photograph or Image"',
"rows": 1,
}
# Get total number of results
response = requests.get("https://api.prov.vic.gov.au/search/select", params=params)
data = response.json()
total_results = data["response"]["numFound"]
# Set a random start point within the range of total results
params["start"] = random.randrange(0, total_results)
# Retrieve random result
response = requests.get("https://api.prov.vic.gov.au/search/select", params=params)
data = response.json()
item = data["response"]["docs"][0]
print(item["title"])
display(Image(url=item["iiif-thumbnail"]))
D642 Coarse cloddy soil
Harvest a complete set of results¶
You can loop through a complete set of results by updating the start
value after each request. You'll know you've reached the end of the result set when the docs
list is empty.
The code below just saves the harvested results into a list named harvested_results
. Alternatively, you could write the harvested results to a file.
params = {
"q": "rabbits",
"rows": 100,
}
start = 0
harvested_results = []
# Continue in this loop while there are results to harvest
while True:
    # Update the start parameter
    params["start"] = start
    response = requests.get(api_url, params=params)
    data = response.json()
    results = data["response"]["docs"]
    # Add the results from this request to the harvested results
    harvested_results += results
    # Get the number of results returned by the current request
    num_docs = len(results)
    # Add the number of results from this request to the start value
    start += num_docs
    # There are no more results, so stop the harvest
    if num_docs == 0:
        break
print(f"Harvested {len(harvested_results)} results.")
Harvested 365 results.
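If you'd rather write the harvested results to a file, something like the sketch below might work. It separates the paging loop from the code that fetches a page, so the example can run offline against made-up records. The names harvest, save_as_json_lines, and fake_fetch_page are hypothetical, and the JSON Lines output format is just one option.

```python
import json
import os
import tempfile


def harvest(fetch_page, rows=100):
    """Yield every result, requesting one page at a time.

    fetch_page(start, rows) must return the list of docs for that page.
    """
    start = 0
    while True:
        docs = fetch_page(start, rows)
        # An empty page means there are no more results
        if not docs:
            break
        yield from docs
        start += len(docs)


def save_as_json_lines(results, path):
    """Write one JSON object per line and return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as output_file:
        for result in results:
            output_file.write(json.dumps(result) + "\n")
            count += 1
    return count


# A stand-in for the API so the example runs offline: 250 made-up records.
fake_records = [{"_id": str(i), "title": f"Record {i}"} for i in range(250)]


def fake_fetch_page(start, rows):
    return fake_records[start : start + rows]


output_path = os.path.join(tempfile.gettempdir(), "harvested_results.jsonl")
saved = save_as_json_lines(harvest(fake_fetch_page), output_path)
print(f"Saved {saved} results to {output_path}")
```

To harvest from the API itself, you'd replace fake_fetch_page with a function that calls requests.get with the start and rows values added to your params, and returns the docs list from the response.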
Constructing queries¶
Boolean operators¶
You can use standard Boolean operators, such as AND
, OR
, and NOT
, to combine query terms. You can also use brackets to group parts of complex queries. See the Solr documentation for more.
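As a rough illustration of grouping with brackets, the sketch below builds a query string that asks for records containing 'river' and either 'murray' or 'goulburn'. The helpers any_of and all_of are hypothetical conveniences, and the search terms are only examples.

```python
def any_of(*terms):
    """Join terms with OR and wrap the group in brackets."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND and wrap the group in brackets."""
    return "(" + " AND ".join(terms) + ")"


query = all_of(any_of("murray", "goulburn"), "river")
print(query)

# The query string can then be passed to the API in the q parameter
params = {"q": query}
```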
Text searches¶
To search for words or phrases across multiple fields, just add them to the q
parameter. If you include multiple keywords, they'll be treated as if they were connected by an OR
operator. So a q
value of murray river
is the same as murray OR river
.
params = {
"q": "murray river",
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=murray+river There are 64,746 results.
params = {
"q": "murray OR river",
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=murray+OR+river There are 64,746 results.
If you want only records containing both keywords, use the AND
operator.
params = {
"q": "murray AND river",
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=murray+AND+river There are 2,584 results.
To treat the keywords as a phrase, enclose them in quotes, eg "murray river"
.
params = {
"q": '"murray river"',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=%22murray+river%22 There are 1,746 results.
Text fields are stemmed, so you generally don't need to worry about plurals and other word forms. For example, a search for box
will be the same as a search for boxes
, mine
will match mining
, and engine
will match engineer
. There doesn't seem to be any way to search for an exact string, so there'll always be a bit of fuzziness.
You can also use wildcards, fuzzy matches, and proximity searches. See the Solr documentation for more information. For example, a search for "gold mining"
will return records that include the phrase "gold mining" (or "gold mine" because of word stemming). A search for "gold mining"~10
will find records where gold
and mining
(or mine
) occur within 10 words of each other.
params = {
"q": '"gold mining"',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=%22gold+mining%22 There are 3,910 results.
params = {
"q": '"gold mining"~10',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"There are {data['response']['numFound']:,} results.")
https://api.prov.vic.gov.au/search/query?q=%22gold+mining%22~10 There are 4,563 results.
Filter results by using fields¶
You can filter search results by specifying the value of a particular field. To do this you add [field]:[value]
to the q
query string. For example, if you only want items in your results, you can ask for records where the category
field includes the value Item
. You can also combine multiple field values using the AND
and OR
operators:
- to include only items:
category:Item
- to include either items or images:
category:Item OR category:Image
Let's ask for items only.
params = {"q": "category:Item"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=category%3AItem There are 6,337,827 results. 211/374 Leslie A Lamb: Will; Grant of probate 215/936 Ellen Cahill: Will; Grant of probate 215/981 Florence M Lovegrove: Will; Grant of probate 211/107 Amelia Hawking: Will; Grant of probate 215/980 William F Finchett: Will; Grant of probate 215/979 George Wilson: Will; Grant of probate 211/102 Bernard F Cragen: Will; Grant of probate 211/221 Jonathan Coulson: Will; Grant of probate 215/978 William E S Ockenden: Will; Grant of probate 215/959 Otto Holst: Will; Grant of probate
And now compare the number of results to a request for items or images.
params = {"q": "category:Item OR category:Image"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AItem+OR+category%3AImage There are 9,951,578 results.
Here are some commonly-used fields, with their range of possible values, that can be used to filter your results:
- category – possible values:
  - Agency
  - Consignment
  - Function
  - Image
  - Item
  - Series
  - relatedEntity
- format – possible values:
  - Digital
  - Physical
- record_form – possible values:
  - Card
  - Data
  - Document
  - File
  - Map, Plan, or Drawing
  - Moving Image
  - Object
  - Photograph or Image
  - Sound Recording
  - Volume
  - Website
- location – possible values:
  - Ballarat
  - Beechworth
  - Bendigo
  - Geelong
  - North Melbourne
  - Online
- rights_status – possible values:
  - Closed
  - Closed Record and Open Metadata
  - Not set
  - Open
For example, you might be interested in volumes held in Ballarat:
params = {"q": "record_form:Volume AND location:Ballarat"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=record_form%3AVolume+AND+location%3ABallarat There are 6,976 results. 1966 - 1873 Book 46, 16.04.1920 - 17.11.1920, Book 47, 17.11.1920 - 29.07.1921, Book 48, 01.08.1921 - 28.02.1922, Book 51, 13.04.1923 - 29.09.1923, Book 52, 01.10.1923 - 24.04.1924, Book 53, 28.04.1924 - 02.10.1924, Book 54, 08.10.1924 - 27.03.1925, Book 55, 27.03.1925 - 25.09.1925, Book 56, 25.09.1925 - 08.03.1926,
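Some field values, such as 'Photograph or Image' or 'North Melbourne', contain spaces, so they need to be enclosed in double quotes (as in the random photograph query earlier). Here's a minimal sketch of a helper that adds quotes when needed. The names field_filter and all_filters are hypothetical, and the escaping of quotes and backslashes is an assumption based on standard Solr query syntax.

```python
def field_filter(field, value):
    """Build a field:value filter, quoting values that aren't a single word."""
    value = str(value)
    if value.isalnum():
        return f"{field}:{value}"
    # Escape backslashes and double quotes before wrapping in quotes
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def all_filters(**filters):
    """Combine several field filters with AND."""
    return " AND ".join(
        field_filter(field, value) for field, value in filters.items()
    )


print(all_filters(record_form="Volume", location="Ballarat"))
print(field_filter("record_form", "Photograph or Image"))
```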
Filter by date¶
Most records include a start and end date – start_dt
and end_dt
. You can ask for records within a specific date range by using a range query. For example, if you wanted all records with a start date between 1920 and 1949 you'd add start_dt:[1920-01-01 TO 1949-12-31]
to the q
query string.
params = {"q": "start_dt:[1920-01-01 TO 1949-12-31]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=start_dt%3A%5B1920-01-01+TO+1949-12-31%5D There are 1,076,668 results.
You can use an asterisk instead of a date if the range is open ended. For example, to ask for all records with a start date on or after 1 January 1920 you'd add start_dt:[1920-01-01 TO *]
to the query.
params = {"q": "start_dt:[1920-01-01 TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=start_dt%3A%5B1920-01-01+TO+%2A%5D There are 3,808,984 results.
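If you're building lots of date queries, a small helper can save some typing. This sketch assumes you only need whole years, and uses an asterisk when a year is left out. The name date_range_filter is hypothetical.

```python
def date_range_filter(field, start_year=None, end_year=None):
    """Build a range query covering whole years, open ended if a year is None."""
    start = f"{start_year:04d}-01-01" if start_year is not None else "*"
    end = f"{end_year:04d}-12-31" if end_year is not None else "*"
    return f"{field}:[{start} TO {end}]"


print(date_range_filter("start_dt", 1920, 1949))
print(date_range_filter("start_dt", 1920))
```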
Find digitised records¶
Digitised items have an associated IIIF manifest that describes all the digitised images attached to the item. To find digitised items, filter your search to only include records with a value in the iiif-manifest
field. You can do this by using an open-ended range query: iiif-manifest:[* TO *]
.
params = {"q": "iiif-manifest:[* TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=iiif-manifest%3A%5B%2A+TO+%2A%5D There are 925,524 results.
The iiif-manifest
values are urls that you can request to get information about digitised images in a standard JSON format. See the IIIF documentation for more information on using IIIF manifests.
print(data["response"]["docs"][0]["iiif-manifest"])
https://images.prov.vic.gov.au/manifests/0135/5021/14/images/manifest.json
Some digitised pages are also described in Image
records. Details of these pages will be included in the IIIF manifest of their parent item, but the Image
record attaches some additional metadata, such as the name of a person mentioned on the page. To filter your results to include only image records, add category:Image
to your query.
params = {"q": "category:Image"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AImage There are 3,613,751 results.
Digitised items and images both include a thumbnail url in the iiif-thumbnail
field. To find either type of record, add iiif-thumbnail:[* TO *]
to your query.
params = {"q": "iiif-thumbnail:[* TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=iiif-thumbnail%3A%5B%2A+TO+%2A%5D There are 4,547,296 results.
Image(url=data["response"]["docs"][0]["iiif-thumbnail"])

Here's an alternative query you could use to find both digitised items and images.
params = {"q": "(iiif-thumbnail:[* TO *] AND category:Item) OR (category:Image)"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=%28iiif-thumbnail%3A%5B%2A+TO+%2A%5D+AND+category%3AItem%29+OR+%28category%3AImage%29 There are 4,547,296 results.
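The iiif-thumbnail values follow the IIIF Image API url pattern, where the last four segments are the region, size, rotation, and quality/format of the image. This means you can probably request a different size by changing the size segment. The sketch below shows one way to do that. The name resize_iiif_image is hypothetical, and whether the PROV image server will deliver any particular size is an assumption you'd need to test.

```python
def resize_iiif_image(url, max_size):
    """Replace the size segment of a IIIF Image API url with !max_size,max_size."""
    # The last four segments are region, size, rotation, and quality.format
    base, region, _size, rotation, filename = url.rsplit("/", 4)
    return "/".join([base, region, f"!{max_size},{max_size}", rotation, filename])


# The thumbnail url from the ostrich record displayed earlier
thumbnail_url = (
    "https://images.prov.vic.gov.au/loris/"
    "B7%2FBE%2F47%2FC9%2F-5613-11EB-BE8C-6757FF78D049%2Fimages%2F1%2Ffiles%2F"
    "14517-00013-D0501.tif/full/!200,200/0/default.jpg"
)
print(resize_iiif_image(thumbnail_url, 500))
```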
Find records about people¶
Some records have been indexed with the names of people related to that record. Names are attached to individual files, such as wills or inquests, and to specific pages within volumes or registers, such as passenger lists. You can find records containing particular names by using a text search. If you want to find any records that include names you can filter on the family_name
field using an open ended range query: family_name:[* TO *]
.
params = {"q": "family_name:[* TO *]"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=family_name%3A%5B%2A+TO+%2A%5D There are 6,097,812 results.
Names can also be included in other fields. I've added more than 6 million named records in the PROV collection to the GLAM Name Index Search.
Find an individual record¶
There's no separate endpoint for retrieving individual entity records via the API. Instead you have to construct a search for a value that's unique to the entity, such as its identifier.
This is pretty straightforward for functions, agencies, and series, as you can search for their standard identifiers in the identifier.PROV_ACM.id
field. The identifier has to have a space between the letter prefix and the number, and should be enclosed in double quotes. For example, to retrieve details of the series VPRS 13:
params = {"q": 'identifier.PROV_ACM.id:"VPRS 13" AND category:Series'}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
display(data["response"]["docs"][0])
https://api.prov.vic.gov.au/search/query?q=identifier.PROV_ACM.id%3A%22VPRS+13%22+AND+category%3ASeries There are 1 results.
{'category': 'Series', 'entity': 'Record', '_id': '13912344-F1A4-11E9-AE98-91984FD5C262', 'timestamp': 1714524468, 'identifier.PROV_ACM.id': 'VPRS 13', 'series_id': '13', 'citation': 'VPRS 13', 'citation_sort': '00013', 'identifier.PID.id': '13912344-F1A4-11E9-AE98-91984FD5C262', 'title': 'Inwards Shipping Index [Refer to Microfilm Copy VPRS 3504]', 'date_range': ['1900'], 'start_dt': '1900', 'start_dt_qual': '?', 'end_dt': '1900', 'how_to_use': ['** Further research is required to determine the exact purpose and context of this series **<br/><br/>This series comprises an alphabetical index to shipping arrivals at Victorian ports. Monitoring of shipping arrivals for customs and immigration purposes was undertaken by the Victorian Government from 1839 until responsibility for these functions passed to the Commonwealth Government in 1901 and 1924 respectively.<br/><br/>Ships have been entered in lexicographical (ie. strict alphabetical) order. Each arrival of the ship is then listed in chronological order by date of arrival.<br/><br/>Entries in the volumes include the following details:<br/>name of vessel<br/>tonnage<br/>master<br/>port of embarkation<br/>date of arrival.<br/><br/>This Index covers the period 1839 to 1900. It is assumed that the Index must have been compiled sometime around or after 1900 in order that the correct alphabetical order could be determined. Evidence in the Index suggests that it was compiled from an existing index.<br/><br/>For the period 1901 to 1924 consult VPRS 3503 which is a microfilm copy of a self-indexing chronological record of ship arrivals. 
Note that the ships listed in VPRS 3503 are not in strict alphabetical order.<br/>'], 'resp_agency_title': ['Department of Transport (known as Ministry of Transport 1951 to 1992)'], 'resp_agency_title_facet': ['Department of Transport (known as Ministry of Transport 1951 to 1992)'], 'resp_agency_id': ['673'], 'format': 'Physical', 'rights_status': ['Open'], 'location': ['North Melbourne'], 'contents.date_range': ['[1839 TO 1900]'], 'contents.start_dt': [1839], 'contents.end_dt': [1900], 'series_in_custody.date_range': ['1900'], 'series_in_custody.start_dt': [1900], 'series_in_custody.end_dt': [1900], 'responsible_agents.resp_agency_id': [673], 'responsible_agents.title': ['Department of Transport (known as Ministry of Transport 1951 to 1992)'], 'responsible_agents.date_ranges': ['[1983 TO 1996]'], 'responsible_agents.start_dt': [1983], 'responsible_agents.end_dt': [1996], 'creating_agents.creating_agency_id': [606], 'creating_agents.title': ['Department of Trade and Customs'], 'creating_agents.date_ranges': ['1900'], 'creating_agents.start_dt': [1900], 'creating_agents.end_dt': [1900], 'status': 'Published', '_version_': 1816062577834196992}
There are a number of identifiers associated with items, so it's really a matter of what information you have about the item that you can use to construct a search. The hash identifier in the _id
field seems to be used in documenting relationships between items, and between images and items, so you might want to use it to look for further information.
params = {"q": '_id:"B7BE47C9-5613-11EB-BE8C-6757FF78D049" AND category:Item'}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=_id%3A%22B7BE47C9-5613-11EB-BE8C-6757FF78D049%22+AND+category%3AItem There are 1 results.
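Because the identifier.PROV_ACM.id search needs a space between the prefix and the number, but other fields give you the identifier without one, it can help to normalise the identifier before searching. This is only a sketch: acm_id_query is a hypothetical name, and it assumes identifiers are a run of letters followed by digits.

```python
import re


def acm_id_query(identifier, category):
    """Build a query for a function, agency, or series by its identifier."""
    match = re.fullmatch(r"([A-Za-z]+)\s*(\d+)", identifier.strip())
    if match is None:
        raise ValueError(f"Unrecognised identifier: {identifier!r}")
    prefix, number = match.groups()
    # Insert the space the search expects and enclose the value in quotes
    return (
        f'identifier.PROV_ACM.id:"{prefix.upper()} {number}" '
        f"AND category:{category}"
    )


print(acm_id_query("VPRS13", "Series"))
```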
Find related entities¶
The richness of the PROV Archival Control Model lies in the network of relationships between entities. These relationships are documented in fields within individual records, and in separate relatedEntity
records.
Items¶
Items are part of series. The identifier of an item's parent series can be found in the following fields:
- series_id (numeric value only, eg: 7591)
- is_part_of_series.id (no space between prefix and number, eg: VPRS7591)
- parents.ids (no space between prefix and number, eg: VPRS7591)
Items can also be part of another item. The identifier of a parent item can be found in the following field:
- parents.ids (hash identifier)
Items are part of series and series are created and controlled by agencies. Item records include agency identifiers in the following fields:
- resp_agency_id (numeric value only, eg: 2620)
- agencies.ids (no space between prefix and number, eg: VA2620)
Images¶
Images are part of items. The identifier of an image's parent item can be found in the is_part_of_item.PID
field.
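Since parents.ids can contain both series identifiers and item hash identifiers, you might need to separate them before looking up the parents. Here's a minimal sketch using values from the ostrich record displayed earlier. The name split_parent_ids is hypothetical, and the pattern used to recognise hash identifiers is an assumption based on the examples in this notebook.

```python
import re

# Hash identifiers look like B7BE47C9-5613-11EB-BE8C-6757FF78D049
HASH_ID = re.compile(r"[0-9A-F]{8}(-[0-9A-F]{4}){3}-[0-9A-F]{12}", re.IGNORECASE)


def split_parent_ids(record):
    """Separate a record's parents.ids into series ids and item hash ids."""
    parent_ids = record.get("parents.ids", [])
    series_ids = [pid for pid in parent_ids if pid.startswith("VPRS")]
    item_ids = [pid for pid in parent_ids if HASH_ID.fullmatch(pid)]
    return series_ids, item_ids


# A cut-down version of the ostrich record
record = {
    "title": "D501 Head close-up of ostrich",
    "parents.ids": ["VPRS14517", "D6BD4E47-F7E6-11E9-AE98-87DB74D2147D"],
}
series_ids, item_ids = split_parent_ids(record)
print(series_ids)
print(item_ids)
```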
Series¶
Series are created and controlled by agencies. Series records include agency identifiers in the following fields:
- resp_agency_id (numeric value only, eg: 2620)
- responsible_agents.resp_agency_id (numeric value only, eg: 2620)
- creating_agents.creating_agency_id (numeric value only, eg: 2620)
As well as agencies, series can also be related to other series. These relationships are recorded in relatedEntity
records, for example:
Creating agency
Responsible agency
Controlled series
Controlling series
Previous series
Subsequent series
Agencies¶
Agencies create and control series, perform functions, and are related to other agencies. These relationships are recorded in relatedEntity records, for example:
- Created series
- Responsible series
- Primary responsible function
- Secondary responsible function
- Subordinate agency
- Superior agency
- Subsequent agency
- Previous agency
Functions¶
Functions are performed by agencies, and are related to other functions. These relationships are recorded in relatedEntity records, for example:
- Primary responsible agency
- Secondary responsible agency
- Related function
- Broader function
- Narrower function
Where there are related identifiers in fields such as series_id, you can use these identifiers to construct a search that will return the related entity, as described in the previous section.
The relatedEntity records are richer and more complex. As well as linking identifiers, they provide some extra context around the relationship, such as the date range when it was active. The two linked identifiers are in the entity_id and related_entity_id fields. Which identifier goes where depends on the direction of the relationship. For example, to find the primary functions of the 'Superintendent, Port Phillip District' (VA 473), you'd search for entity_id:VA473 and relationship:"Primary responsible function".
params = {
    "q": 'category:relatedEntity AND entity_id:VA473 AND relationship:"Primary responsible function"',
    "rows": 100,
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} functions.\n")
for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=category%3ArelatedEntity+AND+entity_id%3AVA473+AND+relationship%3A%22Primary+responsible+function%22&rows=100

There are 21 functions.

Crown lands (public)
Crown lands (government)
Armed forces command
Education
Goldfields administration and mining
Library, State
Botanic gardens
Crown solicitor's services
Census and statistics
Finance
General superintendence
Police
Ports and harbours
Immigration (nineteenth century)
Postal services
Buildings, government (design and construction)
Roads and bridges
Health, public
Customs
Aboriginal affairs
Prisons and youth training centres
Here's the first of these records.
data["response"]["docs"][0]
{'category': 'relatedEntity', 'status': 'Published', '_id': 'VA473:VF309:2690:primaryresponsibilityfor', 'timestamp': 1614239338, 'entity_id': 'VA473', 'related_entity_id': 'VF309', 'sort_id': 309, 'title': 'Crown lands (public)', 'relationship': 'Primary responsible function', 'relationship_date_range': ['[1839 TO 1851]'], 'relationship_start_dt': 1839, 'relationship_end_dt': 1851, '_version_': 1816039025116446725}
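To make the structure of a relatedEntity record a bit clearer, here's a rough sketch that condenses one into a single readable line. It uses a trimmed copy of the record shown above. The function name `describe_relationship` is hypothetical, and the code assumes that the date fields might be missing from some records, so it falls back to '?' when they are.

```python
# A trimmed copy of the relatedEntity record shown above.
record = {
    "entity_id": "VA473",
    "related_entity_id": "VF309",
    "title": "Crown lands (public)",
    "relationship": "Primary responsible function",
    "relationship_start_dt": 1839,
    "relationship_end_dt": 1851,
}


def describe_relationship(rec):
    """Summarise a relatedEntity record as: source | relationship | target | dates."""
    # Assumption: date fields may be absent, so use '?' as a placeholder.
    start = rec.get("relationship_start_dt", "?")
    end = rec.get("relationship_end_dt", "?")
    return (
        f"{rec['entity_id']} | {rec['relationship']} | "
        f"{rec['related_entity_id']} ({rec['title']}) | {start}-{end}"
    )


print(describe_relationship(record))
```

Note that title describes the entity in related_entity_id (here the function VF 309), not the entity you searched for.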
To go in the other direction and find all the agencies with primary responsibility for the function 'Crown lands (public)' (VF 309), you'd search for entity_id:VF309 and relationship:"Primary responsible agency".
params = {
    "q": 'category:relatedEntity AND entity_id:VF309 AND relationship:"Primary responsible agency"',
    "rows": 100,
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} agencies.\n")
for result in data["response"]["docs"]:
    print(result["title"])
https://api.prov.vic.gov.au/search/query?q=category%3ArelatedEntity+AND+entity_id%3AVF309+AND+relationship%3A%22Primary+responsible+agency%22&rows=100

There are 14 agencies.

Superintendent, Port Phillip District
Department of Conservation and Natural Resources
Department of Crown Lands and Survey, Geelong Division
Department of Conservation, Forests and Lands
Department of Environment and Primary Industries
Department of Conservation and Environment
Police Magistrate Port Phillip District
Department of Natural Resources and the Environment
Department of Sustainability and Environment
Colonial Secretary's Office
Crown Lands Department
Department of Environment, Land, Water and Planning
Department of Energy, Environment and Climate Action
Department of Crown Lands and Survey
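The two searches above differ only in the identifier and the relationship label, so you might want to wrap the query construction in a small function. This is just a sketch: `related_entity_params` is a hypothetical name, and the code assumes the relationship label contains no double quotes. It uses `urlencode` from the standard library to preview the request url without contacting the API.

```python
from urllib.parse import urlencode

API_URL = "https://api.prov.vic.gov.au/search/query"


def related_entity_params(entity_id, relationship, rows=100):
    """Build request parameters for a relatedEntity search.

    entity_id is the prefixed identifier you're starting from (eg: 'VF309'),
    and relationship is the label describing what you want to find
    (eg: 'Primary responsible agency').
    """
    query = (
        f"category:relatedEntity AND entity_id:{entity_id} "
        f'AND relationship:"{relationship}"'
    )
    return {"q": query, "rows": rows}


params = related_entity_params("VF309", "Primary responsible agency")
# Preview the url that would be requested (no request is made here).
print(f"{API_URL}?{urlencode(params)}")
```

You can pass the returned dictionary to `requests.get(api_url, params=params)` in the same way as the cells above.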
Filter by related entities¶
As described above, the API records of items, images, and series contain links to related entities. You can use these relationships to filter your searches. For example, if you want to limit your search for items to those in series VPRS 460, you'd add series_id:460 to your query.
params = {"q": "category:Item AND series_id:460"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AItem+AND+series_id%3A460 There are 74,409 results.
You could also use the is_part_of_series.id field, but the value needs to include the letter prefix: VPRS460.
params = {"q": "category:Item AND is_part_of_series.id:VPRS460"}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()
print(f"\nThere are {data['response']['numFound']:,} results.\n")
https://api.prov.vic.gov.au/search/query?q=category%3AItem+AND+is_part_of_series.id%3AVPRS460 There are 74,409 results.
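If you're building a lot of these filtered queries, a helper function might save some typing. The sketch below reproduces the two series queries above. The function name `item_query` is hypothetical. The agency filters (resp_agency_id and agencies.ids) aren't demonstrated in this notebook, so treat them as an untested assumption that they behave like the series fields.

```python
def item_query(series=None, agency=None, prefixed=False):
    """Build a query for items, optionally limited by series and/or agency number.

    If prefixed is True, use the fields that expect a letter prefix
    (eg: VPRS460), otherwise use the numeric-only fields (eg: 460).
    """
    clauses = ["category:Item"]
    if series is not None:
        if prefixed:
            clauses.append(f"is_part_of_series.id:VPRS{int(series)}")
        else:
            clauses.append(f"series_id:{int(series)}")
    if agency is not None:
        # Assumption: agency fields can be filtered in the same way as series fields.
        if prefixed:
            clauses.append(f"agencies.ids:VA{int(agency)}")
        else:
            clauses.append(f"resp_agency_id:{int(agency)}")
    return " AND ".join(clauses)


print(item_query(series=460))
print(item_query(series=460, prefixed=True))
```

The resulting string can be used as the value of the q parameter.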
Created by Tim Sherratt for the GLAM Workbench.