Access MARC data describing an item in the SLV catalogue

If you have an item's Alma identifier, you can retrieve structured metadata describing the item from the SLV catalogue in a couple of ways. One approach is to download a text representation of the item's MARC record and extract data from it. This notebook provides some examples of how you can do this.

You can find an item's Alma identifier by looking for 'Record ID' in the 'Details' section of the catalogue entry.

In [2]:
import re

import requests

To get the text representation of an item's MARC record, request a URL of the form:

https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma[ALMA ID]&vid=61SLV_INST:SLV

inserting the Alma ID where indicated.

In [43]:
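def get_marc_record(alma_id):
    """
    Gets a text representation of an item's MARC record.
    """
    response = requests.get(
        f"https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma{alma_id}&vid=61SLV_INST:SLV",
        timeout=30,
    )
    # Raise an error on a failed request rather than returning an error page as if it were a record
    response.raise_for_status()
    return response.text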
In [44]:
marc = get_marc_record("9921188273607636")
print(marc)
leader	01506cem a2200373 a 4500
001	9921188273607636
005	20240528080520.0
007	aj aanzn
008	101005s1968    vra       a  s  0   eng d
034	1#$aa $b15840 $dE1425000 $eE1425000 $fS0380000 $gS0380000 
035	##$a(AuCNLKIN)000027964179 
035	##$a(OCoLC)221962153 
035	##$a2118827 
035	##$a(Voyager)2118827-slvdb-Voyager 
035	##$aIE7027444 
040	##$aVSL $beng $cVSL $dVSL $dVSL $dVSL $dVSL $dVSL $dVSL $dVSL $dVSL 
042	##$aanuc 
043	##$au-at-vi 
110	1#$aVictoria. $bDepartment of Crown Lands and Survey. 
245	10$aToorak, County of Hampden $h[cartographic material] / $cdrawn and reproduced at the Department of Lands and Survey, Melbourne. 
255	##$aScale [ca. 1: 15 840] $c(E 142°50'/S 38°00'). 
260	##$aMelbourne : $bDept. of Lands and Survey, $c1968. 
300	##$a1 map ; $con sheet 76 x 102 cm. 
500	##$aCadastral map showing parish boundaries and land ownership. 
540	##$aNo copyright restrictions apply. 
542	##$lThis work is out of copyright 
650	#0$aReal property $zVictoria $zToorak (Parish) $vMaps. 
651	#0$aToorak (Vic. : Parish) $vMaps. 
830	#0$aParish maps of Victoria. 
950	##$aMaps $bStillImage $cimage/tiff $d1 $f1968 $o1 map ; on sheet 76 x 102 cm. $qToorak, County of Hampden 
956	##$a10381/139039 $bONE $c1415415 $eIE7027444 $f9921188273607636 $gDigitised $hslvdb $iAVAILABLE $jSIP3535 $kdq005511 
984	##$aVSL $cheld 
997	##$7CRSU 
999	##$9OC 

As you can see above, each line of the MARC record includes a tag (eg 245) and a series of values, separated from the tag by a tab character. The values are divided into subfields whose labels begin with a $ sign (eg $a). For example, to find the title of the item you'd look in tag 245 for subfield $a. The two characters at the start of each set of values are indicators that provide additional information about the field. Control fields, such as 001 and 008, have neither indicators nor subfields.
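
To make the line structure concrete, here is a minimal sketch that splits a single line into its tag, indicators, and subfields. The function name `parse_marc_line` and the shape of the dict it returns are assumptions made for illustration. It also assumes, as in the record above, that any line without a $ sign is a control field.

```python
def parse_marc_line(line):
    """
    Splits one line of the MARC text into its tag, indicators, and subfields.
    Hypothetical helper for illustration only.
    """
    # The tag is separated from the values by the first tab character
    tag, values = line.split("\t", 1)
    # Assumption: a line with no $ signs is a control field with a single value
    if "$" not in values:
        return {"tag": tag, "indicators": None, "value": values.strip(), "subfields": []}
    subfields = []
    # Skip the two indicator characters, then split the rest on $ signs
    for chunk in values[2:].split("$"):
        if chunk.strip():
            # The first character is the subfield label, the rest is its value
            subfields.append((f"${chunk[0]}", chunk[1:].strip()))
    return {"tag": tag, "indicators": values[:2], "value": None, "subfields": subfields}


parsed = parse_marc_line("260\t##$aMelbourne : $bDept. of Lands and Survey, $c1968. ")
print(parsed["tag"], parsed["indicators"], parsed["subfields"])
```

Subfields are kept as a list of pairs rather than a dict so that repeated labels within a line are not lost.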

There are specialised MARC tools available for parsing and manipulating records, but they might be a bit complex for your needs. You can find tag/subfield values just by using regular expressions to extract them from the MARC text.

In [46]:
def get_marc_value(marc, tag, subfield=None):
    """
    Gets the value of a tag/subfield from a text version of an item's MARC record using regular expressions.
    If the tag or subfield is repeated, only the first occurrence is returned.
    """
    try:
        # Get the first line that starts with the specified tag
        line = re.search(rf"^{tag}\t.+", marc, re.M).group(0)
        if subfield:
            # If a subfield has been requested, get the subfield value
            value = re.search(rf"\${subfield.lstrip('$')}([^\$]+)", line).group(1)
        else:
            # If no subfield has been requested, return everything after the tab
            value = line.split("\t", 1)[1]
    except AttributeError:
        # re.search returned None, so the tag or subfield isn't in the record
        return None
    return value.strip(" .,")
In [47]:
get_marc_value(marc, "245", "$a")
Out[47]:
'Toorak, County of Hampden'
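
Some tags, such as 035 in the record above, appear on several lines, and `get_marc_value` only looks at the first match. The sketch below shows one way to collect every value using `re.findall`. The name `get_marc_values` is hypothetical, and the `sample` string is a few lines copied from the record above so that the example runs on its own.

```python
import re


def get_marc_values(marc, tag, subfield):
    """
    Gets the values of a subfield from every line that starts with the given tag.
    Hypothetical variant of get_marc_value, for illustration only.
    """
    label = subfield.lstrip("$")
    values = []
    # With re.M, ^ matches at the start of every line, so all lines for the tag are found
    for line in re.findall(rf"^{re.escape(tag)}\t.+", marc, re.M):
        # A subfield can also be repeated within a line, so collect every occurrence
        for value in re.findall(rf"\${re.escape(label)}([^$]+)", line):
            values.append(value.strip(" .,"))
    return values


# A few lines copied from the record above
sample = (
    "035\t##$a(OCoLC)221962153 \n"
    "035\t##$a2118827 \n"
    "650\t#0$aReal property $zVictoria $zToorak (Parish) $vMaps. "
)
print(get_marc_values(sample, "035", "$a"))
print(get_marc_values(sample, "650", "$z"))
```

An empty list is returned when the tag or subfield is not present.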

An alternative approach is to convert the whole MARC record into a Python dictionary by splitting the lines on the tab characters and dollar signs. You can then access the tags and subfields from the dict.

In [41]:
def convert_marc_to_dict(marc):
    """
    Converts the MARC text record into a dict, organised by tag and $ subfields.
    Indicators are ignored. Where a tag or subfield is repeated, only the last
    occurrence is kept.
    """
    marc_dict = {}
    # Loop through each line by splitting the text on newline characters
    for line in marc.split("\n"):
        if line:
            # Split tag from values on tab characters
            tag, values = line.split("\t", 1)
            # If there are no subfields (no $ signs in the values) add the tag and value to the dict
            if "$" not in values:
                marc_dict[tag] = values.strip()
            # If there are subfields we'll process each one and add to the dict
            else:
                marc_dict[tag] = {}
                # Strip the two indicator characters from the front of the values and split on $ sign
                # Loop through all the subfields
                for subfield in values[2:].split("$"):
                    if subfield:
                        # Get the subfield label from the front of the string
                        # Add the label and value to the dict
                        marc_dict[tag][f"${subfield[0]}"] = subfield[1:].strip()
    return marc_dict
In [48]:
marc_dict = convert_marc_to_dict(marc)
marc_dict
Out[48]:
{'leader': '01506cem a2200373 a 4500',
 '001': '9921188273607636',
 '005': '20240528080520.0',
 '007': 'aj aanzn',
 '008': '101005s1968    vra       a  s  0   eng d',
 '034': {'$a': 'a',
  '$b': '15840',
  '$d': 'E1425000',
  '$e': 'E1425000',
  '$f': 'S0380000',
  '$g': 'S0380000'},
 '035': {'$a': 'IE7027444'},
 '040': {'$a': 'VSL', '$b': 'eng', '$c': 'VSL', '$d': 'VSL'},
 '042': {'$a': 'anuc'},
 '043': {'$a': 'u-at-vi'},
 '110': {'$a': 'Victoria.', '$b': 'Department of Crown Lands and Survey.'},
 '245': {'$a': 'Toorak, County of Hampden',
  '$h': '[cartographic material] /',
  '$c': 'drawn and reproduced at the Department of Lands and Survey, Melbourne.'},
 '255': {'$a': 'Scale [ca. 1: 15 840]', '$c': "(E 142°50'/S 38°00')."},
 '260': {'$a': 'Melbourne :',
  '$b': 'Dept. of Lands and Survey,',
  '$c': '1968.'},
 '300': {'$a': '1 map ;', '$c': 'on sheet 76 x 102 cm.'},
 '500': {'$a': 'Cadastral map showing parish boundaries and land ownership.'},
 '540': {'$a': 'No copyright restrictions apply.'},
 '542': {'$l': 'This work is out of copyright'},
 '650': {'$a': 'Real property', '$z': 'Toorak (Parish)', '$v': 'Maps.'},
 '651': {'$a': 'Toorak (Vic. : Parish)', '$v': 'Maps.'},
 '830': {'$a': 'Parish maps of Victoria.'},
 '950': {'$a': 'Maps',
  '$b': 'StillImage',
  '$c': 'image/tiff',
  '$d': '1',
  '$f': '1968',
  '$o': '1 map ; on sheet 76 x 102 cm.',
  '$q': 'Toorak, County of Hampden'},
 '956': {'$a': '10381/139039',
  '$b': 'ONE',
  '$c': '1415415',
  '$e': 'IE7027444',
  '$f': '9921188273607636',
  '$g': 'Digitised',
  '$h': 'slvdb',
  '$i': 'AVAILABLE',
  '$j': 'SIP3535',
  '$k': 'dq005511'},
 '984': {'$a': 'VSL', '$c': 'held'},
 '997': {'$7': 'CRSU'},
 '999': {'$9': 'OC'}}
In [49]:
marc_dict["245"]["$a"]
Out[49]:
'Toorak, County of Hampden'
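
In the dict above, only the last of the five 035 lines survives, and repeated subfields such as `$z` in tag 650 are reduced to a single value. If you need every value, one option is to store lists instead. This is a minimal sketch of that idea: the name `convert_marc_to_lists` and the nested list structure are assumptions, and the `sample` string is copied from the record above so the example is self-contained.

```python
from collections import defaultdict


def convert_marc_to_lists(marc):
    """
    Converts the MARC text record into a dict in which each tag maps to a list
    of fields, and each field maps subfield labels to lists of values.
    Hypothetical variant of convert_marc_to_dict, for illustration only.
    """
    marc_lists = defaultdict(list)
    for line in marc.splitlines():
        # Skip blank lines and anything without a tab separating tag from values
        if "\t" not in line:
            continue
        tag, values = line.split("\t", 1)
        # Control fields have no subfields, so store the value as a plain string
        if "$" not in values:
            marc_lists[tag].append(values.strip())
            continue
        field = defaultdict(list)
        # Skip the two indicator characters and split on $ signs
        for subfield in values[2:].split("$"):
            if subfield.strip():
                field[f"${subfield[0]}"].append(subfield[1:].strip())
        marc_lists[tag].append(dict(field))
    return dict(marc_lists)


# A few lines copied from the record above
sample = (
    "001\t9921188273607636\n"
    "035\t##$a(OCoLC)221962153 \n"
    "035\t##$a2118827 \n"
    "650\t#0$aReal property $zVictoria $zToorak (Parish) $vMaps. "
)
marc_lists = convert_marc_to_lists(sample)
print(marc_lists["035"])
print(marc_lists["650"][0]["$z"])
```

The trade-off is that every lookup needs an index, for example `marc_lists["650"][0]["$a"][0]`, even when a tag or subfield occurs only once.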
In [52]:
# IGNORE TESTING ONLY
get_marc_value(marc, "245", "$a") == "Toorak, County of Hampden"
marc_dict["245"]["$a"] == "Toorak, County of Hampden"
Out[52]:
True

Created by Tim Sherratt for the GLAM Workbench. If you find this useful, you can sponsor me on GitHub.
