Via Scott Leslie's EdTechpost, I found this 60-page report by Kenny Baird (the Jorum Metadata and Technical Support Officer) and the Jorum Team [1.5 MB PDF], published in July 2006. The report consists of:
- an analysis of what Jorum (a learning and teaching materials repository - see footnote) is currently doing;
- an initial assessment of what Jorum could do;
- an overview of some other systems, services and products which claim automated metadata generation as either a direct or indirect output of their aims (this section forms about one third of the whole report).
This extract from the Executive Summary gives you the flavour:
".... the increased application of systems and process to automate metadata will not result – in the foreseeable future at least – in the obsolescence of the human in the metadata creation process. Whereas a computer can read, say, an IMS Manifest file and record all references of technical formats much faster than a human, a skilled cataloguer is able to make judgements on the practical application of the described resource within a learning environment in ways a computer cannot. Therefore, much of the metadata which can be automatically generated relates to its technical properties, repository users will typically need more subjective metadata to enable them to asses their retrieval results. The ideal situation is the two approaches to metadata creation working in tandem, with as much automated as possible to allow cataloguers to spend greater time on creation of metadata that cannot reasonably be expected to be automated. Concise, accurate recordings of technical properties and other elements which are consistent and overly mundane to warrant repetition on creation (such as vCard details), allied with human catalogued entries will provide the discovery user with both an overview of a resources' properties and limitations, and also allow the user to make a quick judgement on the relevance of the retrieved resource for a particular educational/learning context."
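As a rough illustration of the kind of technical metadata that can be harvested automatically, here is a minimal Python sketch that reads an IMS manifest and tallies the formats of the files it references. It is not taken from the report. The sample manifest is invented and heavily simplified, the function names are hypothetical, and guessing a format from the file extension is an assumption made for brevity; a real repository would probably inspect the files themselves.

```python
import mimetypes
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified manifest, used only for illustration.
SAMPLE_MANIFEST = """
<manifest xmlns="http://www.imsglobal.org/xsd/imscp_v1p1" identifier="MANIFEST-1">
  <resources>
    <resource identifier="RES-1" type="webcontent" href="index.html">
      <file href="index.html"/>
      <file href="images/diagram.png"/>
      <file href="notes/handout.pdf"/>
    </resource>
  </resources>
</manifest>
""".strip()


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def technical_formats(manifest_xml):
    """Count the guessed MIME type of every <file href="..."> in a manifest.

    Assumption: the file extension is a good enough hint of the format.
    """
    root = ET.fromstring(manifest_xml)
    formats = Counter()
    for element in root.iter():
        href = element.get("href")
        if local_name(element.tag) == "file" and href:
            mime_type, _encoding = mimetypes.guess_type(href)
            formats[mime_type or "unknown"] += 1
    return dict(formats)


if __name__ == "__main__":
    for mime_type, count in sorted(technical_formats(SAMPLE_MANIFEST).items()):
        print(f"{mime_type}: {count}")
```

This is exactly the mundane, repeatable part of cataloguing. Nothing in the sketch can say whether the handout suits a particular class, and that judgement is what the extract reserves for the human cataloguer.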
Whilst writing this I looked back at Cory Doctorow's acidly written 2001 Metacrap: Putting the torch to seven straw-men of the meta-utopia, and chanced on this 2005 summary by Google of the way metadata is used in web pages. It is part of a longer, but still brief, December 2005 analysis by Google of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata. The summary seemed to show how much human error there is in metadata creation (across the web as a whole, as against in content curated by information professionals).
Footnote. Jorum is a JISC-funded collaborative venture in UK Higher and Further Education to collect and share learning and teaching materials, allowing their reuse and repurposing, and standing as a national statement of the importance of creating interoperable, sustainable materials.