Metadata as media
(As always, I speak only for myself as a media producer and archivist.)
I attended the panel discussion on PBCore at the recent Open Video Conference, and was struck by something that should have been obvious. Those of us pushing development of PBCore have failed to clarify one basic thing: What is PBCore for? I’ve been to workshops and sessions on PBCore over the past six years and have been on metadata panels at the AMIA Conference, the PBS Tech Conference, NETA, and iMA. We often focused on explaining the PBCore elements and why they are useful for cataloging media assets. But at the OVC, someone asked, “Why do we need PBCore to catalog our stuff if we already have a good media database?” The question conflates two distinct things: having a media database, and being able to easily interoperate with other databases.
Most of us producers are not, after all, experts in database administration, XML, or programming in general. During the American Archive Pilot Project I talked with people at other public TV and radio stations who were trying to “use” PBCore without adequate tools, and without understanding why they had to use it in the first place. If the answer is “you need a data model for cataloging your media assets,” there are many other catalog and data models. A better answer is that you use PBCore to create shareable metadata. If you have a media collection and you want to combine it with other collections, PBCore provides a machine-readable translation layer between systems.
Some people have asked why we should use PBCore instead of something simpler, like RSS or Atom. I think that’s a really good question.
You can stuff lots of descriptive metadata into RSS or Atom. Their schemas are understood by a wide range of applications, and they are simple to implement. At the other end of the spectrum, MPEG-7 provides an exhaustive schema for describing multimedia content…emphasis on the word “exhaustive.”
PBCore is somewhere between the simplicity of RSS and the verbose complexity of MPEG-7. It provides a level of detail useful to media archives, without being ridiculous to implement. It’s sort of a “just right” format, allowing simple producers like me to share a great deal of useful metadata about our media assets with any system that can parse PBCore XML.
So in the example of the American Archive Pilot Project, my station used a MySQL-based Content Management System to catalog several hundred media assets. (See my earlier post for details.) With the CMS I could render web pages and an RSS feed, plus PBCore records for each asset. The AAPP project portal could just ingest my PBCore records into the national AAPP database, getting much more detail on each asset than would be provided by RSS.
Audio and video are media that connect human beings. RSS and PBCore are media that connect machines.
You can have a fantastic media database not designed around the PBCore standard. You can create a PBCore representation of that database by exporting XML records based on the PBCore schema. You can create other representations of your data based on RSS, JSON, and other formats. An example of this is the NPR API query generator, which provides multiple output format options including RSS, Atom, HTML, and NPR’s proprietary (and wonderfully detailed and useful) NPRML.
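As a rough sketch of that idea, here is one way a single database row could be rendered as three different representations in Python. Everything here is illustrative: the `asset` record, its field names, and the example.org URL are made up, and the PBCore output uses only a small, simplified subset of elements (no namespace, no instantiation details), so it would not validate against the real PBCore schema as written.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical asset record, as it might come out of any local database.
asset = {
    "id": "station-0001",
    "title": "Town Hall Meeting",
    "description": "Recording of a public meeting.",
    "url": "http://example.org/assets/station-0001",
}


def to_pbcore(rec):
    """Render a simplified PBCore-style record (not schema-validated)."""
    doc = ET.Element("pbcoreDescriptionDocument")
    ident = ET.SubElement(doc, "pbcoreIdentifier", source="example.org")
    ident.text = rec["id"]
    ET.SubElement(doc, "pbcoreTitle").text = rec["title"]
    ET.SubElement(doc, "pbcoreDescription").text = rec["description"]
    return ET.tostring(doc, encoding="unicode")


def to_rss_item(rec):
    """Render the same record as a single RSS <item>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = rec["title"]
    ET.SubElement(item, "link").text = rec["url"]
    ET.SubElement(item, "description").text = rec["description"]
    # The local id is not a URL, so mark the guid as a non-permalink.
    ET.SubElement(item, "guid", isPermaLink="false").text = rec["id"]
    return ET.tostring(item, encoding="unicode")


def to_json(rec):
    """Render the same record as JSON."""
    return json.dumps(rec, sort_keys=True)


if __name__ == "__main__":
    print(to_pbcore(asset))
    print(to_rss_item(asset))
    print(to_json(asset))
```

The point of the sketch is that the database itself never changes. Each output format is just another view of the same record, chosen for whoever is on the receiving end.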
Given the flexibility of today’s tools, we can generate multiple representations of our media database for different purposes. So what’s the use-case scenario with PBCore? With RSS or Atom, we know that many other systems can ingest our data. What systems can ingest PBCore?
A growing number of systems that speak PBCore have the word “archive” in their names. Importantly, I hear the American Archive would adopt PBCore as a primary means for ingesting metadata from contributor collections. This would allow each contributor to use whatever database best suits their local needs, as long as each local system can create shareable metadata in the PBCore format.
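To make the ingest side concrete, here is a minimal Python sketch of how a central archive might read records from two contributors whose local systems have nothing in common except their PBCore-style export. This is not how the American Archive actually works. The `ingest` function, the sample records, and the flat dictionary it produces are all hypothetical, and the XML is simplified (real PBCore documents are namespaced and carry much more detail).

```python
import xml.etree.ElementTree as ET

# Hypothetical exports from two contributors running different local systems.
records = [
    "<pbcoreDescriptionDocument>"
    "<pbcoreIdentifier source='station-a'>a-17</pbcoreIdentifier>"
    "<pbcoreTitle>Evening News</pbcoreTitle>"
    "<pbcoreDescription>Local newscast.</pbcoreDescription>"
    "</pbcoreDescriptionDocument>",
    "<pbcoreDescriptionDocument>"
    "<pbcoreIdentifier source='station-b'>b-204</pbcoreIdentifier>"
    "<pbcoreTitle>Jazz Hour</pbcoreTitle>"
    "<pbcoreDescription>Weekly radio program.</pbcoreDescription>"
    "</pbcoreDescriptionDocument>",
]


def ingest(pbcore_xml):
    """Parse one simplified PBCore-style record into a flat dictionary."""
    doc = ET.fromstring(pbcore_xml)
    ident = doc.find("pbcoreIdentifier")
    return {
        "source": ident.get("source"),
        "identifier": ident.text,
        "title": doc.findtext("pbcoreTitle"),
        "description": doc.findtext("pbcoreDescription"),
    }


# The combined catalog does not care what database each contributor runs.
catalog = [ingest(r) for r in records]

if __name__ == "__main__":
    for entry in catalog:
        print(entry["source"], entry["identifier"], entry["title"])
```

The archive only has to understand one format, and each contributor only has to produce it.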
PBCore won’t be used by media consumers and consumer-level applications like iTunes. PBCore will be used by media archives and the systems that contribute to them. That’s what PBCore is for.