Metadata as media

Written by Jack Brighton on Sunday, October 17, 2010

(As always, I speak only for myself as a media producer and archivist.)

I attended the panel discussion on PBCore at the recent Open Video Conference, and was struck by something that should have been obvious. Those of us pushing the development of PBCore have failed to clarify one basic thing: what is PBCore for? I’ve been to workshops and sessions on PBCore over the past six years, and have served on metadata panels at the AMIA Conference, the PBS Tech Conference, NETA, and iMA. We often focused on explaining the PBCore elements and why they are useful for cataloging media assets. But at the OVC, the question was raised: “Why do we need PBCore to catalog our stuff if we already have a good media database?” The question reveals a conflation of two distinct things: having a media database, and being able to easily interoperate with other databases.

Most of us producers are not, after all, experts in database administration, XML, or programming in general. During the American Archive Pilot Project I talked with people at other public TV and radio stations trying to “use” PBCore without adequate tools, and without understanding why they had to use it in the first place. If the answer is “you need a data model for cataloging your media assets,” there are plenty of other cataloging standards and data models to choose from. A better answer is that you use PBCore to create shareable metadata. If you have a media collection and you want to combine it with other collections, PBCore provides a machine-readable translation layer between systems.

Some people have asked: why use PBCore instead of something simpler like RSS or Atom? I think that’s a really good question.

You can stuff lots of descriptive metadata into RSS or Atom. Their schemas are understood by a wide range of applications, and they are simple to implement. At the other end of the spectrum, MPEG-7 provides an exhaustive schema for describing multimedia content…emphasis on the word “exhaustive.”

PBCore sits somewhere between the simplicity of RSS and the verbose complexity of MPEG-7. It provides a level of detail useful to media archives, without being ridiculous to implement. It’s sort of a “just right” format, allowing a simple producer like me to share a great deal of useful metadata about my media assets with any system that can parse PBCore XML.
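To make that concrete, here is a simplified, hypothetical sketch of what a PBCore-style record might look like. The element names follow the general shape of the PBCore 1.x schema, but the values are invented and the record is trimmed for illustration; it is not a complete, schema-valid document:

```xml
<!-- Hypothetical, simplified PBCore-style record (not schema-valid) -->
<pbcoreDescriptionDocument>
  <pbcoreIdentifier>
    <identifier>demo-2010-001</identifier>
    <identifierSource>Example Station</identifierSource>
  </pbcoreIdentifier>
  <pbcoreTitle>
    <title>A Sample Program</title>
    <titleType>Program</titleType>
  </pbcoreTitle>
  <pbcoreDescription>
    <description>Placeholder description of the program content.</description>
    <descriptionType>Abstract</descriptionType>
  </pbcoreDescription>
  <pbcoreInstantiation>
    <formatDigital>video/mp4</formatDigital>
    <formatDuration>00:28:46</formatDuration>
  </pbcoreInstantiation>
</pbcoreDescriptionDocument>
```

Even at this trimmed-down scale, the record carries structured detail (identifier source, title type, description type, an instantiation with its own format fields) that a generic RSS item has no obvious place for.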

So in the example of the American Archive Pilot Project, my station used a MySQL-based content management system (CMS) to catalog several hundred media assets. (See my earlier post for details.) With the CMS I could render web pages and an RSS feed, plus a PBCore record for each asset. The AAPP portal could then ingest my PBCore records into the national AAPP database, getting much more detail on each asset than RSS would provide.
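The export step involved is not complicated. Here is a minimal Python sketch of the idea: take one catalog row and render it as PBCore-style XML. The field names, the element subset, and the sample row are all assumptions for illustration, not our actual CMS schema or a complete, schema-valid record:

```python
# Minimal sketch of a CMS-to-PBCore export step.
# The row fields and the element subset are illustrative assumptions,
# not an actual CMS schema or a complete PBCore record.
import xml.etree.ElementTree as ET

def row_to_pbcore(row):
    """Render one catalog row (a dict) as a PBCore-style XML string."""
    doc = ET.Element("pbcoreDescriptionDocument")

    ident = ET.SubElement(doc, "pbcoreIdentifier")
    ET.SubElement(ident, "identifier").text = row["asset_id"]
    ET.SubElement(ident, "identifierSource").text = row["station"]

    title = ET.SubElement(doc, "pbcoreTitle")
    ET.SubElement(title, "title").text = row["title"]

    desc = ET.SubElement(doc, "pbcoreDescription")
    ET.SubElement(desc, "description").text = row["description"]

    return ET.tostring(doc, encoding="unicode")

record = row_to_pbcore({
    "asset_id": "demo-001",
    "station": "Example Station",
    "title": "A Sample Program",
    "description": "Placeholder description for illustration.",
})
print(record)
```

In a real system the row would come from the CMS database and the template would cover the full element set, but the shape of the job is the same: one query, one XML serialization per asset.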

Audio and video are media that connect human beings. RSS and PBCore are media that connect machines.

You can have a fantastic media database not designed around the PBCore standard. You can create a PBCore representation of that database by exporting XML records based on the PBCore schema. You can create other representations of your data based on RSS, JSON, and other formats. An example of this is the NPR API query generator, which provides multiple output format options including RSS, Atom, HTML, and NPR’s proprietary (and wonderfully detailed and useful) NPRML.
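The same point can be sketched in a few lines: one in-house record, several serializations. Everything here (the field names, the trimmed RSS item) is an illustration, not any station's actual export code:

```python
# One in-house record, several output representations.
# The record fields and both serializations are simplified illustrations.
import json
import xml.etree.ElementTree as ET

asset = {
    "title": "A Sample Program",
    "link": "https://example.org/ep/1",
    "description": "Placeholder description.",
}

# JSON representation, e.g. for a web API
as_json = json.dumps(asset)

# Minimal RSS <item> representation
item = ET.Element("item")
for field in ("title", "link", "description"):
    ET.SubElement(item, field).text = asset[field]
as_rss_item = ET.tostring(item, encoding="unicode")

print(as_json)
print(as_rss_item)
```

A PBCore serialization would be a third function over the same record; the database does not have to be "a PBCore database" for that to work.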

Given the flexibility of today’s tools, we can generate multiple representations of our media database for different purposes. So what’s the use case for PBCore? With RSS or Atom, we know that many other systems can ingest our data. What systems can ingest PBCore?

A growing number of the systems that speak PBCore have the word “archive” in their names. Importantly, I hear the American Archive will adopt PBCore as a primary means of ingesting metadata from contributor collections. This would allow each contributor to use whatever database best suits their local needs, as long as each local system can produce shareable metadata in the PBCore format.

PBCore won’t be used by media consumers and consumer-level applications like iTunes. PBCore will be used by media archives and the systems that contribute to them. That’s what PBCore is for.


  • John Tynan said on 10/18 at 10:39 AM

    Well-written article, Jack. I have one thought… does the American Archive pilot project or WILL’s Prairiefire have a search API so that people could build modules to access and parse PBCore records, similar to KCRW’s NPRAPI module?

  • Rob Vincent said on 10/18 at 11:38 AM

    Thanks for this clarification of the purpose of PBCore, Jack.

    Working on the AAPP, I quickly developed an appreciation for the complexity of archiving media.  Even if we could freeze the technology of media production and recording, archiving our expanding media libraries is a herculean task.

    PBCore is a very thorough metadata standard, but it couldn’t possibly address every function of a complete and efficient archive.  Stations are eager for such a standard, but I don’t think it’s realistic to offer a single standard. It would become too large and complex.

    It may be helpful to separate out the functions of media production, archiving, and transference, and define a standard for each logically distinct function. PBCore would be the standard for the transference of metadata, the language archives use to communicate with other archives. But maybe that tilts the scale toward too many standards or protocols.

    It’s an exciting time to be in media, and I’m glad there are people much more clever than me working on these problems.

    Thanks for the post.

  • Karan Sheldon said on 10/18 at 11:46 AM

    Jack, you’ve expressed this very well. From the point of view of under-resourced archives like Northeast Historic Film, PBCore offers two crucial characteristics: 1) a common-sense structure, and 2) clarity in allowing persistent intellectual description to be accompanied by metadata describing future representations.

    As you say, “It’s sort of a ‘just right’ format, allowing simple producers like me to share a great deal of useful metadata about my media assets with any system that can parse PBCore XML.”

  • Mary said on 10/18 at 12:27 PM

    Jack, thanks for posting this.

    It’s interesting to think of PBCore being used as an archival database language, because in its current incarnation I find it lacks some elements that are essential for our archival purposes. We’ve added them to the instantiation element set we use, and I’m hoping PBCore will address them in version 2.0.

    (Not to redirect the conversation, but our archive needs to be able to say not just who created the content but who created the instantiation, since it may have been made in-house or at any of a wide number of labs we contract with. That’s just one example of what I’m talking about.)

    Our in-house data isn’t fully PBCore-compliant, but we are using it for export. PBCore can also be a great standard of comparison for those using their own unique, in-house databases. If you aren’t recording the kinds of information asked for in the PBCore element set, maybe you should be.

  • Jack Brighton said on 10/18 at 01:09 PM

    A couple of notes addressing the above comments:

    John, I think building an API is probably the best thing we could do to allow open access to online media. We still need web sites, but it seems obvious that web services are the killer app, so to speak. I don’t speak for the American Archive and I don’t know about plans for an API there. We’re building an API to our entire media collection at Illinois Public Media, coming soon to a screen (of any type and size) near you!

    Rob Vincent is one of my heroes, and if even he struggled with metadata issues in the AAPP, we should all feel better about ourselves.

    Karan mentions a key thing that PBCore does: describing the intellectual content of a media asset, distinct from its representation in analog or digital form. It’s a different kind of FRBR model, but extremely well-suited to media archives.

    Mary, I know your voice has been heard in the discussions of PBCore 2.0, and while it won’t do everything PREMIS does, I believe it will have some method of tracking the provenance of instantiations. My guess is there will continue to be a need for PREMIS and METS (and RSS!), along with in-house databases that suit in-house needs beyond any other standard.
