Sneak preview of PBCore 2.0

Written by Jack Brighton on Wednesday, January 12, 2011

If I’ve learned one thing about the PBCore user community, it’s that we’re not satisfied with the current state of PBCore. We’ve used it enough to discover its strengths in describing AV assets and creating shareable metadata, but we keep running into its gaps and flaws. We’ve been pushing for a change process, and have argued for specific changes. Common threads have emerged right here on this site:

  • A need for PBCore to support multi-part instantiations, for example when one complete work is made up of several reels, tapes, or files.
  • A need to express rights information for a specific Instantiation, rather than only for the entire asset. For example, you might want to let users download an MPEG-4 version of a film for personal use without granting the same kind of access to the film itself!
  • Speaking of rights, the format of the pbcoreRightsSummary element disallows including metadata from existing rights standards such as ODRL or Creative Commons, which seems odd to say the least. If you already have structured rights data, why not simply reuse it?
  • A need to show relationships between Instantiations. For example, when you digitize a film to 10-bit uncompressed digital video and then encode an MPEG-4 file from that 10-bit uncompressed file, it seems important to show that relationship in the PBCore record.
  • With pbcoreContributor, you can say that Harrison Ford is an Actor, but you can’t say what role he plays in the film.
  • There’s no way to uniquely identify a person, subject term, location, or other value that might have an actual URI.
  • The lack of attributes of any kind! Everything is elements and sub-elements, which seems inefficient and makes parsing more difficult.
  • The lack of a valid way to identify clip information within an asset, for example where in the timeline a particular subject is discussed or a specific person appears.
  • The lack of any way to bundle multiple PBCore XML records together in a feed or collection, so you could export/import large groups of records between systems or use PBCore in RESTful web applications.

Well, good news, folks! PBCore 2.0 is on the way, and it solves all these issues.

Even better, it solves them in a way that doesn’t add complexity for those who want to keep PBCore simple. For example, PBCore 2.0 lets you add attributes to a subject term that specify a URI for it, plus a startTime and endTime marking where that term applies in the media asset’s timeline. So you could have something like this:

     <pbcoreSubject ref="" startTime="00:23:14" endTime="00:24:22">Hobbits</pbcoreSubject>

You can also do this:

     <contributor affiliation="NPR" ref="">Michele Norris</contributor>

Or this:

     <contributor ref="">Sean Connery</contributor>
     <contributorRole portrayal="James Bond">Actor</contributorRole>

But the use of the new 2.0 attributes is totally optional. You can keep it simple and use PBCore the same way as before.
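Attributes also ease the parsing complaint from the list above: reading one is a single lookup, and an attribute you chose not to use simply comes back empty. Here is a minimal sketch using Python’s standard xml.etree.ElementTree. It assumes a trimmed, namespace-free fragment built from the snippets in this post; the pbcoreDescriptionDocument and pbcoreContributor wrappers reflect our reading of the 2.0 layout, while the example.org URIs and the second subject term are hypothetical placeholders. The timecode helper also assumes plain HH:MM:SS values like the one shown; timecodes with frames or fractional seconds would need more handling.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free fragment built from the snippets in this post.
# Assumptions: the wrapper elements follow our reading of the 2.0 layout,
# the example.org URIs are hypothetical placeholders, and a real record
# would also declare the PBCore namespace and carry many more elements.
RECORD = """
<pbcoreDescriptionDocument>
  <pbcoreSubject ref="http://example.org/subjects/hobbits"
                 startTime="00:23:14" endTime="00:24:22">Hobbits</pbcoreSubject>
  <pbcoreSubject>Wizards</pbcoreSubject>
  <pbcoreContributor>
    <contributor ref="http://example.org/people/sean-connery">Sean Connery</contributor>
    <contributorRole portrayal="James Bond">Actor</contributorRole>
  </pbcoreContributor>
</pbcoreDescriptionDocument>
"""


def to_seconds(timecode):
    """Convert a plain HH:MM:SS string to seconds; None means 'not given'."""
    if timecode is None:
        return None
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def read_subjects(root):
    """Yield (term, ref, start, end); unused optional attributes come back as None."""
    for subject in root.iter("pbcoreSubject"):
        yield (
            subject.text,
            subject.get("ref"),
            to_seconds(subject.get("startTime")),
            to_seconds(subject.get("endTime")),
        )


def read_contributors(root):
    """Yield (name, role, portrayal) for each pbcoreContributor wrapper."""
    for wrapper in root.iter("pbcoreContributor"):
        role = wrapper.find("contributorRole")
        yield (
            wrapper.findtext("contributor"),
            role.text if role is not None else None,
            role.get("portrayal") if role is not None else None,
        )


root = ET.fromstring(RECORD.strip())
for term, ref, start, end in read_subjects(root):
    if start is None or end is None:
        print(f"{term}: no timeline range given")
    else:
        print(f"{term}: {start}s to {end}s ({end - start}s)")
for name, role, portrayal in read_contributors(root):
    print(f"{name}: {role}, portraying {portrayal}")
```

For the Hobbits term this reports a 68-second range in the timeline, while the plain Wizards term, written the old simple way, is read without complaint.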

Once you get used to the idea of adding attributes, however, you may find they open up all kinds of new possibilities for your PBCore metadata. For example, using URIs to identify values like subject terms, people, and locations is the first step toward letting content live and breathe in the emerging semantic web/linked data universe. The optional ref="URI" attribute in the 2.0 schema puts PBCore squarely on that path.
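The payoff of a ref URI is that two systems can agree a value is the same thing even when they spell it differently. A minimal sketch, assuming two hypothetical contributor records that share a placeholder example.org URI: grouping on ref instead of on the display name merges them, and a value with an empty ref (like the ref="" placeholders above) is skipped because it gives nothing to match on. This is plain grouping in Python, not an RDF or linked-data toolkit, but it shows the same idea in miniature.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical records from different systems. The display names differ,
# but both carry the same placeholder URI in the optional ref attribute.
RECORD_A = (
    '<pbcoreContributor><contributor affiliation="NPR" '
    'ref="http://example.org/people/michele-norris">Michele Norris</contributor>'
    '</pbcoreContributor>'
)
RECORD_B = (
    '<pbcoreContributor><contributor '
    'ref="http://example.org/people/michele-norris">Norris, Michele</contributor>'
    '</pbcoreContributor>'
)


def labels_by_ref(xml_strings):
    """Map each ref URI to the set of display names used for it across records."""
    labels = defaultdict(set)
    for xml_string in xml_strings:
        # iter() includes the element itself, so a bare <contributor> root works too
        for contributor in ET.fromstring(xml_string).iter("contributor"):
            ref = contributor.get("ref")
            if ref:  # a missing or empty ref gives us nothing to match on
                labels[ref].add(contributor.text)
    return dict(labels)


print(labels_by_ref([RECORD_A, RECORD_B]))
```

Both spellings end up under one URI, which is the kind of match a name-string comparison would miss.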

Beyond that, I suspect many of the other improvements in PBCore 2.0 will make life easier for all concerned. From what I see, the changes solve the issues people have raised on this site. The folks at WGBH who managed the 2.0 project did extensive additional research and outreach to find out what people using PBCore need and how best to evolve the schema. And I give a lot of credit to CPB for supporting an open and transparent process. We all contributed to the 2.0 version of PBCore, and our input was taken seriously. I understand the schema will be publicly released soon, and then you’ll see for yourself.

Thus far PBCore has only sort of worked as a metadata standard for AV assets and collections: gaps in its earlier versions drove many of us to implement workarounds and hacks. The result was a lack of clarity at best, which is not a good thing for a technical standard. The 2.0 PBCore schema probably isn’t perfect, and we’ll all find out more as we learn about it and begin our own implementations. But in my view it takes PBCore to a much higher state of functionality and flexibility, while retaining its simplicity and its humble origins as a child of Dublin Core.
