Article category: PBCore 2.0
pbcoreAssetDate needs refinement…and this site needs revitalization
So…lots going on in PBCoreland that hasn’t been reflected on pbcoreresources.org. I’ll get to that in another post. For the moment I want to mark something that is currently bugging me about PBCore 2.0: pbcoreAssetDate needs something to say what date formatting is being used. This is true for all other dates in a PBCore record.
I’m in the middle of building a PBCore export feature for WILL’s main website. This will allow exchange of pretty complete metadata with systems that can ingest PBCore, like the American Archive project (if it ever gets truly rolling) and the Popup Archive (which is rolling nicely). As I dive into the specifics, I want to return to and highlight those things about the PBCore 2.0 schema that remain…unfinished.
My concern is machine readability of dates and times. The PBCore 2.0 schema suggests, but does not require, ISO 8601 or the Library of Congress Extended Date/Time Format (EDTF) (and BTW the link on pbcore.org to EDTF is broken). Two big problems here:
- The 2.0 schema doesn’t provide any way to specify date formatting at all
- Even if it did, there’s a huge range of possible date formats within either ISO 8601 or EDTF
What’s a good solution? I don’t build parsers for a living, so I’m not sure, and thus this post. I’m tempted to say we should add a source attribute to PBCore dates, and specify the source of the date format we’re using. But is this specific enough? For example:
<pbcoreAssetDate dateType="published" source="ISO 8601">2006-10-16T08:19:39-05:00</pbcoreAssetDate>
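To make the parsing problem concrete, here is a minimal Python sketch of what a consuming system might do with an element like the one above. It assumes the source attribute proposed in this post, which is not part of the PBCore 2.0 schema, and the helper name parse_asset_date is made up for illustration. It handles only one ISO 8601 form, the full date-time with a UTC offset, and refuses to guess at anything else.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: the "source" attribute is the proposal from this post,
# not something the PBCore 2.0 schema actually defines.
SAMPLE = (
    '<pbcoreAssetDate dateType="published" source="ISO 8601">'
    '2006-10-16T08:19:39-05:00</pbcoreAssetDate>'
)


def parse_asset_date(xml_text):
    """Return a datetime if the element declares ISO 8601 and parses, else None."""
    element = ET.fromstring(xml_text)
    if element.get("source") != "ISO 8601":
        return None  # format undeclared or unknown: do not guess
    try:
        return datetime.fromisoformat((element.text or "").strip())
    except ValueError:
        return None  # declared ISO 8601, but not a form this parser accepts


parsed = parse_asset_date(SAMPLE)
print(parsed.isoformat())
```

Even with the attribute in place, the second problem remains: ISO 8601 also allows forms such as 2006-W42-1 (week date) and 2006-289 (ordinal date) for the same day, and whether a given parser accepts them varies by library and version. That suggests the attribute value may need to name a specific profile rather than just the standard.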
p.s. For a variety of reasons, I’m back on the job here as your editor/curator/muckraker of this site. It needs a rebuild, but first things first!
PBCore.org site refresh a welcome sight
If you haven’t been to http://pbcore.org lately you’re in for a good surprise. The site has been completely rebuilt, and contains up-to-date documentation, news, case studies, and most importantly, the complete PBCore 2.0 schema. There, I buried the lead: PBCore 2.0 has been officially released!
I might quibble with the color scheme of the new site, and I see some obvious CSS tweaks that could improve readability. But hey, my own websites need lots of work so who am I to talk? I have to give the folks at WGBH, who rebuilt the new PBCore site from the bones and ashes of its 2005 incarnation, lots of credit for getting things basically right.
Things I really like: In addition to the 2.0 Schema, there’s a How To section (contained in the Documentation menu item in the sidebar navigation) which offers really good guidance, and a Training section with clear instructions and code examples on things like “How to express collections in PBCore,” “How to sequence records within relationships,” and “How to express time segments within a video.” None of these was even possible prior to the 2.0 schema, and now we have clear documentation on how to do them.
Another area of the PBCore site that stands out is the Elements section. This section provides concise details on each of the PBCore 2.0 elements: its usage rules (i.e. minOccurs), where it appears in the schema, and its available attributes. I find the Elements section highly usable, but I keep Command-clicking element names to open them in a new browser tab (I’m on a Mac…) so I don’t lose the Elements index page in the original tab. It might be more usable to navigate the Elements more like the old PBCore User Guide, where clicking on an Element doesn’t take you away from the navigation. But that’s a minor quibble, I’m a geek, and am never completely happy.
One thing I am happy about is the inclusion of a “related discussions” link on each Element page, which includes a link to the home page of pbcoreresources.org. This leads to an idea about how pbcoreresources could directly add value to pbcore.org. As we discuss various PBCore elements here, our posts get aggregated in categories like pbcoreTitle. So for example on the pbcoreCollection Element page on pbcore.org, the “related discussions” link could go directly to the pbcoreCollection category page on pbcoreresources.org. This assumes enough of us are contributing to pbcoreresources with questions, answers, examples, and other useful conversation about PBCore elements. We have done that to some extent, and I’m suggesting we do it more. PBCore.org can then mine those discussions to enhance the official documentation over time.
Or maybe pbcore.org will continue to grow and supplant some of the stuff we have been doing here, and I’d be OK with that. The new pbcore.org site is built on WordPress, and it does allow comments in many places including the How To pages and the Element pages. Wherever it happens, I expect the user community will continue to build a shared understanding of how to move forward with PBCore. And above all, to keep the keepers of the PBCore standard in tune with the needs and realities of real-world media producers, publishers, and archivists who use it every day.
Sneak preview of PBCore 2.0
If I’ve learned one thing about the PBCore user community, it’s that we’re not satisfied with the current state of PBCore. We’ve used it enough to discover its strengths in describing AV assets and creating shareable metadata, but we keep running into its gaps and flaws. We’ve been pushing for a change process, and have argued for specific changes. Common threads have emerged right here on this site:
- A need for PBCore to support multi-part instantiations, e.g. when you have one complete work comprised of several reels or tapes or files.
- A need to express rights information related to a specific Instantiation, instead of only the entire asset. For example, you might want to allow users to download an mpeg4 version of a film for personal use, but not grant the same kind of access to the actual film!
- Speaking of rights, formatting of the pbcoreRightsSummary element disallows inclusion of metadata from existing standards such as ODRL or Creative Commons, which seems odd to say the least. If you already have structured rights data, why not simply reuse it?
- A need to show relationships between Instantiations. When you digitize a film to 10-bit uncompressed digital video, then encode an mpeg4 file from that 10-bit uncompressed file, it seems important to show that chain in the PBCore record.
- With pbcoreContributor, you can say that Harrison Ford is an Actor, but you can’t say what role he plays in the film.
- There’s no way to uniquely identify a person, subject term, location, or other value that might have an actual URI.
- The lack of attributes of any kind! Everything is elements and sub-elements, which seems inefficient and makes parsing more difficult.
- The lack of a valid way to identify clip information within an asset, for example where in the timeline a particular subject is discussed or a specific person appears.
- The lack of any way to bundle multiple PBCore XML records together in a feed or collection, so you could export/import large groups of records between systems or use PBCore in RESTful web applications.
Well good news folks! PBCore 2.0 is on the way, and it solves all these issues.
Even better, it solves them in a way that doesn’t add complexity for those who want to keep PBCore simple. For example, PBCore 2.0 allows you to use attributes for a subject term to specify a URI, and a startTime and endTime for that term in the media asset timeline. So you could have something like this:
<pbcoreSubject ref="http://en.wikipedia.org/wiki/Hobbit" startTime="00:23:14" endTime="00:24:22">Hobbits</pbcoreSubject>
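As a rough illustration of why these attributes help on the consuming side, the Python sketch below reads the element above and works out how long the tagged segment runs. The helper to_seconds is hypothetical, and it assumes the times are plain HH:MM:SS strings as in the example; the schema may well allow other time notations.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<pbcoreSubject ref="http://en.wikipedia.org/wiki/Hobbit" '
    'startTime="00:23:14" endTime="00:24:22">Hobbits</pbcoreSubject>'
)


def to_seconds(timestamp):
    """Convert an HH:MM:SS string to seconds (assumed format, see lead-in)."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


subject = ET.fromstring(SAMPLE)
start = to_seconds(subject.get("startTime"))
end = to_seconds(subject.get("endTime"))
segment_length = end - start  # length of the tagged segment, in seconds

print(subject.text, subject.get("ref"), segment_length)
```

A player or search interface could use the same three values to jump straight to the point in the timeline where the subject comes up.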
You can also do this:
<contributor affiliation="NPR" ref="http://en.wikipedia.org/wiki/Michele_Norris">Michele Norris</contributor>
<contributor ref="http://en.wikipedia.org/wiki/Sean_Connery">Sean Connery</contributor>
<contributorRole portrayal="James Bond">Actor</contributorRole>
But the use of the new 2.0 attributes is totally optional. You can keep it simple and use PBCore the same way as before.
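Here is a small Python sketch of what "optional" can look like for someone reading records. It treats a bare subject and one carrying a ref attribute the same way, and simply gets nothing back for the attribute when it is absent. The wrapper element and the helper read_subjects are invented for the example, and the snippet ignores XML namespaces, which a real PBCore document would likely declare.

```python
import xml.etree.ElementTree as ET

# Assumption: <subjects> is a throwaway wrapper so the sample is one
# well-formed XML document; it is not a PBCore element.
SAMPLE = """
<subjects>
  <pbcoreSubject>Hobbits</pbcoreSubject>
  <pbcoreSubject ref="http://en.wikipedia.org/wiki/Hobbit">Hobbits</pbcoreSubject>
</subjects>
""".strip()


def read_subjects(xml_text):
    """Return (term, ref) pairs; ref is None when the attribute is absent."""
    root = ET.fromstring(xml_text)
    return [(el.text, el.get("ref")) for el in root.iter("pbcoreSubject")]


subjects = read_subjects(SAMPLE)
for term, ref in subjects:
    print(term, ref if ref is not None else "(no URI given)")
```

The simple record and the enriched one go through the same code path, so nobody is forced to adopt the attributes before they are ready.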
Once you get used to the idea of adding attributes, however, you may find it opens up all kinds of new possibilities for your PBCore metadata. For example, the use of URIs to identify values like subject terms, people, and locations is the first step to enabling content to live and breathe in the emerging semantic web/linked data universe. The addition of the optional ref="URI" attribute in the 2.0 schema puts PBCore squarely onto that path.
But I suspect many of the other improvements to PBCore 2.0 will make life easier for all concerned. From what I see, the changes solve the issues people have raised on this site. The folks at WGBH who managed the 2.0 project did additional extensive research and outreach to find out what people using PBCore need, and how best to evolve the schema. And I give a lot of credit to CPB for supporting an open and transparent process. We all contributed to the 2.0 version of PBCore, and our input was taken seriously. I understand the schema will be publicly released soon, and you’ll see.
PBCore has thus far only sort of worked as a metadata standard for AV assets and collections: gaps in its earlier versions drove many of us to implement workarounds and hacks. The result was lack of clarity at best, which is not a good thing for a technical standard. The 2.0 PBCore schema probably isn’t perfect, and we’ll all find out more as we learn about it and begin our own implementations. But in my view it takes PBCore to a much higher state of functionality and flexibility, while retaining its simplicity and its humble origins as a child of Dublin Core.
PBCore 2.0 session at the Open Video Conference, October 1, NYC
For those attending the Open Video Conference this weekend in New York, don’t miss the panel on PBCore 2.0. A stellar lineup will talk about the process of developing 2.0, and hopefully share some details about where it’s going.
Here are the panel specifics, copied from the OVC schedule:
Summary: An Introduction to PBCore 2.0: Metadata for Public Broadcasters - (4:30 PM - 5:30 PM)
Description: PBCore has served the Public Media community as a metadata schema for describing media since 2005. With a new round of funding from the Corporation for Public Broadcasting, WGBH Boston is working on PBCore 2.0 – an updated version which will increase its flexibility as a schema and therefore its applicability to diverse user scenarios. In addition, a new web site with updated documentation is set to launch next month (November, 2010). Come learn about PBCore: how it is evolving, how it is applied, and how it can benefit your workflow and interoperability as a video content producer or consumer.
Chair: Nan Rubin — PBCore Project
Linda Tadic — Audiovisual Archive Network
David Rice — Audiovisual Preservation Solutions
Chris Beer — WGBH Interactive