PBCore.org Alpha site launched!
Head on over to PBCore.org, check out the face lift, and leave your feedback!
WGBH is in the process of re-designing PBCore.org and we’ve just launched the “alpha” site which includes:
The interior pages remain the same for now, as the full re-launch is scheduled for November 2010 to coincide with the release of PBCore 2.0.
To that end, please contribute change requests for the schema, elements, etc., by June 30, 2010.
This summer, we’ll be conducting card sort and user testing exercises to re-organize the site’s new and legacy content and to make it as clear, concise and useful as possible for PBCore users. If you have suggestions or would like to participate please contact us or participate in the blog.
Many thanks to all who provided, and who continue to provide input into the new site, including Jack Brighton, Paul Burrows, Nan Rubin, the Code4Lib community, and the WGBH PBCore team!
PBCore and the Semantic Web
Recently I had a chance to discuss with Dave Rice, Dan Jacobson, and Chris Beer the potential role of PBCore in enabling content to flow in the “semantic web.” The topic has also come up in discussions around the development of PBCore 2.0. It seems like a good idea to open the topic for broader consideration.
The Semantic Web would mostly likely be implemented by Linked Data. Using URIs and RDF, “Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” Those other methods would include linking things together by hand, piece by piece, which webmasters of the world can tell you doesn’t exactly scale well.
So how does PBCore fit into linked data using URIs and RDF? I don’t have anything close to a complete answer, just ideas that came from the discussion. I would love some feedback to take this thing further.
Linked Data depends upon having Uniform Resource Identifiers for content items, their descriptive/administrative metadata, and their relationships to other items and descriptors. Every resource should have a URI, whether the resource is a file or a subject term. Relationships between resources also get URIs, such as “friend” or “author”. With URIs for everything, you can establish so-called Triples (subject -> predicate -> object) as machine-processable links, which then establish other machine-processable links to other resources, and the Linked Data universe scales up from there based on network effects. These Triples and their relationships with other resources are made possible by RDF statements. How do we create RDF statements using PBCore?
As the most basic level, a PBCore record may already contain many URIs, for example in the case of a streaming media file a valid URI might be http://will.illinois.edu/media/mp3/illinoisbroadcastarchives_040T_96k.mp3. This file is described by elements like:
<pbcoreSubject> <subject>African American Civil Rights</subject> <subjectAuthorityUsed>American Archive Civil Rights Subject Categories</subjectAuthorityUsed> </pbcoreSubject>
<pbcoreContributor> <contributor>Hughes, Langston</contributor> <contributorRole>Lecturer</contributorRole> </pbcoreContributor>
Do we have URIs for “African American Civil Rights,” “American Archive Civil Rights Subject Categories,” “Hughes, Langston” and “Lecturer”? They aren’t currently in this PBCore document but perhaps they could be if we did something radical (for PBCore) like adding attributes. We could then tap into existing namespaces or create our own as needed, using and assigning URIs for each and every element in our PBCore records.
But this adds a great deal of additional work and complication. PBCore was invented to provide interoperability among public media systems, and if we all used plain old PBCore it would work for that purpose. If we raise the bar too high and require URIs, fewer people will be able to adopt PBCore in their local workflows and systems.
Yet it would be very useful to add URIs to key elements like subject and genre terms, contributors, creators, and publishers. We could even apply this to the currently almost meaningless pbcoreRightsSummary, for example using Creative Commons license URIs.
Such an approach could be incredibly powerful, but I believe it should be optional. URI attributes should be allowed but not required in PBCore 2.0. We might also create PBCore Profiles which use URIs in specific namespaces; for example in a future American Archive PBCore Profile there could be subject and genre namespaces facilitating a common taxonomy. This could solve one of the thorniest problems of interoperability in different content collections.
But I believe content creators in pubmedia will never universally adopt the same taxonomies, nor should they. After all, we’re probably all dealing with a babel of descriptive metadata and controlled vocabularies generated by various legacy systems and catalog records, and we can’t make them conform to a new common standard. I also like the idea of harvesting user-generated keywords for my content and adding that to the metadata mix. Also, PBCore Profiles might work within the PBCore user community, but what about linking to other content domains? I want my data to interoperate not only inside pubmedia, but also with related data anywhere. A story on the new water plant drawing from the Central Illinois Mahomet Aquifer could link to historical information, climate records, corporate earnings reports, and huge data sets on the geography and hydrology of the region. Each of these domains has tons of public data but it’s all in different formats and none of it is PBCore. Almost none of it is RDF.
This leads to the notion that something else should create the RDF. It would harvest whatever it could from available data and build an explorable triple store. This would enable other applications to traverse, search, and build interesting interfaces to a growing universe of linked data.
Several software projects are working on tools to do this. Project Tupelo can harvest existing URI schemes, or in their absence create them for a given set of data. We’re attempting to test Tupelo with our content collections, and hope to have something to report later this year.
But I bet there are other tools and approaches, and maybe it’s simpler than I’m seeing it. If you know something about this please respond by commenting on this post, or starting a new one.