PBCore Tool Quest
Back in 2005 I became somewhat obsessed with cataloging media on the job here at WILL Public Media. As the website manager, I'd have people do things like hand me a videotape and say "can you put this on the web?" But what's on the tape? It became clear producers had no clue about the importance of recording actual information about their productions. Meanwhile I started learning about metadata and controlled vocabularies, and some of the cool things we could do with structured data on the web. This led me directly to PBCore as the theoretical metadata standard for public broadcasting and beyond. If producers and stations could create PBCore-compliant XML records for their content, we could develop tools for automated exchange of deep information about media, and the media essence as well.
But we can't do much if nobody catalogs their stuff. So the question became what tools to use? Working with a graduate assistant from Library Science here at Illinois (the great Jimi Jones!), we embarked on a PBCore tool quest.
As we began digitizing and cataloging 15 years' worth of the WILL-TV program Prairie Fire, we wanted a tool that would spit out PBCore XML. Jimi and I tried the free MIC database tool, and sorry MIC, but at least back then it was (ahem) a work in progress. We were somehow unaware of the free PBCore FileMaker Pro database, which I guess is pretty OK, and if you've used it feel free to comment. So we tried things like using the open-source Greenstone repository software, which I wouldn't recommend for video unless you like migraines. We even started hand-coding XML using Oxygen, which is enough to make your eyes bleed after a while.
Eventually though I started thinking like a web developer, which after all is my job at WILL. We're using a Content Management System to "catalog" media for web pages. We use the same CMS database to output RSS feeds, which is a flavor of XML. Why not add a PBCore flavor of XML? So I added a bunch of fields reflecting the PBCore elements, and asked our producers to provide...a little more detail when adding their content to the website. The result was the new Prairie Fire website, released in January 2007, featuring among other things PBCore records of every episode and segment for every program over the past 15 years.
Since then we've continued to refine our little home-grown CMS-based cataloging tool. Here are a couple of screen shots that kinda show how it works.
This is a form for entering descriptive and administrative metadata based on the PBCore elements. Mary Miller from the Peabody Archives worked with me on this, and she likes to call it the "Platonic Record" of the media object, as it is purely the intellectual record.
This is the input form for the PBCore Instantiation, which is the actual physical or digital object being cataloged. The Instantiation is then linked to the Platonic Record. The result is a complete PBCore-compliant record which looks exactly like this. (If you hit this with Safari, do a View Source to see the XML.) So our CMS can take metadata directly from the Producer to create web pages, RSS feeds/podcasts, and PBCore XML. Why not?
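For anyone curious how that linking turns into XML, here is a minimal sketch in Python, not the actual CMS template. The dictionaries and their field names (`platonic_id`, `digital`, `location`) are hypothetical stand-ins for CMS entries, and the element names follow my understanding of PBCore 1.x, so check them against the published XSD. A real record also needs the PBCore namespace and the other required elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical CMS entries; these field names are illustrative only.
platonic = {
    "id": "pf-0001",
    "title": "Sample Episode",
    "description": "Sample description.",
}
instantiations = [
    {"platonic_id": "pf-0001", "digital": "video/mp4",
     "location": "http://example.org/pf-0001.mp4"},
    {"platonic_id": "pf-0001", "digital": "video/x-flv",
     "location": "http://example.org/pf-0001.flv"},
    {"platonic_id": "pf-0002", "digital": "video/mp4",
     "location": "http://example.org/pf-0002.mp4"},
]


def build_pbcore(platonic, instantiations):
    """Combine one intellectual record with its linked instantiations."""
    # Element names assumed from PBCore 1.x; simplified, no namespace.
    root = ET.Element("PBCoreDescriptionDocument")
    ident = ET.SubElement(root, "pbcoreIdentifier")
    ET.SubElement(ident, "identifier").text = platonic["id"]
    title = ET.SubElement(root, "pbcoreTitle")
    ET.SubElement(title, "title").text = platonic["title"]
    desc = ET.SubElement(root, "pbcoreDescription")
    ET.SubElement(desc, "description").text = platonic["description"]
    for inst in instantiations:
        # Only instantiations linked to this Platonic Record are included.
        if inst["platonic_id"] != platonic["id"]:
            continue
        node = ET.SubElement(root, "pbcoreInstantiation")
        ET.SubElement(node, "formatDigital").text = inst["digital"]
        ET.SubElement(node, "formatLocation").text = inst["location"]
    return ET.tostring(root, encoding="unicode")


print(build_pbcore(platonic, instantiations))
```

The same entries could feed a web page or an RSS template; only the output step changes.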
But here we are now in (almost) 2009, and we need more than my cute little CMS solution. Which is why I'm so excited to see the work of Dave Rice and Mike Castleman, who created a much better online system called PBCore Vermicelli. This is a Ruby app that does everything I had set up, plus a lot more. So I wanted to call more attention to it, and suggest that it's a good starting point for much more powerful things to come. What things? I have a few ideas, but I'll pause here for now. Let's talk about it.
Comments:
-
hi,
Thanks for playing with the WNET/vermicelli PBCore tool. It’s still at a very early stage (it needs, among a number of other things, some sort of user authentication system), but please do let me or Dave know if you have any suggestions.
One thing I want to implement is some kind of PBCore validator (à la the W3C’s HTML and other validators). A lot of the PBCore records we’ve been seeing (including some of the PBCore-provided examples!) violate the PBCore spec in more or less subtle ways, which hampers interoperability.
-
Jack, what an intersection of tech and art and public service. Congratulations to you and to WILL! Tell me, could you make links to high-resolution images from the screen captures? These PBCore files are treasure chests of information: mp4, flv, real video, and all documented so well. I wonder what kind of mashups are possible with these PBCore feeds… it’s inspiring! Is PBCore available for multiple stories in a series or only for individual stories? Great work!
-
John, thanks for your comments. Here are URLs to download high quality pngs for the two entry forms, including the entire forms this time:
Platonic Record entry form: http://willmedia.s3.amazonaws.com/entryformplatonic.png
Instantiation entry form: http://willmedia.s3.amazonaws.com/entryforminstantiation.png
(I added links to each image above as well.)
Note that you can link each Instantiation with a Platonic entry, to create a relationship which completes the PBCore record. And you can create multiple Instantiations, and link each one to the same Platonic entry.
Of course you cannot see what’s in the dropdown lists, but they contain the same values as in the PBCore “picklists” in the User Guide on pbcore.org. In the CMS I’m using (ExpressionEngine), these are created by an Administrator while building the set of custom fields.
PBCore Vermicelli improves on this by allowing users to add or change the values in these dropdown lists…unless you don’t want to give the user that ability which sometimes would be the case. Anyway, I bet Mike can slice that any way he wants in another version.
-
Oh I forgot to say, the Instantiation form is based on PBCore version 1.1. But we now have version 1.2 so I need to update this work. Did we mention version 1.2 yet?
-
Thanks Jack, I can see the similarities now between the Prairie Fire CMS and PBCore Vermicelli.
No. I don’t think you talked about version 1.2 yet. I have an email from Nan Rubin from November 10th that talks about it, but I haven’t had a chance to look into this in enough detail. If you could distill some of the issues here on the blog, that would be great.
-
Love playing around in the vermicelli site, not in the least because “vermicelli” and “Ruby on Rails” are both fun to hear and say.
Am confused in this record: Is part of FRNN but FRNN is a dead link. Shouldn’t this always take you to a parent record?
And that makes me wonder if the Is Part Of relation could also be somehow implemented for the multi-part instantiations? I am so tired of thinking about this but can’t stop until a solution is found!!
Franny’s Feet; Puppet Pals/Sweet Mystery
Intellectual Content Section
Asset UUID: 9ce0702e-ffe9-4c0f-bf0d-400b072afb15
Identifier (PBS PODS)
FRNN_000201
Title (Series)
Franny’s Feet
Title (Episode)
Puppet Pals/Sweet Mystery
Description (Segment)
{Segment: 1; Duration: 00:27:00} A) Franny travels to India, where she meets two squabbling sisters, who are preparing a shadow puppet show for their mother’s birthday. Franny helps them learn to collaborate, and together they put on a wonderful show. In Franny’s Treasures, Franny and the viewers help Bobby position himself in relation to the light source so he can make shadows. B) Franny travels to Quebec, where she meets Pierre, a boy who is helping his family make maple syrup. When Franny discovers that a busy beaver has taken the sap pail, she devises a clever plan to rescue it. In Franny’s Treasures, Franny and Bobby prompt the audience to identify fruits and nuts that grow on trees.
Genre (PBS PODS)
Children’s
Relation
Is Part Of: FRNN
Coverage (Spatial)
USA
Audience Rating
All Children
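One way to catch a dead parent link like this is a check that every Is Part Of target matches a record the system actually holds. The sketch below assumes a hypothetical in-memory list of records with `identifier` and `relations` keys; it is not how Vermicelli stores or resolves relations.

```python
def dangling_relations(records):
    """Return (record_id, target) pairs whose Is Part Of target is unknown."""
    known = {record["identifier"] for record in records}
    missing = []
    for record in records:
        for relation_type, target in record.get("relations", []):
            if relation_type == "Is Part Of" and target not in known:
                missing.append((record["identifier"], target))
    return missing


# The episode above points at FRNN, but no FRNN series record is loaded.
records = [
    {"identifier": "FRNN_000201", "relations": [("Is Part Of", "FRNN")]},
]
print(dangling_relations(records))  # [('FRNN_000201', 'FRNN')]
```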
-
Responding to Mike’s comment “One thing I want to implement is some kind of PBCore validator (à la the W3C’s HTML and other validators). A lot of the PBCore records we’ve been seeing (including some of the PBCore-provided examples!) violate the PBCore spec in more or less subtle ways, which hampers interoperability.”
I’d like to suggest we not do that until PBCore itself is a workable, viable standard. I, for one, have no intention of being PBCore compliant until PBCore can take on multi-part instantiations, elaborations of roles (segment producer, name of character played by cast member, corporate affiliation of executive producer, etc.) to name two.
-
In response to Mary “I’d like to suggest we not do that until PBCore itself is a workable, viable standard.” How would this be determined? I’ve been using the PBCore standard for 4 years now in multiple projects and have overseen the generation of about a hundred thousand PBCore records through mapping and creation. Indeed the missing functionality that you enumerate makes PBCore frustrating to implement in some circumstances, but I’d still classify the standard in the ‘workable’ category. Also, since PBCore has already published XSD files and controlled vocabularies, we can validate records to some degree. I’m hoping that a validator can make PBCore more usable, since the implementations of PBCore that are all growing independently use a variety of interpretations of the standard, from loose to strict. If an implementor had a PBCore validator accessible, it might benefit the mappings and translation tools that are being built.
Dave
-
Well, Dave is about 100,000 PBCore records ahead of me, so I have to defer. And maybe this discussion is all the proof that I need that Dave & Mike are absolutely right.
Last month y’all helped me decide to put the length of a film (1200 ft.) in formatDuration, which I think we all agree is designed for data looking like 00:30:00. 1200 is a physical duration where 00:30:00 is a temporal duration.
So I am wondering, does
<formatDuration>1096 ft.</formatDuration>
conform, or not? I would tend to think no, but if the validator tells me it’s fine, then hooray.
So bring on the validation!
-
I don’t really think we can apply much validation to formatDuration under the current rules. The value could be in SMPTE standard, a different standard, or an undefined standard. For instance a PBCore sample record (http://www.pbcore.org/PBCore/PBCore_SampleRecords/PBCore_Sample_03.html) has a value of 2:22, I’m uncertain if it should be interpreted as two hours, twenty-two minutes or two minutes and twenty-two seconds.
PBCore officially “recommends flexibility in expressing time stamps” and says “the best practice is to match the time stamp designation to your preferred method,” so the standard for using this field seems largely defined by each particular user. Thus <formatDuration>1096 ft.</formatDuration> is likely acceptable under the standard, and to understand 2:22 I’d have to understand the timestamp practices of the originating institution. This makes the field flexible, but it requires human intuition to determine what a value means, which restricts programmatic use of the value.
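To make the ambiguity concrete, here is a rough Python sketch that lists every reading a program could reasonably give a formatDuration value. The three patterns are assumptions chosen for illustration, not rules from the PBCore spec, and SMPTE timecode with a frames field is deliberately left out.

```python
import re


def interpret_duration(value):
    """Return (label, number) pairs for each plausible reading.

    The number is seconds for time readings and feet for film length.
    """
    value = value.strip()
    match = re.fullmatch(r"(\d+):(\d{2}):(\d{2})", value)
    if match:
        hours, minutes, seconds = map(int, match.groups())
        return [("HH:MM:SS", hours * 3600 + minutes * 60 + seconds)]
    match = re.fullmatch(r"(\d+):(\d{2})", value)
    if match:
        first, second = map(int, match.groups())
        # Two fields could be hours:minutes or minutes:seconds.
        return [("HH:MM", first * 3600 + second * 60),
                ("MM:SS", first * 60 + second)]
    match = re.fullmatch(r"(\d+)\s*ft\.?", value)
    if match:
        return [("feet of film", int(match.group(1)))]
    return []  # unrecognized: a human has to decide


print(interpret_duration("2:22"))      # [('HH:MM', 8520), ('MM:SS', 142)]
print(interpret_duration("00:30:00"))  # [('HH:MM:SS', 1800)]
print(interpret_duration("1096 ft."))  # [('feet of film', 1096)]
```

Any value that comes back with more than one reading, or none, is one a validator could flag for review but not reject.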
The validation would more apply to fields like formatColors that have PBCore authored picklists. If PBCore defines the concept of black and white as “B&W” and a record states this concept as “Black and White” then the validator could state that the value does not comply with PBCore’s definitions of these concepts.
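A picklist check of this kind could look something like the Python sketch below. The `PICKLISTS` table holds only an assumed two-value subset of formatColors for illustration; a real validator would load the full picklists published with the PBCore User Guide, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed subset of the formatColors picklist, for illustration only.
PICKLISTS = {"formatColors": {"B&W", "Color"}}


def picklist_errors(xml_text):
    """Report elements whose text is not in the matching picklist."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        allowed = PICKLISTS.get(tag)
        value = (elem.text or "").strip()
        # Elements without a picklist (e.g. formatDuration) are skipped.
        if allowed is not None and value not in allowed:
            errors.append(f"{tag}: {value!r} is not in the picklist")
    return errors


sample = (
    "<pbcoreInstantiation>"
    "<formatColors>Black and White</formatColors>"
    "<formatDuration>1096 ft.</formatDuration>"
    "</pbcoreInstantiation>"
)
print(picklist_errors(sample))
```

Free-text fields pass through untouched here, which matches the point that only picklist-backed fields can be checked mechanically.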
-
so can anyone upload records? I’d like to try. Ann
-
sure, anyone can upload records. except spammers.
-
Hi Ann,
Yep, if you mean pbcore.vermicel.li, you can upload your xml directly into the tool.
With my system you can’t yet do that. Been meaning to get to that but having too much fun using pbcore.vermicel.li! Also, as Mike points out in his article on Common PBCore Errors, there are a few other changes I need to make in my system. Good thing I have…lots of time on my hands.
-
Vermicelli Instantiation?
On the Vermicelli add an asset interface, I see Intellectual Property, Intellectual Content, and Extensions, but no Instantiation info. template.
I think Dave showed us some instantiation examples…
Assistance much appreciated!
-
Assets and instantiations are on separate edit pages. For instance edit the asset here http://pbcore.vermicel.li/assets/1194/edit and edit the instantiations here http://pbcore.vermicel.li/assets/1194/instantiations.
There are a lot of records in the database without any instantiations, so feel free to make some if you want. This installation of the tool is for testing, playing, and social networking.
-
Thanks for the great post. I just wanted to share with you a new online XML validator tool that is completely free to use. Here is the URL:
http://www.liquid-technologies.com/FreeXmlTools/FreeXmlValidator.aspx

