Article category: How-to
pbcoreAssetDate needs refinement…and this site needs revitalization
So…lots going on in PBCoreland that hasn’t been reflected on pbcoreresources.org. I’ll get to that in another post. For the moment I want to mark something that is currently bugging me about PBCore 2.0: pbcoreAssetDate needs something to say what date formatting is being used. This is true for all other dates in a PBCore record.
I’m in the middle of building a PBCore export feature for WILL’s main website. This will allow exchange of pretty complete metadata with systems that can ingest PBCore, like the American Archive project (if it ever gets truly rolling) and the Popup Archive (which is rolling nicely). As I dive into the specifics, I want to return to and highlight those things about the PBCore 2.0 schema that remain…unfinished.
My concern is machine readability of dates and times. The PBCore 2.0 schema suggests, but does not require, ISO 8601 or the Library of Congress Extended Date/Time Format (EDTF) (and BTW the link on pbcore.org to EDTF is broken). Two big problems here:
- The 2.0 schema doesn’t provide any way to specify date formatting at all
- Even if it did, there’s a huge range of possible date formats within either ISO 8601 or EDTF
What’s a good solution? I don’t build parsers for a living, so I’m not sure, and thus this post. I’m tempted to say we should add a source attribute to PBCore dates, and specify the source of the date format we’re using. But is this specific enough? For example:
<pbcoreAssetDate dateType="published" source="ISO 8601">2006-10-16T08:19:39-05:00</pbcoreAssetDate>
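To make the parsing side of the question concrete, here is a minimal Python sketch of how an ingesting system might use such an attribute. It assumes the source attribute proposed above (which is not part of the PBCore 2.0 schema) and the function name parse_asset_date is made up for illustration. Note that datetime.fromisoformat accepts only some of the forms ISO 8601 allows, which is exactly the "is this specific enough?" worry.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The example element from this post, with plain ASCII quotes.
SNIPPET = (
    '<pbcoreAssetDate dateType="published" source="ISO 8601">'
    "2006-10-16T08:19:39-05:00</pbcoreAssetDate>"
)


def parse_asset_date(xml_text):
    """Return a datetime if the element declares ISO 8601, else None.

    The "source" attribute is the proposal made in this post, not
    something the PBCore 2.0 schema defines.
    """
    elem = ET.fromstring(xml_text)
    if elem.get("source") != "ISO 8601":
        # No declared format: refuse to guess.
        return None
    # fromisoformat handles full date-time strings like the example,
    # but not every ISO 8601 form.
    return datetime.fromisoformat(elem.text.strip())


parsed = parse_asset_date(SNIPPET)
print(parsed.isoformat())
```

Without the attribute, the function returns None rather than guessing, which is roughly the position a parser is in today.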
p.s. For a variety of reasons, I’m back on the job here as your editor/curator/muckraker of this site. It needs a rebuild, but first things first!
Making PBCore End User Friendly
Sites using the current beta of the module can now export a node as XML, as well as a dump of their site-specific configurations, making it easier to compare how organizations are implementing PBCore in practice.
The PBCore module for Drupal is still rough around the edges, but the structure is now there for anyone with even basic PHP skills to contribute.
Use of PBCore in the American Archive Pilot Project
Illinois Public Media was one of the 20-some public TV and Radio stations in the CPB-funded American Archive Pilot Project. The AAPP required participating stations to use PBCore as a metadata format, at least in principle. I decided to push implementation of PBCore in my AAPP content collection as far as possible using the toolset I used on a previous video archive project (Prairiefire on WILL-TV).
This toolset is based on the website Content Management System called ExpressionEngine, which makes setting up a particular database structure rather easy. I set up the database structure based on PBCore elements, with controlled vocabularies reflecting the AAPP taxonomy and suggested PBCore picklists. I then created xml templates in ExpressionEngine to render my AAPP collection metadata as valid PBCore records. I then went one step further, following discussions with Dan Jacobson and David Rice, and created a PBCoreCollection wrapper containing all 235 of the PBCore item records (each as a PBCoreDescriptionDocument) in my collection. The national portal for the AAPP, being developed and hosted at Oregon Public Broadcasting, was able to simply ingest the PBCoreCollection, demonstrating the viability of this approach to aggregating a large collection from multiple content sources.
This article details the methods used to accomplish this in ExpressionEngine. Similar methods could be used in Drupal, which we’re working on now.
In ExpressionEngine, one can easily define a set of fields to input data. For example a blog would need fields for a Title, a Body, and maybe a separate Image upload field along with a label field for the image (so you could add a caption or an alt tag at least). When you create these fields, you also pick a field type: textarea, dropdown list, file upload, etc. EE has several pre-defined field types and there are dozens of addons from third-party developers to add more.
One of the really great EE addons is FieldFrame, developed by Brandon Kelly. FieldFrame is a framework for developing new EE fieldtypes, and there are a bunch of good ones. The most important for our EE PBCore tool is called FF Matrix, which allows you to bundle several fields in a “row” of related data.
Here’s the way you create an FF Matrix field in ExpressionEngine:
With an FF Matrix field, you can do things like enter a PBCore subject tied to a subjectAuthorityUsed, or a title along with its titleType. Since most PBCore elements are wrapped in pairs like this, it’s important to solve this in a straightforward way. With FF Matrix, you can enter as many linked pairs as needed. For example, when an item has many subject terms, you want each term wrapped individually along with its corresponding subjectAuthorityUsed.
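The pairing idea is independent of ExpressionEngine. As a rough Python sketch (not the actual EE template), each matrix row becomes one wrapper element holding the term and its authority. The pbcoreSubject wrapper name, the function name, and the sample rows are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def render_subjects(rows):
    """Turn (subject, authority) rows into one wrapper element per row."""
    elements = []
    for term, authority in rows:
        wrapper = ET.Element("pbcoreSubject")  # assumed wrapper name
        ET.SubElement(wrapper, "subject").text = term
        ET.SubElement(wrapper, "subjectAuthorityUsed").text = authority
        elements.append(wrapper)
    return elements


# Hypothetical rows, as they might be entered in a matrix field.
rows = [
    ("World War II", "Example Local Keywords"),
    ("Oral history", "Example Local Keywords"),
]
subjects = render_subjects(rows)
for element in subjects:
    print(ET.tostring(element, encoding="unicode"))
```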
Here’s the PBCore Item entry form showing a number of such fields (but not the entire form which is a bit long):
We used this form to enter all the Intellectual Content and Intellectual Property metadata for each media item. Nothing in this Item form relates to the physical or digital Instantiation of that item. For that we used a different form with fields and fieldtypes defined specifically for Instantiation metadata. Here’s the fun part: One of the fieldtypes in the Instantiation form is a “relationship” field, which allows you to select an existing Item to which the Instantiation should be linked. So if you have several Instantiations of the same Item, like a WAV file, an MP3, and an analog tape, you create an Instantiation record for each and link them to the Item.
This proved to be a quick and effective way to link multiple Instantiations with a single Item.
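In data terms, the relationship field is just a pointer from each Instantiation back to its Item. Here is a small Python sketch of that structure, with made-up record IDs and class names (ExpressionEngine handles this internally in its own way):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Instantiation:
    item_id: str   # the "relationship" field: which Item this belongs to
    format: str    # e.g. WAV, MP3, analog tape


def group_by_item(instantiations):
    """Collect the formats of all Instantiations under their Item."""
    grouped = defaultdict(list)
    for inst in instantiations:
        grouped[inst.item_id].append(inst.format)
    return dict(grouped)


# Hypothetical records: three Instantiations of one Item, one of another.
records = [
    Instantiation("item-1", "WAV"),
    Instantiation("item-1", "MP3"),
    Instantiation("item-1", "analog tape"),
    Instantiation("item-2", "MP3"),
]
grouped = group_by_item(records)
print(grouped)
```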
You might be able to see that some of the fields are blank, and their instructions say things like “formatDataRate - If MP3 file don’t enter anything.” Lots of the technical metadata like formatFileSize etc could be extracted automatically from the digital files by the system, so we don’t have to enter that data by hand. EE has a nice addon called MP3 Info + that does most of that work.
David Rice has developed better methods of reading file metadata into his PBCore Records Repository using a free tool called MediaInfo. We should get him to write more about that, as it’s work that could be leveraged and used in different systems I’m sure.
After entering all the metadata for our collection using the two forms above, the payoff is in rendering everything in usable form. Since it’s all in the CMS, it’s a simple matter to make a website displaying everything, and providing media players for the files. In fact we did this initially for the catalogers so they could work remotely and listen to and view the audio and video files.
This site was intended for that purpose: http://will.illinois.edu/metadata/aapp-inventory-all/.
As the catalogers added descriptive metadata, the site became much more interesting! We added as much descriptive stuff as possible, even full tape logs for some of the World War II oral history interviews. I chose not to display all that metadata on the web page, but it is rendered in the PBCore XML record for each item.
For example, here is a web page for one such interview: http://will.illinois.edu/metadata/aapp-inventory-all/WWII_oral_history_WesleyMatthews2008-02-21
And here is the PBCore record for the same interview: http://will.illinois.edu/metadata/pbcoreAAPP/wwii_oral_history_wesleymatthews2008-02-21
The way these are rendered is simple: an html template for the web page, and an xml template for the PBCore record, both drawing from the same database. In ExpressionEngine this is very simple to set up, and once it’s set up, you’re done.
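For anyone not using ExpressionEngine, the same "one database, two templates" pattern can be sketched in a few lines of Python. The record, the function names, and the element nesting below are illustrative assumptions, not the actual EE templates.

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape

# One hypothetical record, standing in for a row from the CMS database.
ITEM = {
    "title": "Oral history: Smith & Jones",
    "description": "Interview recorded in 2008.",
}


def render_html(item):
    """Web page view of the record."""
    return "<h1>{}</h1>\n<p>{}</p>".format(
        html_escape(item["title"]), html_escape(item["description"])
    )


def render_pbcore(item):
    """XML view of the same record (element nesting is an assumption)."""
    return (
        "<PBCoreDescriptionDocument>"
        "<pbcoreTitle><title>{}</title></pbcoreTitle>"
        "<pbcoreDescription><description>{}</description></pbcoreDescription>"
        "</PBCoreDescriptionDocument>"
    ).format(xml_escape(item["title"]), xml_escape(item["description"]))


html_page = render_html(ITEM)
pbcore_record = render_pbcore(ITEM)
print(html_page)
print(pbcore_record)
```

Both views escape the ampersand in the title, which is the kind of detail a template system normally takes care of for you.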
Finally, as mentioned above I chose to try implementing the idea of a PBCoreCollection wrapper element, enclosing all 235 of the individual PBCoreDescriptionDocuments in my AAPP media collection. This is, of course, not a valid wrapper element in any PBCore version to date. This experience suggests that it should be. OPB was able to ingest my entire collection in a single gulp from this URL. Other stations in the AAPP were able to export using the same method (PBCoreCollection) even though they have different local systems. The ability to render a PBCoreCollection is all that matters, not the underlying system that rendered it.
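The wrapper itself is nothing fancy. A minimal Python sketch, assuming each record is already available as an XML string and ignoring namespaces and schema declarations entirely (the function name and sample records are made up):

```python
import xml.etree.ElementTree as ET


def wrap_collection(record_xml_strings):
    """Enclose individual description documents in one wrapper element."""
    collection = ET.Element("PBCoreCollection")
    for xml_text in record_xml_strings:
        collection.append(ET.fromstring(xml_text))
    return collection


# Two hypothetical, heavily abbreviated records.
records = [
    "<PBCoreDescriptionDocument><title>Interview A</title></PBCoreDescriptionDocument>",
    "<PBCoreDescriptionDocument><title>Interview B</title></PBCoreDescriptionDocument>",
]
collection = wrap_collection(records)

# The ingesting side only needs to walk the children of the wrapper.
titles = [doc.findtext("title") for doc in collection]
print(titles)
```

Any system that can produce the individual records can produce this, which is the point: the wrapper is what gets exchanged, not the system behind it.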
I hope this is useful to anyone who might be looking for systems for cataloging media assets and doing various things with them like creating websites and PBCore records or whatever metadata format. I used ExpressionEngine but the basic method would work with Drupal, Plone, and other CMSs and frameworks. Most importantly, regardless of the system used, I hope this demonstration of the power of PBCoreCollection informs the development of PBCore 2.0, which is now in progress.
Fun new category added to PBCoreresources.org: Change Requests
If things appear quiet in pbcoreland, that would be only a surface-level view. A lively discussion is taking place, for example, on the American Archive Pilot Project Basecamp site. People are asking questions about why PBCore elements are as they are, how PBCore matches the requirements of their collections and projects, and poking it with sticks in various ways. The AMIA Conference arrives in St. Louis next week, following a year in which many AMIA folks have submerged PBCore in white fire and subzero liquids. Other open source software projects are leveraging PBCore to exchange data and media files between far-flung systems. Lots of us have been throwing tons of actual content against the PBCore wall, and seeing which pieces stick or bounce back.
PBCore 2.0 will be coming, and it needs our help.
So I'm putting out the call (with the encouragement of Paul Burrows) for something we'll categorize on this site as Change Requests. Many suggestions have already been made on various listservs and other online spaces, and we'll be looking through those to compile them here. If you have other suggestions for changes to PBCore, you can add them here yourself. (Anyone who is a Member of this site can post to it, and you can sign up on the home page.) Please make sure to assign your entry to the category Change Request. We can then more easily filter these into a pile, and sort them into subcategories as needed.
I'll be an active curator as needed, so just let me know what "as needed" means to you, and let 'er rip.
How to reference a related web page?
Say you have an audio archive of a radio news story that's on a web page. In the PBCore record for the audio archive, how do you reference the web page?
Would it be:
<pbcoreRelation> <relationType>Is Part Of</relationType> <relationIdentifier>http://will.illinois.edu/news/story-item-foo/</relationIdentifier> </pbcoreRelation>
Or perhaps its relationType should be Is Referenced By.
I have a reservation about using pbcoreRelation for this, because it assumes the web page is a "media item" capable of being expressed in another PBCore record. (See: http://www.pbcore.org/PBCore/relationType.html) I doubt PBCore was ever intended to catalog web pages, and the practice of doing that would seem...daunting.
Regardless, this question arises in practice: what if you have a media player capable of ingesting PBCore records and playing the media file, and you want to link to the original web page on which the media file was first published? This is not a hypothetical question. If pbcoreRelation isn't the way to do this, it would be important to settle the question very soon.
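For what it's worth, here is roughly what the player side could look like if pbcoreRelation were used this way. This is a Python sketch under that assumption only. The function name is made up, and treating "Is Part Of" and "Is Referenced By" as link-worthy types is my guess, not anything PBCore specifies.

```python
import xml.etree.ElementTree as ET

# A well-formed version of the relation discussed above.
RECORD = """<pbcoreRelation>
  <relationType>Is Referenced By</relationType>
  <relationIdentifier>http://will.illinois.edu/news/story-item-foo/</relationIdentifier>
</pbcoreRelation>"""


def related_links(record_xml, link_types=("Is Part Of", "Is Referenced By")):
    """Return (relationType, URL) pairs a player could show as links."""
    root = ET.fromstring(record_xml)
    links = []
    for relation in root.iter("pbcoreRelation"):
        rel_type = (relation.findtext("relationType") or "").strip()
        identifier = (relation.findtext("relationIdentifier") or "").strip()
        # Only identifiers that look like web addresses are usable as links.
        if rel_type in link_types and identifier.startswith(("http://", "https://")):
            links.append((rel_type, identifier))
    return links


links = related_links(RECORD)
print(links)
```

The catch is visible in the code: the player has to guess from the identifier's shape that it is a web page rather than another PBCore record.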
PBCore Genre Picklist from Hell
Let's be honest: The controlled vocabularies for pbcoreGenre suggested at pbcore.org lack relevance in many cases. I mean, "boat"? The main genre list suggested, "PBCore + Tribune Media Services Genre Categories (TiVo)," is mostly very good as far as it goes. But it doesn't go far enough.
And here's the problem: Because it's on the official PBCore website, it looks to many people like the Official PBCore Genre List. I've spoken with several PBCore users (speak up if you wish) who wanted to use certain genre terms not on the list, but didn't think it would be valid. It is valid, as long as you also declare the genreAuthorityUsed to identify the genre list.
This really matters when you want to exchange stuff between systems that speak PBCore, and you want that stuff to show up in the right places. By using a controlled vocabulary that is common to the systems exchanging the stuff, things work as intended. If I call something "boat" and you're expecting "marine," things fall apart. If I use "Horse" as in the suggested picklist, and your system wants to call it "Equestrian," we have a problem.
So what would move this forward? I'll suggest something: People should compile a genre list for a given PBCore user community (yes these really exist), and document it clearly for that community. Code it into applications (in drop-down lists for example) so everyone selects terms from the same genre list. Name that list, and you've got a valid new PBCore genreAuthorityUsed.
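Here is a tiny Python sketch of what "name that list" buys you on the receiving end. The list name, its terms, and the function are all hypothetical. The point is only that a term is checked against the list its genreAuthorityUsed declares.

```python
# A hypothetical community genre list, keyed by its genreAuthorityUsed name.
GENRE_LISTS = {
    "Example Station Group Genre List": {"Equestrian", "Marine", "Oral History"},
}


def check_genre(term, authority, known_lists=GENRE_LISTS):
    """Check a genre term against the list named by its authority."""
    if authority not in known_lists:
        return "unknown authority"
    if term not in known_lists[authority]:
        return "term not in list"
    return "ok"


print(check_genre("Equestrian", "Example Station Group Genre List"))
print(check_genre("Horse", "Example Station Group Genre List"))
print(check_genre("Horse", "Some Other List"))
```

A system that gets "term not in list" at least knows which list to consult or map from, instead of silently filing "Horse" nowhere.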
Shot Descriptions - standard or add-on for PBCore?
Anyone have a recommended standard for describing the shots themselves (within the programs)? I use PBCore to describe content, but need to describe it in better, more exhaustive detail (e.g. wide shot, exterior, night). Does PBCore allow for this? Is there a controlled vocabulary in place (or available from another source that you can recommend) to describe footage in this way?
Thanks for the help.
Story-related images, WTF?
Let's say you have a radio news web page, with a title, text description, byline, date, and audio archive. All straightforward to encode in PBCore. But what if you have a related image, as is often the case? (Example: http://will.illinois.edu/news/story/fedrb-f/, or all over npr.org)
We could make a separate PBCore record for the image, then include <pbcoreRelation> element containers to connect the image with the radio news story with "Is Part Of" and "Has Part" values. That means we'd be spending lots of time cataloging images. That might be a good thing, but is it the only PBCore way of doing things?