Article category: American Archive
Please help with the PBCore User/Non-user survey
The PBCore team is looking for all feedback on use and non-use of PBCore by people who manage audiovisual collections. This’ll help inform the work happening now on improving the schema, and whatever else can be done to make PBCore more useful. You can take the survey here:
Please do it, you know you want to! It’ll be good for you!
Oh, also, if you take the survey before May 5, you have a chance at winning a PBCore mug featuring our new PBCore logo! Who wouldn’t want a PBCore mug? Everyone who completes the survey will be entered in a drawing, and three winners will be announced on Friday, May 9th.
The logo is pretty cool, no?
PBCore rides again
As the denizens and constituents of pbcoreresources.org will know, this community blog got started when somebody who shall remain nameless dropped the ball on continuing support for PBCore back in 2008. By that time, many audiovisual archives and developers had adopted PBCore as a promising metadata standard for audio and video assets and collections. Some community dialogue and self-support were needed, and the PBCore community has grown significantly since then.
Well, here’s some good news: PBCore is back in action, with the official support of WGBH and the Library of Congress. It’s now under the care of the American Archive, which has been funded by the Corporation for Public Broadcasting. CPB has renewed its commitment to preserving the audiovisual treasures created over decades by public broadcasters and producers in the United States.
There is some work to do on the PBCore schema, and a really good team of people is working on that now. The canonical PBCore website will soon be refreshed, and other teams are working on educational and support materials.
You can get the full rundown in the announcement on the American Archive website: PBCore is back in action!
Metadata as media
(As always, I speak only for myself as a media producer and archivist.)
I attended the panel discussion on PBCore at the recent Open Video Conference, and was struck by something that should have been obvious. Those of us pushing development of PBCore have failed to clarify one basic thing: What is PBCore for? I’ve been to workshops and sessions on PBCore over the past six years and have been on metadata panels at the AMIA Conference, the PBS Tech Conference, NETA, and iMA. We often focused on explaining the PBCore elements and why they are useful for cataloging media assets. But at the OVC, the question was raised “Why do we need PBCore to catalog our stuff if we already have a good media database?” The question reveals a conflation of two distinct things: having a media database, and being able to easily interoperate with other databases.
Most of us producers are not, after all, experts in database administration, XML, or programming in general. During the American Archive Pilot Project I talked with people at other public TV and radio stations trying to “use” PBCore without adequate tools, and without understanding why they had to use it in the first place. If the answer is “you need a data model for cataloging your media assets,” there are many other catalog and data models. A better answer is you use PBCore to create shareable metadata. If you have a media collection and you want to combine it with other collections, PBCore provides a machine-readable translation layer between systems.
Some people have asked: why use PBCore instead of something simpler like RSS or Atom? I think that’s a really good question.
You can stuff lots of descriptive metadata into RSS or Atom. Their schemas are understood by a wide range of applications, and they are simple to implement. At the other end of the spectrum, MPEG-7 provides an exhaustive schema for describing multimedia content…emphasis on the word “exhaustive.”
PBCore is somewhere between the simplicity of RSS and the verbose complexity of MPEG-7. It provides a level of detail useful to media archives, without being ridiculous to implement. It’s sort of a “just right” format, allowing simple producers like me to share a great deal of useful metadata about my media assets with any system that can parse PBCore XML.
So in the example of the American Archive Pilot Project, my station used a MySQL-based Content Management System to catalog several hundred media assets. (See my earlier post for details.) With the CMS I could render web pages and an RSS feed, plus PBCore records for each asset. The AAPP project portal could just ingest my PBCore records into the national AAPP database, getting much more detail on each asset than would be provided by RSS.
Audio and video are media that connect human beings. RSS and PBCore are media that connect machines.
You can have a fantastic media database not designed around the PBCore standard. You can create a PBCore representation of that database by exporting XML records based on the PBCore schema. You can create other representations of your data based on RSS, JSON, and other formats. An example of this is the NPR API query generator, which provides multiple output format options including RSS, Atom, HTML, and NPR’s proprietary (and wonderfully detailed and useful) NPRML.
Given the flexibility of today’s tools, we can generate multiple different representations of our media database for different purposes. So what’s the use-case scenario with PBCore? With RSS or Atom, we know that many other systems can ingest our data. What systems can ingest PBCore?
A growing number of systems that speak PBCore have the word “archive” in them. Importantly, I hear the American Archive would adopt PBCore as a primary means for ingesting metadata from contributor collections. This would allow each contributor to use whatever database best suits their local needs, as long as each local system can create shareable metadata in the PBCore format.
PBCore won’t be used by media consumers and consumer-level applications like iTunes. PBCore will be used by media archives and the systems that contribute to them. That’s what PBCore is for.
Use of PBCore in the American Archive Pilot Project
Illinois Public Media was one of the 20-some public TV and radio stations in the CPB-funded American Archive Pilot Project. The AAPP required participating stations to use PBCore as a metadata format, at least in principle. I decided to push implementation of PBCore in my AAPP content collection as far as possible using the toolset I used on a previous video archive project (Prairiefire on WILL-TV).
This toolset is based on the website Content Management System called ExpressionEngine, which makes setting up a particular database structure rather easy. I set up the database structure based on PBCore elements, with controlled vocabularies reflecting the AAPP taxonomy and suggested PBCore picklists. I then created XML templates in ExpressionEngine to render my AAPP collection metadata as valid PBCore records. I then went one step further, following discussions with Dan Jacobson and David Rice, and created a PBCoreCollection wrapper containing all 235 of the PBCore item records (each as a PBCoreDescriptionDocument) in my collection. The national portal for the AAPP, being developed and hosted at Oregon Public Broadcasting, was able to simply ingest the PBCoreCollection, demonstrating the viability of this approach to aggregating a large collection from multiple content sources.
This article details the methods used to accomplish this in ExpressionEngine. Similar methods could be used in Drupal, which we’re working on now.
In ExpressionEngine, one can easily define a set of fields to input data. For example a blog would need fields for a Title, a Body, and maybe a separate Image upload field along with a label field for the image (so you could add a caption or an alt tag at least). When you create these fields, you also pick a field type: textarea, dropdown list, file upload, etc. EE has several pre-defined field types and there are dozens of addons from third-party developers to add more.
One of the really great EE addons is FieldFrame, developed by Brandon Kelly. FieldFrame is a framework for developing new EE fieldtypes, and there are a bunch of good ones. The most important for our EE PBCore tool is called FF Matrix, which allows you to bundle several fields in a “row” of related data.
Here’s the way you create an FF Matrix field in ExpressionEngine:
With an FF Matrix field, you can do things like enter a PBCore subject tied to a subjectAuthorityUsed, or a title along with its titleType. Since most PBCore elements are wrapped in pairs like this, it’s important to solve this in a straightforward way. With FF Matrix, you can enter as many linked pairs as needed. For example, when an item has many subject terms, each term is wrapped individually along with its corresponding subjectAuthorityUsed.
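To make the pairing concrete, here is a minimal Python sketch of the idea. It is not the ExpressionEngine template I used, just an illustration. The sample rows and the function name are made up for the example, the XML namespace is left out for brevity, and the root element follows the PBCoreDescriptionDocument spelling used in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical matrix rows: each row pairs a subject term with its authority.
subject_rows = [
    {"subject": "World War, 1939-1945", "subjectAuthorityUsed": "LCSH"},
    {"subject": "Oral history", "subjectAuthorityUsed": "LCSH"},
]


def subjects_to_xml(rows):
    """Wrap each subject/authority pair in its own pbcoreSubject element."""
    doc = ET.Element("PBCoreDescriptionDocument")  # namespace omitted in this sketch
    for row in rows:
        wrapper = ET.SubElement(doc, "pbcoreSubject")
        ET.SubElement(wrapper, "subject").text = row["subject"]
        ET.SubElement(wrapper, "subjectAuthorityUsed").text = row["subjectAuthorityUsed"]
    return doc


doc = subjects_to_xml(subject_rows)
print(ET.tostring(doc, encoding="unicode"))
```

Each row becomes one wrapper element, so a term never gets separated from the authority it came from, no matter how many rows an item has.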
Here’s the PBCore Item entry form showing a number of such fields (but not the entire form which is a bit long):
We used this form to enter all the Intellectual Content and Intellectual Property metadata for each media item. Nothing in this Item form relates to the physical or digital Instantiation of that item. For that we used a different form with fields and fieldtypes defined specifically for Instantiation metadata. Here’s the fun part: One of the fieldtypes in the Instantiation form is a “relationship” field, which allows you to select an existing Item to which the Instantiation should be linked. So if you have several Instantiations of the same Item, like a WAV file, an MP3, and an analog tape, you create an Instantiation record for each and link them to the Item.
This proved to be a quick and effective way to link multiple Instantiations with a single Item.
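The Item/Instantiation link can be sketched in a few lines of Python. This is only a rough model of the relationship field, not how ExpressionEngine stores it; the class names, field names, and sample values are all assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    title: str


@dataclass
class Instantiation:
    item_id: str       # plays the role of the "relationship" field
    format_name: str   # e.g. a WAV file, an MP3, an analog tape
    location: str


def group_by_item(instantiations):
    """Collect every Instantiation under the Item it points to."""
    grouped = defaultdict(list)
    for inst in instantiations:
        grouped[inst.item_id].append(inst)
    return grouped


item = Item("item-1", "WWII oral history interview")
instantiations = [
    Instantiation("item-1", "WAV", "server"),
    Instantiation("item-1", "MP3", "web"),
    Instantiation("item-1", "analog tape", "shelf"),
]
grouped = group_by_item(instantiations)
print(item.title, [i.format_name for i in grouped[item.item_id]])
```

The Item record never needs to change when a new copy turns up. You just add another Instantiation that points at it.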
You might be able to see that some of the fields are blank, and their instructions say things like “formatDataRate - If MP3 file don’t enter anything.” Lots of the technical metadata, like formatFileSize, could be extracted automatically from the digital files by the system, so we don’t have to enter that data by hand. EE has a nice addon called MP3 Info + that does most of that work.
David Rice has developed better methods of reading file metadata into his PBCore Records Repository using a free tool called MediaInfo. We should get him to write more about that, as I’m sure it’s work that could be reused in other systems.
After entering all the metadata for our collection using the two forms above, the payoff is in rendering everything in usable form. Since it’s all in the CMS, it’s a simple matter to make a website displaying everything, and providing media players for the files. In fact we did this initially for the catalogers so they could work remotely and listen to and view the audio and video files.
This site was intended for that purpose: http://will.illinois.edu/metadata/aapp-inventory-all/.
As the catalogers added descriptive metadata, the site became much more interesting! We added as much descriptive stuff as possible, even full tape logs for some of the World War II oral history interviews. I chose not to display all that metadata on the web page, but it is rendered in the PBCore XML record for each item.
For example, here is a web page for one such interview: http://will.illinois.edu/metadata/aapp-inventory-all/WWII_oral_history_WesleyMatthews2008-02-21
And here is the PBCore record for the same interview: http://will.illinois.edu/metadata/pbcoreAAPP/wwii_oral_history_wesleymatthews2008-02-21
The way these are rendered is simple: an HTML template for the web page, and an XML template for the PBCore record, both drawing from the same database. In ExpressionEngine this is easy to set up, and once it’s set up, you’re done.
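Here is a small Python sketch of the same two-templates-one-record idea, for anyone not using ExpressionEngine. The record, the template strings, and the render function are hypothetical, and the PBCore template is heavily simplified (no namespace, only two elements), so treat it as an illustration rather than a valid PBCore record.

```python
from html import escape
from string import Template

# One hypothetical database record feeds both outputs.
record = {
    "title": "WWII oral history interview",
    "description": "Tape log & summary",
}

HTML_TEMPLATE = Template("<h1>$title</h1>\n<p>$description</p>")

PBCORE_TEMPLATE = Template(
    "<PBCoreDescriptionDocument>"
    "<pbcoreTitle><title>$title</title></pbcoreTitle>"
    "<pbcoreDescription><description>$description</description></pbcoreDescription>"
    "</PBCoreDescriptionDocument>"
)


def render(template, rec):
    """Escape every value, then fill the template with the record's fields."""
    safe = {key: escape(value) for key, value in rec.items()}
    return template.substitute(safe)


html_page = render(HTML_TEMPLATE, record)
pbcore_xml = render(PBCORE_TEMPLATE, record)
print(html_page)
print(pbcore_xml)
```

The record is entered once, and each template decides how much of it to show. That is why the web page can stay short while the XML record carries everything.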
Finally, as mentioned above, I chose to try implementing the idea of a PBCoreCollection wrapper element, enclosing all 235 of the individual PBCoreDescriptionDocuments in my AAPP media collection. This is, of course, not a valid wrapper element in any PBCore version to date. This experience suggests that it should be. OPB was able to ingest my entire collection in a single gulp from this URL. Other stations in the AAPP were able to export using the same method (PBCoreCollection) even though they have different local systems. The ability to render a PBCoreCollection is all that matters, not the underlying system that rendered it.
I hope this is useful to anyone looking for a system to catalog media assets and do various things with them, like creating websites and rendering PBCore records or any other metadata format. I used ExpressionEngine, but the basic method would work with Drupal, Plone, and other CMSs and frameworks. Most importantly, regardless of the system used, I hope this demonstration of the power of PBCoreCollection informs the development of PBCore 2.0, which is now in progress.
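The wrapper idea is simple enough to sketch in Python. This is an illustration under assumptions, not the template I actually used: the function name and the two sample records are invented, the namespace is omitted, and PBCoreCollection is the experimental element described above, not part of any published PBCore schema at the time of writing.

```python
import xml.etree.ElementTree as ET


def build_collection(record_xml_strings):
    """Parse each record and append it to one PBCoreCollection wrapper."""
    collection = ET.Element("PBCoreCollection")  # experimental wrapper element
    for xml_text in record_xml_strings:
        collection.append(ET.fromstring(xml_text))
    return collection


# Two hypothetical, heavily trimmed records standing in for the real ones.
records = [
    "<PBCoreDescriptionDocument><pbcoreTitle><title>Interview one</title>"
    "</pbcoreTitle></PBCoreDescriptionDocument>",
    "<PBCoreDescriptionDocument><pbcoreTitle><title>Interview two</title>"
    "</pbcoreTitle></PBCoreDescriptionDocument>",
]

collection = build_collection(records)
print(ET.tostring(collection, encoding="unicode"))
```

Any local system that can produce the individual records can produce the wrapper too, which is the whole point: the receiving portal only has to fetch and parse one document.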
Job Posting for American Archive Executive/Senior Director
This could be a very nice job for someone:
"To provide leadership and strategic direction in guiding and implementing all aspects of the American Archive, public broadcasting's comprehensive archive of valuable radio and television programming, ensuring its collection, management and preservation..."
Also, the position involves "working with the system on the growth and evolution of PBCore and related metadata models..." It seems clear from this and from earlier discussions at CPB that the management of PBCore and the American Archive are likely to be closely linked.
For the full job announcement etc., see: http://cpb.org/jobline/index.php?mode=print_listing&listing_id=6674