How to reference a related web page?
Say you have an audio archive of a radio news story that's on a web page. In the PBCore record for the audio archive, how do you reference the web page?
Would it be:
<pbcoreRelation>
  <relationType>Is Part Of</relationType>
  <relationIdentifier>http://will.illinois.edu/news/story-item-foo/</relationIdentifier>
</pbcoreRelation>
Or perhaps the relationType should be "Is Referenced By".
I have a reservation about using pbcoreRelation for this, because it assumes the web page is a "media item" capable of being expressed in another PBCore record. (See: http://www.pbcore.org/PBCore/relationType.html) I doubt PBCore was ever intended to catalog web pages, and the practice of doing that would seem...daunting.
Regardless, the question arises for a practical reason. Suppose you have a media player that can ingest PBCore records and play the media file, and you want to link to the original web page on which the media file was first published. This is not a hypothetical question. If pbcoreRelation isn't the way to do this, it is important to settle the question very soon.
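Here is a minimal sketch of both sides of this: writing the relation into a record, and a player looking it up. It assumes the pbcoreRelation structure from the snippet above (relationType plus relationIdentifier), uses PBCoreDescriptionDocument as the root element, and omits the PBCore XML namespace for brevity. The helper names `add_relation` and `find_related_page` are hypothetical. They are not part of PBCore or of any existing player.

```python
import xml.etree.ElementTree as ET

def add_relation(parent, relation_type, identifier):
    """Append a pbcoreRelation container to a record element (hypothetical helper)."""
    relation = ET.SubElement(parent, "pbcoreRelation")
    ET.SubElement(relation, "relationType").text = relation_type
    ET.SubElement(relation, "relationIdentifier").text = identifier
    return relation

def find_related_page(record, relation_type="Is Part Of"):
    """Return the first relationIdentifier whose relationType matches, or None."""
    for relation in record.iter("pbcoreRelation"):
        if relation.findtext("relationType") == relation_type:
            return relation.findtext("relationIdentifier")
    return None

# Build a toy record (PBCore namespace omitted for brevity).
record = ET.Element("PBCoreDescriptionDocument")
add_relation(record, "Is Part Of", "http://will.illinois.edu/news/story-item-foo/")

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# A player ingesting the record can then look up the originating page.
parsed = ET.fromstring(xml_text)
print(find_related_page(parsed))
```

Switching to "Is Referenced By" changes only the relationType string the player searches for. That is one reason to agree on the vocabulary early: every player has to hard-code whichever value is chosen.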
Thoughts?
Drupal developer updating Media module looking for metadata handling feedback
The developer working to update media handling in Drupal through the Google Summer of Code program is looking for feedback on metadata handling. This work will strongly influence how media handling and metadata work in Drupal 7. I've posted a response that includes some references to PBCore, but any other feedback would be appreciated. If you'd like to see better support for PBCore in Drupal, now is the time to help influence how that happens. http://groups.drupal.org/node/22915
Instantiations, Components, and Essence Tracks
PBCore instantiation records work well for documenting renditions of an asset that consist of a single tape or file. What should the protocol be when an instantiation requires multiple tapes, reels, or files? How can PBCore efficiently document a rendition of an asset that is composed of multiple objects, each with its own set of technical metadata?
Disclaimer: this post is based on my own experience using PBCore 1.2.1 and the conclusions I drew from it.
Within a PBCore asset record every element may be repeated: one asset may have as many titles, contributors, and instantiations as one desires. Within an instantiation, however, much of the descriptive information may occur only once, because an instantiation is a single rendition of an asset. For instance, an instantiation may have only one formatDigital (i.e., one MIME type), one formatGenerations, and one formatFileSize. The PBCore instantiation element appears to be designed both to document a single item or file and to document "all the details on how the asset is actualized" (quoted from the PBCore 1.2.1 XSD). In some cases, though, documenting how an instantiation actualizes the asset requires multiple files or multiple items. Here are three examples:
- an asset describing a musical album may have one instantiation that is a single CD and another that is the digitized version of that CD, consisting of 10 digital files with one file per track. The 10 digital files together represent the same asset as the single CD,
- an asset describing a film exists in a collection as two instantiations: a three-reel 35mm film print and a single Digibeta tape (this is similar to the example that Mary Miller describes at http://www.pbcoreresources.org/article/dealing_with_multi_part_instantiations/),
- an asset documenting a television episode has two instantiations: a single Digibeta tape, and a pair of elementary stream files (an .m2v video stream and a .wav audio stream).
All three examples involve audiovisual material whose number of components changes during reformatting. Sometimes the number goes from more to fewer (example 2, the film transfer). Sometimes it goes from fewer to more (example 1, the digitization of a CD).
Suppose PBCore instantiations are understood to represent only single-item instantiations. Then the individual digitized tracks of a CD, or the individual reels of a film print, would each need their own asset record: one asset for the CD and 10 more for the individual digitized tracks. This is clearly less efficient than treating the set of digitized tracks as one instantiation and the CD as another instantiation of the same asset. Another option would be to zip or tar the 10 tracks into one file, but making that a prerequisite for effective PBCore description has its own disadvantages. Alternatively, the instantiation could describe a directory that contains the 10 file-based tracks.
Best practices for documenting multi-object instantiations are not clear. With the m2v and wav elementary streams, the two files must work together to represent the asset. Yet each file has its own values for 'formatDigital', 'formatFileSize', and 'formatDataRate', and possibly its own 'formatLocation'. Each of these elements may occur only once per instantiation. To define the m2v and wav elementary streams as a single instantiation, some options are:
- the two files could be moved into a directory or folder, which would serve the role of an audiovisual wrapper. In this case formatDigital would be 'application/x-not-regular-file' (referring to the directory), formatFileSize could be the directory size, and so on,
- or the data from the individual files could be shoehorned into the instantiation fields meant for individual files, so formatDigital would be "video/mpeg audio/x-wav" and formatFileSize could be the sum of the two file sizes,
- or the m2v and wav files could be zipped or tarred into a single file, or multiplexed into an audiovisual wrapper, so that the set is represented by a single file. The analog equivalent would be splicing film reels together so that the metadata fits more cleanly into an instantiation record.
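To make the second option concrete, here is a small sketch that collapses per-file values into single instantiation-level fields. The `ComponentFile` model and the `shoehorn` function are hypothetical, and the file sizes are made-up illustrative numbers.

```python
from dataclasses import dataclass

@dataclass
class ComponentFile:
    """Technical metadata for one file within an instantiation (hypothetical model)."""
    name: str
    mime_type: str
    size_bytes: int

def shoehorn(files):
    """Collapse several files into single instantiation-level values.

    formatDigital becomes a space-separated list of MIME types, and
    formatFileSize becomes the sum of the individual file sizes.
    """
    return {
        "formatDigital": " ".join(f.mime_type for f in files),
        "formatFileSize": str(sum(f.size_bytes for f in files)),
    }

# Illustrative values only; the sizes are invented.
streams = [
    ComponentFile("episode.m2v", "video/mpeg", 4_000_000_000),
    ComponentFile("episode.wav", "audio/x-wav", 500_000_000),
]
print(shoehorn(streams))
```

The sketch also shows what is lost. From "4500000000" alone there is no way to recover the size of each file, which is the loss of precision the next paragraph describes.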
None of these options is ideal for describing a complex object. Depending on the option, the resulting technical documentation becomes less precise, the implementation of instantiation becomes less standardized, or the metadata process burdens collection management. This is the same sort of challenge that arose in pre-1.2 versions of PBCore. There, discrete track-level metadata values had to be concatenated and labeled within single fields, such as formatDataRate = "Total 1930 kilobits/sec; Video 1700 kilobits/sec; Audio 230 kilobits/sec". pbcore.org documented this procedure at http://www.pbcore.org/PBCore/formatDataRate.html, stating that "the pbcoreInstantiation container should not be repeated in order to express a video data rate and an associated audio data rate. The two combined are part of a single instantiation for an asset".
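The pre-1.2 convention can be reproduced with a short sketch that assumes integer rates in kilobits/sec. `concatenate_rates` and `split_rates` are hypothetical helpers, not PBCore tools. Packing the values is easy. Any consumer, however, has to parse the string back apart with an ad hoc pattern that only works if every producer used the same labels and units.

```python
import re

def concatenate_rates(track_rates):
    """Pack per-track data rates (kilobits/sec) into one pre-1.2-style formatDataRate string."""
    total = sum(track_rates.values())
    parts = [f"Total {total} kilobits/sec"]
    parts += [f"{label} {rate} kilobits/sec" for label, rate in track_rates.items()]
    return "; ".join(parts)

def split_rates(value):
    """Recover the labeled rates; relies on every producer using this exact wording."""
    pattern = re.compile(r"(\w+) (\d+) kilobits/sec")
    return {label: int(rate) for label, rate in pattern.findall(value)}

packed = concatenate_rates({"Video": 1700, "Audio": 230})
print(packed)               # Total 1930 kilobits/sec; Video 1700 kilobits/sec; Audio 230 kilobits/sec
print(split_rates(packed))  # {'Total': 1930, 'Video': 1700, 'Audio': 230}
```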
I have two suggestions for this challenge. The first is to document best practices for using PBCore 1.2.1 as is to describe these complex objects, in ways that fit the examples above. The second is to modify PBCore by adding an element between instantiation and essenceTrack, perhaps called 'component'. Typically PBCore would document single-component instantiations. When a single instantiation is made up of multiple tapes, reels, or files, however, the instantiation would contain as many component records as needed, each with its own technical metadata.
In this arrangement, some of the values currently attached to instantiation would move to the component level. Whereas the PBCore 1.2.1 structure is
instantiation { {formatIdentifier, formatIdentifierSource } dateCreated, dateIssued, formatPhysical, formatDigital, formatLocation, formatMediaType, formatGenerations, formatFileSize, formatTimeStart, formatDuration, formatColors, formatTracks, formatChannelConfiguration, language, alternativeModes {essenceTrack see below } {dateAvailableStart, dateAvailableEnd } { annotation } }
essenceTrack {essenceTrackType, essenceTrackIdentifier, essenceTrackIdentifierSource, essenceTrackStandard, essenceTrackEncoding, essenceTrackDataRate, essenceTrackTimeStart, essenceTrackDuration, essenceTrackBitDepth, essenceTrackSamplingRate, essenceTrackFrameSize, essenceTrackAspectRatio, essenceTrackFrameRate, essenceTrackLanguage, essenceTrackAnnotation }
incorporating a component level could look like
instantiation { assemblyMode, formatMediaType, formatGenerations, formatFileSize, formatColors, formatChannelConfiguration, language, alternativeModes, {component see below } {dateAvailableStart, dateAvailableEnd } { annotation } }
component { {componentIdentifier, componentIdentifierSource } dateCreated, dateIssued, componentPhysical, componentDigital, componentLocation, componentTimeStart, componentDuration, componentTracks, {essenceTrack see below } }
essenceTrack {essenceTrackType, essenceTrackIdentifier, essenceTrackIdentifierSource, essenceTrackStandard, essenceTrackEncoding, essenceTrackDataRate, essenceTrackTimeStart, essenceTrackDuration, essenceTrackBitDepth, essenceTrackSamplingRate, essenceTrackFrameSize, essenceTrackAspectRatio, essenceTrackFrameRate, essenceTrackLanguage, essenceTrackAnnotation }
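A rough sketch of the proposed nesting (instantiation > component > essenceTrack) as Python data classes serialized to XML might look like this. Element names come from the draft outline above. `component`, `componentIdentifier`, `componentDigital`, `componentDuration`, and `assemblyMode` are proposals, not PBCore 1.2.1 elements. Only a few fields are modeled, and the durations and encodings are illustrative.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class EssenceTrack:
    track_type: str   # essenceTrackType, e.g. "Video" or "Audio"
    encoding: str     # essenceTrackEncoding

@dataclass
class Component:
    identifier: str   # componentIdentifier (proposed element)
    digital: str      # componentDigital: this file's MIME type (proposed element)
    duration: str     # componentDuration (proposed element)
    essence_tracks: list = field(default_factory=list)

@dataclass
class Instantiation:
    assembly_mode: str  # assemblyMode (proposed element)
    media_type: str     # formatMediaType
    components: list = field(default_factory=list)

def to_xml(inst):
    """Serialize the draft instantiation > component > essenceTrack nesting."""
    root = ET.Element("instantiation")
    ET.SubElement(root, "assemblyMode").text = inst.assembly_mode
    ET.SubElement(root, "formatMediaType").text = inst.media_type
    for comp in inst.components:
        c = ET.SubElement(root, "component")
        ET.SubElement(c, "componentIdentifier").text = comp.identifier
        ET.SubElement(c, "componentDigital").text = comp.digital
        ET.SubElement(c, "componentDuration").text = comp.duration
        for track in comp.essence_tracks:
            t = ET.SubElement(c, "essenceTrack")
            ET.SubElement(t, "essenceTrackType").text = track.track_type
            ET.SubElement(t, "essenceTrackEncoding").text = track.encoding
    return root

# The television-episode example: two elementary streams in one instantiation.
episode = Instantiation("multiplexion", "Moving Image", [
    Component("episode.m2v", "video/mpeg", "00:28:30", [EssenceTrack("Video", "MPEG-2")]),
    Component("episode.wav", "audio/x-wav", "00:28:30", [EssenceTrack("Audio", "PCM")]),
])
print(ET.tostring(to_xml(episode), encoding="unicode"))
```

Each file keeps its own MIME type and duration, so nothing has to be concatenated into a single field.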
In this draft I added a field called 'assemblyMode'. Something like assemblyMode is needed to document how the components relate to each other. For the digitized CD, the components would be assembled by concatenation and played back-to-back, so assemblyMode could equal "concatenation". For the m2v and wav elementary streams, assemblyMode would be "multiplexion", since the components need to be multiplexed for playback. With "concatenation", the duration of the instantiation equals the sum of the component durations. With "multiplexion", the instantiation's duration is roughly equal to the duration of any one component. The assemblyMode value therefore affects how other pieces of metadata are determined.
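The duration rule can be sketched as a small function. Durations are in seconds. For "multiplexion" it takes the longest component, which is an assumption for "roughly equal": the post does not say which component wins when the lengths differ slightly. The track lengths in the usage lines are illustrative.

```python
def instantiation_duration(assembly_mode, component_seconds):
    """Derive an instantiation's duration (in seconds) from its components.

    "concatenation": components play back-to-back, so durations add up.
    "multiplexion": components play simultaneously, so (as an assumption)
    the instantiation lasts as long as its longest component.
    """
    if not component_seconds:
        raise ValueError("an instantiation needs at least one component")
    if assembly_mode == "concatenation":
        return sum(component_seconds)
    if assembly_mode == "multiplexion":
        return max(component_seconds)
    raise ValueError(f"unknown assemblyMode: {assembly_mode!r}")

# Ten CD tracks played back-to-back (illustrative lengths).
print(instantiation_duration("concatenation", [240] * 10))   # 2400
# An .m2v stream and a .wav stream played together.
print(instantiation_duration("multiplexion", [1710, 1712]))  # 1712
```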
Since the instantiation should contain "all the details on how the asset is actualized" (as stated in the PBCore 1.2.1 XSD), adding an element level to accommodate multi-tape or multi-object instantiations would help achieve this goal with cleaner, more descriptive data. I'd like to hear whether other PBCore users are thinking about this issue, and whether there are easier solutions I'm missing.
David Rice
AudioVisual Preservation Solutions
Time to get funky with PBCore
Yesterday somebody asked me "Is anything really happening with PBCore? Or is it a nice idea that CPB funded and then left hanging out to dry?" The answer seems to be yes, and maybe.
I'm aware of several significant PBCore projects, mostly below the CPB radar:
- An open source media player that will ingest content and metadata via PBCore records
- A Drupal profile that will include PBCore among other methods for exchanging media
- A project to build PBCore modules for other CMSs, including ExpressionEngine and Joomla
- The folks at NPR Online are adding PBCore as an output format for the NPR API
- A preservation repository for media using PBCore as its metadata foundation
I also just saw a CPB RFP for STEM projects relating to climate science, requiring the use of PBCore for all project media.
Meanwhile, OPB is tackling the next phase of the American Archive project, which could play a large role in shaping the future of PBCore. This is critical, because without a formal change-management process, active development, and support, A/V archivists and online media developers aren't likely to have confidence that PBCore will become a common standard for the long-term.
I think it should be, because PBCore is simply a great standard for A/V metadata. It's simple enough for most people to understand, but detailed enough to be truly useful. But the PBCore project needs further work, including refining the controlled vocabularies for subjects, genres, and probably everything else. The PBCore Resource Group has been dormant, and I don't see evidence that anyone else has officially taken the reins. Please correct me if I'm wrong.
I suspect this is the year that PBCore either sinks or swims. There are lots of good reasons it should emerge as a common standard, and lots of "things" being developed around it. The question is, who will take responsibility for maintaining the PBCore standard?
PBCore subject and pbcoreSubjectAuthorityUsed: Adding subject authorities
In some ways it's great that PBCore is so agnostic about using specific subject terms and authorities, but it also makes exchanging records between systems too unpredictable. If I say the subject is Climate Change, and your system uses Global Warming, we have a problem communicating between systems. PBCore.org doesn't even suggest any subject taxonomies, leaving users to fish for one or invent their own.
Here's a proposal to address this: let's pick a few subject authorities as a starting point. Certain applications of PBCore may need different subject authorities, and that's fine; they can be added. The list of possible subject authorities doesn't have to be written into the standard, but a few suggestions might help form usage patterns, preferences, and perhaps eventually best practices for certain types of content.
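Here is a sketch of how a shared authority resolves the "Climate Change" versus "Global Warming" mismatch. It maps local tags onto preferred terms before writing pbcoreSubject. The crosswalk and the authority name are placeholders invented for illustration, not taken from any real vocabulary. The child element names `subject` and `subjectAuthorityUsed` are assumed to follow PBCore 1.x naming.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk from local tags to one shared authority's preferred terms.
PREFERRED_TERMS = {
    "global warming": "Climate Change",
    "climate change": "Climate Change",
}
AUTHORITY = "Example Shared Authority"  # placeholder name, not a real vocabulary

def subject_element(local_term):
    """Build a pbcoreSubject, normalizing the term when the crosswalk knows it."""
    subject = ET.Element("pbcoreSubject")
    preferred = PREFERRED_TERMS.get(local_term.strip().lower())
    ET.SubElement(subject, "subject").text = preferred or local_term
    if preferred:
        # Only name the authority when the term actually came from it.
        ET.SubElement(subject, "subjectAuthorityUsed").text = AUTHORITY
    return subject

for term in ["Global Warming", "Climate Change", "Ocean Acidification"]:
    print(ET.tostring(subject_element(term), encoding="unicode"))
```

Two systems that share the crosswalk now emit the same subject string, so a query on "Climate Change" finds records from both. Terms the authority doesn't cover pass through unchanged and without an authority label.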
For radio news stories, for example, we might use the NPR All Topics list: http://api.npr.org/list?id=3002. I want to use this list to pull in related content from the NPR API. If I tag my content using terms from the NPR All Topics list, I can build an automated query based on those topics. More on that soon...