Source: The Promise of Big Data: Metadata and Digital Art History

When considering the challenges of digitization projects, I found myself focusing on the creation of the metadata attached to the digital images. The technical aspects of the digitizing process are complex, and format standards are important, but in my experience the truly fascinating element of digitizing is the attempt to translate the context of the image into the digital world. In the archives, context is everything. The provenance of a piece and the original order of a collection are precisely the elements archivists work to preserve. How does that work in the digital realm, and what are the implications for digital art history?

One project I spent several months on as an intern at the Special Collections and Archives (SCA) of Northern Arizona University was the creation of descriptive metadata for a photographic collection. The collection had been digitized as part of a large-scale project, with perhaps more enthusiasm than forethought, a common theme among early projects. The images were online, both in the university database and as part of Arizona Archives Online, a collective with open public access. The only information attached to the images was the technical metadata and the (sometimes incorrect) collection numbers. My task was to sort through the boxes of original acetate negatives and attach the information found there to their digital counterparts, in the form of controlled vocabulary (Library of Congress Subject Headings, or LCSH), descriptive titles, and finding aid summaries.
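A descriptive record of this kind can be pictured as a small structured document attached to each image. The following is a minimal sketch; the field names follow a simplified Dublin Core-style pattern, and the identifier and values are hypothetical illustrations, not the actual SCA records:

```python
# A hypothetical descriptive metadata record for one digitized negative.
# Field names follow a simplified Dublin Core-style pattern; the values
# are illustrative, not taken from the actual NAU SCA records.
record = {
    "identifier": "NAU.PH.413.100",       # collection/item number (hypothetical)
    "title": "Portrait of a Navajo man",  # descriptive title
    "creator": "Johnston, Philip",        # photographer/creator
    "subject_lcsh": [                     # controlled vocabulary (LCSH-style)
        "Navajo Indians--Portraits",
        "Indians of North America--Arizona",
    ],
    "type": "Image",
    "format": "Acetate negative",
}

def summarize(rec):
    """Produce a short finding-aid-style summary line from a record."""
    subjects = "; ".join(rec["subject_lcsh"])
    return f'{rec["identifier"]}: {rec["title"]} ({subjects})'

print(summarize(record))
```

The point of the controlled-vocabulary field is that the same subject string can later link records across collections, which is what makes the aggregation discussed below possible.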

As the project continued and I spent more time with the collection and related sources discovered in the archives, stories emerged: about the photographer/creator Philip Johnston, the Southwest at the turn of the century, and the lives of the Navajo people on the reservations. The images suddenly had context, both individually and within the collection. For example, these two images originally had only collection numbers:

[Miner Panning for Gold at the Uba River, California] [Portrait of Peshlakai Etsetti]

Now they have identities.

In this collection there are still some images labelled [Unidentified] or simply [Portrait of a Navajo Man]. The software used by SCA, CONTENTdm, has an option to allow comments; in this case they are open to the public. This creates an opportunity to add contextual metadata through a process scholars Shilton and Srinivasan have called “participatory appraisal”: reaching out to the community represented in the records to involve them in the appraisal of the collection.[1] Context is obviously not limited to the institutional record, and for every image it is entirely multifaceted. Yet what are the possibilities once we have this context (or some version of it) attached to the online object?

Besser and Hubbard briefly mention RDF and the idea of “metadata crosswalks” in their “Introduction to Imaging.” They reference the Getty’s use of these frameworks to share metadata, and they mention the Semantic Web as a future endeavor, but they do not go into much depth on the possibilities of metadata sharing.[2] This resource was revised in 2003, and in the thirteen years since there has been a growing focus on the power of aggregated metadata, the trend of “big data,” and what it could mean for topics ranging from cancer research to the practice of law.
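At bottom, a metadata crosswalk is a mapping from one schema’s field names to another’s, so that records created under different local conventions can be shared in a common form. A minimal sketch, assuming hypothetical local field names rather than any actual Getty crosswalk table:

```python
# A hypothetical crosswalk from a local archival schema to Dublin Core-style
# fields. Field names on both sides are illustrative only.
CROSSWALK = {
    "photographer": "creator",
    "caption": "title",
    "lcsh_terms": "subject",
    "negative_format": "format",
}

def crosswalk(record, mapping):
    """Translate a record's field names via the mapping; unmapped fields are dropped."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

local = {
    "photographer": "Philip Johnston",
    "caption": "Portrait of Peshlakai Etsetti",
}
print(crosswalk(local, CROSSWALK))
# → {'creator': 'Philip Johnston', 'title': 'Portrait of Peshlakai Etsetti'}
```

Real crosswalks also have to handle fields with no equivalent and values that need transformation, which is exactly why sharing archival metadata is harder than the mapping alone suggests.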

Digital art history also sees the possibility in this phenomenon. The Getty Research Institute has only expanded its efforts at what Besser called “metadata crosswalks”: its Provenance Index, for example, is a database of digitized auction catalogs, archival inventories, and art dealer stock books, creating the potential to trace previously unrecognized trends and specific provenances. Currently the database consists mostly of the Getty’s own digitized resources, but its scope and power will increase exponentially if and when other institutions join.

While these projects are promising, museums and other cultural institutions face unique challenges when creating descriptive metadata, simply because their holdings are themselves unique and unlikely to fit into uniform, basic descriptors. Unlike traditional bibliographic cataloging, this type of metadata cannot be easily standardized. It is why we have yet to see an OCLC for archival holdings, although there have been attempts (like ArchiveGrid). Here the technology must catch up to the ideas, and in fields like biomedical research there are solutions on the horizon. There has been a lot of buzz about the Precision Medicine Initiative, and the tech start-up Syapse believes it has the software to enable the project, using RDF triples and OWL ontologies: a possible version of the Semantic Web in reality (link).

If the immense complexities of medical big data can be navigated, there must be hope for the data needed for digital art history. The unique holdings of museums and cultural institutions are being digitized; all that is lacking is the connections, the edges to these nodes of data (Wired piece on graph theory). All of this depends, of course, on the metadata, for “images without appropriate metadata will quickly become useless,”[3] leaving more portraits of “Unidentified.” To me that means the focus of every successful digitization project should be the creation and preservation of sound metadata: the persistent state information that will, I hope, someday provide scholars answers to questions we have not yet thought to ask.
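The RDF triples mentioned above are simply subject–predicate–object statements, and the “edges” between nodes of data are what such statements supply once metadata from different institutions is aggregated. A minimal sketch using plain tuples (all URIs, prefixes, and holding institutions here are hypothetical, not an actual Getty or Syapse vocabulary):

```python
# RDF-style data as (subject, predicate, object) triples.
# All identifiers below are hypothetical illustrations.
triples = [
    ("ex:photo100",  "dc:creator", "ex:PhilipJohnston"),
    ("ex:photo100",  "dc:subject", "ex:PeshlakaiEtsetti"),
    ("ex:photo100",  "ex:heldBy",  "ex:NAU_SCA"),
    ("ex:portrait7", "dc:creator", "ex:PhilipJohnston"),  # held elsewhere
    ("ex:portrait7", "ex:heldBy",  "ex:OtherArchive"),
]

def objects(triples, subject, predicate):
    """All objects linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(triples, predicate, obj):
    """All subjects linked to `obj` via `predicate` -- a cross-collection query."""
    return [s for s, p, o in triples if p == predicate and o == obj]

# Which items, across different holdings, share the same creator?
print(subjects(triples, "dc:creator", "ex:PhilipJohnston"))
# → ['ex:photo100', 'ex:portrait7']
```

The cross-collection query at the end is the payoff: once two institutions publish triples using shared identifiers, a single query can surface connections neither catalog records on its own.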



[1] Katie Shilton and Ramesh Srinivasan, “Counterpoint: Participatory Appraisal and Arrangement for Multicultural Archival Collections,” Archivaria 63 (Spring 2007): 87-101.

[2] Howard Besser, “Introduction to Imaging,” Getty Research Institute, 2003. URL:

[3] Ibid.