Be they arts-related or not, digitized or not, archival collections can be overwhelming for scholars. Where to start? What do they contain? Where should scholars look for the information they need most? What unexpected information does the material contain, and how can they easily locate it? As both an art historian and an aspiring archivist, I see a lot of potential in data analysis tools that help users find access points within larger collections. Data visualization would help users make more informed decisions about whether a collection is worth investing time in. Of course, some types of data have to be generated by a person, so not all useful or relevant elements of an archival collection or document would necessarily be accounted for, but I still think data analysis would be beneficial if done correctly. If an art historian already had an archival collection they knew was necessary for their research, data visualization could be all the more helpful in discovering hard-to-see or unrealized themes within an artist’s life or practice.

I have very little experience with data analysis, so several authors’ projects and writings helped me understand how and when such tools would be useful (and also when data analysis is probably less than necessary). I was especially intrigued by Dan Cohen’s Searching for the Victorians project, as it translated textual material (books) into observable societal trends. Using Victorian-era books available online through the Hathi Trust, Cohen was able to generate data visualizations that showed when certain terms became more popular, and thus shed light on how people of the Victorian era conceptualized issues, concerns and the world at large. I find data visualizations such as Cohen’s to be particularly useful when approaching material that is not in my specific field. I was able to digest and understand information that would have taken pages to communicate in the written word. There were, of course, some data visualizations that I found less than useful. It is hardly surprising that the graph plotting books featuring the word ‘revolution’ spiked around the French Revolution. Still, it is crucial (ok, maybe useful) to know that people were thinking and writing about the French Revolution while it was happening, even if it is a bit obvious.

I see a lot of potential in Voyant for creating access points within artists’ archives. As artists’ archives usually contain a wide array of materials, I’m going to use this post to focus on what kind of data cleanup would be necessary to process artists’ correspondence. Initially, I had thought I would pretend I was working with handwritten sketchbooks/diaries as they are, in my opinion, some of the most valuable materials in an artist’s personal collection. I use the term sketchbook/diary as a catch-all for any notebook that contains sketches, design ideas, notes, and musings that offer insight into the artist’s mind and creative process. The prep work, data generation and clean-up processes for this kind of archival material could quickly get complicated, even messy, and in the end I’m not sure Voyant would be ideal for this kind of data analysis. I would have to somehow be able to indicate that the textual elements had accompanying images, and it doesn’t seem as if Voyant is designed for that kind of material (if anyone knows otherwise, please say so)*. Given that OCR technologies often struggle with handwritten materials, my imaginary artists’ correspondence is typed. I’d like to note, though, that if OCR could be done for handwritten letters, they could certainly be analyzed using Voyant.

In terms of preparing materials for analysis, the letters would first need to be scanned. I would then run all of the scanned materials through OCR software. This would create text files that could then be uploaded to Voyant. A decision would have to be made regarding how to separate out and identify items: would each letter be an item? Would each correspondence set (with replies) be an item? Would letters from both the artist and those with whom they corresponded be included, or would only letters written by the artist be included? As I am still a new Voyant user, I am not sure what the best answers to these questions are, but I do know that depending on the answer, the data could look different or be used differently. For letters, it would be important to exclude very common words (such as the, and, I, etc.), often called stop words, from Voyant’s analysis. Once in Voyant, the correspondence could be analyzed using a word cloud (to see what the artist was thinking/talking about most), a graph (to see the relative frequency of terms), or the reader, which allows users to click on a word and see where it appears in other documents.
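To make the stop-word step a little more concrete, here is a minimal Python sketch of the kind of counting that sits behind a word cloud. It is not how Voyant itself works internally; the stop-word list, the function names, and the two sample letters are all made up for illustration, and it assumes each letter has already been turned into plain text by OCR and is treated as its own item.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch);
# a real tool would use a much longer one.
STOP_WORDS = {"the", "and", "i", "a", "an", "of", "to", "in", "it", "is", "that"}


def term_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears in one letter."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


def corpus_frequencies(letters):
    """Combine counts across letters; `letters` maps an item name to its OCR text."""
    total = Counter()
    for text in letters.values():
        total.update(term_frequencies(text))
    return total


# Invented sample text standing in for two OCR'd letters.
letters = {
    "letter_01": "I finished the canvas and the studio is quiet.",
    "letter_02": "The studio is cold, and the canvas is still wet.",
}

for word, count in corpus_frequencies(letters).most_common(5):
    print(word, count)
```

Treating each letter as a separate entry in `letters` is only one answer to the "what is an item?" question above; grouping a whole exchange under one key would change the per-item counts but not the combined totals.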

While nothing replicates the experience of slowly reading through an artist’s correspondence**, I think Voyant could be a very useful tool for art historians when writing about an artist’s life.

*If I only had handwritten letters or materials, such as sketchbooks/diaries, I would catalog these materials and populate necessary fields like author, date, location created, material, and whether the item included visual materials (drawings etc.), and then also come up with a limited but hopefully useful set of subject tags (maybe dictated by my research needs). This information could then be translated into a spreadsheet, and analysis could be performed using Excel or Tableau.
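As a rough sketch of what that spreadsheet could look like, the Python below writes a few catalog records to CSV (a format both Excel and Tableau can open) and tallies the subject tags. The column names follow the fields listed above, but the exact names, the semicolon-separated tag convention, and the sample records are assumptions invented for illustration, not data from a real collection.

```python
import csv
import io
from collections import Counter

# Column names are assumptions based on the fields described in the footnote.
FIELDS = ["author", "date", "location_created", "material", "has_visuals", "subject_tags"]

# Invented sample records; subject tags are separated by semicolons.
records = [
    {"author": "Artist A", "date": "1962-03-14", "location_created": "New York",
     "material": "sketchbook", "has_visuals": "yes",
     "subject_tags": "landscape; color studies"},
    {"author": "Artist A", "date": "1963-07-02", "location_created": "Paris",
     "material": "diary", "has_visuals": "no",
     "subject_tags": "landscape; travel"},
]


def to_csv(rows):
    """Return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def tag_counts(rows):
    """Count how many records carry each subject tag."""
    counts = Counter()
    for row in rows:
        tags = (tag.strip() for tag in row["subject_tags"].split(";"))
        counts.update(tag for tag in tags if tag)
    return counts


print(to_csv(records))
print(tag_counts(records).most_common())
```

Keeping the tag set small and consistent matters here: the tally is only meaningful if the same idea is always recorded under the same tag.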

**I will say, I worked with an artist’s archive last summer, and I have never seen such creative use of typewriter-generated text; artists wrote in spirals and zigzags, inserted poems into the text, and really used typed text to express their emotions and moods and to set the tenor of a written exchange. Much of this text would be hard for OCR to analyze.