This week in class we are discussing data, data “tidying” and visualization, and data mining. We looked at theory and a variety of examples of how scholars have used amalgamations of huge data sets to reach conclusions and visualize trends. We noted that some of these examples were more successful than others, and as a whole the class seemed to reach a rather pessimistic conclusion: so what? What do these data sets really tell us that furthers our understanding?

We looked at the example of organizing paintings by color. I wholeheartedly agreed with a classmate’s questioning of how useful a data set of 40,000 blue images could be. Sure, she argued, we could look at the geographic spread of a pigment, iconography associated with the color, or a host of other topics, but does a massive collection of images really help a scholar on that quest? I wasn’t convinced either. What dissuaded me further was a problem I hadn’t even considered: the ways these large data sets can be skewed. Professor Bauer brought up “color pollution,” the idea that the background color of an object is also mined when building these color sets. Many coins end up in the black set because of the black velvet they are photographed against for collections, and sculptures are often tagged inaccurately because of the wall color behind them. So, to return to the hypothetical collection of 40,000 mainly blue works: not only is this huge collection perhaps not useful to me as an individual scholar trying to make a claim, it may not even be accurate.
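The color pollution problem is easy to see in miniature: if the dominant color is computed over every pixel, the background swamps the object. Here is a toy sketch in Python (invented pixel values, not any real collection’s pipeline), with a tiny grid standing in for a photograph of a coin on black velvet:

```python
from collections import Counter

# Toy 6x6 "photograph": a gold coin (center 2x2) on black velvet.
BLACK = (0, 0, 0)       # velvet background pixels
GOLD = (212, 175, 55)   # coin pixels

image = [[BLACK] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = GOLD

def dominant_color(pixels):
    """Most common RGB value in a flat list of pixels."""
    return Counter(pixels).most_common(1)[0][0]

all_pixels = [px for row in image for px in row]
object_pixels = [px for px in all_pixels if px != BLACK]  # crude "mask"

print(dominant_color(all_pixels))     # (0, 0, 0): the velvet wins
print(dominant_color(object_pixels))  # (212, 175, 55): the coin itself
```

Mined naively, this coin lands in the black set; only a masking step (which real collections often skip) recovers its actual color.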

Data mining is also used to identify trends in textual sources. Dan Cohen’s “Searching for the Victorians” is a great example of this, though it too raises the “so what” question from skeptics. Cohen and his fellow researchers were able to analyze over a million books (!!) thanks to the widespread digitization of Victorian-era literature by projects like Google Books and HathiTrust. Below is a graph of the number of books that reference “Revolution” in their titles (for now, only titles are analyzed, but analysis of full text is in the pipeline for the project):

Graph showing the frequency of the word “Revolution” in the title of Victorian books from Dan Cohen’s “Searching for the Victorians”

The graph is interesting in that it lets us see how much revolution (and therefore, perhaps, political in/stability and social unrest) was present in the consciousness of society. The spike in the middle of the graph draws the viewer’s attention, but any historian would immediately connect it to the French Revolution, about which a great deal was published and discussed. So again, you may be left with the question: “So what? What does this actually tell us?” In fact, some commenters asked just that in regard to Cohen’s post.
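Mechanically, a chart like Cohen’s is just a count of matching titles per year. A minimal sketch (invented toy records, not Cohen’s actual data or code):

```python
from collections import Counter

# Hypothetical (year, title) records standing in for digitized metadata.
books = [
    (1837, "A History of the French Revolution"),
    (1848, "Revolution in Europe"),
    (1848, "On the Late Revolution"),
    (1851, "The Great Exhibition"),
    (1860, "Revolution and Reform"),
]

# Count titles containing "revolution", grouped by publication year.
counts = Counter(year for year, title in books if "revolution" in title.lower())

for year in sorted(counts):
    print(f"{year} {'#' * counts[year]}")  # crude text-mode bar chart
```

The hard part of such projects is never this counting step; it is assembling and cleaning a million-book corpus so that the counts mean something.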

I don’t mean to be pessimistic about the use of data in the humanities; I think there is huge potential to incorporate it into research in art history and beyond. Returning to Cohen’s revolution example, I actually think there is value in simply visualizing trends. Being able to plot word usage across not just a small sample but virtually all of Victorian literature reveals the general attitudes and priorities of the population. Sometimes just showing data and trends is as valuable to scholarship as making distinct arguments.

Forensic Architecture at the Whitney Biennial as Another Case Study

Film still from “Triple Chaser” by Forensic Architecture on view at the 2019 Whitney Biennial

I want to shift back towards collecting and mining images for a brief discussion of the piece by Forensic Architecture included in this year’s Whitney Biennial. Forensic Architecture is an agency of about 20 full-time researchers, filmmakers, and technologists, along with a team of fellows, that investigates global violence, corruption, and conflict. They provide an interesting example of the ways in which image recognition and data amalgamation can be useful: as a journalistic pursuit (they showcase the role of a Whitney board member in profiting from violence), as a tool to recognize very different images and sort through huge sets of them, and simply to create art (they are exhibiting in the Whitney Biennial, after all!).

Forensic Architecture has enlisted artists, filmmakers, writers, data analysts, technologists, and academics in an intensively collaborative process. Maps and digital animations often play a critical role in the group’s work, allowing for painstaking recreations of shootings and disasters, and images are often culled from social media and scrutinized for information. Forensic Architecture’s work suggests a union of institutional critique and post-internet aesthetics, and it exists in many forms. On the group’s website, it lives as design-heavy interactive presentations. In museums, their work takes the form of installations dense with videos, diagrams, and elements of sound.

Alex Greenberger, “For Whitney Biennial, One Participant Targets Controversial Whitney Patron”

I encourage you to look more into how Forensic Architecture made the video that resulted from the larger project and was on display at the Whitney, because my limited understanding of the machine learning processes behind it also limits my ability to talk insightfully about the piece. Very simply, though: Forensic Architecture trained AI to identify images of Safariland tear gas canisters. Training image recognition software requires A LOT of images; that is one of the major barriers to its use. To get around this, they crowdsourced images of the canisters (and received a disturbing number from activists around the world). They then placed these canisters against various backgrounds and at various angles to train the software further. Again, this hugely simplifies the process, and the video they produced for the museum walks through it in much better detail.
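The augmentation step they describe, multiplying a small set of object images by pasting them against many backgrounds and positions, is a standard trick for stretching scarce training data. A toy sketch with grids of numbers standing in for images (entirely hypothetical, not Forensic Architecture’s actual pipeline):

```python
import random

random.seed(0)

def paste(background, patch, top, left):
    """Return a copy of `background` with `patch` pasted at (top, left)."""
    out = [row[:] for row in background]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out

# A 2x2 "canister" patch and three 8x8 single-shade "backgrounds".
canister = [[9, 9], [9, 9]]
backgrounds = [[[shade] * 8 for _ in range(8)] for shade in (0, 3, 5)]

# Every background/position pair becomes one labeled synthetic image.
dataset = []
for bg in backgrounds:
    for _ in range(4):
        top, left = random.randrange(7), random.randrange(7)
        dataset.append((paste(bg, canister, top, left), "canister"))

print(len(dataset))  # 12 labeled training images from a single object patch
```

A real pipeline would rotate and relight photographs rather than paste number grids, but the principle is the same: one object, many synthetic contexts.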

I bring up this example both because I think it’s an amazing, thought-provoking work of art and because this sort of image recognition training is how I can most readily envision using large amounts of data. I can see how useful it would be to identify objects (like a tear gas canister) or symbols and then train machines to find them in huge collections of images. On a grand scale this could reveal cross-cultural connections, if we find objects or symbols in use across large geographic or temporal divides; in a more logistical sense, it could help viewers make sense of blurry or degraded images that the human eye has trouble discerning.

I know from my own work with colonial photographs that many photographers used the same props across multiple photos in order to create “authentic” portraits that satisfied what the colonists envisioned of the “primitives” they controlled. Using image recognition, I could potentially find every instance in which a certain prop (or type of prop) was used and highlight the fictitious nature of these photographs. Perhaps this wouldn’t be possible with the current state of machine learning; after all, I would need a huge data set to train the software. But as opposed to some of the examples we looked at in class, this type of image recognition project might help us answer that nagging “so what” question.

I’m not sure I’ll ever be able to code this type of software myself, although I could definitely find wonderful scholars to collaborate with. Perhaps text data would be most useful and realistic for me. I could easily chart biographical data of subjects or photographers using the basic Excel skills I already have, or use existing text mining software to go through records and pull out information relevant to my research. I’ve been the intern who has to “tidy” this type of data for projects before, so I am used to the work that goes into organizing data in a way that is useful for these tools. Although I have not used text mining services in the past, I would love to work with them, as they would greatly improve my ability to get through vast archives of information. Perhaps these text-based approaches are a better place for me to start as an amateur digital art historian.
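Even without dedicated text mining software, the record-sifting workflow described above can start very small. A minimal sketch using only Python’s standard library, with invented catalog records and field names (any real archive export would differ):

```python
import csv
import io
import re

# Hypothetical catalog records, as they might arrive from an archive export.
raw = """photographer,year,caption
J. Smith,1885,"Studio portrait with carved spear prop"
A. Jones,1890,"Village scene, no props visible"
J. Smith,1892,"Portrait, spear and drum props reused"
"""

# Look for one prop of interest in the caption field.
prop_pattern = re.compile(r"\bspear\b", re.IGNORECASE)

matches = [
    row for row in csv.DictReader(io.StringIO(raw))
    if prop_pattern.search(row["caption"])
]

for row in matches:
    print(row["photographer"], row["year"])
```

Running this over a cleaned-up spreadsheet export is exactly the kind of task the “tidying” work prepares for: once the captions live in a consistent column, a one-line pattern can surface every reuse of a prop.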