As Ted Underwood points out in his text

A majority of the time is as

‘• Column headers are values, not variable names.
• Multiple variables are stored in one column.
• Variables are stored in both rows and columns.
• Multiple types of observational units are stored in the same table.
• A single observational unit is stored in multiple tables.’ (Wickham, 6).
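The first problem in Wickham's list can be illustrated with a small sketch. The table below is invented for illustration (the keywords, years, and counts are not from any reading), and the `melt` helper is a hypothetical name of my own, written in plain Python instead of a data library. It turns year columns, which are really values, into a `year` variable with one observation per row.

```python
# Invented "messy" table: the columns "1850" and "1860" are values of a
# variable (year), not variable names.
wide_rows = [
    {"keyword": "gallery", "1850": 3, "1860": 7},
    {"keyword": "dealer", "1850": 1, "1860": 4},
]


def melt(rows, id_column, variable_name, value_name):
    """Return one row per (id, column) pair, so each row is one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                }
            )
    return tidy


tidy_rows = melt(wide_rows, "keyword", "year", "count")
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the tidy version every row answers the same question (how often one keyword appears in one year), which should make a later step such as filtering or plotting by year easier to repeat.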

I have used Excel and other tools to analyze data sets before, but I have always found it difficult to structure the data so that it produces the desired outcome. It has usually worked only after some meddling that I forgot afterwards and could not replicate. (This also ties back to the previous week’s readings: write down the process, both to be able to replicate it and to remember what to avoid.)

My focus has primarily been on qualitative analysis, where large data sets have been more of a nuisance. I agreed with the discussion in class, which raised the question of what these large data sets could actually be used for. As Pamela Fletcher and Anne Helmreich, with David Israel and Seth Erickson project

Some of the tools we went through in this week’s class would be useful for upcoming projects, such as _Voyant_, a platform that enables keyword search and comparison of texts. One example would be a content analysis in which keywords are analyzed in relation to time: when were certain keywords used more, and when less? From this, a more qualitative discourse analysis could follow, building on the distant reading done with Voyant or a similar tool. The Google Ngram Viewer is comparable: it lets you see how often phrases are used in a corpus of literature, and you can also focus on specific periods of time. Tools of this kind make it easy to get a broad grasp of word usage, and I could see myself using them as a first step of analysis. It is also important to keep in mind what Underwood points out: these kinds of tools may give the impression that you don’t need to do any programming of your own, because so many tools are already out there. However, the available tools mostly show the scope of what is possible; for your own projects, you will most likely need to program in order to focus your methodological approach effectively.
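To get a feeling for what such programming might involve, here is a minimal sketch of counting keywords per year. It is not how Voyant or the Ngram Viewer work internally; the mini-corpus is invented, and the function name `keyword_counts_by_year` is my own. It assumes each text comes with a known year and uses only the Python standard library.

```python
import re
from collections import Counter, defaultdict

# Invented mini-corpus: (year, text) pairs, for illustration only.
documents = [
    (1850, "The gallery opened. The dealer sold a painting."),
    (1850, "A new gallery was announced."),
    (1870, "One dealer visited the gallery; another dealer bought two works."),
]


def keyword_counts_by_year(docs, keywords):
    """Count how often each keyword occurs in the texts of each year."""
    wanted = {keyword.lower() for keyword in keywords}
    counts = defaultdict(Counter)
    for year, text in docs:
        # Very rough tokenization: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[year].update(token for token in tokens if token in wanted)
    return counts


counts = keyword_counts_by_year(documents, ["gallery", "dealer"])
for year in sorted(counts):
    print(year, dict(counts[year]))
```

Even a toy version like this shows where the methodological choices sit: how the text is split into words, whether plurals or spelling variants count as the same keyword, and how the texts are grouped by period.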


Hadley Wickham, “Tidy Data,” Journal of Statistical Software, Submitted.

Pamela Fletcher and Anne Helmreich, with David Israel and Seth Erickson, “Local/Global: Mapping Nineteenth-Century London’s Art Market,” Nineteenth Century Art Worldwide 11:3 (Autumn 2012).

Ted Underwood, “Where to Start with Text Mining,” The Stone and the Shell.