Source: Experimenting with Gephi and network visualizations

Over the past several days, I’ve been playing around with Gephi to get a better idea of what network analysis tools can do, and how I might apply Gephi (or a similar tool) to my own research. I don’t have any previous experience with network analysis or visualization, but I’m incredibly interested in the possibilities that these tools offer for a wide variety of research programs.

I began by trying out some sample datasets that Gephi includes on their GitHub page.[1] I first tried Gleiser and Danon’s data on social networks of jazz musicians.[2] However, when I loaded it into Gephi, I discovered that none of the nodes were labeled with the names of the musicians; each node was labeled only by its ID number in the dataset. With over 180 nodes, many of them joined by multiple edges, the result was a very intriguing-looking network but not an intelligible visualization. Investigating the issue did give me some experience examining the dataset itself, though. Since the dataset was a .net file (not a format I was familiar with), I wondered if Gephi was having trouble relating names to their respective nodes. I opened the file in a text editor and saw that, in the dataset itself, all of the musicians’ names had been replaced by ID numbers. I imagine that the researchers have a codebook to help them interpret the raw dataset, but it was not included on the Gephi GitHub.
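
The same check can be scripted instead of done by eye in a text editor. The following is a minimal sketch, assuming the file follows the usual Pajek-style layout of a `*Vertices` section (one ID and an optional quoted label per line) followed by an edge section. The function names and the three-node sample are hypothetical and are not taken from the jazz dataset.

```python
import shlex


def read_pajek_vertices(text):
    """Return {vertex_id: label} from the *Vertices section of .net text."""
    labels = {}
    in_vertices = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("*"):
            # A new section starts; only *Vertices lines carry labels.
            in_vertices = line.lower().startswith("*vertices")
            continue
        if in_vertices:
            parts = shlex.split(line)  # keeps a quoted label together
            vertex_id = parts[0]
            # Assumption: a vertex with no label falls back to its ID.
            labels[vertex_id] = parts[1] if len(parts) > 1 else vertex_id
    return labels


def labels_are_only_ids(labels):
    """True when every label is just the vertex's own ID number."""
    return all(label == vertex_id for vertex_id, label in labels.items())


# Hypothetical sample, standing in for a real file's contents.
sample = """*Vertices 3
1 "1"
2 "2"
3 "3"
*Edges
1 2
2 3
"""

print(labels_are_only_ids(read_pajek_vertices(sample)))
```

If this reports that every label is only an ID, the names are missing from the data itself and no import setting in Gephi will bring them back.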

Next, I tried out another interesting-looking dataset: a social network for a class of German students in 1880.[3] I found the article the researchers wrote using this data through UNC’s e-journal subscriptions and read it to better understand what was going on with the data. For a history of social network research, the article is well worth looking up, as it describes an early mixed-methods study conducted by a German schoolteacher, Johannes Delitsch, who analyzed friendship groups in his 1880 class of boys. The present researchers have repurposed Delitsch’s data for new, more high-powered social network analysis to see what his research can reveal today.

While this research project is a great example of the interesting work that can be done with social network data (and a reminder that this data is not limited to the Facebook era), I mostly used this dataset simply to learn a little bit more about how Gephi works. After importing the data, I made a few adjustments to produce a readable, usable visualization of the whole network. As there were only a couple dozen nodes, I could produce a visualization that captures the entire network.

For this visualization, I chose the Fruchterman-Reingold layout, set the color of the nodes to vary in intensity based on how many edges are connected to them (deeper green means more connections), and turned on labels so I could see the names of the classmates. While this doesn’t tell you how the friendships were formed and sustained (this information is in Delitsch’s original study and provides great social insight), the visualization does show patterns of popularity: who is at the center of various social groups and who is on the outskirts.
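
The degree-based coloring can be illustrated outside Gephi as well. This is a rough sketch of the idea only, not Gephi’s own implementation: it counts the edges touching each node and maps that count to a shade of green. The color endpoints, function names, and the four-node edge list are all assumptions made for illustration.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many edges touch each node (direction ignored)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def green_shade(degree, max_degree):
    """Map a degree to a hex color: pale green for few edges, deep green for many."""
    share = degree / max_degree if max_degree else 0.0
    # Assumed endpoints: (220, 240, 220) at share 0, (0, 100, 0) at share 1.
    red_blue = round(220 * (1 - share))
    green = round(240 - 140 * share)
    return f"#{red_blue:02x}{green:02x}{red_blue:02x}"


# Hypothetical edge list, not the 1880 class data.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degree = degree_counts(edges)
max_degree = max(degree.values())
for node in sorted(degree):
    print(node, degree[node], green_shade(degree[node], max_degree))
```

The best-connected node gets the deepest green, which is what makes the central figures stand out at a glance.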

To dig a little deeper into the data, I then experimented with some of the different filters to produce more fine-grained sub-network visualizations. Fortunately, this dataset also included the direction of edges, indicating whether connections were incoming (such as receiving a gift from a classmate) or outgoing (such as giving a gift). I produced one visualization where I filtered the network to only those nodes with greater than 7 incoming connections and another where I filtered the network to only those nodes with greater than 5 outgoing connections.

Visualization filtering 7 or more incoming connections.
Visualization filtering 5 or more outgoing connections.
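
The logic behind these two filters can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reproduction of Gephi’s filter: edges are (source, target) pairs, the threshold is strict ("greater than", as in the description of the filters; change `>` to `>=` for an "or more" reading), and only edges between surviving nodes are kept. The function name and the sample edges are hypothetical.

```python
from collections import Counter


def filter_by_degree(edges, threshold, direction="in"):
    """Keep nodes whose in- or out-degree is greater than threshold,
    plus the edges that run between the kept nodes."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    # An edge counts as incoming for its target and outgoing for its source.
    position = 1 if direction == "in" else 0
    counts = Counter(edge[position] for edge in edges)
    kept_nodes = {node for node, n in counts.items() if n > threshold}
    kept_edges = [(s, t) for s, t in edges if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges


# Hypothetical directed edges, not the 1880 class data.
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"), ("C", "A")]

nodes_in, edges_in = filter_by_degree(edges, 1, direction="in")
nodes_out, edges_out = filter_by_degree(edges, 1, direction="out")
print(sorted(nodes_in), edges_in)
print(sorted(nodes_out), edges_out)
```

With the real data, the thresholds would be 7 for incoming and 5 for outgoing connections.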

Again, I’m not really in a position to analyze or compare these two visualizations, but for a researcher this kind of filtering could support a lot of different queries into the data.

This brief exercise showed me some of the different ways in which Gephi can be used to create visualizations and support network analysis methods. However, network analysis can be used to many other ends as well. Pamela Fletcher and Anne Helmreich’s project mapping the 19th-century London art world[4] and Stanford’s ORBIS[5] are two examples of how network analysis can be paired with geographic information to explore how networks form and manifest effects in both time and space. These examples also illustrate, however, that additional resources and expertise quickly become necessary when projects move beyond free tools like Gephi. Both of these projects had dedicated programmers working together with humanities researchers to produce the unique, interactive network visualizations.

Elijah Meeks and Karl Grossner describe ORBIS as an “interactive scholarly work” (ISW), characterizing this as a new potential scholarly output in addition to more traditional models like the journal article or monograph.[6] ORBIS not only represents new and innovative scholarship into the Roman world of antiquity, but also provides an interface for individuals to make their own discoveries and support their own research. Of course, the traditional journal article is also founded on the idea that the information it presents builds upon previous scholarship and serves the development of future scholarship, but something like ORBIS makes this manifest by providing the means for interaction and direct engagement. ORBIS does more than just network analysis and visualization, but these methods clearly play an integral role in the new kinds of scholarly projects that ORBIS demonstrates: those that blur the lines between publication, research tool, and online exhibition.

Across time and history, networks of people, places, and materials have been hugely significant forces. While the importance of networks has long been recognized by scholars (as evidenced by Delitsch’s work), digital tools let us interrogate and visualize these complex structures in ways that had not previously been possible. These examples illustrate the kinds of exciting projects that can be done with network analysis, but also demonstrate that additional expertise and resources quickly become necessary when projects move beyond free tools like Gephi.

NOTES

[1] https://github.com/gephi/gephi/wiki/Datasets

[2] P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003).

[3] Heidler, R., Gamper, M., Herz, A., Eßer, F. (2014): Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited. Social Networks 13: 1-13.

[4] Pamela Fletcher and Anne Helmreich, with David Israel and Seth Erickson, “Local/Global: Mapping Nineteenth-Century London’s Art Market,” Nineteenth Century Art Worldwide 11:3 (Autumn 2012). http://www.19thc-artworldwide.org/index.php/autumn12/fletcher-helmreich-mapping-the-london-art-market.

[5] http://orbis.stanford.edu/

[6] Elijah Meeks and Karl Grossner, “Modeling Networks and Scholarship with ORBIS,” Journal of Digital Humanities (2012).