Visualizing Big Data: The Perils of Pretty Colours and Fancy Graphs

By saramnixon


This week in #hist5702x we caught a brief glimpse of the world of Big Data and data mining. In seminar we’ve repeatedly discussed how wonderful it is to have an overwhelming number of historical texts now digitized online. Keywords like ‘accessibility’, ‘interactivity’, ‘flexibility’, and ‘non-linearity’ have been brought up a lot in our discussions (and in my blog) when we talk about the possibilities of doing history in the digital world. But now, after reading Tim Hitchcock’s blog post “Big Data for Dead People: Digital Readings and the Conundrums of Positivism,” I too have begun to reflect on how historians are actually reading and using digital texts in their own research and work. I was struck by Hitchcock’s suggestion that the technology of the digital still defines the questions that we ask of our object of study, and that we must change our approach to historical research and scholarship in the digital world in order to ask new questions. Reflecting on this point, I have realized that what Hitchcock alludes to is actually what I have been struggling with the most as I develop my academic relationship with the digital.

We have seen plenty of examples of digital projects and digital tools in seminar. Naturally, for this week’s topic on Big Data and data mining, our class looked at and read about all sorts of really interesting data mining tools, including Voyant Tools, Image Plot, Palladio and The Overview Project. These tools allow you to visualize data, make connections and find relationships using pretty colours, fancy lines and shapes, and complex graphs. We also read about some interesting digital humanities projects that used some of these Big Data tools. What I struggle with most in these projects, and in how digital humanists are using these tools, is the ‘So, what?’. Right now, it seems that we are using these pretty colours and impressive graphics to look differently at the same questions.

Voyant Tools created this word map of John Adams’s diary, 1753-1804. I plugged in the diary text and this is what the program made me. Fancy, huh?

For example, in seminar, we were asked to plug the 1753-1804 diary of John Adams into Voyant. The program was able to count the most frequent words in the diary and made a word map of these. The word map looks fancy and it presents history in an appealing and visual way, but it does not tell us anything new about John Adams or even the place and time in which he was writing. I am not surprised that Adams, as a white male and an American politician in the eighteenth century, spent a lot of his time socializing and dining with men. Voyant also lets you compare frequencies of words in relation to others, and track words over time, but again, I don’t think this will tell me anything new. But hey, it looks neat!
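To make the point concrete, the counting that sits behind a word map like this can be sketched in a few lines of Python. This is only a rough illustration and not how Voyant itself works: the stop-word list, the `top_words` function and the sample sentence are all made up for the example, and the sample is not a quotation from the Adams diary.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only;
# a real tool would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "with", "i", "was", "it"}


def top_words(text, n=5):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the stop-word list.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample text, not taken from the diary.
sample = "Dined with the judge. Dined at home. Spent the evening with the judge."
print(top_words(sample, 2))
```

A list of counts like this is essentially what the word map draws, with bigger words standing for bigger numbers, which is why it tends to confirm what a reader of the diary would already expect.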

I am not saying that these kinds of data mining tools are useless. Tools like Voyant enable us to read our sources in different ways, and this opens us to deeper understandings of the historical texts we study. However, we are asking the same questions and searching for the same answers; they just look more exciting. This was part of Hitchcock’s concern as well. I think that right now, we seem to be using big data and data mining to make our research easier: we can let our computers help us gather our sources and identify the links, connections and trends in our evidence. Here, I am particularly thinking of Syllabus Finder and H-Bot (two projects developed by Daniel J. Cohen) and even how academics and students use huge databases of historical records like Old Bailey Online (which I was happy to learn was in part developed by Hitchcock). In my third-year History of Crime and Punishment course at the University of Guelph, my professor actually asked us to write an essay using three cases we found on the Old Bailey database. Here, a whole cohort of history students was introduced to this huge digital repository of criminal trial records and then asked to write a traditional essay. No new questions, no new answers. We treated the database as we would any other archive (especially as undergrads). I am not saying there is anything wrong with using big data in this way either. I am suggesting, however, that maybe the way we are using big data, the questions we are asking and the answers we are looking for have a lot to do with our training as historians.

And for a positive conclusion: this is why I really like #hist5702x. As we develop our Air Canada Augmented Reality project, we are being challenged to ask different questions of our historical sources and to consider new ways of presenting history. We are not writing essays based on data we find online, nor are we making word maps of the same texts that we can read. We are creating 3D models, linking to videos from the 1960s, combining audio with images, and developing non-linear narratives that are experienced interactively. We are creating an embodied experience all packed into a coffee-table book. This is new and surprising. This is what excites me about doing history in the digital world. This is the relationship I want.

Source: Sara