In the past several weeks, something of a debate has emerged over whether there is, in fact, a superior plotting system in R. First, a bit of history. The arguments on plotting all began with an off-hand comment about the plotting preferences of statistician and JHU Professor Jeff Leek, on the “Not So Standard Deviations” podcast of Hilary Parker and Roger Peng. The comment, as I remember it, had to do with why anyone would ever bother to use base graphics in R when a tool like ggplot2 exists.
An interesting phenomenon that has appeared in recent years is the notion of the “Quantified Self”: simply the idea that quantifying much of our daily activity can lead to insights about our behaviors, and that a more thorough knowledge of our own behavior can help us to be more mindful of our health and lifestyle choices. In a previous post, I explored (in a very rudimentary fashion) information about my genome, in the form of single nucleotide polymorphisms genotyped by 23andMe™.
Recently, while I was listening to an episode of the podcast “Not So Standard Deviations”, by Roger Peng and Hilary Parker, the line “Doing data analysis with spreadsheets is like driving drunk” (attributed to statistician Philip Stark) stood out to me. This short phrase gets at just how irresponsible the use of spreadsheets is for many of the routine tasks of data science. That is, spreadsheets provide a high level of accessibility to the data that is so central to the insights extracted by data scientists, and it is this high level of control over the data itself that makes their use so dangerous.