Over the past several weeks, a debate has emerged over whether R has a superior plotting system. First, a bit of history. The argument began with an off-hand comment about the plotting preferences of statistician and JHU professor Jeff Leek on the “Not So Standard Deviations” podcast, hosted by Hilary Parker and Roger Peng. The comment, as I remember it, questioned why anyone would bother to use base graphics in R when a tool like ggplot2 exists.
Not so standard deviations
Recently, while listening to an episode of the podcast “Not So Standard Deviations”, hosted by Roger Peng and Hilary Parker, one line stood out to me: “Doing data analysis with spreadsheets is like driving drunk” (attributed to statistician Philip Stark). This short phrase captures how irresponsible it is to use spreadsheets for many of the routine tasks of data science. Spreadsheets give a high level of access to the data that is central to the insights data scientists extract, and it is precisely this high level of control over the data itself that makes their use so dangerous.