This post is the first in a series where I’ll be comparing, integrating, and supplementing R with Data Visualization (DV). At its core, DV is a Java application built on R. Since it’s built on R, we can do some really cool things with custom R scripts and show how tight the integration between DV and R is.
Since I often get questions from customers and conference attendees about R, I’ve been ramping up my R game over the past couple of weeks. A question I’ve gotten more than once is: if R can visualize the data (and mash up data from different sources), what is the benefit of using Data Visualization as a data scientist? I usually point out that you don’t have to code, you can drag and drop data elements, the tool joins columns automatically, and you don’t need to be a data scientist to use DV!
But talk is cheap.
I’m going to build the same visualization in R and in DV to show what DV brings to the table over R alone. Again, DV is built using Java and R, so the two tools’ visualizations should look similar.
In R, I have built the following script. The first few lines set up where the data lives and then pull it into R for analysis.
When I run “head(st)”, I get the first six rows of the st data set.
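A minimal sketch of that setup step might look like the following. The file name, the temporary folder, and the column names (CLIENT_ID, GENDER, HUMIDITY, FLUID_LOSS_PER_HOUR) are assumptions based on the fields mentioned in this post, and the rows are made-up placeholder values written to a temporary file so the snippet runs anywhere.

```r
# Placeholder rows so the sketch runs anywhere; these are NOT the real data.
# Column names are assumed from the fields mentioned in this post.
sample_rows <- data.frame(
  CLIENT_ID = 1:8,
  GENDER = rep(c("Female", "Male"), times = 4),
  HUMIDITY = c(35, 40, 48, 55, 61, 67, 72, 80),
  FLUID_LOSS_PER_HOUR = c(0.6, 0.8, 0.7, 1.0, 0.9, 1.2, 1.1, 1.4)
)

# Set up where the data lives (a temp folder here; point this at your own file).
data_dir <- tempdir()
data_file <- file.path(data_dir, "st_data.csv")  # hypothetical file name
write.csv(sample_rows, data_file, row.names = FALSE)

# Pull the data into R for analysis.
st <- read.csv(data_file, stringsAsFactors = FALSE)

# head() returns the first six rows by default.
head(st)
```

With your own data, only the `data_file` path and the `read.csv()` call are needed; the placeholder rows are just there to make the sketch self-contained.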
After I load the ggplot2 package (so that I can create visualizations), I am ready to build my first visualization in R. Here, I am putting Humidity on the x-axis and Fluid Loss per Hour on the y-axis.
In DV, I get the exact same visualization, except I need to add an Attribute so that each row is plotted as its own point rather than being aggregated. I’ll add a unique row identifier to be safe (CLIENT_ID).
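As a rough sketch, the scatter plot described here could be written as below. It assumes the ggplot2 package is installed; the column names HUMIDITY and FLUID_LOSS_PER_HOUR are assumed, and the small data frame is a made-up stand-in for the real st data set.

```r
library(ggplot2)

# Placeholder rows standing in for the real st data set (values are made up,
# column names are assumed).
st <- data.frame(
  CLIENT_ID = 1:8,
  GENDER = rep(c("Female", "Male"), times = 4),
  HUMIDITY = c(35, 40, 48, 55, 61, 67, 72, 80),
  FLUID_LOSS_PER_HOUR = c(0.6, 0.8, 0.7, 1.0, 0.9, 1.2, 1.1, 1.4)
)

# Humidity on the x-axis, Fluid Loss per Hour on the y-axis, one point per row.
p <- ggplot(st, aes(x = HUMIDITY, y = FLUID_LOSS_PER_HOUR)) +
  geom_point() +
  labs(x = "Humidity", y = "Fluid Loss per Hour")

print(p)
```

Note that ggplot2 draws one point per row without being asked, which is why no row identifier is needed on the R side.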
To show females only, I need to add a filter to my code:
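One way to write that filter is sketched here, using base R’s subset() and red points to match the DV chart. The value label "Female", the column names, and the placeholder rows are all assumptions; the original script may well have filtered differently.

```r
library(ggplot2)

# Placeholder rows standing in for the real st data set (values are made up,
# column names and the "Female"/"Male" labels are assumed).
st <- data.frame(
  CLIENT_ID = 1:8,
  GENDER = rep(c("Female", "Male"), times = 4),
  HUMIDITY = c(35, 40, 48, 55, 61, 67, 72, 80),
  FLUID_LOSS_PER_HOUR = c(0.6, 0.8, 0.7, 1.0, 0.9, 1.2, 1.1, 1.4)
)

# Keep only the rows where GENDER is "Female".
st_female <- subset(st, GENDER == "Female")

# Same scatter plot as before, restricted to females and drawn in red.
p_female <- ggplot(st_female, aes(x = HUMIDITY, y = FLUID_LOSS_PER_HOUR)) +
  geom_point(colour = "red") +
  labs(x = "Humidity", y = "Fluid Loss per Hour")

print(p_female)
```

Swapping "Female" for "Male" in the subset() call gives the male-only version of the same chart.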
However, in DV, I can just drag and drop GENDER to filter:
…And there’s no need to write code. I also changed the points to red to stay consistent with my code.
And the same for males:
This is really to show that what can be done visually in R can be done better in Data Visualization. You can also highlight points in the visualization and drill into them based on different attributes in your data set.
…You can’t do that with a static ggplot2 chart in R!