While going through an R plugin installation today, I realized that some of the steps may confuse non-R users, so I decided to document them below. I will also note that I have two versions of R on my machine, 3.1.1 and 3.3.3, and ran into some errors when installing my R plugin. I’ve documented how I resolved those below, as well.
I decided to install the Correlation R plugin from the Oracle BI Public Store to analyze data I know well – my half marathon times over the past 2.5 years. Why start with data I know well? I can quickly tell if there is an error in the calculation. If I use data I know and get results I expect, then I can trust the script and results for data sets that are unknown to me in the future.
Below is the data set I am working with for my example. I wanted to see if there was any correlation between my race times and course elevation, average HR, and max temperature (F) of the day.
From the Oracle BI Public Store, I am choosing “Correlation”.
When I click on the icon, I get the description and instructions on how to install the R tool. This plugin includes the R script and a visualization to accompany the script.
The first thing I need to do is install the required R packages into my R installation. There are two main interfaces I use to build R scripts. The first one is the standard R console, shown below.
However, I prefer RStudio (free download!) so I can see my script, console results, any variables I’ve created, and packages. I can also change the color of the editor to my preference, which is a dark background with color-coded keywords.
In RStudio (or the standard R console, if you prefer), you can type install.packages("corrplot") and hit Enter to install the corrplot package, as the Store instructions tell us to. Use straight quotes; curly quotes copied from a web page will cause a syntax error in R.
Once installed, you will get a visual confirmation that the package installed.
In RStudio, you also have the option of installing packages from a list. To do so, click the tab named “Packages”.
Click “Install” then enter the name of the package you want to install from the list. Click the button “Install”.
You will get confirmation in the console that the package installed. Likewise, you can also see the package listed in the full list on the bottom right-hand side of the screen.
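If you would rather confirm the installation from code than from the Packages tab, here is a minimal sketch in base R. The helper names `is_installed` and `ensure_package` are my own for illustration and are not part of the plugin or the Store instructions; the `repos` URL is the standard CRAN mirror redirector and assumes you have network access.

```r
# Return TRUE if `pkg` can be loaded by the R installation running this code.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Install `pkg` only when it is missing (hypothetical helper for illustration).
# install.packages() needs network access to reach the CRAN mirror.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!is_installed(pkg)) {
    install.packages(pkg, repos = repos)
  }
  invisible(is_installed(pkg))
}

# "stats" ships with every R installation, so this prints TRUE.
print(is_installed("stats"))
```

Calling `ensure_package("corrplot")` should then install corrplot only if it is not already there. Keep in mind that the answer applies only to the R version you run this in, which matters if you have more than one R installed.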
Next, I downloaded the zip file associated with the plugin from the Store.
I unzipped the file to a folder on my desktop.
Copy the two XML files to the file location shown below (R script repository):
Next, the instructions tell you to deploy the R Viz (Base64Image). But what does this mean?
If you don’t install this plugin, you may get the following error: “A general error was found”.
The instructions don’t spell it out well, but they want you to install another free plugin from the Store: the R Viz (Base64Image) plugin.
Once you download the zipped file, place the zipped file in the plugins path shown below.
Here is an example of where it should be placed:
Next, you should import the DVA file into Data Visualization Desktop and it *should* work.
If you have more than one version of R running on your machine, you might get the following error: “Error Processing Data”. I read through the error and see that it says “there is no package called ‘corrplot’”. Hmm. But I *did* install the package.
When I went to my R directory, I saw my two R installations. It seems that DVD (Data Visualization Desktop) does not connect to the most recent installation by default. Instead, it appears to use the R version that was installed from the command line when I installed Advanced Analytics for Data Visualization Desktop.
Under 3.3, I have all my packages. I simply copied all of these folders from 3.3 to 3.1 (which was empty).
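To see for yourself which R version and which package folders a given R installation uses, here is a small base R sketch. Run it in the console of each R version you have installed (for example, the 3.1 console and the 3.3 console separately). The helper name `package_location` is my own for illustration.

```r
# Which R version is running this code?
print(R.version.string)

# Which library folders does this R search for packages, in order?
print(.libPaths())

# Where (if anywhere) is a package installed for this R?
# find.package() with quiet = TRUE returns character(0) when nothing is found.
package_location <- function(pkg) {
  find.package(pkg, quiet = TRUE)
}
print(package_location("corrplot"))
```

If the last line prints `character(0)` in the older R, that R cannot see corrplot, which matches the “there is no package called ‘corrplot’” error. Copying the folders worked for me, but packages built under a newer R version are not guaranteed to load in an older one, so an alternative worth trying is to run install.packages("corrplot") from the older R's own console.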
I restarted DVD and have my Correlation Analysis project showing correctly!
To see clues on how to build out my Half Marathon Stats correlation visualization, I took a look at the 3 calculations created for the sample project.
Here is what the img_id(Cat Vs Numerical) calculation looks like in DVD.
To see how this stacks up against the XML file, I opened the file to see the inputs and outputs.
Got it. Now, I am ready to build my own correlation visualization. The first thing I did was add the “Base64ImgViz Plugin” to my canvas.
I created a new calculation using the example as my starting point for “img_id”.
To add the “img_part_id”, I copied the exact script I used for “img_id” and changed that keyword to “img_part_id”.
And again for “img_part”.
Once I placed the three calculations in the correct spot:
I get the following visualization.
My hypothesis was that elevation and max temperature correlated with my finish time, but the statistics tell me otherwise. There is no strong correlation!
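If you want to sanity-check numbers like these outside of DVD, the same kind of correlation matrix can be computed directly in R. This is only a rough sketch: the column names and every value below are placeholders I made up for illustration, not my actual race data, and I am assuming a plain Pearson correlation rather than reproducing exactly what the plugin's script does.

```r
# PLACEHOLDER values for illustration only; these are NOT real race results.
races <- data.frame(
  finish_min   = c(118, 121, 115, 124, 119, 117),
  elevation_ft = c(250, 600, 120, 480, 300, 700),
  avg_hr       = c(168, 171, 165, 174, 169, 166),
  max_temp_f   = c(62, 75, 58, 80, 66, 71)
)

# Pearson correlation matrix for every pair of columns.
corr_matrix <- cor(races)
print(round(corr_matrix, 2))

# Correlation of each variable with finish time only.
print(round(corr_matrix["finish_min", ], 2))

# Optional: draw the matrix with corrplot, but only if it is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(corr_matrix)
}
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none. Swap in your own columns to compare against what the visualization shows.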