Allies: Oracle Data Miner, Data Visualization, and R Enterprise

Phew, it’s been a looong time since I last blogged. The frequency of my blogs is inversely related to my workload, so suffice it to say that there are lots of people interested in BI, Data Visualization, and Advanced Analytics right now!

I have had a blog post brewing in my head about all the new features of Data Visualization Desktop 4.0, but instead of showing you all the new and additional features (like everyone is doing now, therefore adding zero collective value), I want to show you how I’m demoing some of the new features using a full analytics stack, not just DV. The various advanced analytics options complement each other, not replace each other!

Background

A topic that is coming up more frequently with customers is Advanced Analytics in Oracle, specifically around R, an open source statistics programming language and software environment. I was first introduced to R in college during my capstone statistics classes but never used it post-undergrad. In grad school, we were taught to use SPSS, but by then I had already been “real-world” programming for a few years and found it easier to code my own stats programs versus learning someone else’s. …But I still had R as a base.

I’ve really been excited about the opportunity to relearn R for showing customers Oracle’s strength in the advanced analytics arena using R. This includes Oracle R Enterprise, Advanced Analytics in the Oracle database, Advanced Analytics in Oracle Business Intelligence, and Advanced Analytics & R in Data Visualization. …Yep, I finally get to use my R and statistics skills nearly 20 years after I learned them. Okay, I’ve re-learned them.

A flow that I recently presented was one using Oracle Data Miner (via the tutorial example) to filter out the insurance customers with the greatest LTV. I won’t go into too many details on this piece as I want to dedicate a blog or two to this tool on its own.

I am using the example from Oracle Data Miner that shows the flow in SQL*Developer to identify the best LTV (long term value) of customers. If you haven’t already gone through the tutorial for installing and creating flows in Data Miner, I highly recommend taking the hour or so to learn the process.

A table that is part of this process is called “INSUR_CUST_LTV_SAMPLE”.

This has proven to be a fun set of data to work with, and also real-world applicable. Before heading to ORE, let’s take a look at some fun with the data in DVD 4.0.

First things first…

  1. Create the Connection in DVD

  2. Create the Data Set to use from the Connection and connect to “INSUR_CUST_LTV_SAMPLE”.

  3. Create a new project with the new Data Set. From here, you can create whatever your mind can imagine or wonder.

One piece I’ve really enjoyed in 4.0 (not to get ahead of myself), is the “Explain” function. You can choose any Attribute in your data element list and see how that attribute correlates to other attributes and facts or find anomalies in your data. Basically, DVD goes through a series of R and Python scripts to teach you about your own data. Sidebar: If you are curious about the different R and Python scripts available to you in DVD, go to the following folder and see for yourself:

R:

Python:

When I ran “Explain” on “SEX” (right-clicking on “Sex” and choosing “Explain SEX”

I get many different panes of information. Here are a few:

And then, you can take these visualizations and use them immediately in your project.

So…enough with DVD. Let’s move to ORE…

Oracle R Enterprise allows you to connect to Oracle databases via the R interface. This can be done via RStudio or by invoking R via the command line.

To use ORE, we need to invoke the ORE library.

The first thing we need to do is connect to the database.

From the built in R datasets, let’s create a table in our Oracle database. Let’s choose ‘airquality’.

We aren’t limited to just datasets. We can create random datasets to send to the database. Here, we are creating a data frame in which the x values are 1-5 and the y values are a-e.

Next, let’s create another data frame with different, but similar parameters.

Let’s create a table with the data frames we have created from thin air.

Let’s see the list of all of our database tables from R:

AIRQUALITY, DF1, and DF2. Exactly what we would expect.

Let’s see what’s in the R list:

Since we did not add “airquality” to our R list, it doesn’t show.

Let’s save our “airquality” to our lists. Note that it was already there for me, so I got an “error”.

To show the proof that R is working with the Oracle database, here are our listings in SQL*Developer.

To show we can also do “R” things, here is an example of a summary of our data store.

So there’s the quick and dirty on how you can tie together Data Miner, Data Visualization, and ORE. These Advanced Analytics are not “or” decisions, they are easily “and” decisions!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s