Tuesday, September 23, 2014

Plot C5.0 decision trees in R

Introduction

In many cases, the user of C50 R-package wants to present the resulted decision tree in some graphical form. Such a plotting feature however is not available in the C50 package. This R code overcomes the plotting issue by providing automatic interpretation of the decision tree to GraphViz design language, enabling then to straightforwardly plot the tree model using the dot command of GraphViz. In case the user has not a local installation GraphViz (open source software), they may freely acquire it from this site.

The steps to plot the decision tree are the following:
  1. Generate your model using the C50 package in R, as usual.
  2. Interpret the model output and save it to a file, by calling C5.0.graphviz function (given later in this page), using as parameters your C5.0 model and a desired text output filename.
  3. In your operating system, call the GraphViz's dot command with proper parameter syntax (example given below).

Usage

C5.0.graphviz ( C5.0.model, filename, fontname ='Arial',  col.draw ='black', col.font ='blue',  col.conclusion ='lightpink',  col.question = 'grey78', shape.conclusion ='box3d',  shape.question ='diamond', bool.substitute = c('None',     'yesno','truefalse','TF'), prefix=FALSE, vertical=TRUE )

Arguments

C5.0.model       The name of a variable which is a valid C5.0 model result.
filename         The name of a file where the output GraphViz model will be saved to.
fontname         The font that will be used for the graph.
col.draw         The color of the drawing lines.
col.font         The color of the font.
col.conclusion   The color which will be used for conclusion nodes (tree leaves).
col.question     The color which will be used for question nodes (tree inner nodes).
shape.conclusion The shape which will be used for conclusion nodes (tree leaves).
shape.question   The shape which will be used for question nodes (tree inner nodes).
bool.substitute  A substitution may take place for boolean comparisons. Default is 'None' which will plot '= 0' and '= 1' on the respective decision tree branches. The option 'yesno', will plot 'no' and 'yes' respectively, the 'truefalse' will plot 'false' and 'true' and the 'TF' option will plot '.F.' and '.T.' (considering the value of 0 as 'false' and 1 as 'true')
prefix           When set to true, the class nodes will have the prefix 'Class' before the class number (useful to multiclass problems where the class is referred by a single number).
vertical         The orientation of the decision tree. Default is vertical, and if this is set to False, the tree is drawn from left to right.

Details

In GraphViz, the X11 color scheme, the SVG scheme, and the Brewer scheme are supported, with X11 being the default. For an exhaustive list of candidate colors, the user can check the respective GraphViz page here.

An exhaustive list of candidate node shapes, is also included in this GraphViz page.

Some information on GraphViz's fonts can be found here.

Value

If successful, the function will create a text file at the given directory, containing the decision tree model described in GraphViz's dot language.

Note

In respect of boolean substitutions using the bool.substitute parameter, it is noted that the routine in this version is not able to know whether the comparison is indeed of boolean nature, neither traces other boolean comparisons with arithmetic arguments (e.g. between '1' and '2'), i.e. the comparison is performed between '0' and '1' (both required, as inner node options, for a successful substitution).

Update: Version 2 extends the translation into multi-branched trees. Version 1 was able to handle only trees with binary splits. 
Version 2.2 corrects a missing initialization of the firstindent variable.

Example

We use the example from the C50 package, a data set from the MLC++ machine learning software for modeling customer churn.

library(C50)
data(churn)
treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
summary(treeModel) #to compare output
C5.0.graphviz(treeModel, 'c:\\dtreeout.txt', col.question ='cyan')

(The generated output of the C5.0.graphviz routine, contained in the dtreeout.txt file is shown in the Appendix, at the end of this page).

Then, in the operating system, we ensure we have access to dot command of the GraphViz package (having the directory either in path, or having navigated to the respective directory) and we enter the following command (here, presumed from a WINDOWS command prompt):

dot -Tpng c:\dtreeout.txt > c:\dtreeout.png

This command produces a graphic file named 'dtreeout.png'.  This file, includes the following graph, which depicts graphically the example decision tree (not shown in actual size here):



Code

#---------------------------------------------------------#
# Function: C5.0.graphviz                                 # 
# Version: 2.2.0                                          #
# Date: 26/09/2014                                        #
# Author: Athanasios Tsakonas                             #
# This code implements C5.0.graphviz conversion routine   #
#---------------------------------------------------------#

C5.0.graphviz <- function( C5.0.model, filename, fontname ='Arial',col.draw ='black',
col.font ='blue',col.conclusion ='lightpink',col.question = 'grey78',
shape.conclusion ='box3d',shape.question ='diamond', 
bool.substitute = 'None', prefix=FALSE, vertical=TRUE ) {

library(cwhmisc)  
library(stringr) 
treeout <- C5.0.model$output
treeout<- substr(treeout, cpos(treeout, 'Decision tree:', start=1)+14,nchar(treeout))
treeout<- substr(treeout, 1,cpos(treeout, 'Evaluation on training data', start=1)-2)
variables <- data.frame(matrix(nrow=500, ncol=4)) 
names(variables) <- c('SYMBOL','TOKEN', 'TYPE' , 'QUERY') 
connectors <- data.frame(matrix(nrow=500, ncol=3)) 
names(connectors) <- c('TOKEN', 'START','END')
theStack <- data.frame(matrix(nrow=500, ncol=1)) 
names(theStack) <- c('ITEM')
theStackIndex <- 1
currentvar <- 1
currentcon <- 1
open_connection <- TRUE
previousindent <- -1
firstindent <- 4
substitutes <- data.frame(None=c('= 0','= 1'), yesno=c('no','yes'),
truefalse=c('false', 'true'),TF=c('F','T'))
dtreestring<-unlist( scan(text= treeout,   sep='\n', what =list('character')))

for (linecount in c(1:length(dtreestring))) {
lineindent<-0
shortstring <- str_trim(dtreestring[linecount], side='left')
leadingspaces <- nchar(dtreestring[linecount]) - nchar(shortstring)
lineindent <- leadingspaces/4
dtreestring[linecount]<-str_trim(dtreestring[linecount], side='left') 
while (!is.na(cpos(dtreestring[linecount], ':   ', start=1)) ) {
lineindent<-lineindent + 1 
dtreestring[linecount]<-substr(dtreestring[linecount],
ifelse(is.na(cpos(dtreestring[linecount], ':   ', start=1)), 1,
cpos(dtreestring[linecount], ':   ', start=1)+4),
nchar(dtreestring[linecount]) )
shortstring <- str_trim(dtreestring[linecount], side='left')
leadingspaces <- nchar(dtreestring[linecount]) - nchar(shortstring)
lineindent <- lineindent + leadingspaces/4
dtreestring[linecount]<-str_trim(dtreestring[linecount], side='left')
}
if (!is.na(cpos(dtreestring[linecount], ':...', start=1)))
lineindent<- lineindent +  1 
dtreestring[linecount]<-substr(dtreestring[linecount],
ifelse(is.na(cpos(dtreestring[linecount], ':...', start=1)), 1,
cpos(dtreestring[linecount], ':...', start=1)+4),
nchar(dtreestring[linecount]) )
dtreestring[linecount]<-str_trim(dtreestring[linecount])
stringlist <- strsplit(dtreestring[linecount],'\\:')
stringpart <- strsplit(unlist(stringlist)[1],'\\s')
if (open_connection==TRUE) { 
variables[currentvar,'TOKEN'] <- unlist(stringpart)[1]
variables[currentvar,'SYMBOL'] <- paste('node',as.character(currentvar), sep='')
variables[currentvar,'TYPE'] <- shape.question
variables[currentvar,'QUERY'] <- 1
   theStack[theStackIndex,'ITEM']<-variables[currentvar,'SYMBOL']
theStack[theStackIndex,'INDENT'] <-firstindent 
theStackIndex<-theStackIndex+1
currentvar <- currentvar + 1
if(currentvar>2) {
  connectors[currentcon - 1,'END'] <- variables[currentvar - 1, 'SYMBOL']
}
   }
connectors[currentcon,'TOKEN'] <- paste(unlist(stringpart)[2],unlist(stringpart)[3])
if (connectors[currentcon,'TOKEN']=='= 0') 
connectors[currentcon,'TOKEN'] <- as.character(substitutes[1,bool.substitute])
if (connectors[currentcon,'TOKEN']=='= 1') 
connectors[currentcon,'TOKEN'] <- as.character(substitutes[2,bool.substitute])
if (open_connection==TRUE) { 
if (lineindent<previousindent) {
theStackIndex <- theStackIndex-(( previousindent- lineindent)  +1 )
currentsymbol <-theStack[theStackIndex,'ITEM']
} else
currentsymbol <-variables[currentvar - 1,'SYMBOL']
} else {  
currentsymbol <-theStack[theStackIndex-((previousindent -lineindent ) +1    ),'ITEM']
theStackIndex <- theStackIndex-(( previousindent- lineindent)    )
}
connectors[currentcon, 'START'] <- currentsymbol
currentcon <- currentcon + 1
open_connection <- TRUE 
if (length(unlist(stringlist))==2) {
 stringpart2 <- strsplit(unlist(stringlist)[2],'\\s')
variables[currentvar,'TOKEN'] <- paste(ifelse((prefix==FALSE),'','Class'), unlist(stringpart2)[2]) 
variables[currentvar,'SYMBOL'] <- paste('node',as.character(currentvar), sep='')
variables[currentvar,'TYPE'] <- shape.conclusion
variables[currentvar,'QUERY'] <- 0
currentvar <- currentvar + 1
connectors[currentcon - 1,'END'] <- variables[currentvar - 1,'SYMBOL']
open_connection <- FALSE
}
previousindent<-lineindent
}
runningstring <- paste('digraph g {', 'graph ', sep='\n')
runningstring <- paste(runningstring, ' [rankdir="', sep='')
runningstring <- paste(runningstring, ifelse(vertical==TRUE,'TB','LR'), sep='' )
runningstring <- paste(runningstring, '"]', sep='')
  for (lines in c(1:(currentvar-1))) {
  runningline <- paste(variables[lines,'SYMBOL'], '[shape="')
  runningline <- paste(runningline,variables[lines,'TYPE'], sep='' )
  runningline <- paste(runningline,'" label ="', sep='' )
  runningline <- paste(runningline,variables[lines,'TOKEN'], sep='' )
  runningline <- paste(runningline,
  '" style=filled fontcolor=', sep='')
  runningline <- paste(runningline, col.font)
  runningline <- paste(runningline,' color=' )
  runningline <- paste(runningline, col.draw)
  runningline <- paste(runningline,' fontname=')
  runningline <- paste(runningline, fontname)
  runningline <- paste(runningline,' fillcolor=')
  runningline <- paste(runningline,
  ifelse(variables[lines,'QUERY']== 0 ,col.conclusion,col.question))
  runningline <- paste(runningline,'];')
  runningstring <- paste(runningstring, runningline , sep='\n')
  }
  for (lines in c(1:(currentcon-1)))
  runningline <- paste (connectors[lines,'START'], '->')
  runningline <- paste (runningline, connectors[lines,'END'])
  runningline <- paste (runningline,'[label="')
  runningline <- paste (runningline,connectors[lines,'TOKEN'], sep='')
  runningline <- paste (runningline,'" fontname=', sep='')
  runningline <- paste (runningline, fontname)
  runningline <- paste (runningline,'];')
  runningstring <- paste(runningstring, runningline , sep='\n')
  }
runningstring <- paste(runningstring,'}')
cat(runningstring)
  sink(filename, split=TRUE)
cat(runningstring)
sink()
}

Appendix

The example decision tree as shown in the C50 package summary and the generated output by the C5.0.graphviz routine are shown below.

C50 summary output tree:

total_day_minutes > 264.4:
:...voice_mail_plan = yes:
:   :...international_plan = no: no (45/1)
:   :   international_plan = yes: yes (8/3)
:   voice_mail_plan = no:
:   :...total_eve_minutes > 187.7:
:       :...total_night_minutes > 126.9: yes (94/1)
:       :   total_night_minutes <= 126.9:
:       :   :...total_day_minutes <= 277: no (4)
:       :       total_day_minutes > 277: yes (3)
:       total_eve_minutes <= 187.7:
:       :...total_eve_charge <= 12.26: no (15/1)
:           total_eve_charge > 12.26:
:           :...total_day_minutes <= 277:
:               :...total_night_minutes <= 224.8: no (13)
:               :   total_night_minutes > 224.8: yes (5/1)
:               total_day_minutes > 277:
:               :...total_night_minutes > 151.9: yes (18)
:                   total_night_minutes <= 151.9:
:                   :...account_length <= 123: no (4)
:                       account_length > 123: yes (2)
total_day_minutes <= 264.4:
:...number_customer_service_calls > 3:
    :...total_day_minutes <= 160.2:
    :   :...total_eve_charge <= 19.83: yes (79/3)
    :   :   total_eve_charge > 19.83:
    :   :   :...total_day_minutes <= 120.5: yes (10)
    :   :       total_day_minutes > 120.5: no (13/3)
    :   total_day_minutes > 160.2:
    :   :...total_eve_charge > 12.05: no (130/24)
    :       total_eve_charge <= 12.05:
    :       :...total_eve_calls <= 125: yes (16/2)
    :           total_eve_calls > 125: no (3)
    number_customer_service_calls <= 3:
    :...international_plan = yes:
        :...total_intl_calls <= 2: yes (51)
        :   total_intl_calls > 2:
        :   :...total_intl_minutes <= 13.1: no (173/7)
        :       total_intl_minutes > 13.1: yes (43)
        international_plan = no:
        :...total_day_minutes <= 223.2: no (2221/60)
            total_day_minutes > 223.2:
            :...total_eve_charge <= 20.5: no (295/22)
                total_eve_charge > 20.5:
                :...voice_mail_plan = yes: no (20)
                    voice_mail_plan = no:
                    :...total_night_minutes > 174.2: yes (50/8)
                        total_night_minutes <= 174.2:
                        :...total_day_minutes <= 246.6: no (12)
                            total_day_minutes > 246.6:
                            :...total_day_charge <= 43.33: yes (4)
                                total_day_charge > 43.33: no (2)

Produced C5.0.graphviz dot description: 

digraph g {
graph [rankdir="TB"]
node1 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node2 [shape="diamond" label ="voice_mail_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node3 [shape="diamond" label ="international_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node4 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node5 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node6 [shape="diamond" label ="total_eve_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node7 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node8 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node9 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node10 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node11 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node12 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node13 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node14 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node15 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node16 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node17 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node18 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node19 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node20 [shape="diamond" label ="account_length" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node21 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node22 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node23 [shape="diamond" label ="number_customer_service_calls" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node24 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node25 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node26 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node27 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node28 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node29 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node30 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node31 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node32 [shape="diamond" label ="total_eve_calls" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node33 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node34 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node35 [shape="diamond" label ="international_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node36 [shape="diamond" label ="total_intl_calls" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node37 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node38 [shape="diamond" label ="total_intl_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node39 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node40 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node41 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node42 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node43 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node44 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node45 [shape="diamond" label ="voice_mail_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node46 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node47 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node48 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node49 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node50 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node51 [shape="diamond" label ="total_day_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node52 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node53 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node1 -> node2 [label="> 264.4" fontname= Arial ];
node2 -> node3 [label="= yes" fontname= Arial ];
node3 -> node4 [label="= no" fontname= Arial ];
node3 -> node5 [label="= yes" fontname= Arial ];
node2 -> node6 [label="= no" fontname= Arial ];
node6 -> node7 [label="> 187.7" fontname= Arial ];
node7 -> node8 [label="> 126.9" fontname= Arial ];
node7 -> node9 [label="<= 126.9" fontname= Arial ];
node9 -> node10 [label="<= 277" fontname= Arial ];
node9 -> node11 [label="> 277" fontname= Arial ];
node6 -> node12 [label="<= 187.7" fontname= Arial ];
node12 -> node13 [label="<= 12.26" fontname= Arial ];
node12 -> node14 [label="> 12.26" fontname= Arial ];
node14 -> node15 [label="<= 277" fontname= Arial ];
node15 -> node16 [label="<= 224.8" fontname= Arial ];
node15 -> node17 [label="> 224.8" fontname= Arial ];
node14 -> node18 [label="> 277" fontname= Arial ];
node18 -> node19 [label="> 151.9" fontname= Arial ];
node18 -> node20 [label="<= 151.9" fontname= Arial ];
node20 -> node21 [label="<= 123" fontname= Arial ];
node20 -> node22 [label="> 123" fontname= Arial ];
node1 -> node23 [label="<= 264.4" fontname= Arial ];
node23 -> node24 [label="> 3" fontname= Arial ];
node24 -> node25 [label="<= 160.2" fontname= Arial ];
node25 -> node26 [label="<= 19.83" fontname= Arial ];
node25 -> node27 [label="> 19.83" fontname= Arial ];
node27 -> node28 [label="<= 120.5" fontname= Arial ];
node27 -> node29 [label="> 120.5" fontname= Arial ];
node24 -> node30 [label="> 160.2" fontname= Arial ];
node30 -> node31 [label="> 12.05" fontname= Arial ];
node30 -> node32 [label="<= 12.05" fontname= Arial ];
node32 -> node33 [label="<= 125" fontname= Arial ];
node32 -> node34 [label="> 125" fontname= Arial ];
node23 -> node35 [label="<= 3" fontname= Arial ];
node35 -> node36 [label="= yes" fontname= Arial ];
node36 -> node37 [label="<= 2" fontname= Arial ];
node36 -> node38 [label="> 2" fontname= Arial ];
node38 -> node39 [label="<= 13.1" fontname= Arial ];
node38 -> node40 [label="> 13.1" fontname= Arial ];
node35 -> node41 [label="= no" fontname= Arial ];
node41 -> node42 [label="<= 223.2" fontname= Arial ];
node41 -> node43 [label="> 223.2" fontname= Arial ];
node43 -> node44 [label="<= 20.5" fontname= Arial ];
node43 -> node45 [label="> 20.5" fontname= Arial ];
node45 -> node46 [label="= yes" fontname= Arial ];
node45 -> node47 [label="= no" fontname= Arial ];
node47 -> node48 [label="> 174.2" fontname= Arial ];
node47 -> node49 [label="<= 174.2" fontname= Arial ];
node49 -> node50 [label="<= 246.6" fontname= Arial ];
node49 -> node51 [label="> 246.6" fontname= Arial ];
node51 -> node52 [label="<= 43.33" fontname= Arial ];
node51 -> node53 [label="> 43.33" fontname= Arial ]; }

References

None.

Friday, July 19, 2013

Interactive 4-dimensional contour plots


Contour plots are plots that draw a 3-D surface on a 2-D plot. They are characterized by the slices for the z-axis that are called contours. Hence, contour plots answer to the question 'how z is changing as a function of x and y'. They are often being used to plot results from experiments where x and y are two input variables and z is the output of the experiment.

R has many functions for calculating contours and displaying x,y plots. The most common is the contour function. Sometimes a good alternative, the filled.contour function can provide a clear understanding of the transition of z for the given x and y.

Although contour plots are very efficient in displaying the z variable given two input variables, there are cases where an experiment is repeated for different values over a set of values of a third variable. In these cases, there is a need to project the output w given the input values of x, y and z.

A simple solution is to create a series of contour plots, where each contour shows variable w in respect of x and y, for a given value of z (in the case dealt here, we require that z is available at distinct values/levels). Another solution, that has been proposed here, is to create a superimposed image of one standard (line) contourplot with a second one, color-filled. Although this solution addresses the fact that different measurements of w are shown for x, y and z, it is limited to using at maximum two different levels (two values) of z, since more levels reduce the plot clarity. Moreover, in some cases, the resulting plot from that method cannot be interpreted nicely, as it will be seen in the example that follows.

To address these issues, the proposed solution here is the use of a 3D plot where the contour plots are superimposed. In this way, the reader can visualize both the contour levels of each x, y, z input set, and the relative distance of each of the x, y surfaces for different values of z.

The main function presented here, contour4D,  uses a persp3d function to initially draw the first surface, and then superimposes the plot with the rest surfaces using the surface3d function. The input is two vectors x and y, and one list of vectors W containing z vectors of the w output for each combination of x and y, given a value/level of z. The last parameter is the vector size of z. The function also handles color adjustment to help distinguishing among the different surfaces.

library(rgl)
library(colorspace)
#---------------------------------------------------------#
# Function: contour4D                                     # 
# Version: 1.0.1                                          #
# Date: 19/07/13                                          #
# Author: Athanasios Tsakonas                             #
# Inputs:                                                 #
#    x, y: vectors of size N,M                            #
#    W: list of size z,containing vectors of size NxM     #
# Variables:                                              #
#    pal,lpal: palette vectors for surfaces, contour lines#
#    clines: contour lines of surfaces                    #
#---------------------------------------------------------#
contour4D <- function(x, y, W) {
  pal<-rainbow_hcl(length(W), start = 90, end = -30)
  lpal<-rainbow_hcl(length(W), start = 70, end = -50)

  persp3d(x,y,W[[1]], col=pal[1], alpha=0.5, axes=T, zlab='z')
  clines <- contourLines(x,y,W[[1]])
  for (i in 1:length(clines))
    with(clines[[i]], lines3d(x, y, level, col=lpal[1]))
  for (k in 2:length(W)) {
    surface3d(x,y,W[[k]], col=pal[k], alpha=0.5, axes=F)
    clines <- contourLines(x,y,W[[k]])
    for (i in 1:length(clines))
      with(clines[[i]], lines3d(x, y, level, col=lpal[k]))
  }
}


To demonstrate the function output we'll consider an artificial data problem. We first create some data in the required input form.

#declare functions to be used for producing some output;
#each of the functions is assumed to be output for a different z value
f1 <- function(x, y) (x^2 + y^2 + sin(x*y)+ 1.2)
f2 <- function(x, y) (x^2 + y^2 + sin(x*y)+ 0.7)
f3 <- function(x, y) (x^2 + y^2 + sin(x*y)+ 0.2)

#add some jittering to the data, for more realistic results
a1<-rnorm(15*15)
a2<-rnorm(15*15)
a3<-rnorm(15*15)
#generate some x, y
x <- seq(-1,1,len=15)
y <- seq(-1,1,len=15)

#now produce output for each of the z levels
w1 <- outer(x, y, f1) + 0.08*a1
w2 <- outer(x, y, f2) + 0.08*a2
w3 <- outer(x, y, f3) + 0.08*a3

#add w results to a list
W<-list()
W<-c(W,list(w1,w2,w3))


As it can be seen, we created three sets of x, y 'experiments', for three different levels of z. Practically, there is no limit on the sets we can include.The x,y contour plots for each z level are shown below:



We can try to superimpose e.g. the first two contour plots, using the code described here, but the result in our case (and in similar experiments, where the z value creates linear or semi-linear overall shifts to the output) cannot get easily interpreted, as seen below (we cannot easily percveive the relative distance between the two surfaces). Moreover, this method is practically limited to only two cases for z.

Alternatively, we can call the contour4D function, described above:

open3d()
contour4D(x,y,W)


The result is then shown in an interactive 3D window and it can be plot and seen from various perspectives.