Introduction
Users of the C50 R package often want to present the resulting decision tree in graphical form, but the C50 package does not provide such a plotting feature. The R code on this page works around this by translating the decision tree into the GraphViz dot language, so that the tree model can be plotted directly with GraphViz's dot command. Users without a local installation of GraphViz (open-source software) can obtain it freely from the GraphViz website.
The steps to plot the decision tree are the following:
- Generate your model using the C50 package in R, as usual.
- Translate the model output and save it to a file by calling the C5.0.graphviz function (given later on this page), passing your C5.0 model and the desired text output filename as parameters.
- In your operating system, call GraphViz's dot command with the proper parameters (example given below).
Usage
C5.0.graphviz ( C5.0.model, filename, fontname ='Arial', col.draw ='black', col.font ='blue', col.conclusion ='lightpink', col.question = 'grey78', shape.conclusion ='box3d', shape.question ='diamond', bool.substitute = 'None', prefix=FALSE, vertical=TRUE )
Arguments
C5.0.model The name of a variable which is a valid C5.0 model result.
filename The name of a file where the output GraphViz model will be saved to.
fontname The font that will be used for the graph.
col.draw The color of the drawing lines.
col.font The color of the font.
col.conclusion The color which will be used for conclusion nodes (tree leaves).
col.question The color which will be used for question nodes (tree inner nodes).
shape.conclusion The shape which will be used for conclusion nodes (tree leaves).
shape.question The shape which will be used for question nodes (tree inner nodes).
bool.substitute Optional substitution of branch labels for boolean comparisons; one of 'None', 'yesno', 'truefalse' or 'TF'. The default 'None' plots '= 0' and '= 1' on the respective decision tree branches. The option 'yesno' plots 'no' and 'yes' respectively, 'truefalse' plots 'false' and 'true', and 'TF' plots 'F' and 'T' (taking the value 0 as 'false' and 1 as 'true').
prefix When set to TRUE, the class nodes will have the prefix 'Class' before the class number (useful for multiclass problems where the class is referred to by a single number).
vertical The orientation of the decision tree. The default is vertical (TRUE); if this is set to FALSE, the tree is drawn from left to right.
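To make the role of the styling arguments more concrete, the following is a minimal sketch of how they end up in the dot text. The helpers dot_header and dot_node are hypothetical names introduced only for illustration; they are not part of C5.0.graphviz, and the exact spacing of the real output differs slightly (see the Appendix).

```r
# Hypothetical helpers (not part of C5.0.graphviz) that show where each
# argument ends up in the generated dot text.
dot_header <- function(vertical = TRUE) {
  # vertical = TRUE draws top-to-bottom (TB); FALSE draws left-to-right (LR)
  sprintf('graph [rankdir="%s"]', if (vertical) "TB" else "LR")
}

dot_node <- function(symbol, label, is_leaf,
                     fontname = "Arial", col.draw = "black", col.font = "blue",
                     col.conclusion = "lightpink", col.question = "grey78",
                     shape.conclusion = "box3d", shape.question = "diamond") {
  sprintf('%s [shape="%s" label="%s" style=filled fontcolor=%s color=%s fontname=%s fillcolor=%s];',
          symbol,
          if (is_leaf) shape.conclusion else shape.question,   # node shape
          label, col.font, col.draw, fontname,
          if (is_leaf) col.conclusion else col.question)       # fill color
}

cat(dot_header(vertical = FALSE), "\n")
cat(dot_node("node1", "total_day_minutes", is_leaf = FALSE), "\n")
cat(dot_node("node4", "no", is_leaf = TRUE, col.conclusion = "gold"), "\n")
```

Question nodes take shape.question and col.question, while leaves take shape.conclusion and col.conclusion; col.font, col.draw and fontname apply to both.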
Details
GraphViz supports the X11, SVG and Brewer color schemes, with X11 being the default. An exhaustive list of candidate colors is given on the color names page of the GraphViz documentation.
Information on fonts can likewise be found in the GraphViz documentation.
Value
If successful, the function creates a text file at the given path, containing the decision tree model described in GraphViz's dot language. The same text is also printed to the console.
Note
Regarding boolean substitutions with the bool.substitute parameter: in this version the routine cannot tell whether a comparison is genuinely boolean, nor does it detect two-valued comparisons on other numbers (e.g. between '1' and '2'). The substitution is applied only to branches labelled '= 0' and '= 1' (both must be present as options of the inner node for a successful substitution).
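The label-based nature of the substitution can be illustrated with a small sketch. The lookup table mirrors the one defined inside the function; relabel_branch is a hypothetical helper written only for this illustration.

```r
# Lookup table with the same layout as the one used inside C5.0.graphviz:
# row 1 replaces '= 0', row 2 replaces '= 1'.
substitutes <- data.frame(None      = c("= 0", "= 1"),
                          yesno     = c("no", "yes"),
                          truefalse = c("false", "true"),
                          TF        = c("F", "T"),
                          stringsAsFactors = FALSE)

# Hypothetical helper: relabel one branch according to bool.substitute.
relabel_branch <- function(label, bool.substitute = "None") {
  if (label == "= 0") return(substitutes[1, bool.substitute])
  if (label == "= 1") return(substitutes[2, bool.substitute])
  label  # any other label is left untouched
}

relabel_branch("= 1", "yesno")      # "yes"
relabel_branch("= 0", "truefalse")  # "false"
relabel_branch("= 2", "yesno")      # "= 2": not recognised as boolean
relabel_branch("> 264.4", "TF")     # "> 264.4"
```

Because only the literal label text is inspected, a numeric attribute that happens to split on '= 0' and '= 1' would be relabelled as well.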
Update: Version 2 extends the translation into multi-branched trees. Version 1 was able to handle only trees with binary splits.
Version 2.2 corrects a missing initialization of the firstindent variable.
Example
We use the example from the C50 package, a data set from the MLC++ machine learning software for modeling customer churn.
library(C50)
data(churn)
treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
summary(treeModel) #to compare output
C5.0.graphviz(treeModel, 'c:\\dtreeout.txt', col.question ='cyan')
(The output generated by the C5.0.graphviz routine, contained in the dtreeout.txt file, is shown in the Appendix at the end of this page.)
Then, at the operating system level, we make sure that the dot command of the GraphViz package is accessible (either its directory is in the PATH, or we have navigated to that directory) and enter the following command (here, from a Windows command prompt):
dot -Tpng c:\dtreeout.txt > c:\dtreeout.png
This command produces a graphic file named 'dtreeout.png', which contains the graph of the example decision tree.
[Figure: the example decision tree as rendered by dot (not shown at actual size).]
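The dot call can also be issued from within R. The following is a minimal sketch, assuming GraphViz is installed and dot is on the PATH; render_dot and png_name are hypothetical helper names, not part of the code on this page.

```r
# Hypothetical helper: derive the PNG file name from the dot file name.
png_name <- function(dotfile) {
  paste0(tools::file_path_sans_ext(dotfile), ".png")
}

# Hypothetical helper: render a dot file to PNG from within R.
# Assumes GraphViz is installed and 'dot' can be found on the PATH.
render_dot <- function(dotfile, pngfile = png_name(dotfile)) {
  dot <- Sys.which("dot")
  if (!nzchar(dot)) {
    warning("GraphViz 'dot' was not found on the PATH; nothing rendered.")
    return(invisible(NULL))
  }
  # Equivalent to: dot -Tpng <dotfile> -o <pngfile>
  status <- system2(dot, c("-Tpng", shQuote(dotfile), "-o", shQuote(pngfile)))
  if (status != 0) warning("dot exited with status ", status)
  invisible(pngfile)
}

# Example call, using the file written by C5.0.graphviz in the example:
# render_dot("c:\\dtreeout.txt")
```

Using the -o option instead of shell redirection avoids depending on the shell's handling of '>'.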
Code
#---------------------------------------------------------#
# Function: C5.0.graphviz #
# Version: 2.2.0 #
# Date: 26/09/2014 #
# Author: Athanasios Tsakonas #
# This code implements C5.0.graphviz conversion routine #
#---------------------------------------------------------#
C5.0.graphviz <- function( C5.0.model, filename, fontname ='Arial', col.draw ='black',
                           col.font ='blue', col.conclusion ='lightpink', col.question = 'grey78',
                           shape.conclusion ='box3d', shape.question ='diamond',
                           bool.substitute = 'None', prefix=FALSE, vertical=TRUE ) {
  library(cwhmisc)  # provides cpos()
  library(stringr)  # provides str_trim()

  # Keep only the tree listing from the printed model output
  treeout <- C5.0.model$output
  treeout <- substr(treeout, cpos(treeout, 'Decision tree:', start=1)+14, nchar(treeout))
  treeout <- substr(treeout, 1, cpos(treeout, 'Evaluation on training data', start=1)-2)

  # Work tables: nodes (variables), edges (connectors) and a stack of open question nodes
  variables <- data.frame(matrix(nrow=500, ncol=4))
  names(variables) <- c('SYMBOL','TOKEN', 'TYPE' , 'QUERY')
  connectors <- data.frame(matrix(nrow=500, ncol=3))
  names(connectors) <- c('TOKEN', 'START','END')
  theStack <- data.frame(matrix(nrow=500, ncol=1))
  names(theStack) <- c('ITEM')
  theStackIndex <- 1
  currentvar <- 1
  currentcon <- 1
  open_connection <- TRUE
  previousindent <- -1
  firstindent <- 4

  # Branch-label replacements selected by bool.substitute (row 1: '= 0', row 2: '= 1')
  substitutes <- data.frame(None=c('= 0','= 1'), yesno=c('no','yes'),
                            truefalse=c('false', 'true'), TF=c('F','T'))
  dtreestring <- unlist( scan(text= treeout, sep='\n', what =list('character')))

  for (linecount in c(1:length(dtreestring))) {
    # Derive the nesting depth from leading spaces and ':   ' markers.
    # ':   ' (colon plus three spaces) is one four-character indentation unit.
    lineindent <- 0
    shortstring <- str_trim(dtreestring[linecount], side='left')
    leadingspaces <- nchar(dtreestring[linecount]) - nchar(shortstring)
    lineindent <- leadingspaces/4
    dtreestring[linecount] <- str_trim(dtreestring[linecount], side='left')
    while (!is.na(cpos(dtreestring[linecount], ':   ', start=1)) ) {
      lineindent <- lineindent + 1
      dtreestring[linecount] <- substr(dtreestring[linecount],
                                       ifelse(is.na(cpos(dtreestring[linecount], ':   ', start=1)), 1,
                                              cpos(dtreestring[linecount], ':   ', start=1)+4),
                                       nchar(dtreestring[linecount]) )
      shortstring <- str_trim(dtreestring[linecount], side='left')
      leadingspaces <- nchar(dtreestring[linecount]) - nchar(shortstring)
      lineindent <- lineindent + leadingspaces/4
      dtreestring[linecount] <- str_trim(dtreestring[linecount], side='left')
    }
    # ':...' introduces the first child of a node
    if (!is.na(cpos(dtreestring[linecount], ':...', start=1)))
      lineindent <- lineindent + 1
    dtreestring[linecount] <- substr(dtreestring[linecount],
                                     ifelse(is.na(cpos(dtreestring[linecount], ':...', start=1)), 1,
                                            cpos(dtreestring[linecount], ':...', start=1)+4),
                                     nchar(dtreestring[linecount]) )
    dtreestring[linecount] <- str_trim(dtreestring[linecount])

    # Split 'attribute operator value: class (counts)' into its parts
    stringlist <- strsplit(dtreestring[linecount],'\\:')
    stringpart <- strsplit(unlist(stringlist)[1],'\\s')

    # A new question node is created when the previous line did not end in a leaf
    if (open_connection==TRUE) {
      variables[currentvar,'TOKEN'] <- unlist(stringpart)[1]
      variables[currentvar,'SYMBOL'] <- paste('node',as.character(currentvar), sep='')
      variables[currentvar,'TYPE'] <- shape.question
      variables[currentvar,'QUERY'] <- 1
      theStack[theStackIndex,'ITEM'] <- variables[currentvar,'SYMBOL']
      theStack[theStackIndex,'INDENT'] <- firstindent
      theStackIndex <- theStackIndex+1
      currentvar <- currentvar + 1
      if(currentvar>2) {
        connectors[currentcon - 1,'END'] <- variables[currentvar - 1, 'SYMBOL']
      }
    }

    # Edge label, with the optional boolean substitution
    connectors[currentcon,'TOKEN'] <- paste(unlist(stringpart)[2],unlist(stringpart)[3])
    if (connectors[currentcon,'TOKEN']=='= 0')
      connectors[currentcon,'TOKEN'] <- as.character(substitutes[1,bool.substitute])
    if (connectors[currentcon,'TOKEN']=='= 1')
      connectors[currentcon,'TOKEN'] <- as.character(substitutes[2,bool.substitute])

    # Find the question node this branch starts from
    if (open_connection==TRUE) {
      if (lineindent<previousindent) {
        theStackIndex <- theStackIndex-(( previousindent- lineindent) +1 )
        currentsymbol <- theStack[theStackIndex,'ITEM']
      } else
        currentsymbol <- variables[currentvar - 1,'SYMBOL']
    } else {
      currentsymbol <- theStack[theStackIndex-((previousindent -lineindent ) +1 ),'ITEM']
      theStackIndex <- theStackIndex-(( previousindent- lineindent) )
    }
    connectors[currentcon, 'START'] <- currentsymbol
    currentcon <- currentcon + 1
    open_connection <- TRUE

    # A second part after ':' means the line ends in a leaf (conclusion node)
    if (length(unlist(stringlist))==2) {
      stringpart2 <- strsplit(unlist(stringlist)[2],'\\s')
      variables[currentvar,'TOKEN'] <- paste(ifelse((prefix==FALSE),'','Class'), unlist(stringpart2)[2])
      variables[currentvar,'SYMBOL'] <- paste('node',as.character(currentvar), sep='')
      variables[currentvar,'TYPE'] <- shape.conclusion
      variables[currentvar,'QUERY'] <- 0
      currentvar <- currentvar + 1
      connectors[currentcon - 1,'END'] <- variables[currentvar - 1,'SYMBOL']
      open_connection <- FALSE
    }
    previousindent <- lineindent
  }

  # Graph header
  runningstring <- paste('digraph g {', 'graph ', sep='\n')
  runningstring <- paste(runningstring, ' [rankdir="', sep='')
  runningstring <- paste(runningstring, ifelse(vertical==TRUE,'TB','LR'), sep='' )
  runningstring <- paste(runningstring, '"]', sep='')

  # One dot statement per node
  for (lines in c(1:(currentvar-1))) {
    runningline <- paste(variables[lines,'SYMBOL'], '[shape="')
    runningline <- paste(runningline,variables[lines,'TYPE'], sep='' )
    runningline <- paste(runningline,'" label ="', sep='' )
    runningline <- paste(runningline,variables[lines,'TOKEN'], sep='' )
    runningline <- paste(runningline,
                         '" style=filled fontcolor=', sep='')
    runningline <- paste(runningline, col.font)
    runningline <- paste(runningline,' color=' )
    runningline <- paste(runningline, col.draw)
    runningline <- paste(runningline,' fontname=')
    runningline <- paste(runningline, fontname)
    runningline <- paste(runningline,' fillcolor=')
    runningline <- paste(runningline,
                         ifelse(variables[lines,'QUERY']== 0 ,col.conclusion,col.question))
    runningline <- paste(runningline,'];')
    runningstring <- paste(runningstring, runningline , sep='\n')
  }

  # One dot statement per edge
  for (lines in c(1:(currentcon-1))) {
    runningline <- paste (connectors[lines,'START'], '->')
    runningline <- paste (runningline, connectors[lines,'END'])
    runningline <- paste (runningline,'[label="')
    runningline <- paste (runningline,connectors[lines,'TOKEN'], sep='')
    runningline <- paste (runningline,'" fontname=', sep='')
    runningline <- paste (runningline, fontname)
    runningline <- paste (runningline,'];')
    runningstring <- paste(runningstring, runningline , sep='\n')
  }
  runningstring <- paste(runningstring,'}')

  cat(runningstring)                   # echo the dot description to the console
  cat(runningstring, file = filename)  # write the same text to the output file
}
Appendix
The example decision tree, as shown in the C50 package summary, and the output generated by the C5.0.graphviz routine are shown below.
C50 summary output tree:
total_day_minutes > 264.4:
:...voice_mail_plan = yes:
:   :...international_plan = no: no (45/1)
:   :   international_plan = yes: yes (8/3)
:   voice_mail_plan = no:
:   :...total_eve_minutes > 187.7:
:       :...total_night_minutes > 126.9: yes (94/1)
:       :   total_night_minutes <= 126.9:
:       :   :...total_day_minutes <= 277: no (4)
:       :       total_day_minutes > 277: yes (3)
:       total_eve_minutes <= 187.7:
:       :...total_eve_charge <= 12.26: no (15/1)
:           total_eve_charge > 12.26:
:           :...total_day_minutes <= 277:
:               :...total_night_minutes <= 224.8: no (13)
:               :   total_night_minutes > 224.8: yes (5/1)
:               total_day_minutes > 277:
:               :...total_night_minutes > 151.9: yes (18)
:                   total_night_minutes <= 151.9:
:                   :...account_length <= 123: no (4)
:                       account_length > 123: yes (2)
total_day_minutes <= 264.4:
:...number_customer_service_calls > 3:
    :...total_day_minutes <= 160.2:
    :   :...total_eve_charge <= 19.83: yes (79/3)
    :   :   total_eve_charge > 19.83:
    :   :   :...total_day_minutes <= 120.5: yes (10)
    :   :       total_day_minutes > 120.5: no (13/3)
    :   total_day_minutes > 160.2:
    :   :...total_eve_charge > 12.05: no (130/24)
    :       total_eve_charge <= 12.05:
    :       :...total_eve_calls <= 125: yes (16/2)
    :           total_eve_calls > 125: no (3)
    number_customer_service_calls <= 3:
    :...international_plan = yes:
        :...total_intl_calls <= 2: yes (51)
        :   total_intl_calls > 2:
        :   :...total_intl_minutes <= 13.1: no (173/7)
        :       total_intl_minutes > 13.1: yes (43)
        international_plan = no:
        :...total_day_minutes <= 223.2: no (2221/60)
            total_day_minutes > 223.2:
            :...total_eve_charge <= 20.5: no (295/22)
                total_eve_charge > 20.5:
                :...voice_mail_plan = yes: no (20)
                    voice_mail_plan = no:
                    :...total_night_minutes > 174.2: yes (50/8)
                        total_night_minutes <= 174.2:
                        :...total_day_minutes <= 246.6: no (12)
                            total_day_minutes > 246.6:
                            :...total_day_charge <= 43.33: yes (4)
                                total_day_charge > 43.33: no (2)
Produced C5.0.graphviz dot description:
digraph g {
graph [rankdir="TB"]
node1 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node2 [shape="diamond" label ="voice_mail_plan" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node3 [shape="diamond" label ="international_plan" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node4 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node5 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node6 [shape="diamond" label ="total_eve_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node7 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node8 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node9 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node10 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node11 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node12 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node13 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node14 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node15 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node16 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node17 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node18 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node19 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node20 [shape="diamond" label ="account_length" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node21 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node22 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node23 [shape="diamond" label ="number_customer_service_calls" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node24 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node25 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node26 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node27 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node28 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node29 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node30 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node31 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node32 [shape="diamond" label ="total_eve_calls" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node33 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node34 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node35 [shape="diamond" label ="international_plan" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node36 [shape="diamond" label ="total_intl_calls" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node37 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node38 [shape="diamond" label ="total_intl_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node39 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node40 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node41 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node42 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node43 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node44 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node45 [shape="diamond" label ="voice_mail_plan" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node46 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node47 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node48 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node49 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node50 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node51 [shape="diamond" label ="total_day_charge" style=filled fontcolor= blue color= black fontname= Arial fillcolor= cyan ];
node52 [shape="box3d" label =" yes" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node53 [shape="box3d" label =" no" style=filled fontcolor= blue color= black fontname= Arial fillcolor= lightpink ];
node1 -> node2 [label="> 264.4" fontname= Arial ];
node2 -> node3 [label="= yes" fontname= Arial ];
node3 -> node4 [label="= no" fontname= Arial ];
node3 -> node5 [label="= yes" fontname= Arial ];
node2 -> node6 [label="= no" fontname= Arial ];
node6 -> node7 [label="> 187.7" fontname= Arial ];
node7 -> node8 [label="> 126.9" fontname= Arial ];
node7 -> node9 [label="<= 126.9" fontname= Arial ];
node9 -> node10 [label="<= 277" fontname= Arial ];
node9 -> node11 [label="> 277" fontname= Arial ];
node6 -> node12 [label="<= 187.7" fontname= Arial ];
node12 -> node13 [label="<= 12.26" fontname= Arial ];
node12 -> node14 [label="> 12.26" fontname= Arial ];
node14 -> node15 [label="<= 277" fontname= Arial ];
node15 -> node16 [label="<= 224.8" fontname= Arial ];
node15 -> node17 [label="> 224.8" fontname= Arial ];
node14 -> node18 [label="> 277" fontname= Arial ];
node18 -> node19 [label="> 151.9" fontname= Arial ];
node18 -> node20 [label="<= 151.9" fontname= Arial ];
node20 -> node21 [label="<= 123" fontname= Arial ];
node20 -> node22 [label="> 123" fontname= Arial ];
node1 -> node23 [label="<= 264.4" fontname= Arial ];
node23 -> node24 [label="> 3" fontname= Arial ];
node24 -> node25 [label="<= 160.2" fontname= Arial ];
node25 -> node26 [label="<= 19.83" fontname= Arial ];
node25 -> node27 [label="> 19.83" fontname= Arial ];
node27 -> node28 [label="<= 120.5" fontname= Arial ];
node27 -> node29 [label="> 120.5" fontname= Arial ];
node24 -> node30 [label="> 160.2" fontname= Arial ];
node30 -> node31 [label="> 12.05" fontname= Arial ];
node30 -> node32 [label="<= 12.05" fontname= Arial ];
node32 -> node33 [label="<= 125" fontname= Arial ];
node32 -> node34 [label="> 125" fontname= Arial ];
node23 -> node35 [label="<= 3" fontname= Arial ];
node35 -> node36 [label="= yes" fontname= Arial ];
node36 -> node37 [label="<= 2" fontname= Arial ];
node36 -> node38 [label="> 2" fontname= Arial ];
node38 -> node39 [label="<= 13.1" fontname= Arial ];
node38 -> node40 [label="> 13.1" fontname= Arial ];
node35 -> node41 [label="= no" fontname= Arial ];
node41 -> node42 [label="<= 223.2" fontname= Arial ];
node41 -> node43 [label="> 223.2" fontname= Arial ];
node43 -> node44 [label="<= 20.5" fontname= Arial ];
node43 -> node45 [label="> 20.5" fontname= Arial ];
node45 -> node46 [label="= yes" fontname= Arial ];
node45 -> node47 [label="= no" fontname= Arial ];
node47 -> node48 [label="> 174.2" fontname= Arial ];
node47 -> node49 [label="<= 174.2" fontname= Arial ];
node49 -> node50 [label="<= 246.6" fontname= Arial ];
node49 -> node51 [label="> 246.6" fontname= Arial ];
node51 -> node52 [label="<= 43.33" fontname= Arial ];
node51 -> node53 [label="> 43.33" fontname= Arial ]; }
References
None.
Hi,
Thanks for the great post above.
Any chance you could let me know how hard it would be to add stats for 1/0-outcomes in the graph? (Such as in the pic here: http://exploringdatablog.blogspot.se/2013/04/classification-tree-models.html )
NOT that I'm asking you to do it, only asking how heavy lifting you believe this would be to implement in your code above....
Thanks again,
Matti
Hello,
Since the statistics are already contained in the decision tree (e.g. line 3 of the decision tree above reads: : :...international_plan = no: no (45/1)), it's rather straightforward to extend the program to include this value as well.
Think of extending the code line above that reads:
variables[currentvar,'TOKEN'] <- unlist(stringpart)[1]
Hope it helped,
Thanos
Thanks Thanos.
I switched this line:
variables[currentvar,'TOKEN'] <- paste(ifelse((prefix==FALSE),'','Class'), unlist(stringpart2)[2])
To this:
variables[currentvar,'TOKEN'] <- paste(ifelse((prefix==FALSE),'','Class'), unlist(stringpart2)[2], " ", unlist(stringpart2)[3], sep = "") ##Added From [2] to [3]
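The effect of this change can be checked on a single leaf line, outside the function. The sketch below reuses the splitting calls from the code above on the example line quoted earlier in this thread; the variable names leafline and label are introduced only for the illustration.

```r
# A leaf line as printed in the tree listing, after the leading
# ':...' markers have been stripped.
leafline <- "international_plan = no: no (45/1)"

# Same splitting steps as in C5.0.graphviz
stringlist  <- strsplit(leafline, "\\:")
stringpart2 <- strsplit(unlist(stringlist)[2], "\\s")

unlist(stringpart2)
# "" "no" "(45/1)"  -> element 2 is the class, element 3 the case counts

prefix <- FALSE
label <- paste(ifelse(prefix == FALSE, "", "Class"),
               unlist(stringpart2)[2], " ", unlist(stringpart2)[3], sep = "")
label
# "no (45/1)"
```

In C5.0 output a pair such as (45/1) is understood as the number of training cases reaching the leaf followed by the number of those that are misclassified, so the leaf label now carries both the class and its support.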
Hi,
I am getting the following error:
Read 342 records
Error in `[<-.data.frame`(`*tmp*`, currentcon, "START", value = c("node1", :
replacement has 499 rows, data has 1
4 stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
"replacement has %d rows, data has %d"), N, n), domain = NA)
3 `[<-.data.frame`(`*tmp*`, currentcon, "START", value = c("node1",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
2 `[<-`(`*tmp*`, currentcon, "START", value = c("node1", NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
1 C5.0.graphviz(decision_tree, "c:\\output.txt", col.question = "cyan")
I am having the same problem. Any answer on that?
Please make sure you follow all the steps as described. For example, the input to the C5.0.graphviz command should be a complete C5.0 model (e.g. not just the tree part of the C5.0 output).
Hello, I get the same error, what could be the cause?
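One way to check the input before calling the routine is to look for the two marker strings that C5.0.graphviz searches for in the model's output element. The following is a minimal sketch under that assumption; check_c50_output is a hypothetical helper, and the two list objects are stand-ins used only so the example runs without a fitted model.

```r
# Hypothetical pre-check (not part of the original function): the two marker
# strings are the ones C5.0.graphviz searches for in C5.0.model$output.
check_c50_output <- function(model) {
  out <- model$output
  if (is.null(out) || !is.character(out) || length(out) != 1) {
    stop("model$output is missing; pass the full object returned by C5.0()")
  }
  markers <- c("Decision tree:", "Evaluation on training data")
  found <- vapply(markers, function(m) grepl(m, out, fixed = TRUE), logical(1))
  if (!all(found)) {
    stop("marker(s) not found in model$output: ",
         paste(markers[!found], collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in objects for illustration only; a real call would pass the C5.0 model.
good <- list(output = paste0("\nDecision tree:\n\nx > 1: yes (3)\nx <= 1: no (2)\n",
                             "\n\nEvaluation on training data (5 cases):\n"))
bad  <- list(output = "x > 1: yes (3)\nx <= 1: no (2)\n")

check_c50_output(good)                     # passes silently
try(check_c50_output(bad), silent = TRUE)  # stops: both markers are missing
```

This only verifies that the expected text is present; it does not guarantee that the tree listing itself can be parsed.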
Hi,
I am new to R and when I am running the code
C5.0.graphviz(model1, 'C:\\mydotfile.txt')
I am getting the error "could not find function "C5.0.graphviz""
I installed graphviz 2.38 msi as well
Hi, I am getting the below error...
C5.0.graphviz(Model_C50, "dtreeout.txt", col.question ='cyan')
Read 3001 records
Error in `[<-.data.frame`(`*tmp*`, currentcon, "START", value = character(0)) :
replacement has length zero
Please help
Hi,
Try to increase the model size in the following lines, e.g.:
variables <- data.frame(matrix(nrow=2000, ncol=4))
connectors <- data.frame(matrix(nrow=2000, ncol=3))
theStack <- data.frame(matrix(nrow=2000, ncol=1))
Also, ensure you enter the complete model as an argument, not only the decision tree part.
I have installed graphviz but am still getting this error:
"could not find function "C5.0.graphviz"
Is there any configuration I need to do after installation?
Hi,
You have to follow these steps:
1. Open R, load the C5.0.graphviz code (or type it within R environment).
2. Run your C5.0 model.
3. Execute the C5.0.graphviz with proper parameters.
4. Find your generated output in your OS directory.
5. Execute the graphviz command (as stated above).
Note that the latest C5.0 versions in R offer a plotting function. That function, although not as versatile as a generic graphviz command, includes complete information for the C5.0 tree in the plot.
I get this error:
Error in if (start + lsub1 > lstr) return(NA) else { :
missing value where TRUE/FALSE needed
This post is really great. If you use Mac, it is enough to use the R system command: system("dot -T png -O ~/directoryPath/c5.txt")