> ddply(ChickWeight, .(Diet), summarize, weight=mean(weight), Time=median(Time))
The command above shows the mean weight and the median recorded time for each of the four diets, computed over the 50 chicks in the dataset.
However, the structure of a data frame is not always known in advance, and some columns might be missing. This is common when the ddply command is used inside a user-defined function. In such a case, a ddply command with a fixed column specification will fail. Suppose the following:
> ChickWeight2 <- ChickWeight[,-2]
> ddply(ChickWeight2, .(Diet), summarize, weight=mean(weight), Time=median(Time)) #this will fail
The above ddply command will fail with the error "object 'Time' not found". To deal with such situations, I include a solution below. The code adds each column and its desired aggregate function only if that column actually exists in the data frame.
First we set up one dataset (complete or with missing columns):
>#create a dataset with all or fewer columns, choose one of these lines
>ChickWeight2 <- ChickWeight[,-2] #missing column 'Time'
># OR
>ChickWeight2 <- ChickWeight #includes column 'Time'
Then we apply the following commands, constructing a list parameter for a call to ddply.
>plyrlist <- .() #initialize an empty list
>if ('weight' %in% colnames(ChickWeight2)) plyrlist <- c(plyrlist, .(weight = mean(weight))) #add optional term, conditional on column existence
>if ('Time' %in% colnames(ChickWeight2)) plyrlist <- c(plyrlist, .(Time = median(Time))) #add optional term, conditional on column existence
>do.call(ddply, c(.data = quote(ChickWeight2), .variables = 'Diet', .fun = quote(summarize), plyrlist)) #call to ddply; this works whether or not Time exists
This approach can be expanded to handle as many columns as desired.
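One way to scale this to many columns is to keep the aggregate expressions in a named list and filter that list by the columns that are present. The following is only a minimal sketch: the helper name summarise_existing and the specs list are hypothetical (they are not part of plyr), and it assumes each output column has the same name as the input column it aggregates.

```r
library(plyr)

# Hypothetical helper, not part of plyr: summarise only the columns that exist.
# `specs` is a named list; each name is a column, each value is the
# unevaluated aggregate expression for that column (built with quote()).
summarise_existing <- function(data, by, specs) {
  # keep only the specs whose column is present in the data frame
  present <- specs[names(specs) %in% colnames(data)]
  # assemble the argument list and hand it to ddply
  args <- c(list(.data = data, .variables = by, .fun = summarize), present)
  do.call(ddply, args)
}

specs <- list(weight = quote(mean(weight)), Time = quote(median(Time)))

full    <- summarise_existing(ChickWeight, "Diet", specs)        # Time present
partial <- summarise_existing(ChickWeight[, -2], "Diet", specs)  # Time missing
```

With this layout, supporting another optional column only requires one more entry in specs, rather than another if statement.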