library(forecast)
<- ets(fma::hsales)
fit plot(forecast(fit), include=120)
Visualization of probabilistic forecasts
This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well-worth reading.
It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval like this:
It is sometimes preferable to use a 50% and a 95% interval, rather like a boxplot:
plot(forecast(fit, level=c(50,95)), include=120)
In some circles (especially macroeconomic forecasting), fan charts are popular:
plot(forecast(fit,fan=TRUE),include=120)
Personally, I don’t like these at all as they lose any specific probabilistic interpretability. What does the darker shaded region actually refer to? At least in the previous version, it is clear that the dark region contains 50% of the probability.
For multi-modal distributions I like to use highest density regions. Here is an example applied to Nicholson’s blowfly data using a threshold model:
The dark region has 50% coverage and the light region has 95% coverage. The forecast distributions become bimodal after the first ten iterations, and so the 50% region is split in two to show that. This graphic was taken from a J. Forecasting paper I wrote in 1996, so these ideas have been around for a while!
It is easy enough to produce forecast HDR with time series. Here is some R code to do it:
#HDR for time series object
# Assumes that forecasts can be computed and futures simulated from object
<- function(object, h = ifelse(object$m > 1, 2 * object$m, 10),
forecasthdr nsim=2000, plot=TRUE, level=c(50,95), xlim=NULL, ylim=NULL, ...)
{require(hdrcde)
set.seed(2014)
# Compute forecasts
<- forecast(object)
fc <- time(fc$mean)
ft
# Simulate future sample paths
<- matrix(0,nrow=h,ncol=nsim)
sim for(i in 1:nsim)
<- simulate(object, nsim=h)
sim[,i]
# Compute HDRs
<- length(level)
nl <- array(0, c(h,nl,10))
hd <- numeric(h)
mode for(k in 1:h)
{<- hdr(sim[k,], prob=level)
hdtmp 1:ncol(hdtmp$hdr)] <- hdtmp$hdr
hd[k,,<- hdtmp$mode
mode[k]
}# Remove unnecessary sections of HDRs
<- apply(abs(hd),3,sum) > 0
nz <- hd[,,nz]
hd dimnames(hd)[[1]] <- 1:h
dimnames(hd)[[2]] <- level
# Produce plot if required
if(plot)
{if(is.null(xlim))
<- range(time(object$x),ft)
xlim if(is.null(ylim))
<- range(object$x, hd)
ylim plot(object$x,xlim=xlim, ylim=ylim, ...)
# Show HDRs
<- rev(colorspace::sequential_hcl(52))[level - 49]
cols for(k in 1:h)
{for(j in 1:nl)
{<- hd[k,j,]
hdtmp <- length(hdtmp)/2
nint for(l in 1:nint)
{polygon(ft[k]+c(-1,-1,1,1)/object$m/2,
c(hdtmp[2*l-1],hdtmp[2*l],hdtmp[2*l],hdtmp[2*l-1]),
col=cols[j], border=FALSE)
}
}points(ft[k], mode[k], pch=19, col="blue",cex=0.8)
}#lines(fc$mean,col='blue',lwd=2)
}
# Return HDRs
return(list(hdr=hd,mode=mode,level=level))
}
We can apply it using the example I started with:
<- forecasthdr(fit,xlim=c(1986,1998),nsim=5000,
z xlab="Year",ylab="US monthly housing sales")
The dots are modes of the forecast distributions, and the 50% and 95% highest density regions are also shown. In this case, the distributions are unimodal, and so all the regions are intervals.