forecast 8.0

In what is now a roughly annual event, the forecast package has been updated on CRAN with a new version, this time 8.0.

A few of the more important new features are described below.

Check residuals

A common task when building forecasting models is to check that the residuals satisfy some assumptions (that they are uncorrelated, normally distributed, etc.). The new function checkresiduals makes this very easy: it produces a time plot, an ACF plot, and a histogram with a superimposed normal curve, and it carries out a Ljung-Box test on the residuals with an appropriate number of lags and degrees of freedom.

library(forecast)

fit <- auto.arima(WWWusage)
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  residuals
## Q* = 7.8338, df = 8, p-value = 0.4499
## 
## Model df: 2.   Total lags used: 10

This should work for all the modelling functions in the package, as well as some of the time series modelling functions in the stats package.

Different types of residuals

Usually, residuals are computed as the difference between observations and the corresponding one-step forecasts. But for some models, such as a multiplicative ETS model or a model with a Box-Cox transformation, the residuals are computed differently. So the residuals() function now has a type argument to handle this situation.

“Innovation residuals” correspond to the white noise process that drives the evolution of the time series model. “Response residuals” are the difference between the observations and the fitted values (as with GLMs). For homoscedastic models, the innovation residuals and the one-step response residuals are identical. “Regression residuals” are also available for regression models with ARIMA errors, and are equal to the original data minus the effect of the regression variables. If there are no regression variables, the regression residuals are identical to the original series (possibly adjusted to have zero mean).

library(ggplot2)
fit <- ets(woolyrnq)
res <- cbind(Residuals = residuals(fit), 
             Response.residuals = residuals(fit, type='response'))
autoplot(res, facets=TRUE)
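The type='regression' residuals apply to regression models with ARIMA errors. Here is a minimal sketch; using a linear trend as the regressor for LakeHuron is just an illustrative assumption, not an example from the package itself.

fit2 <- auto.arima(LakeHuron, xreg = time(LakeHuron))  # ARIMA errors around a linear trend
autoplot(residuals(fit2, type = "regression"))          # the data minus the regression effect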

Some new graphs

The geom_histogram() function in the ggplot2 package is nice, but it does not have a good default binwidth. So I added the gghistogram function, which provides a quick histogram with good defaults. You can also overlay a normal density curve or a kernel density estimate.

gghistogram(lynx)
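The overlays mentioned above are controlled by the add.normal and add.kde arguments (the argument names here are taken from the package's help file):

gghistogram(lynx, add.normal = TRUE)   # overlay a fitted normal density
gghistogram(lynx, add.kde = TRUE)      # overlay a kernel density estimate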

The ggseasonplot function is useful for studying seasonal patterns and how they change over time. It now has a polar argument to create graphs like this.

ggseasonplot(USAccDeaths, polar=TRUE)

I often want to add a time series line to an existing plot. Base graphics has lines(), which works well when a time series is passed as an argument. So I added autolayer, which is similar (but more general). It is an S3 method like autoplot, and it adds a layer to an existing ggplot object. autolayer will eventually form part of the next release of ggplot2, but for now it is available in the forecast package. There are methods provided for ts and forecast objects:

WWWusage %>% ets %>% forecast(h=20) -> fc
autoplot(WWWusage, series="Data") + 
  autolayer(fc, series="Forecast") + 
  autolayer(fitted(fc), series="Fitted")

Cross-validation

The tsCV and CVar functions have been added. These were discussed in a previous post.
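As a reminder of the basic pattern, tsCV takes a forecast function and returns the cross-validated forecast errors. The sketch below uses ets as an arbitrary choice of model:

far1 <- function(x, h) forecast(ets(x), h = h)   # forecast function handed to tsCV
e <- tsCV(WWWusage, far1, h = 1)                 # one-step cross-validated errors
sqrt(mean(e^2, na.rm = TRUE))                    # cross-validated RMSE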

Bagged ETS

The baggedETS function has been added, which implements the procedure discussed in Bergmeir et al (2016) for bagging ETS forecasts.
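A minimal sketch of its use, with the bootstrapped series left at their defaults:

fit <- baggedETS(WWWusage)    # fit ETS models to bootstrapped versions of the series
fc <- forecast(fit, h = 20)   # point forecasts are averaged across the bagged models
autoplot(fc)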

head and tail of time series

I’ve long found it annoying that head and tail do not work on multiple time series. So I added methods to the forecast package so that they now do.
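For example, with the forecast package loaded, something like this should now return a multiple time series rather than a plain matrix:

head(EuStockMarkets, 4)   # first four observations, time attributes preserved
tail(EuStockMarkets, 4)   # last four observations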

Imports and Dependencies

The pipe operator from the magrittr package is now imported. So you don’t need to load the magrittr package to use it.

There are now no packages that are loaded along with forecast; everything required is imported. This makes start-up much cleaner (no more annoying messages from all those loaded packages). Instead, some random tips are occasionally printed when you load the forecast package (much like ggplot2 does).

There is quite a bit more — see the Changelog for a list.

forecast v7 and ggplot2 graphics

Version 7 of the forecast package was released on CRAN about a month ago, but I'm only just getting around to posting about the new features.

The most visible feature was the introduction of ggplot2 graphics. I first wrote the forecast package before ggplot2 existed, and so only base graphics were available. But I figured it was time to modernize and use the nice features available from ggplot2. The following examples illustrate the main new graphical functionality.

For illustration purposes, I'm using the male and female monthly deaths from lung diseases in the UK.


Plotting overlapping prediction intervals

I often see figures with two sets of prediction intervals plotted on the same graph using different line types to distinguish them. The results are almost always unreadable. A better way to do this is to use semi-transparent shaded regions. Here is an example showing two sets of forecasts for the Nile River flow.

library(forecast)
 
f1 = forecast(auto.arima(Nile, lambda=0), h=20, level=95)
f2 = forecast(ets(Nile), h=20, level=95)
 
# Plot the ARIMA forecasts with a semi-transparent blue 95% interval
plot(f1, shadecol=rgb(0,0,1,.4), flwd=1,
     main="Forecasts of Nile River flow",
     xlab="Year", ylab="Billions of cubic metres")
# Overlay the ETS interval as a semi-transparent red polygon
polygon(c(time(f2$mean),rev(time(f2$mean))),
        c(f2$lower,rev(f2$upper)),
        col=rgb(1,0,0,.4),
        border=FALSE)
# Add the ETS point forecasts
lines(f2$mean, col=rgb(.7,0,0))
legend("bottomleft", 
       fill=c(rgb(0,0,1,.4),rgb(1,0,0,.4)),
       col=c("blue","red"), lty=1,
       legend=c("ARIMA","ETS"))

[Figure: Forecasts of Nile River flow, with overlapping 95% prediction intervals from ARIMA (blue) and ETS (red)]

The blue region shows 95% prediction intervals for the ARIMA forecasts, while the red region shows 95% prediction intervals for the ETS forecasts. Where they overlap, the colors blend to purple. In this case, the point forecasts are quite close, but the prediction intervals are noticeably different.

Mathematical annotations on R plots

I’ve always struggled with using plotmath via the expression function in R for adding mathematical notation to axes or legends. For some reason, the most obvious way to write something never seems to work for me and I end up using trial and error in a loop with far too many iterations.

So I am very happy to see the new latex2exp package, which translates LaTeX expressions into a form suitable for R graphs. This is going to save me time and frustration!
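As a small, hypothetical example of what it enables (TeX is the package's translation function; the particular plot is just for illustration):

library(latex2exp)
x <- seq(0, 4, by = 0.1)
plot(x, dgamma(x, shape = 2), type = "l",
     xlab = TeX("$x$"), ylab = TeX("$f(x) = x e^{-x}$"),
     main = TeX("The Gamma(2,1) density"))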

Murphy diagrams in R

At the recent International Symposium on Forecasting, held in Riverside, California, Tilmann Gneiting gave a great talk on “Evaluating forecasts: why proper scoring rules and consistent scoring functions matter”. It will be the subject of an IJF invited paper in due course.

One of the things he talked about was the “Murphy diagram” for comparing forecasts, as proposed in Ehm et al (2015). Here’s how it works for comparing mean forecasts.

Statistical modelling and analysis of big data

I’m currently attending the one day workshop on this topic at QUT in Brisbane. This morning I spoke on “Visualizing and forecasting big time series data”. My slides are here.

The talks are being streamed.

Overview

Big data is now endemic in business, industry, government, environmental management, medical science, social research and so on. A corresponding challenge is how to model and analyse these data effectively.

This workshop will bring together national and international experts in statistical modelling and analysis of big data, to share their experiences, approaches and opinions about future directions in this field.