Forecast v7 (part 2)

As mentioned in my previous post on the forecast package v7, the most visible feature was the introduction of ggplot2 graphics. This post briefly summarizes the remaining new features of forecast v7.

library(forecast)
library(ggplot2)

 

tslm rewritten

The tslm function is designed to fit linear models to time series data. It is intended to approximately mimic lm (and calls lm to do the estimation), but to package the output to remember the ts attributes. It also handles some predictor variables automatically, notably trend and season. The re-write means that tslm now handles functions as predictors, including fourier.

deaths.lm  <- tslm(mdeaths ~ trend + fourier(mdeaths,3))
mdeaths.fcast <- forecast(deaths.lm,
    data.frame(fourier(mdeaths,3,36)))
autoplot(mdeaths.fcast)

Note that fourier now takes 3 arguments. The first is the series, which is only used to grab the seasonal period and the tsp attribute. The second argument K is the number of Fourier harmonics to compute. If the third argument h is NULL (the default), the function returns Fourier terms for the times of the historical observations. But if h is a positive integer, the function returns Fourier terms for the next h time periods after the end of the historical data.

The lm function has long allowed a matrix to be passed and independent linear models fitted to each column. The new tslm function also allows this now.

Bias adjustment for Box-Cox transformations

Almost all modelling and forecasting functions in the package allow Box-Cox transformations to be applied before the model is fitted, and for the forecasts to be back transformed. This will give median forecasts on the original scale, as I’ve explained before.

There is now an option to adjust the forecasts so they are means rather than medians, but setting biasadj=TRUE whenever the forecasts are computed. I will probably make this the default in some future version, but for now the default is biasadj=FALSE so the forecasts are actually medians.

library(fpp, quietly=TRUE)
fit <- ets(eggs, model="AAN", lambda=0)
fc1 <- forecast(fit, biasadj=TRUE, h=20, level=95)
fc2 <- forecast(fit, biasadj=FALSE, h=20)
cols <- c("Mean"="#0000ee","Median"="#ee0000")
autoplot(fc1) + ylab("Price") + xlab("Year") +
  geom_line(aes(x=time(fc1$mean),y=fc1$mean, colour="Mean")) +
  geom_line(aes(x=time(fc2$mean),y=fc2$mean, colour="Median")) +
  guides(fill=FALSE) +
  scale_colour_manual(name="Forecasts",values=cols)

unnamed-chunk-4-1

A new Ccf function

Cross-correlations can now be computed using Ccf, mimicing ccf except that the axes are more informative.

The Acf function now handles multivariate time series, with cross-correlation functions computed as well as the ACFs of each series.

Covariates in neural net AR models

The nnetar function allows neural networks to be applied to time series data by building a nonlinear autoregressive model. A new feature allows additional inputs to be included in the model.

Better subsetting of time series

subset.ts allows quite sophisticated subsetting of a time series. For example

plot(subset(gas,month="November"))

unnamed-chunk-5-1

subset(woolyrnq,quarter=3)
## Time Series:
## Start = 1965.5 
## End = 1994.5 
## Frequency = 1 
##  [1] 6633 6730 6946 6915 7190 7105 6840 7819 7045 5540 5906 5505 5318 5466
## [15] 5696 5341 5464 5129 5524 6080 6540 6339 6590 6077 5146 5127 5222 4954
## [29] 5309 6396

This is now substantially more robust than it used to be.

What’s next?

The next major release will probably be around the end of 2016. On the to-do list are:

  • In-sample multi-step fitted values. Currently fitted returns in-sample one-step forecasts. A new argument to fitted will allow multi-step forecasts of the training data.

  • Applying fitted models to new data sets. A related issue is to take an estimated model and apply it to some new data without re-estimating parameters. This is already possible with Arima and ets models. It will be extended to many more model types.

  • Better choice of seasonal differencing. Currently auto.arima does a pretty good job at finding the orders of a model, and the number of first-differences required, but it does not handle seasonal differences well. It often selects 0 differences, when I think it should select 1 difference. So I tend to over-ride the automatic choice with auto.arima(x, D=1). I will attempt to find some better tests of seasonal unit roots than those that are currently implemented.

  • Prediction intervals for NNAR forecasts. The forecasts obtained using a NNAR model (via the nnetar function) do not have prediction intervals because there is no underlying stochastic model on which to base them. However, there are ways of computing the uncertainty using simulation, and I hope to implement something like that for the next version.


Related Posts:


  • Jason

    This is amazing!! Thank you for all the hard work you put into the package! A question about the covariate nnetar change, is this now an alternative model to use instead of auto.arima with seasonal dummy variables (i.e. floating holidays like Easter)?

  • Mark Adamson

    I’ve updated our model to forecast v7 now, but it is now generating warnings now when I call forecast.Arima that it didn’t before with the same call and same data.

    Most are:
    Error in sqrt(z[[2L]] * object$sigma2) :
    (converted from warning) NaNs produced

    This happens within the predict function where it is calculating the standard error I think (even though I asked for level=0):

    se <- ts(sqrt(z[[2L]] * object$sigma2), start = xtsp[2L] + deltat(rsd), frequency = xtsp[3L])

    It's then getting this warning on the first line of ts when calling is.data.frame(data). I have checked and data are all NaNs, not sure if they are meant to be at this stage or not.

    My call to forecast is: forecast.Arima(newModel, xreg = futureHHXReg, level = 0)

    • Mark Adamson

      So I think it is because sigma2 is negative in the model so the sqrt then fails. I am using lambda=0 by the way

    • Please provide a reproducible example of the problem. Most likely the model is mis-specified with rank-deficient regressors or over-parameterized ARMA. The new version is picking up more problems with models than before.

      • Mark Adamson

        I’ve now emailed you a reproduction of the problem. Hope it’s ok for your needs. Thanks, Mark

  • Isabella Ghement

    Dear Professor Hyndman,

    Thank you for all of the improvements you have made to the forecast package. It is an extremely valuable package for applied practitioners of statistics and, unlike many other R packages, it has excellent documentation via your articles, blog and online forecasting book.

    I do have a question about the example provided in the help file for the fitted.Arima() function. The help file states that the function “Returns one-step forecasts for the data used in fitting the ARIMA model”. If the original time series takes values x1, x2, x3, … at equally spaced times t1, t2, t3, …, then are the fitted values supposed to be displaced by one time unit relative to the original time series values? In other words, when plotting the time series and fitted values on the same plot, are we supposed to plot x2, x3, …versus t2, t3, … and fitted1, fitted2, … versus t2, t3, …?
    Thank you in advance for your answer.

    • No. The time stamps are correct on the fitted values. fitted1 is the fitted value for t1, etc.

  • Elkin Tabares

    Could I perfomance cross validation in the nnetar function?. Thank you very much.

  • Ashin Mukherjee

    Dear Prof. Hyndman,

    Thanks for all the recent improvements to the forecast package. I had a question about seasonal models in auto.arima. I was wondering if you ever thought about adding an option to only search over seasonal models in auto.arima. I understand that it is quite straightforward to restrict the search space among non-seasonal models by setting the seasonal parameter to FALSE but could not get an easy way of doing the reverse. In my examples sometimes the data set is short maybe 3 to 4 years of bi-weekly data and the seasonal pattern only became prominent in the last couple of years so clearly not an ideal data set for box-jenkins type of model. auto.arima selects a high order (3,1,1) ARIMA model as the optimal choice which does fine in fitting but forecasts look rather off due to the absence of seasonal orders. But when I tried (0,1,1)x(0,1,1) the often quoted fall back model in econometric studies I get much nicer looking forecasts at the cost of slight increase in one-step ahead accuracy metrics. This example got me thinking whether it would be possible to force a seasonal model in auto.arima similar to what you have already done for non-seasonal models.

    I look forward to hear from you.

    Best regards,
    Ashin

    • I’d rather not do that as it can lead to poor models. But you can force D=1 to get seasonal differencing included. In general, if a seasonal model is not being selected, there is a reason for it. The purpose in setting seasonal=FALSE is to improve computational speed, not to over-ride a good model.

      • Ashin Mukherjee

        Thanks for the pointer I will try that out.