Last October I gave a 3-day masterclass on “Forecasting with R” in Eindhoven, Netherlands. There is a follow-up event planned for Tuesday 18 April 2017. It is particularly designed for people who attended the 3-day class, but if anyone else wants to attend they would be welcome.
In what is now a roughly annual event, the forecast package has been updated on CRAN with a new version, this time 8.0.
A few of the more important new features are described below.
A common task when building forecasting models is to check that the residuals satisfy some assumptions (that they are uncorrelated, normally distributed, etc.). The new function
checkresiduals makes this very easy: it produces a time plot, an ACF plot, a histogram with a superimposed normal curve, and performs a Ljung-Box test on the residuals with the appropriate number of lags and degrees of freedom.
```r
fit <- auto.arima(WWWusage)
checkresiduals(fit)
```
```
##
##  Ljung-Box test
##
## data:  residuals
## Q* = 7.8338, df = 8, p-value = 0.4499
##
## Model df: 2.   Total lags used: 10
```
This should work for all the modelling functions in the package, as well as some of the time series modelling functions in the stats package.
Different types of residuals
Usually, residuals are computed as the difference between observations and the corresponding one-step forecasts. But for some models, residuals are computed differently; for example, a multiplicative ETS model or a model with a Box-Cox transformation. So the
residuals() function now has an additional argument to deal with this situation.
“Innovation residuals” correspond to the white noise process that drives the evolution of the time series model. “Response residuals” are the difference between the observations and the fitted values (as with GLMs). For homoscedastic models, the innovation residuals and the one-step response residuals are identical. “Regression residuals” are also available for regression models with ARIMA errors, and are equal to the original data minus the effect of the regression variables. If there are no regression variables, the errors will be identical to the original series (possibly adjusted to have zero mean).
```r
library(ggplot2)
fit <- ets(woolyrnq)
res <- cbind(Residuals = residuals(fit),
             Response.residuals = residuals(fit, type = 'response'))
autoplot(res, facets = TRUE)
```
Some new graphs
The geom_histogram() function in the
ggplot2 package is nice, but it does not have a good default binwidth. So I added the
gghistogram function which provides a quick histogram with good defaults. You can also overlay a normal density curve or a kernel density estimate.
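For example, a quick histogram with both overlays might look like this (a sketch; the add.normal and add.kde argument names are my assumption of the interface, so check ?gghistogram):

```r
library(forecast)

# Histogram of the lynx series with a sensible default binwidth,
# plus a normal density curve and a kernel density estimate overlaid.
# (Argument names add.normal / add.kde assumed; see ?gghistogram.)
p <- gghistogram(lynx, add.normal = TRUE, add.kde = TRUE)
p
```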
The ggseasonplot function is useful for studying seasonal patterns and how they change over time. It now has a
polar argument to create graphs like this.
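A minimal sketch, using the classic AirPassengers data:

```r
library(forecast)

# Seasonal plot on polar coordinates: each year traces one loop,
# making changes in the seasonal pattern over time easy to see.
p <- ggseasonplot(AirPassengers, polar = TRUE)
p
```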
I often want to add a time series line to an existing plot. Base graphics has
lines(), which works well when a time series is passed as an argument. So I added
autolayer, which is similar (but more general). It is an S3 method like
autoplot, and adds a layer to an existing ggplot object.
autolayer will eventually form part of the next release of
ggplot2, but for now it is available in the
forecast package. There are methods provided for ts and forecast objects.
```r
WWWusage %>% ets %>% forecast(h=20) -> fc
autoplot(WWWusage, series = "Data") +
  autolayer(fc, series = "Forecast") +
  autolayer(fitted(fc), series = "Fitted")
```
The CVar function has been added. It was discussed in a previous post.
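A minimal sketch of how CVar might be called (assuming it cross-validates an nnetar model by default and takes the number of folds via k; see ?CVar for the actual interface):

```r
library(forecast)

# k-fold cross-validation for a time series model
# (defaulting, I believe, to a neural network autoregression via nnetar).
cv <- CVar(lynx, k = 5)
print(cv)
```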
The baggedETS function has been added, which implements the procedure discussed in Bergmeir et al (2016) for bagging ETS forecasts.
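As a quick sketch (using only 10 bootstrap replicates to keep it fast, and assuming bld.mbb.bootstrap is the bootstrap generator the function expects):

```r
library(forecast)

# Fit ETS models to bootstrapped versions of the series,
# then average the resulting forecasts.
fit <- baggedETS(WWWusage, bld.mbb.bootstrap(WWWusage, 10))
fc <- forecast(fit, h = 20)
autoplot(fc)
```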
head and tail of time series
I’ve long found it annoying that head and tail do not work on multiple time series. So I added some functions to the package so they now work.
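For example, with the package loaded, head and tail on a multiple time series (mts) object now return the first or last few rows while keeping the time attributes:

```r
library(forecast)

# A small multiple time series (mts) object
z <- ts(cbind(a = 1:24, b = 25:48), frequency = 12, start = 2010)

head(z, 4)   # first four observations of both series
tail(z, 4)   # last four observations
```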
Imports and Dependencies
The pipe operator from the
magrittr package is now imported. So you don’t need to load the
magrittr package to use it.
There are now no packages that are loaded with
forecast – everything required is imported. This makes the startup much cleaner (no more annoying messages from all those packages being loaded). Instead, some random tips are occasionally printed when you load the forecast package.
There is quite a bit more — see the Changelog for a list.
AusMacroData is a new website that encourages and facilitates the use of quantitative, publicly available Australian macroeconomic data. The Australian Macro Database hosted at ausmacrodata.org provides a user-friendly front end for searching among over 40,000 economic variables and is loosely based on similar international sites such as the Federal Reserve Economic Database (FRED).
From my email today
You use an illustration of a seasonal arima model:
I would like to simulate data from this process then fit a model… but I am unable to find any information as to how this can be conducted… if I set phi1, Phi1, theta1, and Theta1 it would be reassuring that for large n the parameters returned by
Arima(foo, order=c(1,1,1), seasonal=c(1,1,1)) are in agreement…
arima.sim() won’t handle seasonal ARIMA models. I wrote
simulate.Arima() to handle them, but it is designed to simulate from a fitted model rather than a specified model. However, you can use the following code to do it: it first “estimates” an ARIMA model with the specified coefficients, then simulates from it.
```r
library(forecast)
model <- Arima(ts(rnorm(100), freq = 4), order = c(1,1,1), seasonal = c(1,1,1),
               fixed = c(phi = 0.5, theta = -0.4, Phi = 0.3, Theta = -0.2))
foo <- simulate(model, nsim = 1000)
fit <- Arima(foo, order = c(1,1,1), seasonal = c(1,1,1))
```
The data used in the tourism forecasting competition, discussed in Athanasopoulos et al (2011), have been made available in the Tcomp package for R. The objects are in the same format as those in the Mcomp package, which contains data from the M1 and M3 competitions.
I’m currently in the Netherlands for a few weeks, and I’ll be giving a seminar at the Data Science Centre in Eindhoven next Wednesday afternoon on “Visualization of big time series data”. Details follow.
A common problem is to forecast the aggregate of several time periods of data, using a model fitted to the disaggregated data. For example, you may have monthly data but wish to forecast the total for the next year. Or you may have weekly data, and want to forecast the total for the next four weeks.
If the point forecasts are means, then adding them up will give a good estimate of the total. But prediction intervals are trickier, due to the correlations between forecast errors.
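One pragmatic approach (a sketch, not necessarily the only way) is to simulate many future sample paths from the fitted model, sum each path over the aggregation period, and read the interval off the empirical distribution of the totals:

```r
library(forecast)

fit <- auto.arima(USAccDeaths)

# Simulate many 12-month-ahead sample paths and aggregate each one.
# In simulate(), nsim is the number of periods to simulate.
nreps <- 1000
h <- 12
totals <- replicate(nreps, sum(simulate(fit, nsim = h)))

mean(totals)                               # point forecast of the annual total
quantile(totals, prob = c(0.025, 0.975))   # approximate 95% prediction interval
```

Simulation automatically accounts for the correlations between the forecast errors within each path, which is exactly what naively summing the interval endpoints gets wrong.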
I’ve pushed a minor update to the forecast package to CRAN. Some highlights are listed here.
I have a new R package available to do temporal hierarchical forecasting, based on my paper with George Athanasopoulos, Nikolaos Kourentzes and Fotios Petropoulos. (Guess the odd guy out there!)
It is called “thief” – an acronym for Temporal HIErarchical Forecasting. The idea is to take a seasonal time series, and compute all possible temporal aggregations that result in an integer number of observations per year. For example, a quarterly time series is aggregated to biannual and annual; while a monthly time series is aggregated to 2-monthly, quarterly, 4-monthly, biannual and annual. Each of the resulting time series is forecast, and then the forecasts are reconciled using the hierarchical reconciliation algorithm described in our paper.
It turns out that this tends to give better forecasts, even though no new information has been added, especially for time series with long seasonal periods. It also allows different types of forecasts for different forecast horizons to be combined in a consistent manner.
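A minimal sketch of the workflow (assuming thief() with its defaults forecasts each aggregate with ETS and returns a reconciled forecast for the original series):

```r
library(thief)

# Monthly series: thief() builds the 2-monthly, quarterly, 4-monthly,
# biannual and annual aggregates, forecasts each level, and reconciles
# the forecasts back to the monthly scale.
fc <- thief(USAccDeaths)
plot(fc)
```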