Date: 1921 October 2016
Location: Eurandom, Metaforum Building MF1112, Department of Mathematics and Computer Science, TU Eindhoven, the Netherlands
Prerequisites
Please bring your own laptop with a recent version of R installed, along with the following packages and their dependencies:
fpp
ggplot2
magrittr
readxl
thief
knitr
Participants will be assumed to be familiar with basic statistical tools such as multiple regression, but no knowledge of time series or forecasting will be assumed.
Reference
![alt text](/eindhoven/fppcoversmall.jpg)Need help with R?
 Base R cheatsheet
 Datacamp: Introduction to R course
 Help with Rmarkdown
 Rmarkdown cheatsheet [Dutch translation]
Program
Wednesday 19 October
09.30  10.00  Registration and welcome  Slides 
10.00  11.30  Time series and R, Time series graphics  Slides 
11.30  11.45  Break  
11.45  13.00  The forecaster's toolbox  Slides 
13.00  14.00  Lunch  
14.00  15.30  Seasonality and trends  Slides 
15.30  15.45  Break  
15.45  17.00  Exponential smoothing  Slides 
Thursday 20 October
10.00  11.30  ETS and state space models  Slides 
11.30  11.45  Break  
11.45  13.00  Transformations and differencing  Slides1 Slides2 
13.00  14.00  Lunch  
14.00  15.30  Nonseasonal ARIMA models  Slides 
15.30  15.45  Break  
15.45  17.00  Seasonal ARIMA models  Slides 
Friday 21 October
10.00  11.30  Time series crossvalidation, Dynamic regression  Slides1 Slides2 
11.30  11.45  Break  
11.45  13.00  Hierarchical forecasting  Slides 
13.00  14.00  Lunch  
14.00  15.30  VAR, TBATS and NNAR models  Slides 
15.30  15.45  Break  
15.45  16.30  Final comments, questions and discussion  Slides 
Lab sessions
Download the following two Rmarkdown files for recording your lab session exercises:
Use these to record your answers to all lab session exercises. Use Retail.Rmd
for all questions involving the retail data introduced in the first lab session, and the other file for the remaining exercises.
Lab Session 1

Download the monthly Australian retail data. These represent retail sales in various categories for different Australian states.

Read the data into R and choose one of the series. This time series will be used throughout the short course in lab sessions.
Please script this, don’t just use the Rstudio clickandpoint interface. That way you can save the results for easy replication later.
You will need the
read_excel
function from thereadxl
package:retaildata < readxl::read_excel("retail.xlsx", skip = 1) mytimeseries < ts(retaildata[["A3349873A"]], frequency=12, start=c(1982,4)) autoplot(mytimeseries)
[Replace the column name with your own chosen column.]
Lab Session 2
 The following graphics functions have been introduced:
```r
autoplot, ggseasonplot, ggmonthplot, gglagplot, ggAcf, ggtsdisplay
```
Explore your chosen retail time series using these functions. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
 Repeat for some of the following series:
```r
bicoal, chicken, dole, usdeaths, bricksq, lynx, ibmclose
```
Use the help files to find out what the series are.
Lab Session 3

Calculate the residuals from a seasonal naïve forecast applied to quarterly Australian beer production from 1992.
beer < window(ausbeer, start=1992) fc < snaive(beer) autoplot(fc) res < residuals(fc)

Test if the residuals are white noise.
```r
ggtsdisplay(res, plot.type='scatter')
Box.test(res, lag=8, fitdf=0, type="Lj")
```
What do you conclude?
 Check if the residuals are normally distributed.
```r
qplot(res)
```
What do you conclude?
Lab Session 4
For your retail time series:

Split the data into two parts using
x1 < window(mytimeseries, end=c(2010,12)) x2 < window(mytimeseries, start=2011)

Check that your data have been split appropriately by producing the following plot.
autoplot(cbind(Training=x1,Test=x2))

Calculate forecasts using
snaive
applied tox1
. 
Compare the accuracy of your forecasts against the actual values stored in
x2
.f1 < snaive(x1) accuracy(f1,x2)

Compute the residuals and plot them.
res < residuals(f1) ggtsdisplay(res) Box.test(res, lag=24, fitdf=0, type="Lj") qplot(res)
Do the residuals appear to be uncorrelated and normally distributed?

How sensitive are the accuracy measures to the training/test split?
Lab Session 5
We will use the bricksq
data for this exercise.
 Use an STL decomposition to calculate the trendcycle and seasonal indices. (Experiment with having fixed or changing seasonality.)
 Compute and plot the seasonally adjusted data.
 Use a naïve method to produce forecasts of the seasonally adjusted data.
 Reseasonalize the results to give forecasts on the original scale.
[Hint: use thestlf
function withmethod="naive"
.]  Do the residuals look uncorrelated?
 Repeat with a robust STL decomposition. Does it make much difference?
Lab Session 6

For this exercise, use the price of a dozen eggs in the United States from 1900–1993 (data set
eggs
). Use SES and Holt’s method (with and without damping) to forecast “future " data.
[Hint: useh=100
so you can clearly see the differences between the options when plotting the forecasts.]  Which method gives the best training RMSE?
 Are these RMSE values comparable?
 Do the residuals from the best fitting method look like white noise?
 Use SES and Holt’s method (with and without damping) to forecast “future " data.
Lab Session 7

Now we will apply HoltWinters' multiplicative method to your retail time series data.
 Why is multiplicative seasonality necessary here?
 Experiment with making the trend damped.
 Compare the RMSE of the onestep forecasts from the two methods. Which do you prefer?
 Check that the residuals from the best method look like white noise.

Now use the same training/test split you did in Lab Session 4 to test your HoltWinters' forecasts on the retail data. Can you beat the seasonal naïve approach from Lab Session 4?
Lab Session 8

Use
ets()
to find the best ETS model for your retail data. How does this model compare to the HoltWinters' method you applied in the previous lab session?
 What do the smoothing parameters tell you about the trend and seasonality?

Use
ets()
on some of these series:bicoal, chicken, dole, usdeaths, bricksq, lynx, ibmclose, eggs, bricksq, ausbeer
Does it always give good forecasts?

Find an example where it does not work well. Can you figure out why?
Lab Session 9

For the following series, find an appropriate BoxCox transformation in order to stabilize the variance.
usnetelec
usgdp
mcopper
enplanements

Why is a BoxCox transformation unhelpful for the
cangas
data? 
What BoxCox transformation would you select for your retail data?

For the same retail data, try an STL decomposition applied to the BoxCox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
Lab Session 10

For the following series, find an appropriate differencing (after transformation if necessary) to obtain stationary data.
usnetelec
usgdp
mcopper
enplanements
visitors

For your retail data, find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
Lab Session 11
Select a time series from wmurders
, usgdp
and mcopper
.
 if necessary, find a suitable BoxCox transformation for the data;
 fit a suitable ARIMA model to the transformed data using
auto.arima()
;  try some other plausible models by experimenting with the orders chosen;
 choose what you think is the best model and check the residual diagnostics;
 produce forecasts of your fitted model. Do the forecasts look reasonable?
 compare the results with what you would obtain using
ets()
(with no transformation).
Lab Session 12

For your retail time series, develop an appropriate seasonal ARIMA model, and compare the forecasts with those you obtained earlier.

Obtain uptodate retail data from the ABS website (Cat. 8501.0, Table 11), and compare your forecasts with the actual numbers. How good were the forecasts from the various models?

Choose one of the following seasonal time series:
condmilk
,hsales
,uselec
 Do the data need transforming? If so, find a suitable transformation.
 Are the data stationary? If not, find an appropriate differencing which yields stationary data.
 Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
 Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
 Forecast the next 24 months of data using your preferred model.
 Compare the forecasts obtained using
ets()
.
Lab Session 13
 For this exercise, we will continue to use your retail data.

Use
ets
to find the best model for these data and record the training set RMSE. 
We will now check how much larger the onestep RMSE is on outofsample data using time series crossvalidation. The following code will compute the result, beginning with four years of data in the training set. Replace
???
with the appropriate values for your ETS model.k < 48 # minimum size for training set n < length(mytimeseries) # Total number of observations e < mytimeseries*NA # Vector to record onestep forecast errors for(i in 48:(n1)) { train < ts(mytimeseries[1:i],freq=12) fit < ets(train, model=???, damped=???) fc < forecast(fit,h=1)$mean e[i] < mytimeseries[i+1]fc } sqrt(mean(e^2,na.rm=TRUE))

What would happen in the above loop if I had set
train < mytimeseries[1:i]
? 
Plot
e
usingggtsdisplay
. Do they look uncorrelated and homoscedastic? 
In practice, we will not know the best model on the whole data set until we observe all the data. So a more realistic analysis would be to allow
ets
to select a different model each time through the loop. Calculate the RMSE using this approach. (Warning: it will take a while as there are a lot of models to fit.) 
How do the RMSE values compare? Does the reselection of a model at each step make much difference?
Lab Session 14
For your retail time series:
 Develop an appropriate dynamic regression model with Fourier terms for the seasonality. Use the AIC to select the number of Fourier terms to include in the model. (You will probably need to use the same BoxCox transformation you identified previously.)
 Check the residuals of the fitted model. Does the residual series look like white noise?
 Compare the forecasts with those you obtained earlier using alternative models.
Lab Session 15
We will reconcile the forecasts for the infant deaths data. The following code can be used. Check that you understand what each step is doing.
library(hts)
plot(infantgts)
smatrix(infantgts)
# Forecast 10stepahead and reconcile the forecasts
infantforecast < forecast(infantgts, h=10)
# Plot the forecasts including the last ten historical years
plot(infantforecast, include=10)
# Create a matrix of all aggregated time series
allts_infant < aggts(infantgts)
# Forecast all series using ARIMA models
allf < matrix(, nrow=10, ncol=ncol(allts_infant))
for(i in 1:ncol(allts_infant))
allf[,i] < forecast(auto.arima(allts_infant[,i]), h=10)$mean
allf < ts(allf, start=2004)
# combine the forecasts with the group matrix to get a gts object
y.f < combinef(allf, groups = infantgts$groups)
# set up training and testing samples
data < window(infantgts, end=1993)
test < window(infantgts, start=1994)
# Compute forecasts on training data
forecast < forecast(data, h=10)
# calculate ME, RMSE, MAE, MAPE, MPE and MASE
accuracy.gts(forecast, test)

How would you measure overall forecast accuracy across the whole collection of time series?

Repeat the training/test set analysis using “bottomup” forecasts (set
method="bu"
in theforecast
function). 
Does the ``optimal'' reconciliation method work best here?
Lab Session 16
Use thief()
on your retail data, using ARIMA models for each aggregation. Does the temporal reconciliation lead to improved forecasts on your test set?
Lab Session 17

Use the
tbats
function to model your retail time series.(a) Check the residuals and produce forecasts. (b) Does this completely automated approach work for these data? (c) Have you saved any degrees of freedom by using Fourier terms rather than seasonal differencing?

The following code will read weekly data on US finished motor gasoline products supplied (thousands of barrels per day):
gas < read.csv("https://robjhyndman.com/data/gasoline.csv")[,1] gas < ts(gas, start=1991+31/365.25, frequency = 365.25/7)
(a) Fit a
tbats
model to these data. (b) Check the residuals and produce forecasts. (c) Could you model these data using any of the other methods we have considered in this course?
Lab Session 18
Experiment with using nnetar()
on your retail data and other data we have considered in previous lab sessions.
Lab Session 19
Over this shortcourse, you have developed several models for your Australian retail data. The last exercise is to use crossvalidation to objectively compare the models you have developed. Compute crossvalidated RMSE values for each of the time series models you have considered. It will take some time to run, so perhaps leave it running overnight and check the results tomorrow.