Time Series in R: Forecasting and Visualisation

29 May 2017

This is a one-day workshop given as part of the Melbourne Data Science Week.

Date: 29 May 2017
Presenters: Rob J Hyndman and Earo Wang
Location: KPMG, Tower Two, Collins Square, 727 Collins St, Melbourne

Prerequisites

Please bring your own laptop with a recent version of R installed, along with the following packages and their dependencies:

  • devtools
  • fpp2
  • knitr
  • plotly
  • shiny
  • tidyverse
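
All of these packages are available on CRAN; one way to install them is:

```r
# Install the workshop packages and their dependencies from CRAN
install.packages(c("devtools", "fpp2", "knitr", "plotly", "shiny", "tidyverse"))
```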

Participants are assumed to be familiar with basic statistical tools such as multiple regression; no prior knowledge of time series or forecasting is required.

Reference

Hyndman, R.J., & Athanasopoulos, G. *Forecasting: Principles and Practice*, 2nd ed., OTexts. (The fpp2 package accompanies this book, which is free to read online at OTexts.org/fpp2.)

Program

| Time        | Session                                 | Materials                |
|-------------|-----------------------------------------|--------------------------|
| 08.30–09.00 | Registration and welcome                | Slides                   |
| 09.00–10.30 | Time series and R; time series graphics | Lab Sessions 1–2, Slides |
| 10.30–11.00 | Morning tea                             |                          |
| 11.00–12.30 | Visualising temporal data               | Lab Sessions 3–4, Slides |
| 12.30–13.30 | Lunch                                   |                          |
| 13.30–15.00 | Some automatic forecasting algorithms   | Lab Sessions 5–6, Slides |
| 15.00–15.30 | Afternoon tea                           |                          |
| 15.30–16.45 | Forecast evaluation                     | Lab Sessions 7–8, Slides |
| 16.45–17.00 | Wrap up                                 | Slides                   |

Lab sessions

Lab Session 1

  1. Download the Retail.Rmd file. This will be used for all analysis of the retail data.

  2. Download the monthly Australian retail data. These represent retail sales in various categories for different Australian states.

  3. Read the data into R and choose one of the series. This time series will be used throughout the workshop in Lab Sessions 1–2 and 5–8.

    Please script this rather than using the RStudio point-and-click interface, so that you can save your code and easily replicate the results later.

    You will need the read_excel function from the readxl package:

    library(fpp2)  # loads the forecast and ggplot2 packages, which provide autoplot()
    retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
    mytimeseries <- ts(retaildata[["A3349873A"]], frequency = 12, start = c(1982, 4))
    autoplot(mytimeseries)

    [Replace the column name with your own chosen column.]

Lab Session 2

The following graphics functions have been introduced: autoplot, ggseasonplot, ggmonthplot, gglagplot, ggAcf and ggtsdisplay.

  1. Explore your chosen retail time series using these functions (a sketch follows this list).
  2. Can you spot any seasonality, cyclicity or trend?
  3. What do you learn about the series?
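
A minimal sketch applying each function to the series created in Lab Session 1 (assumes library(fpp2) is loaded and mytimeseries exists):

```r
autoplot(mytimeseries)      # time plot of the full series
ggseasonplot(mytimeseries)  # seasonal plot: one line per year
ggmonthplot(mytimeseries)   # seasonal subseries plot, one panel per month
gglagplot(mytimeseries)     # scatterplots of the series against its lagged values
ggAcf(mytimeseries)         # autocorrelation function
ggtsdisplay(mytimeseries)   # time plot, ACF and PACF together
```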

Lab Session 3

Download the Rmd file for this lab session.

  1. Download the billboard data. The billboard dataset contains the date a song first entered the Billboard Top 100 in 2000 and its rank over 76 weeks.
  2. Read the dataset into R and take a look at the data.
  3. Transform the data into long form, naming the result billboard_long.
  4. [Bonus] Split billboard_long into two separate datasets, song and rank. The song dataset should contain artist, track, time and a new column called id, assigning a unique identifier to each song. The rank dataset should contain the id, date, week and rank columns. The id column is the key that links the two datasets. (A possible approach is sketched below.)
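
One possible approach for steps 3–4, assuming the downloaded file is named billboard.csv with columns artist, track, time, date.entered and week columns wk1–wk76 (adjust the names to match your copy):

```r
library(tidyverse)

billboard <- read_csv("billboard.csv")  # hypothetical file name

# Long form: one row per song per charting week
billboard_long <- billboard %>%
  gather(week, rank, wk1:wk76, na.rm = TRUE) %>%
  mutate(week = parse_number(week))     # "wk3" -> 3

# [Bonus] split into two linked tables
song <- billboard_long %>%
  distinct(artist, track, time) %>%
  mutate(id = row_number())

rank <- billboard_long %>%
  left_join(song, by = c("artist", "track", "time")) %>%
  select(id, date = date.entered, week, rank)
```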

Lab Session 4

Download the Rmd file for this lab session.

  1. Download the weather data.
  2. Read the dataset into R and tidy it up for later visualisation with ggplot2 (one possible tidying approach is sketched after this list).
  3. Write some ggplot2 code to reproduce the plot shown on the slides.
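
A sketch of step 2, assuming the weather file is weather.csv with columns id, year, month, element and day columns d1–d31 (names may differ in your copy):

```r
library(tidyverse)

weather <- read_csv("weather.csv")  # hypothetical file name

weather_tidy <- weather %>%
  gather(day, value, d1:d31, na.rm = TRUE) %>%  # one row per day per element
  mutate(day = parse_number(day)) %>%           # "d12" -> 12
  spread(element, value)                        # one column per element (e.g. tmax, tmin)
```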

Lab Session 5

  1. Use ets() to find the best ETS model for your retail data (a sketch follows this list).
     - What does the model choice tell you about the data?
     - What do the smoothing parameters tell you about the trend and seasonality?
     - Do the forecasts look reasonable?
  2. Obtain up-to-date retail data from the ABS website (Cat. 8501.0, Table 11), and compare your forecasts with the actual numbers. How good were the forecasts from the various models?
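
A minimal sketch for step 1 (assumes mytimeseries from Lab Session 1 and library(fpp2)):

```r
fit <- ets(mytimeseries)         # automatic ETS model selection
summary(fit)                     # chosen model and smoothing parameters
autoplot(forecast(fit, h = 24))  # forecasts for the next two years
```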

Lab Session 6

We will now fit an ARIMA model to your retail data.

  1. What Box-Cox transformation would you select to stabilize the variance?

  2. Use auto.arima() to obtain a seasonal ARIMA model, and compare its forecasts with those you obtained earlier and with the latest retail data (a sketch follows this list).

  3. Experiment with different Box-Cox transformations to see their effect on the chosen model and forecasts.
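
One way to start on steps 1–2 (the automatic lambda selection shown here is just one option; you may prefer to choose the transformation by inspecting the plots):

```r
lambda <- BoxCox.lambda(mytimeseries)             # automatic Box-Cox parameter selection
fit <- auto.arima(mytimeseries, lambda = lambda)  # seasonal ARIMA on the transformed scale
summary(fit)
autoplot(forecast(fit, h = 24))
```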

Lab Session 7

For your retail time series:

  1. Use the accuracy() function to compare the forecasts obtained from your ETS and ARIMA models. Which gives the better forecasts? (A sketch follows this list.)

  2. Repeat with forecasts obtained using stlf (with the same Box-Cox transformation as you used for the ARIMA model).

  3. Repeat with forecasts obtained using snaive (there’s no need for a transformation).

  4. Which approach gives the best forecasts?
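
A sketch of one way to set this up, holding out the final 24 months as a test set (the split is illustrative; adjust it for your series, and lambda is the value chosen in Lab Session 6):

```r
n <- length(mytimeseries)
train <- subset(mytimeseries, end = n - 24)  # hold out the last two years

fc_ets    <- forecast(ets(train), h = 24)
fc_arima  <- forecast(auto.arima(train, lambda = lambda), h = 24)
fc_stlf   <- stlf(train, lambda = lambda, h = 24)
fc_snaive <- snaive(train, h = 24)

accuracy(fc_ets, mytimeseries)     # training and test set accuracy measures
accuracy(fc_arima, mytimeseries)
accuracy(fc_stlf, mytimeseries)
accuracy(fc_snaive, mytimeseries)
```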

Lab Session 8

  1. Use ets() to find the best model for your retail data and record the training set MAPE.
  2. We will now check how much larger the MAPE is on out-of-sample data, using time series cross-validation. The following code will compute the result. Replace ??? with the appropriate values for your ETS model.
```r
# Forecast function for tsCV(): refits the specified ETS model at each step
fets <- function(x, h, model="ZZZ", damped=NULL, ...) {
  forecast(ets(x, model=model, damped=damped), h=h)
}
e <- tsCV(mytimeseries, fets, model=???, damped=???)  # cross-validated forecast errors
pe <- 100*e/mytimeseries                              # percentage errors
sqrt(mean(pe^2, na.rm=TRUE))
```
  3. Plot pe using autoplot and ggAcf. Do the errors look uncorrelated and homoskedastic?

  4. In practice, we will not know the best model on the whole data set until we observe all the data. So a more realistic analysis would be to allow ets to select a different model each time through the loop. Calculate the MAPE using this approach, as in the sketch after this list. (Warning: there are a lot of models to fit, so this will take a while.)

  5. How do the MAPE values compare? Does the re-selection of a model at each step make much difference?
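
A sketch of the re-selection approach in step 4 (slow, since ets() searches over all candidate models at every step):

```r
# Let ets() re-select the model at each cross-validation step
fets2 <- function(x, h) {
  forecast(ets(x), h = h)
}
e2 <- tsCV(mytimeseries, fets2)
pe2 <- 100 * e2 / mytimeseries
sqrt(mean(pe2^2, na.rm = TRUE))
```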

