Exploratory time series analysis using R

Date

6 December 2022

Venue

WOMBAT 2022 (online)

 

Three-hour workshop at WOMBAT 2022, 6 December 2022.

Course description

Many organisations collect huge amounts of data over time, and we need time series analysis tools capable of handling the scale, frequency and structure of the data collected. In this workshop, we will look at some R packages and methods that have been developed to handle the analysis of large collections of time series. We will look at the tsibble data structure for flexibly managing collections of related time series, and consider how to wrangle, visualise and explore time series data in high dimensions.

  • Session 1: How to wrangle time series data with familiar tidy tools.
  • Session 2: How to visualise the trend and seasonal patterns in individual time series.
  • Session 3: How to compute time series features and visualise large collections of time series.

The primary packages will be tsibble, lubridate and feasts (along with the tidyverse, of course).
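As a rough illustration of the tsibble data structure mentioned above, the following sketch builds a tiny tsibble by hand. The data, and the names `sales`, `store`, `month` and `units`, are made up for this example and are not part of the workshop material. The index identifies the time variable, and the key identifies the individual series.

```r
library(tsibble)

# Hypothetical toy data: monthly unit sales for two stores
sales <- tsibble(
  month = rep(yearmonth("2022 Jan") + 0:2, times = 2),
  store = rep(c("A", "B"), each = 3),
  units = c(10, 12, 9, 20, 18, 25),
  index = month,  # the time variable
  key   = store   # one time series per store
)

sales
```

Because the key and index are stored with the data, later tools can tell how many series there are and how they are spaced in time without being told again.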

Prework

Attendees are expected to be familiar with R, and with the tidyverse collection of packages including dplyr and ggplot2. They will need to have R and RStudio installed on their own device, and have installed the fpp3 package.

People who don’t use R regularly, or who don’t know the tidyverse packages, are encouraged to work through the tutorials at learnr.numbat.space beforehand.

Please ensure your computer has a recent version of R and RStudio installed. The following code will install the main packages needed for the workshop.

install.packages(c("tidyverse","fpp3","GGally"))

Slides

Lab Sessions

Lab Session 1

  1. Download tourism.xlsx from http://robjhyndman.com/data/tourism.xlsx, and read it into R using read_excel() from the readxl package.
  2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
  3. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
  4. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
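One possible way to approach these steps is sketched below; it is not the official solution. So that the code runs without a download, `raw` is a stand-in built from the packaged `tourism` data with `Quarter` stored as a plain date, on the assumption that the spreadsheet has the same columns. The object names `raw`, `my_tourism`, `avg_trips` and `state_tourism` are chosen for this example only.

```r
library(fpp3)  # attaches tsibble, dplyr, lubridate, feasts and others

# Stand-in for the spreadsheet contents (assumption). With the real file,
# this would instead be: raw <- readxl::read_excel("tourism.xlsx")
raw <- tourism |>
  as_tibble() |>
  mutate(Quarter = as.Date(Quarter))

# Step 2: convert the date to a quarterly index and declare the keys
my_tourism <- raw |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Step 3: average trips per Region/Purpose, keeping the largest.
# as_tibble() drops the time index so summarise() averages over all quarters.
avg_trips <- my_tourism |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips), .groups = "drop") |>
  slice_max(Trips, n = 1, with_ties = FALSE)

# Step 4: total trips by State; summarise() on a tsibble keeps the index
state_tourism <- my_tourism |>
  group_by(State) |>
  summarise(Trips = sum(Trips))
```

Note the difference between steps 3 and 4: on a tsibble, `summarise()` always aggregates within each time point, so averaging across time requires converting to a tibble first.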

Lab Session 2

Look at the quarterly tourism data for the Snowy Mountains:

snowy <- tourism |> filter(Region == "Snowy Mountains")

  • Use autoplot(), gg_season() and gg_subseries() to explore the data.
  • What do you learn?
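A minimal sketch of the three plots follows, assuming the fpp3 package is installed. The plot object names are chosen for this example. Recent releases of feasts may print a notice that the `gg_*()` plotting functions are moving to another package; the calls below reflect the feasts interface at the time of the workshop.

```r
library(fpp3)

snowy <- tourism |> filter(Region == "Snowy Mountains")

# Time plot: one line per Purpose, since snowy contains several series
p_time <- snowy |> autoplot(Trips)

# Seasonal plot: each year drawn as a separate line against the quarter
p_season <- snowy |> gg_season(Trips)

# Subseries plot: one panel per quarter, showing how each quarter evolves
p_subseries <- snowy |> gg_subseries(Trips)

p_time
p_season
p_subseries
```

Comparing the three views side by side makes it easier to separate the seasonal pattern from changes in level over the years.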

Lab Session 3

  1. Produce an STL decomposition of the Snowy Mountains data.
  2. Experiment with different values of the two window arguments.
  3. Plot the seasonally adjusted series.
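The steps above could be sketched as follows. Restricting the data to the Holiday series and the particular window values (11 and 7) are arbitrary choices made for this illustration; the exercise is to try other values and see how the components change.

```r
library(fpp3)

# One series only (assumption): holiday trips to the Snowy Mountains
snowy_holiday <- tourism |>
  filter(Region == "Snowy Mountains", Purpose == "Holiday")

# STL decomposition; smaller windows let the component change more quickly
dcmp <- snowy_holiday |>
  model(stl = STL(Trips ~ trend(window = 11) + season(window = 7))) |>
  components()

# Plot of the data, trend, seasonal and remainder components
p_components <- autoplot(dcmp)

# Original series in grey with the seasonally adjusted series overlaid
p_adjusted <- dcmp |>
  as_tsibble() |>
  autoplot(Trips, colour = "grey") +
  geom_line(aes(y = season_adjust), colour = "#0072B2")

p_components
p_adjusted
```

The `season_adjust` column returned by `components()` is the original series with the seasonal component removed.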

Lab Session 4

  • Find the most seasonal time series in the tourism data.
  • Which state has the strongest trends?
  • Use a feature-based approach to look for outlying series in tourism.
  • What is unusual about the series you identify as outliers?
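One way to start on these questions is sketched below, using the STL-based features from feasts. Several choices here are assumptions made for illustration rather than the workshop's own method: averaging trend strength within each state, running principal components on the STL features only, and flagging the four series furthest from the origin in the first two components.

```r
library(fpp3)

# One row of STL-based features per series
tourism_features <- tourism |>
  features(Trips, feat_stl)

# Series with the largest seasonal strength
most_seasonal <- tourism_features |>
  slice_max(seasonal_strength_year, n = 1, with_ties = FALSE)

# Average trend strength by state (one possible reading of the question)
state_trend <- tourism_features |>
  group_by(State) |>
  summarise(trend_strength = mean(trend_strength)) |>
  arrange(desc(trend_strength))

# Principal components of the scaled numeric features
pcs <- tourism_features |>
  select(where(is.numeric)) |>
  prcomp(scale. = TRUE)

# Flag the series furthest from the centre of the first two components
outliers <- tourism_features |>
  bind_cols(as_tibble(pcs$x[, 1:2])) |>
  mutate(dist = sqrt(PC1^2 + PC2^2)) |>
  slice_max(dist, n = 4, with_ties = FALSE)

most_seasonal
state_trend
outliers
```

Plotting the flagged series with `autoplot()` is a natural next step for judging what makes them unusual.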