Exploratory time series analysis using R
Three-hour workshop at WOMBAT 2022, 6 December 2022
Course description
Many organisations collect huge amounts of data over time, and we need time series analysis tools capable of handling the scale, frequency and structure of the data collected. In this workshop, we will look at some R packages and methods that have been developed to handle the analysis of large collections of time series. We will look at the tsibble data structure for flexibly managing collections of related time series, and consider how to do data wrangling, data visualisation, and exploratory data analysis to analyse time series data in high dimensions.
- Session 1: How to wrangle time series data with familiar tidy tools.
- Session 2: How to visualize the trend and seasonal patterns in individual time series.
- Session 3: How to compute time series features and visualize large collections of time series.
Primary packages will be tsibble, lubridate and feasts (along with the tidyverse of course).
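As a minimal sketch of what a tsibble looks like, the following builds one from a small made-up data frame. The `sales` object and its `Month`, `Store` and `Sales` columns are hypothetical and used only for illustration; the index identifies the time variable and the key identifies each series in the collection.

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: two stores observed over three months
sales <- tibble(
  Month = yearmonth(rep(c("2022 Jan", "2022 Feb", "2022 Mar"), times = 2)),
  Store = rep(c("A", "B"), each = 3),
  Sales = c(10, 12, 9, 20, 18, 25)
) |>
  # index = time variable, key = variable(s) identifying each series
  as_tsibble(index = Month, key = Store)

sales
```

Because a tsibble is still a data frame, dplyr verbs such as `filter()` and `mutate()` work on it directly.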
Prework
Attendees are expected to be familiar with R, and with the tidyverse collection of packages including dplyr and ggplot2. They will need to have R and RStudio installed on their own device, and have installed the fpp3 package.
People who don’t use R regularly, or don’t know the tidyverse packages, should work through the tutorials at learnr.numbat.space beforehand.
Please ensure your computer has a recent version of R and RStudio installed. The following code will install the main packages needed for the workshop.
install.packages(c("tidyverse","fpp3","GGally"))
Slides
Lab Sessions
Lab Session 1
- Download tourism.xlsx from http://robjhyndman.com/data/tourism.xlsx, and read it into R using read_excel() from the readxl package.
- Create a tsibble which is identical to the tourism tsibble from the tsibble package.
- Find what combination of Region and Purpose had the maximum number of overnight trips on average.
- Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
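One possible way to approach these steps is sketched below. It is not the official solution. So that the sketch runs without a network connection, the spreadsheet import is shown only as comments, and a plain tibble derived from the built-in `tourism` data stands in for what `read_excel()` is assumed to return (a `Quarter` column holding dates, plus `Region`, `State`, `Purpose` and `Trips`). The object names `raw`, `my_tourism`, `top_combo` and `state_tourism` are illustrative.

```r
library(fpp3)

# With a network connection, the import would look something like this:
# tourism_file <- tempfile(fileext = ".xlsx")
# download.file("http://robjhyndman.com/data/tourism.xlsx", tourism_file, mode = "wb")
# raw <- readxl::read_excel(tourism_file)

# Stand-in (assumption): a plain tibble with Quarter stored as a date
raw <- tourism |>
  as_tibble() |>
  mutate(Quarter = as.Date(Quarter))

# Rebuild the tsibble: convert the index to quarterly, then declare index and key
my_tourism <- raw |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Region/Purpose combination with the highest average number of trips
top_combo <- my_tourism |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips), .groups = "drop") |>
  filter(Trips == max(Trips))

# Total trips by State; summarise() on a tsibble keeps the time index
state_tourism <- my_tourism |>
  group_by(State) |>
  summarise(Trips = sum(Trips))
```

Dropping to a tibble with `as_tibble()` before averaging matters here: on a tsibble, `summarise()` always aggregates within each time period rather than across time.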
Lab Session 2
Look at the quarterly tourism data for the Snowy Mountains
snowy <- tourism |> filter(Region == "Snowy Mountains")
- Use autoplot(), gg_season() and gg_subseries() to explore the data.
- What do you learn?
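The three plotting calls might be sketched as follows, assuming the fpp3 package is loaded and `Trips` is the variable of interest. Depending on the installed version of feasts, `gg_season()` and `gg_subseries()` may print a deprecation notice; that is an assumption about newer releases, not something stated in the workshop material.

```r
library(fpp3)

snowy <- tourism |> filter(Region == "Snowy Mountains")

# Time plot: one line per Purpose
snowy |> autoplot(Trips)

# Seasonal plot: each year overlaid against the quarter of the year
snowy |> gg_season(Trips)

# Subseries plot: one panel per quarter, showing how each quarter changes over the years
snowy |> gg_subseries(Trips)
```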
Lab Session 3
- Produce an STL decomposition of the Snowy Mountains data.
- Experiment with different values of the two window arguments.
- Plot the seasonally adjusted series.
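A minimal sketch of these steps is given below. The window values 21 and 13 are arbitrary starting points chosen for illustration, not recommendations from the workshop; the point of the exercise is to vary them and see how the components change.

```r
library(fpp3)

snowy <- tourism |> filter(Region == "Snowy Mountains")

# Fit STL with explicit trend and seasonal windows (illustrative values)
dcmp <- snowy |>
  model(stl = STL(Trips ~ trend(window = 21) + season(window = 13))) |>
  components()

# Plot all components: data, trend, seasonal and remainder
dcmp |> autoplot()

# Plot only the seasonally adjusted series, one line per Purpose
dcmp |>
  as_tsibble() |>
  autoplot(season_adjust)
```

Smaller window values let the trend or seasonal component change more quickly; `season(window = "periodic")` forces the seasonal pattern to be identical across years.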
Lab Session 4
- Find the most seasonal time series in the tourism data.
- Which state has the strongest trends?
- Use a feature-based approach to look for outlying series in tourism.
- What is unusual about the series you identify as outliers?
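One way to approach these questions is sketched below, under several assumptions. It uses the STL-based and autocorrelation features from feasts (`feat_stl` and `feat_acf`) rather than the full feature set. It also flags outliers with a simple heuristic: distance from the origin in the first two principal components. That heuristic and the object names are illustrative choices, not the workshop's prescribed method.

```r
library(fpp3)

# STL-based and ACF-based features for every series in tourism
tourism_features <- tourism |>
  features(Trips, list(feat_stl, feat_acf))

# Most seasonal series: largest seasonal strength
most_seasonal <- tourism_features |>
  slice_max(seasonal_strength_year, n = 1)

# Trend strength by state: aggregate to state totals first
state_trend <- tourism |>
  group_by(State) |>
  summarise(Trips = sum(Trips)) |>
  features(Trips, feat_stl) |>
  select(State, trend_strength) |>
  arrange(desc(trend_strength))

# Principal components of the scaled feature matrix
pcs <- tourism_features |>
  select(-Region, -State, -Purpose) |>
  prcomp(scale. = TRUE)

# Heuristic (assumption): series far from the centre of PC1/PC2 are candidates
scores <- tourism_features |>
  select(Region, State, Purpose) |>
  mutate(
    PC1 = pcs$x[, 1],
    PC2 = pcs$x[, 2],
    dist = sqrt(PC1^2 + PC2^2)
  )
outliers <- scores |> slice_max(dist, n = 4)

# Inspect the flagged series to see what makes them unusual
tourism |>
  semi_join(outliers, by = c("Region", "State", "Purpose")) |>
  autoplot(Trips) +
  facet_wrap(vars(Region, Purpose), scales = "free_y") +
  theme(legend.position = "none")
```

Plotting the flagged series alongside their feature values is usually the quickest way to see why they stand out, for example unusually strong seasonality or an abrupt change in level.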