High-dimensional time series analysis
Venue
Presenters
Course description
It is becoming increasingly common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always suitable to handle the scale and type of data collected. In this workshop, we will look at some new methods that have been developed to handle the analysis of large collections of time series.
We will use feature-based and interactive visualizations to explore time series data in high dimensions. A similar feature-based approach can be used to identify anomalous time series within a collection. Finally, we will discuss how fast automatic forecasting algorithms, along with sparse forecast reconciliation, can allow millions of time series to be forecast in a relatively short time.
Program
(approximate times)
- 09:00. Welcome
- 09:15. 1. Tidy time series analysis using tsibbles [Slides]
- 10:15. 2. Visualization of high-dimensional time series [Slides]
- 11:30. 3. A feature-based approach to time series analysis [Slides]
- 12:30. Lunch
- 13:30. 4. Automatic forecasting algorithms [Slides]
- 15:30. 5. Optimal forecast reconciliation [Slides]
- 16:45. Conclusion
Prerequisites
Participants should be familiar with the use of R, at least to the point where they can fit a linear regression model, and work with data frames.
Please bring your own laptop with a recent version of R and RStudio installed. The following code will install the main packages needed for the workshop.
install.packages("tidyverse")
install.packages("tsibble")
install.packages("tsibbledata")
install.packages("fabletools")
install.packages("feasts", repos = "https://tidyverts.org")
install.packages("fable", repos = "https://tidyverts.org")
Lab Sessions
Lab Session 1
- Download tourism.xlsx from http://robjhyndman.com/data/tourism.xlsx, and read it into R using read_excel() from the readxl package.
- Create a tsibble which is identical to the tourism tsibble from the tsibble package.
- Find what combination of Region and Purpose had the maximum number of overnight trips on average.
- Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
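One way to approach these steps is sketched below. To stay self-contained, the sketch starts from the tourism data shipped with tsibble, converted to a plain tibble, as a stand-in for the spreadsheet. With the downloaded file you would start from readxl::read_excel("tourism.xlsx") instead, and may need to convert the Quarter column with yearquarter() first (an assumption about how the spreadsheet stores dates). All object names are illustrative.

```r
library(dplyr)
library(tsibble)

# Stand-in for the spreadsheet: the packaged data as an ordinary tibble.
# With the real file: tourism_tbl <- readxl::read_excel("tourism.xlsx")
tourism_tbl <- as_tibble(tsibble::tourism)

# Rebuild the tsibble: Quarter is the index; Region, State and Purpose are keys.
my_tourism <- tourism_tbl %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Region/Purpose combination with the highest average number of trips.
top_combo <- tourism_tbl %>%
  group_by(Region, Purpose) %>%
  summarise(mean_trips = mean(Trips), .groups = "drop") %>%
  slice_max(mean_trips, n = 1)

# Total trips by State: summarise() on a tsibble keeps the time index,
# so this sums over Regions and Purposes within each quarter.
state_trips <- my_tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))
```

Working from the tibble for the average avoids the per-quarter grouping that a tsibble applies automatically in summarise().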
Lab Session 2
- Create time plots of the following time series: Beer from aus_production, Lynx from pelt, Close from gafa_stock.
- Use help() to find out about the data in each series.
- For the last plot, modify the axis labels and title.
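A minimal sketch of the three plots follows, assuming the tsibble autoplot() method provided by fabletools is available. The label text is illustrative only; check help("gafa_stock") for the actual units before settling on axis labels.

```r
library(ggplot2)
library(tsibble)
library(tsibbledata)
library(fabletools)

# help("aus_production"); help("pelt"); help("gafa_stock")

# One time plot per series; the second argument names the variable to plot.
p_beer <- autoplot(aus_production, Beer)
p_lynx <- autoplot(pelt, Lynx)

# The returned object is a ggplot, so labels can be changed with labs().
p_close <- autoplot(gafa_stock, Close) +
  labs(
    x = "Trading day",
    y = "Closing price",
    title = "Closing stock prices"
  )
```

Print any of the objects (for example p_close) to draw the plot.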
Lab Session 3
Look at the quarterly tourism data for the Snowy Mountains:
snowy <- filter(tourism, Region == "Snowy Mountains")
- Use autoplot(), gg_season() and gg_subseries() to explore the data.
- What do you learn?
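The three plotting calls can be sketched as follows, assuming a feasts release that exports gg_season() and gg_subseries(). Each plot shows one line or panel per Purpose, since snowy still contains four series.

```r
library(dplyr)
library(ggplot2)
library(tsibble)
library(fabletools)
library(feasts)

snowy <- filter(tourism, Region == "Snowy Mountains")

# Time plot: trips over time, one line per Purpose.
p_time <- autoplot(snowy, Trips)

# Seasonal plot: each year overlaid against the quarter of the year.
p_season <- gg_season(snowy, Trips)

# Subseries plot: one mini time series per quarter, with its mean marked.
p_sub <- gg_subseries(snowy, Trips)
```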
Lab Session 4
Repeat the decomposition using
holidays %>%
  STL(Trips ~ season(window=7) + trend(window=11)) %>%
  autoplot()
What happens as you change season(window = ???) and trend(window = ???)?
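In more recent releases of feasts, STL() is specified inside model() and the components are extracted with components(). A sketch of the equivalent call is given below. The definition of holidays is an assumption here (holiday trips summed by State), since it is not defined on this page.

```r
library(dplyr)
library(ggplot2)
library(tsibble)
library(fabletools)
library(feasts)

# Assumed definition of `holidays`: total holiday trips for each State.
holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

# Fit an STL decomposition to each State, then extract the components.
decomp <- holidays %>%
  model(stl = STL(Trips ~ season(window = 7) + trend(window = 11))) %>%
  components()

p_decomp <- autoplot(decomp)
```

Smaller window values let the corresponding component change more quickly over time; larger values make it smoother.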
Lab Session 5
- Use GGally::ggpairs() to look at the relationships between the STL-based features. You might wish to change seasonal_peak_year and seasonal_trough_year to factors.
- Which is the peak quarter for holidays in each state?
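A sketch of both steps, assuming the STL-based features are computed with feat_stl() from feasts on the tourism data and that the GGally package is installed. The holiday series are again assumed to be holiday trips summed by State.

```r
library(dplyr)
library(tsibble)
library(fabletools)
library(feasts)

# STL-based features for every series in the tourism data.
tourism_feats <- tourism %>%
  features(Trips, feat_stl)

# Pairwise plots of the features, with the two timing features as factors.
pairs_plot <- tourism_feats %>%
  select(-Region, -State, -Purpose) %>%
  mutate(
    seasonal_peak_year = factor(seasonal_peak_year),
    seasonal_trough_year = factor(seasonal_trough_year)
  ) %>%
  GGally::ggpairs()

# Timing of the seasonal peak for holiday travel in each State.
holiday_peaks <- tourism %>%
  filter(Purpose == "Holiday") %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips)) %>%
  features(Trips, feat_stl) %>%
  select(State, seasonal_peak_year)
```

How the values of seasonal_peak_year map to calendar quarters is documented in the feat_stl() help page.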
Lab Session 6
- Use a feature-based approach to look for outlying series in PBS.
- What is unusual about these series?
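One possible feature-based approach is sketched here: compute features for each series, reduce them with principal components, and flag the series that lie furthest from the centre of the first two components. This is a simple heuristic chosen for illustration and is not necessarily the method used in the workshop. The choice of Cost as the variable, feat_stl() as the feature set, and five flagged series are all assumptions.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)
library(fabletools)
library(feasts)

# STL-based features of Cost for every series in PBS.
pbs_feats <- PBS %>%
  features(Cost, feat_stl)

# Keep only the series whose features are all finite.
x <- pbs_feats %>%
  select(-Concession, -Type, -ATC1, -ATC2) %>%
  as.matrix()
ok <- apply(is.finite(x), 1, all)

# Principal components of the standardised features.
pca <- prcomp(x[ok, ], scale. = TRUE)

# Squared distance from the centre in the (standardised) first two components.
dist2 <- rowSums(scale(pca$x[, 1:2])^2)

# The five most distant series are candidate outliers.
outliers <- pbs_feats[ok, ] %>%
  mutate(dist = dist2) %>%
  slice_max(dist, n = 5)
```

Plotting the flagged series with autoplot() is a natural next step for seeing what makes them unusual.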
Lab Session 7
Find an ETS model for the Gas data from aus_production.
- Why is multiplicative seasonality necessary here?
- Experiment with making the trend damped.
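A sketch of fitting an automatically selected ETS model alongside one with a damped trend, using the fable interface. The damped specification (multiplicative errors and seasonality with an additive damped trend) is one illustrative choice, not the only candidate.

```r
library(dplyr)
library(tsibbledata)
library(fabletools)
library(fable)

fit <- aus_production %>%
  model(
    # Let ETS() choose the error, trend and seasonal components.
    auto = ETS(Gas),
    # Force a damped additive trend with multiplicative error and seasonality.
    damped = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )

# One row of fit statistics (including AICc) per model.
fit_stats <- glance(fit)

# Three years of quarterly forecasts from each model.
fc <- forecast(fit, h = "3 years")
```

Comparing autoplot(fc, aus_production) for the two models shows how damping flattens the long-run trend.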
Lab Session 8
For the United States GDP data (from global_economy):
- Fit a suitable ARIMA model for the logged data.
- Produce forecasts of your fitted model. Do the forecasts look reasonable?
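A minimal sketch using automatic order selection with ARIMA() from fable. Writing the log transformation inside the model formula means the forecasts are back-transformed to the original scale. The ten-year horizon is an arbitrary choice, and the automatic selection of differencing may need additional packages (feasts and possibly urca) to be installed.

```r
library(dplyr)
library(ggplot2)
library(tsibbledata)
library(fabletools)
library(fable)

us_gdp <- global_economy %>%
  filter(Country == "United States")

# Automatic ARIMA selection on the logged series.
fit <- us_gdp %>%
  model(arima = ARIMA(log(GDP)))

# Ten annual forecasts, returned on the original (unlogged) scale.
fc <- forecast(fit, h = 10)

# Forecasts plotted against the historical data.
p_gdp <- autoplot(fc, us_gdp)
```

Calling report(fit) prints the selected model orders and coefficient estimates.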
Lab Session 9
For the Australian tourism data (from tourism):
- Fit suitable ARIMA models for all series.
- Produce forecasts of your fitted models.
- Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?
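Because model() fits one model per key combination, the same call scales from a few series to all of them. The sketch below restricts the data to the two regions named above so that it runs quickly; dropping the filter() step fits every series, which takes noticeably longer. The two-year horizon is an assumption.

```r
library(dplyr)
library(ggplot2)
library(tsibble)
library(fabletools)
library(fable)

# Subset used only to keep the example fast.
two_regions <- tourism %>%
  filter(Region %in% c("Snowy Mountains", "Melbourne"))

# One automatically selected ARIMA model per Region/State/Purpose series.
fit <- two_regions %>%
  model(arima = ARIMA(Trips))

# Two years of quarterly forecasts for every fitted model.
fc <- forecast(fit, h = "2 years")

# Forecasts plotted against the historical data.
p_tourism <- autoplot(fc, two_regions)
```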
Lab Session 10
- Prepare aggregations of the PBS data by Concession, Type, and ATC1.
- Use forecast reconciliation with the PBS data, using ETS, ARIMA and SNAIVE models, applied to all but the last 3 years of data.
- Which type of model works best?
- Does the reconciliation improve the forecast accuracy?
- Why doesn’t the reconciliation make any difference to the SNAIVE forecasts?
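The workflow for this session can be sketched with aggregate_key(), reconcile() and min_trace() from fabletools. Several choices below are assumptions: Cost as the variable (rescaled to millions), June 2005 as the end of the training data (three years before the data end), min_trace() with its default settings as the reconciliation method, and MASE as the accuracy measure. Fitting all three model types to every aggregated series can take a few minutes.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)
library(fabletools)
library(fable)

# Aggregate Cost over every combination of Concession, Type and ATC1.
PBS_agg <- PBS %>%
  aggregate_key(Concession * Type * ATC1, Cost = sum(Cost) / 1e6)

# Fit three model types to all but the last three years of data.
fit <- PBS_agg %>%
  filter(Month <= yearmonth("2005 Jun")) %>%
  model(
    ets = ETS(Cost),
    arima = ARIMA(Cost),
    snaive = SNAIVE(Cost)
  )

# Add reconciled versions of each model, then forecast three years ahead.
fc <- fit %>%
  reconcile(
    ets_adj = min_trace(ets),
    arima_adj = min_trace(arima),
    snaive_adj = min_trace(snaive)
  ) %>%
  forecast(h = "3 years")

# Average test-set accuracy for each model across all series.
acc <- accuracy(fc, PBS_agg) %>%
  group_by(.model) %>%
  summarise(MASE = mean(MASE))
```

Comparing each base model with its reconciled counterpart in acc addresses the accuracy questions above.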