It is becoming increasingly common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always suitable to handle the scale and type of data collected. In this workshop, we will look at some new methods that have been developed to handle the analysis of large collections of time series.
We will look at feature-based and interactive visualizations for exploring high-dimensional time series data. A similar feature-based approach can be used to identify anomalous series within a collection. Finally, we will discuss how fast automatic forecasting algorithms, combined with sparse forecast reconciliation, make it possible to forecast millions of time series in a relatively short time.
- 09:00. Welcome
- 09:15. 1. Tidy time series analysis using tsibbles [Slides]
- 10:15. 2. Visualization of high-dimensional time series [Slides]
- 11:30. 3. A feature-based approach to time series analysis [Slides]
- 12:30. Lunch
- 13:30. 4. Automatic forecasting algorithms [Slides]
- 15:30. 5. Optimal forecast reconciliation [Slides]
- 16:45. Conclusion
Participants should be familiar with the use of R, at least to the point where they can fit a linear regression model, and work with data frames.
Please bring your own laptop with a recent version of R and RStudio installed. The following code will install the main packages needed for the workshop.
```r
install.packages("tidyverse")
install.packages("tsibble")
install.packages("tsibbledata")
install.packages("fabletools")
install.packages("feasts", repos = "https://tidyverts.org")
install.packages("fable", repos = "https://tidyverts.org")
```
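As a quick check that the installation succeeded, loading the packages should produce no errors:

```r
# If any of these fail, re-run the corresponding install.packages() call above
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(fabletools)
library(feasts)
library(fable)
```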
Lab Session 1
Download the data from http://robjhyndman.com/data/tourism.xlsx, and read it into R using `readxl::read_excel()`.

- Create a tsibble which is identical to the `tourism` tsibble from the tsibble package.
- Find what combination of Region and Purpose had the maximum number of overnight trips on average.
- Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
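The tasks above could be sketched as follows. This assumes the spreadsheet has columns named Quarter, Region, State, Purpose and Trips, matching the `tourism` tsibble:

```r
library(dplyr)
library(tsibble)
library(readxl)

# Read the spreadsheet (assumes tourism.xlsx is in the working directory)
tourism_xl <- read_excel("tourism.xlsx")

# Build a tsibble: Quarter is the time index;
# Region, State and Purpose identify each series
mytourism <- tourism_xl %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Average trips per series; which combination is largest?
mytourism %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips), .groups = "drop") %>%
  slice_max(avg_trips, n = 1)

# Total trips by State (summing over Regions and Purposes);
# grouped summaries of a tsibble keep the time index
state_tourism <- mytourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))
```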
Lab Session 2
- Create time plots of the following time series, using `help()` to find out about the data in each series.
- For the last plot, modify the axis labels and title.
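A time plot with modified labels might look like the sketch below; the `Gas` series from `aus_production` is used purely as an illustration, since which series the lab intends is not stated here:

```r
library(tsibble)
library(tsibbledata)
library(feasts)
library(ggplot2)

# Time plot of one series, with custom axis labels and title
aus_production %>%
  autoplot(Gas) +
  labs(
    x = "Year",
    y = "Gas production (petajoules)",
    title = "Australian quarterly gas production"
  )
```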
Lab Session 3
Look at the quarterly tourism data for the Snowy Mountains:

```r
snowy <- filter(tourism, Region == "Snowy Mountains")
```

- Use `gg_subseries()` to explore the data.
- What do you learn?
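A minimal sketch of the exploration, assuming the `tourism` tsibble from the tsibble package:

```r
library(dplyr)
library(tsibble)
library(feasts)

snowy <- filter(tourism, Region == "Snowy Mountains")

# Seasonal subseries plot: one panel per quarter, one facet per
# remaining key (here, Purpose), making seasonal patterns easy to compare
snowy %>% gg_subseries(Trips)
```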
Lab Session 4
Repeat the decomposition using:

```r
holidays %>%
  STL(Trips ~ season(window = 7) + trend(window = 11)) %>%
  autoplot()
```

What happens as you change `season(window = ???)` and `trend(window = ???)`?
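In current versions of feasts, `STL()` is fitted through `model()` and the decomposition extracted with `components()`. A sketch of the experiment in that style, assuming `holidays` is total holiday trips by State (the definition is not shown in this section):

```r
library(dplyr)
library(tsibble)
library(fabletools)
library(feasts)

# Assumed definition: total holiday trips by State
holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

# Smaller windows follow the data more closely; larger windows
# give smoother trend and seasonal components
holidays %>%
  model(STL(Trips ~ season(window = 7) + trend(window = 11))) %>%
  components() %>%
  autoplot()
```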
Lab Session 5
- Use `GGally::ggpairs()` to look at the relationships between the STL-based features. You might wish to change some of the features (such as the seasonal peak and trough quarters) to factors first.
- Which is the peak quarter for holidays in each state?
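A sketch of the feature computation, assuming the feature names produced by `feat_stl` include `seasonal_peak_year` and `seasonal_trough_year` (quarter of the seasonal peak/trough):

```r
library(dplyr)
library(tsibble)
library(feasts)

# STL-based features for every series in the tourism tsibble
tourism_features <- tourism %>%
  features(Trips, feat_stl)

# Pairwise plots of the features; peak/trough quarters are
# converted to factors so they plot as discrete values
tourism_features %>%
  select(-Region, -State, -Purpose) %>%
  mutate(
    seasonal_peak_year = factor(seasonal_peak_year),
    seasonal_trough_year = factor(seasonal_trough_year)
  ) %>%
  GGally::ggpairs()

# Tabulate the peak quarter of holiday series by State
tourism_features %>%
  filter(Purpose == "Holiday") %>%
  count(State, seasonal_peak_year)
```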
Lab Session 6
- Use a feature-based approach to look for outlying series.
- What is unusual about these series?
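One common feature-based approach is to project a broad feature set onto its first two principal components and look for series far from the rest. A sketch using the `tourism` data as an example:

```r
library(dplyr)
library(tsibble)
library(feasts)
library(broom)   # augment() method for prcomp objects

# Compute all features provided by the feasts package
tourism_features <- tourism %>%
  features(Trips, feature_set(pkgs = "feasts"))

# Principal components on the (scaled) numeric features
pcs <- tourism_features %>%
  select(where(is.numeric)) %>%
  prcomp(scale = TRUE) %>%
  augment(tourism_features)

# Series furthest from the origin in PC space are candidate outliers
pcs %>%
  mutate(dist = sqrt(.fittedPC1^2 + .fittedPC2^2)) %>%
  arrange(desc(dist)) %>%
  select(Region, State, Purpose, dist) %>%
  head()
```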
Lab Session 7
Find an ETS model for the Gas data from `aus_production`.
- Why is multiplicative seasonality necessary here?
- Experiment with making the trend damped.
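A sketch of both model fits, with an automatically selected ETS model alongside one where the trend is forced to be damped:

```r
library(dplyr)
library(tsibble)
library(fable)
library(tsibbledata)

fit <- aus_production %>%
  model(
    auto   = ETS(Gas),                # automatically selected ETS model
    damped = ETS(Gas ~ trend("Ad"))   # damped additive trend for comparison
  )

# Inspect the automatically chosen model
fit %>% select(auto) %>% report()

# Compare forecasts from the two models
fit %>%
  forecast(h = "3 years") %>%
  autoplot(aus_production)
```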
Lab Session 8
For the United States GDP data (from `global_economy`):
- Fit a suitable ARIMA model for the logged data.
- Produce forecasts of your fitted model. Do the forecasts look reasonable?
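A minimal sketch, assuming the annual `global_economy` tsibble from tsibbledata with a `GDP` column:

```r
library(dplyr)
library(tsibble)
library(fable)
library(tsibbledata)

us_gdp <- global_economy %>%
  filter(Country == "United States")

# ARIMA on the logged series; the log stabilizes the growth in variance
fit <- us_gdp %>%
  model(ARIMA(log(GDP)))

report(fit)

# Forecasts are automatically back-transformed to the original scale
fit %>%
  forecast(h = "10 years") %>%
  autoplot(us_gdp)
```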
Lab Session 9
For the Australian tourism data (from `tourism`):
- Fit a suitable ARIMA model for all data.
- Produce forecasts of your fitted models.
- Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?
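Fitting one model per series can be sketched as follows (fitting an ARIMA model to every Region/State/Purpose combination may take a few minutes):

```r
library(dplyr)
library(tsibble)
library(fable)

# One ARIMA model per key combination in the tourism tsibble
fit <- tourism %>%
  model(arima = ARIMA(Trips))

fc <- fit %>% forecast(h = "2 years")

# Inspect the forecasts for particular regions
fc %>%
  filter(Region %in% c("Snowy Mountains", "Melbourne")) %>%
  autoplot(tourism)
```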
Lab Session 10
- Prepare aggregations of the PBS data by Concession, Type, and ATC1.
- Use forecast reconciliation with the PBS data, using ETS, ARIMA and SNAIVE models, applied to all but the last 3 years of data.
- Which type of model works best?
- Does the reconciliation improve the forecast accuracy?
- Why doesn’t the reconciliation make any difference to the SNAIVE forecasts?
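The steps above could be sketched as follows, assuming the monthly `PBS` tsibble from tsibbledata with a `Scripts` measure; MinT reconciliation (`min_trace()`) is used as the reconciliation method:

```r
library(dplyr)
library(tsibble)
library(fable)
library(tsibbledata)

# All aggregates over the crossed Concession / Type / ATC1 attributes
pbs_agg <- PBS %>%
  aggregate_key(Concession * Type * ATC1, Scripts = sum(Scripts))

# Train on all but the last 3 years (36 months)
train <- pbs_agg %>%
  filter(Month <= max(Month) - 36)

fit <- train %>%
  model(
    ets    = ETS(Scripts),
    arima  = ARIMA(Scripts),   # the slowest of the three to fit
    snaive = SNAIVE(Scripts)
  ) %>%
  reconcile(
    ets_adj    = min_trace(ets),
    arima_adj  = min_trace(arima),
    snaive_adj = min_trace(snaive)
  )

fc <- fit %>% forecast(h = "3 years")

# Average accuracy of base vs reconciled forecasts on the held-out years
fc %>%
  accuracy(pbs_agg) %>%
  group_by(.model) %>%
  summarise(rmse = mean(RMSE))
```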