- Google Scholar
- Semantic Scholar

- Quarto
- Quarto thesis template or use the monash package.

It is becoming increasingly common for organizations to collect large numbers of related time series, and existing time series analysis tools are not always suitable to handle the scale, frequency and structure of the data collected. We will introduce the R packages tsibble, feasts and fable, designed to work with the tidyverse to flexibly manage and analyse collections of related time series. We will look at how to do data wrangling, data visualizations and exploratory data analysis, and we will show how some classical time series models can be applied using the fable package.

Many organisations collect huge amounts of data over time, and we need time series analysis tools capable of handling the scale, frequency and structure of the data collected. In this workshop, we will look at some R packages and methods that have been developed to handle the analysis of large collections of time series. We will look at the tsibble data structure for flexibly managing collections of related time series, and consider how to do data wrangling, data visualisation, and exploratory data analysis to analyse time series data in high dimensions.

**Session 1**: How to wrangle time series data with familiar tidy tools.

**Session 2**: How to visualize the trend and seasonal patterns in individual time series.

**Session 3**: How to compute time series features and visualize large collections of time series.

Primary packages will be tsibble, lubridate and feasts (along with the tidyverse of course).

Attendees are expected to be familiar with R, and with the tidyverse collection of packages including dplyr and ggplot2. They will need to have R and RStudio installed on their own device, and have installed the fpp3 package.

People who don’t use R regularly, or don’t know the tidyverse packages, are recommended to do the tutorials at learnr.numbat.space beforehand.

Please ensure your computer has a recent version of R and RStudio installed. The following code will install the main packages needed for the workshop.

`install.packages(c("tidyverse","fpp3","GGally"))`

- Download `tourism.xlsx` from `http://robjhyndman.com/data/tourism.xlsx`, and read it into R using `read_excel()` from the `readxl` package.
- Create a tsibble which is identical to the `tourism` tsibble from the `tsibble` package.
- Find what combination of `Region` and `Purpose` had the maximum number of overnight trips on average.
- Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
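One possible solution sketch, assuming (as in the fpp3 book) that the spreadsheet has columns `Quarter`, `Region`, `State`, `Purpose` and `Trips`:

```r
library(fpp3)
library(readxl)

# Read the spreadsheet and convert the Quarter column to a yearquarter index
my_tourism <- read_excel("tourism.xlsx") |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Average overnight trips by Region and Purpose
my_tourism |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips), .groups = "drop") |>
  slice_max(Trips, n = 1)
```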

Look at the quarterly tourism data for the Snowy Mountains

`snowy <- tourism |> filter(Region == "Snowy Mountains")`

- Use `autoplot()`, `gg_season()` and `gg_subseries()` to explore the data.
- What do you learn?

- Produce an STL decomposition of the Snowy Mountains data.
- Experiment with different values of the two `window` arguments.
- Plot the seasonally adjusted series.
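A minimal sketch, first aggregating over the four purposes of travel (the `window` values shown are arbitrary starting points to experiment with):

```r
snowy_total <- snowy |> summarise(Trips = sum(Trips))

dcmp <- snowy_total |>
  model(STL(Trips ~ season(window = 13) + trend(window = 21))) |>
  components()

autoplot(dcmp)                                    # full decomposition
dcmp |> as_tsibble() |> autoplot(season_adjust)   # seasonally adjusted series
```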

- Find the most seasonal time series in the tourism data.
- Which state has the strongest trends?
- Use a feature-based approach to look for outlying series in `tourism`.
- What is unusual about the series you identify as outliers?

Room 5.02, Marie Reay Teaching Building, Australian National University, Canberra.

It is becoming increasingly common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always suitable to handle the scale, frequency and structure of the data collected. In this workshop, we will look at some new packages and methods that have been developed to handle the analysis of large collections of time series.

On day 1, we will look at the tsibble data structure for flexibly managing collections of related time series. We will look at how to do data wrangling, data visualizations and exploratory data analysis. We will explore feature-based methods to explore time series data in high dimensions. A similar feature-based approach can be used to identify anomalous time series within a collection of time series, or to cluster or classify time series. Primary packages for day 1 will be tsibble, lubridate and feasts (along with the tidyverse of course).

Day 2 will be about forecasting. We will look at some classical time series models and how they are automated in the fable package. We will look at creating ensemble forecasts and hybrid forecasts, as well as some new forecasting methods that have performed well in large-scale forecasting competitions. Finally, we will look at forecast reconciliation, allowing millions of time series to be forecast in a relatively short time while accounting for constraints on how the series are related.

Attendees will learn:

- How to wrangle time series data with familiar tidy tools.
- How to compute time series features and visualize large collections of time series.
- How to select a good forecasting algorithm for your time series.
- How to ensure forecasts of a large collection of time series are coherent.

This course will be appropriate for you if you answer yes to these questions:

- Do you already use the tidyverse packages in R such as dplyr, tidyr, tibble and ggplot2?
- Do you need to analyse large collections of related time series?
- Would you like to learn how to use some tidy tools for time series analysis including visualization, decomposition and forecasting?

People who don’t use R regularly, or don’t know the tidyverse packages, are recommended to do the tutorials at learnr.numbat.space beforehand.

Please bring your own laptop with a recent version of R and RStudio installed. The following code will install the main packages needed for the workshop.

`install.packages(c("tidyverse","fpp3", "GGally", "sugrrants"))`

Time | Activity
---|---
09:00 - 10:30 | Session 1
10:30 - 11:00 | Coffee break
11:00 - 12:30 | Session 2
12:30 - 13:30 | Lunch break
13:30 - 15:00 | Session 3
15:00 - 15:30 | Coffee break
15:30 - 17:00 | Session 4

- Download `tourism.xlsx` from `http://robjhyndman.com/data/tourism.xlsx`, and read it into R using `read_excel()` from the `readxl` package.
- Create a tsibble which is identical to the `tourism` tsibble from the `tsibble` package.
- Find what combination of `Region` and `Purpose` had the maximum number of overnight trips on average.
- Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

- Create time plots of the following four time series: `Bricks` from `aus_production`, `Lynx` from `pelt`, `Close` from `gafa_stock`, and `Demand` from `vic_elec`.
- Use `help()` to find out about the data in each series.
- For the last plot, modify the axis labels and title.
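For the last plot, a sketch of modifying the labels with `labs()` (the title and axis text are assumptions; choose your own):

```r
vic_elec |>
  autoplot(Demand) +
  labs(
    x = "Time",
    y = "Electricity demand (MWh)",
    title = "Half-hourly electricity demand, Victoria"
  )
```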

Look at the quarterly tourism data for the Snowy Mountains

`snowy <- tourism |> filter(Region == "Snowy Mountains")`

- Use `autoplot()`, `gg_season()` and `gg_subseries()` to explore the data.
- What do you learn?

Produce a calendar plot for the `pedestrian` data from one location and one year.
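A sketch using the sugrrants package (the sensor name and year are assumptions; any sensor/year pair in `pedestrian` will do):

```r
library(sugrrants)

pedestrian |>
  filter(Sensor == "Southern Cross Station", year(Date) == 2016) |>
  frame_calendar(x = Time, y = Count, date = Date) |>
  ggplot(aes(x = .Time, y = .Count, group = Date)) +
  geom_line()
```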

We have introduced the following functions: `gg_lag` and `ACF`. Use these functions to explore the four time series: `Bricks` from `aus_production`, `Lynx` from `pelt`, the `Close` price of Amazon from `gafa_stock`, and `Demand` from `vic_elec`. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
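For example, for `Bricks` (dropping the missing values at the end of the series first; the plot choices are arbitrary):

```r
aus_production |>
  filter(!is.na(Bricks)) |>
  gg_lag(Bricks, geom = "point")   # scatterplots of Bricks against its lags

aus_production |>
  filter(!is.na(Bricks)) |>
  ACF(Bricks) |>
  autoplot()                       # autocorrelation function
```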

You can compute the daily changes in the Google stock price in 2018 using

```
dgoog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018) |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  mutate(diff = difference(Close))
```

Does `diff` look like white noise?
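One way to check, as a sketch (the lag choice is arbitrary):

```r
dgoog |> ACF(diff) |> autoplot()              # any spikes outside the bounds?
dgoog |> features(diff, ljung_box, lag = 10)  # Ljung-Box portmanteau test
```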

Consider the GDP information in `global_economy`. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
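A sketch of one approach:

```r
global_economy |>
  mutate(GDP_per_capita = GDP / Population) |>
  autoplot(GDP_per_capita) +
  theme(legend.position = "none")   # too many countries for a legend

# Highest GDP per capita in the most recent year
global_economy |>
  mutate(GDP_per_capita = GDP / Population) |>
  filter(Year == max(Year)) |>
  as_tibble() |>
  slice_max(GDP_per_capita, n = 1)
```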

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

- United States GDP from `global_economy`
- Slaughter of Victorian “Bulls, bullocks and steers” in `aus_livestock`
- Victorian Electricity Demand from `vic_elec`
- Gas production from `aus_production`

Why is a Box-Cox transformation unhelpful for the `canadian_gas` data?
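For example, for the Gas series, the Guerrero method suggests a value of lambda (a sketch):

```r
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |> autoplot(box_cox(Gas, lambda))
```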

Produce the following decomposition:

`canadian_gas |> STL(Volume ~ season(window = 7) + trend(window = 11)) |> autoplot()`

- What happens as you change the values of the two `window` arguments?
- How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using `gg_season`.]
- Can you produce a plausible seasonally adjusted series? [Hint: `season_adjust` is one of the variables returned by `STL`.]
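In current versions of feasts, the decomposition is wrapped in `model()`; a sketch of examining the seasonal component with `gg_season()`:

```r
canadian_gas |>
  model(STL(Volume ~ season(window = 7) + trend(window = 11))) |>
  components() |>
  gg_season(season_year)
```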

- Use `GGally::ggpairs()` to look at the relationships between the STL-based features. You might wish to change `seasonal_peak_year` and `seasonal_trough_year` to factors.
- Which is the peak quarter for holidays in each state?

- Use a feature-based approach to look for outlying series in `PBS`.
- What is unusual about the series you identify as “outliers”?

- Produce forecasts using an appropriate benchmark method for household wealth (`hh_budget`). Plot the results using `autoplot()`.
- Produce forecasts using an appropriate benchmark method for Australian takeaway food turnover (`aus_retail`). Plot the results using `autoplot()`.

- Compute seasonal naïve forecasts for quarterly Australian beer production from 1992.
- Test if the residuals are white noise. What do you conclude?
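A possible sketch (the Ljung-Box lag of 8 is a common choice for quarterly data, not the only one):

```r
recent_beer <- aus_production |>
  filter(year(Quarter) >= 1992)

fit <- recent_beer |> model(SNAIVE(Beer))

fit |> gg_tsresiduals()                               # residual diagnostics
augment(fit) |> features(.innov, ljung_box, lag = 8)  # portmanteau test
```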

- Create a training set for household wealth (`hh_budget`) by withholding the last four years as a test set.
- Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.
- Compute the accuracy of your forecasts. Which method does best?
- Repeat the exercise using the Australian takeaway food turnover data (`aus_retail`) with a test set of four years.
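A sketch for the household wealth data (assuming the annual index is `Year` and the measured variable is `Wealth`; the benchmark methods shown are the usual candidates for non-seasonal annual data):

```r
train <- hh_budget |> filter(Year <= max(Year) - 4)

fit <- train |>
  model(
    mean = MEAN(Wealth),
    naive = NAIVE(Wealth),
    drift = RW(Wealth ~ drift())
  )

fit |> forecast(h = 4) |> accuracy(hh_budget)
```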

Try forecasting the Chinese GDP from the `global_economy` data set using an ETS model.

Experiment with the various options in the `ETS()` function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use `h = 20` when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
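A sketch comparing a few model variants (the Box-Cox lambda of 0.2 is an arbitrary choice for illustration):

```r
china <- global_economy |> filter(Country == "China")

china |>
  model(
    ets = ETS(GDP),
    damped = ETS(GDP ~ trend("Ad")),
    boxcox = ETS(box_cox(GDP, 0.2))
  ) |>
  forecast(h = 20) |>
  autoplot(china, level = NULL)
```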

Find an ETS model for the Gas data from `aus_production` and forecast the next few years.

- Why is multiplicative seasonality necessary here?
- Experiment with making the trend damped. Does it improve the forecasts?

For the United States GDP data (from `global_economy`):

- Fit a suitable ARIMA model for the logged data.
- Produce forecasts of your fitted model. Do the forecasts look reasonable?
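A minimal sketch, letting `ARIMA()` select the order automatically on the logged data:

```r
us_economy <- global_economy |> filter(Code == "USA")

fit <- us_economy |> model(ARIMA(log(GDP)))
report(fit)

fit |> forecast(h = 10) |> autoplot(us_economy)
```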

For the Australian tourism data (from `tourism`):

- Fit a suitable ARIMA model for all data.
- Produce forecasts of your fitted models.
- Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?

Repeat the daily electricity example, but instead of using a quadratic function of temperature, use a piecewise linear function with the “knot” around 20 degrees Celsius (use predictors `Temperature` & `Temp2`). How can you optimize the choice of knot?

The data can be created as follows.

```
vic_elec_daily <- vic_elec |>
  filter(year(Time) == 2014) |>
  index_by(Date = date(Time)) |>
  summarise(
    Demand = sum(Demand) / 1e3,
    Temperature = max(Temperature),
    Holiday = any(Holiday)
  ) |>
  mutate(
    Temp2 = I(pmax(Temperature - 20, 0)),
    Day_Type = case_when(
      Holiday ~ "Holiday",
      wday(Date) %in% 2:6 ~ "Weekday",
      TRUE ~ "Weekend"
    )
  )
```
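A sketch of the piecewise fit using `TSLM()` (variable names follow the block above; the grid of candidate knots is an assumption, and minimising AICc is one of several reasonable criteria):

```r
fit <- vic_elec_daily |>
  model(TSLM(Demand ~ Temperature + Temp2 + Day_Type))
report(fit)

# One way to optimise the knot: refit over a grid of candidate knots
# and keep the value with the smallest AICc
knots <- 15:30
aiccs <- purrr::map_dbl(knots, function(k) {
  vic_elec_daily |>
    mutate(TempK = pmax(Temperature - k, 0)) |>
    model(TSLM(Demand ~ Temperature + TempK + Day_Type)) |>
    glance() |>
    pull(AICc)
})
knots[which.min(aiccs)]
```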

Repeat Lab Session 16 but using all available data, and handling the annual seasonality using Fourier terms.

- Prepare aggregations of the PBS data by Concession, Type, and ATC1.
- Use forecast reconciliation with the PBS data, using ETS, ARIMA and SNAIVE models, applied to all but the last 3 years of data.
- Which type of model works best?
- Does the reconciliation improve the forecast accuracy?
- Why doesn’t the reconciliation make any difference to the SNAIVE forecasts?
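A sketch of the aggregation and reconciliation steps (the model names, the MinT reconciliation choice, and holding out the last 36 months are assumptions):

```r
pbs_agg <- PBS |>
  aggregate_key(Concession * Type * ATC1, Cost = sum(Cost) / 1e6)

fit <- pbs_agg |>
  filter(Month <= max(Month) - 36) |>
  model(
    ets = ETS(Cost),
    arima = ARIMA(Cost),
    snaive = SNAIVE(Cost)
  ) |>
  reconcile(
    ets_mint = min_trace(ets),
    arima_mint = min_trace(arima),
    snaive_mint = min_trace(snaive)
  )

fit |> forecast(h = "3 years") |> accuracy(pbs_agg)
```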

People have been forecasting for thousands of years. They forecast whether it will rain tomorrow, how much wheat will be harvested, how long it will take for their dinner to cook, how many widgets their company will sell next month, what the unemployment figure will be in a year’s time, or how much superannuation they will have when they retire.

Some things are relatively easy to forecast and some are unpredictable. Why is it that we can accurately forecast a solar eclipse in 1000 years, but we have no idea whether Google’s stock price will rise or fall tomorrow? How can we forecast the daily electricity consumption for the next week with remarkable precision, but we cannot forecast daily COVID-19 cases with the same accuracy?

In his presentation, Professor Rob J Hyndman FAA FASSA will discuss the conditions we need for predictability, how to measure the uncertainty of our forecasts, and how to evaluate whether we are uncertain enough. Rob will draw on his 30 years of forecasting experience, including forecasting Australia’s health budget for the next few years, forecasting peak electricity demand in 20 years, and producing weekly forecasts of daily COVID-19 cases for all Australian states since March 2020.

He will also look at how forecasting has changed over the last 50 years, and what forecasting might look like in the future.

Time series data often contain a rich complexity of seasonal patterns. Time series that are observed at a sub-daily level can exhibit multiple seasonal patterns corresponding to different granularities. Seasonal granularities can be circular such as hour-of-the-day, day-of-the-week or month-of-the-year; or quasi-circular such as day-of-the-month. They can be nested (e.g., hour-of-the-day within day-of-the-week) and non-nested (e.g., day-of-the-year in both the Gregorian and Hijri calendars). They can also follow irregular topologies induced by public holidays and other aperiodic events. Available tools to visualize, model and forecast these seasonal patterns are currently very limited. I will discuss two new time series decomposition tools for handling seasonal data: MSTL and STR. These allow for multiple seasonal and cyclic components, covariates, seasonal patterns that may have non-integer periods, and seasonality with complex topology. They can be used for time series with any regular time index including hourly, daily, weekly, monthly or quarterly data, but tackle many more decomposition problems than other methods allow. I will also demonstrate some new tools to assist in visualizing seasonal patterns in time series, emphasising changes in the conditional distribution with respect to different time granularities. The granularities form categorical variables (ordered or unordered) which induce groupings of the observations. The resulting graphics are then displays of conditional distributions compared across combinations of these categorical variables. These are implemented in the gravitas package for R.

- Sayani Gupta, Rob J Hyndman, Dianne Cook and Antony Unwin (2022) Visualizing probability distributions across bivariate cyclic temporal granularities. *J Computational & Graphical Statistics*, **31**(1), 14-25. robjhyndman.com/publications/gravitas/
- Sayani Gupta, Rob J Hyndman and Dianne Cook (2021) Detecting distributional differences between temporal granularities for exploratory time series analysis. Working paper. robjhyndman.com/publications/hakear/
- Kasun Bandara, Rob J Hyndman and Christoph Bergmeir (2022) MSTL: A Seasonal-Trend Decomposition Algorithm for Time Series with Multiple Seasonal Patterns. *International J Operational Research*, to appear. robjhyndman.com/publications/mstl/
- Alex Dokumentov and Rob J Hyndman (2022) STR: Seasonal-Trend decomposition using Regression. *INFORMS Journal on Data Science*, **1**(1), 50-62. robjhyndman.com/publications/str/

*Also given online to ARLES (Ageing Risks and their Long-term impact on the Economy and Society). 24 Feb 2022*

I’ll describe how to forecast the old-age dependency ratio for Australia under various pension age proposals, and estimate a pension age scheme that will provide a stable old-age dependency ratio at a specified level. The approach involves a stochastic population forecasting method based on coherent functional data models for mortality, fertility and net migration, which is used to simulate the future age-structure of the population. The results suggest that the Australian pension age should be increased to 68 by 2030, 69 by 2036, and 70 by 2050, in order to maintain the old-age dependency ratio at 23%, just above the 2018 level. The general approach described can easily be extended to other target levels of the old-age dependency ratio and to other countries.

Time series data often contain a rich complexity of seasonal patterns. Time series that are observed at a sub-daily level can exhibit multiple seasonal patterns corresponding to different granularities such as hour-of-the-day, day-of-the-week or month-of-the-year. They can be nested (e.g., hour-of-the-day within day-of-the-week) and non-nested (e.g., day-of-the-year in both the Gregorian and Hijri calendars). We will discuss two new time series decomposition tools for handling seasonalities in time series data: MSTL and STR. These allow for multiple seasonal and cyclic components, covariates, seasonal patterns that may have non-integer periods, and seasonality with complex topology. They can be used for time series with any regular time index including hourly, daily, weekly, monthly or quarterly data, but tackle many more decomposition problems than other methods allow.

- Kasun Bandara, Rob J Hyndman and Christoph Bergmeir (2022) MSTL: A Seasonal-Trend Decomposition Algorithm for Time Series with Multiple Seasonal Patterns. *International J Operational Research*, to appear. robjhyndman.com/publications/mstl/
- Alex Dokumentov and Rob J Hyndman (2022) STR: Seasonal-Trend decomposition using Regression. *INFORMS Journal on Data Science*, to appear. robjhyndman.com/publications/str/

Social good is created whenever we make new forecasting methods and resources freely available and usable. That could take the form of open source software and data, open access papers and textbooks, reproducible source files, and so on. I will discuss progress in this area over the last 25 years, and reflect on my own experiences in publishing forecasting papers, books and software. I will discuss the benefits in working openly and publicly from an academic, commercial, and social good perspective.

It is becoming increasingly common for organizations to collect very large amounts of data over time. Data visualization is essential for exploring and understanding structures and patterns, and to identify unusual observations. However, the sheer quantity of data available means that new time series visualisation methods are needed. I will demonstrate an approach to this problem using a vector of features on each time series, measuring characteristics of the series. These feature vectors can then be mapped to a 2-dimensional space for visualization. The feature-based approach to time series can also be used for many other analysis tasks including (1) clustering time series; (2) identifying anomalous time series within a collection of time series; (3) selecting the best forecasting model; and (4) finding the optimal weighted ensemble of forecasts. I will demonstrate examples for each of these, and show some new R packages that make feature-based time series analysis easy to do.

ANU-AAMT National Mathematics Summer School

It is now common for organizations to collect huge amounts of data over time, and existing time series analysis tools are not always able to handle the scale, frequency and structure of the data collected. I will demonstrate some new tools and methods that have been developed to handle the analysis of large collections of time series. These include a feature-based approach for exploring time series data in high dimensions, and to allow anomalous time series to be identified within a collection of time series. I will also show how automated large-scale probabilistic forecasting is now very easy to do. No knowledge of time series analysis or forecasting will be assumed! The ideas will be illustrated using the tsibble, feasts and fable packages for R.

It is becoming increasingly common for organizations to collect large numbers of related time series, and existing time series analysis tools are not always suitable to handle the scale, frequency and structure of the data collected. We will introduce the R packages tsibble, feasts and fable, designed to work with the tidyverse to flexibly manage and analyse collections of related time series. We will look at how to do data wrangling, data visualizations and exploratory data analysis, and we will show how some classical time series models can be applied using the fable package.

It is common to forecast at different levels of aggregation. For example, a retail company will want national forecasts, state forecasts, and store-level forecasts. And they will want them for all products, for groups of products, and for individual products. Forecast reconciliation methods allow for the forecasts at all levels of aggregation to be adjusted so they are consistent with each other.

I will describe a geometric interpretation for reconciliation methods used to forecast time series that adhere to known linear constraints. In particular, a general framework is established nesting many existing popular reconciliation methods within the class of projections. This interpretation facilitates the derivation of novel results that explain why and how reconciliation via projection is guaranteed to improve forecast accuracy with respect to a specific class of loss functions. The result is also demonstrated empirically using Australian tourism flows. I will also discuss how this geometric interpretation naturally extends to probabilistic forecasting.

Finally, I will show how these ideas can be easily implemented using the fable package in R.

Synthetic time series are useful for benchmarking and testing methods for forecasting, clustering, classification and other tasks. I will discuss an approach to this where we can generate time series with diverse and controllable characteristics using mixture autoregressive (MAR) models. This can be done with the gratis package for R.

Australian Centre of Excellence for Mathematical and Statistical Frontiers, 24 August 2021.

Why is it that we can accurately forecast a solar eclipse in 1000 years time, but we have no idea whether Google’s stock price will rise or fall tomorrow? Or why can we forecast the daily electricity consumption for the next week with remarkable precision, but we cannot forecast daily COVID-19 cases with the same accuracy?

In this talk, I will discuss the conditions we need for predictability, how to measure the uncertainty of our forecasts, and how to evaluate whether we are uncertain enough.

I will draw on 30 years of forecasting practice, including forecasting Australia’s health budget for the next few years, forecasting peak electricity demand in 20 years time, producing weekly forecasts of daily COVID-19 cases for all Australian states since March 2020, and forecasting the post-pandemic recovery of Australia’s tourism industry.

In March 2020, I joined a team responsible for providing probabilistic forecasts of COVID-19 cases to all Australian state & territory Chief Health Officers. We use case-level data of all Australian positive COVID cases, along with nationwide surveys and mobility data from Google, Facebook and Apple. Three separate models have been built: (1) a stochastic susceptible-exposed-infectious-recovered (SEEIIR) compartmental model; (2) a stochastic epidemic model; and (3) a global autoregressive model based on public case data from 31 countries. These are then combined into a mixture ensemble to generate probabilistic forecasts of daily cases which are provided to the Australian governments each week. I will discuss the ensemble forecasting aspects of this work and how we evaluate the results.

Lokad is a supply chain software company based in Paris. They have a TV channel on which they discuss supply chain issues.

Recently I was interviewed on Lokad TV, discussing my R packages for forecasting.
