
The AIC returned by `TSLM()` is different from that returned by `lm()`. Why?
I get this question a lot, so I thought it might help to explain some issues with AIC calculation.

First, the equation for the AIC is given by
$$\text{AIC} = 2k - 2\log(L),$$
where $L$ is the likelihood of the model and $k$ is the number of parameters that are estimated (including the error variance). For a linear regression model with iid errors, fitted to $n$ observations, the log-likelihood can be written as
$$\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} e_i^2,$$
where $e_i$ is the residual for the $i$th observation. The AIC is then
$$\text{AIC} = 2k + n\log(2\pi\sigma^2) + \frac{1}{\sigma^2}\sum_{i=1}^{n} e_i^2.$$
Since we don’t know $\sigma^2$, we estimate it using the mean squared error, $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2$ (the maximum likelihood estimator), giving
$$\text{AIC} = 2k + n\log(\hat\sigma^2) + c,$$
where $c = n\log(2\pi) + n$ is a constant that depends only on the sample size and not on the model. This constant is often ignored. Thus, different software implementations can lead to different AIC values for the same model, since they may include or exclude the constant $c$.

Now, let’s look at what R returns in a simple case using the `lm()` function.

```
set.seed(2023)
library(fpp3)
df <- tibble(
  time = seq(100),
  x = rnorm(100),
  y = x + rnorm(100)
)
fit <- lm(y ~ x, data = df)
AIC(fit)
```

`[1] 275.6267`

We can check how this is calculated by computing it ourselves.

```
mse <- mean(residuals(fit)^2)
n <- length(residuals(fit))
k <- length(fit$coefficients) + 1 # coefficients plus the error variance
# With constant
2*k + n*log(mse) + n*log(2*pi) + n
```

`[1] 275.6267`

```
# Without constant
2*k + n*log(mse)
```

`[1] -8.161047`

Clearly, `AIC()` applied to the output from `lm()` is using the version with the constant.
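Another way to see this is via `logLik()`, since `AIC()` is defined as $2k - 2\log L$ using the exact (constant-included) log-likelihood. A minimal base-R check (my own sketch, re-creating the same regression without the tibble):

```
# Check that AIC() equals 2k - 2*logLik, with the constant included
set.seed(2023)
x <- rnorm(100)
y <- x + rnorm(100)
fit <- lm(y ~ x)
k <- length(coef(fit)) + 1 # coefficients plus the error variance
AIC(fit) - (2 * k - 2 * as.numeric(logLik(fit)))
#> [1] 0
```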

Now compare that with what we obtain using the `TSLM()` function from the fable package.

```
df |>
  as_tsibble(index = time) |>
  model(TSLM(y ~ x)) |>
  glance() |>
  pull(AIC)
```

`[1] -8.161047`

This is the AIC without the constant.

The situation is even more confusing with ARIMA models, and some other model classes, because some functions use approximations to the likelihood, rather than the exact likelihood.

Thus, AIC values can be compared across models fitted using the same functions, but not necessarily when models have been fitted using different functions.

Does it make any sense to compute p-values for prediction intervals?

I received this email today:

My team recently used some techniques found in your writings to perform forecasts … Our work has been well received by reviewers, but one commenter asked two questions that I was hoping you may be able to provide insight on.

First, they wanted to know if we could provide P-values for our prediction intervals. In our work, we said, “Observed rates were deemed significantly different from expected rates when they did not fall within the 95% PI.” This same language has been used by others published in the same journal. I am curious to hear your thoughts on giving P-values for these PIs and what the appropriate method for doing so would be (if any).

Second, they asked about making a correction for multiple comparisons. … I believe we could apply a Bonferroni correction to the PIs, but that feels too liberal. Moreover, I am curious if this is even called for given our statement of what is deemed significant and the fact that our prediction interval construction relies on a non-parametric method.

Here is my reply:

I don’t think this makes any sense. A p-value is the probability of obtaining observations at least as extreme as those observed given a null hypothesis. What’s the hypothesis here? In forecasting, we don’t usually have a hypothesis. Instead, we fit a model to the data, and make predictions based on the model. I guess you could make the null hypothesis “The future observations come from the forecast distributions”, and then the p-value for each future time period would be the probability of the tails beyond the observations. But it is well-known that the estimated prediction intervals are almost always too narrow due to them not taking into account all sources of variance. So the size of this test would not be well-calibrated. I think you’re better off pushing back rather than trying to meet the request.

A Bonferroni correction assumes independence between the intervals, and that is not true for PIs from a forecasting model. The future forecast errors are all correlated (with the strength of the correlation depending on the model and the DGP). Usually we just say that these are pointwise PIs, and so we expect 5% of observations to fall outside the 95% prediction intervals. It is possible to generate uniform (simultaneous) PIs, which contain 95% of all future sample paths, but this is a little tricky due to the correlations between horizons. It could be done via simulation: simulate 1000 future sample paths and compute the envelope that contains 950 of them.
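As a rough sketch of that simulation idea (my own illustration, not part of the email exchange), here is one way to do it with the `forecast` package; the max-deviation rule used to decide which whole paths to keep is just one possible choice:

```
library(forecast)
set.seed(123)
fit <- auto.arima(WWWusage)
h <- 10
# Simulate 1000 future sample paths from the fitted model
paths <- replicate(1000, as.numeric(simulate(fit, nsim = h, future = TRUE)))
# Pointwise 95% intervals: quantiles computed separately at each horizon
pointwise <- apply(paths, 1, quantile, probs = c(0.025, 0.975))
# Simultaneous band: rank each path by its most extreme standardized
# deviation across horizons, then keep the central 95% of whole paths
ctr <- rowMeans(paths)
scl <- apply(paths, 1, sd)
score <- apply(abs((paths - ctr) / scl), 2, max)
keep <- paths[, score <= quantile(score, 0.95)]
simultaneous <- apply(keep, 1, range)
# The simultaneous band is typically wider than the pointwise one
```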

It sounds like the reviewers are only familiar with inferential statistics, and not with predictive modelling. You could point them to Shmueli’s excellent 2010 paper “To explain or predict”, highlighting the differences between the two paradigms.

I am writing a new textbook on anomaly detection. It probably won’t be finished for at least a year, but here is an excerpt.

There is a widespread myth that NASA did not discover the hole in the ozone layer above the Antarctic because they had been throwing away anomalous data that would have revealed it. This is not true, but the real story is also instructive (Pukelsheim 1990; Christie 2001, 2004).

NASA had been collecting satellite data on Antarctic ozone levels using a Total Ozone Mapping Spectrometer (TOMS) since 1979, while British scientists had collected ozone data using ground sensors at the Halley Research Station, on the edge of the Brunt Ice Shelf in Antarctica, since 1957. Figure 1 shows average daily values from the NASA measurements in blue, and from the British observations in orange. There is a clear downward trend in the British data, especially from the late 1970s, which is confirmed with the NASA data. So why wasn’t the “ozone hole” discovered until 1985?

The British scientists had noticed the low ozone values as early as 1981, but it took a few years for the scientists to be convinced that the low values were real and not due to instrument problems, and then there were the usual publication delays. Eventually, the results were published in Farman, Gardiner, and Shanklin (1985).

Meanwhile, NASA was flagging observations as anomalous when they were below 180 DU (shown as a horizontal line in Figure 1). As is clear from the figure, this is much lower than any of the plotted points before the early 1980s. However, the 180 threshold was used for the *daily* measurements, which are much more variable than the monthly averages that are plotted. Occasionally daily observations did fall below 180, and so it was a reasonable threshold for the purpose of identifying instrument problems.

In fact, NASA had checked the unusually low TOMS values obtained before 1985 by comparing them against other available data. But the other data available to them showed ozone values of about 300 DU, so it was assumed that the satellite sensor was malfunctioning. The British Halley data were not available to them, and only after the publication of Farman, Gardiner, and Shanklin (1985) did the NASA scientists realise that the TOMS results were accurate.

In 1986, NASA scientists were able to confirm the British finding, also demonstrating that the ozone hole was widespread across the Antarctic (Stolarski et al. 1986).

This example reveals some lessons about anomaly detection:

- The NASA threshold of 180 was based on daily data, and was designed to identify instrument problems, not genuine systematic changes in ozone levels. The implicit assumption was that ozone levels varied seasonally, but that otherwise the distribution of observations was stable. All anomaly detection involves some implicit assumptions like this, and it is well to be aware of them.
- Sometimes what we think are anomalies are not really anomalies, but the result of incorrect assumptions.
- Often smoothing or averaging data will help to reveal issues that are not so obvious from the original data. This reduces the variation in the data, and allows more systematic variation to be uncovered.
- Always plot the data. In this case, a graph such as Figure 1 would have revealed the problem in the late 1970s, but it seems no-one was producing plots like this.
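The averaging point is easy to demonstrate with a toy simulation (entirely synthetic, not the ozone data): a slow trend that is invisible in noisy daily values emerges clearly in monthly means.

```
set.seed(42)
# Ten "years" of 360 daily observations with a slow downward trend
trend <- seq(0, -30, length.out = 3600)
daily <- 300 + trend + rnorm(3600, sd = 40)
# Average the daily values in blocks of 30 to get monthly means
monthly <- tapply(daily, rep(1:120, each = 30), mean)
# Averaging shrinks the noise by sqrt(30), so the trend now dominates
sd(daily - 300 - trend) # close to 40
cor(seq_along(monthly), monthly) # strongly negative
```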

Christie, M. 2001. *The Ozone Layer: A Philosophy of Science Perspective*. Cambridge, UK: Cambridge University Press.

———. 2004. “Data Collection and the Ozone Hole: Too Much of a Good Thing?” *History of Meteorology* 1: 99–105.

Farman, J C, B G Gardiner, and J D Shanklin. 1985. “Large Losses of Total Ozone in Antarctica Reveal Seasonal ClO/NO Interaction.” *Nature* 315 (6016): 207–10.

Pukelsheim, F. 1990. “Robustness of Statistical Gossip and the Antarctic Ozone Hole.” *The IMS Bulletin* 19 (4): 540–45.

Stolarski, R S, A J Krueger, M R Schoeberl, R D McPeters, P A Newman, and J C Alpert. 1986. “Nimbus 7 Satellite Measurements of the Springtime Antarctic Ozone Decrease.” *Nature* 322 (6082): 808–11.

When using a training/test split, or time-series cross-validation, are you choosing a specific model or a model class?

This question arises almost every time I teach a forecasting workshop, and it was raised again in the following email I received today:

I have a time series that I have split into training and test datasets with an 80%-20% ratio. I fit a series of different models (ETS, BATS, ARIMA, NN etc) to the training data and generate my forecasts from each model. When evaluating the forecasts against the test set I find the model that gives the best outcome is an ARIMA(1,1,1) that was selected using the auto.arima function. My question is this, should I proceed to fit an ARIMA(1,1,1) to the whole data set, or should I use the auto.arima function again which may give me a slightly different (p,d,q) order as there is now an extra 20% of unseen data available to the forecast model? Any guidance would be greatly received.

If you only have one class of model to consider (e.g., only ETS or only ARIMA), then it is easy enough to select the model on all available data using the AIC, and use the selected model to forecast the future. But if you are selecting between model classes, then you need to use either a training/test split, or (preferably) a time-series cross-validation procedure.

If you use time-series cross-validation, then there would usually be different models selected for each training set, and the cross-validated error is a measure of how well the model class works for your data. In that case, there is no single model for the training data, and you are selecting the *model class* rather than a specific model. This makes it clear that you should then apply the selected model class to all the data, when forecasting beyond the end of the available data. In other words, if you choose ARIMA over ETS, then you would then fit an ARIMA model to all the data, and use that model to forecast the future.

You can think of a simple training/test split as a special case of time-series cross-validation, where there is a single fold. So the same argument applies. That is, you are selecting the model class that works best for your data, and so you should apply that model class to all the data, when forecasting beyond the end of the available data.

This example also illustrates why it is important to use a time-series cross-validation procedure, rather than a simple training/test split. In this case, the ARIMA model was selected because it happened to work best for the particular training/test split that was used. But if a different split had been used, then a different model might have been selected. So the model selection is not stable. By averaging over multiple folds using a time-series cross-validation procedure, you can get a more stable estimate of the model class that works best for your data.
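To illustrate the idea with the `forecast` package (a sketch, not the email writer’s actual data): `tsCV()` refits the chosen model *class* on every fold, so the cross-validated errors measure how well the class works, not any single fitted model.

```
library(forecast)
# Forecast functions that re-select a model within each class on every fold
f_arima <- function(y, h) forecast(auto.arima(y), h = h)
f_ets <- function(y, h) forecast(ets(y), h = h)
# One-step time-series cross-validation, starting from 80 observations
# (this refits a model on every fold, so it takes a little while)
e_arima <- tsCV(AirPassengers, f_arima, h = 1, initial = 80)
e_ets <- tsCV(AirPassengers, f_ets, h = 1, initial = 80)
# Compare the model classes on cross-validated RMSE
rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
c(ARIMA = rmse(e_arima), ETS = rmse(e_ets))
# Whichever class wins is then refitted to *all* the data, e.g.
# forecast(ets(AirPassengers), h = 24)
```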

Over the past 6 months, George Athanasopoulos and I have added videos to most sections of the 3rd edition of our textbook *Forecasting: principles and practice*.

We have taught from the book many times, but this year we decided to pre-record short videos for each section. Our students often prefer a video explanation to reading the textbook, and we thought other readers might appreciate hearing from us as well.

These videos are embedded in most sections of the book. So far, we’ve covered the sections that we include in our own courses, but we hope to eventually have videos for all sections. Most of these were done in a single take, so they are sometimes a little rough, but hopefully still useful.

You can view the entire playlist on YouTube.

The Ljung-Box test is widely used to test for autocorrelation remaining in the residuals after fitting a model to a time series. In this post, I look at the degrees of freedom used in such tests.

Suppose an ARMA($p$,$q$) model is fitted to a time series of length $n$, giving a series of residuals $e_1,\dots,e_n$, and let the autocorrelations of this residual series be denoted by $r_k$, for $k = 1, 2, \dots$. The first $\ell$ autocorrelations are used to construct the statistic
$$Q^* = n(n+2)\sum_{k=1}^{\ell} \frac{r_k^2}{n-k}.$$

This statistic was discussed by Box and Pierce (1970), who argued that if $n$ is large, and the model parameters correspond to the true data generating process, then $Q^*$ has a $\chi^2$ distribution with $\ell$ degrees of freedom. Later, Ljung and Box (1978) showed that if the model is correct, but with unknown parameters, then $Q^*$ has a $\chi^2$ distribution with $\ell - p - q$ degrees of freedom.
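In R, the test is implemented in `Box.test()`, where the `fitdf` argument is the quantity subtracted from $\ell$ to give the degrees of freedom. A minimal base-R example:

```
# Ljung-Box test on residuals from an AR(1) model fitted to the lh series
fit <- arima(lh, order = c(1, 0, 0))
# fitdf = 1 subtracts the single AR coefficient from the degrees of freedom
Box.test(residuals(fit), lag = 10, fitdf = 1, type = "Ljung-Box")
```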

These days, the Ljung-Box test is applied to a lot more models than non-seasonal ARMA models, and it is not clear what the degrees of freedom should be for other models. For example:

- What if the model includes an intercept term? Should that be included in the degrees of freedom calculation?
- What about a seasonal ARIMA model? Do we just count all coefficients?
- Or a regression with ARMA errors? Should we include the regression coefficients when computing the degrees of freedom?
- Or an ETS model? Do we count just the smoothing parameters, or do we include the states as well, or something else?

Not long ago, I had naively assumed that the correct degrees of freedom would be $\ell - K$, where $K$ is the number of parameters estimated. I am in good company, because Andrew Harvey made exactly the same conjecture in Harvey (1990, p259). That was what was coded in the `forecast::checkresiduals()` function prior to v8.21, and how the test was applied in Hyndman and Athanasopoulos (2018) and Hyndman and Athanasopoulos (2021) until February 2023. But a recent GitHub discussion with Achim Zeileis convinced me that it is incorrect.

Let’s look at a few examples. For each model, we will simulate 5000 series, each of length 250 observations. For each series, we compute the p-value of a Ljung-Box test with $\ell = 10$ (unless otherwise stated) and $\ell - K$ degrees of freedom, for different values of $K$. Under the null hypothesis of uncorrelated residuals, the p-values should have a uniform distribution.

```
library(forecast)
library(ggplot2)
set.seed(0)

# Function to simulate p-values given a DGP model and
# a function to fit the model to a time series
simulate_pvalue <- function(model, fit_fn, l = 10) {
  ## Simulate series
  if (is.null(model$xreg)) {
    y <- simulate(model, n = 250)
  } else {
    y <- simulate(model, xreg = model$xreg, n = 250)
  }
  ## If multiplicative errors, fix non-positive values
  if (inherits(model, "ets") && model$components[1] == "M") {
    y[y <= 0] <- 1e-5
  }
  ## Fit model
  m <- fit_fn(y)
  ## Compute p-values for various df
  pv <- purrr::map_vec(0:3, function(x, m) {
    Box.test(residuals(m), lag = l, fitdf = x, type = "Ljung-Box")$p.value
  }, m = m)
  names(pv) <- paste("K =", 0:3)
  return(pv)
}

# Function to replicate the above function
simulate_pvalues <- function(model, fit_fn, nsim = 5000, l = 10) {
  purrr::map_dfr(seq(nsim), function(x) {
    simulate_pvalue(model, fit_fn, l = l)
  })
}

# Histograms of p-values
hist_pvalues <- function(pv) {
  pv |>
    tidyr::pivot_longer(cols = seq(NCOL(pv))) |>
    ggplot(aes(x = value)) +
    geom_histogram(bins = 30, boundary = 0) +
    facet_grid(. ~ name) +
    labs(title = "P value distributions")
}

# A nice table of the size of the test
table_pvalues <- function(pv) {
  size <- c(0.01, 0.05, 0.1)
  tibble::tibble(`test size` = size) |>
    dplyr::bind_cols(purrr::map_dfc(pv, function(x) ecdf(x)(size))) |>
    knitr::kable()
}
```

We will simulate from an ARIMA(2,0,0) model with a non-zero intercept. For the Ljung-Box test, we will consider $K = 0, 1, 2$ and $3$. Note that $K = 0$ was the original proposal by Box and Pierce (1970), $K = 2$ counts only the ARMA coefficients, and $K = 3$ counts all parameters estimated in the model. The resulting distributions of the p-values are shown below.

```
model <- Arima(sqrt(lynx), order=c(2,0,0))
fit_fn <- function(y) {
  Arima(y, order = c(2, 0, 0), include.mean = TRUE)
}
arima_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(arima_pvalues)
```

`table_pvalues(arima_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0046 | 0.0072 | 0.0124 | 0.0220 |
| 0.05 | 0.0226 | 0.0354 | 0.0534 | 0.0818 |
| 0.10 | 0.0474 | 0.0678 | 0.1016 | 0.1514 |

Clearly the one with $K = 2$ is better than the alternatives. The table shows the empirical size of the test for different threshold levels. The empirical sizes are closest to the nominal sizes when $K = 2$. So we shouldn’t count the intercept when computing the degrees of freedom.

Next, we will simulate from an ARIMA(0,1,1)(0,1,1)$_{12}$ model, often called the “airline model” due to its application to the Air passenger series in Box et al. (2016). In fact, our DGP for the simulations will be a model fitted to the `AirPassengers` data set. Again, we consider $K = 0, 1, 2$ and $3$. There are two parameters to be estimated.

```
model <- Arima(log(AirPassengers), order=c(0,1,1), seasonal=c(0,1,1))
fit_fn <- function(y) {
  Arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1))
}
sarima_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(sarima_pvalues)
```

`table_pvalues(sarima_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0100 | 0.0170 | 0.0282 | 0.0456 |
| 0.05 | 0.0484 | 0.0684 | 0.1018 | 0.1452 |
| 0.10 | 0.0882 | 0.1218 | 0.1774 | 0.2526 |

Interesting. Although there are two parameters here, the tests with $K = 0$ and $K = 1$ do better than $K = 2$. I would have expected $K = 2$ to be the right choice, but the test with $K = 2$ has empirical size about twice the nominal size.

As a guess, perhaps the seasonal parameters aren’t having an effect with $\ell = 10$. We can test what happens for larger $\ell$ by setting $\ell = 24$ (covering two years), and repeating the exercise.

```
sarima_pvalues <- simulate_pvalues(model, fit_fn, l=24)
hist_pvalues(sarima_pvalues)
```

`table_pvalues(sarima_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0134 | 0.0164 | 0.0216 | 0.0278 |
| 0.05 | 0.0492 | 0.0614 | 0.0748 | 0.0940 |
| 0.10 | 0.0864 | 0.1066 | 0.1312 | 0.1590 |

I was expecting $K = 2$ to do best there, but not so. $K = 0$ is the most uniform, and gives empirical sizes closest to the nominal sizes, with the results getting worse as $K$ increases. Perhaps always setting $K = 0$ would be a sensible strategy for ARIMA models when they contain seasonal components. This needs some theoretical analysis.

We will simulate from a linear trend model with AR(1) errors. Here, $K = 1$ counts only the ARMA coefficients, while $K = 3$ counts all parameters estimated. The resulting distributions of the p-values are shown below.

```
model <- Arima(10 + seq(250)/10 + arima.sim(list(ar=0.7), n=250),
  order = c(1, 0, 0), xreg = seq(250))
fit_fn <- function(y) {
  Arima(y, order = c(1, 0, 0), include.constant = TRUE, xreg = seq(250))
}
regarima_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(regarima_pvalues)
```

`table_pvalues(regarima_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0072 | 0.0104 | 0.0176 | 0.0294 |
| 0.05 | 0.0310 | 0.0532 | 0.0782 | 0.1200 |
| 0.10 | 0.0696 | 0.0996 | 0.1478 | 0.2150 |

The test with $K = 1$ looks the most uniform, with the size of the test closest to the nominal values. So only counting the ARMA coefficients seems to be correct here.

Next, we will consider a linear trend model with iid errors. That is the same as the previous model, but with a simpler error structure.

```
model <- Arima(10 + seq(250)/10 + rnorm(250),
  order = c(0, 0, 0), xreg = seq(250))
fit_fn <- function(y) {
  Arima(y, include.constant = TRUE, xreg = seq(250))
}
trend_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(trend_pvalues)
```

`table_pvalues(trend_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0148 | 0.0214 | 0.0338 | 0.0554 |
| 0.05 | 0.0580 | 0.0844 | 0.1204 | 0.1746 |
| 0.10 | 0.1054 | 0.1478 | 0.2062 | 0.2816 |

The test with $K = 0$ looks best. If we think of a regression model as a RegARIMA model with ARIMA(0,0,0) errors, this is consistent with the previous results, setting $K = p + q = 0$.

Now let’s try an ETS(A,N,N) model, again using 5000 series each of length 250. If we count only the smoothing parameter, $K = 1$, but if we count all estimated parameters (the smoothing parameter plus the initial state), $K = 2$. The distributions of p-values are shown below.

```
model <- ets(fma::strikes, "ANN")
fit_fn <- function(y) {
  ets(y, model = "ANN", damped = FALSE)
}
ets_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(ets_pvalues)
```

`table_pvalues(ets_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0064 | 0.0112 | 0.0174 | 0.0316 |
| 0.05 | 0.0334 | 0.0522 | 0.0778 | 0.1196 |
| 0.10 | 0.0702 | 0.0960 | 0.1442 | 0.2098 |

$K = 1$ looks about right. That makes sense, as an ETS(A,N,N) model is equivalent to an ARIMA(0,1,1) model, which has one ARMA coefficient.

Next, let’s try an ETS(M,N,N) model, which has no ARIMA equivalent, but which has one smoothing parameter and one initial state to estimate.

```
model <- ets(fma::strikes, "MNN")
fit_fn <- function(y) {
  ets(y, model = "MNN", damped = FALSE)
}
ets_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(ets_pvalues)
```

`table_pvalues(ets_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0054 | 0.0084 | 0.0172 | 0.0308 |
| 0.05 | 0.0334 | 0.0548 | 0.0814 | 0.1274 |
| 0.10 | 0.0708 | 0.1062 | 0.1520 | 0.2136 |

Again, $K = 1$ appears to be the best.

An ETS(A,A,N) model is equivalent to an ARIMA(0,2,2) model, so I expect this one to need $K = 2$.

```
model <- ets(fma::strikes, "AAN")
fit_fn <- function(y) {
  ets(y, model = "AAN", damped = FALSE)
}
ets_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(ets_pvalues)
```

`table_pvalues(ets_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0034 | 0.0062 | 0.0106 | 0.0172 |
| 0.05 | 0.0180 | 0.0308 | 0.0514 | 0.0802 |
| 0.10 | 0.0430 | 0.0658 | 0.1008 | 0.1486 |

This time, my conjecture is correct, and $K = 2$ works well.

Finally, we will check a seasonal ETS model, fitted to the log `AirPassengers` data.

```
model <- ets(log(AirPassengers), model="AAA", damped=FALSE)
fit_fn <- function(y) {
  ets(y, model = "AAA", damped = FALSE)
}
ets_pvalues <- simulate_pvalues(model, fit_fn)
hist_pvalues(ets_pvalues)
```

`table_pvalues(ets_pvalues)`

| test size | K = 0 | K = 1 | K = 2 | K = 3 |
|---|---|---|---|---|
| 0.01 | 0.0178 | 0.0276 | 0.0396 | 0.0602 |
| 0.05 | 0.0634 | 0.0916 | 0.1360 | 0.1942 |
| 0.10 | 0.1206 | 0.1672 | 0.2244 | 0.3078 |

Here there are 3 smoothing parameters, and 13 initial states to estimate. So I was expecting $K = 3$ to do best, but it is the worst. Instead, $K = 0$ is the best. I’m not sure what to make of this result.

Based only on this empirical evidence:

- For ARIMA models, use $\ell - p - q$ degrees of freedom.
- For seasonal ARIMA models, it appears that $K = 0$ (i.e., $\ell$ degrees of freedom) gives the best results.
- For regression with ARIMA errors, use $\ell - p - q$ degrees of freedom.
- For OLS regression, use $\ell$ degrees of freedom.
- For non-seasonal ETS models, use $\ell - K$ degrees of freedom, where $K$ is the number of smoothing parameters.
- For seasonal ETS models, use $\ell$ degrees of freedom (i.e., $K = 0$).

The last two of these appear to be contradictory, and it is not clear why.

It seems like this might be a good project for a PhD student to explore. In particular, can these suggestions based on empirical evidence be supported theoretically? It would also be good to explore other models such as TBATS, ARFIMA, NNETAR, etc.

For now, I might avoid teaching the Ljung-Box test, and just get students to look at the ACF plot of the residuals instead.
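For instance, the residual ACF is easy to inspect directly (a minimal sketch using the `forecast` package; any fitted model with residuals works similarly):

```
library(forecast)
fit <- auto.arima(WWWusage)
# Inspect the residual ACF directly, rather than relying on the test
ggAcf(residuals(fit))
```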

- Kim et al. (2004) study the distribution of the Ljung-Box statistic for a unit root AR(1) model with ARCH errors.
- McLeod and Li (1983) consider the equivalent test applied to the autocorrelations of squared residuals, and show that $K = 0$ should be used in that case.
- Mahdi (2016) discusses a variation on the LB test for seasonal ARIMA models considering only autocorrelations at the seasonal lags.
- Several other portmanteau tests (i.e., based on multiple autocorrelations) are available, and perhaps we should be using them and not the older Ljung-Box test. See Mahdi (2021) for some recent developments.

Box, George E P, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. 2016. *Time Series Analysis: Forecasting and Control*. 5th ed. John Wiley & Sons.

Box, George E P, and David A Pierce. 1970. “Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models.” *Journal of the American Statistical Association* 65 (332): 1509–26. https://doi.org/10.2307/2284333.

Harvey, Andrew C. 1990. *Forecasting, Structural Time Series Models and the Kalman Filter*. Cambridge University Press.

Hyndman, Rob J, and George Athanasopoulos. 2018. *Forecasting: Principles and Practice*. 2nd ed. Melbourne, Australia: OTexts. OTexts.org/fpp2.

———. 2021. *Forecasting: Principles and Practice*. 3rd ed. Melbourne, Australia: OTexts. OTexts.org/fpp3.

Kim, Eunhee, Jeongcheol Ha, Youngsook Jeon, and Sangyeol Lee. 2004. “Ljung-Box Test in Unit Root AR-ARCH Model.” *Communications for Statistical Applications and Methods* 11 (2): 323–27. https://doi.org/10.5351/ckss.2004.11.2.323.

Ljung, Greta M, and George E P Box. 1978. “On a Measure of Lack of Fit in Time Series Models.” *Biometrika* 65 (2): 297–303. https://doi.org/10.1093/biomet/65.2.297.

Mahdi, Esam. 2016. “Portmanteau Test Statistics for Seasonal Serial Correlation in Time Series Models.” *SpringerPlus* 5 (1): 1485. https://doi.org/10.1186/s40064-016-3167-4.

———. 2021. “New Goodness-of-Fit Tests for Time Series Models.” http://arxiv.org/abs/2008.08176.

McLeod, A I, and W K Li. 1983. “Diagnostic Checking ARMA Time Series Models Using Squared-Residual Autocorrelations.” *Journal of Time Series Analysis* 4 (4): 269–73. https://doi.org/10.1111/j.1467-9892.1983.tb00373.x.

| Date | Podcast | Episode |
|---|---|---|
| 26 May 2023 | Forecasting Impact | Forecasting software panel |
| 14 March 2022 | Faculty.net | Forecasting in social settings |
| 17 November 2021 | The Random Sample | Software as a first class research output |
| 24 May 2021 | Data Skeptic | Forecasting principles and practice |
| 12 April 2021 | Seriously Social | Forecasting the future: the science of prediction |
| 6 February 2021 | Forecasting Impact | Rob Hyndman |
| 19 July 2020 | The Curious Quant | Forecasting COVID, time series, and why causality doesn’t matter as much as you think |
| 27 May 2020 | The Random Sample | Forecasting the future & the future of forecasting |
| 9 October 2019 | Thought Capital | Forecasts are always wrong (but we need them anyway) |

I’m giving a 2-day workshop on “Tidy Time Series and Forecasting in R”, first at the New York R Conference in July, and then at the Posit Conference in Chicago in September.

Places are limited, so please book in early.

- New York, 11-12 July 2023. Followed by the New York R conference. (Register for the workshop as part of the NYR conference registration.)
- Chicago, 17-18 September 2023. Followed by the Posit conference. (Register for the workshop as part of the Posit conference registration.)

The workshop introduces the tidyverts set of packages. Further details about the workshop are here.

I’ve created some quarto templates with Monash University branding.

- A Quarto template that assists you in creating a letter on Monash University letterhead.
- A Quarto template that assists you in creating a memo, with optional Monash University branding.
- A Quarto template that assists you in creating a working paper for the Department of Econometrics & Business Statistics, Monash University.
- A Quarto template that assists you in creating a Monash University report.
- A Quarto template that assists you in creating a Monash University thesis.

Either fork or download the repository to get started.

These are all based on my Rmarkdown templates, which are distributed via the `monash` R package.

Australia has a problem with government data. Actually, it has three problems with government data:

1. It is often kept secret.
2. If it is available, it is often out-of-date.
3. If it is available and timely, it is often in a form that makes any analysis difficult.

I think it would be better for the country if government data was available freely, immediately, and in a form that is useful for analysis. Of course, we should make an exception if there are privacy issues, or some other harm that would be caused by releasing it. Let me explain using some examples.

Take mortality data. During the pandemic, it has been important to know how many people died of any cause, so we could know the effect of the pandemic overall. Obviously some people were dying of COVID-19, but others might have been dying because they were unable to get treated when medical staff were overwhelmed by COVID patients. On the other hand, lockdowns may have reduced deaths due to road crashes, but perhaps they also affected deaths due to suicide. If we could compare the total deaths each week during the pandemic, with the corresponding totals in previous years, we could determine the overall effect of the pandemic on Australian mortality.

You would think that, during a global pandemic, having good mortality data would be important. But in June 2020, nearly six months after the start of COVID-19, the most recent available mortality data in Australia was from 2018. Eighteen months out of date! Think about that. For the first six months of the biggest public health event in 100 years, we had no official data on the effect of COVID-19 on Australian mortality. Eventually the Australian Bureau of Statistics got their act together and started producing provisional mortality data more frequently, but only after several of us complained loudly and publicly. Even now, the provisional mortality data available from the ABS is more than 3 months out of date. Contrast that with other countries: I could find mortality data for 38 countries, and Australia was the 5th worst for producing timely mortality data.

Another example concerns COVID-19 case numbers. There is still no reliable Australian government repository of daily COVID-19 cases by state. Some states are now producing historical data, but for most of 2020, when we really needed reliable information, the public information was incomplete. For much of the first two years of the pandemic, the state health departments were putting out their little dashboard images containing the numbers, but these were preliminary numbers, and did not include cases that were registered late, or other data revisions. To do any serious analysis, you needed daily case numbers from the beginning of the pandemic, but these were not available on government websites until relatively recently. Some media organizations, and some individuals, were collating the case numbers from the dashboard images and putting them online in the form of spreadsheets, and people were using them to do analysis, but these data were usually inaccurate and subject to revisions. The state health departments generally didn’t update the initial numbers that were released, even though they had more reliable information. So the public data was inaccurate, and most people wanting to do any data analysis were relying on media outlets, or a few 14-year-old boys running covidlive.com.au, to get even that.

For nearly three years, I have been part of the forecasting team appointed to provide advice to all of the Chief Health Officers of the states and territories of Australia. Every week, we produce forecasts of COVID daily case numbers for all states and territories. For that purpose, we were able to put together a relatively good data set of case numbers for all states, but we were explicitly forbidden to make the data publicly available, even though our data was more accurate than what was appearing in the media.

Similarly, our forecasts were kept secret even though they were being used to make policy decisions. Premiers would justify their policies by vaguely referring to “the modelling”, or occasionally “the Doherty modelling” (even though most of us are not at the Doherty Institute), but we would have preferred to have our forecasts available. So the good data and the forecasts are kept secret, and what is available is of poorer quality, or out-of-date.

Why? There are no privacy issues here. No harm would be done by working more transparently. On the contrary, if everyone had access to the best available data, then the independent modelling that was being done would have been of a higher quality.

We use a forecasting ensemble, where we have several forecasting models, and we combine them to produce the final forecasts that are submitted to the various state governments each week. Because we can’t share the data, the only forecasts that are included are those from members of our team. Generally in forecasting, it is better to use a wide range of models than to rely on a select few. But we can’t do that in Australia because of the government’s obsession with secrecy.

Compare that to the United States, where an official repository of data was set up early in the pandemic, and anyone could download it, produce forecasts, and submit those forecasts to the Centers for Disease Control and Prevention for inclusion in their analysis. As a result, the US forecasting ensemble being used for policy decisions was based on a much larger range of models, and anyone could contribute to it. The resulting forecasts were then published publicly, so anyone could see what was being forecast, and what information the government had available when making policy decisions.

I’ve focused on COVID, but similar problems arise in many other areas in Australia. We have a culture of secrecy around data that damages our public discourse: it leads to worse analysis, reduces transparency in government, and feeds distrust of government because it is not clear why decisions are being made. Making more data publicly available leads to a better society.

In my forecasting textbook coauthored with George Athanasopoulos, we provide formulas for the forecast variances of four simple benchmark forecasting methods, but we don’t explain where they come from. So here are the derivations.

We assume that the residuals from the method are uncorrelated and homoscedastic, with mean 0 and variance $\sigma^2$. Let $y_1,\dots,y_T$ denote the time series observations, and let $\hat{y}_{t+1|t}$ be the estimated forecast mean (or point forecast) of $y_{t+1}$ given data to time $t$. Then we can write
$$y_{t+1} = \hat{y}_{t+1|t} + e_{t+1}, \tag{1}$$
where $\{e_t\}$ is a white noise process. Let $\sigma_h^2$ be the estimated $h$-step forecast variance.

For a random walk, Equation 1 suggests that the appropriate model is
$$y_t = y_{t-1} + e_t.$$
Therefore
$$y_{T+h} = y_T + e_{T+1} + e_{T+2} + \cdots + e_{T+h}.$$
Consequently
$$\sigma_h^2 = h\sigma^2.$$

Here the model is
$$y_t = y_{t-m} + e_t,$$
where $m$ is the seasonal period. Thus
$$y_{T+h} = y_{T+h-(k+1)m} + e_{T+h-km} + e_{T+h-(k-1)m} + \cdots + e_{T+h},$$
where $k$ is the integer part of $(h-1)/m$ (i.e., the number of complete years in the forecast period prior to time $T+h$). Therefore
$$\sigma_h^2 = (k+1)\sigma^2.$$

The model underpinning the mean method is
$$y_t = c + e_t,$$
for some constant $c$ to be estimated. The least-squares estimate of $c$ is the mean,
$$\hat{c} = \bar{y} = \frac{1}{T}\sum_{t=1}^T y_t.$$
Thus,
$$y_{T+h} - \hat{y}_{T+h|T} = c + e_{T+h} - \bar{y},$$
where $\bar{y}$ has variance $\sigma^2/T$ and is independent of $e_{T+h}$. Therefore
$$\sigma_h^2 = \sigma^2\left(1 + \frac{1}{T}\right).$$

For a random walk with drift,
$$y_t = c + y_{t-1} + e_t.$$
Therefore,
$$y_{T+h} = hc + y_T + e_{T+1} + e_{T+2} + \cdots + e_{T+h}.$$
Now the least squares estimate of $c$ is $\hat{c} = (y_T - y_1)/(T-1)$, which has variance $\sigma^2/(T-1)$ and is independent of the future errors. Therefore
$$\sigma_h^2 = h\sigma^2 + \frac{h^2\sigma^2}{T-1} = h\sigma^2\left(1 + \frac{h}{T-1}\right).$$
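As a quick sanity check, the four variance formulas can be evaluated numerically. Here is a short sketch (written in Python purely for illustration; the function names and the values chosen for $\sigma^2$, $T$, $h$ and $m$ are my own, not part of the original derivations):

```python
# Forecast variances for the four benchmark methods, as derived above.
# sigma2 = residual variance, T = series length, h = forecast horizon,
# m = seasonal period.

def naive_var(sigma2, h):
    # Random walk (naive): h error terms accumulate, so sigma_h^2 = h * sigma^2
    return h * sigma2

def snaive_var(sigma2, h, m):
    # Seasonal naive: k = number of complete years before time T + h
    k = (h - 1) // m
    return (k + 1) * sigma2

def mean_var(sigma2, h, T):
    # Mean method: variance of e_{T+h} plus variance of the sample mean
    return sigma2 * (1 + 1 / T)

def drift_var(sigma2, h, T):
    # Drift method: h * sigma^2 * (1 + h / (T - 1))
    return h * sigma2 * (1 + h / (T - 1))

sigma2, T, m = 2.0, 101, 12
for h in (1, 6, 12, 24):
    print(h, naive_var(sigma2, h), snaive_var(sigma2, h, m),
          mean_var(sigma2, h, T), drift_var(sigma2, h, T))
```

Note how the naive and drift variances grow with $h$, the seasonal naive variance grows in steps of one seasonal cycle, and the mean method’s variance does not depend on $h$ at all.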

The Australian Academy of Science has put out a new video about my work.

Regular readers will know that I develop statistical models and algorithms, and I write R implementations of them. I’m often asked if there are also Python implementations available. There are.

The best Python implementations for my time series methods are available from Nixtla. Here are some of their packages related to my work, all compatible with `scikit-learn`.

- **statsforecast**: Automatic ARIMA and ETS forecasting (Hyndman et al., 2002; Hyndman & Khandakar, 2008).
- **hierarchicalforecast**: Hierarchical forecasting (Hyndman et al., 2011; Wickramasuriya et al., 2019).
- **tsfeatures**: Time series features (Kang et al., 2017; Montero-Manso et al., 2020; T. S. Talagala et al., 2018).

They have also produced a lot of other great time series tools that are fast (optimized using `numba`) and perform well compared to various alternatives.

GluonTS from Amazon is excellent and provides lots of probabilistic time series forecasting models, with wrappers to some of my R code, and statsforecast from Nixtla. The other models in GluonTS are also well worth exploring.

Merlion from Salesforce is another interesting python library which includes both my automatic ARIMA and automatic ETS algorithms, along with other forecasting methods. It also has some anomaly detection methods for time series.

The first attempt to port my `auto.arima()` function to Python was `pmdarima`.

sktime has the most complete set of time series methods for Python, including:

- **AutoARIMA**: (Hyndman & Khandakar, 2008);
- **ETS**: (Hyndman et al., 2002);
- **BATS**/**TBATS**: (De Livera et al., 2011);
- **Theta**: (Assimakopoulos & Nikolopoulos, 2000; Hyndman & Billah, 2003);
- **STLForecaster**: (Bandara et al., 2022);
- **Croston**: (Shenstone & Hyndman, 2005);
- **Bagged-ETS**: (Bergmeir et al., 2016);

and more. These are also compatible with `scikit-learn`.

Recently, Kate Buchhorn has ported some of my anomaly detection algorithms to Python and made them available in sktime, including:

- **STRAY**: (P. D. Talagala et al., 2021);
- **DOBIN**: (Kandanaarachchi & Hyndman, 2021).

The statsmodels collection includes a few functions based on my work:

- **ETS**: (Hyndman et al., 2002);
- **Theta**: (Assimakopoulos & Nikolopoulos, 2000; Hyndman & Billah, 2003);
- **MSTL**: (Bandara et al., 2022);
- **functional boxplot**: (Hyndman & Shang, 2010);
- **functional HDR boxplot**: (Hyndman & Shang, 2010);
- **rainbowplot**: (Hyndman & Shang, 2010).

Bohan Zhang has produced pyhts, a re-implementation of the hts package in Python, based on Hyndman et al. (2011), Hyndman et al. (2016) and Wickramasuriya et al. (2019).

Darts is a Python library for wrangling and forecasting time series. It includes wrappers for ETS and ARIMA models from `statsforecast` and `pmdarima`, as well as an implementation of TBATS and some reconciliation functionality.

Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: A decomposition approach to forecasting. *International Journal of Forecasting*, *16*(4), 521–530. https://doi.org/10.1016/S0169-2070(00)00066-2

Bandara, K., Hyndman, R. J., & Bergmeir, C. (2022). MSTL: A seasonal-trend decomposition algorithm for time series with multiple seasonal patterns. *International J Operational Research*. robjhyndman.com/publications/mstl/

Bergmeir, C., Hyndman, R. J., & Benítez, J. M. (2016). Bagging exponential smoothing methods using STL decomposition and Box-Cox transformation. *International Journal of Forecasting*, *32*(2), 303–312. robjhyndman.com/publications/bagging-ets

De Livera, A. M., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex seasonal patterns using exponential smoothing. *J American Statistical Association*, *106*(496), 1513–1527. robjhyndman.com/publications/complex-seasonality/

Hyndman, R. J., Ahmed, R. A., Athanasopoulos, G., & Shang, H. L. (2011). Optimal combination forecasts for hierarchical time series. *Computational Statistics & Data Analysis*, *55*(9), 2579–2589. robjhyndman.com/publications/hierarchical/

Hyndman, R. J., & Billah, M. B. (2003). Unmasking the theta method. *International Journal of Forecasting*, *19*(2), 287–290. robjhyndman.com/publications/unmasking-the-theta-method/

Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. *Journal of Statistical Software*, *26*(3), 1–22. robjhyndman.com/publications/automatic-forecasting/

Hyndman, R. J., Koehler, A. B., Snyder, R. D., & Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods. *International Journal of Forecasting*, *18*(3), 439–454. robjhyndman.com/publications/hksg/

Hyndman, R. J., Lee, A., & Wang, E. (2016). Fast computation of reconciled forecasts for hierarchical and grouped time series. *Computational Statistics & Data Analysis*, *97*, 16–32.

Hyndman, R. J., & Shang, H. L. (2010). Rainbow plots, bagplots and boxplots for functional data. *J Computational & Graphical Statistics*, *19*(1), 29–45. robjhyndman.com/publications/rainbow-fda

Kandanaarachchi, S., & Hyndman, R. J. (2021). Dimension reduction for outlier detection using DOBIN. *J Computational & Graphical Statistics*, *30*(1), 204–219. robjhyndman.com/publications/dobin

Kang, Y., Hyndman, R. J., & Smith-Miles, K. (2017). Visualising forecasting algorithm performance using time series instance spaces. *International Journal of Forecasting*, *33*(2), 345–358. robjhyndman.com/publications/ts-feature-space/

Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., & Talagala, T. S. (2020). FFORMA: Feature-based forecast model averaging. *International Journal of Forecasting*, *36*(1), 86–92. robjhyndman.com/publications/fforma/

Shenstone, L., & Hyndman, R. J. (2005). Stochastic models underlying Croston’s method for intermittent demand forecasting. *Journal of Forecasting*, *24*(6), 389–402. robjhyndman.com/publications/croston/

Talagala, P. D., Hyndman, R. J., & Smith-Miles, K. (2021). Anomaly detection in high-dimensional data. *J Computational & Graphical Statistics*, *30*(2), 360–374. robjhyndman.com/publications/stray/

Talagala, T. S., Hyndman, R. J., & Athanasopoulos, G. (2018). *Meta-learning how to forecast time series* (Working Paper No. 6/18). Department of Econometrics & Business Statistics, Monash University. robjhyndman.com/publications/fforms/

Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. *J American Statistical Association*, *114*(526), 804–819. robjhyndman.com/publications/mint

The various papers on forecast reconciliation written over the last 13 years have not used a consistent notation. We have revised our notation as we have slowly come to understand the problem better. This is common in a new area, but it makes it tricky to read the literature, as you need to figure out how the notation of each paper maps to what you’ve already read.

Recently I spent a few weeks visiting Professor Tommaso Di Fonzo at the University of Padova (Italy), and one of the things we discussed was finding a notation we were both happy with so we could be more consistent in our future papers.

This is what we came up with. Hopefully others will agree and use it too!

For readers new to forecast reconciliation, Chapter 11 of FPP3 provides an introduction.

We observe $n$ time series at time $t$, written as $\boldsymbol{y}_t = (y_{1,t},\dots,y_{n,t})'$. The base forecasts of $\boldsymbol{y}_{T+h}$ given data $\boldsymbol{y}_1,\dots,\boldsymbol{y}_T$ are denoted by $\hat{\boldsymbol{y}}_h$.

This was the original formulation of the problem due to Hyndman et al. (2011), but presented here in our new notation.

Let $\boldsymbol{b}_t$ be an $n_b$-vector of “bottom-level” time series at time $t$, and let $\boldsymbol{a}_t$ be a corresponding $n_a$-vector of aggregated time series, where $\boldsymbol{a}_t = \boldsymbol{A}\boldsymbol{b}_t$, and $\boldsymbol{A}$ is the $n_a \times n_b$ “aggregation” matrix specifying how the bottom-level series are to be aggregated to form $\boldsymbol{a}_t$. The full vector of time series is given by
$$\boldsymbol{y}_t = \begin{bmatrix}\boldsymbol{a}_t \\ \boldsymbol{b}_t\end{bmatrix},$$
with $n = n_a + n_b$. This leads to the $n \times n_b$ “summing” or “structural” matrix given by
$$\boldsymbol{S} = \begin{bmatrix}\boldsymbol{A} \\ \boldsymbol{I}_{n_b}\end{bmatrix},$$
such that $\boldsymbol{y}_t = \boldsymbol{S}\boldsymbol{b}_t$.

All bottom-up, middle-out, top-down and linear reconciliation methods can be written as $\tilde{\boldsymbol{y}}_h = \boldsymbol{S}\boldsymbol{G}\hat{\boldsymbol{y}}_h$ for different $n_b \times n$ matrices $\boldsymbol{G}$.

Optimal reconciled forecasts are obtained with $\boldsymbol{G} = (\boldsymbol{S}'\boldsymbol{W}^{-1}\boldsymbol{S})^{-1}\boldsymbol{S}'\boldsymbol{W}^{-1}$, or $\tilde{\boldsymbol{y}}_h = \boldsymbol{M}\hat{\boldsymbol{y}}_h$, where the “mapping” matrix is given by
$$\boldsymbol{M} = \boldsymbol{S}(\boldsymbol{S}'\boldsymbol{W}^{-1}\boldsymbol{S})^{-1}\boldsymbol{S}'\boldsymbol{W}^{-1}, \tag{1}$$
$\hat{\boldsymbol{y}}_h$ are the $h$-step forecasts of $\boldsymbol{y}_{T+h}$ given data to time $T$, and $\boldsymbol{W}$ is an $n \times n$ positive definite matrix. Different choices for $\boldsymbol{W}$ lead to different solutions such as OLS, WLS and MinT (Wickramasuriya, Athanasopoulos, and Hyndman 2019).

There is actually no reason for $\boldsymbol{a}_t$ to be restricted to aggregates of $\boldsymbol{b}_t$. They can include any linear combination of the bottom-level series $\boldsymbol{b}_t$, so the corresponding $\boldsymbol{A}$ and $\boldsymbol{S}$ matrices may contain any real values, not just 0s and 1s. Nevertheless, we will use the same notation for this more general setting.

This representation is more efficient and was used by Di Fonzo and Girolimetto (2021). It was also discussed in Wickramasuriya, Athanasopoulos, and Hyndman (2019). Here it is in the new notation.

We can express the structural representation using the $n_a \times n$ constraint matrix $\boldsymbol{C} = [\boldsymbol{I}_{n_a} ~~ -\boldsymbol{A}]$, so that $\boldsymbol{C}\boldsymbol{y}_t = \boldsymbol{0}$. Then we can write the mapping matrix as
$$\boldsymbol{M} = \boldsymbol{I}_n - \boldsymbol{W}\boldsymbol{C}'(\boldsymbol{C}\boldsymbol{W}\boldsymbol{C}')^{-1}\boldsymbol{C}. \tag{2}$$
Note that Equation 2 involves inverting an $n_a \times n_a$ matrix, rather than the $n_b \times n_b$ matrix in Equation 1. For most practical problems, $n_a < n_b$, so Equation 2 is more efficient.

This form of the mapping matrix also allows us to interpret the reconciliation as an additive adjustment to the base forecasts. If the base forecasts are already reconciled, then $\boldsymbol{C}\hat{\boldsymbol{y}}_h = \boldsymbol{0}$ and so $\tilde{\boldsymbol{y}}_h = \hat{\boldsymbol{y}}_h$.
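The equivalence of the two forms of the mapping matrix is easy to verify numerically. Here is a small sketch (in Python/numpy rather than R, purely for illustration) using a toy hierarchy with one aggregate series and two bottom-level series, and an arbitrary positive definite weight matrix:

```python
import numpy as np

# Toy hierarchy: y = (total, b1, b2)', so A = [1 1], S = [A; I_2], C = [I_1  -A]
A = np.array([[1.0, 1.0]])
S = np.vstack([A, np.eye(2)])
C = np.hstack([np.eye(1), -A])

# An arbitrary symmetric positive definite W (n x n, with n = 3)
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 3))
W = X @ X.T + 3 * np.eye(3)
Winv = np.linalg.inv(W)

# Equation 1: M = S (S' W^{-1} S)^{-1} S' W^{-1}   (inverts an n_b x n_b matrix)
M1 = S @ np.linalg.inv(S.T @ Winv @ S) @ S.T @ Winv

# Equation 2: M = I - W C' (C W C')^{-1} C         (inverts an n_a x n_a matrix)
M2 = np.eye(3) - W @ C.T @ np.linalg.inv(C @ W @ C.T) @ C

print(np.allclose(M1, M2))  # the two mapping matrices agree
```

For any base forecasts `yhat`, the reconciled forecasts `M1 @ yhat` satisfy the constraint `C @ (M1 @ yhat) = 0` up to rounding error, i.e., they are coherent.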

The most general way to express the problem is not to denote individual series as bottom-level or aggregated, but to define the linear constraints
$$\boldsymbol{C}\boldsymbol{y}_t = \boldsymbol{0},$$
where $\boldsymbol{C}$ is an $n_a \times n$ matrix, not necessarily full rank, which may contain any real values.

If $\boldsymbol{C}$ is full rank, then Equation 2 holds with this more general constraint matrix.

Temporal reconciliation was proposed by Athanasopoulos et al. (2017). Here it is in our new notation.

For simplicity, we will assume the original (scalar) time series is observed with a single seasonality of period $m$ (e.g., $m = 12$ for monthly data), and that the total length $T$ of the series is an integer multiple of $m$. We will denote the original series by $y_t$, and the various temporally aggregated series by $y^{[k]}_j$.

Let $k_1 < k_2 < \dots < k_K$ denote the factors of $m$ in ascending order, where $k_1 = 1$ and $k_K = m$. For each factor $k$ of $m$, we can construct a temporally aggregated series
$$y^{[k]}_j = \sum_{t = (j-1)k+1}^{jk} y_t,$$
for $j = 1,\dots,T/k$. Of course, $y^{[1]}_t = y_t$.

Since the observation index $j$ varies with each aggregation level, we define $\tau = 1,\dots,T/m$ as the observation index of the most aggregated level (e.g., annual), so that $j = \tau$ at that level.

For each aggregation level, we stack the observations in the column vectors
$$\boldsymbol{y}^{[k]}_\tau = \left(y^{[k]}_{(\tau-1)M_k + 1}, \dots, y^{[k]}_{\tau M_k}\right)',$$
where $k$ is a factor of $m$, $M_k = m/k$ is the number of observations per year at aggregation level $k$, and $\tau = 1,\dots,T/m$. Collecting these in one column vector, we obtain
$$\boldsymbol{y}_\tau = \left(y^{[m]}_\tau, \boldsymbol{y}^{[k_{K-1}]\prime}_\tau, \dots, \boldsymbol{y}^{[k_2]\prime}_\tau, \boldsymbol{y}^{[1]\prime}_\tau\right)'.$$

The structural representation of this formulation is $\boldsymbol{y}_\tau = \boldsymbol{S}\boldsymbol{y}^{[1]}_\tau$, where $\boldsymbol{y}^{[1]}_\tau$ contains the $m$ highest-frequency observations in period $\tau$,
$$\boldsymbol{S} = \begin{bmatrix}\boldsymbol{A} \\ \boldsymbol{I}_m\end{bmatrix},$$
and $\boldsymbol{A}$ is the matrix of 0s and 1s that aggregates $\boldsymbol{y}^{[1]}_\tau$ to all of the higher levels of temporal aggregation.

The zero-constrained representation is $\boldsymbol{C}\boldsymbol{y}_\tau = \boldsymbol{0}$, with $\boldsymbol{C} = [\boldsymbol{I} ~~ -\boldsymbol{A}]$.

If there are multiple seasonalities that are not integer multiples of each other, the resulting additional temporal aggregations can simply be stacked in , and can be extended accordingly.
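As a concrete illustration, here is a sketch (in Python, with the stacking order assumed as described above) for quarterly data with $m = 4$: the factors are $k = 4, 2, 1$, so one “year” of the stacked vector contains $1 + 2 + 4 = 7$ elements.

```python
import numpy as np

# Temporal aggregation matrix A for m = 4: one annual row, two half-year rows
A = np.array([
    [1, 1, 1, 1],   # annual total
    [1, 1, 0, 0],   # first half-year
    [0, 0, 1, 1],   # second half-year
], dtype=float)

# Structural matrix: aggregation rows stacked over I_4
S = np.vstack([A, np.eye(4)])

quarters = np.array([10.0, 20.0, 30.0, 40.0])  # one year of quarterly data
stacked = S @ quarters                          # annual, half-years, quarters
```

Here `stacked` contains the annual total (100), the two half-year totals (30 and 70), and then the four quarterly observations themselves.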

Now consider the case where we have both cross-sectional and temporal aggregations, as discussed in Di Fonzo and Girolimetto (2021).

Suppose we have $\boldsymbol{y}_t$ observed at the most temporally disaggregated level, including all the cross-sectionally disaggregated and aggregated (or constrained) series. Let $y_{i,t}$ be the $i$th element of the vector $\boldsymbol{y}_t$, $i = 1,\dots,n$. For each $i$, we can expand $y_{i,t}$ to include all the temporally aggregated variants, giving a vector of length $\sum_{k} m/k$ for each period $\tau$ (summing over all factors $k$ of $m$):
$$\boldsymbol{y}_{i,\tau} = \left(y^{[m]}_{i,\tau}, \boldsymbol{y}^{[k_{K-1}]\prime}_{i,\tau}, \dots, \boldsymbol{y}^{[1]\prime}_{i,\tau}\right)'.$$
These can then be stacked into a long vector:
$$\boldsymbol{y}_\tau = \left(\boldsymbol{y}_{1,\tau}', \dots, \boldsymbol{y}_{n,\tau}'\right)'.$$

If $\boldsymbol{S}_{\text{cs}}$ denotes the structural matrix for the cross-sectional reconciliation, and $\boldsymbol{S}_{\text{te}}$ denotes the structural matrix for the temporal reconciliation, then the cross-temporal structural matrix is $\boldsymbol{S} = \boldsymbol{S}_{\text{cs}} \otimes \boldsymbol{S}_{\text{te}}$, so that $\boldsymbol{y}_\tau = \boldsymbol{S}\boldsymbol{b}_\tau$, where the bottom-level series $\boldsymbol{b}_\tau$ comprises the most temporally disaggregated observations of the cross-sectionally bottom-level series.
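The Kronecker structure can be checked numerically via the mixed-product property, $(\boldsymbol{I}_n \otimes \boldsymbol{S}_{\text{te}})(\boldsymbol{S}_{\text{cs}} \otimes \boldsymbol{I}_m) = \boldsymbol{S}_{\text{cs}} \otimes \boldsymbol{S}_{\text{te}}$: aggregating temporally within each series and then cross-sectionally gives the same matrix as the Kronecker product directly. A sketch in Python (the small example matrices are my own):

```python
import numpy as np

# Cross-sectional structure: total = b1 + b2, so S_cs is 3 x 2
S_cs = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)

# Temporal structure for m = 2 (two half-years aggregated to a year): 3 x 2
S_te = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)

n, m = S_cs.shape[0], S_te.shape[1]

# Temporal aggregation within each of the n series, then cross-sectional
# aggregation of the bottom-frequency observations:
lhs = np.kron(np.eye(n), S_te) @ np.kron(S_cs, np.eye(m))

print(np.allclose(lhs, np.kron(S_cs, S_te)))  # same cross-temporal matrix
```

The resulting cross-temporal matrix maps the $n_b \times m$ most disaggregated observations to the full stacked vector of all cross-sectional and temporal aggregates.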

Athanasopoulos, George, Rob J. Hyndman, Nikolaos Kourentzes, and Fotios Petropoulos. 2017. “Forecasting with temporal hierarchies.” *European Journal of Operational Research* 262 (1): 60–74. https://doi.org/10.1016/j.ejor.2017.02.046.

Di Fonzo, Tommaso, and Daniele Girolimetto. 2021. “Cross-temporal forecast reconciliation: Optimal combination method and heuristic alternatives.” *International Journal of Forecasting* forthcoming. https://doi.org/10.1016/j.ijforecast.2021.08.004.

Hyndman, Rob J., Roman A. Ahmed, George Athanasopoulos, and Han Lin Shang. 2011. “Optimal combination forecasts for hierarchical time series.” *Computational Statistics & Data Analysis* 55 (9): 2579–89. https://doi.org/10.1016/j.csda.2011.03.006.

Wickramasuriya, Shanika L., George Athanasopoulos, and Rob J. Hyndman. 2019. “Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization.” *Journal of the American Statistical Association* 114 (526): 804–19. https://doi.org/10.1080/01621459.2018.1448825.

WOMBAT is back! The WOMBAT conferences are “Workshops Organized by the Monash Business Analytics Team”. The first one was held in 2016, and later editions took place in 2017 and 2019. The 2022 version will take place on 6-7 December.

The focus this year is on communicating with data. As with all WOMBAT events, the purpose is to bring together analysts from academia, industry and government to learn and discuss new open source tools for business analytics and data science.

The first day will be virtual with 8 tutorials to choose from. I will be giving one on “Exploratory time series analysis using R”. Each tutorial has limited places, so register early!

The second day will be an in-person workshop, limited to 60 people. The keynote speaker is Amanda Cox, Head of Special Data Projects at USAFacts. She is well-known for the sixteen years she spent at *The New York Times* producing some amazing data visualizations. She will speak on “Charts and Words: Being more influential with your data graphics”.

Other invited speakers will talk about data communication in environment, health and sport.

The workshop on December 7 will be held at the Royal South Yarra Lawn Tennis Club, located near the Yarra River, at 310 Williams Rd N, Toorak.

The **6 Dec online tutorials** are each limited to 20 participants. **Register for tutorials only here**. Registering for the 7 Dec workshop provides a 30% discount on tutorial registration. Your discount code will be sent in the confirmation email after you have first registered for 7 Dec.

The **7 Dec in-person event** is limited to 60 attendees. **Register here**. Registration includes lunch, morning and afternoon tea.

For more details, see the event website.

I’ve long wanted to ditch Disqus as the commenting system on this blog, as it is bloated, adds a lot of extra and unnecessary links, and generally looks noisy.

I’ve been using Disqus for more than 13 years, largely because it was the only available solution at the time I added comments. To make the Disqus interface a little cleaner, I disabled all the advertising and as much of the other noise as possible, but it still looked like something from MySpace (for those of you who remember the 20th century).

But now there are several alternatives, and I’ve opted for giscus, which is very lightweight, is built on Github Discussions, and is open source with no tracking or advertising. The other system I considered was utterances, which is also hosted on Github but uses issues rather than discussions. Consequently, comments on utterances can’t be threaded (with replies to previous comments). Also, giscus appears to have a much more active development team behind it.

The first step was to set up giscus on my blog. With Quarto, this simply requires adding a few lines to the `_metadata.yml` file in the relevant folder. Here is what it looks like for me:

```
comments:
  giscus:
    repo: robjhyndman/robjhyndman.com
    repo-id: "R_kgDOH5G3Uw"
    category: "Announcements"
    category-id: "DIC_kwDOH5G3U84CRUp9"
    mapping: "pathname"
    reactions-enabled: true
    loading: lazy
    input-position: "bottom"
    theme: "light"
```

Then I needed to set up giscus on the Github repo that hosts the website (`robjhyndman/robjhyndman.com`). The instructions on the giscus website make it very simple.

The last step was the hardest – how to migrate 4000 comments from Disqus to giscus. Here I followed the nice blog post of Maëlle Salmon to download the Disqus comments as an xml file, and wrangle them into a tibble. Then I needed to use the GraphQL API for Github Discussions to generate all the comments on the Github repo. Fortunately, Mitch O’Hara-Wild came to my rescue (as usual), and helped with some of this code. The resulting code is here if anyone wants to try to do the same. You will need to change some specific details in lines 9-13. Everything else should work as it is.

In most recent years, I’ve run a 2–3 day workshop in various locations around the world. This year’s workshop will be in Canberra on 9–10 November, and will be taught jointly with Associate Professor Bahman Rostami-Tabar. Details are here.

On day 1, we will look at the `tsibble` data structure for flexibly managing collections of related time series. We will look at how to do data wrangling, data visualizations and exploratory data analysis. We will explore feature-based methods to analyse time series data in high dimensions. A similar feature-based approach can be used to identify anomalous time series within a collection of time series, or to cluster or classify time series. Primary packages for day 1 will be `tsibble`, `lubridate` and `feasts` (along with the tidyverse of course).

Day 2 will be about forecasting. We will look at some classical time series models and how they are automated in the `fable` package. We will look at creating ensemble forecasts and hybrid forecasts, as well as some new forecasting methods that have performed well in large-scale forecasting competitions. Finally, we will look at forecast reconciliation, allowing millions of time series to be forecast in a relatively short time while accounting for constraints on how the series are related.

Places are limited, so sign up early if you’re interested.

This website is now managed using Quarto. It ran on blogdown for the last six years, and various other platforms before that. But Quarto has a few advantages, and I wanted to learn how to use it, so here we are.

For the blogdown site, I had to (painfully) hack my own Hugo theme to make it look the way I wanted. This one is pretty much straight out of the Quarto box, other than some CSS styling and some tweaking of Quarto templates. In case anyone wants to create something similar for themselves, I’ve set up a template version with just the bare minimum, so you don’t need to wade through the extra folders I’ve kept to ensure existing links continue to work.

Actually, setting up a website in Quarto is extremely easy when following the online instructions. The hard part for me was the migration. There are about 800 pages that make up this site, and about 4000 comments on my blog. I didn’t want to break any existing links, so retaining the same structure was important.

I also decided to convert the commenting system from Disqus to giscus, which is built on Github Discussions. I’ll describe that conversion in a separate post in case anyone else wants to do something similar.

There are almost certainly things that are still broken, so please let me know in the comments below if you find anything that doesn’t work as it should.

The cricketdata package has been around for a few years on github, and it has been on CRAN since February 2022. There are only four functions:

- `fetch_cricinfo()`: Fetch team data on international cricket matches provided by ESPNCricinfo.
- `fetch_player_data()`: Fetch individual player data on international cricket matches provided by ESPNCricinfo.
- `find_player_id()`: Search for the player ID on ESPNCricinfo.
- `fetch_cricsheet()`: Fetch ball-by-ball, match and player data from Cricsheet.

Jacquie Tran wrote the first version of the `fetch_cricsheet()` function, and the vignette which demonstrates it.

Here are some examples demonstrating the Cricinfo functions.

```
library(cricketdata)
library(tidyverse)
```

The `fetch_cricinfo()` function downloads data for international T20, ODI or Test matches, for men or women, and for batting, bowling or fielding. By default, it downloads career-level statistics for individual players. Here is an example for women’s T20 bowlers.

```
# Fetch all Women's T20 data
wt20 <- fetch_cricinfo("T20", "Women", "Bowling")
```

```
wt20 %>%
  select(Player, Country, Matches, Runs, Wickets, Economy, StrikeRate)
#> # A tibble: 1,798 × 7
#> Player Country Matches Runs Wickets Economy StrikeRate
#> <chr> <chr> <int> <int> <int> <dbl> <dbl>
#> 1 A Mohammed West Indies 117 2206 125 5.58 19.0
#> 2 S Ismail South Africa 105 2153 115 5.81 19.3
#> 3 EA Perry Australia 126 2237 115 5.87 19.9
#> 4 KH Brunt England 104 2019 108 5.50 20.4
#> 5 M Schutt Australia 84 1685 108 6.05 15.5
#> 6 Nida Dar Pakistan 114 1951 106 5.35 20.6
#> 7 SFM Devine New Zealand 107 1822 104 6.36 16.5
#> 8 A Shrubsole England 79 1587 102 5.96 15.7
#> 9 Poonam Yadav India 72 1495 98 5.75 15.9
#> 10 SR Taylor West Indies 111 1639 98 5.66 17.7
#> # … with 1,788 more rows
```

We can plot a bowler’s strike rate (balls per wicket) against their average (runs per wicket). Each observation represents one player who has taken at least 50 international wickets.

```
wt20 %>%
  filter(Wickets >= 50) %>%
  ggplot(aes(y = StrikeRate, x = Average)) +
  geom_point(alpha = 0.3, col = "blue") +
  ggtitle("Women International T20 Bowlers") +
  ylab("Balls per wicket") + xlab("Runs per wicket")
```

The extraordinary result on the bottom left is due to the Thai all-rounder, Nattaya Boochatham, who has taken 59 wickets, with a strike rate of 13.475, an average of 8.78, and an economy rate of 3.909.

The next example shows Australian men’s ODI batting results by innings.

```
# Fetch all Australian Men's ODI data by innings
menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia")
```

```
menODI %>%
  select(Date, Player, Runs, StrikeRate, NotOut)
#> # A tibble: 10,675 × 5
#> Date Player Runs StrikeRate NotOut
#> <date> <chr> <int> <dbl> <lgl>
#> 1 2011-04-11 SR Watson 185 193. TRUE
#> 2 2007-02-20 ML Hayden 181 109. TRUE
#> 3 2017-01-26 DA Warner 179 140. FALSE
#> 4 2015-03-04 DA Warner 178 134. FALSE
#> 5 2001-02-09 ME Waugh 173 117. FALSE
#> 6 2016-10-12 DA Warner 173 127. FALSE
#> 7 2004-01-16 AC Gilchrist 172 137. FALSE
#> 8 2019-06-20 DA Warner 166 113. FALSE
#> 9 2006-03-12 RT Ponting 164 156. FALSE
#> 10 2016-12-04 SPD Smith 164 104. FALSE
#> # … with 10,665 more rows
```

```
menODI %>%
  ggplot(aes(y = Runs, x = Date)) +
  geom_point(alpha = 0.2, col = "#D55E00") +
  geom_smooth() +
  ggtitle("Australia Men ODI: Runs per Innings")
```

The average number of runs per innings slowly increased until about 2000, after which it has remained largely constant at about 35.1. This is a little higher than the smooth line shown on the plot, which has not taken account of not-out results.
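The distinction matters because the conventional batting average divides total runs by the number of dismissals, not the number of innings, so not-out innings push the average above a simple mean of the scores. A small illustration (in Python, with made-up scores, purely to show the arithmetic):

```python
# Hypothetical innings: (runs, not_out). A smoother fitted to Runs averages
# over all innings; the batting average divides by dismissals only.
innings = [(50, False), (30, True), (0, False), (120, False), (40, True)]

runs = sum(r for r, _ in innings)
dismissals = sum(1 for _, not_out in innings if not not_out)

per_innings = runs / len(innings)    # what a mean/smoother of Runs shows
batting_average = runs / dismissals  # the conventional batting average

print(per_innings, batting_average)  # 48.0 80.0
```

With two of the five innings not out, the batting average (80) is well above the mean runs per innings (48); the same effect, in milder form, is why the batting average of 35.1 sits above the smooth line in the plot.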

Next, we demonstrate some of the fielding data available, using Test match fielding from Indian men’s players.

`Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India")`

```
Indfielding
#> # A tibble: 303 × 11
#> Player Start End Matches Innings Dismis…¹ Caught Caugh…² Caugh…³
#> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 MS Dhoni 2005 2014 90 166 294 256 0 256
#> 2 R Dravid 1996 2012 163 299 209 209 209 0
#> 3 SMH Kirmani 1976 1986 88 151 198 160 0 160
#> 4 VVS Laxman 1996 2012 134 248 135 135 135 0
#> 5 KS More 1986 1993 49 90 130 110 0 110
#> 6 RR Pant 2018 2022 31 61 122 111 0 111
#> 7 SR Tendulkar 1989 2013 200 366 115 115 115 0
#> 8 SM Gavaskar 1971 1987 125 216 108 108 108 0
#> 9 NR Mongia 1994 2001 44 77 107 99 0 99
#> 10 M Azharuddin 1984 2000 99 177 105 105 105 0
#> # … with 293 more rows, 2 more variables: Stumped <int>,
#> # MaxDismissalsInnings <dbl>, and abbreviated variable names
#> # ¹Dismissals, ²CaughtFielder, ³CaughtBehind
```

We can plot the number of dismissals against the number of matches for all Indian male Test players. Because wicket keepers typically have a lot more dismissals than other players, they are shown in a different colour.

```
Indfielding %>%
  mutate(wktkeeper = (CaughtBehind > 0) | (Stumped > 0)) %>%
  ggplot(aes(x = Matches, y = Dismissals, col = wktkeeper)) +
  geom_point() +
  ggtitle("Indian Men Test Fielding")
```

The high number of dismissals, close to 300, is of course due to MS Dhoni. Another interesting one here is the non-wicketkeeper with over 200 dismissals: Rahul Dravid, who took 209 catches during his career.

Finally, let’s look at individual player data. The `fetch_player_data()` function requires the Cricinfo player ID, which you can either look up on their website, or find using the `find_player_id()` function. We will look at the ODI results of Australia’s captain, Meg Lanning.

```
meg_lanning_id <- find_player_id("Lanning")$ID
MegLanning <- fetch_player_data(meg_lanning_id, "ODI") %>%
  mutate(NotOut = (Dismissal == "not out"))
```

```
MegLanning
#> # A tibble: 100 × 14
#> Date Innings Opposition Ground Runs Mins BF X4s X6s SR
#> <date> <int> <chr> <chr> <dbl> <dbl> <int> <int> <int> <dbl>
#> 1 2011-01-05 1 ENG Women Perth 20 60 38 2 0 52.6
#> 2 2011-01-07 2 ENG Women Perth 104 148 118 8 1 88.1
#> 3 2011-06-14 2 NZ Women Brisb… 11 15 14 2 0 78.6
#> 4 2011-06-16 1 NZ Women Brisb… 5 8 8 1 0 62.5
#> 5 2011-06-30 1 NZ Women Chest… 17 24 20 3 0 85
#> 6 2011-07-02 2 India Wom… Chest… 23 40 32 3 0 71.9
#> 7 2011-07-05 2 ENG Women Lord's 43 40 33 9 0 130.
#> 8 2011-07-07 2 ENG Women Worms… 0 2 3 0 0 0
#> 9 2012-03-12 1 India Wom… Ahmed… 45 61 44 7 0 102.
#> 10 2012-03-14 1 India Wom… Wankh… 128 125 104 19 1 123.
#> # … with 90 more rows, and 4 more variables: Pos <int>, Dismissal <chr>,
#> # Inns <int>, NotOut <lgl>
```

We can plot her scores, with runs per innings on the vertical axis and time on the horizontal axis.

```
# Compute batting average
MLave <- MegLanning %>%
  filter(!is.na(Runs)) %>%
  summarise(Average = sum(Runs) / (n() - sum(NotOut))) %>%
  pull(Average)
names(MLave) <- paste("Average =", round(MLave, 2))
# Plot ODI scores
ggplot(MegLanning) +
  geom_hline(aes(yintercept = MLave), col = "gray") +
  geom_point(aes(x = Date, y = Runs, col = NotOut)) +
  ggtitle("Meg Lanning ODI Scores") +
  scale_y_continuous(sec.axis = sec_axis(~., breaks = MLave))
```

She has shown amazing consistency over her career, with centuries scored in every year of her career except for 2021, when her highest score from 6 matches was 53.

Some of these data sets have been made available in R packages previously, based on `ts` objects, which work well enough for annual, quarterly and monthly data, but are not a good format for daily and sub-daily data.

The `tsibbledata` package provides the function `monash_forecasting_repository()` to download the data and return it as a `tsibble` object. These can be analysed and plotted using the `feasts` package, and modelled and forecast using the `fable` package. It is convenient to simply load the `fpp3` package, which will then load all the necessary packages.

`library(fpp3)`

`── Attaching packages ────────────────────────────────── fpp3 0.4.0.9000 ──`

```
✔ tibble 3.1.8 ✔ tsibble 1.1.2
✔ dplyr 1.0.10 ✔ tsibbledata 0.4.1.9000
✔ tidyr 1.2.1 ✔ feasts 0.3.0.9000
✔ lubridate 1.8.0 ✔ fable 0.3.2.9000
✔ ggplot2 3.3.6 ✔ fabletools 0.3.2.9000
```

```
── Conflicts ──────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date() masks base::date()
✖ dplyr::filter() masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval() masks lubridate::interval()
✖ dplyr::lag() masks stats::lag()
✖ tsibble::setdiff() masks base::setdiff()
✖ tsibble::union() masks base::union()
```

To download the M3 data, we need to know the unique zenodo identifiers for each data set. From the forecastingdata.org page, find the M3 links (there are four, one for each observational frequency). For example, the Yearly link takes you to https://zenodo.org/record/4656222, so the Zenodo identifier for this data set is 4656222. Similarly, the Quarterly, Monthly and Other links have identifiers 4656262, 4656298 and 4656335 respectively.

```
m3_yearly <- monash_forecasting_repository(4656222)
m3_quarterly <- monash_forecasting_repository(4656262)
m3_monthly <- monash_forecasting_repository(4656298)
m3_other <- monash_forecasting_repository(4656335)
```

The first three data sets are stored with a date index, so they are read as daily data. Therefore we first need to convert them to yearly, quarterly and monthly data.

```
m3_yearly <- m3_yearly %>%
  mutate(year = year(start_timestamp)) %>%
  as_tsibble(index = year) %>%
  select(-start_timestamp)
m3_quarterly <- m3_quarterly %>%
  mutate(quarter = yearquarter(start_timestamp)) %>%
  as_tsibble(index = quarter) %>%
  select(-start_timestamp)
m3_monthly <- m3_monthly %>%
  mutate(month = yearmonth(start_timestamp)) %>%
  as_tsibble(index = month) %>%
  select(-start_timestamp)
```

The resulting monthly data set is shown below.

`m3_monthly`

```
# A tsibble: 167,562 x 3 [1M]
# Key: series_name [1,428]
series_name value month
<chr> <dbl> <mth>
1 T1 2640 1990 Jan
2 T1 2640 1990 Feb
3 T1 2160 1990 Mar
4 T1 4200 1990 Apr
5 T1 3360 1990 May
6 T1 2400 1990 Jun
7 T1 3600 1990 Jul
8 T1 1920 1990 Aug
9 T1 4200 1990 Sep
10 T1 4560 1990 Oct
# … with 167,552 more rows
```

The series names are `T1`, `T2`, …. The M3 data included both training and test data; these have been combined in this data set.

This data set contains total half-hourly electricity demand by state from 1 January 2002 to 1 April 2015, for five states of Australia: New South Wales, Queensland, South Australia, Tasmania, and Victoria. A subset of this data (one state and only three years) is provided as `tsibbledata::vic_elec`.

```
aus_elec <- monash_forecasting_repository(4659727)
aus_elec
```

```
# A tsibble: 1,155,264 x 4 [30m] <UTC>
# Key: series_name, state [5]
series_name state start_timestamp value
<chr> <chr> <dttm> <dbl>
1 T1 NSW 2002-01-01 00:00:00 5714.
2 T1 NSW 2002-01-01 00:30:00 5360.
3 T1 NSW 2002-01-01 01:00:00 5015.
4 T1 NSW 2002-01-01 01:30:00 4603.
5 T1 NSW 2002-01-01 02:00:00 4285.
6 T1 NSW 2002-01-01 02:30:00 4075.
7 T1 NSW 2002-01-01 03:00:00 3943.
8 T1 NSW 2002-01-01 03:30:00 3884.
9 T1 NSW 2002-01-01 04:00:00 3878.
10 T1 NSW 2002-01-01 04:30:00 3838.
# … with 1,155,254 more rows
```

```
aus_elec %>%
  filter(state == "VIC") %>%
  autoplot(value) +
  labs(x = "Time", y = "Electricity demand (MWh)")
```

We also provide some accuracy measures of the performance of 13 baseline forecasting methods applied to the data sets in the repository. This makes it easy for anyone proposing a new method to compare against some standard existing methods, without having to do all the calculations themselves.

The data can be loaded as a Pandas dataframe by following this example in the github repository. Download the `.tsf` files as required from Zenodo and put them into the `tsf_data` folder.