All Hyndsight posts by date
Some great data visualizations of COVID-19 case and death data have been produced, the best known of which is the graph from John Burn-Murdoch in the Financial Times. To my knowledge, it was first used by Matt Cowgill from the Grattan Institute, and has been widely copied. This is a great visualization and has helped introduce log-scale graphics to a wide audience.
Reproducing the Financial Times cumulative confirmed cases graph
To produce something like it, we can use the tidycovid19 package from Joachim Gassen:
What makes forecasting hard?
Forecasting pandemics is harder than many people think. In my book with George Athanasopoulos, we discuss the contributing factors that make forecasts relatively accurate. We identify three major factors:
1. how well we understand the factors that contribute to it;
2. how much data is available;
3. whether the forecasts can affect the thing we are trying to forecast.
For example, tomorrow’s weather can be forecast relatively accurately using modern tools because we have good models of the physical atmosphere, there is tons of data, and our weather forecasts cannot possibly affect what actually happens.
The tsibbledata package contains the vic_elec data set, containing half-hourly electricity demand for the state of Victoria, along with corresponding temperatures from the capital city, Melbourne. These data cover the period 2012-2014.
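As a minimal sketch of working with these data, the half-hourly observations can be aggregated to daily values. It assumes the tsibbledata, tsibble, dplyr and lubridate packages are installed and that vic_elec has columns named Time, Demand and Temperature; the name vic_elec_daily is introduced here purely for illustration.

```r
library(dplyr)
library(lubridate)
library(tsibble)
library(tsibbledata)

# Re-index the half-hourly series by calendar date, then summarise each day:
# total demand and maximum temperature.
vic_elec_daily <- vic_elec %>%
  index_by(Date = as_date(Time)) %>%
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )

vic_elec_daily
```

The result is still a tsibble, now with one row per day, so it can be passed straight to the plotting and modelling functions in the other tidyverts packages.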
Other similar data sets are also available, and these may be of interest to researchers in the area.
For people new to tsibbles, please read my introductory post.
Australian state-level demand
The raw data for other states are also stored in the tsibbledata GitHub repository (under the data-raw folder), but these are not included in the package to satisfy CRAN space constraints.
library(tidyverse)
library(tsibble)
library(readabs)

Australian data analysts will know how frustrating it is to work with time series data from the Australian Bureau of Statistics. They are stored as multiple ugly Excel files (each containing multiple sheets) with inconsistent formatting, embedded comments, metadata stored along with the actual data, dates stored in a painful Excel format, and so on.
Fortunately there is an R package available to make this a little easier.
library(tidyverse)
library(tsibble)
library(lubridate)
library(feasts)
library(fable)

In my previous post about the new fable package, we saw how fable can produce forecast distributions, not just point forecasts. All my examples used Gaussian (normal) distributions, so in this post I want to show how non-Gaussian forecasting can be done.
As an example, we will use eating-out expenditure in my home state of Victoria.
vic_cafe <- tsibbledata::aus_retail %>%
  filter(
    State == "Victoria",
    Industry == "Cafes, restaurants and catering services"
  ) %>%
  select(Month, Turnover)
vic_cafe %>%
  autoplot(Turnover) +
  ggtitle("Monthly turnover of Victorian cafes")

Forecasting with transformations
Clearly the variance is increasing with the level of the series, so we will consider modelling a Box-Cox transformation of the data.
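One way such a model might be set up is sketched below. This is not necessarily the approach taken in the full post: choosing the Box-Cox parameter with the guerrero feature from feasts and using an ETS model are both assumptions made here for illustration, as are the names lambda, fit and fc.

```r
library(dplyr)
library(tsibble)
library(feasts)
library(fable)

vic_cafe <- tsibbledata::aus_retail %>%
  filter(
    State == "Victoria",
    Industry == "Cafes, restaurants and catering services"
  ) %>%
  select(Month, Turnover)

# Assumption: pick the Box-Cox parameter with Guerrero's method.
lambda <- vic_cafe %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Fit an ETS model to the transformed series. fable back-transforms the
# forecasts to the original scale automatically.
fit <- vic_cafe %>%
  model(ets = ETS(box_cox(Turnover, lambda)))

fc <- fit %>% forecast(h = "3 years")
fc %>% autoplot(vic_cafe)
```

Because the model is fitted on the transformed scale, the back-transformed forecast distribution is no longer Gaussian, which is what makes this a useful starting point for non-Gaussian forecasting.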
The fable package for doing tidy forecasting in R is now on CRAN. Like tsibble and feasts, it is also part of the tidyverts family of packages for analysing, modelling and forecasting many related time series (stored as tsibbles).
For a brief introduction to tsibbles, see this post from last month.
Here we will forecast Australian tourism data by state/region and purpose. This data is stored in the tourism tsibble where Trips contains domestic visitor nights in thousands.
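A minimal sketch of this kind of forecast, restricted to a single region to keep it quick, might look as follows. The choice of Melbourne, the ETS model and the two-year horizon are assumptions made for illustration; the tourism tsibble is the one shipped with the tsibble package.

```r
library(dplyr)
library(tsibble)
library(fable)

# One region, four purposes of travel: four quarterly series.
tourism_melb <- tourism %>%
  filter(Region == "Melbourne")

# model() fits one ETS model per series (per key combination).
fit <- tourism_melb %>%
  model(ets = ETS(Trips))

# Forecast eight quarters ahead for every series at once.
fc <- fit %>% forecast(h = "2 years")
fc
```

Dropping the filter() step would fit a model to every combination of region, state and purpose in the same way.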
In my last post, I showed how the feasts package can be used to produce various time series graphics.
The feasts package also includes functions for computing FEatures And Statistics from Time Series (hence the name). In this post I will give three examples of how these might be used.
library(tidyverse)
library(tsibble)
library(feasts)

Exploring Australian tourism data
I used this example in my talk at useR!2019 in Toulouse, and it is also the basis of a vignette in the package, and a recent blog post by Mitchell O’Hara-Wild.
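As a rough sketch of what computing features looks like, the following calculates STL-based features for every series in the tourism tsibble and lists the most seasonal ones. The choice of feat_stl and the name tourism_features are assumptions made here, not necessarily the features used in the full post.

```r
library(dplyr)
library(tsibble)
library(feasts)

# One row of features per series (Region, State, Purpose combination).
tourism_features <- tourism %>%
  features(Trips, feat_stl)

# Series with the strongest yearly seasonality.
tourism_features %>%
  arrange(desc(seasonal_strength_year)) %>%
  select(Region, State, Purpose, trend_strength, seasonal_strength_year) %>%
  head(5)
```

The output is an ordinary tibble, so the usual dplyr and ggplot2 tools can be used to explore it.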
This is the second post on the new tidyverts packages for tidy time series analysis. The previous post is here.
For users migrating from the forecast package, it might be useful to see how to get similar graphics to those they are used to. The forecast package is built for ts objects, while the feasts package provides features, statistics and graphics for tsibbles. (See my first post for a description of tsibbles.)
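The sketch below shows roughly how some common forecast graphics map onto feasts. The USAccDeaths series from base R is used only as a convenient example, and the function names are those exported by feasts around the time of this post; later releases may move or rename them.

```r
library(tsibble)
library(feasts)

# Convert a ts object to a tsibble; the columns are named index and value.
deaths <- as_tsibble(USAccDeaths)

# forecast::autoplot(USAccDeaths)
p_time <- deaths %>% autoplot(value)

# forecast::ggseasonplot(USAccDeaths)
p_season <- deaths %>% gg_season(value)

# forecast::ggsubseriesplot(USAccDeaths)
p_subseries <- deaths %>% gg_subseries(value)

# forecast::ggAcf(USAccDeaths)
p_acf <- deaths %>% ACF(value) %>% autoplot()
```

Each object is a ggplot, so it can be printed or customised with the usual ggplot2 functions.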
There is a new suite of packages for tidy time series analysis that integrates easily into the tidyverse way of working. We call these the tidyverts packages, and they are available at tidyverts.org. Much of the work on these packages has been done by Earo Wang and Mitchell O’Hara-Wild.
The first of the packages to make it to CRAN was tsibble, providing the data infrastructure for tidy temporal data with wrangling tools.
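To give a flavour of that data infrastructure, here is a minimal sketch of building a tsibble by hand. The data are made up for illustration, and the names sales, Store and Sales are hypothetical.

```r
library(tsibble)

# A tsibble needs an index (the time variable) and, when it holds more
# than one series, a key that identifies each series.
sales <- tsibble(
  Month = yearmonth(rep(c("2019 Jan", "2019 Feb", "2019 Mar"), times = 2)),
  Store = rep(c("A", "B"), each = 3),
  Sales = c(10, 12, 9, 20, 18, 25),
  key = Store,
  index = Month
)

sales
```

Each combination of key and index must identify a single row; tsibble() raises an error if there are duplicates.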
One of the few people in Australia who did not write off a possible Coalition win at the recent federal election was Peter Ellis. We’ve invited him to come and give a talk about making sense of opinion polls and the Australian federal election on Friday this week at Monash University. Visitors are welcome. Here are the details.
11am, 31 May 2019. Room G03, Learning and Teaching Building, 19 Ancora Imparo Way, Clayton Campus, Monash University