We need more open data in Australia


13 December 2022

I made the following comments in an ISI webinar today on Statistical and Data Science Issues in Australasia.

Australia has a problem with government data. Actually, it has three problems with government data:

1. It is often kept secret.
2. If it is available, it is often out-of-date.
3. If it is available and timely, it is often in a form that makes any analysis difficult.

I think it would be better for the country if government data was available freely, immediately, and in a form that is useful for analysis. Of course, we should make an exception if there are privacy issues, or some other harm that would be caused by releasing it. Let me explain using some examples.

Take mortality data. During the pandemic, it has been important to know how many people died from any cause, so that we could measure the overall effect of the pandemic. Obviously some people were dying of COVID-19, but others might have been dying because they were unable to get treatment when medical staff were overwhelmed by COVID patients. On the other hand, lockdowns may have reduced deaths due to road crashes, but perhaps they also affected deaths due to suicide. If we could compare the total deaths each week during the pandemic with the corresponding totals in previous years, we could determine the overall effect of the pandemic on Australian mortality.
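
The weekly comparison described above is usually called "excess mortality". A minimal sketch, assuming weekly death counts are available as plain lists and taking the expected count for each week to be the average of the same week in previous years. The function names and the toy numbers are hypothetical, and published analyses typically use more careful baselines that allow for trends and population growth.

```python
from statistics import mean


def expected_weekly_deaths(past_years: list[list[int]]) -> list[float]:
    """Baseline for each week: the mean count for that week across previous years."""
    n_weeks = len(past_years[0])
    if any(len(year) != n_weeks for year in past_years):
        raise ValueError("every baseline year must have the same number of weeks")
    return [mean(year[week] for year in past_years) for week in range(n_weeks)]


def excess_deaths(current: list[int], past_years: list[list[int]]) -> list[float]:
    """Observed minus expected deaths for each week of the current year."""
    baseline = expected_weekly_deaths(past_years)
    if len(current) != len(baseline):
        raise ValueError("current year must cover the same weeks as the baseline")
    return [observed - expected for observed, expected in zip(current, baseline)]


# Toy four-week example (made-up numbers, not real Australian data).
past = [[100, 110, 105, 95], [104, 106, 109, 99]]
this_year = [110, 108, 100, 97]
print(excess_deaths(this_year, past))  # [8, 0, -7, 0]
```

A positive value means more deaths than usual in that week and a negative value means fewer, so summing over the pandemic weeks gives a rough overall effect that nets out, for example, fewer road deaths against more deaths from delayed treatment. None of this is possible without timely weekly counts.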

You would think that, during a global pandemic, good mortality data would be a priority. But in June 2020, nearly six months after the start of COVID-19, the most recent available mortality data in Australia was from 2018. Eighteen months out of date! Think about that. For the first six months of the biggest public health event in 100 years, we had no official data on the effect of COVID-19 on Australian mortality. Eventually the Australian Bureau of Statistics (ABS) got their act together and started producing provisional mortality data more frequently, but only after several of us complained loudly and publicly. Even now, the provisional mortality data available from the ABS is more than 3 months out of date. Contrast that with other countries: of the 38 countries for which I could find mortality data, Australia was the 5th worst at producing timely mortality data.

Another example concerns COVID-19 case numbers. There is still no reliable Australian government repository of daily COVID-19 cases by state. Some states are now producing historical data, but for most of 2020, when we really needed reliable information, the public information was incomplete. For much of the first two years of the pandemic, the state health departments were putting out their little dashboard images containing the numbers, but these were preliminary figures that did not include late-registered cases or other data revisions. To do any serious analysis, you needed daily case numbers from the beginning of the pandemic, but these were not available on government websites until relatively recently. Some media organizations, and some individuals, were collating the case numbers from the dashboard images and putting them online as spreadsheets, and people were using them for analysis, but these data were usually inaccurate and subject to revision. The state health departments generally didn’t update the initial numbers they had released, even though they had more reliable information. So the public data was inaccurate, and most people wanting to do any data analysis had to rely on media outlets, or on a few 14-year-old boys running covidlive.com.au, to get even that.

For nearly three years, I have been part of the forecasting team appointed to provide advice to all of the Chief Health Officers of the states and territories of Australia. Every week, we produce forecasts of COVID daily case numbers for all states and territories. For that purpose, we were able to put together a relatively good data set of case numbers for all states, but we were explicitly forbidden to make the data publicly available, even though our data was more accurate than what was appearing in the media.

Similarly, our forecasts were kept secret even though they were being used to make policy decisions. Premiers would justify their policies by vaguely referring to “the modelling”, or occasionally “the Doherty modelling” (even though most of us are not at the Doherty Institute), but we would have preferred our forecasts to be publicly available. So the good data and the forecasts are kept secret, and what is available is of poorer quality, or out-of-date.

Why? There are no privacy issues here. No harm would be done by working more transparently. On the contrary, if everyone had had access to the best available data, the independent modelling being done would have been of higher quality.

We use a forecasting ensemble, where we have several forecasting models, and we combine them to produce the final forecasts that are submitted to the various state governments each week. Because we can’t share the data, the only forecasts that are included are those from members of our team. Generally in forecasting, it is better to use a wide range of models rather than rely on a select few. But we can’t do that in Australia because of the government’s obsession with secrecy.
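
To make the idea of an ensemble concrete, here is a minimal sketch of the simplest possible combination: an equal-weight average of point forecasts, taken day by day. This is an illustrative assumption, not the method our team actually uses. Real ensembles often combine full forecast distributions (for example, quantiles) rather than single numbers, and the model names and values below are hypothetical.

```python
def combine_forecasts(forecasts: dict[str, list[float]]) -> list[float]:
    """Equal-weight average of point forecasts from several models, horizon by horizon."""
    if not forecasts:
        raise ValueError("need at least one model")
    horizons = {len(f) for f in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all models must forecast the same number of days")
    n_models = len(forecasts)
    # zip(*...) groups the forecasts for day 1, day 2, ... across all models.
    return [sum(day) / n_models for day in zip(*forecasts.values())]


# Hypothetical daily case forecasts for the next three days from three models.
models = {
    "model_a": [100, 120, 140],
    "model_b": [90, 100, 110],
    "model_c": [110, 110, 110],
}
print(combine_forecasts(models))  # [100.0, 110.0, 120.0]
```

Adding another contributor's model is just one more entry in the dictionary, which is why open data matters here: the more independent models that can be built from the same data, the more the combination can average out the errors of any single model.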

Compare that with the United States, where an official data repository was set up early in the pandemic. Anyone could download the data, produce forecasts, and submit them to the Centers for Disease Control and Prevention (CDC) for inclusion in its analysis. As a result, the US forecasting ensemble used for policy decisions was based on a much larger range of models, and anyone could contribute to it. The resulting forecasts are published, so anyone can see what is being forecast, and what information the government has available when making policy decisions.

I’ve focused on COVID, but similar problems arise in many other areas in Australia. We have a culture of secrecy around data that damages our public discourse: it leads to worse analysis, it means less transparency in government, and it feeds distrust of government because it is not clear why decisions are being made. Making more data publicly available leads to a better society.