New Australian data on the HMD

The Human Mortality Database is a wonderful resource for anyone interested in demographic data. It is a carefully curated collection of high-quality deaths and population data from 37 countries, all in a consistent format with consistent definitions. I have used it many times and never cease to be amazed at the care taken to maintain such a great resource.

The data are continually being revised and updated. Today the Australian data have been updated to 2011. There is a time lag because delayed death registrations lead to undercounts, so only data that are likely to be complete are included.

Tim Riffe from the HMD has provided the following information about the update:

  1. All death counts since 1964 are now included by year of occurrence, up to 2011. We have 2012 data but do not publish them because they are likely a 5% undercount due to lagged registration.
  2. Death count inputs for 1921 to 1963 are now in single ages. Previously they were in 5-year age groups. Rather than having an open age group of 85+ in this period, counts usually go up to the maximum observed (stated) age. This change (i) introduces minor heaping in early years and (ii) implies different apparent old-age mortality than before, since previously anything above 85 was modeled according to the Methods Protocol.
  3. Population denominators have been swapped out for years 1992 to the present, owing to new ABS methodology and intercensal estimates for the recent period.

Some of the data can be read into R using the hmd.mx and hmd.e0 functions from the demography package. Tim has his own package on GitHub that provides a more extensive interface.

Visualization of probabilistic forecasts

This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well worth reading.

It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval.
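
To make the idea of showing two intervals at once concrete, here is a minimal sketch in Python (it is not the forecast package, and not how that package computes intervals for other models). It assumes a naive forecast with normally distributed errors, so the interval at horizon h is the last observation plus or minus z × sigma × sqrt(h). The function name `naive_intervals` and the toy series are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def naive_intervals(series, horizon, levels=(80, 95)):
    """Naive (random walk) forecasts with one prediction interval per level.

    Assumes normal errors; sigma is the root mean squared one-step error
    of the naive method, i.e. of the first differences of the series.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    sigma = sqrt(sum(d * d for d in diffs) / len(diffs))
    point = series[-1]  # the naive point forecast is flat at the last value
    rows = []
    for h in range(1, horizon + 1):
        row = {"h": h, "point": point}
        for level in levels:
            # Two-sided interval: e.g. 95% uses the 97.5% normal quantile.
            z = NormalDist().inv_cdf(0.5 + level / 200)
            half_width = z * sigma * sqrt(h)
            row[level] = (point - half_width, point + half_width)
        rows.append(row)
    return rows


# Hypothetical toy data, purely for illustration.
toy = [12.0, 13.5, 13.0, 14.2, 15.1, 14.8, 15.9]
for row in naive_intervals(toy, horizon=3):
    print(row)
```

Under these assumptions the 80% interval always sits inside the 95% interval, and both widen with the square root of the horizon, which is the extra information that a pair of intervals conveys.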

IJF review papers

Review papers are extremely useful for new researchers such as PhD students, or when you want to learn about a new research field. The International Journal of Forecasting produced a whole review issue in 2006, and it contains some of the most highly cited papers we have ever published. Now, beginning with the latest issue of the journal, we have started publishing occasional review articles on selected areas of forecasting. The first two articles are:

  1. Electricity price forecasting: A review of the state-of-the-art with a look into the future by Rafał Weron.
  2. The challenges of pre-launch forecasting of adoption time series for new durable products by Paul Goodwin, Sheik Meeran, and Karima Dyussekeneva.

Both tackle very important topics in forecasting. Weron’s paper contains a comprehensive survey of work on electricity price forecasting, coherently bringing together a large body of diverse research. At 50 pages, I think it is the longest paper I have ever approved. Goodwin, Meeran and Dyussekeneva review research on new product forecasting, a problem every company that produces goods or services has faced: when there are no historical data available, how do you forecast the sales of your product?

We have a few other review papers in progress, so keep an eye out for them in future issues.


Seasonal periods

I get questions about this almost every week. Here is an example from a recent comment on this blog:

I have two large time series data. One is separated by seconds intervals and the other by minutes. The length of each time series is 180 days. I’m using R (3.1.1) for forecasting the data. I’d like to know the value of the “frequency” argument in the ts() function in R, for each data set. Since most of the examples and cases I’ve seen so far are for months or days at the most, it is quite confusing for me when dealing with equally separated seconds or minutes. According to my understanding, the “frequency” argument is the number of observations per season. So what is the “season” in the case of seconds/minutes? My guess is that since there are 86,400 seconds and 1440 minutes a day, these should be the values for the “freq” argument. Is that correct?
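
The arithmetic in the question is easy to check. The sketch below is in Python rather than R and uses a hypothetical helper, `observations_per_cycle`, to count how many observations fall in one seasonal cycle for a given sampling interval. Which cycle actually matters for a particular series (hourly, daily, weekly) depends on the data and is an assumption here.

```python
# Length of some candidate seasonal cycles, in seconds.
SECONDS_IN = {
    "minute": 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
}


def observations_per_cycle(interval_seconds, cycle):
    """Number of observations in one seasonal cycle.

    interval_seconds is the spacing between observations; the result
    need not be a whole number if the spacing does not divide the cycle.
    """
    return SECONDS_IN[cycle] / interval_seconds


for label, interval in [("seconds", 1), ("minutes", 60)]:
    for cycle in ("hour", "day", "week"):
        print(label, cycle, observations_per_cycle(interval, cycle))
```

With a daily cycle this gives 86,400 for the seconds series and 1,440 for the minutes series, matching the figures in the question; the other cycles show that "the" seasonal period is a modelling choice rather than a property of the sampling rate alone.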


ABS seasonal adjustment update

Since my last post on the seasonal adjustment problems at the Australian Bureau of Statistics, I’ve been working closely with people within the ABS to help them resolve the problems in time for tomorrow’s release of the October unemployment figures.

Now that the ABS has put out a statement about the problem, I thought it would be useful to explain the underlying methodology for those who are interested.

Prediction intervals too narrow

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.
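
The coverage calculation mentioned above can be sketched as follows: count how often hold-out observations fall inside their prediction intervals and compare that percentage with the nominal level. This is an illustration in Python, not the code used in the 2002 paper; `empirical_coverage` is a hypothetical helper and the numbers in the usage lines are invented.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Percentage of hold-out observations inside their prediction intervals."""
    if not actuals:
        raise ValueError("need at least one hold-out observation")
    if not (len(actuals) == len(lowers) == len(uppers)):
        raise ValueError("actuals, lowers and uppers must have equal length")
    inside = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return 100.0 * inside / len(actuals)


# Invented hold-out values and nominal 95% interval limits, for illustration.
actuals = [10.2, 11.9, 9.1, 13.4, 12.0]
lowers = [9.0, 9.5, 9.8, 10.0, 10.1]
uppers = [11.0, 11.5, 11.8, 12.0, 12.3]
print(empirical_coverage(actuals, lowers, uppers))
```

If the printed percentage is well below the nominal level across many series and horizons, the intervals are too narrow in the sense described here.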


hts with regressors

The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints. (An introduction to reconciling hierarchical and grouped time series is available in this Foresight paper.)

The base forecasts can be generated using any method, with ETS models and ARIMA models provided as options in the forecast.gts() function. As ETS models do not allow for regressors, you will need to choose ARIMA models if you want to include regressors.

Congratulations to Dr Souhaib Ben Taieb

Souhaib Ben Taieb has been awarded his doctorate at the Université libre de Bruxelles and so he is now officially Dr Ben Taieb! Although Souhaib lives in Brussels, and was a student at the Université libre de Bruxelles, I co-supervised his doctorate (along with Professor Gianluca Bontempi). Souhaib is the 19th PhD student of mine to graduate.

His thesis was on “Machine learning strategies for multi-step-ahead time series forecasting” and is now available online. The prior research in this area has largely centred on two strategies (recursive and direct), and on which one works better in which circumstances. Recursive forecasting is the standard approach where a model is designed to predict one step ahead, and is then iterated to obtain multi-step-ahead forecasts. Direct forecasting involves using a separate forecasting model for each forecast horizon. Souhaib took a very different perspective from the prior research and has developed new strategies that are either hybrids of these two strategies, or completely different from either of them. The resulting forecasts are often significantly better than those obtained using the more traditional approaches.
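
The two traditional strategies can be sketched in a few lines. The Python below is only an illustration of recursive versus direct forecasting, not any of the new strategies from the thesis: it assumes a simple linear autoregression on one lag as the stand-in model, and all function names and the toy series are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def recursive_forecast(series, horizon):
    """Fit one one-step-ahead model, then feed each forecast back in."""
    a, b = fit_line(series[:-1], series[1:])
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = a + b * last
        forecasts.append(last)
    return forecasts


def direct_forecast(series, horizon):
    """Fit a separate model for each horizon h, mapping y[t] to y[t + h]."""
    forecasts = []
    for h in range(1, horizon + 1):
        a, b = fit_line(series[:-h], series[h:])
        forecasts.append(a + b * series[-1])
    return forecasts


# Toy noise-free series generated by y[t] = 1 + 0.5 * y[t - 1].
series = [10.0]
for _ in range(29):
    series.append(1.0 + 0.5 * series[-1])

print(recursive_forecast(series, 3))
print(direct_forecast(series, 3))
```

On this noise-free toy series the two strategies agree up to rounding. With noisy data and an imperfect model they generally differ: the recursive strategy compounds one-step errors, while the direct strategy fits each horizon separately on fewer usable observations.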

Some of the papers to come out of Souhaib’s thesis are already available on his Google Scholar page.

Well done Souhaib, and best wishes for the future.




Explaining the ABS unemployment fluctuations

Although the Guardian claimed yesterday that I had explained “what went wrong” in the July and August unemployment figures, I made no attempt to do so as I had no information about the problems. Instead, I just explained a little about the purpose of seasonal adjustment.

However, today I learned a little more about the ABS unemployment data problems, including what may be the explanation for the fluctuations. This explanation was offered by Westpac’s chief economist, Bill Evans (see here for a video of him explaining the issue).