ABS seasonal adjustment update

Since my last post on the seasonal adjustment problems at the Australian Bureau of Statistics, I’ve been working closely with people within the ABS to help them resolve the problems in time for tomorrow’s release of the October unemployment figures.

Now that the ABS has put out a statement about the problem, I thought it would be useful to explain the underlying methodology for those who are interested.

Prediction intervals too narrow

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.
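
To make the idea of "actual coverage" concrete, here is a minimal sketch in base R. It is not the procedure from the paper: as an assumption for illustration, it simulates AR(1) series and fits them with arima() rather than using ETS models on real data, and the names n_series, n_train and hits are my own. It simply counts how often the hold-out values fall inside nominal 95% intervals.

```r
# Empirical coverage of nominal 95% prediction intervals on hold-out data.
# Assumption: simulated AR(1) series and stats::arima() stand in for the
# ETS models and real series; all settings below are illustrative.
set.seed(1)
n_series <- 200   # number of simulated series
n_train  <- 40    # observations used for fitting
h        <- 10    # hold-out length (forecast horizon)
hits <- matrix(NA, nrow = n_series, ncol = h)

for (i in seq_len(n_series)) {
  x     <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_train + h))
  train <- x[seq_len(n_train)]
  test  <- x[n_train + seq_len(h)]

  fit <- arima(train, order = c(1, 0, 0), method = "ML")
  p   <- predict(fit, n.ahead = h)

  # Intervals that reflect only the random error term
  lower <- as.numeric(p$pred) - qnorm(0.975) * as.numeric(p$se)
  upper <- as.numeric(p$pred) + qnorm(0.975) * as.numeric(p$se)

  hits[i, ] <- test >= lower & test <= upper
}

coverage <- colMeans(hits)  # proportion of hold-out values covered, by horizon
overall  <- mean(hits)      # proportion covered across all horizons
```

Comparing coverage with the nominal 0.95 shows how far the intervals fall short. Because these intervals ignore the uncertainty in the parameter estimates, some shortfall is to be expected with short training series.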

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.


hts with regressors

The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints. (An introduction to reconciling hierarchical and grouped time series is available in this Foresight paper.)
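
As a rough illustration of what reconciliation means, here is a small base R sketch for a toy hierarchy with Total = A + B. It uses an ordinary least squares projection, which is only one simple form of reconciliation and is not the hts implementation. The summing matrix S and the base forecasts yhat are made-up values for illustration.

```r
# Toy hierarchy: Total = A + B. Rows of S are all series, columns the bottom level.
S <- rbind(Total = c(1, 1),
           A     = c(1, 0),
           B     = c(0, 1))

# Hypothetical base forecasts that do not add up (55 + 50 != 100)
yhat <- c(Total = 100, A = 55, B = 50)

# OLS reconciliation: project the base forecasts onto the space of
# coherent forecasts, ytilde = S (S'S)^{-1} S' yhat
bottom <- solve(crossprod(S), crossprod(S, yhat))
ytilde <- drop(S %*% bottom)

ytilde
```

The reconciled forecasts in ytilde satisfy the aggregation constraint exactly, with the adjustment spread across all three series rather than forced onto any one of them.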

The base forecasts can be generated using any method, with ETS models and ARIMA models provided as options in the forecast.gts() function. As ETS models do not allow for regressors, you will need to choose ARIMA models if you want to include regressors.

Congratulations to Dr Souhaib Ben Taieb

Souhaib Ben Taieb has been awarded his doctorate at the Université libre de Bruxelles and so he is now officially Dr Ben Taieb! Although Souhaib lives in Brussels, and was a student at the Université libre de Bruxelles, I co-supervised his doctorate (along with Professor Gianluca Bontempi). Souhaib is the 19th PhD student of mine to graduate.

His thesis was on “Machine learning strategies for multi-step-ahead time series forecasting” and is now available online. The prior research in this area has largely centred around two strategies (recursive and direct), and which one works better in certain circumstances. Recursive forecasting is the standard approach where a model is designed to predict one step ahead, and is then iterated to obtain multi-step-ahead forecasts. Direct forecasting involves using a separate forecasting model for each forecast horizon. Souhaib took a very different perspective from the prior research and has developed new strategies that are either hybrids of these two strategies, or completely different from either of them. The resulting forecasts are often significantly better than those obtained using the more traditional approaches.
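
For readers unfamiliar with the two traditional strategies, the following base R sketch contrasts them. It is only an illustration under simple assumptions: a simulated AR(1) series and a linear model on a single lag. It does not reproduce any of the methods in the thesis, and the object names are my own.

```r
# Recursive versus direct multi-step forecasting with a one-lag linear model.
# Assumption: simulated data and a single lagged predictor, for illustration only.
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))
n <- length(y)
H <- 3  # forecast horizons 1..H

# Recursive: fit one model for one step ahead, then iterate it,
# feeding each forecast back in as the next input.
d1 <- data.frame(target = y[2:n], lag1 = y[1:(n - 1)])
one_step <- lm(target ~ lag1, data = d1)
recursive <- numeric(H)
last <- y[n]
for (h in seq_len(H)) {
  last <- unname(predict(one_step, newdata = data.frame(lag1 = last)))
  recursive[h] <- last
}

# Direct: fit a separate model for each horizon h, predicting y[t + h]
# from y[t], and always forecast from the last observed value.
direct <- numeric(H)
for (h in seq_len(H)) {
  dh <- data.frame(target = y[(1 + h):n], lag1 = y[1:(n - h)])
  fit_h <- lm(target ~ lag1, data = dh)
  direct[h] <- unname(predict(fit_h, newdata = data.frame(lag1 = y[n])))
}

cbind(horizon = seq_len(H), recursive, direct)
```

The two strategies agree at horizon 1, where they use the same model, and generally differ beyond that.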

Some of the papers to come out of Souhaib’s thesis are already available on his Google Scholar page.

Well done Souhaib, and best wishes for the future.




Explaining the ABS unemployment fluctuations

Although the Guardian claimed yesterday that I had explained “what went wrong” in the July and August unemployment figures, I made no attempt to do so as I had no information about the problems. Instead, I just explained a little about the purpose of seasonal adjustment.

However, today I learned a little more about the ABS unemployment data problems, including what may be the explanation for the fluctuations. This explanation was offered by Westpac’s chief economist, Bill Evans (see here for a video of him explaining the issue).

Connect with local employers

I keep telling students that there are lots of jobs in data science (including statistics), and they often tell me they can’t find them advertised. As usual, you do have to do some networking, and one of the best ways of doing it is via a Data Science Meetup. Many cities now have them, including Melbourne, Sydney and London. It is the perfect opportunity to meet with local employers, many of whom are hiring due to the huge expansion in the use of data analysis in business (aka business analytics).

At the end of each Melbourne meetup, some employers have been advertising their current analytic job openings to the audience.

Now the local organizers are going to extend the opportunity to allow job-searchers to give a 90-second pitch to employers. Details are provided on the message board.

IIF Sponsored Workshops

The International Institute of Forecasters sponsors workshops every year, each of which focuses on a specific theme. The purpose of these workshops is to facilitate small, informal meetings where experts in a particular field of forecasting can discuss forecasting problems, research, and solutions. Over the years, our workshops have covered topics including Predicting Rare Events, ICT Forecasting and, most recently, Singular Spectrum Analysis. Often these workshops are associated with a special issue of the International Journal of Forecasting.

If you are already hosting a workshop on a forecasting topic and need support from the IIF, or if you are interested in organising and hosting a new workshop, please contact George Athanasopoulos.

A list of past workshops and workshop guidelines are provided on the IIF website.

TBATS with regressors

I’ve received a few emails about including regression variables (i.e., covariates) in TBATS models. As TBATS models are related to ETS models, tbats() is unlikely to ever include covariates, as explained here. It won’t actually complain if you include an xreg argument, but it will ignore it.

When I want to include covariates in a time series model, I tend to use auto.arima() with covariates included via the xreg argument. If the time series has multiple seasonal periods, I use Fourier terms as additional covariates. See my post on forecasting daily data for some discussion of this model. Note that fourier() and fourierf() now handle msts objects, so it is very simple to do this.
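
In case it is not obvious what the Fourier terms are, here is a hand-rolled base R sketch that builds the sine and cosine pairs for two seasonal periods. The function fourier_terms() and its column names are my own illustration, not the forecast package's code or its exact output. Note that the number of harmonics K cannot usefully exceed half the period, because higher harmonics duplicate lower ones at integer time points.

```r
# Build K pairs of sin/cos terms for one seasonal period.
# Hypothetical helper for illustration; not part of the forecast package.
fourier_terms <- function(t, period, K) {
  stopifnot(K >= 1, 2 * K <= period)
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period),
          cos(2 * pi * k * t / period))
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0(rep(c("S", "C"), times = K),
                          rep(seq_len(K), each = 2), "-", period)
  out
}

t <- 1:730  # two years of daily observations

# Weekly terms (up to 3 harmonics) and annual terms (5 harmonics), side by side
z <- cbind(fourier_terms(t, period = 7, K = 3),
           fourier_terms(t, period = 365.25, K = 5))

dim(z)
```

Each column is a fixed periodic function of time, which is why seasonality modelled this way cannot change shape over time.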

For example, if holiday contains some dummy variables associated with public holidays and holidayf contains the corresponding variables for the first 100 forecast periods, then the following code can be used:

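library(forecast)
y <- msts(x, seasonal.periods=c(7,365.25))
# K must not exceed period/2, so at most 3 harmonics for the weekly period
z <- fourier(y, K=c(3,5))
zf <- fourierf(y, K=c(3,5), h=100)
# Non-seasonal ARIMA errors; the seasonality is carried by the Fourier terms
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)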

The main disadvantage of the ARIMA approach is that the seasonality is forced to be periodic, whereas a TBATS model allows for dynamic seasonality.