Prediction intervals too narrow

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon, and it arises because the intervals do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may provide coverage of only 71% to 87%. The difference is due to missing sources of uncertainty.

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.
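
As a rough illustration of how coverage can be checked on a hold-out sample, here is a minimal base-R sketch. The data set, the train/test split and the ARIMA order are illustrative assumptions, not the setup of the IJF paper (which studied ETS models). The standard errors returned by `predict()` for an `arima()` fit exclude parameter-estimation uncertainty, so the resulting intervals reflect only the first source in the list above.

```r
# Empirical coverage of nominal 95% prediction intervals on a hold-out sample.
# Assumptions: AirPassengers data, last two years held out, "airline" ARIMA model.
y     <- log(AirPassengers)
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

fit  <- arima(train, order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = length(test))

# Normal-theory interval limits built from the forecast standard errors
z     <- qnorm(0.975)
lower <- pred$pred - z * pred$se
upper <- pred$pred + z * pred$se

# Proportion of hold-out observations that fall inside their interval
coverage <- mean(test >= lower & test <= upper)
coverage
```

A single series gives a very noisy estimate; a meaningful comparison with the nominal 95% needs many series and forecast origins.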

hts with regressors

The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints. (An introduction to reconciling hierarchical and grouped time series is available in this Foresight paper.)

The base forecasts can be generated using any method, with ETS models and ARIMA models provided as options in the forecast.gts() function. As ETS models do not allow for regressors, you will need to choose ARIMA models if you want to include regressors.
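
The reconciliation idea can be illustrated in base R without the hts package. This is only a sketch: the two-series hierarchy and the base forecasts are made up, and the simple least-squares (OLS) combination used here is one possible reconciliation method, not necessarily the one hts applies by default.

```r
# Hypothetical hierarchy: Total = A + B.
# The summing matrix S maps the bottom-level series (A, B) to every series.
S <- rbind(Total = c(1, 1),
           A     = c(1, 0),
           B     = c(0, 1))

# Made-up base forecasts that do not add up (45 + 50 != 100)
y_hat <- c(Total = 100, A = 45, B = 50)

# OLS reconciliation: project the base forecasts onto the coherent subspace,
# y_tilde = S (S'S)^{-1} S' y_hat
y_tilde <- drop(S %*% solve(crossprod(S), crossprod(S, y_hat)))
y_tilde

# The reconciled forecasts now satisfy the aggregation constraint
all.equal(unname(y_tilde["Total"]), unname(y_tilde["A"] + y_tilde["B"]))
```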

Explaining the ABS unemployment fluctuations

Although the Guardian claimed yesterday that I had explained “what went wrong” in the July and August unemployment figures, I made no attempt to do so as I had no information about the problems. Instead, I just explained a little about the purpose of seasonal adjustment.

However, today I learned a little more about the ABS unemployment data problems, including what may be the explanation for the fluctuations. This explanation was offered by Westpac’s chief economist, Bill Evans (see here for a video of him explaining the issue).

Connect with local employers

I keep telling students that there are lots of jobs in data science (including statistics), and they often tell me they can’t find them advertised. As usual, you do have to do some networking, and one of the best ways of doing it is via a Data Science Meetup. Many cities now have them, including Melbourne, Sydney and London. It is the perfect opportunity to meet with local employers, many of whom are hiring due to the huge expansion in the use of data analysis in business (aka business analytics).

At the end of each Melbourne meetup, some employers have been advertising their current analytic job openings to the audience.

Now the local organizers are going to extend the opportunity to allow job-searchers to give a 90-second pitch to employers. Details are provided on the message board.

TBATS with regressors

I’ve received a few emails about including regression variables (i.e., covariates) in TBATS models. As TBATS models are related to ETS models, tbats() is unlikely to ever include covariates, as explained here. It won’t actually complain if you include an xreg argument, but it will ignore it.

When I want to include covariates in a time series model, I tend to use auto.arima() with covariates included via the xreg argument. If the time series has multiple seasonal periods, I use Fourier terms as additional covariates. See my post on forecasting daily data for some discussion of this model. Note that fourier() and fourierf() now handle msts objects, so it is very simple to do this.

For example, if holiday contains some dummy variables associated with public holidays and holidayf contains the corresponding variables for the first 100 forecast periods, then the following code can be used:

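library(forecast)
# x: observed daily series; holiday, holidayf: holiday dummies (history, next 100 periods)
y <- msts(x, seasonal.periods=c(7,365.25))
# Fourier terms; K cannot exceed half the seasonal period, so at most 3 for the weekly term
z <- fourier(y, K=c(3,5))
zf <- fourierf(y, K=c(3,5), h=100)
# Regression with ARIMA errors; the Fourier terms carry the seasonality
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)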

The main disadvantage of the ARIMA approach is that the seasonality is forced to be periodic, whereas a TBATS model allows for dynamic seasonality.
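
To make the Fourier covariates less of a black box, the following base-R sketch builds sine and cosine pairs by hand. The helper `fourier_terms()` and its column names are hypothetical; they are meant to mimic the idea behind fourier(), not its exact output.

```r
# Hypothetical helper: K sine/cosine pairs for a single seasonal period.
fourier_terms <- function(t, period, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2),
                          "-", period)
  out
}

# Two years of daily time indices with weekly and annual terms
t <- 1:730
X <- cbind(fourier_terms(t, 7, 2), fourier_terms(t, 365.25, 3))
dim(X)

# The terms repeat exactly after one period, so the fitted seasonal
# pattern cannot change shape over time
all.equal(fourier_terms(1, 7, 2), fourier_terms(8, 7, 2))
```

Because every column is an exact function of the time index, the seasonal pattern these covariates imply is the same in every cycle, which is the periodicity restriction mentioned above.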

FPP now available as a downloadable e-book

My forecasting textbook with George Athanasopoulos is already available online (for free), and in print via Amazon (for under $40). Now we have made it available as a downloadable e-book via Google Books (for $15.55). The Google Books version is identical to the print version on Amazon (apart from a few typos that have been fixed).

To use the e-book version on an iPad or Android tablet, you need to have the Google Books app installed [iPad, Android]. You could also put it on an iPhone or Android phone, but I wouldn’t recommend it as the text will be too small to read.

You can download a free sample (up to the end of Chapter 2) if you want to check how it will look on your device.

The sales of the print and e-book versions are used to fund the running of the OTexts website, where all OTexts books are freely available.

The online version is continuously updated: any errors discovered are fixed immediately. The print and e-book versions will be updated approximately annually to bring them into line with the online version.


Generating quantile forecasts in R

From today’s email:

I have just finished reading a copy of ‘Forecasting: Principles and Practice’ and I have found the book really interesting. I have particularly enjoyed the case studies and focus on practical applications.

After finishing the book I have joined a forecasting competition to put what I’ve learnt to the test. I do have a couple of queries about the forecasting outputs required. The output required is a quantile forecast, is this the same as prediction intervals? Is there any R function to produce quantiles from 0 to 99?

If you were able to point me in the right direction regarding the above it would be greatly appreciated.

Many Thanks,
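
The link between the two ideas in the question is that the limits of a central 95% prediction interval are the 2.5% and 97.5% quantiles of the forecast distribution. The base-R sketch below shows one way percentiles could be computed if the forecast errors are assumed to be normally distributed. The data set and the ARIMA order are illustrative assumptions, and the 0% and 100% quantiles are omitted because they are infinite under normality.

```r
# Percentile forecasts (1% to 99%) under an assumed normal forecast distribution.
y   <- log(AirPassengers)
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = 12)

probs <- seq(0.01, 0.99, by = 0.01)

# One row per forecast horizon, one column per percentile
q <- sapply(probs, function(p) pred$pred + qnorm(p) * pred$se)
dimnames(q) <- list(paste0("h=", 1:12), paste0(round(100 * probs), "%"))

# A few of the percentiles for the first three horizons
q[1:3, c("5%", "50%", "95%")]
```

The values are on the log scale used for fitting; applying `exp()` to them gives quantiles on the original scale, because quantiles are preserved under monotonic transformations.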

Resources for the FPP book

The FPP resources page has recently been updated with several new additions, including:

  • R code for all examples in the book. This was already available within each chapter, but the examples have been collected into one file per chapter to save copying and pasting the various code fragments.
  • Slides from a course on Predictive Analytics from the University of Sydney.
  • Slides from a course on Economic Forecasting from the University of Hawaii.

If anyone using the book has other material that could be made available, please send it to me. For example, recorded lectures, slides, additional examples, assignments, exam questions, solutions, etc.

A new candidate for worst figure

Today I read a paper that had been submitted to the IJF which included the following figure

[Figure not reproduced]

along with several similar plots. I haven’t seen anything this bad for a long time. In fact, I think I would find it very difficult to reproduce using R, or even Excel (which is particularly adept at bad graphics).

A few years ago I produced “Twenty rules for good graphics”. I think I need to add a couple of additional rules:

  • Represent time changes using lines.
  • Never use fill patterns such as cross-hatching.

(My original rule #20 said “Avoid pie charts”.)

It would have been relatively simple to show these data as six lines on a plot of GDP against time. That would have made it obvious that the European GDP was shrinking, the GDP of Asia/Oceania was increasing, while other regions of the world were fairly stable. At least I think that is what is happening, but it is very hard to tell from such graphical obfuscation.
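
For what it is worth, a six-line time plot takes only a few lines of base R. The sketch below uses simulated placeholder numbers and generic region labels, since the data from the submitted paper are not available here; nothing in it is real GDP.

```r
# Simulated placeholder data only: these are NOT real GDP figures.
set.seed(1)
years <- 2000:2012
gdp <- sapply(1:6, function(i) 100 + cumsum(rnorm(length(years))))
colnames(gdp) <- paste("Region", 1:6)

# One line per region, with time on the horizontal axis
matplot(years, gdp, type = "l", lty = 1, col = 1:6,
        xlab = "Year", ylab = "GDP (simulated index)")
legend("topleft", legend = colnames(gdp), col = 1:6, lty = 1, bty = "n")
```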