RSS feeds for statistics and related journals

I’ve now resurrected the collection of research journals that I follow, and set it up as a shared collection in feedly. So anyone can easily subscribe to all of the same journals, or select a subset of them, to follow on feedly.

IJF review papers

Review papers are extremely useful for new researchers such as PhD students, or when you want to learn about a new research field. The International Journal of Forecasting produced a whole review issue in 2006, and it contains some of the most highly cited papers we have ever published. Now, beginning with the latest issue of the journal, we have started publishing occasional review articles on selected areas of forecasting. The first two articles are:

  1. Electricity price forecasting: A review of the state-of-the-art with a look into the future by Rafał Weron.
  2. The challenges of pre-launch forecasting of adoption time series for new durable products by Paul Goodwin, Sheik Meeran, and Karima Dyussekeneva.

Both tackle very important topics in forecasting. Weron’s paper contains a comprehensive survey of work on electricity price forecasting, coherently bringing together a large body of diverse research; at 50 pages, I think it is the longest paper I have ever approved. Goodwin, Meeran and Dyussekeneva review research on new product forecasting, a problem faced by every company that produces goods or services: when there are no historical data available, how do you forecast the sales of your product?

We have a few other review papers in progress, so keep an eye out for them in future issues.


biblatex for statisticians

I am now using biblatex for all my bibliographic work, as it seems to have developed enough to be stable and reliable. The big advantage of biblatex is that it is easy to format the bibliography to conform to specific journal or publisher styles. It is also possible to have structured bibliographies (e.g., divided into sections: books, papers, R packages, etc.).
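As a minimal sketch of the structured-bibliography feature mentioned above (the file name `refs.bib` and the keyword `rpackage` are placeholders, not part of the original post), a biblatex preamble can split the reference list by entry type or keyword:

```latex
% Preamble: load biblatex with an author-year style and point it at a .bib file
\usepackage[style=authoryear]{biblatex}
\addbibresource{refs.bib}  % placeholder file name

% ... document body, with \textcite and \parencite as usual ...

% At the end of the document, print separate sections of the bibliography:
\printbibliography[type=book, title={Books}]
\printbibliography[type=article, title={Papers}]
\printbibliography[keyword=rpackage, title={R packages}]
% Entries tagged with keywords = {rpackage} in refs.bib appear in the last list.
```

The `type` and `keyword` filters are standard `\printbibliography` options, so no custom style code is needed for this kind of sectioned bibliography.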

Varian on big data

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: New tricks for econometrics”, Journal of Economic Perspectives, 28(2), 3–28.

It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the lasso and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but fine if you’ve no idea what they are.

To explain or predict?

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read, as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered, amongst other things.

  1. The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
  2. P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
  3. Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem, provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
  4. An ARIMA model has no explanatory use, but is great at short-term prediction.
  5. How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
  6. Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate, as most applied work these days requires predictive modelling rather than explanatory modelling.
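Point 1 in the list above can be illustrated numerically. The sketch below (simulated data, NumPy only; not from the paper) compares AIC with exact leave-one-out cross-validation for two candidate regression models. For OLS the LOO residuals come from the standard hat-matrix shortcut, so no refitting loop is needed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                    # irrelevant predictor
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def fit_stats(X, y):
    """OLS fit; return (AIC up to a constant, leave-one-out CV mean squared error)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    nobs, k = X.shape
    sigma2 = resid @ resid / nobs
    aic = nobs * np.log(sigma2) + 2 * (k + 1)
    # Exact LOO residuals for OLS via the hat matrix: e_i / (1 - h_ii)
    H = X @ np.linalg.solve(X.T @ X, X.T)
    loo = resid / (1 - np.diag(H))
    return aic, np.mean(loo ** 2)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
for name, X in [("x1 only", X_small), ("x1 + x2", X_big)]:
    aic, cv = fit_stats(X, y)
    print(f"{name}: AIC = {aic:.1f}, LOO-CV MSE = {cv:.3f}")
```

Across simulations the two criteria usually rank the models the same way, which is the asymptotic equivalence the discussion refers to; the BIC, with its heavier log(n) penalty, would more aggressively drop the irrelevant predictor.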



Great papers to read

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper, something every statistician should read, including the discussion.

I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or recent survey papers.

In the last couple of months we have also read the following papers:

Past, present, and future of statistical science

This is the title of a wonderful new book that has just been released, courtesy of the Committee of Presidents of Statistical Societies.

It can be freely downloaded from the COPSS website, or a hard copy can be purchased on Amazon (for only a little over 10c per page, which is not bad compared to other statistics books).

The book consists of 52 chapters spanning 622 pages. The full table of contents below shows its scope and the list of authors (a veritable who’s who in statistics).

Errors on percentage errors

The MAPE (mean absolute percentage error) is a popular measure of forecast accuracy, and is defined as

    \[\text{MAPE} = 100\,\text{mean}(|y_t - \hat{y}_t|/|y_t|)\]

where y_t denotes an observation, \hat{y}_t denotes its forecast, and the mean is taken over t.

Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
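Armstrong’s point is easy to verify numerically. In the hypothetical sketch below (the data are invented for illustration), under-forecasting a series of 100s can cost at most 100% per observation, since the worst under-forecast is zero, while over-forecasting is unbounded. This is why minimising the MAPE tends to pull forecasts downwards:

```python
import numpy as np

def mape(y, yhat):
    """MAPE = 100 * mean(|y_t - yhat_t| / |y_t|), per the definition above."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 100 * np.mean(np.abs(y - yhat) / np.abs(y))

actual = np.array([100.0, 100.0, 100.0, 100.0])

# Under-forecasting can never cost more than 100% per observation...
print(mape(actual, np.zeros(4)))   # → 100.0  (forecasting zero every time)
# ...but over-forecasting is unbounded:
print(mape(actual, 4 * actual))    # → 300.0  (forecasting four times the actual)
```

Note also that swapping the roles of forecast and actual changes the error (150 vs 100 gives 50%, but 100 vs 150 gives 33.3%), because the denominator is always the actual value.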

My forecasting book now on Amazon

For all those people asking me how to obtain a print version of my book “Forecasting: principles and practice” with George Athanasopoulos, you now can.

FPP cover

Order on Amazon.com

Order on Amazon.co.uk

Order on Amazon.fr

The online book will continue to be freely available. The print version is intended to help fund the development of the OTexts platform.

The price is US$45, £27 or €35.

Compare that to $195 for my previous forecasting textbook, $150 for Fildes and Ord, or $182 for Gonzalez-Rivera. No matter how good the books are, the prices are absurdly high.

OTexts is intended to be a different kind of publisher: all our books are online and free, and those in print will be reasonably priced.

The online version will continue to be updated regularly. The print version is a snapshot of the online version today. We will release a new print edition occasionally, no more than annually, and only when the online version has changed enough to warrant it.

We are planning an offline electronic version as well. I’ll announce it here when it is ready.

Top papers in the International Journal of Forecasting

Every year or so, Elsevier asks me to nominate five International Journal of Forecasting papers from the last two years to highlight in their marketing materials as “Editor’s Choice”. I try to select papers across a broad range of subjects, and I take into account citations and downloads, as well as my own impression of each paper. That tends to bias my selection a little towards older papers, as they have had more time to accumulate citations. Here are the papers I chose this morning (in the order they appeared):

  1. Diebold and Yilmaz (2012) Better to give than to receive: Predictive directional measurement of volatility spillovers. IJF 28(1), 57–66.
  2. Loterman, Brown, Martens, Mues, and Baesens (2012) Benchmarking regression algorithms for loss given default modeling. IJF 28(1), 161–170.
  3. Soyer and Hogarth (2012) The illusion of predictability: How regression statistics mislead experts. IJF 28(3), 695–711.
  4. Friedman (2012) Fast sparse regression and classification. IJF 28(3), 722–738.
  5. Davydenko and Fildes (2013) Measuring forecasting accuracy: The case of judgmental adjustments to SKU-level demand forecasts. IJF 29(3), 510–522.

Last time I did this, three of the five papers I chose went on to win awards. (I don’t pick the award winners; that’s a matter for the whole editorial board.) On the other hand, I didn’t pick the paper that won the top award for the period 2010–2011. So perhaps my selection is not such a good guide.