A blog by Rob J Hyndman 


Posts Tagged ‘references’:

biblatex for statisticians

Published on 22 August 2014

I am now using biblatex for all my bibliographic work, as it seems to have developed enough to be stable and reliable. The big advantage of biblatex is that it is easy to format the bibliography to conform to specific journal or publisher styles. It is also possible to have structured bibliographies (e.g., divided into sections: books, papers, R packages, etc.).
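As a minimal sketch of the kind of structured bibliography described above (the file name `refs.bib` and the keyword `rpackage` are placeholders, not from the post):

```latex
% Preamble: load biblatex with a standard style and point it at a .bib file
\usepackage[style=authoryear, maxbibnames=99]{biblatex}
\addbibresource{refs.bib}

% At the end of the document: a bibliography split into sections by
% entry type, plus a keyword-filtered section for R packages
% (entries tagged with keywords = {rpackage} in the .bib file)
\printbibliography[type=book,    title={Books}]
\printbibliography[type=article, title={Papers}]
\printbibliography[keyword=rpackage, title={R packages}]
```

Switching journal styles is then usually just a matter of changing the `style=` option.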


Varian on big data

Published on 16 June 2014

Last week my research group dis­cussed Hal Varian’s inter­est­ing new paper on “Big data: new tricks for econo­met­rics”, Jour­nal of Eco­nomic Per­spec­tives, 28(2): 3–28. It’s a nice intro­duc­tion to trees, bag­ging and forests, plus a very brief entrée to the LASSO and the elas­tic net, and to slab and spike regres­sion. Not enough to be able to use them, but ok if you’ve no idea what they are.
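To give a small flavour of one of the methods mentioned, here is an illustrative sketch (not from Varian's paper) of the lasso in the special case of an orthonormal design, where the solution has a closed form: soft-thresholding of the least-squares coefficients. The data are simulated.

```python
import numpy as np

def soft_threshold(b, lam):
    """Closed-form lasso solution per coefficient when the design is orthonormal."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(50, 3)))   # 50 x 3 matrix with orthonormal columns
beta_true = np.array([3.0, 0.5, 0.0])           # one strong, one weak, one null effect
y = Q @ beta_true + rng.normal(scale=0.1, size=50)

ols = Q.T @ y                                   # OLS coefficients (orthonormal design)
lasso = soft_threshold(ols, lam=1.0)            # lasso shrinks and zeroes out small effects
print(ols.round(2), lasso.round(2))
```

The weak and null coefficients are set exactly to zero while the strong one is merely shrunk, which is the variable-selection behaviour that makes the lasso attractive.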


To explain or predict?

Published on 19 May 2014

Last week, my research group discussed Galit Shmueli's paper "To explain or to predict?", Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read, as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered, amongst other things:

- The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
- P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
- Multicollinearity has a very different impact if your goal is prediction than if your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
- An ARIMA model has no explanatory use, but is great at short-term prediction.
- How to …
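The AIC/LOO-CV connection mentioned above can be illustrated numerically. The sketch below (illustrative, not from the post; data are simulated) computes both criteria for a linear regression, using the standard hat-matrix identity e_(i) = e_i / (1 − h_ii) for the leave-one-out residuals, and verifies that shortcut against brute-force refitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
loo_resid = resid / (1 - np.diag(H))          # LOO residuals without refitting
loo_cv = np.mean(loo_resid ** 2)

k = X.shape[1]
rss = np.sum(resid ** 2)
aic = n * np.log(rss / n) + 2 * k             # Gaussian AIC, up to an additive constant

# Brute-force check of the shortcut: refit n times, leaving one point out each time
brute = []
for i in range(n):
    mask = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    brute.append(y[i] - X[i] @ b)

print(round(loo_cv, 4), round(aic, 2))
```

Both criteria penalise overfitting, and for linear regression they track each other as the sample grows, which is the asymptotic equivalence referred to in the discussion.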



Great papers to read

Published on 2 May 2014

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman's classic (2001) two cultures paper, something every statistician should read, including the discussion. I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers. In the last couple of months we have also read the following papers:

- Timmermann (2008) Elusive return predictability
- Diebold (2013) Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests
- Gneiting and Katzfuss (2014) Probabilistic forecasting
- Makridakis and Hibon (1979) Accuracy of forecasting: an empirical investigation


Past, present, and future of statistical science

Published on 28 April 2014

This is the title of a wonderful new book that has just been released, courtesy of the Committee of Presidents of Statistical Societies. It can be freely downloaded from the COPSS website, or a hard copy can be purchased on Amazon (for only a little over 10c per page, which is not bad compared to other statistics books). The book consists of 52 chapters spanning 622 pages. The full table of contents below shows its scope and the list of authors (a veritable who's who in statistics).


Errors on percentage errors

Published on 16 April 2014

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

    MAPE = 100 mean(|y_t − ŷ_t| / |y_t|),

where y_t denotes an observation and ŷ_t denotes its forecast, and the mean is taken over t. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that "it has a bias favoring estimates that are below the actual values".
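A quick numerical illustration of that asymmetry (my own example, not from the post): for a positive quantity, an under-forecast can never produce an absolute percentage error above 100%, while an over-forecast is unbounded, so choosing low forecasts caps the penalty.

```python
def ape(actual, forecast):
    """Absolute percentage error, in percent: 100 |y - f| / |y|."""
    return 100 * abs(actual - forecast) / abs(actual)

# Actual value 100: the worst possible under-forecast is 0 ...
worst_under = ape(100, 0)    # -> 100.0, the error can never exceed 100%
# ... but over-forecasts keep growing without bound
over = ape(100, 300)         # -> 200.0
print(worst_under, over)
```

So two forecasting methods with the same typical absolute errors can have quite different MAPEs if one tends to sit below the actuals.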


My forecasting book now on Amazon

Published on 9 April 2014

For all those people asking me how to obtain a print version of my book "Forecasting: principles and practice" with George Athanasopoulos, you now can. Order on Amazon.com. Order on Amazon.co.uk. Order on Amazon.fr. The online book will continue to be freely available. The print version of the book is intended to help fund the development of the OTexts platform. The price is US$45, £27 or €35. Compare that to $195 for my previous forecasting textbook, $150 for Fildes and Ord, or $182 for Gonzalez-Rivera. No matter how good the books are, the prices are absurdly high. OTexts is intended to be a different kind of publisher: all our books are online and free, and those in print will be reasonably priced. The online version will continue to be updated regularly. The print version is a snapshot of the online version today. We will release a new print edition occasionally, no more than annually, and only when the online version has changed enough to warrant a new print edition. We are planning an offline electronic version as well. I'll announce it here when it is ready.


Top papers in the International Journal of Forecasting

Published on 4 February 2014

Every year or so, Elsevier asks me to nominate five International Journal of Forecasting papers from the last two years to highlight in their marketing materials as "Editor's Choice". I try to select papers across a broad range of subjects, and I take into account citations and downloads as well as my own impression of the paper. That tends to bias my selection a little towards older papers, as they have had more time to accumulate citations. Here are the papers I chose this morning (in the order they appeared):

- Diebold and Yilmaz (2012) Better to give than to receive: Predictive directional measurement of volatility spillovers. IJF 28(1), 57–66.
- Loterman, Brown, Martens, Mues, and Baesens (2012) Benchmarking regression algorithms for loss given default modeling. IJF 28(1), 161–170.
- Soyer and Hogarth (2012) The illusion of predictability: How regression statistics mislead experts. IJF 28(3), 695–711.
- Friedman (2012) Fast sparse regression and classification. IJF 28(3), 722–738.
- Davydenko and Fildes (2013) Measuring forecasting accuracy: The case of judgmental adjustments to SKU-level demand forecasts. IJF 29(3), 510–522.

Last time I did this, three of the five papers I chose went on to win awards. (I don't pick the award winners; that's a matter for the whole editorial board.) On the other hand, I didn't pick the …



Automatic time series forecasting in Granada

Published on 31 January 2014

In two weeks I am presenting a workshop at the University of Granada (Spain) on Automatic Time Series Forecasting. Unlike most of my talks, this is not intended to be primarily about my own research. Rather, it is to provide a state-of-the-art overview of the topic (at a level suitable for Masters students in Computer Science). I thought I'd provide some historical perspective on the development of automatic time series forecasting, plus give some comments on current best practices.


Free books on statistical learning

Published on 30 January 2014

Hastie, Tibshirani and Friedman's Elements of Statistical Learning first appeared in 2001 and is already a classic. It is my go-to book when I need a quick refresher on a machine learning algorithm. I like it because it is written using the language and perspective of statistics, and provides a very useful entry point into the literature of machine learning, which has its own terminology for statistical concepts. A free downloadable pdf version is available on the website.

Recently, a simpler related book appeared, entitled Introduction to Statistical Learning with applications in R by James, Witten, Hastie and Tibshirani. It "is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences". This would be a great textbook for our new 3rd year subject on Business Analytics. The R code is a welcome addition in showing how to implement the methods. Again, a free downloadable pdf version is available on the website.

There is also a new, free book on Statistical foundations of machine learning by Bontempi and Ben Taieb, available on the OTexts platform. This is more of a handbook, and is written by two authors coming from a machine learning background. R code is also provided. Being an OTexts book, it is continually updated and revised, and is freely available …

