A blog by Rob J Hyndman 


Posts Tagged ‘statistics’:


Resources for the FPP book

Published on 3 September 2014

The FPP resources page has recently been updated with several new additions, including: R code for all examples in the book (this was already available within each chapter, but the examples have now been collected into one file per chapter to save copying and pasting the various code fragments); slides from a course on Predictive Analytics from the University of Sydney; and slides from a course on Economic Forecasting from the University of Hawaii. If anyone using the book has other material that could be made available, please send it to me: for example, recorded lectures, slides, additional examples, assignments, exam questions, solutions, etc.

 

A new candidate for worst figure

Published on 1 September 2014

Today I read a paper that had been submitted to the IJF which included the following figure along with several similar plots. I haven’t seen anything this bad for a long time. In fact, I think I would find it very difficult to reproduce using R, or even Excel (which is particularly adept at bad graphics). A few years ago I produced “Twenty rules for good graphics”. I think I need to add a couple of additional rules: “Represent time changes using lines” and “Never use fill patterns such as cross-hatching”. (My original rule #20 said “Avoid pie charts”.) It would have been relatively simple to show these data as six lines on a plot of GDP against time. That would have made it obvious that the European GDP was shrinking and the GDP of Asia/Oceania was increasing, while other regions of the world were fairly stable. At least I think that is what is happening, but it is very hard to tell from such graphical obfuscation.

 

GEFCom 2014 energy forecasting competition is underway

Published on 18 August 2014

GEFCom 2014 is the most advanced energy forecasting competition ever organized, both in terms of the data involved and in terms of the way the forecasts will be evaluated. So everyone interested in energy forecasting should head over to the competition webpage and start forecasting: www.gefcom.org. This time, the competition is hosted on CrowdANALYTIX rather than Kaggle.

Highlights of GEFCom2014:
An upgraded edition of GEFCom2012.
Four tracks: electric load, electricity price, wind power and solar power forecasting.
Probabilistic forecasting: contestants are required to submit 99 quantiles for each step throughout the forecast horizon.
Rolling forecasting: incremental data sets are being released on a weekly basis to forecast the next period of interest.
Prizes for winning teams and institutions: up to 3 teams from each track will be recognized as the winning teams; top institutions with multiple well-performing teams will be recognized as the winning institutions.
Global participation: 200+ people from 40+ countries have already signed up to the GEFCom2014 interest list.

Tao Hong (the main organizer) has a few tips on his blog that you should read before starting.

 

Visit of Di Cook

Published on 13 August 2014

Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details. For those wanting to hear her speak, read on.

 

Coherent population forecasting using R

Published on 24 July 2014

This is an example of how to use the demography package in R for stochastic population forecasting with coherent components. It is based on the papers by Hyndman and Booth (IJF 2008) and Hyndman, Booth and Yasmeen (Demography 2013). I will use Australian data from 1950 to 2009 and forecast the next 50 years. In demography, “coherent” forecasts are those in which the forecasts for males and females (or other sub-groups) do not diverge over time. (Essentially, we require the difference between the groups to be stationary.) When we wrote the 2008 paper, we did not know how to constrain the forecasts to be coherent in a functional data context and so this was not discussed. My later 2013 paper provided a way of imposing coherence. This blog post shows how to implement both ideas using R.
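To make the stationarity requirement concrete, here is a minimal base-R sketch. It does not use the demography package or the functional data models of the papers, and the series are synthetic rather than Australian data; all names and parameter values are illustrative assumptions. It is in the spirit of the product-ratio idea of the 2013 paper: the average of the two log rates is forecast with a trending model, while half their difference is forecast with a stationary AR(1), so the forecast gap between the groups settles to a constant instead of diverging.

```r
set.seed(42)
years <- 1950:2009
n <- length(years)
h <- 50  # forecast horizon in years

# Synthetic log death rates for two groups (illustrative only, not real data)
trend <- -3 - 0.02 * (years - 1950)
log_f <- trend + rnorm(n, sd = 0.02)
log_m <- trend + 0.5 + as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.02))

# "Product" and "ratio" components on the log scale
log_p <- (log_m + log_f) / 2  # shared level, allowed to trend
log_r <- (log_m - log_f) / 2  # half the gap, modelled as stationary

# Product: random walk with drift (non-stationary)
drift <- mean(diff(log_p))
fc_p <- log_p[n] + drift * seq_len(h)

# Ratio: stationary AR(1) with a mean, so forecasts revert to that mean
fit_r <- arima(log_r, order = c(1, 0, 0), method = "ML")
fc_r <- as.numeric(predict(fit_r, n.ahead = h)$pred)

# Recombine into group forecasts; the gap is 2 * fc_r by construction
fc_m <- fc_p + fc_r
fc_f <- fc_p - fc_r
gap <- fc_m - fc_f
```

Because fc_r reverts to the estimated mean of log_r, gap approaches twice that mean as the horizon grows, which is the sense in which the two forecasts are coherent. Forecasting log_m and log_f separately with trending models would not give this guarantee.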

 

Plotting the characteristic roots for ARIMA models

Published on 23 July 2014

When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models).
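The functions from the full post are not reproduced in this excerpt. As a rough stand-in, the sketch below shows one way the inverse roots could be computed and plotted in base R for a model fitted with stats::arima(). The function names inverse_roots and plot_inverse_roots are hypothetical. The sketch relies on the fitted object's model$phi and model$theta components, which hold the expanded AR and MA coefficients with any seasonal terms multiplied out.

```r
# Hypothetical helper: inverse AR and MA roots of a stats::arima() fit
inverse_roots <- function(fit) {
  # Drop trailing (near-)zero coefficients; arima() can pad theta with zeros
  trim <- function(x) {
    nz <- which(abs(x) > 1e-8)
    if (length(nz) == 0) numeric(0) else x[seq_len(max(nz))]
  }
  phi <- trim(fit$model$phi)      # AR polynomial: 1 - phi_1 z - ... - phi_p z^p
  theta <- trim(fit$model$theta)  # MA polynomial: 1 + theta_1 z + ... + theta_q z^q
  list(
    ar = if (length(phi)) 1 / polyroot(c(1, -phi)) else complex(0),
    ma = if (length(theta)) 1 / polyroot(c(1, theta)) else complex(0)
  )
}

# Hypothetical helper: draw the unit circle and mark the inverse roots
plot_inverse_roots <- function(roots, main = "Inverse roots") {
  angle <- seq(0, 2 * pi, length.out = 200)
  plot(cos(angle), sin(angle), type = "l", asp = 1,
       xlab = "Real", ylab = "Imaginary", main = main)
  abline(h = 0, v = 0, col = "grey")
  points(Re(roots), Im(roots), pch = 19)
}

# Example on a built-in series
fit <- arima(lh, order = c(3, 0, 0))
roots <- inverse_roots(fit)
```

Calling plot_inverse_roots(roots$ar) then shows the AR inverse roots, all of which should lie inside the unit circle for a stationary fit. The same check applied to roots$ma indicates invertibility.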

 

I am not an econometrician

Published on 21 July 2014

I am a statistician, but I have worked in a department of predominantly econometricians for the past 17 years. It is a little like an Australian visiting the United States. Initially, it seems that we talk the same language, do the same sorts of things, and have a very similar culture. But the longer you stay there, the more you realise there are differences that run deep and affect the way you see the world. Last week at my research group meeting, I spoke about some of the differences I have noticed. Coincidentally, Andrew Gelman blogged about the same issue a day later.

 

Variations on rolling forecasts

Published on 16 July 2014

Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models.
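The code from the full post is not included in this excerpt. As a small illustration of the idea, the base-R sketch below computes one-step rolling forecast errors for an AR(1) model, re-estimating the model at each forecast origin. The function name rolling_errors, the minimum training size k and the choice of model are assumptions made for the example. Two variations are shown: an expanding training window and a fixed-length one.

```r
# Hypothetical helper: one-step rolling-origin forecast errors for an AR(1)
rolling_errors <- function(y, k, expanding = TRUE) {
  n <- length(y)
  err <- numeric(n - k)
  for (i in seq_len(n - k)) {
    # Expanding window starts at observation 1; fixed window slides forward
    first <- if (expanding) 1 else i
    train <- y[first:(k + i - 1)]
    fit <- arima(train, order = c(1, 0, 0), method = "ML")
    # Compare the one-step forecast with the next observed value
    err[i] <- y[k + i] - predict(fit, n.ahead = 1)$pred[1]
  }
  err
}

rmse <- function(e) sqrt(mean(e^2))

# Example on a built-in series of 48 observations, first origin at t = 30
err_expanding <- rolling_errors(lh, k = 30)
err_fixed <- rolling_errors(lh, k = 30, expanding = FALSE)
accuracy <- c(expanding = rmse(err_expanding), fixed = rmse(err_fixed))
```

Swapping the arima() call for another univariate model is the only change needed to compare models on the same set of forecast origins.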

 

Varian on big data

Published on 16 June 2014

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28. It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike and slab regression. Not enough to be able to use them, but OK if you’ve no idea what they are.

 

Specifying complicated groups of time series in hts

Published on 15 June 2014

With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily. All aggregation structures can be represented as hierarchies or as cross-products of hierarchies. For example, a hierarchical time series may be based on geography: country, state, region, store. Often there is also a separate product hierarchy: product groups, product types, packet size. Forecasts of all the different types of aggregation are required; e.g., product type A within region X. The aggregation structure is a cross-product of the two hierarchies. This framework includes even apparently non-hierarchical data: consider the simple case of a time series of deaths split by sex and state. We can consider sex and state as two very simple hierarchies with only one level each. Then we wish to forecast the aggregates of all combinations of the two hierarchies. Any number of separate hierarchies can be combined in this way. Non-hierarchical factors such as sex can be treated as single-level hierarchies.
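The deaths-by-sex-and-state case can be sketched in base R without the hts package. The code below is an illustration of the aggregation structure only, not the package's interface, and the counts are synthetic. It builds a summing matrix whose rows define the total, each sex, each state, and each sex-by-state combination, then applies it to the bottom-level series to obtain every aggregate that would need forecasting.

```r
sex <- c("F", "M")
state <- c("NSW", "VIC", "QLD")

# Bottom level: every sex-by-state combination (6 series)
bottom <- expand.grid(sex = sex, state = state, stringsAsFactors = FALSE)

# Summing matrix: one row per series of interest, one column per bottom series
S <- rbind(
  rep(1, nrow(bottom)),                                        # total
  t(sapply(sex, function(s) as.numeric(bottom$sex == s))),     # one row per sex
  t(sapply(state, function(s) as.numeric(bottom$state == s))), # one row per state
  diag(nrow(bottom))                                           # bottom level itself
)
rownames(S) <- c("Total", paste0("sex:", sex), paste0("state:", state),
                 paste(bottom$sex, bottom$state, sep = "/"))

# Synthetic bottom-level counts: 10 time periods by 6 series (illustrative only)
set.seed(1)
y_bottom <- matrix(rpois(10 * nrow(bottom), lambda = 50), nrow = 10)

# All 12 series at once: each column is one aggregate over time
all_series <- y_bottom %*% t(S)
```

Adding a further grouping factor only adds more indicator rows to S, which is why any number of single-level hierarchies can be combined in this way.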

 