A blog by Rob J Hyndman 


Posts Tagged ‘statistics’:

Coherent population forecasting using R

Published on 24 July 2014

This is an example of how to use the demography package in R for stochastic population forecasting with coherent components. It is based on the papers by Hyndman and Booth (IJF 2008) and Hyndman, Booth and Yasmeen (Demography 2013). I will use Australian data from 1950 to 2009 and forecast the next 50 years. In demography, “coherent” forecasts are those where males and females (or other subgroups) do not diverge over time. (Essentially, we require the difference between the groups to be stationary.) When we wrote the 2008 paper, we did not know how to constrain the forecasts to be coherent in a functional data context, so this was not discussed. My later 2013 paper provided a way of imposing coherence. This blog post shows how to implement both ideas using R.
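The coherent step is implemented in the demography package as coherentfdm(), which models the geometric mean (product) of the group rates and the male/female ratios separately, with stationary models for the ratios so the sexes cannot diverge. A minimal sketch, assuming demography is installed; it substitutes the package’s built-in fr.mort (French mortality) data for the Australian data used in the post:

```r
library(demography)  # assumed installed from CRAN

# Restrict to 1950 onwards, echoing the post's sample period
mort <- extract.years(fr.mort, years = 1950:2006)

# Smooth the mortality rates before fitting functional models
mort.sm <- smooth.demogdata(mort)

# Product-ratio method (Hyndman, Booth & Yasmeen 2013):
# coherent functional models for male and female mortality
fit <- coherentfdm(mort.sm)

# 50-year coherent forecasts, as in the post
fc <- forecast(fit, h = 50)
plot(fc$male)
```

The component names of the forecast object (e.g. fc$male) follow the series names in the demogdata object.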


Plotting the characteristic roots for ARIMA models

Published on 23 July 2014

When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models).
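The post’s own functions are not reproduced here, but the underlying computation can be sketched in base R. The model order and the built-in lh series are illustrative only; the inverse roots are the reciprocals of the roots of the AR and MA polynomials, and lie inside the unit circle for a stationary, invertible model:

```r
# Fit a small ARIMA model to a built-in example series
fit <- arima(lh, order = c(1, 0, 1))

# AR and MA coefficients (seasonal terms, if any, are already expanded here)
phi   <- fit$model$phi
theta <- fit$model$theta

# The AR polynomial is 1 - phi_1 z - ... - phi_p z^p,
# the MA polynomial is 1 + theta_1 z + ... + theta_q z^q.
# Inverse roots are reciprocals of the polynomial roots.
inv.ar.roots <- 1 / polyroot(c(1, -phi))
inv.ma.roots <- 1 / polyroot(c(1, theta))

# Plot the inverse roots in the complex plane with the unit circle
plot(inv.ar.roots, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
     xlab = "Real", ylab = "Imaginary")
points(inv.ma.roots, pch = 2)
symbols(0, 0, circles = 1, inches = FALSE, add = TRUE)
```

Because arima() constrains the AR parameters to a stationary region by default, the inverse AR roots always lie inside the unit circle.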


I am not an econometrician

Published on 21 July 2014

I am a statistician, but I have worked in a department of predominantly econometricians for the past 17 years. It is a little like an Australian visiting the United States. Initially, it seems that we talk the same language, do the same sorts of things, and have a very similar culture. But the longer you stay there, the more you realise there are differences that run deep and affect the way you see the world. Last week at my research group meeting, I spoke about some of the differences I have noticed. Coincidentally, Andrew Gelman blogged about the same issue a day later.


Variations on rolling forecasts

Published on 16 July 2014

Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models.
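One common variation, re-estimating the model at each forecast origin and producing one-step forecasts, can be sketched in base R. The series, model order, and starting origin below are illustrative only:

```r
# Rolling one-step forecasts with re-estimation at each origin
y     <- log(lynx)   # built-in example series
n     <- length(y)
start <- 100         # first forecast origin

fc <- numeric(n - start)
for (i in seq_along(fc)) {
  # Re-fit the model using data up to the current origin only
  fit   <- arima(y[1:(start + i - 1)], order = c(2, 0, 0))
  # One-step-ahead forecast from that origin
  fc[i] <- predict(fit, n.ahead = 1)$pred
}

# Out-of-sample accuracy over the rolling forecasts
rmse <- sqrt(mean((y[(start + 1):n] - fc)^2))
```

Other variations (fixed parameters with an updated information set, or a fixed-width rolling window) change only which observations are passed to arima() at each step.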


Varian on big data

Published on 16 June 2014

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28. It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but OK if you’ve no idea what they are.


Specifying complicated groups of time series in hts

Published on 15 June 2014

With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily. All aggregation structures can be represented as hierarchies or as cross-products of hierarchies. For example, a hierarchical time series may be based on geography: country, state, region, store. Often there is also a separate product hierarchy: product groups, product types, packet size. Forecasts of all the different types of aggregation are required; e.g., product type A within region X. The aggregation structure is a cross-product of the two hierarchies.

This framework includes even apparently non-hierarchical data: consider the simple case of a time series of deaths split by sex and state. We can consider sex and state as two very simple hierarchies with only one level each. Then we wish to forecast the aggregates of all combinations of the two hierarchies. Any number of separate hierarchies can be combined in this way. Non-hierarchical factors such as sex can be treated as single-level hierarchies.
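The deaths-by-sex-and-state case can be sketched with gts() from the hts package. A minimal sketch, assuming hts is installed; the data and the sex/state labels are made up for illustration:

```r
library(hts)  # assumed installed from CRAN

# Four bottom-level series: two sexes crossed with two states
bts <- ts(5 + matrix(sort(rnorm(80)), ncol = 4, nrow = 20))

# One row per grouping variable, one column per bottom-level series;
# columns correspond to (M,A), (M,B), (F,A), (F,B)
g <- rbind(Sex   = c(1, 1, 2, 2),
           State = c(1, 2, 1, 2))

y <- gts(bts, groups = g)

# allts() returns every aggregate: the total, the two sex totals,
# the two state totals, and the four bottom-level series (9 in all)
ncol(allts(y))

# Reconciled forecasts across all groupings
fc <- forecast(y, h = 4)
```

Each extra grouping variable is just another row of the groups matrix, so any number of single-level hierarchies can be crossed this way.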


European talks. June–July 2014

Published on 14 June 2014

For the next month I am travelling in Europe and will be giving the following talks.

17 June. Challenges in forecasting peak electricity demand. Energy Forum, Sierre, Valais/Wallis, Switzerland.
20 June. Common functional principal component models for mortality forecasting. International Workshop on Functional and Operatorial Statistics, Stresa, Italy.
24–25 June. Functional time series with applications in demography. Humboldt University, Berlin.
1 July. Fast computation of reconciled forecasts in hierarchical and grouped time series. International Symposium on Forecasting, Rotterdam, Netherlands.


Data science market places

Published on 26 May 2014

Some new websites are being established offering “market places” for data science. Two I’ve come across recently are Experfy and SnapAnalytx.


Structural breaks

Published on 23 May 2014

I’m tired of reading about tests for structural breaks, and here’s why. A structural break occurs when we see a sudden change in a time series or a relationship between two time series. Econometricians love papers on structural breaks, and apparently believe in them. Personally, I tend to take a different view of the world. I think a more realistic view is that most things change slowly over time, and only occasionally with sudden discontinuous change.


To explain or predict?

Published on 19 May 2014

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read, as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered, amongst other things.

The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.

P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)

Multicollinearity has a very different impact when your goal is prediction than when your goal is estimation. When predicting, multicollinearity is not really a problem, provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.

An ARIMA model has no explanatory use, but is great at short-term prediction.
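The AIC/cross-validation connection is easy to illustrate in base R for linear regression, where leave-one-out errors come directly from the hat values (the PRESS statistic). The models and data below are illustrative only:

```r
# Leave-one-out CV mean squared error for a linear model, via hat values:
# the LOO residual for observation i is e_i / (1 - h_i)
loo.mse <- function(fit) {
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

fit1 <- lm(mpg ~ wt,      data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Compare the two selection criteria on the same pair of models
c(AIC(fit1), AIC(fit2))
c(loo.mse(fit1), loo.mse(fit2))
```

On this example both criteria rank the models the same way, reflecting the asymptotic equivalence of AIC and leave-one-out cross-validation for model selection.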

