A blog by Rob J Hyndman 


Posts Tagged ‘statistics’:

Seven forecasting blogs

Published on 22 April 2014

There are several other blogs on forecasting that readers might be interested in. Here are seven worth following:

- No Hesitations by Francis Diebold (Professor of Economics, University of Pennsylvania). Diebold needs no introduction to forecasters. He primarily covers forecasting in economics and finance, but also xkcd cartoons, graphics, research issues, etc.
- Econometrics Beat by Dave Giles. Dave is a professor of economics at the University of Victoria (Canada), formerly from my own department at Monash University (Australia), and a native New Zealander. Not a lot on forecasting, but plenty of interesting posts about econometrics and statistics more generally.
- Business forecasting by Clive Jones (a professional forecaster based in Colorado, USA). Originally about sales and new product forecasting, but he now covers a lot of other forecasting topics and has an interesting practitioner perspective.
- Freakonometrics by Arthur Charpentier (an actuary and professor of mathematics at the University of Quebec at Montréal, Canada). This is the most prolific blog on this list. Wide-ranging, taking in statistics, forecasting, econometrics, actuarial science, R, and anything else that takes his fancy. Sometimes in French.
- No Free Hunch: the kaggle blog. Some of the most interesting posts are from kaggle competition winners explaining their methods.
- Energy forecasting by Tao Hong (formerly an energy forecaster for



Errors on percentage errors

Published on 16 April 2014

The MAPE (mean absolute percentage error) is a popular measure of forecast accuracy and is defined as

MAPE = 100 mean(|y_t − ŷ_t| / |y_t|),

where y_t denotes an observation and ŷ_t denotes its forecast, and the mean is taken over t. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
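The definition, and the asymmetry Armstrong describes, can be illustrated with a minimal pure-Python sketch (the helper names `ape` and `mape` are mine, not from any forecasting package):

```python
# Minimal sketch of the MAPE and its asymmetry; ape() and mape() are
# hypothetical helper names, not from any package.

def ape(actual, forecast):
    """Absolute percentage error of a single forecast, in percent."""
    return 100 * abs(actual - forecast) / abs(actual)

def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    errors = [ape(a, f) for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)

# The same absolute error of 50 attracts different penalties:
over = ape(100, 150)   # forecast sits above the actual: 50.0%
under = ape(150, 100)  # forecast sits below the actual: 33.3%
```

Because the error is scaled by the actual value, a forecast that falls below the observation incurs a smaller percentage penalty than one the same distance above it, so a method tuned to minimise MAPE is pulled towards low forecasts.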


Job at Center for Open Science

Published on 8 April 2014

This looks like an interesting job.

Dear Dr. Hyndman,

I write from the Center for Open Science, a non-profit organization based in Charlottesville, Virginia in the United States, which is dedicated to improving the alignment between scientific values and scientific practices. We are dedicated to open source and open science. We are reaching out to you to find out if you know anyone who might be interested in our Statistical and Methodological Consultant position. The position is a unique opportunity to consult on reproducible best practices in data analysis and research design; the consultant will make short visits to provide lectures and training at universities, laboratories, and conferences, and through virtual media. An especially unique part of the job involves collaborating with the White House’s Office of Science and Technology Policy on matters relating to reproducibility. If you know someone with substantial training and experience in scientific research, quantitative methods, reproducible research practices, and some programming experience (at least R, ideally Python or Julia), might you please pass this along to them? Anyone may find out more about the job or apply via our website: http://centerforopenscience.org/jobs/#stats The position is full-time and located at our office in beautiful Charlottesville, VA. Thanks in advance for your time



Interpreting noise

Published on 6 April 2014

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise. For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.” What is going on here is that the commentators are assuming we live in a noise-free world. They imagine that everything is explicable; you just have to find the explanation. However, the world is noisy: real data are subject to random fluctuations, and are often also measured inaccurately. So to interpret every little fluctuation is silly and misleading.


Fast computation of cross-validation in linear models

Published on 17 March 2014

The leave-one-out cross-validation statistic is given by

CV = (1/N) Σ_{i=1}^{N} [y_i − ŷ_{(i)}]²,

where y_1, …, y_N are the observations, and ŷ_{(i)} is the predicted value obtained when the model is estimated with the ith case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic. It turns out that for linear models, we do not actually have to estimate the model N times, once for each omitted case. Instead, CV can be computed after estimating the model once on the complete data set.
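The standard shortcut for linear models is CV = (1/N) Σ [e_i/(1 − h_i)]², where e_i are the ordinary residuals and h_i the leverages (the diagonal of the hat matrix). A pure-Python sketch for simple linear regression on hypothetical data, checked against the brute-force leave-one-out computation:

```python
# Sketch of the PRESS/CV shortcut for simple linear regression (pure Python).
# For a linear model, CV = (1/N) * sum_i (e_i / (1 - h_i))^2, so the model
# need only be fitted once rather than N times.

def fit(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    return a, b

def cv_shortcut(xs, ys):
    """CV from one fit, using residuals and leverages."""
    n = len(xs)
    a, b = fit(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        e = y - (a + b * x)                 # ordinary residual
        h = 1 / n + (x - xbar) ** 2 / sxx   # leverage of this observation
        total += (e / (1 - h)) ** 2
    return total / n

def cv_brute_force(xs, ys):
    """CV by refitting the model with each case deleted in turn."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        a, b = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / n

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.7, 12.3]
shortcut = cv_shortcut(xs, ys)
brute = cv_brute_force(xs, ys)   # identical, up to rounding
```

The two routes agree exactly (up to floating-point rounding), which is the point of the post: for linear models the expensive refitting loop is unnecessary.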


Probabilistic forecasting by Gneiting and Katzfuss (2014)

Published on 14 March 2014

The IJF is introducing occasional review papers on areas of forecasting. We did a whole issue in 2006 reviewing 25 years of research since the International Institute of Forecasters was established. Since then, there has been a lot of new work in application areas such as call center forecasting and electricity price forecasting. In addition, there are areas we did not cover in 2006, including new product forecasting and forecasting in finance. There have also been methodological and theoretical developments over the last eight years. Consequently, I’ve started inviting eminent researchers to write survey papers for the journal. One obvious choice was Tilmann Gneiting, who has produced a large body of excellent work on probabilistic forecasting in the last few years. The theory of forecasting was badly in need of development, and Tilmann and his coauthors have made several great contributions in this area. However, when I asked him to write a review, he explained that another journal had got in before me, and that the review was already written. It appeared in the very first volume of the new journal Annual Review of Statistics and Its Application: Gneiting and Katzfuss (2014), Probabilistic Forecasting, pp.125–151. Having now read it, I’m both grateful for this more accessible



Testing for trend in ARIMA models

Published on 13 March 2014

Today’s email brought this one:

I was wondering if I could get your opinion on a particular problem that I have run into during the reviewing process of an article. Basically, I have an analysis where I am looking at a couple of time series and I wanted to know if, over time, there was an upward trend in the series. Inspection of the raw data suggests there is, but we want some statistical evidence for this. To achieve this I ran some ARIMA(0,1,1) models including a drift/trend term to see if the mean of the series did indeed shift upwards with time, and found that it did. However, we have run into an issue with a reviewer who argues that differencing removes trends and may not be a suitable way to detect trends. Therefore, the fact that we found a trend despite differencing suggests that differencing was not successful. I know there are a few papers and textbooks that use ARIMA(0,1,1) models as ‘random walks with drift’-type models, so I cited them as examples of this procedure in action, but they remained unconvinced. Instead it was suggested that I look for trends in the raw undifferenced time series, as these would be more reliable as no trends had been removed. At the moment I am hesitant to do this
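The reviewer’s worry can be checked directly: differencing does not destroy a trend, it converts it into a nonzero mean, which is exactly what the drift term estimates. A pure-Python sketch with a hypothetical noise-free series y_t = a + b·t:

```python
# Sketch: differencing a linear trend y_t = a + b*t yields a series whose
# mean equals the slope b, so a drift term in the differenced model still
# detects the trend. The series here is hypothetical and noise-free.

slope = 0.5
trend = [2.0 + slope * t for t in range(100)]             # y_t = a + b*t
diffs = [trend[t] - trend[t - 1] for t in range(1, 100)]  # first differences
drift = sum(diffs) / len(diffs)                           # recovers b exactly
```

With noisy data the differenced series has mean b plus noise, and testing whether the drift differs from zero is a legitimate test for trend.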



Unit root tests and ARIMA models

Published on 12 March 2014

An email I received today:

I have a small problem. I have a time series called x. If I use the default values of auto.arima(x), the best model is an ARIMA(1,0,0). However, I tried the functions ndiffs(x, test="adf") and ndiffs(x, test="kpss"), as the KPSS test seems to be the default, and the number of differences is 0 for the KPSS test (consistent with the results of auto.arima()) but 2 for the ADF test. I then tried auto.arima(x, test="adf") and now I have another model, ARIMA(1,2,1). I am unsure which order of integration I should use, as the tests give fairly different results. Is there a test that prevails?


Forecasting weekly data

Published on 5 March 2014

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 ≈ 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
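A minimal pure-Python sketch of what such Fourier terms look like; `fourier_terms()` is a hypothetical helper, loosely modelled on `fourier()` in the forecast package, and the period need not be an integer:

```python
# Sketch of Fourier terms for a non-integer seasonal period m = 365.25/7.
# fourier_terms() is a hypothetical helper name; the resulting columns
# serve as seasonal regressors in, e.g., a regression with ARMA errors.

import math

def fourier_terms(n, m, K):
    """Return K sine/cosine column pairs of period m for times t = 1..n."""
    cols = []
    for k in range(1, K + 1):
        cols.append([math.sin(2 * math.pi * k * t / m) for t in range(1, n + 1)])
        cols.append([math.cos(2 * math.pi * k * t / m) for t in range(1, n + 1)])
    return cols

m = 365.25 / 7                    # about 52.18 weeks per year
X = fourier_terms(104, m, K=3)    # two years of weekly data, 3 harmonics
```

The number of harmonics K controls how wiggly the fitted seasonal pattern can be, and can be chosen by minimising an information criterion such as the AICc.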


Fitting models to short time series

Published on 4 March 2014

Following my post on fitting models to long time series, I thought I’d tackle the opposite problem, which is more common in business environments. I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data: the sample size required increases with the number of parameters and with the amount of noise.
