A blog by Rob J Hyndman 


Posts Tagged ‘statistics’:


Errors on percentage errors

Published on 16 April 2014

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as $\text{MAPE} = 100\,\text{mean}(|y_t - \hat{y}_t| / |y_t|)$, where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
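A minimal sketch of the computation in R (the vectors y and yhat below are illustrative, not from the post):

    # Hypothetical observations y_t and forecasts yhat_t
    y    <- c(110, 105, 98, 120)
    yhat <- c(100, 110, 100, 115)
    mape <- 100 * mean(abs((y - yhat) / y))
    mape  # about 5.02 for these values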

 
6 Comments

Job at Center for Open Science

Published on 8 April 2014

This looks like an interesting job.

Dear Dr. Hyndman,

I write from the Center for Open Science, a non-profit organization based in Charlottesville, Virginia in the United States, which is dedicated to improving the alignment between scientific values and scientific practices. We are dedicated to open source and open science.

We are reaching out to you to find out if you know anyone who might be interested in our Statistical and Methodological Consultant position. The position is a unique opportunity to consult on reproducible best practices in data analysis and research design; the consultant will make short visits to provide lectures and training at universities, laboratories, conferences, and through virtual mediums. An especially unique part of the job involves collaborating with the White House’s Office of Science and Technology Policy on matters relating to reproducibility.

If you know someone with substantial training and experience in scientific research, quantitative methods, reproducible research practices, and some programming experience (at least R, ideally Python or Julia), might you please pass this along to them? Anyone may find out more about the job or apply via our website: http://centerforopenscience.org/jobs/#stats

The position is full-time and located at our office in beautiful Charlottesville, VA. Thanks in advance for your time

(More)…

 
No Comments

Interpreting noise

Published on 6 April 2014

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise. For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.” What is going on here is that the commentators are assuming we live in a noise-free world. They imagine that everything is explicable; you just have to find the explanation. However, the world is noisy: real data are subject to random fluctuations, and are often also measured inaccurately. So to interpret every little fluctuation is silly and misleading.

 
2 Comments

Fast computation of cross-validation in linear models

Published on 17 March 2014

The leave-one-out cross-validation statistic is given by $\text{CV} = \frac{1}{N}\sum_{i=1}^{N} e_{[i]}^2$, where $e_{[i]} = y_i - \hat{y}_{[i]}$, the observations are $y_1,\dots,y_N$, and $\hat{y}_{[i]}$ is the predicted value obtained when the model is estimated with the $i$th case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic. It turns out that, for linear models, we do not actually have to estimate the model $N$ times, once for each omitted case. Instead, CV can be computed after estimating the model once on the complete data set.
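A minimal sketch in R, using the built-in cars data (my example, not the post's): the identity $e_{[i]} = e_i/(1 - h_i)$, where $h_i$ is the $i$th diagonal of the hat matrix, gives CV from a single fit.

    # Fast LOOCV for a linear model, without N refits
    fit <- lm(dist ~ speed, data = cars)
    e   <- residuals(fit)
    h   <- hatvalues(fit)
    cv_fast <- mean((e / (1 - h))^2)

    # Brute-force check: refit with each case deleted
    n <- nrow(cars)
    cv_slow <- mean(sapply(seq_len(n), function(i) {
      f <- lm(dist ~ speed, data = cars[-i, ])
      (cars$dist[i] - predict(f, newdata = cars[i, ]))^2
    }))
    all.equal(cv_fast, cv_slow)  # TRUE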

 
6 Comments

Probabilistic forecasting by Gneiting and Katzfuss (2014)

Published on 14 March 2014

The IJF is introducing occasional review papers on areas of forecasting. We did a whole issue in 2006 reviewing 25 years of research since the International Institute of Forecasters was established. Since then, there has been a lot of new work in application areas such as call center forecasting and electricity price forecasting. In addition, there are areas we did not cover in 2006, including new product forecasting and forecasting in finance. There have also been methodological and theoretical developments over the last eight years. Consequently, I’ve started inviting eminent researchers to write survey papers for the journal. One obvious choice was Tilmann Gneiting, who has produced a large body of excellent work on probabilistic forecasting in the last few years. The theory of forecasting was badly in need of development, and Tilmann and his coauthors have made several great contributions in this area. However, when I asked him to write a review he explained that another journal had got in before me, and that the review was already written. It appeared in the very first volume of the new journal Annual Review of Statistics and Its Application: Gneiting and Katzfuss (2014), Probabilistic Forecasting, pp.125–151. Having now read it, I’m both grateful for this more accessible

(More)…

 
1 Comment

Testing for trend in ARIMA models

Published on 13 March 2014

Today’s email brought this one:

I was wondering if I could get your opinion on a particular problem that I have run into during the reviewing process of an article. Basically, I have an analysis where I am looking at a couple of time series and I wanted to know if, over time, there was an upward trend in the series. Inspection of the raw data suggests there is, but we want some statistical evidence for this. To achieve this I ran some ARIMA(0,1,1) models including a drift/trend term to see if the mean of the series did indeed shift upwards with time, and found that it did. However, we have run into an issue with a reviewer who argues that differencing removes trends and may not be a suitable way to detect trends. Therefore, the fact that we found a trend despite differencing suggests that differencing was not successful. I know there are a few papers and textbooks that use ARIMA(0,1,1) models as ‘random walks with drift’-type models, so I cited them as examples of this procedure in action, but they remained unconvinced. Instead it was suggested that I look for trends in the raw undifferenced time series, as these would be more reliable since no trends had been removed. At the moment I am hesitant to do this

(More)…
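For concreteness, a minimal sketch of the kind of model being described, using the forecast package with a simulated series standing in for the correspondent's data:

    library(forecast)
    set.seed(1)
    x <- ts(cumsum(rnorm(100, mean = 0.2)))  # random walk with drift
    fit <- Arima(x, order = c(0, 1, 1), include.drift = TRUE)
    summary(fit)  # the "drift" coefficient and its s.e. quantify the trend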

 
10 Comments

Unit root tests and ARIMA models

Published on 12 March 2014

An email I received today:

I have a small problem. I have a time series called x:
- If I use the default values of auto.arima(x), the best model is an ARIMA(1,0,0).
- However, I tried the functions ndiffs(x, test="adf") and ndiffs(x, test="kpss"), as the KPSS test seems to be the default, and the number of differences is 0 for the KPSS test (consistent with the results of auto.arima()) but 2 for the ADF test. I then tried auto.arima(x, test="adf") and now I have another model, an ARIMA(1,2,1).

I am unsure which order of integration I should use, as the tests give fairly different results. Is there a test that prevails?
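A minimal sketch of the calls being compared, with a simulated series standing in for the correspondent's x:

    library(forecast)
    set.seed(42)
    x <- arima.sim(model = list(ar = 0.5), n = 200)  # stationary AR(1)
    ndiffs(x, test = "kpss")     # the default test in auto.arima()
    ndiffs(x, test = "adf")      # may disagree with the KPSS result
    auto.arima(x)                # selects d using the KPSS test
    auto.arima(x, test = "adf")  # selects d using the ADF test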

 
No Comments

Forecasting weekly data

Published on 5 March 2014

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 ≈ 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
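A minimal sketch of the Fourier-terms approach with the forecast package (simulated weekly data; in practice x is your own series and the upper limit on K may be larger):

    library(forecast)
    set.seed(1)
    tt <- seq(300)
    x <- ts(10 * sin(2 * pi * tt * 7 / 365.25) + rnorm(300),
            frequency = 365.25 / 7)  # simulated weekly series
    # Choose the number of Fourier pairs K by AICc, with ARMA errors
    bestfit <- list(aicc = Inf)
    for (K in 1:10) {
      fit <- auto.arima(x, xreg = fourier(x, K = K), seasonal = FALSE)
      if (fit$aicc < bestfit$aicc) {
        bestfit <- fit
        bestK <- K
      }
    }
    fc <- forecast(bestfit, xreg = fourier(x, K = bestK, h = 104))  # 2 years ahead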

 
10 Comments

Fitting models to short time series

Published on 4 March 2014

Following my post on fitting models to long time series, I thought I’d tackle the opposite problem, which is more common in business environments. I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data. The sample size required increases with the number of parameters to be estimated, and the amount of noise in the data.
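As a minimal sketch (an invented eight-point series, not from the post), AICc-based selection guards against over-parameterization on short series:

    library(forecast)
    x <- ts(c(23, 27, 25, 31, 30, 34, 33, 37))  # eight observations
    fit <- auto.arima(x)  # AICc penalizes extra parameters heavily here
    forecast(fit, h = 4)  # a very simple model is typically selected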

 
1 Comment

Fitting models to long time series

Published on 1 March 2014

I received this email today:

I recall you made this very insightful remark somewhere that fitting a standard ARIMA model with too much data, i.e. a very long time series, is a bad idea. Can you elaborate why? I can see the issue with noise, which compounds the ML estimation as the series gets too long. But is there anything else?

I’m not sure where I made a comment about this, but it is true that ARIMA models don’t work well for very long time series. The same can be said about almost any other model too. The problem is that real data do not come from the models we use. When the number of observations is not large (say up to about 200) the models often work well as an approximation to whatever process generated the data. But eventually you will have enough data that the difference between the true process and the model starts to become more obvious. An additional problem is that the optimization of the parameters becomes more time consuming because of the number of observations involved. What to do about these issues depends on the purpose of the model. A more flexible nonparametric model could be used, but this still assumes that the model

(More)…
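A minimal sketch of the estimation-time point (simulated data; timings are machine-dependent):

    library(forecast)
    set.seed(1)
    for (n in c(200, 2000, 20000)) {
      x <- arima.sim(model = list(ar = 0.8, ma = 0.3), n = n)
      print(system.time(Arima(x, order = c(1, 0, 1))))  # grows with n
    }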

 
1 Comment