A blog by Rob J Hyndman 


Posts Tagged ‘statistics’:


Blog aggregators

Published on 15 May 2012

A very useful way of keeping up with blogs in a particular area is to subscribe to a blog aggregator. These will syndicate posts from a large number of blogs and provide links back to the original sources. So you only need to subscribe once to get all the good stuff in that area. There are now several blog aggregators available that might be of interest to readers here. And this blog is now syndicated on several other sites including those listed below.

 

Measuring time series characteristics

Published on 2 May 2012

A few years ago, I was working on a project where we measured various characteristics of a time series and used the information to determine what forecasting method to apply or how to cluster the time series into meaningful groups. The two main papers to come out of that project were:

Wang, Smith and Hyndman (2006) “Characteristic-based clustering for time series data”, Data Mining and Knowledge Discovery, 13(3), 335–364.

Wang, Smith-Miles and Hyndman (2009) “Rule induction for forecasting method selection: meta-learning the characteristics of univariate time series”, Neurocomputing, 72, 2581–2594.

I’ve since had a lot of requests for the code which one of my coauthors has been helpfully emailing to anyone who asked. But to make it easier, we thought it might be helpful if I post some updated code here. This is not the same as the R code we used in the paper, as I’ve improved it in several ways (so it will give different results). If you just want the code, skip to the bottom of the post.
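As a rough illustration of what measuring characteristics of a time series can look like, here is a minimal Python sketch. It is not the R code referred to above, and the features chosen (mean, lag-1 autocorrelation, skewness and the slope of a fitted linear trend) are assumptions made for this example rather than the set used in the papers. The function name `series_features` is hypothetical.

```python
def series_features(y):
    """Return a few simple summary characteristics of a numeric series.

    Illustrative only: the feature set is an assumption for this sketch.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")

    mean = sum(y) / n
    dev = [v - mean for v in y]          # deviations from the mean
    ss = sum(d * d for d in dev)         # sum of squared deviations
    if ss == 0:
        raise ValueError("series is constant")
    variance = ss / n

    # Lag-1 autocorrelation: how strongly each value tracks the previous one.
    acf1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / ss

    # Skewness: asymmetry of the distribution of values.
    skewness = (sum(d ** 3 for d in dev) / n) / variance ** 1.5

    # Slope of the least-squares line against the time index 0, 1, ..., n-1.
    t_mean = (n - 1) / 2
    sxy = sum((t - t_mean) * d for t, d in enumerate(dev))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    trend_slope = sxy / sxx

    return {
        "mean": mean,
        "acf1": acf1,
        "skewness": skewness,
        "trend_slope": trend_slope,
    }


example = [1.0, 2.0, 3.0, 4.0, 5.0]
features = series_features(example)
print(features)
```

Each series is reduced to a short vector of numbers, which could then be passed to a clustering routine or to a rule that picks a forecasting method.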

 

Data visualization

Published on 5 March 2012

For those who have not read the seminal works of Tufte and Cleveland, please hang your heads in shame. To salvage some sense of self-worth, you can then head over to Solomon Messing’s blog where he is starting a series on data visualization based on the principles developed by Tufte and Cleveland (with R examples). The classics are also worth reading, and remain relevant despite the 20 or 30 years that have elapsed since they appeared.

 

Internet surveys

Published on 19 January 2012

I received the following email today:

“I am preparing a thesis … I need to conduct the widest possible poll, and it occurred to me that perhaps you could guide me toward an internet-based way in which this can be done easily. I have a ten-question questionnaire prepared, that I wish to have a random sample of the population respond to. I have no budget for this, so I hope you can suggest a way in which a good number of responses can be harvested using blogs or sites you may be aware of.”

Here is my response.

 

Cyclic and seasonal time series

Published on 14 December 2011

These terms get confused all the time (e.g., this question on CrossValidated.com), and so I thought it might be helpful to try to summarize the distinction and some of the associated models.

 

What you wish you knew before you started a PhD

Published on 11 November 2011

I asked my research group recently what they wished they had learned before they started work on a PhD. Here are some of the responses.

 

Learn Machine Learning at Stanford for free

Published on 16 August 2011

Andrew Ng’s machine learning course at Stanford is being offered free to anyone online in the (northern) fall of 2011. I’ve seen some of the notes from this course and it looks to be an excellent broad introduction to machine learning and data mining. Topics include support vector machines, neural networks, kernels, clustering, dimension reduction, etc.

 

Ten rules for data analysis

Published on 15 March 2011

Peter Kennedy was an associate editor of the International Journal of Forecasting and a superb applied econometrician. He died unexpectedly in August 2010. He was best known for his excellent book A Guide to Econometrics as well as his “Ten Commandments of Applied Econometrics”. He provided a variation on his ten commandments in advice to his students in the form of the following ten rules:

 

Statistical tests for variable selection

Published on 15 March 2011

I received an email today with the following comment:

“I’m using ARIMA with Intervention detection and was planning to use your package to identify my initial ARIMA model for later iteration, however I found that sometimes the auto.arima function returns a model where AR/MA coefficients are not significant. So my question is: Is there a way to filter the search for ARIMA models that only have significant coefficients. I can remove the non-significant coefficients but I think it would be better to search for those models that only have significant coefficients.”

Statistical significance is not usually a good basis for determining whether a variable should be included in a model, despite the fact that many people who should know better use it for exactly this purpose. Even some textbooks discuss variable selection using statistical tests, thus perpetuating bad statistical practice. Statistical tests were designed to test hypotheses, not select variables.

Tests on coefficients are answering a different question from whether the variable is useful in forecasting. It is possible to have an insignificant coefficient associated with a variable that is useful for forecasting. It is also possible to have a significant coefficient associated with a variable that is better omitted when forecasting. To see why the first situation occurs, think about two highly

(More)…
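A common alternative to significance tests is to compare whole candidate models with an information criterion such as the AICc, which targets predictive accuracy rather than hypothesis testing. The Python sketch below illustrates that idea on a made-up series by choosing between a constant-mean model and a linear-trend model. The data, the two candidate models and all function names are assumptions for illustration; this is not the search that auto.arima performs.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximised Gaussian log-likelihood given a model's residuals."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # ML estimate of the variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aicc(log_likelihood, num_params, num_obs):
    """AIC with the small-sample correction term."""
    aic = -2 * log_likelihood + 2 * num_params
    correction = 2 * num_params * (num_params + 1) / (num_obs - num_params - 1)
    return aic + correction


def mean_model_residuals(y):
    """Residuals from a constant-mean model."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]


def trend_model_residuals(y):
    """Residuals from a least-squares line against the time index."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


# Made-up data with an obvious upward trend (illustrative only).
y = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.0]
n = len(y)

# Parameter counts include the error variance.
scores = {
    "mean": aicc(gaussian_log_likelihood(mean_model_residuals(y)), 2, n),
    "trend": aicc(gaussian_log_likelihood(trend_model_residuals(y)), 3, n),
}
best = min(scores, key=scores.get)
print(scores, best)
```

The comparison is between whole models, so a term is kept when including it lowers the criterion, whether or not its coefficient would pass a significance test.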

 

Lies, damn lies and statistics

Published on 14 January 2011

There’s a nice article with this title by Stephan Lewandowsky on the ABC website today, exploring the difference between anecdotes and data, and the dangers of cherry-picking evidence.

 