A blog by Rob J Hyndman 


Posts Tagged ‘StackExchange’:


Interview for the Capital of Statistics

Published on 5 February 2014

Earo Wang recently interviewed me for the Chinese website Capital of Statistics. The English transcript of the interview is on Earo’s personal website. This is the third interview I’ve done in the last 18 months. The others were for Data Mining Research (republished in Amstat News) and DecisionStats.

 
No comments

Forecasting annual totals from monthly data

Published on 15 May 2013

This question was posed on crossvalidated.com:

“I have a monthly time series (for 2009–2012 non-stationary, with seasonality). I can use ARIMA (or ETS) to obtain point and interval forecasts for each month of 2013, but I am interested in forecasting the total for the whole year, including prediction intervals. Is there an easy way in R to obtain interval forecasts for the total for 2013?”

I’ve come across this problem before in my consulting work, although I don’t think I’ve ever published my solution. So here it is.
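The solution from the full post is not reproduced in this excerpt. As a rough illustration of why the problem is not trivial, here is a minimal Python sketch (not R, and not necessarily the method the post goes on to describe) that gets an interval for an annual total by simulating future monthly paths and summing each one. Everything in it is an assumption made for illustration: the monthly point forecasts are made-up numbers, and the forecast errors are assumed to follow an AR(1) process with hypothetical parameters `PHI` and `SIGMA`.

```python
import random

# Hypothetical point forecasts for the 12 months of the coming year (made-up numbers).
monthly_means = [120, 115, 130, 140, 150, 165, 170, 168, 150, 140, 125, 135]

PHI = 0.6        # assumed AR(1) coefficient of the forecast errors
SIGMA = 10.0     # assumed standard deviation of the monthly shocks
N_PATHS = 10000  # number of simulated future years


def simulate_path(rng):
    """Simulate one future year: point forecast plus AR(1) errors starting from zero."""
    error = 0.0
    path = []
    for mean in monthly_means:
        error = PHI * error + rng.gauss(0.0, SIGMA)
        path.append(mean + error)
    return path


def quantile(values, q):
    """Empirical quantile using the nearest order statistic."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]


rng = random.Random(2013)
paths = [simulate_path(rng) for _ in range(N_PATHS)]

# Interval for the annual total: sum each simulated path, then take quantiles.
totals = [sum(path) for path in paths]
point_total = sum(monthly_means)
lower, upper = quantile(totals, 0.025), quantile(totals, 0.975)

# For comparison only: adding up the 12 monthly interval limits.
naive_lower = sum(quantile([p[m] for p in paths], 0.025) for m in range(12))
naive_upper = sum(quantile([p[m] for p in paths], 0.975) for m in range(12))

print(f"point forecast of total: {point_total}")
print(f"95% interval from simulated totals: ({lower:.0f}, {upper:.0f})")
print(f"sum of monthly limits:              ({naive_lower:.0f}, {naive_upper:.0f})")
```

The point forecast of the total is just the sum of the monthly point forecasts, but the interval is not the sum of the monthly limits. In this toy setup the summed limits come out wider than the interval from the simulated totals, because the monthly errors are correlated but not perfectly so.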

 
12 comments

Seeking help

Published on 8 May 2012

Every day I receive emails, or comments on this blog, asking for help with R, forecasting, LaTeX, possible research topics, how to install software, or some other thing I’m supposed to know something about. Unfortunately, I cannot provide a one-man help service to the rest of the world. I used to reply to all the requests explaining where to go for help, but I stopped replying a while ago as it took too much time to do even that.

If you want help, please ask at either stats.stackexchange.com (for R or statistics questions) or tex.stackexchange.com (for LaTeX questions).

Unless you are one of my students, the only questions I will answer are ones that concern my R packages or research papers. And even then, I won’t reply if the answer is in the help files. I write those help files for a reason, so please read them. I’m sorry I can’t do more, but if I did everything people ask me to do, I’d never write any papers or produce any R packages, and I think that’s a better use of my time.

 
2 comments

Academia StackExchange

Published on 22 February 2012

There’s a new StackExchange site that might be useful to readers: Academia. It is a Q&A site for academics and those enrolled in higher education. The draft FAQ says it will cover life as a graduate student, postdoctoral researcher or university professor; transitioning from undergraduate to graduate researcher; the inner workings of research departments; and the requirements and expectations of academicians.

Judging from the first 89 questions, this is going to be extremely helpful, especially for PhD students.

 
No comments

Social networking for researchers

Published on 21 July 2011

It would be nice to have a place to share ideas, links and comments in a very informal way with others involved in research in statistical methodology and data science. CrossValidated.com is great for specific questions, but is not suitable for commenting on papers or sharing ideas and links.

 
7 comments

CrossValidated Journal Club

Published on 22 December 2010

Journal clubs are a great way to learn new research ideas and to keep up with the literature. The idea is that a group of people get together every week or so to discuss a paper of joint interest. This can happen within your own research group or department, or virtually online.

There is now a virtual journal club operating in conjunction with CrossValidated.com. The first paper discussed was on text data mining. It appears that the next paper may be on collaborative filtering. The emphasis is on Open Access papers, preferably with associated software that is freely available. Some of the discussion tends to centre on how to implement the ideas in R.

For those of us in Australia, the timing is tricky. The first discussion took place at 3am local time! If you can’t make the CrossValidated Journal Club chats, why not start your own local club?

 
No comments

CrossValidated launched!

Published on 5 November 2010

The CrossValidated Q&A site is now out of beta and the new design and site name are live.

New design

The new design looks great, thanks to Jin Yang, our designer-in-residence. Note the normal density icon for accepted answers and the site icon depicting a 5-fold cross-validation (light green for the test set and dark green for the training set). There is a faint background graphic in the header and footer from a program that tracks and plots a person’s mouse movement. This gives the suggestion of randomness as well as the idea of data visualization (another topic covered on the site).

Name and URL

The URL crossvalidated.com will work, but redirects to stats.stackexchange.com. The StackExchange team (who host the site and provide all the architecture) wanted the site to be a subdomain of stackexchange.com. However, at least we got the name CrossValidated.

Scope

The site is intended for use by statisticians, data miners, and anyone else doing data analysis. It covers questions about statistical analysis; data mining and machine learning; data visualization; probability theory; and statistical and data-driven computing (e.g., questions about R, SAS, SPSS, Stata and Minitab). The inclusion of data mining and machine learning along with statistics and probability was a deliberate attempt to get these two communities

(More)…

 
2 comments

How to avoid annoying a referee

Published on 22 October 2010

It’s not a good idea to annoy the referees of your paper. They make recommendations to the editor about your work and it is best to keep them happy. There is an interesting discussion on stats.stackexchange.com on this subject. This inspired my own list below.

1. Explain what you’ve done clearly, avoiding unnecessary jargon.
2. Don’t claim your paper contributes more than it actually does. (I refereed a paper this week where the author claimed to have invented principal component analysis!)
3. Ensure all figures have clear captions and labels.
4. Include citations to the referee’s own work. Obviously you don’t know who is going to referee your paper, but you should aim to cite the main work in the area. It places your work in context, and keeps the referees happy if they are the authors.
5. Make sure the cited papers say what you think they say. Sight what you cite!
6. Include proper citations for all software packages. If you are unsure how to cite an R package, try the command citation("packagename").
7. Never plagiarise from other papers, not even sentence fragments. Use your own words. I’ve refereed a thesis which had slabs taken from my own lecture notes including the typos.
8. Don’t plagiarise from your own papers. Either reference

(More)…

 
2 comments

Happy World Statistics Day!

Published on 20 October 2010

The United Nations has declared today “World Statistics Day”. I’ve no idea what that means, or why we need a WSD. Perhaps it is because the date is 20.10.2010 (except in North America where it is 10.20.2010). But then, what happens from 2013 to 2099? And do we just forget the whole idea after 3112?

In any case, if we are going to have a WSD, let’s use it to do something useful. Patrick Burns has some ideas over at Portfolio Probe. Here are some of my own:

1. Learn R. The time has come when it is not really possible to be a well-informed applied statistician if you are not a regular R user.
2. Get involved on Stats.StackExchange.com. It’s only 3 months old, but we already have over 1600 users and it has quickly become the best place to ask and answer questions about statistics, data mining, data visualization and everything else to do with analysing data.
3. If you’re in London, head to the RSS getstats launch, held appropriately at 20:10 on 20.10.2010.
4. Learn some new statistical techniques. If you have never used the bootstrap, EM algorithm, mixed models or the Kalman filter, now is a great day to start.
5. Stop using hypothesis tests and p-values! Instead, use confidence intervals

(More)…

 
3 comments

Why every statistician should know about cross-validation

Published on 4 October 2010

Surprisingly, many statisticians see cross-validation as something data miners do, but not a core statistical technique. I thought it might be helpful to summarize the role of cross-validation in statistics, especially as it is proposed that the Q&A site at stats.stackexchange.com should be renamed CrossValidated.com.

Cross-validation is primarily a way of measuring the predictive performance of a statistical model. Every statistician knows that the model fit statistics are not a good guide to how well a model will predict: high R² does not necessarily mean a good model. It is easy to over-fit the data by including too many degrees of freedom and so inflate R² and other fit statistics. For example, in a simple polynomial regression I can just keep adding higher order terms and so get better and better fits to the data. But the predictions from the model on new data will usually get worse as higher order terms are added.

One way to measure the predictive ability of a model is to test it on a set of data not used in estimation. Data miners call this a “test set” and the data used for estimation is the “training set”. For example, the predictive accuracy of a model can be measured by the mean squared error on the test set. This will

(More)…
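The polynomial regression example in the excerpt above can be sketched in a few lines. The following is a minimal pure-Python illustration, not code from the post: the data-generating curve, noise level, sample sizes and seed are all made-up assumptions, and the helper names (`fit_polynomial`, `mse`, `make_data`) are hypothetical. It fits polynomials of increasing degree to a training set and reports the mean squared error on both the training set and a separate test set.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x^i.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    """Mean squared prediction error of the fitted polynomial on (xs, ys)."""
    return sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, rng):
    """Assumed truth for illustration: a quadratic plus Gaussian noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1 + 2 * x - 3 * x ** 2 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(1)
train_x, train_y = make_data(30, rng)  # training set: used for estimation
test_x, test_y = make_data(30, rng)    # test set: never used for estimation

train_mse, test_mse = [], []
for degree in range(1, 7):
    coeffs = fit_polynomial(train_x, train_y, degree)
    train_mse.append(mse(coeffs, train_x, train_y))
    test_mse.append(mse(coeffs, test_x, test_y))
    print(f"degree {degree}: train MSE {train_mse[-1]:.3f}, test MSE {test_mse[-1]:.3f}")
```

The training MSE can only go down as higher order terms are added, because each polynomial contains the lower-degree ones as special cases. The test MSE carries no such guarantee, which is why it is the more honest measure of predictive performance.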

 
33 comments