A blog by Rob J Hyndman 


Posts Tagged ‘R’:

My forecasting book now on Amazon

Published on 9 April 2014

For all those people asking me how to obtain a print version of my book "Forecasting: principles and practice" with George Athanasopoulos, you now can. Order on Amazon.com, Amazon.co.uk or Amazon.fr. The online book will continue to be freely available. The print version of the book is intended to help fund the development of the OTexts platform. The price is US$45, £27 or €35. Compare that to $195 for my previous forecasting textbook, $150 for Fildes and Ord, or $182 for Gonzalez-Rivera. No matter how good the books are, those prices are absurdly high. OTexts is intended to be a different kind of publisher: all our books are online and free, and those in print will be reasonably priced. The online version will continue to be updated regularly. The print version is a snapshot of the online version today. We will release a new print edition occasionally, no more than annually, and only when the online version has changed enough to warrant a new print edition. We are planning an offline electronic version as well. I'll announce it here when it is ready.


Job at Center for Open Science

Published on 8 April 2014

This looks like an interesting job.

Dear Dr. Hyndman, I write from the Center for Open Science, a non-profit organization based in Charlottesville, Virginia in the United States, which is dedicated to improving the alignment between scientific values and scientific practices. We are dedicated to open source and open science. We are reaching out to you to find out if you know anyone who might be interested in our Statistical and Methodological Consultant position. The position is a unique opportunity to consult on reproducible best practices in data analysis and research design; the consultant will make short visits to provide lectures and training at universities, laboratories, conferences, and through virtual mediums. An especially unique part of the job involves collaborating with the White House's Office of Science and Technology Policy on matters relating to reproducibility. If you know someone with substantial training and experience in scientific research, quantitative methods, reproducible research practices, and some programming experience (at least R, ideally Python or Julia), might you please pass this along to them? Anyone may find out more about the job or apply via our website: http://centerforopenscience.org/jobs/#stats The position is full-time and located at our office in beautiful Charlottesville, VA. Thanks in advance for your time.



Cover of my forecasting textbook

Published on 18 March 2014

We now have a cover for the print version of my forecasting book with George Athanasopoulos. It should be on Amazon in a couple of weeks. The book is also freely available online. This is a variation of the most popular design in the poll conducted a month or two ago. The cover was produced by Scarlett Rugers, whom I can happily recommend to anyone wanting a book cover designed.


Fast computation of cross-validation in linear models

Published on 17 March 2014

The leave-one-out cross-validation statistic is given by
$$\text{CV} = \frac{1}{N}\sum_{i=1}^{N} e_{[i]}^2,$$
where $e_{[i]} = y_i - \hat{y}_{[i]}$, the $y_i$ are the observations, and $\hat{y}_{[i]}$ is the predicted value obtained when the model is estimated with the $i$th case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic. It turns out that for linear models we do not actually have to estimate the model $N$ times, once for each omitted case. Instead, CV can be computed after estimating the model once on the complete data set.
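The shortcut rests on a standard identity for linear regression: the leave-one-out residual is $e_{[i]} = e_i/(1-h_i)$, where $e_i$ is the ordinary residual and $h_i$ is the $i$th diagonal of the hat matrix $H = X(X'X)^{-1}X'$. The post's setting is R, but the idea is language-agnostic; here is a minimal numpy sketch (simulated data, my own variable names) checking the fast formula against the brute-force refit:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

# Fit once on the complete data set.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                                           # ordinary residuals
h = np.einsum('ij,ij->i', X @ np.linalg.inv(X.T @ X), X)   # leverages: diag of H
cv_fast = np.mean((e / (1 - h)) ** 2)

# Brute force: refit n times, each time with one observation deleted.
cv_slow = np.mean([
    (y[i] - X[i] @ np.linalg.lstsq(np.delete(X, i, 0),
                                   np.delete(y, i), rcond=None)[0]) ** 2
    for i in range(n)
])

assert np.isclose(cv_fast, cv_slow)
```

The fast version needs one least-squares fit and the leverages, so it is $O(np^2)$ rather than $n$ separate fits.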


Testing for trend in ARIMA models

Published on 13 March 2014

Today's email brought this one: I was wondering if I could get your opinion on a particular problem that I have run into during the reviewing process of an article. Basically, I have an analysis where I am looking at a couple of time series and I wanted to know if, over time, there was an upward trend in the series. Inspection of the raw data suggests there is, but we want some statistical evidence for this. To achieve this I ran some ARIMA(0,1,1) models including a drift/trend term to see if the mean of the series did indeed shift upwards with time, and found that it did. However, we have run into an issue with a reviewer who argues that differencing removes trends and may not be a suitable way to detect trends. Therefore, the fact that we found a trend despite differencing suggests that differencing was not successful. I know there are a few papers and textbooks that use ARIMA(0,1,1) models as 'random walk with drift'-type models, so I cited them as examples of this procedure in action, but they remained unconvinced. Instead it was suggested that I look for trends in the raw undifferenced time series, as these would be more reliable as no trends had been removed. At the moment I am hesitant to do this.
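The reviewer's objection conflates removing a trend with modelling it. In an ARIMA(0,1,1) model with drift, differencing does not throw the trend away; it converts the slope into the mean of the differenced series, where it can be estimated and tested. A small numpy sketch of that point (the slope, noise level and naive t-statistic here are my illustrative choices, not the correspondent's analysis):

```python
import numpy as np

rng = np.random.default_rng(42)
n, slope = 300, 0.5

# A series with a clear upward linear trend plus noise.
y = slope * np.arange(n) + rng.normal(scale=1.0, size=n)

# After first differencing, the slope reappears as the mean of the
# differenced series, so "trend despite differencing" is expected.
d = np.diff(y)
drift = d.mean()
tstat = drift / (d.std(ddof=1) / np.sqrt(len(d)))
print(drift, tstat)
```

The drift estimate lands near the true slope of 0.5, and the t-statistic is large, exactly because differencing preserved the trend as a level shift in the differences.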



Unit root tests and ARIMA models

Published on 12 March 2014

An email I received today: I have a small problem. I have a time series called x. If I use the default values of auto.arima(x), the best model is an ARIMA(1,0,0). However, I tried the functions ndiffs(x, test="adf") and ndiffs(x, test="kpss"), as the KPSS test seems to be the default, and the number of differences is 0 for the KPSS test (consistent with the results of auto.arima()) but 2 for the ADF test. I then tried auto.arima(x, test="adf") and now I have another model, ARIMA(1,2,1). I am unsure which order of integration I should use, as the tests give fairly different results. Is there a test that prevails?
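Part of the confusion is that the two tests have opposite null hypotheses: KPSS takes stationarity as the null, while ADF takes a unit root as the null, so they can easily disagree on the same series. To make the KPSS side concrete, here is a simplified pure-numpy sketch of the KPSS level statistic (using the lag-0 variance estimate rather than the long-run variance that real implementations use, so the numbers are only indicative):

```python
import numpy as np

def kpss_level_stat(y):
    """Simplified KPSS statistic for level stationarity.
    Uses the lag-0 residual variance instead of a long-run
    variance estimate, which full implementations employ."""
    e = y - y.mean()          # residuals from regressing on a constant
    S = np.cumsum(e)          # partial sums of residuals
    return np.sum(S ** 2) / (len(y) ** 2 * np.mean(e ** 2))

rng = np.random.default_rng(0)
stationary = rng.normal(size=500)            # stationary white noise
random_walk = np.cumsum(rng.normal(size=500))  # unit-root process

# Small for the stationary series, large for the random walk
# (the asymptotic 5% critical value is roughly 0.46).
print(kpss_level_stat(stationary), kpss_level_stat(random_walk))
```

Since the statistic stays small under stationarity and blows up under a unit root, a rejection by KPSS and a non-rejection by ADF (or vice versa) are statements about different nulls, not a contradiction to be resolved by one "prevailing" test.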


Using old versions of R packages

Published on 10 March 2014

I received this email yesterday: I have been using your 'forecast' package for more than a year now. I was on R version 2.15 until last week, but I am having issues with the lubridate package, hence decided to update to R 3.0.1. In our organization, even getting an open source application requires us to go through a whole lot of approval processes. I asked for R 3.0.1; before I got approval for 3.0.1, a new version of R (3.0.2) came out. Unfortunately for me, the forecast package was built in R 3.0.2. Is there any version of the forecast package that works in an older version of R (3.0.1)? I just don't want to go through this entire approval war again within the organization. Please help if you have any workaround for this.

This is unfortunately very common. Many corporate IT environments lock down computers to such an extent that it cripples the use of modern software like R, which is continuously updated. It also affects universities (which should know better), and I am constantly trying to invent workarounds to the constraints that Monash IT services place on staff and student computers. Here are a few thoughts that might help.
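One generic workaround (not necessarily what this post goes on to recommend) is to build an older package release from the CRAN source archive, which keeps every published version as a source tarball; the package name and version number below are purely illustrative:

```shell
# Build an older package release from the CRAN source archive.
# Pick a version that was current when your R version was.
PKG=forecast
VER=5.0
URL="https://cran.r-project.org/src/contrib/Archive/${PKG}/${PKG}_${VER}.tar.gz"
curl -LO "$URL"
R CMD INSTALL "${PKG}_${VER}.tar.gz"
```

This requires the build tools needed to compile the package from source, which is another thing locked-down corporate machines often lack.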


Forecasting weekly data

Published on 5 March 2014

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 ≈ 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
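Fourier terms sidestep the non-integer period by representing the seasonal pattern with K pairs of sine and cosine regressors at harmonics of the period (in R this is what the forecast package's fourier() produces; K is typically chosen by AICc). A hypothetical numpy construction of those regressors, just to show the shape of the idea:

```python
import numpy as np

def fourier_terms(t, period, K):
    """Return K sine/cosine pairs evaluated at time index t,
    usable as seasonal regressors for a non-integer period."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

period = 365.25 / 7                 # average weeks per year, about 52.18
X = fourier_terms(np.arange(104), period, K=3)   # two years of weekly data
print(X.shape)  # (104, 6)
```

These columns then enter a regression with ARMA errors (or any other model), so the seasonality is handled by the regressors while the short-run dynamics are handled by the error model.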


Fitting models to short time series

Published on 4 March 2014

Following my post on fitting models to long time series, I thought I'd tackle the opposite problem, which is more common in business environments. I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data. The sample size required increases with the number of parameters to be estimated and with the amount of noise in the data.


More time series data online

Published on 27 February 2014

Earlier this week I had coffee with Ben Fulcher, who told me about his online collection comprising about 30,000 time series, mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not many other data from a business or economic context, although he does include my Time Series Data Library. In addition, he provides Matlab code to compute a large number of characteristics. Anyone wanting to test time series algorithms on a large collection of data should take a look. Unfortunately there is no R code, and no R interface for downloading the data.
