Useful tutorials

There are some tools that I use regularly, and I would like my research students and post-docs to learn them too. Here are some great online tutorials that might help.

R vs Autobox vs ForecastPro vs ...

Every now and then a commercial software vendor makes claims on social media about how their software is so much better than the forecast package for R, but no details are provided.

There are lots of reasons why you might select a particular software solution, and R isn’t for everyone. But anyone claiming superiority should at least provide some evidence rather than make unsubstantiated claims.

New Australian data on the HMD

The Human Mortality Database is a wonderful resource for anyone interested in demographic data. It is a carefully curated collection of high-quality deaths and population data from 37 countries, all in a consistent format with consistent definitions. I have used it many times and never cease to be amazed at the care taken to maintain such a great resource.

The data are continually being revised and updated. Today the Australian data have been updated to 2011. The time lag arises because death registrations are lagged, which results in undercounts, so only data that are likely to be complete are included.

Tim Riffe from the HMD has provided the following information about the update:

  1. All death counts since 1964 are now included by year of occurrence, up to 2011. We have 2012 data but do not publish them because they are likely a 5% undercount due to lagged registration.
  2. Death count inputs for 1921 to 1963 are now in single ages. Previously they were in 5-year age groups. Rather than having an open age group of 85+ in this period, counts usually go up to the maximum observed (stated) age. This change (i) introduces minor heaping in early years and (ii) implies different apparent old-age mortality than before, since previously anything above 85 was modeled according to the Methods Protocol.
  3. Population denominators have been swapped out for years 1992 to the present, owing to new ABS methodology and intercensal estimates for the recent period.

Some of the data can be read into R using the hmd.mx and hmd.e0 functions from the demography package. Tim has his own package on GitHub that provides a more extensive interface.

Errors on percentage errors

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

    \[\text{MAPE} = 100\text{mean}(|y_t - \hat{y}_t|/|y_t|)\]

where \(y_t\) denotes an observation and \(\hat{y}_t\) denotes its forecast, and the mean is taken over \(t\).

Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
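To make the definition concrete, here is a minimal Python sketch of the MAPE as defined above. The function name `mape` and the toy numbers are illustrative assumptions only. The comparison at the end is one simple way to see an asymmetry, assuming forecasts cannot be negative; it is not necessarily Armstrong's own argument.

```python
def mape(actual, forecast):
    """Mean absolute percentage error (in percent), averaged over t.

    Illustrative helper, not taken from any particular package.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an observation is zero")
    # 100 * mean(|y_t - yhat_t| / |y_t|)
    total = sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast))
    return 100 * total / len(actual)


# Toy data (made up for illustration): two observations of 100.
actual = [100.0, 100.0]

# Assuming forecasts cannot be negative, the worst under-forecast is 0,
# so the percentage error of an under-forecast is capped at 100%.
under = mape(actual, [0.0, 0.0])

# Over-forecasts have no such cap: forecasting 300 already gives 200%.
over = mape(actual, [300.0, 300.0])

print(under, over)
```

With these toy numbers the under-forecast MAPE is 100 and the over-forecast MAPE is 200, so a method that tends to forecast low can never be penalized as heavily as one that forecasts high.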

Job at Center for Open Science

This looks like an interesting job.

Dear Dr. Hyndman,

I write from the Center for Open Science, a non-profit organization based in Charlottesville, Virginia in the United States, which is dedicated to improving the alignment between scientific values and scientific practices. We are dedicated to open source and open science.

We are reaching out to you to find out if you know anyone who might be interested in our Statistical and Methodological Consultant position.

The position is a unique opportunity to consult on reproducible best practices in data analysis and research design; the consultant will make short visits to provide lectures and training at universities, laboratories, conferences, and through virtual mediums. An especially unique part of the job involves collaborating with the White House’s Office of Science and Technology Policy on matters relating to reproducibility.

If you know someone with substantial training and experience in scientific research, quantitative methods, reproducible research practices, and some programming experience (at least R, ideally Python or Julia), might you please pass this along to them?

Anyone may find out more about the job or apply via our website:

http://centerforopenscience.org/jobs/#stats

The position is full-time and located at our office in beautiful Charlottesville, VA.

Thanks in advance for your time and help.

More time series data online

Earlier this week I had coffee with Ben Fulcher, who told me about his online collection comprising about 30,000 time series, mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not much other data from a business or economic context, although he does include my Time Series Data Library. In addition, he provides Matlab code to compute a large number of characteristics. Anyone wanting to test time series algorithms on a large collection of data should take a look.

Unfortunately there is no R code, and no R interface for downloading the data.

Reflections on UseR! 2013

This week I’ve been at the R Users conference in Albacete, Spain. These conferences are a little unusual in that they are not really about research, unlike most conferences I attend. They provide a place for people to discuss and exchange ideas on how R can be used.

Here are some thoughts and highlights of the conference, in no particular order.