Dark themes for writing

I spend much of my day sitting in front of a screen, coding or writing. To limit the strain on my eyes, I use a dark theme as much as possible. That is, I write with light-colored text on a dark background. I don’t know why this is not the default in more software, as it makes a big difference after a few hours of writing.

Most of the time, I am writing using either Sublime Text, RStudio or TeXstudio. Each of them can be set to use a dark theme with syntax coloring to highlight structural features in the text.

RSS feeds for statistics and related journals

I’ve now resurrected the collection of research journals that I follow, and set it up as a shared collection in feedly. So anyone can easily subscribe to all of the same journals, or select a subset of them, to follow on feedly.

Di Cook is moving to Monash

I’m delighted that Professor Dianne Cook will be joining Monash University in July 2015 as a Professor of Business Analytics. Di is an Australian who has worked in the US for the past 25 years, mostly at Iowa State University. She is moving back to Australia and joining the Department of Econometrics and Business Statistics in the Monash Business School, as part of our initiative in Business Analytics.

Di is a world leader in data visualization, and is well-known for her work on interactive graphics. She is also the academic supervisor of several leading data scientists including Hadley Wickham and Yihui Xie, both of whom work for RStudio.

Di has a great deal of energy and enthusiasm for computational statistics and data visualization, and will play a key role in developing and teaching our new subjects in business analytics.

The Monash Business School is already exceptionally strong in econometrics (ranked 7th in the world on RePEc) and forecasting (ranked 11th on RePEc), and we have recently expanded into actuarial science. With Di joining the department, we will be extending our expertise in the area of data visualization as well.

New R package for electricity forecasting

Shu Fan and I have developed a model for electricity demand forecasting that is now widely used in Australia for long-term forecasting of peak electricity demand. It has become known as the “Monash Electricity Forecasting Model”. We have decided to release an R package that implements our model so that other people can easily use it. The package is called “MEFM” and is available on github. We will probably also put it on CRAN eventually.

The model was first described in Hyndman and Fan (2010). We are continually improving it, and the latest version is described in the model documentation, which will be updated from time to time.

The package is being released under a GPL licence, so anyone can use it. All we ask is that our work is properly cited.

Naturally, we are not able to provide free technical support, although we welcome bug reports. We are available to undertake paid consulting work in electricity forecasting.

A time series classification contest

Amongst today’s email was one from someone running a private competition to classify time series. Here are the essential details.

The data are measurements from a medical diagnostic machine which takes 1 measurement every second, and after 32–1000 seconds, the time series must be classified into one of two classes. Some pre-classified training data is provided. It is not necessary to classify all the test data, but you do need to have relatively high accuracy on what is classified. So you could find a subset of more easily classifiable test time series, and leave the rest of the test data unclassified.
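The idea of classifying only the easier series can be sketched as a classifier that abstains when it is unsure. The following is a minimal illustration in Python, not the contest's scoring rule or any entrant's method. It assumes a hypothetical model that has already produced a score between 0 and 1 for each test series (the probability of class 1), and the threshold of 0.8 is an arbitrary choice.

```python
from typing import List, Optional, Tuple


def classify_with_abstention(scores: List[float],
                             threshold: float = 0.8) -> List[Optional[int]]:
    """Assign class 1 or 0 only when the score is confident enough.

    `scores` are hypothetical class-1 probabilities from some fitted model.
    `threshold` is assumed to lie strictly between 0.5 and 1.
    Series with scores in the uncertain middle band are left unclassified (None).
    """
    labels: List[Optional[int]] = []
    for p in scores:
        if p >= threshold:
            labels.append(1)
        elif p <= 1 - threshold:
            labels.append(0)
        else:
            labels.append(None)  # abstain
    return labels


def coverage_and_accuracy(predicted: List[Optional[int]],
                          actual: List[int]) -> Tuple[float, float]:
    """Return the share of series classified and the accuracy on that share."""
    decided = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    coverage = len(decided) / len(actual)
    accuracy = (sum(p == a for p, a in decided) / len(decided)
                if decided else float("nan"))
    return coverage, accuracy


# Made-up scores and true classes, purely for illustration.
scores = [0.95, 0.55, 0.10, 0.70, 0.02, 0.88]
truth = [1, 0, 0, 1, 0, 1]

labels = classify_with_abstention(scores, threshold=0.8)
coverage, accuracy = coverage_and_accuracy(labels, truth)
print(labels, coverage, accuracy)
```

Raising the threshold trades coverage for accuracy: fewer series are classified, but those that are classified are the ones the model is most sure about.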

New Australian data on the HMD

The Human Mortality Database is a wonderful resource for anyone interested in demographic data. It is a carefully curated collection of high quality deaths and population data from 37 countries, all in a consistent format with consistent definitions. I have used it many times and never cease to be amazed at the care taken to maintain such a great resource.

The data are continually being revised and updated. Today the Australian data has been updated to 2011. There is a time lag because late death registrations result in undercounts for the most recent years, so only data that are likely to be complete are included.

Tim Riffe from the HMD has provided the following information about the update:

  1. All death counts since 1964 are now included by year of occurrence, up to 2011. We have 2012 data but do not publish them because they are likely a 5% undercount due to lagged registration.
  2. Death count inputs for 1921 to 1963 are now in single ages. Previously they were in 5-year age groups. Rather than having an open age group of 85+ in this period, counts usually go up to the maximum observed (stated) age. This change (i) introduces minor heaping in early years and (ii) implies different apparent old-age mortality than before, since previously anything above 85 was modeled according to the Methods Protocol.
  3. Population denominators have been swapped out for years 1992 to the present, owing to new ABS methodology and intercensal estimates for the recent period.

Some of the data can be read into R using the hmd.mx and hmd.e0 functions from the demography package. Tim has his own package on github that provides a more extensive interface.

Visualization of probabilistic forecasts

This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well worth reading.

It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval.
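To make the idea of nested intervals concrete, here is a minimal numerical sketch in Python. It is not how the forecast package computes its intervals, which come from the fitted model. It assumes a naive (random walk) forecast with Gaussian errors, so that the h-step forecast standard deviation is sigma times the square root of h; the last observed value and sigma are made-up numbers.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def prediction_interval(point: float, sigma: float, h: int,
                        level: float) -> Tuple[float, float]:
    """Prediction interval for an h-step naive forecast with Gaussian errors.

    `point` is the point forecast (the last observed value for a naive forecast),
    `sigma` is the one-step error standard deviation (assumed known here),
    and `level` is the coverage in percent, e.g. 80 or 95.
    """
    # Two-sided interval: put (100 - level)/2 percent in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    half_width = z * sigma * sqrt(h)
    return point - half_width, point + half_width


# Illustrative values only.
last_value, sigma = 100.0, 2.0
for h in (1, 2, 3):
    lo80, hi80 = prediction_interval(last_value, sigma, h, 80)
    lo95, hi95 = prediction_interval(last_value, sigma, h, 95)
    print(f"h={h}: 80% [{lo80:.2f}, {hi80:.2f}]  95% [{lo95:.2f}, {hi95:.2f}]")
```

The 80% interval always sits inside the 95% interval, and both widen as the horizon grows, which is what the shaded bands in a forecast plot convey.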