What’s your Hall number?

Today I attended the funeral of Peter Hall, one of the finest mathematical statisticians ever to walk the earth and easily the best from Australia. One of the most remarkable things about Peter was his astonishing productivity, with over 600 papers. As I sat in the audience I realised that many of the people there were probably coauthors of papers with Peter, and I wondered how many statisticians in the world would have been his coauthors or second-degree coauthors.

In mathematics, people calculate Erdős numbers — the “collaborative distance” between Paul Erdős and another person, as measured by authorship of mathematical papers. An Erdős number of 1 means you wrote a paper with Erdős; an Erdős number of 2 means you wrote a paper with someone who has an Erdős number of 1; and so on. My Erdős number is 3, measured in two different ways:

  • via Peter Brockwell / Kai-Lai Chung / Paul Erdős
  • via J. Keith Ord / Peter C. Fishburn / Paul Erdős

It seems appropriate that we should compute Hall numbers in statistics. Mine is 1, as I was lucky enough to have coauthored two papers with Peter Hall. You can compute your own Hall number here. Just put your own surname in the second author field.
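A Hall number, like an Erdős number, is the shortest-path distance in a coauthorship graph, so a breadth-first search is enough to compute one. The sketch below is only an illustration: the function name `collaboration_distance` and the author names in the toy data are made up, and it is not the code behind the calculator mentioned above.

```python
from collections import deque


def collaboration_distance(papers, source, target):
    """Return the fewest coauthorship links between two authors, or None."""
    # Build an undirected graph: each author maps to the set of their coauthors.
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set()).update(b for b in authors if b != a)
    if source not in graph or target not in graph:
        return None

    # Breadth-first search reaches authors in order of increasing distance,
    # so the first time we see the target we have the shortest path.
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for coauthor in graph[current]:
            if coauthor not in distance:
                distance[coauthor] = distance[current] + 1
                queue.append(coauthor)
    return None  # the two authors are not connected


# Hypothetical data: each inner list is the author list of one invented paper.
papers = [
    ["Hall", "Author A"],
    ["Author A", "Author B"],
    ["Author B", "Author C"],
    ["Author D", "Author E"],
]

print(collaboration_distance(papers, "Hall", "Author A"))  # 1
print(collaboration_distance(papers, "Hall", "Author C"))  # 3
print(collaboration_distance(papers, "Hall", "Author E"))  # None
```

With real data, the `papers` list would come from a bibliographic database, and the main difficulty is usually matching author names rather than running the search.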

 

 

ACEMS Business Analytics Prize 2016

We have established a new annual prize for research students at Monash University in the general area of business analytics, funded by the Australian Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS). The rules of the award are listed below.

  1. The student must have submitted a paper to a high-quality journal or refereed conference on some topic in the general area of business analytics, computational statistics or data visualization.
  2. Up to $3000 will be awarded to the student to assist with research expenses subject to the approval of the relevant supervisor.
  3. Applications should include the submitted paper, along with a brief statement (no more than 200 words) on how they intend to spend the money. Applications should be emailed to econometrics@monash.edu by 31 March 2016.
  4. The winning student will be selected by a panel consisting of Di Cook, Rob Hyndman, Catherine Forbes and Geoff Webb.
  5. Any HDR student currently enrolled at Monash University is eligible to apply.

Questions about the award can be asked in the comments section below.

Starting a career in data science

I received this email from one of my undergraduate students:

I’m writing to you asking for advice on how to start a career in Data Science. Other professions seem a bit more straight forward, in that accountants for example simply look for Internships and ways into companies from there. From my understanding, the nature of careers in data science seem to be on a project-to-project basis. I’m not sure how to get my foot stuck in the door.

I am expecting to finish degree by Semester 1 2016. In my job searching so far, I have only encountered positions which require 3+ years of previous data analysis experience and have not seen any “entry-level” data analysis positions or graduate data positions. What is the nature of entry level recruitment in this industry?

Any help would be greatly appreciated.

Regards,
Aran


Making data analysis easier

Di Cook and I are organizing a workshop on “Making data analysis easier” for 18-19 February 2016.

We are calling it WOMBAT2016, which is an acronym for Workshop Organized by the Monash Business Analytics Team. Appropriately, it will be held at the Melbourne Zoo. Our plan is to make these workshops an annual event.

Some details are available on the workshop website. Key features are:

  • Hadley Wickham is our keynote speaker. He has been instrumental in changing the way we think about data analysis, and providing new tools for tidying, rearranging, summarising and plotting data. His R packages (including tidyr, dplyr, ggplot2, and ggvis) are very widely used.
  • Other speakers include Phil Brierley, Eugene Dubossarsky, Heike Hofmann, Thomas Lumley, Andrew Robinson, Elle Saber, Carson Sievert, Zoe van Havre, Geoff Webb, Yanchang Zhao, as well as Di and me.
  • The numbers are limited to a total of 100 with a quota on students, academics and people from business/industry. The aim is to have a good mix of people from different backgrounds to encourage productive discussions and mutual learning.
  • Register on Eventbrite.
  • We also have some places available for contributing speakers (15-minute talks). If you would like to give a contributed talk, you will need to email us a title and abstract by 15 January. Abstracts will be peer reviewed, and we will notify you by 29 January whether yours has been accepted.

If you miss out on the workshop, you can still hear Hadley speak. Data Science Melbourne will host a meetup featuring him on the evening of Monday 22 February 2016.

 

The hidden benefits of open-source software

I’ve been having discussions with colleagues and university administration about the best way for universities to manage home-grown software.

The traditional business model for software is that we build software and sell it to everyone willing to pay. Very often, that leads to a software company spin-off that has little or nothing to do with the university that nurtured the development. Think MATLAB, S-Plus, Minitab, SAS and SPSS, all of which grew out of universities or research institutions. This model has repeatedly been shown to stifle research development, channel funds away from the institutions where the software was born, and add to research costs for everyone.

I argue that the open-source model is a much better approach both for research development and for university funding. Under the open-source model, we build software, and make it available for anyone to use and adapt under an appropriate licence. This approach has many benefits that are not always appreciated by university administrators.

ODI looking for young postgrad statisticians

The Overseas Development Institute Fellowship Scheme sends young postgraduate statisticians (and economists) to work in the public sectors of developing countries in Africa, the Caribbean and the Pacific on two-year contracts. This is a great way to develop skills and gain experience working within a developing country’s government. And you get to live in a fascinating place!

The application process for the 2016-2018 Fellowship Scheme is now open. Students are advised to apply before 17 December 2015 for a chance to be part of the ODI Fellowship Scheme.

Essential criteria:

  • degree in statistics, economics, or a related field
  • postgraduate degree qualification
  • ability to commit to a two-year assignment

Application is via the online application form.

Read some first-hand experiences of current and former Fellows.

 

Big Data for Official Statistics Competition

This is a new competition being organized by Eurostat. The first phase involves nowcasting economic indicators at the national and European levels, including unemployment, HICP (the Harmonised Index of Consumer Prices), tourism and retail trade, and some of their variants.

The main goal of the competition is to discover promising methodologies and data sources that could, now or in the future, be used to improve the production of official statistics in the European Statistical System.

The organizers seem to have been encouraged by the success of Kaggle and other data science competition platforms. Unfortunately, they have chosen not to give any prizes other than an invitation to give a conference presentation or poster, which hardly seems likely to attract many good participants.

The deadline for registration is 10 January 2016. The duration of the competition is roughly a year (including about a month for evaluation).

See the call for participation for more information.

Reproducibility in computational research

Jane Frazier spoke at our research team meeting today on “Reproducibility in computational research”. We had a very stimulating and lively discussion about the issues involved. One interesting idea was that reproducibility is on a scale, and we can all aim to move further along the scale towards making our own research more reproducible. For example:

  • Can you reproduce your results tomorrow on the same computer with the same software installed?
  • Could someone else on a different computer reproduce your results with the same software installed?
  • Could you reproduce your results in three years’ time, after parts of your software environment may have changed?
  • etc.

Think about what changes you need to make to move one step further along the reproducibility continuum, and do it.
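One small step along that continuum is to save a record of the software environment next to your results, so that you (or someone else) can later see which versions produced them. The following is a minimal Python sketch of that idea, not something taken from Jane’s talk; the function names and the output file name are made up for illustration. In R, `sessionInfo()` serves a similar purpose.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Collect interpreter, OS and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }


def save_snapshot(path, packages):
    """Write the snapshot as JSON so it can be stored alongside the results."""
    snapshot = environment_snapshot(packages)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle, indent=2, sort_keys=True)
    return snapshot


if __name__ == "__main__":
    # "environment.json" is a hypothetical file name; list the packages
    # your own analysis actually depends on.
    save_snapshot("environment.json", ["numpy", "pandas"])
```

A snapshot like this does not make an analysis reproducible by itself, but it does tell you what to reinstall when results change unexpectedly.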

Jane’s slides and handout are below.

Upcoming talks in California

I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC Davis.

Optimal forecast reconciliation for big time series data

Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.

A common constraint is that the disaggregated forecasts need to add up to the forecasts of the aggregated data. This is known as forecast reconciliation. I will show that the optimal reconciliation method involves fitting an ill-conditioned linear regression model where the design matrix has one column for each of the series at the most disaggregated level. For problems involving huge numbers of series, the model is impossible to estimate using standard regression algorithms. I will also discuss some fast algorithms for fitting this model that make it practical to use in business contexts.

Stanford: 4:30pm, Tuesday 6th October.
UC Davis: 4:10pm, Thursday 8th October.
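To make the regression view of reconciliation in the abstract concrete, here is a toy sketch in plain Python. It assumes the simplest possible hierarchy (Total = A + B) and uses unweighted ordinary least squares, which may differ from the weighting and the fast algorithms covered in the talk. The matrix `S` (one row per series, one column per bottom-level series) and all function names are assumptions made for this illustration.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def reconcile_ols(S, base_forecasts):
    """Regress the base forecasts on S, then map the fit back through S."""
    St = transpose(S)
    StS = matmul(St, S)
    Sty = [sum(s * y for s, y in zip(row, base_forecasts)) for row in St]
    beta = solve(StS, Sty)  # implied bottom-level forecasts
    return [sum(s * b for s, b in zip(row, beta)) for row in S]


# Rows: Total, A, B. Columns: the bottom-level series A and B.
S = [[1, 1], [1, 0], [0, 1]]
base = [100.0, 60.0, 30.0]  # made-up forecasts: 60 + 30 does not equal 100

reconciled = reconcile_ols(S, base)
print([round(v, 2) for v in reconciled])  # [96.67, 63.33, 33.33]
```

The reconciled forecasts add up exactly, with the discrepancy of 10 spread across the three series. With millions of bottom-level series the matrix `S'S` becomes far too large to handle this way, which is the computational problem the abstract describes.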