Come to Melbourne, even if not to Monash

The University of Melbourne is advertising for a “Professor in Statistics (Data Science)”. Melbourne (the city) is fast becoming a vibrant centre for data science and applied statistics, with more than 4700 people signed up for the Data Science Meetup Group, a thriving start-up scene, the group at Monash Business School (including Di Cook and me), and the Monash Centre for Data Science (including Geoff Webb and Wray Buntine). Not to mention that Melbourne is a wonderful place to live, having won the “World’s most liveable city” award from the Economist for the last 6 years in a row.

Actually, the Uni of Melbourne currently has two professorships on offer — the other being the Peter Hall Chair in Mathematical Statistics. (Not sure that anyone would actually feel qualified to have a job with that title!)

So any professors of statistics out there looking for a new challenge, please consider coming to Melbourne. We’ll even invite you to visit us from time to time at Monash.


Statistics positions available at Monash University

We are hiring again, and looking for people in statistics, econometrics and related fields (such as actuarial science, machine learning, and business analytics). We have a strong business analytics group (with particular expertise in data visualization, machine learning, statistical computing, R, and forecasting), and it would be great to see it grow. The official advert follows.


Model variance for ARIMA models

From today’s email:

I wanted to ask you about your R forecast package, in particular the Arima() function. We are using this function to fit an ARIMAX model and produce model estimates and standard errors, which in turn can be used to get p-values and later model forecasts. To double check our work, we are also fitting the same model in SAS using PROC ARIMA and comparing model coefficients and output.

Omitting outliers

Someone sent me this email today:

One of my colleagues said that you once said/wrote that you had encountered very few real outliers in your work, and that normally the “outlier-looking” data points were proper data points that should not have been treated as outliers. Have you discussed this in writing? If so, I would love to read it.

I don’t think I’ve ever said or written anything quite like that, and I see lots of outliers in real data. But I have counselled against omitting apparent outliers.

Often the most interesting part of a data set is in the unusual or unexpected observations, so I’m strongly opposed to automatic omission of outliers. The most famous case of that is the non-detection of the hole in the ozone layer by NASA. The way I was told the story was that outliers had been automatically filtered from the data obtained from Nimbus-7. It was only when the British Antarctic Survey observed the phenomenon in the mid-1980s that scientists went back and found the problem could have been detected a decade earlier if automated outlier filtering had not been applied by NASA. In fact, that is also how the story was told on the NASA website for a few years. But in a letter to the editor of the IMS Bulletin, Pukelsheim (1990) explains that the reality was more complicated. In the corrected story, scientists were investigating the unusual observations to see if they were genuine, or the result of instrumental error, but still didn’t detect the problem until quite late.

Whatever actually happened, outliers need to be investigated, not omitted. Try to understand what caused some observations to be different from the bulk of the observations. If you understand the reasons, you are then in a better position to judge whether the points can legitimately be removed from the data set, or whether you’ve just discovered something new and interesting. Never remove a point just because it is weird.
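
For anyone who wants to put "investigate, don't omit" into practice, here is a minimal sketch in Python. It is only one possible approach: it uses the modified z-score (based on the median and the median absolute deviation, with the conventional 3.5 cutoff), which is a technique I have chosen for illustration rather than anything the stories above depend on. The function name `flag_outliers` and the `readings` values are made up for the example. The point is that the function only reports unusual observations and never deletes anything.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return (index, value, score) for points with a large modified z-score.

    Nothing is removed from `values`: the caller decides what to do
    after investigating each flagged observation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    flagged = []
    for i, v in enumerate(values):
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # when the data are roughly normal.
        score = 0.6745 * (v - med) / mad
        if abs(score) > threshold:
            flagged.append((i, v, round(score, 2)))
    return flagged


# Hypothetical measurements, invented purely for illustration.
readings = [312, 305, 298, 310, 301, 180, 307, 299]

for i, v, score in flag_outliers(readings):
    print(f"observation {i}: value={v}, modified z={score} -> investigate")
```

The data set keeps all of its observations. What comes back is a short list of points to look into, and any decision to drop one is made by a person who understands why it is different.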