I’ve added a couple of new functions to the forecast package for R which implement two types of cross-validation for time series.
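The excerpt is truncated here, but the rolling-origin idea can be sketched with the forecast package's `tsCV()` helper. This is a minimal sketch assuming the current forecast API; the exact function names announced in the post may differ:

```r
library(forecast)

# Rolling-origin (time series) cross-validation: forecast from each
# origin in turn and collect the one-step-ahead errors.
e <- tsCV(lynx, forecastfunction = naive, h = 1)

# Cross-validated RMSE for the naive method
rmse <- sqrt(mean(e^2, na.rm = TRUE))
rmse
```

The same pattern works with any function that takes a series `y` and horizon `h` and returns a `forecast` object, so different models can be compared on equal footing.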

# Tag / statistics

# Come to Melbourne, even if not to Monash

The University of Melbourne is advertising for a “Professor in Statistics (Data Science)”. Melbourne (the city) is fast becoming a vibrant centre for data science and applied statistics, with more than 4700 people signed up for the Data Science Meetup Group, a thriving start-up scene, the group at Monash Business School (including Di Cook and me), and the Monash Centre for Data Science (including Geoff Webb and Wray Buntine). Not to mention that Melbourne is a wonderful place to live, having won the “World’s most liveable city” award from the Economist for the last 6 years in a row.

Actually, the Uni of Melbourne currently has two professorships on offer — the other being the Peter Hall Chair in Mathematical Statistics. (Not sure that anyone would actually feel qualified to have a job with that title!)

So any professors of statistics out there looking for a new challenge, please consider coming to Melbourne. We’ll even invite you to visit us from time to time at Monash.

# “Forecasting with R” short course in Eindhoven

I will be giving my 3-day short course/workshop “Forecasting with R” in Eindhoven (Netherlands) on 19–21 October.

Details at https://www.win.tue.nl/~adriemel/shortcourse.html

# Statistics positions available at Monash University

We are hiring again, and looking for people in statistics, econometrics and related fields (such as actuarial science, machine learning, and business analytics). We have a strong business analytics group (with particular expertise in data visualization, machine learning, statistical computing, R, and forecasting), and it would be great to see it grow. The official advert follows.

# Explore Australian Elections Data with R

This is a guest post by my colleague Professor Di Cook, cross-posted from her Visiphilia blog. Di and I are two of the authors of the new eechidna package for R, now on CRAN.

# SSA helping you find a job

One of the great services of the Statistical Society of Australia is an excellent **jobs board** advertising available jobs for statisticians, data analysts, data scientists, etc. Jobs can be filtered by industry, location and job function.

Today the SSA announced a new service to job seekers: CV/Resume Critique.

# Sample quantiles 20 years later

Almost exactly 20 years ago I wrote a paper with Yanan Fan on how sample quantiles are computed in statistical software. It was cited 43 times in the first 10 years, and 457 times in the next 10 years, making it my third paper to receive 500+ citations.

So what happened in 2006 to suddenly increase the citations? I think it was a combination of things.

# Monash Business Analytics Team Profile

Our research group has been growing lately, as you can see below! We were featured in the latest issue of the Monash newsletter *The Insider*. Check it out.

# Model variance for ARIMA models

From today’s email:

I wanted to ask you about your R forecast package, in particular the Arima() function. We are using this function to fit an ARIMAX model and produce model estimates and standard errors, which in turn can be used to get p-values and later model forecasts. To double check our work, we are also fitting the same model in SAS using PROC ARIMA and comparing model coefficients and output.
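For reference, here is a minimal sketch of the kind of fit described in the email, using `forecast::Arima()` with an external regressor. The regressor, model orders, and data below are purely illustrative; the correspondent's actual model is not given:

```r
library(forecast)

set.seed(1)
# Hypothetical external regressor -- the email does not specify the real one
xreg <- matrix(rnorm(length(AirPassengers)), dimnames = list(NULL, "x"))

# An ARIMAX-style fit: ARIMA errors plus a regression term via xreg
fit <- Arima(log(AirPassengers), order = c(1, 1, 1),
             seasonal = c(0, 1, 1), xreg = xreg)

est <- coef(fit)               # model estimates
se  <- sqrt(diag(fit$var.coef))  # standard errors
```

Two-sided p-values then follow from the usual normal approximation, `2 * pnorm(-abs(est / se))`, which is the sort of output one would compare against PROC ARIMA.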

# Omitting outliers

Someone sent me this email today:

One of my colleagues said that you once said/wrote that you had encountered very few real outliers in your work, and that normally the “outlier-looking” data points were proper data points that should not have been treated as outliers. Have you discussed this in writing? If so, I would love to read it.

I don’t think I’ve ever said or written anything quite like that, and I see lots of outliers in real data. But I have counselled against omitting apparent outliers.

Often the most interesting part of a data set is in the unusual or unexpected observations, so I’m strongly opposed to automatic omission of outliers. The most famous case of that is the non-detection of the hole in the ozone layer by NASA. The way I was told the story was that outliers had been automatically filtered from the data obtained from Nimbus-7. It was only when the British Antarctic Survey observed the phenomenon in the mid 1980s that scientists went back and found the problem could have been detected a decade earlier if automated outlier filtering had not been applied by NASA. In fact, that is also how the story was told on the NASA website for a few years. But in a letter to the editor of the IMS bulletin, Pukelsheim (1990) explains that the reality was more complicated. In the corrected story, scientists *were* investigating the unusual observations to see if they were genuine, or the result of instrumental error, but still didn’t detect the problem until quite late.

Whatever actually happened, outliers need to be investigated, not omitted. Try to understand what caused some observations to be different from the bulk of the observations. If you understand the reasons, you are then in a better position to judge whether the points can legitimately be removed from the data set, or whether you’ve just discovered something new and interesting. Never remove a point just because it is weird.
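In that spirit, tools that *flag* unusual points for inspection, rather than silently deleting them, are the safer default. A minimal sketch, assuming the forecast package's `tsoutliers()` helper:

```r
library(forecast)

# Flag candidate outliers in the gold price series (shipped with forecast)
# and propose replacements -- but only as a starting point for investigation,
# not as values to substitute automatically.
out <- tsoutliers(gold)
out$index        # positions of the suspect observations
gold[out$index]  # the values themselves, to be checked against their source
```

Each flagged point should be traced back to its source before any decision is made; the replacements suggested by such tools are only useful once you know the original values are genuinely erroneous.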