The first rOpenSci unconference in Australia will be held on Thursday and Friday (April 21-22) in Brisbane, at the Microsoft Innovation Centre.
This event will bring together researchers, developers, data scientists and open data enthusiasts from industry, government and universities. The aim is to conceptualise and develop R-based tools that address current challenges in data science, open science and reproducibility.
I wanted to ask you about your R forecast package, in particular the Arima() function. We are using this function to fit an ARIMAX model and produce model estimates and standard errors, which in turn can be used to get p-values and later model forecasts. To double check our work, we are also fitting the same model in SAS using PROC ARIMA and comparing model coefficients and output.
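The step from estimates and standard errors to p-values is usually a Wald-type test of whether each coefficient is zero. A minimal sketch is below, assuming the asymptotic normal approximation z = estimate / se, so that the two-sided p-value is 2(1 − Φ(|z|)). The function name `wald_p_values` and the coefficient values are hypothetical; software that uses a t reference distribution, or a different variance estimate, will give slightly different numbers, which is one reason two packages can disagree.

```python
import math

def wald_p_values(estimates, std_errors):
    """Two-sided p-values for H0: coefficient = 0.

    Assumes the asymptotic normal approximation z = estimate / se.
    Returns a list of (z, p) pairs, one per coefficient.
    """
    results = []
    for est, se in zip(estimates, std_errors):
        z = est / se
        # For standard normal Z, P(|Z| > |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((z, p))
    return results

# Hypothetical ARIMAX coefficients (ar1, ma1, xreg) and their standard errors
names = ["ar1", "ma1", "xreg"]
estimates = [0.62, -0.31, 1.85]
std_errors = [0.08, 0.11, 0.94]
for name, (z, p) in zip(names, wald_p_values(estimates, std_errors)):
    print(f"{name}: z = {z:.2f}, p = {p:.4f}")
```

Comparing z statistics rather than p-values is often the easier way to see whether two programs agree, since it separates differences in the estimates from differences in the reference distribution.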
One of my colleagues said that you once said/wrote that you had encountered very few real outliers in your work, and that normally the “outlier-looking” data points were proper data points that should not have been treated as outliers. Have you discussed this in writing? If so, I would love to read it.
I don’t think I’ve ever said or written anything quite like that, and I see lots of outliers in real data. But I have counselled against omitting apparent outliers.
Often the most interesting part of a data set is in the unusual or unexpected observations, so I’m strongly opposed to automatic omission of outliers. The most famous example is the non-detection of the hole in the ozone layer by NASA. The way I was told the story, outliers had been automatically filtered from the data obtained from Nimbus-7. It was only when the British Antarctic Survey observed the phenomenon in the mid-1980s that scientists went back and found the problem could have been detected a decade earlier if NASA had not applied automated outlier filtering. In fact, that is also how the story was told on the NASA website for a few years. But in a letter to the editor of the IMS Bulletin, Pukelsheim (1990) explains that the reality was more complicated. In the corrected story, scientists were investigating the unusual observations to see if they were genuine or the result of instrumental error, but still didn’t detect the problem until quite late.
Whatever actually happened, outliers need to be investigated, not omitted. Try to understand what caused some observations to be different from the bulk of the observations. If you understand the reasons, you are then in a better position to judge whether the points can legitimately be removed from the data set, or whether you’ve just discovered something new and interesting. Never remove a point just because it is weird.
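One way to build "investigate, not omit" into a workflow is to have the detection step return a list of points to look at, while leaving the data untouched. The sketch below uses Tukey's fences (points more than 1.5 IQR beyond the quartiles) purely as an illustrative rule; it is not the only or necessarily the best choice, particularly for time series. The function name `flag_for_review` and the readings are hypothetical.

```python
import statistics

def flag_for_review(values, k=1.5):
    """Return (index, value) pairs outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    The input is not modified: flagged points are a list of observations
    to investigate, not a list of observations to delete.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical readings: 41.0 is unusual, but it may well be genuine
readings = [12.1, 11.8, 12.4, 12.0, 41.0, 11.9, 12.3]
for index, value in flag_for_review(readings):
    print(f"Investigate observation {index}: {value}")
```

Returning indices rather than a filtered copy means the decision to drop anything has to be made explicitly, after the cause of each unusual value is understood.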
The GEFCom competitions have been a great success in generating good research on forecasting methods for electricity demand, and in enabling a comprehensive comparative evaluation of various methods. But they have only considered price forecasting in a simplified setting. So I’m happy to see this challenge is being taken up as part of the European Energy Market Conference for 2016, to be held from 6-9 June at the University of Porto in Portugal.
Today I attended the funeral of Peter Hall, one of the finest mathematical statisticians ever to walk the earth and easily the best from Australia. One of the most remarkable things about Peter was his astonishing productivity, with over 600 papers. As I sat in the audience I realised that many of the people there were probably coauthors of papers with Peter, and I wondered how many statisticians in the world would have been his coauthors or second-degree coauthors.
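Counting coauthors and second-degree coauthors is a breadth-first search on a coauthorship graph: each author is a node, two authors are joined if they have written a paper together, and an author's distance from the starting person is the length of the shortest chain of joint papers. The sketch below assumes such a graph is already available as an adjacency list; the graph and all names in it are invented placeholders, not real publication data.

```python
from collections import deque

def collaboration_distances(coauthors, source):
    """Breadth-first search over an undirected coauthorship graph.

    Returns {author: distance}: 1 means a direct coauthor of `source`,
    2 means a coauthor of a coauthor, and so on. Authors not connected
    to `source` are absent from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for other in coauthors.get(person, ()):
            if other not in dist:  # first visit is the shortest distance in BFS
                dist[other] = dist[person] + 1
                queue.append(other)
    return dist

# Hypothetical coauthorship graph; "P0" is the starting author
graph = {
    "P0": ["A", "B"],
    "A": ["P0", "C"],
    "B": ["P0", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}
d = collaboration_distances(graph, "P0")
first = sorted(p for p, k in d.items() if k == 1)
second = sorted(p for p, k in d.items() if k == 2)
print("coauthors:", first)             # coauthors: ['A', 'B']
print("second-degree:", second)        # second-degree: ['C', 'D']
```

Because BFS visits nodes in order of distance, each author's recorded distance is the shortest one, which is exactly how collaborative distance is defined.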
In mathematics, people calculate Erdős numbers — the “collaborative distance” between Paul Erdős and another person, as measured by authorship of mathematical papers. An Erdős number of 1 means you wrote a paper with Erdős; an Erdős number of 2 means you wrote a paper with someone who has an Erdős number of 1; and so on. My Erdős number is 3, measured in two different ways:
The student must have submitted a paper to a high-quality journal or refereed conference on some topic in the general area of business analytics, computational statistics or data visualization.
Up to $3000 will be awarded to the student to assist with research expenses subject to the approval of the relevant supervisor.
Applications should include the submitted paper, along with a brief statement (no more than 200 words) on how they intend to spend the money. Applications should be emailed to email@example.com by 31 March 2016.
Peter Hall passed away on Saturday after a long battle with illness over the last couple of years. No statistician will need reminding of Peter’s extensive contributions to the field. He had over 500 published papers and had won every major award available; many of these are listed on his Wikipedia page.