A very useful way of keeping up with blogs in a particular area is to subscribe to a blog aggregator. An aggregator syndicates posts from a large number of blogs and provides links back to the original sources, so you only need to subscribe once to get all the good stuff in that area. There are now several blog aggregators available that might be of interest to readers here, and this blog is now syndicated on several other sites including those listed below.
Posts Tagged ‘R’:
Measuring time series characteristics
A few years ago, I was working on a project where we measured various characteristics of a time series and used the information to determine what forecasting method to apply or how to cluster the time series into meaningful groups. The two main papers to come out of that project were:
Wang, Smith and Hyndman (2006) “Characteristic-based clustering for time series data”, Data Mining and Knowledge Discovery, 13(3), 335–364.
Wang, Smith-Miles and Hyndman (2009) “Rule induction for forecasting method selection: meta-learning the characteristics of univariate time series”, Neurocomputing, 72, 2581–2594.
I’ve since had a lot of requests for the code, which one of my coauthors has been helpfully emailing to anyone who asked. But to make it easier, we thought it might be helpful if I posted some updated code here. This is not the same as the R code we used in the paper, as I’ve improved it in several ways (so it will give different results). If you just want the code, skip to the bottom of the post.
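To give a flavour of what "measuring characteristics" can mean, here is a minimal sketch in plain Python. It is not the R code from the post or the papers, and the three measures chosen (lag-1 autocorrelation, skewness and excess kurtosis) are only illustrative assumptions about the kind of summary one might compute. The function names are hypothetical.

```python
# Illustrative only: a few simple summary measures of a time series.
# This is NOT the code from the post; names and choice of measures are assumptions.

def _central_moment(x, k):
    """k-th central moment of the values in x (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** k for v in x) / n


def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1, using the usual biased estimator."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0  # constant series: define the autocorrelation as zero
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return num / denom


def skewness(x):
    """Moment-based skewness: m3 / m2^(3/2)."""
    m2 = _central_moment(x, 2)
    if m2 == 0:
        return 0.0
    return _central_moment(x, 3) / m2 ** 1.5


def excess_kurtosis(x):
    """Moment-based excess kurtosis: m4 / m2^2 - 3."""
    m2 = _central_moment(x, 2)
    if m2 == 0:
        return 0.0
    return _central_moment(x, 4) / m2 ** 2 - 3.0


def characteristics(x):
    """Collect the measures into one feature vector (as a dict)."""
    return {
        "acf1": lag1_autocorrelation(x),
        "skewness": skewness(x),
        "kurtosis": excess_kurtosis(x),
    }
```

Each series is reduced to a short vector of numbers, so a collection of series becomes an ordinary data table that could be passed to a clustering or rule-induction method.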
Forecasts and ggplot
The forecast package uses base R graphics for all plots, but some people may prefer to use the nice graphics available using the ggplot2 package. In the following two posts, Frank Davenport shows how it can be done:
Plotting forecast() objects in ggplot part 1: Extracting the Data
Plotting forecast() objects in ggplot part 2: Visualize Observations, Fits, and Forecasts
Data visualization
For those who have not read the seminal works of Tufte and Cleveland, please hang your heads in shame. To salvage some sense of self-worth, you can then head over to Solomon Messing’s blog where he is starting a series on data visualization based on the principles developed by Tufte and Cleveland (with R examples). The classics are also worth reading, and remain relevant despite the 20 or 30 years that have elapsed since they appeared.
Exponential smoothing and regressors
I have thought quite a lot about including regressors (i.e. covariates) in exponential smoothing (ETS) models, and I have done it a couple of times in my published work. See my 2008 exponential smoothing book (chapter 9) and my 2008 Tourism Management paper. However, there are some theoretical issues with these approaches, which have come to light through the research of Ahmad Farid Osman, one of our PhD students at Monash University. Basically, the resulting models are never forecastable in the sense explained in Section 10.2 of my 2008 book (forecastability is the ETS equivalent of invertibility in ARIMA models). Osman has attempted to repair the problem by proposing a different formulation from those in the above references. The only public description of his proposed model is given by Osman and King in this presentation – sorry, they do have a full paper explaining their approach, but it is not publicly available. However, the model is much messier than the formulation we put in our book, and although it avoids the forecastability issues, I think it is more difficult to interpret. Still, it’s a good attempt at a tough problem, and there’s nothing else around that’s any better. So don’t expect any code for fitting ETS models with regressors to appear in the forecast package
(More)…
Internet surveys
I received the following email today:
“I am preparing a thesis … I need to conduct the widest possible poll, and it occurred to me that perhaps you could guide me toward an internet-based way in which this can be done easily. I have a ten-question questionnaire prepared, that I wish to have a random sample of the population respond to. I have no budget for this, so I hope you can suggest a way in which a good number of responses can be harvested using blogs or sites you may be aware of.”
Here is my response.
Forecasting time series using R
I gave this talk on Forecasting time series using R for the Melbourne Users of R Network (MelbURN) on Thursday 27 October 2011.
Slides
Examples
Abstract
I look at the various facilities for time series forecasting available in R, concentrating on the forecast package. This package implements several automatic methods for forecasting time series including forecasts from ARIMA models, ARFIMA models and exponential smoothing models. I also look more generally at how to go about forecasting non-seasonal data, seasonal data, seasonal data with high frequency, and seasonal data with multiple frequencies. Examples are taken from my own consulting experience. I give an overview of what’s possible and available and where it is useful, rather than give the mathematical details of any specific time series methods.
The art of R programming
This is a gem of a book. It will become the book I give PhD students when they are learning how to write good R code. That is, if I ever see it again. I had hoped to write a review of it, but I haven’t seen it since it arrived in the mail a couple of weeks ago because a research student or research assistant has always had it on loan. I guess that’s a testament to how useful it is.
Kaggle on TV
It is good to see forecasting algorithms getting some mainstream exposure on ABC Catalyst. Update: See also this great talk by Jeremy Howard, a data scientist from Melbourne and now part of Kaggle.
What you wish you knew before you started a PhD
I asked my research group recently what they wished they had learned before they started work on a PhD. Here are some of the responses.

Rob J Hyndman