When I want to insert figures generated in R into a LaTeX document, it looks better if I first remove the white space around the figure. Unfortunately, R does not make this easy as the graphs are generated to look good on a screen, not in a document.
There are two things that can be done to fix this problem. (more…)
Originally, I wrote this blog for my own PhD students and I covered issues to do with research. I called it “Research tips” because that is what it was meant to be.
However, over time I’ve started covering other things of interest to me, and the readership has grown way beyond what I ever expected. So I decided it was time to acknowledge the change of focus with a change of name. Hyndsight is intended to cover my reflections on anything to do with statistics, forecasting, research, technology, or whatever else I’m thinking about at the time that is somehow related to my job as a Professor of Statistics.
All the old links should still work as I have set up a redirection from researchtips to hyndsight. But if you find something is broken, please let me know.
It is common to fit a model using training data, and then to evaluate its performance on a test data set. When the data are time series, it is useful to compute one-step forecasts on the test data. For some reason, this is much more commonly done by people trained in machine learning than by those trained in statistics.
If you are using the forecast package in R, it is easily done with ETS and ARIMA models. For example:
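library(forecast)
# Estimate the model on the training data
fit <- ets(trainingdata)
# Apply the estimated model to the test data
fit2 <- ets(testdata, model=fit)
# The fitted values are the one-step forecasts on the test data
onestep <- fitted(fit2)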
Note that the second call to ets does not involve the model being re-estimated. Instead, the model obtained in the first call is applied to the test data in the second call. This works because fitted values are one-step forecasts in a time series model.
The same process works for ARIMA models when ets is replaced by Arima or auto.arima. Note that it does not work with the arima function from the stats package. One of the reasons I wrote Arima (in the forecast package) is to allow this sort of thing to be done.
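As a rough sketch of the ARIMA version described above, the following splits a built-in series into training and test sets, applies the training model to the test data, and summarises the one-step errors. The choice of AirPassengers, the split date, and the RMSE summary are assumptions made for illustration only; they do not come from the original example.

```r
library(forecast)

# Illustrative split of a built-in monthly series (an assumption for this sketch)
trainingdata <- window(AirPassengers, end = c(1957, 12))
testdata <- window(AirPassengers, start = c(1958, 1))

# Select and estimate an ARIMA model using the training data only
fit <- auto.arima(trainingdata)

# Apply the same model to the test data; the coefficients are not re-estimated
fit2 <- Arima(testdata, model = fit)

# Fitted values are the one-step forecasts on the test data
onestep <- fitted(fit2)

# Summarise the one-step forecast errors with a root mean squared error
rmse <- sqrt(mean((testdata - onestep)^2))
print(rmse)
```

The same pattern should carry over to ets by replacing auto.arima and Arima with ets in both calls.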
There must be dozens of statistical consulting businesses and organizations in Australia, each specializing in different areas.
I do some consulting work myself, mostly in the forecasting area, but sometimes in other areas of applied statistics including expert witness work in court cases. Email me if you have a project you would like me to take on. However, I often refer potential clients to other statistical consulting groups, as I only have a limited amount of time I can spend on consulting projects. (more…)
This is a guest post by Benedict Noel of Zombal. Many statisticians do a little bit of consulting in addition to their main job, and Zombal provides a way for people to find such work. (more…)
I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:
I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!
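One way to approach the task in that email is a plain loop over the series, sketched below with the forecast package. The simulated data, the item names, the horizon of 12, the use of ets as the automatic model, and the output file name are all assumptions made so that the example is self-contained; real data would be loaded from a file or database instead.

```r
library(forecast)

# Hypothetical input: a monthly multivariate series with one column per item,
# simulated here so that the sketch runs on its own
set.seed(1)
items <- ts(matrix(100 + rnorm(48 * 3, sd = 10), ncol = 3),
            start = c(2010, 1), frequency = 12)
colnames(items) <- c("item1", "item2", "item3")

h <- 12  # forecast horizon (an assumed value)

# Matrix to hold the point forecasts, one column per item
fcast <- matrix(NA_real_, nrow = h, ncol = ncol(items),
                dimnames = list(NULL, colnames(items)))

# Select a model automatically for each series and keep its point forecasts
for (i in seq_len(ncol(items))) {
  fcast[, i] <- forecast(ets(items[, i]), h = h)$mean
}

# Export the forecasts to a CSV file in a temporary directory
outfile <- file.path(tempdir(), "forecasts.csv")
write.csv(fcast, outfile, row.names = FALSE)
```

With a few thousand series, the time spent fitting the models is likely to dominate, so the overhead of the loop itself is unlikely to matter much.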
For 25 years I have been an intrepid statistical consultant, tackling the wild frontiers of real data, real problems and real time constraints. I have faced problems ranging from linguistics to river beds, from making paper plates to selling pies at the MCG, from tax office audits to surveys about the colour purple. University education helps prepare you to be a statistical consultant in the same way that Google Maps helps prepare you to cross the Simpson Desert. You have some idea of the main features, but when you get there, nothing looks familiar.
I will describe some of my adventures, and explain how to bluff your way through ignorance, work with inadequate tools, and deal with smelly clients. I will tell you the story of the client who wouldn’t give me the data, the client who wouldn’t tell me the problem, and the client who wanted all meetings held at random locations for security reasons.
Along the way we will learn about the skills that statisticians need to survive in the wild.
A few days ago I released version 4.0 of the forecast package for R. There were quite a few changes and new features, so I thought it deserved a new version number. I keep a list of changes in the Changelog for the package, but I doubt that many people look at it. So for the record, here are the most important changes to the forecast package made since v3.0 was released. (more…)