There are several other blogs on forecasting that readers might be interested in. Here are seven worth following:

- No Hesitations by Francis Diebold (Professor of Economics, University of Pennsylvania). Diebold needs no introduction to forecasters. He primarily covers forecasting in economics and finance, but also xkcd cartoons, graphics, research issues, etc.
- Econometrics Beat by Dave Giles. Dave is a professor of economics at the University of Victoria (Canada), formerly from my own department at Monash University (Australia), and a native New Zealander. Not a lot on forecasting, but plenty of interesting posts about econometrics and statistics more generally.
- Business forecasting by Clive Jones (a professional forecaster based in Colorado, USA). Originally about sales and new product forecasting, but he now covers a lot of other forecasting topics and has an interesting practitioner perspective.
- Freakonometrics by Arthur Charpentier (an actuary and professor of mathematics at the University of Quebec at Montréal, Canada). This is the most prolific blog on this list. Wide-ranging and taking in statistics, forecasting, econometrics, actuarial science, R, and anything else that takes his fancy. Sometimes in French.
- No free hunch: the kaggle blog. Some of the most interesting posts are from kaggle competition winners explaining their methods.
- Energy forecasting by Tao Hong (formerly an energy forecaster for

## Posts Tagged ‘statistics’:

## Errors on percentage errors

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

$$\text{MAPE} = 100\,\text{mean}(|y_t - \hat{y}_t|/|y_t|),$$

where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE saying that “it has a bias favoring estimates that are below the actual values”.
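To make the definition concrete, here is a minimal Python sketch of the MAPE. The function name `mape` and the numbers are made up for illustration. The example shows one common reading of the asymmetry, assuming nonnegative forecasts: for a fixed actual value, a forecast that is too low can never score worse than 100%, while a forecast that is too high has no upper bound.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if any(y == 0 for y in actuals):
        raise ValueError("MAPE is undefined when an observation is zero")
    # Absolute percentage error for each observation/forecast pair
    ape = [abs(y - f) / abs(y) for y, f in zip(actuals, forecasts)]
    return 100 * sum(ape) / len(ape)

# Illustrative numbers only: the actual value is 100 in both cases.
lowest_possible = mape([100], [0])   # worst nonnegative under-forecast
too_high = mape([100], [300])        # an over-forecast can be arbitrarily bad
print(lowest_possible, too_high)     # 100.0 200.0
```

The check for zero observations is a design choice in this sketch, since the percentage error is undefined there.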

## Job at Center for Open Science

This looks like an interesting job. Dear Dr. Hyndman, I write from the Center for Open Science, a non-profit organization based in Charlottesville, Virginia in the United States, which is dedicated to improving the alignment between scientific values and scientific practices. We are dedicated to open source and open science. We are reaching out to you to find out if you know anyone who might be interested in our Statistical and Methodological Consultant position. The position is a unique opportunity to consult on reproducible best practices in data analysis and research design; the consultant will make short visits to provide lectures and training at universities, laboratories, conferences, and through virtual mediums. An especially unique part of the job involves collaborating with the White House’s Office of Science and Technology Policy on matters relating to reproducibility. If you know someone with substantial training and experience in scientific research, quantitative methods, reproducible research practices, and some programming experience (at least R, ideally Python or Julia) might you please pass this along to them? Anyone may find out more about the job or apply via our website: http://centerforopenscience.org/jobs/#stats The position is full-time and located at our office in beautiful Charlottesville, VA. Thanks in advance for your time

## Interpreting noise

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise. For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.” What is going on here is that the commentators are assuming we live in a noise-free world. They imagine that everything is explicable, you just have to find the explanation. However, the world is noisy — real data are subject to random fluctuations, and are often also measured inaccurately. So to interpret every little fluctuation is silly and misleading.

## Fast computation of cross-validation in linear models

The leave-one-out cross-validation statistic is given by

$$\text{CV} = \frac{1}{N} \sum_{i=1}^N e_{[i]}^2,$$

where $e_{[i]} = y_i - \hat{y}_{[i]}$, $y_1,\dots,y_N$ are the observations, and $\hat{y}_{[i]}$ is the predicted value obtained when the model is estimated with the $i$th case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic. It turns out that for linear models, we do not actually have to estimate the model $N$ times, once for each omitted case. Instead, CV can be computed after estimating the model once on the complete data set.
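The shortcut can be checked numerically. The sketch below assumes the standard leverage identity for linear models, in which each leave-one-out residual equals the ordinary residual divided by one minus its hat value. It uses a simple regression on one predictor so that the hat values have a closed form. The function names and the data are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

def cv_brute_force(x, y):
    """Refit the model N times, deleting one case each time."""
    n = len(x)
    total = 0.0
    for i in range(n):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (a + b * x[i])) ** 2
    return total / n

def cv_shortcut(x, y):
    """Fit once, then rescale each residual by 1 - h_i (its leverage)."""
    n = len(x)
    a, b = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # hat value in simple regression
        e = yi - (a + b * xi)               # ordinary residual
        total += (e / (1 - h)) ** 2
    return total / n

# Made-up data (plain lists, so slicing works in cv_brute_force)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
print(cv_brute_force(x, y), cv_shortcut(x, y))
```

The two printed values should agree up to floating-point rounding. With several predictors, the hat values would instead come from the diagonal of the hat matrix.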

## Probabilistic forecasting by Gneiting and Katzfuss (2014)

The IJF is introducing occasional review papers on areas of forecasting. We did a whole issue in 2006 reviewing 25 years of research since the International Institute of Forecasters was established. Since then, there has been a lot of new work in application areas such as call center forecasting and electricity price forecasting. In addition, there are areas we did not cover in 2006 including new product forecasting and forecasting in finance. There have also been methodological and theoretical developments over the last eight years. Consequently, I’ve started inviting eminent researchers to write survey papers for the journal. One obvious choice was Tilmann Gneiting, who has produced a large body of excellent work on probabilistic forecasting in the last few years. The theory of forecasting was badly in need of development, and Tilmann and his coauthors have made several great contributions in this area. However, when I asked him to write a review he explained that another journal had got in before me, and that the review was already written. It appeared in the very first volume of the new journal Annual Review of Statistics and its Application: Gneiting and Katzfuss (2014) Probabilistic Forecasting, pp.125–151. Having now read it, I’m both grateful for this more accessible

## Testing for trend in ARIMA models

Today’s email brought this one: I was wondering if I could get your opinion on a particular problem that I have run into during the reviewing process of an article. Basically, I have an analysis where I am looking at a couple of time-series and I wanted to know if, over time, there was an upward trend in the series. Inspection of the raw data suggests there is, but we want some statistical evidence for this. To achieve this I ran some ARIMA (0,1,1) models including a drift/trend term to see if the mean of the series did indeed shift upwards with time and found that it did. However, we have run into an issue with a reviewer who argues that differencing removes trends and may not be a suitable way to detect trends. Therefore, the fact that we found a trend despite differencing suggests that differencing was not successful. I know there are a few papers and textbooks that use ARIMA (0,1,1) models as ‘random walks with drift’-type models so I cited them as examples of this procedure in action, but they remained unconvinced. Instead it was suggested that I look for trends in the raw undifferenced time-series as these would be more reliable as no trends had been removed. At the moment I am hesitant to do this

## Unit root tests and ARIMA models

An email I received today:

I have a small problem. I have a time series called x:

- If I use the default values of auto.arima(x), the best model is an ARIMA(1,0,0).
- However, I tried the functions ndiffs(x, test="adf") and ndiffs(x, test="kpss"), as the KPSS test seems to be the default, and the number of differences is 0 for the KPSS test (consistent with the results of auto.arima()) but 2 for the ADF test. I then tried auto.arima(x, test="adf") and now I have another model, ARIMA(1,2,1).

I am unsure which order of integration I should use, as the tests give fairly different results. Is there a test that prevails?

## Forecasting weekly data

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25÷7 = 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
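As a rough illustration of what Fourier terms look like for weekly data, the sketch below builds sine and cosine columns for the non-integer period 365.25/7. The helper name `fourier_terms`, the number of harmonics `K=3`, and the series length of 104 weeks are assumptions made for this example, not values from the post. One common way to use such columns is as regressors in a model whose errors capture the short-term dynamics.

```python
import math

def fourier_terms(n_obs, period, K):
    """Return n_obs rows of [sin, cos] pairs for harmonics k = 1..K.

    Hypothetical helper: row t (t = 1..n_obs) holds
    sin(2*pi*k*t/period) and cos(2*pi*k*t/period) for each k.
    """
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, K + 1):
            angle = 2 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows

period = 365.25 / 7            # average number of weeks per year, about 52.18
X = fourier_terms(n_obs=104, period=period, K=3)
print(len(X), len(X[0]))       # 104 6
```

Because the period enters only through the angle, nothing requires it to be an integer. This is why the approach copes with a seasonal period of about 52.18. The number of harmonics would normally be chosen by comparing fitted models, for example with an information criterion.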

## Fitting models to short time series

Following my post on fitting models to long time series, I thought I’d tackle the opposite problem, which is more common in business environments. I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data. The sample size required increases with the number of parameters to be estimated, and the amount of noise in the data.