At the recent International Symposium on Forecasting, held in Riverside, California, Tilmann Gneiting gave a great talk on “Evaluating forecasts: why proper scoring rules and consistent scoring functions matter”. It will be the subject of an IJF invited paper in due course.

One of the things he talked about was the “Murphy diagram” for comparing forecasts, as proposed in Ehm et al (2015). Here’s how it works for comparing mean forecasts.

It is well known that the mean of the forecast distribution minimizes the expected squared error loss, where S(x,y)=(x-y)^2, x is the point forecast and y is the actual observation. That is
\text{E}(Y) = \arg\min_x \text{E}[S(x,Y)].

An old result that is not so well known is that other loss functions are also minimized in expectation by the forecast mean. In fact, Savage (1971) showed that a scoring function is consistent for the mean if and only if it is of the form \begin{equation}\label{savage}
S(x,y) = \phi(y) - \phi(x) - \phi'(x)(y-x)
\end{equation} where the function \phi is convex with subgradient \phi'. Setting \phi(u)=u^2 gives the usual squared error result. So when comparing point forecasts that are intended to be estimates of the mean, it may not be appropriate to only compare the MSE values. A different scoring function that satisfies condition (1) may give a different ranking of competing forecasts.
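
To make the last point concrete, here is a minimal Python sketch of scoring functions of the form (1). It is an illustration only, not the code used for the plots in this post. The helper names (`bregman_score`, `mean_score`), the choice \phi(u)=e^u as a second convex function, and the data are all made up for the example.

```python
import math


def bregman_score(x, y, phi, dphi):
    """Score of the form (1): phi(y) - phi(x) - phi'(x) * (y - x)."""
    return phi(y) - phi(x) - dphi(x) * (y - x)


def mean_score(forecasts, actuals, phi, dphi):
    """Average score over all forecast/observation pairs."""
    scores = [bregman_score(x, y, phi, dphi) for x, y in zip(forecasts, actuals)]
    return sum(scores) / len(scores)


# Two convex choices of phi, each paired with its derivative.
SQUARED = (lambda u: u * u, lambda u: 2.0 * u)  # reduces to (x - y)^2
EXPONENTIAL = (math.exp, math.exp)              # phi(u) = exp(u), an assumed example

# Made-up data: forecast_a is always 1.0 too high, forecast_b always 1.2 too low.
actuals = [0.0, 1.0, 2.0]
forecast_a = [1.0, 2.0, 3.0]
forecast_b = [-1.2, -0.2, 0.8]

for name, (phi, dphi) in [("squared", SQUARED), ("exponential", EXPONENTIAL)]:
    score_a = mean_score(forecast_a, actuals, phi, dphi)
    score_b = mean_score(forecast_b, actuals, phi, dphi)
    better = "A" if score_a < score_b else "B"
    print(f"{name}: A = {score_a:.3f}, B = {score_b:.3f}, lower score: {better}")
```

With these invented numbers, squared error ranks forecast A first, while the exponential version ranks forecast B first, because it penalizes over-forecasting more heavily than under-forecasting.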

Ehm et al (2015) showed that any scoring function satisfying condition (1) can be written as
S(x,y) = \int_{-\infty}^\infty S_{\theta}(x,y) dH(\theta)
where H is a non-negative measure and
S_{\theta}(x,y) = \begin{cases}
|y-\theta| & \text{if } \min(x,y)\le\theta < \max(x,y) \\
0 & \text{otherwise}.
\end{cases}
Different H measures give different scoring functions, but S_{\theta}(x,y) is the same for all such scoring functions.
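
The mixture representation can be checked numerically. The Python sketch below assumes that H has density \phi''(\theta), which is not stated above but follows from integrating by parts; under that assumption the integral should reproduce the score in (1). The function names and the midpoint-rule integration are choices made for this illustration.

```python
import math


def elementary_score(theta, x, y):
    """S_theta(x, y): |y - theta| if min(x, y) <= theta < max(x, y), else 0."""
    if min(x, y) <= theta < max(x, y):
        return abs(y - theta)
    return 0.0


def mixture_score(x, y, h_density, n_steps=20000):
    """Midpoint-rule approximation to the integral of S_theta(x, y) * h(theta).

    The integrand is zero outside [min(x, y), max(x, y)), so only that
    interval needs to be covered.
    """
    lo, hi = min(x, y), max(x, y)
    if lo == hi:
        return 0.0
    width = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps):
        theta = lo + (k + 0.5) * width
        total += elementary_score(theta, x, y) * h_density(theta)
    return total * width


x, y = 1.5, 0.3  # made-up forecast and observation

# phi(u) = u^2 has phi''(u) = 2, so the mixture should equal (x - y)^2.
squared_via_mixture = mixture_score(x, y, lambda theta: 2.0)
squared_direct = (x - y) ** 2

# phi(u) = exp(u) has phi''(u) = exp(u); compare with form (1) directly.
exp_via_mixture = mixture_score(x, y, math.exp)
exp_direct = math.exp(y) - math.exp(x) - math.exp(x) * (y - x)

print(squared_via_mixture, squared_direct)
print(exp_via_mixture, exp_direct)
```

The two numbers on each printed line should agree up to the error of the numerical integration.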

So if we have point forecasts for n events, we can calculate the average value of S_{\theta}(x,y) for each \theta:
s(\theta) = \frac{1}{n}\sum_{i=1}^n S_{\theta}(x_i,y_i)
and plot this as a function of \theta. This is what Ehm et al call a “Murphy diagram”. A similar approach can be taken for quantile and expectile forecasts (with a different function S_\theta(x,y) of course).
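
The calculation of s(\theta) is short enough to sketch in Python. This is an illustration under assumed, made-up data, not the R code used for the plots in this post; `murphy_curve` and the grid of 201 values of \theta are hypothetical choices.

```python
def elementary_score(theta, x, y):
    """S_theta(x, y): |y - theta| if min(x, y) <= theta < max(x, y), else 0."""
    if min(x, y) <= theta < max(x, y):
        return abs(y - theta)
    return 0.0


def murphy_curve(forecasts, actuals, thetas):
    """Average elementary score s(theta) at each theta in the grid."""
    n = len(actuals)
    return [
        sum(elementary_score(theta, x, y) for x, y in zip(forecasts, actuals)) / n
        for theta in thetas
    ]


# Made-up observations and two competing sets of point forecasts.
actuals = [1.0, 2.0, 3.0]
forecast_a = [1.5, 2.5, 2.0]
forecast_b = [0.0, 4.0, 5.0]

# Grid of theta values spanning every forecast and observation.
values = actuals + forecast_a + forecast_b
lo, hi = min(values), max(values)
thetas = [lo + k * (hi - lo) / 200 for k in range(201)]

curve_a = murphy_curve(forecast_a, actuals, thetas)
curve_b = murphy_curve(forecast_b, actuals, thetas)
```

Plotting `curve_a` and `curve_b` against `thetas` on the same axes gives the diagram. A forecast whose curve lies below the other at every \theta has the lower average score under every scoring function of the mixture form above.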

I have written some R code for producing these plots. Here it is applied to rolling one-step forecasts computed using ETS and ARIMA models. The data was simulated from an ARIMA model, so we would expect the ARIMA forecasts to be better: