# The forecast mean after back-transformation

Many functions in the forecast package for R allow a Box-Cox transformation. The models are fitted to the transformed data, and the forecasts and prediction intervals are then back-transformed. This preserves the coverage of the prediction intervals, and the back-transformed point forecast can be considered the median of the forecast density (assuming the forecast density on the transformed scale is symmetric). For many purposes this is acceptable, but occasionally the mean forecast is required. For example, in hierarchical forecasting the forecasts need to be aggregated: the sum of medians is not, in general, the median of the sum, whereas the sum of means is always the mean of the sum.

It is easy enough to derive an approximation to the mean forecast using a Taylor series expansion. Suppose $f(x)$ represents the back-transformation function, $\mu$ is the mean on the transformed scale and $\sigma^2$ is the variance on the transformed scale. Then using the first three terms of a Taylor expansion around $\mu$, the mean on the original scale is approximately
$$f(\mu) + \frac{1}{2}\sigma^2 f''(\mu).$$
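
The variance enters through the second-order term of the expansion. As a sketch, write $X$ for the forecast variable on the transformed scale (a symbol introduced here only for illustration), with $\text{E}[X]=\mu$ and $\text{Var}(X)=\sigma^2$. Expanding $f$ around $\mu$ and taking expectations gives
$$f(X) \approx f(\mu) + (X-\mu)f'(\mu) + \frac{1}{2}(X-\mu)^2 f''(\mu),$$
$$\text{E}[f(X)] \approx f(\mu) + f'(\mu)\,\text{E}[X-\mu] + \frac{1}{2}f''(\mu)\,\text{E}[(X-\mu)^2] = f(\mu) + \frac{1}{2}\sigma^2 f''(\mu).$$
The first-order term vanishes because $\text{E}[X-\mu]=0$, and the second-order term picks up the variance because $\text{E}[(X-\mu)^2]=\sigma^2$.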

### Box-Cox transformations

For a Box-Cox transformation,
$$f(x) = \begin{cases} (\lambda x+1)^{1/\lambda} & \text{if } \lambda\ne0;\\ e^x & \text{if } \lambda=0. \end{cases}$$
So
$$f''(x) = \begin{cases} (1-\lambda)(\lambda x+1)^{1/\lambda-2} & \text{if } \lambda\ne0;\\ e^x & \text{if } \lambda=0, \end{cases}$$
and the back-transformed mean is given by
$$\begin{cases} (\lambda \mu+1)^{1/\lambda}\left[1 + \frac{\sigma^2(1-\lambda)}{2(\lambda \mu+1)^{2}}\right] & \text{if } \lambda\ne0;\\ e^\mu\left[1 + \frac{\sigma^2}{2}\right] & \text{if } \lambda=0. \end{cases}$$
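Therefore, the back-transformed point forecast obtained by R, which is $f(\mu)$, can be adjusted to a mean by multiplying it by the factor in square brackets, with $\sigma^2$ taken as the forecast variance on the transformed scale.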
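
As a minimal sketch of this adjustment, here is a version in plain Python rather than R, so that it needs no packages. The function names (`box_cox`, `inv_box_cox`, `variance_from_interval`, `backtransformed_mean`) and the example numbers are illustrative assumptions and are not part of the forecast package. Since $\sigma^2$ is often not reported directly, one option is to recover it from the width of a prediction interval after transforming the interval limits. This assumes the forecast density on the transformed scale is normal.

```python
import math
from statistics import NormalDist


def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y**lam - 1.0) / lam


def inv_box_cox(x, lam):
    """Back-transformation f(x) from the transformed scale."""
    return math.exp(x) if lam == 0 else (lam * x + 1.0) ** (1.0 / lam)


def variance_from_interval(lower, upper, lam, level=0.95):
    """Hypothetical helper: recover sigma^2 on the transformed scale from a
    back-transformed central prediction interval.

    Assumes (our assumption) a normal forecast density on the transformed scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = (box_cox(upper, lam) - box_cox(lower, lam)) / 2.0
    return (half_width / z) ** 2


def backtransformed_mean(mu, sigma2, lam):
    """Second-order Taylor approximation to the mean on the original scale."""
    if lam == 0:
        return math.exp(mu) * (1.0 + sigma2 / 2.0)
    base = lam * mu + 1.0
    return base ** (1.0 / lam) * (1.0 + sigma2 * (1.0 - lam) / (2.0 * base**2))


# Made-up example: log transformation (lambda = 0).
mu, sigma2 = 4.0, 0.09
median = inv_box_cox(mu, 0)                  # back-transformed point forecast
mean = backtransformed_mean(mu, sigma2, 0)   # Taylor-adjusted mean
exact = math.exp(mu + sigma2 / 2.0)          # exact lognormal mean, for comparison
print(median, mean, exact)
```

For $\lambda=0$ with a normal forecast density on the transformed scale, the exact mean is $e^{\mu+\sigma^2/2}$, so the last printed value gives a check on how close the Taylor approximation is when $\sigma^2$ is small.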

• Brian

Do the forecast intervals still speak to coverage of the median (in the first example)?

• The forecast intervals are calculated independently of the point forecasts. They provide coverage of the forecast density, not of a specific point forecast.

• John

This really skips a lot of serious issues with considering what are and aren’t comparable results (for starters). The analysis of the transformed data is analysis of a different thing. Doing this kind of back-transformation and wanting to combine values is really tantamount to just analyzing the untransformed values in the first place. Furthermore, aggregating the values may have obviated the transformations because the aggregate values will benefit from the CLT. I’ve always recommended the best practice to be either to analyze and use transformed values exclusively, with back-transformed values only used to orient the reader, or, if you must use both, to treat the analyses as very different things… because they are.

• Jeffery K

Can you point me towards a reference for the Taylor expansion? I know of it through single-variable calculus… but where does the use of the variance come from? Thanks!

• Jeffery K

Ah the Delta method!

• Thanks. I fixed the other post. I don’t have access to SAS.