The forecast mean after back-transformation

Many functions in the forecast package for R will allow a Box-Cox transformation. The models are fitted to the transformed data and the forecasts and prediction intervals are back-transformed. This preserves the coverage of the prediction intervals, and the back-transformed point forecast can be considered the median of the forecast densities (assuming the forecast densities on the transformed scale are symmetric). For many purposes, this is acceptable, but occasionally the mean forecast is required. For example, with hierarchical forecasting the forecasts need to be aggregated, and medians do not aggregate but means do.

It is easy enough to derive the mean forecast using a Taylor series expansion. Suppose $f(x)$ represents the back-transformation function, $\mu$ is the mean on the transformed scale and $\sigma^2$ is the variance on the transformed scale. Then using the first three terms of a Taylor expansion around $\mu$, the mean on the original scale is given by
$$f(\mu) + \frac{1}{2}\sigma^2f”(\mu).$$

Box-Cox transformations

For a Box-Cox transformation,
$$f(x) = \begin{cases}
(\lambda x+1)^{1/\lambda} & \text{if $\lambda\ne0$;}\\
e^x & \text{if $\lambda=0$.}
\end{cases}
$$
So
$$f”(x) = \begin{cases}
(1-\lambda)(\lambda x+1)^{1/\lambda-2} & \text{if $\lambda\ne0$;}\\
e^x & \text{if $\lambda=0$.}\end{cases}
$$
and the backtransformed mean is given by
$$\begin{cases}
(\lambda \mu+1)^{1/\lambda}\left[1 + \frac{\sigma^2(1-\lambda)}{2(\lambda \mu+1)^{2}}\right] & \text{if $\lambda\ne0$;}\\
e^\mu\left[1 + \frac{\sigma^2}{2}\right] & \text{if $\lambda=0$.}\end{cases}
$$
Therefore, to adjust the back-transformed mean obtained by R, the following code can be used.

library(fpp)
 
fit <- ets(eggs, lambda=0)
fc <- forecast(fit, h=50, level=95)
fvar <- ((BoxCox(fc$upper,fit$lambda)-BoxCox(fc$lower,fit$lambda))/qnorm(0.975)/2)^2
plot(fc)
fc$mean <- fc$mean * (1 + 0.5*fvar)
lines(fc$mean,col='red')
 
fit <- ets(eggs, lambda=0.2)
fc <- forecast(fit, h=50, level=95)
fvar <- ((BoxCox(fc$upper,fit$lambda)-BoxCox(fc$lower,fit$lambda))/qnorm(0.975)/2)^2
plot(fc)
fc$mean <- fc$mean * (1 + 0.5*fvar*(1-fit$lambda)/(fc$mean)^(2*fit$lambda))
lines(fc$mean,col='red')

The second of these plots is shown below. The blue line shows the forecast medians while the red line shows the forecast means.

eggs3

Scaled logistic transformation

In my previous post on transformations, I described the scaled logit transformation for bounding a forecast between specified limits $a$ and $b$. In this case,
$$f^{-1}(x) = \log\left(\frac{x-a}{b-x}\right)$$
and so
\begin{align}\nonumber
f(x) &= \frac{a+be^x}{1+e^x},\\
f”(x) &= \frac{(b-a)e^x(1-e^{x})}{(1+e^x)^3},
\end{align}
and the back-transformed mean is given by
$$
\frac{1}{(1+e^\mu)^3} \left[
(a+be^\mu)(1+e^\mu)^2 + \frac{\sigma^2}{2}(b-a)e^\mu(1-e^{\mu})\right]
$$

In R, this can be calculated as follows.

# Bounds
a <- 50
b <- 400
# Transform data
y <- log((eggs-a)/(b-eggs))
fit <- ets(y)
fc <- forecast(fit, h=50, level=0.95)
fvar <- ((fc$upper-fc$lower)/qnorm(0.975)/2)^2
emu <- exp(fc$mean)
# Back-transform forecasts
fc$mean <- (b-a)*exp(fc$mean)/(1+exp(fc$mean)) + a
fc$lower <- (b-a)*exp(fc$lower)/(1+exp(fc$lower)) + a
fc$upper <- (b-a)*exp(fc$upper)/(1+exp(fc$upper)) + a
fc$x <- eggs
# Plot result on original scale
plot(fc)
# Compute forecast mean
fc$mean <- 1/(1+emu)^3*((a+b*emu)*(1+emu)^2 + fvar*(b-a)*emu*(1-emu)/2)
lines(fc$mean,col='red')

Related Posts:


  • Brian

    Do the forecast intervals still speak to coverage of the median (in the first example)?

    • The forecast intervals are calculated independently of the point forecasts. They provide coverage of the forecast density, not of a specific point forecast.

  • John

    This really skips a lot of serious issues with considering what are and aren’t comparable results (for starters). The analysis of the transformed data is analysis of a different thing. Doing this kind of back transformation and wanting to combine values is really tantamount to just analyzing the untransformed values in the first place. Furthermore, aggregating the values may have obviated the transformations because the aggregate values will benefit from CLT. I’ve always recommended the best practice to be to either analyze and use transform values exclusively with back transformed values only used to orient the reader. Or, if you must use both treat the analyses as very different things…because they are.

  • Jeffery K

    Can you point me towards a reference for the taylor expansion? I know of it through single variable calc…but where does the use of the variance come from? Thanks!

  • Thanks. I fixed the other post. I don’t have access to SAS.