Forecast estimation, evaluation and transformation

I’ve had a few emails lately about forecast evaluation and estimation criteria. Here is one I received today, along with some comments.

I have a rather simple question regarding the use of MSE as opposed to MAD and MAPE. If the parameters of a time series model are estimated by minimizing MSE, why do we evaluate the model using some other metric, e.g., MAD or MAPE? I can see that MAPE is not scale-dependent. But MAPE is a percentage version of MAD, so why don’t we use the percentage version of MSE?

MSE (mean squared error) is not scale-free. If your data are in dollars, then the MSE is in squared dollars. Often you will want to compare forecast accuracy across a number of time series having different units; in this case, MSE makes no sense. MAE (mean absolute error) is also scale-dependent and so cannot be used for comparisons across series of different units. The MAD (mean absolute deviation) is just another name for the MAE.

The MAPE (mean absolute percentage error) is not scale-dependent and is often useful for forecast evaluation. However, it has a number of limitations. For example,

  1. If the data contain zeros, the MAPE can be infinite as it will involve division by zero. If the data contain very small numbers, the MAPE can be huge.
  2. The MAPE assumes that percentages make sense; that is, that the zero on the scale of the data is meaningful. When forecasting widgets, this is OK. But when forecasting temperatures in degrees Celsius or Fahrenheit it makes no sense: the zero on these temperature scales is relatively arbitrary, and so percentages are meaningless.

It is possible to have a percentage version of MSE, the mean squared percentage error, but this isn’t used very often.
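
To make this concrete, here is a small Python sketch (my illustration, not part of the original exchange) showing that the MAE changes with the units of the data while the MAPE does not, and that the MAPE fails when the data contain a zero:

```python
# Illustrative only: MAE is scale-dependent, MAPE is not,
# and MAPE breaks down when an actual value is zero.

def mae(actual, forecast):
    """Mean absolute error, expressed in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error; undefined when an actual is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

dollars  = [100.0, 200.0, 300.0]
fc       = [110.0, 190.0, 330.0]
cents    = [100 * y for y in dollars]   # the same series in different units
fc_cents = [100 * y for y in fc]

mae_dollars, mae_cents = mae(dollars, fc), mae(cents, fc_cents)        # differ by a factor of 100
mape_dollars, mape_cents = mape(dollars, fc), mape(cents, fc_cents)    # identical percentages

# Division by zero when the data contain a zero:
try:
    mape([0.0, 1.0], [0.5, 1.0])
    mape_fails_on_zero = False
except ZeroDivisionError:
    mape_fails_on_zero = True
```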

The MASE (mean absolute scaled error) was intended to avoid these problems.
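
For non-seasonal data, the MASE scales the forecast errors by the in-sample MAE of naive (random-walk) forecasts, so it is unit-free and remains finite even when the data contain zeros. A minimal Python sketch of that definition (my illustration, not code from the post):

```python
# Minimal MASE for non-seasonal data: out-of-sample forecast errors are
# scaled by the in-sample MAE of the one-step naive forecast y[t-1].

def mase(train, actual, forecast):
    # In-sample MAE of the naive forecast (the scaling factor).
    scale = sum(abs(train[t] - train[t - 1])
                for t in range(1, len(train))) / (len(train) - 1)
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return (sum(errors) / len(errors)) / scale

train = [10.0, 12.0, 11.0, 14.0, 13.0]
score = mase(train, actual=[15.0, 16.0], forecast=[14.0, 17.0])
# score < 1 means the forecasts beat the in-sample naive method on average
```

For seasonal data, the scaling factor uses the seasonal naive forecast y[t-m] instead, where m is the seasonal period.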

For further discussion of these and related points, see Hyndman & Koehler (IJF, 2006). A preprint version is also available.

Also, suppose we have a lognormal model, where the estimation is done on the log-transformed scale and the prediction is done on the original, untransformed scale. One could either predict with the conditional mean or the conditional median. It seems to me that you would predict with the mean if the MSE is your metric, but you would predict with the median if the MAD is your metric. My thought is that the mean would minimize MSE, while the median would minimize MAD. So whether you use the mean or the median depends on which metric you use for evaluating the model.

In most cases, the mean and median will coincide on the transformed scale because the transformation should have produced a symmetric error distribution. I would usually estimate by minimizing the MSE because it is more efficient (assuming the errors look normal). Estimating with the MAD can help if there are outliers, but I would prefer to deal with them explicitly.

When forecasting on the original, untransformed scale, the simple thing to do is to back-transform the forecasts (and the prediction interval limits). The point forecasts will then be the conditional median (assuming symmetry on the transformed scale), and the prediction interval will still have the desired coverage.

To get the conditional mean on the original scale, it is necessary to adjust the point forecasts. If X is the variable on the log scale and Y = e^X is the variable on the original scale, then \text{E}(Y) = e^{\mu + \sigma^2/2}, where \mu is the point forecast on the log scale and \sigma^2 is the forecast variance on the log scale. The prediction interval remains unchanged whether you use a conditional mean or conditional median for the point forecast.
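
A quick Python simulation (my illustration, not from the post) confirms the adjustment: draw X from N(mu, sigma^2), back-transform, and compare the sample mean of Y = e^X against e^mu and e^{mu + sigma^2/2}.

```python
import math
import random

# Simulate on the log scale and back-transform to the original scale.
mu, sigma = 2.0, 0.5
random.seed(42)
ys = [math.exp(random.gauss(mu, sigma)) for _ in range(200_000)]

naive    = math.exp(mu)                    # back-transformed point forecast (the median of Y)
adjusted = math.exp(mu + sigma ** 2 / 2)   # E(Y) for a lognormal variable

sample_mean = sum(ys) / len(ys)
# sample_mean lands near `adjusted` (about 8.37), noticeably above
# `naive` (about 7.39): the simple back-transform undershoots the mean.
```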

Occasionally, there may be some reason to prefer a conditional mean point forecast; for example, if you are forecasting a number of related products and you need the point forecasts to sum to give the forecast of the total number of products. But in most situations, the conditional median will be suitable.

In R, the plot.forecast() function (from the forecast package) will back-transform point forecasts and prediction intervals using an inverse Box-Cox transformation. Just include the lambda argument. For example:

fit <- ar(BoxCox(lynx, 0.5))
plot(forecast(fit, h = 20), lambda = 0.5)


  • devi

    Respected Sir,
    I need to know whether nonlinear models are suitable for forecasting or not, and how to identify the nonlinearities in a time series by mutual information.

    • Rob J Hyndman

      Some nonlinear models provide good forecasts for some data sets. Identification of nonlinearity is a complicated topic. See Fan and Yao (2005) for a good survey of the area.

  • devi

    Respected Sir,
    My question is: what is mutual information? (I understood that it will give some idea about linear as well as nonlinear dependencies.) But how can I interpret it in the plot, i.e., how can I differentiate the nonlinear from the linear dependencies?

  • Chewyraver

    Thank you for sharing! When I first started exploring the forecasting world, I was very confused about which loss function to use; it almost seemed that it didn’t matter which one was used. Showing the difference between loss functions and providing a simple method of selection was part of my honours thesis.

  • zbicyclist

    Thanks. This is the clearest explanation of the log adjustment I’ve ever read.

    A bit of humor: when I first ran into this, I saw it written as e^(u+1/2s^2), which is ambiguous. Since at the time I was running a large analytic group (>75 professionals), I went to the specific subgroup that typically did log models and scenarios (more what-if than forecasts) and asked them what the expression meant. A sampling of opinion:

    (1) what?
    (2) (s^2) / 2
    (3) 1 / (2s^2)
    (4) I don’t use that. I compute the average bias ratio above/below the mean on the modeled observations and use those factors to correct the what-if forecasts.

    Answer #4 turned out to work pretty well in practice, and since that answer came from the statistician who’d written the production code, we stayed with that.

  • Dr. Abed

    I do not know why package forecast 2.16 in R does not produce Theil’s U. I really appreciate your efforts.

    • Rob J Hyndman

      It does include it. Use the accuracy() function.

  • Tom Shelton

    Dear Mr. Hyndman

    I am a novice in R but somewhat knowledgeable about forecasting (however, I am not a mathematician). I am attempting to generate forecasts from 3 years of historic traffic data that has strong day-of-week as well as weekly seasonal patterns. I’ve been able to generate reasonable forecasts with the stlf function using a frequency of 52 (weekly), with relatively good MAPE values of 6–16%. However, last week I re-ran some scripts (no script changes) on the same data set that I used for the previous runs several weeks ago. I am now finding that the MAPE in the accuracy output has increased by two decimal places. For instance, a previous run gave me a MAPE of 9.xxxx%; another example is 16.xxxx% from a previous run to 1666.xxxx% for the current run. All the values in both the summary and accuracy output are the same, except the decimal point seems to have shifted in the MAPE. What am I doing wrong, or has there been a change in the forecast package? Are there potentially other programs/packages that could be interfering?

    Thank you

    Tom Shelton/Berlin

    • Rob J Hyndman

      It looks like a bug. In version 4.05 I completely rewrote the accuracy() function. Unfortunately, the MAPE and MPE are now 100 times too large. I’ll fix it in the next version.

      • Tom Shelton

        Thank you very much for your quick reply. When do you think the next version will be released?

        • Rob J Hyndman

          Hopefully today or tomorrow.