Errors on percentage errors

The MAPE (mean absolute per­cent­age error) is a pop­u­lar mea­sure for fore­cast accu­racy and is defined as

    \[\text{MAPE} = 100\text{mean}(|y_t - \hat{y}_t|/|y_t|)\]

where y_t denotes an obser­va­tion and \hat{y}_t denotes its fore­cast, and the mean is taken over t.
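
A minimal Python sketch of this definition may help fix ideas. The function name `mape` and the plain-list inputs are assumptions made for illustration; the sketch deliberately does nothing about zero actuals, for which the measure is undefined.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error: 100 * mean(|y_t - yhat_t| / |y_t|)."""
    if len(actuals) != len(forecasts) or len(actuals) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # A zero actual raises ZeroDivisionError, since the ratio is undefined there.
    ratios = [abs(y - f) / abs(y) for y, f in zip(actuals, forecasts)]
    return 100 * sum(ratios) / len(ratios)


# Errors of 10%, 10% and 0% average to about 6.67%.
print(round(mape([100, 200, 400], [110, 180, 400]), 2))
```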

Arm­strong (1985, p.348) was the first (to my knowl­edge) to point out the asym­me­try of the MAPE say­ing that “it has a bias favor­ing esti­mates that are below the actual val­ues”. A few years later, Arm­strong and Col­lopy (1992) argued that the MAPE “puts a heav­ier penalty on fore­casts that exceed the actual than those that are less than the actual”. Makri­dakis (1993) took up the argu­ment say­ing that “equal errors above the actual value result in a greater APE than those below the actual value”. He pro­vided an exam­ple where y_t=150 and \hat{y}_t=100, so that the rel­a­tive error is 50÷150=0.33, in con­trast to the sit­u­a­tion where y_t=100 and \hat{y}_t=150, when the rel­a­tive error would be 50÷100=0.50.

Thus, the MAPE puts a heav­ier penalty on neg­a­tive errors (when y_t < \hat{y}_t) than on pos­i­tive errors. This is what is stated in my text­book. Unfor­tu­nately, Anne Koehler and I got it the wrong way around in our 2006 paper on mea­sures of fore­cast accu­racy, where we said the heav­ier penalty was on pos­i­tive errors. We were prob­a­bly think­ing that a fore­cast that is too large is a pos­i­tive error. How­ever, fore­cast errors are defined as y_t - \hat{y}_t, so pos­i­tive errors arise only when the fore­cast is too small.
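
Makridakis's example and the sign convention can be checked in a few lines. This is only an illustrative Python sketch; the helper names `error` and `ape` are made up here and do not come from any of the cited papers.

```python
def error(actual, forecast):
    """Forecast error e_t = y_t - yhat_t (positive when the forecast is too small)."""
    return actual - forecast


def ape(actual, forecast):
    """Absolute percentage error as a proportion: |y_t - yhat_t| / |y_t|."""
    return abs(error(actual, forecast)) / abs(actual)


# Forecast too small (y_t=150, yhat_t=100): positive error, relative error 50/150.
print(error(150, 100), round(ape(150, 100), 2))
# Forecast too large (y_t=100, yhat_t=150): negative error, relative error 50/100.
print(error(100, 150), round(ape(100, 150), 2))
```

The same absolute error of 50 gives the larger relative error when the error is negative, that is, when the forecast is too large.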

To avoid the asym­me­try of the MAPE, Arm­strong (1985, p.348) pro­posed the “adjusted MAPE”, which he defined as

    \[\overline{\text{MAPE}} = 100\text{mean}(2|y_t - \hat{y}_t|/(y_t + \hat{y}_t))\]

By that def­i­n­i­tion, the adjusted MAPE can be neg­a­tive (if y_t+\hat{y}_t < 0), or infi­nite (if y_t+\hat{y}_t=0), although Arm­strong claims that it has a range of (0,200). Pre­sum­ably he never imag­ined that data and fore­casts can take neg­a­tive val­ues. Strangely, there is no ref­er­ence to this mea­sure in Arm­strong and Col­lopy (1992).
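
To see how this formula behaves once negative values are allowed, here is a hedged Python sketch of the adjusted MAPE exactly as written above. The function name `adjusted_mape` and the choice to report a zero denominator as `math.inf` are assumptions made for illustration.

```python
import math


def adjusted_mape(actuals, forecasts):
    """Adjusted MAPE as written above: 100 * mean(2|y_t - yhat_t| / (y_t + yhat_t))."""
    terms = []
    for y, f in zip(actuals, forecasts):
        denom = y + f
        if denom == 0:
            # Assumption: treat the non-finite ratio as +infinity.
            terms.append(math.inf)
        else:
            terms.append(2 * abs(y - f) / denom)
    return 100 * sum(terms) / len(terms)


print(adjusted_mape([100], [150]))  # positive data and forecast: inside (0, 200)
print(adjusted_mape([10], [-30]))   # y_t + yhat_t < 0: negative value
print(adjusted_mape([2], [-2]))     # y_t + yhat_t = 0: infinite
```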

Makridakis (1993) proposed almost the same measure, calling it the “symmetric MAPE” (sMAPE), but without crediting Armstrong (1985), defining it as

    \[\text{sMAPE} = 100\text{mean}(2|y_t - \hat{y}_t|/|y_t + \hat{y}_t|)\]

How­ever, in the M3 com­pe­ti­tion paper by Makri­dakis and Hibon (2000), sMAPE is defined equiv­a­lently to Armstrong’s adjusted MAPE (with­out the absolute val­ues in the denom­i­na­tor), again with­out ref­er­ence to Arm­strong (1985). Makri­dakis and Hibon claim that this ver­sion of sMAPE has a range of (-200,200).

Flo­res (1986) pro­posed a mod­i­fied ver­sion of Armstrong’s mea­sure, defined as exactly half of the adjusted MAPE defined above. He claimed (again incor­rectly) that it had an upper bound of 100.

Of course, the true range of the adjusted MAPE is (-\infty,\infty) as is eas­ily seen by con­sid­er­ing the two cases y_t+\hat{y}_t = \varepsilon and y_t+\hat{y}_t = -\varepsilon, where \varepsilon>0, and let­ting \varepsilon\rightarrow0. Sim­i­larly, the true range of the sMAPE defined by Makri­dakis (1993) is (0,\infty). I’m not sure that these errors have pre­vi­ously been doc­u­mented, although they have surely been noticed.
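
One concrete way to see this (the particular values are chosen here purely for illustration) is to fix y_t = 1 and take \hat{y}_t = -1+\varepsilon in the first case and \hat{y}_t = -1-\varepsilon in the second, with 0 < \varepsilon < 2. The corresponding terms of the adjusted MAPE are then

    \[\frac{2|y_t - \hat{y}_t|}{y_t + \hat{y}_t} = \frac{2(2-\varepsilon)}{\varepsilon} \rightarrow +\infty \quad\text{and}\quad \frac{2|y_t - \hat{y}_t|}{y_t + \hat{y}_t} = \frac{2(2+\varepsilon)}{-\varepsilon} \rightarrow -\infty \quad\text{as } \varepsilon\rightarrow0.\]

With the absolute value in the denominator, as in Makridakis (1993), both terms are positive and both diverge to +\infty.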

Good­win and Law­ton (1999) point out that on a per­cent­age scale, the MAPE is sym­met­ric and the sMAPE is asym­met­ric. For exam­ple, if y_t =100, then \hat{y}_t=110 gives a 10% error, as does \hat{y}_t=90. Either would con­tribute the same incre­ment to MAPE, but a dif­fer­ent incre­ment to sMAPE.
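
The example is easy to reproduce. The following Python sketch uses hypothetical helper names (`ape_percent`, `sape_percent`) and the Makridakis (1993) form of the sMAPE term; it is an illustration rather than code from any of the papers.

```python
def ape_percent(actual, forecast):
    """Per-observation MAPE term, in percent."""
    return 100 * abs(actual - forecast) / abs(actual)


def sape_percent(actual, forecast):
    """Per-observation sMAPE term (Makridakis 1993 form), in percent."""
    return 100 * 2 * abs(actual - forecast) / abs(actual + forecast)


for forecast in (110, 90):
    print(forecast, ape_percent(100, forecast), round(sape_percent(100, forecast), 2))
```

Both forecasts contribute 10 to the MAPE, while the sMAPE term is smaller for the forecast of 110 than for the forecast of 90.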

Anne Koehler (2001), in a commentary on the M3 competition, made the same point, but without reference to Goodwin and Lawton.

Whether sym­me­try mat­ters or not, and whether we want to work on a per­cent­age or absolute scale, depends entirely on the prob­lem, so these dis­cus­sions over (a)symmetry don’t seem par­tic­u­larly use­ful to me.

Chen and Yang (2004), in an unpub­lished work­ing paper, defined the sMAPE as

    \[\text{sMAPE} = \text{mean}(2|y_t - \hat{y}_t|/(|y_t| + |\hat{y}_t|)).\]

They still called it a mea­sure of “per­cent­age error” even though they dropped the mul­ti­plier 100. At least they got the range cor­rect, stat­ing that this mea­sure has a max­i­mum value of two when either y_t or \hat{y}_t is zero, but is unde­fined when both are zero. The range of this ver­sion of sMAPE is (0,2). Per­haps this is the def­i­n­i­tion that Makri­dakis and Arm­strong intended all along, although nei­ther has ever man­aged to include it cor­rectly in one of their papers or books.
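
A short Python sketch of this last definition shows the boundary behaviour. The function name `smape_chen_yang` and the decision to raise an exception for the undefined case are assumptions made here, not something taken from the working paper.

```python
def smape_chen_yang(actuals, forecasts):
    """mean(2|y_t - yhat_t| / (|y_t| + |yhat_t|)), without the multiplier 100."""
    terms = []
    for y, f in zip(actuals, forecasts):
        denom = abs(y) + abs(f)
        if denom == 0:
            # Both the observation and the forecast are zero: undefined.
            raise ValueError("undefined when y_t and yhat_t are both zero")
        terms.append(2 * abs(y - f) / denom)
    return sum(terms) / len(terms)


print(smape_chen_yang([0], [5]))   # zero actual: the maximum value
print(smape_chen_yang([5], [0]))   # zero forecast: the maximum value
print(smape_chen_yang([4], [4]))   # perfect forecast: zero
```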

As will be clear by now, the lit­er­a­ture on this topic is lit­tered with errors. The Wikipedia page on sMAPE con­tains sev­eral as well, which a reader might like to correct.

If all data and forecasts are non-negative, then the same values are obtained from all three definitions of sMAPE. But more generally, the last definition above from Chen and Yang is clearly the most sensible, if the sMAPE is to be used at all. In the M3 competition, all data were positive, but some forecasts were negative, so the differences are important. However, I can’t match the published results for any definition of sMAPE, so I’m not sure how the calculations were actually done.
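
The agreement and disagreement between the three definitions can be sketched in Python as follows. The function names are hypothetical, the numbers are made up for illustration (they are not M3 data), and the Chen and Yang version is multiplied by 100 here only so that all three are on the same scale.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def smape_armstrong(y, f):
    """Denominator y_t + yhat_t (Armstrong 1985; Makridakis and Hibon 2000)."""
    return 100 * _mean(2 * abs(a - b) / (a + b) for a, b in zip(y, f))


def smape_makridakis(y, f):
    """Denominator |y_t + yhat_t| (Makridakis 1993)."""
    return 100 * _mean(2 * abs(a - b) / abs(a + b) for a, b in zip(y, f))


def smape_chen_yang(y, f):
    """Denominator |y_t| + |yhat_t| (Chen and Yang 2004), rescaled by 100 here."""
    return 100 * _mean(2 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, f))


definitions = (smape_armstrong, smape_makridakis, smape_chen_yang)
# Non-negative data and forecasts: all three agree.
nonneg = [d([10, 20], [12, 18]) for d in definitions]
# Positive actual, negative forecast: all three differ.
mixed = [d([10], [-30]) for d in definitions]
print(nonneg)
print(mixed)
```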

Per­son­ally, I would much pre­fer that either the orig­i­nal MAPE be used (when it makes sense), or the mean absolute scaled error (MASE) be used instead. There seems lit­tle point using the sMAPE except that it makes it easy to com­pare the per­for­mance of a new fore­cast­ing algo­rithm against the pub­lished M3 results. But even there, it is not nec­es­sary, as the fore­casts sub­mit­ted to the M3 com­pe­ti­tion are all avail­able in the Mcomp pack­age for R, so a com­par­i­son can eas­ily be made using what­ever mea­sure you prefer.
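
The MASE is not spelled out in this post. As a hedged sketch, assuming the usual non-seasonal form (out-of-sample mean absolute error divided by the in-sample mean absolute error of one-step naïve forecasts), it could be computed in Python like this. The function name `mase` and the example numbers are illustrative only.

```python
def mase(train, actuals, forecasts):
    """Mean absolute scaled error, non-seasonal form (an assumed definition)."""
    if len(train) < 2:
        raise ValueError("need at least two training observations")
    # In-sample MAE of the naive forecast yhat_t = y_{t-1}.
    naive_errors = [abs(b - a) for a, b in zip(train, train[1:])]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("training series is constant, so the scale is zero")
    errors = [abs(y - f) for y, f in zip(actuals, forecasts)]
    return (sum(errors) / len(errors)) / scale


# In-sample naive MAE is 5/3 and out-of-sample MAE is 1.5, so the ratio is 0.9.
print(round(mase([10, 12, 11, 13], [14, 15], [13, 13]), 3))
```

Values below one indicate forecasts that are, on average, more accurate than the in-sample one-step naïve forecasts.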

Thanks to Andrey Kostenko for alert­ing me to the dif­fer­ent def­i­n­i­tions of sMAPE in the lit­er­a­ture.

  • Matt

    I’d like a bet­ter under­stand­ing of how the heav­ier penalty MAPE puts on over fore­cast­ing is rel­e­vant for fore­cast eval­u­a­tion and model selection.

    In some sense, I don’t see the asymmetry: if we hold the actual value fixed, MAPE for over forecasting and under forecasting of the same absolute magnitude will be the same. E.g. for actual value 100, forecasts of 50 and 150 give equivalent MAPE (50%). Doesn’t this imply that given an expected value for the actual observation of the forecast horizon, MAPE treats over and under forecasting equally whenever the magnitude of forecast error is the same?

    We only get the asym­me­try, it seems, if we hold the mag­ni­tude of fore­cast error the same and vary the expected value for the actu­als, which doesn’t seem prac­ti­cally relevant.

    It’s not true, in other words, that you can “cheat” by low-balling a forecast in order to improve forecast MAPE; as long as that’s the case, what is the problem with using it, as it’s not going to favor models that under forecast over those that over forecast? (I’m assuming here that we don’t need to worry about intermittent demand.)

    Any direc­tion here would be most appre­ci­ated; your blog has been an invalu­able resource in my busi­ness fore­cast­ing education.

    • Rob J Hyndman

      I agree that it makes more sense to consider the case where the actual stays the same and the forecasts vary, because we can’t change actuals; we can only change forecasts.

      • Matt

        Thanks, good to get some clar­ity here. It would be a shame to avoid a sim­ple met­ric like MAPE based on a mis­un­der­stand­ing. MASE is help­ful too, though in some cases one won’t have a naïve fore­cast to work with (e.g. for the first period of a new product’s sales).

  • Matt

    I should add (and this is from your Armstrong reference) that it’s true that under forecasting has a maximum MAPE of 100% (in the case where the forecast is always zero), whereas over forecasting has no upper bound; this is assuming that the forecast is always positive, of course. This still seems to have limited significance to the question of whether one should use MAPE in assessing forecasts, provided that zero forecasts are not common in practice.

    • Rob J Hyndman

      It’s zero (or very small) actuals that are the issue, not zero forecasts. They come up a lot, e.g., if you are trying to predict stock returns.

      • Matt

        Absolutely right, that was a slip on my part.

  • Chad Scher­rer

    For most applications of this, the values are positive, and it makes sense to either use a model with a log link (as in a GLM) or to just log-transform the response. So is there any reason to prefer MAPE over some statistic (MSE or MAE, perhaps) of the residuals on the log scale? If the big deal is having them as percentages, I guess you could do something weird like use a base 1.01 for the log. Still seems more sensible and less arbitrary than MAPE, which has no connection to the loss function of any model I’ve ever seen.

  • edyhsgr

    Why is MAPE typ­i­cally used instead of Median Absolute Per­cent Error? Is MAPE better?

  • Luis

    Hi Rob, I would like to know if the func­tion accu­racy() works with bats(). I’m try­ing to use it but I got some errors.

  • Simon

    I am no mathematician, but some time ago, in wrestling with this problem, I modified this statistic to nMAPE (for Normalised MAPE), where the divisor becomes the maximum of Actual and Forecast.

    • Adam

      I recently started thinking about doing this as well. From what I can tell, this is also symmetric (using the example above, abs(150-100)/150 = 0.33 and abs(100-150)/150 = 0.33), and what I like about it is that it is bounded between (0,1), or (0,100) if you multiply by 100 (for positive measurements, as in my use case). For me this is an intuitive bound for error. What has been your experience with this? Is there any literature to support this?

  • cmos

    In the original paper by Makridakis and also in the M-3 paper the denominator of the sMAPE is multiplied by 2 whereas in your blog post the numerator is multiplied by 2. Additionally, (Makridakis 1993) nowhere mentions the term “sMAPE”. This term is only used in the M-3 paper.

    • Rob J Hyndman

      1. No it isn’t. In the two papers you men­tion, the denom­i­na­tor is DIVIDED by 2 which is equiv­a­lent to mul­ti­ply­ing the numer­a­tor by 2.
      2. Yes, Makri­dakis didn’t use the acronym “sMAPE” in 1993. That came later.

      • cmos

        You’re right. I read that wrong.