A blog by Rob J Hyndman 

Benchmarks for forecasting

Published on 25 August 2010

Every week I reject papers submitted to the International Journal of Forecasting because they present new methods without ever attempting to demonstrate that the new methods are better than existing methods. It is a policy of the journal that every new method must be compared to standard benchmarks and existing methods before the paper will even be considered for publication.

For univariate time series methods, it is not difficult. As a minimum, comparisons should be made against a naïve method and a standard method such as an ARIMA model.

  1. The naïve method for non-seasonal data is based on a random walk: all forecasts are equal to the last observation. For seasonal data, the best naïve method is to use the last observation from the same season. That is, for monthly data, forecasts for February are all equal to the last February observation.
  2. Comparisons with ARIMA models used to be problematic because some authors did not have sufficient expertise to fit a good ARIMA model, and so comparisons were sometimes made, for example, against a non-seasonal AR model when the data were obviously seasonal. This should no longer be a problem as there are now good automatic ARIMA algorithms such as auto.arima() in the forecast package for R.
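
As a rough sketch of the two naïve benchmarks in item 1, here is a plain Python version. The function names and the quarterly data are made up for illustration. This does not attempt to reproduce auto.arima(), which needs the R package itself.

```python
from typing import List, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Random-walk benchmark: every forecast equals the last observation."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def seasonal_naive_forecast(
    history: Sequence[float], horizon: int, period: int
) -> List[float]:
    """Seasonal benchmark: each forecast equals the last observation
    from the same season (e.g. period=12 for monthly data)."""
    if len(history) < period:
        raise ValueError("need at least one full seasonal cycle of history")
    last_cycle = list(history[-period:])
    # Forecast step h + 1 reuses position (h mod period) of the last cycle.
    return [last_cycle[h % period] for h in range(horizon)]


# Two years of made-up quarterly data (period = 4).
quarterly = [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0]
print(naive_forecast(quarterly, 3))              # [42.0, 42.0, 42.0]
print(seasonal_naive_forecast(quarterly, 6, 4))  # [12.0, 22.0, 32.0, 42.0, 12.0, 22.0]
```

Any proposed method should at least be compared against forecasts like these on the same hold-out period.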

For multivariate time series, the same univariate benchmarks can be used.

For methods involving covariates, a standard linear regression can often provide a basic benchmark. Authors sometimes argue that linear regression is not appropriate for their data (e.g., because of non-linear relationships or correlations), but that is not the point. I don’t care if the linear regression is appropriate; I just want them to be able to show that their method provides better predictions than a standard and simple benchmark. If the new method can’t beat a simple standard regression, especially when that regression is inappropriate for the data, there is not much point proceeding.
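
A minimal sketch of such a regression benchmark follows, assuming a single covariate and mean absolute error as the accuracy measure (both are choices made here for illustration). The data and the `candidate_pred` values are hypothetical stand-ins for the output of a new method.

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x for one covariate."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up training data (exactly y = 1 + 2x) and a made-up test set.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
test_x = [5.0, 6.0]
test_y = [11.5, 12.5]

intercept, slope = fit_simple_ols(train_x, train_y)
benchmark_pred = [intercept + slope * x for x in test_x]
candidate_pred = [11.4, 12.7]  # hypothetical predictions from the new method

print("regression benchmark MAE:", mean_absolute_error(test_y, benchmark_pred))
print("candidate method MAE:    ", mean_absolute_error(test_y, candidate_pred))
```

The benchmark is fitted on the training data only and both sets of predictions are scored on the same test observations.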

The best benchmarks are those that are already published. For example, new univariate time series methods can be compared with the M-competition or M3 competition data where there are already published evaluations on large numbers of observations. In this case, authors do not even have to implement the benchmarks themselves. All they have to do is use the same test sets and compare their MAPE or sMAPE values with those published for other methods.
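
For reference, here is a small sketch of the two error measures. It assumes the usual MAPE and one common form of sMAPE, where the absolute error is divided by the average of the absolute actual and forecast values. Several sMAPE variants exist, so check the definition used in the published results you are comparing against. The numbers are made up.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE, in percent (assumed variant: the denominator is
    the average of |actual| and |forecast|)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        if denom == 0:
            raise ValueError("sMAPE is undefined when actual and forecast are both zero")
        total += abs(a - f) / denom
    return 100.0 * total / len(actual)


# Made-up hold-out values and forecasts.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(round(mape(actual, forecast), 3))
print(round(smape(actual, forecast), 3))
```

The comparison with published figures is only meaningful if the same test sets, forecast horizons and error definition are used.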

Just beating the benchmarks is not, of itself, justification for publication, but it helps. It is also necessary to be able to describe your new method in enough detail and clarity that others could implement it. It is usually also necessary to show that the method works on more than one data set. It is relatively easy to find a method that outperforms the benchmarks on a single data set; but that is no reason to think it will be useful on other data sets. The M-competitions are useful as they provide a large set of data for comparisons. If a method does well on 1001 or 3003 time series, then I know it is not a fluke.

Similarly, not being able to beat the benchmarks does not, of itself, mean the paper is dead. It may be that the new method is not far behind the benchmarks but has other advantages. Or the new method may be particularly good in some circumstances or for a small subset of problems.

The job of the author is to carefully and persuasively present the case for their proposed method. As an editor, I am looking for authors to convince me of the value of their ideas. Papers proposing new forecasting methods must include comparisons with standard benchmarks, and should involve large scale empirical evaluations.


3 Comments
  • http://microsolar.wordpress.com Kanti Mohan Pandit

    I have read your advice that authors must convince you of the value of new and proposed forecasting methods, and that papers must include comparisons with standard benchmarks. If an author is not conversant with the other standard forecasting methods but has invented an innovative method of his own, how can he justify the comparisons? He can of course present large scale empirical evidence to prove his theory with results. I feel your suggestion cannot be carried out in this circumstance. Kindly relax this requirement and let the new author prove his caliber without comparison.
    With kind Regards
    Kanti Mohan Pandit

    • http://robjhyndman.com Rob J Hyndman

      Large scale empirical studies are only useful when a benchmark is included for comparison. Otherwise how can you argue that your new method is useful? An important part of research is being familiar with previous literature on the subject, and that includes being familiar with the standard methods that are used and the relevant benchmarks that are appropriate.

  • Chewyraver

    I fully agree with this. I don’t do many comparisons, but I do always do some, even if it is only the random walk model, for the reasons you list here.