The difference between prediction intervals and confidence intervals

Prediction intervals and confidence intervals are not the same thing. Unfortunately the terms are often confused, and I frequently find myself correcting the error in students’ papers and in articles I am reviewing or editing.

A prediction interval is an interval associated with a random variable yet to be observed, with a specified probability of the random variable lying within the interval. For example, I might give an 80% interval for the forecast of GDP in 2014. The actual GDP in 2014 should lie within the interval with probability 0.8. Prediction intervals can arise in Bayesian or frequentist statistics.

A confidence interval is an interval associated with a parameter and is a frequentist concept. The parameter is assumed to be non-random but unknown, and the confidence interval is computed from data. Because the data are random, the interval is random. A 95% confidence interval will contain the true parameter with probability 0.95. That is, with a large number of repeated samples, 95% of the intervals would contain the true parameter.
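The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: it assumes normal data with a known standard deviation (so that a simple z-interval for the mean applies), and the values of `mu`, `sigma`, `n` and `reps` are arbitrary choices rather than anything taken from real data.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(mu=2.0, sigma=1.0, n=25, level=0.95, reps=20000, seed=1):
    """Fraction of repeated-sample z-intervals for the mean that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    half_width = z * sigma / n ** 0.5          # sigma is treated as known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # The interval is random because it depends on the sample; mu is fixed.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

coverage = ci_coverage()
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

The printed proportion should be close to 0.95. Any single interval either contains the true mean or it does not; the 0.95 describes the procedure across repeated samples.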

A Bayesian confidence interval, also known as a “credible interval”, is an interval associated with the posterior distribution of the parameter. In the Bayesian perspective, parameters are treated as random variables, and so have probability distributions. Thus a Bayesian confidence interval is like a prediction interval, but associated with a parameter rather than an observation.

I think the distinction between prediction and confidence intervals is worth preserving because sometimes you want to use both. For example, consider the regression

    \[ y_i = \alpha + \beta x_i + e_i \]

where \(y_i\) is the change in GDP from quarter \(i-1\) to quarter \(i\), \(x_i\) is the change in the unemployment rate from quarter \(i-1\) to quarter \(i\), and \(e_i\sim\text{N}(0,\sigma^2)\). (This regression model is known as Okun’s law in macroeconomics.) In this case, both confidence intervals and prediction intervals are interesting. You might be interested in the confidence interval associated with the mean value of \(y\) when \(x=0\); that is, the mean growth in GDP when the unemployment rate does not change. You might also be interested in the prediction interval for \(y\) when \(x=0\); that is, the likely range of future values of GDP growth when the unemployment rate does not change.
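
For readers who prefer formulas, the two intervals can be written down for a general value \(x_0\) of the predictor. The notation is standard textbook notation rather than anything defined above: \(\hat{\alpha}\) and \(\hat{\beta}\) are the least squares estimates, \(s\) is the residual standard error, \(\bar{x}\) is the sample mean of the \(x_i\), \(n\) is the number of quarters, and \(t_{n-2,\,0.975}\) is the 97.5% quantile of a t distribution with \(n-2\) degrees of freedom. Assuming the model and its normal errors hold, a 95% confidence interval for the mean of \(y\) at \(x_0\) is

    \[ \hat{\alpha} + \hat{\beta} x_0 \;\pm\; t_{n-2,\,0.975}\; s \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \]

and a 95% prediction interval for a new observation of \(y\) at \(x_0\) is

    \[ \hat{\alpha} + \hat{\beta} x_0 \;\pm\; t_{n-2,\,0.975}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \]

The extra 1 under the square root accounts for the variance of the new error term. It is the reason the prediction interval is always wider, and the reason its width does not shrink to zero as \(n\) grows. Setting \(x_0=0\) gives the two intervals described in the text.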

The distinction is mostly retained in the statistics literature. However, in econometrics it is common to use “confidence intervals” for both types of interval (e.g., Granger & Newbold, 1986). I once asked Clive Granger why he confused the two concepts, and he dismissed my objection as fussing about trivialities. I disagreed with him then, and I still do.

I have seen someone compute a confidence interval for the mean, and use it as if it were a prediction interval for a future observation. The trouble is, confidence intervals for the mean are much narrower than prediction intervals, and so this gave him an exaggerated and false sense of the accuracy of his forecasts. Instead of the interval covering 95% of the probability distribution of the future observation, it covered only about 20%.
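To see how a figure of that size can arise, consider the simplest possible setting. The sketch below is not a reconstruction of the case described above, whose details are not given. It assumes normal data with a known standard deviation, and the sample size `n = 59` is a hypothetical value chosen only because it gives a result close to 20%. It computes, exactly and by simulation, the probability that a future observation falls inside a 95% confidence interval for the mean.

```python
import random
from statistics import NormalDist, fmean

def analytic_coverage(n, level=0.95):
    """P(future observation lies inside the level-CI for the mean).

    Assumes normal data with known sigma: the CI is xbar +/- z*sigma/sqrt(n),
    and (future observation - xbar) ~ N(0, sigma^2 * (1 + 1/n)).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * NormalDist().cdf(z / (n + 1) ** 0.5) - 1

def simulated_coverage(n, level=0.95, mu=0.0, sigma=1.0, reps=10000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5          # half-width of the CI for the mean
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        future = rng.gauss(mu, sigma)          # the observation being "forecast"
        if abs(future - xbar) <= half_width:
            hits += 1
    return hits / reps

n = 59  # hypothetical sample size, chosen only so the result lands near 20%
print(f"analytic coverage:  {analytic_coverage(n):.3f}")
print(f"simulated coverage: {simulated_coverage(n):.3f}")
```

Under these assumptions the interval that was meant to cover a future observation 95% of the time covers it only about one time in five, and the shortfall gets worse as the sample size grows.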

So I ask statisticians to please preserve this distinction. And I ask econometricians to stop being so sloppy about terminology. Unfortunately, I can’t continue my debate with Clive Granger. I rather hoped he would come to accept my point of view.

  • molecule61

    It’s not correct to say that model parameters are considered to be random in the Bayesian perspective; they are considered to be unknown. The probability distribution for the parameter is a measure of your uncertainty about its fixed value.

    • Rob J Hyndman

      Yes, but it is correct to say they are “treated as random variables”.

  • Eran

    Is it the case that there is a one-to-one mapping between PI and CI?
    (For example, PI = CI+std*1, when symmetry is assumed)
    If so, that might be an additional reason for the confusion.

    • Rob J Hyndman

      Maybe. But there is a one-to-one mapping between variance and standard deviation too, and nobody confuses them.

  • Hrodebert

    The statement “A 95% confidence interval will contain the true parameter with probability 0.95.” might be misunderstood, because the true parameter either falls into a given interval or it does not. But if the interval looks like a - T < param < a + T, where T is a statistic, you are absolutely right.

    • Rob J Hyndman

      If you read the next sentence, I don’t think it can be misunderstood.

  • Rob J Hyndman

    Read the following sentence. The CI is random because it is based on the data. The probability coverage occurs with repeated sampling.

  • zbicyclist

    I agree this is an important distinction. I agree that students have a lot of trouble remembering which is which.

    I think the terminology is to blame. Aren’t they both confidence intervals, just confidence about different things? So we might call them “Individual Prediction confidence interval” and “General Prediction confidence interval”, although I’m not terribly happy with that exact phrasing.

    • d0ubs

      I totally agree with you, they are both CIs but for different things. One is for the mean of the dependent variable and the other is for the dependent variable itself.

      Also, the article is a bit confusing in implying that one difference between the two concepts is that a prediction interval is used for future value(s). That is somewhat misleading: you can just as well compute an interval for the future mean value of the dependent variable as you can compute an interval for the value of the dependent variable conditioned on an observed value of the independent variable (or, for instance, at the sample mean of the independent variable).

  • mark

    I have a question that doesn’t really correspond to the topic above. I estimated ARIMA coefficients using the auto.arima() function on 250 observations and produced forecasts. Now I have added new observations, and I want to use this particular model and its coefficients to forecast from the 251st observation onwards. What should I do?

    • Rob J Hyndman

      Use the model argument in forecast.Arima().

      • marek

        OK, thank you, but then won’t I keep only the structure of the ARIMA model (number of parameters), while the coefficient values change?

        • Rob J Hyndman

          No. As I have already said, it applies the model to new data *without changing the coefficients*.

      • mark

        Now I understand, sorry. I found the documentation of the forecast package. Thank you!

  • Johnno

    Thanks for this! I have a practical question that’s related to this. I have some time series data that I’m using to create a multiplicative HW forecast, and I want to create a PI around the 12-month look-ahead forecast. So I was thinking about going into my time series and, for a period in it, creating some 12-month look-ahead forecasts and using the empirical distribution of the errors between them and the actuals to generate a PI.

    As it relates to the PI/CI discussion above, I was reading about making bootstrap CIs, but since what I want is a PI, maybe that approach doesn’t work. Or does it?

    Secondly, just generally, is there an approach that uses the empirical distribution of forecast errors to construct PIs?

    Johnno K.

    • Rob J Hyndman

      Yes, you can do that. But you generally won’t have enough data to get a good estimate. Usually, a better approach is to use the modelling framework for HW. If you are using R, use the ets() function in the forecast package with model="MAM".

  • Ken

    It is odd that this is something that is not usually covered in a first year stats course, but rather in second year for linear regression. Covering it in first year for means would help in clarifying the difference between standard deviation and standard error, and then make it easier to cover for regression.

  • Rajib Sarkar

    Thank you, Prof. Hyndman! This is the first time I have understood the distinction between PI and CI clearly. Many thanks, indeed!!

  • Wei1

    Thanks! So in the above example, “the mean growth in GDP when the unemployment rate does not change”: here, does the mean growth mean the total GDP distribution?
    And how is the prediction interval computed in the forecast function of the forecast package? Do you use the residuals’ variance to estimate the variance of the forecast data?

    • Rob J Hyndman

      No. The mean growth in GDP means the average quarterly change in GDP.

      Prediction intervals depend on the model. The forecast function computes them using the theoretical variance of the forecast distribution. For a one-step time series forecast, that is equal to the residual variance. But for other steps, and for regression models, the forecast variance is not the same as the residual variance.

      • Wei1

        But the residual variance is used to estimate the variance of the forecast distribution. If my data has frequency=7 days and I want to forecast, for example, the 15th day’s data, should I only consider the variance of the 1st, 8th and 14th data? Thanks!

        • Rob J Hyndman

          No. It estimates the one-step forecast variance for time series. For multi-step or cross-sectional forecasts, the residual variance is NOT equal to the forecast variance, as I’ve already explained. Your second question does not make sense to me. The forecast variance does not depend directly on the variance of any particular days.

      • Wei1

        And do you assume the distribution is normal when computing the prediction interval in the forecast function? Thanks!

        • Rob J Hyndman

          Yes, usually. But some functions have a bootstrap argument, and then no distributional assumption is made.

  • Brian

    This post would have been much better if you fleshed out the distinction in an example.

  • hk

    Although a “prediction interval is an interval associated with a random variable yet to be observed”, when you cross-validate prediction intervals on some data, the future values in the test data are already available. So in this case the two notions should/can be compared. Could you please elaborate on this? For example, how is it possible to show that a prediction method provides better (not necessarily narrower) confidence intervals? For instance, showing that ets() gives better prediction intervals than meanf().

    • hk

      What about “Empirical Prediction Intervals”?

  • Laura Poole

    So when using the forecast package to perform ARIMA analysis,
    can you change the CI?
    I want confidence intervals other than 80% and 95% but cannot figure out how to change them.

    • Rob J Hyndman

      Use the argument level.

  • Rob J Hyndman

    A credible interval is a Bayesian version of a confidence interval. Your first interval is a credible interval. I don’t know what you mean by “parameters’ posterior predictive distributions”. Presumably if you are referring to the distribution of a parameter, it is a credible interval. A prediction interval refers to the distribution of an unobserved data value.

  • SAN

    Do you discuss “The difference between prediction intervals and confidence intervals” in any journal/book? I want to cite it in my manuscript.

    • Rob J Hyndman

      No, but you should be able to cite a website.

  • SAN

    I have a set of simulation data. The prediction interval is calculated from the simulation data, and is then used to predict the real data from an experiment. Is that correct?

  • SAN

    Can I ask why the prediction interval needs to add 1 to the confidence interval?

  • Rizwan

    How are confidence intervals related to the confidence band (in a nonlinear regression problem)? I understand that the term confidence interval is reserved for the parameters involved in a regression problem, and that the confidence band encloses the area that one is confident contains the best-fit curve. Suppose there are two nonlinear parameters with confidence intervals a1 <= a <= a2 and b1 <= b <= b2, obtained through an asymptotic analysis (related to the variance-covariance matrix), and y = f(x;a,b) is the fitted function. If the lower and upper limits of all the parameters are used in the best-fit function, giving y1 = f(x;a1,b1) and y2 = f(x;a2,b2), and these are plotted, does this plot relate to the confidence band? Would the strip generated using the lower and upper limits of the confidence intervals equal the confidence band? I guess the answer is no, but I am not sure why. Could you please explain and highlight the differences?

  • Rizwan

    Could you please explain the difference between confidence intervals and the confidence bound? Can a confidence bound for the best-fit curve be obtained from the lower and upper limits of the confidence intervals of the parameters? Does it make sense to compute the confidence intervals using an asymptotic technique, by computing the variance-covariance matrix, and then use their lower and upper limits to trace the function and call the resulting region a confidence band?

  • konstantinweixelbaum

    Could you maybe give an example of how to calculate the Bayesian prediction interval? Maybe with an easy set of data? Looking through the internet I couldn’t really find a definition or example. Thanks!

  • Nicolas

    Really, statistics should get rid of that “I describe in words the operations I do” approach.
    Formulas. Modern math. Not being stuck in a 16th-century kind of approach.
    Descartes is the new black.

    The distinction IS a triviality ONCE the corresponding equation is written down.
    Before that, it is just another bloody case of bad science badly explained.

  • Liang-Cheng Zhang

    Thank you, Prof. Hyndman. This clarifies many things. I have a further question. In economics, economic estimates are usually calculated as combinations of predictions. Take economies of scope, for instance. Once the cost function is estimated, economies of scope are estimated as the proportion of cost savings from joint production relative to fully integrated costs. These costs are conditional expectations given that the cost function coefficients equal some fixed constants. Could you tell me whether the interval for this estimate (economies of scope) is a confidence interval or a prediction interval? Thanks for your time.

    • Rob J Hyndman

      You can compute either, depending on whether you want to allow for uncertainty in the estimate only, or whether you also want to allow for the observational uncertainty in the future.

  • 王蒙

    How can I give the prediction interval using the HoltWinters model? I am writing code to implement the HoltWinters model, but I cannot produce the prediction interval.

    • Rob J Hyndman

      See my 2008 Springer book, chapter 6.

      • 王蒙

        The parameters alpha, beta and gamma are trained on the training data set. But different sizes of data set give different parameters (for example, I can choose the previous two months of data to train the model, or I can choose the previous three months of data; however, the two resulting models are not the same). How should I choose the size of the training data set?
        To handle this problem, I guessed that the parameter sequence might converge as the size of the data set grows. I ran an experiment to test this hypothesis; however, the sequence did not converge.