Estimating a nonlinear time series model in R

There are quite a few R packages available for nonlinear time series analysis, but sometimes you need to code your own models. Here is a simple example to show how it can be done.

The model is a first order threshold autoregression:

    \[y_t = \begin{cases} \alpha y_{t-1} + e_t & \text{if $y_{t-1} \le r$}\\ \beta y_{t-1} + \gamma e_t & \text{if $y_{t-1}>r$} \end{cases}\]

where $e_t$ is a Gaussian white noise series with variance $\sigma^2$. The following function will generate some random data from this model.

simnlts <- function(n, alpha, beta, r, sigma, gamma, burnin=100)
{
  # Generate noise
  e <- rnorm(n+burnin, 0, sigma)
  # Create space for y
  y <- numeric(n+burnin)
  # Generate time series
  for(i in 2:(n+burnin))
  {
    if(y[i-1] <= r)
      y[i] <- alpha*y[i-1] + e[i]
    else
      y[i] <- beta*y[i-1] + gamma*e[i]
  }
  # Throw away first burnin values
  y <- ts(y[-(1:burnin)])
  # Return result
  return(y)
}
The “burn-in” parameter allows the first value of the series to be a random draw from the stationary distribution of the process (assuming it is stationary).

We will estimate the model by minimizing the sum of squared errors across both regimes. Other approaches that take account of the different variances in the two regimes may be better, but it is useful to keep it simple, at least initially. The following function uses a nonlinear optimizer to estimate $\alpha$, $\beta$ and $r$. To ensure good starting values, we begin the optimization with $\alpha=\beta=0$ and set the initial value of $r$ to be the mean of the observed data.

fitnlts <- function(x)
{
  ss <- function(par,x)
  {
    alpha <- par[1]
    beta <- par[2]
    r <- par[3]
    n <- length(x)
    # Check that each regime has at least 10% of observations
    if(sum(x<=r) < n/10 | sum(x>r) < n/10)
      return(1e20)  # Penalty to steer the optimizer away
    e1 <- x[2:n] - alpha*x[1:(n-1)]
    e2 <- x[2:n] - beta*x[1:(n-1)]
    regime1 <- (x[1:(n-1)] <= r)
    e <- e1*(regime1) + e2*(!regime1)
    return(sum(e^2))
  }
  fit <- optim(c(0,0,mean(x)), ss, x=x, control=list(maxit=1000))
  if(fit$convergence > 0)  # Not converged
    return(rep(NA,3))
  else
    return(fit$par)  # (alpha, beta, r)
}

There are a few things to notice here.

  • I have used a function inside a function. The ss function computes the sum of squared errors. It would have been possible to define this outside fitnlts, but as I don’t need it for any other purpose it is neater to define it locally.
  • Occasionally the optimizer will not converge, and then fitnlts will return missing values.
  • I have avoided the use of a loop by computing the errors in both regimes. This is much faster than the alternatives.
  • If $r$ is outside the range of the data, then either $\alpha$ or $\beta$ will be unidentifiable. To avoid this problem, I force $r$ to be such that each regime must contain at least 10% of the data. This is an arbitrary value, but it makes the estimation more stable.
  • The function being optimized is not continuous in the $r$ direction. This will cause gradient-based optimizers to fail, but the Nelder-Mead optimizer used here is relatively robust to such problems.
  • This particular model can be more efficiently estimated by exploiting the conditional linearity. For example, we could loop over a grid of $r$ values (e.g., at the mid-points between consecutive ordered observations) and use linear regression for the other parameters. But the above approach is more general and can be adapted to other nonlinear time series models.
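The grid-search idea in the last point can be sketched as follows. This is a minimal illustration rather than code from the post; the function name `fitnlts_grid` and the 10% trimming rule for candidate thresholds are my own choices.

```r
fitnlts_grid <- function(x)
{
  n <- length(x)
  xlag <- x[1:(n-1)]
  xnow <- x[2:n]
  # Candidate thresholds: mid-points between consecutive ordered lagged values,
  # trimmed so that each regime keeps roughly 10% of the observations
  xs <- sort(xlag)
  mid <- (xs[-1] + xs[-length(xs)])/2
  mid <- mid[mid > quantile(xlag, 0.1) & mid < quantile(xlag, 0.9)]
  best <- list(sse = Inf)
  for(r in mid)
  {
    regime1 <- (xlag <= r)
    # Conditional on r, each regime is a linear regression through the origin,
    # so alpha and beta have closed-form least squares solutions
    alpha <- sum(xnow[regime1] * xlag[regime1]) / sum(xlag[regime1]^2)
    beta  <- sum(xnow[!regime1] * xlag[!regime1]) / sum(xlag[!regime1]^2)
    sse <- sum((xnow[regime1] - alpha * xlag[regime1])^2) +
           sum((xnow[!regime1] - beta * xlag[!regime1])^2)
    if(sse < best$sse)
      best <- list(sse = sse, alpha = alpha, beta = beta, r = r)
  }
  c(alpha = best$alpha, beta = best$beta, r = best$r)
}
```

Because the regression is exact for each candidate threshold, this avoids the discontinuity problem entirely, at the cost of being specific to models that are linear conditional on the threshold.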

The functions can be used as follows (with some example parameters):

y <- simnlts(100, 0.5, -1.8, -1, 1, 2)

A similar approach can be used for other time series models.
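As one hedged illustration of that generality, the same optim-based recipe carries over to, say, a first-order exponential autoregressive (EXPAR) model, $y_t = (a + b e^{-c y_{t-1}^2}) y_{t-1} + e_t$. The function below is my own sketch, not code from the post; the starting values and the large penalty for non-finite objective values are arbitrary choices.

```r
# Sketch: fit a first-order EXPAR model by minimising the sum of squared
# errors with optim(), in the same way as fitnlts above.
fitexpar <- function(x)
{
  ss <- function(par, x)
  {
    n <- length(x)
    # Time-varying AR coefficient: a + b*exp(-c*y[t-1]^2)
    phi <- par[1] + par[2] * exp(-par[3] * x[1:(n-1)]^2)
    s <- sum((x[2:n] - phi * x[1:(n-1)])^2)
    # Guard against non-finite values for extreme parameter combinations
    if(!is.finite(s)) 1e20 else s
  }
  fit <- optim(c(0, 0, 1), ss, x = x, control = list(maxit = 1000))
  if(fit$convergence > 0)
    return(rep(NA, 3))
  c(a = fit$par[1], b = fit$par[2], c = fit$par[3])
}
```

Only the ss function changed; the outer machinery (starting values, optim, convergence check) is the same, which is the point of the general approach.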


  • Stephen Nicar

    Hi Rob,

    Enjoy your blog, as I’m in the process of switching from Matlab to R as my primary software. I’ve also been using TeXstudio for a while now based on your recommendation and I’m quite happy with it.

    In your example you use a non-linear optimizer to estimate the parameter values. Is there any meaningful difference (other than speed, perhaps) between this method and simply estimating the model via least squares for each value of the threshold variable and picking the one that minimizes the SSE?

    I ask because I estimated a threshold VAR model in one of my dissertation essays (economics, empirical macro) and used the looping method I just described, picking the specification that minimized the log determinant of the residual covariance matrix. I don’t have a strong programming background, but I know that loops are generally inefficient. Are there other benefits to your approach other than computational efficiency?


    • Rob J Hyndman

      Actually, your approach may be better for this model as it exploits the conditional linearity, and it avoids the problem of the optimization function being non-continuous in the threshold variable. The advantage of my approach is that it can be adapted for any nonlinear time series model.

      • JC COLM

        Hi Rob,

        Thanks for this great post. Always learning a lot of things thanks to your blog!
        I’m currently struggling with a strange non-linear time series, see the enclosed picture.

        Have you ever dealt with such confusing data, containing a lot of steady periods? (transitions between steady periods are often smoother than plotted)

        I really don’t know how to handle this… maybe it is not even suited to any modelling method?
        Moreover, I would like to perform a multivariate analysis on several time series looking like this… but I think that even a simple correlation analysis won’t be suitable…

        Do you have any general recommendations for working on such data?

        Thank you very much !


        • Rob J Hyndman

          I’d guess that you would need to know something about the data generating process in order to construct an appropriate model. An off-the-shelf time series model is unlikely to be appropriate.

          • JC COLM

            Thank you very much Rob for this insight. That’s what I thought: no standard solution.
            Understanding the context is certainly the best first step before jumping into the data, as usual… Let’s investigate! :-)
            Thanks!

          • JC COLM

            After many hours reading papers, can’t it be considered as an INAR time series, or something like this?
            Papers about this process are very theoretical, and I can’t find any R application related to such a process, so it’s hard for me to validate this idea…

  • Supri Amir

    Hi Rob,

    I want to ask you about prediction.
    I have data where x refers to the week and y refers to the number of failures:

    x y
    1 15
    2 29
    3 22

    and so on.

    The question is how to predict the total number of failures while failures are still occurring?
    What method can I use to solve it?

    thank you

    • Rob J Hyndman

      I suggest you ask on crossvalidated.com.

  • Chen Yusheng

    Thank you very much for your posts. Using R for time series data analysis has become easier after I started following your blog.
    I have taken the liberty here to request you to kindly post some hints for plotting impulse response functions in R, supposing we use a threshold-type model as described above. I cannot figure out how to apply shocks to series in levels and finally obtain the confidence bands in the plot. Please help me out.
    Thank you very much

  • Matthieu


    This model can also be estimated with the package tsDyn, which implements conditional least squares with a grid search, and appears to give much more precise results for the threshold:


    estim <- function(){
      y <- simnlts(200, 0.5, -1.8, -1, 1, 2)
      b <- fitnlts(y)
      a <- setar(y, include="none", m=1, trim=0.1)
      res <- c(b, coef(a))
      names(res) <- c("alpha_1", "beta_1", "th_1", "alpha_2", "beta_2", "th_2")
      res
    }

    estims <- replicate(500, estim())

    ### compare results:
    ddply(subset(estims_m, param=="th"), .(estim), summarise,
      bias = -1 - mean(value, na.rm=TRUE),
      sd = sd(value, na.rm=TRUE),
      mse = (-1 - mean(value, na.rm=TRUE))^2 + var(value, na.rm=TRUE),
      no.val = sum(is.na(value)))

      estim        bias        sd        mse no.val
    1  grid -0.02230693 0.1276594 0.01679453      0
    2 optim -0.22195637 0.2857795 0.13093455      6

    ### plot density of estimator of th:
    estims_m <- melt(estims)
    estims_m$param <- gsub("_[12]", "", estims_m$variable)
    estims_m$estim <- ifelse(grepl("_1", estims_m$variable), "optim", "grid")

    qplot(x=value, data=subset(estims_m, param=="th"), geom="density", colour=estim)

    see: https://dl.dropboxusercontent.com/u/6113358/Permanent/CompareEstimTh.png