Estimating a nonlinear time series model in R

There are quite a few R packages available for nonlinear time series analysis, but sometimes you need to code your own models. Here is a simple example to show how it can be done.

The model is a first order threshold autoregression:

    \[y_t = \begin{cases} \alpha y_{t-1} + e_t & \text{if $y_{t-1} \le r$}\\ \beta y_{t-1} + \gamma e_t & \text{if $y_{t-1}>r$} \end{cases}\]

where \(e_t\) is a Gaussian white noise series with variance \(\sigma^2\). The following function will generate some random data from this model.

simnlts <- function(n, alpha, beta, r, sigma, gamma, burnin=100)
{
  # Generate noise
  e <- rnorm(n+burnin, 0, sigma)
  # Create space for y
  y <- numeric(n+burnin)
  # Generate time series
  for(i in 2:(n+burnin))
  {
    if(y[i-1] <= r)
      y[i] <- alpha*y[i-1] + e[i]
    else
      y[i] <- beta*y[i-1] + gamma*e[i]
  }
  # Throw away first burnin values and return the result
  return(ts(y[-(1:burnin)]))
}
The “burn-in” parameter allows the first value of the series to be a random draw from the stationary distribution of the process (assuming it is stationary).
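As a quick sketch of why this matters (using the example parameter values from the call shown later in the post), the first value of each simulated series should vary according to the stationary distribution rather than always starting at zero:

set.seed(42)
# First value of 200 independent simulated series; with the burn-in these
# are (approximately) draws from the stationary distribution.
first_vals <- replicate(200, simnlts(50, 0.5, -1.8, -1, 1, 2)[1])
summary(first_vals)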

We will estimate the model by minimizing the sum of squared errors across both regimes. Other approaches that take account of the different variances in the two regimes may be better, but it is useful to keep it simple, at least initially. The following function uses a nonlinear optimizer to estimate \(\alpha\), \(\beta\) and \(r\). To ensure good starting values, we begin the optimization with \(\alpha=\beta=0\) and set the initial value of \(r\) to be the mean of the observed data.

fitnlts <- function(x)
{
  ss <- function(par, x)
  {
    alpha <- par[1]
    beta <- par[2]
    r <- par[3]
    n <- length(x)
    # Check that each regime has at least 10% of observations;
    # penalize thresholds that violate this constraint
    if(sum(x<=r) < n/10 | sum(x>r) < n/10)
      return(1e20)
    # One-step errors under each regime
    e1 <- x[2:n] - alpha*x[1:(n-1)]
    e2 <- x[2:n] - beta*x[1:(n-1)]
    # Select the relevant error depending on the regime
    regime1 <- (x[1:(n-1)] <= r)
    e <- e1*(regime1) + e2*(!regime1)
    # Return the sum of squared errors
    return(sum(e^2))
  }
  fit <- optim(c(0,0,mean(x)), ss, x=x, control=list(maxit=1000))
  # Return missing values if the optimizer did not converge
  if(fit$convergence > 0)
    return(rep(NA_real_, 3))
  return(c(alpha=fit$par[1], beta=fit$par[2], r=fit$par[3]))
}

There are a few things to notice here.

  • I have used a function inside a function. The ss function computes the sum of squared errors. It would have been possible to define this outside fitnlts, but as I don’t need it for any other purpose it is neater to define it locally.
  • Occasionally the optimizer will not converge, and then fitnlts will return missing values.
  • I have avoided the use of a loop by computing the errors in both regimes and then selecting the relevant one with a logical vector. This is much faster than looping over the observations.
  • If \(r\) is outside the range of the data, then either \(\alpha\) or \(\beta\) will be unidentifiable. To avoid this problem, I force \(r\) to be such that each regime must contain at least 10% of the data. This is an arbitrary value, but it makes the estimation more stable.
  • The function being optimized is not continuous in the \(r\) direction. This will cause gradient-based optimizers to fail, but the Nelder-Mead optimizer used here is relatively robust to such problems.
  • This particular model can be more efficiently estimated by exploiting the conditional linearity. For example, we could loop over a grid of \(r\) values (e.g., at the mid-points between consecutive ordered observations) and use linear regression for the other parameters; a sketch of this idea follows the list. But the above approach is more general and can be adapted to other nonlinear time series models.
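Here is a minimal sketch of that grid-search idea (fitnlts_grid is a hypothetical name, not part of the original post; like fitnlts, it fits each regime without an intercept):

# Grid-search estimator exploiting conditional linearity: given r, the
# least-squares estimates of alpha and beta have closed forms.
fitnlts_grid <- function(x)
{
  n <- length(x)
  xlag <- x[1:(n-1)]
  y <- x[2:n]
  # Candidate thresholds: mid-points between consecutive ordered lagged values
  xs <- sort(unique(xlag))
  mids <- (xs[-1] + xs[-length(xs)])/2
  # Keep only thresholds leaving at least 10% of observations in each regime
  mids <- mids[sapply(mids, function(r)
    sum(xlag <= r) >= n/10 && sum(xlag > r) >= n/10)]
  bestsse <- Inf
  best <- c(alpha=NA, beta=NA, r=NA)
  for(r in mids)
  {
    low <- (xlag <= r)
    # Regression through the origin within each regime
    alpha <- sum(y[low]*xlag[low]) / sum(xlag[low]^2)
    beta <- sum(y[!low]*xlag[!low]) / sum(xlag[!low]^2)
    sse <- sum((y[low] - alpha*xlag[low])^2) + sum((y[!low] - beta*xlag[!low])^2)
    if(sse < bestsse)
    {
      bestsse <- sse
      best <- c(alpha=alpha, beta=beta, r=r)
    }
  }
  return(best)
}

Because the objective has closed-form solutions for \(\alpha\) and \(\beta\) given \(r\), the grid search sidesteps the discontinuity in the \(r\) direction that troubles gradient-based optimizers.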

The functions can be used as follows (with some example parameters):

y <- simnlts(100, 0.5, -1.8, -1, 1, 2)
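
Passing the simulated series to fitnlts then returns the estimates of \(\alpha\), \(\beta\) and \(r\) (or missing values if the optimizer fails to converge):

fitnlts(y)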

A similar approach can be used for other time series models.

Comments:

  • Stephen Nicar

    Hi Rob,

    Enjoy your blog, as I’m in the process of switching from Matlab to R as my primary software. I’ve also been using TeXStudio for a while now based on your recommendation and I’m quite happy with it.

    In your example you use a non-linear optimizer to estimate the parameter values. Is there any meaningful difference (other than speed, perhaps) between this method and simply estimating the model via least squares for each value of the threshold variable and picking the one that minimizes the SSE?

    I ask because I estimated a threshold VAR model in one of my dissertation essays (economics, empirical macro) and used the looping method I just described, picking the specification that minimized the log determinant of the residual covariance matrix. I don’t have a strong programming background, but I know that loops are generally inefficient. Are there other benefits to your approach other than computational efficiency?


    • Rob J Hyndman

      Actually, your approach may be better for this model as it exploits the conditional linearity, and it avoids the problem of the optimization function being non-continuous in the threshold variable. The advantage of my approach is that it can be adapted for any nonlinear time series model.

      • JC COLM

        Hi Rob,

        Thanks for this great post. I’m always learning a lot of things thanks to your blog!
        I’m currently struggling with a strange nonlinear time series; see the enclosed picture.

        Have you ever dealt with such confusing data, containing a lot of steady periods? (Transitions between steady periods are often smoother than plotted.)

        I really don’t know how to handle this… maybe it is not even suited to any modelling method?
        Moreover, I would like to perform a multivariate analysis on several time series looking like this… but I think that even a simple correlation analysis won’t be suitable…

        Do you have any general recommendations for working with such data?

        Thank you very much !


        • Rob J Hyndman

          I’d guess that you would need to know something about the data generating process in order to construct an appropriate model. An off-the-shelf time series model is unlikely to be appropriate.

          • JC COLM

            Thank you very much Rob for this insight. That’s what I thought: no standard solution.
            Understanding the context is certainly the best first step before jumping into the data, as usual… Let’s investigate! :-)
            Thanks!

          • JC COLM

            After many hours of reading papers, couldn’t this be considered an INAR time series, or something like it?
            Papers about this process are very theoretical, and I can’t find any R application related to such a process, so it’s hard for me to validate this idea…

  • Supri Amir

    Hi Rob,

    I want to ask you about prediction.
    I have data where x refers to the week and y refers to the number of failures:

    x   y
    1  15
    2  29
    3  22

    and so on.

    The question is: how can I predict the total number of failures while failures are still occurring?
    What method can I use to solve this?

    thank you

    • Rob J Hyndman

      I suggest you ask on

  • Chen Yusheng

    Thank you very much for your posts. Using R for time series data analysis has become easier since I started following your blog.
    I have taken the liberty here of requesting that you kindly post some hints for plotting impulse response functions in R, supposing we use a threshold-type model as described above. I cannot figure out how to apply shocks to a series in levels and then obtain the confidence bands in the plot. Please help me out.
    Thank you very much

  • Matthieu


    This model can also be estimated with the tsDyn package, which implements conditional least squares with a grid search, and appears to give much more precise results for the threshold:


    library(tsDyn)      # for setar()
    library(plyr)       # for ddply()
    library(reshape2)   # for melt()
    library(ggplot2)    # for qplot()

    estim <- function(){
      y <- simnlts(200, 0.5, -1.8, -1, 1, 2)
      b <- fitnlts(y)
      a <- setar(y, include="none", m=1, trim=0.1)
      res <- c(b, coef(a))
      names(res) <- c("alpha_1", "beta_1", "th_1", "alpha_2", "beta_2", "th_2")
      res
    }

    estims <- replicate(500, estim())

    ### reshape: one row per estimate, tagged by parameter and estimator
    estims_m <- melt(as.data.frame(t(estims)))
    estims_m$param <- gsub("_[12]", "", estims_m$variable)
    estims_m$estim <- ifelse(grepl("_1", estims_m$variable), "optim", "grid")

    ### compare results:
    ddply(subset(estims_m, param=="th"), .(estim), summarise,
          bias = -1 - mean(value, na.rm=TRUE),
          sd = sd(value, na.rm=TRUE),
          mse = (-1 - mean(value, na.rm=TRUE))^2 + var(value, na.rm=TRUE),
          no.val = sum(is.na(value)))

      estim        bias        sd        mse no.val
    1  grid -0.02230693 0.1276594 0.01679453      0
    2 optim -0.22195637 0.2857795 0.13093455      6

    ### plot density of the estimator of th:
    qplot(x=value, data=subset(estims_m, param=="th"), geom="density", colour=estim)