A blog by Rob J Hyndman 

Twitter Gplus RSS

Measuring time series characteristics

Published on 2 May 2012

A few years ago, I was work­ing on a project where we mea­sured var­i­ous char­ac­ter­is­tics of a time series and used the infor­ma­tion to deter­mine what fore­cast­ing method to apply or how to clus­ter the time series into mean­ing­ful groups. The two main papers to come out of that project were:

  1. Wang, Smith and Hyn­d­man (2006) Characteristic-​​​​based clus­ter­ing for time series data. Data Min­ing and Knowl­edge Dis­cov­ery, 13(3), 335–364.
  2. Wang, Smith-​​Miles and Hyn­d­man (2009) “Rule induc­tion for fore­cast­ing method selec­tion: meta-​​​​learning the char­ac­ter­is­tics of uni­vari­ate time series”, Neu­ro­com­puting, 72, 2581–2594.

I’ve since had a lot of requests for the code which one of my coau­thors has been help­fully email­ing to any­one who asked. But to make it eas­ier, we thought it might be help­ful if I post some updated code here. This is not the same as the R code we used in the paper, as I’ve improved it in sev­eral ways (so it will give dif­fer­ent results). If you just want the code, skip to the bot­tom of the post.

Find­ing the period of the data

Usu­ally in time series work, we know the period of the data (if the obser­va­tions are monthly, the period is 12, for exam­ple). But in this project, some of our data was of unknown period and we wanted a method to auto­mat­i­cally deter­mine the appro­pri­ate period. The method we used was based on local peaks and troughs in the ACF. But I’ve since devised a bet­ter approach (prompted on cross​val​i​dated​.com) using an esti­mate of the spec­tral density:

find.freq <- function(x)
{
  n <- length(x)
  spec <- spec.ar(c(na.contiguous(x)),plot=FALSE)
  if(max(spec$spec)>10) # Arbitrary threshold chosen by trial and error.
  {
    period <- round(1/spec$freq[which.max(spec$spec)])
    if(period==Inf) # Find next local maximum
    {
      j <- which(diff(spec$spec)>0)
      if(length(j)>0)
      {
        nextmax <- j[1] + which.max(spec$spec[j[1]:500])
        if(nextmax <= length(spec$freq))
          period <- round(1/spec$freq[nextmax])
        else
          period <- 1
      }
      else
        period <- 1
    }
  }
  else
    period <- 1
 
  return(period)
}

 
The func­tion is called find.freq because time series peo­ple often call the period of sea­son­al­ity the “fre­quency” (which is of course highly confusing).

Decom­pos­ing the data into trend and sea­sonal components

We needed a mea­sure of the strength of trend and the strength of sea­son­al­ity, and to do this we decom­posed the data into trend, sea­sonal and error terms.

Because not all data could be decom­posed addi­tively, we first needed to apply an auto­mated Box-​​Cox trans­for­ma­tion. We tried a range of Box-​​Cox para­me­ters on a grid, and selected the one which gave the most nor­mal errors. That worked ok, but I’ve since found some papers that pro­vide quite good auto­mated Box-​​Cox algo­rithms that I’ve imple­mented in the fore­cast pack­age. So this code uses Guerrero’s (1993) method instead.

For sea­sonal time series, we decom­posed the trans­formed data using an stl decom­po­si­tion with peri­odic seasonality.

For non-​​seasonal time series, we esti­mated the trend of the trans­formed data using penal­ized regres­sion splines via the mgcv pack­age.

decomp <- function(x,transform=TRUE)
{
  require(forecast)
  # Transform series
  if(transform & min(x,na.rm=TRUE) >= 0)
  {
    lambda <- BoxCox.lambda(na.contiguous(x))
    x <- BoxCox(x,lambda)
  }
  else
  {
    lambda <- NULL
    transform <- FALSE
  }
  # Seasonal data
  if(frequency(x)>1)
  {
    x.stl <- stl(x,s.window="periodic",na.action=na.contiguous)
    trend <- x.stl$time.series[,2]
    season <- x.stl$time.series[,1]
    remainder <- x - trend - season
  }
  else #Nonseasonal data
  {
    require(mgcv)
    tt <- 1:length(x)
    trend <- rep(NA,length(x))
    trend[!is.na(x)] <- fitted(gam(x ~ s(tt)))
    season <- NULL
    remainder <- x - trend
  }
  return(list(x=x,trend=trend,season=season,remainder=remainder,
    transform=transform,lambda=lambda))
}

 

Putting every­thing on a [0,1] scale

We wanted to mea­sure a range of char­ac­ter­is­tics such as strength of sea­son­al­ity, strength of trend, level of non­lin­ear­ity, skew­ness, kur­to­sis, ser­ial cor­re­lat­ed­ness, self-​​similarity, level of chaotic­ity (is that a word?) and the peri­od­ic­ity of the data. But we wanted all these on the same scale which meant map­ping the nat­ural range of each mea­sure onto [0,1]. The fol­low­ing two func­tions were used to do this.

# f1 maps [0,infinity) to [0,1]
f1 <- function(x,a,b)
{
  eax <- exp(a*x)
  if (eax == Inf)
    f1eax <- 1
  else
    f1eax <- (eax-1)/(eax+b)
  return(f1eax)
}
 
# f2 maps [0,1] onto [0,1]
f2 <- function(x,a,b)
{
  eax <- exp(a*x)
  ea <- exp(a)
  return((eax-1)/(eax+b)*(ea+b)/(ea-1))
}

 
The val­ues of a and b in each func­tion were cho­sen so the mea­sure had a 90th per­centile of 0.10 when the data were iid stan­dard nor­mal, and a value of 0.9 using a well-​​known bench­mark time series.

Cal­cu­lat­ing the measures

Now we are ready to cal­cu­late the mea­sures on the orig­i­nal data, as well as on the adjusted data (after remov­ing trend and seasonality).

measures <- function(x)
{
  require(forecast)
 
  N <- length(x)
  freq <- find.freq(x)
  fx <- c(frequency=(exp((freq-1)/50)-1)/(1+exp((freq-1)/50)))
  x <- ts(x,f=freq)
 
  # Decomposition
  decomp.x <- decomp(x)
 
  # Adjust data
  if(freq > 1)
    fits <- decomp.x$trend + decomp.x$season
  else # Nonseasonal data
    fits <- decomp.x$trend
  adj.x <- decomp.x$x - fits + mean(decomp.x$trend, na.rm=TRUE)
 
  # Backtransformation of adjusted data
  if(decomp.x$transform)
    tadj.x <- InvBoxCox(adj.x,decomp.x$lambda)
  else
    tadj.x <- adj.x
 
  # Trend and seasonal measures
  v.adj <- var(adj.x, na.rm=TRUE)
  if(freq > 1)
  {
    detrend <- decomp.x$x - decomp.x$trend
    deseason <- decomp.x$x - decomp.x$season
    trend <- ifelse(var(deseason,na.rm=TRUE) < 1e-10, 0, 
      max(0,min(1,1-v.adj/var(deseason,na.rm=TRUE))))
    season <- ifelse(var(detrend,na.rm=TRUE) < 1e-10, 0,
      max(0,min(1,1-v.adj/var(detrend,na.rm=TRUE))))
  }
  else #Nonseasonal data
  {
    trend <- ifelse(var(decomp.x$x,na.rm=TRUE) < 1e-10, 0,
      max(0,min(1,1-v.adj/var(decomp.x$x,na.rm=TRUE))))
    season <- 0
  }
 
  m <- c(fx,trend,season)
 
  # Measures on original data
  xbar <- mean(x,na.rm=TRUE)
  s <- sd(x,na.rm=TRUE)
 
  # Serial correlation
  Q <- Box.test(x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
 
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(x))$statistic
  fp <- f1(p,0.069,2.304)
 
  # Skewness
  sk <- abs(mean((x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
 
  # Kurtosis
  k <- mean((x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
 
  # Hurst=d+0.5 where d is fractional difference.
  H <- fracdiff(na.contiguous(x),0,0)$d + 0.5
 
  # Lyapunov Exponent
  if(freq > N-10)
      stop("Insufficient data")
  Ly <- numeric(N-freq)
  for(i in 1:(N-freq))
  {
    idx <- order(abs(x[i] - x))
    idx <- idx[idx < (N-freq)]
    j <- idx[2]
    Ly[i] <- log(abs((x[i+freq] - x[j+freq])/(x[i]-x[j])))/freq
    if(is.na(Ly[i]) | Ly[i]==Inf | Ly[i]==-Inf)
      Ly[i] <- NA
  }
  Lyap <- mean(Ly,na.rm=TRUE)
  fLyap <- exp(Lyap)/(1+exp(Lyap))
 
  m <- c(m,fQ,fp,fs,fk,H,fLyap)
 
  # Measures on adjusted data
  xbar <- mean(tadj.x, na.rm=TRUE)
  s <- sd(tadj.x, na.rm=TRUE)
 
  # Serial
  Q <- Box.test(adj.x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
 
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(adj.x))$statistic
  fp <- f1(p,0.069,2.304)
 
  # Skewness
  sk <- abs(mean((tadj.x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
 
  # Kurtosis
  k <- mean((tadj.x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
 
  m <- c(m,fQ,fp,fs,fk)
  names(m) <- c("frequency", "trend","seasonal",
    "autocorrelation","non-linear","skewness","kurtosis",
    "Hurst","Lyapunov",
    "dc autocorrelation","dc non-linear","dc skewness","dc kurtosis")
 
  return(m)
}

 

Here is a quick exam­ple applied to Aus­tralian monthly gas production:

library(forecast)
measures(gas)
     frequency              trend           seasonal    autocorrelation 
        0.1096             0.9989             0.9337             0.9985 
    non-linear           skewness           kurtosis              Hurst 
        0.4947             0.1282             0.0055             0.9996 
      Lyapunov dc autocorrelation      dc non-linear        dc skewness 
        0.5662             0.1140             0.0538             0.1743 
   dc kurtosis 
        0.9992

 
The func­tion is far from per­fect, and it is not hard to find exam­ples where it fails. For exam­ple, it doesn’t work with mul­ti­ple sea­son­al­ity — try measure(taylor) and check the sea­son­al­ity. Also, I’m not con­vinced the kur­to­sis pro­vides any­thing use­ful here, or that the skew­ness mea­sure is done in the best way pos­si­ble. But it was really a proof of con­cept, so we will leave it to oth­ers to revise and improve the code.

In our papers, we took the mea­sures obtained using R, and pro­duced self-​​organizing maps using Vis­cov­ery. There is now a som pack­age in R for that, so it might be pos­si­ble to inte­grate that step into R as well. The den­do­gram was gen­er­ated in mat­lab, although that could now also be done in R using the ggden­dro pack­age for example.


Down­load the code in a sin­gle file.


Related Posts:


 
18 Comments  comments 
  • Fran­cis Markham

    Thanks for post­ing your code Rob. This is very sim­i­lar to a project I envis­aged a cou­ple of years ago, before I had the skills to put the ideas into prac­tice. I’ll let you know if I find the time to pub­lish a paper based on the code.

  • Guest

    If there a way to use the for­cast func­tion out of the mea­sures out put function ?

    • http://robjhyndman.com Rob J Hyndman

      I don’t under­stand what you mean. What do you want to forecast?

      • Guest

        I was look­ing at one of the many fore­cast pack­age example:

            fit <- Arima(WWWusage,order=c(3,1,0))
            plot(forecast(fit,h=20))Is it pos­si­ble to imag­ine doing some­thing sim­i­lar to:
           fit2 <- measures(WWWusage)   plot(forecast(fit2,h=20))?

        • http://robjhyndman.com Rob J Hyndman

          forecast(fit) pro­duces fore­casts from an ARIMA model. How can you fore­cast from the mea­sures? These are just sum­mary sta­tis­tics of the time series.

          • Guest

            Sorry, my ques­tion may not have any sense. From the mea­sures func­tion out­put, time­series can be clus­tered. Link­ing (how?) clus­ters with appro­pri­ate mod­els, a fore­cast can then be per­formed. For instance, a clus­ter with strong trend and sea­son­nal­ity com­po­nents could be fit­ted with fit <- tslm(y ~ trend + sea­son) and then forcasted. 

  • Abhi­jith B

    How can we auto­mate a time­series pro­ce­dure to find the orders of ARIMA in R (not with auto.arima()).
    Sup­pose if there are more than one time series data are available.

    • http://robjhyndman.com Rob J Hyndman

      What’s wrong with using auto.arima()? If you have more than one time series, use auto.arima() on each one.

  • Paul Prew

    Hello, This looked very promis­ing for apro­ject I’m work­ing on.  I tried to run the code, but got an error mes­sage.  Maybe it’s beca­sue I never spec­i­fied the (a,b) argu­ments?  I ran the code in blocks as grouped in your post.  I know R a bit, mostly inter­act using RCom­man­der.  I’ll be the first to admit my data is strange, but it’s actual pric­ing his­tory for a raw material.

    > srn.dat$srn172163.100 x summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.6250  0.6350  0.6850  0.7089  0.7550  0.8810
     
    > dput(x)
    structure(c(0.705, 0.705, 0.7077561766, 0.725, 0.725, 0.789008964,
    0.8664955774, 0.868, 0.881, 0.8706588748, 0.8199582478, 0.725,
    0.675, 0.625, 0.625, 0.625, 0.625, 0.625, 0.625, 0.625, 0.625,
    0.635, 0.635, 0.635, 0.635, 0.635, 0.7920070187, 0.8376571637,
    0.755, 0.755, 0.685, 0.685, 0.685, 0.685, 0.685, 0.7180954833
    ), .Tsp = c(2009, 2044, 1), class = “ts”)
     
    > f1 summary(f1eax)
    ERROR:  object ‘f1eax’ not found

    mea­sures m
    [7] ERROR:  object ‘m’ not found

    > ls() [1] “decomp”         “f1”             “f2”             “find.freq”      [5] “snr.dat”        “snr.datsrn.dmx” “srn.dat”        “srn.dmx”        [9] “srn.dmy”        “srn.tslars”     “x”

    • http://robjhyndman.com Rob J Hyndman

      Use measures(x)

  • Paul Prew

    Thank you, Rob.  Worked per­fectly.  Next month I start an R pro­gram­ming class, maybe I can avoid these ele­men­tary errors in the future.

    I see there’s auto.arima, and this auto­mated mea­sures script — is there any auto­mated pro­ce­dure for trans­fer func­tions?  We searches My project involves fore­cast­ing raw mate­r­ial cost from price indices of cer­tain com­modi­ties.  There’s a cou­ple thou­sand raw mate­ri­als, and I thought the time series clus­ter­ing might shed some light.  But an auto­mated trans­fer seems even more valuable.

    Thanks again for mak­ing this code avail­able.  regards, Paul

    • http://robjhyndman.com Rob J Hyndman

      Auto­mated trans­fer func­tions would be cool, but it isn’t avail­able in R at this stage. The com­mer­cial pack­age Auto­box will do it, I think. 

  • http://www.facebook.com/profile.php?id=535283031 Umer Zee­shan Ijaz

    Thanks for shar­ing the scripts Rob. They are quite help­ful! Installing fore­cast pack­age on Ubuntu (11.04 Natty) was a big pain though as I found out that I can only install “fore­cast” when the cur­rent ver­sion of R >=2.14. In my case, I was run­ning 2.12 and so I had to not only install the newer ver­sion of R but also update the GCC to ver­sion 4.6. In my case, one of the depen­den­cies, “quad­prog” wasn’t com­pil­ing with the cur­rent ver­sion (I think it was 4.5) of GCC.

    Hav­ing fixed all these prob­lems, I am writ­ing what I did to resolve these issues so that those of us who have debian dis­tros and are fac­ing the same prob­lem can save some hassle.

    ###### If GCC is a problem ######

    The fol­low­ing steps are more or less sim­i­lar to what is men­tioned in the fol­low­ing blog­post. http://buildall.wordpress.com/2011/04/20/installing-gcc-4–6-in-the-ubuntu-10–10/

    Fol­low­ing this, I got the GCC from http://​gcc​.par​entingamer​ica​.com/​s​n​a​p​s​h​o​t​s​/​L​A​T​E​S​T​-4.6/      (gcc-4.6–20120504.tar.bz2)

    Untar it:
    tar xjvf gcc-4.6–20110401.tar.bz2

    This ver­sion of gcc requires the fol­low­ing libraries: gmp-4.3.2, mpfr-2.4.2, mpc-0.8.1. To get these libraries, run the fol­low­ing script in the untarred folder :
    gcc-4.6–20120427/contrib/download_prerequisites

    Go to each fold­ers for the above libraries that are cre­ated in gcc-4.6–20120427/contrib and install using either:
    ./​configure
    make install
    or
    ./​install-​​sh

    Now we are ready to com­pile gcc. Come out of gcc-4.6–20120427 and make a folder called build:
    mkdir build

    Now you should have two fold­ers in the cur­rent direc­tory i.e.
    /​build
    /gcc-4.6–20110401

    Go inside the build folder and run the fol­low­ing
    ../gcc-4.6–20110401/configure
    disable-​​checking
    enable-languages=c,c++
    enable-​​multiarch
    enable-​​shared
    enable-threads=posix
    program-suffix=-4.6
    with-gmp=/usr/local/lib
    with-mpc=/usr/lib
    with-mpfr=/usr/lib
    without-​​included-​​gettext
    with-​​system-​​zlib
    with-tune=generic
    make
    sudo make install

    Update your Ubuntu to use the new GCC:
    cd /​usr/​bin
    sudo rm g++
    sudo ln –s /usr/local/bin/g++-4.6 g++
    sudo rm gcc
    sudo ln –s /usr/local/bin/gcc-4.6 gcc

    ##### If using an older ver­sion of R #####

    Update R as:
    sudo apt-​​get install r-​​base
    sudo apt-​​get install r-​​recommended

    Now inside R check if you have the lat­est ver­sion
    R.Version()

    Install fore­cast pack­age now
    install.packages(“forecast”)

    If you are access­ing inter­net through your proxy server, set the fol­low­ing envi­ron­men­tal vari­able in R before you run the above com­mand
    Sys.setenv(http_proxy=””)

    I was still unable to get quad­prog to install. It required libg­for­tran to be in /​usr/​lib/​ whereas in my Lenovo thinkpad’s ubuntu dis­tro, by default it was located at /​usr/​lib/​x86_​64-​​linux-​​gnu/​
    I used the fol­low­ing com­mand to find it:
    ldcon­fig –p | grep –i “libgfortran”

    I then made a soft sym­bolic link to it in /​usr/​lib using
    ln –s /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0 /usr/lib/libgfortran.so

    ##### If RKward stopped work­ing after upgrad­ing R #####

    I use RKward (http://​rkward​.source​forge​.net/) as a GUI edi­tor to pro­gram in R. After updat­ing R, it stopped work­ing prop­erly. It gave the mes­sage “The ‘rkward’ R-​​library either could not be loaded at all, or not in the cor­rect ver­sion” and then crashed as soon as I ran any R com­mands. Here is the fix for it:

    Remove RKward using any of the fol­low­ing com­mands:
    sudo apt-​​get purge rkward /​ sudo apt-​​get autore­move rkward /​ sudo apt-​​get remove rkward

    Make sure that the library is not deinstalled/​deselected in dpkg
    dpkg –get-​​selection | grep –i “rkward”

    If found then purge it using
    dpkg –P rkward

    Make sure you remove the fol­low­ing direc­tory as well
    rm –r ~/.rkward

    Install rkward using this ppa
    sudo add-​​apt-​​repository ppa:rkward-devel/rkward-stable-cran
    sudo apt-​​get update

    Regards,
    Umer (http://​user​web​.eng​.gla​.ac​.uk/​u​m​e​r​.ijaz)
    #            sudo apt-​​get install rkward

    • http://robjhyndman.com Rob J Hyndman

      The lat­est ver­sions of fore­cast now require R2.14 so they can use code com­pil­ing and Rcp­pAr­madillo to speed things up.

  • Yan­fei Kang

    Hi Rob, 

    Thanks for mak­ing the codes pub­lic. It works per­fectly except the kur­to­sis and Dc kur­to­sis. You may find in your exam­ple on Aus­tralian monthly gas pro­duc­tion, both the kur­to­sis and Dc kur­to­sis are 1.0000, which is kind of wired. By try­ing the codes on sev­eral other time series, I think the rea­son is as follows:In 

     # Kurtosis

      k <- mean((tadj.x-xbar)^4,na.rm=TRUE)/s^4

      fk <- f1(k,2.273,11567), the s is actu­ally the value taken from:

      # Skewness

      s <- abs(mean((tadj.x-xbar)^3,na.rm=TRUE)/s^3)

      fs <- f1(s,1.510,5.993)
    but not the wanted s value from 

      s <- sd(x,na.rm=TRUE). In your exam­ple, the skew­ness value before trans­for­ma­tion should be very small, which causes k to be infin­ity. That’s why the trans­formed kur­to­sis fk is 1.0000. Also for Dc kurtosis.Thanks again for mak­ing the codes available.Regards,yanfei

    • http://robjhyndman.com Rob J Hyndman

      Oops. Good catch. That error was intro­duced when I mod­i­fied the code for the blog post — it wasn’t in the code we used for the paper. Any­way, now fixed. Thanks.

  • Pingback: Time Series Decomposition : Box Cox for Additive Decomp | Q&A System

  • http://www.facebook.com/profile.php?id=1665382363 Scott Lock­lin

    Thanks kindly for that source code. While I love the SOM’s, I couldn’t get it to work that way. It does work pretty well with proxy::dist and stats::hclust + stats::cutree
    This is a great trick: I used some­thing sim­i­lar to it + knn to detect regimes in sub­seg­ments of very large time series on another problem.

    Your ver­sion of the tech­nique is bet­ter for shorter/​lower-​​frequency time­series. The lack of ade­quate clus­ter­ing algo­rithms on CRAN cries out for a rem­edy, even if it is a pas­tiche of many pack­ages. Maybe it’s time for me to write one…