Measuring time series characteristics

A few years ago, I was work­ing on a project where we mea­sured var­i­ous char­ac­ter­is­tics of a time series and used the infor­ma­tion to deter­mine what fore­cast­ing method to apply or how to clus­ter the time series into mean­ing­ful groups. The two main papers to come out of that project were:

  1. Wang, Smith and Hyn­d­man (2006) Characteristic-​​​​based clus­ter­ing for time series data. Data Min­ing and Knowl­edge Dis­cov­ery, 13(3), 335–364.
  2. Wang, Smith-​​Miles and Hyn­d­man (2009) “Rule induc­tion for fore­cast­ing method selec­tion: meta-​​​​learning the char­ac­ter­is­tics of uni­vari­ate time series”, Neu­ro­com­puting, 72, 2581–2594.

I’ve since had a lot of requests for the code which one of my coau­thors has been help­fully email­ing to any­one who asked. But to make it eas­ier, we thought it might be help­ful if I post some updated code here. This is not the same as the R code we used in the paper, as I’ve improved it in sev­eral ways (so it will give dif­fer­ent results). If you just want the code, skip to the bot­tom of the post.

Find­ing the period of the data

Usu­ally in time series work, we know the period of the data (if the obser­va­tions are monthly, the period is 12, for exam­ple). But in this project, some of our data was of unknown period and we wanted a method to auto­mat­i­cally deter­mine the appro­pri­ate period. The method we used was based on local peaks and troughs in the ACF. But I’ve since devised a bet­ter approach (prompted on cross​val​i​dated​.com) using an esti­mate of the spec­tral density:

find.freq <- function(x)
  n <- length(x)
  spec <-,plot=FALSE)
  if(max(spec$spec)>10) # Arbitrary threshold chosen by trial and error.
    period <- round(1/spec$freq[which.max(spec$spec)])
    if(period==Inf) # Find next local maximum
      j <- which(diff(spec$spec)>0)
        nextmax <- j[1] + which.max(spec$spec[j[1]:500])
        if(nextmax <= length(spec$freq))
          period <- round(1/spec$freq[nextmax])
          period <- 1
        period <- 1
    period <- 1

The func­tion is called find.freq because time series peo­ple often call the period of sea­son­al­ity the “fre­quency” (which is of course highly confusing).

Decom­pos­ing the data into trend and sea­sonal components

We needed a mea­sure of the strength of trend and the strength of sea­son­al­ity, and to do this we decom­posed the data into trend, sea­sonal and error terms.

Because not all data could be decom­posed addi­tively, we first needed to apply an auto­mated Box-​​Cox trans­for­ma­tion. We tried a range of Box-​​Cox para­me­ters on a grid, and selected the one which gave the most nor­mal errors. That worked ok, but I’ve since found some papers that pro­vide quite good auto­mated Box-​​Cox algo­rithms that I’ve imple­mented in the fore­cast pack­age. So this code uses Guerrero’s (1993) method instead.

For sea­sonal time series, we decom­posed the trans­formed data using an stl decom­po­si­tion with peri­odic seasonality.

For non-​​seasonal time series, we esti­mated the trend of the trans­formed data using penal­ized regres­sion splines via the mgcv pack­age.

decomp <- function(x,transform=TRUE)
  # Transform series
  if(transform & min(x,na.rm=TRUE) >= 0)
    lambda <- BoxCox.lambda(na.contiguous(x))
    x <- BoxCox(x,lambda)
    lambda <- NULL
    transform <- FALSE
  # Seasonal data
    x.stl <- stl(x,s.window="periodic",na.action=na.contiguous)
    trend <- x.stl$time.series[,2]
    season <- x.stl$time.series[,1]
    remainder <- x - trend - season
  else #Nonseasonal data
    tt <- 1:length(x)
    trend <- rep(NA,length(x))
    trend[!] <- fitted(gam(x ~ s(tt)))
    season <- NULL
    remainder <- x - trend


Putting every­thing on a [0,1] scale

We wanted to mea­sure a range of char­ac­ter­is­tics such as strength of sea­son­al­ity, strength of trend, level of non­lin­ear­ity, skew­ness, kur­to­sis, ser­ial cor­re­lat­ed­ness, self-​​similarity, level of chaotic­ity (is that a word?) and the peri­od­ic­ity of the data. But we wanted all these on the same scale which meant map­ping the nat­ural range of each mea­sure onto [0,1]. The fol­low­ing two func­tions were used to do this.

# f1 maps [0,infinity) to [0,1]
f1 <- function(x,a,b)
  eax <- exp(a*x)
  if (eax == Inf)
    f1eax <- 1
    f1eax <- (eax-1)/(eax+b)
# f2 maps [0,1] onto [0,1]
f2 <- function(x,a,b)
  eax <- exp(a*x)
  ea <- exp(a)

The val­ues of a and b in each func­tion were cho­sen so the mea­sure had a 90th per­centile of 0.10 when the data were iid stan­dard nor­mal, and a value of 0.9 using a well-​​known bench­mark time series.

Cal­cu­lat­ing the measures

Now we are ready to cal­cu­late the mea­sures on the orig­i­nal data, as well as on the adjusted data (after remov­ing trend and seasonality).

measures <- function(x)
  N <- length(x)
  freq <- find.freq(x)
  fx <- c(frequency=(exp((freq-1)/50)-1)/(1+exp((freq-1)/50)))
  x <- ts(x,f=freq)
  # Decomposition
  decomp.x <- decomp(x)
  # Adjust data
  if(freq > 1)
    fits <- decomp.x$trend + decomp.x$season
  else # Nonseasonal data
    fits <- decomp.x$trend
  adj.x <- decomp.x$x - fits + mean(decomp.x$trend, na.rm=TRUE)
  # Backtransformation of adjusted data
    tadj.x <- InvBoxCox(adj.x,decomp.x$lambda)
    tadj.x <- adj.x
  # Trend and seasonal measures
  v.adj <- var(adj.x, na.rm=TRUE)
  if(freq > 1)
    detrend <- decomp.x$x - decomp.x$trend
    deseason <- decomp.x$x - decomp.x$season
    trend <- ifelse(var(deseason,na.rm=TRUE) < 1e-10, 0, 
    season <- ifelse(var(detrend,na.rm=TRUE) < 1e-10, 0,
  else #Nonseasonal data
    trend <- ifelse(var(decomp.x$x,na.rm=TRUE) < 1e-10, 0,
    season <- 0
  m <- c(fx,trend,season)
  # Measures on original data
  xbar <- mean(x,na.rm=TRUE)
  s <- sd(x,na.rm=TRUE)
  # Serial correlation
  Q <- Box.test(x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(x))$statistic
  fp <- f1(p,0.069,2.304)
  # Skewness
  sk <- abs(mean((x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
  # Kurtosis
  k <- mean((x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
  # Hurst=d+0.5 where d is fractional difference.
  H <- fracdiff(na.contiguous(x),0,0)$d + 0.5
  # Lyapunov Exponent
  if(freq > N-10)
      stop("Insufficient data")
  Ly <- numeric(N-freq)
  for(i in 1:(N-freq))
    idx <- order(abs(x[i] - x))
    idx <- idx[idx < (N-freq)]
    j <- idx[2]
    Ly[i] <- log(abs((x[i+freq] - x[j+freq])/(x[i]-x[j])))/freq
    if([i]) | Ly[i]==Inf | Ly[i]==-Inf)
      Ly[i] <- NA
  Lyap <- mean(Ly,na.rm=TRUE)
  fLyap <- exp(Lyap)/(1+exp(Lyap))
  m <- c(m,fQ,fp,fs,fk,H,fLyap)
  # Measures on adjusted data
  xbar <- mean(tadj.x, na.rm=TRUE)
  s <- sd(tadj.x, na.rm=TRUE)
  # Serial
  Q <- Box.test(adj.x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(adj.x))$statistic
  fp <- f1(p,0.069,2.304)
  # Skewness
  sk <- abs(mean((tadj.x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
  # Kurtosis
  k <- mean((tadj.x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
  m <- c(m,fQ,fp,fs,fk)
  names(m) <- c("frequency", "trend","seasonal",
    "dc autocorrelation","dc non-linear","dc skewness","dc kurtosis")


Here is a quick exam­ple applied to Aus­tralian monthly gas production:

     frequency              trend           seasonal    autocorrelation 
        0.1096             0.9989             0.9337             0.9985 
    non-linear           skewness           kurtosis              Hurst 
        0.4947             0.1282             0.0055             0.9996 
      Lyapunov dc autocorrelation      dc non-linear        dc skewness 
        0.5662             0.1140             0.0538             0.1743 
   dc kurtosis 

The func­tion is far from per­fect, and it is not hard to find exam­ples where it fails. For exam­ple, it doesn’t work with mul­ti­ple sea­son­al­ity — try measure(taylor) and check the sea­son­al­ity. Also, I’m not con­vinced the kur­to­sis pro­vides any­thing use­ful here, or that the skew­ness mea­sure is done in the best way pos­si­ble. But it was really a proof of con­cept, so we will leave it to oth­ers to revise and improve the code.

In our papers, we took the mea­sures obtained using R, and pro­duced self-​​organizing maps using Vis­cov­ery. There is now a som pack­age in R for that, so it might be pos­si­ble to inte­grate that step into R as well. The den­do­gram was gen­er­ated in mat­lab, although that could now also be done in R using the ggden­dro pack­age for example.

Down­load the code in a sin­gle file.

Related Posts:

  • Fran­cis Markham

    Thanks for post­ing your code Rob. This is very sim­i­lar to a project I envis­aged a cou­ple of years ago, before I had the skills to put the ideas into prac­tice. I’ll let you know if I find the time to pub­lish a paper based on the code.

  • Guest

    If there a way to use the for­cast func­tion out of the mea­sures out put function ?

    • Rob J Hyndman

      I don’t under­stand what you mean. What do you want to forecast?

      • Guest

        I was look­ing at one of the many fore­cast pack­age example:

            fit <- Arima(WWWusage,order=c(3,1,0))
            plot(forecast(fit,h=20))Is it pos­si­ble to imag­ine doing some­thing sim­i­lar to:
           fit2 <- measures(WWWusage)   plot(forecast(fit2,h=20))?

        • Rob J Hyndman

          forecast(fit) pro­duces fore­casts from an ARIMA model. How can you fore­cast from the mea­sures? These are just sum­mary sta­tis­tics of the time series.

          • Guest

            Sorry, my ques­tion may not have any sense. From the mea­sures func­tion out­put, time­series can be clus­tered. Link­ing (how?) clus­ters with appro­pri­ate mod­els, a fore­cast can then be per­formed. For instance, a clus­ter with strong trend and sea­son­nal­ity com­po­nents could be fit­ted with fit <- tslm(y ~ trend + sea­son) and then forcasted. 

  • Abhi­jith B

    How can we auto­mate a time­series pro­ce­dure to find the orders of ARIMA in R (not with auto.arima()).
    Sup­pose if there are more than one time series data are available.

    • Rob J Hyndman

      What’s wrong with using auto.arima()? If you have more than one time series, use auto.arima() on each one.

  • Paul Prew

    Hello, This looked very promis­ing for apro­ject I’m work­ing on.  I tried to run the code, but got an error mes­sage.  Maybe it’s beca­sue I never spec­i­fied the (a,b) argu­ments?  I ran the code in blocks as grouped in your post.  I know R a bit, mostly inter­act using RCom­man­der.  I’ll be the first to admit my data is strange, but it’s actual pric­ing his­tory for a raw material.

    > srn.dat$srn172163.100 x summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.6250  0.6350  0.6850  0.7089  0.7550  0.8810
    > dput(x)
    structure(c(0.705, 0.705, 0.7077561766, 0.725, 0.725, 0.789008964,
    0.8664955774, 0.868, 0.881, 0.8706588748, 0.8199582478, 0.725,
    0.675, 0.625, 0.625, 0.625, 0.625, 0.625, 0.625, 0.625, 0.625,
    0.635, 0.635, 0.635, 0.635, 0.635, 0.7920070187, 0.8376571637,
    0.755, 0.755, 0.685, 0.685, 0.685, 0.685, 0.685, 0.7180954833
    ), .Tsp = c(2009, 2044, 1), class = “ts”)
    > f1 summary(f1eax)
    ERROR:  object ‘f1eax’ not found

    mea­sures m
    [7] ERROR:  object ‘m’ not found

    > ls() [1] “decomp”         “f1”             “f2”             “find.freq”      [5] “snr.dat”        “snr.datsrn.dmx” “srn.dat”        “srn.dmx”        [9] “srn.dmy”        “srn.tslars”     “x”

    • Rob J Hyndman

      Use measures(x)

  • Paul Prew

    Thank you, Rob.  Worked per­fectly.  Next month I start an R pro­gram­ming class, maybe I can avoid these ele­men­tary errors in the future.

    I see there’s auto.arima, and this auto­mated mea­sures script — is there any auto­mated pro­ce­dure for trans­fer func­tions?  We searches My project involves fore­cast­ing raw mate­r­ial cost from price indices of cer­tain com­modi­ties.  There’s a cou­ple thou­sand raw mate­ri­als, and I thought the time series clus­ter­ing might shed some light.  But an auto­mated trans­fer seems even more valuable.

    Thanks again for mak­ing this code avail­able.  regards, Paul

    • Rob J Hyndman

      Auto­mated trans­fer func­tions would be cool, but it isn’t avail­able in R at this stage. The com­mer­cial pack­age Auto­box will do it, I think. 

  • Umer Zee­shan Ijaz

    Thanks for shar­ing the scripts Rob. They are quite help­ful! Installing fore­cast pack­age on Ubuntu (11.04 Natty) was a big pain though as I found out that I can only install “fore­cast” when the cur­rent ver­sion of R >=2.14. In my case, I was run­ning 2.12 and so I had to not only install the newer ver­sion of R but also update the GCC to ver­sion 4.6. In my case, one of the depen­den­cies, “quad­prog” wasn’t com­pil­ing with the cur­rent ver­sion (I think it was 4.5) of GCC.

    Hav­ing fixed all these prob­lems, I am writ­ing what I did to resolve these issues so that those of us who have debian dis­tros and are fac­ing the same prob­lem can save some hassle.

    ###### If GCC is a problem ######

    The fol­low­ing steps are more or less sim­i­lar to what is men­tioned in the fol­low­ing blog­post.–6-in-the-ubuntu-10–10/

    Fol­low­ing this, I got the GCC from http://​gcc​.par​entingamer​ica​.com/​s​n​a​p​s​h​o​t​s​/​L​A​T​E​S​T​-4.6/      (gcc-4.6–20120504.tar.bz2)

    Untar it:
    tar xjvf gcc-4.6–20110401.tar.bz2

    This ver­sion of gcc requires the fol­low­ing libraries: gmp-4.3.2, mpfr-2.4.2, mpc-0.8.1. To get these libraries, run the fol­low­ing script in the untarred folder :

    Go to each fold­ers for the above libraries that are cre­ated in gcc-4.6–20120427/contrib and install using either:
    make install

    Now we are ready to com­pile gcc. Come out of gcc-4.6–20120427 and make a folder called build:
    mkdir build

    Now you should have two fold­ers in the cur­rent direc­tory i.e.

    Go inside the build folder and run the fol­low­ing
    sudo make install

    Update your Ubuntu to use the new GCC:
    cd /​usr/​bin
    sudo rm g++
    sudo ln –s /usr/local/bin/g++-4.6 g++
    sudo rm gcc
    sudo ln –s /usr/local/bin/gcc-4.6 gcc

    ##### If using an older ver­sion of R #####

    Update R as:
    sudo apt-​​get install r-​​base
    sudo apt-​​get install r-​​recommended

    Now inside R check if you have the lat­est ver­sion

    Install fore­cast pack­age now

    If you are access­ing inter­net through your proxy server, set the fol­low­ing envi­ron­men­tal vari­able in R before you run the above com­mand

    I was still unable to get quad­prog to install. It required libg­for­tran to be in /​usr/​lib/​ whereas in my Lenovo thinkpad’s ubuntu dis­tro, by default it was located at /​usr/​lib/​x86_​64-​​linux-​​gnu/​
    I used the fol­low­ing com­mand to find it:
    ldcon­fig –p | grep –i “libgfortran”

    I then made a soft sym­bolic link to it in /​usr/​lib using
    ln –s /usr/lib/x86_64-linux-gnu/ /usr/lib/

    ##### If RKward stopped work­ing after upgrad­ing R #####

    I use RKward (http://​rkward​.source​forge​.net/) as a GUI edi­tor to pro­gram in R. After updat­ing R, it stopped work­ing prop­erly. It gave the mes­sage “The ‘rkward’ R-​​library either could not be loaded at all, or not in the cor­rect ver­sion” and then crashed as soon as I ran any R com­mands. Here is the fix for it:

    Remove RKward using any of the fol­low­ing com­mands:
    sudo apt-​​get purge rkward /​ sudo apt-​​get autore­move rkward /​ sudo apt-​​get remove rkward

    Make sure that the library is not deinstalled/​deselected in dpkg
    dpkg –get-​​selection | grep –i “rkward”

    If found then purge it using
    dpkg –P rkward

    Make sure you remove the fol­low­ing direc­tory as well
    rm –r ~/.rkward

    Install rkward using this ppa
    sudo add-​​apt-​​repository ppa:rkward-devel/rkward-stable-cran
    sudo apt-​​get update

    Umer (http://​user​web​.eng​.gla​.ac​.uk/​u​m​e​r​.ijaz)
    #            sudo apt-​​get install rkward

    • Rob J Hyndman

      The lat­est ver­sions of fore­cast now require R2.14 so they can use code com­pil­ing and Rcp­pAr­madillo to speed things up.

  • Yan­fei Kang

    Hi Rob, 

    Thanks for mak­ing the codes pub­lic. It works per­fectly except the kur­to­sis and Dc kur­to­sis. You may find in your exam­ple on Aus­tralian monthly gas pro­duc­tion, both the kur­to­sis and Dc kur­to­sis are 1.0000, which is kind of wired. By try­ing the codes on sev­eral other time series, I think the rea­son is as follows:In 

     # Kurtosis

      k <- mean((tadj.x-xbar)^4,na.rm=TRUE)/s^4

      fk <- f1(k,2.273,11567), the s is actu­ally the value taken from:

      # Skewness

      s <- abs(mean((tadj.x-xbar)^3,na.rm=TRUE)/s^3)

      fs <- f1(s,1.510,5.993)
    but not the wanted s value from 

      s <- sd(x,na.rm=TRUE). In your exam­ple, the skew­ness value before trans­for­ma­tion should be very small, which causes k to be infin­ity. That’s why the trans­formed kur­to­sis fk is 1.0000. Also for Dc kurtosis.Thanks again for mak­ing the codes available.Regards,yanfei

    • Rob J Hyndman

      Oops. Good catch. That error was intro­duced when I mod­i­fied the code for the blog post — it wasn’t in the code we used for the paper. Any­way, now fixed. Thanks.

  • Pingback: Time Series Decomposition : Box Cox for Additive Decomp | Q&A System

  • Scott Lock­lin

    Thanks kindly for that source code. While I love the SOM’s, I couldn’t get it to work that way. It does work pretty well with proxy::dist and stats::hclust + stats::cutree
    This is a great trick: I used some­thing sim­i­lar to it + knn to detect regimes in sub­seg­ments of very large time series on another problem.

    Your ver­sion of the tech­nique is bet­ter for shorter/​lower-​​frequency time­series. The lack of ade­quate clus­ter­ing algo­rithms on CRAN cries out for a rem­edy, even if it is a pas­tiche of many pack­ages. Maybe it’s time for me to write one…

  • John Williams

    Thanks heaps Rob. Just a quick note: on my sys­tem (R 3.0.1) I had to add a cou­ple of requires: tseries and fracd­iff. But that’s triv­ial. Thanks again! :-)

    • Rob J Hyndman

      Thanks. When I wrote this, the fore­cast pack­age auto­mat­i­cally loaded those pack­ages. More recent ver­sions of fore­cast now only import the func­tions needed, so there is now a need to add those requires.

  • SriV

    Dear Rob,
    You code is very infor­ma­tive. Thanks. Regard­ing the ‘nor­mal­iza­tion’ , par­tic­u­larly, the func­tion “f1” which con­verts any [0, Infinin­ity] to [0, 1], can you please explain that func­tion a bit more? I am unable to under­stand it. Also, what are those three argu­ments to the func­tion — x, a and b? I looked at its usage fur­ther down, but, still could not understand.


  • Baman

    Thank you for pub­lish­ing these help­ful codes, I’m try­ing to use the mea­sure() func­tion for my data set, but that gives me an error for some of them

    > x

    Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

    2001 9 20 13 33 25 9 5 34 20 11 7 8

    2002 5 4 10 46 71 32 10 15 4 18 7 5

    2003 2 5 5 9 26 18 16 16 6 8 1 13

    2004 5 9 7 32 33 10 24 16 42 33 18 38

    2005 17 13 2 10 12 46 23 37 11 20 9 23

    2006 33 14 31 17 39 20 18 31 26 48 30 53

    2007 10 32 27 27 29 46 33 22 32 30 29 51

    2008 44 20 42 35 42 14 30 26 62 36 34 29

    2009 36 37 25 42 32 48 20


    Error in stl(x, s.window = “peri­odic”, na.action = na.contiguous) :

    only uni­vari­ate series are allowed

    Could you please give me any sug­ges­tion to cor­rect it?

    • Rob J Hyndman

      It looks like your data is in a dataframe or a matrix object, rather than a ts object. It must be a uni­vari­ate ts object.

  • Pingback: Hyndsight - Detecting seasonality

  • Geral­dine

    Dear Rob
    Thank you so much for post­ing your code. It is exactly what I was look­ing for.
    I’m cur­rently work­ing on imple­ment­ing the sec­ond block of your code. Am I wrong that it doesn’t work with time series < 10 peri­ods? It seems like the gam func­tion can not han­dle these short time series. I replaced it with an lm func­tion and that seems to work ok. Do you have any idea what else I can do to fix that prob­lem? Around 40% of my time series have less than 10 periods.

    Best Geral­dine