A time series classification contest

Amongst today’s email was one from someone running a private competition to classify time series. Here are the essential details.

The data are measurements from a medical diagnostic machine that takes one measurement every second; after 32 to 1000 seconds, the time series must be classified into one of two classes. Some pre-classified training data are provided. It is not necessary to classify all the test data, but you do need relatively high accuracy on the series you do classify. So you could find a subset of more easily classifiable test series and leave the rest of the test data unclassified.
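One way to exploit the option of leaving series unclassified is a reject option: train any classifier that outputs probabilities, then classify only the test series whose predicted probability is furthest from 0.5. Below is a minimal Python sketch, assuming you already have predicted probabilities for class 1. The function name `select_confident` and the use of distance from 0.5 as the confidence measure are illustrative choices, not part of the contest rules.

```python
import math


def select_confident(prob_positive, coverage):
    """Selective classification with a reject option.

    prob_positive: predicted P(class 1) for each test series (from any
        classifier; hypothetical input for this sketch).
    coverage: fraction of test series to classify, e.g. 0.5.
    Returns (labels, keep): labels[i] is 1 or 0, and keep[i] is False for
    the series we abstain on.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    n = len(prob_positive)
    # Round up so the classified share is at least `coverage`; the small
    # epsilon stops float noise (e.g. 0.3 * 10 = 3.0000000000000004) from
    # pushing the count one too high.
    n_keep = min(n, math.ceil(coverage * n - 1e-9))
    # Confidence = distance from the 0.5 decision boundary (stable sort).
    ranked = sorted(range(n), key=lambda i: abs(prob_positive[i] - 0.5), reverse=True)
    kept = set(ranked[:n_keep])
    labels = [1 if p >= 0.5 else 0 for p in prob_positive]
    keep = [i in kept for i in range(n)]
    return labels, keep


if __name__ == "__main__":
    probs = [0.95, 0.55, 0.10, 0.48, 0.80, 0.52]
    labels, keep = select_confident(probs, coverage=0.5)
    print([lab if k else None for lab, k in zip(labels, keep)])  # [1, None, 0, None, 1, None]
```

Raising `coverage` trades accuracy on the classified subset for a larger share of the test data, which is exactly the trade-off the prize rules reward.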

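Accuracy is measured using
$$
0.5\left(\frac{p}{p+n'}+\frac{n}{n+p'}\right)
$$
where $p$ is the number of true positives, $p'$ the number of false positives, $n$ the number of true negatives and $n'$ the number of false negatives. The first term is the sensitivity (true positive rate) and the second is the specificity (true negative rate), so this is the balanced accuracy: a classifier that assigns every series to one class scores only 0.5.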
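The score can be computed directly from the four counts. A minimal Python sketch, assuming labels are coded 1 (positive) and 0 (negative) and that only the series you chose to classify are passed in; the function name `balanced_accuracy` is illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Contest score 0.5 * (p/(p+n') + n/(n+p')) on the classified series."""
    pairs = list(zip(y_true, y_pred))
    p = sum(1 for t, y in pairs if t == 1 and y == 1)        # true positives
    n_prime = sum(1 for t, y in pairs if t == 1 and y == 0)  # false negatives
    n = sum(1 for t, y in pairs if t == 0 and y == 0)        # true negatives
    p_prime = sum(1 for t, y in pairs if t == 0 and y == 1)  # false positives
    if p + n_prime == 0 or n + p_prime == 0:
        raise ValueError("both classes must occur among the classified series")
    return 0.5 * (p / (p + n_prime) + n / (n + p_prime))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.708
```

In this example plain accuracy is 7/10 = 0.7, while the contest score averages 3/4 and 4/6; the two diverge more as the classes become unbalanced.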

The prizes are:

  1. $5000 if using at least 50% of the test samples and achieving 0.75 accuracy.
  2. $15000 if using at least 50% of the test samples and achieving 0.85 or higher.
  3. For any accuracy above 0.75 while using less than 50% (but at least 25%) of the test samples, each additional 0.05 increase in accuracy grants an additional $2K. For example, if you use 30% of the test samples and achieve an accuracy of 0.85, the prize will be $5K+$4K=$9K.
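The three rules can be written as a single function of accuracy and the fraction of test samples used. This Python sketch follows one literal reading: "achieving 0.75" means at least 0.75, rule 3 counts only whole 0.05 steps above 0.75, and the rule 3 increments do not apply at 50% or more. The function name `prize` and these interpretations are assumptions, not the organizer's official scoring code.

```python
def prize(accuracy, fraction_used):
    """Prize in dollars under one literal reading of the contest rules.

    accuracy: contest score on the classified series (0 to 1).
    fraction_used: share of the test samples classified (0 to 1).
    """
    # Work in thousandths to avoid float surprises: (0.85 - 0.75) / 0.05
    # evaluates to slightly less than 2 in floating point.
    acc = round(accuracy * 1000)
    if fraction_used >= 0.50:
        if acc >= 850:
            return 15000  # rule 2
        if acc >= 750:
            return 5000   # rule 1
        return 0
    if fraction_used >= 0.25 and acc >= 750:
        steps = (acc - 750) // 50  # whole 0.05 steps above 0.75 (rule 3)
        return 5000 + 2000 * steps
    return 0


if __name__ == "__main__":
    print(prize(0.85, 0.30))                     # 9000, the example in rule 3
    print(prize(0.85, 0.49), prize(0.85, 0.50))  # 9000 15000
```

Evaluating the function on either side of the 50% boundary makes the jumps in the payout easy to see.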

The winner will be:
the one with the highest accuracy using the largest number of samples,
OR
THE FIRST ONE to achieve 0.85 accuracy with at least 50% of the data,
OR
THE FIRST ONE to achieve 0.9 accuracy with at least 30% of the data.

The link below leads to a text file that explains the data and how to access it, and a PNG image that explains how the time series to be classified were built and how the classes were assigned. The link also includes the actual training and test samples to be used for the challenge, and some plots of the time series.

https://drive.google.com/folderview?id=0BxmzB6Xm7Ga1MGxsdlMxbGllZnM&usp=sharing

Entries should include:

  1. Proof of accuracy
  2. R code, with full rights of use granted to the organizer.
  3. R code that can be applied to new additional test samples.

The prizes create some strange discontinuities. Someone with an accuracy of 0.75 using 50% of the data gets $5K, but someone with an accuracy of 0.76 using only 25% of the data gets more. On the other hand, someone using 49% of the test samples with 0.85 accuracy gets $9K, but if they use 50% of the test samples they get $15K. Surely a continuous bivariate function of accuracy and percentage would have been better.

I also think this would have been better on Kaggle or CrowdAnalytix, but instead it has been posted on the R group on LinkedIn.

For all further questions, either ask via the comments on LinkedIn or email the organizer, Roni Kass.


  • A. Hartati

    Sir Hyndman, may I know when the deadline of this competition is, Sir? Thank you in advance. Regards, Alia

    • Roni Kass

      Jan 2015

      • A. Hartati

        Thank you very much for the info, Sir.

      • Adam

        January 1st or 31st? It’d make a lot of difference, given that we’re already at the 16th of December.

        • Roni Kass

          This is a rather short challenge. It can be Jan 31st, although I know of some people who are already working on it, as they are trying to win the prize by being the FIRST ONE to hit at least 85% accuracy.

          • Adam

            I’ll try to build something quickly, then. Thanks!

      • A. Hartati

        Sir Roni, who won the time series classification competition last January (2015)?

  • Roni Kass

    Hi,

    During this entire holiday season, I am fully online to support anyone needing assistance with the competition.

    I wish you all a Merry Christmas and a Happy New Year.

    Good luck,

    Roni Kass

    • fernandopv

      Happy New Year, Roni.
      Could you explain in a little more depth how the true part of each sample is selected?
      thanks,
      Fernando

      • Roni Kass

        For any in-depth questions, please email me directly so we don’t overcrowd this comments section.

        The link contains a document that explains the “true” section rather well. Basically, one can use all 1,000 samples from the time series, or choose just the true section, which is the “last” section (the right-most part of the time series), where the rise in value from the beginning of the true section to the end of the time series is >= ±0.5. (We feel this is the interesting segment of the time series to use for classification, but it is not a must.)
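The description of the true section above can be turned into a simple extraction rule. This Python sketch uses one reading of it: the true section is the shortest final segment over which the series changes by at least 0.5 in either direction (absolute difference between the last value and the segment's first value). The function name `true_section`, the use of absolute change and the `None` fallback are assumptions; the organizer's document in the shared folder is the authoritative definition.

```python
def true_section(series, threshold=0.5):
    """Shortest trailing segment whose total change reaches `threshold`.

    Assumption: change = abs(series[-1] - series[start]), reading ">= ±0.5"
    as "at least 0.5 up or down". Returns None if no segment qualifies.
    """
    last = series[-1]
    # Walk backwards from the second-to-last point; the first start index
    # that reaches the threshold gives the shortest qualifying segment.
    for start in range(len(series) - 2, -1, -1):
        if abs(last - series[start]) >= threshold:
            return series[start:]
    return None


if __name__ == "__main__":
    print(true_section([1.0, 1.2, 1.1, 0.9, 1.3, 1.5]))  # [0.9, 1.3, 1.5]
```

Returning `None` rather than the whole series makes it explicit when a series has no true section, so the caller can decide whether to fall back to all samples.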

  • Marco

    Hello Rob,

    What will the winning algorithms be used for?

    Who is Roni Kass? What is your association with him and why are you promoting his competition?

    Does he work for the Israeli military?

    How much does he stand to make from this work?

    • Roni Kass

      Hi,

      This has nothing to do with any military; it is for a new prototype and feasibility test for an application in the medical field.

      Roni Kass

  • Sportsmanship

    Any new deadline so far?

    • Roni Kass

      We are waiting for the winner; this sets the deadline.

      75% accuracy was achieved, but with only 8% of the test data (a minimum of 30% is required).

      Roni Kass

      • Mohamed TOUZANI

        Can I get the winner’s R code, or the best so far? I’m a beginner at data mining and I’m curious, because I thought it was not possible to classify time series in R. touzanimo@gmail.com

        • A. Hartati

          Who is the winner, Sir?

  • tijptjik

    Any ideas whether the dataset is still available somewhere? Seems the Google Drive is empty.