The Collective of Transform Ensembles (COTE) for Time Series Classification

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences

Abstract

Time series classification is the problem of predicting an outcome from a series of ordered data. For example, if we take a series of electronic readings from a sample of meat, the classification problem could be to determine whether that sample is pure beef or whether it has been adulterated with some other meat. Alternatively, if we have a series of electricity usage readings, the classification problem could be to determine which type of device generated those readings. Time series classification problems arise in all areas of science, and we have worked on problems involving ECG and EEG data, chemical concentration readings, astronomical measurements, otolith outlines, electricity usage, food spectrographs, hand and bone radiograph data and mutant worm motion. The algorithm we have developed for this task, the Collective of Transform Ensembles (COTE), is significantly more accurate than any other technique proposed in the literature (when assessed on 80 data sets used in the literature). This project looks to improve COTE further and to apply it to three problem domains of genuine importance to society.
In collaboration with Imperial, we will look at classifying Caenorhabditis elegans via motion traces. C. elegans is a nematode worm commonly used as a model organism in the study of genetics. We will help develop an automated classifier for C. elegans mutant types based on their motion, with the objective of identifying genes that regulate appetite. This classifier will automate a task previously done manually at great cost and will uncover conserved regulators of appetite in a model organism in which functional dissection is possible at the level of behaviour, neural circuitry, and fat storage. In the long term, this may give insights into the genetic component of human obesity.
Working closely with the Institute of Food Research (IFR), we will attempt to solve two problems involving classifying food types by their molecular spectra (infrared, IR, and nuclear magnetic resonance, NMR). The first problem involves classifying meat type. The horse meat scandal of 2012/3 has shown that there is an urgent need to increase current authenticity testing regimes for meat. IFR have been working closely with a company called Oxford Instruments to develop a new low-cost, bench-top spectrometer called the Pulsar for rapid screening of meat. We will collaborate with IFR to find the best algorithms for performing this classification. The second problem aims to find non-destructive ways for testing whether the content of intact spirits bottles is genuine or fake. Forged alcohol is commonplace, and in recent years there has been an increasing number of serious injuries and even deaths from the consumption of illegally produced spirits. The development of sensor technology to detect this type of fraud would thus have great societal value, and the collaboration with Oxford Instruments offers the potential for the development of portable scanners for product verification.
Our third case study involves classifying electric devices from smart meter data. Currently 25% of the United Kingdom's greenhouse gas emissions are accounted for by domestic energy consumption, such as heating, lighting and appliance use. The government has committed to an 80% reduction of CO2 emissions by 2050 and, to meet this target, is requiring the installation of smart energy meters in every household to promote energy saving. The primary output of this investment of billions of pounds in technology will be enormous quantities of data relating to electricity usage. Understanding and intelligently using this data will be crucial if we are to meet the emissions target. We will focus on one part of the analysis, which is the problem of determining whether we can automatically classify the nature of the device(s) currently consuming electricity at any point in time. This is a necessary first step in better understanding household practices, which is essential for reducing usage.

Planned Impact

We have chosen our case studies to demonstrate the breadth of domains in which time series classification arises and we hope these will act as a catalyst for other biological, food and climate scientists to work with us and/or our code. The investigators on this project have a strong track record of working with industry, and we aim to exploit our research to have a direct impact.

The work with the Institute of Food Research has perhaps the greatest potential for immediate impact on society and the economy. The horsemeat scandal shook public confidence in the sector, and the complexity of the international market for meat makes it hard to guard against further occurrences. Devices like Oxford Instruments' Pulsar offer a cost-effective mechanism for screening against contamination. If we can find a better algorithm for classification, there is a simple and direct path to implementation within Pulsar. Forged alcohol is commonplace, and cases vary from simple economic crimes through to fraud with serious health implications. In recent years there has been an increasing number of serious injuries and even deaths from the consumption of poor-quality, illegally produced spirits. The development of sensor technology to detect this type of fraud would thus have great societal value, and the collaboration with Oxford Instruments offers the potential for the development of commercial hardware to facilitate the usage of the algorithms our research produces. Improving Pulsar and developing a new product will both have a positive economic and societal impact. Devices like Pulsar also help with public engagement with science, as demonstrated by its appearance on the BBC1 programme Rip Off Britain (http://youtu.be/t8zWLat8NQ0).

The collaborative research with Imperial is part of the important drive to understand the genetic components of obesity. Model species are useful in this respect as it is possible to directly connect behaviour to genetics in a reproducible way. Hence, if we can automatically detect worms that are exhibiting aberrant behaviour, we can then determine what mutations caused it. Conversely, we can cause mutations in the worm and then observe its behaviour. Both of these tasks require laborious, manual identification of mutants. This project will not be involved in performing the experiments. We will instead help find the best ways of automating this time-consuming task.

Smart meters will soon be in all of our homes collecting detailed data on our electricity usage. This massive investment in technology must yield a significant reduction in our carbon footprint to justify the cost. The key to altering patterns of consumer behaviour is providing useful and relevant information. This in turn requires the ability to extract knowledge from the raw data. We will concentrate on the problem of identifying the nature of devices being used in a household. This offers the potential for constructing more complex models of behaviour based on combined device usage which in turn may lead to more informative advice on how to modify behaviour.
 
Description Through extensive empirical experimentation involving over 30 million experiments, we have shown that the basic COTE algorithm is significantly more accurate than over 30 other proposed algorithms and is on average 8% more accurate than the standard benchmark algorithms for time series classification.
Exploitation Route We have released all the code for our experiments and can easily evaluate new algorithms.
Sectors Agriculture, Food and Drink; Energy; Other
URL http://www.timeseriesclassification.com/
 
Description BBSRC iCASE Studentship
Amount £90,000 (GBP)
Organisation Scotch Whisky Research Institute 
Sector Charity/Non Profit
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 10/2016 
End 09/2021
 
Description Norwich Research Park Bioscience Doctoral Training Partnership
Amount £70,000 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 10/2017 
End 09/2021
 
Title Time Series Classification Repository 
Description This is an extension of the UCR Time Series Classification and Clustering data repository that will be jointly maintained by UEA and UCR at the new website www.timeseriesclassification.com, a work in progress. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact work in progress 
URL http://www.timeseriesclassification.com
 
Description Classifying early-onset Alzheimer's
Organisation Medical Research Council (MRC)
Department Medical Research Council (MRC), MRC Cognition and Brain Sciences Unit
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Public 
PI Contribution We have begun to look at data provided by Richard Henson with the goal of seeing if we can contribute to the task of classifying patients with early-onset Alzheimer's.
Collaborator Contribution They have provided us with data from 50 patients.
Impact No outputs yet; the collaboration has just begun.
Start Year 2017
 
Description Classifying insects 
Organisation University of California at Riverside
Country United States of America 
Sector Academic/University 
PI Contribution We have been working with Eamonn Keogh of UCR to apply our algorithms to the problem of insect classification from sound snippets
Collaborator Contribution UCR have provided us with over 100,000 sound recordings of insects
Impact This collaboration led directly to a successful bid for a DTP studentship.
Start Year 2016
 
Description Dictionary based classifiers for time series classification 
Organisation University of Rennes 1 (Université de Rennes 1)
Country France, French Republic 
Sector Academic/University 
PI Contribution Simon Malinowski and Romain Tavenard from Rennes invited me to talk at a workshop in Italy. Further discussion led to a collaboration on developing a specific type of algorithm. This led to a paper which is currently under review.
Collaborator Contribution They brought expertise and code in a specific form of algorithm used in Computer Vision, which we adapted for Time Series Classification.
Impact Paper under review for PKDD 2017
Start Year 2016
 
Description Scotch Whisky Research Institute 
Organisation Scotch Whisky Research Institute
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Charity/Non Profit 
PI Contribution SWRI have supported an extension of research related to one of the work packages in the form of financial support of £25,000 for a BBSRC iCASE studentship to start in September 2016.
Collaborator Contribution SWRI will provide advice and guidance for our attempt to develop a mechanism for non-intrusively detecting forged spirits.
Impact This collaboration has just begun, so there are as yet no outcomes
Start Year 2015
 
Title Time Series Classification WEKA Code Base 
Description This freely available code base contains implementations of over 20 recently proposed time series classification algorithms and methods to recreate the experiments reported in our papers 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The code has been downloaded by over 200 researchers worldwide. 
URL https://bitbucket.org/TonyBagnall/time-series-classification