Recognition and Localisation of Human Actions in Image Sequences

Lead Research Organisation: Queen Mary, University of London
Department Name: Sch of Electronic Eng & Computer Science


The explosion in the amount of digital visual data generated and distributed today is paralleled only by the explosion in the amount of textual data witnessed in the preceding decade. However, while retrieval based on textual information has made great progress and resulted in commercially usable search engines (e.g. Google, Yahoo), vision-based retrieval of multimedia material remains an open research question. As the amount of produced and distributed video increases at an unprecedented pace, the significance of efficient methods for content-based indexing in terms of the depicted actions can hardly be overestimated. In the domain of human motion analysis in particular, progress is expected to boost applications in human-computer interaction, health care, surveillance, computer animation and games, and multimedia retrieval. However, mapping low-level visual descriptors to high-level action/object models is an open problem, and the analysis faces major challenges when the analysed image sequence exhibits large variability in the appearance and spatiotemporal structure of the actions, occlusions, cluttered backgrounds and large motions. In addition, learning structure and appearance models is hindered by the fact that segmentation and annotation for the creation of training datasets are onerous tasks. For these reasons, there is a great incentive to develop recognition and localisation methods that can learn either from few annotated examples or in a way that minimises the amount of required manual segmentation and annotation.

This project will build on recent developments in Computer Vision and Pattern Recognition in order to develop methods for recognition and localisation of human and animal action categories in image sequences. Once trained, the methods should be able to detect and localise, in a previously unseen image sequence, all the actions that belong to one of the known categories.
The methods will allow the models to be learned incrementally, starting from few examples, and will support computer-assisted manual interaction through appropriate interfaces in order to facilitate model refinement. The methodologies will allow the models to be trained on image sequences with significant background clutter, that is, in the presence of multiple objects/actions in the scene and moving cameras. No prior knowledge of the anatomy of the human body is assumed, and therefore the models will be able to identify a large class of action categories, including facial/hand/body actions, animal motion, and interactions between humans and objects in their environment (such as drinking a glass of water).


Guo W (2012) Tensor learning for regression, in IEEE Transactions on Image Processing
Koelstra S (2013) Fusion of facial expressions and EEG for implicit affective tagging, in Image and Vision Computing
Koelstra S (2010) A dynamic texture-based approach to recognition of facial actions and their temporal models, in IEEE Transactions on Pattern Analysis and Machine Intelligence
Kotsia I (2012) Higher rank Support Tensor Machines for visual recognition, in Pattern Recognition
Kotsia I (2013) Computer Vision - ACCV 2012
Description The project has developed machine learning and computer vision methods for the analysis of facial expressions and body gestures, so as to recognise the behaviour and actions of humans in natural environments. We have advanced the state of the art and have shown that machines are getting better at performing such tasks.
Exploitation Route 1) Content providers and distributors could utilise the results on facial expression analysis for inferring people's affective and cognitive state while watching films and TV programs.

2) Robot manufacturers could utilise the methods for facial expression analysis and gesture recognition for natural interfaces.

3) Gaming companies could use both the pose estimation and the gesture recognition results for game control.

4) Applications like interactive programs that guide people through their daily exercises could be built based on the technology for gesture recognition and pose estimation.
The research can be utilised by companies and academic institutions that are interested in behaviour analysis. This includes analysis of human behaviour for assisted living (e.g. of elderly people), or by restaurants/shops that monitor customer behaviour and/or preferences and/or interaction with products and/or reaction to provided services.

Our work on facial expression analysis can be used for analysing human reactions (e.g. affective states) to the presentation of multimedia content. In the latter direction, and in collaboration with partners from the FP7 Network of Excellence Petamedia, results have already been obtained and published.

Digital media companies can also utilise the findings. Specifically, the action spotting and action recognition algorithms developed in this project can be used in video annotation and/or retrieval systems to better manage digital media.

Academic researchers can also utilise the theoretical findings of our work. In particular, our works on tensor-based regression/classification and on max-margin non-negative matrix factorisation are core pattern recognition methodologies with applications beyond the field of Computer Vision.

The dissemination efforts include a dedicated website (

Source code for several of our methods is provided online (
Sectors Creative Economy, Digital/Communication/Information Technologies (including Software), Healthcare
Description The work on localisation of human actions, and in particular the works on part-based models, laid the foundations for research that led to a collaboration with Yamaha Motors Ltd. That collaboration led to a follow-up project with Yamaha and a submitted patent for a pedestrian detection system (Spring 2016). The work on human motion analysis also supports a recently awarded InnovateUK project (SensingFeeling) that aims to monitor and assess the affective state of people in the retail environment.
Sector Creative Economy, Leisure Activities (including Sports, Recreation and Tourism), Manufacturing (including Industrial Biotechnology), Transport
Impact Types Societal,Economic
Description Direct Industrial Funding (from Yamaha Motors Ltd)
Amount £150,000 (GBP)
Organisation Yamaha Motors 
Sector Private
Country Unknown
Start 11/2014 
End 11/2016
Title Machine Learning codes 
Description Methods for data analysis, classification and regression. 
Type Of Material Data analysis technique 
Year Produced 2011 
Provided To Others? Yes  
Impact The code has been used by a few researchers worldwide. 
Description Academic Visit of Dr. Javier Trevor 
Organisation Universitat Jaume I
Country Spain, Kingdom of 
Sector Academic/University 
PI Contribution Dr. Javier Trevor, an academic at Universitat Jaume I, Spain, collaborated with me during a 6-month visit to QMUL. The visit was funded by an award obtained from the Spanish Ministry of Education and Research. The related proposal was entitled "Human Action Recognition with partial and hidden information".
Start Year 2011
Description Collaboration with Imperial College 
Organisation Imperial College London (ICL)
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution Ongoing collaboration with the group of Prof. Pantic on facial and body gesture analysis, which has resulted in several publications. Joint supervision of Antonis Oikonomopoulos, which resulted in several papers on action recognition. Collaboration on pose-invariant facial expression recognition in the framework of the work of Ognjen Rudovic.
Start Year 2009
Description Collaboration with Institute on Telematics and Informatics 
Organisation Centre for Research and Technology Hellas (CERTH)
Country Greece, Hellenic Republic 
Sector Academic/University 
PI Contribution Within the framework of a Doctorate Programme that I initiated, the Informatics and Telematics Institute (ITI-CERTH, Greece) funded the salaries and paid the fees of several researchers enrolled as PhD students at QMUL under my supervision. Two students have already graduated, two will graduate by 2017 and one will enrol in Spring 2017. I am co-supervisor of the students.
Collaborator Contribution The Informatics and Telematics Institute (ITI-CERTH, Greece) funded the salaries and paid the fees of the researchers, covered equipment and travel costs, and co-supervised the research.
Impact In the period 2009-2015, the collaboration resulted in 25 publications.
Start Year 2009
Title Max-Margin Semi-NMF 
Description This code implements the paper Max-Margin Semi-NMF (MNMF) as presented in Vijay Kumar, Irene Kotsia and Ioannis Patras, "Max-Margin Semi-NMF", in BMVC 2011. 
Type Of Technology Software 
Year Produced 2012 
Impact The paper has been cited 10 times since 2012. 
Title Support Tucker Machines 
Description This code implements Support Tucker Machines (STuMs) and Sw-STuMs, as presented in Irene Kotsia and Ioannis Patras, "Support Tucker Machines", in CVPR 2011.
Type Of Technology Software 
Year Produced 2012 
Impact The paper has been cited 22 times since 2012. 
Title Tensor Regression 
Description This code implements Support Tensor Regression (STR) as presented in Weiwei Guo, Irene Kotsia and Ioannis Patras, "Tensor Learning for Regression", in IEEE Transactions on Image Processing, 2011. 
Type Of Technology Software 
Year Produced 2012 
Impact The paper has been cited by 30 researchers since 2012