Integration of enhanced protein function prediction with experimental studies of fertilisation in Plasmodium - a wet/dry study

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

This proposal is for a wet/dry study that exploits the synergy between a computational group developing novel bioinformatics strategies for protein function prediction and an experimental group that needs these methods to prioritise wet-laboratory studies extending our understanding of fertilisation in Plasmodium. Genome projects are determining the sequences of proteins in numerous species, including man, model mammals, plants (including those of relevance to agriculture), and pathogens of humans, animals and plants. Central to the understanding and exploitation of this information is the assignment of function to the proteins. A range of computational approaches has been developed to assign function to proteins. Several approaches, including one developed in our group, use information from the sequence and the 3D protein structure. Other approaches use information at a systems level: which genes are turned on or off together (transcriptomics, proteomics) and which proteins interact (interactomics). This proposal is to develop an enhanced method of function prediction that integrates information from sequences, structures, transcriptomes, proteomes and interactomes. The computer program will be made available to the academic community via a web server. We will also take part in international blind trials of function prediction. This general development of software will be targeted at understanding the fertilisation of Plasmodium gametes. One particular species, Plasmodium falciparum, is the parasite responsible for the majority of malarial deaths. The disease presents a risk to 40% of the world's population and is responsible for c. 400 million cases and 3 million deaths annually worldwide. Plasmodium is transmitted from person to person by the bite of an anopheline mosquito.
In addition, Plasmodium, and notably Plasmodium berghei (a parasite of rodents that is not pathogenic to man), has become a model organism for the study of parasite/host interactions because of the importance of understanding the molecular basis of malaria. The proteins in the gamete of P. berghei have recently been characterised in our laboratory. To direct our proposed experimental studies, the enhanced bioinformatics tools for function prediction will be applied to these gamete proteins. The bioinformatics tools will also be used to improve the functional annotation of Plasmodium sequences. Our results will be added to the community databases (GeneDB and PlasmoDB) describing the annotation of Plasmodium.

Technical Summary

1. Evaluation and application of available function prediction tools to Plasmodium. The bioinformatician and the biologist will jointly annotate the proteome of the male gamete of Plasmodium, i.e. a wet/dry approach. We will also apply our software to the entire Plasmodium proteome and add our annotation to community databases (GeneDB and PlasmoDB).
2. Development and dissemination of function prediction from domain combination. We will identify each component domain combination (using sequence and enhanced fold recognition) and use machine-learnt (support vector) rules to provide a set of possible functions for the unknown protein. We will take part in international blind testing of function prediction.
3. Development and dissemination of an integrated approach for function prediction. We will integrate the results from a range of other prediction approaches, including enhanced sequence-based function prediction, co-expression data analysis and interactome data.
4. Experimental characterisation in P. berghei of male gamete proteins involved in fertilisation, directed by the bioinformatics analysis, using:
4.1 Gene knockout - Two independent clones of transgenic parasites arising from 2 rounds of drug selection will be subjected to our routine analysis for their ability to make gametocytes, to exflagellate (make male gametes), to produce ookinetes in vitro, and to produce oocysts in vivo.
4.2 Protein localization - This will be approached by i) using a robust method for gfp- and/or c-myc tagging of malarial proteins and ii) confirming these studies by locating the native protein with antibodies raised to protein domains expressed in E. coli.
4.3 Protein complex detection - Hypotheses will be tested by using the antibodies and tagged parasite combinations described in step 4.2 to attempt to identify, by co-precipitation, proteins complexed to the target molecule. We will determine whether the complex is formed within the male gamete or between gamete sexes.
Co-funded by EPSRC.
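The integration step in item 3 can be illustrated with a minimal sketch. It assumes that each prediction approach returns a score between 0 and 1 for each Gene Ontology term and that the scores are combined by a weighted average; the record does not state how the integration is actually performed, so the function name `combine_predictions`, the method names, the weights and the GO identifiers below are all hypothetical and for illustration only.

```python
from collections import defaultdict


def combine_predictions(predictions, weights):
    """Combine per-method GO-term scores by weighted average (illustrative only).

    predictions: {method_name: {go_term: score in [0, 1]}}
    weights:     {method_name: non-negative weight}
    A method that does not predict a term contributes a score of 0 for it.
    Returns a list of (go_term, combined_score), highest score first.
    """
    total_weight = sum(weights[method] for method in predictions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined = defaultdict(float)
    for method, scores in predictions.items():
        for term, score in scores.items():
            combined[term] += weights[method] * score / total_weight

    # Sort by descending score, breaking ties by term identifier.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from three assumed prediction approaches.
example_predictions = {
    "sequence": {"GO:0005524": 0.9, "GO:0016301": 0.6},
    "coexpression": {"GO:0016301": 0.8},
    "interactome": {"GO:0005524": 0.5, "GO:0007338": 0.4},
}
example_weights = {"sequence": 2.0, "coexpression": 1.0, "interactome": 1.0}

for term, score in combine_predictions(example_predictions, example_weights):
    print(f"{term}\t{score:.3f}")
```

A weighted average is only one possible choice; a trained classifier over the per-method scores would be another. The sketch is meant to show the shape of the data being combined, not the method used in the project.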
 
Description • Development of two new methods for analysis of protein function: 3DLigandSite for predicting small-molecule binding sites in proteins and CombFunc for Gene Ontology based protein function prediction. Both methods have performed well in the international assessments CASP (Critical Assessment of protein Structure Prediction) and CAFA (Critical Assessment of Functional Annotation); for example, CombFunc ranked in the top ten methods worldwide in the most recent assessment, CAFA2. Both methods are provided as web servers for the scientific community and between them receive more than 100,000 per annum.
• Application of these methods and others to annotate the proteins present in the Plasmodium berghei male gamete proteome. This enabled functions to be proposed for 50% of the proteins in the proteome that were previously annotated as hypothetical or of unknown function.
• This analysis identified a number of potential cell-surface-expressed proteins that could be targeted to disrupt the Plasmodium sexual cycle. These were investigated in the wet laboratory.
Exploitation Route The new approaches used in the computational methods could be adapted by others working in this area.
The functional annotation data generated for the Plasmodium gamete could be used by wet lab researchers working on Plasmodium.
The work could also support the development of methods to prevent and treat malaria.
Sectors Agriculture, Food and Drink,Healthcare,Pharmaceuticals and Medical Biotechnology
 
Description The funding supported the development of novel algorithms to predict protein function from sequence. The method was applied to the parasite Plasmodium, and the proteins identified were then studied experimentally.
First Year Of Impact 2010
Sector Agriculture, Food and Drink,Healthcare,Pharmaceuticals and Medical Biotechnology
Impact Types Economic
 
Description Biomedical Resource Development Fund
Amount £830,000 (GBP)
Funding ID WT104955MA 
Organisation The Wellcome Trust Ltd 
Department Wellcome Trust Institutional Strategic Support Fund
Sector Charity/Non Profit
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 01/2015 
End 12/2020
 
Title 3DLigandSite
Description Predicts ligand binding sites for a protein 
Type Of Technology Webtool/Application 
Year Produced 2010 
Impact Widespread use by the community 
URL http://www.sbg.bio.ic.ac.uk/3dligandsite/
 
Title CombFunc - A server for protein function prediction
Description CombFunc takes a protein sequence and uses a range of bioinformatics methods to assign its function. 
Type Of Technology Webtool/Application 
Year Produced 2009 
Impact Performed well in the international blind trial of protein function prediction CAFA held in 2013. 
URL http://www.sbg.bio.ic.ac.uk/~mwass/combfunc/
 
Title ConFunc
Description A web server to predict protein function from sequence 
Type Of Technology Webtool/Application 
Year Produced 2008 
Impact Used by bioscience workers. Now incorporated into CombFunc 
URL http://www.sbg.bio.ic.ac.uk/confunc/about.html
 
Description Imperial Festival & Fringe (open to public) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Imperial Festival is attended by over 10,000 visitors, ranging from policy makers to the general public, including children of all ages. We demonstrated the implications of understanding protein structure. At our stand we had over 100 visitors.
Year(s) Of Engagement Activity 2014,2016
URL https://www.imperial.ac.uk/be-inspired/festival/
 
Description Lecture - Art and Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact The talk highlighted the link between structural biology and art.

Follow-up invitation to talk at a human/computer interaction conference
Year(s) Of Engagement Activity 2013
 
Description School lecture (London) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Talk to school children to spark interest in science

Requests for work experience
Year(s) Of Engagement Activity 2012
 
Description Talk at school 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Describing use of bioinformatics in medical research
Year(s) Of Engagement Activity 2015
 
Description Work experience for 16-18 year old pupils 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact We provided one week's work experience for about 6 students each year. They visited facilities at Imperial and we introduced them to computer programming and molecular graphics.
Year(s) Of Engagement Activity 2014,2015