The Arthropod Supertree of Life: An Online Interactive Resource for Testing Patterns in Arthropod Evolution and Biodiversity

Lead Research Organisation: The Natural History Museum
Department Name: Life Sciences


Around 80% of all animal species are arthropods: the group that includes insects, crabs and spiders. From their rapid radiation over 550 million years ago, they evolved to fill almost every habitat and exploit most imaginable lifestyles. Today, arthropods underpin virtually all ecological communities and food webs. They are of immense economic and medical importance to humans: as sources of food, crop pests and vectors of disease.

To understand the biodiversity of arthropods, to investigate the mechanisms by which they evolved, and to plan for their conservation, it is vitally important that we have a clear picture of their evolutionary relationships. There are many thousands of published evolutionary trees for particular arthropod groups at a shallow level (e.g., species within families) as well as many that attempt to resolve the more ancient branching events. These published trees represent an enormously rich resource, but one that largely remains locked within the pages of journals. This project will digitise 5,000 or more trees from across the arthropods, and make them available online to all researchers.

Unfortunately, there are serious difficulties when researchers try to compare published trees: partly because they are derived from many different types of data (anatomy, molecules, genomes and fossils) and partly because they are analysed in an even greater variety of ways. More problematically, they often imply contradictory patterns of evolution. How, then, can we bring all of this information together to yield the giant, all-inclusive trees that evolutionary biologists and conservationists need, and do so without cherry-picking the data? Supertree methods are presently the most tractable approach, resolving conflict and finding overlap between the source trees using objective and repeatable rules. Such approaches have yielded the largest trees ever published.

Unfortunately, again, the construction of supertrees is presently very time-consuming and labour-intensive. Moreover, once constructed, it is extremely difficult or impossible to add new trees, to sub-sample the data (e.g., molecules or morphology), or to generate supertrees using different methods. Another core objective of this project is therefore to develop a set of software tools that will largely automate the process, providing inexperienced users with the ability to construct a supertree for any arthropod group at any taxonomic level (e.g., species, genera, families, etc.), and using multiple filtering criteria (e.g., only the most robust or recent source trees). We will then embed these tools in the website containing our data.

Existing, fast supertree methods are not without their problems, and another key objective of the project will therefore be to realise and program novel approaches (new Quartet Joining, Maximum Likelihood, Conservative and Bayesian methods are all under development by members of the team and our collaborators). The properties of these new methods need characterisation, and our arthropod dataset will offer the perfect test case against which to benchmark their performance.

We will then use our supertrees to ask a range of important questions in the study of arthropod biodiversity. Which evolutionary relationships are well-understood, and which are most uncertain and in need of further research? Which arthropod groups have an evolutionary branching sequence that matches the order in which they appear as fossils (such groups are useful for calibrating 'molecular clocks')? Is there a relationship between the age of arthropod groups and their present day diversity? We will also explore the utility of supertrees for addressing conservation priorities. Species that are alone on isolated branches of the supertree have greater than average 'evolutionary distinctiveness'. Where these are also imminently endangered, a powerful case can be mounted to prioritise their preservation.

Technical Summary

1. We will construct the largest ever supertrees of arthropods by synthesising 5,000+ peer-reviewed cladograms from the literature. The Researcher Co-I and Data Clerk will archive these in Newick format along with rich metadata in XML (character type, analysis type, branch support measures and complete bibliographic information) that will add significant value. A SynTax-funded prototype and proof of principle for crabs is already online.
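
As a rough illustration of the archiving step described above, the following sketch pairs a Newick string with an XML metadata record. The element names (source_tree, character_type, analysis_type, citation, taxa, taxon), the example topology and the placeholder citation are assumptions made for illustration only; they are not the project's actual schema or data. The label extraction assumes simple, unquoted Newick labels.

```python
import re
import xml.etree.ElementTree as ET


def leaf_labels(newick: str) -> list[str]:
    """Return tip labels in order of appearance (unquoted labels assumed)."""
    # A tip label directly follows "(" or ","; internal node labels follow ")".
    pattern = r"[(,]\s*([^(),:;\s][^(),:;]*)"
    return [label.strip() for label in re.findall(pattern, newick)]


def source_tree_record(newick: str, character_type: str,
                       analysis_type: str, citation: str) -> ET.Element:
    """Bundle one source tree with its metadata (hypothetical element names)."""
    record = ET.Element("source_tree")
    fields = (
        ("newick", newick),
        ("character_type", character_type),
        ("analysis_type", analysis_type),
        ("citation", citation),
    )
    for tag, text in fields:
        ET.SubElement(record, tag).text = text
    taxa = ET.SubElement(record, "taxa")
    for label in leaf_labels(newick):
        ET.SubElement(taxa, "taxon").text = label
    return record


# Illustrative tree only: the topology and branch lengths are made up.
example_newick = "((Cancer_pagurus:1.0,Carcinus_maenas:1.0):0.5,Uca_pugnax:1.5);"
record = source_tree_record(
    example_newick,
    character_type="molecular",
    analysis_type="maximum likelihood",
    citation="placeholder citation (hypothetical)",
)
print(ET.tostring(record, encoding="unicode"))
```

Storing the taxon list alongside the Newick string is one possible design choice: it would let taxa be searched without re-parsing every tree.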

2. We will develop and implement new supertree algorithms, including quartet joining, conservative, maximum likelihood and Bayesian methods. These will be incorporated into new versions of the open-source Supertree Toolkit (STK) alongside MRP variants, making it the most versatile supertree software available. We will also include tools to test for adequate overlap, a prerequisite for efficient analyses. We will additionally incorporate 'taxonomic awareness', enabling trees to be produced at various hierarchical levels with no recoding of the source trees. We will also explore and program measures of supertree support, of congruence/conflict between data partitions, and of congruence between our trees and stratigraphic data. The arthropod case-study will be used for benchmarking.
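
One simple way to think about the overlap test mentioned above is as a connectivity check: source trees that share too few taxa with the rest of the dataset cannot be placed relative to it. The sketch below groups source trees into connected components, linking two trees when they share at least min_shared taxa. The function name, the threshold of two shared taxa and the example taxon sets are assumptions for illustration; the summary does not state which criterion the STK applies.

```python
def overlap_components(taxon_sets: list[set[str]], min_shared: int = 2) -> list[list[int]]:
    """Group source trees (given as taxon sets) into overlap-connected components.

    Two trees are linked when they share at least `min_shared` taxa.
    Returns sorted lists of tree indices, one list per component.
    """
    unvisited = set(range(len(taxon_sets)))
    components = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            # Trees not yet assigned that overlap sufficiently with tree i.
            linked = {
                j for j in unvisited
                if len(taxon_sets[i] & taxon_sets[j]) >= min_shared
            }
            unvisited -= linked
            component |= linked
            stack.extend(linked)
        components.append(sorted(component))
    return sorted(components)


# Hypothetical taxon sets for four source trees.
trees = [
    {"A", "B", "C"},
    {"B", "C", "D"},
    {"X", "Y", "Z"},
    {"D", "X", "Y"},
]
print(overlap_components(trees, min_shared=2))  # [[0, 1], [2, 3]]
print(overlap_components(trees, min_shared=1))  # [[0, 1, 2, 3]]
```

More than one component at the chosen threshold would indicate that the dataset needs additional linking trees, or must be analysed in separate parts.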

3. All data and tools will be embedded online, and linked to analytical software written in Python and released under GNU GPL. A user-friendly GUI will enable anyone to produce supertrees easily but rigorously from any sub-sample of the data, and by multiple methods. Users will also be able to upload their own trees and metadata, enabling our resource to grow organically.

4. We will conduct pilot studies for several focal clades (crabs, crayfish, bumblebees, dung beetles and butterflies). Specifically, we will collaborate with conservationists (Ben Collen and Richard Grenyer) to identify EDGE species, and to investigate the relationship between measures of phylogenetic spread and biogeographical distribution.
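
To make the idea of EDGE species concrete, the sketch below computes evolutionary distinctiveness (ED) with the 'fair proportion' approach, in which each branch length is shared equally among the tips descended from it, and then combines ED with a threat weight using the commonly cited form EDGE = ln(1 + ED) + GE x ln(2). This is an illustrative sketch rather than the project's method: it assumes a tree with branch lengths (which supertrees often lack unless dated), a hypothetical parent-map representation, and GE as an integer threat weight (for example 0 for Least Concern up to 4 for Critically Endangered).

```python
import math


def evolutionary_distinctiveness(parent: dict[str, str],
                                 length: dict[str, float]) -> dict[str, float]:
    """Fair-proportion ED for every tip.

    parent: maps each non-root node to its parent node.
    length: maps each non-root node to the length of the branch above it.
    """
    internal = set(parent.values())
    tips = [node for node in parent if node not in internal]

    # Count how many tips descend from (or are) each non-root node.
    tips_below: dict[str, int] = {}
    for tip in tips:
        node = tip
        while node in parent:
            tips_below[node] = tips_below.get(node, 0) + 1
            node = parent[node]

    # Each tip receives an equal share of every branch on its path to the root.
    ed = {}
    for tip in tips:
        total, node = 0.0, tip
        while node in parent:
            total += length[node] / tips_below[node]
            node = parent[node]
        ed[tip] = total
    return ed


def edge_score(ed: float, ge: int) -> float:
    """EDGE = ln(1 + ED) + GE * ln(2), with GE an assumed threat weight."""
    return math.log(1 + ed) + ge * math.log(2)


# Made-up tree ((A:1,B:1)X:2,C:3)root, for illustration only.
parent = {"A": "X", "B": "X", "X": "root", "C": "root"}
length = {"A": 1.0, "B": 1.0, "X": 2.0, "C": 3.0}
ed = evolutionary_distinctiveness(parent, length)
print(ed)  # {'A': 2.0, 'B': 2.0, 'C': 3.0}
```

In this toy tree, C sits alone on a long branch and so has the highest ED; the ED values sum to the total branch length of the tree (7.0), which is a useful sanity check for the fair-proportion calculation.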

Planned Impact

Academic Impact

This project stands as a proof of principle for managing, curating and maximising the impact of a much larger database of published trees than assembled hitherto, along with its associated metadata. The project entails the development and implementation of important new methods for supertree construction, with applications for evolutionary biology, ecology, behavioural science and conservation. It will create a lasting legacy for the wider academic community in the form of the revised Supertree Toolkit (STK) and its associated website. The latter will comprise data, software and in-built data processing capabilities, all of which will benefit the wider biological community in future projects.
All of the new quartet joining, conservative, Bayesian and likelihood algorithms within the updated Supertree Toolkit will be released under an open source license, enabling other theoreticians and programmers to build upon its functionality. The front-end of the Toolkit will be easy for any researcher to use, and we envisage a lasting legacy from its redeployment on other groups of organisms.

This project is keenly supported by researchers on all major arthropod clades, for whom our resources will offer a comprehensive synthesis of the state of published knowledge. It will also highlight where disparate sources of data concur, and where there is significant conflict necessitating further research. Collaborative links have already been established with: Jonathan Coddington, Smithsonian (Arachnida); Jason Dunlop, Museum für Naturkunde (Chelicerata); Bill Shear, Hampden-Sydney College VA (Chilopoda); Adam Slipinski, CSIRO (Coleoptera); Geoff Boxshall, NHM (Copepoda and all Crustacea); Keith Crandall, Brigham Young (Decapoda); Greg Edgecombe, NHM (Diplopoda); Rudolf Meier, University of Singapore (Diptera); Richard Brusca (Isopoda); Stefan Richter, University of Rostock (Malacostraca); John Trueman (Odonata); Darren Mann, Oxford (Scarabaeida).
This project will generate robust supertrees for use in secondary analyses by conservationists, ecologists, ethologists and evolutionary biologists. More importantly, these workers will be able to produce their own trees using any desired data filtering and processing criteria, as well as using powerful new supertree methods. We have established links with Richard Grenyer (Geography, Oxford) and Ben Collen (Head of Indicators and Assessment Unit, ZSL) in order to design our resources with this objective in view.

Economic and Societal Impact

Conservative estimates of the economic costs of biodiversity loss are around £40 billion per annum, although these figures are not currently included in estimates of GDP. An equivalent loss of 7% of GDP is predicted by 2050 if current rates continue. Approaches to conservation that simply count species are crude; the additional information imparted by large phylogenies allows evolutionary distinctiveness to be factored into policy-making decisions. This project will hugely simplify the synthesis of existing phylogenetic information for all groups by providing new methods and tools. It will specifically and immediately enhance our understanding of the biodiversity of arthropods, a clade containing 80% of all animal species. If our resource helps to slow the decline by just one thousandth of one percent over the next ten years (a modest claim), its value might conservatively be placed at £4 million.
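
The £4 million figure can be reproduced with a short calculation. The sketch below assumes one reading of the claim: that the annual cost stays constant at £40 billion for ten years, and that the resource avoids one thousandth of one percent of the cumulative cost. Other interpretations of 'slowing the decline' would give different numbers.

```python
ANNUAL_COST_GBP = 40_000_000_000  # £40 billion per annum (figure from the text)
YEARS = 10

# One thousandth of one percent = 1/1000 * 1/100 = 1/100,000.
FRACTION_DENOMINATOR = 1000 * 100

# Assumption: constant annual cost over the ten-year period.
cumulative_cost_gbp = ANNUAL_COST_GBP * YEARS
value_gbp = cumulative_cost_gbp // FRACTION_DENOMINATOR

print(f"£{value_gbp:,}")  # £4,000,000
```
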
Public interest in biodiversity loss is enormous. The scale of this project, and the sheer size and inclusiveness of the trees that we will generate will make our work of great public interest. By adding an accessible 'public front end' to our website (linked to 'ARKive' images and species notes), we will improve public understanding of phylogeny and evolution, and raise awareness of the importance of, and applications for, systematics in general. This is of vital importance at a time when teaching of the discipline is declining.


Akanni WA (2015) Horizontal gene flow from Eubacteria to Archaebacteria and what it means for our understanding of eukaryogenesis. in Philosophical transactions of the Royal Society of London. Series B, Biological sciences

Haggerty LS (2014) A pluralistic account of homology: adapting the models to the data. in Molecular biology and evolution

Wilkinson M (2016) Comments on detecting rogue taxa using RogueNaRok in Systematics and Biodiversity

Description We have implemented the loose supertree method.
We have implemented and published a simple Maximum Likelihood supertree method and developed and implemented associated statistical tests of inferred trees.
We have implemented and published a Bayesian supertree method and applied it to some high-profile case studies.
We have developed, implemented and published methods for identifying ineffective overlap and rogue taxa in input trees and phylogenomic data sets.
We have addressed through experiment alternative approaches to incorporating previously unsampled taxa into phylogenies.
Exploitation Route We have developed general tools that can be used by biologists needing to build phylogenies, focussed on issues of accuracy and efficiency.
Sectors Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology
Description Our findings have only recently been published. We have made use of them to investigate patterns of horizontal gene transfer in the Eubacteria and Archaea.
First Year Of Impact 2016
Sector Education
Impact Types Cultural
Title Concatabominations 
Description This software implements an approach to detecting rogue taxa and ineffective overlap in phylogenetic and phylogenomic data sets.
Type Of Technology Software 
Year Produced 2015 
Impact This work is generating a lot of interest and has led to several seminar invitations. 
Title LUs.t. 
Description This is an implementation of a Maximum Likelihood supertree method 
Type Of Technology Software 
Year Produced 2014 
Impact Proof of concept 
Description Effective Overlap talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Scientific seminars on rogue taxa and effective overlap in phylogenetic and phylogenomic studies. These have been given at the University of Greifswald Phylogenetics Meeting (2014), the University of Frankfurt (2015), the Museum Alexander Koenig, Bonn (2015), the Systematics Association Biennial at Oxford (2016), an EMBO short course in Phylogenomics in Iquitos, Peru (2016) and the University of Michigan (2016).
Year(s) Of Engagement Activity 2014, 2015, 2016