Peta-5: A National Facility for Petascale Data Intensive Computation and Analytics

Lead Research Organisation: University of Cambridge
Department Name: Physics

Abstract

The Peta-5 proposal from the University of Cambridge brings together 15 world-leading HPC system and application experts from 10 different institutions to lead the creation of a breakthrough HPC and data analytics capability that will deliver significant National impact to the UK research, industry and health sectors.

Peta-5 aims to make a significant contribution towards the establishment and sustainability of a new EPSRC Tier 2 HPC network. The Cambridge Tier 2 Centre, working in collaboration with other Tier 1, Tier 2 and Tier 3 stakeholders, aims to form a coherent, coordinated and productive National e-Infrastructure (Ne-I) ecosystem. This greatly strengthened computational research support capability will enable a significant increase in computational and data-centric research outputs, driving growth in both academic research discovery and the wider UK knowledge economy.

The Peta-5 system will be one of the largest heterogeneous data intensive HPC systems available to EPSRC research in the UK. To create the critical mass of system capability and capacity needed to make an impact at National level, Cambridge have pooled funding and equipment resources from the University, STFC DiRAC and this EPSRC Tier 2 proposal to create a total capital equipment value of £11.5M; the request to EPSRC is £5M. The University will guarantee to cover all operational costs of the system for 4 years from the service start date, with the option to run for a fifth year to be discussed. Cambridge will ensure that 80% of the EPSRC funded element of Peta-5 is deployed on EPSRC research projects: 65% will be made available free of charge to any UK EPSRC funded project through a lightweight resource allocation committee (RAC), and 15% will go to Cambridge EPSRC research. The remaining 20% will be sold to UK industry to drive the UK knowledge economy.
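The funding and allocation figures above can be checked with a short calculation. The following is a minimal sketch only: the percentages and capital values are taken from the text, while the dictionary keys, the function names and the 1000-unit total are hypothetical and chosen purely for illustration.

```python
# Shares of the EPSRC funded element of Peta-5, in percent (from the text).
# The key names are hypothetical labels, not terms used by the proposal.
EPSRC_ELEMENT_PCT = {
    "uk_epsrc_projects_free": 65,    # allocated via the resource allocation committee
    "cambridge_epsrc_research": 15,
    "uk_industry_sold": 20,
}

TOTAL_CAPITAL_M = 11.5   # total capital equipment value, in £M (from the text)
EPSRC_REQUEST_M = 5.0    # request to EPSRC, in £M (from the text)


def epsrc_research_pct(shares):
    """Percentage of the EPSRC funded element deployed on EPSRC research."""
    return shares["uk_epsrc_projects_free"] + shares["cambridge_epsrc_research"]


def split_resource(total_units, shares):
    """Split a resource total (any unit) according to percentage shares."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100%")
    return {name: total_units * pct / 100 for name, pct in shares.items()}


def epsrc_capital_pct():
    """EPSRC request as a percentage of the total capital value."""
    return EPSRC_REQUEST_M / TOTAL_CAPITAL_M * 100


if __name__ == "__main__":
    print(epsrc_research_pct(EPSRC_ELEMENT_PCT))    # 65% + 15%
    print(split_resource(1000, EPSRC_ELEMENT_PCT))  # 1000 is an illustrative total
    print(round(epsrc_capital_pct(), 1))
```

The three shares account for the whole EPSRC funded element, and the EPSRC request is a little under half of the pooled capital value.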

The Peta-5 system will be the most capable HPC system in operation in the UK when it enters service in May 2017. In total Peta-5 will provide 3 petaflops (PF) of sustained performance derived from 3 heterogeneous compute elements, 1 PF Intel x86, 1 PF Intel KNL and 1 PF NVIDIA Pascal GPU (Peta-1), connected via a Pb/s HPC fabric (Peta-2) to an extreme I/O solid state storage pool (Peta-3), a petascale data analytics (Machine Learning + Hadoop) pool (Peta-4) and a large 15 PB tiered storage solution (Peta-5), all under a single execution environment. This creates a new HPC capability in the UK specifically designed to meet the requirements of both affordable petascale simulation and data intensive workloads combined with complex data analytics. It is the combination of these features which unlocks a new generation of computational science research.

The core science justification for the Peta-5 service is based on three broad science themes: Materials Science and Computational Chemistry; Computational Engineering and Smart Cities; Health Informatics. These themes were chosen as they represent significant EPSRC research areas which demonstrate large benefit from the data intensive HPC capability of Peta-5. The service will clearly also be valuable for many other areas of heterogeneous computing and data intensive science, so a fourth, horizontal theme of "Heterogeneous - Data Intensive Science" is included. Initial theme allocation in the RAC will be: Materials 30%, Engineering 30%, Health 20%, Heterogeneous - Data Intensive 20%.
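To show what the initial theme allocation could mean as a share of the whole EPSRC funded element, the sketch below applies the theme percentages to the 65% pool allocated through the resource allocation committee. That the theme split applies to exactly this 65% pool is an assumption made here for illustration; the text states only that it is the initial theme allocation in the RAC. All identifier names are hypothetical.

```python
# Initial theme allocation in the RAC, in percent (from the text).
THEME_PCT = {
    "Materials": 30,
    "Engineering": 30,
    "Health": 20,
    "Heterogeneous - Data Intensive": 20,
}

# ASSUMPTION: the theme split applies to the 65% of the EPSRC funded
# element that is allocated free of charge through the RAC.
RAC_POOL_PCT = 65


def theme_share_of_epsrc_element(theme_pct, pool_pct):
    """Express each theme as a percentage of the whole EPSRC funded element."""
    if sum(theme_pct.values()) != 100:
        raise ValueError("theme allocation must sum to 100%")
    return {theme: pool_pct * pct / 100 for theme, pct in theme_pct.items()}


if __name__ == "__main__":
    for theme, share in theme_share_of_epsrc_element(THEME_PCT, RAC_POOL_PCT).items():
        print(f"{theme}: {share}% of the EPSRC funded element")
```

Under this assumption the per-theme shares add back up to the 65% pool, which is a quick consistency check on the stated allocation.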

The Peta-5 facility will drive research discovery and impact at national level, creating the largest and most cost-effective petascale HPC resource in the UK and bringing petascale simulation within the reach of a wide range of research projects and UK companies. Peta-5 is also the first UK HPC system specifically designed for large scale machine learning and data analytics, combining the areas of HPC and Big Data and promising to unlock both knowledge and economic benefit from the Big Data revolution.

Planned Impact

As an innovative HPC service for data intensive science, Peta-5 will impact significantly on the research communities who make use of its resources. However, in addition to the expected science outcomes (e.g. papers in high-impact, peer-reviewed journals and keynote presentations at international conferences), Peta-5 will deliver impact in a number of other key areas:

1) Peta-5 will create one of the most powerful academic supercomputer facilities in the UK.

2) Peta-5 will provide the most cost-effective petascale simulation capability in the UK, with unrivalled price-performance. This unlocks sustainable HPC for academia and industry by demonstrating affordable petascale simulation. It is a game-changing capability, widening access and opening new possibilities that were previously out of reach for many research project or company budgets.

3) Peta-5 is currently the only HPC system in the UK aimed at data intensive computing, combining state of the art extreme I/O solid state storage technologies with emerging machine learning and data analytics frameworks. This provides a new capability for tackling the largest "Big Data" problems in UK research and industry.

In particular, Peta-5 will:

1) Enable new petascale academic research projects
Cambridge will proactively seek UK academic usage of the Peta-5 system by opening the system up to UK EPSRC researchers free of charge, with strong user support, low inertia application processes and particular emphasis on new users. Cambridge are well connected to all levels of the Ne-I and, via their involvement in many existing HPC academic networks, will promote the uptake of the Peta-5 system.

2) Enable industrial use of petascale HPC capability
Cambridge have a long-established and successful industry engagement activity called CORE. CORE will proactively seek industry HPC use cases, promoting the use of HPC and advanced data analytics to drive industrial R&D.

3) Enable new extreme I/O and high performance data analytics capability
The Peta-5 architecture provides new extreme I/O capability combined with emerging machine learning and data analytics capability at a scale not available anywhere else in the UK. This will enable UK research projects and industry to develop new approaches to solving the largest "Big Data" problems addressed to date.

4) Cambridge have a specific partnership with the Alan Turing Institute (ATI) to develop novel big data analytic methods and solutions to implement on the Peta-5 system. The ATI will then help disseminate the capability and train both academic and industrial beneficiaries.

5) Enable new advances in health informatics
Peta-5 will provide the advanced data analytics technologies and data safe havens for interdisciplinary research in health informatics, linking leading EPSRC research projects in this domain with the ATI, Addenbrooke's and Genomics England (GEL). This combination of linkage and capability will result in ground-breaking health informatics capability with potential use within the clinical setting. Partners such as Addenbrooke's and GEL provide a direct route from the methods developed in interdisciplinary research on Peta-5 to patient health outcomes. Such outcomes can then be adopted nationally.
