Census 2022: Transforming Small Area Socio-Economic Indicators through 'Big Data'

Lead Research Organisation: University of Southampton
Department Name: Faculty of Engineering & the Environment


The potentially imminent demise of the decennial UK Census presents social, policy and commercial researchers with both a challenge and an opportunity.

The challenge is to transform ‘census-taking’ by finding robust alternative methods for creating traditional (‘census-like’) small area socio-economic indicators over time. The opportunity is to transform the very nature of the socio-economic indicators themselves (‘census-plus’ indicators) using new analytic methods applied to new geo-coded datasets and to radically accelerate the temporal cycle from decennial to annual or sub-annual production. If we are no longer to be restricted to what can be asked in a Census, then what kinds of social indicators might we want or be able to produce, how regularly and what forms of novel social, policy and commercial analysis might this then underpin?

The project will address these questions using existing large-scale geo-coded transactional datasets, including household level energy monitoring data, held at the University of Southampton. It will test the utility of a range of analytic techniques for deriving traditional and novel small area socio-economic indicators including relatively simple ratios and distribution measures as well as more experimental temporal sequence and profile analysis.

Planned Impact

A key objective of this project is to support the onward exploitation of the results by appropriate stakeholders. We do not expect to produce robust UK-wide small area indicators but to develop a range of validated approaches to the production of such indicators. Exploitable results will therefore include insights, know-how, data analysis tools and estimation algorithms rather than complete datasets.

Our potential end users include the following key sectors:

Local and national authorities with an interest in small area socio-economic indicators and local area trend/intervention analysis, who may be able to apply our results to innovative evidence-based policy-making and to influencing public policies and legislation at the local, regional and national level;

Commercial data owners and analysts who may hold transactional datasets of potential future value, or who may be interested in applying the analytic techniques to commercial market research, thus potentially supporting the commercialisation and exploitation of knowledge, leading to spin-out companies and the creation of new processes, products and services;

Commercial data aggregators and analysts who will be able to use the results to improve or innovate in their analytic methods and commercial products, thus enhancing the research capacity, knowledge and skills of businesses and organisations;

Local and national organisations with an interest in novel 'sustainable consumption' indicators who can use the results as an input to policy targeting and/or intervention analyses thus potentially contributing to environmental sustainability, protection and impact reduction.

Our strategy for ensuring maximum exploitation of the project results concentrates on raising awareness of our work, engaging and collaborating with potential end users and supporting them in any exploitation.

We will raise awareness of our work with research users through three main channels: the project website, which will be regularly updated with summary results to be highlighted in our other communications; the project Stakeholder Group (SG), which will be carefully selected not only to represent appropriate intra- and inter-University interests but also to represent key research users; and, predominantly, social media (especially Twitter and LinkedIn) together with targeted 'news' articles through the Faculty and University publicity channels.

We will seek to engage future research users in our work firstly through the activities of the Stakeholder Group, secondly through the provision of tailored ad-hoc presentations on our results and thirdly through the rapid publication of project results and data via the project website. Whilst there is insufficient scope within the project's resources to carry out bespoke, user-specified analysis, we anticipate that this Open Research approach will lead to a range of potential future collaborations, and we are already engaged in a number of discussions with potential partners. In addition, our graduate student research project scheme will develop skilled analytic capacity in a cohort of future 'data scientists'.

The rate of impact could be rapid in the context of exploitation by the ONS Beyond 2011 project team in its ongoing study of options for Census 2021, in the analysis of novel local sustainable consumption indicators and in the generation of skilled graduate student capacity. However, other impacts are likely to be longer term, with the uptake of new methods and approaches requiring ongoing collaboration or 'people transfer' with or to the relevant sectors.
Description This small feasibility study explored the value of a number of existing large-scale transactional datasets in the estimation of small area socio-economic indicators as a potential supplement to future census-taking. The study reviewed and sought access to a number of data sources before using a small-scale University of Southampton (UoS) electricity consumption dataset and a large-scale Irish Smart Meter Trial (CER) dataset.
Our key findings include:
Working with pre-aggregated half-hourly transactional data (e.g. the CER data) for monthly samples is relatively straightforward using standard analytic tools (STATA, R, desktop PC/laptop). However, cleaning and aggregating finer-grained temporal data (e.g. the UoS 1-second data) to the monthly level, even for small samples of households (c. 100), required the University's High Performance Computer (Iridis4). Analysts should therefore not overestimate the value of using such data in its raw form, and should weigh the potential value of working rigorously with suitable aggregates or samples;
Multi-level regression modeling approaches proved valuable in determining the best co-variates to use in estimating 'census-like' characteristics at the household level where multiple observations on the same household were available;
K-means clustering proved a useful way to classify households according to their 24-hour half-hourly consumption profiles, while autocorrelation methods proved a useful way to create an indicator of 'habitualness' for both weekdays and weekends. Whilst the intention was to use these in the estimation of census-like household characteristics (see below), they proved interesting in their own right as a new form of social indicator;
The number of persons and of children in a household could be inferred relatively robustly from half-hourly electricity consumption data of the kind likely to be available from smart meters. In each case the correct classification rate was around 73%, and the key predictors were mean morning consumption and overall mean consumption. However, a measure of the ratio between evening and daytime consumption (the evening consumption factor, ECF) and cluster membership based on k-means clustering of 24-hour consumption profiles also added explanatory power;
It proved possible to classify households by employed vs non-employed household 'heads' using weekday half-hourly electricity consumption data, with a correct classification rate of c. 64%. When the numbers of persons and of children were included as covariates, the correct classification rate rose to 70%;
It proved difficult to classify households into income bands. Although mean and baseline (01:00-05:00) mean electricity consumption, maximum half-hourly consumption, cluster membership and measures of autocorrelation were all significant predictors, classification tests performed no better than random;
It did not prove possible to infer the floor area of the dwelling from electricity consumption data except to the extent that it correlated with the number of persons in the household;
Finally, it proved relatively straightforward to determine the probability of household occupancy at a given point in time, as a resource for planning survey fieldwork and census enumeration at both the dwelling and area levels.
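Several household-level indicators named in the findings above (the 01:00-05:00 baseline mean, the evening consumption factor, autocorrelation-based 'habitualness' and an occupancy probability) can be sketched in a few lines of Python. This is a minimal illustration, not the project's STATA/R code. It assumes each household's data is a list of days, each a list of 48 half-hourly kWh readings starting at midnight. The ECF windows (evening 16:00-20:00, daytime 09:00-16:00) and the occupancy rule (a half-hour counts as 'occupied' when use exceeds 1.5 times the baseline mean) are hypothetical choices, since the report does not define them.

```python
from statistics import mean

SLOTS_PER_DAY = 48  # half-hourly readings; slot 0 covers 00:00-00:30


def slots(start_hour, end_hour):
    """Half-hour slot indices covering the clock window [start_hour, end_hour)."""
    return range(int(start_hour * 2), int(end_hour * 2))


def window_mean(days, start_hour, end_hour):
    """Mean kWh per half-hour inside a clock window, pooled over all days."""
    return mean(day[s] for day in days for s in slots(start_hour, end_hour))


def baseline_mean(days):
    """Mean night-time (01:00-05:00) consumption."""
    return window_mean(days, 1, 5)


def evening_consumption_factor(days, evening=(16, 20), daytime=(9, 16)):
    """Ratio of evening to daytime mean consumption.
    HYPOTHETICAL windows: the report does not state the hours used for ECF."""
    return window_mean(days, *evening) / window_mean(days, *daytime)


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag
    (undefined, and raises ZeroDivisionError, for a constant series)."""
    m = mean(series)
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(len(dev) - lag))
    return num / denom


def habitualness(days):
    """Lag-one-day autocorrelation: how closely each half-hour matches the
    same half-hour on the previous day in the list."""
    series = [x for day in days for x in day]
    return autocorrelation(series, SLOTS_PER_DAY)


def occupancy_probability(days, factor=1.5):
    """Share of days on which each half-hour looks 'actively occupied'.
    HYPOTHETICAL rule: use above factor * the household's baseline mean."""
    threshold = factor * baseline_mean(days)
    return [mean(1.0 if day[s] > threshold else 0.0 for day in days)
            for s in range(SLOTS_PER_DAY)]
```

A lag of one day (48 half-hours) compares each reading with the same time on the previous day, so values near 1 suggest a highly repetitive daily routine. In this sketch, passing only weekdays or only weekend days compares each day with the previous day in that filtered list, which is one simple way to produce separate weekday and weekend indicators.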
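Classifying households by their 24-hour profiles, as in the k-means finding above, can be illustrated with a plain implementation of Lloyd's k-means over 48-value mean daily profiles. The project's clusters were produced in R, so this Python version is only a sketch. It uses deterministic farthest-first seeding, an assumption made here for reproducibility and not necessarily what the project did; R's `kmeans()` instead picks random initial centres by default.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def nearest(profile, centroids):
    """Index of the centroid closest to the profile."""
    return min(range(len(centroids)), key=lambda j: sq_dist(profile, centroids[j]))


def farthest_first(profiles, k):
    """Deterministic seeding: start from the first profile, then repeatedly
    add the profile farthest from its nearest already-chosen centroid."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(profiles, k, iterations=100):
    """Lloyd's k-means; returns (cluster label per profile, centroids)."""
    centroids = farthest_first(profiles, k)
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p, centroids) for p in profiles]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = centroid(members)
    return labels, centroids
```

Whether to cluster raw kWh profiles or profiles normalised by each household's daily total is a design choice: raw profiles mix the level and the timing of use, while normalised profiles group households by the shape of their day alone.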
Exploitation Route The findings are already being taken forward by four key stakeholders:
1. The Office for National Statistics (ONS), which was involved in the project's advisory group and funded a parallel but independent study, has used project results in its recent study of the value of smart meter data in generating national statistics. This study is still ongoing, and members of the project team have continued to advise the ONS;
2. Members of the Census & Geodemographics Group of the Market Research Society, who were also involved in the project's advisory group, have been keen to arrange further meetings and presentations of results in order to understand the potential of such data for their respective businesses;
3. At least one local distribution network operator (DNO) has expressed interest in the 24-hour classification and autoregression methods as ways to better understand the nature of electricity demand on their networks, especially at evening peak (winter) periods;
4. The EPSRC/ESRC-funded DEMAND End User Energy Demand Research Centre has expressed a similar interest in taking forward the autocorrelation analysis as a way to understand aspects of habitual consumption behaviour, and this will be implemented through the PI's role in DEMAND.
Sectors Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Energy,Leisure Activities, including Sports, Recreation and Tourism,Retail,Other
URL http://www.energy.soton.ac.uk/tag/census2022/
Description The results have been used to: - inform ongoing 'Big Data' research by the ONS on the practicality of using smart meter data for official statistical purposes; - inform the activities of the Census & Geodemographics Group of the Market Research Society, who were involved in the project's advisory group and are keen to arrange further meetings and presentations of results in order to understand the potential of such data for their respective businesses (one such meeting is scheduled for March 2016 alongside contributions from the ONS); - inform the ongoing customer analysis techniques of local distribution network operators (DNOs), who have expressed interest in the 24-hour classification and autoregression methods; - inform the ongoing research activities of the DEMAND (RCUK EUED Centre) and the LCNF-funded SAVE project.
First Year Of Impact 2014
Sector Energy,Government, Democracy and Justice,Retail,Other
Impact Types Societal,Economic,Policy & public services
Description Input to ONS 'Big Data' Project thinking
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
URL http://www.ons.gov.uk/ons/guide-method/development-programmes/the-ons-big-data-project/what-is-ons-d...
Title STATA code to process & analyse 1 minute electricity consumption data from UKDA 
Description STATA code to: - explore One-Minute Resolution Domestic Electricity Use Data, 2008-2009 http://discover.ukdataservice.ac.uk/catalogue?sn=6583 - compare electricity demand profiles for different kinds of households - test regression models to predict consumption from attributes 
Type Of Technology Software 
Year Produced 2014 
Impact none known 
URL https://github.com/dataknut/Census2022
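The first step described for the 1-minute UKDA data, reducing minute-level readings to half-hourly values and then to a mean 24-hour profile that can be compared across kinds of household, might look like this in Python. It is a sketch assuming the readings for one household arrive as (datetime, value) pairs; the actual file layout of the UKDA dataset is not reproduced here, the function names are hypothetical, and the example readings are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def half_hour_slot(ts):
    """Index 0-47 of the half-hour containing timestamp ts."""
    return ts.hour * 2 + ts.minute // 30


def to_half_hourly(readings):
    """Average (timestamp, value) readings into {(date, slot): mean value}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[(ts.date(), half_hour_slot(ts))].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}


def mean_daily_profile(half_hourly):
    """Mean value per half-hour slot across all days present (None if no data)."""
    by_slot = defaultdict(list)
    for (_, slot), value in half_hourly.items():
        by_slot[slot].append(value)
    return [mean(by_slot[s]) if by_slot[s] else None for s in range(48)]


# Usage example with made-up power readings (watts) for one household.
example = [
    (datetime(2008, 6, 1, 7, 0), 100.0),
    (datetime(2008, 6, 1, 7, 1), 200.0),
    (datetime(2008, 6, 1, 7, 30), 300.0),
    (datetime(2008, 6, 2, 7, 5), 50.0),
]
profile = mean_daily_profile(to_half_hourly(example))
```

Averaging suits power readings such as watts; if each value were instead energy used in that minute, each half-hour bucket should be summed rather than averaged. Half-hours with missing minutes are averaged over whatever readings are present, which is one of the cleaning decisions a real pipeline would need to make explicitly.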
Title STATA code to process & analyse CER Irish Smart Meter Data 
Description Code to: - process CER half-hourly electricity consumption data (see http://www.ucd.ie/issda/data/commissionforenergyregulationcer/) and merge to survey files - explore data (descriptive statistics) & merge cluster data created in R - calculate consumption autocorrelation coefficients 
Type Of Technology Software 
Year Produced 2015 
Impact none known 
URL https://github.com/dataknut/Census2022
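A Python sketch of the first CER processing step, parsing the half-hourly readings and merging them with survey responses by meter ID, is given below. It assumes each line holds a meter ID, a five-digit code whose first three digits are a day index (day 1 = 1 January 2009) and whose last two digits are a half-hour slot from 1 to 48, and a kWh value; this layout should be checked against the trial's ISSDA documentation. Slot codes outside 1-48 (which may occur around clock changes) are simply dropped, and survey attributes are assumed to be a dict keyed by meter ID; both are hypothetical simplifications.

```python
from datetime import date, timedelta

DAY_ONE = date(2009, 1, 1)  # ASSUMED: day code 1 = 1 January 2009


def parse_cer_line(line):
    """Parse 'meter_id DDDTT kWh' into (meter_id, date, slot 0-47, kWh).
    Returns None for half-hour codes outside 1-48."""
    meter, code, kwh = line.split()
    day_code, time_code = divmod(int(code), 100)  # DDD, TT
    if not 1 <= time_code <= 48:
        return None
    return int(meter), DAY_ONE + timedelta(days=day_code - 1), time_code - 1, float(kwh)


def merge_with_survey(lines, survey):
    """Attach survey attributes (dict keyed by meter ID) to each reading.
    Readings without a survey record are dropped, i.e. an inner join."""
    merged = []
    for line in lines:
        rec = parse_cer_line(line)
        if rec is None or rec[0] not in survey:
            continue
        meter, day, slot, kwh = rec
        merged.append({"meter": meter, "date": day, "slot": slot,
                       "kwh": kwh, **survey[meter]})
    return merged
```

An inner join keeps only households with both consumption and survey data, which is what a supervised model of household characteristics needs; an analysis of consumption alone might prefer to keep unmatched meters.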
Description MRS Census and Geodemographic Group (CGG) Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Several members of the group have since followed-up with requests for more information and have volunteered their services to the project's Advisory Group.

Several members of MRS CGG joined the project Advisory Group, and a follow-up presentation of results was requested.
Year(s) Of Engagement Activity 2013
URL http://www.energy.soton.ac.uk/census-2022-project-grabs-attention/
Description ONS research meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Discussions of research results led to a parallel ONS-funded feasibility study.

This has helped encourage the ONS to develop capacities and interests in using 'big data' for small area statistics.
Year(s) Of Engagement Activity 2014
URL http://www.ons.gov.uk/ons/guide-method/development-programmes/the-ons-big-data-project/what-is-ons-d...