HOD: Handling missing data and time-varying confounding in causal inference for observational event history data

Lead Research Organisation: University College London
Department Name: Institute of Child Health

Abstract

In medicine it is often important to obtain valid estimates of the effects (both beneficial and detrimental) of a new treatment. To do this, we typically compare outcomes in a group of patients who received the new treatment (treatment group) with those in a group who did not (control group). The randomised controlled trial (RCT) is the gold standard for estimating treatment effects because it allocates patients to the two groups at random, which makes the groups likely to be comparable before treatment starts; for example, one group will not be systematically older or younger, or sicker or healthier, than the other.

However, RCTs are very expensive and complicated to run, and are not necessarily appropriate for answering all questions about the effects of treatment. For example, a drug may cause cancer as a side-effect, but the cancers may only appear after several years of treatment. It is then unlikely that an RCT would be maintained for long enough to detect this effect. It would therefore be very useful to measure the effects of treatments by looking only at data about patients who received the treatments as part of their normal care (through "observational studies").

However, unlike in RCTs, investigators in observational studies have no control over which treatment each patient receives, so groups of patients given different drugs may differ in other ways as well. For example, patients with more severe disease may be more likely to be given drugs that are good at improving the disease but have unpleasant side-effects. If a difference in outcome is found between the groups, it is not clear whether it arises because the groups differ in ways beyond the drugs received, or whether it was really caused by the treatment (i.e. it is a "causal effect"). One widely used method for making groups more comparable when estimating a causal effect is to calculate propensity scores. A patient's propensity score is the predicted probability of receiving a particular treatment, given that patient's characteristics at the time the treatment decision is made. Groups of patients with the same propensity score but different treatments should, on average, be comparable in all of the characteristics used to calculate the score, so any differences in outcome between them can be attributed to treatment, provided that no important characteristics affecting both treatment and outcome have been left out.
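To make the propensity score idea concrete, the sketch below estimates each patient's propensity score with a logistic regression on a single confounder and then uses inverse probability of treatment weighting (IPW), one common way of using propensity scores alongside matching and stratification, to compare outcomes. It is a minimal illustration under stated assumptions, not the method developed in this project: the toy data, the function names `fit_logistic`, `propensity_scores`, `naive_difference` and `ipw_difference`, and the use of plain gradient ascent for fitting are all hypothetical. In the toy data, severely ill patients are more likely to be treated and also have worse outcomes, and the true treatment effect is set to 1.

```python
import math
from typing import List

def expit(eta: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X: List[List[float]], t: List[int], lr: float = 0.5, n_iter: int = 5000) -> List[float]:
    """Fit logit P(T=1 | x) = b0 + b.x by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            resid = ti - expit(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X: List[List[float]], beta: List[float]) -> List[float]:
    """Predicted probability of treatment for each patient."""
    return [expit(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]

def naive_difference(y: List[float], t: List[int]) -> float:
    """Unadjusted difference in mean outcome, treated minus untreated."""
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def ipw_difference(y: List[float], t: List[int], ps: List[float]) -> float:
    """Normalised (Hajek) inverse-probability-weighted difference in mean outcome."""
    w1 = [ti / e for ti, e in zip(t, ps)]
    w0 = [(1 - ti) / (1 - e) for ti, e in zip(t, ps)]
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0

# Hypothetical toy data: x = [severe disease], 8 severe then 8 non-severe patients
X = [[1.0] for _ in range(8)] + [[0.0] for _ in range(8)]
t = [1] * 6 + [0] * 2 + [1] * 2 + [0] * 6     # severe patients are treated more often
y = [ti + 2 * xi[0] for xi, ti in zip(X, t)]  # true treatment effect 1; severity adds 2

ps = propensity_scores(X, fit_logistic(X, t))
naive = naive_difference(y, t)
ipw = ipw_difference(y, t, ps)
```

Here the naive comparison gives 2.0 because treated patients are more often severely ill, while the weighted comparison recovers a value very close to the true effect of 1. This works only because severity, the sole confounder in the toy data, is included in the propensity score model.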


The aim of this project is to extend standard methods for estimating causal treatment effects so that they can be used when important information about patient characteristics is missing and when patients' treatments change over time. Both situations are common in observational studies, so it is important to have reliable and robust ways of dealing with them. We propose a programme of methodological research to address these situations in observational studies, with a particular focus on the effect of treatments on the time to clinical events (e.g. how long a patient survives after surgery, or how soon after starting a new treatment unpleasant side-effects begin to appear). This project will provide a general framework and guidelines for practitioners who use observational data in medical research.

Technical Summary

Observational studies play an important role in evaluating treatment effects on long-term outcomes when randomised controlled trials are not feasible because of size, time, budget or ethical constraints. Because treatment is not randomised in observational studies, it is crucial to adequately control for potential confounding by both time-invariant and time-varying factors in order to estimate causal effects of treatments. There is a rich literature on controlling for confounding in observational studies, for example using standard propensity score (PS) methods. However, several important methodological issues have not been adequately addressed in the existing literature, including 1) partially missing confounder data in PS estimation; 2) sensitivity analysis for unmeasured confounding; 3) time-varying confounding; and 4) multi-state treatment and outcome processes.
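With time-varying confounding, a single baseline propensity score is not enough, because a factor such as disease severity can both influence later treatment decisions and be affected by earlier treatment. One established approach in the literature, marginal structural models fitted with inverse probability of treatment weights, handles this by multiplying interval-specific probability ratios over time: the stabilized weight at interval k is the product over j ≤ k of P(A_j | past treatment) / P(A_j | past treatment, confounder history). The sketch below computes these cumulative weights for one patient. It assumes the interval-specific treatment probabilities have already been estimated elsewhere (for example by pooled logistic regression); the function name `stabilized_weights` and the numbers in the example are hypothetical, and this is not presented as the time-varying PS method this project will develop.

```python
from typing import List

def stabilized_weights(treatment: List[int], p_num: List[float], p_den: List[float]) -> List[float]:
    """Cumulative stabilized inverse-probability-of-treatment weights for one patient.

    treatment[k]: treatment indicator A_k (0 or 1) in interval k
    p_num[k]: P(A_k = 1 | past treatment), numerator model without time-varying confounders
    p_den[k]: P(A_k = 1 | past treatment, confounder history), denominator model
    """
    if not (len(treatment) == len(p_num) == len(p_den)):
        raise ValueError("inputs must have one entry per interval")
    weights, w = [], 1.0
    for a, pn, pd in zip(treatment, p_num, p_den):
        # Probability of the treatment actually received, under each model
        num = pn if a == 1 else 1.0 - pn
        den = pd if a == 1 else 1.0 - pd
        w *= num / den
        weights.append(w)
    return weights

# Hypothetical patient: untreated in interval 0, then treated after severity worsens
sw = stabilized_weights([0, 1, 1], p_num=[0.3, 0.4, 0.8], p_den=[0.2, 0.8, 0.9])
```

Here the weights are 0.875, 0.4375 and about 0.389: the patient's treatment became much more predictable once severity is taken into account, so later intervals are down-weighted. The weights would then be used in a weighted outcome model, for example a weighted Cox or pooled logistic regression for the time to event.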


In the present application we propose a programme of methodological research to address these issues in the analysis and interpretation of data from observational studies, with a particular focus on event history data. We will develop and validate diagnostic tools for measuring the balance between treatment groups in terms of both the observed values of confounders and their missing data patterns. Using these balance diagnostics, we will provide a detailed evaluation of different missing data methods and PS methods. We will develop general Monte Carlo sensitivity analysis methods for unmeasured confounding and for non-ignorable missing data in measured confounders, for models commonly used in event history analysis. We will develop robust time-varying PS methods for estimating causal treatment effects when important time-varying confounders have missing data, and will explore a multistate framework for handling time-varying confounding in more general treatment and outcome processes in observational event history data.
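As an illustration of the kind of balance diagnostic described above, the sketch below computes the standardized mean difference (SMD), a widely used balance measure, separately for the observed values of a partially missing confounder and for its missingness indicator, so that imbalance in who is missing a measurement is checked as well as imbalance in the values themselves. This is a minimal sketch under stated assumptions, not the diagnostic tools proposed here: the function names, the toy data and the use of a weighted variance with normalised weights in the SMD denominator are hypothetical choices (conventions for the denominator vary). Weights of 1 give the unadjusted comparison; propensity score weights give the weighted one.

```python
import math
from typing import Dict, List, Optional, Tuple

def weighted_mean_var(x: List[float], w: List[float]) -> Tuple[float, float]:
    """Weighted mean and weighted variance (weights normalised to sum to 1)."""
    sw = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / sw
    v = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sw
    return m, v

def smd(x: List[float], t: List[int], w: List[float]) -> float:
    """Weighted standardized mean difference, treated (t=1) minus control (t=0)."""
    m1, v1 = weighted_mean_var([xi for xi, ti in zip(x, t) if ti == 1],
                               [wi for wi, ti in zip(w, t) if ti == 1])
    m0, v0 = weighted_mean_var([xi for xi, ti in zip(x, t) if ti == 0],
                               [wi for wi, ti in zip(w, t) if ti == 0])
    pooled_sd = math.sqrt((v1 + v0) / 2)
    if pooled_sd == 0:
        # No variation in either group: balanced only if the means agree
        return 0.0 if m1 == m0 else math.copysign(math.inf, m1 - m0)
    return (m1 - m0) / pooled_sd

def balance_with_missing(x: List[Optional[float]], t: List[int], w: List[float]) -> Dict[str, float]:
    """SMD for the observed values of x and for its missingness indicator (None = missing)."""
    obs = [i for i, xi in enumerate(x) if xi is not None]
    missing = [1.0 if xi is None else 0.0 for xi in x]
    return {
        "observed_values": smd([x[i] for i in obs], [t[i] for i in obs], [w[i] for i in obs]),
        "missingness": smd(missing, t, w),
    }

# Hypothetical confounder with one missing value, in the treated group only
res = balance_with_missing([1.0, None, 3.0, 2.0, 4.0, 6.0], t=[1, 1, 1, 0, 0, 0], w=[1.0] * 6)
```

In this toy example both the observed values (SMD about -1.48) and the missingness pattern (SMD 1.0) are imbalanced; a commonly quoted rule of thumb flags absolute SMDs above 0.1.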

Planned Impact

The immediate beneficiaries of this project will be the academic community involved in using data from observational studies to obtain causal effects of treatments, interventions or exposures.

In addition, downstream beneficiaries will be clinicians and public health policy makers who wish to make healthcare decisions based on data from observational studies. Currently, there is no consensus in the causal inference community on how best to deal with missing data in propensity score estimation, complex time-varying confounding, and unmeasured confounding in the analysis of observational event history data. The proposed project would provide a methodological framework for addressing these common problems and thus better inform healthcare decision making.

Publications


Farewell V (2017) Two-Part and Related Regression Models for Longitudinal Data in Annual Review of Statistics and Its Application
Li Q (2017) Accommodating informative dropout and death: a joint modelling approach for longitudinal and semicompeting risks data in Journal of the Royal Statistical Society: Series C (Applied Statistics)
 
Description Influenced training of practitioners or researchers - Book chapter 'Missing Confounder Data in Propensity Score Methods for Causal Inference'
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
Impact book chapter in Statistical Causal Inferences and Their Applications in Public Health Research by Springer
 
Description Introductory statistics short courses at MRC Biostatistics Unit
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
Impact Organised the new introductory statistics short courses at the MRC Biostatistics Unit
 
Description Child Health Research PhD Programme - Statistical methods for missing data, linkage error and complex confounding in causal pathway analysis using linked administrative data
Amount £56,589 (GBP)
Organisation Child Health Research Appeal Trust (CHRAT) 
Sector Charity/Non Profit
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 10/2016 
End 09/2019
 
Description ESRC DTC PhD program - Time-varying confounders and unmeasured confounders in longitudinal causal analysis of administrative data
Amount £57,000 (GBP)
Organisation Economic and Social Research Council (ESRC) 
Sector Public
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 09/2016 
End 09/2019
 
Description MASTERPLANS
Amount £164,800 (GBP)
Organisation Medical Research Council (MRC) 
Sector Academic/University
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 04/2015 
End 03/2019
 
Description Cincinnati Children's Hospital 
Organisation Cincinnati Children's Hospital Medical Center
Country United States of America 
Sector Hospitals 
PI Contribution Develop statistical methods for pattern mixture models and sensitivity analysis for non-response in two-phase sampling
Collaborator Contribution Expertise in missing data research
Impact No output yet.
Start Year 2016
 
Description Fudan University 
Organisation Fudan University
Country China, People's Republic of 
Sector Academic/University 
PI Contribution Develop methods for causal inference using electronic health records data
Collaborator Contribution Develop methods for causal inference using electronic health records data
Impact No outcome yet
Start Year 2017
 
Description Keele University 
Organisation Keele University
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution Develop methods for quantile estimation using complex survey data
Collaborator Contribution Expertise in nutritional research and growth chart estimation
Impact No outcome yet
Start Year 2015
 
Description MASTERPLANS consortium in lupus 
Organisation University of Manchester
Department Wellcome Trust Centre for Cell-Matrix Research
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Charity/Non Profit 
PI Contribution Co-investigator
Collaborator Contribution Principal Investigator of the funded consortium
Impact No outcome yet
Start Year 2015
 
Description The University of Texas at Austin 
Organisation University of Texas at Austin
Department Department of Statistics & Data Sciences
Country United States of America 
Sector Academic/University 
PI Contribution Develop methods for sensitivity analysis for analysing longitudinal data missing not at random
Collaborator Contribution Expertise in methods for handling missing data
Impact One publication in Statistics in Medicine in 2015
Start Year 2015
 
Description University College London 
Organisation University College London (UCL)
Department UCL Institute of Neurology
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution Develop multistate models for correlated processes
Collaborator Contribution Develop multistate models for correlated processes
Impact One manuscript submitted
Start Year 2016
 
Description University of Cambridge, Department of Pathology 
Organisation University of Cambridge
Department Department of Public Health and Primary Care
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution Analyse CPRD data on the disease progression of iron deficiency following oral iron treatment in UK primary care
Collaborator Contribution Expertise in nutritional research
Impact No outcome yet
Start Year 2016
 
Description University of Cambridge, Department of Primary Care and Public Health 
Organisation University of Cambridge
Department Department of Pathology
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution Develop joint models for non-ignorable missing data
Collaborator Contribution Expertise in joint models of longitudinal and time-to-event data
Impact Ongoing research
Start Year 2017
 
Description A talk in the 14th Armitage workshop in MRC Biostatistics Unit 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Li Su gave a talk on 'Two-part model for longitudinal data' at the 14th Armitage workshop at the MRC Biostatistics Unit on 17th November 2016.
Year(s) Of Engagement Activity 2016
 
Description Armitage Lectures 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Annual workshop and lecture created and hosted by the MRC Biostatistics Unit to honour the immense contributions of Professor Peter Armitage, who was at the unit from 1947 to 1961 and whose work is recognised throughout the world as achieving a successful balance between methodological rigour and applied common sense, to which all statisticians aspire.

An eminent medical statistician visits for a week and works with members of the unit. The highlight is the Armitage Lecture, which is attended by more than 100 delegates. This event raises the unit's research profile and creates new collaborations.
Year(s) Of Engagement Activity Pre-2006,2006,2007,2008,
URL https://www.mrc-bsu.cam.ac.uk/news-and-events/armitage-lectureships-and-workshops/
 
Description Article in Brown University website 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact Li Su, Senior Investigator Statistician, contributed to an alumni spotlight article on the Brown University website https://www.brown.edu/academics/public-health/biostatistics/news/2016-05/alumni-spotlight
Year(s) Of Engagement Activity 2016
 
Description CRiSM seminar at University of Warwick, Department of Statistics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Li Su gave a talk on 'Bayesian modeling of the covariance structure for irregular longitudinal data using the partial autocorrelation function' for the CRiSM seminar series at University of Warwick, Department of Statistics in January 2016.
Year(s) Of Engagement Activity 2016
 
Description Present at Farr Institute of Health Informatics Research - London, UK, July 2016 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Bo Fu was invited to give a seminar on "Causal analysis of administrative data and methodological challenges" at Farr Institute of Health Informatics Research - London, UK, July 2016
Year(s) Of Engagement Activity 2016
 
Description Present at MRC Biostatistics Unit at Cambridge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Bo Fu gave a seminar on "Causal analysis of administrative data and methodological challenges" at MRC Biostatistics Unit, Cambridge in Dec 2016
Year(s) Of Engagement Activity 2016
 
Description Present at School of Public Health, Fudan University, Shanghai, China, June 2016 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Bo Fu gave a seminar at School of Public Health, Fudan University, Shanghai, China, June 2016
Year(s) Of Engagement Activity 2016
 
Description Present at the 8th International Conference of Computational and Methodological Statistics, London, UK, Dec 2015.
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk at the 8th International Conference of Computational and Methodological Statistics, London, UK, Dec 2015.
Year(s) Of Engagement Activity 2016
 
Description Present at School of Public Health, Shanghai Jiaotong University, Shanghai, China, June 2016
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Bo Fu gave a seminar at School of Public Health, Shanghai Jiaotong University, Shanghai, China, June 2016
Year(s) Of Engagement Activity 2016