Mapping complex biological processes across the landscape: the problem of non-stationarity

Lead Research Organisation: Rothamsted Research
Department Name: Computational & Systems Biology

Abstract

Many important processes happen in the landscape. For example, certain soil bacteria, denitrifiers, use nitrate in their respiration. One product of this is nitrous oxide, a gas with 270 times the impact of carbon dioxide on global warming. Management of farmland affects how much nitrous oxide is released. We must understand and measure these processes, but managed soils and vegetation are extremely variable. They vary at fine spatial scales (e.g. between adjacent clods of an apparently uniform soil) and at coarser scales (e.g. between the bottom of a slope and the top of a hill). Yet we must make predictions of variables in this complex system from relatively few observations.

Scientists do this with methods called geostatistics. We assume that a variable over a region has arisen from a random process. We know that this is not true, of course, but we often make such assumptions: the number on a thrown die depends on Newton's laws of motion, but we can treat it as a random variable, here a number from 1 to 6 that cannot be predicted in advance, with each value equally probable. A die will not behave this well if it is not a perfect, uniform cube. We can estimate a better model for a real die if we throw it many times and record the numbers that turn up. In the geostatistical picture of variation, we think of our data, obtained at a set of sites, as numbers obtained by throwing a die. The only complication of the model is that the numbers on two dice thrown at sites near to each other are more likely to be similar than those on two dice at sites that are further apart. This is called spatial dependence, and geostatistics requires that we can describe it. This is not simple: for any two sites a and b we have only one observation at each, and from this we can make no statements about their joint variation. The solution is to assume that the variation between two observations of a variable separated by some distance (e.g. 10 m) in one part of the landscape, and the variation between another two observations 10 m apart elsewhere, are, as it were, duplicate information about the variability in space. In this way we build up a model of the spatial dependence, called the variogram, but it depends on this assumption of an underlying process for which the variation between two sites depends only on how far apart they are, not on where they are. This is called stationarity of the variance.

This assumption is often unrealistic in the landscape. The variability of a process like denitrification in a wet low-lying area with peaty patches in the soil will be much greater than in a well-drained, cultivated arable field. The variability of soil pH over short distances may be larger in mixed sediments at the bottom of a slope than on an old eroded land surface. The aim of this project is to develop geostatistical methods to deal with such variation. Our idea is to treat the spatial dependence as a mathematical function of spatial position, adding some extra parameters to the model of spatial dependence. We believe that, while this won't remove assumptions from our analysis, the assumptions will be more plausible than stationarity.

As well as developing new models of spatial dependence, we shall develop exploratory methods to decide when more complex spatial models are needed, and to help design them for particular problems. We shall also investigate how our scientific knowledge of processes, described mathematically, can be used to help predict them when the variation is complex. We shall test and demonstrate these methods using new data on the rates of nitrous oxide emissions from soils in complex farmed landscapes with many very different land uses. This important variable will vary in a complex way, so that stationarity cannot be assumed with confidence. If successful, this project will provide tools to study and predict many different complex variables such as soil biodiversity and pollution.
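The pooling idea described in the abstract, in which every pair of observations a similar distance apart is treated as duplicate information, can be illustrated with a minimal sketch of the standard method-of-moments variogram estimator. This is not the project's own code: the function name `empirical_variogram`, the distance bins, and the four-point transect are hypothetical and chosen only for illustration.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Method-of-moments semivariance for each distance bin.

    Assumes stationarity: pairs are pooled by separation distance only,
    regardless of where in the region they lie.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Indices of every distinct pair of sites (i < j).
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            # Semivariance: half the mean squared difference in this bin.
            gamma[k] = 0.5 * sqdiff[in_bin].mean()
    return gamma, counts

# Hypothetical transect: four sites 10 m apart with made-up values.
coords = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0], [30.0, 0.0]])
values = np.array([1.0, 2.0, 4.0, 7.0])
bin_edges = np.array([5.0, 15.0, 25.0, 35.0])
gamma, counts = empirical_variogram(coords, values, bin_edges)
```

In this toy example the three pairs 10 m apart are pooled into the first bin even though they come from different parts of the transect. That pooling is exactly the stationarity assumption that the project questions.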

Technical Summary

We must often estimate properties of managed landscapes, which are complex and variable, e.g. greenhouse gas emissions or soil degradation. This may be done with geostatistics, but the attendant assumption of an underlying random process, that is stationary in the variance, is implausible in complex landscapes where the variability is likely to change in space. In consequence, geostatistical estimates will be suboptimal, measures of uncertainty will be unreliable, and our sampling will be inefficient. Existing solutions to this problem are either specific to spatio-temporal data and assume weak correlation in time, which is not plausible for land-based variables, or require arbitrary subdivision of the study area. We hypothesize that non-stationarity can be handled by a linear mixed model (LMM) for the variable of interest, with additional terms in the variance model that describe non-stationary aspects of the variation. These terms can be estimated by residual maximum likelihood (REML). Some preliminary analyses suggest that this approach is plausible, and it is also supported by recent work on modelling anisotropy (directional dependence) by REML. In addition to this core approach, we shall develop exploratory data analyses to identify components of the variance model where non-stationarity may be an issue. We also hypothesize that a mechanistic process model can be used to deal with non-stationarity; for example, by incorporating model predictions into the geostatistical LMM leaving an unexplained component of variation which may plausibly be treated as an outcome of a stationary process. We shall test our hypotheses using data on nitrous oxide emission rates from soils in complex managed landscapes with varied land uses and quantify the benefits of using our new method. The resulting methodology will be invaluable for the study and evaluation of key variables in the managed environment from pollutants to biodiversity.
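One way to picture "additional terms in the variance model" is sketched below. The source does not give the project's parameterisation, so this is an assumption: a local standard deviation that varies log-linearly with the x-coordinate scales a stationary exponential correlation function. The names `nonstationary_cov`, `neg_reml_loglik`, `beta0`, `beta1` and `range_m` are hypothetical. The second function evaluates the standard negative residual log-likelihood (without its additive constant) for a given covariance matrix, which a numerical optimiser could then minimise over the variance parameters.

```python
import numpy as np

def nonstationary_cov(coords, beta0, beta1, range_m):
    """Covariance with a spatially varying standard deviation (assumed form)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    corr = np.exp(-d / range_m)                # stationary exponential correlation
    sd = np.exp(beta0 + beta1 * coords[:, 0])  # local sd changes with x (assumption)
    # Scaling a valid correlation matrix by sd on both sides keeps it valid.
    return sd[:, None] * corr * sd[None, :]

def neg_reml_loglik(y, X, V):
    """Negative residual log-likelihood, dropping the additive constant."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    XtVinvX = X.T @ Vinv_X
    # Generalised least squares estimate of the fixed effects.
    beta_hat = np.linalg.solve(XtVinvX, X.T @ Vinv_y)
    resid = y - X @ beta_hat
    quad = resid @ np.linalg.solve(V, resid)
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    return 0.5 * (logdet_V + logdet_XVX + quad)

# Hypothetical sites on a 30 m transect with made-up observations.
coords = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0], [30.0, 0.0]])
y = np.array([1.0, 2.0, 4.0, 7.0])
X = np.ones((4, 1))                            # constant mean as the only fixed effect
V = nonstationary_cov(coords, beta0=0.0, beta1=0.01, range_m=20.0)
nll = neg_reml_loglik(y, X, V)
```

Setting `beta1 = 0` recovers an ordinary stationary model, so comparing the fitted likelihoods with and without that term is one simple way to ask whether the extra parameter is warranted.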
 
Description The aim of this project was to show how the linear mixed model, a statistical model used to analyse spatially distributed data, can be extended to allow it to function with more realistic assumptions about how processes vary in space. We developed novel statistical methodology and demonstrated its value on data which show how the nitrous oxide emission potential of soil varies across a complex landscape. The project showed how better statistical modelling of such data is possible. It also gave insight into the soil properties which determine the variation of nitrous oxide emission from soil at landscape scale.
Exploitation Route The general methodology of non-stationary linear mixed models has been taken forward. RML uses it at the British Geological Survey, where a PhD student is applying it to understand the factors which drive the uncertainty of 3D geological models.
Sectors Agriculture, Food and Drink; Energy; Environment
 
Description The findings on nitrous oxide emission rates led to our involvement in the current Defra project to establish a platform for evaluating the greenhouse gas emission budgets for UK agriculture. The methodology for non-stationary linear mixed models is applied at the British Geological Survey to understand how the uncertainty of 3D geological models varies in space.
First Year Of Impact 2010
Sector Agriculture, Food and Drink; Environment
Impact Types Policy & public services
 
Description National Greenhouse Gas Platform
Policy Influence Type Participation in a national consultation
Impact No actual impacts realised to date
 
Description Royal Society Discussion meeting
Policy Influence Type Participation in advisory committee
Impact Final report submitted to Defra.
 
Description Greenhouse Gas Inventory Platform
Amount £225,435 (GBP)
Funding ID AC0114 
Organisation Government of the UK 
Department Department For Environment, Food And Rural Affairs (DEFRA)
Sector Public
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 04/2011 
End 03/2016