ISCF WAVE 1 AGRI TECH Agronomic Big Data Analytics for improved crop management

Lead Research Organisation: University of Nottingham
Department Name: Sch of Biosciences


Agricultural systems are complex, and must be managed if we are to achieve food security and maintain environmental quality. The management of complex systems in industry and commerce is being improved by the collection, processing and analysis of "big data" sets. For some years farmers have had the potential to collect big data sets on their crops and soils using GPS-driven monitors on the combine or tractor, data from satellite-borne sensors and the direct sampling and analysis of soils. This raises the question of whether agriculture can enter the big data era in order to solve management problems more quickly and robustly than through the conventional approach of field trials at a limited number of experimental sites. We contend that this is possible, but only by using methods to analyse the data that are biologically meaningful rather than by blindly mining data for correlations. This is a feasibility study to test two tailored big-data analytical methods on a large data set on arable fields from across the U.K.

Two general approaches will be used, both of which have already been developed and published in the peer-reviewed literature, and used as research tools. The first is called boundary line analysis, a method to identify the maximum yield that a crop can achieve as a function of some soil or crop property that represents a factor (nutrient supply, canopy development) that may limit the potential yield. Boundary line analysis requires big data sets, but has the potential to give greater biological insight into the crop system, and to facilitate management decisions to remove limiting effects, than the relatively crude tools that are used in much data mining.

The second approach is focussed on the analysis of yield maps produced by yield monitors on combine harvesters equipped with GPS. These maps show complex patterns of spatial variation, which are often hard to interpret usefully. When maps for two or more seasons are overlaid, the variability is even more complex. In past research we have shown that a pattern-recognition method called k-means clustering can be used to subdivide a field into regions within which the season-to-season fluctuations in yield are more or less uniform. One region may show consistently high yields, and another consistently low yields, while others fluctuate between seasons. Such regions are likely to represent parts of the field where the crop is subject to similar limitations. For example, where the soil available water content is relatively small, yields may drop in drier years. A region with an emerging nutrient deficiency may show a steady decline in yield over a series of seasons. By relating the regionalization of the field, and each region's characteristic yield variations over time, to soil and other environmental information, we can hope to identify the key limiting factors at subfield scales. By doing these analyses on big data sets, farm and regional scale patterns should also emerge.

Within this project we shall show how a big agronomic data set can be most effectively analysed to allow the agronomy company which holds it best to advise their customers and obtain maximum value from the data that they collect. This will help to support improved management at farm scale, possibly including the use of precision agriculture methods to respond to within-field variation.

Technical Summary

Large data sets on crop yield and soil properties (as opposed to experimental data) rarely yield simple answers to conventional statistical analysis. It is unusual, for example, to find a strong relationship between crop yield and a soil property that can be expressed by a regression model. Similarly, the spatial variations of yield in a single field over successive seasons can be markedly different, with small correlations between yield in any two seasons. This means that one cannot, in general, use yield maps to segment fields into zones with consistent yield performance. For this reason we propose that, to be agronomically informative, the analysis of big data sets requires hypothesis-driven models. We will propose and test two such models, and then work with our commercial partner to integrate them into a data interpretation service.

First, we shall use boundary line analysis as a method to model limiting effects of environmental factors on crop yield. This was first proposed by Webb (1972), who suggested that the limiting response of a biological system, such as yield (y), to an environmental variable, such as a nutrient concentration (x), is seen in the upper boundary of a scatter plot of y against x. A statistical formulation of this model, which can be fitted and tested by rigorous methods, has only recently become available through research by the PI and colleagues. It will be applied in the current project to examine potentially limiting effects on yield of a range of physical and chemical soil properties.
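
The idea of reading a limiting response from the upper boundary of a scatter plot can be illustrated with a minimal sketch. It uses a simple binning heuristic (an upper percentile of y within equal-width bins of x), which is an assumption made for illustration and is not the statistical formulation developed by the PI and colleagues. The data are synthetic, and the function name boundary_points, the number of bins and the 95th-percentile choice are all hypothetical.

```python
import random
from statistics import quantiles


def boundary_points(x, y, n_bins=5, upper_pct=95):
    """Return (bin centre, upper-percentile y) pairs for equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        idx = min(int((xi - lo) / width), n_bins - 1)  # clamp the maximum x
        bins[idx].append(yi)
    points = []
    for i, ys in enumerate(bins):
        if len(ys) < 2:  # too few observations to estimate a percentile
            continue
        # n=100 gives 99 cut points; index upper_pct - 1 is that percentile
        q = quantiles(ys, n=100, method="inclusive")[upper_pct - 1]
        points.append((lo + (i + 0.5) * width, q))
    return points


# Synthetic example (invented numbers): yield potential rises with a soil
# nutrient index and then plateaus at 10 t/ha; other, unobserved factors
# reduce the realised yield below that potential.
rng = random.Random(42)
x = [rng.uniform(0.0, 40.0) for _ in range(2000)]
y = [min(10.0, 2.0 + 0.4 * xi) * rng.uniform(0.3, 1.0) for xi in x]

pts = boundary_points(x, y)
for centre, q in pts:
    print(f"nutrient index ~{centre:5.1f}: upper boundary ~{q:4.1f} t/ha")
```

In this sketch the boundary values rise with the nutrient index and then level off, even though the overall correlation between x and y is weak. A binning heuristic of this kind gives no formal test of whether a boundary exists, which is what a fitted statistical model is needed for.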

We shall address the problem of the complex spatio-temporal variation of crop yield using k-means cluster analysis of yield map sequences. The method identifies subregions of a field within which crop yields, over seasons, are more internally uniform than within the field/farm as a whole. We shall test the hypotheses that such subregions express underlying soil variation at farm scale in a way consistent with boundary line models.
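
A minimal sketch of the clustering step follows, assuming each grid cell of a field is represented by its vector of yields over successive seasons. The data are synthetic, and the helper names, the four-season layout, the choice of three clusters and the farthest-point initialisation (used here only to keep the example deterministic) are assumptions for illustration, not details of the project's implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two season-by-season yield vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def farthest_point_init(vectors, k):
    """Start from the first cell, then repeatedly add the cell farthest from all centres."""
    centres = [list(vectors[0])]
    while len(centres) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centres))
        centres.append(list(nxt))
    return centres


def kmeans(vectors, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length vectors."""
    centres = farthest_point_init(vectors, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centre
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centres[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its member cells
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Synthetic field (invented numbers): 30 cells for each of three yield
# patterns over four seasons, in t/ha, with a little noise added.
rng = random.Random(1)
patterns = {"high": [10, 10, 10, 10], "low": [6, 6, 6, 6],
            "fluctuating": [9, 5, 9, 5]}
cells = [[m + rng.uniform(-0.3, 0.3) for m in p]
         for p in patterns.values() for _ in range(30)]

labels, centres = kmeans(cells, 3)
for c, centre in enumerate(centres):
    print(c, labels.count(c), [round(m, 1) for m in centre])
```

Each cluster centre is the mean seasonal yield profile of a subregion, so it can be read directly as "consistently high", "consistently low" or "fluctuating". On real yield maps the cells would also carry coordinates, so that the cluster labels can be mapped back onto the field and compared with soil information.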

Planned Impact

This project will demonstrate the potential for focussed statistical analysis of Big Data sets from the agricultural sector using analytical models which embody clear agronomic concepts, and which will allow the identification of limiting factors on crop production at within-field to regional scales. This has the potential to improve the management of farm land, whether by identifying regional trends in limiting factors, reflecting climatic and geological constraints, or factors that apply to just part of a single field (e.g. where a potentially high-yielding subregion has developed a nutrient deficiency due to sustained high offtake-rates for nutrients in crop products and residues). As such, the methodology has the potential to support the implementation of regional strategies for advisors and agribusinesses at one spatial scale, and the use of precision farming technology for improved profitability and reduced environmental impact at another.

In order to have this impact the methodology must be applicable to a substantial volume of data representing a significant proportion of agricultural land, and must be integrated with existing workflows for data collection and for advice and support to growers. This must be achieved in a sustainable commercial framework. Because this is a catalyst project there is enormous potential to achieve this through the enhanced opportunity for the commercial partner AgSpace to offer big-data-based services to its customers. There is an existing commercial relationship between AgSpace and large agribusinesses such as BASF, Syngenta, Farmcare, IPF and Agrii. This means that the project will have an immediate reach via these land managers, advisors and commercial organizations, which directly manage or influence the management of a significant proportion of agricultural land in the UK and further afield.

These immediate impacts would be seen in land management. In addition, these big data approaches have considerable potential value for agricultural research. The agricultural research market that this project aims to access is very large in financial terms. There are two key sectors we wish to explore. First, the commercial sector is increasing research and development (R&D) budgets annually to stay ahead of the competition. For example, Syngenta invested over $1.4 billion in 2014, BASF crop production invests £215 million annually and Agrii invests over £1 million annually in agricultural R&D. These are a few examples of existing AgSpace customers that offer a clear route to market. Second, these methods have considerable potential value for publicly funded agricultural research. One particular opportunity in the public sector is the imminent revision of Defra's RB209 recommendations on soil nutrient and pH management.

