A Deep Learning Multi-Layer Perceptron and Remote Sensing Approach For Soil Health Based Crop Yield Estimation - ScienceDirect
Highlights
• Soil health parameter estimation using simple regression techniques.
Abstract
In recent years, Deep Learning Multi-Layer Perceptron (DLMLP) neural networks have shown
remarkable success in addressing crop yield forecast related problems. The methodologies
used so far for crop yield forecast with remotely sensed data were focused upon vegetation
indices generated from optical data. The prediction of crop yield in an accurate manner by
developing robust machine learning models based on soil health parameters is crucial since it
helps keep a track of soil health as well as its impact on overall yield. This study aims to utilize
remotely sensed Microwave satellite data from Sentinel-1 and optical data from Sentinel-2,
and field data to estimate three important soil health parameters: Soil Moisture, Soil Salinity,
https://www.sciencedirect.com/science/article/pii/S1569843222001546 1/34
10/21/22, 11:07 AM A deep learning multi-layer perceptron and remote sensing approach for soil health based crop yield estimation - ScienceD…
and Soil Organic Carbon (SOC). The study has been carried out in the Rupnagar district of
Punjab in India. The estimated soil health parameters, SAR backscatter, and optical remote
sensing satellite data parameters were utilized to estimate wheat crop yield. The soil health
based DLMLP model performed best in crop yield estimation and gave R2 values of 0.723 and
0.684 in the training and testing phases, respectively, and Mean Absolute Error (MAE) of 0.98
and Root Mean Square Error (RMSE) value of 1.24 for the 2019–20 season. The DLMLP test R2
was 42.2% more than the Ordinary Least Squares Regressor (OLS), while the MAE and RMSE
were 37.97% and 38.61% less than the OLS regressor for wheat crop yield estimation. The soil
health-based DLMLP model gave satisfactory yield estimation accuracy in the absence of
validation of soil health parameter values for the preceding 2015–16 to 2018–19 wheat
seasons. This study's novel feature is that it estimates soil health parameters for the early
stages of wheat crop growth when soil lies mostly exposed and utilises them for crop yield
prediction.
Keywords
Crop yield forecast; Deep Learning Multi-Layer Perceptron (DLMLP); Machine
learning; SAR backscatter; Soil health parameters
1. Introduction
With food security being a highly debated topic worldwide, it
becomes crucial to have accurate and robust food crop yield estimates (Bose et al., 2016). With
the global population rising and expected to reach 9.8 billion by 2050, the food demand is also
likely to increase (Luo et al., 2013). With more forest land brought into agriculture and
agricultural land taken up for growing human settlements, a drastic change in global
Land Use/Land Cover (LULC) is anticipated (Haufler et al., 2022, Ji et al., 2022). This is
bound to have severe repercussions in terms of climate change and abrupt seasonal trends (Ji
et al., 2022). The world's developing economies that have seen tremendous economic
development and growth in the past would see massive setbacks and resource stress (Haufler
et al., 2022).
Soil is perhaps the most essential natural resource that directly or indirectly supports all life
on Earth (Richardson, 2010). Thus, it becomes increasingly important to regularly monitor the
health of the soil since healthy soil ensures a good crop yield (Gylfason, 2001). Many studies
have been carried out globally using remote sensing for agriculture and crop yield estimation.
However, such studies were mainly focused on crop phenology (Peluso, 1993). There is still a
paucity of dedicated applications of remote sensing for soil health monitoring and soil health-
based crop yield estimations (Pointing and Belnap, 2012, Van Der Heijden et al., 2008). Most
soil health parameter estimation studies have been in-situ field measurements or laboratory
testing of the collected samples (Schloter et al., 2003). Remote sensing datasets were mainly
confined to crop identification and classification for an extended period (Obia et al., 2016).
Initially, multispectral remote sensing datasets were utilised to estimate soil health
parameters like soil moisture and salinity (Hassan-Esfahani et al., 2015). These studies used
moisture and salinity indices derived by band ratioing the different bands of the
multispectral remote sensing data (Carrão et al., 2016). Wang and Qu (2009) reviewed soil
moisture index-based soil moisture estimation from optical, thermal and microwave remote
sensors and found that moisture indices are highly efficient.
Similarly, Martínez-Fernández et al. (2016) used a soil moisture index to estimate soil moisture
by applying the SMOS model. Further, Allbed et al. (2014) used the salinity index to estimate soil
salinity in a highly accurate manner. Since optical datasets are not available in all weather
conditions, SAR datasets were utilised because of their cloud penetration ability and
all-weather data availability (Zhang et al., 2014). SAR backscatter increases with the increase in soil
moisture since the penetration in soil decreases (Li et al., 2021). Therefore, SAR backscatter is
utilised in various modelling techniques for efficiently carrying out the estimation of soil
moisture (Srivastava et al., 2009). Soil salinity is dependent upon soil moisture: an
increase in soil moisture increases the dielectric constant, causing ions to dissociate more and
increasing the soil salinity (Lesmes and Friedman, 2005, Pandey et al., 2018). An increase in soil
moisture also causes more decay and accumulation of organic matter in the soil (Scott et al.,
1996). Therefore, SAR backscatter can be utilised for soil salinity and SOC estimation as well,
using an indirect approach (Mulder et al., 2011). This study utilises moisture and salinity
indices from multispectral satellite data along with SAR backscatter and field collected data to
estimate soil moisture, soil salinity and SOC.
Soil moisture is a crucial soil health parameter since moisture is needed for most of the
metabolic activities of crop plants (Biau et al., 2012). An adequate amount of volumetric
moisture content is required for intake by crop plants since it also acts as a transport medium
for different minerals that help plant growth (Sarwar et al., 2010). Soil moisture also regulates
other soil health parameters like soil salinity, SOC, and micro and macronutrients (Zalidis et
al., 2002). Since volumetric soil moisture is the moisture that is available for plants (Denmead
and Shaw, 1962), this study utilises volumetric soil moisture as a parameter for wheat crop
yield estimation.
Soil salinity is a condition where soluble salts accumulate around the root nodules of plants,
thus blocking further water intake (Shahbaz et al., 2012). While wheat is salinity
tolerant to some extent (Shahbaz and Ashraf, 2013), for healthy soil it is essential to control
the salinity levels of the soil so that it can support other crops as well. Since Electrical
Conductivity (EC) is an indicator of soil salinity (Rhoades, 1996) and most of the soil salinity
measurements are carried out in EC, this study utilises the sub-surface EC as a parameter in
the estimation of wheat crop yield.
The SOC is a significant soil health parameter, and soil rich in SOC is crucial for better crop
yield (Lal, 2015). Thus, SOC is also one of the twelve soil health indicators endorsed by
governments worldwide and in India, along with soil moisture and salinity. SOC is also
important since it affects the presence of other soil nutrients like Nitrogen (N), Phosphorus
(P), Potassium (K) and many others (Wright et al., 2011). The role of SOC in forming nano
minerals has been well explained by Basak and Biswas (2010). Therefore, an estimation of SOC
is vital as its presence in the soil regulates other soil nutrients and crop yield.
Soil salinity and crop failure cases are increasing, so farmers need greater capital
investment to ensure better yields. This results in meagre profits over
the years (Zhang et al., 2021, Singh, 2000). With cities and towns increasing tremendously and
nothing concrete being done to save farmers and agriculture, an acute food shortage is bound
to be seen in the upcoming years (Chakravarti, 1973, Chattopadhyay and Mitra, 2018). This is
when the need for an accurate and robust crop yield estimation is felt. In recent years,
a multitude of machine learning models have been used for crop yield estimation. Of
these, deep learning neural networks, support vector regression, random forest and
K-nearest neighbours have proved to be highly accurate in addressing crop yield
prediction problems. Several studies have compared different machine
learning algorithms in the recent past and demonstrated the feasibility of machine
learning based crop yield prediction (Gonzalez-Sanchez et al., 2014). Depending upon the
datasets and their size, different machine learning algorithms have varied in their
performances in terms of accuracy and errors (Abraham et al., 2020, Ismail et al., 2011, Sagan
et al., 2020). While the simple regression-based algorithms have worked well with less data,
the deep learning neural network algorithms have performed better with high volumes of
remote sensing and field datasets in most of the comparative studies carried out (Chlingaryan
et al., 2018, Kim and Lee, 2014, Kim et al., 2020).
The Deep Learning Multi-Layer Perceptron (DLMLP) belongs to a family of feedforward deep learning
Artificial Neural Networks (ANN) that are often used for big data-based classification and
regression problems and have proved to be highly successful (Sawada et al., 2020, Zhang et al.,
2021). Most DLMLP models used for crop yield prediction have been based on yield data from
different sources (Du and Zare, 2019). From an agricultural point of view, DLMLP has found
significant usage only in crop classification studies (Price, 1982). The remote sensing based
MLPs used for crop yield have been mainly based on the optical data derived vegetation
indices with little attention to soil health (Sun et al., 2019). The hyperspectral data helps
identify soil health parameters and crop phenology variations but suffers from less temporal
coverage and regular cloud-free data scarcity (Bose et al., 2016). Thus, hyperspectral data has
not been considered for this study and, therefore, is not discussed in this paper. This study
estimates soil health parameters from Microwave/Synthetic Aperture RADAR (SAR) and
Optical multispectral remotely sensed data from Sentinel-1 and Sentinel-2 satellites,
respectively. The estimated soil health parameters were used to develop a soil health based
Deep Learning Multi-Layer Perceptron (DLMLP) model for crop yield estimation. This study
aims to:
• Estimate the three important soil health parameters (soil moisture, EC and SOC) using
different machine learning techniques and compare them based on the respective
accuracies.
• Utilise the estimated soil health parameters along with satellite data parameters for
wheat crop yield estimation with a DLMLP model and compare it with other machine
learning techniques.
This study, as explained before, utilises different satellite and local soil parameters in simple
modelling techniques; therefore, simplicity in modelling has been kept a pivotal point in this
study. Various machine learning techniques used in this study are as follows-
(Yarotsky, 2017). For large datasets, linear kernel SVR or Stochastic Gradient Descent (SGD).
Since the linear SVR is faster and easier to implement, it was used in this study for soil health
parameter estimation, while the Radial Basis Function (RBF) kernel was used for crop yield
estimation owing to the higher complexity and larger data size of the wheat crop yield estimation.
For soil health parameter estimation, the optimization coefficient C was taken as 1, cache size
as 200, and tolerance as 0.001. For estimation of wheat crop yield, the C value was taken as 1.5,
tolerance value as 0.01, and cache size as 1000.
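The two SVR configurations described above can be written down as keyword arguments. This is only a sketch: the source does not name the software used, and the key names below (`kernel`, `C`, `tol`, `cache_size`) follow scikit-learn's `SVR` constructor, which is an assumption made here for illustration. The helper `describe` is hypothetical.

```python
# Hyperparameters as reported in the text. The key names mirror
# scikit-learn's SVR constructor (an assumption, not stated in the source).
SOIL_SVR_PARAMS = {"kernel": "linear", "C": 1.0, "tol": 0.001, "cache_size": 200}
YIELD_SVR_PARAMS = {"kernel": "rbf", "C": 1.5, "tol": 0.01, "cache_size": 1000}


def describe(name, params):
    """Return a one-line, human-readable summary of an SVR configuration."""
    settings = ", ".join(f"{key}={value}" for key, value in params.items())
    return f"{name}: {settings}"


print(describe("soil health", SOIL_SVR_PARAMS))
print(describe("wheat yield", YIELD_SVR_PARAMS))
```

If scikit-learn is the library in use, `SVR(**SOIL_SVR_PARAMS)` would build the corresponding regressor.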
3. The DLMLP
This study utilises the DLMLP model in two phases (along with other regression algorithms):
firstly in the estimation of soil moisture, soil salinity and SOC, and after that in the estimation
of crop yield using the per-pixel soil health parameter values and satellite data parameters as
input. After careful experimentation, a DLMLP model with three hidden, three dropout and
two activation layers was built for soil health parameter estimation. For crop yield estimation,
six hidden layers (2 layers of 40 nodes and 3 layers of 90 nodes), six activation and four
dropout layers were used in the DLMLP model developed. This was due to the larger data size in
the crop yield estimation stage. This study uses the Rectified Linear Unit (ReLU) activation
function for both the soil health parameter estimation and the estimation of the wheat crop
yield, owing to the speed and simplicity of implementing it. The
crop yield estimator DLMLP model is shown in Fig. 1.
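To make the layer structure concrete, the following is a minimal sketch of a feedforward pass through a ReLU multi-layer perceptron in plain Python. It is not the authors' implementation. The hidden-layer widths are the ones itemised in the text (2 layers of 40 nodes and 3 layers of 90 nodes); the number of input features, the weight initialisation and all function names are hypothetical, and the network is untrained.

```python
import random

# Hidden-layer widths as itemised in the text (2 x 40 nodes, 3 x 90 nodes).
# The text also mentions "six hidden layers"; only the itemised five are used here.
HIDDEN_SIZES = [40, 40, 90, 90, 90]
N_FEATURES = 8  # hypothetical number of input features


def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layers(sizes, rng):
    """Create small random weights and zero biases for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict(features, layers):
    """Forward pass: ReLU after every hidden layer, linear output node.

    Dropout is only active during training, so it is omitted here.
    """
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = layers[-1]
    return dense(activations, out_weights, out_biases)[0]


rng = random.Random(0)
layers = init_layers([N_FEATURES] + HIDDEN_SIZES + [1], rng)
yield_estimate = predict([0.5] * N_FEATURES, layers)  # untrained, not meaningful
print(len(layers), yield_estimate)
```

In practice the weights would be fitted by backpropagation with dropout applied between layers during training; the sketch only shows how inputs flow through the stacked layers to a single yield value.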
Fig. 2. (a) India States map showing the state of Punjab, (b) Punjab State with Rupnagar District
highlighted in RED, (c) Standard False Colour Composite of Rupnagar District, (d) Agricultural
fields of the study area growing wheat crop with all other Land Use/Land Cover features
masked out and (e) A field with wheat crop showing a pixel area for field data collection.
4.2. Datasets
SAR data from the Sentinel-1 satellite and optical data from the Sentinel-2 satellite of the
European Space Agency (ESA) were used as the remote sensing datasets. Along with the
spaceborne remotely sensed data, field data were collected with the help of a
FieldScout SMEC 300 soil sensor from Spectrum Technologies, USA. The sensor was used to
collect Volumetric Soil Moisture (referred to as WC), temperature (in Celsius), and electrical
conductivity in mS/cm (referred to as EC) (for details, please refer to Section 5 of
Supplementary material). Moreover, soil samples collected from the field were sent to the soil
lab of the Indian Institute of Sugarcane Research (IISR) Lucknow to determine the Soil
Organic Carbon (SOC in %) and pH. Lastly, crop yield data were collected from the farming
community of Rupnagar for 2015–16, 2016–17, 2017–18, 2018–19, and 2019–20 seasons. The
details of the datasets used are mentioned in Table 1.
5. Methodology
Before the actual modelling and analysis, it is crucial to pre-process and prepare the datasets.
Since this study utilises datasets from SAR, Optical and field, the data acquisition and pre-
processing steps are explained in detail.
5.1. The Sentinel-1 level-1 SAR data processing steps in detail are as follows
5.1.1. Split
Since SAR data acquisition in Sentinel-1 Interferometric Wide (IW) swath mode is carried out
in several sub-swaths, it is necessary to split the sub-swaths to delineate the study area. This is
carried out with the split operation (Tripathi et al., 2021).
5.1.2. Calibrate
The objective of the calibration operation is to provide pixel values that can be related directly
to the backscatter of the scene in the imagery. This operation makes the SAR imagery ready
for quantitative use. A calibration vector is provided as an annotation product to convert
digital pixel values to get backscatter values of σ0VV and σ0VH. The radiometric calibration is
applied by Equation (1) (Tripathi and Tiwari, 2020)-
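\mathrm{value}(i) = \frac{|DN_i|^2}{A_i^2} \qquad (1)
where value(i) is any one of the backscatter coefficients or the DN values, DN_i is the digital
number of pixel i, and A_i is the corresponding calibration vector value (one of β_i, σ_i). This
study utilises sigma nought backscatter for VV and VH polarisations, represented by
σ0VV and σ0VH, obtained from calibration of Sentinel-1 SAR imagery.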
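A minimal sketch of the per-pixel calibration step, assuming Equation (1) takes the standard Sentinel-1 form value(i) = |DN_i|^2 / A_i^2. The function names and the sample numbers are hypothetical, and the decibel conversion is a common follow-up step that the text does not describe.

```python
import math


def calibrate(dn, a):
    """Radiometric calibration of one pixel: |DN|^2 / A^2."""
    return abs(dn) ** 2 / a ** 2


def to_decibels(linear_value):
    """Convert a linear backscatter coefficient to decibels."""
    return 10.0 * math.log10(linear_value)


# Hypothetical digital number and sigma-nought calibration vector value.
sigma0 = calibrate(dn=120, a=600.0)
print(sigma0, to_decibels(sigma0))
```

For complex SLC pixels, `abs(dn)` is the magnitude of the complex sample, so the same function applies.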
5.1.3. Deburst
Each IW image has three sub-swaths, each sub-swath has a series of bursts, and each sub-swath is
processed as an individual SLC image product. Images for all bursts and sub-swaths are
resampled into a common pixel spacing grid in both range and azimuth directions; this
burst synchronisation is achieved by the Deburst operation (Tripathi and Tiwari, 2020).
5.1.4. Multilooking
A multi-looking operation is applied to reduce the SAR image's inherent speckle noise and to
compensate for the different spatial resolutions in range and azimuth directions (Moreira et al., 1996). The
resultant imagery has reduced speckle noise and square pixels with 14 m spatial resolution.
This also improves the interpretability of the image.
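As an illustration of the idea, multi-looking can be sketched as averaging non-overlapping blocks of an intensity image. The look counts and the sample patch below are hypothetical; the text does not state how many looks were used in range and azimuth.

```python
def multilook(intensity, looks_azimuth, looks_range):
    """Average non-overlapping blocks of an intensity image (list of rows).

    Rows and columns that do not fill a complete block are discarded.
    """
    n_rows = len(intensity) // looks_azimuth
    n_cols = len(intensity[0]) // looks_range
    output = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [intensity[r * looks_azimuth + i][c * looks_range + j]
                     for i in range(looks_azimuth)
                     for j in range(looks_range)]
            row.append(sum(block) / len(block))
        output.append(row)
    return output


# Hypothetical 2 x 8 intensity patch; 1 azimuth look x 4 range looks.
patch = [[1.0, 3.0, 2.0, 2.0, 4.0, 4.0, 6.0, 2.0],
         [2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]]
print(multilook(patch, looks_azimuth=1, looks_range=4))
```

Averaging more samples in the finer-sampled direction is what yields roughly square output pixels while smoothing speckle.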
(2)
For Sentinel-2, Band 3 is Near Infrared (NIR), and Band 8 is Short Wave Infrared (SWIR).
NDSI is calculated as follows (Asfaw et al., 2018)-
\mathrm{NDSI} = \frac{\mathrm{Band\ 4} - \mathrm{Band\ 8}}{\mathrm{Band\ 4} + \mathrm{Band\ 8}} \qquad (3)
where Band 4 is Red and Band 8 is NIR. Since different spectral bands of Sentinel-2 have different
spatial resolutions, all 13 bands were resampled to 14 m spatial resolution.
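A band-ratio index of this kind can be sketched as a per-pixel normalised difference. The example assumes NDSI takes the common form (Red − NIR) / (Red + NIR) with the Red and NIR bands named in the text; the function name and the reflectance values are hypothetical.

```python
def normalised_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); returns 0.0 where the denominator is zero."""
    return [[(a - b) / (a + b) if (a + b) != 0 else 0.0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(band_a, band_b)]


# Hypothetical reflectance values for a 1 x 2 pixel tile.
red = [[0.30, 0.10]]  # Red band (Band 4 in the text)
nir = [[0.10, 0.30]]  # NIR band (Band 8 in the text)
ndsi = normalised_difference(red, nir)
print(ndsi)
```

The same helper serves any two-band normalised difference index once both bands share a common pixel grid, which is why the resampling step matters.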
5.3. Masking out land use/land cover features other than wheat crop fields
Wheat is the main crop grown in the study area; it is sown in November and harvested
at the end of March every year. All other crops are either plantations, which are a permanent
feature, or have different sowing and harvest periods. Hence a Normalised Difference
Vegetation Index (NDVI) map was prepared for February 2020 and April 2020. The value of
NDVI also varies from −1 to 1, with a value close to 1 representing vegetation cover. The wheat
crop is present in the fields during February, and agricultural fields are fallow in April since
the wheat crop is harvested at the end of March. Therefore, to mask out all other LULC
features like rivers, built-up, plantations, and rocky terrain, the NDVI image of February was
subtracted from the NDVI image of April 2020. The prepared mask carried only the fields
where the wheat crop is grown in the study area. This made the soil health parameter analysis
of wheat crop growing fields easier.
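The masking logic can be sketched as follows: pixels that carry a crop in February and lie fallow in April show a large drop in NDVI, while permanent features stay roughly unchanged. The threshold value, function names and sample NDVI values are hypothetical; the text does not state the cut-off used.

```python
def ndvi(nir, red):
    """NDVI for one pixel; 0.0 when both reflectances are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def wheat_mask(ndvi_february, ndvi_april, drop_threshold=-0.3):
    """Flag pixels whose NDVI fell sharply between February and April.

    The difference is April minus February, as in the text; the threshold
    is hypothetical and would need tuning on real data.
    """
    return [[(apr - feb) <= drop_threshold for feb, apr in zip(row_f, row_a)]
            for row_f, row_a in zip(ndvi_february, ndvi_april)]


# Hypothetical NDVI values: a wheat field, a plantation, and a built-up pixel.
february = [[0.80, 0.70, 0.10]]
april = [[0.20, 0.70, 0.10]]
print(wheat_mask(february, april))
```

Only the first pixel is flagged, because plantations and built-up areas show little seasonal change between the two dates.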
Wheat crop yield data | Acquired from sample survey among the farming community of Rupnagar and government data portal- https://aps.dac.gov.in/APY/Public_Report1.aspx | – | 2015–16, 2016–17, 2017–18, 2018–19, 2019–20 seasons
The Sentinel-2 optical data has 13 multispectral bands with different spatial resolutions
(Immitzer et al., 2016). Out of these, bands 3, 4, and 8 correspond to Near Infrared (NIR), Red,
and Shortwave Infrared (SWIR) (Drusch et al., 2012), which were layer stacked, followed by
clipping and mosaicking to get the study area imagery. After that, band ratioing was done to
get NDMI and NDSI values to be used as parameters in other modelling processes (Goldstein
and Werner, 1998, Tripathi and Tiwari, 2021a, Tripathi and Tiwari, 2020). After that, an NDVI
mask was prepared to mask out all the non-wheat cropland LULC features from the study area
(Section 5.3) (Tripathi and Tiwari, 2021a). A detailed methodology diagram is shown in Fig. 4.
Since C-band SAR data of Sentinel-1 has limited penetration through vegetation (Varghese et
al., 2016) and since this study is focused on soil, the soil health parameters of the early stage of
wheat crop growth were estimated (November 2019 to January 2020) initially. Based on these
parameters, a yield estimation was carried out for the wheat crop for 2020. After that, using the
relationship between satellite and soil health parameters, from the satellite data (both SAR and
Optical) of 2015–16 till 2018–19, the per pixel soil health parameters were estimated. These
estimated soil health parameters were then used as input along with the crop yield data for
crop yield estimation of the 2015–16 till 2018–19 wheat crop seasons. In this study, Random
Forest (RF) regression, Ordinary Least Squares (OLS), Decision Tree (DT) regression, Ridge
regression, Support Vector Regression (SVR) and K-Nearest Neighbours (KNN)
regression models were used and compared with the DLMLP model. For details, please refer to
section 3 of the Supplementary material.
Soil health parameters were estimated based on the satellite and field soil data during the
early stages of wheat crop growth. Since satellite and field data were available for 2019–20,
highly accurate estimations were carried out for soil health parameters for this season.
From the collected data, 70 % of data was used for model training and 30 % for testing and
validation for soil health and wheat crop yield estimation for the 2019–20 season. Once a
relationship between satellite data parameters and soil health parameters was established, the
trained model (with the highest accuracy and least RMSE and MAE) was used with available
satellite parameters for estimation of soil health parameters for preceding seasons- 2015–16,
2016–17, 2017–18, and 2018–19. These estimations could not be validated quantitatively due to
the non-availability of the field data of the soil parameters for these years. Therefore, from the
moisture and salinity indices generated from the satellite data, a qualitative trend comparison
was made from 2015–16 to 2018–19. The estimation of the three crucial soil health parameters
has been mentioned in sections 4 & 5 of the Supplementary material file. Further details on
the methodology for assessing soil health can be found in Tripathi and Tiwari, 2021a, Tripathi
and Tiwari, 2021b.
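The 70 %/30 % partition mentioned above can be sketched with the standard library. The shuffling, the seed and the function name are assumptions made for illustration; the text does not say how samples were assigned to each subset.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=42):
    """Shuffle a copy of the samples and split it into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample identifiers standing in for field observations.
samples = list(range(100))
train, test = train_test_split(samples)
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible, so that the different regressors are compared on the same held-out samples.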
Initially, the yield estimation was done for 2019–20 (in tonnes/hectare) by using all the
algorithms discussed in Section 2 and the DLMLP. The results were then compared for
accuracy based on the R2, MAE and RMSE values. After that, using the satellite data
parameters, WC, EC, and SOC were estimated per pixel for the 2015–16 season onwards till
2018–19 (Tripathi and Tiwari, 2020, Tripathi and Tiwari, 2021b). The yield data were collected
from the 2015–16 season till 2019–20, as mentioned in Table 1; hence a yield assessment was
made for all the years using satellite data parameters, soil health parameters (calculated from
satellite data), and collected yield data with the soil health based DLMLP model. The yield
data from the farming community was collected and verified from multiple farmers per
location using a well-defined questionnaire. Moreover, the crop yield data given by the
farming community for the 2015–16 season till the 2018–19 season was close to the
government figures as available on the government data portal (Table 1). The other algorithms
were also used in crop yield estimation for comparison with the DLMLP. Using this, yield
maps were prepared for wheat crop for 2015–16, 2016–17, 2017–18, 2018–19, and 2019–20 as
shown in Fig. 8.
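The three comparison metrics named above (R2, MAE and RMSE) follow their usual definitions and can be computed as in the sketch below. The observed and predicted yields are hypothetical values used only to exercise the functions.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mae(observed, predicted):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root Mean Square Error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical yields in tonnes/hectare, for illustration only.
observed = [4.0, 5.0, 6.0, 5.0]
predicted = [4.5, 5.0, 5.5, 5.0]
print(r_squared(observed, predicted), mae(observed, predicted), rmse(observed, predicted))
```

MAE and RMSE are in the units of the target (tonnes/hectare here), whereas R2 is unitless, which is why the three are reported together.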
The correlation of EC, soil moisture and SOC with crop yield (in tonnes/hectare) is
shown in Fig. 5.
Fig. 5 shows a high correlation of the three soil health parameters with crop yield, with R2
values of 0.79, 0.88 and 0.51 for soil moisture, EC, and SOC, respectively. This gives a sufficient
statistical basis for choosing these three parameters for crop yield estimation in this study.
Fig. 5. Correlation plots between soil health parameters and crop yield for- (a) EC, (b) Soil
Moisture and (c) SOC.
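Assuming the R2 values in these plots come from a straight-line fit of yield against a single soil health parameter, each one equals the squared Pearson correlation between the two variables. The sketch below shows that calculation; the function name and the paired values are hypothetical.

```python
def pearson_r_squared(x, y):
    """Squared Pearson correlation, equal to R2 of a simple linear fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


# Hypothetical soil moisture (%) and yield (tonnes/hectare) pairs.
soil_moisture = [10.0, 20.0, 30.0, 40.0]
crop_yield = [3.0, 4.0, 4.5, 6.0]
print(pearson_r_squared(soil_moisture, crop_yield))
```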
6.1. Results
The results of the soil health parameters – WC, EC, and SOC, based on the different machine
learning models' accuracies have been compared in this section. The detailed results of the
estimation of soil health parameters have been mentioned under Section 5 of the
supplementary material (Tripathi and Tiwari, 2020, Tripathi and Tiwari, 2021a, Tripathi and
Tiwari, 2021b).
WC, EC, and SOC maps for the entire study region were prepared using per-pixel inputs for
2020. After that, from the satellite data derived parameters, WC, EC, and SOC were estimated
per pixel for previous years for November, December, and January months. Section 8 of the
Supplementary material mentions that OLS performed better than other machine learning
algorithms for soil moisture and soil salinity estimation with > 90 % accuracy and low values of RMSE
and MAE. Hence OLS was used for per pixel soil moisture (Tripathi and Tiwari, 2020) and soil
salinity (Tripathi and Tiwari, 2021a) estimation. Since both OLS and RF performed equally
well for SOC estimation, with OLS giving higher R2-statistics, OLS was used for per pixel SOC
estimation (Tripathi and Tiwari, 2021b). The estimated soil moisture, EC, and SOC for
Rupnagar are shown in Fig. 6.
Fig. 6. Estimated per-pixel soil health parameter maps of Rupnagar for (a) Volumetric Soil
Moisture, (b) Electrical Conductivity and (c) Soil Organic Carbon (SOC).
From the satellite data parameters of both SAR and optical satellite datasets and soil health
parameters (per pixel for 2019–20 in wheat crop growing season), along with the local data
collected from the farming community of Rupnagar, yield estimation modelling was carried
out for 2015–16 wheat crop season till 2019–20. The accuracies of various Machine Learning
Models for wheat crop yield estimation for different years are shown in Table 2.
Table 2. Accuracies of different machine learning regressors for crop yield estimation.
(Table 2 rows: OLS, KNN, Ridge, RF, SVR, DT; numeric values not preserved.)
From Table 2 it is observed that all the techniques applied performed alike, with test R2
values below 0.5 for wheat crop yield estimation. The OLS regressor performed slightly better,
with a test R2 of 0.481 in wheat crop yield estimation for the 2019–20 season. Table 3
shows that the DLMLP model had the highest accuracy for five consecutive wheat crop
growing seasons. The DLMLP model gave
R2-statistics of 0.723 and 0.684 in the training and testing phases for 2019–20, thereby showing an
increase of 42.2 % in test R2 compared to the OLS results. Also, the DLMLP showed a decrease
of 37.97 % and 38.61 % in MAE and RMSE values compared to the OLS regressor results.
Further, from Table 3, it is also inferred that the DLMLP gave > 60 % accuracy for all the
previous wheat crop growing seasons from 2015–16 till 2018–19.
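The 42.2 % figure can be checked directly from the two reported test R2 values, as a relative change with the OLS result as the baseline. The function name is hypothetical; the two inputs are the values quoted in the text.

```python
def relative_change_percent(baseline, value):
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


ols_test_r2 = 0.481    # reported for OLS, 2019-20 season
dlmlp_test_r2 = 0.684  # reported for DLMLP, 2019-20 season
print(round(relative_change_percent(ols_test_r2, dlmlp_test_r2), 1))
```

The result is a relative improvement over the baseline, not a difference of 42.2 percentage points in R2.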
Feature Importance (FI) score was also calculated for all the predictor variables used in the
DLMLP model for yield estimation for 2019–20, as shown in Fig. 7.
Fig. 7. FI Score bar chart for DLMLP model in wheat crop yield estimation.
Fig. 8. Estimated Wheat Crop Yield Maps for (a) 2015–16, (b) 2016–17, (c) 2017–18, (d) 2018–19
and (e) 2019–20, using the DLMLP model.
Fig. 7 shows that the satellite data parameters have 60 % weightage in the estimation of crop
yield, and soil health parameters have 40 % weightage confirming the significance of both
data in crop yield estimation. Similarly, the yield maps for various years were prepared as
shown in Fig. 8.
6.2. Discussion
Crop yield estimation has always been a crucial exercise for developing countries like India,
where a significant share of the labour force is engaged in agriculture and allied sectors
(Ferencz et al., 2004). Still, some areas face problems such as hunger, food shortage, and crop
failure (Van Wart et al., 2013). Regular crop and soil health monitoring has therefore
become extremely important. Many studies have addressed crop health and crop monitoring
using remote sensing (Jongeneel and Gonzalez-Martinez, 2020). However, dedicated soil health
studies using remotely sensed data are still few (Karlen et al., 2019).
It is inferred from this study that, for soil health estimation, the simple regression
techniques performed better than the DLMLP because of the smaller dataset sizes used, since
the DLMLP better serves moderate to large datasets (Nevavuori et al., 2019). For crop yield
estimation, in contrast, the DLMLP performed better than the simple regressions because of the
larger datasets. The DLMLP, as explained before, is a non-linear black-box approach built on a
massively parallel, distributed information-processing system. For crop yield estimation,
satellite data parameters, soil health parameters, and yield data were fed to the network,
which had five hidden layers and produced the final crop yield estimate at its output layer.
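The layer structure described above (inputs, five hidden layers, one yield output) can be illustrated with a small forward pass. This is only a sketch under stated assumptions: the source gives the number of hidden layers but not their widths, the activation function, or the number of predictors, so the six inputs, the `HIDDEN_SIZES` widths, the ReLU activation, and the random untrained weights below are all hypothetical.

```python
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer mapping n_in -> n_out values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mlp(n_inputs, hidden_sizes, rng):
    """Stack the hidden layers and finish with a single-output regression layer."""
    sizes = [n_inputs] + list(hidden_sizes) + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def predict(layers, x):
    """Forward pass: ReLU after every hidden layer, linear output for the yield."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return dense(x, weights, biases)[0]


rng = random.Random(0)
HIDDEN_SIZES = (16, 16, 8, 8, 4)  # five hidden layers; widths are assumed
layers = build_mlp(n_inputs=6, hidden_sizes=HIDDEN_SIZES, rng=rng)

# One made-up, already normalised predictor vector (satellite and soil health inputs).
sample = [0.2, 0.1, 0.6, 0.3, 0.4, 0.5]
yield_estimate = predict(layers, sample)
print(f"untrained yield estimate: {yield_estimate:.4f}")
```

Because the weights are untrained, the printed number has no agronomic meaning; the point is the flow of data through five hidden layers to a single regression output.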
For assessment of the model accuracy, RMSE and MAE were used in addition to the R2-statistic.
The R2-statistic is an indicator of model fitness, that is, of how close the predicted values
are to the observed values (Nevavuori et al., 2019). RMSE is the square root of the mean
squared residual and can be read as the standard deviation of the variance left unexplained
by a model (Ismail et al., 2011). MAE is the mean absolute difference between paired observed
and predicted values of the same phenomenon, and a low value indicates better model
performance (Huang and Kuo, 2019; Sun et al., 2019). The high R2-statistic and low values of
RMSE and MAE for the DLMLP model applied for crop yield estimation for 2019–20 indicate
goodness of fit and accurate model performance. The accuracy is lower for the preceding years
from 2015–16 to 2018–19, and RMSE and MAE values are higher. This is because the estimated
per-pixel soil health parameters could not be validated in the absence of field soil
parameters for those years. However, R2 values of more than fifty per cent in these years
(see Table 3) indicate that the estimates were acceptable and the models performed
satisfactorily. The techniques
used in this study have been kept simple, easy to understand and implement so that they
could be applied widely for precision agriculture purposes for soil health parameters and crop
yield estimation.
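The three accuracy measures discussed in this section can be written out explicitly. The sketch below assumes the common definitions (R2 as the coefficient of determination, 1 - SS_res / SS_tot); the paper does not state which R2 variant was used, and the observed and predicted yields are made-up numbers in arbitrary units.

```python
import math


def mae(observed, predicted):
    """Mean Absolute Error: average magnitude of the residuals."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up observed and predicted yields (illustrative only, arbitrary units).
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [2.5, 4.5, 5.0, 7.0]

print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

RMSE squares the residuals before averaging, so the single large error in the last pair raises it above the MAE; this is why the two measures are usually reported together.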
7. Conclusions
This study gives promising results in estimating soil health parameters and in soil health-
based wheat crop yield estimation. The study shows that soil health parameters affect crop
yield and that healthy soil is crucial for better crop yields. With accurate
yield data and synchronised field data collection, the accuracy of the proposed DLMLP model
can be improved further. The study paves the way for further research with longer wavelength
SAR datasets to make a temporal soil health assessment possible. Such remote sensing-based
soil-health studies are highly beneficial for scientists in suggesting ways and measures to
maintain optimum levels of different soil health parameters. This paper highlights the
sensitivity of C-band SAR data in estimating soil moisture and electrical conductivity while
using them and top vegetation cover backscatter from maturing crops for SOC estimation,
along with the optical remotely sensed data parameters. This study paves the way for more
stress on crop yield estimation using soil parameters with remotely sensed data rather than
NDVI and biomass. As a first-of-its-kind approach, the study, though relying on early-stage
soil parameters, estimates the wheat crop yield with high accuracy. The main takeaways of the
study are:
• The technique and findings can be helpful for mitigation and early resolution of any
change in soil health parameters that may cause low crop yields later.
• The study is cost-effective as it uses freely available satellite data from Sentinel-1 and
Sentinel-2 and uses parameters derived from both synergistically for modelling.
• It is a first-of-its-kind study that emphasises and shows the importance of soil health
parameters in crop yield.
• The study presents a novel soil health parameter based DLMLP model that utilises remotely
sensed optical and SAR data along with the field soil parameters for crop yield estimation
with sufficient accuracy.
However, since field values of soil health parameters were unavailable for the previous years
(2015–16 to 2018–19), the per-pixel soil health parameters estimated from satellite data could
not be validated. This led to lower accuracy in yield estimation for those years than for the
2019–20 crop yield estimates. Still, as mentioned before, the model performed satisfactorily,
with R2 values > 0.6.
To improve the accuracy of yield estimation, longer wavelength SAR datasets (L and S bands),
which have higher penetration abilities, can be utilised for regular study and assessment of
soil health parameters over the entire crop season. Moreover, systematic ground-level data
collection from cultivators and landowners can be carried out to obtain better and more
accurate yield data. This would improve yield estimation accuracy and support food security,
better returns for farmers, the resolution of soil health problems and crop diseases, and the
identification of actual beneficiaries for crop insurance in case of crop failures or low yields.
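To make "longer wavelength" concrete, the wavelength ranges of the SAR bands mentioned here follow from λ = c / f. The sketch below uses the standard IEEE radar band limits (L: 1–2 GHz, S: 2–4 GHz, C: 4–8 GHz) and the Sentinel-1 C-SAR centre frequency of 5.405 GHz; the function and variable names are illustrative, and no particular L- or S-band mission is assumed.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def wavelength_cm(frequency_ghz: float) -> float:
    """Free-space wavelength in centimetres for a frequency given in GHz."""
    return SPEED_OF_LIGHT / (frequency_ghz * 1e9) * 100.0


# IEEE radar band limits in GHz (lower, upper).
IEEE_BANDS_GHZ = {"L": (1.0, 2.0), "S": (2.0, 4.0), "C": (4.0, 8.0)}

# Shortest and longest wavelength in each band (higher frequency = shorter wavelength).
band_wavelengths_cm = {
    band: (wavelength_cm(f_high), wavelength_cm(f_low))
    for band, (f_low, f_high) in IEEE_BANDS_GHZ.items()
}

sentinel1_cm = wavelength_cm(5.405)  # Sentinel-1 C-SAR centre frequency

for band, (shortest, longest) in band_wavelengths_cm.items():
    print(f"{band} band: {shortest:.1f} to {longest:.1f} cm")
print(f"Sentinel-1 C-band: {sentinel1_cm:.2f} cm")
```

L-band wavelengths are therefore several times longer than the Sentinel-1 C-band wavelength, which is the physical basis for the greater canopy and soil penetration referred to above.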
Acknowledgement
The study was supported by the Department of Civil Engineering, Indian Institute of
Technology (IIT) Ropar, European Space Agency, King Fahd University of Petroleum and
Minerals, Kingdom of Saudi Arabia (K.S.A), Alaska Satellite Facility (ASF) and the SNAP
software team.
Supplementary data 1.
References
Abraham et al., 2020 S. Abraham, C. Huynh, H. Vu
Classification of Soils into Hydrologic Groups Using Machine Learning
Data, 5 (1) (2020), p. 2
Mapping and Modelling Spatial Variation in Soil Salinity in the Al Hassa Oasis Based
on Remote Sensing Indicators and Regression Techniques
Remote Sensing, 6 (2) (2014), pp. 1137-1157
Drusch et al., 2012 M. Drusch, U. Del Bello, S. Carlier, O. Colin, V. Fernandez, F. Gascon, B.
Hoersch, C. Isola, P. Laberinti, P. Martimort, A. Meygret, F. Spoto, O. Sy, F. Marchese, P.
Bargellini
Sentinel-2: ESA's Optical High-Resolution Mission for GMES Operational Services
Remote Sens. Environ., 120 (2012), pp. 25-36
Ferencz et al., 2004 C.s. Ferencz, P. Bognár, J. Lichtenberger, D. Hamar, G.y. Tarcsai†, G.
Timár, G. Molnár, S.Z. Pásztor, P. Steinbach, B. Székely, O.E. Ferencz, I. Ferencz-Árkos

First Experience with Sentinel-2 Data for Crop and Tree Species Classifications in
Central Europe
Remote Sensing, 8 (3) (2016), p. 166
Karlen et al., 2019 D.L. Karlen, K.S. Veum, K.A. Sudduth, J.F. Obrycki, M.R. Nunes
Soil health assessment: Past accomplishments, current activities, and future
opportunities
Soil Tillage Res., 195 (2019), Article 104365, 10.1016/j.still.2019.104365
Kim et al., 2020 N. Kim, S.-I. Na, C.-W. Park, M. Huh, J. Oh, K.-J. Ha, J. Cho, Y.-W. Lee
An Artificial Intelligence Approach to Prediction of Corn Yields under Extreme
Weather Conditions Using Satellite and Meteorological Data
Appl. Sci., 10 (11) (2020), p. 3785
Lesmes and Friedman, 2005 D.P. Lesmes, S.P. Friedman
Relationships between the Electrical and Hydrogeological Properties of Rocks and Soils
Y. Rubin, S.S. Hubbard (Eds.), Hydrogeophysics, Springer Netherlands, Dordrecht (2005),
pp. 87-128, 10.1007/1-4020-3102-5_4
Mulder et al., 2011 V.L. Mulder, S. de Bruin, M.E. Schaepman, T.R. Mayr
The use of remote sensing in soil and terrain mapping — A review
Geoderma, 162 (1) (2011), pp. 1-19, 10.1016/j.geoderma.2010.12.018
Sagan et al., 2020 V. Sagan, K.T. Peterson, M. Maimaitijiang, P. Sidike, J. Sloan, B.A. Greeling,
S. Maalouf, C. Adams
Monitoring inland water quality using remote sensing: potential and limitations of
spectral indices, bio-optical simulations, machine learning, and cloud computing
Earth Sci. Rev., 205 (2020), p. 103187
Sarwar et al., 2010 N. Sarwar, Saifullah, S.S. Malhi, M.H. Zia, A. Naeem, S. Bibi, G. Farid
Role of mineral nutrition in minimising cadmium accumulation by plants
J. Sci. Food Agric., 90 (6) (2010), pp. 925-937, 10.1002/jsfa.3916
Scott et al., 1996 N.A. Scott, C.V. Cole, E.T. Elliott, S.A. Huffman
Soil Textural Control on Decomposition and Soil Organic Matter Dynamics
Soil Sci. Soc. Am. J., 60 (4) (1996), pp. 1102-1109,
10.2136/sssaj1996.03615995006000040020x
Thenkabail et al., 2013 P.S. Thenkabail, I. Mariotto, M.K. Gumma, E.M. Middleton, D.R.
Landis, K.F. Huemmrich
Selection of Hyperspectral Narrowbands (HNBs) and Composition of Hyperspectral
Twoband Vegetation Indices (HVIs) for Biophysical Characterisation and
Discrimination of Crop Types Using Field Reflectance and Hyperion/EO-1 Data
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 6 (2) (2013), pp. 427-439,
10.1109/JSTARS.2013.2252601
Van Der Heijden et al., 2008 M.G.A. Van Der Heijden, R.D. Bardgett, N.M. Van Straalen
The unseen majority: soil microbes as drivers of plant diversity and productivity in
terrestrial ecosystems
Ecol. Lett., 11 (3) (2008), pp. 296-310, 10.1111/j.1461-0248.2007.01139.x
Van Wart et al., 2013 J. Van Wart, K.C. Kersebaum, S. Peng, M. Milner, K.G. Cassman
Estimating crop yield potential at regional to national scales
Field Crops Res., 143 (2013), pp. 34-43, 10.1016/j.fcr.2012.11.018
Wright et al., 2011 S.J. Wright, J.B. Yavitt, N. Wurzburger, B.L. Turner, E.V.J. Tanner, E.J. Sayer,
L.S. Santiago, M. Kaspari, L.O. Hedin, K.E. Harms, M.N. Garcia, M.D. Corre
Potassium, phosphorus, or nitrogen limit root allocation, tree growth, or litter
production in a lowland tropical forest
Ecology, 92 (8) (2011), pp. 1616-1625
Improving the impervious surface estimation with combined use of optical and SAR
remote sensing images
Remote Sens. Environ., 141 (2014), pp. 155-167, 10.1016/j.rse.2013.10.028
Zhu et al., 2022 J. Zhu, J. Qin, F. Yin, Z. Ren, J. Qi, J. Zhang, R. Wang
An APMLP Deep Learning Model for Bathymetry Retrieval Using Adjacent Pixels
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 15 (2022), pp. 235-246,
10.1109/JSTARS.2021.3134013