04 - RA - Crop Prediction Using Machine Learning
To cite this article: Madhuri Shripathi Rao et al 2022 J. Phys.: Conf. Ser. 2161 012033
Madhuri Shripathi Rao¹, Arushi Singh¹, N.V. Subba Reddy¹ and Dinesh U Acharya¹
¹Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal, 576104, Udupi district, Karnataka, India
Abstract: For most developing countries, agriculture is the primary source of revenue. Modern agriculture is constantly evolving through new agricultural advances and farming techniques, yet it is challenging for farmers to satisfy the planet's evolving requirements and the expectations of merchants, customers, and others. Some of the challenges farmers face are: (i) dealing with climatic changes caused by soil erosion and industrial emissions; (ii) nutrient deficiency in the soil, caused by a shortage of crucial minerals such as potassium, nitrogen, and phosphorus, which can reduce crop growth; and (iii) cultivating the same crops year after year without experimenting with different varieties, and adding fertilizers randomly without understanding the soil's deficiencies in quality or quantity. This paper aims to discover the best model for crop prediction, which can help farmers decide which crop to grow based on the climatic conditions and the nutrients present in the soil. It compares popular algorithms such as K-Nearest Neighbor (KNN), Decision Tree, and Random Forest Classifier, evaluating the tree-based models under two different criteria, Gini and Entropy. Results reveal that Random Forest gives the highest accuracy among the three.
1. Introduction
Machine learning is a valuable decision-making tool for predicting agricultural yields, deciding which crops to sow, and planning what to do during the growing season. Several machine learning methods have been used to support crop prediction studies.
Machine learning techniques are used in various sectors, from evaluating customer behavior in supermarkets to predicting customer phone usage, and agriculture has been using them for some years. Crop prediction is one of agriculture's complex challenges, and several models have been developed and evaluated so far. Because crop production is affected by many factors, such as atmospheric conditions, type of fertilizer, soil, and seed, this challenge requires several datasets. Predicting agricultural productivity is therefore not a simple process; rather, it entails a series of complicated procedures. Crop yield prediction methods can now reasonably approximate the actual yield, although better prediction performance is still desired.
The project aims to compare supervised learning algorithms, namely KNN, Decision Tree, and Random Forest, on a dataset containing 22 varieties of crops. For the Decision Tree and Random Forest Classifier, the model's performance is calculated under two criteria: Entropy and Gini Index. The effectiveness of each technique is evaluated in terms of accuracy, precision, recall, and F1 score.
2. Literature Survey
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
AICECS 2021 IOP Publishing
Journal of Physics: Conference Series 2161 (2022) 012033 doi:10.1088/1742-6596/2161/1/012033
Given the significance of crop prediction, numerous approaches have been proposed in the past with the goal of improving crop prediction accuracy. One study used a feed-forward back-propagation Artificial Neural Network to model and forecast various crop yields in rural areas based on soil parameters (pH, nitrogen, potassium, etc.) and atmospheric parameters (rainfall, humidity, etc.) [1].
The study in [2] looks at five of Tamil Nadu's most important crops (rice, maize, ragi, sugarcane, and tapioca) over a five-year period beginning in 2005. To obtain maximum crop productivity, factors such as rainfall, groundwater, cultivation area, and soil type were used in the analysis. The K-Means technique was used for clustering, and for classification the study examined three algorithms: fuzzy, KNN, and Modified KNN (MKNN). Of the three, MKNN gave the best prediction result.
An application can be created for farmers that helps reduce many problems in the agriculture sector [3]. In this application, farmers perform single or multiple tests by providing inputs such as crop name, season, and location. After providing the input, the user chooses a method and mines the outputs, which show the crop's yield rate. The datasets contain the previous year's data, transformed into a supported format. The machine learning models used are Naïve Bayes and KNN.
To create the dataset, information about crops over the previous ten years was gathered from a variety of sources, such as government websites. An IoT device was set up to collect atmospheric data using components such as soil sensors, a DHT11 sensor for humidity and temperature, and an Arduino Uno with an ATmega processor. Naive Bayes, a supervised learning algorithm, obtained an accuracy of 97%, which was further improved with a boosting algorithm that iteratively combines weak rules to achieve higher accuracy [5]. To anticipate the yield, another study employs advanced regression techniques such as ENet, Kernel Ridge, and Lasso [4]; these three regression techniques are improved using Stacking Regression for better prediction.
A comparison study between an existing system using Naive Bayes and a proposed system using Random Forest found that the proposed system comes out on top. Because it is a bagging method, the Random Forest algorithm achieves a high accuracy, whereas the Naïve Bayes classifier's accuracy is lower as the algorithm is probability based [6].
The work in [7] contributes the following: (a) crop production prediction using a range of machine learning approaches, with a comparison of error rate and accuracy for certain regions; (b) an easy-to-use mobile app that recommends the most profitable crop; (c) a GPS-based location identifier that can be used to obtain rainfall estimates for a specific location; and (d) a system that recommends the best time to apply fertilizers. On datasets from Karnataka and Maharashtra, machine learning algorithms such as KNN, SVM, MLR, Random Forest, and ANN were deployed and assessed for yield prediction accuracy [9]. When the accuracy of these algorithms is compared, the results show that Decision Tree is the most accurate of the standard algorithms used on the given datasets, with a 99.87% accuracy rate.
Regression Analysis (RA) is applied to determine the relationship between three factors (Area Under Cultivation, Food Price Index, and Annual Rainfall) and their impact on crop yield. The three factors are taken as independent variables, and crop yield is taken as the dependent variable. The R² values obtained from RA show slight differences between the three factors, indicating their impact on crop yield [8].
In [10], the dataset is collected from government websites such as the APMC website and VC Farm Mandya, and contains data related to climatic conditions and soil nutrients. Two machine learning models were used: a Support Vector Machine with a Radial Basis Function kernel for rainfall prediction, and a Decision Tree for crop prediction.
A comparative study of various machine learning techniques can be applied to a dataset to determine the best performing methodology. In [11], predictions are made by applying regression-based techniques (Linear, Random Forest, Decision Tree, Gradient Boosting, Polynomial, and Ridge) to a dataset containing details about types of crops, different states, and climatic conditions across seasons. The parameters used to estimate the efficiency of these techniques were mean absolute error, root mean square error, mean squared error, R-square, and cross validation. Gradient Boosting gave the best accuracy (87.9%) for the target variable 'Yield', and Random Forest gave the best accuracy (98.9%) for the target variable 'Production'.
The DHT22 sensor is recommended for monitoring live temperature and humidity [12]. It measures the surrounding air with a thermistor and a capacitive humidity sensor and outputs a digital signal on its data pin to an Arduino Uno port pin. It reads humidity in the range 0-100% RH and temperature in the range -40 to 80 degrees Celsius. These two parameters and soil characteristics are used as input to three machine learning models: Support Vector Machine, Decision Tree, and KNN. The Decision Tree gave the best accuracy.
Table 1. Approach to Crop Prediction
3. Proposed Work
3.1.1. KNN Algorithm. Step 1: Select K, the number of neighbors. K is the primary deciding factor of the algorithm.
Step 2: Determine the distance between the new data point and each training point using a distance measure such as the Euclidean distance, where x and y are n-dimensional feature vectors:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Step 3: Take the K nearest neighbors according to the calculated Euclidean distance.
Step 4: Count the number of data points in each class among these K neighbors.
Step 5: Assign the new data point to the class with the highest number of neighbors.
Step 6: The label decided by this vote is returned as the prediction, and the model is ready.
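The KNN steps above can be sketched in plain Python as follows. This is a minimal, illustrative implementation using only the standard library, not the code used in this work (which relies on scikit-learn); the function names `euclidean_distance` and `knn_predict` and the toy nitrogen/phosphorus/potassium readings are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbors."""
    # Step 2: distance from the query to every training point.
    distances = [(euclidean_distance(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Steps 4-5: count the classes among the neighbors and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [nitrogen, phosphorus, potassium] readings with a crop label.
train_X = [[90, 42, 43], [85, 58, 41], [60, 55, 44], [20, 67, 20], [22, 72, 18], [25, 60, 22]]
train_y = ["rice", "rice", "rice", "pigeonpeas", "pigeonpeas", "pigeonpeas"]

print(knn_predict(train_X, train_y, [80, 50, 40], k=3))  # prints: rice
```

In scikit-learn the same idea is available as `KNeighborsClassifier(n_neighbors=k)`, whose default Minkowski metric with p=2 is the Euclidean distance.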
3.2.1. Decision Tree Algorithm. Step 1: Start with the root node of the tree, which contains the entire dataset, say S.
Step 2: Find the most appropriate attribute in the dataset by applying an Attribute Selection Measure (ASM).
Step 3: Divide S into subsets, one for each possible value of the most appropriate attribute.
Step 4: Create a decision tree node for the most appropriate attribute.
Step 5: Build the tree by recursively repeating this process for each child until one of the following conditions is met:
• All tuples in the node belong to the same class.
• No further attributes are available.
• No instances remain.
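A compact sketch of the recursive procedure in Steps 1-5, assuming categorical attributes and information gain (entropy) as the attribute selection measure, i.e. an ID3-style tree. The helper names and the toy rainfall/soil records are hypothetical; this is not the scikit-learn implementation, which builds binary CART trees.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, using a base-2 logarithm."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy(S) minus the weighted entropy of the subsets produced by splitting on `attr`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def build_tree(rows, labels, attrs, parent_majority=None):
    """Recursively build an ID3-style tree: internal nodes are dicts, leaves are class labels."""
    if not labels:  # no instances left (guard): fall back to the parent's majority class
        return parent_majority
    if len(set(labels)) == 1:  # all tuples belong to the same class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:  # no further attributes: predict the majority class
        return majority
    # Step 2: the attribute with the highest information gain is the most appropriate one.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attrs if a != best]
    # Steps 3-5: split S on each observed value of `best` and recurse into every subset.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, majority
        )
    return node


def predict(tree, row):
    """Follow attribute tests from the root to a leaf; returns None for unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attribute"]])
    return tree


# Hypothetical toy records with categorical attributes.
rows = [
    {"rainfall": "high", "soil": "clay"},
    {"rainfall": "high", "soil": "loam"},
    {"rainfall": "low", "soil": "clay"},
    {"rainfall": "low", "soil": "loam"},
    {"rainfall": "medium", "soil": "clay"},
    {"rainfall": "medium", "soil": "loam"},
]
labels = ["rice", "rice", "chickpea", "chickpea", "rice", "maize"]

tree = build_tree(rows, labels, ["rainfall", "soil"])
print(predict(tree, {"rainfall": "medium", "soil": "loam"}))  # prints: maize
```

Here rainfall is chosen first because it separates the classes far better than soil; only the mixed "medium" branch needs a second split.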
3.2.2. Steps to Split. The dataset used in the project has numerical values. The decision tree handles numerical values as follows:
Step 1: Sort all the values.
Step 2: Take one of the feature values as a threshold.
Step 3: Split the feature values into two parts, such that the left node contains feature values less than the threshold and the right node contains feature values greater than the threshold.
Step 4: Take the next feature value as the threshold and create the same split as in Step 3.
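Step 5: Calculate the Entropy/Gini and the Information Gain of each split, and keep the split with the better information gain. The average information of an attribute is
$$I(\text{Attribute}) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\,\mathrm{Entropy}(S_i)$$
where $S_i$ is the subset of samples taking the $i$-th of the attribute's $v$ values, $p_i$ denotes the number of yes values and $n_i$ the number of no values in that subset, and $p$ and $n$ are the numbers of yeses and noes in the entire sample, respectively. The information gain is
$$\mathrm{Gain}(\text{Attribute}) = \mathrm{Entropy}(S) - I(\text{Attribute})$$
where Entropy(S) denotes the entropy of sample S and I(Attribute) denotes the average information of the particular attribute.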
Step 6: Repeat Steps 2 to 5. In this way, the branches of the decision tree are obtained.
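The threshold search for a single numerical feature in Steps 1-5 might look like the sketch below. It rests on stated assumptions: values equal to the threshold are sent to the right node (the text only specifies "less than" and "greater than"), entropy is used as the impurity measure (Gini could be swapped in), and the function name and rainfall values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, using a base-2 logarithm."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Try each distinct feature value as a threshold; return (threshold, gain) of the best split."""
    parent = entropy(labels)
    n = len(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values)):  # Steps 1, 2 and 4: sorted candidate thresholds
        # Step 3: values below the threshold go left, the rest go right (assumption for ties).
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        if not left or not right:  # skip splits that leave one node empty
            continue
        # Step 5: information gain of this split; keep it only if it beats the best so far.
        weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = parent - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical rainfall readings (mm) and crop labels.
rainfall = [50, 60, 80, 200, 220, 250]
crops = ["chickpea", "chickpea", "chickpea", "rice", "rice", "rice"]
print(best_threshold(rainfall, crops))  # prints: (200, 1.0)
```

A gain of 1.0 means the split at 200 separates the two crops perfectly, removing all of the parent node's entropy.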
3.2.3. Entropy and Gini Index. Entropy and the Gini Index are the two criteria used to measure the quality of a split. Information gain measures how much information an attribute provides, i.e., the reduction in entropy achieved by splitting on it. Entropy and Gini Index measure the impurity of a node: a node is considered impure if it contains samples from multiple classes; otherwise it is considered pure.
Entropy is a metric that gives the degree of impurity in a specified attribute. The following
formula can be used to compute entropy:
$$\mathrm{Entropy}(S) = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no})$$
where S denotes the complete sample, P(yes) denotes the probability of yes, and P(no) denotes the probability of no.
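As a worked example, assume a hypothetical sample S of 14 instances with 9 yes and 5 no labels, so P(yes) = 9/14 and P(no) = 5/14:

```latex
\mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14}
\approx 0.4098 + 0.5305 = 0.9403 \approx 0.940
```

An evenly split sample (7 yes, 7 no) would reach the maximum entropy of 1, and a pure sample gives 0.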
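Gini Index: the Gini index is computed by subtracting the sum of the squared probabilities of each class from one. A lower Gini index value is preferred over a higher one. In scikit-learn, the decision tree and random forest classifiers use criterion="gini" (the Gini index) by default, and criterion="entropy" selects information gain.
$$\mathrm{Gini}(S) = 1 - \sum_{j=1}^{c} P_j^2$$
where $c$ is the number of classes and $P_j$ is the probability of class $j$ in S.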
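The two impurity measures can be compared side by side with a short sketch. The functions take per-class counts and are hypothetical helpers, not scikit-learn's internals; both are zero for a pure node and largest for an evenly mixed one.

```python
import math


def entropy(counts):
    """Entropy(S) = -sum(P_j * log2(P_j)), skipping classes with a zero count."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini(S) = 1 - sum(P_j ** 2) over all classes."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical per-class counts (yes, no) for two nodes.
for counts in ([7, 7], [9, 5]):
    print(counts, round(entropy(counts), 3), round(gini(counts), 3))
# prints:
# [7, 7] 1.0 0.5
# [9, 5] 0.94 0.459
```

Both measures rank nodes in the same order here. In scikit-learn the choice between them is the `criterion` parameter, e.g. `DecisionTreeClassifier(criterion="entropy")` or `RandomForestClassifier(criterion="gini")`.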
3.3.1. Random Forest Algorithm. Step 1: K instances are chosen at random from the given training dataset.
Step 2: A decision tree is built for the chosen instances.
Step 3: N, the number of estimators (trees) to be created, is selected.
Step 4: Steps 1 and 2 are repeated until N trees have been built.
Step 5: For a new instance, the prediction of each estimator is determined, and the category with the most votes is assigned.
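The five Random Forest steps can be sketched as below. To keep the example short, each "tree" is a one-split decision stump chosen by Gini impurity, the K instances are drawn with replacement (bootstrap sampling, an assumption; the text only says "at random"), and random feature subsampling is omitted. All function names and the toy data are hypothetical. In practice the library route is scikit-learn's `RandomForestClassifier`, whose `n_estimators` parameter corresponds to N.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a one-split tree by minimizing weighted Gini impurity.

    Returns (feature, threshold, left_label, right_label); rows with
    row[feature] < threshold go left. If no split reduces impurity,
    feature is None and the stump always predicts the majority class.
    """
    best = (None, None, majority(y), majority(y))
    best_score = gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] < t]
            right = [label for row, label in zip(X, y) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score, best = score, (f, t, majority(left), majority(right))
    return best


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if feature is None:
        return left_label
    return left_label if row[feature] < threshold else right_label


def fit_forest(X, y, n_estimators=25, sample_size=None, seed=0):
    """Steps 1-4: draw K random instances (with replacement) and fit one tree per draw, N times."""
    rng = random.Random(seed)
    k = sample_size or len(X)
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(k)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Step 5: each tree votes and the category with the most votes is assigned."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical one-feature data (e.g. rainfall in hundreds of mm) with two crops.
X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = ["chickpea"] * 4 + ["rice"] * 4
forest = fit_forest(X, y, n_estimators=25, seed=0)
print(predict_forest(forest, [1]), predict_forest(forest, [9]))
```

Because each tree sees a different random sample, individual trees may place their thresholds differently; the majority vote in Step 5 smooths out those differences.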
4. Methodology
4.8. Accuracy
Model accuracy is the number of correct predictions divided by the total number of predictions. The accuracy of the model is calculated using the accuracy_score() method of the scikit-learn metrics module. In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
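A standard-library sketch of the accuracy formula, plus the precision, recall, and F1 score mentioned in the introduction, for a binary view of the labels in which one crop is treated as the positive class. The function names and toy labels are hypothetical; `sklearn.metrics.accuracy_score(y_true, y_pred)` returns the fraction of exact matches, which is what `accuracy` below computes.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Correct predictions divided by total predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for a two-crop example.
y_true = ["rice", "rice", "maize", "maize", "rice"]
y_pred = ["rice", "maize", "maize", "maize", "rice"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="rice")
print(tp, tn, fp, fn)                   # prints: 2 2 0 1
print((tp + tn) / (tp + tn + fp + fn))  # prints: 0.8
print(accuracy(y_true, y_pred))         # prints: 0.8
```

With more than two classes, the TP/TN formula describes one class against the rest, whereas `accuracy` still counts exact matches across all classes.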
5. Results
Figure 5. KNN Prediction: comparison of predicted values and actual values
As observed from Figure 5, points along the straight line depict correct predictions, and points outside the straight line are wrong predictions. For example, one sample whose actual value is pigeon peas has been predicted as blackgram, which is a false prediction.
Figure 8. Random Forest Entropy Criterion Prediction: comparison of predicted values and actual values
Figure 9. Random Forest Gini Criterion Prediction: comparison of predicted values and actual values
References
[1] Dahikar S and Rode S V 2014 Agricultural crop yield prediction using artificial neural network
approach International Journal of Innovative Research in Electrical, Electronics,
Instrumentation and Control Engineering vol 2 Issue 1 pp 683-6.
[2] Suresh A, Ganesh P and Ramalatha M 2018 Prediction of major crop yields of Tamilnadu using
K-means and Modified KNN 2018 3rd International Conference on Communication and
Electronics Systems (ICCES) pp 88-93 doi: 10.1109/CESYS.2018.8723956.
[3] Medar R, Rajpurohit V S and Shweta S 2019 Crop yield prediction using machine learning
techniques IEEE 5th International Conference for Convergence in Technology (I2CT) pp 1-5
doi: 10.1109/I2CT45611.2019.9033611.
[4] Nishant P S, Venkat P S, Avinash B L and Jabber B 2020 Crop yield prediction based on Indian
agriculture using machine learning 2020 International Conference for Emerging Technology
(INCET) pp 1-4 doi: 10.1109/INCET49848.2020.9154036.
[5] Kalimuthu M, Vaishnavi P and Kishore M 2020 Crop prediction using machine learning 2020
Third International Conference on Smart Systems and Inventive Technology (ICSSIT) pp
926-32 doi: 10.1109/ICSSIT48917.2020.9214190.
[6] Geetha V, Punitha A, Abarna M, Akshaya M, Illakiya S and Janani A P 2020 An effective crop
prediction using random forest algorithm 2020 International Conference on System,
Computation, Automation and Networking (ICSCAN) pp 1-5 doi:
10.1109/ICSCAN49426.2020.9262311.
[7] Pande S M, Ramesh P K, Anmol A, Aishwaraya B R, Rohilla K and Shaurya K 2021 Crop
recommender system using machine learning approach 2021 5th International Conference
on Computing Methodologies and Communication (ICCMC) pp 1066-71 doi:
10.1109/ICCMC51019.2021.9418351.
[8] Sellam V and Poovammal E 2016 Prediction of crop yield using regression analysis Indian