2023 International Conference on Data Science, Agents, and Artificial Intelligence

Efficient Crop Yield Analysis Prediction in Modern Agriculture System Using Machine Learning Algorithm
2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI) | 979-8-3503-4891-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICDSAAI59313.2023.10452646

N. Rajkumar
Computer Science and Engineering
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Chennai, India, [email protected]

M. A. Mukunthan
Computer Science and Engineering
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Chennai, India, [email protected]

Authorized licensed use limited to: Institute of Space Technology. Downloaded on March 20,2024 at 09:10:28 UTC from IEEE Xplore. Restrictions apply.

Abstract— In agriculture, yield estimation accuracy is crucial. To increase production while decreasing operational costs and environmental effects, remote sensing (RS) systems are increasingly being employed in developing decision-support tools for modern agricultural systems. However, because RS-based technologies need to analyze vast quantities of remotely sensed data from many platforms, more focus is being placed on machine learning techniques, since systems based on machine learning can handle non-linear tasks and process a huge number of inputs. Machine learning is an essential decision-making tool for forecasting agricultural yields: it helps decide which crops to plant and what to do during the growing season. Several machine learning methods have been used for this purpose. We take into account a few factors that influence crop productivity: rainfall, pH, temperature, humidity, and nutrients. A machine learning model is built from the data for these parameters in order to assess and forecast the ideal crop.

Keywords— Machine Learning, Crop yield, Agriculture, Random Forest, Remote sensing.

I. INTRODUCTION

India is mostly an agricultural country. Two thirds of the population depends on agriculture for their living. It acts as the cornerstone of the nation's economy, and a relatively large percentage of the population finds work in agriculture. Crop yield is one of the problems that farmers have faced for many years. In the past, crop production predictions depended on the particular crop and on the cultivation expertise of the farmer. To address this problem, we propose a method that collects existing data and, based on it, estimates which crop gives the best yield using machine learning algorithms. For the best accuracy and crop prediction, we consider four main machine learning algorithms: SVM, naive Bayes, random forest, and decision tree. Components such as nitrogen, phosphorus, sulphur, humidity, rainfall, and pH are considered as inputs for crop prediction.

Our main aim is to analyse crop yield with an improved degree of accuracy using machine learning algorithms. In this study, crop yield is predicted using supervised learning techniques. The Random Forest, Naive Bayes, SVM, and Decision Tree algorithms are all employed to predict yield.

Machine learning is a method by which artificial intelligence (AI) systems automatically learn from experience and improve over time. Machine learning algorithms and techniques focus on building computer systems that can access data and use it to improve their own performance. Machine learning algorithms are categorized into two main subcategories: supervised and unsupervised. Only supervised algorithms are used in our project.

Supervised algorithms require a person skilled in machine learning to supply both the input and the desired output. The algorithm is first applied to previously studied data and then to new data. During training, it looks for discrepancies between its output and the correct, intended outcome and modifies the model accordingly.

In the decision tree method, the predictions that arise from a sequence of feature-based splits are represented by a flowchart that resembles a tree. It begins at the root node and ends with the decision made at the leaves. Decision trees can solve both classification and regression problems.

The naive Bayes algorithm is based on Bayes' theorem and operates under the assumption of predictor independence. Put simply, the naive Bayes classifier assumes that the presence of one feature in a class has no effect on the presence of any other feature.

With support vector machines (SVM), every data point is represented as a point in n-dimensional space, where n is the number of features, and the value of each feature is a specific coordinate. Classification then consists of finding the hyperplane that best divides the two groups. Support vectors are simply the coordinates of individual observations. The hyperplane (or line) used by the SVM classifier is the frontier that most effectively separates the two groups.

Random forest is a popular supervised machine learning method for regression and classification problems. It constructs decision trees on many different samples and uses the average of the trees' outputs for regression and their majority vote for classification. One of the random forest algorithm's primary features is its capacity to handle data sets containing both continuous variables, as used in regression, and categorical variables, as used in classification. It produces superior results for classification problems.

Our goal is to maximise crop productivity by utilising machine learning algorithms that provide more precision by analysing many characteristics, including nutrients, humidity, temperature, pH, and rainfall.

II. LITERATURE SURVEY

"The crop yield prediction using data analysis and hybrid approach" was put into practice in 2018 by B. Zhu et al. It addresses crop yield production by utilising a variety of existing data analysis methods [1].

"The Fuzzy logic based crop yield prediction using Temperature and Rainfall" was presented by E. Khosla et al. in 2019 [2]. It considers the water content and temperature increase of the soil for plant growth.

In 2021, "The Hybrid prediction strategy for predicting agricultural information" was discussed by F. H. Tseng et al. [3]. It claims that a hybrid prediction model can be used to assess crop information from a field.

2019 saw the implementation of "A Comparative Study of Chemical Components between New and Old Methodologies in Farming" by M. Alizamir et al. [4]. This addresses the yield production that is influenced by internal chemical components including potassium, sulphur, nitrogen, humidity, and water level.

A comparison of a novel method and other machine learning techniques for estimating soil organic carbon and total nitrogen using near infrared spectroscopy was applied by the authors of [5] in 2020. It also takes into account the soil yield level, which aids healthier growth of the plant or crop.

P. S. Maya Gopal and R. Bhargavi implemented "The Advanced machine learning model for better prediction accuracy of soil temperature at various depths" in 2020 [6], which claims that crop output is produced using a variety of different data analysis methods.

"The Applying machine learning for intelligent agriculture-based crop selection analysis" was introduced and put into practice in 2019 [7] by R. Reda et al. It includes a number of techniques, such as the random forest regressor, which helps make high-accuracy predictions even with a small dataset.

III. METHODOLOGY

A. Existing System

The existing system deals with fields such as state name, crop name, production, cost, and the year of cultivation. The crops included belong to only one season and most crops are repeated in the dataset, so the results obtained on this repeated data are of limited use.

B. Proposed System

The proposed system contains some internal factors which help crop growth, such as potassium, sulphur, nitrogen, humidity, and water level. Execution is done with a machine learning model, the random forest regressor, which helps generate high-accuracy predictions even with a small amount of data; both training and testing are done using this module. Machine learning is used in the developed framework to anticipate the optimum crop output. The suggested model is evaluated on a dataset of crops. The crop is chosen while taking into account the soil's composition, the local environment, and the current meteorological conditions.

Four algorithms were employed, and we chose the one that predicts most accurately. The algorithms we used are Decision Tree, Naive Bayes, SVM, and Random Forest. Random Forest predicted more accurately, so we used it as the final algorithm to predict the best crop.
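The selection step in the proposed system, training several candidate models and keeping the most accurate one, can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the two candidates are stand-ins (a majority-class baseline and a much simplified random forest built from one-split trees thresholded at the feature mean), the function names are hypothetical, and the rainfall and humidity rows are made up for the example.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_baseline(rows):
    """Stand-in candidate: always predict the most common training label."""
    label = majority([y for _, y in rows])
    return lambda x: label


def fit_stump(rows, feature):
    """One-split tree: threshold a feature at its mean, predict each side's majority."""
    threshold = sum(x[feature] for x, _ in rows) / len(rows)
    overall = majority([y for _, y in rows])
    left = [y for x, y in rows if x[feature] <= threshold]
    right = [y for x, y in rows if x[feature] > threshold]
    left_label = majority(left) if left else overall
    right_label = majority(right) if right else overall
    return lambda x: left_label if x[feature] <= threshold else right_label


def fit_forest(rows, n_trees=25, seed=0):
    """Simplified random forest: bootstrap samples, random feature, majority vote."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        trees.append(fit_stump(sample, rng.randrange(n_features)))
    return lambda x: majority([tree(x) for tree in trees])


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def select_best(candidates, train, test):
    """Fit each candidate on train, score it on test, return the best name and all scores."""
    scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Made-up rows for illustration: (rainfall_mm, humidity_pct) -> crop
train = [((200.0, 80.0), "rice"), ((205.0, 80.5), "rice"),
         ((210.0, 81.0), "rice"), ((220.0, 82.0), "rice"),
         ((60.0, 20.0), "wheat"), ((65.0, 21.0), "wheat"), ((70.0, 22.0), "wheat")]
test = [((215.0, 81.5), "rice"), ((62.0, 20.5), "wheat")]

candidates = {"baseline": fit_baseline, "bagged_stumps": fit_forest}
best_name, scores = select_best(candidates, train, test)
print(best_name, scores)
```

A production random forest searches for the best split at every node and grows deeper trees; the sketch keeps only the two ideas the text relies on, voting across trees trained on different samples and picking the model with the highest held-out accuracy.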

General Architecture

Fig 3.1: Architecture Diagram

Data from the dataset is preprocessed to remove null and repeated data. The data is then divided into training and test groups, RF, Naive Bayes, DT, and SVM are compared on the training data, and the best algorithm is used to forecast the most suitable crop for the designated area.

Fig 3.2: Data Flow Diagram

C. Design Phase

Once the data has been collected, it goes to the preprocessing model, which removes redundant values and produces a clean dataset. The data then passes through SVM, Random Forest, Decision Tree, and Naive Bayes, the training and testing results are obtained, and the output is presented graphically.

D. Algorithm
step 1: Start the process.
step 2: Load the data set into Jupyter Notebook.
step 3: Separate the data into training and testing sets.
step 4: Apply machine learning algorithms such as SVM, NB, DT and RF.
step 5: Find the accuracy of every machine learning algorithm.
step 6: Fit the model with the algorithm that gives the highest accuracy.
step 7: Open Visual Studio and load the code into it.
step 8: Create a web page using HTML and CSS and collect the data from the user.
step 9: Pass this data to the machine learning algorithm.
step 10: Return the output to the web page using the Python framework Flask.
step 11: Exit the process.

E. Module Description

Collecting of Data for Dataset

We gathered information from many sources and prepared a data collection, and descriptive analysis is performed on this data. Data is accessible through a number of online sources, including data.gov.in and Kaggle.com.

Preprocessing the Data

This phase of machine learning is crucial. Preprocessing involves identifying the right data range, adding the missing values, and extracting the relevant features. The dataset's type is crucial to the analysis process. Here, we use the isnull() method to check for null values and the LabelEncoder() function to turn categorical input into numerical data.

Splitting the Data and Applying Machine Learning Modules

The input data is trained on and tested in this stage. The loaded data is divided into two groups, training and test data, using a division ratio of 80% to 20% (0.8 and 0.2). The classifier is built from the readily available input data in the training set. In this stage, the classifier's supporting data and assumptions are constructed so that it can approximate the target function and categorise inputs. The testing process then evaluates the classifier on the test data. The final data produced by preprocessing is what the machine learning module processes.

We use four algorithms in our research to forecast crop production: Decision Tree, SVM, Naive Bayes, and Random Forest are used to forecast which crop is suitable and what it will yield.

IV. IMPLEMENTATION AND DISCUSSION

A. Input and Output

This is the dataset we consider for the machine learning algorithms, chosen to give the best accuracy. The data does not contain any null or repeated values.

Figure 4.1: Input Dataset

Output Design

White-box models are those whose behavior, forecasts, and influencing factors can all be well described. In this project we check which factors affect the Random Forest algorithm and the predictions it produces. White-box testing allows the developer to know the weaknesses and strengths of the project from the internal view of the code.

Figure 5.2: Best Accuracy of all Algorithms and Predicting the best Crop

In the graph, we see that random forest gives higher accuracy than all the other algorithms and also predicts the suitable crop.

B. Efficiency of the Proposed System

The proposed project is developed with extra internal attributes from the dataset, like nitrogen, potassium, sulphur, humidity, and other elements. We have chosen four machine learning algorithms to generate accurate predictions even with fewer attributes. The training and testing modules we used helped us achieve higher accuracy.
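The preprocessing and 80/20 split described in the Module Description can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions rather than the project's code: the paper names isnull() and LabelEncoder(), while the helpers below (`preprocess`, `label_encode`, `train_test_split`) are plain-Python stand-ins written for this example, and the records are made up.

```python
import random


def preprocess(rows):
    """Drop rows with a missing (None) value and exact duplicates, keeping order."""
    seen, clean = set(), []
    for row in rows:
        key = tuple(row)
        if None in key or key in seen:
            continue
        seen.add(key)
        clean.append(list(row))
    return clean


def label_encode(labels):
    """Map each distinct label to an integer code, assigned in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up records: nitrogen, temperature (C), humidity (%), pH, rainfall (mm), crop
raw = [
    [90, 21.0, 82.0, 6.5, 200.0, "rice"],
    [90, 21.0, 82.0, 6.5, 200.0, "rice"],      # exact duplicate, removed
    [85, 22.0, None, 7.0, 220.0, "rice"],      # missing humidity, removed
    [70, 23.0, 64.0, 5.8, 90.0, "maize"],
    [40, 17.0, 17.0, 7.4, 80.0, "chickpea"],
    [20, 27.0, 88.0, 6.3, 100.0, "banana"],
    [88, 24.0, 81.0, 6.9, 210.0, "rice"],
]

clean = preprocess(raw)
codes, mapping = label_encode([row[-1] for row in clean])
encoded = [row[:-1] + [code] for row, code in zip(clean, codes)]
train, test = train_test_split(encoded, test_ratio=0.2)
print(len(clean), mapping, len(train), len(test))
```

Fixing the seed makes the split repeatable, which matters when the accuracies of several algorithms are compared on the same test rows.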

Comparison of Existing and Proposed Systems

Table 4.1: Comparison of Existing and Proposed System

Sl.No  Existing System                        Proposed System
1      Includes parameters like crop name,    Includes parameters like nitrogen,
       yield, production, and cost.           potassium, sulphur, and humidity.
2      Output contains the whole data from    Output contains extra data from the
       the dataset.                           dataset, which includes internal values.
3      Prediction accuracy is high with the   Prediction accuracy is high even with
       normal dataset.                        less data by using ML algorithms.
4      Accuracy prediction for the dataset.   Accuracy prediction for both the
                                              training and testing datasets.

Figure 6.2: Displaying the Best Crop

The figure shows that the recommended crop is based on components such as nitrogen, phosphorus, temperature, humidity, pH, rainfall, and potassium.

V. CONCLUSION

The suggested model is built using ML algorithms to address the issue of farmers losing money on their farms because they don't understand how to grow different crops in various soil types and climates. SVM, Naive Bayes, RF, and Decision Tree approaches are used to build the model. Based on its examination of the prediction parameters, the model recommends the best crops to grow on the least expensive land. Our literature search found no other work that predicts harvests using the same methods. As a result, it can be said that this study achieves a higher level of accuracy than previous studies that employed other prediction methodologies.

REFERENCES

[1] B. Zhu, Y. Feng, D. Gong, S. Jiang, L. Zhao, and N. Cui, “Hybrid particle swarm optimization with extreme learning machine for daily reference evapotranspiration prediction from limited climatic data,” Comput. Electron. Agricult., vol. 173, Jun. 2020, Art. no. 105430.
[2] E. Khosla, R. Dharavath, and R. Priya, “Crop yield prediction using aggregated rainfall-based modular artificial neural networks and support vector regression,” Environ Dev Sustain, Aug. 2019.
[3] F. H. Tseng, H. H. Cho, and H. T. Wu, “Applying big data for intelligent agriculture-based crop selection analysis,” IEEE Access, vol. 7, pp. 116965-116974, 2019.
[4] M. Alizamir, O. Kisi, A. N. Ahmed, C. Mert, C. M. Fai, S. Kim, N. W. Kim, and A. El-Shafie, “Advanced machine learning model for better prediction accuracy of soil temperature at different depths,” PLoS ONE, vol. 15, no. 4, Apr. 2020, Art. no. e0231055.
[5] M. G. P. S. and B. R., “Performance Evaluation of Best Feature Subsets for Crop Yield Prediction Using Machine Learning Algorithms,” Applied Artificial Intelligence, vol. 33, no. 7, pp. 621–642, Jun. 2019.
[6] P. S. Maya Gopal and R. Bhargavi, “Optimum Feature Subset for Optimizing Crop Yield Prediction Using Filter and Wrapper Approaches,” Applied Engineering in Agriculture, vol. 35, no. 1, pp. 9–14, 2019.
[7] R. Reda, T. Saffaj, B. Ilham, O. Saidi, K. Issam, L. Brahim, and E. M. El Hadrami, “A comparative study between a new method and other machine learning algorithms for soil organic carbon and total nitrogen prediction using near infrared spectroscopy,” Chemometric Intell. Lab. Syst., vol. 195, Dec. 2019, Art. no. 103873.
