Machine Learning for Crop Yield Prediction

Uploaded by

ankuro_2

Crop Management and Forecasting using Machine Learning

Ankit Ojha
Department of Information Technology
Galgotias College of Engineering and Technology (AKTU)
Knowledge Park 2, Greater Noida
ankit2705ojha@[Link]

Anmol Dwivedi
Department of Information Technology
Galgotias College of Engineering and Technology (AKTU)
Knowledge Park 2, Greater Noida
anmoldwivedi@[Link]

Ankur Ojha
Department of Mechanical Engineering
Galgotias College of Engineering and Technology (AKTU)
Knowledge Park 2, Greater Noida
ankur15ojha@[Link]

Ajay Kumar
Department of Information Technology
Galgotias College of Engineering and Technology (AKTU)
Knowledge Park 2, Greater Noida
ajay.2606@[Link]

Abstract— Agriculture plays a predominant role in the development of the nation's economy. Climate and other environmental changes have become a significant threat to the agricultural sector. Machine learning (ML) is an essential approach for achieving practical and effective solutions to this problem. Crop yield prediction involves forecasting the yield of a crop from available historical data such as weather parameters, soil parameters, and historical crop yield. This paper focuses on predicting the yield of a crop from existing data by using K-means clustering and the Random Forest algorithm. Real data from India was used for building the models, and the models were tested with samples. The prediction will help the farmer to foresee the yield of the crop before cultivating it on the agricultural field. To predict future crop yield accurately, K-means clustering and Random Forest, a powerful and popular supervised machine learning algorithm, are used.

Keywords— Yield Analysis; Product Yield; Machine learning; Result Prediction; Random Forest Algorithm; decision tree Algorithm; k-means clustering; ensemble approach.

I. INTRODUCTION
Farming and agriculture are the backbone of every nation. In countries like India, where a rapidly growing population creates an ever-rising demand for agricultural products and food, advances in farming are needed to meet these requirements. Since early times, farming has been regarded as the foremost way of life in countries like India. Earlier generations grew crops on their own farms and thereby met their own needs. Natural crops were grown and consumed by many living beings, such as people, animals, and birds. The healthy produce grown on the farm led to a good and healthy life. With the arrival of new tools and technologies, the farming sector is gradually diminishing. As a result, people increasingly focus on cultivating artificial, hybrid food products, which lead to an unhealthy lifestyle and are not good for human beings. Today, most people do not know how to grow crops at the right time and in the right place. Because of these cultivation methods, seasonal climatic conditions are also changing to the detriment of fundamental resources such as soil, water, and air, which leads to food insecurity. After examining all these issues, such as weather, temperature, and several other factors, we find that in India there are no adequate processes, tools, or techniques to overcome the situation we face. We studied several ways to increase economic growth in the field of agriculture. There are various ways to increase and improve crop yield and crop quality. Machine learning is also useful for predicting crop yield. Broadly, machine learning is the study of computer algorithms that improve automatically through experience and the use of data. Machine learning tools allow users to analyze data from many different perspectives, categorize it, and summarize the relationships identified. In more technical terms, it is the process of finding correlations and patterns among hundreds of fields in large structured and unstructured datasets. The patterns, associations, or relationships among all this data can provide useful information. Raw data can be organized into structured groups using clustering algorithms such as k-means, and different classification algorithms can then be applied to classify the result.

This reveals historical patterns and future trends. For example, summary information about crop production can help farmers identify crop losses and prevent them in the future. Crop yield forecasting is a major agricultural problem faced by every farmer. Indian farmers always try to gather information about the expected yield so that they can prepare before growing a particular crop and make a profit from it. In the early days, crop forecasting was
estimated from the farmer's prior experience with a particular crop and its production. Agricultural yield depends primarily on weather conditions, pests, soil condition, moisture, the distance from the field to the market, and the planning of crop cultivation. Accurate information about the history of crop production is essential for making decisions related to agricultural risk and its management. This paper therefore proposes an approach to forecast the yield of a crop on farmland. The farmer can check the expected yield against requirements such as water needed, fertilizers, temperature, soil condition, and the size of the land for the crop.

II. LITERATURE REVIEW
The purpose of this literature review is to study 10 years of research papers on crop yield prediction. We investigated soft computing techniques and algorithms to determine which is currently most efficient for our objective. [R1] The SVM technique is very useful for estimating soil and moisture using remote sensing data. Each technique has its own merits and demerits, so by fusing more than one technique we can get the desired results. K-means clustering for grouping data, together with random forest regressors and decision tree regressors, are the most efficient techniques for obtaining more accurate results from the data set. Including climate data, the economic condition of the country, and pollution data should yield a more satisfactory prediction.

Machine learning (ML) deals with problems where the relationship between input and output variables is unknown or hard to obtain. The word "learning" here denotes the automated acquisition of structural descriptions from examples of what is being described. Unlike traditional statistical methods, ML does not make assumptions about the correct structure of the model that describes the data. This characteristic is very useful for modelling complex non-linear behaviour, such as a function for crop yield prediction and management. ML techniques are commonly applied to crop yield prediction.

An unsupervised machine learning algorithm such as K-means is used to group the data based on similar behaviors in crop prediction. A supervised machine learning algorithm works with separate input and output variables: the output is to be predicted from a given set of predictors. Using this set, we build a function that maps the input parameters to the desired output. Training continues until the model reaches the desired level of accuracy on the training dataset. Examples of supervised learning are the linear regression algorithm, the random forest algorithm, the support vector machine, and the decision tree.

Examples of unsupervised machine learning algorithms are K-means clustering and the Apriori algorithm.

III. METHODOLOGY
In machine learning and data analytics, the ensemble approach uses multiple learning algorithms to achieve better predictive performance than could be obtained from any of the constituent algorithms alone. In other words, the ensemble approach is a machine learning methodology that combines several base models to produce one optimal and more accurate predictive model.

We use a custom two-level ensemble approach to solve this problem. The first level is an unsupervised machine learning algorithm that clusters the data into different groups; the second level applies supervised machine learning algorithms.

Types of Learning

1. Unsupervised Machine Learning

Unsupervised machine learning algorithms infer patterns from a dataset without reference to known or labelled outcomes. Unlike supervised machine learning, unsupervised techniques cannot be applied directly to a regression or classification problem, because the values of the output data are unknown, which makes it impossible to train the algorithm in the usual way. Unsupervised learning can instead be used to discover the underlying, unobserved structure of the dataset. In other words, unsupervised machine learning can be viewed as a technique for labelling unlabelled data.

Examples: K-means clustering, the Apriori algorithm, principal component analysis, and singular value decomposition.

Fig.1. Unsupervised Machine Learning

2. Supervised Machine Learning Algorithm

Supervised learning is the most common sub-branch of machine learning today. Newcomers to machine learning usually begin with supervised learning algorithms.

Supervised machine learning algorithms are designed to learn by example. The name "supervised" learning comes from the idea that training this type of algorithm is like having a teacher who guides you throughout the process.

When training a supervised machine learning algorithm, the training data must consist of labelled inputs paired with the correct outputs. Throughout the training process, the algorithm searches for common patterns in the available data that correlate
with the desired outputs. After training, a supervised learning algorithm takes in new, unseen inputs and determines which label they should be assigned, based on the prior training data. The goal of a supervised learning model is to predict the correct label for newly presented input data. In mathematical terms, a supervised learning algorithm can be written simply as

y = f(x)

where y is the predicted output, determined by a mapping function f that assigns a class to a particular input x. The function that connects input features to a predicted output is created by the machine learning model during training.

[Link] of defined function

IV. OUTCOMES

Tools Used

We used Python as the core of this project and for the research work, with Python libraries such as scikit-learn, pandas, and NumPy for prediction and calculation. For the backend we used the Flask web framework, and for the front end we used HTML, CSS, and JavaScript to show the results on a webpage.

Dataset Used

All datasets used in this research work are sourced from the government site [Link] and open sources like Kaggle. Data was collected between 2012 and 2018 for 23 different crops that are mostly cultivated in India. The dataset contains features such as rainfall and WPI along with their production month.

First, we store all the CSV data files in a crop dictionary keyed by the name of the crop. We store the base price of all crops in the base dictionary and the rainfall for the current year in the rainfall list.

Figure: Next 12 month price prediction of wheat

Figure: Previous 12 month price prediction of wheat

The parameters chosen for this study are listed below:
• month of production
• year of production
• rainfall during that particular period
• WPI (wholesale price index) for that particular period

Work flow

a) Decision Tree:

Fig.3. Decision Tree

A decision tree is a graphical representation of specific decision situations that arise when complex branching occurs in a structured decision process. A decision tree is a predictive model based on a branching series of Boolean tests that use specific facts to reach more generalized conclusions.

The main components of a decision tree are decision points represented by nodes, actions, and specific choices from a decision point. Each rule within a decision tree is represented by tracing a series of paths from the root to a node, to the next node, and so on until an action is reached.

Fig.4. Example of a decision tree implementation

b) Procedure:

We loaded the dataset, split it into X and Y, and fitted these in the decision tree regressor to predict the prices.
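The procedure described here (split the data into X and Y, fit a decision tree regressor, predict prices) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the paper names scikit-learn, but to stay dependency-free the sketch swaps in a small hand-written regression tree in plain Python. The feature order (month, year, rainfall, WPI), the helper names `best_split`, `fit_tree`, and `predict`, and all numbers in the toy dataset are assumptions made up for the example.

```python
# Tiny regression tree in plain Python (illustrative sketch only).
# Each row of X is [month, year, rainfall, wpi]; y holds prices.
# All values are hypothetical toy data, not taken from the paper.

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y):
    """Return (feature_index, threshold) that lowers total SSE most, or None."""
    best, best_cost = None, sse(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (j, t), cost
    return best

def fit_tree(X, y, max_depth=3):
    """Build the tree as nested dicts; each leaf stores the mean target."""
    split = best_split(X, y) if max_depth > 0 else None
    if split is None:
        return {"value": mean(y)}
    j, t = split
    left = [(r, yi) for r, yi in zip(X, y) if r[j] <= t]
    right = [(r, yi) for r, yi in zip(X, y) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([r for r, _ in left], [v for _, v in left], max_depth - 1),
        "right": fit_tree([r for r, _ in right], [v for _, v in right], max_depth - 1),
    }

def predict(tree, row):
    """Follow the Boolean tests from the root down to a leaf."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]

# Hypothetical toy dataset: [month, year, rainfall, wpi] -> price
X = [
    [1, 2017, 10.0, 110.0],
    [6, 2017, 150.0, 118.0],
    [7, 2017, 280.0, 121.0],
    [1, 2018, 12.0, 125.0],
    [6, 2018, 160.0, 131.0],
    [7, 2018, 300.0, 135.0],
]
y = [1500.0, 1580.0, 1610.0, 1650.0, 1720.0, 1760.0]

tree = fit_tree(X, y, max_depth=3)
fitted = [predict(tree, row) for row in X]

# Forecast for an unseen (hypothetical) month
new_row = [6, 2019, 155.0, 140.0]
forecast = predict(tree, new_row)
print("forecast price:", forecast)
```

A tree can only return the mean of the training prices in one of its leaves, so the forecast for an unseen month never leaves the range of prices seen in training. With scikit-learn installed, `sklearn.tree.DecisionTreeRegressor` with its `fit` and `predict` methods would play the role of `fit_tree` and `predict` here.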
We stored the predicted prices with their respective crops in the crop list array, which is used to show the results on the webpage. Accuracy ranged from 90 to 92% in terms of price prediction. We predicted the price one year into the future for each crop in our dataset.

V. FUTURE SCOPE

These two graphs show the predicted price of wheat in India for the next year and the past year.

VI. CONCLUSION
The results show that accurate crop price forecasting can be achieved using the decision tree regressor. The decision tree algorithm predicted crop prices precisely and at the lowest cost in terms of performance. It is well suited to large-scale crop price prediction in agricultural planning. It helps farmers make the best decision when selecting the crop that will fetch a higher price in the market.

VII. FUTURE ENHANCEMENT

In this paper, we described the crop price prediction capability of the decision tree algorithm. In the future, more crop features needed to cultivate a particular crop, such as soil condition, pH, humidity, geo-location, and temperature, can be added to obtain better predictions from richer information.

REFERENCES

1. Ahamed, A.T.M.S., Mahmood, N.T., Hossain, N., Kabir, M.T., Das, K., Rahman, F., Rahman, R.M., 2015. Applying data mining techniques to predict annual yield of major crops and recommend planting different crops in different districts in Bangladesh. In: 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2015 - Proceedings, [Link]

2. Ahmad, I., Saeed, U., Fahad, M., Ullah, A., Habib-ur-Rahman, M., Ahmad, A., Judge, J., 2018. Yield forecasting of spring maize using remote sensing and crop modeling in Faisalabad-Punjab Pakistan. J. Indian Soc. Remote Sens. 46 (10), 1701–1711. [Link]

3. Cakir, Y., Kirci, M., Gunes, E.O., 2014. Yield prediction of wheat in south-east region of Turkey by using artificial neural networks. In: 2014 The 3rd International Conference on Agro-Geoinformatics, Agro-Geoinformatics 2014. [Link] Geoinformatics.2014.6910609.

4. Gandhi, N., Armstrong, L.J., 2016b. A review of the application of data mining techniques for decision making in agriculture. In: Proceedings of the 2016 2nd International Conference on Contemporary Computing and Informatics, [Link] IC3I.2016.7917925.

5. Khanal, S., Fulton, J., Klopfenstein, A., Douridas, N., Shearer, S., 2018. Integration of high resolution remotely sensed data and machine learning techniques for spatial prediction of soil properties and corn yield. Comput. Electron. Agric. 153, 213–225. [Link]

6. Liakos, K.G., Busato, P., Moshou, D., Pearson, S., Bochtis, D., 2018. Machine learning in agriculture: a review. Sensors (Switzerland) 18 (8). [Link] s18082674

7. Matsumura, K., Gaitan, C.F., Sugimoto, K., Cannon, A.J., Hsieh, W.W., 2015. Maize yield forecasting by linear regression and artificial neural networks in Jilin, China. J. Agric. Sci. 153 (3), 399–410. [Link]

8. McQueen, R.J., Garner, S.R., Nevill-Manning, C.G., Witten, I.H., 1995. Applying machine learning to agricultural data. Comput. Electron. Agric. 12 (4), 275–293. [Link] org/10.1016/0168-1699(95)98601-9.

9. Osman, T., Psyche, S.S., Kamal, M.R., Tamanna, F., Haque, F., Rahman, R.M., 2017. Predicting early crop production by analysing prior environment factors, pp. 470–479. [Link] 49073-1_51.

10. Ranjan, A.K., Parida, B.R., 2019. Paddy acreage mapping and yield prediction using sentinel-based optical and SAR data in Sahibganj district, Jharkhand (India). Spatial Inf. Res. [Link] 00246-4.

11. Shah, A., Dubey, A., Hemnani, V., Gala, D., Kalbande, D.R., 2018. Smart Farming System: Crop Yield Prediction Using Regression Techniques. Springer, Singapore, pp. 49–56. [Link]

12. Taherei-Ghazvinei, P., Hassanpour-Darvishi, H., Mosavi, A., Yusof, K.W., Alizamir, M., Shamshirband, S., Chau, K., 2018. Sugarcane growth prediction based on meteorological parameters using extreme learning machine and artificial neural network. Eng. Appl. Comput. Fluid Mech. 12 (1), 738–749. [Link]

13. You, J., Li, X., Low, M., Lobell, D., Ermon, S., 2017. Deep Gaussian process for crop yield prediction based on remote sensing data. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), 4559–4566. [Link] 1109/MWSCAS.2006.381794.

14. Wang, A., Tran, C., Desai, N., Lobell, D., n.d. Deep transfer learning for crop yield prediction with remote sensing data. [Link]. Retrieved from [Link] [Link]?id=3212707.
