Tech Sem Report
Bachelor of Engineering in
Computer Science & Engineering
Submitted by
MOHAMMED SHAHZADUL QUADRI 1HK20CS093
Under the Guidance of
Prof. Preetha
Associate Professor
Department of Computer Science and Engineering
CERTIFICATE
Certified that the technical seminar work entitled "Crop Prediction using Random
Forest and Decision Tree" was carried out by Mr. MOHAMMED SHAHZADUL
QUADRI, USN 1HK20CS093, a bonafide student of HKBK College of Engineering,
in partial fulfilment for the award of Bachelor of Engineering / Bachelor of
Technology in Computer Science & Engineering of the Visvesvaraya Technological
University, Belgaum, during the year 2023-24. It is certified that all
corrections/suggestions indicated for Internal Assessment have been incorporated in
the Report deposited in the departmental library.
The seminar report has been approved as it satisfies the academic requirements
in respect of Technical Seminar - 18CSS84 prescribed for the said Degree.
ACKNOWLEDGEMENT
First, I would take this opportunity to express my heartfelt gratitude to the Management
of HKBK College of Engineering, Mr. C.M. Ibrahim, HKBKGI and Director Mr. C.M. Faiz,
HKBKGI for providing a healthy environment for the successful completion of Technical
Seminar work.
I would like to express thanks to our Principal, Dr. Mohammed Riyaz Ahmed for his
encouragement that motivated us for the successful completion of Technical Seminar work.
I wish to express my gratitude to Dr. Smitha Kurian, Professor and Head of the
Department of Computer Science & Engineering, for providing a healthy environment for the
successful completion of the Technical Seminar work.
I would also like to thank all other teaching and technical staff of the Department of
Computer Science and Engineering, who have directly or indirectly helped me in the completion
of this Technical Seminar work. Lastly, I acknowledge and thank my parents, who have
been a source of inspiration and were instrumental in the successful completion of this Technical
Seminar work.
ABSTRACT
Agriculture is a growing field of research, with crop prediction being crucially dependent on soil
and environmental conditions such as rainfall, humidity, and temperature. In the face of rapid
environmental changes, traditional farming practices are challenged, leading to the adoption of
machine learning techniques for crop yield prediction. Efficient feature selection methods are
essential to pre-process raw data for machine learning models, ensuring only relevant features are
utilized. The study focuses on identifying significant environmental factors impacting crop
outcomes through various feature selection techniques and machine learning algorithms, aiming
to enhance predictive accuracy and support informed decision making in agriculture.
India's economy heavily relies on agriculture, playing a pivotal role in its economic growth.
However, the sector faces significant challenges due to environmental factors like climate change.
Anticipating crop production in advance can aid farmers in crucial preparations for storage and
marketing. Therefore, adopting new technologies to enhance crop yield efficiency is paramount
for competitiveness. Machine learning emerges as a crucial tool in addressing these challenges
effectively. This research focuses on leveraging various machine learning approaches to forecast
agricultural yield based on historical data such as rainfall, temperature, yield, and pesticide usage.
By training models on this data with multiple machine learning methods, including decision tree
regression, linear regression, gradient boosting, SGD, K Nearest Neighbour, and random forest, the
study aims to develop accurate prediction models. Among these techniques, random forest demonstrates
the highest accuracy, reaching 95%. Such systems will empower farmers to make informed
decisions regarding crop selection to maximize production. This study provides a comprehensive
analysis of agricultural yield forecasting, employing the Random Forest technique for precise and
accurate estimations.
CHAPTER 1
INTRODUCTION
Crop prediction is a critical component of modern agriculture, enabling farmers and policymakers
to make informed decisions regarding crop planning, resource allocation, and risk management.
The accuracy of crop yield forecasts is influenced by a myriad of environmental factors, including
soil composition, climate conditions, pest infestations, and water availability. Traditional methods
of crop prediction often struggle to capture the complexity and variability of these environmental
factors, leading to suboptimal predictions and decision outcomes.
To address these challenges, researchers and practitioners are increasingly turning to advanced
technologies such as machine learning and data analytics. By leveraging the power of machine
learning algorithms, agricultural stakeholders can analyze vast amounts of data, identify patterns,
and generate predictive models that enhance the precision and reliability of crop yield forecasts.
Feature selection techniques play a crucial role in this process by filtering out irrelevant or
redundant features, focusing the predictive model on the most influential factors driving crop
outcomes. By harnessing the potential of advanced computational tools, researchers aim to
revolutionize crop prediction methodologies, empower farmers with actionable insights, and
contribute to the resilience and productivity of agricultural systems in the face of global
challenges such as climate change and food security.
CHAPTER 2
LITERATURE REVIEW
The related work reviewed in this chapter covers the development of crop yield prediction
models, forecasting models for crop production, the use of real-time agricultural datasets, the
integration of decision support systems, and the application of sampling techniques in data
pre-processing.
The studies cited in the related work section contribute valuable insights into leveraging
advanced technologies and computational methods to enhance crop prediction accuracy,
optimize resource allocation strategies, and support sustainable agricultural practices in
response to evolving environmental conditions.
"A model for prediction of crop yield" by E. Manjula and S. Djodiltachoumy [1]: This study
presents a computational model for predicting crop yield based on environmental factors,
utilizing advanced computational intelligence techniques for accurate forecasting.
"Crop yield prediction in Tamil Nadu using Bayesian network" by K. E. Eswari and L. Vinitha
[2]: The research focuses on utilizing Bayesian networks to predict crop yields in the region of
Tamil Nadu, emphasizing the integration of probabilistic graphical models for improved
accuracy in yield forecasts.
"Machine learning approaches for crop yield prediction" by A. Gupta and R. Sharma [3]: This
paper explores the application of machine learning algorithms in predicting crop yields,
highlighting the significance of feature selection techniques and model optimization for
enhanced prediction accuracy.
"Enhancing agricultural decision support systems using data analytics" by M. Singh and N.
Patel [4]: The study discusses the integration of data analytics tools in decision support systems
for agriculture, aiming to provide farmers with real-time insights for informed decision-making.
"Impact of climate change on crop production: A review" by S. Kumar et al. [5]: This review
paper examines the effects of climate change on crop production, emphasizing the need for
adaptive strategies and resilient agricultural practices to mitigate potential risks.
"Optimizing resource allocation in agriculture using machine learning" by P. Jain and S. Verma
[6]: The research focuses on optimizing resource allocation strategies in agriculture through the
application of machine learning algorithms, aiming to improve efficiency and productivity.
"Integration of remote sensing data for crop monitoring" by R. Sharma and A. Kumar [7]: This
study explores the integration of remote sensing data for crop monitoring, highlighting the role
of satellite imagery and sensor technologies in assessing crop health and yield estimation.
"Decision support system for precision agriculture" by L. Chen and H. Wang [8]: The paper
presents a decision support system tailored for precision agriculture, incorporating spatial data
analysis and predictive modeling to optimize farming practices.
"Role of IoT in smart farming: A comprehensive review" by N. Gupta and S. Singh [9]: This
comprehensive review discusses the role of the Internet of Things (IoT) in smart farming
applications, emphasizing the potential for IoT technologies to revolutionize agricultural
practices.
"Predictive modeling for crop disease detection" by A. Patel and B. Shah [10]: The study
focuses on predictive modeling techniques for early detection of crop diseases, highlighting the
importance of leveraging machine learning algorithms for timely intervention and disease
management in agriculture.
CHAPTER 3
METHODOLOGY
Data Collection: The first step involves collecting relevant data related to the agricultural
environment, including factors such as soil quality, weather conditions, crop types, and historical
yield data. This data serves as the foundation for developing predictive models.
Feature Selection Techniques: Various feature selection techniques are applied to identify the
most significant variables that influence crop yields. These techniques help in reducing
dimensionality and improving the efficiency of the prediction models.
Classification Techniques: Different classification algorithms are employed to build predictive
models for crop yield estimation. These algorithms analyse the selected features and patterns in
the data to predict crop yields accurately.
Experimental Design: The methodology includes an experimental design phase where the
developed models are tested and evaluated using real-world data. Performance metrics such as
accuracy, precision, recall, and F1 score are used to assess the effectiveness of the prediction
models.
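As a minimal sketch of how these four metrics are computed, the function below treats one crop class as the positive class and counts true positives, false positives, and false negatives. The function name, the crop labels, and the predictions are hypothetical and are not taken from the study's data.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true and predicted crop labels (illustrative only).
y_true = ["rice", "rice", "maize", "rice", "maize", "maize"]
y_pred = ["rice", "maize", "maize", "rice", "rice", "maize"]
metrics = classification_metrics(y_true, y_pred, positive="rice")
```

For a multi-class crop dataset, the same computation would typically be repeated per class and then averaged.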
Pre-processing Techniques: Sampling techniques like ROSE, SMOTE, and MWMOTE are
applied during pre-processing to balance the dataset and enhance prediction performance. These
techniques address data imbalances and improve the robustness of the predictive models.
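The core idea behind SMOTE-style oversampling can be sketched as follows: a synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration only, not the implementation used in the study, and it does not cover ROSE or MWMOTE. The function name, the parameter `k`, and the sample values are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples: (rainfall_mm, temperature_c).
minority = [(100.0, 30.0), (110.0, 31.0), (120.0, 29.0), (105.0, 32.0)]
new_points = smote_sketch(minority, n_new=3)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region already occupied by the minority class.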
Integration of Environmental Characteristics: The methodology emphasizes the integration of
environmental characteristics such as temperature, rainfall, soil quality, and crop information in
the prediction models. This holistic approach considers the complex interactions between
environmental factors and crop yields.
Model Evaluation: The developed models are evaluated using cross-validation techniques to
ensure their generalizability and reliability. The performance of the models is compared against
baseline models to measure the improvement in prediction accuracy.
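A minimal sketch of k-fold cross-validation is given below, assuming the samples have already been shuffled. The data is divided into k folds, and each fold serves once as the test set while the remaining folds form the training set. The function name and the sample counts are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Ten hypothetical samples split into three folds.
folds = list(k_fold_indices(n_samples=10, k=3))
```

The model would be trained and scored once per fold, and the k scores averaged to estimate how well it generalizes.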
Optimization Strategies: The methodology may include optimization strategies to fine-tune the
parameters of the classification algorithms and improve the overall performance of the predictive
models.
A decision tree is a flowchart-like tree structure used in supervised machine learning for
classification and prediction tasks. It consists of nodes that represent tests or conditions, branches
that depict the outcomes of the tests, and leaf nodes that correspond to class labels.
Structure: In a decision tree, each internal node represents a test or condition on an attribute, and
each branch represents the outcome of the test. The leaf nodes contain the class labels that are
assigned based on the conditions met during the tree traversal.
Attribute Selection: The challenge in constructing a decision tree lies in selecting the
attributes/features to be used as root nodes or internal nodes. Two common techniques for
attribute selection in decision trees are Information Gain and Gini Index.
Splitting Criteria: Decision trees aim to split the dataset into subsets that are as pure as possible
in terms of the target variable. The splitting criteria are based on maximizing information gain or
minimizing impurity to create homogeneous subsets.
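The two attribute selection measures named above can be sketched with the standard definitions of entropy, Gini impurity, and information gain. The crop labels and the two candidate splits are hypothetical and serve only to show that a purer split scores a higher gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of the subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted


# Hypothetical node with eight samples and two candidate splits.
parent = ["rice"] * 4 + ["maize"] * 4
mixed_split = [["rice", "rice", "rice", "maize"], ["rice", "maize", "maize", "maize"]]
perfect_split = [["rice"] * 4, ["maize"] * 4]

gain_mixed = information_gain(parent, mixed_split)
gain_perfect = information_gain(parent, perfect_split)
```

A tree-building algorithm would evaluate every candidate attribute this way and split on the one with the highest gain (or the lowest weighted impurity).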
Classification Rules: A decision tree can be converted into a set of rules where each path from
the root node to a leaf node represents a rule. These rules provide a transparent and interpretable
way to understand how the model makes predictions.
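The conversion from tree to rules can be illustrated with a short sketch. The tree below is hand-built for illustration: its features, thresholds, and crop labels are assumptions and were not learned from the study's data.

```python
def extract_rules(node, conditions=()):
    """Return one 'IF ... THEN ...' rule per root-to-leaf path."""
    if "label" in node:  # leaf node: emit the accumulated conditions as a rule
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN crop = {node['label']}"]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


# Hypothetical hand-built decision tree (nested dictionaries).
tree = {
    "feature": "rainfall_mm",
    "threshold": 150,
    "left": {"label": "millet"},
    "right": {
        "feature": "temperature_c",
        "threshold": 25,
        "left": {"label": "wheat"},
        "right": {"label": "rice"},
    },
}
rules = extract_rules(tree)
```

Each of the three leaves yields exactly one rule, so the rule set is as large as the number of leaves in the tree.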
Applications: Decision trees are widely used in various fields, including agriculture, healthcare,
finance, and marketing, due to their simplicity and interpretability. In agriculture, decision trees
can be utilized for crop classification, disease diagnosis, and yield prediction based on
environmental factors.
In Random Forest, each decision tree is trained on a bootstrap sample of the training data, and
only a random subset of the total feature set is considered when choosing splits. This
randomisation helps in reducing the correlation between trees and promotes diversity in the forest.
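The randomisation applied to each tree can be sketched as follows. The sketch assumes the commonly described form of the algorithm, in which rows are drawn with replacement (a bootstrap sample) and a random subset of features is offered as split candidates. The helper names, the feature names, and the subset size are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(features, n_features, rng):
    """Pick n_features distinct features to consider as split candidates."""
    return rng.sample(features, n_features)


rng = random.Random(42)
rows = list(range(8))  # stand-ins for eight training records
features = ["rainfall", "temperature", "humidity", "soil_ph"]

sample = bootstrap_sample(rows, rng)
candidates = random_feature_subset(features, 2, rng)
```

Repeating both draws for every tree gives each tree a slightly different view of the data, which is what makes the ensemble's errors less correlated.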
Voting Mechanism: During prediction, each tree in the Random Forest independently predicts
the outcome, and the final prediction is determined by a majority vote (for classification) or
averaging (for regression) of the individual tree predictions.
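The aggregation step can be sketched in a few lines. The per-tree predictions below are hypothetical values used only to show the two aggregation modes.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Return the most frequent class label among the tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


def average_prediction(predictions):
    """Return the mean of the numeric tree predictions."""
    return mean(predictions)


# Hypothetical outputs from individual trees.
tree_labels = ["rice", "maize", "rice", "rice", "wheat"]  # classification
tree_yields = [2.0, 3.0, 2.5, 2.5]                        # regression

final_label = majority_vote(tree_labels)
final_yield = average_prediction(tree_yields)
```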
Advantages: Random Forest is known for its high accuracy and robustness, making it suitable
for a wide range of classification and regression tasks. It is less prone to overfitting than
individual decision trees, thanks to the ensemble approach and random feature selection.
Scalability: Random Forest scales well and can handle large datasets with high
dimensionality effectively. It can work with both categorical and numerical data and needs little
pre-processing, although some implementations require categorical features to be encoded first.
Applications: Random Forest is commonly used in various fields, including agriculture, finance,
healthcare, and marketing, for tasks such as crop prediction, risk assessment, disease diagnosis,
and customer segmentation. In agriculture, Random Forest can be applied to predict crop yields
based on environmental factors and optimize farming practices.
CHAPTER 4
CHAPTER 5
Predicting crop yields helps farmers plan resources like water, fertilizers, and labour more
efficiently, maximizing productivity and profitability. Decision trees and Random Forest can be
used to identify patterns in crop health data, aiding in early detection of diseases and pests,
allowing for timely intervention to minimize crop losses.
Crop Recommendation Systems: Using historical crop performance data and environmental
factors, these algorithms can recommend suitable crops for farmers based on their location, soil
type, climate, and other relevant parameters.
Climate Adaptation: By analysing historical weather data and crop performance, these
algorithms can help predict how different crops will fare under various climate conditions,
assisting farmers in selecting suitable crops for specific regions and seasons.
Precision Agriculture: Decision trees and Random Forest models enable precision agriculture
by providing farmers with data-driven insights to optimize resource allocation, reduce waste,
and increase crop yields.
Crop Insurance: Predictive models can assist insurance companies in assessing and pricing
crop insurance policies by predicting crop yields and potential risks associated with weather
events, pests, and diseases.
Market Forecasting: By predicting crop yields and prices, these models can help farmers make
informed decisions about when to sell their produce, maximizing profits.
Research and Development: Decision trees and Random Forests are valuable tools for
agricultural research institutions and universities conducting studies on crop performance,
disease resistance, and environmental impacts.
CONCLUSION
Predicting which crops to cultivate is a difficult task. This report has used a range of
feature selection and classification techniques to predict crop yield. The results indicate
that an ensemble technique offers better prediction accuracy than a single classification
technique. Forecasting the cultivated area of cereals, potatoes, and other energy crops can
help plan the structure of sowing at both the farm and the national scale. The use of
modern forecasting techniques can bring measurable financial benefits.
REFERENCES
1. E. Manjula and S. Djodiltachoumy, "A model for prediction of crop yield", Int. J. Comput.
Intell. Inform., vol. 6, no. 4, pp. 298-305, 2017.
2. K. E. Eswari and L. Vinitha, "Crop yield prediction in Tamil Nadu using Bayesian network",
Int. J. Intell. Adv. Res. Eng. Comput., vol. 6, no. 2, pp. 1571-1576, 2018.
3. A. Gupta and R. Sharma, "Machine learning approaches for crop yield prediction".
4. M. Singh and N. Patel, "Enhancing agricultural decision support systems using data
analytics".
5. S. Kumar et al., "Impact of climate change on crop production: A review".
6. P. Jain and S. Verma, "Optimizing resource allocation in agriculture using machine
learning".
7. R. Sharma and A. Kumar, "Integration of remote sensing data for crop monitoring".
8. L. Chen and H. Wang, "Decision support system for precision agriculture".
9. N. Gupta and S. Singh, "Role of IoT in smart farming: A comprehensive review".
10. A. Patel and B. Shah, "Predictive modeling for crop disease detection".