CHAPTER-1
INTRODUCTION
1.1 Overview
Agriculture is the backbone of the Indian economy. In India, agricultural yield primarily depends on weather conditions and cultivated area; rice cultivation, for example, mainly depends on rainfall and soil type. Timely advice that predicts future crop productivity, backed by analysis, is needed to help farmers maximize crop production. Yield prediction is an important agricultural problem. In the past, farmers predicted their yield from previous years' experience. For this kind of data analytics in crop prediction there are different techniques and algorithms, and with their help we can predict crop yield; using these algorithms and the inter-relations between them, there is a growing range of applications for big data analytics techniques in agriculture.
Since the creation of new innovative technologies and techniques, the agriculture field has been slowly degrading. Owing to this abundance of invention, people concentrate on cultivating artificial hybrid products, which leads to an unhealthy life. Nowadays, many people lack awareness about cultivating crops at the right time and in the right place. Because of these cultivation techniques, seasonal climatic conditions are also changing, against fundamental assets like soil, water and air, which leads to food insecurity. Even after analysing all these issues and factors such as weather and temperature, there is as yet no proper solution or technology to overcome the situation we face. In India there are several ways to increase economic growth in the field of agriculture, and multiple ways to increase and improve crop yield and crop quality.
Machine learning algorithms are also useful for predicting crop yield. Using past information on weather, temperature and a number of other factors, the application we developed runs the algorithms and shows the list of crops suitable for the entered data, with a predicted yield value. When crop producers know accurate information on crop yield in advance, it minimizes their losses. Machine learning is a fast-growing approach that is spreading out and helping every sector make viable decisions and make the most of its applications.
The core objective of crop yield estimation is to achieve higher agricultural crop production
and many established models are exploited to increase the yield of crop production. Nowadays,
ML is being used worldwide due to its efficiency in various sectors such as forecasting, fault
detection, pattern recognition, etc.
The ML algorithms also help to improve the crop yield production rate under unfavorable conditions; they are applied in the crop selection method to reduce losses in crop yield production irrespective of a distracting environment.
1.2 PROBLEM STATEMENT
The problem the Indian agriculture sector is facing is the integration of technology to bring the desired outputs. With the advent of new technologies and the overuse of non-renewable energy resources, patterns of rainfall and temperature are disturbed. The inconsistent trends developed from the side effects of global warming make it difficult for farmers to clearly predict temperature and rainfall patterns, affecting their crop yield productivity; Indian GDP also decreases as crop yields decrease. The main aim of this project is to help farmers cultivate a crop with maximum yield.
1.3 OBJECTIVE OF PROJECT
This project focuses on predicting the yield of the crop by applying various machine learning
techniques. The outcome of these techniques is compared on the basis of mean squared error.
The prediction made by machine learning algorithms will help the farmers to decide which crop
to grow to get the maximum yield by considering factors like temperature, rainfall, area, etc.
CHAPTER-2
LITERATURE SURVEY
CHAPTER-3
SYSTEM ANALYSIS
3.1 EXISTING SYSTEM:
Due to the revolution in industrialization, the economic contribution of agriculture to India's GDP is steadily declining with the country's broad-based economic growth. The problem that the Indian agriculture sector is facing is the integration of technology to bring the desired outputs. With the advent of new technologies and the overuse of non-renewable energy resources, patterns of rainfall and temperature are disturbed. The inconsistent trends developed from the side effects of global warming make it cumbersome for farmers to clearly predict the temperature and rainfall patterns, affecting their crop yield productivity. In order to perform accurate prediction and handle the inconsistent trends in temperature and rainfall, various machine learning algorithms such as CNNs, along with computer vision, can be applied to find a pattern. This will complement agricultural growth in India and altogether augment the ease of living for farmers. In the past, many researchers have applied machine learning techniques and computer vision to enhance the agricultural growth of the country, but with low accuracy.
3.2 PROPOSED SYSTEM:
This project focuses on predicting the yield of the crop by applying various machine learning techniques: Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) and Feed Forward Neural Network. The outcome of these techniques is compared on the basis of mean squared error. The prediction made by machine learning algorithms will help farmers decide which crop to grow to get the maximum yield by considering factors like temperature, rainfall, area, etc. The crop yield prediction is determined considering all the features.
The advantages of the proposed system:
• The proposed system is useful for the agriculture department and farmers to predict crop yield and to suggest the suitable crop; it helps farmers know the expected crop yield.
• It also helps farmers decide which crop to cultivate in the field.
3.3 HARDWARE AND SOFTWARE REQUIREMENTS:
3.3.1 Hardware System Configuration:
• System - Pentium IV, 3.5 GHz
• Hard Disk - 40 GB
• Floppy Drive - 1.44 MB
• RAM - 512 MB
3.3.2 Software Requirements:
• System - Windows.
• Coding Language – Python 3.7.
Python
• Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python
has a design philosophy that emphasizes code readability, notably using significant
whitespace. Python features a dynamic type system and automatic memory
management. It supports multiple programming paradigms, including object-
oriented, imperative, functional and procedural, and has a large and comprehensive
standard library.
• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to Perl and PHP.
• Python is Interactive − you can actually sit at a Python prompt and interact with the interpreter directly to write your programs.
• Python also acknowledges that speed of development is important. Readable and terse code is part of this, and so is access to powerful constructs that avoid tedious repetition of code. Maintainability also ties into this; it may be an all but useless metric, but it does say something about how much code you have to scan, read and/or understand to troubleshoot problems or tweak behaviors. This speed of development, the ease with which a programmer of other languages can pick up basic Python skills, and the huge standard library are key to another area where Python excels: all its tools have been quick to implement, have saved a lot of time, and several of them have later been patched and updated by people with no Python background, without breaking.
Machine Learning
Before we take a look at the details of various machine learning methods, let's
start by looking at what machine learning is, and what it isn't. Machine learning is often
categorized as a subfield of artificial intelligence, but I find that categorization can often
be misleading at first brush. The study of machine learning certainly arose from research
in this context, but in the data science application of machine learning methods, it's more
helpful to think of machine learning as a means of building models of data. Fundamentally,
machine learning involves building mathematical models to help understand data.
"Learning" enters the fray when we give these models tuneable parameters that can be
adapted to observed data; in this way the program can be considered to be "learning" from
the data. Once these models have been fit to previously seen data, they can be used to
predict and understand aspects of newly observed data. I'll leave to the reader the more
philosophical digression regarding the extent to which this type of mathematical, model-
based "learning" is similar to the "learning" exhibited by the human brain. Understanding
the problem setting in machine learning is essential to using these tools effectively.
Applications of Machine Learning:
Machine learning is the most rapidly growing technology and, according to researchers, we are in the golden years of AI and ML. It is used to solve many real-world complex problems which cannot be solved with the traditional approach.
Following are some real-world applications of ML:
• Emotion analysis and sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation
• Object recognition
• Fraud detection
• Fraud prevention
• Recommendation of products to customer in online shopping
CHAPTER-4
SYSTEM DESIGN
4.1 ARCHITECTURE
Fig. 1. Architecture (pipeline: Agriculture Data → Pre-Processing Data → Feature Extraction → Result)
4.2 UML DIAGRAMS:
4.2.1 Introduction: UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created by, the Object Management Group. The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML. The Unified Modeling Language is a standard language for specifying, visualizing, constructing and documenting the artifacts of software systems, as well as for business modeling and other non-software systems. The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems. The UML is an essential part of developing object-oriented software and of the software development process. The UML uses mostly graphical notations to express the design of software projects.
GOALS: The primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns and components.
7. Integrate best practices.
4.2.2 Use Case Diagram:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their goals
(represented as use cases), and any dependencies between those use cases. The main
purpose of a use case diagram is to show what system functions are performed for which
actor. Roles of the actors in the system can be depicted.
Fig. 2. Use Case Diagram
4.2.3 Class Diagram:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among the classes. It explains which class contains what information.
Fig.4 Sequence Diagram
CHAPTER-5
IMPLEMENTATION
5.1 Modules Description:
1. Upload Data
Upload the dataset collected from the IMD. It contains attributes like State, District, Crop, Season, Area, Production and Rainfall.
Table 1. Dataset
2. Preprocessing
Data preprocessing is a method used to convert raw data into a clean data set. The data are gathered from different sources and collected in a raw format which is not feasible for analysis. By applying techniques like replacing missing values and null values, we can transform the data into an understandable format. The final step of data preprocessing is the split into training and testing data. The data usually tend to be split unequally, because training the model usually requires as many data points as possible. The training dataset is the initial dataset used to train the ML algorithms to learn and produce the right predictions; a sketch of this step follows.
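A minimal sketch of this step, assuming the dataset file name and column names used in the project code of Chapter 5:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv('dataset/Agriculture In India.csv')
df.fillna(0, inplace=True)  # replace missing and null values with 0
le = LabelEncoder()
for col in ['State_Name', 'District_Name', 'Season', 'Crop']:
    df[col] = le.fit_transform(df[col])  # convert text categories to integers
print(df.head())  # a few rows of the preprocessed data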
3. Feature Extraction
There are a lot of factors that affect the yield of any crop and its production. These are basically the features that help in predicting the production of any crop over the year. In this project we include factors like temperature, rainfall, area, humidity and soil type.
4. Load Train and Test Dataset
The training dataset holds 67% of the observations, which we use to train our model, leaving the remaining 33% for testing the model, as sketched below.
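A minimal sketch of this split using scikit-learn's train_test_split (X and Y as produced by preprocessing; the random_state is an illustrative choice):

from sklearn.model_selection import train_test_split

# 67% of the observations train the model; the remaining 33% test it
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=42)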
5. Apply Neural Network
The processed data is used to train the machine learning algorithms.
6. Performance Analysis
The performance of the neural network models was evaluated using metrics like Mean Squared Error (MSE).
5.2 Modules Used in Python:
Tensorflow:
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google. TensorFlow was developed by the Google Brain team for internal
Google use.
It was released under the Apache 2.0 open-source license on November 9, 2015.
Numpy:
Numpy is a general-purpose array-processing package. It provides a high-performance multidimensional array object, and tools for working with these arrays.
It is the fundamental package for scientific computing with Python. It contains various
features including these important ones:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, Numpy can also be used as an efficient
multidimensional container of generic data. Arbitrary data-types can be defined using
Numpy which allows Numpy to seamlessly and speedily integrate with a wide variety of
databases.
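A small illustrative example of the N-dimensional array object and broadcasting mentioned above:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # a 2-D array object
print(a * 10)          # broadcasting applies the scalar element-wise
print(a.sum(axis=0))   # column sums: [5 7 9]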
Pandas:
Pandas is an open-source Python library providing high-performance data manipulation and analysis tools built on its powerful data structures. Python was previously used mainly for data munging and preparation, and had very little to offer for data analysis; Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze. Python with Pandas is used in a wide range of fields, including academic and commercial domains such as finance, economics, statistics, analytics, etc.
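A small illustrative example of the load-prepare-analyze steps described above:

import pandas as pd

df = pd.DataFrame({'Crop': ['Rice', 'Wheat'], 'Production': [120, None]})  # load
df.fillna(0, inplace=True)                     # prepare: replace the missing value
print(df.groupby('Crop')['Production'].sum())  # analyze: aggregate production by crop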
Matplotlib:
Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web application servers, and four graphical user interface toolkits. Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code; for examples, see the sample plots and thumbnail gallery. For simple plotting the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
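A plot in a few lines of code, as described above:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)        # simple line plot via the pyplot interface
plt.xlabel('x')
plt.ylabel('y')
plt.show()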
Scikit – learn
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a
consistent interface in Python. It is licensed under a permissive simplified BSD license and
is distributed under many Linux distributions, encouraging academic and commercial use.
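The consistent interface means every estimator follows the same construct-fit-predict pattern; an illustrative sketch with toy data:

from sklearn.linear_model import LinearRegression

X = [[1], [2], [3]]
y = [2, 4, 6]
model = LinearRegression()
model.fit(X, y)              # the same fit/predict pattern applies to all estimators
print(model.predict([[4]]))  # approximately [8.]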
Tkinter
Tkinter is the standard GUI library for Python. Python, when combined with Tkinter, provides a fast and easy way to create GUI applications. Tkinter provides a powerful object-oriented interface to the Tk GUI toolkit; creating a GUI application using Tkinter is an easy task, as the minimal example below shows.
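A minimal example of the kind of window this project builds on (widget names are illustrative):

from tkinter import Tk, Label, Button

main = Tk()
main.title('Demo')
Label(main, text='Hello').pack()
Button(main, text='Quit', command=main.destroy).pack()
main.mainloop()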
5.3 ALGORITHMS
5.3.1 Feed Forward Neural Network:
• A Feed forward Neural Network is an artificial neural network wherein connections
between the nodes do not form a cycle.
• Perceptrons are arranged in layers, with the first layer taking in inputs and the last
layer producing outputs. The middle layers have no connection with the external
world, and hence are called hidden layers.
• Each perceptron in one layer is connected to every perceptron in the next layer. Information is thus constantly "fed forward" from one layer to the next, which explains why these networks are called feed-forward networks.
• There is no connection among perceptrons in the same layer. A minimal sketch of such a network follows.
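A minimal Keras sketch of such a network, in the same style as the runFF function in Chapter 5 (the layer sizes and the 7-feature input shape are illustrative):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(7,)),  # first layer takes in the inputs
    Dense(64, activation='relu'),                    # hidden layer: no connection to the external world
    Dense(2, activation='softmax')])                 # last layer produces the two output classes
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])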
5.3.2 Recurrent Neural Network:
Recurrent neural networks (RNNs) are the state-of-the-art algorithm for sequential data and are used by Apple's Siri and Google's voice search. The RNN is the first algorithm that remembers its input, thanks to an internal memory, which makes it well suited for machine learning problems that involve sequential data. It is one of the algorithms behind the scenes of the amazing achievements seen in deep learning over the past few years. RNNs are a powerful and robust type of neural network, and belong to the most promising algorithms in use, because they are the only ones with an internal memory.
5.3.3 LONG SHORT-TERM MEMORY
Long short-term memory networks (LSTMs) are an extension of recurrent neural networks which basically extends their memory. An LSTM is therefore well suited to learn from important experiences that have very long time lags in between.
The units of an LSTM are used as building units for the layers of an RNN, often called an LSTM network.
LSTMs enable RNNs to remember inputs over a long period of time. This is because LSTMs contain information in a memory, much like the memory of a computer: the LSTM can read, write and delete information from its memory.
LSTM units are arranged in blocks that are connected through layers. A block has components that make it smarter than a classical neuron, and a memory for recent sequences. A block contains gates that manage the block's state and output; it operates on an input sequence, and each gate within a block uses sigmoid activation units to control whether it is triggered or not, making the change of state and the addition of information flowing through the block conditional. Once the model is fit, we can estimate its performance on the train and test datasets. This gives us a point of comparison for new models. The network is trained for 50 epochs and a batch size of 1 is used. The predictions are transformed back before calculating error scores, to ensure that performance is reported in the same units as the original data. A minimal sketch of this configuration follows.
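A minimal Keras sketch of this configuration, assuming a univariate sequence input (the input shape and the number of LSTM blocks are illustrative; train_X and train_y are placeholders):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(4, input_shape=(1, 1)))  # a layer of LSTM blocks; the gates manage state internally
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(train_X, train_y, epochs=50, batch_size=1, verbose=2)  # 50 epochs, batch size 1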
Step 4: Upload the dataset. The dataset must be in .csv format.
Step 6: Define the rnn function, in which we create the RNN model and train the data, defining each layer with certain filters to filter the dataset. While filtering and training the dataset we need to store metrics such as accuracy and loss per epoch; adam is the optimizer used to train, and accuracy is the metric used to evaluate the predictions. Here the task is defined as binary classification.
Binary classification:In binary classification each input sample is assigned to one of two
classes. Generally these two classes are assigned labels like 1 and 0, or positive and
negative. More specifically, the two class labels might be something like malignant or
benign (e.g. if the problem is about cancer classification), or success or failure (e.g. if it is
about classifying student test scores).
Assume there is a binary classification problem with the classes positive and negative. Here is an example of the labels for seven samples used to train the model; these are called the ground-truth labels of the samples.
Positive, Negative, Positive, Negative, Positive, Negative, Positive
For comparison, here are both the ground-truth and predicted labels. At first glance we can see 4 correct and 3 incorrect predictions. Note that changing the threshold might give different results; for example, setting the threshold to 0.6 leaves only two incorrect predictions, as sketched below.
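A sketch of how a threshold turns predicted scores into class labels (the scores are illustrative):

import numpy as np

scores = np.array([0.7, 0.3, 0.5, 0.9, 0.55, 0.2, 0.8])   # hypothetical model outputs
labels = np.where(scores >= 0.6, 'positive', 'negative')  # raising the threshold changes the labels
print(labels)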
To extract more information about model performance the confusion matrix is used. The
confusion matrix helps us visualize whether the model is "confused" in discriminating
between the two classes. As seen in the next figure, it is a 2×2 matrix. The labels of the
two rows and columns are Positive and Negative to reflect the two class labels. In this
example the row labels represent the ground-truth labels, while the column labels
represent the predicted labels. This could be changed.
Accuracy is a metric that generally describes how the model performs across all classes.
It is useful when all classes are of equal importance. It is calculated as the ratio of the number of correct predictions to the total number of predictions.
Here is how to calculate the accuracy using Scikit-learn, based on the confusion matrix
previously calculated. The variable acc holds the result of dividing the sum of True
Positives and True Negatives over the sum of all values in the matrix.
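The code itself is not reproduced in this extract; a sketch of the calculation it describes, with illustrative labels that give 4 correct and 3 incorrect predictions:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ['positive', 'negative', 'negative', 'positive', 'positive', 'positive', 'negative']
y_pred = ['positive', 'negative', 'positive', 'positive', 'negative', 'positive', 'positive']
r = confusion_matrix(y_true, y_pred, labels=['positive', 'negative'])  # rows: ground truth
acc = (r[0, 0] + r[1, 1]) / np.sum(r)  # (True Positives + True Negatives) / all predictions
print(acc)  # 4 of 7 correct, about 0.571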
Step 7: Define the runLSTM and runFF functions for training on the dataset; they follow the same pattern as the RNN function.
Step 8: Define the predict function to predict the yield. Here we use Keras to predict the yield: the test data is loaded, and an array is stored with the actual value and the predicted value.
Mean squared error: MSE = (1/n) * Σ (actual_i − predicted_i)^2
If the MSE is lower, accuracy is higher; if the MSE is higher, accuracy is lower.
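A one-line NumPy version of this measure (the values are illustrative):

import numpy as np

actual = np.array([120.0, 80.0, 95.0])
predicted = np.array([110.0, 85.0, 100.0])
mse = np.mean((actual - predicted) ** 2)  # average squared difference: 50.0 here
print(mse)  # a smaller MSE means predictions closer to the actual values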
Step 9: Define the graph function; here we use NumPy and Matplotlib to draw the comparison graph.
# Imports and globals (reconstructed; the original header lines fall on pages missing from this extract)
import os
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tkinter import *
from tkinter import filedialog
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import keras
from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout
from keras.utils import to_categorical

main = Tk()  # main application window (assumed; created on a missing page)
scalerX = MinMaxScaler()  # feature scaler used by preprocess() and predict() (assumed)

def upload():
    global filename
    global rainfall_dataset
    global crop_dataset
    global le
    filename = filedialog.askdirectory(initialdir=".")
    rainfall_dataset = pd.read_csv('dataset/district wise rainfall normal.csv')
    crop_dataset = pd.read_csv('dataset/Agriculture In India.csv')
    crop_dataset.fillna(0, inplace=True)  # replace missing values with 0
    crop_dataset['Production'] = crop_dataset['Production'].astype(np.int64)
    print(crop_dataset.dtypes)
    print(crop_dataset['Production'])
    text.delete('1.0', END)
    text.insert(END, filename + ' Loaded\n\n')
    text.insert(END, str(crop_dataset.head()))  # head() is a method; call it to show the first rows
def preprocess():
    global weight_for_0
    global weight_for_1
    global crop_dataset
    global le
    global X, Y
    text.delete('1.0', END)
    le = LabelEncoder()
    crop_dataset['State_Name'] = pd.Series(le.fit_transform(crop_dataset['State_Name']))
    crop_dataset['District_Name'] = pd.Series(le.fit_transform(crop_dataset['District_Name']))
    crop_dataset['Season'] = pd.Series(le.fit_transform(crop_dataset['Season']))
    crop_dataset['Crop'] = pd.Series(le.fit_transform(crop_dataset['Crop']))
    crop_datasets = crop_dataset.values
    cols = crop_datasets.shape[1] - 1
    X = crop_datasets[:, 0:cols]
    Y = crop_datasets[:, cols]
    Y = Y.astype('uint8')
    avg = np.average(Y)
    #avg = avg / 60
    Y1 = []
    for i in range(len(Y)):  # label each record as above-average (1) or below-average (0) yield
        if Y[i] >= avg:
            Y1.append(1)
        else:
            Y1.append(0)
    Y = np.asarray(Y1)
    Y = Y.astype('uint8')
    a, b = np.unique(Y, return_counts=True)
    print(str(a) + " " + str(b))
    Y = to_categorical(Y)
    Y = Y.astype('uint8')
    counts = np.bincount(Y[:, 0])
    weight_for_0 = 1.0 / counts[0]  # class weights to compensate for class imbalance
    weight_for_1 = 1.0 / counts[1]
    print(X.shape)
    print(Y.shape)
    scalerX.fit(X)
    X = scalerX.transform(X)
    text.insert(END, str(X))
def runRNN():
    global rnn_acc
    global X, Y
    global classifier
    global weight_for_0
    global weight_for_1
    text.delete('1.0', END)
    if os.path.exists('model/rnnmodel.json'):
        # reload a previously trained model instead of retraining
        with open('model/rnnmodel.json', "r") as json_file:
            loaded_model_json = json_file.read()
        classifier = model_from_json(loaded_model_json)
        classifier.load_weights("model/rnnmodel_weights.h5")
        classifier._make_predict_function()
        print(classifier.summary())
        f = open('model/rnnhistory.pckl', 'rb')
        data = pickle.load(f)
        f.close()
        accuracy = data[1] * 100
        rnn_acc = accuracy
        text.insert(END, 'RNN Prediction Accuracy : ' + str(accuracy) + "\n\n")
    else:
        class_weight = {0: weight_for_0, 1: weight_for_1}
        rnn = Sequential()  # creating the model object
        rnn.add(Dense(256, input_dim=X.shape[1], activation='relu', kernel_initializer="uniform"))  # first layer with 256 units to filter the dataset
        rnn.add(Dense(128, activation='relu', kernel_initializer="uniform"))  # another layer with 128 units
        rnn.add(Dense(Y.shape[1], activation='softmax', kernel_initializer="uniform"))  # output layer predicts the two yield classes (low / high)
        rnn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])  # track accuracy while training
        print(rnn.summary())  # display model details
        rnn_acc = rnn.fit(X, Y, epochs=2, batch_size=64, class_weight=class_weight)  # start training the model
        values = rnn_acc.history  # save each epoch's accuracy and loss
        values = values['accuracy']
        acc = values[1] * 100
        rnn_acc = acc
        f = open('model/rnnhistory.pckl', 'wb')
        pickle.dump(values, f)
        f.close()
        text.insert(END, 'RNN Prediction Accuracy : ' + str(acc) + "\n\n")
        classifier = rnn
        classifier.save_weights('model/rnnmodel_weights.h5')
        model_json = classifier.to_json()
        with open("model/rnnmodel.json", "w") as json_file:
            json_file.write(model_json)
def runLSTM():
    global lstm_acc
    if os.path.exists('model/lstmmodel.json'):
        with open('model/lstmmodel.json', "r") as json_file:
            loaded_model_json = json_file.read()
        classifier1 = model_from_json(loaded_model_json)
        classifier1.load_weights("model/lstmmodel_weights.h5")
        classifier1._make_predict_function()
        print(classifier1.summary())
        f = open('model/lstmhistory.pckl', 'rb')
        data = pickle.load(f)
        f.close()
        accuracy = data[1] * 100
        lstm_acc = accuracy
        text.insert(END, 'LSTM Prediction Accuracy : ' + str(accuracy) + "\n\n")
    else:
        XX = X.reshape((X.shape[0], X.shape[1], 1))  # LSTM expects 3D input: (samples, timesteps, features)
        model = Sequential()  # creating the LSTM model object
        model.add(keras.layers.LSTM(512, input_shape=(X.shape[1], 1)))  # LSTM layer in the sequential object
        model.add(Dropout(0.5))  # randomly drop units during training to reduce overfitting
        model.add(Dense(256, activation='relu'))  # create another layer
        model.add(Dense(Y.shape[1], activation='softmax'))  # output layer predicts the two yield classes (low / high)
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])  # track accuracy
        print(model.summary())
        lstm_acc = model.fit(XX, Y, epochs=2, batch_size=64)  # start training the model
        values = lstm_acc.history
        values = values['accuracy']
        acc = values[1] * 100
        lstm_acc = acc
        f = open('model/lstmhistory.pckl', 'wb')
        pickle.dump(values, f)
        f.close()
        text.insert(END, 'LSTM Prediction Accuracy : ' + str(acc) + "\n\n")
        classifier1 = model
        classifier1.save_weights('model/lstmmodel_weights.h5')
        model_json = classifier1.to_json()
        with open("model/lstmmodel.json", "w") as json_file:
            json_file.write(model_json)
def runFF():
    global ff_acc
    model = Sequential([
        Dense(64, activation='relu', input_shape=(X.shape[1],)),
        Dense(64, activation='relu'),
        Dense(2, activation='softmax')])  # two output units: low / high yield
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    print(model.summary())
    ff_hist = model.fit(X, Y, epochs=2, batch_size=64)  # start training the model
    values = ff_hist.history
    values = values['accuracy']
    ff_acc = values[1] * 100
    text.insert(END, 'Feed Forward Neural Network Prediction Accuracy : ' + str(ff_acc) + "\n\n")
def predict():
    text.delete('1.0', END)
    file = filedialog.askopenfilename(initialdir="dataset")
    test = pd.read_csv(file)
    # note: fit_transform re-fits the encoder on the test data, mirroring the original code
    test['State_Name'] = pd.Series(le.fit_transform(test['State_Name']))
    test['District_Name'] = pd.Series(le.fit_transform(test['District_Name']))
    test['Season'] = pd.Series(le.fit_transform(test['Season']))
    test['Crop'] = pd.Series(le.fit_transform(test['Crop']))
    test = test.values
    cols = test.shape[1]
    test = test[:, 0:cols]
    test = scalerX.fit_transform(test)
    #test = test.reshape((test.shape[0], test.shape[1], 1))
    print(test.shape)
    #test = test[:,0:test.shape[1]]
    y_pred = classifier.predict(test)
    for i in range(len(test)):
        predict = np.argmax(y_pred[i])  # index of the class with the highest probability
        print(str(predict))
        if predict == 0:
            text.insert(END, "X=%s, Predicted = %s" % (test[i], 'Predicted Crop Yield will be LESS') + "\n\n")
        else:
            text.insert(END, "X=%s, Predicted = %s" % (test[i], 'Predicted Crop Yield will be HIGH') + "\n\n")
def graph():
    global rnn_acc, lstm_acc
    bars = ['RNN Accuracy', 'LSTM Accuracy', 'Feed Forward Accuracy']
    height = [rnn_acc, lstm_acc, ff_acc]
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()
def topGraph():
    global rainfall_dataset
    global crop_dataset
    rainfall_dataset = pd.read_csv('dataset/district wise rainfall normal.csv')
    rainfall = rainfall_dataset.groupby(['STATE_UT_NAME'])['ANNUAL'].agg(['sum'])
    rainfall = rainfall.sort_values("sum", ascending=False).reset_index()
    rainfall = rainfall.loc[0:5]  # top six states by total annual rainfall
    print(type(rainfall))
    rainfall = rainfall.values
    x1 = []
    y1 = []
    for i in range(len(rainfall)):
        x1.append(str(rainfall[i, 0]))
        y1.append(rainfall[i, 1])
    rice = pd.read_csv('dataset/Agriculture In India.csv')
    rice.fillna(0, inplace=True)
    rice['Production'] = rice['Production'].astype(np.int64)
    rice = rice.groupby(['State_Name', 'Crop'])['Production'].agg(['sum'])
    rice = rice.sort_values("sum", ascending=False).reset_index()
    x2 = []
    y2 = []
    rice = rice.values
    for i in range(len(rice)):  # top six states for rice production
        if str(rice[i, 1]) == 'Rice':
            x2.append(str(rice[i, 0]))
            y2.append(rice[i, 2])
            if len(x2) > 5:
                break
    x3 = []
    y3 = []
    for i in range(len(rice)):  # top six states for coconut production
        if str(rice[i, 1]) == 'Coconut':
            x3.append(str(rice[i, 0]))
            y3.append(rice[i, 2])
            if len(x3) > 5:
                break
    x4 = []
    y4 = []
    for i in range(len(rice)):  # top six states for sugarcane production
        if str(rice[i, 1]) == 'Sugarcane':
            x4.append(str(rice[i, 0]))
            y4.append(rice[i, 2])
            if len(x4) > 5:
                break
    x5 = []
    y5 = []
    for i in range(len(rice)):  # top six state/crop pairs overall
        x5.append(str(rice[i, 0]))
        y5.append(rice[i, 2])
        if len(x5) > 5:
            break
    fig, ax = plt.subplots(5)
    fig.suptitle('Top 6 State Rainfall & Crop Yield')
    ax[0].plot(x1, y1.copy())
    ax[0].set_title("State Vs Rainfall")
    ax[1].plot(x2, y2.copy())
    ax[1].set_title("Top 6 State Vs Rice Crop Yield")
    ax[2].plot(x3, y3.copy())
    ax[2].set_title("Top 6 State Vs Coconut Crop Yield")
    ax[3].plot(x4, y4.copy())
    ax[3].set_title("Top 6 State Vs Sugarcane Crop Yield")
    ax[4].plot(x5, y5.copy())
    ax[4].set_title("Top 6 State Vs Any Crop Yield")
    plt.show()
font = ('times', 15, 'bold')
title = Label(main, text='Crop Yield Prediction using Machine Learning', justify=LEFT)
title.config(bg='#00cc88', fg='#000000')
title.config(font=font)
title.config(height=3, width=120)
title.place(x=100, y=5)
title.pack()

font1 = ('times', 12, 'bold')
uploadButton = Button(main, text="Upload Agriculture Dataset", command=upload)
uploadButton.place(x=10, y=100)
uploadButton.config(font=font1)

preprocessButton = Button(main, text="Preprocess Dataset", command=preprocess)
preprocessButton.place(x=300, y=100)
preprocessButton.config(font=font1)

rnnButton = Button(main, text="Run RNN Algorithm", command=runRNN)
rnnButton.place(x=480, y=100)
rnnButton.config(font=font1)

lstmButton = Button(main, text="Run LSTM Algorithm", command=runLSTM)
lstmButton.place(x=670, y=100)
lstmButton.config(font=font1)

ffButton = Button(main, text="Run Feedforward Neural Network", command=runFF)
ffButton.place(x=10, y=150)
ffButton.config(font=font1)

graphButton = Button(main, text="Accuracy Comparison Graph", command=graph)
graphButton.place(x=300, y=150)
graphButton.config(font=font1)

predictButton = Button(main, text="Predict Crop using Test Data", command=predict)
predictButton.place(x=10, y=200)
predictButton.config(font=font1)

topButton = Button(main, text="Top 6 Crop Yield Graph", command=topGraph)
topButton.place(x=300, y=200)
topButton.config(font=font1)

text = Text(main, height=20, width=160)
scroll = Scrollbar(text)
text.configure(yscrollcommand=scroll.set)
text.place(x=10, y=250)
text.config(font=font1)

main.mainloop()
5.5 RESULT
5.5.1 SCREENSHOTS
To run the project, double-click the 'run.bat' file to get the below screen.
Fig. 10. Output screenshot 2
In the above screen, select and upload the 'Dataset.csv' file, then click on the 'Open' button to load the dataset and get the below screen.
In the above screen, all non-numeric values are converted to numeric format, and in the below lines we can see that the dataset contains 246,091 records in total; the application uses 196,872 records (80%) to train the ML models and 49,219 records (20%) to test the ML prediction error rate (RMSE, root mean squared error). Now click on the 'Run RNN Algorithm' button.
Fig. 15. Output screenshot 7
Fig. 17. Output screenshot 9
In the above screen, select and upload the 'test.csv' file, then click on the 'Open' button to load the test data; the application will then give the below prediction result.
5.5.2 ADVANTAGES:
• The proposed system is useful for the agriculture department and farmers to predict crop yield and to suggest the suitable crop if the yield is low.
• This model can be used to select the most suitable crops for the region.
• With this proposed system there is no need to analyze manually.
5.5.3 DISADVANTAGES:
• Any kind of outliers in the data might lead to a completely inadequate suggestion.
CHAPTER-6
TESTING
6.1 INTRODUCTION
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test, and each test type addresses a specific testing requirement.
6.2 TYPES OF TESTS
6.2.1 Unit testing:
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit, before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
6.2.2 Integration testing:
Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
6.2.3 Functional test:
Functional tests provide systematic demonstrations that functions tested are available as specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identifying business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
System Test
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system testing
is the configuration oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.
6.2.4 White Box Testing
White box testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.
6.2.5 Black Box Testing
Black box testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot "see" into it. The test provides inputs and responds to outputs without considering how the software works.
Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.
Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
6.2.6 Acceptance Testing:
User acceptance testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
CHAPTER-7
CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION:
This project focuses on the prediction of crops and the calculation of their yield with the help of machine learning techniques. Several machine learning methodologies were used for the calculation of accuracy; RNN, LSTM and Feed Forward Neural Networks were used for crop prediction for the chosen district. A system was implemented to predict crops from a collection of past data. The proposed technique helps farmers decide which crop to cultivate in the field. This work is employed to gain knowledge about the crop, which can be deployed to make harvesting efficient and useful. The accurate prediction of different specified crops across different areas will help farmers, and this improves the Indian economy by maximizing the yield rate of crop production.
7.2 FUTURE SCOPE:
In the coming years, we can try applying a data-independent system: whatever the format, our system should work with the same accuracy. Integrating soil details into the system is an advantage, as knowledge of the soil is also a parameter in the selection of crops. Proper irrigation is also a needed feature of crop cultivation; with reference to rainfall, the system can depict whether extra water availability is needed or not. This research work can be enhanced to a higher level by availing it to the whole of India.
CHAPTER-8
REFERENCES
1. Madhusudhan L. Agriculture Role on Indian Economy. https://www.omicsonline.org/openaccess/agriculture-role-on-indianeconomy-2151-6219-1000176.php?aid=62176
2. Priya, P., Muthaiah, U., Balamurugan, M. Predicting Yield of the Crop Using Machine Learning Algorithm. International Journal of Engineering Sciences & Research Technology.
3. Mishra, S., Mishra, D., Santra, G. H. (2016). Applications of machine learning techniques in agricultural crop production: a review paper. Indian J. Sci. Technol., 9(38), 1-14.
4. Manjula, E., Djodiltachoumy, S. (2017). A Model for Prediction of Crop Yield. International Journal of Computational Intelligence and Informatics, 6(4), 2349-6363.
5. Dahikar, S. S., Rode, S. V. (2014). Agricultural crop yield prediction using artificial neural network approach. International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering, 2(1), 683-686.
6. González Sánchez, A., Frausto Solís, J., Ojeda Bustamante, W. (2014). Predictive ability of machine learning methods for massive crop yield prediction.
7. Mandic, D. P., Chambers, J. (2001). Recurrent neural networks for prediction: learning algorithms, architectures and stability. John Wiley & Sons, Inc.
8. Hochreiter, S., Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
9. Alif, A. A., Shukanya, I. F., Afee, T. N. (2018). Crop prediction based on geographical and climatic data using machine learning and deep learning. Doctoral dissertation, BRAC University.
10. Sak, H., Senior, A., Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Fifteenth Annual Conference of the International Speech Communication Association.
11. Niketa Gandhi et al. Rice Crop Yield Forecasting of Tropical Wet and Dry Climatic Zone of India Using Data Mining Techniques. IEEE International Conference on Advances in Computer Applications (ICACA).