Major Project Final Paper
Abstract—Rice farming is a cornerstone of global food security, and crop yield and quality depend heavily on soil suitability. Crop identification and soil assessment have traditionally relied on time-consuming methods that are also prone to human error. To address these challenges, this paper proposes a deep learning solution that integrates rice crop classification and soil categorization in order to optimize rice farming by accurately identifying the best cultivation conditions. The proposed system uses a Convolutional Neural Network (CNN) architecture already trained on a large database of images of different types of rice crops and soils. The model learns the inherent spectral and textural characteristics of the images and uses them to classify effectively. By processing real images taken from rice fields, the system identifies the rice type and simultaneously assesses the compatibility of the production-supporting soil. This twin-analysis approach supports better farm decision-making through prompt and precise prediction of soil suitability without extensive manual testing. Reliability under varying environmental conditions is verified through rigorous assessment with the predefined metrics of accuracy, precision, recall, and F1-score. Agricultural diagnosis with deep learning provides a cost-effective and scalable path toward the future of agriculture. This research benefits farmers by providing useful information for enhancing crop yield, and it supports sustainability by encouraging data-driven agriculture. The model achieved an accuracy of 99.7%. The dataset used in this study comprises 75,000 macroscopic rice images: 15,000 each of Arborio, Basmati, Ipsala, Jasmine, and Karacadag.

Keywords—Automated Crop Identification, Deep Learning, Convolutional Neural Network (CNN), Rice Classification, Soil Suitability

I. INTRODUCTION
Rice is arguably the most crucial staple food. Rising global demand requires researchers to improve both the efficiency and the sustainability of rice production systems. Among the many factors that influence rice yields, soil suitability is arguably the most fundamental. Soil properties such as texture, pH, water-holding capacity, organic matter, and nutrient content each play a direct role in rice growth and productivity [1]. However, traditional practices for assessing suitability involve laboratory analyses and expert evaluations that demand considerable time, labor, and additional resources [7]. Advances in Artificial Intelligence (AI) and deep learning are rapidly shifting agricultural research toward data-based decisions, intelligently automating the same agricultural decision processes that were previously evaluated, refined, and often constrained by the limits of human decision-making, all of which vary in time, labor, and material inputs. Among many other techniques, convolutional neural networks (CNNs) play a significant role in image-based classification tasks such as plant species classification, plant disease diagnosis, and soil texture recognition [2][4]. Such models label images based on learned features with high accuracy and limited human input [3].

In this research, we introduce a deep learning-based approach that combines rice variety classification and soil type classification through image processing. The framework is a hybrid model trained on a cross-site, heterogeneous database of rice crop and soil images analyzed for spectral and textural variation [6]. The model analyzes field images in a real-time setting, classifies rice varieties, classifies soil in terms of its suitability for crop growth, and supports precision agriculture.

This integrated system aims to give farmers and agronomists timely and accurate information for assessing the suitability of rice crops with regard to soil. By reducing the grower's reliance on traditional soil testing, the system uses the CNN's ability to recognize and classify patterns in rice and soil with minimal human intervention and investment from the grower, guiding decisions in rice production toward both higher productivity and agronomically sound practices [5]. The system also aims to support environmentally sound agricultural implementation and development. Additionally, the model is designed for scalability and adaptability: with minor re-engineering of its features, it can be adapted to crops outside a rice cropping system. Integration with mobile applications and drone platforms allows real-time image collection, even when surveying dispersed or remote areas where agronomists and agricultural services are not available [8]. Our approach to sharing geotagged data links field imaging with smartphone users in places they would not ordinarily reach, supporting rice-related regional analysis that scales from micro to macro, assisting large-scale monitoring, and giving policymakers evidence to address challenges in the agricultural production system [9].
III. METHODOLOGY
The methodology adopted in the present work is a rigorous and systematic approach to designing a deep learning model that classifies rice grain types and suggests associated soil types based on the predicted variety. The process was divided into five primary phases: data collection, data cleaning, feature selection, construction of the CNN network, and training of the model. Each phase played a pivotal role in the system's performance, generalizability, and applicability in real-world contexts. The pipeline must be low-resource and scalable, which allows deployment in rural or offline environments with limited internet connectivity and computing resources. Each step is addressed in detail in this section.

Fig [Link] with corresponding soil type.

Fig [Link] with corresponding soil type.
A. Data Collection
A set of 75,000 high-resolution macroscopic images was assembled, with 15,000 images for each of the five rice varieties: Arborio, Basmati, Ipsala, Jasmine, and Karacadag. Images were taken under different lighting, backgrounds, and orientations using digital microscopes and cellular phones to add robustness to real-world environments. The data was sourced from public repositories, research farm sites, and internal collection, with a focus on geographical and environmental sample variability. This variability enables the model to generalize across different growing conditions. Images were manually labeled and annotated by agricultural experts to ensure that the labels were biologically correct. The data was then organized in a class-directory format, which is easy to load in deep learning pipelines. The explicit inclusion of spatial, visual, and environmental variability strengthens the dataset. Overall, this large and diverse dataset is a good basis for training an effective rice classification model.

Fig [Link] with corresponding soil type.
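A class-directory layout of the kind described above can be indexed with a few lines of standard-library Python. This is a minimal sketch, not the authors' loader: the folder names mirror the five varieties named in the paper, while the root path, the accepted file extensions, and the function name `index_class_directories` are assumptions made for illustration.

```python
from pathlib import Path

# The five variety names come from the paper; the layout (one sub-folder
# per class under a root folder) is an assumption for this sketch.
CLASS_NAMES = ["Arborio", "Basmati", "Ipsala", "Jasmine", "Karacadag"]


def index_class_directories(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return a list of (image_path, class_name) pairs found under root."""
    root = Path(root)
    samples = []
    for class_name in CLASS_NAMES:
        class_dir = root / class_name
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder instead of failing
        for path in sorted(class_dir.iterdir()):
            # Keep only files whose extension looks like an image.
            if path.suffix.lower() in extensions:
                samples.append((path, class_name))
    return samples
```

Calling `index_class_directories("rice_images")` on a hypothetical `rice_images/` folder would yield one pair per image, with the label taken from the parent folder name.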
B. Data Preprocessing
After the dataset of 75,000 macroscopic images of five types of rice grains was prepared, a general preprocessing pipeline was run to convert all the images into a normalized and homogeneous form suitable for deep learning. The images initially varied in orientation, illumination, resolution, and size because they were captured with different sources, such as smartphones and microscopes, under different conditions. To handle this variability, the images were resized to a consistent size of 150×150 pixels. This size balances computational cost against preserving discriminative visual features, such as grain boundary edges, texture, and contour, that are important for distinguishing rice varieties. The images were then normalized by mapping pixel values to the range 0 to 1. Normalization is standard deep learning practice because it speeds up training convergence and reduces the effect of contrast and brightness variation in the dataset.

Text-based class labels (e.g., "Basmati" or "Karacadag") were encoded numerically so that models could be trained with classification algorithms. The entire dataset was also shuffled randomly to eliminate any bias introduced by image order and to support generalized learning. To improve model performance and prevent overfitting, several data augmentation methods were used: random horizontal and vertical flipping, rotation, zooming in and out, and brightness adjustment. These transformations mimic real-world variation, training the model to recognize rice grains at different orientations, lighting conditions, and distances.

Overall, this preprocessing step was central to cleaning, standardizing, and enhancing the dataset, and thus to the robustness and accuracy of the convolutional neural network used in this research.

Fig 4. Karacadag with corresponding soil type.

C. Feature Extraction
Feature extraction in this work is performed automatically by Convolutional Neural Networks (CNNs), without manual feature engineering. Shallow layers of the CNN detect simple features such as lines and edges, while deeper layers detect intricate patterns such as grain shape, texture, and color gradient. Each convolutional layer applies filters to generate feature maps, and ReLU activation adds non-linearity. MaxPooling layers downsample the feature maps and emphasize significant patterns. These hierarchical features are distilled into embeddings and passed through dense layers for final classification. The process enables precise recognition of fine differences between rice varieties.
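The convolution, ReLU, and max-pooling operations used for feature extraction can be illustrated on a toy example. The sketch below is pure Python and is not the paper's network: the 6×6 patch and the hand-picked vertical-edge filter are assumptions chosen for illustration, whereas a trained CNN learns its filters from data.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2-D image (no padding, stride 1).

    Like deep learning frameworks, this computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Zero out negative responses, keeping positive ones unchanged."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    h, w = len(feature_map) // 2 * 2, len(feature_map[0]) // 2 * 2
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


# Toy grayscale patch: dark background (0) on the left, bright grain (1) on the right.
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

features = max_pool_2x2(relu(conv2d_valid(patch, vertical_edge)))
print(features)  # [[3, 3], [3, 3]]
```

The 4×4 convolution output is reduced to a 2×2 map that still records the strong edge response, which is the sense in which pooling downsamples while keeping significant patterns.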
Fig [Link] of CNN Architecture of a Convolutional Neural Network (CNN) showing input, convolutional, pooling,
dense, and output layers.
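Before images reach the network, the preprocessing steps of Section B are applied. The following sketch shows the pixel scaling, label encoding, and shuffling steps under stated assumptions: images are represented as nested lists of 8-bit intensities (0 to 255), the index order of the labels is arbitrary, and the fixed seed is only for reproducibility. Resizing to 150×150 is omitted because it requires an image library.

```python
import random

CLASS_NAMES = ["Arborio", "Basmati", "Ipsala", "Jasmine", "Karacadag"]
# Integer encoding of the text labels; the specific index order is an assumption.
LABEL_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}


def scale_pixels(image, max_value=255):
    """Map pixel intensities to [0, 1], assuming 8-bit input values."""
    return [[value / max_value for value in row] for row in image]


def prepare_dataset(samples, seed=42):
    """Scale, encode, and shuffle a list of (image, label_text) pairs."""
    prepared = [(scale_pixels(image), LABEL_TO_INDEX[label])
                for image, label in samples]
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    rng.shuffle(prepared)      # removes any bias from the original image order
    return prepared
```

Integer labels produced this way are the form expected by a sparse categorical crossentropy loss, which is why no one-hot encoding step appears here.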
D. CNN Architecture
The proposed CNN model is designed for efficient classification of rice grains and for deployment on low-resource systems. It takes 150×150×3 RGB images as input and reduces them through sequential Conv2D layers (32, 64, and 128 filters) with ReLU activation and MaxPooling2D to extract spatial and texture features. Dropout layers (rate 0.3) are used to limit the likelihood of overfitting. The flattened features are then fed to a dense layer of 128 neurons and finally to a Softmax output layer that assigns the rice image to one of five classes. The model is trained with a variant of the Adam optimizer and sparse categorical crossentropy loss. This design balances accuracy, generalizability, and computational cost.
The CNN classification process follows a sequential structure of convolution, pooling, flattening, and dense layers to extract and classify features (see Fig. 6).
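To make the size of this architecture concrete, the sketch below traces the tensor shapes and parameter counts layer by layer. The paper gives the input size, filter counts, dense width, and number of classes; the 3×3 kernels, unpadded ("valid") convolutions, and 2×2 pooling are assumptions, so the resulting numbers are illustrative rather than the authors' reported figures. Dropout layers add no parameters and are therefore not listed.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping pooling (floor division)."""
    return size // pool


def summarize(input_size=150, channels=3, filters=(32, 64, 128),
              dense_units=128, num_classes=5, kernel=3):
    """Return per-layer (name, output_shape, params), flattened size, and total params."""
    size, total_params, rows = input_size, 0, []
    for n_filters in filters:
        # Each filter has kernel*kernel*channels weights plus one bias.
        params = kernel * kernel * channels * n_filters + n_filters
        size = pool_output(conv_output(size, kernel))
        channels = n_filters
        total_params += params
        rows.append((f"conv{n_filters}+pool", (size, size, channels), params))
    flat = size * size * channels
    dense_params = flat * dense_units + dense_units
    out_params = dense_units * num_classes + num_classes
    total_params += dense_params + out_params
    rows.append(("dense", (dense_units,), dense_params))
    rows.append(("softmax", (num_classes,), out_params))
    return rows, flat, total_params


rows, flat, total = summarize()
for name, shape, params in rows:
    print(f"{name:14s} output={shape} params={params}")
print("flattened features:", flat, "total parameters:", total)
```

Under these assumptions the three convolution blocks shrink the 150×150 input to 17×17×128, and most of the roughly 4.8 million parameters sit in the first dense layer, which is the usual target when trimming a model for low-resource deployment.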
E. Model Training
Model training is a critical step in developing the rice grain classification system, enabling the CNN to learn discriminative features from labeled image data. Supervised learning uses input-output pairs, in our case rice grain images and their corresponding class labels, through which the network learns patterns and relationships to enable proper classification.

Fig 7. Workflow of rice detection and soil recommendation using CNN

The model was trained for 30 epochs with a batch size of 128 to balance memory consumption and convergence rate. This mini-batch gradient descent strategy allows more frequent weight updates than full-batch methods and less noise than stochastic methods. The Adam optimizer was used for its adaptive learning rate and computational efficiency. It combines the benefits of RMSProp and AdaGrad, adjusting learning rates based on estimates of both the first and second moments of the gradients. The learning rate was set to 0.001, which ensured convergence without overshooting. Sparse categorical crossentropy was used as the loss function; it is well suited to multi-class classification with integer class labels, penalizing incorrect predictions and driving correction through backpropagation.
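The loss and optimizer described above can be written out in a few lines. This is a toy sketch in pure Python, not the training code used in the study: it treats the five output logits themselves as the trainable parameters, uses the stated learning rate of 0.001, and assumes the commonly used Adam defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8), which the paper does not specify.

```python
import math


def softmax(logits):
    """Convert raw scores into class probabilities."""
    peak = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, label):
    """Loss for one sample whose label is an integer class index."""
    return -math.log(softmax(logits)[label])


def loss_gradient(logits, label):
    """Gradient of the loss w.r.t. the logits: softmax(logits) - one_hot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running first- and second-moment estimates."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        m_hat = m_i / (1 - beta1 ** t)       # bias correction for early steps
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy run: five classes, true class index 2, logits start at zero.
logits, label = [0.0] * 5, 2
m, v = [0.0] * 5, [0.0] * 5
initial_loss = sparse_categorical_crossentropy(logits, label)
for t in range(1, 101):
    grads = loss_gradient(logits, label)
    logits, m, v = adam_step(logits, grads, m, v, t)
final_loss = sparse_categorical_crossentropy(logits, label)
print(initial_loss > final_loss)  # True
```

The starting loss equals ln 5, the value for a uniform guess over five classes, and each Adam step nudges the correct logit up and the others down, so the loss falls steadily.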
The rice image passes through preprocessing, feature extraction, and CNN-based classification, resulting in rice detection and soil recommendation (see Fig. 7).
Each epoch included a forward pass, in which image data was propagated through the CNN to produce class probabilities. The predicted output was compared to the actual label using the loss function, and the error was backpropagated through the network to update the weights. EarlyStopping was included to halt training when validation loss no longer improved, preventing overfitting and reducing training time. ModelCheckpoint was also used to automatically save model weights at the point of best validation accuracy.

To enhance robustness, on-the-fly data augmentation was applied via data generators, introducing random rotations, zoom, flips, and brightness adjustments to simulate real-world variability. This increased the effective dataset size and improved generalization by making the model more invariant to variations in angle, lighting, and [Link]. Training and validation accuracy and loss were tracked and visualized using learning curves to identify trends of overfitting or underfitting. Finally, the trained model was evaluated on an unseen test set to independently verify its accuracy and generalizability. The final model was saved in formats such as .h5 or .tflite for easy integration into the full rice classification and soil recommendation system on mobile platforms.

Fig 8. Distribution of Training and Testing Images Across Rice Varieties

To test the performance of the proposed rice grain classification model, a large experiment was undertaken with a well-balanced dataset of 75,000 macroscopic images evenly distributed over five rice varieties, with 15,000 images each. The data were divided into training and test sets, with about 14,250 samples per class for training and the remaining 750 for testing, giving an even 95:5 split in each category. Figure 8 depicts the split and shows the uniform class distribution of training and testing samples.
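A per-class 95:5 split of this kind can be sketched as follows. The class names and the 15,000 images per class come from the paper; the function name `stratified_split`, the fixed seed, and the use of integer IDs in place of image files are assumptions for illustration.

```python
import random


def stratified_split(samples_by_class, test_fraction=0.05, seed=0):
    """Split each class separately so train and test keep the same class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for class_name, samples in samples_by_class.items():
        shuffled = samples[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(s, class_name) for s in shuffled[:n_test]]
        train += [(s, class_name) for s in shuffled[n_test:]]
    return train, test


varieties = ["Arborio", "Basmati", "Ipsala", "Jasmine", "Karacadag"]
# Integer IDs stand in for image files; 15,000 per class as in the paper.
dataset = {name: list(range(15000)) for name in varieties}
train, test = stratified_split(dataset)
print(len(train), len(test))  # 71250 3750
```

Splitting within each class, rather than over the pooled dataset, is what keeps exactly 750 test images per variety.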
REFERENCES