
ASSIGNMENT-1

Harshyara Bukkapatnam

ENG21CS0085

7th Semester B

November 6, 2024

Sequence Networks and GAN

Prof. Arjun KrishnaMurthy

CNN for MNIST Handwritten Digit Classification

Dataset

The MNIST handwritten digit classification problem is a standard dataset used in
computer vision and deep learning. Although the dataset is effectively solved, it can be
used as the basis for learning and practicing how to develop, evaluate, and use
convolutional deep learning neural networks for image classification from scratch. This
includes how to develop a robust test harness for estimating the performance of the
model, how to explore improvements to the model, and how to save the model and later
load it to make predictions on new data.

MNIST is a widely used dataset for the hand-written digit classification task. It consists
of 70,000 labelled 28x28 pixel grayscale images of hand-written digits. The dataset is
split into 60,000 training images and 10,000 test images. There are 10 classes (one for
each of the 10 digits). The task at hand is to train a model using the 60,000 training
images and subsequently test its classification accuracy on the 10,000 test images.

The dataset is accessed here through Keras, a deep learning API written in Python that
provides MNIST as one of its built-in datasets.
Model Methodology

The methodology for this project involves constructing and evaluating a convolutional
neural network (CNN) to classify handwritten digits from the MNIST dataset. The
process is structured as follows:
Data Preparation:

• Dataset Loading and Preprocessing: The MNIST dataset is loaded from
tensorflow.keras.datasets, consisting of 28x28 grayscale images with digit labels
(0–9). Images are reshaped to a single channel for CNN compatibility, and labels
are one-hot encoded for categorical classification.

• Normalization: Pixel values, originally in the range [0, 255], are scaled to [0, 1]
to enhance convergence during training.
Model Architecture:
• CNN Structure: A CNN model is constructed using Sequential from
tensorflow.keras, with layers designed for feature extraction and classification:

o Convolutional Layers: Two convolutional layers (32 and 64 filters,
respectively) apply a 3x3 kernel with ReLU activation and he_uniform
initialization, followed by batch normalization and max pooling.

o Dropout Layers: Dropout (0.2 and 0.3) is added to mitigate overfitting by
randomly disabling neurons during training.

o Dense Layers: A fully connected dense layer with 100 neurons, followed
by batch normalization and dropout (0.5), is applied before the final output
layer.

o Output Layer: A dense layer with 10 neurons and softmax activation
outputs class probabilities.

• Compilation: The model is compiled using Stochastic Gradient Descent (SGD)
with a learning rate of 0.01 and momentum of 0.9, optimized for categorical
cross-entropy loss.
Evaluation Approach (k-Fold Cross-Validation):
• Cross-Validation: The model is evaluated using 5-fold cross-validation, where
the dataset is divided into five subsets. For each fold, four subsets are used for
training, and one for testing. This approach provides a robust estimate of model
performance by assessing variability across different splits.
• Training and Testing: Within each fold, the model trains for 10 epochs with a
batch size of 32. Accuracy is recorded for both training and validation data,
enabling performance comparison across folds.
Diagnostics and Performance Summary:

• Learning Curves: Training and validation losses and accuracies are plotted for
each fold to visualize model learning and identify potential overfitting or
underfitting.
• Accuracy Summary: Final performance is summarized by calculating the mean
and standard deviation of accuracies across all folds, offering a consolidated view
of the model's generalization ability.

Development Environment
Google Colab was used as the development environment for this project, providing a
cloud-based Jupyter notebook interface with pre-installed libraries for deep learning,
such as TensorFlow and Keras. It enables seamless access to GPU acceleration,
enhancing model training efficiency on the MNIST dataset. Additionally, Colab's
collaborative features facilitate code sharing and documentation, streamlining the
development and testing process.

Principle
The principle behind this CNN model is to classify handwritten digits by progressively
learning spatial hierarchies of features through convolutional layers. The model
leverages convolution to capture local patterns, like edges and textures, which are
essential for recognizing digit shapes. Max pooling layers down-sample these features,
reducing computational complexity while preserving key information.
Regularization techniques such as dropout prevent overfitting by adding noise to the
network, enhancing generalization. Finally, the model uses softmax activation to output
class probabilities, enabling accurate digit classification.

Developing a Model
Importing Libraries
To develop the convolutional neural network (CNN) model for digit classification, we
first import essential libraries:
• Numpy: Used for efficient numerical operations, particularly matrix
manipulations, which are crucial in deep learning tasks.
• Matplotlib: A plotting library used to visualize learning curves and diagnostic
plots, aiding in model evaluation and performance analysis.
• Scikit-Learn's KFold: Provides k-fold cross-validation to estimate model
performance by training and testing on different data splits.
• TensorFlow and Keras Modules:
o Datasets (MNIST): Loads the MNIST dataset, a widely used collection of
handwritten digits, ideal for testing classification models.
o Utils (to_categorical): Converts class labels to one-hot encoded format,
necessary for multi-class classification.
o Models (Sequential): Facilitates the construction of a layer-by-layer
neural network model.
o Layers (Conv2D, MaxPooling2D, Dense, Flatten, Dropout,
BatchNormalization): Composes the CNN architecture, with Conv2D for
feature extraction, MaxPooling2D for down-sampling, Dense for fully connected
layers, Flatten for reshaping data, and Dropout and BatchNormalization for
regularization and training stability.
o Optimizers (SGD): Implements Stochastic Gradient Descent with
learning rate and momentum adjustments, optimizing model convergence.
These libraries collectively provide the tools to preprocess data, build the CNN model,
train with cross-validation, and evaluate performance.

Data Loading and Preparation


• Dataset Loading: The model uses the MNIST dataset, loaded through
tensorflow.keras.datasets. This dataset contains 28x28 grayscale images of
handwritten digits (0–9), separated into training and test sets.

• Reshaping the Dataset: Each image is reshaped to include a single channel
(28x28x1) to suit the CNN model’s input requirements. This format allows the
convolutional layers to process spatial relationships effectively within each image.

• One-Hot Encoding: Target labels (digit classes) are one-hot encoded,
converting each label into a vector representation. This encoding is crucial for
categorical classification, where the model predicts the probability for each class.

• Normalization: Pixel values, initially ranging from [0, 255], are normalized to
[0, 1] by dividing by 255.0. Normalization aids in stabilizing and accelerating the
training process, allowing the model to converge more efficiently by reducing
variations in pixel intensity.
By performing these steps, the dataset is prepared for optimal performance within the
CNN model, enhancing both accuracy and training speed.
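
These preparation steps can be sketched as follows. This is a minimal illustration using
the tensorflow.keras API described above; the helper names load_dataset and
prep_pixels are illustrative, not taken verbatim from the project code:

    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.utils import to_categorical

    def load_dataset():
        # Load the MNIST train/test split (60,000 / 10,000 images)
        (trainX, trainY), (testX, testY) = mnist.load_data()
        # Reshape to a single channel: (samples, 28, 28, 1)
        trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
        testX = testX.reshape((testX.shape[0], 28, 28, 1))
        # One-hot encode the digit labels (0-9)
        trainY = to_categorical(trainY)
        testY = to_categorical(testY)
        return trainX, trainY, testX, testY

    def prep_pixels(train, test):
        # Convert to floats and scale [0, 255] -> [0, 1]
        train_norm = train.astype('float32') / 255.0
        test_norm = test.astype('float32') / 255.0
        return train_norm, test_norm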
Model Architecture
The model is a convolutional neural network (CNN) designed to classify images of
handwritten digits from the MNIST dataset. The architecture is structured to
progressively capture features at multiple levels of abstraction through several key
layers:
• Convolutional Layers: The model begins with two convolutional layers. The
first layer has 32 filters, and the second layer has 64 filters, both with a 3x3 kernel
size and ReLU activation. These layers learn spatial features in the image, such as
edges and textures, crucial for digit recognition.
• Batch Normalization: Each convolutional layer is followed by batch
normalization to stabilize and accelerate training by normalizing the inputs to
each layer, which helps improve model accuracy.
• Max Pooling Layers: Max pooling layers follow each batch-normalized
convolutional layer to down-sample feature maps, reducing the computational
load and focusing on the most significant features.
• Dropout Layers: Dropout layers are included after each max pooling layer to
reduce overfitting. A dropout rate of 0.2 is applied after the first convolutional
layer, and 0.3 after the second.
• Fully Connected Layers: After flattening the feature maps, the model includes
a dense layer with 100 units and ReLU activation to learn complex combinations
of features before the output layer.
• Output Layer: A dense output layer with 10 neurons and softmax activation
provides class probabilities, corresponding to the ten digit classes (0–9).
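
A Keras sketch of this architecture, following the layer ordering described in the bullets
above (build_layers is an illustrative name, not from the report):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dense,
                                         Flatten, Dropout, BatchNormalization)

    def build_layers():
        model = Sequential()
        # Block 1: 32 filters, 3x3 kernel, ReLU, he_uniform initialization
        model.add(Conv2D(32, (3, 3), activation='relu',
                         kernel_initializer='he_uniform',
                         input_shape=(28, 28, 1)))
        model.add(BatchNormalization())
        model.add(MaxPooling2D((2, 2)))
        model.add(Dropout(0.2))
        # Block 2: 64 filters
        model.add(Conv2D(64, (3, 3), activation='relu',
                         kernel_initializer='he_uniform'))
        model.add(BatchNormalization())
        model.add(MaxPooling2D((2, 2)))
        model.add(Dropout(0.3))
        # Classifier head: 100-unit dense layer, then 10-way softmax output
        model.add(Flatten())
        model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        model.add(Dense(10, activation='softmax'))
        return model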

Model Compilation
The model is compiled with the following configurations:

• Optimizer: Stochastic Gradient Descent (SGD) with a learning rate of 0.01 and a
momentum of 0.9, which helps the model converge faster by incorporating
previous gradient information.
• Loss Function: Categorical cross-entropy is used as the loss function,
appropriate for multi-class classification tasks.
• Metrics: Model accuracy is tracked as the primary metric, providing a
straightforward evaluation of performance during training and validation.
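
Under the same assumptions, compilation can be sketched as:

    from tensorflow.keras.optimizers import SGD

    def define_model():
        model = build_layers()  # architecture sketch from the previous section
        # SGD with momentum, as described above
        opt = SGD(learning_rate=0.01, momentum=0.9)
        model.compile(optimizer=opt,
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        return model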

Model Training Strategy


The model is trained over multiple epochs with cross-validation to ensure robust
generalization. This strategy helps verify that the model performs consistently across
different data splits, improving its reliability on unseen data.
This structured approach to model development enhances the model's ability to
effectively capture, retain, and generalize essential image features, optimizing it for high
accuracy in handwritten digit classification.
k-Fold Cross-Validation
To ensure a robust evaluation of model performance, a 5-fold cross-validation approach
was applied. Cross-validation splits the dataset into five equal parts, or folds, and
iteratively trains and tests the model across these subsets. In each iteration, the model
trains on four of the folds and tests on the remaining one, which varies with each fold.
This method provides a more reliable performance estimate by reducing the impact of
random sampling variations.

Evaluation Function
The evaluate_model function was implemented to automate this cross-validation
process. It initializes a KFold object, shuffling the dataset with a fixed random state for
reproducibility. Within each fold, the model is defined, trained for 10 epochs with a batch
size of 32, and evaluated on the validation fold. The function then appends the accuracy
score and training history of each fold to respective lists, scores and histories, for
performance analysis.
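
A sketch of evaluate_model consistent with this description, reusing the define_model
sketch above:

    from sklearn.model_selection import KFold

    def evaluate_model(dataX, dataY, n_folds=5):
        scores, histories = list(), list()
        # Shuffle with a fixed random state for reproducibility
        kfold = KFold(n_folds, shuffle=True, random_state=1)
        for train_ix, test_ix in kfold.split(dataX):
            model = define_model()
            trainX, trainY = dataX[train_ix], dataY[train_ix]
            testX, testY = dataX[test_ix], dataY[test_ix]
            # Train for 10 epochs with a batch size of 32
            history = model.fit(trainX, trainY, epochs=10, batch_size=32,
                                validation_data=(testX, testY), verbose=0)
            # Evaluate on the held-out fold and record the results
            _, acc = model.evaluate(testX, testY, verbose=0)
            print('> %.3f' % (acc * 100.0))
            scores.append(acc)
            histories.append(history)
        return scores, histories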

Model Performance Tracking


For each fold, the model’s accuracy is printed to provide insights into the network’s
performance. Final results are stored, allowing for later summarization of the model's
average accuracy and standard deviation across all folds. This approach enhances the
reliability of performance estimates and helps assess model consistency across different
subsets of the data.
Diagnostic Learning Curves
To assess the model's training dynamics and identify potential overfitting or
underfitting, we plotted diagnostic learning curves based on cross-entropy loss and
classification accuracy. For each fold in the cross-validation process, the model's training
and validation losses are visualized, allowing for a comparison of generalization
performance. Similarly, training and validation accuracies are plotted, highlighting how
well the model learns across epochs. This visual analysis provides insights into model
stability and convergence behavior.
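
A plotting sketch for these learning curves, assuming the per-fold Keras History objects
collected by evaluate_model (with metrics=['accuracy'], the recorded keys are 'loss',
'val_loss', 'accuracy', and 'val_accuracy'):

    import matplotlib.pyplot as plt

    def summarize_diagnostics(histories):
        for i in range(len(histories)):
            # Cross-entropy loss: training (blue) vs. validation (orange)
            plt.subplot(2, 1, 1)
            plt.title('Cross Entropy Loss')
            plt.plot(histories[i].history['loss'], color='blue', label='train')
            plt.plot(histories[i].history['val_loss'], color='orange', label='test')
            # Classification accuracy per epoch
            plt.subplot(2, 1, 2)
            plt.title('Classification Accuracy')
            plt.plot(histories[i].history['accuracy'], color='blue', label='train')
            plt.plot(histories[i].history['val_accuracy'], color='orange', label='test')
        plt.show()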

Performance Summary
After training the model across multiple folds, we compute an overall performance
summary by calculating the mean and standard deviation of accuracy scores. This
evaluation offers a comprehensive view of the model’s effectiveness and consistency. A
boxplot visualizes the distribution of accuracy scores across folds, emphasizing the
model's generalization capability and performance stability. This summarization
provides a reliable measure of the model's robustness on unseen data.
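
A sketch of this summary step (summarize_performance is an illustrative name):

    import numpy as np
    import matplotlib.pyplot as plt

    def summarize_performance(scores):
        # Mean and standard deviation of accuracy across the five folds
        print('Accuracy: mean=%.3f std=%.3f, n=%d'
              % (np.mean(scores) * 100, np.std(scores) * 100, len(scores)))
        # Boxplot of the per-fold accuracy distribution
        plt.boxplot(scores)
        plt.show()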
Final Model Training
In this step, the model is trained on the entire training dataset using the previously
defined CNN architecture. The training process consists of fitting the model to the trainX
and trainY data for 10 epochs, with a batch size of 32. During training, the model adjusts
its weights to minimize the loss function and improve its ability to classify handwritten
digits from the MNIST dataset.
Model Saving
After training, the model is saved as final_model.h5 using the model.save() function.
This allows the trained model to be easily loaded and used for inference or further
evaluation without needing to retrain. The saved model captures the learned parameters,
ensuring reproducibility and facilitating deployment in real-world applications.
Running the Final Model

The final model is trained and saved by invoking the run_final_model() function, which
handles the complete training process and model storage.
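
A sketch of run_final_model consistent with this description, reusing the data and model
helpers sketched earlier:

    def run_final_model():
        # Prepare the full training set
        trainX, trainY, testX, testY = load_dataset()
        trainX, testX = prep_pixels(trainX, testX)
        # Fit on all 60,000 training images for 10 epochs, batch size 32
        model = define_model()
        model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
        # Persist the learned parameters and architecture in HDF5 format
        model.save('final_model.h5')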

Execution
Loading and Evaluating the Final Model
To assess the performance of the trained model, the final version is loaded using
TensorFlow's load_model function. The model, saved as 'final_model.h5', is then
evaluated on the test dataset (testX, testY) to gauge its accuracy. This step provides a
final validation of the model's ability to generalize on unseen data, with the test accuracy
printed as the output.
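
A minimal sketch of this evaluation step, assuming the load_dataset and prep_pixels
helpers sketched earlier:

    from tensorflow.keras.models import load_model

    # Load the saved model and report held-out test accuracy
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    model = load_model('final_model.h5')
    _, acc = model.evaluate(testX, testY, verbose=0)
    print('Test accuracy: %.3f' % (acc * 100.0))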
Image Loading and Preprocessing
The function load_and_prep_image takes an image file as input and preprocesses it for
prediction. The image is loaded in grayscale with a target size of 28x28 pixels, consistent
with the MNIST dataset. The image is then converted to an array and reshaped into a
format suitable for the CNN model (a single sample with one channel). Pixel values are
normalized to the range [0,1] to match the preprocessing done during model training.
Finally, the pre-processed image is displayed for verification, and debugging information
is printed to ensure correct formatting.
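
A sketch of load_and_prep_image consistent with this description; the display and debug
lines at the end correspond to the verification step mentioned above:

    import matplotlib.pyplot as plt
    from tensorflow.keras.preprocessing.image import load_img, img_to_array

    def load_and_prep_image(filename):
        # Load as 28x28 grayscale, matching the MNIST input format
        img = load_img(filename, color_mode='grayscale', target_size=(28, 28))
        # Convert to an array and reshape to a single sample with one channel
        img = img_to_array(img)
        img = img.reshape(1, 28, 28, 1)
        # Normalize pixels to [0, 1], matching the training preprocessing
        img = img.astype('float32') / 255.0
        # Display the prepared image and print its shape for debugging
        plt.imshow(img[0, :, :, 0], cmap='gray')
        plt.show()
        print('Prepared image shape:', img.shape)
        return img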

Digit Prediction
The predict_digit function loads the pre-processed image and passes it through the
trained CNN model. The model predicts the class (digit) by outputting a probability
distribution over all 10 classes. The predicted digit is identified by selecting the class with
the highest probability using np.argmax(). The predicted digit along with its confidence
score (probability distribution) is printed to the console for evaluation.
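
A sketch of predict_digit consistent with this description, reusing the
load_and_prep_image sketch above:

    import numpy as np
    from tensorflow.keras.models import load_model

    def predict_digit(filename, model):
        img = load_and_prep_image(filename)
        # Probability distribution over the 10 digit classes
        probs = model.predict(img)[0]
        digit = int(np.argmax(probs))
        print('Predicted digit:', digit)
        print('Confidence scores:', probs)
        return digit

    # Example usage, matching the test described below:
    # model = load_model('final_model.h5')
    # predict_digit('digit_image.png', model)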

Testing the Prediction


The function is tested using the image file 'digit_image.png'. Upon execution, the model
loads, preprocesses the image, makes a prediction, and outputs the predicted digit along
with the confidence level. This ensures the system functions correctly for digit
recognition from new images.
Conclusion
In conclusion, the CNN model successfully classifies handwritten digits from the MNIST
dataset, demonstrating effective use of convolutional layers for feature extraction and
regularization techniques to prevent overfitting. Through 5-fold cross-validation, the
model achieves robust performance with reliable accuracy. The approach highlights the
power of deep learning in image classification tasks and showcases the efficiency of using
Google Colab as a development environment for training and evaluation. Future work
could explore further optimization and the application of more complex architectures for
even higher performance.

Reference:
Google Colab Link:

https://colab.research.google.com/drive/1tp8z4wC8olSFZHYByPkjune6FRbw21Iv?usp=sharing
