Assignment_SQGAN
Harshyara Bukkapatnam
ENG21CS0085
7th Semester B
November 6, 2024
Sequence Networks and GAN
Prof. Arjun KrishnaMurthy

CNN for MNIST Handwritten Digit Classification
Dataset
MNIST is a widely used dataset for the handwritten digit classification task. It consists
of 70,000 labelled 28x28 pixel grayscale images of handwritten digits, split into 60,000
training images and 10,000 test images, with 10 classes (one for each of the 10 digits).
The task at hand is to train a model using the 60,000 training images and subsequently
test its classification accuracy on the 10,000 test images. The dataset is loaded through
Keras, a deep learning API written in Python that provides MNIST as a built-in dataset.
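For illustration, a minimal sketch of loading the dataset through Keras (the variable
names trainX, trainY, testX, and testY match those used later in this report):

from tensorflow.keras.datasets import mnist

# Load MNIST: 60,000 training and 10,000 test images, each a 28x28 grayscale array
(trainX, trainY), (testX, testY) = mnist.load_data()
print(trainX.shape, trainY.shape)  # (60000, 28, 28) (60000,)
print(testX.shape, testY.shape)    # (10000, 28, 28) (10000,)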
Model Methodology
The methodology for this project involves constructing and evaluating a convolutional
neural network (CNN) to classify handwritten digits from the MNIST dataset. The
process is structured as follows:
Data Preparation:
• Normalization: Pixel values, originally in the range [0, 255], are scaled to [0, 1]
to enhance convergence during training.
Model Architecture:
• CNN Structure: A CNN model is constructed using Sequential from
tensorflow.keras, with layers designed for feature extraction and classification
(detailed in the Model Architecture section below).
Evaluation:
• Learning Curves: Training and validation losses and accuracies are plotted for
each fold to visualize model learning and identify potential overfitting or
underfitting.
• Accuracy Summary: Final performance is summarized by calculating the mean
and standard deviation of accuracies across all folds, offering a consolidated view
of the model's generalization ability.
Development Environment
Google Colab was used as the development environment for this project, providing a
cloud-based Jupyter notebook interface with pre-installed libraries for deep learning,
such as TensorFlow and Keras. It enables seamless access to GPU acceleration,
enhancing model training efficiency on the MNIST dataset. Additionally, Colab's
collaborative features facilitate code sharing and documentation, streamlining the
development and testing process.
Principle
The principle behind this CNN model is to classify handwritten digits by progressively
learning spatial hierarchies of features through convolutional layers. The model
leverages convolution to capture local patterns, like edges and textures, which are
essential for recognizing digit shapes. Max pooling layers down-sample these features,
reducing computational complexity while preserving key information.
Regularization techniques such as dropout prevent overfitting by randomly deactivating
a fraction of neurons during training, enhancing generalization. Finally, the model uses
softmax activation to output class probabilities, enabling accurate digit classification.
Developing a Model
Importing Libraries
To develop the convolutional neural network (CNN) model for digit classification, we
first import essential libraries:
• NumPy: Used for efficient numerical operations, particularly matrix
manipulations, which are crucial in deep learning tasks.
• Matplotlib: A plotting library used to visualize learning curves and diagnostic
plots, aiding in model evaluation and performance analysis.
• Scikit-learn's KFold: Provides k-fold cross-validation to estimate model
performance by training and testing on different data splits.
• TensorFlow and Keras Modules:
o Datasets (MNIST): Loads the MNIST dataset, a widely used collection of
handwritten digits, ideal for testing classification models.
o Utils (to_categorical): Converts class labels to one-hot encoded format,
necessary for multi-class classification.
o Models (Sequential): Facilitates the construction of a layer-by-layer
neural network model.
o Layers (Conv2D, MaxPooling2D, Dense, Flatten): Composes the
CNN architecture, with Conv2D for feature extraction, MaxPooling2D for
down-sampling, Dense for fully connected layers, and Flatten for reshaping
data.
o Optimizers (SGD): Implements Stochastic Gradient Descent with
learning rate and momentum adjustments, optimizing model convergence.
These libraries collectively provide the tools to preprocess data, build the CNN model,
train with cross-validation, and evaluate performance.
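A sketch of the corresponding import block is shown below; Dropout and
BatchNormalization are added here because the architecture described later uses them,
although they are not part of the list above:

import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import KFold
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.layers import Dropout, BatchNormalization
from tensorflow.keras.optimizers import SGD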
Data Preparation
• Normalization: Pixel values, initially ranging from [0, 255], are normalized to
[0, 1] by dividing by 255.0. Normalization aids in stabilizing and accelerating the
training process, allowing the model to converge more efficiently by reducing
variations in pixel intensity.
Together with one-hot encoding the labels using to_categorical (described above), these
steps prepare the dataset for optimal performance within the CNN model, enhancing both
accuracy and training speed.
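A minimal sketch of this preparation step; the reshape to a single-channel tensor is an
assumption implied by the Conv2D input shape used later:

def prep_data(trainX, trainY, testX, testY):
    # Add the channel dimension expected by Conv2D: (samples, 28, 28, 1)
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # Scale pixel intensities from [0, 255] to [0, 1]
    trainX = trainX.astype('float32') / 255.0
    testX = testX.astype('float32') / 255.0
    # One-hot encode the 10 digit classes for categorical cross-entropy
    trainY = to_categorical(trainY, num_classes=10)
    testY = to_categorical(testY, num_classes=10)
    return trainX, trainY, testX, testY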
Model Architecture
The model is a convolutional neural network (CNN) designed to classify images of
handwritten digits from the MNIST dataset. The architecture is structured to
progressively capture features at multiple levels of abstraction through several key
layers:
• Convolutional Layers: The model begins with two convolutional layers. The
first layer has 32 filters, and the second layer has 64 filters, both with a 3x3 kernel
size and ReLU activation. These layers learn spatial features in the image, such as
edges and textures, crucial for digit recognition.
• Batch Normalization: Each convolutional layer is followed by batch
normalization to stabilize and accelerate training by normalizing the inputs to
each layer, which helps improve model accuracy.
• Max Pooling Layers: Max pooling layers follow each batch-normalized
convolutional layer to down-sample feature maps, reducing the computational
load and focusing on the most significant features.
• Dropout Layers: Dropout layers are included after each max pooling layer to
reduce overfitting. A dropout rate of 0.2 is applied after the first convolutional
block, and 0.3 after the second.
• Fully Connected Layers: After flattening the feature maps, the model includes
a dense layer with 100 units and ReLU activation to learn complex combinations
of features before the output layer.
• Output Layer: A dense output layer with 10 neurons and softmax activation
provides class probabilities, corresponding to the ten digit classes (0–9).
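The description above corresponds to a Sequential model along the following lines. This
is a sketch: the filter counts, dropout rates, dense width, and softmax output follow the
text, while details such as the 2x2 pool size are assumptions:

def define_model():
    model = Sequential([
        # Block 1: 32 filters, 3x3 kernel, ReLU; batch norm, pooling, 20% dropout
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.2),
        # Block 2: 64 filters, 3x3 kernel, ReLU; batch norm, pooling, 30% dropout
        Conv2D(64, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.3),
        # Classifier head: flatten, 100-unit dense layer, 10-way softmax output
        Flatten(),
        Dense(100, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    return model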
Model Compilation
The model is compiled with the following configurations:
• Optimizer: Stochastic Gradient Descent (SGD) with a learning rate of 0.01 and a
momentum of 0.9, which helps the model converge faster by incorporating
previous gradient information.
• Loss Function: Categorical cross-entropy is used as the loss function,
appropriate for multi-class classification tasks.
• Metrics: Model accuracy is tracked as the primary metric, providing a
straightforward evaluation of performance during training and validation.
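In code, this compilation step reads:

# SGD with the stated learning rate and momentum
model = define_model()
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])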
Evaluation Function
The evaluate_model function was implemented to automate this cross-validation
process. It initializes a KFold object, shuffling the dataset with a fixed random state for
reproducibility. Within each fold, the model is defined, trained for 10 epochs with a batch
size of 32, and evaluated on the validation fold. The function then appends the accuracy
score and training history of each fold to respective lists, scores and histories, for
performance analysis.
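A sketch of evaluate_model as described; the number of folds is not stated above, so the
common choice of 5 is assumed here:

def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = [], []
    # Shuffle with a fixed random state for reproducibility
    kfold = KFold(n_splits=n_folds, shuffle=True, random_state=1)
    for train_ix, val_ix in kfold.split(dataX):
        # Define and compile a fresh model for each fold
        model = define_model()
        model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
                      loss='categorical_crossentropy', metrics=['accuracy'])
        # Train for 10 epochs with batch size 32, validating on the held-out fold
        history = model.fit(dataX[train_ix], dataY[train_ix],
                            epochs=10, batch_size=32,
                            validation_data=(dataX[val_ix], dataY[val_ix]),
                            verbose=0)
        _, acc = model.evaluate(dataX[val_ix], dataY[val_ix], verbose=0)
        scores.append(acc)
        histories.append(history)
    return scores, histories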
Performance Summary
After training the model across multiple folds, we compute an overall performance
summary by calculating the mean and standard deviation of accuracy scores. This
evaluation offers a comprehensive view of the model’s effectiveness and consistency. A
boxplot visualizes the distribution of accuracy scores across folds, emphasizing the
model's generalization capability and performance stability. This summarization
provides a reliable measure of the model's robustness on unseen data.
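A minimal sketch of this summary step:

def summarize_performance(scores):
    # Mean and standard deviation of the per-fold accuracies
    print('Accuracy: mean=%.3f%% std=%.3f%% (n=%d)'
          % (np.mean(scores) * 100, np.std(scores) * 100, len(scores)))
    # Boxplot of the accuracy distribution across folds
    plt.boxplot(scores)
    plt.title('Cross-validation accuracy per fold')
    plt.show()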
Final Model Training
In this step, the model is trained on the entire training dataset using the previously
defined CNN architecture. The training process consists of fitting the model to the trainX
and trainY data for 10 epochs, with a batch size of 32. During training, the model adjusts
its weights to minimize the loss function and improve its ability to classify handwritten
digits from the MNIST dataset.
Model Saving
After training, the model is saved as final_model.h5 using the model.save() function.
This allows the trained model to be easily loaded and used for inference or further
evaluation without needing to retrain. The saved model captures the learned parameters,
ensuring reproducibility and facilitating deployment in real-world applications.
Running the Final Model
The final model is trained and saved by invoking the run_final_model() function, which
handles the complete training process and model storage.
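A hedged sketch of run_final_model(), combining the training and saving steps described
above:

def run_final_model():
    # Load and prepare the full dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    trainX, trainY, testX, testY = prep_data(trainX, trainY, testX, testY)
    # Train on all 60,000 images for 10 epochs with batch size 32
    model = define_model()
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=1)
    # Persist the learned parameters for later inference
    model.save('final_model.h5')

run_final_model()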
Execution
Loading and Evaluating the Final Model
To assess the performance of the trained model, the final version is loaded using
TensorFlow's load_model function. The model, saved as 'final_model.h5', is then
evaluated on the test dataset (testX, testY) to gauge its accuracy. This step provides a
final validation of the model's ability to generalize on unseen data, with the test accuracy
printed as the output.
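A minimal sketch, assuming testX and testY have been prepared as described earlier:

from tensorflow.keras.models import load_model

model = load_model('final_model.h5')
_, acc = model.evaluate(testX, testY, verbose=0)
print('Test accuracy: %.3f%%' % (acc * 100))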
Image Loading and Preprocessing
The function load_and_prep_image takes an image file as input and preprocesses it for
prediction. The image is loaded in grayscale with a target size of 28x28 pixels, consistent
with the MNIST dataset. The image is then converted to an array and reshaped into a
format suitable for the CNN model (a single sample with one channel). Pixel values are
normalized to the range [0,1] to match the preprocessing done during model training.
Finally, the pre-processed image is displayed for verification, and debugging information
is printed to ensure correct formatting.
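A sketch of load_and_prep_image, assuming the Keras load_img and img_to_array
utilities:

from tensorflow.keras.preprocessing.image import load_img, img_to_array

def load_and_prep_image(filename):
    # Load in grayscale at 28x28 pixels to match the MNIST format
    img = load_img(filename, color_mode='grayscale', target_size=(28, 28))
    img = img_to_array(img)
    # Reshape to a single sample with one channel and normalize to [0, 1]
    img = img.reshape(1, 28, 28, 1).astype('float32') / 255.0
    # Display the image for verification and print debugging information
    plt.imshow(img[0, :, :, 0], cmap='gray')
    plt.show()
    print('Prepared image shape:', img.shape, 'value range:', img.min(), '-', img.max())
    return img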
Digit Prediction
The predict_digit function loads the pre-processed image and passes it through the
trained CNN model. The model predicts the class (digit) by outputting a probability
distribution over all 10 classes. The predicted digit is identified by selecting the class with
the highest probability using np.argmax(). The predicted digit along with its confidence
score (probability distribution) is printed to the console for evaluation.
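A minimal sketch of predict_digit under the same assumptions:

def predict_digit(filename, model):
    img = load_and_prep_image(filename)
    # Probability distribution over the 10 digit classes
    probs = model.predict(img)[0]
    digit = int(np.argmax(probs))
    print('Predicted digit:', digit)
    print('Class probabilities:', probs)
    return digit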
Reference:
Google Colab Link: https://colab.research.google.com/drive/1tp8z4wC8olSFZHYByPkjune6FRbw21Iv?usp=sharing