
ANN Notes

An Artificial Neural Network (ANN) is a computer system designed to mimic the way the human
brain processes information. Just like our brains have billions of neurons that work together to
help us think and learn, an ANN has "artificial neurons" (also called nodes or units) that help it
recognize patterns and make decisions based on data.

Key Concepts of an ANN:


1. Neuron: In an ANN, a neuron is the basic building block. Each neuron receives information,
processes it, and then passes it on. In an ANN diagram, these neurons are often represented by
circles.

2. Layers:
- Input Layer: This is where data first enters the network. Each neuron in this layer represents
one feature of the data (like the pixel values in an image or scores in a survey).
- Hidden Layers: These are in the middle and do the actual processing. The neurons in these
layers detect patterns or features by performing calculations on the inputs they receive.
- Output Layer: This is the final layer, and it produces the result, like identifying if an image is
of a cat or a dog. Each neuron in the output layer represents a possible answer.

3. Weights and Biases:
- Weights: Each connection between neurons has a weight, which determines how much
influence one neuron has on the next. Adjusting weights is how an ANN "learns."
- Bias: Biases help fine-tune the network's outputs, allowing it to make better predictions.

4. Activation Functions: These are formulas applied to each neuron's output to help decide if
the neuron should "fire" (send its result to the next layer) or stay inactive. Common activation
functions include ReLU (Rectified Linear Unit), which only passes positive values, and Sigmoid,
which gives a result between 0 and 1.

5. Learning:
- ANNs learn by adjusting their weights and biases through a process called training. Training usually relies on backpropagation, in which the network's errors are propagated backward to correct it. By showing the network thousands or even millions of examples and updating the weights with an algorithm called gradient descent, it gradually improves (a small numerical sketch follows this list).
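To make weights, biases, activations, and gradient descent concrete, here is a tiny NumPy sketch of a single neuron and one weight update; all of the numbers are made up purely for illustration:

```python
import numpy as np

# A single neuron: weighted sum of its inputs plus a bias, then an activation.
x = np.array([0.5, -1.0, 2.0])  # three input features (illustrative values)
w = np.array([0.4, 0.1, -0.3])  # one weight per input connection
b = 0.2                         # bias term

z = np.dot(w, x) + b                # pre-activation: w . x + b = -0.3
relu_out = max(0.0, z)              # ReLU passes only positive values -> 0.0
sigmoid_out = 1 / (1 + np.exp(-z))  # Sigmoid squashes z into (0, 1)

# One gradient-descent step: move the weights against the loss gradient.
learning_rate = 0.01
grad_w = np.array([0.05, -0.02, 0.10])  # hypothetical gradient of the loss
w = w - learning_rate * grad_w
```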

Example:
Imagine teaching a neural network to identify cats in pictures. During training, you would show it
thousands of labeled images (some with cats, some without). The network will adjust its weights
and biases to minimize errors, learning which features (like pointy ears or fur patterns) signal the
presence of a cat.

Why ANNs are Useful


ANNs are great at recognizing patterns in complex data, like images, sounds, and text. They’re
widely used in technologies we use daily, such as speech recognition (like Siri or Google
Assistant), self-driving cars, and medical diagnosis tools.

To implement an Artificial Neural Network (ANN) in TensorFlow without using Keras, we’ll use
TensorFlow’s lower-level API to manually define the architecture, forward pass, and training
loop. Here’s how you can do it using the MNIST dataset:

The MNIST dataset (Modified National Institute of Standards and Technology) is a well-known
dataset in machine learning and computer vision, often used as a benchmark for evaluating
image classification algorithms. Here’s an overview of what it contains and how it's used:

Overview of MNIST Dataset

1. Contents:
○ The dataset consists of 70,000 images of hand-written digits (0-9).
○ It’s split into 60,000 training images and 10,000 test images.
○ Each image is 28x28 pixels in grayscale, making each image a 784-pixel feature
vector (28 × 28 = 784).
2. Classes:
○ There are 10 classes, one for each digit (0 through 9).
○ Each image is labeled with the corresponding digit it represents, which is the
ground truth label.
3. Why MNIST is Popular:
○ Simplicity: Since it’s grayscale and low resolution, MNIST is computationally
inexpensive, allowing for quick experimentation.
○ Consistency: It’s used as a standardized benchmark across many models,
making it easier to compare results.
○ Starter Dataset: MNIST is often considered a “Hello World” dataset for image
recognition and deep learning, serving as an introductory task for new learners in
machine learning.
4. Applications in Learning:
○ Classification Models: Commonly used to teach models like logistic regression,
support vector machines, neural networks, and convolutional neural networks
(CNNs).
○ Image Preprocessing: MNIST helps in practicing techniques like normalization,
scaling, and reshaping images for neural networks (a short example follows this list).
○ Evaluation Metrics: Used to illustrate model performance metrics, including
accuracy, precision, recall, and confusion matrices.
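
As a quick illustration of that preprocessing (using the common `tf.keras.datasets` loader), each 28x28 image can be flattened into a 784-element vector and its pixel values scaled from [0, 255] down to [0, 1]:

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images, each 28x28 grayscale.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape)  # (60000, 28, 28), pixel values 0-255

# Flatten each image to a 784-element vector and scale pixels to [0, 1].
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
print(x_train.shape)  # (60000, 784)
```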

Steps:

1. Load and preprocess the data
2. Define the model architecture and weights
3. Implement the forward pass (model prediction)
4. Define the loss function and optimizer
5. Train the model using a custom training loop

Here’s a complete code example:


Explanation of the Code

- Class Definition for the Model: We define an `ANNModel` class to hold the weights, biases, and the forward-pass method.
- Forward Pass: This method computes the network's output for a given input by passing the data through two hidden layers with ReLU activations, followed by an output layer whose softmax is applied during evaluation.
- Loss Function: We use sparse categorical cross-entropy, which is suitable for integer labels.
- Training Step: Each training step computes gradients and applies them to update the model's weights.
- Custom Training Loop: For each epoch, we loop through batches of data and perform a training step on each batch.
- Evaluation: Computes model accuracy on the test set by comparing predicted labels to the true labels.
This code runs in TensorFlow without the Keras layer API, and it provides more control over each part of the ANN, making it suitable for lower-level neural network experimentation.

Key TensorFlow Concepts Explained

Tensors

Tensors are the core data structures in TensorFlow. They are multidimensional arrays, similar to
NumPy arrays, but optimized for performance in machine learning tasks.
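
For instance, a few tensors of different ranks (a toy illustration):

```python
import tensorflow as tf

scalar = tf.constant(3.0)                       # rank-0 tensor
vector = tf.constant([1.0, 2.0, 3.0])           # rank-1, shape (3,)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank-2, shape (2, 2)
print(matrix.shape, matrix.dtype)  # (2, 2) <dtype: 'float32'>
```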

Variables

Variables in TensorFlow are mutable tensors that are used to store model parameters (weights
and biases). They can be updated during training.
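
A minimal illustration of a variable being updated in place:

```python
import tensorflow as tf

b = tf.Variable(tf.zeros([3]))              # mutable; suitable for parameters
b.assign_add(tf.constant([0.1, 0.2, 0.3]))  # in-place update, as in training
print(b.numpy())  # [0.1 0.2 0.3]
```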

Operations

Operations are functions that manipulate tensors. Common operations include matrix
multiplication, addition, and activation functions.
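
For example, a matrix multiply, an add, and a ReLU chained together:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])          # 1x2 input
W = tf.constant([[0.5], [-1.0]])       # 2x1 weights
y = tf.nn.relu(tf.matmul(x, W) + 1.0)  # (1*0.5 + 2*(-1.0)) + 1.0 = -0.5
print(y.numpy())  # [[0.]] -- ReLU clips the negative value to zero
```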

Gradient Tape

The `tf.GradientTape` context manager records operations for automatic differentiation.
This allows TensorFlow to compute gradients, which are essential for optimizing the model
during training.
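
A toy example of recording an operation and computing its gradient:

```python
import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w * w  # a toy loss: w^2

grad = tape.gradient(loss, w)  # d(w^2)/dw = 2w
print(grad.numpy())  # 6.0
```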

Activation Functions

Activation functions introduce non-linearity into the model, allowing it to learn complex patterns.
ReLU (Rectified Linear Unit) is commonly used due to its simplicity and effectiveness.
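
For instance, applied element-wise to the same values:

```python
import tensorflow as tf

z = tf.constant([-2.0, 0.0, 3.0])
print(tf.nn.relu(z).numpy())  # [0. 0. 3.] -- negatives clipped to zero
print(tf.sigmoid(z).numpy())  # each value squashed into (0, 1)
```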

Loss Functions

Loss functions measure how well the model's predictions match the actual labels. They are
crucial for guiding the optimization process.
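
A small illustration using the sparse categorical cross-entropy from the example above (integer label, raw logits):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])  # one sample, three classes
labels = tf.constant([0])                 # true class index
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                      logits=logits)
print(loss.numpy())  # small, since class 0 already has the largest logit
```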

Optimizers

Optimizers update the model's parameters based on the computed gradients. Adam is a popular
choice due to its adaptive learning rate, which helps achieve faster convergence.
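
A single Adam update on a toy parameter, for illustration:

```python
import tensorflow as tf

w = tf.Variable(3.0)
optimizer = tf.optimizers.Adam(learning_rate=0.1)
with tf.GradientTape() as tape:
    loss = w * w
grads = tape.gradient(loss, [w])
optimizer.apply_gradients(zip(grads, [w]))
print(w.numpy())  # nudged from 3.0 toward the minimum at 0
```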
