
Introduction to Neural Networks

A neural network is a computational system inspired by the human brain, designed to recognize patterns and relationships in data. It is a foundational technology in machine learning and deep learning, frequently used in image recognition, natural language processing, and predictive analytics.

Key Components of a Neural Network

1. Input Layer:

o Receives input features for processing.

2. Hidden Layers:

o Intermediate layers that transform inputs into something the output layer can
use.

3. Output Layer:

o Produces the final prediction or classification.

4. Weights and Biases:

o Weights: Influence the strength of the connection between neurons.

o Bias: Allows shifting the activation function to better fit the data.

5. Activation Functions:

o Introduce non-linearity to the model, enabling it to learn complex patterns.

o Common functions: Sigmoid, ReLU, Tanh, Softmax (see the single-neuron sketch below).
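
To make these components concrete, here is a minimal sketch of a single neuron's forward pass in NumPy. The input values, weights, and bias are illustrative assumptions, not taken from the example that follows.

python

import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

# Illustrative values (assumed for this sketch)
x = np.array([0.5, 0.8])   # input features (input layer)
w = np.array([0.2, 0.4])   # weights on the connections
b = 0.1                    # bias term

# Weighted sum followed by a non-linear activation
z = np.dot(w, x) + b       # z = w1*x1 + w2*x2 + b
a = sigmoid(z)             # neuron output passed to the next layer
print(f"Weighted sum z = {z:.3f}, activation a = {a:.3f}")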

Working Example: Neural Network to Classify Handwritten Digits

We'll build a simple neural network using the popular Keras library to classify handwritten
digits from the MNIST dataset.

Step-by-Step Process

1. Load and preprocess data.

2. Define the neural network architecture.

3. Compile the model.

4. Train the model.


5. Evaluate and visualize performance.

Sample Code:

python

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to the range [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# One-hot encode target labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Define the neural network architecture
model = Sequential([
    Flatten(input_shape=(28, 28)),   # Flatten the 2D images into 1D vectors
    Dense(128, activation='relu'),   # Hidden layer with 128 neurons
    Dense(10, activation='softmax')  # Output layer for 10 classes (digits 0-9)
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")

# Plot training accuracy and loss
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.legend()
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

Model Architecture Diagram

The diagram for this neural network structure:

1. Input Layer: Accepts 28x28 grayscale images.

2. Hidden Layer: 128 neurons with ReLU activation.

3. Output Layer: 10 neurons with softmax activation.

Expected Output

• A plot showing Training Accuracy/Loss and Validation Accuracy/Loss over epochs.

• Printed test accuracy value.

• Visual representation of the architecture and activation progression.

Generating the Architecture Diagram

The following code draws a schematic diagram of this architecture using Matplotlib:

python
import matplotlib.pyplot as plt
import numpy as np

# Generate architecture diagram for the neural network example
fig, ax = plt.subplots(figsize=(10, 6))

# Define layer sizes, labels, and colors
layer_sizes = [784, 128, 10]  # Input (784 nodes), Hidden (128 nodes), Output (10 nodes)
layer_labels = ["Input Layer\n(Flattened 28x28)", "Hidden Layer\n128 Neurons (ReLU)",
                "Output Layer\n10 Classes (Softmax)"]
layer_colors = ["lightblue", "lightgreen", "salmon"]
spacing = 3

# Plot nodes for each layer
for i, (size, label, color) in enumerate(zip(layer_sizes, layer_labels, layer_colors)):
    x_pos = spacing * i
    y_positions = np.linspace(-1, 1, size)
    ax.scatter([x_pos] * size, y_positions, color=color, label=label, alpha=0.6,
               edgecolors='black')
    ax.text(x_pos, 1.2, label, ha='center', fontsize=10,
            bbox=dict(facecolor='white', alpha=0.6, edgecolor='black'))

# Example arrow connections between layers
ax.annotate('', xy=(spacing, 0.9), xytext=(0, 0.9), arrowprops=dict(arrowstyle="->", lw=1.5))
ax.annotate('', xy=(2 * spacing, 0.8), xytext=(spacing, 0.8), arrowprops=dict(arrowstyle="->", lw=1.5))

# Annotation explaining what the connections represent
ax.text(spacing, 0.5, "Weights & Bias Adjustments", fontsize=9, ha='center',
        bbox=dict(facecolor='white', alpha=0.7))

# Aesthetic adjustments
ax.set_xlim(-1, 2 * spacing + 1)
ax.set_ylim(-1.5, 1.5)
ax.axis('off')

plt.title("Neural Network Architecture for MNIST Classification", fontsize=12)
plt.show()


The resulting diagram is a graphical representation of the simple neural network architecture used for classifying the MNIST dataset:
• Input Layer: Takes in 784 features (flattened 28x28 pixel grid for grayscale images).

• Hidden Layer: 128 neurons with ReLU activation function.

• Output Layer: 10 neurons representing digits (0-9) with Softmax activation function.

Weights and biases are adjusted during training to minimize prediction error. Arrows
represent connections with trainable weights.

The following step-by-step example demonstrates the complete cycle of a neural network built using TensorFlow and Keras to classify handwritten digits (MNIST dataset).

Neural Network Example: Classifying Handwritten Digits

1. Import Libraries

python

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

2. Load and Preprocess Data

python

# Load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize pixel values to be between 0 and 1
X_train = X_train / 255.0
X_test = X_test / 255.0

# One-hot encode target labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

3. Define the Neural Network Model

python

# Build the neural network
model = Sequential([
    Flatten(input_shape=(28, 28)),   # Flatten input image to 1D vector
    Dense(128, activation='relu'),   # Hidden layer with 128 neurons and ReLU activation
    Dense(10, activation='softmax')  # Output layer for 10 classes (digits 0-9)
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

4. Train the Model

python

# Train the model on training data
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

5. Evaluate the Model


python

# Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.2f}")

6. Make Predictions

python

# Make predictions on a sample test image
sample_image = X_test[0]
predicted_label = np.argmax(model.predict(sample_image.reshape(1, 28, 28)), axis=1)
actual_label = np.argmax(y_test[0])
print(f"Predicted Label: {predicted_label[0]}")
print(f"Actual Label: {actual_label}")

7. Visualize Sample Prediction

python

plt.imshow(sample_image, cmap='gray')
plt.title(f"Predicted: {predicted_label[0]}, Actual: {actual_label}")
plt.axis('off')
plt.show()

Output Explanation

• Input Layer: Processes 28x28 images flattened into 784 features.

• Hidden Layer: Detects patterns in pixel values using 128 neurons.


• Output Layer: Provides probabilities for each of the 10 possible digits.

Example Diagram Representation

The network works by gradually adjusting weights and biases through backpropagation to
predict correct labels for unseen data.

Introduction to Neural Networks

A neural network is a computational model inspired by the way biological neural networks in
the human brain function. It consists of layers of interconnected "neurons" (nodes) that
process input data and adjust internal parameters (weights and biases) to learn patterns.

Key Components of a Neural Network

1. Input Layer:

o Accepts input features from data (e.g., pixel values in images).

o Each neuron represents one input feature.

2. Hidden Layers:

o Perform computations to learn patterns in the data.

o Can have multiple hidden layers, with each layer learning increasingly
complex features.

3. Output Layer:

o Provides the final predictions or classifications.

o Number of neurons in this layer matches the number of possible output classes.

4. Weights:

o Parameters that connect neurons between layers.

o Adjusted during training to minimize error.

5. Biases:

o Constants added to the weighted sum of inputs before applying the activation
function.

o Help the model shift the activation function and better fit the data.

6. Activation Functions:
o Introduce non-linearity into the network, allowing it to learn complex
patterns.

o Common activation functions include:

Types of Activation Functions

Activation Function | Mathematical Form | Usage | Graph Characteristics
Sigmoid | f(x) = 1 / (1 + e^{-x}) | Binary classification | Smooth curve from 0 to 1
Tanh (Hyperbolic Tangent) | f(x) = (e^x - e^{-x}) / (e^x + e^{-x}) | Outputs between -1 and 1 | Centered at 0, smoother transitions
ReLU (Rectified Linear Unit) | f(x) = max(0, x) | Hidden layers in deep learning | Linear for positive values, zero for negative
Leaky ReLU | f(x) = x if x > 0; otherwise f(x) = αx | Addresses the vanishing gradient | Allows a small negative slope
Softmax | f(x_i) = e^{x_i} / Σ_j e^{x_j} | Multi-class classification | Outputs probabilities for each class
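
To complement the table, the short sketch below defines these activation functions in NumPy and plots four of them over a small range. It is an illustrative sketch, not part of the original MNIST example.

python

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope (alpha) instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    # Subtract the max for numerical stability; outputs sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.linspace(-5, 5, 200)
for name, f in [("Sigmoid", sigmoid), ("Tanh", tanh), ("ReLU", relu), ("Leaky ReLU", leaky_relu)]:
    plt.plot(x, f(x), label=name)
plt.legend()
plt.title("Common Activation Functions")
plt.show()

print("Softmax of [1, 2, 3]:", softmax(np.array([1.0, 2.0, 3.0])))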

Neural Network Example: Types of Functions

1. Input Layer:

o Inputs: Age, Income, Student Status, Credit Rating

o Example Inputs: [25, "High", "Yes", "Fair"]

2. Hidden Layer 1:

o 128 neurons with ReLU activation function.

o Weights determine how much each input contributes to the output.

3. Hidden Layer 2:

o 64 neurons with Tanh activation function.


o Biases shift the activation to better capture patterns in complex data.

4. Output Layer:

o 10 neurons for digit classification with Softmax activation (a sketch of this layer stack follows).
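
A minimal Keras sketch of the layer stack described above is shown below. It assumes the four inputs (Age, Income, Student Status, Credit Rating) have already been encoded as numeric features; the encoding step itself is not shown.

python

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Sketch of the layer stack described above (inputs assumed to be numerically encoded)
model = Sequential([
    Dense(128, activation='relu', input_dim=4),  # Hidden Layer 1: 128 neurons, ReLU
    Dense(64, activation='tanh'),                # Hidden Layer 2: 64 neurons, Tanh
    Dense(10, activation='softmax')              # Output Layer: 10 neurons, Softmax
])
model.summary()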

Training Process

1. Forward Propagation:

o Inputs pass through each layer, activating neurons and generating outputs.

2. Loss Calculation:

o Compares predicted outputs with actual labels.

3. Backward Propagation:

o Adjusts weights and biases to reduce error using gradients.

4. Optimization:

o Techniques like Stochastic Gradient Descent (SGD) minimize the loss (a single training step is sketched below).
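
To tie these four stages together, here is a minimal NumPy sketch of one training step for a single sigmoid neuron. The input, label, initial parameters, and learning rate are illustrative assumptions.

python

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative values (assumed)
x = np.array([0.6, 0.4])   # input features
y = 1.0                    # true label
w = np.array([0.1, -0.2])  # initial weights
b = 0.0                    # initial bias
lr = 0.1                   # learning rate

# 1. Forward propagation
z = np.dot(w, x) + b
a = sigmoid(z)             # prediction

# 2. Loss calculation (squared error for simplicity)
loss = (a - y) ** 2

# 3. Backward propagation (chain rule: dLoss/dw = dLoss/da * da/dz * dz/dw)
dloss_da = 2 * (a - y)
da_dz = a * (1 - a)
dw = dloss_da * da_dz * x
db = dloss_da * da_dz

# 4. Optimization (gradient descent update)
w -= lr * dw
b -= lr * db
print(f"loss={loss:.4f}, updated w={w}, updated b={b:.4f}")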

Types of Neural Networks

1. Feedforward Neural Networks (FNNs):

o Data flows in one direction (no loops).

2. Convolutional Neural Networks (CNNs):

o Best suited for image processing.

3. Recurrent Neural Networks (RNNs):

o Captures temporal dependencies in sequence data.

4. Generative Adversarial Networks (GANs):

o Generates new data samples similar to the input dataset (minimal Keras layer stacks for CNNs and RNNs are sketched below).
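
For illustration, the sketch below shows minimal Keras layer stacks for a CNN and an RNN. The layer sizes and input shapes are assumptions chosen only to show the general shape of each architecture, not tuned models.

python

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, SimpleRNN

# Minimal CNN: convolution + pooling extract spatial features from 28x28 grayscale images
cnn = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax')
])

# Minimal RNN: a recurrent layer captures temporal dependencies in sequences
# (assumed input: sequences of 20 time steps with 1 feature each)
rnn = Sequential([
    SimpleRNN(32, input_shape=(20, 1)),
    Dense(1, activation='sigmoid')
])

cnn.summary()
rnn.summary()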

Visualization Concepts for Neural Networks

To understand how a neural network processes and learns from data, visual representations
are essential. These visuals help explain the layers, activations, weights, and learning
process.

1. Neural Network Architecture Diagram


• Input Layer: Circles represent input features (e.g., numerical data, image pixels).

• Hidden Layers: Multiple layers of circles representing neurons connected by lines showing weights.

• Output Layer: Final layer predicting the result (classification or regression).

Example Diagram Components

• Color-coded nodes indicate activated neurons.

• Edge thickness represents the magnitude of weights.

• Labels showing activation functions used in each layer.

2. Forward Propagation Flow

• Visualizing how input values propagate through the network:

o Inputs are multiplied by weights and added to biases.

o Activation functions transform the results.

o Outputs flow to the next layer until the final prediction.

3. Activation Functions Plots

Function | Graph Description
Sigmoid | Smooth curve between 0 and 1, resembling an "S." Useful for binary classification.
Tanh | Curve oscillating between -1 and 1, centered around 0.
ReLU | Straight line for positive inputs and flat at zero for negatives. Efficient for deep learning models.
Softmax | A graph showing probabilities assigned to multiple classes, summing to 1.

4. Loss Function Graphs

• Loss vs. Iteration Plot: Shows how the loss reduces as the network trains over
epochs.

• Types of Loss Functions:

o Mean Squared Error (MSE) for regression.


o Cross-Entropy Loss for classification (both are sketched in code below).
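
A short NumPy sketch of these two loss functions follows; the example arrays are illustrative assumptions.

python

import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average squared difference (regression)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for binary classification; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.mean(-y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred))

# Illustrative values (assumed)
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0.1, 0.3, 0.8, 0.9])
print("MSE:", mse(y_true, y_pred))
print("Binary cross-entropy:", binary_cross_entropy(y_true, y_pred))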

5. Weight Updates (Backward Propagation) Visualization

• Diagrams depicting how gradients are calculated and how weights are adjusted.

• Heatmaps to show weight magnitude changes during training.

6. Cluster Visualization for Classification Problems

• 2D or 3D plots:

o Visualize how decision boundaries separate classes after training.

o Points represent data samples, color-coded by class.

7. Convolutional Neural Network (CNN) Visualization

• Feature Maps: Show what each convolutional layer learns.

• Filters highlight edges, textures, or specific features from the input image.

8. Recurrent Neural Network (RNN) Visualization

• Graphs show how hidden states capture temporal dependencies:

o Arrows loop back to previous layers.

o Time-series plots demonstrate sequence modeling.

9. Generative Adversarial Networks (GANs)

• Visualizing how the generator creates fake samples and how the discriminator
distinguishes real from fake.

• Progress over epochs can be shown in image generation tasks (e.g., face generation).

Below is an approach to visually representing neural networks using Python and relevant plotting techniques. This example walks through key concepts and provides visual outputs to help illustrate components like layers, weights, activations, and outputs.

Example Neural Network Visualization

We’ll build a simple neural network and visualize key components.

Neural Network Structure

• Input Layer: Two input features.

• Hidden Layer: Two neurons.

• Output Layer: One output neuron.

Python Code for Visualization

python

import numpy as np
import matplotlib.pyplot as plt

# Define activation function (Sigmoid for example)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Visualization function
def visualize_neural_network(input_values, weights_hidden, weights_output,
                             biases_hidden, biases_output):
    fig, ax = plt.subplots(figsize=(8, 6))

    # Positions of neurons in each layer
    input_layer_pos = [(0, 1), (0, 3)]
    hidden_layer_pos = [(2, 0.5), (2, 3.5)]
    output_layer_pos = [(4, 2)]

    # Draw neurons
    for pos in input_layer_pos:
        ax.add_patch(plt.Circle(pos, 0.2, color='lightblue', ec='black'))
    for pos in hidden_layer_pos:
        ax.add_patch(plt.Circle(pos, 0.2, color='lightgreen', ec='black'))
    for pos in output_layer_pos:
        ax.add_patch(plt.Circle(pos, 0.2, color='salmon', ec='black'))

    # Draw connections (dashed lines represent weighted connections)
    for in_pos in input_layer_pos:
        for h_pos in hidden_layer_pos:
            ax.plot([in_pos[0], h_pos[0]], [in_pos[1], h_pos[1]], 'gray', linestyle='dashed')
    for h_pos in hidden_layer_pos:
        for out_pos in output_layer_pos:
            ax.plot([h_pos[0], out_pos[0]], [h_pos[1], out_pos[1]], 'gray', linestyle='dashed')

    # Layer labels and appearance
    plt.text(0, -0.5, "Input Layer", ha='center', fontsize=12, weight='bold')
    plt.text(2, -0.5, "Hidden Layer", ha='center', fontsize=12, weight='bold')
    plt.text(4, -0.5, "Output Layer", ha='center', fontsize=12, weight='bold')
    ax.axis('off')
    plt.title("Simple Neural Network Visualization")
    plt.show()

# Simulate network weights and biases
input_values = np.array([0.5, 0.8])
weights_hidden = np.array([[0.2, 0.4], [0.3, 0.9]])
weights_output = np.array([0.6, 0.8])
biases_hidden = np.array([0.1, 0.2])
biases_output = np.array([0.05])

# Call visualization function
visualize_neural_network(input_values, weights_hidden, weights_output, biases_hidden,
                         biases_output)

What This Visualization Shows:

• Input Layer (Blue): Represents the input data features.

• Hidden Layer (Green): Processes information and applies weights, biases, and
activations.

• Output Layer (Red): Produces the final prediction.

• Connections: The gray dashed lines represent weights between neurons in adjacent
layers.


The visualization represents a simple neural network architecture:

• Input Layer: Contains two input neurons (representing the input features).

• Hidden Layer: Includes two neurons in the hidden layer, connected to the input layer
by dashed lines.

• Output Layer: Contains a single neuron representing the network's output.


This visualization helps illustrate how neurons across layers connect to form the neural network structure. The weights and biases, though not explicitly shown here, affect how each neuron processes the input to generate output predictions in practice.

Mathematics Behind Artificial Neural Networks (ANN)

To understand the mathematical operations within an ANN, key components like cost
functions, derivatives, chain rule, and gradient descent play a vital role.

1. Cost Function

The cost function quantifies the error between the predicted output and the actual output.
For regression problems, a common choice is the Mean Squared Error (MSE):

J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \left( h_W(x^{(i)}) - y^{(i)} \right)^2

Where:

• J(W, b): Cost function.

• h_W(x^{(i)}): Predicted value for input x^{(i)} using weights W and biases b.

• y^{(i)}: Actual output for input x^{(i)}.

• m: Number of training samples.
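
As a small worked example (with assumed numbers, not from the original text): suppose m = 2, the predictions are h_W(x^{(1)}) = 0.9 and h_W(x^{(2)}) = 0.2, and the actual outputs are y^{(1)} = 1 and y^{(2)} = 0. Then

J(W, b) = \frac{1}{2}\left[ (0.9 - 1)^2 + (0.2 - 0)^2 \right] = \frac{1}{2}(0.01 + 0.04) = 0.025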

2. Activation Functions and Their Derivatives

Activation functions introduce non-linearity into the network. Common activation functions
include:

Sigmoid Activation Function

\sigma(z) = \frac{1}{1 + e^{-z}}

• Derivative: \sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))

ReLU (Rectified Linear Unit)

f(z) = \max(0, z)

• Derivative: f'(z) = 1 if z > 0; otherwise f'(z) = 0

3. Backpropagation Using Chain Rule

Backpropagation computes gradients for the weights and biases by applying the chain rule
of calculus.

For a simple network:

1. Forward Pass: Compute predictions using weights and biases.

2. Backward Pass: Calculate gradients of the cost function with respect to weights and
biases.

If the output a^{[L]} is a function of the weights W^{[L]} and input x, then by the chain rule:

\frac{\partial J}{\partial W^{[L]}} = \frac{\partial J}{\partial a^{[L]}} \cdot \frac{\partial a^{[L]}}{\partial z^{[L]}} \cdot \frac{\partial z^{[L]}}{\partial W^{[L]}}

4. Gradient Descent Algorithm

Gradient descent updates weights and biases to minimize the cost function:

W := W - \alpha \frac{\partial J}{\partial W}

b := b - \alpha \frac{\partial J}{\partial b}

Where:

• \alpha is the learning rate.

• \frac{\partial J}{\partial W} and \frac{\partial J}{\partial b} are the gradients of the cost with respect to the weights and biases.
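
The update rule can be illustrated on a toy one-dimensional problem. The cost function f(w) = (w - 3)^2, the starting point, and the learning rate below are assumptions chosen only to show the mechanics of the update.

python

# Toy cost function and its gradient (assumed for illustration)
def cost(w):
    return (w - 3) ** 2

def gradient(w):
    return 2 * (w - 3)

w = 0.0        # initial parameter
alpha = 0.1    # learning rate
for step in range(25):
    w -= alpha * gradient(w)   # W := W - alpha * dJ/dW

print(f"w after 25 steps: {w:.4f} (the cost is minimized at w = 3)")

Each step moves w a fraction of the way toward the minimum, which is exactly what the weight and bias updates do in a neural network, only in many dimensions at once.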

Example Dataset: Binary Classification

Feature 1 | Feature 2 | Label
0.2 | 0.8 | 0
0.4 | 0.6 | 0
0.6 | 0.4 | 1
0.8 | 0.2 | 1

Problem Example

1. Input: Features x_1 and x_2

2. Target Output: y (either 0 or 1)

3. Initialization: Random weights W and biases b

Step-by-Step Process

1. Forward Pass: Compute the weighted sum:

z = W_1 \cdot x_1 + W_2 \cdot x_2 + b

Apply the activation function (e.g., Sigmoid):

a = \sigma(z)

2. Compute Cost Function: Calculate the error using MSE or binary cross-entropy.

3. Backward Pass: Compute gradients using the chain rule.

4. Update Parameters: Apply gradient descent to update weights and biases.

This process continues iteratively until the cost function converges to a minimum, allowing
the network to learn patterns in the data and make accurate predictions.

The following is a detailed Python implementation and step-by-step walkthrough of training a simple neural network for binary classification, using a dataset similar to the one provided above.

Problem Statement

We will use a simple dataset where each point has two features and a binary target (either 0
or 1). The goal is to train a neural network to classify these points correctly using
backpropagation and gradient descent.

Step-by-Step Python Example

1. Dataset Preparation

python

import numpy as np
import matplotlib.pyplot as plt

# Simple dataset: 4 points with two features and binary labels
X = np.array([[0.2, 0.8],
              [0.4, 0.6],
              [0.6, 0.4],
              [0.8, 0.2]])
y = np.array([[0], [0], [1], [1]])

# Plot dataset
plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='viridis')
plt.title("Training Data")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

2. Neural Network Functions

1. Activation Function (Sigmoid) and Its Derivative

python

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(a):
    # Note: expects the sigmoid output a = sigmoid(z), since sigma'(z) = a * (1 - a)
    return a * (1 - a)

3. Training the Neural Network


python

# Initialize weights and biases randomly
np.random.seed(1)
weights = np.random.randn(2, 1)
bias = np.random.randn(1)
learning_rate = 0.1

# Number of iterations for training
epochs = 10000

# Training loop
for epoch in range(epochs):
    # Forward propagation
    z = np.dot(X, weights) + bias
    predictions = sigmoid(z)

    # Compute the cost (binary cross-entropy loss)
    cost = np.mean(-y * np.log(predictions) - (1 - y) * np.log(1 - predictions))

    # Backpropagation
    error = predictions - y
    gradients = np.dot(X.T, error * sigmoid_derivative(predictions))
    bias_gradient = np.sum(error * sigmoid_derivative(predictions))

    # Update weights and bias
    weights -= learning_rate * gradients
    bias -= learning_rate * bias_gradient

    # Print cost every 1000 iterations
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Cost: {cost}")

4. Visualizing Decision Boundary

python

# Generate a grid for visualization
xx, yy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
grid_points = np.c_[xx.ravel(), yy.ravel()]
grid_predictions = sigmoid(np.dot(grid_points, weights) + bias)

# Plot decision boundary
plt.contourf(xx, yy, grid_predictions.reshape(xx.shape), alpha=0.8, cmap='viridis')
plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), edgecolors='k')
plt.title("Decision Boundary")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Final Output

• Cost: The cost decreases over epochs, showing convergence.

• Decision Boundary: A visualization will clearly separate points with labels 0 and 1.

Diagram Explanation

The decision boundary plot shows how the neural network separates the data points based
on learned weights and biases.
Extending the Neural Network with More Layers (Deep Learning Concepts)

To build a more powerful neural network, we introduce hidden layers, nonlinear activation
functions, and deeper architectures. Let's break it down step by step using modern libraries
for scalability and efficiency.

Key Concepts for Extending the Model

1. Hidden Layers: Intermediate layers between input and output to capture complex
patterns.

2. Activation Functions: Nonlinear functions applied at each layer to model complex data relationships. Examples:

o ReLU (Rectified Linear Unit): f(x) = \max(0, x)

o Sigmoid: Useful for binary classification.

o Softmax: For multi-class classification tasks.

3. Cost Function: Cross-entropy for classification problems.

4. Gradient Descent Variants: Optimizers like Adam and RMSprop improve training efficiency.

Building a Multi-Layer Neural Network using TensorFlow

We will use TensorFlow to create a deep neural network for the same classification problem.

1. Import Necessary Libraries

python

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt

2. Prepare the Dataset


python

# Same dataset as before
X = np.array([[0.2, 0.8],
              [0.4, 0.6],
              [0.6, 0.4],
              [0.8, 0.2]])
y = np.array([[0], [0], [1], [1]])  # Binary labels

3. Define the Neural Network Architecture

python

# Define a sequential model
model = Sequential([
    Dense(4, input_dim=2, activation='relu'),  # Hidden layer with 4 neurons
    Dense(2, activation='relu'),               # Additional hidden layer
    Dense(1, activation='sigmoid')             # Output layer for binary classification
])

4. Compile and Train the Model

python

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X, y, epochs=200, verbose=0)

# Plot the training loss over epochs
plt.plot(history.history['loss'])
plt.title("Model Loss During Training")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.show()

5. Visualize the Decision Boundary

python

# Generate grid for visualization
xx, yy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
grid_points = np.c_[xx.ravel(), yy.ravel()]
grid_predictions = model.predict(grid_points)

# Plot decision boundary
plt.contourf(xx, yy, grid_predictions.reshape(xx.shape), alpha=0.8, cmap='viridis')
plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), edgecolors='k')
plt.title("Decision Boundary (Deep Neural Network)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Understanding the Extensions

1. Hidden Layers:

o Capture more complex relationships in data.

o Learn multiple levels of abstraction.


2. Activation Functions:

o ReLU helps avoid vanishing gradients.

o Sigmoid provides probabilistic outputs in binary classification.

3. Optimizers:

o Adam dynamically adjusts learning rates for efficient convergence.

4. Model Loss:

o A decreasing loss indicates a better fit to the data (a short prediction check on the trained model is sketched below).
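
As a closing check, the short sketch below (an assumed inspection step, not part of the original walkthrough) prints the trained deep network's predictions for the four training points defined earlier.

python

# Predicted probabilities for the four training points (X, y, and model come from the code above)
probs = model.predict(X)
for features, prob, label in zip(X, probs, y):
    print(f"input={features}, predicted probability={prob[0]:.3f}, actual label={label[0]}")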
