
Unit IV

Unit IV: Convolutional Neural Networks: (1) Neural Networks and (2)
Representation Learning, (3) Convolutional Layers, (4) Multichannel
Convolution Operation, Recurrent Neural Networks: (5) Introduction to
RNN, (6) RNN Code, PyTorch Tensors: (7) Deep Learning with PyTorch, (8)
CNN in PyTorch.

(1) Introduction to Neural Networks (NNs) and Their Connection to CNNs

A Neural Network (NN) is a computational model inspired by the structure and functioning of
the human brain. It consists of interconnected layers of nodes (or neurons) that process data to
learn patterns and make decisions.

Understanding basic neural networks lays the foundation for grasping Convolutional Neural
Networks (CNNs), which are specialized types of neural networks designed for tasks like image
recognition, object detection, and more.
2. Limitations of Traditional Neural Networks for Images

While fully connected (dense) neural networks work well for structured data, they struggle with
high-dimensional data like images because:

• Too Many Parameters: An image with 256x256 pixels (grayscale) has 65,536 features. Fully connecting each pixel to neurons results in a massive number of parameters.
• Loss of Spatial Structure: Dense layers treat each pixel independently, ignoring the spatial relationships between neighboring pixels.

These limitations led to the development of Convolutional Neural Networks (CNNs).

3. How CNNs Improve Over Traditional NNs

CNNs introduce two key ideas:

1. Local Connectivity (Convolution):
Instead of connecting every input to every neuron, CNNs use convolutional layers with small filters (like 3x3 or 5x5) that scan the input. This captures local patterns (e.g., edges, textures) and drastically reduces the number of parameters.
2. Parameter Sharing:
The same filter (set of weights) is applied across the entire image, allowing the network to detect the same feature in different locations. This improves efficiency and generalization. (A parameter-count comparison follows below.)
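
The savings from these two ideas are easy to quantify. Here is a minimal sketch, assuming an illustrative 256x256 grayscale input and a hidden layer of 100 neurons (both sizes are our assumptions, not from the text):

import torch.nn as nn

# Dense layer: each of the 256*256 = 65,536 pixels connects to every one of
# 100 hidden neurons (100 is an illustrative choice).
dense = nn.Linear(256 * 256, 100)

# Convolutional layer: a single 3x3 filter shared across the whole image.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))  # 6,553,700 (65,536 * 100 weights + 100 biases)
print(count(conv))   # 10 (9 shared filter weights + 1 bias)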

4. Architecture of a Basic CNN

A simple CNN architecture consists of:

1. Input Layer:
Takes raw pixel data (e.g., 28x28 pixels for MNIST digits).
2. Convolutional Layer:
Applies filters to extract local features.
3. Activation Function (ReLU):
Introduces non-linearity to learn complex patterns.
4. Pooling Layer (e.g., Max Pooling):
Reduces spatial dimensions, making the network more efficient while retaining important
features.
5. Fully Connected Layer (Dense):
After several convolution and pooling layers, the output is flattened and fed into dense
layers for final classification.
6. Output Layer (Softmax/Sigmoid):
Produces class probabilities. (A minimal sketch of this stack follows below.)
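
Putting the six stages together, here is a minimal sketch of such a network in PyTorch (the filter count and layer sizes are illustrative assumptions for 28x28 grayscale input):

import torch.nn as nn

basic_cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 2. Convolutional layer
    nn.ReLU(),                                  # 3. Activation function
    nn.MaxPool2d(2),                            # 4. Pooling: 28x28 -> 14x14
    nn.Flatten(),                               # flatten for the dense layer
    nn.Linear(8 * 14 * 14, 10),                 # 5. Fully connected layer
    nn.Softmax(dim=1),                          # 6. Output class probabilities
)

In practice, the final Softmax is often omitted during training because nn.CrossEntropyLoss expects raw logits and applies the softmax internally.
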
5. Key Differences: Neural Networks vs. CNNs

Aspect               | Traditional Neural Network (NN) | Convolutional Neural Network (CNN)
---------------------|---------------------------------|-----------------------------------
Data Type            | Structured/tabular data         | Image, video, audio, spatial data
Connection Type      | Fully connected                 | Locally connected (convolution)
Parameter Efficiency | High number of parameters       | Fewer parameters due to sharing
Spatial Awareness    | No                              | Yes

6. Real-World Applications of CNNs

• Image Classification: Identifying objects in images (e.g., cats vs. dogs).
• Object Detection: Detecting and locating objects in real time (e.g., autonomous vehicles).
• Medical Imaging: Diagnosing diseases from X-rays, MRIs.
• Facial Recognition: Identifying faces in security systems.
• Natural Language Processing (NLP): Sentence classification, sentiment analysis.

(2) Representation Learning


Representation Learning is a type of machine learning where the system automatically
discovers the features or representations needed for tasks like classification, clustering, or
prediction. Instead of manually designing features (feature engineering), representation learning
allows models to learn useful features directly from raw data.

Key Concepts:

1. Raw Data to Features:
Raw data (like images, text, or audio) often contains complex structures. Representation learning transforms this raw data into meaningful features that make it easier for machine learning models to perform tasks.
2. Hierarchical Learning:
In many models (like deep neural networks), representations are learned hierarchically:
o Lower layers capture basic patterns (edges in images, word frequency in text).
o Higher layers capture complex concepts (object shapes, sentence meaning).
3. Unsupervised, Supervised, and Self-Supervised Learning:
o Unsupervised: Models learn from data without labels (e.g., autoencoders, PCA).
o Supervised: Models learn representations that help predict labels (e.g., CNNs for image classification).
o Self-Supervised: Models create pseudo-labels from the data itself (e.g., contrastive learning).
Popular Techniques in Representation Learning:

1. Autoencoders:
Learn compressed representations of data by encoding and then decoding it back (a minimal sketch follows this list).
2. Principal Component Analysis (PCA):
A linear method that reduces dimensions while preserving as much variance as possible.
3. Word Embeddings (e.g., Word2Vec, GloVe):
Represent words as dense vectors capturing semantic relationships.
4. Deep Learning (CNNs, RNNs, Transformers):
Automatically learn hierarchical features from images, sequences, and more.
5. Contrastive Learning:
Focuses on learning representations by comparing similar and dissimilar data points (e.g., SimCLR, MoCo).
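
As a concrete illustration of the first technique, here is a minimal autoencoder sketch in PyTorch (the 784-dimensional input, as in a flattened 28x28 image, and the 32-dimensional code size are illustrative assumptions):

import torch
import torch.nn as nn

# Encoder: compress a 784-dim input to a 32-dim representation.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
# Decoder: reconstruct the 784-dim input from the 32-dim code.
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.randn(16, 784)                 # a batch of 16 dummy inputs
code = encoder(x)                        # learned representation, shape (16, 32)
reconstruction = decoder(code)           # shape (16, 784)
loss = nn.MSELoss()(reconstruction, x)   # train by minimizing reconstruction error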

Why Representation Learning Matters:

 Reduces Need for Manual Feature Engineering: Saves time and often outperforms
hand-crafted features.
 Generalization: Learns robust features that work well across different tasks or datasets.
 Efficiency: Captures complex patterns that are hard to define manually.

(3) Convolutional Layers


A Convolutional Layer is a fundamental building block of Convolutional Neural Networks
(CNNs), commonly used in image processing, computer vision, and even in NLP tasks. It
performs a mathematical operation called convolution, which helps in automatically learning
spatial hierarchies of features from input data (like images).

How Convolutional Layers Work

1. Input Data:
This could be an image (represented as a 2D matrix of pixel values) or other structured
data.
2. Filters (Kernels):
A small matrix of weights (like 3x3 or 5x5) that slides over the input data. Each filter is
designed to detect specific features such as edges, corners, textures, etc.
3. Convolution Operation:
o The filter moves (or convolves) over the input with a certain stride (step size).
o At each position, an element-wise multiplication is performed between the filter
and the overlapping input region, and the results are summed up to produce a
single value.
o This process generates a new matrix called a feature map or activation map.
4. Activation Function:
After convolution, an activation function (like ReLU) is applied to introduce non-linearity, helping the network learn complex patterns (see the sketch below).
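
The sliding-and-summing operation above can be checked directly with torch.nn.functional.conv2d. Below is a minimal sketch, assuming an illustrative 4x4 input and a hand-picked 3x3 Laplacian-style filter; at each position the filter is multiplied element-wise with the overlapping region and summed to a single value:

import torch
import torch.nn.functional as F

# A 4x4 single-channel input (batch=1, channels=1, height=4, width=4).
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# One 3x3 filter (out_channels=1, in_channels=1, 3, 3).
w = torch.tensor([[[[0., 1., 0.],
                    [1., -4., 1.],
                    [0., 1., 0.]]]])

# Each position of the sliding window yields one value in the feature map.
feature_map = F.conv2d(x, w, stride=1)
print(feature_map.shape)               # torch.Size([1, 1, 2, 2])
feature_map = torch.relu(feature_map)  # apply the ReLU non-linearity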

Key Hyperparameters of Convolutional Layers

1. Filter Size:
Common sizes are 3x3, 5x5, etc. Smaller filters capture fine details, while larger ones detect broader patterns.
2. Stride:
Determines how far the filter moves with each step. A stride of 1 means the filter moves one pixel at a time; larger strides reduce the output size.
3. Padding:
o Valid Padding: No padding, which reduces the output size.
o Same Padding: Adds zeros around the input to maintain the same output size.
4. Number of Filters:
Determines how many different features the layer can detect. More filters capture more patterns but increase computational cost. (A shape check using these hyperparameters follows this list.)
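
These hyperparameters jointly determine the output size: for input width W, filter size K, padding P, and stride S, the output width is (W - K + 2P) / S + 1. A quick PyTorch check (the 28x28 input and 16 filters are illustrative assumptions):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # one 28x28 grayscale image

valid = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=0)  # "valid" padding
same = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)   # "same" padding

print(valid(x).shape)  # torch.Size([1, 16, 26, 26]): (28 - 3 + 0)/1 + 1 = 26
print(same(x).shape)   # torch.Size([1, 16, 28, 28]): (28 - 3 + 2)/1 + 1 = 28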

Why Convolutional Layers Are Powerful

• Parameter Sharing: The same filter slides over the entire input, reducing the number of parameters compared to fully connected layers.
• Local Connectivity: Filters focus on local regions, capturing spatial relationships effectively.
• Translation Invariance: Features detected in one part of the image can be recognized anywhere else.

Applications of Convolutional Layers


• Image Classification (e.g., ResNet, VGG)
• Object Detection (e.g., YOLO, Faster R-CNN)
• Semantic Segmentation (e.g., U-Net)
• Natural Language Processing (e.g., TextCNN)
• Audio Processing (e.g., Spectrogram Analysis)

(4) Multichannel Convolution Operation


In convolutional neural networks (CNNs), multichannel convolution refers to performing
convolution operations on input data that has multiple channels. This is essential when dealing
with real-world data like color images (which have RGB channels) or even deeper feature maps
in deeper layers of a CNN.

1. Why Multichannel Convolution?


• Grayscale Image: Has 1 channel (e.g., 28×28 pixels).
• Color Image (RGB): Has 3 channels—Red, Green, and Blue (e.g., 28×28×3).
• Deeper CNN Layers: The input becomes feature maps with dozens or hundreds of channels.

Each channel carries different information. To effectively learn from this, convolution operations
must handle multiple channels simultaneously.

2. How Multichannel Convolution Works


Single-Channel Convolution (Recap):

For a grayscale image, the convolution is simple:

Output = Input * Filter

where * represents the convolution operation.

Multichannel Convolution:

When the input has multiple channels (e.g., RGB), the process changes slightly:

1. Input: A 3D tensor (Height × Width × Channels), e.g., 32×32×3.
2. Filter (Kernel): A 3D tensor too, with the same depth as the input channels. For example:
o If the input has 3 channels, the filter will be of size 3×3×3.
3. Operation:
o The filter slides over the input spatially (height and width).
o For each position:
- Perform element-wise multiplication for each channel separately.
- Sum the results across all channels to get a single value.
4. Output: A 2D feature map (or more, if using multiple filters). A code check of these steps follows below.
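
As a minimal sketch of these steps (tensor sizes match the example in the next subsection and are illustrative; note that PyTorch orders dimensions channels-first):

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 4, 4)  # one RGB input: 3 channels, 4x4 spatially
w = torch.randn(1, 3, 2, 2)  # one filter: depth 3 matches the input channels, 2x2 spatially

# Per position: multiply element-wise in each channel, then sum across all
# 3 channels (and the 2x2 window) to produce one output value.
out = F.conv2d(x, w, stride=1)
print(out.shape)  # torch.Size([1, 1, 3, 3]): a single 2D feature map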

3. Example: RGB Image Convolution

- Input: 4×4×3 RGB image
- Filter: 2×2×3 kernel

• Step 1: The filter covers a 2×2 patch in each of the R, G, B channels.
• Step 2: Perform element-wise multiplication in each channel.
• Step 3: Sum the results from all 3 channels to get one value in the output feature map.

If we apply multiple filters, each will generate a separate feature map, increasing the depth of the output.

4. Extending to Multiple Filters

In practice:

• You'll use many filters (e.g., 32, 64, 128 filters in one layer).
• Each filter learns to detect different features (edges, textures, patterns).
• The output becomes a stack of feature maps, forming a 3D output tensor.

Example:

• Input: 32×32×3 (RGB image)
• Filters: 64 filters of size 3×3×3
• Output: 30×30×64 (after convolution with stride=1 and no padding)
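
This output shape can be confirmed with a short PyTorch check (a sketch matching the example above):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                                 # one 32x32 RGB image
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=0)   # 64 filters of size 3x3x3
print(conv(x).shape)  # torch.Size([1, 64, 30, 30]): (32 - 3)/1 + 1 = 30
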
(5) Introduction to Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential
data—data where the order of elements matters. They are widely used in tasks like natural
language processing (NLP), speech recognition, time series forecasting, and more.

1. Why Use RNNs?

Traditional neural networks (like feedforward networks or CNNs) assume that inputs are
independent of each other. However, in many real-world problems, the context from previous
data points is crucial:

• Text: The meaning of a word often depends on the words before it.
• Speech: The pronunciation of a sound depends on preceding sounds.
• Stock Prices: Tomorrow's price depends on today's and yesterday's prices.

RNNs solve this by having a form of memory, allowing them to retain information from
previous inputs when processing new ones.
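
Concretely, at each time step t a standard (vanilla) RNN updates a hidden state h_t that summarizes the sequence seen so far:

h_t = tanh(W_xh * x_t + W_hh * h_(t-1) + b_h)

where x_t is the current input, h_(t-1) is the previous hidden state, and the weights W_xh, W_hh and bias b_h are shared across all time steps. The output at step t is then computed from h_t. (This is the standard textbook formulation, stated here for clarity.)
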
2. Types of RNN Architectures

1. One-to-One: Standard feedforward neural network.
o Example: Image classification.
2. One-to-Many: One input, sequence output.
o Example: Image captioning.
3. Many-to-One: Sequence input, single output.
o Example: Sentiment analysis (predicting sentiment from a sentence).
4. Many-to-Many: Sequence input, sequence output.
o Example: Machine translation, video classification.

3. Challenges with RNNs

Despite their power, RNNs have some limitations:

1. Vanishing Gradient Problem:
o When backpropagating through many time steps, gradients can become extremely small, making learning difficult.
o This limits the ability of standard RNNs to capture long-term dependencies.
2. Exploding Gradient Problem:
o Conversely, gradients can sometimes grow uncontrollably large, destabilizing the model (a gradient-clipping sketch follows this list).
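
A common practical remedy for exploding gradients is gradient clipping, which rescales gradients whose norm exceeds a threshold before the weight update. A minimal sketch, assuming an illustrative threshold of 1.0 and a dummy model and loss:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.RNN(1, 8, batch_first=True)   # any recurrent model works here
optimizer = optim.Adam(model.parameters(), lr=0.01)

out, _ = model(torch.randn(4, 3, 1))     # dummy batch: 4 sequences of length 3
loss = out.sum()                         # stand-in loss, purely for illustration
optimizer.zero_grad()
loss.backward()
# Cap the total gradient norm at 1.0 before the update (1.0 is illustrative).
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()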

4. Variants to Overcome RNN Limitations

To address the issues above, researchers developed advanced RNN architectures (a drop-in code sketch follows this list):

• LSTM (Long Short-Term Memory): Designed to remember long-term dependencies using special gates (input, forget, output).
• GRU (Gated Recurrent Unit): A simplified version of LSTM with fewer gates, often faster and equally effective.
• Bidirectional RNNs: Process the sequence in both forward and backward directions for richer context.
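
Since nn.LSTM and nn.GRU expose the same interface as nn.RNN, any of them can replace the recurrent layer in the SimpleRNN model of section (6) with a one-line change. A short sketch (layer sizes match that example; note that nn.LSTM also returns a cell state alongside the hidden state):

import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True)    # vanilla RNN
lstm = nn.LSTM(input_size=1, hidden_size=8, batch_first=True)  # gated: input/forget/output gates
gru = nn.GRU(input_size=1, hidden_size=8, batch_first=True)    # fewer gates, often faster

# Bidirectional variant: processes the sequence forward and backward,
# so the output feature size doubles to 2 * hidden_size.
bi_rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True, bidirectional=True)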

Applications of RNNs

• Natural Language Processing (NLP): Language translation, text generation, sentiment analysis.
• Speech Recognition: Converting spoken words to text.
• Time Series Forecasting: Predicting stock prices, weather forecasting.
• Music Generation: Creating new melodies from learned patterns.

(6) RNN Code:

import torch
import torch.nn as nn
import torch.optim as optim

# 1. Define the RNN Model
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)  # out: (batch_size, seq_length, hidden_size)
        out = out[:, -1, :]   # Get the last time step's output
        out = self.fc(out)    # Final output layer
        return out

# 2. Hyperparameters
input_size = 1    # Each element is a single number
hidden_size = 8   # Number of hidden units
output_size = 2   # Binary classification (0 or 1)
num_epochs = 100
learning_rate = 0.01

# 3. Sample Dataset (Simple Patterns)
# Class 0: Sequences where the sum < 2
# Class 1: Sequences where the sum >= 2
X = torch.tensor([
    [[0], [0], [0]],
    [[1], [0], [0]],
    [[0], [1], [1]],
    [[1], [1], [1]],
    [[0], [1], [0]]
], dtype=torch.float32)
y = torch.tensor([0, 0, 1, 1, 0], dtype=torch.long)  # Labels

# 4. Model, Loss, Optimizer
model = SimpleRNN(input_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# 5. Training Loop
for epoch in range(num_epochs):
    outputs = model(X)
    loss = criterion(outputs, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# 6. Testing the Model
with torch.no_grad():
    test_seq = torch.tensor([[[1], [0], [1]]], dtype=torch.float32)  # Sum = 2 (should be Class 1)
    prediction = model(test_seq)
    predicted_class = torch.argmax(prediction, dim=1)
    print(f'Predicted Class: {predicted_class.item()}')

Explanation:

1. Model: SimpleRNN uses nn.RNN followed by a linear layer for classification.
2. Dataset: Sequences of 0s and 1s, with labels based on their sum.
3. Training: We minimize cross-entropy loss.
4. Testing: A new sequence is classified based on the learned pattern.

(7) Deep Learning Using PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# 1. Data (Simple Example)
X = torch.randn(100, 5)          # 100 samples, 5 features
y = torch.randint(0, 2, (100,))  # 100 labels (0 or 1)
y = y.float().view(-1, 1)        # Reshape y once, up front, for BCE Loss

# 2. Model (Simple Neural Network)
model = nn.Sequential(
    nn.Linear(5, 10),  # Input layer (5 features) to hidden layer (10 neurons)
    nn.ReLU(),         # Activation function (ReLU)
    nn.Linear(10, 1),  # Hidden layer (10 neurons) to output layer (1 neuron)
    nn.Sigmoid()       # Output activation (Sigmoid for binary classification)
)

# 3. Loss and Optimizer
criterion = nn.BCELoss()                              # Binary Cross-Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

# 4. Training Loop
for epoch in range(50):           # Train for 50 epochs
    outputs = model(X)            # Forward pass
    loss = criterion(outputs, y)  # Calculate loss
    optimizer.zero_grad()         # Clear gradients
    loss.backward()               # Backpropagation
    optimizer.step()              # Update weights

    if (epoch+1) % 10 == 0:       # Print loss every 10 epochs
        print(f'Epoch [{epoch+1}/50], Loss: {loss.item():.4f}')

# 5. Prediction (Example)
with torch.no_grad():
    test_input = torch.randn(5, 5)                # Example test data
    predictions = model(test_input)
    predicted_labels = (predictions > 0.5).int()  # Convert probabilities to labels (0 or 1)
    print("Predictions:", predicted_labels)

Explanation:

1. Data: We create some random input data X and labels y for demonstration. In a real
application, you would load your data.
2. Model: We define a simple neural network using nn.Sequential. It has two linear layers
(fully connected layers) and a ReLU activation function in between. The output layer has
a sigmoid activation to produce probabilities for binary classification.
3. Loss and Optimizer: We use Binary Cross-Entropy Loss (nn.BCELoss) for binary
classification and the Adam optimizer (optim.Adam) to update the model's weights.
4. Training Loop:
o The code iterates through the data multiple times (epochs).
o In each epoch, it performs a forward pass (calculates predictions), calculates the
loss, performs backpropagation (calculates gradients), and updates the weights
using the optimizer.
o The loss is printed periodically to track training progress.
5. Prediction: We create some example test data, pass it through the trained model to get
predictions (probabilities), and then convert the probabilities to predicted labels (0 or 1)
using a threshold of 0.5. torch.no_grad() is used during prediction because we don't need to
calculate gradients.

(8) CNN using PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# 1. Data (Example - Replace with your data)
# For simplicity, we'll use random data. In a real application, you'd load images.
# Assume images are 28x28 and we have 1 channel (grayscale).
X = torch.randn(100, 1, 28, 28)   # 100 images, 1 channel, 28x28 size
y = torch.randint(0, 10, (100,))  # 100 labels (0 to 9 - 10 classes)

# 2. CNN Model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)   # Convolutional layer 1
        self.relu1 = nn.ReLU()                                              # ReLU activation
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)                  # Max pooling: 28x28 -> 14x14
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)  # Convolutional layer 2
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)                  # 14x14 -> 7x7
        self.flatten = nn.Flatten()                                         # Flatten the output for the linear layer
        self.fc = nn.Linear(32 * 7 * 7, 10)                                 # Fully connected layer (10 output classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.fc(x)
        return x

# 3. Instantiate Model, Loss, Optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()  # Loss for multi-class classification
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 4. Training Loop
for epoch in range(20):  # Train for 20 epochs
    outputs = model(X)
    loss = criterion(outputs, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 5 == 0:
        print(f'Epoch [{epoch+1}/20], Loss: {loss.item():.4f}')

# 5. Prediction (Example)
with torch.no_grad():
    test_input = torch.randn(5, 1, 28, 28)           # Example test data (5 images)
    predictions = model(test_input)
    _, predicted_labels = torch.max(predictions, 1)  # Get the predicted class indices
    print("Predictions:", predicted_labels)

Explanation:

1. Data: We create some random image data X and labels y. Replace this with your actual
image data loading. We assume grayscale images of size 28x28.
2. CNN Model (SimpleCNN):
o nn.Conv2d: A 2D convolutional layer. It applies a filter (kernel) to the input image
to extract features. 1 is the number of input channels (grayscale), 16 and 32 are the
number of output channels (number of filters). kernel_size is the size of the filter,
stride is how much the filter moves, and padding adds zeros around the image to
control the output size.
o nn.ReLU: ReLU activation function.
o nn.MaxPool2d: Max pooling. It downsamples the feature maps, reducing
computation and making the model more robust to small variations in the input.
o nn.Flatten: Flattens the multi-dimensional feature maps into a 1D vector before
feeding into the fully connected layer.
o nn.Linear: A fully connected linear layer. It takes the flattened features and outputs
the final predictions (logits for each class).
3. Loss and Optimizer:
o nn.CrossEntropyLoss: Loss function for multi-class classification.
o optim.Adam: Adam optimizer.
4. Training Loop: Similar to the previous example, we iterate through the data, perform a
forward pass, calculate the loss, backpropagate, and update the weights.
5. Prediction: We create some example test images and use the trained model to make
predictions. torch.max is used to get the predicted class labels (the class with the highest
probability). torch.no_grad() is essential during prediction.
