DL_UNIT_IV
Unit IV: Convolutional Neural Networks: (1) Neural Networks and (2)
Representation Learning, (3) Convolutional Layers, (4) Multichannel
Convolution Operation, Recurrent Neural Networks: (5) Introduction to
RNN, (6) RNN Code, PyTorch Tensors: (7) Deep Learning with PyTorch, (8)
CNN in PyTorch.
Understanding basic neural networks lays the foundation for grasping Convolutional Neural
Networks (CNNs), which are specialized types of neural networks designed for tasks like image
recognition, object detection, and more.
2. Limitations of Traditional Neural Networks for Images
While fully connected (dense) neural networks work well for structured data, they struggle with
high-dimensional data like images because:
Too Many Parameters: An image with 256x256 pixels (grayscale) has 65,536 features.
Fully connecting each pixel to neurons results in a massive number of parameters; a single dense layer with 1,000 hidden units, for example, would already need roughly 65.5 million weights.
Loss of Spatial Structure: Dense layers treat each pixel independently, ignoring the
spatial relationships between neighboring pixels.
A typical CNN addresses these problems with the following sequence of layers:
1. Input Layer:
Takes raw pixel data (e.g., 28x28 pixels for MNIST digits).
2. Convolutional Layer:
Applies filters to extract local features.
3. Activation Function (ReLU):
Introduces non-linearity to learn complex patterns.
4. Pooling Layer (e.g., Max Pooling):
Reduces spatial dimensions, making the network more efficient while retaining important
features.
5. Fully Connected Layer (Dense):
After several convolution and pooling layers, the output is flattened and fed into dense
layers for final classification.
6. Output Layer (Softmax/Sigmoid):
Produces class probabilities.
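As a quick illustration of this pipeline, here is a minimal sketch of how these stages can be stacked in PyTorch; the layer sizes and the 28x28 MNIST-style input are illustrative assumptions, not values prescribed by the text:
import torch.nn as nn
# Illustrative CNN pipeline for 28x28 grayscale images; sizes are example values
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 2. Convolutional layer: extract local features
    nn.ReLU(),                                  # 3. Activation: introduce non-linearity
    nn.MaxPool2d(2),                            # 4. Pooling: reduce spatial size (28x28 -> 14x14)
    nn.Flatten(),                               # flatten the feature maps
    nn.Linear(8 * 14 * 14, 10),                 # 5. Fully connected layer
    nn.Softmax(dim=1)                           # 6. Output: class probabilities
)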
5. Key Differences: Neural Networks vs. CNNs
Connectivity: Dense networks connect every input to every neuron; CNNs use filters that look at small local regions.
Parameters: Dense layers need one weight per input-neuron pair; CNNs share filter weights across the whole input, so they need far fewer parameters.
Spatial structure: Dense layers treat each pixel independently; CNNs preserve and exploit the spatial relationships between neighboring pixels.
Typical use: Dense networks work well for structured (tabular) data; CNNs are designed for images and other grid-like data.
Representation Learning
Representation learning is the idea of letting the model learn useful features (representations) from raw data automatically instead of hand-crafting them.
Key Concepts:
1. Autoencoders:
Learn compressed representations of data by encoding it and then decoding it back (a minimal sketch appears after the list of benefits below).
2. Principal Component Analysis (PCA):
A linear method that reduces dimensions while preserving as much variance as possible.
3. Word Embeddings (e.g., Word2Vec, GloVe):
Represent words as dense vectors capturing semantic relationships.
4. Deep Learning (CNNs, RNNs, Transformers):
Automatically learn hierarchical features from images, sequences, and more.
5. Contrastive Learning:
Focuses on learning representations by comparing similar and dissimilar data points (e.g., SimCLR, MoCo).
Why Representation Learning Matters:
Reduces Need for Manual Feature Engineering: Saves time and often outperforms hand-crafted features.
Generalization: Learns robust features that work well across different tasks or datasets.
Efficiency: Captures complex patterns that are hard to define manually.
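As a concrete illustration of the first concept above (autoencoders), here is a minimal PyTorch sketch; the 784-dimensional input (a flattened 28x28 image) and the 32-dimensional code are illustrative assumptions:
import torch.nn as nn
# Minimal autoencoder: compress a 784-dim input to a 32-dim representation, then reconstruct it
autoencoder = nn.Sequential(
    nn.Linear(784, 32),   # encoder: compress to a 32-dimensional representation
    nn.ReLU(),
    nn.Linear(32, 784),   # decoder: reconstruct the original 784-dimensional input
    nn.Sigmoid()
)
# Training would minimize reconstruction error, e.g., nn.MSELoss() between the output and the original input.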
Convolutional Layers
A convolution operation involves the following components:
1. Input Data:
This could be an image (represented as a 2D matrix of pixel values) or other structured
data.
2. Filters (Kernels):
A small matrix of weights (like 3x3 or 5x5) that slides over the input data. Each filter is
designed to detect specific features such as edges, corners, textures, etc.
3. Convolution Operation:
o The filter moves (or convolves) over the input with a certain stride (step size).
o At each position, an element-wise multiplication is performed between the filter
and the overlapping input region, and the results are summed up to produce a
single value.
o This process generates a new matrix called a feature map or activation map.
4. Activation Function:
After convolution, an activation function (like ReLU) is applied to introduce non-linearity, helping the network learn complex patterns.
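To make the sliding-and-summing concrete, the following sketch writes the convolution operation out by hand; the 4x4 input and the 3x3 vertical-edge filter are arbitrary example values:
import torch
image = torch.tensor([[1., 2., 0., 1.],
                      [0., 1., 3., 1.],
                      [2., 2., 0., 0.],
                      [1., 0., 1., 2.]])
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])   # a simple vertical-edge filter
stride = 1
out_size = (image.shape[0] - kernel.shape[0]) // stride + 1   # 2 for a 4x4 input and 3x3 filter
feature_map = torch.zeros(out_size, out_size)
for i in range(out_size):
    for j in range(out_size):
        region = image[i:i+3, j:j+3]                  # the overlapping input region
        feature_map[i, j] = (region * kernel).sum()   # element-wise multiply, then sum
print(feature_map)   # the resulting 2x2 feature map (activation map)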
Important hyperparameters of a convolutional layer:
1. Filter Size:
Common sizes are 3x3, 5x5, etc. Smaller filters capture fine details, while larger ones detect broader patterns.
2. Stride:
Determines how far the filter moves with each step. A stride of 1 means the filter moves one pixel at a time; larger strides reduce the output size.
3. Padding:
o Valid Padding: No padding, which reduces the output size.
o Same Padding: Adds zeros around the input to maintain the same output size.
4. Number of Filters:
Determines how many different features the layer can detect. More filters capture more patterns but increase computational cost.
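The sketch below shows how these hyperparameters map onto nn.Conv2d arguments and how stride and padding change the output size; the 28x28 single-channel input is an illustrative assumption:
import torch
import torch.nn as nn
x = torch.randn(1, 1, 28, 28)                                   # one 28x28 single-channel image
same = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)     # 16 filters, same padding
valid = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=0)    # valid padding (no zeros added)
strided = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)  # stride 2 roughly halves the output
print(same(x).shape)     # torch.Size([1, 16, 28, 28]) - size preserved
print(valid(x).shape)    # torch.Size([1, 16, 26, 26]) - size shrinks
print(strided(x).shape)  # torch.Size([1, 16, 14, 14]) - size halved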
Key properties of convolutional layers:
Parameter Sharing: The same filter slides over the entire input, reducing the number of parameters compared to fully connected layers (see the parameter-count sketch after this list).
Local Connectivity: Filters focus on local regions, capturing spatial relationships
effectively.
Translation Invariance: Features detected in one part of the image can be recognized
anywhere else.
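A short sketch of the effect of parameter sharing, comparing a small convolutional layer with a dense layer on the same 28x28 input (the layer sizes are illustrative):
import torch.nn as nn
conv = nn.Conv2d(1, 16, kernel_size=3)   # 16 filters of size 3x3 (plus 16 biases)
dense = nn.Linear(28 * 28, 16)           # 16 fully connected neurons on a flattened 28x28 image
print(sum(p.numel() for p in conv.parameters()))   # 160 parameters (16*3*3 weights + 16 biases)
print(sum(p.numel() for p in dense.parameters()))  # 12560 parameters (784*16 weights + 16 biases)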
Multichannel Convolution Operation
A color image is stored as multiple channels (e.g., Red, Green, and Blue). Each channel carries different information. To effectively learn from this, convolution operations must handle multiple channels simultaneously.
For a single channel, the operation is simply: Output = Input ∗ Filter (where ∗ denotes convolution).
Multichannel Convolution:
When the input has multiple channels (e.g., RGB), the process changes slightly:
Step 1: The filter covers a 2×2 patch in each of the R, G, B channels.
Step 2: Perform element-wise multiplication in each channel.
Step 3: Sum the results from all 3 channels to get one value in the output feature map.
If we apply multiple filters, each will generate a separate feature map, increasing the depth of
the output.
In practice:
You’ll use many filters (e.g., 32, 64, 128 filters in one layer).
Each filter learns to detect different features (edges, textures, patterns).
The output becomes a stack of feature maps, forming a 3D output tensor.
Example:
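A small sketch of the multichannel operation described above, using random values; the sizes (a 4x4 RGB input and one filter with a 2x2 patch per channel) are illustrative assumptions:
import torch
rgb = torch.randn(3, 4, 4)    # a tiny 3-channel (R, G, B) input, 4x4 per channel
filt = torch.randn(3, 2, 2)   # one filter with a 2x2 patch per channel
out_size = 4 - 2 + 1          # 3x3 output for stride 1 and no padding
feature_map = torch.zeros(out_size, out_size)
for i in range(out_size):
    for j in range(out_size):
        patch = rgb[:, i:i+2, j:j+2]              # the 2x2 patch in each of the 3 channels
        feature_map[i, j] = (patch * filt).sum()  # multiply per channel, then sum across all channels
print(feature_map)            # one feature map; each additional filter would add another map
With 32 such filters, for example, the layer would produce 32 of these feature maps stacked into a 3D output tensor.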
Introduction to Recurrent Neural Networks (RNN)
Traditional neural networks (like feedforward networks or CNNs) assume that inputs are
independent of each other. However, in many real-world problems, the context from previous
data points is crucial:
Text: The meaning of a word often depends on the words before it.
Speech: The pronunciation of a sound depends on preceding sounds.
Stock Prices: Tomorrow’s price depends on today’s and yesterday’s prices.
RNNs solve this by having a form of memory, allowing them to retain information from
previous inputs when processing new ones.
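One common way to express this memory (a standard formulation, not taken from this text) is a hidden state h that is updated at every time step from the current input and the previous hidden state; a minimal sketch:
import torch
W_xh = torch.randn(8, 4)    # input-to-hidden weights (hidden size 8, input size 4; illustrative)
W_hh = torch.randn(8, 8)    # hidden-to-hidden weights: the "memory" connection
b_h = torch.zeros(8)
h = torch.zeros(8)          # initial hidden state
inputs = [torch.randn(4) for _ in range(5)]       # a sequence of 5 input vectors
for x_t in inputs:
    h = torch.tanh(W_xh @ x_t + W_hh @ h + b_h)   # h_t depends on the current input and the previous h
print(h)                    # final hidden state summarizing the whole sequence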
4. Types of RNN Architectures
One-to-One: A single input maps to a single output (the standard feedforward setting).
One-to-Many: One input produces a sequence of outputs (e.g., image captioning).
Many-to-One: A sequence of inputs produces a single output (e.g., sentiment analysis).
Many-to-Many: A sequence of inputs produces a sequence of outputs (e.g., machine translation).
Applications of RNNs
Natural Language Processing (NLP): Language translation, text generation, sentiment
analysis.
Speech Recognition: Converting spoken words to text.
Time Series Forecasting: Predicting stock prices, weather forecasting.
Music Generation: Creating new melodies from learned patterns.
RNN Code
A minimal RNN example in PyTorch; the toy sequences below are illustrative, and the model predicts the next number in a short sequence as a class:
import torch
import torch.nn as nn
import torch.optim as optim
# 1. Model: a single RNN layer followed by a linear classifier
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    def forward(self, x):
        out, _ = self.rnn(x)             # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])    # classify using the last time step
# 2. Hyperparameters
num_epochs = 100
learning_rate = 0.01
# 3. Toy data: each input is a sequence of 3 numbers; the label is the next number (a class from 0-9)
X = torch.tensor([[[1.], [2.], [3.]],
                  [[2.], [3.], [4.]],
                  [[3.], [4.], [5.]]], dtype=torch.float32)
y = torch.tensor([4, 5, 6])
# 4. Model, Loss and Optimizer
model = SimpleRNN(input_size=1, hidden_size=16, num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# 5. Training Loop
for epoch in range(num_epochs):
    outputs = model(X)
    loss = criterion(outputs, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
# 6. Prediction
with torch.no_grad():
    test_seq = torch.tensor([[[4.], [5.], [6.]]], dtype=torch.float32)
    prediction = model(test_seq)
    print("Predicted next number:", prediction.argmax(dim=1).item())
Explanation:
1. Model: SimpleRNN wraps an nn.RNN layer and a final nn.Linear layer; the hidden state at the last time step is used to classify the whole sequence.
2. Data: Each training example is a short sequence of numbers, and the label is the class of the number expected to come next.
3. Loss and Optimizer: nn.CrossEntropyLoss handles the multi-class prediction, and optim.Adam updates the model's weights.
4. Training Loop: In each epoch the model performs a forward pass, the loss is computed, gradients are backpropagated, and the optimizer updates the weights; the loss is printed every 10 epochs.
5. Prediction: A new sequence (test_seq) is passed through the trained model inside torch.no_grad(), and the class with the highest score is taken as the predicted next number.
Deep Learning with PyTorch
The next example trains a simple fully connected network for binary classification using PyTorch tensors.
import torch
import torch.nn as nn
import torch.optim as optim
# 1. Data (random example data; replace with your own dataset)
X = torch.randn(100, 5)                          # 100 samples, 5 features each
y = (torch.rand(100) > 0.5).float().view(-1, 1)  # 100 binary labels, reshaped for BCE Loss
# 2. Model: two linear layers with ReLU in between and a sigmoid output
model = nn.Sequential(nn.Linear(5, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
# 3. Loss and Optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
# 4. Training Loop
for epoch in range(50):                  # Train for 50 epochs
    outputs = model(X)                   # Forward pass
    loss = criterion(outputs, y)         # Calculate loss
    optimizer.zero_grad()                # Clear gradients
    loss.backward()                      # Backpropagation
    optimizer.step()                     # Update weights
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/50], Loss: {loss.item():.4f}')
# 5. Prediction (Example)
with torch.no_grad():
    test_input = torch.randn(5, 5)                 # Example test data
    predictions = model(test_input)
    predicted_labels = (predictions > 0.5).int()   # Convert probabilities to labels (0 or 1)
    print("Predictions:", predicted_labels)
Explanation:
1. Data: We create some random input data X and labels y for demonstration. In a real
application, you would load your data.
2. Model: We define a simple neural network using nn.Sequential. It has two linear layers
(fully connected layers) and a ReLU activation function in between. The output layer has
a sigmoid activation to produce probabilities for binary classification.
3. Loss and Optimizer: We use Binary Cross-Entropy Loss (nn.BCELoss) for binary
classification and the Adam optimizer (optim.Adam) to update the model's weights.
4. Training Loop:
o The code iterates through the data multiple times (epochs).
o In each epoch, it performs a forward pass (calculates predictions), calculates the
loss, performs backpropagation (calculates gradients), and updates the weights
using the optimizer.
o The loss is printed periodically to track training progress.
5. Prediction: We create some example test data, pass it through the trained model to get
predictions (probabilities), and then convert the probabilities to predicted labels (0 or 1)
using a threshold of 0.5. torch.no_grad() is used during prediction because we don't need to
calculate gradients.
CNN in PyTorch
The following example builds and trains a simple CNN for 28x28 grayscale images.
import torch
import torch.nn as nn
import torch.optim as optim
# 1. Data (random example images and labels; replace with real data)
X = torch.randn(100, 1, 28, 28)      # 100 grayscale images of size 28x28
y = torch.randint(0, 10, (100,))     # 100 class labels (10 classes)
# 2. CNN Model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)   # Convolutional layer 1
        self.relu1 = nn.ReLU()                                              # ReLU activation
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)                  # Max pooling
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)  # Convolutional layer 2
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()          # Flatten the output for the linear layer
        self.fc = nn.Linear(32 * 7 * 7, 10)  # Fully connected layer (10 output classes)
    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))   # 28x28 -> 14x14
        x = self.pool2(self.relu2(self.conv2(x)))   # 14x14 -> 7x7
        return self.fc(self.flatten(x))
# 3. Loss and Optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
# 4. Training Loop
for epoch in range(20): # Train for 20 epochs
outputs = model(X)
loss = criterion(outputs, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if (epoch+1) % 5 == 0:
print(f'Epoch [{epoch+1}/20], Loss: {loss.item():.4f}')
# 5. Prediction (Example)
with torch.no_grad():
test_input = torch.randn(5, 1, 28, 28) # Example test data (5 images)
predictions = model(test_input)
_, predicted_labels = torch.max(predictions, 1) # Get the predicted class indices
print("Predictions:", predicted_labels)
Explanation:
1. Data: We create some random image data X and labels y. Replace this with your actual
image data loading. We assume grayscale images of size 28x28.
2. CNN Model (SimpleCNN):
o nn.Conv2d: A 2D convolutional layer. It applies a filter (kernel) to the input image
to extract features. 1 is the number of input channels (grayscale), 16 and 32 are the
number of output channels (number of filters). kernel_size is the size of the filter,
stride is how much the filter moves, and padding adds zeros around the image to
control the output size.
o nn.ReLU: ReLU activation function.
o nn.MaxPool2d: Max pooling. It downsamples the feature maps, reducing
computation and making the model more robust to small variations in the input.
o nn.Flatten: Flattens the multi-dimensional feature maps into a 1D vector before
feeding into the fully connected layer.
o nn.Linear: A fully connected linear layer. It takes the flattened features and outputs
the final predictions (logits for each class).
3. Loss and Optimizer:
o nn.CrossEntropyLoss: Loss function for multi-class classification.
o optim.Adam: Adam optimizer.
4. Training Loop: Similar to the previous example, we iterate through the data, perform a
forward pass, calculate the loss, backpropagate, and update the weights.
5. Prediction: We create some example test images and use the trained model to make
predictions. torch.max is used to get the predicted class labels (the class with the highest
probability). torch.no_grad() is essential during prediction.
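To see why the fully connected layer expects 32 * 7 * 7 input features, here is a short illustrative trace of the tensor shapes through the model defined above:
import torch
with torch.no_grad():
    x = torch.randn(1, 1, 28, 28)                  # one 28x28 grayscale image
    x = model.pool1(model.relu1(model.conv1(x)))   # -> (1, 16, 14, 14): same padding keeps 28x28, pooling halves it
    x = model.pool2(model.relu2(model.conv2(x)))   # -> (1, 32, 7, 7)
    x = model.flatten(x)                           # -> (1, 1568), i.e., (1, 32 * 7 * 7)
    print(x.shape)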