
Unit 3

Neural networks - Neural networks are computational models that mimic the complex
functions of the human brain. They consist of interconnected nodes, or neurons, that
process and learn from data, enabling tasks such as pattern recognition and decision making in
machine learning.

Types of Neural Networks


Some commonly used types of neural networks are described below.

 Feedforward Networks: A feedforward neural network is a simple artificial neural network
architecture in which data moves from input to output in a single direction. It has input,
hidden, and output layers; feedback loops are absent. Its straightforward architecture makes it
appropriate for a number of applications, such as regression and pattern recognition.

 Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with three or
more layers, including an input layer, one or more hidden layers, and an output layer. It uses
nonlinear activation functions.

 Convolutional Neural Network (CNN): A Convolutional Neural Network (CNN) is a


specialized artificial neural network designed for image processing. It employs convolutional
layers to automatically learn hierarchical features from input images, enabling effective image
recognition and classification. CNNs have revolutionized computer vision and are pivotal in
tasks like object detection and image analysis.

 Recurrent Neural Network (RNN): A Recurrent Neural Network (RNN) is an artificial neural
network designed for processing sequential data. Because it uses feedback loops that allow
information to persist within the network, it is well suited to applications where contextual
dependencies are critical, such as time series prediction and natural language processing.

 Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to overcome
the vanishing gradient problem in training RNNs. It uses memory cells and gates to selectively
read, write, and erase information.
CNN
Types of layers:
Let’s take an example by running a convnet on an image of dimension 32 x 32 x 3.

Input Layers: It’s the layer in which we give input to our model. In a CNN, the input will
generally be an image or a sequence of images. This layer holds the raw input image with width
32, height 32, and depth 3.

Convolutional Layers: This is the layer used to extract features from the input dataset. It
applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are
small matrices, usually of shape 2×2, 3×3, or 5×5. Each kernel slides over the input image
data and computes the dot product between the kernel weights and the corresponding input image
patch. The outputs of this layer are referred to as feature maps. If we use a total of 12 filters for
this layer, we get an output volume of dimension 32 x 32 x 12.
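A minimal sketch of this 12-filter convolution using the Keras API (assuming a TensorFlow backend); "same" padding keeps the 32 x 32 spatial size so 12 filters yield a 32 x 32 x 12 output volume:

```python
import tensorflow as tf

# 32 x 32 x 3 input image; 12 learnable 3x3 filters; "same" padding keeps the spatial size
inputs = tf.keras.Input(shape=(32, 32, 3))
feature_maps = tf.keras.layers.Conv2D(filters=12, kernel_size=3, padding="same",
                                      activation="relu")(inputs)
print(feature_maps.shape)  # (None, 32, 32, 12) -> one feature map per filter
```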

Stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, we
move the filter 1 pixel at a time; similarly, when the stride is 2, we move the filter 2 pixels at a time.

Padding: Sometimes the filter does not fit the input image perfectly.
We have two options:
 Pad the picture with zeros (zero-padding) so that it fits.
 Drop the part of the image where the filter does not fit.
The second option is called valid padding, which keeps only the valid part of the image.

"Padding is an additional border of values (usually zeros) added around an image."


How does Padding work?
Padding works by extending the area of the image that a convolutional neural network processes.
The kernel is the network's filter, which moves across the image, scanning each pixel and
converting the data into a smaller, or sometimes larger, format. To assist the kernel with
processing the image, padding is added to the frame of the image to give the kernel more space
to cover the image. Adding padding to an image processed by a CNN allows for more accurate
analysis. Without padding, two problems arise:
o Shrinking outputs
o Losing information at the corners of the image.
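The effect of padding and stride on the output size can be sketched with the standard formula (W − F + 2P) / S + 1; the numbers below are illustrative:

```python
def conv_output_size(width, kernel, padding, stride):
    """Spatial output size of a convolution/pooling step: (W - F + 2P) // S + 1."""
    return (width - kernel + 2 * padding) // stride + 1

print(conv_output_size(32, 3, padding=0, stride=1))  # 30 -> valid padding shrinks the output
print(conv_output_size(32, 3, padding=1, stride=1))  # 32 -> zero-padding preserves the size
print(conv_output_size(32, 3, padding=1, stride=2))  # 16 -> stride 2 halves the output
```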
Activation Layer: By adding an activation function to the output of the preceding layer,
activation layers add nonlinearity to the network. An element-wise activation function is applied
to the output of the convolution layer. Some common activation functions are ReLU: max(0,
x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will have
dimensions 32 x 32 x 12.

Pooling layer: This layer is periodically inserted in the convnet, and its main function is to reduce
the size of the volume, which makes computation faster, reduces memory, and also helps prevent
overfitting. Two common types of pooling layers are max pooling and average pooling. If we use
a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16 x 16 x 12.
The pooling layer plays an important role in the pre-processing of an image: it reduces the
number of parameters when the images are too large. Pooling is a "downscaling" of the image
obtained from the previous layers; it can be compared to shrinking an image to reduce its pixel
density.
Pooling reduces the dimensionality of each feature map but retains the important
information. Since a large number of hidden layers would be required to learn the complex relations
present in the input image, we apply pooling to reduce the feature representation.

Pooling layers reduce the number of parameters when the images are too
large. Spatial pooling, also called subsampling or downsampling, reduces the
dimensionality of each map but retains important information. Spatial pooling can be of
different types:

Max pooling takes the largest element from the rectified feature map. Taking the average of the
elements instead is called average pooling, and summing all elements in the feature map is called sum pooling.
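A minimal sketch of the 2 x 2, stride-2 max pooling step on the 32 x 32 x 12 volume, assuming the TensorFlow Keras API:

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 12))                       # one 32 x 32 x 12 feature volume
pooled = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)(x)
print(pooled.shape)                                         # (1, 16, 16, 12): spatial size halved, depth unchanged
```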

Flattening: The resulting feature maps are flattened into a one-dimensional vector after the
convolution and pooling layers so they can be passed into a fully connected layer for
classification or regression.

Fully Connected Layers: In this layer every neuron in one layer is connected to every neuron
in the next layer. The aim of the fully connected layer is to use the high-level feature maps produced by
the convolution and pooling layers to classify the input image into various classes based on the
training dataset. It takes the input from the previous layer and computes the final classification or
regression output.

Output Layer: The output from the fully connected layers is fed into a logistic function, such as
sigmoid or softmax for classification tasks, which converts the output for each class into a
probability score for that class.
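Putting the layers together, a minimal, hypothetical Keras sketch of such a CNN (the filter counts, dense width, and 10-class output are illustrative, not taken from these notes):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),                        # input layer: 32 x 32 x 3 image
    layers.Conv2D(12, 3, padding="same", activation="relu"),  # convolution + ReLU activation
    layers.MaxPooling2D(2),                                   # pooling: 32 x 32 -> 16 x 16
    layers.Flatten(),                                         # flatten feature maps to a vector
    layers.Dense(64, activation="relu"),                      # fully connected layer
    layers.Dense(10, activation="softmax"),                   # output layer: class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```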
Types of Layers in Neural Network
A neural network is constructed from 3 types of layers:
Input layer — initial data for the neural network.
Hidden layers — intermediate layers between the input and output layer, where all the computation is done.
Output layer — produces the result for the given inputs.

Q. What do you mean by the terms convolution layer, pooling layer, loss layer, dense layer?
Describe each one in brief.

Convolution Layer - The convolutional layer is the core
building block of a CNN, and it is where the majority of computation occurs. It
requires a few components: input data, a filter, and a feature map.
1.Convolution Operation: At the core of a convolutional layer is the convolution
operation. It involves sliding a small filter (also known as a kernel) across the
input data. The filter has learnable parameters that are adjusted during the
training process. At each position of the input data, the filter is applied to a local
region, and a dot product operation is performed between the filter and the
input values within that region.
2.Feature Maps: As the filter slides across input data, it generates feature maps
representing different features or patterns. Multiple filters produce multiple
feature maps capturing various aspects of the input.
3.Shared Weights and Parameter Sharing: Convolutional layers employ
parameter sharing, where the same filter is applied to different regions of the
input. This reduces parameters and encourages learning spatial hierarchies of
features.
4.Activation Function: After convolution, an activation function like ReLU is applied
to introduce non-linearity. ReLU maps negative values to zero, aiding in training
deep neural networks effectively.

5.Padding and Stride: Padding adds zeros around input data to preserve spatial
information at edges. Stride determines the step size of the filter across the input,
controlling the spatial dimensions of the output feature maps.

Pooling Layer
In Convolutional Neural Networks (CNNs), the output feature maps from the
convolutional layers are downsampled by using pooling layers.
The main purpose of pooling is to reduce the size of the feature maps, which in turn makes
computation faster. Pooling layers reduce the number of
parameters while maintaining the most relevant information.
Spatial pooling, also called subsampling or downsampling, reduces the dimensionality
of each map but retains important information. Spatial pooling can be of different types:
 Max Pooling  Average Pooling  Sum Pooling

Max Pooling - Max pooling is a pooling operation that selects the maximum element from the region
of the feature map covered by the filter, so the summary of the features in a region is
represented by the maximum value in that region. It is mostly used when the image has a
dark background, since max pooling will select the brighter pixels.

Min Pooling - In this type of pooling, the summary of the features in a region is represented
by the minimum value in that region. It is mostly used when the image has a light background
since min pooling will select darker pixels.
Average Pooling - In this type of pooling, the summary of the features in a region is
represented by the average value of that region. Average pooling smooths the harsh edges of a
picture and is used when such edges are not important.

Global Pooling - The maximum or average value over the full spatial dimensions of the input
feature map is calculated using global pooling. Global pooling is often used to prepare the data
from a convolutional layer to be utilized in a fully connected layer.
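A tiny NumPy illustration of how each pooling variant summarizes one 2 x 2 region (the values are made up):

```python
import numpy as np

region = np.array([[1, 3],
                   [2, 9]])      # one 2 x 2 region of a feature map (made-up values)

print(region.max())     # 9    -> max pooling keeps the most salient (brightest) activation
print(region.min())     # 1    -> min pooling keeps the darkest value
print(region.mean())    # 3.75 -> average pooling smooths the region
# Global pooling applies max() or mean() over the entire feature map instead of a small window.
```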

Advantages of Pooling Layer:


Dimensionality reduction: The main advantage of pooling layers is that they help in reducing the
spatial dimensions of the feature maps. This reduces the computational cost and also helps in
avoiding overfitting by reducing the number of parameters in the model.

Translation invariance: Pooling layers are also useful in achieving translation invariance in the
feature maps. This means that the position of an object in the image does not affect the
classification result, as the same features are detected regardless of the position of the object.

Feature selection: Pooling layers can also help in selecting the most important features from the
input, as max pooling selects the most salient features and average pooling preserves more
information.

Loss Layer - Loss functions are used in the output layer to calculate the deviation between the predicted
output and the actual output. Depending upon the task, we use different loss functions.
Softmax Loss Function/Cross-Entropy: It is used for measuring model performance. Combined
with softmax, it compares the predicted class probabilities (each in [0, 1]) against the true labels.

Euclidean Loss: It is used in regression problems with real-valued labels in (-∞, ∞).

The loss layer, also known as the cost function or objective function, is a crucial component of a machine
learning model, particularly in supervised learning tasks such as classification or regression. Common loss
functions include:

Mean Absolute Error (L1 Loss)


Mean Squared Error (L2 Loss)
Huber Loss
Cross-Entropy (a.k.a. Log Loss)
Relative Entropy (a.k.a. Kullback–Leibler Divergence)
Squared Hinge
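A small NumPy sketch of some of these losses on made-up predictions and targets:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])

mae = np.mean(np.abs(y_true - y_pred))      # Mean Absolute Error (L1 loss) = 0.333...
mse = np.mean((y_true - y_pred) ** 2)       # Mean Squared / Euclidean (L2) loss = 0.166...

# Cross-entropy for one 3-class example: softmax probabilities vs. a one-hot label
p = np.array([0.7, 0.2, 0.1])               # predicted class probabilities
t = np.array([1.0, 0.0, 0.0])               # true class, one-hot encoded
cross_entropy = -np.sum(t * np.log(p))      # ~0.357

print(mae, mse, cross_entropy)
```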

Dense Layer -

A dense layer, also known as a fully connected layer, is a type of neural network layer where
each neuron is connected to every neuron in the previous layer. Dense layers are fundamental
building blocks in feedforward neural networks, including multilayer perceptrons (MLPs).
Dense layers are crucial for learning complex patterns in data and are commonly used in the
final stages of deep learning models for tasks like classification and regression.

SubSampling - Sub-sampling, often referred to as downsampling or pooling, is a
technique used in neural networks to reduce the spatial dimensions (width and
height) of the input data. This process helps in reducing computational
complexity while preserving the most important features of the input data.
The process of sub-sampling typically occurs after applying convolutional layers
in a Convolutional Neural Network (CNN). It helps in preserving the most
relevant information while reducing the dimensionality of the data.
The sub-sampling layer performs the downsampling operation on the input
maps. This is commonly known as the pooling layer. In this layer, the number of
input and output feature maps does not change: if there are N input maps, then
there will be exactly N output maps. Due to this downsampling operation, the size
of each dimension of the output maps will be reduced depending on the size of the
downsampling mask. For example, if a 2×2 downsampling kernel is used, then each
output dimension will be half of the corresponding input dimension for all the images.

Keras - Keras is one of the most powerful and easy-to-use Python libraries for creating deep
learning models; it is built on top of popular deep learning libraries like TensorFlow and Theano.
Keras is an open-source, high-level neural network library written in Python that is capable
of running on Theano, TensorFlow, or CNTK. It was developed by a Google engineer,
Francois Chollet. It is made user-friendly, extensible, and modular to facilitate faster
experimentation with deep neural networks. It not only supports Convolutional Networks and
Recurrent Networks individually but also their combination.

What makes Keras special?


o Focus on user experience has always been a major part of Keras.
o Large adoption in the industry.
o It is multi-backend and supports multiple platforms, which helps all developers come together
for coding.
o The research community for Keras works closely with the production community.
o All concepts are easy to grasp.
o It supports fast prototyping.
o It seamlessly runs on CPU as well as GPU.
o It provides the freedom to design any architecture, which can later be utilized as an API for
the project.
o It is really very simple to get started with.
o Easy production of models is what actually makes Keras special.

Applications of Keras

 Keras is used for creating deep models which can be productized on smartphones.

 Keras is also used for distributed training of deep learning models.

 Keras is used by companies such as Netflix, Yelp, Uber, etc.

 Keras is also extensively used in deep learning competitions to create and deploy working
models quickly.

Keras is a high-level neural networks API written in Python that works as an interface for
building artificial neural networks. It's known for its simplicity, modularity, and ease of use. Here are some
key features of the Keras framework:

User-Friendly API: Keras provides a simple and intuitive interface that makes it easy to design,
build, and experiment with neural network models. Its user-friendly design is particularly beneficial
for beginners and researchers.

Modularity: Keras enables building neural networks using a modular approach. Neural network
architectures can be constructed by assembling individual layers, allowing for easy
experimentation and customization.

Compatibility with Multiple Backends: Keras is compatible with multiple deep learning backend
engines, including TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK). This flexibility
allows users to choose the backend that best suits their needs

Support for Convolutional and Recurrent Networks: Keras provides built-in support for
building Convolutional Neural Networks (CNNs) for tasks such as image classification and object
detection, as well as Recurrent Neural Networks (RNNs) for tasks such as sequence modeling and
natural language processing.

Extensive Documentation and Community Support: Keras offers comprehensive


documentation, tutorials, and examples, making it easy for users to get started with building
neural network models. Additionally, it has a vibrant community of users and contributors who
provide support and share knowledge through forums, mailing lists, and social media platforms

Ease of Prototyping: Keras allows for rapid prototyping of neural network architectures by
providing a wide range of pre-built layers, activation functions, optimizers, and loss functions. This
enables users to quickly experiment with different configurations and hyperparameters.
Visualization Tools: Keras provides built-in utilities for visualizing neural network architectures,
training/validation curves, and model performance metrics. These visualization tools aid in
understanding and debugging neural network models.

Easy Model Saving and Loading: Keras makes it simple to save trained models to disk and load
them for inference or further training. Models can be saved in various formats, including HDF5 and
JSON, making them compatible with different platforms and environments.
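A minimal sketch of saving and loading, assuming the TensorFlow Keras API; the tiny model and file name are placeholders:

```python
import tensorflow as tf

# A tiny placeholder model just to demonstrate saving and loading
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

model.save("my_model.h5")                             # HDF5 format (newer Keras also offers a native ".keras" format)
restored = tf.keras.models.load_model("my_model.h5")  # reload for inference or further training
architecture_json = model.to_json()                   # architecture only, as a JSON string
```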

Customizability: Keras allows users to define custom layers, loss functions, metrics, and callbacks,
enabling advanced customization and integration of domain-specific requirements into neural
network models.

1X1 Convolution
A problem with deep convolutional neural networks is that the number of feature maps often
increases with the depth of the network. This problem can result in a dramatic increase in the
number of parameters and computation required when larger filter sizes are used, such as 5×5 and
7×7.

To address this problem, a 1×1 convolutional layer can be used that offers a channel-wise pooling,
often called feature map pooling or a projection layer. This simple technique can be used for
dimensionality reduction, decreasing the number of feature maps whilst retaining their salient
features. It can also be used directly to create a one-to-one projection of the feature maps to pool
features across channels or to increase the number of feature maps, such as after traditional
pooling layers.

A filter applied to an input image or feature map always results in a single number.
Systematic application of the filter from left to right and top to bottom creates a two-
dimensional feature map. Each filter produces one corresponding feature map.

The filter must match the depth (number of channels) of the input. Regardless of the input
and filter depth, the output is a single number, creating a feature map with a single
channel.

Concrete examples:

 For a grayscale image (one channel), a 3×3 filter is applied in 3x3x1 blocks.
 For a color image with three channels (red, green, blue), a 3×3 filter is applied in
3x3x3 blocks.
 For a block of feature maps with a depth of 64 from another layer, a 3×3 filter is
applied in 3x3x64 blocks to create the single values for the output feature map.

The depth of the output of one convolutional layer is defined only by the number of parallel
filters applied to the input.

Problem of Too Many Feature Maps


As the depth of a convolutional neural network (CNN) increases, so does the number of
feature maps. This is a common design pattern but can lead to issues:
 Increased Depth: More filters in deeper layers increase the number of feature maps.
 Concatenation: Architectures like Inception concatenate outputs from multiple layers,
increasing input depth for subsequent layers.
 Computational Load: More feature maps require more parameters and
computations, especially with larger filters like 5×5 or 7×7.

Pooling layers reduce spatial dimensions but not the number of feature maps. Thus, a
method to reduce the depth is needed.
Down sample Feature Maps With 1×1 Filters
A 1×1 convolutional layer helps by:
 Dimensionality Reduction: Reducing the number of feature maps while retaining
important features.
 Efficiency: Each 1×1 filter has one weight per input channel, acting like a neuron
across the input feature maps.
 Nonlinearity: Applying nonlinear functions enables complex transformations.
This simple method summarizes input feature maps and allows control over the depth of
feature maps. It can be used to increase or decrease the number of feature maps as
needed, often referred to as a projection layer or channel pooling layer.
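A minimal Keras sketch of this channel-wise projection (the 256-to-64 channel counts are illustrative):

```python
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 256))                   # 256 feature maps from a previous layer
projected = tf.keras.layers.Conv2D(64, kernel_size=1,    # 1x1 filters act as a channel-wise projection
                                   activation="relu")(x)
print(projected.shape)                                   # (1, 28, 28, 64): depth reduced, spatial size kept
```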

What is Inception Net


Inception Modules are used in Convolutional Neural Networks to allow for more efficient
computation and deeper Networks through a dimensionality reduction with stacked 1×1
convolutions. The modules were designed to solve the problem of computational expense,
as well as overfitting, among other issues. The solution, in short, is to take multiple kernel
filter sizes within the CNN, and rather than stacking them sequentially, ordering them to
operate on the same level.

Inception Blocks
Conventional convolutional neural networks typically use convolutional and pooling layers
to extract features from the input data. However, these networks are limited in capturing
local and global features, as they typically focus on either one or the other. The inception
blocks in the InceptionNet architecture are intended to solve the problem of learning a
combination of local and global features from the input data.

Inception blocks address this problem using a modular design that allows the network to
learn a variety of feature maps at different scales. These feature maps are
then concatenated together to form a more comprehensive representation of the input
data. This allows the network to capture a wide range of features, including both low-level
and high-level features, which can be useful for tasks such as image classification.

By using inception blocks, the Inception Net architecture can learn a more comprehensive
set of features from the input data, which can improve the network's performance on tasks
such as image classification.

The "naive" inception module performs convolution on an input with 3 different sizes of
filters (1x1, 3x3, 5x5). Additionally, max pooling is also performed. The outputs are
concatenated and sent to the next inception module.
In an inception network it is often difficult to determine the best filter sizes for your network and whether to
use pooling layers. To overcome this, the inception architecture uses many different filter sizes and pooling
layers in parallel, the outputs of which are concatenated and fed to the next block; in this way the network
effectively chooses which filter sizes or combination to use. To solve the problem of a large computational
cost, the inception network utilises 1x1 convolutions to shrink the volume of the next layer.
As stated before, deep neural networks are computationally expensive. To make them
cheaper, the authors limit the number of input channels by adding an extra 1x1
convolution before the 3x3 and 5x5 convolutions. Though adding an extra operation
may seem counterintuitive, 1x1 convolutions are far cheaper than 5x5
convolutions, and the reduced number of input channels also helps. Do note, however,
that the 1x1 convolution is introduced after the max pooling layer, rather than before.
How does an Inception Module Work?
 An Inception Module is a building block used in the Inception network architecture
for CNNs.
 It improves performance by allowing multiple parallel convolutional filters to be
applied to the input data.
 The basic structure of an Inception Module is a combination of multiple convolutional
filters of different sizes applied in parallel to the input data.
 The filters may have different kernel sizes (e.g. 3x3, 5x5) and/or different strides
(e.g. 1x1, 2x2).
 Output of each filter is concatenated together to form a single output feature map.
 Inception Module also includes a max pooling layer, which takes the maximum value
from a set of non-overlapping regions of the input data.
 This reduces the spatial dimensionality of the data and allows for translation
invariance.
 The use of multiple parallel filters and max pooling layers allows the Inception
Module to extract features at different scales and resolutions, improving the
network's ability to recognize patterns in the input data.
 In summary, the Inception module improves feature extraction, improving the
network's performance.
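A sketch of such a module using the Keras functional API; the branch filter counts are illustrative, and the 1x1 bottleneck convolutions before the 3x3 and 5x5 branches (and after the pooling branch) follow the description above:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """Inception-style block: parallel 1x1, 3x3 and 5x5 convolutions plus max pooling,
    with 1x1 bottleneck convolutions reducing channels before the larger filters."""
    branch1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)

    branch3 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(branch3)

    branch5 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    branch5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(branch5)

    pool = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    pool = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(pool)

    return layers.Concatenate()([branch1, branch3, branch5, pool])  # stack outputs depth-wise

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
print(tf.keras.Model(inputs, outputs).output_shape)   # (None, 28, 28, 256)
```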

Why 1 X 1 Convolutions are Less Expensive?


 1x1 convolutions are less computationally expensive than larger convolutional filters
because they involve fewer parameters.
 Since the kernel size is 1x1, it only has one set of weights, much less than the number
required for larger convolutional filters.
 1x1 convolutions also require less memory to store the weights and less computation
to perform the convolution.
 These smaller kernels are more efficient as they are applied to lower-dimensional
feature maps, which reduces the number of operations and memory required.
 Using 1x1 convolutions also allows for dimensionality reduction, which can help to
reduce the number of parameters in the network and improve performance.
 In summary, 1x1 convolutions are less computationally expensive due to fewer
parameters, fewer memory requirements, and less computation required for
convolution, making them more efficient and suitable for dimensionality reduction.
Dimensionality Reduction - The number of input features, variables, or columns
present in a given dataset is known as dimensionality, and the process to reduce
these features is called dimensionality reduction.

A dimensionality reduction technique can be defined as "a way of converting a
higher-dimensional dataset into a lower-dimensional dataset while ensuring that it
provides similar information." These techniques are widely used in machine
learning for obtaining a better-fitting predictive model while solving classification and
regression problems.

It is commonly used in the fields that deal with high-dimensional data, such as speech
recognition, signal processing, bioinformatics, etc. It can also be used for data
visualization, noise reduction, cluster analysis, etc.

Dimensionality Reduction Techniques


Dimensionality reduction techniques can be broadly divided into two categories:
Feature selection: This refers to retaining the relevant (optimal) features and discarding the
irrelevant ones to ensure the high accuracy of the model. Feature selection methods such as filter,
wrapper, and embedded methods are popularly used.

Feature extraction: This process is also termed feature projection, wherein a multidimensional space
is converted into a space with lower dimensions. Some well-known feature extraction methods include principal
component analysis (PCA), linear discriminant analysis (LDA), kernel PCA (K-PCA), and quadratic
discriminant analysis (QDA).

Methods of Dimensionality Reduction


The various methods used for dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)

Principal Component Analysis (PCA) –

Principal Component Analysis is a statistical process that converts the observations of correlated
features into a set of linearly uncorrelated features with the help of orthogonal transformation.
These new transformed features are called the Principal Components.

PCA works by considering the variance of each attribute, because high variance indicates a
good split between the classes, and hence it reduces the dimensionality. Some real-world
applications of PCA are image processing, movie recommendation systems, and optimizing the
power allocation in various communication channels.

It works on the condition that while the data in a higher dimensional space is mapped to data in a
lower dimension space, the variance of the data in the lower dimensional space should be
maximum.
The main goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a dataset
while preserving the most important patterns or relationships between the variables without any
prior knowledge of the target variables.

It involves the following steps:


 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large fraction
of variance of the original data.

Advantages of Dimensionality Reduction

 It helps in data compression, and hence reduced storage space.

 It reduces computation time.

 It also helps remove redundant features, if any.


Disadvantages of Dimensionality Reduction

 It may lead to some amount of data loss.

 PCA tends to find linear correlations between variables, which is sometimes undesirable.

 PCA fails in cases where mean and covariance are not enough to define datasets.

 In practice, we may not know how many principal components to keep.

The Mathematics Behind PCA


PCA begins with a set of data points, typically represented as a matrix, where rows represent observations, and
columns represent features.

Step 1: Data Standardization


Before performing PCA, it’s essential to standardize the data. This means centering the data by subtracting the mean
and scaling it by dividing by the standard deviation. Standardization ensures that all features have equal importance in
the analysis: Z = (X − μ) / σ.

Step 2: Covariance Matrix

Covariance measures the strength of joint variability between two or more variables, indicating how much they
change in relation to each other. The covariance between two variables measures how they change together. The
covariance matrix for a dataset with n features is an n x n matrix that summarizes the relationships between all pairs
of features.

The value of covariance can be positive, negative, or zero.

 Positive: As x1 increases, x2 also increases.
 Negative: As x1 increases, x2 decreases.
 Zero: No direct relation.

Step 3: Eigenvalue and Eigenvector Calculation

The next step is to compute the eigenvalues and eigenvectors of the covariance matrix. These eigenvalues represent
the amount of variance explained by each eigenvector (principal component). Eigenvalues and eigenvectors are
mathematical concepts related to linear transformations and matrices. In the context of PCA, they play a central role
in identifying the principal components. Here’s what they mean:

 Eigenvalue: An eigenvalue (λ) represents a scalar that indicates how much variance is explained by the
corresponding eigenvector. In PCA, eigenvalues quantify the importance of each principal component.
They are always non-negative, and the eigenvalue corresponding to a principal component measures the
proportion of the total variance in the data explained by that component.

 Eigenvector: An eigenvector (v) is a vector associated with an eigenvalue. In PCA, eigenvectors represent
the directions along which the data varies the most. Each eigenvector points in a specific direction in the
feature space and corresponds to a principal component. Eigenvectors are typically normalized, meaning
their length is 1.
Step 4: Sorting Eigenvalues and Eigenvectors

To identify the most significant principal components, sort the eigenvalues in descending
order. The corresponding eigenvectors are also sorted accordingly. The first principal
component explains the most variance, the second explains the second most, and so on.

Step 5: Selecting Principal Components

Choose a subset of the top k eigenvectors to form a transformation matrix. After computing
the eigenvalues and eigenvectors of the covariance matrix, they are sorted in descending
order based on the magnitude of their eigenvalues. The principal components are then
selected from the top eigenvectors. The first principal component corresponds to the
eigenvector with the largest eigenvalue, the second principal component corresponds to the
eigenvector with the second-largest eigenvalue, and so on. These principal components are
orthogonal, meaning they are uncorrelated. This matrix is used to project the original data
into a lower-dimensional space, resulting in the reduced dataset.
One-hot encoding and label encoding are two commonly used techniques for representing
categorical data in machine learning and data analysis. Both techniques are used to convert
categorical variables into a format that can be provided to machine learning algorithms.

Steps for PCA Algorithm

1. Standardize the data: PCA requires standardized data, so the first step is to standardize the data to
ensure that all variables have a mean of 0 and a standard deviation of 1.

2. Calculate the covariance matrix: The next step is to calculate the covariance matrix of the
standardized data. This matrix shows how each variable is related to every other variable in the
dataset.

3. Calculate the eigenvectors and eigenvalues: The eigenvectors and eigenvalues of the covariance
matrix are then calculated. The eigenvectors represent the directions in which the data varies the
most, while the eigenvalues represent the amount of variation along each eigenvector.

4. Choose the principal components: The principal components are the eigenvectors with the highest
eigenvalues. These components represent the directions in which the data varies the most and are
used to transform the original data into a lower-dimensional space.

5. Transform the data: The final step is to transform the original data into the lower-dimensional
space defined by the principal components.
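A minimal NumPy sketch of these steps on toy data (the random dataset and the choice of k = 2 components are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 5))                          # 100 observations, 5 features (toy data)

# 1. Standardize the data
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix (features x features)
cov = np.cov(Z, rowvar=False)

# 3. Eigenvalues and eigenvectors (eigh, since the covariance matrix is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort by descending eigenvalue and keep the top k eigenvectors (principal components)
order = np.argsort(eigvals)[::-1]
k = 2
components = eigvecs[:, order[:k]]

# 5. Project the standardized data into the lower-dimensional space
X_reduced = Z @ components                        # shape (100, 2)
explained = eigvals[order[:k]] / eigvals.sum()    # fraction of variance explained by each component
print(X_reduced.shape, explained)
```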

Label Encoding:

Label encoding involves assigning a unique integer value to each category in a categorical variable.
Each category is represented by an integer, starting from 0 or 1 and incrementing by 1 for each
subsequent category. For example:

 Category A: 0
 Category B: 1
 Category C: 2

Label encoding doesn't change the dimensionality of the data, as it replaces each category with a
single integer value. However, it introduces ordinality, meaning that the numeric values imply an
order or hierarchy among the categories. This may not always be desirable, especially for nominal
categorical variables where there is no inherent order.

One-Hot Encoding:

One-hot encoding, on the other hand, expands the categorical variable into a binary matrix where
each category is represented by a binary vector. In this encoding scheme, each category is
represented by a vector of length equal to the number of unique categories. The vector has a
value of 1 at the index corresponding to the category and 0 at all other indices. For example:

 Category A: [1, 0, 0]
 Category B: [0, 1, 0]
 Category C: [0, 0, 1]

One-hot encoding increases the dimensionality of the data because each unique category is
represented by its own binary feature. If a categorical variable has n unique categories, one-hot
encoding will result in n new binary features. This can lead to a significant increase in the number
of features, especially for variables with many unique categories. However, one-hot encoding
ensures that the categorical variables are treated as independent binary features, without implying
any ordinal relationship between the categories.
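A small, self-contained sketch of both encodings in plain Python/NumPy, using the categories A, B, C from the examples above:

```python
import numpy as np

categories = ["A", "B", "C", "A"]

# Label encoding: one integer per category (implies an order among categories)
mapping = {cat: idx for idx, cat in enumerate(sorted(set(categories)))}
labels = [mapping[c] for c in categories]     # [0, 1, 2, 0]

# One-hot encoding: one binary column per unique category (no implied order)
one_hot = np.eye(len(mapping))[labels]        # rows: [1,0,0], [0,1,0], [0,0,1], [1,0,0]
print(labels)
print(one_hot)
```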

Q. Break down how a CNN actually operates. The image is downloaded, and the number of filters is
increased as we approach the model output, but why?

1. The image is downloaded:
 When dealing with CNNs, especially in image processing tasks, the images are
typically preprocessed before being fed into the network. This preprocessing might
involve downloading the images from a dataset or an external source.
 Once the images are downloaded, they are usually resized, normalized, and
sometimes augmented to prepare them for input into the neural network. This
preprocessing step ensures that the images are in a consistent format and are
suitable for training or inference.

2. The number of filters is increased as we approach the model output:


 This statement refers to the architecture of the CNN model itself, particularly in the
convolutional layers.
 In the earlier layers of a CNN, the number of filters is usually kept low. These filters
capture low-level features such as edges, textures, and gradients. Since these
features are more local and specific, a small number of filters are sufficient to capture
them.
 As we move deeper into the network, the number of filters typically increases. This is
because deeper layers are responsible for capturing higher-level and more abstract
features. These features are often combinations of lower-level features detected
earlier in the network. Therefore, a larger number of filters are required to capture
this increased complexity and variation in the data.
 By increasing the number of filters in deeper layers, the network can learn more
sophisticated representations of the input data, which helps in making more accurate
predictions.

Transfer Learning –
Transfer learning, used in machine learning, is the reuse of a pre-trained model on a
new problem. In transfer learning, a machine exploits the knowledge gained from a
previous task to improve generalization about another.
Transfer Learning refers to the set of methods that allow transferring knowledge
gained from solving specific problems to address another problem.

Transfer learning is a powerful technique in deep learning that allows us to leverage the
knowledge gained from one task to improve performance on another related task. This is
especially useful in deep learning because training deep neural networks can be
computationally expensive and time-consuming, and also because, if you do not have a large
amount of data, you will not be able to train your model from scratch. By using
transfer learning, we can start with a pretrained model that has already learned general
features that are useful for many different tasks. We can then fine-tune this model on our
target task with less data and less training time.

Transfer learning is a technique in machine learning where a model trained on one task is
used as the starting point for a model on a second task. This can be useful when the second
task is similar to the first task, or when there is limited data available for the second task. By
using the learned features from the first task as a starting point, the model can learn more
quickly and effectively on the second task. This can also help to prevent overfitting, as the
model will have already learned general features that are likely to be useful in the second
task.
How does Transfer Learning work?
This is a general summary of how transfer learning works:

 Pre-trained Model: Start with a model that has previously been trained for a certain
task using a large set of data. Frequently trained on extensive datasets, this model
has identified general features and patterns relevant to numerous related jobs.

 Base Model: The model that has been pre-trained is known as the base model. It is
made up of layers that have utilized the incoming data to learn hierarchical feature
representations.

 Transfer Layers: In the pre-trained model, find a set of layers that capture generic
information relevant to the new task as well as the previous one. Because they tend to
learn low-level, general-purpose features, these layers are frequently found in the earlier
part of the network.

 Fine-tuning: Using the dataset from the new challenge to retrain the chosen layers.
We define this procedure as fine-tuning. The goal is to preserve the knowledge from
the pre-training while enabling the model to modify its parameters to better suit the
demands of the current assignment.
Ways of doing Transfer Learning:

There are two ways of doing transfer learning:

1. Feature extraction : In feature extraction, we keep the weights of the


convolutional layers fixed and add a new fully connected layer at the end of
the network. We then train only the fully connected layer. This method is
useful for image classification tasks when the labels are similar to the labels
in the dataset used to train the pretrained model.

2. Fine Tuning : In fine-tuning, we train the last few convolutional layers of the
pretrained model and then add a new fully connected layer. We also train
the fully connected layer. This method is useful when the labels for our
image classification task are new and not present in the dataset used to train
the pretrained model. We keep the weights of the first few convolutional
layers fixed and only train the last few layers and the fully connected layer.
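A minimal Keras sketch of both approaches, using VGG16 with ImageNet weights as the pretrained base and a hypothetical 5-class target task (downloading the weights requires internet access; the layer choices are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Feature extraction: freeze the pretrained base and train only a new fully connected head
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                                    # keep convolutional weights fixed

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),                # new head for 5 target classes (hypothetical)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Fine-tuning: additionally unfreeze the last few convolutional layers and retrain
# with a small learning rate so the pretrained knowledge is not destroyed
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```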

Advantages of transfer learning:


 Speed up the training process: By using a pre-trained model, the model can learn
more quickly and effectively on the second task, as it already has a good
understanding of the features and patterns in the data.
 Better performance: Transfer learning can lead to better performance on the second
task, as the model can leverage the knowledge it has gained from the first task.
 Handling small datasets: When there is limited data available for the second task,
transfer learning can help to prevent overfitting, as the model will have already
learned general features that are likely to be useful in the second task.
The benefits of transfer learning, particularly in transferring features, are numerous and impactful
in various machine learning and deep learning applications:

1. Effective Feature Extraction:


 Pre-trained models, especially those trained on large-scale datasets like ImageNet,
have learned to extract meaningful and hierarchical representations of input data.
 Transferring these learned features allows the model to capture rich and informative
representations of the input, even for tasks with limited data.
 Features extracted from lower layers of the pre-trained model (e.g., edges, textures)
are often generic and reusable across a wide range of tasks, providing a strong
foundation for subsequent learning.
2. Reduced Need for Task-Specific Data:
 By leveraging transferred features from pre-trained models, the model requires less
task-specific labeled data for training.
 This is particularly advantageous in scenarios where collecting large amounts of
labeled data for a specific task is challenging, time-consuming, or costly.
3. Improved Generalization and Robustness:
 Transferred features from pre-trained models often capture generic and invariant
patterns present in the input data.
 This enhances the model's ability to generalize well to unseen examples and
improves its robustness to variations in the input, such as changes in lighting
conditions, viewpoints, or backgrounds.
4. Efficient Learning and Faster Convergence:
 Utilizing transferred features as initializations for training accelerates the learning
process and leads to faster convergence.
 Instead of starting with randomly initialized weights, the model begins with feature
representations that have already captured relevant information from a large dataset.
This jump-starts the learning process and speeds up convergence towards an
optimal solution.
5. Domain Adaptation and Knowledge Transfer:
 Transfer learning facilitates knowledge transfer between related domains, allowing
models trained on one domain to be adapted to perform well on a different but
related domain.
 For example, features learned from images of natural scenes can be transferred and
adapted for tasks such as satellite imagery analysis or medical image classification.
6. Interpretability and Insights:
 Transferred features often provide insights into the underlying structure of the input
data, making the model's decisions more interpretable.
 Understanding which features are relevant for the task at hand can lead to better
model interpretability and enable practitioners to gain deeper insights into the
problem domain.
Overfitting

When a model performs very well for training data but has poor performance with test data (new data), it is known as
overfitting. In this case, the machine learning model learns the details and noise in the training data such that it
negatively affects the performance of the model on test data. Overfitting can happen due to low bias and high
variance.

Reasons for Overfitting:

1. High variance and low bias.


2. The model is too complex.
3. The size of the training data.
Techniques to Reduce Overfitting
1. Improving the quality of training data reduces overfitting by focusing on meaningful
patterns, mitigating the risk of fitting noise or irrelevant features.
2. Increasing the training data can improve the model’s ability to generalize to unseen data
and reduce the likelihood of overfitting.
3. Reduce model complexity.
4. Early stopping during the training phase (keep an eye on the loss over the training
period; as soon as the validation loss begins to increase, stop training).
5. Ridge regularization and Lasso regularization.
6. Use dropout for neural networks to tackle overfitting, as shown in the sketch below.
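A hedged Keras sketch combining several of these techniques (dropout, ridge/L2 regularization, and early stopping); the layer sizes and the commented fit call with x_train/y_train are illustrative placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),   # ridge (L2) regularization
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dropout(0.5),                                       # dropout to tackle overfitting
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)  # early stopping
# model.fit(x_train, y_train, validation_split=0.2, epochs=50, callbacks=[early_stop])
```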

Underfitting - When a model has not learned the patterns in the training data well and is unable to
generalize well on the new data, it is known as underfitting. An underfit model has poor performance on the
training data and will result in unreliable predictions. Underfitting occurs due to high bias and low variance.
Reasons for Underfitting
1. The model is too simple, so it may not be capable of representing the complexities in the
data.
2. The input features used to train the model are not adequate representations of the
underlying factors influencing the target variable.
3. The size of the training dataset used is not enough.
4. Excessive regularization is used to prevent overfitting, which constrains the model from
capturing the data well.
5. Features are not scaled.
Techniques to Reduce Underfitting
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.
Overfitting
Definition: Overfitting occurs when a model learns the training data too well, including its noise
and details, which negatively impacts its performance on new, unseen data.

Indicators:

1. High training accuracy and low validation accuracy: The model performs very well on training
data but poorly on validation data.
2. Large gap between training and validation loss: The training loss is much lower than the
validation loss.

Example: Consider a CNN trained on a dataset for image classification.

 Training Accuracy: 98%


 Validation Accuracy: 70%
 Training Loss: 0.02
 Validation Loss: 1.2

In this example, the high training accuracy and low validation accuracy, along with the significant
gap between training and validation loss, suggest overfitting. The model memorizes the training
data but fails to generalize to new data.

Underfitting
Definition: Underfitting occurs when a model is too simple to capture the underlying patterns in
the data, leading to poor performance on both training and validation datasets.

Indicators:

1. Low training and validation accuracy: The model performs poorly on both training and validation
data.
2. High training and validation loss: The losses remain high for both training and validation datasets.

Example: Consider a CNN trained on the same dataset.

 Training Accuracy: 60%


 Validation Accuracy: 58%
 Training Loss: 1.5
 Validation Loss: 1.6

In this example, the low accuracy and high loss for both training and validation data indicate that
the model is underfitting. It fails to capture the essential patterns in the data.

What are overfitting and underfitting?

Overfitting occurs when your neural network learns too much from the training
data and fails to generalize to new or unseen data. Underfitting occurs when
your neural network learns too little from the training data and performs
poorly on both the training and the validation data. Both overfitting and
underfitting can reduce your accuracy.

How to detect overfitting and underfitting?


One of the best ways to detect overfitting and underfitting is to monitor the
training and validation loss and accuracy during the training process. If the
training loss is much lower than the validation loss, or the training accuracy is
much higher than the validation accuracy, you may have overfitting. If the
training loss and the validation loss are both high, or the training accuracy and
the validation accuracy are both low, you may have underfitting. You can also
use learning curves and confusion matrices to visualize the performance of
your neural network on different data sets.
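For illustration, a small sketch of this check using hypothetical loss curves (the values and thresholds are arbitrary and only for demonstration):

```python
# Hypothetical per-epoch curves, e.g. from history = model.fit(..., validation_split=0.2)
train_loss = [1.8, 1.1, 0.6, 0.3, 0.1, 0.05]
val_loss   = [1.9, 1.3, 1.0, 1.0, 1.1, 1.20]

gap = val_loss[-1] - train_loss[-1]
if gap > 0.5:                                        # training loss far below validation loss
    print("Likely overfitting")
elif train_loss[-1] > 1.0 and val_loss[-1] > 1.0:    # both losses stay high
    print("Likely underfitting")
else:
    print("Reasonable fit")
```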

How to prevent overfitting?


There are many techniques to prevent overfitting, such as data augmentation and regularization.
Data augmentation involves applying random transformations to the training data, like cropping,
flipping, or scaling. Regularization adds a penalty term to the loss function to reduce the
complexity and magnitude of the weights and biases of the neural network. Early stopping is also
an effective technique; it involves stopping the training process when the validation loss stops
decreasing or starts increasing, or when the validation accuracy stops increasing or starts
decreasing. These methods can help reduce the risk of overfitting and ensure that the neural
network does not learn spurious patterns from the training data.

How to fix underfitting?


Underfitting can be fixed with several techniques, such as data quality, model complexity, and
hyperparameter tuning. Data quality involves checking and improving the quality of the training
data, like removing outliers, missing values, duplicates, and errors. Model complexity involves
increasing the complexity and capacity of the neural network by adding more layers, neurons, and
features. Lastly, hyperparameter tuning requires finding the optimal values for the
hyperparameters of the neural network, like learning rate, batch size, activation function, and
optimizer. All of these techniques can enhance the reliability and relevance of the training data as
well as improve the efficiency and effectiveness of the learning process.
