Deep Learning - AD3501 - Important Questions and 2 Marks with Answers - Unit 1

UNIT I

DEEP NETWORKS BASICS

Linear Algebra: Scalars -- Vectors -- Matrices and tensors; Probability Distributions -- Gradient-based Optimization -- Machine Learning Basics: Capacity -- Overfitting and underfitting -- Hyperparameters and validation sets -- Estimators -- Bias and variance -- Stochastic gradient descent -- Challenges motivating deep learning; Deep Networks: Deep feedforward networks; Regularization -- Optimization.

Two Marks – Part A


1. What is Deep Learning?
Deep learning is a branch of machine learning based on algorithms inspired by the structure and function of the brain, called artificial neural networks. In the mid-1960s, Alexey Grigorevich Ivakhnenko published the first general, working learning algorithm for deep networks. Deep learning is well suited to a range of fields such as computer vision, speech recognition, and natural language processing.

2. What are the main differences between AI, Machine Learning, and Deep Learning?
AI stands for Artificial Intelligence. It is a technique which enables machines to mimic
human behavior.
Machine Learning is a subset of AI which uses statistical methods to enable machines
to improve with experience.

Deep learning is a part of Machine learning, which makes the computation of multi-
layer neural networks feasible. It takes advantage of neural networks to simulate
human-like decision making.
3. Differentiate supervised and unsupervised deep learning procedures.
Supervised learning is a system in which both input and desired output data are
provided. Input and output data are labeled to provide a learning basis for future data
processing.
Unsupervised learning does not require explicitly labeled data; its operations can be
carried out without labels. The most common unsupervised learning method is cluster
analysis, which is used for exploratory data analysis to find hidden patterns or
groupings in data.


4. What are the applications of deep learning?


There are various applications of deep learning:
Computer vision
Natural language processing and pattern recognition
Image recognition and processing
Machine translation
Sentiment analysis
Question answering system
Object Classification and Detection
Automatic Handwriting Generation
Automatic Text Generation.

5. What are scalars and vectors?


A scalar is just a single number, in contrast to most of the other objects studied in
linear algebra, which are usually arrays of multiple numbers. A vector is an array of
numbers arranged in order, where each element is identified by its index.

6. What are matrices and tensors?


Matrices: A matrix is a 2D array of numbers, so each element is identified by two
subscripts instead of just one. We usually give matrices uppercase variable names with
bold characters, such as A.

We usually identify the elements of a matrix by using its name in italics but not in bold,
and the subscripts are listed with separating commas.

Tensors: In some cases, we’ll need an array with more than two axes. In the general
case, an array of numbers arranged on a regular grid with a varying number of axes is
called a tensor. We note a tensor named “A” with this font: A.
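The distinction is easy to see in code. Below is a minimal NumPy sketch (NumPy and the variable names are our own illustration, not part of the syllabus) showing a scalar, a vector, a matrix, and a 3D tensor, distinguished by their number of axes:

```python
import numpy as np

s = np.float64(5.0)              # scalar: a single number, 0 axes
v = np.array([1.0, 2.0, 3.0])    # vector: 1D array, elements indexed as v[i]
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # matrix: 2D array, elements indexed as A[i, j]
T = np.zeros((2, 3, 4))          # tensor: array with more than two axes

print(v.ndim, A.ndim, T.ndim)    # 1 2 3 -- the number of axes grows
print(A[0, 1])                   # element identified by two subscripts: 2.0
```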

7. Why is probability important in deep learning?


Probability is the science of quantifying uncertain things. Most machine learning and
deep learning systems utilize a lot of data to learn about patterns in the data. Whenever
data is used in a system rather than pure logic, uncertainty grows, and whenever
uncertainty grows, probability becomes relevant.

By introducing probability to a deep learning system, we introduce common sense to
the system; otherwise the system would be very brittle and not useful. In deep
learning, several models, such as Bayesian models, probabilistic graphical models, and
hidden Markov models, depend entirely on probability concepts.

8. Define Random Variable.


A random variable is a variable that can take on different values randomly. We typically
denote the random variable itself with a lower case letter in plain typeface, and the
values it can take on with lower case script letters. For example, x1 and x2 are both
possible values that the random variable x can take on.

9. Are random variables discrete or continuous?


Random variables may be discrete or continuous. A discrete random variable is one that
has a finite or countably infinite number of states. Note that these states are not
necessarily the integers; they can also just be named states that are not considered to
have any numerical value. A continuous random variable is associated with a real value.

10. What are probability distributions?


A probability distribution is a description of how likely a random variable or set of
random variables is to take on each of its possible states. The way we describe
probability distributions depends on whether the variables are discrete or continuous.

11. Define probability mass function.


A probability distribution over discrete variables may be described using a probability
mass function (PMF). We typically denote probability mass functions with a capital P.
Often we associate each random variable with a different probability mass function and
the reader must infer which probability mass function to use based on the identity of the
random variable, rather than the name of the function; P(x) is usually not the same as
P(y).

12. List the properties that a probability mass function satisfies.


• The domain of P must be the set of all possible states of x.

• ∀x ∈ x, 0 ≤ P(x) ≤ 1. An impossible event has probability 0, and no state can be less
probable than that. Likewise, an event that is guaranteed to happen has probability 1,
and no state can have a greater chance of occurring.

• ∑x∈x P(x) = 1. We refer to this property as being normalized. Without this property, we
could obtain probabilities greater than one by computing the probability of one of many
events occurring.
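As a quick illustration, this sketch (the distribution and its states are our own toy example) checks all three PMF properties for a small discrete distribution:

```python
# A PMF for a discrete variable with three named states.
pmf = {"low": 0.2, "medium": 0.5, "high": 0.3}

# Property 1: the domain covers all possible states (here, by construction).
# Property 2: every probability lies in [0, 1].
assert all(0.0 <= p <= 1.0 for p in pmf.values())

# Property 3: the probabilities are normalized, i.e. they sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

print("valid PMF over states:", list(pmf))
```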

13. List the properties that a probability density function satisfies.


When working with continuous random variables, we describe probability distributions
using a probability density function (PDF) rather than a probability mass function.

To be a probability density function, a function p must satisfy the following properties:


• The domain of p must be the set of all possible states of x.
• ∀x ∈ x, p(x) ≥ 0. Note that we do not require p(x) ≤ 1.
• ∫ p(x) dx = 1.
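A minimal numeric sketch, assuming a uniform density on [0, 0.5] (our own example), shows why p(x) ≤ 1 is not required and how the integral property can be checked:

```python
import numpy as np

# Uniform density on [0, 0.5]: p(x) = 2 there, 0 elsewhere.
# Note that p(x) = 2 > 1 is perfectly legal for a density.
xs = np.linspace(0.0, 0.5, 10_001)
p = np.full_like(xs, 2.0)

assert (p >= 0).all()            # non-negative everywhere
dx = xs[1] - xs[0]
integral = np.sum(p[:-1]) * dx   # left Riemann sum over the domain
print(integral)                  # ~1.0
```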

14. What is a gradient-based optimizer?


Gradient descent is an optimization algorithm that’s used when training deep learning
models. It’s based on a convex function and updates its parameters iteratively to
minimize a given function to its local minimum.


The parameter update rule is:

θj := θj − α · ∂J(θ)/∂θj

In the above formula,
 α is the learning rate,
 J is the cost function, and
 θ is the parameter to be updated.
As the formula shows, the gradient term is the partial derivative of the cost function
J with respect to θj.
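A minimal sketch of this update rule, assuming a simple quadratic cost J(θ) = (θ − 3)² (the cost function and values are our own illustration):

```python
# Gradient descent on J(theta) = (theta - 3)^2, whose minimum is at theta = 3.
def grad_J(theta):
    return 2.0 * (theta - 3.0)              # dJ/dtheta

theta = 0.0                                 # initial parameter
alpha = 0.1                                 # learning rate
for _ in range(100):
    theta = theta - alpha * grad_J(theta)   # theta := theta - alpha * dJ/dtheta

print(theta)  # ~3.0, the minimum of the cost function
```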

15. What are overfitting and underfitting in ML?


The factors determining how well an ML algorithm will perform are its ability to:
1. Make the training error small
2. Make the gap between training and test errors small

These correspond to two central ML challenges:

Underfitting - the inability to obtain a low enough error rate on the training set
Overfitting - the gap between training error and testing error is too large

We can control whether a model is more likely to overfit or underfit by altering its
capacity.

16. What is the capacity of a model?


Model capacity is the ability to fit a wide variety of functions.
– A model with low capacity struggles to fit the training set.
– A high-capacity model can overfit by memorizing properties of the training set that
are not useful on the test set.
When a model has higher capacity, it tends to overfit. One way to control the capacity
of a learning algorithm is by choosing its hypothesis space, i.e., the set of functions
that the learning algorithm is allowed to select as being the solution.

17. How to control the capacity of a learning algorithm?


One way to control the capacity of a learning algorithm is by choosing its hypothesis
space, the set of functions that the learning algorithm is allowed to select as being the
solution. For example, the linear regression algorithm has the set of all linear functions
of its input as its hypothesis space. We can generalize linear regression to include
polynomials, rather than just linear functions, in its hypothesis space. Doing so increases
the model’s capacity
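A small sketch (using NumPy's polynomial fitting as our own illustration) of how enlarging the hypothesis space from linear to polynomial functions increases capacity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = x**2 + 0.05 * rng.standard_normal(20)   # quadratic data with noise

# Hypothesis space of linear functions only (low capacity).
linear_fit = np.polyfit(x, y, deg=1)
# Enlarged hypothesis space: degree-5 polynomials (higher capacity).
poly_fit = np.polyfit(x, y, deg=5)

for deg, coeffs in [(1, linear_fit), (5, poly_fit)]:
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {deg}: training MSE = {train_mse:.4f}")
# The larger hypothesis space always fits the training data at least as well;
# with a high enough degree it starts fitting the noise, i.e. overfitting.
```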

18. Define Bayes error.


The ideal model is an oracle that knows the true probability distribution that generates
the data. Even such a model incurs some error, due to noise or overlap in the
distributions. The error incurred by an oracle making predictions from the true
distribution p(x, y) is called the Bayes error.

19. Why are hyperparameters used in ML?


Most ML algorithms have hyperparameters:
– We can use them to control the algorithm's behavior.
– The values of hyperparameters are not adapted by the learning algorithm itself.
However, we can design nested learning, where one learning algorithm learns the best
hyperparameters for another learning algorithm.


20. How to solve the overfitting problem caused by learning hyperparameters on the
training dataset?

To solve the problem, we use a validation set:
– examples that the training algorithm does not observe.
Test examples should not be used to make choices about the model hyperparameters.
Instead, the training data is split into two disjoint parts:
– the first is used to learn the parameters;
– the other is the validation set, used to estimate the generalization error during or
after training, allowing the hyperparameters to be updated.
Typically, 80% of the training data is used for training and 20% for validation.
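A minimal sketch of the 80/20 split in plain NumPy (the toy data is our own illustration; scikit-learn's train_test_split does the same job):

```python
import numpy as np

X = np.arange(100).reshape(50, 2)   # 50 examples, 2 features (toy data)
y = np.arange(50)

rng = np.random.default_rng(42)
idx = rng.permutation(len(X))       # shuffle before splitting

split = int(0.8 * len(X))           # 80% train, 20% validation
train_idx, val_idx = idx[:split], idx[split:]

X_train, y_train = X[train_idx], y[train_idx]   # used to learn parameters
X_val, y_val = X[val_idx], y[val_idx]           # used to estimate generalization error

print(len(X_train), len(X_val))     # 40 10
```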

21. What are point estimators?


Point estimators are functions that are used to find an approximate value of a population
parameter from random samples of the population. They use the sample data of a population
to calculate a point estimate or a statistic that serves as the best estimate of an
unknown parameter of a population.

Most often, the existing methods of finding the parameters of large populations are
unrealistic. For example, when finding the average age of kids attending kindergarten, it will
be impossible to collect the exact age of every kindergarten kid in the world. Instead, a
statistician can use the point estimator to make an estimate of the population parameter.

22. List the characteristics or properties of point estimators.


The following are the main characteristics of point estimators:
1. Bias
The bias of a point estimator is defined as the difference between the expected value of the
estimator and the value of the parameter being estimated. When the expected value of the
estimator and the value of the parameter being estimated are equal, the estimator is
considered unbiased.
Also, the closer the expected value of the estimator is to the value of the parameter being
measured, the smaller the bias is.
2. Consistency


Consistency tells us how close the point estimator stays to the value of the parameter as
the sample size increases. The point estimator requires a large sample size for it to be more
consistent and accurate.
You can also check whether a point estimator is consistent by looking at its expected value
and variance. For the point estimator to be consistent, the expected value should move
toward the true value of the parameter as the sample size grows.
3. Most efficient or unbiased
The most efficient point estimator is the one with the smallest variance among all the
unbiased and consistent estimators. Variance measures the level of dispersion of the
estimate; the estimator with the smallest variance varies the least from one sample to
another.
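A quick simulation sketch (our own toy example) estimating the bias and variance of the sample mean as a point estimator of a population mean:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 5.0

# Draw many samples; compute the sample mean (the point estimator) for each.
estimates = np.array([
    rng.normal(true_mean, 2.0, size=100).mean()
    for _ in range(10_000)
])

bias = estimates.mean() - true_mean   # expected estimate minus true parameter
variance = estimates.var()            # dispersion across samples

print(f"bias ~ {bias:.4f}")           # ~0: the sample mean is unbiased
print(f"variance ~ {variance:.4f}")   # ~0.04 here; shrinks as sample size grows
```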

23. Define Stochastic Gradient Descent with merits and demerits.


Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm used for
optimizing machine learning models. In this variant, only one random training example is
used to calculate the gradient and update the parameters at each iteration. Here are some of
the advantages and disadvantages of using SGD:
Advantages of Stochastic Gradient Descent
Speed: SGD is faster than other variants of Gradient Descent such as Batch Gradient
Descent and Mini-Batch Gradient Descent, since it uses only one example to update the
parameters.
Memory Efficiency: Since SGD updates the parameters for each training example one
at a time, it is memory-efficient and can handle large datasets that cannot fit into
memory.
Avoidance of Local Minima: Due to the noisy updates in SGD, it has the ability to
escape from local minima and converge toward the global minimum.

Disadvantages of Stochastic Gradient Descent


Noisy updates: The updates in SGD are noisy and have a high variance, which can
make the optimization process less stable and lead to oscillations around the minimum.

Slow Convergence: SGD may require more iterations to converge to the minimum
since it updates the parameters for each training example one at a time.

Sensitivity to Learning Rate: The choice of learning rate can be critical in SGD since
using a high learning rate can cause the algorithm to overshoot the minimum, while a
low learning rate can make the algorithm converge slowly.


Less Accurate: Due to the noisy updates, SGD may not converge to the exact global
minimum and can result in a suboptimal solution. This can be mitigated by using
techniques such as learning rate scheduling and momentum-based updates.
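A minimal SGD sketch for linear regression (the toy data and names are our own illustration), updating the parameters from one randomly chosen example per step:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))          # 200 examples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(200)

w = np.zeros(3)
alpha = 0.05                               # learning rate
for step in range(5_000):
    i = rng.integers(len(X))               # pick ONE random training example
    err = X[i] @ w - y[i]
    grad = err * X[i]                      # gradient of (1/2)(x_i . w - y_i)^2
    w -= alpha * grad                      # noisy single-example update

print(np.round(w, 2))                      # ~[1.5, -2.0, 0.5]
```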

24. What is a deep feedforward network?


In a feedforward network, information moves only in the forward direction: from the
input layer, through the hidden layers (if they exist), to the output layer. There are no
cycles or loops in the network. Feedforward neural networks are sometimes
ambiguously called multilayer perceptrons.

25. What is the working principle of a feedforward neural network?

When a feedforward neural network is simplified, it can appear as a single-layer
perceptron. This model multiplies the inputs by weights as they enter the layer. The
weighted input values are then added together to get the sum. If the sum of the values
is above a certain threshold, set at zero, the output value is usually 1, while if it falls
below the threshold, it is usually -1.
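A sketch of that principle, assuming a sign threshold at zero (the weights and inputs are our own illustration):

```python
import numpy as np

def perceptron_forward(x, w, b):
    """Single-layer perceptron: weighted sum of inputs, then threshold at zero."""
    s = np.dot(w, x) + b         # multiply inputs by weights and sum
    return 1 if s > 0 else -1    # above threshold -> 1, below -> -1

w = np.array([0.7, -0.3, 0.5])   # illustrative weights
b = 0.1                          # bias term
x = np.array([1.0, 2.0, 0.5])    # one input example

print(perceptron_forward(x, w, b))  # 1, since 0.7 - 0.6 + 0.25 + 0.1 > 0
```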

26. What are the layers of a feedforward neural network?

A feedforward neural network is made up of three types of layers: an input layer that
receives the data, one or more hidden layers that transform it, and an output layer that
produces the final prediction.

27. Brief on the classification of activation functions.


An activation function can be classified into three major categories: sigmoid, tanh, and
Rectified Linear Unit (ReLU).
 Sigmoid:

Input values get mapped to output values between 0 and 1.

 Tanh:

Input values get mapped to output values between -1 and 1.


 Rectified Linear Unit:

Only positive values are allowed to flow through this function. Negative values get
mapped to 0.
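These three functions in a short NumPy sketch (our own illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output squashed into (0, 1)

def tanh(x):
    return np.tanh(x)                 # output squashed into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # negatives mapped to 0, positives pass through

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # [0.119 0.5   0.881]
print(tanh(x))      # [-0.964  0.     0.964]
print(relu(x))      # [0. 0. 2.]
```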
28. What is Regularization?
Regularization is a technique used in machine learning and deep learning to prevent
overfitting and improve the generalization performance of a model. It involves adding a
penalty term to the loss function during training. This penalty discourages the model
from becoming too complex or having large parameter values, which helps in
controlling the model’s ability to fit noise in the training data. Regularization methods
include L1 and L2 regularization, dropout, early stopping, and more.
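A sketch of the idea for L2 regularization, where a penalty proportional to the squared weights is added to the loss (the function names and values are our own illustration):

```python
import numpy as np

def loss_with_l2(w, X, y, lam):
    """Mean squared error plus an L2 penalty on the weights."""
    mse = np.mean((X @ w - y) ** 2)   # data-fitting term
    penalty = lam * np.sum(w ** 2)    # discourages large parameter values
    return mse + penalty

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])

print(loss_with_l2(w, X, y, lam=0.0))   # unregularized loss
print(loss_with_l2(w, X, y, lam=0.1))   # larger: penalty term added
```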

29. What is dropout in neural network?


Dropout is a regularization technique used in neural networks to prevent overfitting.
During training, a random subset of neurons is “dropped out” by setting their outputs to
zero with a certain probability. This forces the network to learn more robust and
independent features, as it cannot rely on specific neurons. Dropout improves
generalization and reduces the risk of overfitting.
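A sketch of (inverted) dropout during training, assuming a drop probability of 0.5 (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Zero out each activation with probability p_drop during training."""
    if not training:
        return activations                       # dropout is disabled at test time
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)   # rescale to keep the expected value

h = np.array([0.2, 1.5, -0.7, 0.9, 0.4])         # outputs of a hidden layer
print(dropout(h))                   # a random subset is set to zero
print(dropout(h, training=False))   # unchanged at inference
```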

30. Difference between regularization and optimization.


The main conceptual difference is that optimization is about finding the set of
parameters/weights that maximizes or minimizes some objective function (which can
also include a regularization term), while regularization is about limiting the values
that the parameters can take during optimization/learning/training. Optimization with
regularization (especially L1 and L2 regularization) can therefore be thought of as
constrained optimization; in some cases, such as dropout, regularization can also be
thought of as a way of introducing noise into the training process.


31. How does splitting a dataset into train, dev and test sets help identify overfitting?
• Overfitting: the model fits the training set so much that it does not generalize well.
• Low training error and high dev error can be used to identify this
• Must ensure that the distribution of train and dev is the same/similar!

PART B
1. Develop short notes on the following with respect to deep learning, with
examples.
i) Scalars and Vectors. (6)
ii) Matrices. (7)
2. Explicate the Probability Mass Function and Probability Density Function. (13)
3. Describe gradient-based optimization in deep learning.
4. Explain in detail the linear regression machine learning algorithm. (13)
5. Describe Stochastic Gradient Descent in detail. (13)
6. Explain in detail the different regularization techniques in deep learning. (13)
7. Brief on how regularization helps reduce overfitting. (13)
8. Analyse and write short notes on dataset augmentation. (13)
9. Point out and explain the different sets of layers in feedforward networks.
10. Describe deep feedforward networks with a neat diagram. (13)

PART C
1. Assess the following with respect to deep learning, with examples.
i) Random Variables. (6)
ii) Probability. (7)
2. Explain briefly Estimators, Bias, and Variance, and how they relate to generalization,
underfitting, and overfitting.
3. Briefly explain an example of a fully functioning feedforward network on a simple
task.
4. Assess the difference between linear models and neural networks. (15)
