Deep Learning - AD3501 - Important Question and 2 Marks With Answers - Unit 1
www.BrainKart.com
UNIT I
2. What are the main differences between AI, Machine Learning, and Deep Learning?
AI stands for Artificial Intelligence. It is the broad field of techniques that enable
machines to mimic human behavior.
Machine Learning is a subset of AI that uses statistical methods to enable machines
to improve with experience.
Deep Learning is a subset of Machine Learning that uses multi-layer neural networks
to learn from data and to simulate human-like decision making.
3. Differentiate supervised and unsupervised deep learning procedures.
Supervised learning is a procedure in which both the input and the desired output data
are provided. The inputs are labeled with their outputs, which gives the model a basis
for processing future data.
Unsupervised learning does not need explicit label information; it operates on the
input data alone. The most common unsupervised learning method is cluster analysis,
which is used in exploratory data analysis to find hidden patterns or groupings in data.
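As a rough illustration of cluster analysis, the sketch below groups unlabeled one-dimensional points into two clusters with a basic k-means loop. The function name `two_means_1d`, the sample data, and the choice of initial centers (the smallest and largest point) are assumptions made for this example.

```python
def two_means_1d(points, iterations=10):
    """Group unlabeled 1-D points into two clusters (basic k-means, k = 2)."""
    # Assumption: start the two centers at the smallest and largest point.
    centers = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to its nearest center; no labels are used.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of the points assigned to it.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # hypothetical unlabeled measurements
centers, clusters = two_means_1d(data)
print(centers)
print(clusters)
```

A supervised procedure would instead receive a label with every point and learn to predict that label for new inputs.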
https://play.google.com/store/apps/details?id=info.therithal.brainkart.annauniversitynotes
We usually identify the elements of a matrix by using its name in italics but not in bold,
with the subscripts listed and separated by commas. For example, A_{1,1} is the
upper-left entry of the matrix A.
Tensors: In some cases, we'll need an array with more than two axes. In the general
case, an array of numbers arranged on a regular grid with a variable number of axes is
called a tensor. We denote a tensor named "A" with a distinct bold sans-serif typeface,
and we identify its element at coordinates (i, j, k) by writing A_{i,j,k}.
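To make the number of axes concrete, here is a small sketch that represents a scalar, a vector, a matrix, and a three-axis tensor as nested Python lists. The helper `shape` and all the values are hypothetical, and the helper assumes a regular, non-empty grid.

```python
def shape(array):
    """Return the size of each axis of a regular, non-empty nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)


scalar = 3.5                       # zero axes
vector = [1.0, 2.0, 3.0]           # one axis
matrix = [[1, 2, 3],
          [4, 5, 6]]               # two axes: 2 rows, 3 columns
tensor = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]],
          [[9, 10], [11, 12]]]     # three axes

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    print(name, shape(value))

# Python indices start at 0, so matrix[0][2] is the entry written A_{1,3}.
print(matrix[0][2])
```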
Random variables may be discrete or continuous. A discrete random variable is one that
has a finite or countably infinite number of states. Note that these states are not
necessarily the integers; they can also just be named states that are not considered to
have any numerical value. A continuous random variable is associated with a real value.
• $\forall x \in \mathrm{x},\ 0 \le P(x) \le 1$. An impossible event has probability 0 and no state can be less
probable than that. Likewise, an event that is guaranteed to happen has probability 1,
and no state can have a greater chance of occurring.
• $\sum_{x \in \mathrm{x}} P(x) = 1$. We refer to this property as being normalized. Without this property, we
could obtain probabilities greater than one by computing the probability of one of many
events occurring.
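The two properties above can be checked mechanically. The sketch below stores a distribution over discrete states as a dictionary and tests both conditions; the function name `is_valid_pmf`, the tolerance, and the example distributions are assumptions for illustration.

```python
import math


def is_valid_pmf(pmf, tol=1e-9):
    """Check that every P(x) lies in [0, 1] and that the values sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    normalized = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and normalized


fair_die = {face: 1 / 6 for face in range(1, 7)}        # integer states
weather = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}   # named states
not_normalized = {"a": 0.7, "b": 0.6}                   # sums to 1.3

print(is_valid_pmf(fair_die))
print(is_valid_pmf(weather))
print(is_valid_pmf(not_normalized))
```

The `weather` example uses named states with no numerical value, which is still a valid discrete random variable.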
We can control whether a model is more likely to overfit or underfit by altering its
capacity.
Most often, measuring the parameters of a large population exactly is unrealistic. For
example, when finding the average age of kids attending kindergarten, it is impractical
to collect the exact age of every kindergarten kid in the world. Instead, a statistician can
use a point estimator, computed from a sample, to estimate the population parameter.
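A minimal sketch of this idea follows, using the sample mean as the point estimator. The simulated population, its age range, and the sample size are all hypothetical; in practice the population mean would not be available for comparison.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: ages in years of 100,000 kindergarten children.
population = [random.uniform(3.0, 6.0) for _ in range(100_000)]

# Collect ages from a small random sample instead of the whole population.
sample = random.sample(population, 200)

estimate = statistics.mean(sample)        # point estimate of the mean age
true_mean = statistics.mean(population)   # unknown in a real study

print(f"estimate from sample: {estimate:.3f}")
print(f"population mean:      {true_mean:.3f}")
```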
Consistency tells us how close the point estimator stays to the value of the parameter as
the sample increases in size. The point estimator requires a large sample size to be more
consistent and accurate.
You can also check whether a point estimator is consistent by looking at its expected
value and variance. For the point estimator to be consistent, the expected value should
move toward the true value of the parameter, and the variance should shrink toward zero,
as the sample size grows.
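The simulation below is one way to see consistency in action: it repeats the sample-mean estimate many times at two sample sizes and compares the spread. The normal population with mean 5.0 and standard deviation 2.0, the trial count, and the function name `sample_mean_spread` are assumptions for this sketch.

```python
import random
import statistics


def sample_mean_spread(n, trials=300, seed=0):
    """Return (average, variance) of the sample mean over repeated samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(5.0, 2.0) for _ in range(n)]  # true mean is 5.0
        means.append(statistics.mean(sample))
    return statistics.mean(means), statistics.variance(means)


avg_small, var_small = sample_mean_spread(10)
avg_large, var_large = sample_mean_spread(1000)

print(f"n=10:   average {avg_small:.3f}, variance {var_small:.5f}")
print(f"n=1000: average {avg_large:.3f}, variance {var_large:.5f}")
```

The variance of the estimates at the larger sample size should be much smaller, while their average stays near the true mean.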
3. Most efficient or unbiased
The most efficient point estimator is the one with the smallest variance among all
unbiased and consistent estimators. The variance measures the dispersion of the
estimates around their expected value, so the estimator with the smallest variance
varies the least from one sample to the next.
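As a hedged illustration of efficiency, the sketch below compares two estimators of the center of a normal distribution, the sample mean and the sample median. Both are unbiased in this setting, and the simulation estimates which one varies less across samples. The distribution, sample size, trial count, and function name are assumptions chosen for the example.

```python
import random
import statistics


def estimator_variances(n=25, trials=2000, seed=1):
    """Estimate the variance of the sample mean and sample median over many samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true center is 0.0
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = estimator_variances()
print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
```

For normally distributed data the sample mean is the more efficient of the two, so its variance should come out smaller.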
Slow Convergence: SGD may require more iterations to converge to the minimum
since it updates the parameters for each training example one at a time.
Sensitivity to Learning Rate: The choice of learning rate can be critical in SGD since
using a high learning rate can cause the algorithm to overshoot the minimum, while a
low learning rate can make the algorithm converge slowly.
Less Accurate: Due to the noisy updates, SGD may not converge to the exact global
minimum and can settle on a suboptimal solution. This can be mitigated by using
techniques such as learning rate scheduling and momentum-based updates.
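The sensitivity to the learning rate can be seen even without noise. The sketch below runs plain gradient descent, used here as a deterministic stand-in for SGD, on the assumed toy function f(w) = w^2, whose minimum is at w = 0. The function name, step count, and the three learning rates are illustrative choices.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w


for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr}: final w = {gradient_descent(lr):.6g}")
```

With the smallest rate, w has barely moved from its start after 50 steps; with the middle rate it is close to the minimum; with the largest rate each step overshoots the minimum by more than the last, so w grows instead of shrinking.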
In its simplest form, a feedforward neural network reduces to a single-layer
perceptron.
This model multiplies the inputs by weights as they enter the layer and then adds the
weighted inputs together. If the sum rises above a certain threshold, usually set at
zero, the output is typically 1; if it falls below the threshold, the output is
typically -1.
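The computation described above can be sketched in a few lines. The function name `perceptron_output` and the example weights are hypothetical, and mapping a sum that lands exactly on the threshold to -1 is an assumption, since the text only covers the cases above and below it.

```python
def perceptron_output(inputs, weights, threshold=0.0):
    """Single-layer perceptron: weighted sum followed by a threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Assumption: a sum exactly equal to the threshold is mapped to -1.
    return 1 if weighted_sum > threshold else -1


print(perceptron_output([1.0, 2.0], [0.5, 0.5]))    # weighted sum is 1.5
print(perceptron_output([1.0, 2.0], [0.5, -1.0]))   # weighted sum is -1.5
```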
Tanh:
This function maps its input to a value between -1 and 1 and is centered at zero.
ReLU:
Only positive values are allowed to flow through this function. Negative values get
mapped to 0.
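The rule of passing positive values through and mapping negative values to 0 is the rectified linear unit (ReLU). The sketch below contrasts it with tanh, which squashes any input into the range (-1, 1). The function names and sample inputs are chosen for illustration.

```python
import math


def relu(x):
    """Pass positive values through unchanged; map negative values to 0."""
    return max(0.0, x)


def tanh(x):
    """Squash any real input into the open interval (-1, 1)."""
    return math.tanh(x)


for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x = {value:5.1f}   relu = {relu(value):4.1f}   tanh = {tanh(value):7.4f}")
```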
28. What is Regularization?
Regularization is a technique used in machine learning and deep learning to prevent
overfitting and improve the generalization performance of a model. It commonly involves
adding a penalty term to the loss function during training. This penalty discourages the
model from becoming too complex or having large parameter values, which limits the
model's ability to fit noise in the training data. Regularization methods include L1 and
L2 regularization, which add such penalties, as well as dropout, early stopping, and more.
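A minimal sketch of a penalty term added to a loss follows, assuming mean squared error as the base loss and a hypothetical regularization strength `lam`. The function names and numbers are illustrative only.

```python
def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


def regularized_loss(predictions, targets, weights, lam=0.01):
    """Base loss plus an L2 penalty that grows with the size of the weights."""
    return mse(predictions, targets) + l2_penalty(weights, lam)


predictions, targets = [1.0, 2.0], [1.0, 3.0]
weights = [3.0, -4.0]

print(mse(predictions, targets))
print(l2_penalty(weights, 0.01))
print(regularized_loss(predictions, targets, weights))
```

Because the penalty depends only on the weights, two models with the same training error are ranked by how large their parameters are, which favors the simpler one.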
31. How does splitting a dataset into train, dev and test sets help identify overfitting?
• Overfitting: the model fits the training set so closely that it does not generalize well.
• A low training error combined with a high dev error can be used to identify this.
• The distributions of the train and dev sets must be the same or similar!
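The points above can be sketched as a shuffled split plus a simple gap check. The 80/10/10 proportions, the `max_gap` threshold, and both function names are assumptions for illustration; shuffling before splitting is one simple way to keep the train and dev distributions similar.

```python
import random


def split_dataset(examples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle the examples and split them into train, dev, and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


def looks_overfit(train_error, dev_error, max_gap=0.05):
    """Flag overfitting when dev error exceeds training error by more than max_gap."""
    return dev_error - train_error > max_gap


train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
print(looks_overfit(train_error=0.02, dev_error=0.25))
print(looks_overfit(train_error=0.10, dev_error=0.12))
```

The test set is held back until the end so that the final error estimate is not influenced by choices made while watching the dev error.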
PART B
1. Develop short notes on the following with respect to deep learning, with
examples.
i) Scalar and Vectors. (6)
ii) Matrices. (7)
2. Explicate Probability Mass function and Probability Density function (13)
3. Describe Gradient-based optimization in deep learning.
4. Explain in detail the linear regression machine learning algorithm. (13)
5. Describe Stochastic Gradient Descent in detail. (13)
6. Explain in detail the different regularization techniques in deep learning. (13)
7. Briefly explain how regularization helps reduce overfitting. (13)
8. Analyse and write short notes on Dataset Augmentation. (13)
9. Point out and explain the different sets of layers in feedforward networks.
10. Describe Deep feed forward networks with neat diagram. (13)
PART C
1. Assess the following with respect to deep learning examples.
i) Random Variables. (6)
ii) Probability. (7)
2. Explain briefly on Estimators, Bias and Variance that are useful for generalization,
underfitting and overfitting.
3. Briefly explain an example of a fully functioning feed forward network on a simple
task.
4. Assess the difference between linear models and neural networks. (15)