
History of Deep Learning

Deep Learning has its roots in the 1940s with the invention of the McCulloch-Pitts Neuron,
which was a mathematical model of a biological neuron. Over the decades, advancements have
been made in both theoretical and computational aspects, such as:

• 1940s: McCulloch-Pitts Neuron laid the foundation for artificial neural networks.

• 1950s-60s: The Perceptron algorithm was developed, showing that machines could learn
from data.

• 1980s: Backpropagation became popular, enabling effective training of multilayer networks.

• 2000s: The adoption of GPUs for general-purpose computation dramatically improved training efficiency.

• 2010s and Beyond: Modern frameworks like TensorFlow and PyTorch emerged, and
applications in image recognition, natural language processing, and autonomous
systems soared.

Deep learning has evolved into a critical tool for solving complex, real-world problems,
leveraging massive datasets and advanced architectures.

Introduction to Deep Learning

Deep Learning is a specialized area within Machine Learning, focusing on neural networks with
multiple layers (deep architectures). These networks aim to mimic the human brain’s ability to
learn from data, identify patterns, and make decisions. By leveraging vast datasets and
computational power, deep learning excels in tasks like image recognition, natural language
processing, and autonomous systems.

McCulloch-Pitts Neuron

• Definition: A computational model of a neuron, introduced by Warren McCulloch and Walter Pitts in 1943.

• Components:

o Binary inputs (0 or 1).

o Weights assigned to inputs.

o Threshold function: Outputs 1 when the weighted sum of inputs meets or exceeds a threshold, and 0 otherwise.

• Significance: A foundational concept for neural networks.

• Limitations: Cannot handle non-linear relationships or learn from data.
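The threshold behaviour above can be illustrated with a minimal Python sketch (the function and gate example are illustrative, not part of the original notes):

```python
def mcculloch_pitts_neuron(inputs, weights, threshold):
    """Fire (output 1) when the weighted sum of binary inputs
    meets or exceeds the threshold; otherwise output 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# AND gate: fires only when both inputs are 1
print(mcculloch_pitts_neuron([1, 1], [1, 1], threshold=2))  # 1
print(mcculloch_pitts_neuron([1, 0], [1, 1], threshold=2))  # 0
```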


Multilayer Perceptrons (MLPs)

• Structure:

o Input Layer: Accepts features from data.

o Hidden Layers: Perform computations and transformations.

o Output Layer: Produces predictions.

• Key Features:

o Fully connected layers.

o Non-linear activation functions allow complex problem-solving.

• Applications: Used in classification, regression, and pattern recognition tasks.
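As a concrete illustration of the input–hidden–output structure, here is a minimal NumPy sketch of a forward pass (layer sizes and random weights are arbitrary, not from the notes):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# A tiny MLP: 4 input features -> 8 hidden units -> 3 output scores
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # hidden -> output

x = rng.normal(size=(1, 4))        # one example with 4 features
hidden = relu(x @ W1 + b1)         # fully connected layer + non-linearity
output = hidden @ W2 + b2          # raw prediction scores
print(output.shape)                # (1, 3)
```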

Representation Power of MLPs

• Universal Approximation Theorem: MLPs with sufficient neurons and layers can
approximate any continuous function on a bounded (compact) domain to arbitrary accuracy.

• Challenges:

o Training requires careful weight initialization.

o Large models may lead to overfitting.
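In symbols, one classic single-hidden-layer form of the theorem (with a sigmoidal activation σ; N, v_i, w_i, b_i are the hidden-layer size, output weights, input weights, and biases) reads:

```latex
% For any continuous f on a compact set K and any eps > 0,
% there exist N, v_i, w_i, b_i such that
\left| \, f(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
\qquad \text{for all } x \in K .
```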

Sigmoid Neurons

• Definition: Neurons that use the sigmoid activation function.

• Formula: $\sigma(x) = \frac{1}{1 + e^{-x}}$

o Output lies between 0 and 1.

• Advantages: Smooth gradient makes optimization easier.

• Drawbacks:

o Saturation: Gradients become very small for extreme input values (vanishing
gradient problem).
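A small sketch of the sigmoid and its derivative makes the saturation visible (the sample inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative of the sigmoid

for x in [0.0, 2.0, 10.0]:
    print(x, sigmoid(x), sigmoid_grad(x))
# At x = 10 the gradient is about 4.5e-5: the neuron has saturated,
# which is the root of the vanishing gradient problem.
```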

Feed Forward Neural Networks

• Definition: A type of neural network where information flows in one direction, from input to output.

• Features:

o No cycles or feedback loops.

o Suitable for supervised learning tasks.

• Workflow:

1. Input is processed through successive layers.

2. Each layer applies weights, biases, and activation functions.

3. Final output is generated for predictions.
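The three-step workflow can be sketched as a loop over layers (a hedged illustration; the layer shapes and activations are arbitrary):

```python
import numpy as np

def forward(x, layers):
    """Feed-forward pass: each layer applies weights, bias, and an activation."""
    activation = x
    for W, b, act in layers:
        activation = act(activation @ W + b)
    return activation

relu = lambda z: np.maximum(0, z)
identity = lambda z: z

rng = np.random.default_rng(1)
layers = [
    (rng.normal(size=(4, 8)), np.zeros(8), relu),      # hidden layer
    (rng.normal(size=(8, 2)), np.zeros(2), identity),  # output layer
]
print(forward(rng.normal(size=(1, 4)), layers))
```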

Backpropagation

• Purpose: Train neural networks by minimizing the error between predicted and actual
outputs.

• Steps:

1. Forward Pass: Compute predictions and loss.

2. Backward Pass: Calculate gradients of the loss function with respect to weights
using the chain rule.

3. Weight Update: Use optimization algorithms like Gradient Descent to update weights.

• Requirement: Activation functions must be differentiable.
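A minimal sketch of one training step on a single sigmoid neuron, using a squared-error loss and a plain gradient-descent update (all names and numbers are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron: prediction = sigmoid(w.x + b)
x, y_true = np.array([0.5, -1.0]), 1.0
w, b, lr = np.array([0.1, 0.2]), 0.0, 0.5

# 1. Forward pass: compute prediction and loss
z = w @ x + b
y_pred = sigmoid(z)
loss = 0.5 * (y_pred - y_true) ** 2

# 2. Backward pass: chain rule through loss -> sigmoid -> linear layer
dloss_dy = y_pred - y_true
dy_dz = y_pred * (1.0 - y_pred)
grad_w = dloss_dy * dy_dz * x
grad_b = dloss_dy * dy_dz * 1.0

# 3. Weight update: one gradient descent step
w -= lr * grad_w
b -= lr * grad_b
print(loss, w, b)
```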

Weight Initialization Methods

• Importance: Proper initialization prevents problems like vanishing or exploding gradients.

• Methods:

o Random Initialization: Small random values.

o Xavier Initialization: Scales weights based on the number of input and output
neurons.

o He Initialization: Optimized for ReLU activations; adjusts scaling based on input size.
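These scaling rules can be written in a few lines of NumPy (a sketch of the commonly used variance formulas; fan_in and fan_out are the numbers of input and output units):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot: scale depends on both fan_in and fan_out."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    """He: variance scaled by fan_in, suited to ReLU activations."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_init(256, 128)
W2 = he_init(256, 128)
print(W1.std(), W2.std())
```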

Batch Normalization

• Definition: A technique to normalize inputs to each layer, stabilizing the learning process.

• Process:

1. Normalize activations within a mini-batch.

2. Scale and shift normalized values using learnable parameters.

• Benefits:

o Reduces internal covariate shift.

o Improves training speed.

o Acts as a regularizer, reducing overfitting.
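The two-step process (normalize, then scale and shift) looks roughly like this for one mini-batch; this is a training-time forward pass only, with gamma and beta as the learnable parameters:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize activations across the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)                 # per-feature batch mean
    var = x.var(axis=0)                   # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta           # learnable scale and shift

batch = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 4))
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1
```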

Representation Learning

• Definition: The automatic discovery of meaningful features (representations) from raw data.

• Advantages:

o Reduces manual feature engineering.

o Enables better performance on complex datasets.

• Examples:

o Learning edge detectors in image data.

o Discovering word embeddings in natural language processing.

GPU Implementation

• Why GPUs?

o GPUs excel at parallel processing, making them ideal for matrix computations in
deep learning.

• Frameworks:

o TensorFlow, PyTorch, and Keras provide GPU support for faster training.

• Impact:

o Significantly reduces training time for large models and datasets.
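As an example of how a framework exposes this, a typical PyTorch pattern moves the model and data onto a GPU when one is available (a sketch; the model here is arbitrary):

```python
import torch
import torch.nn as nn

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
x = torch.randn(64, 784, device=device)   # a batch of 64 inputs
logits = model(x)                          # runs on the GPU when present
print(logits.shape, logits.device)
```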

Decomposition – PCA and SVD

• Principal Component Analysis (PCA):

o Reduces data dimensionality by projecting it onto principal components.


o Retains maximum variance with fewer features.

o Applications: Noise reduction, data visualization.

• Singular Value Decomposition (SVD):

o Decomposes a matrix into three components: U (left singular vectors), Σ (singular values), and V^T (right singular vectors).

o Applications: Recommender systems, image compression, and solving linear systems.
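A compact sketch of both ideas with NumPy, implementing PCA via SVD of the centred data (the data and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features

# SVD: X_centred = U @ diag(S) @ Vt
X_centred = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)

# PCA: project onto the top-2 principal components (rows of Vt)
k = 2
X_reduced = X_centred @ Vt[:k].T       # shape (100, 2)

# Fraction of variance retained by each component
explained = (S**2) / (S**2).sum()
print(X_reduced.shape, explained[:k].round(3))
```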
