
Deep Convolutional Networks (CNN) and Computer Vision

Xu Feng
Fudan University
How to Solve a Problem with Neural Networks

Step 1. Design a neural network for the problem
• CNN, RNN, FNN, R-CNN, GAN, ResNet
• Loss Function, Regularization

Step 2. Train the network with data
• SGD+BP, Momentum, Adam
• Dropout, BatchNorm

Step 3. Test and analyze
• Underfitting/Overfitting
• Data Augmentation
• Cross-Validation
(A minimal end-to-end sketch of these three steps follows below.)
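As a concrete illustration of the three steps, here is a minimal sketch in PyTorch; the two-layer network, the synthetic data, and all hyperparameters are illustrative assumptions rather than content from the slides.

```python
import torch
import torch.nn as nn

# Step 1. Design: a small fully-connected network (illustrative choice).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
loss_fn = nn.CrossEntropyLoss()                      # loss function
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam optimizer

# Synthetic data as a stand-in for a real dataset (assumption).
x_train, y_train = torch.randn(512, 20), torch.randint(0, 3, (512,))
x_test,  y_test  = torch.randn(128, 20), torch.randint(0, 3, (128,))

# Step 2. Train: gradient-descent loop with backpropagation.
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()   # BP: gradients of the loss w.r.t. all weights
    opt.step()

# Step 3. Test/analyze: held-out accuracy to check over/underfitting.
with torch.no_grad():
    acc = (model(x_test).argmax(dim=1) == y_test).float().mean()
print(f"test accuracy: {acc:.2f}")
```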
Deep Convolutional Networks: A Revolution in Computer Vision
 Image/Speech Recognition

Before 2012:
Low-level Features (fixed): SIFT, HOG, MFCC
→ Mid-level Features (unsupervised): K-means, Sparse Coding, Mix of Gaussians
→ Trainable Classifier (supervised)
→ Object Class

2012-Now: Deep Learning
Trainable Feature Extractor → Trainable Feature Extractor → Trainable Feature Extractor → Trainable Classifier → Object Class
Computer Vision
 Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos.
 From the perspective of engineering, it seeks to automate tasks that the human visual system can do.
The Birth of Computer Vision

• Hubel and Wiesel, 1959
…there are simple and complex neurons in the primary visual cortex, and visual processing always starts with simple structures such as oriented edges…

• Kirsch et al., 1959 (Scanner)

• Roberts, 1963
Machine perception of three-dimensional solids… one of the precursors of modern Computer Vision.
The Birth of Computer Vision (cont.)

• 1960s: 1st hype of Artificial Intelligence

• Papert@MIT: Summer Vision Project
…engineer a platform that could perform, automatically, background/foreground segmentation and extract non-overlapping objects from real-world images…

• Marr, 1982: official birth of CV as a scientific field
"Vision: A Computational Investigation into the Human Representation and Processing of Visual Information"
The Development of Computer Vision

Traditional methods:
• HoG, SIFT
• Image processing, etc.

Neural networks:
• Fukushima, 1975: Cognitron
• 1980s: BP, ANN
• 2006: Hinton, Deep Belief Nets
• 2012: AlexNet
• 2016: AlphaGO
The Development of Computer Vision
From cameras and computers to computer vision:

Advances in imaging technology

Growth of the Internet and big data

Advances in computing hardware

Advances in Computer Vision algorithms
Common Applications of Computer Vision
 Classification
 Segmentation
 Object Recognition
 Detection
 Identification
 Motion Analysis
 Egomotion – 3D rigid-body motion estimation
 Tracking
 Optical flow
 Pose/action recognition
 Scene Reconstruction
 Image Restoration
55 Years of Hand-Crafted Features

– The traditional model of pattern recognition (since the late 50's)
– Fixed/engineered features (or fixed kernel) + trainable classifier

Hand-crafted Feature Extractor → "Simple" Trainable Classifier

– Perceptron
Architecture of "Classical" Recognition Systems
"Classic" architecture for pattern recognition
– Speech recognition: 1990-2011
– Object recognition: 2005-2012
– Handwriting recognition (long ago)
– Graphical model has latent variables (locations of parts)

Pipeline:
MFCC, SIFT, HoG, Cuboids (fixed) → low-level features
→ Gaussians, K-Means, Sparse Coding (unsupervised) → mid-level features
→ Pooling (fixed)
→ (linear) Classifier (supervised) → parts, phones, characters
→ Graphical Model (fixed) → object, utterance, word
SIFT Features

 Edge-detection operators
 Histogram statistics
Architecture of Deep Learning-Based Recognition Systems
"Deep" architecture for pattern recognition
– Speech and object recognition: since 2011/2012
– Handwriting recognition: since the early 1990s
– Convolutional net with optional graphical model on top
– Trained purely supervised
– Graphical model has latent variables (locations of parts)

Pipeline:
Filters + ReLU (supervised) → Pooling (fixed)
→ Filters + ReLU (supervised) → Pooling (fixed)
→ Filters + ReLU (supervised)
→ Graphical Model (fixed)
(low-level features → mid-level features → parts, phones, characters → object, utterance, word)
Future Systems: Deep Learning + Structured Prediction
Globally-trained deep architecture
– Handwriting recognition: since the mid 1990s
– Speech recognition: since 2011
– All modules are trained with a combination of unsupervised and supervised learning
– End-to-end training == deep structured prediction

Pipeline:
Filters + ReLU (unsup + supervised) → Pooling (fixed)
→ Filters + ReLU (unsup + supervised) → Pooling (fixed)
→ Filters + ReLU (unsup + supervised)
→ Graphical Model (unsup + supervised)
(low-level features → mid-level features → parts, phones, characters → object, utterance, word)
Deep Learning = Learning Hierarchical Representations
It's deep if it has more than one stage of non-linear feature transformation.

Low-level feature → Mid-level feature → High-level feature → Trainable classifier

Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]
Deep Convolutional Networks: Biological Basis
Visual Neural Processing
 Spatial translation invariance
 Translational Invariance

 Spatial cohesion of objects
 Rigidity
Visual Neural Processing
 Objects decompose into parts; parts compose into objects (multiple layers)
 Compositionality

 Spatial linear composability of objects
Convolution Layers

Convolution operation variables:
• $O_i^{(l-1)}$ ($i = 1, \cdots, I$): feature maps of layer $l-1$
• $k_{ji}^{(l)}(u, v)$: trainable convolution kernel
• $b_j^{(l)}$: trainable bias

- Total input to the j-th feature map of layer l at position $(x, y)$:

$$V_j^{(l)}(x, y) = \sum_{i=1}^{I} \sum_{u,v=0}^{F-1} k_{ji}^{(l)}(u, v) \cdot O_i^{(l-1)}(x - u,\, y - v) + b_j^{(l)}$$

- Convolution layer output:

$$O_j^{(l)}(x, y) = f\left(V_j^{(l)}(x, y)\right)$$

- Rectified Linear Unit (ReLU): $f(x) = \max(0, x)$
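A minimal NumPy sketch of this forward pass; the loop-based implementation and the variable names (`O_prev`, `K`, `b`) are illustrative assumptions, chosen for clarity over speed.

```python
import numpy as np

def conv_forward(O_prev, K, b):
    """Valid convolution + ReLU, following the slide's formula.

    O_prev: (I, H, W)     feature maps of layer l-1
    K:      (J, I, F, F)  trainable kernels k_ji(u, v)
    b:      (J,)          trainable biases
    returns O: (J, H-F+1, W-F+1)
    """
    I, H, W = O_prev.shape
    J, _, F, _ = K.shape
    V = np.zeros((J, H - F + 1, W - F + 1))
    for j in range(J):
        for x in range(V.shape[1]):
            for y in range(V.shape[2]):
                # sum over input maps i and kernel offsets (u, v);
                # indexing O(x+u, y+v) realizes the slide's O(x-u, y-v)
                # up to a kernel flip (cross-correlation, as in most DL code)
                for i in range(I):
                    for u in range(F):
                        for v in range(F):
                            V[j, x, y] += K[j, i, u, v] * O_prev[i, x + u, y + v]
        V[j] += b[j]
    return np.maximum(0, V)  # ReLU: f(x) = max(0, x)
```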
Convolution Operation

$$\boldsymbol{o}_j^{(l)} = \max\left(0,\ \sum_{i} \boldsymbol{o}_i^{(l-1)} * \boldsymbol{w}_{ij}^{(l)}\right)$$

Rectified Linear Unit (ReLU)

Dimensionality Reduction: Pooling
 Spatial templates extract features; the spatial resolution of the feature maps decreases
 Information moves from the spatial dimension to the feature dimension

[Figure: Convolution → Pooling]
Pooling
 Take the maximum (Max Pooling) or the mean (Average Pooling) over a local window
 Advantages: reduces data dimensionality; local translation/distortion invariance

- Pooling layer output:

$$O_i^{(l+1)}(x, y) = \max_{u,v = 0, \cdots, G-1} O_i^{(l)}(x \cdot s + u,\ y \cdot s + v)$$

- G: pooling size
- s: stride (spacing between adjacent pooling windows)
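A short NumPy sketch of this max-pooling formula; the function name and the default non-overlapping-window assumption (s = G) are illustrative.

```python
import numpy as np

def max_pool(O, G, s=None):
    """Max pooling per the slide's formula, over GxG windows at stride s.

    O: (H, W) single feature map; returns a ((H-G)//s + 1, (W-G)//s + 1) map.
    """
    s = s or G  # assume non-overlapping windows unless a stride is given
    H, W = O.shape
    out_h, out_w = (H - G) // s + 1, (W - G) // s + 1
    out = np.empty((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            out[x, y] = O[x * s : x * s + G, y * s : y * s + G].max()
    return out

# Example: a 4x4 map pooled with G=2, s=2 gives a 2x2 map.
print(max_pool(np.arange(16.0).reshape(4, 4), G=2))
```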
Multi-Layer Convolution + Pooling
 Multi-layer feature extraction
 Convolution: translation invariance
 Weight sharing: greatly reduces the number of trainable parameters
 Pooling: spatial information is reduced in dimension and moved into the high-dimensional feature domain
 Pooling loses precise localization (which can be recovered via multi-scale feature descriptions)
[Figure: Input Image → Layer-1 Filters → Layer-1 Feature Maps → Layer-2 Filters → Layer-2 Feature Maps → Layer-3 Filters → Layer-3 Feature Maps → Classifier] [He, 2016]
Backpropagation Through Convolution Layers
 Constraint: tied (equal) weights
 We compute the gradients as usual, and then modify the gradients so that they satisfy the constraints.
 So if the weights started off satisfying the constraints, they will continue to satisfy them.

 Gradient of ReLU:

$$f'(x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases}$$

 No attenuation when the error propagates back!
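In code, this gradient is a single mask; a tiny NumPy sketch (the function and variable names are illustrative):

```python
import numpy as np

def relu_backward(dL_dout, x):
    # f'(x) is 1 where x >= 0 and 0 elsewhere, so the upstream gradient
    # passes through unattenuated wherever the unit was active.
    return dL_dout * (x >= 0)
```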
Backpropagation Through Pooling Layers
 Store the location of the maximum input

[Figure: the forward pass routes x_max through the max; the backward pass routes the upstream gradient L' only to the position of x_max, and 0 to all other inputs]
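A hedged NumPy sketch of this gradient routing, paired with the `max_pool` sketch above; recording the max location via `np.unravel_index` of the window argmax is one of several possible bookkeeping choices.

```python
import numpy as np

def max_pool_backward(dL_dout, O, G, s=None):
    """Route each output gradient to the argmax position of its window."""
    s = s or G
    dL_dO = np.zeros_like(O)
    for x in range(dL_dout.shape[0]):
        for y in range(dL_dout.shape[1]):
            window = O[x * s : x * s + G, y * s : y * s + G]
            u, v = np.unravel_index(window.argmax(), window.shape)
            # only the max input receives the gradient; all others get 0
            dL_dO[x * s + u, y * s + v] += dL_dout[x, y]
    return dL_dO
```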
Softmax
 Applied to the output layer in the case of multi-class classification
 Estimates the posterior probability of each class:

$$p_i = \frac{\exp\left(O_i^L\right)}{\sum_{j=1}^{K} \exp\left(O_j^L\right)} \quad (i = 1, \cdots, K)$$
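In practice the exponentials are shifted by the maximum logit for numerical stability; a brief NumPy sketch (the max-shift is standard practice, not from the slide):

```python
import numpy as np

def softmax(O):
    # subtracting max(O) leaves p unchanged but avoids overflow in exp
    e = np.exp(O - O.max())
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1
```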
Cross-Entropy (Softmax Loss)
 Cross-entropy loss function: minimizes the discrepancy between the ground truth y and the network prediction p:

$$L(w) = \sum_{j=1}^{K} -t_j \log p_j = -\log p_y$$

 The gradient of the softmax loss balances out between softmax and cross-entropy:

$$\frac{\partial L}{\partial O} = \frac{\partial L}{\partial p} \cdot \frac{\partial p}{\partial O} = p - t$$

 No attenuation and no distortion when the error propagates back!
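A quick NumPy check of this identity; the small logit vector and one-hot target are illustrative:

```python
import numpy as np

O = np.array([2.0, 1.0, 0.1])        # logits
t = np.array([1.0, 0.0, 0.0])        # one-hot ground truth
p = np.exp(O - O.max()); p /= p.sum()

loss = -np.sum(t * np.log(p))        # cross-entropy: -log p_y
grad = p - t                         # gradient w.r.t. the logits O
print(loss, grad)                    # grad stays in [-1, 1]: no blow-up
```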
CNN Network Parameters

[Figure: an x × y × d input map is convolved with n kernels of size x0 × y0 × d, producing an x' × y' × n output; the kernel slides over window positions (1,1), (1,2), (2,1), … with stride s and padding p]
CNN Network Parameters
 Padding p
 Stride s

$$x' = \frac{x + 2p - x_0}{s} + 1$$
$$y' = \frac{y + 2p - y_0}{s} + 1$$

 These formulas apply to both convolution and pooling layers.

 Once the parameters of the input (previous) layer and of the conv/pool layer are given, the size of the output (next) layer is automatically determined. A small helper for this calculation is sketched below.
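A minimal Python helper implementing the size formula; the `assert` for even tiling and the AlexNet-style example numbers are illustrative assumptions.

```python
def conv_output_size(x, x0, p, s):
    """Spatial output size of a conv or pooling layer: (x + 2p - x0)/s + 1."""
    size, rem = divmod(x + 2 * p - x0, s)
    assert rem == 0, "kernel does not tile the padded input evenly"
    return size + 1

# Example: 227x227 input, 11x11 kernel, stride 4, no padding -> 55
print(conv_output_size(227, 11, p=0, s=4))
```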
Fully Connected Layers
 If the input is 1×1×d and the kernel is 1×1×d×n,
 then the output is 1×1×n.
 A fully connected layer is thus a special case of a convolution layer (a two-line check follows below).
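A short NumPy check of this equivalence; the dimensions d = 4, n = 3 are arbitrary.

```python
import numpy as np

d, n = 4, 3
x = np.random.randn(d)        # a 1x1xd input, flattened
W = np.random.randn(n, d)     # n kernels, each of shape 1x1xd

# "Convolving" a 1x1 spatial map with 1x1 kernels is exactly a matrix product:
fc_out = W @ x                                             # fully connected layer
conv_out = np.array([(W[j] * x).sum() for j in range(n)])  # per-kernel 1x1 conv
assert np.allclose(fc_out, conv_out)
```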
Convolutional Network
Filter Bank + non-linearity
→ Pooling
→ Filter Bank + non-linearity
→ Pooling
→ Filter Bank + non-linearity

[LeCun et al. NIPS 1989]

Typical CNN Architecture
Input Data
 1D
 Signals
 2D
 Image
 Voice Cepstrum
 3D
 Tomography
 Video
CNNs and Applications
 CNN variants
 R-CNN
 DenseNet
 U-Net
 CV applications
 Classification
 Object Detection
 Segmentation
 etc.
Classification

Data → Output (confusion matrix):

            Predict A   Predict B   Predict C   Accuracy
Actual A        9           1           0         90%
Actual B        1           7           2         70%
Actual C        1           0           8         80%

Total Accuracy: 80%
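The per-class and total accuracies follow directly from such a matrix; a small NumPy sketch of the computation (the matrix values are copied from the table above, whose rows may be OCR-damaged):

```python
import numpy as np

# rows = actual class, columns = predicted class (from the table above)
C = np.array([[9, 1, 0],
              [1, 7, 2],
              [1, 0, 8]])

per_class = C.diagonal() / C.sum(axis=1)   # per-class accuracy: diag / row sum
total = C.diagonal().sum() / C.sum()       # overall accuracy: trace / total count
print(per_class, total)
```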


2012: AlexNet

 Trained on ImageNet, a database of over 15 million labeled images in more than 22,000 categories.
 Used ReLU activations.
 Data augmentation: image translation, reflection, patch extraction.
 Used dropout.
 Trained with mini-batch stochastic gradient descent, with specified momentum and weight-decay values.
 Trained on two GTX 580 GPUs for five to six days.
2013: ZF Net

 AlexNet was trained on 15 million images, while ZF Net used only 1.3 million.
 AlexNet used 11×11 filters in the first layer, whereas ZF Net used 7×7 filters.
 Trained on a single GTX 580 GPU for 12 days.
 Developed a visualization technique: the "Deconvolutional Network" (DeconvNet).
DeconvNet
2014: VGG Net
 3×3 filters; a stack of three conv layers has a 7×7 effective receptive field.
 The number of filters doubles after each maxpool layer.
 Used scale-jittering data augmentation during training.
 Trained on four Nvidia Titan Black GPUs for two to three weeks.
2015: GoogLeNet

 Uses 9 Inception modules in the overall architecture, with over 100 layers in total.
 No fully connected layers: average pooling is used instead, going from a 7×7×1024 volume to 1×1×1024, which saves a huge number of parameters (12× fewer than AlexNet).
 Concepts from R-CNN were used in the detection model.
 Training could be completed within a week on high-end GPUs.
2015: ResNet
About ResNet:
 ResNet
 AlphaGO Zero
Object Detection & Recognition
Classification + localization: results
Classification + localization: multiscale sliding window

– Apply a convnet with a sliding window over the image at multiple scales
– Important note: it's very cheap to slide a convnet over an image
– Just compute the convolutions over the whole image and replicate the fully-connected layers
Classification + Localization: sliding window +
bounding box regression
– Apply convnet with a sliding window over the image at multiple scales
– For each window, predict a class and bounding box parameters
– Even if the object is not completely contained in the viewing window, the convnet
can predict where it thinks the object is.
Classification + Localization: sliding window +
bounding box regression + bbox voting

– Apply convnet with a sliding window over the image at multiple scales
– For each window, predict a class and bounding box parameters
– Compute an “average” bounding box, weighted by scores
2013: R-CNN

 R-CNN splits object recognition into two steps: region proposal + classification.
2015: Fast R-CNN
YOLO: You Only Look Once
 Core idea: YOLO casts object detection as a regression problem, using a single end-to-end network that goes directly from the raw input image to object locations and classes.

 Main strengths: fast; low background false-positive rate; strong generalization.

 Detection pipeline: (figure)

 Network structure: (figure)
YOLO: You Only Look Once
 Model:
 1. The image is divided into an S×S grid; each cell is responsible for detecting objects whose centers fall inside that cell.
 2. Each cell predicts bounding-box locations (x, y, w, h) and a confidence score combining the probability that the box contains an object with the per-class probabilities.

 Comparison (figure: Fast R-CNN vs. YOLO error profiles):
 • Correct: correct class and IOU > .5
 • Localization: correct class, .1 < IOU < .5
 • Similar: class is similar, IOU > .1
 Background errors are significantly reduced, and YOLO is the fastest detector while maintaining its detection rate. (A small IoU sketch follows below.)

Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
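Since the comparison above is phrased in terms of IoU thresholds, here is a brief sketch of the IoU computation between two boxes; the (x1, y1, x2, y2) corner convention is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# e.g. a detection counts as "Correct" when the class matches and IoU > 0.5
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```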
SSD
 YOLO casts object detection as a regression problem.

 The Single Shot Detector (SSD), proposed in 2015, combines YOLO's regression idea with the anchor mechanism of Faster R-CNN (its Region Proposal Network).

 Like YOLO, SSD obtains object locations and classes via regression.
 It uses Faster R-CNN's anchor mechanism to establish the correspondence between locations and features.
 It retains YOLO's speed together with Faster R-CNN's accuracy.

[1] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector. European Conference on Computer Vision. Springer, Cham, 2016: 21-37.
[2] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
SSD
 Single Shot Detector (SSD) network structure: (figure)

 Detection examples of the SSD512 model on COCO test-dev. Detections with scores above 0.6 are shown; each color corresponds to one object category.
Semantic Segmentation
 Labeling every pixel with the object it belongs to
 Would help identify obstacles, targets, landing sites, dangerous areas
 Would help line up depth maps with edge maps

[Farabet et al. ICML 2012, PAMI 2013]

Scene parsing/labeling

[Farabet et al. ICML 2012, PAMI 2013]
2017: Mask R-CNN

Mask R-CNN detects object locations and classes while simultaneously segmenting the objects inside the predicted boxes.

About Mask R-CNN

Mask R-CNN = Faster R-CNN + Mask

Mask R-CNN builds on the Faster R-CNN architecture: on each region proposed by Faster R-CNN's RPN, it predicts a binary mask, so that the network segments the objects in a region while detecting them.
Fully-Convolutional Network
 Using a plain CNN for per-pixel classification (segmentation) has drawbacks:
 Very high memory cost
 Low computational efficiency
 The patch size limits the size of the receptive field
 FCN converts the fully connected layers of a traditional CNN into convolution layers
U-Net
In an ordinary autoencoder, the input and output cannot share low-level information, so low-level detail is easily lost. To relieve this information bottleneck in the generator, skip connections are added between mirrored layers; the resulting network is called U-Net.

GAN loss term:

$$L_{GAN}(G) = -\sum_i E_{z \sim p_{data}(i)}\left[\log D\left(G(z)\right)\right]$$

Traditional loss term:

$$L_{L1}(G) = \sum_i E_{x \sim p_{data}(i),\, z \sim p_{data}(j)}\left[\left\|x - G(z)\right\|_1\right]$$

Joint loss:

$$L(G) = \beta_1 L_{GAN}(G) + \beta_2 L_{L1}(G)$$
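A hedged PyTorch sketch of this joint objective; the discriminator call, the beta weights, and realizing the GAN term as a binary cross-entropy on a sigmoid-output D are illustrative assumptions in the style of pix2pix-like setups, not the slides' exact implementation.

```python
import torch
import torch.nn.functional as F

def generator_loss(D, G, z, x_target, beta1=1.0, beta2=100.0):
    """Joint loss L(G) = beta1 * L_GAN(G) + beta2 * L_L1(G) (sketch).

    Assumes D outputs a probability in [0, 1] (sigmoid output).
    """
    fake = G(z)
    # GAN term: push D(G(z)) toward "real" (label 1), i.e. -log D(G(z))
    d_out = D(fake)
    l_gan = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
    # L1 term: keep the generated image close to the target
    l_l1 = F.l1_loss(fake, x_target)
    return beta1 * l_gan + beta2 * l_l1
```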
U-Net

(a) As shown in figure (a), U-Net can perform many of the image-to-image translation tasks that arise in image processing and computer vision.

(b) Figure (b) shows that, compared with the GAN loss, images generated with the traditional loss alone are more blurred.
SegNet: A Semantic Segmentation Network
➢ Origin: proposed by the University of Cambridge, targeting autonomous driving and intelligent robots.
➢ Network structure: RGB image input → Encoder (13 layers, VGG-16) → Decoder (13 layers) → Softmax → RGB class-map output, with pooling indices passed from encoder to decoder.
➢ Innovation: the decoder
• reuses the max-pooling indices saved during the encoder's downsampling stage (see the sketch below)
• which reduces the number of model parameters

Figure 1: SegNet network structure. Figure 2: SegNet's decoder.
[1] Computer Vision and Robotics Group at the University of Cambridge, UK. http://mi.eng.cam.ac.uk/projects/segnet/
[2] Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561 (2015).
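A minimal PyTorch sketch of the pooling-indices trick: `max_pool2d` with `return_indices=True` records where each maximum came from, and `max_unpool2d` scatters decoder values back to exactly those positions. The tensor sizes are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32)  # an encoder feature map (illustrative size)

# Encoder: downsample and remember where each maximum came from.
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# ... decoder computation on `pooled` would go here ...

# Decoder: upsample by scattering values back to the remembered positions,
# so no upsampling weights need to be learned for this step.
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)
print(pooled.shape, unpooled.shape)  # (1,64,16,16) -> (1,64,32,32)
```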
SegNet: A Semantic Segmentation Network
➢ Bayesian SegNet network structure: RGB image input → Encoder (13 layers, VGG-16) → Decoder (13 layers) → Softmax → RGB class-map output plus a confidence map, with pooling indices passed from encoder to decoder.

➢ Bayesian SegNet innovation: adds DropOut layers among the convolution layers in order to output uncertainty.

➢ Bayesian SegNet advantages:
• a 2-3% improvement over SegNet
• better results on small datasets

[1] Kendall, Alex, Vijay Badrinarayanan, and Roberto Cipolla. "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680 (2015).
ConvNet for Stereo Matching

– Using a ConvNet to learn a similarity measure between image patches

[LeCun, 2016]
Pose Estimation and Attribute Recovery with ConvNets

Pose-Aligned Network for Deep Attribute Modeling [Zhang et al. CVPR 2014] (Facebook AI Research)

Real-time hand pose recovery [Tompson et al. Trans. on Graphics 2014]

Body pose estimation [Tompson et al. ICLR 2014]

[LeCun, 2016]
References
– NVIDIA/NYU, Deep Learning Institute Teaching Kit
– Goodfellow, Deep Learning, MIT Press
– LeCun, Deep Learning Tutorial, 2016
– Lee, Deep Learning Tutorial, 2017
Thank You
Q&A

[email protected]
www.emwlab.fudan.edu.cn
