AI2025 Lecture09 Inperson Slide

The document is a lecture on shallow neural networks presented by Taewook Kang, focusing on the architecture, parameters, and gradient descent methods for training such networks. It includes detailed mathematical formulations for cost functions, partial derivatives, and the application of binary cross-entropy loss. The content is adapted from Prof. Woowhan Jung's materials at Hanyang University.

AI System Semiconductor Design

Lecture 9 (In-person): Shallow Neural Network
4/2/2025
Lecturer: Taewook Kang

Acknowledgments
Lecture material adapted from Prof. Woowhan Jung, DSLAB, Hanyang Univ.


REVIEW: SHALLOW NEURAL NETWORK



Shallow Neural Network

Parameters: $\boldsymbol{\theta} = \{\boldsymbol{W}^{[1]}, \boldsymbol{b}^{[1]}, \boldsymbol{w}^{[2]}, b^{[2]}\}$

(Figure: a shallow network with inputs $x_1, x_2, x_3$, a tanh hidden layer of $h$ units, and a sigmoid output $\hat{y}$.)

Loss: $L(\hat{y}, y)$

Cost: $J(\boldsymbol{\theta}) = \sum_{q=1}^{m} L\bigl(\hat{y}^{(q)}, y^{(q)}\bigr)$

Gradient descent: $\boldsymbol{\theta} := \boldsymbol{\theta} - \eta \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \boldsymbol{\theta} - \eta \dfrac{\sum_{q=1}^{m} \nabla_{\boldsymbol{\theta}} L\bigl(\hat{y}^{(q)}, y^{(q)}\bigr)}{m}$

So, we will compute $\nabla_{\boldsymbol{\theta}} L\bigl(\hat{y}^{(q)}, y^{(q)}\bigr)$ for $1 \le q \le m$.

Forward pass:

$\boldsymbol{z}^{[1]} = \boldsymbol{W}^{[1]}\boldsymbol{x} + \boldsymbol{b}^{[1]}$
$\boldsymbol{a}^{[1]} = \tanh\bigl(\boldsymbol{z}^{[1]}\bigr)$
$z^{[2]} = \boldsymbol{w}^{[2]T}\boldsymbol{a}^{[1]} + b^{[2]}$
$\hat{y} = a^{[2]} = \sigma\bigl(z^{[2]}\bigr)$

Partial derivatives to compute (element-wise representations):

$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}}$
$\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}}$ for $1 \le i \le h$
$\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}}$ for $1 \le i \le h$
$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}}$ for $1 \le i \le h$, $1 \le j \le n$
Partial Derivatives: 2nd layer

Parameters: $\boldsymbol{W}^{[1]}, \boldsymbol{b}^{[1]}, \boldsymbol{w}^{[2]}, b^{[2]}$

$\boldsymbol{z}^{[1]} = \boldsymbol{W}^{[1]}\boldsymbol{x} + \boldsymbol{b}^{[1]} \;\rightarrow\; \boldsymbol{a}^{[1]} = \tanh\bigl(\boldsymbol{z}^{[1]}\bigr) \;\rightarrow\; z^{[2]} = \boldsymbol{w}^{[2]T}\boldsymbol{a}^{[1]} + b^{[2]} \;\rightarrow\; a^{[2]} = \sigma\bigl(z^{[2]}\bigr) \;\rightarrow\; L\bigl(a^{[2]}, y\bigr)$

Binary cross-entropy: $L\bigl(a^{[2]}, y\bigr) = -y \log a^{[2]} - (1 - y)\log\bigl(1 - a^{[2]}\bigr)$

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial b^{[2]}}$

Using $\dfrac{d\sigma(x)}{dx} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$:

$= \left(\dfrac{-y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}}\right)\sigma\bigl(z^{[2]}\bigr)\bigl(1 - \sigma\bigl(z^{[2]}\bigr)\bigr)$

$= \left(\dfrac{-y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}}\right) a^{[2]}\bigl(1 - a^{[2]}\bigr)$

$= -y\bigl(1 - a^{[2]}\bigr) + a^{[2]}(1 - y) = a^{[2]} - y$

Same as in logistic regression.
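A quick symbolic check of this result (not part of the original slides): using sympy, we can confirm that the derivative of the binary cross-entropy through the sigmoid reduces to $a^{[2]} - y$.

```python
import sympy as sp

z, y = sp.symbols('z y', real=True)
a2 = 1 / (1 + sp.exp(-z))                       # a[2] = sigma(z[2])
L = -y * sp.log(a2) - (1 - y) * sp.log(1 - a2)  # binary cross-entropy

# dL/dz[2] equals dL/db[2] because dz[2]/db[2] = 1
dL_dz = sp.diff(L, z)
print(sp.simplify(dL_dz - (a2 - y)))  # expected output: 0
```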


Partial Derivatives: 2nd layer

(Same computation graph and binary cross-entropy loss as on the previous slide.)

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial w_i^{[2]}}$

$= \bigl(a^{[2]} - y\bigr)\dfrac{\partial z^{[2]}}{\partial w_i^{[2]}}$

$= \bigl(a^{[2]} - y\bigr)\, a_i^{[1]}$


Partial Derivatives: 1st layer

(Same computation graph and binary cross-entropy loss as before.)

$\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial a_i^{[1]}} \dfrac{\partial a_i^{[1]}}{\partial z_i^{[1]}} \dfrac{\partial z_i^{[1]}}{\partial b_i^{[1]}}$

Using $\dfrac{d\tanh(x)}{dx} = 1 - \tanh^2(x)$:

$= \bigl(a^{[2]} - y\bigr)\dfrac{\partial z^{[2]}}{\partial a_i^{[1]}} \dfrac{\partial a_i^{[1]}}{\partial z_i^{[1]}} \dfrac{\partial z_i^{[1]}}{\partial b_i^{[1]}}$

$= \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \tanh^2 z_i^{[1]}\bigr)\cdot 1$

$= \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)$
Partial Derivatives: 1st layer

(Same computation graph and binary cross-entropy loss as before.)

$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial a_i^{[1]}} \dfrac{\partial a_i^{[1]}}{\partial z_i^{[1]}} \dfrac{\partial z_i^{[1]}}{\partial W_{ij}^{[1]}}$

Note: $z_i^{[1]} = \boldsymbol{w}_i^{[1]T}\boldsymbol{x} + b_i^{[1]} = \sum_{j=1}^{n} W_{ij}^{[1]} x_j + b_i^{[1]} \;\Rightarrow\; \dfrac{\partial z_i^{[1]}}{\partial W_{ij}^{[1]}} = x_j$

$= \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)\dfrac{\partial z_i^{[1]}}{\partial W_{ij}^{[1]}}$

$= \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)\, x_j$


Partial derivatives

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \bigl(a^{[2]} - y\bigr)\, a_i^{[1]}$

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)$

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)\, x_j$
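As a sanity check (not part of the original slides), the four formulas above can be compared against numerical finite-difference derivatives of the loss; the sizes n = 2, h = 3 and all values below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 2, 3
W1, b1 = rng.normal(size=(h, n)), rng.normal(size=h)
w2, b2 = rng.normal(size=h), rng.normal()
x, y = np.array([0.7, -0.3]), 1.0

def loss(W1, b1, w2, b2):
    a1 = np.tanh(W1 @ x + b1)
    a2 = 1 / (1 + np.exp(-(w2 @ a1 + b2)))
    return -y * np.log(a2) - (1 - y) * np.log(1 - a2)

# Analytic gradients from the slide formulas
a1 = np.tanh(W1 @ x + b1)
a2 = 1 / (1 + np.exp(-(w2 @ a1 + b2)))
db2 = a2 - y
dw2 = (a2 - y) * a1
db1 = (a2 - y) * w2 * (1 - a1**2)
dW1 = np.outer(db1, x)

# Numerical central differences for one entry of each parameter
eps = 1e-6
num_db2 = (loss(W1, b1, w2, b2 + eps) - loss(W1, b1, w2, b2 - eps)) / (2 * eps)
print(db2, num_db2)          # the two values should closely agree

e = np.zeros(h); e[1] = eps
num_dw2_1 = (loss(W1, b1, w2 + e, b2) - loss(W1, b1, w2 - e, b2)) / (2 * eps)
print(dw2[1], num_dw2_1)

num_db1_1 = (loss(W1, b1 + e, w2, b2) - loss(W1, b1 - e, w2, b2)) / (2 * eps)
print(db1[1], num_db1_1)

E = np.zeros((h, n)); E[1, 0] = eps
num_dW1_10 = (loss(W1 + E, b1, w2, b2) - loss(W1 - E, b1, w2, b2)) / (2 * eps)
print(dW1[1, 0], num_dW1_10)
```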


Partial Derivatives and Gradients

Vectorization: from element-wise partial derivatives to gradients.

Partial derivatives:

$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$

$\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \bigl(a^{[2]} - y\bigr)\, a_i^{[1]}$

$\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)$

$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)\, x_j$

Gradients:

$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$

$\nabla_{\boldsymbol{w}^{[2]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\,\boldsymbol{a}^{[1]}$

$\nabla_{\boldsymbol{b}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\,\boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}$

$\nabla_{\boldsymbol{W}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \bigl(\bigl(a^{[2]} - y\bigr)\,\boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}\bigr) \otimes \boldsymbol{x}$ (outer product)

where $\boldsymbol{e}^{[1]} = \boldsymbol{1} - \boldsymbol{a}^{[1]} \odot \boldsymbol{a}^{[1]}$

⊙: element-wise product (a.k.a. Hadamard product)
⊗: outer product
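A minimal numpy rendering of these vectorized gradients (my own sketch, not from the slides; the sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 2, 3
W1, b1 = rng.normal(size=(h, n)), np.zeros(h)
w2, b2 = rng.normal(size=h), 0.0
x, y = np.array([1.0, 0.0]), 1.0

# Forward pass
a1 = np.tanh(W1 @ x + b1)
a2 = 1 / (1 + np.exp(-(w2 @ a1 + b2)))

# Vectorized gradients of the per-example loss
db2 = a2 - y                 # dL/db[2]
dw2 = (a2 - y) * a1          # grad_w[2] L
e1 = 1 - a1 * a1             # e[1] = 1 - a[1] (elementwise) a[1]
db1 = (a2 - y) * w2 * e1     # grad_b[1] L, elementwise product
dW1 = np.outer(db1, x)       # grad_W[1] L, outer product with x
```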


REVIEW: LOGISTIC REGRESSION
ASSIGNMENT – XOR DATA



Logistic regression: boolean operators
▪ Training logistic regression models for Boolean operators
▪ Requirements
  ▪ AND, OR, XOR
    ▪ You need to build a dataset for each operator
    ▪ It may not work for one of the operators
  ▪ Use numpy arrays
    ▪ Initialization with lists: x, y
    ▪ Random initialization: w, b
  ▪ Use numpy operators
    ▪ Inner product
    ▪ Addition
(A minimal data and initialization sketch follows below.)
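The sketch below illustrates the setup described above (not the official assignment solution; only dataset construction, random initialization, and one forward pass are shown):

```python
import numpy as np

# Boolean datasets built from lists, then converted to numpy arrays
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1], dtype=float)
y_or  = np.array([0, 1, 1, 1], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# Random initialization of the logistic regression parameters
rng = np.random.default_rng(42)
w = rng.normal(size=2)
b = rng.normal()

# One forward pass using inner product and addition
z = x @ w + b                 # inner product per row, plus bias
y_hat = 1 / (1 + np.exp(-z))  # sigmoid output for all four inputs
```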


Lecture 5 Assignment
▪ Logistic regression on Boolean data
  (Figure: cost vs. epoch plot)
▪ Report required contents
  1. Add source code for model & training
  2. For each operator (AND, OR, XOR):
     (1) Cost plot over epochs
         ▪ Use different learning rates (at least 3 learning rates)
     (2) Show predicted results for all input combinations
  3. Explain whether the logistic regression model works well for AND, OR, XOR data
     ▪ For one operator, logistic regression won't work. Which one?
▪ Due: 2025/3/27 (Thu) 11:59 PM (after in-person lecture 07)
  ▪ So you have two more classes (including today) in which to ask questions.
  ▪ Submit to iCampus


PYTHON PRACTICE
XOR CLASSIFICATION WITH SNN



Slicing a numpy array

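The slide shows a code screenshot that is not reproduced here; the following is a representative example of basic numpy slicing (my own example, not necessarily the slide's code):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3x4 array: rows 0..2, columns 0..3

print(a[0])       # first row            -> [0 1 2 3]
print(a[:, 1])    # second column        -> [1 5 9]
print(a[1, 2:])   # row 1, columns 2..3  -> [6 7]
print(a[:2, :2])  # top-left 2x2 block   -> [[0 1] [4 5]]
```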


Slicing a numpy array with condition

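Again, the slide's screenshot is not reproduced here; a representative example of boolean-mask (conditional) slicing:

```python
import numpy as np

a = np.array([3, -1, 4, -1, 5, -9])

mask = a > 0      # boolean array: [ True False  True False  True False]
print(a[mask])    # -> [3 4 5]
print(a[a < 0])   # -> [-1 -1 -9]

# Conditional slicing is handy for separating XOR data by label
y = np.array([0, 1, 1, 0])
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(x[y == 1])  # inputs whose label is 1 -> [[0 1] [1 0]]
```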


Outer product

$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)\, x_j$
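These entries form an outer product between the length-h vector $\bigl(a^{[2]} - y\bigr)\,\boldsymbol{w}^{[2]} \odot \bigl(\boldsymbol{1} - \boldsymbol{a}^{[1]} \odot \boldsymbol{a}^{[1]}\bigr)$ and $\boldsymbol{x}$. A small numpy illustration (my own example, not the slide's screenshot):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])  # stands in for the length-h vector above
v = np.array([10.0, 20.0])     # stands in for the length-n input x

G = np.outer(u, v)             # G[i, j] = u[i] * v[j], shape (3, 2)
print(G)
# [[10. 20.]
#  [20. 40.]
#  [30. 60.]]
```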


Data preparation

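The data-preparation code on the slide is a screenshot; below is a minimal stand-in using the 4-point XOR truth table (the actual practice dataset may differ, e.g. it could contain noisy samples):

```python
import numpy as np

# XOR truth table as the training set
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

print(x.shape, y.shape)  # (4, 2) (4,)
```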


Data Plot - Two Different Ways

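Two common ways to scatter-plot labeled 2-D data with matplotlib; this is an assumption about what the slide demonstrates, since the screenshot is not reproduced:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Way 1: slice by condition and plot each class separately
plt.scatter(x[y == 0, 0], x[y == 0, 1], marker='o', label='y = 0')
plt.scatter(x[y == 1, 0], x[y == 1, 1], marker='x', label='y = 1')
plt.legend()
plt.show()

# Way 2: a single call, mapping the label array to colors
plt.scatter(x[:, 0], x[:, 1], c=y)
plt.show()
```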


Model

$\boldsymbol{z}^{[1]} = \boldsymbol{W}^{[1]}\boldsymbol{x} + \boldsymbol{b}^{[1]}$

$\boldsymbol{a}^{[1]} = \tanh\bigl(\boldsymbol{z}^{[1]}\bigr)$

$z^{[2]} = \boldsymbol{w}^{[2]T}\boldsymbol{a}^{[1]} + b^{[2]}$

$\hat{y} = a^{[2]} = \sigma\bigl(z^{[2]}\bigr)$
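The model code appears as a screenshot on the slide; below is a minimal sketch consistent with the formulas above (the class name ShallowNN and its details are my own assumptions, not necessarily the class code):

```python
import numpy as np

class ShallowNN:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(n_hidden, n_in))  # W[1]
        self.b1 = np.zeros(n_hidden)                 # b[1]
        self.w2 = rng.normal(size=n_hidden)          # w[2]
        self.b2 = 0.0                                # b[2]

    def forward(self, x):
        """Forward pass for a single example x of shape (n_in,)."""
        z1 = self.W1 @ x + self.b1
        a1 = np.tanh(z1)
        z2 = self.w2 @ a1 + self.b2
        a2 = 1 / (1 + np.exp(-z2))  # sigmoid
        return a1, a2

    def predict(self, x):
        """Hard 0/1 prediction by thresholding the sigmoid output."""
        _, a2 = self.forward(x)
        return int(a2 >= 0.5)
```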


Train (with element-wise operations)

$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$

$\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \bigl(a^{[2]} - y\bigr)\, a_i^{[1]}$

$\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)$

$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \bigl(a_i^{[1]}\bigr)^2\bigr)\, x_j$
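The training code on the slide is a screenshot; the sketch below applies the element-wise formulas above with the hypothetical ShallowNN from the previous sketch (learning rate and epoch count are arbitrary choices, not the class settings):

```python
# Element-wise gradient-descent training; relies on the ShallowNN sketch above.
def train_elementwise(model, x_data, y_data, eta=0.5, n_epoch=5000):
    h, n = model.W1.shape
    for _ in range(n_epoch):
        for x, y in zip(x_data, y_data):
            a1, a2 = model.forward(x)
            db2 = a2 - y                                        # dL/db[2]
            for i in range(h):
                dw2_i = (a2 - y) * a1[i]                        # dL/dw_i[2]
                db1_i = (a2 - y) * model.w2[i] * (1 - a1[i] ** 2)  # dL/db_i[1]
                for j in range(n):
                    model.W1[i, j] -= eta * db1_i * x[j]        # dL/dW_ij[1]
                model.w2[i] -= eta * dw2_i
                model.b1[i] -= eta * db1_i
            model.b2 -= eta * db2
    return model

# Usage (hypothetical):
# model = ShallowNN(n_in=2, n_hidden=3)
# train_elementwise(model, x, y)
# print([model.predict(p) for p in x])
```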


Test Results



Decision Boundary

▪ What does your decision boundary look like? (One way to plot it is sketched below.)
▪ Does it make sense? Is it good enough?
▪ If it doesn't look good enough, how can we improve it?
  ▪ Hint: increase Nepoch or the number of neurons
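A common way to draw the boundary (my own sketch, assuming the hypothetical ShallowNN model above): evaluate the model over a grid of points and draw a filled contour of its predictions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumes `model` is a trained instance of the hypothetical ShallowNN above
# and (x, y) is the XOR dataset.
def plot_decision_boundary(model, x, y, steps=200):
    g1, g2 = np.meshgrid(np.linspace(-0.5, 1.5, steps),
                         np.linspace(-0.5, 1.5, steps))
    grid = np.column_stack([g1.ravel(), g2.ravel()])
    preds = np.array([model.predict(p) for p in grid]).reshape(g1.shape)

    plt.contourf(g1, g2, preds, alpha=0.3)               # predicted regions
    plt.scatter(x[:, 0], x[:, 1], c=y, edgecolors='k')   # training points
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()
```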


Train (with Vector Operations) – Assignment

Complete this code (the question-mark blocks in the code screenshot are the parts to fill in).

$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$

$\nabla_{\boldsymbol{w}^{[2]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\,\boldsymbol{a}^{[1]}$

$\nabla_{\boldsymbol{b}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\,\boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}$

$\nabla_{\boldsymbol{W}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \bigl(\bigl(a^{[2]} - y\bigr)\,\boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}\bigr) \otimes \boldsymbol{x} = \nabla_{\boldsymbol{b}^{[1]}} L\bigl(a^{[2]}, y\bigr) \otimes \boldsymbol{x}$

where $\boldsymbol{e}^{[1]} = \boldsymbol{1} - \boldsymbol{a}^{[1]} \odot \boldsymbol{a}^{[1]}$


LECTURE09 ASSIGNMENT



Modeling XOR with a Shallow Neural Network
▪ Train a shallow neural network that acts like an XOR operator – We already did this
for the element-wise model.
▪ Goal: use vector operations
▪ Complete the code in “Train (with Vector Operations) – Assignment” slide
▪ There are a total of 3 question mark blocks. You need to fill in those lines.
▪ Report requirements
▪ Source code
▪ Model part (vector operations, including initialization, forward pass)
▪ Training part
▪ Test result - model.predict part
▪ Decision Boundary plot - Add a few sentences of analysis.
▪ Is it good enough?
▪ How many neurons did you use? What value of Nepoch did you use?
▪ Due: 4/11 Tue 8:59AM
