AI System Semiconductor Design
Lecture 9 (In-person): Shallow Neural Network
4/2/2025
Lecturer: Taewook Kang
Acknowledgments
Lecture material adapted from
Prof. Woowhan Jung, DSLAB, Hanyang Univ.
REVIEW: SHALLOW NEURAL NETWORK
Shallow Neural Network
Parameters: $\boldsymbol{\theta} = \{\boldsymbol{W}^{[1]}, \boldsymbol{b}^{[1]}, \boldsymbol{w}^{[2]}, b^{[2]}\}$

[Diagram: inputs $x_1, x_2, x_3$ feed a hidden layer of tanh units, whose outputs feed a single sigmoid unit producing $\hat{y}$.]

Loss: $L(\hat{y}, y)$
Cost: $J(\boldsymbol{\theta}) = \frac{1}{m}\sum_{q=1}^{m} L\bigl(\hat{y}^{(q)}, y^{(q)}\bigr)$

Gradient descent:
$\boldsymbol{\theta} \coloneqq \boldsymbol{\theta} - \eta \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \boldsymbol{\theta} - \eta \frac{1}{m}\sum_{q=1}^{m} \nabla_{\boldsymbol{\theta}} L\bigl(\hat{y}^{(q)}, y^{(q)}\bigr)$

So, we will compute $\nabla_{\boldsymbol{\theta}} L\bigl(\hat{y}^{(q)}, y^{(q)}\bigr)$ for $1 \le q \le m$.

Forward pass:
$\boldsymbol{z}^{[1]} = \boldsymbol{W}^{[1]}\boldsymbol{x} + \boldsymbol{b}^{[1]}$
$\boldsymbol{a}^{[1]} = \tanh\bigl(\boldsymbol{z}^{[1]}\bigr)$ (element-wise)
$z^{[2]} = \boldsymbol{w}^{[2]T}\boldsymbol{a}^{[1]} + b^{[2]}$
$\hat{y} = a^{[2]} = \sigma\bigl(z^{[2]}\bigr)$

Partial derivatives to compute:
$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}}$
$\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}}$ for $1 \le i \le h$
$\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}}$ for $1 \le i \le h$
$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}}$ for $1 \le i \le h$, $1 \le j \le n$
Partial Derivatives: 2nd layer
Parameters: $\boldsymbol{W}^{[1]}, \boldsymbol{b}^{[1]}, \boldsymbol{w}^{[2]}, b^{[2]}$
Forward pass: $\boldsymbol{z}^{[1]} = \boldsymbol{W}^{[1]}\boldsymbol{x} + \boldsymbol{b}^{[1]}$, $\boldsymbol{a}^{[1]} = \tanh(\boldsymbol{z}^{[1]})$, $z^{[2]} = \boldsymbol{w}^{[2]T}\boldsymbol{a}^{[1]} + b^{[2]}$, $a^{[2]} = \sigma(z^{[2]})$, loss $L(a^{[2]}, y)$
Binary cross entropy: $L(a^{[2]}, y) = -y \log a^{[2]} - (1-y)\log\bigl(1 - a^{[2]}\bigr)$

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial b^{[2]}}$

Using $\dfrac{d\sigma(x)}{dx} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$:

$= \left(\dfrac{-y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}}\right) \sigma\bigl(z^{[2]}\bigr)\bigl(1 - \sigma(z^{[2]})\bigr) \cdot 1$
$= \left(\dfrac{-y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}}\right) a^{[2]}\bigl(1 - a^{[2]}\bigr)$
$= -y\bigl(1 - a^{[2]}\bigr) + a^{[2]}(1 - y) = a^{[2]} - y$

Same as in logistic regression.
Partial Derivatives: 2nd layer (cont.)

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial w_i^{[2]}}$

$= \bigl(a^{[2]} - y\bigr) \dfrac{\partial z^{[2]}}{\partial w_i^{[2]}}$
$= \bigl(a^{[2]} - y\bigr)\, a_i^{[1]}$
Partial Derivatives: 1st layer

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial a_i^{[1]}} \dfrac{\partial a_i^{[1]}}{\partial z_i^{[1]}} \dfrac{\partial z_i^{[1]}}{\partial b_i^{[1]}}$

Using $\dfrac{d\tanh(x)}{dx} = 1 - \tanh^2(x)$:

$= \bigl(a^{[2]} - y\bigr) \dfrac{\partial z^{[2]}}{\partial a_i^{[1]}} \dfrac{\partial a_i^{[1]}}{\partial z_i^{[1]}} \dfrac{\partial z_i^{[1]}}{\partial b_i^{[1]}}$
$= \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - \tanh^2 z_i^{[1]}\bigr) \cdot 1$
$= \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)$
Partial Derivatives: 1st layer (cont.)

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \dfrac{\partial L(a^{[2]}, y)}{\partial a^{[2]}} \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \dfrac{\partial z^{[2]}}{\partial a_i^{[1]}} \dfrac{\partial a_i^{[1]}}{\partial z_i^{[1]}} \dfrac{\partial z_i^{[1]}}{\partial W_{ij}^{[1]}}$
$= \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr) \dfrac{\partial z_i^{[1]}}{\partial W_{ij}^{[1]}}$

Note: $z_i^{[1]} = \boldsymbol{w}_i^{[1]T}\boldsymbol{x} + b_i^{[1]} = \sum_{j=1}^{n} W_{ij}^{[1]} x_j + b_i^{[1]}$, so $\dfrac{\partial z_i^{[1]}}{\partial W_{ij}^{[1]}} = x_j$.

$\Rightarrow \dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)\, x_j$
Partial derivatives
▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$
▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \bigl(a^{[2]} - y\bigr)\, a_i^{[1]}$
▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)$
▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)\, x_j$
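As a sanity check on the first formula, $\partial L / \partial b^{[2]} = a^{[2]} - y$ can be verified numerically with a central finite difference. A minimal sketch, not from the lecture, using arbitrary toy values for a1, w2, b2, and y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(a2, y):
    # binary cross entropy L(a2, y)
    return -y * np.log(a2) - (1 - y) * np.log(1 - a2)

# arbitrary toy values for one sample
a1 = np.array([0.3, -0.7])   # hidden activations a[1]
w2 = np.array([0.5, -1.2])   # second-layer weights w[2]
b2, y = 0.1, 1.0

a2 = sigmoid(w2 @ a1 + b2)
eps = 1e-6
numeric = (bce(sigmoid(w2 @ a1 + b2 + eps), y)
           - bce(sigmoid(w2 @ a1 + b2 - eps), y)) / (2 * eps)
print(numeric, a2 - y)       # the two values should match closely
```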
Partial Derivatives and Gradients
Vectorization: collecting the per-component partial derivatives into vectors and matrices gives

▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$ (scalar, unchanged)
▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \bigl(a^{[2]} - y\bigr) a_i^{[1]}$  ⟹  $\nabla_{\boldsymbol{w}^{[2]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\, \boldsymbol{a}^{[1]}$
▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \bigl(a^{[2]} - y\bigr) w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)$  ⟹  $\nabla_{\boldsymbol{b}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\, \boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}$
▪ $\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr) w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr) x_j$  ⟹  $\nabla_{\boldsymbol{W}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \Bigl(\bigl(a^{[2]} - y\bigr)\, \boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}\Bigr) \otimes \boldsymbol{x}$ (outer product)

where $\boldsymbol{e}^{[1]} = \boldsymbol{1} - \boldsymbol{a}^{[1]} \odot \boldsymbol{a}^{[1]}$
⊙: element-wise product (a.k.a. Hadamard product)
⊗: outer product
REVIEW: LOGISTIC REGRESSION ASSIGNMENT – XOR DATA
Logistic regression: Boolean operators
▪ Training logistic regression models for Boolean operators
▪ Requirements
▪ AND, OR, XOR
▪ You need to build a dataset for each operator
▪ The model may not work for one of the operators
▪ Use numpy arrays
▪ Initialization with lists: x, y
▪ Random initialization: w, b
▪ Use numpy operations
▪ Inner product
▪ Addition
Lecture 5 Assignment
▪ Logistic regression on Boolean data
▪ Report required contents
1. Add source code for model & training
2. For each operator (AND, OR, XOR):
(1) Cost-vs-epoch plot
▪ Use different learning rates (at least 3 learning rates)
(2) Show predicted results for all input combinations
3. Explain whether the logistic regression model works well for the AND, OR, and XOR data
▪ For one operator, logistic regression won't work. Which one?
▪ Due: 2025/3/27 (Thu) 11:59 PM (after in-person Lecture 7)
▪ So you have two more classes (including today) in which to ask questions before finishing.
▪ Submit to iCampus
PYTHON PRACTICE: XOR CLASSIFICATION WITH SNN
Slicing a numpy array
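The slide's live code demo did not survive extraction; a minimal sketch of the basic numpy slicing patterns it likely covers (all values are illustrative):

```python
import numpy as np

x = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]
print(x[2:5])              # elements 2..4     -> [2 3 4]
print(x[:3])               # first three       -> [0 1 2]
print(x[::2])              # every other entry -> [0 2 4 6 8]

A = np.arange(12).reshape(3, 4)
print(A[1, :])             # second row    -> [4 5 6 7]
print(A[:, 2])             # third column  -> [2 6 10]
print(A[0:2, 1:3])         # 2x2 sub-block
```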
Slicing a numpy array with condition
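Again the demo code is missing; a minimal sketch of boolean-mask (conditional) slicing, with illustrative values:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
mask = x > 0               # boolean array [False False False True True]
print(x[mask])             # -> [1.5 3.]
print(x[x <= 0])           # condition written inline -> [-2. -0.5 0.]

# Handy for selecting the samples of one class, e.g. points labeled 1:
y = np.array([0, 1, 0, 1, 1])
print(x[y == 1])           # -> [-0.5 1.5 3.]
```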
Outer product
$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)\, x_j$
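In numpy the whole matrix of these entries is one call to np.outer, since np.outer(u, v)[i, j] == u[i] * v[j]. A small sketch with placeholder vectors u and v:

```python
import numpy as np

# u plays the role of (a[2]-y) * w[2] * (1 - a[1]**2), shape (h,)
# v plays the role of the input x,                     shape (n,)
u = np.array([1.0, 2.0, 3.0])
v = np.array([10.0, 20.0])
print(np.outer(u, v))      # shape (h, n); entry [i, j] is u[i] * v[j]
# [[10. 20.]
#  [20. 40.]
#  [30. 60.]]
```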
Data preparation
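The slide's code is not in the extracted text; a minimal sketch of preparing the XOR dataset (the array names X and Y are assumptions):

```python
import numpy as np

# XOR truth table: inputs (x1, x2) and labels y
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]], dtype=float)      # shape (4, 2): one sample per row
Y = np.array([0, 1, 1, 0], dtype=float)  # shape (4,)
```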
Data Plot - Two Different Ways
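The original plot code is missing; two plausible ways to scatter-plot the XOR data, assuming the X and Y arrays from the sketch above and matplotlib. Whether these match the slide's two ways is an assumption:

```python
import matplotlib.pyplot as plt

# Way 1: slice each class out with a boolean mask and plot separately
plt.scatter(X[Y == 0, 0], X[Y == 0, 1], marker='o', label='y = 0')
plt.scatter(X[Y == 1, 0], X[Y == 1, 1], marker='x', label='y = 1')
plt.legend()
plt.show()

# Way 2: one scatter call, with colors chosen by the label array
plt.scatter(X[:, 0], X[:, 1], c=Y)
plt.show()
```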
Model
$\boldsymbol{z}^{[1]} = \boldsymbol{W}^{[1]}\boldsymbol{x} + \boldsymbol{b}^{[1]}$
$\boldsymbol{a}^{[1]} = \tanh\bigl(\boldsymbol{z}^{[1]}\bigr)$
$z^{[2]} = \boldsymbol{w}^{[2]T}\boldsymbol{a}^{[1]} + b^{[2]}$
$\hat{y} = a^{[2]} = \sigma\bigl(z^{[2]}\bigr)$
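The slide's model code is not in the extracted text; a minimal sketch of the forward pass above. The hidden size h = 3 and all variable names are assumptions (for XOR the input dimension n is 2):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h, n = 3, 2                      # hidden neurons (assumed), input dimension
rng = np.random.default_rng(0)
W1 = rng.normal(size=(h, n))     # W[1]
b1 = rng.normal(size=h)          # b[1]
w2 = rng.normal(size=h)          # w[2]
b2 = rng.normal()                # b[2]

def forward(x):
    z1 = W1 @ x + b1             # z[1] = W[1] x + b[1]
    a1 = np.tanh(z1)             # a[1] = tanh(z[1]), element-wise
    z2 = w2 @ a1 + b2            # z[2] = w[2]^T a[1] + b[2]
    a2 = sigmoid(z2)             # y_hat = a[2] = sigma(z[2])
    return a1, a2
```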
Train (with element-wise operations)
$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$
$\dfrac{\partial L(a^{[2]}, y)}{\partial w_i^{[2]}} = \bigl(a^{[2]} - y\bigr)\, a_i^{[1]}$
$\dfrac{\partial L(a^{[2]}, y)}{\partial b_i^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)$
$\dfrac{\partial L(a^{[2]}, y)}{\partial W_{ij}^{[1]}} = \bigl(a^{[2]} - y\bigr)\, w_i^{[2]} \bigl(1 - (a_i^{[1]})^2\bigr)\, x_j$
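The original training code is also missing; a minimal per-sample gradient-descent sketch built on the model and data sketches above, translating the four formulas one index at a time. The learning rate and epoch count are assumed values:

```python
eta, n_epoch = 0.5, 5000              # learning rate and epochs (assumed)

for epoch in range(n_epoch):
    for q in range(X.shape[0]):       # per-sample (stochastic) updates
        x, y = X[q], Y[q]
        a1, a2 = forward(x)
        err = a2 - y                  # common factor a[2] - y
        # element-wise gradients, directly from the formulas above
        db2 = err
        dw2 = np.array([err * a1[i] for i in range(h)])
        db1 = np.array([err * w2[i] * (1.0 - a1[i] ** 2) for i in range(h)])
        dW1 = np.array([[db1[i] * x[j] for j in range(n)] for i in range(h)])
        # gradient descent step
        W1 -= eta * dW1
        b1 -= eta * db1
        w2 -= eta * dw2
        b2 -= eta * db2
```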
Test Results
Decision Boundary
▪ What does your decision boundary look like?
▪ Does it make sense? Is it good enough?
▪ If it doesn't look good enough, how can we improve it?
▪ Hint: increase Nepoch or increase the number of neurons
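One common way to inspect the boundary (not necessarily the slide's exact code) is to evaluate the trained model on a grid and draw the contour at $\hat{y} = 0.5$; a sketch reusing forward(), X, and Y from the sketches above:

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-0.5, 1.5, 200)
ys = np.linspace(-0.5, 1.5, 200)
XX, YY = np.meshgrid(xs, ys)
ZZ = np.array([forward(np.array([u, v]))[1]           # predicted y_hat
               for u, v in zip(XX.ravel(), YY.ravel())]).reshape(XX.shape)

plt.contourf(XX, YY, ZZ, levels=[0.0, 0.5, 1.0], alpha=0.3)
plt.scatter(X[Y == 0, 0], X[Y == 0, 1], marker='o', label='y = 0')
plt.scatter(X[Y == 1, 0], X[Y == 1, 1], marker='x', label='y = 1')
plt.legend()
plt.show()
```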
Train (with Vector Operations) – Assignment
Complete this code (the missing lines are marked with "?" in the provided code):

$\dfrac{\partial L(a^{[2]}, y)}{\partial b^{[2]}} = a^{[2]} - y$
$\nabla_{\boldsymbol{w}^{[2]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\, \boldsymbol{a}^{[1]}$
$\nabla_{\boldsymbol{b}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \bigl(a^{[2]} - y\bigr)\, \boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}$ ?
$\nabla_{\boldsymbol{W}^{[1]}} L\bigl(a^{[2]}, y\bigr) = \Bigl(\bigl(a^{[2]} - y\bigr)\, \boldsymbol{w}^{[2]} \odot \boldsymbol{e}^{[1]}\Bigr) \otimes \boldsymbol{x} = \nabla_{\boldsymbol{b}^{[1]}} L\bigl(a^{[2]}, y\bigr) \otimes \boldsymbol{x}$ ?

where $\boldsymbol{e}^{[1]} = \boldsymbol{1} - \boldsymbol{a}^{[1]} \odot \boldsymbol{a}^{[1]}$
LECTURE 9 ASSIGNMENT
Modeling XOR with a Shallow Neural Network
▪ Train a shallow neural network that acts like an XOR operator – we already did this for the element-wise model.
▪ Goal: use vector operations
▪ Complete the code in the "Train (with Vector Operations) – Assignment" slide
▪ There are a total of 3 question-mark blocks. You need to fill in those lines.
▪ Report requirements
▪ Source code
▪ Model part (vector operations, including initialization and the forward pass)
▪ Training part
▪ Test result – model.predict part
▪ Decision boundary plot – add a few sentences of analysis.
▪ Is it good enough?
▪ How many neurons did you use? How many Nepoch did you use?
▪ Due: 4/11 Tue 8:59 AM