Deep Learning Course for NLP
Information Retrieval Lab
Faculty of Computer Science
Universitas Indonesia, 2017
Deep Learning Tsunami
"Deep Learning waves have lapped at the shores of computational linguistics for several years now, but 2015 seems like the year when the full force of the tsunami hit the major Natural Language Processing (NLP) conferences."
- Dr. Christopher D. Manning, Dec 2015
..., it is recommended that we first study the following topics:
• Gradient Descent/Ascent
References / Reading
• Andrej Karpathy's blog
  • http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Colah's blog
  • https://colah.github.io/
Deep Learning vs Machine Learning
• Deep Learning is a subfield of Machine Learning
• Machine Learning is a subfield of Artificial Intelligence
Machine Learning (rule-based)
Legend: designed by hand vs. inferred automatically

Input: "Buku ini sangat menarik dan penuh manfaat"
("This book is very interesting and useful")

Hand-written rule:
  if contains('menarik'):
      return positive
  ...

Predicted label: positive
Machine Learning (classical ML)
Legend: designed by hand vs. inferred automatically

Input: "Buku ini sangat menarik dan penuh manfaat"

Feature Engineering!
Hand-designed feature extractor: for example, TF-IDF features, syntactic information from a POS tagger, etc. A learned classifier then maps these features to the output.

Predicted label: positive
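As an illustration of this classical-ML recipe, here is a minimal sketch (assuming scikit-learn is available; the training texts and labels are made-up examples) that pairs a hand-designed TF-IDF feature extractor with a learned classifier:

# Minimal sketch of the classical ML pipeline: hand-designed features + learned classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical examples, not from the slides).
texts = ["Buku ini sangat menarik dan penuh manfaat", "Buku ini membosankan"]
labels = ["positive", "negative"]

# TF-IDF is the hand-designed feature extractor; logistic regression is the learned part.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Buku ini sangat menarik"]))  # e.g. ['positive']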
Machine Learning (Representation Learning)
Legend: designed by hand vs. inferred automatically

Input: "Buku ini sangat menarik dan penuh manfaat"

Learned feature extractor: for example, a Restricted Boltzmann Machine, an autoencoder, etc.

Predicted label: positive
Machine Learning (Deep Learning)
Legend: designed by hand vs. inferred automatically

Input: "Buku ini sangat menarik dan penuh manfaat"

Simple features → complex/high-level features: the representation is learned layer by layer.

Predicted label: positive
History

The Perceptron (Rosenblatt, 1958)
• The Perceptron consists of 3 layers: Sensory, Association, and Response.
• The activation function is a non-linear function. In Rosenblatt's perceptron, the activation function is simple thresholding (a step function).

Rosenblatt, Frank. "The perceptron: a probabilistic model for information storage and organization in the brain." Psychological Review 65.6 (1958): 386.
https://www.datarobot.com/blog/a-primer-on-deep-learning/
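A minimal sketch of a Rosenblatt-style perceptron unit with a step activation (NumPy assumed; the weights and input values are made-up):

import numpy as np

def step(z):
    # Simple thresholding activation, as in Rosenblatt's perceptron.
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    # Weighted sum of the inputs followed by the step non-linearity.
    return step(np.dot(w, x) + b)

x = np.array([1.0, 0.0, 1.0])    # made-up input features
w = np.array([0.5, -0.2, 0.3])   # made-up weights
print(perceptron(x, w, b=-0.4))  # -> 1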
The Fathers of Deep Learning(?)
• In 2006, these three researchers developed ways to overcome the problems of training deep neural networks and to put them to use.
• Before that, many people had given up on neural networks, both on their usefulness and on how to train them.
https://www.datarobot.com/blog/a-primer-on-deep-learning/

The Fathers of Deep Learning(?)
• Automated learning of data representations and features is what the hype is all about!
https://www.datarobot.com/blog/a-primer-on-deep-learning/
Why wasn't "deep learning" successful before?
• Many of the key ideas had already been invented much earlier.
• Even the Long Short-Term Memory (LSTM) network, which is now widely used in NLP, was invented in 1997 by Hochreiter & Schmidhuber.
Why wasn't "deep learning" successful before?
• Computers were slow. So the neural networks of the past were tiny. And tiny neural networks cannot achieve very high performance on anything.
• [...] deep learning that works in practice.

"The success of Deep Learning hinges on a very fortunate fact: that well-tuned and carefully-initialized stochastic gradient descent (SGD) can train LDNNs on problems that occur in practice. It is not a [...] And yet, somehow, SGD seems to be very good at training those large deep neural networks on the tasks that we care about. The problem of training neural networks is NP-hard, and in fact there exists a family of datasets such that the problem of finding the best neural network with three hidden units is NP-hard. And yet, SGD just solves it in practice."
Artificial Neural Networks (ANNs)
• And a neural network is really just a stack of mathematical functions.

Express the problem as a function F (with parameters θ), then automatically search for the parameters θ so that F produces exactly the desired output:

  Y = F(X; θ)

  X: "Buku ini sangat menarik dan penuh manfaat"
What is Deep Learning?
For deep learning, that function usually consists of a stack of many, usually similar, functions:

  Y = F(F(F(X; θ1); θ2); θ3)

The picture of this stack is often called a computational graph; the stack of functions itself is often called a stack of layers:

  F(X; θ3)
  F(X; θ2)
  F(X; θ1)

  "Buku ini sangat menarik dan penuh manfaat"
What is Deep Learning?
• The most common/well-known layer is the Fully-Connected Layer:

  Y = F(X) = f(W·X + b)

• "a weighted sum of its inputs, followed by a non-linear function"
• Here X ∈ R^N (N input units), W ∈ R^(M×N), b ∈ R^M (M output units), and f is the non-linearity; each output unit computes f(Σ_i w_i x_i + b).
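A minimal NumPy sketch of such a fully-connected layer (the sizes M = 2, N = 3 and the random weight values are made-up; tanh stands in for the non-linearity f):

import numpy as np

def fully_connected(X, W, b, f=np.tanh):
    # Weighted sum of the inputs followed by a non-linearity: Y = f(W.X + b)
    return f(W @ X + b)

N, M = 3, 2                      # N input units, M output units
X = np.array([0.5, -1.0, 2.0])   # X in R^N (made-up input)
W = np.random.randn(M, N)        # W in R^(M x N)
b = np.zeros(M)                  # b in R^M
print(fully_connected(X, W, b))  # Y in R^M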
Why do we need it to be "Deep"?
• Humans organize their ideas and concepts hierarchically
• Humans first learn simpler concepts and then compose them to represent more abstract ones
• Engineers break up solutions into multiple levels of abstraction and processing

Y. Bengio, Deep Learning, MLSS 2015, Austin, Texas, Jan 2014
(Bengio & Delalleau, 2011)
Neural Networks
One layer:
  Y = f(W1·X + b1)

Neural Networks
Two layers:
  Y = f(W2·f(W1·X + b1) + b2)
  i.e.  H1 = f(W1·X + b1),  Y = f(W2·H1 + b2)

Neural Networks
Three layers:
  Y = f(W3·f(W2·f(W1·X + b1) + b2) + b3)
  i.e.  H1 = f(W1·X + b1),  H2 = f(W2·H1 + b2),  Y = f(W3·H2 + b3)
A mathematical reason: why must it be "deep"?
• A feed-forward network with a single hidden layer containing enough units can approximate any continuous function arbitrarily well (the universal approximation theorem).

A mathematical reason: why must it be "deep"?
However...
• "Enough units" can be a very large number. There are functions representable with a small but deep network that would require exponentially many units with a single layer.
• The proof only says that a shallow network exists; it does not say [...]
Training Neural Networks
• We randomly initialize all of the parameters W1, b1, W2, b2, W3, b3.

(Figure: a 3-layer network with weights W(1), W(2), W(3) over the input "Buku ini sangat baik dan mendidik".)
Training Neural Networks
• Initialize the trainable parameters randomly
• Loop: x = 1 → #epoch:
  • Pick a training example, e.g. x = "Buku ini sangat baik dan mendidik" with true label y' = (1, 0) over (pos, neg)
  • Compute the output by doing a feed-forward pass through the network (W(1), W(2), ...), giving a predicted label, e.g. y = (0.3, 0.7)
  • Compute the loss L between the predicted output y and the true label y'
  • ...
https://github.com/joshdk/pygradesc
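The training loop above, written out as a minimal NumPy sketch for a tiny 2-layer classifier (the architecture, toy data, and learning rate are made-up for illustration; the gradient step uses a sigmoid output with binary cross-entropy):

import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 4 examples with 3 features, binary labels (made-up).
X = rng.normal(size=(4, 3))
Y = np.array([1, 0, 1, 0])

# Initialize trainable parameters randomly.
W1, b1 = rng.normal(size=(4, 3)) * 0.1, np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)) * 0.1, np.zeros(1)
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(100):          # Loop: x = 1 -> #epoch
    for x, y in zip(X, Y):        # Pick a training example
        # Feed-forward pass
        h = np.tanh(W1 @ x + b1)
        y_hat = sigmoid(W2 @ h + b2)[0]
        # Loss (binary cross-entropy) between prediction and true label
        loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
        # Back-propagate and update the parameters (one SGD step)
        d_out = y_hat - y                     # dL/dz for sigmoid + cross-entropy
        dW2 = d_out * h[None, :]
        db2 = np.array([d_out])
        dh = (W2[0] * d_out) * (1 - h ** 2)   # back through tanh
        dW1 = np.outer(dh, x)
        db1 = dh
        W2 -= lr * dW2
        b2 -= lr * db2
        W1 -= lr * dW1
        b1 -= lr * db1

# Predicted probabilities for the 4 training examples after training.
print(sigmoid(W2 @ np.tanh(W1 @ X.T + b1[:, None]) + b2).round(2))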
More Technical Details …

Gradient Descent (GD)
Used to find a configuration of the parameters such that the cost function becomes optimal, in this case reaching a local minimum.

For example, suppose we start from x = 2.0 and follow the negative gradient until we reach a local minimum.
Gradient Descent (GD)
Algorithm:
  x_(t+1) = x_t − α_t · f′(x_t)
  If |f′(x_(t+1))| < ε then return "converged on critical point"
  If |x_t − x_(t+1)| < ε then return "converged on x value"

Tip: choose a step size α_t that is neither too small nor too large.
Gradient Descent (GD)
For a parameter vector θ and cost function f:

  initialize θ^(0)
  while not converged:
      θ^(t+1) = θ^(t) − α_t · ∇_θ f(θ^(t))

https://github.com/joshdk/pygradesc
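A minimal sketch of the update rule above for a one-dimensional function (the cost f(x) = x^4 − 3x^3 + 2, its derivative, the starting point, and the step size are made-up illustrations):

def f_prime(x):
    # Derivative of the made-up cost f(x) = x**4 - 3*x**3 + 2
    return 4 * x**3 - 9 * x**2

x, alpha = 2.0, 0.01                  # start from x = 2.0 with a fixed step size
for t in range(1000):
    x_new = x - alpha * f_prime(x)
    if abs(f_prime(x_new)) < 1e-8:    # converged on a critical point
        break
    if abs(x_new - x) < 1e-12:        # converged on the x value
        break
    x = x_new
print(x)  # approaches the local minimum at x = 2.25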
Logistic Regression
Used for binary classification (yes or no):

  P(y = 1 | x; θ) = σ(θᵀx)
  P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)

where the sigmoid is
  σ(z) = 1 / (1 + e^(−z))

Logistic Regression
Viewed as a single unit with inputs x1, x2, x3 (plus a bias input +1) and weights θ1, θ2, θ3, θ0, with the sigmoid function as the activation function:

  P(y = 1 | x; θ) = σ(θ0 + θ1 x1 + θ2 x2 + θ3 x3)
  P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
Logistic Regression
What if θ has not been determined yet?
We can estimate the parameter θ using the available training data {(x(1), y(1)), (x(2), y(2)), …, (x(n), y(n))}.

Logistic Regression
Learning
Let
  h_θ(x) = σ(θ0 + θ1 x1 + … + θn xn) = σ(θ0 + Σ_(i=1..n) θi xi)

Maximizing the log-likelihood ℓ(θ) = log L(θ) is equivalent to minimizing the cost
  J(θ) = − Σ_(i=1..m) [ y(i) log h_θ(x(i)) + (1 − y(i)) log(1 − h_θ(x(i))) ]
Logistic Regression
Learning
Taking the derivative of the cost gives, for each parameter θj:

  ∂J(θ)/∂θj = (h_θ(x) − y) · xj

Gradient descent then repeats, until convergence, the updates

  while not converged:
      θ1 := θ1 − α · (1/m) Σ_(i=1..m) (h_θ(x(i)) − y(i)) · x1(i)
      ...
      θn := θn − α · (1/m) Σ_(i=1..m) (h_θ(x(i)) − y(i)) · xn(i)
Logistic Regression
Learning
  initialize θ1, θ2, …, θn
  while not converged:
      update each θj as above

Logistic Regression
Learning
In practice (stochastic/mini-batch gradient descent), the gradient is computed as the average/sum over a mini-batch of samples (e.g., 32 or 64 samples) rather than over the entire training set.
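A minimal NumPy sketch of this mini-batch update (the toy data, batch size, and learning rate are made-up; θ0 is handled by appending a constant 1 feature):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)        # toy binary labels
X = np.hstack([np.ones((100, 1)), X])            # prepend 1 for the bias theta_0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(4)
alpha, batch_size = 0.1, 32
for epoch in range(200):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        h = sigmoid(X[b] @ theta)                # h_theta(x) for the mini-batch
        grad = X[b].T @ (h - y[b]) / len(b)      # average of (h - y) * x over the batch
        theta -= alpha * grad                    # gradient descent step
print(theta)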
Multilayer Neural Network (Multilayer Perceptron)
(Figure: the single logistic-regression unit from the previous slides, generalized into a network of such units.)

For example, consider a 3-layer NN with 3 input units, 2 hidden units, and 2 output units (weights W(1)_ij between the input and hidden layers, W(2)_ij between the hidden and output layers).

From the previous example, there are 2 units in the output layer. This setup is commonly used for binary classification: the first unit produces the probability of the first class, and the second unit the probability of the second class.
Multilayer Neural Network (Multilayer Perceptron)
To compute the output at the hidden layer:

  z(2) = W(1) x + b(1)
  a1(2) = f(z1(2))
  a2(2) = f(z2(2))

This is just a matrix multiplication!

  z(2) = [ W11(1)  W12(1)  W13(1) ] [ x1 ]   [ b1(1) ]
         [ W21(1)  W22(1)  W23(1) ] [ x2 ] + [ b2(1) ]
                                    [ x3 ]

Multilayer Neural Network (Multilayer Perceptron)
In vector form:

  z(2) = W(1) x + b(1)
  a(2) = f(z(2))
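A minimal NumPy sketch of exactly this computation for the 3-2-2 network above (the weight and input values are made-up):

import numpy as np

f = np.tanh                                   # some non-linearity f
x  = np.array([1.0, 2.0, 3.0])                # 3 input units (made-up values)
W1 = np.array([[0.1, 0.2, 0.3],               # W(1) has shape (2 hidden, 3 input)
               [0.4, 0.5, 0.6]])
b1 = np.array([0.01, 0.02])                   # b(1) has shape (2,)

z2 = W1 @ x + b1                              # z(2) = W(1) x + b(1): one matrix multiply
a2 = f(z2)                                    # a(2) = f(z(2)), the hidden activations
print(z2, a2)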
Multilayer Neural Network (Multilayer Perceptron)
Learning
The per-example cost is the squared error

  J(W, b; x, y) = (1/2) ‖h_(W,b)(x) − y‖² = (1/2) Σ_j ( h_(W,b)(x)_j − y_j )²

and the overall cost adds a regularization term over all weights:

  (λ/2) Σ_(l=1..n_l−1) Σ_(i=1..s_l) Σ_(j=1..s_(l+1)) ( W_ji(l) )²
Multilayer Neural Network (Multilayer Perceptron)
Learning
  initialize W, b
  while not converged:
      W_ij(l) := W_ij(l) − α · ∂J(W,b)/∂W_ij(l)
      b_i(l)  := b_i(l)  − α · ∂J(W,b)/∂b_i(l)

How do we compute the partial derivatives ∂J(W,b)/∂W_ij(l) and ∂J(W,b)/∂b_i(l)?
Multilayer Neural Network (Multilayer Perceptron)
Learning
The per-example derivatives of J(W,b; x, y) determine the overall partial derivative of J(W,b), e.g.

  ∂J(W,b)/∂b_i(l) = (1/m) Σ_(k=1..m) ∂J(W,b; x(k), y(k)) / ∂b_i(l)
Back-Propagation
1. Run the feed-forward pass.
2. For each output unit i in layer n_l (the output layer):
     δ_i(n_l) = ∂J(W,b; x, y)/∂z_i(n_l) = ( a_i(n_l) − y_i ) · f′( z_i(n_l) )
3. Propagate the errors backwards through the hidden layers:
     δ_i(l) = ( Σ_j W_ji(l) δ_j(l+1) ) · f′( z_i(l) )
4. The partial derivatives are then
     ∂J(W,b; x, y)/∂W_ij(l) = a_j(l) · δ_i(l+1)
     ∂J(W,b; x, y)/∂b_i(l)  = δ_i(l+1)
Multilayer Neural Network (Multilayer Perceptron)
Learning
Back-Propagation: an example of computing a gradient at the output layer.

(Figure: output pre-activations z1(3), z2(3) with activations a1(3), a2(3), connected to the hidden activations a1(2), a2(2) through weights W11(2), W12(2), W21(2), W22(2) and biases b1(2), b2(2).)

  J(W, b; x, y) = (1/2) ( a1(3) − y1 )² + (1/2) ( a2(3) − y2 )²
  a1(3) = f( z1(3) ),   z1(3) = W11(2) a1(2) + W12(2) a2(2) + b1(2)

By the chain rule,

  ∂J(W,b; x, y)/∂W12(2) = ∂J/∂a1(3) · ∂a1(3)/∂z1(3) · ∂z1(3)/∂W12(2)
                        = ( a1(3) − y1 ) · f′( z1(3) ) · a2(2)
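A small NumPy check of this chain-rule result, comparing the analytic gradient with a finite-difference estimate (the choice of sigmoid for f and all numeric values are made-up):

import numpy as np

f  = lambda z: 1 / (1 + np.exp(-z))        # made-up choice for f (sigmoid)
df = lambda z: f(z) * (1 - f(z))           # its derivative f'

a_hidden = np.array([0.3, 0.7])            # a1(2), a2(2): made-up hidden activations
W = np.array([0.5, -0.2])                  # W11(2), W12(2)
b, y1 = 0.1, 1.0                           # bias b1(2) and target y1

def half_sq_error(W12):
    z1 = W[0] * a_hidden[0] + W12 * a_hidden[1] + b
    return 0.5 * (f(z1) - y1) ** 2

z1 = W[0] * a_hidden[0] + W[1] * a_hidden[1] + b
analytic = (f(z1) - y1) * df(z1) * a_hidden[1]           # (a1(3)-y1) f'(z1(3)) a2(2)
eps = 1e-6
numeric = (half_sq_error(W[1] + eps) - half_sq_error(W[1] - eps)) / (2 * eps)
print(analytic, numeric)                   # the two values agree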
Sensitivity – the Jacobian Matrix
The Jacobian J is the matrix of partial derivatives of the network output vector y with respect to the input vector x:

  J_ki = ∂y_k / ∂x_i
Recurrent Neural Networks
(Figure: an RNN unrolled over time, with inputs X1…X5, hidden states h1…h5, and outputs O1…O5.)
• …
• The point is: there are sequences.

(Figure, from Karpathy's blog: different input/output shapes)
• Not RNNs (vanilla feed-forward NNs)
• Sequence input (e.g. sentence classification)
• Sequence output (e.g. image captioning)
• Sequence input/output (e.g. machine translation)

http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Recurrent Neural Networks (RNNs)
RNNs combine the input vector with their state vector using a fixed (but learned) function to produce a new state vector.

Suppose there are I input units, K output units, and H hidden (state) units:

  x_t ∈ R^(I×1),  h_t ∈ R^(H×1),  y_t ∈ R^(K×1)
  h_t = W^(xh) x_t + W^(hh) s_(t−1),   with h_0 = 0
  s_t = tanh(h_t)
  y_t = W^(hy) s_t
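A minimal NumPy sketch of this forward recurrence (the sizes I = 4, H = 3, K = 2, the inputs, and the random weights are made-up):

import numpy as np

rng = np.random.default_rng(0)
I, H, K, T = 4, 3, 2, 5                 # made-up sizes and sequence length
W_xh = rng.normal(size=(H, I)) * 0.1
W_hh = rng.normal(size=(H, H)) * 0.1
W_hy = rng.normal(size=(K, H)) * 0.1

xs = rng.normal(size=(T, I))            # made-up input sequence x_1 .. x_T
s = np.zeros(H)                         # state from "h_0 = 0"
ys = []
for x_t in xs:
    h_t = W_xh @ x_t + W_hh @ s         # h_t = W^(xh) x_t + W^(hh) s_(t-1)
    s = np.tanh(h_t)                    # s_t = tanh(h_t)
    ys.append(W_hy @ s)                 # y_t = W^(hy) s_t
print(np.stack(ys).shape)               # (T, K)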
Recurrent Neural Networks (RNNs)
Back-Propagation Through Time (BPTT)
The loss function depends on the activation of the hidden layer not only through its influence on the output layer, but also through its influence on the hidden layer at the next time step.

  δ_(i,t)^(y) = ∂L/∂y_(i,t)   and   δ_(i,t)^(h) = ∂L/∂h_(i,t)
  δ_(i,t)^(h) = ( Σ_(j=1..K) W_(i,j)^(hy) δ_(j,t)^(y) + Σ_(n=1..H) W_(i,n)^(hh) δ_(n,t+1)^(h) ) · f′(h_(i,t))

at every step except the rightmost one; at the end of the sequence, δ_(i,T+1)^(h) = 0.

Alex Graves, Supervised Sequence Labelling with Recurrent Neural Networks
Recurrent Neural Networks (RNNs)
Back-Propagation Through Time (BPTT)
Since the same weights are reused at every timestep, we sum over the whole sequence to get the derivatives with respect to the network weights:

  ∂L/∂W_(i,j)^(hy) = Σ_(t=1..T) δ_(j,t)^(y) · s_(i,t)
  ∂L/∂W_(i,j)^(hh) = Σ_(t=1..T) δ_(j,t)^(h) · s_(i,t−1)
  ∂L/∂W_(i,j)^(xh) = Σ_(t=1..T) δ_(j,t)^(h) · x_(i,t)
Recurrent Neural Networks (RNNs)
Back-Propagation Through Time (BPTT)

For example, for the state-to-state parameters W^(hh) (recall h_t = W^(xh) x_t + W^(hh) s_(t−1)):

  ∂L_t/∂W^(hh) = Σ_(k=1..t) (∂L_t/∂h_t) (∂h_t/∂h_k) (∂h_k/∂W^(hh))

These terms are called temporal contributions: they describe how W^(hh) at step k affects the cost at the later steps (t > k).
Recurrent Neural Networks (RNNs)
Vanishing & Exploding Gradient Problems
Bengio et al. (1994) said that "the exploding gradients problem refers to the large increase in the norm of the gradient during training. Such events are caused by the explosion of the long term components, which can grow exponentially more than short term ones."

And "the vanishing gradients problem refers to the opposite behaviour, when long term components go exponentially fast to norm 0, making it impossible for the model to learn correlation between temporally distant events."

Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks.

Recurrent Neural Networks (RNNs)
Vanishing & Exploding Gradient Problems
The sequential Jacobian is commonly used to analyse how RNNs make use of context.
2) Define a new architecture inside the RNN cell, such as the Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997).

Variant: Bi-Directional RNNs
(Figure: forward and backward hidden layers over the same input sequence.)
Long Short-Term Memory (LSTM)
2. These blocks can be thought of as a differentiable version of the memory chips in a digital computer.
3. Each block contains one or more self-connected memory cells and three multiplicative units: the input, output, and forget gates.

The gates allow LSTM memory cells to store and access information over long periods of time, thereby mitigating the vanishing gradient problem.

Alex Graves, Supervised Sequence Labelling with Recurrent Neural Networks
S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997.
Long Short-Term Memory (LSTM)
(Figure: an LSTM memory block with its input, output, and forget gates.)

Computation in the LSTM
(Figure: the LSTM gate and cell equations.)

Alex Graves, Supervised Sequence Labelling with Recurrent Neural Networks
S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997.
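Since the computation itself is only shown as a figure, here is a minimal NumPy sketch of one step of a standard gated LSTM cell (input, forget, and output gates with a tanh candidate; the sizes, random weights, and input are made-up, and details such as peephole connections are omitted):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One step of a standard LSTM cell: gates use sigmoid, the candidate uses tanh.
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate cell value
    c_t = f * c_prev + i * g                               # new cell state
    h_t = o * np.tanh(c_t)                                 # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
I, H = 4, 3                                                # made-up sizes
W = {k: rng.normal(size=(H, I)) * 0.1 for k in 'ifog'}
U = {k: rng.normal(size=(H, H)) * 0.1 for k in 'ifog'}
b = {k: np.zeros(H) for k in 'ifog'}
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=I), h, c, W, U, b)
print(h, c)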
Example: RNNs for a POS Tagger
(Zennaki, 2015)
(Figure: the sentence "I went to west java" tagged as PRP VBD TO JJ NN.)

LSTM + CRF for Semantic Role Labeling
(Zhou and Xu, ACL 2015)
Attention Mechanism
"A potential issue with this encoder-decoder approach is that a neural network needs to be able to compress all the necessary information of a source sentence into a fixed-length vector. This may make it difficult for the neural network to cope with long sentences, especially those that are longer than the sentences in the training corpus."

Attention Mechanism
(Figure: the sequence-to-sequence encoder-decoder architecture.)
Sutskever, Ilya et al., Sequence to Sequence Learning with Neural Networks, NIPS 2014.
https://blog.heuritech.com/2016/01/20/attention-mechanism/
https://blog.heuritech.com/2016/01/20/attention-mechanism/
AttentionMechanism
9 October 2017
Sutkever, Ilya et al., Sequence
to Sequence Learning with
Neural Networks, NIPS 2014.
9 October 2017
• Each time the proposed model generates a word in a translation, it
(soft-)searches for a set of positions in a source sentence where the
most relevant information is concentrated. The model then predicts
a target word based on the context vectors associated with these
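A minimal NumPy sketch of this soft-search: score every source position against the current decoder state, turn the scores into attention weights with a softmax, and take the weighted sum as the context vector (the dot-product scoring and all values are made-up simplifications of the actual model):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, d = 6, 8                                  # made-up source length and hidden size
source_states = rng.normal(size=(T, d))      # encoder states, one per source position
decoder_state = rng.normal(size=d)           # current decoder state

scores  = source_states @ decoder_state      # relevance score of each source position
weights = softmax(scores)                    # attention weights, sum to 1
context = weights @ source_states            # context vector: weighted sum of states
print(weights.round(2), context.shape)       # weights over 6 positions, context in R^8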
Attention Mechanism
(Figure: the attention-based encoder-decoder for translation.)

(Figure: an attention matrix; each cell represents an attention weight for the translation.)

Colin Raffel, Daniel P. W. Ellis, Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems, Workshop track - ICLR 2016
Attention Mechanism
(Figures from the cited paper.)
Yang, Zichao, et al., Hierarchical Attention Networks for Document Classification, NAACL 2016

Attention Mechanism
(Figure from the blog post below.)
https://blog.heuritech.com/2016/01/20/attention-mechanism/
Attention Mechanism
(Figures from the sources below.)
https://blog.heuritech.com/2016/01/20/attention-mechanism/
Xu, Kelvin, et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" (2016).
Attention Mechanism
The attention model is used to link the words in the premise with the words in the hypothesis.
For example:
• Premise: "A wedding party taking pictures"

Tim Rocktaschel et al., Reasoning about Entailment with Neural Attention, ICLR 2016

Attention Mechanism
(Figure from the cited paper.)
Tim Rocktaschel et al., Reasoning about Entailment with Neural Attention, ICLR 2016
Recursive Neural Networks
(Figure from the cited paper.)
R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.
Recursive Neural Networks
A parent representation is computed from its two children b and c:

  p1 = g(W·[b; c] + bias)

Socher et al., Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, EMNLP 2013
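A minimal NumPy sketch of that composition step (the dimensionality, the random weight matrix, and the child vectors are made-up; tanh stands in for g):

import numpy as np

rng = np.random.default_rng(0)
d = 4                                     # made-up word-vector dimensionality
W = rng.normal(size=(d, 2 * d)) * 0.1     # composition matrix: maps [b; c] back to d dims
bias = np.zeros(d)

b = rng.normal(size=d)                    # child vector b (e.g. a word or phrase)
c = rng.normal(size=d)                    # child vector c
p1 = np.tanh(W @ np.concatenate([b, c]) + bias)   # p1 = g(W.[b; c] + bias)
print(p1.shape)                           # (d,): the parent has the same dimensionality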
Recursive Neural Networks
(Figure from the cited paper.)
Socher et al., Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, EMNLP 2013
Convolutional Neural Networks (CNNs) for Sentence Classification
(Kim, EMNLP 2014)

Recursive Neural Network for SMT Decoding
(Liu et al., EMNLP 2014)