Deep Learning Presentation
Attention Is All You Need
Igor Caetano Diniz
Introduction
• The Transformer was proposed in the paper Attention is All You Need.
• A TensorFlow implementation of it is available as part of the Tensor2Tensor package.
• Softmax function: softmax(x_i) = exp(x_i) / Σ_j exp(x_j) (sketched in code below)
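As an illustrative sketch (not from the original slides), the softmax used throughout the attention computation can be written in a few lines of numpy:

import numpy as np

def softmax(x, axis=-1):
    # Shift by the max for numerical stability, then normalize the exponentials.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))   # -> approximately [0.659, 0.242, 0.099]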
The Beast With Many Heads – Multi-head Attention
• Multi-head attention gives the attention layer multiple “representation subspaces” (see the sketch after these bullets).
• As we encode the word "it", one attention head is focusing most on "the animal", while another is focusing on "tired".
• In a sense, the model's representation of the word "it" bakes in some of the representation of both "animal" and "tired".
Positional Encoding
• Think of a Transformer with 2 stacked encoders and decoders.
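As an illustrative sketch for this slide, assuming the sinusoidal formulation from the paper, the positional encoding that gets added to the input embeddings can be computed like this:

import numpy as np

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d_model / 2)
    angles = pos / np.power(10000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe                                      # added to the input embeddings

print(positional_encoding(50, 512).shape)   # (50, 512)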
Residuals
• Cross-entropy and Kullback–Leibler divergence: two ways to compare the model's output probability distribution with the target distribution (see the example below).
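A small numpy example (with made-up numbers and a toy vocabulary) comparing a predicted output distribution against a one-hot target using cross-entropy and KL divergence:

import numpy as np

# Target distribution (one-hot for the correct output token) and model prediction
target = np.array([0.0, 1.0, 0.0, 0.0, 0.0])       # toy 5-word vocabulary
predicted = np.array([0.1, 0.6, 0.1, 0.1, 0.1])    # softmax output of the model

eps = 1e-12  # avoid log(0)
cross_entropy = -np.sum(target * np.log(predicted + eps))
kl_divergence = np.sum(target * np.log((target + eps) / (predicted + eps)))

print(cross_entropy)   # ~0.511
print(kl_divergence)   # equals cross-entropy minus the target's entropy (0 for a one-hot target)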
Thank you!