Sparse, Stacked and Variational Autoencoder
Venkata Krishna Jonnalagadda
Dec 6, 2018
Autoencoder:
Sparse Autoencoder:
An autoencoder takes an input image or vector and learns a code dictionary
that changes the raw input from one representation to another. A sparse
autoencoder adds a sparsity enforcer that directs a single-layer network to
learn a code dictionary which minimizes the error in reproducing the input
while restricting the number of code words used for reconstruction [8].
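One common way to impose such a sparsity constraint is an L1 activity regularizer on the code layer. Below is a minimal Keras sketch; the 784-dimensional input, the code size, and the regularization weight are illustrative assumptions rather than values from the article.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

input_dim = 784   # e.g. flattened 28x28 images (assumed for illustration)
code_dim = 64     # size of the learned code dictionary (assumed)

inputs = keras.Input(shape=(input_dim,))
# L1 activity regularization pushes most code units toward zero,
# acting as the "sparsity enforcer" on the single hidden layer.
code = layers.Dense(code_dim, activation="relu",
                    activity_regularizer=regularizers.l1(1e-5))(inputs)
outputs = layers.Dense(input_dim, activation="sigmoid")(code)

sparse_autoencoder = keras.Model(inputs, outputs)
sparse_autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# x_train: array of shape (num_samples, 784) with values in [0, 1]
# sparse_autoencoder.fit(x_train, x_train, epochs=20, batch_size=256)
```

Training the network to reproduce its own inputs (fit(x_train, x_train)) makes the regularized code layer learn a sparse representation of the data.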
Stacked Autoencoder:
· Train an autoencoder on the input data and acquire the learned representation.
· The learned representation from the previous layer is used as the input for the
next layer, and this continues until the training is completed.
· Once all the hidden layers are trained, use the backpropagation algorithm to
minimize the cost function; the weights are updated with the training set to
achieve fine-tuning (a minimal sketch of this layer-wise procedure is shown below).
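The sketch below illustrates that greedy layer-wise procedure in Keras; the layer sizes, the MNIST-style input, and the softmax classification head used for fine-tuning are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def train_single_autoencoder(data, code_dim, epochs=10):
    """Train one autoencoder on `data` and return its encoder and codes."""
    input_dim = data.shape[1]
    inputs = keras.Input(shape=(input_dim,))
    code = layers.Dense(code_dim, activation="relu")(inputs)
    recon = layers.Dense(input_dim, activation="sigmoid")(code)
    autoencoder = keras.Model(inputs, recon)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(data, data, epochs=epochs, batch_size=256, verbose=0)
    encoder = keras.Model(inputs, code)
    return encoder, encoder.predict(data, verbose=0)

# x_train: (num_samples, 784) array with values in [0, 1] (assumed)
# 1) Train the first autoencoder on the raw input and keep its codes.
# encoder1, codes1 = train_single_autoencoder(x_train, 128)
# 2) Train the next autoencoder on the learned data from the previous layer.
# encoder2, codes2 = train_single_autoencoder(codes1, 64)
# 3) Stack the trained encoders, add an output layer, and fine-tune the
#    whole network end-to-end with backpropagation on the supervised task.
# stacked = keras.Sequential([encoder1, encoder2,
#                             layers.Dense(10, activation="softmax")])
# stacked.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# stacked.fit(x_train, y_train, epochs=10, batch_size=256)
```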
Variational Autoencoder:
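A variational autoencoder encodes each input as the mean and log-variance of a latent Gaussian distribution, samples a code from that distribution, and is trained with a reconstruction loss plus a KL-divergence penalty toward a unit Gaussian prior. A minimal Keras sketch of this idea follows; the layer sizes and the two-dimensional latent space are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

input_dim, latent_dim = 784, 2   # illustrative sizes (assumptions)

class Sampling(layers.Layer):
    """Reparameterization trick: z = mean + sigma * epsilon. Also adds the
    KL divergence to the unit Gaussian prior as a regularization loss."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1.0 + z_log_var - tf.square(z_mean)
                          - tf.exp(z_log_var), axis=-1))
        self.add_loss(kl)
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# Encoder: predicts the mean and log-variance of the latent distribution.
enc_in = keras.Input(shape=(input_dim,))
h = layers.Dense(256, activation="relu")(enc_in)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
z = Sampling()([z_mean, z_log_var])

# Decoder: reconstructs the input from the sampled latent code.
h_dec = layers.Dense(256, activation="relu")(z)
recon = layers.Dense(input_dim, activation="sigmoid")(h_dec)

vae = keras.Model(enc_in, recon)
# Reconstruction loss via compile; the KL term was added inside Sampling.
vae.compile(optimizer="adam", loss="binary_crossentropy")
# vae.fit(x_train, x_train, epochs=30, batch_size=128)   # x_train in [0, 1]
```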
Figure 4: Layers in an Autoencoder [2]
Applications of Autoencoders:
Today, data denoising and dimensionality reduction for data visualization are
the two major applications of autoencoders. With dimensionality and
sparsity constraints, autoencoders can learn data projections that are better
than PCA. [11]
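For the denoising use case, the network is trained to map a corrupted version of each input back to the clean original, while the narrow bottleneck provides the learned low-dimensional projection. A minimal sketch; the noise level and layer sizes are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784   # e.g. flattened 28x28 images (assumed)

denoiser = keras.Sequential([
    keras.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),    # bottleneck / learned projection
    layers.Dense(128, activation="relu"),
    layers.Dense(input_dim, activation="sigmoid"),
])
denoiser.compile(optimizer="adam", loss="mse")

# Corrupt the inputs with Gaussian noise and train the network to
# reconstruct the clean originals.
# x_noisy = np.clip(x_train + 0.3 * np.random.normal(size=x_train.shape), 0.0, 1.0)
# denoiser.fit(x_noisy, x_train, epochs=20, batch_size=256)
```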
Autoencoders and their variants, such as stacked, sparse, or variational
autoencoders, are used for compact representation of data. For example, a
256x256-pixel image can be represented by a code the size of a 28x28-pixel
image. Google uses this type of network to reduce the amount of bandwidth
you use on your phone: when you download an image, the full-resolution image
is downscaled (encoded) and sent to you over the wireless network, and a
decoder on your phone reconstructs the image to full resolution.
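In rough outline, that pipeline is an encoder that downsamples the image to a small code (which is what gets transmitted) and a decoder on the device that reconstructs the full-resolution image. The sketch below illustrates the idea with a convolutional autoencoder; the image size, channel counts, and layer choices are illustrative assumptions, not Google's actual system.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: repeatedly downsample a 256x256 RGB image to a small code.
encoder = keras.Sequential([
    keras.Input(shape=(256, 256, 3)),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),  # 128x128
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),  # 64x64
    layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),  # 32x32
])

# Decoder: upsample the code back to the original resolution.
decoder = keras.Sequential([
    keras.Input(shape=(32, 32, 64)),
    layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(3, 3, padding="same", activation="sigmoid"),
])

autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# Only the small code (the encoder output) needs to be transmitted;
# the decoder on the receiving device reconstructs the full-resolution image.
# autoencoder.fit(images, images, epochs=10, batch_size=32)
```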
Machine translation has been studied since the late 1950s and is an incredibly
difficult problem: translating text from one human language to another. With
the use of autoencoders, machine translation has taken a huge leap forward in
accurately translating text from one language to another.
Reverberant speech recognition can use deep learning in both the front end and
the back end of a system. Such a model was built by Mimura, Sakai and Kawahara
(2015), who adopted a deep autoencoder (DAE) for enhancing the speech at the
front end, while recognition is performed by DNN-HMM acoustic models at the
back end [13].
To improve the accuracy of an ASR system on noisy utterances, a collection of
LSTM networks is trained to map the features of a noisy utterance to those of
a clean utterance. The figure below shows the model used by Marvin Coto,
John Goddard and Fabiola Martínez (2016).
Figure 10: source: Marvin Coto, John Goddard, Fabiola Martínez (2016)
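A rough sketch of that noisy-to-clean feature mapping with LSTMs; the feature dimension, number of layers, and training details are illustrative assumptions, not the authors' exact model.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_features = 40   # e.g. log-mel filterbank features per frame (assumed)

# Maps a sequence of noisy acoustic feature frames to clean feature frames.
feature_denoiser = keras.Sequential([
    keras.Input(shape=(None, num_features)),       # variable-length utterances
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(128, return_sequences=True),
    layers.TimeDistributed(layers.Dense(num_features)),
])
feature_denoiser.compile(optimizer="adam", loss="mse")

# noisy_feats, clean_feats: arrays of shape (num_utterances, frames, num_features)
# feature_denoiser.fit(noisy_feats, clean_feats, epochs=20, batch_size=16)
# The denoised features are then passed to the downstream ASR acoustic model.
```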
Figure 14: Colorful Image Colorization by Richard Zhang, Phillip Isola, Alexei A. Efros
An autoencoder can also let you make use of pretrained layers from another
model, applying transfer learning to prime the encoder/decoder.
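In practice this can look like the sketch below: take an encoder that was already trained (for example, the encoder half of an autoencoder trained on a large unlabeled dataset), freeze its weights, and fine-tune a new head on top. The `pretrained_encoder` object, layer sizes, and ten-class head are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# `pretrained_encoder` is assumed to be the encoder half of an autoencoder
# that was already trained elsewhere, e.g. loaded with:
# pretrained_encoder = keras.models.load_model("encoder.keras")

def build_classifier(pretrained_encoder, num_classes=10):
    pretrained_encoder.trainable = False       # keep the transferred weights fixed
    classifier = keras.Sequential([
        pretrained_encoder,                    # pretrained layers from another model
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    classifier.compile(optimizer="adam",
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])
    return classifier

# classifier = build_classifier(pretrained_encoder)
# classifier.fit(x_labeled, y_labeled, epochs=10, batch_size=128)
```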
The figure below, from the 2006 Science paper by Hinton and Salakhutdinov,
shows a clear difference between an autoencoder and PCA. It shows
dimensionality reduction of the MNIST dataset (28×28 black-and-white images
of single digits) from the original 784 dimensions to two.
Figure 15: from the 2006 Science paper by Hinton and Salakhutdinov
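That comparison can be reproduced in rough form by projecting the MNIST digits to two dimensions both with PCA and with the 2-unit bottleneck of a deep autoencoder. The sketch below uses smaller layers than the 2006 paper's architecture; the sizes and training settings are illustrative assumptions.

```python
from sklearn.decomposition import PCA
from tensorflow import keras
from tensorflow.keras import layers

# MNIST digits flattened to 784 dimensions and scaled to [0, 1].
(x_train, _), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Linear baseline: PCA down to two components.
pca_codes = PCA(n_components=2).fit_transform(x_train)

# Deep autoencoder with a 2-dimensional bottleneck.
inputs = keras.Input(shape=(784,))
h = layers.Dense(256, activation="relu")(inputs)
h = layers.Dense(64, activation="relu")(h)
code = layers.Dense(2)(h)                       # 784 -> 2 dimensions
h = layers.Dense(64, activation="relu")(code)
h = layers.Dense(256, activation="relu")(h)
outputs = layers.Dense(784, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)

encoder = keras.Model(inputs, code)
ae_codes = encoder.predict(x_train)
# Scatter-plotting `pca_codes` and `ae_codes`, colored by digit label,
# gives the kind of comparison shown in the figure above.
```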
Autoencoders vs GANs
An autoencoder takes an image, a vector, or anything else with a very high
dimensionality, runs it through a neural network, and tries to compress the
data into a smaller representation, then transforms it back into a tensor
with the same shape as its input over several neural-net layers. Autoencoders
are trained to reproduce the input, so it’s kind of like learning a
compression algorithm for that specific dataset.
Another difference: while they both fall under the umbrella of unsupervised
learning, they are different approaches to the problem. A GAN is a
generative model — it’s supposed to learn to generate realistic new samples
of a dataset. Variational autoencoders are generative models, but normal
“vanilla” autoencoders just reconstruct their inputs and can’t generate
realistic new samples. [16]
Figure 16: Image reconstructed by VAE and VAE-GAN compared to their original input images [17]
References
[4] Liu, G., Bao, H. and Han, B. (2018). A Stacked Autoencoder-Based Deep
Neural Network for Achieving Gearbox Fault Diagnosis. [online] Hindawi.
Available at: https://www.hindawi.com/journals/mpe/2018/5105709/
[Accessed 23 Nov. 2018].
[15] Towards Data Science. (2018). Autoencoder Zoo — Image correction with
TensorFlow — Towards Data Science. [online] Available at:
https://towardsdatascience.com/autoencoder-zoo-669d6490895f [Accessed 27
Nov. 2018].
[17] Towards Data Science. (2018). What The Heck Are VAE-GANs? — Towards
Data Science. [online] Available at: https://towardsdatascience.com/what-the-
heck-are-vae-gans-17b86023588a [Accessed 30 Nov. 2018].
[18] Zhao, Y., Deng, B. and Shen, C. (2018). Spatio-Temporal AutoEncoder for
Video Anomaly Detection. MM ’17 Proceedings of the 25th ACM international
conference on Multimedia, pp.1933–1941.