Sparse, Stacked and Variational Autoencoder
Venkata Krishna Jonnalagadda
11 min read · Dec 6, 2018

Autoencoder:

An autoencoder is a neural network trained with an unsupervised learning algorithm: it uses backpropagation to produce an output value that is as close as possible to its input value. Let's look at how an autoencoder actually works. It takes an input (an image, a vector, or anything with very high dimensionality), runs it through the network, and tries to compress the data into a smaller representation. It has two principal components. The first is the encoder, a stack of fully connected or convolutional layers that compress the input into a representation with fewer dimensions than the input, known as the bottleneck. From this bottleneck, the decoder then tries to reconstruct the input, again using fully connected or convolutional layers.

Figure 1: Layers in an autoencoder [1]
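
To make the encoder-bottleneck-decoder structure concrete, here is a minimal sketch in PyTorch. The layer sizes (784 to 32 and back) and the choice of fully connected layers are illustrative assumptions, not details from the article.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # Encoder: compresses the input to a lower-dimensional bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        # Decoder: reconstructs the input from the bottleneck
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)       # bottleneck representation
        return self.decoder(z)    # reconstruction of the input

model = Autoencoder()
x = torch.rand(16, 784)                        # e.g. flattened 28x28 images
loss = nn.functional.mse_loss(model(x), x)     # reconstruction error minimized by backprop
```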

Sparse Autoencoder:

An autoencoder takes an input image or vector and learns a code dictionary that changes the raw input from one representation to another. A sparse autoencoder adds a sparsity enforcer that directs a single-layer network to learn a code dictionary which minimizes the error in reproducing the input while restricting the number of code words used for reconstruction [8].

The sparse autoencoder consists of a single hidden layer, which is connected to the input vector by a weight matrix forming the encoding step. The hidden layer then outputs to a reconstruction vector, using a tied weight matrix to form the decoder [8].

Figure 2: Sparse autoencoder[8]
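
Here is a minimal sketch of this single-hidden-layer, tied-weight design in PyTorch. The article does not spell out the sparsity enforcer, so an L1 penalty on the hidden activations is used as a common stand-in; the layer sizes and penalty weight are assumptions.

```python
import torch
import torch.nn as nn

class TiedSparseAutoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256):
        super().__init__()
        self.W = nn.Parameter(torch.randn(hidden_dim, input_dim) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(hidden_dim))
        self.b_dec = nn.Parameter(torch.zeros(input_dim))

    def forward(self, x):
        h = torch.sigmoid(x @ self.W.t() + self.b_enc)   # encoding step
        x_hat = h @ self.W + self.b_dec                   # decoder reuses (ties) the same W
        return x_hat, h

model = TiedSparseAutoencoder()
x = torch.rand(16, 784)
x_hat, h = model(x)
# Reconstruction error plus a sparsity penalty on the hidden activations
loss = nn.functional.mse_loss(x_hat, x) + 1e-3 * h.abs().mean()
```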

An advancement on the sparse autoencoder is the k-sparse autoencoder. Here we keep only the k neurons with the highest activations and ignore the rest; this can be done with a ReLU whose threshold is adjusted so that only the k largest activations survive. The value of k is then tuned to obtain the sparsity level best suited to the dataset [8].
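
The k-sparse selection step itself can be sketched as follows; the value of k and the layer width are assumptions made only for illustration.

```python
import torch

def k_sparse(h, k):
    # Indices of the k largest activations in each row (sample)
    topk = torch.topk(h, k, dim=1).indices
    mask = torch.zeros_like(h)
    mask.scatter_(1, topk, 1.0)
    return h * mask                       # all other activations are zeroed out

h = torch.relu(torch.randn(16, 256))      # hidden activations after ReLU
h_sparse = k_sparse(h, k=25)              # only 25 code words survive per sample
```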

Stacked Autoencoder:

A stacked autoencoder is a neural network consisting of several layers of sparse autoencoders, where the output of each hidden layer is connected to the input of the successive hidden layer.

Figure 3: Stacked Autoencoder[3]

As shown in the figure above, the hidden layers are trained by an unsupervised algorithm and then fine-tuned by a supervised method. Training a stacked autoencoder mainly consists of three steps [4], sketched in code after this list.

· Train an autoencoder on the input data and keep the learned representation.

· The learned representation from the previous layer is used as the input for the next layer, and this continues until every layer has been trained.

· Once all the hidden layers are trained, use the backpropagation algorithm to minimize the cost function; the weights are updated with the labelled training set to achieve fine-tuning.
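
A minimal sketch of this greedy layer-wise procedure, assuming simple fully connected autoencoders in PyTorch; the layer sizes, optimizer, and epoch count are illustrative choices, not values from the article.

```python
import torch
import torch.nn as nn

def train_layer(data, in_dim, hid_dim, epochs=10):
    enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
    dec = nn.Linear(hid_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        recon = dec(enc(data))
        loss = nn.functional.mse_loss(recon, data)   # unsupervised reconstruction loss
        loss.backward()
        opt.step()
    return enc, enc(data).detach()    # keep the encoder and its learned codes

x = torch.rand(256, 784)
enc1, h1 = train_layer(x, 784, 256)   # step 1: train the first autoencoder on the input
enc2, h2 = train_layer(h1, 256, 64)   # step 2: its codes become the input of the next layer
stacked = nn.Sequential(enc1, enc2)   # step 3: this stack is then fine-tuned with labels
```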

A recent advancement in stacked autoencoders is that they provide a version of the raw data with much more detailed and promising feature information, which can be used to train a classifier in a specific context and achieve better accuracy than training on the raw data.

Stacked autoencoders also improve accuracy in deep learning when denoising autoencoders are embedded in the layers [5].

In medical science, stacked autoencoders are used for P300 component detection and for classification of 3D spine models in adolescent idiopathic scoliosis. Classifying the rich and complex variability of spinal deformities is critical for comparisons between treatments and for long-term patient follow-ups.

Variational Autoencoder:

The basic idea behind a variational autoencoder is that instead of mapping an input to a fixed vector, the input is mapped to a distribution. The only difference from a plain autoencoder is that the bottleneck vector is replaced with two vectors: one representing the mean of the distribution and the other representing its standard deviation.

Loss function for the variational autoencoder:

li(θ, ϕ) = −E_{z∼qθ(z∣xi)}[log pϕ(xi∣z)] + KL(qθ(z∣xi) ∣∣ p(z))

The loss function of a variational autoencoder consists of two terms. The first term represents the reconstruction loss; the second term is a regularizer, where KL denotes the Kullback-Leibler divergence between the encoder's distribution qθ(z∣x) and p(z). This divergence measures how much information is lost when using q to represent p.
Figure 4: Layers in an Autoencoder [2]

Figure 5: Layers in Variational Autoencoder [2]
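
A minimal VAE sketch matching the description above: the encoder outputs a mean vector and a (log-)variance vector, a latent sample is drawn with the usual reparameterization trick, and the loss is the reconstruction term plus the KL term. The layer sizes are illustrative, and the closed-form KL assumes a standard normal prior p(z).

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(input_dim, 400)
        self.mu = nn.Linear(400, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(400, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(),
                                 nn.Linear(400, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # sample z ~ q(z|x)
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    # First term: reconstruction loss
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    # Second term: KL(q(z|x) || N(0, I)) in closed form
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = VAE()
x = torch.rand(16, 784)
x_hat, mu, logvar = model(x)
loss = vae_loss(x_hat, x, mu, logvar)
```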


Recent advancements in VAEs, as described in [6], improve the quality of VAE samples by adding two more components. First, a pre-trained classifier is used as a feature extractor on the input data, which aligns the reproduced images with the originals. Second, a discriminator network provides an additional adversarial loss signal. Another significant improvement is the optimization of the latent dependency structure in [7]. There, the VAE parameters and the network parameters are optimized with a single objective. Inference is performed through sampling, which produces expectations over latent variable structures and incorporates top-down and bottom-up reasoning over latent variable values.

Applications of Autoencoders:

Today, data denoising and dimensionality reduction for data visualization are the two major applications of autoencoders. With appropriate dimensionality and sparsity constraints, autoencoders can learn data projections that are better than PCA [11].

Previously, autoencoders were used mainly for dimensionality reduction or feature learning. Recent developments connecting them with latent variable models have brought autoencoders to the forefront of generative modelling [11].

Autoencoders and their variants, such as stacked, sparse, or variational autoencoders, are used for compact representation of data. For example, a 256×256-pixel image can be represented by a far smaller code, such as a 28×28-pixel one. Google uses this type of network to reduce the bandwidth used on your phone: when you download an image, the full-resolution image is downscaled, sent to you over the wireless connection, and a decoder on your phone then reconstructs the image to full resolution.

Autoencoders are also used in natural language processing, which encompasses some of the most difficult problems in computer science. With the advancement of deep learning, autoencoders are being used to tackle some of these problems [9].

Word Embedding: words or phrases from a sentence, or the context of a word in a document, are represented in relation to other words.
Figure 6: Some example word vectors mapped to two dimensions. Source:
http://suriyadeepan.github.io/img/seq2seq/we1.png

Document Clustering: the classification of documents such as blogs, news articles, or any other data into suitable categories. The challenge is to accurately cluster the documents into the categories where they actually fit. Hinton used an autoencoder to reduce the dimensionality of the vectors representing word probabilities in newswire stories [10]. The figure below compares Latent Semantic Analysis, which is based on PCA, with an autoencoder performing non-linear dimensionality reduction; the autoencoder outperformed LSA [10].

Figure 7: Clustering documents using (B) LSA and (C) an autoencoder. Source: [10].

Machine translation: translating text from one human language to another has been studied since the late 1950s and remains an incredibly difficult problem. With the use of autoencoders, machine translation has taken a huge leap forward in accuracy.

Autoencoders to extract speech: a deep generative model of spectrograms containing 256 frequency bins and 1, 3, 9, or 13 frames was created by [12]. This model has one visible layer and one hidden layer of 500 to 3000 binary latent variables [12].

Figure 8: Speech extraction [12].

Deep denoising autoencoders (DDAEs), which have shown drastic improvements in performance, are able to recognize whispered speech, which has long been a problem for automatic speech recognition (ASR). This has been implemented in various smart devices such as Amazon Alexa.

Reverberant speech recognition can use deep learning in both the front end and the back end of a system. One such model was built by Mimura, Sakai and Kawahara (2015), who adopted a deep autoencoder (DAE) to enhance the speech at the front end, while recognition is performed by DNN-HMM acoustic models at the back end [13].

Denoising of speech using deep autoencoders:

In real conditions, the speech signals we encounter are contaminated by noise and reverberation. If such speech is fed to an ASR system, the degraded speech quality in turn affects recognition performance.

Figure 9: (left) clean speech signal (right) noisy speech signal.

In order to improve the accuracy of the ASR system on noisy utterances, a collection of LSTM networks is trained to map the features of a noisy utterance to those of a clean utterance. The figure below shows the model used by Marvin Coto, John Goddard and Fabiola Martínez (2016).

Figure 10: source [(Marvin Coto, John Goddard, Fabiola Martínez) 2016]
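
A hypothetical sketch of one such denoising network: an LSTM that maps a sequence of noisy speech features to clean features and is trained with a reconstruction loss. The feature dimension, hidden size, and data here are placeholders, not details taken from the cited work.

```python
import torch
import torch.nn as nn

class LSTMDenoiser(nn.Module):
    def __init__(self, n_features=40, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, noisy):          # noisy: (batch, frames, features)
        h, _ = self.lstm(noisy)
        return self.out(h)             # estimate of the clean features, frame by frame

model = LSTMDenoiser()
noisy = torch.randn(8, 100, 40)        # e.g. 100 frames of filterbank features
clean = torch.randn(8, 100, 40)        # the corresponding clean-utterance features
loss = nn.functional.mse_loss(model(noisy), clean)
```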

Spatio-Temporal AutoEncoder for Video Anomaly Detection:

Anomalous event detection in real-world video scenes is a challenging problem, due both to the complexity of defining an "anomaly" and to the cluttered backgrounds, objects and motions in the scenes. Zhao, Deng and Shen (2018) proposed a model called the Spatio-Temporal AutoEncoder, which uses deep neural networks to learn video representations automatically and extracts features from both the spatial and temporal dimensions by performing 3-dimensional convolutions. They introduced a weight-decreasing prediction loss for generating future frames, which enhances motion feature learning in videos. Most existing anomaly detection datasets are restricted to appearance anomalies or unnatural motion anomalies. The figure below shows the architecture of the network: an encoder followed by two decoder branches, one for reconstructing past frames and one for predicting future frames.

Figure 11: source[18]
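
A rough sketch of this encoder-plus-two-decoder layout using 3D convolutions over (time, height, width). The channel counts, kernel sizes, and depths are illustrative assumptions, not the exact architecture from [18].

```python
import torch
import torch.nn as nn

class SpatioTemporalAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 3D convolutions over (frames, height, width)
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        def decoder():
            return nn.Sequential(
                nn.ConvTranspose3d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
                nn.ConvTranspose3d(16, 1, 3, stride=2, padding=1, output_padding=1),
            )
        self.reconstruct = decoder()   # branch 1: reconstruct the past frames
        self.predict = decoder()       # branch 2: predict the future frames

    def forward(self, clip):           # clip: (batch, 1, frames, H, W)
        z = self.encoder(clip)
        return self.reconstruct(z), self.predict(z)

model = SpatioTemporalAE()
clip = torch.randn(2, 1, 8, 64, 64)
past_hat, future_hat = model(clip)     # both outputs match the input clip shape
```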


Image reconstruction using convolutional autoencoders: CAEs are useful for reconstructing images with missing parts. For this, the model is trained on pairs of images: the input can be a noisy version or an image with missing parts, while the output is the corresponding clean image. During training the model learns to fill in the gaps in the input so that it matches the clean output. Below is an example of how a CAE replaces the missing part of an image.

Figure 12: Reconstructed image from missing image [14]


Figure 13: Source [15]
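
A minimal sketch of this training setup: the model receives a corrupted input (here, an image with a masked-out square) and is trained to reproduce the clean image. The masking scheme and the small convolutional architecture are assumptions made for illustration.

```python
import torch
import torch.nn as nn

conv_ae = nn.Sequential(                      # small convolutional autoencoder
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
)

clean = torch.rand(8, 1, 28, 28)
corrupted = clean.clone()
corrupted[:, :, 10:18, 10:18] = 0.0           # knock out a square "missing" patch
loss = nn.functional.mse_loss(conv_ae(corrupted), clean)   # learn to fill the gap
```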

Paraphrase Detection: in many languages, two phrases may look different but mean exactly the same thing. Deep learning autoencoders allow us to find such phrases accurately.

Many other advanced applications include full image colorization and generating higher-resolution images from lower-resolution inputs.

Figure 14: Colorful Image Colorization by Richard Zhang, Phillip Isola, Alexei A. Efros

Advantages of autoencoders over PCA/SVD for dimensionality reduction:

Training an autoencoder with one dense encoder layer, one dense decoder layer, and linear activation is essentially equivalent to performing PCA.
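
A minimal sketch of that equivalence: a linear autoencoder with a two-dimensional bottleneck, trained with mean squared error on centred data, learns (up to rotation) the same subspace as the top two principal components. The dimensions below are illustrative.

```python
import torch
import torch.nn as nn

linear_ae = nn.Sequential(
    nn.Linear(784, 2, bias=False),   # "encoder": linear projection to 2 dimensions
    nn.Linear(2, 784, bias=False),   # "decoder": linear map back to 784 dimensions
)
x = torch.rand(128, 784)
x = x - x.mean(dim=0)                # centre the data, as PCA does
loss = nn.functional.mse_loss(linear_ae(x), x)   # minimizing this recovers the
                                                  # top-2 principal subspace
```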

With a non-linear activation function and multiple layers, an autoencoder can learn non-linear transformations, unlike PCA.

An autoencoder doesn’t have to learn dense (affine) layers; it can use


convolutional layers to learn too, which could be better for video, image and
series data.

It may be more efficient, in terms of model parameters, to learn several layers with an autoencoder rather than one huge transformation with PCA.

An autoencoder gives a representation as the output of each layer, and having multiple representations of different dimensions can be useful.

An autoencoder could also let you make use of pre-trained layers from another model, applying transfer learning to prime the encoder/decoder.

The figure below, from the 2006 Science paper by Hinton and Salakhutdinov, shows a clear difference between an autoencoder and PCA. It shows dimensionality reduction of the MNIST dataset (28×28 black-and-white images of single digits) from the original 784 dimensions to two.

Figure 15: From the 2006 Science paper by Hinton and Salakhutdinov.

Autoencoders vs GANs

An autoencoder takes an image, a vector, or anything with very high dimensionality, runs it through the network, compresses the data into a smaller representation, and then transforms it back into a tensor with the same shape as its input over several neural-net layers. Autoencoders are trained to reproduce the input, so it's a bit like learning a compression algorithm for that specific dataset.

A GAN looks rather like an inside-out autoencoder: instead of compressing high-dimensional data, it takes low-dimensional vectors as input and has the high-dimensional data in the middle.

Another difference: while they both fall under the umbrella of unsupervised
learning, they are different approaches to the problem. A GAN is a
generative model — it’s supposed to learn to generate realistic new samples
of a dataset. Variational autoencoders are generative models, but normal
“vanilla” autoencoders just reconstruct their inputs and can’t generate
realistic new samples. [16]

Figure 16: Image reconstructed by VAE and VAE-GAN compared to their original input images [17]

Autoencoders are an extremely exciting new approach to unsupervised learning, and for virtually every major kind of machine learning task they have already surpassed decades of progress made by researchers hand-picking features.

References

[1] N. et al. A dynamic programming approach to missing data estimation using neural networks. Available from: https://www.researchgate.net/figure/222834127_fig1.

[2] Kevin Frans blog. (2018). Variational Autoencoders Explained. [online] Available at: http://kvfrans.com/variational-autoencoders-explained/ [Accessed 28 Nov. 2018].

[3] Packtpub.com. (2018). Setting up stacked autoencoders. [online] Available at: https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781787121089/4/ch04lvl1sec51/setting-up-stacked-autoencoders [Accessed 28 Nov. 2018].

[4] Liu, G., Bao, H. and Han, B. (2018). A Stacked Autoencoder-Based Deep Neural Network for Achieving Gearbox Fault Diagnosis. [online] Hindawi. Available at: https://www.hindawi.com/journals/mpe/2018/5105709/ [Accessed 23 Nov. 2018].

[5] V., K. (2018). Improving the Classification Accuracy of Noisy Dataset by Effective Data Preprocessing. International Journal of Computer Applications, 180(36), pp. 37–46.

[6] Hou, X. and Qiu, G. (2018). Improving Variational Autoencoder with Deep Feature Consistent and Generative Adversarial Training. Workshop track, ICLR.

[7] Variational Autoencoders with Jointly Optimized Latent Dependency Structure. (2018). ICLR 2019 Conference Blind Submission.

[8] Wilkinson, E. (2018). Deep Learning: Sparse Autoencoders. [online] Eric Wilkinson. Available at: http://www.ericlwilkinson.com/blog/2014/11/19/deep-learning-sparse-autoencoders [Accessed 29 Nov. 2018].

[9] Doc.ic.ac.uk. (2018). Autoencoders: Applications in Natural Language Processing. [online] Available at: https://www.doc.ic.ac.uk/~js4416/163/website/nlp/ [Accessed 29 Nov. 2018].

[10] Hinton, G. and Salakhutdinov, R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), pp. 504–507. Available from: https://www.cs.toronto.edu/~hinton/science.pdf.

[11] Autoencoders: Bits and Bytes of Deep Learning. [online] Available at: https://medium.com/towards-data-science/autoencoders-bits-and-bytes-of-deep-learning-eaba376f23ad.

[12] Deng, L. et al. Binary Coding of Speech Spectrograms Using a Deep Auto-encoder.

[13] Mimura, M., Sakai, S. and Kawahara, T. (2015). Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature. EURASIP Journal on Advances in Signal Processing, 2015(1).

[14] Towards Data Science. (2018). Autoencoders: Introduction and Implementation in TF. [online] Available at: https://towardsdatascience.com/autoencoders-introduction-and-implementation-3f40483b0a85 [Accessed 29 Nov. 2018].

[15] Towards Data Science. (2018). Autoencoder Zoo: Image correction with TensorFlow. [online] Available at: https://towardsdatascience.com/autoencoder-zoo-669d6490895f [Accessed 27 Nov. 2018].

[16] Anon. (2018). [online] Available at: https://www.quora.com/What-is-the-difference-between-Generative-Adversarial-Networks-and-Autoencoders [Accessed 30 Nov. 2018].

[17] Towards Data Science. (2018). What The Heck Are VAE-GANs? [online] Available at: https://towardsdatascience.com/what-the-heck-are-vae-gans-17b86023588a [Accessed 30 Nov. 2018].

[18] Zhao, Y., Deng, B. and Shen, C. (2018). Spatio-Temporal AutoEncoder for Video Anomaly Detection. MM '17: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1933–1941.
