Multimedia generation using neural network DeepDream
2023 International Conference on Advances in Electronics, Communication, Computing and Intelligent Information Systems (ICAECIS) | 979-8-3503-4805-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICAECIS58353.2023.10170138
Komala C R
Dept. of Information Science and Engineering
HKBK College of Engineering, Bengaluru, India
[email protected]

Krishan Kumar Bhushan
Dept. of Information Science and Engineering
HKBK College of Engineering, Bengaluru, India
[email protected]

Prasanna Hridaynadan Anthony
Dept. of Information Science and Engineering
HKBK College of Engineering, Bengaluru, India
[email protected]
Abstract— Machine learning and artificial intelligence are growing rapidly, and deep neural networks are a branch of ML that shows great promise. Deep neural networks are widely acknowledged to be in a golden age, advancing with the advent of new technologies, and are slowly becoming the leading technology in artificial intelligence. Models built by machine learning algorithms with high accuracy are useful in every sector of human activity. Image and audio data are processed by many algorithms that intensify the patterns they contain. The development of modern art is largely dependent on the painters who develop a style, and the DeepDream algorithm can also create artistic images and audio. DeepDream was built by the Google engineer Alexander Mordvintsev in 2015. It enhances the patterns in multimedia through algorithmic pareidolia, which creates a dream-like effect and gives images a hallucinatory appearance. This could be used in the health sector to detect diseases and defects in patient scans. It can also redefine lower-definition multimedia as higher-definition multimedia. Previous models were able to generate only one multimedia format. The proposed system will be able to create multimedia based on DeepDream and link different formats together so that they appear to have been generated together.

Keywords— Audio, Convolutional neural network (CNN), Imagination, Artificial intelligence (AI), Images, semantic segmentation, syntactic lexicon

I. INTRODUCTION

According to [23], creativity is “the ability to use your imagination to produce new ideas, make things etc.” and imagination is “the ability to form pictures or ideas in your mind”.

[24] The work done by the Google engineer Mordvintsev in 2015 led to the development of DeepDream, which has many applications. It uses a combination of three steps to give dream-like results. 1. It reads the input image with a convolutional neural network in a perceptual task. It covers a large portion of the stimulus space.

A picture is formed from a mixture of visual experiences in the primary visual system, which distinguishes lines and edges. Regions of the brain then fill in the form, identify shadows, and construct noses, eyes, and faces from this basic drawing, passing it along like an assembly line in a factory. Our memories and our language help us put together the final image, which enables us to categorize and organize images into, for instance, various breeds of cats and dogs. Artificial neural networks are made to mimic how the brain operates and to shed light on the functioning of the brain. Convolutional neural networks are a particular type, mostly used for vision, that can identify objects and patterns in data. This research, however, is limited to picture identification tasks, where decisions are made using the argmax of the conditional class probabilities. Additionally, the artificial visuals they produce lack realism and are hard to view, since they are mostly bound by category content when producing diversity. Therefore, straightforward use of these existing data-free methodologies for model compression is not possible.

Contrary to the popular belief that creation is a by-product of insanity, computational creativity research frequently emphasizes that the act of creation is a rational process. To examine this, we first go over the findings from art history about how hallucinations affect creativity. This paper then examines the psychological development of emotion-based creativity, whether normal or not. Once we recognise how small a part hallucinations play in such a process, there is some freedom to explain how the human brain employs them. This will enable us to determine the functional requirements for a computer model of hallucination.

By explicitly modelling the 3D reality that lies behind 2D visuals, machines can behave more intelligently. Rendering is the process of creating a picture from a 3D environment. It is critical to computer vision because it sits on the dividing line between the three-dimensional world and 2D images.

[23] explains the model in three parts. Model training: the model must be given a dataset of past data from which to discover patterns. Model testing: the trained model is evaluated and should forecast the accurate output value. Customer input: to anticipate the value of the stocks, the user specifies either the current date or a future date.

Fig. 1. Layers of Machine learning
Authorized licensed use limited to: COLLEGE OF ENGINEERING - Pune. Downloaded on February 06,2025 at 12:34:11 UTC from IEEE Xplore. Restrictions apply.
Machine learning is a branch of artificial intelligence that focuses on developing algorithms that enable computers to learn from data and form inferences or conclusions without being explicitly programmed. One popular method of machine learning is neural networks, which are modelled on the composition and function of the human brain.

Layers of interconnected nodes, or neurons, make up neural networks. Each neuron receives information from other neurons, processes it, and then sends the outcome to neurons in the next layer. In order to learn to recognise patterns in data, the neural network modifies the weights of the connections between neurons during training.

Deep learning is a branch of machine learning that makes use of neural networks with numerous layers, usually more than three. As a result, these networks can learn to identify intricate patterns in huge datasets and have attained state-of-the-art performance in tasks like audio and image recognition and natural language processing.

The artificial neural network is one of the architectures most frequently used in deep learning. These networks have an input layer, one or more hidden layers, and an output layer. To find the best way to map inputs to outputs, the weights of the connections between neurons are changed throughout training.

DeepDream, a specialised deep learning application, uses a trained artificial neural network to produce visuals. The technique modifies an input image to maximise the activation of certain neurons in the network, producing images that contain the properties and patterns the network learnt during training. While DeepDream is only one deep learning application, it exemplifies how capable artificial neural networks are of producing original and surprising outcomes.

II. EXISTING SYSTEM

A. BASICS OF MACHINE LEARNING

The traditional way of solving problems with software is to program the machine explicitly. The task is broken down into the smallest feasible subtasks, the subtasks are analysed, and the machine is given instructions to use human-designed algorithms to process the input and produce the desired output. However, this strategy is impractical for some applications, such as object recognition. There are simply too many possibilities for a human to consider every possible object, lighting scenario, rotational variation, and scene layout, let alone model them all. However, much more data has become accessible in recent years thanks to the internet, low-cost computers, cameras, crowdsourcing websites like Wikipedia, services like Amazon Mechanical Turk, and other developments. Taking advantage of this data is the goal of machine learning.

Tom Mitchell gives a formal definition of the field of machine learning [1]:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

(a) An illustration of an artificial neuron unit. The input signals are $x_i$, and the weights $w_i$ must be learned. After the weighted input signals are summed, the activation function is applied.
(b) An example of a simple feedforward neural network. The 5 input nodes are red, the 3 bias nodes are grey, the 3 hidden units are green, and the single output node is blue.
Fig. 2. Neural networks are made up of simple units that are then merged to form complex networks. [19]

Machine learning programmes modify their internal parameters to suit the data they are given. These programmes are still created by software engineers, but they are written in a way that allows them to change without requiring a complete re-programming of the system. Machine learning programmes should get better as they receive more data. Statistics and machine learning are linked fields. Some methods are more general, while others directly search for models that are based on well-known distribution assumptions made by the developer.

People outside of the field frequently believe that the creators of machine learning programmes do not understand what their programmes are doing. The programmes are understood in the sense that, given the identical input, the developer could compute the same answer as the computer with only a pen, a lot of paper, a calculator and, of course, a lot of time. They are not understood in the sense that it is hard to predict how an algorithm will behave without trying it. This is analogous, however, to expecting an electrical engineer to describe how a computer operates. The electrical engineer could probably acquire the necessary information, but it would take a lot of time and effort to piece together such a sophisticated system from its component parts.

B. BASICS OF CONVOLUTIONAL NEURAL NETWORKS

Convolutional neural networks (CNNs), a class of deep learning models, are frequently used in computer vision applications such as object detection and image recognition. CNNs are made to process incoming data in a manner that resembles how the human visual system works. They use a sequence of convolutional layers to extract information at various scales and orientations from the input image. A sequence of fully connected layers is then applied to the output of the convolutional layers to perform classification or regression using the features that were learned.

In many computer vision applications [28], CNNs have achieved state-of-the-art performance in tasks such as semantic segmentation, object detection, and image classification. They have been used in many different projects, such as those involving driverless vehicles and medical image analysis.
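The pipeline described above, convolutional layers that extract features followed by fully connected layers that classify, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the 8x8 input, the two hand-written edge filters, the three classes, and the random fully connected weights are hypothetical and untrained, and the function names are not taken from any library.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one filter over a 2D image (no padding, stride 1).

    As in most deep learning code, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def tiny_cnn_forward(image, kernels, fc_weights, fc_bias):
    """Convolutional feature extraction followed by one fully connected layer."""
    feature_maps = [relu(conv2d_valid(image, k)) for k in kernels]
    features = np.concatenate([fm.ravel() for fm in feature_maps])
    return softmax(fc_weights @ features + fc_bias)


rng = np.random.default_rng(0)
image = rng.random((8, 8))                       # hypothetical grayscale input
vertical = np.array([[1.0, 0.0, -1.0]] * 3)      # responds to vertical edges
kernels = [vertical, vertical.T]                 # second filter: horizontal edges
n_features = len(kernels) * 6 * 6                # two 6x6 feature maps, flattened
fc_weights = rng.normal(scale=0.1, size=(3, n_features))  # 3 hypothetical classes
fc_bias = np.zeros(3)

probs = tiny_cnn_forward(image, kernels, fc_weights, fc_bias)
print(probs)  # three class probabilities that sum to 1
```

In a trained CNN both the filters and the fully connected weights are learned from data; here they are fixed only so that the data flow from image to feature maps to class probabilities is visible.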
[26] "Gradient-based learning applied to document recognition" by Yann LeCun et al., published in the Proceedings of the IEEE in 1998, is one of the earliest and most significant studies on CNNs. In order to show the
potential of the method, this work offered the first successful application of CNNs to handwritten character recognition. Since then, significant advancements and enhancements have been made to LeCun et al.'s groundbreaking work, making CNNs one of the most well-known and commonly utilized deep learning techniques in computer vision [27].

C. GOOGLE DEEPDREAM

It is commonly known that gradient descent is used to optimise most of the parameters in neural networks. It is challenging, though, to gauge how these parameters affect the recognition system.

[2] suggests a method to examine the weights that such a network learns. [3] used a related concept.

Take as an example a neural network that was trained to identify different objects, such as bananas. With this method, the network is turned upside down and fed an image of random noise. The noise image is gradually modified until it produces the output “banana”, in order to examine what the network believes bananas look like. The modifications can also be constrained so that the statistics of the input image resemble those of real images. One example of such a statistic is the correlation between neighbouring pixels.

A different method is to boost the output of layers. This is explained in [2]:

The network is asked: “Whatever you see there, I want more of it!” This results in a feedback loop: if a cloud resembles a bird just a little, the network will intensify that similarity. The network will then recognise the bird much more clearly on subsequent passes as a result, and so on, until a highly detailed bird suddenly appears.

The term “Inceptionism,” which appears in the title of [2], was inspired by the science-fiction film “Inception” (2010). One reason the term fits is that neural networks have layers, and recent works frequently use more of them [4]. They get “deeper,” to borrow the term.

The method is known as Google DeepDream because it was published by Google developers. It has gained notoriety online [11]. The images are often generated in iterations, with each iteration zooming in on the image.

You may view the images and videos that the Google developers have posted at [5]. The original image from which Figure 3 was produced using the deep dream algorithm is shown in Figure 2.

D. ARTISTIC STYLE IMITATION

The principle behind neural networks is that each layer of the network learns a different representation of the data. This is simple to visualise in the case of CNNs, as demonstrated in numerous articles [6]. In most cases, the network learns edge detectors in the bottom layer and more intricate structures in the upper layers.

Gatys, Ecker, and Bethge demonstrated in [7] that it is possible to separate a picture's content from its general style in terms of local image appearance. They provide evidence for this by rendering an arbitrary image of their choice in the styles of several artists.

(a) Original Image
(b) Style Image
(c) The artistic style of Van Gogh's “Starry Night” applied to the photograph of Scottish Highland cattle.
Fig. 5. To achieve the output, the algorithm uses both the original image and the style image.
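The separation of content and style can be made concrete with a small numerical sketch. It assumes, following the approach of [7], that content is compared through the feature maps of a CNN layer directly, while style is compared through the Gram matrix (the correlations between channels) of those feature maps. The random array below is a hypothetical stand-in for real CNN activations, and the loss normalisation is simplified relative to [7].

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of a (channels, height, width) activation."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)


def content_loss(a, b):
    """Squared difference between the feature maps themselves."""
    return 0.5 * np.sum((a - b) ** 2)


def style_loss(a, b):
    """Squared difference between the Gram matrices of the feature maps."""
    return np.sum((gram_matrix(a) - gram_matrix(b)) ** 2)


rng = np.random.default_rng(0)
features = rng.random((4, 6, 6))   # hypothetical activations: 4 channels, 6x6

# Move every spatial position to a new place, identically in all channels.
# The layout (content) changes, but the channel statistics (style) do not.
perm = rng.permutation(6 * 6)
shuffled = features.reshape(4, 36)[:, perm].reshape(4, 6, 6)

c_loss = content_loss(features, shuffled)
s_loss = style_loss(features, shuffled)
print(c_loss, s_loss)  # content loss is large, style loss is numerically zero
```

The shuffled activations show why the two measures capture different things: the Gram matrix discards where a feature occurs and keeps only which features occur together, so it is unchanged by rearranging positions, while the content comparison is not.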
technical details of the DeepDream algorithm, including how it works and how it was implemented using the Caffe framework.
• "The DeepDream project" by Alexander Mordvintsev: This is the official website for the DeepDream project, which includes documentation, code, and examples of DeepDream images.
• "Visualizing and Understanding Convolutional Networks" by Matthew D. Zeiler and Rob Fergus: This paper, which was published on arXiv in 2013, presents a technique for visualizing the features learned by a convolutional neural network, which was an important inspiration for the DeepDream algorithm.
• "DeepDream: A Software Suite for Visualizing Neural Networks" by Alexander Mordvintsev, Christopher Olah, and Mike Tyka: This paper, which was published in the journal "ACM SIGGRAPH 2015 Talks" in 2015, presents an overview of the DeepDream algorithm and its applications.
• "Gradient Descent" by C. M. Bishop, published in the journal "Neural Networks" in 1995: This paper presents an overview of gradient descent, a commonly used optimization technique in machine learning. It covers both the basic algorithm and some of the more advanced techniques that have been developed to improve its performance.
• Dr. Komala C R, International Journal of Photo Energy, Hindawi, Volume 2022, Article ID 1048378, September 2022: The storefront's size and the appearance of the building would stay unaltered, but the heat loss would be substantially greater. We won't have to be concerned about ledges, ceilings, or floors getting damaged when renovating in this manner. Apartments typically utilise between 110 and 130 kW of heating power per square metre. A novel Energy Distribution Technique for computing light energy was created using the deep learning model. A building that uses little energy and costs little to construct is said to be energy-efficient. The annual demand for heat would be decreased as a result. [25]
• Dr. Komala C R, “Pre indication of Stock Price Using Prolonged Petite-Stretch Memory (PPSM) Algorithm”, International Journal of Mechanical Engineering (IJME), ISSN: 0974-5823, Vol/Pages: 7/1626-1633, Jan 2022: Share value, whether it is a profit or loss, plays a significant part in determining the accuracy of financing and share value. The value of shares in various companies aids in estimating the possible earnings of the company. It strongly advises stock predictions. As most features are low, high, or open/closed, this is accomplished using a machine learning-based report that predicts values using the Extended Small Memory (PPSM) rule. The low, high, open, and success of the desired instructional technique are the most crucial aspects to take into account when utilising the PPSM formula to determine stock price values. [22]

IV. TYPES OF DATA

A. Image Data

A basic neural network can be applied directly to image data, but the number of parameters becomes extremely large. One neuron would be used per pixel and channel, so an RGB image measuring 500 px by 500 px would yield 500 × 500 × 3 = 750,000 input signals. Convolutional Neural Networks (CNNs) were developed to tackle this issue. Rather than learning a full connection between the input layer and the first hidden layer, these networks employ convolution layers. A convolution layer learns the weights of an image filter that is applied by convolution. Another benefit is that CNNs make use of the spatial relationships between pixels rather than flattening a picture into a stream of values. [12] provides a good introduction to CNNs.

B. Audio Data

Speaker identification, speech recognition, and song identification are examples of common machine learning tasks that use audio data. Less common but intriguing topics include the composition of music and the artistic synthesis of audio.

• Emily Howell
Experiments in Musical Intelligence, sometimes known as EMI or Emmy, was a project started by David Cope in 1984 [13]. He introduces the notion that music can be viewed as a language that can be studied using natural language processing (NLP) techniques. According to Cope [13], EMI was more valuable to him when he used the system to "produce little phrase-size textures as next possibilities utilizing its syntactic lexicon and rule base."
In 2003, Cope began work on a new project based on EMI: Emily Howell [14]. This software aims at "creating both highly authentic replications and unique music compositions."
The reader might wish to listen to [15] to get a sense of how beautiful the generated music is.
Cope claims that "a set of instructions for generating various, but highly linked self-replications" is a fundamental component of music. Emmy was designed to find this set of instructions. It looks for a composer's "signature," which Cope defines as "contiguous patterns that recur in two or more of the composer's works."
Emily Howell differs from Emmy in that she doesn't always stick to one particular, well-established style. Emily Howell makes use of an association network. Cope is clear that this is not a neural network in any way. However, [14] does not explicitly state how an association network is trained. According to Cope, Emily Howell is described in full in [16].

C. GRUV

In [17], a network that can be trained to produce music is constructed using gated recurrent units (GRU) and recurrent neural networks, namely LSTM networks. Nayebi and Vitelli used raw audio waveforms as input rather than written notes or MIDI files. The audio waveforms in question are feature vectors at the time steps 0, 1, ..., t - 1, t. These feature vectors $X_1, \dots, X_t$ are
provided to the network, and it is required to forecast the feature vector $X_{t+1}$. In other words, the network continues the music.

The problem was modelled as a regression task because the input is continuous. The music was broken up into chunks of length N, and the discrete Fourier transform (DFT) was applied to extract information from the frequency domain.

An implementation and a demonstration can be found at [20] and [18], respectively.

V. AUDIO SYNTHESIZATION

algorithm boost the activations). The algorithm decides which layers and which neurons should fire more intensely. Similarly, it could enhance the qualities of the audio using the CNN.

Step 3. This process is repeated until the input image and audio have all the features that the particular layer was initially looking for.

D. Feature Map
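A feature map is the grid of responses that one convolution filter produces over its input, and the boost-and-repeat procedure in the steps above amounts to gradient ascent on those responses. The following is a minimal sketch under simplifying assumptions: a single hand-picked 3x3 edge filter stands in for a layer of a trained CNN, the objective (mean squared activation) is one common choice rather than the one used by the proposed system, and the names `feature_map`, `objective`, `dream_step`, and `step_size` are hypothetical.

```python
import numpy as np


def feature_map(image, kernel):
    """Response of one filter at every position (no padding, stride 1)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def objective(image, kernel):
    """How strongly the filter fires on the image: half the mean squared activation."""
    return 0.5 * np.mean(feature_map(image, kernel) ** 2)


def dream_step(image, kernel, step_size=0.1):
    """One gradient-ascent step that changes the image, not the filter."""
    fmap = feature_map(image, kernel)
    kh, kw = kernel.shape
    grad = np.zeros_like(image)
    # Chain rule: each activation pushes its gradient back onto the
    # pixels in its receptive field, weighted by the filter.
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            grad[i:i + kh, j:j + kw] += fmap[i, j] * kernel / fmap.size
    grad /= np.abs(grad).mean() + 1e-8   # normalise the step, as is common practice
    return image + step_size * grad


rng = np.random.default_rng(0)
image = rng.random((16, 16))                       # hypothetical input image
edge_filter = np.array([[1.0, 0.0, -1.0],
                        [2.0, 0.0, -2.0],
                        [1.0, 0.0, -1.0]])         # stand-in for a learned filter

scores = [objective(image, edge_filter)]
for _ in range(10):                                # repeat the boosting step
    image = dream_step(image, edge_filter)
    scores.append(objective(image, edge_filter))
print(scores[0], scores[-1])  # the activation grows with every pass
```

With a real network the gradient would be obtained by backpropagation through all layers up to the chosen one, and the same loop could in principle be run on a spectrogram instead of an image; both points are outside what this toy example demonstrates.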
study the impact of visual hallucinations on creativity. Converting 2D images into 3D content is one of the difficulties in computer vision. 3D meshes can be made from a single image using rendering techniques, such as modelling the 3D world using images on devices. These efforts are being driven by recent advancements in deep learning, including convolutional neural networks and related methodologies. DeepDream, a technique for employing CNNs to create new pictures, was also investigated.

Across all media, the process of making videos from speech, text, and image sources is comparable. A prototype system has been developed in which selected input characteristics are mapped using a feature map and then combined to form a video output.

REFERENCES
[1] T. M. Mitchell, Machine Learning, ser. McGraw Hill series in computer science. McGraw-Hill, 1997.
[2] A. Mordvintsev, C. Olah, and M. Tyka, “Inceptionism: Going deeper into neural networks,” googleresearch.blogspot.co.uk, Jun. 2015. [Online]. Available: http://googleresearch.blogspot.de/2015/06/inceptionism-going-deeper-into-neural.html
[3] C. Vondrick, A. Khosla, T. Malisiewicz, and A. Torralba, “Hoggles: Visualizing object detection features,” in Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, pp. 1–8. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6751109
[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[5] “Inceptionism: Going deeper into neural networks,” Google Photos, Jun. 2015. [Online]. Available: https://goo.gl/Bydofw
[6] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision–ECCV 2014. Springer, 2014, pp. 818–833.
[7] L. A. Gatys, A. S. Ecker, and M. Bethge, “A neural algorithm of artistic style,” arXiv preprint arXiv:1508.06576, 2015. [Online]. Available: http://arxiv.org/abs/1508.06576
[8] J. Johnson, “neural-style,” GitHub, Jan. 2016. [Online]. Available: https://github.com/jcjohnson/neural-style
[9] Y. Shih, S. Paris, C. Barnes, W. T. Freeman, and F. Durand, “Style transfer for headshot portraits,” ACM Transactions on Graphics (TOG), vol. 33, no. 4, p. 148, 2014. [Online]. Available: http://dl.acm.org/citation.cfm?id=2601137
[10] Y. Shih, “Style transfer for headshot portraits,” YouTube, Jun. 2014. [Online]. Available: https://www.youtube.com/watch?v=Hj5lGFzlubU
[11] “Deepdream,” Reddit. [Online]. Available: https://www.reddit.com/r/deepdream/
[12] M. A. Nielsen, Neural Networks and Deep Learning. Determination Press, 2015. [Online]. Available: http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks
[13] D. Cope, “Experiments in music intelligence (emi),” 1987. [Online]. Available: http://hdl.handle.net/2027/spo.bbp2372.1987.025
[14] D. Cope, “The well-programmed clavier: Style in computer music composition,” XRDS: Crossroads, The ACM Magazine for Students, vol. 19, no. 4, pp. 16–20, 2013. [Online]. Available: http://dl.acm.org/citation.cfm?id=2460444
[15] D. Cope, “Emily howell fugue,” YouTube, Oct. 2012. [Online]. Available: https://www.youtube.com/watch?v=jLR-_c_uCwI
[16] D. Cope, Computer Models of Musical Creativity. MIT Press Cambridge, 2005.
[17] A. Nayebi and M. Vitelli, “GRUV: Algorithmic music generation using recurrent neural networks,” 2015. [Online]. Available: http://cs224d.stanford.edu/reports/NayebiAran.pdf
[18] M. Vitelli, “Algorithmic music generation with recurrent neural networks,” YouTube, Jun. 2015. [Online]. Available: https://youtu.be/0VTI1BBLydE
[19] D. Johnson, “Biaxial recurrent neural network for music composition,” GitHub, Aug. 2015. [Online]. Available: https://github.com/hexahedria/biaxial-rnn-music-composition
[20] M. Vitelli and A. Nayebi, “GRUV,” Aug. 2015. [Online]. Available: https://github.com/MattVitelli/GRUV
[21] D. Johnson, “Composing music with recurrent neural networks,” Personal Blog, Aug. 2015. [Online]. Available: http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/
[22] Komala C R, “Pre indication of stock price using prolonged petite-stretch memory (PPSM) algorithm,” International Journal of Mechanical Engineering (IJME), ISSN: 0974-5823, vol. 7, pp. 1626–1633, Jan. 2022. [Online]. Available: https://kalaharijournals.com/resources/181-200/IJME_Vol7.1_200.pdf
[23] A. Gadsby, Ed., Dictionary of Contemporary English. Pearson Education Limited, 2006.
[24] M. Thoma, “Creativity in machine learning,” arXiv preprint arXiv:1601.03642. [Online]. Available: https://arxiv.org/pdf/1601.03642.pdf
[25] Komala C R, “Deep learning for an innovative photo energy model to estimate the energy distribution in smart apartments,” International Journal of Photoenergy, Hindawi, vol. 2022, Article ID 1048378, Sep. 2022. [Online]. Available: https://doi.org/10.1155/2022/1048378
[26] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. [Online]. Available: https://www.nature.com/articles/nature14539
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778. [Online]. Available: https://ieeexplore.ieee.org/document/7780459
[28] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7794–7803. [Online]. Available: https://arxiv.org/abs/1711.07971