Music Generation Using Recurrent Neural Networks
https://doi.org/10.22214/ijraset.2022.48200
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue XII Dec 2022- Available at www.ijraset.com
Abstract: Over the past years, music has continually evolved through its tempos, beats, and melodies. Traditionally, music is produced by a group of musicians playing different instruments that combine into a final synchronised product, and harmonies and beats have long been considered something to be composed manually. With the advent of digital technologies and software, however, it has become possible for machines to generate music automatically at a remarkable pace. The purpose of this research is to propose a method for creating musical notes using Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) networks. To implement this algorithm, a model is created, and the data is represented in the form of Musical Instrument Digital Interface (MIDI) files for easy access and interpretation. The process of preparing the data for input into the model is also discussed, as well as techniques for receiving, processing, and storing MIDI files for use as input. To enhance its learning capabilities, the model should be able to remember previous details of a musical sequence and its structure. This paper discusses the use of a layered architecture in the LSTM model and how its connections interweave to create a neural network.
Index Terms: MIDI, RNN, LSTM, Music Generation.
I. INTRODUCTION
Recent research and development in music technology have made it possible to replicate artists using deep learning and similar technologies. These AI models, referred to as Generative Adversarial Networks (GANs), learn how to perform complex tasks as they process large data sets. Music does not have a set dimension because it is a series of notes and chords. Traditional deep neural network approaches cannot be used to make music because they presume fixed dimensionality of inputs and targets/outputs as well as independence of outputs. As a result, it became evident that a domain-independent approach that learns to map sequences to sequences would be beneficial. Music is an art that employs the science of harmony of pitch to achieve the beauty of sound. It is frequently used as a form of artistic expression and may be heard without the use of any non-musical instruments. The term 'note' refers to a single unit of musical measurement, denoted by its own symbol. With or without the presence or addition of other notes, notes may behave differently. Music is organised in the form of notes, chords, or even scales. Music appeals to the senses because it brings pleasure to the ear, although the wide variety of genres makes it challenging to model.
The genetic algorithm is a method for creating music by building upon existing compositions. Evolutionary algorithms may emphasise the strong rhythm in each fragment and merge the fragments into separate pieces of music. However, this approach is inefficient since each iteration step incurs a latency. Additionally, it is difficult to obtain coherent and detailed rhythmic information due to the lack of context. To address this, we need a system that can remember the previous note sequence, forecast the following one, and so on. Recurrent Neural Networks, namely Long Short-Term Memory (LSTM) networks, are deployed for this purpose.
We present an algorithmic approach to music generation. By learning musical styles from the beat-to-beat rhythm of existing instrumental pieces, the system can generate new musical patterns. This demanding objective requires the system to retain previous knowledge in order to extract musical elements and project future musical style patterns. Furthermore, the system must learn and alter the original musical patterns.
This study presents an algorithm based on LSTM networks (a type of neural network) that can be used to autonomously generate music and melodies without human intervention. The main goal is to develop a model that can learn from a series of musical notes, analyse them, and then generate a set of high-quality musical sounds.
We expand the use of RNNs to create a music generator by combining character-based neural language models such as Character RNN with LSTM cells, which include bi-directionality and attention.
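As a concrete illustration, the following is a minimal sketch, not the paper's exact architecture, of a character-level bidirectional LSTM next-token model in Keras; `vocab_size`, `seq_len`, and the layer sizes are assumed placeholder values.

```python
# A minimal sketch (assumed hyperparameters, not the paper's exact
# architecture) of a character-level bidirectional LSTM that predicts
# the next token from a window of previous tokens.
from tensorflow.keras import layers, models

vocab_size = 128  # assumed number of distinct note/character tokens
seq_len = 100     # assumed input window length

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 64),                # token ids -> vectors
    layers.Bidirectional(layers.LSTM(256)),          # reads the window in both directions
    layers.Dense(vocab_size, activation="softmax"),  # next-token distribution
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```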
ABC notation is used to express music in text format. Other, more expressive audio formats, such as MIDI and Mel-Frequency Cepstral Coefficients (MFCCs), exist; however, they are primarily sound-based and may not be transcribable to sheet music.
B. Deep Recurrent Music Writer: Memory-Enhanced Variational Autoencoder-Based Musical Score
The authors developed a new metric to evaluate the quality of generated music and used it to assess the outputs of a Variational Autoencoder-based generative model for automated music composition. They applied this measure to systematically and automatically adjust the parameters and architecture of the generative model, optimising the musical outputs in terms of their similarity to a particular musical style.
IV. OBJECTIVES
Our system aims to create a model that can compose melodies and rhythms automatically without human input. The model is able
to remember previous data descriptions and generate harmonious music using a single-layered LSTM model. It is capable of
learning harmonic and melodic note sequences from MIDI files of piano notes.
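As an illustrative sketch of this step, the snippet below, which assumes the music21 library and a hypothetical file path, extracts note and chord tokens from a piano MIDI file:

```python
# A minimal sketch, assuming the music21 library; the file path is
# illustrative only, not from the paper.
from music21 import converter, note, chord

def extract_notes(midi_path):
    """Return a flat list of note/chord tokens from one MIDI file."""
    score = converter.parse(midi_path)
    tokens = []
    for element in score.flatten().notes:
        if isinstance(element, note.Note):
            tokens.append(str(element.pitch))  # e.g. "C4"
        elif isinstance(element, chord.Chord):
            # encode a chord as dot-joined pitch classes, e.g. "4.7.11"
            tokens.append(".".join(str(p) for p in element.normalOrder))
    return tokens

notes = extract_notes("piano_sample.mid")  # hypothetical input file
```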
We want to investigate the effect of increasing the number of LSTM units and of testing different hyper-parameter values on the model's performance. We believe that further studies, given a substantial amount of computation, could optimise this model even more.
The primary objective of autonomous music creation is to aid musicians in composing and refining music. Another important
consideration is the amount of oversight used for the prototype. There is total autonomy and mechanisation at one end of the scale
with no human input.
It might also be more interactive, with early stops included in the model to monitor the music-generation process. This paper's neural network technique is designed to be non-interactive. The MIDI file format also suits this level of autonomy, since it provides a complete final result that is machine-accessible without human contact.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 1353
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue XII Dec 2022- Available at www.ijraset.com
The degree of autonomy is an intriguing avenue for real musicians, who can interrupt the system in the middle of content creation. Flexibility in user input is another fundamental objective. Our architecture is adaptable: the user can change the number of layers, the hidden layer size, the sequence length, the time steps, the batch size, the optimisation algorithm, and the learning rate. Users may further adjust the window of notes fed into the note-axis LSTM model as well as the number of time steps input into the model, as gathered in the sketch below.
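A sketch of these tunable settings collected in one place; the names and default values are illustrative assumptions, not the paper's reported configuration.

```python
# Illustrative defaults only; every value here is an assumption.
config = {
    "num_layers": 1,          # LSTM layers
    "hidden_size": 256,       # LSTM units per layer
    "sequence_length": 100,   # notes per training window
    "time_steps": 100,        # steps unrolled through the LSTM
    "batch_size": 64,
    "optimizer": "rmsprop",
    "learning_rate": 1e-3,
}
```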
We can produce high-quality music, but there is still considerable room for improvement. For now, the music is created by only one instrument; it would be fascinating to hear what the model could produce if trained on multiple instruments. A technique for handling unfamiliar notes could also be added: by filtering unknown notes and replacing them with known ones, the algorithm could generate more robust music. In the future, we want to use this approach to produce mood-based music driven by user input. Studies have shown that music can aid people with dementia and Parkinson's disease, so such a system could produce music tailored to a patient's needs.
V. EXISTING SYSTEM
It has frequently been suggested that a musical moment may be defined by an emotional influx and outflow of energy. Considered a universal quality of music, tension may be influenced by a variety of musical elements, including pitch range, noise level, tempo, note density, harmonious relationships, and latent demands.
The music system employs a multi-level biomimetic design that informally separates action (changes of musical parameters), processing (adaptive mappings by the RL algorithm), and sensing (the reward function). This marks a development beyond the typical sense-process-respond paradigm at the foundation of earlier artificial intelligence models. The musical agent must select a series of musical gestures (musical parameters) that will heighten the tension perceived by the listener.
To solve this active reinforcement learning (RL) challenge, the learning algorithm must select which musical action to take based on the emotional data the listener provides in real time.
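To make this loop concrete, here is a highly simplified sketch, our illustration rather than the cited system's implementation, of reward-driven action selection over musical parameters; the action names and the epsilon-greedy rule are assumptions.

```python
# A toy epsilon-greedy bandit over musical actions; everything here is
# an illustrative assumption, not the cited system's code.
import random

actions = ["raise_pitch", "lower_pitch", "speed_up", "slow_down"]
q = {a: 0.0 for a in actions}  # running value estimate per action
alpha, epsilon = 0.1, 0.2      # learning rate, exploration rate

def step(get_listener_reward):
    """Pick an action, observe the listener's tension reward, update."""
    a = random.choice(actions) if random.random() < epsilon \
        else max(q, key=q.get)
    r = get_listener_reward(a)  # real-time emotional feedback
    q[a] += alpha * (r - q[a])
    return a
```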
Fig. 1: The basic flow of the proposed system. The solution first generates a list of notes derived from the training MIDI files and defines a variable for each note. A dictionary is created that assigns a unique integer to every note across all octaves; a second dictionary then maps strings to the previously declared integers ranging from 0 to 9. After this, 100-note sequences are created.
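The snippet below is a minimal sketch of this preprocessing flow, reusing the `notes` token list from the MIDI-extraction sketch above; the window length of 100 follows the text, while the other details are assumptions.

```python
# Build the note-to-integer dictionary and slice the corpus into
# 100-note input windows, each paired with the note that follows it.
import numpy as np

seq_len = 100
pitch_names = sorted(set(notes))                         # all distinct tokens
note_to_int = {n: i for i, n in enumerate(pitch_names)}  # unique int per note

inputs, targets = [], []
for i in range(len(notes) - seq_len):
    window = notes[i:i + seq_len]
    inputs.append([note_to_int[n] for n in window])
    targets.append(note_to_int[notes[i + seq_len]])

X = np.reshape(inputs, (len(inputs), seq_len, 1)) / len(pitch_names)  # normalise
y = np.array(targets)
```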
Fig. 2: To train the model, we adopt a character-level architecture. At the current time step t, each input note in the sound file is used to forecast the next note in the file; that is, each LSTM cell accepts the previous time step's activation a(t-1) and the previous cell's actual output y(t-1) as input, as Fig. 2 depicts.
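Continuing the sketch above, a single-layer LSTM of this kind could be assembled in Keras as follows; the 256 units, dropout rate, and epoch count are assumed values, not the paper's.

```python
# The LSTM layer carries its activation and cell state from each time
# step to the next internally; the Dense head turns the final state
# into a probability distribution over the next note.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(seq_len, 1)),
    layers.LSTM(256),
    layers.Dropout(0.3),
    layers.Dense(len(pitch_names), activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="rmsprop")
model.fit(X, y, epochs=50, batch_size=64)
```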
Fig. 3: At each sampling step, the activation and cell state from the preceding LSTM cell propagate into the following cell, where they are used to generate a fresh output.
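A sketch of that sampling loop, again building on the earlier snippets: the trained model predicts one note at a time, and each prediction is appended to the seed window and fed back in. The 200-note output length and the greedy argmax choice are assumptions.

```python
# Generate notes autoregressively from a seed window taken from the
# training data; `int_to_note` inverts the dictionary built earlier.
int_to_note = {i: n for n, i in note_to_int.items()}
seed = list(X[0].flatten())            # normalised ints of one window
generated = []

for _ in range(200):                   # assumed output length
    inp = np.reshape(seed[-seq_len:], (1, seq_len, 1))
    probs = model.predict(inp, verbose=0)[0]
    idx = int(np.argmax(probs))        # greedy choice; sampling also works
    generated.append(int_to_note[idx])
    seed.append(idx / len(pitch_names))  # keep the feedback normalised
```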
X. ACKNOWLEDGMENT
We thank Vishwakarma University's Faculty of Science and Technology for providing us with this opportunity to present our study and research to industry experts.
Last but not least, we would like to thank our connections from the industry who guided us, our friends who supported us, and our respondents for their active participation, encouragement, and willingness to spend time on our proposed system.