Music Genre Classification
Music Genre Classification
Abstract: A music genre is a conventional category The popular music genres are Blues, Classical,
that identifies some pieces of music as belonging to Country, Disco, Hip-Hop, Jazz, Metal, Pop, Reggae
a shared tradition or set of conventions. It is to be and Rock.
distinguished from musical form and musical style.
Music can be divided into different genres in many Music has also been divided into Genres and
different ways. The popular music genres are Pop, sub genres not only on the basis on music but also
Hip-Hop, Rock, Jazz, Blues, Country and Metal. on the lyrics as well. This makes music genre
classification difficult. Also the definition of
Categorizing music files according to their genre is music genre has changed over time. For
a challenging task in the area of music information instance, pop songs that were made fifty years ago
retrieval (MIR). Automatic music are different from the pop songs we have today.
genre classification is important to obtain music Fortunately, the progress in music data and
from a large collection. It finds applications in its storage has improved considerably over the
the real world in various fields like automatic past few years.
tagging of unknown piece of music (useful for apps
like Saavn, Wynk etc.). Since manually classifying each track of a large
music database according to their genre is a tedious
Companies nowadays use music classification, task, Machine Learning Techniques to perform
either to be able to place recommendations to their Automatic Music Genre Classification are used.
customers or simply as a product. Determining
music genres is the first step in the process of music
Rest of this paper is organized as follows. Section II
recommendation. Most of the current music
deals with the Literature Review done to write this
genre classification techniques use machine
paper, this section consists of the important
learning techniques.
takeaways gathered after studying different works
in the same field. Section III provides an
The same principles are applied in Music Analysis
overview about the Dataset that was used to
also. Machine Learning techniques have proved to
carry out this research work. Section IV deals with
be quite successful in extracting trends and
the high level design of the work. Section V
patterns from the large pool of data.
deals with the structure of the Convolutional
neural network which performs the classification.
I. Introduction Section VI contains the experiment results.
In today’s world, an individual’s music Section VII consists of the conclusion and
collection future work.
generally contains hundreds of songs, while the
professional collection normally contains tens of II. Literature Review
thousands of music files. Music databases Music Genre Classification is an area which
are incessantly gaining reputation in relations has
to specialized archives and private sound attracted the interest of many researchers. This
collections. With improvements in internet services section will provide details about some of the
and increase in network bandwidth there is also research work already done in this field.
an increase in number of people accessing the Vishnupriya S and K Meenakshi [1] have proposed
music database. Dealing with extremely large a Neural Network Model to perform the
music databases is exhausting and time classification. Tzanetakis and Cook [2]
consuming. pioneered their work on music genre
classification using machine learning algorithm.
Most of the music files are stored according to the They created the GTZAN dataset which is till
song title or the artist name. This may cause trouble date considered as a standard for genre
in searching for a song related to a specific genre. classification. Changsheng Xu et al [3] have
shown how to use support vector machines
(SVM) for this task. Matthew Creme
Charles Burlin, Raphael Lenain from Jazz 100
Stanford University [4] have used 4 different Metal 100
methods to perform the classification. They have Pop 100
Reggae 100
used Support Vector Machines, Neural Networks, Rock 100
Decision Trees and K-Nearest Neighbours
methods to perform classification. Tao [5] Total 1000
shows the use of restricted Boltzmann machines
and arrives to better results than a generic
multilayer neural network by generating more Data Pre-Processing was done in the following
data out of the initial dataset, GTZAN. manner:
After carrying out the above mentioned 1. Database of the complete collection was created
literature survey, Convolutional Neural Network and stored in a .csv file.
is used to perform classification and the details
of the same are explained in the following sections. 2. Feature Vector Extraction is done using
the libROSA package in python as shown in
III. About the Dataset figure 1. libROSA is a python package for music
GTZAN Genre Collection dataset was used and audio analysis which provides the
to building blocks necessary to create music
perform the classification. The dataset has been information retrieval systems.
taken from the popular software
framework MARSYAS. Marsyas (Music 3. Each audio file is taken and from that, its feature
Analysis, Retrieval and Synthesis for Audio vector is extracted. The extracted feature vector is
Signals) is an open source software framework called MFCC(Mel-Frequency Cepstral
for audio processing with specific emphasis on Coefficients). The MFCCs as shown in figure 3
Music Information Retrieval applications. It has encode the timbral properties of the music signal
been designed and written by George Tzanetakis by encoding the rough shape of the log-
([email protected]). Marsyas has been used for a power spectrum on the Mel frequency scale.
variety of projects in both academia and industry. A Zero- Crossings graph is plotted as shown in
figure 4 for each audio track. This graph
Dataset consists of 1000 audio tracks each 30 visualizes the number of times the signal crosses
seconds long. It contains 10 genres(Blues, zero level.
Classical, Country, Disco, Hip-Hop, Jazz, Metal,
Pop, Reggae and Rock), each represented by 100 4. As shown in the following figure, Fourier
tracks. The tracks are all 22050Hz Mono 16-bit Transforms are applied on the music signal.
audio files in A Frequency Spectrum is thus obtained. Mel
.wav format. Scale Filtering is applied on the frequency
spectrum to obtain a Mel Frequency
Spectrum. A log() function is applied on
Table 1. Distribution of the Dataset this Mel Frequency Spectrum which is
transformed into Cepstral Coefficients on
applying discrete cosine transforms. Finally,
Genre Number of tracks the Feature Vector is obtained by finding out
the derivatives of the Cepstral Coefficients.
Blues 100
Classical 100
Country 100
Disco 100
Hip-Hop 100
Fig 1: Extraction of Feature Vector
IV. High Level Design 4. Each track from the test dataset is also
pre- processed and a feature vector is extracted
1. The dataset is split into two parts, Training for the same.
data and Test data.
5. The trained Neural Network model operates
2. Each track from the train dataset is on the feature vector obtained at the end of
pre- processed and a feature vector is step 4 to perform classification on test data.
extracted for the same. A Feature Vector
Database is generated from the extracted 6. Finally, output is genre of the music track.
feature vectors.
High Level Design of the system is shown in figure
3. The Neural Network model is trained using 5:
the obtained feature vector database.
EPOCH EPOCH