Music Genre Classification

The document discusses automatic music genre classification using machine learning techniques. It describes how a dataset containing 1000 audio tracks across 10 genres is split into training and test sets. Feature vectors are extracted from the audio tracks using techniques like MFCC and zero-crossings analysis. A convolutional neural network is trained on the training set's feature vectors and labels to perform classification, and its performance is evaluated on the unseen test set. The goal is to develop an automatic system that can classify music tracks by genre at a large scale, which has applications in music recommendation and organization.

Uploaded by

harshukayt
Music Genre Classification

Abstract: A music genre is a conventional category that identifies some pieces of music as belonging to a shared tradition or set of conventions. It is to be distinguished from musical form and musical style. Music can be divided into genres in many different ways. The popular music genres are Pop, Hip-Hop, Rock, Jazz, Blues, Country and Metal.

Categorizing music files according to their genre is a challenging task in the area of music information retrieval (MIR). Automatic music genre classification is important for retrieving music from a large collection. It finds real-world applications in various fields, such as automatic tagging of unknown pieces of music (useful for apps like Saavn, Wynk etc.).

Companies nowadays use music classification either to make recommendations to their customers or simply as a product. Determining music genres is the first step in the process of music recommendation. Most current music genre classification techniques use machine learning.

The same principles are applied in Music Analysis also. Machine Learning techniques have proved to be quite successful in extracting trends and patterns from large pools of data.

I. Introduction
In today’s world, an individual’s music collection generally contains hundreds of songs, while a professional collection normally contains tens of thousands of music files. Music databases are steadily gaining importance in specialized archives and private sound collections. With improvements in internet services and increases in network bandwidth, the number of people accessing music databases has also increased. Dealing with extremely large music databases is exhausting and time consuming.

Most music files are stored according to the song title or the artist name. This may cause trouble when searching for a song of a specific genre. The popular music genres are Blues, Classical, Country, Disco, Hip-Hop, Jazz, Metal, Pop, Reggae and Rock.

Music has also been divided into genres and sub-genres not only on the basis of the music but also on the lyrics. This makes music genre classification difficult. Also, the definition of a music genre has changed over time. For instance, pop songs that were made fifty years ago are different from the pop songs we have today. Fortunately, music data and its storage have improved considerably over the past few years.

Since manually classifying each track of a large music database according to its genre is a tedious task, Machine Learning techniques are used to perform Automatic Music Genre Classification.

The rest of this paper is organized as follows. Section II deals with the Literature Review and consists of the important takeaways gathered after studying different works in the same field. Section III provides an overview of the Dataset that was used to carry out this research work. Section IV deals with the high level design of the work. Section V deals with the structure of the Convolutional Neural Network which performs the classification. Section VI contains the experiment results. Section VII consists of the conclusion and future work.

II. Literature Review
Music Genre Classification is an area which has attracted the interest of many researchers. This section provides details about some of the research work already done in this field. Vishnupriya S and K Meenakshi [1] have proposed a Neural Network Model to perform the classification. Tzanetakis and Cook [2] pioneered music genre classification using machine learning algorithms. They created the GTZAN dataset, which is to date considered a standard for genre classification. Changsheng Xu et al. [3] have shown how to use support vector machines (SVM) for this task. Matthew Creme,
Charles Burlin and Raphael Lenain from Stanford University [4] have used four different methods to perform the classification: Support Vector Machines, Neural Networks, Decision Trees and K-Nearest Neighbours. Tao [5] shows the use of restricted Boltzmann machines and arrives at better results than a generic multilayer neural network by generating more data out of the initial dataset, GTZAN.

After carrying out the above-mentioned literature survey, a Convolutional Neural Network is used to perform the classification, and the details are explained in the following sections.

III. About the Dataset
The GTZAN Genre Collection dataset was used to perform the classification. The dataset has been taken from the popular software framework MARSYAS. Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals) is an open source software framework for audio processing with specific emphasis on Music Information Retrieval applications. It has been designed and written by George Tzanetakis. Marsyas has been used for a variety of projects in both academia and industry.

The dataset consists of 1000 audio tracks, each 30 seconds long. It contains 10 genres (Blues, Classical, Country, Disco, Hip-Hop, Jazz, Metal, Pop, Reggae and Rock), each represented by 100 tracks. The tracks are all 22050 Hz mono 16-bit audio files in .wav format.

Table 1. Distribution of the Dataset

Genre        Number of tracks
Blues        100
Classical    100
Country      100
Disco        100
Hip-Hop      100
Jazz         100
Metal        100
Pop          100
Reggae       100
Rock         100
Total        1000

Data Pre-Processing was done in the following manner:

1. A database of the complete collection was created and stored in a .csv file.

2. Feature Vector Extraction is done using the libROSA package in Python as shown in figure 1. libROSA is a Python package for music and audio analysis which provides the building blocks necessary to create music information retrieval systems.

3. Each audio file is taken and its feature vector is extracted. The extracted feature vector is called MFCC (Mel-Frequency Cepstral Coefficients). The MFCCs, as shown in figure 3, encode the timbral properties of the music signal by encoding the rough shape of the log-power spectrum on the Mel frequency scale. A Zero-Crossings graph is plotted as shown in figure 4 for each audio track. This graph visualizes the number of times the signal crosses the zero level.

4. As shown in the following figure, Fourier Transforms are applied on the music signal. A Frequency Spectrum is thus obtained. Mel Scale Filtering is applied on the frequency spectrum to obtain a Mel Frequency Spectrum. A log() function is applied on this Mel Frequency Spectrum, which is then transformed into Cepstral Coefficients by applying discrete cosine transforms. Finally, the Feature Vector is obtained by finding the derivatives of the Cepstral Coefficients.

Fig 1: Extraction of Feature Vector
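
The chain described in step 4 of the pre-processing list (Fourier transform, Mel filtering, log, discrete cosine transform, derivatives) can be sketched with NumPy alone. This is only an illustration under assumed parameters: the frame length (512), hop (256), number of Mel filters (26), number of coefficients (13) and the synthetic 440 Hz test tone are all choices made here, not values from the paper, and the helper names are hypothetical. It is not the libROSA implementation the paper used.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc_with_deltas(signal, sr, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # Slice the signal into overlapping, Hann-windowed frames.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Fourier transform -> power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel-scale filtering -> Mel frequency spectrum.
    mel_spec = power @ mel_filterbank(n_filters, n_fft, sr).T
    # log() of the Mel spectrum (small offset avoids log(0)).
    log_mel = np.log(mel_spec + 1e-10)
    # Discrete cosine transform -> cepstral coefficients.
    cepstra = dct_ii(log_mel, n_coeffs)
    # Derivatives of the coefficients over time (frame-to-frame gradient).
    deltas = np.gradient(cepstra, axis=0)
    # Assumption: return coefficients and derivatives side by side.
    return np.concatenate([cepstra, deltas], axis=1)

sr = 22050                                   # GTZAN sampling rate
t = np.arange(sr) / sr                       # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)         # synthetic stand-in for a track
features = mfcc_with_deltas(tone, sr)
print(features.shape)                        # (frames, 2 * n_coeffs)
```

Returning the coefficients together with their derivatives is a choice made for this sketch; the paper only states that the feature vector is obtained from the derivatives.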

Fig 2: Sample Frequency Spectrum

Fig 3: Sample Mel Frequency Spectrum


Fig 4: Zero Crossings
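
The zero-crossings count mentioned in the pre-processing steps is simply the number of times consecutive samples change sign. A minimal sketch in plain Python follows; the function name and the synthetic 100 Hz sine are assumptions for illustration, not part of the paper's code.

```python
import math

def zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    count = 0
    for prev, curr in zip(samples, samples[1:]):
        if (prev >= 0) != (curr >= 0):
            count += 1
    return count

sr = 22050  # GTZAN sampling rate
# One second of a 100 Hz sine; the small phase offset keeps samples off exact zero.
tone = [math.sin(2 * math.pi * 100.0 * n / sr + 0.1) for n in range(sr)]
crossings = zero_crossings(tone)
print(crossings)  # a 100 Hz sine crosses zero twice per cycle
```

Noisy or percussive signals tend to produce far more crossings per second than a clean low-frequency tone, which is why the count is a useful, cheap descriptor.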

IV. High Level Design

1. The dataset is split into two parts, Training data and Test data.

2. Each track from the training dataset is pre-processed and a feature vector is extracted for it. A Feature Vector Database is generated from the extracted feature vectors.

3. The Neural Network model is trained using the obtained feature vector database.

4. Each track from the test dataset is also pre-processed and a feature vector is extracted for it.

5. The trained Neural Network model operates on the feature vector obtained at the end of step 4 to perform classification on the test data.

6. Finally, the output is the genre of the music track.

The High Level Design of the system is shown in figure 5:

Fig 5: High Level Design
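
The first step of the design, splitting the 1000 tracks into training and test data, can be sketched as below. The paper does not say how the split was drawn; this sketch assumes a stratified 80/20 split (each genre keeps the same ratio), and the track identifiers, the function name and the seed are hypothetical.

```python
import random
from collections import defaultdict

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

def stratified_split(tracks, test_fraction=0.2, seed=0):
    """Split (track_id, genre) pairs so every genre keeps the same train/test ratio."""
    by_genre = defaultdict(list)
    for track_id, genre in tracks:
        by_genre[genre].append(track_id)

    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    train, test = [], []
    for genre, ids in by_genre.items():
        rng.shuffle(ids)
        n_test = int(round(len(ids) * test_fraction))
        test += [(track_id, genre) for track_id in ids[:n_test]]
        train += [(track_id, genre) for track_id in ids[n_test:]]
    return train, test

# Hypothetical identifiers: 100 tracks for each of the 10 genres.
tracks = [(f"{genre}.{i:05d}", genre) for genre in GENRES for i in range(100)]
train, test = stratified_split(tracks)
print(len(train), len(test))
```

Stratifying is one reasonable choice for a balanced dataset like GTZAN, since it guarantees that no genre is under-represented in the test set.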

V. Convolutional Neural Network

A CNN is a Deep Learning algorithm which can take an image as input, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. A CNN has various layers such as Convolutional layers, ReLU layers, Pooling layers and a fully connected layer, as shown in figure 6. CNNs are widely used for image classification because they perform automatic feature extraction using convolution.
Fig 6: Architecture of CNN used in model
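
To make the layer types named above concrete, the following NumPy sketch runs one forward pass through a convolution, a ReLU, a max-pooling step and a fully connected layer with a softmax over 10 genres. It is not the architecture of figure 6: the input size, kernel size, random untrained weights and all function names are assumptions chosen for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most deep learning libraries)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Zero out negative activations."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())            # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((16, 16))    # stand-in for a Mel spectrogram patch
kernel = rng.standard_normal((3, 3))           # one untrained convolution filter

# Convolution -> ReLU -> pooling: (16, 16) -> (14, 14) -> (7, 7)
feature_map = max_pool(relu(conv2d(spectrogram, kernel)))

# Fully connected layer mapping the flattened feature map to 10 genre scores.
dense_w = rng.standard_normal((10, feature_map.size))
dense_b = np.zeros(10)
probs = softmax(dense_w @ feature_map.ravel() + dense_b)
predicted_genre_index = int(np.argmax(probs))
print(feature_map.shape, predicted_genre_index)
```

In a trained network the kernel and dense weights are learned from the training set; a framework such as Keras stacks many such filters and layers and handles the training loop.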

VI. Results

To train the Convolutional Neural Network, an 80%-20% split was used for training and testing respectively.

The accuracy of the model is calculated as the fraction of tracks whose predicted genre matches the true genre.

For the GTZAN dataset, the model achieved a training accuracy of about 98% and a validation accuracy of 73%, as shown in figure 7 and figure 8.

Python was the language used to develop the model. A number of packages such as Keras, NumPy and pandas were used to build the model. The experiment was run on the Google Colab platform. The TensorFlow package was used for deep learning.

Fig 7: Accuracy (x-axis: epoch)

Fig 8: Loss (x-axis: epoch)
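
The accuracy figures reported in this section are assumed here to be the usual classification accuracy, i.e. correct predictions divided by total predictions. A small sketch of that computation, with a per-genre breakdown added for illustration, is given below; the function names and the toy labels are hypothetical and are not results from the paper.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_genre_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true genre."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {genre: hits[genre] / totals[genre] for genre in totals}

# Toy labels, for illustration only.
y_true = ["rock", "rock", "jazz", "jazz", "pop"]
y_pred = ["rock", "metal", "jazz", "jazz", "rock"]

overall = accuracy(y_true, y_pred)             # 3 of 5 predictions are correct
by_genre = per_genre_accuracy(y_true, y_pred)
print(overall, by_genre)
```

A per-genre view is often more informative than the single overall number, because some genre pairs are much easier to confuse than others.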

VII. Conclusion and Future Work

This research work provides the details of an application which performs Music Genre Classification using Machine Learning techniques. The application uses a Convolutional Neural Network model to perform the classification. A Mel Spectrum of each track from the GTZAN dataset is obtained. This is done using the libROSA package for Python. A piece of software is implemented which classifies a large database of songs into their respective genres.

An extension of this work would be to consider bigger datasets and also tracks in different formats (mp3, au etc.). Also, with time the style represented by each genre will continue to change. So the objective for the future will be to stay updated with the changes in genre styles and to extend our software to work on these updated styles. This work can also be extended to serve as a music recommendation system depending on the mood of the person.
