Lionel Adi
Abstract − Recommendations are a useful tool in many industries. They offer users a selection that matches what they
want more closely than an ordinary search does. In the music industry, recommendations are used to suggest songs that
are similar in genre or theme. Music spans many genres, including pop, classical, reggae and others, and the
differences between genres are clearly audible. Genre can be analyzed through spectrogram analysis. The Convolutional
Neural Network (CNN) is a neural network algorithm that is commonly used to recognize and classify image data. In this
study, spectrogram images were used as the input features for a CNN, which classifies the music and provides song
recommendations according to what the user wants. Two CNN architectures were tested, namely VGG-16 and RESNET-50.
The best accuracy, 60%, was obtained by the VGG-16 model with 20 epochs, compared to the RESNET-50 model with more
than 20 epochs. The recommendations generated on the test data obtained a higher similarity value for VGG-16 than for
RESNET-50.
JISA (Jurnal Informatika dan Sains) (e-ISSN: 2614-8404) is published by Program Studi Teknik Informatika, Universitas Trilogi
under Creative Commons Attribution-ShareAlike 4.0 International License.
JISA (Jurnal Informatika dan Sains) e-ISSN: 2614-8404
Vol. 05, No. 01, June 2022 p-ISSN:2776-3234
module. From the research that has been done, the accuracy is 98%.

Research on music type classification using the CNN algorithm has also been carried out by analyzing
spectrogram images. The spectrogram images generated from the music are passed through a deep learning
process using the CNN model. That research found that 35 epochs gave an optimal accuracy of 81.33%, and
that CNN produces a better level of accuracy than the KNN method[1]. Other research on spectrogram
analysis for music genre classification using CNN and Mel-spectrograms found that the number of datasets,
the training iterations and the computer specifications greatly affect the level of accuracy and the
duration of modeling. The resulting accuracy in classifying music genres was 99% for the RELU activation
function and 95% for ELU[7].

Music recommendations based on genre have also been carried out using the Convolutional Recurrent Neural
Network (CRNN). That study also uses spectrograms, analyzes them using CRNN, and compares the CRNN and CNN
methods for classifying music genres. The results show that CRNN, which takes into account the frequency
and time sequence features, performs better than CNN[8]. Research on next-song recommendation has also
been carried out, in which neural networks performed well in all types of tests. That study concluded that
the NN-based next-song recommenders, CNN-rec, NN-rec and Word2Vec, outperform the non-NN based ones[9]. It
demonstrates that the NN-based next-song recommenders, which combine users' general preference and
sequential listening patterns, have the highest performance.

Music recommendation using deep content has also been done by Aäron van den Oord et al.[10]. Their research
showed that recent advances in deep learning translate very well to the music recommendation setting in
combination with the approach used in their study, with deep convolutional neural networks significantly
outperforming a more traditional approach using bag-of-words representations of audio signals. Other
research on music recommendation uses user behaviour[11]. The approach considered genre, recording year,
freshness, favor and time pattern as factors to recommend songs. The evaluation results demonstrate that
the approach is effective.

Research on music recommendations by genre has been carried out by comparing several machine learning
algorithms such as KNN, RF, NB, DT and SVM[12]. According to the results summarized in that research, SVM
achieved better classification results than the other methods. In addition, changing the window size and
window type caused very small performance changes.

Music recommendation based on similarity, using a decided genre value and a feature vector distance, has
also been studied by Jonseol Dee et al. Their paper proposed a recommendation system based on preference
classification using real-time user brainwaves and genre feature classification. The proposed user
preference classifier achieved an overall accuracy of 81.07%[13].

Based on the research that has been done previously, this study carries out a music genre recommendation
process using the GTZAN dataset, which is composed of 10 genres, where the music data is first converted
into spectrograms. The spectrograms are classified using the CNN algorithm with the RESNET50 and VGG16
architectures. The resulting recommendations are then tested to see whether they match the song desired by
the user.

II. RESEARCH METHODOLOGY

The method used in this research consists of dataset preparation, pre-processing, spectrogram processing,
classification, and similarity calculation using cosine similarity.

A. Dataset
This research uses a dataset in the form of spectrogram images taken from the GTZAN dataset. To simplify
the classification of music data using a neural network, the music data must first be converted into
Mel-spectrograms that the neural network can process. GTZAN consists of music data and the Mel-spectrograms
produced from those music files. It is a public dataset that is widely used for evaluating Music Genre
Recognition (MGR). GTZAN is a collection of music gathered in 2000-2001 from various sources such as CDs,
radio and microphone recordings. The dataset consists of 10 genres, namely blues, classical, country,
disco, hip-hop, jazz, metal, pop, reggae and rock. Each genre contains 100 music files, each 30 seconds
long. The data used in this study is divided into 3 parts: training data, validation data and test data,
with the details of each part as follows:

Table 1. Number of files used for each class
Genre       Training data   Validation data   Testing data
Blues       80              10                10
Classical   80              10                10
Country     80              10                10
Disco       80              10                10
Hiphop      80              10                10
Jazz        80              10                10
Metal       80              10                10
Pop         80              10                10
Reggae      80              10                10
Rock        80              10                10

B. Spectrogram
The spectrogram is a visual representation of the frequency spectrum of the signal[14]. In the GTZAN
dataset, spectrograms have already been generated and stored in folders for their respective classes.
Before being fed into the CNN, this data is divided into training data, validation data and test data.
Each spectrogram image is loaded into an array and labeled according to the index of its class folder. The
labeled data is then appended to an array to make it easier to pass along. The spectrogram image displays
many values and features of the music file. The following is an illustration of the spectrogram of each
class in this study.
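The labeling and 80/10/10 split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the file names are hypothetical placeholders, the label is taken to be the index of the genre in a fixed list, and the shuffle with a fixed seed is an assumption, since the paper does not say how files were assigned to each part.

```python
import random

# Genre order is an assumption; the label of a file is its genre's index here.
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def split_by_genre(files_by_genre, n_train=80, n_val=10, n_test=10, seed=0):
    """Return (train, val, test) lists of (file_name, label) pairs."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, genre in enumerate(GENRES):
        files = sorted(files_by_genre[genre])
        if len(files) < n_train + n_val + n_test:
            raise ValueError(f"not enough files for genre {genre!r}")
        rng.shuffle(files)  # assumed: random assignment within each genre
        train += [(f, label) for f in files[:n_train]]
        val += [(f, label) for f in files[n_train:n_train + n_val]]
        test += [(f, label)
                 for f in files[n_train + n_val:n_train + n_val + n_test]]
    return train, val, test


# Hypothetical file names standing in for the spectrogram images.
files_by_genre = {g: [f"{g}{i:05d}.png" for i in range(100)] for g in GENRES}
train, val, test = split_by_genre(files_by_genre)
print(len(train), len(val), len(test))
```

Splitting inside each genre keeps the three parts balanced across classes, which matches the per-class counts in Table 1.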
Two different processes will be carried out, using the VGG-16 and RESNET-50 architectures.

After the models are obtained from training with the two architectures, the similarity between feature
vectors is computed using cosine similarity. The application displays a recommendation of 5 songs that
match those in the validation data. The recommendation is made by calculating the feature similarity
between one piece of music and another. First, a piece of music is chosen from each genre as the basis for
the recommendation system. Then the prediction for this base music is calculated using the artificial
neural network. The cosine similarity value is calculated from the 2 feature vectors being compared. To
calculate the similarity of 2 pieces of music with n features, where the first has the feature vector
x = [x1, x2, x3, ..., xn] and the second has the feature vector y = [y1, y2, y3, ..., yn], the following
formula is used:

\[
\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
\]

convolution process repeatedly. The resulting model output is shown below.

Figure 6. CNN output model
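The cosine-similarity step of the recommendation process in this section can be sketched as below. This is a minimal illustration, not the application's implementation: the function names `cosine_similarity` and `recommend`, the song names and the 3-element feature vectors are all hypothetical, and real feature vectors would come from the trained CNN.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_x * norm_y)


def recommend(query_vec, catalog, k=5):
    """Return the k catalog entries most similar to query_vec, best first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy feature vectors (hypothetical values, not taken from the paper).
catalog = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.8, 0.2, 0.1],
    "song_c": [0.1, 0.9, 0.2],
    "song_d": [0.0, 0.2, 0.9],
    "song_e": [0.7, 0.3, 0.0],
    "song_f": [0.2, 0.1, 0.8],
}
for name, score in recommend([1.0, 0.0, 0.0], catalog):
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on the direction of the vectors, two songs whose features differ only by a scale factor receive a similarity of 1.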
the lowest classification was obtained in the disco class.

Figure 9. Recommendation output for VGG16 model

RESNET50 follows the same testing process as VGG16. In the experiments, training with RESNET50 required
more epochs to reach better accuracy. In this study, fairly good accuracy was obtained at 30 epochs for
RESNET50. The picture below shows the precision, recall, f1-score and accuracy calculated for the RESNET50
model. The resulting accuracy is slightly lower than that of the VGG16 model, despite the larger number of
epochs. The confusion matrix for the CNN-RESNET50 model is shown in Fig. The results obtained are still
lower than those of the VGG16 model in classifying the classes of the music dataset used.

Figure 10. RESNET50 Accuracy value

For the testing process, the steps are the same as for the VGG16 model: features are extracted from the
test image, and their cosine similarity with the features extracted from the training dataset is computed.
This yields music spectrogram recommendations for the testing dataset used. Figure 12 shows the 5 most
similar images for the tested data. The spectrogram recommendations are quite good, although the level of
similarity is lower than for the VGG16 model.
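The precision, recall, f1-score and accuracy values reported for both models can all be derived from a confusion matrix. The sketch below shows that derivation under two assumptions: rows are taken as true classes and columns as predicted classes, and the 3-class matrix is a toy example that does not reproduce any result from this study. The function name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(cm):
    """Compute per-class precision/recall/f1 and overall accuracy.

    cm[i][j] is assumed to count samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    report = []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append({"precision": precision, "recall": recall, "f1": f1})
    accuracy = correct / total if total else 0.0
    return report, accuracy


# Toy confusion matrix with 10 test samples per class (illustrative only).
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 3, 7],
]
report, accuracy = per_class_metrics(cm)
for i, row in enumerate(report):
    print(i, {k: round(v, 3) for k, v in row.items()})
print("accuracy:", round(accuracy, 3))
```

A class with low recall, such as the disco class mentioned for VGG16, shows up as a row whose diagonal entry is small relative to the row sum.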