Guide To YAMNet - Sound Event Classifier
Transfer Learning (https://analyticsindiamag.com/transfer-learning-using-tensorflow-keras/) is a
popular machine learning technique in which a model is trained by reusing what a previously
trained model has already learned. You have probably come across its common applications in the
vision domain, such as image classification and object detection
(https://analyticsindiamag.com/object-detection-using-tensorflow/), and in the text domain, such as
sentiment analysis (https://analyticsindiamag.com/getting-started-sentiment-analysis-tensorflow-keras/)
and question answering
(https://analyticsindiamag.com/10-question-answering-datasets-to-build-robust-chatbot-systems/).
We will learn how to apply transfer learning to a relatively new type of data, audio, by building a
sound classifier. Sound classification has many important use cases, such as detecting whales and
other creatures that rely on sound to navigate, and protecting wildlife from poaching and
encroachment.
Let’s understand YAMNet, a pre-trained sound event classifier available on TensorFlow Hub, through a
practical use case and a hands-on Python implementation.
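Importing tensorflow_hub to load the pre-trained model, scipy's wavfile to read audio files, and
scipy.signal to resample them.
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import csv
import scipy.signal
from scipy.io import wavfile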
Loading the pre-trained model into a variable with hub.load so that the cells below can use it. The
labels file ships with the model assets and is located at model.class_map_path(). We will load it
into the class_names variable later on.
model = hub.load('https://tfhub.dev/google/yamnet/1')
Helper_Function_1
This helper function reads the class map CSV and returns the list of class display names. We use
this list later to find the name of the class with the top score when mean-aggregated across frames.
# Read the class names used to label the top score when mean-aggregated across frames.
def class_names_from_csv(class_map_csv_text):
    """Returns the list of class names corresponding to the score vector."""
    class_names = []
    with tf.io.gfile.GFile(class_map_csv_text) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            class_names.append(row['display_name'])
    return class_names
class_map_path = model.class_map_path().numpy()
class_names = class_names_from_csv(class_map_path)
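To see what the CSV parsing does without downloading the model, here is a minimal sketch that reads a few rows laid out like the class map (a header of index, mid, display_name). The rows and the mid values are made-up placeholders, `class_names_from_rows` is a hypothetical name, and `io.StringIO` stands in for the file handle.

```python
import csv
import io

# Illustrative rows in the layout of the class map CSV.
# The mid values are placeholders, not real identifiers.
sample_csv_text = (
    "index,mid,display_name\n"
    "0,/m/placeholder0,Speech\n"
    "1,/m/placeholder1,Animal\n"
    '2,/m/placeholder2,"Cat, meow"\n'
)

def class_names_from_rows(csv_text):
    """Return the display_name column as a list, in row order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row['display_name'] for row in reader]

sample_class_names = class_names_from_rows(sample_csv_text)
print(sample_class_names)
```

Because the list keeps the row order, position i in the list lines up with column i of the model's score output, which is what makes the later index lookup work.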
Helper_Function_2
We need this method to check the sample rate of the loaded audio and convert it to the rate the
model expects, which is 16 kHz. The authors mention this in the YAMNet paper
(https://arxiv.org/pdf/2002.12764.pdf); feeding audio at a different rate can adversely affect the
model's results.
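def ensure_sample_rate(original_sample_rate, waveform,
                       desired_sample_rate=16000):
    if original_sample_rate != desired_sample_rate:  # resample only when the rates differ
        desired_length = int(round(float(len(waveform)) / original_sample_rate * desired_sample_rate))
        waveform = scipy.signal.resample(waveform, desired_length)
    return desired_sample_rate, waveform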
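As a worked example of the length computation above, the sketch below resamples one second of a synthetic tone from 44.1 kHz to 16 kHz. The 44.1 kHz source rate and the 440 Hz tone are assumptions chosen for illustration; `scipy.signal.resample` uses a Fourier-based method.

```python
import numpy as np
import scipy.signal

original_sample_rate = 44100   # assumed rate of the source recording
desired_sample_rate = 16000    # rate the model expects

# One second of a 440 Hz tone at the original rate (synthetic test signal).
t = np.arange(original_sample_rate) / original_sample_rate
waveform = np.sin(2 * np.pi * 440.0 * t)

# Number of samples the same clip occupies at the new rate.
desired_length = int(round(float(len(waveform)) /
                           original_sample_rate * desired_sample_rate))
resampled = scipy.signal.resample(waveform, desired_length)

print(len(waveform), '->', len(resampled))
```

The clip keeps its one-second duration; only the number of samples used to represent it changes.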
The Colab notebook contains all the required links; you only have to run it. The commands below
download two sample WAV files.
!curl -O https://storage.googleapis.com/audioset/speech_whistling2.wav
!curl -O https://storage.googleapis.com/audioset/miaow_16k.wav
We can listen to one of the downloaded audio files and check its properties with the following
snippet, which loads the file, prints some basic information about it, and plays it.
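from IPython.display import Audio
# wav_file_name = 'speech_whistling2.wav'
wav_file_name = 'miaow_16k.wav'
sample_rate, wav_data = wavfile.read(wav_file_name)
sample_rate, wav_data = ensure_sample_rate(sample_rate, wav_data)
duration = len(wav_data)/sample_rate  # length of the clip in seconds
print(f'Sample rate: {sample_rate} Hz, total duration: {duration:.2f}s')
Audio(wav_data, rate=sample_rate)  # play the clip in the notebook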
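If the sample files are not at hand, the same properties can be checked on a generated file. The sketch below writes half a second of a synthetic tone with scipy's wavfile and reads it back; the file name `tone_16k.wav` and the tone itself are hypothetical stand-ins for a real recording.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

sample_rate = 16000
# Half a second of a 440 Hz tone stored as 16-bit integers.
t = np.arange(sample_rate // 2) / sample_rate
tone = (0.5 * np.iinfo(np.int16).max * np.sin(2 * np.pi * 440.0 * t)).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tone_16k.wav')  # hypothetical file name
    wavfile.write(path, sample_rate, tone)
    read_rate, read_data = wavfile.read(path)

duration = len(read_data) / read_rate
print(f'Sample rate: {read_rate} Hz')
print(f'Total duration: {duration:.2f}s')
print(f'Size of the input: {len(read_data)}')
```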
The WAV samples are stored as 16-bit integers, so we scale them to floating-point values in the
range [-1.0, 1.0] before feeding them to the pre-trained model. The model returns scores,
embeddings, and a spectrogram, which we can display later. Averaging the scores across frames and
looking up the top class in the list built by Helper_Function_1 gives “Animal”, which means this is
the class with the highest mean score for this clip.
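waveform = wav_data / tf.int16.max  # scale int16 samples to [-1.0, 1.0]
scores, embeddings, spectrogram = model(waveform)  # run YAMNet on the clip
scores_np = scores.numpy()
spectrogram_np = spectrogram.numpy()
infered_class = class_names[scores_np.mean(axis=0).argmax()]
print(f'The main sound is: {infered_class}')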
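The two steps around the model call, scaling the samples and picking the class with the highest mean score, can be tried on small made-up arrays without loading the model. In this sketch the three samples, the 4 x 3 score matrix, and the class list are all invented; the int16 samples are divided by the largest int16 value (32767).

```python
import numpy as np

# Stand-in for wav_data: three 16-bit samples.
wav_data = np.array([0, 16384, -32768], dtype=np.int16)
waveform = wav_data / np.iinfo(np.int16).max   # floats in roughly [-1.0, 1.0]

# Stand-in for the scores output: 4 frames x 3 classes (made-up values).
class_names = ['Speech', 'Animal', 'Music']
scores_np = np.array([
    [0.10, 0.70, 0.05],
    [0.20, 0.60, 0.10],
    [0.05, 0.90, 0.02],
    [0.60, 0.30, 0.01],
])

mean_scores = scores_np.mean(axis=0)            # one averaged score per class
infered_class = class_names[mean_scores.argmax()]
print(infered_class)
```

Note that the last frame on its own favours a different class; averaging over frames is what makes the label describe the clip as a whole rather than a single frame.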
Plotting the waveform together with two of the outputs we got from running the model: the
spectrogram (https://www.izotope.com/en/learn/understanding-spectrograms.html) and the scores
(https://audio2score.github.io/) of the top classes. The embeddings
(https://ccrma.stanford.edu/events/learning-audio-embeddings-signal-representation-audio-transformation-understanding)
are not plotted here.
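import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
# Plot the waveform.
plt.subplot(3, 1, 1)
plt.plot(waveform)
plt.xlim([0, len(waveform)])
# Plot the log-mel spectrogram (returned by the model).
plt.subplot(3, 1, 2)
plt.imshow(spectrogram_np.T, aspect='auto', interpolation='nearest', origin='lower')
# Plot and label the model output scores for the top-scoring classes.
mean_scores = np.mean(scores_np, axis=0)
top_n = 10
top_class_indices = np.argsort(mean_scores)[::-1][:top_n]
plt.subplot(3, 1, 3)
plt.imshow(scores_np[:, top_class_indices].T, aspect='auto', interpolation='nearest', cmap='gray_r')
# Label the rows with the names of the top_n classes.
plt.yticks(range(top_n), [class_names[i] for i in top_class_indices])
plt.show()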
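The top-class selection used for the third subplot is easy to check in isolation. The sketch below applies the same argsort expression to five made-up class means and builds the matrix that would be drawn, with one row per selected class and one column per frame.

```python
import numpy as np

mean_scores = np.array([0.05, 0.62, 0.01, 0.24, 0.08])   # made-up class means
top_n = 3

# argsort is ascending, so reverse it and keep the first top_n indices.
top_class_indices = np.argsort(mean_scores)[::-1][:top_n]
print(top_class_indices)

# Rows of the score image: one row per selected class, one column per frame.
scores_np = np.tile(mean_scores, (4, 1))                  # 4 identical frames
top_scores_image = scores_np[:, top_class_indices].T
print(top_scores_image.shape)
```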
And here is your sound classifier built on the pre-trained YAMNet model. I recommend trying the
model on other open-source datasets and on the audio files linked in this blog. It is an unusual
use case that will broaden your deep learning skill set.
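The introduction frames this as transfer learning, while the walkthrough runs the pre-trained model as it is. As a rough sketch of the next step, the snippet below trains a small softmax classifier on fixed feature vectors. The random vectors are stand-ins for clip-level YAMNet embeddings (the model's embeddings have 1024 values per frame); the two classes, the data, and every name here are made up, and plain NumPy gradient descent is used instead of a Keras layer so the example runs without TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_dim = 1024          # matches the size of a YAMNet embedding
clips_per_class = 50

# Synthetic stand-ins for clip-level embeddings of two hypothetical classes.
centers = rng.normal(size=(2, embedding_dim))
features = np.concatenate([
    centers[c] + 0.5 * rng.normal(size=(clips_per_class, embedding_dim))
    for c in range(2)
])
labels = np.repeat(np.arange(2), clips_per_class)

# Softmax regression head: the features stay fixed,
# only these weights are trained.
weights = np.zeros((embedding_dim, 2))
bias = np.zeros(2)
one_hot = np.eye(2)[labels]
learning_rate = 0.001

for _ in range(100):
    logits = features @ weights + bias
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = (probs - one_hot) / len(labels)          # gradient of mean cross-entropy
    weights -= learning_rate * features.T @ grad
    bias -= learning_rate * grad.sum(axis=0)

predictions = (features @ weights + bias).argmax(axis=1)
accuracy = (predictions == labels).mean()
print(f'Training accuracy: {accuracy:.2f}')
```

With real data, the feature rows would come from averaging the model's per-frame embeddings for each labelled clip, and accuracy should be measured on held-out clips rather than on the training set.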
Mudit is experienced in machine learning and deep learning. He is an undergraduate in Mechatronics and has worked as
a team lead (ML team) on several projects. He has a strong interest in doing SOTA ML projects and writing blogs on
data science and machine learning.