The document discusses urban sound classification using machine learning models. It analyzes the UrbanSound dataset containing labeled urban sound clips from 10 classes. Various feature extraction techniques are tested to represent the sound data, including MFCC. Models tested include naive Bayes, random forest, SVM, deep neural networks, recurrent neural networks, and convolutional neural networks. The challenges of data variability, overfitting, and achieving high accuracy are discussed.


URBAN SOUND

CLASSIFICATION
CHIH-WEI CHANG & BENJAMIN DORAN
OVERVIEW
• Dataset
• What is the data
• Feature extraction
• Models
• Types
• Variability of data
• Overfitting
• Accuracy
MODELS
• We tested:
  • Naive Bayes
  • Random Forest
  • SVM
  • Deep Neural Networks
  • Recurrent Neural Networks
  • Convolutional Neural Networks
• Libraries used:
  • Scikit-Learn
  • TensorFlow
  • Librosa (for handling .wav files)
The UrbanSound Dataset
• Created by Justin Salamon, Christopher Jacoby & Juan Pablo Bello

• Contains 8732 labeled sound excerpts (each shorter than 4 s) of real field-recorded urban
sounds from 10 classes: (1) air conditioner, (2) car horn, (3) children playing, (4) dog
bark, (5) drilling, (6) engine idling, (7) jackhammer, (8) gun shot, (9) siren, and (10)
street music.

• The largest free urban sound dataset.


Feature Extraction for Sound Data
• Most classification problems involve data that can easily be expressed in vector form, but
for sound data, feature extraction is not straightforward.
[Waveform plots: air conditioner, car horn, children playing, dog bark, drilling, engine idling]

Feature Extraction
Mel-Frequency Cepstral Coefficients (MFCCs) are features that are widely used in automatic
speech and speaker recognition.

• Steps to extract features include:

1. Frame the signal into short frames (20-40 ms).

2. For each frame, calculate the periodogram estimate of the power spectrum.

3. Apply the mel filterbank and sum the energy in each filter.

4. Take the logarithm of the filterbank energies.

5. Take the DCT of the log filterbank energies; the resulting coefficients are the MFCCs.


Challenge in Feature Extraction
• Getting equally sized feature sets from sound files with different lengths, resolutions, and
numbers of channels.

• Our 3 approaches:
Approach 1: Extract “characteristics” of each sound clip, so the number of
characteristics is independent of the original data shape.

Approach 2: Zero-pad shorter samples to a common length.

Approach 3: Use a different density of filterbanks for each sample.


Shorter samples get denser filterbanks, so all samples end up with the same number of
filterbanks.
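Approach 2 can be sketched with NumPy; `target_len` here is a hypothetical fixed clip length, not a value from these slides.

```python
import numpy as np

def pad_or_trim(clip, target_len):
    """Zero-pad short clips (or trim long ones) to a fixed length."""
    if len(clip) >= target_len:
        return clip[:target_len]
    padded = np.zeros(target_len, dtype=clip.dtype)
    padded[:len(clip)] = clip
    return padded

short = np.ones(3, dtype=np.float32)
print(pad_or_trim(short, 5))  # [1. 1. 1. 0. 0.]
```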
MODEL TYPES
• We tested:
  • Naive Bayes
  • Random Forest
  • SVM
  • Deep Neural Networks
  • Recurrent Neural Networks
  • Convolutional Neural Networks
• Libraries used:
  • Scikit-Learn
  • TensorFlow
  • Librosa (for handling .wav files)
  • Python_speech_features (for handling .wav files)
VARIABILITY OF DATA
Data Set                      Total Dimensions    Size per Sample
Raw data (not modeled)        2D                  1D: variable length
Flattened zero-padded MFCC    2D                  1D: 27600 features
193 features                  2D                  1D: 193 features
Const_shape MFCC              3D                  2D: 20 rows × 41 features
Const_shape LogMFCC           3D                  2D: 20 rows × 41 features

• Sklearn only accepts 2D data (samples × features), so we needed to flatten data such as the
MFCC matrices, which give additional rows per sample.

• Sklearn and TensorFlow are both unable to handle variable-sized data, so we often needed
zero padding.
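Flattening the const-shape MFCC data for Sklearn can be sketched as follows; the batch of 100 random clips is a placeholder for the real feature arrays.

```python
import numpy as np

# Hypothetical batch: 100 clips of const-shape MFCC features, 20 x 41 each
X = np.random.rand(100, 20, 41)

# Sklearn estimators expect 2D input (n_samples, n_features),
# so each 20 x 41 MFCC matrix is flattened into an 820-feature vector
X_flat = X.reshape(len(X), -1)
print(X_flat.shape)  # (100, 820)
```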
ACCURACY
• const_MFCC: Random Forest 52%, SVM 55%, NB 20%
• feature193: Random Forest 61%, SVM 60%, NB 24%
OVERFITTING

• Overfitting was a serious issue with all models.


• Neural nets needed regularization on all layers.
  • DeepNNs did best with 80% dropout on each layer.
  • No neural nets had more than 3 hidden layers.
• DeepNN and RNN had the most trouble, followed by CNN.
  • However, DeepNN was also the easiest to compensate for overfitting.

• The next step would be increasing the dataset size …
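The dropout setup described above can be sketched with TensorFlow's Keras API; the layer widths are assumptions, since the slides do not give the exact architecture.

```python
import tensorflow as tf

# A sketch of the regularized deep NN: dropout after every hidden layer,
# no more than 3 hidden layers (layer widths here are assumptions)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(193,)),   # e.g. the 193-feature vectors
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.8),          # 80% dropout per the slides
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.8),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 sound classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model.output_shape)  # (None, 10)
```

Dropout is active only during training, so the heavy 80% rate combats overfitting without affecting inference.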


Training Curve Plot
