The document discusses urban sound classification using machine learning models. It analyzes the UrbanSound dataset containing labeled urban sound clips from 10 classes. Various feature extraction techniques are tested to represent the sound data, including MFCC. Models tested include naive Bayes, random forest, SVM, deep neural networks, recurrent neural networks, and convolutional neural networks. The challenges of data variability, overfitting, and achieving high accuracy are discussed.


URBAN SOUND

CLASSIFICATION
CHIH-WEI CHANG & BENJAMIN DORAN
OVERVIEW
• Dataset
• What is the data
• Feature extraction
• Models
• Types
• Variability of data
• Overfitting
• Accuracy
MODELS
• We tested:
  • Naive Bayes
  • Random Forest
  • SVM
  • Deep Neural Networks
  • Recurrent Neural Networks
  • Convolutional Neural Networks
• Libraries used:
  • Scikit-Learn
  • TensorFlow
  • Librosa (for handling .wav files)
The UrbanSound Dataset
• Created by Justin Salamon, Christopher Jacoby & Juan Pablo Bello

• Contains 8732 labeled sound excerpts (each shorter than 4 s) of real field-recorded urban
sounds from 10 classes: (1) air conditioner, (2) car horn, (3) children playing, (4) dog
bark, (5) drilling, (6) engine idling, (7) jackhammer, (8) gun shot, (9) siren, and (10)
street music.

• The largest free urban sound dataset.


Feature Extraction for Sound Data
• Most classification problems involve data that can easily be expressed in vector form, but
for sound data, feature extraction is not straightforward.
[Waveform plots: air conditioner, car horn, children playing, dog bark, drilling, engine idling]

Feature Extraction
Mel-Frequency Cepstral Coefficients (MFCCs) are features that are widely used in automatic
speech and speaker recognition.

• Steps to extract features include:

1. Frame the signal into short frames (20-40 ms).

2. For each frame, calculate the periodogram estimate of the power spectrum.

3. Apply the mel filterbank and sum the energy in each filter.

4. Take the logarithm of the filterbank energies.

5. Take the DCT of the log filterbank energies; the resulting coefficients are the MFCCs.


Challenge in Feature Extraction
• Getting equally sized feature sets from sound files with different lengths, resolutions, and
numbers of channels.

• Our 3 approaches:
Approach 1: Extract “characteristics” of each sound clip, so the number of
characteristics is independent of the original data shape.

Approach 2: Zero-pad shorter samples to a common length.

Approach 3: Use a different density of filterbanks for each sample.


Shorter samples get denser filterbanks, so all samples end up with the same number of
filterbanks.
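Approach 2 can be sketched with NumPy; `target_len` here is a hypothetical fixed clip length, not a value from these slides.

```python
import numpy as np

def pad_or_trim(clip, target_len):
    """Zero-pad short clips (or trim long ones) to a fixed length."""
    if len(clip) >= target_len:
        return clip[:target_len]
    padded = np.zeros(target_len, dtype=clip.dtype)
    padded[:len(clip)] = clip
    return padded

short = np.ones(3, dtype=np.float32)
print(pad_or_trim(short, 5))  # [1. 1. 1. 0. 0.]
```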
MODEL TYPES
• We tested:
  • Naive Bayes
  • Random Forest
  • SVM
  • Deep Neural Networks
  • Recurrent Neural Networks
  • Convolutional Neural Networks
• Libraries used:
  • Scikit-Learn
  • TensorFlow
  • Librosa (for handling .wav files)
  • Python_speech_features (for handling .wav files)
VARIABILITY OF DATA
Data Set                      Total Dimensions    Size per Sample
Raw data (not modeled)        2D                  1D: variable length
Flattened zero-padded MFCC    2D                  1D: 27600 features
193 features                  2D                  1D: 193 features
Const_shape MFCC              3D                  2D: 20 rows × 41 features
Const_shape LogMFCC           3D                  2D: 20 rows × 41 features

• Sklearn only accepts 2D data (samples × features), so we needed to flatten data such as the
MFCC matrices, which give additional rows per sample.

• Sklearn and TensorFlow are both unable to handle variable-sized data, so we often needed
zero padding.
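Flattening the const-shape MFCC data for Sklearn can be sketched as follows; the batch of 100 random clips is a placeholder for the real feature arrays.

```python
import numpy as np

# Hypothetical batch: 100 clips of const-shape MFCC features, 20 x 41 each
X = np.random.rand(100, 20, 41)

# Sklearn estimators expect 2D input (n_samples, n_features),
# so each 20 x 41 MFCC matrix is flattened into an 820-feature vector
X_flat = X.reshape(len(X), -1)
print(X_flat.shape)  # (100, 820)
```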
ACCURACY
• const_MFCC: Random Forest 52%, SVM 55%, NB 20%
• feature193: Random Forest 61%, SVM 60%, NB 24%
OVERFITTING

• Overfitting was a serious issue with all models.


• Neural nets needed regularization on all layers.
  • DeepNNs did best with 80% dropout on each layer.
  • No neural nets had more than 3 hidden layers.
• DeepNN and RNN had the most trouble, followed by CNN.
  • However, DeepNN was also the easiest to compensate for overfitting.

• The next step would be increasing the dataset size …
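The dropout setup described above can be sketched with TensorFlow's Keras API; the layer widths are assumptions, since the slides do not give the exact architecture.

```python
import tensorflow as tf

# A sketch of the regularized deep NN: dropout after every hidden layer,
# no more than 3 hidden layers (layer widths here are assumptions)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(193,)),   # e.g. the 193-feature vectors
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.8),          # 80% dropout per the slides
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.8),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 sound classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model.output_shape)  # (None, 10)
```

Dropout is active only during training, so the heavy 80% rate combats overfitting without affecting inference.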


Training Curve Plot
