Comparison of Algorithms for Audio Fingerprinting

Heinrich A. van Nieuwenhuizen, Willie C. Venter and Leenta M.J. Grobler

School of Electrical, Electronic and Computer Engineering
North-West University, Potchefstroom Campus, South Africa
Email: {20252188; willie.venter; leenta.grobler}@nwu.ac.za
Telephone: (027) 18 299-4058 Fax: (027) 18 299-1977

Abstract-The practical implementation and application of audio fingerprinting to recognize specific audio segments are becoming increasingly popular. Different research groups are working on implementations of audio fingerprinting, but they seldom compare their algorithms with those of other people working in the field. A fair judgment therefore cannot be made of which of the available algorithms is suitable for a given application. In this paper, two audio fingerprinting algorithms, those of Avery Wang and of Haitsma and Kalker, are tested in terms of accuracy, speed, versatility and scalability.

Keywords: Audio Fingerprinting; Automatic Music Recognition; Content-based Audio Identification; Perceptual Hashing; Robust Matching

[Figure 1: Content-based audio identification framework [6]. Recordings' collection -> fingerprint extraction -> DB of recordings' IDs; unlabeled recording -> fingerprint extraction -> match -> recording ID.]
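The framework in Figure 1 has an offline path (fingerprint a collection of recordings and store the fingerprints with their IDs) and an online path (fingerprint an unlabeled recording and match it against the database). The minimal Python sketch below shows only that structure, under stated assumptions: `FingerprintDB`, `toy_extract` and the majority-vote matching rule are hypothetical names and choices for illustration, and a real system would plug in an extractor such as Haitsma and Kalker's [1] or Wang's [2].

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Set

# Hypothetical extractor type: raw samples in, list of hashable fingerprints out.
Extractor = Callable[[Sequence[float]], List[Hashable]]


class FingerprintDB:
    """Toy model of the Figure 1 framework; names and voting rule are illustrative."""

    def __init__(self, extract: Extractor) -> None:
        self.extract = extract
        # Inverted index: fingerprint -> IDs of recordings that produced it.
        self.index: Dict[Hashable, Set[str]] = defaultdict(set)

    def add_recording(self, recording_id: str, samples: Sequence[float]) -> None:
        """Offline path: recordings' collection -> fingerprint extraction -> DB."""
        for fp in self.extract(samples):
            self.index[fp].add(recording_id)

    def identify(self, samples: Sequence[float]) -> Optional[str]:
        """Online path: unlabeled recording -> fingerprint extraction -> match."""
        votes: Counter = Counter()
        for fp in self.extract(samples):
            for recording_id in self.index.get(fp, ()):
                votes[recording_id] += 1
        if not votes:
            return None  # no fingerprint in common with any stored recording
        return votes.most_common(1)[0][0]


def toy_extract(samples: Sequence[float]) -> List[Hashable]:
    """Hypothetical stand-in extractor: quantized pairs of neighbouring samples."""
    return [(round(a, 1), round(b, 1)) for a, b in zip(samples, samples[1:])]


if __name__ == "__main__":
    db = FingerprintDB(toy_extract)
    db.add_recording("song-a", [0.1, 0.5, 0.9, 0.3, 0.7])
    db.add_recording("song-b", [0.2, 0.4, 0.6, 0.8, 1.0])
    print(db.identify([0.5, 0.9, 0.3]))  # song-a
```

Keeping the extractor as a parameter mirrors the figure: both paths share the same fingerprint extraction step, and only the database and matching logic sit around it.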
I. INTRODUCTION

Fingerprinting systems are not a new concept; they have been around for more than a hundred years. In 1893 Sir Francis Galton was the first to "prove" that no two human fingerprints are alike [1]. This notion was extended to the use of any unique feature, such as the iris or even the ears, to identify an object. People also realized the potential of constructing fingerprints of audio signals to identify and compare them. This principle is called audio fingerprinting.

Audio fingerprinting uses short audio segments, usually between 3 and 30 seconds long depending on the algorithm, to create an audio fingerprint. This fingerprint is compared with a database of known audio fingerprints to identify the original audio source. The segment does not have to be of high quality to produce a match. Distortion of and interference with the original signal make matching less reliable, but up to a point the segment will still be recognizable, much like a smudged or partial human fingerprint.

This paper gives a brief description of audio fingerprinting and discusses generic code for Avery Wang's Shazam algorithm [2] and for Haitsma and Kalker's algorithm [1]. Section II briefly discusses the three main groups of audio fingerprinting techniques and where the two algorithms fit among them. Section III describes the operation of the algorithms, Section IV presents the validation and verification, and Section V concludes the paper and outlines future work.

II. DIFFERENT AUDIO FINGERPRINTING TECHNIQUES

There are several applications for audio fingerprinting algorithms. According to Wes Hatch [3], the biggest beneficiary would be the broadcast monitoring industry. Other applications include playlist generation, royalty collection, program verification and advertisement verification.

According to P. J. O. Doets, M. Menor Gisbert and R. L. Lagendijk [4], audio fingerprinting systems can be classified into three groups:

Group 1: Systems that use features based on multiple subbands, such as Philips' Robust Hash algorithm, which is reported to be very robust against distortions [1]. Philips uses Haitsma and Kalker's algorithm.

Group 2: Systems that use features based on a single band, such as the spectral domain, for example Avery Wang's Shazam and Fraunhofer's AudioID algorithms.

Group 3: Systems that use a combination of subbands or frames, optimized through training, such as Microsoft's Robust Audio Recognition Engine (RARE), which uses Hidden Markov Models (HMMs) [5].
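To make the Group 1 idea concrete, the sketch below assumes the scheme described in [1]: per-frame energies in 33 logarithmically spaced bands between 300 Hz and 2000 Hz, with each of the 32 bits set by the sign of an energy difference across adjacent bands and consecutive frames. The band layout and the bit rule come from [1] rather than from this paper, so treat them as assumptions; all function names are hypothetical. It also checks the framing figures quoted in Section III: an overlap factor of 31/32 on 370 ms frames gives a hop of 370/32 ms, about 11.6 ms.

```python
from typing import List, Sequence

# Framing quoted in Section III: 370 ms frames with an overlap factor of 31/32.
FRAME_SECONDS = 0.37
OVERLAP = 31 / 32
HOP_SECONDS = FRAME_SECONDS * (1 - OVERLAP)  # 0.37 / 32 = 0.0115625 s, about 11.6 ms

# Assumption taken from [1]: 33 bands yield 32 bits per sub-fingerprint.
N_BANDS = 33


def band_edges(f_low: float = 300.0, f_high: float = 2000.0,
               n_bands: int = N_BANDS) -> List[float]:
    """Return n_bands + 1 logarithmically spaced band edges in Hz (range assumed from [1])."""
    ratio = f_high / f_low
    return [f_low * ratio ** (i / n_bands) for i in range(n_bands + 1)]


def sub_fingerprints(energies: Sequence[Sequence[float]]) -> List[int]:
    """Map per-frame band energies E[n][m] (33 bands per frame) to 32-bit integers.

    The bit for bands (m, m + 1) in frame n is 1 when
        (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0,
    i.e. the sign of an energy difference taken across frequency and time.
    Band pair 0 ends up in the most significant bit. Frame 0 has no
    predecessor, so the output has one entry fewer than the input.
    """
    words = []
    for n in range(1, len(energies)):
        cur, prev = energies[n], energies[n - 1]
        word = 0
        for m in range(N_BANDS - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            word = (word << 1) | (1 if diff > 0 else 0)
        words.append(word)
    return words


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of differing bits between two equal-length blocks of sub-fingerprints."""
    if not a or len(a) != len(b):
        raise ValueError("blocks must be non-empty and of equal length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (32 * len(a))
```

Because each bit is the sign of a double difference, multiplying every band energy by the same positive gain leaves the sub-fingerprint unchanged, which is one reason such a scheme tolerates volume changes. Blocks of sub-fingerprints are compared by bit error rate, and a candidate is accepted when the rate falls below a threshold; the threshold is left as a tunable assumption here.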

This work was completed at the Telkom-Grintek Centre of Excellence at the NWU, and is funded by the HTBO THRIP project.
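Group 2 systems such as Wang's Shazam work on a spectrogram, the squared magnitude of the STFT (equation (1) in Section III), and build hashes from pairs of spectral peaks. The pure-Python sketch below is a simplified illustration, not Wang's or Ellis's implementation: a naive DFT stands in for an FFT, peaks are local maxima along frequency only, and the defaults `peaks_per_frame=5` and `min_hits=9` follow the figures mentioned in Sections III and V, while `fan_out`, `max_dt`, the hop size and all function and class names are hypothetical. Matching counts hash hits that agree on a single time offset between database and query, in the spirit of the time-offset consistency check described by Wang [2].

```python
import cmath
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Peak = Tuple[int, int]       # (frame index, frequency bin)
Hash = Tuple[int, int, int]  # (f1, f2, dt): one spectral-pair landmark


def spectrogram(x: Sequence[float], n_fft: int = 512, hop: int = 256) -> List[List[float]]:
    """Squared magnitude of a Hann-windowed STFT, as in equation (1).

    A naive O(n_fft^2) DFT keeps the sketch dependency-free; use an FFT in practice.
    hop=256 is an assumption, not a value from the paper.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * window[i] for i in range(n_fft)]
        row = []
        for k in range(n_fft // 2 + 1):
            s = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n_fft) for i in range(n_fft))
            row.append(abs(s) ** 2)
        frames.append(row)
    return frames


def pick_peaks(spec: Sequence[Sequence[float]], peaks_per_frame: int = 5) -> List[Peak]:
    """Keep the strongest local maxima (along frequency only) in each frame."""
    peaks = []
    for t, row in enumerate(spec):
        local = [
            (row[f], f)
            for f in range(len(row))
            if row[f] > 0
            and (f == 0 or row[f] > row[f - 1])
            and (f == len(row) - 1 or row[f] >= row[f + 1])
        ]
        for _, f in sorted(local, reverse=True)[:peaks_per_frame]:
            peaks.append((t, f))
    return peaks


def landmarks(peaks: Sequence[Peak], fan_out: int = 3, max_dt: int = 32) -> List[Tuple[Hash, int]]:
    """Pair each anchor peak with up to fan_out later peaks; return (hash, anchor time)."""
    ordered = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(ordered):
        paired = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-ordered, so later ones are even further away
            if dt == 0:
                continue  # skip peaks in the same frame
            out.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return out


class LandmarkIndex:
    """Hash table from landmark to (track ID, anchor time) occurrences."""

    def __init__(self) -> None:
        self.table: Dict[Hash, List[Tuple[str, int]]] = defaultdict(list)

    def add(self, track_id: str, marks: Sequence[Tuple[Hash, int]]) -> None:
        for h, t in marks:
            self.table[h].append((track_id, t))

    def match(self, marks: Sequence[Tuple[Hash, int]], min_hits: int = 9) -> Optional[str]:
        """Return the track whose hits agree on one time offset most often, if >= min_hits."""
        votes: Counter = Counter()
        for h, t_query in marks:
            for track_id, t_db in self.table.get(h, ()):
                votes[(track_id, t_db - t_query)] += 1
        if not votes:
            return None
        (track_id, _offset), hits = votes.most_common(1)[0]
        return track_id if hits >= min_hits else None
```

Voting on (track, offset) pairs rather than on tracks alone means that hashes which happen to collide at random offsets do not add up, while a genuine excerpt produces many hits at one offset.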
III. OPERATION

Haitsma and Kalker's algorithm proposes that the fingerprint extraction scheme be based on a general streaming approach: the audio signal is divided into frames of 370 ms, one starting every 11.6 ms, which gives an overlap factor of 31/32 [1].

The FFT of every frame is then computed, the spectrum is divided into frequency bands by a filter bank, and the result is stored as a 32-bit sub-fingerprint (see Figure 2).

[Figure 2: Haitsma and Kalker's algorithm [1]]

For Haitsma and Kalker's algorithm to identify a match, an unidentified audio sample should be 3 to 30 seconds long.

Avery Wang reports that, for a database of 20 thousand tracks implemented on a PC, the search time is 5 to 500 milliseconds [2]. Wang's code is not available, but Dan Ellis has produced generic MATLAB code for the algorithm [5], which Robert Macrae of C4DM, Queen Mary University of London, altered for use in the Windows environment.

Wang's algorithm makes use of a spectrogram, which is the squared magnitude of the STFT (short-time Fourier transform):

$$\mathrm{spectrogram}(t,\omega) = \left|\mathrm{STFT}(t,\omega)\right|^{2} \qquad (1)$$

The signal is usually divided into short segments (typically 512 points), called windows or frames, over which the STFT is computed. This is the shared basis of the Group 2 algorithms, which typically differ in how much the frames overlap, how the fingerprint is defined within a frame, and how the fingerprints are stored and searched. Avery Wang's Shazam algorithm uses the energy peaks in each frame to form spectral-pair landmarks: the local maxima within a defined region are grouped into pairs [4]. Hash values are computed from these pairs and compared with the database, and the entry with the most hits is returned as the match. (Typically, more than 9 matching spectral peaks are considered a match [7].)

IV. VALIDATION AND VERIFICATION

Both algorithms are implemented. To verify that they are implemented correctly, the results of Haitsma and Kalker's algorithm are compared with those reported in their paper "A Highly Robust Audio Fingerprinting System" [1], and the results of Avery Wang's algorithm with those in his paper "An Industrial-Strength Audio Search Algorithm" [2] and in Dan Ellis's "Robust Landmark-Based Audio Fingerprinting" [5].

The two algorithms are then compared with each other in terms of accuracy, speed, versatility and scalability. They are further tested by subjecting the data to noise and compression, and their parameters are tuned for better results. Finally, the algorithms are compared on the same criteria using different data, ranging from classical music to pop music and advertisements.

V. CONCLUSION AND FUTURE WORK

The following were observed in the study. Increasing and decreasing the frame size of both algorithms respectively decreases speed and increases accuracy. Defining more peaks per frame in Shazam's algorithm (normally 5) also results in better accuracy but lower speed. Increasing the overlap between frames in both algorithms increases robustness but decreases speed. The nearest false positive reaches on average 5 or 6 matches, which supports James P. Ogle and Daniel P. W. Ellis's finding that 9 peaks are needed for a match [7].

In the future, the following should be tried: translating the algorithms to a faster programming environment; researching different techniques to access the database more quickly; and investigating further applications, e.g. recognizing gunshots or engine noise.

VI. BIBLIOGRAPHY

[1] J. Haitsma and A. Kalker, "A Highly Robust Audio Fingerprinting System," International Symposium on Music Information Retrieval (ISMIR), pp. 107-115, 2002.
[2] A. L.-C. Wang, "An Industrial-Strength Audio Search Algorithm," ISMIR, 2003.
[3] W. Hatch, "A Quick Review of Audio Fingerprinting," Mar. 2003.
[4] P. J. O. Doets, M. M. Gisbert, and R. L. Lagendijk, "On the comparison of audio fingerprints for extracting quality parameters of compressed audio," vol. 6072, 2006.
[5] D. P. W. Ellis, "Robust Landmark-Based Audio Fingerprinting," 2009. http://labrosa.ee.columbia.edu/matlab/fingerprint/
[6] P. Cano, E. Batlle, E. Gómez, L. de C. T. Gomes, and M. Bonnet, "Audio Fingerprinting: Concepts and Applications," Studies in Computational Intelligence (SCI), no. 2, pp. 233-245, 2005.
[7] J. P. Ogle and D. P. W. Ellis, "Fingerprinting to identify repeated sound events in long-duration personal audio recordings," 2007.

Heinrich van Nieuwenhuizen received his B.Eng degree in 2009 and is currently pursuing his M.Eng at the North-West University, Potchefstroom Campus. His research interests include software design, audio fingerprinting, and the implementation and comparison of audio fingerprinting algorithms for industrial use.
