
Automatic audio analysis

for content description & indexing


Dan Ellis
International Computer Science Institute, Berkeley CA
<[email protected]>

Outline

1 Auditory Scene Analysis (ASA)

2 Computational ASA (CASA)

3 Prediction-driven CASA

4 Speech recognition & sound mixtures

5 Implications for content analysis



1 Auditory Scene Analysis
“The organization of complex sound scenes
according to their inferred sources”
• Sounds rarely occur in isolation
- organization required for useful information
• Human audition is very effective
- unexpectedly difficult to model
• ‘Correct’ analysis defined by goal
- source shows independence, continuity
→ ecological constraints enable organization
[Spectrogram: city-street ambience "city22", 0-9 s, 200-4000 Hz, level in dB]



Psychology of ASA
• Extensive experimental research
- organization of ‘simple pieces’
(sinusoids & white noise)
- streaming, pitch perception, ‘double vowels’
• “Auditory Scene Analysis” [Bregman 1990]
→ grouping ‘rules’
- common onset/offset/modulation,
harmonicity, spatial location
• Debated... (Darwin, Carlyon, Moore, Remez)

[Figure from Darwin 1996]
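As a toy illustration of how one such grouping rule might be operationalized (the function and its 30 ms tolerance are my own illustration, not from Bregman or the talk), the Python sketch below groups sinusoidal tracks by common onset:

import numpy as np

def group_by_common_onset(onsets, tol=0.03):
    """Group tracks whose onset times (seconds) fall within `tol` of the
    previous onset in sorted order -- a crude 'common onset' rule."""
    order = np.argsort(onsets)
    groups, current = [], [int(order[0])]
    for i in order[1:]:
        if onsets[i] - onsets[current[-1]] <= tol:
            current.append(int(i))        # near-synchronous: same source
        else:
            groups.append(current)        # large gap: start a new group
            current = [int(i)]
    groups.append(current)
    return groups

# Three harmonics starting together, then an unrelated later tone:
print(group_by_common_onset([0.10, 0.11, 0.12, 0.50]))  # [[0, 1, 2], [3]]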



2 Computational Auditory Scene Analysis
(CASA)
• Automatic sound organization?
- convert an undifferentiated signal into a
description in terms of different sources
[Spectrogram: "city22" annotated with inferred sources: horn, door crash, yell, car noise; 0-9 s, 200-4000 Hz, level in dB]

• Translate psych. rules into programs?


- representations to reveal common onset,
harmonicity ...
• Motivations & Applications
- it’s a puzzle: new processing principles?
- real-world interactive systems (speech, robots)
- hearing prostheses (enhancement, description)
- advanced processing (remixing)
- multimedia indexing...
CASA survey
• Early work on co-channel speech
- listeners benefit from pitch difference
- algorithms for separating periodicities
• Utterance-sized signals need more
- cannot predict number of signals (0, 1, 2 ...)
- birth/death processes
• Ultimately, more constraints needed
- nonperiodic signals
- masked cues
- ambiguous signals



CASA1: Periodic pieces
• Weintraub 1985
- separate male & female voices
- find periodicities in each frequency channel by
auto-coincidence
- number of voices is ‘hidden state’
• Cooke & Brown (1991-3)
- divide time-frequency plane into elements
- apply grouping rules to form sources
- pull single periodic target out of noise
[Spectrograms: "brn1h.aif" (input mixture) and "brn1h.fi.aif" (extracted periodic target); 100-3000 Hz, 0-1.0 s]
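A minimal sketch of the per-channel periodicity idea, with plain autocorrelation standing in for Weintraub's auto-coincidence (the 0.3 peak threshold and pitch range are illustrative):

import numpy as np

def channel_period(x, sr, fmin=80.0, fmax=400.0):
    """Estimate the dominant period (s) in one filterbank channel by
    autocorrelation; return None if no clear periodicity is found."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    lo, hi = int(sr / fmax), int(sr / fmin)             # plausible pitch lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return lag / sr if ac[lag] > 0.3 * ac[0] else None  # demand a clear peak

# A 150 Hz pulse train in mild noise is found; white noise alone is not:
sr = 8000
t = np.arange(2000) / sr
x = np.sign(np.sin(2 * np.pi * 150 * t)) + 0.1 * np.random.randn(len(t))
print(channel_period(x, sr))                      # ~0.0066 s (= 1/150)
print(channel_period(np.random.randn(2000), sr))  # None (aperiodic)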



CASA2: Hypothesis systems
• Okuno et al. (1994-)
- ‘tracers’ follow each harmonic + noise ‘agent’
- residue-driven: account for whole signal
• Klassner 1996
- search for a combination of templates
- high-level hypotheses permit front-end tuning
[Figure: Klassner's template search: (a) observed time-frequency tracks, (b) matched templates Buzzer-Alarm, Glass-Clink, Phone-Ring, Siren-Chirp; 420-3760 Hz, 0-4 s]

• Ellis 1996
- model for events perceived in dense scenes
- prediction-driven: reconcile observations with hypotheses



CASA3: Other approaches
• Blind source separation (Bell & Sejnowski)
- find exact separation parameters by maximizing a statistic, e.g. signal independence
• HMM decomposition (RK Moore)
- recover combined source states directly
• Neural models (Malsburg, Wang & Brown)
- avoid implausible AI methods (search, lists)
- oscillators substitute for iteration?
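For the blind-separation entry, a minimal natural-gradient infomax sketch in the spirit of Bell & Sejnowski (instantaneous two-channel mixture; Laplacian stand-ins for speech-like sources; everything beyond the update rule is illustrative):

import numpy as np

def infomax_ica(X, lr=0.01, iters=2000):
    """Unmix an instantaneous mixture X (channels x samples) by
    maximizing independence via the natural-gradient infomax update."""
    n, m = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    W = np.eye(n)
    for _ in range(iters):
        Y = W @ X
        g = np.tanh(Y)                             # score for super-Gaussian sources
        W += lr * (np.eye(n) - (g @ Y.T) / m) @ W  # natural-gradient step
    return W @ X                                   # sources, up to scale & order

# Two super-Gaussian (speech-like) sources, instantaneously mixed:
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 8000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
recovered = infomax_ica(A @ S)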



3 Prediction-driven CASA
Perception is not direct
but a search for plausible hypotheses
• Data-driven:
input mixture → Front end (signal features)
→ Object formation (discrete objects)
→ Grouping rules → source groups

vs. Prediction-driven:
input mixture → Front end (observed features)
→ Compare & reconcile → prediction errors
→ Hypothesis management (hypotheses out)
→ Noise & Periodic components → Predict & combine
→ predicted features → back to Compare & reconcile
• Motivations
- detect non-tonal events (noise & clicks)
- support ‘restoration illusions’...
→ hooks for high-level knowledge
+ ‘complete explanation’, multiple hypotheses,
resynthesis
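A toy per-frame version of that reconcile loop (the dict layout and 10% residual threshold are invented for illustration; the real engine maintains structured noise, click, and periodic elements):

import numpy as np

def pdcasa_frame(observed, hypotheses, max_hyps=4):
    """Advance one frame: score each hypothesis by how well the sum of
    its components predicts the observed spectral energies, and spawn
    an extended hypothesis when too much energy is left unexplained."""
    scored = []
    for h in hypotheses:
        pred = sum(h['components'], np.zeros_like(observed))
        err = float(np.abs(observed - pred).sum())
        h2 = {'components': h['components'], 'score': h['score'] + err}
        scored.append(h2)
        residual = np.maximum(observed - pred, 0.0)  # unexplained energy
        if residual.sum() > 0.1 * observed.sum():    # enough to justify a new element
            scored.append({'components': h['components'] + [residual],
                           'score': h2['score']})
    # Prune to the few most plausible (lowest cumulative error) hypotheses:
    return sorted(scored, key=lambda h: h['score'])[:max_hyps]

# Start from the empty explanation and explain a toy two-band frame:
hyps = pdcasa_frame(np.array([1.0, 0.2]), [{'components': [], 'score': 0.0}])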
Analyzing the continuity illusion
• Interrupted tone heard as continuous
- ... if the interruption could be a masker
[Spectrogram: "ptshort", a tone interrupted by a noise burst; 1000-4000 Hz, 0-1.4 s]

• Data-driven just sees gaps

• Prediction-driven analysis can accommodate it

- special case or general principle?
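A minimal version of that masking test (a deliberate simplification: the tone hypothesis survives the gap only if every gap frame carries at least the tone's level in the tone's band):

import numpy as np

def continuity_hypothesis(spec_db, band, gap, margin_db=0.0):
    """spec_db: spectrogram in dB (bins x frames); band: slice of bins
    covering the tone; gap: slice of frames where the tone is absent.
    Returns True if the interruption could have masked the tone."""
    tone_level = spec_db[band, :gap.start].max()  # tone level before the gap
    gap_level = spec_db[band, gap].max(axis=0)    # band energy during the gap
    return bool((gap_level + margin_db >= tone_level).all())

# Tone at 60 dB; a 70 dB burst fills the gap -> heard as continuous:
spec = np.full((4, 10), -80.0)
spec[1, :] = 60.0      # the tone's band
spec[1, 4:6] = 70.0    # noise burst replaces the tone
print(continuity_hypothesis(spec, band=slice(1, 2), gap=slice(4, 6)))  # True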


Phonemic Restoration (Warren 1970)
• Another ‘illusion’ instance
• Inference relies on high-level semantics
[Spectrogram: "nsoffee.aif", speech with a phoneme replaced by noise; 0-3500 Hz, 1.2-1.7 s]

• Incorporating knowledge into models?



Subjective ground-truth in mixtures?
• Listening tests collect ‘perceived events’:

• Consistent answers:
[Spectrogram: "City" example, 0-9 s, with subjects' reported events marked below:]

Horn1 (10/10): S1 "honk, honk"; S2 "first double horn"; S3 "1st horn"; S4 "horn1"; S5 "Honk"; S6 "double horn"; S7 "horn" & "horn2"; S8 "car horns"; S9 "horn 2"; S10 "car horn"

Crash (10/10): S1 "slam"; S2 "crash"; S3 "crash (not car)"; S4 "crash"; S5 "Trash can"; S6 "slam"; S7 "gunshot"; S8 "large object crash"; S9 "door Slam?"; S10 "door slamming"

Horn2 (5/10): S2 "horn during crash"; S6 "doppler horn"; S7 "horn3"; S8 "car horns"; S9 "horn 5"

Truck (7/10): S1 "rev up/passing"; S2 "truck accelerating"; S3 "closeup car"; S5 "Acceleration"; S6 "acceleration"; S8 "truck engine"; S10 "wheels on road"



PDCASA example:
City-street ambience
[Figure: PDCASA analysis of the city-street ambience, 0-9 s. Panels: original spectrogram; periodic elements Wefts 1-12 accounting for Horn1 (10/10), Horn2 (5/10), Horn3 (5/10), Horn4 (8/10), Horn5 (10/10); Noise2 + Click1 accounting for Crash (10/10); Noise1 accounting for Squeal (6/10) and Truck (7/10); level in dB]

• Problems
- error allocation
- rating hypotheses
- source hierarchy
- resynthesis



4 Speech recognition
& sound mixtures
• Conventional speech recognition:

signal → Feature extraction → low-dim. features
→ Phoneme classifier → phoneme probabilities
→ HMM decoder → words

- signal assumed entirely speech


- find a valid labelling in terms of discrete labels
- class models from training data
• Some problems:
- need to ignore lexically-irrelevant variation
(microphone, voice pitch etc.)
- compact feature space → everything speech-like
• Very fragile to nonspeech, background
- scene-analysis methods very attractive...
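To make the decoder stage concrete, a minimal Viterbi search over per-frame log phoneme probabilities (the toy two-state example is illustrative, not the recognizer discussed here):

import numpy as np

def viterbi(log_obs, log_trans):
    """Best state path given log_obs (frames x states) from the phoneme
    classifier and a log transition matrix (states x states)."""
    T, S = log_obs.shape
    delta = log_obs[0].copy()                # best score ending in each state
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # score of each (prev, cur) pair
        back[t] = scores.argmax(axis=0)      # best predecessor per state
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace back the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Two states with sticky self-loops:
obs = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.05, 0.95]]))
trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
print(viterbi(obs, trans))   # [0, 0, 1]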



CASA for speech recognition
• Data-driven: CASA as preprocessor
- problems with ‘holes’ (but: Cooke, Okuno)
- doesn’t exploit knowledge of speech structure
• Prediction-driven: speech as component
- same ‘reconciliation’ of speech hypotheses
- need to express 'predictions' in the signal domain (see the sketch below the diagram)
input mixture → Front end → Compare & reconcile
→ prediction errors → Hypothesis management
→ Speech, Noise & Periodic components
→ Predict & combine → predicted features
→ back to Compare & reconcile
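One reading of the 'compare & reconcile' box as a signal-domain sketch (quantities in linear spectral energy; the residual/overshoot split is my illustration):

import numpy as np

def reconcile(observed, speech_pred, noise_pred):
    """Compare the combined speech + noise prediction with the observed
    mixture spectrum. The residual (energy nothing explains) argues for
    new noise components; the overshoot (prediction exceeding the
    observation) counts against the current speech hypothesis."""
    combined = speech_pred + noise_pred              # energies add for independent sources
    residual = np.maximum(observed - combined, 0.0)
    overshoot = np.maximum(combined - observed, 0.0)
    return residual, overshoot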


Example of speech & nonspeech
[Figure: speech + nonspeech example, Bark-scale envelopes, 0-1.5 s:
(a) Clap (clap8k-env.pf)
(b) Speech plus clap (223cl-env.pf)
(c) Recognizer output, aligned phone strings:
    "h# w n ay n tcl t uw f ay ah s ay ow h# v s eh v ah n h#"
    "h# n ay n tcl t uw f ay v ow h# s eh v ax n" = <SIL> nine two five oh <SIL> seven
(d) Reconstruction from labels alone (223cl-renvG.pf)
(e) Slowly-varying portion of original (223cl-envg.pf)
(f) Predicted speech element ( = (d)+(e) ) (223cl-renv.pf)
(g) Click5 from nonspeech analysis (justclick.pf)
(h) Spurious elements from nonspeech analysis (nonclicks.pf)]

• Problems:
- undoing classification & normalization
- finding a starting hypothesis
- granularity of integration
5 Implications for content analysis:
Using CASA to index soundtracks
[Spectrogram: "city22" annotated with candidate sound objects: horn, door crash, yell, car noise; 0-9 s, 200-4000 Hz, level in dB]

• What are the ‘objects’ in a soundtrack?


- subjective definition → need auditory model
• Segmentation vs. classification
- low-level cues → locate events
- higher-level ‘learned’ knowledge to give
semantic label (footstep, crash)
... AI-complete?
• But: hard to separate
- illusion phenomena suggest auditory
organization depends on interpretation
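A toy low-level cue of that kind (threshold and layout are invented for illustration): flag frames where band energy jumps well above the background level.

import numpy as np

def locate_events(energy_db, thresh_db=6.0):
    """Return frame indices where energy first rises more than thresh_db
    above the global median level: candidate event onsets (toy cue)."""
    background = np.median(energy_db)   # crude noise-floor estimate
    active = energy_db > background + thresh_db
    return np.flatnonzero(np.diff(active.astype(int)) == 1) + 1

frames = np.array([-60, -61, -59, -45, -44, -60, -40, -60], dtype=float)
print(locate_events(frames))   # [3 6]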



Using speech recognition for indexing
• Active research area:
Access to news broadcast databases
- e.g. Informedia (CMU), ThisL (BBC+...)
- use LVCSR to transcribe,
then text retrieval to find matches
- 30-40% word error rate, still works OK
• Several systems at NIST TREC workshop
• Tricks to ‘ignore’ nonspeech/poor speech
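A minimal sketch of the transcribe-then-retrieve pattern (the data layout is invented for illustration):

from collections import defaultdict

def build_index(transcripts):
    """Invert recognizer output for retrieval: word -> [(doc, time_s)].
    transcripts: {doc_id: [(time_s, word), ...]} from the LVCSR pass.
    Even at 30-40% word error, enough query words survive to retrieve
    and roughly locate the relevant stories."""
    index = defaultdict(list)
    for doc, words in transcripts.items():
        for time_s, word in words:
            index[word.lower()].append((doc, time_s))
    return index

index = build_index({'news1': [(12.3, 'Election'), (13.0, 'results')]})
print(index['election'])   # [('news1', 12.3)]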



Open issues in automatic indexing
• How to do ASA?
• Explanation/description hierarchy
- PDCASA: ‘generic’ primitives
+ constraining hierarchy
- subjective & task-dependent
• Classification
- connecting subjective & objective properties
→ finding subjective invariants, prominence
- representation of sound-object ‘classes’
• Resynthesis?
- a ‘good’ description should be adequate
- provided in PDCASA, but low quality
- requires good knowledge-based constraints



6 Conclusions
• Auditory organization is required in real
environments
• We don’t know how listeners do it!
- plenty of modeling interest
• Prediction-reconciliation can account for
‘illusions’
- use ‘knowledge’ when signal is inadequate
- important in a wider range of circumstances?
• Speech recognizers are a good source of
knowledge
• Automatic indexing implies ‘synthetic listener’
- need to solve a lot of modeling issues

