The document discusses speaker recognition systems, focusing on two main approaches: the spectrographic approach and voice print identification. It details their principles, features, applications, and comparisons, emphasizing their roles in analyzing speech signals and identifying unique vocal characteristics. Additionally, it covers various techniques and algorithms used in these systems, such as Mel-Frequency Cepstral Coefficients, Gaussian Mixture Models, and Deep Learning methods.
favsi m3 (models)
In speaker recognition systems, auditory analysis is a crucial component for distinguishing individuals based on their vocal characteristics. The two main approaches to auditory analysis in speaker recognition are the spectrographic approach and voice print identification. Let's explore each of them:

Spectrographic Approach:
Principle:
* The spectrographic approach involves the analysis of the frequency content of speech signals. It generates a visual representation known as a spectrogram, in which intensity is represented by color or shading.
Features:
* Extracts features such as formants, pitch, and other spectral characteristics from the spectrogram.
Application:
* Commonly used in forensic voice analysis and speech signal processing.
* Provides a visual tool for the examination of speech signals.
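As a rough illustration of the frequency analysis behind a spectrogram, the sketch below computes the magnitude spectrum of overlapping frames with a direct DFT. The test tone, frame length, and hop size are arbitrary choices for this sketch; a real implementation would use an FFT and a window function.

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitude spectrum of one frame via a direct DFT (O(N^2), for illustration)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]          # keep the non-redundant half

def spectrogram(signal, frame_len=64, hop=32):
    """Columns of the spectrogram: one magnitude spectrum per overlapping frame."""
    return [dft_magnitude(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]

# A 1 kHz test tone sampled at 8 kHz: its energy should concentrate in one bin.
fs, f0 = 8000, 1000
tone = [math.sin(2 * math.pi * f0 * t / fs) for t in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin * fs / 64)  # bin index converted to frequency -> 1000.0 Hz
```

Each column of `spec` is one "slice" of the visual spectrogram; plotting the columns side by side with intensity mapped to color gives the familiar picture.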
Voice Print Identification:
Principle:
* Voice print identification focuses on extracting and analyzing unique characteristics of an individual's voice, creating a distinctive "voice print" or vocal signature.
* It involves features related to the individual's vocal tract, pitch, intonation, and other voice characteristics.
Features:
* Extracts features related to the shape and size of the vocal tract, pitch patterns, speech rate, and other personalized attributes.
Application:
* Used for speaker recognition, including both speaker verification (confirming identity) and speaker identification (finding a match in a database of known voices).
* Applied in security, access control, and law enforcement.
Comparison:
Nature of Analysis:
* Spectrographic Approach: Primarily analyzes the frequency content of speech signals, providing a visual
representation.
* Voice Print Identification: Focuses on extracting specific features related to individual voice characteristics.
Applications:
* Spectrographic Approach: Often used in forensic analysis and visual inspection of speech signals.
* Voice Print Identification: Applied in speaker recognition systems for authentication and identification
purposes.
Representation:
* Spectrographic Approach: Provides a visual representation of the speech signal.
* Voice Print Identification: Focuses on creating a mathematical representation or model of the unique voice
characteristics.
Use Cases:
* Spectrographic Approach: Useful for analyzing speech signals in a more general context, not necessarily
for individual speaker identification.
* Voice Print Identification: Specifically designed for speaker recognition tasks, including verification and
identification.
APPROACHES TO SPEAKER RECOGNITION SYSTEM OF AUDITORY ANALYSIS
Spectral Analysis:
* Principle: Spectral analysis involves decomposing the speech signal into its frequency components.
* Method: The Fourier Transform is commonly used to obtain the spectrum of the speech signal.
* Role in recognition: Extracted features may include formants (resonant frequencies), spectral peaks, and other frequency-related characteristics.

Mel-Frequency Cepstral Coefficients (MFCC):
* Principle: MFCC mimics the human ear's response to different frequencies by transforming the power spectrum of the speech signal.
* Method: The process involves dividing the speech signal into frames, applying a filterbank to obtain mel-frequency bins, taking the logarithm, and then applying the discrete cosine transform.
* Role in recognition: The resulting coefficients capture essential spectral information and are commonly used in speaker recognition systems.

Pitch Analysis:
* Principle: Pitch analysis focuses on determining the fundamental frequency (pitch) of the speaker's voice.
* Method: Algorithms such as autocorrelation or cepstral analysis are used to estimate pitch.
* Role in recognition: Extracted pitch information can be used to distinguish speakers based on pitch patterns.

Formant Analysis:
* Principle: Formant analysis identifies and analyzes the resonant frequencies in the vocal tract during speech production.
* Method: Formants are typically identified as peaks in the frequency spectrum.
* Role in recognition: Formant frequencies and bandwidths provide information about the shape and size of the vocal tract, contributing to speaker distinctiveness.

Voice Source Analysis:
* Principle: Examines characteristics related to the glottal source of speech, such as the vocal fold vibrations.
* Method: Measures like jitter (frequency variation) and shimmer (amplitude variation) are used to characterize the voice source.
* Role in recognition: Jitter, shimmer, and other voice source features contribute to the uniqueness of a speaker's voice.

Prosodic Features:
* Principle: Prosodic features analyze the rhythm, intonation, and stress patterns in speech.
* Method: Extraction of features related to speech rate, duration of syllables, variations in pitch, and intensity.
* Role in recognition: Prosodic features reflect the emotional and behavioral aspects of speech, enhancing speaker recognition.
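The MFCC pipeline described above (framing, mel filterbank, logarithm, DCT) can be sketched in miniature. The frame length, filter count, and coefficient count below are illustrative only, and a real front end would also apply pre-emphasis and a window function.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of one frame via a direct DFT (for illustration only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale."""
    step = (hz_to_mel(fs / 2) - hz_to_mel(0)) / (n_filters + 1)
    bins = [int((n_fft // 2) * mel_to_hz(i * step) / (fs / 2))
            for i in range(n_filters + 2)]
    filters = []
    for i in range(1, n_filters + 1):
        f = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[i - 1], bins[i]):          # rising slope
            f[k] = (k - bins[i - 1]) / max(1, bins[i] - bins[i - 1])
        for k in range(bins[i], bins[i + 1]):          # falling slope
            f[k] = (bins[i + 1] - k) / max(1, bins[i + 1] - bins[i])
        filters.append(f)
    return filters

def dct2(x, n_coeffs):
    """Type-II discrete cosine transform (unnormalized)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n_coeffs)]

def mfcc(frame, fs, n_filters=8, n_coeffs=4):
    spec = power_spectrum(frame)
    energies = [sum(w * s for w, s in zip(f, spec))
                for f in mel_filterbank(n_filters, len(frame), fs)]
    log_e = [math.log(e + 1e-10) for e in energies]   # floor avoids log(0)
    return dct2(log_e, n_coeffs)

fs = 8000
frame = [math.sin(2 * math.pi * 440 * t / fs) for t in range(64)]
coeffs = mfcc(frame, fs)
print(len(coeffs))  # 4 cepstral coefficients for this frame
```

In practice this is run per frame across the whole utterance, and the per-frame coefficient vectors become the features fed to the models in the next table.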
Dynamic Features:
* Principle: Captures the dynamic aspects of speech by considering changes over time.
* Method: Delta coefficients represent the rate of change of other features, providing temporal information.
* Role in recognition: Dynamic features enhance the representation of temporal variations in speech, contributing to the speaker's characterization.

Deep Learning Approaches:
* Principle: Deep learning utilizes neural networks to automatically learn hierarchical representations from raw speech signals.
* Method: Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) may be employed for feature learning.
* Role in recognition: Deep learning models learn complex patterns and representations directly from the data, eliminating the need for handcrafted features.

Gaussian Mixture Models (GMMs):
* Principle: GMMs model the probability distribution of feature vectors for each speaker using a mixture of Gaussian distributions.
* Method: Trains the model on a set of known speakers and calculates likelihood ratios during testing.
* Role in recognition: Statistical characteristics of speech features are represented by GMMs, providing a model for each speaker.

Long Term Averaging (LTA):
* Principle: LTA focuses on capturing long-term statistical information about a speaker's voice.
* Method: Averages features over an extended time window, providing a stable representation.
* Role in recognition: LTA is effective for capturing speaker characteristics in continuous speech.

Vector Quantization (VQ):
* Principle: VQ represents speech features by mapping them to a set of representative vectors (codebook).
* Method: Quantizes feature vectors into a limited set of codewords, reducing dimensionality.
* Role in recognition: VQ is efficient for storage and comparison of speech patterns, often used in conjunction with other methods.

Hidden Markov Models (HMMs):
* Principle: HMMs model the temporal evolution of speech features by representing speech as a sequence of states.
* Method: Trains HMMs on speech data and uses them to model the dynamics of speech.
* Role in recognition: HMMs are effective for capturing context-dependent information and the temporal aspects of speech.
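The GMM scoring step described above (train per-speaker mixtures, compare likelihoods at test time) can be sketched with a toy one-dimensional example. The mixture parameters below are hand-set stand-ins; real systems fit the mixtures with EM on multi-dimensional feature vectors such as MFCCs.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_log_likelihood(samples, gmm):
    """gmm: list of (weight, mean, variance) components; total log-likelihood."""
    return sum(math.log(sum(w * gaussian(x, m, v) for w, m, v in gmm))
               for x in samples)

# Hand-set toy mixtures standing in for two enrolled speakers
speaker_a = [(0.6, -1.0, 0.5), (0.4, 1.0, 0.5)]
speaker_b = [(0.5, 3.0, 0.5), (0.5, 5.0, 0.5)]

# Test-utterance features drawn near speaker A's modes
features = [-1.1, -0.9, 1.2, 0.8, -1.0]
ll_a = gmm_log_likelihood(features, speaker_a)
ll_b = gmm_log_likelihood(features, speaker_b)
print(ll_a > ll_b)  # likelihood comparison favours speaker A -> True
```

The difference `ll_a - ll_b` is the log-likelihood ratio the table mentions; thresholding it gives a verification decision, and taking the argmax over all enrolled models gives identification.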
Gaussian Mixture Models (GMMs) and Long Term Averaging (LTA): enrollment phase and testing phase (figures)
MFCC Model (figure)
Deep Learning Approaches (Neural Networks) (figure)
Vector Quantization (VQ) (figure)
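The vector-quantization idea covered earlier can be sketched very simply: map each feature vector to the index of its nearest codeword. The tiny 2-D codebook here is hand-picked for illustration; real systems learn the codebook from training data, typically with k-means.

```python
def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """Return the index of the nearest codeword."""
    return min(range(len(codebook)), key=lambda i: squared_dist(vector, codebook[i]))

# A tiny hand-picked 2-D codebook standing in for learned cluster centres
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]

features = [(0.1, -0.2), (0.9, 1.1), (3.8, 4.2), (0.0, 0.1)]
codes = [quantize(v, codebook) for v in features]
print(codes)  # each vector replaced by a codeword index -> [0, 1, 2, 0]
```

This is what gives VQ its storage efficiency: an utterance of D-dimensional vectors collapses to a sequence of small integer codes, which can then be compared against per-speaker codebooks.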
Hidden Markov Models (HMMs):
HMMs in ASR
Originally used in speech recognition (Rabiner, 1986)
Proposed for DNA modeling (Churchill, 1989)
Applied to modeling proteins (Haussler et al, 1992)
Multiple sequence alignment of related sequences ('homologs')
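The HMM machinery above can be illustrated with the forward algorithm, which scores how well an observation sequence fits a model. The two states and all probabilities below are made-up toy numbers, not parameters from any real recognizer.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: total probability of an observation sequence under an HMM."""
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

# Toy 2-state HMM over coarse "energy" observations (hypothetical numbers)
states = ("voiced", "unvoiced")
start_p = {"voiced": 0.5, "unvoiced": 0.5}
trans_p = {"voiced": {"voiced": 0.8, "unvoiced": 0.2},
           "unvoiced": {"voiced": 0.3, "unvoiced": 0.7}}
emit_p = {"voiced": {"high": 0.9, "low": 0.1},
          "unvoiced": {"high": 0.2, "low": 0.8}}

p = forward(("high", "high", "low"), states, start_p, trans_p, emit_p)
print(round(p, 4))  # sequence likelihood under this toy model -> 0.1031
```

In a recognizer, one such model is trained per speaker (or per speech unit), and the sequence is attributed to whichever model assigns it the highest probability.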
Analog Circuits vs Digital Circuits
Analog signals:
* The signal is continuous and time varying.
* Troubleshooting of analog signals is difficult.
* An analog signal is usually in the form of a sine wave.
* Easily affected by noise.
* Analog signals use continuous values to represent the data.
* Accuracy of analog signals may be affected by noise.
* Analog signals may be affected during data transmission.
* Analog signals use more power.
* Examples: Temperature, Pressure, Flow measurements, etc.
* Components like resistors, capacitors, inductors and diodes are used in analog circuits.

Digital signals:
* Digital signals have two or more states and are in binary form.
* Troubleshooting of digital signals is easy.
* A digital signal is usually in the form of a square wave.
* These are stable and less prone to noise.
* Digital signals use discrete values to represent the data.
* Accuracy of digital signals is immune to noise.
* Digital signals are not affected during data transmission.
* Digital signals use less power.
* Examples: Valve Feedback, Motor Start, Trip, etc.
* Components like transistors, logic gates, and microcontrollers are used in digital circuits.
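The continuous-versus-discrete distinction above can be illustrated with an idealized analog-to-digital conversion: sample a continuous waveform and map each sample to one of a fixed number of levels. The bit depth, signal, and sampling rate are arbitrary examples.

```python
import math

def quantize_sample(x, bits=3, vmin=-1.0, vmax=1.0):
    """Map a continuous value in [vmin, vmax] to one of 2**bits discrete codes."""
    levels = 2 ** bits
    step = (vmax - vmin) / levels
    return min(levels - 1, int((x - vmin) / step))  # clamp the top edge

# Sample one period of a 50 Hz sine at 1 kHz, then digitize it to 3 bits
fs, f = 1000, 50
analog = [math.sin(2 * math.pi * f * n / fs) for n in range(fs // f)]
digital = [quantize_sample(x) for x in analog]
print(min(digital), max(digital))  # codes span the full 3-bit range -> 0 7
```

The `analog` list holds continuous values (any real number in range), while `digital` holds only 8 possible codes; the gap between a sample and its code is the quantization error, which shrinks as the bit depth grows.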
Before we jump into the main topics let us understand what an R, L and C does in a circuit:
* Resistor: Resistors are denoted by the letter "R". A resistor is an element that dissipates energy, mostly in the form of heat. It will have a voltage drop across it which remains fixed for a fixed value of current flowing through it.
* Capacitor: Capacitors are denoted by the letter "C". A capacitor is an element which stores energy (temporarily) in the form of an electric field. A capacitor resists changes in voltage. There are many types of capacitors, out of which the ceramic capacitor and the electrolytic capacitors are mostly used. They charge in one direction and discharge in the opposite direction.
* Inductor: Inductors are denoted by the letter "L". An inductor is similar to a capacitor; it also stores energy, but the energy is stored in the form of a magnetic field. Inductors are normally a coil of wound wire and are rarely used compared to the other two components.
When the Resistor, Capacitor and Inductor are put together we can form circuits like the RC, RL and RLC circuits, which exhibit time- and frequency-dependent responses that are useful in many AC applications as mentioned already. An RC/RL/RLC circuit can be used as a filter, an oscillator and much more; it is not possible to cover every aspect in this tutorial, so we will learn their basic behaviour.

RC circuit:
* The RC circuit (Resistor Capacitor Circuit) will consist of a Capacitor and a Resistor connected either in series or parallel to a voltage or current source. These types of circuits are also called RC filters or RC networks since they are most commonly used in filtering applications. An RC circuit can be used to make some crude filters like low-pass, high-pass and band-pass filters.
(Figure: The RC Circuit)
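Under the standard first-order model, a series RC low-pass filter has cutoff frequency f_c = 1/(2*pi*R*C), and a capacitor charging from a step input follows Vc(t) = V*(1 - e^(-t/RC)). A quick numeric check, with arbitrary example component values:

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """Cutoff (-3 dB) frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def capacitor_step_voltage(v_in, r_ohms, c_farads, t_seconds):
    """Capacitor voltage while charging from 0 V toward a step input v_in."""
    tau = r_ohms * c_farads  # the RC time constant
    return v_in * (1.0 - math.exp(-t_seconds / tau))

r, c = 1_000.0, 1e-6  # example values: 1 kOhm, 1 uF -> tau = 1 ms
print(round(rc_cutoff_hz(r, c), 1))                            # -> 159.2 Hz
print(round(capacitor_step_voltage(5.0, r, c, 5 * r * c), 3))  # -> 4.966 V after 5 tau
```

The second print shows the usual rule of thumb: after five time constants the capacitor has reached over 99% of the source voltage.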
RL circuit:
* The RL Circuit (Resistor Inductor Circuit) will consist of an Inductor and a Resistor, again connected either in series or parallel. A series RL circuit will be driven by a voltage source and a parallel RL circuit will be driven by a current source. RL circuits are commonly used as passive filters; a first order RL circuit with only one inductor and one resistor is shown below.
* Similarly, in an RL circuit we have to replace the Capacitor with an Inductor. The light bulb is assumed to act as a pure resistive load and the resistance of the bulb is set to a known value of 100 ohms.

RLC circuit:
A RLC circuit, as the name implies, will consist of a Resistor, Capacitor and Inductor
connected in series or parallel. The circuit forms an Oscillator circuit which is very
commonly used in Radio receivers and televisions. It is also very commonly used as
damper circuits in analog applications. The resonance property of a first order RLC
circuit is discussed below
The RLC circuit is also called a series resonance circuit, oscillating circuit or a tuned circuit.
RLC Circuit
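The resonance property mentioned above can be checked numerically: an RLC circuit resonates at f0 = 1/(2*pi*sqrt(L*C)), and for the series case the quality factor is Q = (1/R)*sqrt(L/C). The component values below are arbitrary examples.

```python
import math

def resonant_frequency_hz(l_henry, c_farads):
    """Resonant frequency of an RLC circuit: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farads))

def series_q_factor(r_ohms, l_henry, c_farads):
    """Quality factor of a series RLC circuit: Q = (1/R) * sqrt(L/C)."""
    return math.sqrt(l_henry / c_farads) / r_ohms

# Example values: 10 mH, 100 nF, 10 Ohm
f0 = resonant_frequency_hz(10e-3, 100e-9)
q = series_q_factor(10.0, 10e-3, 100e-9)
print(round(f0))      # -> 5033 Hz
print(round(q, 1))    # -> 31.6
```

At f0 the inductive and capacitive reactances cancel, which is why a tuned circuit like this selects one frequency in a radio receiver; a higher Q means a sharper, more selective peak.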