A Project Report
on
Speech Recognition for Gender Discrimination
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Signature of Supervisor .
Name of Supervisor : Mr. B Suresh
Date:
ACKNOWLEDGEMENT
DECLARATION
Project work is a part of our curriculum that gives us knowledge about the topic
and subject we have studied. It also helps us to understand and relate the
theoretical concepts that were not covered in the classroom. We have prepared
this report as a part of our ‘MAJOR PROJECT FOR SEMESTER VIII’. The topic we have
selected for the project is ‘SPEECH RECOGNITION - GENDER DISCRIMINATION’.
Student Name:
HARSH SRIVASTAVA
TARANDEEP SINGH
YOGESH SINDHWANI
Contents
CERTIFICATE
ACKNOWLEDGEMENT
DECLARATION
1 INTRODUCTION
1.1 VISION OF SPEECH RECOGNITION
1.2 HISTORY AND BACKGROUND
2 CHARACTERISTICS
2.1 CHARACTERISTICS OF SPEECH RECOGNITION
2.2 CLASSIFICATION OF SPEECH
2.3 APPLICATIONS OF SPEECH RECOGNITION
3 SOURCE CODE
3.1 ALGORITHM
3.2 OUR TECHNIQUE
4 RESULTS
5 Appendices
6 Conclusion
Abstract
Chapter 1
INTRODUCTION
Development of speech recognition systems started in the early 1950s with advances
in voice technology, when an individual voice could be distinguished from other
voices with reasonable ease. Earlier work was limited, and subsequent theses sought
a more reliable method of finding the relation between two sets of speech sounds.
Research on identifying speech has continued to the present day within the field of
speech processing, where many achievements have taken place.
The task of speech recognition is to convert speech sounds into a sequence of words
by a computer program, for example in MATLAB. Since speech is the most natural mode
of communication for humans, the dream of speech recognition is to enable humans to
communicate more effectively.
The first attempts to build systems for Automatic Speech Recognition (ASR) by
machines were made in the 1940s. The early research that led to the development of
speech recognition was funded by the National Science Foundation (NSF) and the
Defense Advanced Research Projects Agency (DARPA), with the initial work conducted
in the 1970s.
Figure 1.1: Speech Recognition Process
Speech recognition technology was initially designed for the differently abled
community: voice recognition helped people with disabilities caused by multiple
sclerosis or arthritis achieve maximum productivity on computers. In the early
1970s, market opportunities emerged for speech recognition, but the early versions
of these products were hard to use. The early speech recognition systems had many
problems: they were speaker dependent, had small vocabularies, or used very
stylized commands. However, in the information technology industry nothing stays
the same for very long, and by the end of the 1980s there was a whole new
generation of commercial speech recognition software systems that were easier to
use. Today speech recognition technology is used by millions of individuals to
automatically create documents from dictation; reports are formatted, corrected for
mistakes in punctuation and grammar, and verified for consistency and possible
errors. Speech recognition technology will become more widespread as the technology
improves.
Figure 1.2: Speech Production model
Chapter 2
CHARACTERISTICS
2.1 CHARACTERISTICS OF SPEECH RECOGNITION
Speech signals comprise many types of sounds. They are classified into categories
according to their frequency and amplitude characteristics.
1. Voiced sounds are produced by the oral cavity with tension of the glottis. They
have lower frequency and a lower mean zero-crossing rate, approximately 14
crossings per 10 ms, and the vocal folds vibrate in a relaxed mode (a short sketch
of the zero-crossing computation is given below).
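As an illustration of the zero-crossing measure mentioned above (this is a sketch written for this report, not part of the project listing), the following MATLAB fragment computes the mean zero-crossing rate over 10 ms frames; the variable names x (speech samples, column vector) and fs (sampling rate in Hz) are assumed.
% Sketch: mean zero-crossing rate per 10 ms frame (illustrative; assumes x and fs exist).
frameLen = round(0.01*fs);               % 10 ms frame length in samples
nFrames  = floor(length(x)/frameLen);    % number of complete frames
zcr = zeros(nFrames, 1);
for i = 1:nFrames
    seg = x((i-1)*frameLen+1 : i*frameLen);
    zcr(i) = sum(abs(diff(sign(seg))) > 0);   % count sign changes within the frame
end
meanZCR = mean(zcr);   % voiced frames tend to give a low value (around 14 per 10 ms)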
Figure 2.3: Plosive Speech Signal
2.2 CLASSIFICATION OF SPEECH
Isolated word versus continuous speech: isolated-word systems identify only single
words at one instant, while continuous-speech systems recognize sequences of words
at an instant. Isolated words are easier to process. A continuous-speech system
combines smaller speech patterns, such as words, into a larger speech pattern,
namely a sentence.
Figure 2.5: Isolated word
2.3 APPLICATIONS OF SPEECH RECOGNITION
Despite the fact that speech recognition has many limitations, this technology can
be very useful in a wide range of applications, provided the weaknesses and
strengths of the systems are kept in mind.
Chapter 3
SOURCE CODE
3.1 ALGORITHM
The technique we have used in this project is based on pitch analysis of the
signal. In this method we differentiate between the genders, male and female, on
the basis of the pitch, where pitch is defined as the fundamental frequency of the
source.
We therefore use a pitch estimate that is quite accurate and a pitch extractor that
is quite efficient. These factors and calculations help in differentiating between
male and female voices, and the results are quite precise as well. A Pitch
Detection Algorithm (PDA) is a process that determines the pitch, or fundamental
frequency, of a speech sound or signal.
A certain pitch (fundamental frequency) range has been defined for female voices on
the basis of results obtained earlier, and similarly the range for male voices has
been pre-defined. These results show that the frequency range for female voices is
higher than the frequency range for male voices.
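As a minimal sketch of this decision rule (the 160 Hz boundary and the function name below are illustrative assumptions, not values specified in this project), the estimated pitch can be compared against a single threshold lying between the typical male and female ranges:
% Sketch: gender decision from an estimated pitch Fx in Hz.
% The 160 Hz boundary is an illustrative assumption between typical male and female ranges.
function label = classify_gender(Fx)
    threshold = 160;          % Hz
    if Fx < threshold
        label = 'male';
    else
        label = 'female';
    end
end
For example, classify_gender(120) returns 'male', while classify_gender(210) returns 'female'.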
3.2 OUR TECHNIQUE
The short-time autocorrelation of the windowed speech signal is

R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m+k)\, w(n-m-k)        (3.1)

where
R_n(k) = short-time autocorrelation at lag k
x = speech signal
w = Hamming window
n = analysis instant (the sample at which the window is positioned)
k = lag (in samples) at which the autocorrelation is calculated
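Equation (3.1) can be evaluated in MATLAB by windowing the frame that ends at sample n with a Hamming window and autocorrelating the windowed frame. The sketch below is illustrative only (the function name and parameters are our own, not part of the project listing); hamming and xcorr are from the Signal Processing Toolbox.
% Sketch: short-time autocorrelation of equation (3.1) for the frame ending at sample n.
% x - speech samples (column vector), wlen - window length, maxlag - largest lag of interest.
function R = short_time_autocorr(x, n, wlen, maxlag)
    seg = x(n-wlen+1 : n) .* hamming(wlen);   % x(m) w(n-m) over the frame (Hamming is symmetric)
    R = xcorr(seg, maxlag);                   % R(maxlag+1+k) corresponds to lag k
end
The lag at which R peaks (excluding lag zero) gives the pitch period; this is essentially what the xcorr-based pitch extraction code later in this section does.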
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @gui_speech_analysis_OpeningFcn, ...
                   'gui_OutputFcn',  @gui_speech_analysis_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
% UIWAIT makes gui_speech_analysis wait for user response (see UIRESUME)
% uiwait(handles.figure1);
% --- Outputs from this function are returned to the command line.
function varargout = gui_speech_analysis_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
title('Waveform');
xlabel('Time (s)');
ylabel('Amplitude');
end
guidata(hObject, handles);
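The fragment above only shows how the waveform plot is labelled; the recording step itself is not reproduced in this listing. A minimal sketch of how the signal could be recorded and plotted at the 8 kHz rate used by the Play callback below is given here for completeness (audiorecorder is standard MATLAB; the variable names are illustrative):
% Sketch: record three seconds of speech at 8 kHz and plot its waveform (illustrative only).
fs  = 8000;                        % sampling rate used elsewhere in this project
rec = audiorecorder(fs, 16, 1);    % 16-bit, mono
recordblocking(rec, 3);            % record for 3 seconds
x = getaudiodata(rec);             % recorded samples as a column vector
t = (0:length(x)-1)/fs;            % time axis in seconds
plot(t, x);
title('Waveform');
xlabel('Time (s)');
ylabel('Amplitude');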
    msgbox(msgboxText, 'Signal recording not done', 'warn');
else
    % fs is the sampling rate of the recorded signal (8000 Hz in this project)
    ms2  = fs/500;    % shortest lag of interest: maximum speech Fx at 500 Hz
    ms20 = fs/50;     % longest lag of interest: minimum speech Fx at 50 Hz
    % calculate the normalised autocorrelation of the recorded signal
    r = xcorr(x, ms20, 'coeff');
    % plot the autocorrelation
    d = (-ms20:ms20)/fs;
    plot(d, r);
    title('Autocorrelation');
    xlabel('Delay (s)');
    ylabel('Correlation coeff.');
    % just look at the region corresponding to positive delays
    r = r(ms20+1 : 2*ms20+1);
    % pick the strongest peak in the plausible pitch-lag range and convert it to Hz
    [rmax, tx] = max(r(ms2:ms20));
    Fx = fs/(ms2+tx-1);
% --- Executes on button press in Play.
function Play_Callback(hObject, eventdata, handles)
% hObject    handle to Play (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x;
time = get(handles.time_slider, 'Value');
if time == 0
    % the first and second lines of the message box
    msgboxText{1} = 'You have tried to play without recording signal';
    msgboxText{2} = 'Try recording a signal using record button';
    % this command actually creates the message box
    msgbox(msgboxText, 'Signal recording not done', 'warn');
else
    % play the recorded signal at 8000 Hz (wavplay is from older MATLAB releases;
    % newer releases use sound(x, 8000) instead)
    wavplay(x, 8000);
    guidata(hObject, handles);
end
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called
Chapter 4
RESULTS
In this project we have taken into account many practical examples of both female
and male voices. While speaking, it has to be made sure that background noise is
minimal so that it does not affect the original signal from which we are trying to
differentiate between male and female. The frequency is determined with the help of
the MATLAB software, and plots are produced in both the time domain and the
frequency domain.
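As an illustration of how time-domain and frequency-domain plots such as the figures below can be produced in MATLAB (the FFT length and variable names here are illustrative choices, not taken from the project code):
% Sketch: time-domain waveform and magnitude spectrum of a recorded signal x at rate fs.
N = 2^nextpow2(length(x));          % FFT length (illustrative choice)
X = fft(x, N);
f = (0:N/2-1)*fs/N;                 % frequency axis for the positive half of the spectrum
subplot(2,1,1);
plot((0:length(x)-1)/fs, x);
title('Time domain'); xlabel('Time (s)'); ylabel('Amplitude');
subplot(2,1,2);
plot(f, abs(X(1:N/2)));
title('Frequency domain'); xlabel('Frequency (Hz)'); ylabel('|X(f)|');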
Figure 4.2: Frequency domain plots for male and female voice
Figure 4.4: Female Voice
Figure 4.6: Male voice 2
Chapter 5
Appendices
Figure 5.2: Awesome Image
Figure 5.4: Awesome Image
Chapter 6
Conclusion
From the above table it can easily be deduced that there is a significant
difference between male and female voices in terms of frequency, or pitch. Hence
this frequency-based method can act as a parameter for gender discrimination by
setting a threshold frequency value separating male and female voices.