Lip Reading Using External Viseme Decoding

Javad Peymanfard
School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
[email protected]

Mohammad Reza Mohammadi
School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
[email protected]

Hossein Zeinali
Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
[email protected]
Abstract—Lip-reading is the task of recognizing speech from lip movements. It is difficult because the lip movements of many words look alike when pronounced. The term viseme is used to describe lip movements during a conversation. This paper shows how to use external text data (for viseme-to-character mapping) by dividing video-to-character conversion into two stages, namely converting video to visemes and then converting visemes to characters, with a separate model for each stage. Our proposed method improves the word error rate by an absolute 4% compared to a standard sequence-to-sequence lip-reading model on the BBC-Oxford Lip Reading Sentences 2 (LRS2) dataset.

Index Terms—lip-reading, visual speech recognition, viseme

I. INTRODUCTION

Lip-reading is commonly used to understand human speech without hearing the sound, using only visual features. This ability is more common in people with hearing loss or hearing problems. Over the past years, several methods have been proposed to help a person lip-read [1], but there is an important difference between these methods and the lip-reading methods proposed in AI. The purpose of machine lip-reading methods is to convert visual information into words, and this conversion takes place at two levels, described below. The main purpose of lip-reading by humans, however, is to understand the meaning of speech, not to recognize every single word. Obviously, visemes are the main challenge in lip-reading. Visemes are the visual equivalent of phonemes [2]: each viseme refers to a group of phonemes for which the movement of the lips is the same, such as /b/, /p/, and /m/.

Traditional approaches to automatic lip-reading used hand-crafted features [1], [3]–[5], and Hidden Markov Models were also applied for modeling [6]–[8]. However, with the large public datasets introduced in this area in recent years and the deep learning methods applied to them, more capable methods have been proposed that are more accurate than even a professional lip reader. These datasets are divided into two groups, namely word-level and sentence-level lip-reading. In the first group, the lip-reading problem is a classification task: there is a fixed vocabulary (set of classes), and the model aims at assigning an input video to a class. In the second group, which is a sequence learning task, each sample (video) has a sentence-level transcription. This is a more challenging task than word-level prediction.

In this paper, we propose a method in which textual data independent of lip-reading datasets can be utilized to achieve higher modeling accuracy. In this method, an external viseme decoder can be modeled using only textual data, so it is efficient to build. We employed a sequence-to-sequence model for viseme-to-character modeling to predict characters from a large amount of text data. A two-layer GRU with an attention mechanism is used in this model. The use of this external model increases the accuracy of the entire lip-reading process.

The paper is organized as follows: In Section II, we review the most important related works and discuss the advantages and disadvantages of the available methods. In Section III, we present the proposed method and describe it in detail. In Section IV, we discuss the experiments. Finally, we conclude the paper in Section V.

II. RELATED WORK

There are a variety of methods for lip-reading, and they fall into two categories: word-level and sentence-level lip-reading. In word-level methods, lip-reading is a classification task, whereas in sentence-level methods, the problem is sequence prediction. There are many pre-deep-learning methods, which are reviewed in [9]–[11]. In one of these methods [12], modeling is performed at the viseme level using an HMM (Hidden Markov Model). After the visemes are obtained, another phoneme HMM is trained to convert each viseme to a specific phoneme. This is a non-deep approach and was examined on little data.

The first deep-learning method proposed for sentence-level lip-reading was LipNet [13]. The LipNet architecture has 3 layers of STCNN (Spatio-Temporal CNN) followed by 2 Bi-GRUs (Bidirectional Gated Recurrent Units). As an end-to-end model, LipNet is trained with the CTC loss [14]. This method was tested on the GRID dataset.
Fig. 1. Traditional sequence-to-sequence methods.
Fig. 2. Our proposed lip-reading method using an external viseme-to-character model.
The next method, called WAS (Watch, Attend and Spell), uses an attention mechanism and performs lip-reading on the LRS2 dataset, which contains real-world data [2]. This model is based on LAS (Listen, Attend and Spell), which was developed for speech recognition [15].

Deep learning architectures for lip-reading are compared in [16]. In this comparison, three neural network architectures (fully convolutional, bidirectional LSTM, and Transformer [17]) are evaluated, and the best performing network with respect to word error rate is the Transformer, with a WER (Word Error Rate) of 50% on LRS2. The fully convolutional network has the best training and inference time.

In another recent work, an effective strategy for training a lip-reading model was proposed that uses speech recognition directly [18]. This method, which is based on knowledge distillation, does not require manually annotated lip-reading data, and the videos are unlabeled. It predicts speech at the sentence level and obtains state-of-the-art results on the LRS2 and LRS3 datasets.

In still another work, the main focus is multilingual synergized lip-reading [19]. In this method, a model with higher accuracy in both languages can be achieved using data from two different languages. The main idea is that common patterns in lip movement exist across languages because human vocal organs are the same. This method obtains state-of-the-art performance on two challenging word-level lip-reading benchmarks, namely LRW (English) and LRW-1000 (Mandarin).

Finally, the authors of [20] proposed a variable-length augmentation and used temporal convolutional networks, suggesting another improvement to word-level lip-reading.

III. PROPOSED METHOD

In this section, we propose a method in which external textual data can improve the accuracy of a lip-reading model. The section consists of two parts. In the first part, we describe the highest accuracy that can be achieved in word-level lip-reading, and in the second part, we explain our proposed method.

A. Word-level lower bound error for greedy algorithm

First, we used the available lip-reading text data to find the lower-bound error of word-level viseme-to-character modeling. In this case, we first extract the vocabulary of the text data and the frequency of each word. Then, using the pronunciation list of the words and one of the suggested phoneme-to-viseme mappings [21], the viseme sequence for each word is obtained. Obviously, some words have the same viseme sequence (like "art" and "heart"). We then group such words and use a greedy algorithm to get the minimum error: whenever more than one word shares a viseme sequence, the best choice is the word that occurs most frequently. In the experiment we performed on the LRS3 dataset [22], the lowest WER was 24.29%. We also ran this experiment on the LRS2 dataset, where the lowest WER was 27.16%. However, this holds only when context is not taken into account. In the following, we propose a method that can achieve higher accuracy for viseme decoding by exploiting context.
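A minimal sketch of this greedy lower-bound computation is shown below. It assumes that a word-frequency table, a pronunciation dictionary, and a phoneme-to-viseme map are already available; the function and variable names, as well as the toy mapping in which /HH/ has no visible viseme, are illustrative assumptions rather than the authors' code.

```python
from collections import Counter, defaultdict

def greedy_lower_bound_wer(word_counts, word_to_phonemes, phoneme_to_viseme):
    """Best achievable word-level WER when each word must be predicted from
    its viseme sequence alone, i.e. without any sentence context."""
    groups = defaultdict(list)
    for word, count in word_counts.items():
        # Map the word's phonemes to visemes; phonemes mapped to None
        # (no visible lip shape) are dropped.
        visemes = tuple(v for v in (phoneme_to_viseme.get(p)
                                    for p in word_to_phonemes[word]) if v is not None)
        groups[visemes].append(count)

    total = sum(word_counts.values())
    # Greedy choice: for every viseme sequence always output its most frequent
    # word, so all occurrences of the other words in the group are errors.
    errors = sum(sum(counts) - max(counts) for counts in groups.values())
    return errors / total

# Toy example: "art" and "heart" collapse to the same viseme sequence
# because /HH/ is assumed to have no visible viseme here.
counts = Counter({"art": 30, "heart": 10})
prons = {"art": ("AA", "R", "T"), "heart": ("HH", "AA", "R", "T")}
vmap = {"AA": "open", "R": "rounded", "T": "alveolar", "HH": None}
print(greedy_lower_bound_wer(counts, prons, vmap))  # 10 errors / 40 words = 0.25
```

Applying the same computation to the LRS2 and LRS3 vocabularies with the mapping from [21] is what yields the 27.16% and 24.29% lower bounds reported above.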
TABLE I
Examples of viseme decoding results.
B. Lip reading using external viseme decoding

In recent years, with the advances in deep learning, significant progress has been made on many computer vision problems. One of the most difficult tasks in this field is lip reading and, in particular, viseme decoding. Usually only small datasets are available for this task in a given language, and collecting data for the available methods is very costly: because these methods require curriculum learning, word-level annotation is needed.

In this paper, we intend to solve this problem separately using available sequence-to-sequence methods. This allows us to use raw textual data in a language directly for viseme decoding. With respect to the lower-bound error discussed in the previous section, we expect the error of this model to be much lower, because here the context is considered and each word is not decoded separately.

The lip-reading methods mentioned in the previous section perform modeling at the sentence level. Since the main challenge in lip-reading is viseme decoding, we expect the video-to-viseme conversion itself to be done with greater accuracy. Lip-reading can then be performed at the sentence level using the two proposed models together.

In fact, both sub-models have their own advantages, which improve accuracy. We describe the two models in order of use. For the first model, which converts video to visemes, existing methods can be used exactly as they are; we need neither more data nor any change in the network structure. Nevertheless, we expect higher accuracy due to the smaller number of classes. The second model, moreover, does not need a lip-reading dataset at all. To train it, we need raw textual data in the target language, and training data can be generated as needed by obtaining a phoneme sequence for each word and applying the phoneme-to-viseme mapping. The only challenge when constructing this dataset is obtaining the phoneme sequence for an utterance, a problem known as G2P (grapheme-to-phoneme) conversion, for which several solutions exist.

As shown in Figure 1, in traditional sequence-to-sequence methods, the first step is determining the mouth area using facial landmarks and cropping the ROI (region of interest). This frame sequence is then modeled using a 3D visual front-end (usually a 3D-CNN) followed by a sequence processing model. In other words, in these methods the lip-movement features are extracted by a 3D convolutional network, and the output is a set of probabilities for each character. In the proposed method, shown in Figure 2, another network is trained with independent data. In fact, two networks are trained: one converts video to visemes, and the other predicts characters from the viseme sequence.
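To make the first stage concrete, the following is a minimal PyTorch sketch of a video-to-viseme network of the kind described above: a 3D convolutional front-end over the cropped mouth region followed by a recurrent sequence model that emits per-frame viseme probabilities. The layer sizes, the number of viseme classes, and the per-frame output are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class VideoToViseme(nn.Module):
    """Simplified video-to-viseme front-end: 3D-CNN + bidirectional GRU.

    Input:  (batch, 1, frames, height, width) grayscale mouth crops.
    Output: (batch, frames, num_visemes) per-frame log-probabilities.
    """
    def __init__(self, num_visemes=14):  # number of viseme classes is an assumption
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 3, 3), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep the time axis, pool space
        )
        self.gru = nn.GRU(64 * 4 * 4, 256, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 256, num_visemes)

    def forward(self, video):
        feats = self.frontend(video)                      # (B, C, T, 4, 4)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        seq, _ = self.gru(feats)                          # (B, T, 512)
        return self.classifier(seq).log_softmax(dim=-1)   # (B, T, num_visemes)

# Shape check with a dummy clip: 16 frames of 64x64 mouth crops.
model = VideoToViseme()
dummy = torch.randn(2, 1, 16, 64, 64)
print(model(dummy).shape)  # torch.Size([2, 16, 14])
```

Such per-frame outputs could be trained with a CTC-style loss against viseme transcriptions, or the same front-end could feed an encoder-decoder model, which is closer to the simple sequence-to-sequence setup used in the experiments below.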
IV. EXPERIMENTS

In this section, we describe the experiments we performed and their results. The experiments are divided into two parts. In the first part, we describe the conversion of visemes to characters and compare the accuracy obtained when using raw external text data with the accuracy obtained when using only the existing lip-reading text data. In the second part, we perform lip-reading at the character level using the obtained model, together with a viseme-level lip-reading model.

Note that our goal was not to achieve the best reported results; due to lack of time, we only intended to show that the proposed method can improve a baseline. We believe that this improvement can be obtained with any other method as well. In the future, we will first replicate the results reported in [2] and then incorporate the proposed method into them to make a better comparison with the state-of-the-art results.

A. Viseme decoding

As explained earlier, there is usually little data for training a lip-reading model. In this experiment, we want to show how increasing the amount of unlabeled text data affects the accuracy of viseme decoding. We first trained the model using only the textual data of the LRS2 dataset. Given that LRS2 is not large in terms of textual data, in the second case we used the OpenSubtitles corpus [23] for viseme-to-character modeling and measured the effect of this change.

We selected 6 million samples from the OpenSubtitles corpus for this purpose. Since only the word sequence is available in this corpus, we first used the CMU Pronouncing Dictionary to convert each word sequence into a phoneme sequence and removed the sentences containing words that were not in the dictionary. Given that the main purpose here is a kind of language modeling (i.e., what matters to us is the probability of occurrence of different consecutive visemes), we are not very sensitive to choosing an accurate transcription for every word. Consequently, we used a simple rule: for a word with multiple transcriptions in the dictionary, we select only the first one.
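The data preparation just described can be sketched as follows, using NLTK's copy of the CMU Pronouncing Dictionary and keeping only the first listed pronunciation of each word. The small phoneme-to-viseme table is a made-up fragment for illustration only; the actual experiments use a published mapping [21].

```python
import re
from nltk.corpus import cmudict  # requires a one-time nltk.download("cmudict")

# Made-up fragment of a phoneme-to-viseme table (NOT the full mapping from [21]).
PHONEME_TO_VISEME = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "AA": "open", "AE": "open", "AH": "open",
    "IY": "spread", "IH": "spread",
    "T": "alveolar", "D": "alveolar", "N": "alveolar", "S": "alveolar", "Z": "alveolar",
    "R": "rounded", "W": "rounded", "UW": "rounded",
}

PRONUNCIATIONS = cmudict.dict()  # word -> list of ARPAbet pronunciations

def sentence_to_visemes(sentence):
    """Convert a word sequence to a viseme sequence, or return None if any
    word is missing from the dictionary (such sentences are discarded)."""
    visemes = []
    for word in sentence.lower().split():
        prons = PRONUNCIATIONS.get(word)
        if not prons:
            return None
        for phone in prons[0]:                # first pronunciation only, as in the paper
            phone = re.sub(r"\d", "", phone)  # strip stress digits, e.g. AH0 -> AH
            if phone in PHONEME_TO_VISEME:    # phones outside this fragment are skipped
                visemes.append(PHONEME_TO_VISEME[phone])
    return visemes

print(sentence_to_visemes("read my lips"))
```

Sentences for which the function returns None correspond to the out-of-dictionary sentences that were removed above.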
Subsequently, the viseme sequence for each sample was obtained using the phoneme-to-viseme mapping. After preparing these data, the model was trained, and some of the results are shown in Table I. As mentioned above, we used a sequence-to-sequence network with a two-layer GRU with a cell size of 1024, and we also used an attention mechanism [24] to achieve better results.
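For reference, here is a compact PyTorch sketch of such a viseme-to-character model: a two-layer GRU encoder over viseme tokens, an additive attention over the encoder states, and a GRU decoder trained with teacher forcing. The vocabulary sizes, the shared embedding width, and the decoding loop are simplifying assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class VisemeToCharSeq2Seq(nn.Module):
    """Viseme-to-character seq2seq: 2-layer GRU encoder, additive attention,
    GRU decoder predicting one character per step (teacher forcing here)."""
    def __init__(self, num_visemes=15, num_chars=30, hidden=1024):
        super().__init__()
        self.vis_emb = nn.Embedding(num_visemes, hidden)
        self.chr_emb = nn.Embedding(num_chars, hidden)
        self.encoder = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.GRU(2 * hidden, hidden, num_layers=2, batch_first=True)
        # Additive attention: score(enc_t, dec_state) = v^T tanh(W [enc_t; dec_state])
        self.attn_w = nn.Linear(2 * hidden, hidden)
        self.attn_v = nn.Linear(hidden, 1)
        self.out = nn.Linear(hidden, num_chars)

    def forward(self, visemes, chars):
        enc_out, enc_h = self.encoder(self.vis_emb(visemes))     # (B, Tv, H)
        dec_h = enc_h
        logits = []
        for t in range(chars.size(1)):                           # teacher forcing
            query = dec_h[-1].unsqueeze(1).expand(-1, enc_out.size(1), -1)
            scores = self.attn_v(torch.tanh(self.attn_w(torch.cat([enc_out, query], dim=-1))))
            weights = torch.softmax(scores, dim=1)               # attention over Tv
            context = (weights * enc_out).sum(dim=1, keepdim=True)
            step_in = torch.cat([self.chr_emb(chars[:, t:t + 1]), context], dim=-1)
            dec_out, dec_h = self.decoder(step_in, dec_h)
            logits.append(self.out(dec_out))
        return torch.cat(logits, dim=1)                          # (B, Tc, num_chars)

# Dummy batch: 2 sequences of 12 viseme ids, targets of 8 character ids.
model = VisemeToCharSeq2Seq()
vis = torch.randint(0, 15, (2, 12))
tgt = torch.randint(0, 30, (2, 8))
print(model(vis, tgt).shape)  # torch.Size([2, 8, 30])
```

At inference time, the teacher-forced character input would be replaced by the previously predicted character (greedy or beam-search decoding).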
The results in Table II show that using the OpenSubtitles corpus reduced the relative CER (Character Error Rate) by approximately 62% (from 26% to 10%) and the relative WER by around 57% (from 37% to 16%). These results indicate that with more training data for viseme decoding, a better language model can be obtained, which improves the accuracy of this decoding. Of course, there is certainly an upper bound to this improvement, and it cannot be claimed that the error can be reduced to zero simply by adding data. Nevertheless, it shows that accuracy is relatively easy to improve in this way, since a lot of unlabeled textual data is available in any language.

TABLE II
WER and CER for viseme-to-character modeling.

Dataset         CER    WER
LRS2            26%    37%
OpenSubtitles   10%    16%

B. Sentence-level lip-reading

In this step, we use the proposed models to perform character-level lip-reading via viseme-level lip-reading. For this purpose, we first had to train the video-to-viseme model, which requires a viseme sequence for each video sample. Here again, we used a dictionary in the same way as in the previous experiment: after the phoneme sequence for each training sample was obtained, the phoneme-to-viseme mapping was used to convert it into a viseme sequence. The video-to-viseme model was then trained using a simple sequence-to-sequence model. The results of this experiment are shown in Table III.

The first row of this table shows that the video-to-viseme model performs this task with an acceptable degree of accuracy, even though a simple model is used. At this point, we have only a viseme sequence as output. The second part of the table gives the result of combining this model with the model prepared in the previous step, along with a comparison with the result obtained by the method of [2]. Considering that a simple architecture is used in both trained models, the improvement in accuracy compared to [2] is considerable. As shown in Table III, in the case of viseme-level modeling, the character error rate is 33.9%, which is 16% (absolute) better than predicting characters directly from the video (i.e., the second row). Also, when external textual data is used for viseme decoding, we achieve higher accuracy than when the network has to learn the language model probabilities implicitly.
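The combination evaluated in Table III amounts to a simple composition of the two models at inference time; the snippet below is only a schematic of that chaining, with placeholder functions standing in for the trained networks.

```python
from typing import Callable, List, Sequence

def two_stage_lipreading(
    mouth_frames: Sequence,
    video_to_visemes: Callable[[Sequence], List[str]],
    visemes_to_text: Callable[[List[str]], str],
) -> str:
    """Chain the two independently trained models: the first maps the cropped
    mouth frames to a viseme sequence, the second decodes it into characters."""
    viseme_sequence = video_to_visemes(mouth_frames)
    return visemes_to_text(viseme_sequence)

# Placeholder stand-ins for the two trained networks, for demonstration only.
fake_video_model = lambda frames: ["bilabial", "open", "alveolar"]
fake_text_model = lambda visemes: "bat"
print(two_stage_lipreading([None] * 16, fake_video_model, fake_text_model))  # bat
```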
TABLE III
Performance on the LRS2 dataset.

Method            WER      CER
Video to Viseme   62.3%    33.9%
WAS [2]           73.9%    49.9%
Proposed method   69.5%    46.1%

V. CONCLUSION

Lip reading is one of the most challenging tasks in the field of computer vision, and for many languages there is scant data available for it. We introduced a new method that uses external text data for lip-reading. Higher lip-reading accuracy can be achieved by utilizing the raw text data of a specific language, a grapheme-to-phoneme converter, and a phoneme-to-viseme mapping. The experimental results indicated that the proposed method improves the accuracy of viseme decoding and outperforms, by a wide margin, the case where only the lip-reading text data is used for language modeling. We also combined this model with the viseme-level lip-reading model and achieved higher accuracy than the case where only video data was used for training. One limitation of our work is the use of a single phoneme sequence for words with more than one correct pronunciation: such words can be pronounced in several ways, and we do not know which pronunciation is used in the video. Using the output of an automatic speech recognition system to resolve this can be considered as future work. Furthermore, due to resource limitations, we had to use a simple sequence-to-sequence model for both tasks. Therefore, as another direction for future work, we will incorporate our proposed method into state-of-the-art systems to show how it can improve the overall performance of a lip-reading system.

VI. ACKNOWLEDGEMENT

The authors would like to extend their gratitude to the Speech Laboratory of the Brno University of Technology for providing access to computational servers and the LRS datasets.

REFERENCES

[1] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey, “Extraction of visual features for lipreading,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 198–213, 2002.
[2] J. S. Chung, A. Senior, O. Vinyals, and A. Zisserman, “Lip reading sentences in the wild,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017, pp. 3444–3453.
[3] Z. Zhou, G. Zhao, and M. Pietikäinen, “Towards a practical lipreading system,” in CVPR 2011. IEEE, 2011, pp. 137–144.
[4] G. Potamianos and C. Neti, “Improved ROI and within frame discriminant features for lipreading,” in Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), vol. 3. IEEE, 2001, pp. 250–253.
[5] Y. Lan, R. Harvey, B. Theobald, E.-J. Ong, and R. Bowden, “Comparing visual features for lipreading,” in International Conference on Auditory-Visual Speech Processing 2009, 2009, pp. 102–106.
[6] G. Potamianos, H. P. Graf, and E. Cosatto, “An image transform approach for HMM based automatic lipreading,” in Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No. 98CB36269). IEEE, 1998, pp. 173–177.
[7] Y. Lan, B.-J. Theobald, R. Harvey, E.-J. Ong, and R. Bowden, “Improving visual features for lip-reading,” in Auditory-Visual Speech Processing 2010, 2010.
[8] S. S. Morade and S. Patnaik, “A novel lip reading algorithm by using
localized ACM and HMM: Tested for digit recognition,” optik, vol. 125,
no. 18, pp. 5181–5186, 2014.
[9] Z. Zhou, G. Zhao, X. Hong, and M. Pietikäinen, “A review of recent
advances in visual speech decoding,” Image and vision computing, vol. 32,
no. 9, pp. 590–605, 2014.
[10] L. Lombardi et al., “A survey of automatic lip reading approaches,” in
Eighth International Conference on Digital Information Management
(ICDIM 2013). IEEE, 2013, pp. 299–302.
[11] S. Mathulaprangsan, C.-Y. Wang, A. Z. Kusum, T.-C. Tai, and J.-C.
Wang, “A survey of visual lip reading and lip-password verification,” in
2015 International Conference on Orange Technologies (ICOT). IEEE,
2015, pp. 22–25.
[12] H. L. Bear and R. Harvey, “Decoding visemes: Improving machine lip-
reading,” in 2016 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP). IEEE, 2016, pp. 2009–2013.
[13] Y. M. Assael, B. Shillingford, S. Whiteson, and N. de Freitas, “LipNet:
Sentence-level lipreading,” arXiv preprint arXiv:1611.01599, vol. 2, no. 4,
2016.
[14] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist
temporal classification: labelling unsegmented sequence data with
recurrent neural networks,” in Proceedings of the 23rd international
conference on Machine learning, 2006, pp. 369–376.
[15] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A
neural network for large vocabulary conversational speech recognition,”
in 2016 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP). IEEE, 2016, pp. 4960–4964.
[16] T. Afouras, J. S. Chung, and A. Zisserman, “Deep lip reading: a
comparison of models and an online application,” arXiv preprint
arXiv:1806.06053, 2018.
[17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in
neural information processing systems, 2017, pp. 5998–6008.
[18] T. Afouras, J. S. Chung, and A. Zisserman, “Asr is all you need:
Cross-modal distillation for lip reading,” in ICASSP 2020-2020 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP). IEEE, 2020, pp. 2143–2147.
[19] M. Luo, S. Yang, X. Chen, Z. Liu, and S. Shan, “Synchronous
bidirectional learning for multilingual lip reading,” arXiv preprint
arXiv:2005.03846, 2020.
[20] B. Martinez, P. Ma, S. Petridis, and M. Pantic, “Lipreading using temporal
convolutional networks,” in ICASSP 2020-2020 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE,
2020, pp. 6319–6323.
[21] N. Harte and E. Gillen, “TCD-TIMIT: An audio-visual corpus of
continuous speech,” IEEE Transactions on Multimedia, vol. 17, no. 5,
pp. 603–615, 2015.
[22] T. Afouras, J. S. Chung, and A. Zisserman, “LRS3-TED: a large-scale
dataset for visual speech recognition,” arXiv preprint arXiv:1809.00496,
2018.
[23] J. Tiedemann and L. Nygaard, “The OPUS corpus - parallel and free: http://logos.uio.no/opus,” in LREC. Citeseer, 2004.
[24] J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, “Attention-based models for speech recognition,” arXiv preprint
arXiv:1506.07503, 2015.