
CONTINUAL LEARNING IN MACHINE SPEECH CHAIN

USING GRADIENT EPISODIC MEMORY

Geoffrey Tyndall1∗, Kurniawati Azizah1, Dipta Tanaya1, Ayu Purwarianti2, Dessi Puji Lestari2, Sakriani Sakti3,4

1 University of Indonesia, Indonesia
2 Bandung Institute of Technology, Indonesia
3 Nara Institute of Science and Technology, Japan
4 Japan Advanced Institute of Science and Technology, Japan

[email protected], {kurniawati.azizah,diptatanaya}@cs.ui.ac.id, {ayu,dessipuji}@itb.ac.id, [email protected]

arXiv:2411.18320v1 [cs.CL] 27 Nov 2024

∗ This work was conducted while the first author was doing an internship at the HA3CI Laboratory, JAIST, Japan, under the JST Sakura Science Program.

ABSTRACT

Continual learning for automatic speech recognition (ASR) systems poses a challenge, especially with the need to avoid catastrophic forgetting while maintaining performance on previously learned tasks. This paper introduces a novel approach leveraging the machine speech chain framework to enable continual learning in ASR using gradient episodic memory (GEM). By incorporating a text-to-speech (TTS) component within the machine speech chain, we support the replay mechanism essential for GEM, allowing the ASR model to learn new tasks sequentially without significant performance degradation on earlier tasks. Our experiments, conducted on the LJ Speech dataset, demonstrate that our method outperforms traditional fine-tuning and multitask learning approaches, achieving a substantial error rate reduction while maintaining high performance across varying noise conditions. We show the potential of our semi-supervised machine speech chain approach for effective and efficient continual learning in speech recognition.

Index Terms— Machine Speech Chain, Continual Learning, Gradient Episodic Memory

1. INTRODUCTION

The exceptional performance of deep learning architectures, as illustrated by the Transformer model [1], has enabled state-of-the-art automatic speech recognition (ASR) systems to reach levels of accuracy comparable to human performance [2, 3, 4]. These advancements have significantly enhanced speech recognition capabilities. However, a critical challenge persists: ASR systems should be capable of recognizing a continuous stream of tasks. Despite the existence of large-scale speech models [5, 6] that excel in multitask performance, these models demand substantial resources in terms of data and computational power, and they require the availability of all tasks from the beginning, i.e., offline learning.

An alternative approach to this issue is fine-tuning, which transfers knowledge from one task to another, or multitask learning, where the model is trained from scratch using both previous and new task data simultaneously. Unfortunately, the former approach (transfer learning) can degrade the model's performance on earlier tasks due to catastrophic forgetting [7]. Meanwhile, the latter approach necessitates retaining old data to mix with new task data, potentially leading to privacy concerns.

Continual learning is a paradigm designed to allow models to learn new tasks sequentially without compromising their ability to perform previous tasks or violating data privacy. Its effectiveness in sequentially handling multiple recognition tasks was recently demonstrated in [8].

Unlike existing fully supervised methods for conducting continual learning experiments on ASR, this paper proposes a semi-supervised approach within the machine speech chain framework [9]. Our method integrates text-to-speech (TTS) to support a replay mechanism in continual learning. We adopt gradient episodic memory (GEM) [10] as our chosen implementation for this replay-based continual learning scenario.

We evaluate our proposed method against other prevalent learning paradigms such as fine-tuning and multitask learning. Our results indicate that continual learning within the machine speech chain framework offers superior performance compared to these traditional methods and serves as a viable alternative to fully supervised continual learning. Although the upper bound fully supervised continual learning achieves a lower error rate, our approach manages to achieve a 40% average error rate reduction relative to fine-tuning. Therefore, our contributions include: (1) proposing a machine speech chain-based method for enabling continual learning in speech recognition; and (2) conducting experiments to validate our method using the LJ Speech dataset.

2. RELATED WORK

2.1. Machine Speech Chain

Machine speech chain is an architecture that connects sequence-to-sequence models of automatic speech recognition (ASR) and text-to-speech (TTS) in a closed-loop framework. This integration was proposed as a representation of the human speech chain mechanism [9], which is listening while speaking [11]. To date, the machine speech chain has been used in various works, including adaptive Lombard TTS [12], data augmentation [13], and code-switching [14].
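To illustrate the closed loop, the sketch below shows the two unsupervised directions of a machine speech chain cycle in the spirit of [9]: ASR transcribes unpaired speech so that TTS can be trained to reconstruct it, and TTS synthesizes speech from unpaired text so that ASR can be trained to transcribe it. The function names (`speech_chain_cycle`, `mse`, and the `asr`, `tts`, `text_loss` arguments) are illustrative placeholders of our own, not an API from the paper.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two (pseudo-)spectrogram sequences."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def speech_chain_cycle(asr, tts, unlabeled_speech, unlabeled_text, text_loss):
    """One semi-supervised cycle of the machine speech chain (after [9]).

    The two models supervise each other through each other's output:
      speech -> ASR -> text_hat   -> TTS -> speech_hat   (speech-only data)
      text   -> TTS -> speech_hat -> ASR -> text_hat     (text-only data)
    """
    # listening-while-speaking direction: the reconstruction loss is used to update TTS
    speech_hat = tts(asr(unlabeled_speech))
    tts_loss = mse(speech_hat, unlabeled_speech)

    # speaking-while-listening direction: the transcription loss is used to update ASR
    text_hat = asr(tts(unlabeled_text))
    asr_loss = text_loss(text_hat, unlabeled_text)

    return asr_loss, tts_loss

# toy usage with identity "models": both losses are zero
loss_asr, loss_tts = speech_chain_cycle(lambda s: s, lambda t: t,
                                        unlabeled_speech=[0.1, 0.2],
                                        unlabeled_text=[1, 2],
                                        text_loss=lambda a, b: float(a != b))
print(loss_asr, loss_tts)  # -> 0.0 0.0
```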

2.2. Gradient Episodic Memory

Gradient episodic memory (GEM) is a replay-based method in the continual learning paradigm [10]. GEM exploits samples from past tasks' data when encountering the data of a new task, minimizing the L2 distance between the gradient of the new task's data and the gradients of the old tasks' data, i.e.,

\min_{\tilde{g}} \; \frac{1}{2} \lVert g - \tilde{g} \rVert_2^2 \quad \text{s.t.} \; \langle \tilde{g}, g_k \rangle \geq 0, \; \forall k \in \{0, \ldots, i-1\},   (1)

where g, \tilde{g}, g_k \in \mathbb{R}^{|\theta|} and |\theta| is the number of model parameters. In a previous finding (see [8]), an ASR model equipped with GEM outperformed regularization-based methods, such as synaptic intelligence [15] or knowledge distillation [16], in a continual learning scenario where different acoustic and topic domains acted as task boundaries. In this paper, we introduce the use of GEM in the machine speech chain and demonstrate its potential first-hand.
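The projection in Eq. (1) can be made concrete with a small numerical sketch. The snippet below is our own illustration, not code from [10] or from this paper: with a single stored task the projection has a closed form, while the general multi-constraint case is solved as a small quadratic program in [10]; iterating one constraint at a time, as done here, is only an approximation of that QP.

```python
import numpy as np

def project_gradient(g, memory_grads):
    """Illustrative sketch of the GEM projection in Eq. (1).

    g            : gradient of the current task's loss, shape (|theta|,)
    memory_grads : list of gradients g_k computed on episodic memories M_k

    If no constraint <g, g_k> >= 0 is violated, g is returned unchanged.
    """
    g_tilde = g.copy()
    for g_k in memory_grads:
        dot = np.dot(g_tilde, g_k)
        if dot < 0:  # the update would increase the loss on the old task
            # closed-form projection onto the half-space <g_tilde, g_k> >= 0
            g_tilde = g_tilde - (dot / np.dot(g_k, g_k)) * g_k
    return g_tilde

# toy usage: two parameters, one previous task
g_new = np.array([1.0, -1.0])
g_old = np.array([0.0, 1.0])             # memory gradient of the base task
print(project_gradient(g_new, [g_old]))  # -> [1. 0.], the conflict is removed
```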

3. MACHINE SPEECH CHAIN USING GEM

We introduce a three-stage mechanism designed to enable ASR models to perform continual learning in a semi-supervised manner, achieving satisfactory results with minimal forgetting. These three stages are depicted in Figure 1: the first and second stages build upon the process proposed in [9], while our continual learning method is introduced in the third stage.
Fig. 1. Continual learning in the machine speech chain framework.

1. First stage: Supervised learning on the base task. Here, ASR and TTS are trained separately in a supervised manner to ensure strong baseline performance for the subsequent training stages.

2. Second stage: Semi-supervised learning. At this stage, ASR and TTS mutually enhance each other by training on unlabeled data from the base task, using unsupervised methods to improve performance.

3. Third stage: Continual learning. ASR engages in continual learning for new tasks using replayed inputs from the base task, synthesized by TTS.

In our approach, the replay process for speech recognition leverages TTS as a synthesis model to generate pseudo-samples of the base task. These pseudo-samples are stored in episodic memory and used by GEM to regulate the gradients for both new and previous tasks.
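To make the replay mechanism concrete, the sketch below shows one way the episodic memory of base-task pseudo-samples could be populated: transcripts (in the paper, the labels of the incoming task; see Eq. (2) below) are forwarded to the TTS trained on the base task, and the synthesized speech is stored together with the text. `EpisodicMemory`, `build_pseudo_sample_memory`, and `dummy_tts` are illustrative placeholders of our own, not components released with the paper.

```python
import random
from typing import Callable, List, Tuple

class EpisodicMemory:
    """Fixed-size buffer of (speech, text) pairs used for GEM replay."""

    def __init__(self, capacity: int = 100):  # 100 samples per task, as stated in Sec. 4.1
        self.capacity = capacity
        self.samples: List[Tuple[list, str]] = []

    def add(self, speech, text) -> None:
        if len(self.samples) < self.capacity:
            self.samples.append((speech, text))

def build_pseudo_sample_memory(transcripts: List[str],
                               tts: Callable[[str], list],
                               capacity: int = 100) -> EpisodicMemory:
    """Populate the base-task memory M0 with TTS pseudo-samples (cf. Eq. (2) below)."""
    memory = EpisodicMemory(capacity)
    for text in random.sample(transcripts, min(capacity, len(transcripts))):
        pseudo_speech = tts(text)  # clean speech synthesized by the base-task TTS
        memory.add(pseudo_speech, text)
    return memory

# toy usage with a dummy TTS that returns a fake "spectrogram"
dummy_tts = lambda text: [0.0] * len(text)
memory = build_pseudo_sample_memory(["first transcript", "second transcript"], dummy_tts, capacity=2)
print(len(memory.samples))  # -> 2
```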
During the third stage, when the machine speech chain encounters incoming tasks as

[D_1, \ldots, D_n] = [(x^1, y^1), \ldots, (x^n, y^n)]
                   = [\{(x^1_1, y^1_1), \ldots, (x^1_{|D_1|}, y^1_{|D_1|})\}, \ldots, \{(x^n_1, y^n_1), \ldots, (x^n_{|D_n|}, y^n_{|D_n|})\}],

where x is the input and y is the label, we forward the speech data label to TTS to generate pseudo-samples of the base task, i.e., \hat{x}^0 \sim \mathrm{TTS}(y^i). These synthesized samples are stored, along with the data from the incoming task, and processed as follows:

M_0 \leftarrow M_0 \cup (\hat{x}^0, y^i)   (2)
M_i \leftarrow M_i \cup (x^i, y^i)   (3)
g \leftarrow \nabla_\theta \, \ell(\mathrm{ASR}_\theta(x^i), y^i)   (4)
g_k \leftarrow \nabla_\theta \, \ell(\mathrm{ASR}_\theta, M_k) \quad \text{for all } k < i   (5)
\tilde{g} \leftarrow \mathrm{PROJECT}(g, g_0, g_1, \ldots, g_{i-1}), \ \text{see (1)}   (6)
\theta \leftarrow \theta - \delta \tilde{g},   (7)

where M represents the episodic memory and \delta denotes the weight assigned for updating the model parameters during continual learning for the i-th task (i > 0).
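The update rules (2)-(7) can be summarized in a short training-step sketch. This is an illustrative outline under our own assumptions: `asr_loss_grad` and `project_gradient` are hypothetical helpers (the latter standing in for PROJECT in Eq. (6)), and a real implementation would operate on framework tensors rather than NumPy arrays.

```python
import numpy as np

def gem_update_step(theta, x_i, y_i, memories, asr_loss_grad, project_gradient, delta=1e-3):
    """One continual-learning update following Eqs. (4)-(7).

    theta            : flattened model parameters, shape (|theta|,)
    x_i, y_i         : a batch from the incoming task i
    memories         : list [(x_0, y_0), ..., (x_{i-1}, y_{i-1})] of episodic
                       memories; (x_0, y_0) holds the TTS pseudo-samples of Eq. (2)
    asr_loss_grad    : callable returning the gradient of the ASR loss w.r.t. theta
    project_gradient : the projection of Eq. (1)/(6), e.g. the sketch given earlier
    delta            : update weight of Eq. (7)
    """
    g = asr_loss_grad(theta, x_i, y_i)                   # Eq. (4)
    memory_grads = [asr_loss_grad(theta, x_k, y_k)       # Eq. (5)
                    for (x_k, y_k) in memories]
    g_tilde = project_gradient(g, memory_grads)          # Eq. (6)
    return theta - delta * g_tilde                       # Eq. (7)

# toy usage with a quadratic "loss" whose gradient is (theta - target)
grad = lambda theta, x, y: theta - y
proj = lambda g, gs: g  # identity projection for the toy example
theta = np.zeros(2)
theta = gem_update_step(theta, None, np.array([1.0, 1.0]), [], grad, proj, delta=0.5)
print(theta)  # -> [0.5 0.5]
```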
To our knowledge, our proposed mechanism is the first to incorporate TTS within the continual learning framework for ASR. While prior works in continual learning have utilized various generative models [17, 18], none has specifically employed TTS for continual learning in ASR.

Table 1. CER results for different methods applied on the ASR model. The color-coded rows (■ 1st, ■ 2nd, ■ 3rd) represent each stage of our proposed machine speech chain-based method.

Model              CER (%) LJ Original   CER (%) LJ Noisy
ASRLower
  Pre-trained             9.2                 82.6
  Fine-tuning            19.0                 31.3
  GEM                     8.5                 15.8
  Multitask              74.8                 76.7
ASRSpeechChain
  Pre-trained             6.4                 95.7
  Fine-tuning            12.7                 33.1
  GEM                    11.1                 15.5
ASRUpper
  Pre-trained             1.9                108.4
  Fine-tuning             6.7                 15.6
  GEM                     5.2                  8.4
  Multitask               3.8                 10.9

Table 2. Results for the ASRSpeechChain with different ratios of labeled and unlabeled data during base-task learning in the first and second stages of the framework.

Split Ratio (Labeled / Unlabeled)   CER (%) LJ Original   CER (%) LJ Noisy
  30 / 70                                 11.1                 15.5
  50 / 50                                  4.8                 11.5
  70 / 30                                  4.0                 10.9

4. EXPERIMENTS

4.1. Experimental Setup

We prepared two tasks for the ASR models to recognize. The first task, referred to as the base task, utilized the clean original dataset of LJ Speech [19], consisting of 24 hours of audio. To simulate a different scenario for the subsequent task, we created a noisy version of the original speech dataset. This noisy dataset also comprises 24 hours of audio, but with added white noise at a signal-to-noise ratio (SNR) of 0 dB. Consequently, the base task is denoted as LJ Original, and the subsequent task is denoted as LJ Noisy. Both datasets were split into train, dev, and test sets with a ratio of 94%, 3%, and 3%, respectively.
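For reference, mixing white noise into a waveform at a target SNR (0 dB for LJ Noisy) can be done as in the sketch below; this is our own illustration of the standard procedure, not code released with the paper.

```python
import numpy as np

def add_white_noise(speech: np.ndarray, snr_db: float = 0.0, seed: int = 0) -> np.ndarray:
    """Mix white Gaussian noise into a waveform at the requested SNR (in dB)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(speech.shape)
    signal_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # scale the noise so that signal_power / (scaled noise power) == 10 ** (snr_db / 10)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# at 0 dB SNR the scaled noise has the same average power as the speech signal
clean = np.sin(np.linspace(0, 100, 16000))
noisy = add_white_noise(clean, snr_db=0.0)
```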
For the ASR architecture, we employed the Speech-Transformer [20], while the TTS architecture was based on the Transformer-based Tacotron 2 [21]. The ASR models did not involve hyperparameter tuning, since they already employed almost identical hyperparameters to those used in [20]. The architecture of the ASR models employed 12 encoder blocks, 6 decoder blocks, 4 attention heads, and a feed-forward hidden layer size of 2048. We used 80 dimensions for the Mel-spectrogram input. We trained the models using the Adam optimizer with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹ and employed cross-entropy loss with neighborhood smoothing. The episodic memory that we used for continual learning had a size of 100 samples per task, or in other words 1% of the dataset size.

For the TTS models needed in the machine speech chain condition, we configured them to consist of 6 encoder blocks for the Transformer-based encoder, 6 decoder blocks for the autoregressive decoder, 8 attention heads, and a feed-forward hidden layer size of 2048. These values were identical to the best configuration used in [21]. The TTS input was the character sequence, and the output was the 80 dimensions of the Mel-spectrogram. We used the Adam optimizer with the same β1, β2, ϵ values and employed cross-entropy loss.
continual learning for the i-th task (i > 0). loss.
To our knowledge, our proposed mechanism is the first to Our experiment involved training ASR models under su-
incorporate TTS within the continual learning framework for pervised conditions: lower bound and upper bound, and our
ASR. While prior works in continual learning have utilized proposed method that involved semi-supervised condition:
various generative models [17, 18], none has specifically em- machine speech chain. The upper and lower bound refers
ployed TTS for continual learning in ASR. to the amount of base task data provided to the ASR model
Fig. 2. Learning curves of models in continual learning paradigm and their respective metrics.

before it engages in learning with the subsequent task. Specifically, we varied the proportion of the LJ Original training data while keeping the LJ Noisy training data constant at 100% of the train set. We used 30% of the LJ Original train set for the lower bound condition, 30% of the train set as labeled data and 70% of the train set as unlabeled data for the machine speech chain condition, and 100% of the train set for the upper bound condition.

4.2. Experiment Results

4.2.1. Continual Learning Performance

The experimental results, as detailed in Table 1, demonstrate the efficacy of various continual learning approaches applied to the ASR model in both clean (LJ Original) and noisy (LJ Noisy) conditions. The ASRLower results show that the GEM approach significantly reduces the character error rate (CER) compared to fine-tuning and multitask learning. For instance, GEM achieved a CER of 8.5% on LJ Original and 15.8% on LJ Noisy, outperforming the fine-tuning method, which resulted in CERs of 19.0% and 31.3%, respectively. Multitask learning, however, showed the highest CERs of 74.8% and 76.7%, indicating its limitation in handling noise without an optimal balance of data.
The ASRSpeechChain model trained with GEM outperformed the fine-tuning method, achieving CERs of 11.1% and 15.5% for LJ Original and LJ Noisy, respectively. This is a significant improvement over fine-tuning, which recorded CERs of 12.7% and 33.1%. Furthermore, comparing the GEM method across different models, ASRUpper using GEM achieved the lowest CERs at 5.2% and 8.4%, compared to the fine-tuning and multitask methods. However, it is important to highlight that the ASRSpeechChain model, despite not reaching the lowest error rates, still showed substantial improvements. The ASRSpeechChain model with GEM achieved significant error rate reductions, comparable to the ASRUpper model, with a 40% error rate reduction relative to the respective fine-tuning methods. We also demonstrate the results with different split ratios of labeled and unlabeled data of the base task in Table 2, where we can observe that with increasing labeled data the error rates become smaller. These results emphasize that our proposed method is effective, mitigating catastrophic forgetting and maintaining consistent performance across tasks and varying semi-supervised learning scenarios.
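One way to arrive at the roughly 40% figure from Table 1 is to average each method's CER over the two tasks and compare GEM against fine-tuning; the snippet below reproduces that arithmetic. This is our reading of the reported numbers, not a computation published by the authors.

```python
# CERs from Table 1 as (LJ Original, LJ Noisy)
fine_tuning = {"speech_chain": (12.7, 33.1), "upper": (6.7, 15.6)}
gem         = {"speech_chain": (11.1, 15.5), "upper": (5.2,  8.4)}

for model in fine_tuning:
    ft_avg  = sum(fine_tuning[model]) / 2
    gem_avg = sum(gem[model]) / 2
    reduction = 100 * (ft_avg - gem_avg) / ft_avg
    print(f"{model}: {reduction:.1f}% relative reduction in average CER")
# speech_chain: ~41.9%, upper: ~39.0% -- both close to the reported 40%
```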
Fig. 2. Learning curves of models in the continual learning paradigm and their respective metrics.

4.2.2. Continual Learning Comparison

We also compared our semi-supervised method to other continual learning methods that are carried out in a fully supervised scenario. These other methods were gradient episodic memory (GEM) and elastic weight consolidation (EWC) [22]. We can see from Figure 2 that the learning curves exhibit the superiority of GEM, as models that leveraged GEM as their replay process were able to prevent catastrophic forgetting. Although EWC had worse forgetting prevention, it performed better on learning the new task because of its fully supervised scenario.

We also computed continual learning metrics, namely the average (AVG), backward transfer (BWT), and forward transfer (FWT) character error rate, as shown in Figure 2, which were useful for comparing the three models to each other. In our experiment, BWT is defined as the ability of a model to transfer the lowest possible error to the previous task it has encountered, while FWT is defined as the ability of a model to learn a new task with the lowest possible error compared to the error rate attained by the standard fine-tuning method.
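These metrics can be computed from a matrix R of CERs, where R[i, j] is the CER on task j after training through task i. The sketch below adapts the accuracy-based definitions of [10] to error rates; the FWT term follows the paper's description of comparing against a fine-tuning baseline, so the exact `baseline_cer` formulation is our assumption.

```python
import numpy as np

def continual_learning_metrics(R: np.ndarray, baseline_cer: np.ndarray):
    """AVG, BWT, and FWT computed from a CER matrix (lower CER is better).

    R            : (T, T) array, R[i, j] = CER on task j after training through task i
    baseline_cer : (T,) array of reference CERs for FWT, e.g. the fine-tuning
                   results (our assumption about the baseline used in the paper)
    """
    T = R.shape[0]
    avg = R[T - 1].mean()                                         # final average CER
    bwt = np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)])  # > 0 means forgetting
    fwt = np.mean([R[j, j] - baseline_cer[j] for j in range(1, T)])  # < 0 means better than baseline
    return avg, bwt, fwt

# two-task toy example: base task, then noisy task
R = np.array([[8.0, 40.0],
              [12.0, 15.0]])
print(continual_learning_metrics(R, baseline_cer=np.array([8.0, 16.0])))  # -> (13.5, 4.0, -1.0)
```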
GEM, when applied in a supervised ASR system, achieved, as expected, the lowest values on all the metrics. EWC had a slightly lower AVG, at 12.5%, than ASRSpeechChain, which achieved 13.3%. Our model performed well in reducing forgetting, introducing a lower error to the previous task with a BWT of 4.7% compared to EWC's 7.8%. For the FWT metric, our model and EWC performed relatively similarly at -0.3% and -0.1%, respectively. From these results, we can observe that our model works as intended to learn sequential tasks, prevent catastrophic forgetting, and exploit accumulated knowledge to learn a new task, which are all the properties of a functioning continual learning process.

5. CONCLUSION

We proposed a novel method that allows an automatic speech recognition (ASR) model to perform continual learning in a semi-supervised manner within the machine speech chain. We then demonstrated first-hand the implementation of such a replay method with gradient episodic memory (GEM). Although our upper bound supervised model achieved a lower CER than our proposed method, the machine speech chain-based method managed to achieve the same 40% averaged error rate reduction. Furthermore, we compared the machine speech chain trained under the proposed continual learning scenario with the machine speech chain under the fine-tuning scenario. We found that our method worked and achieved minimal forgetting, or prevented catastrophic forgetting. This showed that our novel method has potential for further application in speech recognition and can serve as an alternative to the fully supervised mechanism of continual learning. We believe this paper provides the first exploration of continual learning in the machine speech chain framework and makes a step towards realizing effective and efficient learning for speech recognition.

6. LIMITATIONS

We acknowledge the need for further experiments to assess the generalizability of our approach. While this work demonstrates success on a simple task boundary of noise variation, future work will involve applying our method to a wider range of tasks, such as multilingual speech recognition (where the model needs to adapt to different phonetic inventories) or task-agnostic continual learning (where tasks are not predefined). This will allow us to investigate the effectiveness of our method in handling more complex scenarios and potentially lead to more robust continual learning for ASR in the machine speech chain framework.

7. ETHICS STATEMENT

Our study followed scientific methodology and ethics. The LJ Speech dataset that we used is a public domain dataset, which is not in violation of licensing and data ethics. The LJ Speech dataset is an English-language speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. The audio was recorded and donated voluntarily by the speaker to the public domain. The texts that were read by the speaker are also in the public domain. We are aware of the usage of synthetic data generated by text-to-speech (TTS) to assist the continual learning of automatic speech recognition (ASR). There is potential to perpetuate ethical risks, such as bias and attribution issues in the synthetic samples. However, our proposed method utilizes TTS within a closed-loop framework, allowing us to better control the generation process and mitigate such issues. Furthermore, we believe this method can alleviate key challenges, such as the reliance on large quantities of real human speech data.

8. ACKNOWLEDGEMENTS

Part of this work is supported by JSPS KAKENHI Grant Numbers JP21H05054 and JP23K21681, as well as the JST Sakura Science Program.

9. REFERENCES
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017.

[2] Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, and Ravi Teja Gadde, "Jasper: An end-to-end convolutional neural acoustic model," arXiv preprint arXiv:1904.03288, 2019.

[3] Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, and Shankar Kumar, "Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

[4] Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, and Ruoming Pang, "Conformer: Convolution-augmented transformer for speech recognition," in Conference of the International Speech Communication Association (INTERSPEECH), 2020.

[5] Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever, "Robust speech recognition via large-scale weak supervision," in International Conference on Machine Learning (ICML), 2023.

[6] Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, and Skyler Wang, "SeamlessM4T: Massively multilingual & multimodal machine translation," arXiv preprint arXiv:2308.11596, 2023.

[7] Michael McCloskey and Neal J. Cohen, Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, Academic Press, 1989.

[8] Heng-Jui Chang, Hung-yi Lee, and Lin-shan Lee, "Towards lifelong learning of end-to-end ASR," in Conference of the International Speech Communication Association (INTERSPEECH), 2021.

[9] Andros Tjandra, Sakriani Sakti, and Satoshi Nakamura, "Machine speech chain," IEEE Transactions on Audio, Speech, and Language Processing, 2020.

[10] David Lopez-Paz and Marc'Aurelio Ranzato, "Gradient episodic memory for continual learning," in Advances in Neural Information Processing Systems, 2017.

[11] Peter B. Denes and Elliot Pinson, The Speech Chain, Worth Publishers, 1993.

[12] Sashi Novitasari, Sakriani Sakti, and Satoshi Nakamura, "A machine speech chain approach for dynamically adaptive Lombard TTS in static and dynamic noise environments," IEEE Transactions on Audio, Speech, and Language Processing, 2022.

[13] Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, and Satoshi Nakamura, "SpeeChain: A speech toolkit for large scale machine speech chain," arXiv preprint arXiv:2301.02966, 2023.

[14] Rais Vaza Man Tazakka, Dessi Lestari, Ayu Purwarianti, Dipta Tanaya, Kurniawati Azizah, and Sakriani Sakti, "Indonesian-English code-switching speech recognition using the machine speech chain based semi-supervised learning," in Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, 2024.

[15] Friedemann Zenke, Ben Poole, and Surya Ganguli, "Continual learning through synaptic intelligence," in International Conference on Machine Learning (ICML), 2017.

[16] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.

[17] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim, "Continual learning with deep generative replay," in Advances in Neural Information Processing Systems, 2017.

[18] Craig Atkinson, Brendan McCane, Lech Szymanski, and Anthony Robins, "Pseudo-recursal: Solving the catastrophic forgetting problem in deep neural networks," arXiv preprint arXiv:1802.03875, 2018.

[19] Keith Ito and Linda Johnson, "The LJ Speech Dataset," https://keithito.com/LJ-Speech-Dataset/, 2017.

[20] Linhao Dong, Shuang Xu, and Bo Xu, "Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[21] Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu, "Neural speech synthesis with transformer network," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019.

[22] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell, "Overcoming catastrophic forgetting in neural networks," Proceedings of the National Academy of Sciences, 2017.