Abstract—Computational power is crucial for the development and deployment of artificial intelligence capabilities, as the large size of deep learning models often requires significant resources. Compression methods aim to reduce model size, making artificial intelligence more sustainable and accessible. Compression techniques are often applied uniformly across model layers, without considering their individual characteristics. In this paper, we introduce a customized approach that optimizes compression for each layer individually. Some layers undergo both pruning and quantization, while others are only quantized, with fuzzy logic guiding these decisions. The quantization precision is further adjusted based on the importance of each layer. Our experiments on both supervised and self-supervised models using the LibriSpeech dataset show only a slight decrease in performance, with a memory footprint reduction of about 85%.

Index Terms—Adaptive compression, self-supervised models, speech recognition, pruning, quantization, fuzzy logic

I. INTRODUCTION

Artificial Intelligence (AI) achieves remarkable success in a wide array of applications. This success stems from the development of deeper and wider Deep Neural Network (DNN) architectures, which enhance the model's ability to learn intricate patterns for specific tasks. This is especially evident in computer vision, natural language processing, and audio processing, including speech recognition. However, the deployment of such large models comes with significant computing and financial costs and contributes to a substantial carbon footprint [1]. This not only challenges the inclusivity of AI but also raises environmental concerns [2]. To address these issues, reducing model size is crucial. Several methods exist for compressing DNNs, including quantization, pruning, knowledge distillation, parameter sharing, and matrix factorization [3]–[9].

Today, most DNN applications rely on supervised learning, which requires labeled data, a process that is often time-consuming and costly. In contrast, human learning begins unsupervised, as infants learn language through observation, and later continues through supervised tasks like reading and writing. To mimic this process, self-supervised learning (SSL) frameworks have been developed. In speech processing, models like Wav2vec 2.0 [10], HuBERT [11], and WavLM [12] excel with minimal annotated data by pretraining on large unlabeled datasets, followed by fine-tuning on smaller labeled datasets.

Several works in the literature have focused on large-model compression, but only a limited number address SSL models [13], [14]. Among these, the authors of [15] apply knowledge distillation (KD) to the Wav2vec acoustic model, achieving a 4.8x compression ratio, though with a WER increase of 3.62. In [16], genetic algorithms are proposed for structured pruning of the Wav2vec2 XLSR53 model, resulting in a slight WER increase of 0.21% with 40% pruning. In [17], the authors employ symmetric linear quantization to reduce the precision of the weights and activations of a BERT model to INT8. To our knowledge, quantization has not yet been applied to speech SSL models.

Previous research often applies a uniform compression method across all layers. However, recent studies reveal that weight distributions vary by layer type and position within the network [5], [18]. For example, layers with many critical weights need higher quantization precision, while layers with mostly low-magnitude weights are more suitable for pruning. We propose a customized approach that selects the optimal compression method for each layer individually: some layers undergo both quantization and pruning, while others are only quantized, with fuzzy logic guiding the decision process.

II. MODEL COMPRESSION

Focusing on Green AI [1] to minimize computational costs while preserving performance, we prioritize techniques with minimal parameter tuning. We explore two model compression strategies, quantization and pruning, which are not only easily applied to pre-trained models but also well suited for rapid deployment on mobile devices.

A. Quantization

Model quantization reduces the size of neural networks by using lower-precision values for the weights or activations [6]. The two main approaches are Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). A standard quantization function Q(x) converts a floating-point value x to an integer:

Q(x) = \mathrm{Int}\left(\frac{x}{S}\right) - Z    (1)

where S is a floating-point scaling factor, and Z is an integer zero point, representing the zero value in the quantization
scheme. The Int(·) function rounds the scaled x to the nearest
integer. This approach is known as uniform quantization
because all x values are scaled by the same factor S, leading
to evenly spaced quantized values. Non-uniform quantization,
with its variable spacing between quantized values, can more
effectively capture signal information, but it is challenging
to implement efficiently on standard hardware. As a result,
uniform quantization is the preferred method.
Clipping range selection, or calibration, can be done using the signal's minimum and maximum values, α = x_min and β = x_max, resulting in asymmetric quantization, as the range may not be centered. Alternatively, symmetric quantization sets −α = β, often using the maximum absolute value −α = β = max(|x_max|, |x_min|). Asymmetric quantization typically narrows the clipping range, which is important for imbalanced weights or activations, like those following ReLU. Setting the zero point to Z = 0 simplifies symmetric quantization.
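For concreteness, the NumPy sketch below illustrates post-training uniform quantization following Eq. (1) with both calibration choices described above; the helper names (calibrate, quantize, dequantize) and the assumed 8-bit width are our own illustrative choices, not the implementation used in this work.

```python
import numpy as np

def calibrate(x, bits=8, symmetric=True):
    """Compute scale S and zero point Z for uniform quantization (Eq. 1)."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for INT8
    qmin = -(2 ** (bits - 1))            # e.g. -128 for INT8
    if symmetric:
        alpha = np.max(np.abs(x))        # -alpha = beta = max(|x_min|, |x_max|)
        S = alpha / qmax
        Z = 0                            # zero point fixed at 0
    else:
        x_min, x_max = x.min(), x.max()  # asymmetric: clip to [x_min, x_max]
        S = (x_max - x_min) / (qmax - qmin)
        Z = int(round(x_min / S)) - qmin
    return S, Z

def quantize(x, S, Z, bits=8):
    """Q(x) = Int(x / S) - Z, clipped to the integer grid."""
    q = np.round(x / S).astype(np.int32) - Z
    return np.clip(q, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

def dequantize(q, S, Z):
    """Approximate reconstruction: x ≈ S * (q + Z)."""
    return S * (q.astype(np.float32) + Z)

# Toy usage on a ReLU-like (non-negative, imbalanced) tensor
x = np.abs(np.random.randn(4, 4)).astype(np.float32)
S, Z = calibrate(x, symmetric=False)
q = quantize(x, S, Z)
print(np.max(np.abs(x - dequantize(q, S, Z))))  # quantization error
```

With symmetric calibration the zero point is fixed at Z = 0, matching the simplification noted above, while asymmetric calibration spends the full integer grid on the observed range.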
B. Pruning

Pruning removes unimportant weights or components by zeroing values that are close to zero. Formally, a neural network model can be defined as a function family f(x, W), where x denotes the network architecture and W its parameters. Pruning a neural network involves taking an existing model f(x, W) and generating a new model f(x, W′) such that W′ = M ⊙ W, where M ∈ {0, 1}^{|W|} is a binary mask that sets some parameters to zero and ⊙ is the elementwise product operator [19].

The pruning techniques include:
– Unstructured pruning [20], which removes individual weights, creating sparse matrices;
– Structured pruning, which removes entire blocks, such as rows, columns, neurons, or attention heads [21].

In this paper, our focus is on unstructured pruning, as it targets the smallest model elements without significant performance loss. Unstructured pruning introduces sparsity, creating irregular memory access patterns.
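As an illustration of the mask formulation above, the sketch below performs unstructured magnitude pruning with NumPy; the function name and the 40% sparsity level are illustrative assumptions, not the exact procedure or settings used in our experiments.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.4):
    """Unstructured pruning: zero the `sparsity` fraction of smallest-|w| entries.

    Returns the pruned weights W' = M * W and the binary mask M.
    """
    k = int(np.floor(sparsity * W.size))
    if k == 0:
        return W.copy(), np.ones_like(W)
    # Threshold = k-th smallest absolute value
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    M = (np.abs(W) > threshold).astype(W.dtype)  # binary mask, same shape as W
    return M * W, M                              # elementwise product

# Toy usage: prune 40% of a random weight matrix
W = np.random.randn(8, 8).astype(np.float32)
W_pruned, M = magnitude_prune(W, sparsity=0.4)
print("actual sparsity:", 1.0 - M.mean())
```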
Fig. 1. Fuzzy membership functions for weight importance in the adaptive compression method.

a) Trapezoidal Membership Function: This function is used to classify weights into two categories: low or high importance. The corresponding membership function, denoted by µ_k(x), where k ∈ {low, high}, is given by:

\mu_k(x; a_k, b_k, c_k, d_k) =
\begin{cases}
0 & \text{if } x \le a_k \\
\frac{x - a_k}{b_k - a_k} & \text{if } a_k < x \le b_k \\
1 & \text{if } b_k < x \le c_k \\
\frac{d_k - x}{d_k - c_k} & \text{if } c_k < x \le d_k \\
0 & \text{if } x > d_k
\end{cases}    (2)

b) Triangular Membership Function: For weights classified as medium importance, we use a triangular membership function, µ_medium(x; a_M, b_M, c_M), defined as:

\mu_{\mathrm{medium}}(x; a_M, b_M, c_M) =
\begin{cases}
0 & \text{if } x \le a_M \\
\frac{x - a_M}{b_M - a_M} & \text{if } a_M < x \le b_M \\
\frac{c_M - x}{c_M - b_M} & \text{if } b_M < x \le c_M \\
0 & \text{if } x > c_M
\end{cases}    (3)
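The two membership functions can be transcribed directly, as in the sketch below; the breakpoints passed in the toy usage are placeholder values for normalized weight magnitudes, not the parameters used in the paper.

```python
import numpy as np

def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership function of Eq. (2), used for the low/high classes."""
    x = np.asarray(x, dtype=np.float64)
    y = np.zeros_like(x)
    y = np.where((x > a) & (x <= b), (x - a) / (b - a), y)  # rising edge
    y = np.where((x > b) & (x <= c), 1.0, y)                # plateau
    y = np.where((x > c) & (x <= d), (d - x) / (d - c), y)  # falling edge
    return y

def triangular(x, a, b, c):
    """Triangular membership function of Eq. (3), used for the medium class."""
    x = np.asarray(x, dtype=np.float64)
    y = np.zeros_like(x)
    y = np.where((x > a) & (x <= b), (x - a) / (b - a), y)  # rising edge
    y = np.where((x > b) & (x <= c), (c - x) / (c - b), y)  # falling edge
    return y

# Toy usage: fuzzy importance of normalized |weight| magnitudes (breakpoints are illustrative)
w = np.array([0.05, 0.3, 0.7, 0.95])
low    = trapezoidal(w, -0.01, 0.0, 0.2, 0.4)
medium = triangular(w, 0.2, 0.5, 0.8)
high   = trapezoidal(w, 0.6, 0.8, 1.0, 1.01)
print(np.stack([low, medium, high]))
```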
TABLE I. Baseline systems' models: number of parameters (millions), memory footprint (megabytes) and word error rate (%).

TABLE III. Memory footprint (megabytes) after quantization.