License: arXiv.org perpetual non-exclusive license
arXiv:2602.08260v2 [eess.SP] 23 Apr 2026

Towards Optimal Semantic Communications: Reconsidering the Role of Semantic Feature Channels

Yongjeong Oh, Jihong Park, Jinho Choi, and Yo-Seb Jeon
Yongjeong Oh and Yo-Seb Jeon are with the Department of Electrical Engineering, POSTECH, Pohang, Gyeongbuk 37673, Republic of Korea (email: yongjeongoh@postech.ac.kr, yoseb.jeon@postech.ac.kr). Jihong Park is with the Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore 487372 (email: jihong_park@sutd.edu.sg). Jinho Choi is with the School of Electrical and Mechanical Engineering, The University of Adelaide, SA 5005, Australia (email: jinho.choi@adelaide.edu.au).
Abstract

This paper investigates the optimization of transmitting the encoder outputs, termed semantic features (SFs), in semantic communication (SC). We begin by modeling the entire communication process from the encoder output to the decoder input, encompassing the physical channel and all transceiver operations, as the SF channel, thereby establishing an encoder–SF channel–decoder pipeline. In contrast to prior studies that assume a fixed SF channel, we note that the SF channel is configurable, as its characteristics are shaped by various transmission and reception strategies, such as power allocation. Based on this observation, we formulate the SF channel optimization problem under a mutual information constraint between the SFs and their reconstructions, and analytically derive the optimal SF channel under a linear encoder-decoder structure and Gaussian source assumption. Building on this analysis, we propose a joint optimization framework for the encoder-decoder and SF channel applicable to both analog and digital SC systems. To realize the optimized SF channel, we also propose a physical-layer calibration strategy that enables real-time power control and adaptation to varying channel conditions. Simulation results demonstrate that the proposed SF channel optimization achieves superior task performance under various communication environments.

I Introduction

With recent advances in artificial intelligence (AI), next-generation wireless networks are anticipated to support emerging intelligent applications such as digital twins, intelligent transportation, and collaborative robotics [4, 22]. These applications often require frequent and large-scale data exchange, which places a heavy burden on existing communication systems designed for the accurate transmission of raw data. Fortunately, with AI systems increasingly equipped with perception and decision-making capabilities, the communication goals in such intelligent services are gradually changing from accurately reproducing raw data to conveying information that contributes to performing specific tasks. This shift has led to the emergence of a new communication paradigm known as semantic communication (SC), which focuses on transmitting only task-relevant information, thereby improving bandwidth efficiency and robustness against channel perturbations [5, 38, 31].

Among various implementations of SC, two representative approaches have been widely studied: deep separate source-channel coding (DeepSSCC) and deep joint source-channel coding (DeepJSCC). In DeepSSCC, the source and channel coding schemes are designed independently [15, 30]. This separation provides high design flexibility, allowing source and channel coding schemes to be freely combined. However, the decoupled design prevents joint optimization across the source and channel, thereby limiting the achievable task performance [14]. In contrast, DeepJSCC jointly designs the source and channel coding schemes [3]. This joint design principle reduces flexibility, as the encoder and decoder are typically tailored to specific source distributions and channel conditions. Nevertheless, the ability to perform end-to-end optimization often leads to superior task performance, which has driven extensive research efforts toward DeepJSCC.

A typical approach in DeepJSCC is to transform the input data at the transmitter into a latent representation, often referred to as the semantic features (SFs). These SFs are then transmitted through either analog or digital communication, depending on whether the SFs are conveyed in continuous or discrete forms. In both cases, the transmitted SFs inevitably experience degradation due to channel fading and noise. At the receiver, the decoder performs the designated task based on the received SFs. We note that the entire process from the encoder output to the decoder input, including the physical channel, modulation, power control, and other transceiver operations, can be collectively modeled as an equivalent channel, referred to as the SF channel. This abstraction allows us to interpret the DeepJSCC framework as an encoder–SF channel–decoder pipeline, as illustrated in Fig. 1, where the SF channel represents how the transmitted SFs are distorted and transformed through the communication process.

(a) Analog SC pipeline with the SF channel $p(\hat{\bm{z}}|\bm{z})$
(b) Digital SC pipeline with the SF channel $p(\hat{\bm{b}}|\bm{b})$
Figure 1: The overall analog and digital SC pipelines: Encoder–SF channel–decoder.

Over the years, extensive research has focused on encoder-decoder (enc-dec) optimization under a single fixed SF channel assumption [3, 33, 37, 34, 7]. For example, earlier works modeled the SF channel as an additive white Gaussian noise (AWGN) or Rayleigh fading channel with a fixed signal-to-noise ratio (SNR) level [3, 33]. This approach was further extended by incorporating multipath fading effects and multi-antenna systems into the SF channel [37, 34]. Also, in [7], to capture bit-level transmission characteristics of digital communication, the SF channel was characterized as a binary symmetric channel (BSC) with a fixed bit-flip probability. These studies have successfully demonstrated the effectiveness of enc-dec-centric optimization by achieving high task performance under the assumed SF channel. Nevertheless, the resulting enc-dec often suffers from significant performance degradation when the actual SF channel deviates from the trained one.

To address this challenge, several studies have attempted to enhance robustness or adaptability by training the enc-dec under multiple SF channels [39, 28, 16, 2, 21]. In [39], multiple SF channels were configured under an AWGN or Rayleigh fading channel with varying SNR levels. For digital communications, multiple SF channels were generated by sampling the bit-flip probabilities of the BSCs [28] or by changing both the SNR levels and modulation orders under an AWGN channel [16]. Further, in [2] and [21], multiple SF channels were induced by varying the number of transmitted SFs for rate-adaptive SCs. Although these studies further improve the robustness and generality of the enc-dec-centric optimization, their performance still tends to degrade under unseen communication environments. Moreover, covering all possible SF channels using the above approaches would require excessive data sampling and/or model complexity as the actual SF channel can vary by numerous factors, including antenna configurations, channel statistics, and interference/noise levels.

Unlike prior works that focus solely on enc–dec-centric optimization under fixed SF channels, this work highlights that the SF channel is configurable, as its behavior depends on various transceiver operations such as power allocation and adaptive modulation and coding [11]. This implies that the SF channel itself can be incorporated into the training process to further improve task performance. An initial attempt in this direction was made in [27], which jointly optimizes the enc–dec and a BSC-based SF channel. However, the applicability of [27] is limited to point-to-point digital SC scenarios. Moreover, [27] does not establish a theoretical connection between the SF channel and practical communication systems, as it relies on a heuristic regularization loss for optimizing the BSC-based SF channel. Table I summarizes representative studies on SF channel modeling and optimization and categorizes them into three groups: single SF channels, multiple SF channels, and trainable SF channels.

To further shed light on the potential of configuring the SF channel, this paper proposes a universal and theoretically grounded framework for jointly optimizing the enc-dec and the SF channel in both analog and digital SCs. In this framework, we focus on solving a joint optimization problem that maximizes the task performance under a limited mutual information between the transmitted and reconstructed SFs. To provide analytical evidence for the necessity of this joint optimization, we first analyze a tractable case where the source is Gaussian and the enc-dec is modeled as linear mappings. Building upon the insights from this analysis, we propose an end-to-end training strategy that jointly optimizes the non-linear deep neural network (DNN)-based enc-dec and the SF channel. We also introduce a communication strategy that realizes the trained SF channel in practical communication scenarios by controlling physical layer (PHY) parameters including transmit power and modulation level. Our framework controls the distortion of each SF by optimizing the SF channel, thereby improving the task performance of SC. Furthermore, it reduces the training overhead by decoupling the training process from the actual communication system.

TABLE I: Comparison of related works on SF channel modeling and optimization.
Category Work SF channel Channel adaptiveness or robustness Train SF channel for analog SCs Train SF channel for digital SCs Theoretical foundation for SF channel
Single SF channel [3, 33] AWGN/Rayleigh with a fixed SNR
[37] OFDM with a fixed number of multipaths
[34] MIMO with a fixed number of antennas
[7] BSC with a fixed bit-flip probability
Multiple SF channels [39] AWGN/Rayleigh with varying SNRs
[28] BSC with varying bit-flip probabilities
[16] AWGN with varying SNRs and modulation orders
[2, 21] Transmission with varying numbers of SFs
Trainable SF channel [27] Trainable BSC with a heuristic loss function
This work Trainable AWGN / Trainable BSC with a mutual information constraint

The major contributions of this paper are summarized as follows.

  • We formulate the joint optimization problem of the enc-dec and the SF channel to maximize task performance in both analog and digital SCs. In this problem, we introduce a mutual information constraint between the transmitted and reconstructed SFs to prevent convergence to a trivial error-free SF channel, while accounting for various communication constraints in an integrated manner.

  • We present an analytical study on the joint optimization between the enc-dec and the SF channel. To this end, we focus on a tractable scenario with a linear enc–dec structure and a Gaussian source, from which we derive the optimal SF channel in closed form.

  • We propose an end-to-end training strategy for jointly optimizing the DNN-based enc-dec and the SF channel under limited mutual information. In analog SCs, the SF channel is modeled as an AWGN channel, where each SF is corrupted by Gaussian noise with a learnable variance. In digital SCs, the SF channel is modeled as a set of BSCs, where each bit can be flipped with a learnable bit-flip probability. In both analog and digital SCs, the limited mutual information is addressed as a rate allocation problem, in which each SF or bit is assigned a portion of the total communication rate. The allocated rate determines the noise variance in analog SCs or the bit-flip probability in digital SCs, thereby enabling individual control over the distortion of each SF or bit.

  • We introduce a communication strategy, referred to as a PHY calibration strategy, which realizes the optimized SF channel by controlling PHY parameters. In single-user analog SCs, the proposed strategy determines the transmit power and feature-to-channel mapping so that the actual SNR matches the trained SNR. This approach can be readily extended to multi-user analog SCs. In multi-user digital SCs, it jointly adjusts the transmit powers and modulation levels across multiple users to align the actual BERs with the trained bit-flip probabilities. For both analog and digital SCs, the proposed strategy selects the most suitable SF channel among multiple candidates to adapt to varying communication environments.

  • Through simulation, we demonstrate that the proposed framework achieves superior image reconstruction quality across various mutual information limits. We also numerically verify that the trends observed in the Gaussian-source analysis carry over to SC, confirming the practical validity of the theoretical insights. Furthermore, we show that the proposed PHY calibration strategy faithfully realizes the target SF channel in actual wireless environments.

II System Model and Concept of SF Channel

In this section, we first present the analog and digital SC systems and then introduce the concept of the SF channel.

II-A System Model

We consider a typical SC model where a transmitter is connected to a receiver over a wireless network to perform an image reconstruction task. This model can be readily extended to other machine learning tasks and SC architectures. Let $\bm{x}\in\mathbb{R}^{N}$ denote the image data of length $N$. The transmitter encodes $\bm{x}$ using an encoder as follows:

$$\bm{z}=f_{\bm{\theta}_{\rm enc}}(\bm{x})\in\mathbb{R}^{M}, \tag{1}$$

where $f_{\bm{\theta}_{\rm enc}}(\cdot)$ is the encoding function parameterized by $\bm{\theta}_{\rm enc}$, and $\bm{z}$ is the SF vector of length $M$.

After encoding, the SF vector $\bm{z}$ is mapped into either analog or digital symbols, depending on whether analog or digital communication is employed.

  • Analog symbol mapping: Each SF is mean-centered and scaled as

    $$\tilde{z}_{m}^{\rm(A)}=\sqrt{p_{m}^{\rm(A)}}(z_{m}-\mu_{m}),\quad m\in\{1,\cdots,M\}, \tag{2}$$

    where $\mu_{m}$ is the mean of the $m$-th SF, and $p_{m}^{\rm(A)}$ is a power allocation coefficient satisfying $\sum_{m}\mathbb{E}[|\tilde{z}_{m}^{\rm(A)}|^{2}]\leq P_{\rm tot}$, with $P_{\rm tot}$ denoting the total power budget. Then, pairs of real-valued SFs are grouped into complex symbols as

    $$s_{u}^{\rm(A)}=\tilde{z}_{2u-1}^{\rm(A)}+j\,\tilde{z}_{2u}^{\rm(A)},\quad u\in\left\{1,\cdots,\frac{M}{2}\right\}. \tag{3}$$

  • Digital symbol mapping: The SF vector is first quantized into a bit sequence $\bm{b}\in\{0,1\}^{B}$ of length $B$ using standard quantization methods [25, 13, 24]. The bit sequence is then mapped to a symbol sequence $\tilde{\bm{z}}^{\rm(D)}\in\mathcal{C}^{T}$ of length $T$ through a digital modulation process, where $\mathcal{C}$ denotes the constellation set. Each modulated symbol is scaled as

    $$s_{t}^{\rm(D)}=\sqrt{p_{t}^{\rm(D)}}\,\tilde{z}_{t}^{\rm(D)},\quad t\in\{1,\cdots,T\}, \tag{4}$$

    where $p_{t}^{\rm(D)}$ is the transmit power allocated to the $t$-th symbol $\tilde{z}_{t}^{\rm(D)}$ and satisfies $\sum_{t}p_{t}^{\rm(D)}\leq P_{\rm tot}$ under the assumption that $\mathbb{E}[|\tilde{z}_{t}^{\rm(D)}|^{2}]=1$.

For consistency, we denote the symbol sequence length by $T$. In analog mapping, $T=M/2$ is fixed, while in digital mapping, $T$ varies depending on the modulation order. The superscripts $(\rm A)$ and $(\rm D)$ are omitted hereafter for notational simplicity.
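The analog mapping in (2) and (3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the function name `analog_symbol_mapping`, the example SF statistics, and the equal-power allocation are all assumptions chosen only to make the power constraint easy to check.

```python
import numpy as np

def analog_symbol_mapping(z, mu, p):
    """Map real SFs to complex symbols following Eqs. (2)-(3).

    z  : (M,) SF vector, M even
    mu : (M,) per-SF means
    p  : (M,) power allocation coefficients
    """
    z_tilde = np.sqrt(p) * (z - mu)              # Eq. (2): mean-center and scale
    return z_tilde[0::2] + 1j * z_tilde[1::2]    # Eq. (3): SFs (2u-1, 2u) form symbol u

rng = np.random.default_rng(0)
M, P_tot = 8, 4.0
mu = rng.normal(size=M)
var = rng.uniform(0.5, 2.0, size=M)     # hypothetical SF variances
# Illustrative allocation: E|z_tilde_m|^2 = p_m * var_m = P_tot / M for every SF
p = (P_tot / M) / var
z = mu + np.sqrt(var) * rng.normal(size=M)
s = analog_symbol_mapping(z, mu, p)     # T = M / 2 complex symbols
```

With this allocation the expected transmit power `np.sum(p * var)` equals `P_tot`, so the constraint below (2) holds with equality.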

Under a flat-fading channel, the received signal at the $t$-th channel use is expressed as

$$y_{t}=h_{t}s_{t}+n_{t}, \tag{5}$$

where $h_{t}\in\mathbb{C}$ is the channel coefficient, and $n_{t}\sim\mathcal{CN}(0,\sigma^{2})$ is AWGN with variance $\sigma^{2}$. The channel coefficient $h_{t}$ may remain constant or vary depending on the coherence time and the number of subcarriers [11]. Upon receiving the signal in (5), the receiver performs channel equalization to obtain the equalized signal at the $t$-th channel use, expressed as

$$\tilde{y}_{t}\triangleq\frac{h_{t}^{*}}{|h_{t}|^{2}}y_{t}=s_{t}+\tilde{n}_{t}, \tag{6}$$

where $\tilde{n}_{t}\sim\mathcal{CN}\big(0,\frac{\sigma^{2}}{|h_{t}|^{2}}\big)$. From the equalized signal in (6), an estimate of $z_{m}$ is obtained using either an analog or digital demapping process.

  • Analog symbol demapping: The equalized signal $\tilde{y}_{t}$ is decomposed into its in-phase and quadrature components, followed by power de-scaling and mean restoration. This standard receiver-side processing leads to the equivalent additive-noise model, given by

    $$\hat{z}_{m}=z_{m}+w_{m},\quad w_{m}\sim\mathcal{N}\!\left(0,\tfrac{\sigma^{2}}{2|h_{t}|^{2}p_{m}}\right), \tag{7}$$

    where $t=\lceil m/2\rceil$.

  • Digital symbol demapping: Symbol detection is performed on $\tilde{y}_{t}$ to recover the estimated bit sequence $\hat{\bm{b}}\in\{0,1\}^{B}$. The estimated bit sequence is then dequantized to obtain the estimated SF vector $\hat{\bm{z}}$.
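As a sanity check on the analog branch, the following Monte Carlo sketch passes one complex symbol through (5) and (6), undoes the scaling and centering, and compares the empirical noise variance of each SF with the prediction of (7). The channel coefficient, powers, means, and noise level are arbitrary illustrative values, and perfect channel knowledge at the receiver is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
num_trials, sigma2 = 200_000, 0.1
p = np.array([2.0, 0.5])        # powers of the two SFs sharing one complex symbol
mu = np.array([0.3, -1.0])      # SF means
h = 0.8 * np.exp(1j * 0.7)      # fixed flat-fading coefficient (assumed known)

z = mu + rng.normal(size=(num_trials, 2))
s = np.sqrt(p[0]) * (z[:, 0] - mu[0]) + 1j * np.sqrt(p[1]) * (z[:, 1] - mu[1])
n = np.sqrt(sigma2 / 2) * (rng.normal(size=num_trials)
                           + 1j * rng.normal(size=num_trials))
y = h * s + n                                   # Eq. (5)
y_eq = np.conj(h) / np.abs(h) ** 2 * y          # Eq. (6)

# Demapping: split I/Q, undo the power scaling, restore the mean
z_hat = np.stack([y_eq.real / np.sqrt(p[0]) + mu[0],
                  y_eq.imag / np.sqrt(p[1]) + mu[1]], axis=1)

empirical_var = (z_hat - z).var(axis=0)
predicted_var = sigma2 / (2 * np.abs(h) ** 2 * p)   # Eq. (7)
```

The two variance vectors should agree up to Monte Carlo error, which illustrates how the power coefficient $p_{m}$ directly sets the distortion of the $m$-th SF.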

Finally, the receiver reconstructs an image using a decoder as follows:

$$\hat{\bm{x}}=f_{\bm{\theta}_{\rm dec}}(\hat{\bm{z}})\in\mathbb{R}^{N}, \tag{8}$$

where $\hat{\bm{x}}$ denotes the reconstructed image, and $f_{\bm{\theta}_{\rm dec}}(\cdot)$ represents the decoding function parameterized by $\bm{\theta}_{\rm dec}$.

II-B SF Channel

Definition (SF channel): The SF channel is the equivalent channel between the encoder output and the decoder input, denoted by $p(\hat{\bm{z}}|\bm{z})$ for analog communication and by $p(\hat{\bm{b}}|\bm{b})$ for digital communication.¹

¹ In this work, we focus on digital SC systems, where the encoder output is quantized and converted into a bit sequence. Nevertheless, the system can be extended to the discrete symbol domain, as in [33], where the SF channel can be modeled as $p(\hat{\bm{s}}|\bm{s})$ with $\bm{s}$ denoting discrete modulation symbols.

For the system described in Sec. II-A, the SF channel encompasses the entire transmit-receive process, including power control and equalization. Following the definition of the SF channel, the overall SC pipeline can be represented as

$$\bm{x}\xrightarrow{\text{Enc}}\bm{z}~(\text{or}~\bm{b})\xrightarrow{\text{SF channel}}\hat{\bm{z}}~(\text{or}~\hat{\bm{b}})\xrightarrow{\text{Dec}}\hat{\bm{x}}, \tag{9}$$

which is visualized in Fig. 1.

III Motivation and Case Study

In this section, we present the motivation for joint enc-dec and SF channel optimization and provide a case study that analytically illustrates its necessity.

III-A Motivation for Joint Optimization

Our key observation is that the SF channel is configurable through various communication strategies. For instance, (7) shows that the distortion of each SF $z_{m}$ can be controlled by adjusting the corresponding power coefficient $p_{m}$. Despite this inherent configurability, most existing works focus solely on optimizing the enc-dec while keeping the SF channel fixed. This motivates us to consider the joint optimization of the enc-dec and the SF channel.

A trivial SF channel that maximizes task performance corresponds to the ideal error-free channel, i.e., $\hat{\bm{z}}=\bm{z}$. While this solution preserves all SFs without distortion, it fails to account for the task-dependent importance of individual SF components and results in an inefficient allocation of communication resources. In particular, over-allocating resources to all SFs disregards their heterogeneous contributions to task performance, leading to potentially excessive resource usage. Therefore, it is essential to impose appropriate constraints on the SF channel, ensuring operation under limited communication resources and within a non-trivial regime where $\hat{\bm{z}}\neq\bm{z}$.

To facilitate optimal SF channel design under resource constraints, we raise the following fundamental question:

Motivating Question: What is the optimal SF channel that maximizes task performance under a limited mutual information, i.e., $I(\bm{z};\hat{\bm{z}})\leq I_{\rm max}$ or $I(\bm{b};\hat{\bm{b}})\leq I_{\rm max}$?

From an information-theoretic perspective, this constraint implies that the transmitted and received SFs are not identical. Therefore, it indirectly imposes communication constraints and allows us to accommodate various communication scenarios. One representative example is the Gaussian channel, where the mutual information is determined by bandwidth, transmit power, and noise variance. By constraining the mutual information, these factors are restricted in an integrated manner. As a result, the framework captures the effects of multiple communication constraints, while avoiding excessive training overhead for the enc-dec. The mutual information limit $I_{\rm max}$ does not represent the Shannon channel capacity; rather, it quantifies how much information about $\bm{z}$ can be conveyed through the channel to $\hat{\bm{z}}$. A smaller $I_{\rm max}$ indicates more severe degradation under poor channel conditions or low transmit power, whereas a larger $I_{\rm max}$ corresponds to more reliable transmission. Since channel-induced distortions are inherently allowed, our formulation differs from the classical source-channel separation theory.

III-B Case Study: Analog SC with Linear Enc-Dec and Gaussian Input

Motivated by the above question, we present a simple analytical case study to gain insight into the joint optimization of the enc-dec and the SF channel. To this end, we consider a simplified and tractable scenario with a linear enc-dec structure and a Gaussian source, for which the joint optimization can be analytically characterized. The proposed SC framework that generalizes this insight to practical settings is presented in the subsequent sections.

Let $\bm{x}\in\mathbb{R}^{N}$ be a Gaussian source with distribution $\bm{x}\sim\mathcal{N}(\bm{0},\bm{\Sigma}_{\rm xx})$, where $\bm{\Sigma}_{\rm xx}={\rm diag}(\sigma_{{\rm x},1}^{2},\cdots,\sigma_{{\rm x},N}^{2})$ and $\sigma_{{\rm x},1}^{2}\geq\cdots\geq\sigma_{{\rm x},N}^{2}$. A linear encoder compresses $\bm{x}$ as $\bm{z}=\bm{A}\bm{x}\in\mathbb{R}^{M}$, with $M\leq N$ and $\bm{A}\in\mathbb{R}^{M\times N}$ satisfying $\bm{A}\bm{A}^{\sf T}=\bm{I}_{M}$ to constrain the encoder output power. The SF channel is assumed to add independent Gaussian noise $\bm{w}\sim\mathcal{N}(\bm{0},\bm{\Sigma}_{\rm ww})$, where $\bm{\Sigma}_{\rm ww}={\rm diag}(\sigma_{{\rm w},1}^{2},\cdots,\sigma_{{\rm w},M}^{2})$, resulting in the received signal $\hat{\bm{z}}=\bm{z}+\bm{w}$. The decoder reconstructs $\hat{\bm{x}}=\bm{B}\hat{\bm{z}}$, where $\bm{B}\in\mathbb{R}^{N\times M}$. The optimization problem is formulated as

$$({\bf P1})\quad\min_{\bm{A},\bm{B},\bm{\Sigma}_{\rm ww}}~\mathbb{E}[\|\bm{x}-\hat{\bm{x}}\|^{2}], \tag{10}$$
$$\text{s.t.}\quad I(\bm{z};\hat{\bm{z}})\leq I_{\rm max},\quad\bm{A}\bm{A}^{\sf T}=\bm{I}_{M}. \tag{11}$$

In problem ${\bf P1}$, the mutual information is given by

$$I(\bm{z};\hat{\bm{z}})=h(\hat{\bm{z}})-h(\hat{\bm{z}}|\bm{z})=\frac{1}{2}\log\left(\det\left(\bm{\Sigma}_{\rm ww}^{-1}(\bm{A}\bm{\Sigma}_{\rm xx}\bm{A}^{\sf T}+\bm{\Sigma}_{\rm ww})\right)\right). \tag{12}$$
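For concreteness, (12) can be evaluated numerically as sketched below. The helper name `gaussian_mutual_information` and the toy covariances are illustrative assumptions, and the natural logarithm is assumed so that the result is in nats.

```python
import numpy as np

def gaussian_mutual_information(A, Sigma_xx, Sigma_ww):
    """I(z; z_hat) from Eq. (12), in nats, for z = A x and z_hat = z + w."""
    Sigma_zz = A @ Sigma_xx @ A.T
    # slogdet is numerically safer than log(det(.)) when M is large
    _, logdet = np.linalg.slogdet(np.linalg.solve(Sigma_ww, Sigma_zz + Sigma_ww))
    return 0.5 * logdet

Sigma_xx = np.diag([4.0, 2.0, 1.0])
A = np.eye(2, 3)                      # encoder that keeps the first two sources
Sigma_ww = np.diag([0.5, 1.0])
mi = gaussian_mutual_information(A, Sigma_xx, Sigma_ww)

# When A Sigma_xx A^T is diagonal, Eq. (12) splits into per-SF terms
mi_per_sf = 0.5 * np.sum(np.log1p(np.array([4.0, 2.0]) / np.array([0.5, 1.0])))
```

In this toy case both expressions equal $\frac{1}{2}\log 27$, since the per-SF signal-to-noise ratios are 8 and 2.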

The objective function can be expressed as

$$\mathbb{E}[\|\bm{x}-\hat{\bm{x}}\|^{2}]=\mathbb{E}[\|\bm{x}-\bm{B}\hat{\bm{z}}\|^{2}]={\rm Tr}(\bm{\Sigma}_{\rm xx}-2\bm{B}\bm{\Sigma}_{\hat{\rm z}{\rm x}}+\bm{B}\bm{\Sigma}_{\hat{\rm z}\hat{\rm z}}\bm{B}^{\sf T}), \tag{13}$$

where $\bm{\Sigma}_{\hat{\rm z}{\rm x}}\triangleq\mathbb{E}[\hat{\bm{z}}\bm{x}^{\sf T}]=\bm{A}\bm{\Sigma}_{\rm xx}^{\sf T}$, and $\bm{\Sigma}_{\hat{\rm z}\hat{\rm z}}\triangleq\mathbb{E}[\hat{\bm{z}}\hat{\bm{z}}^{\sf T}]=\bm{A}\bm{\Sigma}_{\rm xx}\bm{A}^{\sf T}+\bm{\Sigma}_{\rm ww}$. Differentiating (13) with respect to $\bm{B}$ and setting the result to zero, we obtain the optimal $\bm{B}$ as

$$\bm{B}=\bm{\Sigma}_{{\rm x}\hat{\rm z}}\bm{\Sigma}_{\hat{\rm z}\hat{\rm z}}^{-1}=\bm{\Sigma}_{\rm xx}\bm{A}^{\sf T}(\bm{A}\bm{\Sigma}_{\rm xx}\bm{A}^{\sf T}+\bm{\Sigma}_{\rm ww})^{-1}. \tag{14}$$

Substituting (14) into (13) and applying the Woodbury matrix identity, the objective function can be rewritten as

$$\mathbb{E}[\|\bm{x}-\hat{\bm{x}}\|^{2}]={\rm Tr}((\bm{\Sigma}_{\rm xx}^{-1}+\bm{A}^{\sf T}\bm{\Sigma}_{\rm ww}^{-1}\bm{A})^{-1}), \tag{15}$$

which demonstrates the dependence of the objective on $\bm{A}$ and $\bm{\Sigma}_{\rm ww}$. However, directly differentiating it with respect to these variables does not yield a closed-form solution due to the complex trace-inverse form. To address this, we derive the solution through three steps: (i) we characterize the optimal form of $\bm{A}$, (ii) determine the optimal $\bm{\Sigma}_{\rm ww}$, and (iii) obtain the closed-form expression for the optimal $\bm{A}$.
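The equivalence between the direct form (13) with the decoder (14) and the trace-inverse form (15) can be checked numerically. The sketch below uses a randomly drawn encoder with orthonormal rows and arbitrary diagonal covariances; all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 6, 3
Sigma_xx = np.diag(np.sort(rng.uniform(0.5, 3.0, size=N))[::-1])
Sigma_ww = np.diag(rng.uniform(0.2, 1.0, size=M))

# Random encoder with orthonormal rows (A A^T = I_M), obtained from a QR factorization
Q, _ = np.linalg.qr(rng.normal(size=(N, M)))
A = Q.T

Sigma_zhx = A @ Sigma_xx                      # E[z_hat x^T]
Sigma_zhzh = A @ Sigma_xx @ A.T + Sigma_ww    # E[z_hat z_hat^T]
B = Sigma_zhx.T @ np.linalg.inv(Sigma_zhzh)   # Eq. (14)

# Eq. (13) evaluated at the optimal decoder
mse_direct = np.trace(Sigma_xx - 2 * B @ Sigma_zhx + B @ Sigma_zhzh @ B.T)
# Eq. (15): trace-inverse form after the Woodbury identity
mse_woodbury = np.trace(
    np.linalg.inv(np.linalg.inv(Sigma_xx) + A.T @ np.linalg.inv(Sigma_ww) @ A))
```

The two values agree to numerical precision for any encoder satisfying the constraint in (11), which is why the remaining steps can work with (15) alone.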

To characterize the optimal form of $\bm{A}$, let us refer to a binary matrix $\tilde{\bm{P}}$ in which every standard basis vector appears once as a column, with the remaining columns (if any) being zero vectors, as a partial permutation matrix. Then the following lemma holds.

Lemma 1

For any matrix $\bm{A}$, there exists a partial permutation matrix $\tilde{\bm{P}}$, such that

$${\rm Tr}((\bm{\Sigma}_{\rm xx}^{-1}+\bm{A}^{\sf T}\bm{\Sigma}_{\rm ww}^{-1}\bm{A})^{-1})\geq{\rm Tr}((\bm{\Sigma}_{\rm xx}^{-1}+\tilde{\bm{P}}^{\sf T}\bm{\Sigma}_{\rm ww}^{-1}\tilde{\bm{P}})^{-1}). \tag{16}$$

Proof:

See Appendix A. ∎
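Lemma 1 can be illustrated numerically on a small instance by enumerating all partial permutation matrices and comparing the best one against randomly drawn encoders with orthonormal rows. This is only an empirical illustration under assumed toy covariances, not a substitute for the proof in Appendix A; the helper names are hypothetical.

```python
import itertools
import numpy as np

def objective(A, Sigma_xx, Sigma_ww):
    """Trace-inverse objective of Eq. (15)."""
    return np.trace(
        np.linalg.inv(np.linalg.inv(Sigma_xx) + A.T @ np.linalg.inv(Sigma_ww) @ A))

def partial_permutations(M, N):
    """Yield every M x N binary matrix with one 1 per row and at most one 1 per column."""
    for cols in itertools.permutations(range(N), M):
        P = np.zeros((M, N))
        P[np.arange(M), cols] = 1.0
        yield P

rng = np.random.default_rng(3)
N, M = 5, 2
Sigma_xx = np.diag([5.0, 3.0, 2.0, 1.0, 0.5])
Sigma_ww = np.diag([0.3, 1.2])

best_perm = min(objective(P, Sigma_xx, Sigma_ww) for P in partial_permutations(M, N))

# Objective values for random encoders satisfying A A^T = I_M
random_values = []
for _ in range(200):
    Q, _ = np.linalg.qr(rng.normal(size=(N, M)))
    random_values.append(objective(Q.T, Sigma_xx, Sigma_ww))
```

In this experiment no random encoder attains a smaller objective than the best partial permutation matrix, consistent with (16).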

Based on Lemma 1, the optimal $\bm{A}$ has the form of a partial permutation matrix $\tilde{\bm{P}}$. Then, problem ${\bf P1}$ is reformulated as

$$({\bf P2})\quad\min_{\{\tilde{p}_{m,n}\}_{\forall m,n},\{\sigma_{{\rm w},m}\}_{\forall m}}~\sum_{n=1}^{N}\frac{1}{\frac{1}{\sigma_{{\rm x},n}^{2}}+\sum_{m=1}^{M}\frac{\tilde{p}_{m,n}}{\sigma_{{\rm w},m}^{2}}}, \tag{17}$$
$$\text{s.t.}\quad\frac{1}{2}\sum_{m=1}^{M}\log\left(1+\frac{[\tilde{\bm{P}}\bm{\Sigma}_{\rm xx}\tilde{\bm{P}}^{\sf T}]_{m,m}}{\sigma_{{\rm w},m}^{2}}\right)\leq I_{\rm max}, \tag{18}$$
$$\tilde{p}_{m,n}\in\{0,1\},\quad\sum_{n=1}^{N}\tilde{p}_{m,n}=1,\quad\sum_{m=1}^{M}\tilde{p}_{m,n}\in\{0,1\}, \tag{19}$$

where $\tilde{p}_{m,n}$ is the $(m,n)$-th entry of $\tilde{\bm{P}}$, and the constraints in (19) follow from the definition of $\tilde{\bm{P}}$.

Setting $\bm{A}$ as a partial permutation matrix implies that only a subset of sources is selected for transmission. Let $\mathcal{T}\subset\{1,\cdots,N\}$ denote the selected source index set with $|\mathcal{T}|=M$, and let $\phi:\mathcal{T}\rightarrow\{1,\cdots,M\}$ denote the source-channel index mapping function such that $\tilde{p}_{\phi(k),k}=1$ for $k\in\mathcal{T}$. Then, the objective function and mutual information constraint in ${\bf P2}$ can be rewritten as

$$\sum_{k\in\mathcal{T}}\frac{\sigma_{{\rm x},k}^{2}\sigma_{{\rm w},\phi(k)}^{2}}{\sigma_{{\rm x},k}^{2}+\sigma_{{\rm w},\phi(k)}^{2}}+\sum_{t\in\{1,\cdots,N\}\setminus\mathcal{T}}\sigma_{{\rm x},t}^{2}, \tag{20}$$

and

$$\frac{1}{2}\sum_{k\in\mathcal{T}}\log\left(1+\frac{\sigma_{{\rm x},k}^{2}}{\sigma_{{\rm w},\phi(k)}^{2}}\right)\leq I_{\rm max}, \tag{21}$$

respectively.

Applying the Lagrangian method to (20) and (21), the optimal noise variance is obtained as

$$\big(\sigma_{{\rm w},\phi(k)}^{2}\big)^{\star}=\begin{cases}\dfrac{\lambda^{\star}\sigma_{{\rm x},k}^{2}}{2\sigma_{{\rm x},k}^{2}-\lambda^{\star}} & \text{if }\lambda^{\star}<2\sigma_{{\rm x},k}^{2},\\ \infty & \text{if }\lambda^{\star}\geq 2\sigma_{{\rm x},k}^{2},\end{cases} \tag{22}$$

where $\lambda^{\star}$ is the optimal Lagrange multiplier satisfying $\frac{1}{2}\sum_{k\in\mathcal{T}}\log\Big(1+\frac{\sigma_{{\rm x},k}^{2}}{(\sigma_{{\rm w},\phi(k)}^{2})^{\star}}\Big)=I_{\rm max}$.
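Since $\lambda^{\star}$ is defined only implicitly, it has to be found numerically. One simple option, sketched below, is bisection on $\lambda$, exploiting the fact that the mutual information induced by (22) decreases monotonically in $\lambda$. The function name, the use of bisection, and the example values are assumptions made for illustration; the natural logarithm is assumed, so $I_{\rm max}$ is in nats.

```python
import numpy as np

def optimal_noise_variances(sigma_x2, I_max, iters=200):
    """Evaluate Eq. (22) for the selected sources, finding lambda by bisection.

    sigma_x2 : (M,) variances of the selected sources
    I_max    : mutual information limit in nats
    """
    sigma_x2 = np.asarray(sigma_x2, dtype=float)

    def noise_var(lam):
        active = lam < 2 * sigma_x2
        out = np.full_like(sigma_x2, np.inf)       # inactive SFs: infinite noise
        out[active] = lam * sigma_x2[active] / (2 * sigma_x2[active] - lam)
        return out

    def mutual_info(lam):
        # 0.5 * sum log(1 + sigma_x^2 / sigma_w^2); inactive SFs contribute zero
        return 0.5 * np.sum(np.log1p(sigma_x2 / noise_var(lam)))

    lo, hi = 0.0, 2 * sigma_x2.max()               # mutual_info(hi) = 0 <= I_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mutual_info(mid) > I_max:
            lo = mid
        else:
            hi = mid
    return noise_var(hi), hi

sigma_x2 = np.array([4.0, 2.0, 1.0, 0.25])
sigma_w2, lam = optimal_noise_variances(sigma_x2, I_max=1.5)
```

In this example the weakest source is switched off (infinite noise variance), while each active source ends up with the same distortion $\lambda^{\star}/2$, mirroring the reverse water-filling structure of Gaussian rate-distortion theory.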

Substituting (22) into (20), the objective function is represented as

$$\sum_{t=1}^{N}\sigma_{{\rm x},t}^{2}-\sum_{k\in\mathcal{A}}\sigma_{{\rm x},k}^{2}+\frac{\lambda^{\star}}{2}|\mathcal{A}|, \tag{23}$$

where $\mathcal{A}=\{k\,|\,k\in\mathcal{T},\sigma_{{\rm w},\phi(k)}^{2}<\infty\}$ is the active source index set. It should be noted that the problem of determining $\mathcal{T}$ and $\phi$ reduces to finding the optimal active set, which is characterized as follows:

Lemma 2

The optimal active set $\mathcal{A}^{\star}$ is $\{1,2,\cdots,|\mathcal{A}^{\star}|\}$, where $|\mathcal{A}^{\star}|$ is determined by $\lambda^{\star}$.

Proof:

See Appendix B. ∎

From Lemma 2, the following corollary holds:

Corollary 1

Setting $\mathcal{T}=\{1,2,\cdots,M\}$ is sufficient to determine the optimal active set $\mathcal{A}^{\star}$.

Proof:

The set $\mathcal{T}$ must contain $\mathcal{A}^{\star}$ and satisfy $|\mathcal{T}|=M$. Therefore, $\mathcal{T}$ must take the form $\mathcal{A}^{\star}\cup\mathcal{U}$, where $\mathcal{U}$ is an arbitrary subset of $\{|\mathcal{A}^{\star}|+1,\cdots,N\}$ with $|\mathcal{U}|=M-|\mathcal{A}^{\star}|$. Choosing $\mathcal{U}=\{|\mathcal{A}^{\star}|+1,\cdots,M\}$ yields $\mathcal{T}=\{1,2,\cdots,M\}$. ∎

Regarding the mapping function, since $\phi$ affects neither the objective function in (23) nor $\mathcal{A}$, the identity mapping $\phi(k)=k$ can be adopted as a sufficient choice.

The sequence of results established in (14), (22), Lemmas 1 and 2, and Corollary 1 leads to the following theorem.

Theorem 1 (Optimal Solution)

The optimal encoder, decoder, and noise covariance matrix of the SF channel in problem ${\bf P1}$ are given by

$$\bm{A}^{\star}=\big[\bm{I}_{M},~\bm{0}_{M\times(N-M)}\big], \tag{24}$$
$$\bm{B}^{\star}=\bm{\Sigma}_{\rm xx}\bm{A}^{\star{\sf T}}\big(\bm{A}^{\star}\bm{\Sigma}_{\rm xx}\bm{A}^{\star{\sf T}}+\bm{\Sigma}_{\rm ww}^{\star}\big)^{-1}, \tag{25}$$
$$\bm{\Sigma}_{\rm ww}^{\star}={\rm diag}\left(\big(\sigma_{{\rm w},1}^{2}\big)^{\star},\cdots,\big(\sigma_{{\rm w},M}^{2}\big)^{\star}\right), \tag{26}$$

where $\big(\sigma_{{\rm w},k}^{2}\big)^{\star},~k\in\{1,\cdots,M\}$, is obtained from (22) by setting $\phi(k)=k$.

Theorem 1 shows that sources with larger variances are selected for transmission and that, among the active SFs, the assigned noise variance decreases as the source variance increases.
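The pieces of Theorem 1 can be assembled for a toy source as follows. To keep the sketch short, the multiplier is fixed to an illustrative value rather than solved from $I_{\rm max}$, and the noise is represented through its precision $1/\sigma_{{\rm w},m}^{2}$ so that inactive SFs (infinite noise variance) are handled without infinities; both choices are assumptions of this example.

```python
import numpy as np

sigma_x2 = np.array([4.0, 2.0, 1.0, 0.25, 0.1])   # source variances, descending
N, M = sigma_x2.size, 4
lam = 4.0 / np.e      # illustrative multiplier; in general it is set by I_max

A = np.eye(M, N)                                   # Eq. (24)
sel = sigma_x2[:M]                                 # variances of the selected sources
active = lam < 2 * sel
# Noise precision 1 / sigma_w^2 from Eq. (22); zero precision means infinite noise
prec_w = np.where(active, (2 * sel - lam) / (lam * sel), 0.0)

# Eq. (25) for diagonal covariances: per-SF MMSE gain sigma_x^2 / (sigma_x^2 + sigma_w^2)
gain = sel * prec_w / (1.0 + sel * prec_w)
B = A.T @ np.diag(gain)

mse_15 = np.trace(np.linalg.inv(np.diag(1.0 / sigma_x2) + A.T @ np.diag(prec_w) @ A))
mse_23 = sigma_x2.sum() - sel[active].sum() + 0.5 * lam * active.sum()
```

Here `mse_15` and `mse_23` coincide, the fourth selected source is inactive, and the noise variances `1 / prec_w[active]` grow as the source variance shrinks.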

(a) MSE vs. $I_{\rm max}$
(b) MSE vs. $M$
Figure 2: MSE curves over the mutual information limit $I_{\rm max}$ and the SF vector length $M$.

To verify the effectiveness of the SF channel in Theorem 1, we conduct simulations with $N=1000$, where $\sigma_{{\rm x},n}^{2}\sim{\rm Lognormal}(0,4)$ and are sorted in descending order. We compare three schemes: (i) Proposed SFC (Theorem 1), (ii) ENVC (an equal-noise-variance channel across all SFs with the optimal enc-dec), and (iii) R-D theory (Gaussian R-D bound with $\bm{A}=\bm{B}=\bm{I}_{N}$ [9]). Fig. 2(a) shows that the proposed SFC closely follows the R-D bound, with a negligible gap for moderate $M$. Fig. 2(b) illustrates the MSE versus $M$ when $I_{\rm max}=100$, showing that the proposed SFC rapidly converges to the R-D bound, while ENVC degrades for large $M$. The major reason for this degradation is that, as $M$ increases, stronger noise is assigned to all SFs, thereby causing greater distortion to high-variance sources.
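A scaled-down version of this comparison can be sketched as follows. Several details are assumptions of the sketch and may differ from the actual simulation setup: ${\rm Lognormal}(0,4)$ is read as a log-variance of 4 (log-standard deviation 2), ENVC is taken to use the same source selection as Theorem 1 with a single shared noise variance and the MMSE decoder, mutual information is measured in nats, and $N$ and $I_{\rm max}$ are reduced for speed.

```python
import numpy as np

def bisect_decreasing(f, lo, hi, target, iters=200):
    """Return x in (lo, hi] with f(x) <= target, for a decreasing function f."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > target else (lo, mid)
    return hi

def mse_proposed(sigma_x2, M, I_max):
    """Theorem 1: keep the M largest-variance sources, noise from Eq. (22)."""
    sel = sigma_x2[:M]

    def mi(lam):  # active SFs contribute 0.5 * log(2 sigma_x^2 / lambda)
        return 0.5 * np.sum(np.log(np.maximum(2 * sel / lam, 1.0)))

    lam = bisect_decreasing(mi, 0.0, 2 * sel.max(), I_max)
    return np.sum(np.minimum(sel, lam / 2)) + np.sum(sigma_x2[M:])

def mse_equal_noise(sigma_x2, M, I_max):
    """ENVC as assumed here: one shared noise variance meeting the same limit."""
    sel = sigma_x2[:M]

    def mi(s2):
        return 0.5 * np.sum(np.log1p(sel / s2))

    s2 = bisect_decreasing(mi, 0.0, 1e12 * sel.max(), I_max)
    return np.sum(sel * s2 / (sel + s2)) + np.sum(sigma_x2[M:])

rng = np.random.default_rng(4)
N, I_max = 200, 20.0
sigma_x2 = np.sort(rng.lognormal(mean=0.0, sigma=2.0, size=N))[::-1]

rd_bound = mse_proposed(sigma_x2, N, I_max)   # M = N: classical reverse water-filling
results = {M: (mse_proposed(sigma_x2, M, I_max), mse_equal_noise(sigma_x2, M, I_max))
           for M in (10, 50, 200)}
```

For every tested $M$, the proposed allocation is no worse than the equal-noise allocation and no better than the $M=N$ bound, which matches the ordering of the curves described for Fig. 2.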

Remark 1 (Connection to R-D Theory): The R-D bound can be characterized via a test channel [9]. For a Gaussian source, the optimal test channel is an additive Gaussian channel with an appropriately chosen noise variance. However, this channel is derived under the assumption that the source dimension $N$ and the channel input dimension $M$ are identical, i.e., $M=N$. Moreover, the case $M<N$ cannot be directly inferred from the test-channel result. In contrast, our formulation starts from the more general setting with $M\leq N$. As a result, the classical test-channel result is recovered when $M=N$, while the case $M<N$ extends beyond it. Therefore, our framework can be interpreted as a generalization of the test channel, and the closeness to the R-D bound observed in Fig. 2 naturally occurs as $M$ approaches $N$.
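For reference, the Gaussian R-D bound used as a benchmark can be evaluated with the standard reverse water-filling solution for independent Gaussian components. The sketch below is not taken from the paper: it assumes independent components with strictly positive variances, measures rate in bits, and finds the water level by bisection; the function name `gaussian_rd_distortion` is hypothetical.

```python
import numpy as np

def gaussian_rd_distortion(var_x, rate_bits, iters=200):
    """Total MSE at a total rate (in bits) for independent Gaussian components.

    Reverse water-filling: D_n = min(theta, var_n), R = sum 0.5*log2(var_n / D_n).
    """
    var_x = np.asarray(var_x, dtype=float)
    lo, hi = 0.0, float(var_x.max())       # the water level lies in (0, max variance]
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        d = np.minimum(theta, var_x)       # per-component distortion
        rate = 0.5 * np.sum(np.log2(var_x / d))
        if rate > rate_bits:
            lo = theta                     # rate too high: raise the water level
        else:
            hi = theta                     # rate within budget: try a lower level
    return float(np.sum(np.minimum(hi, var_x)))

# Four unit-variance components at 4 bits in total (1 bit each).
d_total = gaussian_rd_distortion(np.ones(4), 4.0)
```

With equal variances the bound reduces to $N\sigma^2 2^{-2R/N}$, which gives a quick sanity check of the routine.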

IV Proposed Joint Enc-Dec and SF Channel Optimization for SCs

Our analysis in Sec. III provides analytical evidence that jointly optimizing the enc-dec and the SF channel can improve task performance. However, a closed-form solution is obtainable only under a simplified setting (i.e., analog SC with a linear enc–dec and a Gaussian input). In general SC scenarios, it is difficult to obtain an analytically optimal SF channel due to unknown input distributions and nonlinear DNN-based enc–dec structures. To overcome this limitation, we propose an end-to-end training strategy that leverages a data-driven approach to jointly optimize both the enc–dec and the SF channel. The high-level procedure of our strategy is illustrated in Fig. 3.

Figure 3: The proposed end-to-end training strategy jointly optimizing the enc-dec and the SF channel for analog and digital SCs. (a) End-to-end training for analog SC. (b) End-to-end training for digital SC.

IV-A End-to-End Training for Analog SC

The SF channel during training is modeled as an AWGN channel, where $\hat{\bm{z}}={\bm{z}}+{\bm{w}}$ and ${\bm{w}}\sim\mathcal{N}({\bm{0}},{\bm{\Sigma}}_{\rm ww})$. The noise covariance ${\bm{\Sigma}}_{\rm ww}$ is treated as a trainable parameter so that different SFs can experience different noise levels during optimization. As a result, SFs that are more critical to the task can be assigned lower noise variances for higher reliability, while less important SFs can be assigned higher variances to improve communication efficiency. Note that the AWGN model is employed solely during training and does not restrict the actual communication scenarios. The practical communication strategy, including power allocation, fading channels, and detection, is described in Sec. V.

Following the above strategy, the optimization problem for end-to-end training is formulated as

$$({\bf P3})\quad \min_{{\bm{\theta}}_{\rm enc},{\bm{\theta}}_{\rm dec},{\bm{\Sigma}}_{\rm ww}}~\mathbb{E}\big[\|{\bm{x}}-\hat{\bm{x}}\|^{2}\big], \tag{27}$$
$$\text{s.t.}\quad I({\bm{z}};\hat{\bm{z}})\leq I_{\rm max}. \tag{28}$$

One key challenge in solving ${\bf P3}$ is that the mutual information is difficult to compute due to the nonlinear nature of the DNN-based enc-dec and the unknown input distributions. Moreover, directly computing the mutual information would incur high computational complexity, making the optimization intractable. To address this, we adopt the mean-field assumption in [35], under which $I({\bm{z}};\hat{\bm{z}})$ is decomposed as follows:

$$I({\bm{z}};\hat{\bm{z}})=\sum_{m=1}^{M}I(z_{m};\hat{z}_{m}). \tag{29}$$

Based on this decomposition, an upper bound on the mutual information is given by

$$I({\bm{z}};\hat{\bm{z}})\leq\frac{1}{2}\sum_{m=1}^{M}\log_{2}\left(1+\frac{\sigma_{{\rm z},m}^{2}}{\sigma_{{\rm w},m}^{2}}\right), \tag{30}$$

where $\sigma_{{\rm z},m}^{2}$ is the variance of $z_{m}$, which can be empirically estimated from training samples, and the logarithm is taken to base 2 so that rates are measured in bits. From the above expression, we define the communication rate of the $m$-th SF as

$$C_{m}=\frac{1}{2}\log_{2}\left(1+\frac{\sigma_{{\rm z},m}^{2}}{\sigma_{{\rm w},m}^{2}}\right),\quad\text{s.t.}~\sum_{m=1}^{M}C_{m}=I_{\rm max}. \tag{31}$$

To find the optimal $C_{m}$ via training, we parameterize it as

$$C_{m}=\rho_{m}I_{\rm max},\quad\text{s.t.}~\sum_{m=1}^{M}\rho_{m}=1, \tag{32}$$

where $\rho_{m}\geq 0$ is a trainable parameter that determines the portion of the total rate assigned to the $m$-th SF. The constraint in (32) is directly derived from (31). With this parameterization, problem ${\bf P3}$ is reformulated as a rate allocation problem with optimization parameters $\{\rho_{m}\}_{\forall m}$.

The parameter $\rho_{m}$ can be readily implemented as

$$\rho_{m}=\frac{|v_{m}|^{2}}{\sum_{i=1}^{M}|v_{i}|^{2}}, \tag{33}$$

where $v_{m}\in\mathbb{R}$ denotes a trainable raw parameter. By the definition of $C_{m}$ in (31), the noise variance is given by

$$\sigma_{{\rm w},m}^{2}=\frac{\sigma_{{\rm z},m}^{2}}{2^{2\rho_{m}I_{\rm max}}-1}. \tag{34}$$

Then, the training for the SF channel is realized as

$$\hat{z}_{m}=z_{m}+\sigma_{{\rm w},m}\nu_{m},\quad\nu_{m}\sim\mathcal{N}(0,1),~\forall m. \tag{35}$$

Here, the noise variance of the AWGN channel plays a role similar to that of a bias term in conventional DNNs, so it can be readily optimized using standard neural network optimizers. Although the AWGN model is adopted as a convenient abstraction for training, it can be extended to more structured channel models, such as correlated Gaussian noise.

In our training, only $M$ additional parameters $\{\rho_{m}\}_{\forall m}$ are introduced. In practice, $M$ is small compared to the number of enc-dec parameters. Moreover, as shown in (33)–(35), the additional computations required for the SF channel optimization are purely element-wise operations and do not involve large-scale matrix multiplications. Therefore, the proposed method incurs only a marginal increase in computational complexity compared to conventional DeepJSCC.
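The element-wise operations in (33)–(35) can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: an actual training loop would use an automatic-differentiation framework so that the raw parameters receive gradients, the variances are estimated from a batch, and the function name `analog_sf_channel` and the example sizes are hypothetical.

```python
import numpy as np

def analog_sf_channel(z, v, i_max, rng):
    """Forward pass of (33)-(35).

    z: array (batch, M) of encoder outputs; v: raw parameters of shape (M,);
    i_max: mutual information limit in bits.
    """
    rho = v**2 / np.sum(v**2)                                   # rate shares, (33)
    var_z = z.var(axis=0)                                       # empirical sigma_z^2
    var_w = var_z / (2.0 ** (2.0 * rho * i_max) - 1.0)          # noise variances, (34)
    z_hat = z + np.sqrt(var_w) * rng.standard_normal(z.shape)   # noisy SFs, (35)
    return z_hat, rho, var_w

rng = np.random.default_rng(0)
z = rng.standard_normal((4096, 8)) * np.linspace(0.5, 2.0, 8)  # toy SFs
v = rng.standard_normal(8)                                     # toy raw parameters
z_hat, rho, var_w = analog_sf_channel(z, v, i_max=16.0, rng=rng)
rates = 0.5 * np.log2(1.0 + z.var(axis=0) / var_w)             # per-SF rates C_m
```

By construction the per-SF rates sum to $I_{\rm max}$, so the constraint in (31) holds for any value of the raw parameters.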

IV-B End-to-End Training for Digital SC

In digital SCs, the SF channel is modeled as parallel BSCs. The optimization problem is formulated as

$$({\bf P4})\quad \min_{{\bm{\theta}}_{\rm enc},{\bm{\theta}}_{\rm dec},{\bm{\mu}}}~\mathbb{E}\big[\|{\bm{x}}-\hat{\bm{x}}\|^{2}\big], \tag{36}$$
$$\text{s.t.}\quad I({\bm{b}};\hat{\bm{b}})\leq I_{\rm max}. \tag{37}$$

The remaining procedures are similar to those in Sec. IV-A. The mutual information is decomposed under the mean-field assumption, and an upper bound is obtained as

$$I({\bm{b}};\hat{\bm{b}})\leq\sum_{n=1}^{B}\big(1-H_{2}(\mu_{n})\big), \tag{38}$$

where $H_{2}(u)=-u\log_{2}u-(1-u)\log_{2}(1-u)$ for $0\leq u\leq 0.5$ is the binary entropy function. The communication rate of the $n$-th bit is defined as

$$C_{n}=1-H_{2}(\mu_{n}), \tag{39}$$

subject to the following constraints:

$$\sum_{n=1}^{B}C_{n}=I_{\rm max},\quad 0\leq C_{n}\leq 1, \tag{40}$$

where the first constraint is derived from (37), and the second constraint comes from $0\leq\mu_{n}\leq 0.5$. The rate allocation problem for digital SC is formulated by parameterizing

$$C_{n}=\rho_{n}I_{\rm max},\quad\text{s.t.}~\sum_{n=1}^{B}\rho_{n}=1,~0\leq\rho_{n}\leq\frac{1}{I_{\rm max}}. \tag{41}$$

The parameter $\rho_{n}$ can be implemented as

$$\rho_{n}=\frac{|v_{n}|^{2}+\alpha}{\sum_{i=1}^{B}\big(|v_{i}|^{2}+\alpha\big)}, \tag{42}$$

where $\alpha=\max\left(\frac{I_{\rm max}\max_{i}|v_{i}|^{2}-\sum_{i}|v_{i}|^{2}}{B-I_{\rm max}},0\right)$ [26]. From the definition of $C_{n}$ in (39), the bit-flip probability of the $n$-th BSC is given by

$$\mu_{n}=H_{2}^{-1}(1-\rho_{n}I_{\rm max})\approx\frac{1}{2}-\sqrt{a\rho_{n}I_{\rm max}-b(\rho_{n}I_{\rm max})^{2}-c(\rho_{n}I_{\rm max})^{3}}, \tag{43}$$

where $a=\frac{\ln 2}{2}$, $b=\frac{(\ln 2)^{2}}{6}$, and $c=a-b-\frac{1}{4}$, with $\ln$ denoting the natural logarithm. The approximation is used since $H_{2}^{-1}(\cdot)$ has no closed-form expression; it is obtained by performing a Taylor expansion of $H_{2}(u)$ around $u=0.5$, followed by series reversion. The coefficient $c$ makes the approximation exact at $\rho_{n}I_{\rm max}=1$, where $\mu_{n}=0$.
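The parameterization in (42) and the approximation in (43) can be checked numerically with a short sketch. The helper names (`binary_entropy`, `digital_rate_shares`, `flip_probability`) and the example values are hypothetical; the code assumes $I_{\rm max}<B$ so that the offset $\alpha$ is well defined, and it uses natural logarithms for $a$ and $b$.

```python
import numpy as np

def binary_entropy(u):
    """H2(u) in bits, with clipping to avoid log(0)."""
    u = np.clip(u, 1e-12, 1 - 1e-12)
    return -u * np.log2(u) - (1 - u) * np.log2(1 - u)

def digital_rate_shares(v, i_max):
    """Shares rho_n of (42): they sum to 1 and satisfy rho_n <= 1/i_max."""
    s = v**2
    alpha = max((i_max * s.max() - s.sum()) / (s.size - i_max), 0.0)
    return (s + alpha) / np.sum(s + alpha)

def flip_probability(rate):
    """Approximation (43) of H2^{-1}(1 - rate) for 0 <= rate <= 1."""
    a = np.log(2) / 2
    b = np.log(2) ** 2 / 6
    c = a - b - 0.25
    inner = a * rate - b * rate**2 - c * rate**3
    # Clipping guards against tiny negative values caused by rounding.
    return np.clip(0.5 - np.sqrt(np.clip(inner, 0.0, None)), 0.0, 0.5)

v = np.array([3.0, 0.1, 0.1, 0.1])      # toy raw parameters, B = 4
i_max = 2.0
rho = digital_rate_shares(v, i_max)     # the dominant bit is capped at 1/i_max
mu = flip_probability(rho * i_max)
rate_error = np.abs(1 - binary_entropy(mu) - rho * i_max)
```

In this toy case the first bit would otherwise take almost the whole budget, so the offset $\alpha$ caps its share at $1/I_{\rm max}$ and the rest is spread over the remaining bits.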

Training is realized under the relaxed BSC model, given by

$$\hat{b}_{n}=\frac{(2b_{n}-1)\tilde{e}_{n}+1}{2}\in[0,1], \tag{44}$$

where

$$\tilde{e}_{n}=-\tanh\!\left(\frac{1}{\tau}\Big(\log\frac{\mu_{n}}{1-\mu_{n}}+\log\frac{u_{n}}{1-u_{n}}\Big)\right), \tag{45}$$

$u_{n}\sim\mathcal{U}(0,1)$ is a random variable, and $\tau$ is a temperature parameter [27]. The relaxation is used to compute the gradient of a given loss function with respect to $\mu_{n}$. Consequently, $\mu_{n}$ (or $\rho_{n}$) is jointly optimized with the enc-dec. Similar to the analog case, digital SCs also introduce only $B$ additional parameters, and the associated computations are purely element-wise. Therefore, the resulting increase in computational complexity is marginal.
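The relaxed BSC in (44) and (45) can be illustrated with a forward-pass sketch. The function name `relaxed_bsc` and the parameter values are hypothetical, and NumPy is used only to show the sampling behavior; in training the same expressions would be written in an automatic-differentiation framework. For a small temperature, the relaxed output rounds to a flipped bit with probability close to $\mu_{n}$, since a flip occurs when $u_{n}>1-\mu_{n}$.

```python
import numpy as np

def relaxed_bsc(bits, mu, tau, rng):
    """Relaxed BSC of (44)-(45); bits in {0, 1}, mu in (0, 0.5), tau > 0."""
    u = rng.uniform(1e-12, 1 - 1e-12, size=bits.shape)   # avoid log(0)
    logit = lambda p: np.log(p) - np.log1p(-p)
    # e is close to +1 (bit kept) or -1 (bit flipped) when tau is small, (45).
    e = -np.tanh((logit(mu) + logit(u)) / tau)
    return ((2 * bits - 1) * e + 1) / 2                  # soft bit in [0, 1], (44)

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=200_000)
soft = relaxed_bsc(bits, mu=0.1, tau=0.01, rng=rng)
flip_rate = np.mean(np.round(soft) != bits)              # empirical flip rate
```

A larger temperature gives smoother outputs and better-behaved gradients at the cost of a looser match to the hard BSC.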

Remark 2 (Adaptation to Various Communication Environments): Recall that, in Sec. III, we have discussed the trade-off between the mutual information limit $I_{\rm max}$ and the MSE. This naturally extends to SCs as a trade-off between $I_{\rm max}$ and the task performance, as demonstrated in Sec. VI. To handle various communication environments, multiple enc-dec and SF channel pairs can be trained under different mutual information limits. In Sec. V, we introduce a communication strategy that adaptively selects an appropriate SF channel for a given communication environment.

Remark 3 (Comparison to Prior Work in [27]): A similar approach was also studied in our prior work [27], where a BSC-based SF channel was optimized via end-to-end training. However, the optimization relied on a heuristically designed loss function, rather than capturing or constraining the mutual information of the SF channel. Consequently, [27] did not establish a theoretical connection between the SF channel and practical communication systems. Moreover, its validation was restricted to digital SC, raising concerns about its scalability to other forms of SC scenarios, e.g., analog SC. The advantage of our mutual-information-constrained approach over the heuristic approach in [27] will be further discussed in Sec. VI.

V Proposed PHY Calibration for Realizing the Trained SF Channel

The training framework in Sec. IV produces the optimized SF channel by imposing a mutual information constraint, which captures the effects of various communication constraints in an integrated manner during training. However, this abstraction does not directly guarantee that the trained SF channel can be realized under practical communication settings because it does not explicitly account for communication constraints such as total transmit power. To address this issue, the communication parameters must be calibrated so that the SF channel observed during transmission aligns with the optimally trained one while satisfying communication constraints. We refer to this process as PHY calibration. In this section, we present PHY calibration strategies for two communication settings: (i) single-user analog SCs and (ii) multi-user digital SCs.

V-A Single-User Analog SCs

Consider the SF channels trained for different mutual information limits $\{I_{\rm max}^{(u)}\}_{u=1}^{U}$, satisfying $I_{\rm max}^{(1)}>\cdots>I_{\rm max}^{(U)}$, as discussed in Remark 2. The corresponding losses $\{L^{(u)}\}_{u=1}^{U}$ follow $L^{(1)}<\cdots<L^{(U)}$. Our objective for PHY calibration is to jointly select a proper SF channel and the transmit power. The optimization problem is formulated as

$$({\bf P5})\quad \min_{\{p_{m}\}_{\forall m},\,u}~(1-w_{0})L^{(u)}+w_{0}\sum_{m=1}^{M}p_{m}\sigma_{{\rm z},m}^{2}, \tag{46}$$
$$\text{s.t.}\quad \frac{2|h_{t}|^{2}p_{m}\sigma_{{\rm z},m}^{2}}{\sigma^{2}}\geq\overline{\rm SNR}_{m}^{(u)},~\forall m,t,\quad\sum_{m=1}^{M}p_{m}\sigma_{{\rm z},m}^{2}\leq P_{\rm tot}, \tag{47}$$

where $p_{m}\sigma_{{\rm z},m}^{2}$ represents the average transmit power used for sending the $m$-th SF, and $w_{0}\in[0,1]$ controls the trade-off between the task loss and the total transmit power. The target SNR of the $m$-th SF in the $u$-th SF channel, denoted by $\overline{\rm SNR}_{m}^{(u)}$, is defined as

$$\overline{\rm SNR}_{m}^{(u)}\triangleq\frac{\sigma_{{\rm z},m}^{2}}{\big(\sigma_{{\rm w},m}^{(u)}\big)^{2}}, \tag{48}$$

where $\big(\sigma_{{\rm w},m}^{(u)}\big)^{2}$ is the trained noise variance of the $m$-th SF in the $u$-th SF channel. In the first constraint, $\frac{2|h_{t}|^{2}p_{m}\sigma_{{\rm z},m}^{2}}{\sigma^{2}}$ represents the actual SNR of $z_{m}$ during transmission. This constraint ensures alignment between the target and actual SNRs, thereby improving the reliability of task performance.

To solve problem ${\bf P5}$, an auxiliary variable is precomputed as

$$\tau_{m}^{(u)}=\frac{\overline{\rm SNR}_{m}^{(u)}}{2\sigma_{{\rm z},m}^{2}},~\forall m,u. \tag{49}$$

For each $u$, $\tau_{m}^{(u)}$ is sorted in descending order with respect to $m$ in advance. When communication begins, the channel-gain-to-noise-power ratios $\tfrac{|h_{t}|^{2}}{\sigma^{2}}$ are also sorted in descending order with respect to $t$. Here, the indices $m$ and $t$ are retained after sorting for notational simplicity. The required power coefficient is then computed as

$$\bar{p}_{m}^{(u)}=\frac{\tau_{m}^{(u)}\sigma^{2}}{|h_{t}|^{2}},\quad t=\left\lceil\frac{m}{2}\right\rceil. \tag{50}$$

The sorting above assigns SFs with higher $\tau_{m}^{(u)}$ to stronger channels, thereby reducing the total transmit power. After obtaining $\bar{p}_{m}^{(u)}$, the optimal SF channel index is determined as

$$u^{\star}=\operatorname*{argmin}_{u:\,P_{\rm req}^{(u)}\leq P_{\rm tot}}\Big\{(1-w_{0})L^{(u)}+w_{0}P_{\rm req}^{(u)}\Big\}, \tag{51}$$

where $P_{\rm req}^{(u)}=\sum_{m=1}^{M}\bar{p}_{m}^{(u)}\sigma_{{\rm z},m}^{2}$. The optimal power coefficient is given by $\bar{p}_{m}^{(u^{\star})}$.
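The calibration steps in (49)–(51) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `calibrate_analog`, the zero-based indexing of $u$, and the toy inputs are assumptions, and the mapping of the $m$-th sorted SF to the $\lceil m/2\rceil$-th sorted channel use follows the factor of 2 in the SNR constraint (two real-valued SFs per channel use).

```python
import numpy as np

def calibrate_analog(target_snr, var_z, loss, cnr, p_tot, w0):
    """Sketch of (49)-(51).

    target_snr: (U, M) trained target SNRs; var_z: (M,) SF variances;
    loss: (U,) task losses; cnr: (T,) ratios |h_t|^2 / sigma^2, T >= ceil(M/2).
    Returns (cost, u, p) with a zero-based index u, or None if nothing is feasible.
    """
    U, M = target_snr.shape
    cnr_sorted = np.sort(cnr)[::-1]               # strongest channel first
    t_idx = np.arange(M) // 2                     # zero-based form of t = ceil(m/2)
    best = None
    for u in range(U):
        tau = target_snr[u] / (2.0 * var_z)       # auxiliary variable, (49)
        order = np.argsort(-tau)                  # SFs by descending tau
        p = np.empty(M)
        p[order] = tau[order] / cnr_sorted[t_idx] # required coefficients, (50)
        p_req = float(np.sum(p * var_z))          # required total power
        cost = (1 - w0) * loss[u] + w0 * p_req
        if p_req <= p_tot and (best is None or cost < best[0]):
            best = (cost, u, p)                   # selection rule, (51)
    return best

snr = np.array([[8.0, 4.0], [2.0, 1.0]])          # toy targets for U = 2, M = 2
args = (snr, np.ones(2), np.array([0.1, 0.4]), np.array([1.0]))
tight = calibrate_analog(*args, p_tot=5.0, w0=0.01)
loose = calibrate_analog(*args, p_tot=10.0, w0=0.01)
```

With the tight budget only the lower-rate SF channel is feasible, while the looser budget admits the higher-rate channel with the smaller loss.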

The proposed PHY calibration for analog SC has several notable features. First, since $\tau_{m}^{(u)}$ is pre-shared between the transmitter and the receiver, the optimal power coefficient and SF channel can be computed locally once $\tfrac{|h_{t}|^{2}}{\sigma^{2}}$ is obtained. Therefore, no additional communication overhead is required other than sharing $\tfrac{|h_{t}|^{2}}{\sigma^{2}}$ (see Footnote 2) for reconstructing $\hat{\bm{z}}$ and $\hat{\bm{x}}$. Second, the proposed method incurs very low computational complexity, as the optimal transmit power coefficients are obtained in closed form and require only simple arithmetic operations for each SF. Finally, the method can be readily extended to an interference-free multi-user scenario, in which each user independently adjusts its transmit power based on its own trained target SNRs. In our PHY calibration strategy, only a total power constraint is imposed. Nevertheless, the framework can be extended to various practical constraints. For example, per-time-slot power constraints can be handled via power clipping.

Footnote 2: The channel-gain-to-noise-power ratio $\tfrac{|h_{t}|^{2}}{\sigma^{2}}$ can be estimated using standard pilot-based techniques or feedback mechanisms [11]. When the channel coherence time is sufficiently large, only a small number of ratios need to be estimated or fed back, resulting in marginal communication overhead.

V-B Multi-User Digital SCs

We consider a multi-user digital SC where $K$ users transmit different images to a single base station (BS). The channels of all users are assumed to be independent and remain constant during the transmission of all symbols. For the $k$-th user, the SF channels trained for different mutual information limits $\{I_{{\rm max},k}^{(u_{k})}\}_{u_{k}=1}^{U_{k}}$, satisfying $I_{{\rm max},k}^{(1)}>\cdots>I_{{\rm max},k}^{(U_{k})}$, are given. The corresponding losses $\{L_{k}^{(u_{k})}\}_{u_{k}=1}^{U_{k}}$ follow $L_{k}^{(1)}<\cdots<L_{k}^{(U_{k})}$. Our objective for PHY calibration is to jointly determine a proper SF channel, the transmit power, and the modulation levels. The optimization problem is formulated as

$$({\bf P6})\quad \min_{\{\{p_{t,k}\}_{\forall t},\,u_{k},\,m_{k}\}_{\forall k}}~\sum_{k=1}^{K}w_{k}L_{k}^{(u_{k})}+w_{0}\sum_{k=1}^{K}\sum_{t=1}^{T_{k}}p_{t,k} \tag{52}$$
$$\text{s.t.}\quad \bar{\mu}_{n,k}^{(u_{k})}\geq{\rm BER}\left(p_{t,k},m_{k},\frac{|h_{k}|^{2}}{\sigma^{2}}\right),~\forall k,~n\in\{1,\cdots,B_{k}\}, \tag{53}$$
$$\sum_{t=1}^{T_{k}}p_{t,k}\leq P_{\rm tot}^{(k)},~\forall k,\quad\sum_{k=1}^{K}T_{k}\leq T,\quad m_{k}\in\{2,4,6,\cdots\},~\forall k, \tag{54}$$

where $p_{t,k}$ is the transmit power for the $t$-th symbol, $m_{k}$ is the modulation level, $B_{k}$ is the number of transmitted bits, and $T_{k}=B_{k}/m_{k}$ is the corresponding symbol sequence length for the $k$-th user. The weighting factors $w_{0}$ and $w_{k}$ control the trade-off between the total power consumption and the task performance of each user. In the first constraint, $\bar{\mu}_{n,k}^{(u_{k})}$ denotes the trained (target) bit-flip probability of the $n$-th bit in the $u_{k}$-th SF channel. The $n$-th bit is transmitted within the $t$-th symbol, where $t=\lceil n/m_{k}\rceil$. The BER for this bit is defined as

$${\rm BER}\left(p_{t,k},m_{k},\frac{|h_{k}|^{2}}{\sigma^{2}}\right)\triangleq a(m_{k})\,{\rm erfc}\Bigg(\sqrt{\frac{c(m_{k})\,p_{t,k}|h_{k}|^{2}}{\sigma^{2}}}\Bigg)+b(m_{k})\,{\rm erfc}\Bigg(3\sqrt{\frac{c(m_{k})\,p_{t,k}|h_{k}|^{2}}{\sigma^{2}}}\Bigg), \tag{55}$$

where $h_{k}\in\mathbb{C}$ is the channel coefficient of the $k$-th user, $a(m_{k})=\frac{\sqrt{2^{m_{k}}}-1}{\sqrt{2^{m_{k}}}\log_{2}\sqrt{2^{m_{k}}}}$, $b(m_{k})=\frac{\sqrt{2^{m_{k}}}-2}{\sqrt{2^{m_{k}}}\log_{2}\sqrt{2^{m_{k}}}}$, and $c(m_{k})=\frac{3}{2(2^{m_{k}}-1)}$ [6]. The second constraint limits the total power budget of each user. The third constraint guarantees that the total number of channel uses across all users does not exceed $T$, and the fourth constraint specifies the candidate modulation levels.
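The BER expression in (55) is straightforward to evaluate with the complementary error function from the Python standard library. The sketch below uses a hypothetical function name `qam_ber` and takes the channel-gain-to-noise-power ratio as a single argument `cnr`. For $m_{k}=2$ the coefficients reduce to $a=\tfrac{1}{2}$, $b=0$, and $c=\tfrac{1}{2}$, which gives a convenient sanity check.

```python
import math

def qam_ber(p, m, cnr):
    """BER approximation of (55) for 2^m-QAM; cnr = |h_k|^2 / sigma^2."""
    root = math.sqrt(2.0 ** m)                       # sqrt(2^m)
    a = (root - 1) / (root * math.log2(root))
    b = (root - 2) / (root * math.log2(root))
    c = 3.0 / (2.0 * (2.0 ** m - 1))
    x = math.sqrt(c * p * cnr)
    return a * math.erfc(x) + b * math.erfc(3 * x)

ber_4qam = qam_ber(4.0, 2, 1.0)    # 4-QAM at unit channel-gain-to-noise ratio
ber_16qam = qam_ber(4.0, 4, 1.0)   # 16-QAM at the same power
```

At equal transmit power, the higher modulation level yields the larger BER, which is the trade-off that the choice of $m_{k}$ in ${\bf P6}$ balances against the number of channel uses.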

To solve problem ${\bf P6}$, we first sort $\bar{\mu}_{n,k}^{(u_{k})}$ in descending order with respect to $n$ in advance, where the index $n$ is retained for notational simplicity. The sorted bit-flip probabilities are grouped by every $m_{k}$ bits, and the minimum value within each group is defined as

$$\bar{\mu}_{t,k}^{(u_{k},m_{k})}=\min_{n\in\{(t-1)m_{k}+1,\cdots,tm_{k}\}}\big\{\bar{\mu}_{n,k}^{(u_{k})}\big\}, \tag{56}$$

for $t\in\{1,\cdots,t_{k}(m_{k})\}$, where $t_{k}(m_{k})\triangleq B_{k}/m_{k}$. The sorting above groups bits with similar bit-flip probabilities. This helps reduce the total transmit power because the transmit power of each symbol is determined by the minimum bit-flip probability within its group, as described below. Given $m_{k}$ and $\bar{\mu}_{t,k}^{(u_{k},m_{k})}$, an auxiliary variable is precomputed as

$$\gamma_{t,k}^{(u_{k},m_{k})}=\min\big\{p:\bar{\mu}_{t,k}^{(u_{k},m_{k})}\geq{\rm BER}(p,m_{k},1)\big\}, \tag{57}$$

for all $(t,k,u_{k},m_{k})$, assuming $|h_{k}|^{2}/\sigma^{2}=1$. When communication begins, the actual channel-gain-to-noise-power ratio $\frac{|h_{k}|^{2}}{\sigma^{2}}$ is used to determine the required transmit power as

$$\bar{p}_{t,k}^{(u_{k},m_{k})}=\frac{\gamma_{t,k}^{(u_{k},m_{k})}\sigma^{2}}{|h_{k}|^{2}}. \tag{58}$$
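The grouping in (56), the precomputation in (57), and the scaling in (58) can be sketched as below. The paper does not state how the minimum in (57) is found, so bisection is an assumption here, motivated by the BER in (55) being decreasing in the transmit power. The helper names are hypothetical, the target bit-flip probabilities are assumed to be strictly positive, and the number of bits is assumed to be a multiple of the modulation level.

```python
import math

def qam_ber(p, m, cnr):
    """BER approximation of (55) for 2^m-QAM; cnr = |h_k|^2 / sigma^2."""
    root = math.sqrt(2.0 ** m)
    a = (root - 1) / (root * math.log2(root))
    b = (root - 2) / (root * math.log2(root))
    c = 3.0 / (2.0 * (2.0 ** m - 1))
    x = math.sqrt(c * p * cnr)
    return a * math.erfc(x) + b * math.erfc(3 * x)

def unit_cnr_power(mu_target, m, tol=1e-9):
    """Smallest p with BER(p, m, 1) <= mu_target, as in (57), by bisection."""
    if qam_ber(0.0, m, 1.0) <= mu_target:
        return 0.0
    lo, hi = 0.0, 1.0
    while qam_ber(hi, m, 1.0) > mu_target:    # grow the bracket until feasible
        hi *= 2.0
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        if qam_ber(mid, m, 1.0) > mu_target:
            lo = mid
        else:
            hi = mid
    return hi

def required_powers(mu_bits, m, cnr):
    """Per-symbol powers: sort, group every m bits, take the minimum (56), scale (58)."""
    mu_sorted = sorted(mu_bits, reverse=True)
    groups = [mu_sorted[i:i + m] for i in range(0, len(mu_sorted), m)]
    return [unit_cnr_power(min(g), m) / cnr for g in groups]

targets = [0.1, 0.01, 0.1, 0.01]              # toy target bit-flip probabilities
powers = required_powers(targets, m=2, cnr=2.0)
```

Because the BER depends on the power only through the product of power and channel-gain-to-noise-power ratio, the unit-ratio values can be precomputed offline and rescaled once the actual ratio is known.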

Under the total power constraint, the feasible set for the $k$-th user is defined as

$$\Omega_{k}=\big\{(u_{k},m_{k}):P_{{\rm req},k}^{(u_{k},m_{k})}\leq P_{\rm tot}^{(k)}\big\}, \tag{59}$$

where $P_{{\rm req},k}^{(u_{k},m_{k})}=\sum_{t=1}^{T_{k}}\bar{p}_{t,k}^{(u_{k},m_{k})}$. For each feasible pair $(u_{k},m_{k})\in\Omega_{k}$, the corresponding objective value is given by

$$J_{k}(u_{k},m_{k})=w_{k}L_{k}^{(u_{k})}+w_{0}P_{{\rm req},k}^{(u_{k},m_{k})}. \tag{60}$$

For notational convenience, we redefine

$$t_{k,j}=t_{k}(m_{k}),\quad J_{k,j}=J_{k}(u_{k},m_{k}), \tag{61}$$

where $j\in\{1,\dots,|\Omega_{k}|\}$ indexes each feasible pair $(u_{k},m_{k})\in\Omega_{k}$. Then, problem ${\bf P6}$ can be reformulated as

$$({\bf P6}^{\prime})\quad \min_{\{x_{k,j}\}_{\forall k,j}}~\sum_{k=1}^{K}\sum_{j=1}^{|\Omega_{k}|}J_{k,j}x_{k,j} \tag{62}$$
$$\text{s.t.}\quad \sum_{j=1}^{|\Omega_{k}|}x_{k,j}=1,~\forall k,\quad x_{k,j}\in\{0,1\},\quad\sum_{k=1}^{K}\sum_{j=1}^{|\Omega_{k}|}t_{k,j}x_{k,j}\leq T, \tag{63}$$

where the first two constraints ensure that exactly one candidate is selected from the feasible set $\Omega_{k}$ for the $k$-th user. The third constraint corresponds to the total channel-use constraint in (54). We note that problem ${\bf P6}^{\prime}$ is a conventional multiple-choice knapsack problem. This is a well-studied combinatorial optimization problem, and many efficient solvers have been developed [17]. From a computational complexity perspective, the worst-case approach is exhaustive search, which evaluates all combinations across the candidate sets for each user. In practice, however, the number of candidate SF channels and modulation levels per user is small, resulting in moderate computational cost.
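For small candidate sets, the exhaustive search over ${\bf P6}^{\prime}$ can be written in a few lines. The sketch below is illustrative only: the function name `solve_mckp` and the toy costs and symbol lengths are hypothetical, and a dedicated multiple-choice knapsack solver would be preferable when the number of users or candidates grows.

```python
from itertools import product

def solve_mckp(costs, lengths, t_max):
    """Exhaustive search for P6': choose one candidate j for every user k.

    costs[k][j] = J_{k,j}, lengths[k][j] = t_{k,j}.
    Returns (best_cost, choice) with zero-based indices, or None if infeasible.
    """
    best = None
    for choice in product(*(range(len(c)) for c in costs)):
        total_t = sum(lengths[k][j] for k, j in enumerate(choice))
        if total_t > t_max:                      # channel-use constraint in (63)
            continue
        total_j = sum(costs[k][j] for k, j in enumerate(choice))
        if best is None or total_j < best[0]:
            best = (total_j, choice)
    return best

costs = [[1.0, 0.6], [2.0, 1.5, 1.1]]            # toy objective values, 2 users
lengths = [[4, 8], [3, 6, 12]]                   # toy symbol sequence lengths
result = solve_mckp(costs, lengths, t_max=14)
```

In this toy instance the cheapest candidate of the second user is excluded by the channel-use budget, so the search settles on the best combination that fits within $T$.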

In the proposed PHY calibration for multi-user digital SC, the optimal SF channel index and modulation level $(u_{k}^{\star},m_{k}^{\star})$ are first determined at the BS by solving ${\bf P6}^{\prime}$. The BS then transmits $(u_{k}^{\star},m_{k}^{\star})$ and $\frac{|h_{k}|^{2}}{\sigma^{2}}$ to each user. Upon receiving them, each user computes the optimal transmit power as $\bar{p}_{t,k}^{\star}=\frac{\gamma_{t,k}^{(u_{k}^{\star},m_{k}^{\star})}\sigma^{2}}{|h_{k}|^{2}}$, which can also be computed at the BS. Therefore, only a small amount of information needs to be exchanged.

VI Simulation Results

In this section, we demonstrate the superiority of the proposed SF channel in SCs, using the MNIST [20], CIFAR-10 [19], and STL-10 [8] datasets. Unless otherwise stated, the enc-dec architecture follows the same configuration as in [27], except that the activation function of the last encoder layer is replaced with a sigmoid. The MSE loss is used when evaluating the PSNR, and the SSIM loss is used when evaluating the SSIM [33]. For MNIST and CIFAR-10, the number of training epochs is set to 50 for PSNR and 20 for SSIM, while 100 epochs are used for STL-10. The batch size is fixed to $64$ for all datasets, and the Adam optimizer [18] is employed with an initial learning rate of $10^{-4}$.

For performance comparison of analog SCs, we consider the following baselines.

  • DeepJSCC-A (Proposed SFC): This framework integrates the proposed SF channel (SFC) optimization into the analog DeepJSCC framework of [3].

  • DeepJSCC-A (ENVC) [3]: This framework corresponds to the original analog DeepJSCC of [3] without any SFC optimization. The SF channel is modeled as an equal-noise-variance channel (ENVC), in which all SFs are corrupted by Gaussian noise with the same variance.

  • DeepJSCC-A (ERC): This variant modifies the conventional DeepJSCC by explicitly imposing an equal-rate constraint across all SFs. Specifically, the noise variance of the $m$-th SF is adjusted so that its communication rate satisfies $C_{m}=\frac{I_{\rm max}}{M}$.

  • DeepJSCC-A (IB) [32] / DeepJSCC-A (IB-SA) [23]: Both baselines train the encoder–decoder using an information bottleneck (IB)-based loss function. During communication, the baseline in [34] evaluates the robustness of SFs to noise and allocates SFs with lower robustness to stronger subchannels.

For performance comparison of digital SCs, we consider the following baselines.

  • DeepJSCC-D (Proposed SFC): This framework incorporates the proposed SF channel optimization into the digital DeepJSCC of [7].

  • DeepJSCC-D (ENVC = ERC) [7]: This framework can be regarded as a quantized version of DeepJSCC-A (ENVC), extending the one-bit quantization process in [7] to a multi-bit representation. For training, it adopts multiple BSCs with an equal bit-flip probability applied to all bits, resulting in equal rate allocation.

  • BlindSC [27]: This framework corresponds to the digital SC framework in [27]. All bit-flip probabilities are initialized equally to satisfy the mutual information limit $I_{\rm max}$, and the regularization weight is tuned so that the constraint is maintained at the final training epoch.

All digital SC frameworks use an 8-bit uniform quantizer for the encoder output.

Figure 4: PSNR curves over the mutual information limit $I_{\rm max}$ and the SF vector length $M$ for analog SCs on the MNIST dataset. (a) PSNR vs. $I_{\rm max}$ (Analog SC). (b) PSNR vs. $M$ (Analog SC).

Fig. 4 shows the PSNR performance of analog SCs on the MNIST dataset for different values of the mutual information limit $I_{\rm max}$ and the SF vector length $M$. In Fig. 4(a), $M$ is fixed to $392$ (corresponding to $N/M=2$), while in Fig. 4(b), $I_{\rm max}$ is fixed to $784$. Similar to the Gaussian case, Fig. 4(a) shows that the proposed SFC consistently achieves the highest PSNR across all values of $I_{\rm max}$. This indicates that the proposed SFC utilizes the available mutual information more effectively than the baselines by optimizing the SF channel. In Fig. 4(b), when $M$ is small, all schemes yield relatively low PSNR due to strong compression. However, as $M$ increases, the PSNR of the proposed SFC gradually improves and eventually converges. This is because a larger $M$ preserves more information from the input data, but the gains diminish due to the limited mutual information. In contrast, the ENVC, ERC, and IB baselines initially show an increase in PSNR but begin to degrade as $M$ becomes large. This degradation occurs because increasing $M$ forces stronger noise to be assigned to all SFs, thereby distorting even the task-critical SFs.

Figure 5: PSNR curves over the mutual information limit $I_{\rm max}$ and the bit sequence length $B$ for digital SCs on the CIFAR-10 dataset. (a) PSNR vs. $I_{\rm max}$ (Digital SC). (b) PSNR vs. $B$ (Digital SC).

Fig. 5 shows the PSNR performance of digital SCs on the CIFAR-10 dataset for different values of the mutual information limit $I_{\rm max}$ and the bit sequence length $B$. The enc-dec architecture follows the Swin Transformer-based SwinJSCC in [36]. In Fig. 5(a), $B$ is fixed to $12288$ (corresponding to $8N/B=2$), while in Fig. 5(b), $I_{\rm max}$ is fixed to $3072$. In line with the Gaussian and analog SC results, Fig. 5(a) shows that the proposed SFC consistently outperforms the other baselines over the entire range of $I_{\rm max}$. In Fig. 5(b), when $B\leq 3072$, the bit sequence length $B$ is smaller than or equal to $I_{\rm max}$. In this case, the communication becomes error-free, and all schemes achieve identical PSNR values. Meanwhile, the comparison with BlindSC demonstrates that the proposed SFC achieves superior performance by leveraging an information-theoretic optimization instead of a heuristic loss design.

Figure 6: PSNR curves over the SNR for single-user analog and digital SCs on the MNIST dataset. (a) Analog SC. (b) Digital SC.

Fig. 6 shows the PSNR performance of single-user analog and digital SCs on the MNIST dataset for different values of SNR. In this simulation, we set $P_{\rm tot}=10^{4}$ and $w_{0}\ll 1$. For analog SC, $I_{\rm max}^{(u)}=392u,~u\in\{1,2,3,4\}$. For digital SC, $I_{\rm max}^{(v)}=392v,~v\in\{1,3,5,7\}$, and 4-QAM is used. For both SCs, the transmission is performed over $\lceil T/10\rceil$ Rayleigh fading subchannels, each spanning 10 channel uses. For fair comparison, all schemes, except for IB-SA, follow the PHY calibration strategy in Sec. V-A with their respective target SNRs or BERs. For IB-SA, since there is no criterion to select the enc-dec pair for a given SNR, we evaluate multiple enc-dec pairs and report the best performance at each SNR. The results show that the proposed SFC consistently achieves the highest PSNR across all SNR regimes. Notably, the performance trend observed here aligns well with Figs. 4 and 5. This consistency demonstrates that the optimized SF channel trained under the mutual information constraint can be faithfully realized in practical wireless environments through the proposed PHY calibration strategy. In other words, even though the training of the SF channel is performed in an abstract mutual-information domain, its performance advantage transfers to real physical channels once the PHY calibration is applied.

Figure 7: SSIM performance over varying SNRs for multi-user digital SCs on MNIST, CIFAR-10, and STL-10 datasets. (a) User 1 (MNIST). (b) User 2 (CIFAR-10). (c) User 3 (STL-10).

Fig. 7 shows the SSIM performance of multi-user digital SCs for different values of SNR. In this simulation, we consider three users, where each user transmits images from a different dataset (MNIST, CIFAR-10, and STL-10). For each dataset, the SF vector length $M$ is chosen such that $N/M=8$ holds. The mutual information limits are set as $I_{{\rm max},k}^{(1)}=B_{k}/8$ and $I_{{\rm max},k}^{(2)}=B_{k}/2$ for all $k$, while the total transmit powers for the three users are set to $10^{3}$, $10^{4}$, and $10^{5}$, respectively. Each user experiences an independent Rayleigh fading channel. The other parameters are set as $T=10^{4}$, $w_{0}\ll 1$, and $w_{k}=1,~\forall k$. For fair comparison, all schemes follow the PHY calibration strategy in Sec. V-B with their respective target bit-flip probabilities, and problem ${\bf P6}^{\prime}$ is solved using full search. The results show that the proposed SFC consistently achieves the highest SSIM across all SNR values and datasets. These results also confirm that the SF channel optimized under the mutual-information constraint can be faithfully realized even in digital SCs.

Figure 8: Selection ratios over the SNR for the user transmitting the STL-10 dataset in multi-user digital SCs.

Fig. 8 shows the selection ratios of $I_{{\rm max},3}^{(1)}$ and $I_{{\rm max},3}^{(2)}$ over the SNR for the user transmitting the STL-10 dataset, under the same simulation setting as in Fig. 7. The results show that the user mainly selects $I_{{\rm max},3}^{(1)}$ when the SNR is low and switches to $I_{{\rm max},3}^{(2)}$ as the SNR increases. This demonstrates that the proposed PHY calibration strategy adaptively chooses the appropriate enc-dec pair depending on the channel condition.

VII Conclusion

In this work, we reinterpreted SC from the perspective of the encoder–SF channel–decoder pipeline. Unlike conventional approaches that assume a fixed SF channel, we observed that the SF channel is configurable and can be optimized to improve task performance under a mutual information constraint. We first provided a theoretical analysis for Gaussian sources and linear enc-dec mappings, which revealed that the optimal SF channel allocates lower noise variance to sources with higher variance. Building upon this insight, we developed an end-to-end optimization strategy that jointly trains the DNN-based enc-dec and the SF channel, applicable to both analog and digital SCs. We also proposed a PHY calibration strategy that enables the trained SF channel to be realized in practical wireless environments by adaptively controlling PHY parameters, including transmit power and modulation levels. Simulation results across various datasets demonstrated that the proposed SF channel optimization consistently achieves superior image reconstruction quality and adaptability under diverse channel conditions.

Future research may extend the proposed framework in several directions. First, jointly addressing source distribution generalization and channel adaptation remains an open challenge, for which generative models are a promising tool owing to their ability to capture rich semantic priors [29, 12]. In particular, it would be interesting to investigate the relationship between transformer-based attention mechanisms and the optimized noise variance of the trained SF channel, as both can be interpreted as measures of semantic importance. How to design and optimize the SF channel when multi-modal generative models are employed also remains an open problem. Second, developing advanced PHY calibration techniques based on beamforming, reconfigurable intelligent surfaces, and non-orthogonal multiple access could further enhance the scalability and real-world applicability of the proposed framework [10]. Finally, exploring theoretical bounds for non-Gaussian models would deepen the information-theoretic understanding of the SF channel.

Appendix A Proof of Lemma 1

Let ${\bm{U}}={\bm{\Sigma}}_{\rm xx}^{-1}$ and ${\bm{V}}={\bm{A}}^{\sf T}{\bm{\Sigma}}_{\rm ww}^{-1}{\bm{A}}$, which are positive semidefinite matrices. Then, it holds that

$${\rm Tr}\big(({\bm{U}}+{\bm{V}})^{-1}\big)=\sum_{n=1}^{N}\frac{1}{\lambda_{n}({\bm{U}}+{\bm{V}})}, \tag{64}$$

where $\lambda_{n}(\cdot)$ is the $n$-th largest eigenvalue. By the theorem of Lidskii and Wielandt [1], we have

$$[\lambda_{n}({\bm{U}}+{\bm{V}})]\succ[\lambda_{n}({\bm{U}})+\lambda_{N-n+1}({\bm{V}})], \tag{65}$$

where $[a_{n}]\triangleq(a_{1},\ldots,a_{N})$ denotes a vector, and $\succ$ represents the majorization relation between vectors. Since the mapping $(a_{1},\ldots,a_{N})\mapsto\sum_{n}\frac{1}{a_{n}}$ is Schur-convex, it follows that

$$\sum_{n=1}^{N}\frac{1}{\lambda_{n}({\bm{U}}+{\bm{V}})}\geq\sum_{n=1}^{N}\frac{1}{\lambda_{n}({\bm{U}})+\lambda_{N-n+1}({\bm{V}})}. \tag{66}$$

Substituting this bound into (64) yields

$${\rm Tr}\big(({\bm{U}}+{\bm{V}})^{-1}\big)\geq\sum_{n=1}^{N}\frac{1}{\lambda_{n}({\bm{U}})+\lambda_{N-n+1}({\bm{V}})}. \tag{67}$$

Here, $\lambda_{n}({\bm{U}})$ and $\lambda_{N-n+1}({\bm{V}})$ are determined by the eigenvalues of ${\bm{\Sigma}}_{\rm xx}$ and ${\bm{\Sigma}}_{\rm ww}$, respectively. Hence, the right-hand side of (67) depends only on ${\bm{\Sigma}}_{\rm xx}$ and ${\bm{\Sigma}}_{\rm ww}$. The left-hand side is a function of ${\bm{A}}$ and thus varies with its choice. The equality in (67) can be achieved when ${\bm{V}}$ is diagonal with its entries arranged in the reverse order of those of ${\bm{U}}$. Taking this condition into account, together with the constraint ${\bm{A}}{\bm{A}}^{\sf T}={\bm{I}}$, the optimal form of ${\bm{A}}$ is given by a partial permutation matrix.
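As an illustrative numerical check of the equality condition in (67), the following sketch enumerates all partial permutation matrices for a small example and compares the resulting trace with the lower bound. It assumes that ${\bm{\Sigma}}_{\rm xx}$ and ${\bm{\Sigma}}_{\rm ww}$ are diagonal, so that ${\bm{U}}$ and ${\bm{V}}$ are both diagonal; the variance values and the function names are hypothetical and serve only as an example.

```python
from itertools import permutations

def trace_inverse(sigma_x2, sigma_w2, assignment):
    """Tr((U + V)^{-1}) when row m of A selects source assignment[m].

    Assumes diagonal covariances, so U = diag(1 / sigma_x2) and
    V = A^T Sigma_ww^{-1} A is diagonal with 1 / sigma_w2[m] at the
    selected source index and zero elsewhere.
    """
    v = [0.0] * len(sigma_x2)
    for m, n in enumerate(assignment):
        v[n] = 1.0 / sigma_w2[m]
    return sum(1.0 / (1.0 / sx + vn) for sx, vn in zip(sigma_x2, v))

def lower_bound(sigma_x2, sigma_w2):
    """Right-hand side of (67): descending eig(U) paired with ascending eig(V)."""
    n_src = len(sigma_x2)
    eig_u = sorted((1.0 / s for s in sigma_x2), reverse=True)
    eig_v = sorted([1.0 / s for s in sigma_w2] + [0.0] * (n_src - len(sigma_w2)))
    return sum(1.0 / (u + v) for u, v in zip(eig_u, eig_v))

sigma_x2 = [4.0, 2.0, 1.0, 0.5]  # hypothetical source variances (N = 4)
sigma_w2 = [0.1, 0.4]            # hypothetical noise variances (M = 2)

# Every injective map from the M channels to the N sources is a partial permutation.
traces = {a: trace_inverse(sigma_x2, sigma_w2, a)
          for a in permutations(range(len(sigma_x2)), len(sigma_w2))}
best = min(traces, key=traces.get)
bound = lower_bound(sigma_x2, sigma_w2)
print("best assignment:", best)
print("trace at best  :", traces[best])
print("lower bound    :", bound)
```

In this example, the minimizing assignment maps the channel with the smallest noise variance to the source with the largest variance, and the corresponding trace coincides with the bound up to floating-point precision, in line with the reverse-ordering condition stated above.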

Appendix B Proof of Lemma 2

Let $D(\mathcal{A})$ denote the objective value for an active set $\mathcal{A}$. For $p<q$ with $\sigma_{{\rm x},p}^{2}>\sigma_{{\rm x},q}^{2}$, consider $q\in\mathcal{A}$, $p\notin\mathcal{A}$, and the swapped set $\mathcal{B}=(\mathcal{A}\setminus\{q\})\cup\{p\}$. Under the optimal noise variance in (22), the Lagrange multiplier can be represented as $\lambda_{\mathcal{A}}=\frac{2}{e^{2C/|\mathcal{A}|}}\left(\prod_{k\in\mathcal{A}}\sigma_{{\rm x},k}^{2}\right)^{1/|\mathcal{A}|}$. Since $\mathcal{A}$ and $\mathcal{B}$ differ by one element, the ratio between the two multipliers is obtained as $\frac{\lambda_{\mathcal{B}}}{\lambda_{\mathcal{A}}}=\left(\frac{\sigma_{{\rm x},p}^{2}}{\sigma_{{\rm x},q}^{2}}\right)^{1/|\mathcal{A}|}=r^{1/|\mathcal{A}|}$, where $r\triangleq\sigma_{{\rm x},p}^{2}/\sigma_{{\rm x},q}^{2}$. Then, the difference between the objective values of $\mathcal{A}$ and $\mathcal{B}$ is given by

$$\begin{aligned}
D(\mathcal{B})-D(\mathcal{A}) &=\frac{\lambda_{\mathcal{B}}-\lambda_{\mathcal{A}}}{2}|\mathcal{A}|-(\sigma_{{\rm x},p}^{2}-\sigma_{{\rm x},q}^{2})\\
&=\frac{\lambda_{\mathcal{A}}}{2}|\mathcal{A}|\big(r^{1/|\mathcal{A}|}-1\big)-(\sigma_{{\rm x},p}^{2}-\sigma_{{\rm x},q}^{2}).
\end{aligned} \tag{68}$$

From Bernoulli's inequality, $(1+a)^{b}\leq 1+ab$ for $0\leq b\leq 1$ and $a\geq-1$, it can be shown that $r^{1/|\mathcal{A}|}=(1+(r-1))^{1/|\mathcal{A}|}\leq 1+\frac{r-1}{|\mathcal{A}|}$. Substituting this bound into (68) yields

$$\begin{aligned}
D(\mathcal{B})-D(\mathcal{A}) &\leq\frac{\lambda_{\mathcal{A}}}{2}(r-1)-(\sigma_{{\rm x},p}^{2}-\sigma_{{\rm x},q}^{2})\\
&\overset{(a)}{<}\sigma_{{\rm x},q}^{2}(r-1)-(\sigma_{{\rm x},p}^{2}-\sigma_{{\rm x},q}^{2})=0,
\end{aligned}$$

where the inequality $(a)$ follows from $\lambda_{\mathcal{A}}<2\sigma_{{\rm x},q}^{2}$ for the active components. Therefore, including a source with a larger variance $\sigma_{{\rm x},p}^{2}$ in the active set reduces distortion. By repeatedly applying this argument, the optimal active set is determined as $\mathcal{A}^{\star}=\{1,2,\ldots,|\mathcal{A}|\}$. This completes the proof.
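The swap argument can also be checked by brute force on a small example. The sketch below assumes that the objective takes the form $D(\mathcal{A})=\frac{\lambda_{\mathcal{A}}}{2}|\mathcal{A}|+\sum_{k\notin\mathcal{A}}\sigma_{{\rm x},k}^{2}$, which is inferred from the difference in (68) rather than quoted from the main text, and it only compares active sets that satisfy $\lambda_{\mathcal{A}}<2\sigma_{{\rm x},k}^{2}$ for all $k\in\mathcal{A}$. The variances, the budget $C$, and the function names are hypothetical.

```python
import math
from itertools import combinations

def multiplier(sigma_x2, active, capacity):
    """Lagrange multiplier lambda_A as given in Appendix B."""
    size = len(active)
    geo_mean = math.prod(sigma_x2[k] for k in active) ** (1.0 / size)
    return 2.0 * math.exp(-2.0 * capacity / size) * geo_mean

def distortion(sigma_x2, active, capacity):
    """Assumed objective: lambda_A / 2 per active source, full variance otherwise."""
    lam = multiplier(sigma_x2, active, capacity)
    inactive = sum(s for k, s in enumerate(sigma_x2) if k not in active)
    return 0.5 * lam * len(active) + inactive

def is_feasible(sigma_x2, active, capacity):
    """Condition used in step (a): lambda_A < 2 * sigma_{x,k}^2 for every active k."""
    lam = multiplier(sigma_x2, active, capacity)
    return all(lam < 2.0 * sigma_x2[k] for k in active)

sigma_x2 = [5.0, 3.0, 2.0, 1.0, 0.5]  # hypothetical variances, sorted in descending order
capacity = 3.0                         # hypothetical mutual information budget C
size = 3                               # fixed cardinality |A|

# Compare all feasible active sets of the given cardinality.
candidates = [a for a in combinations(range(len(sigma_x2)), size)
              if is_feasible(sigma_x2, a, capacity)]
best = min(candidates, key=lambda a: distortion(sigma_x2, a, capacity))
print("best active set (0-based indices):", best)
print("objective value:", distortion(sigma_x2, best, capacity))
```

For these values, the set with the lowest objective consists of the three largest-variance sources, matching $\mathcal{A}^{\star}=\{1,2,\ldots,|\mathcal{A}|\}$ up to the 0-based indexing used in the code.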

References

  • [1] T. Ando (1994-Mar.) Majorizations and inequalities in matrix theory. Linear Algebra Appl. 199, pp. 17–67.
  • [2] C. Bian, Y. Shao, and D. Gündüz (2023-Dec.) DeepJSCC-l++: Robust and bandwidth-adaptive wireless image transmission. In Proc. IEEE Global Commun. Conf. (GLOBECOM), Kuala Lumpur, Malaysia, pp. 3148–3154.
  • [3] E. Bourtsoulatze, D. Burth Kurka, and D. Gündüz (2019-Sep.) Deep joint source-channel coding for wireless image transmission. IEEE Trans. Cogn. Commun. Netw. 5 (3), pp. 567–579.
  • [4] C. Cai, X. Yuan, and Y. A. Zhang (2025-Apr.) End-to-end learning for task-oriented semantic communications over MIMO channels: An information-theoretic framework. IEEE J. Sel. Areas Commun. 43 (4), pp. 1292–1307.
  • [5] C. Chaccour, W. Saad, M. Debbah, Z. Han, and H. V. Poor (2025-Feb.) Less data, more knowledge: Building next generation semantic communication networks. IEEE Commun. Surveys Tuts. 27 (1), pp. 37–76.
  • [6] K. Cho and D. Yoon (2002-Jul.) On the general BER expression of one- and two-dimensional amplitude modulations. IEEE Trans. Commun. 50 (7), pp. 1074–1080.
  • [7] K. Choi, K. Tatwawadi, A. Grover, T. Weissman, and S. Ermon (2019-Jun.) Neural joint source-channel coding. In Proc. Int. Conf. Machine Learning (ICML), Long Beach, CA, USA, pp. 1182–1192.
  • [8] A. Coates, A. Ng, and H. Lee (2011-Apr.) An analysis of single-layer networks in unsupervised feature learning. In Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), Ft. Lauderdale, FL, USA, pp. 215–223.
  • [9] T. M. Cover (1999) Elements of information theory. John Wiley & Sons.
  • [10] Z. Ding, Y. Liu, J. Choi, Q. Sun, M. Elkashlan, I. Chih-Lin, and H. V. Poor (2017-Feb.) Application of non-orthogonal multiple access in LTE and 5G networks. IEEE Commun. Mag. 55 (2), pp. 185–191.
  • [11] A. Goldsmith (2005) Wireless communications. Cambridge, U.K.: Cambridge Univ. Press.
  • [12] E. Grassucci, Y. Mitsufuji, P. Zhang, and D. Comminiello (2024-Apr.) Enhancing semantic communication with deep generative models: An overview. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Republic of Korea, pp. 13021–13025.
  • [13] R. M. Gray and D. L. Neuhoff (1998-Oct.) Quantization. IEEE Trans. Inf. Theory 44 (6), pp. 2325–2383.
  • [14] D. Gündüz, M. A. Wigger, T. Tung, P. Zhang, and Y. Xiao (2025-Sep.) Joint source–channel coding: Fundamentals and recent progress in practical designs. Proc. IEEE 113 (9), pp. 888–919.
  • [15] J. Huang, D. Li, C. Huang, X. Qin, and W. Zhang (2024-Jan.) Joint task and data-oriented semantic communications: A deep separate source-channel coding scheme. IEEE Internet Things J. 11 (2), pp. 2255–2272.
  • [16] Y. Huh, H. Seo, and W. Choi (2025-Apr.) Universal joint source-channel coding for modulation-agnostic semantic communication. IEEE J. Sel. Areas Commun. 43 (7), pp. 2560–2574.
  • [17] H. Kellerer, U. Pferschy, and D. Pisinger (2004) The multiple-choice knapsack problem. Springer Berlin Heidelberg.
  • [18] D. P. Kingma and J. Ba (2015-May) Adam: A method for stochastic optimization. In Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, pp. 1–13.
  • [19] A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. M.S. thesis, Univ. Toronto, Toronto, ON, Canada.
  • [20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998-Nov.) Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), pp. 2278–2324.
  • [21] F. Liu, Z. Sun, Y. Yang, C. Guo, and S. Zhao (2024-May) Rate-adaptable multitask-oriented semantic communication: An extended rate–distortion theory-based scheme. IEEE Internet Things J. 11 (9), pp. 15557–15570.
  • [22] X. Luo, H. Chen, and Q. Guo (2022-Feb.) Semantic communications: Overview, open issues, and future research directions. IEEE Wireless Commun. 29 (1), pp. 210–219.
  • [23] S. Lyu, Y. Sun, L. Guo, X. Yuan, F. Fang, L. Zhang, and X. Wang (2024-Nov.) Improving channel resilience for task-oriented semantic communications: A unified information bottleneck approach. IEEE Commun. Lett. 28 (11), pp. 2623–2627.
  • [24] S. Ma, C. Zhang, B. Shen, Y. Wu, H. Li, S. Li, G. Shi, and N. Al-Dhahir (2024-Oct.) Semantic feature division multiple access for multi-user digital interference networks. IEEE Trans. Wireless Commun. 23 (10), pp. 15230–15244.
  • [25] Y. Oh, J. Lee, C. G. Brinton, and Y. Jeon (2025-Jun.) Communication-efficient split learning via adaptive feature-wise compression. IEEE Trans. Neural Netw. Learn. Syst. 36 (6), pp. 10844–10858.
  • [26] Y. Oh, J. Park, J. Choi, and Y. Jeon (2025-Nov.) Deep learning-based modulation and power control: A BER perspective. IEEE Commun. Lett. 29 (11), pp. 2616–2620.
  • [27] Y. Oh, J. Park, J. Choi, J. Park, and Y. Jeon (2025-Nov.) Blind training for channel-adaptive digital semantic communications. IEEE Trans. Commun. 73 (11), pp. 11274–11290.
  • [28] J. Park, Y. Oh, S. Kim, and Y. Jeon (2025-Feb.) Joint source-channel coding for channel-adaptive digital semantic communications. IEEE Trans. Cogn. Commun. Netw. 11 (1), pp. 75–89.
  • [29] J. Ren, Z. Zhang, J. Xu, G. Chen, Y. Sun, P. Zhang, and S. Cui (2024-Aug.) Knowledge base enabled semantic communication: A generative perspective. IEEE Wireless Commun. 31 (4), pp. 14–22.
  • [30] T. Ren, R. Li, M. Zhao, X. Chen, G. Liu, Y. Yang, Z. Zhao, and H. Zhang (2025) Separate source channel coding is still what you need: An LLM-based rethinking. arXiv:2501.04285.
  • [31] H. Seo, J. Park, M. Bennis, and M. Debbah (2023-Jun.) Semantics-native communication via contextual reasoning. IEEE Trans. Cogn. Commun. Netw. 9 (3), pp. 604–617.
  • [32] J. Shao, Y. Mao, and J. Zhang (2022-Jan.) Learning task-oriented communication for edge inference: An information bottleneck approach. IEEE J. Sel. Areas Commun. 40 (1), pp. 197–211.
  • [33] T. Tung, D. B. Kurka, M. Jankowski, and D. Gündüz (2022-Dec.) DeepJSCC-Q: Constellation constrained deep joint source-channel coding. IEEE J. Sel. Areas Inf. Theory 3 (4), pp. 720–731.
  • [34] H. Wu, Y. Shao, C. Bian, K. Mikolajczyk, and D. Gündüz (2023-May) Vision transformer for adaptive image transmission over MIMO channels. In Proc. IEEE Int. Conf. Commun. (ICC), Rome, Italy, pp. 3702–3707.
  • [35] S. Xie, S. Ma, M. Ding, Y. Shi, M. Tang, and Y. Wu (2023-Aug.) Robust information bottleneck for task-oriented communication with digital modulation. IEEE J. Sel. Areas Commun. 41 (8), pp. 2577–2591.
  • [36] K. Yang, S. Wang, J. Dai, X. Qin, K. Niu, and P. Zhang (2025-Feb.) SwinJSCC: Taming swin transformer for deep joint source-channel coding. IEEE Trans. Cogn. Commun. Netw. 11 (1), pp. 90–104.
  • [37] M. Yang, C. Bian, and H. Kim (2022-Jun.) OFDM-guided deep joint source channel coding for wireless multipath fading channels. IEEE Trans. Cogn. Commun. Netw. 8 (2), pp. 584–599.
  • [38] W. Yang, H. Du, Z. Q. Liew, W. Y. B. Lim, Z. Xiong, D. Niyato, X. Chi, X. Shen, and C. Miao (2023-1st Quart.) Semantic communications for future Internet: Fundamentals, applications, and challenges. IEEE Commun. Surveys Tuts. 25 (1), pp. 213–250.
  • [39] W. Zhang, H. Zhang, H. Ma, H. Shao, N. Wang, and V. C. M. Leung (2023-Aug.) Predictive and adaptive deep coding for wireless image transmission in semantic communication. IEEE Trans. Wireless Commun. 22 (8), pp. 5486–5501.