Towards Optimal Semantic Communications: Reconsidering the Role of Semantic Feature Channels
Abstract
This paper investigates the optimization of transmitting the encoder outputs, termed semantic features (SFs), in semantic communication (SC). We begin by modeling the entire communication process from the encoder output to the decoder input, encompassing the physical channel and all transceiver operations, as the SF channel, thereby establishing an encoder–SF channel–decoder pipeline. In contrast to prior studies that assume a fixed SF channel, we note that the SF channel is configurable, as its characteristics are shaped by various transmission and reception strategies, such as power allocation. Based on this observation, we formulate the SF channel optimization problem under a mutual information constraint between the SFs and their reconstructions, and analytically derive the optimal SF channel under a linear encoder-decoder structure and Gaussian source assumption. Building on this analysis, we propose a joint optimization framework for the encoder-decoder and SF channel applicable to both analog and digital SC systems. To realize the optimized SF channel, we also propose a physical-layer calibration strategy that enables real-time power control and adaptation to varying channel conditions. Simulation results demonstrate that the proposed SF channel optimization achieves superior task performance under various communication environments.
I Introduction
With recent advances in artificial intelligence (AI), next-generation wireless networks are anticipated to support emerging intelligent applications such as digital twins, intelligent transportation, and collaborative robotics [4, 22]. These applications often require frequent and large-scale data exchange, which places a heavy burden on existing communication systems designed for the accurate transmission of raw data. Fortunately, with AI systems increasingly equipped with perception and decision-making capabilities, the communication goals in such intelligent services are gradually changing from accurately reproducing raw data to conveying information that contributes to performing specific tasks. This shift has led to the emergence of a new communication paradigm known as semantic communication (SC), which focuses on transmitting only task-relevant information, thereby improving bandwidth efficiency and robustness against channel perturbations [5, 38, 31].
Among various implementations of SC, two representative approaches have been widely studied: deep separate source-channel coding (DeepSSCC) and deep joint source-channel coding (DeepJSCC). In DeepSSCC, the source and channel coding schemes are designed independently [15, 30]. This separation provides high design flexibility, allowing source and channel coding schemes to be freely combined. However, the decoupled design prevents joint optimization across the source and channel, thereby limiting the achievable task performance [14]. In contrast, DeepJSCC jointly designs the source and channel coding schemes [3]. This joint design principle reduces flexibility, as the encoder and decoder are typically tailored to specific source distributions and channel conditions. Nevertheless, the ability to perform end-to-end optimization often leads to superior task performance, which has driven extensive research efforts toward DeepJSCC.
A typical approach in DeepJSCC is to transform the input data at the transmitter into a latent representation, often referred to as the semantic features (SFs). These SFs are then transmitted through either analog or digital communication, depending on whether the SFs are conveyed in continuous or discrete forms. In both cases, the transmitted SFs inevitably experience degradation due to channel fading and noise. At the receiver, the decoder performs the designated task based on the received SFs. We note that the entire process from the encoder output to the decoder input, including the physical channel, modulation, power control, and other transceiver operations, can be collectively modeled as an equivalent channel, referred to as the SF channel. This abstraction allows us to interpret the DeepJSCC framework as an encoder–SF channel–decoder pipeline, as illustrated in Fig. 1, where the SF channel represents how the transmitted SFs are distorted and transformed through the communication process.
Over the years, extensive research has focused on encoder-decoder (enc-dec) optimization under a single fixed SF channel assumption [3, 33, 37, 34, 7]. For example, earlier works modeled the SF channel as an additive white Gaussian noise (AWGN) or Rayleigh fading channel with a fixed signal-to-noise ratio (SNR) level [3, 33]. This approach was further extended by incorporating multipath fading effects and multi-antenna systems into the SF channel [37, 34]. Also, in [7], to capture bit-level transmission characteristics of digital communication, the SF channel was characterized as a binary symmetric channel (BSC) with a fixed bit-flip probability. These studies have successfully demonstrated the effectiveness of enc-dec-centric optimization by achieving high task performance under the assumed SF channel. Nevertheless, the resulting enc-dec often suffers from significant performance degradation when the actual SF channel deviates from the trained one.
To address this challenge, several studies have attempted to enhance robustness or adaptability by training the enc-dec under multiple SF channels [39, 28, 16, 2, 21]. In [39], multiple SF channels were configured under an AWGN or Rayleigh fading channel with varying SNR levels. For digital communications, multiple SF channels were generated by sampling the bit-flip probabilities of the BSCs [28] or by changing both the SNR levels and modulation orders under an AWGN channel [16]. Further, in [2] and [21], multiple SF channels were induced by varying the number of transmitted SFs for rate-adaptive SCs. Although these studies further improve the robustness and generality of the enc-dec-centric optimization, their performance still tends to degrade under unseen communication environments. Moreover, covering all possible SF channels with these approaches would require excessive data sampling and/or model complexity, since the actual SF channel can vary with numerous factors, including antenna configurations, channel statistics, and interference/noise levels.
Unlike prior works that focus solely on enc–dec-centric optimization under fixed SF channels, this work highlights that the SF channel is configurable, as its behavior depends on various transceiver operations such as power allocation and adaptive modulation and coding [11]. This implies that the SF channel itself can be incorporated into the training process to further improve task performance. An initial attempt in this direction was made in [27], which jointly optimizes the enc–dec and a BSC-based SF channel. However, the applicability of [27] is limited to point-to-point digital SC scenarios. Moreover, [27] does not establish a theoretical connection between the SF channel and practical communication systems, as it relies on a heuristic regularization loss for optimizing the BSC-based SF channel. Table I summarizes representative studies on SF channel modeling and optimization and categorizes them into three groups: single SF channels, multiple SF channels, and trainable SF channels.
To further shed light on the potential of configuring the SF channel, this paper proposes a universal and theoretically grounded framework for jointly optimizing the enc-dec and the SF channel in both analog and digital SCs. In this framework, we focus on solving a joint optimization problem that maximizes the task performance under a limited mutual information between the transmitted and reconstructed SFs. To provide analytical evidence for the necessity of this joint optimization, we first analyze a tractable case where the source is Gaussian and the enc-dec is modeled as linear mappings. Building upon the insights from this analysis, we propose an end-to-end training strategy that jointly optimizes the non-linear deep neural network (DNN)-based enc-dec and the SF channel. We also introduce a communication strategy that realizes the trained SF channel in practical communication scenarios by controlling physical layer (PHY) parameters including transmit power and modulation level. Our framework controls the distortion of each SF by optimizing the SF channel, thereby improving the task performance of SC. Furthermore, it reduces the training overhead by decoupling the training process from the actual communication system.
| Category | Work | SF channel | Channel adaptiveness or robustness | Train SF channel for analog SCs | Train SF channel for digital SCs | Theoretical foundation for SF channel |
|---|---|---|---|---|---|---|
| Single SF channel | [3, 33] | AWGN/Rayleigh with a fixed SNR | ✗ | ✗ | ✗ | ✗ |
| Single SF channel | [37] | OFDM with a fixed number of multipaths | ✗ | ✗ | ✗ | ✗ |
| Single SF channel | [34] | MIMO with a fixed number of antennas | ✗ | ✗ | ✗ | ✗ |
| Single SF channel | [7] | BSC with a fixed bit-flip probability | ✗ | ✗ | ✗ | ✗ |
| Multiple SF channels | [39] | AWGN/Rayleigh with varying SNRs | ✓ | ✗ | ✗ | ✗ |
| Multiple SF channels | [28] | BSC with varying bit-flip probabilities | ✓ | ✗ | ✗ | ✗ |
| Multiple SF channels | [16] | AWGN with varying SNRs and modulation orders | ✓ | ✗ | ✗ | ✗ |
| Multiple SF channels | [2, 21] | Transmission with varying numbers of SFs | ✓ | ✗ | ✗ | ✗ |
| Trainable SF channel | [27] | Trainable BSC with a heuristic loss function | ✓ | ✗ | ✓ | ✗ |
| Trainable SF channel | This work | Trainable AWGN / Trainable BSC with a mutual information constraint | ✓ | ✓ | ✓ | ✓ |
The major contributions of this paper are summarized as follows.
- We formulate the joint optimization problem of the enc-dec and the SF channel to maximize task performance in both analog and digital SCs. In this problem, we introduce a mutual information constraint between the transmitted and reconstructed SFs to prevent convergence to a trivial error-free SF channel, while accounting for various communication constraints in an integrated manner.
- We present an analytical study on the joint optimization of the enc-dec and the SF channel. To this end, we focus on a tractable scenario with a linear enc-dec structure and a Gaussian source, from which we derive the optimal SF channel in closed form.
- We propose an end-to-end training strategy for jointly optimizing the DNN-based enc-dec and the SF channel under limited mutual information. In analog SCs, the SF channel is modeled as an AWGN channel, where each SF is corrupted by Gaussian noise with a learnable variance. In digital SCs, the SF channel is modeled as a set of BSCs, where each bit can be flipped with a learnable bit-flip probability. In both analog and digital SCs, the limited mutual information is addressed as a rate allocation problem, in which each SF or bit is assigned a portion of the total communication rate. The allocated rate determines the noise variance in analog SCs or the bit-flip probability in digital SCs, thereby enabling individual control over the distortion of each SF or bit.
- We introduce a communication strategy, referred to as a PHY calibration strategy, which realizes the optimized SF channel by controlling PHY parameters. In single-user analog SCs, the proposed strategy determines the transmit power and feature-to-channel mapping so that the actual SNR matches the trained SNR. This approach can be readily extended to multi-user analog SCs. In multi-user digital SCs, it jointly adjusts the transmit powers and modulation levels across multiple users to align the actual BERs with the trained bit-flip probabilities. For both analog and digital SCs, the proposed strategy selects the most suitable SF channel among multiple candidates to adapt to varying communication environments.
- Through simulation, we demonstrate that the proposed framework achieves superior image reconstruction quality across various mutual information limits. We also numerically verify that the trends observed in the Gaussian-source analysis carry over to SC, confirming the practical validity of the theoretical insights. Furthermore, we show that the proposed PHY calibration strategy faithfully realizes the target SF channel in actual wireless environments.
II System Model and Concept of SF Channel
In this section, we first present the analog and digital SC systems and then introduce the concept of the SF channel.
II-A System Model
We consider a typical SC model where a transmitter is connected to a receiver over a wireless network to perform an image reconstruction task. This model can be readily extended to other machine learning tasks and SC architectures. Let denote the image data of length . The transmitter encodes using an encoder as follows:
| (1) |
where is the encoding function parameterized by , and is the SF vector of length .
After encoding, the SF vector is mapped into either an analog or a digital symbol depending on whether analog or digital communication is employed.
- Analog symbol mapping: Each SF is mean-centered and scaled as
| (2) |
where is the mean of the -th SF, and is a power allocation coefficient satisfying , with denoting the total power budget. Then, pairs of real-valued SFs are grouped into complex symbols as
| (3) |
- Digital symbol mapping: The SF vector is first quantized into a bit sequence of length using standard quantization methods [25, 13, 24]. The bit sequence is then mapped to a symbol sequence of length through a digital modulation process, where denotes the constellation set. Each modulated symbol is scaled as
| (4) |
where is the transmit power allocated to the -th symbol and satisfies under the assumption that .
For consistency, we denote the symbol sequence length by . In analog mapping, is fixed, while in digital mapping, varies depending on the modulation order. The superscripts and are omitted hereafter for notational simplicity.
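The quantization step of the digital mapping can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer with a fixed range and bit depth; the text only states that standard quantization methods are used, so the function names, the range `[z_min, z_max]`, and `n_bits` are hypothetical choices made for illustration. The subsequent modulation step is not covered here.

```python
def quantize_to_bits(z, n_bits=4, z_min=-1.0, z_max=1.0):
    """Uniformly quantize each SF in z and return a flat bit list (MSB first).

    Assumption: a uniform scalar quantizer; the paper does not fix the quantizer.
    """
    levels = 2 ** n_bits
    step = (z_max - z_min) / levels
    bits = []
    for value in z:
        # Clip to the quantizer range, then find the cell index.
        clipped = min(max(value, z_min), z_max)
        index = min(int((clipped - z_min) / step), levels - 1)
        bits.extend((index >> shift) & 1 for shift in range(n_bits - 1, -1, -1))
    return bits


def dequantize_from_bits(bits, n_bits=4, z_min=-1.0, z_max=1.0):
    """Map each group of n_bits back to the center of its quantization cell."""
    levels = 2 ** n_bits
    step = (z_max - z_min) / levels
    z_hat = []
    for start in range(0, len(bits), n_bits):
        index = 0
        for bit in bits[start:start + n_bits]:
            index = (index << 1) | bit
        z_hat.append(z_min + (index + 0.5) * step)
    return z_hat
```

With this quantizer, the reconstruction error of any in-range SF is at most half a quantization step, so the bit errors introduced later by the channel are the only additional source of distortion.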
Under a flat-fading channel, the received signal at the -th channel use is expressed as
| (5) |
where is the channel coefficient, and is AWGN with variance . The channel coefficient may remain constant or vary depending on the coherence time and the number of subcarriers [11]. Upon receiving the signal in (5), the receiver performs channel equalization to obtain the equalized signal at the -th channel use, expressed as
| (6) |
where . From the equalized signal in (6), an estimate of is obtained using either an analog or digital demapping process.
- Analog symbol demapping: The equalized signal is decomposed into its in-phase and quadrature components, followed by power de-scaling and mean restoration. This standard receiver-side processing leads to the equivalent additive-noise model, given by
| (7) |
where .
- Digital symbol demapping: Symbol detection is performed on to recover the estimated bit sequence . The estimated bit sequence is then dequantized to obtain the estimated SF vector .
Finally, the receiver reconstructs an image using a decoder as follows:
| (8) |
where denotes the reconstructed image, and represents the decoding function parameterized by .
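The analog chain described in this subsection (mean-centering, power scaling, pairing into complex symbols, flat fading, equalization, and demapping) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulation: the equalizer is assumed to divide by the channel coefficient, the noise is assumed circularly symmetric, and all names (`analog_sf_link`, `effective_noise_var`, `p`, `mu`, `h`) are hypothetical.

```python
import math
import random


def analog_sf_link(z, mu, p, h, noise_var, rng):
    """Send real SFs over a flat-fading channel, two SFs per complex symbol.

    z, mu, p: per-SF values, means, and power coefficients (len(z) must be even).
    h: one complex channel coefficient per channel use.
    noise_var: variance of the complex AWGN (assumed circularly symmetric).
    """
    z_hat = []
    std = math.sqrt(noise_var / 2)
    for k in range(len(z) // 2):
        i, j = 2 * k, 2 * k + 1
        # Mean-center, scale, and pair two real SFs into one complex symbol.
        s = complex(math.sqrt(p[i]) * (z[i] - mu[i]),
                    math.sqrt(p[j]) * (z[j] - mu[j]))
        n = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        y = h[k] * s + n
        # Assumed equalizer: multiply by conj(h) / |h|^2.
        y_eq = y * h[k].conjugate() / abs(h[k]) ** 2
        # Split into I/Q, undo the power scaling, and restore the mean.
        z_hat.append(y_eq.real / math.sqrt(p[i]) + mu[i])
        z_hat.append(y_eq.imag / math.sqrt(p[j]) + mu[j])
    return z_hat


def effective_noise_var(p_i, h_k, noise_var):
    """Variance of the equivalent additive noise seen by one real SF."""
    return noise_var / (2 * abs(h_k) ** 2 * p_i)
```

Under these assumptions each received SF equals the transmitted SF plus zero-mean Gaussian noise whose variance shrinks as the power coefficient or the channel gain grows, which is the sense in which the SF channel is configurable.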
II-B SF Channel
Definition (SF channel): The SF channel is the equivalent channel between the encoder output and the decoder input, denoted by for analog communication and by for digital communication.¹

¹ In this work, we focus on digital SC systems, where the encoder output is quantized and converted into a bit sequence. Nevertheless, the system can be extended to the discrete symbol domain, as in [33], where the SF channel can be modeled as with denoting discrete modulation symbols.
III Motivation and Case Study
In this section, we present the motivation for joint enc-dec and SF channel optimization and provide a case study that analytically illustrates its necessity.
III-A Motivation for Joint Optimization
Our key observation is that the SF channel is configurable through various communication strategies. For instance, (7) shows that the distortion of each SF can be controlled by adjusting the corresponding power coefficient . Despite this inherent configurability, most existing works focus solely on optimizing the enc-dec while keeping the SF channel fixed. This motivates us to consider the joint optimization of the enc-dec and the SF channel.
A trivial SF channel that maximizes task performance corresponds to the ideal error-free channel, i.e., . While this solution preserves all SFs without distortion, it fails to account for the task-dependent importance of individual SF components and results in an inefficient allocation of communication resources. In particular, over-allocating resources to all SFs disregards their heterogeneous contributions to task performance, leading to potentially excessive resource usage. Therefore, it is essential to impose appropriate constraints on the SF channel, ensuring operation under limited communication resources and within a non-trivial regime where .
To facilitate optimal SF channel design under resource constraints, we raise the following fundamental question:
Motivating Question: What is the optimal SF channel that maximizes task performance under a limited mutual information, i.e., or ?
From an information-theoretic perspective, this constraint implies that the transmitted and received SFs are not identical. Therefore, it indirectly imposes communication constraints and allows us to accommodate various communication scenarios. One representative example is the Gaussian channel, where the mutual information is determined by bandwidth, transmit power, and noise variance. By constraining the mutual information, these factors are restricted in an integrated manner. As a result, the framework captures the effects of multiple communication constraints, while avoiding excessive training overhead for the enc-dec. The mutual information limit does not represent the Shannon channel capacity; rather, it quantifies how much information about can be conveyed through the channel to . A smaller indicates more severe degradation under poor channel conditions or low transmit power, whereas a larger corresponds to more reliable transmission. Since channel-induced distortions are inherently allowed, our formulation differs from the classical source-channel separation theory.
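The Gaussian-channel example can be made concrete with a small sketch. It assumes a scalar real channel with Gaussian input and independent Gaussian noise and measures mutual information in bits; the function names are hypothetical.

```python
import math


def gaussian_mi_bits(signal_var, noise_var):
    """Mutual information (bits per real use) of z_hat = z + n, with Gaussian z and n."""
    return 0.5 * math.log2(1.0 + signal_var / noise_var)


def noise_var_for_mi(signal_var, mi_bits):
    """Noise variance at which the Gaussian channel conveys exactly mi_bits."""
    return signal_var / (2.0 ** (2.0 * mi_bits) - 1.0)
```

For instance, `gaussian_mi_bits(3.0, 1.0)` evaluates to one bit, and halving the allowed mutual information to 0.5 bits raises the tolerated noise variance from 1 to 3. A tighter limit therefore corresponds directly to a noisier SF channel, regardless of whether the noise comes from low transmit power or poor channel conditions.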
III-B Case Study: Analog SC with Linear Enc-Dec and Gaussian Input
Motivated by the above question, we present a simple analytical case study to gain insight into the joint optimization of the enc-dec and the SF channel. To this end, we consider a simplified and tractable scenario with a linear enc-dec structure and a Gaussian source, for which the joint optimization can be analytically characterized. The proposed SC framework that generalizes this insight to practical settings is presented in the subsequent sections.
Let be a Gaussian source with distribution , where and . A linear encoder compresses as , with and satisfying to constrain the encoder output power. The SF channel is assumed to add an independent Gaussian noise , where , resulting in the received signal . The decoder reconstructs , where . The optimization problem is formulated as
| (10) | ||||
| s.t. | (11) |
In problem , the mutual information is given by
| (12) |
The objective function can be expressed as
| (13) |
where , and . Differentiating (13) with respect to and setting the result to zero, we have the optimal form of as
| (14) |
Substituting (14) into (13) and applying the Woodbury matrix identity, the objective function can be rewritten as
| (15) |
which demonstrates the dependence of the objective on and . However, directly differentiating it with respect to these variables does not yield a closed-form solution due to the complex trace-inverse form. To address this, we derive the solution through three steps: (i) we characterize the optimal form of , (ii) determine the optimal , and (iii) obtain the closed-form expression for the optimal .
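For reference, the two algebraic steps used above (solving for the decoder and applying the Woodbury identity) can be written out in an assumed notation. The symbols here are illustrative choices, not necessarily those of (13)-(15): a zero-mean source with invertible covariance $\boldsymbol{\Sigma}_x$, encoder $\mathbf{A}$, decoder $\mathbf{B}$, and independent noise with invertible covariance $\boldsymbol{\Sigma}_n$. The result is the standard linear MMSE form.

```latex
\begin{align}
J(\mathbf{A},\mathbf{B},\boldsymbol{\Sigma}_n)
  &= \mathbb{E}\,\bigl\|\mathbf{x}-\mathbf{B}(\mathbf{A}\mathbf{x}+\mathbf{n})\bigr\|^2 \notag\\
  &= \operatorname{tr}(\boldsymbol{\Sigma}_x)
     - 2\operatorname{tr}(\mathbf{B}\mathbf{A}\boldsymbol{\Sigma}_x)
     + \operatorname{tr}\!\bigl(\mathbf{B}(\mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}^{\mathsf T}
       +\boldsymbol{\Sigma}_n)\mathbf{B}^{\mathsf T}\bigr), \notag\\
\frac{\partial J}{\partial \mathbf{B}} = \mathbf{0}
  \;&\Rightarrow\;
  \mathbf{B}^{\star} = \boldsymbol{\Sigma}_x\mathbf{A}^{\mathsf T}
     \bigl(\mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}^{\mathsf T}+\boldsymbol{\Sigma}_n\bigr)^{-1}, \notag\\
J(\mathbf{A},\mathbf{B}^{\star},\boldsymbol{\Sigma}_n)
  &= \operatorname{tr}\!\Bigl(\boldsymbol{\Sigma}_x
     - \boldsymbol{\Sigma}_x\mathbf{A}^{\mathsf T}
       \bigl(\mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}^{\mathsf T}+\boldsymbol{\Sigma}_n\bigr)^{-1}
       \mathbf{A}\boldsymbol{\Sigma}_x\Bigr) \notag\\
  &= \operatorname{tr}\!\Bigl(\bigl(\boldsymbol{\Sigma}_x^{-1}
     + \mathbf{A}^{\mathsf T}\boldsymbol{\Sigma}_n^{-1}\mathbf{A}\bigr)^{-1}\Bigr). \notag
\end{align}
```

The last line is the trace-inverse form referred to in the text: the encoder and the noise covariance enter only through $\mathbf{A}^{\mathsf T}\boldsymbol{\Sigma}_n^{-1}\mathbf{A}$, which is why a direct stationarity condition in these two variables is awkward to solve.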
To characterize the optimal form of , let us refer to a binary matrix in which every standard basis vector appears once as a column, with the remaining columns (if any) being zero vectors, as a partial permutation matrix. Then the following lemma holds.
Lemma 1
For any matrix , there exists a partial permutation matrix , such that
| (16) |
Proof:
See Appendix A. ∎
Based on Lemma 1, the optimal has the form of a partial permutation matrix . Then, problem is reformulated as
| (17) | ||||
| s.t. | (18) | |||
| (19) |
where the constraints in (19) come from the definition of .
Setting as a partial permutation matrix implies that only a subset of sources is selected for transmission. Let denote the selected source index set with , and let denote the source-channel index mapping function such that for . Then, the objective function and mutual information constraint in can be rewritten as
| (20) |
and
| (21) |
respectively.
Applying the Lagrangian method to (20) and (21), the optimal noise variance is obtained as
| (22) |
where is the optimal Lagrangian multiplier satisfying .
Substituting (22) into (20), the objective function is represented as
| (23) |
where is the active source index set. It should be noted that the problem of determining and reduces to finding the optimal active set, which is characterized as follows:
Lemma 2
The optimal active set is , where is determined by .
Proof:
See Appendix B. ∎
From Lemma 2, the following corollary holds:
Corollary 1
Setting is sufficient to determine the optimal active set .
Proof:
The set must contain , and satisfy . Therefore, it is obvious that must be where is an arbitrary subset of with . ∎
Regarding the mapping function, since does not affect the objective function in (23) and , the identity mapping can be adopted as a sufficient choice.
The sequence of results established in (14), (22), Lemmas 1 and 2, and Corollary 1 leads to the following theorem.
Theorem 1 (Optimal Solution)
The optimal encoder, decoder, and noise covariance matrix of the SF channel in problem are given by
| (24) | ||||
| (25) | ||||
| (26) |
where is obtained from (22) by setting .
Theorem 1 shows that sources with larger variances are selected for transmission, and their noise variances are inversely proportional to the source variances.
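The structure described by Theorem 1 can be explored numerically with a reverse-water-filling sketch. It is written under simplifying assumptions and is not the paper's expression (22): each selected source is assumed to be sent with unit gain, the receiver is assumed to apply a per-component MMSE estimate, and the rate is measured in bits. The function name and arguments are hypothetical.

```python
import math


def optimal_noise_variances(source_vars, num_sfs, rate_bits):
    """Reverse-water-filling sketch for the linear-Gaussian case.

    Returns (noise_vars, mse): noise variances for the num_sfs largest-variance
    sources (math.inf means "effectively not transmitted") and the total MSE.
    """
    ordered = sorted(source_vars, reverse=True)
    selected, dropped = ordered[:num_sfs], ordered[num_sfs:]

    def rate_at(level):
        # Rate spent when every active source is distorted down to `level`.
        return sum(0.5 * math.log2(v / level) for v in selected if v > level)

    # Bisection on the water level; rate_at is decreasing in the level.
    lo, hi = 1e-12, selected[0]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rate_at(mid) > rate_bits:
            lo = mid
        else:
            hi = mid
    level = hi

    noise_vars = [level * v / (v - level) if v > level else math.inf
                  for v in selected]
    mse = sum(min(v, level) for v in selected) + sum(dropped)
    return noise_vars, mse
```

For source variances `[4.0, 1.0, 0.25]` with two SFs and a 2-bit limit, this sketch yields a water level of 0.5, noise variances of about 0.571 and 1.0, and a total MSE of 1.25. Under these assumptions, higher-variance sources receive lower noise variances, and sources whose variance falls below the water level are left out of the active set.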
To verify the effectiveness of the SF channel in Theorem 1, we conduct simulations with , where and are sorted in descending order. We compare three schemes: (i) Proposed SFC (Theorem 1), (ii) ENVC (an equal-noise-variance channel across all SFs with the optimal enc-dec), and (iii) R-D theory (Gaussian R-D bound with [9]). Fig. 2(a) shows that the proposed SFC closely follows the R-D bound, with a negligible gap for moderate . Fig. 2(b) illustrates the MSE versus when , showing that the proposed SFC rapidly converges to the R-D bound, while ENVC degrades for large . The major reason for this degradation is that, as increases, stronger noise is assigned to all SFs, thereby causing greater distortion to high-variance sources.
Remark 1 (Connection to R-D Theory): The R-D bound can be characterized via a test channel [9]. For a Gaussian source, the optimal test channel is an additive Gaussian channel with an appropriately chosen noise variance. However, this channel is derived under the assumption that the source dimension and the channel input dimension are identical, i.e., . Moreover, the case cannot be directly inferred from the test-channel result. In contrast, our formulation starts from the more general setting with . As a result, the classical test-channel result is recovered when , while the case extends beyond it. Therefore, our framework can be interpreted as a generalization of the test channel, and the closeness to the R-D bound observed in Fig. 2 naturally occurs as approaches .
IV Proposed Joint Enc-Dec and SF Channel Optimization for SCs
Our analysis in Sec. III provides analytical evidence that jointly optimizing the enc-dec and the SF channel can improve task performance. However, a closed-form solution is obtainable only under a simplified setting (i.e., analog SC with a linear enc–dec and a Gaussian input). In general SC scenarios, it is difficult to obtain an analytically optimal SF channel due to unknown input distributions and nonlinear DNN-based enc–dec structures. To overcome this limitation, we propose an end-to-end training strategy that leverages a data-driven approach to jointly optimize both the enc–dec and the SF channel. The high-level procedure of our strategy is illustrated in Fig. 3.
IV-A End-to-End Training for Analog SC
The SF channel during training is modeled as an AWGN channel, where and . The noise covariance is treated as a trainable parameter so that different SFs can experience different noise levels during optimization. This implies that SFs that are more critical to the task are assigned lower noise variances for higher reliability, while less important SFs are assigned higher variances to improve communication efficiency. Further, it is important to note that the AWGN modeling is employed solely during training and does not restrict the actual communication scenarios. The practical communication strategy, including power allocation, fading channels, and detection, is described in Sec. V.
Following the above strategy, the optimization problem for end-to-end training is formulated as
| (27) | ||||
| (28) |
One key challenge in solving is that the mutual information is difficult to compute due to the nonlinear nature of DNN-based enc-dec and unknown input distributions. Moreover, directly computing the mutual information would incur high computational complexity, making the optimization intractable. To address this, we adopt the mean-field assumption in [35], under which is decomposed as follows:
| (29) |
Based on this decomposition, an upper bound on the mutual information is given by
| (30) |
where is the variance of , which can be empirically estimated from training samples. From the above expression, we define the communication rate of the -th SF as
| (31) |
To find the optimal via training, we parameterize it as
| (32) |
where is a trainable parameter that determines the portion of the total rate assigned to the -th SF. The constraint in (32) is directly derived from (31). With this parameterization, problem is reformulated as a rate allocation problem with optimization parameters .
The parameter can be readily implemented as
| (33) |
where denotes a trainable raw parameter. By the definition of in (31), the noise variance is given by
| (34) |
Then, the training for the SF channel is realized as
| (35) |
Here, the noise variance of the AWGN channel plays a role similar to a bias term in conventional DNNs. As a result, it can be readily optimized using standard neural network optimizers. Although the AWGN model is adopted as a convenient abstraction for training, it can be extended to more structured channel models, such as correlated Gaussian noise.
In our training, only additional parameters are introduced. In practice, is sufficiently small compared to the number of enc-dec parameters. Moreover, as shown in (33)–(35), the additional computations required for the SF channel optimization are purely element-wise operations and do not involve large-scale matrix multiplications. Therefore, the proposed method incurs only a marginal increase in computational complexity compared to conventional DeepJSCC.
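The rate-allocation parameterization for analog SC can be sketched as follows. Two assumptions are made that the text does not confirm: the rate fractions are obtained from the raw parameters with a softmax, and the per-SF rate is measured in bits as `0.5 * log2(1 + var / noise_var)`. The function names are hypothetical, and in actual training the raw parameters would be updated by an automatic-differentiation framework rather than handled as plain lists.

```python
import math
import random


def softmax(raw):
    """Numerically stable softmax over a list of raw parameters."""
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def sf_noise_variances(raw_params, sf_vars, total_rate_bits):
    """Map raw parameters to per-SF rates and AWGN noise variances."""
    fractions = softmax(raw_params)              # positive, sums to one
    rates = [a * total_rate_bits for a in fractions]
    # Invert r = 0.5 * log2(1 + var / noise_var) for each SF.
    noise_vars = [v / (2.0 ** (2.0 * r) - 1.0) for v, r in zip(sf_vars, rates)]
    return rates, noise_vars


def sf_channel(z, noise_vars, rng=random):
    """Training-time SF channel: add independent Gaussian noise to every SF."""
    return [zi + math.sqrt(nv) * rng.gauss(0.0, 1.0)
            for zi, nv in zip(z, noise_vars)]
```

Because the fractions always sum to one, the per-SF rates add up to the total rate by construction, so the mutual information limit holds without a separate penalty term. Raising one raw parameter shifts rate toward that SF and lowers its noise variance at the expense of the others.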
IV-B End-to-End Training for Digital SC
In digital SCs, the SF channel is modeled as parallel BSCs. The optimization problem is formulated as
| (36) | ||||
| (37) |
The remaining procedures are similar to those in Sec. IV-A. The mutual information is decomposed under the mean-field assumption, and an upper bound is obtained as
| (38) |
where for . The communication rate of the -th bit is defined as
| (39) |
subject to the following constraints:
| (40) |
where the first constraint is derived from (37), and the second constraint comes from . The rate allocation problem for digital SC is formulated by parameterizing
| (41) |
The parameter can be implemented as
| (42) |
where [26]. From the definition of in (39), the bit-flip probability of the -th BSC is given by
| (43) |
where , , and . The approximation is used since has no closed-form expression; it is obtained by performing a Taylor expansion of around , followed by series reversion.
Training is realized under the relaxed BSC model, given by
| (44) |
where
| (45) |
is a random variable, and is a temperature parameter [27]. The relaxation is used to compute a gradient of with respect to a given loss function. Consequently, (or ) is jointly optimized with the enc-dec. Meanwhile, similar to the analog case, digital SCs also introduce only additional parameters, and the associated computations are purely element-wise. Therefore, the resulting increase in computational complexity is marginal.
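One common way to relax a BSC for gradient-based training is a binary-concrete (Gumbel-sigmoid) sample of the flip indicator. The sketch below uses that relaxation as an assumption; the exact form of (44)-(45) may differ, and the function names, the uniform random variable, and the soft-XOR combination are illustrative choices.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def relaxed_bsc(bits, flip_probs, temperature, rng=random):
    """Soft bit flips in [0, 1] that approach a hard BSC as temperature -> 0.

    Assumption: binary-concrete relaxation with 0 < flip_probs[i] < 1.
    """
    out = []
    for b, p in zip(bits, flip_probs):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Relaxed flip indicator; it exceeds 0.5 with probability p.
        logit = math.log(p) - math.log(1.0 - p) + math.log(u) - math.log(1.0 - u)
        e = _sigmoid(logit / temperature)
        # Soft XOR of the bit with the relaxed flip indicator.
        out.append(b * (1.0 - e) + (1.0 - b) * e)
    return out
```

At low temperature the outputs concentrate near 0 or 1 and the fraction of flipped bits approaches the flip probability, while at higher temperature the outputs are smoother and gradients with respect to the flip probability are better behaved.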
Remark 2 (Adaptation to Various Communication Environments): Recall that, in Sec. III, we have discussed the trade-off between the mutual information limit and the MSE. This naturally extends to SCs as a trade-off between and the task performance, as demonstrated in Sec. VI. To handle various communication environments, multiple enc-dec and SF channel pairs can be trained under different mutual information limits. In Sec. V, we introduce a communication strategy that adaptively selects an appropriate SF channel for a given communication environment.
Remark 3 (Comparison to Prior Work in [27]): A similar approach was also studied in our prior work [27], where a BSC-based SF channel was optimized via end-to-end training. However, the optimization relied on a heuristically designed loss function, rather than capturing or constraining the mutual information of the SF channel. Consequently, [27] did not establish a theoretical connection between the SF channel and practical communication systems. Moreover, its validation was restricted to digital SC, raising concerns about its scalability to other forms of SC scenarios, e.g., analog SC. The advantage of our mutual-information-constrained approach over the heuristic approach in [27] will be further discussed in Sec. VI.
V Proposed PHY Calibration for Realizing the Trained SF Channel
The training framework in Sec. IV produces the optimized SF channel by imposing a mutual information constraint, which captures the effects of various communication constraints in an integrated manner during training. However, this abstraction does not directly guarantee that the trained SF channel can be realized under practical communication settings because it does not explicitly account for communication constraints such as total transmit power. To address this issue, the communication parameters must be calibrated so that the SF channel observed during transmission aligns with the optimally trained one while satisfying communication constraints. We refer to this process as PHY calibration. In this section, we present PHY calibration strategies for two communication settings: (i) single-user analog SCs and (ii) multi-user digital SCs.
V-A Single-User Analog SCs
Consider the SF channels trained for different mutual information limits , satisfying , as discussed in Remark 2. The corresponding losses follow . Our objective for PHY calibration is to jointly select a proper SF channel and the transmit power. The optimization problem is formulated as
| (46) | ||||
| s.t. | (47) |
where represents the average transmit power used for sending the -th SF, and controls the trade-off between the task loss and the total transmit power. The target SNR of the -th SF in the -th SF channel, denoted by , is defined as
| (48) |
where is the trained noise variance of the -th SF in the -th SF channel. In the first constraint, represents the actual SNR of during transmission. This constraint ensures alignment between the target and actual SNRs, thereby improving the reliability of task performance.
To solve problem , an auxiliary variable is precomputed as
| (49) |
For each , is sorted in descending order with respect to in advance. When communication begins, the channel-gain-to-noise-power ratio is also sorted in descending order. Here, the indices and are retained after sorting for notational simplicity. The required power coefficient is then computed as
| (50) |
The sorting above assigns SFs with higher to stronger channels, thereby reducing the total transmit power. After obtaining , the optimal SF channel index is determined as
| (51) |
where . The optimal power coefficient is given by .
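The sort-and-match procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes unit-power SFs, so that the received SNR of an SF equals its transmit power times the channel-gain-to-noise-power ratio, and it assumes the objective is the trained loss plus a weight `lam` times the total power, restricted to candidates whose total power fits a budget `p_total`. The function and argument names are hypothetical.

```python
def calibrate_analog(target_snrs, losses, cnr, lam, p_total):
    """Pick an SF channel index and per-SF powers (illustrative sketch).

    target_snrs: list of K lists, the trained target SNRs of each SF channel
    losses:      list of K trained task losses
    cnr:         channel-gain-to-noise-power ratios observed at run time
    lam:         assumed weight trading off task loss against total power
    p_total:     assumed total transmit power budget
    """
    # Strongest subchannels first.
    cnr_sorted = sorted(cnr, reverse=True)
    best_cost = float("inf")
    best = (None, None)
    for k, snrs in enumerate(target_snrs):
        # Highest target SNR first, so it is paired with the strongest channel.
        snr_sorted = sorted(snrs, reverse=True)
        # Assumed model: power * cnr must equal the target SNR.
        powers = [s / g for s, g in zip(snr_sorted, cnr_sorted)]
        total = sum(powers)
        if total > p_total:
            continue  # this SF channel cannot be realized within the budget
        cost = losses[k] + lam * total
        if cost < best_cost:
            best_cost = cost
            best = (k, powers)
    return best
```

Pairing the two descending orderings minimizes the sum of target SNR over channel ratio by the rearrangement inequality, which is why the sorting step reduces the total transmit power under the assumed model.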
The proposed PHY calibration for analog SC has several notable features. First, since is pre-shared between the transmitter and the receiver, the optimal power coefficient and SF channel can be computed locally once is obtained. Therefore, no additional communication overhead is required other than sharing for reconstructing and (see footnote 2). Second, the proposed method incurs very low computational complexity, as the optimal transmit power coefficients are obtained in closed form and require only simple arithmetic operations for each SF. Finally, the method can be readily extended to an interference-free multi-user scenario, in which each user independently adjusts its transmit power based on its own trained target SNRs. Our PHY calibration strategy imposes only a total power constraint, but the framework can be extended to other practical constraints. For example, per-time-slot power constraints can be handled via power clipping.
Footnote 2: The channel-gain-to-noise-power ratio can be estimated using standard pilot-based techniques or feedback mechanisms [11]. When the channel coherence time is sufficiently large, only a small number of ratios need to be estimated or fed back, resulting in marginal communication overhead.
V-B Multi-User Digital SCs
We consider a multi-user digital SC where users transmit different images to a single base station (BS). The channels of all users are assumed to be independent and remain constant during the transmission of all symbols. For the -th user, the SF channels trained for different mutual information limits , satisfying , are given. The corresponding losses follow . Our objective for PHY calibration is to jointly determine a proper SF channel, the transmit power, and the modulation levels. The optimization problem is formulated as
| (52) | |||
| (53) | |||
| (54) |
where is the transmit power for the -th symbol, is the modulation level, is the number of transmitted bits, and is the corresponding symbol sequence length for the -th user. The weighting factors and control the trade-off between the total power consumption and the task performance of each user. In the first constraint, denotes the trained (target) bit-flip probability of the -th bit in the -th SF channel. Each -th bit is transmitted within the -th symbol, where . The BER for this bit is defined as
| (55) |
where is the channel coefficient of the -th user, , , and [6]. The second constraint limits the total power budget of each user. The third constraint guarantees that the total number of channel uses across all users does not exceed , and the fourth constraint restricts each modulation level to the set of candidate levels.
To solve problem , we first sort in descending order with respect to in advance, where the index is retained for notational simplicity. The sorted bit-flip probabilities are grouped by every bits, and the minimum value within each group is defined as
| (56) |
for , where . The sorting above groups bits with similar bit-flip probabilities. This helps reduce the total transmit power because the transmit power of each symbol is determined by the minimum bit-flip probability within its group, as described in below. Given and , an auxiliary variable is precomputed as
| (57) |
for all , assuming . When communication begins, the actual channel-gain-to-noise-power ratio is used to determine the required transmit power as
| (58) |
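The sorting and grouping step that precedes the power computation can be sketched as below. This is a hedged illustration: `bits_per_symbol` is assumed to be the number of bits carried by one symbol at the chosen modulation level, and the function name is hypothetical. The mapping from the per-group minimum to transmit power via the BER expression of [6] is not reproduced here.

```python
def group_min_flip_probs(flip_probs, bits_per_symbol):
    """Sort target bit-flip probabilities in descending order, split them
    into consecutive groups of `bits_per_symbol`, and return the minimum
    of each group (the strictest target within each symbol)."""
    ordered = sorted(flip_probs, reverse=True)
    return [
        min(ordered[i:i + bits_per_symbol])
        for i in range(0, len(ordered), bits_per_symbol)
    ]
```

For example, with targets `[0.01, 0.2, 0.05, 0.1]` and two bits per symbol, sorting first gives per-symbol targets `[0.1, 0.01]`, whereas grouping the unsorted list would give `[0.01, 0.05]`. The sorted grouping leaves one symbol with a much looser target, which is the source of the power saving described in the text.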
Under the total power constraint, the feasible set for the -th user is defined as
| (59) |
where . For each feasible pair , the corresponding objective value is given by
| (60) |
For notational convenience, we redefine
| (61) |
where indexes each feasible pair . Then, problem can be reformulated as
| (62) | |||
| (63) |
where the first two constraints ensure that exactly one candidate is selected from the feasible set for the -th user. The third constraint corresponds to the total channel-use constraint in (54). We note that problem is a conventional multiple-choice knapsack problem. This is a well-studied combinatorial optimization problem, and many efficient solvers have been developed [17]. From a computational complexity perspective, the worst-case approach is exhaustive search, which evaluates all combinations across the candidate sets for each user. In practice, however, the number of candidate SF channels and modulation levels per user is small, resulting in moderate computational cost.
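As an alternative to exhaustive search, a multiple-choice knapsack instance of this shape can be solved with a small dynamic program over the channel-use budget. The sketch below is illustrative only: it assumes each candidate of a user has been reduced to a pair (objective value, integer symbol sequence length), and the names `solve_mckp`, `candidates`, and `budget` are hypothetical.

```python
def solve_mckp(candidates, budget):
    """Choose exactly one (cost, length) candidate per user so that the
    summed length stays within `budget` and the summed cost is minimal.

    candidates: list over users; each entry is a list of (cost, length)
    budget:     total number of channel uses available (integer)
    Returns (total_cost, picks) or None if no feasible selection exists.
    """
    # dp maps "channel uses consumed so far" -> (best cost, chosen indices)
    dp = {0: (0.0, [])}
    for options in candidates:
        nxt = {}
        for used, (cost, picks) in dp.items():
            for j, (c, length) in enumerate(options):
                total_len = used + length
                if total_len > budget:
                    continue  # violates the channel-use constraint
                new_cost = cost + c
                if total_len not in nxt or new_cost < nxt[total_len][0]:
                    nxt[total_len] = (new_cost, picks + [j])
        if not nxt:
            return None  # some user has no candidate that fits
        dp = nxt
    return min(dp.values(), key=lambda entry: entry[0])
```

The table has at most `budget + 1` entries per user, so the running time grows linearly in the number of users rather than exponentially, which matters only when the candidate sets are larger than those considered in the paper.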
In the proposed PHY calibration for multi-user digital SC, the optimal SF channel index and modulation level are first determined at the BS by solving . The BS then transmits and to each user. Upon receiving them, each user computes the optimal transmit power as , which can also be computed at the BS. Therefore, only a small amount of information needs to be exchanged.
VI Simulation Results
In this section, we demonstrate the superiority of the proposed SF channel in SCs, using the MNIST [20], CIFAR-10 [19], and STL-10 [8] datasets. Unless otherwise stated, the enc-dec architecture follows the same configuration as in [27], except that the activation function of the last encoder layer is replaced with a sigmoid. The MSE loss is used as the loss function when evaluating PSNR, and the SSIM loss is used when evaluating SSIM [33]. For MNIST and CIFAR-10, the number of training epochs is set to 50 for PSNR and 20 for SSIM, while 100 epochs are used for STL-10. The batch size is fixed to for all datasets, and the Adam optimizer [18] is employed with an initial learning rate of .
For performance comparison of analog SCs, we consider the following baselines.
• DeepJSCC-A (Proposed SFC): This framework integrates the proposed SF channel (SFC) optimization into the analog DeepJSCC framework of [3].
- •
• DeepJSCC-A (ERC): This variant modifies the conventional DeepJSCC by explicitly imposing an equal-rate constraint across all SFs. Specifically, the noise variance of the -th SF is adjusted so that its communication rate satisfies .
- •
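To make the equal-rate idea behind the ERC baseline concrete, the sketch below computes per-SF noise variances under an assumed model that is not taken from the paper's equations: each SF is a real Gaussian signal with a given power, its rate is 0.5·log2(1 + power / noise variance) bits, and the total rate is split evenly over the SFs. The function and parameter names are hypothetical.

```python
def equal_rate_noise_vars(signal_powers, total_rate_bits):
    """Noise variance per SF such that every SF gets the same rate.

    Assumed rate model: rate_i = 0.5 * log2(1 + P_i / var_i) bits.
    Setting rate_i = total_rate_bits / n and solving for var_i gives
    var_i = P_i / (2 ** (2 * rate_i) - 1).
    """
    n = len(signal_powers)
    per_sf_rate = total_rate_bits / n
    required_snr = 2.0 ** (2.0 * per_sf_rate) - 1.0
    return [p / required_snr for p in signal_powers]
```

Under this assumed model, increasing the number of SFs at a fixed total rate lowers the per-SF rate and therefore raises every noise variance, including those of task-critical SFs.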
For performance comparison of digital SCs, we consider the following baselines.
• DeepJSCC-D (Proposed SFC): This framework incorporates the proposed SF channel optimization into the digital DeepJSCC of [7].
• DeepJSCC-D (ENVC = ERC) [7]: This framework can be regarded as a quantized version of DeepJSCC-A (ENVC), extending the one-bit quantization process in [7] to a multi-bit representation. For training, it adopts multiple BSCs with an equal bit-flip probability applied to all bits, resulting in equal rate allocation.
- •
All digital SC frameworks use an 8-bit uniform quantizer for the encoder output.
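The digital pipeline described above (sigmoid output, 8-bit uniform quantization, per-bit BSCs) can be illustrated with a small sketch. The quantizer below places its levels at multiples of 1/255 on [0, 1], which is one common choice and an assumption here, since the exact quantizer design is not specified in this section. All function names are hypothetical.

```python
import random

def quantize_to_bits(x, num_bits=8):
    """Uniformly quantize x in [0, 1] and return its bits, MSB first."""
    levels = (1 << num_bits) - 1
    idx = min(levels, max(0, round(x * levels)))
    return [(idx >> b) & 1 for b in range(num_bits - 1, -1, -1)]

def bits_to_value(bits):
    """Map a bit list (MSB first) back to a value in [0, 1]."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | b
    return idx / ((1 << len(bits)) - 1)

def bsc(bits, flip_probs, rng):
    """Pass each bit through its own binary symmetric channel."""
    return [b ^ (rng.random() < p) for b, p in zip(bits, flip_probs)]

# Usage: equal flip probabilities mimic the ENVC = ERC baseline, while
# unequal ones (for example, protecting the MSBs) mimic an optimized SFC.
rng = random.Random(0)
sent = quantize_to_bits(0.73)
received = bsc(sent, [0.0, 0.0, 0.01, 0.01, 0.1, 0.1, 0.2, 0.2], rng)
reconstructed = bits_to_value(received)
```

With zero flip probability on the two most significant bits, as in the usage lines, the reconstruction error is bounded by the weight of the remaining six bits (63/255), which shows why unequal per-bit protection can matter more than the average bit-flip probability.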
Fig. 4 shows the PSNR performance of analog SCs on the MNIST dataset for different values of the mutual information limit and the SF vector length . In Fig. 4(a), is fixed to (corresponding to ), while in Fig. 4(b), is fixed to . Similar to the Gaussian case, Fig. 4(a) shows that the proposed SFC consistently achieves the highest PSNR across all values of . This indicates that the proposed SFC utilizes the available mutual information more effectively than the baselines by optimizing the SF channel. In Fig. 4(b), when is small, all schemes yield relatively low PSNR due to strong compression. However, as increases, the PSNR of the proposed SFC gradually improves and eventually converges. This is because a larger preserves more information from the input data, but the gains diminish due to the limited mutual information. In contrast, the ENVC, ERC, and IB baselines initially show an increase in PSNR but begin to degrade as becomes large. This degradation occurs because increasing forces stronger noise to be assigned to all SFs, thereby distorting even the task-critical SFs.
Fig. 5 shows the PSNR performance of digital SCs on the CIFAR-10 dataset for different values of the mutual information limit and the bit sequence length . The enc-dec architecture follows a Swin Transformer-based SwinJSCC in [36]. In Fig. 5(a), is fixed to (corresponding to ), while in Fig. 5(b), is fixed to . In line with the Gaussian and analog SC results, Fig. 5(a) shows that the proposed SFC consistently outperforms the other baselines over the entire range of . In Fig. 5(b), when , the bit sequence length is smaller than or equal to . In this case, the communication becomes error-free, and all schemes achieve identical PSNR values. Meanwhile, the comparison with BlindSC demonstrates that the proposed SFC achieves superior performance by leveraging an information-theoretic optimization instead of heuristic loss design.
Fig. 6 shows the PSNR performance of single-user analog and digital SCs on the MNIST dataset for different values of SNR. In this simulation, we set and . For analog SC, . For digital SC, , and 4-QAM is used. For both SCs, the transmission is performed over Rayleigh fading subchannels, each spanning 10 channel uses. For fair comparison, all schemes, except for IB-SA, follow the PHY calibration strategy in Sec. V-A with their respective target SNRs or BERs. For IB-SA, since there is no criterion to select the enc-dec pair for a given SNR, we evaluate multiple enc-dec pairs and report the best performance at each SNR. The results show that the proposed SFC consistently achieves the highest PSNR across all SNR regimes. Notably, the performance trend observed here aligns well with Figs. 4 and 5. This consistency demonstrates that the optimized SF channel trained under the mutual information constraint can be faithfully realized in practical wireless environments through the proposed PHY calibration strategy. In other words, even though the training of the SF channel is performed in an abstract mutual-information domain, its performance advantage seamlessly transfers to real physical channels once the PHY calibration is applied.
Fig. 7 shows the SSIM performance of multi-user digital SCs for different values of SNR. In this simulation, we consider three users, where each user transmits images from a different dataset (MNIST, CIFAR-10, and STL-10). For each dataset, the SF vector length is chosen such that holds. The mutual information limits are set as and for all , while the total transmit powers for the three users are set to , , , respectively. Each user experiences an independent Rayleigh fading channel. The other parameters are set as , , and . For fair comparison, all schemes follow the PHY calibration strategy in Sec. V-B with their respective target bit-flip probabilities, and the problem is solved using full search. The results show that the proposed SFC consistently achieves the highest SSIM across all SNR values and datasets. These results also confirm that the SF channel optimized under the mutual-information constraint can be faithfully realized even in digital SCs.
Fig. 8 shows the selection ratios of and over the SNR for the user transmitting the STL-10 dataset, under the same simulation setting as in Fig. 7. The results show that the user mainly selects when the SNR is low and switches to as the SNR increases. This demonstrates that the proposed PHY calibration strategy adaptively chooses the appropriate enc-dec pair depending on the channel condition.
VII Conclusion
In this work, we reinterpreted SC from the perspective of the encoder–SF channel–decoder pipeline. Unlike conventional approaches that assume a fixed SF channel, we observed that the SF channel is configurable and can be optimized to improve task performance under a mutual information constraint. We first provided a theoretical analysis for Gaussian sources and linear enc-dec mappings, which revealed that the optimal SF channel allocates lower noise variance to sources with higher variance. Building upon this insight, we developed an end-to-end optimization strategy that jointly trains the DNN-based enc-dec and the SF channel, applicable to both analog and digital SCs. We also proposed a PHY calibration strategy that enables the trained SF channel to be realized in practical wireless environments by adaptively controlling PHY parameters, including transmit power and modulation levels. Simulation results across various datasets demonstrated that the proposed SF channel optimization consistently achieves superior image reconstruction quality and adaptability under diverse channel conditions.
Future research may extend the proposed framework in several directions. First, source distribution generalization and channel adaptation should be addressed jointly. Here, generative models are a promising tool because they can capture rich semantic priors [29, 12]. In particular, it would be interesting to investigate the relationship between transformer-based attention mechanisms and the optimized noise variance of the trained SF channel, as both can be interpreted as measures of semantic importance. Further, how to design and optimize the SF channel when multi-modal generative models are employed remains an open problem. Second, developing advanced PHY calibration techniques that incorporate beamforming, reconfigurable intelligent surfaces, and non-orthogonal multiple access could further enhance the scalability and real-world applicability of the framework [10]. Finally, exploring theoretical bounds for non-Gaussian models would deepen the information-theoretic understanding of the SF channel.
Appendix A Proof of Lemma 1
Let and , which are positive semidefinite matrices. Then, it holds that
| (64) |
where is the -th largest eigenvalue. By the theorem of Lidskii and Wielandt [1], we have
| (65) |
where denotes a vector, and represents the majorization relation between vectors. Since the mapping is Schur-convex, it follows that
| (66) |
Substituting this bound into (64) yields
| (67) |
Here, and are determined by the eigenvalues of and , respectively. Hence, the right-hand side of (67) depends only on and . The left-hand side is a function of and thus varies with its choice. The equality in (67) can be achieved when is diagonal with its entries arranged in the reverse order of those of . Taking this condition into account, together with the constraint , the optimal form of is given by a partial permutation matrix.
Appendix B Proof of Lemma 2
Let denote the objective value for an active set . For with , consider , , and the swapped set . Under the optimal noise variance in (22), the Lagrange multiplier can be represented as Since and differ by one element, the ratio between the two multipliers is obtained as where . Then, the difference between the objective values of and is given by
| (68) |
From Bernoulli’s inequality, for and , it can be shown that Substituting this bound into (68) yields
where the inequality follows from for the active components. Therefore, including a source with a larger variance in the active set reduces distortion. By repeatedly applying this argument, the optimal active set is determined as . This completes the proof.
References
- [1] (1994-Mar.) Majorizations and inequalities in matrix theory. Linear Algebra Appl. 199, pp. 17–67.
- [2] (2023-Dec.) DeepJSCC-1++: Robust and bandwidth-adaptive wireless image transmission. In Proc. IEEE Global Commun. Conf. (GLOBECOM), Kuala Lumpur, Malaysia, pp. 3148–3154.
- [3] (2019-Sep.) Deep joint source-channel coding for wireless image transmission. IEEE Trans. Cogn. Commun. Netw. 5 (3), pp. 567–579.
- [4] (2025-Apr.) End-to-end learning for task-oriented semantic communications over MIMO channels: An information-theoretic framework. IEEE J. Sel. Areas Commun. 43 (4), pp. 1292–1307.
- [5] (2025-Feb.) Less data, more knowledge: Building next generation semantic communication networks. IEEE Commun. Surveys Tuts. 27 (1), pp. 37–76.
- [6] (2002-Jul.) On the general BER expression of one- and two-dimensional amplitude modulations. IEEE Trans. Commun. 50 (7), pp. 1074–1080.
- [7] (2019-Jun.) Neural joint source-channel coding. In Proc. Int. Conf. Machine Learning (ICML), Long Beach, CA, USA, pp. 1182–1192.
- [8] (2011-Apr.) An analysis of single-layer networks in unsupervised feature learning. In Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), Ft. Lauderdale, FL, USA, pp. 215–223.
- [9] (1999) Elements of information theory. John Wiley & Sons.
- [10] (2017-Feb.) Application of non-orthogonal multiple access in LTE and 5G networks. IEEE Commun. Mag. 55 (2), pp. 185–191.
- [11] (2005) Wireless communications. Cambridge, U.K.: Cambridge Univ. Press.
- [12] (2024-Apr.) Enhancing semantic communication with deep generative models: An overview. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Seoul, Republic of Korea, pp. 13021–13025.
- [13] (1998-Oct.) Quantization. IEEE Trans. Inf. Theory 44 (6), pp. 2325–2383.
- [14] (2025-Sep.) Joint source–channel coding: Fundamentals and recent progress in practical designs. Proc. IEEE 113 (9), pp. 888–919.
- [15] (2024-Jan.) Joint task and data-oriented semantic communications: A deep separate source-channel coding scheme. IEEE Internet Things J. 11 (2), pp. 2255–2272.
- [16] (2025-Apr.) Universal joint source-channel coding for modulation-agnostic semantic communication. IEEE J. Sel. Areas Commun. 43 (7), pp. 2560–2574.
- [17] (2004) The multiple-choice knapsack problem. Springer Berlin Heidelberg.
- [18] (2015-May) Adam: A method for stochastic optimization. In Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, pp. 1–13.
- [19] (2009) Learning multiple layers of features from tiny images. M.S. thesis, Univ. Toronto, Toronto, ON, Canada.
- [20] (1998-Nov.) Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), pp. 2278–2324.
- [21] (2024-May) Rate-adaptable multitask-oriented semantic communication: An extended rate–distortion theory-based scheme. IEEE Internet Things J. 11 (9), pp. 15557–15570.
- [22] (2022-Feb.) Semantic communications: Overview, open issues, and future research directions. IEEE Wireless Commun. 29 (1), pp. 210–219.
- [23] (2024-Nov.) Improving channel resilience for task-oriented semantic communications: A unified information bottleneck approach. IEEE Commun. Lett. 28 (11), pp. 2623–2627.
- [24] (2024-Oct.) Semantic feature division multiple access for multi-user digital interference networks. IEEE Trans. Wireless Commun. 23 (10), pp. 15230–15244.
- [25] (2025-Jun.) Communication-efficient split learning via adaptive feature-wise compression. IEEE Trans. Neural Netw. Learn. Syst. 36 (6), pp. 10844–10858.
- [26] (2025-Nov.) Deep learning-based modulation and power control: A BER perspective. IEEE Commun. Lett. 29 (11), pp. 2616–2620.
- [27] (2025-Nov.) Blind training for channel-adaptive digital semantic communications. IEEE Trans. Commun. 73 (11), pp. 11274–11290.
- [28] (2025-Feb.) Joint source-channel coding for channel-adaptive digital semantic communications. IEEE Trans. Cogn. Commun. Netw. 11 (1), pp. 75–89.
- [29] (2024-Aug.) Knowledge base enabled semantic communication: A generative perspective. IEEE Wireless Commun. 31 (4), pp. 14–22.
- [30] (2025) Separate source channel coding is still what you need: An LLM-based rethinking. arXiv:2501.04285.
- [31] (2023-Jun.) Semantics-native communication via contextual reasoning. IEEE Trans. Cogn. Commun. Netw. 9 (3), pp. 604–617.
- [32] (2022-Jan.) Learning task-oriented communication for edge inference: An information bottleneck approach. IEEE J. Sel. Areas Commun. 40 (1), pp. 197–211.
- [33] (2022-Dec.) DeepJSCC-Q: Constellation constrained deep joint source-channel coding. IEEE J. Sel. Areas Inf. Theory 3 (4), pp. 720–731.
- [34] (2023-May) Vision transformer for adaptive image transmission over MIMO channels. In Proc. IEEE Int. Conf. Commun. (ICC), Rome, Italy, pp. 3702–3707.
- [35] (2023-Aug.) Robust information bottleneck for task-oriented communication with digital modulation. IEEE J. Sel. Areas Commun. 41 (8), pp. 2577–2591.
- [36] (2025-Feb.) SwinJSCC: Taming swin transformer for deep joint source-channel coding. IEEE Trans. Cogn. Commun. Netw. 11 (1), pp. 90–104.
- [37] (2022-Jun.) OFDM-guided deep joint source channel coding for wireless multipath fading channels. IEEE Trans. Cogn. Commun. Netw. 8 (2), pp. 584–599.
- [38] (2023-1st Quart.) Semantic communications for future Internet: Fundamentals, applications, and challenges. IEEE Commun. Surveys Tuts. 25 (1), pp. 213–250.
- [39] (2023-Aug.) Predictive and adaptive deep coding for wireless image transmission in semantic communication. IEEE Trans. Wireless Commun. 22 (8), pp. 5486–5501.