He 2020
Abstract—In this paper, we investigate model-driven deep learning (DL) for MIMO detection. In particular, the MIMO detector is specially designed by unfolding an iterative algorithm and adding some trainable parameters. Since the number of trainable parameters is much smaller than in a data-driven DL-based signal detector, the model-driven DL-based MIMO detector can be rapidly trained with a much smaller data set. The proposed MIMO detector can be extended to soft-input soft-output detection easily. Furthermore, we investigate joint MIMO channel estimation and signal detection (JCESD), where the detector takes channel estimation error and channel statistics into consideration while channel estimation is refined by detected data and considers the detection error. Based on numerical results, the model-driven DL-based MIMO detector significantly improves the performance of the corresponding traditional iterative detector, outperforms other DL-based MIMO detectors, and exhibits superior robustness to various mismatches.

Index Terms—Deep learning, Model-driven, MIMO detection, Iterative detector, Neural network, JCESD.

I. INTRODUCTION

MULTIPLE-INPUT multiple-output (MIMO) technology can dramatically improve the spectral efficiency and link reliability and has been applied to many wireless communication systems. To obtain the benefits of MIMO [2], efficient channel estimation and signal detection algorithms, which balance performance and complexity, are essential in receiver design and have prompted a series of studies [3]–[7]. Among existing detectors, maximum likelihood (ML) detection can achieve the optimal performance. However, its complexity increases exponentially with the number of decision variables. Some suboptimal linear detectors, such as zero-forcing (ZF) and linear minimum mean-squared error (LMMSE) detectors, have reduced computational complexity but suffer a large performance degradation compared with ML detection.

With excellent performance and moderate complexity, iterative detectors based on the approximate message passing (AMP) [8] and expectation propagation (EP) [9] algorithms have been proposed for MIMO detection [6], [7]. The AMP-based detector approximates the posterior probability on a dense factor graph by using the central limit theorem and the Taylor expansion, and can achieve Bayes-optimal performance in large-scale systems when the elements of the channel matrix are independent and identically sub-Gaussian distributed. The EP-based detector [7], [9] is derived by approximating the posterior distribution with factorized Gaussian distributions and can achieve Bayes-optimal performance when the channel matrix is unitarily-invariant¹ and of large scale. However, for practical small-size (e.g., 4 × 4 or 8 × 8) MIMO systems, the performance of these iterative detectors is still far from the Bayes-optimal solution and deteriorates seriously with correlated MIMO channels and imperfect channel state information (CSI) [10].

Owing to its strong ability to learn from data, deep learning (DL) has been successfully introduced to computer vision, automatic speech recognition, and natural language processing. Recently, it has been applied in physical layer communications [11]–[13], such as channel estimation [14]–[16], CSI feedback [17], signal detection [18]–[26], channel coding [27], [28], and end-to-end transceiver design [29], [30]. In particular, a five-layer fully connected deep neural network (DNN) is embedded into an orthogonal frequency-division multiplexing (OFDM) system for joint channel estimation and signal detection (JCESD) by treating the receiver as a black box and without exploiting domain knowledge [18]. However, training such a black-box-based network requires a lot of training time in addition to a huge data set. On the other hand, model-driven DL constructs the network topology based on known domain knowledge and has been successfully applied to image reconstruction [32], sparse signal recovery [33]–[36], and, recently, wireless communications [1], [12].

For MIMO detection, a specifically designed network, named DetNet, has been proposed in [19] by unfolding the iteration of a projected gradient descent algorithm and adding considerable trainable variables. DetNet has comparable performance with

Manuscript received July 22, 2019; revised December 13, 2019 and February 10, 2020; accepted February 23, 2020. Date of publication February 28, 2020; date of current version March 16, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. B. Shim. This work was supported in part by the National Key Research and Development Program 2018YFA0701602, in part by the National Science Foundation of China (NSFC) for Distinguished Young Scholars with Grant 61625106, and in part by the NSFC under Grant 61941104. The work of Hengtao He was supported in part by the Scientific Research Foundation of Graduate School of Southeast University under Grant YBPY1939 and in part by the Scholarship from the China Scholarship Council under Grant 201806090077. The work of Chao-Kai Wen was supported in part by the Ministry of Science and Technology of Taiwan under Grants MOST 108-2628-E-110-001-MY3 and in part by the ITRI in Hsinchu, Taiwan. This paper was presented in part at the IEEE Global Conference Signal and Information Processing (Globalsip), Anaheim, CA, Nov. 2018 [1]. (Corresponding author: Shi Jin.)
Hengtao He and Shi Jin are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: [email protected]; [email protected]).
Chao-Kai Wen is with the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan (e-mail: chaokai.wen@mail.nsysu.edu.tw).
Geoffrey Ye Li is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: liye@ece.gatech.edu).
Digital Object Identifier 10.1109/TSP.2020.2976585

¹ A matrix A = UΣV is unitarily-invariant if U, Σ and V are mutually independent, and U, V are Haar-distributed. The independent and identically distributed (i.i.d.) Gaussian matrix is a typical unitarily-invariant matrix.
1053-587X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Canberra. Downloaded on April 28,2020 at 07:37:09 UTC from IEEE Xplore. Restrictions apply.
HE et al.: MODEL-DRIVEN DEEP LEARNING FOR MIMO DETECTION 1703
the AMP-based detector and is more robust to ill-conditioned channels [19]. To further reduce the number of learnable parameters and improve convergence, the approaches in [20] and [21] use DL techniques for the belief propagation and message passing detectors, respectively. In [26], a DL-based sphere decoding algorithm is proposed, where the radius of the decoding hypersphere is learned by a DNN. The performance achieved by this algorithm is very close to that of optimal ML detection. However, most of the existing DL-based detectors assume accurate CSI at the receiver and ignore the channel estimation error.

Motivated by existing works, we develop a model-driven DL network, named OAMP-Net2, for MIMO detection in this article, where the iterative detector is improved with a small number of trainable variables to adapt to various channel environments. The structure of the detector is obtained by unfolding the OAMP detector, which is similar to our early work in [1] and inspired by the TISTA network [34] but adds more trainable parameters. Furthermore, an OAMP-Net2-based JCESD architecture is proposed for imperfect CSI, and a data-aided scheme is utilized to further improve channel estimation. The trainable parameters are optimized by DL techniques to adapt to various channel environments and take channel estimation error into consideration. The main contributions of this paper are summarized as follows:

• Different from the existing DL-based MIMO detectors [19]–[24], [26] with perfect CSI, we consider MIMO detection with an estimated channel, which improves the performance of the MIMO receiver by considering the characteristics of channel estimation error and channel statistics and using the estimated payload data to refine the channel estimation.
• Compared with the existing DL-based MIMO detectors [19]–[21], [26], our proposed detector can provide soft-output information for the decoder and absorb the soft information. In addition, only a few trainable parameters are required to be learned, which can reduce the demand for computing resources and training time significantly.
• Based on our numerical results, OAMP-Net2 has considerable performance gain compared with the OAMP detector. Furthermore, OAMP-Net2 has strong robustness to signal-to-noise ratio (SNR), channel correlation, modulation symbol, and MIMO configuration mismatches.

Notations: For any matrix $\mathbf{A}$, $\mathbf{A}^T$, $\mathbf{A}^H$, and $\mathrm{tr}(\mathbf{A})$ denote the transpose, conjugate transpose, and trace of $\mathbf{A}$, respectively. In addition, $\mathbf{I}$ is the identity matrix and $\mathbf{0}$ is the zero matrix. A proper complex Gaussian with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Theta}$ can be described by the probability density function

$$\mathcal{N}_{\mathbb{C}}(\mathbf{z}; \boldsymbol{\mu}, \boldsymbol{\Theta}) = \frac{1}{\det(\pi\boldsymbol{\Theta})}\, e^{-(\mathbf{z}-\boldsymbol{\mu})^H \boldsymbol{\Theta}^{-1} (\mathbf{z}-\boldsymbol{\mu})}.$$

The rest of this paper is organized as follows. After introducing the JCESD architecture in Section II, we develop the channel estimator in Section III and propose the model-driven DL detector in Section IV. Then, numerical results are presented in Section V. Finally, Section VI concludes the paper.

II. JOINT CHANNEL ESTIMATION AND SIGNAL DETECTION

In this section, we consider a MIMO system with $N_t$ transmit and $N_r$ receive antennas. After presenting the JCESD architecture,² we introduce signal detection considering channel estimation error and data-aided channel estimation.³ We assume that the channel matrix $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ does not change within a time slot. In each time slot, $N_p$ pilot vectors $\mathbf{x}_p[n] \in \mathbb{C}^{N_t \times 1}$, for $n = 1, \ldots, N_p$, are first transmitted, followed by $N_d$ data vectors $\mathbf{x}_d[n] \in \mathbb{C}^{N_t \times 1}$. The received signal vectors are $\mathbf{y}_p[n] \in \mathbb{C}^{N_r \times 1}$ for $n = 1, \ldots, N_p$ and $\mathbf{y}_d[n] \in \mathbb{C}^{N_r \times 1}$ for $n = 1, \ldots, N_d$, corresponding to the pilot and data vectors, respectively. We can also express them in matrix form as $\mathbf{X}_p = (\mathbf{x}_p[1], \ldots, \mathbf{x}_p[N_p]) \in \mathbb{C}^{N_t \times N_p}$, $\mathbf{Y}_p = (\mathbf{y}_p[1], \ldots, \mathbf{y}_p[N_p]) \in \mathbb{C}^{N_r \times N_p}$, $\mathbf{X}_d = (\mathbf{x}_d[1], \ldots, \mathbf{x}_d[N_d]) \in \mathbb{C}^{N_t \times N_d}$, and $\mathbf{Y}_d = (\mathbf{y}_d[1], \ldots, \mathbf{y}_d[N_d]) \in \mathbb{C}^{N_r \times N_d}$.

A. JCESD Architecture

As in Fig. 1, we consider a turbo-like JCESD architecture for MIMO systems in this paper, which shares the same spirit as iterative decoding. In JCESD, the channel estimator and signal detector exchange information iteratively until convergence [40]. In the first iteration, pilot-only based channel estimation is performed. In the subsequent iterations, data-aided channel estimation is employed with the help of the detected data.

The inputs of the JCESD architecture are the pilot signal matrix $\mathbf{X}_p$, the received signal matrix corresponding to the pilot matrix, $\mathbf{Y}_p$, and the received signal matrix corresponding to the data matrix, $\mathbf{Y}_d$, in each time slot. In the $l$-th turbo iteration, $\hat{\mathbf{H}}^{(l)}$ is the estimated channel matrix, $\hat{\mathbf{X}}_d^{(l)}$ is the estimated data matrix, and $\hat{\mathbf{V}}_{\mathrm{est}}^{(l)}$ and $\hat{\mathbf{V}}_{\mathrm{det}}^{(l)}$ are used to compute the covariance matrix of the equivalent noise in the signal detector and channel estimator, respectively. The final output of the signal detector is the finally detected data matrix $\hat{\mathbf{X}}_d^{(L)}$, where $L$ is the total number of turbo iterations.

Compared with the conventional receiver design where the channel estimator and signal detector are designed separately, this architecture can improve the performance of the receiver by considering the characteristics of channel estimation error in addition to channel statistics when performing signal detection and using the estimated payload data for channel estimation, as we will illustrate subsequently.

B. Signal Detection With Channel Estimation Error

In the MIMO system, the received data signal vector $\mathbf{y}_d[n]$ corresponding to the $n$-th data vector can be expressed by

$$\mathbf{y}_d[n] = \mathbf{H}\mathbf{x}_d[n] + \mathbf{n}_d[n], \tag{1}$$

where $\mathbf{n}_d[n] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_r})$ is the additive white Gaussian noise (AWGN) vector. Note that (1) can also be expressed in matrix form as

$$\mathbf{Y}_d = \mathbf{H}\mathbf{X}_d + \mathbf{N}_d, \tag{2}$$

where $\mathbf{N}_d = (\mathbf{n}_d[1], \ldots, \mathbf{n}_d[N_d]) \in \mathbb{C}^{N_r \times N_d}$ is the AWGN matrix in the data transmission stage. Denote the estimated channel

² One can introduce various JCESD architectures for MIMO systems, including the schemes in [37]–[41]; we present a turbo-like JCESD architecture similar to [40] in this paper.
³ Although we mainly investigate the model-driven-DL-based MIMO detector in this paper, we first introduce a JCESD architecture and then elaborate the channel estimator and signal detector modules, respectively.
1704 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
Fig. 1. The diagram of the turbo-like JCESD architecture. The channel estimator and signal detector exchange information iteratively until convergence.
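$\hat{\mathbf{H}} \in \mathbb{C}^{N_r \times N_t}$ as

$$\hat{\mathbf{H}} = \mathbf{H} + \Delta\mathbf{H}, \tag{3}$$

where $\Delta\mathbf{H}$ is the channel estimation error. If the estimated channel is used for the signal detector, the signal detection problem can be formulated as

$$\begin{aligned}
\mathbf{y}_d[n] &= \mathbf{H}\mathbf{x}_d[n] + \mathbf{n}_d[n] \\
&= (\hat{\mathbf{H}} - \Delta\mathbf{H})\mathbf{x}_d[n] + \mathbf{n}_d[n] \\
&= \hat{\mathbf{H}}\mathbf{x}_d[n] + \mathbf{n}_d[n] - \Delta\mathbf{H}\mathbf{x}_d[n] \\
&= \hat{\mathbf{H}}\mathbf{x}_d[n] + \hat{\mathbf{n}}_d[n],
\end{aligned} \tag{4}$$

where $\hat{\mathbf{n}}_d[n] = \mathbf{n}_d[n] - \Delta\mathbf{H}\mathbf{x}_d[n]$ is the equivalent noise in the signal detector, which includes the contributions of the channel estimation error and the original additive noise. The equivalent noise is assumed to be Gaussian,⁴ $\hat{\mathbf{n}}_d[n] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \hat{\mathbf{V}}_d[n])$, and the covariance matrix $\hat{\mathbf{V}}_d[n]$ can be obtained by considering the statistical properties of the channel estimation error. The detailed calculation is shown in Appendix A.

estimator as additional pilot. Then, the received signal matrix $\mathbf{Y}_d$ corresponding to $\hat{\mathbf{X}}_d$ can be expressed as

$$\begin{aligned}
\mathbf{Y}_d &= \mathbf{H}\mathbf{X}_d + \mathbf{N}_d \\
&= \mathbf{H}(\hat{\mathbf{X}}_d - \mathbf{E}_d) + \mathbf{N}_d \\
&= \mathbf{H}\hat{\mathbf{X}}_d + (\mathbf{N}_d - \mathbf{H}\mathbf{E}_d) \\
&= \mathbf{H}\hat{\mathbf{X}}_d + \hat{\mathbf{N}}_p,
\end{aligned} \tag{7}$$

where $\hat{\mathbf{N}}_p = \mathbf{N}_d - \mathbf{H}\mathbf{E}_d$ is the equivalent noise for the additional pilot part $\hat{\mathbf{X}}_d$. The $n$-th column of $\hat{\mathbf{N}}_p$ is modeled as $\hat{\mathbf{n}}_p[n] \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \hat{\mathbf{V}}_p[n])$ for $n = 1, \ldots, N_d$, where $\hat{\mathbf{V}}_p[n]$ is calculated in Appendix A and will be utilized in the data-aided channel estimation stage. Then, we denote $\mathbf{Y} = (\mathbf{Y}_p, \mathbf{Y}_d)$ as the received signal matrix corresponding to the overall transmitted signal matrix. Based on (5) and (7), we have

$$\begin{aligned}
\mathbf{Y} &= (\mathbf{Y}_p \;\; \mathbf{Y}_d) \\
&= (\mathbf{H}\mathbf{X}_p + \mathbf{N}_p, \; \mathbf{H}\hat{\mathbf{X}}_d + \hat{\mathbf{N}}_p)
\end{aligned}$$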
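As a numerical sanity check on the equivalent-noise model in (4), the following minimal sketch compares the diagonal covariance obtained by summing the per-entry error variances with a Monte Carlo estimate. It assumes unit-power QPSK symbols and independent zero-mean Gaussian entries in ΔH; the function names (`equivalent_noise_cov`, `simulate_equivalent_noise`) are hypothetical and not part of the paper.

```python
import numpy as np


def _cn(rng, var, shape):
    """Draw circularly symmetric complex Gaussian samples with the given variance."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def equivalent_noise_cov(sigma2_dh, sigma2):
    """Analytic covariance of n_hat = n - dH @ x for unit-power symbols.

    sigma2_dh: (Nr, Nt) per-entry variances of the channel estimation error
               (assumed independent and zero mean in this sketch).
    sigma2:    AWGN variance.
    """
    return np.diag(sigma2_dh.sum(axis=1) + sigma2)


def simulate_equivalent_noise(sigma2_dh, sigma2, trials=50_000, seed=0):
    """Monte Carlo estimate of E[n_hat n_hat^H] under the same assumptions."""
    rng = np.random.default_rng(seed)
    nr, nt = sigma2_dh.shape
    # Unit-power QPSK symbols (an assumption of this sketch).
    pm = rng.integers(0, 2, size=(trials, nt, 2)) * 2 - 1
    x = (pm[..., 0] + 1j * pm[..., 1]) / np.sqrt(2)
    dh = _cn(rng, sigma2_dh, (trials, nr, nt))  # channel estimation error
    n = _cn(rng, sigma2, (trials, nr))          # original AWGN
    n_hat = n - np.einsum("tij,tj->ti", dh, x)  # equivalent noise of (4)
    return np.einsum("ti,tk->ik", n_hat, n_hat.conj()) / trials
```

With a small variance profile such as `np.array([[0.01, 0.02], [0.03, 0.04]])` and `sigma2 = 0.1`, the simulated matrix should be close to diagonal and match the analytic one up to Monte Carlo error.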
Fig. 2. Block diagram of OAMP-Net2 detector. The network consists of T cascade layers, and each layer has the same structure that contains the linear estimator
Wt , nonlinear estimator ηt (·), error variance τt2 and vt2 , and tied weights.
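The error variance $\tau_t^2$ that each layer passes to the nonlinear estimator can be illustrated with a short sketch based on the form derived in (45) of Appendix B, with $\gamma_t$ replaced by $\theta_t$ as described there. The matrix `W` below is an LMMSE-style placeholder chosen only for illustration, and all function names are hypothetical; this is a sanity check of the trace formula under Gaussian error assumptions, not the authors' implementation.

```python
import numpy as np


def lmmse_placeholder(H, v2, sigma2):
    """LMMSE-style linear estimator, used here only as a stand-in for W_t."""
    nr = H.shape[0]
    return v2 * H.conj().T @ np.linalg.inv(v2 * H @ H.conj().T + sigma2 * np.eye(nr))


def tau2_estimate(H, W, v2, sigma2, theta):
    """Trace formula of the form (45), with theta in place of gamma_t."""
    nt = H.shape[1]
    B = np.eye(nt) - theta * W @ H
    signal_part = np.trace(B @ B.conj().T).real * v2
    noise_part = theta ** 2 * np.trace(W @ W.conj().T).real * sigma2
    return (signal_part + noise_part) / nt


def tau2_monte_carlo(H, W, v2, sigma2, theta, trials=20_000, seed=0):
    """Empirical E||q||^2 / Nt for q = (I - theta W H) p + theta W n."""
    rng = np.random.default_rng(seed)
    nr, nt = H.shape

    def cn(var, shape):
        return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

    p = cn(v2, (nt, trials))      # stands in for x_hat_t - x, variance v_t^2
    n = cn(sigma2, (nr, trials))  # equivalent noise, variance sigma^2
    q = (np.eye(nt) - theta * W @ H) @ p + theta * W @ n
    return np.mean(np.abs(q) ** 2)
```

Because the sketch draws the estimation error and the noise independently, the cross term vanishes and the two functions should agree up to sampling error for any choice of `W`.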
of the linear estimator and learnable parameters $(\phi_t, \xi_t)$. The MMSE estimator (22) can be interpreted as a special case of (28) by setting $\phi_t = 1$ and $\xi_t = 0$.

3) Error Variance Estimators: The error variance estimators $v_t^2$ in (26) and $\tau_t^2$ in (27) play important roles in providing appropriate variance estimates required for the linear and nonlinear estimators in the OAMP-Net2 detector. For the error variance $v_t^2$, we adopt the same estimator as the OAMP detector in (26). For the error variance $\tau_t^2$, we construct the estimator based on two assumptions on the error vectors $\mathbf{p}_t$ and $\mathbf{q}_t$ in [42] and incorporate learnable variables $(\gamma_t, \theta_t)$. The detailed derivation for the two variance estimators $v_t^2$ and $\tau_t^2$ is provided in Appendix B. Furthermore, we substitute $v_t^2$ by $\max(v_t^2, \epsilon)$ for a small positive constant $\epsilon = 5 \times 10^{-13}$ to avoid stability problems.

4) Learnable Variables: The learnable variables $\Omega = \{\Omega_t\}_{t=1}^{T}$ are optimized in the training process for OAMP-Net2. The Bayes-optimal property of the OAMP algorithm has been proven in [43], but it is derived in the large system limit with a unitarily-invariant matrix $\mathbf{H}$. In fact, the performance of the OAMP detector is far from the Bayes-optimal performance in practical finite-dimensional MIMO systems, especially when there are strong spatial correlation and channel estimation error. These observations motivate us to improve the original iterative detector with several trainable parameters to adapt to various channel environments.

Similar to the TPG detector [22] and OAMP-Net [1], the OAMP-Net2 uses two learnable parameters $(\gamma_t, \theta_t)$ to adjust the linear estimator and to provide an appropriate step size for the update of the mean $\mathbf{r}_t$ and variance $\tau_t^2$ in the MMSE estimator. On the other hand, the linear estimator (24) is related to the gradient descent algorithm, and its convergence behavior and performance are determined by an appropriate step size for moving to the search point. The optimal step size $\gamma_t$ can be learned from the data for the update of the prior mean $\mathbf{r}_t$ in the MMSE estimator. Furthermore, the parameter $\theta_t$ has a similar function for the error variance $\tau_t^2$: it can compensate for the channel estimation error and regulate $\tau_t^2$ to provide an appropriate value for the update of the prior variance in the MMSE estimator.

The parameters $(\phi_t, \xi_t)$ in the nonlinear estimator $\eta_t(\cdot)$ play important roles in constructing an appropriate divergence-free estimator, which has been discussed in [42]. To be precise, the divergence-free estimator (28) can be applied in the OAMP detector, but $\phi_t$ and $\xi_t$ are related to the prior distribution of the original signal and are difficult to calculate. Therefore, the MMSE estimator (22) is considered for simplicity in the OAMP detector. In our early work [1], we set $\phi_t = 1$ and $\xi_t = 0$ in OAMP-Net and use the MMSE estimator (22). By contrast, we adaptively learn the two parameters $\phi_t$ and $\xi_t$ in OAMP-Net2 to construct a nonlinear estimator $\eta_t(\cdot)$ satisfying the divergence-free property.

5) Complexity Analysis: The computational complexity required for the OAMP-Net2 is $O(T N_t^3)$. Similar to the OAMP detector and OAMP-Net, the computational complexity is dominated by the matrix inverse in each layer in (20). When $N_t$ is relatively small (e.g., 4 or 8), the matrix inverse operation is always acceptable. By contrast, the AMP detector [5] has a complexity of $O(T N_t^2)$, which is dominated by the matrix multiplication, but has performance deterioration in small-size MIMO systems. The LMMSE detector has a complexity of $O(N_t^3)$ as one matrix inversion is needed, but it is non-iterative and no training is required. Although ML can achieve the optimal performance, it has a complexity of $O(|\mathcal{S}|^{N_t})$. In Table I, we compare the computational complexity of the OAMP-Net2 detector with the state-of-the-art DL-based MIMO detectors in [19]–[23], including DetNet [19], DNN-dBP [20], DNN-MPD [21], TPG [22], and LcgNet [23]. Although the state-of-the-art DL-based MIMO detectors have lower complexity than the OAMP-Net2 detector, their performance deteriorates in small MIMO systems to some extent.

Furthermore, we investigate the number of learnable variables in different DL-based MIMO detectors. From Fig. 2, the total number of trainable variables is equal to $4T$ since each layer of the OAMP-Net2 contains only four trainable variables $\Omega_t = (\gamma_t, \phi_t, \xi_t, \theta_t)$. By contrast, $2T$ trainable variables $(\gamma_t, \theta_t)$ need to be trained in OAMP-Net. However, the numbers of learnable variables in DetNet, DNN-MPD, and LcgNet are heavily dominated by the number of antennas at the transmitter or the receiver. By contrast, the numbers of trainable variables of the OAMP-Net2 and OAMP-Net are independent of the numbers of antennas $N_r$ and $N_t$ and are determined only by the number of layers $T$. This is an advantageous feature for large-scale problems, such as massive MIMO detection. With only a few trainable variables, the stability and speed of convergence can be improved in the training process.

C. Soft-Input and Soft-Output

Many modern digital communication systems need to pass a probabilistic estimate of the transmitted data, given the observations, to a probabilistic channel decoder. A significant issue is therefore whether the MIMO detector can use the soft information from the decoder and produce soft output. Different from the DetNet in [19], which can only provide soft output, our proposed detector is a soft-input soft-output receiver and therefore can achieve turbo equalization. We only provide the principle of the OAMP-Net2-based turbo receiver; the specific experimental results are outside the scope of this paper and will be conducted in the future.

From (28), the OAMP-Net2 can decouple the joint posterior probability $P(\mathbf{x} \mid \mathbf{y}_d, \hat{\mathbf{H}})$ into a series of marginal posterior probabilities $P(x_j \mid r_t, \tau_t)$. The marginal posterior probability is used to produce the soft-output log-likelihood ratios (LLRs), which are given by

$$L_A(b_{j,k}) = \log \frac{\sum_{x_j \in \mathcal{S}_j^{+}} P(x_j \mid r_t, \tau_t)}{\sum_{x_j \in \mathcal{S}_j^{-}} P(x_j \mid r_t, \tau_t)}, \tag{29}$$

where $b_{j,k}$ is the $k$-th bit in the transmitted symbol $x_j$, and $\mathcal{S}_j^{+}$ and $\mathcal{S}_j^{-}$ denote the subsets of the constellation symbols with the $k$-th bit being 1 and 0, respectively. After the LLRs are interleaved and delivered to the channel decoder, extrinsic LLRs can be computed and given to the OAMP-Net2 detector as updated prior information. The detector and decoder iteratively exchange information until convergence.
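The soft-output computation in (29) can be sketched as follows. This is a minimal illustration, assuming a Gaussian model $r = x_j + w$ with $w \sim \mathcal{N}_{\mathbb{C}}(0, \tau^2)$ for the decoupled observation and a hypothetical Gray-mapped QPSK constellation; the names `SYMBOLS`, `BITS`, `symbol_posterior`, and `bit_llrs` are illustrative and do not come from the paper. The optional `prior` argument stands in for the soft input obtained from the decoder.

```python
import numpy as np

# Hypothetical Gray-mapped QPSK: row i of BITS holds the bits of SYMBOLS[i].
SYMBOLS = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
BITS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


def symbol_posterior(r, tau2, prior=None):
    """Marginal posterior P(x_j = s | r, tau2) over the constellation points."""
    if prior is None:
        prior = np.full(len(SYMBOLS), 1.0 / len(SYMBOLS))
    log_p = -np.abs(r - SYMBOLS) ** 2 / tau2 + np.log(np.asarray(prior))
    log_p -= log_p.max()  # shift for numerical stability before exponentiating
    p = np.exp(log_p)
    return p / p.sum()


def bit_llrs(r, tau2, prior=None):
    """LLR of each bit: log of posterior mass with bit = 1 over mass with bit = 0."""
    p = symbol_posterior(r, tau2, prior)
    llrs = []
    for k in range(BITS.shape[1]):
        num = p[BITS[:, k] == 1].sum()
        den = p[BITS[:, k] == 0].sum()
        llrs.append(np.log(num / den))
    return np.array(llrs)
```

For an observation exactly on a constellation point whose bits are both 0, both LLRs come out negative, and an observation at the origin with a uniform prior gives zero LLRs, as expected from the symmetry of the constellation.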
TABLE I
COMPUTATIONAL COMPLEXITY OF DIFFERENT DETECTORS
Fig. 3. BER performance of the OAMP-Net2 and OAMP detectors versus the number of layers under QPSK and 16-QAM modulation.
Fig. 4. BER performance comparison of the OAMP-Net2 with other MIMO detectors under i.i.d. Rayleigh MIMO channels with different orders of modulation.
Fig. 5. BER performance comparison of the OAMP and OAMP-Net2 under high-dimensional i.i.d. Rayleigh MIMO channels with different modulation symbols.
Fig. 6. BER performance comparison of the OAMP and OAMP-Net2 under correlated Rayleigh MIMO channels with ρ = 0.5 for different orders of modulation.
C. JCESD Performance

In the above, all detectors are investigated with accurate CSI. In this section, we consider the OAMP-Net2-based JCESD architecture for a 4 × 4 MIMO system. Both OAMP and OAMP-Net2 have four layers. In order to avoid gradient vanishing, we use the summation of the $l_2$-loss over all $L \times T$ layers to train the model, which is defined as

$$l_2(\Omega) = \frac{1}{D} \sum_{l=1}^{L} \sum_{t=1}^{T} \sum_{\mathbf{d}^{(i)} \in \mathcal{D}} \left\| \mathbf{x}_d^{(i)} - \hat{\mathbf{x}}_{d,t}^{(l)}(\mathbf{y}^{(i)}) \right\|_2^2, \tag{33}$$

and the learning rate is set to be 0.0001.
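The summed loss in (33) can be illustrated with a small sketch. It assumes that the estimates from every turbo iteration and every layer have been stacked into one array of shape `(L, T, D, Nt)`, which is a layout chosen here for illustration; the function name `summed_l2_loss` is hypothetical.

```python
import numpy as np


def summed_l2_loss(x_true, x_hat_layers):
    """Sum of squared errors over all L x T layer outputs, divided by D.

    x_true:       (D, Nt) transmitted data vectors of the training set.
    x_hat_layers: (L, T, D, Nt) estimates from each turbo iteration l and layer t.
    """
    err = x_hat_layers - x_true[None, None, :, :]  # broadcast over l and t
    return np.sum(np.abs(err) ** 2) / x_true.shape[0]
```

Penalising every intermediate output, rather than only the last one, gives each layer a direct gradient signal, which is the motivation stated in the text for summing over all $L \times T$ layers.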
Fig. 8. BER performance comparison of the OAMP and OAMP-Net2 in the JCESD architecture with Np = 4 and Nc = 16.
Fig. 9. BER performance comparison of the OAMP and OAMP-Net2 with SNR and correlation mismatches for QPSK modulation.
We consider orthogonal pilots in the channel training stage, where the pilot matrix $\mathbf{X}_p \in \mathbb{C}^{N_t \times N_p}$ is chosen by selecting $N_t$ columns of the discrete Fourier transformation (DFT) matrix $\mathbf{F} \in \mathbb{C}^{N_p \times N_p}$. Fig. 8 shows the BERs of the OAMP and OAMP-Net2 detectors in the JCESD architecture, where $L = 1$ means no data feedback to the channel estimator and $L = 3$ means the detected data are fed back twice to the channel estimator. From the figure, the OAMP-Net2 detector outperforms the OAMP detector significantly in all settings. Specifically, if we target BER = $10^{-2}$ with 16-QAM, the BER performance improves by about 2.1 dB for $L = 1$ and by about 2.9 dB for $L = 3$. Furthermore, we observe that the OAMP-Net2 detector without data feedback ($L = 1$) outperforms the OAMP detector with data fed back twice ($L = 3$), which demonstrates that learning some trainable variables can compensate for the channel estimation error and improve the equivalent SNR in the detection stage. For QPSK modulation, the performance gains are 5.1 dB and 3.1 dB for $L = 1$ and $L = 3$, respectively, when we target BER = $10^{-2}$. In addition, only a marginal performance gain is obtained by data feedback for the OAMP-Net2 detector with QPSK because the learnable parameters have a strong ability to compensate for channel estimation error. Therefore, no data feedback is needed in this case.

D. Robustness

In this section, we analyze the robustness of the OAMP-Net2 against various mismatches, including SNR, channel correlation, MIMO configuration, and modulation mismatches. As the aforementioned numerical results are obtained when training and test data are generated with the same system parameters, an interesting question is whether the trained parameters are robust to various mismatches. Because only limited data and computing resources are available for online training, verifying the robustness of the offline-trained network is particularly meaningful.

1) Robustness to SNR and Channel Correlation: Fig. 9 presents the BER performance of OAMP and OAMP-Net2 with SNR and correlation mismatches in an 8 × 8 MIMO system. The network is trained in the correlated Rayleigh MIMO channel (31) with channel correlation coefficient $\rho = 0.5$ and SNR = 20 dB, and tested with mismatched channel correlation coefficient $\rho$ and SNR, which are shown in the figure. From the figure, the trained network with correlation mismatch still outperforms the OAMP detector significantly in all settings and has a 0.3 dB performance loss if we target BER = $10^{-2}$ and $\rho = 0.5$, which demonstrates that the OAMP-Net2 has strong robustness to mismatch. Interestingly, the trained network even outperforms the OAMP detector with $\rho = 0.1$ in the high SNR regime (SNR = 20–30 dB), since the learnable variables successfully compensate for the disadvantages of channel correlation. Compared with perfect SNR,⁷ the OAMP-Net2 detector with SNR mismatch has little performance deterioration except for SNR = 25 dB or above, when we target $\rho = 0.5$.

2) Robustness to MIMO Configuration and Modulation: Fig. 10 presents the BER performance of OAMP and OAMP-Net2 with MIMO configuration and modulation symbol mismatches. We train the OAMP-Net2 with 16-QAM symbols in an 8 × 8 MIMO system and test the robustness of the trained parameters. Fig. 10(a) exhibits the numerical results of the OAMP-Net2 with MIMO configuration mismatch, where the network is tested in 4 × 4 systems with 16-QAM. Although the network is employed in a different MIMO configuration, the OAMP-Net2 still outperforms the LMMSE and OAMP detectors and has little performance loss. This robustness demonstrates that the OAMP-Net2 is flexible to different MIMO configurations. Furthermore, Fig. 10(b) shows the performance of OAMP-Net2 with modulation symbol mismatch, where the network is tested in an 8 × 8 MIMO system with QPSK symbols. Only a 0.4 dB performance loss is incurred owing to modulation symbol mismatch if we target BER = $10^{-2}$, which demonstrates that the OAMP-Net2 is robust to modulation symbol mismatch. The OAMP-Net2 detector contains a linear and a nonlinear estimator. The linear estimator is related to the gradient descent algorithm, and its convergence behavior and performance are determined by an appropriate step size $\gamma_t$ for moving to the search point. As the linear estimator

⁷ Perfect SNR means that the network has the same SNR in the training and test stages.
is independent of the modulation symbols, the learnable parameter $\gamma_t$ shows strong robustness to modulation symbol mismatches. Therefore, the trained network can be utilized directly in different MIMO configurations with different modulation symbols.

Fig. 10. BER performance comparison of the OAMP and OAMP-Net2 with the MIMO configuration and modulation symbol mismatches.

VI. CONCLUSION

We have developed a novel model-driven DL network for MIMO detection, named OAMP-Net2, which is obtained by unfolding the OAMP detection algorithm. The OAMP-Net2 detector inherits the superiority of the Bayes-optimal signal recovery algorithm and DL techniques, and thus presents excellent performance. The network is easy and fast to train because only a few adjustable parameters are required to be optimized. To handle imperfect CSI, an OAMP-Net2-based JCESD is proposed, where the detector takes channel estimation error and channel statistics into consideration while channel estimation is refined by detected data and considers the detection error. Simulation results demonstrate that significant performance gain can be obtained by learning the corresponding optimal parameters from the data to improve the detector and compensate for the channel estimation error. Furthermore, the OAMP-Net2 exhibits strong robustness to various mismatches.

APPENDIX A
DERIVATION FOR COVARIANCE MATRICES

In order to derive the covariance matrix $\hat{\mathbf{V}}_d[n]$ of the equivalent noise $\hat{\mathbf{n}}_d[n]$ in the signal detection stage, we should evaluate the statistical properties of $\mathbf{z}_d[n] = \Delta\mathbf{H}\mathbf{x}_d[n]$ for $n = 1, \ldots, N_d$, which can be expressed as

$$\mathrm{Cov}[\mathbf{z}_d[n]] = \mathbb{E}\left[\mathbf{z}_d[n](\mathbf{z}_d[n])^H\right]. \tag{34}$$

$$\mathbb{E}[z_{d,i}\, z_{d,k}^H] = \begin{cases} 0, & \forall i \neq k \\ \sum_{j=1}^{N_t} \sigma^2_{\Delta h_{i,j}}, & i = k \end{cases} \tag{35}$$

where $z_{d,i} = \sum_{j=1}^{N_t} \Delta h_{i,j} x_{d,j}$ is the $i$-th element of $\mathbf{z}_d$, $\Delta h_{i,j}$ is the $(i,j)$-th element of the channel estimation error matrix $\Delta\mathbf{H}$, and $x_{d,j}$ is the $j$-th element of $\mathbf{x}_d$. $\sigma^2_{\Delta h_{i,j}}$ is the variance of the $(i,j)$-th element of the channel estimation error matrix $\Delta\mathbf{H}$, which can be obtained from $\mathbf{R}_{\Delta h_p}$ in (11) or $\mathbf{R}_{\Delta h}$ in (13). By considering the contribution of the original additive noise, the covariance matrix $\hat{\mathbf{V}}_{\mathrm{est}}$ can be obtained as

$$\hat{\mathbf{V}}_{\mathrm{est}} = \mathrm{diag}\left( \sum_{j=1}^{N_t} \sigma^2_{\Delta h_{1,j}} + \sigma^2, \; \ldots, \; \sum_{j=1}^{N_t} \sigma^2_{\Delta h_{N_r,j}} + \sigma^2 \right). \tag{36}$$

The covariance matrix $\hat{\mathbf{V}}_{\mathrm{est}}$ will be utilized in the OAMP-Net2 detector as the covariance matrix of the equivalent noise $\mathbf{R}_{\hat{n}_d \hat{n}_d}$.

Next, we compute the covariance matrix $\hat{\mathbf{V}}_p[n]$ of the equivalent noise $\hat{\mathbf{n}}_p[n]$ in the data-aided channel estimation stage using a similar approach. For each time index $n$, we denote $\mathbf{z}_n = \mathbf{H}\mathbf{e}_n$ and $z_{i,n} = \sum_{j=1}^{N_t} h_{i,j} e_{j,n}$, where $\mathbf{e}_n$ is the $n$-th column of the signal detection error matrix $\mathbf{E}_d$. In a similar way, we have

$$\mathbb{E}[z_{i,n}\, z_{k,n}^H] = \begin{cases} 0, & \forall i \neq k \\ \sum_{j=1}^{N_t} \sigma^2_{e_{j,n}} / N_r, & i = k \end{cases} \tag{37}$$

because $h_{i,j} \sim \mathcal{N}_{\mathbb{C}}(0, 1/N_r)$ and $\sigma^2_{e_{j,n}}$ is the variance of the $j$-th element of $\mathbf{e}_n$. Therefore, the covariance matrix $\hat{\mathbf{V}}_p[n]$ is given by

$$\hat{\mathbf{V}}_p[n] = \left( \sum_{j=1}^{N_t} \sigma^2_{e_{j,n}} + \sigma^2 \right) \mathbf{I}_{N_t}. \tag{38}$$

The covariance matrix $\hat{\mathbf{V}}_p[n]$ will be utilized in the channel estimator to evaluate the covariance matrix of the equivalent noise $\mathbf{R}_{nn}$, where $\mathbf{n} = \mathrm{vec}(\hat{\mathbf{N}})$. By considering the different covariance matrices of the actual pilot and the additional pilot, we have

$$\mathbf{R}_{nn} = \begin{bmatrix} \sigma^2 \mathbf{I}_{N_p N_r} & \\ & \hat{\mathbf{V}}_{\mathrm{det}} \end{bmatrix} \tag{39}$$
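where $\sigma^{2} \mathbf{I}_{N_p N_r}$ denotes the covariance matrix of the noise $\mathrm{vec}(\mathbf{N}_p)$ for the actual pilot $\mathbf{X}_p$, while

$$
\hat{\mathbf{V}}_{\mathrm{det}} =
\begin{bmatrix}
\hat{\mathbf{V}}_p[1] & & \\
 & \ddots & \\
 & & \hat{\mathbf{V}}_p[N_d]
\end{bmatrix}
\tag{40}
$$

denotes the covariance matrix of the equivalent noise $\mathrm{vec}(\hat{\mathbf{N}}_p)$ for the additional pilot $\hat{\mathbf{X}}_d$.

APPENDIX B
DERIVATION FOR VARIANCE ESTIMATORS

To derive the expressions for the error variance estimators $v_t^2$ and $\tau_t^2$ in OAMP-Net2, we use a method similar to [34]. We should indicate that $v_t^2$ and $\tau_t^2$ in the OAMP detector are based on the following two assumptions about the error vectors $\mathbf{p}_t$ and $\mathbf{q}_t$ [42].

Assumption 1: $\mathbf{p}_t$ consists of i.i.d. zero-mean Gaussian entries independent of $\mathbf{x}_d$.

Assumption 2: $\mathbf{q}_t$ consists of i.i.d. zero-mean Gaussian entries independent of $\mathbf{H}$ and $\hat{\mathbf{n}}_d$.

From a similar viewpoint, we obtain the error variance estimators $v_t^2$ and $\tau_t^2$ in (26) and (27) in OAMP-Net2 using Assumptions 1 and 2, and considering the effect of the learnable variables $\gamma_t$ and $\theta_t$. For convenience, we will use $\mathbf{H}$, $\hat{\mathbf{x}}_t$, and $\mathbf{n}$ to refer to $\hat{\mathbf{H}}$, $\hat{\mathbf{x}}_{d,t}$, and $\hat{\mathbf{n}}_d$, respectively. Based on Assumption 2, $\mathbf{q}_t$ consists of i.i.d. zero-mean Gaussian entries that are independent of $\mathbf{H}$ and $\mathbf{n}$, which means that

$$
\mathbb{E}\left[(\hat{\mathbf{x}}_t - \mathbf{x})^{H} \mathbf{H}^{H} \mathbf{n}\right] = 0. \tag{41}
$$

Therefore, the error variance estimator $v_t^2$ is given by

$$
v_t^2 = \frac{\mathbb{E}\left[\|\mathbf{q}_t\|_2^2\right]}{N_t}
= \frac{1}{N_t} \frac{\mathrm{tr}(\mathbf{H}^{H}\mathbf{H})\, \mathbb{E}\left[\|\mathbf{q}_t\|_2^2\right]}{\mathrm{tr}(\mathbf{H}^{H}\mathbf{H})}
$$

Then, for the error variance estimator $\tau_t^2$, we have

$$
\begin{aligned}
\tau_t^2 &= \frac{\mathbb{E}\left[\|\mathbf{q}_t\|_2^2\right]}{N_t}
= \frac{1}{N_t} \mathbb{E}\left[\left\|(\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})(\hat{\mathbf{x}}_t - \mathbf{x}) + \gamma_t \mathbf{W}_t \mathbf{n}\right\|_2^2\right] \\
&= \frac{1}{N_t} \mathbb{E}\left[(\hat{\mathbf{x}}_t - \mathbf{x})^{H} (\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})(\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})^{H} (\hat{\mathbf{x}}_t - \mathbf{x})\right]
+ \frac{\gamma_t^2}{N_t} \mathbb{E}\left[\mathbf{n}^{H} \mathbf{W}_t^{H} \mathbf{W}_t \mathbf{n}\right] \\
&\quad + \frac{2\gamma_t}{N_t} \mathbb{E}\left[(\hat{\mathbf{x}}_t - \mathbf{x})^{H} (\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})^{H} \mathbf{W}_t \mathbf{n}\right] \\
&= \frac{1}{N_t} \mathrm{tr}\left((\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})(\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})^{H}\right) v_t^2
+ \frac{\gamma_t^2}{N_t} \mathrm{tr}(\mathbf{W}_t \mathbf{W}_t^{H}) \sigma^2
+ \frac{2(\gamma_t - \gamma_t^2)}{N_t} \mathbb{E}\left[(\hat{\mathbf{x}}_t - \mathbf{x})^{H} \mathbf{H}^{H} \mathbf{n}\right].
\end{aligned}
\tag{44}
$$

The last term in (44) vanishes because $\mathbb{E}\left[(\hat{\mathbf{x}}_t - \mathbf{x})^{H} \mathbf{H}^{H} \mathbf{n}\right] = 0$. Therefore, the error variance estimator $\tau_t^2$ is given by

$$
\tau_t^2 = \frac{1}{N_t} \mathrm{tr}\left((\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})(\mathbf{I} - \gamma_t \mathbf{W}_t \mathbf{H})^{H}\right) v_t^2
+ \frac{\gamma_t^2}{N_t} \mathrm{tr}(\mathbf{W}_t \mathbf{W}_t^{H}) \sigma^2. \tag{45}
$$

Then, we replace the parameter $\gamma_t$ with $\theta_t$ in (45), as the simulation results show that the two parameters $(\gamma_t, \theta_t)$ can better regulate the error variance estimators $v_t^2$ and $\tau_t^2$. The superior performance is attributed to the additional parameters, which can incorporate more side information from the data. Therefore, the final expression for $\tau_t^2$ is obtained as

$$
\tau_t^2 = \frac{1}{N_t} \mathrm{tr}\left(\mathbf{C}_t \mathbf{C}_t^{H}\right) v_t^2 + \frac{\theta_t^2 \sigma^2}{N_t} \mathrm{tr}\left(\mathbf{W}_t \mathbf{W}_t^{H}\right). \tag{46}
$$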
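As a rough numerical illustration of (45) and (46), the following Python sketch evaluates the closed-form estimator and compares it with a Monte Carlo average of the squared error norm. It is only a sketch under stated assumptions: $\mathbf{C}_t = \mathbf{I} - \theta_t \mathbf{W}_t \mathbf{H}$ is taken as implied by replacing $\gamma_t$ with $\theta_t$ in (45), the matched filter $\mathbf{W} = \mathbf{H}^{H}$ is a placeholder rather than the linear estimator used in OAMP-Net2, and the function names, dimensions, and parameter values are hypothetical.

```python
import numpy as np


def tau2_estimate(H, W, v2, sigma2, theta):
    """Closed form of (46), assuming C = I - theta * W @ H."""
    Nt = H.shape[1]
    C = np.eye(Nt) - theta * (W @ H)
    signal_part = np.trace(C @ C.conj().T).real * v2 / Nt
    noise_part = theta**2 * sigma2 * np.trace(W @ W.conj().T).real / Nt
    return signal_part + noise_part


def complex_gaussian(rng, shape, var):
    """Draw i.i.d. circularly symmetric complex Gaussian entries with variance var."""
    scale = np.sqrt(var / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def monte_carlo_tau2(H, W, v2, sigma2, gamma, trials, rng):
    """Average ||(I - gamma W H)(x_hat - x) + gamma W n||^2 / Nt over random draws."""
    Nr, Nt = H.shape
    err = complex_gaussian(rng, (Nt, trials), v2)        # x_hat - x, variance v2 per entry
    noise = complex_gaussian(rng, (Nr, trials), sigma2)  # n, variance sigma2 per entry
    p = (np.eye(Nt) - gamma * (W @ H)) @ err + gamma * (W @ noise)
    return np.mean(np.sum(np.abs(p) ** 2, axis=0)) / Nt


rng = np.random.default_rng(0)
Nr, Nt = 8, 4                                       # hypothetical antenna numbers
H = complex_gaussian(rng, (Nr, Nt), 1.0 / Nr)       # h_ij ~ CN(0, 1/Nr)
W = H.conj().T                                      # placeholder linear estimator (assumption)

closed_form = tau2_estimate(H, W, v2=0.5, sigma2=0.1, theta=0.8)
empirical = monte_carlo_tau2(H, W, v2=0.5, sigma2=0.1, gamma=0.8, trials=20000, rng=rng)
print(closed_form, empirical)
```

With the same value used for $\gamma_t$ and $\theta_t$, the two numbers should agree up to Monte Carlo error, which reflects the fact that the cross term in (44) averages to zero when the error vector and the noise are independent.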
[7] J. Céspedes, P. M. Olmos, M. Sánchez-Fernandez, and F. Pérez-Cruz, “Expectation propagation detection for high-order high-dimensional MIMO systems,” IEEE Trans. Commun., vol. 62, no. 8, pp. 2840–2849, Aug. 2014.
[8] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, 2009.
[9] T. P. Minka, “A family of algorithms for approximate Bayesian Inference,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., MIT, Cambridge, MA, USA, 2001.
[10] K. Ghavami and M. Naraghi-Pour, “MIMO detection with imperfect channel state information using expectation propagation,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 8129–8138, Sep. 2017.
[11] T. Wang, C.-K. Wen, H. Wang, F. Gao, T. Jiang, and S. Jin, “Deep learning for wireless physical layer: Opportunities and challenges,” China Commun., vol. 14, no. 11, pp. 92–111, Nov. 2017.
[12] H. He, S. Jin, C.-K. Wen, F. Gao, G. Y. Li, and Z. Xu, “Model-driven deep learning for physical layer communications,” IEEE Wireless Commun., vol. 26, no. 5, pp. 77–83, Oct. 2019.
[13] Z.-J. Qin, H. Ye, G. Y. Li, and B.-H. Juang, “Deep learning in physical layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp. 93–99, Apr. 2019.
[14] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
[15] Y. Yang, F. Gao, X. Ma, and S. Zhang, “Deep learning-based channel estimation for doubly selective fading channels,” IEEE Access, vol. 7, pp. 36579–36589, 2019.
[16] C.-J. Chun, J.-M. Kang, and I.-M. Kim, “Deep learning based channel estimation for massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 8, no. 4, pp. 1228–1231, Aug. 2019.
[17] C.-K. Wen, W. T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018.
[18] H. Ye, G. Y. Li, and B.-H. F. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 114–117, Feb. 2018.
[19] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in Proc. 18th IEEE Int. Workshop Signal Process. Adv. Wireless Commun., Hokkaido, Japan, Jul. 2017, pp. 1–5.
[20] X. Tan, W. Xu, Y. Be’ery, Z. Zhang, X. You, and C. Zhang, “Improving massive MIMO belief propagation detector with deep neural network,” 2018, arXiv:1804.01002.
[21] X. Tan, Z. Zhong, Z. Zhang, X. You, and C. Zhang, “Low-Complexity message passing MIMO detection algorithm with deep neural network,” in Proc. IEEE Glob. Conf. Signal Inf. Process., Nov. 2018, pp. 559–563.
[22] S. Takabe, M. Imanishi, T. Wadayama, and K. Hayashi, “Trainable projected gradient detector for massive overload MIMO channels: Data-driven tuning approach,” IEEE Access, vol. 7, pp. 93326–93338, 2019.
[23] Y. Wei, M.-M. Zhao, M. Hong, M.-J. Zhao, and M. Lei, “Learned conjugate gradient descent network for massive MIMO detection,” 2019, arXiv:1906.03814.
[24] M. Khani, M. Alizadeh, J. Hoydis, and P. Fleming, “Adaptive neural signal detection for massive MIMO,” 2019, arXiv:1906.04610.
[25] X. Gao, S. Jin, C.-K. Wen, and G. Y. Li, “ComNet: Combination of deep learning and expert knowledge in OFDM receivers,” IEEE Commun. Lett., vol. 22, no. 12, pp. 2627–2630, Dec. 2018.
[26] M. Mohammadkarimi, M. Mehrabi, M. Ardakani, and Y. Jing, “Deep learning based sphere decoding,” IEEE Trans. Wireless Commun., vol. 18, no. 9, pp. 4368–4378, Sep. 2019.
[27] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Beery, “Deep learning methods for improved decoding of linear codes,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 119–131, Feb. 2018.
[28] Y. He, J. Zhang, C.-K. Wen, and S. Jin, “TurboNet: A model-driven DNN decoder based on max-log-MAP algorithm for turbo code,” in Proc. IEEE VTS Asia Pacific Wireless Commun. Symp., Singapore, 2019, pp. 1–5.
[29] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[30] H. Ye, L. Liang, G. Y. Li, and B.-H. F. Juang, “Deep learning based end-to-end wireless communication systems with GAN as unknown channel,” IEEE Trans. Wireless Commun., to be published, doi: 10.1109/TWC.2020.2970707.
[31] I. Santos and J. J. Murillo-Fuentes, “EP-based turbo detection for MIMO receivers and large-scale systems,” vol. 8, no. 4, pp. 1095–1098, Aug. 2019.
[32] Y. Yang, J. Sun, H. Li, and Z. Xu, “ADMM-Net: A deep learning approach for compressive sensing MRI,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 3, pp. 521–538, Mar. 2020.
[33] M. Borgerding, P. Schniter, and S. Rangan, “AMP-inspired deep networks for sparse linear inverse problems,” IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4293–4308, Aug. 2017.
[34] D. Ito, S. Takabe, and T. Wadayama, “Trainable ISTA for sparse signal recovery,” IEEE Trans. Signal Process., vol. 67, no. 12, pp. 3113–3125, Jun. 2019.
[35] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,” in Adv. Neural Inf. Process. Syst., 2018, pp. 9061–9071.
[36] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. Int’l Conf. Mach. Learn., 2010, pp. 399–406.
[37] K. Takeuchi, R. R. Müller, M. Vehkaperä, and T. Tanaka, “On an achievable rate of large Rayleigh block-fading MIMO channels with no CSI,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6517–6541, Oct. 2013.
[38] J. Ma and L. Ping, “Data-aided channel estimation in large antenna systems,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3111–3124, Jun. 2014.
[39] C.-K. Wen, C.-J. Wang, S. Jin, K.-K. Wong, and P. Ting, “Bayes-optimal joint channel-and-data estimation for massive MIMO with low-precision ADCs,” IEEE Trans. Signal Process., vol. 64, no. 10, pp. 2541–2556, May 2016.
[40] F. Steiner, A. Mezghani, A. Lee Swindlehurst, J. A. Nossek, and W. Utschick, “Turbo-like joint data-and-channel estimation in quantized massive MIMO systems,” in Proc. 20th Int. ITG Workshop Smart Antennas, Mar. 2016, pp. 1–5.
[41] S. Talwar, “Blind separation of synchronous co-channel digital signals using an antenna array–part II: performance analysis,” IEEE Trans. Signal Process., vol. 45, no. 3, pp. 706–718, Mar. 1997.
[42] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, 2017.
[43] K. Takeuchi, “Rigorous dynamics of expectation-propagation-based signal recovery from unitarily invariant measurements,” IEEE Trans. Inf. Theory, vol. 66, no. 1, pp. 368–386, Jan. 2020.
[44] S. L. Loyka, “Channel capacity of MIMO architecture using the exponential correlation matrix,” IEEE Commun. Lett., vol. 5, no. 9, pp. 369–371, Sep. 2001.
[45] Z. Xue, J. Ma, and X. Yuan, “Denoising-based turbo compressed sensing,” IEEE Access, vol. 5, pp. 7193–7204, 2017.
[46] P. Liu and I.-M. Kim, “Exact and closed-form error performance analysis for hard MMSE-SIC detection in MIMO systems,” IEEE Trans. Commun., vol. 59, no. 9, pp. 2463–2477, Sep. 2011.
[47] 3GPP, “Technical specification group radio access network; study on 3D channel model for LTE (Release 12),” in Proc. 3rd Generation Partnership Project (3GPP), TR 36.873 V12.2.0, Jun. 2015. [Online]. Available: http://www.3gpp.org/dynareport/36873.htm
[48] S. Jaeckel, L. Raschkowski, K. Börner, and L. Thiele, “QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials,” IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242–3256, Mar. 2014.

Hengtao He (Student Member, IEEE) received the B.S. degree in communications engineering from the Nanjing University of Science and Technology, Nanjing, China, in 2015. He is currently working toward the Ph.D. degree in information and communications engineering with Southeast University, Nanjing, China, under the supervision of Prof. S. Jin. From October 2018 to January 2020, he was a Visiting Student with the Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA. His research interests include millimeter wave communications, massive MIMO, and machine learning for wireless communications. He was the recipient of the exemplary of IEEE WIRELESS COMMUNICATIONS LETTERS in 2019.
Chao-Kai Wen (Member, IEEE) received the Ph.D. degree from the Institute of Communications Engineering, National Tsing Hua University, Taiwan, in 2004. He was with the Industrial Technology Research Institute, Hsinchu, Taiwan and MediaTek Inc., Hsinchu, Taiwan, from 2004 to 2009. Since 2009, he has been with the National Sun Yat-sen University, Taiwan, where he is a Professor with the Institute of Communications Engineering. His research interests include the optimization in wireless multimedia networks.

Shi Jin (Senior Member, IEEE) received the B.S. degree in communications engineering from Guilin University of Electronic Technology, Guilin, China, in 1996, the M.S. degree from Nanjing University of Posts and Telecommunications, Nanjing, China, in 2003, and the Ph.D. degree in information and communications engineering from the Southeast University, Nanjing, China, in 2007. From June 2007 to October 2009, he was a Research Fellow with the Adastral Park Research Campus, University College London, London, U.K. He is currently with the faculty of the National Mobile Communications Research Laboratory, Southeast University. His research interests include space time wireless communications, random matrix theory, and information theory. He serves as an Associate Editor for the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE COMMUNICATIONS LETTERS, and IET COMMUNICATIONS. Dr. Jin and his coauthors were the recipients of the 2011 IEEE Communications Society Stephen O. Rice Prize Paper Award in the field of communication theory and a 2010 Young Author Best Paper Award by the IEEE Signal Processing Society.

Geoffrey Ye Li (Fellow, IEEE) is currently a Professor with the Georgia Institute of Technology, Atlanta, GA, USA. Before moving to Georgia Tech, he was with AT&T Labs-Research, Red Bank, NJ, USA, as a Senior and then a Principal Technical Staff Member, from 1996 to 2000, and a Postdoctoral Research Associate with the University of Maryland at College Park, MD, USA, from 1994 to 1996. He has authored/coauthored more than 500 journal and conference papers in addition to more than 40 granted patents. His publications have been cited around 40,000 times and he has been recognized as the World’s Most Influential Scientific Mind, also known as a Highly Cited Researcher, by Thomson Reuters almost every year. His research interests include statistical signal processing and machine learning for wireless communications. Dr. G. Y. Li was the recipient of the IEEE Fellow for his contributions to signal processing for wireless communications in 2005. He was also the recipient of several prestigious awards from IEEE Signal Processing Society (Donald G. Fink Overview Paper Award in 2017), IEEE Vehicular Technology Society (James Evans Avant Garde Award in 2013 and Jack Neubauer Memorial Award in 2014), and IEEE Communications Society (Stephen O. Rice Prize Paper Award in 2013, Award for Advances in Communication in 2017, and Edwin Howard Armstrong Achievement Award in 2019). He was also the recipient of the 2015 Distinguished Faculty Achievement Award from the School of Electrical and Computer Engineering, Georgia Tech. He has been involved in editorial activities for more than 20 technical journals, including the founding Editor-in-Chief of IEEE 5G Tech Focus and the founding Editor-in-Chief of IEEE JSAC Special Series on ML in Communications and Networking. He has organized and chaired many international conferences, including technical program Vice-Chair of IEEE ICC’03, technical program Co-Chair of IEEE SPAWC’11, General Chair of IEEE GlobalSIP’14, technical program Co-Chair of IEEE VTC’16 (Spring), General Co-Chair of IEEE VTC’19 (Fall), and General Co-Chair of IEEE SPAWC’20.