Machine Learning For Transmit Beamforming and Power Control
Nikos Sidiropoulos
University of Virginia
ECE Department
ML4COM Workshop @ ICC 2018, Kansas City MO, May 24, 2018
Introduction
Worst-case design: [Karipidis et al. 2008], [Zheng et al. 2008], [Tajer et al.
2011], [Song et al. 2012], [Huang et al. 2013], [Ma et al. 2017]
Downlink channels: bounded perturbations of a set of nominal channel
vectors
Metric: worst-case QoS w.r.t. all channel perturbations
Can result in a very conservative design
Outage-based Design
Prior approaches:
Postulate/fit a model for the underlying probability distribution
Use knowledge of distribution to minimize outage probability
NP-hard → Approximation algorithms, still computationally demanding
Our approach:
Knowledge of underlying distribution not required
Stochastic approximation - simple, online algorithms for directly
minimizing outage
Performs remarkably well, hard to analyze
Problem Statement
$y = h^H w s + n$
Problem Formulation
Equally applicable to single-group multicast beamforming [Ntranos et al.
2009]
Challenges
Key Idea
Reformulate as stochastic optimization problem
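$\min_{w \in W} \Pr\left(|w^H h|^2 < \gamma\right) = \mathbb{E}_h\left[\mathbb{I}_{\{|w^H h|^2 < \gamma\}}\right] \approx \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}_{\{|w^H h_t|^2 < \gamma\}}$
$\mathbb{I}_{\{f(x) < a\}} = \begin{cases} 1, & \text{if } f(x) < a \\ 0, & \text{otherwise} \end{cases}$ : Indicator function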
Interpretation: minimize total # outages over (“recent”) channel
“history” - very reasonable
Use stochastic approximation [Robbins-Monro 1951], [Shapiro et al. 2009]
Given most recent channel realization $h_t$
Update $w$ to minimize instantaneous cost function $\mathbb{I}_{\{|w^H h_t|^2 < \gamma\}}$
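The sample-average view of outage can be sketched in a few lines. This is a minimal illustration, assuming i.i.d. complex Gaussian channels and a fixed beamformer; the function name `empirical_outage` and the test data are hypothetical and not taken from the talk.

```python
import random

def empirical_outage(w, channels, gamma):
    """Fraction of channel samples h_t with |w^H h_t|^2 < gamma."""
    outages = 0
    for h in channels:
        # w^H h: conjugate the beamformer entries, then take the inner product.
        inner = sum(wi.conjugate() * hi for wi, hi in zip(w, h))
        if abs(inner) ** 2 < gamma:
            outages += 1
    return outages / len(channels)

# Hypothetical data: 1000 i.i.d. complex Gaussian channels, 4 antennas.
rng = random.Random(0)
channels = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
            for _ in range(1000)]
w = [0.5 + 0j] * 4  # unit-norm beamformer, chosen only for illustration
p_out = empirical_outage(w, channels, gamma=1.0)
```

The estimate is piecewise constant in `w`, which is why the slides that follow replace the indicator with smooth surrogates before taking gradients.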
Stochastic Approximation
Benefits:
Knowledge of channel distribution not required!
Online implementation
Low memory and computational footprint
Naturally robust to intermittent/stale feedback from the user
All channel vectors are statistically equivalent
Feedback requirements are considerably relaxed
Can also exploit feedback from “peer” users
“Collaborative Filtering/Beamforming”
Well suited for FDD systems
Can it work well for our non-convex, NP-hard problem?
Stochastic Approximation
Major roadblock:
Indicator function is non-convex, discontinuous
Proposed solution:
Approximate indicator function via smooth surrogates
Visualization of Surrogates
[Figure: the indicator function and its surrogates (PWM, smoothed PWM, sigmoid), plotted as f(w) from 0 to 1 against w from -3 to 3]
Nikos Sidiropoulos (University of Virginia)
ML for Tx BMF & PC 11 / 49
Learning to Beamform
Sigmoidal Approximation:
$u(\tilde{w}; \tilde{h}) := \frac{1}{1 + \exp\left(\|\tilde{H}^T \tilde{w}\|_2^2 - \gamma\right)}$
Continuously differentiable
PWM approximation:
Non-differentiable!
Solution: Apply Nesterov’s smoothing trick [Nesterov 2005]
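To make the sigmoidal surrogate concrete, the sketch below evaluates it next to the indicator it replaces. It assumes the squared norm in the sigmoid equals the received signal power $|w^H h|^2$ (the real-valued stacking is not spelled out on the slide), and the function names are illustrative.

```python
import math

def signal_power(w, h):
    """|w^H h|^2 for complex beamformer w and channel h."""
    inner = sum(wi.conjugate() * hi for wi, hi in zip(w, h))
    return abs(inner) ** 2

def indicator(power, gamma):
    """Outage indicator: 1 if the signal power falls below the threshold."""
    return 1.0 if power < gamma else 0.0

def sigmoid_surrogate(power, gamma):
    """Smooth surrogate 1 / (1 + exp(power - gamma)), evaluated stably."""
    z = power - gamma
    if z >= 0:
        e = math.exp(-z)          # avoids overflow for large positive z
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

# The surrogate tracks the indicator away from the threshold.
far_below = sigmoid_surrogate(0.0, 5.0)   # close to 1 (outage)
far_above = sigmoid_surrogate(10.0, 1.0)  # close to 0 (no outage)
```

Unlike the indicator, the surrogate has a nonzero gradient almost everywhere, which is what the online updates need.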
Online Algorithms
Online Gradient Descent (OGD)
Given realization $\xi_t$, define $f_t(x) := f(x; \xi_t)$
Update:
$x^{(t+1)} = \Pi_X\left(x^{(t)} - \alpha_t \nabla f_t(x^{(t)})\right), \ \forall\, t \in \mathbb{N}$
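A minimal sketch of the OGD update applied to the sigmoid surrogate follows. The assumptions are mine, not the talk's: the beamformer is stored as a real vector [Re w; Im w], the feasible set is a sum-power ball, and the step size decays as $\alpha_0 / \sqrt{t}$. All function names are hypothetical.

```python
import math
import random

def surrogate_and_grad(w, h_re, h_im, gamma):
    """Sigmoid surrogate and its gradient w.r.t. the real vector w = [Re w; Im w]."""
    r = h_re + h_im                      # w . r = Re(w^H h)
    q = h_im + [-c for c in h_re]        # w . q = Im(w^H h)
    a = sum(wi * ri for wi, ri in zip(w, r))
    b = sum(wi * qi for wi, qi in zip(w, q))
    z = a * a + b * b - gamma            # signal power minus threshold
    if z >= 0:
        e = math.exp(-z)
        u = e / (1.0 + e)
    else:
        u = 1.0 / (1.0 + math.exp(z))
    coef = -u * (1.0 - u)                # derivative of u w.r.t. signal power
    grad = [coef * 2.0 * (a * ri + b * qi) for ri, qi in zip(r, q)]
    return u, grad

def project_ball(w, radius):
    """Euclidean projection onto {w : ||w|| <= radius}."""
    norm = math.sqrt(sum(x * x for x in w))
    return w if norm <= radius else [x * radius / norm for x in w]

def ogd(channel_stream, w0, gamma, radius, alpha0):
    w = list(w0)
    for t, (h_re, h_im) in enumerate(channel_stream, start=1):
        _, g = surrogate_and_grad(w, h_re, h_im, gamma)
        alpha = alpha0 / math.sqrt(t)    # diminishing step size (assumed rule)
        w = project_ball([wi - alpha * gi for wi, gi in zip(w, g)], radius)
    return w

rng = random.Random(0)
n_ant = 4
stream = [([rng.gauss(0, 1) for _ in range(n_ant)],
           [rng.gauss(0, 1) for _ in range(n_ant)]) for _ in range(500)]
# Start away from w = 0, where the surrogate gradient vanishes.
w_final = ogd(stream, [0.1] * (2 * n_ant), gamma=1.0, radius=1.0, alpha0=0.5)
```

Each step touches only the latest channel sample, so memory use is independent of the length of the channel history.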
Online Variance Reduced Gradient (OVRG) [Frostig et al. 2015]
Streaming variant of SVRG [Johnson-Zhang 2013]
Proceeds in stages
At each stage $s \in [S]$, define “centering variable” $y_s$ from last stage
“Anchor” OGD iterates to gradient of $y_s$
$\mathbb{E}_\xi[\nabla f(y_s; \xi)]$ is unavailable; form surrogate via mini-batching
$\hat{g}_s := \frac{1}{k_s} \sum_{i \in [k_s]} \nabla f_i(y_s)$
Update:
$x_s^{(t+1)} = \Pi_X\left(x_s^{(t)} - \alpha_s^{(t)} \left(\nabla f_t(x_s^{(t)}) - \nabla f_t(y_s) + \hat{g}_s\right)\right), \ \forall\, t \in [T]$
Set $y_{s+1} = x_s^{(T+1)}$
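The staged structure of OVRG can be sketched generically as below. The toy objective $f(x; \xi) = \frac{1}{2}\|x - \xi\|^2$ is my own strongly convex stand-in, not the beamforming cost; the step size is held constant, and the callback names (`grad`, `sample`, `project`) are hypothetical.

```python
import math
import random

def ovrg(grad, sample, project, x0, num_stages, stage_len, batch_size, step):
    """Streaming variance-reduced gradient: stages anchored at a centering point y."""
    y = list(x0)
    for _ in range(num_stages):
        # Mini-batch surrogate for the expected gradient at the centering point.
        grads = [grad(y, sample()) for _ in range(batch_size)]
        g_hat = [sum(col) / batch_size for col in zip(*grads)]
        x = list(y)
        for _ in range(stage_len):
            xi = sample()
            gx, gy = grad(x, xi), grad(y, xi)
            # Variance-reduced direction: grad f_t(x) - grad f_t(y) + g_hat.
            x = project([xj - step * (a - b + c)
                         for xj, a, b, c in zip(x, gx, gy, g_hat)])
        y = x  # last iterate becomes the next centering point
    return y

# Toy problem (assumption): minimize E[0.5 * ||x - xi||^2], xi ~ N(target, I).
rng = random.Random(0)
target = [2.0, -1.0]

def sample():
    return [m + rng.gauss(0, 1) for m in target]

def grad(x, xi):
    return [xj - cj for xj, cj in zip(x, xi)]

def project(x, radius=10.0):
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= radius else [v * radius / norm for v in x]

y_final = ovrg(grad, sample, project, [0.0, 0.0],
               num_stages=3, stage_len=100, batch_size=200, step=0.1)
```

On this toy problem the per-sample noise cancels exactly inside a stage, which illustrates the anchoring idea; for the non-convex outage surrogates no such guarantee is implied.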
Learning to Beamform
Convergence?
According to theory:
OGD:
(a.s.) convergence to stationary point with diminishing step-size rule
[Razaviyayn et al. 2016]
Requires $f(\cdot; \xi)$ to have $L$-Lipschitz continuous gradients
OVRG:
Only established for strongly convex $f(\cdot; \xi)$ with constant step-sizes
[Frostig et al. 2015]
Extension to non-convex $f(\cdot; \xi)$ currently an open problem
To go by the book (or not)?
OGD: hard to estimate L; estimates too conservative to work well in
practice
OVRG: non-trivial to establish convergence
Use empirically chosen step-sizes; work well in simulations
Alternative approach:
$\max_{w \in W} w^H R w$
Online solution:
Sum-power constraints: Oja’s Algorithm [Oja 1982]
(a.s.) convergence to optimal solution
Per-antenna constraints: Stochastic SUM [Razaviyayn et al. 2016]
(a.s.) convergence to stationary point
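For the sum-power case, an Oja-style streaming update can be sketched as follows. I assume here that $R$ is the channel correlation matrix $\mathbb{E}[h h^H]$ (the slide does not define it), that the constraint is $\|w\|^2 = P$, and that the step size is $1/t$. The channel generator is synthetic.

```python
import math
import random

def oja_beamformer(channel_stream, w0, power=1.0):
    """Streaming estimate of the principal eigenvector of R = E[h h^H]."""
    scale = math.sqrt(power)
    w = list(w0)
    for t, h in enumerate(channel_stream, start=1):
        alpha = 1.0 / t  # diminishing step size (illustrative choice)
        # h^H w: correlation of the current beamformer with the new channel.
        proj = sum(hi.conjugate() * wi for hi, wi in zip(h, w))
        # Oja step: move w along (h h^H) w, then renormalize to ||w||^2 = power.
        w = [wi + alpha * hi * proj for wi, hi in zip(w, h)]
        norm = math.sqrt(sum(abs(wi) ** 2 for wi in w))
        w = [scale * wi / norm for wi in w]
    return w

# Synthetic channels (assumption): antenna 0 is much stronger than the others,
# so the principal eigenvector of R is close to the first unit vector.
rng = random.Random(1)

def draw_channel():
    gains = [3.0, 0.3, 0.3]
    return [g * complex(rng.gauss(0, 1), rng.gauss(0, 1)) for g in gains]

w_oja = oja_beamformer((draw_channel() for _ in range(2000)),
                       w0=[complex(1 / math.sqrt(3))] * 3)
```

Only one channel vector is held in memory at a time, and $R$ is never formed explicitly.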
Simulations
Setup:
Algorithms: Sigmoid OGD & OVRG, PWM OGD & OVRG, Online
Markov Approximation (OM-App)
Step-sizes: Diminishing rule for OGD, constant for OVRG
Iteration Number: fix maximum gradient budget for all methods
Smoothing parameter for PWM: $\mu = 10^{-3}$
For OVRG
Length of each stage: T = 1000
Mini-batch sizes:
$k_s = \begin{cases} 80, & s = 1 \\ 2 k_{s-1}, & k_s < 640 \\ 640, & \text{otherwise} \end{cases}$
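The mini-batch schedule amounts to doubling from 80 until a cap of 640 is reached. A short sketch, with a hypothetical function name:

```python
def minibatch_sizes(num_stages, k_first=80, k_max=640):
    """Mini-batch size per OVRG stage: start at k_first, double, cap at k_max."""
    sizes = []
    k = k_first
    for _ in range(num_stages):
        sizes.append(k)
        k = min(2 * k, k_max)
    return sizes

schedule = minibatch_sizes(6)  # [80, 160, 320, 640, 640, 640]
```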
Illustrative Example
[Figure: P(outage) for OM-App, Sigmoid OGD, Sigmoid OVRG, PWM OGD, and PWM OVRG]
Detailed Results
[Figure: P(outage) on a log scale versus threshold $\gamma^{1/2}$ (left panel, 1 to 10) and versus number of antennas (right panel, 50 to 200), for OM-App, Sigmoid OGD, Sigmoid OVRG, PWM OGD, and PWM OVRG]
Be bold!
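[Figure: interference channel with transmitters T1, T2, T3 using powers p1, p2, p3, receivers R1, R2, R3, and cross-link channels such as h13]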
Example: Formulation
$\mathrm{SINR}_k = \frac{|h_{kk}|^2 p_k}{\sum_{j \neq k} |h_{kj}|^2 p_j + \sigma_k^2}$
Example: Formulation
Maximize weighted system throughput:
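$\max_{p} f(p; h) = \sum_{k=1}^{K} \alpha_k \log\left(1 + \frac{|h_{kk}|^2 p_k}{\sum_{j \neq k} |h_{kj}|^2 p_j + \sigma_k^2}\right)$ s.t. $0 \le p_k \le P_{\max}$
$\alpha_k$: nonnegative weights
$P_{\max}$: max power allocated to each user
NP-hard problem [Luo-Zhang 2008]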
Lots of iterative algorithms in the literature deal with (generalized
versions of) this problem, e.g., SCALE [Papandriopoulos et al 09],
Pricing [Shi et al 08], WMMSE [Shi et al 11], BSUM [Hong et al 14];
See [Schmidt et al 13] for comparison of different algorithms
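The objective is cheap to evaluate for any given power vector, which is what makes it usable as a training or evaluation metric. The sketch below computes the per-user SINR and the weighted sum rate; the base-2 logarithm (rates in bits) and the function names are my assumptions.

```python
import math

def sinr(h, p, sigma2, k):
    """SINR of user k; h[k][j] is the channel from transmitter j to receiver k."""
    interference = sum(abs(h[k][j]) ** 2 * p[j]
                       for j in range(len(p)) if j != k)
    return abs(h[k][k]) ** 2 * p[k] / (interference + sigma2[k])

def weighted_sum_rate(h, p, sigma2, alpha):
    """f(p; h) = sum_k alpha_k * log2(1 + SINR_k)."""
    return sum(alpha[k] * math.log2(1.0 + sinr(h, p, sigma2, k))
               for k in range(len(p)))

# Two users, unit direct and cross gains, unit noise, full power.
h = [[1.0, 1.0], [1.0, 1.0]]
rate_both_on = weighted_sum_rate(h, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
rate_one_on = weighted_sum_rate(h, [1.0, 0.0], [1.0, 1.0], [1.0, 1.0])
```

In this small example switching one user off removes interference for the other, which hints at why the problem is non-convex in `p`.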
Introduction
Literature Review
Proposed Method
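[Figure: an optimization algorithm maps a problem instance to an optimized solution (accurate but slow). A learner $\tau(\cdot; \theta)$ is trained on the same problem instance against the desired solution by minimizing the error, and then maps a problem instance to a solution directly (accurate and fast)]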
max(·,0)
Recent advances
New initialization methods [Hinton et al. (2012)]
New training algorithms: ADAM [Kingma and Ba (2014)], RMSprop
[Hinton et al. (2012)], ...
New hardware: CPU clusters, GPUs, TPUs, ...
[Figure: x (optimal solution) versus h (problem parameter), comparing GD and DNN]
Figure: The learned model (the red line) with random initialization
Learning the mapping $h \to x^T$?
Issue: Cannot learn the behavior of the algorithm well
Three-layer DNN; 50K training samples
Proposed Method: Learning to Optimize
[Figure: x (optimal solution) versus h (problem parameter), comparing GD and DNN]
$x^{k+1} = g_k(x^k, h)$
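A small sketch of an iterative algorithm written in the form $x^{k+1} = g_k(x^k, h)$ may help here. The objective $f(x; h) = (x^2 - h)^2$ is a toy choice of mine, not necessarily the one behind the plots; it has two minimizers $\pm\sqrt{h}$, so the final iterate depends on the initialization as well as on $h$.

```python
def gd_map(h, x0, step=0.1, num_iters=200):
    """Run gradient descent on the toy objective f(x; h) = (x^2 - h)^2."""
    x = x0
    for _ in range(num_iters):
        grad = 4.0 * x * (x * x - h)   # df/dx
        x = x - step * grad            # x^{k+1} = g(x^k, h)
    return x

# Same problem parameter h, different initializations, different solutions.
x_pos = gd_map(0.25, x0=0.3)    # approaches +0.5
x_neg = gd_map(0.25, x0=-0.3)   # approaches -0.5
```

With a random initialization the map from `h` alone to the final iterate is not a single-valued function, which is one plausible reading of why a network trained on such data fits the algorithm poorly.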
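$\max_{p} f(p; h) = \sum_{k=1}^{K} \alpha_k \log\left(1 + \frac{|h_{kk}|^2 p_k}{\sum_{j \neq k} |h_{kj}|^2 p_j + \sigma_k^2}\right)$ s.t. $0 \le p_k \le P_{\max}$
$\alpha_k$: nonnegative weights
$P_{\max}$: max power allocated to each user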
Transform the problem into one with three sets of variables (v, u, w)
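[Figure: training setup. Channel realizations $\{H^{(i)}\}$ are fed both to WMMSE, which produces powers $\{p_k^i\}$, and to the network $\tau(\cdot; \theta)$, which produces the output $\{\tilde{p}_k^i\}$; the error between the two is used for training]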
[Figure: two network layouts showing user and base station locations; x axis position (meter), spanning about -2500 to 2500 in the left panel and -500 to 500 in the right panel]
IMAC - Results
[Figure: left panel, number of samples versus sum-rate (bit/sec) for WMMSE and DNN; right panel, cumulative probability versus sum-rate (bit/sec) for WMMSE, DNN, Max Power, and Random Power]
Key observations:
Increasing the number of training samples helps
Increasing the number of neurons helps
Two very (NP-)hard problems: BMF for min outage; max sum rate
power control for multiuser interference channel
Boldly using ML (staples): SGD, DNN, ...
Some things we can prove, design currently an art, not difficult to
tune
As engineers, we have to appreciate opportunities, understand why
Updates
Lee et al, Deep Power Control: Transmit Power Control Scheme Based
on Convolutional Neural Network, IEEE Communications Letters
(2018) extend our approach, using sum rate for training in second
stage (can improve upon WMMSE);
de Kerret et al, Decentralized Deep Scheduling for Interference
Channels, arXiv:1711.00625 (2017) consider user scheduling for the IC
using multiple collaboratively trained DNNs
Thank You!
References
F. Rashid-Farrokhi, L. Tassiulas, and K. J. R. Liu, “Joint optimal power control
and beamforming in wireless networks using antenna arrays,” IEEE Trans.
Commun., vol. 46, no. 10, pp. 1313-1324, Oct. 1998.
M. Bengtsson, and B. Ottersten, “Optimal and suboptimal transmit
beamforming,” in Handbook of Antennas in Wireless Communications, L. C.
Godara, Ed. Boca Raton, FL, USA: CRC Press, Aug. 2001, ch. 18.
M. J. Lopez, “Multiplexing, scheduling, and multicasting strategies for antenna
arrays in wireless networks,” Ph.D. dissertation, Elect. Eng. and Comp. Sci.
Dept., MIT, Cambridge, MA, 2002.
N. D. Sidiropoulos, T. Davidson, and Z.-Q. Luo, “Transmit beamforming for
physical-layer multicasting,” IEEE Trans. Signal Process., vol. 54, no. 6, pp.
2239–2251, June 2006.
E. Karipidis, N. D. Sidiropoulos, and Z.-Q. Luo, “Quality of service and
max-min-fair transmit beamforming to multiple co-channel multicast groups,”
IEEE Trans. Signal Process., vol. 56, no. 3, pp. 1268–1279, Mar. 2008.
G. Zheng, K.-K. Wong, and T.-S. Ng, “Robust linear MIMO in the downlink: A
worst-case optimization with ellipsoidal uncertainty regions,” EURASIP J. Adv.
Signal Process., vol. 2008, pp. 1-15, June 2008.
Case Study: Resource Management for Wireless Networks
A. Tajer, N. Prasad, and X. Wang, “Robust linear precoder design for multi-cell
downlink transmission,” IEEE Trans. Signal Process., vol. 59, no. 1, pp. 235-251,
Jan. 2011.
E. Song, Q. Shi, M. Sanjabi, R.-Y. Sun, and Z.-Q. Luo, “Robust SINR constrained
MISO downlink beamforming: When is semidefinite programming relaxation
tight?,” EURASIP J. Wireless Commun. Netw., vol. 1, no. 1, pp. 1-11, Dec.
2012.
Y. Huang, D. P. Palomar, and S. Zhang, “Lorentz-positive maps and quadratic
matrix inequalities with applications to robust MISO transmit beamforming,” IEEE
Trans. Signal Process., vol. 61, no. 5, pp. 1121-1130, Mar. 2013.
W.-K. Ma, J. Pan, A. M.-C. So, and T.-H. Chang, “Unraveling the rank-one
solution mystery of robust MISO downlink transmit optimization: A verifiable
sufficient condition via a new duality result”, IEEE Trans. Signal Process., vol. 65,
no. 7, pp. 1909-1924, Apr. 2017.
Y. Xie, C. N. Georghiades, and A. Arapostathis, “Minimum outage probability
transmission with imperfect feedback for MISO fading channels,” IEEE Trans.
Wireless Commun., vol. 4, no. 3, pp. 1084–1091, May 2005.
S. A. Vorobyov, H. Chen, and A. B. Gershman, “On the relationship between
robust minimum variance beamformers with probabilistic and worst-case
distortionless response constraints,” IEEE Trans. Signal Process., vol. 56, pp.
5719–5724, Nov. 2008.
V. Ntranos, N. D. Sidiropoulos, and L. Tassiulas, “On multicast beamforming for
minimum outage”, IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 3172–3181,
June 2009.
K.-Y. Wang, A. M.-C. So, T.-H. Chang, W.-K. Ma, and C.-Y. Chi, “Outage
constrained robust transmit optimization for multiuser MISO downlinks: Tractable
approximations by conic optimization,” IEEE Trans. Signal Process., vol. 62, no.
21, pp. 5690-5705, Sep. 2014.
X. He and Y.-C. Wu, “Tight probabilistic SINR constrained beamforming under
channel uncertainties,” IEEE Trans. Signal Process., vol. 63, no. 13, pp.
3490–3505, July 2015.
F. Sohrabi and T. N. Davidson, “Coordinate update algorithms for robust power
loading for the MU-MISO downlink with outage constraints,” IEEE Trans. Signal
Process., vol. 64, no. 11, pp. 2761-2773, June 2016.
H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math.
Statist., vol. 22, no. 3, pp. 400-407, Sep. 1951.
Gregor, Karol, and Yann LeCun. “Learning fast approximations of sparse
coding.” Proceedings of the 27th International Conference on Machine
Learning. 2010.
Daubechies, Ingrid, Michel Defrise, and Christine De Mol. “An iterative
thresholding algorithm for linear inverse problems with a sparsity constraint.”
Communications on pure and applied mathematics 57.11 (2004): 1413-1457.
Sprechmann, Pablo, et al. “Supervised sparse analysis and synthesis
operators.” Advances in Neural Information Processing Systems. 2013.
Hershey, John R., Jonathan Le Roux, and Felix Weninger. “Deep unfolding:
Model-based inspiration of novel deep architectures.” arXiv preprint
arXiv:1409.2574 (2014).
Andrychowicz, Marcin, et al. “Learning to learn by gradient descent by
gradient descent.” Advances in Neural Information Processing Systems.
2016.
Li, Ke, and Jitendra Malik. “Learning to optimize.” arXiv preprint
arXiv:1606.01885 (2016).
Liang, Shiyu, and R. Srikant. “Why Deep Neural Networks for Function
Approximation?” (2016).
Kingma, Diederik, and Jimmy Ba. “Adam: A method for stochastic
optimization.” arXiv preprint arXiv:1412.6980 (2014).
Mhaskar, Hrushikesh, Qianli Liao, and Tomaso Poggio. “Learning functions:
When is deep better than shallow.” arXiv preprint arXiv:1603.00988 (2016).
Luo, Zhi-Quan, and Shuzhong Zhang. “Dynamic spectrum management:
Complexity and duality.” IEEE Journal of Selected Topics in Signal
Processing 2.1 (2008): 57-73.
Shi, Qingjiang, et al. “An iteratively weighted MMSE approach to
distributed sum-utility maximization for a MIMO interfering broadcast
channel.” IEEE Transactions on Signal Processing 59.9 (2011): 4331-4340.
Gjendemsjø, Anders, et al. “Binary power control for sum rate maximization
over multiple interfering links.” IEEE Transactions on Wireless
Communications (2008).
Hinton, Geoffrey, Nitish Srivastava, and Kevin Swersky. “Neural Networks
for Machine Learning Lecture 6a Overview of mini-batch gradient descent.”
(2012).
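[Figure: training stage and testing stage. The input channel matrix is $H \in \mathbb{R}^{\frac{K}{2} \times \frac{K}{2}}$ (zero-padded) or $H \in \mathbb{R}^{K \times K}$, and is passed to the algorithm and to the DNN]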
Half-user results