
Machine learning for transmit beamforming and power control

Nikos Sidiropoulos

University of Virginia
ECE Department

ML4COM Workshop @ ICC 2018, Kansas City MO, May 24, 2018



Co-Authors

Haoran Sun (UMN), Xiangyi Chen (UMN), Qingjiang Shi (NUAA), Yunmei Shi (Harbin IT)

Aritra Konar (UVA), Xiao Fu (Oregon State), Mingyi Hong (UMN)



Learning to Beamform

Introduction

Transmit Beamforming: [Farrokhi et al. 1998 (multiuser)], [Bengtsson-Ottersten 2001], [Sidiropoulos et al. 2006 (multicast)]

1. Exploits CSI at the base station (BS) to provide QoS and enhance throughput in multi-antenna wireless systems
2. Exact CSIT cannot be obtained in practice
3. Acquiring accurate CSIT is a burden, especially for FDD and high mobility
4. Alternative: robust beamformer design, i.e., optimize a robust performance metric w.r.t. channel uncertainty


Learning to Beamform

Robust Design: Prior Art

Worst-case design: [Karipidis et al. 2008], [Zheng et al. 2008], [Tajer et al. 2011], [Song et al. 2012], [Huang et al. 2013], [Ma et al. 2017]
Downlink channels: bounded perturbations of a set of nominal channel vectors
Metric: worst-case QoS w.r.t. all channel perturbations
Can result in a very conservative design

Outage-based design: [Xie et al. 2005], [Vorobyov et al. 2008], [Ntranos et al. 2009], [Wang et al. 2014], [He-Wu 2015], [Sohrabi-Davidson 2016]
Downlink channels: random vectors from an underlying distribution
Metric: QoS exceeds a pre-specified threshold with high probability
Vary the level of conservativeness by changing the threshold
Approach adopted here


Learning to Beamform

Outage-based Design

Prior approaches:
Postulate/fit a model for the underlying probability distribution
Use knowledge of the distribution to minimize outage probability
NP-hard → approximation algorithms, still computationally demanding

Our approach:
Knowledge of the underlying distribution is not required
Stochastic approximation: simple, online algorithms for directly minimizing outage
Performs remarkably well in practice, although hard to analyze


Learning to Beamform

Problem Statement

Point-to-point MISO link:


BS equipped with $N$ transmit antennas
Received signal at the user:
$$y = \mathbf{h}^H \mathbf{w}\, s + n$$
QoS: (normalized) receive SNR $= |\mathbf{w}^H \mathbf{h}|^2$

Assumption: temporal variations of $\mathbf{h} \in \mathbb{C}^N$ are realizations of an underlying distribution
Example: Gaussian mixture model (GMM) [Ntranos et al. 2009]
Interpretation: each Gaussian kernel corresponds to a different channel state


Learning to Beamform

Problem Formulation

Minimize outage probability subject to power constraints:


  
$$\min_{\mathbf{w} \in \mathcal{W}} \; F(\mathbf{w}) := \Pr\left[ |\mathbf{w}^H \mathbf{h}|^2 < \gamma \right]$$

$\mathcal{W} \subset \mathbb{C}^N$: set of power constraints
"simple" (easy to project onto), convex, compact
Examples: per-antenna power constraints, sum-power constraints
$\gamma \in \mathbb{R}_+$: outage threshold


Learning to Beamform

Problem Formulation
Equally applicable to single-group multicast beamforming [Ntranos et al. 2009]


Learning to Beamform

Challenges

Non-convex problem, NP–hard [Ntranos et al. 2009]


Approximate minimization via simple algorithms?
Only for specific cases [Ntranos et al. 2009]
Extension to general case requires computing cumbersome integrals

Who tells you the channel distribution?


Not available in practice!
Use data-driven approach instead?



Learning to Beamform

Key Idea
Reformulate as a stochastic optimization problem:

$$\min_{\mathbf{w} \in \mathcal{W}} \Pr\left[ |\mathbf{w}^H \mathbf{h}|^2 < \gamma \right] = \mathbb{E}_{\mathbf{h}}\left[ \mathbb{I}_{\{|\mathbf{w}^H \mathbf{h}|^2 < \gamma\}} \right] \approx \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}_{\{|\mathbf{w}^H \mathbf{h}_t|^2 < \gamma\}}$$

where
$$\mathbb{I}_{\{f(x) < a\}} = \begin{cases} 1, & \text{if } f(x) < a \\ 0, & \text{otherwise} \end{cases} \quad \text{(indicator function)}$$

Interpretation: minimize the total number of outages over the ("recent") channel "history", which is very reasonable
Use stochastic approximation [Robbins-Monro 1951], [Shapiro et al. 2009]:
Given the most recent channel realization $\mathbf{h}_t$, update $\mathbf{w}$ to minimize the instantaneous cost function $\mathbb{I}_{\{|\mathbf{w}^H \mathbf{h}_t|^2 < \gamma\}}$
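A minimal sketch of the empirical objective on the right-hand side (function and variable names are ours, for illustration only):

```python
import numpy as np

def empirical_outage(w, H_hist, gamma):
    """Fraction of recent channels h_t with |w^H h_t|^2 < gamma.

    w      : (N,) complex beamforming vector
    H_hist : (T, N) complex array, one channel realization per row
    gamma  : outage (SNR) threshold
    """
    snr = np.abs(H_hist.conj() @ w) ** 2   # |w^H h_t|^2 for each t
    return np.mean(snr < gamma)            # empirical outage frequency
```

Stochastic approximation then replaces this batch average by one-sample updates, processing each $\mathbf{h}_t$ as it arrives.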


Learning to Beamform

Stochastic Approximation

Benefits:
Knowledge of channel distribution not required!
Online implementation
Low memory and computational footprint
Naturally robust to intermittent/stale feedback from the user
All channel vectors are statistically equivalent
Feedback requirements are considerably relaxed
Can also exploit feedback from “peer” users
“Collaborative Filtering/Beamforming”
Well suited for FDD systems
Can it work well for our non-convex, NP-hard problem?



Learning to Beamform

Stochastic Approximation
Major roadblock:
Indicator function is non-convex, discontinuous
Proposed solution:
Approximate indicator function via smooth surrogates

Figure: Visualization of the surrogates: the indicator function together with the PWM, smoothed PWM, and sigmoid approximations, plotted as functions of $w$.
Learning to Beamform

Construction of smooth surrogates

Transformation to the real domain:

Define $\tilde{\mathbf{w}} := [\Re(\mathbf{w})^T, \Im(\mathbf{w})^T]^T \in \mathbb{R}^{2N}$ and $\tilde{\mathbf{h}} := [\Re(\mathbf{h})^T, \Im(\mathbf{h})^T]^T \in \mathbb{R}^{2N}$
Define
$$\tilde{\mathbf{H}} := \begin{bmatrix} \Re(\mathbf{h}) & \Im(\mathbf{h}) \\ \Im(\mathbf{h}) & -\Re(\mathbf{h}) \end{bmatrix} \in \mathbb{R}^{2N \times 2}$$

In terms of the real variables:
Indicator function $f(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) := \mathbb{I}_{\{\|\tilde{\mathbf{H}}^T \tilde{\mathbf{w}}\|_2^2 < \gamma\}}$
Constraint set $\tilde{\mathcal{W}} \subset \mathbb{R}^{2N}$
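A quick numerical sanity check of the transformation (a sketch; all names are ours): $\|\tilde{\mathbf{H}}^T \tilde{\mathbf{w}}\|_2^2$ should equal $|\mathbf{w}^H \mathbf{h}|^2$ exactly.

```python
import numpy as np

def realify(w, h):
    """Map complex (w, h) to the real-domain pair (w_tilde, H_tilde)."""
    w_t = np.concatenate([w.real, w.imag])                 # (2N,)
    H_t = np.block([[h.real[:, None],  h.imag[:, None]],
                    [h.imag[:, None], -h.real[:, None]]])  # (2N, 2)
    return w_t, H_t

rng = np.random.default_rng(0)
N = 4
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
w_t, H_t = realify(w, h)
# ||H~^T w~||^2 recovers |w^H h|^2, so the indicator is unchanged
assert np.isclose(np.linalg.norm(H_t.T @ w_t) ** 2, np.abs(w.conj() @ h) ** 2)
```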


Learning to Beamform

Construction of smooth surrogates

Sigmoidal approximation:
$$u(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) := \frac{1}{1 + \exp\left( \|\tilde{\mathbf{H}}^T \tilde{\mathbf{w}}\|_2^2 - \gamma \right)}$$
Continuously differentiable

Point-wise max (PWM) approximation:
$$v(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) := \max\left\{ 0, \; 1 - \frac{\|\tilde{\mathbf{H}}^T \tilde{\mathbf{w}}\|_2^2}{\gamma} \right\} = \max_{0 \le y \le 1} \; y \left( 1 - \frac{\|\tilde{\mathbf{H}}^T \tilde{\mathbf{w}}\|_2^2}{\gamma} \right)$$
Non-differentiable!
Solution: apply Nesterov's smoothing trick [Nesterov 2005]


Learning to Beamform

Construction of smooth surrogates


Smoothed point-wise max approximation:
Define a smoothing parameter $\mu \in \mathbb{R}_+$ and $g(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) := 1 - \frac{\|\tilde{\mathbf{H}}^T \tilde{\mathbf{w}}\|_2^2}{\gamma}$
Consider the modified PWM function
$$v^{(\mu)}(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) = \max_{0 \le y \le 1} \left\{ y\, g(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) - \frac{\mu}{2} y^2 \right\} = \begin{cases} 0, & g(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) < 0 \\ \frac{1}{2\mu}\, g(\tilde{\mathbf{w}}; \tilde{\mathbf{h}})^2, & 0 \le g(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) \le \mu \\ g(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) - \frac{\mu}{2}, & g(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) > \mu \end{cases}$$
Continuously differentiable!
Furthermore,
$$v^{(\mu)}(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) \le v(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) \le v^{(\mu)}(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) + \frac{\mu}{2}, \quad \forall\, (\tilde{\mathbf{w}}, \tilde{\mathbf{h}}) \quad \text{[Nesterov 2005]}$$
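The three surrogates side by side, written as scalar functions of $s = \|\tilde{\mathbf{H}}^T \tilde{\mathbf{w}}\|_2^2$ (a sketch with illustrative names; $\mu = 10^{-3}$ matches the value used in the simulations later):

```python
import numpy as np

def sigmoid_surrogate(s, gamma):
    return 1.0 / (1.0 + np.exp(s - gamma))

def pwm_surrogate(s, gamma):
    return np.maximum(0.0, 1.0 - s / gamma)   # kink at s = gamma: non-differentiable

def smoothed_pwm_surrogate(s, gamma, mu=1e-3):
    g = 1.0 - s / gamma
    # piecewise form of v^(mu): zero / quadratic / linear, per the derivation above
    return np.where(g < 0.0, 0.0,
           np.where(g <= mu, g ** 2 / (2.0 * mu), g - mu / 2.0))
```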
Learning to Beamform

Putting it all together


Modified problem(s):
$$\min_{\tilde{\mathbf{w}} \in \tilde{\mathcal{W}}} \; U(\tilde{\mathbf{w}}) := \mathbb{E}_{\tilde{\mathbf{h}}}\left[ u(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) \right] \quad \text{[Sigmoidal approx.]}$$
$$\min_{\tilde{\mathbf{w}} \in \tilde{\mathcal{W}}} \; V^{(\mu)}(\tilde{\mathbf{w}}) := \mathbb{E}_{\tilde{\mathbf{h}}}\left[ v^{(\mu)}(\tilde{\mathbf{w}}; \tilde{\mathbf{h}}) \right] \quad \text{[Smoothed PWM approx.]}$$

Represent both via the problem
$$\min_{\mathbf{x} \in \mathcal{X}} \; \mathbb{E}_{\boldsymbol{\xi}}\left[ f(\mathbf{x}; \boldsymbol{\xi}) \right]$$

$\mathcal{X} \subset \mathbb{R}^d$: convex, compact, and simple
$\boldsymbol{\xi}$: random vector drawn from an unknown probability distribution with support set $\Xi \subset \mathbb{R}^d$
$f(\cdot; \boldsymbol{\xi})$: non-convex, continuously differentiable
Minimize by sequentially processing a stream of realizations $\{\boldsymbol{\xi}_t\}_{t=0}^{\infty}$


Learning to Beamform

Online Algorithms
Online Gradient Descent (OGD)
Given realization $\boldsymbol{\xi}_t$, define $f_t(\mathbf{x}) := f(\mathbf{x}; \boldsymbol{\xi}_t)$
Update:
$$\mathbf{x}^{(t+1)} = \Pi_{\mathcal{X}}\left( \mathbf{x}^{(t)} - \alpha_t \nabla f_t(\mathbf{x}^{(t)}) \right), \quad \forall\, t \in \mathbb{N}$$

Online Variance Reduced Gradient (OVRG) [Frostig et al. 2015]
Streaming variant of SVRG [Johnson-Zhang 2013]; proceeds in stages
At each stage $s \in [S]$, define the "centering variable" $\mathbf{y}_s$ from the last stage
"Anchor" the OGD iterates to the gradient at $\mathbf{y}_s$
$\mathbb{E}_{\boldsymbol{\xi}}[\nabla f(\mathbf{y}_s; \boldsymbol{\xi})]$ is unavailable; form a surrogate via mini-batching:
$$\hat{\mathbf{g}}_s := \frac{1}{k_s} \sum_{i \in [k_s]} \nabla f_i(\mathbf{y}_s)$$
Update:
$$\mathbf{x}_s^{(t+1)} = \Pi_{\mathcal{X}}\left( \mathbf{x}_s^{(t)} - \alpha_s \left( \nabla f_t(\mathbf{x}_s^{(t)}) - \nabla f_t(\mathbf{y}_s) + \hat{\mathbf{g}}_s \right) \right), \quad \forall\, t \in [T]$$
Set $\mathbf{y}_{s+1} = \mathbf{x}_s^{(T+1)}$
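A sketch of one projected OGD step on the sigmoid surrogate, with per-antenna power constraints (the Wirtinger-gradient form and all names are our illustrative choices, not the authors' code):

```python
import numpy as np

def project_per_antenna(w, p_ant):
    """Project onto {w : |w_n|^2 <= p_ant}: clip each entry's magnitude."""
    mag = np.maximum(np.abs(w), 1e-12)
    return w * np.minimum(1.0, np.sqrt(p_ant) / mag)

def ogd_step(w, h, gamma, step, p_ant):
    s = np.abs(w.conj() @ h) ** 2              # instantaneous |w^H h|^2
    u = 1.0 / (1.0 + np.exp(s - gamma))        # sigmoid surrogate value
    # Wirtinger gradient of u wrt w: (du/ds) * (ds/dw*), with ds/dw* = h h^H w
    grad = -u * (1.0 - u) * (h * (h.conj() @ w))
    return project_per_antenna(w - step * grad, p_ant)
```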
Learning to Beamform

Convergence?

According to theory:
OGD:
(a.s.) convergence to a stationary point with a diminishing step-size rule [Razaviyayn et al. 2016]
Requires $f(\cdot; \boldsymbol{\xi})$ to have $L$-Lipschitz continuous gradients
OVRG:
Only established for strongly convex $f(\cdot; \boldsymbol{\xi})$ with constant step-sizes [Frostig et al. 2015]
Extension to non-convex $f(\cdot; \boldsymbol{\xi})$ is currently an open problem
To go by the book (or not)?
OGD: hard to estimate $L$; estimates too conservative to work well in practice
OVRG: non-trivial to establish convergence
Use empirically chosen step-sizes; they work well in simulations


Learning to Beamform

Baseline for comparison

Alternative approach:
$$\min_{\mathbf{w} \in \mathcal{W}} \Pr\left[ |\mathbf{w}^H \mathbf{h}|^2 < \gamma \right] \iff \max_{\mathbf{w} \in \mathcal{W}} \Pr\left[ |\mathbf{w}^H \mathbf{h}|^2 \ge \gamma \right]$$

Ideally: maximize a lower bound on the objective function
NP-hard to compute [Ntranos et al. 2009]
Construct a lower bound using moment information [He-Wu 2015]
Entails solving a non-trivial, non-convex problem
Not suitable for online approximation
Instead: use Markov's inequality to maximize an upper bound [Ntranos et al. 2009]:
$$\Pr\left[ |\mathbf{w}^H \mathbf{h}|^2 \ge \gamma \right] \le \gamma^{-1} \mathbf{w}^H \mathbf{R} \mathbf{w}, \quad \forall\, \mathbf{w} \in \mathcal{W}$$
where $\mathbf{R} := \mathbb{E}[\mathbf{h}\mathbf{h}^H]$ is the channel correlation matrix.


Learning to Beamform

Baseline for comparison

Online Markov Approximation
$$\max_{\mathbf{w} \in \mathcal{W}} \; \mathbf{w}^H \mathbf{R} \mathbf{w}$$

Online solution:
Sum-power constraints: Oja's algorithm [Oja 1982], with (a.s.) convergence to the optimal solution
Per-antenna constraints: stochastic SUM [Razaviyayn et al. 2016], with (a.s.) convergence to a stationary point
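A sketch of Oja's rule for the sum-power case (streaming principal-eigenvector tracking of $\mathbf{R} = \mathbb{E}[\mathbf{h}\mathbf{h}^H]$; names are illustrative):

```python
import numpy as np

def oja_step(w, h, eta):
    """One Oja update toward the principal eigenvector of R = E[h h^H]."""
    y = h.conj() @ w                              # h^H w
    w = w + eta * (h * y - (np.abs(y) ** 2) * w)  # stochastic power-iteration step
    return w / np.linalg.norm(w)                  # renormalize; scale by sqrt(P) at the end
```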


Learning to Beamform

Simulations

Setup:
Algorithms: Sigmoid OGD & OVRG, PWM OGD & OVRG, Online Markov Approximation (OM-App)
Step-sizes: diminishing rule for OGD, constant for OVRG
Iteration number: fixed maximum gradient budget for all methods
Smoothing parameter for PWM: $\mu = 10^{-3}$
For OVRG:
Length of each stage: $T = 1000$
Mini-batch sizes:
$$k_s = \begin{cases} 80, & s = 1 \\ 2k_{s-1}, & k_{s-1} < 640 \\ 640, & \text{otherwise} \end{cases}$$
Constraints: per-antenna ($-6$ dBW per antenna)
Channels: GMM with 4 kernels
Equal mixture probabilities
Mean of each kernel modeled using a different LOS component


Learning to Beamform

Illustrative Example

Figure: Outage probability vs. number of gradients / $K_s$ for OM-App, Sigmoid OGD, Sigmoid OVRG, PWM OGD, and PWM OVRG ($N = 100$, $\gamma = 4$, $K_s = 200$).


Learning to Beamform

Detailed Results

Figure: Outage probability vs. the threshold $\gamma^{1/2}$ (left) and vs. the number of antennas (right), for OM-App, Sigmoid OGD, Sigmoid OVRG, PWM OGD, and PWM OVRG.


Learning to Beamform

Intermezzo - take home points

Learning to beamform for minimum outage:

No prior knowledge of the channel distribution required at the BS
Reformulate as a stochastic optimization problem
Construct a smooth surrogate of the indicator function
Use simple stochastic-approximation algorithms driven by user feedback
Feedback can be intermittent/delayed/stale/from peer users
Works remarkably well in practice (the problem is NP-hard even for a known channel distribution!)
Future work: extension to general multi-user MIMO, and a better theoretical understanding of WHY it works that well


Learning to Beamform

Be bold!



Learning to Beamform

Part II: Resource Management for Wireless Networks

Wireless Resource Management


Tx power allocation to optimize throughput.

Figure: Three-link interference channel: transmitter $T_k$ sends with power $p_k$ to receiver $R_k$ over the direct channel $h_{kk}$, and interferes with other receivers over cross channels such as $h_{13}$.


Learning to Beamform

Example: Formulation

For each receiver $k$, the signal-to-interference-plus-noise ratio (SINR) is
$$\mathrm{SINR}_k = \frac{|h_{kk}|^2 p_k}{\sum_{j \ne k} |h_{kj}|^2 p_j + \sigma_k^2}$$

$h_{ij}$: elements of the channel matrix $\mathbf{H}$
$p_k$: power allocated to the $k$-th link (optimization variable)
$\sigma_k^2$: noise power at the $k$-th receiver


Learning to Beamform

Example: Formulation
Maximize the weighted system throughput:

$$\max_{\mathbf{p}} \; f(\mathbf{p}; \mathbf{h}) = \sum_{k=1}^{K} \alpha_k \log\left( 1 + \frac{|h_{kk}|^2 p_k}{\sum_{j \ne k} |h_{kj}|^2 p_j + \sigma_k^2} \right)$$
$$\text{s.t.} \quad 0 \le p_k \le P_{\max}, \quad \forall\, k = 1, 2, \ldots, K$$

$\alpha_k$: nonnegative weights
$P_{\max}$: maximum power allocated to each user
NP-hard problem [Luo-Zhang 2008]
Many iterative algorithms in the literature deal with (generalized versions of) this problem, e.g., SCALE [Papandriopoulos et al 09], Pricing [Shi et al 08], WMMSE [Shi et al 11], BSUM [Hong et al 14]; see [Schmidt et al 13] for a comparison of different algorithms (a sketch of the objective follows below)
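A sketch of the objective for reference (our conventions: `H[k, j]` holds $h_{kj}$, the channel from transmitter $j$ to receiver $k$; log base 2, so rates are in bits):

```python
import numpy as np

def weighted_sum_rate(p, H, sigma2, alpha):
    """Weighted sum rate f(p; h) for the K-user scalar interference channel."""
    G = np.abs(H) ** 2                  # channel gains |h_kj|^2
    signal = np.diag(G) * p             # |h_kk|^2 p_k
    interference = G @ p - signal       # sum_{j != k} |h_kj|^2 p_j
    sinr = signal / (interference + sigma2)
    return np.sum(alpha * np.log2(1.0 + sinr))
```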


Learning to Beamform

Introduction

Proposed Method: Learning to Optimize


Figure: Training stage: a slow but accurate algorithm maps each problem instance to an optimized solution, and the learner $\tau(\cdot; \theta)$ is trained to reproduce these solutions by tuning $\theta$ to minimize the error. Testing stage: the trained learner $\tau(\cdot; \theta)$ maps a new problem instance directly to the desired solution, fast and accurately.


Learning to Beamform

Literature Review

"Unfold" a specific iterative algorithm:

Gregor and LeCun (2010): iterative soft-thresholding algorithm (ISTA)
Gregor and LeCun (2010): coordinate descent algorithm (CD)
Sprechmann et al. (2013): alternating direction method of multipliers (ADMM)
Hershey et al. (2014): multiplicative updates for non-negative matrix factorization (NMF)

Drawbacks:
No theoretical approximation guarantees
Can we use fewer layers to approximate more iterations?


Learning to Beamform

Can we learn the entire algorithm?



Learning to Beamform

Proposed Method
Figure: Training stage: the slow, accurate algorithm produces optimized solutions, and the learner $\tau(\cdot; \theta)$ is tuned to minimize the error against them. Testing stage: $\tau(\cdot; \theta)$ maps problem instances directly to solutions.
Given lots of (h, x∗ ) pairs, learn the nonlinear “mapping” h → x∗


Questions:
How to choose “Learner”?
What kinds of algorithms can we accurately learn?
What’s the major benefit of such an approach?
Proposed Method: Learning to Optimize

Deep Neural Network

Figure: A deep neural network: an input layer, multiple hidden layers with $\max(\cdot, 0)$ (ReLU) activations, and an output layer.


Proposed Method: Learning to Optimize

Deep Neural Network

Difficult to train [Glorot and Bengio (2010)]:
The vanishing gradient problem
Traditional gradient descent does not work well

Recent advances:
New initialization methods [Hinton et al. (2012)]
New training algorithms: ADAM [Kingma and Ba (2014)], RMSprop [Hinton et al. (2012)], ...
New hardware: CPU clusters, GPUs, TPUs, ...

DNNs are more powerful than traditional NNs [Telgarsky (2016)]:
To achieve the same accuracy as a shallow neural network, a DNN can be exponentially faster in the testing stage [Mhaskar et al. (2016)]


Proposed Method: Learning to Optimize

Example: Approximate iterative algorithm

Figure: The learned model (red line) vs. gradient descent (GD): $x$ (optimal solution) plotted against $h$ (problem parameter), with random initialization.

Learning the mapping $h \to x_T$?
Issue: cannot learn the behavior of the algorithm well
Three-layer DNN; 50K training samples

Proposed Method: Learning to Optimize

Example: Approximate iterative algorithm


Reason: non-convexity results in multiple local solutions

Solution: add the initialization as a feature, i.e., learn the mapping $(x_0, h) \to x_T$

The model learned in this way:

Figure: The learned model (red line) vs. GD: $x$ (optimal solution) plotted against $h$ (problem parameter).




Proposed Method: Learning to Optimize

Universal approximation theorem for iterative algorithm

Theorem 1 [Sun et al 17]

Given a $T$-iteration algorithm whose input-output relationship is
$$\mathbf{x}_T = g_T\big(g_{T-1}(\ldots g_1(g_0(\mathbf{x}_0, \mathbf{h}), \mathbf{h}) \ldots, \mathbf{h}), \mathbf{h}\big) \triangleq G_T(\mathbf{x}_0, \mathbf{h}) \quad (1)$$
where $\mathbf{h}$ is the problem parameter, $\mathbf{x}_0$ is the initialization, and $g_k(\mathbf{x}_{k-1}; \mathbf{h})$ is a continuous mapping representing the algorithm at the $k$-th iteration.

Then for any $\epsilon > 0$, there exists a three-layer neural network $\mathrm{NET}_{N(\epsilon)}(\mathbf{x}_0, \mathbf{h})$ with $N(\epsilon)$ nodes in the hidden layer such that
$$\sup_{(\mathbf{x}_0, \mathbf{h}) \in \mathcal{X}_0 \times \mathcal{H}} \big\| \mathrm{NET}_{N(\epsilon)}(\mathbf{x}_0, \mathbf{h}) - G_T(\mathbf{x}_0, \mathbf{h}) \big\| \le \epsilon \quad (2)$$
where the parameter set $\mathcal{H}$ and the initialization set $\mathcal{X}_0$ are any compact sets.

Extension of the classical result [Cybenko (1989)]


Proposed Method: Learning to Optimize

Universal approximation theorem for iterative algorithm

Key point: it is possible to learn an iterative algorithm, represented by the mapping $(\mathbf{x}_0, \mathbf{h}) \to \mathbf{x}_T$

Assumptions on the algorithm: for an iterative algorithm
$$\mathbf{x}_{k+1} = g_k(\mathbf{x}_k, \mathbf{h})$$
where $\mathbf{h} \in \mathcal{H}$ is the problem parameter and $\mathbf{x}_k, \mathbf{x}_{k+1} \in \mathcal{X}$ are the optimization variables:
The function $g_k$ is a continuous mapping
$\mathcal{X}$ and $\mathcal{H}$ are compact sets


Case Study: Resource Management for Wireless Networks

Case Study: Resource Management for Wireless Networks

Maximize the weighted system throughput:

$$\max_{\mathbf{p}} \; f(\mathbf{p}; \mathbf{h}) = \sum_{k=1}^{K} \alpha_k \log\left( 1 + \frac{|h_{kk}|^2 p_k}{\sum_{j \ne k} |h_{kj}|^2 p_j + \sigma_k^2} \right)$$
$$\text{s.t.} \quad 0 \le p_k \le P_{\max}, \quad \forall\, k = 1, 2, \ldots, K$$

$\alpha_k$: nonnegative weights
$P_{\max}$: maximum power allocated to each user


Case Study: Resource Management for Wireless Networks

Case Study: Existing Methods

We will attempt to learn a popular method called Weighted Minimum Mean Square Error (WMMSE) [Shi et al. (2011)]
Transform the problem into one with three sets of variables $(\mathbf{v}, \mathbf{u}, \mathbf{w})$
Optimize in a coordinate descent manner


Case Study: Resource Management for Wireless Networks

Case Study: Existing Methods

Input: $\mathbf{H}$, $\{\sigma_k\}$, $P_{\max}$; output: $\{p_k\}$

1. Initialize $v_k^0$ such that $0 \le (v_k^0)^2 \le P_{\max}, \; \forall k$
2. Initialize $u_k^0 = \frac{|h_{kk}| v_k^0}{\sum_{j=1}^{K} |h_{kj}|^2 (v_j^0)^2 + \sigma_k^2}$ and $w_k^0 = \frac{1}{1 - u_k^0 |h_{kk}| v_k^0}, \; \forall k$
3. repeat
4. Update $v_k$: $v_k^t = \left[ \frac{\alpha_k w_k^{t-1} u_k^{t-1} |h_{kk}|}{\sum_{j=1}^{K} \alpha_j w_j^{t-1} (u_j^{t-1})^2 |h_{jk}|^2} \right]_0^{\sqrt{P_{\max}}}, \; \forall k$
5. Update $u_k$: $u_k^t = \frac{|h_{kk}| v_k^t}{\sum_{j=1}^{K} |h_{kj}|^2 (v_j^t)^2 + \sigma_k^2}, \; \forall k$
6. Update $w_k$: $w_k^t = \frac{1}{1 - u_k^t |h_{kk}| v_k^t}, \; \forall k$
7. until $\left| \sum_{j=1}^{K} \log w_j^t - \sum_{j=1}^{K} \log w_j^{t-1} \right| \le \epsilon$
8. Output $p_k = (v_k)^2, \; \forall k$

Figure: Pseudocode of WMMSE for the scalar IC.
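A direct, runnable transcription of the pseudocode above (a sketch, vectorized over users; `sigma2` may be a scalar or a length-$K$ array):

```python
import numpy as np

def wmmse(H, Pmax, sigma2, alpha, tol=1e-4, max_iter=200):
    """WMMSE power control for the K-user scalar IC; returns p_k = v_k^2."""
    G = np.abs(H) ** 2                          # |h_kj|^2
    g = np.sqrt(np.diag(G))                     # |h_kk|
    v = np.full(H.shape[0], np.sqrt(Pmax))      # step 1 (start from p_k = Pmax)
    u = g * v / (G @ v ** 2 + sigma2)           # step 2
    w = 1.0 / (1.0 - u * g * v)
    obj_prev = np.sum(np.log(w))
    for _ in range(max_iter):
        num = alpha * w * u * g                 # step 4, numerator
        den = G.T @ (alpha * w * u ** 2)        # sum_j alpha_j w_j u_j^2 |h_jk|^2
        v = np.clip(num / den, 0.0, np.sqrt(Pmax))
        u = g * v / (G @ v ** 2 + sigma2)       # step 5
        w = 1.0 / (1.0 - u * g * v)             # step 6
        obj = np.sum(np.log(w))
        if abs(obj - obj_prev) <= tol:          # step 7
            break
        obj_prev = obj
    return v ** 2                               # step 8
```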


Case Study: Resource Management for Wireless Networks

Case Study: Proposed Approach

Figure: Training stage: each channel sample $\{\mathbf{H}^{(i)}\}$ is fed both to WMMSE, producing $\{p_k^i\}$, and to the learner $\tau(\cdot; \theta)$, producing $\{\tilde{p}_k^i\}$; $\theta$ is tuned to minimize the error between the two. Testing stage: $\{\mathbf{H}^{(i)}\}$ is fed to the trained $\tau(\cdot; \theta)$, whose output $\{\tilde{p}_k^i\}$ is used directly.


Case Study: Resource Management for Wireless Networks

Approximation of WMMSE by deep neural networks

Theorem 2 [Sun et al 17]

Suppose WMMSE is initialized with $p_k = P_{\max}, \; \forall k$. Define
$$\mathcal{H} := \left\{ \mathbf{h} \;\middle|\; H_{\min} \le |h_{jk}| \le H_{\max}, \; \forall j, k; \quad \sum_{i=1}^{K} v_i^t(\mathbf{h}) \ge P_{\min} > 0, \; \forall t \right\}.$$
Given $\epsilon > 0$, there exists a neural network $\mathrm{NET}(\mathbf{h})$ consisting of
$$O\left( T^2 \log\left( \max\left( K, P_{\max}, H_{\max}, \tfrac{1}{\sigma}, \tfrac{1}{H_{\min}}, \tfrac{1}{P_{\min}} \right) \right) + T \log \tfrac{1}{\epsilon} \right) \text{ layers and}$$
$$O\left( T^2 K^2 \log\left( \max\left( K, P_{\max}, H_{\max}, \tfrac{1}{\sigma}, \tfrac{1}{H_{\min}}, \tfrac{1}{P_{\min}} \right) \right) + T K^2 \log \tfrac{1}{\epsilon} \right) \text{ ReLUs and binary units,}$$
such that
$$\max_{\mathbf{h} \in \mathcal{H}} \; \max_i \; \left| \left( v_i^T(\mathbf{h}) \right)^2 - \mathrm{NET}(\mathbf{h})_i \right| \le \epsilon.$$


Case Study: Resource Management for Wireless Networks

IMAC - Data Generation

For a problem with $N$ base stations and $K$ total users:
Channels are generated according to 3GPP standards
Fix the other values, i.e., $P_{\max} = 1$, $\sigma_k = 1$
Given the tuple $(\{\tilde{\mathbf{H}}^{(i)}\}, P_{\max}, \{\sigma_k\})$, run WMMSE to get $\{p_k^i\}, \; \forall i, k$
$10^6$ training samples $(\mathbf{H}^{(i)}, \{p_k^i\}), \; \forall i \in \mathcal{T}$
$10^4$ testing samples $\mathbf{H}^{(i)}, \; \forall i \in \mathcal{V}$
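A sketch of the labeling step, with i.i.d. Rayleigh channels standing in for the 3GPP IMAC model used in the talk (assumes the `wmmse()` sketch above; feeding the $|h_{kj}|$ magnitudes to the DNN is our assumption here):

```python
import numpy as np

def generate_dataset(n_samples, K, Pmax=1.0, sigma2=1.0, seed=0):
    """Build (input, label) pairs by running WMMSE on random channels."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(K)
    X, Y = [], []
    for _ in range(n_samples):
        H = (rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))) / np.sqrt(2)
        X.append(np.abs(H).ravel())              # DNN input: channel magnitudes
        Y.append(wmmse(H, Pmax, sigma2, alpha))  # label: WMMSE power allocation
    return np.array(X), np.array(Y)
```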


Case Study: Resource Management for Wireless Networks

IMAC - Training Stage

Training the Deep Neural Network

We pick a three-hidden-layer DNN with 200-80-80 neurons
Implemented in Python 3.6.0 with TensorFlow 1.0.0
Trained using two Nvidia K20 GPUs
Training is based on optimizing the loss function
$$\min_{\theta} \; \sum_{i \in \mathcal{T}} \left\| \tau(\mathbf{H}^{(i)}; \theta) - \{p_k^i\} \right\|^2$$
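A minimal training sketch for the 200-80-80 network with the squared-error loss above (PyTorch for brevity here; the talk's implementation used Python 3.6.0 with TensorFlow 1.0.0):

```python
import torch
import torch.nn as nn

def train_dnn(X, Y, epochs=10, lr=1e-3, batch=1000):
    """Fit tau(.; theta) to (channel, WMMSE power) pairs by MSE regression."""
    net = nn.Sequential(
        nn.Linear(X.shape[1], 200), nn.ReLU(),
        nn.Linear(200, 80), nn.ReLU(),
        nn.Linear(80, 80), nn.ReLU(),
        nn.Linear(80, Y.shape[1]),
    )
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    X_t = torch.as_tensor(X, dtype=torch.float32)
    Y_t = torch.as_tensor(Y, dtype=torch.float32)
    for _ in range(epochs):
        perm = torch.randperm(len(X_t))
        for i in range(0, len(X_t), batch):
            idx = perm[i:i + batch]
            loss = nn.functional.mse_loss(net(X_t[idx]), Y_t[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```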


Case Study: Resource Management for Wireless Networks

IMAC - Testing Stage

Testing the Deep Neural Network

DNN approach: implemented in Python
WMMSE algorithm: implemented in C
Testing uses only the CPU
Objective function:
$$f = \sum_{k=1}^{K} \log\left( 1 + \frac{|h_{kk}|^2 p_k}{\sum_{j \ne k} |h_{kj}|^2 p_j + \sigma_k^2} \right)$$
Evaluate the ratio of the per-sample sum-rates,
$$\frac{f(\mathbf{H}^{(i)}, \{\tilde{p}_k^i\}, \{\sigma_k\}) \;\; \text{(DNN)}}{f(\mathbf{H}^{(i)}, \{p_k^i\}, \{\sigma_k\}) \;\; \text{(WMMSE)}}, \quad \forall i$$


Case Study: Resource Management for Wireless Networks

IMAC - Larger Problem

Figure: IMAC user and base-station placements, $N = 20$, $K = 80$. Left: cell radius = 500 m, MD = 0 m. Right: cell radius = 100 m, MD = 20 m (axes: x/y position in meters).


Case Study: Resource Management for Wireless Networks

IMAC - Results

Figure: IMAC, $N = 20$, $K = 80$, radius = 100 m. Left: histogram of per-sample sum-rates (bit/sec) for WMMSE and DNN. Right: cumulative probability of the sum-rate for WMMSE, DNN, max power, and random power.


Case Study: Resource Management for Wireless Networks

IMAC - Larger Problem

Table: Relative CPU time and sum-rate for IMAC

network       training      sum-rate           computational time
structure     samples       r=500m   r=100m    r=500m   r=100m
200-200-200   2 million     98.44%   88.46%    0.7%     0.4%
200-200-200   1 million     97.03%   89.59%    0.7%     0.4%
200-80-80     2 million     95.58%   87.44%    0.6%     0.5%
200-80-80     1 million     95.39%   86.70%    0.6%     0.3%
200-80-80     0.5 million   95.39%   85.35%    0.6%     0.3%
200-80-80     0.1 million   94.71%   81.28%    0.6%     0.3%

Key observations:
Increasing the number of training samples helps
Increasing the number of neurons helps


Case Study: Resource Management for Wireless Networks

Problem Setup - VDSL channel


Figure: The VDSL setting cast as a 28-user IC problem: $K$ direct channels and $K(K-1)$ interference channels between transmitters and receivers.

Data collected by France Telecom R&D [Karipidis et al. (2005)]
Measured lengths: 75 meters, 150 meters, and 300 meters
Far-end crosstalk (FEXT) vs. near-end crosstalk (NEXT)
A total of 6955 channel measurements


Case Study: Resource Management for Wireless Networks

VDSL - Procedure & Results

6955 real measurements = 5000 validation + 1955 testing
50,000 training samples: computer-generated following the validation-set statistics
Same training and testing procedure as before

Table: Sum-rate and computational performance for measured VDSL data

                 sum-rate      computational time
(length, type)   DNN/WMMSE     DNN/WMMSE(C)
( 75, FEXT)      99.96%        42.18%
(150, FEXT)      99.43%        50.98%
(300, FEXT)      99.58%        57.78%
( 75, NEXT)      99.85%        3.16%
(150, NEXT)      98.31%        7.14%
(300, NEXT)      94.14%        5.52%


Case Study: Resource Management for Wireless Networks

Recap, take home, road forward

Two very (NP-)hard problems: beamforming for minimum outage, and max-sum-rate power control for the multiuser interference channel
Boldly using ML staples: SGD, DNN, ...
Some things we can prove; the design is currently an art, but not difficult to tune
As engineers, we have to appreciate the opportunities, and understand why these methods work
Updates:
Lee et al., "Deep Power Control: Transmit Power Control Scheme Based on Convolutional Neural Network," IEEE Communications Letters (2018), extend our approach, using the sum rate for training in a second stage (which can improve upon WMMSE)
de Kerret et al., "Decentralized Deep Scheduling for Interference Channels," arXiv:1711.00625 (2017), consider user scheduling for the IC using multiple collaboratively trained DNNs


Case Study: Resource Management for Wireless Networks

Thank You!

Paper: Y. Shi, A. Konar, N. D. Sidiropoulos, et al., "Learning to Beamform for Minimum Outage," under review.
Paper: H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, "Learning to Optimize: Training Deep Neural Networks for Wireless Resource Management," https://arxiv.org/abs/1705.09412
Code: https://github.com/Haoran-S/TSP-DNN


Case Study: Resource Management for Wireless Networks

References
F. Rashid-Farrokhi, L. Tassiulas, and K. J. R. Liu, “Joint optimal power control
and beamforming in wireless networks using antenna arrays,” IEEE Trans.
Commun., vol. 46, no. 10, pp. 1313-1324, Oct. 1998.
M. Bengtsson, and B. Ottersten, “Optimal and suboptimal transmit
beamforming,” in Handbook of Antennas in Wireless Communications, L. C.
Godara, Ed. Boca Raton, FL, USA: CRC Press, Aug. 2001, ch. 18.
M. J. Lopez, “Multiplexing, scheduling, and multicasting strategies for antenna
arrays in wireless networks,” Ph.D. dissertation, Elect. Eng. and Comp. Sci.
Dept., MIT, Cambridge, MA, 2002.
N. D. Sidiropoulos, T. Davidson, and Z.-Q. Luo, “Transmit beamforming for
physical-layer multicasting,” IEEE Trans. Signal Process., vol. 54, no. 6, pp.
2239–2251, June 2006.
E. Karipidis, N. D. Sidiropoulos, and Z.-Q. Luo, “Quality of service and
max-min-fair transmit beamforming to multiple co-channel multicast groups,”
IEEE Trans. Signal Process., vol. 56, no. 3, pp. 1268–1279, Mar. 2008.
G. Zheng, K.-K. Wong, and T.-S. Ng, “Robust linear MIMO in the downlink: A
worst-case optimization with ellipsoidal uncertainty regions,” EURASIP J. Adv.
Signal Process., vol. 2008, pp. 1-15, June 2008.
Case Study: Resource Management for Wireless Networks

References
A. Tajer, N. Prasad, and X. Wang, “Robust linear precoder design for multi-cell
downlink transmission,” IEEE Trans. Signal Process., vol. 59, no. 1, pp. 235-251,
Jan. 2011.
E. Song, Q. Shi, M. Sanjabi, R.-Y. Sun, and Z.-Q. Luo, “Robust SINR constrained
MISO downlink beamforming: When is semidefinite programming relaxation
tight?,” EURASIP J. Wireless Commun. Netw., vol. 1, no. 1, pp. 1-11, Dec.
2012.
Y. Huang, D. P. Palomar, and S. Zhang, “Lorentz-positive maps and quadratic
matrix inequalities with applications to robust MISO transmit beamforming,” IEEE
Trans. Signal Process., vol. 61, no. 5, pp. 1121-1130, Mar. 2013.
W.-K. Ma, J. Pan, A. M.-C. So, and T.-H. Chang, “Unraveling the rank-one
solution mystery of robust MISO downlink transmit optimization: A verifiable
sufficient condition via a new duality result”, IEEE Trans. Signal Process., vol. 65,
no. 7, pp. 1909-1924, Apr. 2017.
Y. Xie, C. N. Georghiades, and A. Arapostathis, “Minimum outage probability
transmission with imperfect feedback for MISO fading channels,” IEEE Trans.
Wireless Commun., vol. 4, no. 3, pp. 1084–1091, May 2005.
Case Study: Resource Management for Wireless Networks

References
S. A. Vorobyov, H. Chen, and A. B. Gershman, "On the relationship between robust minimum variance beamformers with probabilistic and worst-case distortionless response constraints," IEEE Trans. Signal Process., vol. 56, pp. 5719–5724, Nov. 2008.
V. Ntranos, N. D. Sidiropoulos, and L. Tassiulas, “On multicast beamforming for
minimum outage”, IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 3172–3181,
June 2009.
K.-Y. Wang, A. M.-C. So, T.-H. Chang, W.-K. Ma, and C.-Y. Chi, “Outage
constrained robust transmit optimization for multiuser MISO downlinks: Tractable
approximations by conic optimization,” IEEE Trans. Signal Process., vol. 62, no.
21, pp. 5690-5705, Sep. 2014.
X. He and Y.-C. Wu, “Tight probabilistic SINR constrained beamforming under
channel uncertainties,” IEEE Trans. Signal Process., vol. 63, no. 13, pp.
3490–3505, July 2015.
F. Sohrabi and T. N. Davidson, “Coordinate update algorithms for robust power
loading for the MU-MISO downlink with outage constraints,” IEEE Trans. Signal
Process., vol. 64, no. 11, pp. 2761-2773, June 2016.
H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math.
Statist., vol. 22, no. 3, pp. 400-407, Sep. 1951.
Case Study: Resource Management for Wireless Networks

References

A. Shapiro, D. Dentcheva, and A. Ruszczynski, Lectures on stochastic


programming: Modeling and theory, SIAM, 2009.
Y. Nesterov, “Smooth minimization of non-smooth functions,” Math. Program.,
vol. 103, no. 1, pp 127–152, May 2005.
R. Frostig, R. Ge, S. M. Kakade, and A. Sidford, “Competing with the empirical
risk minimizer in a single pass”, in Proc. Conf. Learn. Theory, Paris, France, July
2015, pp. 728–763.
R. Johnson, and T. Zhang, “Accelerating stochastic gradient descent using
predictive variance reduction,” Adv. Neural Info. Process Syst., Lake Tahoe, CA,
USA, Dec. 2013, pp. 315–323.
E. Oja, “Simplified neuron model as a principal component analyzer”, J. Math.
Biology, vol. 15, no. 3, pp. 263–273, Nov. 1982.
M. Razaviyayn, M. Sanjabi, and Z.-Q. Luo, “A stochastic successive minimization
method for nonsmooth nonconvex optimization with applications to transceiver
design in wireless communication networks,” Math. Prog., vol. 157, no. 2, pp.
515-545, June 2016.



Case Study: Resource Management for Wireless Networks

References
Gregor, Karol, and Yann LeCun. “Learning fast approximations of sparse
coding.” Proceedings of the 27th International Conference on Machine
Learning. 2010.
Daubechies, Ingrid, Michel Defrise, and Christine De Mol. “An iterative
thresholding algorithm for linear inverse problems with a sparsity constraint.”
Communications on pure and applied mathematics 57.11 (2004): 1413-1457.
Sprechmann, Pablo, et al. “Supervised sparse analysis and synthesis
operators.” Advances in Neural Information Processing Systems. 2013.
Hershey, John R., Jonathan Le Roux, and Felix Weninger. “Deep unfolding:
Model-based inspiration of novel deep architectures.” arXiv preprint
arXiv:1409.2574 (2014).
Andrychowicz, Marcin, et al. “Learning to learn by gradient descent by
gradient descent.” Advances in Neural Information Processing Systems.
2016.
Li, Ke, and Jitendra Malik. "Learning to optimize." arXiv preprint arXiv:1606.01885 (2016).
Liang, Shiyu, and R. Srikant. "Why Deep Neural Networks for Function Approximation?" (2016).
Case Study: Resource Management for Wireless Networks

References

Hinton, Geoffrey, et al. “Deep neural networks for acoustic modeling in


speech recognition: The shared views of four research groups.” IEEE Signal
Processing Magazine 29.6 (2012): 82-97.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet
classification with deep convolutional neural networks.” Advances in neural
information processing systems. 2012.
Liu, Bing. “Sentiment analysis and opinion mining.” Synthesis lectures on
human language technologies 5.1 (2012): 1-167.
Cybenko, George. “Approximation by superpositions of a sigmoidal
function.” Mathematics of Control, Signals, and Systems (MCSS) 2.4
(1989): 303-314.
Sun, Haoran, et al. “Learning to optimize: Training deep neural networks for
wireless resource management.” arXiv preprint arXiv: 1705.09412 (2017).
Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training
deep feedforward neural networks.” Aistats. Vol. 9. 2010.



Case Study: Resource Management for Wireless Networks

References
Kingma, Diederik, and Jimmy Ba. “Adam: A method for stochastic
optimization.” arXiv preprint arXiv:1412.6980 (2014).
Mhaskar, Hrushikesh, Qianli Liao, and Tomaso Poggio. “Learning functions:
When is deep better than shallow.” arXiv preprint arXiv:1603.00988 (2016).
Luo, Zhi-Quan, and Shuzhong Zhang. “Dynamic spectrum management:
Complexity and duality.” IEEE Journal of Selected Topics in Signal
Processing 2.1 (2008): 57-73.
Shi, Qingjiang, et al. “An iteratively weighted MMSE approach to
distributed sum-utility maximization for a MIMO interfering broadcast
channel.” IEEE Transactions on Signal Processing 59.9 (2011): 4331-4340.
Gjendemsjø, Anders, et al. "Binary power control for sum rate maximization over multiple interfering links." IEEE Transactions on Wireless Communications (2008).
Hinton, Geoffrey, Nitish Srivastava, and Kevin Swersky. "Neural Networks for Machine Learning, Lecture 6a: Overview of mini-batch gradient descent." (2012).
Case Study: Resource Management for Wireless Networks

Interference Channel (IC) - Generalization


Issues:
Same number of users for both training and testing
In practice, what if $K$ in testing is different from training?

Half-user simulation setup: train the DNN on full-size inputs $\mathbf{H} \in \mathbb{R}^{K \times K}$; at testing time, zero-pad the half-size channel matrix $\mathbf{H} \in \mathbb{R}^{\frac{K}{2} \times \frac{K}{2}}$ to size $K \times K$ before feeding it to the DNN (see the sketch below).
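A sketch of the zero-padding step (our reading of the diagram: absent users contribute all-zero channel rows and columns):

```python
import numpy as np

def zero_pad_channel(H_small, K):
    """Embed a (K/2 x K/2) channel matrix into the (K x K) input the DNN expects."""
    Kh = H_small.shape[0]
    H = np.zeros((K, K), dtype=H_small.dtype)
    H[:Kh, :Kh] = H_small        # remaining entries stay zero (no such users)
    return H
```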
Case Study: Resource Management for Wireless Networks

Interference Channel (IC) - Generalization

Half-user results:

Table: Relative CPU time and sum-rate for Gaussian IC, half-user

                  sum-rate                computational time
# of users (K)    full-user   half-user   full-user   half-user
10                97.92%      99.22%      0.32%       0.96%
20                92.65%      92.78%      0.16%       0.48%
30                85.66%      87.77%      0.12%       0.37%
