Computer Networks - Congestion Control
Networks
ECE 5713
Congestion Control and Resource
Allocation
Where we are now …
• Understand how to
– Build a network on one physical medium
– Connect networks together (with switches)
– Implement a reliable byte stream on Internet
– Implement a UDP/TCP connection/channel
– Address network heterogeneity
– Address global scale
– End-to-end issues and common protocols
• Today’s topic
– Congestion control
Congestion Control
Outline
Queuing Discipline
Reacting to Congestion
Avoiding Congestion
Air Travel
• Planning a vacation for your semester break ?
• Try a trip to scenic Northern Areas
• It’s a mere 3 hops from CASE
[Figure: route from CASE via Islamabad Airport and Peshawer to Sakardu; the flight was overbooked]
Congestion Control
• Reading: Peterson and Davie, Ch. 6
• Basics: problem, terminology,
approaches, metrics
• Solutions
– router-based: queuing disciplines
– host-based: TCP congestion control
• Congestion avoidance
– DECbit
– RED gateways
• Quality of service
Congestion Control Basics
• Problem
– demand for network resources can grow
beyond the resources available
– want to provide “fair” amount to each user
• Examples
– bandwidth between Islamabad and Peshawer
– bandwidth in a network link
– buffers in a queue
Congestion Control Basics
• Range of solutions
– congestion control: cure congestion when it happens
– congestion avoidance: stay away from congestion
– resource allocation: prevent congestion from occurring
• Model of network
– packet-switched internetwork (or network)
– connectionless flows (logical channels/connections)
between hosts
State Maintained in Routers
Finite Buffering Capacity per Output
[Figure: airline check-in analogy for a 1x6 switch: standard check-in lines lead to a customer service representative; an angry mob eagerly awaits the opportunity to address the underpaid representative; once the line is full, you are turned away at the door]
Congestion Control Taxonomy
• Why is explicit per-flow congestion feedback from
routers rarely used in practice?
– too many sources to track
• millions of flows may fan in to one router
• can’t send feedback to all of them
– adds complexity to router
• need to track more state
• certainly can’t track state for all sources
– wastes bandwidth: network already congested, not the
time to generate more traffic
– can’t force the sources (hosts) to use feedback
• Solution: use stochastic methods to select fairly between
flows, or classify flows into categories at edge routers
Why consider rate control ?
• Buffer space (window size) is an intrinsic
physical quantity
• Can provide rate control with window control
• Only need estimate of RTT
• Answer:
– want rate control when granularity of averaging
must be smaller than RTT
Criticism of Resource Allocation
• Example: you have 10 Gbps total bandwidth
• Case 1: reserve whatever you want
– users’ line of thought: On average, I don’t need much
bandwidth, but when my personal Web crawler goes to
work, I need at least 100 Mbps, so I’ll reserve that much.
– result: 100 users consume all bandwidth, all others get 0
• Case 2: fair/equitable reservations
– 35,000 students + 5,000 faculty and staff
– each user gets 250 kbps, almost 5x a modem! (watch that
crawler crawl...)
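The arithmetic behind the two cases can be checked with a short sketch. The 56 kbps modem speed is an assumption, used only to reproduce the "almost 5x" comparison; the other figures come from the slide.

```python
# Worked numbers for the two reservation policies (a sketch, not a model).
TOTAL_BPS = 10_000_000_000               # 10 Gbps of total bandwidth
CRAWLER_RESERVATION_BPS = 100_000_000    # each greedy user reserves 100 Mbps
POPULATION = 35_000 + 5_000              # students + faculty and staff
MODEM_BPS = 56_000                       # assumed dial-up modem speed (not from the slide)

# Case 1: reservations are granted until the link is fully booked.
users_served = TOTAL_BPS // CRAWLER_RESERVATION_BPS

# Case 2: the link is split evenly across the whole population.
fair_share_bps = TOTAL_BPS / POPULATION

print(users_served)                  # users who get a reservation in case 1
print(fair_share_bps / 1000)         # fair share per user in kbps
print(fair_share_bps / MODEM_BPS)    # fair share relative to the assumed modem
```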
Back to Air Travel Analogy
• Daily Islamabad to Peshawer flight, ~200 seats
• Case 1: reserve whatever you want
– 200 of us get seats. I’m Gold...are you ?
• Case 2: fair/equitable reservations
– 2,000 possible customers
– roughly 0.1 seats per customer per flight
– disclaimer: the passenger assumes all risks and damages
related to unsuccessful reassembly in Peshawer !!!
Congestion Control Metrics
• How do we evaluate solutions ?
– effectiveness (shown as power in the figure below)
– fairness
Congestion Control Metrics:
What’s Fair ?
• Give flow A 1/3 of the link bandwidth ?
– globally “fair”
– A uses 2 links at 1/3 bandwidth
– B and C use 1 link at 2/3 bandwidth
• Give flow A 1/2 of the link bandwidth ?
– locally “fair” (fair on each link)
– also globally “fair”: maximizes minimum flow
Power Measure for a Sliding
Window
• Bottleneck link with capacity C
• Round-trip time RTT
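A small fluid-model sketch can show why the power measure peaks. It assumes that power is throughput divided by delay, that a window of W packets achieves throughput min(W/RTT, C), and that any packets beyond C x RTT wait in the bottleneck queue and add delay. The function name and the sample numbers are illustrative only.

```python
def power(window, capacity, rtt):
    """Power = throughput / delay for a sliding window over one bottleneck."""
    throughput = min(window / rtt, capacity)   # the link caps the sending rate
    delay = max(rtt, window / capacity)        # excess window waits in the queue
    return throughput / delay


capacity = 100.0   # packets per second (assumed value)
rtt = 0.1          # seconds (assumed value)

# Search window sizes 1..30 for the one with the highest power.
best_window = max(range(1, 31), key=lambda w: power(w, capacity, rtt))
print(best_window, capacity * rtt)
```

Under these assumptions the best window equals the bandwidth-delay product C x RTT: smaller windows leave the link idle, larger ones only add queuing delay.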
Router-Based
Congestion Control Solution
Router Solutions: Queuing
Disciplines
• Router defines policies on each outgoing link
– Allocates buffer space:
Which packets are discarded?
– Allocates bandwidth:
Which packets are transmitted?
– Affects packet latency:
When are packets transmitted?
More Fairness Choices
• First in, first out (FIFO)
– fairness for latency
– minimizes per-packet delay
– bandwidth not considered (not good
for congestion)
• Fair queuing
– fairness for bandwidth
– provides equal bandwidths (possibly
weighted)
– delay not considered
Fair Queuing
• Logical round-robin on bits
– Equal-length packets: round-robin on packets
– Variable-length packets ?
• Idea
– Let Si denote accumulated service for flow i
– Serve the flow with lowest accumulated service
– On serving a packet of length P from flow i, update
Si = Si + P
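The idea above can be sketched in a few lines. This is a minimal sketch, not a router implementation: it assumes all packets are already queued, breaks ties in favor of the first-listed flow, and takes an optional `weights` argument (a hypothetical name) that divides each packet's length by the flow's weight, which is consistent with the weighted example later in these slides.

```python
from collections import deque


def fair_queue_order(queues, weights=None):
    """Serve queued packets by lowest accumulated service Si.

    queues: dict mapping flow name -> packet lengths, head of queue first.
    weights: optional dict mapping flow name -> weight (default 1 each).
    Returns (transmission order, final accumulated service per flow).
    """
    weights = weights or {name: 1 for name in queues}
    pending = {name: deque(lengths) for name, lengths in queues.items()}
    service = {name: 0.0 for name in queues}
    order = []
    while any(pending.values()):
        # Only flows with a packet waiting compete; min() keeps the
        # first-listed flow on ties.
        active = [name for name in pending if pending[name]]
        flow = min(active, key=lambda name: service[name])
        length = pending[flow].popleft()
        service[flow] += length / weights[flow]   # Si = Si + P (scaled by weight)
        order.append((flow, length))
    return order, service


order, service = fair_queue_order({"A": [10, 10, 15], "B": [20, 10], "C": [15, 20]})
print(order)
print(service)
```

Comparing Si + P instead of Si, or updating Si at the end rather than the start of a transmission, changes only the `key` function and the point at which `service` is updated.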
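Fair Queuing Example
Queued packet lengths, head of queue first: A: 10, 10, 15; B: 20, 10; C: 15, 20
Step  Sent    SA  SB  SC
0     -        0   0   0
1     A (10)  10   0   0
2     B (20)  10  20   0
3     C (15)  10  20  15
4     A (10)  20  20  15
5     C (20)  20  20  35
6     A (15)  35  20  35
7     B (10)  35  30  35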
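Fair Queuing Example
• Compare Si or Si + P ?
Queued packet lengths, head of queue first: A: 10, 10, 15; B: 20, 10; C: 15, 20
Order when the flow with the lowest Si + P is served (ties go to the first-listed flow):
Step  Sent    SA  SB  SC
0     -        0   0   0
1     A (10)  10   0   0
2     C (15)  10   0  15
3     A (10)  20   0  15
4     B (20)  20  20  15
5     B (10)  20  30  15
6     A (15)  35  30  15
7     C (20)  35  30  35
Another detail: update counter at start or end of transmission ?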
Fair Queuing
• Why is the suggested approach not quite adequate?
– flows can “save up” credit
– no transmission for long time (call it T)
– burst uses all bandwidth for up to time
T x flow’s share of link
• How might we fix this problem?
– don’t allow inactive flows to retain service rates below
that of any active flow
– i.e., after updating some flow’s Si
• for each flow j with no packets in its queue
• set Sj to the minimum Sk for all active flows k
(or 0 if no flows are active)
Fair Queuing Example
Queued packet lengths, head of queue first: A: 10; B: 20, 10; C: 15, 20
Step  Sent    SA  SB  SC
0     -        0   0   0
1     A (10)  10   0   0
2     B (20)  10  20   0
3     C (15)  15  20  15
4     C (20)  20  20  35
5     B (10)  30  30  35
Weighted Fair Queuing
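Weighted Fair Queuing Example
Queued packet lengths, head of queue first: A (weight 1): 10, 10, 15; B (weight 2): 20, 10, 10; C (weight 1): 15, 20
Step  Sent    SA  SB  SC
0     -        0   0   0
1     A (10)  10   0   0
2     B (20)  10  10   0
3     C (15)  10  10  15
4     A (10)  20  10  15
5     B (10)  20  15  15
6     B (10)  20  20  15
7     C (20)  20  20  35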
Weighted Fair Queuing
• What makes up a flow for fair queuing
in the Internet ?
• Too many resources to have separate
queues/variables for host-to-host flows
• Scale down number of flows
• Typically just based on inputs
• e.g., share outgoing STS-12 between
incoming ISP’s
Fair Queuing in the Internet
[Figure: inputs A, B, C, and D queue 10-unit packets for a shared STS-12 output link; the inputs are labeled with Service-Level Agreements (SLAs) for STS-4 and STS-3 (155 Mbps)]
Step   0   1   2   3   4   5   6
SA     0  10  10  10  20  20  20
SB     0   0  10  10  10  20  20
SC     0   0   0  10  10  10  20
SD     0   0   0  10  10  10  20
Host Solutions:
TCP Congestion Control
• Host has very little information
– assumes best-effort network
– acts independently of other hosts
• Host infers congestion
– from synchronization feedback
– e.g., dropped packet timeouts, duplicate ACK’s
– loss on wired lines rarely due to transmission error
• Host acts
– reduce transmission rate below congestion threshold
– continuously monitor network for signs of congestion
TCP Congestion Control
• Add notion of congestion window
• Effective sliding window is smaller of
– remote advertised window (flow control)
– local congestion window (congestion control)
• Changes in congestion window size
– slow increases to absorb new bandwidth
– quick decreases to eliminate congestion
TCP Congestion Control Strategy
• Self-clocking
– send data only when outstanding data ACK’d
– equivalent to the send-window limitation mentioned earlier
• Growth
– add one maximum segment size (MSS) per
congestion window of data ACK’d
– It’s really done this way, at least in Linux:
see tcp_cong_avoid in tcp_input.c. Actually, every ack
for new data is treated as an MSS ACK’d… any
problems with that?
– known as additive increase
TCP Congestion Control Strategy
• Decrease
– cut window in half when timeout occurs
– in practice, set window = window /2
– known as multiplicative decrease
• Additive increase, multiplicative decrease
(AIMD)
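The two rules can be sketched as window updates in bytes. This is a minimal sketch under stated assumptions: the 1460-byte MSS, the function names, and the one-MSS floor on the window are illustrative and not taken from the slides; the per-ACK increment of MSS x MSS / cwnd is the usual way to add about one MSS per window's worth of ACKs.

```python
MSS = 1460  # bytes; an assumed maximum segment size


def effective_window(cwnd, advertised_window):
    """The sender may use the smaller of the two windows."""
    return min(cwnd, advertised_window)


def on_new_ack(cwnd):
    """Additive increase: each ACK adds MSS * (MSS / cwnd) bytes,
    which sums to roughly one MSS per window of data ACK'd."""
    return cwnd + MSS * MSS / cwnd


def on_timeout(cwnd):
    """Multiplicative decrease: halve the window (floor of one MSS assumed)."""
    return max(cwnd / 2, MSS)


cwnd = 10 * MSS
for _ in range(10):          # one window's worth of ACKs
    cwnd = on_new_ack(cwnd)
print(cwnd / MSS)            # close to 11 segments
print(on_timeout(cwnd) / MSS)
```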
TCP Congestion Control
• AIMD sawtooth trace
TCP Congestion Control
• AIMD fairness for two competing flows
– slope of 1 for additive increase
– proportional decrease (towards origin) for
multiplicative decrease
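The geometric argument can be checked numerically. The sketch below assumes two flows that see congestion at the same moment (whenever their combined rate exceeds the link capacity) and that both react in lockstep; the function name and the starting rates are illustrative.

```python
def aimd_two_flows(x, y, capacity, rounds):
    """Simulate two synchronized AIMD flows sharing one link.

    Returns the final rates and the gap |x - y| after every round.
    """
    gaps = [abs(x - y)]
    for _ in range(rounds):
        if x + y > capacity:
            x, y = x / 2, y / 2      # multiplicative decrease: move toward the origin
        else:
            x, y = x + 1, y + 1      # additive increase: move along a slope-1 line
        gaps.append(abs(x - y))
    return x, y, gaps


x, y, gaps = aimd_two_flows(1.0, 41.0, capacity=50.0, rounds=200)
print(x, y, gaps[0], gaps[-1])
```

Additive increase leaves the gap between the flows unchanged, while each multiplicative decrease halves it, so the rates drift toward an equal share.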
Initialization of Congestion Window
• Congestion window should start small
• Avoid congestion due to new connections
• Start at 1 MSS, reset to 1 MSS with each
timeout (note that timeouts are coarse-grained,
~1/2 sec)
• Known as slow start
Initialization of Congestion Window
• Threshold value
– initially set to maximum window size
– set to 1/2 of current window on timeout
• In practice, increase congestion window
by one MSS for each ACK of new data
(or N bytes for N bytes)
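Slow start and the threshold can be traced one round trip at a time. This sketch counts the window in MSS units and assumes that one MSS per ACK doubles the window every RTT, that growth is capped at the threshold, and that above the threshold the window grows by one MSS per RTT. The function name and the `timeout_at` parameter are hypothetical.

```python
def cwnd_trace(rtts, ssthresh, timeout_at=()):
    """Return the congestion window (in MSS) at the start of each RTT."""
    cwnd = 1
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in timeout_at:
            ssthresh = max(cwnd // 2, 1)       # threshold = 1/2 of current window
            cwnd = 1                           # restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow start: double per RTT
        else:
            cwnd += 1                          # additive increase
    return trace


print(cwnd_trace(8, ssthresh=8))
print(cwnd_trace(10, ssthresh=8, timeout_at={5}))
```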
TCP Congestion Control
• Coarse-grained timeouts lead to idle periods …
• Solution: fast retransmission
– send ACK for each segment received
– when duplicate ACK’s received
• resend lost segment immediately
• do not wait for timeout
• in practice, retransmit on 3rd duplicate
– fast recovery
• when fast retransmission occurs, skip slow start
• congestion window becomes 1/2 previous
• start additive increase immediately
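The duplicate-ACK logic can be sketched with a toy sender. It assumes cumulative ACKs that carry the number of the next segment expected, a window counted in segments, and a floor of one segment on the window; the class and attribute names are illustrative, and real TCP implementations track considerably more state.

```python
class Sender:
    """Toy sender that reacts to cumulative ACKs."""

    DUP_ACK_LIMIT = 3  # retransmit on the 3rd duplicate, as on the slide

    def __init__(self, cwnd):
        self.cwnd = float(cwnd)
        self.last_ack = 0
        self.dup_acks = 0
        self.retransmitted = []

    def on_ack(self, ack_no):
        if ack_no > self.last_ack:
            # New data acknowledged: reset the counter, additive increase.
            self.last_ack = ack_no
            self.dup_acks = 0
            self.cwnd += 1.0 / self.cwnd
        elif ack_no == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_LIMIT:
                # Fast retransmit: resend the missing segment without a timeout.
                self.retransmitted.append(ack_no)
                # Fast recovery: halve the window and skip slow start.
                self.cwnd = max(self.cwnd / 2, 1.0)


sender = Sender(cwnd=8)
for ack in [1, 2, 3, 3, 3, 3, 3]:   # segment 3 is lost; later segments trigger duplicates
    sender.on_ack(ack)
print(sender.retransmitted, sender.cwnd)
```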
TCP Congestion Window Trace
Congestion Avoidance
Flight Planning for Your Air Travel
• Planning a vacation for your semester break ?
• Trying for a trip to scenic Northern Areas
• No way to go (route is too congested)
• Delay your program till next semester
[Figure: route from CASE via Islamabad Airport and Peshawer to Sakardu; the flights are FULL]
Congestion Avoidance
• Control vs. avoidance
– control: minimize impact of congestion when it occurs
– avoidance: avoid producing congestion
• In terms of operating point limits:
Congestion Control
• Basics: problem, terminology,
approaches, metrics
• Solutions
– Router-based: queuing disciplines
– Host-based: TCP congestion control
• Router-based congestion avoidance
– DECbit
– RED gateways
• Quality of service
Router-based
Congestion Avoidance
DECbit: Destination Experiencing
Congestion bit
• Developed for the Digital Network Architecture
• Basic idea
– one bit allocated in packet header
– any router experiencing congestion sets bit
– destination returns bit to source
– source adjusts rate based on bits
• Note that responsibility is shared
– routers identify congestion
– hosts act to avoid congestion
DECbit (continued)
• Routers
– calculate average queue length
– averaging interval spans last busy+idle cycles
• busy: queue is non-empty
• idle: queue is empty
– set DECbit when average queue length >= 1
• Why 1?
– maximizes power function
– smaller values result in more idle time
– larger values result in more queuing delay
DECbit – Average Queue Length
DECbit (continued)
• Hosts
– count number of marked packets in last congestion
window of packets
– increase congestion window by 1 if < 50% of packets
marked
– decrease congestion window to 7/8 of its value if >= 50%
of packets marked
• Why 50%?
– maximizes power function
• Resurfaced as Explicit Congestion Notification
(ECN) in last few years; proposed TCP extension
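The host-side DECbit policy can be written as a single function. This is a minimal sketch: the function name and the list-of-booleans input are assumptions, and the window is counted in packets.

```python
def decbit_adjust(cwnd, marked):
    """Adjust the window from the congestion bits seen in the last window.

    marked: one boolean per packet in the last window (True = bit was set).
    """
    if not marked:
        return cwnd                      # nothing observed, leave the window alone
    fraction = sum(marked) / len(marked)
    if fraction < 0.5:
        return cwnd + 1                  # additive increase
    return cwnd * 7 / 8                  # multiplicative decrease to 7/8


print(decbit_adjust(8, [False] * 8))
print(decbit_adjust(8, [True] * 4 + [False] * 4))
```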
Random Early Detection
(RED) Gateways
• Developed for use with TCP
• Basic idea
– implicit rather than explicit notification
– when a router is “almost” congested
– drop packets randomly
• Responsibility is again shared
– router identifies, host acts
– relies on TCP’s response to dropped packets
Weighted Running Average Queue
length
Random Early Detection (RED)
Gateways
• Hosts
– implement TCP congestion control
– back off when a packet is dropped
• Routers calculate average queue length (as
exponential moving average)
length = A * measurement + (1 - A) * length
• Routers act based on average queue length
– below minimum length, keep new packets
– above maximum length, drop new packets
– between the two, drop randomly
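The averaging step and the three queue-length regions can be sketched as follows. The function names, the weight of 0.25, and the threshold values are illustrative assumptions; only the update rule and the three regions come from the slide.

```python
def update_average(avg_len, sample_len, weight):
    """Exponential moving average: A * measurement + (1 - A) * length."""
    return weight * sample_len + (1 - weight) * avg_len


def red_region(avg_len, min_th, max_th):
    """Classify the average queue length into RED's three regions."""
    if avg_len < min_th:
        return "keep"            # below minimum: queue the new packet
    if avg_len > max_th:
        return "drop"            # above maximum: drop the new packet
    return "drop randomly"       # in between: drop with some probability


avg = update_average(avg_len=10.0, sample_len=20.0, weight=0.25)
print(avg, red_region(avg, min_th=5, max_th=10))
```

A small weight makes the average respond slowly, so short bursts do not push the router into the dropping regions.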
RED Thresholds on a FIFO Queue
Drop Probability Function for RED
Random Early Detection (RED)
Gateways
• Router calculates
– base probability for drop:
base = max_base * (length - min) / (max - min)
Random Early Detection (RED)
Gateways
• Parameter values
– max_base is typically 0.02 (drops roughly 1 in 50
packets at midpoint)
– min (threshold) is typically max/2
• Choosing parameters
– carefully tuned to maximize power function
– confirmed through simulation
– but answer depends on accuracy of traffic model
– may create oscillating behavior in the Internet
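With these parameter values, the base drop probability from the earlier slide can be evaluated directly. The sketch below uses only that base formula; the function names and the injectable `rng` argument (handy for deterministic checks) are hypothetical, and the thresholds of 10 and 20 packets are sample values that follow the min = max/2 rule.

```python
import random


def base_drop_probability(avg_len, min_th, max_th, max_base=0.02):
    """base = max_base * (length - min) / (max - min)."""
    return max_base * (avg_len - min_th) / (max_th - min_th)


def should_drop(avg_len, min_th, max_th, max_base=0.02, rng=random.random):
    """Decide whether to drop an arriving packet given the average queue length."""
    if avg_len < min_th:
        return False
    if avg_len > max_th:
        return True
    return rng() < base_drop_probability(avg_len, min_th, max_th, max_base)


min_th, max_th = 10, 20                                   # min = max / 2
print(base_drop_probability(15, min_th, max_th))          # base value at the midpoint
print(should_drop(15, min_th, max_th))                    # random outcome
```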
Tuning the RED Gateways
• Drops roughly in proportion to flow bandwidths
• Queue size changes
– due to difference in input/output rates
– not related to the absolute magnitude of those rates
• min must allow reasonable link utilization
• Difference between min and max thresholds
– large enough to absorb burstiness
– in practice, can use max = 2 * min
• Use penalty box for offenders!
Host-based Congestion Avoidance
TCP Vegas – Basic Idea
• Watch for signs of queue growth
• In particular, difference between
– increasing congestion window
– stable throughput (presumably at max. capacity)
• Keep just enough “extra data” in the network
– time to react if bandwidth decreases
– data available if bandwidth increases
Trace of TCP Vegas Congestion
Avoidance Mechanism
TCP Vegas - Implementation
• Estimate uncongested RTT, baseRTT, as
minimum measured RTT
• Calculate expected throughput as congestion
window / baseRTT
• Measure throughput each RTT
– mark time of sending distinguished packet
– calculate data sent between send time and receipt of
ACK
TCP Vegas - Implementation
• Act to keep the difference between estimated and
actual throughput in a specified range
– below minimum threshold, increase congestion window
– above maximum threshold, decrease congestion
window
• Additive decrease used only to avoid congestion
• Want between 1 and 3 packets of extra data (used
to pick min/max thresholds)
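The Vegas decision can be sketched per RTT. This sketch assumes the window is counted in packets, that the difference between expected and actual throughput is converted to "extra packets in the network" by multiplying by baseRTT, and that the window moves by one packet per adjustment; the function name and sample numbers are illustrative.

```python
def vegas_adjust(cwnd, base_rtt, actual_rate, min_extra=1, max_extra=3):
    """Return the new congestion window (in packets) after one RTT.

    base_rtt: minimum measured RTT in seconds.
    actual_rate: measured throughput in packets per second.
    """
    expected_rate = cwnd / base_rtt
    extra_packets = (expected_rate - actual_rate) * base_rtt
    if extra_packets < min_extra:
        return cwnd + 1          # too little extra data: increase
    if extra_packets > max_extra:
        return cwnd - 1          # too much extra data: additive decrease
    return cwnd                  # within the target range: hold


print(vegas_adjust(10, base_rtt=0.1, actual_rate=100))   # no queuing observed
print(vegas_adjust(10, base_rtt=0.1, actual_rate=80))    # about 2 extra packets
print(vegas_adjust(10, base_rtt=0.1, actual_rate=60))    # about 4 extra packets
```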
Quality of Service
Outline
Realtime Applications
Integrated Services
Differentiated Services
How “good” are late data and low-throughput channels?
• High-throughput allows new classes of
applications
– some are sensitive to network delay, e.g., voice, video
– called real-time applications
– control applications (e.g., factories)
• more an issue of possible starvation (see below)
• but fit within QoS framework
• How to achieve timely delivery
– when actual RTT small (< 2/3) relative to acceptable
delay, retransmit
– when base RTT (no queuing delay) large (> 2) relative
to acceptable delay, impossible
– otherwise possible, but not through retransmission
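The three regimes can be expressed as a small decision function. This is one reading of the slide, labeled as an assumption: the thresholds 2/3 and 2 are taken to be ratios of RTT to the acceptable delay. The function name and the sample numbers are illustrative.

```python
def delivery_strategy(actual_rtt, base_rtt, acceptable_delay):
    """Classify how timely delivery can be achieved (all values in msec)."""
    if base_rtt > 2 * acceptable_delay:
        return "impossible"                      # even an empty network is too slow
    if actual_rtt < (2 / 3) * acceptable_delay:
        return "retransmit"                      # there is time to resend lost data
    return "possible, but not through retransmission"


print(delivery_strategy(actual_rtt=10, base_rtt=5, acceptable_delay=50))
print(delivery_strategy(actual_rtt=100, base_rtt=75, acceptable_delay=50))
print(delivery_strategy(actual_rtt=300, base_rtt=150, acceptable_delay=50))
```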
Quality of Service
• Within the United States, for example
– base RTT (no queueing delay) peaks around 75 msec
– actual RTT is often 10-100 msec
• Humans notice about 50 msec delay for voice
• When in comparable regime, retransmission
cannot satisfy, so...
– use erasure codes across packets (e.g. FEC) , or
– support delay preferences in the network; called quality
of service, or QoS
QoS Granularity
• Fine-grained: guarantees for individual flows,
used in RSVP and ATM virtual circuits between
applications
• Coarse-grained: guarantees for aggregated flows (traffic classes), used in
DIFFSERV and ATM VC’s between routers
• IETF is standardizing extensions …
Real Time Applications
• Require “deliver on time” assurances
– must come from inside the network
[Figure: playback buffer timeline showing packet generation, network delay, buffer, and playback against time]
Example Distribution of Delays
[Figure: distribution of packet delays; vertical axis: Packets (%); the 90%, 97%, 98%, and 99% points are marked]
Taxonomy of Applications
[Figure: taxonomy tree of applications, with leaves including delay-adaptive and rate-adaptive]
Elastic Applications
Real-time Applications
• Intolerant to loss
• Tolerant to loss
– non-adaptive: quality fixed (sample delay and
throughput, then assume worst-case variation)
– adaptive: adjust quality to current parameters
• delay adaptive: adjust to achieved delay
• rate adaptive: adjust to achieved throughput
Integrated Services :
Resource Reservation Protocol
(RSVP)
Integrated Services
• Service Classes
– guaranteed
– controlled-load
• Mechanisms
– signalling protocol
– admission control
– policing
– packet scheduling
Three Classes of Service
Mechanisms to Support Integrated
Services
• Flow specification: delay requirements for flow
• Admission control: network decision to support
flow
• Resource reservation: protocol for exchanging
flowspecs, performing admission control, etc.
• Packet classification: mapping packets to flows
• Packet scheduling: forwarding policy
Integrated Services Example
Flow Specification Components
• RSpec: describes service requested from network
– controlled-delay: level of delay required
– guaranteed/predictive: delay target
• TSpec: describes flow’s traffic characterization
– characterized by token bucket filter
– average throughput r
– maximum buffering requirement B
Flowspec
• Rspec: describes service requested from network
– controlled-load: none
– guaranteed: delay target
• Tspec: describes flow’s traffic characteristics
– average bandwidth + burstiness: token bucket filter
– token rate r
– bucket depth B
– must have a token to send a byte
– must have n tokens to send n bytes
– start with no tokens
– accumulate tokens at rate of r per second
– can accumulate no more than B tokens
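The token bucket rules above translate almost line for line into code. This is a minimal sketch: the class and method names are illustrative, time is passed in explicitly (in seconds) rather than read from a clock, and a packet that finds too few tokens is simply refused.

```python
class TokenBucket:
    """Token bucket filter with token rate r (bytes/sec) and depth B (bytes)."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = 0.0        # start with no tokens
        self.last_time = 0.0

    def try_send(self, nbytes, now):
        """Return True and spend tokens if nbytes may be sent at time `now`."""
        # Accumulate tokens at rate r, but never more than B.
        elapsed = now - self.last_time
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last_time = now
        if nbytes <= self.tokens:        # n tokens are needed to send n bytes
            self.tokens -= nbytes
            return True
        return False


bucket = TokenBucket(rate=100, depth=500)
print(bucket.try_send(100, now=0.0))    # refused: the bucket starts empty
print(bucket.try_send(100, now=1.0))    # allowed: 100 tokens have accumulated
print(bucket.try_send(600, now=20.0))   # refused: the bucket never holds more than 500
print(bucket.try_send(500, now=20.0))   # allowed: a full-depth burst
```

The depth B bounds the largest burst, while the rate r bounds the long-term average, which is why two flows with equal average rates can have different bucket descriptions.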
Flows with Equal Average Rates but
Different Token Bucket Descriptions
Unrealistic Expectations …
Token Bucket Filters
[Figure: token bucket of capacity B filled at r tokens/sec; each byte of data needs a token in order to pass]
Reservation Protocol
• Called signaling in ATM
• Proposed Internet standard: RSVP
– consistent with robustness of today’s connectionless
model
– uses soft state (refresh periodically)
– designed to support multicast (can specify number of
speakers)
– receiver-oriented
• RSVP uses two messages
– PATH transmitted by source every 30 sec
– destination responds with RESV message
• Requirements must be merged for multicast
RSVP Example
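[Figure: RSVP example with Sender 1, Sender 2, routers (R), Receiver A, and Receiver B; PATH messages flow from the senders toward the receivers, and RESV messages flow back from the receivers and are merged at a router]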
RSVP – Receiver Oriented Layered
Multicast
• RSVP addresses disparate delay requirements
• Can different rates be supported?
RSVP - Solution
Integrated Services –
Scalability Issue
Differentiated Services
• Problem with IntServ: scalability
• Goal: use small number of classes to provide
scalable solution
• Idea: support two classes of packets
– Premium (like first-class)
– best-effort, regular (like bulk mail)
• Diffserv proposes 6 bits of IP ToS field (64
classes)
Differentiated Services Questions
Differentiated Services
• Mechanisms
– packets: ‘in’ and ‘out’ bit
[Figure: drop probability P(drop) versus AvgLen, rising to MaxP, with separate thresholds for ‘out’ and ‘in’ packets: Min out, Min in, Max out, Max in]
Differentiated Services
• Expedited forwarding
– Per-hop behavior
– must strictly limit use
– mechanisms
• strict priority
• weighted fair queueing (WFQ) with large weights
for expedited forwarding
Differentiated Services
• Assured forwarding
– per-hop behavior
– like RED but with “in” and “out” packets (RIO)
– does not reorder packets
– weighted RED generalizes to >2 classes
• Edge routers can mark packets as “in” or “out”
QoS in ATM
• Similar to RSVP
• Five service classes
– Constant bit rate (CBR)
– Variable bit rate (VBR) – real-time
– Variable bit rate (VBR) – non-real-time
– Unspecified bit rate (UBR) – like best effort
– Available bit rate (ABR) – like best effort + ATM’s
congestion control
RSVP versus ATM (Q.2931)
• RSVP
– receiver generates reservation
– soft state (refresh/timeout)
– separate from route establishment
– QoS can change dynamically
– supports receiver heterogeneity
• ATM
– sender generates connection request
– hard state (explicit delete)
– concurrent with route establishment
– QoS is static for life of connection (except ABR)
– uniform QoS to all receivers