IP Quality of Service
Srinivas Vegesna
Cisco Press
201 W 103rd Street
Indianapolis, IN 46290 USA
IP Quality of Service
Srinivas Vegesna
Copyright © 2001 Cisco Press
The Cisco Press logo is a trademark of Cisco Systems, Inc.
Published by:
Cisco Press
201 West 103rd Street
Indianapolis, IN 46290 USA
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any information storage and retrieval
system, without written permission from the publisher, except for the inclusion of brief quotations in a
review.
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0 04 03 02 01
First Printing December 2000
Library of Congress Cataloging-in-Publication Number: 98-86710
ISBN: 1-57870-116-3
Trademark Acknowledgments
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately
capitalized. Cisco Press or Cisco Systems, Inc., cannot attest to the accuracy of this information. Use of a
term in this book should not be regarded as affecting the validity of any trademark or service mark.
Feedback Information
At Cisco Press, our goal is to create in-depth technical books of the highest quality and value. Each book is
crafted with care and precision, undergoing rigorous development that involves the unique expertise of
members from the professional technical community.
Readers' feedback is a natural continuation of this process. If you have any comments regarding how we could
improve the quality of this book, or otherwise alter it to better suit your needs, you can contact us through
e-mail at [email protected]. Please make sure to include the book title and ISBN in your message.
We greatly appreciate your assistance.
Cisco Systems has more than 200 offices in the following countries. Addresses, phone numbers, and fax numbers are listed on
the Cisco Web site at www.cisco.com/go/offices
Argentina • Australia • Austria • Belgium • Brazil • Bulgaria • Canada • Chile • China • Colombia • Costa Rica • Croatia • Czech Republic • Denmark • Dubai, UAE • Finland • France • Germany • Greece • Hong Kong • Hungary • India • Indonesia • Ireland • Israel • Italy • Japan • Korea • Luxembourg • Malaysia • Mexico • The Netherlands • New Zealand • Norway • Peru • Philippines • Poland • Portugal • Puerto Rico • Romania • Russia • Saudi Arabia • Scotland • Singapore • Slovakia • Slovenia • South Africa • Spain • Sweden • Switzerland • Taiwan • Thailand • Turkey • Ukraine • United Kingdom • United States • Venezuela • Vietnam • Zimbabwe
Copyright 2000, Cisco Systems, Inc. All rights reserved. Access Registrar, AccessPath, Are You Ready, ATM Director, Browse with Me, CCDA, CCDE, CCDP, CCIE, CCNA,
CCNP, CCSI, CD-PAC, CiscoLink, the Cisco NetWorks logo, the Cisco Powered Network logo, Cisco Systems Networking Academy, Fast Step, FireRunner, Follow Me Browsing,
FormShare, GigaStack, IGX, Intelligence in the Optical Core, Internet Quotient, IP/VC, iQ Breakthrough, iQ Expertise, iQ FastTrack, iQuick Study, iQ Readiness Scorecard, The iQ
Logo, Kernel Proxy, MGX, Natural Network Viewer, Network Registrar, the Networkers logo, Packet, PIX, Point and Click Internetworking, Policy Builder, RateMUX, ReyMaster,
ReyView, ScriptShare, Secure Script, Shop with Me, SlideCast, SMARTnet, SVX, TrafficDirector, TransPath, VlanDirector, Voice LAN, Wavelength Router, Workgroup Director,
and Workgroup Stack are trademarks of Cisco Systems, Inc.; Changing the Way We Work, Live, Play, and Learn, Empowering the Internet Generation, are service marks of Cisco
Systems, Inc.; and Aironet, ASIST, BPX, Catalyst, Cisco, the Cisco Certified Internetwork Expert Logo, Cisco IOS, the Cisco IOS logo, Cisco Press, Cisco Systems, Cisco Systems
Capital, the Cisco Systems logo, Collision Free, Enterprise/Solver, EtherChannel, EtherSwitch, FastHub, FastLink, FastPAD, IOS, IP/TV, IPX, LightStream, LightSwitch, MICA,
NetRanger, Post-Routing, Pre-Routing, Registrar, StrataView Plus, Stratm, SwitchProbe, TeleRouter, are registered trademarks of Cisco Systems, Inc. or its affiliates in the U.S. and
certain other countries.
All other brands, names, or trademarks mentioned in this document or Web site are the property of their respective owners. The use of the word partner does not imply a partnership
relationship between Cisco and any other company. (0010R)
Acknowledgments
I would like to thank all my friends and colleagues at Cisco Systems for a stimulating work environment for
the last six years. I value the many technical discussions we had in the internal e-mail aliases and hallway
conversations. My special thanks go to the technical reviewers of the book, Sanjay Kalra and Vijay
Bollapragada, and the development editors of the book, Kitty Jarrett and Allison Johnson. Their input has
considerably enhanced the presentation and content in the book. I would like to thank Mosaddaq Turabi for
his thoughts on the subject and interest in the book. I would also like to remember a special colleague and
friend at Cisco, Kevin Hu, who passed away in 1995. Kevin and I started at Cisco the same day and worked
as a team for the one year I knew him. He was truly an all-round person.
Finally, the book wouldn't have been possible without the support and patience of my family. I would like to express my deep gratitude and love for my wife, Latha, for her understanding all along the course of the book. I would also like to thank my brother, Srihari, for being a great brother and a friend. A very special thanks goes to my two-year-old son, Akshay, for his bright smile and cute words, and to my newborn son, Karthik, for his innocent looks and sweet nothings.
Dedication
To my parents, Venkatapathi Raju and Kasturi.
Contents at a Glance
Part I   IP QoS
Chapter 1   Introducing IP Quality of Service
Chapter 2   Differentiated Services Architecture
Chapter 3   Network Boundary Traffic Conditioners: Packet Classifier, Marker, and Traffic Rate Management
Chapter 4   Per-Hop Behavior: Resource Allocation I
Chapter 5   Per-Hop Behavior: Resource Allocation II
Chapter 6   Per-Hop Behavior: Congestion Avoidance and Packet Drop Policy
Chapter 7   Integrated Services: RSVP
Index
Table of Contents
Part I   IP QoS

Chapter 1   Introducing IP Quality of Service
    Levels of QoS
    IP QoS History
    Performance Measures
        Bandwidth
        Packet Delay and Jitter
        Packet Loss
    QoS Functions
        Packet Classifier and Marker
        Traffic Rate Management
        Resource Allocation
        Congestion Avoidance and Packet Drop Policy
        QoS Signaling Protocol
        Switching
        Routing
    Layer 2 QoS Technologies
    Multiprotocol Label Switching
    End-to-End QoS
    Objectives
    Audience
    Scope and Limitations
    Organization
        Part I
        Part II
        Part III
        Part IV
    References

Chapter 2   Differentiated Services Architecture
    Intserv Architecture
    Diffserv Architecture
        Network Boundary Traffic Conditioners
        PHB
    Case Study 8-10: Mapping Between Frame Relay DE Bits and IP Precedence Bits
    Case Study 8-11: Frame Relay Fragmentation
    The IEEE 802.3 Family of LANs
        Expedited Traffic Capability
    Summary
    Frequently Asked Questions
    References

Chapter 9   QoS in MPLS-Based Networks
    MPLS
        Forwarding Component
        Control Component
        Label Encapsulation
        MPLS with ATM
    Case Study 9-1: Downstream Label Distribution
    MPLS QoS
        End-to-End IP QoS
    Case Study 9-2: MPLS CoS
        LER
        LSR
    MPLS VPN
    Case Study 9-3: MPLS VPN
    MPLS VPN QoS
        Differentiated MPLS VPN QoS
        Guaranteed QoS
            RSVP at VPN Sites Only
            RSVP at VPN Sites and Diffserv Across the Service Provider Backbone
        End-to-End Guaranteed Bandwidth
    Case Study 9-4: MPLS VPN QoS
    Summary
    Frequently Asked Questions
    References
Index
Part I: IP QoS

Chapter 1: Introducing IP Quality of Service
IP QoS can be defined as a set of service requirements to be met by the network in transporting a flow. QoS provides end-to-end service guarantees and policy-based control of an IP network's performance measures, such as resource allocation, switching, routing, packet scheduling, and packet drop mechanisms.
The following are some of the main IP QoS benefits:

• It enables networks to support existing and emerging multimedia service/application requirements. New applications such as Voice over IP (VoIP) have specific QoS requirements of the network.
• It gives the network operator control of network resources and their usage.
• It provides service guarantees and traffic differentiation across the network, which are required when converging voice, video, and data traffic to be carried on a single IP network.
• It enables service providers to offer premium services along with the present best-effort Class of Service (CoS). A provider could rate its premium services to customers as Platinum, Gold, and Silver, for example, and configure the network to differentiate the traffic from the various classes accordingly.
• It enables application-aware networking, in which a network services its packets based on the application information within the packet headers.
• It plays an essential role in new network service offerings such as Virtual Private Networks (VPNs).
Levels of QoS

Traffic in a network is made up of flows originated by a variety of applications on end stations. These applications differ in their service and performance requirements. Any flow's requirements depend inherently on the application it belongs to. Hence, understanding the application types is key to understanding the different service needs of flows within a network.

The network's capability to deliver the service needed by specific network applications, with some level of control over performance measures (that is, bandwidth, delay/jitter, and loss), is categorized into three service levels:

• Best-effort service: Basic connectivity with no guarantee as to whether or when a packet is delivered to the destination, although a packet is usually dropped only when the router input or output buffer queues are exhausted.

Best-effort service is not really a part of QoS because no service or delivery guarantees are made in forwarding best-effort traffic. This is the only service the Internet offers today.
Most data applications, such as File Transfer Protocol (FTP), work correctly with
best-effort service, albeit with degraded performance. To function well, all
applications require certain network resource allocations in terms of bandwidth,
delay, and minimal packet loss.
• Differentiated service: Traffic is grouped into classes based on service requirements. Each traffic class is differentiated by the network and serviced according to the QoS mechanisms configured for the class. This scheme for delivering QoS is often referred to as CoS.

Note that differentiated service doesn't give service guarantees per se. It only differentiates traffic and allows preferential treatment of one traffic class over another. For this reason, this service is also referred to as soft QoS.

This QoS scheme works well for bandwidth-intensive data applications. It is important that network control traffic is differentiated from the rest of the data traffic and prioritized so as to ensure basic network connectivity at all times.
• Guaranteed service: A service that requires network resource reservation to ensure that the network meets a traffic flow's specific service requirements.

Guaranteed service requires prior network resource reservation over the connection path. Guaranteed service also is referred to as hard QoS because it requires rigid guarantees from the network.

Path reservations with a granularity of a single flow don't scale over the Internet backbone, which services thousands of flows at any given time. Aggregate reservations, however, which call for only a minimal amount of state information in the Internet core routers, should be a scalable means of offering this service.

Applications requiring such service include multimedia applications such as audio and video. Interactive voice applications over the Internet need to limit latency to 100 ms to meet human ergonomic needs. This latency also is acceptable to a large spectrum of multimedia applications. Internet telephony needs at a minimum an 8-kbps bandwidth and a 100-ms round-trip delay. The network needs to reserve resources to be able to meet such guaranteed service requirements.
Layer 2 QoS refers to all the QoS mechanisms that either are targeted for or exist in the various link layer technologies. Chapter 8, "Layer 2 QoS: Interworking with IP QoS," covers Layer 2 QoS. Layer 3 QoS refers to QoS functions at the network layer, which is IP. Table 1-1 outlines the three service levels and their related enabling QoS functions at Layers 2 and 3. These QoS functions are discussed in detail in the rest of this book.
IP QoS History

IP QoS is not an afterthought. The Internet's founding fathers envisioned this need and provisioned a Type of Service (ToS) byte in the IP header to facilitate QoS as part of the initial IP specification, which described the purpose of the ToS byte as follows:

The Type of Service provides an indication of the abstract parameters of the quality of service desired. These parameters are to be used to guide the selection of the actual service parameters when transmitting a datagram through the particular network.1

Until the late 1980s, the Internet was still within its academic roots and had limited applications and traffic running over it. Hence, ToS support wasn't necessarily important, and almost all IP implementations ignored the ToS byte. IP applications didn't specifically mark the ToS byte, nor did routers use it to affect the forwarding treatment given to an IP packet.
The importance of QoS over the Internet has grown with its evolution from its academic roots to its present commercial and popular stage. The Internet is based on a connectionless end-to-end packet service, which traditionally provided a best-effort means of data transportation using the Transmission Control Protocol/Internet Protocol (TCP/IP) suite. Although the connectionless design gives the Internet its flexibility and robustness, its packet dynamics also make it prone to congestion problems, especially at routers that connect networks of widely different bandwidths. The congestion collapse problem was discussed by John Nagle during the Internet's early growth phase in the mid-1980s2.
The initial QoS function set was for Internet hosts. One major problem with expensive
wide-area network (WAN) links is the excessive overhead due to small Transmission
Control Protocol (TCP) packets created by applications such as telnet and rlogin. The Nagle
algorithm, which solves this issue, is now supported by all IP host implementations3. The
Nagle algorithm heralded the beginning of Internet QoS-based functionality in IP.
In 1986, Van Jacobson developed the next set of Internet QoS tools: the congestion avoidance mechanisms for end systems that are now required in TCP implementations. These mechanisms, slow start and congestion avoidance, have helped greatly in preventing a congestion collapse of the present-day Internet. They primarily make TCP flows responsive to the congestion signals (dropped packets) within the network. Two additional mechanisms, fast retransmit and fast recovery, were added in 1990 to provide optimal performance during periods of packet loss4.

Though QoS mechanisms in end systems are essential, they didn't complete the end-to-end QoS story until adequate mechanisms were provided within routers to transport traffic between end systems. Hence, around 1990, QoS's focus shifted to routers. Routers limited to only first-in, first-out (FIFO) scheduling don't offer a mechanism to differentiate or prioritize traffic within the packet-scheduling algorithm. FIFO queuing causes tail drops and doesn't protect well-behaving flows from misbehaving flows. Weighted Fair Queuing (WFQ), a packet scheduling algorithm5, and Weighted Random Early Detection (WRED), a queue management algorithm6, are widely accepted to fill this gap in the Internet backbone.
Internet QoS development continued with standardization efforts in delivering end-to-end QoS over the Internet. The Integrated Services (intserv) Internet Engineering Task Force (IETF) Working Group7 aims to provide the means for applications to express end-to-end resource requirements with support mechanisms in routers and subnet technologies. RSVP is the signaling protocol for this purpose. The intserv model requires per-flow state along the path of the connection, which doesn't scale in the Internet backbone, where thousands of flows exist at any time. Chapter 7, "Integrated Services: RSVP," provides a discussion of RSVP and the intserv service types.

The IP ToS byte hasn't been used much in the past, but it is increasingly used lately as a way to signal QoS. The ToS byte is emerging as the primary mechanism for delivering diffserv over the Internet, and for this purpose, the IETF Differentiated Services (diffserv) Working Group8 is working on standardizing its use as a diffserv byte. Chapter 2, "Differentiated Services Architecture," discusses the diffserv architecture in detail.
Performance Measures

QoS deployment intends to provide a connection with certain performance bounds from the network. Bandwidth, packet delay and jitter, and packet loss are the common measures used to characterize a connection's performance within a network. They are described in the following sections.
Bandwidth
The term bandwidth is used to describe the rated throughput capacity of a given medium,
protocol, or connection. It effectively describes the size of the pipe required for the
application to communicate over the network.
Generally, a connection requiring guaranteed service has certain bandwidth requirements and wants the network to allocate a minimum bandwidth specifically for it. A digitized voice application, for example, produces voice as a 64-kbps stream. Such an application becomes nearly unusable if it gets less than 64 kbps from the network along the connection's path.
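As a quick sanity check on that figure, the arithmetic (assuming standard 8-kHz, 8-bit PCM digitization, as in G.711; the book does not spell this out) reproduces the 64-kbps rate:

samples_per_second = 8000   # PCM voice sampling rate
bits_per_sample = 8
print(samples_per_second * bits_per_sample)   # 64000 bits/s, the 64-kbps stream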
Figure 1-1 shows the delay components (serialization, switching, and propagation delay, as a percentage of total delay) of a 1500-byte packet on a transcontinental U.S. link at T1 (1.5 Mbps), OC-3 (155 Mbps), and OC-12 (622 Mbps) bandwidths.
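The proportions in Figure 1-1 can be approximated with a back-of-the-envelope calculation. In the sketch below, the propagation and switching delays are assumed round numbers (roughly 25 ms of one-way fiber propagation across the continental U.S. and 1 ms of switching), not values taken from the book:

# Delay components for a 1500-byte packet on a transcontinental U.S. link.
PACKET_BITS = 1500 * 8
PROPAGATION_S = 5_000 / 200_000   # ~5000 km at ~200,000 km/s in fiber: 25 ms
SWITCHING_S = 0.001               # assumed nominal switching delay

for name, bps in [("T1", 1.5e6), ("OC-3", 155e6), ("OC-12", 622e6)]:
    serialization = PACKET_BITS / bps
    total = serialization + PROPAGATION_S + SWITCHING_S
    print(f"{name:6} serialization {serialization * 1e3:7.3f} ms "
          f"({100 * serialization / total:5.1f}% of total)")
# T1     serialization   8.000 ms ( 23.5% of total)
# OC-3   serialization   0.077 ms (  0.3% of total)
# OC-12  serialization   0.019 ms (  0.1% of total)

Serialization delay dominates at T1 speeds but is negligible at OC-3 and above, where propagation delay accounts for almost all of the total; this is the shift the figure illustrates.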
Packet Loss

Packet loss specifies the number of packets lost by the network during transmission. Packet drops at network congestion points and corrupted packets on the transmission wire cause packet loss. Packet drops generally occur at congestion points when incoming packets far exceed the queue size limit at the output queue. They also occur due to insufficient input buffers on packet arrival. Packet loss is generally specified as a fraction of the packets lost while transmitting a certain number of packets over some time interval.

Certain applications don't function well or are highly inefficient when packets are lost. Such loss-intolerant applications call for packet loss guarantees from the network.

Packet loss should be rare for a well-designed, correctly subscribed or under-subscribed network. It is also rare for guaranteed service applications for which the network has already reserved the required resources. Because fiber transmission lines, with a Bit Error Rate (BER) of 10E-9, are relatively loss-free, packet loss is mainly due to packet drops at network congestion points. Packet drops, however, are a fact of life when transmitting best-effort traffic, although such drops are done only when necessary. Keep in mind that dropped packets waste network resources, because they have already consumed certain network resources on their way to the loss point.
QoS Functions
This section briefly discusses the various QoS functions, their related features, and their
benefits. The functions are discussed in further detail in the rest of the book.
Resource Allocation

FIFO scheduling is the widely deployed, traditional queuing mechanism within routers and switches on the Internet today. Though it is simple to implement, FIFO queuing has some fundamental problems in providing QoS. It provides no way for delay-sensitive traffic to be prioritized and moved to the head of the queue. All traffic is treated exactly the same, with no scope for traffic differentiation or service differentiation.

For a scheduling algorithm to deliver QoS, at a minimum it needs to be able to differentiate among the different packets in the queue and know the service level of each packet. A scheduling algorithm determines which packet goes next from a queue. How often a flow's packets are served determines the bandwidth or resource allocation for the flow.
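To make the link between service frequency and bandwidth concrete, here is a minimal sketch of a deficit round-robin scheduler, one of the algorithms discussed in Chapter 5; the quantum credited to each queue per round determines its bandwidth share (the queues and quanta below are illustrative):

from collections import deque

def drr_schedule(queues, quanta):
    # Deficit round robin: queues hold packet sizes in bytes; each round a
    # queue earns its quantum and sends packets while it has credit left.
    deficits = [0] * len(queues)
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                continue
            deficits[i] += quanta[i]
            while q and q[0] <= deficits[i]:
                deficits[i] -= q.popleft()
                yield i   # queue i transmits one packet

# Two queues of 500-byte packets; a 2:1 quantum yields a 2:1 service ratio.
q0, q1 = deque([500] * 6), deque([500] * 6)
print(list(drr_schedule([q0, q1], quanta=[1000, 500])))
# [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]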
Chapter 4, "Per-Hop Behavior: Resource Allocation I," covers the QoS features in this section in detail.
Switching

A router's primary function is to quickly and efficiently switch all incoming traffic to the correct output interface and next-hop address based on the information in the forwarding table. The traditional cache-based forwarding mechanism, although efficient, has scaling and performance problems because it is traffic-driven and can lead to increased cache maintenance and poor switching performance during network instability.

The topology-based forwarding method solves the problems involved with cache-based forwarding mechanisms by building a forwarding table that exactly matches the router's routing table. The topology-based forwarding mechanism is referred to as Cisco Express Forwarding (CEF) in Cisco routers. Appendix B, "Packet Switching Mechanisms," offers more detail on these QoS functions.
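The essence of topology-based forwarding can be sketched in a few lines: the forwarding table is built from the routing table ahead of time, and each lookup is a longest-prefix match against it, independent of what traffic has been seen before (the prefixes and next hops below are hypothetical):

import ipaddress

# Forwarding table prebuilt from the routing table (topology-driven),
# rather than populated on demand by traffic as a route cache would be.
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): ("Serial0", "10.1.1.2"),
    ipaddress.ip_network("10.20.0.0/16"): ("Hssi1/0", "10.20.0.2"),
    ipaddress.ip_network("0.0.0.0/0"): ("Ethernet0", "192.168.1.1"),
}

def lookup(dst):
    # Longest-prefix match over the prebuilt table.
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    return FIB[max(matches, key=lambda net: net.prefixlen)]

print(lookup("10.20.5.9"))   # ('Hssi1/0', '10.20.0.2'): the /16 wins
print(lookup("10.9.9.9"))    # ('Serial0', '10.1.1.2')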
Routing

Traditional routing is destination-based only and routes packets on the shortest path derived from the routing table. This is not flexible enough for certain network scenarios. Policy routing is a QoS function that enables the user to change destination-based routing to routing based on various user-configurable packet parameters.

Current routing protocols provide shortest-path routing, which selects routes based on a metric value such as administrative cost, weight, or hop count. Packets are routed based on the routing table, without any knowledge of the flow requirements or the resource availability along the route. QoS routing is a routing mechanism that takes into account a flow's QoS requirements and has some knowledge of the resource availability in the network in its route selection criteria.

Appendix C, "Routing Policies," offers more detail on these QoS functions.
End-to-End QoS

Layer 2 QoS technologies offer solutions of a smaller scope only and can't provide end-to-end QoS, simply because the Internet or any large-scale IP network is made up of a large group of diverse Layer 2 technologies. In a network, end-to-end connectivity starts at Layer 3 and, hence, only a network layer protocol, which is IP in the TCP/IP-based Internet, can deliver end-to-end QoS.

The Internet is made up of diverse link technologies and physical media. IP, being the layer providing end-to-end connectivity, needs to map its QoS functions to the link QoS mechanisms, especially those of switched networks, to facilitate end-to-end QoS.

Some service provider backbones are based on switched networks such as ATM or Frame Relay. In this case, you need ATM and Frame Relay QoS-to-IP interworking to provide end-to-end QoS. This enables the IP QoS request to be honored within the ATM or the Frame Relay cloud.

Switched LANs are an integral part of Internet service providers (ISPs) that provide Web-hosting services and of corporate intranets. IEEE 802.1p and IEEE 802.1Q offer priority-based traffic differentiation in switched LANs. Interworking these protocols with IP is essential to making QoS end to end. Chapter 8 discusses IP QoS interworking with switched backbones and LANs in detail.

MPLS facilitates IP QoS delivery and provides extensive traffic engineering capabilities that help provide MPLS-based VPNs. For end-to-end QoS, IP QoS needs to interwork with the QoS mechanisms in MPLS and MPLS-based VPNs. Chapter 9 focuses on this topic.
Objectives

This book is intended to be a valuable technical resource for network managers, architects, and engineers who want to understand and deploy IP QoS-based services within their networks. IP QoS functions are indispensable in today's scalable IP network designs, which are intended to deliver guaranteed and differentiated Internet services by giving control of the network resources and their usage to the network operator.

This book's goal is to discuss the IP QoS architectures and their associated QoS functions that enable end-to-end QoS in corporate intranets, service provider networks, and, in general, the Internet. On the subject of IP QoS architectures, this book's primary focus is the diffserv architecture. This book also focuses on ATM, Frame Relay, IEEE 802.1p, IEEE 802.1Q, MPLS, and MPLS VPN QoS technologies and on how they interwork with IP QoS in providing an end-to-end service. Another important topic of this book is MPLS traffic engineering.
This book provides complete coverage of IP QoS and all related technologies, complete with case studies. Readers will gain a thorough understanding of the following areas to help deliver and deploy IP QoS and MPLS-based traffic engineering:

• Fundamentals of and the need for IP QoS
• The diffserv QoS architecture and its enabling QoS functionality
• The intserv QoS model and its enabling QoS functions
• ATM, Frame Relay, and IEEE 802.1p/802.1Q QoS technologies: interworking with IP QoS
• MPLS and MPLS VPN QoS: interworking with IP QoS
• MPLS traffic engineering
• Routing policies, general IP QoS functions, and other miscellaneous QoS information

QoS applies to any IP-based network. As such, this book targets all IP networks: corporate intranets, service provider networks, and the Internet.
Audience
The book is written for internetworking professionals who are responsible for designing
and maintaining IP services for corporate intranets and for service provider network
infrastructures. If you are a network engineer, architect, planner, designer, or operator who
has a rudimentary knowledge of QoS technologies, this book will provide you with
practical insights on what you need to consider to design and implement varying degrees
of QoS in the network.
This book also includes useful information for consultants, systems engineers, and sales engineers who design IP networks for clients. The information in this book is relevant to a wide audience because incorporating some measure of QoS is an integral part of any network design process.
Scope and Limitations
The case studies in this book are designed to discuss the application of QoS and provide some configuration details on enabling QoS functionality to help the reader implement QoS in his or her network. This book is not meant to replace the general Cisco documentation, which is still the best resource for complete details on a particular QoS configuration command.

The case studies in this book are based on a number of different IOS versions. In general, most case studies are based on 12.0(6)S or a more recent 12.0S IOS version unless otherwise noted. In the case of the MPLS case studies, 12.0(8)ST or a more recent 12.0ST IOS version is used.
Organization
This book consists of four parts. Part I, "IP QoS," focuses on IP QoS architectures and the QoS functions enabling them. Part II, "Layer 2 and MPLS QoS: Interworking with IP QoS," lists the QoS mechanisms in ATM, Frame Relay, Ethernet, MPLS, and MPLS VPN and discusses how they map to IP QoS. Part III, "Traffic Engineering," discusses traffic engineering using MPLS. Finally, Part IV, "Appendixes," discusses the modular QoS command-line interface and miscellaneous QoS functions and provides some useful reference material.

Most chapters include a case study section to help in implementation, as well as a question and answer section.
Part I

This part of the book discusses the IP QoS architectures and their enabling functions. Chapter 2 introduces the two IP QoS architectures, diffserv and intserv, and goes on to discuss the diffserv architecture.

Chapters 3, 4, 5, and 6 discuss the different functions that enable the diffserv architecture. Chapter 3 discusses the QoS functions that condition the traffic at the network boundary to facilitate diffserv within the network. Chapters 4 and 5 discuss packet scheduling mechanisms that provide minimum bandwidth guarantees for traffic. Chapter 6 focuses on the active queue management techniques that proactively drop packets to signal congestion. Finally, Chapter 7 discusses the RSVP protocol and its two integrated service types.
Part II

This section of the book, comprising Chapters 8 and 9, discusses ATM, Frame Relay, IEEE 802.1p, IEEE 802.1Q, MPLS, and MPLS VPN QoS technologies and how they interwork with IP QoS to provide end-to-end QoS.
Part III
Chapter 10, the only chapter in Part III, talks about the need for traffic engineering and
discusses MPLS traffic engineering operation.
Part IV

This part of the book contains useful information that didn't fit well in the previous sections but is still relevant to providing IP QoS.

Appendix A, "Cisco Modular QoS Command-Line Interface," details the new user interface that enables flexible and modular QoS configuration.

Appendix B, "Packet Switching Mechanisms," introduces the various packet-switching mechanisms available on Cisco platforms. It compares the switching mechanisms and recommends CEF, which also is a required packet-switching mechanism for certain QoS features.

Appendix C, "Routing Policies," discusses QoS routing, policy-based routing, and QoS Policy Propagation using Border Gateway Protocol (QPPB).

Appendix D, "Real-Time Transport Protocol (RTP)," discusses the transport protocol used to carry real-time packetized audio and video traffic.

Appendix E, "General IP Line Efficiency Functions," discusses some IP functions that help improve available bandwidth.

Appendix F, "Link Layer Fragmentation and Interleaving," discusses fragmentation and interleaving functionality with the Multilink Point-to-Point Protocol.

Appendix G, "IP Precedence and DSCP Values," tabulates IP Precedence and DSCP values. It also shows how IP Precedence and DSCP values are mapped to each other.
References

1 RFC 791, "Internet Protocol," J. Postel, 1981
2 RFC 896, "Congestion Control in IP/TCP Internetworks," J. Nagle, 1984
3 RFC 1122, "Requirements for Internet Hosts - Communication Layers," R. Braden, 1989
4 RFC 2001, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms," W. Stevens, 1997
5 S. Floyd and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, August 1993
Chapter 2: Differentiated Services Architecture
The aim of IP Quality of Service (QoS) is to deliver guaranteed and differentiated services on the Internet or any IP-based network. Guaranteed and differentiated services provide different levels of QoS, and each represents an architectural model for delivering QoS. This chapter primarily focuses on the Differentiated Services (diffserv) architecture for delivering QoS in the Internet. The other architectural model, Integrated Services (intserv), is introduced here and discussed further in Chapter 7, "Integrated Services: RSVP." An operational QoS model for a network node is also presented.
Intserv Architecture

The Internet Engineering Task Force (IETF) set up the intserv Working Group (WG) in 1994 to expand the Internet's service model to better meet the needs of emerging, diverse voice/video applications. It aims to clearly define the new, enhanced Internet service model as well as to provide the means for applications to express end-to-end resource requirements with support mechanisms in routers and subnet technologies. Its goal is to manage separately those flows that request a specific QoS.

Two services, guaranteed1 and controlled load2, are defined for this purpose. Guaranteed service provides deterministic delay guarantees, whereas controlled load service provides a network service close to that provided by a best-effort network under lightly loaded conditions. Both services are discussed in detail in Chapter 7. Resource Reservation Protocol (RSVP) is suggested as the signaling protocol that delivers end-to-end service requirements3.

The intserv model requires per-flow guaranteed QoS on the Internet. With the thousands of flows existing on the Internet today, the amount of state information required in the routers can be enormous. This creates scaling problems, because the state information increases as the number of flows increases. This makes intserv hard to deploy on the Internet.
In 1998, the diffserv Working Group was formed under the IETF. Diffserv is a bridge between intserv's guaranteed QoS requirements and the best-effort service offered by the Internet today. Diffserv provides traffic differentiation by classifying traffic into a few classes, with relative service priority among the traffic classes.
Diffserv Architecture

The diffserv approach4 to providing QoS in networks employs a small, well-defined set of building blocks from which you can build a variety of services. Its aim is to define the differentiated services (DS) byte (the Type of Service (ToS) byte from the Internet Protocol (IP) Version 4 header, or the Traffic Class byte from IP Version 6) and to mark the standardized DS byte of the packet such that the packet receives a particular forwarding treatment, or per-hop behavior (PHB), at each network node.
The diffserv architecture provides a framework5 within which service providers can offer customers a range of network services, each differentiated based on performance. A customer can choose the performance level needed on a packet-by-packet basis by simply marking the packet's Differentiated Services Code Point (DSCP) field to a specific value. This value specifies the PHB given to the packet within the service provider network. Typically, the service provider and customer negotiate a profile describing the rate at which traffic can be submitted at each service level. Packets submitted in excess of the agreed profile might not be allotted the requested service level.
The diffserv architecture only specifies the basic mechanisms for treating packets. You can build a variety of services by using these mechanisms as building blocks. A service defines some significant characteristics of packet transmission, such as throughput, delay, jitter, and packet loss, in one direction along a path in a network. In addition, you can characterize a service in terms of the relative priority of access to resources in a network. After a service is defined, a PHB is specified on all the nodes of the network offering this service, and a DSCP is assigned to the PHB. A PHB is an externally observable forwarding behavior given by a network node to all packets carrying a specific DSCP value. Traffic requiring a specific service level carries the associated DSCP value in its packets.
All nodes in the diffserv domain observe the PHB based on the DSCP field in the packet. In addition, the network nodes on the diffserv domain's boundary carry out the important function of conditioning the traffic entering the domain. Traffic conditioning involves functions such as packet classification and traffic policing and is typically carried out on the input interface where traffic arrives into the domain. Traffic conditioning plays a crucial role in engineering the traffic carried within a diffserv domain, such that the network can observe the PHB for all the traffic entering the domain.
The diffserv architecture is illustrated in Figure 2-1. The two major functional blocks in this
architecture are shown in Table 2-1.
(Table 2-1 lists the two major functional blocks: traffic conditioning at the DS domain boundary and per-hop behavior within the DS domain.)
Apart from these two functional blocks, resource allocation policy plays an important role
in defining the policy for admission control, ratio of resource overbooking, and so on.
NOTE Cisco introduced the modular QoS command-line interface (CLI) (discussed in Appendix A, "Cisco Modular QoS Command-Line Interface") to provide a clean separation and modular configuration of the different enabling QoS functions.
DSCP
The IETF diffserv group is in the process of standardizing an effort that enables users to mark 6 bits of the ToS byte in the IP header with a DSCP. The lowest-order 2 bits are currently unused (CU). The DSCP is an extension of the 3 bits used by IP Precedence. Like IP Precedence, you can use the DSCP to provide differential treatment to packets marked appropriately. Figure 2-3 shows the ToS byte6 (bit layout: P2 P1 P0 | T3 T2 T1 T0 | CU). The ToS byte is renamed the DS byte with the standardization of the DSCP field. Figure 2-4 shows the DS byte.

The DSCPs defined7 thus far by the IETF Working Group are as follows:

• Default DSCP: It is defined to be 000 000.
The traffic conditioning functions are introduced briefly in the following sections; however, they are more fully discussed in Chapter 3, "Network Boundary Traffic Conditioners: Packet Classifier, Marker, and Traffic Rate Management."
Classifier

The classifier selects a packet in a traffic stream based on the content of some portion of the packet header. The most common way to classify traffic is based on the DSCP field, but you can classify traffic based on other fields in the packet headers. This function identifies a packet's traffic class.

Marker

This function writes/rewrites the DSCP field of a packet based on its traffic class.

Metering

The metering function checks compliance with the traffic profile, based on a traffic descriptor such as a token bucket, and passes the result to the marker function and either a shaper or dropper function to trigger a particular action for in-profile and out-of-profile packets.

Shaper

The shaper function delays traffic by buffering some packets so that they comply with the profile. This action is also referred to as traffic shaping.

Dropper

The dropper function drops all traffic that doesn't comply with the traffic profile. This action is also referred to as traffic policing.
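To make the interplay of these functions concrete, here is a minimal sketch of a token-bucket meter driving a policing decision; the rate, burst size, and actions are illustrative assumptions, not a Cisco implementation:

import time

class TokenBucketMeter:
    # Meters packets against a profile: rate in bytes/s, burst in bytes.
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def conforms(self, size):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True    # in profile: the marker could set a conform DSCP
        return False       # out of profile: drop (police) or buffer (shape)

meter = TokenBucketMeter(rate=125_000, burst=4_000)   # 1 Mbps, 4-KB burst
for pkt_size in (1500, 1500, 1500):
    print("conform" if meter.conforms(pkt_size) else "exceed")
# conform, conform, exceed: the third back-to-back packet overflows the bucket

A shaper differs only in what it does with the nonconforming packet: it queues the packet until enough tokens accumulate instead of dropping it.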
PHB

Network nodes with diffserv support use the DSCP field in the IP header to select a specific PHB for a packet. A PHB is a description of the externally observable forwarding behavior of a diffserv node applied to a set of packets with the same DSCP.

You can define a PHB in terms of its resource priority relative to other PHBs, or in terms of some observable traffic service characteristics, such as packet delay, loss, or jitter. You can view a PHB as a black box because it defines some externally observable forwarding behavior without mandating a particular implementation.
In a diffserv network, best-effort behavior can be viewed as the default PHB. Diffserv recommends specific DSCP values for each PHB, but a network provider can choose to use a different DSCP than the recommended value in his or her network. The recommended DSCP for best-effort behavior is 000000.

The PHB of a specific traffic class depends on a number of factors:

• Arrival rate or load for the traffic class: This is controlled by the traffic conditioning at the network boundary.
• Resource allocation for the traffic class: This is controlled by the resource allocation on the nodes in the diffserv domain. Resource allocation in the network nodes is discussed in Chapter 4, "Per-Hop Behavior: Resource Allocation I," and Chapter 5, "Per-Hop Behavior: Resource Allocation II."
• Traffic loss: This depends on the packet discard policy on the nodes in the diffserv domain. This function is covered in Chapter 6, "Per-Hop Behavior: Congestion Avoidance and Packet Drop Policy."

Two PHBs, EF (Expedited Forwarding) and AF (Assured Forwarding), are standardized. They are discussed in the following sections.
EF PHB

You can use the EF PHB to build a low-loss, low-latency, low-jitter, assured-bandwidth, end-to-end service through diffserv domains.8 EF PHB targets applications such as Voice over IP (VoIP) and video conferencing, and services such as virtual leased line, because the service looks like a point-to-point connection to a diffserv network's end nodes. Such a service is also often termed premium service.

The main contributors to high packet delays and packet jitter are queuing delays caused by large, accumulated queues. Such queues are typical at network congestion points. Network congestion occurs when the arrival rate of packets exceeds their departure rate. You can essentially eliminate queuing delays if the maximum arrival rate is less than the minimal departure rate. An EF PHB needs to ensure that the traffic sees no or minimal queues and, hence, needs to configure a departure rate for the traffic that is equal to or greater than the packet arrival rate. The departure rate or bandwidth should be independent of the other traffic at any time. The packet arrival and departure rates are typically measured at intervals equal to the time it takes to transmit a link's maximum transmission unit (MTU)-sized packet.
A router can allocate resources for a certain departure rate on an interface by using different EF functionality implementations. Packet scheduling techniques, such as Class-Based Weighted Fair Queuing (CBWFQ), Weighted Round Robin (WRR), and Deficit Round Robin (DRR), provide this functionality when the EF traffic is carried over a highly weighted queue; that is, a weight that allocates a much higher rate to EF traffic than the actual EF traffic arrival rate. Further, you can modify these scheduling techniques to include a priority queue to carry EF traffic. The scheduling techniques are discussed in detail in Chapter 5.
When EF traffic is implemented using a priority queue, it is important to ensure that a busy EF priority queue does not starve the remaining traffic queues. To alleviate this problem, a user can set up a maximum rate against which the traffic serviced by the priority queue is policed. If the traffic exceeds the configured rate limit, all excess EF traffic is dropped. The network boundary traffic conditioners should be configured such that the EF traffic never exceeds its maximum configured rate at any hop in the network.

The recommended DSCP for EF traffic in the network is 101110.
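As an illustration of that interaction, the following minimal sketch (an assumed queue model, not a Cisco implementation) services EF traffic with strict priority but polices it to a configured per-interval byte budget so that the other queues still get served:

from collections import deque

class PrioritySchedulerWithPolicer:
    def __init__(self, ef_bytes_per_interval):
        self.ef, self.best_effort = deque(), deque()
        self.ef_budget = ef_bytes_per_interval   # EF police rate per interval

    def service_interval(self):
        # Serve EF first (strict priority), dropping EF beyond its budget,
        # then serve one best-effort packet with the remaining capacity.
        sent, budget = [], self.ef_budget
        while self.ef:
            pkt = self.ef.popleft()
            if pkt <= budget:
                budget -= pkt
                sent.append(("EF", pkt))
            # else: excess EF traffic is policed (dropped), not queued
        if self.best_effort:
            sent.append(("BE", self.best_effort.popleft()))
        return sent

sched = PrioritySchedulerWithPolicer(ef_bytes_per_interval=3000)
sched.ef.extend([1500, 1500, 1500])   # 4500 bytes offered, 3000 allowed
sched.best_effort.append(500)
print(sched.service_interval())
# [('EF', 1500), ('EF', 1500), ('BE', 500)]: the third EF packet is dropped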
AF PHB
AF PHB9 is a means for a service provider to offer different levels of forwarding assurance for IP packets received from a customer diffserv domain. It is suitable for most Transmission Control Protocol (TCP)-based applications.

An AF PHB provides differentiated service levels among four AF traffic classes. Each AF traffic class is serviced in its own queue, enabling independent capacity management for the four traffic classes. Within each AF class are three drop precedence levels (Low, Medium, and High) for Random Early Detection (RED)-like queue management.
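Because the class and drop precedence are encoded directly in the DSCP bits (per RFC 2597 the pattern is cccdd0, with ccc the class and dd the drop precedence), the recommended AF codepoints can be generated mechanically:

def af_dscp(af_class, drop_precedence):
    # DSCP for AFxy per RFC 2597: class 1-4 in the top 3 bits, drop
    # precedence 1-3 (Low/Medium/High) in the next 2 bits, low bit zero.
    assert 1 <= af_class <= 4 and 1 <= drop_precedence <= 3
    return (af_class << 3) | (drop_precedence << 1)

for c in range(1, 5):
    print("  ".join(f"AF{c}{d}={af_dscp(c, d):06b}" for d in range(1, 4)))
# AF11=001010  AF12=001100  AF13=001110
# AF21=010010  AF22=010100  AF23=010110   ...and so on for classes 3 and 4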
Network Provisioning

One resource allocation solution is to provision resources within the network using heuristic methods or systematic modeling techniques. This method can work only in a small network environment where the QoS policies and network traffic profile don't change often.
Signaled QoS
In this method, applications signal QoS requests using a signaling protocol such as RSVP.
For RSVP, the diffserv domain is treated as another link in the network for admission
control. QoS requests are mapped between RSVP and diffserv classes. You can map RSVP guaranteed service to the diffserv EF service, for example.

Signaled QoS can be a scalable solution in a large-scale network environment because RSVP is run only at the network edges, with diffserv used in the network core, as shown in Figure 2-5. Mapping between an RSVP reservation and a diffserv class happens at the edge of the diffserv network. With widespread RSVP support (for instance, the Microsoft Windows 2000 operating system supports RSVP), policy-aware applications at the network edges can use RSVP to signal QoS across the network without any scalability concerns. This solution suits some large-scale enterprise networks well.
(Figure 2-5: RSVP-aware hosts and servers at the network edges signal across a differentiated services network in the core.)
RSVP can scale well to support a few thousand per-flow sessions running in parallel. In addition, work is ongoing to provide aggregated RSVP. Multiple RSVP reservations are aggregated into a single aggregate reservation for large-scale RSVP deployment across a core network backbone that requires topology-aware admission control. An aggregated RSVP reservation is a fat, slowly adjusting reservation that results in reduced state and signaling information in the network core. As with a normal RSVP reservation, you can map the aggregate reservation to a diffserv class.
Using a centralized policy manager to define policies makes the task less daunting. This policy manager can distribute the policy to all the network devices.
Common Open Policy Service (COPS) is an IETF protocol for distributing policy. In COPS terminology, the centralized policy server is called the Policy Decision Point (PDP). The network node implementing or enforcing the policy is called the Policy Enforcement Point (PEP). The PDP uses the COPS protocol to download the policies into the PEPs in the network. A PEP device can generate a message informing the PDP if it cannot implement a policy that it was given by the PDP.
Summary

Two Internet QoS architectures, intserv and diffserv, are introduced in this chapter. The intserv architecture is defined to enable applications to request end-to-end network resource requirements. RSVP is suggested as a signaling protocol to request end-to-end resource requirements. This architecture can run into scalability issues in large networks, especially on the Internet, where the number of traffic flows can run in the order of tens of thousands. Intserv is discussed in detail in Chapter 7.

The chapter focuses on the diffserv architecture. Diffserv defines the architecture for implementing scalable service differentiation on the Internet. Diffserv uses the newly standardized DSCP field in the IP header to mark the QoS required by a packet, and a
diffserv-enabled network delivers the packet with a PHB indicated by the DSCP field.
Traffic is policed and marked appropriately at the edge of the diffserv-enabled network. A
PHB is an externally observable forwarding behavior given by a network node to all packets
carrying a specific DSCP value. You can specify PHBs in terms of their relative priority in
access to resources, or in terms of their relative traffic performance characteristics. Two
PHBs, EF and AF, are defined. EF is targeted at real-time applications, such as VoIP. AF
provides different forwarding assurance levels for packets based on their DSCP field.
COPS is a new IETF protocol for distributing QoS policy information across the network.
References

1 RFC 2212, "Specification of Guaranteed Quality of Service," S. Shenker, C. Partridge, R. Guerin
2 RFC 2211, "Specification of the Controlled-Load Network Element Service," J. Wroclawski
3 RFC 2210, "The Use of RSVP with IETF Integrated Services," J. Wroclawski
4 IETF Differentiated Services Working Group
5 "A Framework for Differentiated Services," Y. Bernet and others, Internet Draft
6 RFC 1349, "Type of Service in the Internet Protocol Suite," P. Almquist
7 RFC 2474, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers," K. Nichols and others
8 RFC 2598, "An Expedited Forwarding PHB," V. Jacobson, K. Nichols, K. Poduri
9 RFC 2597, "Assured Forwarding PHB Group," J. Heinanen, F. Baker, W. Weiss, J. Wroclawski
Chapter 3: Network Boundary Traffic Conditioners: Packet Classifier, Marker, and Traffic Rate Management
Traffic conditioning functions at the network boundary are vital to delivering differentiated services within a network domain. These functions provide packet classification, marking, and traffic rate management.

In a network, packets are generally differentiated on a flow basis by the five flow fields in the Internet Protocol (IP) packet header: source IP address, destination IP address, IP protocol field, and source and destination ports. An individual flow is made up of packets going from an application on a source machine to an application on a destination machine, and packets belonging to a flow carry the same values for the five packet header flow fields.
Quality of service (QoS) applied on an individual flow basis, however, is not scalable
because the number of flows can be large. So, routers at the network boundary perform
classifier functions to identify packets belonging to a certain traffic class based on one or
more Transmission Control Protocol/Internet Protocol (TCP/IP) header fields. A marker
function is used to color the classified traffic by setting either the IP Precedence or the
Differentiated Services Code Point (DSCP) field. Within the network core, you can apply
a per-hop behavior (PHB) to the packets based on either the IP Precedence or the DSCP
field marked in the packet header.
Another important traffic conditioner at the network boundary is traffic rate management. It enables a service provider to meter the customer's traffic entering the network against the customer's traffic profile using a policing function. Conversely, an enterprise accessing its service provider can meter all its traffic to shape it and send it out at a constant rate such that all its traffic passes through the service provider's policing functions. Network boundary traffic conditioners are essential to delivering differentiated services in a network.
Packet Classification

Packet classification is a means of identifying packets as belonging to a certain class based on one or more fields in a packet. The identification function can range from straightforward to complicated. The different classification support types include:

• IP flow identification based on the five flow parameters: source IP address, destination IP address, IP protocol field, source port number, and destination port number.
• Identification based on the IP Precedence or DSCP field.
• Packet identification based on other TCP/IP header parameters, such as packet length.
• Identification based on source and destination Media Access Control (MAC) addresses.
• Application identification based on port numbers, Web Universal Resource Locator (URL) addresses, and so on. This functionality is available in Cisco products as Network Based Application Recognition (NBAR).

You can use access lists to match packets based on the various flow parameters. Access lists can also identify packets based on the IP Precedence or DSCP field. NBAR enables a router to recognize traffic flows as belonging to a specific application, enabling packet classification based on the application.
Packet classification also can be done based on information internal to the router. Examples of such classification are identification based on the input interface on which the packet arrived and the QoS group field in the internal packet data structure. All the preceding classification mechanisms are supported across all QoS functions as part of the modular QoS command-line interface (CLI). The modular QoS CLI is discussed in Appendix A, "Cisco Modular QoS Command-Line Interface."

The classification action is referred to as packet marking or packet coloring. Packets identified as belonging to a class are colored accordingly.
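As a rough sketch of what a classifier does, the following fragment matches packets against ordered rules on the five flow fields; the rule set and class names are illustrative assumptions (on a router, this role is played by access lists or NBAR):

from dataclasses import dataclass

@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int

# Ordered rules, like access-list entries: the first match wins.
RULES = [
    (dict(protocol=6, dst_port=80), "web"),      # TCP to port 80
    (dict(protocol=17, dst_port=53), "dns"),     # UDP to port 53
    (dict(src_ip="215.215.215.1"), "premium"),   # a hypothetical premium host
]

def classify(pkt):
    for match, traffic_class in RULES:
        if all(getattr(pkt, field) == value for field, value in match.items()):
            return traffic_class
    return "best-effort"

print(classify(Packet("10.1.1.1", "20.2.2.2", 6, 30000, 80)))   # web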
Packet Marking

You can mark classified packets to indicate their traffic class. You can color packets by marking the IP Precedence or the DSCP field in the packet's IP header, or the QoS group field in the packet's internal data structure within a router.

IP Precedence

The IP Precedence field in the packet's IP header is used to indicate the relative priority with which a particular packet should be handled. It is made up of three bits in the IP header's Type of Service (ToS) byte. Apart from IP Precedence, the ToS byte contains ToS
bits, which were designed to contain values indicating how each packet should be handled in a network; this particular field was never used much in the real world. The ToS byte1, showing both the IP Precedence and ToS bits, is illustrated in Figure 2-3 in Chapter 2, "Differentiated Services Architecture." Table 3-1 shows the different IP Precedence bit values and their names2.
Table 3-1  IP Precedence Values and Names

Value   Bits   Name
0       000    Routine
1       001    Priority
2       010    Immediate
3       011    Flash
4       100    Flash Override
5       101    Critical
6       110    Internetwork Control
7       111    Network Control
NOTE All routing control traffic in the network uses IP Precedence 6 by default. IP Precedence 7
also is reserved for network control traffic. Hence, use of IP Precedences 6 and 7 is not
recommended for user traffic.
Packet coloring by setting the IP Precedence field can be done either by the application originating the traffic or by a node in the network. Cisco QoS features supporting this function include Committed Access Rate (CAR), Policy-Based Routing (PBR), and QoS Policy Propagation using Border Gateway Protocol (BGP) (QPPB). Case Study 3-1, later in this chapter, discusses the IP Precedence setting using different QoS features. CAR is discussed in the "Traffic Policing" section of this chapter. PBR and QPPB are discussed in Appendix C, "Routing Policies."
NOTE PBR is primarily a feature to route packets based on policy, though it has functionality for
marking packets with IP Precedence. As such, PBR is recommended only for marking
packets when either CAR or QPPB support is unavailable, or when IP Precedence needs to
be marked while routing packets based on a policy.
DSCP

The DSCP field is used to indicate a certain PHB in a network. It is made up of 6 bits in the IP header and is being standardized by the Internet Engineering Task Force (IETF) Differentiated Services Working Group. The original ToS byte containing the DSCP bits has been renamed the DS byte. The DS byte is shown in Figure 2-4 in Chapter 2. The DSCP field definitions3 and the recommended DSCP values for the different forwarding behaviors are also discussed in Chapter 2.

The DSCP field is part of the IP header, similar to IP Precedence. In fact, the DSCP field is a superset of the IP Precedence field. Hence, the DSCP field is used and set in ways similar to those described with respect to IP Precedence. Note that the DSCP field definition is backward-compatible with the IP Precedence values.

Often, packets arrive at a network boundary already carrying a set IP Precedence or DSCP field. Even in such situations, when the packet arriving into the network is already marked, a network operator wants to enforce the right marking at the network edge based on the packet's class and its offered service level before the traffic enters the network. Case Study 3-3, later in this chapter, discusses an example of IP Precedence enforcement.
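That backward compatibility is easy to see at the bit level: the three IP Precedence bits are the three high-order bits of the 6-bit DSCP field, so precedence value p corresponds to the class-selector codepoint p << 3. A small sketch:

def precedence_to_dscp(prec):
    # IP Precedence occupies the top 3 bits of the DS field, so each
    # precedence maps to the class-selector DSCP with the low 3 bits zero.
    return prec << 3

def dscp_to_precedence(dscp):
    # A precedence-only router simply reads the top 3 bits of the DSCP.
    return dscp >> 3

print(precedence_to_dscp(5))          # 40 = 101000, class selector 5
print(dscp_to_precedence(0b101110))   # 5: EF traffic maps to precedence 5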
ISP
Hssi0/0/1
217.217.217.1
S3
D
This case study discusses the IP Precedence setting function using CAR, PBR, and QPPB.
On the ISP router's High-Speed Serial Interface (HSSI) connecting to the enterprise
customer, you configure CAR, PBR, or QPPB commands to set IP Precedence
values based on the source IP address for all incoming traffic.
Two rate-limit statements are defined to set the IP Precedence values. The first statement
sets IP Precedence 5 for all traffic carrying a source address in the 215.215.215.0/24 address
space. The rest of the traffic arriving on the Hssi0/0/1 interface is set with an IP Precedence
of 4.
Note that though rate-limit parameters are defined in the two statements, the purpose is not
to rate-limit but simply to set IP Precedence based on an IP access list.
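A minimal sketch of these two statements might look like the following (the 45-Mbps rate and the burst values are placeholders chosen for an HSSI line rate, and access list 1 is assumed to match the 215.215.215.0/24 source space):
interface Hssi0/0/1
 rate-limit input access-group 1 45000000 22500 22500 conform-action
   set-prec-transmit 5 exceed-action set-prec-transmit 5
 rate-limit input 45000000 22500 22500 conform-action
   set-prec-transmit 4 exceed-action set-prec-transmit 4
!
access-list 1 permit 215.215.215.0 0.0.0.255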
NOTE CAR provides two functions in a single statement: rate limiting and IP Precedence setting.
In this case study, the purpose is to set only the IP Precedence, but a CAR statement requires
both rate-limiting and IP Precedence functions to be configured. In the future, with the
support for modular QoS CLI, setting IP Precedence will not require enabling rate-limiting
functions.
The route map tasman is used to set IP Precedence 5 for traffic with a source address in the
215.215.215.0/24 address space and IP Precedence 4 for the rest of the traffic.
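A sketch of the tasman route map and its PBR attachment might look like this (the access list number and the interface are assumptions):
access-list 1 permit 215.215.215.0 0.0.0.255
!
route-map tasman permit 10
 match ip address 1
 set ip precedence 5
route-map tasman permit 20
 set ip precedence 4
!
interface Hssi0/0/1
 ip policy route-map tasman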
router bgp 10
table-map tasman
neighbor 217.217.217.2 remote-as 2345
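The table-map command applies the tasman route map to routes learned from the BGP neighbor. Assuming the tasman route map shown in the PBR sketch earlier, the remaining QPPB step is to enable precedence mapping on the incoming interface, for example:
interface Hssi0/0/1
 bgp-policy source ip-prec-map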
NOTE You can choose to classify packets by QoS group rather than IP Precedence either when
you want more than eight traffic classes, or when you don't want to change the original
packet's IP Precedence values.
Case Study 3-2: Packet Classification Using the QoS Group
This case study discusses packet classification based on the QoS group using CAR and
QPPB. It shows sample CAR and QPPB configurations on the ISP router to classify and
mark packets using QoS group values.
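With CAR, the set-qos-transmit action marks the QoS group in place of set-prec-transmit; a minimal sketch (the rate, burst, and QoS group values here are placeholders) might be:
interface Hssi0/0/1
 rate-limit input access-group 1 45000000 22500 22500 conform-action
   set-qos-transmit 1 exceed-action set-qos-transmit 1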
router bgp 10
table-map tasman
neighbor 217.217.217.2 remote-as 2345
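For QPPB-based QoS group marking, the route map sets the QoS group rather than the precedence, and the interface command changes accordingly. A sketch (the set ip qos-group values are assumptions):
route-map tasman permit 10
 match ip address 1
 set ip qos-group 1
!
interface Hssi0/0/1
 bgp-policy source ip-qos-map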
Case Study 3-3: Enforcing IP Precedence Settings at the Network Edge
(Figure: A service provider router with two customer-facing interfaces: Hssi1/0 to a best-effort customer and Hssi1/1 to a premium customer.)
The service provider needs to enforce the IP Precedence setting to 0 for all traffic coming
from best-effort service customers. For traffic coming from premium customers, the service
provider has to mark the packets with an IP Precedence of 5.
Input CAR is configured on the HSSI interface of the router enforcing IP Precedence
values. Listing 3-6 shows a sample configuration.
Listing 3-6 Using CAR to Enforce IP Precedence on Traffic
interface Hssi 1/0
rate-limit input 45000000 22500 22500 conform-action set-prec-transmit 0
exceed-action set-prec-transmit 0
Interface Hssi1/0 connects to a customer with normal, best-effort service. Therefore, all
incoming traffic on this interface is tagged with an IP Precedence of 0.
Similarly, interface Hssi1/1 connects to a customer with premium service. Therefore, all
incoming traffic on this interface is tagged with an IP Precedence of 5.
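The corresponding statement for the premium interface, not shown in Listing 3-6, would look something like this:
interface Hssi 1/1
 rate-limit input 45000000 22500 22500 conform-action set-prec-transmit 5
   exceed-action set-prec-transmit 5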
NOTE Token bucket is simply a measuring tool. It doesn't filter, alter, or act on the traffic.
Traffic Policing
The traffic policing or rate-limiting function is provided by CAR. CAR offers two primary
functions: packet coloring by setting IP Precedence, and rate-limiting.
As a traffic policing function, CAR doesn't buffer or smooth traffic and might drop packets
when the allowed bursting capability is exceeded. CAR is implemented as a list of rate-limit
statements. You can apply it to both the output and input traffic on an interface. A rate-limit
statement takes the following form:
rate-limit {input | output} [access-group [rate-limit] number] CIR
conformed-burst extended-burst conform-action <action>
exceed-action <action>
Each rate-limit statement is made up of three elements: a traffic matching specification,
traffic measurement instrumentation, and an action policy. The rate-limit statements are
processed as shown in Figure 3-3.
(Figure 3-3: Arriving traffic is checked against the matching specification, measured by the traffic measurement instrumentation, and then subjected to the action policy; nonmatching traffic, or a continue action, passes to the next rate-limit statement.)
(Figure 3-4: The token bucket scheme for CAR. Tokens arrive at the committed rate, Bc per interval T, and fill a bucket of size Bc, the conformed burst size; tokens beyond Bc overflow. Arriving packets that find sufficient tokens conform; those that do not exceed.)
DA, the actual debt, is the number of tokens the stream has currently borrowed. It is reduced at regular
intervals, as tokens accumulate at the configured committed rate. Say
you borrow 100 tokens for each of the three packets you send after the last packet drop. The
DA is 100, 200, and 300 after the first, second, and third packets are sent, respectively.
DC, the compounded debt, is the sum of the DAs of all packets sent since the last time a packet was dropped. Unlike
DA, which is an actual count of the borrowed tokens since the last packet drop, DC is the
sum of the actual debts for all the packets that borrowed tokens since the last CAR packet
drop. Say, as in the previous example, you borrow 100 tokens for each of the three packets
you send after the last packet drop. DC equals 100, 300 (= 100 + 200), and 600 (= 100 +
200 + 300) after the first, second, and third packets are sent, respectively. Note that for the
first packet that needs to borrow tokens after a packet drop, DC is equal to DA.
The DC value is set to zero after a packet is dropped, and the next packet that needs to
borrow has a new value computed, which is equal to the DA. In the example, if the fourth
packet gets dropped, the next packet that needs to borrow tokens (say, 100) has its
DC = DA = 400 (= 300 + 100). Note that unlike DC, DA is not forgiven after a packet drop.
If DA is greater than the extended limit, all packets are dropped until DA is reduced through
accumulation of tokens.
The point of a token bucket with extended burst capability is to avoid immediately entering
a tail-drop scenario, as happens with a standard token bucket, and instead to drop packets gradually,
in a more Random Early Detection (RED)-like fashion. RED is discussed in Chapter 6, "Per-
Hop Behavior: Congestion Avoidance and Packet Drop Policy." If a packet arrives and
needs to borrow some tokens, a comparison is made between BE and DC. If DC is greater
than BE, the packet is dropped and DC is set to zero. Otherwise, the packet is sent, DA
is incremented by the number of tokens borrowed, and DC is incremented by the newly computed DA
value.
Note that if a packet is dropped, tokens are not removed from the bucket (that is, dropped packets
do not count against any rate or burst limits).
It is important to note that CIR is a rate expressed in bits per second, whereas the bursts are
expressed in bytes. A burst counter tracks the current burst size. The burst counter can
be either less than or greater than BC. When the burst counter exceeds BC, the burst
counter equals BC + DA. When a packet arrives, the burst counter is evaluated, as shown in
Figure 3-5.
(Figure 3-5: Evaluation of the burst counter against BC and BE.)
For cases when the burst counter value is between BC and BE, you can approximately
represent the exceed action probability as:
(Burst counter - BC) / (BE - BC)
Based on this approximation, the CAR packet drop probability is shown in Figure 3-6. The
concept of exceed action packet drop probability between the conformed and extended
burst is similar to the packet drop probability of RED between the minimum and maximum
thresholds.
(Figure 3-6: CAR packet discard probability. The discard percentage rises from 0 to 100 percent as the burst counter moves from the conformed burst to the extended burst.)
NOTE The discussion in this section assumes packet transmission and packet drop as conform and
exceed actions, respectively, for simplicity. In reality, a number of options are available to
define both conform and exceed actions. They are discussed in the next section, The
Action Policy.
In a simple rate-limit statement where the BC and BE are the same value, no variable drop
probability exceed region exists.
CAR implementation puts the following constraints on the token bucket parameters:
Rate (bps) should be in increments of 8 Kbps, and the lowest value allowed for
conformed and extended burst size is 8000 bytes.
The minimum value of BC size is Rate (bps) divided by 2000. It should be at least
8000 bytes.
The BE is always equal to or greater than the BC value.
The first CAR statement is used to rate-limit all incoming traffic to 30 Mbps. It uses a
continue action so that matching traffic continues to the next rate-limit statement. The second rate-limit
statement is used to set the IP Precedence value for HTTP traffic based on its traffic rate.
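Together with the last statement described below, the configuration might look like this sketch, reconstructed from the show output in Listing 3-8 (access list 101 is assumed to match HTTP, that is, TCP port 80):
interface Hssi1/0/0
 rate-limit input 30000000 15000 15000 conform-action continue
   exceed-action drop
 rate-limit input access-group 101 15000000 10000 10000 conform-action
   set-prec-transmit 4 exceed-action set-prec-transmit 0
 rate-limit input 15000000 10000 10000 conform-action set-prec-transmit 4
   exceed-action set-prec-transmit 4
!
access-list 101 permit tcp any any eq www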
The last rate-limit statement is used to set the IP Precedence value to 4 for all non-HTTP
traffic. Listings 3-8 and 3-9 give the output of some relevant show commands for CAR.
Listing 3-8 CAR Parameters and Packet Statistics
#show interface hssi1/0/0 rate-limit
Hssi1/0/0
Input
matches: all traffic
params: 30000000 bps, 15000 limit, 15000 extended limit
conformed 0 packets, 0 bytes; action: continue
exceeded 0 packets, 0 bytes; action: drop
last packet: 338617304ms ago, current burst: 0 bytes
last cleared 00:11:11 ago, conformed 0 bps, exceeded 0 bps
matches: access-group 101
params: 15000000 bps, 10000 limit, 10000 extended limit
conformed 0 packets, 0 bytes; action: set-prec-transmit 4
exceeded 0 packets, 0 bytes; action: set-prec-transmit 0
last packet: 338617201ms ago, current burst: 0 bytes
last cleared 00:11:11 ago, conformed 0 bps, exceeded 0 bps
matches: all traffic
params: 15000000 bps, 10000 limit, 10000 extended limit
conformed 0 packets, 0 bytes; action: set-prec-transmit 4
exceeded 0 packets, 0 bytes; action: set-prec-transmit 4
last packet: 338617304ms ago, current burst: 0 bytes
last cleared 00:03:30 ago, conformed 0 bps, exceeded 0 bps
The show interface rate-limit command displays the CAR configuration parameters as well as
CAR packet statistics. In the display, current burst gives a snapshot of the value in the
token bucket at the time the value is printed. The conformed bps is obtained by dividing the
total conformed traffic passed by the elapsed time since the counters were last
cleared.
Listing 3-9 Input and Output IP Precedence Accounting Information on Interface Hssi1/0/0
#show interface hssi1/0/0 precedence
Hssi1/0/0
Input
Precedence 4: 10 packets, 1040 bytes
Output
Precedence 4: 10 packets, 1040 bytes
The show interface precedence command shows input and output packet accounting on
an interface when IP accounting precedence input/output is applied on the interface.
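Enabling this accounting is a matter of two interface commands, for example:
interface Hssi1/0/0
 ip accounting precedence input
 ip accounting precedence output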
(Figure: The service provider router's Hssi1/0/0 interface connects to a partner network; IP Precedence 0, 1, and 2 traffic together is rate-limited to 10 Mbps.)
On the HSSI interface going to the partner network, the service provider adds the
configuration shown in Listing 3-10 for this functionality.
Listing 3-10 Rate-Limiting Traffic Based on Packet Precedence Values
interface Hssi1/0/0
rate-limit input access-group rate-limit 1 10000000 10000 10000
conform-action transmit exceed-action drop
To match a range of precedence values, you use a mask of 8 bits, where each bit refers to a
precedence value. Precedence 7 is 10000000, precedence 6 is 01000000, precedence 5 is
00100000, and so on. In this example, the mask 07 is in hexadecimal, which is 00000111 in binary.
Hence, the mask matches precedence values 0, 1, and 2.
We use a rate-limit access list statement to match the precedence values 0, 1, and 2. IP-
precedence-based rate-limit access lists range from 1 to 99. The rate-limit statement limits
all traffic with precedence 0, 1, and 2 to 10 Mbps.
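The access list itself, referenced by the rate-limit statement in Listing 3-10, would be defined with the mask discussed above:
access-list rate-limit 1 mask 07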
The service provider can configure an inbound rate-limit statement on the T3 interface to
limit the traffic rate to the subrate to which the T3 customer subscribed. In the configuration example
in Listing 3-11, you are limiting the input rate of traffic to 20 Mbps. Any traffic that exceeds
the 20-Mbps rate is dropped.
Listing 3-11 Delivering Subrate IP Service
interface Hssi0/0/0
rate-limit input 20000000 40000 40000 conform-action
transmit exceed-action drop
All packets being sent by the web server with an IP address of 209.11.212.1 are set with an
IP Precedence value based on the traffic rate. All traffic within the 5-Mbps rate gets an IP
Precedence of 4, and the rest get an IP Precedence of 0.
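A sketch of such a configuration on the interface facing the web server (the interface name and the burst values are assumptions) might be:
interface FastEthernet0/0
 rate-limit input access-group 101 5000000 8000 8000 conform-action
   set-prec-transmit 4 exceed-action set-prec-transmit 0
!
access-list 101 permit ip host 209.11.212.1 any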
Listing 3-13 shows how you can configure a router with a DS3 interface to limit its incoming
ICMP traffic to 256 Kbps.
Listing 3-13 Limiting Denial-of-Service Attacks
interface Hssi1/0
rate-limit input access-group 100 256000 8000 8000 conform-action
transmit exceed-action drop
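The access list referenced in Listing 3-13 is assumed to match all ICMP traffic, for example:
access-list 100 permit icmp any any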
(Figure: Routers from ISP X and ISP Y connect to a Tier-1 ISP router over a shared FDDI exchange point.)
The Tier-1 ISP wants to rate-limit all traffic received from the ISP X and ISP Y routers to a
mean rate of 30 Mbps and 40 Mbps, respectively. Any traffic exceeding the mean rate is
dropped.
The Tier-1 ISP drops any packets arriving from non-peers on the FDDI (other than ISPs X
and Y).
Listing 3-14 shows how you can configure the Tier-1 ISP router for this functionality.
Listing 3-14 Enforcing Public Exchange Point Traffic Example
interface Fddi1/0/0
ip address 162.111.10.1 255.255.255.192
rate-limit input access-group rate-limit 110 30000000 15000 15000
conform-action transmit exceed-action drop
rate-limit input access-group rate-limit 120 40000000 40000 40000
conform-action transmit exceed-action drop
rate-limit input access-group 100 4000000 40000 40000
conform-action drop exceed-action drop
access-list rate-limit 110 0000.0c10.7819
access-list rate-limit 120 0000.0c89.6900
access-list 100 permit ip any any
0000.0c10.7819 and 0000.0c89.6900 are the MAC addresses on the FDDI interface of ISP
X's and ISP Y's routers, respectively.
Rate-limit access lists ranging from 100 to 199 are used to match traffic by MAC addresses.
A rate-limit access list of 110 limits the traffic coming from ISP X to 30 Mbps. A rate-limit
access list of 120 is used to rate-limit the traffic from ISP Y to 40 Mbps.
The last rate-limit statement is a catch-all statement that drops any traffic arriving from
non-peer routers connected on the FDDI.
Traffic Shaping
TS is a mechanism to smooth the traffic flow on an interface to avoid congestion on the link
and to meet a service provider's requirements. TS smoothes bursty traffic to meet the
configured CIR by queuing or buffering packets exceeding the mean rate. The queued
packets are transmitted as tokens become available, and their transmission is
scheduled in either first-in, first-out (FIFO) or Weighted Fair Queuing (WFQ) order. TS
operation is illustrated in Figure 3-9.
Figure 3-10 The Token Bucket Scheme for the Traffic Shaping Function
(Figure: The shaping token bucket has a size of Bc + Be.)
You can accomplish traffic shaping on any generic interface using one of two
implementations: Generic Traffic Shaping (GTS) and Distributed Traffic Shaping (DTS).
Table 3-4 compares the two TS implementations.
NOTE TS works only for outbound traffic; hence, TS cannot be applied on the inbound traffic to
an interface.
(Figure: An enterprise network router's Serial0 interface connects to the service provider over a T1; outbound traffic is shaped at 256 Kbps.)
The serial interface of the enterprise's router that connects to the service provider has an
access rate of 256 Kbps. Traffic shaping with both the GTS and DTS features is discussed in the
following sections.
Note that in the configuration, only the desired CIR is provided. The appropriate sustained
and BE values, which are shown in Listing 3-16, are picked automatically.
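A minimal sketch of such a GTS configuration, specifying only the CIR, might be:
interface Serial0
 traffic-shape rate 256000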
Listing 3-16 GTS Parameters on Interface Serial0
#show traffic-shape serial0
Access Target Byte Sustain Excess Interval Increment Adapt
I/F List Rate Limit bits/int bits/int (ms) (bytes) Active
Se0 256000 1984 7936 7936 31 992 -
The show traffic-shape command shows the traffic shaping configuration and the token
bucket parameters.
Listing 3-17 Information on the GTS Shaping Queue
#show traffic-shape queue serial0
Traffic queued in shaping queue on Serial0
Queueing strategy: weighted fair
Queueing Stats: 0/1000/64/0 (size/max total/threshold/drops)
Conversations 4/8/256 (active/max active/max total)
Reserved Conversations 0/0 (allocated/max allocated)
The show traffic-shape queue command shows the WFQ information as well as the WFQ
queued packets on the interface.
Listing 3-18 GTS Statistics on Interface Serial0
#show traffic-shape statistics serial0
Access Queue Packets Bytes Packets Bytes Shaping
I/F List Depth Delayed Delayed Active
Se0 4 2000 185152 20 1281 yes
The show traffic-shape statistics command shows the GTS packet statistics. Queue Depth
shows the number of packets in the WFQ queue at the moment the command is issued.
Packets and Bytes show the total traffic switched through the interface. Packets Delayed
and Bytes Delayed show the part of the total transmitted traffic that was delayed in the WFQ
queue. Shaping Active shows whether traffic shaping is currently active; shaping is active if
packets must be held in the WFQ queue before being transmitted.
policy-map mypolicy
class myclass
shape average 256000 16384 0
interface Serial0/0/0
service-policy output mypolicy
The shape average command is used to send traffic at the configured CIR of 256000 bps
without sending any excess burst bits per interval. The conformed burst (BC) is 16384 bits.
Hence, the interval is 64 ms. The command allows only BC bits of data to be sent per
interval. The excess burst is not allowed at all, even if it is configured to a value other than
0. The shape peak command is used to peak to a burst of BC + BE bits per interval while
keeping a CIR.
Listing 3-20 DTS Parameters and Queue Statistics
#show interface shape
Serial0/0/0 nobuffer drop 0
Serial0/0/0 (class 2):
cir 256000, Bc 16384, Be 0
packets output 0, bytes output 0
queue limit 0, queue size 0, drops 0
last clear = 00:01:39 ago, shape rate = 0 bps
The show interface shape command shows the packet and queue statistics along with the
configured DTS parameters.
Case Study 3-11: Shaping Incoming and Outgoing Traffic for a Host
to a Certain Mean Rate
At a large company's remote sales office, Host A, with an IP address of 200.200.200.1, is
connected to the network on an Ethernet interface. It connects to the corporate office
through a router using a T1. The network administrator wants to shape Host A's incoming
and outgoing traffic to 128 Kbps, as in Figure 3-12.
(Figure 3-12: Host A attaches to the router's Ethernet0 interface; Serial0 connects to the service provider network over a T1. Traffic from Host A is shaped to 128 Kbps on Serial0, and traffic to Host A is shaped to 128 Kbps on Ethernet0.)
Host A is connected to the remote sales office router on interface ethernet0. The router's
interface serial0 connects to the corporate network through a T1. The following
configuration shows how to shape traffic to and from Host A using GTS and DTS.
Listings 3-21 and 3-22 show samples of the configuration required for GTS and DTS,
respectively, to achieve this functionality.
interface ethernet 0
traffic-shape group 102 128000
The access lists 101 and 102 match host 200.200.200.1s outgoing and incoming traffic,
respectively. The outgoing traffic is shaped on interface serial0 to 128 Kbps. The incoming
traffic is shaped similarly on the ethernet0 interface.
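A sketch of the pieces that complete the GTS configuration (the access lists and the serial0 shaping statement) might look like this:
access-list 101 permit ip host 200.200.200.1 any
access-list 102 permit ip any host 200.200.200.1
!
interface serial 0
 traffic-shape group 101 128000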
class-map ToHostA
match access-group 102
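! The FromHostA class map referenced below is not shown above; a sketch,
! assuming access list 101 matches traffic sourced from Host A:
class-map FromHostA
match access-group 101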
policy-map frompolicy
class FromHostA
shape peak 128000 8192 1280
policy-map topolicy
class ToHostA
shape peak 128000 8192 1280
interface serial0/0/0
service-policy output frompolicy
interface ethernet1/0/0
service-policy output topolicy
The class maps FromHostA and ToHostA are used to match the traffic from and to
Host A, respectively. On both traffic classes, a policy to shape the traffic to 128 Kbps is
applied. The service-policy commands activate the policies on the interfaces' output traffic.
The shape peak 128000 8192 1280 command shapes traffic to an average rate
of 128 Kbps, with bursts of up to 9472 bits (8192 + 1280) per interval.
In this case study, we go over the configuration required on the e-business site router's
interface to the service provider.
Listings 3-23 and 3-24 show samples of the configuration required for GTS and DTS,
respectively, to achieve this functionality.
The traffic-shape adaptive command makes the router adapt its traffic-shaping rate to
the incoming BECNs. The outgoing traffic on the interface is shaped at 256 Kbps (as
configured by the traffic-shape rate command) on average when it receives no BECNs.
The CIR is throttled to 64 Kbps, however, if it receives a series of BECNs, as in Figure 3-13.
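A sketch of the GTS commands this describes might be:
interface Serial0
 traffic-shape rate 256000
 traffic-shape adaptive 64000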
(Figure 3-13: A Frame Relay link to the service provider network with an access rate of 256 Kbps and a CIR of 64 Kbps. The router sends traffic at 256 Kbps normally and throttles down to the 64-Kbps CIR on receipt of BECNs.)
The shape adaptive command makes the router adapt its traffic-shaping rate to the
incoming BECNs. The outgoing traffic on the interface is shaped at 256 Kbps (as
configured by the shape peak command) on average when it receives no BECNs. The CIR
is throttled to 64 Kbps, however, if it receives a series of BECNs.
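A sketch of the equivalent modular QoS CLI configuration (the policy name here is hypothetical) might be:
policy-map shapefr
 class class-default
  shape peak 256000
  shape adaptive 64000
!
interface Serial0
 service-policy output shapefr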
Summary
Network boundary traffic conditioners provide packet classifier, marker, and traffic rate
management functions.
Packets are classified at the network boundary so that they receive differentiated service
within the network. Packet classification is necessary to identify different classes of traffic
based on their service level. You can perform packet classification based on one or more
fields in the packet's IP header. After a packet is identified to be of a certain class, a marker
function is used to color the packet by setting the IP Precedence, DSCP, or QoS
group value.
Traffic rate management on network boundary routers is essential to ensure resource
availability and QoS within the network core. CAR is used to rate-limit any traffic trying to
go over the configured rate. CAR can send some burst of traffic at up to the line rate and
then start dropping packets after a given rate is reached. TS smoothes the traffic by queuing
packets and sending them at a configured rate.
TS is more TCP-friendly than CAR because a packet drop can cause a TCP stream to lower its
window size to 1, which can reduce the rate of the TCP stream below the allowed rate.
By choosing the right burst parameters, which can vary from one TCP implementation to
another, a TCP stream can take up the entire configured rate.
grow quickly to a value above the extended limit such that even compounding is
not necessary to cause the packet to get dropped. You continue to drop packets
until the DA is reduced because of token accumulation.
Q How can a network administrator enable traffic rate limiting to be active during
a certain time period of the day only?
A A network administrator can make this possible by using time-based IP access
lists within CAR or modular QoS. Please refer to Cisco documentation for more
information on time-based IP access lists.
Q What amount of processor resources is taken by CAR on a router?
A The amount of processor resources taken by CAR depends on the match condition
and the depth of the rate-limit statements.
A match condition using a complicated extended IP access list takes more
processor power relative to one using a basic IP access list. In general, the newer
precedence and MAC address rate-limit access lists require fewer processor resources
than IP access lists.
The rate limits are evaluated sequentially, so the amount of processor resources
used increases linearly with the number of rate limits checked.
References
[1] P. Almquist, "Type of Service in the Internet Protocol Suite," RFC 1349.
[2] J. Postel, "Internet Protocol," RFC 791.
[3] K. Nichols et al., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers," RFC 2474.
CHAPTER
4
Per-Hop Behavior: Resource
Allocation I
At times of network congestion, resource allocation for a flow on a router is determined by
the router's scheduling discipline for the packets queued in the queuing system. The
scheduling behavior determines which packet goes next from a queue. How often a flow's
packets are served determines its bandwidth, or resource allocation.
The traditional packet scheduling mechanism on the Internet has been first-in, first-out
(FIFO) scheduling, by which packets are transmitted in the same order in which they arrive
in the output queue. FIFO is simple and easy to implement but cannot differentiate among
the many flows; hence, FIFO cannot allocate specific performance bounds for a flow, or
prioritize one flow over the others.
Weighted Fair Queuing (WFQ) is a scheduling discipline in which flow differentiation
occurs in scheduling. In WFQ, each flow or traffic class is assigned a weight, and the rate
at which a flow or a traffic class is serviced is proportional to its assigned weight. WFQ
provides prioritization among unequally weighted traffic flows and fairness and protection
among equally weighted traffic flows as per the max-min fair-share allocation scheme. This
chapter discusses max-min fair-share allocation and how Fair Queuing (FQ) simulates this
allocation scheme. It also covers WFQ in detail. Other related scheduling disciplines,
Modified Weighted Round Robin (MWRR) and Modified Deficit Round Robin (MDRR),
are covered in Chapter 5, "Per-Hop Behavior: Resource Allocation II."
Later in this chapter, priority queuing and custom queuing schemes are discussed. Priority
queuing and custom queuing also classify traffic into different packet flows, scheduling
them on an absolute-priority basis and a round-robin basis, respectively.
The chapter ends with a section on scheduling disciplines for voice. Note that this chapter
covers only packet scheduling issues that decide which packet is served next. The queue
management component of scheduling, which defines the packet drop policy, is covered in
Chapter 6, Per-Hop Behavior: Congestion Avoidance and Packet Drop Policy.
When the network is free of congestion, the scheduling discipline matters little because no
queues build at the routers. When some network congestion exists, however, queues build up
on the routers, and the scheduling mechanism on a router determines the
order in which the packets in the queue are serviced.
For the scheduling algorithm to deliver QoS, at a minimum it needs to be able to
differentiate among the different packets in the queue and know each packet's service level.
Such a scheduling algorithm should allow guarantees on performance bounds by allocating
resources on a flow basis and/or by prioritizing one flow over the other. You can do this at
a granularity of a single flow or a traffic class that might be made up of packets from
different traffic flows. Traffic classification is discussed in Chapter 3, Network Boundary
Traffic Conditioners: Packet Classifier, Marker, and Traffic Rate Management.
In addition, a scheduling algorithm is needed that provides fairness and protection among
the flows with the same priority, such as all best-effort traffic flows.
Other requirements for such a scheduling algorithm include ease of implementation and
admission control for flows requiring resource guarantees.
Although WFQ is more difficult to implement than FIFO queuing, it supports all the other
requirements for QoS support that FIFO cannot deliver. WFQ also can work in conjunction
with Resource Reservation Protocol (RSVP) to provide admission control for flows
signaling resource requirements using RSVP. RSVP is discussed in detail in Chapter 7,
Integrated Services: RSVP.
NOTE It is important to keep in mind the conservation law from queuing theory, which states that
any scheduling discipline can only reduce a flow's mean delay at another flow's expense.
Occasionally, latency is traded for bandwidth. Some flows are delayed to offer a particular
bandwidth to other flows. When someone gets preferential treatment, someone else will
suffer.
FIFO Queuing
FIFO queuing is a queuing mechanism in which the order of the packets coming into a
queue is the same as the order in which they are serviced or transmitted out of the queue.
Figure 4-1 illustrates a FIFO queue.
(Figure 4-1: A FIFO queue. Packets leave the queue in the same order in which they arrive.)
In FIFO queuing, the order in which packets arrive in a queue is the same as the order in
which they are transmitted from the queue. FIFO, the most common scheduling mechanism
in routers today, is easy to implement. It has no mechanism to differentiate between flows,
however, and hence cannot prioritize among them. Not only can FIFO queuing not
prioritize one flow over the other, but it also offers neither protection nor fairness to equal-
priority traffic flows because a large, badly behaving flow can take the share of resources of
well-behaving flows with end-to-end, adaptive flow-control schemes, such as Transmission
Control Protocol (TCP) dynamic window control. With FIFO, flows receive service
approximately in proportion to the rate at which they send data into the network. Such a
scheme is obviously not fair, because it rewards greedy flows over well-behaved ones. Any
fairness algorithm by its nature offers protection against greedy flows.
demand for 3. This unused excess is distributed evenly between Users D and E so that each
now has a resource of 3.33 + (0.33 / 2) = 3.5.
We can calculate fair allocation as follows:
Fair allocation = (resource capacity - resource capacity already allocated to users) /
(number of users who still need resource allocation)
This example is illustrated in Figures 4-2, 4-3, and 4-4. In Step 1, shown in Figure 4-2, the
demands of Users A and B are fully allocated because their resource requests fall within the
fair allocation. In this step, the demands of C, D, and E exceed the fair allocation of 2.8 and,
hence, cannot be allocated. In the next step, the unused excess bandwidth of A's and B's fair
allocation is equally distributed among the three remaining users, C, D, and E.
(Figure 4-2: Step 1. Fair Allocation (FA) = (14 - 0) / 5 = 2.8. A bar chart plots the resource demands of Users A through E against the FA = 2.8 line.)
In Step 2, shown in Figure 4-3, the demand of User C is fully allocated because its resource
request falls within the fair allocation. In this step, the demands of D and E exceed the fair
allocation of 3.33 and, hence, cannot be allocated. In the next step, the unused excess
bandwidth of C's fair allocation is equally distributed between the two remaining users,
D and E.
In Step 3, shown in Figure 4-4, the fair allocation of 3.5 falls below the requests of both
Users D and E, which are each allocated 3.5 and have unsatisfied demands of 1.5 and 2.5,
respectively.
This scheme allocates resources according to the max-min fair-share scheme. Note that all
users with unsatisfied demands (beyond what is their max-min fair share) get equal
allocation. So, you can see that this scheme is referred to as max-min fair-share allocation
because it maximizes the minimum share of a user whose demand is not fully satisfied.
Consider an extension to the max-min fair-share allocation scheme in which each user is
assigned a weight. Such a scheme is referred to as weighted max-min fair-share allocation,
in which a user's fair share is proportional to its assigned weight.
(Figure 4-3: Step 2. Fair Allocation (FA) = (14 - (2 + 2)) / 3 = 3.33. A bar chart plots the resource demands of Users C, D, and E against the FA = 3.33 line.)
(Figure 4-4: Step 3. Fair Allocation (FA) = (14 - (2 + 2 + 3)) / 2 = 3.5. A bar chart plots the resource demands of Users D and E against the FA = 3.5 line.)
If you assign a weight per flow, in each round of service, GPS services an amount of data
from a flow in proportion to the assigned weight. This GPS extension provides weighted
max-min fair share.
GPS, though an ideal model for max-min fair share, is not possible to implement. The right
scheduling algorithm for practical purposes is one that approximates GPS and can be
implemented.
When Packet A2 arrives, the flow is already active with A1 in the queue, waiting for service,
with a sequence number of 228. The sequence number of Packet A2 is 228 + 128 = 356,
because it needs to be transmitted after A1. Hence, the sequence number of a packet
arriving on an active flow is the highest sequence number of the packet in the flow queue,
plus its packet size in bytes.
Similarly, Packet A3 gets a sequence number of 356 + 128 = 484. Because Packets B1 and
C1 arrive on an inactive flow, their sequence numbers are 164 (that is, 100 + 64) and 132
(that is, 100 + 32), respectively.
Sequence Number (SN) assignment for a packet is summarized based on whether it arrives
on an active or an inactive flow as follows:
Packet arrives on an inactive flow:
SN = size of the packet in bytes + the round number at the time the packet arrived (The
round number is the sequence number of the last packet serviced.)
Packet arrives on an active flow:
SN = size of the packet in bytes + the highest sequence number of the packet already
in the flow queue
Packets in their flow queues, along with their computed sequence numbers, are shown in
Figure 4-5 to illustrate how the FQ scheduler emulates GPS. A GPS scheduler would have
completed scheduling the entire Packet A1 in the 228th round. The sequence number
denotes the relative order in which the packets are served by the scheduler. The FQ
scheduler serves the packets in the following order: C1, B1, A1, A2, and A3.
Round numbers are used only for calculating sequence numbers if the arriving packet
belongs to a new flow. Otherwise, the sequence number is based on the highest sequence
number of a packet in that flow awaiting service. If Packet A4 arrives at any time before A3
is serviced, it has a sequence number of 484 + 128 = 612.
Note that the round number is updated every time a packet is scheduled for transmission to
equal the sequence number of the packet being transmitted. So if Packet D1 of size 32
bytes, belonging to a new flow, arrives when A1 is being transmitted, the round number is
228 and the sequence number of D1 is 260 (228 + 32). Because D1 has a lower sequence
number than A2 and A3, it is scheduled for transmission before A2 and A3. Figure 4-6
depicts this change in scheduling order.
Figure 4-5 An Example Illustrating the Byte-by-Byte Round-Robin GPS Scheduler Simulation for FQ
(Figure: Queues A, B, and C hold Packets A1, A2, and A3 (128 bytes each), B1 (64 bytes), and C1 (32 bytes), with sequence numbers 228, 356, 484, 164, and 132, respectively. The FQ scheduler serves them in the order C1, B1, A1, A2, A3.)
Most often, some flows are considered more important or mission-critical than others. Such
flows need to be preferred over the others by the scheduler. You can expand the FQ concept
to assign weights per flow so that each flow is serviced in proportion to its weight. Such a
fair queuing system is called flow-based WFQ and is discussed in the next section.
Figure 4-6 Illustration of FQ Scheduler Behavior; Packet D1 Arriving After Packet A1 Is Scheduled
(Figure: Packet D1 (32 bytes) arrives on new Queue D while A1 is being transmitted and gets sequence number 260. Because 260 is lower than the sequence numbers of A2 (356) and A3 (484), the FQ scheduler serves D1 before A2 and A3.)
Flow-Based WFQ
In WFQ, weights are assigned to flows based on the precedence value in the Internet Protocol (IP)
header of their packets. Weights are calculated as follows:
Weight = 4096 / (IP Precedence + 1)
NOTE With the recent changes in WFQ implementation, the preceding WFQ flow weight
calculation formula applies only when running IOS Versions 12.0(4)T or lower. The
changes were made to enable class guarantees for Class-Based Weighted Fair Queuing
(CBWFQ). CBWFQ is discussed later in this chapter.
For IOS Versions 12.0(5)T and higher, the WFQ flow weights discussed in this chapter are
multiplied by 8. Hence, the WFQ weight for best-effort (IP Precedence 0) traffic becomes
4096 × 8 = 32768
and the equation for weight calculation becomes
Weight = 32768 / (IP Precedence + 1)
Table 4-1 tabulates a packet's weight based on its IP precedence and Type of Service (ToS)
byte value.
Table 4-1 Weights Assigned Based on the IP Precedence Value of a Packet Belonging to an Unreserved (Non-
RSVP) Flow
IP Precedence  ToS Byte Value  Weight (IOS Versions Prior to 12.0(5)T)  Weight (IOS Versions 12.0(5)T and Higher)
0  0  4096  32768
1  32  2048  16384
2  64  1365  10923
3  96  1024  8192
4  128  819  6554
5  160  683  5461
6  192  585  4681
7  224  512  4096
(Weights are derived from the formulas above and rounded to the nearest integer.)
The weight of an RSVP flow with the largest bandwidth reservation is 4 until IOS Version
12.0(5)T and is 6 for 12.0(5)T and higher. The weight of all other RSVP flow reservations
is derived from the largest bandwidth reservation, as shown here:
Weight for an RSVP flow or conversation = weight of the largest reservation ×
(greatest bandwidth reservation on the link / conversation bandwidth)
For the purpose of the discussion in the remainder of this section, weights prior to 12.0(5)T
are used. It is important to note that the exact weight calculation scheme used doesn't
matter in illustrating the working of WFQ.
WFQ uses two packet parameters to determine a packet's sequence number. Like FQ, WFQ uses
the packet's byte size. In addition, however, WFQ uses the weight assigned to the packet.
The packet's weight is multiplied by its byte size when calculating the sequence number. This
is the only difference between WFQ and FQ.
Note that the direct correlation between a byte-by-byte round-robin scheduler and FQ is
lost with WFQ, because the packet's byte count is multiplied by its weight before its
sequence number is calculated. Consider a sequence number in WFQ as a number
calculated to determine the relative order of a packet in a WFQ scheduler, and consider the
round number as the sequence number of the last packet served in the WFQ scheduler.
Using the same example discussed in the FQ section, assume that packets of Flow A have
precedence 5, whereas Flows B and C have precedence 0. This results in a weight of 683
for packets in Flow A and 4096 for packets in Flows B and C. Table 4-2 shows all the flow
parameters in this example. The sequence number of Packet A1 is calculated as 100 +
(683 × 128) = 87,524. Similarly, you can calculate sequence numbers for packets A2, A3, B1, and
C1 as 174,948, 262,372, 262,244, and 131,172, respectively. So the order in which the
scheduler services them is A1, C1, A2, B1, and A3, as illustrated in Figure 4-7.
Table 4-2 Flow-Based WFQ Example
Queue  Packet Size  Precedence  Weight = 4096 / (Precedence + 1)
Queue A  128  5  683
Queue B  64  0  4096
Queue C  32  0  4096
Note that with WFQ, you not only can prioritize Flow A, but you also can accommodate Flows B and
C fairly relative to each other. A WFQ scheduler simulates a max-min weighted GPS.
If Packets A4 and D1 (a new flow with a precedence of 0 and size of 32 bytes) arrive after A1 is
scheduled, A4 and D1 get sequence numbers of 349,796 ((683 × 128) + 262,372) and
218,596 ((4096 × 32) + 87,524), respectively. The discussion of calculating the sequence
numbers for A4 and D1 with FQ still applies here. Now, the scheduling order of the
remaining packets is changed to C1, A2, D1, B1, A3, and A4. This is shown in Figure 4-8.
In Figure 4-8, packet D1 arrives closely after packet A1 has been scheduled for transmission.
Packet D1 is transmitted before packets A3 and A4, which arrived in the queue earlier.
(Figure 4-7: WFQ scheduler example. Queue A has weight 683; Queues B and C have weight 4096. With sequence numbers A1 = 87,524, A2 = 174,948, A3 = 262,372, B1 = 262,244, and C1 = 131,172, the scheduler serves the packets in the order A1, C1, A2, B1, A3.)
NOTE Sequence number calculation for a packet arriving on an interface occurs only when
there is congestion on the outbound interface (no buffer is available in the interface hardware
queue). When there is no congestion on the outbound interface, the scheduling behavior is
FIFO: an arriving packet is simply queued to the outbound interface hardware queue for
transmission.
The length of the interface hardware transmit queue determines the maximum queuing delay
for real-time traffic in a WFQ scheduler. Real-time traffic such as voice has to wait until the
packets already queued in the hardware queue are sent before it can be transmitted.
Excessive queuing delays can result in jitter, a problem for real-time traffic such as voice.
Typical hardware interface buffers can hold from one to five packets. In the IOS implementation,
most interfaces automatically reduce their hardware transmit queues to 2 when WFQ
is enabled. A network operator should be able to modify the length of the interface
hardware transmit queues based on the delay requirements for the traffic in the network.
This is especially true for voice and other real-time traffic. You can modify the interface
transmit queue size by using the tx-queue-limit command.
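For example, a sketch reducing the hardware transmit queue on a serial interface to a single packet (on platforms that support the command):
interface Serial0
 tx-queue-limit 1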
(Figure 4-8: WFQ scheduler example with Packet D1 (32 bytes, weight 4096, sequence number 218,596) arriving on new Queue D after A1 is scheduled. The scheduler serves the remaining packets in the order C1, A2, D1, B1, A3, A4.)
WFQ Implementation
In flow-based WFQ implementation, weights are based strictly on precedence and cannot
be changed. Though FQ in itself is not available, WFQ becomes FQ for all practical
purposes when all traffic arriving at the scheduler carries the same precedence value.
With flow-based WFQ, packets with different IP precedence in a single flow are not
scheduled out of order. In this regard, a flow is implemented as a hash defined by source
and destination IP addresses, IP protocol field, Transmission Control Protocol/User
Datagram Protocol (TCP/UDP) port numbers and the 5 bits (excluding the 3 IP precedence
bits) in the ToS byte. Due to this flow description, packets of the same flow, but with
different precedence values, fall into the same queue. Packets within a flow queue are
serviced in FIFO order.
In general, WFQ limits its drops to the most active flows, whereas FIFO might drop from
any flow. Therefore, WFQ should encourage the most active flows to scale back without
affecting the smaller flows. Because the median flow duration in the Internet is 10 to 20
packets in length, a fairly small percentage of the flows should be taking the lion's share of
the drops with WFQ, while FIFO drops should be distributed across all flows. Hence, the
effects of global synchronization with FIFO are less pronounced with WFQ for traffic with
adaptive flow control, such as TCP traffic. Global synchronization and proactive drop
policies are detailed in Chapter 6.
In general, a flow-based WFQ uses a subqueue for each flow. As such, flow-based WFQ
queues are referred to as conversation queues. Because memory is a finite resource, the
default number of conversation queues allocated is restricted to 256. This parameter is
configurable when enabling the fair-queue interface command, however. Note that
increasing the number of queues increases the memory taken by the queue data structures
and the amount of state information maintained by the router. If the number of flows is
greater than the number of queues, multiple flows can share a single queue. Configuring a
large number of queues increases the chances of having only one flow per queue.
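A sketch of the fair-queue command with explicit parameters (the congestive discard threshold, the number of dynamic conversation queues, and the number of reservable queues, in that order; the values here are illustrative) might be:
interface Serial0
 fair-queue 64 512 0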
Flow-based WFQ can also work in conjunction with Weighted Random Early Detection
(WRED), a proactive packet drop policy to avoid congestion. WRED is discussed in
Chapter 6.
Flow-based WFQ implementation is done using list sorting. The complexity is of the order
of O(n), where n is the number of packets waiting for service from a WFQ scheduler. List
sorting can become prohibitively expensive on high-bandwidth links where the number of
flows and the number of packets to be serviced per second is high.
NOTE Flow-based WFQ is available on most Cisco router platforms. One notable exception is
Cisco 12000 series routers. It is the default queuing mechanism in all interfaces with a
bandwidth less than 2 Mbps.
After IP precedence is set at the ingress, you can deploy flow-based WFQ on the routers to
achieve the preceding objectives. Packet classification based on IP precedence is discussed
in case studies in Chapter 3.
You can set up flow-based WFQ on an interface by using the fair-queue interface
command. The show queue command, shown in Listing 4-1, describes the active flows
along with the queue depth, weight, and other queue statistics as well as the queue
parameters in the WFQ system.
Listing 4-1 show queue serial0 Command Output
Router#show queue serial0
Input queue: 0/75/0 (size/max/drops); Total output drops: 0
Queueing strategy: weighted fair
Output queue: 9/1000/120/0 (size/max total/threshold/drops)
Conversations 1/4/256 (active/max active/threshold)
Reserved Conversations 0/1 (allocated/max allocated)
In Listing 4-1, max total is a per-interface, global limit on the number of buffers for WFQ,
whereas threshold is the maximum number of buffers allowed per conversation. Note that
max total is the same as the output hold queue, and you can change it using the hold-queue
<queue length> out interface command. A conversation with nonzero queue depth is said
to be active. active and max active show the present and the maximum number of active
conversations. Reserved Conversations shows RSVP flow reservations.
In the second part of Listing 4-1, depth shows the number of packets in the conversation
queue (which is defined below this line of queue statistics) awaiting service, and weight
shows the weight assigned to this flow. A discard is done by the WFQ logic when the number
of packets in a conversation queue exceeds the threshold. Tail drops happen when the WFQ
buffer surpasses the max total. At that time, any packet arriving on the tail of a full queue
gets dropped. Interleaves happen when link layer fragmentation and interleaving are
configured to allow small packets to be interleaved between fragments of big packets,
lowering the delay jitter seen by the small packet flows.
NOTE Queues are displayed only if one or more packets are in the queue at the time the show
queue command is issued. Otherwise, the conversation is inactive and no output is displayed. Hence,
the show queue interface command does not give any output if the interface is lightly
loaded.
The show queueing fair command is used to check the WFQ parameters on an interface.
Listing 4-2 displays the output of the show queueing fair command.
Listing 4-2 show queueing fair Command Output
Router#show queueing fair
Current fair queue configuration:
The voice traffic is made up of 64-byte packets and is represented as V1, V2, V3, and so on;
the FTP traffic consists of 1472-byte packets and is represented as F1, F2, F3, and so on.
The following scenarios discuss the resource allocation for voice traffic (when run in
conjunction with FTP traffic) in the FQ and WFQ situations.
In the first scenario, IP precedence is not set in any packet. All traffic carries the default
precedence value of 0. The voice traffic of 64-byte packets gets fair treatment competing
with an FTP flow. When both flows are active, the voice traffic gets the same bandwidth as
the FTP traffic, effectively scheduling one FTP packet for every 23 (= 1472 / 64) voice packets,
on average.
In the second scenario, voice packets carry precedence 5 and FTP traffic has a precedence
of 0. The voice traffic is weighted lower than the FTP traffic. The voice traffic and FTP traffic get
6/7 and 1/7 of the link bandwidth, respectively. Hence, the voice traffic
flow gets six times the bandwidth of the FTP flow. Taking packet size differences into
consideration, when both flows are active, one FTP packet is scheduled for every 138 (= 23 × 6)
voice packets!
NOTE WFQ in itself doesn't give absolute priority to a particular traffic flow. It can give only a
higher weight in the share of resources or a certain guaranteed bandwidth, as shown in this
case study. Hence, it might not be ideal for interactive, real-time applications, such as voice,
in all circumstances. WFQ can work for voice when little background traffic is in the
network. On a loaded network, however, WFQ cannot achieve the low-jitter requirements
for interactive voice traffic. For applications such as voice, WFQ is modified with a priority
queue (PQ). WFQ with a PQ is discussed toward the end of this chapter.
memory requirements. It is a trade-off between the larger memory costs and the limitations
imposed by a smaller set of calendar queues.
Timestamps are computed for each arriving packet and are sorted using a calendar queue.
Any calendar queue-based implementation has its timestamp granularity constrained by the
number of calendar queues in the system. For the calendar queue system to behave the same
as the flow-based WFQ implementation, you need 4096 maximum transmission unit
(MTU)-size calendar queues!
Because the number of calendar queues prevents the timestamp granularity from allowing
unique timestamp values for packets ranging from 1 byte to MTU bytes, packets with
different sizes from flows/classes with the same weight can end up with the same timestamp.
To serve these variable-size packets with the same timestamp, it is necessary to run a
deficit-like algorithm on the calendar queues to ensure proper bandwidth allocation. Thus,
part of the DWFQ implementation has Deficit Round Robin (DRR)-like characteristics.
The DRR algorithm is described in Chapter 5.
DWFQ implementation has an individual queue limit and an aggregate queue limit across
all the individual queues. The individual queue limits are enforced only when the aggregate
queue limit is reached.
Flow-based DWFQ is actually FQ. Under FQ, all flows are the same, and flows are not
weighted. In the case of flow-based DWFQ, a flow is described by a hash function
consisting of the source and destination IP addresses, the IP protocol field, and the TCP/UDP
port numbers. All non-IP traffic is treated as a single flow and, therefore, placed in the same queue.
Because of the reasons discussed in the section on flow-based WFQ, the number of
subqueues allowed in flow-based DWFQ is restricted. The total number of subqueues for a
flow-based DWFQ is 512. If more than 512 flows exist on an interface, some flows will
share the same subqueue.
The show interface command displays a line showing VIP-based fair queuing as the queuing or scheduling strategy being
used. Listing 4-3 shows information on the flow-based DWFQ.
Listing 4-3 Flow-Based DWFQ Information
Router#show interface fair
POS0/0/0 queue size 0
packets output 2, wfq drops 1, nobuffer drops 0
WFQ: aggregate queue limit 5972, individual queue limit 2986
max available buffers 5972
The show interface fair command displays packet statistics for the number of packets
transmitted as well as packet drops due to DWFQ and buffer pool depletion.
NOTE The number of available buffers for DWFQ, individual, and aggregate limits is derived
based on the VIP Static Random Access Memory (SRAM) capacity, the number of
interfaces on the VIP, and the speed of those interfaces.
Class-Based WFQ
The last two sections discussed flow-based WFQ mechanisms running on the IOS router
platform's central processor, and flow-based DWFQ mechanisms running on the 7500
platform's VIP line cards. This section studies the CBWFQ mechanism, which is supported in
both nondistributed and distributed operation modes.
CBWFQ allocates a different subqueue for each traffic class, as compared with a subqueue per
flow in the flow-based versions of WFQ. So, you can use the existing flow-based
implementations of WFQ to deliver CBWFQ in both nondistributed and distributed modes
of operation by adding a traffic classification module in which each WFQ subqueue carries
a traffic class rather than a traffic flow. Hence, CBWFQ is still based on sequence number
computation when run on the router's central processor and on the calendar queue
implementation when run on the 7500 platform's VIP line cards.
The CBWFQ mechanism uses the modular QoS command-line interface (CLI) framework
discussed in Appendix A, Cisco Modular QoS Command-Line Interface. As such, it
supports all classes supported under this framework. You can base traffic classes on a
variety of traffic parameters, such as IP precedence, Differentiated Services Code Point
(DSCP), input interface, and QoS groups. Appendix A lists the possible classifications.
CBWFQ enables a user to directly specify the required minimum bandwidth per traffic
class. This functionality is different from flow-based WFQ, where a flows minimum
bandwidth is derived indirectly based on the assigned weights to all active flows in the WFQ
system.
NOTE Note that CBWFQ also can be used to run flow-based WFQ. In CBWFQ, the default-class
traffic class appears as normal WFQ flows on which you can apply flow-based WFQ by
using the fair-queue command.
DWFQ and CBWFQ differ in that you can run FQ within any DWFQ class, but in the case
of CBWFQ, only the default class can run WFQ.
policy-map goldservice
class gold
bandwidth 500
interface serial0
service-policy output goldservice
access-list 101 permit udp any any range 1500 1600
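The gold class map referenced by the policy is assumed to match access list 101, for example:
class-map gold
 match access-group 101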
CBWFQ can directly specify a minimum bandwidth per class by using the bandwidth
command.
The access-list 101 matches all critical database application traffic in the network. The
show policy and show class commands display all policy and class map information on the
router, respectively.
Listings 4-5 and 4-6 display the output of the policy goldservice and the gold traffic class,
respectively.
Listing 4-5 Information on the goldservice Policy
Router#show policy goldservice
Policy Map goldservice
Weighted Fair Queueing
Class gold
Bandwidth 500 (kbps) Max Thresh 64 (packets)
As in Case Study 4-1, you use show queueing fair for the WFQ information and the show
queue serial0 command to see packets waiting in the queue.
policy-map febandwidth
class server
bandwidth 30000
interface Hssi0/0
service-policy output febandwidth
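The server class map referenced by the policy is not defined above; a hypothetical sketch, assuming the server traffic is matched by an access list, might be:
class-map server
 match access-group 110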
Assume here that packet classification into IP precedence based on traffic rate was already
done on the network boundary routers or interfaces connecting to the customer traffic. See
Chapter 3 for details regarding packet classification.
On an HSSI interface, ToS classes 0 through 3 are given 6750 Kbps, 13500 Kbps, 18000 Kbps, and
6750 Kbps, respectively, based on the link bandwidth percentage allocations, as shown in
Listing 4-8.
Listing 4-8 Enabling CBWFQ for ToS Classes
class-map match-any class0
match ip precedence 4
match ip precedence 0
class-map match-any class1
match ip precedence 1
match ip precedence 5
class-map match-any class2
match ip precedence 2
match ip precedence 6
class-map match-any class3
match ip precedence 3
match ip precedence 7
policy-map tos-based
class class0
bandwidth 6750
class class1
bandwidth 13500
class class2
bandwidth 18000
class class3
bandwidth 6750
interface hssi0/0/0
service-policy output tos-based
NOTE Although bandwidth specification as a percentage of the total link bandwidth is not allowed
at the time of this writing, this option will be available soon.
ToS-Based DWFQ
ToS-based DWFQ is a class-based WFQ by which the two low-order precedence bits are
used to choose the queue. Only four possible queues exist. Table 4-4 shows the ToS class
allocation based on the IP precedence bits.
Table 4-4 ToS-Based Classification
IP Precedence Bits ToS Class Bits ToS Class Assigned
000 00 0
001 01 1
010 10 2
011 11 3
100 00 0
101 01 1
110 10 2
111 11 3
NOTE The weight parameter is used differently in DWFQ implementation when compared to
flow-based WFQ. Weights in DWFQ implementation indicate the percentage of link
bandwidth allocated.
Note that the default weights of ToS classes 1, 2, and 3 are 20, 30, and 40, respectively. In
this case, the intended weight assignments for classes 2 and 3 are their default values and
hence, they need not be configured at all. Class 1 is assigned a weight of 15. Listing 4-10
displays information on ToS-based DWFQ and its packet statistics.
Listing 4-10 ToS-Based DWFQ Information
Router#show interface fair
HSSI0/0/0 queue size 0
packets output 20, wfq drops 1, nobuffer drops 0
WFQ: aggregate queue limit 5972, individual queue limit 2986
max available buffers 5972
The weight indicates a percentage of the link bandwidth allocated to the given class. ToS class
3 has a weight of 40, for example, which means it is allocated 40 percent of the link
bandwidth during times when the queues for all four classes (0, 1, 2, and 3) are
simultaneously backlogged.
The weight for ToS class 0 is always derived from the weights of the other classes and changes
when the weight of any one of classes 1-3 changes. Because the total weight is 100, the
bandwidth allotted to ToS class 0 is always 100 - (sum of the weights of classes 1-3). Note that
the sum of the class 1-3 weights should not exceed 99.
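As a hedged sketch (the interface name and exact command syntax are assumptions, not from the text), the nondefault class 1 weight could be configured on a VIP interface along these lines:
interface Hssi0/0/0
fair-queue tos
fair-queue tos 1 weight 15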
A QoS group is a classification label internal to the router and not a field within the IP
packet, unlike IP precedence. Without using the modular QoS CLI, the QoS group-based
DWFQ feature is enabled by using the fair-queue qos-group command.
The bandwidth is allocated in a ratio of 4:1 between QoS groups 3 and 0. Because the
weight indicates the percentage of bandwidth, the ratio of the weights that yields the desired
bandwidth allocation is 80:20 for QoS groups 3 and 0.
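A minimal sketch of the corresponding configuration (the interface name and the weight command syntax are assumptions), with only QoS group 3 given an explicit weight so that group 0 receives the remaining 20 percent:
interface Hssi0/0/0
fair-queue qos-group
fair-queue qos-group 3 weight 80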
Listing 4-12 shows the information on DWFQ parameters and operation on the router.
Listing 4-12 DWFQ Information
Router#show interface fair
HSSI0/0/0 queue size 0
packets output 3142, wfq drops 32, nobuffer drops 0
WFQ: aggregate queue limit 5972, individual queue limit 2986
max available buffers 5972
Priority Queuing
Priority queuing maintains four output subqueues (high, medium, normal, and low) in
decreasing order of priority. A network administrator can classify flows to fall into any of
these four queues. Packets on the highest-priority queue are transmitted first. When that
queue empties, traffic on the next-highest-priority queue is transmitted, and so on. No
packets in the medium-priority queue are serviced if packets in the high-priority queue are
waiting for service.
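A configuration consistent with the output shown later in Listing 4-14 (access lists 100-103 are assumed to classify the traffic) would look like this:
priority-list 1 protocol ip high list 103
priority-list 1 protocol ip medium list 102
priority-list 1 protocol ip normal list 101
priority-list 1 protocol ip low list 100
priority-list 1 default low
interface serial0
priority-group 1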
Listings 4-14 and 4-15 show the current priority queuing configuration and interface
queuing strategy information, respectively. The show queueing priority command shows
the current priority queue configuration on the router. The show interface serial0
command shows the priority list configured on the interface as well as packet statistics for
the four priority queues.
Listing 4-14 Information on Priority Queuing Parameters
Router#show queueing priority
Current priority queue configuration:
List Queue Args
1 low default
1 high protocol ip list 103
1 medium protocol ip list 102
1 normal protocol ip list 101
1 low protocol ip list 100
Custom Queuing
Whereas priority queuing potentially guarantees the entire bandwidth for mission-critical
data at the expense of low-priority data, custom queuing guarantees a minimum bandwidth
for each traffic classification.
This bandwidth reservation discipline services each nonempty queue sequentially in a
round-robin fashion, transmitting a configurable percentage of traffic on each queue.
Custom queuing guarantees that mission-critical data is always assigned a certain
percentage of the bandwidth, while assuring predictable throughput for other traffic.
You can think of custom queuing as CBWFQ with lots of configuration details.
You can classify traffic into 16 queues. Apart from the 16 queues is a special queue, Queue 0,
called the system queue. The system queue handles high-priority packets, such as keepalive
packets and control packets. User traffic cannot be classified into this queue. Custom
queuing is implemented to classify IP packets into any of the 16 queues based on input
interface, simple and extended IP access lists, packet size, and application type.
A popular use of custom queuing is to guarantee a certain bandwidth to traffic
selected by an access list. To allocate bandwidth to different queues, you must specify the
byte count for each queue.
1 : 11.2 : 1.3
The result is the ratio of the number of packets that must be sent so that
the percentage of bandwidth each protocol uses is approximately 20, 60,
and 20 percent.
Step 3 A fraction in any of the ratio values means an additional packet is sent.
Round up the numbers to the next whole number to obtain the actual
packet count.
In this example, the actual ratio is 1 packet, 12 packets, and 2 packets.
Step 4 Convert the packet number ratio into byte counts by multiplying each
packet count by the corresponding packet size.
In this example, the number of packets sent is one 1086-byte packet,
twelve 291-byte packets, and two 831-byte packets, or 1086, 3492, and
1662 bytes, respectively, from each queue. These are the byte counts you
would specify in your custom queuing configuration.
Step 5 To determine the bandwidth distribution this ratio represents, first
determine the total number of bytes sent after all three queues are
serviced:
(1 × 1086) + (12 × 291) + (2 × 831) = 1086 + 3492 + 1662 = 6240
Step 6 Then determine the percentage of the total number of bytes sent from
each queue:
1086 ÷ 6240, 3492 ÷ 6240, 1662 ÷ 6240 = 17.4%, 56%, and 26.6%
As you can see, this is close to the desired ratio of 20:60:20.
Step 7 If the actual bandwidth is not close enough to the desired bandwidth,
multiply the original ratio of 1:11.2:1.3 in Step 2 by the best value, in
order to get the ratio as close to three integer values as possible. Note that
the multiplier you use need not be an integer. If you multiply the ratio by
2, for example, you get 2:22.4:2.6. You would now send two 1086-byte
packets, twenty-three 291-byte packets, and three 831-byte packets, or
2172:6693:2493, for a total of 11,358 bytes. The resulting ratio is
19:59:22 percent, which is much closer to the desired 20:60:20 ratio than the earlier result.
Listing 4-18 is the sample configuration needed to stipulate the byte count of the three
protocol queues and the assignment of each protocol traffic to its appropriate queue. The
configured custom queue list 1 is enabled on interface Serial0/0/3.
Listing 4-18 Enabling Custom Queuing
interface Serial0/0/3
custom-queue-list 1
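The queue-list definitions themselves do not survive in Listing 4-18; a sketch consistent with the byte counts computed in Step 4 (access lists 110-112 are hypothetical classifiers for the three protocols) might be:
queue-list 1 protocol ip 1 list 110
queue-list 1 protocol ip 2 list 111
queue-list 1 protocol ip 3 list 112
queue-list 1 queue 1 byte-count 1086
queue-list 1 queue 2 byte-count 3492
queue-list 1 queue 3 byte-count 1662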
Listings 4-19 and 4-20 show information on the custom queuing configuration and the
interface queuing strategy, respectively. The show queueing custom command is used to
display the custom queuing configuration.
Listing 4-19 Information on Custom Queuing Configuration
Router#show queueing custom
Current custom queue configuration:
The show interface command gives the maximum queue size of each of the 16 queues
under custom queuing, along with the instantaneous queue size and packet drop statistics
per queue at the time the command is issued.
Note that window size affects the bandwidth distribution as well. If the window size
of a particular protocol is set to 1, that protocol does not place another packet into the queue
until it receives an acknowledgment. The custom queuing algorithm moves to the next
queue if the byte count is exceeded or if no packets are in that queue.
Therefore, with a window size of 1, only one packet is sent each time. If the byte count is
set to 2 KB and the packet size is 256 bytes, only 256 bytes are sent each time this queue is
serviced.
NOTE Although custom queuing allows bandwidth reservation per traffic class, as does CBWFQ,
CBWFQ has many advantages over custom queuing. Some are listed here:
Setting up CBWFQ is far easier and more straightforward than enabling custom
queuing for bandwidth allocations.
RSVP depends on CBWFQ for bandwidth allocation.
With CBWFQ, you can apply packet drop policies such as Random Early Detection
(RED) on each traffic class, in addition to allocating a minimum bandwidth.
You are no longer limited to 16 custom queues. CBWFQ supports 64 classes.
A strict priority queue can be added to CBWFQ to give delay-sensitive traffic low-latency
service while preserving differentiation and bandwidth guarantees among the other traffic
classes. CBWFQ with a priority queue is also called low latency queuing (LLQ).
A voice priority queue in CBWFQ provides a single priority queue behaving similar to the
high-priority queue discussed in the priority queuing section of this chapter. The remaining
queues are CBWFQ queues (one queue per traffic class) delivering differentiation and
bandwidth guarantees among the queues based on the configured weight or the bandwidth
allocated for each queue.
You can identify voice traffic by its Real-time Transport Protocol (RTP) port numbers and
classify it into a priority queue by using the ip rtp priority command on an output
interface. RTP is discussed in Appendix D, Real-time Transport Protocol.
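As a sketch (the port range and bandwidth values are illustrative), voice could be classified into the priority queue as follows:
interface serial0
ip rtp priority 16384 16383 64
Here the first two values select the starting UDP port and the port range to match, and 64 is the policed bandwidth in kbps.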
A busy priority queue can potentially starve the remaining queues on an interface, making
the performance seen by the CBWFQ queues less than desirable. To alleviate this problem,
a user can set up a maximum bandwidth against which the traffic serviced by the priority
queue is policed. If the traffic exceeds the configured bandwidth, all excess traffic is
dropped. Note that this might not always be the right way of policing traffic, because it is
based on bandwidth rather than voice calls.
The sum of the bandwidths for the priority queue and the CBWFQ queues is not allowed
to exceed 75 percent of the interface bandwidth. This is done to provide room for other
unclassified traffic and Layer 2 encapsulations. However, using the max-reserved-bandwidth
command, you can modify the default maximum reservable bandwidth.
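For example (the value is illustrative), the default could be raised to 90 percent on an interface:
interface serial0
max-reserved-bandwidth 90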
Even with the priority queue in CBWFQ, the queuing system can't service and transmit a
voice packet arriving on an empty priority queue immediately, because it needs to finish
scheduling the packet it is already servicing and transmit it on the wire. Voice packets are
small in size, but the data packet sizes carried in the CBWFQ queues can potentially be
large. The larger the packet sizes, the greater the possible delay seen by the packets in the
voice traffic.
This delay can be more perceptible on low-speed interfaces. To reduce voice traffic delay,
Multilink Point-to-Point Protocol (MLPP) fragmentation needs to be configured on low-speed
interfaces to fragment the large data packets so that you can interleave the small voice
packets between the data fragments that make up a large data packet. MLPP is discussed in
Appendix E, General IP Line Efficiency Functions.
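A hedged sketch of such a configuration on a multilink bundle (the interface and the fragment-delay value are illustrative, and the exact commands are assumptions, not from the text):
interface Multilink1
ppp multilink
ppp multilink fragment-delay 10
ppp multilink interleave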
NOTE You can enable the voice priority queue in conjunction with not only CBWFQ, but also
flow-based WFQ.
interface serial0
service-policy output premiumpolicy
On interface serial0, CBWFQ allocates a minimum bandwidth of 640 Kbps to the premium
voice traffic using CBWFQ. CBWFQ services the voice traffic on a strict priority queue
based on the priority keyword command in the policy-map statement.
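The policy-map definition itself does not survive above; a minimal sketch consistent with the description, using a hypothetical class name premiumvoice, would be:
policy-map premiumpolicy
class premiumvoice
priority 640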
Custom queuing supports up to 16 queues per interface. To support priority queue(s), one
of the 16 queues can be set up as the lowest-custom queue. This modifies all queues before
the lowest-custom queue to be priority queues. If Queue 6 is configured as the lowest-custom
queue, for example, Queues 1-5 act as priority queues and Queues 6-16 act as custom
queues. Queues 1-5 are in decreasing order of priority. Queue 1 is the highest-priority
queue and Queue 5 the lowest.
If a custom queue is being serviced when the priority queues are empty, a newly arrived
packet in a priority queue needs to wait until the custom queue is serviced according to its
byte-count allocation. So the byte counts need to be kept low to avoid large delays for
priority packets.
NOTE Note that MWRR and MDRR scheduling can also support voice when one of their queues
is made a strict priority queue. MWRR and MDRR algorithms are discussed in Chapter 5.
Summary
At times of network congestion, a scheduling discipline can allocate a specific bandwidth
to a certain traffic flow or packet class by determining the order in which the packets in its
queue get serviced. In flow-based WFQ, all flows with the same weight are treated fairly
based on the max-min fair-share algorithm, and flows with different weights get unequal
bandwidth allocations based on their weights. The CBWFQ algorithm is a class-based
WFQ mechanism using modular QoS CLI. It is used to allocate a minimum guaranteed
bandwidth to a traffic class. Each traffic class is allocated a different subqueue and is
serviced according to its bandwidth allocation.
Priority and custom queuing algorithms service queues on a strict priority and round-robin
basis, respectively. Voice traffic can be serviced on a strict priority queue in CBWFQ and
custom queuing so that voice traffic sees low jitter.
References
1 An Engineering Approach to Computer Networking, S. Keshav, Reading, MA:
Addison-Wesley, 1997.
2 Analysis and Simulation of a Fair Queueing Algorithm, A. Demers, S. Keshav, and S. Shenker,
SIGCOMM 1989, Austin, TX, September 1989.
CHAPTER
5
Per-Hop Behavior: Resource
Allocation II
The right packet scheduling mechanism for a router depends on its switching architecture.
Weighted Fair Queuing (WFQ) is a scheduling algorithm for resource allocation on Cisco
router platforms with a bus-based architecture. Cisco platforms using a switch fabric for
packet switching tend to use a scheduling algorithm that better suits that architecture.
In particular, the Cisco Catalyst family of switches and 8540 routers use the Modified
Weighted Round Robin (MWRR) algorithm, and the Cisco 12000 series routers use the
Modified Deficit Round Robin (MDRR) algorithm. Both MWRR and MDRR are similar
in scheduling behavior to WFQ because they, too, simulate Generalized Processor Sharing
(GPS).
The next two sections provide a detailed discussion of the MWRR and MDRR algorithms.
In MWRR, each queue is assigned a weight, and packet lengths are counted in cells. Before
a queue is serviced, its deficit counter is initialized to the queue's weight. A packet
from a queue is scheduled only if the deficit counter is greater than zero. After an n-cell
packet is served, the counter is decremented by n. Packets are scheduled as long as
the counter is greater than 0; otherwise, the scheduler skips to the next queue. In each new
round, the queue's deficit counter is incremented by the queue's weight. No packet is scheduled,
however, if the deficit counter is still not greater than 0. If the counter becomes greater than
0, a packet is scheduled. After the packet is served, the deficit counter is decremented by the
number of cells in the packet. By using a deficit counter, MWRR works independently of the
variable packet lengths in the long run.
The effective bandwidth for each queue is proportional to its weight:
Effective queue bandwidth = (Queue weight × Interface bandwidth) ÷ (Sum of all active queue weights)
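With the weights of 2, 3, and 4 used for Queues 0-2 in the example that follows, for instance, Queue 2 receives 4 ÷ (2 + 3 + 4) = 4/9 of the interface bandwidth while all three queues are backlogged.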
The queues show the cells queued, and the cells making up a packet are marked in the same
shade of black. Queue 2, for example, has a 2-cell, 3-cell, and 4-cell packet in its queue.
Queue 0 is the first queue being served. The deficit counter is initialized to 2, the queue's
weight. At the head of the queue is a 4-cell packet. Therefore, the deficit counter becomes
2 - 4 = -2 after serving the packet. Because the deficit counter is negative, the queue cannot
be served until it accumulates to a value greater than zero, as in Figure 5-2.
Queue 1 is the next queue to be served. Its deficit counter is initialized to 3. The 3-cell
packet at the head of the queue is served, which makes the deficit counter 3 - 3 = 0.
Because the counter is not greater than zero, you skip to the next queue, as in Figure 5-3.
Figure 5-1 WRR Queues with Their Deficit Counters Before Start of Service
[figure: Queues 0-2 each hold nine cells; all deficit counters are 0]
[Figure 5-2: the queues after Queue 0 is served in the first round; deficit counters: Queue 0 = -2, Queue 1 = 0, Queue 2 = 0]
[Figure 5-3: the queues after Queue 1 is served in the first round; deficit counters: Queue 0 = -2, Queue 1 = 0, Queue 2 = 0]
Now it is Queue 2's turn to be serviced. Its deficit counter is initialized to 4. The 2-cell
packet at the head of the queue is served, which makes the deficit counter 4 - 2 = 2. The
next 3-cell packet is also served, as the deficit counter is greater than zero. After the 3-cell
packet is served, the deficit counter is 2 - 3 = -1, as in Figure 5-4.
Queue 0 is now served in the second round. The deficit counter from the last round was
-2. Incrementing the deficit counter by the queue's weight makes the counter -2 + 2 = 0.
No packet can be served because the deficit counter is still not greater than zero, so you skip
to the next queue, as in Figure 5-5.
Queue 1 has a deficit counter of zero in the first round. For the second round, the deficit
counter is 0 + 3 = 3. The 4-cell packet at the head of the queue is served, making the deficit
counter 3 - 4 = -1, as in Figure 5-6.
[Figure 5-4: the queues after Queue 2 is served in the first round; deficit counters: Queue 0 = -2, Queue 1 = 0, Queue 2 = -1]
[Figure 5-5: the queues after Queue 0 is skipped in the second round; deficit counters: Queue 0 = 0, Queue 1 = 0, Queue 2 = -1]
[Figure 5-6: the queues after Queue 1 is served in the second round; deficit counters: Queue 0 = 0, Queue 1 = -1, Queue 2 = -1]
In the second round, Queue 2's deficit counter from the first round is incremented by the
queue's weight, making it -1 + 4 = 3. The 4-cell packet at the head of Queue 2 is served,
making the deficit counter 3 - 4 = -1. Because Queue 2 is now empty, the deficit counter is
initialized to zero, as in Figure 5-7.
Now, it is again Queue 0's turn to be served. Its deficit counter becomes 0 + 2 = 2. The
2-cell packet at the head of the queue is served, which results in a deficit counter of 2 - 2 =
0. Now skip to Queue 1, as in Figure 5-8.
Queue 1's new deficit counter is -1 + 3 = 2. The 2-cell packet at the head of Queue 1 is
served, resulting in a deficit counter of 2 - 2 = 0. Queue 1 is now empty.
Because Queue 2 is already empty, skip to Queue 0, as in Figure 5-9.
[Figure 5-7: the queues after Queue 2 is served and emptied in the second round; deficit counters: Queue 0 = 0, Queue 1 = -1, Queue 2 = 0]
[Figure 5-8: the queues after Queue 0 is served in the third round; deficit counters: Queue 0 = 0, Queue 1 = -1, Queue 2 = 0]
[Figure 5-9: the queues after Queue 1 is served and emptied in the third round; deficit counters: Queue 0 = 0, Queue 1 = 0, Queue 2 = 0]
Queue 0's deficit counter in the fourth round becomes 2. The 3-cell packet is served, which
makes the deficit counter equal to -1. Because Queue 0 is now empty, reset the deficit
counter to zero.
MWRR Implementation
MWRR is implemented in the Cisco Catalyst family of switches and the Cisco 8540
routers. These switches and routers differ in terms of the number of available MWRR
queues and in the ways you can classify traffic into the queues.
MWRR in 8540 routers offers four queues between any interface pair based on Type of
Service (ToS) group bits. Table 5-2 shows the ToS class allocation based on the IP
precedence bits.
Table 5-2 MWRR ToS Class Allocation
IP Precedence Bits ToS Class Bits ToS Class Assigned
000 00 0
001 00 0
010 01 1
011 01 1
100 10 2
101 10 2
110 11 3
111 11 3
NOTE 8500 ToS-based MWRR is similar to ToS-based Distributed WFQ (DWFQ), discussed in
Chapter 4, Per-Hop Behavior: Resource Allocation I, but differs in terms of which
precedence bits are used to implement it. ToS-based DWFQ uses the two low-order
precedence bits, whereas 8500 ToS-based MWRR uses the two high-order precedence bits.
In both cases, the leftover bit can signify the drop priority. Drop priority indicates which IP
precedence packets can be dropped at a higher probability between the IP precedence
values making up a ToS class.
Catalyst 6000 and 6500 series switches use MWRR with two queues, Queue 1 and Queue
2, based on the Layer 2 Institute of Electrical and Electronics Engineers (IEEE) 802.1p Class
of Service (CoS) field. Frames with CoS values of 0-3 go to Queue 1, and frames with CoS
values of 4-7 go to Queue 2. 802.1p CoS is discussed in Chapter 8, Layer 2 QoS:
Interworking with IP QoS.
6500 series switches also implement strict priority queues as part of MWRR to support the
low-latency requirements of voice and other real-time traffic.
You can enable Quality of Service (QoS)-based forwarding in an 8540 router by using the
global command qos switching. The default weight allocation for ToS classes 0-3 is 1, 2,
4, and 8, respectively. Hence, ToS classes 0-3 get an effective bandwidth of 1/15, 2/15,
4/15, and 8/15 of the interface bandwidth.
In this case, the bandwidth allocation for classes 0-3 is 15:15:30:40 or 3:3:6:8 because the
WRR scheduling weight can only be between 1 and 15.
Listing 5-1 shows a sample configuration to enable ToS-based MWRR globally on an 8500
router.
Listing 5-1 Enabling ToS-Based MWRR
qos switching
qos mapping precedence 0 wrr-weight 3
qos mapping precedence 1 wrr-weight 3
qos mapping precedence 2 wrr-weight 6
qos mapping precedence 3 wrr-weight 8
The configuration to QoS-switch according to the above criteria for traffic coming into port
1 and going out of port 0 only is given in Listing 5-2.
Listing 5-2 Enabling ToS-Based MWRR on Specific Traffic
qos switching
qos mapping <incoming interface> <outgoing interface>
precedence 0 wrr-weight 3
qos mapping <incoming interface> <outgoing interface>
precedence 1 wrr-weight 3
qos mapping <incoming interface> <outgoing interface>
precedence 2 wrr-weight 6
qos mapping <incoming interface> <outgoing interface>
precedence 3 wrr-weight 8
its allocated quantum bytes. The amount of service the queue is entitled to in a subsequent
round is the quantum bytes reduced by the deficit carried over in the deficit counter.
For efficiency, you should make the quantum size equal to the maximum packet size in the
network. This ensures that the DRR scheduler always serves at least one packet from each
nonempty flow queue.
NOTE An empty flow queue's deficit counter is reset to zero so that credits are not accumulated
indefinitely, which would eventually lead to unfairness.
The general DRR algorithm described in this section is modified to allow a low-latency
queue. In MDRR, all queues are serviced in a round-robin fashion with the exception of the
low-latency queue. You can define this queue to run in either one of two ways: in strict
priority or alternate priority mode.
In strict priority mode, the low-latency queue is serviced whenever the queue is nonempty.
This allows the lowest possible delay for this traffic. It is conceivable, however, for the other
queues to starve if the high-priority, low-latency queue is full for long periods of time
because it can potentially take 100 percent of the interface bandwidth.
In alternate priority mode, the scheduler alternates between the low-latency queue and the
remaining CoS queues. In addition to a low-latency queue, MDRR
supports up to seven other queues, making the total number of queues eight. Assuming
that 0 is the low-latency queue, the queues are served in the following order: 0, 1, 0, 2, 0,
3, 0, 4, 0, 5, 0, 6, 0, 7.
In alternate priority mode, the largest delay for Queue 0 is equal to the largest single
quantum among the other queues, rather than the sum of all their quanta, as it would be if
Queue 0 were served in traditional round-robin fashion.
MDRR, then, is not conventional round-robin scheduling. Instead, DRR is modified
in such a way that it limits the latency on one user-configurable
queue, thus providing better jitter characteristics.
An MDRR Example
This example, which illustrates an alternate-priority low-latency queue, defines three
queues (Queue 2, Queue 1, and Queue 0) with weights of 1, 2, and 1, respectively. Queue
2 is the low-latency queue running in alternate-priority mode. All the queues, along with
their deficit counters, are shown in Figure 5-10.
[Figure 5-10: Queues 0-2 with their queued packet sizes in bytes; all deficit counters are 0]
Table 5-4 provides the weight and quantum associated with each queue. When MDRR is
run on the output interface queue, the interface maximum transmission unit (MTU) is used
to derive the quantum; a corresponding maximum unit applies when MDRR is run on the fabric queues.
Table 5-4 Queues 0-2, Along with Their Associated Weights and Quantum Values
Queue Number Weight Quantum = Weight × MTU (MTU = 1500 Bytes)
Queue 2 1 1500
Queue 1 2 3000
Queue 0 1 1500
On the first pass, Queue 2 is served. Queue 2's deficit counter is initialized to equal its
quantum value, 1500. Queue 2 is served as long as the deficit counter is greater than 0. After
a packet is served, its size is subtracted from the deficit counter. The first 500-byte
packet from the queue gets served because the deficit counter is 1500. Now, the deficit
counter is updated as 1500 - 500 = 1000. Therefore, the next packet is served. After the
1500-byte packet is served, the deficit counter becomes -500 and Queue 2 can no longer be
served. Figure 5-11 shows the three queues and the deficit counters after Queue 2 is served.
[Figure 5-11: the queues after Queue 2's first pass; deficit counters: Queue 2 = -500, Queue 1 = 0, Queue 0 = 0]
Because you are in alternate-priority mode, you alternate between serving Queue 2 and
another queue. This other queue is selected in a round-robin fashion. Consider that in the
round robin, it is now Queue 0's turn. The deficit counter is initialized to 1500, the quantum
value for the queue. The first 1500-byte packet is served. After serving the first packet, the
deficit counter is updated as 1500 - 1500 = 0. Hence, no other packet can be served in this
pass. Figure 5-12 shows the three queues and their deficit counters after Queue 0 is served.
Because you alternate between the low-latency queue and the other queues served in the
round robin, Queue 2 is served next. Queue 2's deficit counter is updated to -500 + 1500 =
1000. This allows the next packet in Queue 2 to be served. After sending the 500-byte
packet, the deficit counter becomes 500. It could have served another packet, but Queue 2
is empty. Therefore, its deficit counter is reset to 0. An empty queue is not attended, and the
deficit counter remains 0 until a packet arrives on the queue. Figure 5-13 shows the queues
and the counters at this point.
[Figure 5-12: the queues after Queue 0 is served; deficit counters: Queue 2 = -500, Queue 1 = 0, Queue 0 = 0]
[Figure 5-13: the queues after Queue 2 is served and emptied; deficit counters: Queue 2 = 0, Queue 1 = 0, Queue 0 = 0]
Queue 1 is served next. Its deficit counter is initialized to 3000. This allows three packets to
be sent, leaving the deficit counter at 3000 - 1500 - 500 - 1500 = -500. Figure 5-14
shows the queues and the deficit counters at this stage.
[Figure 5-14: the queues after Queue 1 is served; deficit counters: Queue 2 = 0, Queue 1 = -500, Queue 0 = 0]
Queue 0 is the next queue serviced and sends two packets, making the deficit counter
1500 - 1000 - 1500 = -1000. Because the queue is now empty, the deficit counter is reset
to 0. Figure 5-15 depicts the queues and counters at this stage.
Queue 1 serves the remaining packet in a similar fashion in its next pass. Because the queue
becomes empty, its deficit counter is reset to 0.
MDRR Implementation
Cisco 12000 series routers support MDRR. MDRR can run on the output interface queue
(transmit [TX] side) or on the input interface queue (receive [RX] side) when feeding the
fabric queues to the output interface.
Different hardware revisions of line cards, termed Engine 0, 1, 2, 3, and so on, exist for
Cisco 12000 series routers. The nature of MDRR support on a line card depends on the line
card's hardware revision. Engine 0 line cards support a software MDRR implementation;
Engine 2 and above support a hardware MDRR implementation.
[Figure 5-15: the queues after Queue 0 is served and emptied; deficit counters: Queue 2 = 0, Queue 1 = -500, Queue 0 = 0]
MDRR on the RX
MDRR is implemented in either software or hardware on a line card. In a software
implementation, each line card can send traffic to 16 destination slots because the 12000
series routers use a 16 × 16 switching fabric. For each destination slot, the switching fabric
has eight CoS queues, making the total number of CoS queues 128 (16 × 8). You can
configure each CoS queue independently.
In the hardware implementation, each line card has eight CoS queues per destination
interface. With 16 destination slots and 16 interfaces per slot, the maximum number of CoS
queues is 16 × 16 × 8 = 2048. All the interfaces on a destination slot have the same CoS
parameters.
MDRR on the TX
Each interface has eight CoS queues, which you can configure independently in both
hardware- and software-based MDRR implementations.
Flexible mapping between IP precedence and the eight possible queues is offered in the
MDRR implementation. MDRR allows a maximum of eight queues so that each IP
precedence value can be made its own queue. The mapping is flexible, however. The
number of queues needed and the precedence values mapped to those queues are user-
configurable. You can map one or more precedence values into a queue.
MDRR also offers an individualized drop policy and bandwidth allocation. Each queue has its
own associated Random Early Detection (RED) parameters, which determine its drop
thresholds, and its own DRR quantum, which determines how much bandwidth it gets. The
quantum (in other words, the average number of bytes taken from the queue in each
service round) is user-configurable.
Listing 5-3 Defining Traffic Classes and Allocating Them to Appropriate Queues with a Minimum Bandwidth During Congestion
interface POS1/0
tx-cos cos-a
interface POS2/0
tx-cos cos-a
interface POS3/0
tx-cos cos-a
slot-table-cos table-a
destination-slot 0 cos-a
destination-slot 1 cos-a
destination-slot 2 cos-a
rx-cos-slot 1 table-a
rx-cos-slot 2 table-a
rx-cos-slot 3 table-a
cos-queue-group cos-a
precedence all random-detect-label 0
precedence 0 queue 0
precedence 1 queue 1
precedence 2 queue 2
precedence 3 queue 3
precedence 4 queue low-latency
precedence 5 queue 0
random-detect-label 0 50 200 2
exponential-weighting-constant 8
queue 0 10
queue 1 20
queue 2 30
queue 3 40
queue low-latency strict-priority 20
Interfaces POS1/0, POS2/0, and POS3/0 are configured with TX CoS based on the
cos-queue-group cos-a command, which defines a CoS policy. Traffic is
mapped into classes based on their IP precedence value in the packet. Each of the five
classes is allocated to its individual queue, and weights are allocated based on the
bandwidth allocation for each class. The bandwidth allocation for a class is proportional to
its weight. The percentage of interface bandwidth allocation for Queues 0-3 and the low-
latency queue is 8.33, 16.67, 25, 33.33, and 16.67, respectively. Voice is delay-sensitive but
not bandwidth-intensive. Hence, it is allocated a low-latency queue with strict priority, but
it doesn't need a high-bandwidth allocation.
The network supports only IP precedence values of 0-4. IP precedence 5 is allocated to
Queue 0 for best-effort service. IP precedence 6 and 7 packets are control packets that a
router originates. They are flagged internally and are transmitted first, regardless of the
MDRR configuration.
The cos-a cos-queue-group command defines a Weighted Random Early Detection
(WRED) congestion avoidance policy that applies to all queues as follows:
Minimum threshold: 50 packets
Maximum threshold: 200 packets
Probability of dropping packets at the maximum threshold: 50 percent
Exponential weighting constant to calculate average queue depth: 8
WRED is discussed in Chapter 6, Per-Hop Behavior: Congestion Avoidance and Packet
Drop Policy.
The MDRR algorithm can also be applied on the input interface line card on the fabric
queues delivering the packet to the destination line card. The slot-table-cos command
defines the CoS policy for each destination line card's CoS fabric queues on the receive line
card. In the example, the table-a slot-table-cos command defines the CoS policy for
destination line cards 0-2 based on the cos-a cos-queue-group command. Note that the
destination line card can be the same as the receive line card because the input and output
interfaces for certain traffic can exist on the same line card.
The rx-cos-slot command applies the table-a slot-table-cos command to a particular slot
(line card). Listing 5-4 shows the CoS configuration on the router.
Listing 5-4 CoS Information
Router#show cos
Interface Queue Cos Group
PO1/0 cos-a
PO2/0 cos-a
PO3/0 cos-a
Note that Queues 4-6 are not mapped to any IP precedence, so they are empty queues. Only
Queues 0-3 and the low-latency queue are mapped to an IP precedence, and bandwidth is
allocated proportional to the queue weights during congestion.
From the line card, the show controllers frfab/tofab cos-queue length/parameters/
variables command shows information regarding the CoS receive and transmit queues.
Summary
In this chapter, we discuss two new scheduling algorithms, MWRR and MDRR, that are
used for resource allocation. MWRR and MDRR are similar to the WFQ algorithm in their
scheduling behavior. MWRR and MDRR scheduling can also support voice traffic if the
voice queue is made a strict priority queue. At this time, MWRR and MDRR are used in the
Catalyst family of switches and Cisco 12000 series routers, respectively.
References
1 An Engineering Approach to Computer Networking, S. Keshav, Addison-Wesley,
1997.
2 Efficient Fair Queuing using Deficit Round Robin, M. Shreedhar, George Varghese,
SIGCOMM 1995, pp. 231-242.
CHAPTER
6
Per-Hop Behavior: Congestion
Avoidance and Packet Drop
Policy
A packet drop policy is a queue management algorithm that manages the packets and queue
length in a queuing system. Traditional first-in, first-out (FIFO) queue management uses a
simple tail-drop policy, which drops any packet arriving on a full queue.
Transmission Control Protocol (TCP) is currently the dominant transport protocol used on
the Internet. This chapter discusses TCP congestion control mechanisms and how TCP
traffic reacts in a tail-drop scenario. It calls for an active queue management algorithm,
Random Early Detection (RED), that avoids network congestion by dropping packets
proactively to signal congestion to the TCP sources with end-to-end adaptive feedback
control.
Weighted Random Early Detection (WRED), also discussed in this chapter, allows different
RED parameters based on the packets Internet Protocol (IP) precedence value or traffic
class. Flow WRED, a WRED extension that applies an increased non-zero drop probability
to penalize flows taking more than their fair share of queue resources, and Explicit
Congestion Notification (ECN), which enables congestion notification for incipient
congestion by marking packets rather than dropping them, are also discussed. The chapter
ends with a section on Selective Packet Discard (SPD), a selective packet drop policy on
the IP input queue that queues packets for the IP process in a router.
follows the slow start algorithm, which operates by sending new packets into the network
at the rate at which acknowledgments are received from the other end. This makes TCP self-clocking.
In TCP, packet loss is an indicator of congestion. A TCP source detects congestion when it
fails to receive an acknowledgment for a packet within the estimated retransmit timer
timeout (RTT) period. In such a situation, it resets the congestion window to one segment
and restarts the slow start algorithm. It also decreases the slow start threshold (ssthresh) to
half the congestion window size at the time the retransmit was required. Note that when the
TCP session is established, ssthresh is set to the size of the receiver window announced by
the other end, or the default 65,535 bytes.
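For example, if the congestion window is 16 segments when a retransmit timeout occurs, ssthresh becomes 8 segments and the congestion window restarts at 1 segment; slow start roughly doubles the window every round trip until it reaches 8 segments, after which the window grows by about one segment per round trip.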
After an RTT timeout, a sender follows the slow start algorithm until the window size
reaches ssthresh. Afterward, the window size increases linearly (by 1/cwnd) per
acknowledgment received. The window size is increased slowly after it reaches ssthresh
because ssthresh estimates the bandwidth-delay product for the TCP connection. TCPs
slow start algorithm and congestion avoidance1 behavior are depicted in Figure 6-1.
Figure 6-1 TCP Congestion Window Showing Slow Start and Congestion Avoidance Operations
[figure: the congestion window (cwnd) plotted against time, with the receiver window and ssthresh marked; labeled regions: 1. slow start, 2. retransmit timer timeout (RTT), 3. congestion avoidance/linear increase]
When packet loss occurs for reasons other than network congestion, waiting for the
retransmit timer to expire can have an adverse performance impact, especially in high-speed
networks. To avoid this scenario, TCP fast retransmit and fast recovery algorithms are used.
With low values of n (the exponential weighting constant used to compute the average), the average queue size closely tracks the current queue size, resulting
in the following RED behavior:
The average queue size moves rapidly and fluctuates with changes in the traffic levels.
The RED process responds quickly to long queues. When the queue falls below the
minimum threshold, the process stops dropping packets.
If n is too low, RED overreacts to temporary traffic bursts and drops traffic
unnecessarily.
When the average queue depth is above the minimum threshold, RED starts dropping
packets. The packet drop rate increases linearly as the average queue size increases, until
the average queue size reaches the maximum threshold.
When the average queue size is above the maximum threshold, all packets are dropped.
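As a worked example using the default precedence 0 parameters shown later in Listing 6-1 (minimum threshold 20 packets, maximum threshold 40 packets, mark probability 1/10), an average queue size of 30 packets gives a drop probability of (1/10) × (30 - 20) ÷ (40 - 20) = 0.05, or about 1 packet in 20.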
Packet drop probability is illustrated in Figure 6-3.
[Figure 6-3: packet drop probability plotted against average queue size; zero below the minimum threshold, increasing linearly between the minimum and maximum thresholds, and 100 percent at and above the maximum threshold]
WRED
WRED introduces grades of service among packets based on a packet's drop probability
and allows selective RED parameters based on IP precedence. As a result, WRED drops
packets of some precedence levels more aggressively and packets of other precedence
levels less aggressively.
WRED Implementation
WRED can run on the central processor of the router or in distributed mode on Cisco's
7500 series routers with Versatile Interface Processors (VIPs). By default, the maximum
threshold is the same for all precedence levels, but the minimum threshold varies with
packet precedence. Hence, you drop the lower precedence packets more aggressively than
the higher precedence packets. The default minimum threshold value for precedence 0
traffic is half the value of the maximum threshold.
You can enable WRED based on traffic classes by using modular QoS CLI. Modular QoS
CLI allows different RED parameters for each traffic class and is discussed in Appendix A,
Cisco Modular QoS Command-line Interface.
In Cisco 12000 series routers, WRED is available in either hardware- or software-based
implementations, depending on the hardware revision of the line card. Cisco 12000 series
routers allow eight class of service (CoS) queues. You can map a CoS queue to carry packets
of one or more precedence value(s). After the CoS queues are defined, RED parameters can
be applied independently to the different CoS queues. Because this router platform uses a
switch-based architecture, you can enable WRED on both the fabric queues on the receive
side and the interface queues on the transmit side. For information on how to enable WRED
on a Cisco 12000 series router, refer to Case Study 5-10 in Chapter 5, Per-Hop Behavior:
Resource Allocation II.
Active queue management using WRED is needed on interfaces that peer with other
service providers to control congestion. On the interface connecting a peer service provider,
enable WRED by using the random-detect command.
Note that the WRED minimum drop threshold parameter needs to be relatively higher for
higher-precedence (better service) traffic when compared to the lower-precedence traffic,
so lower-precedence packets are dropped before higher-precedence traffic gets affected.
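A minimal sketch (the interface and threshold values are illustrative) that enables WRED and gives precedence 5 a higher minimum threshold than precedence 0:
interface Hssi3/0/0
random-detect
random-detect precedence 0 20 40 10
random-detect precedence 5 31 40 10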
Listings 6-1 and 6-2 show the WRED default parameters and packet drop statistics by using
the show queueing random-detect command when WRED is running in the central
processor and VIP line cards of a Cisco 7500 router, respectively.
Listing 6-1 WRED Operation on a Low-end 7200 Series Router and on a 7500 Series Router in a Nondistributed
Mode
Router#show queueing random-detect
Hssi3/0/0
Queueing strategy: random early detection (WRED)
Exp-weight-constant: 9 (1/512)
mean queue depth: 40
drops: class random tail min-th max-th mark-prob
0 13783 174972 20 40 1/10
1 14790 109428 22 40 1/10
2 14522 119275 24 40 1/10
3 14166 128738 26 40 1/10
4 13384 138281 28 40 1/10
5 12285 147148 31 40 1/10
6 10893 156288 33 40 1/10
7 9573 166044 35 40 1/10
rsvp 0 0 37 40 1/10
When running on the central processor, WRED is run on the interface transmit queue and
its thresholds are determined as follows:
min_threshold(i): The minimum threshold corresponding to traffic of precedence
i, equal to:
(1/2 + i/18) × output hold queue
max_threshold: Equal to the output hold queue
The output of the show queueing random-detect command gives the following packet statistics:
Random drop: WRED drops that occur when the mean queue depth falls between the
minimum and maximum WRED thresholds.
Tail drop: WRED drops that occur when the mean queue depth exceeds the maximum
threshold.
Mark probability: The drop probability when the mean queue depth is equal to the
maximum threshold.
Listing 6-2 WRED Operation on a VIP-Based 7500 Series Router
Router#show queueing random-detect
Hssi3/0/0
Queueing strategy: VIP-based fair queueing
Packet drop strategy: VIP-based random early detection (DWRED)
Exp-weight-constant: 9 (1/512)
Mean queue depth: 0
Queue size: 0 Maximum available buffers: 5649
Output packets: 118483 WRED drops: 800 No buffer: 0
When running WRED in a distributed mode on VIP-based 7500 routers, WRED processing
for an interface line card is done locally and not on the route processor. Hence, distributed
WRED operates on the queue on the VIP and not on the interface transmit queue.
Distributed WRED uses Cisco Express Forwarding (CEF)-based inter-process
communication to propagate configuration and statistics information between the Route
Switch Processor (RSP) and the VIP. (CEF is discussed in Appendix B, Packet Switching
Mechanisms.) Hence, CEF should be enabled when operating WRED in a distributed
mode. The interface drop statistics include Distributed WRED (DWRED) drops.
DWRED calculates the default maximum threshold based on the pool size (loosely
speaking, VIP queue size), the interface maximum transmission unit (MTU) size, and the
interface bandwidth. Pool size is, in turn, dependent on the memory size on the VIP and
other factors and, therefore, difficult to pin down to a fixed number. Therefore, the pool size
information only helps to estimate the burst size that can be accommodated. You can
configure the DWRED maximum threshold to a value different from the default, if
necessary.
Case Study 6-2: WRED Based on Traffic Classes Using Modular QoS CLI
A service provider is interested in running WRED in its routers because the interface
queues in the routers often exhibit symptoms of global synchronization. The service
provider carries critical User Datagram Protocol (UDP)-based application traffic on its
network and hence, it doesn't want to randomly drop any of its critical application traffic,
although tail drop can be applied at the queue's maximum threshold.
Listing 6-3 excludes the critical UDP-based application traffic before applying WRED on
the router's interface.
Listing 6-3 Enable WRED Using Modular QoS CLI on the Noncritical Traffic Class Only
class-map non-critical
match access-group 101
policy-map wred
class non-critical
random-detect exponential-weighting-constant 9
random-detect precedence 0 112 375 1
interface Hssi0/0/0
service-policy output wred
As shown in Listing 6-3, you first define a noncritical traffic class that includes all traffic
except the critical UDP traffic. Next, use an access-list 101 command (not shown in Listing
6-3) to match all traffic, except the critical UDP application traffic. Follow this with the
match access-group 101 command, which matches all traffic that passed the access-list
101 definition, and the policy-map wred command, which enables WRED on the
noncritical traffic. In this listing, WRED is enabled with an exponential weighting constant
of 9. At the same time, minimum threshold, maximum threshold, and mark probability
denominator values for IP precedence 0 packets are set to 112, 375, and 1, respectively. The
policy-map wred is then enabled on the router's Hssi0/0/0 interface.
Flow WRED
Adaptive TCP flows respond to a congestion signal and slow down, whereas nonadaptive
UDP flows do not respond to congestion signals and don't slow down. For this reason,
nonadaptive flows can send packets at a much higher rate than adaptive flows at times of
congestion. Hence, greedy, nonadaptive flows tend to use more queue resources than the
adaptive flows that slow down in response to congestion signals. Flow WRED modifies
WRED such that it penalizes flows taking up more than their fair share of queue resources.
To provide fairness among the active traffic flows in the queue, Flow WRED classifies all
arriving packets into the queue based on their flow and precedence. It also maintains state for all
active flows, or flows that have packets in the queue. This state information is used to
determine the fair amount of queue resources for each flow (queue size/number of active
flows), and flows taking more than their fair share are penalized more than the others are.
To accommodate a flow's traffic burstiness, you can increase each flow's fair share by a
scaling factor before it gets penalized, using the following formulas:
Fair share of queue resources per active flow = queue size ÷ number of active flows
Scaled fair share of queue resources per flow = (queue size ÷ number of active flows) × scaling factor
A flow exceeding the scaled fair share of queue resources per flow in the queue is penalized
by an increase in the non-zero drop probability for all the newly arriving packets in the
queue.
As an example, consider a packet arriving on a queue running Flow WRED. Flow WRED
considers both the IP precedence value in the packet and the flow state information to
determine the packet's drop probability. The packet's IP precedence determines the
configured (or the default) minimum and maximum WRED thresholds for the packet. If the
average queue size is below the minimum threshold, the packet gets zero drop probability
(in other words, it is not dropped). If the average queue size is in between the packet's
minimum and maximum thresholds (as determined by the packet's IP precedence), the flow
state information is taken into consideration. If the packet belongs to a flow that exceeded
the scaled fair share of queue resources per flow in the queue, you increase the packet's drop
probability by decreasing the WRED maximum threshold as follows:
New maximum threshold = Minimum threshold + ((Maximum threshold - Minimum threshold) ÷ 2)
The non-zero probability is then derived based on the minimum threshold and the new
maximum threshold values. Because the drop probability curve is much steeper now, as
shown in Figure 6-4, you apply a higher drop probability on the packet. If the flow is within
its fair allocation of queue resources, the packet gets a non-zero drop probability
determined by the normal WRED calculation.
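Using the precedence 0 thresholds from Listing 6-3 (minimum 112, maximum 375) as a worked example, a packet from such a flow would see a new maximum threshold of 112 + ((375 - 112) ÷ 2) = 243.5, steepening its drop probability curve accordingly.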
If the average queue size exceeds the maximum threshold, continue to drop packets using
a process similar to that used in the WRED operation.
Flow WRED increases the probability of a packet getting dropped only if the packet
belongs to a flow whose packets in the queue exceed the scaled per-flow limit. Otherwise,
Flow WRED operates similar to WRED.
NOTE Flow WRED, as described previously, still applies a non-zero drop probability to flows with
a few packets in the queue when the average queue size is between the minimum and
maximum thresholds. You can implement Flow WRED such that it doesn't drop packets of
a flow with fewer than a specified number of packets in the queue by increasing the minimum
threshold to be close to or equal to the maximum threshold. Tail drop is then applied to the
packet at the queue's WRED maximum threshold value.
The random-detect flow command enables Flow WRED with a scaling factor of 4 and a
flow count of 256. Because the traffic is expected to be bursty in nature, you can increase
the scaling factor to 6 using the random-detect flow average-depth-factor 6 command.
You can modify the allowed active flow count to any value between 0 and 2^15 (32,768) using
the random-detect flow count command.
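Putting these commands together, a hedged sketch (the interface and values are illustrative) might be:
interface Hssi0/0/0
random-detect
random-detect flow
random-detect flow average-depth-factor 6
random-detect flow count 1024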
You can use the show queueing random-detect command to verify an interface's queuing
strategy.
ECN
Thus far, in active queue management using WRED, packets have been dropped to signal
congestion to a TCP source. ECN provides routers the functionality to signal congestion to
a TCP source by marking a packet header rather than dropping the packet4. In the scenarios
where a WRED-enabled router dropped a packet to signal congestion, it can now set the
ECN bit in the packet, avoiding potential delays due to packet retransmissions caused by
packet loss.
ECN functionality requires support for a Congestion Experienced (CE) bit in the IP header
and a transport protocol that understands the CE bit. An ECN field of 2 bits in the IP header
is provided for this purpose. The ECN-Capable Transport (ECT) bit is set by the TCP
source to indicate that the transport protocols end-points are ECN-capable. The CE bit is
set by the router to indicate congestion to the end nodes.
Bits 6 and 7 in the IPv4 Type of Service (ToS) byte form the ECN field and are designated
as the ECT bit and the CE bit, respectively. Bits 6 and 7 are listed in differentiated services
architecture as currently unused.
For TCP, ECN requires three new mechanisms:
ECN capability negotiation: The TCP endpoints negotiate during setup to
determine if they are both ECN-capable.
ECN-Echo flag in the TCP header: The TCP receiver uses the ECN-Echo flag to
inform the TCP source that a CE packet has been received.
Congestion Window Reduced (CWR) flag in the TCP header: The TCP source
uses the CWR flag to inform the TCP receiver that the congestion window has been
reduced.
ECN functionality is still under discussion in the standards bodies.
SPD
SPD helps differentiate important control traffic (such as routing protocol packets) over
normal data traffic to the router. This enables the router to keep its Interior Gateway
Protocol (IGP) and Border Gateway Protocol (BGP) routing information during congestion
by enqueuing routing protocol packets ahead of normal data traffic.
SPD implements a selective packet drop policy on the router's IP process queue. Therefore,
it applies only to process-switched traffic. Even when the router is using route-cache
forwarding (also called fast switching), some of the transit data traffic still needs to be
process switched in order to create a route-cache entry. When a router is using CEF, though,
all transit data traffic is usually CEF switched and the only packets that reach the IP process
input queue are the important control packets such as routing and keepalives, normal data
traffic destined to the router, and transit traffic that is not CEF supported. Process switching,
route-cache forwarding, and CEF are discussed in Appendix B.
Traffic arriving at the IP process input queue is classified in three ways:
Important IP control traffic (routing protocol packets), often called priority traffic.
Normal IP traffic, such as telnet/ping packets to a router interface, IP packets with
options, and any IP feature or encapsulation not supported by CEF.
Aggressively droppable packets. These are IP packets that fail the IP sanity check; that
is, they might have incorrect checksums, invalid versions, an expired Time-to-Live
(TTL) value, an invalid UDP/TCP port number, an invalid IP protocol field, and so on.
Most of these packets trigger an Internet Control Message Protocol (ICMP) packet to
notify the sender of the bad packet. A small number of these packets are generated by
normal utilities such as traceroute. Such packets in large numbers, however, can
be part of a malicious smurf attack intended to cripple the router by filling up the IP
process queue. It is essential to selectively drop these packets without losing the
important control information.
SPD operates in the following modes:
Disabled: The SPD feature is disabled on the router.
Normal: The IP input queue is less than the queue minimum threshold. No packets
are dropped.
Random drop: The IP input queue is more than the minimum threshold but less than
the maximum threshold. Normal IP packets are dropped in this mode, with a drop
probability shown in the following formula:
drop probability = (queue length - min threshold) ÷ (max threshold - min threshold)
Random drops are called SPD flushes. Important IP control traffic is still enqueued.
Full drop: The IP input queue is above the maximum threshold. All normal IP traffic
is dropped. Important IP control traffic is still received to a special process level
queue, termed the priority queue, which is drained before the normal one.
Aggressive drop: This is a special aggressive drop mode for IP packets failing the
sanity check. All bad IP packets are dropped when the input queue is above the
minimum threshold. You can enable this special drop mechanism using the ip spd
mode aggressive command.
Figure 6-5 illustrates SPD operation and its modes.
[Figure 6-5: SPD packet drop probability plotted against the IP input process queue size; no drops below the minimum threshold, random drop between the minimum and maximum thresholds, and full drop above the maximum threshold; important control traffic is enqueued to the separate IP priority queue]
You use the show ip spd command to show the SPD operation mode, the minimum and
maximum thresholds, and the IP process and priority queue sizes. The minimum and
maximum input queue SPD thresholds are user-configurable.
IP normal queue and priority queue give the value of their respective queues at the time the
show ip spd command is executed. IP normal queue is 20, which is below the minimum
queue threshold of 73. Hence, SPD is currently operating in the normal mode.
A special queue, termed the IP priority queue, is used to hold the important IP traffic when
the IP input process queue is full. The IP input process queue is a global queue that holds
all IP packets waiting to be processed by the IP input process. These packets cannot be
processed on a CPU interrupt.
You can configure the priority queue length using the ip spd headroom command. Any
important IP traffic arriving on a full IP priority queue is dropped.
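As a hedged sketch only (the exact syntax depends on the IOS release, as the following note explains), the thresholds and headroom that appear later in Listing 6-6 might be set with global commands along these lines:

ip spd queue min-threshold 73
ip spd queue max-threshold 74
ip spd headroom 100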
NOTE Note that some of the SPD command formats changed in the 12.0 IOS releases compared to the earlier 11.1 IOS versions. While all the 11.1 SPD commands started with ip spd, some SPD commands in 12.0 and later versions of IOS can be configured only as spd <...> instead of ip spd <...>. Refer to the appropriate Cisco documentation for clarification on the command formats.
Headroom shows the priority queue's depth; the priority queue holds the priority traffic. In Listing 6-6, the priority queue currently holds 5 packets to be processed. The default normal queue
thresholds are derived from the input queue length. The show interface command shows
the queue length of the input queue. All bad IP packets need to be process-switched. You
should enable the SPD aggressive drop mechanism to drop bad or illegal IP packets when
the normal IP queue exceeds the minimum threshold. You can enable the SPD aggressive
drop mode using the ip spd mode aggressive command. Listing 6-6 shows the SPD
operation with special SPD aggressive mode on.
Listing 6-6 Aggressive SPD Drop Mechanism for Bad or Illegal IP Packets
R3#sh ip spd
Current mode: normal.
Queue min/max thresholds: 73/74, Headroom: 100
IP normal queue: 20, priority queue: 5.
SPD special drop mode: aggressively drop bad packets
You use the show interfaces switching command to view statistics on the SPD flushes and
aggressive drops. It includes information on received important (priority) IP traffic and on
any drops that occur when you exceed the headroom threshold.
Route Processor (RP) drops are categorized as those dropped by a processor on a full input
queue. They match with the input drops shown in the show interface Hssi0/0/0 command
output.
Fast SPD flushes are the random, full drops as well as the special aggressive drops done by
SPD when the input queue is not physically exhausted. No SPD drops occur when the
normal queue is below the minimum threshold.
Fast SPD aggressive drops occur when SPD is operating in the special aggressive mode.
They show the number of bad or illegal IP packets dropped by SPD. No SPD aggressive
drops occur when the normal queue is below the minimum threshold.
The SPD priority queue carries the priority traffic. Priority traffic is control traffic tagged
with a precedence of 6.
Summary
This chapter focuses on congestion avoidance and packet drop policy algorithms: RED,
WRED, Flow WRED, and ECN.
RED is an active queue management algorithm that allows routers to detect congestion
before the queue overflows. It aims to reduce the average queue size, hence reducing
queuing delay and avoiding global synchronization by adopting a probability-based packet
drop strategy between certain queue thresholds. The weighted version of RED, WRED, allows different RED parameters based on a packet's IP precedence.
Flow WRED extends WRED to provide fairness in the packet drop behavior among the
different flow types. ECN enables congestion notification for incipient congestion by
marking packets rather than dropping them.
The chapter also discusses SPD, a selective packet drop policy that is used on the IP input
queue of a Cisco router. SPD is used to differentiate between the data and control packets
that are enqueued to the IP process in a Cisco router.
References
1 TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, W. Stevens, RFC 2001, 1997.
2 Recommendations on Queue Management and Congestion Avoidance in the Internet, B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L. Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, and L. Zhang, RFC 2309, April 1998.
3 Random Early Detection Gateways for Congestion Avoidance, S. Floyd and V. Jacobson, IEEE/ACM Transactions on Networking, V.1 N.4, August 1993, pp. 397-413.
4 A Proposal to Add Explicit Congestion Notification (ECN) to IP, K. Ramakrishnan and S. Floyd, RFC 2481.
5 RED in a Different Light, V. Jacobson, K. Nichols, and K. Poduri, work in progress.
CHAPTER
7
RSVP
The Internet Engineering Task Force (IETF) specified RSVP1 as a signaling protocol for
the intserv architecture. RSVP enables applications to signal per-flow QoS requirements to
the network. Service parameters are used to specifically quantify these requirements for
admission control.
RSVP is used in multicast applications such as audio/video conferencing and broadcasting.
Although the initial target for RSVP is multimedia traffic, there is a clear interest in
reserving bandwidth for unicast traffic such as Network File System (NFS), and for Virtual
Private Network (VPN) management.
RSVP signals resource reservation requests along the routed path available within the
network. It does not perform its own routing; instead, it is designed to use the Internet's
current robust routing protocols. Like other IP traffic, it depends on the underlying routing
protocol to determine the path for both its data and its control traffic. As the routing protocol
information adapts to network topology changes, RSVP reservations are carried over to the
new path. This modularity helps RSVP to function effectively with any underlying routing
service. RSVP provides opaque transport of traffic control and policy control messages, and
provides transparent operation through nonsupporting regions.
RSVP Operation
End systems use RSVP to request a specific QoS from the network on behalf of an
application data stream. RSVP requests are carried through the network, visiting each node
the network uses to carry the stream. At each node, RSVP attempts to make a resource
reservation for the stream.
RSVP-enabled routers help deliver the right flows to the right locations. Figure 7-1 gives
an overview of the important modules and the data and control flow information of a client
and router running RSVP.
Figure 7-1 Data and Control Flow Information of a Client and Router Running RSVP
[Figure 7-1: On both the host and the router, the RSVP daemon (RSVPD) consults policy control and admission control, and programs the packet classifier and packet scheduler. RSVP control packets flow between the application, the host's daemon, and the routers' daemons, while data traffic passes through the classifier and scheduler.]
The RSVP daemon in a router communicates with two local decision modules, admission control and policy control, before making a resource reservation2. Admission control
determines whether the node has sufficient available resources to supply the requested QoS.
Policy control determines whether the user has administrative permission to make the
reservation. If either check fails, the RSVP daemon sends an error notification to the
application process that originated the request. If both checks succeed, the RSVP daemon
sets parameters in a packet classifier and a packet scheduler to obtain the desired QoS. The
packet classifier determines the QoS class for each packet, and the packet scheduler orders
packet transmission based on its QoS class. The Weighted Fair Queuing (WFQ) and
Weighted Random Early Detection (WRED) disciplines provide scheduler support for
QoS. WFQ and WRED are discussed in Chapters 4 and 6, respectively.
During the admission control decision process, a reservation for the requested capacity is put in place if sufficient capacity remains in the requested traffic class. Otherwise, the admission request is refused, but the traffic is still forwarded with the default service for its traffic class. In many cases, a flow whose admission request failed at one or more routers can still receive acceptable quality, because the reservation might have succeeded in all the routers that are actually congested, and because other reservations might not be fully utilizing their reserved capacity.
Reservations must follow the same unicast path, or the multicast tree, at all times. In case of link failures, the router should inform the RSVP daemon so that RSVP messages are generated along the new route.
You can break down the process of installing a reservation into five distinct steps3:
1 Data senders send RSVP PATH control messages the same way they send regular data
traffic. These messages describe the data they are sending or intend to send.
2 Each RSVP router intercepts the PATH messages, saves the previous hop IP address, writes its own address as the previous hop, and sends the updated message along the same route the application data is using (see the sketch after this list).
3 Receiver stations select a subset of the sessions for which they are receiving PATH
information and request RSVP resource reservations from the previous hop router
using an RSVP RESV message. The RSVP RESV messages going from a receiver to
a sender take an exact reverse path when compared to the path taken by the RSVP
PATH messages.
4 The RSVP routers determine whether they can honor those RESV requests. If they can't, they refuse the reservations. If they can, they merge the reservation requests being received and request a reservation from the previous hop router.
5 The senders receive reservation requests from the next hop routers indicating that
reservations are in place. Note that the actual reservation allocation is made by the
RESV messages.
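To make steps 2 and 3 concrete, here is a minimal Python sketch of the per-router PATH/RESV bookkeeping (an illustrative model with invented names and addresses, not an actual RSVP implementation):

# Path state: (session, sender) -> previous-hop address, built from PATH messages
path_state = {}

def handle_path(session, sender, phop, my_address):
    """Step 2: intercept a PATH message, remember the previous hop,
    and forward it with our own address written as the previous hop."""
    path_state[(session, sender)] = phop
    return {"session": session, "sender": sender, "phop": my_address}

def next_hop_for_resv(session, sender):
    """Step 3 onward: RESV messages retrace the PATH route hop by hop,
    using the previous hop stored when the PATH message was seen."""
    return path_state[(session, sender)]

# Example: a router at 10.0.0.2 relays a PATH from sender 10.0.0.1
fwd = handle_path("224.1.1.1", "10.0.0.1", phop="10.0.0.1", my_address="10.0.0.2")
assert next_hop_for_resv("224.1.1.1", "10.0.0.1") == "10.0.0.1"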
Figure 7-2 shows the RSVP reservation setup mechanism.
As discussed in Chapter 3, Network Boundary Traffic Conditioners: Packet Classifier, Marker, and Traffic Rate Management, an individual flow is made of packets going from a particular source address and port to a particular destination address and port.
[Figure 7-2: RSVP reservation setup. Senders S1 and S2 send PATH messages and data through Router 1 and Router 2 toward receivers R1 and R2; RESV messages travel hop by hop along the exact reverse path.]
RSVP Components
The operational responsibilities of the three RSVP components are as follows:
An RSVP sender is an application that originates traffic in an RSVP session. The flow
specifications that RSVP senders can send across the RSVP network are:
Average data rate
Maximum burst size
An RSVP-enabled router network provides the path between the RSVP senders and
the RSVP receivers.
An RSVP receiver is an application that receives traffic in an RSVP session. In
conferencing and Voice over IP (VoIP) applications, an application can act as both an
RSVP sender and receiver. The flow specifications that RSVP receivers can send
across the RSVP network are:
Average data rate
Maximum burst size
QoS, including:
- Guaranteed service: PATH messages also describe the worst-case delays in the network.
- Controlled load service: The routers guarantee only that network delays will be minimized.
RSVP Messages
RSVP uses seven message types for its operation: two required message types, PATH and RESV, and five optional message types: PATH ERROR, PATH TEARDOWN, RESV ERROR, RESV CONFIRM, and RESV TEARDOWN. The RSVP routers and clients use them to create and maintain reservation states.
RSVP usually runs directly over IP. As such, RSVP messages are unreliable datagrams; they create soft state within the routers, and a periodic refresh is needed.
The following are the sender message types:
PATH messages are sent periodically by senders. The senders describe the flows in
terms of the source and destination IP addresses, the IP protocol, and the User
Datagram Protocol (UDP) or Transmission Control Protocol (TCP) ports, if
applicable. They quantify the expected resource requirements for this data by
specifying its mean rate and burst size.
They are sent to the multicast group or unicast destination of the flow for which the
reservation is being made; RSVP routers detect them because they are sent in UDP
messages to a particular UDP port, or because they have the IP Router Alert option in
their IP header. A router creates a Path State Block (PSB) when the PATH messages
are received.
PATH messages contain a periodic hello interval indicating how frequently the sender
sends them. The default hello interval is 30 seconds. It is important to keep the hello
interval small, or to have a fast retransmit scheme, because lost PATH messages can
result in poor performance for VoIP, as that would delay the establishment of an RSVP
reservation along the path of the VoIP call. The PSB is discarded upon a PATH
TEARDOWN or ingress link failure, or when the PSB has not been refreshed by a
new PATH message after four hello intervals.
When error(s) in a PATH message are found, the optional PATH ERROR message is
sent by the receiver or router, notifying the sender of the problem. Typically, this is a
fundamental format or integrity check fault.
PATH TEARDOWN messages are sent to the multicast group with the sender's source address when the PATH state must be flushed from the database, either due to a link failure or because the sender is exiting the multicast group.
The following are the receiver message types:
RESV messages are sent periodically by receivers. The receivers describe the flows
and resource guarantees they need using information derived from the PATH
messages, in terms of the source and destination IP addresses, the IP protocol, and
the UDP or TCP ports, if applicable. They also describe the bit rate and delay
characteristics they need, using flow specifications. They traverse through all RSVP
routers along the routed path to the sender for which the reservation is being made.
Routers create Reservation State Blocks (RSBs) when RESV messages (FlowSpec,
FilterSpec) are granted.
RESV messages contain a periodic hello interval indicating how frequently the
receiver sends them. The RSB is discarded upon a RESV TEARDOWN or ingress
link failure, or when they have not been refreshed by a new RESV message after four
hello intervals.
When error(s) in an RESV message are found, an RESV ERROR message is sent by
a sender or router informing the receiver of a problem. Typically, it is due to a
fundamental format or integrity check fault, or because insufficient resources were
available to make the requested guarantees.
When the effect of an RESV message applies end to end and a receiver requests
notification of the fact, RESV CONFIRM messages are sent to the receivers or merge
point routers.
RESV TEARDOWN messages are sent when an RSB must be flushed from the
database, either due to a link failure or because the sender is exiting the multicast
group.
Reservation Styles
You can categorize RSVP flow reservations into two major typesdistinct and shared
which are discussed in the following sections.
Distinct Reservations
Distinct reservations are appropriate for those applications in which multiple data sources
are likely to transmit simultaneously. In a video application, each sender emits a distinct
data stream requiring separate admission control and queue management on the routers
along its path to the receiver. Such a flow, therefore, requires a separate reservation per
sender on each link along the path.
Distinct reservations are explicit about the sender and are installed using a Fixed Filter (FF)
reservation style. Symbolically, you can represent an FF-style reservation request by FF
(S,Q), where the S represents the sender selection and the Q represents the FlowSpec.
Unicast applications form the simplest case of a distinct reservation, in which there is one
sender and one receiver.
Shared Reservations
Shared reservations are appropriate for those applications in which multiple data sources
are unlikely to transmit simultaneously. Digitized audio applications, such as VoIP, are
suitable for shared reservations. In this case, as a small number of people talk at any time,
a limited number of senders send at any given time. Such a flow, therefore, does not require
a separate reservation per sender; it requires a single reservation that you can apply to any
sender within a set, as needed.
RSVP refers to such a flow as a shared flow and installs it using a shared explicit or wildcard
reservation scope. These two reservation styles are discussed below.
The Shared Explicit (SE) reservation style specifically identifies the flows that reserve
network resources.
Symbolically, you can represent an SE-style reservation request by SE((S1,S2){Q}),
where the S1, S2, . . . represent specific senders for the reservation and the Q
represents the FlowSpec.
The Wildcard Filter (WF) reserves bandwidth and delay characteristics for any sender.
It does not admit the senders specification; it accepts all senders, which is denoted by
setting the source address and port to zero.
Symbolically, you can represent a WF-style reservation request by WF(* {Q}), where
the asterisk represents the wildcard sender selection and the Q represents the
FlowSpec.
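The difference among the styles reduces to how a reservation's sender scope is matched. A minimal Python sketch (illustrative names, not from any RSVP implementation):

def reservation_covers(style, explicit_senders, sender):
    """Does one installed reservation apply to traffic from 'sender'?

    style: "FF", "SE", or "WF"; explicit_senders is ignored for WF.
    FF installs a distinct reservation per named sender, SE shares one
    reservation among its named senders, and WF accepts any sender.
    """
    if style == "WF":
        return True
    return sender in explicit_senders

print(reservation_covers("WF", set(), "S3"))         # True: wildcard scope
print(reservation_covers("SE", {"S1", "S2"}, "S3"))  # False: S3 not named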
Table 7-1 shows the different reservation filters based on the reservation styles and senders
scope, and Figure 7-3 illustrates the three reservation filter styles described previously.
Table 7-1 Different Reservation Filters, Based on Style and Sender Scope

Sender Selection Scope    Distinct Reservation    Shared Reservation
Explicit                  FF                      SE
Wildcard                  None defined            WF
[Figure 7-3: The three reservation filter styles between senders S1/S2 and receivers R1/R2 across Routers 1 and 2. Fixed filter: distinct per-sender reservations, such as (S1, 2B) for R1 and (S1, 1B) plus (S2, 1B) for R2. Shared explicit filter: one shared reservation (S1 S2, 3B) covering the named senders. Wildcard filter: one shared reservation (*, 2B) covering any sender.]
Service Types
RSVP provides two types of integrated services that the receivers can request through their
RSVP RESV messages: controlled load service and guaranteed bit rate.
Controlled Load
Under controlled load service4, the network guarantees that the reserved flow will reach its
destination with a minimum of interference from the best-effort traffic. In addition, Cisco's
implementation of the service offers isolation between the reserved flows. Flow isolation
allows a flow reservation to operate unaffected by the presence of any other flow
reservations that might exist in the network.
The controlled load service is primarily intended for a broad class of applications running
on the Internet today that are sensitive to overloaded conditions. These applications work
well on unloaded nets, but they degrade quickly under overloaded conditions. An example
of such an application is File Transfer Protocol (FTP).
Both controlled load and guaranteed bit rate use a token bucket to describe a data flow's
traffic parameters. The token bucket is a rate control mechanism that identifies a mean rate
(how much can be sent or forwarded per unit time on average), a burst size (how much can
be sent within a given unit of time without scheduling concerns), and a measurement
interval (the time quantum). Details regarding the token bucket rate control mechanism are
discussed in Chapter 3.
Under both services, a receiver requests a certain bit rate and burst size in an RESV
message. The WFQ scheduler and WRED queue management techniques with preferential
weights assure that traffic to the receiver has a bounded latency. The latency bound is not
specified, however. The controlled load service only promises good service, and the
guaranteed service provides information (in the PATH messages) from which the delay
bounds can be calculated.
RSVP Scalability
One drawback of RSVP is that the amount of state information required increases with the
number of per-flow reservations. As many hundreds of thousands of real-time unicast and
multicast flows can exist in the Internet backbone at any time, state information at per-flow granularity is considered a nonscalable solution for Internet backbones.
RSVP with per-flow reservations scales well for medium-size corporate intranets with link
speeds of DS3 or less. For large intranets and for Internet service provider (ISP) backbones,
you can make RSVP scale well when you use it with large multicast groups, large static
classes, or an aggregation of flows at the edges rather than per-flow reservations. RSVP
reservation aggregation6 proposes to aggregate several end-to-end reservations sharing
common ingress and egress routers into one large, end-to-end reservation. Another
approach is to use RSVP at the edges and diffserv in the network backbone to address
RSVP scalability issues in the core of a large network. RSVP-to-diffserv mapping is
discussed in Chapter 2, Differentiated Services Architecture.
The service provider networks and the Internet of the future are assumed to have, for the most part, sufficient capacity to carry normal telephony traffic. If a network is engineered with sufficient capacity, you can provision all telephony traffic as a single class: telephony traffic typically requires relatively modest capacity and can be given some fraction of the overall capacity, without the need for resource allocation per individual call.
Case Study 7-1: Reserving End-to-End Bandwidth for an Application Using RSVP

Figure 7-4 RSVP Signaling and Reservations for Traffic Flows Between Two End-Hosts
[Figure 7-4: Host A (210.210.210.30) attaches to R1's Ethernet0 (.1); R1 connects to R2 over serial links (.101/.102), R2 to R3 (.105/.106), and Host B (210.210.210.60) attaches to R3's Ethernet0 (.33). The IP addresses in the diagram indicate only the last octet of the 210.210.210.x address. The message ladder shows PATH and RESV exchanges for UDP port 1040 traffic from Host A to Host B and for ICMP traffic in both directions.]
In case the applications are not RSVP-compliant, end routers should be set up to behave as
though they are receiving periodic signaling information from the sender and the receiver.
Another RSVP reservation should be configured for Internet Control Message Protocol
(ICMP) packets to allow pinging between the sender and the receiver to troubleshoot end-
to-end network connectivity issues.
NOTE The use of RSVP for ICMP messages in this case study is meant primarily to illustrate
RSVP operation, though the practical need might be limited.
To enable end-to-end RSVP signaling over the network, the inbound and outbound
interfaces of the routers along the path of the sender to the receiver need to be configured
for WFQ and RSVP. Note that WFQ is on by default on interfaces with bandwidths of
2.048 Mbps or less. The show interface command shows the queuing strategy being used by an
interface.
In this case study, RSVP reservations are made for the following protocols:
UDP Port 1040 application traffic from Host A to Host B.
ICMP packets from Host A to Host B and vice versa to verify and diagnose
connectivity issues at times of network congestion. This enables ping (ICMP Echo
Request) packets from one end host to the other, and ICMP Echo Reply packets in
response to the requests to work at all times.
Listing 7-1 shows a sample configuration for Router R1 for the RSVP setup.
Listing 7-1 RSVP-Related Configuration on Router R1
interface Ethernet0
ip address 210.210.210.1 255.255.255.224
fair-queue 64 256 234
ip rsvp bandwidth 7500 7500
!
interface Serial0
ip address 210.210.210.101 255.255.255.252
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
This configuration enables WFQ and RSVP on the router interfaces. The fair-queue command enables you to set up the queue's drop threshold and the maximum number of normal and reservable conversation queues. The ip rsvp bandwidth command sets up the
maximum amount of reservable bandwidth on the interface and the maximum allowable
reservable bandwidth for any particular reservation.
The ip rsvp sender command is used to set up the router as if it were receiving periodic
RSVP PATH messages from a downstream sender, and enable it to send RSVP PATH
messages upstream. Note that this command is needed only when the sender cannot send
RSVP PATH messages.
Two ip rsvp sender commands are used in the configuration. They are used to simulate
receipt of periodic RSVP PATH messages from sender 210.210.210.30 for ICMP and UDP
Port 1040 traffic, respectively. Router R1 forwards the received RSVP PATH messages
further downstream toward the destination 210.210.210.60.
The ip rsvp reservation command is used to set up the router as if it were receiving
periodic RSVP RESV messages. The command in the configuration shows receipt of
periodic RSVP RESV messages from 210.210.210.30 for the ICMP traffic. This is to enable
RSVP reservation for ping (ICMP) packets from Host B to Host A. The RSVP RESV
messages are forwarded toward source 210.210.210.60.
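Based on the descriptions above and the show ip rsvp sender and show ip rsvp reservation outputs in Listings 7-5 and 7-6, these global statements on R1 might look like the following sketch (rate and burst are in kbps; IP protocol 1 is ICMP; the exact syntax varies by IOS release):

ip rsvp sender 210.210.210.60 210.210.210.30 UDP 1040 0 210.210.210.30 Ethernet0 32 32
ip rsvp sender 210.210.210.60 210.210.210.30 1 0 0 210.210.210.30 Ethernet0 1 1
ip rsvp reservation 210.210.210.30 210.210.210.60 1 0 0 210.210.210.30 Ethernet0 FF LOAD 1 1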
Listings 7-2 and 7-3 show the RSVP-related configuration for Routers R2 and R3,
respectively.
Listing 7-2 RSVP-Related Configuration on Router R2
interface Serial0
ip address 210.210.210.102 255.255.255.252
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
!
interface Serial1
ip address 210.210.210.105 255.255.255.252
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
Note that this router doesn't need any RSVP sender and reservation statements, because they are needed only on edge routers. The edge routers dynamically propagate the RSVP messages, allowing Router R2 to make reservations for the RSVP flows.
Listing 7-3 RSVP-Related Configuration on Router R3
interface Ethernet0
ip address 210.210.210.33 255.255.255.224
fair-queue 64 256 234
ip rsvp bandwidth 7500 7500
!
interface Serial1
ip address 210.210.210.106 255.255.255.252
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
Listings 7-4 through 7-10 provide output of the different RSVP-related information from
the router.
Listing 7-4 Interface-Related RSVP Information of Router R1
Router#show ip rsvp interface
interface  allocated  i/f max  flow max  per/255  UDP  IP  UDP_IP  UDP M/C
Et0        1K         7500K    7500K     0 /255   0    1   0       0
Se0        33K        1158K    1158K     7 /255   0    1   0       0
The show ip rsvp interface command shows the total allocated bandwidth on an interface. By default, the maximum reservable bandwidth on an interface is 0.75 times the total interface bandwidth; the 1158 kbps on the serial interface, for example, is 0.75 times the T1 rate of 1544 kbps.
Listing 7-5 RSVP Sender Information on Router R1
Router#show ip rsvp sender
To From Pro DPort Sport
Prev Hop I/F BPS Bytes
210.210.210.30 210.210.210.60 1 0 0
210.210.210.102 Se0 1K 1K
210.210.210.60 210.210.210.30 1 0 0
210.210.210.30 Et0 1K 1K
210.210.210.60 210.210.210.30 UDP 1040 0
210.210.210.30 Et0 32K 32K
The show ip rsvp sender command shows that Router R1 saw three different RSVP PATH
messages.
Listing 7-6 Received RSVP Reservation Requests by Router R1
Router#show ip rsvp reservation
To From Pro DPort Sport
Next Hop I/F Fi Serv BPS Bytes
210.210.210.30 210.210.210.60 1 0 0
210.210.210.30 Et0 FF LOAD 1K 1K
210.210.210.60 210.210.210.30 1 0 0
210.210.210.102 Se0 FF LOAD 1K 1K
210.210.210.60 210.210.210.30 UDP 1040 0
210.210.210.102 Se0 FF LOAD 32K 32K
The show ip rsvp reservation command shows that Router R1 received three RSVP RESV
messages. As an example, for the UDP flow, Router R1 received the RSVP RESV message
from Router R2 with a next-hop of 210.210.210.102.
The show ip rsvp installed command shows three active RSVP reservations in Router R1.
It also shows the conversation number and weight assigned by WFQ for each reservation.
Note that the weight for the largest RSVP reservation is always 4. The weight for the
reservation of ICMP packets on the serial0 interface is derived as:
weight = 4 x (largest RSVP reservation allocated) / (RSVP reservation request) = 4 x 32 / 1 = 128
Listing 7-8 RSVP Reservations Sent Upstream by Router R1
Router#show ip rsvp request
To From Pro DPort Sport
Next Hop I/F Fi Serv BPS Bytes
210.210.210.30 210.210.210.60 1 0 0
210.210.210.102 Se0 FF LOAD 1K 1K
210.210.210.60 210.210.210.30 1 0 0
210.210.210.30 Et0 FF LOAD 1K 1K
210.210.210.60 210.210.210.30 UDP 1040 0
210.210.210.30 Et0 FF LOAD 32K 32K
The show ip rsvp request command shows that Router R1 passed on three RSVP RESV
messages. As an example, for the UDP flow, Router R1 sent the RSVP RESV message to
the downstream host with a next-hop of 210.210.210.30.
Listing 7-9 RSVP Neighbors of Router R1
Router#show ip rsvp neighbor
Interface Neighbor Encapsulation
Et0 210.210.210.30 RSVP
Se0 210.210.210.102 RSVP
The show ip rsvp neighbor command shows the neighbor routers or hosts from which
Router R1 received RSVP messages.
The show queue s0 command shows the packets in the output queue of the serial0 interface at the time the command is issued. Note that the conversation numbers of the RSVP flows in the queue match those shown in the show ip rsvp installed command output.
NOTE The retransmit time for RSVP messages can be too long for VoIP. Because lost RSVP PATH
messages can result in poor performance for VoIP by delaying the establishment of an
RSVP reservation along the VoIP call's path, it is important to keep the hello interval for RSVP
PATH messages small, or to have a fast retransmit scheme.
Summary
In intserv, RSVP is used to signal QoS information using control messages that are different
from the actual data packets. RSVP signaling results in certain resource guarantees along
the traffic's routed path.
Like diffserv, intserv using RSVP depends on the packet scheduler with QoS support (such
as WFQ and WRED) in the router to offer the desired QoS for the RSVP reserved flows.
Q A packet belonging to an RSVP flow shows up in WFQ with a weight of 4096. Why?

A It is likely that the RSVP flow is sending more than the reserved bandwidth. If the flow is sending traffic at a rate that exceeds the bandwidth reservation, the excess traffic is treated as best effort and is assigned a weight of 4096 in WFQ. Therefore, the packet with a weight of 4096 might be a nonconforming packet. Chapter 4 describes the WFQ algorithm in detail.
Q The router cannot install the RSVP reservation for the flow I just configured on the
router. The debug ip rsvp command shows the following message: RSVP RESV: no
path information for 207.2.23.1. What does this debug message mean?
A Generally, you should see this message on a router that has a static RSVP
reservation configuration for a flow so that the router behaves as if it is receiving
periodic RESV messages for this flow. The debug output indicates that the router didn't receive any PATH message for the corresponding flow. RSVP cannot send an RESV message without first receiving a corresponding PATH message for the flow. Troubleshoot along the path leading to the sender to find out why the router didn't receive any RSVP PATH message.
Q Are RSVP messages Cisco Express Forwarding (CEF)-switched?
A No. RSVP messages are control packets that need to be processed by the router.
Hence, they are process-switched. Note, however, that the data packets belonging
to an RSVP flow follow whatever switching path is configured in the router. Packet
switching is discussed in Appendix B, Packet Switching Mechanisms.
References
1 Resource ReSerVation Protocol (RSVP), Version 1 Functional Specification, R. Braden and others, RFC 2205, 1997.
2 Resource Reservation Protocol (RSVP), Version 1 Applicability Statement, Some Guidelines on Deployment, A. Mankin and others, RFC 2208, 1997.
3 RSVP home page: www.isi.edu/rsvp.
4 Specification of the Controlled-Load Network Element Service, J. Wroclawski, RFC 2211, 1997.
5 Specification of Guaranteed Quality of Service, S. Shenker, C. Partridge, and R. Guerin, RFC 2212, 1997.
6 Aggregation of RSVP for IPv4 and IPv6 Reservations, F. Baker and others, IETF draft, 1999.
PART
II
Layer 2, MPLS QoS: Interworking with IP QoS
Chapter 8 Layer 2 QoS: Interworking with IP QoS
CHAPTER
8
Layer 2 QoS: Interworking with
IP QoS
Most networks use diverse network technologies. Popular link-layer technologies, such as
Asynchronous Transfer Mode (ATM), Frame Relay, and Ethernet, offer quality of service
(QoS) functionality at Layer 2. This chapter discusses these multi-access Layer 2 technologies and their QoS offerings. Because QoS is only as good as its weakest link, it is
necessary to ensure that Internet Protocol (IP) QoS is seamless across diverse link
technologies. To provide end-to-end IP QoS, it is necessary that QoS at Layer 2 map
to IP QoS, and vice versa.
ATM
ATM1 is a fixed-size cell-switching and multiplexing technology. It is connection-oriented,
and a virtual circuit (VC) must be set up across the ATM network before any user data can
be transferred between two or more ATM attached devices. Primarily, ATM has two types
of connections, or VCs: permanent virtual circuits (PVCs) and switched virtual circuits
(SVCs). PVCs are generally static and need a manual or external configuration to set them
up. SVCs are dynamic and are created based on demand. Their setup requires a signaling
protocol between the ATM endpoints and ATM switches.
An ATM network is composed of ATM switches and ATM end nodes, or hosts. The cell
header contains the information the ATM switches use to switch ATM cells. The data link
layer is broken down into two sublayers: the ATM Adaptation Layer (AAL) and the ATM
layer. You map the different services to the common ATM layer through the AAL. Higher
layers pass down the user information in the form of bits to the AAL. User information gets
encapsulated into an AAL frame, and then the ATM layer breaks the information down into
ATM cells. The reverse is done at the receiver end. This process is known as segmentation
and reassembly (SAR).
ATM defines two cell header formats: the User-Network Interface (UNI) format, used between an end node and an ATM switch, and the Network-Node Interface (NNI), which defines the format for cells between the switching nodes.
The cell formats are shown in Figure 8-1.
[Figure 8-1: ATM cell formats. Each 53-byte cell has a 5-byte header and a 48-byte payload. The UNI header carries GFC, VPI, VCI, PT, CLP, and HEC fields; the NNI header omits GFC in favor of a larger VPI field.]
Generic flow control (GFC) has local flow-control significance to the user for flow control.
The GFC mechanism is used to control traffic flow from end stations to the network. The
GFC field is not present in the NNI cell format. Two modes of operation are defined:
uncontrolled access and controlled access. Traffic enters the network without GFC-based
flow control in uncontrolled access mode. In controlled access mode, ATM-attached end
nodes shape their transmission in accordance with the value present in GFC. Most UNI
implementations don't use this field.
A virtual path (VP) consists of a bundle of VCs and is identified by a virtual path identifier (VPI). ATM switches can switch an entire VP, along with all the VCs within it. VCs are the
paths over which user data is sent. Each VC within a VP is assigned a virtual channel
identifier (VCI). VPI and VCI fields are used in ATM switches to make switching decisions.
Figure 8-2 shows a VC being set up between routers R1 and R2 through a network of ATM
switches. All the cells leaving R1 for R2 are tagged with a VPI of 0 and a VCI of 64. The
ATM switch S1 looks at the VPI and VCI pair on Port 0 and looks them up in its translation
table. Based on the lookup, the ATM switch switches the cells out of Port 1 with a VPI of
1 and a VCI of 100. Similarly, Switch S4 switches cells on Port 4 with a VPI of 1 and a VCI
of 100 onto Port 3 with a VPI of 2 and a VCI of 100 based on its translation table.
[Figure 8-2: A VC between R1 and R2 across switches S1 through S4. Cells leave R1 with VPI/VCI 0/64 on S1's Port 0; S1 switches them out Port 1 as 1/100, and S4 switches them from Port 4 (1/100) to Port 3 (2/100) toward R2.]
- Cell loss priority (CLP): CLP defines a cell's priority. If CLP is not set (CLP = 0), the cell is considered higher priority than cells with CLP set (CLP = 1). With CLP set, the cell has a higher chance of being discarded at times of network congestion.
- Header error control (HEC): HEC is used for detecting and correcting errors in the ATM header.
ATM QoS
ATM offers QoS guarantees by making the ATM end system explicitly specify a traffic
contract describing its intended traffic flow characteristics. The flow descriptor carries QoS
parameters, such as Peak Cell Rate (PCR), Sustained Cell Rate (SCR), and burst size.
ATM end systems are responsible for making sure the transmitted traffic meets the QoS
contract. The ATM end system shapes traffic by buffering data and transmitting it within
the contracted QoS parameters. The ATM switches police each user's traffic characteristics and compare them to the user's QoS contract. If certain traffic is found to exceed the QoS
contract, a switch can set the CLP bit on the nonconforming traffic. A cell with the CLP bit
set has a higher drop probability at times of congestion.
Implementing Partial Packet Discard (PPD) is straightforward with AAL5. PPD can be signaled on a per-VC basis.
With PPD, after the switch drops cells from the VC, the switch continues dropping cells
from the same VC until the switch sees the parameter set in the ATM cell header indicating
the end of the AAL packet. The end-of-packet cell itself is not dropped. Because AAL5
does not support the simultaneous multiplexing of packets on a single VC, you can use this
cell to delimit packet boundaries.
Because a cell discard of a larger packet (and, hence, PPD) can occur after some of the
earlier cells of the larger packet find space in the output queue, PPD does only a partial
packet discard. Thus, the congested link can still transmit a significant fraction of cells
belonging to corrupted packets. Note, however, that if the discarded cell happens to be the packet's first cell, PPD effectively performs a full packet discard. On the other hand, EPD3 occurs before any cell is admitted into the output queue. When a new packet arrives, EPD checks the output buffer usage. If the buffer usage is below the configured threshold value, the ATM switch knows the buffer space is not about to be exhausted, and all the packet's cells can be queued. Otherwise, the switch assumes the buffer is close to exhaustion and the
Therefore, EPD either queues all the cells of a packet or drops the entire packet altogether.
Because EPD does a full packet discard before any cell belonging to the packet is queued,
this process is called early packet discard.
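A minimal Python sketch of the two discard policies (an illustrative model with invented names, not a switch implementation):

def admit_packet_cells(packet_cells, queue, queue_size, epd_threshold):
    """Admit the cells of one AAL5 packet to an output cell queue (a list).

    EPD: if the queue is already past its threshold when the packet
    arrives, discard the whole packet before queuing any cell.
    PPD: once one cell of the packet does not fit, drop the rest of
    the packet's cells too, but keep the end-of-packet cell.
    """
    if len(queue) >= epd_threshold:
        return 0                          # EPD: early full-packet discard
    queued, dropping = 0, False
    for cell in range(packet_cells):
        last_cell = (cell == packet_cells - 1)
        if not dropping and len(queue) < queue_size:
            queue.append(cell)
            queued += 1
        elif last_cell:
            queue.append(cell)            # PPD keeps the end-of-packet cell
            queued += 1
        else:
            dropping = True               # PPD: drop the packet's remaining cells
    return queued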
VP Shaping
Similar to the ATM services for an ATM VC, an ATM service, or shaping, can be applied on a VP for traffic contract restrictions on an entire VP. Within a shaped VP, all VCs can still be UBR, without strict traffic constraints, carrying best-effort traffic. Figure 8-3 shows
a logical diagram of how VCs are bundled within a VP.
[Figure 8-3: VCs bundled within VPs along a transmission path.]
NOTE A VP ATM service can support all the ATM service classes that can be supported at the VC
level.
The ABR service is configured with a PCR and MCR of 10 Mbps and 1 Mbps, respectively.
The atm abr rate-factor 8 8 command configures the cell transmission rate increases and
decreases in response to RM control information from the network.
Note that the PCR and MCR default values are the line rate and 0, respectively. The default
rate factor is 16.
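Pulling these commands together, a hedged configuration sketch for such an ABR PVC (the interface and VPI/VCI values are illustrative) might look like this:

interface ATM2/0.1 point-to-point
pvc 0/50
abr 10000 1000
atm abr rate-factor 8 8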
[Figure: DSL subscribers connect through a DSLAM and an ATM/frame switch to a Tier-1 ISP router over a Frame Relay or ATM DS3 (or OC-3) trunk. ISP 1 is carried in a 6-Mbps shaped VP terminating on interface ATM0/0/0, and ISP 2 in a 3-Mbps VP on ATM0/0/1.]
Each ISP served by the CLEC is assigned an interface on the Tier-1 ISP router, and all of its VCs fall under the same VP. In the sample configuration on the Tier-1 ISP router shown in Listing 8-2, the CLEC's ISP 1 is assigned interface ATM0/0/0 and VP 2. The VP traffic shaping is configured under the main ATM interface. All VCs use the same VP, and each is assigned its own subinterface under the main interface.
Listing 8-2 VP Traffic Shaping Configuration on the ATM0/0/0 interface of the Tier 1 ISP Router
interface ATM0/0/0
atm pvp 2 6000000
!
interface ATM0/0/0.1 point-to-point
ip address 212.12.12.1 255.255.255.252
atm pvc 101 2 101 aal5snap
!
interface ATM0/0/0.2 point-to-point
ip address 212.12.12.5 255.255.255.252
atm pvc 102 2 102 aal5snap
As shown in Listing 8-2, the atm pvp configuration command sets up VP traffic shaping
for all the VCs of the specified VP to the stipulated peak rate.
To examine a VP group's configuration, you can use the show atm vp command, which shows the VP traffic shaping parameters of all VP groups in the router and the number of VCs in each VP group. You can create up to 256 different VP groups. Listing 8-3 shows the ATM VP traffic shaping parameters.
Listing 8-3 ATM VP Traffic Shaping Parameters
Router#show atm vp
                 Data  CES   Peak     CES
Interface   VPI  VCs   VCs   Kbps     Kbps  Status
ATM0/0/0    2    3     0     6000000  0     ACTIVE
You can obtain detailed information on VP 2 by using the show atm vp 2 command. This
command shows information about all the VCs with a VPI of 2 in the router. The router automatically creates two management virtual channels for each VP group configured: one for the segment OAM F4 flow cells and one for the end-to-end OAM F4 flow cells.
Listing 8-4 shows detailed information on the VCs with VPI 2.
Listing 8-4 Information on VP Traffic Shaping Parameters for VPI 2
Router#show atm vp 2
ATM0/0/0 VPI: 2, PeakRate: 6000000, CesRate: 0, DataVCs: 3, CesVCs: 0,
Status: ACTIVE
The show atm vc command shows information on all the VCs in the router without regard
to the VPI information. Listing 8-5 displays information on all ATM VCs enabled on the
router.
Listing 8-5 ATM VC Parameters
           VCD /                                  Peak     Avg/Min  Burst
Interface  Name  VPI  VCI  Type  Encaps  SC       Kbps     Kbps     Cells  Sts
0/0/0      1     2    3    PVC   F4-OAM  UBR      6000000                  UP
0/0/0      2     2    4    PVC   F4-OAM  UBR      6000000                  UP
0/0/0.1    101   2    101  PVC   SNAP    UBR      6000000                  UP
0/0/0.2    102   2    102  PVC   SNAP    UBR      6000000                  UP
0/0/0.3    103   2    103  PVC   SNAP    UBR      6000000                  UP
[Figure: IP Subnets 1 through 4 interconnected across an ATM cloud.]
Note that from the IP QoS perspective, the IP traffic should be transported across the ATM network without any loss, because any drops within the ATM network are IP QoS-unaware; this is done by using an ATM service that suits the traffic's IP service needs. For IP QoS, each VC in an ATM network maintains a separate queue. You can apply the WRED and WFQ IP QoS functions on each VC queue.
Two scenarios are discussed in this section as a way to preserve IP QoS over an ATM
network:
1 A single PVC carrying all the IP traffic to its destination: IP traffic exceeding the ATM PVC parameters and service at the ingress to the ATM network gets queued, and IP QoS techniques such as WRED and WFQ are applied to the queue as it builds up under congestion conditions.
WRED ensures that high-precedence traffic has low loss relative to lower-precedence
traffic. WFQ ensures that high-precedence traffic gets a higher bandwidth relative to
the lower-precedence traffic, because it schedules high-precedence traffic more often.
Note that when Class-Based Weighted Fair Queuing (CBWFQ) is run on a PVC, you
can make bandwidth allocations based on traffic class. CBWFQ is discussed in
Chapter 4.
2 A VC bundle (made of multiple PVCs) carrying IP traffic to its destination:
When carrying traffic with different QoS (real-time, non-real-time, best-effort) to the
same destination, it is a good idea to provision multiple PVCs across the ATM
network to the destination so that each IP traffic class is carried by a separate PVC.
Each PVC is provisioned to an ATM service class based on the IP traffic it is mapped
to carry. Figure 8-6 depicts IP-ATM QoS using a VC bundle. Some VC bundle
characteristics are as follows:
Each VC in the bundle is mapped to carry traffic with certain IP precedence
value(s). You can map a VC to one or more IP precedence values. Note,
however, that only one routing peering or adjacency is made per PVC
bundle.
You can monitor VC integrity by using ATM OAM or Interim Local
Management Interface (ILMI). If a bundles high-precedence VC fails, you
can either bump its traffic to a lower-precedence VC in the bundle, or the
entire bundle can be declared down.
A separate queue exists for each VC in the bundle. You can apply the WRED
and WFQ IP QoS techniques on each VC queue.
[Figure 8-6: IP-ATM QoS using a VC bundle; VC1 through VC4 carry different IP precedence classes across the ATM cloud.]
ATM service for the IP traffic is expected to be above UBR (best-effort) class so that no
packets (cells) are dropped as part of ATM QoS within the ATM network. Figure 8-7
illustrates the two IP-ATM QoS scenarios.
[Figure 8-7: The two IP-ATM QoS scenarios across the ATM cloud: a single PVC with IP QoS on its per-VC queue, and a PVC bundle, with congestion feedback from the network.]
The difference between using a single PVC or a PVC bundle for IP QoS interworking
depends on the cost and traffic needs in the network. Although a PVC bundle can be more
expensive than a single PVC, a PVC bundle provides traffic isolation for critical traffic
classes such as voice. On the other hand, a PVC bundle requires prior traffic engineering
so that all the PVCs in the bundle are utilized optimally. Otherwise, you can run into
conditions in which the PVC in the bundle carrying the high-precedence traffic gets
congested while the PVC carrying the lower-precedence traffic is running relatively
uncongested. Note that you cannot automatically bump high-precedence traffic to a
different member PVC when the PVC carrying high precedence gets congested. When
using a single PVC, you can enable CBWFQ with a priority queue on it so that voice traffic
is prioritized over the rest of the traffic carried by the PVC. CBWFQ with a priority queue
is discussed in Chapter 4.
At the ingress to the ATM network, the router applies CBWFQ to allocate bandwidth for each traffic class, and WRED so that any packet drops are based on the knowledge of the IP precedence values in the packet.
Table 8-1 Service Classes Based on IP Precedence

                 Control       Gold          Silver        Bronze
Normal Traffic   Precedence 7  Precedence 5  Precedence 3  Precedence 1
Excess Traffic   Precedence 6  Precedence 4  Precedence 2  Precedence 0
The ATM PVC is engineered to be lossless because drops in the ATM network are without
regard to the IP precedence value in the packet. The ATM network just functions as a
lossless physical transport mechanism offering a service required by the underlying traffic.
To begin with, the different traffic classes and the policies are defined for each class. The
defined policy is then applied to the ingress interface to the ATM network. Listing 8-6 is a
sample configuration for this functionality.
Listing 8-6 Configuration for IP-ATM QoS Interworking
interface ATM0/0/0.1 point-to-point
ip address 200.200.200.1 255.255.255.252
pvc 0/101
abr 10000 1000
encapsulation aal5nlpid
service-policy output atmpolicy
class-map match-any control
match precedence 7
match precedence 6
class-map match-any gold
match precedence 5
match precedence 4
class-map match-any silver
match precedence 3
match precedence 2
class-map match-any bronze
match precedence 1
match precedence 0
policy-map atmpolicy
class control
bandwidth 10000
random-detect
class gold
bandwidth 40000
random-detect
class silver
bandwidth 30000
random-detect
class bronze
bandwidth 20000
random-detect
Note that the configuration uses modular QoS command-line interface (CLI). Modular QoS
CLI is discussed in Appendix A, Cisco Modular QoS Command-Line Interface.
The configuration shown in Listing 8-6 defines three class maps (gold, silver, and bronze) to classify data traffic based on precedence. A fourth class map, control, is defined to match all network control traffic, which can carry an IP precedence of 6 or 7.
After the initial traffic classification step, the atmpolicy policy map is defined to stipulate a
bandwidth and WRED policy for the four traffic classes. The gold class, for example, is
allocated a bandwidth of 40 Mbps and a WRED drop policy. Finally, the atmpolicy policy
is applied to the ATM subinterface carrying the ATM PVC to the destination across the
ATM cloud by using the service-policy configuration command. The show policy-map
command shows the policies for each class of a policy map. The atmpolicy policy map
information is shown in Listing 8-7.
Listing 8-7 The atmpolicy Policy Map Information
Router#show policy-map atmpolicy
Policy Map atmpolicy
Weighted Fair Queueing
Class control
Bandwidth 20000 (kbps)
exponential weight 9
class min-threshold max-threshold mark-probability
----------------------------------------------------------
0 - - 1/10
1 - - 1/10
2 - - 1/10
3 - - 1/10
4 - - 1/10
5 - - 1/10
6 128 512 1/10
7 256 512 1/10
rsvp - - 1/10
Class gold
Bandwidth 10000 (kbps)
exponential weight 9
class min-threshold max-threshold mark-probability
----------------------------------------------------------
0 - - 1/10
1 - - 1/10
2 - - 1/10
3 - - 1/10
4 128 512 1/10
5 256 512 1/10
6 - - 1/10
7 - - 1/10
rsvp - - 1/10
Class silver
Bandwidth 30000 (kbps)
exponential weight 9
class min-threshold max-threshold mark-probability
----------------------------------------------------------
0 - - 1/10
1 - - 1/10
2 128 512 1/10
3 256 512 1/10
4 - - 1/10
5 - - 1/10
6 - - 1/10
7 - - 1/10
rsvp - - 1/10
Class bronze
Bandwidth 20000 (kbps)
exponential weight 9
class min-threshold max-threshold mark-probability
----------------------------------------------------------
0 128 512 1/10
1 256 512 1/10
2 - - 1/10
3 - - 1/10
4 - - 1/10
5 - - 1/10
6 - - 1/10
7 - - 1/10
rsvp - - 1/10

In another scenario, a network uses ATM as the transport technology, and the network's backbone engineer wants to map traffic with the different precedence levels onto different VCs across the ATM cloud.
A PVC bundle (with multiple PVCs) is used to carry traffic to London, the PVCs remote
destination. Listing 8-8 shows the sample configuration of the PVC bundle to deliver IP-
based differentiated services across an ATM network.
Listing 8-8 Configuration for an ATM PVC Bundle
interface atm0/0.1 multipoint
ip address 222.21.123.1 255.255.255.252
bundle london
oam-bundle manage 5
encapsulation aal5snap
protocol ip 192.1.103.20 broadcast
pvc-bundle control 1/107
vbr-nrt 4000 2000 200
precedence 6-7
protect vc
random-detect
pvc-bundle gold 1/105
vbr-nrt 2000 1000 100
precedence 4-5
protect group
random-detect
pvc-bundle silver 1/103
vbr-nrt 1000 500 50
precedence 2-3
random-detect
protect group
pvc-bundle bronze 1/102
ubr
precedence 0-1
random-detect
protect group
Initially, a PVC bundle named london is defined. The bundle definition is followed by the definition of its member PVCs: control, gold, silver, and bronze. Each member PVC belongs to an ATM service class based on the IP traffic it is mapped to carry. A silver PVC bundle member, for example, carries traffic requiring better service than a bronze PVC, which carries best-effort traffic.
The PVC bundle member control is defined as a protect VC because the control PVC is crucial for the entire PVC bundle to operate properly. Hence, if the control PVC goes down, the entire PVC bundle is brought down.
The bundle members gold, silver, and bronze form a protect group. All the member VCs in the protect group need to go down before the bundle is brought down. If a protect group's member VC goes down, its traffic is bumped to a lower-precedence VC.
policy-map lossok
class best-effort
set atm-clp
Frame Relay
Frame Relay is a popular wide-area network (WAN) packet technology well suited for data
traffic. It is a simple protocol that avoids link-layer flow control and error correction
functions within the Frame Relay network. These functions are left for the applications in
the end stations. The protocol is best suited for data traffic, because it can carry occasional
bursts.
Frame Relay operates using VCs. A VC offers a logical connection between two end points
in a Frame Relay network. A network can use a Frame Relay VC as a replacement for a
private leased line. You can use PVCs and SVCs in Frame Relay networks. A PVC is set up
by a network operator through a network management station, whereas an SVC is set up
dynamically on a call-by-call basis. PVCs are most commonly used, and SVC support is
relatively new.
A user frame is placed in a Frame Relay header to be sent on a Frame Relay network. The
Frame Relay header is shown in Figure 8-8.
The 10-bit data-link connection identifier (DLCI) is the Frame Relay VC number corres-
ponding to a logical connection to the Frame Relay network. A DLCI has only local
significance. A Frame Relay switch maintains the VC mapping tables to switch a frame to
its destination DLCI.
The Address Extension (AE) bit indicates a 3- or 4-byte header. It is not supported under
the present Cisco Frame Relay implementation. The Command and Response (C/R) bit is
not used by the Frame Relay protocol and is transmitted unchanged end to end.
[Figure 8-8: The 2-byte Frame Relay header, with the 10-bit DLCI split across both bytes, plus the C/R, address extension, FECN, BECN, and DE bits.]
The Frame Check Sequence (FCS) is used by the switches in the Frame Relay network and by the destination station to verify the frame's integrity. A frame that fails the FCS test is dropped. A Frame Relay network doesn't attempt any error correction; it is up to the higher-layer protocol at the end stations to retransmit a frame after discovering that the frame might have been lost.
A switch can set the FECN bit to a value of 1 to indicate to a destination data terminal equipment (DTE) device, such as a router, that congestion was experienced in the direction of the frame transmission from source to destination.
The BECN bit is set to a value of 1 by a switch to indicate to the destination router that
congestion was experienced in the network in the direction opposite the frame transmission
from source to destination.
The primary benefit of the FECN and BECN congestion notification bits is the capability
of higher-layer protocols to react intelligently to these congestion indicators. The use of
FECN and BECN is shown in Figure 8-9.
[Figure 8-9: FECN and BECN use. For congestion in the source-to-destination direction, the network sets FECN on frames toward the destination access device and BECN on frames toward the source access device.]
The DE bit is set by the router or other DTE device to indicate that the marked frame
is of lesser importance relative to other frames being transmitted. It provides a basic
prioritization mechanism in Frame Relay networks. Frames with the DE bit set are
discarded first, before the frames without the DE bit flagged.
FRTS in adaptive mode also permits output on Frame Relay VCs to be throttled based on BECN congestion indicators received from the network. It shapes traffic going out on a PVC in accordance with the bandwidth available in the Frame Relay network: the VC can run at rate X and, when the router sees BECNs, throttle down to rate Y.
VC Traffic Shaping
A token bucket, similar to the one used for traffic shaping in Chapter 3, Network Boundary
Traffic Conditioners: Packet Classifier, Marker, and Traffic Rate Management, is used as
a traffic descriptor to measure conforming traffic. The contracted average rate is called the
Committed Information Rate (CIR). Burst size (BC) is the amount of data added to the
token bucket of size (BC + BE) at each measuring time interval, TC. TC is defined as
BC / CIR. Excess burst size (BE) is the amount of excess data allowed to be sent
during the first interval when the token bucket is full. When a new packet arrives, it is
queued into the output queue and scheduled for transmission based on the queue scheduler
used, such as WFQ or first-in, first-out (FIFO). A packet scheduled for transmission by the
scheduler is transmitted only if enough tokens are available in the bucket equivalent to the
scheduled packet. After a conforming packet is transmitted, an equivalent amount of tokens
are removed from the bucket.
If not all BC bytes are sent in a TC interval, you can transmit the unused bytes in the
subsequent interval along with the new credit for BC bytes. Hence, in a TC interval during
which there is less than BC traffic, the credit can increase to an upper bound of BC + BE for
the next subsequent interval.
If serious load sets in and the token bucket is full, you can send BC + BE bytes in the first interval and BC bytes in each subsequent interval until congestion conditions ease. As you can see, the rate throttles down to the CIR at times of congestion.
This relationship between the traffic shaping parameters is shown in Figure 8-10.
Frame Relay traffic shaping allows occasional bursts over the CIR on a PVC, although the
rate throttles to the CIR at times of congestion. A PVC can also be configured for a fixed
data rate equal to the CIR (BE = 0).
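As a worked example of these relationships (illustrative Python, with invented names and rates):

def frts_parameters(cir_bps, bc_bits, be_bits):
    """Derive the FRTS measuring interval and token bucket depth.

    Tc = Bc / CIR; the bucket holds at most Bc + Be bits, so a full
    bucket permits Bc + Be bits in the first interval and Bc bits in
    each subsequent interval, throttling the flow to the CIR.
    """
    tc_seconds = bc_bits / cir_bps
    bucket_bits = bc_bits + be_bits
    return tc_seconds, bucket_bits

# CIR 64 kbps with Bc = Be = 8000 bits gives Tc = 125 ms, bucket = 16000 bits
tc, depth = frts_parameters(64000, 8000, 8000)
print(f"Tc = {tc * 1000:.0f} ms, bucket depth = {depth} bits")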
Adaptive FRTS
At every time interval TC, a process checks if any BECN was received from the Frame
Relay network. If BECN was received in the last TC interval, the transmission rate is
dropped by 25 percent of the current rate. It might continue to drop until the lower limit is
reached, which is the minimum CIR (MINCIR). No matter how congested the Frame Relay
network is, the router does not drop its transmission rate below MINCIR. The default value
for MINCIR is half of the CIR.
(Figure 8-10: bits transmitted versus time; starting at T0, up to BC + BE bits can be sent in the first TC interval and BC bits in each subsequent interval, bounded above by the access rate and averaging out to the CIR.)
After the traffic rate has adapted to the congestion in the Frame Relay network, it takes 16
TC intervals without BECN to start increasing the traffic rate back to the configured CIR.
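A minimal sketch of the corresponding map-class commands follows (the rate values are assumptions for illustration):

map-class frame-relay adaptive-example
 frame-relay cir 64000
 frame-relay adaptive-shaping becn   ! cut the rate 25 percent per Tc on BECNs
 frame-relay mincir 32000            ! lower limit; the default is CIR/2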
FECN/BECN Integration
A UDP- or IP-based application can produce a purely unidirectional traffic flow, with no
data flowing in the opposite direction, because neither IP nor UDP uses any
acknowledgment scheme. If you enable traffic shaping on UDP
data, the only congestion notification set by the network is the FECN bit received in the
frames arriving at the destination router. The router sourcing all this traffic does not see any
BECNs because there's no return traffic. FECN/BECN integration introduces a
command to send a Q.922 test frame with the BECN bit set in response to a frame that has
the FECN bit set. Note that you need to clear the DE bit in a BECN frame you send in response
to a received FECN. The source router gets a test frame, which is discarded, but the BECN
bit is used to throttle the data flow. To have this interaction, traffic shaping commands need
to be present on the ingress and egress ports.
NOTE In case of unidirectional traffic, the traffic-shape adaptive command is needed on the
router sourcing the traffic, and the traffic-shape fecn-adapt command is used to send a
Q.922 test frame4 with the BECN bit set in response to an FECN.
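A minimal sketch using these generic traffic-shaping commands follows (interface names and rates are assumptions):

! Router sourcing the unidirectional traffic
interface Serial0
 encapsulation frame-relay
 traffic-shape rate 128000
 traffic-shape adaptive 64000   ! throttle toward 64 kbps on received BECNs
!
! Router receiving the traffic (and the FECNs)
interface Serial0
 encapsulation frame-relay
 traffic-shape fecn-adapt       ! reflect a BECN test frame per received FECN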
with QoS functionality such as WFQ and WRED, is used to ensure a consistent flow of
voice information.
Prior to FRF.12, in an IP environment the only way to fragment frames was to set the IP
maximum transmission unit (MTU) to a small value. This action introduces undesirable
inefficiencies in the system in terms of increased processor overhead and packet overhead,
however. In addition, FRF.12 operates below Layer 3 of the Open System Interconnection
(OSI) model, and hence it fragments not only IP, but other frames as well. With Voice over
IP (VoIP), fragmentation becomes an absolute necessity in low-bandwidth environments.
In addition to FRF.12 fragmentation, Cisco supports FRF.11 Annex C fragmentation and
a Cisco proprietary fragmentation. Frame Relay fragmentation is part of Cisco's traffic-
shaping functionality. Figure 8-11 illustrates Frame Relay fragmentation when carrying
voice and data traffic on a PVC.
(Figure 8-11: voice and data packets from a router share a PVC toward the switch, with the large data frames fragmented so that voice frames can be interleaved.)
Enabling FRTS is a prerequisite to turning on FRF.12 fragmentation. For voice, FRTS uses
a flow-based Weighted Fair Queuing (WFQ) with a priority queue (PQ) on the shaping
queue. Each Frame Relay PVC has its own PQ-WFQ structure that is used to schedule
packets based on their IP precedence values as per the flow-based WFQ algorithm
discussed in Chapter 4. FRTS uses dual FIFO queues at the interface level: the first queue
for priority packets such as voice, Local Management Interface (LMI), and related high-
priority packets, and the second queue for the rest of the traffic. Note that FRTS disallows
any queuing other than dual FIFO at the interface level.
In Figure 8-12, three different flows (two data flows and one voice flow) arrive for trans-
mission on an interface. The flows are routed to their respective PVC structures based on
the flow header information. In this case, flows 1 and 2 belong to PVC 1, and flow 3 belongs
to PVC 2. Each PVC has its own shaping queue. In this case, PQ-WFQ is enabled on PVC
1 such that all voice flow packets are scheduled first, and all voice packets go to the priority
interface FIFO queue. Data packets scheduled by the shaping queue are fragmented based
on the fragment threshold value and are put in the normal FIFO interface queue. It is
assumed that the fragment threshold is set such that none of the voice packets need to be
fragmented. Though a separate shaping queue exists for each PVC, all the PVCs on the
interface share the dual FIFO interface queues. Packets are transmitted from the priority
FIFO queue with a strict priority over the packets in the normal FIFO interface queue.
Case Study 8-6: Frame Relay Traffic Shaping with QoS Autosense
Say the network administrator of a nationwide network wants his Frame Relay-connected
routers to shape traffic according to the contracted QoS (CIR, BC, BE) information. Instead
of an explicit configuration, the administrator wants to use extended LMI so that the routers
get their per-VC QoS information directly from the connected switch in the Frame Relay
network.
For the QoS autosense to work, extended LMI must be configured on both the router and
its connected switch. The router dynamically gets the QoS information from the switch for
each VC.
Based on the QoS (CIR, BC, BE) information received by the router, the router shapes the
traffic on the VC.
Listing 8-10 shows an example configuration needed in a router to enable FRTS with QoS
autosense. Only the configuration on the router is shown.
Listing 8-10 An FRTS Configuration with QoS Autosense
interface Serial0/0
encapsulation frame-relay
frame-relay lmi-type ansi
frame-relay traffic-shaping
frame-relay qos-autosense
The per-VC QoS values learned through extended LMI can then be displayed on the router (output excerpt):
DLCI = 17
OUT: CIR 64000 BC 9600 BE 9600 FMIF 4497
IN: CIR 32000 BC 30000 BE 20000 FMIF 4497
Priority 0 (Time elapsed since last update 00:00:03)
The Byte Limit value is the size of the bucket in bytes (size of bucket = BC + BE = 9600 +
9600 = 19200 bits = 2400 bytes). Interval is the TC interval (TC = BC/CIR = 9600/64000
= 150 ms).
Listing 8-14 E-Business Site DTS Configuration on the Interface Connecting to the Service Provider
interface Serial 0/0/0
encapsulation frame-relay
class-map myclass
match any
policy-map mypolicy
class myclass
shape peak 256000 8000 8000
shape adaptive 64000
The router at the e-business site is set up to send at a CIR of 256000 and dynamically adapts
its rate when it receives BECNs. The minimum CIR with continued receipt of BECNs
(congestion) is 64 kbps.
Listings 8-15 and 8-16 show the FRTS and DTS configurations needed on the service
provider router.
Listing 8-15 FRTS Configuration on the Service Provider Router Interface
interface Serial 0/0
encapsulation frame-relay
frame-relay traffic-shaping
class-map myclass
match any
policy-map mypolicy
class myclass
shape peak 32000 8000 8000
shape fecn-adapt
Because the traffic from the e-business router to the remote router is UDP-based and
unidirectional, the e-business site router does not receive any BECNs from the service
provider router to adaptively shape its traffic to the adaptive rate in case of congestion.
Hence, the service provider router needs to be made to reflect a frame with a BECN bit set
in response to a frame with an FECN bit for the e-business router to shape adaptively.
The configuration sets the router to send with a CIR of 32 Kbps and respond with a BECN
to any received FECNs.
DTS uses modular QoS CLI, which is discussed in Appendix A.
The four PVCs configured to the destination have 203, 202, 201, and 200 as their DLCI
numbers. The frame-relay priority-dlci-group command configures these four DLCIs
to carry traffic of high, medium, normal, and low priority, respectively. Note that the
command doesn't enable priority queuing; it only assigns different DLCIs to carry different
traffic classes. The command uses priority-list 1 to categorize traffic into the four classes
that carry traffic with IP precedence 0 through 3. The high-priority traffic is carried on the high-
priority DLCI, the medium-priority traffic on the medium-priority DLCI, and so on.
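A minimal sketch of such a configuration follows (the access lists and list numbers are assumptions used for illustration):

access-list 101 permit ip any any precedence 3
access-list 102 permit ip any any precedence 2
access-list 103 permit ip any any precedence 1
priority-list 1 protocol ip high list 101
priority-list 1 protocol ip medium list 102
priority-list 1 protocol ip normal list 103
priority-list 1 default low
!
interface Serial5/0
 encapsulation frame-relay
 frame-relay priority-dlci-group 1 203 202 201 200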
Listings 8-18 and 8-19 display the output of two relevant show commands. The show
frame-relay pvc command in Listing 8-18 shows all the known PVC/DLCI and priority
DLCI group information on the router. This command also displays each DLCI's packet
statistics. The show queueing priority command in Listing 8-19 displays the configured
priority lists on the router.
Listing 8-18 show frame-relay pvc Command Output
Router#show frame-relay pvc
PVC Statistics for interface Serial5/0 (Frame Relay DTE)
Active Inactive Deleted Static
Local 4 0 0 0
Switched 0 0 0 0
Unused 0 0 0 0
DLCI = 200, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Se5/0
DLCI = 201, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Se5/0
DLCI = 202, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Se5/0
DLCI = 203, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Se5/0
Listing 8-21 Sample Configuration to Map Precedence 0 IP Packets with the DE Bit Set to 1 on the Frame Relay
Network
frame-relay de-list 1 protocol ip list 101
The de-group command defines the packet class and the DLCI on which the DE bit
mapping occurs. The de-list command defines the packet class that needs to be sent on the
Frame Relay network with the DE bit set. In the preceding example, all IP packets with
precedence 0 that need to go on the Frame Relay DLCI have the DE bit set. All other IP
traffic goes on the Frame Relay network without the DE bit flagged (DE = 0).
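For illustration, a fuller sketch might read as follows (access list 101 matching precedence 0 and the DLCI value are assumptions; only the de-list line appears in the original listing):

access-list 101 permit ip any any precedence 0
frame-relay de-list 1 protocol ip list 101
!
interface Serial0
 encapsulation frame-relay
 frame-relay de-group 1 100   ! apply DE list 1 to DLCI 100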
map-class frame-relay frag
frame-relay cir 64000
frame-relay bc 8000
frame-relay fragment 64
frame-relay fair-queue
Class frag is defined under the Frame Relay interface to configure Frame Relay
fragmentation along with traffic shaping. The frame-relay fragment command defines the fragment
size. According to the configuration, any packet bigger than 64 bytes is fragmented. Frame
Relay traffic shaping needs to be enabled on the Frame Relay interface to enable
fragmentation. A CIR of 64 Kbps and a BC of 8000 bytes is defined for traffic shaping. On
the queue used to buffer frames for traffic shaping, WFQ scheduling is used to decide which
buffered packet to transmit next. The ip rtp priority command is used to enable a priority
queue within WFQ. By default, end-to-end fragmentation based on the FRF.12
specification is used. Other types of fragmentation also are supported on the Cisco router.
Fragmentation based on the FRF.11 Annex C specification is used if the vofr command is
used on the Frame Relay interface. The command vofr cisco is used to enable Cisco
proprietary fragmentation.
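Pulling the pieces together, a minimal sketch follows (the DLCI, RTP port range, and priority bandwidth are assumptions added for illustration):

interface Serial0
 encapsulation frame-relay
 frame-relay traffic-shaping
 frame-relay interface-dlci 100
  class frag
!
map-class frame-relay frag
 frame-relay cir 64000
 frame-relay bc 8000
 frame-relay fragment 64
 frame-relay fair-queue
 frame-relay ip rtp priority 16384 16383 48   ! PQ for RTP voice within WFQ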
(Figure: Ethernet and 802.3 frame formats: Preamble, Destination address, Source address, Type or MAC length/type, variable-length Data with PAD, and a 4-byte FCS.)
802.1Q defines a new tagged frame type by adding a 4-byte tag, which is made up of the
following:
2 bytes of Tag Protocol Identifier (TPID); the value 0x8100 indicates an 802.1Q frame
2 bytes of Tag Control Information (TCI), consisting of the 3 802.1p priority bits, a
1-bit canonical format identifier (CFI), and a 12-bit VLAN Identifier (ID)
Figure 8-15 shows how the original Ethernet/802.3 frame is changed into a tagged 802.1Q
frame. The FCS needs to be recalculated after introducing the 4-byte tag.
(Figure 8-15: Original frame: Dest, Src, Len/Etype, Data, FCS. Tagged frame: Dest, Src, TPID, TCI, Len/Etype, Data, FCS.)
802.1p provides a way to maintain priority information across LANs. It offers eight
priorities from the three 802.1p bits. To support 802.1p, the link layer has to support
multiple queues, one for each priority or traffic class. The high-priority traffic is always
preferred over lower-priority traffic. A switch preserves the priority values while switching
a frame.
NOTE You can easily map the 3-bit IP precedence field to the three priority bits of IEEE 802.1p,
and vice versa, to provide interworking between IP QoS and IEEE 802.1p.
Cisco's Catalyst family of switches uses 802.1p class of service (CoS) bits for prioritizing
traffic based on QoS features such as Weighted Round Robin (WRR) and WRED. WRR
and WRED are discussed in detail in Chapter 5, "Per-Hop Behavior: Resource Allocation
II," and in Chapter 6, respectively. Though 802.1p uses the tag defined by IEEE 802.1Q, a
standard for VLANs, 802.1p can still be used in the absence of VLANs, as shown in
Figure 8-16. A VLAN ID of 0 is reserved and is used to indicate the absence of VLANs.
(Figure 8-16: an incoming frame tagged with VID = 0 and priority 5 is prioritized by the switch even though no VLAN is configured.)
With the addition of the 4-byte tag introduced by the 802.1Q and 802.1p specifications, an
Ethernet frame can now exceed the maximum frame size of 1518 bytes. Hence, IEEE
802.3ac is tasked with modifying the 802.3 standard to extend the maximum frame size
from 1518 to 1522 bytes.
NOTE IEEE 802.1Q, "Standard for Virtual Bridged Local Area Networks," defines a method of
establishing VLANs.
Expedited traffic capabilities are defined as part of the 802.1p standard. 802.1p is part of
the recently modified version of 802.1D7, a standard for transparent bridging. Expedited
traffic capabilities define traffic classes to allow user priorities at frame level.
The IETF's Integrated Services over Specific Link Layers (ISSLL) Working Group8 is
defining a way to map Layer 3 Resource Reservation Protocol (RSVP) requests to 802.1p
priorities through a Subnet Bandwidth Manager (SBM). SBM is covered in the next
section.
SBM
SBM9 is a signaling protocol that supports RSVP-based admission control over the
Ethernet family of 802.3-style networks. As discussed in Chapter 7, "Integrated Services:
RSVP," RSVP is an end-to-end signaling mechanism used to request specific resource
reservation from the network. Across an Ethernet, guaranteed bandwidth reservation is not
possible by any single station on the segment because the Ethernet segment is a shared
medium. A station on the Ethernet segment has no idea of the reservations guaranteed by
the other stations on the segment and can send traffic to the segment at a rate that might
compromise the reservations existing on the other stations. SBM is a means of supporting
RSVP-based reservations on 802.3-style networks.
An Ethernet segment with one or more SBM-capable devices is referred to as a managed
Ethernet segment. A managed segment can be either a shared segment with one or many
SBM stations or a switched segment with up to two SBM stations. On a managed segment,
one of the SBM-capable devices acts as a designated SBM (DSBM). You can elect a DSBM
dynamically, or you can stipulate it by static configuration on SBM stations.
All SBM-capable stations other than the DSBM on a managed segment act as DSBM
clients. A DSBM is responsible for admission control for all resource reservation requests
originating from DSBM clients in its managed segment. Cisco routers and Windows NT 5.0
are examples of stations with DSBM functionality.
A station with SBM functionality is called an SBM-capable station. An SBM-capable
station configured to participate in DSBM election is called a candidate DSBM station.
Initialization
SBM uses two multicast addresses for its operation:
AllSBMAddress: This address is used for DSBM election and all DSBM
messages. All SBM-capable stations listen on this address. A DSBM sends its
messages on this address.
DSBMLogicalAddress: This address is used for RSVP PATH messages from
DSBM clients to the DSBM. Only candidate DSBM stations listen to this address.
When RSVP is enabled on an SBM-capable station on a shared segment, RSVP registers
the interface on the shared segment to listen to the AllSBMAddress multicast address. If the
station receives an I_AM_DSBM message on the AllSBMAddress, the interface is
considered to be on a managed segment. RSVP on a managed segment should operate
according to SBM protocol. A DSBM client on a managed segment will listen for RSVP
PATH messages destined to it through the DSBM on AllSBMAddress.
A station configured as a candidate DSBM listens to DSBMLogicalAddress along with
AllSBMAddress. A candidate DSBM sends a DSBM_WILLING message on the managed
segment. All RSVP PATH messages originated by the SBM clients are sent to the
DSBMLogicalAddress. A DSBM does not listen for PATH messages on the
AllSBMAddress.
Broadly, an SBM-capable station adds the following functionality:
DSBM Election: A mechanism to elect a DSBM on a managed segment
RSVP Extensions: Extensions for incoming and outgoing RSVP PATH and PATH
TEAR message processing
The following two sections discuss these requirements.
DSBM Election
A new SBM station on a managed segment initially listens for a period of time to see if a
DSBM is already elected for that segment. If it receives an I_AM_DSBM message on
AllSBMAddress, it doesn't participate in the DSBM election until the DSBM goes down
and a new DSBM election process becomes necessary.
A DSBM sends an I_AM_DSBM message every DSBMRefreshInterval seconds. If a
DSBM client doesn't see an I_AM_DSBM message after DSBMDeadInterval (a multiple
of DSBMRefreshInterval) seconds, it assumes the DSBM is down and starts a
new DSBM election process if it is configured to be a candidate DSBM.
During DSBM election, each candidate DSBM station sends a DSBM_WILLING message
to DSBMLogicalAddress listing its interface address and SBM priority. SBM priority
determines the precedence of a candidate DSBM to become a DSBM. A higher-priority
candidate DSBM station wins the election. If the SBM priority is the same, the tie is broken
by using the IP addresses of the candidate DSBMs. A candidate DSBM with the highest IP
address wins the election.
RSVP Extensions
Under SBM, a DSBM intercepts all incoming and outgoing RSVP PATH and PATH-TEAR
messages, adding an extra hop to the normal RSVP operation.
All outgoing RSVP PATH messages from the SBM client on a managed segment are sent
to the segment's DSBM device (using DSBMLogicalAddress) instead of to the RSVP
session destination address. After processing, the DSBM forwards the PATH message to the
RSVP session destination address. As part of the processing, the DSBM builds and updates
a Path State Block (PSB) for the session and maintains the previous hop L2/L3 addresses
of the PATH message.
An incoming RSVP PATH message on a DSBM requires parsing of the additional SBM
objects and setting up the required SBM-related information in the PSB. An RSVP PATH-
TEAR message is used to tear down an established PSB for an RSVP session.
RSVP RESV messages don't need any changes as part of SBM. An SBM client wanting to
make a reservation after processing an incoming RSVP PATH message follows the standard
RSVP rules and sends RSVP RESV messages to the previous hop L2/L3 addresses (the
segment's DSBM) of the incoming PATH message based on the information in the session's
PSB.
A DSBM processes the RSVP RESV message from an SBM client based on available
bandwidth. If the request cannot be granted, an RSVP RESVERR message is sent to the
SBM client requesting the reservation. If the reservation request can be granted, the RESV
message is forwarded to the previous hop address of the incoming PATH message based on
the session's PSB. Similar to standard RSVP, a DSBM can merge reservation requests when
possible.
Summary
ATM provides rich QoS functionality that offers a variety of services. PPD and EPD cell
discard techniques improve the effective throughput relative to a random cell discard in an
ATM network carrying IP traffic. When a cell needs to be dropped in an ATM network, PPD
can still deliver a partial packet (the cells already transmitted cannot be dropped), whereas
EPD drops all the packet's cells.
For running IP QoS end to end across an ATM network, packet drops and scheduling at the
ingress to the ATM network should be done based on the packet's IP precedence. For this
purpose, the IP QoS technologies WFQ and WRED are used. Within the ATM cloud, the
ATM VC carrying the IP traffic is provisioned to be lossless and offers a service meeting
the IP traffic's service requirements.
When all the traffic to a destination is carried over a single PVC, you can apply IP QoS on
a per-VC basis. In the case of a PVC bundle, where multiple PVCs exist to the destination,
you can map certain IP precedence values to each PVC and apply per-VC IP QoS for each VC in
the bundle.
Frame Relay offers a CIR-based QoS with a capability to burst above the committed QoS
when the network is not congested. It also offers extensive congestion control parameters.
Frame Relay fragmentation enables real-time traffic to be carried on the same PVC as the
data packets made up of relatively large packet sizes.
On a shared or a switched Ethernet, you can prioritize traffic using 802.1p on an 802.1Q
frame. A high-priority frame gets precedence over a lower-priority frame. RSVP-based
bandwidth reservation on an Ethernet becomes a problem because it is a shared medium.
SBM designates a single station to make RSVP-type bandwidth reservations for the entire
Ethernet segment.
References
1 The ATM Forum, www.atmforum.com/.
2 Dynamics of TCP Traffic over ATM Networks, A. Romanow and S. Floyd, IEEE JSAC,
V. 13 N. 4, May 1995, pp. 633-641.
3 Early Packet Discard (EPD) Page, www.aciri.org/floyd/epd.html.
4 ISDN Data Link Layer Specification for Frame Mode Bearer Services, International
Telegraph and Telephone Consultative Committee, CCITT Recommendation Q.922, 19
April 1991.
5 Frame Relay Forum (FRF), www.frforum.com.
6 Virtual Bridged Local Area Networks, IEEE 802.1Q, grouper.ieee.org/groups/802/1/
vlan.html.
7 MAC Bridges, IEEE 802.1D, grouper.ieee.org/groups/802/1/mac.html.
8 Integrated Services over Specific Link Layers (ISSLL), IETF Working Group,
www.ietf.org/html.charters/issll-charter.html.
9 SBM: A Protocol for RSVP-Based Admission Control over IEEE 802-Style Networks,
Ed. Yavatkar and others, draft-ietf-issll-is802-sbm-08.txt, search.ietf.org/internet-
drafts/draft-ietf-issll-is802-sbm-08.txt.
CHAPTER
9
MPLS
MPLS1 is an IETF standard for label-swapping-based forwarding in the presence of routing
information. It consists of two principal components: control and forwarding. The control
component uses a label distribution protocol to maintain label-forwarding information for
all destinations in the MPLS network. The forwarding component switches packets by
swapping labels using the label information carried in the packet and the label-forwarding
information maintained by the control component.
MPLS, as the name suggests, works for different network layer protocols. As such, the
forwarding component is independent of any network layer protocol. The control compo-
nent has to support label distribution for different network layer protocols to enable MPLS
use with multiple network layer protocols.
Forwarding Component
MPLS packet forwarding occurs by using a label-swapping technique. When a packet
carrying a label arrives at a Label Switching Router (LSR), the LSR uses the label as the
index in its Label Information Base (LIB). For an incoming label, LIB carries a matching
entry with the corresponding outgoing label, interface, and link-level encapsulation infor-
mation to forward the packet. Based on the information in the LIB, the LSR swaps the
incoming label with the outgoing label and transmits the packet on the outgoing interface
with the appropriate link-layer encapsulation.
A Label Edge Router (LER) is an edge router in the MPLS cloud. Some LER interfaces
perform non-MPLS-based forwarding and some run MPLS. An LER adds an MPLS
label to all packets entering the MPLS cloud from non-MPLS interfaces. By the same
token, it removes the MPLS label from a packet leaving the MPLS cloud. The forwarding
behavior in an MPLS network is depicted in Figure 9-1.
Control Component
The control component is responsible for creating label bindings and then distributing the
label binding information among LSRs.
Label binding is an association between a label and network layer reachability information
or a single traffic flow, based on the forwarding granularity. At one end, a label can
be associated with a group of routes, giving MPLS good scaling capabilities. At
the other end, a label can be bound to a single application flow, accommodating flexible
forwarding functionality. MPLS network operation is shown in Figure 9-2.
1. Existing routing protocols (e.g., OSPF, ISIS) establish reachability to destination networks.
2. Label Distribution Protocol (LDP) establishes tag to destination network mappings.
3. Ingress label switch router receives packet, performs Layer 3 value-added services, and tags packets.
4. Core LSR switches packets by using label swapping.
5. The penultimate hop to the destination removes the label and sends an unlabeled packet towards the
last hop LER.
6. Egress LER switches an unlabeled packet based on its destination IP address.
The other label distribution mode is termed ordered control mode. In this mode, a router
waits for the label from its downstream neighbor before sending its label upstream.
NOTE Some important points to note regarding the MPLS control component:
The total number of labels used in an LSR is no greater than the number of its CEF
entries. Actually, in most cases, you can associate a single label with a group of
routes sharing the same next hop; hence, the number of labels used is much less than
the number of CEF entries. This provides a scalable architecture.
Label allocation is driven by topology information as reflected by CEF, which is
based on the routing information and not on actual data traffic.
MPLS doesn't replace IP routing protocols. The MPLS control component depends
on the existence of routing information in a router, but it is independent of the kind
of IP routing protocol used in an MPLS network. For that matter, you can use any or
multiple routing protocols in an MPLS network.
Label Encapsulation
A packet can carry label information in a variety of ways:
As a 4-byte label inserted between the Layer 2 and network layer headers: This
applies to Point-to-Point Protocol (PPP) links and Ethernet (all flavors) LANs. A
single MPLS label or a label stack (multiple labels) can be carried in this way. Figure
9-3 shows how the label is carried over PPP links and over an Ethernet-type LAN.
As a part of the Layer 2 header: This applies to ATM, where the label information
is carried in the VPI/VCI fields, as shown in Figure 9-4.
Figure 9-4 MPLS Label Carried in the VPI/VCI Fields in an ATM Header
As part of the ATM Adaptation Layer 5 (AAL5) frame before segmentation and
reassembly (SAR): This occurs in an ATM environment for label information made
up of a label stack (multiple MPLS label fields).
An MPLS label field consists of a label header and a 20-bit label. The label header
consists of three fields: CoS, S bit, and Time-to-Live (TTL). The 4-byte MPLS label
field format is shown in Figure 9-5.
(Figure 9-5: the 4-byte MPLS label field: Label, 20 bits; CoS (class of service) bits, 3 bits; S (bottom of stack), 1 bit; TTL (Time-to-Live), 8 bits.)
NOTE The CoS field in the MPLS header has only 3 bits. As such, it can carry
only the 3-bit IP precedence field and not the 6-bit Differentiated
Services Code Point (DSCP) field. Therefore, as needed, the CoS
information can be carried as one of the labels in an MPLS label stack;
the label field is 20 bits in length and can hold either the IP precedence
field or the DSCP field.
S bit: Indicates a label entry at the bottom of the label stack. It is set to 1 for the last
entry in the label stack and to zero for all other label stack entries. This allows binding
a prefix with multiple labels, also called a label stack. In the case of a label stack, each
label has its own associated CoS, S, and TTL values.
TTL (8 bits): Indicates the time to live for an MPLS packet. The TTL value, when
set at the edge of the MPLS network, is decremented at each MPLS network hop.
NOTE The IP TTL field is copied into the MPLS TTL field during label imposition by default. It
enables the traceroute utility to show all the MPLS hops when the destination lies within
or across the MPLS cloud.
The no mpls ip propagate-ttl command is used to disallow copying of IP TTL into the
MPLS TTL field at the ingress to the MPLS network. In this case, the MPLS TTL field is
set to 255. Hence, the traceroute output does not show any hops within the MPLS network.
It shows only one IP hop to transit the entire MPLS domain.
(Figure: downstream label distribution for prefix 222.222.222.3; neighboring routers advertise local labels 26 and 29, and a label forwarding entry shows local label 29, outgoing label POP, prefix 222.222.222.3, outgoing interface Hssi1/0.)
The mpls ip command is used to enable MPLS on a router. One prerequisite for label
distribution is CEF, which you can enable by using the global ip cef command.
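A minimal enablement sketch follows (the interface name is an assumption):

ip cef
!
interface Hssi1/0
 mpls ip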
In downstream label allocation, an MPLS router allocates an incoming label for each CEF
entry and distributes it on all its interfaces. An upstream router receives a label binding and
uses it as an outgoing label for its associated CEF prefix. In this method of label allocation,
the incoming label is allocated locally and the outgoing label is received remotely from a
downstream router.
This case study concentrates on studying the behavior of downstream label distribution
for one prefix, 222.222.222.3. The following listings show information on the prefix
222.222.222.3 from the CEF table, the label bindings, and the label forwarding table from
each router. The show ip cef, show mpls ldp bindings, and show mpls forwarding-table
commands are used to fetch information on CEF, label binding, and label-based
forwarding, respectively. Listings 9-1, 9-2, and 9-3 show information on CEF, label
binding, and the label distribution protocol from Router NewYork.
Listing 9-1 Information on the 222.222.222.3 Prefix in the CEF Table of Router NewYork
NewYork#show ip cef 222.222.222.3
222.222.222.3/32, version 170, connected, receive
Listing 9-2 Label Binding Information for Prefix 222.222.222.3 on Router NewYork
NewYork#show mpls ldp bindings
LIB entry: 222.222.222.3/32, rev 118
local binding: label: imp-null
remote binding: lsr: 222.222.222.2:0, label: 26
remote binding: lsr: 222.222.222.4:0, label: 29
MPLS performs label binding for all prefixes in the CEF table and distributes them to all
established LDP neighbors. The show mpls ldp bindings command shows the label
bindings. A local binding is advertised upstream by a router. In this case, the local binding
for prefix 222.222.222.3 is NULL because it is a directly connected IP address on the
router. Router NewYork advertises the null label binding to its LDP adjacent routers,
Chicago and Dallas. A router receiving a null label binding for a prefix pops the label out
when forwarding a packet destined to this prefix.
The remote bindings are the label bindings advertised by the respective routers. The remote
LSR routers 222.222.222.2 and 222.222.222.4 have local labels of 26 and 29, respectively.
A local label binding is advertised to all LDP adjacent routers. The show mpls ldp
parameters command displays LDP protocol parameter information.
Because Router NewYork receives packets destined to its directly connected 222.222.222.3
prefix without a label, the label forwarding table does not carry any information on prefix
222.222.222.3.
Listings 9-4, 9-5, and 9-6 display CEF, label binding, and label forwarding information for
prefix 222.222.222.3 from Router Chicago.
Listing 9-4 CEF Information on Prefix 222.222.222.3 in Router Chicago
Chicago#show ip cef 222.222.222.3
222.222.222.3/32, version 179, cached adjacency to Hssi1/0
0 packets, 0 bytes
via 210.210.210.9, Hssi1/0, 0 dependencies
next hop 210.210.210.9, Hssi1/0
valid cached adjacency
Listing 9-5 Label Binding Information for Prefix 222.222.222.3 in Router Chicago
Chicago#show mpls ldp bindings
LIB entry: 222.222.222.3/32, rev 90
local binding: label: 26
remote binding: lsr: 222.222.222.3:0, label: imp-null
remote binding: lsr: 222.222.222.1:0, label: 31
The local label 26 is the label binding for 222.222.222.3 on Router Chicago. This local
binding is triggered by the presence of 222.222.222.3 in the CEF table, and it is distributed
to all its LDP neighbors. LSR SanFrancisco has a local binding of label 31 for
222.222.222.3, which it advertised to all its LDP neighbors. The show mpls forwarding-
table command shows the information required to switch packets by label swapping. The
outgoing label is the pop label because it received a null label from the remote NewYork
LSR on its outgoing interface (because the prefix is a directly connected address on the
remote LSR). In this example, any incoming packet with a label of 26 is switched to the
outgoing interface Hssi1/0 after removing or popping the label. The label is removed
because the packet is switched to its ultimate destination router. This phenomenon for
popping the label at the penultimate hop to the destination is termed penultimate hop
popping (PHP).
Listings 9-7, 9-8, and 9-9 give CEF, label binding, and label-based forwarding information
for the 222.222.222.3 prefix in Router Dallas.
Listing 9-7 CEF-related Information on Prefix 222.222.222.3 in Router Dallas
Dallas#show ip cef 222.222.222.3
222.222.222.3/32, version 18, cached adjacency to Hssi1/0
0 packets, 0 bytes
via 210.210.210.14, Hssi1/0, 0 dependencies
next hop 210.210.210.14, Hssi1/0
valid cached adjacency
Listing 9-9 Label-based Forwarding Information for Prefix 222.222.222.3 in Router Dallas
Dallas#show mpls forwarding-table
Local Outgoing Prefix Bytes label Outgoing Next Hop
label label or VC or Tunnel Id switched interface
29 Pop label 222.222.222.3/32 1190 Hs1/0 point2point
Explanations for the preceding listings from Router Dallas are largely similar to the
discussion on the listings from Router Chicago.
Listings 9-10, 9-11, and 9-12 display CEF, label binding, and label-based forwarding
information on prefix 222.222.222.3 in Router SanFrancisco.
Listing 9-10 CEF Entry for Prefix 222.222.222.3 in Router SanFrancisco
SanFrancisco#show ip cef 222.222.222.3
222.222.222.3/32, version 38, per-destination sharing
0 packets, 0 bytes
via 210.210.210.22, Hssi1/1, 0 dependencies
traffic share 1
next hop 210.210.210.22, Hssi1/1
valid adjacency
via 210.210.210.18, Hssi1/0, 0 dependencies
traffic share 1
next hop 210.210.210.18, Hssi1/0
valid adjacency
0 packets, 0 bytes switched through the prefix
The CEF table shows two equal-cost paths to reach 222.222.222.3. The local binding
for the prefix is 31, and it is distributed to all the router's LDP neighbors. The remote bindings show
the local label bindings distributed by the LDP adjacent routers. The MPLS router label-
switches packets that arrive with its local label, choosing the outgoing label based on the received
remote binding information.
MPLS QoS
QoS is an important component of MPLS. In an MPLS network, QoS information is carried
in the MPLS CoS field of the label header.
Like IP QoS, MPLS QoS is achieved in two main logical steps, as shown in Table 9-2, and
uses the same associated QoS functions. Figure 9-7 depicts the QoS functions used in an
MPLS network.
MPLS uses the same IP QoS functions to provide differentiated QoS for traffic within
an MPLS network. The only real difference is that MPLS QoS is based on the CoS bits
in the MPLS label, whereas IP QoS is based on the IP precedence field in the IP header.
On an ATM backbone with an MPLS-enabled ATM switch, the switch can support MPLS
CoS in two ways:
Single Label Switched Path (LSP) with Available Bit Rate (ABR) service
Parallel LSPs with Label Bit Rate (LBR) service
A single LSP using ATM ABR service can be established through LDP. All MPLS traffic
uses the same ABR LSP, and the differentiation is made on the ingress routers to the ATM
cloud by running WFQ and WRED algorithms on traffic going over an LSP.
Multiple LSPs in parallel can be established through LDP to support traffic with multiple
precedence values. Each established LSP is mapped to carry traffic of certain MPLS CoS
values. The LSPs use the LBR ATM service. LBR is a new ATM service category that relies
on scheduling and discarding in the ATM switch based on WFQ and WRED, respectively,
and hence is more appropriate for IP.
(Figure 9-7: the Seattle and Atlanta sites connect through LERs at the MPLS network edge, with LSRs and ATM-LSRs in the core.)
When an ATM switch doesn't support MPLS, you can use ATM QoS based on the ATM Forum
traffic classes (Constant Bit Rate [CBR], Variable Bit Rate [VBR], and ABR) and their IP
interworking, as discussed in Chapter 8, "Layer 2 QoS: Interworking with IP QoS."
End-to-End IP QoS
You can set the MPLS CoS bits at the edge of the network to provide traffic classification
so that the QoS functions within the network can provide differentiated QoS. As such, to
deliver end-to-end IP QoS across a QoS-enabled MPLS network, you map or copy the
IP precedence value to the MPLS CoS bits at the edge of the MPLS network. The IP
precedence value continues to be used after the packet exits the MPLS network. Table 9-3
shows the various QoS functions for delivering end-to-end IP QoS across an MPLS
network.
Two sites, Seattle and Atlanta, in the IP network are connected through the MPLS network
as shown previously in Figure 9-7. Discuss the MPLS CoS functionality needed in the
MPLS network to offer a standard service class for the traffic from Seattle to Atlanta.
Table 9-4 Four Service Classes Using MPLS CoS
Class                 IP Precedence Bits (Value in Decimal)   ToS Class Bits   Drop Priority
Class 0 (Available)   000 (0)                                 00               0
                      100 (4)                                 00               1
Class 1 (Standard)    001 (1)                                 01               0
                      101 (5)                                 01               1
Class 2 (Premium)     010 (2)                                 10               0
                      110 (6)                                 10               1
Class 3 (Control)     011 (3)                                 11               0
                      111 (7)                                 11               1
LER
IP traffic from the Seattle site going to the Atlanta site enters the MPLS network on the LER
router.
On the LER router interface that connects to the Seattle site, enable CAR to police
incoming traffic according to the service contract. In Listing 9-13, the incoming traffic on
the interface is contracted at standard CoS, with traffic of 20 Mbps getting an IP precedence
of 5 and any exceeding traffic getting an IP precedence of 1.
Listing 9-13 Enable CAR to Classify Traffic for Standard CoS
interface Hssi0/0/0
ip address 221.221.221.254 255.255.255.252
rate-limit input 20000000 2000000 8000000 conform-action set-mpls-exp-transmit 5
 exceed-action set-mpls-exp-transmit 1
As part of IP/MPLS QoS Interworking, IP precedence is copied into the MPLS CoS field.
From the LER, the Atlanta-bound traffic goes via the LSR. On the LER POS3/0/0 interface
connecting to the LSR, WRED and WFQ are enabled to differentiate traffic based on the
MPLS CoS value carried in the MPLS packet.
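As a rough sketch (the queueing command is an assumption, because the corresponding listing is not reproduced here), enabling WRED on the core-facing interface might look like this:

interface POS3/0/0
 random-detect   ! WRED; for MPLS packets the drop decision keys off the CoS bits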
LSR
Traffic going to the Atlanta site goes via an MPLS ATM network. On the LSR, parallel
LSPs with LBR service are set up such that separate LSPs are used to carry traffic belonging
to each IP precedence. It enables traffic differentiation based on IP precedence. Listing
9-14 shows a sample configuration to enable MPLS-based parallel LSPs with LBR service.
On each LSP, WRED is enabled.
Listing 9-14 Enable Parallel LSPs with LBR Service on an LSR
interface ATM1/1/0
!
interface ATM 1/1/0.1 mpls
ip unnumbered Loopback0
mpls multi-vc
mpls random detect
MPLS VPN
One important MPLS application is the VPN2 service. A client with multiple remote sites
can connect to a VPN service provider backbone at multiple locations. The VPN backbone
offers connectivity among the different client sites. The characteristics of this connectivity
make the service provider cloud look like a private network to the client. No
communication is allowed between different VPNs on a VPN service provider backbone.
Any device at a VPN site can communicate only with a device at a site belonging to the
same VPN.
In a service provider network, a Provider Edge router connects to the Customer Edge router
at each VPN site. A VPN usually has multiple geographically distributed sites that connect
to the service provider's local Provider Edge routers. A VPN site and its associated routes
are assigned one or more VPN colors. Each VPN color defines the VPN a site belongs to.
A site can communicate with another site connected to the VPN backbone only if it belongs
to the same VPN color. A VPN intranet service is provided among all sites connected to the
VPN backbone using the same color. A site in one VPN can communicate with a different
VPN or with the Internet by using a VPN extranet service. A VPN extranet service can be
provided by selectively leaking some external routes of a different VPN or of the Internet
into a VPN intranet.
Provider Edge routers hold routing information only for the VPNs to which they are directly
connected. Provider Edge routers in the VPN provider network are fully meshed using
multiprotocol internal BGP (IBGP)3 peerings.
Multiple VPNs can use the same IP version 4 (IPv4) addresses. Examples of such addresses
are IP private addresses and unregistered IP addresses. The Provider Edge router needs to
distinguish among such addresses of different VPNs. To enable this, a new address family
called VPN-IPv4 is defined. The VPN-IPv4 address is a 12-byte value; the first eight bytes
carry the route distinguisher (RD) value, and the last four bytes consist of the IPv4 address.
The RD is used to make the private and unregistered IPv4 addresses in a VPN network
unique in a service provider backbone. The RD consists of a 2-byte autonomous system
(AS) number, followed by a 4-byte value that the provider can assign. VPN-IPv4 addresses
are treated as a different address family and are carried in BGP by using BGP multiprotocol
extensions. In these BGP extensions, label-mapping information is carried as part of the
Network Layer Reachability Information (NLRI)4. The label identifies the output interface
connecting to this NLRI.
The extended community attribute5 is used to carry Route Target (RT) and Source of Origin
(SOO) values. A route can be associated with multiple RT values similar to the BGP
community attribute that can carry multiple communities for an IP prefix. RT values are
used to control route distribution, because a router can decide to accept or reject a route
based on its RT value. The SOO value is used to uniquely identify a VPN site. Table 9-5
lists some important MPLS VPN terminology.
(Figure: the VPN Red customer sites attach to the service provider backbone routers SanFrancisco (222.222.222.1), Chicago (222.222.222.2), and NewYork (222.222.222.3), which interconnect over POS links on subnets 222.222.222.12/30, 222.222.222.16/30, and 222.222.222.24/30.)
A VRF instance green for VPN service to the enterprise customer is defined. An RD of
200:100 is used to distinguish this customers routes in the service provider backbone. The
VPN configuration on Router SanFrancisco and the VPN operation between the customer
San Francisco site and New York site across the service provider network is discussed next.
Listing 9-15 gives the MPLS VPN-related configuration on the SanFrancisco router.
Listing 9-15 Setting Up MPLS VPN on Router SanFrancisco
ip vrf green
rd 200:100
route-target export 200:1
route-target import 200:1
route-target import 200:2
ip cef
VRF green is enabled on interface Hssi0/0 that connects to the customer San Francisco site
on Router SanFrancisco. Listing 9-16 shows the RD and the interfaces used for VRF green.
Listing 9-16 Information on VRF Instance Green
SanFrancisco#sh ip vrf green
Name Default RD Interfaces
green 200:100 Hssi0/0
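The interface attachment itself is not reproduced here; a minimal sketch (the IP address is an assumption) would be:

interface Hssi0/0
 ip vrf forwarding green
 ip address 200.200.1.254 255.255.255.0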
In the first section of the BGP configuration, Router SanFrancisco peers with the other
routers through BGP in the service provider backbone. The loopback0 interface addresses
are used for this peering because such interfaces never go down.
The next section of the BGP configuration specifies the routes belonging to VRF green.
Such routes can be learned dynamically using the VRF green instance of protocols, such as
BGP and Routing Information Protocol version 2 (RIPv2). A VRF green instance of a
protocol runs only on the VRF green interfaces. Hence, all VRF green routes are learned
dynamically through the VRF green interfaces or are specified statically as VRF green
routes that point to a VRF green interface.
On Router SanFrancisco, the customer routes belonging to the San Francisco site are
specified using static routes into VRF green. These static routes point to the point-to-point
interface connecting to the San Francisco site. You can see static VRF green routes by using
the show ip route vrf green command.
The last section of the BGP configuration specifies the BGP peering to exchange the VPN-
IPv4 address family. BGP's extended community attribute is used to carry the export
RT values. Local VRF routes are installed in the local BGP VPN-IPv4 table if their import
RT value matches their specified export RT value. A VRF in a router has to specify an
import RT value equal to one of the export RT values carried by a route for the route to be
installed in the BGP VPN routing table.
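The BGP configuration itself is not reproduced in this extract. A minimal sketch of the three sections just described might look as follows (the AS number 200 is inferred from the RD, only one peer is shown, and every value should be treated as an assumption):

router bgp 200
 neighbor 222.222.222.3 remote-as 200
 neighbor 222.222.222.3 update-source Loopback0
 !
 address-family ipv4 vrf green
  redistribute static             ! the statically specified VRF green routes
 exit-address-family
 !
 address-family vpnv4
  neighbor 222.222.222.3 activate
  neighbor 222.222.222.3 send-community extended   ! carries the RT values
 exit-address-family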
On Router SanFrancisco, an export RT of 200:1 is specified for VRF green. All local routes
belonging to VRF green on this router are exported through BGP with an RT of 200:1 in
the BGP extended community attribute. A router that needs the SanFrancisco routes
belonging to the VRF green should import routes with an RT of 200:1 by using the
command route-target import 200:1. Router SanFrancisco itself should have this
command because its import RT should match its export RT for the router to carry the
routes through BGP.
VRF green specifies an RT of 200:2 in addition to 200:1. Therefore, Router SanFrancisco
can import any BGP VPN routes that have an RT of either 200:1 or 200:2 into the VRF
green routing table. You can show the VRF green routes carried in BGP by using the show
ip bgp vpn vrf green command. You can see the routing table for VRF green by using the
show ip route vrf green command. The IP routes and BGP routes belonging to VRF green
in Router SanFrancisco are shown in Listings 9-17 and 9-18, respectively.
Listing 9-17 SanFrancisco Routes in the VRF Routing Table Green
SanFrancisco#show ip route vrf green
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
* - candidate default, U - per-user static route, o - ODR
The VRF green instance of the routing table shows all the dynamically learned or statically
specified VRF green routes. It also shows the BGP VPN-IPv4 route that is imported into the local
VRF green routing table because it matches the import RT 200:2.
Listing 9-18 SanFrancisco BGP VPN-IPv4 Routes That Belong to VRF Green
SanFrancisco#sh ip bgp vpn vrf green
BGP table version is 57, local router ID is 222.222.222.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
BGP exchanges label bindings along with its VPN-IPv4 routes. A BGP label indicates the
local output interface that can reach the route. Listing 9-19 shows the label bindings for the
VPN-IPv4 routes carried by BGP.
Listing 9-19 Label Bindings for the VPN-IPv4 Routes Carried by BGP
SanFrancisco#sh ip bgp vpn vrf green label
Network Next Hop In label/Out label
Route Distinguisher: 200:100 (green)
200.200.1.0 0.0.0.0 26/nolabel
200.200.2.0 222.222.222.3 nolabel/39
200.200.11.0 0.0.0.0 27/nolabel
200.200.22.0 222.222.222.3 nolabel/40
Listing 9-19 shows that 39 is the label the NewYork router with router ID 222.222.222.3
advertised as a label binding for the 200.200.2.0 route. The next several listings show how
an IP packet from the customer San Francisco site reaches the 200.200.2.1 address in the
New York site across the service provider network.
Listing 9-20 shows label forwarding information for the prefix 222.222.222.3.
Listing 9-20 Label Forwarding Information for the 222.222.222.3 Prefix
SanFrancisco#show mpls forwarding-table 222.222.222.3
Local Outgoing Prefix Bytes label Outgoing Next Hop
label label or VC or Tunnel Id switched interface
36 37 222.222.222.3/32 0 Po5/3 point2point
30 222.222.222.3/32 7161 Po5/1 point2point
Depending on the nature of load balancing over parallel paths, the router can choose either
label for 222.222.222.3. In this case, it chose label 30 through interface Po5/1.
After a packet for the 200.200.2.1 prefix arrives on Router SanFrancisco, the router attaches
a label stack of two labels (30 and 39) and sends the packet on interface PO5/1 toward
Router Chicago. The outer label takes the packet to the BGP next hop 222.222.222.3, and the
inner label switches the packet to its actual destination, 200.200.2.1. Listing 9-21 shows the label
forwarding information on Router Chicago for an incoming label of 30.
Listing 9-21 The Label Forwarding Table for Prefix 222.222.222.3
Chicago#sh mpls forwarding-table 222.222.222.3 32
Local Outgoing Prefix Bytes label Outgoing Next Hop
label label or VC or Tunnel Id switched interface
30 Pop label 222.222.222.3/32 7161 Po0/0 point2point
Being the penultimate hop to the next-hop address 222.222.222.3, Router Chicago pops the
outer label of 30 and sends the packet toward Router NewYork through interface Po0/0,
as shown in Listing 9-22.
Listing 9-22 Debug Output Showing How the Router Chicago Label-Switched an MPLS Packet from the Customer
San Francisco Site to 200.200.2.1 Address of the Customer New York Site
MPLS: Po0/1: recvd: CoS=0, TTL=255, Label(s)=30/39
MPLS: Po0/0: xmit: CoS=0, TTL=254, Label(s)=39
On arrival at Router NewYork, the packet still carries the second label, 39. Listing 9-23
shows the label forwarding table in Router NewYork for an incoming label of 39.
Listing 9-23 Router NewYorks Label Forwarding Table for VRF Green
NewYork#sh mpls forwarding-table vrf green
Local Outgoing Prefix Bytes label Outgoing Next Hop
label label or VC or Tunnel Id switched interface
39 Unlabeled 200.200.2.0/24[V] 520 Hs0/0 point2point
40 Unlabeled 200.200.22.0/24[V] 0 Hs0/0 point2point
Based on the label forwarding information, Router NewYork removes the incoming label
39 and sends the packet out on interface Hssi0/0, which connects to the New York site. This
label-switching action is depicted in Listing 9-24 by using debug commands in Router
NewYork.
Listing 9-24 Debug Output Showing How Router NewYork Label-Switched an MPLS Packet from SanFrancisco to
the 200.200.2.1 Address
MPLS: Po8/0: recvd: CoS=0, TTL=254, Label(s)=39
MPLS: Hs0/0: xmit: (no label)
In Router NewYork, the second section of the BGP configuration shows that BGP peering
is used to learn the VRF green routes from the New York site. Note that on Router
SanFrancisco, discussed earlier, the San Francisco site VRF green routes were configured
statically.
The service provider for the VPN customer provides the CAR and CDR services for each
access port, as shown in Figure 9-9. The VPN service provider provisions its network to
deliver its access ports' CAR and CDR rates. Any traffic over the committed rates is
dropped with a higher probability than the committed traffic.
Committed packets in a properly provisioned MPLS VPN network are delivered with a high
probability and provide the same service level as CIR in Frame Relay networks.
Guaranteed QoS
Guaranteed QoS requires the use of RSVP end-to-end along the path, from the source to
the destination at the VPN sites connected by the VPN service provider backbone. The
extent and level of guaranteed QoS depends on which part of the network makes explicit
reservations through RSVP PATH messages. The three levels of guaranteed QoS
deployment are discussed next. Figure 9-10 depicts these three options, which vary
primarily in the QoS offered by the VPN service provider.
(Figure 9-10: RSVP runs at the customer sites on each side; across the MPLS VPN service provider, the reservation is carried over best effort, MPLS CoS, or a GB tunnel, subject to the CDR.)
This enables a customer to control resource allocation within the customer's sites for specific
applications but has no effect on how the customer's traffic is handled in the service
provider network.
The incoming interfaces on the Provider Edge routers are configured to define the CAR and
CDR parameters. The CAR and CDR values for the customer site are configured as 1.5
Mbps and 10 Mbps, respectively. All the VPN service provider backbone router interfaces
have MPLS CoS defined to differentiate traffic based on CoS values carried in an MPLS
packet.
Summary
MPLS enables packet switching based on the label information in the packet without the
need to look into the packet's IP header. Packet switching is done using label swapping
rather than best-match forwarding for the destination address in the IP forwarding table. An
MPLS label carries three CoS bits to indicate packet precedence, similar to the precedence
field in the IP header. You can use the MPLS CoS field to indicate packet precedence within
the MPLS network and to deliver end-to-end IP QoS across an MPLS cloud.
One important MPLS application is MPLS-based VPNs. MPLS VPNs enable a scalable
VPN solution with the help of routing protocols that restrict the topology information
known to an incoming packet from a VPN site. MPLS VPNs can deliver QoS functionality
based on CAR and CDR and differentiated QoS in the service provider core. GB tunnels
are used to offer guaranteed VPN QoS.
References
1 Multiprotocol Label Switching Architecture, E. Rosen, A. Viswanathan, and R. Callon,
IETF Draft, Work in Progress.
2 BGP/MPLS VPNs, E. Rosen and Y. Rekhter, RFC 2547.
3 Multiprotocol Extensions for BGP4, T. Bates, R. Chandra, D. Katz, and Y. Rekhter,
RFC 2283.
4 Carrying Label Information in BGP4, Y. Rekhter and E. Rosen, Work in Progress.
5 BGP Extended Communities Attribute, S. Ramachandra and D. Tappan, draft-ramachandra-bgp-
ext-communities.txt.
PART
III
Traffic Engineering
Chapter 10 MPLS Traffic Engineering
CHAPTER
10
MPLS Traffic Engineering
router has a logical Layer 3 connection to every other router connected to the Layer 2
cloud.
Figure 10-1 shows a physical and logical Layer 3 view of such overlay models. The
physical model shows Layer 3 devices (routers) connected through Layer 2 (Frame Relay
or ATM) switches. The switches in the network help define the exact physical path taken by
any Layer 3 traffic entering the Frame Relay or the ATM cloud.
(Figure 10-1: on the left, the physical network of routers attached to a cloud of six numbered Layer 2 switches; on the right, the equivalent Layer 3 logical mesh among the routers, including Routers A and B.)
For the traffic from Router A to Router B, for example, the Layer 2 cloud offers three
physical paths: A->1->4->B, A->1->2->3->4->B, and A->1->6->5->4->B. The actual path
the IP traffic takes, however, is determined by the path predetermined by the Layer 2
switches in the network. The use of the explicit Layer 2 transit layer gives you exact control
over how traffic uses the available bandwidth in ways not currently possible by adjusting
the Layer 3 IP routing metrics.
Large mesh networks mean extra infrastructure costs. They might also cause scalability
concerns for the underlying IGP routing protocol, such as OSPF and IS-IS, as the normal
IGP flooding mechanism is inefficient in large mesh environments.
RRR
Routing protocols such as OSPF and IS-IS route traffic using the information on the
network topology and the link metrics. In addition to the information supplied by a routing
protocol, RRR routes an IP packet taking into consideration its traffic class, the traffic class
resource requirements, and the available network resources.
Figure 10-2 shows two paths from San Francisco to New York in a service provider
network: one through Dallas and another through Chicago. The service provider noticed
that the traffic from San Francisco to New York usually takes the San Francisco->Chicago
->New York path. This path becomes heavily congested during a certain period of the day,
however. It was also noted that during this period of congestion, the path from San
Francisco to New York through Dallas is heavily underutilized. The need is to engineer the
traffic such that it is routed across the network in a way that best utilizes all the available
network resources. In this case, all the traffic between San Francisco and New York gets
poor performance because the network cannot use the available alternate path between
these two sites during a certain period of the day. This scenario typifies the application of
and the need for TE. Case Study 10-1 discusses a more detailed scenario to illustrate the
workings of MPLS TE.
Figure 10-2 TE Tunnel from the San Francisco Router to the New York Router
[Figure 10-2: Map showing the San Francisco, Dallas, Chicago, and New York routers, with a TE tunnel from San Francisco to New York.]
After some traffic analysis, it was clear that if all New York-bound traffic from San
Francisco is carried along the path through Dallas rather than Chicago during its period of
congestion, both the paths will be optimally utilized. Therefore, a TE tunnel is established
between San Francisco and New York. It is called a tunnel because the path taken by the
traffic is predetermined at the San Francisco router and not by a hop-by-hop routing
decision. Normally, the TE tunnel takes the path through Chicago. During the period of
congestion at Chicago, however, the TE path changes to the path through Dallas. In this
case, TE resulted in optimal utilization of the available network resources while avoiding
points of network congestion. You can set up the TE path to change back to the path through
Chicago after sufficient network resources along that path become available.
RRR TE requires the user to define the traffic trunk, the resource requirements and policies
on the traffic tunnel, and the computation or specification of the explicit path the TE tunnel
will take. In this example, the traffic trunk is New York-bound traffic from San Francisco,
the resource requirements are the TE tunnel's bandwidth and other policies, and the explicit
path is the path from San Francisco to New York through Dallas.
An RRR operational model is shown in Figure 10-3. It depicts the various operational
functional blocks of TE in flowchart format. The following sections discuss each
function in detail.
[Figure 10-3: RRR operational model, in flowchart form. Traffic modeling defines the traffic class to be carried over the TE tunnel, and the traffic model is added at the edge router originating the TE tunnel. Each link carries attributes describing its available bandwidth and other resource policy and usage information, and these link resource attributes are made available all over the network via an IGP such as OSPF or IS-IS. Path optimization and restoration can preempt established tunnels and trigger IGP updates when available link resources exceed a given threshold. The final block is TE tunnel setup.]
NOTE Prior to deploying TE in a network, it is important to profile the traffic flow patterns and
statistics at various points in the network. You can do this using offline tools and techniques
that are beyond the scope of this book. The goal is to come up with a traffic model that best
optimizes the network resources.
TE Trunk Definition
The TE trunk defines the class of packets carried on a TE tunnel. This policy is local to the
head-end router originating the TE tunnel setup.
As discussed in Chapter 3, "Network Boundary Traffic Conditioners: Packet Classifier,
Marker, and Traffic Rate Management," a traffic class is defined flexibly based on the
Transmission Control Protocol/Internet Protocol (TCP/IP) traffic headers. A traffic class
can be based on a single parameter, such as the IP destination or the MPLS Class of Service
(CoS) field, or on a number of parameters, such as all File Transfer Protocol (FTP) traffic
going from a certain sender to a specific destination. In RRR, all packets of a traffic class
take a common path across the network, one that is either explicitly specified or
dynamically determined. For this reason, RRR traffic classes are also termed traffic trunks.
TE Tunnel Attributes
A TE tunnel is given attributes to describe the traffic trunk's requirements and to specify
various administrative policies. This section discusses the various tunnel attributes.
Bandwidth
The bandwidth attribute shows the end-to-end bandwidth required by a TE tunnel. You can
define it based on the requirements of the traffic class being carried within the TE tunnel.
Setup and Holding Priorities
Table 10-1 shows the implications of the low and high values for the TE tunnel setup and
holding priorities.
Table 10-1 Implications of the Low and High Values for the TE Tunnel Setup and Holding Priorities
                   High Value                                Low Value
Setup Priority     Likely to preempt established TE          Less likely to preempt established
                   tunnels (preemptor).                      TE tunnels (nonpreemptor).
Holding Priority   Less likely to be preempted by a          Likely to be preempted by a
                   newly established TE tunnel               newly established TE tunnel
                   (nonpreemptable).                         (preemptable).
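Both priorities are set with a single command on the tunnel interface at the head-end router; a minimal sketch, using the same command that appears later in Listing 10-1 (the interface name is illustrative):
interface Tunnel0
 ! setup priority 7, holding priority 7
 tunnel mpls traffic-eng priority 7 7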
Adaptability
The adaptability attribute specifies whether an existing TE tunnel needs to be reoptimized
when a path better than the current TE tunnel path comes up. Reoptimization is discussed
later in this chapter.
Resilience
The resilience attribute specifies the desired behavior if the current TE tunnel path no
longer exists. This typically occurs due to network failures or preemption. Restoration of a
TE tunnel when the current path doesn't work is addressed later in this chapter.
Available Bandwidth
The available bandwidth attribute describes the amount of bandwidth available at each
setup priority. The advertised available bandwidth might not match the link's actual
bandwidth: in certain situations, a network operator can choose to oversubscribe a link by
assigning it a value higher than its actual bandwidth.
NOTE Available bandwidth for a higher setup priority should always be greater than or equal to
that for a lower setup priority.
Resource Class
The resource class attribute colors the link. As was discussed earlier in this chapter, a tunnel
decides to include or exclude a link in its path selection computation based on the link's
resource class attribute and the tunnel's own resource class affinity attribute.
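As a sketch of how the two sides of this test are configured (interface names are illustrative; the attribute-flag command appears later in this chapter's case study, and the affinity command sets the tunnel-side values displayed as Affinity 0x0/0xFFFF in the case study output):
interface POS0/0
 ! color the link with resource class 0x1
 mpls traffic-eng attribute-flag 0x1
!
interface Tunnel0
 ! the tunnel uses a link only if (link resource class AND mask) equals the affinity value
 tunnel mpls traffic-eng affinity 0x0 mask 0xFFFF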
Step 3 Runs a separate Shortest Path First (SPF) algorithm to compute the
shortest (minimum metric) path on the IGP's link-state database
after removing any pruned links.
This instance of the SPF algorithm is specific to the TE path in question
and is different from the SPF algorithm a router uses to build its routing
table based on the entire link-state database.
An explicit path for the TE tunnel is computed from this SPF run. The computed explicit
path is expressed as a sequence of router IP addresses. When a request to establish a TE
tunnel is made, an explicit path is chosen based on the path selection order.
TE Tunnel Setup
TE-RSVP is used to signal TE tunnels [2]. It uses the same original RSVP messages as the
generic signaling protocol, with certain modifications and extensions to support this new
application. TE-RSVP helps build an explicitly routed LSP to establish a TE tunnel.
TE-RSVP adds two important capabilities that enable it to build an Explicitly Routed Label
Switched Path (ER-LSP): a way to bind labels to RSVP flows and a way to explicitly route
RSVP messages. On an ER-LSP-based TE tunnel, the only sender is the LSP's first node,
and the only destination is the LSP's last node. All intermediate nodes in the LSP do normal
label switching based on the incoming label. The first node in the LSP initiates ER-LSP
creation. The first node in the LSP is also referred to as the head-end router.
The head-end router initiates a TE tunnel setup by sending an RSVP PATH message to the
tunnel destination IP address with a Source Route Object (SRO) specifying the explicit
route. The SRO contains a list of IP addresses with a pointer pointing to the next hop in the
list. All nodes in the network forward the PATH message to the next-hop address based on
the SRO. They also add the SRO to their path state block. When the destination receives the
PATH message, it recognizes that it needs to set up an ER-LSP based on the Label Request
Object (LRO) present in the PATH message and generates an RSVP reservation request
(RESV) message for the session. In the RSVP RESV message that it sends toward the
sender, the destination also creates a label and sends it as the LABEL object. A node
receiving the RESV message uses the label to send all traffic over that path. It also creates
a new label and sends it as a LABEL object in the RSVP RESV message to the next node
toward the sender. This is the label the node expects for all incoming traffic on this path.
NOTE The RSVP RESV message follows exactly the reverse path as the RSVP PATH message
because the SRO in the path state block established at each node by the PATH message
defines how the RSVP RESV message is forwarded upstream.
An ER-LSP is formed and the TE tunnel is established as a result of these operations. Note
that no resource reservations are necessary if the traffic being carried is best-effort traffic.
When resources need to be allocated to an ER-LSP, the normal RSVP objects, Tspec and
Rspec, are used for this purpose. The sender Tspec in the PATH message is used to define
the traffic being sent over the path. The RSVP PATH message destination uses this
information to construct appropriate receiver Tspecs and Rspecs used for resource
allocation at each node in the ER-LSP.
TE Path Maintenance
TE path maintenance performs path reoptimization and restoration functions. These
operations are carried out after the TE tunnel is established. Path reoptimization describes
the desired behavior in case a better potential TE path comes up after a TE path has already
been established. The TE tunnel's adaptability attribute indicates to the router whether it
should look for opportunities to reoptimize an existing TE path.
Path restoration describes how a TE tunnel is restored when the current path doesn't work.
The TE tunnel's resilience attribute describes this behavior.
TE-RSVP
TE-RSVP extends the available RSVP protocol to support LSP path signaling. TE-RSVP
uses RSVPs available signaling messages, making certain extensions to support TE. Some
important extensions include the following:
Label reservation support: To use RSVP for LSP tunnel signaling, RSVP needs to
support label reservation and installation. Unlike normal RSVP flows, TE-RSVP
uses RSVP to reserve labels for flows without any bandwidth reservations. A new
type of FlowSpec object is added for this purpose. TE-RSVP also manages the label
space used to reserve labels for flows.
Source routing support: LSP tunnels use explicit source routing. Explicit source
routing is implemented in RSVP by introducing a new object, the SRO.
RSVP host support: In TE-RSVP, RSVP PATH and RESV messages are originated
by the network head-end routers. This is unlike the original RSVP, in which RSVP
PATH and RESV messages are generated by applications in end-hosts.
Hence, TE-RSVP requires RSVP host support in routers.
IS-IS Modifications
The IS reachability TLV is extended to carry the new link resource information.
The extended IS reachability TLV is TLV type 22. Within this TLV, various sub-TLVs are
used to carry the link attributes for TE.
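The router isis block is trimmed from the configuration listings later in this chapter; as an illustrative sketch, enabling the IS-IS TE extensions typically involves configuration along these lines (the process tag isp matches the interface configuration in the listings; the level-2 choice is an assumption):
router isis isp
 ! advertise 32-bit-wide metrics, required to carry the TE sub-TLVs
 metric-style wide
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng level-2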
OSPF Modifications
Because the baseline OSPF Router LSA is essentially nonextensible, OSPF extensions for
TE use the Opaque LSA [5]. Three types of Opaque LSAs exist, each having a different
flooding scope. OSPF extensions for TE use only Type 10 LSAs, which have a flooding
scope of an area.
The new Type 10 Opaque LSA for TE is called the TE LSA. This LSA describes routers and
point-to-point links (similar to a Router LSA). For TE purposes, the existing Network LSA
suffices for describing multiaccess links, so no additional LSA is defined for this purpose.
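A comparable sketch for OSPF (the process number and area are illustrative):
router ospf 1
 mpls traffic-eng router-id Loopback0
 ! flood TE LSAs (Type 10 Opaque LSAs) within area 0
 mpls traffic-eng area 0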
TE Approaches
The example discussed in the beginning of this chapter typifies one approach to TE:
engineering traffic around points of congestion. This approach of scoping TE to only a few
paths might not work, however, when other traffic in the network is carried without TE.
A commonly recommended approach for TE uses the full mesh of LSP-based TE tunnels
between all the edge routers in a service provider network. Generally, these edge routers are
customer Points of Presence (POP) routers, or routers with peer connections to other
providers.
Figure 10-4 MPLS TE from the SanFrancisco Router to the NewYork Router
[Figure 10-4: Case study topology. Router SanFrancisco reaches router NewYork either through router Chicago (router-id 222.222.222.2) or through router Dallas (router-id 222.222.222.4). Labeled elements include the 222.222.222.20/30 link toward Chicago's POS0/1 interface, the 222.222.222.8/30 link from Chicago's POS0/0 to NewYork's POS8/0, SanFrancisco's POS5/1 interface, and Chicago's POS0/3 interface toward Dallas.]
The goal of the case study is to discuss the operation of the TE tunnel and the TE extensions
of IS-IS and RSVP based on the various show and debug commands on the router.
All four routers are configured with the addressing scheme and connectivity shown in
Figure 10-4. Integrated IS-IS is used as the IGP. IS-IS is set up for 32-bit-wide metric
values, as opposed to the normal 6-bit metrics. The Loopback0 interface IP address is used
as the router-id for MPLS TE.
To enable MPLS TE, the mpls traffic-eng tunnels command is enabled at both the global
and interface levels. RSVP is enabled on all the interfaces to facilitate the RSVP signaling
that establishes the TE tunnels. RSVP performs two major tasks: It distributes labels
to establish a label-switched path along an explicit path for the tunnel, and it makes any
necessary bandwidth reservations along the label-switched path. Weighted Fair Queuing
(WFQ) is enabled on the interfaces because RSVP depends on WFQ to provide bandwidth
guarantees.
Router SanFrancisco initiates the TE tunnel to the NewYork router. Interface tunnel0 is
configured for this purpose. The tunnel mode is set to TE by the tunnel mode mpls traffic-eng
command. A bandwidth reservation of 10 kbps is requested along the tunnel path. The
setup and holding priority for the tunnel are both set to 7, the highest configurable value.
An explicit TE tunnel path through the Chicago router is specified by an explicit path
named sfny. A path option of 1 is used to indicate that the explicitly specified path is the
preferred path for establishing the TE tunnel. A dynamically determined TE path is specified
as the fail-over path. Note that the path-option number specifies a path's order of preference
for becoming the TE tunnel path.
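The definition of the sfny explicit path itself is not shown in the trimmed listings; a sketch of what it would look like, with the next-hop addresses taken from the explicit route displayed later in Listing 10-5:
ip explicit-path name sfny enable
 ! hop toward Chicago, then toward NewYork
 next-address 222.222.222.22
 next-address 222.222.222.9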
Listings 10-1 through 10-4 list the configuration required on each router. Note that the
listings show only the MPLS-related configuration, removing any extraneous information.
Listing 10-1 Configuration on Router SanFrancisco
ip cef
clns routing
mpls traffic-eng tunnels
!
!
!
interface Loopback0
ip address 222.222.222.1 255.255.255.255
ip router isis isp
!
interface Tunnel0
ip unnumbered Loopback0
tunnel destination 222.222.222.3
tunnel mode mpls traffic-eng
tunnel mpls traffic-eng autoroute announce
tunnel mpls traffic-eng priority 7 7
tunnel mpls traffic-eng bandwidth 10
tunnel mpls traffic-eng path-option 1 explicit name sfny
tunnel mpls traffic-eng path-option 2 dynamic
tunnel mpls traffic-eng record-route
!
!
interface POS5/1
ip address 222.222.222.21 255.255.255.252
ip router isis isp
mpls traffic-eng tunnels
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
!
interface POS5/3
ip address 222.222.222.17 255.255.255.252
ip router isis isp
mpls traffic-eng tunnels
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
!
interface POS9/0
ip address 222.222.222.18 255.255.255.252
ip router isis isp
mpls traffic-eng tunnels
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
!
interface POS9/1
ip address 222.222.222.13 255.255.255.252
ip router isis isp
mpls traffic-eng tunnels
fair-queue 64 256 36
ip rsvp bandwidth 1158 1158
!
interface POS9/2
ip address 222.222.222.26 255.255.255.252
ip router isis isp
After enabling the preceding configurations on the routers, check the TE tunnel's status by
using the show mpls traffic-eng tunnel tunnel0 command. Listing 10-5 shows the output
of this command. The output shows the tunnel's status and the path option used in setting
up the tunnel. In addition, it shows the tunnel configuration and the RSVP signaling
information.
Listing 10-5 The show mpls traffic-eng tunnel tunnel0 Command Output When the Tunnel Is Taking the Path of
the Administratively Defined Explicit Route
SanFrancisco#sh mpls tr tun t0
path option 1, type explicit sfny (Basis for Setup, path weight 20)
path option 2, type dynamic
Config Paramters:
Bandwidth: 10 Priority: 7 7 Affinity: 0x0/0xFFFF
AutoRoute: enabled LockDown: disabled
InLabel : -
OutLabel : POS5/1, 26
RSVP Signalling Info:
Src 222.222.222.1, Dst 222.222.222.3, Tun_Id 0, Tun_Instance 1
RSVP Path Info:
My Address: 222.222.222.1
Explicit Route: 222.222.222.22 222.222.222.9
Record Route:
Tspec: ave rate=10 kbits, burst=1000 bytes, peak rate=10 kbits
RSVP Resv Info:
Record Route: 222.222.222.9 222.222.222.22
Fspec: ave rate=10 kbits, burst=1000 bytes, peak rate=Inf
As is evident from the output shown, the tunnel was set up by using the user-given explicit
path through Chicago. RSVP signaling shows the label 26 and the outgoing interface
POS5/1 used to send packets on the tunnel path. Because the Record Route option was
enabled using the tunnel mpls traffic-eng record-route command on the TE tunnel, the
RSVP RESV information records the route taken by the RESV message.
After the tunnel has been operational for some time, assume that the link from Chicago to
New York is no longer usable in the tunnel path. The Chicago-to-New York link can be made
unusable by simply bringing down the link or by changing the local link resource class
attribute. In this case study, the resource class attribute on the link from the Chicago router
to the NewYork router is changed to 0x1. The bitwise logical AND of the link resource class
(0x1) and the tunnel resource class mask (0x0000FFFF) is 0x1, which is not equal to the
tunnel resource affinity value (0x0000). Hence, the link cannot be used in the tunnel path.
The resource class attribute on the link from the Chicago router to the NewYork router is
changed to 0x1 by using the mpls traffic-eng attribute-flag 0x1 configuration command
on the Chicago router's POS0/0 interface.
This local link resource attribute change on the Chicago router makes the explicit path
unusable because it fails the resource affinity test.
Hence, the TE tunnel fails over to a dynamically selected path. While selecting a
dynamic path, the router removes the Chicago-to-NewYork link from the network topology
based on the local link resource policy. The router recomputes the TE tunnel path to run
through Dallas by running an instance of the SPF algorithm for the tunnel.
Listing 10-6 shows the output of the show mpls traffic-eng tunnel tunnel0 command
when the tunnel path is taking the dynamic path through the Dallas router.
Listing 10-6 The show mpls traffic-eng tunnel tunnel0 Command Output When the Tunnel Path Is Taking the
Dynamic Path Through the Dallas Router
SanFrancisco#sh mpls tr tun t0
path option 2, type dynamic (Basis for Setup, path weight 20)
path option 1, type explicit sfny
Config Paramters:
Bandwidth: 10 Priority: 7 7 Affinity: 0x0/0xFFFF
AutoRoute: enabled LockDown: disabled
InLabel : -
OutLabel : POS5/3, 26
RSVP Signalling Info:
Src 222.222.222.1, Dst 222.222.222.3, Tun_Id 0, Tun_Instance 2
RSVP Path Info:
My Address: 222.222.222.1
Explicit Route: 222.222.222.18 222.222.222.14 222.222.222.3
Record Route:
Tspec: ave rate=10 kbits, burst=1000 bytes, peak rate=10 kbits
RSVP Resv Info:
Record Route: 222.222.222.14 222.222.222.18
Fspec: ave rate=10 kbits, burst=1000 bytes, peak rate=Inf
After the path through Dallas is established, once again a link in the tunnel path becomes
unusable. The local resource attribute on the link from the SanFrancisco router to the Dallas
router is changed to 0x1, making the established TE path no longer usable. Hence, taking
the new local link resource attributes into consideration, the SanFrancisco router chooses
the TE path through the Chicago and Dallas routers to reach the NewYork router.
Listing 10-7 shows the output of the show mpls traffic-eng tunnel tunnel0 command
when the tunnel path is taking the dynamic path through the Chicago and Dallas routers to
go to the NewYork router.
Listing 10-7 The show mpls traffic-eng tunnel tunnel0 Command Output on the SanFrancisco Router When the
Tunnel Path Is Taking the Dynamically Determined Path via Chicago and Dallas to Go to New York
SanFrancisco#sh mpls traffic-eng tunnel t0
path option 2, type dynamic (Basis for Setup, path weight 30)
path option 1, type explicit sfny
Config Paramters:
Bandwidth: 10 Priority: 7 7 Affinity: 0x0/0xFFFF
AutoRoute: enabled LockDown: disabled
InLabel : -
OutLabel : POS5/0, 26
RSVP Signalling Info:
Src 222.222.222.1, Dst 222.222.222.3, Tun_Id 0, Tun_Instance 2
RSVP Path Info:
My Address: 222.222.222.1
Explicit Route: 222.222.222.22 222.222.222.26 222.222.222.14 222.222.222.3
Record Route:
Tspec: ave rate=10 kbits, burst=1000 bytes, peak rate=10 kbits
RSVP Resv Info:
Record Route: 222.222.222.14 222.222.222.26 222.222.222.22
Fspec: ave rate=10 kbits, burst=1000 bytes, peak rate=Inf
SanFrancisco#
The preceding example on the SanFrancisco router shows that the TE tunnel path to
NewYork is now established through the Chicago and Dallas routers.
To learn more about how the MPLS TE tunnel operates, look at Listing 10-8. It shows the
relevant beginning section of the show interface tunnel0 command output. The show isis
mpls traffic tunnel command can also be used to check on the established TE tunnels. Its
output is shown in Listing 10-9.
Listing 10-8 The show interface tunnel0 Command Output
SanFrancisco#sh interface tunnel 0
Tunnel0 is up, line protocol is up
Hardware is Tunnel
Interface is unnumbered. Using address of Loopback0 (222.222.222.1)
MTU 1496 bytes, BW 9 Kbit, DLY 500000 usec, rely 255/255, load 1/255
Encapsulation TUNNEL, loopback not set
Keepalive set (10 sec)
Tunnel source 222.222.222.1, destination 222.222.222.3
Tunnel protocol/transport Label Switching, key disabled, sequencing disabled
Checksumming of packets disabled, fast tunneling enabled
Listing 10-9 The show isis mpls traffic tunnel Command Output
SanFrancisco# sh isis mpls traffic-eng tunnel
System Id Tunnel Name Bandwidth Nexthop Metric Mode
NewYork.00 Tunnel0 100000 222.222.222.3
Listings 10-10 and 10-11 use commands that show the operation of IS-IS with TE
extensions. IS-IS with TE extensions not only gives the network topology with the link
metric information, but also carries TE attributes of all the links in the network. Listing
10-10 shows the output of a command to show the MPLS TE attributes of all the MPLS-
enabled links on the Chicago router with a router-id of 222.222.222.2.
Listing 10-10 A Display of the MPLS Traffic Engineering Attributes of All the MPLS-Enabled Links on the Chicago
Router with a router-id of 222.222.222.2
4c7507=R1#sh mpls traffic-eng topology 222.222.222.2
--------- ---------- --------- ----------
bw[0]: 0 1158 bw[1]: 0 1158
bw[2]: 0 1158 bw[3]: 0 1158
bw[4]: 0 1158 bw[5]: 0 1158
bw[6]: 0 1158 bw[7]: 0 1158
Each router in MPLS TE advertises its own link attributes to the rest of the network using
IS-IS extensions. The show isis mpls traffic-eng advertise command shows the TE
attributes being advertised by a router. Listing 10-11 shows the output of this command on
the SanFrancisco router.
Listing 10-11 TE Attributes Being Advertised by the SanFrancisco Router
SanFrancisco#sh isis mpls traffic-eng advertise
System ID: SanFrancisco.00
Router ID: 222.222.222.1
Link Count: 2
Link[1]
Neighbor System ID: W1-R2c4500m=R2.00 (broadcast link)
Interface IP address: 222.222.222.21
Neighbor IP Address: 222.222.222.22
Admin. Weight: 10
Physical BW: 1544000 bits/sec
Reservable BW: 1158000 bits/sec
BW unreserved[0]: 1158000 bits/sec, BW unreserved[1]: 1158000 bits/sec
BW unreserved[2]: 1158000 bits/sec, BW unreserved[3]: 1158000 bits/sec
BW unreserved[4]: 1158000 bits/sec, BW unreserved[5]: 1158000 bits/sec
BW unreserved[6]: 1158000 bits/sec, BW unreserved[7]: 1148000 bits/sec
Affinity Bits: 0x00000000
Link[2]
Neighbor System ID: Y1-R5c7513=R4.00 (broadcast link)
Interface IP address: 222.222.222.17
Neighbor IP Address: 222.222.222.18
Admin. Weight: 10
Physical BW: 1544000 bits/sec
Reservable BW: 1158000 bits/sec
BW unreserved[0]: 1158000 bits/sec, BW unreserved[1]: 1158000 bits/sec
BW unreserved[2]: 1158000 bits/sec, BW unreserved[3]: 1158000 bits/sec
SanFrancisco(config)#int t0
SanFrancisco(config-if)#tunnel mpls traffic-eng autoroute announce
Listing 10-13 shows the information on the tunnel destination 222.222.222.3 in the Cisco
Express Forwarding (CEF) table by using the show ip cef 222.222.222.3 internal
command. This command output shows the label imposed on the traffic to the tunnel
destination.
Listing 10-13 Information on the Tunnel Destination 222.222.222.3 in the CEF Table
SanFrancisco#sh ip cef 222.222.222.3 internal
222.222.222.3/32, version 253
0 packets, 0 bytes
has label information: local label: tunnel head
fast label rewrite: Tu0, point2point, labels imposed 26
via 222.222.222.3, Tunnel0, 0 dependencies
next hop 222.222.222.3, Tunnel0
valid adjacency
RSVP establishes labels to create a label-switched MPLS tunnel path. Each router keeps a
label forwarding table to switch packets based on their incoming label. Listing 10-14 shows
the label forwarding tables on the routers in the TE tunnel path to the destination. Note that
the Dallas router removes the MPLS label before sending the packet to the destination
NewYork router. Hence, the NewYork router's label forwarding table is not shown in this
example.
Listing 10-14 Information on the Label Forwarding Table on the SanFrancisco, Chicago, and Dallas Routers
SanFrancisco#sh mpls forwarding-table 222.222.222.3 32 detail
Local Outgoing Prefix Bytes mpls Outgoing Next Hop
label label or VC or Tunnel Id switched interface
Tun hd Unlabeled 222.222.222.3/32 0 Tu0 point2point
MAC/Encaps=4/8, MTU=1500, Label Stack{26}, via POS5/1
0F008847 0001A000
Figure 10-5 shows how packets are label-switched along the TE tunnel's path. The
SanFrancisco router imposes a label of 26 and sends all traffic destined to or through
the tunnel destination on interface POS5/1. The Chicago router receives the packet with
an incoming label of 26. Based on its label forwarding table, the Chicago router swaps
the incoming label in the packet with an outgoing label of 30 and sends it on interface
POS0/3. The packet now arrives on the Dallas router with an incoming label of 30. Because
Dallas is the penultimate hop before the destination, it removes the MPLS label and sends
the packet toward the destination through interface POS9/1.
Figure 10-5 LSP for the TE Tunnel from San Francisco to New York
As noted previously, RSVP is used to establish TE tunnels. On the Chicago router, debug
information is captured to show the contents of the RSVP messages used to establish the
tunnel. Listings 10-15 and 10-16 display the contents of the RSVP PATH and RESV
messages, respectively, as seen by the Chicago router. Note the new and modified RSVP
objects and the information they carry.
Listing 10-15 RSVP PATH Message That the Chicago Router Receives from the SanFrancisco Router
RSVP: version:1 flags:0000 type:PATH cksum:0000 ttl:254 reserved:0 length:248
SESSION type 7 length 16:
Destination 222.222.222.3, TunnelId 0, Source 222.222.222.1
HOP type 1 length 12: DEDEDE15
: 00000000
TIME_VALUES type 1 length 8 : 00007530
EXPLICIT_ROUTE type 1 length 28:
(#1) Strict IPv4 Prefix, 8 bytes, 222.222.222.22/32
(#2) Strict IPv4 Prefix, 8 bytes, 222.222.222.26/32
(#3) Strict IPv4 Prefix, 8 bytes, 222.222.222.14/32
LABEL_REQUEST type 1 length 8 : 00000800
SESSION_ATTRIBUTE type 7 length 24:
setup_pri: 7, reservation_pri: 7 MAY REROUTE
SESSION_NAME:SanFrancisco_t0
SENDER_TEMPLATE type 7 length 12:
Source 222.222.222.1, tunnel_id 1
SENDER_TSPEC type 2 length 36:
version=0, length in words=7
service id=1, service length=6
parameter id=127, flags=0, parameter length=5
average rate=1250 bytes/sec, burst depth=1000 bytes
peak rate =1250 bytes/sec
min unit=0 bytes, max unit=0 bytes
ADSPEC type 2 length 84:
version=0 length in words=19
General Parameters break bit=0 service length=8
IS Hops:0
Minimum Path Bandwidth (bytes/sec):2147483647
Path Latency (microseconds):0
Path MTU:-1
Guaranteed Service break bit=0 service length=8
Path Delay (microseconds):0
Path Jitter (microseconds):0
Path delay since shaping (microseconds):0
Path Jitter since shaping (microseconds):0
Controlled Load Service break bit=0 service length=0
RECORD_ROUTE type 1 length 12:
(#1) IPv4 address, 222.222.222.21/32
Note the Explicit Route object in the RSVP PATH message that gives the exact path to be
taken by the RSVP PATH message to establish the label-switched tunnel path. It also carries
a LABEL REQUEST object to request label distribution and a RECORD ROUTE object to
record the path taken by the message.
Listing 10-16 RSVP RESV Message Sent by the Chicago Router to the SanFrancisco Router
RSVP: version:1 flags:0000 type:RESV cksum:D748 ttl:255 reserved:0 length:136
SESSION type 7 length 16:
Destination 222.222.222.3, TunnelId 0, Source 222.222.222.1
HOP type 1 length 12: DEDEDE16
: 00000000
TIME_VALUES type 1 length 8 : 00007530
STYLE type 1 length 8 :
RSVP_SE_OPTION
FLOWSPEC type 2 length 36:
version = 0 length in words = 7
service id = 5, service length = 6
tspec parameter id = 127, tspec flags = 0, tspec length = 5
average rate = 1250 bytes/sec, burst depth = 1000 bytes
peak rate = 2147483647 bytes/sec
min unit = 0 bytes, max unit = 0 bytes
FILTER_SPEC type 7 length 12:
Source 222.222.222.1, tunnel_id 1
LABEL type 1 length 8 : 0000001A
RECORD_ROUTE type 1 length 28:
(#1) IPv4 address, 222.222.222.14/32
(#2) IPv4 address, 222.222.222.26/32
(#3) IPv4 address, 222.222.222.22/32
Note that the RSVP RESV message carries the LABEL 0x1A (26 in decimal) and the
FLOWSPEC information. Its RECORD ROUTE object recorded the path taken by the
message. The RSVP session and the installed reservation are shown in Listing 10-17.
Listing 10-17 The RSVP Session and Installed Reservation Information from the SanFrancisco Router
SanFrancisco#show ip rsvp host sender
To From Pro DPort Sport Prev Hop I/F BPS Bytes
222.222.222.3 222.222.222.1 0 0 1 10K 1K
By default, the Time-to-Live (TTL) value in the IP header is copied to the TTL value in the
MPLS header at the edge router of a label-switched path. Hence, a traceroute utility
normally shows all the hops in a label-switched tunnel path. You can mask the label-
switched hops in a network from outside users, however, by not copying the TTL value of
the IP header into the MPLS header at the edge of the MPLS network. Listing 10-18 shows
the output of a traceroute utility on the SanFrancisco router before and after enabling the
no mpls ip propagate-ttl command.
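A sketch of how the command is entered in global configuration mode:
SanFrancisco#configure terminal
SanFrancisco(config)#no mpls ip propagate-ttl
SanFrancisco(config)#end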
Listing 10-18 The traceroute Utility Output With and Without Setting the IP Header TTL Value in the MPLS Label
SanFrancisco#trace 222.222.222.3
SanFrancisco#trace 222.222.222.3
Summary
MPLS TE uses MPLS circuit-switching properties to engineer TE tunnels with adequate
reserved resources along the tunnel path based on the traffic carried by the tunnel. TE-
RSVP is used to establish the LSP-based TE path for a TE tunnel. TE extensions for OSPF
and IS-IS enable them to carry the available link resource information all over the network.
The famous quote on TE by Mike O'Dell of UUNET ("The efficacy with which one
uses the available bandwidth in the transmission fabric directly drives the fundamental
manufacturing efficiency of the business and its cost structure") eloquently captures the
reasoning for TE.
References
1 Requirements for Traffic Engineering Over MPLS, D. Awduche et al., draft-ietf-mpls-traffic-eng-00.txt, work in progress.
2 Extensions to RSVP for LSP Tunnels, D. Awduche et al., draft-ietf-mpls-rsvp-lsp-tunnel-04.txt, work in progress.
3 IS-IS Extensions for Traffic Engineering, H. Smit and T. Li, draft-ietf-isis-traffic-00.txt, work in progress.
4 Traffic Engineering Extensions to OSPF, D. Katz and D. Yeung, draft-katz-yeung-ospf-traffic-00.txt, work in progress.
5 The OSPF Opaque LSA Option, R. Coltun, RFC 2370, July 1998.
PART IV
Appendixes
Appendix A Cisco Modular QoS Command-Line Interface
APPENDIX A
Cisco Modular QoS Command-Line Interface
Modular quality of service (QoS) command-line interface (CLI) defines a new modular
CLI-based QoS configuration in Cisco IOS Software. This new framework for defining
QoS policies divides the QoS configuration into three modules: traffic class definition,
policy definition, and policy application.
Modular QoS CLI applies equally well to the various Internet Protocol (IP) QoS,
Multiprotocol Label Switching (MPLS) QoS, and Asynchronous Transfer Mode (ATM) and
Frame Relay virtual circuit (VC)-based QoS models. In terms of policy definition, all QoS
models share a common thread of policies. They include rate-shaping or rate-limiting
policies, such as Committed Access Rate (CAR), Generic Traffic Shaping (GTS), and
Frame Relay Traffic Shaping (FRTS); packet scheduling policies, such as Weighted Fair
Queuing (WFQ); and packet drop policies, such as Weighted Random Early Detection
(WRED).
The primary difference between the various QoS models is their traffic classification policy.
IP precedence-based and Differentiated Services Code Point (DSCP)-based IP QoS differ,
for example, in terms of the bits they use to define a traffic class. After a traffic class is
defined, the same standard policies are used to deliver differentiated service among the
classes. Similarly, IP and MPLS QoS differ only in terms of the bits in the packet header
they use to define a traffic class. Frame Relay- and ATM-based QoS models differ from IP
QoS in terms of their level of policy application. IP QoS policies are applied at the interface
level, whereas in ATM and Frame Relay models, QoS is applied at the VC level.
In the modular QoS CLI model, you can configure QoS policies in a router in three steps,
one step for each module in the new CLI model:
Step 1 Traffic class definition: Configuring the traffic classification policy.
This is achieved using a class-map configuration command.
Step 2 Policy definition: Configuring policies on the various traffic classes
defined. This is achieved using a policy-map configuration command.
Step 3 Policy application: Enabling the policy on an interface or on a VC in the
ATM or Frame Relay QoS models. This is achieved using the service-policy
configuration command.
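Taken together, the three steps look along these lines; this is a minimal sketch with hypothetical class and policy names:
! Step 1: traffic class definition
class-map match-all gold
 match ip precedence 5
! Step 2: policy definition
policy-map gold-policy
 class gold
  bandwidth 1000
! Step 3: policy application
interface Hssi0/0/0
 service-policy output gold-policy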
class-map class2
match ip dscp EF
class-map class3
match destination <mac-address>
class-map class4
match input-interface Hssi0/0/0
class-map class5
match protocol https
Another necessary element in the traffic classification definition is the ability to define a
traffic class such that it matches all packets meeting the criteria of any or all of the
previously defined classes. Two keywords, match-any and match-all, are added for this
purpose. These keywords define a logical OR and a logical AND operation, respectively, on
the match subcommands defined under the class-map definition. Listing A-2 defines a
class-any traffic class that matches any traffic matching either of the previously defined
class1 and class2 traffic classes.
Listing A-2 Example of match-any class-map Keyword Usage
class-map class1
 match <>
class-map class2
 match <>
class-map match-any class-any
 match class-map class1
 match class-map class2
Listing A-3 defines a class-all traffic class that matches traffic matching both of the
previously defined class1 and class2 traffic classes.
Listing A-3 Example of match-all class-map Keyword Usage
class-map class1
 match <>
class-map class2
 match <>
class-map match-all class-all
 match class-map class1
 match class-map class2
Policy Definition
After traffic class configuration, the next step is defining the QoS policies on the previously
defined traffic classes. Policies are defined using the policy-map command. You can
implement any QoS function as a subcommand under the policy-map definition. These
include all edge QoS features, such as rate-limiting, rate-shaping, and IP precedence or
DSCP settings, as well as core QoS features, such as WFQ and WRED.
Listings A-4 and A-5 show examples of policies meant for network boundary and core
network interfaces, respectively.
Listing A-4 Examples of Policies Meant for Network Boundary Interfaces
policy-map epolicy1
 class class1
  rate-limit <>
policy-map epolicy2
 class class2
  set ip dscp EF
policy-map epolicy3
 class class3
  shape <>
  set ip precedence 4
In Listing A-4, epolicy1 defines a rate-limiting policy on the class1 traffic class, epolicy2
sets the DSCP field to EF on the class2 traffic class, and epolicy3 enables traffic shaping as
well as sets the IP precedence value to 4 for all traffic belonging to the class3 class.
Listing A-5 Examples of Policies Meant for Core Network Interfaces
policy-map cpolicy1
 class class1
  bandwidth <>
  random-detect <>
In Listing A-5, cpolicy1 defines a certain minimum bandwidth and WRED policies on the
class1 traffic class.
An example of a multidimensional policy configuration is shown in Listing A-6. In this
example, epolicy1 defines a traffic rate-limiting policy on traffic class class1 and sets the
IP precedence value to 3 for traffic belonging to the class2 class.
Listing A-6 Multidimensional Policies
policy-map epolicy1
 class class1
  rate-limit <>
 class class2
  set ip precedence 3
Policy Application
After configuring all relevant class-maps and policy-maps, the final step is enabling the
policy on an interface by associating a policy-map with the interface using the
service-policy command. An additional keyword, input or output, specifies whether
the policy applies to incoming or outgoing packets on the interface, respectively.
Listing A-7 shows examples for enabling a policy on an interface. A policy1 policy is
applied on all the input traffic arriving on interface HSSI0/0/0. A policy2 policy is applied
on the output traffic of interface POS0/0/0.
Listing A-7 Examples of Enabling a Policy on an Interface
interface HSSI0/0/0
service-policy input policy1
interface POS0/0/0
service-policy output policy2
The service-policy command is also available on a per-VC basis on ATM and Frame Relay
interfaces, and on a per-interface basis on logical interfaces such as Tunnel and Fast
EtherChannel interfaces, provided the policies specified by the associated policy-map are
supported over a VC or logical interface.
Hierarchical Policies
For certain applications, it is necessary to allow definition of hierarchical policies.
Given that service policies can be attached to interfaces as well as to individual VCs on a
Frame Relay or ATM interface, there is already an implied hierarchy of policies that you
can configure by attaching policies at different layers of the hierarchy. (You can attach one
service-policy command to a Frame Relay interface and a different service-policy
command to a permanent virtual circuit [PVC] on the Frame Relay interface.) In a real
sense, an interface or PVC, though not defined by means of a class-map command,
represents a traffic class that shares a common attribute. Hence, you also can configure a
service-policy command not only on an interface or PVC, but also on a traffic class under
a policy-map command.
Listing A-8 shows how you can apply a service-policy command on a traffic class to define
a policy.
Listing A-8 Attaching a Service Policy Directly to a Class
policy-map policy-hierarchy
class class1
service-policy <>
class-map tcp
 match <all tcp traffic>
class-map telnet
 match <all telnet traffic>
class-map ftp
 match <all ftp traffic>
policy-map telnet-ftp-police
class telnet
rate-limit 1000000
class ftp
rate-limit 1000000
policy-map TCP-police-hierarchical
class tcp
rate-limit 10000000
service-policy telnet-ftp-police
In the preceding listing, the classes tcp, telnet, and ftp define the traffic belonging to TCP
and TCP-based applications, Telnet, and FTP, respectively. A telnet-ftp-police policy is
defined to rate-limit traffic belonging to Telnet and FTP applications to 1 Mbps each.
Finally, the TCP-police-hierarchical policy is defined to enable hierarchical policy
configuration. This hierarchical policy rate-limits traffic belonging to the tcp class to 10
Mbps, while at the same time using the telnet-ftp-police policy to rate-limit the individual
TCP-based application traffic belonging to Telnet and FTP to 1 Mbps each.
4 Input rate-limiting
5 Policy-based routing
7 Output rate-limiting
Apart from the QoS policies, the input packet accounting occurs immediately after the input
access-lists are applied, and the output packet accounting occurs immediately before the
packet goes on the wire.
in which matches are applied. The sequence number also enables easy insertion of new
classes into the ordering.
Consider Listing A-10, where a match against class1 is performed prior to the match
against class2 (sequence number 10 before 20) for policy-map policy1.
Listing A-10 Intra-Policy Execution Order
policy-map policy1
class class1
rate-limit 10 input <>
class class2
rate-limit 20 input <>
In Listing A-11, the policy2 policy-map matches traffic against class2 before class1
(sequence number 5 before 10). It illustrates how you can change the order in which
policies are executed.
Listing A-11 Intra-Policy Execution Order Changed by Modifying the Sequence Number
policy-map policy2
 class class1
  rate-limit 10 input <>
 class class2
  rate-limit 5 input <>
APPENDIX B
Process Switching
In process switching, a packet arrives on an incoming interface and is enqueued on the input
queue of the process that switches the packet. When the process is scheduled to run by the
scheduler, the process looks in the routing table for a route to the destination address. If a
route is found, the next-hop address is retrieved from the routing table. The Layer 2 Media
Access Control (MAC) rewrite information for this next hop is derived from the Address
Resolution Protocol (ARP) table, and the packet is now enqueued on the outbound interface
for transmission. Process switching is slow, inefficient, and processor-intensive because
every packet-switching decision involves lookups in the routing table and in the ARP table.
Process switching does per-packet load balancing when multiple equal cost paths exist to a
destination. Process-switching operation is depicted in Figure B-1.
Route-Cache Forwarding
Route-cache forwarding addresses some of the problems with process switching. In this
method, after the first packet to a destination is process-switched, the destination address,
next-hop interface, and MAC address encapsulation to that next hop are all stored in a
single table called the route cache. You can quickly switch subsequent packets to the
destination by looking up the destination in this route cache. In this method, the switching
decision is made on the same receive interrupt that fetched the incoming packet. Route-
cache forwarding is also commonly referred to as fast switching. The route-cache
forwarding mechanism for packet forwarding is depicted in Figure B-2.
Figure B-1 Process Switching Packets by Doing Internet Protocol (IP) Routing Table and ARP Table Lookups
[Figure B-1: An incoming frame's packet is switched at Layer 3 through IP routing table and ARP table lookups, the Layer 2 rewrite is applied, and the packet leaves in a new frame.]
[Figure B-2: Route-cache forwarding. The first packet to a destination is process-switched and creates an IP cache entry; subsequent frames are switched directly from the cache entry between the In and Out interfaces.]
Route-cache forwarding populates a fast lookup cache on demand for destination prefixes.
A route-cache entry for a destination is created only after the router receives the first packet
to that destination. This first packet to a destination is process-switched, but any subsequent
packets to the same destination are switched by looking them up in the faster and more
efficient route cache. Route-cache entries are periodically aged out. In addition, network
topology changes can immediately invalidate these entries.
This demand-caching scheme is efficient for scenarios in which most of the traffic flows
are associated with a subset of the destinations in the routing table. Traffic profiles in
the Internet core and in large intranets, however, no longer fit this description. Traffic
characteristics have changed toward an increased number of short-duration flows, typically
sourced by Web-based and interactive applications. This changed pattern of Internet traffic
calls for a paradigm change in the switching mechanismone that reduces the increased
cache-maintenance activity caused by a greater number of short-duration flows and
network topology changes, as reflected by the routing table.
When multiple equal cost paths exist to a destination subnet, route-cache forwarding caches
/32 host prefixes for all traffic going to that subnet to accomplish load balancing.
CEF
CEF is a scalable, Layer 3 switching mechanism designed to accommodate the changing
traffic characteristics and network dynamics of the Internet and of large intranets. CEF
provides a number of improvements over the traditional route-cache switching approach.
CEF Advantages
CEF avoids the potential overhead of continuous cache activity by using a topology-driven
CEF table for the destination switching decision. The CEF table mirrors the entire contents
of the routing table; there is a one-to-one correspondence between CEF table entries
and routing table prefixes. Any route recursion is also resolved while creating a CEF entry.
This translates into significant benefits in terms of performance, scalability, network
resilience, and functionality. CEF's benefits are most apparent in large, complex networks
that have dynamic traffic patterns.
CEF also avoids performance hits during times of network instability by dropping packets
for which a CEF entry is missing rather than trying to process-switch them based on the
routing table. Because CEF is based on routing information, a missing CEF entry
automatically implies a missing route entry, which itself might be due to lost peering
sessions or network instability.
An adjacency table is maintained along with the CEF table. The adjacency table keeps the
Layer 2 header information separate from the CEF table and is populated by any protocol
(ARP, Open Shortest Path First [OSPF], BGP, and so on) that discovers an adjacency. Each
adjacent node's link-layer header is precomputed and stored along with the adjacency.
The CEF table is populated by callbacks from the routing table. After a route is resolved,
its corresponding CEF entry points to a next hop, which should be an adjacency. If an
adjacency is found in the adjacency table, a pointer to the appropriate adjacency is cached
in the CEF entry. Figure B-3 depicts a CEF switching operation.
[Figure B-3: CEF switching. The IP routing table populates the CEF table; frames are switched from the In interface to the Out interface using the CEF table together with the adjacency information.]
In addition to regular physical adjacencies, some special handling is also required for
special adjacency types. CEF entries with prefixes that need special processing are cached
with the appropriate special adjacency type. Table B-1 shows examples of these adjacency
types.
Table B-1 Examples of Certain Common Special Adjacency Types
Receive: Packets to these prefixes are intended for the router. CEF entries with this adjacency are /32 prefixes of the router's IP addresses and direct and normal broadcast addresses.
Null: Packets to these prefixes are destined to the router's Null0 interface. CEF entries with this adjacency are created for prefixes destined to the router's Null0 interface. Packets matching CEF entries that point to a Null adjacency are dropped by CEF.
Glean: CEF entries with Glean adjacency are created for subnet prefixes that are directly connected to one of the router's non-point-to-point interfaces. For packets that need to be forwarded to an end station, the adjacency database is gleaned for the specific prefix.
Punt: CEF entries with Punt adjacency are created for prefixes that can't be CEF-switched because certain features might require special handling or are not yet CEF-supported.
NOTE The Null0 interface is a pseudo-interface that functions similarly to the null devices
available on most operating systems. This interface is always up and can never forward or
receive traffic. It provides an alternative method of filtering traffic.
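For example, traffic to a prefix can be discarded by pointing a static route at Null0 (a sketch; the prefix is illustrative):
! all packets to 10.10.10.0/24 match a CEF entry with a Null adjacency and are dropped
ip route 10.10.10.0 255.255.255.0 Null0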
Unlike the route-cache model, which caches /32 prefixes to accomplish load balancing,
CEF uses an efficient hash function lookup to provide per-source/destination pair load
balancing. A hash function points to a unique adjacency for each source and destination
address. In addition, CEF can do per-packet load balancing by using a pointer that moves
among the equal-cost adjacencies to a destination in a round-robin manner.
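The load-sharing mode is selected per interface; a sketch (the interface name is illustrative):
interface Hssi2/0/0
 ! use per-packet instead of the default per-destination load sharing
 ip load-sharing per-packet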
Other significant advantages of CEF are extensive CEF packet accounting statistics and
QoS policy propagation. Appendix C discusses QoS policy propagation.
NOTE Route-cache forwarding can also operate in a distributed mode by downloading the IP
cache information to the interface line cards.
NOTE This case study is intended to illustrate how CEF works and to differentiate it from the
route-cache model. By enabling route-cache and CEF forwarding one after the other, this
case study discusses the route-cache and the CEF forwarding tables on Router R2 to foster
a better understanding of both route-cache forwarding and CEF. The case study also
discusses the various types of CEF entries.
[Figure: Case study topology. ISP network (AS109) routers R1, R2, R3, and R5 are interconnected over HSSI and serial links in the 200.200.200.0/24 address space (including 200.200.200.100/30, 200.200.200.104/30, and 200.200.200.108/30) and the 191.108.10.8/30 and 191.108.10.12/30 subnets. R5 connects onward to the 20.0.0.0/8, 131.108.0.0/16, 191.108.0.0/16, and 210.210.210.0/24 networks. The customer network (AS 4567) routers R4 and R6 attach through the 200.200.200.16/28 and 201.201.201.16/28 subnets and the 201.201.201.8/30 link.]
Because both CEF and the route-cache model switch packets based on routing information,
first look at the routing table in Router R2. Listing B-1 shows the routing information in
Router R2, obtained by using the show ip route command.
Listing B-1 Routing Table of Router R2
R2#sh ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
       U - per-user static route, o - ODR, P - periodic downloaded static route
       T - traffic engineered route
Gateway of last resort is not set
200.200.200.0/24 is variably subnetted, 8 subnets, 4 masks
C 200.200.200.104/30 is directly connected, Hssi2/0/0
O 200.200.200.108/30 [110/20] via 200.200.200.101, Hssi2/0/1
The IP routing table is independent of the packet switching mechanism, CEF or route-cache
forwarding. It is populated mainly by dynamic routing protocols, connected subnets, and
static routes.
Before enabling CEF on Router R2, use the show ip cache verbose command to look at
the cache table used in the route-cache model. Listing B-2 shows a snapshot of Router R2's
route cache table.
Listing B-2 A Snapshot of Router R2 Doing Route-Cache Forwarding
R2#sh ip cache verbose
IP routing cache 7 entries, 1196 bytes
1967 adds, 1960 invalidates, 0 refcounts
Minimum invalidation interval 2 seconds, maximum interval 5 seconds,
quiet interval 3 seconds, threshold 0 requests
Invalidation rate 0 in last second, 0 in last 3 seconds
The cache table is always in a state of flux, as it is traffic-driven. A cache entry to a prefix
is created only after the first packet to that prefix address space is process-switched. Note
the following characteristics of the route-cache-based forwarding table:
The cache table is classful. Its prefix length can only be 0, 8, 16, 24, or 32.
A Class A, B, or C network prefix cannot have a cache prefix length of other than 8,
16, or 24, respectively.
The length of a prefix in the cache table is equal to the most specific subnet of that
particular prefix's class network in the IP routing table. The cache prefix length of the
200.200.200.0/24 address space, for example, is /32 because at least one route in the
200.200.200.0/24 address space exists with a /32 mask in the routing table.
The length of a prefix is shown in the format x-y in the cache table. Here, x indicates the
actual length of the cache entry, and y indicates the length of the matching prefix in the
routing table. A cache entry to a prefix also shows the outgoing interface, the next-hop IP
address, and the Layer 2 encapsulation.
CEF is enabled on Router R2 using the global command ip cef. Listings B-3 and B-4 show
Router R2s CEF adjacency table and CEF forwarding table.
Listing B-3 CEF Adjacency Table of Router R2
R4506#show adjacency detail
Protocol Interface Address
IP Hssi2/0/0 point2point(26)
0 packets, 0 bytes
0F000800
CEF expires: 00:02:47
refresh: 00:00:47
IP Hssi2/0/1 point2point(16)
0 packets, 0 bytes
0F000800
CEF expires: 00:02:01
refresh: 00:00:01
IP FastEthernet1/0/0 200.200.200.20(16)
0 packets, 0 bytes
00000C31DD8B00E0B0E2B8430800
ARP 00:08:00
The adjacency table shows the Layer 2 encapsulation needed to reach various directly
connected routers and end-hosts. Adjacencies are discovered by means of ARP, routing
protocols, or map configuration commands used typically on ATM and Frame Relay
interfaces. The adjacencies are periodically refreshed to keep them current.
Listing B-4 CEF Forwarding Table of Router R2
R2#sh ip cef
Prefix Next Hop Interface
0.0.0.0/32 receive
20.0.0.0/8 200.200.200.106 Hssi2/0/0
131.108.0.0/16 200.200.200.106 Hssi2/0/0
191.108.0.0/16 200.200.200.106 Hssi2/0/0
191.108.10.8/30 200.200.200.106 Hssi2/0/0
191.108.10.12/30 200.200.200.106 Hssi2/0/0
194.194.194.0/24 200.200.200.106 Hssi2/0/0
200.200.200.0/24 attached Null0
200.200.200.1/32 200.200.200.101 Hssi2/0/1
200.200.200.2/32 receive
200.200.200.3/32 200.200.200.106 Hssi2/0/0
200.200.200.16/28 attached FastEthernet1/0/0
200.200.200.16/32 receive
200.200.200.17/32 receive
200.200.200.31/32 receive
200.200.200.100/30 attached Hssi2/0/1
200.200.200.100/32 receive
200.200.200.101/32 200.200.200.101 Hssi2/0/1
200.200.200.102/32 receive
200.200.200.103/32 receive
200.200.200.104/30 attached Hssi2/0/0
200.200.200.104/32 receive
200.200.200.105/32 receive
200.200.200.107/32 receive
200.200.200.108/30 200.200.200.101 Hssi2/0/1
200.200.200.106 Hssi2/0/0
201.201.201.8/30 200.200.200.106 Hssi2/0/0
201.201.201.16/28 200.200.200.106 Hssi2/0/0
210.210.0.0/16 200.200.200.106 Hssi2/0/0
224.0.0.0/4 0.0.0.0
224.0.0.0/24 receive
255.255.255.255/32 receive
The CEF table is stable as long as the topology, as reflected by the routing table, stays the
same. All routing table entries have a one-to-one matching entry in the CEF table. CEF
entries of various adjacency types in the remaining part of the case study are discussed next.
Listings B-5 through B-10 show examples of the various CEF entry types.
Listing B-5 An Example of a CEF Entry with Receive Adjacency
R2#sh ip cef 200.200.200.105
200.200.200.105/32, version 6, receive
CEF, in addition to maintaining a matching CEF entry for each route in the routing table,
carries CEF entries with a Receive adjacency for all connected IP addresses, directed
broadcast addresses (the first and the last addresses of each connected subnet) of the router,
and general broadcast addresses (0.0.0.0 and 255.255.255.255). 200.200.200.105 is the IP
address of the router's Hssi2/0/0 interface.
Listing B-6 An Example of a Recursive CEF Entry with Valid Cached Adjacency
R2#sh ip cef 131.108.0.0
131.108.0.0/16, version 20, cached adjacency to Hssi2/0/0
0 packets, 0 bytes
via 191.108.10.10, 0 dependencies, recursive
next hop 200.200.200.106, Hssi2/0/0 via 191.108.10.8/30
valid cached adjacency
The routing table entry for 131.108.0.0 has a next-hop address of 191.108.10.10, which is
not a directly connected next hop. A recursive lookup on 191.108.10.10 is required to
arrive at the directly connected next hop of 200.200.200.106 for 131.108.0.0. CEF does
this recursive lookup before creating the CEF entry for 131.108.0.0. Hence,
131.108.0.0 gets a connected next hop of 200.200.200.106 in the CEF table.
In the route-cache forwarding model, any recursive lookup for a destination is done in the
process-switched path when the first packet to that destination triggers creation of the
cache entry. CEF instead precomputes recursive lookups before packets arrive, which
saves CPU cycles.
Listing B-7 An Example of a CEF Entry with Valid Cached Adjacency with Dependencies
R2#sh ip cef 191.108.10.10
191.108.10.8/30, version 17, cached adjacency to Hssi2/0/0
0 packets, 0 bytes
via 200.200.200.106, Hssi2/0/0, 4 dependencies
next hop 200.200.200.106, Hssi2/0/0
valid cached adjacency
This CEF entry denotes four dependencies: four recursive entries (20.0.0.0/8,
131.108.0.0/16, 191.108.0.0/16, and 210.210.0.0/16) depend on this CEF entry to
resolve their next hop.
Listing B-8 An Example of a CEF Entry for Equal Cost Paths with Per-Destination Load Sharing
R2#sh ip cef 200.200.200.108
200.200.200.108/30, version 12, per-destination sharing
0 packets, 0 bytes
via 200.200.200.101, Hssi2/0/1, 0 dependencies
traffic share 1
next hop 200.200.200.101, Hssi2/0/1
valid adjacency
via 200.200.200.106, Hssi2/0/0, 0 dependencies
traffic share 1
next hop 200.200.200.106, Hssi2/0/0
valid adjacency
0 packets, 0 bytes switched through the prefix
200.200.200.108/30 has two equal-cost paths and load-shares on a per source-destination
pair basis. (Note that the term per-destination sharing in Listing B-8 is not entirely
accurate; it is actually per source-destination-pair load sharing.) Load sharing on a
per-packet basis is also supported under CEF.
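Per-packet load sharing is enabled per interface. The following is a minimal sketch,
assuming CEF is already enabled globally and using one of R2's HSSI links for illustration:

interface Hssi2/0/1
 ip load-sharing per-packet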
Listing B-9 An Example of CEF Entry with Glean Adjacency
R2#sh ip cef 200.200.200.16 255.255.255.240
200.200.200.16/28, version 26, attached, connected
0 packets, 0 bytes
via FastEthernet1/0/0, 0 dependencies
valid glean adjacency
A glean adjacency indicates a directly attached subnet for which the router does not yet
have a host-specific Layer 2 rewrite. When a packet arrives for a host on such a subnet, the
router gleans the host's Layer 2 address by means of ARP and installs a host adjacency
for it.
Table B-2 Comparison Between Route-Cache and CEF Switching Schemes (Continued)

Route-cache switching: Efficient for traffic with a small number of long-duration flows,
typically sourced by file transfer applications.
CEF switching: Efficient for traffic with a large number of short-duration flows, typically
sourced by Web-based and interactive applications.

Route-cache switching: Any recursive lookup for a route is done when an initial packet to
a destination triggers cache creation. The limit on the depth of recursion is six.
CEF switching: CEF resolves any recursive lookups required for routes before installing
them in the forwarding table. There is no limit on the depth of recursion.

Route-cache switching: Shows periodic CPU spikes from cache maintenance and takes a
performance hit during network instability.
CEF switching: No activity that causes periodic CPU spikes, and no performance hit
during network instability. CEF either switches a packet or drops it.

Route-cache switching: Does destination-based load balancing, accomplished by creating
/32 cache entries for a routing entry that has multiple equal-cost paths.
CEF switching: Does load balancing per source-destination pair, by means of a hash
algorithm that takes both source and destination into consideration.

Route-cache switching: No per-packet load balancing.
CEF switching: Does per-packet load balancing, accomplished by using a hash algorithm
that points to the possible paths in a round-robin manner.

Route-cache switching: Packets that need to be forwarded to the Null0 interface are not
supported; such packets go through the inefficient process-switching path.
CEF switching: Packets that need to be forwarded to the Null0 interface can be
CEF-switched and dropped efficiently by means of a special CEF Null adjacency.

Route-cache switching: Features such as policy routing, which need an occasional route
lookup, can't be supported cleanly.
CEF switching: Because the CEF table is an exact replica of the routing table, policy
routing can be supported efficiently through CEF.

Route-cache switching: Doesn't provide hooks for QoS policy propagation.
CEF switching: Allows QoS policy information to be tagged to a CEF entry; a packet can
be CEF-switched taking the tagged QoS policy into consideration.
Summary
CEF is the recommended switching mechanism in today's large-scale IP networks and ISP
networks. Support for some QoS functions can be limited to specific packet-
switching modes. In particular, all distributed QoS functions, where QoS functions run on
a router's individual line cards instead of on the central processor card, require distributed
CEF.
APPENDIX
C
Routing Policies
This appendix discusses policy routing and quality of service (QoS) policy propagation
using the Border Gateway Protocol (BGP). Policy routing is used to override a router's
traditional, destination-based routing function with flexible policies. QoS policy
propagation uses BGP to propagate policy information over a service provider network.
This appendix also briefly discusses QoS-based routing.
QoS-Based Routing
QoS-based routing is a routing mechanism under which paths for flows are determined
based on some knowledge of the resource availability in the network as well as the QoS
requirements of flows1.
QoS-based routing calls for the following significant extensions:
A routing protocol carries metrics with dynamic resource (QoS) availability
information (for example, available bandwidth, packet loss, or delay).
A routing protocol should calculate not only the optimal path, but also multiple
candidate paths, based on their QoS availability.
Each flow carries the required QoS in it. The required QoS information can be carried
in the Type of Service (ToS) byte in the Internet Protocol (IP) header. The routing path
for a flow is chosen according to the flow's QoS requirement.
QoS-based routing also involves significant challenges. QoS availability metrics are highly
dynamic in nature. This makes routing updates more frequent, consuming valuable network
resources and router CPU cycles. A flow could oscillate frequently among alternate QoS
paths as path QoS metrics fluctuate. Furthermore, frequently changing routes can
increase jitter, the variation in the delay experienced by end users. Unless these concerns
are addressed, QoS-based routing defeats its objective of adding value to a QoS-enabled
network.
Open Shortest Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS),
the common Interior Gateway Protocols (IGPs) in a service provider network, could
advertise a ToS byte along with a link-state advertisement. But the ToS byte is currently set
to zero and is not being used. QoS routing is still a topic under discussion in the standards
bodies.
In the meantime, OSPF and IS-IS are being extended for Multiprotocol Label Switching
(MPLS) traffic engineering (TE) to carry link resource information with each route. These
routing protocols still remain destination-based, but each route carries extra resource
information, which protocols such as MPLS can use for TE. TE-extended OSPF and IS-IS
protocols provide a practical trade-off between the present-day destination-based routing
protocols and QoS routing. MPLS-based TE is discussed in Chapter 10, MPLS Traffic
Engineering.
Policy-Based Routing
Routing in IP networks today is based solely on a packet's destination IP address. Routing
based on other information carried in a packet's IP header, or on packet length, is not
possible using present-day dynamic routing protocols. Policy routing is intended to address this
need for flexible routing policies.
For traffic destined to a particular server, an Internet service provider (ISP) might want to
send traffic with a precedence of 3 over a faster, dedicated link while sending traffic with
a precedence of 0 over a slower one. Though the destination is the same, the traffic is
routed over a different dedicated link for each IP precedence. Similarly, routing can be
based on packet length, source address, a flow defined by the source-destination pair and
Transmission Control Protocol (TCP)/User Datagram Protocol (UDP) ports,
ToS/precedence bits, batch versus interactive traffic, and
so on. This flexible routing mode is commonly referred to as policy-based routing.
Policy-based routing is not based on any dynamic routing protocol, but it uses the static
configuration local to the router. It allows traffic to be routed based on the defined policy,
either when specific routing information for the flow destination is unavailable, or by totally
bypassing the dynamic routing information. In addition, for policy-routed traffic, you can
configure a router to mark the packet's IP precedence.
NOTE Policy routing functions that require a route table lookup perform well in the Cisco
Express Forwarding (CEF) switching path. CEF is discussed in Appendix B, Packet
Switching Mechanisms. Because CEF mirrors each entry in the routing table, policy
routing can use the CEF table without ever needing to make a separate route table lookup.
If NetFlow accounting is enabled on the interface to collect traffic flow statistics, you
should also enable the ip route-cache flow accelerate command. In exchange for a minor
increase in memory use, flow-cache entries then carry policy state and avoid the policy
route-map check for each packet of an active flow.
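The corresponding interface configuration, as a minimal sketch (the interface name is
illustrative; the accelerate keyword is as named above):

interface FastEthernet2/0/1
 ip route-cache flow
 ip route-cache flow accelerate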
Because the policy routing configuration is static, it can potentially black-hole traffic when
the configured next hop is no longer available. Policy routing can use the Cisco Discovery
Protocol (CDP) to verify next-hop availability. When policy routing can no longer see the
next hop in the CDP table, it stops forwarding the matching packets to the configured next
hop and routes those packets using the routing table. The router reverts to policy routing
when the next hop becomes available (through CDP). This functionality applies only when
CDP is enabled on the interface.
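In Cisco IOS, this CDP-based check corresponds to the set ip next-hop verify-availability
route-map command. The following is a minimal sketch; the route-map name, access list,
and next-hop address are illustrative:

route-map check-nh permit 10
 match ip address 110
 set ip next-hop 181.188.10.14
 set ip next-hop verify-availability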
[Figure: The e-commerce company's Internet Router (IR) receives internal traffic with IP
precedence 0, 1, 2, and 3 from the 211.201.201.64/27 LAN and connects to the service
provider over two links: Hssi3/0/0 on subnet 181.188.10.8/30 and Serial4/0/0 on subnet
181.188.10.12/30.]
Listing C-1 shows the configuration on the Internet Router (IR) of the e-commerce
company to enable the router to route packets based on their IP precedence value.
Listing C-1 Configuration on the IR to Route Packets Based on Their IP Precedence Value
interface FastEthernet 2/0/1
ip address 211.201.201.65 255.255.255.224
ip policy route-map tasman
access-list 101 permit ip any any precedence routine
access-list 101 permit ip any any precedence priority
access-list 101 permit ip any any precedence immediate
access-list 101 permit ip any any precedence flash
access-list 102 permit ip any any precedence flash-override
access-list 102 permit ip any any precedence critical
access-list 102 permit ip any any precedence internet
access-list 102 permit ip any any precedence network
The interface FastEthernet2/0/1 is the input interface for all internal traffic. Policy routing
is enabled on the input interface. All packets arriving on this interface are policy-routed
based on route-map tasman.
access-list 101 and access-list 102 are used to match packets with IP precedence values of
0, 1, 2, 3 and 4, 5, 6, 7, respectively.
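The tasman route map itself does not appear in the listing as reproduced here; based on the
forwarding behavior described next, it would resemble the following sketch:

route-map tasman permit 10
 match ip address 101
 set ip next-hop 181.188.10.14
route-map tasman permit 20
 match ip address 102
 set ip next-hop 181.188.10.10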
All packets matching access-list 101 are forwarded to the next-hop IP address of
181.188.10.14. All packets matching access-list 102 are forwarded to the next-hop IP
address of 181.188.10.10. Listing C-2 shows the relevant show commands for policy
routing.
Listing C-2 show Commands for Verifying Policy Routing Configuration and Operation
IR#show ip policy
IR#show route-map tasman
The show ip policy command shows the interface(s) performing policy routing for
incoming packets, along with the associated route map for each policy-routed interface. The
show route-map tasman command shows the details of the route map tasman and the
policy-routed packet statistics for each element (sequence number) of the route map.
All packets with a minimum and maximum packet size of 32 and 1000 bytes, respectively,
are set with an IP precedence of 5.
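The configuration that produces this behavior is not shown in the reproduced listings; a
sketch using the match length and set ip precedence commands, with an illustrative
route-map name, might look like this:

route-map size-based permit 10
 match length 32 1000
 set ip precedence 5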
NOTE A few handy pieces of information on policy-routing configuration are given here:
Only one policy route map is allowed per interface. You can enter multiple route-map
elements with different combinations of match and set commands, however.
You can specify multiple match and set statements in a policy-routing route map.
When all match conditions are true, all sets are performed.
When more than one parameter is used for a match or a set statement, a match or
set happens when any of the parameters is a successful match or a successful set,
respectively.
match ip address 101 102 is true, for example, when the packet matches against
either IP access list 101 or 102. set ip next-hop X Y Z sets the IP next hop for the
matched packet to the first reachable next hop. X, Y, and Z are the first, second, and
third choices for the IP next hop.
How do you set a packet's IP precedence value based on its service level? This section
studies ways you can use BGP for QoS policy propagation in such situations.
QoS policy propagation using BGP2 is a mechanism to classify packets based on IP prefix,
BGP community, and BGP autonomous system (AS) path information. The supported
classification policies include the IP precedence setting and the ability to tag the packet with
a QoS class identifier, called a QoS group, internal to the router. After a packet is classified,
you can use other QoS features such as Committed Access Rate (CAR) and Weighted
Random Early Detection (WRED) to specify and enforce business policies to fit your
business model. CAR and WRED are discussed in detail in Chapter 3, Network Boundary
Traffic Conditioners: Packet Classifier, Marker, and Traffic Rate Management, and in
Chapter 6, Per-Hop Behavior: Congestion Avoidance and Packet Drop Policy.
QoS policy propagation using BGP requires CEF. CEF switching is discussed in Appendix
B. Any BGP QoS policy from the BGP routing table is passed to the CEF table through the
IP routing table. The CEF entry for a destination prefix is tagged with the BGP QoS policy.
When a packet is CEF-switched, the QoS policy is applied to it according to the CEF
table.
[Figure C-2: A service provider network (AS 109) with border routers BR-1, BR-2, and
BR-3 connected by iBGP peering. BR-3 attaches the premium customer network
194.194.194.0/24 (AS 4567) on Serial4/0/1; BR-1 and BR-3 also peer with other ISPs. The
CEF table entry for 194.194.194.0/24 carries next-hop interface Hssi0/0/0 and precedence
4, marking Internet traffic going to the premium customer.]
In Figure C-2, Router BR-3 connects to the premium customer. Therefore, all traffic
coming from the premium customer connection gets premium service within the provider
network. Premium service is identified by an IP precedence of 4 within the packet header.
Internet traffic for the premium customer can arrive on either Router BR-1 or BR-3, as both
routers peer with the rest of the Internet. All such Internet traffic on Router BR-1 and Router
BR-3 going to the premium customer network needs to be given an IP precedence of 4.
Listing C-4 shows how to enable Router BR-3 with the premium customer connection and
the BGP policy propagation functionality for premium service.
Listing C-4 Enable Router BR-3 with BGP Policy Propagation Functionality for Premium Service and a Premium
Customer Connection
ip cef
interface loopback 0
ip address 200.200.200.3 255.255.255.255
interface Serial4/0/1
ip address 201.201.201.10 255.255.255.252
bgp-policy source ip-prec-map
interface Hssi3/0/0
bgp-policy destination ip-prec-map
interface Serial4/0/0
bgp-policy destination ip-prec-map
ip bgp-community new-format
The route-map tasman command on router BR-3 sets a precedence of 4 for all routes with
an AS path of 4567. In this case, only route 194.194.194.0/24 belongs to AS 4567.
Hence, IP precedence 4 is set on this route in the IP routing table, which is carried over to
the CEF table. In addition, routes received on this peering with the premium customer are
assigned a community of 109:4 using the route-map premium command such that routers
elsewhere can use the community information to assign a policy.
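The tasman and premium route maps and the BGP configuration that applies them are not
reproduced in Listing C-4; a sketch consistent with this description follows, in which the
as-path access-list number and the neighbor address are illustrative:

router bgp 109
 table-map tasman
 neighbor 201.201.201.9 remote-as 4567
 neighbor 201.201.201.9 route-map premium in
!
ip as-path access-list 1 permit _4567$
!
route-map tasman permit 10
 match as-path 1
 set ip precedence 4
!
route-map premium permit 10
 set community 109:4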
Note that the bgp-policy source ip-prec-map command is used on interface Serial4/0/1
so that BGP policy propagation is applied on all premium customer packets. Here, IP
precedence mapping is done based on the arriving packet's source address, using the
precedence value tagged to the source IP address's matching CEF entry. Internet traffic
going to the premium customer can enter its service provider network on any of its edge
routers with peering connection to other service providers. Hence, QoS policy information
regarding a premium customer should be propagated all over the provider network so that
the edge routers can set IP precedence based on the QoS policy information. In this
example, Internet traffic for the premium customer can arrive on either Router BR-1 or
BR-3.
Premium customer traffic arriving on interface Hssi3/0/0 and on Serial4/0/0 of Router
BR-3 is assigned a precedence of 4. The bgp-policy destination ip-prec-map command is
needed on the packet's input interface so that BGP policy propagation is applied on all
incoming packets. Here, IP precedence mapping is done based on the packet's destination
address, using the matching CEF entry's precedence value for the destination address.
Listing C-5 shows the relevant BR-1 configuration that enables BGP policy propagation for
premium service.
Listing C-5 Enable Router BR-1 with BGP Policy Propagation Functionality for Premium Service
ip cef
In Listing C-5 on router BR-1, the table-map command uses route-map tasman to assign a
precedence of 4 to all BGP routes in the routing table that have a BGP community whose
last two bytes are set to 4. Because router BR-3 tags the premium customer route
194.194.194.0/24 with the community 109:4 and exchanges it via iBGP with routers BR-1
and BR-2, router BR-1 tags 194.194.194.0/24 in its IP routing table and CEF table
with an IP precedence value of 4.
The bgp-policy destination ip-prec-map command is needed on the input interface
Hssi3/0/0 of router BR-1 for BGP policy propagation to be applied on the incoming
packets arriving from the Internet, based on their destination IP address.
Summary
There is a growing need for QoS and traffic engineering in large, dynamic routing
environments. The capability of policy-based routing to selectively set precedence bits and
route packets based on a predefined flexible policy is becoming increasingly important. At
the same time, routing protocols such as OSPF and IS-IS are being extended for QoS
support. TE extends OSPF and IS-IS to carry available resource information along with
their advertisements, and it is a step toward full QoS routing. The viability of full QoS routing is
still under discussion in the standards bodies.
BGP facilitates policy propagation across the entire network. CEF gets this BGP policy
information from the routing table and uses it to set a packet policy before forwarding it.
References
1 RFC 2386, A Framework for QoS-Based Routing in the Internet, E. Crawley, et al., 1998.
2 RFC 1771, A Border Gateway Protocol 4 (BGP-4), Y. Rekhter and T. Li, 1995.
APPENDIX
D
Real-time Transport Protocol
(RTP)
Real-time Transport Protocol (RTP) is a transport protocol for carrying real-time traffic
flows, such as packetized audio and video traffic, in an Internet Protocol (IP) network.1 It
provides a standard packet header format, which gives sequence numbering, media-specific
timestamp data, source identification, and payload identification, among other things. RTP
is usually carried using User Datagram Protocol (UDP).
RTP is supplemented by the Real-time Transport Control Protocol (RTCP), which carries
control information about the current RTP session. RTCP packets carry status
information on the RTP channel, such as the amount of data transmitted and how well the
data is being received. A source can use the packet loss statistics reported by RTCP to
adapt its transmission rate accordingly.
RTCP provides support for real-time conferencing for large groups. RTP allows flows from
several sources to be mixed in gateways to provide a single resulting flow. When this
happens, each mixed packet contains the source identifications of all the contributing
sources.
RTP timestamps are flow-specific and, therefore, the timestamps used are in units
appropriate for that media flow. Hence, when multiple flows are mixed to form a single
flow, the RTP timestamps cannot ensure interflow synchronization by themselves. RTCP is
used to provide the relationship between the real-time clock at a sender and the RTP media
timestamps. RTCP also can carry descriptive textual information on the source in a
conference.
Note that RTP does not address the issue of resource reservation or quality of service (QoS);
instead, it relies on resource allocation functions such as Weighted Fair Queuing (WFQ)
and on reservation protocols, such as Resource Reservation Protocol (RSVP).
Reference
1 RFC 1889, RTP: A Transport Protocol for Real-Time Applications, H. Schulzrinne,
et al., January 1996.
APPENDIX
E
General IP Line Efficiency
Functions
This appendix discusses the following Internet Protocol (IP) line efficiency functions: the
Nagle Algorithm, Path maximum transmission unit (MTU) Discovery, Transmission
Control Protocol/Internet Protocol (TCP/IP) header compression, and Real-time Transport
Protocol (RTP) header compression.
To determine the Path MTU, the source station sends out large packets with the DF (Don't
Fragment) bit set in the IP header toward the destination. When a router or a switch along
the path needs to forward the packet onto a link that supports a lower MTU, it sends back
a Can't Fragment Internet Control Message Protocol (ICMP) message to the sender. A
sender that receives a Can't Fragment ICMP message lowers the packet size and tries to
discover the Path MTU once again. When the sender no longer receives Can't Fragment
ICMP messages, the MTU size it is using becomes the Path MTU.
The ip tcp path-mtu-discovery command enables Path MTU Discovery for TCP
connections initiated by the router.
References
1 RFC 896, Congestion Control in IP/TCP Internetworks, John Nagle, 1984.
2 RFC 1191, Path MTU Discovery, J. Mogul and S. Deering, 1990.
3 RFC 1144, Compressing TCP/IP Headers for Low-Speed Serial Links, V. Jacobson,
1990.
4 RFC 2508, Compressing IP/UDP/RTP Headers for Low-Speed Serial Links, S. Casner,
V. Jacobson, 1999.
APPENDIX
F
Link-Layer Fragmentation and
Interleaving
This appendix studies the Link-Layer Fragmentation and Interleaving (LFI) function using
the Multilink Point-to-Point Protocol (MLPP). LFI is a link-layer efficiency mechanism
that improves the efficiency and predictability of the service offered to certain critical,
application-level traffic.
MLPP1 provides the functionality to spread traffic over multiple wide-area network
(WAN) links running the Point-to-Point Protocol (PPP) while providing multi-vendor
interoperability as well as packet fragmentation and sequencing. Chapter 8, Layer 2 QoS:
Interworking with IP QoS, discusses the Frame Relay fragmentation standard FRF.12,
which performs link-layer fragmentation and interleaving functions over a Frame Relay
network.
Even with scheduling algorithms such as Class-Based Weighted Fair Queuing (CBWFQ)
with a priority queue and Resource Reservation Protocol (RSVP), real-time interactive
traffic can still run into blocking delays due to large packets, especially on slow serial
connections. When a voice packet arrives just after a large packet has been scheduled for
transmission, the voice packet needs to wait until the large packet is transmitted before it
can be scheduled. This blocking delay caused by the large packet can be prohibitive for
real-time applications such as interactive voice. To be effective, interactive voice requires
an end-to-end delay of no more than roughly 100 ms to 150 ms.
LFI fragments the large data frames into smaller frames, interleaving the small, delay-
sensitive packets between the fragments of large packets before putting them on the queue
for transmission, thereby easing the delay seen by the small-size, real-time packets. The
fragmented data frame is reassembled at the destination.
You can use LFI in conjunction with CBWFQ with a priority queue for achieving the needs
of real-time voice traffic, as shown in Figure F-1.
[Figure F-1: Incoming packets are classified. Voice packets enter a priority queue, while
large data packets are fragmented, with the fragment size chosen from the required delay,
and placed in CBWFQ queues. The scheduler interleaves voice packets between data
fragments on the transmit queue.]
As you can see in Figure F-1, the real-time voice traffic is placed in the priority queue. The
other nonreal-time data traffic can go into one or more normal Weighted Fair Queuing
(WFQ) queues. The packets belonging to the nonreal-time traffic are fragmented to secure
a minimal blocking delay for the voice traffic. The data traffic fragments are placed in the
WFQ queues. Now, CBWFQ with a priority queue can run on the voice and data queues.
The maximum blocking delay seen by a voice packet is equal to a fragments serialization
delay. Table F-1 shows the fragment size for a maximum blocking delay of 10 ms based on
link speed.
Table F-1 Fragment Size for a Maximum Blocking Delay of 10 ms Based on Link Speed
Link Speed (in Kbps) Fragment Size (in Bytes)
56 70
64 80
128 160
256 320
512 640
768 960
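For example, at 64 kbps a 10-ms blocking delay permits 64,000 bits/sec x 0.010 sec = 640
bits, or 80 bytes per fragment, which matches the table entry for that link speed.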
LFI ensures that voice and similar small-size packets are not unacceptably delayed behind
large data packets. It also attempts to ensure that the small packets are sent in a more regular
fashion, thereby reducing jitter. This capability allows a network to carry voice and other
delay-sensitive traffic along with nontime-sensitive traffic.
NOTE MLPP link-layer fragmentation and interleaving are used in conjunction with CBWFQ
using a priority queue to minimize the delay seen by the voice traffic. Listing F-1 shows the
configuration for this purpose.
Listing F-1 MLPP Link-Layer Fragmentation Configuration
class-map premium
match <voice packets>
policy-map premiumpolicy
class premium
priority 500
interface serial0
bandwidth 128
no fair-queue
ppp multilink
interface serial1
bandwidth 128
no fair-queue
ppp multilink
interface virtual-template 1
service-policy output premiumpolicy
ppp multilink
ppp multilink fragment-delay 20
ppp multilink interleave
The fragment-delay 20 command sizes fragments so that no fragment takes more than
20 ms to serialize on the link, and the interleave command allows packets from the
priority queue to be interleaved between those fragments.
Reference
1 RFC 1717, The PPP Multilink Protocol (MP), K. Sklower et al., 1994.
APPENDIX
G
[Figure: Type of Service (ToS) byte layout: precedence bits P2 P1 P0, ToS bits T3 T2 T1
T0, and the currently unused (CU) field.]
Figure G-2 Differentiated Services Code Point (DSCP) Bits in the Differentiated Services
(DS) Byte
Defined DSCPs:
Default DSCP: 000 000.