Tutorial On Bridges, Routers, Switches, Oh My!: Radia Perlman
Why?
Demystify this portion of networking, so people don't drown in the alphabet soup
Think about these things critically
N-party protocols are the most interesting
Lots of issues are common to other layers
You can't design layer n without understanding layers n-1 and n+1
Outline
layer 2 issues: addresses, multiplexing, bridges, spanning tree algorithm
layer 3: addresses, neighbor discovery, connectionless vs connection-oriented
Routing protocols
Distance vector
Link state
Path vector
Definitions
Repeater: layer 1 relay
Bridge: layer 2 relay
Router: layer 3 relay
OK: What is layer 2 vs layer 3?
The "right" definition: layer 2 is neighbor-to-neighbor. "Relays" should only be in layer 3!
True definition of a layer n protocol: anything designed by a committee whose charter is to design a layer n protocol
The world got confused. Built on layer 2
I tried to argue: "But you might want to talk from one Ethernet to another!"
"Which will win? Ethernet or DECnet?"
Layer 3 packet
[Figure: packet laid out as source, dest, hops, data; source, dest, and hops make up the layer 3 header]
Ethernet packet
[Figure: packet laid out as source, dest, data; source and dest make up the Ethernet header]
Assigned in blocks of 2^24
Given 23-bit constant (OUI) plus g/i bit
all 1's intended to mean "broadcast"
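A minimal sketch of how such an address splits into its assigned prefix and the 2^24 values left to the owner of the block. The function name `parse_mac` and the dictionary keys are hypothetical; the only facts relied on are that the address is 6 octets and that the group/individual bit is the low-order bit of the first octet.

```python
def parse_mac(mac: str) -> dict:
    """Split a 6-octet 802 address into its assigned prefix and the rest."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("an 802 address is 6 octets")
    return {
        "oui": octets[:3].hex(":"),          # first 3 octets: assigned block prefix
        "rest": octets[3:].hex(":"),         # 24 bits chosen by the block owner
        "group": bool(octets[0] & 0x01),     # g/i bit: set means group address
        "broadcast": octets == b"\xff" * 6,  # all 1's
    }


info = parse_mac("01:00:5e:00:00:01")
print(info["oui"], info["group"])
```

Note that the all-1's address also has the group bit set, so broadcast is just a special case of a group address.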
Horrible terminology
Local area net
Subnet
Ethernet
Internet
Problem Statement
Need something that will sit between two Ethernets, and let a station on one Ethernet talk to another
Basic idea
Listen promiscuously
Learn location of source address based on source address in packet and port from which packet received
Forward based on learned location of destination
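The three steps above can be sketched as a small class. This is an illustration, not any product's implementation: the class name `LearningBridge` and its method are hypothetical, and flooding a packet whose destination has not been learned yet is an assumption (the slide does not say what happens in that case).

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # station address -> port it was last heard on

    def receive(self, src, dst, in_port):
        """Return the set of ports the packet is forwarded onto."""
        self.table[src] = in_port            # learn from the source address
        out_port = self.table.get(dst)
        if out_port is None:
            return self.ports - {in_port}    # assumption: unknown destination is flooded
        if out_port == in_port:
            return set()                     # destination is on the arrival port: drop
        return {out_port}                    # forward based on learned location


bridge = LearningBridge([1, 2, 3])
print(bridge.receive("A", "B", in_port=1))   # B not learned yet
print(bridge.receive("B", "A", in_port=2))   # A was learned on port 1
```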
[Figure, built up over several slides: three bridges B1, B2, B3]
Algorhyme
I think that I shall never see
A graph more lovely than a tree.
A tree whose crucial property
Is loop-free connectivity.
A tree which must be sure to span
So packets can reach every LAN.
First the Root must be selected
By ID it is elected.
Least cost paths from Root are traced
In the tree these paths are placed.
A mesh is made by folks like me.
Then bridges find a spanning tree.
Radia Perlman
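The two steps the poem names (elect the Root by ID, then keep the least cost paths from it) can be sketched centrally. This is a simplification under stated assumptions: real bridges compute the tree in a distributed way by exchanging messages, and they connect LANs rather than each other directly. The function `spanning_tree`, the direct bridge-to-bridge `links` map, the choice of the lowest ID as Root, and the tie-break on the lower parent ID are all illustrative choices.

```python
import heapq


def spanning_tree(links):
    """links: {(bridge_a, bridge_b): cost}. Returns (root, {bridge: parent})."""
    nbrs = {}
    for (a, b), cost in links.items():
        nbrs.setdefault(a, []).append((b, cost))
        nbrs.setdefault(b, []).append((a, cost))

    root = min(nbrs)                 # "By ID it is elected" (assumed: lowest ID wins)
    heap = [(0, root, root)]         # (cost to root, parent, bridge)
    done = set()
    parent = {}
    while heap:
        cost, par, node = heapq.heappop(heap)
        if node in done:
            continue                 # a cheaper path to this bridge was already placed
        done.add(node)
        if node != root:
            parent[node] = par       # this link is placed in the tree
        for nbr, link_cost in nbrs[node]:
            if nbr not in done:
                heapq.heappush(heap, (cost + link_cost, node, nbr))
    return root, parent


mesh = {(1, 2): 1, (2, 3): 1, (1, 3): 1, (3, 4): 1}
print(spanning_tree(mesh))
```

In this mesh the links 1-2, 2-3, 1-3 form a loop; the tree keeps 1-2 and 1-3 and leaves 2-3 unused.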
[Figure: spanning tree example on a mesh of bridges; labels include A, X and the triples 2,1,6; 2,2,11; 2,3,3; 2,1,7; 2,0,2; 2,2,4]
So what is Ethernet?
CSMA/CD, right? Not any more, really...
source, destination (and no hop count)
limited distance, scalability (not any more, really)
Switches
Ethernet used to be bus
Easier to wire, more robust if star (one huge multiport repeater with pt-to-pt links)
If store and forward rather than repeater, and with learning, more aggregate bandwidth
Can cascade devices... do spanning tree
We've reinvented the bridge!
When I started
Layer 3 had source, destination addresses
Layer 2 was just point-to-point links (mostly)
If layer 2 is multiaccess, then need two headers:
Layer 3 has ultimate source, destination
Layer 2 has next hop source, destination
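A small sketch of the two-header idea: at each router the layer 2 addresses are rewritten for the next hop, while the layer 3 addresses stay fixed end to end. The `Packet` type and `forward` function are hypothetical, and having the hop count go up by one per router is an assumption (some protocols count down instead).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    l2_src: str   # next hop source
    l2_dst: str   # next hop destination
    l3_src: str   # ultimate source
    l3_dst: str   # ultimate destination
    hops: int
    data: bytes


def forward(pkt: Packet, me: str, next_hop: str) -> Packet:
    """Relay at layer 3: new layer 2 header, same layer 3 addresses."""
    return replace(pkt, l2_src=me, l2_dst=next_hop, hops=pkt.hops + 1)


pkt = Packet("S", "R1", "S", "D", 0, b"hello")   # as transmitted by S
pkt = forward(pkt, "R1", "R2")                   # as transmitted by R1
pkt = forward(pkt, "R2", "R3")                   # as transmitted by R2
print(pkt)
```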
[Figure, built up over several slides: the packet's headers as transmitted by S, R1, R2, and R3]
A routing algorithm
Exchange information with your neighbors
Collectively compute routes with all rtrs
Compute a forwarding table
Network Layer
connectionless fans designed IPv4, IPv6, CLNP, IPX, AppleTalk, DECnet
Connection-oriented reliable fans designed X.25
Connection-oriented datagram fans designed ATM, MPLS
Connection-oriented Nets
[Figure: a virtual circuit from S through routers R1 to R5, with port numbers on each router; the VC number changes hop by hop: VC=8, 92, 8, 6]
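A sketch of the per-hop relabeling the figure shows. Only the VC sequence 8, 92, 8, 6 comes from the slide; which routers are on the path, the port numbers in `TABLES`, and the function names are made up for illustration.

```python
# Per-router connection table: (in_port, in_vc) -> (out_port, out_vc).
# Hypothetical ports and routers; only the VC numbers 8, 92, 8, 6 are from the slide.
TABLES = {
    "R1": {(1, 8): (2, 92)},
    "R2": {(1, 92): (2, 8)},
    "R3": {(1, 8): (2, 6)},
}


def relay(router, in_port, vc):
    """Look up the connection and return (out_port, new VC number)."""
    return TABLES[router][(in_port, vc)]


def trace(path, vc, in_port=1):
    """Follow a packet along path and list the VC number seen on each link."""
    seen = [vc]
    for router in path:
        _, vc = relay(router, in_port, vc)
        seen.append(vc)
    return seen


print(trace(["R1", "R2", "R3"], vc=8))
```

The VC number only has to be unique per link, which is why 8 can appear twice on the same path.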
MPLS
Originally for faster forwarding than parsing IP header
later "traffic engineering"
classify pkts based on more than destination address
Addresses
802 address flat, though assigned with OUI/rest. No topological significance
layer 3 addresses: locator/node: topologically hierarchical address
interesting difference:
IPv4, IPv6, IPX, AppleTalk: locator specific to a link
CLNP, DECnet: locator area, whole campus
Distance Vector
Know
your own ID
how many cables hanging off your box
cost, for each cable, of getting to nbr
[Figure: router "I am 4" with four cables: j (cost 3), k (cost 2), m (cost 2), n (cost 7)]
destination                                    1    2    3    4    5    6    7    8    9   10
distance vector rcvd from cable j (cost 3)    12    3   15    3   12    5    3   18    0    7
distance vector rcvd from cable k (cost 2)     5    8    3    2   10    7    4   20    5    0
distance vector rcvd from cable m (cost 2)     0    5    3    2   19    9    5   22    2    4
distance vector rcvd from cable n (cost 7)     6    2    0    7    8    5    8   12   11    3
your own calculated distance vector            2    6    5    0   12    8    6   19    3    ?
your own calculated forwarding table           m    j    m    0    k    j  k/j    n    j    ?
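The calculation in this example can be sketched directly: for each destination, add the cable cost to the distance the neighbor on that cable reported, and keep the minimum. The numbers are the slide's; the function name `compute` and the convention of returning every tied cable are illustrative choices. Running it also fills in the entry the slide leaves as "?".

```python
CABLE_COST = {"j": 3, "k": 2, "m": 2, "n": 7}
RECEIVED = {  # distance vectors as reported on each cable, for destinations 1..10
    "j": [12, 3, 15, 3, 12, 5, 3, 18, 0, 7],
    "k": [5, 8, 3, 2, 10, 7, 4, 20, 5, 0],
    "m": [0, 5, 3, 2, 19, 9, 5, 22, 2, 4],
    "n": [6, 2, 0, 7, 8, 5, 8, 12, 11, 3],
}
MY_ID = 4


def compute(my_id, cable_cost, received):
    """Return (distance vector, forwarding table) for destinations 1..N."""
    n_dests = len(next(iter(received.values())))
    distance, forwarding = [], []
    for dest in range(1, n_dests + 1):
        if dest == my_id:
            distance.append(0)       # cost to myself
            forwarding.append([])    # nothing to forward
            continue
        totals = {c: cable_cost[c] + received[c][dest - 1] for c in received}
        best = min(totals.values())
        distance.append(best)
        forwarding.append(sorted(c for c, t in totals.items() if t == best))
    return distance, forwarding


distance, forwarding = compute(MY_ID, CABLE_COST, RECEIVED)
print(distance)
print(forwarding)
```

For destination 3 the cables k and m tie at cost 5 (the slide lists m), and for destination 7 the cables j and k tie at cost 6, matching the slide's "k/j".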
Looping Problem
[Figure, built up over several slides: nodes A, B, C in a line, each labeled with its cost to C and an arrow for its direction towards C]
Cost to C at the start: A 2, B 1, C 0
Then B's cost becomes 3
Then A's cost becomes 4
Then B's cost becomes 5
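The sequence of costs on these slides (B: 1, 3, 5 and A: 2, 4) can be reproduced with a few lines. This sketch assumes the usual reading of the example: the link from B to C has failed, every link has cost 1, and A and B keep believing each other's last report. The cap `infinity=16` is an assumed value for illustration, not something the slides state.

```python
def count_to_infinity(rounds, infinity=16):
    """Replay A and B updating their cost to C after B loses its link to C."""
    cost_a, cost_b = 2, 1                     # costs to C before the failure
    history = [(cost_a, cost_b)]
    for _ in range(rounds):
        cost_b = min(cost_a + 1, infinity)    # B has no link to C, so it goes via A
        history.append((cost_a, cost_b))
        cost_a = min(cost_b + 1, infinity)    # A still points at B
        history.append((cost_a, cost_b))
    return history


for cost_a, cost_b in count_to_infinity(2):
    print("A:", cost_a, "B:", cost_b)
```

Each node's direction towards C is the other node, so packets for C bounce between A and B while the costs climb.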
Broadcast LSPs to all rtrs ("a miracle occurs")
Store latest LSP from each rtr
Compute Routes (breadth first, i.e., "shortest path first"; well known and efficient algorithm)
[Figure: example network of nodes A through G with link costs, alongside each node's LSP, e.g. A: B/6 D/2; D: A/2 E/2; G: C/5 F/1]
Computing Routes
Edsger Dijkstra's algorithm:
calculate tree of shortest paths from self to each
also calculate cost from self to each
Algorithm:
step 0: put (SELF, 0) on tree
step 1: look at LSP of node (N,c) just put on tree. If for any nbr K, this is best path so far to K, put (K, c+dist(N,K)) on tree, child of N, with dotted line
step 2: make dotted line with smallest cost solid, go to step 1
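A compact sketch of these steps, run on the LSP database from this tutorial's seven-node example (A through G) with C as SELF. It uses a priority queue with lazy deletion in place of the dotted-line bookkeeping: every candidate is pushed, and a popped entry is ignored if its node is already solid. The names `LSP_DATABASE` and `shortest_paths` are illustrative.

```python
import heapq

LSP_DATABASE = {
    "A": {"B": 6, "D": 2},
    "B": {"A": 6, "C": 2, "E": 1},
    "C": {"B": 2, "F": 2, "G": 5},
    "D": {"A": 2, "E": 2},
    "E": {"B": 1, "D": 2, "F": 4},
    "F": {"C": 2, "E": 4, "G": 1},
    "G": {"C": 5, "F": 1},
}


def shortest_paths(lsps, self_id):
    """Return ({node: cost from self}, {node: parent in the tree})."""
    cost, parent = {}, {}
    tentative = [(0, self_id, None)]          # step 0: (SELF, 0)
    while tentative:
        c, node, par = heapq.heappop(tentative)   # step 2: smallest cost goes solid
        if node in cost:
            continue                          # already solid via a cheaper path
        cost[node], parent[node] = c, par
        for nbr, dist in lsps[node].items():  # step 1: look at the node's LSP
            if nbr not in cost:
                heapq.heappush(tentative, (c + dist, nbr, node))
    return cost, parent


cost, parent = shortest_paths(LSP_DATABASE, "C")
print(cost)
print(parent)
```

The forwarding table follows from `parent`: to reach a destination, walk up the tree until the node whose parent is SELF, and forward to that neighbor.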
[Figure, built up over several slides: shortest path tree computed by C, rooted at C(0) and including F(2), G(3), and A(7)]
We're done!
LSP database:
A: B/6 D/2
B: A/6 C/2 E/1
C: B/2 F/2 G/5
D: A/2 E/2
E: B/1 D/2 F/4
F: C/2 E/4 G/1
G: C/5 F/1
A miracle occurs
First link state protocol: ARPANET
I wanted to do something similar for DECnet
My manager said "Only if you can prove it's stable"
Given a choice between a proof and a counterexample
Routing Robustness
I showed how to make link state distribution "self-stabilizing", but only after the sick or evil node was disconnected
Later, my thesis was on how to make the routing infrastructure (not just the routing protocol) robust while sick and evil nodes are participating, and it's not that hard
Interdomain
BGP
Other policies: don't tell nbr about D, or lie to nbr about D, making path look worse
Path vector
Each rtr R reports {(D, list of ASs in R's chosen path to D)}
Each rtr chooses best path based on configured policies
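A sketch of one router's choice for a single destination D. The function `choose_path` and its `prefer` parameter are hypothetical: `prefer` stands in for "configured policies" (lower score wins), and the default of preferring the shortest AS list is only an assumption for the example. Rejecting any reported path that already contains your own AS is how a path vector avoids loops.

```python
def choose_path(my_as, adverts, prefer=len):
    """adverts: {neighbor: list of ASs in that neighbor's chosen path to D}.

    Returns (chosen neighbor, path this router would report), or None.
    """
    # Discard paths that already go through me: using one would form a loop.
    usable = {nbr: path for nbr, path in adverts.items() if my_as not in path}
    if not usable:
        return None
    # prefer() scores a path; sorting the neighbors first makes ties deterministic.
    nbr = min(sorted(usable), key=lambda n: prefer(usable[n]))
    return nbr, [my_as] + usable[nbr]


adverts = {"x": [2, 3], "y": [4, 1, 3], "z": [5, 6, 3]}
print(choose_path(1, adverts))
print(choose_path(1, adverts, prefer=lambda path: 0 if path[0] == 5 else 1))
```

The second call shows a policy overriding path length: any path whose first AS is 5 is preferred, so the longer path via z wins.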
BGP Configuration
path preference rules
which nbr to tell about which destinations
how to "edit" the path when telling nbr N about prefix P (add fake hops to discourage N from using you to get to P)
Bridge meltdowns
They do occur (a Boston hospital)
Lack of receipt of spanning tree msgs tells bridge to turn on link
So if too much traffic causes spanning tree messages to get lost...
loops
exponential proliferation of looping packets
Rbridges
Compatible with today's bridges and routers
Like routers, terminate bridges' spanning tree
Like bridges, glue LANs together to create one IP subnet (or for other protocols, a broadcast domain)
Like routers, optimal paths, fast convergence, no meltdowns
Like bridges, plug-and-play
Rbridging layer 2
Link state protocol among Rbridges (so know how to route to other Rbridges)
Like bridges, learn location of endnodes from receiving data traffic
But since traffic on optimal paths, need to distinguish originating traffic from transit
So encapsulate packet
Rbridging
[Figure: Rbridges R1 through R7 interconnecting LANs, with endnodes a and c]
Encapsulation Header
Outer L2 hdr: S=Xmitting Rbridge, D=Rcving Rbridge, pt="transit"
hop count
dest RBridge
original pkt (including L2 hdr)
Outer L2 hdr must not confuse bridges
So it's just like it would be if the Rbridges were routers
Need special layer 2 destination address for unknown or multicast layer 2 destinations
can be L2 multicast, or any L2 address provided it never gets used as a source address
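A sketch of how the encapsulation header might be handled along a path. The field names, the three functions, the default hop count of 8, and the choice to count hops down are all assumptions for illustration; the slides only list which fields exist.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Encapsulated:
    outer_src: str    # S = transmitting Rbridge
    outer_dst: str    # D = receiving Rbridge
    hop_count: int
    egress: str       # dest RBridge
    original: bytes   # original pkt, including its L2 hdr, carried untouched


def encapsulate(original: bytes, ingress: str, next_hop: str, egress: str,
                hop_count: int = 8) -> Encapsulated:
    """Done by the first Rbridge, which received the packet unencapsulated."""
    return Encapsulated(ingress, next_hop, hop_count, egress, original)


def transit(pkt: Encapsulated, me: str, next_hop: str) -> Optional[Encapsulated]:
    """Rewrite the outer L2 header for the next hop; drop if hops run out."""
    if pkt.hop_count <= 1:
        return None
    return replace(pkt, outer_src=me, outer_dst=next_hop,
                   hop_count=pkt.hop_count - 1)


def decapsulate(pkt: Encapsulated, me: str) -> bytes:
    """Done by the egress Rbridge before sending onto the destination's link."""
    if pkt.egress != me:
        raise ValueError("only the egress Rbridge decapsulates")
    return pkt.original


pkt = encapsulate(b"original frame", ingress="R1", next_hop="R3", egress="R7")
pkt = transit(pkt, me="R3", next_hop="R7")
print(pkt)
print(decapsulate(pkt, me="R7"))
```

Because the outer source and destination change at every hop, bridges sitting between two Rbridges see ordinary layer 2 traffic between two stations.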
[Figure: link between two Rbridges, e.g. R2 and R7 or R4 and R7. Actually can be: bridged LAN]
Endnode Learning
On shared link, only one Rbridge (DR) can learn and decapsulate onto link
otherwise, a "naked" packet will look like the source is on that link
have election to choose which Rbridge
When DR sees naked pkt from S, announces S in its link state info to other Rbridges
VLANs
VLAN is a broadcast domain
So a VLAN A packet must only be forwarded to VLAN A links
RBridges must announce which VLANs they connect to
RBridges must be able to flood a VLAN A pkt to just VLAN A links
could do it with one spanning tree, and just not send on non-A links
or one spanning tree, and filter if no A-links downstream
or per-VLAN spanning tree
VLANs
VLAN A endnodes only need to be learned by RBridges attached to VLAN A
All RBridges must be able to forward to any other RBridge
Egress RBridge in the encapsulation header
Conclusions
Looks to routers like a bridge
invisible, plug-and-play
Wrap-up
"folklore" of protocol design
things too "obvious" to say, but everyone gets them wrong
Forward Compatibility
Reserved fields
spare bits: set them to zero when sending, ignore them on receipt. Can maybe be used for something in the future
TLV encoding
type, length, value
so can skip new TLVs
maybe have range of Ts to ignore if unknown, others to drop packet
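A sketch of a TLV parser that follows these rules. The wire format (one byte of type, one byte of length), the function name `parse_tlvs`, and the split of the type space (128 to 255 meaning "drop the packet if unknown") are assumptions chosen for the example, not taken from any particular protocol.

```python
def parse_tlvs(buf: bytes, known, must_understand=range(128, 256)):
    """Return [(type, value)] for known types; skip unknown ignorable ones."""
    out = []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = buf[i], buf[i + 1]
        value = buf[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        if tlv_type in known:
            out.append((tlv_type, value))
        elif tlv_type in must_understand:
            raise ValueError(f"unknown TLV type {tlv_type} that must be understood")
        # Otherwise the type is unknown but ignorable: the length lets us skip it.
        i += 2 + length
    return out


packet = bytes([1, 2, 0xAA, 0xBB,   # type 1, length 2
                9, 1, 0xFF,         # type 9: unknown to this parser, ignorable
                2, 0])              # type 2, length 0
print(parse_tlvs(packet, known={1, 2}))
```

An old implementation that has never heard of type 9 still parses the rest of the packet, which is the point of carrying the length.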
Forward Compatibility
Make fields large enough
IP address, packet identifier, TCP sequence #
Version number
what is new version vs new protocol?
same lower layer multiplex info
Version #
Nobody seems to do this right
IKEv1, SSL, even IP: unspecified what to do if version # different. Most implementations ignore it.
SSL v3 moved version field!
v2 sets it to 0.2. v3 sets (different field) to 3.0. v2 node will ignore version number field, and happily parse the rest of the packet
Parameters
Minimize these:
someone has to document it
customer has to read documentation and understand it
How to avoid
architectural constants if possible
automatically configure if possible
Settable Parameters
Make sure they can't be set incompatibly across nodes, across layers, etc. (e.g., hello time and dead timer)
Make sure they can be set at nodes one at a time and the net can stay running
Parameter tricks
IS-IS
pairwise parameters reported in hellos
area-wide parameters reported in LSPs
Bridges
Use Root's values, sent in spanning tree msgs
Summary
If things aren't simple, they won't work
Good engineering requires understanding tradeoffs and previous approaches.
It's never a "waste of time" to answer "why is something that way"
Don't believe everything you hear
Know the problem you're solving before you try to solve it!