Cumulus Linux 3.7.2 User Guide
Table of Contents
Cumulus Linux User Guide
What is Cumulus Linux? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
What's New in Cumulus Linux 3.7.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
What's New in Cumulus Linux 3.7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
What's New in Cumulus Linux 3.7.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Open Source Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Hardware Compatibility List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Installation Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Managing Cumulus Linux Disk Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Determine the Switch Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Reprovision the System (Restart the Installer) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Uninstall All Images and Remove the Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Boot into Rescue Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Inspect Image File Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Installing a New Cumulus Linux Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Install Using a DHCP/Web Server with DHCP Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Install Using a DHCP/Web Server without DHCP Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Install Using a Web Server with no DHCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Install Using FTP Without a Web Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Install Using a Local File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Install Using a USB Drive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Upgrading Cumulus Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Before You Upgrade Cumulus Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Upgrade Cumulus Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Upgrade Switches in an MLAG Pair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Roll Back a Cumulus Linux Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Third Party Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Using Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
cumulusnetworks.com 2
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Install the Snapshot Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Take and Manage Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Roll Back to Earlier Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Configure Automatic Time-based Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Caveats and Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Adding and Updating Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Update the Package Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
List Available Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
List Installed Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Display the Version of a Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Upgrade Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Add New Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Add Packages from Another Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Cumulus Supplemental Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Zero Touch Provisioning - ZTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Zero Touch Provisioning Using a Local File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Zero Touch Provisioning Using a USB Drive (ZTP-USB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Zero Touch Provisioning over DHCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Write ZTP Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Best Practices for ZTP Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Test ZTP Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Common ZTP Script Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Manually Use the ztp Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
System Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Network Command Line Utility - NCLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Install NCLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
NCLU Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Configure User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Edit the netd.conf File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Restart the netd Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Back Up the Configuration to a Single File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Advanced Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Setting Date and Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Set the Time Zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Set the Date and Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Set the Time Using NTP and NCLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Configure the DHCP Server on Cumulus Linux Switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Assign Port-Based IP Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Facebook Voyager Optical Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
The Voyager Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
Configure the Voyager Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
Configure the Transponder Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
802.1X Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
Supported Features and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Install the 802.1X Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Configure 802.1X Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
Configure the Linux Supplicants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
Configure Accounting and Authentication Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
Configure MAC Authentication Bypass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
Configure a Parking VLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
Configure Dynamic VLAN Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
RADIUS Change of Authorization and Disconnect Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
Configure the RADIUS Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Prescriptive Topology Manager - PTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Supported Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Configure PTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Basic Topology Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
ptmd Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
Configuration Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
Bidirectional Forwarding Detection (BFD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Check Link State with FRRouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
ptmd Service Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
ptmctl Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Caveats and Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
Layer 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Spanning Tree and Rapid Spanning Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Supported Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
View Bridge and STP Status and Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
Customize Spanning Tree Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Link Layer Discovery Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
LACP Bypass All-active Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
Configure LACP Bypass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
Virtual Router Redundancy - VRR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Configure a VRR-enabled Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Example VRR Configuration with MLAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
ifplugd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
IGMP and MLD Snooping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
Configure IGMP/MLD Querier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
Disable IGMP and MLD Snooping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
Layer 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
Manage Static Routes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
Configure a Gateway or Default Route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
Supported Route Table Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
Caveats and Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Introduction to Routing Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Routing Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Configure Routing Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Protocol Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Network Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Clos Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
Over-Subscribed and Non-Blocking Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
Containing the Failure Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
FRRouting Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
About zebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
Upgrading from Quagga to FRRouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
Configuring FRRouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
Configure FRRouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
Interface IP Addresses and VRFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
FRRouting vtysh Modal CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
Reload the FRRouting Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
FRR Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
Caveats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
Comparing NCLU and vtysh Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
Address Resolution Protocol - ARP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
Enable Read-only Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
Apply a Route Map for Route Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
Protocol Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796
Caveats and Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
Policy-based Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 800
Configure PBR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 800
Configuration Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802
Review Your Configuration ... 803
Delete PBR Rules and Policies ... 804
Bidirectional Forwarding Detection - BFD ... 805
Contents ... 805
BFD Multihop Routed Paths ... 806
BFD Parameters ... 806
Configure BFD ... 806
BFD in BGP ... 807
BFD in OSPF ... 808
OSPF Show Commands ... 808
Scripts ... 810
Echo Function ... 810
Troubleshooting ... 811
Related Information ... 812
Equal Cost Multipath Load Sharing - Hardware ECMP ... 812
Contents ... 812
Equal Cost Routing ... 813
ECMP Hashing ... 813
Resilient Hashing ... 817
Redistribute Neighbor ... 821
Contents ... 821
Availability ... 822
Target Use Cases and Best Practices ... 822
How It Works ... 822
Example Configuration ... 822
Known Limitations ... 827
Troubleshooting ... 827
Virtual Routing and Forwarding - VRF ... 830
Contents ... 831
Configure VRF ... 832
VRF Route Leaking ... 835
FRRouting Operation in a VRF ... 839
Example Commands to Show VRF Data ... 842
BGP Unnumbered Interfaces with VRF ... 852
DHCP with VRF ... 854
ping or traceroute on a VRF ... 858
cumulusnetworks.com 12
Cumulus Linux User Guide
Index ... 1100
©2018 Cumulus Networks. All rights reserved
CUMULUS, the Cumulus Logo, CUMULUS NETWORKS, and the Rocket Turtle Logo (the “Marks”) are trademarks and service marks of
Cumulus Networks, Inc. in the U.S. and other countries. You are not permitted to use the Marks without the prior written consent of
Cumulus Networks. The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus
Torvalds, owner of the mark on a worldwide basis. All other marks are used under fair use or license from their respective owners.
cumulusnetworks.com 17
Cumulus Linux 3.7 User Guide
Quick Start Guide
18 09 January 2019
Cumulus Networks
Prerequisites
Intermediate-level Linux knowledge is assumed for this guide. You should be familiar with basic
text editing, Unix file permissions, and process monitoring. A variety of text editors are pre-
installed, including vi and nano.
You must have access to a Linux or UNIX shell. If you are running Windows, use a Linux
environment like Cygwin as your command line tool for interacting with Cumulus Linux.
If you are a networking engineer but are unfamiliar with Linux concepts, refer to this
reference guide to compare the Cumulus Linux CLI and configuration options, and their
equivalent Cisco Nexus 3000 NX-OS commands and settings. You can also watch a
series of short videos introducing you to Linux and Cumulus Linux-specific concepts.
Contents
This topic describes ...
Install Cumulus Linux (see page 20)
Getting Started (see page 21)
Login Credentials (see page 21)
Serial Console Management (see page 21)
Wired Ethernet Management (see page 21)
Configure the Hostname and Timezone (see page 22)
Verify the System Time (see page 23)
Install the License (see page 23)
Configure Breakout Ports with Splitter Cables (see page 24)
Test Cable Connectivity (see page 24)
Configure Switch Ports (see page 26)
Layer 2 Port Configuration (see page 26)
Layer 3 Port Configuration (see page 27)
Configure a Loopback Interface (see page 28)
If Cumulus Linux is already installed on your switch and you need to upgrade the software only,
skip to Upgrading Cumulus Linux (see page 44).
The easiest way to install Cumulus Linux with ONIE is with local HTTP discovery:
1. If your host (laptop or server) is IPv6-enabled, make sure it is running a web server. If the host is IPv4-
enabled, make sure it is running DHCP in addition to a web server.
2. Download the Cumulus Linux installation file to the root directory of the web server. Rename this file
onie-installer.
3. Connect your host using an Ethernet cable to the management Ethernet port of the switch.
4. Power on the switch. The switch downloads the ONIE image installer and boots. You can watch the
progress of the install in your terminal. After the installation completes, the Cumulus Linux login
prompt appears in the terminal window.
These steps describe a flexible unattended installation method. You do not need a console cable.
A fresh install with ONIE using a local web server typically completes in less than ten minutes.
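Step 2 above can be scripted. The following is a minimal sketch, assuming a POSIX shell on the host; the function name stage_onie_installer and both paths are hypothetical, and the web root directory depends on your web server.

```shell
# Stage a Cumulus Linux image under the name used by local HTTP discovery.
# Usage: stage_onie_installer IMAGE WEBROOT
# Both arguments are hypothetical paths; adjust them for your web server.
stage_onie_installer() (
    image="$1"
    webroot="$2"
    [ -f "$image" ] || { echo "no such image: $image" >&2; exit 1; }
    [ -d "$webroot" ] || { echo "no such directory: $webroot" >&2; exit 1; }
    # The procedure above renames the installation file to onie-installer.
    cp "$image" "$webroot/onie-installer" &&
        chmod 644 "$webroot/onie-installer" &&
        echo "$webroot/onie-installer"
)
```

For example, stage_onie_installer ./cumulus-linux.bin /var/www/html (both paths are examples only) copies the image to /var/www/html/onie-installer and prints that path.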
You have more options for installing Cumulus Linux with ONIE. Read Installing a New Cumulus
Linux Image (see page 34) to install Cumulus Linux using ONIE in the following ways:
DHCP/web server with and without DHCP options
Web server without DHCP
FTP or TFTP without a web server
Local file
USB
ONIE supports many other discovery mechanisms using USB (copy the installer to the root of the drive),
DHCPv6 and DHCPv4, and image copy methods including HTTP, FTP, and TFTP. For more information on
these discovery methods, refer to the ONIE documentation.
After installing Cumulus Linux, you are ready to:
Log in to Cumulus Linux on the switch.
Install the Cumulus Linux license.
Configure Cumulus Linux. This quick start guide provides instructions on configuring switch ports
and a loopback interface.
Getting Started
When starting Cumulus Linux for the first time, the management port makes a DHCPv4 request. To
determine the IP address of the switch, you can cross reference the MAC address of the switch with your
DHCP server. The MAC address is typically located on the side of the switch or on the box in which the unit
ships.
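If your DHCP server happens to be dnsmasq, the cross reference can be scripted. This sketch assumes the dnsmasq lease file format (expiry time, MAC address, IP address, host name, and client ID on each line); the function name ip_for_mac is hypothetical, and the location of the lease file varies by system.

```shell
# Look up the IP address leased to a MAC address.
# Assumes a dnsmasq-style lease file, one lease per line:
#   <expiry> <mac> <ip> <hostname> <client-id>
# Usage: ip_for_mac MAC LEASE_FILE
ip_for_mac() {
    # Compare case-insensitively, because MAC labels are often uppercase.
    awk -v mac="$1" 'tolower($2) == tolower(mac) { print $3 }' "$2"
}
```

The function prints nothing if the switch has not yet requested a lease.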
Login Credentials
The default installation includes one system account, root, with full system privileges, and one user account,
cumulus, with sudo privileges. The root account password is set to null by default (which prohibits login),
while the cumulus account is configured with this default password:
CumulusLinux!
In this quick start guide, you use the cumulus account to configure Cumulus Linux.
For optimum security, change the default password (using the passwd command) before you
configure Cumulus Linux on the switch.
All accounts except root are permitted remote SSH login; you can use sudo to grant a non-root account
root-level access. Commands that change the system configuration require this elevated level of access.
For more information about sudo, read Using sudo to Delegate Privileges (see page 115).
Example IP Configuration
Set the static IP address with the interface address and interface gateway NCLU commands. These commands produce the following snippet in the /etc/network/interfaces file:
auto eth0
iface eth0
address 192.0.2.42/24
gateway 192.0.2.1
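The stanza above corresponds to NCLU commands of roughly the following form. This sketch only prints the commands so that you can review them before running them on the switch; the function name static_mgmt_cmds is hypothetical, and the net add syntax shown is an assumption based on the interface address and interface gateway commands that this section names.

```shell
# Print the NCLU commands that assign a static address and gateway to eth0.
# Nothing is executed; review the output, then run it on the switch.
# Usage: static_mgmt_cmds ADDRESS/PREFIX GATEWAY
static_mgmt_cmds() {
    printf 'net add interface eth0 ip address %s\n' "$1"
    printf 'net add interface eth0 ip gateway %s\n' "$2"
    # Show the staged change, then apply it.
    printf 'net pending\n'
    printf 'net commit\n'
}
```

For the example above, call static_mgmt_cmds 192.0.2.42/24 192.0.2.1.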
The command prompt in the terminal does not reflect the new hostname until you either log out
of the switch or start a new shell.
When you use this NCLU command to set the hostname, DHCP does not override the hostname
when you reboot the switch. However, if you disable the hostname setting with NCLU, DHCP does
override the hostname the next time you reboot the switch.
2. Follow the on-screen menu options to select the geographic area and region.
Programs that are already running (including those that write log files) and users who are currently logged in do not see timezone changes made in interactive mode. To apply the timezone to all services and daemons, reboot the switch.
[email protected]|thequickbrownfoxjumpsoverthelazydog312
There are three ways to install the license onto the switch:
Copy the license from a local server. Create a text file with the license and copy it to a server
accessible from the switch. On the switch, use the following command to transfer the file directly to
the switch, then install the license file:
Copy the file to an HTTP server (not HTTPS), then reference the URL when you run cl-license:
Copy and paste the license key into the cl-license command:
It is not necessary to reboot the switch to activate the switch ports. After you install the license,
restart the switchd service. All front panel ports become active and show up as swp1, swp2, and
so on.
If a license is not installed on a Cumulus Linux switch, the switchd service does not start. After
you install the license, start switchd as described above.
To administratively enable all physical ports, run the following command, where swp1-52 represents a
switch with switch ports numbered from swp1 to swp52:
To view link status, use the net show interface all command. The following examples show the
output of ports in admin down, down, and up modes:
State  Name           Spd  MTU    Mode           LLDP                    Summary
-----  -------------  ---  -----  -------------  ----------------------  -------------------------
       lo                                                                IP: 10.0.0.11/32
       lo                                                                IP: 10.0.0.112/32
       lo                                                                IP: ::1/128
UP     eth0           1G   1500   Mgmt           oob-mgmt-switch (swp6)  Master: mgmt(UP)
       eth0                                                              IP: 192.168.0.11/24(DHCP)
UP     swp1           1G   9000   BondMember     server01 (eth1)         Master: bond01(UP)
UP     swp2           1G   9000   BondMember     server02 (eth1)         Master: bond02(UP)
ADMDN  swp45          N/A  1500   NotConfigured
ADMDN  swp46          N/A  1500   NotConfigured
ADMDN  swp47          N/A  1500   NotConfigured
ADMDN  swp48          N/A  1500   NotConfigured
UP     swp49          1G   9000   BondMember     leaf02 (swp49)          Master: peerlink(UP)
UP     swp50          1G   9000   BondMember     leaf02 (swp50)          Master: peerlink(UP)
UP     swp51          1G   9216   NotConfigured  spine01 (swp1)
UP     swp52          1G   9216   NotConfigured  spine02 (swp1)
UP     bond01         1G   9000   802.3ad                                Master: bridge(UP)
       bond01                                                            Bond Members: swp1(UP)
UP     bond02         1G   9000   802.3ad                                Master: bridge(UP)
       bond02                                                            Bond Members: swp2(UP)
UP     bridge         N/A  1500   Bridge/L2
UP     mgmt           N/A  65536  Interface/L3                           IP: 127.0.0.1/8
UP     peerlink       2G   9000   802.3ad                                Master: bridge(UP)
       peerlink                                                          Bond Members: swp49(UP)
       peerlink                                                          Bond Members: swp50(UP)
DN     peerlink.4094  2G   9000   SubInt/L3                              IP: 169.254.1.1/30
ADMDN  vagrant        N/A  1500   NotConfigured
UP     vlan13         N/A  1500   Interface/L3                           Master: vrf1(UP)
       vlan13                                                            IP: 10.1.3.11/24
       vlan24                                                            IP: 10.2.4.11/24
UP     vlan24-v0      N/A  1500   Interface/L3                           Master: vrf1(UP)
       vlan24-v0                                                         IP: 10.2.4.1/24
UP     vlan4001       N/A  1500   NotConfigured                          Master: vrf1(UP)
UP     vni13          N/A  9000   Access/L2                              Master: bridge(UP)
UP     vni24          N/A  9000   Access/L2                              Master: bridge(UP)
UP     vrf1           N/A  65536  NotConfigured
UP     vxlan4001      N/A  1500   Access/L2                              Master: bridge(UP)
Examples
Example One
In the following configuration example, the front panel port swp1 is placed into a bridge called
bridge. The resulting snippet in the /etc/network/interfaces file is:
auto bridge
iface bridge
bridge-ports swp1
bridge-vlan-aware yes
Example Two
You can add a range of ports in one command. For example, adding swp1 through swp10, swp12,
and swp14 through swp20 to bridge produces the following snippet in the /etc/network/interfaces file:
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4 swp5 swp6 swp7 swp8
swp9 swp10 swp12 swp14 swp15 swp16 swp17 swp18 swp19 swp20
bridge-vlan-aware yes
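To check which ports a range covers before you commit it, you can expand the range in a shell. This is an illustrative sketch; expand_ports is a hypothetical helper, not part of Cumulus Linux, and it assumes the range is written as comma-separated numbers and number pairs, such as 1-10,12,14-20.

```shell
# Expand a port range such as "1-10,12,14-20" into individual port names.
# Usage: expand_ports PREFIX RANGES
expand_ports() (
    prefix="$1"
    out=""
    for part in $(printf '%s' "$2" | tr ',' ' '); do
        # A single number such as "12" is treated as the range 12-12.
        first="${part%-*}"
        last="${part#*-}"
        i="$first"
        while [ "$i" -le "$last" ]; do
            out="${out:+$out }$prefix$i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
)
```

Running expand_ports swp 1-10,12,14-20 prints the same 18 port names that appear on the bridge-ports line above.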
Assigning an IP address to a layer 3 port such as swp1 produces the following snippet in the /etc/network/interfaces file:
auto swp1
iface swp1
address 10.1.1.1/30
To add an IP address to a bridge interface, you must put it into a VLAN interface. The resulting snippet in the /etc/network/interfaces file is:
auto bridge
iface bridge
bridge-vids 100
bridge-vlan-aware yes
auto vlan100
iface vlan100
address 192.168.10.1/24
vlan-id 100
vlan-raw-device bridge
To view the changes in the kernel, use the ip addr show command:
...
The loopback interface lo must always be specified in the /etc/network/interfaces file and
must always be up.
To see the status of the loopback interface (lo), use the net show interface lo command:
Alias
-----
loopback interface
IP Details
------------------------- --------------------
IP: 127.0.0.1/8, ::1/128
IP Neighbor(ARP) Entries: 0
You can configure multiple loopback addresses by adding additional address lines. The resulting snippet in the /etc/network/interfaces file is:
auto lo
iface lo inet loopback
address 10.1.1.1/32
address 172.16.2.1/24
Installation Management
You can install only one image of the operating system on a Cumulus Linux switch. This section discusses
how to install new Cumulus Linux disk images, update existing ones, and configure those images with
additional applications (using packages) if desired.
Zero touch provisioning provides a way to quickly deploy and configure new switches in a large-scale
environment.
cumulus@x86switch$ uname -m
x86_64
cumulus@ARMswitch$ uname -m
armv7l
You can also visit the HCL (hardware compatibility list) to look at your hardware and determine the
processor type.
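The processor type determines which installer file name to use wherever this guide asks for a platform-specific name. The following sketch maps uname -m output to the onie-installer-x86_64 and onie-installer-arm names that this guide uses for USB installs; the function name onie_installer_name is hypothetical, and treating every value that starts with arm as an ARM platform is an assumption.

```shell
# Map a processor type (as printed by uname -m) to an installer file name.
# With no argument, the local machine's processor type is used.
# Usage: onie_installer_name [MACHINE]
onie_installer_name() {
    case "${1:-$(uname -m)}" in
        x86_64) echo "onie-installer-x86_64" ;;
        arm*)   echo "onie-installer-arm" ;;
        *)      echo "unsupported processor type: ${1:-$(uname -m)}" >&2
                return 1 ;;
    esac
}
```

On the ARM switch shown above, which reports armv7l, the function prints onie-installer-arm.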
You can also extract the contents of the image file by passing the extract option to the image file:
Finally, you can verify the contents of the image file by passing the verify option to the image file:
Related Information
Open Network Install Environment (ONIE) Home Page
Installing the Cumulus Linux disk image is destructive: configuration files on the switch are not
saved. Copy them to a different server before installing.
Contents
This topic describes ...
Install Using a DHCP/Web Server with DHCP Options (see page 35)
Install Using a DHCP/Web Server without DHCP Options (see page 36)
Install Using a Web Server with no DHCP (see page 36)
Install Using FTP Without a Web Server (see page 37)
Install Using a Local File (see page 38)
Install Using a USB Drive (see page 39)
Prepare for USB Installation (see page 39)
Instructions for x86 Platforms (see page 41)
Instructions for ARM Platforms (see page 43)
Related Information (see page 44)
In the following procedures:
You can name your Cumulus Linux installer disk image using any of the ONIE naming schemes
mentioned here.
In the example commands, [PLATFORM] can be any supported Cumulus Linux platform, such as
x86_64 or arm.
Run the sudo onie-install -h command to show the ONIE installer options.
After you install the Cumulus Linux disk image, you need to install the license file. Refer to Install the
License (see page 23).
1. The bare metal switch boots up and requests an IP address (DHCP request).
2. The DHCP server acknowledges and responds with DHCP option 114 and the location of the
installation image.
3. ONIE downloads the Cumulus Linux disk image, installs, and reboots.
4. Success! You are now running Cumulus Linux.
The most common method is to send DHCP option 114 with the entire URL to the web server
(this can be the same system). However, there are many other ways to use DHCP even if you do
not have full control over DHCP. See the ONIE user guide for help.
dhcp-host=sw4,192.168.100.14,6c:64:1a:00:03:ba,set:sw4
dhcp-option=tag:sw4,114,"http://roz.rtplab.test/onie-installer-[PLATFORM]"
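When you manage many switches, you can generate these dnsmasq lines instead of writing them by hand. This sketch reproduces the two-line pattern above; dnsmasq_onie_host is a hypothetical helper, and the host name, addresses, and URL are example values.

```shell
# Print dnsmasq settings that pin a switch to an address and hand ONIE the
# installer URL through DHCP option 114.
# Usage: dnsmasq_onie_host NAME IP MAC URL
dnsmasq_onie_host() {
    # The host name doubles as the dnsmasq tag that links the two lines.
    printf 'dhcp-host=%s,%s,%s,set:%s\n' "$1" "$2" "$3" "$1"
    printf 'dhcp-option=tag:%s,114,"%s"\n' "$1" "$4"
}
```

Append the output to your dnsmasq configuration, then restart dnsmasq so that the new host entry takes effect.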
If you do not have a web server, you can use this free Apache example.
1. Place the Cumulus Linux disk image in a directory on the web server.
2. Run the onie-nos-install command:
1. Place the Cumulus Linux disk image in a directory on the web server.
2. From the Cumulus Linux command prompt, run the onie-install command:
1. ONIE is in discovery mode. You must disable discovery mode with the following command:
onie# onie-discovery-stop
3. Place the Cumulus Linux disk image in a directory on your web server.
4. Run the installer manually (because there are no DHCP options):
1. Place the Cumulus Linux disk image in a directory on your web server.
2. From the Cumulus Linux command prompt, run the onie-install command:
1. Set up DHCP or static addressing for eth0. The following example assigns a static address to eth0:
onie# onie-discovery-stop
3. Place the Cumulus Linux disk image into a TFTP or FTP directory.
4. If you are not using DHCP options, run one of the following commands (tftp for TFTP or ftp for
FTP):
1. Place the Cumulus Linux disk image into a TFTP or FTP directory.
2. From the Cumulus Linux command prompt, run one of the following commands (tftp for TFTP or
ftp for FTP):
1. Set up DHCP or static addressing for eth0. The following example assigns a static address to eth0:
onie# onie-discovery-stop
3. Use scp to copy the Cumulus Linux disk image to the switch. (Windows users can use WinSCP.)
4. Run the installer manually from ONIE:
Tips
Installing Cumulus Linux from a USB drive works for an occasional single switch but does
not scale. Unlike USB installs, DHCP-based installation can scale to hundreds of switches
with zero manual input.
Cumulus Networks also provides Cumulus on a Stick, which packages Cumulus Linux
images with your license. You can download your personalized ZIP file, transfer it to a USB
drive, insert the drive into your switch, apply power, and you are ready to go. See Cumulus
on a Stick for information.
1. From the Cumulus Networks Downloads page, download the appropriate Cumulus Linux image for
your x86 or ARM platform.
2. From a computer, prepare your USB drive by formatting it using one of the supported formats:
FAT32, vFAT or EXT2.
Optional: Prepare a USB Drive inside Cumulus Linux
Use caution when performing the actions below; it is possible to severely damage your
system with the following utilities.
a. Insert your USB drive into the USB port on the switch running Cumulus Linux and log in to
the switch.
b. Examine output from cat /proc/partitions and sudo fdisk -l [device] to
determine on which device your USB drive can be found. For example, sudo fdisk -l
/dev/sdb.
These instructions assume your USB drive is the /dev/sdb device, which is
typical if you insert the USB drive after the machine is already booted. However, if
you insert the USB drive during the boot process, it is possible that your USB drive
is the /dev/sda device. Make sure to modify the commands below to use the
proper device for your USB drive.
The parted utility should already be installed. However, if it is not, install it with:
sudo -E apt-get install parted
e. Format the partition to your filesystem of choice using one of the examples below:
f. To continue installing Cumulus Linux, mount the USB drive to move files.
3. Copy the Cumulus Linux disk image to the USB drive, then rename the image file to:
onie-installer-x86_64, if installing on an x86 platform
onie-installer-arm, if installing on an ARM platform
You can also use any of the ONIE naming schemes mentioned here.
When using a Mac or Windows computer to rename the installation file, the file extension
might still be present. Make sure to remove the file extension; otherwise, ONIE cannot
detect the file.
4. Insert the USB drive into the switch, then continue with the appropriate instructions below for your
x86 or ARM platform.
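The removal of a leftover file extension described in step 3 can be scripted. A minimal sketch, assuming a POSIX shell on the computer that prepares the USB drive; strip_installer_ext is a hypothetical helper that removes only the last extension from the file name.

```shell
# Rename an installer file so that its name has no file extension.
# Usage: strip_installer_ext PATH   (prints the resulting path)
strip_installer_ext() (
    path="$1"
    dir=$(dirname "$path")
    base=$(basename "$path")
    case "$base" in
        *.*) new="$dir/${base%.*}" ;;
        # No extension present: leave the file alone.
        *)   echo "$path"; exit 0 ;;
    esac
    mv "$path" "$new" && echo "$new"
)
```

Only the file name is inspected, so dots in directory names do not affect the result.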
SSH sessions to the switch get dropped after this step. To complete the remaining
instructions, connect to the console of the switch. Cumulus Linux switches display their
boot process to the console; you need to monitor the console specifically to complete the
next step.
2. Monitor the console and select the ONIE option from the first GRUB screen shown below.
3. Cumulus Linux on x86 uses GRUB chainloading to present a second GRUB menu specific to the
ONIE partition. No action is necessary in this menu to select the default option ONIE: Install OS.
4. The USB drive is recognized and mounted automatically. The image file is located and automatic
installation of Cumulus Linux begins. Here is some sample output:
5. After installation completes, the switch automatically reboots into the newly installed instance of
Cumulus Linux.
SSH sessions to the switch get dropped after this step. To complete the remaining
instructions, connect to the console of the switch. Cumulus Linux switches display their
boot process to the console; you need to monitor the console specifically to complete the
next step.
2. Interrupt the normal boot process before the countdown (shown below) completes. Press any key to
stop the autoboot.
3. A command prompt appears so that you can run commands. Execute the following command:
run onie_bootcmd
4. The USB drive is recognized and mounted automatically. The image file is located and automatic
installation of Cumulus Linux begins. Here is some sample output:
5. After installation completes, the switch automatically reboots into the newly installed instance of
Cumulus Linux.
Related Information
ONIE Design Specification
Cumulus Networks Downloads page
Cumulus on a Stick
Managing Cumulus Linux Disk Images (see page 30)
Cumulus Networks recommends that you deploy, provision, configure, and upgrade switches using
automation, even with small networks or test labs. During the upgrade process, you can quickly upgrade
dozens of devices in a repeatable manner. Using tools like Ansible, Chef, or Puppet for configuration
management greatly increases the speed and accuracy of the next major upgrade; these tools also enable
the quick swap of failed switch hardware.
Contents
This topic describes ...
Before You Upgrade Cumulus Linux (see page 45)
Upgrade Cumulus Linux (see page 49)
Should I Install a Disk Image or Upgrade Packages? (see page 49)
Disk Image Install (ONIE) (see page 50)
Package Upgrade (see page 51)
Upgrade Notes (see page 52)
Upgrade Switches in an MLAG Pair (see page 53)
Upgrade from Cumulus Linux 3.y.z to a Later 3.y.z Release (see page 53)
Upgrade from Cumulus Linux 2.y.z to 3.y.z (see page 54)
Roll Back a Cumulus Linux Installation (see page 56)
Third Party Packages (see page 56)
Related Information (see page 56)
Be sure to read the knowledge base article Upgrades: Network Device and Linux Host Worldview
Comparison, which provides a detailed comparison between the network device and Linux host
worldview of upgrade and installation.
Understanding the location of configuration data is required for successful upgrades, migrations, and
backup. As with other Linux distributions, the /etc directory is the primary location for all configuration
data in Cumulus Linux. The following list is a likely set of files that you need to back up and migrate to a new
release. Make sure you examine any file that has been changed. Cumulus Networks recommends you
consider making the following files and directories part of a backup strategy.
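A backup of these locations can be as simple as a tar archive. The sketch below is one possible approach, not a Cumulus Networks tool; backup_config is a hypothetical function, and you choose the archive path and the list of paths to include.

```shell
# Archive selected configuration paths, given relative to ROOT.
# Usage: backup_config ROOT ARCHIVE PATH...
backup_config() (
    root="$1"
    archive="$2"
    shift 2
    # -C makes the stored names relative to ROOT (for example, etc/hostname).
    tar -czf "$archive" -C "$root" "$@"
)
```

On a switch, you might run backup_config / /tmp/config-backup.tar.gz etc/network etc/hostname etc/cumulus/ports.conf as root, then copy the archive to another server. The archive path here is an example only.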
Network Configuration Files
/etc/network/ | Network configuration files, most notably /etc/network/interfaces and /etc/network/interfaces.d/ | Switch Port Attributes (see page 234) | N/A
DNS resolution
/etc/hostname | Configuration file for the hostname of the switch | Quick Start Guide (see page ) | wiki.debian.org/HowTo/ChangeHostname
/etc/cumulus/ports.conf | Breakout cable configuration file | Switch Port Attributes (see page ) | N/A; please read the guide on breakout cables
If you are using the root user account, consider including /root/.
If you have custom user accounts, consider including /home/<username>/.
/etc/lldpd.conf | Link Layer Discovery Protocol (LLDP) daemon configuration | Link Layer Discovery Protocol (see page 378) | packages.debian.org/wheezy/lldpd
/etc/nsswitch.conf | Name Service Switch (NSS) configuration file | TACACS Plus (see page 121) | N/A
If you are using the root user account, consider including /root/.
If you have custom user accounts, consider including /home/<username>/.
/etc/adjtime | System clock adjustment data. NTP manages this automatically. It is incorrect when the switch hardware is replaced. Do not copy.
/etc/bcm.d/ | Per-platform hardware configuration directory, created on first boot. Do not copy.
/etc/mlx/ | Per-platform hardware configuration directory, created on first boot. Do not copy.
/etc/default/hwclock | Platform hardware-specific file. Created during first boot. Do not copy.
/etc/lvm/backup
/etc/sensors.d | Platform-specific sensor data. Created during first boot. Do not copy.
If you are using certain forms of network virtualization (see page 476), including VMware NSX-V (see page
660) or Midokura MidoNet (see page 643), you might have updated the /usr/share/openvswitch
/scripts/ovs-ctl-vtep file. This file is not marked as a configuration file; therefore, if the file contents
change in a newer release of Cumulus Linux, they overwrite any changes you made to the file. Cumulus
Networks recommends you back up this file before upgrading.
Upgrading an MLAG pair requires additional steps. If you are using MLAG to dual connect two
Cumulus Linux switches in your environment, follow the steps in Upgrade Switches in an MLAG
Pair (see page 53) below to ensure a smooth upgrade.
4. Restore the configuration files to the new release — ideally with automation.
5. Verify correct operation with the old configurations on the new release.
6. Reinstall third party applications and associated configurations.
Package Upgrade
Cumulus Linux completely embraces the Linux and Debian upgrade workflow, where you use an installer to
install a base image, then perform any upgrades within that release train with the sudo -E apt-get update and
sudo -E apt-get upgrade commands. Any packages that have been changed since the base install get
upgraded in place from the repository. All switch configuration files remain untouched, or in rare cases
merged (using the Debian merge function) during the package upgrade.
When you use package upgrade to upgrade your switch, configuration data stays in place while the
packages are upgraded. If the new release updates a configuration file that you changed previously, you are
prompted for the version you want to use or if you want to evaluate the differences.
To upgrade the switch using package upgrade:
3. Review potential upgrade issues (in some cases, upgrading new packages might also upgrade
additional existing packages due to dependencies). Run the following command to see the additional
packages that will be installed or upgraded.
If no reboot is required after the upgrade completes, the upgrade ends, restarts all upgraded
services, and logs messages in the /var/log/syslog file similar to the ones shown below. In the
examples below, only the frr package was upgraded.
If the upgrade process encounters changed configuration files that have new versions in the release
to which you are upgrading, you see a message similar to this:
- To see the differences between the currently installed version and the new version, type D.
- To keep the currently installed version, type N. The new package version is installed with the suffix
.dpkg-dist (for example, /etc/frr/daemons.dpkg-dist). When the upgrade is complete and
before you reboot, merge your changes with the changes from the newly installed file.
- To install the new version, type I. Your currently installed version is saved with the suffix .dpkg-old.
When the upgrade is complete, you can search for the files with the
sudo find / -mount -type f -name '*.dpkg-*' command.
If you see errors for expired GPG keys that prevent you from upgrading packages, follow
the steps in Upgrading Expired GPG Keys.
5. Reboot the switch if the upgrade messages indicate that a system restart is required.
6. Verify correct operation with the old configurations on the new version.
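To make the review in step 3 easier on a switch with many pending changes, you can filter the preview down to package names. This sketch assumes the preview comes from apt-get run in simulation mode (the -s option), where each package to be installed or upgraded appears on a line that begins with Inst; pending_packages is a hypothetical helper.

```shell
# Print the names of packages that a simulated apt-get run would install or
# upgrade. Reads the simulation output on standard input.
pending_packages() {
    # Simulation lines look like: Inst <package> [<old>] (<new> ...)
    awk '$1 == "Inst" { print $2 }'
}
```

For example, sudo -E apt-get -s upgrade | pending_packages lists one package name per line without changing the system.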
Upgrade Notes
Package upgrade always updates to the latest available release in the Cumulus Linux repository. For
example, if you are currently running Cumulus Linux 3.0.1 and run the sudo -E apt-get upgrade
command on that switch, the packages are upgraded to the latest releases contained in the latest 3.y.z
release.
Because Cumulus Linux is a collection of different Debian Linux packages, be aware of the following:
The /etc/os-release and /etc/lsb-release files are updated to the currently installed
Cumulus Linux release when you upgrade the switch using either package upgrade or disk image
install. For example, if you run sudo -E apt-get upgrade and the latest Cumulus Linux release
on the repository is 3.7.1, these two files display the release as 3.7.1 after the upgrade.
The /etc/image-release file is updated only when you run a disk image install. Therefore, if you
run a disk image install of Cumulus Linux 3.5.0, followed by a package upgrade to 3.7.1 using sudo
-E apt-get upgrade, the /etc/image-release file continues to display Cumulus Linux 3.5.0,
which is the originally installed base image.
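The difference between the two kinds of release file can be checked with a short script. This is a sketch under stated assumptions: the helper names read_release and compare_releases are hypothetical, and it assumes that both /etc/lsb-release and /etc/image-release contain a KEY=value line whose key ends in RELEASE.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the first KEY=value line whose key
# contains RELEASE, with any double quotes removed.
read_release() {
    grep -m1 'RELEASE=' "$1" | cut -d '=' -f2 | tr -d '"'
}

# Assumption: both files carry a *RELEASE= line; the default paths are the
# ones named in the text above.
compare_releases() {
    local running image
    running=$(read_release "${1:-/etc/lsb-release}")
    image=$(read_release "${2:-/etc/image-release}")
    if [ "$running" = "$image" ]; then
        echo "Installed release $running matches the base image."
    else
        echo "Base image $image was package-upgraded to $running."
    fi
}
```

Calling compare_releases with no arguments reads the two system files directly.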
2. If you want to install a disk image, go to the next step. If you want to use package upgrade, update
the Cumulus Linux repositories:
cumulusnetworks.com 53
Cumulus Linux 3.7 User Guide
7. If you were originally running Cumulus Linux 3.0.0 through 3.3.2, follow the steps for upgrading from
Quagga to FRRouting (see page 714).
8. Verify STP convergence across both switches:
1. Disable clagd in the /etc/network/interfaces file (set clagd-enable to no), then restart
switchd, networking, and FRR services.
2. If you are using BGP, notify the BGP neighbors that the switch is going down:
5. Run cl-img-select -fr to boot the switch in the secondary role into ONIE, then reboot the
switch.
6. Install Cumulus Linux onto the secondary switch using ONIE. At this time, all traffic goes to the switch
in the primary role.
7. After the install, copy the license file and all the configuration files you backed up, then restart the
switchd, networking, and Quagga services. All traffic is still going to the primary switch.
8. Run cl-img-select -fr to boot the switch in the primary role into ONIE, then reboot the switch.
Now, all traffic is going to the switch in the secondary role that you just upgraded.
9. Install Cumulus Linux onto the primary switch using ONIE.
10. After the install, copy the license file and all the configuration files you backed up.
11. Follow the steps for upgrading from Quagga to FRRouting (see page 714).
12. Enable clagd again in the /etc/network/interfaces file (set clagd-enable to yes), then run
ifreload -a.
The two switches are dual-connected again and traffic flows to both switches.
Related Information
Upgrades: Network Device Worldview and Linux Host Worldview Comparison
Automation Solutions
ONIE Design Specification
Multi-Chassis Link Aggregation - MLAG (see page 427)
Configuration File Migration Script
Zero Touch Provisioning - ZTP (see page 72)
Using Snapshots
Cumulus Linux supports the ability to take snapshots of the complete file system as well as the ability to roll
back to a previous snapshot. Snapshots are performed automatically right before and after you upgrade
Cumulus Linux using package install (see page 51), and right before and after you commit a switch
configuration using NCLU (see page 88). In addition, you can take a snapshot at any time. You can roll back
the entire file system to a specific snapshot or just retrieve specific files.
The primary snapshot components include:
btrfs — an underlying file system in Cumulus Linux, which supports snapshots.
snapper — a userspace utility to create and manage snapshots on demand as well as taking
snapshots automatically before and after running apt-get upgrade|install|remove|dist-upgrade.
You can use snapper to roll back to earlier snapshots, view existing snapshots, or delete
one or more snapshots.
NCLU (see page 88) — takes snapshots automatically before and after committing network
configurations. You can use NCLU to roll back to earlier snapshots, view existing snapshots, or
delete one or more snapshots.
Contents
This topic describes ...
Install the Snapshot Package
Take and Manage Snapshots
View Available Snapshots
View Differences between Snapshots
Delete Snapshots
Roll Back to Earlier Snapshots
Roll Back with snapper
Configure Automatic Time-based Snapshots
Caveats and Errata
Before and after you update your switch configuration by running the NCLU net commit
command.
For more information about using snapper, run snapper --help or man snapper(8).
However, net show commit history only displays snapshots taken when you update your switch
configuration. It does not list any snapshots taken directly with snapper. To see all the snapshots on the
switch, run the sudo snapper list command:
-control-plane
- acl ipv4 EXAMPLE1 inbound
-iface swp1
- acl ipv4 EXAMPLE1 inbound
You can view the diff for a single file by specifying the name in the command:
-control-plane
- acl ipv4 EXAMPLE1 inbound
-iface swp1
- acl ipv4 EXAMPLE1 inbound
For a higher-level view that displays only the names of changed, added, or deleted files, run the
sudo snapper status command:
Delete Snapshots
You can remove one or more snapshots using NCLU or snapper.
Take care when deleting a snapshot. You cannot restore a snapshot after you delete it.
Snapshot 0 is the running configuration. You cannot roll back to it or delete it. However, you can
take a snapshot of it.
The snapper utility preserves a number of snapshots and automatically deletes older snapshots after the
limit is reached. It does this in two ways.
By default, snapper preserves 10 snapshots that are labeled important. A snapshot is labeled important if
it is created when you run apt-get. To change this number, run:
Always make NUMBER_LIMIT_IMPORTANT an even number as two snapshots are always taken
before and after an upgrade. This does not apply to NUMBER_LIMIT, described next.
snapper also deletes unlabeled snapshots. By default, snapper preserves five snapshots. To change this
number, run:
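Both limits are snapper configuration values. As a minimal sketch, assuming the default snapper configuration is in use, the following hypothetical helper (snapper_limit_commands is not a Cumulus Linux command) prints the snapper set-config commands for both limits and rejects an odd value for NUMBER_LIMIT_IMPORTANT:

```shell
#!/bin/bash
# Hypothetical helper: print the snapper commands that set both cleanup limits.
# An odd NUMBER_LIMIT_IMPORTANT is refused because important snapshots are
# taken in before/after pairs.
snapper_limit_commands() {
    local important="$1" unlabeled="$2"
    if [ $((important % 2)) -ne 0 ]; then
        echo "NUMBER_LIMIT_IMPORTANT must be even, got $important" >&2
        return 1
    fi
    echo "sudo snapper set-config NUMBER_LIMIT_IMPORTANT=$important"
    echo "sudo snapper set-config NUMBER_LIMIT=$unlabeled"
}
```

The helper only prints the commands, so you can review them before running them on the switch.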
You can prevent snapshots from being taken automatically before and after running apt-get
upgrade|install|remove|dist-upgrade. Edit /etc/cumulus/apt-snapshot.conf and set:
APT_SNAPSHOT_ENABLE=no
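If you prefer to script this change, a sketch along the following lines could work. The function name disable_apt_snapshots is hypothetical, and the sketch assumes the file uses plain KEY=value lines and that GNU sed is available, as it is on Debian-based systems.

```shell
#!/bin/bash
# Hypothetical helper: set APT_SNAPSHOT_ENABLE=no in the given file, replacing
# an existing setting or appending one if none is present.
disable_apt_snapshots() {
    local conf="${1:-/etc/cumulus/apt-snapshot.conf}"
    if grep -q '^APT_SNAPSHOT_ENABLE=' "$conf" 2>/dev/null; then
        sed -i 's/^APT_SNAPSHOT_ENABLE=.*/APT_SNAPSHOT_ENABLE=no/' "$conf"
    else
        echo 'APT_SNAPSHOT_ENABLE=no' >> "$conf"
    fi
}
```

Editing the real file requires root privileges, so run the function from a root shell or adapt it to call sudo.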
If you provided a description when you committed changes, mentioning a description rolls the
configuration back to the commit prior to the specified description. For example, consider the following
commit history:
Running net rollback description turtle rolls the configuration back to the state it was in when
you ran net commit description rocket.
You can revert to an earlier version of a specific file instead of rolling back the whole file system:
You can also copy the file directly from the snapshot directory:
cumulus@switch:~$ cp /.snapshots/32/snapshot/etc/cumulus/acl/policy.d/50_nclu_acl.rules /etc/cumulus/acl/policy.d/
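The copy above relies on the /.snapshots/<number>/snapshot/<path> layout. As a sketch under that assumption, the hypothetical helpers below build the snapshot path for any file and copy it back into place:

```shell
#!/bin/bash
# Hypothetical helper: build the path of a file inside a numbered snapshot,
# following the /.snapshots/<number>/snapshot/<path> layout.
snapshot_path() {
    local number="$1" file="$2"
    printf '/.snapshots/%s/snapshot%s\n' "$number" "$file"
}

# Hypothetical helper: copy one file from a snapshot back to its original
# location. Run as root if the destination is not writable by your user.
restore_from_snapshot() {
    local number="$1" file="$2"
    cp "$(snapshot_path "$number" "$file")" "$file"
}
```

For example, restore_from_snapshot 32 /etc/cumulus/acl/policy.d/50_nclu_acl.rules performs the same copy as the command above.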
NUMBER_LIMIT_IMPORTANT | 10
NUMBER_MIN_AGE | 1800
QGROUP |
SPACE_LIMIT | 0.5
SUBVOLUME | /
SYNC_ACL | no
TIMELINE_CLEANUP | yes
TIMELINE_CREATE | yes
TIMELINE_LIMIT_DAILY | 5
TIMELINE_LIMIT_HOURLY | 5
TIMELINE_LIMIT_MONTHLY | 5
TIMELINE_LIMIT_YEARLY | 5
TIMELINE_MIN_AGE | 1800
Directory | Reason
/var/log, /var/support | The log file and Cumulus support location. These directories are excluded from snapshots to allow post-rollback analysis.
/opt, /var/opt | Third-party software is installed typically in /opt. Exclude /opt to avoid re-installing these applications after rollbacks.
/srv | This directory contains data for HTTP and FTP servers. Exclude this directory to avoid server data loss on rollbacks.
/usr/local | This directory is used when installing locally built software. Exclude this directory to avoid re-installing this software after rollbacks.
/var/lib/libvirt/images | This is the default directory for libvirt VM images. Exclude this directory from the snapshot. Additionally, disable Copy-On-Write (COW) for this subvolume as COW and VM image I/O access patterns are not compatible.
/boot/grub/i386-pc, /boot/grub/x86_64-efi, /boot/grub/arm-uboot | The GRUB kernel modules must stay in sync with the GRUB kernel installed in the master boot record or UEFI system partition.
Network Disruptions
Updating, upgrading, and installing packages with apt causes disruptions to network services:
Upgrading a package might result in services being restarted or stopped as part of the
upgrade process.
Installing a package might disrupt core services by changing core service dependency
packages. In some cases, installing new packages might also upgrade additional existing
packages due to dependencies.
If services are stopped, you might need to reboot the switch for those services to restart.
Contents
This topic describes ...
Update the Package Cache
List Available Packages
List Installed Packages
Display the Version of a Package
Upgrade Packages
Add New Packages
Add Packages from Another Repository
Cumulus Supplemental Repository
Related Information
Cumulus Networks recommends you use the -E option with sudo whenever you run any apt-get
command. This option preserves your environment variables (such as HTTP proxies) before
you install new packages or upgrade your distribution.
The search commands look for the search terms not only in the package name but in other parts
of the package information; the search matches on more packages than you might expect.
1.0-cl3u11
As an alternative to the NCLU command described above, you can run the Linux dpkg -l
<package_name> command.
To see a list of all packages installed on the system with their versions, run the net show package
version command. For example:
Upgrade Packages
To upgrade all the packages installed on the system to their latest versions, run the following commands:
A list of packages that will be upgraded is displayed and you are prompted to continue.
The above commands upgrade all installed packages to their latest versions but do not install any new
packages.
Refer to Upgrading Cumulus Linux (see page 44) for additional information.
If the package is installed already, you can update the package from the Cumulus Linux repository as
part of the package upgrade process, which upgrades all packages on the system. See Upgrade
Packages (see page 68) above.
2. If the package is not already installed, add it by running sudo -E apt-get install <name of
package>. This retrieves the package from the Cumulus Linux repository and installs it on your
system together with any other packages on which this package might depend. The following
example adds the tcpreplay package to the system:
In some cases, installing a new package might also upgrade additional existing packages
due to dependencies. To view these additional packages before you install, run the
apt-get install --dry-run command.
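Dry-run output can be long. As a sketch, the hypothetical filter below reads that output on standard input and prints only the package names, relying on apt-get marking each simulated install or upgrade with a line that begins with Inst:

```shell
#!/bin/bash
# Hypothetical helper: read simulated apt-get output on stdin and print the
# name of every package that would be installed or upgraded ("Inst" lines).
packages_in_dry_run() {
    awk '$1 == "Inst" { print $2 }'
}

# Usage on a switch (tcpreplay is the package used in the example above):
#   sudo -E apt-get install --dry-run tcpreplay | packages_in_dry_run
```

Any name in the list other than the package you asked for is being pulled in or upgraded as a dependency.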
Cumulus Networks has added features or made bug fixes to certain packages; you must not
replace these packages with versions from other repositories. Cumulus Linux is configured to
ensure that the packages from the Cumulus Linux repository are always preferred over packages
from other repositories.
If you want to install packages that are not in the Cumulus Linux repository, the procedure is the same as
above, but with one additional step.
Packages that are not part of the Cumulus Linux Repository are not typically tested and might not
be supported by Cumulus Linux Technical Support.
Installing packages outside of the Cumulus Linux repository requires the use of sudo -E apt-get; however,
depending on the package, you can use easy-install and other commands.
To install a new package, complete the following steps:
1. Run the dpkg command to ensure that the package is not already installed on the system:
2. If the package is installed already, ensure it is the version you need. If it is an older version, update
the package from the Cumulus Linux repository:
3. If the package is not on the system, the package source location is most likely not in the
/etc/apt/sources.list file. If the source for the new package is not in sources.list, edit the file and add
the appropriate source. For example, add the following if you want a package from the Debian
repository that is not in the Cumulus Linux repository:
To uncomment the repository, remove the # at the start of the line, then save the file:
Related Information
Debian GNU/Linux FAQ, Ch 8 Package management tools
man pages for apt-get, dpkg, sources.list, apt_preferences
Contents
This topic describes...
Zero Touch Provisioning Using a Local File
Zero Touch Provisioning Using a USB Drive (ZTP-USB)
Zero Touch Provisioning over DHCP
Trigger ZTP over DHCP
Configure the DHCP Server
Inspect HTTP Headers
Write ZTP Scripts
Best Practices for ZTP Scripts
Install a License
Test DNS Name Resolution
Check the Cumulus Linux Release
Apply Management VRF Configuration
Perform Ansible Provisioning Callbacks
Disable the DHCP Hostname Override Setting
NCLU in ZTP Scripts
Test ZTP Scripts
Common ZTP Script Errors
Manually Use the ztp Command
Notes
cumulus-ztp-amd64-cel_pebble-rUNKNOWN
cumulus-ztp-amd64-cel_pebble
cumulus-ztp-cel_pebble
cumulus-ztp-amd64
cumulus-ztp
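The file names above run from most specific to most generic. As a sketch, assuming the pattern is architecture, then vendor_model, then hardware revision (as the example names suggest), the hypothetical helper below generates the same search order for any platform:

```shell
#!/bin/bash
# Hypothetical helper: print the ZTP script names to try, most specific first,
# for a given architecture, vendor_model string, and hardware revision.
ztp_waterfall() {
    local arch="$1" platform="$2" rev="$3"
    echo "cumulus-ztp-${arch}-${platform}-r${rev}"
    echo "cumulus-ztp-${arch}-${platform}"
    echo "cumulus-ztp-${platform}"
    echo "cumulus-ztp-${arch}"
    echo "cumulus-ztp"
}
```

Running ztp_waterfall amd64 cel_pebble UNKNOWN reproduces the five names listed above, which is a quick way to check how a script on a local file system or USB drive should be named.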
You can also trigger the ZTP process manually by running the ztp --run <URL> command, where the
URL is the path to the ZTP script.
This feature has been tested only with thumb drives, not with large external USB hard drives.
If the ztp process does not discover a local script, it tries once to locate an inserted but unmounted USB
drive. If it discovers one, it begins the ZTP process.
Cumulus Linux supports the use of a FAT32, FAT16, or VFAT-formatted USB drive as an installation source
for ZTP scripts. You must plug in the USB drive before you power up the switch.
At minimum, the script must:
Install the Cumulus Linux operating system and license.
Copy over a basic configuration to the switch.
Restart the switch or the relevant services to get switchd up and running with that configuration.
Follow these steps to perform zero touch provisioning using a USB drive:
1. Copy the Cumulus Linux license and installation image to the USB drive.
2. The ztp process searches the root filesystem of the newly mounted drive for filenames matching an
ONIE-style waterfall (see the patterns and examples above), looking for the most specific name first,
and ending at the most generic.
3. The contents of the script are parsed to ensure it contains the CUMULUS-AUTOPROVISIONING flag
(see example scripts (see page 76)).
The USB drive is mounted to a temporary directory under /tmp (for example, /tmp/tmpigGgjf/
). To reference files on the USB drive, use the environment variable ZTP_USB_MOUNTPOINT to
refer to the USB root partition.
1. The first time you boot Cumulus Linux, eth0 is configured for DHCP and makes a DHCP request.
2. The DHCP server offers a lease to the switch.
3. If option 239 is present in the response, the zero touch provisioning process starts.
4. The zero touch provisioning process requests the contents of the script from the URL, sending
additional HTTP headers (see page 75) containing details about the switch.
5. The contents of the script are parsed to ensure it contains the CUMULUS-AUTOPROVISIONING flag
(see example scripts (see page 76)).
6. If provisioning is necessary, the script executes locally on the switch with root privileges.
7. The return code of the script is examined. If it is 0, the provisioning state is marked as complete in
the autoprovisioning configuration file.
Additionally, you can specify the hostname of the switch with the host-name option:
Remember to include the following line in any of the supported scripts that you expect to run
using the autoprovisioning framework.
# CUMULUS-AUTOPROVISIONING
This line is required somewhere in the script file for execution to occur.
The script must contain the CUMULUS-AUTOPROVISIONING flag. You can include this flag in a comment or
remark; the flag does not need to be echoed or written to stdout.
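Before publishing a script on a web server or USB drive, you can check for the flag yourself. This is a minimal sketch with a hypothetical function name; it only confirms that the flag text appears somewhere in the file, which is the requirement described above.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeed only if the script file contains the
# CUMULUS-AUTOPROVISIONING flag anywhere in its text.
has_ztp_flag() {
    grep -q 'CUMULUS-AUTOPROVISIONING' "$1"
}

# Usage:
#   has_ztp_flag ztp.sh || echo "ztp.sh is missing the CUMULUS-AUTOPROVISIONING flag" >&2
```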
You can write the script in any language currently supported by Cumulus Linux, such as:
Perl
Python
Ruby
Shell
The script must return an exit code of 0 upon success, as this triggers the autoprovisioning process to be
marked as complete in the autoprovisioning configuration file.
The following script installs Cumulus Linux and its license from a USB drive and applies a configuration:
#!/bin/bash
function error() {
    echo -e "\e[0;33mERROR: The Zero Touch Provisioning script failed while running the command $BASH_COMMAND at line $BASH_LINENO.\e[0m" >&2
    exit 1
}
# CUMULUS-AUTOPROVISIONING
exit 0
Several ZTP example scripts are available in the Cumulus GitHub repository.
Install a License
Use the following function to include error checking for license file installation.
function install_license(){
    # Install license
    echo "$(date) INFO: Installing License..."
    echo "$1" | /usr/cumulus/bin/cl-license -i
    return_code=$?
    if [ "$return_code" == "0" ]; then
        echo "$(date) INFO: License Installed."
    else
        echo "$(date) ERROR: License not installed. Return code was: $return_code"
        /usr/cumulus/bin/cl-license
        exit 1
    fi
}
function ping_until_reachable(){
    last_code=1
    max_tries=30
    tries=0
    while [ "0" != "$last_code" ] && [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries+1))
        echo "$(date) INFO: ( Attempt $tries of $max_tries ) Pinging $1 Target Until Reachable."
        ping $1 -c2 &> /dev/null
        last_code=$?
        sleep 1
    done
    if [ "$tries" -eq "$max_tries" ] && [ "$last_code" -ne "0" ]; then
        echo "$(date) ERROR: Reached maximum number of attempts to ping the target $1 ."
        exit 1
    fi
}
function init_ztp(){
    # do normal ZTP tasks
    # The no-op below keeps the otherwise empty function body valid in bash.
    :
}
CUMULUS_TARGET_RELEASE=3.5.3
CUMULUS_CURRENT_RELEASE=$(cat /etc/lsb-release | grep RELEASE | cut -d "=" -f2)
IMAGE_SERVER_HOSTNAME=webserver.example.com
IMAGE_SERVER="http://$IMAGE_SERVER_HOSTNAME/$CUMULUS_TARGET_RELEASE.bin"
ZTP_URL="http://$IMAGE_SERVER_HOSTNAME/ztp.sh"
function set_hostname(){
    # Remove DHCP Setting of Hostname
    sed s/'SETHOSTNAME="yes"'/'SETHOSTNAME="no"'/g -i /etc/dhcp/dhclient-exit-hooks.d/dhcp-sethostname
    hostnamectl set-hostname $1
}
Not all aspects of NCLU are supported when running during ZTP. Use traditional Linux methods
of providing configuration to the switch during ZTP.
When you use NCLU in ZTP scripts, add the following loop to make sure NCLU has time to start up before
being called.
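One way to write such a wait loop is sketched below. The helper name wait_until_ok is hypothetical, and using net show interface as the readiness probe is an assumption; substitute any NCLU command that fails until NCLU is ready. Unlike an unbounded loop, this version gives up after a fixed number of attempts so that a ZTP script cannot hang forever.

```shell
#!/bin/bash
# Hypothetical helper: retry a command once per second until it succeeds or
# the attempt limit is reached. Returns 0 on success, 1 on timeout.
wait_until_ok() {
    local max_tries="$1" tries=0
    shift
    while [ "$tries" -lt "$max_tries" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        tries=$((tries + 1))
        sleep 1
    done
    return 1
}

# In a ZTP script, wait up to 60 seconds for NCLU before using it
# (assumption: "net show interface" fails until NCLU is ready):
#   wait_until_ok 60 net show interface || exit 1
```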
To see if ZTP is enabled and to see results of the most recent execution, you can run the ztp -s
command.
cumulus@switch:~$ ztp -s
ZTP INFO:
State enabled
Version 1.0
Result Script Failure
Date Tue May 10 22:42:09 2016 UTC
Method ZTP DHCP
URL http://192.0.2.1/demo.sh
If ZTP runs when the switch boots and not manually, you can run systemctl -l status ztp.service
and then journalctl -l -u ztp.service to see if any failures occur:
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6000-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6000-01 systemd[1]: Unit ztp.service entered failed state.
cumulus@switch:~$
cumulus@switch:~$ sudo journalctl -l -u ztp.service --no-pager
-- Logs begin at Wed 2016-05-11 16:37:42 UTC, end at Wed 2016-05-11 16:40:39 UTC. --
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp: Sate Directory does not exist. Creating it...
Instead of running journalctl, you can see the log history by running:
If you see that the issue is a script failure, you can modify the script and then run ztp manually using
ztp -v -r <URL/path to that script>, as above.
Use the following command to check syslog for information about ZTP:
Errors in syslog for ZTP like those shown above often occur if the script was created (or edited at some point)
on a Windows machine. Check that \r\n characters are not present in the end-of-line
encodings.
Use the cat -v ztp.sh command to view the contents of the script and search for any hidden
characters.
The ^M characters in the output of your ZTP script, as shown above, indicate the presence of Windows
end-of-line encodings that you need to remove.
Use the translate (tr) command on any Linux system to remove the '\r' characters from the file.
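A minimal sketch of that cleanup follows. The function name strip_cr and the output file name are hypothetical; the tr -d '\r' call is the part that does the work.

```shell
#!/bin/bash
# Hypothetical helper: write a copy of a script with every carriage return
# removed, leaving the original file untouched.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Usage:
#   strip_cr ztp.sh ztp-unix.sh
#   cat -v ztp-unix.sh    # should no longer show ^M at the end of lines
```

Writing to a second file rather than editing in place lets you compare the two versions before replacing the script on the server.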
###################
/usr/cumulus/bin/cl-license -i http://192.168.0.254/license.txt
# Clean method of performing a Reboot
nohup bash -c 'sleep 2; shutdown now -r "Rebooting to Complete ZTP"' &
exit 0
# The line below is required to be a valid ZTP script
#CUMULUS-AUTOPROVISIONING
root@oob-mgmt-server:/var/www/html#
Enabling ztp means that ztp tries to run the next time the switch boots. However, if ZTP already
ran on a previous boot up or if a manual configuration has been found, ZTP will just exit without
trying to look for any script.
ZTP checks for these manual configurations during bootup:
Password changes
Users and groups changes
Packages changes
Interfaces changes
The presence of an installed license
When the switch boots for the very first time, ZTP records the state of important files that are
most likely to be modified after the switch is configured. If ZTP is still enabled after a
reboot, ZTP compares the recorded state to the current state of these files. If they do not match,
ZTP considers that the switch has already been provisioned and exits. These files are only erased
after a reset.
To reset ztp to its original state, use the -R option and the -i option. This removes the ztp directory and
ztp runs the next time the switch reboots.
To force provisioning to occur and ignore the status listed in the configuration file, use the -r option:
Notes
During the development of a provisioning script, the switch might need to be rebooted.
You can use the Cumulus Linux onie-select -i command to cause the switch to reprovision
itself and install a network operating system again using ONIE.
System Configuration
The NCLU wrapper utility called net is capable of configuring layer 2 and layer 3 features of the networking
stack, installing ACLs and VXLANs, rolling back and deleting snapshots, as well as providing monitoring and
troubleshooting functionality for these features. You can configure both the /etc/network/interfaces
and /etc/frr/frr.conf files with net, in addition to running show and clear commands related to
ifupdown2 and FRRouting.
Contents
This topic describes ...
Install NCLU
If you upgraded Cumulus Linux from a version earlier than 3.2 instead of performing a full disk image
install, you need to install the nclu package on your switch:
The nclu package installs a new bash completion script and displays the following message:
NCLU Basics
Use the following workflow to stage and commit changes to Cumulus Linux with NCLU:
1. Use the net add and net del commands to stage and remove configuration changes.
2. Use the net pending command to review staged changes.
3. Use net commit and net abort to commit and delete staged changes.
net commit applies the changes to the relevant configuration files, such as
/etc/network/interfaces, then runs the necessary follow-on commands to enable the configuration,
such as ifreload -a.
If two different users try to commit a change at the same time, NCLU displays a warning but
implements the change according to the first commit received. The second user will need to
abort the commit.
When you have a running configuration, you can review and update the configuration with the following
commands:
net show is a series of commands for viewing various parts of the network configuration. For
example, use net show configuration to view the complete network configuration, net
show commit history to view a history of commits using NCLU, and net show bgp to view
BGP status.
net clear provides a way to clear net show counters, BGP and OSPF neighbor content, and
more.
net rollback provides a mechanism to revert (see page 61) to an earlier configuration.
net commit confirm requires you to press Enter to commit changes using NCLU. If you run net
commit confirm but do not press Enter within 10 seconds, the commit automatically reverts and
no changes are made.
net commit description <description> enables you to provide a descriptive summary of
the changes you are about to commit.
net commit permanent retains the snapshot (see page 57) taken when committing the change.
Otherwise, the snapshots created from NCLU commands are cleaned up periodically with a snapper
cron job.
net commit delete deletes one or more snapshots created when committing changes with
NCLU.
net del all deletes all configurations and stops the IEEE 802.1X service.
The net del all command does not remove management VRF (see page 859)
configurations; NCLU does not interact with eth0 interfaces and management VRF.
NCLU has a comprehensive built-in help system. In addition to the net man page, you can use ? and help
to display available commands:
Usage:
    # net <COMMAND> [<ARGS>] [help]
    #
    # net is a command line utility for networking on Cumulus Linux switches.
    #
    # COMMANDS are listed below and have context specific arguments which can
    # be explored by typing "<TAB>" or "help" anytime while using net.
    #
    # Use 'man net' for a more comprehensive overview.
    net abort
    net commit [verbose] [confirm] [description <wildcard>]
    net commit delete (<number>|<number-range>)
    net help [verbose]
    net pending
    net rollback (<number>|last)
    net show commit (history|<number>|<number-range>|last)
    net show rollback (<number>|last)
    net show configuration [commands|files|acl|bgp|ospf|ospf6|interface <interface>]
Options:
    # Help commands
    help     : context sensitive information; see section below
    example  : detailed examples of common workflows
    # Configuration commands
    add      : add/modify configuration
    del      : remove configuration
    # Status commands
    show     : show command output
    clear    : clear counters, BGP neighbors, etc
Uncomment the very last line in the .inputrc file so that the file changes from this:
to this:
Save the file and reconnect to the switch. The ? (question mark) ability will work on all subsequent sessions
on the switch.
cumulus@leaf01:~$ net
abort : abandon changes in the commit buffer
add : add/modify configuration
clear : clear counters, BGP neighbors, etc
commit : apply the commit buffer to the system
del : remove configuration
example : detailed examples of common workflows
help : Show this screen and exit
pending : show changes staged in the commit buffer
rollback : revert to a previous configuration state
show : show command output
When the question mark is typed, NCLU autocompletes and shows all available options, but the
question mark does not actually appear on the terminal. This is normal, expected behavior.
Built-In Examples
NCLU has a number of built-in examples to guide users through basic configuration setup:
Scenario
========
Verification
============
switch1# net show interface
switch1# net show bridge macs
You create user accounts with edit permissions for NCLU by adding them to the netedit group. A
user in the netedit group can run NCLU configuration commands, such as net add, net del, or
net commit, in addition to NCLU net show commands.
The examples below demonstrate how to add a new user account or modify an existing user account called
myuser.
To add a new user account with NCLU show permissions:
You can use the adduser command for local user accounts only. You can use the addgroup
command for both local and remote user accounts. For a remote user account, you must use the
mapping username, such as tacacs3 or radius_user, not the TACACS (see page 121) or
RADIUS (see page 135) account name.
If the user tries to run commands that are not allowed, the following error displays:
To configure a new user group to use NCLU, add that group to the groups_with_edit and
groups_with_show lines in the file.
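The edit can also be scripted. The sketch below makes two assumptions that you should verify against your own file: that the file in question is /etc/netd.conf, and that each setting is a single line of the form key = comma-separated list. The function name add_nclu_group is hypothetical, and the sketch does not check whether the group is already listed.

```shell
#!/bin/bash
# Hypothetical helper: append a group to the groups_with_edit and
# groups_with_show lines of a netd.conf-style file.
# Assumption: each setting is one "key = item, item" line with at least one item.
add_nclu_group() {
    local group="$1" conf="${2:-/etc/netd.conf}" key
    for key in groups_with_edit groups_with_show; do
        # Capture the existing line without trailing spaces, then add the group.
        sed -i "s/^\(${key}[[:space:]]*=.*[^[:space:]]\)[[:space:]]*\$/\1, ${group}/" "$conf"
    done
}
```

After changing the file, NCLU most likely needs its daemon restarted before the new permissions apply; treat that step as an assumption and confirm it for your release.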
Use caution giving edit permissions to groups. For example, don't give edit permissions to the
tacacs group (see page 126).
With the commands all stored in a single file, you can now copy this file to another ToR switch in your
network called leaf01 and apply the configuration by running:
Advanced Configuration
NCLU needs no initial configuration; however, if you need to modify its configuration, you must manually
update the /etc/netd.conf file. You can configure this file to allow different permission levels for users
to edit configurations and run show commands. The file also contains a blacklist that hides less frequently
used terms from the tabbed autocomplete.
Controls which users are allowed to run show commands.
Contents
This topic describes ...
Set the Time Zone
Edit the /etc/timezone File
Use the Guided Wizard
Edit the file to add your desired time zone. A list of valid time zones can be found at the following link.
Use the following command to apply the new time zone immediately.
Then navigate the menus to enable the time zone you want. The following example selects the US/Pacific
time zone:
Configuring tzdata
------------------
For more info see the Debian System Administrator’s Manual – Time.
If you need to reconfigure the current time zone, refer to the instructions above.
Then, to set the system clock according to the time zone configured:
These commands add the NTP server to the list of servers in /etc/ntp.conf:
To set the initial date and time via NTP before starting the ntpd daemon, use ntpd -q. This is the same as
ntpdate, which is being retired and will no longer be available. See man ntp.conf(5) for details on configuring
ntpd using ntp.conf.
These commands create the following configuration snippet in the ntp.conf file:
...
# Specify interfaces
interface listen swp10
...
Cumulus Linux currently supports PTP on the Mellanox Spectrum ASIC only.
If you do not perform a full disk image install of Cumulus Linux 3.6 or later, you need to
install the ptp4l package with the apt-get install ptp4l command.
PTP is supported in boundary clock mode only (the switch provides timing to downstream
servers; it is a slave to a higher-level clock and a master to downstream clocks).
The switch uses hardware time stamping to capture timestamps from an Ethernet frame
at the physical layer. This allows PTP to account for delays in message transfer and greatly
improves the accuracy of time synchronization.
In the following example, boundary clock 2 receives time from Master 1 (the grandmaster) on a PTP slave
port, sets its clock and passes the time down from the PTP master port to boundary clock 1. Boundary
clock 1 receives the time on a PTP slave port, sets its clock and passes the time down the hierarchy through
the PTP master ports to the hosts that receive the time.
1. Open the /etc/cumulus/switchd.conf file in a text editor and add the following line:
ptp.timestamping = TRUE
2. Restart switchd:
1. Configure the interfaces on the switch that you want to use for PTP. Each interface must be
configured as a layer 3 routed interface with an IP address.
Example Configuration
In the following example, the boundary clock on the switch receives time from Master 1 (the grandmaster)
on PTP slave port swp3s0, sets its clock and passes the time down through PTP master ports swp3s1,
swp3s2, and swp3s3 to the hosts that receive the time.
The configuration for the above example is shown below. The example assumes that you have already
configured the layer 3 routed interfaces (swp3s0, swp3s1, swp3s2, and swp3s3) you want to use for PTP.
ptp
  global
    slaveOnly
      0
    priority1
      255
    priority2
      255
    domainNumber
      0
    logging_level
      5
    path_trace_enabled
      0
    use_syslog
      1
    verbose
      0
    summary_interval
      0
    time_stamping
      hardware
    gmCapable
      0
  swp15s0
  swp15s1
...
To view the additional PTP status information, including the delta in nanoseconds from the master clock,
run the sudo pmc -u -b 0 'GET TIME_STATUS_NP' command:
gmIdentity 000200.fffe.000005
If the state is not Active, or the alternate configuration file does not appear in the ntp command line — for
example:
— then it is likely that a mistake was made. In this case, correct the mistake and rerun the three commands
above to verify.
With this unit file override present, changes to NTP settings made with NCLU do not take effect until the
DHCP script regenerates the alternate NTP configuration file.
Related Information
Debian System Administrator’s Manual – Time
www.ntp.org
en.wikipedia.org/wiki/Network_Time_Protocol
wiki.debian.org/NTP
Contents
This topic describes ...
Generate an SSH Key Pair (see page 111)
Related Information (see page 113)
1. To generate the SSH key pair, run the ssh-keygen command and follow the prompts:
cumulus@leaf01:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/cumulus/.ssh/id_rsa):
2. To copy the generated public key to the desired location, run the ssh-copy-id command and
follow the prompts:
ssh-copy-id does not work if the username on the remote switch is different from the
username on the local switch. To work around this issue, use the scp command instead:
3. Connect to the remote switch to confirm that the authentication keys are in place:
Related Information
Debian Documentation - Password-less logins with OpenSSH
Wikipedia - Secure Shell (SSH)
User Accounts
By default, Cumulus Linux has two user accounts: cumulus and root.
The cumulus account:
Uses the default password CumulusLinux!
Is a user account in the sudo group with sudo privileges.
Can log in to the system through all the usual channels, such as console and SSH (see page 111).
Along with the cumulus group, has both show and edit rights for NCLU (see page 88).
The root account:
Has its password disabled by default.
Has the standard Linux root user access to everything on the switch.
Disabled password prohibits login to the switch by SSH, telnet, FTP, and so on.
For optimal security, change the default password with the passwd command before you configure
Cumulus Linux on the switch.
You can add additional user accounts as needed. Like the cumulus account, these accounts must use sudo
to execute privileged commands (see page 115); be sure to include them in the sudo group.
To access the switch without a password, you need to boot into a single shell/user mode (see page 913).
You can add and configure user accounts in Cumulus Linux with read-only or edit permissions for NCLU.
For more information, see Configure User Accounts (see page 94).
1. In a terminal on your host system (not the switch), check to see if a key already exists:
3. You are prompted to enter a file in which to save the key (/root/.ssh/id_rsa) . Press Enter to use the
home directory of the root user or provide a different destination.
4. You are prompted to enter a passphrase (empty for no passphrase). This is optional but it does
provide an extra layer of security.
5. The public key is now located in /root/.ssh/id_rsa.pub. The private key (identification) is now
located in /root/.ssh/id_rsa.
6. Copy the public key to the switch. SSH to the switch as the cumulus user, then run:
...
# Authentication:
LoginGraceTime 120
PermitRootLogin yes
StrictModes yes
...
Contents
This topic describes ...
sudo Basics (see page 115)
sudoers Examples (see page 116)
Related Information (see page 121)
sudo Basics
sudo allows you to execute a command as superuser or another user as specified by the security policy.
See man sudo(8) for details.
The default security policy is sudoers, which is configured using /etc/sudoers. Use /etc/sudoers.d/ to
add to the default sudoers policy. See man sudoers(5) for details.
Use only visudo to edit the sudoers file; do not use another editor such as vi or emacs. See man
visudo(8) for details.
When creating a new file in /etc/sudoers.d, use visudo -f. This option performs sanity
checks before writing the file to avoid errors that prevent sudo from working.
Errors in the sudoers file can result in losing the ability to elevate privileges to root. You can fix
this issue only by power cycling the switch and booting into single user mode. Before modifying
sudoers, enable the root user by setting a password for the root user.
By default, users in the sudo group can use sudo to execute privileged commands. To add users to the
sudo group, use the useradd(8) or usermod(8) command. To see which users belong to the sudo
group, see /etc/group (man group(5)).
Any command can be run as sudo, including su. A password is required.
The example below shows how to use sudo as a non-privileged user cumulus to bring up an interface:
sudoers Examples
The following examples show how to grant a user or group of users only the privileges needed to
perform a required task. Each example uses the system group noc; group names are
prefixed with a %.
When executed by an unprivileged user, the example commands below must be prefixed with sudo.
Category | Privilege | Example Command | sudoers Entry
Monitoring | Switch port info | ethtool -m swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/ethtool
Monitoring | System diagnostics | cl-support | %noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/cl-support
Monitoring | Routing diagnostics | |
Image management | Install images | onie-select http://lab/install.bin | %noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/onie-select
Package management | Install packages | apt-get install vim | %noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get install *
Package management | Upgrading | apt-get upgrade | %noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get upgrade
Netfilter | List iptables rules | iptables -L | %noc ALL=(ALL) NOPASSWD:/sbin/iptables
Interfaces | Up any interface | ifup swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/ifup
Interfaces | Up/down only swp2 | ifup swp2 / ifdown swp2 |
Interfaces | Any IP address chg | ip addr {add|del} 192.0.2.1/30 dev swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/ip addr *
Ethernet bridging | Add bridges and ints | brctl addbr br0 / brctl addif br0 swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/brctl addbr *,/sbin/brctl addif *
Troubleshooting | Restart switchd | systemctl restart switchd.service | %noc ALL=(ALL) NOPASSWD:/usr/sbin/service switchd *
Troubleshooting | Restart any service | systemctl cron switchd.service | %noc ALL=(ALL) NOPASSWD:/usr/sbin/service
Troubleshooting | Packet capture | tcpdump | %noc ALL=(ALL) NOPASSWD:/usr/sbin/tcpdump
L3 | Add static routes | ip route add 10.2.0.0/16 via 10.0.0.1 | %noc ALL=(ALL) NOPASSWD:/bin/ip route add *
L3 | Delete static routes | ip route del 10.2.0.0/16 via 10.0.0.1 | %noc ALL=(ALL) NOPASSWD:/bin/ip route del *
L3 | Any static route chg | ip route * | %noc ALL=(ALL) NOPASSWD:/bin/ip route *
L3 | Any iproute command | ip * |
L3 | Non-modal OSPF | cl-ospf area 0.0.0.1 range 10.0.0.0/24 | %noc ALL=(ALL) NOPASSWD:/usr/bin/cl-ospf
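Several of the sudoers entries listed in this section can be combined in a single drop-in file. The fragment below is only a sketch: the file name /etc/sudoers.d/noc is hypothetical, and the choice of entries is arbitrary. Create the file with visudo -f so that the syntax is checked before it is written.

```text
# /etc/sudoers.d/noc  (hypothetical file name)
# Members of the noc group can run these commands without a password.
%noc ALL=(ALL) NOPASSWD:/sbin/ethtool
%noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/cl-support
%noc ALL=(ALL) NOPASSWD:/bin/ip route add *,/bin/ip route del *
```

A member of noc would then run, for example, sudo ethtool -m swp1 without being prompted for a password.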
Related Information
sudo
Adding Yourself to sudoers
TACACS Plus
Cumulus Linux implements TACACS+ client AAA (Accounting, Authentication, and Authorization) in a
transparent way with minimal configuration. The client implements the TACACS+ protocol as described in
this IETF document. There is no need to create accounts or directories on the switch. Accounting records
are sent to all configured TACACS+ servers by default. Use of per-command authorization requires
additional setup on the switch.
Contents
This topic describes ...
Supported Features (see page 122)
Install the TACACS+ Client Packages
Configure the TACACS+ Client
TACACS+ Authentication (login)
Local Fallback Authentication (see page 124)
TACACS+ Accounting
Configure NCLU for TACACS+ Users
TACACS+ Per-command Authorization
NSS Plugin (see page 128)
TACACS Configuration Parameters (see page 129)
Remove the TACACS+ Client Packages
Troubleshooting (see page 131)
Supported Features
Authentication using PAM; includes login, ssh, sudo and su
Runs over the eth0 management interface
Ability to run in the management VRF (see page 859)
TACACS+ privilege 15 users can run any command with sudo using the /etc/sudoers.d/tacplus
file that is installed by the libtacplus-map1 package
Up to seven TACACS+ servers
secret=tacacskey
server=192.168.0.30
To specify multiple servers, add them, one per line, to the /etc/tacplus_servers file.
Connections are made in the order in which they are listed in this file. In most cases, you do not need to
change any other parameters. You can add parameters used by any of the packages to this file, which
affects all the TACACS+ client software. For example, the timeout value for NSS lookups (see description
below) is set to 5 seconds by default in the /etc/tacplus_nss.conf file, whereas the timeout value for
other packages is 10 seconds and is set in the /etc/tacplus_servers file. The timeout value is per
connection to the TACACS+ servers. (If authorization is configured per command, the timeout occurs for
each command.) There are several (typically four) connections to the server per login attempt from PAM, as
well as two or more through NSS. Therefore, with the default timeout values, a TACACS+ server that is not
reachable can delay logins by a minute or more per unreachable server. If you must list unreachable
TACACS+ servers, place them at the end of the server list and consider reducing the timeout values.
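The delay estimate above can be worked through with the typical figures quoted in the text. The connection counts below are the "typically four" and "two or more" values from the paragraph, not measured numbers, so treat the result as a lower bound for one unreachable server.

```shell
# Connection counts are the typical values quoted above; actual counts vary.
pam_connections=4   # connections per login attempt from PAM
pam_timeout=10      # seconds, default in /etc/tacplus_servers
nss_connections=2   # "two or more" connections through NSS
nss_timeout=5       # seconds, default in /etc/tacplus_nss.conf

delay=$(( pam_connections * pam_timeout + nss_connections * nss_timeout ))
echo "Delay per unreachable server: at least ${delay}s"
```

With these inputs the sum is 50 seconds; any additional NSS lookups push the delay past a minute, which is why reducing the timeout values helps when an unreachable server must stay in the list.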
When you add or remove TACACS+ servers, you must restart auditd (with the systemctl restart
auditd command) or you must send a signal (with killall -HUP audisp-tacplus) before audisp-
tacplus rereads the configuration to see the changed server list.
You can also configure the IP address used as the source IP address when communicating with the
TACACS+ server. See TACACS Configuration Parameters (see page 129) below for the full list of TACACS+
parameters.
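A minimal sketch of a multi-server /etc/tacplus_servers file follows, written to a scratch location so it can be tried safely. The second server address, the source_ip value, and the reduced timeout are hypothetical; only the first secret and server lines come from the example above.

```shell
# Build the file in a scratch location; on a switch the target is /etc/tacplus_servers.
servers_file="$(mktemp)"

# Servers are contacted in the order listed. The second server and the
# source_ip address are hypothetical values for illustration.
cat > "$servers_file" <<'EOF'
secret=tacacskey
server=192.168.0.30
server=192.168.1.30
source_ip=192.168.0.10
timeout=5
EOF

# The file holds shared secrets, so restrict it to the owner (mode 600).
chmod 600 "$servers_file"
stat -c '%a %n' "$servers_file"
```

After changing the server list on a switch, restart auditd or send audisp-tacplus the HUP signal as described above so that accounting picks up the change.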
Following is the complete list of the TACACS+ client configuration files, and their use.
Filename | Description
/etc/tacplus_servers | This is the primary file that requires configuration after installation. The file is used by all packages with include=/etc/tacplus_servers parameters in the other configuration files that are installed. Typically, this file contains the shared secrets; make sure that the Linux file mode is 600.
/etc/nsswitch.conf | When the libnss_tacplus package is installed, this file is configured to enable tacplus lookups via libnss_tacplus. If you replace this file by automation or other means, you need to add tacplus as the first lookup method for the passwd database line.
/etc/tacplus_nss.conf | This file sets the basic parameters for libnss_tacplus. It includes a debug variable for debugging NSS lookups separately from other client packages.
/usr/share/pam-configs/tacplus | This is the configuration file for pam-auth-update to generate the files in the next row. These configurations are used at login, by su, and by ssh.
/etc/pam.d/common-* | The /etc/pam.d/common-* files are updated for tacplus authentication. The files are updated with pam-auth-update, when libpam-tacplus is installed or removed.
/etc/sudoers.d/tacplus | This file allows TACACS+ privilege level 15 users to run commands with sudo. The file includes an example (commented out) of how to enable privilege level 15 TACACS users to use sudo without having to enter a password and provides an example of how to enable all TACACS users to run specific commands with sudo. Edit this file only with the visudo -f /etc/sudoers.d/tacplus command.
audisp-tacplus.conf | This is the audisp plugin configuration file. Typically, no modifications are required.
/etc/audisp/audisp-tac_plus.conf | This is the TACACS+ server configuration file for accounting. Typically, no modifications are required. You can use this configuration file when you only want to debug TACACS+ accounting issues, not all TACACS+ users.
/etc/audit/rules.d/audisp-tacplus.rules | The auditd rules for TACACS+ accounting. The augenrules command uses all rule files to generate the rules file (described below).
/etc/audit/audit.rules | This is the audit rules file generated when auditd is installed.
You can edit the /etc/pam.d/common-* files manually. However, if you run pam-auth-update
again after making the changes, the update fails. Only perform configuration in /usr/share
/pam-configs/tacplus, then run pam-auth-update.
By default, TACACS+ users at privilege levels other than 15 are not allowed to run sudo
commands and are limited to commands that can be run with standard Linux user permissions.
1. Edit the /etc/nsswitch.conf file to remove the keyword tacplus from the line starting with
passwd. (You need to add the keyword back in step 3.)
An example of the /etc/nsswitch.conf file with the keyword tacplus removed from the line
starting with passwd is shown below.
2. To enable the local privileged user to run sudo and NCLU commands, run the adduser commands
shown below. In the example commands, the TACACS account name is tacadmin.
The first adduser command prompts for information and a password. You can skip most
of the requested information by pressing ENTER.
3. Edit the /etc/nsswitch.conf file to add the keyword tacplus back to the line starting with
passwd (the keyword you removed in the first step).
4. Restart the netd service with the sudo systemctl restart netd command.
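The nsswitch.conf edits in steps 1 and 3 can be sketched with sed on a scratch copy. The passwd line used here (tacplus followed by compat) is an assumed starting point; the lookup methods in your own /etc/nsswitch.conf may differ.

```shell
# Scratch copy with an assumed passwd line; the real file is /etc/nsswitch.conf.
nss="$(mktemp)"
printf 'passwd: tacplus compat\ngroup: compat\n' > "$nss"

# Step 1: remove the tacplus keyword from the passwd line (GNU sed).
sed -i -E 's/^(passwd:[[:space:]]+)tacplus[[:space:]]+/\1/' "$nss"
step1="$(grep '^passwd:' "$nss")"

# Step 3: add tacplus back as the first lookup method.
sed -i -E 's/^(passwd:[[:space:]]+)/\1tacplus /' "$nss"
step3="$(grep '^passwd:' "$nss")"

# Print the passwd line as it looks after step 1 and after step 3.
printf '%s\n%s\n' "$step1" "$step3"
```

Between the two edits, the local account is created with adduser as described in step 2, so that the lookup does not consult the TACACS+ server for the new name.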
TACACS+ Accounting
TACACS+ accounting is implemented with the audisp module, with an additional plugin for auditd/
audisp. The plugin maps the auid in the accounting record to a TACACS login, based on the auid and
sessionid. The audisp module requires libnss_tacplus and uses the libtacplus_map.so library
interfaces as part of the modified libpam_tacplus package.
Communication with the TACACS+ servers is done with the libsimple-tacacct1 library, through
dlopen(). A maximum of 240 bytes of command name and arguments are sent in the accounting record,
due to the TACACS+ field length limitation of 255 bytes.
All Linux commands result in an accounting record, including commands run as part of the login
process or as sub-processes of other commands. This can sometimes generate a large number
of accounting records.
Configure the IP address and encryption key of the server in the /etc/tacplus_servers file. Minimal
configuration to auditd and audisp is necessary to enable the audit records necessary for accounting.
These records are installed as part of the package.
audisp-tacplus installs the audit rules for command accounting. Modifying the configuration files is not
usually necessary. However, when a management VRF (see page 859) is configured, the accounting
configuration does need special modification because the auditd service starts prior to networking. It is
necessary to add the vrf parameter and to signal the audisp-tacplus process to reread the
configuration. The example below shows that the management VRF is named mgmt. You can place the vrf
parameter in either the /etc/tacplus_servers file or in the /etc/audisp/audisp-tac_plus.conf
file.
vrf=mgmt
After editing the configuration file, send the HUP signal killall -HUP audisp-tacplus to notify the
accounting process to reread the file.
All sudo commands run by TACACS+ users generate accounting records against the original
TACACS+ login name.
For more information, refer to the audisp.8 and auditd.8 man pages.
Do not add the tacacs group to the groups_with_edit variable; this is dangerous and can
potentially enable any user to log into the switch as the root user.
...
...
After you save and exit the netd.conf file, restart the netd service. Run:
If the user/command combination is not authorized by the TACACS+ server, a message similar to the
following displays:
Option Description
-i Initializes the environment. You only need to issue this option once per username.
-a You can invoke the utility with the -a option as many times as desired. For each command in
the -a list, a symbolic link is created from tacplus-auth to the relative portion of the
command name in the local bin subdirectory. You also need to enable these commands on
the TACACS+ server (refer to the TACACS+ server documentation). It is common to have the
server allow some options to a command, but not others.
-f Re-initializes the environment. If you need to restart, issue the -f option with -i to force the
re-initialization; otherwise, repeated use of -i is ignored.
As part of the initialization:
The user's shell is changed to /bin/rbash.
For example, if you want to allow the user to be able to run the net and ip commands (if authorized by the
TACACS+ server), use the command:
Other than shell built-ins, the only two commands the privilege level 0 TACACS users can run are the ip
and net commands.
If you mistakenly add potential commands with the -a option, you can remove them. The example below
shows how to remove the net command:
Use the man command on the switch for more information on tacplus-auth and tacplus-restrict.
NSS Plugin
When used with pam_tacplus, TACACS+ authenticated users can log in without a local account on the
system using the NSS plugin that comes with the tacplus_nss package. The plugin uses the mapped
tacplus information if the user is not found in the local password file, provides the getpwnam() and
getpwuid() entry points, and uses the TACACS+ authentication functions.
The plugin asks the TACACS+ server if the user is known, and then for relevant attributes to determine the
privilege level of the user. When the libnss_tacplus package is installed, nsswitch.conf is modified
to set tacplus as the first lookup method for passwd. If the order is changed, lookups return the local
accounts, such as tacacs0.
If the user is not found, a mapped lookup is performed using the libtacplus.so exported functions. The
privilege level is appended to tacacs and the lookup searches for the name in the local password file. For
example, privilege level 15 searches for the tacacs15 user. If the user is found, the password structure is
filled in with information for the user.
If the user is not found, the privilege level is decremented and checked again until privilege level 0 (user
tacacs0) is reached. This allows use of only the two local users tacacs0 and tacacs15, if minimal
configuration is desired.
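The decrementing lookup described above can be modelled in a few lines of shell. This is an illustration of the search order only, not the plugin's code; the function name map_priv_level and the local_users list are made up for the example.

```shell
# Local mapping accounts present on the switch (the minimal configuration).
local_users="tacacs0 tacacs15"

# Print the first local account found at or below the given privilege level.
map_priv_level() {
    level="$1"
    while [ "$level" -ge 0 ]; do
        case " $local_users " in
            *" tacacs$level "*) echo "tacacs$level"; return 0 ;;
        esac
        level=$((level - 1))
    done
    return 1
}

map_priv_level 15   # prints tacacs15
map_priv_level 7    # prints tacacs0
```

With only tacacs0 and tacacs15 defined, every privilege level below 15 falls through to tacacs0, which matches the minimal configuration mentioned above.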
secret=STRING | The secret key used to encrypt and decrypt packets sent to and received from the server. You can specify the secret key more than once in any order with respect to the server= parameter. When fewer secret= parameters are specified, the last secret given is used for the remaining servers. Only use this parameter in files such as /etc/tacplus_servers that are not world readable.
server=HOSTNAME, server=IP_ADDR | Adds a TACACS+ server to the servers list. Servers are queried in turn until a match is found, or no servers remain in the list. Can be specified up to 7 times. An IP address can be optionally followed by a port number, preceded by a ":". The default port is 49.
source_ip=IPv4_ADDRESS | Sets the IP address used as the source IP address when communicating with the TACACS+ server. You must specify an IPv4 address. IPv6 addresses and hostnames are not supported. The address must be valid for the interface being used.
min_uid=value | The minimum user ID that the NSS plugin looks up. Setting it to 0 means uid 0 (root) is never looked up, which is desirable for performance reasons. The value should not be greater than the local TACACS+ user IDs (0 through 15), to ensure they can be looked up.
exclude_users=user1,user2,... | A comma-separated list of usernames that are never looked up by the NSS plugin, set in the tacplus_nss.conf file. You cannot use * (asterisk) as a wild card in the list. Although * is not a legal username, bash might look it up as a username during pathname completion, so it is included in this list as a username string.
login=STRING | TACACS+ authentication service (pap, chap, or login). The default value is pap.
user_homedir=1 | This is not enabled by default. When enabled, a separate home directory for each TACACS+ user is created when the TACACS+ user first logs in. By default, the home directory in the mapping accounts in /etc/passwd (/home/tacacs0 ... /home/tacacs15) is used. If the home directory does not exist, it is created with the mkhomedir_helper program, in the same manner as pam_mkhomedir. This option is not honored for accounts with restricted shells when per-command authorization is enabled.
acct_all=1 |
timeout=SECS | Sets the timeout in seconds for connections to each TACACS+ server. The default is 10 seconds for all lookups except that NSS lookups use a 5 second timeout.
vrf=VRFNAME | If the management network is in a VRF, set this variable to the VRF name. This would usually be "mgmt". When this variable is set, the connection to the TACACS+ accounting servers is made through the named VRF.
service | TACACS+ accounting and authorization service. Examples include shell, pap, raccess, ppp, and slip. The default value is shell.
To remove the TACACS+ client configuration files as well as the packages (recommended), use this
command:
Troubleshooting
If TACACS does not appear to be working correctly, debug the following configuration files by adding the
debug=1 parameter to one or more of these files:
/etc/tacplus_servers
/etc/tacplus_nss.conf
When this debugging is enabled, additional information is shown for the command authorization
conversation with the TACACS+ server:
To disable debugging:
If accounting records are still not being sent, add debug=1 to the /etc/audisp/audisp-tac_plus.
conf file, then issue the command above to notify the plugin. Ask the TACACS+ user to run a command
and examine the end of /var/log/syslog for messages from the plugin. You can also check the auditing
log file /var/log/audit/audit.log to be sure the auditing records are being written. If they are not,
restart the audit daemon with:
Package Name | Description
audisp-tacplus_1.0.0-1-cl3u3 | This package uses auditing data from auditd to send accounting records to the TACACS+ server and is started as part of auditd.
libnss-tacplus_1.0.1-cl3u3 | Provides an interface between libc username lookups, the mapping functions, and the TACACS+ server.
tacplus-auth-1.0.0-cl3u1 | This package includes the tacplus-restrict setup utility, which enables you to perform per-command TACACS+ authorization. Per-command authorization is not done by default.
libtacplus-map1_1.0.0-cl3u2 | The mapping functionality between local and TACACS+ users on the server. Sets the immutable sessionid and auditing UID to ensure the original user can be tracked through multiple processes and privilege changes. Sets the auditing loginuid as immutable if supported. Creates and maintains a status database in /run/tacacs_client_map to manage and lookup mappings.
libsimple-tacacct1_1.0.0-cl3u2 | Provides an interface for programs to send accounting records to the TACACS+ server. Used by audisp-tacplus.
libtac2-bin_1.4.0-cl3u2 | Provides the tacc testing program and TACACS+ man page.
Limitations
The current algorithm returns the first name matching the UID from the mapping file; this can be
the first or the second user that logged in.
To work around this issue, you can use the switch audit log or the TACACS server accounting logs to
determine which processes and files are created by each user.
For commands that do not execute other commands (for example, changes to configurations in an
editor, or actions with tools like clagctl and vtysh), no additional accounting is done.
Per-command authorization is implemented at the most basic level (commands are permitted or
denied based on the standard Linux user permissions for the local TACACS users and only privilege
level 15 users can run sudo commands by default).
The Linux auditd system does not always generate audit events for processes when terminated with a
signal (with the kill system call or internal errors such as SIGSEGV). As a result, processes that exit on a
signal that is not caught and handled, might not generate a STOP accounting record.
However, the command does remove the home directory. The user can still log in on that account, but will
not have a valid home directory. This is a known upstream issue with the deluser command for all non-
local users.
Only use the --remove-home option when the user_homedir=1 configuration command is in use.
RADIUS AAA
Cumulus Networks offers add-on packages that enable RADIUS users to log in to Cumulus Linux switches in
a transparent way with minimal configuration. There is no need to create accounts or directories on the
switch. Authentication is handled with PAM and includes login, ssh, sudo and su.
Contents
This topic describes ...
Install the RADIUS Packages (see page 136)
Configure the RADIUS Client (see page 137)
Enable Login without Local Accounts (see page 138)
Local Fallback Authentication (see page 138)
Verify RADIUS Client Configuration (see page 139)
Remove RADIUS Client Packages (see page 140)
Limitations (see page 141)
Related Information (see page 141)
After installation is complete, either reboot the switch or run the sudo systemctl restart netd
command.
The libpam-radius-auth package supplied with the Cumulus Linux RADIUS client is a newer version
than the one in Debian Jessie. This package has added support for IPv6, the src_ip option described
below, as well as a number of bug fixes and minor features. The package also includes VRF support,
provides man pages describing the PAM and RADIUS configuration, and sets the SUDO_PROMPT
environment variable to the login name for RADIUS mapping support.
The libnss_mapuser package is specific to Cumulus Linux and supports the getgrent, getgrnam and
getgrgid library interfaces. These interfaces add logged in RADIUS users to the group member list for
groups that contain the mapped_user (radius_user) if the RADIUS account is unprivileged, and add
privileged RADIUS users to the group member list for groups that contain the mapped_priv_user (
radius_priv_user) during the group lookups.
During package installation:
The PAM configuration is modified automatically using pam-auth-update (8), and the NSS
configuration file /etc/nsswitch.conf is modified to add the mapuser and mapuid plugins. If you
remove or purge the packages, these files are modified to remove the configuration for these
plugins.
The radius_shell package is added, which installs the /sbin/radius_shell and setcap
cap_setuid program used as the login shell for RADIUS accounts. The package adjusts the UID
when needed, then runs the bash shell with the same arguments. When installed, the package
changes the shell of the RADIUS accounts to /sbin/radius_shell, and to /bin/shell if the
package is removed. This package is required for privileged RADIUS users to be enabled. It is not
required for regular RADIUS client use.
The radius_user account is added to the netshow group and the radius_priv_user account
to the netedit and sudo groups. This change enables all RADIUS logins to run NCLU net show
commands and all privileged RADIUS users to also run net add, net del, and net commit
commands, and to use sudo.
1. Add the hostname or IP address of at least one RADIUS server (such as a freeradius server on Linux)
and the shared secret used to authenticate and encrypt communication with each server.
Multiple server configuration lines are verified in the order listed. Other than available memory, there is no
limit on the number of RADIUS servers you can use.
The server port number or name is optional. The system looks up the port in the /etc/services
file. However, you can override the ports in the /etc/pam_radius_auth.conf file.
2. If the server is slow or latencies are high, change the timeout setting. The setting defaults to 3
seconds.
3. If you want to use a specific interface to reach the RADIUS server, specify the src_ip option. You
can specify the hostname of the interface, an IPv4 address, or an IPv6 address. If you specify the src_ip
option, you must also specify the timeout option.
4. Set the vrf-name field. This is typically set to mgmt if you are using a management VRF (see page
859). You cannot specify more than one VRF.
The configuration file includes the mapped_priv_user field that sets the account used for privileged
RADIUS users and the priv-lvl field that sets the minimum value for the privilege level to be considered
a privileged login (the default value is 15). If you edit these fields, make sure the values match those set in
the /etc/nss_mapuser.conf file.
The following example provides a sample /etc/pam_radius_auth.conf file configuration:
mapped_priv_user radius_priv_user
# server[:port] shared_secret timeout (secs) src_ip
192.168.0.254 secretkey
other-server othersecret 3 192.168.1.10
# when mgmt vrf is in use
vrf-name mgmt
If this is the first time you are configuring the RADIUS client, uncomment the debug line to help
with troubleshooting. The debugging messages are written to /var/log/syslog. When the
RADIUS client is working correctly, comment out the debug line.
As an optional step, you can set PAM configuration keywords by editing the
/usr/share/pam-configs/radius file. After you edit the file, you must run the
pam-auth-update --package command. PAM configuration keywords are described in the
pam_radius_auth (8) man page.
cumulusnetworks.com 137
Cumulus Linux 3.7 User Guide
radius_user:x:1017:1002:radius user:/home/radius_user:/bin/bash
then the matching line returned by running getent passwd dave is:
The home directory /home/dave is created during the login process if it does not already exist and is
populated with the standard skeleton files by the mkhomedir_helper command.
The configuration file /etc/nss_mapuser.conf is used to configure the plugins. The file includes the
mapped account name, which is radius_user by default. You can change the mapped account name by
editing the file. The nss_mapuser (5) man page describes the configuration file.
A flat file mapping is done based on the session number assigned during login, which persists across su
and sudo. The mapping is removed at logout.
1. Add a local privileged user account. For example, if the radius_priv_user account in the
/etc/passwd file is radius_priv_user:x:1002:1001::/home/radius_priv_user:/sbin/radius_shell,
run the following command to add a local privileged user account named johnadmin:
2. To enable the local privileged user to run sudo and NCLU commands, run the following commands:
3. Edit the /etc/passwd file to move the local user line before the radius_priv_user line:
...
johnadmin:x:1002:1001::/home/johnadmin:/sbin/radius_shell
radius_priv_user:x:1002:1001::/home/radius_priv_user:/sbin/radius_shell
4. To set the local password for the local user, run the following command:
In this example, the admin user is a privileged RADIUS user (with privilege level 15) so is able to add
interface swp1.
source /etc/network/interfaces.d/*.intf
auto eth0
iface eth0 inet dhcp
+
+auto swp1
+iface swp1
...
When you remove the packages, the plugins are removed from the /etc/nsswitch.conf file and from
the PAM files.
To remove all configuration files for these packages, run:
The RADIUS fixed account is not removed from the /etc/passwd or /etc/group file and the
home directories are not removed. They remain in case there are modifications to the account or
files in the home directories.
To remove the home directories of the RADIUS users, first get the list by running:
For all users listed, except the radius_user, run this command to remove the home directories:
where USERNAME is the account name (the final component of the home directory path). This command
gives the following warning because the user is not listed in the /etc/passwd file.
After removing all the RADIUS users, run the command to remove the fixed account. If the account has
been changed in the /etc/nss_mapuser.conf file, use that account name instead of radius_user.
Limitations
If two or more RADIUS users are logged in simultaneously, a UID lookup only returns the user that logged in
first. Any processes run by either user get attributed to both, and all files created by either user get
attributed to the first name matched. This is similar to adding two local users to the password file with the
same UID and GID, and is an inherent limitation of using the UID for the fixed user from the password file.
The current algorithm returns the first name matching the UID from the mapping file; this might be the first
or second user that logged in.
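The effect of a first-match UID lookup can be sketched in a few lines. This is an illustrative model only: the list of (name, UID) pairs stands in for the mapping file, the UID 1017 is taken from the radius_user example earlier in this section, and the login name alice is hypothetical.

```python
def lookup_name_by_uid(mapping, uid):
    """Return the first login name recorded for uid, or None if there is none."""
    for name, mapped_uid in mapping:
        if mapped_uid == uid:
            return name
    return None


# Two RADIUS users logged in at the same time; both share the fixed account's UID.
SESSIONS = [("dave", 1017), ("alice", 1017)]

print(lookup_name_by_uid(SESSIONS, 1017))  # always the first entry for that UID
```

Because both entries carry the same UID, the second name can never be returned by a UID lookup, which is why files and processes are attributed to a single name.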
Related Information
TACACS+ client (see page 121)
Cumulus Networks RADIUS demo on GitHub
Cumulus Networks TACACS demo on GitHub
Netfilter - ACLs
Netfilter is the packet filtering framework in Cumulus Linux as well as most other Linux distributions. There
are a number of tools available for configuring ACLs in Cumulus Linux:
iptables, ip6tables, and ebtables are Linux userspace tools used to administer filtering rules
for IPv4 packets, IPv6 packets, and Ethernet frames (layer 2 using MAC addresses).
NCLU (see page 88) is a Cumulus Linux-specific userspace tool used to configure custom ACLs.
cl-acltool is a Cumulus Linux-specific userspace tool used to administer filtering rules and
configure default ACLs.
NCLU and cl-acltool operate on various configuration files and use iptables, ip6tables, and
ebtables to install rules into the kernel. In addition, NCLU and cl-acltool program rules in hardware
for interfaces involving switch port interfaces, which iptables, ip6tables and ebtables cannot do on
their own.
In many instances, you can use NCLU to configure ACLs; however, in some cases, you must use
cl-acltool. The examples below specify when to use which tool.
If you need help to configure ACLs, run net example acl to see a basic configuration:
The interfaces in the sample configuration in net example acl are layer 3; they are
not layer 2 bridge members.
Contents
This topic describes ...
Traffic Rules In Cumulus Linux (see page 144)
Chains (see page 144)
Tables (see page 145)
Rules (see page 146)
How Rules Are Parsed and Applied (see page 147)
Rule Placement in Memory (see page 149)
Nonatomic Update Mode and Atomic Update Mode (see page 149)
Use iptables, ip6tables, and ebtables Directly (see page 152)
Estimate the Number of Rules (see page 153)
Match SVI and Bridged Interfaces in Rules (see page 154)
Install and Manage ACL Rules with NCLU (see page 155)
Install and Manage ACL Rules with cl-acltool (see page 156)
Install Packet Filtering (ACL) Rules (see page 157)
Specify the Policy Files to Install (see page 159)
Hardware Limitations on Number of Rules (see page 160)
Broadcom Tomahawk Limits (see page 160)
Broadcom Trident II+ and Trident3 Limits
Broadcom Trident II Limits (see page 161)
Broadcom Helix4 Limits (see page 161)
Mellanox Spectrum Limits (see page 162)
Supported Rule Types (see page 162)
iptables/ip6tables Rule Support (see page 163)
ebtables Rule Support (see page 164)
Other Unsupported Rules (see page 164)
IPv6 Egress Rules on Broadcom Switches (see page 165)
Common Examples (see page 166)
Control Plane and Data Plane Traffic (see page 166)
Set DSCP on Transit Traffic (see page 168)
Verify DSCP Values on Transit Traffic (see page 168)
Check the Packet and Byte Counters for ACL Rules (see page 169)
Filter Specific TCP Flags (see page 171)
Example Scenario (see page 172)
Switch 1 Configuration (see page 172)
Switch 2 Configuration (see page 173)
Egress Rule (see page 174)
Ingress Rule (see page 174)
Chains
Netfilter describes the mechanism by which packets are classified and controlled in the Linux kernel.
Cumulus Linux uses the Netfilter framework to control the flow of traffic to, from, and across the switch.
Netfilter does not require a separate software daemon to run; it is part of the Linux kernel itself. Netfilter
asserts policies at layers 2, 3 and 4 of the OSI model by inspecting packet and frame headers based on a
list of rules. Rules are defined using syntax provided by the iptables, ip6tables and ebtables
userspace applications.
The rules created by these programs inspect or operate on packets at several points in the life of the
packet through the system. These five points are known as chains and are shown here:
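As a rough sketch, the order in which a packet meets the five chains depends on where it comes from and where it is going. The function name chains_traversed and the 'local'/'remote' labels are hypothetical, and the model follows the general Netfilter traversal order while ignoring loopback traffic.

```python
CHAINS = ("PREROUTING", "INPUT", "FORWARD", "OUTPUT", "POSTROUTING")


def chains_traversed(origin, destination):
    """Return the chains a packet passes through, in order.

    origin and destination are 'local' (the switch itself) or 'remote'.
    """
    if origin == "local":
        # Traffic generated by the switch
        return ["OUTPUT", "POSTROUTING"]
    if destination == "local":
        # Traffic addressed to the switch
        return ["PREROUTING", "INPUT"]
    # Transit traffic across the switch
    return ["PREROUTING", "FORWARD", "POSTROUTING"]


for origin, destination in [("remote", "local"), ("remote", "remote"), ("local", "remote")]:
    print(origin, "->", destination, chains_traversed(origin, destination))
```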
Tables
When building rules to affect the flow of traffic, the individual chains can be accessed by tables. Linux
provides three tables by default:
Filter classifies traffic or filters traffic
NAT applies Network Address Translation rules
Mangle alters packets as they move through the switch
Rules
Rules are the items that actually classify traffic to be acted upon. Rules are applied to chains, which are
attached to tables, similar to the graphic below.
Rules have several different components; the examples below highlight those different components.
Table: The first argument is the table. The second example does not specify a table because
the filter table is implied when no table is specified.
Chain: The second argument is the chain. Each table supports several different chains. See
Tables above.
Matches: The third argument(s) are called the matches. You can specify multiple matches in a single
rule. However, the more matches you use in a rule, the more memory that rule consumes.
Jump: The jump specifies the target of the rule; that is, what action to take if the packet matches the
rule. If this option is omitted in a rule, then matching the rule will have no effect on the packet's fate,
but the counters on the rule will be incremented.
Target(s): The target can be a user-defined chain (other than the one this rule is in), one of the
special built-in targets that decides the fate of the packet immediately (like DROP), or an extended
target. See the Supported Rule Types and Common Usages (see page 162) section below for
examples of different targets.
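The components listed above can be picked out of a rule string mechanically. The following is a minimal sketch, not how iptables itself parses rules: the function name parse_rule is hypothetical, the two sample rules are illustrative, and everything after the target name is assumed to be an option of the target.

```python
import shlex


def parse_rule(rule):
    """Split a rule into table, chain, matches, target and target options."""
    tokens = shlex.split(rule)
    parsed = {"table": "filter",  # implied when -t is not given
              "chain": None, "matches": [], "target": None, "target_options": []}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("-t", "--table"):
            parsed["table"] = tokens[i + 1]
            i += 2
        elif token in ("-A", "--append"):
            parsed["chain"] = tokens[i + 1]
            i += 2
        elif token in ("-j", "--jump"):
            parsed["target"] = tokens[i + 1]
            parsed["target_options"] = tokens[i + 2:]
            break
        else:
            parsed["matches"].append(token)
            i += 1
    return parsed


print(parse_rule("-t mangle -A FORWARD --in-interface swp+ -j SETCLASS --class 7"))
print(parse_rule("-A FORWARD -i swp1 -p tcp --dport 80 -j DROP"))
```

The second rule omits -t, so the sketch reports the filter table for it.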
All rules are terminating; after a rule matches, the action is carried out and no more rules are
processed. The exception to this is when a SETCLASS rule is placed immediately before another
rule; this exists multiple times in the default ACL configuration.
In the example below, the SETCLASS action, applied with the --in-interface option, creates the
internal ASIC classification and continues to process the next rule, which does the rate-limiting for
the matched protocol:
If multiple contiguous rules with the same match criteria are applied to --in-interface,
only the first rule gets processed and then terminates processing. This is a
misconfiguration; there is no reason to have duplicate rules with different actions.
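The terminating behavior and the SETCLASS exception can be modeled with a short first-match loop. This is a simplified sketch rather than the hardware behavior: rules are hypothetical (match, action) pairs, packets are plain dictionaries, and SETCLASS is treated as always falling through to later rules.

```python
def evaluate(rules, packet):
    """Return the actions applied to a packet under a first-match model."""
    actions = []
    for match, action in rules:
        if all(packet.get(key) == value for key, value in match.items()):
            actions.append(action)
            if action != "SETCLASS":  # SETCLASS classifies, then processing continues
                break
    return actions


classify_then_police = [
    ({"in": "swp1", "proto": "udp"}, "SETCLASS"),
    ({"proto": "udp"}, "POLICE"),
    ({"proto": "udp"}, "DROP"),      # never reached for UDP packets
]
duplicates = [
    ({"proto": "icmp"}, "ACCEPT"),
    ({"proto": "icmp"}, "DROP"),     # same match, different action: misconfiguration
]

print(evaluate(classify_then_police, {"in": "swp1", "proto": "udp"}))
print(evaluate(duplicates, {"in": "swp1", "proto": "icmp"}))
```

In the second rule set only the first of the two identical matches ever takes effect, which is the misconfiguration described above.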
When processing traffic, rules affecting the FORWARD chain that specify an ingress interface are
performed prior to rules that match on an egress interface. As a workaround, rules that only affect
the egress interface can have an ingress interface wildcard (currently, only swp+ and bond+ are
supported as wildcard names; see below) that matches any interface applied so that you can
maintain order of operations with other input interface rules. For example, with the following rules:
If you modify the rules like this, they are performed in order:
When using rules that do a mangle and a filter lookup for a packet, Cumulus Linux processes them
in parallel and combines the action.
If a switch port is assigned to a bond, any egress rules must be assigned to the bond.
When using the OUTPUT chain, rules must be assigned to the source. For example, if a rule is
assigned to the switch port in the direction of traffic but the source is a bridge (VLAN), the traffic is
not affected by the rule and must be applied to the bridge.
If all transit traffic needs to have a rule applied, use the FORWARD chain, not the OUTPUT chain.
ebtables rules are put into either the IPv4 or IPv6 memory space depending on whether the rule
utilizes IPv4 or IPv6 to make a decision. Layer 2-only rules that match the MAC address are put into
the IPv4 memory space.
On Broadcom switches, the ingress INPUT chain rules match layer 2 and layer 3 multicast packets
before multicast packet replication has occurred; therefore, a DROP rule affects all copies.
If you set an output flag with the INPUT chain, you see an error. For example, running
cl-acltool -i on the following rule:
To increase the number of ACL rules that can be configured, configure the switch to operate in nonatomic
mode.
150 09 January 2019
Cumulus Networks
Instead of reserving 50% of your TCAM space for atomic updates, incremental update uses the available
free space to write the new TCAM rules and swap over to the new rules after this is complete. Cumulus
Linux then deletes the old rules and frees up the original TCAM space. If there is insufficient free space to
complete this task, the original nonatomic update is performed, which interrupts traffic.
1. Updates are performed incrementally, one table at a time without stopping traffic.
2. Cumulus Linux checks if the rules in a table have changed since the last time they were installed; if a
table does not have any changes, it is not reinstalled.
3. If there are changes in a table, the new rules are populated in new groups or slices in hardware, then
that table is switched over to the new groups or slices.
4. Finally, old resources for that table are freed. This process is repeated for each of the tables listed
above.
5. If sufficient resources do not exist to hold both the new rule set and old rule set, the regular
nonatomic mode is attempted. This interrupts network traffic.
6. If the regular nonatomic update fails, Cumulus Linux reverts back to the previous rules.
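The decision sequence in the steps above can be summarized as a small function. This is a sketch under stated assumptions: the function name plan_update and its parameters are hypothetical, and free space is measured simply as a count of rule entries rather than hardware groups or slices.

```python
def plan_update(table_changed, free_entries, new_rule_count, nonatomic_install_ok=True):
    """Pick the update path for one table, following the steps listed above."""
    if not table_changed:
        return "skip"          # unchanged tables are not reinstalled
    if free_entries >= new_rule_count:
        return "incremental"   # write new rules, switch over, then free the old ones
    if nonatomic_install_ok:
        return "nonatomic"     # not enough room for both rule sets; traffic is interrupted
    return "revert"            # the nonatomic update failed; previous rules are restored


print(plan_update(True, free_entries=200, new_rule_count=100))
print(plan_update(True, free_entries=50, new_rule_count=100))
```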
1. Edit /etc/cumulus/switchd.conf.
2. Add the following line to the file:
acl.non_atomic_update_mode = TRUE
3. Restart switchd:
During regular non-incremental nonatomic updates, traffic is stopped first, then enabled after the
new configuration is written into the hardware completely.
Appears to work, and the rule appears when you run cl-acltool -L:
TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP icmp -- any any anywhere anywhere icmp echo-request
However, the rule is not synced to hardware when applied in this way and running cl-acltool -i or
reboot removes the rule without replacing it. To ensure all rules that can be in hardware are hardware
accelerated, place them in /etc/cumulus/acl/policy.conf and install them by running cl-acltool
-i.
An entry with multiple comma-separated output interfaces is split into one rule for each output
interface (listed after --out-interface below). This entry splits into two rules:
An entry with both input and output comma-separated interfaces is split into one rule for each
combination of input and output interface (listed after --in-interface and --out-interface
below). This entry splits into four rules:
An entry with multiple layer 4 port ranges is split into one rule for each range (listed after --dports
below). For example, this entry splits into two rules:
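To estimate how many rules an entry expands to, the splitting behavior described above can be sketched as a Cartesian product. The function name split_entry is hypothetical, the interface names and port ranges are illustrative, and multiplying port ranges together with interface lists is an assumption that extends the individual cases described above.

```python
from itertools import product


def split_entry(in_interfaces=None, out_interfaces=None, dports=None):
    """Return one (in_interface, out_interface, dport_range) tuple per resulting rule."""
    def items(value):
        # A missing field contributes a single "unspecified" slot
        return value.split(",") if value else [None]
    return list(product(items(in_interfaces), items(out_interfaces), items(dports)))


print(len(split_entry(out_interfaces="swp1,swp2")))                              # two output interfaces
print(len(split_entry(in_interfaces="swp1,swp2", out_interfaces="swp3,swp4")))   # every in/out combination
print(len(split_entry(dports="1050:1051,1055:1056")))                            # two port ranges
```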
[ebtables]
-A FORWARD -i br0.100 -p IPv4 --ip-protocol icmp -j DROP
-A FORWARD -o br0.100 -p IPv4 --ip-protocol icmp -j ACCEPT
[iptables]
-A FORWARD -i br0.100 -p icmp -j DROP
-A FORWARD --out-interface br0.100 -p icmp -j ACCEPT
-A FORWARD --in-interface br0.100 -j POLICE --set-mode pkt --set-rate 1 --set-burst 1 --set-class 0
[ebtables]
-A FORWARD -i br0 -p IPv4 --ip-protocol icmp -j DROP
-A FORWARD -o br0 -p IPv4 --ip-protocol icmp -j ACCEPT
[iptables]
-A FORWARD -i br0 -p icmp -j DROP
-A FORWARD --out-interface br0 -p icmp -j ACCEPT
-A FORWARD --in-interface br0 -j POLICE --set-mode pkt --set-rate 1 --set-burst 1 --set-class 0
You create this rule, called EXAMPLE1, using NCLU like this:
All options, such as the -j and -p, even FORWARD in the above rule, are added automatically when you
apply the rule to the control plane; NCLU figures it all out for you.
You can also set a priority value, which specifies the order in which the rules get executed and the order in
which they appear in the rules file. Lower numbers are executed first. To add a new rule in the middle, first
run net show config acl, which displays the priority numbers. Otherwise, new rules get appended to
the end of the list of rules in the nclu_acl.conf and 50_nclu_acl.rules files.
If you need to hand edit a rule, do not edit the 50_nclu_acl.rules file. Instead, edit the
nclu_acl.conf file.
After you add the rule, you need to apply it to an inbound or outbound interface using net add int acl.
The inbound interface in our example is swp1:
After you commit your changes, you can verify the rule you created with NCLU by running net show
configuration acl:
interface swp1
acl ipv4 EXAMPLE1 inbound
Or you can see all of the rules installed by running cat on the 50_nclu_acl.rules file:
For INPUT and FORWARD rules, apply the rule to a control plane interface using
net add control-plane:
This deletes all rules from the 50_nclu_acl.rules file with that name. It also deletes the interfaces
referenced in the nclu_acl.conf file.
To examine the current state of chains and list all installed rules, run:
TABLE filter :
Chain INPUT (policy ACCEPT 90 packets, 14456 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- swp+ any 240.0.0.0/5 anywhere
0 0 DROP all -- swp+ any loopback/8 anywhere
0 0 DROP all -- swp+ any base-address.mcast.net/8 anywhere
0 0 DROP all -- swp+ any 255.255.255.255 anywhere ...
To list installed rules using native iptables, ip6tables and ebtables, use the -L option with the
respective commands:
If the install fails, ACL rules in the kernel and hardware are rolled back to the previous state. Errors from
programming rules in the kernel or ASIC are reported appropriately.
By default:
ACL policy files are located in /etc/cumulus/acl/policy.d/.
All *.rules files in this directory are included in /etc/cumulus/acl/policy.conf.
All files included in this policy.conf file are installed when the switch boots up.
The policy.conf file expects rules files to have a .rules suffix as part of the file name.
[iptables]
-A INPUT --in-interface swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD --in-interface swp1 -p tcp --dport 80 -j ACCEPT
[ip6tables]
-A INPUT --in-interface swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD --in-interface swp1 -p tcp --dport 80 -j ACCEPT
[ebtables]
-A INPUT -p IPv4 -j ACCEPT
-A FORWARD -p IPv4 -j ACCEPT
You can use wildcards or variables to specify chain and interface lists to ease administration of rules.
Interface Wildcards
Currently, only swp+ and bond+ are supported as wildcard names. There might be kernel
restrictions in supporting more complex wildcards like swp1+.
INGRESS = swp+
INPUT_PORT_CHAIN = INPUT,FORWARD
[iptables]
-A $INPUT_PORT_CHAIN --in-interface $INGRESS -p tcp --dport 80 -j ACCEPT
[ip6tables]
-A $INPUT_PORT_CHAIN --in-interface $INGRESS -p tcp --dport 80 -j ACCEPT
[ebtables]
-A INPUT -p IPv4 -j ACCEPT
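The way variables and comma-separated chain lists expand into individual rules can be sketched as follows. This is an illustration of the idea only, not the cl-acltool implementation: the function name expand_rules is hypothetical, and the sketch assumes every rule line starts with -A followed by the chain list.

```python
import re


def expand_rules(text):
    """Substitute $VARIABLES and emit one rule per chain in a comma-separated list."""
    variables, section, rules = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]          # for example iptables, ip6tables or ebtables
            continue
        assignment = re.match(r"^(\w+)\s*=\s*(.+)$", line)
        if assignment:
            variables[assignment.group(1)] = assignment.group(2).strip('"')
            continue
        line = re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], line)
        flag, chains, *rest = line.split()
        for chain in chains.split(","):
            rules.append((section, " ".join([flag, chain, *rest])))
    return rules


RULES_FILE = """\
INGRESS = swp+
INPUT_PORT_CHAIN = INPUT,FORWARD
[iptables]
-A $INPUT_PORT_CHAIN --in-interface $INGRESS -p tcp --dport 80 -j ACCEPT
"""

for section, rule in expand_rules(RULES_FILE):
    print(section, rule)
```

The single variable-based line yields one INPUT rule and one FORWARD rule, matching the explicit form shown earlier in this section.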
You can write ACL rules for the system into multiple files under the default
/etc/cumulus/acl/policy.d/ directory. The ordering of rules during installation follows the sort
order of the files based on their file names.
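A quick way to predict the installation order is to sort the file names yourself. The sketch below assumes a plain lexicographic sort and the .rules suffix requirement; the function name install_order and the file names 99_local.rules, notes.txt, 2_b.rules, and 10_a.rules are hypothetical.

```python
def install_order(filenames):
    """Return the .rules files in the order a lexicographic sort would install them."""
    return sorted(name for name in filenames if name.endswith(".rules"))


files = ["01sample_datapath.rules", "99_local.rules", "00sample_mgmt.rules", "notes.txt"]
print(install_order(files))

# Lexicographic order compares character by character, so "10" sorts before "2".
print(install_order(["2_b.rules", "10_a.rules"]))
```

Zero-padded numeric prefixes such as 00 and 01 avoid the surprise shown in the second call.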
Use multiple files to stack rules. The example below shows two rules files separating rules for management
and datapath traffic:
cumulus@switch:~$ ls /etc/cumulus/acl/policy.d/
00sample_mgmt.rules 01sample_datapath.rules
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/00sample_mgmt.rules
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT
[iptables]
# protect the switch management
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 10.0.11.2 -d 10.0.12.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -d 10.0.16.8 -p udp -j DROP
[iptables]
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.5 -p icmp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.6 -d 192.0.2.4 -j DROP
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.2 -d 192.0.2.8 -j DROP
include /etc/cumulus/acl/policy.d/01_new.datapathacl
In the tables below, the default rules count toward the limits listed. The raw limits below assume only one
ingress and one egress table are present.
Ingress limit with default rules 256 (36 default) 256 (29 default) 768 (36 default) 768 (29 default)
Ingress limit with default rules 2048 (36 default) 3072 (29 default) 6144 (36 default) 6144 (29 default)
Ingress limit with default rules 512 (36 default) 768 (29 default) 1536 (36 default) 1536 (29 default)
768 (36 default) 384 (29 default) 1792 (36 default) 896 (29 default)
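Because the default rules count toward each limit, the room left for your own rules is the limit minus the default count. The following arithmetic sketch uses two pairs of figures from the first row above; the function name available_rules is hypothetical.

```python
def available_rules(limit_with_defaults, default_rules):
    """Rules left for custom ACLs once the default rules are counted."""
    return limit_with_defaults - default_rules


# Figures taken from the first row above: 256 (36 default) and 768 (29 default)
print(available_rules(256, 36))
print(available_rules(768, 29))
```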
To learn more about any of the options shown in the tables below, run iptables -h [name
of option]. The same help syntax works for options for ip6tables and ebtables.
The following example shows the help syntax for an ebtables target:
tricolorpolice option:
--set-color-mode STRING setting the mode in blind or aware
--set-cir INT setting committed information rate in kbits per
second
--set-cbs INT setting committed burst size in kbyte
--set-pir INT setting peak information rate in kbits per
second
--set-ebs INT setting excess burst size in kbyte
--set-conform-action-dscp INT setting dscp value if the
action is accept for conforming packets
--set-exceed-action-dscp INT setting dscp value if the action
is accept for exceeding packets
--set-violate-action STRING setting the action (accept/drop)
for violating packets
--set-violate-action-dscp INT setting dscp value if the
action is accept for violating packets
Supported chains for the filter table:
INPUT FORWARD OUTPUT
POLICE
TRICOLORPOLICE
SETCLASS
Extended Ulog
Targets
log
Unique to Cumulus Linux:
span
erspan
police
tricolorpolice
setclass
Caveats
Splitting rules across the ingress TCAM and the egress TCAM causes the ingress IPv6 part of the rule to
match packets going to all destinations, which can interfere with the regular expected linear rule match in a
sequence.
Examples
A higher rule can prevent a lower rule from being matched unexpectedly:
Rule 1: -A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD --out-interface vlan101 -p icmp6 -s 01::02 -j ACCEPT
Rule 1 matches all icmp6 packets to all out interfaces in the ingress TCAM.
This prevents rule 2 from getting matched, which is more specific but with a different out interface.
Make sure to put more specific matches above more general matches even if the output interfaces are
different.
When you have two rules with the same output interface, the lower rule might match unexpectedly
depending on the presence of the previous rules.
Rule 1: -A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD --out-interface vlan101 -s 00::01 -j DROP
Rule 3: -A FORWARD --out-interface vlan101 -p icmp6 -j ACCEPT
Rule 3 still matches for an icmp6 packet with source IP 00::01 going out of vlan101. Rule 1 interferes
with the normal function of rule 2 and/or rule 3.
Examples
When you have two adjacent rules with the same match and different output interfaces, such as:
Rule 1: -A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD --out-interface vlan101 -p icmp6 -j DROP
Rule 2 will never be matched on ingress. Both rules share the same mark.
Common Examples
Counters on POLICE ACL rules in iptables do not currently show the packets that are dropped
due to those rules.
Use the POLICE target with iptables. POLICE takes these arguments:
--set-class value sets the system internal class of service queue configuration to value.
--set-rate value specifies the maximum rate in kilobytes (KB) or packets.
--set-burst value specifies the number of packets or kilobytes (KB) allowed to arrive
sequentially.
--set-mode string sets the mode in KB (kilobytes) or pkt (packets) for rate and burst size.
For example, to rate limit the incoming traffic on swp1 to 400 packets per second with a burst of 100
packets per second and set the class of the queue for the policed traffic as 0, set this rule in your
appropriate .rules file:
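A rule of this shape can be assembled from the POLICE arguments listed above. The sketch below is illustrative only: the function name police_rule is hypothetical, and the choice of the INPUT chain is an assumption (use the chain that fits your traffic).

```python
def police_rule(chain, in_interface, mode, rate, burst, traffic_class):
    """Build a POLICE rule line for a .rules file from its arguments."""
    if mode not in ("pkt", "KB"):
        raise ValueError("mode must be 'pkt' (packets) or 'KB' (kilobytes)")
    return (f"-A {chain} --in-interface {in_interface} -j POLICE "
            f"--set-mode {mode} --set-rate {rate} --set-burst {burst} "
            f"--set-class {traffic_class}")


# 400 packets per second, burst of 100 packets, class 0, on swp1 (chain is an assumption)
print(police_rule("INPUT", "swp1", "pkt", 400, 100, 0))
```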
Here is another example of control plane ACL rules to lock down the switch. You specify them in /etc
/cumulus/acl/policy.d/00control_plane.rules:
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT
INNFWD_CHAIN = INPUT,FORWARD
MARTIAN_SOURCES_4 = "240.0.0.0/5,127.0.0.0/8,224.0.0.0/8,255.255.255.255/32"
MARTIAN_SOURCES_6 = "ff00::/8,::/128,::ffff:0.0.0.0/96,::1/128"
# Custom Policy Section
SSH_SOURCES_4 = "192.168.0.0/24"
NTP_SERVERS_4 = "192.168.0.1/32,192.168.0.4/32"
DNS_SERVERS_4 = "192.168.0.1/32,192.168.0.4/32"
SNMP_SERVERS_4 = "192.168.0.1/32"
[iptables]
-A $INNFWD_CHAIN --in-interface $INGRESS_INTF -s $MARTIAN_SOURCES_4 -j DROP
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p ospf -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bgp -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --sport bgp -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p icmp -j POLICE --set-mode pkt --set-rate 100 --set-burst 40 --set-class 2
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --dport bootps:bootpc -j POLICE --set-mode pkt --set-rate 100 --set-burst 100 --set-class 2
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bootps:bootpc -j POLICE --set-mode pkt --set-rate 100 --set-burst 100 --set-class 2
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p igmp -j POLICE --set-mode pkt --set-rate 300 --set-burst 100 --set-class 6
# Custom policy
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport 22 -s $SSH_SOURCES_4 -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --sport 123 -s $NTP_SERVERS_4 -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --sport 53 -s $DNS_SERVERS_4 -j ACCEPT
[iptables]
[iptables]
#Match and count the packets that match SSH traffic with DSCP EF
-A FORWARD -p tcp --dport 22 -m dscp --dscp 46 -j ACCEPT
#Match and count the packets in a port range with DSCP AF41
-A FORWARD -p tcp -s 10.0.0.17/32 --sport 10000:20000 -d 10.0.100.27/32 --dport 10000:20000 -m dscp --dscp 34 -j ACCEPT
Note: Policing counters do not increment on switches with the Spectrum ASIC.
# Send 100 packets with a small payload on host1 with a DSCP value of
AF13 with a destination of host2:
INGRESS_INTF = swp20,swp21
[iptables]
-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --syn -j DROP
[ip6tables]
-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --syn -j DROP
The --syn flag in the above rule matches packets with the SYN bit set and the ACK, RST, and FIN bits
cleared. It is equivalent to using --tcp-flags SYN,RST,ACK,FIN SYN. For example, you can write the
above rule as:
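The equivalence can be checked with a small model of the mask-and-compare test behind --tcp-flags. This is a sketch, not Netfilter code: the function names to_bits, tcp_flags_match, and expand_syn are hypothetical, while the bit values are the standard TCP header flag bits.

```python
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def to_bits(names):
    """Convert a comma-separated flag list such as 'SYN,ACK' into a bit mask."""
    bits = 0
    for name in filter(None, names.split(",")):
        bits |= TCP_FLAGS[name]
    return bits


def tcp_flags_match(packet_flags, mask, comp):
    """True when, among the flags named in mask, exactly those in comp are set."""
    return (to_bits(packet_flags) & to_bits(mask)) == to_bits(comp)


def expand_syn(rule):
    """Rewrite --syn into its explicit --tcp-flags form."""
    return rule.replace("--syn", "--tcp-flags SYN,RST,ACK,FIN SYN")


print(expand_syn("-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --syn -j DROP"))
print(tcp_flags_match("SYN", "SYN,RST,ACK,FIN", "SYN"))      # initial SYN: matches
print(tcp_flags_match("SYN,ACK", "SYN,RST,ACK,FIN", "SYN"))  # SYN-ACK reply: does not match
```

Flags outside the mask, such as PSH or URG, are not examined, so they do not affect the result.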
Example Scenario
The following example scenario demonstrates how several different rules are applied.
Following are the configurations for the two switches used in these examples. The configuration for each
switch appears in /etc/network/interfaces on that switch.
Switch 1 Configuration
...
/etc/network/interfaces
=======================
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto bond2
iface bond2
bond-slaves swp3 swp4
auto br-untagged
iface br-untagged
address 10.0.0.1/24
bridge_ports swp1 bond2
bridge_stp on
auto br-tag100
iface br-tag100
address 10.0.100.1/24
bridge_ports swp2.100 bond2.100
bridge_stp on
...
Switch 2 Configuration
...
/etc/network/interfaces
=======================
auto swp3
iface swp3
auto swp4
iface swp4
auto br-untagged
iface br-untagged
address 10.0.0.2/24
bridge_ports bond2
bridge_stp on
auto br-tag100
iface br-tag100
address 10.0.100.2/24
bridge_ports bond2.100
bridge_stp on
auto bond2
iface bond2
bond-slaves swp3 swp4
...
Egress Rule
The following rule blocks any TCP traffic with destination port 200 going from host1 or host2 through the
switch (corresponding to rule 1 in the diagram above).
Ingress Rule
The following rule blocks any UDP traffic with source port 200 going from host1 through the switch
(corresponding to rule 2 in the diagram above).
Input Rule
The following rule blocks any UDP traffic with source port 200 and destination port 50 going from host1 to
the switch (corresponding to rule 3 in the diagram above).
Output Rule
The following rule blocks any TCP traffic with source port 123 and destination port 123 going from Switch 1
to host2 (corresponding to rule 4 in the diagram above).
Combined Rules
The following rule blocks any TCP traffic with source port 123 and destination port 123 going from any
switch port egress or generated from Switch 1 to host1 or host2 (corresponding to rules 1 and 4 in the
diagram above).
[iptables]
-A FORWARD -o swp+ -p tcp --sport 123 --dport 123 -j DROP
-A OUTPUT -o swp+ -p tcp --sport 123 --dport 123 -j DROP
Useful Links
www.netfilter.org
Netfilter.org packet filtering how-to
acl.non_atomic_update_mode = TRUE
Running cl-acltool -i (the installation command) resets all rules and deletes anything that is
not stored in /etc/cumulus/acl/policy.conf.
For example, running the following command works:
However, running cl-acltool -i or reboot removes them. To ensure all rules that can be in
hardware are hardware accelerated, place them in the /etc/cumulus/acl/policy.conf file,
then run cl-acltool -i.
[iptables]
-A $INGRESS_CHAIN -p udp --dport $BFD_ECHO_PORT -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000
-A $INGRESS_CHAIN -p udp --dport $BFD_PORT -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000
-A $INGRESS_CHAIN -p udp --dport $BFD_MH_PORT -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000
[ip6tables]
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --dport $BFD_ECHO_PORT -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --dport $BFD_PORT -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
To work around this limitation, set the rate and burst of all 6 of these rules to the same values, using the
--set-rate and --set-burst options.
TABLE mangle :
Chain PREROUTING (policy ACCEPT 7 packets, 718 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
TABLE raw :
Chain PREROUTING (policy ACCEPT 7 packets, 718 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
-------------------------------
Listing rules of type ebtables:
-------------------------------
TABLE filter :
Bridge table: filter
Bridge chain: INPUT, entries: 16, policy: ACCEPT
-d BGA -i swp+ -j setclass --class 7 , pcnt = 0 -- bcnt = 0
-d BGA -j police --set-mode pkt --set-rate 2000 --set-burst 2000 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:2 -i swp+ -j setclass --class 7 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:2 -j police --set-mode pkt --set-rate 2000 --set-burst 2000 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:e -i swp+ -j setclass --class 6 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:e -j police --set-mode pkt --set-rate 200 --set-burst 200 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cc -i swp+ -j setclass --class 6 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cc -j police --set-mode pkt --set-rate 200 --set-burst 200 , pcnt = 0 -- bcnt = 0
-p ARP -i swp+ -j setclass --class 2 , pcnt = 0 -- bcnt = 0
-p ARP -j police --set-mode pkt --set-rate 400 --set-burst 100 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cd -i swp+ -j setclass --class 7 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cd -j police --set-mode pkt --set-rate 2000 --set-burst 2000 , pcnt = 0 -- bcnt = 0
-p IPv4 -i swp+ -j ACCEPT , pcnt = 0 -- bcnt = 0
-p IPv6 -i swp+ -j ACCEPT , pcnt = 0 -- bcnt = 0
-i swp+ -j setclass --class 0 , pcnt = 0 -- bcnt = 0
-j police --set-mode pkt --set-rate 100 --set-burst 100 , pcnt = 0 -- bcnt = 0
IP Tables
Set class is internal to the switch - it does not set any precedence bits.
IPv6 Tables
Set class is internal to the switch - it does not set any precedence bits.
EB Tables
Set class is internal to the switch. It does not set any precedence bits.
This feature is specific to switches on the Broadcom platform only; on Mellanox Spectrum
switches, the input port ACL does not have these issues when learning MAC addresses.
Create a configuration similar to the following, where you associate a port and VLAN with a given MAC
address, adding each one to the bridge:
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3
bridge-pvid 1
bridge-vids 100 200 300
bridge-vlan-aware yes
pre-up bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
pre-up bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
pre-up bridge fdb add 00:00:00:00:00:33 dev swp3 master static vlan 300
If you need to list many MAC addresses, you can run a script to create the same configuration. For example,
create a script called macs.txt and put in the bridge fdb add commands for each MAC address you
need to configure:
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto swp5
iface swp5
auto swp6
iface swp6
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4 swp5 swp6
bridge-pvid 1
bridge-vids 100 200 300
bridge-vlan-aware yes
pre-up bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
pre-up bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
pre-up bridge fdb add 00:00:00:00:00:33 dev swp3 master static vlan 300
pre-up bridge fdb add 00:00:00:00:00:44 dev swp4 master static vlan 400
pre-up bridge fdb add 00:00:00:00:00:55 dev swp5 master static vlan 500
pre-up bridge fdb add 00:00:00:00:00:66 dev swp6 master static vlan 600
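When the list of MAC addresses is long, the pre-up lines can be generated rather than typed by hand. The following is a minimal Python sketch under that assumption; the function name and the input tuples are hypothetical, and the output still has to be pasted into the bridge stanza manually.

```python
def fdb_pre_up_lines(entries):
    """Build one 'pre-up bridge fdb add' line per (mac, port, vlan) tuple."""
    return [
        f"pre-up bridge fdb add {mac} dev {port} master static vlan {vlan}"
        for mac, port, vlan in entries
    ]


# Illustrative input that mirrors the first three entries shown above.
entries = [
    ("00:00:00:00:00:11", "swp1", 100),
    ("00:00:00:00:00:22", "swp2", 200),
    ("00:00:00:00:00:33", "swp3", 300),
]
for line in fdb_pre_up_lines(entries):
    print(line)
```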
Contents
This topic describes:
systemd and systemctl Command
systemctl Subcommands
Ensure a Service Starts after Multiple Restarts
Keep systemd Services from Hanging after Starting
Identify Active Listener Ports for IPv4 and IPv6
Identify Daemons Currently Active or Stopped
Identify Essential Services
Unlike the service command in Debian Wheezy, the service name is written after the
systemctl subcommand, not before it.
systemctl Subcommands
systemctl has a number of subcommands that perform a specific operation on a given daemon.
status: Returns the status of the specified daemon.
start: Starts the daemon.
stop: Stops the daemon.
To clear this error, run systemctl reset-failed switchd.service. If you know you are going to
restart frequently (multiple times within the StartLimitInterval), you can run the same command before you
issue the restart request. This also applies to stop followed by start.
cumulus@switch:~$ cl-service-summary
Service cron enabled active
Service ssh enabled active
Service syslog enabled active
Service neighmgrd enabled active
Service clagd enabled active
Service lldpd enabled active
Service mstpd enabled active
Service poed inactive
Service portwd inactive
Service ptmd enabled active
Service pwmd enabled active
Service smond enabled active
Service switchd enabled active
Service vxrd disabled inactive
Service vxsnd disabled inactive
Service bgpd disabled inactive
Service isisd disabled inactive
Service ospf6d disabled inactive
Service ospfd disabled inactive
Service rdnbrd disabled inactive
Service ripd disabled inactive
Service ripngd disabled inactive
Service zebra disabled inactive
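The summary is easy to post-process, for example to list every service that is not active. The following is a minimal Python sketch that parses text in the format shown above; the function name is hypothetical, and the handling of rows without an enabled/disabled column (such as poed) is an assumption based on this sample output only.

```python
def parse_service_summary(text):
    """Map service name -> (enablement or None, state) from summary text."""
    services = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "Service":
            continue  # skip prompts and blank lines
        name, state = fields[1], fields[-1]
        # Some rows carry only a state and no enabled/disabled column.
        enablement = fields[2] if len(fields) == 4 else None
        services[name] = (enablement, state)
    return services


# A few rows copied from the sample output above.
sample = """\
Service cron enabled active
Service poed inactive
Service switchd enabled active
Service vxrd disabled inactive
"""
summary = parse_service_summary(sample)
inactive = sorted(name for name, (_, state) in summary.items() if state == "inactive")
print(inactive)
```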
You can also run the systemctl list-unit-files --type service command to list all services on
the switch and see which ones are enabled:
cryptdisks-early.service masked
cryptdisks.service masked
cumulus-aclcheck.service static
cumulus-core.service static
cumulus-fastfailover.service enabled
cumulus-firstboot.service disabled
cumulus-platform.service enabled
cumulus-support.service static
dbus-org.freedesktop.hostname1.service static
dbus-org.freedesktop.locale1.service static
dbus-org.freedesktop.login1.service static
dbus-org.freedesktop.machine1.service static
dbus-org.freedesktop.timedate1.service static
dbus.service static
debian-fixup.service static
debug-shell.service disabled
decode-syseeprom.service static
dhcpd.service disabled
dhcpd6.service disabled
[email protected] disabled
[email protected] disabled
dhcrelay.service enabled
dhcrelay6.service disabled
[email protected] disabled
[email protected] disabled
dm-event.service disabled
dns-watcher.service disabled
dnsmasq.service enabled
emergency.service static
fuse.service masked
getty-static.service static
[email protected] enabled
halt-local.service static
halt.service masked
[email protected] static
hostname.service masked
hsflowd.service enabled
[email protected] enabled
hwclock-save.service enabled
hwclock.service masked
hwclockfirst.service masked
[email protected] static
initrd-cleanup.service static
initrd-parse-etc.service static
initrd-switch-root.service static
initrd-udevadm-cleanup-db.service static
killprocs.service masked
kmod-static-nodes.service static
kmod.service static
ledmgrd.service enabled
lldpd.service enabled
lm-sensors.service enabled
lvm2-activation-early.service enabled
lvm2-activation.service enabled
lvm2-lvmetad.service static
lvm2-monitor.service enabled
[email protected] static
lvm2.service disabled
module-init-tools.service static
motd.service masked
mountall-bootclean.service masked
mountall.service masked
mountdevsubfs.service masked
mountkernfs.service masked
mountnfs-bootclean.service masked
mountnfs.service masked
mstpd.service enabled
netd.service enabled
netq-agent.service disabled
networking.service enabled
ntp.service enabled
[email protected] disabled
openvswitch-vtep.service disabled
phy-ucode-update.service enabled
portwd.service enabled
procps.service static
ptmd.service enabled
pwmd.service enabled
frr.service enabled
quotaon.service static
rc-local.service static
rc.local.service static
rdnbrd.service disabled
reboot.service masked
rescue.service static
rmnologin.service masked
rsyslog.service enabled
screen-cleanup.service masked
sendsigs.service masked
[email protected] disabled
single.service masked
smond.service enabled
snmpd.service disabled
[email protected] disabled
snmptrapd.service disabled
[email protected] disabled
ssh.service enabled
[email protected] disabled
sshd.service enabled
stop-bootlogd-single.service masked
stop-bootlogd.service masked
stopssh.service enabled
sudo.service disabled
switchd-diag.service static
switchd.service enabled
syslog.service enabled
sysmonitor.service static
systemd-ask-password-console.service static
systemd-ask-password-wall.service static
[email protected] static
systemd-binfmt.service static
systemd-fsck-root.service static
[email protected] static
systemd-halt.service static
systemd-hibernate.service static
systemd-hostnamed.service static
systemd-hybrid-sleep.service static
systemd-initctl.service static
systemd-journal-flush.service static
systemd-journald.service static
systemd-kexec.service static
systemd-localed.service static
systemd-logind.service static
systemd-machined.service static
systemd-modules-load.service static
systemd-networkd-wait-online.service disabled
systemd-networkd.service disabled
[email protected] disabled
systemd-poweroff.service static
systemd-quotacheck.service static
systemd-random-seed.service static
systemd-readahead-collect.service disabled
systemd-readahead-done.service static
systemd-readahead-drop.service disabled
systemd-readahead-replay.service disabled
systemd-reboot.service static
systemd-remount-fs.service static
systemd-resolved.service disabled
[email protected] static
systemd-setup-dgram-qlen.service static
systemd-shutdownd.service static
systemd-suspend.service static
systemd-sysctl.service static
systemd-timedated.service static
systemd-timesyncd.service disabled
systemd-tmpfiles-clean.service static
systemd-tmpfiles-setup-dev.service static
systemd-tmpfiles-setup.service static
systemd-udev-settle.service static
systemd-udev-trigger.service static
systemd-udevd.service static
systemd-update-utmp-runlevel.service static
systemd-update-utmp.service static
systemd-user-sessions.service static
udev-finish.service static
udev.service static
umountfs.service masked
umountnfs.service masked
umountroot.service masked
update-ports.service enabled
urandom.service static
[email protected] static
uuidd.service static
vboxadd-service.service enabled
vboxadd-x11.service enabled
vboxadd.service enabled
vxrd.service disabled
vxsnd.service disabled
wd_keepalive.service enabled
x11-common.service masked
ztp-init.service enabled
ztp.service disabled
191 unit files listed.
networking.service
switchd.service
wd_keepalive.service
network-pre.target
bootlog.service
systemd-readahead-done.service
systemd-readahead-done.timer
systemd-update-utmp-runlevel.service
graphical.target
systemd-update-utmp-runlevel.service
Configuring switchd
switchd is the daemon at the heart of Cumulus Linux. It communicates between the switch hardware and
Cumulus Linux, and all the applications running on Cumulus Linux.
The switchd configuration is stored in /etc/cumulus/switchd.conf.
Contents
This topic describes:
The switchd File System
Configure switchd Parameters
Restart switchd
| | | |-- multicast
| | | `-- unknown_unicast
| |-- logging
| |-- route
| | |-- host_max_percent
| | `-- table
| `-- stats
| `-- poll_interval
|-- ctrl
| |-- acl
| |-- hal
| | `-- resync
| |-- logger
| |-- netlink
| | `-- resync
| |-- resync
| `-- sample
| `-- ulog_channel
|-- run
| `-- route_info
| |-- ecmp_nh
| | |-- count
| | |-- max
| | `-- max_per_route
| |-- host
| | |-- count
| | |-- count_v4
| | |-- count_v6
| | `-- max
| |-- mac
| | |-- count
| | `-- max
| `-- route
| |-- count_0
| |-- count_1
| |-- count_total
| |-- count_v4
| |-- count_v6
| |-- mask_limit
| |-- max_0
| |-- max_1
| `-- max_total
`-- version
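Because the tree above is exposed as files, its counters can be read with ordinary file I/O. The following is a minimal Python sketch that computes table usage from the count and max entries under run/route_info. It assumes that each entry holds a single integer, and it takes the root of the tree as a parameter because the mount point is not shown in this excerpt; the function names are hypothetical. The demo runs against a throwaway directory so that it works off-switch.

```python
import tempfile
from pathlib import Path


def read_int(root, relative_path):
    """Read one integer from a file below the given root directory."""
    return int((Path(root) / relative_path).read_text().strip())


def table_usage(root, table):
    """Return (count, max, percent used) for run/route_info/<table>."""
    base = Path("run") / "route_info" / table
    count = read_int(root, base / "count")
    limit = read_int(root, base / "max")
    percent = 100.0 * count / limit if limit else 0.0
    return count, limit, percent


# Demo: build a fake 'mac' table in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as demo_root:
    mac_dir = Path(demo_root) / "run" / "route_info" / "mac"
    mac_dir.mkdir(parents=True)
    (mac_dir / "count").write_text("25\n")
    (mac_dir / "max").write_text("100\n")
    demo_usage = table_usage(demo_root, "mac")

print(demo_usage)
```

This applies to the entries that have both count and max (ecmp_nh, host and mac in the tree above); the route entry uses count_total and max_total instead.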
To modify the configuration, run cl-cfg -w. For example, to set the buffer utilization measurement
interval to 1 minute, run:
You can get some of this information by running cl-resource-query, although you cannot
update the switchd configuration with it.
Restart switchd
Whenever you modify any switchd hardware configuration file (typically changing any *.conf file that
requires making a change to the switching hardware, like /etc/cumulus/datapath/traffic.conf),
you must restart switchd for the change to take effect:
You do not have to restart the switchd service when you update a network interface
configuration (that is, edit /etc/network/interfaces).
Restarting switchd causes all network ports to reset in addition to resetting the switch hardware
configuration.
Contents
This topic describes:
PoE Basics
Configure PoE
Troubleshooting
Verify the Link Is Up
View LLDP Information Using lldpcli
View LLDP Information Using tcpdump
Log poed Events in syslog
PoE Basics
PoE functionality is provided by the cumulus-poe package. When a powered device is connected to the
switch via an Ethernet cable:
If the available power is greater than the power required by the connected device, power is supplied
to the switch port, and the device powers on
If available power is less than the power required by the connected device and the switch port's
priority is less than the port priority set on all powered ports, power is not supplied to the port
If available power is less than the power required by the connected device and the switch port's
priority is greater than the priority of a currently powered port, power is removed from lower
priority port(s) and power is supplied to the port
If the total consumed power exceeds the configured power limit of the power source, low priority
ports are turned off. In the case of a tie, the port with the lower port number gets priority
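The rules above can be sketched as a small decision function. This is an illustrative Python model only, not how poed is implemented: the function name and data layout are hypothetical, the case where available power exactly equals required power is assumed to grant power, and the sketch assumes that no port is shed unless shedding frees enough power.

```python
# Higher rank wins; the three priority names come from the PoE configuration.
RANK = {"low": 0, "high": 1, "critical": 2}


def admit(priority, required, available, powered):
    """Decide whether a newly connected device gets power.

    powered maps port number -> (priority, watts) for ports already powered.
    Returns (granted, ports_to_turn_off).
    """
    if required <= available:
        return True, []
    # Only ports with a strictly lower priority may lose power. Lowest
    # priority goes first; on a tie the higher port number goes first,
    # because the lower port number gets priority.
    victims = sorted(
        (p for p, (prio, _) in powered.items() if RANK[prio] < RANK[priority]),
        key=lambda p: (RANK[powered[p][0]], -p),
    )
    shed, freed = [], available
    for victim in victims:
        shed.append(victim)
        freed += powered[victim][1]
        if freed >= required:
            return True, shed
    return False, []


powered = {1: ("low", 15.0), 2: ("low", 15.0), 3: ("high", 30.0)}
print(admit("critical", 20.0, 10.0, powered))
```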
Power is available as follows:
920W x 750W
x 920W 750W
The AS4610-54P has an LED on the front panel to indicate PoE status:
Link state and PoE state are completely independent of each other. When a link is brought down
on a particular port using ip link set <port> down, power on that port is not turned off;
however, LLDP negotiation is not possible.
Configure PoE
You use the poectl command utility to configure PoE on a switch that supports the feature. You can:
Enable or disable PoE for a given switch port
Set a switch port's PoE priority to one of three values: low, high or critical
The PoE configuration resides in /etc/cumulus/poe.conf. The file lists all the switch ports, whether PoE
is enabled for those ports and the priority for each port.
Sample poe.conf file ...
[enable]
swp1 = enable
swp2 = enable
swp3 = enable
swp4 = enable
swp5 = enable
swp6 = enable
swp7 = enable
swp8 = enable
swp9 = enable
swp10 = enable
swp11 = enable
swp12 = enable
swp13 = enable
swp14 = enable
swp15 = enable
swp16 = enable
swp17 = enable
swp18 = enable
swp19 = enable
swp20 = enable
swp21 = enable
swp22 = enable
swp23 = enable
swp24 = enable
swp25 = enable
swp26 = enable
swp27 = enable
swp28 = enable
swp29 = enable
swp30 = enable
swp31 = enable
swp32 = enable
swp33 = enable
swp34 = enable
swp35 = enable
swp36 = enable
swp37 = enable
swp38 = enable
swp39 = enable
swp40 = enable
swp41 = enable
swp42 = enable
swp43 = enable
swp44 = enable
swp45 = enable
swp46 = enable
swp47 = enable
swp48 = enable
[priority]
swp1 = low
swp2 = low
swp3 = low
swp4 = low
swp5 = low
swp6 = low
swp7 = low
swp8 = low
swp9 = low
swp10 = low
swp11 = low
swp12 = low
swp13 = low
swp14 = low
swp15 = low
swp16 = low
swp17 = low
swp18 = low
swp19 = low
swp20 = low
swp21 = low
swp22 = low
swp23 = low
swp24 = low
swp25 = low
swp26 = low
swp27 = low
swp28 = low
swp29 = low
swp30 = low
swp31 = low
swp32 = low
swp33 = low
swp34 = low
swp35 = low
swp36 = low
swp37 = low
swp38 = low
swp39 = low
swp40 = low
swp41 = low
swp42 = low
swp43 = low
swp44 = low
swp45 = low
swp46 = low
swp47 = low
swp48 = low
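Since poe.conf uses an INI-style layout with [enable] and [priority] sections, it can be inspected with a standard INI parser. The following is a minimal Python sketch using configparser; the function name is hypothetical, the sample text is shortened, and the critical value for swp2 is made up for illustration.

```python
import configparser


def load_poe_config(text):
    """Return {port: {"enabled": bool, "priority": str}} from poe.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    ports = {}
    for port, state in parser["enable"].items():
        ports[port] = {
            "enabled": state == "enable",
            # Ports without an explicit entry fall back to the default, low.
            "priority": parser.get("priority", port, fallback="low"),
        }
    return ports


sample = """\
[enable]
swp1 = enable
swp2 = enable

[priority]
swp1 = low
swp2 = critical
"""
poe_ports = load_poe_config(sample)
print(poe_ports["swp2"])
```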
By default, PoE and PoE+ are enabled on all Ethernet/1G switch ports, and these ports are set with a low
priority. Switch ports can have low, high or critical priority.
There is no additional configuration for PoE+.
To change the priority for one or more switch ports, run poectl -p swp# [low|high|critical]. For
example:
To display PoE information for a set of switch ports, run poectl -i [port_numbers]:
cumulus@switch:~$ poectl -s
System power:
Total: 730.0 W
Used: 11.0 W
Available: 719.0 W
Connected ports:
swp11, swp24, swp27, swp48
The set commands (priority, enable, disable) either succeed silently or display an error message if the
command fails.
The poectl command takes the following arguments:
Argument                             Description
-i, --port-info PORT_LIST            Returns detailed information for the specified ports. You can specify a range of ports. For example: -i swp1-swp5,swp10
-a, --all                            Returns PoE status and detailed information for all ports.
-p, --priority PORT_LIST PRIORITY    Sets priority for the specified ports: low, high, critical.
-r, --reset PORT_LIST                Performs a hardware reset on the specified ports. Use this if one or more ports are stuck in an error state. This does not reset any configuration settings for the specified ports.
--save                               Saves the current configuration. The saved configuration is automatically loaded on system boot.
Troubleshooting
You can troubleshoot PoE and PoE+ using the following utilities and files:
poectl -s, as described above.
The Cumulus Linux cl-support script, which includes PoE-related output from poed.conf,
syslog, poectl --diag-info and lldpctl.
lldpcli show neighbors ports <swp> protocol lldp hidden details
tcpdump -v -v -i <swp> ether proto 0x88cc
The contents of the PoE/PoE+ /etc/lldpd.d/poed.conf configuration file, as described above.
1. In a terminal, create a new file in the /etc/profile.d/ directory. In the code example below, the
file is called proxy.sh, and is created using the text editor nano.
2. Add a line to the file to configure either an HTTP or an HTTPS proxy, or both:
HTTP proxy:
http_proxy=http://myproxy.domain.com:8080
export http_proxy
HTTPS proxy:
https_proxy=https://myproxy.domain.com:8080
export https_proxy
3. Create a file in the /etc/apt/apt.conf.d directory and add the following lines to the file for
acquiring the HTTP and HTTPS proxies; the example below uses http_proxy as the file name:
4. Add the proxy addresses to /etc/wgetrc; you may have to uncomment the http_proxy and
https_proxy lines:
https_proxy = https://myproxy.domain.com:8080
http_proxy = http://myproxy.domain.com:8080
...
5. Run the source command to execute the file in the current environment:
The proxy is now configured. The echo command can be used to confirm a proxy is set up correctly:
HTTP proxy:
HTTPS proxy:
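The same check can be scripted. The following is a minimal Python sketch that reads the http_proxy and https_proxy variables and splits each into host and port; the function name is hypothetical, and the example environment reuses the placeholder proxy address from the steps above.

```python
import os
from urllib.parse import urlsplit


def proxy_settings(environ=os.environ):
    """Return {variable: (host, port)} for each proxy variable that is set."""
    found = {}
    for name in ("http_proxy", "https_proxy"):
        value = environ.get(name)
        if not value:
            continue  # variable missing or empty
        parts = urlsplit(value)
        found[name] = (parts.hostname, parts.port)
    return found


# Placeholder values matching the example proxy used in this section.
example_env = {
    "http_proxy": "http://myproxy.domain.com:8080",
    "https_proxy": "https://myproxy.domain.com:8080",
}
print(proxy_settings(example_env))
```

Calling proxy_settings() with no argument inspects the environment of the current process, so it only reflects the new settings after the profile script has been sourced.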
Related Information
Setting up an apt package cache
HTTP API
Cumulus Linux implements an HTTP application programming interface to the OpenStack ML2 driver (see page
1088) and NCLU (see page 88). Rather than accessing Cumulus Linux using SSH, you can interact with the
switch using an HTTP client, such as cURL, HTTPie or a web browser.
The HTTP API service is enabled by default on chassis hardware only. However, the associated
server is configured to only listen to traffic originating from within the chassis.
The service is not enabled by default on non-chassis hardware.
Contents
This topic describes:
HTTP API Basics
Configuration
Enable External Traffic on a Chassis
IP and Port Settings
Security
Authentication
Transport Layer Security
cURL Examples
If you are upgrading from a version of Cumulus Linux earlier than 3.4.0, the supporting software
for the API may not be installed. Install the required software with the following command.
To enable the HTTP API service, run the following systemd command:
Use the systemctl start and systemctl stop commands to start/stop the HTTP API service:
Configuration
There are two configuration files associated with the HTTP API services:
/etc/nginx/sites-available/nginx-restapi.conf
/etc/nginx/sites-available/nginx-restapi-chassis.conf
The first configuration file is used for non-chassis hardware; the second, for chassis hardware.
Generally, only the configuration file relevant to your hardware needs to be edited, as the associated
services determine the appropriate configuration file to use at run time.
If the configuration file is not valid, return to step 1; review any changes that were made, and correct
the errors.
5. Restart the daemons:
For more information on the listen directive, refer to the NGINX documentation.
Do not set the same listening port for internal and external chassis traffic.
Security
Authentication
The default configuration requires all HTTP requests from external sources (not internal switch traffic) to set
the HTTP Basic Authentication header.
The user and password should correspond to a user on the host switch.
Do not copy the cumulus.pem or cumulus.key files. After installation, edit the “ssl_certificate”
and “ssl_certificate_key” values in the configuration file for your hardware.
cURL Examples
This section contains several example cURL commands for sending HTTP requests to a non-chassis host.
The following settings are used for these examples:
Username: user
Password: pw
IP: 192.168.0.32
Port: 8080
Requests for NCLU require the Content-Type request header to be set to application/json.
cURL’s -k flag is necessary when the server uses a self-signed certificate. This is the default
configuration (see the Security section (see page 214)). To display the response headers, include the
-D flag in the command.
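For clients that do not use cURL, the same request can be assembled with any HTTP library. The following is a minimal Python sketch that only builds the request with the Basic Authentication and Content-Type headers described above and does not send it. The endpoint path /nclu/v1/rpc and the {"cmd": ...} payload shape are assumptions that do not appear in this section, and the function name is hypothetical.

```python
import base64
import json
import urllib.request


def build_nclu_request(host, port, user, password, command):
    """Build (but do not send) an HTTPS POST carrying one NCLU command."""
    url = f"https://{host}:{port}/nclu/v1/rpc"  # assumed endpoint path
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"cmd": command}).encode(),  # assumed payload shape
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Uses the example settings listed above: user/pw on 192.168.0.32:8080.
request = build_nclu_request("192.168.0.32", 8080, "user", "pw", "show interface")
print(request.get_method(), request.full_url)
```

Sending is left out on purpose: with the default self-signed certificate, a client also needs the equivalent of cURL's -k flag, which should be a deliberate choice.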
By default, ifupdown is quiet; use the verbose option -v when you want to know what is going
on when bringing an interface down or up.
Contents
This topic describes:
Basic Commands
ifupdown2 Interface Classes
Configure a Loopback Interface
ifupdown Behavior with Child Interfaces
ifupdown2 Interface Dependencies
Subinterfaces
ifup and Upper (Parent) Interfaces
Configure IP Addresses
Specify IP Address Scope
Purge Existing IP Addresses on an Interface
Specify User Commands
Source Interface File Snippets
Use Globs for Port Lists
Use Templates
Run ifupdown Scripts under /etc/network/ with ifupdown2
Add Descriptions to Interfaces
Caveats and Errata
ifupdown2 and sysctl
Long Interface Names
Related Information
Basic Commands
To bring up an interface or apply changes to an existing interface, run:
ifdown always deletes logical interfaces after bringing them down. Use the --admin-state
option if you only want to administratively bring the interface up or down.
To see the link and administrative state, use the ip link show command:
In this example, swp1 is administratively UP and the physical link is UP (LOWER_UP flag). More information
on interface administrative state and physical state can be found in this knowledge base article.
To put an interface into an admin down state, add link-down yes to its stanza. The interface then remains
down after any future reboots or after applying configuration changes with ifreload -a. For example:
auto swp1
iface swp1
link-down yes
auto swp1
iface swp1
You can add other classes using the allow prefix. For example, if you have multiple interfaces used for
uplinks, you can make up a class called uplinks:
auto swp1
allow-uplinks swp1
iface swp1 inet static
address 10.1.1.1/31
auto swp2
allow-uplinks swp2
iface swp2 inet static
address 10.1.1.3/31
This allows you to perform operations on only these interfaces using the --allow=uplinks option, or still
use the -a option since these interfaces are also in the auto class:
If you are using Management VRF (see page 859), you can use the special interface class called mgmt, and
put the management interface into that class.
The mgmt interface class is not supported if you are configuring Cumulus Linux using NCLU (see
page 88).
allow-mgmt eth0
iface eth0 inet dhcp
vrf mgmt
allow-mgmt mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
All ifupdown2 commands (ifup, ifdown, ifquery, ifreload) can take a class. Include the
--allow=<class> option when you run the command. For example, to reload the configuration for the
management interface described above, run:
You can easily bring up or down all interfaces marked with the common auto class in
/etc/network/interfaces. Use the -a option. For further details, see the individual man pages for
ifup(8), ifdown(8) and ifreload(8).
To reload all network interfaces marked auto, use the ifreload command, which is equivalent to running
ifdown then ifup, the one difference being that ifreload skips any configurations that didn't change:
Some syntax checks are done by default; however, it may be safer to apply the configuration only if the
syntax check passes, using the following compound command:
For more information on the bridge in traditional mode vs the bridge in VLAN-aware mode, please read this
knowledge base article.
auto bond1
iface bond1
address 100.0.0.2/16
bond-slaves swp29 swp30
auto bond2
iface bond2
address 100.0.0.5/16
bond-slaves swp31 swp32
auto br2001
iface br2001
address 12.0.1.3/24
bridge-ports bond1.2001 bond2.2001
bridge-stp on
Using ifup --with-depends br2001 brings up all dependents of br2001: bond1.2001, bond2.2001,
bond1, bond2, swp29, swp30, swp31, swp32.
Similarly, specifying ifdown --with-depends br2001 brings down all dependents of br2001:
bond1.2001, bond2.2001, bond1, bond2, swp29, swp30, swp31, swp32.
As mentioned earlier, ifdown always deletes logical interfaces after bringing them down. Use
the --admin-state option if you only want to administratively bring the interface up or down. In
terms of the above example, ifdown br2001 deletes br2001.
To guide you through which interfaces will be brought down and up, use the --print-dependency
option to get the list of dependents.
Use ifquery --print-dependency=list -a to get the dependency list of all interfaces:
bond2.2001 : ['bond2']
br2001 : ['bond1.2001', 'bond2.2001']
swp40 : None
swp25 : None
swp26 : None
swp29 : None
swp30 : None
swp31 : None
swp32 : None
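A dependency list of this shape can be walked to see which interfaces a --with-depends operation would touch. The following is a minimal Python sketch of a depth-first walk over such a mapping; it illustrates the idea and is not the algorithm ifupdown2 itself uses. The function name is hypothetical, and the bond1 and bond2 entries are taken from the bond-slaves lines in the example configuration.

```python
def with_depends(deps, iface):
    """List iface and everything it depends on, lower interfaces first."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for lower in deps.get(name) or []:  # None means no dependents
            visit(lower)
        order.append(name)

    visit(iface)
    return order


# Mapping in the same shape as the dependency list output.
deps = {
    "br2001": ["bond1.2001", "bond2.2001"],
    "bond1.2001": ["bond1"],
    "bond2.2001": ["bond2"],
    "bond1": ["swp29", "swp30"],
    "bond2": ["swp31", "swp32"],
    "swp29": None,
    "swp30": None,
    "swp31": None,
    "swp32": None,
}
print(with_depends(deps, "br2001"))
```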
You can use dot to render the graph on an external system where dot is installed.
Subinterfaces
On Linux, an interface is a network device, and can be either a physical device like a switch port (such as swp1),
or virtual, like a VLAN (vlan100). A VLAN subinterface is a VLAN device on an interface, and the VLAN ID is
appended to the parent interface using dot (.) VLAN notation. For example, a VLAN with ID 100 that is a
subinterface of swp1 is named swp1.100 in Cumulus Linux. The dot VLAN notation for a VLAN device name
is a standard way to specify a VLAN device on Linux. Many Linux configuration tools, most notably
ifupdown2 and its predecessor ifupdown, recognize such a name as a VLAN interface name.
A VLAN subinterface only receives traffic tagged (see page 420) for that VLAN, so swp1.100 only receives
packets tagged with VLAN 100 on switch port swp1. Similarly, any packet transmitted from swp1.100 is
tagged with VLAN 100.
For an MLAG (see page 427) deployment, the peerlink interface that connects the two switches in the MLAG
pair has a VLAN subinterface named 4094 by default, provided you configured the subinterface with NCLU
(see page 88). The peerlink.4094 subinterface only receives traffic tagged for VLAN 4094.
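The dot notation is simple to take apart in scripts that process interface names. The following is a minimal Python sketch with a hypothetical helper name; it treats everything after the last dot as the VLAN ID, and only when that part is numeric.

```python
def split_subinterface(name):
    """Split 'parent.vlan' into (parent, vlan_id); vlan_id is None otherwise."""
    parent, sep, vlan = name.rpartition(".")
    if not sep or not vlan.isdigit():
        return name, None  # not a dot-notation VLAN subinterface
    return parent, int(vlan)


print(split_subinterface("swp1.100"))
print(split_subinterface("peerlink.4094"))
print(split_subinterface("swp1"))
```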
auto br100
iface br100
bridge-ports bond1.100 bond2.100
auto bond1
iface bond1
bond-slaves swp1 swp2
If you run ifdown bond1, ifdown deletes bond1 and the VLAN interface on bond1 (bond1.100); it also
removes bond1 from the bridge br100. Next, when you run ifup bond1, it creates bond1 and the VLAN
interface on bond1 (bond1.100); it also executes ifup br100 to add the bond VLAN interface (bond1.100)
to the bridge br100.
As you can see above, implicitly bringing up the upper interface helps, but there can be cases where an
upper interface (like br100) is not in the right state, which can result in warnings. The warnings are mostly
harmless.
If you want to disable these warnings, you can disable the implicit upper interface handling by setting
skip_upperifaces=1 in /etc/network/ifupdown2/ifupdown2.conf.
With skip_upperifaces=1, you will have to explicitly execute ifup on the upper interfaces. In this case,
you will have to run ifup br100 after an ifup bond1 to add bond1 back to bridge br100.
Although specifying a subinterface like swp1.100 and then running ifup swp1.100 will also
result in the automatic creation of the swp1 interface in the kernel, Cumulus Networks
recommends you specify the parent interface swp1 as well. A parent interface is one where any
physical layer configuration can reside, such as link-speed 1000 or link-duplex full.
It's important to note that if you only create swp1.100 and not swp1, then you cannot run ifup
swp1 since you did not specify it.
Configure IP Addresses
IP addresses are configured with the net add interface command.
auto swp1
iface swp1
address 12.0.0.1/30
address 12.0.0.2/30
address 2001:DB8::1/126
You can specify both IPv4 and IPv6 addresses for the same interface.
For IPv6 addresses, you can create or modify the IP address for an interface using
either "::" or "0:0:0" notation. Both of the following examples are valid:
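As an illustration of why both notations are accepted, the two spellings denote the same address. The following is a minimal Python sketch using the standard ipaddress module; the expanded spelling of the sample address is written out here for illustration.

```python
import ipaddress

# The same interface address, written with "::" and with explicit zero groups.
compact = ipaddress.ip_interface("2001:DB8::1/126")
expanded = ipaddress.ip_interface("2001:DB8:0:0:0:0:0:1/126")

print(compact == expanded)
print(compact.ip.compressed)
print(compact.ip.exploded)
```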
The address method and address family are added by NCLU when needed, specifically
when you are creating DHCP or loopback interfaces.
auto lo
iface lo inet loopback
auto swp2
iface swp2
address 35.21.30.5/30
address 3101:21:20::31/80
scope link
When you run ifreload -a on this configuration, ifupdown2 considers all IP addresses as global.
These commands create the following code snippet in the /etc/network/interfaces file:
auto swp6
iface swp6
post-up ip address add 71.21.21.20/32 dev swp6 scope site
These commands create the following configuration snippet in the /etc/network/interfaces file:
auto swp1
iface swp1
address-purge no
Purging existing addresses on interfaces with multiple iface stanzas is not supported. Doing so
can result in the configuration of multiple addresses for an interface after you change an
interface address and reload the configuration with ifreload -a. If this happens, you must
shut down and restart the interface with ifdown and ifup, or manually delete superfluous
addresses with ip address delete specify.ip.address.here/mask dev DEVICE. See
also the Caveats and Errata section below for some cautions about using multiple
iface stanzas for the same interface.
auto swp1
iface swp1
address 12.0.0.1/30
Any valid command can be hooked in the sequencing of bringing an interface up or down, although
commands should be limited in scope to network-related commands associated with the particular
interface.
For example, it wouldn't make sense to install some Debian package on ifup of swp1, even though that is
technically possible. See man interfaces for more details.
If your post-up command also starts, restarts or reloads any systemd service, you must use the
--no-block option with systemctl. Otherwise, that service or even the switch itself may hang
after starting or restarting.
For example, to restart the dhcrelay service after bringing up VLAN 100, first run:
auto bridge
iface bridge
bridge-vids 100
bridge-vlan-aware yes
auto vlan100
iface vlan100
post-up systemctl --no-block restart dhcrelay.service
vlan-id 100
vlan-raw-device bridge
source /etc/network/interfaces.d/bond0
While you must use commas to separate different ranges of ports in the NCLU command, the /etc/network/interfaces file renders the list of ports individually, as in the example output below.
...
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4 swp6 swp10 swp11 swp12
bridge-vlan-aware yes
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto swp6
iface swp6
auto swp10
iface swp10
auto swp11
iface swp11
auto swp12
iface swp12
Use Templates
ifupdown2 supports Mako-style templates. The Mako template engine is run over the interfaces file
before parsing.
Use the template to declare cookie-cutter bridges in the interfaces file:
%for v in [11,12]:
auto vlan${v}
iface vlan${v}
address 10.20.${v}.3/24
bridge-ports glob swp19-20.${v}
bridge-stp on
%endfor
%for i in [1,12]:
auto swp${i}
iface swp${i}
address 10.20.${i}.3/24
%endfor
Regarding Mako syntax, use square brackets ([1,12]) to specify a list of individual numbers (in
this case, 1 and 12). Use range(1,12) to specify a range of interfaces.
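Because the loop expression in a Mako template is Python, the difference between the two forms can be checked in plain Python. This is an illustrative sketch only; the helper interface_names is hypothetical and is not part of Mako or ifupdown2. Note that range() excludes its upper bound, so range(1,12) stops at 11.

```python
def interface_names(numbers):
    """Return the swpN name for each number, mirroring `auto swp${i}`."""
    return ["swp{}".format(i) for i in numbers]

# Square brackets: a list of individual numbers (1 and 12 only).
listed = interface_names([1, 12])

# range(1, 12): consecutive numbers 1 through 11; the upper bound is excluded.
ranged = interface_names(range(1, 12))

# To include swp12 in a range, the upper bound must be 13.
inclusive = interface_names(range(1, 13))

print(listed)
print(ranged[0], ranged[-1], len(ranged))
print(inclusive[-1])
```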
You can test your template and confirm it evaluates correctly by running mako-render /etc/network/interfaces.
For more examples of configuring Mako templates, read this knowledge base article.
To comment out content in Mako templates, use double hash marks (##). For example:
## auto swp${i}
## iface swp${i}
## % endfor
##
auto swp1
iface swp1
alias hypervisor_port_1
Interface descriptions also appear in the SNMP (see page 986) OID IF-MIB::ifAlias.
To show the interface description (alias) for all interfaces on the switch, run the net show interface
alias command. For example:
To show the interface description for all interfaces on the switch in JSON format, run the net show
interface alias json command.
If you do specify multiple iface stanzas for the same interface, make sure the stanzas do not specify the
same interface attributes. Otherwise, unexpected behavior can result.
For example, swp1 is configured in two places:
source /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
address 10.0.14.2/24
As well as /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
link-speed 1000
link-duplex full
ifupdown2 correctly parses a configuration like this because the same attributes are not specified in
multiple iface stanzas.
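The rule that stanzas for the same interface must not repeat an attribute can be checked before running ifreload -a. The sketch below is a simplified assumption of how such a check might look, not how ifupdown2 parses the file: the function duplicated_attributes and the CONFLICT snippet are hypothetical, and the parser only understands the auto, iface, source and allow- keywords.

```python
from collections import defaultdict

def duplicated_attributes(*config_texts):
    """Return {interface: [attributes set in more than one iface stanza]}."""
    counts = defaultdict(lambda: defaultdict(int))  # interface -> attribute -> stanzas
    for text in config_texts:
        current = None      # interface whose stanza is being read
        in_stanza = set()   # attributes already counted for this stanza
        for raw in text.splitlines():
            words = raw.split("#", 1)[0].split()
            if not words:
                continue
            if words[0] == "iface" and len(words) > 1:
                current, in_stanza = words[1], set()
            elif words[0] in ("auto", "source") or words[0].startswith("allow-"):
                current = None
            elif current is not None and words[0] not in in_stanza:
                in_stanza.add(words[0])
                counts[current][words[0]] += 1
    result = {}
    for iface, attrs in counts.items():
        repeated = sorted(name for name, n in attrs.items() if n > 1)
        if repeated:
            result[iface] = repeated
    return result

# The two files from the example above.
MAIN = """source /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
    address 10.0.14.2/24
"""
SPEED = """auto swp1
iface swp1
    link-speed 1000
    link-duplex full
"""
# A made-up third stanza that repeats link-speed.
CONFLICT = """auto swp1
iface swp1
    link-speed 10000
"""

print(duplicated_attributes(MAIN, SPEED))
print(duplicated_attributes(MAIN, SPEED, CONFLICT))
```

The first call finds nothing to report; the second flags link-speed on swp1, which is the situation the text warns against.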
And, as stated in the note above, you cannot purge existing addresses on interfaces with multiple iface
stanzas.
Related Information
Debian - Network Configuration
Linux Foundation - Bonds
Linux Foundation - VLANs
You can only set MTU for logical interfaces. If you try to set auto-negotiation, duplex mode, or link
speed for a logical interface, an unsupported error is shown.
For Mellanox switches, MTU is the only port attribute that you can configure. The Mellanox firmware
configures everything else automatically, following a predefined list of parameter settings (speed, duplex,
autoneg, FEC), until the link comes up.
For Broadcom-based switches, Cumulus Networks recommends that you enable auto-negotiation on
each port. When enabled, Cumulus Linux automatically configures the best link parameter settings based
on the module type (speed, duplex, auto-negotiation, and forward error correction (FEC) where supported).
To understand the default configuration for the various port and cable types, see the table below (see page
244). If you need to troubleshoot further to bring the link up, follow the sections below to set the specific
link parameters.
Contents
This topic describes ...
Auto-negotiation (see page 235)
Port Speed and Duplex Mode (see page 236)
Auto-negotiation
To configure auto-negotiation for a Broadcom-based switch, set link-autoneg to on for all the switch
ports. For example, to enable auto-negotiation for swp1 through swp52:
Any time you enable auto-negotiation, Cumulus Linux restores the default configuration settings specified
in the table below (see page ).
By default on a Broadcom-based switch, auto-negotiation is disabled — except on 10G and 1000BASE-T
switch ports, where it is required for links to work. For RJ-45 SFP adapters, you need to manually configure
the settings as described in the default settings table below (see page ).
If you disable auto-negotiation later or never enable it, then you have to configure the duplex, FEC, and link
speed settings manually using NCLU (see page 88) (see the relevant sections below). The default speed if
you disable auto-negotiation depends on the type of connector used with the port. For example, a QSFP28
optic defaults to 100G, while a QSFP+ optic defaults to 40G and SFP+ defaults to 10G.
Keep auto-negotiation enabled at all times. If you do decide to disable it, be aware of the
following:
You must manually set the link speed, duplex, pause, and FEC.
Disabling auto-negotiation on a copper cable of any kind prevents the port from
optimizing the link through link training.
Disabling auto-negotiation on a 1G optical cable prevents detection of single fiber breaks.
You cannot disable auto-negotiation for 1GT or 10GT cables.
10/100/1000BASE-T RJ-45 SFP adapters do not work with auto-negotiation enabled. You must
manually configure these ports using the settings below (link-autoneg=off,
link-speed=1000|100|10, link-duplex=full|half).
Depending upon the connector used for a port, enabling auto-negotiation also enables forward error
correction (FEC), if the cable requires it (see the table below (see page )). FEC always adjusts for the
speed of the cable. However, you cannot disable FEC separately using NCLU (see page 88).
1G 100 Mb
40G 10G*
100G 50G & 40G (with or without breakout port), 25G*, 10G*
*Requires the port to be converted into a breakout port. See Configuring Breakout Ports (see page 255)
below.
auto swp1
iface swp1
link-speed 10000
Platform Limitations
On Dell S4148F-ON and S4128F-ON switches, you must configure ports within the same
port group with the same link speed.
On Lenovo NE2572O switches, swp1 through swp8 only support 25G speed.
For 10G and 1G SFPs inserted in a 25G port, you must edit the /etc/cumulus/ports.conf
file and configure the four ports in the same core to be 10G. See Caveats and Errata
(see page 263) below.
MTU
Interface MTU (maximum transmission unit) applies to traffic traversing the management port, front panel/switch
ports, bridge, VLAN subinterfaces and bonds; in other words, both physical and logical interfaces.
MTU is the only interface setting that you must set manually.
In Cumulus Linux, ifupdown2 assigns 1500 as the default MTU setting. To change the setting, run:
Some switches might not support the same maximum MTU setting in hardware for both the
management interface (eth0) and the data plane ports.
cat /etc/network/ifupdown2/policy.d/mtu.json
{
"address": {"defaults": { "mtu": "9216" }
}
}
If your platform does not support a high MTU on eth0, you can set a lower MTU with the following
command:
auto bridge
iface bridge
bridge-ports bond1 bond2 bond3 bond4 peer5
bridge-vids 100-110
bridge-vlan-aware yes
For bridge to have an MTU of 9000, set the MTU for each of the member interfaces (bond1 to bond4, and
peer5), to 9000 at minimum.
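As a quick illustration of this rule, the sketch below lists the bridge members whose MTU is too low for a desired bridge MTU. The member names come from the example above, but the MTU values and the helper members_below_mtu are made up for the illustration.

```python
def members_below_mtu(member_mtus, bridge_mtu):
    """Return the names of bridge members whose MTU is lower than bridge_mtu."""
    return sorted(name for name, mtu in member_mtus.items() if mtu < bridge_mtu)

# Hypothetical MTU values for the member interfaces of the example bridge.
members = {"bond1": 9000, "bond2": 9000, "bond3": 1500, "bond4": 9216, "peer5": 9000}

too_small = members_below_mtu(members, 9000)
print(too_small)
```

Any interface in the returned list would need its MTU raised before the bridge can run at the desired value.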
When configuring MTU for a bond, configure the MTU value directly under the bond interface; the
configured value is inherited by member links/slave interfaces. If you need a different MTU on the bond, set
it on the bond interface, as this ensures the slave interfaces pick it up. There is no need to specify MTU on
the slave interfaces.
VLAN interfaces inherit their MTU settings from their physical devices or their lower interface; for example,
swp1.100 inherits its MTU setting from swp1. Therefore, specifying an MTU on swp1 ensures that swp1.100
inherits the MTU setting for swp1.
If you are working with VXLANs (see page 476), the MTU for a virtual network interface (VNI) must be 50
bytes smaller than the MTU of the physical interfaces on the switch, as those 50 bytes are required for
various headers and other data. Also, consider setting the MTU much higher than the default 1500.
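The 50-byte rule is simple arithmetic, shown here as a small sketch. The function name max_vni_mtu is hypothetical; the only input taken from the text is the 50 bytes of overhead.

```python
VXLAN_OVERHEAD_BYTES = 50  # overhead stated in the paragraph above

def max_vni_mtu(physical_mtu):
    """Return the largest VNI MTU that fits within the given physical interface MTU."""
    if physical_mtu <= VXLAN_OVERHEAD_BYTES:
        raise ValueError("physical MTU must exceed the VXLAN overhead")
    return physical_mtu - VXLAN_OVERHEAD_BYTES

print(max_vni_mtu(1500))
print(max_vni_mtu(9216))
```

With the default physical MTU of 1500, the VNI is limited to 1450 bytes, which is one reason to raise the physical MTU well above the default.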
auto swp1
iface swp1
mtu 9000
You must take care to ensure there are no MTU mismatches in the conversation path.
MTU mismatches result in dropped or truncated packets, degrading or blocking
network performance.
The MTU for an SVI interface, such as vlan100, is derived from the bridge. When you use NCLU to
change the MTU for an SVI and the MTU setting is higher than it is for the other bridge member
interfaces, the MTU for all bridge member interfaces changes to the new setting. If you need to
use a mixed MTU configuration for SVIs, for example, if some SVIs have a higher MTU and some
lower, then set the MTU for all member interfaces to the maximum value, then set the MTU on
the specific SVIs that need to run at a lower MTU.
To view the MTU setting, run the net show interface <interface> command:
As an alternative, add a post-down command in the /etc/network/interfaces file to reset the MTU
of the interface. For example:
auto swp3
iface swp3
alias BNBYLAB-PD01HV-01_Port3
bridge-vids 106 109 119 141 150-151
mtu 9192
post-down /sbin/ip link set dev swp3 mtu 9192
FEC
Forward Error Correction (FEC) is an encoding and decoding layer that enables the switch to detect and
correct bit errors introduced over the cable between two interfaces. Because 25G transmission speeds can
introduce a higher than acceptable bit error rate (BER) on a link, FEC is required or recommended for 25G,
4x25G, and 100G link speeds.
The two interfaces on each end must use the same FEC setting for the link to come up.
There is a very small latency overhead required for FEC. For most applications, this small amount
of latency is preferable to error packet retransmission latency.
Important
The Tomahawk switch does not support RS FEC or auto-negotiation of FEC on 25G lanes that are
broken out (Tomahawk pre-dates 802.3by). If you are using a 4x25G breakout DAC or AOC on a
Tomahawk switch, you can configure either Base-R FEC or no FEC, and choose cables appropriate
for that limitation (CA-25G-S, CA-25G-N or fiber).
Tomahawk+ and Maverick switches do not have this limitation.
You cannot set FEC RS on any Trident II switch with either NCLU or by directly editing the
/etc/network/interfaces file.
For 25G DAC, 4x25G breakout DAC and 100G DAC cables, the IEEE 802.3by specification creates three
classes:
CA-25G-L (long cables - achievable cable length of at least 5m): dB loss less than or equal to 22.48.
Requires RS FEC and expects BER of 10^-5 or better with RS FEC enabled.
CA-25G-S (short cables - achievable cable length of at least 3m): dB loss less than or equal to 16.48.
Requires Base-R FEC and expects BER of 10^-8 or better with Base-R FEC enabled.
CA-25G-N (no FEC - achievable cable length of at least 3m): dB loss less than or equal to 12.98. Does not
require FEC. Expects BER of 10^-12 or better with no FEC.
The IEEE classification is based on various dB loss measurements and minimum achievable cable length.
You can build longer and shorter cables if they comply with the dB loss and BER requirements.
If a cable is manufactured to CA-25G-S classification and FEC is not enabled, the BER might be
unacceptable in a production network. It is important to set the FEC according to the cable class (or better)
to have acceptable bit error rates. See Determining Cable Class (see page 241) below.
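The three classes can be read as a lookup from dB loss to the minimum FEC mode. The sketch below encodes only the thresholds listed above; the function classify_cable is hypothetical, and in practice the class is the one the manufacturer designates for the cable rather than something computed on the switch.

```python
# (maximum dB loss, cable class, FEC the class requires), in ascending order of loss.
CABLE_CLASSES = [
    (12.98, "CA-25G-N", "none"),
    (16.48, "CA-25G-S", "Base-R"),
    (22.48, "CA-25G-L", "RS"),
]

def classify_cable(db_loss):
    """Return (class, required FEC) for a dB loss, or None if it exceeds every class."""
    for max_loss, name, fec in CABLE_CLASSES:
        if db_loss <= max_loss:
            return name, fec
    return None

for loss in (10.0, 16.48, 20.0, 25.0):
    print(loss, classify_cable(loss))
```

A loss above 22.48 dB falls outside all three classes, so the function returns None for it.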
You can check bit errors using cl-netstat (RX_ERR column) or ethtool -S (HwIfInErrors counter)
after a large amount of traffic has passed through the link. A nonzero value indicates bit errors. Expect
error packets to be zero or extremely low compared to good packets. If a cable has an unacceptable rate of
errors with FEC enabled, replace the cable.
For 25G, 4x25G Breakout, and 100G Fiber modules and AOCs, there is no classification of 25G cable
types for dB loss, BER, or Length. FEC is recommended but might not be required if the BER is low enough.
A manufacturer's EEPROM setting might not match the dB loss on a cable or the actual bit error
rates that a particular cable introduces. Use the designation as a guide, but set FEC according to
the bit error rate tolerance in the design criteria for the network. For most applications, the
highest mutual FEC ability of both end devices is the best choice.
You can determine which grade the manufacturer has designated for the cable as follows.
For the SFP28 DAC, run the following command:
On a Mellanox switch, the currently-enabled FEC mode is not accessible with user commands at
this time; however, you can deduce the mode from the remote FEC setting when the link is up.
To review the FEC setting on the link, run the following command:
To review the FEC setting on the link, run the following command:
To view the FEC and auto-negotiation settings, run the following command:
To review the FEC setting on the link, run the following command:
Mellanox switches automatically configure these settings following a predefined list of parameter
settings until the link comes up.
If the other side of the link is running a version of Cumulus Linux earlier than 3.2, depending
upon the interface type, auto-negotiation may not work on that switch. In this case, Cumulus Networks
recommends that you use the settings shown below on this switch.
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 100
auto swp1
iface swp1
link-autoneg off
link-speed 1000
1000BASE-T on a 1G fixed copper port (auto-negotiation: On; FEC: N/A)
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 1000
1000BASE-T on a 10G fixed copper port (auto-negotiation: On; FEC: N/A)
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 1000
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
10GBASE-T fixed copper port (auto-negotiation: On; FEC: N/A)
$ net add interface swp1 link speed 10000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 10000
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 10000
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 40000
swp1 link autoneg off
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 40000
100GBASE-CR4 (auto-negotiation: On; FEC: auto-negotiated)
$ net add interface swp1 link speed 100000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 100000
100GBASE-SR4, 100G AOC (auto-negotiation: Off; FEC: RS)
$ net add interface swp1 link speed 100000
$ net add interface swp1 link autoneg off
$ net add interface swp1 link fec rs
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec rs
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec off
25GBASE-CR (auto-negotiation: On; FEC: auto-negotiated*)
$ net add interface swp1 link speed 25000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 25000
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec baser
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec off
Setting the default MTU also applies to the management interface. Be sure to add the
iface_defaults to override the MTU for eth0, to remain at 1500.
Breakout Ports
Cumulus Linux has the ability to:
Break out 100G switch ports into the following with breakout cables:
2x50G, 4x25G, 4x10G
Break out 40G switch ports into four separate 10G ports for use with breakout cables.
Combine (also called aggregating or ganging) four 10G switch ports into one 40G port for use with a
breakout cable (not to be confused with a bond (see page 387)).
To configure a 4x25G breakout port, first configure the port to break out then set the link speed:
...
auto swp3s0
iface swp3s0
auto swp3s1
iface swp3s1
auto swp3s2
iface swp3s2
auto swp3s3
iface swp3s3
...
On Dell switches with Maverick ASICs, you configure breakout ports on the 100G uplink ports by
manually editing the /etc/cumulus/ports.conf file. You need to specify either 4x10 or 4x25
for the port speed. For example, on a Dell S4148F-ON switch, to break out swp26 into four 25G
ports, modify the line starting with "26=" in ports.conf as follows:
...
# QSFP+ ports
#
# <port label 27-28> = [4x10G|40G]
27=disabled
28=disabled
# QSFP28 ports
#
# <port label 25-26, 29-30> = [4x10G|4x25G|2x50G|40G|50G|100G]
25=100G
26=4x25G
29=100G
30=100G
...
Then you need to configure the breakout ports in the /etc/network/interfaces file:
...
auto swp26s0
iface swp26s0
auto swp26s1
iface swp26s1
auto swp26s2
iface swp26s2
auto swp26s3
iface swp26s3
...
On Mellanox switches, you need to disable the next port (see below). In this example, you also
run the following before committing the update:
When you commit your change configuring the breakout ports, switchd restarts to apply the
changes. The restart interrupts network services (see page 201).
The /etc/cumulus/ports.conf file varies across different hardware platforms. Check the
current list of supported platforms on the hardware compatibility list.
A snippet from the /etc/cumulus/ports.conf file on a Dell S6000 switch (with a Trident II+
ASIC) where swp6 is broken out looks like this:
For switches with ports that support 100G speeds, you can break out any 100G port into a variety
of options: four 10G ports, four 25G ports, two 40G ports or two 50G ports. You cannot have
more than 128 total logical ports on a Broadcom switch.
The Mellanox SN2700, SN2700B, SN2410, and SN2410B switches all have a limit of
64 logical ports in total. However, if you want to break out to 4x25G or 4x10G, you must
configure the logical ports as follows:
You can only break out odd-numbered ports into 4 logical ports.
You must disable the next even-numbered port.
These restrictions do not apply to a 2x50G breakout configuration.
For example, if you have a 100G Mellanox SN2700 switch and break out port 11 into 4
logical ports, you must disable port 12 by running net add interface swp12
breakout disabled, which results in this configuration in /etc/cumulus/ports.conf:
...
11=4x
12=disabled
...
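The two Mellanox restrictions (only odd-numbered ports can be split four ways, and the following even-numbered port must be disabled) can be expressed as a small check over the ports.conf entries. This is a sketch under assumptions: the function breakout_violations is hypothetical, the entries are passed in as a dictionary rather than read from the file, and any value beginning with 4x is treated as a four-way breakout.

```python
def breakout_violations(ports):
    """Check four-way breakouts against the two Mellanox rules.

    ports maps a port number to its ports.conf value,
    for example {11: "4x25G", 12: "disabled"}.
    """
    problems = []
    for port, value in sorted(ports.items()):
        if not value.startswith("4x"):
            continue  # 2x50G breakouts and unsplit ports are not restricted
        if port % 2 == 0:
            problems.append(
                "port {}: only odd-numbered ports can be split four ways".format(port))
        elif ports.get(port + 1) != "disabled":
            problems.append("port {}: port {} must be disabled".format(port, port + 1))
    return problems

print(breakout_violations({11: "4x25G", 12: "disabled"}))
print(breakout_violations({11: "4x25G", 12: "100G"}))
print(breakout_violations({12: "4x10G"}))
```

An empty list means the entries satisfy both rules.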
Here is an example showing how to configure breakout cables for the Mellanox Spectrum SN2700.
1. Remove the breakout port interfaces using NCLU, then commit the change. Continuing with the
original example:
2. Manually edit the /etc/cumulus/ports.conf file to configure the interface for the original speed,
then save your changes:
...
2=100G
3=100G
4=100G
...
These commands create the following configuration snippet in the /etc/cumulus/ports.conf file:
# SFP+ ports
#
# <port label 1-48> = [10G|40G/4]
1=40G/4
2=40G/4
3=40G/4
4=40G/4
5=10G
# ports.conf --
#
# This file controls port aggregation and subdivision. For example, QSFP+
# ports are typically configurable as either one 40G interface or four
# 10G/1000/100 interfaces. This file sets the number of interfaces per port
# while /etc/network/interfaces and ethtool configure the link speed for each
# interface.
#
# You must restart switchd for changes to take effect.
#
# The DELL S6000 has:
# 32 QSFP ports numbered 1-32
# These ports are configurable as 40G, split into 4x10G ports or
# disabled.
#
# The X pipeline covers QSFP ports 1 through 16 and the Y pipeline
# covers QSFP ports 17 through 32.
#
# The Trident2 chip can only handle 52 logical ports per pipeline.
#
# This means 13 is the maximum number of 40G ports you can ungang
# per pipeline, with the remaining three 40G ports set to
# "disabled". The 13 40G ports become 52 unganged 10G ports, which
# totals 52 logical ports for that pipeline.
This means the maximum number of ports for this Dell S6000 is 104.
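The figure of 104 follows from the numbers in the file comments, as the short calculation below shows. The function name and parameter names are hypothetical; the default values are the ones quoted in the comments (52 logical ports per pipeline, 16 QSFP ports per pipeline, four 10G ports per QSFP port, two pipelines).

```python
def trident2_breakout_limits(logical_per_pipeline=52, qsfp_per_pipeline=16,
                             lanes_per_qsfp=4, pipelines=2):
    """Return (QSFP ports that can be unganged per pipeline,
    QSFP ports left disabled per pipeline, total logical ports on the switch)."""
    unganged = min(logical_per_pipeline // lanes_per_qsfp, qsfp_per_pipeline)
    disabled = qsfp_per_pipeline - unganged
    total = unganged * lanes_per_qsfp * pipelines
    return unganged, disabled, total

limits = trident2_breakout_limits()
print(limits)
```

With the defaults, 13 ports per pipeline can be unganged, 3 must be disabled, and the switch tops out at 104 logical ports.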
Mellanox SN2700 and SN2700B switches have a limit of 64 logical ports in total. However, the logical ports
must be configured in a specific way. See the note (see page 255) above.
Statistics
High-level interface statistics are available with the net show interface command:
Counters TX RX
---------- ---- ----
errors 0 0
unicast 0 0
broadcast 0 0
multicast 0 0
LLDP
------ ---- ---------------------------
swp1 ==== 44:38:39:00:00:03(server01)
SoftOutErrors: 0
SoftOutDrops: 0
SoftOutTxFifoFull: 0
HwIfOutQLen: 0
25G and 100G cores do not support 1000Base-X auto-negotiation (Clause 37) which is
recommended for 1G Fiber optical modules. As a result, 1G fiber breaks cannot be detected. 1G
Fiber modules are not recommended on 25G ports.
Or using ethtool:
Related Information
Debian - Network Configuration
Linux Foundation - VLANs
Linux Foundation - Bonds
While it's possible to change the buffer limits in the datapath.conf file, Cumulus Networks
strongly recommends you work with a Cumulus support engineer to do so.
Each packet is assigned to an ASIC Class of Service (CoS) value based on the packet's priority value stored in
the 802.1p (Class of Service) or DSCP (Differentiated Services Code Point) header field. The choice to
schedule packets based on CoS or DSCP is a configurable option in the
/etc/cumulus/datapath/traffic.conf file.
Priority groups include:
Control: Highest priority traffic
Service: Second-highest priority traffic
Bulk: All remaining traffic
The scheduler is configured to use a hybrid scheduling algorithm. It applies strict priority to control traffic
queues and a weighted round robin selection from the remaining queues. Unicast packets and multicast
packets with the same priority value are assigned to separate queues, which are assigned equal scheduling
weights.
Datapath configuration takes effect when you initialize switchd. Changes to the traffic.conf file
require you to restart the switchd (see page 201) service.
You can configure Quality of Service (QoS) for switches on the following platforms only:
Broadcom Helix4, Tomahawk, Trident II, Trident II+ and Trident3
Mellanox Spectrum
Contents
This topic describes ...
Commands (see page 266)
Example Configuration File (see page 266)
Commands
If you modify the configuration in the /etc/cumulus/datapath/traffic.conf file, you must restart
switchd (see page 201) for the changes to take effect:
traffic.cos_2.priority_source.8021p = [2]
traffic.cos_3.priority_source.8021p = []
traffic.cos_4.priority_source.8021p = [3,4]
traffic.cos_5.priority_source.8021p = [5]
traffic.cos_6.priority_source.8021p = [6]
traffic.cos_7.priority_source.8021p = [7]
# dscp values = {0..63}
traffic.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
traffic.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
traffic.cos_2.priority_source.dscp = []
traffic.cos_3.priority_source.dscp = []
traffic.cos_4.priority_source.dscp = []
traffic.cos_5.priority_source.dscp = []
traffic.cos_6.priority_source.dscp = []
traffic.cos_7.priority_source.dscp = [56,57,58,59,60,61,62,63]
# Per-port source packet fields and mapping: applies to the designated set of ports.
source.port_group_list = [source_port_group]
source.source_port_group.packet_priority_source_set = [802.1p,dscp]
source.source_port_group.port_set = swp1-swp4,swp6
source.source_port_group.cos_0.priority_source.8021p = [7]
source.source_port_group.cos_1.priority_source.8021p = [6]
source.source_port_group.cos_2.priority_source.8021p = [5]
source.source_port_group.cos_3.priority_source.8021p = [4]
source.source_port_group.cos_4.priority_source.8021p = [3]
source.source_port_group.cos_5.priority_source.8021p = [2]
source.source_port_group.cos_6.priority_source.8021p = [1]
source.source_port_group.cos_7.priority_source.8021p = [0]
# priority groups
traffic.priority_group_list = [control, service, bulk]
# internal cos values assigned to each priority group
# each cos value should be assigned exactly once
# internal cos values {0..7}
priority_group.control.cos_list = [7]
priority_group.service.cos_list = [2]
priority_group.bulk.cos_list = [0,1,3,4,5,6]
# to configure priority flow control on a group of ports:
# -- assign cos value(s) to the cos list
# -- add or replace a port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set a PFC buffer size in bytes for each port in the group
# -- set the xoff byte limit (buffer limit that triggers PFC frame transmit to start)
# -- set the xon byte delta (buffer limit that triggers PFC frame transmit to stop)
# -- enable PFC frame transmit and/or PFC frame receive
# priority flow control
# pfc.port_group_list = [pfc_port_group]
# pfc.pfc_port_group.cos_list = []
# pfc.pfc_port_group.port_set = swp1-swp4,swp6
# pfc.pfc_port_group.port_buffer_bytes = 25000
# pfc.pfc_port_group.xoff_size = 10000
# pfc.pfc_port_group.xon_delta = 2000
# pfc.pfc_port_group.tx_enable = true
# pfc.pfc_port_group.rx_enable = true
# to configure pause on a group of ports:
# -- add or replace port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set a pause buffer size in bytes for each port in the group
# -- set the xoff byte limit (buffer limit that triggers pause frames transmit to start)
# -- set the xon byte delta (buffer limit that triggers pause frames transmit to stop)
# link pause
# link_pause.port_group_list = [pause_port_group]
# link_pause.pause_port_group.port_set = swp1-swp4,swp6
# link_pause.pause_port_group.port_buffer_bytes = 25000
# link_pause.pause_port_group.xoff_size = 10000
# link_pause.pause_port_group.xon_delta = 2000
# link_pause.pause_port_group.rx_enable = true
# link_pause.pause_port_group.tx_enable = true
# scheduling algorithm: algorithm values = {dwrr}
scheduling.algorithm = dwrr
# traffic group scheduling weight
# weight values = {0..127}
# '0' indicates strict priority
priority_group.control.weight = 0
priority_group.service.weight = 32
priority_group.bulk.weight = 16
# To turn on/off Denial of service (DOS) prevention checks
dos_enable = false
# Cut-through is disabled by default on all chips with the exception of
# Spectrum. On Spectrum cut-through cannot be disabled.
#cut_through_enable = false
# Enable resilient hashing
#resilient_hash_enable = FALSE
# Resilient hashing flowset entries per ECMP group
# Valid values - 64, 128, 256, 512, 1024
#resilient_hash_entries_ecmp = 128
# Enable symmetric hashing
#symmetric_hash_enable = TRUE
# Set sflow/sample ingress cpu packet rate and burst in packets/sec
# Values: {0..16384}
#sflow.rate = 16384
#sflow.burst = 16384
#Specify the maximum number of paths per route entry.
# Maximum paths supported is 200.
# Default value 0 takes the number of physical ports as the max path size.
#ecmp_max_paths = 0
#Specify the hash seed for Equal cost multipath entries
# Default value 0
# Value Range: {0..4294967295}
#ecmp_hash_seed = 42
# Specify the forwarding table resource allocation profile, applicable
# only on platforms that support universal forwarding resources.
#
# /usr/cumulus/sbin/cl-resource-query reports the allocated table sizes
# based on the profile setting.
#
# Values: one of {'default', 'l2-heavy', 'v4-lpm-heavy', 'v6-lpm-heavy'}
# Default value: 'default'
# Note: some devices may support more modes, please consult user
# guide for more details
#
#forwarding_table.profile = default
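The example file notes that each internal CoS value (0 through 7) should be assigned to exactly one priority group. A minimal sketch of that check follows; the function cos_assignment_errors is hypothetical, and the cos_list values are passed in as Python lists rather than parsed from traffic.conf.

```python
def cos_assignment_errors(cos_lists, valid=range(8)):
    """Return messages for CoS values that are out of range or not assigned exactly once."""
    counts = {cos: 0 for cos in valid}
    errors = []
    for group, values in cos_lists.items():
        for cos in values:
            if cos not in counts:
                errors.append("{}: cos {} is outside the valid range".format(group, cos))
            else:
                counts[cos] += 1
    for cos, n in sorted(counts.items()):
        if n != 1:
            errors.append("cos {} is assigned {} times".format(cos, n))
    return errors

# The priority groups from the example file above.
example = {"control": [7], "service": [2], "bulk": [0, 1, 3, 4, 5, 6]}
print(cos_assignment_errors(example))

# A made-up variant that drops cos 6 from bulk and repeats cos 2.
broken = {"control": [7], "service": [2], "bulk": [0, 1, 2, 3, 4, 5]}
print(cos_assignment_errors(broken))
```

The example configuration produces no messages; the broken variant reports cos 2 as assigned twice and cos 6 as unassigned.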
On Mellanox Spectrum switches, packet priority remark must be enabled on the ingress port. A
packet received on a remark-enabled port is remarked according to the priority mapping
configured on the egress port. If packet priority remark is configured the same way on every port,
the default configuration example above is correct. However, per-port customized configurations
require two port groups: one for the ingress ports and one for the egress ports, as below:
remark.port_group_list = [ingress_remark_group, egress_remark_group]
remark.ingress_remark_group.packet_priority_remark_set = [dscp]
remark.ingress_remark_group.port_set = swp1-swp4,swp6
remark.egress_remark_group.port_set = swp10-swp20
remark.egress_remark_group.cos_0.priority_remark.dscp = [2]
remark.egress_remark_group.cos_1.priority_remark.dscp = [10]
remark.egress_remark_group.cos_2.priority_remark.dscp = [18]
remark.egress_remark_group.cos_3.priority_remark.dscp = [26]
remark.egress_remark_group.cos_4.priority_remark.dscp = [34]
remark.egress_remark_group.cos_5.priority_remark.dscp = [42]
remark.egress_remark_group.cos_6.priority_remark.dscp = [50]
remark.egress_remark_group.cos_7.priority_remark.dscp = [58]
[ebtables]
-A FORWARD -o swp5 -j setqos --set-cos 5
Option | Description
--set-cos INT | Sets the datapath resource/queuing class value. Values are defined in IEEE_P802.1p.
--set-dscp value | Sets the DSCP field in packet header to a value, which can be either a decimal or hex value.
--set-dscp-class class | Sets the DSCP field in the packet header to the value represented by the DiffServ class value. This class can be EF, BE or any of the CSxx or AFxx classes.
[iptables]
-t mangle -A FORWARD --in-interface swp+ -p tcp --dport bgp -j SETQOS --set-dscp 10 --set-cos 5
[ip6tables]
-t mangle -A FORWARD --in-interface swp+ -j SETQOS --set-dscp 10
You can put the rule in either the mangle table or the default filter table; the mangle table and filter table
are put into separate TCAM slices in the hardware.
To put the rule in the mangle table, include -t mangle; to put the rule in the filter table, omit -t mangle.
PFC is a layer 2 mechanism that prevents congestion by throttling packet transmission. When PFC is
enabled for received packets on a set of switch ports, the switch detects congestion in the ingress buffer of
the receiving port and signals the upstream switch to stop sending traffic. If the upstream switch has PFC
enabled for packet transmission on the designated priorities, it responds to the downstream switch and
stops sending those packets for a period of time.
PFC operates between two adjacent neighbor switches; it does not provide end-to-end flow control.
However, when an upstream neighbor throttles packet transmission, it could build up packet congestion
and propagate PFC frames further upstream: eventually the sending server could receive PFC frames and
stop sending traffic for a time.
The PFC mechanism can be enabled for individual switch priorities on specific switch ports for RX and/or TX
traffic. The switch port’s ingress buffer occupancy is used to measure congestion. If congestion is present,
the switch transmits flow control frames to the upstream switch. Packets with priority values that do not
have PFC configured are not counted during congestion detection; neither do they get throttled by the
upstream switch when it receives flow control frames.
PFC congestion detection is implemented on the switch using xoff and xon threshold values for the specific
ingress buffer which is used by the targeted switch priorities. When a packet enters the buffer and the
buffer occupancy is above the xoff threshold, the switch transmits an Ethernet PFC frame to the upstream
switch to signal packet transmission should stop. When the buffer occupancy drops below the xon
threshold, the switch sends another PFC frame upstream to signal that packet transmission can resume.
(PFC frames contain a quanta value to indicate a timeout value for the upstream switch: packet
transmission can resume after the timer has expired, or when a PFC frame with quanta == 0 is received
from the downstream switch.)
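The xoff/xon behavior described above is a form of hysteresis, which the following sketch models. It is a simplified illustration, not the switch implementation: the class and method names are hypothetical, the threshold values are arbitrary, and a real switch may also resend PFC frames to refresh the quanta timer. The helper at the end assumes the IEEE definition of one pause quantum as 512 bit times.

```python
class PfcIngressBuffer:
    """Toy model of xoff/xon congestion detection on one ingress buffer."""

    def __init__(self, xoff_bytes: int, xon_bytes: int):
        if xon_bytes >= xoff_bytes:
            raise ValueError("xon threshold must be below xoff threshold")
        self.xoff = xoff_bytes
        self.xon = xon_bytes
        self.paused = False  # True after XOFF was sent and before XON

    def update(self, occupancy_bytes: int):
        """Return 'XOFF', 'XON', or None for the new buffer occupancy."""
        if not self.paused and occupancy_bytes > self.xoff:
            self.paused = True
            return "XOFF"  # ask the upstream switch to stop sending
        if self.paused and occupancy_bytes < self.xon:
            self.paused = False
            return "XON"  # PFC frame with quanta == 0: resume sending
        return None


def pause_duration_ns(quanta: int, link_gbps: int) -> float:
    """Pause time for a quanta value, assuming 512 bit times per quantum."""
    return quanta * 512 / link_gbps


if __name__ == "__main__":
    buf = PfcIngressBuffer(xoff_bytes=10000, xon_bytes=8000)
    for occupancy in (5000, 10500, 9000, 7500, 6000):
        print(occupancy, buf.update(occupancy))
```

Between the two thresholds the state does not change, which prevents the switch from toggling between pause and resume on every packet.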
After the downstream switch has sent a PFC frame upstream, it continues to receive packets until the
upstream switch receives and responds to the PFC frame. The downstream ingress buffer must be large
enough to store those additional packets after the xoff threshold has been reached.
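A rough way to size that extra buffer space is to multiply the link rate by the time the upstream switch needs to react. The sketch below shows only this arithmetic; the function name and the example delay are assumptions, and a real calculation would also account for cable length, frame sizes, and ASIC-specific overhead.

```python
def pfc_headroom_bytes(link_gbps: int, response_delay_ns: int) -> int:
    """Bytes that can still arrive after the xoff threshold is crossed.

    Gbps multiplied by nanoseconds yields bits, so integer math is enough.
    """
    bits_in_flight = link_gbps * response_delay_ns
    return -(-bits_in_flight // 8)  # round up to whole bytes


if __name__ == "__main__":
    # Hypothetical example: 10G link, 2 microseconds until the sender stops.
    print(pfc_headroom_bytes(10, 2000))
```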
Before Cumulus Linux 3.1.1, PFC was designated as a lossless priority group. The lossless priority
group has been removed from Cumulus Linux.
Priority flow control is fully supported on both Broadcom and Mellanox switches.
PFC is disabled by default in Cumulus Linux. Enabling priority flow control (PFC) requires configuring the
following settings in /etc/cumulus/datapath/traffic.conf on the switch:
Specifying the name of the port group in pfc.port_group_list in brackets; for example,
pfc.port_group_list = [pfc_port_group].
Assigning a CoS value to the port group in the pfc.pfc_port_group.cos_list setting. Note that
pfc_port_group is the name of a port group you specified above and is used throughout the following
settings.
Populating the port group with its member ports in pfc.pfc_port_group.port_set.
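As an illustration of how these settings fit together, the following sketch renders the lines for one port group. The helper name render_pfc_group and the CoS value are hypothetical, the bracketed cos_list format is assumed to follow the list style used elsewhere in traffic.conf, and other settings that a working configuration may need (such as buffer thresholds) are omitted.

```python
def render_pfc_group(name, cos_list, port_set, rx_enable=True, tx_enable=True):
    """Render traffic.conf-style PFC lines for a single port group (sketch)."""
    for cos in cos_list:
        if not 0 <= cos <= 7:
            raise ValueError(f"CoS value out of range: {cos}")
    cos_text = ",".join(str(cos) for cos in cos_list)
    lines = [
        f"pfc.port_group_list = [{name}]",
        f"pfc.{name}.cos_list = [{cos_text}]",
        f"pfc.{name}.port_set = {port_set}",
        f"pfc.{name}.rx_enable = {str(rx_enable).lower()}",
        f"pfc.{name}.tx_enable = {str(tx_enable).lower()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pfc_group("pfc_port_group", [3], "swp1-swp4,swp6"))
```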
Port Groups
A port group refers to one or more sequences of contiguous ports. Multiple port groups can be defined by:
Adding a comma-separated list of port group names to the port_group_list.
Adding the port_set, rx_enable, and tx_enable configuration lines for each port group.
You can specify the set of ports in a port group in comma-separated sequences of contiguous ports; you
can see which ports are contiguous in /var/lib/cumulus/porttab. The syntax supports:
A single port (swp1s0 or swp5)
A sequence of regular swp ports (swp2-swp5)
A sequence within a breakout swp port (swp6s0-swp6s3)
A sequence of regular and breakout ports, provided they are all in a contiguous range. For example:
...
swp2
swp3
swp4
swp5
swp6s0
swp6s1
swp6s2
swp6s3
swp7
...
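The port set syntax can be sketched as a small expansion routine. This is an illustration only: expand_port_set is a hypothetical name, and the ordered_ports list stands in for the port order that you would read from /var/lib/cumulus/porttab (the file format itself is not parsed here).

```python
def expand_port_set(port_set, ordered_ports):
    """Expand a port_set string such as 'swp1,swp4-swp6s1' into port names."""
    position = {port: i for i, port in enumerate(ordered_ports)}
    result = []
    for item in port_set.split(","):
        item = item.strip()
        first, dash, last = item.partition("-")
        if not dash:
            last = first  # a single port is a range of length one
        if first not in position or last not in position:
            raise ValueError(f"unknown port in {item!r}")
        if position[first] > position[last]:
            raise ValueError(f"range is not in ascending order: {item!r}")
        # A range covers every port between its endpoints in porttab order.
        result.extend(ordered_ports[position[first]:position[last] + 1])
    return result


if __name__ == "__main__":
    order = ["swp1", "swp2", "swp3", "swp4", "swp5",
             "swp6s0", "swp6s1", "swp6s2", "swp6s3", "swp7"]
    print(expand_port_set("swp1,swp4-swp6s1", order))
```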
Restart switchd (see page 201) to allow the PFC configuration changes to take effect:
What's the difference between link pause and priority flow control?
Priority flow control is applied to an individual priority group for a specific ingress port.
Link pause (also known as port pause or global pause) is applied to all the traffic for a specific
ingress port.
Here is an example configuration that enables both types of link pause for swp1 through swp4 and swp6:
# link pause
link_pause.port_group_list = [pause_port_group]
link_pause.pause_port_group.port_set = swp1-swp4,swp6
link_pause.pause_port_group.port_buffer_bytes = 25000
link_pause.pause_port_group.xoff_size = 10000
link_pause.pause_port_group.xon_delta = 2000
link_pause.pause_port_group.rx_enable = true
link_pause.pause_port_group.tx_enable = true
Restart switchd (see page 201) to allow link pause configuration changes to take effect:
To work around this issue, disable link pause or disable cut-through mode in
/etc/cumulus/datapath/traffic.conf.
On Trident II switches only, if ECN is enabled on a specific queue, the ASIC also enables RED
on the same queue. If the packet is ECT marked (the ECN bits are 01 or 10), the ECN mechanism
executes as described above. However, if it is entering an ECN-enabled queue but is not ECT
marked (the ECN bits are 00), then the RED mechanism uses the same threshold and probability
values to decide whether to drop the packet. Packets entering a non-ECN-enabled queue do not
get marked or dropped due to ECN or RED in any case.
ECN is implemented on the switch using minimum and maximum threshold values for the egress queue
length. When a packet enters the queue and the average queue length is between the minimum and
maximum threshold values, a configurable probability value will determine whether the packet will be
marked. If the average queue length is above the maximum threshold value, the packet is always marked.
The downstream switches with ECN enabled perform the same actions as the traffic is received. If the ECN
bits are set, they remain set. The only way to overwrite the ECN bits is to enable ECN, that is, to set the
ECN bits to 11.
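The marking decision described above can be sketched as a small function. This is a simplified model, not the ASIC logic: the function name is hypothetical, the probability is expressed here as a fraction between 0 and 1 (the configuration file may use a different scale), and the random source is injectable so the behavior can be checked deterministically.

```python
import random


def should_mark(avg_queue_len, min_threshold, max_threshold, probability,
                rng=random.random):
    """Decide whether a packet entering the egress queue gets ECN-marked."""
    if avg_queue_len > max_threshold:
        return True  # above the maximum threshold: always mark
    if avg_queue_len >= min_threshold:
        return rng() < probability  # between thresholds: mark by probability
    return False  # below the minimum threshold: never mark


if __name__ == "__main__":
    # Hypothetical thresholds, in bytes of average queue length.
    for length in (10, 50, 250):
        print(length, should_mark(length, 40, 200, 0.5))
```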
ECN is supported on Broadcom Tomahawk, Trident II, Trident II+ and Trident3, and Mellanox Spectrum
switches only.
ECN is disabled by default in Cumulus Linux. You can enable ECN for individual switch priorities on specific
switch ports. ECN requires configuring the following settings in
/etc/cumulus/datapath/traffic.conf on the switch:
Specifying the name of the port group in ecn.port_group_list in brackets; for example,
ecn.port_group_list = [ecn_port_group].
Assigning a CoS value to the port group in ecn.ecn_port_group.cos_list. If the CoS value of a
packet matches the value of this setting, then ECN is applied. Note that ecn_port_group is the name
of a port group you specified above.
Populating the port group with its member ports (ecn.ecn_port_group.port_set), where
ecn_port_group is the name of the port group you specified above. Congestion is measured on the
egress port queue for the ports listed here, using the average queue length: if congestion is present,
a packet entering the queue may be marked to indicate that congestion was observed. Marking a
packet involves setting the two least significant bits in the IP header DiffServ (ToS) field to 11.
The switch priority value(s) are mapped to specific egress queues for the target switch ports.
The ecn.ecn_port_group.probability value indicates the probability of a packet being
marked if congestion is experienced.
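The bit manipulation behind ECN marking can be shown with a few lines of code. This sketch operates on the 8-bit DiffServ (ToS) byte, where the upper six bits carry the DSCP value and the lower two bits carry ECN; the function names are hypothetical.

```python
ECN_MASK = 0b11  # the two least significant bits of the ToS byte
ECN_CE = 0b11    # congestion experienced


def is_ect(tos: int) -> bool:
    """True if the packet is ECT marked (ECN bits are 01 or 10)."""
    return (tos & ECN_MASK) in (0b01, 0b10)


def mark_congestion(tos: int) -> int:
    """Set the ECN bits to 11 and leave the DSCP bits untouched."""
    return tos | ECN_CE


def dscp_of(tos: int) -> int:
    """Extract the DSCP value from the upper six bits."""
    return tos >> 2


if __name__ == "__main__":
    tos = (10 << 2) | 0b10  # DSCP 10, ECT marked
    marked = mark_congestion(tos)
    print(is_ect(tos), bin(marked), dscp_of(marked))
```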
The following configuration example shows ECN configured for ports swp1 through swp4 and swp6:
Restart switchd (see page 201) to allow the ECN configuration changes to take effect:
Related Information
iptables-extensions man page
Supported ASICs
DDOS protection is available for the following Broadcom ASICs:
Helix4
Maverick
Tomahawk
Tomahawk+
Trident
Trident-II
Trident-II+
Trident3
Cumulus Networks recommends enabling this feature when deploying a switch with one of the above
mentioned ASICs, as hardware-based DDOS protection is disabled by default. Although Cumulus
recommends enabling all of the above criteria, they can be individually enabled if desired. None of them are
enabled by default.
DDOS protection is not supported on Broadcom Hurricane2 and Mellanox Spectrum ASICs.
Configuring any of the following settings affects the BFD echo (see page 805) function. For
example, if you enable dos.udp_ports_eq, all BFD packets are dropped because
the BFD protocol uses the same source and destination UDP ports.
dos.sip_eq_dip
dos.smac_eq_dmac
dos.tcp_ctrl0_seq0
dos.tcp_flags_fup_seq0
dos.tcp_flags_syn_fin
dos.tcp_ports_eq
dos.tcp_syn_frag
dos.udp_ports_eq
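To see why a check such as dos.udp_ports_eq affects BFD, consider the following sketch. The match conditions are inferred from the setting names and are assumptions, not the ASIC behavior; only a subset of the settings is modeled, and the packet fields and port number are illustrative.

```python
# Assumed meaning of each setting, inferred from its name.
CHECKS = {
    "dos.sip_eq_dip": lambda p: p["src_ip"] == p["dst_ip"],
    "dos.smac_eq_dmac": lambda p: p["src_mac"] == p["dst_mac"],
    "dos.udp_ports_eq": lambda p: p["proto"] == "udp"
    and p["src_port"] == p["dst_port"],
    "dos.tcp_ports_eq": lambda p: p["proto"] == "tcp"
    and p["src_port"] == p["dst_port"],
    "dos.tcp_flags_syn_fin": lambda p: p["proto"] == "tcp"
    and {"SYN", "FIN"} <= p["tcp_flags"],
}


def triggered_dos_checks(packet, enabled):
    """Return the enabled checks that would match (and drop) the packet."""
    return [name for name in enabled
            if name in CHECKS and CHECKS[name](packet)]


if __name__ == "__main__":
    # Illustrative packet with equal source and destination UDP ports.
    bfd_like = {
        "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
        "src_mac": "00:00:5e:00:53:01", "dst_mac": "00:00:5e:00:53:02",
        "proto": "udp", "src_port": 3785, "dst_port": 3785,
        "tcp_flags": set(),
    }
    print(triggered_dos_checks(bfd_like, ["dos.udp_ports_eq", "dos.sip_eq_dip"]))
```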
DHCP Relays
You can configure DHCP relays for IPv4 and IPv6.
To run DHCP for both IPv4 and IPv6, initiate the DHCP relay once for IPv4 and once for IPv6. The
configurations for the server hosts, DHCP relay, and DHCP server use the following topology:
The dhcpd and dhcrelay services are disabled by default. After you finish configuring the DHCP
relays and servers, you need to start those services.
Contents
This topic describes ...
Configure IPv4 DHCP Relays (see page 281)
DHCP Option 82 (see page 282)
Control the Gateway IP Address with RFC 3527 (see page 282)
Configure IPv6 DHCP Relays (see page 284)
Configure Multiple DHCP Relays (see page 285)
Configure a DHCP Relay with VRR (see page 286)
Configure the DHCP Relay Service Manually (Advanced) (see page 286)
Use the Gateway IP Address as the Source IP for Relayed DHCP Packets (Advanced) (see page 287)
Troubleshooting (see page 287)
You configure a DHCP relay on a per-VLAN basis, specifying the SVI, not the parent bridge; in our
example, you would specify vlan1 as the SVI for VLAN 1; do not specify the bridge named bridge in
this case.
As per RFC 3046, you can specify as many server IP addresses as can fit in 255 octets, specifying
each address only once.
After you finish configuring the DHCP relay, restart and then enable the dhcrelay service so the
configuration persists between reboots:
To see the DHCP relay status, use the systemctl status dhcrelay.service command:
DHCP Option 82
You can configure DHCP relays to inject the circuit-id field with the -a option, which you add to the
OPTIONS line in the /etc/default/isc-dhcp-relay file. By default, the ingress SVI interface against
which the relayed DHCP discover packet is processed is injected into this field. You can change this
behavior by adding the --use-pif-circuit-id option. With this option, the physical switch port (swp)
on which the discover packet arrives is placed in the circuit-id field.
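For reference, the on-the-wire layout of the circuit-id field can be sketched as follows. The option and sub-option codes come from RFC 3046; encoding the interface name as ASCII text is an assumption about what the relay places in the field, and the function name is hypothetical.

```python
RELAY_AGENT_INFORMATION = 82  # DHCP option code defined in RFC 3046
AGENT_CIRCUIT_ID = 1          # sub-option code for the circuit ID


def build_option82_circuit_id(circuit_id: str) -> bytes:
    """Encode option 82 carrying a single circuit-id sub-option."""
    value = circuit_id.encode("ascii")
    sub_option = bytes([AGENT_CIRCUIT_ID, len(value)]) + value
    if len(sub_option) > 255:
        raise ValueError("circuit ID does not fit in one option")
    return bytes([RELAY_AGENT_INFORMATION, len(sub_option)]) + sub_option


if __name__ == "__main__":
    # Default behavior: the ingress SVI; with --use-pif-circuit-id: the swp.
    print(build_option82_circuit_id("vlan1").hex())
    print(build_option82_circuit_id("swp1").hex())
```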
The following illustration demonstrates how you can control the giaddr with RFC 3527.
To enable RFC 3527 support and control the giaddr, run the net add dhcp relay giaddr-interface
command with the interface/IP address you want to use.
The following example uses the first IP address on the loopback interface as the giaddr:
The above command creates the following configuration in the /etc/default/isc-dhcp-relay file:
The first IP address on the loopback interface is typically the 127.0.0.1 address; Cumulus
Networks recommends that you use more specific syntax, as shown in the next example.
The following example uses IP address 10.0.0.1 on the loopback interface as the giaddr:
The above command creates the following configuration in the /etc/default/isc-dhcp-relay file:
The following example uses the first IP address on swp2 as the giaddr:
The above command creates the following configuration in the /etc/default/isc-dhcp-relay file:
The above command creates the following configuration in the /etc/default/isc-dhcp-relay file:
After you finish configuring the DHCP relay, save your changes, restart the dhcrelay6 service, then enable
the dhcrelay6 service so the configuration persists between reboots:
To see the status of the IPv6 DHCP relay, use the systemctl status dhcrelay6.service command:
1. As the sudo user, open the /etc/vrf/systemd.conf file in a text editor and remove dhcrelay.
2. To reload the systemd files, run the following command:
3. Create a config file in /etc/default using the following format for each dhcrelay:
isc-dhcp-relay-<dhcp-name>. An example file is shown below:
4. Run the following command to start a dhcrelay instance. Replace dhcp-name with the instance
name or number:
Use the Gateway IP Address as the Source IP for Relayed DHCP Packets (Advanced)
You can configure the dhcrelay service to forward IPv4 (only) DHCP packets to a server and ensure that
the source IP address of the relayed packet is the same as the gateway IP address. You do this by enabling
the giaddr-src option; when set, dhcrelay attempts to set the source IP address of the packet to be
the gateway IP address.
This option impacts all relayed packets globally.
To enable this feature:
Troubleshooting
If you are experiencing issues with the DHCP relay, run the following commands to determine if the issue is
with systemd. The following commands manually activate the DHCP relay process and they do not persist
when you reboot the switch:
For example:
You can run the journalctl command with the --since flag to specify a time period:
DHCP Servers
To run DHCP for both IPv4 and IPv6, you need to initiate the DHCP server twice: once for IPv4 and once for
IPv6. The configurations in this chapter use the following topology for the host, DHCP relay and DHCP server:
For the configurations used in this chapter, the DHCP server is a switch running Cumulus Linux; however,
the DHCP server can also be located on a dedicated server in your environment.
The dhcpd and dhcrelay services are disabled by default. After you finish configuring the DHCP
relays and servers, you need to start those services.
Contents
This topic describes ...
Configure the DHCP Server on Cumulus Linux Switches (see page 289)
Configure the IPv4 DHCP Server (see page 289)
Configure the IPv6 DHCP Server (see page 290)
Assign Port-Based IP Addresses (see page 290)
Troubleshooting (see page 291)
default-lease-time 600;
max-lease-time 7200;
Just as you did with the DHCP relay scripts, edit the DHCP server configuration file so it can launch the
DHCP server when the system boots. Here is a sample configuration:
INTERFACES="swp1"
After you've finished configuring the DHCP server, enable and start the dhcpd service immediately:
default-lease-time 600;
max-lease-time 7200;
subnet6 2001:db8:100::/64 {
}
subnet6 2001:db8:1::/64 {
range6 2001:db8:1::100 2001:db8:1::200;
}
Just as you did with the DHCP relay scripts, edit the DHCP server configuration file so it can launch the
DHCP server when the system boots. Here is a sample configuration:
INTERFACES="swp1"
After you've finished configuring the DHCP server, enable and start the dhcpd6 service immediately:
host myhost {
ifname = "swp1" ;
fixed_address = 10.10.10.10 ;
}
Troubleshooting
The DHCP server knows whether a DHCP request is a relay or a non-relay DHCP request. On
isc-dhcp-server, for example, it is possible to tail the log and look at the behavior firsthand:
Contents
This topic describes ...
The Voyager Platform (see page 292)
Inside the AC400 (see page 293)
Client to Network Connection (see page 294)
Configure the Voyager Ports (see page 295)
Configure the Transponder Modules (see page 296)
Set the Transponder State (see page 296)
Disable the Transmitter (see page 297)
Change the Grid Spacing (see page 298)
Set the Channel Frequency (see page 299)
Set the Transmit Power (see page 300)
Change the Modulation (see page 300)
Set the Differential Encoding (see page 301)
The fc designations on the Tomahawk stand for Falcon Core. Each AC400 module has four 100G interfaces
connected to the Tomahawk and two interfaces connected to the front of the box.
QPSK: Quadrature phase shift keying. When a network interface is using QPSK modulation, it carries
100Gbps and is therefore connected to only one client interface.
16-QAM: Quadrature amplitude modulation with 4 bits per symbol. When a network interface is using
16-QAM modulation, it carries 200Gbps and is therefore connected to two client interfaces. Each of the
two client interfaces carried on a network interface is called a tributary. The AC400 adds extra
information so that these tributaries can be sorted out at the far end and delivered to the appropriate
client interface.
8-QAM: Quadrature amplitude modulation with 3 bits per symbol. When a network interface is using
8-QAM modulation, it carries 150Gbps. In this case, the two network interfaces in an AC400 module must
be coupled, so that the total bandwidth carried by the two interfaces is 300Gbps. Three client interfaces
are used with this modulation format. However, unlike other modulation formats that use independent
mode, the coupled mode means that data from each client interface is carried on both of the network
interfaces.
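The capacity figures above follow a simple pattern: each bit per symbol contributes 50Gbps on a network interface. The sketch below works through that arithmetic for one AC400 module with two network interfaces. The helper names are hypothetical, and the figure of 2 bits per symbol for QPSK is the standard value for that modulation rather than something stated in this guide.

```python
CLIENT_INTERFACE_GBPS = 100
GBPS_PER_BIT = 50  # derived from 200Gbps at 4 bits and 150Gbps at 3 bits
BITS_PER_SYMBOL = {"pm-qpsk": 2, "8-qam": 3, "16-qam": 4}


def network_interface_gbps(modulation: str) -> int:
    """Bandwidth carried by one network interface for a modulation format."""
    return BITS_PER_SYMBOL[modulation] * GBPS_PER_BIT


def module_summary(modulation: str) -> dict:
    """Summarize one module (two network interfaces) for a modulation."""
    total = 2 * network_interface_gbps(modulation)
    return {
        # 8-qam carries 150Gbps per interface, so the pair must be coupled.
        "mode": "coupled" if modulation == "8-qam" else "independent",
        "total_gbps": total,
        "client_interfaces": total // CLIENT_INTERFACE_GBPS,
    }


if __name__ == "__main__":
    for name in BITS_PER_SYMBOL:
        print(name, module_summary(name))
```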
The following example /etc/cumulus/ports.conf file shows configuration for all of the modes.
Using NCLU commands is the preferred way to configure the transponder modules. However, as
an alternative, you can edit the /etc/cumulus/transponders.ini file to make configuration
changes. See Edit the transponder.ini file (see page 309) below.
Setting    Description
reset      The module is in the reset state. The module cannot be accessed and remains
           non-operational until the state is changed to one of the other states.
low-power  The module is in the low-power configuration state. The network interfaces are not
           powered up. This state can be used to configure the module before bringing it online.
tx-off     The receivers and transmitters are turned up, but there is nothing being transmitted.
To change the state of the module, run the net add interface <trans-port> state
(reset|low-power|tx-off|ready) command. For example, to change the state of the transponder module
to low power for L2, run the following command:
Use caution when changing the setting; although this command specifies a port, it affects an
entire module. State changes on modules with multiple ports affect all ports on the module, not
just the port specified.
To enable the transmitter of an individual network interface, run the net del interface
<trans-port> transmit-disable command. The following example command enables the L1 transmitter:
The following example shows the command with the output when using tab completion:
To see a complete list of the frequencies, channels, and wavelengths, run the net show transponder
frequency-map command (described in Display Available Frequencies (see page 307)).
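The channel numbering can also be reproduced with a short calculation. The sketch below assumes the 50ghz grid used in this guide's channel table, where channel 1 sits at 191.15 THz and each subsequent channel adds 0.05 THz; the wavelength follows from the speed of light. The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIRST_CHANNEL_GHZ = 191_150  # channel 1 on the assumed 50ghz grid
SPACING_GHZ = 50


def _channel_ghz(channel: int) -> int:
    if not 1 <= channel <= 100:
        raise ValueError("channel must be between 1 and 100")
    return FIRST_CHANNEL_GHZ + SPACING_GHZ * (channel - 1)


def channel_frequency_thz(channel: int) -> float:
    """Center frequency of a channel in THz."""
    return _channel_ghz(channel) / 1000


def channel_wavelength_nm(channel: int) -> float:
    """Wavelength in nm; m/s divided by GHz yields nanometers directly."""
    return round(SPEED_OF_LIGHT_M_PER_S / _channel_ghz(channel), 2)


if __name__ == "__main__":
    for channel in (1, 52):
        print(channel, channel_frequency_thz(channel),
              channel_wavelength_nm(channel))
```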
Changing the modulation also changes the Linux interfaces available in the system, removing existing
interfaces and adding the new ones. Therefore, you must remove network interfaces with the net del
interface swpLx... command before you change the modulation. The network interfaces created for
each modulation are as follows (L1 is used as an example):
pm-qpsk swpL1
Because 8-qam modulation requires both network interfaces on a module to operate together, changing
the modulation on one interface also changes it on the other. Also, the network mode of the module
changes automatically to coupled when changing to 8-qam and reverts to independent when leaving
8-qam modulation.
The only modulation format that allows the 15%_ac100 FEC mode is pm-qpsk. Attempting to change the
modulation from pm-qpsk while 15%_ac100 FEC is configured is not allowed. First change the FEC mode to
something other than 15%_ac100, and then change the modulation.
ModulationFormat = 16-qam
DifferentialEncoding = false
...
The following example command reverts to differential encoding (the default) for L1:
TxEnable = true
TxGridSpacing = 50ghz
TxChannel = 52
OutputPower = 10.0
TxFineTuneFrequency = 0
MasterEnable = true
ModulationFormat = 16-qam
DifferentialEncoding = true
FecMode = 15%
...
To disable loopback mode, run the net del interface <interface> facility-loopback
command. The following example disables loopback mode on the L1, L2, L3, and L4 network interfaces:
To enable loopback on the client interface (internal loopback for DWDM testing), edit the
/etc/cumulus/transponders.ini file. See Edit the transponder.ini file (see page 309) below.
Network Interfaces      L3                          L4
                        --------------------------- ---------------------------
Modulation              16-qam                      16-qam
Frequency               193.70 THz, Channel 52      193.70 THz, Channel 52
Current BER             1.428e-04                   1.387e-05
Current OSNR            84.90dBm                    84.80dBm
TX/RX Power             0.99dBm/0.66dBm             1.00dBm/0.43dBm
Encoding                differential                differential
Alignment               TX & RX                     TX & RX
Grid Spacing            50ghz                       50ghz
FEC Mode                25%                         25%
Uncorrectable FEC Errs  0                           0
TX/RX Turn-up           power_adjusted/locked       power_adjusted/locked

Network Interfaces      L1                          L2
                        --------------------------- ---------------------------
Modulation              16-qam                      16-qam
Frequency               193.70 THz, Channel 52      193.70 THz, Channel 52
Current BER             7.039e-05                   7.404e-05
Current OSNR            84.90dBm                    84.80dBm
TX/RX Power             0.98dBm/0.48dBm             0.99dBm/-0.78dBm
Encoding                differential                differential
To display only the status of a particular module, use the module <trans-module> option, which
specifies the transponder module number. The following example command displays the status of
transponder module 1:
Network Interfaces      L3                          L4
                        --------------------------- ---------------------------
Modulation              16-qam                      16-qam
Frequency               193.70 THz, Channel 52      193.70 THz, Channel 52
Current BER             1.626e-04                   1.343e-05
Current OSNR            84.90dBm                    84.80dBm
TX/RX Power             1.00dBm/0.67dBm             0.99dBm/0.42dBm
Encoding                differential                differential
Alignment               TX & RX                     TX & RX
Grid Spacing            50ghz                       50ghz
FEC Mode                25%                         25%
Uncorrectable FEC Errs  0                           0
TX/RX Turn-up           power_adjusted/locked       power_adjusted/locked
To display more information, including the host interfaces, use the verbose option. The following example
command displays more information about the transponder module:
To display all status information in JSON format, use the json option. The following example command
displays all status information in JSON format:
The following example command displays a map of available channel frequencies, numbers, and
wavelengths in JSON format.
],
[
4,
191.3,
1567.13
],
...
transponders
  AC400_1
    Location 1
    NetworkMode independent
    L3
      Location 0
      TxEnable true
      TxGridSpacing 50ghz
      TxChannel 52
      OutputPower 1
      TxFineTuneFrequency 0
      MasterEnable true
      ModulationFormat 16-qam
      DifferentialEncoding true
      FecMode 25%
      Loopback false
      TxTributaryIndependent 0 1
      TxTributaryCoupled 0 1 2 15
...
Using NCLU commands to configure the transponder modules is the preferred method.
However, not all configuration options are available with NCLU. If you want to change a
transponder module configuration setting that does not have an NCLU command, you can
change the setting manually in the transponders.ini file, then initiate the hardware update.
Use caution when editing the /etc/cumulus/transponders.ini file.
#
# Configuration file for Voyager transponder modules
#
[Modules]
Names=AC400_1,AC400_2
[AC400_1]
Location=1
NetworkMode=independent
NetworkInterfaces=L3,L4
HostInterfaces=Client0,Client1,Client2,Client3
OperStatus=ready
[AC400_2]
Location=2
NetworkMode=independent
NetworkInterfaces=L1,L2
HostInterfaces=Client4,Client5,Client6,Client7
OperStatus=ready
[L1]
Location=0
TxEnable=true
TxGridSpacing=50ghz
TxChannel=52
OutputPower=1
TxFineTuneFrequency=0
MasterEnable=true
ModulationFormat=16-qam
DifferentialEncoding=true
FecMode=25%
TxTributaryIndependent=0,1
TxTributaryCoupled=0,1,2,15
Loopback=false
[L2]
Location=1
TxEnable=true
TxGridSpacing=50ghz
TxChannel=52
OutputPower=1
TxFineTuneFrequency=0
MasterEnable=true
ModulationFormat=16-qam
DifferentialEncoding=true
FecMode=25%
TxTributaryIndependent=2,3
TxTributaryCoupled=0,1,2,15
Loopback=false
[L3]
Location=0
TxEnable=true
TxGridSpacing=50ghz
TxChannel=52
OutputPower=1
TxFineTuneFrequency=0
MasterEnable=true
ModulationFormat=16-qam
DifferentialEncoding=true
FecMode=25%
TxTributaryIndependent=0,1
TxTributaryCoupled=0,1,2,15
Loopback=false
[L4]
Location=1
TxEnable=true
TxGridSpacing=50ghz
TxChannel=52
OutputPower=1
TxFineTuneFrequency=0
MasterEnable=true
ModulationFormat=16-qam
DifferentialEncoding=true
FecMode=25%
TxTributaryIndependent=2,3
TxTributaryCoupled=0,1,2,15
Loopback=false
[Client0]
Location=0
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=6
SerialTap2Gain=12
SerialTap2Delay=6
RxTributaryIndependent=0
RxTributaryCoupled=0
Loopback=false
[Client1]
Location=1
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=6
SerialTap2Gain=12
SerialTap2Delay=6
RxTributaryIndependent=1
RxTributaryCoupled=1
Loopback=false
[Client2]
Location=2
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=6
SerialTap2Gain=12
SerialTap2Delay=6
RxTributaryIndependent=2
RxTributaryCoupled=2
Loopback=false
[Client3]
Location=3
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=6
SerialTap2Gain=12
SerialTap2Delay=6
RxTributaryIndependent=3
RxTributaryCoupled=65535
Loopback=false
[Client4]
Location=0
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=5
SerialTap2Gain=9
SerialTap2Delay=5
RxTributaryIndependent=0
RxTributaryCoupled=0
Loopback=false
[Client5]
Location=1
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=5
SerialTap2Gain=9
SerialTap2Delay=5
RxTributaryIndependent=1
RxTributaryCoupled=1
Loopback=false
[Client6]
Location=2
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=5
SerialTap2Gain=9
SerialTap2Delay=5
RxTributaryIndependent=2
RxTributaryCoupled=2
Loopback=false
[Client7]
Location=3
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=5
SerialTap2Gain=9
SerialTap2Delay=5
RxTributaryIndependent=3
RxTributaryCoupled=65535
Loopback=false
Modules Group
The Modules group identifies the names of the other groups in the file. This is the root group from which
all other groups are referenced; it must always be the first group in the file and must be named Modules.
There is only one key-value pair in this group. Each value in the list represents a transponder in the system.
There must be a group within the file that has the same name as each value in the list.
The following example shows that there are two modules in the system named AC400_1 and AC400_2. The
transponders.ini file must contain these two groups.
[Modules]
Names=AC400_1,AC400_2
Module Groups
The module groups are individual groups for each of the predefined modules and define the attributes of
the transponders in the system. The name of a module group is defined in the values of the Names key in
the Modules group (shown above).
The following table describes the key-value pairs in the module groups.
Key                     Value                     Description
Location                Integer: 1 or 2           The location or identifier of the module within Voyager.
                                                  Voyager has two modules, which are identified by indexes
                                                  1 and 2.
                                                  Module 1 is connected to external network interfaces
                                                  labeled L3 and L4.
                                                  Module 2 is connected to L1 and L2.
NetworkMode             String: independent       The overall mode of the two network interfaces on the
                        or coupled                module:
                                                  In coupled mode, traffic from a client interface travels
                                                  on both network interfaces.
                                                  In independent mode, traffic from a client interface
                                                  travels on only one network interface.
NetworkInterfaces       Comma-separated list      Each value in the list represents a network interface
                        of network interface      connected to this module. There must be a group within
                        group names               the file that has the same name as each value in the
                                                  list. Network interfaces are the module interfaces that
                                                  leave the Voyager platform and are labeled L1, L2, L3,
                                                  and L4 on the front of the Voyager.
                                                  Note: Although you can use any string for the network
                                                  interface group names, Cumulus Networks recommends that
                                                  you use the labels on the front of the Voyager to avoid
                                                  confusion.
HostInterfaces          Comma-separated list      Each value in this list represents a client interface
                        of client interface       connected to this module. There must be a group within
                        group names               the file that has the same name as each value in the
                                                  list. Client interfaces are the module interfaces that
                                                  connect to the Tomahawk switching ASIC.
The following example provides the configuration for module 1. The network interfaces are configured to
operate independently and are defined in the L3 and L4 groups in the file. The client interfaces are defined
in the Client0, Client1, Client2, and Client3 groups in the file. The operational status of the module is ready.
[AC400_1]
Location=1
NetworkMode=independent
NetworkInterfaces=L3,L4
HostInterfaces=Client0,Client1,Client2,Client3
OperStatus=ready
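The cross-references between the Modules group, the module groups, and the interface groups can be checked mechanically. The following sketch uses the Python standard library configparser module; the function name missing_groups and the shortened sample file are illustrative only. Interpolation is disabled because values such as 25% would otherwise be treated as interpolation syntax.

```python
import configparser


def missing_groups(ini_text: str) -> list:
    """Return group names that are referenced but not defined in the file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names such as Names case-sensitive
    parser.read_string(ini_text)

    missing = []
    modules = [n.strip() for n in parser["Modules"]["Names"].split(",")]
    for module in modules:
        if not parser.has_section(module):
            missing.append(module)
            continue
        for key in ("NetworkInterfaces", "HostInterfaces"):
            for name in parser[module].get(key, "").split(","):
                name = name.strip()
                if name and not parser.has_section(name):
                    missing.append(name)
    return missing


SAMPLE = """
[Modules]
Names=AC400_1
[AC400_1]
Location=1
NetworkMode=independent
NetworkInterfaces=L3,L4
HostInterfaces=Client0
OperStatus=ready
[L3]
FecMode=25%
[Client0]
Rate=100ge
"""

if __name__ == "__main__":
    print(missing_groups(SAMPLE))  # the sample deliberately omits [L4]
```

This check covers only the naming rules described here; it does not verify that Modules is the first group in the file or validate individual values.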
The following table describes the key-value pairs in the network interface groups.
Module Location  Network Interface Location  Label
2                0                           L1
2                1                           L2
1                0                           L3
1                1                           L4
TxGridSpacing           String: 100ghz, 50ghz,    Defines the channel spacing. The AC400 does not support
                        33ghz, 25ghz, 12.5ghz,    variable-width channels; only different channel center
                        or 6.25ghz                frequencies.
                                                  The default is 50ghz. Only 50ghz and 12.5ghz are
                                                  supported.
TxChannel               Integer: 1-100            The channel number upon which the network interface
                                                  transmits and receives data.
                                                  The frequency (THz) and wavelength (nm) for each
                                                  channel are as follows:
1 191.15 1,568.36
2 191.20 1,567.95
3 191.25 1,567.54
4 191.30 1,567.13
5 191.35 1,566.72
6 191.40 1,566.31
7 191.45 1,565.91
8 191.50 1,565.50
9 191.55 1,565.09
10 191.60 1,564.68
11 191.65 1,564.27
12 191.70 1,563.86
13 191.75 1,563.46
14 191.80 1,563.05
15 191.85 1,562.64
16 191.90 1,562.23
17 191.95 1,561.83
18 192.00 1,561.42
19 192.05 1,561.01
20 192.10 1,560.61
21 192.15 1,560.20
22 192.20 1,559.79
23 192.25 1,559.39
24 192.30 1,558.98
25 192.35 1,558.58
26 192.40 1,558.17
27 192.45 1,557.77
28 192.50 1,557.36
29 192.55 1,556.96
30 192.60 1,556.56
31 192.65 1,556.15
32 192.70 1,555.75
33 192.75 1,555.34
34 192.80 1,554.94
35 192.85 1,554.54
36 192.90 1,554.13
37 192.95 1,553.73
38 193.00 1,553.33
39 193.05 1,552.93
40 193.10 1,552.52
41 193.15 1,552.12
42 193.20 1,551.72
43 193.25 1,551.32
44 193.30 1,550.92
45 193.35 1,550.52
46 193.40 1,550.12
47 193.45 1,549.72
48 193.50 1,549.32
49 193.55 1,548.92
50 193.60 1,548.52
51 193.65 1,548.12
52 193.70 1,547.72
53 193.75 1,547.32
54 193.80 1,546.92
55 193.85 1,546.52
56 193.90 1,546.12
57 193.95 1,545.72
58 194.00 1,545.32
59 194.05 1,544.92
60 194.10 1,544.53
61 194.15 1,544.13
62 194.20 1,543.73
63 194.25 1,543.33
64 194.30 1,542.94
65 194.35 1,542.54
66 194.40 1,542.14
67 194.45 1,541.75
68 194.50 1,541.35
69 194.55 1,540.95
70 194.60 1,540.56
71 194.65 1,540.16
72 194.70 1,539.77
73 194.75 1,539.37
74 194.80 1,538.98
75 194.85 1,538.58
76 194.90 1,538.19
77 194.95 1,537.79
78 195.00 1,537.40
79 195.05 1,537.00
80 195.10 1,536.61
81 195.15 1,536.22
82 195.20 1,535.82
83 195.25 1,535.43
84 195.30 1,535.04
85 195.35 1,534.64
86 195.40 1,534.25
87 195.45 1,533.86
88 195.50 1,533.47
89 195.55 1,533.07
90 195.60 1,532.68
91 195.65 1,532.29
92 195.70 1,531.90
93 195.75 1,531.51
94 195.80 1,531.12
95 195.85 1,530.73
96 195.90 1,530.33
97 195.95 1,529.94
98 196.00 1,529.55
99 196.05 1,529.16
TxFineTuneFrequency
  Integer
  The fine tune frequency of the laser in units of 1 Hz. The AC400 modules on Voyager are only
  capable of 1 MHz resolution; you must specify this value in multiples of 1,000,000. The default
  value is 0.
MasterEnable
  Boolean: true or false
  Enables (true) or disables (false) the ability of the network lane modem to turn-up when leaving
  the low power state.
FecMode
  String: 15%, 15%_non_std, or 25%
  Selects the type of forward error correction used on the network interface.
  15% selects the 15% SDFEC
  25% selects the 25% SDFEC
  15%_non_std selects the 15% overhead AC100 compatible SDFEC
TxTributaryIndependent
  List of two comma-separated integers
  Defines which client interfaces map to this network interface when NetworkMode for the network
  interface is set to independent. The integers in the list are the Location values of the client
  interfaces. When operating in pm-qpsk, only the first client interface in the list is used.
  Note: Cumulus Networks STRONGLY recommends that you do not change this value. The Tomahawk
  switching ASIC should be configured to steer data to the appropriate network interface, not this
  attribute.
TxTributaryCoupled
  List of four comma-separated integers
  Defines which client interfaces map to this network interface when NetworkMode for the network
  interface is set to coupled. The integers in the list are the Location values of the client
  interfaces. When operating in 8-qam, only the first three client interfaces in the list are used
  and only the attribute on the network interface at location 0 is used.
  Note: Cumulus Networks STRONGLY recommends that you do not change this value. The Tomahawk
  switching ASIC should be configured to steer data to the appropriate network interface, not this
  attribute.
Loopback
  Boolean: true or false
  Enables (true) or disables (false) line side loopback mode on a network interface. When enabled,
  you send and receive data from the same network interface port to verify that the port is
  operational.
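The TxChannel listing above appears to follow a regular pattern: each channel sits 50 GHz above the previous one, with channel 1 at 191.15 THz, and each wavelength is the speed of light divided by the frequency. The Python sketch below works under that assumption. The base frequency, step and function names are inferred from the listed values for illustration only; they are not taken from the AC400 documentation, and other grid spacings are not covered.

```python
# Sketch only: the 191.10 THz base and 50 GHz step are inferred from the
# TxChannel listing, not from vendor documentation.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
BASE_GHZ = 191_100  # channel 1 is one step above this value (191.15 THz)
STEP_GHZ = 50


def channel_frequency_thz(channel: int) -> float:
    """Return the frequency in THz for a channel number (1 or higher)."""
    if channel < 1:
        raise ValueError("channel numbers start at 1")
    return (BASE_GHZ + STEP_GHZ * channel) / 1000


def channel_wavelength_nm(channel: int) -> float:
    """Return the wavelength in nm, rounded to two decimals like the listing."""
    if channel < 1:
        raise ValueError("channel numbers start at 1")
    ghz = BASE_GHZ + STEP_GHZ * channel
    # c [m/s] divided by f [GHz] gives the wavelength directly in nm.
    return round(SPEED_OF_LIGHT_M_PER_S / ghz, 2)


if __name__ == "__main__":
    for ch in (1, 52, 99):
        print(ch, channel_frequency_thz(ch), channel_wavelength_nm(ch))
```

A spot check against rows 1, 52 and 99 of the listing is a quick way to confirm that the inferred pattern holds before relying on it.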
The following example shows a network interface at location 0, which has transmission enabled and
50ghz channel spacing. Communication occurs on channel 52 with 1 dBm of power. The network interface
becomes operational when leaving the low power state. 16-qam encoding is used (200G) with differential
encoding and 25% overhead SDFEC. The tributary mappings of the client interfaces are left unchanged.
Loopback mode is disabled.
[L1]
Location=0
TxEnable=true
TxGridSpacing=50ghz
TxChannel=52
OutputPower=1
TxFineTuneFrequency=0
MasterEnable=true
ModulationFormat=16-qam
DifferentialEncoding=true
FecMode=25%
TxTributaryIndependent=0,1
TxTributaryCoupled=0,1,2,15
Loopback=false
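Before reloading a changed configuration, it can help to check a network interface group against the constraints listed in the attribute table. The following Python sketch does this for the example group, assuming the file can be read as INI-style text by the standard configparser module. The function name and the selection of checks are illustrative; this is not a tool shipped with Cumulus Linux.

```python
import configparser

SAMPLE = """\
[L1]
Location=0
TxEnable=true
TxGridSpacing=50ghz
TxChannel=52
OutputPower=1
TxFineTuneFrequency=0
MasterEnable=true
ModulationFormat=16-qam
DifferentialEncoding=true
FecMode=25%
TxTributaryIndependent=0,1
TxTributaryCoupled=0,1,2,15
Loopback=false
"""

FEC_MODES = {"15%", "15%_non_std", "25%"}


def check_network_interface(text: str, section: str = "L1") -> list:
    """Return a list of problems found in one network interface group."""
    # interpolation=None stops the '%' in FecMode from being read as
    # configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep attribute names case-sensitive
    parser.read_string(text)
    group = parser[section]

    problems = []
    if not 1 <= group.getint("TxChannel") <= 100:
        problems.append("TxChannel must be between 1 and 100")
    if group.getint("TxFineTuneFrequency") % 1_000_000 != 0:
        problems.append("TxFineTuneFrequency must be a multiple of 1,000,000")
    if group["FecMode"] not in FEC_MODES:
        problems.append("FecMode must be 15%, 15%_non_std, or 25%")
    for name in ("TxEnable", "MasterEnable", "Loopback"):
        if group[name] not in ("true", "false"):
            problems.append(name + " must be true or false")
    return problems


if __name__ == "__main__":
    print(check_network_interface(SAMPLE))
```

The parser is created with interpolation disabled because FecMode values contain a literal percent sign, which the default configparser settings would reject when the value is read.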
Important
Because client interfaces are internal interfaces between the transponder module and the
Tomahawk switching ASIC, the default values of these attributes do not typically need to be
changed.
Location
  Integer: 0-3
  The location or index of the client interface within a module. The Voyager AC400 modules each
  have four network interfaces that are connected to the Tomahawk ASIC as follows:
  1 0 fc11
  1 1 fc12
  1 2 fc10
  1 3 fc9
  2 0 fc19
  2 1 fc18
  2 2 fc17
  2 3 fc16
Rate
  String: otu4 or 100ge
  The rate at which the client interface operates. Because the client interfaces on Voyager are
  always connected to a Tomahawk ASIC, always set this value to 100ge.
  Boolean: true or false
FecDecoder
  Boolean: true or false
  Enables (true) or disables (false) FEC decoding for data received from the Tomahawk switching
  ASIC.
FecEncoder
  Boolean: true or false
  Enables (true) or disables (false) FEC encoding for data sent to the Tomahawk switching ASIC.
DeserialLfCtleGain   Integer: 0-8
DeserialCtleGain     Integer: 0-20
DeserialDfeCoeff     Integer: 0-63
SerialTap0Gain       Integer: 0-7
SerialTap0Delay      Integer: 0-7
SerialTap1Gain       Integer: 0-7
SerialTap2Gain       Integer: 0-15
SerialTap2Delay      Integer: 0-7
  These attributes configure the SERDES of the client interface. The values for these attributes
  have been carefully determined by hardware engineers; do not change them.
RxTributaryIndependent
  Integer: 0-1
  Defines which network interface maps to this client interface when NetworkMode for the client
  interface is set to independent. The integer is the Location value of the network interface.
  Note: Cumulus Networks STRONGLY recommends that you do not change this value. The Tomahawk
  switching ASIC should be configured to steer data from the appropriate network interface, not
  this attribute.
RxTributaryCoupled
  Integer: 0-1
  Defines which network interface maps to this client interface when NetworkMode for the client
  interface is set to coupled. The integer is the Location value of the network interface.
  Note: Cumulus Networks STRONGLY recommends that you do not change this value. The Tomahawk
  switching ASIC should be configured to steer data from the appropriate network interface, not
  this attribute.
The following example shows a sample configuration for a client interface group.
[Client0]
Location=0
Rate=100ge
Enable=true
FecDecoder=false
FecEncoder=false
DeserialLfCtleGain=1
DeserialCtleGain=18
DeserialDfeCoeff=0
SerialTap0Gain=3
SerialTap0Delay=3
SerialTap1Gain=6
SerialTap2Gain=12
SerialTap2Delay=6
RxTributaryIndependent=0
RxTributaryCoupled=0
Loopback=false
Depending on the configuration changes, programming the change into the hardware can take a long time
to complete (several minutes). The systemd reload command initiates the configuration update and
returns immediately. To monitor the progress of the configuration changes, review the syslog messages.
The following is an example of the syslog messages.
802.1X Interfaces
The IEEE 802.1X protocol provides a method of authenticating a client (called a supplicant) over wired
media. It also provides access for individual MAC addresses on a switch (called the authenticator) after those
MAC addresses have been authenticated by an authentication server — typically a RADIUS (see page 135)
(Remote Authentication Dial In User Service, defined by RFC 2865) server.
A Cumulus Linux switch acts as an intermediary between the clients connected to the wired ports and the
authentication server, which is reachable over the existing network. EAPOL (Extensible Authentication
Protocol (EAP) over LAN — EtherType value of 0x888E, defined by RFC 3748) operates on top of the data
link layer; the switch uses EAPOL to communicate with supplicants connected to the switch ports.
Cumulus Linux implements 802.1X through the Debian hostapd package, which has been modified to
provide the PAE (port access entity).
Contents
This topic describes ...
Supported Features and Limitations (see page 329)
Install the 802.1X Package (see page 329)
Configure 802.1X Interfaces (see page 330)
Configure 802.1X Interfaces for a VLAN-aware Bridge (see page 330)
Configure 802.1X Interfaces for a Traditional Mode Bridge (see page 331)
Configure the Linux Supplicants (see page 333)
Configure Accounting and Authentication Ports (see page 334)
Configure MAC Authentication Bypass (see page 335)
Configure a Parking VLAN (see page 336)
Configure Dynamic VLAN Assignments (see page 338)
RADIUS Change of Authorization and Disconnect Requests (see page 340)
Configure DAS (see page 341)
Terminate a User Session (see page 342)
Changing the interface dot1x, dot1x mab, or dot1x parking-vlan settings does not
reset existing authorized user ports.
This has been tested with only a few wpa_supplicant (Debian), Windows 10 and Windows 7
supplicants.
RADIUS authentication is supported with FreeRADIUS and Cisco ACS.
Supports simple login/password, PEAP/MSCHAPv2 (Win7) and EAP-TLS (Debian).
There is no support for Mako template-based configurations.
1. Create a simple interface bridge configuration on the switch and add the switch ports that are
members of the bridge. You can use glob syntax to add a range of interfaces. The MAB and parking
VLAN configurations require interfaces to be bridge access ports. The VLAN-aware bridge must be
named bridge and there can be only one VLAN-aware bridge on a switch.
2. Configure the settings for the 802.1X RADIUS server, including its IP address and shared secret:
3. Enable 802.1X on interfaces, then review and commit the new configuration:
These commands create the following configuration snippet in the /etc/network/interfaces file:
...
auto swp1
iface swp1
bridge-learning off
auto swp2
iface swp2
bridge-learning off
auto swp3
iface swp3
bridge-learning off
auto swp4
iface swp4
bridge-learning off
...
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4
bridge-vlan-aware yes
Verify the 802.1X configuration, showing the configuration and its status:
NCLU and hostapd may change traditional mode configurations on the bridge-ports line in
/etc/network/interfaces by adding or deleting special 802.1X traditional mode bridge-ports
configuration stanzas in /etc/network/interfaces.d/. The source configuration command in
/etc/network/interfaces must include these special configuration filenames. Include at least
source /etc/network/interfaces.d/*.intf so that these files are sourced during an ifreload.
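One way to confirm that the source line covers the generated stanza files is to compare each source pattern against the path of a file in /etc/network/interfaces.d/. The Python sketch below illustrates the idea. The file name dot1x.intf and the function name are hypothetical, and fnmatch is only an approximation of the glob matching that ifupdown2 performs (for instance, its * also matches a / character).

```python
import fnmatch


def sourced_by(interfaces_text: str, path: str) -> bool:
    """Return True if any 'source' line has a pattern that matches path."""
    for line in interfaces_text.splitlines():
        words = line.split()
        # Only consider lines of the form: source <pattern>
        if len(words) == 2 and words[0] == "source":
            if fnmatch.fnmatchcase(path, words[1]):
                return True
    return False


if __name__ == "__main__":
    text = "source /etc/network/interfaces.d/*.intf\n"
    # dot1x.intf is a made-up file name used only for this illustration.
    print(sourced_by(text, "/etc/network/interfaces.d/dot1x.intf"))
```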
2. Create a traditional mode bridge configuration on the switch and add the switch ports that are
members of the bridge. A traditional mode bridge cannot be named bridge because that name is
reserved for the single VLAN-aware bridge on the switch. You can use glob syntax to add a range of
interfaces.
3. Create bridge associations with the parking VLAN ID and the dynamic VLAN IDs. In this example, 600
is used for the parking VLAN ID and 700 is used for the dynamic VLAN ID:
4. Configure the settings for the 802.1X RADIUS server, including its IP address and shared secret:
5. Enable 802.1X on interfaces, then review and commit the new configuration:
Verify the 802.1X configuration, showing the configuration and its status:
ctrl_interface=/var/run/wpa_supplicant
ctrl_interface_group=0
eapol_version=2
ap_scan=0
network={
key_mgmt=IEEE8021X
eap=TTLS MD5
identity="host1"
anonymous_identity="host1"
password="host1password"
phase1="auth=MD5"
eapol_flags=0
}
ctrl_interface=/var/run/wpa_supplicant
ctrl_interface_group=0
eapol_version=2
ap_scan=0
network={
key_mgmt=IEEE8021X
eap=TTLS MD5
identity="host2"
anonymous_identity="host2"
password="host2password"
phase1="auth=MD5"
eapol_flags=0
}
To test that a supplicant (client) can communicate with the Cumulus Linux Authenticator switch, run the
following command from the supplicant:
MAB supports one authenticated MAC address per port only. After a source MAC address is
authenticated, the port exits MAB mode.
If the authentication for swp1 fails, the port is moved to the parking VLAN:
The following output shows a parking VLAN association failure. VLAN association failure only occurs with
traditional mode bridges when there is no traditional bridge available with a parking VLAN ID-tagged
subinterface in it (notice the [UNKNOWN_BR] status in the output):
You can specify the require option in the command so that VLAN attributes are required. If VLAN
attributes do not exist in the access response packet returned from the RADIUS server, the user is not
authorized and has no connectivity. If the RADIUS server returns VLAN attributes but the user has an
incorrect password, the user is placed in the parking VLAN (if you have configured parking VLAN).
The following example shows a typical RADIUS configuration (shown for FreeRADIUS, not typically
configured or run on the Cumulus Linux device) for a user with dynamic VLAN assignment:
The following output shows a dynamic VLAN association failure. VLAN association failure only occurs with
traditional mode bridges when there is no traditional bridge available with a parking VLAN ID-tagged
subinterface in it (notice the [UNKNOWN_BR] status in the output):
To disable dynamic VLAN assignment, where VLAN attributes sent from the RADIUS server are ignored and
users are authenticated based on existing credentials:
Enabling or disabling dynamic VLAN assignment restarts hostapd, which forces existing,
authorized users to re-authenticate.
RADIUS CoA and disconnect requests are supported on a traditional-mode bridge only.
Configure DAS
To configure DAS, provide the UDP port (3799 is the default port), the IP address, and the secret key for the
DAS client.
The following example commands set the UDP port to the default port, the IP address of the DAS client to
10.0.2.228, and the secret key to myclientsecret:
You can disable DAS in Cumulus Linux at any time by running the following commands:
To see DAS configuration information, run the net show configuration dot1x command. For
example:
To prevent unauthorized servers from disconnecting users, the Disconnect-Request packet must include
certain identification attributes (described below). For a session to be disconnected, all parameters must
match their expected values at the switch. If the parameters do not match, the switch discards the
Disconnect-Request packet and sends a Disconnect-NAK (negative acknowledgment message).
The Message-Authenticator attribute is required.
If the packet comes from a different source IP address than the one defined by das-client-ip,
the session is not disconnected and hostapd logs the debug message: DAS: Drop message
from unknown client.
If the Acct-Session-Id attribute is omitted, the User-Name attribute is used to find the session.
If the User-Name attribute is omitted, the Acct-Session-Id attribute is used. If both the
User-Name and the Acct-Session-Id attributes are supplied, both must match the same session: the
username provided by the supplicant and the Acct-Session-Id of that session. If neither is given
or there is no match, a Disconnect-NAK message is returned to the RADIUS server with Error-Cause
"Session-Context-Not-Found" and the following debug messages are shown in the log:
RADIUS DAS: Acct-Session-Id match
RADIUS DAS: No matches remaining after User-Name check
hostapd_das_find_global_sta: checking ifname=swp2
RADIUS DAS: No matches remaining after Acct-Session-Id check
RADIUS DAS: No matching session found
DAS: Session not found for request from 10.10.0.1:58385
DAS: Reply to 10.10.0.1:58385
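The session lookup rules described above can be summarized in a short Python sketch. It models only the source address check and the User-Name and Acct-Session-Id matching; Message-Authenticator validation is left out, and the Session class and function names are hypothetical. This is an illustration of the rules as described here, not the hostapd implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Session:
    user_name: str
    acct_session_id: str


def from_known_client(source_ip: str, das_client_ip: str) -> bool:
    """Requests from any other address are dropped without a session lookup."""
    return source_ip == das_client_ip


def find_session(sessions: Iterable[Session],
                 user_name: Optional[str] = None,
                 acct_session_id: Optional[str] = None) -> Optional[Session]:
    """Return the matching session, or None (Session-Context-Not-Found)."""
    if user_name is None and acct_session_id is None:
        return None  # neither attribute was supplied
    for session in sessions:
        # Every attribute that was supplied must match the same session.
        if user_name is not None and session.user_name != user_name:
            continue
        if acct_session_id is not None and session.acct_session_id != acct_session_id:
            continue
        return session
    return None


if __name__ == "__main__":
    active = [Session("somebody", "D91FE8E51802097")]
    print(find_session(active, user_name="somebody"))
```

A result of None corresponds to the case in which the switch answers with a Disconnect-NAK.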
The following is an example of the Disconnect-Request packet received by the switch:
RADIUS Protocol
Code: Disconnect-Request (40)
Packet identifier: 0x4f (79)
Length: 53
Authenticator: c0e1fa75fdf594a1cfaf35151a43c6a7
Attribute Value Pairs
AVP: t=Acct-Session-Id(44) l=17 val=D91FE8E51802097
AVP: t=User-Name(1) l=10 val=somebody
AVP: t=Message-Authenticator(80) l=18 val=38cb3b6896623b4b7d32f116fa976cdc
AVP: t=Event-Timestamp(55) l=6 val=1532974019
AVP: t=NAS-IP-Address(4) l=6 val=10.0.0.1
Bounce a Port
You can create a CoA bounce-host-port message from the RADIUS server using the radclient utility
(included in the Debian freeradius-utils package). The bounce port can cause a link flap on an
authentication port, which triggers DHCP renegotiation from one or more hosts connected to the port.
The following is an example of a Cisco AVPair CoA bounce-host-port message sent from the radclient utility:
RADIUS Protocol
Code: CoA-Request (43)
Packet identifier: 0x3a (58)
Length: 96
Authenticator: 6480d710802329269d5cae6a59bcfb59
Attribute Value Pairs
AVP: t=Acct-Session-Id(44) l=17 val=D91FE8E51802097
Type: 44
Length: 17
Acct-Session-Id: D91FE8E51802097
AVP: t=User-Name(1) l=10 val=somebody
Type: 1
Length: 10
User-Name: somebody
AVP: t=NAS-IP-Address(4) l=6 val=10.0.0.1
Type: 4
Length: 6
NAS-IP-Address: 10.0.0.1
AVP: t=Vendor-Specific(26) l=43 vnd=ciscoSystems(9)
Type: 26
Length: 43
Vendor ID: ciscoSystems (9)
VSA: t=Cisco-AVPair(1) l=37 val=subscriber:command=bounce-host-port
Type: 1
Length: 37
Cisco-AVPair: subscriber:command=bounce-host-port
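The length fields in the CoA-Request above can be checked by hand: each attribute is a one-byte type, a one-byte length and the value, and the packet starts with a 20-byte header. The Python sketch below rebuilds only the attribute encoding to show where the values 17, 10, 6, 43, 37 and 96 come from. It does not compute the authenticator or send anything, and the helper names are illustrative.

```python
import socket
import struct


def avp(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute: 1-byte type, 1-byte length, then the value."""
    return struct.pack("!BB", attr_type, 2 + len(value)) + value


def cisco_avpair(text: str) -> bytes:
    """Encode a Vendor-Specific (26) attribute carrying a Cisco-AVPair (1)."""
    vsa = avp(1, text.encode("ascii"))
    # 9 is the ciscoSystems vendor ID shown in the capture.
    return avp(26, struct.pack("!I", 9) + vsa)


attributes = b"".join([
    avp(44, b"D91FE8E51802097"),           # Acct-Session-Id
    avp(1, b"somebody"),                   # User-Name
    avp(4, socket.inet_aton("10.0.0.1")),  # NAS-IP-Address
    cisco_avpair("subscriber:command=bounce-host-port"),
])

# Header: code (1), identifier (1), length (2), authenticator (16).
HEADER_LENGTH = 20
packet_length = HEADER_LENGTH + len(attributes)

if __name__ == "__main__":
    print(packet_length)
```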
Troubleshooting
To check connectivity between two supplicants, ping one host from the other:
You can run net show dot1x with the following options for more data:
json: Prints the command output in JSON format.
dot1xAuthSessionUserName testing
dot1xPaePortProtocolVersion 2
last_eap_type_as 4 (MD5)
last_eap_type_sta 4 (MD5)
...
You can increase the debug level in hostapd by copying over the hostapd service file, then adding -d, -dd
or -ddd to the ExecStart line in the hostapd.service file:
...
Once installed and configured, the FreeRADIUS server can serve Cumulus Linux running hostapd as a
RADIUS client.
Contents
This topic describes ...
Supported Features (see page 348)
Configure PTM (see page 349)
Basic Topology Example (see page 349)
ptmd Scripts (see page 350)
Configuration Parameters (see page 350)
Host-only Parameters (see page 350)
Global Parameters (see page 351)
Per-port Parameters (see page 351)
Templates (see page 352)
Supported BFD and LLDP Parameters (see page 352)
Bidirectional Forwarding Detection (BFD) (see page 353)
Check Link State with FRRouting (see page 354)
ptmd Service Commands (see page 354)
ptmctl Commands (see page 355)
ptmctl Examples (see page 355)
ptmctl Error Outputs (see page 357)
Caveats and Errata (see page 358)
Related Information (see page 358)
Supported Features
Topology verification using LLDP. ptmd creates a client connection to the LLDP daemon, lldpd, and
retrieves the neighbor relationship between the nodes/ports in the network and compares them
against the prescribed topology specified in the topology.dot file.
Only physical interfaces, like swp1 or eth0, are currently supported. Cumulus Linux does not
support specifying virtual interfaces like bonds or subinterfaces like eth0.200 in the topology file.
Forwarding path failure detection using Bidirectional Forwarding Detection (BFD); however, demand
mode is not supported. For more information on how BFD operates in Cumulus Linux, read the
Bidirectional Forwarding Detection - BFD (see page 805) chapter and read man ptmd(8).
Integration with FRRouting (PTM to FRRouting notification).
Client management: ptmd creates an abstract named socket /var/run/ptmd.socket on startup.
Other applications can connect to this socket to receive notifications and send commands.
Event notifications: see Scripts below.
User configuration via a topology.dot file; see below (see page 349).
Configure PTM
ptmd verifies the physical network topology against a DOT-specified network graph file,
/etc/ptm.d/topology.dot.
This file must be present or else ptmd will not start. You can specify an alternate file using the -c
option.
PTM performs its LLDP neighbor check using the PortID ifname TLV information. Previously, it
used the PortID port description TLV information.
graph G {
"spine1":"swp1" -- "leaf1":"swp1";
"spine1":"swp2" -- "leaf2":"swp1";
"spine2":"swp1" -- "leaf1":"swp2";
"spine2":"swp2" -- "leaf2":"swp2";
"leaf1":"swp3" -- "leaf2":"swp3";
"leaf1":"swp4" -- "leaf2":"swp4";
"leaf1":"swp5s0" -- "server1":"eth1";
"leaf2":"swp5s0" -- "server2":"eth1";
}
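To see which neighbor the topology file prescribes for each port of a given node, the edges can be read into a lookup table. The Python sketch below does this with a regular expression that handles only the quoted "node":"port" -- "node":"port" form shown above. It is not the parser that ptmd uses, and the function names are illustrative.

```python
import re

TOPOLOGY = '''
graph G {
    "spine1":"swp1" -- "leaf1":"swp1";
    "spine1":"swp2" -- "leaf2":"swp1";
    "spine2":"swp1" -- "leaf1":"swp2";
    "spine2":"swp2" -- "leaf2":"swp2";
    "leaf1":"swp3" -- "leaf2":"swp3";
    "leaf1":"swp4" -- "leaf2":"swp4";
    "leaf1":"swp5s0" -- "server1":"eth1";
    "leaf2":"swp5s0" -- "server2":"eth1";
}
'''

EDGE = re.compile(r'"([^"]+)":"([^"]+)"\s*--\s*"([^"]+)":"([^"]+)"')


def parse_edges(dot_text: str) -> list:
    """Return (node, port, peer node, peer port) tuples for each edge."""
    return [match.groups() for match in EDGE.finditer(dot_text)]


def expected_neighbors(edges: list, node: str) -> dict:
    """Map each local port of node to its prescribed (peer, peer port)."""
    neighbors = {}
    for a_node, a_port, b_node, b_port in edges:
        # Edges are undirected, so the node can appear on either side.
        if a_node == node:
            neighbors[a_port] = (b_node, b_port)
        if b_node == node:
            neighbors[b_port] = (a_node, a_port)
    return neighbors


if __name__ == "__main__":
    for port, peer in sorted(expected_neighbors(parse_edges(TOPOLOGY), "leaf1").items()):
        print(port, peer)
```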
ptmd Scripts
ptmd executes scripts at /etc/ptm.d/if-topo-pass and /etc/ptm.d/if-topo-fail for each
interface that goes through a change, running if-topo-pass when an LLDP or BFD check passes and
running if-topo-fail when the check fails. The scripts receive an argument string that is the result of
the ptmctl command, described in the ptmd commands section below (see page 354).
You should modify these default scripts as needed.
Configuration Parameters
You can configure ptmd parameters in the topology file. The parameters are classified as host-only, global,
per-port/node and templates.
Host-only Parameters
Host-only parameters apply to the entire host on which PTM is running. You can include the hostnametype
host-only parameter, which specifies whether PTM should use only the host name (hostname) or the
fully-qualified domain name (fqdn) while looking for the self-node in the graph file. For example, in
the graph file below, PTM will ignore the FQDN and only look for switch04, since that is the host name
of the switch it's running on:
It’s a good idea to always wrap the hostname in double quotes, like "www.example.com".
Otherwise, ptmd can fail if you specify a fully-qualified domain name as the hostname and do not
wrap it in double quotes.
Further, to avoid errors when starting the ptmd process, make sure that /etc/hosts and
/etc/hostname both reflect the hostname you are using in the topology.dot file.
graph G {
hostnametype="hostname"
BFD="upMinTx=150,requiredMinRx=250"
"cumulus":"swp44" -- "switch04.cumulusnetworks.com":"swp20"
"cumulus":"swp46" -- "switch04.cumulusnetworks.com":"swp22"
}
However, in this next example, PTM will compare using the FQDN and look for
switch05.cumulusnetworks.com, which is the FQDN of the switch it’s running on:
graph G {
hostnametype="fqdn"
"cumulus":"swp44" -- "switch05.cumulusnetworks.com":"swp20"
"cumulus":"swp46" -- "switch05.cumulusnetworks.com":"swp22"
}
Global Parameters
Global parameters apply to every port listed in the topology file. There are two global parameters: LLDP and
BFD. LLDP is enabled by default; if no keyword is present, default values are used for all ports. However,
BFD is disabled if no keyword is present, unless there is a per-port override configured. For example:
graph G {
LLDP=""
BFD="upMinTx=150,requiredMinRx=250,afi=both"
"cumulus":"swp44" -- "qct-ly2-04":"swp20"
"cumulus":"swp46" -- "qct-ly2-04":"swp22"
}
Per-port Parameters
Per-port parameters provide finer-grained control at the port level. These parameters override any global or
compiled defaults. For example:
graph G {
LLDP=""
BFD="upMinTx=300,requiredMinRx=100"
Templates
Templates provide flexibility in choosing different parameter combinations and applying them to a given
port. A template instructs ptmd to reference a named parameter string instead of a default one. There are
two parameter strings ptmd supports:
bfdtmpl, which specifies a custom parameter tuple for BFD.
lldptmpl, which specifies a custom parameter tuple for LLDP.
For example:
graph G {
LLDP=""
BFD="upMinTx=300,requiredMinRx=100"
BFD1="upMinTx=200,requiredMinRx=200"
BFD2="upMinTx=100,requiredMinRx=300"
LLDP1="match_type=ifname"
LLDP2="match_type=portdescr"
"cumulus":"swp44" -- "qct-ly2-04":"swp20" [BFD="bfdtmpl=BFD1", LLDP="lldptmpl=LLDP1"]
"cumulus":"swp46" -- "qct-ly2-04":"swp22" [BFD="bfdtmpl=BFD2", LLDP="lldptmpl=LLDP2"]
"cumulus":"swp46" -- "qct-ly2-04":"swp22"
}
In this template, LLDP1 and LLDP2 are templates for LLDP parameters while BFD1 and BFD2 are templates
for BFD parameters.
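The way a template reference selects a parameter string can be illustrated with a short Python sketch. It covers only the selection step: a per-port bfdtmpl value points to a named string, a per-port string without a template is used directly, and otherwise the global string applies. How ptmd fills in values that the selected string leaves unset is not modeled, and the function names are hypothetical.

```python
from typing import Dict, Optional


def parse_params(text: str) -> Dict[str, str]:
    """Turn 'upMinTx=300,requiredMinRx=100' into a dictionary."""
    return dict(item.split("=", 1) for item in text.split(",") if item)


def select_bfd_string(global_bfd: str,
                      templates: Dict[str, str],
                      port_bfd: Optional[str] = None) -> str:
    """Pick the BFD parameter string that applies to one port."""
    if port_bfd is None:
        return global_bfd  # no per-port setting
    name = parse_params(port_bfd).get("bfdtmpl")
    if name is not None:
        return templates[name]  # template reference such as bfdtmpl=BFD1
    return port_bfd  # plain per-port parameters


if __name__ == "__main__":
    templates = {
        "BFD1": "upMinTx=200,requiredMinRx=200",
        "BFD2": "upMinTx=100,requiredMinRx=300",
    }
    chosen = select_bfd_string("upMinTx=300,requiredMinRx=100", templates, "bfdtmpl=BFD1")
    print(parse_params(chosen))
```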
graph G {
"cumulus-1":"swp44" -- "cumulus-2":"swp20" [BFD="upMinTx=300,requiredMinRx=100,afi=v6"]
"cumulus-1":"swp46" -- "cumulus-2":"swp22" [BFD="detectMult=4"]
}
graph G {
"cumulus-1":"swp44" -- "cumulus-2":"swp20" [LLDP="match_hostname=fqdn"]
"cumulus-1":"swp46" -- "cumulus-2":"swp22" [LLDP="match_type=portdescr"]
}
When you specify match_hostname=fqdn, ptmd will match the entire FQDN, like
cumulus-2.domain.com in the example below. If you do not specify anything for match_hostname,
ptmd will match based on hostname only, like cumulus-3 below, and ignore the rest of the domain name:
graph G {
"cumulus-1":"swp44" -- "cumulus-2.domain.com":"swp20" [LLDP="match_hostname=fqdn"]
"cumulus-1":"swp46" -- "cumulus-3":"swp22" [LLDP="match_type=portdescr"]
}
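The difference between the two hostname matching behaviors can be sketched in Python as follows. The sketch assumes that hostname-only matching compares the part of each name before the first dot; that detail, like the function name, is an assumption made for illustration and is not taken from the ptmd source.

```python
from typing import Optional


def hostname_matches(expected: str, advertised: str,
                     match_hostname: Optional[str] = None) -> bool:
    """Compare the name in the topology file with the name a neighbor sends."""
    if match_hostname == "fqdn":
        # The entire fully-qualified name has to be identical.
        return expected == advertised
    # Assumed default: compare only the label before the first dot.
    return expected.split(".")[0] == advertised.split(".")[0]


if __name__ == "__main__":
    print(hostname_matches("cumulus-2.domain.com", "cumulus-2.domain.com", "fqdn"))
    print(hostname_matches("cumulus-3", "cumulus-3.domain.com"))
```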
You only need to do this to check link state; you don't need to enable PTM to determine BFD
status.
The check is enabled by default. Every interface has an implied ptm-enable line in the configuration
stanza in the interfaces file.
To disable the checks, delete the ptm-enable parameter from the interface. For example:
With PTM enabled on an interface, the zebra daemon connects to ptmd over a Unix socket. Any time there
is a change of status for an interface, ptmd sends notifications to zebra. Zebra maintains a ptm-status
flag per interface and evaluates routing adjacency based on this flag. To check the per-interface ptm-
status:
ptmctl Commands
ptmctl is a client of ptmd; it retrieves the operational state of the ports configured on the switch and
information about BFD sessions from ptmd. ptmctl parses the CSV notifications sent by ptmd.
See man ptmctl for more information.
ptmctl Examples
The examples below contain the following keywords in the output of the cbl status column, which are
described here:
cbl status keyword   Definition
pass   The interface is defined in the topology file, LLDP information is received on the interface,
       and the LLDP information for the interface matches the information in the topology file.
fail   The interface is defined in the topology file, LLDP information is received on the interface,
       and the LLDP information for the interface does not match the information in the topology
       file.
N/A    The interface is defined in the topology file, but no LLDP information is received on the
       interface. The interface may be down or disconnected, or the neighbor is not sending LLDP
       packets.
The "N/A" and "fail" statuses may indicate a wiring problem to investigate.
The "N/A" status is not shown when using the -l option with ptmctl. If you specify the -l
option, ptmctl displays only those interfaces that are receiving LLDP information.
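The three cbl status keywords amount to a small decision rule, which the following Python sketch spells out for interfaces that are defined in the topology file. The function names and the tuple representation of a neighbor are illustrative choices, not part of ptmctl.

```python
from typing import Dict, Optional, Tuple

Neighbor = Tuple[str, str]  # (system name, port)


def cbl_status(expected: Neighbor, received: Optional[Neighbor]) -> str:
    """Classify an interface that is defined in the topology file."""
    if received is None:
        return "N/A"  # no LLDP information received on the interface
    return "pass" if received == expected else "fail"


def lldp_rows(statuses: Dict[str, str]) -> Dict[str, str]:
    """Mimic the -l behaviour described above: omit interfaces without LLDP data."""
    return {port: status for port, status in statuses.items() if status != "N/A"}


if __name__ == "__main__":
    statuses = {
        "swp1": cbl_status(("h1", "swp1"), ("h1", "swp1")),
        "swp2": cbl_status(("h2", "swp1"), ("h3", "swp1")),
        "swp3": cbl_status(("h4", "swp1"), None),
    }
    print(statuses)
    print(lldp_rows(statuses))
```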
-------------------------------------------------------------
port cbl BFD BFD BFD BFD
status status peer local type
-------------------------------------------------------------
swp1 pass pass 11.0.0.2 N/A singlehop
swp2 pass N/A N/A N/A N/A
swp3 pass N/A N/A N/A N/A
--------------------------------------------------------------------------------------
port   cbl     exp      act      sysname  portID  portDescr  match   last    BFD   BFD
       status  nbr      nbr                                  on      upd     Type  state
--------------------------------------------------------------------------------------
swp45  pass    h1:swp1  h1:swp1  h1       swp1    swp1       IfName  5m: 5s  N/A   N/A
swp46  fail    h2:swp1  h2:swp1  h2       swp1    swp1       IfName  5m: 5s  N/A   N/A
To return information on active BFD sessions ptmd is tracking, use the -b option:
----------------------------------------------------------
port peer state local type diag
----------------------------------------------------------
swp1 11.0.0.2 Up N/A singlehop N/A
N/A 12.12.12.1 Up 12.12.12.4 multihop N/A
To return LLDP information, use the -l option. It returns only the active neighbors currently being tracked
by ptmd.
---------------------------------------------
To return detailed information on active BFD sessions ptmd is tracking, use the -b and -d options (results
are for an IPv6-connected peer):
----------------------------------------------------------------------------------------
port  peer                 state  local  type       diag  det   tx_timeout  rx_timeout
                                                          mult
----------------------------------------------------------------------------------------
swp1  fe80::202:ff:fe00:1  Up     N/A    singlehop  N/A   3     300         900
swp1  3101:abc:bcad::2     Up     N/A    singlehop  N/A   3     300         900
#continuation of output
---------------------------------------------------------------------
echo        echo        max      rx_ctrl  tx_ctrl  rx_echo  tx_echo
tx_timeout  rx_timeout  hop_cnt
---------------------------------------------------------------------
0           0           N/A      187172   185986   0        0
0           0           N/A      501      533      0        0
Unsupported command
For example:
If you encounter errors with the topology.dot file, you can use dot (included in the Graphviz
package) to validate the syntax of the topology file.
By simply opening the topology file with Graphviz, you can ensure that it is readable and that the
file format is correct.
If you edit the topology.dot file on a Windows system, be sure to double check the file
formatting; there may be extra characters that keep the graph from working correctly.
Related Information
Bidirectional Forwarding Detection (BFD)
Graphviz
LLDP on Wikipedia
PTMd GitHub repo
Layer 2
Contents
This topic describes ...
Supported Modes (see page 360)
STP for a VLAN-aware Bridge (see page 360)
STP within a Traditional Mode Bridge (see page 361)
View Bridge and STP Status and Logs (see page 361)
Customize Spanning Tree Protocol (see page 365)
Spanning Tree Priority (see page 365)
PortAdminEdge (PortFast Mode) (see page 366)
PortAutoEdge (see page 367)
BPDU Guard (see page 367)
Bridge Assurance (see page 370)
BPDU Filter (see page 370)
Storm Control (see page 371)
Spanning Tree Parameter List (see page 371)
Caveats and Errata (see page 378)
Related Information (see page 378)
Supported Modes
The STP modes Cumulus Linux supports vary depending upon whether the traditional or VLAN-aware
bridge driver mode is in use.
Bridges configured in VLAN-aware (see page 402) mode operate only in RSTP mode.
Bridges configured in traditional mode (see page 414) operate in both PVST and PVRST mode. The
default is set to PVRST. Each traditional bridge has its own separate STP instance.
When connecting a VLAN-aware bridge to a proprietary PVST+ switch using STP, VLAN 1 must be
allowed on all 802.1Q trunks that interconnect them, regardless of the configured native VLAN.
This is because only VLAN 1 enables the switches to address the BPDU frames to the IEEE
multicast MAC address. The proprietary switch might be configured like this:
When connected to a switch that has a native VLAN configuration, the native VLAN must be
configured to be VLAN 1 only for maximum interoperability.
mstpd is the preferred utility for interacting with STP on Cumulus Linux. brctl also provides
certain methods for configuring STP; however, they are not as complete as the tools offered in
mstpd and output from brctl can be misleading in some cases.
Cumulus Linux supports MSTI 0 only. It does not support MSTI 1 through 15.
Using PortAdminEdge mode has the potential to cause loops if it is not accompanied by the
BPDU guard (see page 367) feature.
While it is common for edge ports to be configured as access ports for a simple end host, this is not
mandatory. In the data center, edge ports typically connect to servers, which might pass both tagged and
untagged traffic.
auto swp5
iface swp5
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto br2
iface br2 inet static
bridge-ports swp1 swp2 swp3 swp4
mstpctl-bpduguard swp1=yes swp2=yes swp3=yes swp4=yes
mstpctl-portadminedge swp1=yes swp2=yes swp3=yes swp4=yes
PortAutoEdge
PortAutoEdge is an enhancement to the standard PortAdminEdge (PortFast) mode, which allows for the
automatic detection of edge ports. PortAutoEdge enables and disables the auto transition to/from the edge
state of a port in a bridge.
Edge ports and access ports are not the same thing. Edge ports transition directly to the
forwarding state and skip the listening and learning stages. Upstream topology change
notifications are not generated when an edge port's link changes state. Access ports only forward
untagged traffic; however, there is no such restriction on edge ports, which can forward both
tagged and untagged traffic.
When a BPDU is received on a port configured with portautoedge, the port ceases to be in the edge port
state and transitions into a normal STP port. When BPDUs are no longer received on the interface, the port
becomes an edge port, and transitions through the discarding and learning states before resuming
forwarding.
PortAutoEdge is enabled by default in Cumulus Linux.
To disable PortAutoEdge for an interface, run the net add interface <port> stp portautoedge no
command. For example, net add interface swp1 stp portautoedge no disables PortAutoEdge on swp1.
To re-enable PortAutoEdge for an interface, run the net del interface <port> stp portautoedge no
command. For example, net del interface swp1 stp portautoedge no re-enables PortAutoEdge on swp1.
BPDU Guard
To protect the spanning tree topology from unauthorized switches affecting the forwarding path, you can
configure BPDU (Bridge Protocol Data Unit) guard. A common example is when someone connects a new
switch to an access port on a leaf switch. If this new switch is configured with a low priority, it could
become the new root switch and affect the forwarding path for the entire layer 2 topology.
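As a rough illustration of why a low priority matters, the following Python sketch models root election as choosing the lowest bridge ID, compared first by priority and then by MAC address. The switch names, the MAC addresses and the Bridge type are hypothetical and exist only for this example; the sketch does not reflect how mstpd is implemented.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str       # hypothetical label, for illustration only
    priority: int   # bridge priority
    mac: str        # bridge MAC address as colon-separated hex


def elect_root(bridges):
    """Return the bridge with the lowest (priority, MAC address) pair."""
    return min(bridges, key=lambda b: (b.priority, b.mac.lower()))


topology = [
    Bridge("spine01", 4096, "44:38:39:00:00:01"),
    Bridge("leaf01", 32768, "44:38:39:00:00:11"),
]
before = elect_root(topology)

# An unauthorized switch with priority 0 is attached to an access port.
rogue = Bridge("rogue", 0, "00:11:22:33:44:55")
after = elect_root(topology + [rogue])

print(before.name, "->", after.name)  # spine01 -> rogue
```

In this model the new switch takes over as root as soon as its BPDUs are accepted, which is the situation BPDU guard is meant to prevent on edge ports.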
auto swp5
iface swp5
mstpctl-bpduguard yes
To determine whether BPDU guard is configured, or if a BPDU has been received, run:
The only way to recover a port that has been placed in the disabled state is to manually un-shut or bring up
the port with sudo ifup [port], as shown in the example below.
Bringing up the disabled port does not fix the problem if the configuration on the connected
end-station has not been rectified.
Bridge Assurance
On a point-to-point link where RSTP is running, if you want to detect unidirectional links and put the port in
a discarding state (in error), you can enable bridge assurance on the port by setting the port type to network.
The port remains in a bridge assurance inconsistent state until a BPDU is received from the peer. You
must configure the port type network on both ends of the link for bridge assurance to operate properly.
The default setting for bridge assurance is off. This means that there is no difference between disabling
bridge assurance on an interface and not configuring bridge assurance on an interface.
auto swp1
iface swp1
mstpctl-portnetwork yes
You can monitor logs for bridge assurance messages by doing the following:
BPDU Filter
You can enable bpdufilter on a switch port, which filters BPDUs in both directions. This effectively
disables STP on the port as no BPDUs are transiting.
Using BPDU filter inappropriately can cause layer 2 loops. Use this feature deliberately and with
extreme caution.
auto swp6
iface swp6
mstpctl-portbpdufilter yes
Storm Control
Storm control provides protection against excessive inbound BUM (broadcast, unknown unicast, multicast)
traffic on layer 2 switch port interfaces, which can cause poor network performance.
You configure storm control for each physical port by configuring switchd (see page 199). For example, to
enable unicast and multicast storm control at 400 packets per second (pps) and 3000 pps, respectively, for
swp1, run the following:
Most of these parameters are blacklisted in the ifupdown_blacklist section of the
/etc/netd.conf file. Before you configure these parameters, you must edit the file to remove them
from the blacklist.
Parameter: mstpctl-maxage
NCLU command: net add bridge stp maxage <seconds>
Description: Sets the maximum age of the bridge in seconds. The default is 20. The maximum age must
meet the condition 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age.

Parameter: mstpctl-ageing
NCLU command: net add bridge stp ageing <seconds>
Description: Sets the Ethernet (MAC) address ageing time for the bridge in seconds when the running
version is STP, but not RSTP/MSTP. The default is 1800.

Parameter: mstpctl-fdelay
NCLU command: net add bridge stp fdelay <seconds>
Description: Sets the bridge forward delay time in seconds. The default value is 15. The bridge
forward delay must meet the condition 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age.

Parameter: mstpctl-maxhops
NCLU command: net add bridge stp maxhops <max_hops>
Description: Sets the maximum hops for the bridge. The default is 20.

Parameter: mstpctl-txholdcount
NCLU command: net add bridge stp txholdcount <hold_count>
Description: Sets the bridge transmit hold count. The default value is 6.

Parameter: mstpctl-forcevers
NCLU command: net add bridge stp forcevers RSTP|STP
Description: Sets the force STP version of the bridge to either RSTP/STP. MSTP is not supported
currently. The default is RSTP.

Parameter: mstpctl-treeprio
NCLU command: net add bridge stp treeprio <priority>
Description: Sets the tree priority of the bridge for an MSTI (multiple spanning tree instance). The
priority value is a number between 0 and 61440 and must be a multiple of 4096. The bridge with the
lowest priority is elected the root bridge. The default is 32768.
Note: Cumulus Linux supports MSTI 0 only. It does not support MSTI 1 through 15.

Parameter: mstpctl-hello
NCLU command: net add bridge stp hello <seconds>

Parameter: mstpctl-treeportprio
NCLU command: net add interface <interface> stp treeportprio <priority>
Description: Sets the priority of the interface for the MSTI. The priority value is a number between
0 and 240 and must be a multiple of 16. The default is 128.
Note: Cumulus Linux supports MSTI 0 only. It does not support MSTI 1 through 15.

Parameter: mstpctl-portadminedge
NCLU command: net add interface <interface> stp portadminedge
Description: Enables or disables the initial edge state of the interface in the bridge. The default
is no.

Parameter: mstpctl-portautoedge
NCLU command: net add interface <interface> stp portautoedge no
Description: Enables or disables the auto transition to and from the edge state of the interface in
the bridge. The default is yes.
portautoedge is an enhancement to the standard PortAdminEdge (PortFast) mode, which allows for the
automatic detection of edge ports.
Note: Edge ports and access ports are not the same thing. Edge ports transition directly to the
forwarding state and skip the listening and learning stages. Upstream topology change notifications
are not generated when an edge port's link changes state. Access ports only forward untagged traffic;
however, there is no such restriction on edge ports, which can forward both tagged and untagged
traffic.

Parameter: mstpctl-portrestrrole
NCLU command: net add interface <interface> stp portrestrrole
Description: Enables or disables the ability of the interface in the bridge to take the root role.
The default is no.

Parameter: mstpctl-portbpdufilter
NCLU command: net add interface <interface> stp portbpdufilter
Description: Enables or disables the BPDU filter functionality for an interface in the bridge. The
default is no.

Parameter: mstpctl-treeportcost
NCLU command: net add interface <interface> stp treeportcost <port_cost>
Description: Sets the spanning tree port cost to a value from 0 to 255. The default is 0.
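The numeric constraints stated for mstpctl-maxage, mstpctl-fdelay and mstpctl-treeprio can be checked before you apply a configuration. The following Python sketch shows one way to do that. The function name check_bridge_stp is hypothetical, and the sketch only evaluates the documented conditions; it does not talk to mstpd or NCLU.

```python
def check_bridge_stp(maxage=20, fdelay=15, treeprio=32768):
    """Return a list of problems found in the given bridge STP values."""
    problems = []
    # Documented condition: 2 * (forward delay - 1 second) >= max age.
    if 2 * (fdelay - 1) < maxage:
        problems.append(
            f"maxage {maxage} exceeds 2 * (fdelay {fdelay} - 1) = {2 * (fdelay - 1)}"
        )
    # Documented range for the bridge priority: 0 to 61440, in multiples of 4096.
    if not 0 <= treeprio <= 61440 or treeprio % 4096 != 0:
        problems.append(f"treeprio {treeprio} is not a multiple of 4096 in 0-61440")
    return problems


print(check_bridge_stp())                      # [] (the defaults satisfy both rules)
print(check_bridge_stp(maxage=30, fdelay=15))  # one problem: 2 * 14 = 28 < 30
print(check_bridge_stp(treeprio=1000))         # one problem: not a multiple of 4096
```

With the default forward delay of 15 seconds, the largest maximum age that passes the check is 28 seconds.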
Related Information
The source code for mstpd/mstpctl was written by Vitalii Demianets and is hosted at the URL below.
Sourceforge - mstpd project
Wikipedia - Spanning Tree Protocol
brctl(8)
bridge-utils-interfaces(5)
ifupdown-addons-interfaces(5)
mstpctl(8)
mstpctl-utils-interfaces(5)
Contents
This topic describes ...
Configure LLDP
Example lldpcli Commands
Enable the SNMP Subagent in LLDP
Caveats and Errata
Related Information
Configure LLDP
You configure lldpd settings in /etc/lldpd.conf or /etc/lldpd.d/.
Here is an example persistent configuration:
The last line in the example above shows that LLDP is disabled on eth0. You can disable LLDP on a single
port by editing the /etc/default/lldpd file. This file specifies the default options to present to the
lldpd service when it starts. The following example uses the -I option to disable LLDP on swp43:
MgmtIP: fe80::201:ff:fe00:a00
Capability: Bridge, off
Capability: Router, on
Port:
PortID: ifname swp2
PortDescr: swp2
-------------------------------------------------------------------------------
Interface: swp49s1, via: LLDP, RID: 9, Time: 0 day, 16:55:00
Chassis:
ChassisID: mac 00:01:00:00:0c:00
SysName: TORC-1-2
SysDescr: Cumulus Linux version 3.0.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.12
MgmtIP: fe80::201:ff:fe00:c00
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname swp6
PortDescr: swp6
-------------------------------------------------------------------------------
Interface: swp49s0, via: LLDP, RID: 9, Time: 0 day, 16:55:00
Chassis:
ChassisID: mac 00:01:00:00:0c:00
SysName: TORC-1-2
SysDescr: Cumulus Linux version 3.0.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.12
MgmtIP: fe80::201:ff:fe00:c00
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname swp5
PortDescr: swp5
-------------------------------------------------------------------------------
Ageout: 10
Inserted: 20
Deleted: 10
--------------------------------------------------------------------
Interface: swp1
Transmitted: 9423
Received: 6264
Discarded: 0
Unrecognized: 0
Ageout: 0
Inserted: 2
Deleted: 0
---------------------------------------------------------------------
Interface: swp2
Transmitted: 9423
Received: 6264
Discarded: 0
Unrecognized: 0
Ageout: 0
Inserted: 2
Deleted: 0
---------------------------------------------------------------------
Interface: swp3
Transmitted: 9423
Received: 6265
Discarded: 0
Unrecognized: 0
Ageout: 0
Inserted: 2
Deleted: 0
----------------------------------------------------------------------
... and more (output truncated to fit this document)
A runtime configuration does not persist when you reboot the switch — all changes are lost.
The active interface list always overrides the inactive interface list.
Related Information
GitHub - lldpd project
Wikipedia - Link Layer Discovery Protocol
Voice VLAN
In Cumulus Linux, a voice VLAN is a VLAN dedicated to voice traffic on a switch port. However, the term can
mean different things to different vendors.
Voice VLAN is part of a trunk port with 2 VLANs that comprises either:
Native VLAN, which carries both data and voice traffic, or
Voice VLAN, which carries the voice traffic, and a data or native VLAN, which carries the data traffic
in a trunk port.
The voice traffic is an 802.1q-tagged packet that has a VLAN ID (which may or may not be 0) and an
802.1p priority (3-bit layer 2 CoS) with a specific value (typically, 5 is assigned for voice traffic).
Data traffic is always untagged (see page 420).
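To make the tag layout concrete, the following Python sketch packs and unpacks the 4-byte 802.1Q tag, which carries the 3-bit 802.1p priority, a 1-bit drop eligible indicator and the 12-bit VLAN ID. The VLAN ID 200 and priority 5 are example values, and the function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Pack TPID followed by PCP (3 bits), DEI (1 bit) and VID (12 bits)."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("802.1p priority must fit in 3 bits")
    tci = (priority << 13) | ((dei & 1) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Return (vlan_id, priority) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF, tci >> 13


voice_tag = build_vlan_tag(vlan_id=200, priority=5)
print(voice_tag.hex())            # 8100a0c8
print(parse_vlan_tag(voice_tag))  # (200, 5)
```

A tag with VLAN ID 0 and priority 5 carries only the priority, which corresponds to the case where the VLAN ID is 0.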
You can configure the topology above using the following NCLU (see page 88) commands. In this
configuration:
swp1 data traffic traverses the bridge's native VLAN and the voice traffic traverses VLAN 200
swp2 data traffic traverses VLAN 10 and the voice traffic traverses VLAN 100
swp3 data and voice traffic both traverse the bridge's native VLAN
These commands create the following configuration snippet in the /etc/network/interfaces file:
auto swp1
iface swp1
bridge-vids 200
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp2
iface swp2
bridge-pvid 10
bridge-vids 100
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp3
iface swp3
bridge-vids 300
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3
bridge-pvid 1
bridge-vids 1-1000
bridge-vlan-aware yes
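The bridge-vids values in the snippet above mix single VLAN IDs (200) with a range (1-1000). The following Python sketch expands such a value into the individual VLAN IDs, assuming that a range includes both of its endpoints. The function name expand_vids is hypothetical, and this is not how ifupdown2 parses the attribute.

```python
def expand_vids(spec):
    """Expand a bridge-vids style string such as '2000-2079 4094' into a sorted list."""
    vids = set()
    for token in spec.split():
        if "-" in token:
            start, end = (int(part) for part in token.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {token}")
            vids.update(range(start, end + 1))  # both ends are included
        else:
            vids.add(int(token))
    return sorted(vids)


print(len(expand_vids("1-1000")))    # 1000
print(expand_vids("100 200"))        # [100, 200]
print(200 in expand_vids("1-1000"))  # True
```

The last line confirms that the voice VLAN used on swp1 (200) falls inside the range declared on the bridge.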
Configure LLDP
Configuring voice VLAN with NCLU does not configure lldpd in Cumulus Linux, so LLDP-MED does not
provide data and voice VLAN information. You can configure LLDP-MED for each interface in a new file in
/etc/lldpd.d. In the following example, the file is called /etc/lldpd.d/voice_vlan.conf:
You can also use the lldpcli command to configure an LLDP-MED network policy. However, lldpcli
commands do not persist across reboots of the switch.
Troubleshooting
The bridge-vids can be reviewed with the net show bridge vlan command:
You can get MAC address information using the net show bridge macs command:
You can capture LLDP information by checking syslog or using tcpdump on an interface.
Failover protection
Cumulus Linux uses version 1 of the LAG control protocol (LACP).
To temporarily bring up a bond even when there is no LACP partner, use LACP Bypass (see page 459).
Contents
This topic describes ...
Hash Distribution
Create a Bond
Configuration Options
Enable balance-xor Mode
Example Configuration: Bonding 4 Slaves
Caveats and Errata
Related Information
Hash Distribution
Egress traffic through a bond is distributed to a slave based on a packet hash calculation, providing load
balancing over the slaves; many conversation flows are distributed over all available slaves to load balance
the total traffic. Traffic for a single conversation flow always hashes to the same slave.
The hash calculation uses packet header data to choose the slave on which to transmit the packet:
For IP traffic, IP header source and destination fields are used in the calculation.
For IP + TCP/UDP traffic, source and destination ports are included in the hash calculation.
In a failover event, the hash calculation is adjusted to steer traffic over available slaves.
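The following Python sketch illustrates the idea of hash-based distribution: the header fields of a flow are hashed, and the result selects one of the available slaves. CRC32 is used here only as a stand-in; the actual hash used by the bonding driver and the switch hardware is not reproduced, and the function name and addresses are hypothetical.

```python
import zlib


def pick_slave(slaves, src_ip, dst_ip, src_port=0, dst_port=0):
    """Choose a slave for a flow by hashing its header fields (illustrative hash only)."""
    if not slaves:
        raise ValueError("no slaves available")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return slaves[zlib.crc32(key) % len(slaves)]


slaves = ["swp1", "swp2", "swp3", "swp4"]
flow = ("10.0.0.1", "10.0.0.2", 49152, 443)

first = pick_slave(slaves, *flow)
again = pick_slave(slaves, *flow)
print(first == again)  # True: the same flow always maps to the same slave

# Failover: remove the chosen slave and hash over the remaining ones.
remaining = [s for s in slaves if s != first]
print(pick_slave(remaining, *flow) in remaining)  # True
```

Because the choice depends only on the header fields and the list of available slaves, packets of one conversation stay in order on a single link, while different conversations spread across the bond.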
Create a Bond
You can create and configure a bond with the Network Command Line Utility (NCLU (see page 88)). Follow
the steps below to create a new bond:
The bond is configured by default in IEEE 802.3ad link aggregation mode. To configure the bond in
balance-xor mode, see bond mode (see page 389) below.
Configuration Options
The configuration options and their default values are listed in the table below.
Each bond configuration option, except for bond slaves, is set to the recommended value by
default in Cumulus Linux. Only configure an option if a different setting is needed. For more
information on configuration values, refer to the Related Information (see page 395) section
below.
Setting: bond mode
Default: 802.3ad
Description: The bonding mode. Cumulus Linux supports IEEE 802.3ad link aggregation mode and
balance-xor mode. IEEE 802.3ad link aggregation is the default mode.
You can change the bond mode using NCLU. The following example changes bond1 to balance-xor mode.
Note: Use balance-xor mode only if you cannot use LACP. See below (see page 391) for more information.

Setting: bond miimon
Default: 100
Description: Defines how often the link state of each slave is inspected for failures.

Setting: bond downdelay
Default: 0

Setting: bond updelay
Default: 0
Description: Specifies the time, in milliseconds, to wait before enabling a slave after a link
recovery has been detected. This option is only valid for the miimon link monitor. The updelay value
must be a multiple of the miimon value; if not, it is rounded down to the nearest multiple.

Setting: bond xmit-hash-policy
Default: layer3+4
Description: The hash method used to select the slave for a given packet.

Setting: bond lacp-rate
Default: 1
Description: Sets the rate to ask the link partner to transmit LACP control packets.
You can set the LACP rate to slow using NCLU (see page 88):

Setting: bond min-links
Default: 1
Description: Defines the minimum number of links that must be active before the bond is put into
service.
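The rounding rule described for bond updelay can be worked through with a few lines of Python. This is a minimal sketch of the arithmetic only; the function name effective_updelay is hypothetical.

```python
def effective_updelay(updelay_ms, miimon_ms=100):
    """Round updelay down to the nearest multiple of miimon."""
    if miimon_ms <= 0:
        raise ValueError("miimon must be positive for updelay to apply")
    return (updelay_ms // miimon_ms) * miimon_ms


print(effective_updelay(250))      # 200 with the default miimon of 100
print(effective_updelay(300))      # 300, already a multiple
print(effective_updelay(250, 40))  # 240
```

Choosing an updelay that is already a multiple of miimon avoids a difference between the configured value and the value that takes effect.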
Use balance-xor mode only if you cannot use LACP; LACP can detect mismatched link attributes
between bond members and can even detect misconnections.
To change the mode of an existing bond to balance-xor, run the net add bond <bond-name> bond
mode balance-xor command. For example, net add bond bond1 bond mode balance-xor changes bond1 to
balance-xor mode.
To create a new bond and configure the bond to use balance-xor mode, create the bond, then configure
the bond mode. The following example commands create a bond called bond1 and configure bond mode
to be balance-xor:
auto bond1
iface bond1
bond-mode balance-xor
bond-slaves swp3 swp4
Bond Details
--------------- -------------
LLDP
------- ---- ------------
swp3(P) ==== swp1(p1c1h1)
swp4(P) ==== swp2(p1c1h1)
Routing
-------
Interface bond1 is up, line protocol is up
Link ups: 3 last: 2017/04/26 21:00:38.26
Link downs: 2 last: 2017/04/26 20:59:56.78
PTM status: disabled
vrf: Default-IP-Routing-Table
index 31 metric 0 mtu 1500
flags: <UP,BROADCAST,RUNNING,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:12
inet6 fe80::202:ff:fe00:12/64
Interface Type Other
auto bond0
iface bond0
address 10.0.0.1/30
bond-slaves swp1 swp2 swp3 swp4
If the bond is going to become part of a bridge, you do not need to specify an IP
address.
When networking is started on the switch, bond0 is created as MASTER and interfaces swp1 through swp4
come up in SLAVE mode, as seen in the output of the ip link show command:
...
All slave interfaces within a bond have the same MAC address as the bond. Typically, the first
slave added to the bond donates its MAC address as the bond MAC address, whereas the MAC
addresses of the other slaves are set to the bond MAC address.
The bond MAC address is used as the source MAC address for all traffic leaving the bond and
provides a single destination MAC address to address traffic to the bond.
On Cumulus RMP switches, which are built with two Hurricane2 ASICs, you cannot form
an LACP bond on links that terminate on different Hurricane2 ASICs.
Related Information
Linux Foundation - Bonding
802.3ad (Accessible writeup)
Wikipedia - Link aggregation
Bridge members can be individual physical interfaces, bonds or logical interfaces that traverse an
802.1Q VLAN trunk.
Cumulus Networks recommends using VLAN-aware mode (see page 402) bridges, rather than
traditional mode bridges. The bridge driver in Cumulus Linux is capable of VLAN filtering, which
allows for configurations that are similar to incumbent network devices. Cumulus Linux also
supports Ethernet bridges in traditional mode.
For a comparison of traditional and VLAN-aware modes, read this knowledge base article.
Cumulus Linux does not put all ports into a bridge by default.
You can configure both VLAN-aware and traditional mode bridges on the same network in
Cumulus Linux; however, you cannot have more than one VLAN-aware bridge on a given switch.
Contents
This topic describes ...
Create a VLAN-aware Bridge
Create a Traditional Mode Bridge
Configure Bridge MAC Addresses
MAC Address Ageing
Configure an SVI (Switch VLAN Interface)
IPv6 Link-local Address Generation
Caveats and Errata
Related Information
...
auto bridge
iface bridge
bridge-ageing 600
...
When an interface is added to a bridge, it ceases to function as a router interface, and the IP
address on the interface, if any, becomes unreachable.
These commands create the following SVI configuration in the /etc/network/interfaces file:
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-vids 10
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.100.100.1/24
vlan-id 10
vlan-raw-device bridge
Notice the vlan-raw-device keyword, which NCLU includes automatically. NCLU uses this
keyword to associate the SVI with the VLAN-aware bridge.
Alternatively, you can use the bridge.VLAN-ID naming convention for the SVI. You can manually create the
following example configuration in the /etc/network/interfaces file; it functions identically
to the configuration above:
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-vids 10
bridge-vlan-aware yes
auto bridge.10
iface bridge.10
address 10.100.100.1/24
When a switch is initially configured, all southbound bridge ports may be down, which means that, by
default, the SVI is also down. However, you may want to force the SVI to always be up, to perform
connectivity testing, for example. To do this, you essentially need to disable interface state tracking, leaving
the SVI in the UP state always, even if all member ports are down. Other implementations describe this
feature as no autostate.
In Cumulus Linux, you can keep the SVI perpetually UP by creating a dummy interface, and making the
dummy interface a member of the bridge. Consider the following configuration, without a dummy interface
in the bridge:
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports swp3
bridge-vids 100
bridge-pvid 1
...
With this configuration, when swp3 is down, the SVI is also down:
1. Create a dummy interface, and add it to the bridge configuration. You do this by editing the
/etc/network/interfaces file and adding the dummy interface stanza before the bridge stanza:
auto dummy
iface dummy
link-type dummy
auto bridge
iface bridge
...
2. Continue editing the interfaces file. Add the dummy interface to the bridge-ports line in the
bridge configuration:
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports swp3 dummy
bridge-vids 100
bridge-pvid 1
3.
Now, even when swp3 is down, both the dummy interface and the bridge remain up:
...
auto vlan100
iface vlan100
ipv6-addrgen off
vlan-id 100
vlan-raw-device bridge
...
To disable automatic address generation for a virtual IPv6 address on VLAN 100, run:
...
auto vlan100
iface vlan100
address-virtual-ipv6-addrgen off
vlan-id 100
vlan-raw-device bridge
...
or
Related Information
Linux Foundation - VLANs
Linux Journal - Linux as an Ethernet Bridge
Comparing Traditional Bridge Mode to VLAN-aware Bridge Mode
You can configure both VLAN-aware and traditional mode bridges on the same network in
Cumulus Linux; however, you should not have more than one VLAN-aware bridge on a given
switch.
Contents
This topic describes ...
Configure a VLAN-aware Bridge
Example Configurations
VLAN Filtering/VLAN Pruning
Untagged/Access Ports
Drop Untagged Frames
VLAN Layer 3 Addressing — Switch Virtual Interfaces and Other VLAN Attributes
Configure ARP Timers
Configure Multiple Ports in a Range
Access Ports and Pruned VLANs
Large Bond Set Configuration
VXLANs with VLAN-aware Bridges
Configure a Static MAC Address Entry
Caveats and Errata
Spanning Tree Protocol (STP)
...
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
...
For a definitive list of bridge attributes, run ifquery --syntax-help and look for the entries under
bridge, bridgevlan and mstpctl.
The bridge-pvid 1 is implied by default. You do not have to specify bridge-pvid for a bridge
or a port; in this case, the VLAN is untagged. Specifying it explicitly does not change the
configuration, but it makes the configuration easier for other users to read.
The following configurations are identical to each other and the configuration above:
Do not try to bridge the management port, eth0, with any switch ports (like swp0, swp1 and so
forth). For example, if you created a bridge with eth0 and swp1, it will not work properly and may
disrupt access to the management interface.
Example Configurations
...
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
auto swp3
iface swp3
bridge-vids 200
Untagged/Access Ports
Access ports ignore all tagged packets. In the configuration below, swp1 and swp2 are configured as access
ports, and all untagged traffic on these ports goes to VLAN 100:
...
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
auto swp1
iface swp1
bridge-access 100
auto swp2
iface swp2
bridge-access 100
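The following Python sketch is a simplified model of how a frame is assigned to a VLAN on ingress, following the statements in this topic: an access port places untagged frames in its access VLAN and ignores tagged frames, while a trunk port places untagged frames in its PVID and accepts tagged frames only for VLANs it carries. The dictionary layout and the function name classify_ingress are hypothetical, and the model leaves out everything else the bridge driver does.

```python
def classify_ingress(port, tag=None):
    """Return the VLAN a frame lands in, or None if the port drops it.

    port: dict with either {"access": vid} or {"pvid": vid, "vids": set_of_vids}
    tag:  the frame's 802.1Q VLAN ID, or None for an untagged frame
    """
    if "access" in port:
        # Access port: untagged frames join the access VLAN, tagged ones are ignored.
        return port["access"] if tag is None else None
    if tag is None:
        return port["pvid"]  # trunk: untagged frames go to the PVID
    allowed = set(port["vids"]) | {port["pvid"]}
    return tag if tag in allowed else None  # trunk: only carried VLANs pass


swp1 = {"access": 100}
trunk = {"pvid": 1, "vids": {100, 200}}

print(classify_ingress(swp1))            # 100
print(classify_ingress(swp1, tag=200))   # None
print(classify_ingress(trunk))           # 1
print(classify_ingress(trunk, tag=200))  # 200
print(classify_ingress(trunk, tag=300))  # None
```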
...
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 100 200
bridge-vlan-aware yes
This command creates the following configuration snippet in the /etc/network/interfaces file. Note
the bridge-allow-untagged configuration is under swp2:
...
auto swp1
iface swp1
auto swp2
iface swp2
bridge-allow-untagged no
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 100 200
bridge-vlan-aware yes
...
When you check VLAN membership for that port, it shows that there is no untagged VLAN.
VLAN Layer 3 Addressing — Switch Virtual Interfaces and Other VLAN Attributes
When configuring the VLAN attributes for the bridge, specify the attributes for each VLAN interface, each of
which is named vlan<vlanid>. If you are configuring the SVI for the native VLAN, you must declare the native
VLAN and specify its IP address. Specifying the IP address in the bridge stanza itself returns an error.
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 100 200
bridge-vlan-aware yes
auto vlan100
iface vlan100
address 192.168.10.1/24
address 2001:db8::1/32
vlan-id 100
vlan-raw-device bridge
In the above configuration, if your switch is configured for multicast routing, you do not need to
specify bridge-igmp-querier-src, as there is no need for a static IGMP querier configuration
on the switch. Otherwise, the static IGMP querier configuration helps to probe the hosts to
refresh their IGMP reports.
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 ... swp51 swp52
bridge-vids 310 700 707 712 850 910
bridge-vlan-aware yes
...
# ports swp3-swp48 are trunk ports which inherit vlans from the 'bridge'
# ie vlans 310,700,707,712,850,910
#
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 ... swp51 swp52
bridge-vids 310 700 707 712 850 910
bridge-vlan-aware yes
auto swp1
iface swp1
bridge-access 310
mstpctl-bpduguard yes
mstpctl-portadminedge yes
# The following port is the trunk uplink and inherits all vlans
# from 'bridge'; bridge assurance is enabled using 'portnetwork' attribute
auto swp49
iface swp49
mstpctl-portnetwork yes
mstpctl-portpathcost 10
# The following port is the trunk uplink and inherits all vlans
# from 'bridge'; bridge assurance is enabled using 'portnetwork' attribute
auto swp50
iface swp50
mstpctl-portnetwork yes
mstpctl-portpathcost 0
...
...
#
# vlan-aware bridge with bonds example
#
# uplink1, peerlink and downlink are bond interfaces.
# 'bridge' is a vlan aware bridge with ports uplink1, peerlink
# and downlink (swp2-20).
#
# native vlan is by default 1
#
# 'bridge-vids' attribute is used to declare vlans.
# 'bridge-pvid' attribute is used to specify native vlans if other than 1
# 'bridge-access' attribute is used to declare access port
#
auto lo
iface lo
auto eth0
iface eth0 inet dhcp
# bond interface
auto uplink1
iface uplink1
bond-slaves swp32
bridge-vids 2000-2079
# bond interface
auto peerlink
iface peerlink
bond-slaves swp30 swp31
bridge-vids 2000-2079 4094
# bond interface
auto downlink
iface downlink
bond-slaves swp1
bridge-vids 2000-2079
#
# Declare vlans for all swp ports
# swp2-20 get vlans from 2004 to 2022.
# The below uses mako templates to generate iface sections
# with vlans for swp ports
#
%for port, vlanid in zip(range(2, 20), range(2004, 2022)) :
auto swp${port}
iface swp${port}
bridge-vids ${vlanid}
%endfor
#
# vlan-aware bridge
#
auto bridge
iface bridge
bridge-ports uplink1 peerlink downlink swp1 swp2 swp49 swp50
bridge-vlan-aware yes
...
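The %for line in the template above is ordinary Python, so you can evaluate it on its own to see which port and VLAN pairs it generates. Because range() excludes its end value, the expression as written stops at swp19 and VLAN 2021, whereas the comment mentions swp2-20 and VLANs 2004 to 2022. The sketch below shows both results; whether the comment or the expression reflects the intent is not stated, and the helper template_pairs is hypothetical.

```python
def template_pairs(first_port, last_port, first_vlan):
    """Pair each port number in an inclusive range with consecutive VLAN IDs."""
    ports = range(first_port, last_port + 1)  # +1 because range() excludes its end
    vlans = range(first_vlan, first_vlan + len(ports))
    return list(zip(ports, vlans))


# The expression exactly as it appears in the template.
as_written = list(zip(range(2, 20), range(2004, 2022)))
print(as_written[0], as_written[-1], len(as_written))  # (2, 2004) (19, 2021) 18

# Pairs that cover swp2 through swp20, as the comment describes.
inclusive = template_pairs(2, 20, 2004)
print(inclusive[0], inclusive[-1], len(inclusive))     # (2, 2004) (20, 2022) 19
```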
See the VXLAN Scale (see page 692) topic for information about the number of VXLANs you can
configure simultaneously.
...
auto lo
iface lo inet loopback
address 10.35.0.10/32
auto bridge
iface bridge
bridge-ports uplink regex vni.*
bridge-pvid 1
bridge-vids 1-100
bridge-vlan-aware yes
auto vni-10000
iface vni-10000
alias CUSTOMER X VLAN 10
bridge-access 10
vxlan-id 10000
vxlan-local-tunnelip 10.35.0.10
vxlan-remoteip 10.35.0.34
...
IGMP Snooping
IGMP snooping and group membership are supported on a per-VLAN basis, though the IGMP snooping
configuration (including enable/disable and mrouter ports) is defined on a per-bridge port basis.
resv_vlan_range
While restarting switchd, all running ports will flap, and forwarding will be interrupted.
VLAN Translation
A bridge in VLAN-aware mode cannot have VLAN translation enabled for it. Only traditional mode bridges
can utilize VLAN translation.
Contents
This topic describes ...
Create a Traditional Mode Bridge
Configure a Traditional Bridge with NCLU
Manually Configure a Traditional Mode Bridge
Trunks in Traditional Bridge Mode
Trunk Example
VLAN Tagging Examples
Configure ARP Timers
Caveats
The traditional bridge must be named something other than bridge, as that name is reserved for
the single VLAN-aware bridge (see page 402) that you can configure on the switch.
The following example shows how to create a simple traditional mode bridge configuration on the switch,
including adding the switch ports that are members of the bridge. You can choose to add one or more of
the following elements to the configuration:
You can add an IP address to provide IP access to the bridge interface.
The portautoedge attribute defaults to yes; to use a setting other than the default, you
must set this attribute to no.
The portrestrrole attribute defaults to no, but to use a setting other than the default, you
must specify this attribute without setting an option.
The defaults for these attributes do not appear in the NCLU configuration.
These commands create the following configuration snippet in the /etc/network/interfaces file:
...
auto swp1
iface swp1
mstpctl-portautoedge no
auto swp2
iface swp2
mstpctl-portrestrrole yes
auto swp3
iface swp3
auto swp4
iface swp4
...
auto my_bridge_A
iface my_bridge_A
address 10.10.10.10/24
bridge-ports swp1 swp2 swp3 swp4
bridge-vlan-aware no
auto my_bridge
iface my_bridge
bridge-ports bond0 swp5 swp6
bridge-ageing 150
bridge-stp on
Setting: bridge-ports
Default: N/A
Description: List of logical and physical ports belonging to the logical bridge.

Setting: bridge-ageing
Default: 1800 seconds
Description: Maximum amount of time before a MAC address learned on the bridge expires from the
bridge MAC cache.

Setting: bridge-stp
Default: off
Description: Enables spanning tree protocol on this bridge. The default spanning tree mode is Per
VLAN Rapid Spanning Tree Protocol (PVRST). For more information on spanning-tree configurations see
the configuration section: Spanning Tree and Rapid Spanning Tree (see page 360).
Do not try to bridge the management port, eth0, with any switch ports (like swp0, swp1,
and so forth). For example, if you created a bridge with eth0 and swp1, it will not work.
You can configure multiple bridges to logically divide a switch into multiple layer 2
domains. This allows hosts to communicate with other hosts in the same domain, while
separating them from hosts in other domains.
The diagram below shows a multiple bridge configuration, where host-1 and host-2 are
connected to bridge-A, while host-3 and host-4 are connected to bridge-B. This means that:
host-1 and host-2 can communicate with each other.
host-3 and host-4 can communicate with each other.
host-1 and host-2 cannot communicate with host-3 and host-4.
auto bridge-A
iface bridge-A
auto bridge-B
iface bridge-B
bridge-ports swp3 swp4
bridge-stp on
The interaction of tagged and untagged frames on the same trunk often leads to undesired and
unexpected behavior. A switch that uses VLAN 1 for the native VLAN may send frames to a switch
that uses VLAN 2 for the native VLAN, thus merging those two VLANs and their spanning tree
state.
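The merging effect can be shown with a small Python model of a trunk: frames in the native VLAN leave the sender untagged, and the receiver places untagged frames in its own native VLAN. The function names send and receive are hypothetical, and the model ignores spanning tree entirely.

```python
def send(frame_vlan, native_vlan):
    """Egress on a trunk: frames in the native VLAN leave untagged (tag None)."""
    return None if frame_vlan == native_vlan else frame_vlan


def receive(tag, native_vlan):
    """Ingress on a trunk: untagged frames are placed in the local native VLAN."""
    return native_vlan if tag is None else tag


# Switch A uses VLAN 1 as its native VLAN; switch B uses VLAN 2.
on_wire = send(frame_vlan=1, native_vlan=1)
landed = receive(on_wire, native_vlan=2)
print(on_wire, landed)  # None 2: a VLAN 1 frame ends up in VLAN 2

# A tagged VLAN is unaffected by the mismatch.
print(receive(send(frame_vlan=100, native_vlan=1), native_vlan=2))  # 100
```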
Trunk Example
To create the above example, add the following configuration to the /etc/network/interfaces file:
auto br-VLAN100
iface br-VLAN100
bridge-ports swp1.100 swp2.100
bridge-stp on
auto br-VLAN200
iface br-VLAN200
bridge-ports swp1.200 swp2.200
bridge-stp on
move into a stale state. To keep neighbors in the reachable state, Cumulus Linux includes a background
process (/usr/bin/neighmgrd) that tracks neighbors that move into a stale, delay or probe state, and
attempts to refresh their state ahead of any removal from the Linux kernel, and thus before it would be
removed from the hardware forwarding.
The ARP refresh timer defaults to 1080 seconds (18 minutes). You can change this setting by following the
procedures outlined in this knowledge base article.
Caveats
On Broadcom switches, when two VLAN subinterfaces are bridged to each other in a traditional mode
bridge, switchd does not assign an internal resource ID to the subinterface, which is expected for each
VLAN subinterface.
To work around this issue, add a VXLAN on the bridge so that it does not require a real tunnel IP address.
VLAN Tagging
This topic shows two examples of VLAN tagging, one basic and one more advanced. They both
demonstrate the streamlined interface configuration from ifupdown2.
Contents
This topic describes ...
VLAN Tagging, a Basic Example (see page 420)
VLAN Tagging, an Advanced Example (see page 421)
VLAN Translation (see page 426)
host1 connects to swp1 with both untagged frames and with 802.1Q frames tagged for vlan100.
host2 connects to swp2 with 802.1Q frames tagged for vlan120 and vlan130.
To configure the above example, edit the /etc/network/interfaces file and add a configuration like
the following:
auto swp1
iface swp1
auto swp1.100
iface swp1.100
auto swp2
iface swp2
auto swp2.120
iface swp2.120
auto swp2.130
iface swp2.130
host1 connects to bridge br-untagged with bare Ethernet frames and to bridge br-tag100 with 802.1q
frames tagged for vlan100.
host2 connects to bridge br-tag100 with 802.1q frames tagged for vlan100 and to bridge br-vlan120
with 802.1q frames tagged for vlan120.
host3 connects to bridge br-vlan120 with 802.1q frames tagged for vlan120 and to bridge v130 with
802.1q frames tagged for vlan130.
bond2 carries tagged and untagged frames in this example.
Although not explicitly designated, the bridge member ports function as 802.1Q access ports and trunk
ports. In the example above, comparing Cumulus Linux with a traditional Cisco device:
swp1 is equivalent to a trunk port with untagged and vlan100.
swp2 is equivalent to a trunk port with vlan100 and vlan120.
swp3 is equivalent to a trunk port with vlan120 and vlan130.
bond2 is equivalent to an EtherChannel in trunk mode with untagged, vlan100, vlan120, and vlan130.
Bridges br-untagged, br-tag100, br-vlan120, and v130 are equivalent to SVIs (switched virtual
interfaces).
To create the above configuration, edit the /etc/network/interfaces file and add a configuration like
the following:
auto swp1.100
iface swp1.100
auto swp2.100
iface swp2.100
auto swp2.120
iface swp2.120
auto swp3.120
iface swp3.120
auto swp3.130
iface swp3.130
auto bond2
iface bond2
bond-slaves glob swp4-7
auto br-untagged
iface br-untagged
address 10.0.0.1/24
bridge-ports swp1 bond2
bridge-stp on
auto br-tag100
iface br-tag100
address 10.0.100.1/24
bridge-ports swp1.100 swp2.100 bond2.100
bridge-stp on
auto br-vlan120
iface br-vlan120
address 10.0.120.1/24
bridge-ports swp2.120 swp3.120 bond2.120
bridge-stp on
auto v130
iface v130
address 10.0.130.1/24
bridge-ports swp3.130 bond2.130
bridge-stp on
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
To verify:
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 3
Number of ports: 4
Actor Key: 33
Partner Key: 33
Partner Mac Address: 44:38:39:00:32:cf
Aggregator ID: 3
Slave queue ID: 0
A single bridge cannot contain multiple subinterfaces of the same port as members. Attempting
to apply such a configuration will result in an error:
VLAN Translation
By default, Cumulus Linux does not allow VLAN subinterfaces associated with different VLAN IDs to be part
of the same bridge. Base interfaces are not explicitly associated with any VLAN IDs and are exempt from
this restriction.
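The two default membership rules for a traditional mode bridge can be expressed as a small check. The sketch below is a hypothetical helper, not a Cumulus Linux tool; it assumes subinterfaces are named <port>.<vlan-id>, as in the examples in this chapter.

```python
from collections import Counter


def check_bridge_ports(ports):
    """Return a list of problems found in a traditional-mode bridge-ports list.

    ports: interface names such as "swp1", "swp1.100" or "bond2.100".
    """
    problems = []
    parsed = [p.partition(".") for p in ports]  # (base, ".", vlan or "")

    # A single bridge cannot contain several subinterfaces of the same port.
    sub_counts = Counter(base for base, _, vlan in parsed if vlan)
    for base, count in sub_counts.items():
        if count > 1:
            problems.append(f"multiple subinterfaces of {base}")

    # By default, all VLAN subinterfaces in a bridge share one VLAN ID.
    # Base interfaces have no VLAN ID and are exempt.
    vlan_ids = {vlan for _, _, vlan in parsed if vlan}
    if len(vlan_ids) > 1:
        problems.append("subinterfaces with different VLAN IDs")
    return problems
```

For instance, check_bridge_ports(["swp1.100", "swp2.100", "bond2.100"]) returns an empty list, while a list containing swp10.100 and swp11.200 is flagged for mixing VLAN IDs.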
In some cases, it may be useful to relax this restriction. For example, two servers might be connected to the
switch using VLAN trunks, but the VLAN numbering provisioned on the two servers is not consistent. You
can choose to bridge two VLAN subinterfaces with different VLAN IDs from the servers. You do this by
enabling the sysctl net.bridge.bridge-allow-multiple-vlans. Packets entering a bridge from a
member VLAN subinterface egress another member VLAN subinterface with the VLAN ID translated.
A bridge in VLAN-aware mode (see page 402) cannot have VLAN translation enabled for it; only
bridges configured in traditional mode (see page 414) can utilize VLAN translation.
If the sysctl is enabled and you want to disable it, run the above example, setting the sysctl
net.bridge.bridge-allow-multiple-vlans to 0.
After sysctl is enabled, ports with different VLAN IDs can be added to the same bridge. In the following
example, packets entering the bridge br-mix from swp10.100 will be bridged to swp11.200 with the VLAN
ID translated from 100 to 200:
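As a rough model of that behavior, the Python sketch below derives the egress VLAN ID from the subinterface names. The function translate is hypothetical and only mimics the translation; the actual work is done by the bridge in the kernel and hardware.

```python
def translate(ingress: str, egress: str, frame_vlan: int) -> int:
    """Return the VLAN ID a frame carries when it leaves `egress`.

    Both arguments are VLAN subinterface names of the form <port>.<vlan-id>.
    """
    in_vlan = int(ingress.rsplit(".", 1)[1])
    out_vlan = int(egress.rsplit(".", 1)[1])
    if frame_vlan != in_vlan:
        raise ValueError(f"{ingress} only accepts frames tagged {in_vlan}")
    return out_vlan
```

Calling translate("swp10.100", "swp11.200", 100) yields 200, matching the br-mix description above.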
MLAG or CLAG?
The Cumulus Linux implementation of MLAG is referred to by other vendors as CLAG, MC-LAG or
VPC. You will even see references to CLAG in Cumulus Linux, including the management daemon,
named clagd, and other options in the code, such as clag-id, which exist for historical
purposes. The Cumulus Linux implementation is truly a multi-chassis link aggregation protocol, so
we call it MLAG.
Dual-connected devices can create LACP bonds that contain links to each physical switch. Therefore, active-
active links from the dual-connected devices are supported even though they are connected to two
different physical switches.
A basic setup looks like this:
The two switches, S1 and S2, known as peer switches, cooperate so that they appear as a single device to
host H1's bond. H1 distributes traffic between the two links to S1 and S2 in any way that you configure on
the host. Similarly, traffic inbound to H1 can traverse S1 or S2 and arrive at H1.
Contents
This topic describes ...
MLAG Requirements (see page 429)
LACP and Dual-Connectedness (see page 430)
MLAG Requirements
MLAG has these requirements:
There must be a direct connection between the two peer switches implementing MLAG (S1 and S2).
This is typically a bond for increased reliability and bandwidth.
There must be only two peer switches in one MLAG configuration, but you can have multiple
configurations in a network for switch-to-switch MLAG (see below).
The peer switches implementing MLAG must be running Cumulus Linux version 2.5 or later.
You must specify a unique clag-id for every dual-connected bond on each peer switch; the value
must be between 1 and 65535 and must be the same on both peer switches in order for the bond
to be considered dual-connected.
The dual-connected devices (servers or switches) can use LACP (IEEE 802.3ad/802.1ax) to form the
bond (see page 387). In this case, the peer switches must also use LACP.
If for some reason you cannot use LACP, you can also use balance-xor mode (see page
391) to dual-connect host-facing bonds in an MLAG environment. If you do, you must still
configure the same clag_id parameter on the MLAG bonds, and it must be the same
on both MLAG switches. Otherwise, the MLAG switch pair treats the bonds as if they are
single-connected.
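The clag-id rules above lend themselves to a quick sanity check. The following Python sketch is hypothetical (the function and its dictionary inputs are not a Cumulus Linux interface) and looks only at the clag-id values, which is the criterion that applies in balance-xor mode.

```python
def dual_connected_bonds(switch1, switch2):
    """Return the bonds on switch1 whose clag-id also exists on switch2.

    switch1, switch2: {bond_name: clag_id} for each peer switch.
    """
    for name, bonds in (("switch1", switch1), ("switch2", switch2)):
        ids = list(bonds.values())
        if any(not 1 <= i <= 65535 for i in ids):
            raise ValueError(f"{name}: clag-id must be between 1 and 65535")
        if len(ids) != len(set(ids)):
            raise ValueError(f"{name}: clag-id must be unique for every bond")
    peer_ids = set(switch2.values())
    return sorted(bond for bond, cid in switch1.items() if cid in peer_ids)
```

A bond whose clag-id has no counterpart on the peer is left out of the result, which corresponds to the switch pair treating it as single-connected.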
More elaborate configurations are also possible. The number of links between the host and the switches
can be greater than two, and does not have to be symmetrical:
Additionally, because S1 and S2 appear as a single switch to other bonding devices, you can also connect
pairs of MLAG switches to each other in a switch-to-switch MLAG setup:
In this case, L1 and L2 are also MLAG peer switches, and present a two-port bond from a single logical
system to S1 and S2. S1 and S2 do the same as far as L1 and L2 are concerned. For a switch-to-switch
MLAG configuration, each switch pair must have a unique system MAC address. In the above example,
switches L1 and L2 each have the same system MAC address configured. Switch pair S1 and S2 each have
the same system MAC address configured; however, it is a different system MAC address than the one used
by the switch pair L1 and L2.
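The system MAC rule for switch-to-switch MLAG can be checked mechanically. The sketch below is a hypothetical helper written for illustration; the pair names and any MAC addresses you pass in are your own data.

```python
def check_system_macs(pairs):
    """Validate MLAG system MAC addresses.

    pairs: {pair_name: (system_mac_of_first_peer, system_mac_of_second_peer)}
    Returns a list of problems (empty when the plan is consistent).
    """
    errors = []
    seen = {}  # system MAC -> name of the pair already using it
    for pair, (mac_a, mac_b) in pairs.items():
        mac_a, mac_b = mac_a.lower(), mac_b.lower()
        if mac_a != mac_b:
            # Both peers of one pair must share the same system MAC.
            errors.append(f"{pair}: peers use different system MACs")
            continue
        if mac_a in seen:
            # Different pairs must not reuse a system MAC.
            errors.append(f"{pair}: system MAC already used by {seen[mac_a]}")
        else:
            seen[mac_a] = pair
    return errors
```

An empty result means each pair shares one system MAC internally and no two pairs reuse the same address.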
Typically, Link Aggregation Control Protocol (LACP), the IEEE standard protocol for managing bonds, is used
for verifying dual-connectedness. LACP runs on the dual-connected device and on each of the peer
switches. On the dual-connected device, the only configuration requirement is to create a bond that is
managed by LACP.
However, if for some reason you cannot use LACP in your environment, you can configure the bonds in
balance-xor mode (see page 391). When using balance-xor mode to dual-connect host-facing bonds in an
MLAG environment, you must configure the clag_id parameter on the MLAG bonds, which must be the
same on both MLAG switches. Otherwise, the bonds are treated by the MLAG switch pair as if they are
single-connected. In short, dual-connectedness is solely determined by matching clag_id and any
misconnection will not be detected.
On each of the peer switches, you must place the links that are connected to the dual-connected host or
switch in the bond. This is true even if the links are a single port on each peer switch, where each port is
placed into a bond, as shown below:
All of the dual-connected bonds on the peer switches have their system ID set to the MLAG system ID.
Therefore, from the point of view of the hosts, each of the links in its bond is connected to the same
system, and so the host uses both links.
Each peer switch periodically makes a list of the LACP partner MAC addresses for all of its bonds and
sends that list to its peer (using the clagd service; see below). The LACP partner MAC address is the MAC
address of the system at the other end of a bond (hosts H1, H2, and H3 in the figure above). When a switch
receives this list from its peer, it compares the list to the LACP partner MAC addresses on its own bonds. If
any matches are found and the clag-id for those bonds matches, then that bond is a dual-connected bond.
You can also find the LACP partner MAC address by running the net show bridge macs command or
by examining the /sys/class/net/<bondname>/bonding/ad_partner_mac sysfs file for each
bond.
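The comparison that identifies dual-connected bonds can be sketched as follows. This is a simplified, hypothetical model of the logic described above, not the clagd implementation; the input dictionaries stand in for the lists the peers exchange.

```python
def find_dual_connected(local_bonds, peer_bonds):
    """Return local bond names that are dual-connected.

    local_bonds, peer_bonds: {bond_name: (lacp_partner_mac, clag_id)}
    A bond qualifies when the peer reports the same LACP partner MAC
    together with the same clag-id.
    """
    peer_entries = {(mac.lower(), clag_id) for mac, clag_id in peer_bonds.values()}
    return sorted(
        name
        for name, (mac, clag_id) in local_bonds.items()
        if (mac.lower(), clag_id) in peer_entries
    )
```

On a live switch, the partner MAC for each bond could be read from the ad_partner_mac sysfs file mentioned above before feeding it to a check like this.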
Configure MLAG
To configure MLAG, you need to:
Create a bond that uses LACP, on the dual-connected devices.
Configure the interfaces, including bonds, VLANs, bridges and peer links, on each peer switch.
Port configuration; for example, VLAN membership, MTU (see page 454), and bonding
parameters.
Bridge configuration; for example, spanning tree parameters or bridge properties.
Static address entries; for example, static FDB entries and static IGMP entries.
QoS configuration; for example, ACL entries.
You can verify the configuration of VLAN membership with the net show clag verify-vlans
command.
Click to see the output ...
Important
You cannot use the same MAC address for different MLAG pairs. Make sure you specify a
different clag sys-mac setting for each MLAG pair in the network.
Do not add VLAN 4094 to the bridge VLAN list; VLAN 4094 for the peerlink subinterface should
not also be configured as a bridged VLAN with bridge VIDs under the bridge.
The above commands produce the following configuration in the /etc/network/interfaces
file:
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
address 169.254.1.1/30
clagd-peer-ip 169.254.1.2
clagd-backup-ip 192.0.2.50
clagd-sys-mac 44:38:39:FF:40:94
To enable MLAG, peerlink must be added to a traditional or VLAN-aware bridge. The commands
below add peerlink to a VLAN-aware bridge:
auto bridge
iface bridge
bridge-ports peerlink
bridge-vlan-aware yes
If you change the MLAG configuration by editing the interfaces file, the changes take
effect when you bring the peer link interface up with ifup. Do not use systemctl
restart clagd.service to apply the new configuration.
By default, the role is determined by comparing the MAC addresses of the two sides of the peering link; the
switch with the lower MAC address assumes the primary role. You can override this by setting the
clagd-priority option for the peer link:
The switch with the lower priority value is given the primary role; the default value is 32768 and the range is
0 to 65535. Read the clagd(8) and clagctl(8) man pages for more information.
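The role selection can be summarized in a short Python sketch. It is a hypothetical model, not clagd code, and it assumes that equal priorities fall back to the MAC address comparison, which is consistent with the default behavior when both switches keep the default priority of 32768.

```python
DEFAULT_PRIORITY = 32768


def elect_primary(switch_a, switch_b):
    """Return the name of the switch that takes the primary role.

    Each switch is a tuple (name, mac, priority). The lower priority wins;
    on a tie (assumption of this sketch) the lower MAC address wins.
    """
    def key(switch):
        _, mac, priority = switch
        if not 0 <= priority <= 65535:
            raise ValueError("clagd-priority must be between 0 and 65535")
        return (priority, int(mac.replace(":", ""), 16))

    return min(switch_a, switch_b, key=key)[0]
```

With clagd-priority 1000 on one switch and the default on the other, the switch with priority 1000 becomes primary regardless of its MAC address.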
When the clagd service is exited during switch reboot or the service is stopped in the primary switch, the
peer switch that is in the secondary role becomes the primary.
However, if the primary switch goes down for any reason without stopping the clagd service, or if the peer
link goes down, the secondary switch does not change its role. If the peer switch is determined to be
not alive, the switch in the secondary role rolls back the LACP system ID to the bond interface MAC
address instead of the clagd-sys-mac, and the switch in the primary role uses the clagd-sys-mac as the
LACP system ID on the bonds.
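The choice of LACP system ID in these situations can be written as a small decision function. The sketch below is a hypothetical model of the behavior described above; the function name and parameters are illustrative only.

```python
def lacp_system_id(role: str, peer_alive: bool, clagd_sys_mac: str, bond_mac: str) -> str:
    """Return the LACP system ID a switch advertises on its MLAG bonds."""
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    if role == "secondary" and not peer_alive:
        # The secondary rolls back to the bond interface MAC address.
        return bond_mac
    # Otherwise the shared MLAG system MAC is used.
    return clagd_sys_mac
```

Only the secondary switch with a dead peer falls back to the bond MAC; in every other case the bonds keep advertising the clagd-sys-mac.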
You can see a more traditional layer 2 example configuration in NCLU; run net example clag
l2-with-server-vlan-trunks. For a very basic configuration with just one pair of switches
and a single host, run net example clag l2-with-server-vlan-trunks.
You configure these interfaces using NCLU (see page 88), so the bridges are in VLAN-aware mode (see
page 402). The bridges use these Cumulus Linux-specific keywords:
bridge-vids, which defines the allowed list of tagged 802.1q VLAN IDs for all bridge member
interfaces. You can specify non-contiguous ranges with a space-separated list, like
bridge-vids 100-200 300 400-500.
bridge-pvid, which defines the untagged VLAN ID for each port. This is commonly referred to as
the native VLAN.
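To see exactly which VLANs a bridge-vids value covers, you can expand it with a few lines of Python. The helper below is hypothetical and assumes valid VLAN IDs lie between 1 and 4094.

```python
def expand_vids(spec: str):
    """Expand a bridge-vids value such as "100-200 300 400-500" into a sorted list."""
    vids = set()
    for token in spec.split():
        start, _, end = token.partition("-")
        first, last = int(start), int(end or start)
        if not 1 <= first <= last <= 4094:
            raise ValueError(f"invalid VLAN range: {token}")
        vids.update(range(first, last + 1))
    return sorted(vids)
```

For the example value 100-200 300 400-500, the expansion contains 203 VLAN IDs: 101 in each range plus VLAN 300.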
The bridge configurations below indicate that each bond carries tagged frames on VLANs 10, 20, 30, 40, 50,
and 100 to 200 (as specified by bridge-vids), but untagged frames on VLAN 1 (as specified by
bridge-pvid). Also, take note of how you configure the VLAN subinterfaces used for clagd communication
(peerlink.4094 in the sample configuration below). Finally, the host configurations for server01 through
server04 are not shown here. The configurations for each corresponding node are almost identical, except
for the IP addresses used for managing the clagd service.
VLAN Precautions
At minimum, this VLAN subinterface should not be in your layer 2 domain, and you should give it
a very high VLAN ID (up to 4094). Read more about the range of VLAN IDs you can use (see page
413).
The commands to create the configurations for both spines look like the following. Note that the clag-id
and clagd-sys-mac must be the same for the corresponding bonds on spine01 and spine02:
spine01

These commands create the following configuration in the /etc/network/interfaces file:

# downlinks
auto swp1
iface swp1

spine02

These commands create the following configuration in the /etc/network/interfaces file:

# downlinks
auto swp1
iface swp1
Here is an example configuration for the switches leaf01 through leaf04. Note that the clag-id and
clagd-sys-mac must be the same for the corresponding bonds on leaf01 and leaf02 as well as leaf03
and leaf04:
leaf01

These commands create the following configuration in the /etc/network/interfaces file:

leaf02

These commands create the following configuration in the /etc/network/interfaces file:
leaf03

net add bgp neighbor fabric peer-group
net add bgp neighbor fabric remote-as external
net add bgp ipv4 unicast neighbor fabric prefix-list dc-leaf-in in
net add bgp ipv4 unicast neighbor fabric prefix-list dc-leaf-out out
net add bgp neighbor swp51-52 interface peer-group fabric
net add vlan 100 ip address 172.16.1.3/24
net add bgp ipv4 unicast network 172.16.1.3/24
net add clag peer sys-mac 44:38:39:FF:00:02 interface swp49-50 primary backup-ip 192.168.1.14
net add clag port bond server3 interface swp1 clag-id 3
net add clag port bond server4 interface swp2 clag-id 4
net add bond server3-4 bridge access 100
net add bond server3-4 stp portadminedge
net add bond server3-4 stp bpduguard

These commands create the following configuration in the /etc/network/interfaces file:

cumulus@leaf03:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
    address 10.0.0.13/32

auto eth0
iface eth0 inet dhcp

auto swp1
iface swp1

auto swp2
iface swp2

# peerlink
auto swp49
iface swp49
    post-up ip link set $IFACE promisc on # Only required on VX

auto swp50
iface swp50
    post-up ip link set $IFACE promisc on # Only required on VX

# uplinks
auto swp51
iface swp51

auto swp52
iface swp52

# bridge to hosts
auto bridge
iface bridge
    bridge-ports peerlink server3 server4
    bridge-vids 100
    bridge-vlan-aware yes

auto peerlink
iface peerlink
    bond-slaves swp49 swp50

auto peerlink.4094
iface peerlink.4094
    address 169.254.1.1/30
    clagd-backup-ip 192.168.1.14
    clagd-peer-ip 169.254.1.2
    clagd-priority 1000
    clagd-sys-mac 44:38:39:FF:00:02

auto server3
iface server3
    bond-slaves swp1
    bridge-access 100
    clag-id 3
    mstpctl-bpduguard yes
    mstpctl-portadminedge yes

auto server4
iface server4
    bond-slaves swp2
    bridge-access 100
    clag-id 4
    mstpctl-bpduguard yes
    mstpctl-portadminedge yes

auto vlan100
iface vlan100
    address 172.16.1.3/24
    vlan-id 100
    vlan-raw-device bridge

leaf04

net add bgp neighbor fabric peer-group
net add bgp neighbor fabric remote-as external
net add bgp ipv4 unicast neighbor fabric prefix-list dc-leaf-in in
net add bgp ipv4 unicast neighbor fabric prefix-list dc-leaf-out out
net add bgp neighbor swp51-52 interface peer-group fabric
net add vlan 100 ip address 172.16.1.4/24
net add bgp ipv4 unicast network 172.16.1.4/24
net add clag peer sys-mac 44:38:39:FF:00:02 interface swp49-50 secondary backup-ip 192.168.1.13
net add clag port bond server3 interface swp1 clag-id 3
net add clag port bond server4 interface swp2 clag-id 4
net add bond server3-4 bridge access 100
net add bond server3-4 stp portadminedge
net add bond server3-4 stp bpduguard

These commands create the following configuration in the /etc/network/interfaces file:

cumulus@leaf04:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
    address 10.0.0.14/32

auto eth0
iface eth0 inet dhcp

auto swp1
iface swp1

auto swp2
iface swp2

# peerlink
auto swp49
iface swp49
    post-up ip link set $IFACE promisc on # Only required on VX

auto swp50
iface swp50
    post-up ip link set $IFACE promisc on # Only required on VX

# uplinks
auto swp51
iface swp51

auto swp52
iface swp52

# bridge to hosts
auto bridge
iface bridge
    bridge-ports peerlink server3 server4
    bridge-vids 100
    bridge-vlan-aware yes

auto peerlink
iface peerlink
    bond-slaves swp49 swp50

auto peerlink.4094
iface peerlink.4094
    address 169.254.1.2/30
    clagd-backup-ip 192.168.1.13
    clagd-peer-ip 169.254.1.1
    clagd-sys-mac 44:38:39:FF:00:02

auto server3
iface server3
    bond-slaves swp1
    bridge-access 100
    clag-id 3
    mstpctl-bpduguard yes
    mstpctl-portadminedge yes

auto server4
iface server4
    bond-slaves swp2
    bridge-access 100
    clag-id 4
    mstpctl-bpduguard yes
    mstpctl-portadminedge yes

auto vlan100
iface vlan100
    address 172.16.1.4/24
    vlan-id 100
    vlan-raw-device bridge
Use clagd-priority to set the role of the MLAG peer switch to primary or secondary. Each peer switch
in an MLAG pair must have the same clagd-sys-mac setting. Each clagd-sys-mac setting must be
unique to each MLAG pair in the network. For more details, refer to man clagd.
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
server1            server1            1         -                      -
server2            server2            2         -                      -
A command line utility called clagctl is available for interacting with a running clagd service to get status
or alter operational behavior. For a detailed explanation of the utility, refer to the clagctl(8) man page.
See the clagctl Output ...
The following is a sample output of the MLAG operational status displayed by clagctl:
To configure MLAG with a traditional mode bridge, the peer link and all dual-connected links must be
configured as untagged/native (see page 414) ports on a bridge (note the absence of any VLANs in the
bridge-ports line and the lack of the bridge-vlan-aware parameter below):
auto br0
iface br0
bridge-ports peerlink spine1-2 host1 host2
The following example shows you how to allow VLAN 100 across the peer link:
auto br0.100
iface br0.100
bridge-ports peerlink.100 bond1.100
bridge-stp on
For a deeper comparison of traditional versus VLAN-aware bridge modes, read this knowledge base article.
The backup IP address must be different from the peer link IP address (clagd-peer-ip).
It must be reachable by a route that does not use the peer link and it must be in
the same network namespace as the peer link IP address.
Cumulus Networks recommends you use the management IP address of the switch for
this purpose.
You can also specify the backup UDP port. The port defaults to 5342, but you can configure it as
an argument in clagd-args using --backupPort <PORT>.
To see the backup IP address, run the net show clag command:
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
Verify the backup link by running the net show clag backup-ip command:
...
auto swp52s0
iface swp52s0
address 192.0.2.1/24
vrf green
auto green
iface green
vrf-table auto
auto peer5.4000
iface peer5.4000
address 192.0.2.15/24
clagd-peer-ip 192.0.2.16
clagd-backup-ip 192.0.2.2 vrf green
clagd-sys-mac 44:38:39:01:01:01
...
You can verify the configuration with the net show clag status verbose command:
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
bond4              bond4              4         -                      -
bond1              bond1              1         -                      -
bond2              bond2              2         -                      -
bond3              bond3              3         -                      -
...
After bonds are identified as dual-connected, clagd sends more information to the peer switch for those
bonds. The MAC addresses (and VLANs) that are dynamically learned on those ports are sent along with the
LACP partner MAC address for each bond. When a switch receives MAC address information from its peer,
it adds MAC address entries on the corresponding ports. As the switch learns and ages out MAC addresses,
it informs the peer switch of these changes to its MAC address table so that the peer can keep its table
synchronized. Periodically, at 45% of the bridge ageing time, a switch sends its entire MAC address table to
the peer, so that the peer switch can verify that its MAC address table is properly synchronized.
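As a quick worked example of the 45% rule, the Python sketch below computes the interval between full MAC table transmissions from a bridge ageing time. The function name is hypothetical; the 1800-second value is the bridge-ageing default and 150 seconds is the value used in an earlier bridge example.

```python
def full_sync_interval(bridge_ageing_seconds: int) -> float:
    """Seconds between full MAC table sends: 45% of the bridge ageing time."""
    return bridge_ageing_seconds * 45 / 100


default_interval = full_sync_interval(1800)  # default bridge-ageing
short_interval = full_sync_interval(150)     # bridge-ageing 150
```

With the default ageing time, this works out to a full table exchange every 810 seconds; with bridge-ageing 150 it drops to 67.5 seconds.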
The switch sends an update frequency value in the messages to its peer, which tells clagd how often the
peer will send these messages. You can configure a different frequency by adding --lacpPoll
<SECONDS> to clagd-args:
In this design, the spine switches route traffic between the server hosts in the layer 2 domains and the
core. The servers (host1 through host4) each have a layer 2 connection up to the spine layer, where the
default gateway for the host subnets resides. However, because the spine switches, as gateway devices,
communicate at layer 3, you need to configure a protocol such as VRR (see page 462) (Virtual Router
Redundancy) between the spine switch pair to support active/active forwarding.
Then, to connect the spine switches to the core switches, you need to determine whether the routing is
static or dynamic. If it is dynamic, you must choose which protocol — OSPF (see page 738) or BGP (see
page 756) — to use. When enabling a routing protocol in an MLAG environment, it is also necessary to
manage the uplinks, because by default MLAG is not aware of layer 3 uplink interfaces. In the event of a
peer link failure, MLAG does not remove static routes or bring down a BGP or OSPF adjacency unless a
separate link state daemon such as ifplugd is used.
This monitoring is automatically configured and enabled as long as clagd is enabled (that is,
clagd-peer-ip and clagd-sys-mac are configured for an interface) and the clagd service is running. When
clagd is explicitly stopped, for example with the systemctl stop clagd.service command, monitoring of
clagd is also stopped.
...
...
Configuring MTU
auto bridge
iface bridge
bridge-ports peerlink uplink server01
auto peerlink
iface peerlink
mtu 9216
auto server01
iface server01
mtu 9216
auto uplink
iface uplink
mtu 9216
Likewise, to ensure the MTU 9216 path is respected through the spine switches above, also
change the MTU setting for bridge bridge by configuring mtu 9216 for each of the following
members of bridge bridge on both spine01 and spine02: leaf01-02, leaf03-04, exit01-02, peerlink.
auto bridge
iface bridge
bridge-ports leaf01-02 leaf03-04 exit01-02 peerlink
auto exit01-02
iface exit01-02
mtu 9216
auto leaf01-02
iface leaf01-02
mtu 9216
auto leaf03-04
iface leaf03-04
mtu 9216
auto peerlink
iface peerlink
mtu 9216
Peerlink Sizing
The peerlink carries very little traffic when compared to the bandwidth consumed by dataplane traffic. In a
typical MLAG configuration, almost every connection between the two switches in the MLAG pair is
dual-connected, so the only traffic going across the peerlink is traffic from the clagd process and some
LLDP or LACP traffic; the traffic received on the peerlink is not forwarded out of the dual-connected bonds.
However, there are some instances where a host is connected to only one switch in the MLAG pair; for
example:
You have a hardware limitation on the host where there is only one PCIe slot, and therefore, one
NIC on the system, so the host is only single-connected across that interface.
The host does not support 802.3ad and you cannot create a bond on it.
You are accounting for a link failure, where the host may become single-connected until the failure
is rectified.
In general, you need to determine how much bandwidth is traveling across the single-connected interfaces,
and allocate half of that bandwidth to the peerlink. We recommend half of the single-connected bandwidth
because, on average, one half of the traffic destined to the single-connected host arrives on the switch
directly connected to the single-connected host and the other half arrives on the switch that is not directly
connected to the single-connected host. When this happens, only the traffic that arrives on the switch that
is not directly connected to the single-connected host needs to traverse the peerlink, which is how you
calculate 50% of the traffic.
In addition, you might want to add extra links to the peerlink bond to handle link failures in the peerlink
bond itself.
In the illustration below, each host has two 10G links, with each 10G link going to each switch in the MLAG
pair. Each host has 20G of dual-connected bandwidth, so all three hosts have a total of 60G of dual-
connected bandwidth. We recommend you allocate at least 15G of bandwidth to each peerlink bond, which
represents half of the single-connected bandwidth.
Scaling this example out to a full rack, when planning for link failures, you need only allocate enough
bandwidth to meet your site's strategy for handling failure scenarios. Imagine a full rack with 40 servers and
two switches. You might plan for four to six servers to lose connectivity to a single switch and become
single-connected before you respond to the event. So, expanding upon the previous example, if you have
40 hosts each with 20G of bandwidth dual-connected to the MLAG pair, you might allocate 20G to 30G of
bandwidth to the peerlink, which accounts for half of the single-connected bandwidth for four to six
hosts.
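The sizing rule above reduces to simple arithmetic. The Python sketch below is a hypothetical helper that reproduces the numbers from both examples, assuming each single-connected host is left with one 10G link.

```python
def peerlink_bandwidth_gbps(single_connected_hosts: int, link_speed_gbps: float) -> float:
    """Recommended peerlink bandwidth: half of the single-connected bandwidth."""
    single_connected_bandwidth = single_connected_hosts * link_speed_gbps
    return single_connected_bandwidth / 2


three_hosts = peerlink_bandwidth_gbps(3, 10)    # three hosts, one 10G link each
rack_low = peerlink_bandwidth_gbps(4, 10)       # four hosts lose one link
rack_high = peerlink_bandwidth_gbps(6, 10)      # six hosts lose one link
```

These evaluate to 15G for the three-host illustration and 20G to 30G for the full rack scenario, matching the figures in the text.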
enabled yes
role Designated
port id 8.002
state forwarding
..............
bpdufilter port no
clag ISL             yes       clag ISL Oper UP     yes
clag role            primary   clag dual conn mac   00:00:00:00:00:00
clag remote portID   F.FFF     clag system mac      44:38:39:FF:40:90
Use NCLU (see page 88) (net) commands for all spanning tree configurations, including bridge
priority, path cost and so forth. Do not use brctl commands for spanning tree, except for brctl
stp on/off, as changes are not reflected to mstpd and can create conflicts.
Troubleshooting
When you run ethtool -S on a peerlink interface, the drops are indicated by the HwIfInDiscards
counter:
HwIfInDiscards: 2129675
This occurs when you have multiple LACP bonds between the same two LACP endpoints — for example, an
MLAG switch pair is one endpoint and an ESXi host is another. These bonds have duplicate LACP identifiers,
which are MAC addresses. This same warning could be triggered when you have a cabling or configuration
error.
LACP Bypass
On Cumulus Linux, LACP Bypass is a feature that allows a bond (see page 387) configured in 802.3ad mode
to become active and forward traffic even when there is no LACP partner. A typical use case for this feature
is to enable a host, without the capability to run LACP, to PXE boot while connected to a switch on a bond
configured in 802.3ad mode. Once the pre-boot process finishes and the host is capable of running LACP,
the normal 802.3ad link aggregation operation takes over.
Contents
This topic describes ...
LACP Bypass All-active Mode (see page 459)
Configure LACP Bypass (see page 460)
In an MLAG deployment (see page 427) where bond slaves of a host are connected to two
switches and the bond is in all-active mode, all the slaves of the bond are active on both the primary
and secondary MLAG nodes.
auto bond1
iface bond1
bond-lacp-bypass-allow yes
bond-slaves swp51s2 swp51s3
clag-id 1
mstpctl-bpduguard yes
...
auto bridge
iface bridge
bridge-ports bond1 bond2 bond3 bond4 peer5
bridge-vids 100-105
bridge-vlan-aware yes
You can check the status of the configuration by running net show interface <bond> on the bond
and its slave interfaces:
Bond Details
------------------ -------------------------
Bond Mode: LACP
Load Balancing: Layer3+4
Minimum Links: 1
In CLAG: CLAG Active
LACP Sys Priority:
LACP Rate: Fast Timeout
LACP Bypass: LACP Bypass Not Supported
Untagged
----------
1
LLDP
-------- ---- ------------------
swp51s2(P) ==== swp1(spine01)
swp51s3(P) ==== swp1(spine02)
Use the cat command to verify that LACP bypass is enabled on a bond and its slave interfaces:
The following configuration shows LACP bypass enabled for multiple active interfaces (all-active mode) with
a bridge in traditional bridge mode (see page 414):
auto bond1
iface bond1
bond-slaves swp3 swp4
bond-lacp-bypass-allow 1
auto br0
iface br0
bridge-ports bond1 bond2 bond3 bond4 peer5
mstpctl-bpduguard bond1=yes
A production implementation will have many more server hosts and network connections than
are shown here. However, this basic configuration provides a complete description of the
important aspects of the VRR setup.
As the bridges in each of the redundant routers are connected, they will each receive and reply to ARP
requests for the virtual router IP address.
Cumulus Networks recommends using MAC addresses from the reserved range when
configuring VRR.
The reserved MAC address range for VRR is the same as for the Virtual Router
Redundancy Protocol (VRRP), as they serve similar purposes.
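As a rough illustration of the recommendation above, the following Python sketch checks whether a MAC address falls in the VRRP IPv4 virtual MAC range, 00:00:5e:00:01:00 through 00:00:5e:00:01:ff. The range is taken from the VRRP specification rather than from this section, and the function name is hypothetical, so treat this as a helper built on stated assumptions and not as a Cumulus Linux tool.

```python
import re

# Assumption: the reserved range is the VRRP IPv4 virtual MAC prefix
# 00:00:5e:00:01:XX, where XX is any value from 00 to ff.
VRRP_IPV4_PREFIX = "00:00:5e:00:01:"
MAC_PATTERN = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}")


def is_vrrp_ipv4_mac(mac: str) -> bool:
    """Return True if mac is a well-formed MAC inside the reserved range."""
    mac = mac.strip().lower()
    if MAC_PATTERN.fullmatch(mac) is None:
        return False
    return mac.startswith(VRRP_IPV4_PREFIX)


print(is_vrrp_ipv4_mac("00:00:5e:00:01:01"))  # address used in this chapter
print(is_vrrp_ipv4_mac("44:38:39:ff:00:01"))  # outside the reserved range
```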
Contents
This topic describes ...
Configure a VRR-enabled Network (see page 464)
Configure the Routers (see page 464)
Configure the Hosts (see page 465)
Example VRR Configuration with MLAG (see page 465)
For networks using MLAG, use bond interfaces. Otherwise, use switch port interfaces.
Multiple inter-peer links are typically bonded interfaces, in order to accommodate higher
bandwidth between the routers and to offer link redundancy.
The VLAN interface must have unique IP addresses for both the physical (the address
option below) and virtual (the address-virtual option below) interfaces, as the unique
address is used when the switch initiates an ARP request.
NCLU Commands
cumulus@switch:~$ net add bridge
cumulus@switch:~$ net add vlan 500 ip address 192.0.2.252/24
cumulus@switch:~$ net add vlan 500 ip address-virtual 00:00:5e:00:01:01 192.0.2.254/24
cumulus@switch:~$ net add vlan 500 ipv6 address 2001:db8::1/32
cumulus@switch:~$ net add vlan 500 ipv6 address-virtual 00:00:5e:00:01:01 2001:db8::f/32
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
/etc/network/interfaces Snippet
auto bridge
iface bridge
bridge-vids 500
bridge-vlan-aware yes
auto vlan500
iface vlan500
address 192.0.2.252/24
address 2001:db8::1/32
address-virtual 00:00:5e:00:01:01 2001:db8::f/32 192.0.2.254/24
vlan-id 500
vlan-raw-device bridge
These commands create the following configuration in /etc/network/interfaces:
bond-xmit-hash-policy layer3+4                       bond-xmit-hash-policy layer3+4
address 172.16.1.101                                 address 172.16.1.101
netmask 255.255.255.0                                netmask 255.255.255.0
post-up ip route add 172.16.0.0/16 via 172.16.1.1    post-up ip route add 172.16.0.0/16 via 172.16.1.1
post-up ip route add 10.0.0.0/8 via 172.16.1.1       post-up ip route add 10.0.0.0/8 via 172.16.1.1
ifplugd
ifplugd is an Ethernet link-state monitoring daemon that can execute user-specified scripts to configure
an Ethernet device when a cable is plugged in, or automatically unconfigure it when a cable is removed.
Follow the steps below to install and configure the ifplugd daemon.
Contents
This topic describes ...
Install ifplugd (see page 469)
Configure ifplugd (see page 470)
Caveats and Errata (see page 471)
Install ifplugd
Configure ifplugd
Once ifplugd is installed, two configuration files must be edited to set up ifplugd:
/etc/default/ifplugd
/etc/ifplugd/action.d/ifupdown
ifplugd is configured on both the primary and secondary MLAG (see page 427)
switches in this example.
INTERFACES="peerbond"
HOTPLUG_INTERFACES=""
ARGS="-q -f -u0 -d1 -w -I"
SUSPEND_ACTION="stop"
#!/bin/sh
set -e
case "$2" in
up)
    clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
    if [ "$clagrole" = "secondary" ]
    then
        # List all the interfaces below to bring up when clag peerbond comes up.
        for interface in swp1 bond1 bond3 bond4
        do
            echo "bringing up : $interface"
            ip link set "$interface" up
        done
    fi
    ;;
down)
    clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
    if [ "$clagrole" = "secondary" ]
    then
        # List all the interfaces below to bring down when clag peerbond goes down.
        for interface in swp1 bond1 bond3 bond4
        do
            echo "bringing down : $interface"
            ip link set "$interface" down
        done
    fi
    ;;
esac
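The role lookup in the script depends on the eighth whitespace-separated field of the "Our Priority" line printed by clagctl. The Python sketch below mirrors that grep and awk pipeline on a sample line so the field position is easier to see. The sample line is an assumption about the clagctl output format and the function name is hypothetical; compare it with the output on your own switch before relying on it.

```python
from typing import Optional

# Assumed sample of the relevant clagctl line; verify against real output.
SAMPLE_OUTPUT = """\
The peer is alive
     Our Priority, ID, and Role: 32768 44:38:39:00:00:11 secondary
"""


def clag_role(clagctl_output: str) -> Optional[str]:
    """Return the 8th field of the 'Our Priority' line, like awk '{print $8}'."""
    for line in clagctl_output.splitlines():
        if "Our Priority" in line:
            fields = line.split()
            if len(fields) >= 8:
                return fields[7]  # awk fields are 1-based, Python lists 0-based
    return None


print(clag_role(SAMPLE_OUTPUT))
```

If the line is missing or shorter than expected, the function returns None, which corresponds to the shell script taking no action because the comparison with "secondary" fails.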
Contents
This topic describes ...
Configure IGMP/MLD Querier (see page 472)
Disable IGMP and MLD Snooping (see page 473)
Troubleshooting (see page 474)
Related Information (see page 476)
auto bridge.100
vlan bridge.100
bridge-igmp-querier-src 123.1.1.1
auto bridge
iface bridge
For a VLAN-aware bridge, like bridge in the above example, to enable querier functionality for VLAN 100 in
the bridge, set bridge-mcquerier to 1 in the bridge stanza and set bridge-igmp-querier-src to
123.1.1.1 in the bridge.100 stanza.
You can specify a range of VLANs as well. For example:
auto bridge.[1-200]
vlan bridge.[1-200]
bridge-igmp-querier-src 123.1.1.1
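The bracket notation above stands for one stanza per VLAN in the range. A minimal Python sketch of that expansion follows, assuming a single [start-end] range per name; the helper is hypothetical and is not part of ifupdown2.

```python
import re

RANGE_PATTERN = re.compile(r"(.*)\[(\d+)-(\d+)\](.*)")


def expand_range(name: str) -> list:
    """Expand 'bridge.[1-200]' into ['bridge.1', ..., 'bridge.200']."""
    match = RANGE_PATTERN.fullmatch(name)
    if match is None:
        return [name]  # no range present, return the name unchanged
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{i}{suffix}" for i in range(int(start), int(end) + 1)]


names = expand_range("bridge.[1-200]")
print(len(names), names[0], names[-1])
```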
For a bridge in traditional mode (see page 395), use a configuration like the following:
auto br0
iface br0
address 192.0.2.10/24
bridge-ports swp1 swp2 swp3
bridge-vlan-aware no
bridge-mcquerier 1
bridge-mcqifaddr 1
The commands above add the bridge-mcsnoop line to the following example bridge in
/etc/network/interfaces:
auto bridge
iface bridge
bridge-mcquerier 1
bridge-mcsnoop 0
Troubleshooting
To show the IGMP/MLD snooping bridge state, run brctl showstp <bridge>:
swp1 (1)
 port id            8001                 state                   forwarding
 designated root    8000.7072cf8c272c    path cost               2
swp2 (2)
 port id            8002                 state                   forwarding
 designated root    8000.7072cf8c272c    path cost               2
 designated bridge  8000.7072cf8c272c    message age timer       0.00
 designated port    8002                 forward delay timer     0.00
 designated cost    0                    hold timer              0.00
 mc router          1                    mc fast leave           0
 flags
swp3 (3)
 port id            8003                 state                   forwarding
 designated root    8000.7072cf8c272c    path cost               2
 designated bridge  8000.7072cf8c272c    message age timer       0.00
 designated port    8003                 forward delay timer     8.98
 designated cost    0                    hold timer              0.00
 mc router          1                    mc fast leave           0
 flags
To show the groups and bridge port state, use the bridge mdb show command. To show router ports
and group information, use the bridge -d -s mdb show command:
Related Information
tools.ietf.org/html/rfc4541
en.wikipedia.org/wiki/IGMP_snooping
tools.ietf.org/rfc/rfc2236.txt
tools.ietf.org/html/rfc3376
tools.ietf.org/search/rfc2710
tools.ietf.org/html/rfc3810
Network Virtualization
Cumulus Linux supports these forms of network virtualization:
VXLAN (Virtual Extensible LAN) is a standard overlay protocol that abstracts logical virtual networks from the
physical network underneath. You can deploy simple and scalable layer 3 Clos architectures while
extending layer 2 segments over that layer 3 network.
VXLAN uses a VLAN-like encapsulation technique to encapsulate MAC-based layer 2 Ethernet frames within
layer 3 UDP packets. Each virtual network is a VXLAN logical layer 2 segment. VXLAN scales to 16 million
segments – a 24-bit VXLAN network identifier (VNI ID) in the VXLAN header – for multi-tenancy.
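The 16 million figure follows from the identifier width: 2^24 = 16,777,216 possible values. The Python sketch below packs and unpacks the 8-byte VXLAN header as laid out in RFC 7348 (a flags byte with the I bit set, reserved bits, the 24-bit VNI, and a final reserved byte). The function names are illustrative only.

```python
import struct

VNI_BITS = 24
MAX_VNI = (1 << VNI_BITS) - 1  # largest value that fits in 24 bits


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI does not fit in 24 bits")
    flags_word = 0x08 << 24  # I flag set, remaining 24 bits reserved (zero)
    vni_word = vni << 8      # VNI in the upper 24 bits, low byte reserved
    return struct.pack("!II", flags_word, vni_word)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


header = pack_vxlan_header(10)
print(header.hex(), unpack_vni(header))
```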
Hosts on a given virtual network are joined together through an overlay protocol that initiates and
terminates tunnels at the edge of the multi-tenant network, typically the hypervisor vSwitch or top of rack.
These edge points are the VXLAN tunnel end points (VTEP).
Cumulus Linux can initiate and terminate VTEPs in hardware and supports wire-rate VXLAN. VXLAN
provides an efficient hashing scheme across the IP fabric during the encapsulation process; the source
UDP port is unique, with the hash based on layer 2 through layer 4 information from the original frame. The
UDP destination port is the standard port 4789.
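To make the hashing idea concrete, here is a hedged Python sketch that derives a source UDP port from inner-frame fields so that packets of one flow always take the same path while different flows spread across the fabric. The actual hash used by the switch ASIC is not described in this guide; CRC32, the chosen fields, and the function name are stand-ins, and the 49152-65535 range is the dynamic port range that RFC 7348 recommends.

```python
import zlib

PORT_LOW, PORT_HIGH = 49152, 65535  # dynamic/private port range
VXLAN_DST_PORT = 4789               # standard destination port


def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, proto, sport, dport):
    """Map layer 2 through layer 4 fields of the inner frame to a source port."""
    key = "|".join(
        str(f) for f in (src_mac, dst_mac, src_ip, dst_ip, proto, sport, dport)
    ).encode()
    span = PORT_HIGH - PORT_LOW + 1
    return PORT_LOW + zlib.crc32(key) % span


port = vxlan_source_port(
    "00:00:10:00:00:0a", "00:00:10:00:00:0c",
    "10.10.10.1", "10.10.10.2", "tcp", 33000, 443,
)
print(port, VXLAN_DST_PORT)
```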
Cumulus Linux includes the native Linux VXLAN kernel support and integrates with controller-based overlay
solutions like VMware NSX (see page 672) and Midokura MidoNet (see page 643).
VXLAN is supported only on switches in the Cumulus Linux HCL using the Broadcom Tomahawk, Trident II,
Trident II+ and Trident3 chipsets, as well as the Mellanox Spectrum chipset.
VXLAN encapsulation over layer 3 subinterfaces (for example, swp3.111) is not supported. Only
configure VXLAN uplinks as layer 3 interfaces without any subinterfaces (for example, swp3).
The VXLAN tunnel endpoints cannot share a common subnet; there must be at least one layer 3
hop between the VXLAN source and destination.
Useful Links
VXLAN - RFC 7348
ovsdb-server
Contents
This topic describes ...
Requirements (see page 478)
Example Configuration (see page 479)
Configure Static VXLAN Tunnels (see page 479)
Verify the Configuration (see page 483)
Requirements
Cumulus Networks supports static VXLAN tunnels only on switches in the Cumulus Linux HCL using the
Broadcom Tomahawk, Trident II+ and Trident II ASICs, as well as the Mellanox Spectrum ASIC.
For a basic VXLAN configuration, make sure that:
The VXLAN has a network identifier (VNI); do not use 0 or 16777215 as the VNI ID, which are
reserved values under Cumulus Linux.
The VXLAN link and local interfaces are added to the bridge to create the association between port,
VLAN, and VXLAN instance.
Each traditional bridge on the switch has only one VXLAN interface. Cumulus Linux does not
support more than one VXLAN ID per traditional bridge.
When deploying VXLAN with a VLAN-aware bridge, there is no restriction on using a single
VNI. This limitation is only present when using the traditional bridge configuration.
The VXLAN registration daemon (vxrd) is not running. Static VXLAN tunnels do not interoperate
with LNV or EVPN. If vxrd is running, stop it with the following command:
Example Configuration
The following topology is used in this chapter. Each IP address corresponds to the loopback address of the
switch.
address 10.0.0.11/32
auto swp1
iface swp1
auto swp2
iface swp2
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.11
vxlan-remoteip 10.0.0.12
vxlan-remoteip 10.0.0.13
vxlan-remoteip 10.0.0.14
leaf02
auto swp2
iface swp2
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.12
vxlan-remoteip 10.0.0.11
vxlan-remoteip 10.0.0.13
vxlan-remoteip 10.0.0.14
leaf03
auto swp2
iface swp2
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.13
vxlan-remoteip 10.0.0.11
vxlan-remoteip 10.0.0.12
vxlan-remoteip 10.0.0.14
leaf04
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.14
vxlan-remoteip 10.0.0.11
vxlan-remoteip 10.0.0.12
vxlan-remoteip 10.0.0.13
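Each leaf in the static configurations above lists every other leaf loopback as a vxlan-remoteip, which forms a full mesh. The Python sketch below generates the VXLAN interface stanza for one leaf from the list of loopbacks. It is only an illustration of the pattern; the function name is hypothetical and the output should be reviewed before it is placed in /etc/network/interfaces.

```python
def static_vni_stanza(name, vlan, vni, local_ip, vtep_ips):
    """Build the interfaces stanza for one static VXLAN interface."""
    lines = [
        f"auto {name}",
        f"iface {name}",
        f"    bridge-access {vlan}",
        "    mstpctl-bpduguard yes",
        "    mstpctl-portbpdufilter yes",
        f"    vxlan-id {vni}",
        f"    vxlan-local-tunnelip {local_ip}",
    ]
    # Every other VTEP in the mesh becomes a remote tunnel endpoint.
    lines += [f"    vxlan-remoteip {ip}" for ip in vtep_ips if ip != local_ip]
    return "\n".join(lines)


LOOPBACKS = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]
print(static_vni_stanza("vni-10", 10, 10, "10.0.0.12", LOOPBACKS))
```

Adding a fifth leaf means regenerating the stanza on every existing leaf, which is the main operational cost of static tunnels.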
Contents
This topic describes ...
Requirements (see page 484)
Example VXLAN Configuration (see page 484)
Configure the Static MAC Bindings VXLAN (see page 485)
Troubleshooting (see page 486)
Requirements
A VXLAN configuration requires a Broadcom switch with the Tomahawk, Trident II+, or Trident II ASIC
running Cumulus Linux 2.0 or later, or a Mellanox switch with the Spectrum ASIC running Cumulus Linux
3.2.0 or later.
For a basic VXLAN configuration, make sure that:
The VXLAN has a network identifier (VNI); do not use 0 or 16777215 as the VNI ID, which are
reserved values under Cumulus Linux.
The VXLAN link and local interfaces are added to the bridge to create the association between port,
VLAN, and VXLAN instance.
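The VNI rule in the first requirement can be checked mechanically. A minimal Python sketch, assuming the only constraints are the 24-bit width and the two reserved values named above (the function name is hypothetical):

```python
MAX_VNI = (1 << 24) - 1          # 16777215, the largest 24-bit value
RESERVED_VNIS = {0, MAX_VNI}     # reserved under Cumulus Linux per this guide


def is_usable_vni(vni: int) -> bool:
    """Return True if vni fits in 24 bits and is not a reserved value."""
    return 0 <= vni <= MAX_VNI and vni not in RESERVED_VNIS


for candidate in (0, 10, 1000, 16777215):
    print(candidate, is_usable_vni(candidate))
```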
Preconfiguring remote MAC addresses does not scale. A better solution is to use the Cumulus
Networks Lightweight Network Virtualization feature, or a controller-based option like Midokura
MidoNet and OpenStack or VMware NSX.
auto vtep1000
iface vtep1000
vxlan-id 1000
vxlan-local-tunnelip 172.10.1.1
auto bridge
iface bridge
bridge-ports swp1 swp2 vtep1000
bridge-vids 10
bridge-vlan-aware yes
post-up bridge fdb add 00:00:10:00:00:0C dev vtep1000 dst 172.20.1.1 vni 1000
auto vtep1000
iface vtep1000
vxlan-id 1000
vxlan-local-tunnelip 172.20.1.1
auto bridge
iface bridge
bridge-ports swp1 swp2 vtep1000
bridge-vlan-aware yes
post-up bridge fdb add 00:00:10:00:00:0A dev vtep1000 dst 172.10.1.1 vni 1000
post-up bridge fdb add 00:00:10:00:00:0B dev vtep1000 dst 172.10.1.1 vni 1000
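Because every remote MAC address needs its own post-up line, the binding list grows with the number of hosts. The Python sketch below generates those lines from a list of (MAC, remote VTEP) pairs, using the same bridge fdb add form shown in the configurations above. The function name is hypothetical.

```python
def fdb_post_up_lines(bindings, dev, vni):
    """Return one 'post-up bridge fdb add' line per (mac, remote_vtep) pair."""
    return [
        f"post-up bridge fdb add {mac} dev {dev} dst {vtep} vni {vni}"
        for mac, vtep in bindings
    ]


# Remote hosts reachable through the VTEP at 172.10.1.1, as in the example.
REMOTE_HOSTS = [
    ("00:00:10:00:00:0A", "172.10.1.1"),
    ("00:00:10:00:00:0B", "172.10.1.1"),
]
for line in fdb_post_up_lines(REMOTE_HOSTS, "vtep1000", 1000):
    print(line)
```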
Troubleshooting
Use the following commands to troubleshoot issues on the switch:
brctl show verifies the VXLAN configuration in a bridge:
LNV is a lightweight controller option. Contact Cumulus Networks with your scale requirements so
we can make sure this is the right fit for you. There are also other controller options that can work
on Cumulus Linux.
You cannot use LNV and EVPN (see page 539) at the same time.
Contents
This topic describes ...
LNV Concepts (see page 488)
Acquire the Forwarding Database at the Service Node (see page 489)
MAC Learning and Flooding (see page 489)
BUM Traffic (see page 489)
Requirements (see page 490)
Hardware Requirements (see page 490)
Configuration Requirements (see page 490)
Install the LNV Packages (see page 491)
Sample LNV Configuration (see page 491)
Network Connectivity (see page 491)
Layer 3 IP Addressing (see page 492)
LNV Concepts
Consider the following example deployment:
The two switches running Cumulus Linux, called leaf1 and leaf2, each have a bridge configured. These two
bridges contain the physical switch port interfaces connecting to the servers as well as the logical VXLAN
interface associated with the bridge. By creating a logical VXLAN interface on both leaf switches, the
switches become VTEPs (virtual tunnel end points). The IP address associated with this VTEP is most
commonly configured as its loopback address; in the image above, the loopback address is 10.2.1.1 for
leaf1 and 10.2.1.2 for leaf2.
BUM Traffic
Cumulus Linux has two ways of handling BUM (broadcast, unknown unicast, and multicast) traffic:
Head end replication
Service node replication
Head end replication is enabled by default in Cumulus Linux.
You cannot have both service node and head end replication configured simultaneously, as this
causes the BUM traffic to be duplicated; both the source VTEP and the service node send their
own copy of each packet to every remote VTEP.
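The duplication described above can be illustrated with a small counting model. This Python sketch is a simplification under stated assumptions: with head end replication the source VTEP sends one copy per remote VTEP, with service node replication it sends a single copy to the service node, and the service node then sends one copy per remote VTEP. The function names are hypothetical.

```python
def copies_per_remote_vtep(head_rep: bool, service_node: bool) -> int:
    """Copies of one BUM packet that each remote VTEP receives."""
    return int(head_rep) + int(service_node)


def packets_sent_by_source(remote_vteps: int, head_rep: bool, service_node: bool) -> int:
    """Copies of one BUM packet that the source VTEP itself transmits."""
    sent = 0
    if head_rep:
        sent += remote_vteps  # one copy per remote VTEP
    if service_node:
        sent += 1             # a single copy to the service node
    return sent


print(copies_per_remote_vtep(True, False))   # head end replication only
print(copies_per_remote_vtep(False, True))   # service node replication only
print(copies_per_remote_vtep(True, True))    # both: duplicated traffic
```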
1. Disable head end replication; set head_rep to False in the /etc/vxrd.conf file.
2. Configure a service node IP address for every VXLAN interface using the vxlan-svcnodeip
parameter:
You only specify this parameter when head end replication is disabled. For the loopback,
the parameter is still named vxrd-svcnode-ip.
svcnode_ip = <>
enable_vxlan_listen = true
Requirements
Hardware Requirements
Broadcom switches with the Tomahawk, Trident II+, or Trident II ASIC or Mellanox switches with the
Spectrum ASIC running Cumulus Linux 2.5.4 or later. Please refer to the Cumulus Networks
hardware compatibility list for a list of supported switch models.
Configuration Requirements
The VXLAN has an associated VXLAN Network Identifier (VNI), also interchangeably called a VXLAN
ID.
The VNI cannot be 0 or 16777215, as these two numbers are reserved values under Cumulus Linux.
The VXLAN link and physical interfaces are added to the bridge to create the association between
the port, VLAN, and VXLAN instance.
Each bridge on the switch has only one VXLAN interface. Cumulus Linux does not support more
than one VXLAN link in a bridge; however, a switch can have multiple bridges.
An SVI (Switch VLAN Interface) or layer 3 address on the bridge is not supported. For example, you
cannot ping from the leaf1 SVI to the leaf2 SVI through the VXLAN tunnel; you need to use server1
and server2 to verify.
Want to try out configuring LNV and do not have a Cumulus Linux switch? Check out Cumulus VX.
Network Connectivity
There must be full network connectivity before you can configure LNV. The layer 3 IP addressing
information as well as the OSPF configuration (/etc/frr/frr.conf) below is provided to make the LNV
example easier to understand.
OSPF is not a requirement for LNV; LNV just requires layer 3 connectivity. With Cumulus Linux, this
can be achieved with static routes, OSPF, or BGP.
Layer 3 IP Addressing
Here is the configuration for the IP addressing information used in this example.
spine1: spine2:
These commands create the following configuration:
leaf1: leaf2:
These commands create the following configuration:
Layer 3 Fabric
The service nodes and registration nodes must all be routable between each other. The layer 3 fabric on
Cumulus Linux can either be BGP (see page 756) or OSPF (see page 738). In this example, OSPF is used to
demonstrate full reachability.
FRRouting configuration using OSPF:
spine1: spine2:
These commands create the following configuration:
leaf1: leaf2:
These commands create the following configuration:
Host Configuration
In this example, the servers are running Ubuntu 14.04. There needs to be a trunk mapped from server1
and server2 to the respective switch. In Ubuntu, this is done with subinterfaces.
server1: server2:
On Ubuntu, it is more reliable to use ifup and ifdown to bring the interfaces up and down individually,
rather than restarting networking entirely (there is no equivalent to ifreload like there is in Cumulus
Linux):
leaf1: leaf2:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo auto lo
iface lo iface lo
address 10.2.1.1/32 address 10.2.1.2/32
vxrd-src-ip 10.2.1.1 vxrd-src-ip 10.2.1.2
bridge-access 10 bridge-access 10
mstpctl-bpduguard yes mstpctl-bpduguard yes
mstpctl-portbpdufilter yes mstpctl-portbpdufilter yes
vxlan-id 10 vxlan-id 10
vxlan-local-tunnelip 10.2.1.1 vxlan-local-tunnelip 10.2.1.2
Why is vni-2000 not vni-20? For example, why not tie VLAN 20 to VNI 20, or why was 2000 used?
VXLANs and VLANs do not need to be the same number. However, if you are using fewer than
4096 VLANs, there is no reason not to make it easy and correlate VLANs to VXLANs. It is
completely up to you.
As with any logical interfaces on Linux, the name does not matter (other than a 15-character limit). To verify
the associated VNI for the logical name, use the ip -d link show command:
The vxlan id 10 indicates the VXLAN ID/VNI is indeed 10 as the logical name suggests.
svcnode_ip = 10.2.1.3
Enable, then restart the registration node daemon for the change to take effect:
Name               Description                                                        Default
loglevel           The log level: DEBUG, INFO, WARNING, ERROR, CRITICAL.              INFO
logdest            The destination for log messages. The destination can be a file    syslog
                   name, stdout, or syslog.
logfilesize        The log file size in bytes. Used when logdest is a file name.      512000
logbackupcount     The maximum number of log files stored on the disk. Used when      14
                   logdest is a file name.
pidfile            The PID file location for the vxrd daemon.                         /var/run/vxrd.pid
udsfile            The file name for the Unix domain socket used for management.      /var/run/vxrd.sock
vxfld_port         The UDP port used for VXLAN control messages.                      10001
svcnode_ip         The address to which registration daemons send control messages
                   for registration and/or BUM packets for replication. You can also
                   configure this option in the /etc/network/interfaces file with
                   the vxrd-svcnode-ip keyword.
holdtime           The hold time (in seconds) for soft state, which is how long the   90 seconds
                   service node waits before ageing out an IP address for a VNI. The
                   vxrd includes this in the register messages it sends to a vxsnd.
src_ip             The local IP address to bind to for receiving control traffic
                   from the service node daemon.
refresh_rate       The number of times to refresh within the hold time. The higher    3 seconds
                   this number, the more lost UDP refresh messages can be tolerated.
config_check_rate  The number of seconds to poll the system for current VXLAN         5 seconds
                   membership.
head_rep           Enables self replication. Instead of using the service node to     true
                   replicate BUM packets, it is done in hardware on the VTEP switch.
Use 1, yes, true, or on for True for each relevant option. Use 0, no, false, or off for False.
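The accepted boolean spellings can be captured in a small lookup. The Python sketch below is an illustration only: the function name is hypothetical, and treating the values as case-insensitive is an assumption, made because this guide writes both true and True.

```python
TRUE_VALUES = {"1", "yes", "true", "on"}
FALSE_VALUES = {"0", "no", "false", "off"}


def parse_bool_option(value: str) -> bool:
    """Convert a configuration value such as 'yes' or 'off' to a bool."""
    normalized = value.strip().lower()  # assumption: case does not matter
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognized boolean value: {value!r}")


print(parse_bool_option("False"))  # head_rep = False disables head end replication
print(parse_bool_option("on"))
```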
For the example configuration, default values are used, except for the svcnode_ip field.
The address field is set to the loopback address of the switch running the vxsnd daemon.
svcnode_ip = 10.2.1.3
Enable, then restart the service node daemon for the change to take effect:
Name                 Description                                                       Default
loglevel             The log level: DEBUG, INFO, WARNING, ERROR, CRITICAL.             INFO
logdest              The destination for log messages. The destination can be a file   syslog
                     name, stdout, or syslog.
logfilesize          The log file size in bytes. Used when logdest is a file name.     512000
logbackupcount       The maximum number of log files stored on disk. Used when         14
                     logdest is a file name.
pidfile              The PID file location for the vxrd daemon.                        /var/run/vxrd.pid
udsfile              The file name for the Unix domain socket used for management.     /var/run/vxrd.sock
vxfld_port           The UDP port used for VXLAN control messages.                     10001
svcnode_ip           The address to which registration daemons send control messages   0.0.0.0
                     for registration and/or BUM packets for replication.
holdtime             The holdtime (in seconds) for soft state. This option is used     90
                     when sending a register message to peers in response to learning
                     a <vni, addr> from a VXLAN data packet.
src_ip               The local IP address to bind to for receiving inter-vxsnd         0.0.0.0
                     control traffic.
svcnode_peers        A space-separated list of IP addresses with which the vxsnd
                     shares its state.
enable_vxlan_listen  When set to true, the service node listens for VXLAN data         true
                     traffic.
install_svcnode_ip   When set to true, the snd_peer_address gets installed on the      false
                     loopback interface. It gets withdrawn when the vxsnd is not in
                     service. If set to true, you must define the snd_peer_address
                     configuration variable.
age_check            Number of seconds to wait before checking the database to age     90 seconds
                     out stale entries.
Use 1, yes, true, or on for True for each relevant option. Use 0, no, false, or off for False.
Use the vxrdctl peers command to see configured VNIs and all VTEPs (leaf switches) within the network
that have them configured.
cumulus@leaf1:~$ vxrdctl peers        cumulus@leaf2:~$ vxrdctl peers
VNI     Peer Addrs                    VNI     Peer Addrs
===     ==========                    ===     ==========
10      10.2.1.1, 10.2.1.2            10      10.2.1.1, 10.2.1.2
30      10.2.1.1, 10.2.1.2            30      10.2.1.1, 10.2.1.2
2000    10.2.1.1, 10.2.1.2            2000    10.2.1.1, 10.2.1.2
When head end replication mode is disabled, the command does not work.
Use the vxrdctl peers command to see the other VTEPs (leaf switches) and the VNIs with
which they are associated. This does not show anything unless you enabled head end replication
mode by setting the head_rep option to True. Otherwise, replication is done by the service node.
SVIs (switch VLAN interfaces) are not supported when using VXLAN. There cannot be an IP
address on the bridge that also contains a VXLAN.
10 10.10.10.1 10.10.10.2
30 10.10.30.1 10.10.30.2
The other VNIs were also tested and can be viewed in the expanded output below.
Test connectivity between VNI-2000 connected servers by pinging from server1:
90:e2:ba:55:f0:85 appears in the MAC address table, which indicates that connectivity is occurring between
leaf1 and server1.
spine1: spine2:
Add the 10.10.10.10/32 address to the loopback address:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo                       auto lo
iface lo inet loopback        iface lo inet loopback
address 10.2.1.3/32           address 10.2.1.4/32
address 10.10.10.10/32        address 10.10.10.10/32
spine1: spine2:
Use a text editor to edit the network configuration:
This sets the address on which the service node listens to VXLAN messages to the configured
anycast address and sets it to sync with the other spine (spine2 on spine1, spine1 on spine2).
Enable, then restart the vxsnd daemon:
leaf1: leaf2:
Change the vxrd-svcnode-ip field to the anycast address:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo                        auto lo
iface lo inet loopback         iface lo inet loopback
address 10.2.1.1               address 10.2.1.2
vxrd-svcnode-ip 10.10.10.10    vxrd-svcnode-ip 10.10.10.10
Verify the new service node is configured:
The svcnode 10.10.10.10 means the interface has the correct service node configured.
Use the vxrdctl vxlans command to check the service node:
Test Connectivity
Repeat the ping tests from the previous section. Here is the table again for reference:
10 10.10.10.1 10.10.10.2
30 10.10.30.1 10.10.30.2
Related Information
tools.ietf.org/html/rfc7348
en.wikipedia.org/wiki/Anycast
Network virtualization chapter, Cumulus Linux user guide (see page 476)
Contents
This topic describes ...
Terminology and Definitions (see page 515)
Configure LNV Active-active Mode (see page 516)
Active-active VTEP Anycast IP Behavior (see page 517)
Failure Scenario Behaviors (see page 517)
Check VXLAN Interface Configuration Consistency (see page 518)
Configure the Anycast IP Address (see page 518)
Example VXLAN Active-Active Configuration (see page 520)
FRRouting Configuration (see page 520)
Layer 3 IP Addressing (see page 520)
Host Configuration (see page 526)
Enable the Registration Daemon (see page 526)
Configure a VTEP (see page 527)
Enable the Service Node Daemon (see page 527)
Configure the Service Node (see page 527)
Considerations for Virtual Topologies Using Cumulus VX (see page 529)
Node ID (see page 529)
Bonds with Vagrant (see page 530)
Troubleshooting (see page 530)
Caveats and Errata (see page 532)
Related Information (see page 532)
Term            Definition
vxrd            The VXLAN registration daemon. The daemon runs on the switch that is mapping VLANs to
                VXLANs. You must configure the vxrd daemon to register to a service node. This turns the
                switch into a VTEP.
VTEP            The virtual tunnel endpoint. This is an encapsulation and decapsulation point for VXLANs.
ToR             The top of rack switch; also referred to as a leaf or access switch.
Spine           The aggregation switch for multiple leafs. Specifically used when a data center is using a Clos
                network architecture. Read more about spine-leaf architecture in this white paper.
vxsnd           The VXLAN service node daemon that you can run to register multiple VTEPs.
exit leaf       A switch dedicated to peering the Clos network to an outside network; also referred to as a
                border leaf, service leaf, or edge leaf.
anycast         When an IP address is advertised from multiple locations. Allows multiple devices to share the
                same IP and effectively load balance traffic across them. With LNV, anycast is used in two
                places:
RIOT            A Broadcom feature for routing in and out of tunnels. Allows a VXLAN bridge to have a switch
                VLAN interface associated with it, and traffic to exit a VXLAN into the layer 3 fabric. Also called
                VXLAN Routing.
VXLAN Routing   The industry standard term for the ability to route in and out of a VXLAN. Equivalent to the
                Broadcom RIOT feature.
MLAG          Refer to the MLAG chapter (see page ) for more detailed configuration information.
              Configurations for the demonstration are provided below.
OSPF or BGP   Refer to the OSPF chapter (see page 738) or the BGP chapter (see page 756) for more
              detailed configuration information. Configurations for the demonstration are provided
              below.
LNV           Refer to the LNV chapter (see page 487) for more detailed configuration information.
              Configurations for the demonstration are provided below.
STP           You must enable BPDU filter and BPDU guard (see page ) in the VXLAN interfaces if
              STP (see page 515) is enabled in the bridge that is connected to the VXLAN.
              Configurations for the demonstration are provided below.
1 When the switches boot up, ifupdown2 places all VXLAN interfaces in a PROTO_DOWN state (see
page ). The configured anycast addresses are not configured yet.
2 MLAG peering takes place and a successful VXLAN interface consistency check between the switches
occurs.
3 clagd (the daemon responsible for MLAG) adds the anycast address to the loopback interface. It
then changes the local IP address of the VXLAN interface from a unique address to the anycast virtual
IP address and puts the interface in an UP state.
Scenario                      Behavior
The peer link goes down.      The primary MLAG switch continues to keep all VXLAN interfaces up with the
                              anycast IP address while the secondary switch brings down all VXLAN interfaces
                              and places them in a PROTO_DOWN state. The secondary MLAG switch removes the
                              anycast IP address from the loopback interface and changes the local IP address
                              of the VXLAN interface to the configured unique IP address.
One of the switches goes      The other operational switch continues to use the anycast IP address.
down.
clagd is stopped.             All VXLAN interfaces are put in a PROTO_DOWN state. The anycast IP address is
                              removed from the loopback interface and the local IP addresses of the VXLAN
                              interfaces are changed from the anycast IP address to unique non-virtual IP
                              addresses.
MLAG peering could not be     clagd brings up all the VXLAN interfaces after the reload timer expires with the
established between the       configured anycast IP address. This allows the VXLAN interface to be up and
switches.                     running on both switches even though peering is not established.
When the peer link goes down  All VXLAN interfaces are put into a PROTO_DOWN state on the secondary switch.
but the peer switch is up
(the backup link is active).
A configuration mismatch      The VXLAN interface is placed into a PROTO_DOWN state on the secondary switch.
between the MLAG switches
auto lo
iface lo inet loopback
address 10.0.0.11/32
vxrd-src-ip 10.0.0.11
vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 10.10.10.20
auto lo
iface lo inet loopback
address 10.0.0.12/32
vxrd-src-ip 10.0.0.12
vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 10.10.10.20
Explanation of Variables
Variable                 Explanation
vxrd-src-ip
vxrd-svcnode-ip          The service node anycast IP address in the topology. In this demonstration, this is an
                         anycast IP address shared by both spine switches.
clagd-vxlan-anycast-ip   The anycast address for the MLAG pair to share and bind to when MLAG is up and
                         running.
Note the configuration of the local IP address in the VXLAN interfaces below. They are configured with
individual IP addresses, which clagd changes to anycast upon MLAG peering.
FRRouting Configuration
You can configure the layer 3 fabric using BGP (see page 756) or OSPF (see page 738). The following
example uses BGP unnumbered. The MLAG switch configuration for the topology above is shown below.
Layer 3 IP Addressing
The IP address configuration for this example:
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.21/32 address 10.0.0.22/32
address 10.10.10.10/32 address 10.10.10.10/32
# downlinks # downlinks
auto swp1 auto swp1
iface swp1 iface swp1
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.11/32 address 10.0.0.12/32
vxrd-src-ip 10.0.0.11 vxrd-src-ip 10.0.0.12
vxrd-svcnode-ip 10.10.10.10 vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip clagd-vxlan-anycast-ip
10.10.10.20 10.10.10.20
# peerlinks # peerlinks
auto swp49 auto swp49
iface swp49 iface swp49
bond-xmit-hash-policy bond-xmit-hash-policy
layer3+4 layer3+4
# Downlinks # Downlinks
auto swp1 auto swp1
iface swp1 iface swp1
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.13/32 address 10.0.0.14/32
vxrd-src-ip 10.0.0.13 vxrd-src-ip 10.0.0.14
vxrd-svcnode-ip 10.10.10.10 vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip clagd-vxlan-anycast-ip
10.10.10.30 10.10.10.30
# peerlinks # peerlinks
auto swp49 auto swp49
iface swp49 iface swp49
# Downlinks # Downlinks
auto swp1 auto swp1
iface swp1 iface swp1
mstpctl-portbpdufilter mstpctl-portbpdufilter
vxlan1=yes vxlan1=yes
mstpctl-bpduguard mstpctl-bpduguard
vxlan1=yes vxlan1=yes
# uplinks # uplinks
auto swp51 auto swp51
iface swp51 iface swp51
Host Configuration
In this example, the servers are running Ubuntu 14.04. A layer2 bond must be mapped from server01 and
server03 to the respective switch. In Ubuntu this is done with subinterfaces.
server01 server03
auto lo auto lo
iface lo inet loopback iface lo inet loopback
auto lo auto lo
iface lo inet static iface lo inet static
address 10.0.0.31/32 address 10.0.0.33/32
START=yes
Configure a VTEP
The registration node is configured earlier in /etc/network/interfaces; no additional configuration is
typically needed. Alternatively, you can perform the configuration in the /etc/vxrd.conf file, which has
additional configuration knobs available.
START=yes
The /etc/vxsnd.conf file is identical on both service nodes except for src_ip, which is 10.0.0.21 on one node (shown below) and 10.0.0.22 on the other.

[common]
# Log level is one of DEBUG, INFO, WARNING, ERROR, CRITICAL
#loglevel = INFO
# Destination for log message. Can be a file name, 'stdout', or 'syslog'
#logdest = syslog
# log file size in bytes. Used when logdest is a file
#logfilesize = 512000
# maximum number of log files stored on disk. Used when logdest is a file
#logbackupcount = 14
# The file to write the pid. If using monit, this must match the one
# in the vxsnd.rc
#pidfile = /var/run/vxsnd.pid
# The file name for the unix domain socket used for mgmt.
#udsfile = /var/run/vxsnd.sock
# UDP port for vxfld control messages
#vxfld_port = 10001
# This is the address to which registration daemons send control messages for
# registration and/or BUM packets for replication
svcnode_ip = 10.10.10.10
# Holdtime (in seconds) for soft state. It is used when sending a
# register msg to peers in response to learning a <vni, addr> from a
# VXLAN data pkt
#holdtime = 90
# Local IP address to bind to for receiving inter-vxsnd control traffic
src_ip = 10.0.0.21
[vxsnd]
# Space separated list of IP addresses of vxsnd to share state with
svcnode_peers = 10.0.0.21 10.0.0.22
Node ID
vxrd requires a unique node_id for each individual switch. This node_id is derived from the MAC address of
the first interface; in certain virtual topologies, such as Vagrant, both leaf switches in an MLAG pair can
therefore generate the same node_id. In that case, you must configure one of the node_ids manually (or make
sure the first interface always has a unique MAC address).
To verify the node_id that gets configured by your switch, use the vxrdctl get config command:
"logfilesize": 512000,
"loglevel": "INFO",
"max_packet_size": 1500,
"node_id": 13,
"pidfile": "/var/run/vxrd.pid",
"refresh_rate": 3,
"src_ip": "10.2.1.50",
"svcnode_ip": "10.10.10.10",
"udsfile": "/var/run/vxrd.sock",
"vxfld_port": 10001
}
[common]
node_id = 13
Ensure that each leaf has a separate node_id so that LNV can function correctly.
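When several leaf switches are involved, comparing the node_id values by hand is error-prone. The following Python sketch shows one way to detect duplicates. It assumes that the output of vxrdctl get config has been collected from each switch as JSON text (the fragment shown above suggests this format); the function names and the abbreviated sample dumps are hypothetical.

```python
import json


def node_ids(configs):
    """Map each switch name to the node_id found in its vxrd configuration dump."""
    return {name: json.loads(text)["node_id"] for name, text in configs.items()}


def duplicate_node_ids(configs):
    """Return {node_id: [switch names]} for every node_id used by more than one switch."""
    seen = {}
    for name, node_id in node_ids(configs).items():
        seen.setdefault(node_id, []).append(name)
    return {nid: sorted(names) for nid, names in seen.items() if len(names) > 1}


# Hypothetical, abbreviated dumps; real output contains more keys.
sample = {
    "leaf01": '{"node_id": 13, "src_ip": "10.0.0.11"}',
    "leaf02": '{"node_id": 13, "src_ip": "10.0.0.12"}',
}
print(duplicate_node_ids(sample))
```

An empty result means every switch reported a distinct node_id; any entry in the result names the switches that need a manual node_id.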
auto swp49
iface swp49
#for vagrant so bonds work correctly
post-up ip link set $IFACE promisc on
auto swp50
iface swp50
#for vagrant so bonds work correctly
post-up ip link set $IFACE promisc on
For more information on using Cumulus VX and Vagrant, refer to the Cumulus VX documentation.
Troubleshooting
In addition to the troubleshooting steps for single-attached LNV, you must now consider the MLAG daemon (clagd).
The clagctl command shows the MLAG state and any inconsistencies that might arise
between the switches in an MLAG pair.
cumulus@leaf01$ clagctl
The peer is alive
Our Priority, ID, and Role: 32768 44:38:39:00:00:35 primary
Peer Priority, ID, and Role: 32768 44:38:39:00:00:36 secondary
Peer Interface and IP: peerlink.4094 169.254.1.2
VxLAN Anycast IP: 10.10.10.30
Backup IP: 10.0.0.14 (inactive)
System MAC: 44:38:39:ff:40:95
CLAG Interfaces
Our Interface    Peer Interface   CLAG Id Conflicts            Proto-Down Reason
---------------- ---------------- ------- -------------------- -----------------
bond0            bond0            1       -                    -
vxlan20          vxlan20          -       -                    -
vxlan1           vxlan1           -       -                    -
vxlan10          vxlan10          -       -                    -
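Output | Explanation
VXLAN Anycast IP: 10.10.10.30 | The anycast IP address being shared by the MLAG pair for VTEP termination is in use and is 10.10.10.30.
Conflicts: - | No conflicts are reported for the interface.
Proto-Down Reason: - | The interface is not in a PROTO_DOWN state, so no reason is reported.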
In the next example, the vxlan-id on VXLAN10 is changed to the wrong value. When you run the clagctl
command, you see that VXLAN10 goes down because this switch is the secondary switch and the
peer switch takes control of the VXLAN. The reason code is vxlan-single, indicating that there is a vxlan-id
mismatch on VXLAN10.
cumulus@leaf02$ clagctl
The peer is alive
Peer Priority, ID, and Role: 32768 44:38:39:00:00:11 primary
Related Information
Network virtualization chapter, Cumulus Linux user guide (see page 476)
LNV is a lightweight controller option. Contact Cumulus Networks with your scale requirements so
we can make sure this is the right fit for you. There are also other controller options that can work
on Cumulus Linux.
Contents
This topic describes ...
Example LNV Configuration (see page 533)
Layer 3 IP Addressing (see page 533)
Want to try out configuring LNV and do not have a Cumulus Linux switch? Check out Cumulus VX .
Feeling Overwhelmed? Come join a Cumulus Boot Camp and get instructor-led training!
Layer 3 IP Addressing
Here is the configuration for the IP addressing information used in this example:
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.2.1.3/32 address 10.2.1.4/32
address 10.10.10.10/32 address 10.10.10.10/32
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.2.1.1/32 address 10.2.1.2/32
vxrd-src-ip 10.2.1.1 vxrd-src-ip 10.2.1.2
vxrd-svcnode-ip 10.10.10.10 vxrd-svcnode-ip 10.10.10.10
vxlan-id 10 vxlan-id 10
vxlan-local-tunnelip 10.2.1.1 vxlan-local-tunnelip 10.2.1.2
mstpctl-bpduguard yes mstpctl-bpduguard yes
mstpctl-portbpdufilter yes mstpctl-portbpdufilter yes
FRRouting Configuration
The service nodes and registration nodes must all be routable between each other. The layer 3 fabric on
Cumulus Linux can either be BGP (see page 756) or OSPF (see page 738). In this example, OSPF is used to
demonstrate full reachability.
Here is the FRRouting configuration using OSPF:
interface lo interface lo
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
interface swp49 interface swp49
ip ospf network point-to- ip ospf network point-to-
point point
interface lo interface lo
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
interface swp1s0 interface swp1s0
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface swp1s1 interface swp1s1
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface swp1s2 interface swp1s2
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface swp1s3 interface swp1s3
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
! !
! !
! !
! !
router-id 10.2.1.1 router-id 10.2.1.2
router ospf router ospf
ospf router-id 10.2.1.1 ospf router-id 10.2.1.2
Host Configuration
In this example, the servers are running Ubuntu 14.04. You must map a trunk from server1 and server2 to
the respective switch. In Ubuntu, this is done with subinterfaces.
server1 server2
spine1:/etc/vxsnd.conf and spine2:/etc/vxsnd.conf

The two files are identical except for src_ip and svcnode_peers. The spine1 file is shown below; on spine2, src_ip is 10.2.1.4 and svcnode_peers is 10.2.1.3.

[common]
# Log level is one of DEBUG, INFO, WARNING, ERROR, CRITICAL
#loglevel = INFO
# Destination for log message. Can be a file name, 'stdout', or 'syslog'
#logdest = syslog
# log file size in bytes. Used when logdest is a file
#logfilesize = 512000
# maximum number of log files stored on disk. Used when logdest is a file
#logbackupcount = 14
# The file to write the pid. If using monit, this must match the one
# in the vxsnd.rc
#pidfile = /var/run/vxsnd.pid
# The file name for the unix domain socket used for mgmt.
#udsfile = /var/run/vxsnd.sock
# UDP port for vxfld control messages
#vxfld_port = 10001
# This is the address to which registration daemons send control messages for
# registration and/or BUM packets for replication
svcnode_ip = 10.10.10.10
# Holdtime (in seconds) for soft state. It is used when sending a
# register msg to peers in response to learning a <vni, addr> from a
# VXLAN data pkt
#holdtime = 90
# Local IP address to bind to for receiving inter-vxsnd control traffic
src_ip = 10.2.1.3
[vxsnd]
# Space separated list of IP addresses of vxsnd to share state with
svcnode_peers = 10.2.1.4
# When set to true, the service node will listen for vxlan data traffic
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#enable_vxlan_listen = true
# When set to true, the svcnode_ip will be installed on the loopback
# interface, and it will be withdrawn when the vxsnd is no longer in
# service. If set to true, the svcnode_ip configuration
# variable must be defined.
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#install_svcnode_ip = false
# Seconds to wait before checking the database to age out stale entries
#age_check = 90
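Because the two service node files must agree on svcnode_ip and must list each other in svcnode_peers, a quick consistency check can catch copy-and-paste mistakes. The following Python sketch is one possible approach and is not a Cumulus Linux tool. It assumes the file uses the INI-style layout shown above, so that the standard configparser module can read it; the function names and the trimmed sample strings are hypothetical.

```python
import configparser


def load_vxsnd(text):
    """Parse vxsnd.conf-style text and return the settings relevant to peering."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "svcnode_ip": parser.get("common", "svcnode_ip"),
        "src_ip": parser.get("common", "src_ip"),
        # svcnode_peers is a space separated list of IP addresses.
        "peers": parser.get("vxsnd", "svcnode_peers").split(),
    }


def check_pair(first, second):
    """Return a list of problems found between two service node configurations."""
    problems = []
    if first["svcnode_ip"] != second["svcnode_ip"]:
        problems.append("svcnode_ip differs between the service nodes")
    if second["src_ip"] not in first["peers"]:
        problems.append(f"{second['src_ip']} is missing from the first node's svcnode_peers")
    if first["src_ip"] not in second["peers"]:
        problems.append(f"{first['src_ip']} is missing from the second node's svcnode_peers")
    return problems


# Trimmed copies of the two files, keeping only the uncommented settings.
SPINE1 = """
[common]
svcnode_ip = 10.10.10.10
src_ip = 10.2.1.3
[vxsnd]
svcnode_peers = 10.2.1.4
"""
SPINE2 = """
[common]
svcnode_ip = 10.10.10.10
src_ip = 10.2.1.4
[vxsnd]
svcnode_peers = 10.2.1.3
"""
print(check_pair(load_vxsnd(SPINE1), load_vxsnd(SPINE2)))
```

An empty list indicates that the two configurations reference each other correctly.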
Related Information
tools.ietf.org/html/rfc7348
en.wikipedia.org/wiki/Anycast
Detailed LNV Configuration Guide (see page 487)
Cumulus Networks Training
Network virtualization chapter, Cumulus Linux user guide (see page 476)
Ethernet Virtual Private Network (EVPN) is a standards-based control plane for VXLAN (see page 476)
defined in RFC 7432 and draft-ietf-bess-evpn-overlay that allows for building and deploying VXLANs at scale.
It relies on multi-protocol BGP (MP-BGP) for exchanging information and is based on BGP-MPLS IP VPNs
(RFC 4364). It has provisions to enable not only bridging between end systems in the same layer 2 segment
but also routing between different segments (subnets). There is also inherent support for multi-tenancy.
EVPN is often referred to as the means of implementing controller-less VXLAN.
Cumulus Linux fully supports EVPN as the control plane for VXLAN, including for both intra-subnet bridging
and inter-subnet routing. Key features include:
VNI membership exchange between VTEPs using EVPN type-3 (Inclusive multicast Ethernet tag)
routes.
Exchange of host MAC and IP addresses using EVPN type-2 (MAC/IP advertisement) routes.
Support for host/VM mobility (MAC and IP moves) through exchange of the MAC Mobility Extended
community.
Support for dual-attached hosts via VXLAN active-active mode (see page 515). MAC synchronization
between the peer switches is done using MLAG (see page 427).
Support for ARP/ND suppression, which provides VTEPs with the ability to suppress ARP flooding
over VXLAN tunnels.
Support for exchange of static (sticky) MAC addresses through EVPN.
Support for distributed symmetric routing between different subnets.
Support for distributed asymmetric routing between different subnets.
Support for centralized routing.
Support for prefix-based routing using EVPN type-5 routes (EVPN IP prefix route)
Support for layer 3 multi-tenancy.
Support for IPv6 tenant routing.
Symmetric routing, asymmetric routing and prefix-based routing are supported for both IPv4 and
IPv6 hosts and prefixes.
ECMP support for overlay networks on RIOT-capable Broadcom switches (Trident 3, Maverick,
Trident 2+) in addition to Mellanox Spectrum-A1 and Tomahawk switches.
EVPN address-family is supported with both eBGP and iBGP peering. If the underlay routing is provisioned
using eBGP, the same eBGP session can also be used to carry EVPN routes. For example, in a typical 2-tier
Clos network topology where the leaf switches are the VTEPs, if eBGP sessions are in use between the leaf
and spine switches for the underlay routing, the same sessions can be used to exchange EVPN routes; the
spine switches merely act as "route forwarders" and do not install any forwarding state as they are not
VTEPs. When EVPN routes are exchanged over iBGP peering, OSPF can be used as the IGP or the next hops
can also be resolved using iBGP.
You can provision and manage EVPN using NCLU (see page 88).
For Cumulus Linux 3.4 and later releases, the routing control plane (including EVPN) is installed as
part of the FRRouting (FRR) package. For more information about FRR, refer to the FRR Overview
(see page 713).
For information about VXLAN routing, including platform and hardware limitations, see VXLAN Routing (see
page 638).
Contents
This topic describes ...
Basic EVPN Configuration (see page 542)
Enable EVPN between BGP Neighbors (see page 542)
Advertise All VNIs (see page 543)
Auto-derivation of RDs and RTs (see page 544)
User-defined RDs and RTs (see page 544)
Enable EVPN in an iBGP Environment with an OSPF Underlay (see page 545)
Disable Data Plane MAC Learning over VXLAN Tunnels (see page 547)
BUM Traffic (see page 547)
ARP and ND Suppression (see page 547)
UFT Profiles Other than the Default (see page 549)
Support for EVPN Neighbor Discovery (ND) Extended Community (see page 550)
EVPN and VXLAN Active-active Mode (see page 551)
Active-active VTEP Anycast IP Behavior (see page 551)
Failure Scenario Behaviors (see page 551)
Inter-subnet Routing (see page 552)
Centralized Routing (see page 553)
Asymmetric Routing (see page 554)
Symmetric Routing (see page 554)
Prefix-based Routing — EVPN Type-5 Routes (see page 558)
Configure the Switch to Install EVPN Type-5 Routes (see page 558)
Announce EVPN Type-5 Routes (see page 558)
EVPN Type-5 Routing with Asymmetric Routing (see page 559)
Control Which RIB Routes Are Injected into EVPN (see page 559)
Originate Default EVPN Type-5 Routes (see page 560)
EVPN Enhancements (see page 560)
Static (Sticky) MAC Addresses (see page 560)
Filter EVPN Routes Based on Type (see page 561)
Extended Mobility (see page 561)
Duplicate Address Detection (see page 562)
EVPN Operational Commands (see page 566)
General Linux Commands Related to EVPN (see page 566)
General BGP Operational Commands Relevant to EVPN (see page 567)
Display EVPN address-family Peers (see page 571)
Display VNIs in EVPN (see page 571)
Examine Local and Remote MAC Addresses for a VNI in EVPN (see page 572)
Examine Local and Remote Neighbors for a VNI in EVPN (see page 573)
Examine Remote Router MACs in EVPN (see page 573)
Examine Gateway Next Hops in EVPN (see page 574)
Display the VRF Routing Table in FRR (see page 574)
Display the Global BGP EVPN Routing Table (see page 575)
Display a Specific EVPN Route (see page 576)
Display the per-VNI EVPN Routing Table (see page 578)
Display the per-VRF BGP Routing Table (see page 578)
Examine MAC Moves (see page 579)
Examine Sticky MAC Addresses (see page 580)
Troubleshooting (see page 580)
Caveats (see page 581)
Example Configurations (see page 582)
Basic Clos (4x2) for Bridging (see page 583)
Clos Configuration with MLAG and Centralized Routing (see page 594)
Clos Configuration with MLAG and EVPN Asymmetric Routing (see page 607)
Basic Clos Configuration with EVPN Symmetric Routing (see page 620)
1. Enable EVPN route exchange (that is, address-family layer 2 VPN/EVPN) between BGP peers.
2. Enable EVPN on the system to advertise VNIs and host reachability information (MAC addresses
learned on associated VLANs) to BGP peers.
3. Disable MAC learning on VXLAN interfaces as EVPN is responsible for installing remote MACs.
Additional configuration is necessary to enable ARP/ND suppression, provision inter-subnet routing, and so
on. The configuration depends on the deployment scenario. You can also configure various other BGP
parameters.
To configure an EVPN route exchange with a BGP peer, you must activate the peer or peer-group within the
EVPN address-family:
The command syntax bgp evpn is also permitted for backwards compatibility with prior versions
of Cumulus Linux, but the syntax bgp l2vpn evpn is recommended to standardize the BGP
address-family configuration to the AFI/SAFI format.
The above commands create the following configuration snippet in the /etc/frr/frr.conf file.
The above configuration does not result in BGP knowing about the local VNIs defined on the system and
advertising them to peers. This requires additional configuration, as described below (see page 543).
The above commands create the following configuration snippet in the /etc/frr/frr.conf file.
This configuration is only needed on leaf switches that are VTEPs. EVPN routes received from a
BGP peer are accepted, even without this explicit EVPN configuration. These routes are
maintained in the global EVPN routing table. However, they only become effective (that is,
imported into the per-VNI routing table and appropriate entries installed in the kernel) when the
VNI corresponding to the received route is locally known.
These commands create the following configuration snippet in the /etc/frr/frr.conf file.
These commands are per VNI and must be specified under address-family l2vpn evpn in
BGP.
If you delete the RD or RT later, it reverts back to its corresponding default value.
You can configure multiple RT values for import or export for a VNI. In addition, you can configure both the
import and export route targets with a single command by using route-target both:
cumulus@switch:~$ net add bgp evpn vni 10400 route-target import 100:400
cumulus@switch:~$ net add bgp evpn vni 10400 route-target import 100:500
cumulus@switch:~$ net add bgp evpn vni 10500 route-target both 65000:500
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The above commands create the following configuration snippet in the /etc/frr/frr.conf file:
These commands create the following configuration snippet in the /etc/frr/frr.conf file.
interface lo
 ip ospf area 0.0.0.0
!
interface swp50
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
interface swp51
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
!
router bgp 65020
 neighbor 10.1.1.2 remote-as internal
 neighbor 10.1.1.3 remote-as internal
 neighbor 10.1.1.4 remote-as internal
 !
 address-family l2vpn evpn
  neighbor 10.1.1.2 activate
  neighbor 10.1.1.3 activate
  neighbor 10.1.1.4 activate
  advertise-all-vni
 exit-address-family
!
router ospf
 ospf router-id 10.1.1.1
 passive-interface lo
These commands create the following code snippet in the /etc/network/interfaces file:
auto vni200
iface vni200
bridge-access 200
bridge-learning off
vxlan-id 10200
vxlan-local-tunnelip 10.0.0.1
For a bridge in traditional mode (see page 414), you must edit the bridge configuration in the
/etc/network/interfaces file using a text editor:
auto bridge1
iface bridge1
bridge-ports swp3.100 swp4.100 vni100
bridge-learning vni100=off
BUM Traffic
With EVPN, the only method of handling BUM traffic is Head End Replication (HER) (see page 489). HER is
enabled by default, as it is when Lightweight Network Virtualization (LNV) is used.
On switches with the Mellanox Spectrum chipset, ND suppression only functions with the
Spectrum A1 chip.
ARP and ND suppression are not enabled by default. You configure ARP/ND suppression on a VXLAN
interface. You also need to create an SVI for the neighbor entry.
When ARP and ND suppression are enabled, you need to configure layer 3 interfaces even if the
switch is configured only for layer 2 (that is, you are not using VXLAN routing). To prevent
unnecessary layer 3 information from being installed, Cumulus Networks recommends that you
configure the ip forward off or ip6 forward off options as appropriate on the VLANs.
See the example configuration below.
To configure ARP or ND suppression, use NCLU (see page 88). Here is an example configuration using two
VXLANs (10100 and 10200) and two VLANs (100 and 200).
auto bridge
iface bridge
bridge-ports vni100 vni200
bridge-stp on
bridge-vids 100 200
bridge-vlan-aware yes
auto vlan100
iface vlan100
ip6-forward off
ip-forward off
vlan-id 100
vlan-raw-device bridge
auto vlan200
iface vlan200
ip6-forward off
ip-forward off
vlan-id 200
vlan-raw-device bridge
auto vni100
iface vni100
bridge-access 100
bridge-arp-nd-suppress on
bridge-learning off
vxlan-id 10100
vxlan-local-tunnelip 10.0.0.1
auto vni200
iface vni200
bridge-learning off
bridge-access 200
bridge-arp-nd-suppress on
vxlan-id 10200
vxlan-local-tunnelip 10.0.0.1
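Because each VXLAN interface with suppression enabled needs an SVI for its VLAN, it can help to check a configuration for VNIs whose access VLAN has no matching SVI. The following Python sketch is an illustrative checker, not a Cumulus Linux utility. It assumes the stanza layout used in the example above (an iface line followed by attribute lines); the function names and the shortened sample are hypothetical.

```python
def parse_stanzas(text):
    """Return {interface name: {attribute: value}} from ifupdown2-style text."""
    stanzas, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("auto "):
            continue
        if line.startswith("iface "):
            # The interface name is the second word of the iface line.
            current = stanzas.setdefault(line.split()[1], {})
        elif current is not None:
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return stanzas


def vnis_missing_svi(stanzas):
    """List VXLAN interfaces with suppression on whose access VLAN has no SVI."""
    svi_vlans = {attrs["vlan-id"] for attrs in stanzas.values() if "vlan-id" in attrs}
    return sorted(
        name for name, attrs in stanzas.items()
        if attrs.get("bridge-arp-nd-suppress") == "on"
        and attrs.get("bridge-access") not in svi_vlans
    )


# Shortened sample: vlan200 is deliberately left out.
SAMPLE = """
auto vlan100
iface vlan100
    vlan-id 100
    vlan-raw-device bridge
auto vni100
iface vni100
    bridge-access 100
    bridge-arp-nd-suppress on
    vxlan-id 10100
auto vni200
iface vni200
    bridge-access 200
    bridge-arp-nd-suppress on
    vxlan-id 10200
"""
print(vnis_missing_svi(parse_stanzas(SAMPLE)))
```

In this sample only vni200 is reported, because no stanza defines vlan-id 200.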
For a bridge in traditional mode (see page 414), you must edit the bridge configuration in the
/etc/network/interfaces file using a text editor:
auto bridge1
iface bridge1
bridge-ports swp3.100 swp4.100 vni100
bridge-learning vni100=off
bridge-arp-nd-suppress vni100=on
ip6-forward off
ip-forward off
...
net.ipv4.neigh.default.gc_thresh3=14336
net.ipv6.neigh.default.gc_thresh3=16384
net.ipv4.neigh.default.gc_thresh2=7168
net.ipv6.neigh.default.gc_thresh2=8192
...
After you save your settings, reboot the switch to apply the new configuration.
Router Flag
The router flag (R-bit) is used in the following scenarios:
In a centralized VXLAN routing configuration with a gateway router.
In a layer 2 switch deployment with ARP/ND suppression.
When the MAC/IP (type-2) route contains the IPv6-MAC pair and the R-bit is set, the route belongs to a
router. If the R-bit is set to zero, the route belongs to a host. If the router is in a local LAN segment, the
switch implementing the proxy ND function learns of this information by snooping on neighbor
advertisement messages for the associated IPv6 address. This information is then exchanged with other
EVPN peers by using the ND extended community in BGP updates.
To show the EVPN arp-cache that gets populated by the neighbor table and see if the IPv6-MAC entry
belongs to a router, run this command:
To show the BGP routing table entry for the IPv6-MAC EVPN route with the ND extended community, run
this command:
cumulus@switch:mgmt-vrf:~$ net show bgp l2vpn evpn route vni 101 mac 00:02:00:00:00:11 ip fe80::202:ff:fe00:11
1 When the switches boot up, ifupdown2 places all VXLAN interfaces in a PROTO_DOWN state (see
page ). The anycast IP addresses are not yet configured.
2 MLAG peering takes place and a successful VXLAN interface consistency check between the switches
occurs.
3 clagd (the daemon responsible for MLAG) adds the anycast address to the loopback interface. It
then changes the local IP address of the VXLAN interface from a unique address to the anycast virtual
IP address and puts the interface in an UP state.
Scenario | Behavior
The peer link goes down. | The primary MLAG switch continues to keep all VXLAN interfaces up with the anycast IP address while the secondary switch brings down all VXLAN interfaces and places them in a PROTO_DOWN state. The secondary MLAG switch removes the anycast IP address from the loopback interface and changes the local IP address of the VXLAN interface to the configured unique IP address.
One of the switches goes down. | The other operational switch continues to use the anycast IP address.
clagd is stopped. | All VXLAN interfaces are put in a PROTO_DOWN state. The anycast IP address is removed from the loopback interface and the local IP addresses of the VXLAN interfaces are changed from the anycast IP address to unique non-virtual IP addresses.
MLAG peering could not be established between the switches. | clagd brings up all the VXLAN interfaces after the reload timer expires with the configured anycast IP address. This allows the VXLAN interface to be up and running on both switches even though peering is not established.
The peer link goes down but the peer switch is up (that is, the backup link is active). | All VXLAN interfaces are put into a PROTO_DOWN state on the secondary switch.
A configuration mismatch between the MLAG switches. | The VXLAN interface is placed into a PROTO_DOWN state on the secondary switch.
Inter-subnet Routing
There are multiple models in EVPN for routing between different subnets (VLANs), also known as inter-
VLAN routing. These models arise due to the following considerations:
Does every VTEP act as a layer 3 gateway and do routing, or do only specific VTEPs do routing?
Is routing done only at the ingress of the VXLAN tunnel or is it done at both the ingress and the
egress of the VXLAN tunnel?
These models are:
Centralized routing: Specific VTEPs act as designated layer 3 gateways and perform routing
between subnets; other VTEPs just perform bridging.
Distributed asymmetric routing: Every VTEP participates in routing, but all routing is done at the
ingress VTEP; the egress VTEP only performs bridging.
Distributed symmetric routing: Every VTEP participates in routing and routing is done at both
the ingress VTEP and the egress VTEP.
Distributed routing — asymmetric or symmetric — is commonly deployed with the VTEPs configured with
an anycast IP/MAC address for each subnet. That is, each VTEP that has a particular subnet is configured with
the same IP/MAC for that subnet. Such a model facilitates easy host/VM mobility as there is no need to
change the host/VM configuration when it moves from one VTEP to another.
EVPN in Cumulus Linux supports all of the routing models listed above. The models are described further in
the following sections.
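The three models differ mainly in which VTEP performs the routing step. The following Python sketch restates that difference as an ordered list of per-VTEP actions for a packet that crosses subnets. It is a simplified illustration, not a description of the forwarding implementation; the function name and the switch names (leaf01, leaf02, exit01) are hypothetical.

```python
def forwarding_steps(model, ingress, egress, gateway=None):
    """Return the ordered (VTEP, action) steps for an inter-subnet packet."""
    if model == "centralized":
        if gateway is None:
            raise ValueError("centralized routing needs a gateway VTEP")
        # The ingress VTEP bridges to the gateway, which routes; the packet is
        # then bridged to the egress VTEP.
        return [(ingress, "bridge"), (gateway, "route"), (egress, "bridge")]
    if model == "asymmetric":
        # Only the ingress VTEP routes; the egress VTEP bridges.
        return [(ingress, "route"), (egress, "bridge")]
    if model == "symmetric":
        # Both the ingress and the egress VTEP route.
        return [(ingress, "route"), (egress, "route")]
    raise ValueError(f"unknown routing model: {model}")


for model in ("centralized", "asymmetric", "symmetric"):
    print(model, forwarding_steps(model, "leaf01", "leaf02", gateway="exit01"))
```

With the centralized model, the gateway step is present even when the ingress and egress VTEP are the same switch, which matches the behavior where packets travel to the gateway VTEP and back.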
All routing happens in the context of a tenant VRF (virtual routing and forwarding (see page 830)). A VRF
instance is provisioned for each tenant, and the subnets of the tenant are associated with that VRF (the
corresponding SVI is attached to the VRF). Inter-subnet routing for each tenant occurs within the context of
that tenant's VRF and is separate from the routing for other tenants.
When configuring VXLAN routing (see page 638), Cumulus Networks recommends enabling ARP
suppression on all VXLAN interfaces. Otherwise, when a locally attached host ARPs for the
gateway, it will receive multiple responses, one from each anycast gateway.
Centralized Routing
In centralized routing, a specific VTEP is configured to act as the default gateway for all the hosts in a
particular subnet throughout the EVPN fabric. It is common to provision a pair of VTEPs in active-active
mode as the default gateway, using an anycast IP/MAC address for each subnet. All subnets need to be
configured on such gateway VTEP(s). When a host in one subnet wants to communicate with a host in
another subnet, it addresses the packets to the gateway VTEP. The ingress VTEP (to which the source host
is attached) bridges the packets to the gateway VTEP over the corresponding VXLAN tunnel. The gateway
VTEP performs the routing to the destination host and post-routing, the packet gets bridged to the egress
VTEP (to which the destination host is attached). The egress VTEP then bridges the packet on to the
destination host.
These commands create the following configuration snippet in the /etc/frr/frr.conf file.
You can deploy centralized routing at the VNI level. Therefore, you can configure the
advertise-default-gw command per VNI so that centralized routing is used for some
VNIs while distributed routing (described below) is used for other VNIs. This type of
configuration is not recommended unless the deployment requires it.
When centralized routing is in use, even if the source host and destination host are
attached to the same VTEP, the packets travel to the gateway VTEP to get routed and then
come back.
Asymmetric Routing
In distributed asymmetric routing, each VTEP acts as a layer 3 gateway, performing routing for its attached
hosts. The routing is called asymmetric because only the ingress VTEP performs routing; the egress VTEP
only performs bridging. Asymmetric routing is easy to deploy as it can be achieved with only host
routing and does not involve any interconnecting VNIs. However, each VTEP must be provisioned with all
VLANs/VNIs — the subnets between which communication can take place; this is required even if there are
no locally-attached hosts for a particular VLAN.
The only additional configuration required to implement asymmetric routing beyond the standard
configuration for a layer 2 VTEP described earlier is to ensure that each VTEP has all VLANs (and
corresponding VNIs) provisioned on it and the SVI for each such VLAN is configured with an
anycast IP/MAC address.
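The provisioning requirement above lends itself to a simple audit. The following is a minimal Python sketch, not a Cumulus Linux tool: it assumes a hypothetical inventory that maps each VTEP name to the set of VLAN IDs provisioned on it, and reports the VLANs each VTEP is missing. The switch names and VLAN IDs are illustrative only.

```python
def missing_vlans(inventory):
    """Return {vtep: sorted VLANs it lacks} for an asymmetric routing fabric.

    inventory is a hypothetical dict: VTEP name -> set of provisioned VLAN IDs.
    Asymmetric routing needs every routed VLAN on every VTEP, even where no
    host in that VLAN is locally attached.
    """
    required = set().union(*inventory.values()) if inventory else set()
    gaps = {}
    for vtep, vlans in inventory.items():
        lacking = sorted(required - vlans)
        if lacking:
            gaps[vtep] = lacking
    return gaps


inventory = {
    "leaf01": {100, 200},
    "leaf02": {100, 200},
    "leaf03": {100},  # VLAN 200 has no local hosts here but is still required
}
gaps = missing_vlans(inventory)
print(gaps)
```

In this sketch the set of required VLANs is simply the union of everything seen in the fabric; a real deployment might restrict it to the VLANs of one tenant.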
Symmetric Routing
In distributed symmetric routing, each VTEP acts as a layer 3 gateway, performing routing for its attached
hosts. This is the same as in asymmetric routing. The difference is that with symmetric routing, both the
ingress VTEP and egress VTEP route the packets. Therefore, it can be compared to the traditional routing
behavior of routing to a next hop router. In the VXLAN encapsulated packet, the inner destination MAC
address is set to the router MAC address of the egress VTEP as an indication that the egress VTEP is the
next hop and also needs to perform routing. All routing happens in the context of a tenant (VRF). For a
packet received by the ingress VTEP from a locally attached host, the SVI interface corresponding to the
VLAN determines the VRF. For a packet received by the egress VTEP over the VXLAN tunnel, the VNI in the
packet has to specify the VRF. For symmetric routing, this is a VNI corresponding to the tenant and is
different from either the source VNI or the destination VNI. This VNI is referred to as the layer 3 VNI or
interconnecting VNI; it has to be provisioned by the operator and is exchanged through the EVPN control
plane. In order to make the distinction clear, the regular VNI, which is used to map a VLAN, is referred to as
the layer 2 VNI.
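To make the difference between the two distributed models concrete, the following Python sketch returns the VNI and inner destination MAC address that an ingress VTEP would place in the VXLAN packet. It is an illustration only. The function and field names are hypothetical, the sample values are borrowed from outputs elsewhere in this chapter, and the asymmetric branch assumes that the ingress VTEP encapsulates with the layer 2 VNI of the destination subnet, which this section implies but does not state.

```python
def vxlan_fields(model, dst_l2_vni, l3_vni, dst_host_mac, egress_router_mac):
    """Return the fields the ingress VTEP sets and what the egress VTEP does."""
    if model == "asymmetric":
        # Ingress routes into the destination VLAN; egress only bridges.
        return {"vni": dst_l2_vni, "inner_dst_mac": dst_host_mac,
                "egress_action": "bridge"}
    if model == "symmetric":
        # Ingress routes toward the egress VTEP as next hop; egress routes again.
        return {"vni": l3_vni, "inner_dst_mac": egress_router_mac,
                "egress_action": "route"}
    raise ValueError(f"unknown routing model: {model}")


common = dict(dst_l2_vni=10200, l3_vni=104001,
              dst_host_mac="00:02:00:00:00:10",
              egress_router_mac="00:01:00:00:14:00")
asym = vxlan_fields("asymmetric", **common)
sym = vxlan_fields("symmetric", **common)
print(asym)
print(sym)
```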
L3-VNI
There is a one-to-one mapping between a layer 3 VNI and a tenant (VRF).
The VRF to layer 3 VNI mapping has to be consistent across all VTEPs. The layer 3 VNI has
to be provisioned by the operator.
Layer 3 VNI and layer 2 VNI cannot share the same number space.
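The three rules above can be checked before a configuration is pushed. The sketch below is a hedged illustration in Python, not part of NCLU or FRR: it assumes a hypothetical data model in which each VTEP name maps to its VRF to layer 3 VNI assignments, plus a set of the layer 2 VNIs in use. The VRF name turtle and the VNI values mirror the examples in this chapter; the switch names are invented.

```python
def check_l3_vni_plan(vrf_l3vni_by_vtep, l2_vnis):
    """Return a list of human-readable violations of the layer 3 VNI rules."""
    errors = []
    first_seen = {}  # VRF name -> (layer 3 VNI, VTEP where it was first seen)
    for vtep, mapping in sorted(vrf_l3vni_by_vtep.items()):
        vnis = list(mapping.values())
        # Rule 1: one-to-one mapping between a layer 3 VNI and a VRF.
        if len(set(vnis)) != len(vnis):
            errors.append(f"{vtep}: one layer 3 VNI is mapped to several VRFs")
        # Rule 2: the VRF to layer 3 VNI mapping is the same on every VTEP.
        for vrf, vni in sorted(mapping.items()):
            known = first_seen.setdefault(vrf, (vni, vtep))
            if known[0] != vni:
                errors.append(
                    f"{vtep}: VRF {vrf} maps to VNI {vni}, "
                    f"but {known[1]} uses {known[0]}"
                )
        # Rule 3: layer 3 and layer 2 VNIs do not share a number space.
        shared = sorted(set(vnis) & set(l2_vnis))
        if shared:
            errors.append(f"{vtep}: VNIs {shared} are used at both layer 2 and layer 3")
    return errors


good = {"leaf01": {"turtle": 104001}, "leaf02": {"turtle": 104001}}
bad = {"leaf01": {"turtle": 104001}, "leaf02": {"turtle": 10100}}
good_errors = check_l3_vni_plan(good, {10100, 10200})
bad_errors = check_l3_vni_plan(bad, {10100, 10200})
print(good_errors)
print(bad_errors)
```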
In an EVPN symmetric routing configuration, when a type-2 (MAC/IP) route is announced, in addition to
containing two VNIs (the layer 2 VNI and the layer 3 VNI), the route also contains separate RTs for layer 2
and layer 3. The layer 3 RT associates the route with the tenant VRF. By default, this is auto-derived in a
similar way to the layer 2 RT, using the layer 3 VNI instead of the layer 2 VNI; however you can also explicitly
configure it.
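As an illustration of auto-derivation, the Python sketch below builds both route targets in the ASN:VNI form that appears in the sample type-2 route output in this chapter (RT:65004:10200 and RT:65004:104001). The format is inferred from that output and is an assumption; the sketch does not model how FRR encodes 4-byte AS numbers or any other detail of the extended community.

```python
def auto_route_targets(asn, l2_vni, l3_vni):
    """Build layer 2 and layer 3 route targets in the assumed ASN:VNI form."""
    return {
        "layer2": f"RT:{asn}:{l2_vni}",  # associates the route with the layer 2 VNI
        "layer3": f"RT:{asn}:{l3_vni}",  # associates the route with the tenant VRF
    }


rts = auto_route_targets(65004, 10200, 104001)
print(rts["layer2"], rts["layer3"])
```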
For EVPN symmetric routing, additional configuration is required:
1. Configure a per-tenant VXLAN interface that specifies the layer 3 VNI for the tenant. This VXLAN
interface is part of the bridge; router MAC addresses of remote VTEPs are installed over this
interface.
2. Configure an SVI (layer 3 interface) corresponding to the per-tenant VXLAN interface. This is attached
to the tenant's VRF. Remote host routes for symmetric routing are installed over this SVI.
3. Specify the mapping of VRF to layer 3 VNI. This configuration is for the BGP control plane.
The above commands create the following snippet in the /etc/network/interfaces file:
auto vni104001
iface vni104001
bridge-access 4001
bridge-arp-nd-suppress on
bridge-learning off
vxlan-id 104001
vxlan-local-tunnelip 10.0.0.11
auto bridge
iface bridge
bridge-ports vni104001
bridge-vlan-aware yes
auto vlan4001
iface vlan4001
vlan-id 4001
vlan-raw-device bridge
vrf turtle
When two VTEPs are operating in VXLAN active-active mode and performing symmetric routing,
you need to configure the router MAC corresponding to each layer 3 VNI to ensure both VTEPs
use the same MAC address. Specify the hwaddress (MAC address) for the SVI corresponding to
the layer 3 VNI. Use the same address on both switches in the MLAG pair. Cumulus Networks
recommends you use the MLAG system MAC address.
auto vlan4001
iface vlan4001
hwaddress 44:39:39:FF:40:94
vlan-id 4001
vlan-raw-device bridge
vrf turtle
These commands create the following configuration snippet in the /etc/frr/frr.conf file.
vrf turtle
vni 104001
!
These commands create the following configuration snippet in the /etc/frr/frr.conf file:
The tenant VRF RD and RTs are different from the RD and RTs for the layer 2 VNI, which are
described in Auto-derivation of RDs and RTs (see page 544) and User-defined RDs and RTs (see
page 544), above.
1. Enable advertisement of EVPN prefix (type-5) routes. Refer to Prefix-based Routing — EVPN Type-5
Routes (see page 558), below.
2. Ensure that the routes corresponding to the connected subnets are known in the BGP VRF routing
table by injecting them using the network command or redistributing them using the
redistribute connected command.
This configuration is recommended only if the deployment is known to have silent hosts. It is also
recommended that you enable it on only one VTEP per subnet, or two for redundancy.
An earlier version of this chapter referred to the advertise-subnet command. That command
is deprecated and should not be used.
When connecting to a WAN edge router to reach destinations outside the data center, it is highly
recommended that specific border/exit leaf switches be deployed to originate the type-5 routes.
On switches with the Mellanox Spectrum chipset, centralized routing, symmetric routing and
prefix-based routing only function with the Spectrum A1 chip.
If you are using a Broadcom Trident II+ switch as a border/exit leaf, see caveats (see page 581),
below for a necessary workaround; the workaround only applies to Trident II+ switches, not
Tomahawk or Spectrum.
1. Configure a per-tenant VXLAN interface that specifies the layer 3 VNI for the tenant. This VXLAN
interface is part of the bridge; router MAC addresses of remote VTEPs are installed over this
interface.
2. Configure an SVI (layer 3 interface) corresponding to the per-tenant VXLAN interface. This is attached
to the tenant's VRF. The remote prefix routes are installed over this SVI.
3. Specify the mapping of the VRF to layer 3 VNI. This configuration is for the BGP control plane.
cumulus@bl1:~$ net add bgp vrf vrf1 l2vpn evpn advertise ipv4 unicast
cumulus@bl1:~$ net pending
cumulus@bl1:~$ net commit
vrf turtle
vni 104001 prefix-routes-only
cumulus@switch:~$ net add bgp vrf turtle l2vpn evpn advertise ipv4 unicast route-map map1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
EVPN Enhancements
auto bridge
iface bridge
bridge-ports swp1 vni10101
bridge-vids 101
bridge-vlan-aware yes
post-up bridge fdb add 00:11:22:33:44:55 dev swp1 vlan 101 master static
For a bridge in traditional mode (see page 414), you must edit the bridge configuration in the
/etc/network/interfaces file using a text editor:
auto br101
iface br101
bridge-ports swp1.101 vni10101
bridge-learning vni10101=off
post-up bridge fdb add 00:11:22:33:44:55 dev swp1.101 master static
The following example command configures EVPN to advertise type-5 routes only:
Extended Mobility
Cumulus Linux support for host and virtual machine mobility in an EVPN deployment has been enhanced
to handle scenarios where the IP to MAC binding for a host or virtual machine changes across the move.
This is referred to as extended mobility. The simple mobility scenario where a host or virtual machine with a
binding of IP1, MAC1 moves from one rack to another has been supported in previous releases of Cumulus
Linux. The EVPN enhancements support additional scenarios where a host or virtual machine with a
binding of IP1, MAC1 moves and takes on a new binding of IP2, MAC1 or IP1, MAC2. The EVPN protocol
mechanism to handle extended mobility continues to use the MAC mobility extended community and is the
same as the standard mobility procedures. Extended mobility defines how the sequence number in this
attribute is computed when binding changes occur.
Extended mobility not only supports virtual machine moves, but also a scenario where one virtual machine
shuts down and another is provisioned on a different rack that uses the IP address or the MAC address of
the previous virtual machine. For example, in an EVPN deployment with OpenStack, where virtual machines
for a tenant are provisioned and shut down very dynamically, a new virtual machine can use the same IP
address as an earlier virtual machine but with a different MAC address.
The support for extended mobility is enabled by default and does not require any additional configuration.
You can examine the sequence numbers associated with a host or virtual machine MAC address and IP
address with NCLU commands. For example:
Although an MLAG configuration is not shown in the above illustration, duplicate address
detection is supported for MLAG.
The following example shows the syslog message that is generated when Cumulus Linux flags an IP address
as a duplicate during a remote update:
When you disable duplicate address detection, all existing duplicate address flags and configuration are
cleared.
To configure the threshold for MAC and IP address moves, run the net add bgp l2vpn evpn
dup-addr-detection max-moves <number-of-events> time <duration> command.
The following example command sets the maximum number of address moves allowed to 10 and the
duplicate address detection time interval to 1200 seconds.
You can specify max-moves to be between 2 and 1000 and time to be between 2 and 1800 seconds.
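The interaction between max-moves and time can be pictured as a sliding window. The following Python sketch is an illustration under stated assumptions, not the FRR implementation: the class name is hypothetical, the defaults simply reuse the example values above, and it assumes an address is flagged once the number of moves inside the window reaches max-moves (the exact comparison that Cumulus Linux applies is not described here). The range checks mirror the limits stated above.

```python
from collections import deque


class MoveTracker:
    """Toy sliding-window counter for moves of one MAC or IP address."""

    def __init__(self, max_moves=10, window_seconds=1200):
        if not 2 <= max_moves <= 1000:
            raise ValueError("max-moves must be between 2 and 1000")
        if not 2 <= window_seconds <= 1800:
            raise ValueError("time must be between 2 and 1800 seconds")
        self.max_moves = max_moves
        self.window_seconds = window_seconds
        self._moves = deque()  # timestamps (seconds) of recent moves

    def record_move(self, timestamp):
        """Record one move; return True if the address would be flagged."""
        self._moves.append(timestamp)
        # Drop moves that have fallen out of the detection window.
        while self._moves and timestamp - self._moves[0] > self.window_seconds:
            self._moves.popleft()
        return len(self._moves) >= self.max_moves


fast = MoveTracker(max_moves=3, window_seconds=10)
fast_flags = [fast.record_move(t) for t in (0, 4, 8)]
slow = MoveTracker(max_moves=3, window_seconds=10)
slow_flags = [slow.record_move(t) for t in (0, 20, 40)]
print(fast_flags, slow_flags)
```

Three moves within 10 seconds trip the detector in this model, while the same three moves spread over 40 seconds do not.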
To clear the flagged duplicate addresses for all VNIs, run the following command:
When you clear the duplicate flag for a MAC address, all its associated IP addresses are also
cleared. However, you cannot clear the duplicate flag for an associated IP address if its MAC
address is still in a duplicate state.
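The dependency between MAC and IP duplicate flags described above can be modeled in a few lines. This Python sketch is a toy model with hypothetical class and method names; it is not how zebra stores its state. The MAC and IP addresses are taken from the sample output in this section.

```python
class DuplicateFlags:
    """Toy model of duplicate flags for MAC addresses and the IPs bound to them."""

    def __init__(self):
        self.dup_macs = set()
        self.dup_ips = {}  # duplicate IP address -> MAC address it is bound to

    def flag_mac(self, mac):
        self.dup_macs.add(mac)

    def flag_ip(self, ip, mac):
        self.dup_ips[ip] = mac

    def clear_mac(self, mac):
        """Clearing a MAC also clears every duplicate IP associated with it."""
        self.dup_macs.discard(mac)
        self.dup_ips = {ip: m for ip, m in self.dup_ips.items() if m != mac}

    def clear_ip(self, ip):
        """Refuse to clear an IP while its MAC is still flagged; return success."""
        if self.dup_ips.get(ip) in self.dup_macs:
            return False
        self.dup_ips.pop(ip, None)
        return True


flags = DuplicateFlags()
flags.flag_mac("00:01:02:03:04:11")
flags.flag_ip("10.0.1.26", "00:01:02:03:04:11")
refused = not flags.clear_ip("10.0.1.26")  # the MAC is still a duplicate
flags.clear_mac("00:01:02:03:04:11")       # clears the MAC and its IPs
print(refused, flags.dup_macs, flags.dup_ips)
```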
After the MAC address is flagged as a duplicate, the net show evpn mac vni <vni_id> mac
<mac_addr> command shows:
MAC: 00:01:02:03:04:11
Remote VTEP: 172.16.0.16
Local Seq: 13 Remote Seq: 14
Duplicate, detected at Tue Nov 6 18:55:29 2018
Neighbors:
10.0.1.26 Active
To display information for a duplicate IP address, run the net show evpn arp-cache vni <vni_id>
ip <ip_addr> command. The following command example shows information for IP address 10.0.0.9 for
VNI 1001.
To show a list of MAC addresses detected as duplicate for a specific VNI or for all VNIs, run the net show
evpn mac vni <vni-id|all> duplicate command. The following example command shows a list of
duplicate MAC addresses for VNI 1001:
To show a list of IP addresses detected as duplicate for a specific VNI or for all VNIs, run the net show
evpn arp-cache vni <vni-id|all> duplicate command. The following example command shows
a list of duplicate IP addresses for VNI 1001:
To show a BGP configuration with duplicate address detection, run the net show configuration bgp
command:
...
cumulus@leaf01:~$
A sample output of bridge fdb show is depicted below. Some interesting information from this output
includes:
swp3 and swp4 are access ports with VLAN ID 100. This is mapped to VXLAN interface vni100.
00:02:00:00:00:01 is a local host MAC learned on swp3.
The remote VTEPs which participate in VLAN ID 100 are 10.0.0.3, 10.0.0.4 and 10.0.0.2. This is
evident from the FDB entries with a MAC address of 00:00:00:00:00:00. These entries are used for
BUM traffic replication.
00:02:00:00:00:06 is a remote host MAC reachable over the VXLAN tunnel to 10.0.0.2.
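The way remote VTEPs are read off the all-zero FDB entries can be sketched in Python. The sample lines below are hand-written in the general style of iproute2 bridge fdb output and are an assumption, not a capture from a switch; the exact fields and flags on a real system may differ. The function name is hypothetical.

```python
def remote_vteps(fdb_lines, vxlan_dev):
    """Collect destination IPs of all-zero MAC entries on one VXLAN device.

    These entries are the ones used for BUM traffic replication.
    """
    vteps = set()
    for line in fdb_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "00:00:00:00:00:00":
            continue
        if fields[1:3] != ["dev", vxlan_dev] or "dst" not in fields:
            continue
        idx = fields.index("dst")
        if idx + 1 < len(fields):
            vteps.add(fields[idx + 1])
    return sorted(vteps)


sample = [
    "00:02:00:00:00:01 dev swp3 vlan 100 master bridge",
    "00:00:00:00:00:00 dev vni100 dst 10.0.0.3 self permanent",
    "00:00:00:00:00:00 dev vni100 dst 10.0.0.4 self permanent",
    "00:00:00:00:00:00 dev vni100 dst 10.0.0.2 self permanent",
    "00:02:00:00:00:06 dev vni100 dst 10.0.0.2 self offload",
]
vteps = remote_vteps(sample, "vni100")
print(vteps)
```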
A sample output of ip neigh show is shown below. Some interesting information from this output
includes:
172.16.120.11 is a locally-attached host on VLAN 100. It is shown twice because of the configuration
of the anycast IP/MAC on the switch.
172.16.120.42 is a remote host on VLAN 100 and 172.16.130.23 is a remote host on VLAN 200. The
MAC address of these hosts can be examined using the bridge fdb show command described
earlier to determine the VTEPs behind which these hosts are located.
You can examine the underlay routing, which determines how remote VTEPs are reached. Run the net
show route command. Here is some sample output from a leaf switch:
show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR,
> - selected route, * - FIB route
cumulus@leaf01:~$
You can view the MAC forwarding database on the switch by running the net show bridge macs
command:
You can examine the EVPN information for a specific VNI in detail. The following output shows details for
the layer 2 VNI 10100 as well as for the layer 3 VNI 104001. For the layer 2 VNI, the remote VTEPs which
have that VNI are shown. For the layer 3 VNI, the router MAC and associated layer 2 VNIs are shown. The
state of the layer 3 VNI depends on the state of its associated VRF as well as the states of its underlying
VXLAN interface and SVI.
Run the net show evpn mac vni all command to examine MAC addresses for all VNIs.
You can examine the details for a specific MAC address or query all remote MAC addresses behind a
specific VTEP:
Run the net show evpn arp-cache vni all command to examine neighbor entries for all VNIs.
Run the net show evpn rmac vni all command to examine router MACs for all layer 3 VNIs.
Run the net show evpn next-hops vni all command to examine gateway next hops for all layer 3
VNIs.
You can query a specific next hop; the output displays the remote host and prefix routes through this next
hop:
VRF vrf1:
K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 1d02h42m
C * 172.16.120.0/24 is directly connected, vlan100-v0, 1d02h42m
C>* 172.16.120.0/24 is directly connected, vlan100, 1d02h42m
B>* 172.16.120.21/32 [20/0] via 10.0.0.2, vlan4001 onlink, 1d02h41m
B>* 172.16.120.22/32 [20/0] via 10.0.0.2, vlan4001 onlink, 1d02h41m
B>* 172.16.120.31/32 [20/0] via 10.0.0.3, vlan4001 onlink, 1d02h41m
B>* 172.16.120.32/32 [20/0] via 10.0.0.3, vlan4001 onlink, 1d02h41m
B>* 172.16.120.41/32 [20/0] via 10.0.0.4, vlan4001 onlink, 1d02h41m
...
In the output above, the next hops for these routes are specified by EVPN to be onlink, or reachable over
the specified SVI. This is necessary because this interface is not required to have an IP address. Even if the
interface is configured with an IP address, the next hop is not on the same subnet as it is usually the IP
address of the remote VTEP (part of the underlay IP network).
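To see which remote VTEP each host route points at, the onlink routes in the VRF table above can be grouped by next hop. The following Python sketch parses lines in exactly the format shown in that output; the regular expression and function name are illustrative, and the pattern is an assumption that may need adjusting for other route types or FRR versions.

```python
import re
from collections import defaultdict

# Matches lines such as:
# B>* 172.16.120.21/32 [20/0] via 10.0.0.2, vlan4001 onlink, 1d02h41m
ROUTE_RE = re.compile(
    r"^B>\*\s+(\S+)\s+\[\d+/\d+\]\s+via\s+(\S+),\s+(\S+)\s+onlink"
)


def hosts_by_vtep(lines):
    """Group EVPN-installed prefixes by the remote VTEP used as next hop."""
    grouped = defaultdict(list)
    for line in lines:
        match = ROUTE_RE.match(line.strip())
        if match:
            prefix, next_hop, _svi = match.groups()
            grouped[next_hop].append(prefix)
    return dict(grouped)


sample = [
    "C>* 172.16.120.0/24 is directly connected, vlan100, 1d02h42m",
    "B>* 172.16.120.21/32 [20/0] via 10.0.0.2, vlan4001 onlink, 1d02h41m",
    "B>* 172.16.120.22/32 [20/0] via 10.0.0.2, vlan4001 onlink, 1d02h41m",
    "B>* 172.16.120.31/32 [20/0] via 10.0.0.3, vlan4001 onlink, 1d02h41m",
]
grouped = hosts_by_vtep(sample)
print(grouped)
```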
You can filter the routing table based on EVPN route type. The available options are shown below:
cumulus@leaf01:~$ net show bgp l2vpn evpn route rd 10.0.0.4:3 mac 00:02:00:00:00:10 ip 172.16.130.44
BGP routing table entry for 10.0.0.4:3:[2]:[0]:[0]:[48]:[00:02:00:00:00:10]:[32]:[172.16.130.44]
Paths: (2 available, best #2)
Advertised to non peer-group peers:
s1(swp1) s2(swp2)
Route [2]:[0]:[0]:[48]:[00:02:00:00:00:10]:[32]:[172.16.130.44] VNI 10200/104001
65100 65004
10.0.0.4 from s2(swp2) (172.16.110.2)
Origin IGP, localpref 100, valid, external
Extended Community: RT:65004:10200 RT:65004:104001 ET:8 Rmac:00:01:00:00:14:00
AddPath ID: RX 0, TX 97
Last update: Sun Dec 17 20:57:24 2017
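The bracketed route key in the output above can be split into its MAC and IP parts programmatically. The Python sketch below handles only the two type-2 shapes that appear in this chapter (with and without an IP address). The function name is hypothetical, and the two fields that follow the route type are captured but deliberately left uninterpreted, because this section does not define them.

```python
import re

TYPE2_RE = re.compile(
    r"^\[2\]:\[(\d+)\]:\[(\d+)\]:\[48\]:\[([0-9a-fA-F:]{17})\]"
    r"(?::\[(\d+)\]:\[([^\]]+)\])?$"
)


def parse_type2(key):
    """Split a type-2 (MAC/IP) route key into MAC, IP length, and IP."""
    match = TYPE2_RE.match(key)
    if not match:
        raise ValueError(f"not a recognized type-2 route key: {key}")
    _field_a, _field_b, mac, ip_len, ip = match.groups()
    return {
        "mac": mac.lower(),
        "ip_len": int(ip_len) if ip_len else None,
        "ip": ip,  # None when the route carries only a MAC address
    }


mac_ip = parse_type2("[2]:[0]:[0]:[48]:[00:02:00:00:00:10]:[32]:[172.16.130.44]")
mac_only = parse_type2("[2]:[0]:[0]:[48]:[00:02:22:22:22:02]")
print(mac_ip)
print(mac_only)
```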
Only global VNIs are supported. Even though VNI values are exchanged in the type-2 and
type-5 routes, the received values are not used when installing the routes into the
forwarding plane; the local configuration is used. You must ensure that the VLAN to VNI
mappings and the layer 3 VNI assignment for a tenant VRF are uniform throughout the
network.
If the remote host is dual attached, the next hop for the EVPN route is the anycast IP
address of the remote MLAG (see page 427) pair, when MLAG is active.
The following example shows a prefix (type-5) route. Such a route has only the layer 3 VNI and the route
target corresponding to this VNI. This route is learned through two paths, one through each spine switch.
To display the VNI routing table for all VNIs, run the net show bgp l2vpn evpn route vni all
command.
cumulus@switch:~$ net show bgp l2vpn evpn route vni 10109 mac 00:02:22:22:22:02
BGP routing table entry for [2]:[0]:[0]:[48]:[00:02:22:22:22:02]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[00:02:22:22:22:02] VNI 10109
Local
6.0.0.184 from 0.0.0.0 (6.0.0.184)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
Extended Community: RT:650184:10109 ET:8 MM:3
AddPath ID: RX 0, TX 10350121
Last update: Tue Feb 14 18:40:37 2017
cumulus@switch:~$ net show bgp l2vpn evpn route vni 10101 mac 00:02:00:00:00:01
BGP routing table entry for [2]:[0]:[0]:[48]:[00:02:00:00:00:01]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[00:02:00:00:00:01] VNI 10101
Local
172.16.130.18 from 0.0.0.0 (172.16.130.18)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
Extended Community: ET:8 RT:60176:10101 MM:0, sticky MAC
AddPath ID: RX 0, TX 46
Last update: Tue Apr 11 21:44:02 2017
Troubleshooting
To troubleshoot EVPN, enable FRR debug logs. The relevant debug options are as follows:
debug zebra vxlan traces VNI addition and deletion (local and remote) as well as MAC and
neighbor addition and deletion (local and remote).
debug zebra kernel traces actual netlink messages exchanged with the kernel, which includes
everything, not just EVPN.
debug bgp updates traces BGP update exchanges, including all updates. Output is extended to
show EVPN specific information.
debug bgp zebra traces interactions between BGP and zebra for EVPN (and other) routes.
Caveats
The following caveats apply to EVPN in this version of Cumulus Linux:
When EVPN is enabled on a switch (VTEP), all locally defined VNIs on that switch and other
information (such as MAC addresses) pertaining to them are advertised to EVPN peers. There is no
provision to only announce certain VNIs.
In a VXLAN active-active (see page 515) configuration, ARPs are sometimes not suppressed even if
ARP suppression is enabled. This is because the neighbor entries are not synchronized between the
two switches operating in active-active mode by a control plane. This has no impact on forwarding.
You must configure the overlay (tenants) in one or more specific VRFs, separate from the underlay, which
resides in the default VRF. A layer 3 VNI mapping for the default VRF is not supported.
On Broadcom Trident II+, Trident 3, and Maverick-based switches, when a lookup is done after
VXLAN decapsulation on the external-facing switch (exit/border leaf), the switch does not rewrite the
MAC addresses or TTL; for through traffic, packets are dropped by the next hop instead of correctly
routing from a VXLAN overlay network into a non-VXLAN external network (such as the Internet).
This affects all traffic from VXLAN overlay hosts that need to be routed after VXLAN decapsulation
on an exit/border leaf, including traffic destined to external networks (through traffic) and traffic
destined to the exit leaf SVI address.
To work around this issue, modify the external-facing interface for each VLAN sub-interface on the
exit leaf by creating a temporary VNI and associating it with the existing VLAN ID.
For example, if the expected interface configuration is:
auto swp3.2001
iface swp3.2001
vrf vrf1
address 10.0.0.2/24
# where swp3 is the external facing port and swp3.2001 is the VLAN sub-interface
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports vx-4001
bridge-vids 4001
auto vx-4001
iface vx-4001
vxlan-id 4001
<... usual vxlan config ...>
bridge-access 4001
# where vnid 4001 represents the L3 VNI
auto vlan4001
iface vlan4001
vlan-id 4001
vlan-raw-device bridge
vrf vrf1
Modify the configuration as follows:
auto swp3
iface swp3
bridge-access 2001
# associate the port (swp3) with bridge 2001
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports swp3 vx-4001 vx-16000000
bridge-vids 4001 2001
# where vx-4001 is the existing VNI and vx-16000000 is a new temporary VNI
# this is now bridging the port (swp3), the VNI (vx-4001),
# and the new temporary VNI (vx-16000000)
# the bridge VLAN IDs are now 4001 and 2001
auto vlan2001
iface vlan2001
vlan-id 2001
vrf vrf1
address 10.0.0.2/24
vlan-raw-device bridge
# create a VLAN 2001 with the associated VRF and IP address
auto vx-16000000
iface vx-16000000
vxlan-id 16000000
bridge-access 2001
<... usual vxlan config ...>
# associate the temporary VNI (vx-16000000) with bridge 2001
auto vx-4001
iface vx-4001
vxlan-id 4001
<... usual vxlan config ...>
bridge-access 4001
# where vnid 4001 represents the L3 VNI
auto vlan4001
iface vlan4001
vlan-id 4001
vlan-raw-device bridge
vrf vrf1
If an MLAG pair is used instead of a single exit/border leaf, add the same temporary VNIs on both
switches of the MLAG pair.
Example Configurations
Basic Clos (4x2) for bridging
Clos with MLAG and centralized routing
Clos with MLAG and asymmetric routing
Basic Clos with symmetric routing and exit leafs
!                                               !
address-family ipv6 unicast                     address-family ipv6 unicast
redistribute connected                          redistribute connected
neighbor peerlink-3.4094 activate               neighbor peerlink-3.4094 activate
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
exit-address-family                             exit-address-family
!                                               !
address-family l2vpn evpn                       address-family l2vpn evpn
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
advertise-all-vni                               advertise-all-vni
exit-address-family                             exit-address-family
!                                               !
line vty                                        line vty
exec-timeout 0 0                                exec-timeout 0 0
!                                               !
exit-address-family                             exit-address-family
!                                               !
address-family ipv6 unicast                     address-family ipv6 unicast
redistribute connected                          redistribute connected
neighbor peerlink-3.4094 activate               neighbor peerlink-3.4094 activate
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
exit-address-family                             exit-address-family
!                                               !
address-family l2vpn evpn                       address-family l2vpn evpn
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
advertise-all-vni                               advertise-all-vni
exit-address-family                             exit-address-family
!                                               !
line vty                                        line vty
exec-timeout 0 0                                exec-timeout 0 0
!                                               !
vxlan-local-tunnelip 10.0.0.7                   vxlan-local-tunnelip 10.0.0.8
bridge-learning off                             bridge-learning off
bridge-arp-nd-suppress on                       bridge-arp-nd-suppress on
mstpctl-portbpdufilter yes                      mstpctl-portbpdufilter yes
mstpctl-bpduguard yes                           mstpctl-bpduguard yes
mtu 9152                                        mtu 9152
auto bridge                                     auto bridge
iface bridge                                    iface bridge
bridge-vlan-aware yes                           bridge-vlan-aware yes
bridge-ports vx-101000 vx-101001 vx-101002      bridge-ports vx-101000 vx-101001 vx-101002
    vx-101003 peerlink-3 hostbond4 hostbond5        vx-101003 peerlink-3 hostbond4 hostbond5
bridge-stp on                                   bridge-stp on
bridge-vids 1000-1003                           bridge-vids 1000-1003
bridge-pvid 1                                   bridge-pvid 1
auto vrf1                                       auto vrf1
iface vrf1                                      iface vrf1
vrf-table auto                                  vrf-table auto
auto vlan1000                                   auto vlan1000
iface vlan1000                                  iface vlan1000
address 45.0.0.2/24                             address 45.0.0.3/24
address 2001:fee1::2/64                         address 2001:fee1::3/64
vlan-id 1000                                    vlan-id 1000
vlan-raw-device bridge                          vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 45.0.0.1/24   address-virtual 00:00:5e:00:01:01 45.0.0.1/24
    2001:fee1::1/64                                 2001:fee1::1/64
vrf vrf1                                        vrf vrf1
auto vlan1001                                   auto vlan1001
iface vlan1001                                  iface vlan1001
address 45.0.1.2/24                             address 45.0.1.3/24
address 2001:fee1:0:1::2/64                     address 2001:fee1:0:1::3/64
vlan-id 1001                                    vlan-id 1001
vlan-raw-device bridge                          vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 45.0.1.1/24   address-virtual 00:00:5e:00:01:01 45.0.1.1/24
    2001:fee1:0:1::1/64                             2001:fee1:0:1::1/64
vrf vrf1                                        vrf vrf1
auto vrf2                                       auto vrf2
iface vrf2                                      iface vrf2
vrf-table auto                                  vrf-table auto
auto vlan1002                                   auto vlan1002
iface vlan1002                                  iface vlan1002
address 45.0.2.2/24                             address 45.0.2.3/24
address 2001:fee1:0:2::2/64                     address 2001:fee1:0:2::3/64
vlan-id 1002                                    vlan-id 1002
vlan-raw-device bridge                          vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 45.0.2.1/24   address-virtual 00:00:5e:00:01:01 45.0.2.1/24
    2001:fee1:0:2::1/64                             2001:fee1:0:2::1/64
!                                               !
address-family ipv6 unicast                     address-family ipv6 unicast
redistribute connected                          redistribute connected
neighbor peerlink-3.4094 activate               neighbor peerlink-3.4094 activate
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
exit-address-family                             exit-address-family
!                                               !
address-family l2vpn evpn                       address-family l2vpn evpn
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
advertise-default-gw                            advertise-default-gw
advertise-all-vni                               advertise-all-vni
exit-address-family                             exit-address-family
!                                               !
line vty                                        line vty
exec-timeout 0 0                                exec-timeout 0 0
!                                               !
vxlan-local-tunnelip 10.0.0.9                   vxlan-local-tunnelip 10.0.0.10
bridge-learning off                             bridge-learning off
bridge-arp-nd-suppress on                       bridge-arp-nd-suppress on
mstpctl-portbpdufilter yes                      mstpctl-portbpdufilter yes
mstpctl-bpduguard yes                           mstpctl-bpduguard yes
mtu 9152                                        mtu 9152
auto bridge                                     auto bridge
iface bridge                                    iface bridge
bridge-vlan-aware yes                           bridge-vlan-aware yes
bridge-ports vx-101000 vx-101001 vx-101002      bridge-ports vx-101000 vx-101001 vx-101002
    vx-101003 peerlink-3 hostbond4 hostbond5        vx-101003 peerlink-3 hostbond4 hostbond5
bridge-stp on                                   bridge-stp on
bridge-vids 1000-1003                           bridge-vids 1000-1003
bridge-pvid 1                                   bridge-pvid 1
auto vrf1                                       auto vrf1
iface vrf1                                      iface vrf1
vrf-table auto                                  vrf-table auto
auto vlan1000                                   auto vlan1000
iface vlan1000                                  iface vlan1000
vlan-id 1000                                    vlan-id 1000
vlan-raw-device bridge                          vlan-raw-device bridge
ip-forward off                                  ip-forward off
auto vlan1001                                   auto vlan1001
iface vlan1001                                  iface vlan1001
vlan-id 1001                                    vlan-id 1001
vlan-raw-device bridge                          vlan-raw-device bridge
ip-forward off                                  ip-forward off
auto vrf2                                       auto vrf2
iface vrf2                                      iface vrf2
vrf-table auto                                  vrf-table auto
auto vlan1002                                   auto vlan1002
iface vlan1002                                  iface vlan1002
vlan-id 1002                                    vlan-id 1002
vlan-raw-device bridge                          vlan-raw-device bridge
ip-forward off                                  ip-forward off
auto vlan1003                                   auto vlan1003
iface vlan1003                                  iface vlan1003
vlan-id 1003                                    vlan-id 1003
vlan-raw-device bridge                          vlan-raw-device bridge
ip-forward off                                  ip-forward off
!                                               !
log timestamp precision 6                       log timestamp precision 6
!                                               !
interface peerlink-3.4094                       interface peerlink-3.4094
ipv6 nd ra-interval 10                          ipv6 nd ra-interval 10
no ipv6 nd suppress-ra                          no ipv6 nd suppress-ra
!                                               !
interface uplink-1                              interface uplink-1
ipv6 nd ra-interval 10                          ipv6 nd ra-interval 10
no ipv6 nd suppress-ra                          no ipv6 nd suppress-ra
!                                               !
interface uplink-2                              interface uplink-2
ipv6 nd ra-interval 10                          ipv6 nd ra-interval 10
no ipv6 nd suppress-ra                          no ipv6 nd suppress-ra
!                                               !
router bgp 65544                                router bgp 65545
bgp router-id 10.0.0.9                          bgp router-id 10.0.0.10
coalesce-time 1000                              coalesce-time 1000
bgp bestpath as-path multipath-relax            bgp bestpath as-path multipath-relax
neighbor peerlink-3.4094 interface v6only       neighbor peerlink-3.4094 interface v6only
    remote-as external                              remote-as external
neighbor uplink-1 interface v6only              neighbor uplink-1 interface v6only
    remote-as external                              remote-as external
neighbor uplink-2 interface v6only              neighbor uplink-2 interface v6only
    remote-as external                              remote-as external
!                                               !
address-family ipv4 unicast                     address-family ipv4 unicast
redistribute connected                          redistribute connected
exit-address-family                             exit-address-family
!                                               !
address-family ipv6 unicast                     address-family ipv6 unicast
redistribute connected                          redistribute connected
neighbor peerlink-3.4094 activate               neighbor peerlink-3.4094 activate
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
exit-address-family                             exit-address-family
!                                               !
address-family l2vpn evpn                       address-family l2vpn evpn
neighbor uplink-1 activate                      neighbor uplink-1 activate
neighbor uplink-2 activate                      neighbor uplink-2 activate
advertise-all-vni                               advertise-all-vni
exit-address-family                             exit-address-family
!                                               !
line vty                                        line vty
exec-timeout 0 0                                exec-timeout 0 0
!                                               !
vxlan-local-tunnelip 10.0.0.7                   vxlan-local-tunnelip 10.0.0.8
bridge-learning off                             bridge-learning off
bridge-arp-nd-suppress on                       bridge-arp-nd-suppress on
mstpctl-portbpdufilter yes                      mstpctl-portbpdufilter yes
mstpctl-bpduguard yes                           mstpctl-bpduguard yes
mtu 9152                                        mtu 9152
auto bridge                                     auto bridge
iface bridge                                    iface bridge
bridge-vlan-aware yes                           bridge-vlan-aware yes
bridge-ports vx-101000 vx-101001 vx-101002      bridge-ports vx-101000 vx-101001 vx-101002
    vx-101003 peerlink-3 hostbond4 hostbond5        vx-101003 peerlink-3 hostbond4 hostbond5
bridge-stp on                                   bridge-stp on
bridge-vids 1000-1003                           bridge-vids 1000-1003
bridge-pvid 1                                   bridge-pvid 1
auto vrf1                                       auto vrf1
iface vrf1                                      iface vrf1
vrf-table auto                                  vrf-table auto
auto vlan1000                                   auto vlan1000
iface vlan1000                                  iface vlan1000
address 45.0.0.2/24                             address 45.0.0.3/24
address 2001:fee1::2/64                         address 2001:fee1::3/64
vlan-id 1000                                    vlan-id 1000
vlan-raw-device bridge                          vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 45.0.0.1/24   address-virtual 00:00:5e:00:01:01 45.0.0.1/24
    2001:fee1::1/64                                 2001:fee1::1/64
vrf vrf1                                        vrf vrf1
auto vlan1001                                   auto vlan1001
iface vlan1001                                  iface vlan1001
address 45.0.1.2/24                             address 45.0.1.3/24
address 2001:fee1:0:1::2/64                     address 2001:fee1:0:1::3/64
vlan-id 1001                                    vlan-id 1001
vlan-raw-device bridge                          vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 45.0.1.1/24   address-virtual 00:00:5e:00:01:01 45.0.1.1/24
    2001:fee1:0:1::1/64                             2001:fee1:0:1::1/64
vrf vrf1                                        vrf vrf1
auto vrf2                                       auto vrf2
iface vrf2                                      iface vrf2
vrf-table auto                                  vrf-table auto
auto vlan1002                                   auto vlan1002
iface vlan1002                                  iface vlan1002
address 45.0.2.2/24                             address 45.0.2.3/24
address 2001:fee1:0:2::2/64                     address 2001:fee1:0:2::3/64
vlan-id 1002                                    vlan-id 1002
vlan-raw-device bridge                          vlan-raw-device bridge
address-virtual 00:00:5e:00:01:01 45.0.2.1/24   address-virtual 00:00:5e:00:01:01 45.0.2.1/24
    2001:fee1:0:2::1/64                             2001:fee1:0:2::1/64
! !
address-family ipv6 unicast address-family ipv6 unicast
redistribute connected redistribute connected
neighbor peerlink-3.4094 neighbor peerlink-3.4094
activate activate
neighbor uplink-1 activate neighbor uplink-1 activate
neighbor uplink-2 activate neighbor uplink-2 activate
exit-address-family exit-address-family
! !
address-family l2vpn evpn address-family l2vpn evpn
neighbor uplink-1 activate neighbor uplink-1 activate
neighbor uplink-2 activate neighbor uplink-2 activate
advertise-all-vni advertise-all-vni
exit-address-family exit-address-family
! !
line vty line vty
exec-timeout 0 0 exec-timeout 0 0
! !
vxlan-local-tunnelip vxlan-local-tunnelip
10.0.0.9 10.0.0.10
bridge-learning off bridge-learning off
bridge-arp-nd-suppress on bridge-arp-nd-suppress on
mstpctl-portbpdufilter yes mstpctl-portbpdufilter yes
mstpctl-bpduguard yes mstpctl-bpduguard yes
mtu 9152 mtu 9152
auto bridge auto bridge
iface bridge iface bridge
bridge-vlan-aware yes bridge-vlan-aware yes
bridge-ports vx-101000 vx- bridge-ports vx-101000 vx-
101001 vx-101002 vx-101003 101001 vx-101002 vx-101003
peerlink-3 hostbond4 hostbond5 peerlink-3 hostbond4 hostbond5
bridge-stp on bridge-stp on
bridge-vids 1000-1003 bridge-vids 1000-1003
bridge-pvid 1 bridge-pvid 1
auto vrf1 auto vrf1
iface vrf1 iface vrf1
vrf-table auto vrf-table auto
auto vlan1000 auto vlan1000
iface vlan1000 iface vlan1000
address 45.0.0.2/24 address 45.0.0.3/24
address 2001:fee1::2/64 address 2001:fee1::3/64
vlan-id 1000 vlan-id 1000
vlan-raw-device bridge vlan-raw-device bridge
address-virtual 00:00:5e: address-virtual 00:00:5e:
00:01:01 45.0.0.1/24 2001: 00:01:01 45.0.0.1/24 2001:
fee1::1/64 fee1::1/64
vrf vrf1 vrf vrf1
auto vlan1001 auto vlan1001
iface vlan1001 iface vlan1001
address 45.0.1.2/24 address 45.0.1.3/24
address 2001:fee1:0:1::2/64 address 2001:fee1:0:1::3/64
vlan-id 1001 vlan-id 1001
vlan-raw-device bridge vlan-raw-device bridge
address-virtual 00:00:5e: address-virtual 00:00:5e:
00:01:01 45.0.1.1/24 2001:fee1: 00:01:01 45.0.1.1/24 2001:fee1:
0:1::1/64 0:1::1/64
vrf vrf1 vrf vrf1
auto vrf2 auto vrf2
iface vrf2 iface vrf2
vrf-table auto vrf-table auto
auto vlan1002 auto vlan1002
iface vlan1002 iface vlan1002
address 45.0.2.2/24 address 45.0.2.3/24
address 2001:fee1:0:2::2/64 address 2001:fee1:0:2::3/64
vlan-id 1002 vlan-id 1002
vlan-raw-device bridge vlan-raw-device bridge
address-virtual 00:00:5e: address-virtual 00:00:5e:
00:01:01 45.0.2.1/24 2001:fee1: 00:01:01 45.0.2.1/24 2001:fee1:
0:2::1/64 0:2::1/64
! !
address-family ipv6 unicast address-family ipv6 unicast
redistribute connected redistribute connected
neighbor peerlink-3.4094 neighbor peerlink-3.4094
activate activate
neighbor uplink-1 activate neighbor uplink-1 activate
neighbor uplink-2 activate neighbor uplink-2 activate
exit-address-family exit-address-family
! !
address-family l2vpn evpn address-family l2vpn evpn
neighbor uplink-1 activate neighbor uplink-1 activate
neighbor uplink-2 activate neighbor uplink-2 activate
advertise-all-vni advertise-all-vni
exit-address-family exit-address-family
! !
line vty line vty
exec-timeout 0 0 exec-timeout 0 0
! !
vxlan-local-tunnelip 10.0.0.5
bridge-learning off
bridge-access 2002
router01 Configurations
router01 /etc/network/interfaces
iface swp6
address 81.1.4.1/24
router01 /etc/frr/frr.conf
VXLAN Routing
VXLAN routing, sometimes referred to as inter-VXLAN routing, provides IP routing between VXLAN VNIs in
overlay networks. The routing of traffic is based on the inner header or the overlay tenant IP address.
Because VXLAN routing is fundamentally routing, it is most commonly deployed with a control plane, such
as Ethernet Virtual Private Network (EVPN (see page 539)). You can set up static routing too, either with or
without the Cumulus Lightweight Network Virtualization (see page 487) (LNV) for MAC distribution and BUM
handling.
638 09 January 2019
Cumulus Networks
This topic describes the platform and hardware considerations for VXLAN routing. For a detailed
description of different VXLAN routing models and configuration examples, refer to EVPN (see page 539).
VXLAN routing supports full layer 3 multi-tenancy; all routing occurs in the context of a VRF (see page 830).
Also, VXLAN routing is supported for dual-attached hosts where the associated VTEPs function in active-
active mode (see page 515).
Contents
This topic describes ...
Supported Platforms (see page 639)
VXLAN Routing Data Plane and the Broadcom Trident II+, Trident3, Maverick, and Tomahawk
Platforms (see page )
Trident II+, Trident3, and Maverick (see page )
Tomahawk (see page 640)
VXLAN Routing Data Plane and Broadcom Trident II Platforms (see page 641)
VXLAN Routing Data Plane and the Mellanox Spectrum Platform (see page 643)
Supported Platforms
The following chipsets support VXLAN routing:
Broadcom Trident II+, Trident3, and Maverick
Broadcom Tomahawk, using an internal loopback on one or more switch ports
Broadcom Trident II, static VXLAN routing only, using an external loopback on one or more switch
ports
Mellanox Spectrum
Using ECMP with VXLAN routing is supported only on Broadcom Tomahawk and Mellanox
Spectrum switches.
For additional restrictions and considerations for VXLAN routing with EVPN, refer to the
EVPN chapter (see page 539).
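The platform list above can be condensed into a small lookup table. The following is a minimal Python sketch, not part of Cumulus Linux; the type name VxlanRoutingSupport and its fields (loopback, static_only, ecmp) are illustrative assumptions, and the values are transcribed only from the list and the ECMP note above.

```python
from typing import Dict, NamedTuple, Optional


class VxlanRoutingSupport(NamedTuple):
    # Hypothetical record type; field names are assumptions for illustration.
    loopback: Optional[str]  # "internal", "external", or None when no loopback is listed
    static_only: bool        # True when only static VXLAN routing is listed
    ecmp: bool               # True when ECMP with VXLAN routing is listed as supported


# Values transcribed from the "Supported Platforms" list in this topic.
SUPPORT: Dict[str, VxlanRoutingSupport] = {
    "Trident II+": VxlanRoutingSupport(None, False, False),
    "Trident3": VxlanRoutingSupport(None, False, False),
    "Maverick": VxlanRoutingSupport(None, False, False),
    "Tomahawk": VxlanRoutingSupport("internal", False, True),
    "Trident II": VxlanRoutingSupport("external", True, False),
    "Spectrum": VxlanRoutingSupport(None, False, True),
}


def supports_ecmp(chipset: str) -> bool:
    """Return True if the chipset is listed as supporting ECMP with VXLAN routing."""
    return SUPPORT[chipset].ecmp


if __name__ == "__main__":
    for chipset, caps in SUPPORT.items():
        print(chipset, caps)
```

A table like this is convenient in pre-deployment checks, for example to reject a design that relies on ECMP on a chipset where it is not listed as supported.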
VXLAN Routing Data Plane and the Broadcom Trident II+, Trident3,
Maverick, and Tomahawk Platforms
mode-2: 50% of the underlay next hops are set apart for overlay
mode-3: 80% of the underlay next hops are set apart for overlay
disable: disables VXLAN routing
The following shows an example of the VXLAN Routing Profile section of the datapath.conf file where the
default profile is enabled.
...
# Specify a VxLan Routing Profile - the profile selected determines the
# maximum number of overlay next hops that can be allocated.
# This is supported only on TridentTwoPlus and Maverick
#
# Profile can be one of {'default', 'mode-1', 'mode-2', 'mode-3', 'disable'}
# default: 15% of the overall nexthops are for overlay.
# mode-1: 25% of the overall nexthops are for overlay.
# mode-2: 50% of the overall nexthops are for overlay.
# mode-3: 80% of the overall nexthops are for overlay.
# disable: VxLan Routing is disabled
#
# By default VxLan Routing is enabled with the default profile.
vxlan_routing_overlay.profile = default
The Trident II+ and Trident3 ASICs support a maximum of 48k underlay next hops.
For any profile you specify, you can allocate a maximum of 2K (2048) VXLAN SVI interfaces.
To disable the VXLAN routing capability on a Trident II+ or Trident3 switch, set the
vxlan_routing_overlay.profile field to disable.
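To make the profile percentages concrete, the arithmetic can be sketched in Python as follows. This is an illustration under stated assumptions, not a statement of how switchd allocates entries: it assumes that 48k means 48 × 1024, that each percentage is taken directly from that figure and rounded down, and that the disable profile yields zero overlay next hops.

```python
# Assumption: "48k" underlay next hops is interpreted as 48 * 1024.
UNDERLAY_NEXT_HOPS = 48 * 1024

# Percentages as listed in the datapath.conf comments.
# Assumption: "disable" is modeled as 0 because VXLAN routing is off.
PROFILE_PERCENT = {
    "default": 15,
    "mode-1": 25,
    "mode-2": 50,
    "mode-3": 80,
    "disable": 0,
}


def overlay_next_hops(profile: str) -> int:
    """Overlay next hops set apart by a profile, rounded down to an integer."""
    return UNDERLAY_NEXT_HOPS * PROFILE_PERCENT[profile] // 100


if __name__ == "__main__":
    for name in PROFILE_PERCENT:
        print(f"{name}: {overlay_next_hops(name)} overlay next hops")
```

Under these assumptions, mode-2 sets apart 24576 next hops for the overlay and leaves the same number for the underlay.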
Tomahawk
The Tomahawk ASIC does not support RIOT natively; you must configure the switch ports for VXLAN routing
to use internal loopback (also referred to as internal hyperloop). The internal loopback facilitates the
recirculation of packets through the ingress pipeline to achieve VXLAN routing.
For routing into a VXLAN tunnel, the first pass of the ASIC performs routing and routing rewrites of the
packet MAC source and destination address and VLAN, then packets recirculate through the internal
hyperloop for VXLAN encapsulation and underlay forwarding on the second pass.
For routing out of a VXLAN tunnel, the first pass performs VXLAN decapsulation, then packets recirculate
through the hyperloop for routing on the second pass.
Configure only as many switch ports in internal loopback mode as the required bandwidth calls for.
No additional configuration is necessary.
To configure one or more switch ports for loopback mode, edit the /etc/cumulus/ports.conf file and
change the port speed to loopback. In the example below, swp8 and swp9 are configured for loopback
mode:
...
7=4x10G
8=loopback
9=loopback
10=100G
...
After you save your changes to the ports.conf file, restart switchd (see page 201) for the changes to
take effect.
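Before restarting switchd, it can help to confirm which ports are set to loopback. The following Python sketch parses text in the ports.conf format shown above; the function name loopback_ports is hypothetical, and the sketch assumes that each relevant line has the form number=setting and that port number N corresponds to interface swpN, as in the example.

```python
from typing import List


def loopback_ports(ports_conf_text: str) -> List[str]:
    """Return interface names (swpN) whose setting is 'loopback'."""
    ports = []
    for raw in ports_conf_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # skips blank lines, "..." placeholders, and anything else
        port, setting = (part.strip() for part in line.split("=", 1))
        if setting == "loopback":
            ports.append(f"swp{port}")
    return ports


# Sample taken from the ports.conf excerpt in this topic.
SAMPLE = """\
7=4x10G
8=loopback
9=loopback
10=100G
"""

if __name__ == "__main__":
    print(loopback_ports(SAMPLE))
```

To check a live switch, read /etc/cumulus/ports.conf and pass its contents to the function.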
VXLAN routing using internal loopback is supported only with VLAN-aware bridges (see page 402);
you cannot use a bridge in traditional mode (see page 414).
VXLAN Routing Data Plane and Broadcom Trident II Platforms
On Broadcom Trident II switches, only static VXLAN routing is supported with the use of external
loopback.
External hyperloop is set up so that the port at one end of the loopback is a layer 2 port attached to the
bridge while the port at the other end is configured with a layer 3 interface. The layer 3 interface is
configured with the gateway IP address for the corresponding VLAN/VNI. Traffic exiting a VXLAN tunnel is
bridged out the layer 2 port if it needs to be routed (exactly as it would if it were going to an external
gateway) but at the other end, because traffic is addressed to the gateway IP address, it gets regular routing
treatment. For redundancy and increased bandwidth, two or more pairs of ports are typically put into an
external hyperloop and bonded together.
The following diagram illustrates the configuration and operation of an external hyperloop.
In the above diagram, VTEPs exit01 and exit02 are acting as VXLAN layer 3 gateways. On exit01, two pairs of
ports are externally looped back (swp45, swp46) and (swp47, swp48). The ports swp46 and swp48 are
bonded together and act as the layer 2 end; therefore, this bond interface (named inside) is a member of
the bridge. The ports swp45 and swp47 are bonded together (named outside) and act as the layer 3 end
with SVIs configured for VLANs 100 and 200 with the corresponding gateway IP addresses. Because the two
layer 3 gateways are in an MLAG (see page 427) configuration, they use a virtual IP address as the gateway
IP. The relevant interface configuration on exit01 is as follows:
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports inside server01 server02 vni-10 vni-20 peerlink
bridge-vids 100 200
bridge-pvid 1 # sets native VLAN to 1, an unused VLAN
mstpctl-treeprio 8192
auto outside
iface outside
bond-slaves swp45 swp47
alias hyperloop outside
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
auto inside
iface inside
bond-slaves swp46 swp48
alias hyperloop inside
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
auto VLAN100GW
iface VLAN100GW
bridge-ports outside.100
address 172.16.100.2/24
address-virtual 44:38:39:FF:01:90 172.16.100.1/24
auto VLAN200GW
iface VLAN200GW
bridge-ports outside.200
address 172.16.200.2/24
address-virtual 44:38:39:FF:02:90 172.16.200.1/24
auto vni-10
iface vni-10
vxlan-id 10
vxlan-local-tunnelip 10.0.0.11
bridge-access 100
auto vni-20
iface vni-20
vxlan-id 20
vxlan-local-tunnelip 10.0.0.11
bridge-access 200
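The relationship between the looped-back port pairs and the two bonds can be sketched in Python as follows. This is an illustrative helper, not a Cumulus Linux tool; the function name hyperloop_bonds is hypothetical, and it assumes each pair is given as (layer 3 end, layer 2 end), matching the pairs (swp45, swp46) and (swp47, swp48) described above.

```python
from typing import List, Tuple

# Stanza layout copied from the "outside" and "inside" bonds in the example above.
_STANZA = (
    "auto {name}\n"
    "iface {name}\n"
    "    bond-slaves {members}\n"
    "    alias hyperloop {name}\n"
    "    mstpctl-bpduguard yes\n"
    "    mstpctl-portbpdufilter yes\n"
)


def hyperloop_bonds(pairs: List[Tuple[str, str]]) -> str:
    """Render the two bond stanzas for a set of externally looped port pairs.

    Each pair is (layer 3 end, layer 2 end). The layer 3 ends form the
    'outside' bond and the layer 2 ends form the 'inside' bond.
    """
    outside = " ".join(l3 for l3, _ in pairs)
    inside = " ".join(l2 for _, l2 in pairs)
    return (
        _STANZA.format(name="outside", members=outside)
        + "\n"
        + _STANZA.format(name="inside", members=inside)
    )


if __name__ == "__main__":
    print(hyperloop_bonds([("swp45", "swp46"), ("swp47", "swp48")]))
```

Keeping the pairs as the single source of truth makes it harder to place both ends of one cable in the same bond by mistake.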
For the external hyperloop to work correctly, you must configure the following switchd flag:
After you save your changes to the switchd.conf file, restart switchd (see page 201) for the change to
take effect.
Contents
This topic describes ...
Getting Started (see page 645)
Configure the MidoNet Integration on the Switch (see page 645)
Configure the MidoNet Integration Using the Configuration Script (see page 645)
Configure the MidoNet Integration Manually (see page 646)
Configure MidoNet VTEP and Port Bindings (see page 647)
From the MidoNet Manager GUI (see page 647)
From the MidoNet CLI (see page 650)
Troubleshooting (see page 652)
Control Plane Troubleshooting (see page 652)
Verify VTEP and OVSDB Services (see page 652)
Verify OVSDB-server Connections (see page 653)
Verify the VXLAN Bridge and VTEP Interfaces (see page 653)
Datapath Troubleshooting (see page 654)
Verify IP Reachability (see page 654)
MidoNet VXLAN Encapsulation (see page 654)
Inspect the OVSDB (see page 655)
List the Physical Switch (see page 655)
List the Logical Switch (see page 656)
List Local or Remote MAC Addresses (see page 656)
Show Open Vswitch Database (OVSDB) Data (see page 656)
Getting Started
Make sure you have a layer 2 gateway: a Tomahawk, Trident II+, or Trident II switch running Cumulus Linux.
Cumulus Linux includes OVSDB server (ovsdb-server) and VTEPd (ovs-vtepd), which support VLAN-
aware bridges (see page 402).
To integrate a VXLAN with MidoNet, you need to:
Configure the MidoNet integration on the switch
Configure the MidoNet VTEP and port bindings
Verify the VXLAN configuration
For more information about MidoNet, see the MidoNet Operations Guide, version 1.8 or later.
There is no support for VXLAN routing (see page 638) in the Trident II chipset; use a loopback
interface (hyperloop (see page 641)) instead.
().
Executed:
define local tunnel IP address on the switch
().
Executed:
define management IP address on the switch
().
Executed:
restart a service
(Killing ovs-vtepd (28170).
Killing ovsdb-server (28146).
Starting ovsdb-server.
Starting ovs-vtepd.).
Because MidoNet does not have a controller, you need to use a dummy IP address (for example,
1.1.1.1) for the controller parameter in the script. After the script completes, delete the VTEP
manager, as it is not needed and will otherwise fill the logs with inconsequential error messages:
The switch is now ready to connect to MidoNet. The rest of the configuration is performed from the
MidoNet Manager GUI or using the MidoNet API.
The tunnel zone is a construct used to define the VXLAN source address used for the tunnel. The address
of this host is used for the source of the VXLAN encapsulation and traffic transits into the routing domain
from this point. Therefore, the host must have layer 3 reachability to the Cumulus Linux switch tunnel IP.
Next, add a host entry to the tunnel zone:
1. Click Add.
2. Select a host from the Host list.
3. Provide the tunnel source IP Address to use on the selected host.
4. Click Save.
The new VTEP appears in the list below. MidoNet then initiates a connection between the OpenStack
Controller and the Cumulus Linux switch. If the OVS client successfully connects to the OVSDB server, the
VTEP entry displays the switch name and VXLAN tunnel IP address, which you specified during the
bootstrapping process.
1. Click Add.
2. In the Port Name list, select the port on the Cumulus Linux switch that you are using to connect to
the VXLAN segment.
3. Specify the VLAN ID (enter 0 for untagged).
4. In the Bridge list, select the MidoNet bridge that the instances (VMs) are using in OpenStack.
5. Click Save.
You see the port binding displayed in the binding table under the VTEP.
Binding the port automatically configures a VXLAN bridge interface that includes the VTEP interface and
the bound port. The OpenStack instances (VMs) can now ping the hosts
connected to the bound port on the Cumulus switch. The Troubleshooting section below demonstrates the
verification of the VXLAN data and control planes.
root@os-controller:~# midonet-cli
midonet>
From the MidoNet CLI, the commands explained in this section perform the same operations depicted in
the previous section with the MidoNet Manager GUI.
2. The tunnel zone is a construct used to define the VXLAN source address used for the tunnel. The
address of this host is used for the source of the VXLAN encapsulation and traffic transits into the
routing domain from this point. Therefore, the host must have layer 3 reachability to the Cumulus
Linux switch tunnel IP.
First, obtain the list of available hosts connected to the Neutron network and the MidoNet
bridge.
Next, get a listing of all the interfaces.
Finally, add a host entry to the tunnel zone ID returned in the previous step and specify which
interface address to use.
Repeat this procedure for each OpenStack host connected to the Neutron network and the MidoNet
bridge.
3. Create a VTEP and assign it to the tunnel zone ID returned in the previous step. The management IP
address (the destination address for the VXLAN or remote VTEP) and the port must be the same
ones you configure in the vtep-bootstrap script or the manual bootstrapping:
In this step, MidoNet initiates a connection between the OpenStack Controller and the Cumulus
Linux switch. If the OVS client successfully connects to the OVSDB server, the returned values should
show the name and description matching the switch-name parameter specified in the bootstrap
process.
4. The VTEP binding uses the information provided to MidoNet from the OVSDB server, providing a list
of ports that the hardware VTEP can use for layer 2 attachment. This binding virtually connects the
physical interface to the overlay switch, and joins it to the Neutron bridged network.
First, get the UUID of the Neutron network behind the MidoNet bridge:
Next, create the VTEP binding using the UUID and the switch port being bound to the VTEP on the
remote end. If there is no VLAN ID, set vlan to 0:
At this point, the VTEP is connected and the layer 2 overlay is operational. From the OpenStack instance
(VM), you can ping a physical server connected to the port bound to the hardware switch VTEP.
Troubleshooting
As with any complex system, there is both a control plane and a data plane to troubleshoot.
If the connection fails, verify IP reachability from the host to the switch. If that succeeds, it is likely that the
bootstrap process did not set up port 6632. Redo the bootstrapping procedures above.
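One quick way to tell whether port 6632 is reachable is to attempt a TCP connection from the OpenStack host. The following Python sketch uses only the standard library; the function name ovsdb_port_open and the address in the usage comment are hypothetical, and a successful connection shows only that something is listening on the port, not that it is a correctly configured OVSDB server.

```python
import socket


def ovsdb_port_open(host: str, port: int = 6632, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (192.0.2.10 is a placeholder for the switch management IP):
#     if not ovsdb_port_open("192.0.2.10"):
#         print("Port 6632 is not reachable; redo the bootstrapping procedure.")
```

If ping to the switch succeeds but this check returns False, the result is consistent with the bootstrap process not having set up the port.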
Next, look at the bridging table for the VTEP and the forwarding entries. The bound interface and the VTEP
are listed along with the MAC addresses of those interfaces. When the hosts attached to the bound port
send data, those MACs are learned and entered into the bridging table, as well as the OVSDB.
Datapath Troubleshooting
If you have verified the control plane is correct, and you still cannot get data between the OpenStack
instances and the physical nodes on the switch, there might be something wrong with the data plane. The
data plane consists of the actual VXLAN encapsulated path between the switch and one of the OpenStack nodes running
the midolman service. These are typically the compute nodes, but can include the MidoNet gateway nodes. If
the OpenStack instances can ping the tenant router address but cannot ping the physical device connected
to the switch (or vice versa), then something is wrong in the data plane.
Verify IP Reachability
First, there must be IP reachability between the encapsulating node and the address you bootstrapped as
the tunnel IP on the switch. Verify the OpenStack host can ping the tunnel IP. If this does not work, check
the routing design and fix the layer 3 problem first.
Logical_Binding_Stats table
_uuid bytes_from_local bytes_to_local packets_from_local
packets_to_local
------------------------------------ ---------------- --------------
------------------ ----------------
d2e378b4-61c1-4daf-9aec-a7fd352d3193 5782569 1658250 21687 14589
Logical_Router table
_uuid description name static_routes switch_binding
----- ----------- ---- ------------- --------------
Logical_Switch table
_uuid description name tunnel_key
------------------------------------ -----------
----------------------------------------- ----------
44d162dc-0372-4749-a802-5b153c7120ec "" "mn-6c9826da-6655-4fe3-a826-
4dcba6477d2d" 10006
Manager table
_uuid inactivity_probe is_connected max_backoff other_config status
target
----- ---------------- ------------ ----------- ------------ ------
------
Mcast_Macs_Local table
MAC _uuid ipaddr locator_set logical_switch
----------- ------------------------------------ ------
------------------------------------
------------------------------------
unknown-dst 25eaf29a-c540-46e3-8806-3892070a2de5 "" 7a4c000a-244e-
4b37-8f25-fd816c1a80dc 44d162dc-0372-4749-a802-5b153c7120ec
Mcast_Macs_Remote table
MAC _uuid ipaddr locator_set logical_switch
----------- ------------------------------------ ------
------------------------------------
------------------------------------
unknown-dst b122b897-5746-449e-83ba-fa571a64b374 "" 6c04d477-18d0-
41df-8d52-dc7b17845ebe 44d162dc-0372-4749-a802-5b153c7120ec
Physical_Locator table
_uuid dst_ip encapsulation_type
------------------------------------ -------------- ------------------
2fcf8b7e-e084-4bcb-b668-755ae7ac0bfb "10.111.0.182" "vxlan_over_ipv4"
3f78dbb0-9695-42ef-a31f-aaaf525147f1 "10.111.1.2" "vxlan_over_ipv4"
Physical_Locator_Set table
_uuid locators
------------------------------------
--------------------------------------
6c04d477-18d0-41df-8d52-dc7b17845ebe [2fcf8b7e-e084-4bcb-b668-
755ae7ac0bfb]
7a4c000a-244e-4b37-8f25-fd816c1a80dc [3f78dbb0-9695-42ef-a31f-
aaaf525147f1]
Physical_Port table
_uuid description name port_fault_status vlan_bindings vlan_stats
------------------------------------ ----------- ---------
----------------- ----------------------------------------
----------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
-------------------------------------------- -------------------
-------------- --------------------------------------
6d459554-0c75-4170-bb3d-117eb4ce1f4d "sw12" ["10.50.20.22"] "sw12"
[109a9911-d6c7-4142-b6c9-7c985506abb4, 124d1e01-a187-4427-819f-
21de66e76f13, 2a2d04fa-7190-41fe-8cee-318fcbafb2ea, 3001c904-b152-
4dc4-9d8e-718f24ffa439, 3943fb6a-0b49-4806-a014-2bcd4d469537,
4223559a-da1c-4c34-b8bf-bff7ced376ad, 439afb62-067e-4bbe-a0d9-
ee33a23d2a9c, 47cc66fb-ef8a-4a9b-a497-1844b89f7d32, 54f6c9df-01a1-
4d96-9dcf-3035a33ffb3e, 55b49814-b5c5-405e-8e9f-898f3df4f872,
5b15372b-89f0-4e14-a50b-b6c6f937d33d, 5be3a052-be0f-4258-94cb-
5e8be9afb896, 631b19bd-3022-4353-bb2d-f498b0c1cb17, 652c6cd1-0823-4585
-bb78-658e6ca2abfc, 684f99d5-426c-45c8-b964-211489f45599, 69585fff-436
0-4177-901d-8360ade5391b, 6bbccda8-d7e5-4b19-b978-4ec7f5b868e0,
7096abaf-eebf-4ee3-b0cc-276224bc3e71, 7cb681f4-2206-4c70-85b7-
23b60963cd21, 93b85c31-be38-4384-8b7a-9696764f9ba9, 9a7e42c4-228f-
4b55-b972-7c3b8352c27d, a44c5402-6218-4f09-bf1e-518f41a5546e,
a6f8a88a-3877-4f81-b9b4-d75394a09d2c, a9294152-2b32-4058-8796-23520
ffb7379, b26ce4dd-b771-4d7b-8647-41fa97aa40e3, b2b2cd14-662d-45a5-
87c1-277acbccdffd, bcfb2920-6676-494c-9dcb-b474123b7e59, bf38137d-
3a14-454e-8df0-9c56e4b4e640, bf69fcbb-36b3-4dbc-a90d-fc7412e57076,
c32a9ff9-fd11-4399-815f-806322f26ff5, c35f55f5-8ec6-4fed-bef4-
49801cd0934c, c5a88dd6-d931-4b2c-9baa-a0abfb9d41f5, c6876886-8386-4
e34-a307-931909fca58f, c85ed6cd-a7d4-4016-b3e9-34df592072eb, cf382ed6-
60d3-43f5-8586-81f4f0f2fb28, d9db91a6-1c10-4154-9269-84877faa79b4,
e00741f1-ba34-47c5-ae23-9269c5d1a871, e0ee993a-8383-4701-a766-
d425654dbb7f] [] ["10.111.1.2"] [062eaf89-9bd5-4132-8b6b-09db254325af]
Tunnel table
_uuid bfd_config_local bfd_config_remote bfd_params bfd_status local
remote
------------------------------------
-----------------------------------------------------------
----------------- ---------- ----------
------------------------------------
------------------------------------
062eaf89-9bd5-4132-8b6b-09db254325af {bfd_dst_ip="169.254.1.0",
bfd_dst_mac="00:23:20:00:00:01"} {} {} {} 3f78dbb0-9695-42ef-a31f-
aaaf525147f1 2fcf8b7e-e084-4bcb-b668-755ae7ac0bfb
Ucast_Macs_Local table
MAC _uuid ipaddr locator logical_switch
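When the dump is long, it can be easier to extract the tunnel endpoints programmatically. The following Python sketch pulls the rows of the Physical_Locator table out of dump text like the output above; the function name parse_physical_locators is hypothetical, and the sketch assumes each row has the shape shown in this topic (a UUID followed by two quoted values on one line).

```python
import re
from typing import Dict

# A row looks like: <uuid> "<dst_ip>" "<encapsulation_type>"
_ROW = re.compile(r'^([0-9a-f-]{36})\s+"([^"]+)"\s+"([^"]+)"$')


def parse_physical_locators(dump_text: str) -> Dict[str, str]:
    """Map each locator UUID to its destination IP from the Physical_Locator table."""
    locators: Dict[str, str] = {}
    in_table = False
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.endswith(" table"):
            # A new table header starts; track whether it is the one we want.
            in_table = line == "Physical_Locator table"
            continue
        if in_table:
            match = _ROW.match(line)
            if match:
                locators[match.group(1)] = match.group(2)
    return locators


# Sample rows copied from the dump shown in this topic.
SAMPLE = """\
Physical_Locator table
_uuid dst_ip encapsulation_type
------------------------------------ -------------- ------------------
2fcf8b7e-e084-4bcb-b668-755ae7ac0bfb "10.111.0.182" "vxlan_over_ipv4"
3f78dbb0-9695-42ef-a31f-aaaf525147f1 "10.111.1.2" "vxlan_over_ipv4"
Physical_Locator_Set table
_uuid locators
"""

if __name__ == "__main__":
    for uuid, ip in parse_physical_locators(SAMPLE).items():
        print(uuid, ip)
```

The resulting addresses are the ones to test when verifying IP reachability between the encapsulating node and the switch.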
Cumulus Linux also supports integration with VMware NSX in high availability mode. Refer to OVSDB Server
High Availability (see page 685).
Contents
This topic describes ...
Getting Started (see page 661)
Configure the Switch for NSX-V Integration (see page 661)
Start the openvswitch-vtep Service (see page 661)
Configure the NSX-V Integration Using the Configuration Script (see page 662)
Configure the NSX-V Integration Manually (see page 663)
Getting Started
Before you integrate VXLANs with NSX-V, make sure you have a layer 2 gateway: a Broadcom Tomahawk,
Trident II+, Trident II, Maverick, or Mellanox Spectrum switch running Cumulus Linux. Cumulus Linux
includes OVSDB server (ovsdb-server) and VTEPd (ovs-vtepd), which support VLAN-aware bridges (see
page 402).
To integrate a VXLAN with NSX-V, you need to:
Configure the NSX-V integration on the switch.
Configure the transport and logical layers from the NSX Manager.
Verify the VXLAN configuration.
Cumulus Linux supports security protocol version TLSv1.2 for SSL connections between the
OVSDB server and the NSX controller.
The OVSDB server cannot select the loopback interface as the source IP address, causing top of
rack registration to the controller to fail. To work around this issue, run the net add bgp
redistribute connected command followed by the net commit command.
Run the following commands in the order shown to complete the configuration process:
You can configure the NSX-V integration manually for standalone mode only; manual
configuration for OVSDB server high availability is not supported.
If you do not want to use the configuration script to configure the NSX-V integration on the switch
automatically, you can configure the integration manually, which requires you to perform the following
steps:
Generate a certificate and key pair for authentication by NSX-V.
Configure a switch as a VTEP gateway.
cumulus@switch:~$ ls -l
total 12
-rw-r--r-- 1 root root 4028 Oct 23 05:32 cumulus-cert.pem
-rw------- 1 root root 1679 Oct 23 05:32 cumulus-privkey.pem
-rw-r--r-- 1 root root 3585 Oct 23 05:32 cumulus-req.pem
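In the listing above, the private key is readable and writable only by its owner (-rw-------), while the certificate and request are world-readable. A check for that property can be sketched in Python as follows; the function name key_is_private is hypothetical, and treating any group or other permission bit as a problem is an assumption based on common practice for private keys rather than a documented Cumulus Linux requirement.

```python
import os
import stat


def key_is_private(path: str) -> bool:
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Example usage (the path is taken from the listing above):
#     key_is_private("cumulus-privkey.pem")
```

Running the check after generating or moving the key files catches a key that has been copied with looser permissions.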
# Start ovsdb-server.
set ovsdb-server "$DB_FILE"
set "$@" -vANY:CONSOLE:EMER -vANY:SYSLOG:ERR -vANY:FILE:INFO
set "$@" --remote=punix:"$DB_SOCK"
set "$@" --remote=db:Global,managers
set "$@" --remote=ptcp:6633:$LOCALIP
set "$@" --private-key=/root/cumulus-privkey.pem
set "$@" --certificate=/root/cumulus-cert.pem
set "$@" --bootstrap-ca-cert=/root/controller.cacert
set "$@" --ssl-protocols=TLSv1,TLSv1.1,TLSv1.2
If files have been moved or regenerated, restart the OVSDB server and VTEPd:
3. Define the NSX Controller Cluster IP address in OVSDB. This causes the OVSDB server to start
contacting the NSX controller:
4. Define the local IP address on the VTEP for VXLAN tunnel termination. First, find the physical switch
name as recorded in OVSDB:
Then set the tunnel source IP address of the VTEP. This is the datapath address of the VTEP, which is
typically an address on a loopback interface on the switch that is reachable from the underlying layer
3 network:
After you generate the certificate, keep the terminal session active; you need to paste the certificate into
NSX Manager when you configure the VTEP gateway.
# Start ovs-vtepd
set ovs-vtepd unix:"$DB_SOCK"
set "$@" -vconsole:emer -vsyslog:err -vfile:info
#set "$@" --enable-vlan-aware-mode
1. In NSX Manager, add a new HW VTEP gateway. Click the Network & Security icon, Service
Definitions category, then the Hardware Devices tab. Under Hardware Devices, click +. The
Create Add Hardware Devices window opens.
cumulus@switch:~$ cd /var/lib/openvswitch
cumulus@switch:/var/lib/openvswitch$ ls
conf.db pki vtep7-cert.pem vtep7-privkey.pem vtep7-req.pem
cumulus@switch:/var/lib/openvswitch$ cat vtep7-cert.pem
After communication is established between the switch and the controller, a controller.cacert file is
downloaded onto the switch.
Verify that the controller and switch handshake is successful. In a terminal connected to the switch, run this
command:
1. In NSX Manager, click the Logical Network Preparation tab in the Installation category, then
click the Segment ID tab.
2. Click Edit and add the segment IDs (VNIDs) to be used. Here VNIs 5000-5999 are configured.
5. Select Unicast to choose the NSX-V Controller Cluster to handle the VXLAN control plane.
1. In NSX Manager, select the Logical Switches category. Click + to add a logical switch instance.
3. In the Transport Zone field, add the transport zone that you created earlier.
4. In the Replication Mode field, select Unicast for replication by the service node. Then check the
Enable IP Discovery check box.
5. Click OK.
1. Select the Service Definitions category, then click the Hardware Devices tab. Next to the
Replication Cluster field, click Edit.
2. Hypervisors connected to the NSX controller for replication appear in the Available Objects list.
Select the required service nodes, then click the green arrow to move them to the Selected Objects
list.
1. In NSX Manager, add a new logical switch port. Click the Logical Switches category. Under Actions,
click Manage Hardware Bindings. The Manage Hardware Binding wizard appears.
5. Click OK to save the logical switch port. Connectivity is established. Repeat this procedure for each
logical switch port you want to define.
To check that the active OVSDB server is connected to the NSX controller, run the ovsdb-client dump
Manager command:
Cumulus Linux also supports integration with VMware NSX in high availability mode. Refer to OVSDB Server
High Availability (see page 685).
Contents
This topic describes ...
Getting Started (see page 673)
Configure the Switch for NSX-MH Integration (see page 673)
Start the openvswitch-vtep Service (see page 673)
Configure the NSX-MH Integration Using the Configuration Script (see page 674)
Configure the NSX-MH Integration Manually (see page 675)
Provision VMware NSX-V (see page 677)
Configure the Switch as a VTEP Gateway (see page 677)
Configure the Transport and Logical Layers (see page 679)
Configure the Transport Layer (see page 679)
Configure the Logical Layer (see page 680)
Define Logical Switch Ports (see page 682)
Verify the VXLAN Configuration (see page 684)
Getting Started
Before you integrate VXLANs with NSX-MH, make sure you have a layer 2 gateway: a Broadcom Tomahawk,
Trident II+, Trident II, Maverick, or Mellanox Spectrum switch running Cumulus Linux. Cumulus Linux
includes OVSDB server (ovsdb-server) and VTEPd (ovs-vtepd), which support VLAN-aware bridges (see
page 402).
To integrate a VXLAN with NSX-MH, you need to:
Configure the NSX-MH integration on the switch.
Configure the transport and logical layers from the NSX Manager.
Verify the VXLAN configuration.
Cumulus Linux supports security protocol version TLSv1.2 for SSL connections between the
OVSDB server and the NSX controller.
The OVSDB server cannot select the loopback interface as the source IP address, causing top of
rack registration to the controller to fail. To work around this issue, run the net add bgp
redistribute connected command followed by the net commit command.
().
Executed:
restart a service
().
Run the following commands in the order shown to complete the configuration process:
You can configure the NSX-V integration manually for standalone mode only; manual
configuration for OVSDB server high availability is not supported.
If you do not want to use the configuration script to configure the NSX-MH integration on the switch
automatically, you can configure the integration manually, which requires you to perform the following
steps:
Generate a certificate and key pair for authentication by NSX.
Configure the switch as a VTEP gateway.
cumulus@switch:~$ ls -l
total 12
-rw-r--r-- 1 root root 4028 Oct 23 05:32 cumulus-cert.pem
-rw------- 1 root root 1679 Oct 23 05:32 cumulus-privkey.pem
-rw-r--r-- 1 root root 3585 Oct 23 05:32 cumulus-req.pem
# Start ovsdb-server.
set ovsdb-server "$DB_FILE"
set "$@" -vANY:CONSOLE:EMER -vANY:SYSLOG:ERR -vANY:FILE:INFO
set "$@" --remote=punix:"$DB_SOCK"
set "$@" --remote=db:Global,managers
set "$@" --remote=ptcp:6633:$LOCALIP
set "$@" --private-key=/root/cumulus-privkey.pem
set "$@" --certificate=/root/cumulus-cert.pem
set "$@" --bootstrap-ca-cert=/root/controller.cacert
If files have been moved or regenerated, restart the OVSDB server and VTEPd:
3. Define the NSX controller cluster IP address in OVSDB. This causes the OVSDB server to start
contacting the NSX controller:
4. Define the local IP address on the VTEP for VXLAN tunnel termination. First, find the physical switch
name as recorded in OVSDB:
Then set the tunnel source IP address of the VTEP. This is the datapath address of the VTEP, which is
typically an address on a loopback interface on the switch that is reachable from the underlying layer
3 network:
After you generate the certificate, keep the terminal session active; you need to paste the certificate into
NSX Manager when you configure the VTEP gateway.
# Start ovs-vtepd
set ovs-vtepd unix:"$DB_SOCK"
1. In NSX Manager, add a new gateway. Click the Network Components tab, then the Transport
Layer category. Under Transport Node, click Add, then select Manually Enter All Fields. The
Create Gateway wizard opens.
2. In the Create Gateway dialog, select Gateway for the Transport Node Type, then click Next.
3. In the Display Name field, provide a name for the gateway, then click Next.
4. Enable the VTEP service. Select the VTEP Enabled checkbox, then click Next.
5. From the terminal session connected to the switch where you generated the certificate, copy the
certificate and paste it into the Security Certificate text field. Copy only the bottom portion,
including the BEGIN CERTIFICATE and END CERTIFICATE lines. For example, copy all the
highlighted text in the terminal:
6. In the Connectors dialog, click Add Connector to add a transport connector. This defines the tunnel
endpoint that terminates the VXLAN tunnel and connects NSX to the physical gateway. You must
choose a tunnel Transport Type of VXLAN. Choose an existing transport zone for the connector or
click Create to create a new transport zone.
7. Define the IP address of the connector (the underlay IP address on the switch for tunnel
termination).
8. Click OK to save the connector, then click Save to save the gateway.
After communication is established between the switch and the controller, a controller.cacert file
downloads onto the switch.
Verify that the controller and switch handshake is successful. In a terminal connected to the switch, run this
command:
1. In NSX Manager, add a new gateway service. Click the Network Components tab, then the Services
category. Under Gateway Service, click Add. The Create Gateway Service wizard opens.
2. In the Create Gateway Service dialog, select VTEP L2 Gateway Service as the Gateway Service Type.
3. Provide a Display Name for the service to represent the VTEP in NSX.
4. Click Add Gateway to associate the service with the gateway you created earlier.
5. In the Transport Node field, choose the name of the gateway you created earlier.
6. In the Port ID field, choose the physical port on the gateway (for example, swp10) that will connect
to a logical layer 2 segment and carry data traffic.
7. Click OK to save this gateway in the service, then click Save to save the gateway service.
1. In NSX Manager, add a new logical switch. Click the Network Components tab, then the Logical
Layer category. Under Logical Switch, click Add. The Create Logical Switch wizard opens.
2. In the Display Name field, enter a name for the logical switch, then click Next.
4. Specify the transport zone bindings for the logical switch. Click Add Binding. The Create Transport
Zone Binding dialog opens.
5. In the Transport Type list, select VXLAN, then click OK to add the binding to the logical switch.
6. In the VNI field, assign the switch a VNI ID, then click OK.
Do not use 0 or 16777215 as the VNI ID; these are reserved values under Cumulus Linux.
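If you script the provisioning, you can reject the reserved values before you enter a VNI in NSX Manager. The following is a minimal sketch only: the function name is hypothetical, and the 24-bit upper bound is taken from the size of the VNI field in the VXLAN header, not from this guide.

```python
# Hypothetical helper: check a VNI before assigning it to a logical switch.
VNI_MAX = 2**24 - 1  # 16777215, the largest value a 24-bit VNI field can hold


def is_usable_vni(vni: int) -> bool:
    """Return True if vni fits in 24 bits and is neither 0 nor 16777215."""
    return 0 < vni < VNI_MAX


if __name__ == "__main__":
    for candidate in (0, 1000, 16777215):
        print(candidate, is_usable_vni(candidate))
```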
1. In NSX Manager, add a new logical switch port. Click the Network Components tab, then the
Logical Layer category. Under Logical Switch Port, click Add. The Create Logical Switch Port
wizard opens.
2. In the Logical Switch UUID list, select the logical switch you created above, then click Create.
3. In the Display Name field, provide a name for the port that indicates it is the port that connects the
gateway, then click Next.
4. In the Attachment Type list, select VTEP L2 Gateway.
5. In the VTEP L2 Gateway Service UUID list, choose the name of the gateway service you created
earlier.
6. In the VLAN list, you can choose a VLAN if you want to connect only traffic on a specific VLAN of the
physical network. Leave it blank to handle all traffic.
7. Click Save to save the logical switch port. Connectivity is established. Repeat this procedure for each
logical switch port you want to define.
or
Use the ovsdb-client dump command to troubleshoot issues on the switch. This command verifies that
the controller and switch handshake is successful (and works only for VXLANs integrated with NSX):
Cumulus Linux supports integration with VMware NSX in both standalone mode and OVSDB server high
availability mode (where the data plane is running in active-active mode). For information about VMware
NSX in standalone mode and for a description of the components that work together to integrate VMware
NSX and Cumulus Linux, see Integrating Hardware VTEPs with VMware NSX-MH (see page 672) or
Integrating Hardware VTEPs with VMware NSX-V (see page 660).
With OVSDB server high availability mode, you use two peer Cumulus Linux switches in an MLAG
configuration. Both the MLAG primary and MLAG secondary switch contain OVSDB server and VTEPd. The
OVSDB servers synchronize their databases with each other and always maintain the replicated state
unless failover occurs; for example, the peer link bond breaks, a switch fails, or the OVSDB server goes
down. Both of the VTEPd components talk to the active OVSDB server to read the configuration and then
push the configuration to the kernel. Only the active OVSDB server communicates with the NSX controller,
unless failover occurs and then the standby OVSDB server takes over automatically. Although the Cumulus
switches are configured as an MLAG pair, the NSX controller sees them as a single system (the NSX
controller is not aware that multiple switches exist).
The following examples show OVSDB server high availability mode.
Example 1: The OVSDB server on the MLAG primary switch is active. The OVSDB server on the MLAG
secondary switch is the hot standby. Only the active OVSDB server communicates with the NSX controller.
Example 2: If failover occurs, the OVSDB server on the MLAG secondary switch becomes the active OVSDB
server and communicates with the NSX controller.
When the OVSDB server on the MLAG primary switch starts responding again, it resynchronizes its
database, becomes the active OVSDB server, and connects to the controller. At the same time, the OVSDB
server on the MLAG secondary switch stops communicating with the NSX controller, synchronizes with the
now active OVSDB server, and takes the standby role again.
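The role selection described above can be summarized as a small decision function. This is an illustrative model of the behavior, not code that runs on the switch; the function and argument names are hypothetical.

```python
from typing import Optional


def active_ovsdb_server(primary_up: bool, secondary_up: bool) -> Optional[str]:
    """Return which MLAG switch hosts the active OVSDB server, if any."""
    if primary_up:
        return "primary"    # the primary is active whenever it is responding
    if secondary_up:
        return "secondary"  # the standby takes over when failover occurs
    return None             # neither OVSDB server is available


print(active_ovsdb_server(True, True))   # primary
print(active_ovsdb_server(False, True))  # secondary
```

Because the primary is checked first, the model also covers recovery: when the primary responds again it becomes active and the secondary returns to the standby role.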
Contents
This topic describes ...
Getting Started (see page 686)
Configure the NSX Integration on the Switch (see page 688)
Configure the Transport and Logical Layers (see page 690)
Troubleshooting (see page 690)
Getting Started
Before you configure OVSDB server high availability, make sure you have two switches running Cumulus
Linux in an MLAG configuration. Cumulus Linux includes OVSDB server (ovsdb-server) and VTEPd (
ovs-vtepd), which support VLAN-aware bridges (see page 402).
The following example configuration in the /etc/network/interfaces file shows the minimum MLAG
configuration required (the MLAG peerlink configuration and the dual-connected bonds on the peer
switches). The dual-connected bonds are identified in the NSX controller by their clag-id (single-
connected bonds or ports are identified by their usual interface names prepended with the name of the
particular switch to which they belong). When you create the Gateway Service for the dual-connected
bonds (described in Configuring the Transport and Logical Layers (see page 690), below), make sure to
select the clag-id named interfaces instead of the underlying individual physical ports. All the logical
network configurations are provisioned by the NSX controller.
auto peerlink-3
iface peerlink-3
    bond-slaves swp5 swp6
    bond-mode 802.3ad
    bond-min-links 1
    bond-lacp-rate 1
    mtu 9202
    alias Local Node/s leaf01 and Ports swp5 swp6 <==> Remote Node/s leaf02 and Ports swp5 swp6

auto peerlink-3.4094
iface peerlink-3.4094
    address 10.0.0.24/32
    address 169.254.0.9/29
    mtu 9202
    alias clag and vxlan communication primary path
    clagd-priority 4096
    clagd-sys-mac 44:38:39:ff:ff:02
    clagd-peer-ip 169.254.0.10
    clagd-args --vm --debug 0x0
    # post-up sysctl -w net.ipv4.conf.peerlink-3/4094.accept_local=1
    clagd-backup-ip 10.0.0.25

auto hostbond4
iface hostbond4
    bond-slaves swp7
    bond-mode 802.3ad
    bond-min-links 1
    bond-lacp-rate 1
    mtu 9152
    alias Local Node/s leaf01 and Ports swp7 <==> Remote Node/s hostd-01 and Ports swp1
    clag-id 1

auto hostbond5
iface hostbond5
    bond-slaves swp8
    bond-mode 802.3ad
    bond-min-links 1
    bond-lacp-rate 1
    mtu 9152
    alias Local Node/s leaf01 and Ports swp8 <==> Remote Node/s hostd-02 and Ports swp1
    clag-id 2
The OVSDB server cannot select the loopback interface as the source IP address, causing top of
rack registration to the controller to fail. To work around this issue, run the net add bgp
redistribute connected command followed by the net commit command.
().
Executed:
define NSX controller IP address in OVSDB
().
Executed:
define local tunnel IP address on the switch
().
Executed:
define management IP address on the switch
().
Executed:
restart a service
().
b. On the switch where you want to run the standby OVSDB server, run the vtep-bootstrap
command with the same options as above but replace db_ha active with
db_ha standby:
c. From the switch running the active OVSDB server, copy the certificate files (hostname-cert.pem
and hostname-privkey.pem) to the same location on the switch with the standby
OVSDB server.
The certificate and key pairs for authenticating with the NSX controller are
generated automatically when you run the configuration script and are stored in the
/home/cumulus directory. The same certificate must be used for both switches.
d. On the switch running the active OVSDB server and then the switch running the standby
OVSDB server, run the following commands in the order shown to complete the configuration
process:
For information about the configuration script, read man vtep-bootstrap or run the command
vtep-bootstrap --help.
Troubleshooting
After you configure OVSDB server high availability, you can check that the configuration is successful.
To check the sync status on the active OVSDB server, run the following command:
To check the sync status on the standby OVSDB server, run the following command:
To check that the active OVSDB server is connected to the NSX controller, run the ovsdb-client dump
Manager command:
Manager table
_uuid                                inactivity_probe is_connected max_backoff other_config status                                 target
------------------------------------ ---------------- ------------ ----------- ------------ -------------------------------------- -------------------
e700ad21-8fd8-4f09-96dc-fa7cc6e498d8 30000            true         []          {}           {sec_since_connect="68", state=ACTIVE} "ssl:54.0.0.2:6632"
To make sure the MLAG configuration is correct, run the clagctl command:
The following example command output shows that MLAG is configured correctly on the active OVSDB
server:
The following example command output shows that MLAG is not configured correctly on the active OVSDB
server or that the peer is down:
To make sure that the BFD sessions are up and running, run the ptmctl -b command:
--------------------------------------------------------
port peer state local type diag vrf
--------------------------------------------------------
vxln0 54.0.0.4 Up 36.0.0.1 singlehop N/A N/A
vxln0 54.0.0.3 Up 36.0.0.1 singlehop N/A N/A
If you encounter interface or VXLAN bridge configuration issues after adding the hardware bindings, run
the ifreload -a command to reload all network interfaces.
If you still encounter issues with high availability after you restart openvswitch-vtep.service, run
ifreload -a, and restart networking.service, reboot the switch running the standby OVSDB server.
VXLAN Scale
On Broadcom Trident II and Tomahawk switches running Cumulus Linux, there is a limit to the number of
VXLANs you can configure simultaneously. The limit most often given is 2000 VXLANs, but you can
calculate the exact limit for your specific design.
While this limitation does apply to Trident II+, Trident3, or Maverick ASICs, Cumulus Linux
supports the same number of VXLANs on these ASICs as it does for Trident II or Tomahawk ASICs.
Mellanox Spectrum ASICs do not have a limitation on the number of VXLANs that they can
support.
The limit is a physical to virtual mapping where a switch can hold 15000 mappings in hardware before you
encounter hash collisions. There is also an upper limit of around 3000 VLANs you can configure before you
hit the reserved range (Cumulus Linux uses 3000-3999 by default). Cumulus Networks typically uses a soft
number because the math is unique to each environment. An internal VLAN is consumed by each layer 3
port, subinterface, traditional bridge (see page 414), and the VLAN-aware bridge (see page 402). Therefore,
the number of configurable VLANs is:
(total configurable 802.1q VLANs) - (reserved VLANS) - (physical or logical interfaces) =
4094-999-eth0-loopback = 3093 by default (without any other configuration)
The equation for the number of configurable VXLANs looks like this:
(number of trunks) * (VXLAN/VLANs per trunk) - (Linux logical and physical interfaces) = 15000
For example, on a 10G switch with 48 * 10G ports and 6 * 40G uplinks, you can solve for X, the
number of configurable VXLANs:
48 * X + (48 downlinks + 6 uplinks + 1 loopback + 1 eth0 + 1 bridge) = 15000
48 * X = 14943
X = 311 VXLANs
Similarly, you can apply this logic to a 32 port 100G switch where 16 ports are broken up to 4 * 25 Gbps
ports, for a total of 64 * 25 Gbps ports:
64 * X + (64 downlinks + 16 uplinks + 1 loopback + 1 eth0 + 1 bridge) = 15000
64 * X = 14917
X = 233 VXLANs
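The two worked examples above follow the same arithmetic, which you can reproduce with a short script. This is a sketch under the assumptions stated in the text (a limit of 15000 hardware mappings, and every downlink carrying every VXLAN as a trunk); the function and constant names are illustrative.

```python
# Sketch of the VXLAN scale arithmetic; names are illustrative.
HW_MAPPING_LIMIT = 15000  # physical-to-virtual mappings, per the text above


def max_vxlans_per_trunk(trunks: int, other_interfaces: int,
                         limit: int = HW_MAPPING_LIMIT) -> int:
    """Solve trunks * X + other_interfaces = limit for X, rounded down."""
    return (limit - other_interfaces) // trunks


# 48 downlinks as trunks; 48 downlinks + 6 uplinks + loopback + eth0 + bridge
print(max_vxlans_per_trunk(48, 48 + 6 + 3))   # 311
# 64 downlinks as trunks; 64 downlinks + 16 uplinks + loopback + eth0 + bridge
print(max_vxlans_per_trunk(64, 64 + 16 + 3))  # 233
```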
However, not all ports are trunks for all VXLANs (or at least not all the time). It is much more common for
subsets of ports to be used for different VXLANs. For example, a 10G switch (48 * 10G + 6 * 40G uplinks)
can have the following configuration:
Ports Trunks
swp31-48 X VXLAN/VLANs
Mellanox switches, only with VLAN-aware bridges (see page 402) with 802.1ad and only with single
tag translation.
Contents
This topic describes ...
Remove the Early Access QinQ Metapackage (see page 694)
Configure Single Tag Translation (see page 694)
Configure the Public Cloud-facing Switch (see page 695)
Configure the Customer-facing Edge Switch (see page 696)
View the Configuration (see page 697)
Configure Double Tag Translation (see page 698)
Caveats and Errata (see page 700)
Feature Limitations (see page 700)
Long Interface Names (see page 701)
You configure two switches: one at the service provider edge that faces the customer (the switch on the left
above), and one on the public cloud handoff edge (the righthand switch above).
auto vni-1000
iface vni-1000
    bridge-access 100
    bridge-learning off
    vxlan-id 1000
    vxlan-local-tunnelip 10.0.0.1

auto vni-3000
iface vni-3000
    bridge-access 200
    bridge-learning off
    vxlan-id 3000
    vxlan-local-tunnelip 10.0.0.1

auto bridge
iface bridge
    bridge-ports swp3 vni-1000 vni-3000
    bridge-vids 100 200
    bridge-vlan-aware yes
    bridge-vlan-protocol 802.1ad
auto vni-1000
iface vni-1000
    bridge-access 100
    bridge-learning off
    vxlan-id 1000
    vxlan-local-tunnelip 10.0.0.1

auto vni-3000
iface vni-3000
    bridge-access 200
    bridge-learning off
    vxlan-id 3000
    vxlan-local-tunnelip 10.0.0.1

auto swp3
iface swp3
    bridge-access 100

auto swp4
iface swp4
    bridge-access 200

auto bridge
iface bridge
    bridge-ports swp3 swp4 vni-1000 vni-3000
    bridge-vids 100 200
    bridge-vlan-aware yes
    bridge-vlan-protocol 802.1ad
To verify that the bridge is configured for QinQ, run ip -d link show bridge and look for vlan_protocol
802.1ad in the output:
The configuration in Cumulus Linux uses the outer tag for the customer and the inner tag for the
service.
You configure a double-tagged interface by stacking the VLANs in the following manner: <port>.<outer tag>.
<inner tag>. For example, consider swp1.100.10: the outer tag is VLAN 100, which represents the customer,
and the inner tag is VLAN 10, which represents the service.
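If you generate or audit interface names in a script, the stacking convention above is simple to parse. The following sketch is illustrative only; the class and function names are hypothetical and are not part of Cumulus Linux.

```python
from typing import NamedTuple


class DoubleTag(NamedTuple):
    port: str
    outer_tag: int  # customer VLAN
    inner_tag: int  # service VLAN


def parse_double_tagged(name: str) -> DoubleTag:
    """Split '<port>.<outer tag>.<inner tag>' into its three parts."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected <port>.<outer tag>.<inner tag>, got {name!r}")
    port, outer, inner = parts
    return DoubleTag(port, int(outer), int(inner))


print(parse_double_tagged("swp1.100.10"))
```

For swp1.100.10, the result has port swp1, outer tag 100 (the customer), and inner tag 10 (the service).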
The outer tag or TPID (tag protocol identifier) needs the vlan_protocol to be specified. It can be
either 802.1Q or 802.1ad. If 802.1ad is used, it must be specified on the lower VLAN device, such as
swp3.100 in the example below.
Double tag translation only works with bridges in traditional mode (see page 414) (not VLAN-
aware mode). As such, you cannot use NCLU (see page 88) to configure it.
To configure the switch for double tag translation using the above example, edit the
/etc/network/interfaces file in a text editor and add the following:
auto swp3.100
iface swp3.100
    vlan_protocol 802.1ad

auto swp3.100.10
iface swp3.100.10
    mstpctl-portbpdufilter yes
    mstpctl-bpduguard yes

auto vni1000
iface vni1000
    vxlan-local-tunnelip 10.0.0.1
    mstpctl-portbpdufilter yes
    mstpctl-bpduguard yes
    vxlan-id 1000

auto custA-10-azr
iface custA-10-azr
    bridge-ports swp3.100.10 vni1000
    bridge-vlan-aware no
    bridge-learning vni1000=off
You can check the configuration with the brctl show command:
You can try this out without the bridge being VXLAN-enabled. The configuration would look
something like this:
auto swp5.100.10
iface swp5.100.10
    mstpctl-portbpdufilter yes
    mstpctl-bpduguard yes

auto br10
iface br10
    bridge-ports swp3.10 swp4 swp5.100.10
    bridge-vlan-aware no
Feature Limitations
iptables match on double-tagged interfaces is not supported.
Single-tagged translation supports only VLAN-aware bridge mode with the bridge’s VLAN 802.1ad
protocol.
MLAG (see page 427) is only supported with single-tagged translation.
No layer 2 protocol (STP BPDU, LLDP) tunneling support.
Mixing 802.1Q and 802.1ad subinterfaces on the same switch port is not supported.
When using switches with Mellanox Spectrum ASICs in an MLAG pair:
The peerlink (peerlink.4094) between the MLAG pair should be configured for VLAN protocol
802.1ad.
The peerlink cannot be used as a backup datapath in the event that one of the MLAG peers
loses all uplinks.
For switches with the Spectrum ASIC (but not the Spectrum 2), when the bridge VLAN protocol is
802.1ad and is VXLAN-enabled, either:
All bridge ports are access ports, except for the MLAG peerlink.
All bridge ports are VLAN trunks.
This means the switch terminating the cloud provider connections (double-tagged) cannot have
local clients; these clients must be on a separate switch.
auto vlan1001
iface vlan1001
    vlan-id 1001
    vlan-raw-device swp50s0
    vlan-protocol 802.1ad

auto vlan1001-101
iface vlan1001-101
    vlan-id 101
    vlan-raw-device vlan1001

auto bridge101
iface bridge101
    bridge-ports vlan1001-101 vxlan1000101
Layer 3
Routing
This chapter discusses routing on switches running Cumulus Linux.
Contents
This topic describes ...
Manage Static Routes (see page 702)
Static Multicast Routes (see page 703)
Static Routing via ip route (see page 703)
Apply a Route Map for Route Updates (see page 705)
Configure a Gateway or Default Route (see page 705)
Supported Route Table Entries (see page 705)
Forwarding Table Profiles (see page 705)
Number of Supported Route Entries, by Platform (see page 706)
TCAM Resource Profiles for Mellanox Switches (see page 708)
Caveats and Errata (see page 708)
Don't Delete Routes via Linux Shell (see page 709)
Add IPv6 Default Route with src Address on eth0 Fails without Adding Delay (see page 709)
Related Information (see page 710)
!
ip route 203.0.113.0/24 198.51.100.2
!
!
ip mroute 230.0.0.0/24
!
To view mroutes, open the FRRouting CLI, and run the following command:
auto swp3
iface swp3
    address 198.51.100.1/24
    post-up ip route add 203.0.113.0/24 via 198.51.100.2
If an IPv6 address is assigned to a DOWN interface, the associated route is still installed into the
routing table. The type of IPv6 address doesn't matter: link local, site local and global all exhibit
the same problem.
If the interface is bounced up and down, then the routes are no longer in the route table.
The ip route command allows manipulating the kernel routing table directly from the Linux shell. See
man ip(8) for details. FRRouting monitors the kernel routing table changes and updates its own routing
table accordingly.
To display the routing table:
Cumulus Linux defines these profiles as default, l2-heavy, v4-lpm-heavy and v6-lpm-heavy. Choose the profile
that best suits your network architecture and specify the profile name for the
forwarding_table.profile variable in the /etc/cumulus/datapath/traffic.conf file.
After you specify a different profile, restart switchd (see page 201) for the change to take effect. You can
see the forwarding table profile when you run cl-resource-query.
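If you manage traffic.conf from automation, a helper such as the one below can validate the profile name before writing it. This is a hedged sketch: it works on the file contents as a string, the function name is hypothetical, and it assumes the setting appears as a line of the form forwarding_table.profile = <name>.

```python
import re

# Profile names listed in the text above.
VALID_PROFILES = {"default", "l2-heavy", "v4-lpm-heavy", "v6-lpm-heavy"}

# Assumed line format: forwarding_table.profile = <name>
_PROFILE_LINE = re.compile(r"^[ \t]*forwarding_table\.profile[ \t]*=.*$",
                           re.MULTILINE)


def set_forwarding_profile(conf_text: str, profile: str) -> str:
    """Return conf_text with forwarding_table.profile set to profile."""
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    line = f"forwarding_table.profile = {profile}"
    if _PROFILE_LINE.search(conf_text):
        # Replace the first existing setting in place.
        return _PROFILE_LINE.sub(line, conf_text, count=1)
    # Otherwise append the setting on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"


sample = "# sample contents\nforwarding_table.profile = default\n"
print(set_forwarding_profile(sample, "l2-heavy"))
```

The helper only edits text; the new profile still takes effect only after switchd restarts.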
Broadcom ASICs other than Maverick, Tomahawk/Tomahawk+, Trident II, Trident II+, and Trident3
support only the default profile.
The values in the following tables reflect results from our testing on the different platforms we
support, and may differ from published manufacturers' specifications provided about these
chipsets.
default 40k 32k (IPv4) and 16k (IPv6) 64k (IPv4) or 28k (IPv6-long)
l2-heavy 88k 48k (IPv4) and 40k (IPv6) 8k (IPv4) and 8k (IPv6-long)
v4-lpm-heavy 8k 8k (IPv4) and 16k (IPv6) 80k (IPv4) and 16k (IPv6-long)
v6-lpm-heavy 40k 8k (IPv4) and 40k (IPv6) 8k (IPv4) and 64k (IPv6-long)
For Broadcom switches, IPv4 and IPv6 entries are not carved in separate spaces so it is not
possible to define explicit numbers in the L3 Neighbors column of the tables shown above.
However, note that an IPv6 entry takes up twice the space of an IPv4 entry.
# Valid profiles -
# default, ipmc-heavy, acl-heavy, ipmc-max
tcam_resource.profile = default
After you specify a different profile, restart switchd (see page 201) for the change to take effect.
When nonatomic updates (see page 149) are enabled (that is, acl.non_atomic_update_mode is set
to TRUE in the /etc/cumulus/switchd.conf file), the maximum number of mroute and ACL entries for
each profile is as follows:
When nonatomic updates (see page 149) are disabled (that is, acl.non_atomic_update_mode is set
to FALSE in the /etc/cumulus/switchd.conf file), the maximum number of mroute and ACL entries for
each profile is as follows:
Add IPv6 Default Route with src Address on eth0 Fails without Adding Delay
Attempting to install an IPv6 default route on eth0 with a source address fails at reboot or when running
ifup on eth0.
The first execution of ifup -dv returns this warning and does not install the route:
To avoid the need for the delay, omit the src parameter from the ip route add command. Without the
src parameter, the route is added correctly.
Related Information
Linux IP - ip route command
FRRouting docs - static route commands
Contents
This topic describes ...
Routing Protocols (see page 710)
Configure Routing Protocols (see page 710)
Protocol Tuning (see page 711)
Routing Protocols
A routing protocol dynamically computes reachability between various end points. This enables
communication to work around link and node failures, and additions and withdrawals of various addresses.
IP routing protocols are typically distributed; that is, an instance of the routing protocol runs on each of the
routers in a network.
Cumulus Linux does not support running multiple instances of the same protocol on a router.
Distributed routing protocols compute reachability between end points by disseminating relevant
information and running a routing algorithm on this information to determine the routes to each end
station. To scale the amount of information that needs to be exchanged, routes are computed on address
prefixes rather than on every end point address.
other node in the network. Since the state that a node has to keep grows rapidly in such a case, link-state
protocols typically limit the number of nodes that communicate this way. They allow for bigger networks to
be built by breaking up a network into a set of smaller subnetworks (which are called areas or levels), and
by advertising summarized information about an area to other areas.
Besides the two critical pieces of information mentioned above, protocols have other parameters that can
be configured. These are usually specific to each protocol.
Protocol Tuning
Most protocols provide certain tunable parameters that are specific to convergence during changes.
Wikipedia defines convergence as the “state of a set of routers that have the same topological information
about the network in which they operate”. It is imperative that the routers in a network have the same
topological state for the proper functioning of a network. Without this, traffic can be blackholed, and thus
not reach its destination. It is normal for different routers to have differing topological states during
changes, but this difference should vanish as the routers exchange information about the change and
recompute the forwarding paths. Different protocols converge at different speeds in the presence of
changes.
A key factor that governs how quickly a routing protocol converges is the time it takes to detect the change.
For example, how quickly can a routing protocol be expected to act when there is a link failure? Routing
protocols classify changes into two kinds: hard changes such as link failures, and soft changes such as a
peer dying silently. They are classified differently because protocols provide different mechanisms for
dealing with these failures.
It is important to configure the protocols to be notified immediately on link changes. This is also true when
a node goes down, causing all of its links to go down.
Even if a link does not fail, a routing peer can crash. A crash usually causes that router to delete the routes
it has computed or, worse, makes that router impervious to changes in the network, so that it goes out of
sync with the other routers because it no longer shares the same topological information as its peers.
The most common way to detect a protocol peer dying is to detect the absence of a heartbeat. All routing
protocols send a heartbeat (or “hello”) packet periodically. When a node does not see a consecutive set of
these hello packets from a peer, it declares its peer dead and informs other routers in the network about
this. The period of each heartbeat and the number of heartbeats that need to be missed before a peer is
declared dead are two popular configurable parameters.
If you configure these timers very low, the network can quickly descend into instability under stressful
conditions, either because a router cannot send heartbeats quickly enough while it is busy computing
routing state, or because there is so much traffic that the hellos get lost. Conversely, configuring these
timers to very high values also causes blackholing of communication because it takes much longer to
detect peer failures. Usually, the default values initialized within each protocol are good enough for most
networks. Cumulus Networks recommends you do not adjust these settings.
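The relationship between the two timers can be shown with simple arithmetic. The sketch below is a generic illustration, not the implementation of any specific protocol; the function names and the example values are assumptions chosen for readability.

```python
def dead_interval(hello_interval_s: float, missed_hellos: int) -> float:
    """Approximate seconds of silence before a peer is declared dead."""
    return hello_interval_s * missed_hellos


def peer_is_dead(last_hello_s: float, now_s: float,
                 hello_interval_s: float, missed_hellos: int) -> bool:
    """Return True once the peer has been silent for the full dead interval."""
    silence = now_s - last_hello_s
    return silence >= dead_interval(hello_interval_s, missed_hellos)


# Illustrative values only: a hello every 10 seconds, 4 missed hellos allowed.
print(dead_interval(10, 4))        # 40
print(peer_is_dead(0, 25, 10, 4))  # False
print(peer_is_dead(0, 41, 10, 4))  # True
```

Lowering either value shortens failure detection but leaves less margin for a busy router or a congested link, which is the trade-off described above.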
Network Topology
In computer networks, topology refers to the structure of interconnecting various nodes. Some commonly
used topologies in networks are star, hub and spoke, leaf and spine, and broadcast.
Contents
This topic describes ...
Clos Topologies (see page 712)
Clos Topologies
Clos, or fat tree, topology is used in the vast majority of modern data centers. This topology is shown
in the figure below. It is also commonly referred to as leaf-spine topology. This topology is used
throughout the routing protocol guide.
This topology allows the building of networks of varying size using nodes of different port counts and/or by
increasing the tiers. The picture above is a three-tiered Clos network. We number the tiers from the bottom
to the top. Thus, in the picture, the lowermost layer is called tier 1 and the topmost tier is called tier 3.
The number of end stations (such as servers) that can be attached to such a network is determined by a
simple formula.
In a 2-tier network, if each node has m ports, the total number of end stations that can be connected is
m^2/2. More generally, if tier-1 nodes are m-port nodes and tier-2 nodes are n-port nodes, the total
number of end stations that can be connected is (m*n)/2. In a three-tier network, where tier-3 nodes are
o-port nodes, the total number of end stations that can be connected is (m*n*o)/2^(number of tiers-1),
that is, (m*n*o)/4.
Let’s consider some practical examples. In many data centers, it is typical to connect 40 servers to a top-of-
rack (ToR) switch. The ToRs are all connected via a set of spine switches. If a ToR switch has 64 ports, then
after hooking up 40 ports to the servers, the remaining 24 ports can be hooked up to 24 spine switches of
the same link speed or to a smaller number of higher link speed switches. For example, if the servers are all
hooked up as 10GE links, then the ToRs can connect to the spine switches via 40G links. So, instead of
connecting to 24 spine switches with 10G links, the ToRs can connect to 6 spine switches with each link
being 40G. If the spine switches are also 64-port switches, then the total number of end stations that can
be connected is 2560 (40*64) stations.
In a three-tier network of 64-port switches, the total number of servers that can be connected is
(40*64*64)/2^(3-1) = 40960. As you can see, this kind of topology can serve quite a large network with
three tiers.
With 40 server-facing ports and only 24 ports' worth of uplink capacity, each ToR in this example is
oversubscribed. This can lead to congestion in the network and hot spots. If network operators instead
connect 32 servers per rack, then 32 ports are left to connect to spine switches, and the network is said to
be rearrangeably non-blocking: any server in a rack can talk to any other server in any other rack without
necessarily blocking traffic between other servers.
In such a network, the total number of servers that can be connected is (64*64)/2 = 2048. Similarly, a
three-tier version of the same network can serve up to (64*64*64)/4 = 65536 servers.
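The non-blocking capacity formula can be checked with a few lines of code. This is a small sketch of the formula as stated in this section; the function name is an assumption made for the example.

```python
from math import prod


def clos_capacity(port_counts):
    """End stations supported by a Clos network.

    port_counts lists the port count of the nodes in each tier, tier 1 first.
    Implements product(ports) / 2^(number of tiers - 1).
    """
    tiers = len(port_counts)
    return prod(port_counts) // 2 ** (tiers - 1)


print(clos_capacity([64, 64]))      # two tiers of 64-port switches
print(clos_capacity([64, 64, 64]))  # three tiers of 64-port switches
```

The two calls reproduce the 2048 and 65536 figures given above for two and three tiers of 64-port switches.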
Load Balancing
In a Clos network, traffic is load balanced across the multiple links using equal cost multi-pathing (ECMP).
Routing algorithms compute shortest paths between two end stations where shortest is typically the lowest
path cost. Each link is assigned a metric or cost. By default, a link’s cost is a function of the link speed. The
higher the link speed, the lower its cost. A 10G link has a higher cost than a 40G or 100G link, but a lower
cost than a 1G link. Thus, the link cost is a measure of its traffic carrying capacity.
In the modern data center, the links between tiers of the network are homogeneous; that is, they have the
same characteristics (same speed and therefore link cost) as the other links. As a result, the first hop router
can pick any of the spine switches to forward a packet to its destination (assuming that there is no link
failure between the spine and the destination switch). Most routing protocols recognize that there are
multiple equal-cost paths to a destination and enable any of them to be selected for a given traffic flow.
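Per-flow selection among equal-cost paths can be illustrated with a hash over the fields that identify a flow. This is only a sketch: switching hardware uses its own hash functions, and the SHA-256 stand-in, the function name, and the spine names below are assumptions made for the example.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Map a flow (source, destination, protocol, ports) to one equal-cost next hop."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # The same flow always yields the same index, so its packets stay on one path.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


spines = ["spine01", "spine02", "spine03", "spine04"]  # hypothetical names
flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)
print(pick_next_hop(flow, spines))
```

Hashing per flow rather than per packet keeps the packets of one flow in order while still spreading different flows across the available spines.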
FRRouting Overview
Cumulus Linux uses FRRouting to provide the routing protocols for dynamic routing. FRRouting provides
many routing protocols, of which Cumulus Linux supports the following:
Open Shortest Path First (v2 (see page 738) and v3 (see page 753))
Border Gateway Protocol (see page 756)
Contents
This topic describes ...
Architecture (see page 714)
About zebra (see page 714)
Related Information (see page 714)
Architecture
As shown in the figure above, the FRRouting suite consists of various protocol-specific daemons and a
protocol-independent daemon called zebra. Each protocol-specific daemon is responsible for
running the relevant protocol and building the routing table based on the information exchanged.
It is not uncommon to have more than one protocol daemon running at the same time. For example, at the
edge of an enterprise, protocols internal to an enterprise (called IGP for Interior Gateway Protocol) such as
OSPF (see page 738) or RIP run alongside the protocols that connect an enterprise to the rest of the world
(called EGP or Exterior Gateway Protocol) such as BGP (see page 756).
About zebra
zebra is the daemon that resolves the routes provided by multiple protocols (including static routes
specified by the user) and programs these routes in the Linux kernel via netlink. zebra does
more than this, of course. The FRRouting documentation defines zebra as the IP routing manager for
FRRouting that "provides kernel routing table updates, interface lookups, and redistribution of routes
between different routing protocols."
Related Information
frrouting.org
GitHub
These instructions only apply to upgrading to Cumulus Linux 3.4 or later from releases earlier
than 3.4. New image installations contain frr instead of quagga or quagga-compat. If you are
using any automation tools to configure your network and are installing a new Cumulus Linux
image, make sure your automation tools refer to FRR and not to Quagga.
If you are upgrading Cumulus Linux using apt-get upgrade, existing automation that
references Quagga continues to work until you upgrade to FRR. Once you perform the following
upgrade steps, your automation must reference FRR instead of Quagga.
Upgrading to Cumulus Linux 3.4 or later results in both quagga.service and frr.service
being present on the system, until quagga.service is removed. These services have been
configured to conflict with each other; starting one service automatically stops the other, as they
cannot run concurrently.
At the end of the apt-get upgrade process, the output shows details of the upgrade
process, regarding the Quagga to FRR switchover.
Once the upgrade process is completed, the switch is in the following state:
Cumulus 3.4 and later releases do not support or implement python-clcmd. While the package
remains, the related commands have been removed.
The vtysh.conf file should not be moved, as it is unlikely any configuration is in the file.
However, if there is necessary configuration in place, copy the contents into
/etc/frr/vtysh.conf.
2. Merge the current Quagga.conf file with the new frr.conf file. Keep the default configuration for
frr.conf in place, and add the additional configuration sections from Quagga.conf.
3. Enable the daemons needed for your installation in /etc/frr/daemons.
4. Manually update the log file locations to /var/log/frr or syslog.
5. Remove the compatibility package:
This step stops the Quagga compatibility mode, causing routing to go down.
This step deletes all Quagga configuration files. Please ensure you back up your
configuration.
Cumulus Networks does not recommend reinstalling the quagga and quagga-compat
packages once they have been removed. While they can be reinstalled to continue
migration iterations, limited testing has taken place, and configuration issues may occur.
Troubleshooting
If the systemctl -l status frr output shows an issue, edit the configuration files to correct it, and
repeat the process. If issues persist, you can return to Quagga compatibility mode for further testing:
Once further testing is complete, run the following commands to reset the FRR installation, and then repeat
the steps from the beginning of this section to upgrade to FRR:
Configuring FRRouting
This section provides an overview of configuring FRRouting, the routing software package that provides a
suite of routing protocols so you can configure routing on your switch.
Contents
This topic describes ...
Configure FRRouting (see page 719)
Enable and Start FRRouting (see page 720)
Integrated Configurations (see page 720)
Restore the Default Configuration (see page 721)
Interface IP Addresses and VRFs (see page 722)
FRRouting vtysh Modal CLI (see page 722)
Reload the FRRouting Configuration (see page 727)
FRR Logging (see page 727)
Caveats (see page 728)
Obfuscated Passwords (see page 728)
Duplicate Hostnames (see page 728)
Related Information (see page 728)
Configure FRRouting
FRRouting does not start by default in Cumulus Linux. Before you run FRRouting, make sure you have
enabled all the relevant daemons that you intend to use (zebra, bgpd, ospfd, ospf6d or pimd) in the
/etc/frr/daemons file.
Cumulus Networks has not tested RIP, RIPv6, IS-IS and Babel.
The zebra daemon must always be enabled. The others you can enable according to how you plan to
route your network — using BGP (see page 756) for example, instead of OSPF (see page 738).
Before you start FRRouting, you need to enable the corresponding daemons. Edit the /etc/frr/daemons
file and set to yes each daemon you are enabling. For example, to enable BGP, set both zebra and bgpd to
yes:
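If you manage the file from a script, the edit can be sketched as below. This assumes the daemons file consists of name=yes|no lines; the function name and the sample content are illustrative only and are not taken from a real switch.

```python
def enable_daemons(daemons_text, names):
    """Return daemons-file text with each named daemon set to yes.

    Assumes one name=value entry per line; other lines are left untouched.
    """
    wanted = set(names)
    out = []
    for line in daemons_text.splitlines():
        key, sep, _value = line.partition("=")
        if sep and key.strip() in wanted:
            out.append(f"{key.strip()}=yes")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


sample = "zebra=no\nbgpd=no\nospfd=no\n"  # hypothetical file content
updated = enable_daemons(sample, ["zebra", "bgpd"])
print(updated, end="")
```

On a switch you would read /etc/frr/daemons, pass its content through a function like this, and write the result back before starting FRRouting.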
All the routing protocol daemons (bgpd, ospfd, ospf6d, ripd, ripngd, isisd and pimd) are
dependent on zebra. When you start frr, systemd determines whether zebra is running; if
zebra is not running, it starts zebra, then starts the dependent service, such as bgpd.
In general, if you restart a service, its dependent services also get restarted. For example, running
systemctl restart frr.service restarts any of the routing protocol daemons that are
enabled and running.
For more information on the systemctl command and changing the state of daemons, read
Managing Application Daemons (see page 191).
Integrated Configurations
By default in Cumulus Linux, FRRouting saves the configuration of all daemons in a single integrated
configuration file, frr.conf.
You can disable this mode by running the following command in the vtysh FRRouting CLI (see page 722):
If you disable the integrated configuration mode, FRRouting saves the configuration of each daemon in a
separate file. For a daemon to start, at a minimum that daemon must be enabled and its daemon-specific
configuration file must be present, even if that file is empty.
You save the current configuration by running:
cumulus@switch:~$
When the integrated configuration mode is disabled, the output looks like this:
2. Remove /etc/frr/frr.conf:
3. Restart FRRouting:
If for some reason you disabled service integrated-vtysh-config, then you should
remove all the configuration files (such as zebra.conf or ospf6d.conf) instead of frr.conf
in step 2 above.
switch#
vtysh provides a Cisco-like modal CLI, and many of the commands are similar to Cisco IOS commands. By
modal CLI, we mean that there are different modes to the CLI, and certain commands are only available
within a specific mode. Configuration is available with the configure terminal command, which is
invoked thus:
The prompt displays the mode the CLI is in. For example, when the interface-specific commands are
invoked, the prompt changes to:
When the routing protocol specific commands are invoked, the prompt changes to:
At any level, "?" displays the list of commands available at that level:
switch(config-if)# ?
bandwidth Set bandwidth informational parameter
description Interface specific description
end End current mode and change to enable mode
?-based completion is also available to see the parameters that a command takes:
switch(config-if)# bandwidth ?
<1-10000000> Bandwidth in kilobits
switch(config-if)# ip ?
address Set the IP address of an interface
irdp Alter ICMP Router discovery preference this interface
ospf OSPF interface commands
rip Routing Information Protocol
router IP router interface commands
Displaying state can be done at any level, including the top level. For example, to see the routing table as
seen by zebra, you use:
Running single commands with vtysh is possible using the -c option of vtysh:
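When you call vtysh from a script, each command is passed with its own -c option. The helper below is a sketch that only builds the argument list; the function name is an assumption, and actually running the command requires a system where vtysh is installed.

```python
import shlex


def vtysh_argv(*commands):
    """Build a vtysh argument list that passes each command with -c."""
    argv = ["vtysh"]
    for command in commands:
        argv += ["-c", command]
    return argv


argv = vtysh_argv("show ip route")
# Show the equivalent shell command line.
print(shlex.join(argv))
```

On a switch, the resulting list can be handed to subprocess.run(argv, capture_output=True, text=True) to collect the output.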
Notice that the commands also take a partial command name (for example, sh ip route above) as long
as the partial command name is not aliased:
A command or feature can be disabled in FRRouting by prepending the command with no. For example:
The current state of the configuration can be viewed using the show running-config command:
Current configuration:
!
username cumulus nopassword
!
service integrated-vtysh-config
!
vrf mgmt
!
interface lo
link-detect
!
interface swp1
ipv6 nd ra-interval 10
link-detect
!
interface swp2
ipv6 nd ra-interval 10
link-detect
!
interface swp3
ipv6 nd ra-interval 10
link-detect
!
interface swp4
ipv6 nd ra-interval 10
link-detect
!
interface swp29
ipv6 nd ra-interval 10
link-detect
!
interface swp30
ipv6 nd ra-interval 10
link-detect
!
interface swp31
link-detect
!
interface swp32
link-detect
!
interface vagrant
link-detect
!
interface eth0 vrf mgmt
ipv6 nd suppress-ra
link-detect
!
interface mgmt vrf mgmt
link-detect
!
router bgp 65020
bgp router-id 10.0.0.21
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor fabric peer-group
neighbor fabric remote-as external
neighbor fabric description Internal Fabric Network
neighbor fabric capability extended-nexthop
neighbor swp1 interface peer-group fabric
neighbor swp2 interface peer-group fabric
neighbor swp3 interface peer-group fabric
neighbor swp4 interface peer-group fabric
neighbor swp29 interface peer-group fabric
neighbor swp30 interface peer-group fabric
!
address-family ipv4 unicast
network 10.0.0.21/32
neighbor fabric activate
neighbor fabric prefix-list dc-spine in
neighbor fabric prefix-list dc-spine out
exit-address-family
!
ip prefix-list dc-spine seq 10 permit 0.0.0.0/0
ip prefix-list dc-spine seq 20 permit 10.0.0.0/24 le 32
ip prefix-list dc-spine seq 30 permit 172.16.1.0/24
ip prefix-list dc-spine seq 40 permit 172.16.2.0/24
ip prefix-list dc-spine seq 50 permit 172.16.3.0/24
ip prefix-list dc-spine seq 60 permit 172.16.4.0/24
ip prefix-list dc-spine seq 500 deny any
!
ip forwarding
ipv6 forwarding
!
line vty
!
end
If you attempt to configure a routing protocol that has not been started, vtysh silently ignores
those commands.
Alternatively, if you do not want to use a modal CLI to configure FRRouting, you can use a suite of Cumulus
Linux-specific commands (see page 729) instead.
FRRouting reload only applies to an integrated service configuration, where your FRRouting
configuration is stored in a single frr.conf file instead of one configuration file per FRRouting
daemon (like zebra or bgpd).
Examine the running configuration and verify that it matches the config in /etc/frr/frr.conf:
If the running configuration is not what you expected, submit a support request and supply the following
information:
The current running configuration (run net show configuration and output the contents to a
file)
The contents of /etc/frr/frr.conf
The contents of /var/log/frr/frr-reload.log
FRR Logging
By default, Cumulus Linux configures FRR with syslog severity level 6 (informational). Log output is written to
the /var/log/frr/frr.log file.
To write debug messages to the log file, you must run the log syslog debug command to
configure FRR with syslog severity 7 (debug); otherwise, when you issue a debug command such
as debug bgp neighbor-events, no output is sent to /var/log/frr/frr.log.
However, when you manually define a log target with the log file /var/log/frr/debug.log
command, FRR automatically defaults to severity 7 (debug) logging and the output is logged
to /var/log/frr/frr.log.
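The effect of the configured severity can be pictured as a numeric comparison, since lower syslog severity numbers are more severe. This is a conceptual sketch only and does not reflect how FRR implements logging internally; the names are assumptions made for the example.

```python
# Syslog severity numbers mentioned in this section; lower is more severe.
INFORMATIONAL = 6
DEBUG = 7


def is_written(message_severity, configured_severity):
    """A message is written when it is at least as severe as the configured level."""
    return message_severity <= configured_severity


print(is_written(DEBUG, INFORMATIONAL))  # debug message at the default level
print(is_written(DEBUG, DEBUG))          # debug message after raising the level to 7
```

This is why debug output does not appear until the configured severity is raised from 6 to 7.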
Caveats
Obfuscated Passwords
In FRRouting, Cumulus Linux stores obfuscated passwords for BGP and OSPF (ISIS, OSPF area, and BGP
neighbor passwords). All passwords in configuration files and those displayed in show commands are
obfuscated. The obfuscation algorithm protects passwords from casual viewing. The system can retrieve
the original password when needed.
Duplicate Hostnames
If you change the hostname, either through NCLU or with the hostname command in vtysh, the switch
can have two hostnames in the FRR configuration. For example:
Spine01# conf t
Spine01(config)# hostname Spine01-1
Spine01-1(config)# do sh run
Building configuration...
Current configuration:
!
frr version 4.0+cl3u1
frr defaults datacenter
hostname Spine01
hostname Spine01-1
...
Accidentally configuring the same numbered BGP neighbor using both the neighbor x.x.x.x
and neighbor swp# interface commands results in two neighbor entries being present for
the same IP in the configuration and operationally. You can correct this issue by updating the
configuration and restarting the FRR service.
Related Information
FRR BGP documentation
FRR IPv6 support
FRR Zebra documentation
NCLU:  cumulus@switch:~$ net add bgp autonomous-system 65002
vtysh: switch(config)# router bgp 65002

NCLU:  cumulus@switch:~$ net add bgp neighbor 14.0.0.22
vtysh: switch(config-router)# neighbor 14.0.0.22

NCLU:  cumulus@switch:~$ net add routing route 155.1.2.20/24 bridge 45
vtysh: switch(config)# ip route 155.1.2.20/24 bridge 45

NCLU:  cumulus@switch:~$ net add interface swp3 ipv6 address 3002:2123:1234:1abc::21/64
vtysh: switch(config)# int swp3
       switch(config-if)# ipv6 address 3002:2123:1234:1abc::21/64

NCLU:  cumulus@switch:~$ net add interface swp3 ospf6 priority 120
vtysh: switch(config)# int swp3
       switch(config-if)# ipv6 ospf6 priority 120

NCLU:  cumulus@switch:~$ net add ospf6 timers throttle spf 40 50 60
vtysh: switch(config)# router ospf6
       switch(config-ospf6)# timers throttle spf 40 50 60

NCLU:  cumulus@switch:~$ net add interface swp4 ospf6 hello-interval 60
vtysh: switch(config)# int swp4
       switch(config-if)# ipv6 ospf6 hello-interval 60
Contents
This topic describes ...
Standard Debian ARP Behavior and the Tunable ARP Parameters (see page 731)
ARP Tunable Parameter Settings in Cumulus Linux (see page 732)
Change Tunable ARP Parameters (see page 734)
Change Port-specific ARP Parameters (see page 736)
Configure Proxy ARP (see page 737)
arp_announce
arp_filter
arp_ignore
arp_notify
These parameters are described in the Linux documentation, but snippets for each parameter description
are included in the table below and are highlighted in italics.
In a standard Debian installation, all of these ARP parameters are set to 0, leaving the router as wide open
and unrestricted as possible. These settings are based on the assertion made long ago that Linux IP
addresses are a property of the device, not a property of an individual interface. Thus an ARP request or
reply could be sent on one interface containing an address residing on a different interface. While this
unrestricted behavior makes sense for a server, it is not the normal behavior of a router. Routers expect
the MAC/IP address mappings supplied by ARP to match the physical topology, with the IP addresses
matching the interfaces on which they reside. With these tunable ARP parameters, Cumulus Linux has been
able to specify the behavior to match the expectations of a router.
arp_accept 0 BOOL Define behavior for gratuitous ARP frames whose IP is not already present
in the ARP table:
0 - Don't create new entries in the ARP table.
1 - Create new entries in the ARP table.
Cumulus Linux uses the default arp_accept behavior of not
creating new entries in the ARP table when a gratuitous ARP is seen
on an interface or when an ARP reply packet is received. However, an
individual interface can have the arp_accept behavior set
differently than the remainder of the switch if needed. For
information on how to apply this port-specific behavior, see below.
arp_announce 2 INT Define different restriction levels for announcing the local source IP
address from IP packets in ARP requests sent on interface:
0 - (default) Use any local address, configured on any interface.
1 - Try to avoid local addresses that are not in the target's subnet for this
interface. This mode is useful when target hosts reachable via this
interface require the source IP address in ARP requests to be part of their
logical network configured on the receiving interface. When we generate
the request we will check all our subnets that include the target IP and will
preserve the source address if it is from such subnet. If there is no such
subnet we select source address according to the rules for level 2.
2 - Always use the best local address for this target. In this mode we
ignore the source address in the IP packet and try to select local address
that we prefer for talks with the target host. Such local address is selected
by looking for primary IP addresses on all our subnets on the outgoing
interface that include the target IP address. If no suitable local address is
found we select the first local address we have on the outgoing interface
or on all other interfaces, with the hope we will receive reply for our
request and even sometimes no matter the source IP address we
announce.
The default Debian behavior with arp_announce set to 0 is to send
gratuitous ARPs or ARP requests using any local source IP address,
not limiting the IP source of the ARP packet to an address residing on
the interface used to send the packet. This reflects the historically
held view in Linux that IP addresses reside inside the device and are
not considered a property of a specific interface.
Routers expect a different relationship between the IP address and
the physical network. Adjoining routers look for MAC/IP addresses to
reach a next-hop residing on a connecting interface for transiting
traffic. By setting the arp_announce parameter to 2, Cumulus Linux
uses the best local address for each ARP request, preferring primary
addresses on the interface used to send the ARP. This most closely
matches traditional router ARP request behavior.
arp_filter 0 BOOL 0 - (default) The kernel can respond to ARP requests with addresses from
other interfaces. This may seem wrong but it usually makes sense,
because it increases the chance of successful communication. IP
addresses are owned by the complete host on Linux, not by particular
interfaces. Only for more complex setups like load-balancing, does this
behavior cause problems.
1 - Allows you to have multiple network interfaces on the same subnet,
and have the ARPs for each interface be answered based on whether or
not the kernel would route a packet from the ARP'd IP address out of that
interface (therefore you must use source based routing for this to work).
In other words, it allows control of which cards (usually 1) will respond to
an ARP request.
arp_filter for the interface will be enabled if at least one of conf/{all,
interface}/arp_filter is set to TRUE, it will be disabled otherwise.
Cumulus Linux uses the default Debian Linux arp_filter setting of 0.
The arp_filter is primarily used when multiple interfaces reside in
the same subnet and is used to allow/disallow which interfaces
respond to ARP requests. In the case of OSPF (see page 738) using IP
unnumbered interfaces, many interfaces appear to be in the same
subnet, and so actually contain the same address. If multiple
interfaces are used between a pair of routers, having arp_filter
set to 1 causes forwarding to fail.
The arp_filter parameter is set to allow a response on any
interface in the subnet, while the arp_ignore setting (below)
limits cross-interface ARP behavior.
arp_ignore 1 INT Define different modes for sending replies in response to received ARP
requests that resolve local target IP addresses:
arp_notify 1 BOOL Define mode for notification of address and device changes.
0 - (default) Do nothing.
1 - Generate gratuitous arp requests when device is brought up or
hardware address changes.
The default Debian arp_notify setting is to remain silent when an
interface is brought up or the hardware address is changed. Since
Cumulus Linux often acts as a next-hop for many end hosts, it
immediately notifies attached devices when an interface comes up or
the address changes. This speeds up convergence on the new
information and provides the most rapid support for changes.
arp_accept OR
arp_announce MAX
arp_filter OR
arp_ignore MAX
arp_notify MAX
The way the default setting is implemented in Linux, the value of the default parameter is copied
to every port-specific location, excluding those that already have an IP address assigned, as
previously mentioned. Therefore, there is not any complicated logic between the default setting
and the port-specific setting like there is when using the all location. This makes the application of
particular port-specific policies much simpler and more deterministic.
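The OR and MAX entries listed above describe how the value in the all location is combined with a port-specific value. A minimal sketch of that combination follows; the dictionary and function names are assumptions made for the example, and the logic is an illustration of the table rather than kernel code.

```python
# How the "all" value and the port-specific value are combined, per the table above.
COMBINE = {
    "arp_accept": "OR",
    "arp_announce": "MAX",
    "arp_filter": "OR",
    "arp_ignore": "MAX",
    "arp_notify": "MAX",
}


def effective_value(param, all_value, port_value):
    """Return the value that takes effect on a port for the given parameter."""
    if COMBINE[param] == "OR":
        return int(bool(all_value) or bool(port_value))
    return max(all_value, port_value)


print(effective_value("arp_ignore", 0, 1))
print(effective_value("arp_announce", 2, 0))
```

With MAX, a higher value in either location wins, so a port cannot be made less restrictive than the all setting; with OR, enabling the parameter in either location enables it on the port.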
To determine the current ARP parameter settings for each of the locations, use the following
mechanism; other methods are available but this one is quite simple:
/proc/sys/net/ipv4/conf/all/arp_ignore:0
/proc/sys/net/ipv4/conf/all/arp_notify:0
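The same path:value listing can be produced from a script. The sketch below walks the arp_* files under /proc/sys/net/ipv4/conf; the function name and the conf_root parameter are assumptions made for the example, and on a system without that directory the function simply returns an empty result.

```python
import glob
import os


def read_arp_settings(conf_root="/proc/sys/net/ipv4/conf"):
    """Return {path: value} for every arp_* setting under each location."""
    settings = {}
    for path in sorted(glob.glob(os.path.join(conf_root, "*", "arp_*"))):
        with open(path) as handle:
            settings[path] = handle.read().strip()
    return settings


for path, value in read_arp_settings().items():
    print(f"{path}:{value}")
```

Each location (all, default, and every interface) appears as its own directory, so the output shows the three kinds of settings side by side.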
Note that Cumulus Linux implements this change at boot time using the arp.conf file at the following
location:
To make the change persist through reboots, edit the /etc/sysctl.d/arp.conf file and add your port-
specific ARP setting.
auto swp1
iface swp1
    post-up echo 1 > /proc/sys/net/ipv4/conf/swp1/proxy_arp
If you are running two interfaces in the same broadcast domain, which is typical when using VRR (see
page 462) because it creates a "-v0" interface in the same broadcast domain, use sysctl or sysfs to let
the kernel know, so that both interfaces do not respond with proxy ARP replies. To do so, set
/proc/sys/net/ipv4/conf/<INTERFACE>/medium_id to 2 on both the interface and the -v0 interface.
Continuing with the previous example:
auto swp1
iface swp1
    post-up echo 1 > /proc/sys/net/ipv4/conf/swp1/proxy_arp
    post-up echo 2 > /proc/sys/net/ipv4/conf/swp1/medium_id

auto swp1-v0
iface swp1-v0
    post-up echo 1 > /proc/sys/net/ipv4/conf/swp1-v0/proxy_arp
    post-up echo 2 > /proc/sys/net/ipv4/conf/swp1-v0/medium_id
If you're running proxy ARP on a VRR interface, add a post-up line to the VRR interface stanza similar to the
following. For example, if vlan100 is the VRR interface for the configuration above:
Contents
This topic describes ...
Scalability and Areas (see page 739)
Configure OSPFv2 (see page 739)
Enable the OSPF and Zebra Daemons (see page 740)
Configure OSPF (see page 740)
Define (Custom) OSPF Parameters on the Interfaces (see page 741)
OSPF SPF Timer Defaults (see page 741)
Configure MD5 Authentication for OSPF Neighbors (see page 742)
Scaling Tips (see page 743)
Summarization (see page 743)
Stub Areas (see page 744)
Multiple ospfd Instances (see page 745)
Auto-cost Reference Bandwidth (see page 749)
Unnumbered Interfaces (see page 750)
Apply a Route Map for Route Updates (see page 751)
Here are some points to note about areas and OSPF behavior:
Routers that have links to multiple areas are called area border routers (ABR). For example, routers
R3, R4, R5, R6 are ABRs in the diagram. An ABR performs a set of specialized tasks, such as SPF
computation per area and summarization of routes across areas.
Most of the LSAs have an area-level flooding scope. These include router LSA, network LSA, and
summary LSA.
In the diagram, we reused the same non-zero area address. This is fine since the area address is
only a scoping parameter provided to all routers within that area. It has no meaning outside the
area. Thus, in the cases where ABRs do not connect to multiple non-zero areas, the same area
address can be used, thus reducing the operational headache of coming up with area addresses.
Configure OSPFv2
Configuring OSPF involves the following tasks:
Enabling the OSPF daemon
Enabling OSPF
Defining (Custom) OSPF parameters on the interfaces
Configure OSPF
As discussed in Introduction to Routing Protocols (see page 710), there are three steps to the configuration:
There are two ways to achieve (2) and (3) in FRRouting OSPF:
1. The network statement under router ospf does both. The statement is specified with an IP
subnet prefix and an area address. All the interfaces on the router whose IP address matches the
network subnet are put into the specified area. OSPF process starts bringing up peering adjacency
on those interfaces. It also advertises the interface IP addresses formatted into LSAs (of various
types) to the neighbors for proper reachability.
The subnets can be as coarse as possible to cover the largest number of interfaces on the router that
should run OSPF.
There may be interfaces where it is undesirable to bring up an OSPF adjacency. For example, in a data
center topology, the host-facing interfaces need not run OSPF; however, the corresponding IP
addresses should still be advertised to neighbors. This can be achieved using the passive-interface
construct:
Or use the passive-interface default command to make all interfaces passive and
selectively remove certain interfaces to bring up protocol adjacency:
2. Explicitly enable OSPF for each interface by configuring it under the interface configuration mode:
If OSPF adjacency bringup is not desired, you should configure the corresponding interfaces as
passive as explained above.
This model of configuration is required for unnumbered interfaces as discussed later in this guide.
For achieving step (3) alone, the FRRouting configuration provides another method: redistribution. For
example:
Redistribution, however, unnecessarily loads the database with type-5 LSAs and should be limited to
generating real external prefixes (for example, prefixes learned from BGP). In general, it is a good
practice to generate local prefixes using network and/or passive-interface statements.
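How a network statement selects interfaces can be sketched with a subnet-membership check. This is a simplified illustration, not the ospfd matching code; the function name, interface names, and addresses are hypothetical.

```python
import ipaddress


def interfaces_in_network(network, interfaces):
    """Return the interfaces whose IP address falls inside the network prefix."""
    net = ipaddress.ip_network(network)
    return sorted(
        name
        for name, address in interfaces.items()
        if ipaddress.ip_interface(address).ip in net
    )


# Hypothetical interface addresses on one router.
interfaces = {
    "swp1": "10.0.1.1/30",
    "swp2": "10.0.2.1/30",
    "eth0": "192.168.0.5/24",
}
matched = interfaces_in_network("10.0.0.0/16", interfaces)
print(matched)
```

A coarse prefix such as 10.0.0.0/16 covers both switch ports with a single statement, while the management interface in a different range is left out of OSPF.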
In the example command above, KEYID represents the key used to create the message digest. It is a value
from 1 to 255 and must be consistent across all routers on a link.
KEY represents the actual message digest key, and is associated with the given KEYID. This value can be
up to 16 characters long; longer strings are truncated.
Existing MD5 authentication hashes can be removed with the net del interface
<interface> ospf message-digest-key <1-255> md5 <text> command.
For example, if the key ID were 1 and the key were thisisthekey, then the NCLU command would create the
following configuration in the /etc/frr/frr.conf file:
This setting is applied and accepted into the configuration without error. However, OSPF continues to
operate without using authentication. To enable authentication, run the ip ospf authentication
message-digest command:
...
interface swp1
...
Scaling Tips
Here are some tips for how to scale out OSPF.
Summarization
By default, an ABR creates a summary (type-3) LSA for each route in an area and advertises it in adjacent
areas. Prefix range configuration optimizes this behavior by creating and advertising one summary LSA for
multiple routes.
To configure a range:
Summarize in the direction to the backbone. The backbone receives summarized routes and
injects them to other areas already summarized.
Summarization can cause non-optimal forwarding of packets during failures. Here is an example
scenario:
As shown in the diagram, the ABRs in the right non-zero area summarize the host prefixes as 10.1.0.0/16.
When the link between R5 and R10 fails, R5 sends a worse metric for the summary route. The metric for the
summary route is the maximum of the metrics of the intra-area routes that the summary route covers;
when the R5-R10 link fails, the metric for 10.1.2.0/24 rises at R5 because the path becomes R5-R9-R6-R10.
As a result, other backbone routers shift traffic destined to 10.1.0.0/16 towards R6. This breaks ECMP and
under-utilizes network capacity for traffic destined to 10.1.1.0/24.
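The effect of the failure on the summary metric can be illustrated with a short Python sketch. The metric values below are hypothetical; only the rule that the summary metric is the maximum of the covered intra-area metrics comes from the text.

```python
def summary_metric(component_metrics):
    """Metric advertised for a summary: the largest covered intra-area metric."""
    return max(component_metrics.values())


# Hypothetical metrics seen at R5 before the failure.
before = {"10.1.1.0/24": 20, "10.1.2.0/24": 20}
# After the R5-R10 link fails, 10.1.2.0/24 is reached through R5-R9-R6-R10.
after = {"10.1.1.0/24": 20, "10.1.2.0/24": 40}

print(summary_metric(before))
print(summary_metric(after))
```

Although the metric for 10.1.1.0/24 is unchanged at R5, the summary as a whole becomes less attractive, which is why traffic for that prefix also moves to R6.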
Stub Areas
Nodes in an area receive and store intra-area routing information and summarized information about
other areas from the ABRs. In particular, summarizing inter-area routes through
prefix range configuration helps the routers scale and keeps the network stable.
Then there are external routes. External routes are the routes redistributed into OSPF from another
protocol. They have an AS-wide flooding scope. In many cases, external link states make up a large
percentage of the LSDB.
Stub areas alleviate this scaling problem. A stub area is an area that does not receive external route
advertisements.
To configure a stub area:
Stub areas still receive information about networks that belong to other areas of the same OSPF domain.
If summarization is not configured (or is not comprehensive), this information can
overwhelm the nodes. Totally stubby areas address this issue. Routers in totally stubby areas keep in
their LSDB only information about routing within their area, plus the default route.
To configure a totally stubby area:
Type                  Behavior
Normal non-zero area  LSA types 1, 2, 3, 4 area-scoped, type 5 externals, inter-area routes summarized
Totally stubby area   LSA types 1, 2 area-scoped, default summary, no type 3, 4, 5 LSA types allowed
1. Edit /etc/frr/daemons and add ospfd_instances="instance1 instance2 ..." to the ospfd line,
specifying an instance ID for each separate instance. For example, the following configuration has
OSPF enabled with 2 ospfd instances, 11 and 22:
3.
hostname zebra
log file /var/log/frr/zebra.log
username cumulus nopassword
service integrated-vtysh-config
interface eth0
ipv6 nd suppress-ra
link-detect
interface lo
link-detect
interface swp1
ip ospf 11 area 0.0.0.0
link-detect
interface swp2
ip ospf 22 area 0.0.0.0
link-detect
interface swp45
link-detect
interface swp46
link-detect
interface swp47
link-detect
interface swp48
link-detect
interface swp49
link-detect
interface swp50
link-detect
interface swp51
link-detect
interface swp52
link-detect
interface vagrant
link-detect
router ospf 11
ospf router-id 1.1.1.1
router ospf 22
ospf router-id 1.1.1.1
ip forwarding
ipv6 forwarding
line vty
end
Caveats
You can use the redistribute ospf option in your frr.conf file to route
between the instances. Specify the instance ID of the other OSPF instance. For example:
...
!
router ospf 11
ospf router-id 1.1.1.1
!
router ospf 22
ospf router-id 1.1.1.1
redistribute ospf 11
!
...
If you disabled the integrated (see page 720) FRRouting configuration, you must create a separate
ospfd configuration file for each instance. The ospfd.conf file must include the instance ID in
the file name. Continuing with our example, you would create /etc/frr/ospfd-11.conf and
/etc/frr/ospfd-22.conf.
interface swp47
link-detect
!
interface swp48
link-detect
!
interface swp49
link-detect
!
interface swp50
link-detect
!
interface swp51
link-detect
!
interface swp52
link-detect
!
interface vagrant
link-detect
!
router ospf 11
ospf router-id 1.1.1.1
!
router ospf 22
ospf router-id 1.1.1.1
!
ip forwarding
ipv6 forwarding
!
line vty
!
Set the reference bandwidth to the same value on all OSPF
routers; otherwise, routing loops can occur.
To configure the auto-cost reference bandwidth for 90 Gbps, run the following commands:
...
router ospf
auto-cost reference-bandwidth 90000
...
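To see what a reference bandwidth of 90000 means for individual links, the following Python sketch applies the conventional OSPF formula, cost = reference bandwidth / interface bandwidth, with a minimum cost of 1. The integer division is an assumption; this guide does not state how FRR rounds the result, so the example uses only speeds where the rounding does not matter.

```python
def ospf_cost(reference_mbps, interface_mbps):
    """Cost = reference bandwidth / interface bandwidth, never below 1."""
    return max(1, reference_mbps // interface_mbps)


REFERENCE_MBPS = 90000  # matches auto-cost reference-bandwidth 90000

# Interface speeds in Mbps: 1G, 10G, and 100G links.
for speed in (1000, 10000, 100000):
    print(speed, ospf_cost(REFERENCE_MBPS, speed))
```

A router that uses a different reference bandwidth computes different costs for the same links, which is why the value should match across the OSPF domain.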
Unnumbered Interfaces
Unnumbered interfaces are interfaces without unique IP addresses. In OSPFv2, configuring unnumbered
interfaces reduces the links between routers into pure topological elements, which dramatically simplifies
network configuration and reconfiguration. In addition, the routing database contains only the real
networks, so the memory footprint is reduced and SPF is faster.
Unless the Ethernet media is intended to be used as a LAN with multiple connected routers, we
recommend configuring the interface as point-to-point. It has the additional advantage of a
simplified adjacency state machine; there is no need for DR/BDR election and LSA reflection. See
RFC5309 for a more detailed discussion.
To configure an unnumbered interface, take the IP address of another interface (called the anchor) and use
that as the IP address of the unnumbered interface:
auto lo
iface lo inet loopback
    address 192.0.2.1/32

auto swp1
iface swp1
    address 192.0.2.1/32

auto swp2
iface swp2
    address 192.0.2.1/32
ECMP
During SPF computation for an area, if OSPF finds multiple paths with equal cost (metric), all those paths
are used for forwarding. For example, in the reference topology diagram, R8 uses both R3 and R4 as next
hops to reach a destination attached to R9.
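The equal-cost selection can be sketched in Python as follows. The costs, and the extra next hop R7, are hypothetical values chosen for illustration; only the rule that all minimum-cost paths are used comes from the text.

```python
def ecmp_next_hops(paths):
    """Return every next hop whose path cost equals the minimum cost."""
    best = min(cost for _, cost in paths)
    return sorted(next_hop for next_hop, cost in paths if cost == best)


# Hypothetical path costs from R8 toward a destination attached to R9.
paths = [("R3", 20), ("R4", 20), ("R7", 30)]
print(ecmp_next_hops(paths))
```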
For maintenance events, operators typically raise the OSPF administrative weight of the link(s) to ensure
that all traffic is diverted from the link or the node (referred to as costing out). The speed of reconvergence
does not matter: changing the OSPF cost causes LSAs to be reissued, but the links remain in service
while all routers in the network run their SPF computation.
For failure events, traffic may be lost during reconvergence; that is, until SPF on all nodes computes an
alternative path around the failed link or node to each of the destinations. Reconvergence time depends on
layer 1 failure detection capabilities and, in the worst case, on the OSPF DeadInterval timer.
Example configuration for event 1, using vtysh:
Troubleshooting
OperState lists all the commands to view the operational state of OSPF.
The three most important states while troubleshooting the protocol are:
1. Neighbors, with net show ospf neighbor. This is the starting point to debug neighbor states
(also see tcpdump below).
2. Database, with net show ospf database. This is the starting point to verify that the LSDB is, in
fact, synchronized across all routers in the network. For example, sweeping through the output of
show ip ospf database router taken from all routers in an area confirms whether the topology
graph building process is complete; that is, every node has seen all the other nodes in the area.
3. Routes, with net show route ospf. This is the outcome of SPF computation that gets
downloaded to the forwarding table, and is the starting point to debug, for example, why an OSPF
route is not being forwarded correctly.
Related Information
Bidirectional forwarding detection (see page 805) (BFD) and OSPF
en.wikipedia.org/wiki/Open_Shortest_Path_First
FRR OSPFv2
Perlman, Radia (1999). Interconnections: Bridges, Routers, Switches, and Internetworking Protocols
(2 ed.). Addison-Wesley.
Moy, John T. OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley.
RFC 2328 OSPFv2
RFC 3101 OSPFv2 Not-So-Stubby Area (NSSA)
The IETF has defined extensions to OSPFv3 to support multiple address families (that is, both IPv6 and
IPv4). FRR (see page 713) does not support them yet.
Contents
This topic describes ...
Configure OSPFv3 (see page 753)
Configure the OSPFv3 Area (see page 754)
Configure the OSPFv3 Distance (see page 755)
Configure OSPFv3 Interfaces (see page 756)
Troubleshooting (see page 756)
Related Information (see page 756)
Configure OSPFv3
Configuring OSPFv3 involves the following tasks:
1. Enabling the zebra and ospf6 daemons, as described in Configuring FRRouting (see page 719), then
starting the FRRouting service:
1.
Unlike OSPFv2, OSPFv3 intrinsically supports unnumbered interfaces. Forwarding to the next hop
router is done entirely using IPv6 link local addresses. Therefore, you are not required to
configure any global IPv6 address to interfaces between routers.
The following example command creates a summary route for all the routes in the range 2001::/64:
You can also configure the cost for a summary route, which is used to determine the shortest paths to the
destination. For example:
This example command changes the OSPF administrative distance to 150 for internal routes and 220 for
external routes:
This example command changes the OSPF administrative distance to 150 for internal routes:
This example command changes the OSPF administrative distance to 220 for external routes:
This example command changes the OSPF administrative distance to 150 for internal routes to a subnet or
network inside the same area as the router:
This example command changes the OSPF administrative distance to 150 for internal routes to a subnet in
an area of which the router is not a part:
You can also configure the cost for a particular interface, bond interface, or VLAN.
The following example command configures the cost for the bond interface swp2.
Troubleshooting
See Debugging OSPF (see page 752) for OSPFv2 for the troubleshooting discussion. The equivalent
commands are:
Another helpful command is net show ospf6 spf tree. It dumps the node topology as computed by
SPF to help visualize the network view.
Related Information
Bidirectional forwarding detection (see page 805) (BFD) and OSPF
en.wikipedia.org/wiki/Open_Shortest_Path_First
FRR OSPFv3
RFC 2740 OSPFv3 OSPF for IPv6
Auto-cost reference bandwidth (see page 749) (OSPFv2 chapter)
Contents
This topic describes ...
Autonomous System Number (ASN) (see page 758)
eBGP and iBGP (see page 758)
Route Reflectors (see page 759)
ECMP with BGP (see page 760)
Maximum Paths (see page 760)
BGP for Both IPv4 and IPv6 (see page 760)
Configure BGP (see page 761)
BGP Unnumbered Interfaces (see page 762)
BGP and Extended Next Hop Encoding (see page 762)
Configure BGP Unnumbered Interfaces (see page 762)
Manage Unnumbered Interfaces (see page 764)
How traceroute Interacts with BGP Unnumbered Interfaces (see page 765)
Advanced: How Next Hop Fields Are Set (see page 766)
Limitations (see page 767)
RFC 5549 Support with Global IPv6 Peers (Cumulus Linux 3.7.2 and later) (see page 767)
Configure RFC 5549 Support with Global IPv6 Peers (see page 768)
Show IPv4 Prefixes Learned with IPv6 Next Hops (see page 769)
BGP add-path (see page 773)
BGP add-path RX (see page 773)
BGP add-path TX (see page 774)
Fast Convergence Design Considerations (see page 776)
Peer Groups to Simplify Configuration (see page 777)
Configure BGP Dynamic Neighbors (see page 777)
Configure BGP Peering Relationships across Switches (see page 778)
Configure MD5-enabled BGP Neighbors (see page 780)
Route Reflectors
In a two-tier Clos network, the leaf (or tier 1) switches are the only ones connected to end stations. The
spines themselves do not have any routes to announce; they are merely reflecting the routes announced
by one leaf to the other leaves. Therefore, the spine switches function as route reflectors while the leaf
switches serve as route reflector clients.
In a three-tier network, the tier 2 nodes (or mid-tier spines) act as both route reflector servers and route
reflector clients. They act as route reflectors because they announce the routes learned from the tier 1
nodes to other tier 1 nodes and to tier 3 nodes. They also act as route reflector clients to the tier 3 nodes,
receiving routes learned from other tier 2 nodes. Tier 3 nodes act only as route reflectors.
In the following illustration, tier 2 node 2.1 is acting as a route reflector server, announcing the routes
between tier 1 nodes 1.1 and 1.2 to tier 1 node 1.3. It is also a route reflector client, learning the routes
between tier 2 nodes 2.2 and 2.3 from the tier 3 node, 3.1.
A cluster consists of route reflectors (RRs) and their clients and is used in iBGP environments where
multiple sets of route reflectors and their clients are configured. Configuring a unique ID per cluster (on the
route reflector server and clients) prevents looping as a route reflector does not accept routes from
another that has the same cluster ID. Additionally, because all route reflectors in the cluster recognize
updates from peers in the same cluster, they do not install routes from a route reflector in the same
cluster; this reduces the number of updates that need to be stored in BGP routing tables.
To configure a cluster ID on a route reflector, run the net add bgp cluster-id
(<ipv4>|<1-4294967295>) command. You can enter the cluster ID as an IP address or as a 32-bit quantity.
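The two input formats name the same 32-bit value. The Python sketch below converts between them using the standard ipaddress module; the function name and the sample cluster ID are illustrative only.

```python
import ipaddress


def cluster_id_forms(value):
    """Return (dotted-quad, integer) for a cluster ID given in either form."""
    addr = ipaddress.IPv4Address(value)
    return str(addr), int(addr)


print(cluster_id_forms("10.0.0.1"))
print(cluster_id_forms(167772161))
```

Both calls return the same pair, so a cluster ID entered as 10.0.0.1 on one route reflector and as 167772161 on another refers to the same cluster.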
The following example configures a cluster ID on a route reflector in IP address format:
ECMP with BGP
If a BGP node hears a prefix p from multiple peers, it has all the information necessary to program the
routing table to forward traffic for that prefix p through all of these peers; BGP supports equal-cost
multipathing (ECMP).
To perform ECMP in BGP, you may need to configure net add bgp bestpath as-path
multipath-relax (if you are using eBGP).
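The following Python sketch shows why multipath-relax matters for eBGP. It models only the AS-path comparison (without the option, the entire AS path must match; with it, paths of equal AS-path length qualify) and ignores every other attribute that BGP compares. The ASNs are hypothetical.

```python
def multipath_eligible(best_as_path, candidate_as_path, multipath_relax):
    """Compare two AS paths (lists of ASNs) for multipath eligibility."""
    if multipath_relax:
        # Relaxed: only the AS-path length has to match.
        return len(best_as_path) == len(candidate_as_path)
    # Strict: the AS paths have to be identical.
    return best_as_path == candidate_as_path


via_spine1 = [65001, 65010]
via_spine2 = [65002, 65010]
print(multipath_eligible(via_spine1, via_spine2, multipath_relax=False))
print(multipath_eligible(via_spine1, via_spine2, multipath_relax=True))
```

In a Clos fabric where each spine has its own ASN, the same prefix arrives with different AS paths of equal length, so the relaxed comparison is what allows both paths to be used.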
Maximum Paths
In Cumulus Linux, the BGP maximum-paths setting is enabled by default, so multiple routes are already
installed. The default setting is 64 paths.
Configure BGP
The following example shows a basic BGP configuration. The rest of this chapter discusses how to configure
other BGP features, from unnumbered interfaces to route maps.
1. Enable the BGP and Zebra daemons (zebra and bgpd), then enable the FRRouting service and start
it, as described in Configuring FRRouting (see page 719).
2. Identify the BGP node by assigning an ASN and router-id:
For an iBGP session, the remote-as is the same as the local AS:
Specifying the IP address of the peer allows BGP to set up a TCP socket with this peer, but it does
not distribute any prefixes to it, unless it is explicitly told that it must with the activate command.
You must specify the activate command for each address family that is being announced by the
BGP session.
It is node switchRR, the route reflector, on which the peer is specified as a client.
It is assumed that the IPv6 implementation on the peering device uses the MAC address as the
interface ID when assigning the IPv6 link-local address, as suggested by RFC 4291.
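The derivation that this assumption relies on is the modified EUI-64 format from RFC 4291: flip the universal/local bit of the MAC address, insert ff:fe in the middle, and prepend fe80::/64. The Python sketch below is an illustration only; the function name is hypothetical and the MAC address is one that appears in the example output in this chapter.

```python
import ipaddress


def link_local_from_mac(mac):
    """Build the IPv6 link-local address from a MAC using modified EUI-64."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")
    return ipaddress.IPv6Address((0xFE80 << 112) | interface_id)


print(link_local_from_mac("44:38:39:00:00:5c"))
```

For 44:38:39:00:00:5c this yields fe80::4638:39ff:fe00:5c, which matches the pairing of lladdr and link-local address in the ip neighbor output shown in this chapter.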
Cumulus Linux 3.7.2 and later also supports advertising IPv4 prefixes with IPv6 next hop
addresses while peering over IPv6 global unicast addresses. See RFC 5549 Support with Global
IPv6 Peers (see page 767) below.
In Cumulus Linux 3.7.1 and earlier, extended next hop encoding is sent only for link-local address
peerings (as shown below). In Cumulus Linux 3.7.2 and later, extended next hop encoding can be sent for
both link-local and global unicast address peerings (see RFC 5549 Support with Global IPv6 Peers (see
page 767)).
For an unnumbered configuration, you can use a single command to configure a neighbor and attach it to a
peer group (see page 777) (making sure to substitute for the interface and peer group below):
The following commands show how the IPv4 link-local address 169.254.0.1 is used to install the route and
static neighbor entry to facilitate proper forwarding without having to install an IPv4 prefix with IPv6 next
hop in the kernel:
You can use the following command to display more neighbor information:
cumulus@switch:~$ ip neighbor
192.168.0.254 dev eth0 lladdr 44:38:39:00:00:5f REACHABLE
169.254.0.1 dev swp52 lladdr 44:38:39:00:00:2b PERMANENT
169.254.0.1 dev swp51 lladdr 44:38:39:00:00:5c PERMANENT
fe80::4638:39ff:fe00:2b dev swp52 lladdr 44:38:39:00:00:2b router REACHABLE
fe80::4638:39ff:fe00:5c dev swp51 lladdr 44:38:39:00:00:5c router REACHABLE
If this address is a link-local IPv6 address, it is reset so that the link-local IPv6 address of
the eBGP peer is not passed along to an iBGP peer, which most likely is on a different
link.
route-map and/or the peer configuration can change the above behavior. For example, route-map
can set the global IPv6 next hop or the peer configuration can set it to self, which is relevant
for iBGP peers. The route map or peer configuration can also set the next hop to unchanged, which
ensures the source IPv6 global next hop is passed around; this is relevant for eBGP peers.
Whenever two next hops are being sent, the link-local next hop (the second value of the two) is the
link-local IPv6 address on the peering interface unless it is due to nh-local-unchanged or
route-map has set the link-local next hop.
Network administrators cannot set martian values for IPv6 next hops in route-map. Also, global
and link-local next hops are validated to ensure they match the respective address types.
In a received update, a martian check is imposed for the IPv6 global next hop. If the check fails, it
gets treated as an implicit withdraw.
If two next hops are received in an update and the second next hop is not a link-local address, it
gets ignored and the update is treated as if only one next hop was received.
Whenever two next hops are received in an update, the second next hop is used to install the route
into zebra. As per the previous point, it is already assured that this is a link-local IPv6 address.
Currently, this is assumed to be reachable and is not registered with NHT.
When route-map specifies the next hop as peer-address, the global IPv6 next hop as well as the
link-local IPv6 next hop (if it's being sent) is set to the peering address. If the peering is on a link-local
address, the former could be the link-local address on the peering interface, unless there is a global
IPv6 address present on this interface.
When using iBGP unnumbered with IPv6 link-local addresses (the default), FRR rewrites the BGP
next hop to be the adjacent link. This is similar to the behavior of eBGP next hops. However, iBGP route
advertisement rules do not change and a full mesh or route reflectors are still required.
The above rules imply that there are scenarios where a generated update has two IPv6 next hops, and both
of them are the IPv6 link-local address of the peering interface on the local system. If you are peering with a
switch or router that is not running Cumulus Linux and expects the first next hop to be a global IPv6
address, a route map can be used on the sender to specify a global IPv6 address. This conforms with the
recommendations in the Internet draft draft-kato-bgp-ipv6-link-local-00.txt, "BGP4+ Peering Using IPv6
Link-local Address".
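The receive-side rules listed above (a second next hop that is not link-local is ignored, and when two next hops remain, the second is used for installation) can be sketched in Python. The function names and sample addresses are hypothetical, and the martian check is not modelled.

```python
import ipaddress


def usable_next_hops(next_hops):
    """Drop a second next hop that is not an IPv6 link-local address."""
    hops = [ipaddress.IPv6Address(nh) for nh in next_hops]
    if len(hops) == 2 and not hops[1].is_link_local:
        hops = hops[:1]  # treat the update as if only one next hop arrived
    return [str(hop) for hop in hops]


def install_next_hop(next_hops):
    """Pick the next hop used to install the route."""
    hops = usable_next_hops(next_hops)
    return hops[1] if len(hops) == 2 else hops[0]


print(install_next_hop(["2001:db8::1", "fe80::1"]))
print(install_next_hop(["2001:db8::1", "2001:db8::2"]))
```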
Limitations
Interface-based peering with separate IPv4 and IPv6 sessions is not supported.
In Cumulus Linux 3.7.1 and earlier, ENHE is sent for IPv6 link-local peerings only. In Cumulus Linux
3.7.2 and later, ENHE can also be sent for IPv6 GUA peerings (see below).
If an IPv4 /30 or /31 IP address is assigned to the interface, IPv4 peering is used over IPv6 link-local
peering.
If the default router lifetime in the generated IPv6 route advertisements (RA) is set to 0, the receiving
FRRouting instance drops the RA if it is on a Cumulus Linux 2.5.z switch. To work around this issue,
either:
Explicitly configure the switch to advertise a router lifetime of 0, unless a value is specifically
set by the operator — with the assumption that the host is running Cumulus Linux 3.y.z
version of FRRouting. When hosts see an IPv6 RA with a router lifetime of 0, they do not
make that router a default router.
Use the sysctl on the host — net.ipv6.conf.all.accept_ra_defrtr. However, this
requires applying this setting on all hosts, which might mean many hosts, especially if
FRRouting is run on the hosts.
RFC 5549 Support with Global IPv6 Peers (Cumulus Linux 3.7.2 and later)
RFC 5549 defines the method used for BGP to advertise IPv4 prefixes with IPv6 next hops. The RFC does
not make a distinction between whether the IPv6 peering and next hop values should be global unicast
addresses (GUA) or link-local addresses. Cumulus Linux 3.7.1 and earlier only supports advertising IPv4
prefixes using link-local IPv6 next hop addresses via BGP unnumbered peering. Cumulus Linux 3.7.2
supports advertising IPv4 prefixes with IPv6 global unicast and link-local next hop addresses, with either
unnumbered or numbered BGP.
When BGP peering uses IPv6 global addresses and IPv4 prefixes are being advertised and installed, IPv6
route advertisements are used to derive the MAC address of the peer so that FRR can create an IPv4 route
with a link-local IPv4 next hop address (defined by RFC 3927). This is required to install the route into the
kernel. These route advertisement settings are configured automatically when FRR receives an update from
a BGP peer using IPv6 global addresses that contains an IPv4 prefix with an IPv6 next hop, and the
extended-nexthop capability has been negotiated.
The above commands create the following configuration in the /etc/frr/frr.conf file:
router bgp 1
bgp router-id 10.0.0.11
neighbor 2001:1:1::3 remote-as external
neighbor 2001:1:1::3 capability extended-nexthop
!
Ensure that the IPv6 peers are activated under the IPv4 unicast address family. By default, all peers are
activated in the IPv4 unicast address family.
If no bgp default ipv4-unicast is configured, you need to explicitly activate the IPv6 neighbor under
the IPv4 unicast address family as shown below:
The above commands create the following configuration in the /etc/frr/frr.conf file:
The following examples show an IPv4 prefix learned from a BGP peer over an IPv6 session using IPv6
global addresses, but where the next hop installed by BGP is a link-local IPv6 address. This occurs
when the session is directly between peers and both link-local and global IPv6 addresses are included
as next hops in the BGP update for the prefix. If both global and link-local next hops exist, BGP prefers
the link-local address for route installation.
AddPath ID: RX 0, TX 3
Last update: Mon Oct 22 08:09:22 2018
The example output below shows the results of installing the route in the FRR RIB as well as the kernel
FIB. Note that the next hop used for installation in the FRR RIB is the link-local IPv6 address, but then it
is converted into an IPv4 link-local address as required for installation into the kernel FIB.
If an IPv4 prefix is learned with only an IPv6 global next hop address (for example, when the route is
learned through a route reflector), the command output shows the IPv6 global address as the next
hop value and shows that it is learned recursively through the link-local address of the route reflector.
Note that when a global IPv6 address is used as a next hop for route installation in the FRR RIB, it is
still converted into an IPv4 link-local address for installation into the kernel.
To have only IPv6 global addresses used for route installation into the FRR RIB, you must add an
additional route map to the neighbor or peer group statement in the appropriate address family.
When the route map command set ipv6 next-hop prefer-global is applied to a neighbor, if
both a link-local and global IPv6 address are in the BGP update for a prefix, the IPv6 global address is
preferred for route installation.
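The choice between the two next hops can be summarized in a short Python sketch. It models only the preference described in this section; the function and parameter names are hypothetical, and the link-local address is an illustrative value.

```python
def next_hop_for_install(global_nh, link_local_nh, prefer_global=False):
    """Choose which IPv6 next hop is used for route installation."""
    if link_local_nh is None:
        return global_nh  # only a global next hop was received
    if global_nh is not None and prefer_global:
        return global_nh  # set ipv6 next-hop prefer-global is applied
    return link_local_nh  # default: the link-local address is preferred


print(next_hop_for_install("2001:2:2::4", "fe80::4638:39ff:fe00:5c"))
print(next_hop_for_install("2001:2:2::4", "fe80::4638:39ff:fe00:5c",
                           prefer_global=True))
```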
With this additional configuration, the output in the FRR RIB changes in the direct neighbor case, as
shown below:
router bgp 1
bgp router-id 10.0.0.11
neighbor 2001:2:2::4 remote-as internal
neighbor 2001:2:2::4 capability extended-nexthop
!
address-family ipv4 unicast
neighbor 2001:2:2::4 route-map GLOBAL in
exit-address-family
!
route-map GLOBAL permit 20
set ipv6 next-hop prefer-global
!
Spine01# sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR,
> - selected route, * - FIB route
When the route is learned through a route reflector, it appears like this:
router bgp 1
bgp router-id 10.0.0.13
neighbor 2001:1:1::1 remote-as internal
neighbor 2001:1:1::1 capability extended-nexthop
!
address-family ipv6 unicast
neighbor 2001:1:1::1 activate
neighbor 2001:1:1::1 route-map GLOBAL in
exit-address-family
!
route-map GLOBAL permit 10
set ipv6 next-hop prefer-global
Leaf01# sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR,
> - selected route, * - FIB route
BGP add-path
Cumulus Linux supports both BGP add-path RX and BGP add-path TX.
BGP add-path RX
BGP add-path RX allows BGP to receive multiple paths for the same prefix. A path identifier is used so that
additional paths do not override previously advertised paths. No additional configuration is required for
BGP add-path RX.
To view the existing capabilities, run net show bgp neighbor. The existing capabilities are listed in the
subsection Add Path, below Neighbor capabilities:
...
The example output above shows that additional BGP paths can be sent and received (TX and RX are
advertised). It also shows that the BGP neighbor, fe80::4638:39ff:fe00:5c, supports both.
To view the current additional paths, run net show bgp <network>. The example output shows an
additional path that has been added by the TX node for receiving. Each path has a unique AddPath ID.
BGP add-path TX
AddPath TX allows BGP to advertise more than just the bestpath for a prefix. Consider the following
topology:
        r8
        |
        |
r1 ----    ---- r6
r2 ---- r7 ---- r5
        ||
        ||
      r3  r4
In this topology:
r1 and r2 are in AS 100
r3 and r4 are in AS 300
r5 and r6 are in AS 500
r7 is in AS 700
r8 is in AS 800
r7 learns 1.1.1.1/32 from r1, r2, r3, r4, r5, and r6. Among these, r7 picks the path from r1 as the
bestpath for 1.1.1.1/32.
The example below configures the r7 session to advertise the bestpath learned from each AS. In this case,
this means a path from AS 100, a path from AS 300, and a path from AS 500. The output of net show bgp
1.1.1.1/32 from r7 has "bestpath-from-AS 100" so you can see what the bestpath is from each AS:
700 300
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external
Community: 3:3
AddPath ID: RX 4, TX 3
Last update: Thu Jun 2 00:57:14 2016
700 500
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external, bestpath-from-AS 700, best
Community: 5:5
AddPath ID: RX 6, TX 2
Last update: Thu Jun 2 00:57:14 2016
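Selecting one bestpath per neighboring AS can be sketched in Python as follows. The rank field is a hypothetical stand-in for the full BGP best-path comparison (lower means more preferred); the peers and AS numbers follow the topology above.

```python
def best_per_neighbor_as(paths):
    """Keep the most preferred path from each neighboring AS."""
    best = {}
    for path in paths:
        asn = path["neighbor_as"]
        if asn not in best or path["rank"] < best[asn]["rank"]:
            best[asn] = path
    return best


# Paths for 1.1.1.1/32 as learned by r7; rank values are illustrative.
paths = [
    {"peer": "r1", "neighbor_as": 100, "rank": 1},
    {"peer": "r2", "neighbor_as": 100, "rank": 2},
    {"peer": "r3", "neighbor_as": 300, "rank": 3},
    {"peer": "r4", "neighbor_as": 300, "rank": 4},
    {"peer": "r5", "neighbor_as": 500, "rank": 5},
    {"peer": "r6", "neighbor_as": 500, "rank": 6},
]

selected = best_per_neighbor_as(paths)
print(sorted(path["peer"] for path in selected.values()))
```

With these ranks, r7 would advertise three paths to r8, one each from AS 100, AS 300, and AS 500, instead of only the overall bestpath from r1.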
The example below shows the results if r7 is configured to advertise all paths to r8:
700 300
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external
Community: 3:3
AddPath ID: RX 4, TX 3
Last update: Thu Jun 2 00:57:14 2016
700 500
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external, bestpath-from-AS 700, best
Community: 5:5
AddPath ID: RX 6, TX 2
Last update: Thu Jun 2 00:57:14 2016
exit-address-family
You create the above configuration with the following NCLU commands:
By default, Cumulus Linux sends IPv6 neighbor discovery router advertisements. Cumulus
Networks recommends you adjust the interval of the router advertisement to a shorter value
(net add interface <interface> ipv6 nd ra-interval <interval>) to address
scenarios when nodes come up and miss router advertisement processing to relay the neighbor's
link-local address to BGP. The interval is measured in seconds and defaults to 10 seconds.
BGP peer-group restrictions have been replaced with update-groups, which dynamically examine
all peers and group them if they have the same outbound policy.
You configure dynamic neighbors using the bgp listen range <IP address> peer-group
<GROUP> command. After you configure the dynamic neighbors, a BGP speaker can listen for, and form
peer relationships with, any neighbor in the IP address range and mapped to a peer group.
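The admission logic for dynamic neighbors can be modelled roughly in Python. This is a simplified sketch, not FRR's implementation: the class is hypothetical, the limit is tracked per range for brevity, and the default of 100 is the bgp listen limit default mentioned in this section.

```python
import ipaddress


class ListenRange:
    """Hypothetical model of one 'bgp listen range' entry."""

    def __init__(self, prefix, peer_group, limit=100):
        self.network = ipaddress.ip_network(prefix)
        self.peer_group = peer_group
        self.limit = limit  # maximum number of dynamic peers
        self.peers = set()

    def accept(self, peer_ip):
        """Return True if the peer is admitted as a dynamic neighbor."""
        if peer_ip in self.peers:
            return True  # already an established dynamic peer
        if ipaddress.ip_address(peer_ip) not in self.network:
            return False  # source address is outside the listen range
        if len(self.peers) >= self.limit:
            return False  # dynamic peer limit reached
        self.peers.add(peer_ip)
        return True


leafs = ListenRange("10.1.1.0/24", "SPINE", limit=2)
print(leafs.accept("10.1.1.5"))   # inside the range
print(leafs.accept("192.0.2.9"))  # outside the range
```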
To limit the number of dynamic peers, specify the limit in the bgp listen limit command (the default
value is 100):
To connect to the same AS using the neighbor command, modify your configuration similar to the
following:
To connect to a different AS using the neighbor command, modify your configuration similar to the
following:
To connect to the same AS using the peer-group command, modify your configuration similar to
the following:
To connect to a different AS using the peer-group command, modify your configuration similar to
the following:
switch1
switch2
3. Confirm the configuration has been implemented with the net show bgp summary command:
6. Confirm the configuration has been implemented with the net show bgp summary command:
The MD5 password configured against a BGP listen-range peer-group (used to accept and create
dynamic BGP neighbors) is not enforced. This means that connections are accepted from peers
that do not specify a password.
1. To establish a connection between two eBGP peers that are not directly connected:
2. Confirm the configuration with the net show bgp neighbor <ip> command:
When configured, the graceful-shutdown community is added to all paths from eBGP peers and the
local-pref for that route is set to 0. An example configuration is shown below:
20
10.1.2.2 from bottom0(10.1.2.2) (10.1.1.1)
Origin IGP, metric 0, localpref 0, valid, external, bestpath-from-AS 20
Community: 99:1 graceful-shutdown
AddPath ID: RX 0, TX 2
Last update: Mon Sep 18 17:01:18 2017
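The effect of graceful shutdown on a path can be sketched in Python. The dictionary layout and function name are hypothetical; the behavior (tag paths from eBGP peers with the graceful-shutdown community and set their local preference to 0) follows the description above.

```python
GRACEFUL_SHUTDOWN = "graceful-shutdown"


def apply_graceful_shutdown(path):
    """Return a copy of the path as it looks with graceful shutdown enabled."""
    if path["peer_type"] != "external":
        return dict(path)  # only paths from eBGP peers are modified
    updated = dict(path)
    communities = list(path["communities"])
    if GRACEFUL_SHUTDOWN not in communities:
        communities.append(GRACEFUL_SHUTDOWN)
    updated["communities"] = communities
    updated["localpref"] = 0
    return updated


path = {"peer_type": "external", "localpref": 100, "communities": ["99:1"]}
print(apply_graceful_shutdown(path))
```

A local preference of 0 makes the path the least preferred choice, so traffic moves to alternative paths before the node is taken out of service.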
To disable graceful shutdown for the current node, run the net del bgp graceful-shutdown
command:
Configuration Tips
BGP and static routing (IPv4 and IPv6) are supported within a VRF context. For more information,
refer to Virtual Routing and Forwarding - VRF (see page 830).
When the neighbor receives the prefix, it examines the community value and takes action accordingly, such
as permitting or denying the community member in the routing policy.
Here is an example of a standard community list filter:
You can apply the community list to a route map to define the routing policy:
Troubleshooting
To troubleshoot BGP, you can view the summary of neighbors to which the switch is connected and see
information about these connections. The following example shows sample command output:
To determine if the sessions above are iBGP or eBGP sessions, look at the ASNs.
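The comparison is simple enough to state as a one-line rule, sketched here in Python with hypothetical ASNs: a session is iBGP when the neighbor's ASN equals the local ASN, and eBGP otherwise.

```python
def session_type(local_as, remote_as):
    """Classify a BGP session by comparing the local and neighbor ASNs."""
    return "iBGP" if local_as == remote_as else "eBGP"


print(session_type(65011, 65011))
print(session_type(65011, 65020))
```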
To show a more detailed breakdown of a specific neighbor, run the net show bgp neighbor
<neighbor> command:
Neighbor capabilities:
4 Byte AS: advertised and received
AddPath:
IPv4 Unicast: RX advertised IPv4 Unicast and received
Extended nexthop: advertised and received
Address families by peer:
IPv4 Unicast
Route refresh: advertised and received(old & new)
Address family IPv4 Unicast: advertised and received
Hostname Capability: advertised and received
Graceful Restart Capabilty: advertised and received
Remote Restart timer is 120 seconds
Address families by peer:
none
Graceful restart informations:
End-of-RIB send: IPv4 Unicast
End-of-RIB received: IPv4 Unicast
Message statistics:
Inq depth is 0
Outq depth is 0
Sent Rcvd
Opens: 1 1
Notifications: 0 0
Updates: 7 6
Keepalives: 690 689
Route Refresh: 0 0
Capability: 0 0
Total: 698 696
Minimum time between advertisement runs is 0 seconds
For address family: IPv4 Unicast
fabric peer-group member
Update group 1, subgroup 1
Packet Queue length 0
Community attribute sent to this neighbor(both)
Inbound path policy configured
Outbound path policy configured
Incoming update prefix filter list is *dc-leaf-in
Outgoing update prefix filter list is *dc-leaf-out
3 accepted prefixes
Connections established 1; dropped 0
Last reset never
Local host: fe80::4638:39ff:fe00:5b, Local port: 48424
Foreign host: fe80::4638:39ff:fe00:5c, Foreign port: 179
Nexthop: 10.0.0.11
Nexthop global: fe80::4638:39ff:fe00:5b
Nexthop local: fe80::4638:39ff:fe00:5b
BGP connection: shared network
BGP Connect Retry Timer in Seconds: 3
Estimated round trip time: 3 ms
Read thread: on Write thread: off
To see details of a specific route, such as from where it is received and where it is sent, run the net show
bgp <ip address/prefix> command:
The above example shows that the routing table prefix seen by BGP is 10.0.0.11/32, that this route is
advertised to two neighbors, and that it is not heard by any neighbors.
IPv6 route advertisements (RAs) are automatically enabled on an interface with IPv6 addresses;
the no ipv6 nd suppress-ra command is not needed for BGP unnumbered.
Instead of the IPv6 address, the peering interface name is displayed in the show ip bgp summary
command and wherever else applicable:
Most of the net show commands can take the interface name instead of the IP address.
Read-only mode begins as soon as the first peer reaches its established state and the max-delay timer
starts, and continues until either of the following two conditions is met:
All the configured peers (except the shutdown peers) have sent an explicit EOR (End-Of-RIB) or an
implicit EOR. The first keep-alive after BGP has reached the established state is considered an
implicit EOR.
If you specify the establish-wait option, BGP only considers peers that have reached the
established state from the moment the max-delay timer starts until the establish-wait period
ends.
The minimum set of established peers for which EOR is expected are the peers that are
established during the establish-wait window, not necessarily all the configured
neighbors.
Protocol Tuning
See Caveats and Errata below for information regarding ttl-security hops.
Here is an example:
presence of a lot of neighbors. keepalive-time is the interval at which keepalive messages are
sent. hold-time is how long BGP waits without receiving a keepalive or update message before the
connection is considered invalid. It is usually set to three times the keepalive time, and defaults to
nine seconds. The following example shows how to change these timers:
The following snippet shows that the default values have been modified for this neighbor:
Reconnect Quickly
A BGP process attempts to connect to a peer after a failure (or on startup) every connect-time seconds.
By default, this is 10 seconds. To modify this value, run the following command:
Advertisement Interval
BGP by default chooses stability over fast convergence. This is very useful when routing for the Internet. For
example, unlike link-state protocols, BGP typically waits for a duration of advertisement-interval
seconds between sending consecutive updates to a neighbor. This ensures that routes flapping on an
unstable neighbor are not propagated throughout the network. By default, this is set to zero seconds for both
eBGP and iBGP sessions, which allows for very fast convergence. You can modify this as follows:
Hostname: spine01
Member of peer-group fabric for session parameters
BGP version 4, remote router ID 0.0.0.0
BGP state = Connect
Last read 00:04:37, Last write 00:44:07
Hold time is 30, keepalive interval is 10 seconds
Configured hold time is 30, keepalive interval is 10 seconds
Message statistics:
Inq depth is 0
Outq depth is 0
Sent Rcvd
Opens: 1 1
Notifications: 1 0
Updates: 7 6
Keepalives: 2374 2373
Route Refresh: 0 0
Capability: 0 0
Total: 2383 2380
Minimum time between advertisement runs is 5 seconds
...
See this IETF draft for more details on the use of this value.
ttl-security Issue
Enabling ttl-security does not cause the hardware to be programmed with the relevant information.
This means that frames will come up to the CPU and be dropped there. It is recommended that you use the
net add acl command to explicitly add the relevant entry to hardware.
For example, you can configure a file, such as /etc/cumulus/acl/policy.d/01control_plane_bgp.rules,
with a rule like this for TTL:
INGRESS_INTF = swp1
INGRESS_CHAIN = INPUT, FORWARD
[iptables]
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bgp -m ttl --ttl 255 -j POLICE --set-mode pkt --set-rate 2000 --set-burst 1000
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bgp -j DROP
For more information about ACLs, see Netfilter (ACLs) (see page 141).
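The intent of the two rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic (rate-limit BGP packets that arrive with a TTL of 255, drop all other BGP packets); the function name and return strings are hypothetical and are not part of Cumulus Linux or iptables.

```python
BGP_PORT = 179  # well-known TCP port for BGP


def classify_bgp_packet(protocol: str, dst_port: int, ttl: int) -> str:
    """Mimic the order of the two ACL rules: the first matching rule wins."""
    if protocol == "tcp" and dst_port == BGP_PORT:
        if ttl == 255:
            # First rule: a TTL of 255 means the packet has not been routed,
            # so it is accepted but policed.
            return "police"
        # Second rule: any other BGP packet on this interface is dropped.
        return "drop"
    # Neither rule matches; other ACL entries decide what happens.
    return "no-match"
```

For example, classify_bgp_packet("tcp", 179, 254) returns "drop", because a TTL below 255 shows the packet crossed at least one router.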
Related Information
Bidirectional forwarding detection (see page 805) (BFD) and BGP
Wikipedia entry for BGP (includes list of useful RFCs)
FRR BGP documentation
IETF draft discussing BGP use within data centers
RFC 1657, Definitions of Managed Objects for the Fourth Version of the Border Gateway Protocol
(BGP-4) using SMIv2
RFC 1997, BGP Communities Attribute
RFC 2385, Protection of BGP Sessions via the TCP MD5 Signature Option
RFC 2439, BGP Route Flap Damping
RFC 2545, Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
RFC 2918, Route Refresh Capability for BGP-4
RFC 4271, A Border Gateway Protocol 4 (BGP-4)
RFC 4360, BGP Extended Communities Attribute
RFC 4456, BGP Route Reflection – An Alternative to Full Mesh Internal BGP (iBGP)
RFC 4760, Multiprotocol Extensions for BGP-4
RFC 5004, Avoid BGP Best Path Transitions from One External to Another
RFC 5065, Autonomous System Confederations for BGP
RFC 5291, Outbound Route Filtering Capability for BGP-4
RFC 5492, Capabilities Advertisement with BGP-4
RFC 5549, Advertising IPv4 Network Layer Reachability Information with an IPv6 Next Hop
RFC 6793, BGP Support for Four-Octet Autonomous System (AS) Number Space
RFC 7911, Advertisement of Multiple Paths in BGP
draft-walton-bgp-hostname-capability-02, Hostname Capability for BGP
Policy-based Routing
Typical routing systems and protocols forward traffic based on the destination address in the packet, which
is used to look up an entry in a routing table. However, sometimes the traffic on your network requires a
more hands-on approach. You might need to forward a packet based on the source address, the packet
size, or other information in the packet header.
Policy-based routing (PBR) lets you make routing decisions based on filters that change the routing
behavior of specific traffic so that you can override the routing table and influence where the traffic goes.
For example, you can use PBR to help you reach the best bandwidth utilization for business-critical
applications, isolate traffic for inspection or analysis, or manually load balance outbound traffic.
Policy-based routing is applied to incoming packets. All packets received on a PBR-enabled interface pass
through enhanced packet filters that determine rules and specify where to forward the packets.
You can create a maximum of 255 PBR match rules and 256 nexthop groups (this is the
ECMP limit).
You can apply only one PBR policy per input interface.
You can match on source and destination IP address only.
PBR is not supported for GRE or VXLAN tunneling.
PBR is not supported on ethernet interfaces.
A PBR rule cannot contain both IPv4 and IPv6 addresses.
Contents
This topic describes ...
Configure PBR (see page 800)
Configuration Example (see page 802)
Review Your Configuration (see page 803)
Delete PBR Rules and Policies (see page 804)
Configure PBR
A PBR policy contains one or more policy maps. Each policy map:
Is identified with a unique map name and sequence number. The sequence number is used to
determine the relative order of the map within the policy.
Contains a match source IP rule or a match destination IP rule, and a set rule.
To match on a source and destination address, a policy map can contain both match source
and match destination IP rules.
A set rule determines the PBR nexthop for the policy. The set rule can contain a single
nexthop IP address or it can contain a nexthop group. A nexthop group has more than one
nexthop IP address so that you can use multiple interfaces to forward traffic. To use ECMP,
you configure a nexthop group.
To use PBR in Cumulus Linux, you define a PBR policy and apply it to the ingress interface (the interface
must already have an IP address assigned). Traffic is matched against the match rules in sequential order
and forwarded according to the set rule in the first match. Traffic that does not match any rule is passed
on to the normal destination-based routing mechanism.
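The matching behavior described above (rules evaluated in sequence order, first match wins, no match falls back to normal routing) can be sketched in Python. This is a conceptual model only, not how switchd or FRRouting implement PBR; the PbrRule class and pbr_lookup function are hypothetical names introduced for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PbrRule:
    """One policy map entry: optional src/dst match clauses plus a set action."""
    seq: int
    nexthops: List[str]               # one address, or several for a nexthop group
    match_src: Optional[str] = None   # for example "10.1.4.0/24"
    match_dst: Optional[str] = None   # for example "10.1.2.0/24"

    def matches(self, src: str, dst: str) -> bool:
        # Every configured match clause must match the packet.
        for prefix, addr in ((self.match_src, src), (self.match_dst, dst)):
            if prefix is None:
                continue
            net = ipaddress.ip_network(prefix, strict=False)
            ip = ipaddress.ip_address(addr)
            if ip.version != net.version or ip not in net:
                return False
        return True


def pbr_lookup(rules: List[PbrRule], src: str, dst: str) -> Optional[List[str]]:
    """Return the nexthops of the first matching rule in sequence order.

    None means no rule matched, so the packet uses normal
    destination-based routing.
    """
    for rule in sorted(rules, key=lambda r: r.seq):
        if rule.matches(src, dst):
            return rule.nexthops
    return None
```

A rule with match_src set to 0.0.0.0/0 matches any IPv4 source, which mirrors the catch-all behavior the guide describes for that prefix.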
For Tomahawk and Tomahawk+ platforms, you must configure the switch to operate in non-
atomic mode, which offers better scaling as all TCAM resources are used to actively impact traffic.
Add the line acl.non_atomic_update_mode = TRUE to the /etc/cumulus/switchd.conf
file. For more information, see Nonatomic Update Mode vs. Atomic Update Mode (see page 149).
1. Configure the policy map with the net add pbr-map <name> seq <1-700> match
dst-ip|src-ip <ip/prefixlen> command.
The example commands below configure a policy map called map1 with sequence number 1, that
matches on destination address 10.1.2.0/24 and source address 10.1.4.1/24.
If the IP address in the rule is 0.0.0.0/0 or ::/0, any IP address is a match. You
cannot mix IPv4 and IPv6 addresses in a rule.
To apply a nexthop group (for ECMP) to the policy map, first create the nexthop group, then
apply the group to the policy map:
a. Create the nexthop group with the net add nexthop-group <groupname>
nexthop <ipaddress> [<interface>] [nexthop-vrf <vrfname>]
command.
The output interface and VRF are optional. However, you must specify the VRF if the
nexthop is not in the default VRF.
The example commands below create a nexthop group called group1 that contains
the nexthop 192.168.0.21 on output interface swp1 and VRF rocket, and the
nexthop 192.168.0.22.
b. Apply the nexthop group to the policy map with the net add pbr-map <name>
seq <1-700> set nexthop-group <groupname> command.
The example command below applies the nexthop group group1 to the map1 policy
map:
3. Assign the PBR policy to an ingress interface with the net add interface <interface> pbr-
policy <name> command.
The example command below assigns the PBR policy map1 to interface swp51:
Configuration Example
In the following example, the PBR-enabled switch has a PBR policy to route all traffic from the Internet to a
server that performs anti-DDoS filtering. The traffic returns to the PBR-enabled switch after being cleaned and is
then passed on to the regular destination-based routing mechanism.
interface swp51
pbr-policy map1
pbr-map map1 seq 1
match src-ip 0.0.0.0/0
set nexthop 192.168.0.32
To see the policies applied to a specific interface on the switch, add the interface name at the end of the
command; for example, net show pbr interface swp51.
To see information about all policies, including mapped table and rule numbers, use the net show pbr
map command. If the rule is not set, you see a reason why.
To see information about a specific policy, what it matches, and with which interface it is associated, add
the map name at the end of the command; for example, net show pbr map map1.
To see information about all nexthop groups, run the net show pbr nexthop-group command:
To see information about a specific nexthop group, add the group name at the end of the command; for
example, net show pbr nexthop-group group1.
A new Linux routing table ID is used for each nexthop and nexthop group.
Use caution when deleting PBR rules and nexthop groups, as you might create an incorrect
configuration for the PBR policy.
The following example shows how to delete a PBR policy so that the PBR interface is no longer receiving
PBR traffic:
Contents
This topic describes ...
BFD Multihop Routed Paths (see page 806)
BFD Parameters (see page 806)
Configure BFD (see page 806)
BFD in BGP (see page 807)
BFD in OSPF (see page 808)
OSPF Show Commands (see page 808)
Scripts (see page 810)
Echo Function (see page 810)
About the Echo Packet (see page 810)
Transmit and Receive Echo Packets (see page 811)
Echo Function Parameters (see page 811)
Troubleshooting (see page 811)
Related Information (see page 812)
BFD Parameters
You can configure the following BFD parameters for both IPv4 and IPv6 sessions:
The required minimum interval between the received BFD control packets.
The minimum interval for transmitting BFD control packets.
The detection time multiplier.
Configure BFD
You configure BFD in one of two ways: by specifying the configuration in the PTM topology.dot file (see
page 348), or using FRRouting (see page 713). However, the topology file has some limitations:
The topology.dot file supports creating BFD IPv4 and IPv6 single hop sessions only; you cannot
specify IPv4 or IPv6 multihop sessions in the topology file.
The topology file supports BFD sessions for only link-local IPv6 peers; BFD sessions for global IPv6
peers discovered on the link will not be created.
You cannot specify BFD multihop sessions in the topology.dot file since you cannot specify the
source and destination IP address pairs in that file. Use FRRouting (see page 719) to configure
multihop sessions.
The FRRouting CLI can track IPv4 and IPv6 peer connectivity — both single hop and multihop, and both link-
local IPv6 peers and global IPv6 peers — using BFD sessions without needing the topology.dot file. Use
FRRouting to register multihop peers with PTM and BFD as well as for monitoring the connectivity to the
remote BGP multihop peer. FRRouting can dynamically register and unregister both IPv4 and IPv6 peers
with BFD when the BFD-enabled peer connectivity is established or de-established, respectively. Also, you
can configure BFD parameters for each BGP or OSPF peer using FRRouting.
The BFD parameter configured in the topology file is given higher precedence over the client-
configured BFD parameters for a BFD session that has been created by both topology file and
client (FRRouting).
BFD requires an IP address for any interface on which it is configured. The neighbor IP address
for a single hop BFD session must be in the ARP table before BFD can start sending control
packets.
BFD in BGP
For FRRouting when using BGP, neighbors are registered and de-registered with PTM (see page 348)
dynamically when you enable BFD in BGP using net add bgp neighbor <neighbor|IP|interface>
bfd. For example:
Configuration of BFD for a peergroup or individual neighbors is performed in the same way.
These commands add the neighbor SPINE bfd line below the last address family configuration in the
/etc/frr/frr.conf file:
...
...
The configuration above uses the default BFD values: a detection multiplier of 3, a minimum RX interval
of 300ms, and a minimum TX interval of 300ms.
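To see how these three values combine, the following Python sketch computes the detection time for asynchronous mode as defined in RFC 5880: the detect multiplier received from the peer, multiplied by the greater of the local required minimum RX interval and the peer's desired minimum TX interval. The function and parameter names are illustrative assumptions, not PTM or FRRouting identifiers.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Detection time in asynchronous mode (RFC 5880, section 6.8.4)."""
    # The peer transmits no faster than the local system is willing to receive.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms
```

With the defaults on both ends (multiplier 3, 300ms intervals), the session is declared down after 900ms without a control packet.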
To see neighbor information in BGP, including BFD status, run net show bgp neighbor <interface>.
To change the BFD values to something other than the defaults, BFD parameters can be configured for
each BGP neighbor. For example:
BFD in OSPF
For FRRouting using OSPF, neighbors are registered and de-registered dynamically with PTM (see page 348)
when you enable or disable BFD in OSPF. If BFD is enabled on the interface, a neighbor is registered with
BFD when two-way adjacency is established and de-registered when the adjacency goes down. The BFD
configuration is per interface and any IPv4 and IPv6 neighbors discovered on that interface inherit the
configuration.
These commands create the following configuration snippet in the /etc/frr/frr.conf file:
interface swp1
ipv6 ospf6 bfd 5 500 500
end
Options 2 *|-|-|-|-|-|E|*
Dead timer due in 38.501s
Database Summary List 0
Link State Request List 0
Link State Retransmission List 0
Thread Inactivity Timer on
Thread Database Description Retransmision off
Thread Link State Request Retransmission on
Thread Link State Update Retransmission on
BFD: Type: single hop
Detect Mul: 5, Min Rx interval: 500, Min Tx interval: 500
Status: Down, Last update: 0:00:01:29
Scripts
ptmd executes the script /etc/ptm.d/bfd-sess-down when a BFD session goes down and the script
/etc/ptm.d/bfd-sess-up when a BFD session goes up.
You should modify these default scripts as needed.
Echo Function
Cumulus Linux supports the echo function for IPv4 single hops only, and with the asynchronous operating
mode only (Cumulus Linux does not support demand mode).
You use the echo function primarily to test the forwarding path on a remote system. To enable the echo
function, set echoSupport to 1 in the topology file.
Once the echo packets are looped by the remote system, the BFD control packets can be sent at a much
lower rate. You configure this lower rate by setting the slowMinTx parameter in the topology file to a non-
zero value of milliseconds.
You can use more aggressive detection times for echo packets since the round-trip time is reduced
because they are accessing the forwarding path. You configure the detection interval by setting the
echoMinRx parameter in the topology file to a non-zero value of milliseconds; the minimum setting is 50
milliseconds. Once configured, BFD control packets are sent out at this required minimum echo Rx interval.
This indicates to the peer that the local system can loop back the echo packets. Echo packets are
transmitted if the peer supports receiving echo packets.
[Figure: BFD echo packet format, including the My Discriminator field]
Troubleshooting
You can use the following commands to view information about active BFD sessions.
To return information on active BFD sessions, use the net show bfd sessions command:
----------------------------------------------------------
port peer state local type diag
----------------------------------------------------------
swp1 11.0.0.2 Up N/A singlehop N/A
N/A 12.12.12.1 Up 12.12.12.4 multihop N/A
To return more detailed information on active BFD sessions, use the net show bfd sessions detail
command (results are for an IPv6-connected peer):
----------------------------------------------------------------------------------------
port  peer                 state  local  type       diag  det   tx_timeout  rx_timeout
                                                          mult
----------------------------------------------------------------------------------------
swp1  fe80::202:ff:fe00:1  Up     N/A    singlehop  N/A   3     300         900
swp1  3101:abc:bcad::2     Up     N/A    singlehop  N/A   3     300         900
#continuation of output
---------------------------------------------------------------------
echo        echo        max      rx_ctrl  tx_ctrl  rx_echo  tx_echo
tx_timeout  rx_timeout  hop_cnt
---------------------------------------------------------------------
0           0           N/A      187172   185986   0        0
0           0           N/A      501      533      0        0
Related Information
RFC 5880 - Bidirectional Forwarding Detection
RFC 5881 - BFD for IPv4 and IPv6 (Single Hop)
RFC 5882 - Generic Application of BFD
RFC 5883 - Bidirectional Forwarding Detection (BFD) for Multihop Paths
Contents
This topic describes ...
Equal Cost Routing (see page 813)
ECMP Hashing (see page 813)
Use cl-ecmpcalc to Determine the Hash Result (see page 814)
cl-ecmpcalc Limitations (see page 814)
ECMP Hash Buckets (see page 814)
Configure a Hash Seed to Avoid Hash Polarization (see page 816)
Resilient Hashing (see page 817)
Resilient Hash Buckets (see page 817)
The BGP maximum-paths setting is enabled, so multiple routes are installed by default. See the
ECMP section (see page 760) of the BGP chapter for more information.
ECMP Hashing
Once multiple routes are installed in the routing table, a hash is used to determine which path a packet
follows.
Cumulus Linux hashes on the following fields:
IP protocol
Ingress interface
Source IPv4 or IPv6 address
Destination IPv4 or IPv6 address
For TCP/UDP frames, Cumulus Linux also hashes on:
Source port
Destination port
To prevent out of order packets, ECMP hashing is done on a per-flow basis, which means that all packets
with the same source and destination IP addresses and the same source and destination ports always hash
to the same next hop. ECMP hashing does not keep a record of flow states.
ECMP hashing does not keep a record of packets that have hashed to each next hop and does not
guarantee that traffic sent to each next hop is equal.
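Per-flow hashing as described above can be sketched in Python. The sketch hashes the listed header fields and uses the result to pick a next hop, so every packet of a flow takes the same path without any per-flow state. CRC32 is only a stand-in here; this section does not say which hash function the switch ASIC uses, and flow_key and pick_next_hop are hypothetical names.

```python
import zlib
from typing import List, Optional


def flow_key(ip_proto: int, ingress_intf: str, src_ip: str, dst_ip: str,
             src_port: Optional[int] = None,
             dst_port: Optional[int] = None) -> bytes:
    """Build the hash input from the fields the guide lists."""
    fields = [str(ip_proto), ingress_intf, src_ip, dst_ip]
    # Ports are only part of the key for TCP/UDP traffic.
    if src_port is not None and dst_port is not None:
        fields += [str(src_port), str(dst_port)]
    return "|".join(fields).encode()


def pick_next_hop(next_hops: List[str], key: bytes) -> str:
    """Stateless selection: the same key always yields the same next hop."""
    return next_hops[zlib.crc32(key) % len(next_hops)]
```

Because the choice depends only on the packet headers, nothing balances the byte counts across next hops; a few large flows can land on the same path.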
cl-ecmpcalc Limitations
cl-ecmpcalc can only take input interfaces that can be converted to a single physical port in the port tab
file, like the physical switch ports (swp). Virtual interfaces like bridges, bonds, and subinterfaces are not
supported.
cl-ecmpcalc is supported only on switches with the Mellanox Spectrum and the Broadcom Maverick,
Tomahawk, Trident II, Trident II+ and Trident3 chipsets.
In the following example, 4 next hops exist. Three different flows are hashed to different hash buckets.
Each next hop is assigned to a unique hash bucket.
A new next hop is added and a new hash bucket is created. As a result, the hash and hash bucket
assignment changed, causing the existing flows to be sent to different next hops.
A next hop fails and the next hop and hash bucket are removed. The remaining next hops may be
reassigned.
In most cases, the modification of hash buckets has no impact on traffic flows as traffic is being forwarded to a
single end host. In deployments where multiple end hosts are using the same IP address (anycast), resilient
hashing must be used.
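A small Python model shows why a change in the number of hash buckets moves existing flows. It assumes one bucket per next hop and a simple modulo mapping from flow hash to bucket, which is a simplification for illustration rather than the switch's actual algorithm; the names are hypothetical.

```python
from typing import Dict, Iterable, List


def assign_flows(flow_hashes: Iterable[int], next_hops: List[str]) -> Dict[int, str]:
    """Map each flow hash to a next hop, one bucket per next hop."""
    return {h: next_hops[h % len(next_hops)] for h in flow_hashes}


flows = range(12)
before = assign_flows(flows, ["A", "B", "C", "D"])
after = assign_flows(flows, ["A", "B", "C", "D", "E"])

# Flows whose next hop changed although their original next hop is still up.
moved = [h for h in flows if before[h] != after[h]]
```

In this model, adding a fifth next hop moves most of the twelve flows, including flows that were on healthy next hops. That is harmless when every next hop leads to the same end host, but it breaks sessions when each next hop is a different anycast server.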
...
...
cumulus@leaf01:~$
Resilient Hashing
In Cumulus Linux, when a next hop fails or is removed from an ECMP pool, the hashing or hash bucket
assignment can change. For deployments where there is a need for flows to always use the same next hop,
like TCP anycast deployments, this can create session failures.
The ECMP hash performed with resilient hashing is exactly the same as the default hashing mode. Only the
method in which next hops are assigned to hash buckets differs.
Resilient hashing supports both IPv4 and IPv6 routes.
Resilient hashing is not enabled by default. See below for steps on configuring it.
Resilient hashing prevents disruptions when next hops are removed. It does not prevent
disruption when next hops are added.
Resilient hashing is supported only on switches with the Broadcom Tomahawk, Trident II, Trident
II+, and Trident3 as well as Mellanox Spectrum chipsets. You can run net show system to
determine the chipset.
With 12 buckets assigned and four next hops, instead of reducing the number of buckets — which would
impact flows to known good hosts — the remaining next hops replace the failed next hop.
After the failed next hop is removed, the remaining next hops are installed as replacements. This prevents
impact to any flows that hash to working next hops.
As a result, some flows may hash to new next hops, which can impact anycast deployments.
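The removal case with 12 buckets and four next hops can be modeled in Python as follows. The bucket count stays fixed and only the buckets that pointed at the failed next hop are rewritten. The round-robin choice of replacement is an assumption made for this sketch, and build_buckets and remove_next_hop are hypothetical names.

```python
from typing import List


def build_buckets(next_hops: List[str], bucket_count: int) -> List[str]:
    """Spread the next hops over a fixed number of buckets."""
    return [next_hops[i % len(next_hops)] for i in range(bucket_count)]


def remove_next_hop(buckets: List[str], failed: str) -> List[str]:
    """Replace only the failed next hop's buckets; leave all others untouched."""
    remaining = sorted({nh for nh in buckets if nh != failed})
    if not remaining:
        raise ValueError("no next hops left")
    result = list(buckets)
    replacement_index = 0
    for i, nh in enumerate(result):
        if nh == failed:
            result[i] = remaining[replacement_index % len(remaining)]
            replacement_index += 1
    return result
```

Flows that hashed to a bucket owned by a working next hop keep that next hop, which is the property anycast deployments rely on.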
An ECMP route counts as a single route with multiple next hops. The following example is
considered to be a single ECMP route:
All ECMP routes must use the same number of buckets (the number of buckets cannot be configured per
ECMP route).
The number of buckets can be configured as 64, 128, 256, 512 or 1024; the default is 128:
Number of Hash Buckets    Number of Supported ECMP Routes
64                        1024
128                       512
256                       256
512                       128
1024                      64
A larger number of ECMP buckets reduces the impact on adding new next hops to an ECMP route.
However, the system supports fewer ECMP routes. If the maximum number of ECMP routes have been
installed, new ECMP routes log an error and are not installed.
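The trade-off in the table can be checked with a short Python sketch. It assumes the second column of the table is the maximum number of ECMP routes for each bucket setting; the observation that every row multiplies out to the same total is derived from the table values, not stated in the guide.

```python
# Bucket setting -> maximum ECMP routes, copied from the table above.
MAX_ECMP_ROUTES = {64: 1024, 128: 512, 256: 256, 512: 128, 1024: 64}


def max_ecmp_routes(buckets_per_route: int) -> int:
    """Look up the route limit for a given bucket setting."""
    if buckets_per_route not in MAX_ECMP_ROUTES:
        raise ValueError("bucket count must be 64, 128, 256, 512 or 1024")
    return MAX_ECMP_ROUTES[buckets_per_route]


# Each row uses the same total number of bucket entries.
totals = {b * r for b, r in MAX_ECMP_ROUTES.items()}
```

Doubling the buckets per route halves the number of ECMP routes that fit, so pick the smallest bucket count that gives acceptable behavior when next hops are added.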
To enable resilient hashing, edit /etc/cumulus/datapath/traffic.conf:
Redistribute Neighbor
Redistribute neighbor provides a mechanism for IP subnets to span racks without forcing the end hosts to
run a routing protocol.
The fundamental premise behind redistribute neighbor is to announce individual host /32 routes in the
routed fabric. Other hosts on the fabric can then use this new path to access the hosts in the fabric. If
multiple equal-cost paths (ECMP) are available, traffic can load balance across the available paths natively.
The challenge is to accurately compile and update this list of reachable hosts or neighbors. Luckily, existing
commonly-deployed protocols are available to solve this problem. Hosts use ARP to resolve MAC addresses
when sending to an IPv4 address. A host then builds an ARP cache table of known MAC address to IPv4
address tuples as it receives or responds to ARP requests.
In the case of a leaf switch, where the default gateway is deployed for hosts within the rack, the ARP cache
table contains a list of all hosts that have ARP'd for their default gateway. In many scenarios, this table
contains all the layer 3 information that's needed. This is where redistribute neighbor comes in, as it is a
mechanism of formatting and syncing this table into the routing protocol.
Contents
This topic describes ...
Availability (see page 822)
Target Use Cases and Best Practices (see page 822)
How It Works (see page 822)
Example Configuration (see page 822)
Configure the Leaf(s) (see page 823)
Configure the Host(s) (see page 825)
Known Limitations (see page 827)
TCAM Route Scale (see page 827)
Possible Uneven Traffic Distribution (see page 827)
Silent Hosts Never Receive Traffic (see page 827)
Support for IPv4 Only (see page 827)
VRFs Are not Supported (see page 827)
Only 1024 Interfaces Supported (see page 827)
Troubleshooting (see page 827)
How do I determine if rdnbrd (the redistribute neighbor daemon) is running? (see page 827)
How do I change rdnbrd's default configuration? (see page 828)
What is table 10? Why was table 10 chosen? (see page 828)
How do I determine that the /32 redistribute neighbor routes are being advertised to my
neighbor? (see page 829)
How do I verify that the kernel routing table is being correctly populated? (see page 829)
Availability
Redistribute neighbor is distributed as python-rdnbrd.
How It Works
Redistribute neighbor works as follows:
1. The leaf/ToR switches learn about connected hosts when the host sends an ARP request or ARP
reply.
2. An entry for the host is added to the kernel neighbor table of each leaf switch.
3. The redistribute neighbor daemon, rdnbrd, monitors the kernel neighbor table and creates a /32
route for each neighbor entry. This /32 route is created in kernel table 10.
4. FRRouting is configured to import routes from kernel table 10.
5. A route-map is used to control which routes from table 10 are imported.
6. In FRRouting these routes are imported as table routes.
7. BGP, OSPF and so forth are then configured to redistribute the table 10 routes.
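Step 3 of the workflow above can be pictured with a minimal Python sketch that turns neighbor entries into /32 routes for table 10. This is not the rdnbrd source code: the input format, the skipped neighbor states, and the function name are assumptions made for illustration. IPv6 entries are ignored because redistribute neighbor supports IPv4 only.

```python
import ipaddress
from typing import Dict, List


def neighbor_routes(neighbors: List[Dict[str, str]], table: int = 10) -> List[str]:
    """Build one /32 route per usable IPv4 neighbor entry."""
    routes = []
    for entry in neighbors:
        # Assumption: unresolved entries should not become routes.
        if entry.get("state") in ("FAILED", "INCOMPLETE"):
            continue
        ip = ipaddress.ip_address(entry["ip"])
        if ip.version != 4:
            continue
        routes.append(f"{ip}/32 dev {entry['dev']} table {table}")
    return routes
```

Each returned string has the shape of an argument list for ip route add, which makes the link between the neighbor table and kernel table 10 easy to see.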
Example Configuration
The following example configuration is based on the reference topology created by Cumulus Networks.
Other configurations are possible, based on the use cases outlined above. Here is a diagram of the
topology:
1. Configure the host facing ports, using the same IP address on both host-facing interfaces as well as a
/32 prefix. In this case, swp1 and swp2 are configured as they are the ports facing server01 and
server02:
auto lo
iface lo inet loopback
address 10.0.0.11/32
auto swp1
iface swp1
address 10.0.0.11/32
auto swp2
iface swp2
address 10.0.0.11/32
4. Configure routing:
a. Define a route-map that matches on the host-facing interfaces:
c. Redistribute the imported table routes into the appropriate routing protocol.
BGP:
OSPF:
auto lo:1
iface lo:1
    address 10.1.0.101/32
auto eth1
iface eth1
    address 10.1.0.101/32
    post-up for i in {1..3}; do arping -q -c 1 -w 0 -i eth1 10.0.0.11; sleep 1; done
    post-up ip route add 0.0.0.0/0 nexthop via 10.0.0.11 dev eth1 onlink nexthop via 10.0.0.12 dev eth2 onlink || true
auto eth2
iface eth2
    address 10.1.0.101/32
    post-up for i in {1..3}; do arping -q -c 1 -w 0 -i eth2 10.0.0.12; sleep 1; done
    post-up ip route add 0.0.0.0/0 nexthop via 10.0.0.11 dev eth1 onlink nexthop via 10.0.0.12 dev eth2 onlink || true
Install ifplugd
Additionally, install and use ifplugd (see page 469). ifplugd modifies the behavior of the Linux routing
table when an interface undergoes a link transition (carrier up/down). The Linux kernel by default leaves
routes up even when the physical interface is unavailable (NO-CARRIER).
After you install ifplugd, edit /etc/default/ifplugd as follows, where eth1 and eth2 are the interface
names that your host uses to connect to the leaves.
Known Limitations
Troubleshooting
# If a host does not send an ARP reply for holdtime consider the host down
holdtime = 3
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1 inr.ruhep
Read more information on Linux route tables, or you can read the Ubuntu man pages for ip route.
How do I determine that the /32 redistribute neighbor routes are being advertised
to my neighbor?
For BGP, check the advertised routes to the neighbor.
How do I verify that the kernel routing table is being correctly populated?
Use the following workflow to verify that the kernel routing table is being populated correctly and that
routes are being correctly imported/advertised:
1. Verify that ARP neighbor entries are being populated into kernel routing table 10.
Both the > and * should be present so that table 10 routes are installed as preferred into the routing
table. If the routes are not being installed, verify the following:
The imported distance of the locally imported kernel routes using the ip import-table 10
distance X command, where X is not less than the administrative distance of the routing
protocol. If the distance is too low, routes learned from the protocol may overwrite the locally
imported routes.
The routes are in the kernel routing table.
3. Confirm that routes are in the BGP/OSPF database and are being advertised.
Applications can use existing interfaces to operate in a VRF context — by binding sockets to the VRF
device or passing the ifindex using cmsg. By default, applications on the switch run against the
default VRF. Services started by systemd run in the default VRF unless the VRF instance is used. If
management VRF (see page 859) is enabled, logins to the switch default to the management VRF.
This saves users from having to specify the management VRF for each command.
Listen sockets used by services are VRF-global by default unless the application is configured to use
a more limited scope — for example, read about services in the management VRF (see page 861).
Connected sockets (like TCP) are then bound to the VRF domain in which the connection originates.
The kernel provides a sysctl that allows a single instance to accept connections over all VRFs. For
TCP, connected sockets are bound to the VRF in which the first packet was received. This sysctl is
enabled for Cumulus Linux.
Connected and local routes are placed in appropriate VRF tables.
Neighbor entries continue to be per-interface, and you can view all entries associated with the VRF
device.
A VRF does not map to its own network namespace; however, you can nest VRFs in a network
namespace.
You can use existing Linux tools to interact with it, such as tcpdump.
Cumulus Linux supports up to 255 VRFs on a switch.
You configure VRF by associating each subset of interfaces to a VRF routing table, and configuring an
instance of the routing protocol — BGP or OSPFv2 — for each routing table.
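The socket binding mentioned above (binding a socket to the VRF device) can be sketched in Python as follows. This is a minimal illustration, not code from Cumulus Linux: the helper names vrf_bind_option and open_socket_in_vrf are hypothetical, and the sketch assumes a Linux host where the standard SO_BINDTODEVICE socket option is available and the caller has sufficient privileges.

```python
import socket


def vrf_bind_option(vrf_name: str) -> bytes:
    """Return the option value handed to SO_BINDTODEVICE for a VRF device.

    Hypothetical helper: the 15-character limit follows the VRF naming
    rule described in this chapter.
    """
    if not vrf_name or len(vrf_name) > 15:
        raise ValueError("VRF device names are 1 to 15 characters long")
    # The kernel accepts the device name as a NUL-terminated byte string.
    return vrf_name.encode("ascii") + b"\0"


def open_socket_in_vrf(vrf_name: str) -> socket.socket:
    """Create a TCP socket bound to the given VRF device (Linux only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Binding to the VRF device scopes route lookups to that VRF table.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        vrf_bind_option(vrf_name))
    except OSError:
        # Typically a missing device or insufficient privileges.
        sock.close()
        raise
    return sock
```

Calling open_socket_in_vrf("rocket") only succeeds on a system where a VRF device named rocket exists; vrf_bind_option can be exercised anywhere.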
Contents
This topic describes ...
Configure VRF (see page 832)
Specify a Table ID (see page 833)
Bring a VRF Up after Downing It with ifdown (see page 833)
vrf Command (see page 833)
cumulusnetworks.com 831
Cumulus Linux 3.7 User Guide
Configure VRF
Each routing table is called a VRF table, and has its own table ID. You configure VRF using NCLU (see page
88), then place the layer 3 interface in the VRF. You can have a maximum of 255 VRFs on a switch.
When you configure a VRF, you follow a similar process to other network interfaces. Keep in mind the
following for a VRF table:
It can have an IP address, which acts as a loopback interface for the VRF.
Associated rules are added automatically.
You can also add a default route to avoid skipping across tables when the kernel forwards the
packet.
Names for VRF tables can be up to 15 characters. However, you cannot use the name mgmt, as this
name can only be used for management VRF (see page 859).
To configure a VRF, run:
These commands result in the following VRF configuration in the /etc/network/interfaces file:
auto rocket
iface rocket
vrf-table auto
auto swp1
iface swp1
vrf rocket
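If you generate /etc/network/interfaces content from a script, the stanzas above can be produced with a small helper. The following is a sketch only: render_vrf_stanzas is a hypothetical function name, and the four-space indentation of the attribute lines is an assumption about layout, not something this guide specifies.

```python
def render_vrf_stanzas(vrf_name, interfaces, table="auto"):
    """Render the VRF stanza plus one stanza per member interface.

    Hypothetical helper: mirrors the rocket/swp1 example above.
    """
    lines = [
        f"auto {vrf_name}",
        f"iface {vrf_name}",
        f"    vrf-table {table}",
    ]
    for ifname in interfaces:
        # Each layer 3 interface is placed in the VRF with the vrf keyword.
        lines += [
            f"auto {ifname}",
            f"iface {ifname}",
            f"    vrf {vrf_name}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_vrf_stanzas("rocket", ["swp1"]), end="")
```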
Specify a Table ID
Instead of having Cumulus Linux assign a table ID for the VRF table, you can specify your own table ID in the
configuration. The table ID to name mapping is saved in /etc/iproute2/rt_tables.d/ for
name-based references. So instead of using the auto option above, specify the table ID like this:
If you do specify a table ID, it must be in the range of 1001 to 1255 which is reserved in Cumulus
Linux for VRF table IDs.
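The naming and table ID rules described in this section can be checked before a configuration is applied. The sketch below is illustrative only; validate_vrf and its constants are hypothetical names, and the checks cover just the three rules stated in this chapter (name length, the reserved mgmt name, and the 1001 to 1255 table ID range).

```python
MAX_VRF_NAME_LEN = 15               # names can be up to 15 characters
TABLE_ID_MIN, TABLE_ID_MAX = 1001, 1255  # range reserved for VRF table IDs


def validate_vrf(name, table="auto", management=False):
    """Return a list of problems found; an empty list means the VRF is valid."""
    problems = []
    if not 1 <= len(name) <= MAX_VRF_NAME_LEN:
        problems.append(f"name must be 1 to {MAX_VRF_NAME_LEN} characters")
    if name == "mgmt" and not management:
        problems.append("the name mgmt is reserved for the management VRF")
    if table != "auto":
        # A user-specified table ID must be an integer in the reserved range.
        if not isinstance(table, int) or not TABLE_ID_MIN <= table <= TABLE_ID_MAX:
            problems.append(
                f"table ID must be between {TABLE_ID_MIN} and {TABLE_ID_MAX}")
    return problems
```

For example, validate_vrf("rocket", 1016) reports no problems, while validate_vrf("rocket", 2000) reports an out-of-range table ID.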
vrf Command
The vrf command returns information about VRF tables that is otherwise not available in other Linux
commands, such as iproute. You can also use it to execute non-VRF-specific commands and perform
other tasks related to VRF tables.
To get a list of VRF tables, run:
VRF Table
---------------- -----
rocket 1016
To return a list of processes and PIDs associated with a specific VRF table, run
vrf task list <vrf-name>. For example:
VRF: rocket
-----------------------
dhclient 2508
sshd 2659
bash 2681
su 2702
bash 2720
vrf 2829
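For scripting, the listing above can be turned into structured data. This is a sketch that assumes the output has exactly the shape shown (a VRF: header, a dashed separator, then one process name and PID per line); parse_vrf_task_list is a hypothetical helper, not part of the vrf command.

```python
def parse_vrf_task_list(text):
    """Parse 'vrf task list <vrf-name>' style output into (vrf, [(name, pid)])."""
    vrf = None
    tasks = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the dashed separator under the header.
        if not line or set(line) == {"-"}:
            continue
        if line.startswith("VRF:"):
            vrf = line.split(":", 1)[1].strip()
            continue
        # The PID is the last whitespace-separated field on the line.
        name, _, pid = line.rpartition(" ")
        if name and pid.isdigit():
            tasks.append((name.strip(), int(pid)))
    return vrf, tasks
```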
To determine which VRF table is associated with a particular PID, run vrf task identify <pid>. For
example:
rocket
You should manage long-running services with systemd using the service@vrf notation; for example,
systemctl start ntp@mgmt. systemd-based services are stopped when a VRF is deleted and started
when the VRF is created, for example, when you restart networking or run an ifdown/ifup sequence.
Services in VRFs
For services that need to run against a specific VRF, Cumulus Linux uses systemd instances, where the
instance is the VRF. In general, you start a service within a VRF like this:
For example, you can run the NTP service in the turtle VRF using:
834 09 January 2019
Cumulus Networks
In most cases, the instance running in the default VRF needs to be stopped before a VRF instance can start.
This is because the instance running in the default VRF owns the port across all VRFs; that is, it is VRF
global. As mentioned above, systemd-based services are stopped when the VRF is deleted and started when the
VRF is created, for example, when you restart networking or run an ifdown/ifup sequence. The
management VRF chapter (see page 861) details how to do this.
In Cumulus Linux, the following services work with VRF instances:
chef-client
collectd
dhcpd
dhcrelay
hsflowd
netq-agent
ntp
puppet
snmpd
snmptrapd
ssh
zabbix-agent
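The service@vrf instance notation and the list of supported services can be combined in a small helper that builds the systemctl invocation. This is a sketch under stated assumptions: vrf_unit and systemctl_command are hypothetical names, and the service set is copied from the list above rather than queried from the switch.

```python
# Services that this guide lists as working with VRF instances.
VRF_AWARE_SERVICES = frozenset({
    "chef-client", "collectd", "dhcpd", "dhcrelay", "hsflowd", "netq-agent",
    "ntp", "puppet", "snmpd", "snmptrapd", "ssh", "zabbix-agent",
})


def vrf_unit(service, vrf):
    """Return the systemd instance name, for example ntp@mgmt."""
    if service not in VRF_AWARE_SERVICES:
        raise ValueError(f"{service} is not listed as working with VRF instances")
    return f"{service}@{vrf}"


def systemctl_command(action, service, vrf):
    """Build the argument list for systemctl; the caller decides how to run it."""
    return ["systemctl", action, vrf_unit(service, vrf)]
```

For instance, systemctl_command("start", "ntp", "mgmt") yields the arguments for systemctl start ntp@mgmt, the example used earlier in this section.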
There are cases where systemd instances do not work; you must use a service-specific
configuration option instead. For example, you can configure rsyslogd to send messages to
remote systems over a VRF:
An interface is always assigned to only one VRF; any packets received on that interface are
routed using the associated VRF routing table.
Route leaking is typically used for non-overlapping addresses.
Route leaking is supported for both IPv4 and IPv6 routes.
Do not mix static and dynamic route leaking in a fabric.
VRF route leaking is not supported between the tenant VRF and the default VRF with
onlink next hops (BGP unnumbered).
1. Enable the VRF route leaking option, then restart switchd for the change to take effect:
Edit the /etc/cumulus/switchd.conf file. Change the vrf_route_leak_enable option to
TRUE and uncomment the line. For example:
Only set the vrf_route_leak_enable option to TRUE for static VRF route leaking. This
option must be set to false for dynamic route leaking.
2. Use the keyword nexthop-vrf when configuring a static route to specify the VRF through which the
next hop router is reachable.
The example command below adds a static route (10.1.0.0/24) to a VRF named turtle, which is
reachable through a next-hop router (192.168.200.1) over a different VRF, rocket.
1. Enable VRF route leaking, as shown in step 1 of the static route leaking procedure above.
2. Configure static route leaking for EVPN. The following commands provide examples.
To configure static route leaking between VRF1 and VRF2, where VRF1 contains subnets 10.50.1.0/24,
10.50.2.0/24, 10.50.3.0/24, and 10.50.4.0/24 and VRF2 contains subnets 10.60.1.0/24, 10.60.2.0/24,
10.60.3.0/24, and 10.60.4.0/24, run these commands:
To configure static route leaking between the default VRF and VRF1, where swp1s0 is the egress port
for subnets under 10.10.0.0/16 in the default VRF, run these commands:
Important
You cannot reach the loopback address of a VRF (the address assigned to the VRF device)
from another VRF.
When using dynamic route leaking, you must use the redistribute command in BGP to
leak non-BGP routes (connected or static routes); you cannot use the network command.
Routes in the management VRF with the next-hop as eth0 or the management interface
are not leaked.
VRF dynamic route leaking is not supported for EVPN environments.
Routes learned with iBGP or multi-hop eBGP in a VRF can be leaked even if their next
hops become unreachable. Therefore, route leaking for BGP-learned routes is
recommended only when they are learned through single-hop eBGP.
Route leaking is supported only between two named VRFs. Route leaking between the
default VRF and other VRFs is not supported currently.
To configure dynamic route leaking, use the net add bgp vrf <TO_VRFNAME> ipv4|ipv6 unicast
import vrf <FROM_VRFNAME> command.
In the following example, routes in the BGP routing table of VRF rocket are dynamically leaked into VRF
turtle.
cumulus@switch:~$ net add bgp vrf turtle ipv4 unicast import vrf rocket
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
cumulus@switch:~$ net add bgp vrf rocket ipv4 unicast import vrf turtle route-map turtle-to-rocket-IPV4
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
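When the same import has to be applied for several VRF pairs, the NCLU command sequence can be generated rather than typed. The sketch below only builds strings from the net add bgp vrf <TO_VRFNAME> ipv4|ipv6 unicast import vrf <FROM_VRFNAME> form given above (and the matching net del form); import_vrf_commands is a hypothetical helper and does not run anything on the switch.

```python
def import_vrf_commands(to_vrf, from_vrf, family="ipv4", delete=False):
    """Return the NCLU commands that leak routes from from_vrf into to_vrf."""
    if family not in ("ipv4", "ipv6"):
        raise ValueError("family must be ipv4 or ipv6")
    verb = "del" if delete else "add"
    return [
        f"net {verb} bgp vrf {to_vrf} {family} unicast import vrf {from_vrf}",
        "net pending",   # review the staged change
        "net commit",    # apply it
    ]


if __name__ == "__main__":
    for command in import_vrf_commands("turtle", "rocket"):
        print(command)
```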
green
RD: 10.1.1.1:2
Export RT: 10.1.1.1:2
To view the BGP routing table, use the net show bgp vrf <VRFNAME> ipv4|ipv6 unicast
command. To view the FRR IP routing table, use the net show route vrf <VRFNAME> command. These
commands show all routes, including routes leaked from other VRFs.
The following example command shows all routes in VRF turtle, including routes leaked from VRF green:
VRF turtle:
K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 6d07h01m
C>* 10.1.1.1/32 is directly connected, turtle, 6d07h01m
B>* 10.0.100.1/32 [200/0] is directly connected, green(vrf green), 6d05h10m
B>* 10.0.200.0/24 [20/0] via 10.10.2.2, swp1.11, 5d05h10m
B>* 10.0.300.0/24 [200/0] via 10.20.2.2, swp1.21(vrf green), 5d05h10m
C>* 10.10.2.0/30 is directly connected, swp1.11, 6d07h01m
C>* 10.10.3.0/30 is directly connected, swp2.11, 6d07h01m
C>* 10.10.4.0/30 is directly connected, swp3.11, 6d07h01m
B>* 10.20.2.0/30 [200/0] is directly connected, swp1.21(vrf green), 6d05h10m
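In the output above, leaked routes are the ones whose next hop carries a (vrf <name>) suffix. A short script can pick them out. This sketch assumes each route occupies a single line of output and relies only on that suffix; leaked_routes and the regular expression are illustrative, not an FRRouting interface.

```python
import re

# Matches a route line whose next hop is annotated with "(vrf NAME)".
LEAKED_ROUTE = re.compile(
    r"^(?P<flags>\S+)\s+(?P<prefix>\S+/\d+)\s.*\(vrf (?P<vrf>[^)]+)\)")


def leaked_routes(output):
    """Return (prefix, source_vrf) pairs for routes leaked from another VRF."""
    found = []
    for line in output.splitlines():
        match = LEAKED_ROUTE.match(line.strip())
        if match:
            found.append((match["prefix"], match["vrf"]))
    return found
```

Prefixes are treated as opaque strings here, so the helper reports them exactly as they appear in the command output.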
cumulus@switch:~$ net del bgp vrf turtle ipv4 unicast import vrf rocket
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Do not use the kernel commands; they are no longer supported and might cause issues when
used with VRF route leaking in FRR.
VRFs are provisioned using NCLU. VRFs can be pre-provisioned in FRRouting too, but they become active
only when configured with NCLU.
You pre-provision a VRF in FRRouting by running the command vrf vrf-name.
A BGP instance corresponding to a VRF can be pre-provisioned by configuring net add bgp vrf
<VRF> autonomous-system <ASN>. Under this context, all existing BGP parameters can be
configured: neighbors, peer-groups, address-family configuration, redistribution, and so forth.
An OSPFv2 instance can be configured using the net add ospf vrf <VRF> command; as with
BGP, all OSPFv2 parameters can be configured.
Static routes (IPv4 and IPv6) can be provisioned in a VRF by specifying the VRF along with the static
route configuration. For example, ip route prefix dev vrf vrf-name. The VRF has to exist
for this configuration to be accepted: either already defined through /etc/network/interfaces
or pre-provisioned in FRRouting. If you want to leak a static route in a VRF, see the
note above (see page ).
However, to show BGP IPv6 routes in the VRF, you need to use vtysh, the FRRouting CLI:
VRF vrf1012:
O>* 6.0.0.1/32 [110/210] via 200.254.2.10, swp2s0.2, 00:13:30
* via 200.254.2.14, swp2s1.2, 00:13:30
* via 200.254.2.18, swp2s2.2, 00:13:30
O>* 6.0.0.2/32 [110/210] via 200.254.2.10, swp2s0.2, 00:13:30
* via 200.254.2.14, swp2s1.2, 00:13:30
* via 200.254.2.18, swp2s2.2, 00:13:30
O>* 9.9.12.5/32 [110/20] via 200.254.2.10, swp2s0.2, 00:13:29
* via 200.254.2.14, swp2s1.2, 00:13:29
* via 200.254.2.18, swp2s2.2, 00:13:29
To show which interfaces are in a VRF (either BGP or OSPF), run the net show vrf list command. The
following command shows which interfaces are in the VRFs configured on the switch:
VRF: turtle
--------------------
To show the interfaces for a specific VRF, run the net show vrf list <vrf_name> command. The
following command shows which interfaces are in VRF turtle:
You can only specify one VRF with the net show vrf list <vrf_name> command. For
example, net show vrf list mgmt turtle is an invalid command.
To show the VNIs for the interfaces in a VRF, run the net show vrf vni command. For example:
To see the VNIs for the interfaces in a VRF in JSON format, run the net show vrf vni json command.
For example:
"vni":104001,
"vxlanIntf":"vxlan4001",
"sviIntf":"vlan4001",
"state":"Up",
"routerMac":"44:39:39:ff:40:94"
}
]
}
Show VRFs configured in BGP, including the default. A non-zero ID is a VRF that has also been actually
provisioned — that is, defined in /etc/network/interfaces:
inet6 fe80::202:ff:fe00:a/64
ND advertised reachable time is 0 milliseconds
ND advertised retransmit interval is 0 milliseconds
ND router advertisements are sent every 600 seconds
ND router advertisements lifetime tracks ra-interval
ND router advertisement default router preference is medium
Hosts use stateless autoconfig for addresses.
switch# exit
cumulus@switch:~$
To see the routing table for each VRF, use the show ip route vrf all command. The OSPF route is
denoted in the row that starts with O:
To see a list of links associated with a particular VRF table, run ip link list <vrf-name>. For
example:
VRF: rocket
--------------------
swp1.10@swp1 UP 6c:64:1a:00:5a:0c <BROADCAST,MULTICAST,UP,LOWER_UP>
swp2.10@swp2 UP 6c:64:1a:00:5a:0d <BROADCAST,MULTICAST,UP,LOWER_UP>
To see a list of routes associated with a particular VRF table, run ip route list <vrf-name>. For
example:
VRF: rocket
--------------------
unreachable default metric 8192
10.1.1.0/24 via 10.10.1.2 dev swp2.10
10.1.2.0/24 via 10.99.1.2 dev swp1.10
broadcast 10.10.1.0 dev swp2.10 proto kernel scope link src 10.10.1.1
10.10.1.0/28 dev swp2.10 proto kernel scope link src 10.10.1.1
local 10.10.1.1 dev swp2.10 proto kernel scope host src 10.10.1.1
broadcast 10.10.1.15 dev swp2.10 proto kernel scope link src 10.10.1.1
broadcast 10.99.1.0 dev swp1.10 proto kernel scope link src 10.99.1.1
10.99.1.0/30 dev swp1.10 proto kernel scope link src 10.99.1.1
local 10.99.1.1 dev swp1.10 proto kernel scope host src 10.99.1.1
broadcast 10.99.1.3 dev swp1.10 proto kernel scope link src 10.99.1.1
You can also show routes in a VRF using ip [-6] route show vrf <name>. This command
omits local and broadcast routes, which can clutter the output.
auto swp1
iface swp1
link-autoneg on
link-speed 10000
vrf vrf1
auto bridge
iface bridge
bridge-ports vlan101
bridge-vids 101
bridge-vlan-aware yes
auto vlan101
iface vlan101
address 20.1.6.1/24
address 2001:20:1:6::1/80
vlan-id 101
vlan-raw-device bridge
auto vrf1
iface vrf1
address 6.1.0.6/32
address 2001:6:1::6/128
vrf-table auto
!
router bgp 65001 vrf vrf1
no bgp default ipv4-unicast
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor LEAF peer-group
neighbor LEAF remote-as external
neighbor LEAF capability extended-nexthop
neighbor swp1.101 interface peer-group LEAF
neighbor swp2.101 interface peer-group LEAF
!
address-family ipv4 unicast
redistribute connected
To enable the service at boot time you should also run systemctl enable <service>@<vrf-name>.
To continue with the previous example:
In addition, you need to create a separate default file in /etc/default for every instance of a DHCP
server and/or relay in a non-default VRF; this is where you set the server and relay options. To run multiple
instances of any of these services, you need a separate file for each instance. The files must be named as
follows:
isc-dhcp-server-<vrf-name>
isc-dhcp-server6-<vrf-name>
isc-dhcp-relay-<vrf-name>
isc-dhcp-relay6-<vrf-name>
See the example configuration below for more details.
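The naming scheme for the per-VRF default files can be captured in a lookup so that automation always writes to the expected path. This is a sketch: dhcp_default_file is a hypothetical helper, and it encodes only the four file name patterns listed above.

```python
import posixpath

# File name prefixes from the list above, keyed by (role, IP version).
DHCP_DEFAULT_PREFIX = {
    ("server", 4): "isc-dhcp-server",
    ("server", 6): "isc-dhcp-server6",
    ("relay", 4): "isc-dhcp-relay",
    ("relay", 6): "isc-dhcp-relay6",
}


def dhcp_default_file(role, ip_version, vrf_name):
    """Return the /etc/default path for a DHCP server or relay instance in a VRF."""
    try:
        prefix = DHCP_DEFAULT_PREFIX[(role, ip_version)]
    except KeyError:
        raise ValueError("role must be server or relay; ip_version must be 4 or 6")
    return posixpath.join("/etc/default", f"{prefix}-{vrf_name}")
```

For example, an IPv6 relay in the turtle VRF maps to /etc/default/isc-dhcp-relay6-turtle.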
Typically a service running in the default VRF owns a port across all VRFs. If the VRF local instance is
preferred, the global one may need to be disabled and stopped first.
VRF is a layer 3 routing feature. It only makes sense to run programs that use AF_INET and AF_INET6
sockets in a VRF. VRF context does not affect any other aspects of the operation of a program.
This method only works with systemd-based services.
Example Configuration
In the following example, there is one IPv4 network with a VRF named rocket and one IPv6 network with a
VRF named turtle.
The IPv4 DHCP server/relay network looks like this:
The IPv6 DHCP server/relay network looks like this:
You can create this configuration using the vrf command (see above (see page 834) for more details):
Or:
Setting the router ID outside of BGP via the router-id option causes all BGP instances to get the
same router ID. If you want each BGP instance to have its own router ID, specify the router-id
under the BGP instance using bgp router-id. If both are specified, the one under the BGP
instance overrides the one provided outside BGP.
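The precedence rule in this note can be stated as a one-line function. The sketch below is purely illustrative: effective_router_ids is a hypothetical name, and router IDs are modeled as plain strings, with None meaning that no bgp router-id is set under that BGP instance.

```python
def effective_router_ids(global_router_id, instance_router_ids):
    """Resolve the router ID each BGP instance ends up using.

    instance_router_ids maps an instance name to its bgp router-id,
    or to None when the instance has no router ID of its own.
    """
    return {
        name: rid if rid is not None else global_router_id
        for name, rid in instance_router_ids.items()
    }


if __name__ == "__main__":
    # vrf1 keeps its own ID; the default instance inherits the global one.
    print(effective_router_ids("10.0.0.1", {"default": None, "vrf1": "6.1.0.6"}))
```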
You cannot configure EVPN address families (see page 539) within a VRF.
Management VRF
Management VRF is a subset of VRF (see page 830) (virtual routing and forwarding) and provides a
separation between the out-of-band management network and the in-band data plane network. For all
VRFs, the main routing table is the default table for all of the data plane switch ports. With management
VRF, a second table, mgmt, is used for routing through the Ethernet ports of the switch. The mgmt name is
special cased to identify the management VRF from a data plane VRF. FIB rules are installed for DNS servers
because this is the typical deployment case.
Cumulus Linux only supports eth0 as the management interface, or eth1, depending on the switch
platform. The Ethernet ports are software-only parts that are not hardware accelerated by switchd. VLAN
subinterfaces, bonds, bridges, and the front panel switch ports are not supported as management
interfaces.
When management VRF is enabled, logins to the switch are set into the management VRF context. IPv4 and
IPv6 networking applications (for example, Ansible, Chef, and apt-get) run by an administrator
communicate out the management network by default. This default context does not impact services run
through systemd and the systemctl command, and does not impact commands examining the state of
the switch, such as the ip command to list links, neighbors, or routes.
The management VRF configurations in this chapter contain a localhost loopback IP address
(127.0.0.1/8). Adding the loopback address to the L3 domain of the management VRF prevents
issues with applications that expect the loopback IP address to exist in the VRF, such as NTP.
Contents
This topic describes ...
Enable Management VRF (see page 860)
Run Services within the Management VRF (see page 861)
Enable Polling with snmpd in a Management VRF (see page 862)
Enable hsflowd (see page 863)
ping or traceroute on the Management VRF (see page 864)
Run Services as a Non-root User (see page 864)
OSPF and BGP (see page 865)
SSH within a Management VRF Context (see page 866)
View the Routing Tables (see page 866)
mgmt Interface Class (see page 867)
Management VRF and DNS (see page 868)
Incompatibility with cl-ns-mgmt (see page 869)
The management VRF must be named mgmt to differentiate from a data plane VRF.
The NCLU commands above create the following snippets in the /etc/network/interfaces file:
...
auto eth0
iface eth0 inet dhcp
vrf mgmt
...
auto mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
...
When you commit the change to add the management VRF, all connections over eth0 are
dropped. This can impact any automation that might be running, such as Ansible or Puppet
scripts.
If you take down the management VRF using ifdown, to bring it back up you need to do one of two things:
Use ifup --with-depends <vrf>
Use ifreload -a
For example:
Running ifreload -a disconnects the session for any interface configured as auto.
1. Configure the management VRF as described in the Enabling Management VRF section above (see
page 859).
2. If NTP is running, stop the service:
After you enable ntp@mgmt, you can verify that NTP peers are active:
The message Duplicate IPv4 address detected, some interfaces may not be
visible in IP-MIB displays after starting snmpd in the mgmt VRF. This is because the IP-MIB
assumes the same IP address cannot be used twice on the same device; the IP-MIB is not VRF
aware. This message is a warning that the SNMP IP-MIB detects overlapping IP addresses on the
system; it does not indicate a problem and is non-impacting to the operation of the switch.
Enable hsflowd
If you are using sFlow to monitor traffic in the management VRF, you need to complete the following steps
to enable sFlow.
1. Add the hsflowd process to the systemd configuration file in /etc/vrf. Edit the
/etc/vrf/systemd.conf file with a text editor.
3. Disable hsflowd to ensure it does not start in the default VRF if the system is rebooted:
Or:
1. Copy the original service file to its new name and store the file in /etc/systemd/system.
2. If there is a User directive, comment it out. If it exists, you can find it under [Service].
[Unit]
Description=Example
Documentation=https://www.example.io/
[Service]
#User=username
ExecStart=/usr/local/bin/myservice agent -data-dir=/tmp/myservice -bind=192.168.0.11
[Install]
WantedBy=multi-user.target
3. Modify the ExecStart line to /usr/bin/vrf exec mgmt /sbin/runuser -u USER -- COMMAND.
For example, to have the cumulus user run the foocommand:
[Unit]
Description=Example
Documentation=https://www.example.io/
[Service]
#User=username
ExecStart=/usr/bin/vrf task exec mgmt /sbin/runuser -u cumulus -- foocommand
[Install]
WantedBy=multi-user.target
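If several unit files need the same treatment, the ExecStart line can be generated. This sketch reproduces the form used in the example unit above (vrf task exec followed by runuser); vrf_exec_start is a hypothetical helper, and it deliberately rejects arguments containing whitespace rather than attempting systemd quoting.

```python
def vrf_exec_start(vrf, user, command):
    """Build an ExecStart line that runs command as user inside the given VRF."""
    parts = ["/usr/bin/vrf", "task", "exec", vrf,
             "/sbin/runuser", "-u", user, "--"] + list(command)
    for part in parts:
        # Keep the sketch simple: no quoting rules, so no embedded whitespace.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"unsupported argument: {part!r}")
    return "ExecStart=" + " ".join(parts)


if __name__ == "__main__":
    print(vrf_exec_start("mgmt", "cumulus", ["foocommand"]))
```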
^O
^X
cumulus@switch:~$
This also creates a route on the neighbor device to the management network through the data
plane, which might not be desired.
Cumulus Networks recommends you always use route maps to control the advertised networks
redistributed by the redistribute connected command. For example, you can specify a route map to
redistribute routes in this way (for both BGP and OSPF):
These commands produce the following configuration snippet in the /etc/frr/frr.conf file:
<routing protocol>
redistribute connected route-map REDISTRIBUTE-CONNECTED
If you use ip route get to return information about a single route, the command resolves over the mgmt
table by default. To obtain information about the route in the switching silicon, use:
To get the route for any VRF, run the following command:
The management VRF interface class is not supported if you are configuring Cumulus Linux using
NCLU (see page 88).
You configure the management interface in the /etc/network/interfaces file. In the example below,
the management interface, eth0 and the management VRF stanzas are added to the mgmt interface class:
auto lo
iface lo inet loopback
allow-mgmt eth0
iface eth0 inet dhcp
vrf mgmt
allow-mgmt mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
When you run ifupdown2 commands against the interfaces in the mgmt class, include --allow=mgmt
with the commands. For example, to see which interfaces are in the mgmt interface class, run:
You can still bring the management interface up and down using ifup eth0 and ifdown eth0.
Nameservers configured through DHCP are updated automatically. Statically configured nameservers
(configured in the /etc/resolv.conf file) only get updated when you run ifreload -a.
Because DNS lookups are forced out of the management interface using FIB rules, this might
affect data plane ports if overlapping addresses are used. For example, when the DNS server IP
address is learned over the management VRF, a FIB rule is created for that IP address. When
DHCP relay is configured for the same IP address, a DHCP discover packet received on the front
panel port is forwarded out of the management interface (eth0) even though a route is present
out the front-panel port.
If you do not specify a DNS server and you lose in-band connectivity, DNS does not work through the
management VRF. Cumulus Linux does not assume all DNS servers are reachable through the
management VRF.
Management VRF has replaced the management namespace functionality in Cumulus Linux. The
management namespace feature (used with the cl-ns-mgmt utility) has been deprecated, and
the cl-ns-mgmt command has been removed.
GRE Tunneling
Generic Routing Encapsulation (GRE) is a tunneling protocol that encapsulates network layer protocols
inside virtual point-to-point links over an Internet Protocol network. The two endpoints are identified by the
tunnel source and tunnel destination addresses at each endpoint.
GRE packets travel directly between the two endpoints through a virtual tunnel. As a packet comes across
other routers, there is no interaction with its payload; the routers only parse the outer IP packet. When the
packet reaches the endpoint of the GRE tunnel, the outer packet is de-encapsulated, the payload is parsed,
then forwarded to its ultimate destination.
GRE uses multiple protocols over a single-protocol backbone and is less demanding than some of the
alternative solutions, such as VPN. You can use GRE to transport protocols that the underlying network
does not support, work around networks with limited hops, connect non-contiguous subnets, and allow
VPNs across wide area networks.
Notes
GRE Tunneling is supported for Mellanox (Spectrum ASIC) switches only.
Only static routes are supported as a destination for the tunnel interface.
IPv6 endpoints are not supported.
The following example shows two sites that use IPv4 addresses. Using GRE tunneling, the two end points
can encapsulate an IPv4 or IPv6 payload inside an IPv4 packet. The packet is routed based on the
destination in the outer IPv4 header.
Contents
This topic describes ...
Contents (see page 870)
Configure GRE Tunneling (see page 870)
Verify GRE Tunnel Settings (see page 872)
Delete a GRE Tunnel Interface (see page 872)
Change GRE Tunnel Settings (see page 872)
1. Create a tunnel interface by specifying an interface name, the tunnel mode as gre, the source (local)
and destination (remote) underlay IP address, and the ttl (optional).
2. Bring the GRE tunnel interface up.
3. Assign an IP address to the tunnel interface.
4. Add route entries to encapsulate the packets using the tunnel interface.
The following configuration example shows the commands used to set up a bidirectional GRE tunnel
between two endpoints: Tunnel-R1 and Tunnel-R2.
The local tunnel endpoint for Tunnel-R1 is 10.0.0.9 and the remote endpoint is 10.0.0.2.
The local tunnel endpoint for Tunnel-R2 is 10.0.0.2 and the remote endpoint is 10.0.0.9.
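The four steps above map onto standard iproute2 commands. The sketch below generates them as strings so they can be reviewed before being run; gre_tunnel_commands is a hypothetical helper, the use of ip link set, ip addr add, and ip route add for steps 2 to 4 is an assumption about how the steps are carried out, and the sample values are taken from the Tunnel-R2 interfaces example later in this section.

```python
import ipaddress


def gre_tunnel_commands(name, local, remote, address, routes, ttl=255):
    """Return iproute2 commands for the four GRE tunnel configuration steps."""
    # IPv6 endpoints are not supported, so insist on IPv4 underlay addresses.
    ipaddress.IPv4Address(local)
    ipaddress.IPv4Address(remote)
    create = f"ip tunnel add {name} mode gre local {local} remote {remote}"
    if ttl is not None:          # the ttl is optional
        create += f" ttl {ttl}"
    commands = [
        create,                              # 1. create the tunnel interface
        f"ip link set {name} up",            # 2. bring the interface up
        f"ip addr add {address} dev {name}", # 3. assign the overlay address
    ]
    # 4. route the prefixes that should be encapsulated through the tunnel
    commands += [f"ip route add {prefix} dev {name}" for prefix in routes]
    return commands


if __name__ == "__main__":
    for command in gre_tunnel_commands("Tunnel-R1", "10.0.0.2", "10.0.0.9",
                                       "10.0.200.1/24", ["10.0.200.0/24"]):
        print(command)
```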
Tunnel-R1 commands:
Tunnel-R2 commands:
To apply the GRE tunnel configuration automatically at reboot, instead of running the commands from the
command line (as above), you can add the following commands directly in the
/etc/network/interfaces file.
# Tunnel-R2 configuration
auto swp1 #underlay interface for tunnel
iface swp1
link-speed 10000
link-duplex full
link-autoneg off
address 10.0.0.2/24
auto Tunnel-R1 #overlay interface for tunnel
iface Tunnel-R1 inet static
address 10.0.200.1/24
pre-up ip tunnel add Tunnel-R1 mode gre local 10.0.0.2 remote 10.0.0.9 ttl 255
post-up ip route add 10.0.200.0/24 dev Tunnel-R1
post-down ip tunnel del Tunnel-R1
For more information about the pre-up, post-up, and post-down commands, run the man interfaces
command.
You can delete a GRE tunnel directly from the /etc/network/interfaces file instead of using
the ip tunnel del command. Make sure you run the ifreload -a command after you
update the interfaces file.
You can make changes to GRE tunnel settings directly in the /etc/network/interfaces file
instead of using the ip tunnel change command. Make sure you run the ifreload -a
command after you update the interfaces file.
Contents
This topic describes ...
PIM Overview (see page 874)
PIM Messages (see page 875)
PIM Neighbors (see page 877)
PIM Sparse Mode (PIM-SM) (see page 878)
Any-source Multicast Routing (see page 878)
PIM Null-Register (see page 882)
PIM and ECMP (see page 882)
Configure PIM (see page 883)
Configure PIM Using FRRouting (see page 884)
Example Configurations (see page 885)
Source Specific Multicast Mode (SSM) (see page 888)
IP Multicast Boundaries (see page 889)
Multicast Source Discovery Protocol (MSDP) (see page 889)
Verify PIM (see page 891)
Source Starts First (see page 891)
Receiver Joins First (see page 893)
PIM in a VRF (see page 894)
BFD for PIM Neighbors (see page 897)
Troubleshooting (see page 897)
FHR Stuck in Registering Process (see page 897)
No *,G Is Built on LHR (see page 899)
No mroute Created on FHR (see page 899)
PIM Overview
Network Element: Description
First Hop Router (FHR): The FHR is the router attached to the source. The FHR is responsible for the PIM register process.
Last Hop Router (LHR): The LHR is the last router in the path, attached to an interested multicast receiver. There is a single LHR for each network subnet with an interested receiver; however, multicast groups can have multiple LHRs throughout the network.
Rendezvous Point (RP): The RP allows for the discovery of multicast sources and multicast receivers. The RP is responsible for sending PIM Register Stop messages to FHRs. The PIM RP address must be globally routable.
Do not use a spine switch as an RP. If you are running BGP (see page 756) on a spine switch and it is configured for allow-as in origin, BGP does not accept routes learned through other spines that do not originate on the spine itself. The RP must route to a multicast source. During a single failure scenario, this is not possible if the RP is on the spine. This also applies to Multicast Source Discovery Protocol (MSDP; see below (see page 873)).
PIM Shared Tree (RP Tree) or (*,G) Tree: The Shared Tree is the multicast tree rooted at the RP. When receivers want to join a multicast group, join messages are sent along the shared tree towards the RP.
PIM Shortest Path Tree (SPT) or (S,G) Tree: The SPT is the multicast tree rooted at the multicast source for a given group. Each multicast source has a unique SPT. The SPT can match the RP Tree, but this is not a requirement. The SPT represents the most efficient way to send multicast traffic from a source to the interested receivers.
Outgoing Interface (OIF): The outgoing interface indicates the interface on which a PIM or multicast packet is sent out. OIFs are the interfaces towards the multicast receivers.
Incoming Interface (IIF): The incoming interface indicates the interface on which a multicast packet is received. An IIF can be the interface towards the source or towards the RP.
Reverse Path Forwarding Interface (RPF Interface): The reverse path forwarding interface is the path used to reach the RP or source. There must be a valid PIM neighbor to determine the RPF unless directly connected to the source.
Multicast Route (mroute): A multicast route indicates the multicast source and multicast group as well as associated OIFs, IIFs, and RPF information.
Star-G mroute (*,G): The (*,G) mroute represents the RP Tree. The * is a wildcard indicating any multicast source. The G is the multicast group. An example (*,G) is (*, 239.1.2.9).
S-G mroute (S,G): This is the mroute representing the source entry. The S is the multicast source IP. The G is the multicast group. An example (S,G) is (10.1.1.1, 239.1.2.9).
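The difference between a (*,G) and an (S,G) mroute can be modeled with a small data structure in which the source is optional. This is a conceptual sketch, not how FRRouting stores mroutes; the MrouteKey class and its fields are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class MrouteKey:
    """Identify an mroute by group and, optionally, source."""
    group: IPv4Address
    source: Optional[IPv4Address] = None  # None stands for the * wildcard

    def __post_init__(self):
        if not self.group.is_multicast:
            raise ValueError(f"{self.group} is not a multicast group address")

    @property
    def is_star_g(self) -> bool:
        """True for a (*,G) entry, which represents the RP Tree."""
        return self.source is None

    def __str__(self) -> str:
        source = "*" if self.source is None else str(self.source)
        return f"({source}, {self.group})"


if __name__ == "__main__":
    print(MrouteKey(IPv4Address("239.1.2.9")))
    print(MrouteKey(IPv4Address("239.1.2.9"), IPv4Address("10.1.1.1")))
```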
PIM Messages
PIM Message: Description
PIM Hello: PIM hellos announce the presence of a multicast router on a segment. PIM hellos are sent every 30 seconds by default.
PIM Join/Prune (J/P): PIM J/P messages indicate the groups that a multicast router would like to receive or no longer receive. PIM join/prune messages are often described as distinct message types, but are actually a single PIM message with a list of groups to join and a second list of groups to leave. PIM J/P messages can join or prune from the SPT or RP trees (also called (*,G) joins or (S,G) joins).
PIM join/prune messages are sent to PIM neighbors on individual interfaces. Join/prune messages are never unicast.
This PIM join/prune is for group 239.1.1.9, with 1 join and 0 prunes for the group. Join/prunes for multiple groups can exist in a single packet.
PIM Register: PIM register messages are unicast packets sent from an FHR destined to the RP to advertise a multicast group. The FHR fully encapsulates the original multicast packet in PIM register messages. The RP is responsible for decapsulating the PIM register message and forwarding it along the (*,G) tree towards the receivers.
PIM Null Register: PIM null register is a special type of PIM register message where the Null-Register flag is set within the packet. Null register messages are used for an FHR to signal to an RP that a source is still sending multicast traffic. Unlike normal PIM register messages, null register messages do not encapsulate the original data packet.
PIM Register Stop: PIM register stop messages are sent by an RP to the FHR to indicate that PIM register messages must no longer be sent.
IGMP Membership Report (IGMP Join): IGMP membership reports are sent by multicast receivers to tell multicast routers of their interest in a specific multicast group. IGMP join messages trigger PIM *,G joins. IGMP version 2 queries are sent to the all hosts multicast address, 224.0.0.1. IGMP version 2 reports (joins) are sent to the group's multicast address. IGMP version 3 reports are sent to an IGMP v3 specific multicast address, 224.0.0.22.
IGMP Leave: IGMP leaves tell a multicast router that a multicast receiver no longer wants the multicast group. IGMP leave messages trigger PIM *,G prunes.
PIM Neighbors
When PIM is configured on an interface, PIM Hello messages are sent to the link-local multicast group
224.0.0.13. Any other router configured with PIM on the segment that hears the PIM Hello messages builds
a PIM neighbor with the sending device.
PIM neighbors are stateless. No confirmation of neighbor relationship is exchanged between PIM
endpoints.
This behavior is in contrast to PIM Dense Mode (PIM-DM), where traffic is flooded, and the
network must be periodically notified that the receiver wants to stop receiving the multicast
stream.
PIM-SM has three configuration options: Any-source Multicast (ASM), Bi-directional Multicast (BiDir), and
Source Specific Multicast (SSM):
Any-source Multicast (ASM) is the traditional and most commonly deployed PIM implementation.
ASM relies on rendezvous points to connect multicast senders and receivers, which then dynamically
determine the shortest path through the network between source and receiver to send multicast
traffic efficiently.
Bidirectional PIM (BiDir) forwards all traffic through the multicast rendezvous point (RP) instead of
tracking multicast source IPs, allowing for greater scale while resulting in inefficient forwarding of
network traffic.
Source Specific Multicast (SSM) requires multicast receivers to know exactly from which source they
want to receive multicast traffic instead of relying on multicast rendezvous points. SSM requires the
use of IGMPv3 on the multicast clients.
Cumulus Linux only supports ASM and SSM. PIM BiDir is not currently supported.
For additional information, see RFC 7761 - Protocol Independent Multicast - Sparse Mode.
This creates a (*,G) mroute with an OIF of the interface on which the IGMP Membership Report is received
and an IIF of the RPF interface for the RP.
The LHR generates a PIM (*,G) join message and sends it from the interface towards the RP. Each multicast
router between the LHR and the RP builds a (*,G) mroute with the OIF being the interface on which the PIM
join message is received and an Incoming Interface of the reverse path forwarding interface for the RP.
When the RP receives the (*,G) Join message, it does not send any additional PIM join messages.
The RP will maintain a (*,G) state as long as the receiver wishes to receive the multicast group.
Unlike multicast receivers, multicast sources do not send IGMP (or PIM) messages to the FHR. A
multicast source begins sending, and the FHR receives the traffic and builds both a (*,G) and an (S,
G) mroute. The FHR then begins the PIM register process.
You can view the configured prefix-list with the net show mroute command:
In the example above, 235.0.0.0 is configured for SPT switchover, identified by pimreg.
Receiving a PIM register stop without any associated PIM joins leaves the FHR without any outgoing
interfaces. The FHR drops this multicast traffic until a PIM join is received.
PIM register messages are sourced from the interface that receives the multicast traffic and are
destined to the RP address. The PIM register is not sourced from the interface towards the RP.
PIM Null-Register
To notify the RP that multicast traffic is still flowing when the RP has no receiver, or if the RP is not on the
SPT tree, the FHR periodically sends PIM null register messages. The FHR sends a PIM register with the Null-
Register flag set, but without any data. This special PIM register notifies the RP that a multicast source is still
sending, in case any new receivers come online.
After receiving a PIM Null-Register, the RP immediately sends a PIM register stop to acknowledge the
reception of the PIM null register message.
The ip pim ecmp rebalance command recalculates all stream paths in the event of a loss of path over
one of the ECMP paths. Without this command, only the streams that are using the path that is lost are
moved to alternate ECMP paths. Rebalance does not affect existing groups.
The show ip pim nexthop command lets you review which nexthop is selected for a specific
source/group:
Configure PIM
To configure PIM using NCLU:
PIM must be enabled on all interfaces facing multicast sources or multicast receivers, as
well as on the interface where the RP address is configured.
2. Optional: Run the following command to enable IGMP (either version 2 or 3) on the interfaces with
hosts attached. IGMP version 3 is the default, so you only need to specify the version if you want to
use IGMP version 2:
You must configure IGMP on all interfaces where multicast receivers exist.
Unless you are using PIM SSM, each PIM-SM enabled device must configure a static RP to a
group mapping, and all PIM-SM enabled devices must have the same RP to group mapping
configuration.
IP PIM RP group ranges can overlap. Cumulus Linux performs a longest prefix match (LPM)
to determine the RP. For example:
In this example, if the group is 224.10.2.5, the RP that gets selected is 192.168.0.2. If the
group is 224.10.15, the RP that gets selected is 192.168.0.1.
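The longest prefix match described above can be sketched in a few lines of Python. This is only an illustration, not how Cumulus Linux or FRRouting implements the lookup: the two group ranges in RP_MAPPINGS are assumptions chosen so that the more specific range wins, and select_rp is a hypothetical helper name.

```python
import ipaddress

# Hypothetical RP-to-group mappings. The ranges are assumptions for illustration;
# only the two RP addresses come from the text above.
RP_MAPPINGS = {
    "224.10.0.0/16": "192.168.0.1",
    "224.10.2.0/24": "192.168.0.2",
}


def select_rp(group, mappings=RP_MAPPINGS):
    """Return the RP whose group range is the longest prefix containing group."""
    addr = ipaddress.ip_address(group)
    best = None  # (network, rp) of the most specific match found so far
    for prefix, rp in mappings.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, rp)
    return best[1] if best else None


print(select_rp("224.10.2.5"))  # matches both ranges; the /24 is more specific
print(select_rp("224.10.1.5"))  # matches only the /16
```

A group outside every configured range returns None in this sketch, which corresponds to a group with no RP mapping.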
zebra=yes
pimd=yes
4. In a terminal, run the vtysh command to start the FRRouting CLI on the switch.
PIM must be enabled on all interfaces facing multicast sources or multicast receivers, as
well as on the interface where the RP address is configured.
884 09 January 2019
Cumulus Networks
6. Optional: Run the following commands to enable IGMP (either version 2 or 3) on the interfaces with
hosts attached. IGMP version 3 is the default; you only need to specify the version if you want to use
IGMP version 2:
You must configure IGMP on all interfaces where multicast receivers exist.
Each PIM-SM enabled device must configure a static RP to a group mapping, and all PIM-
SM enabled devices must have the same RP to group mapping configuration.
IP PIM RP group ranges can overlap. Cumulus Linux performs a longest prefix match (LPM)
to determine the RP. For example:
Example Configurations
Complete Multicast Network Configuration Example
The following is an example configuration:
RP Configuration
!
interface lo
description RP Address interface
ip ospf area 0.0.0.0
ip pim sm
!
interface swp1
description interface to FHR
ip ospf area 0.0.0.0
ip ospf network point-to-point
ip pim sm
!
interface swp2
description interface to LHR
ip ospf area 0.0.0.0
ip ospf network point-to-point
ip pim sm
!
router ospf
ospf router-id 192.168.0.1
!
line vty
!
end
FHR Configuration
interface swp50
description interface to LHR
ip ospf area 0.0.0.0
ip ospf network point-to-point
ip pim sm
!
router ospf
ospf router-id 192.168.1.1
!
line vty
!
end
LHR Configuration
end
PIM considers 232.0.0.0/8 the default range if the SSM range is not configured. If this default is
overridden with a prefix-list, all ranges that should be considered must be in the prefix-list.
You can also perform this configuration with the FRRouting CLI:
IP Multicast Boundaries
Multicast boundaries enable you to limit the distribution of multicast traffic by setting boundaries with the
goal of pushing multicast to a subset of the network.
With such boundaries in place, any incoming IGMP or PIM joins are dropped or accepted based upon the
prefix-list specified. The boundary is implemented by applying an IP multicast boundary OIL (outgoing
interface list) on an interface.
To configure the boundary, use NCLU:
Cumulus Linux MSDP support is primarily for anycast-RP configuration, rather than multiple
multicast domains. You must configure each MSDP peer in a full mesh, as SA messages are not
received and re-forwarded.
The following steps demonstrate how to configure a Cumulus switch to use the MSDP:
1. Add an anycast IP address to the loopback interface for each RP in the domain:
2. On every multicast switch, configure the group to RP mapping using the anycast address:
3. Configure the MSDP mesh group for all active RPs (the following example uses 3 RPs):
The mesh group must include all RPs in the domain as members, with a unique address as
the source. This configuration results in MSDP peerings between all RPs.
4. Pick the local loopback address as the source of the MSDP control packets:
If the network is unnumbered and uses unnumbered BGP as the IGP, avoid using the anycast IP
address for establishing unicast or multicast peerings. For PIM-SM, ensure that the unique
address is used as the PIM hello source by setting the source:
Verify PIM
The following outputs are based on the Cumulus Reference Topology with cldemo-pim.
On the RP, no mroute state is created, but the net show pim upstream output includes the S,G:
As a receiver joins the group, the mroute output interface on the FHR transitions from none to the RPF
interface of the RP:
On the RP:
PIM in a VRF
VRFs (see page 830) divide the routing table on a per-tenant basis, ultimately providing for separate layer 3
networks over a single layer 3 infrastructure. With a VRF, each tenant has its own virtualized layer 3
network, so IP addresses can overlap between tenants.
PIM in a VRF enables PIM trees and multicast data traffic to run inside a layer 3 virtualized network, with a
separate tree per domain or tenant. Each VRF has its own multicast tree with its own RP(s), sources, and so
on. Therefore, for example, you can have one tenant per corporate division, client, or product.
VRFs on different switches typically connect or are peered over subinterfaces, where each subinterface is in
its own VRF, provided MP-BGP VPN is not enabled or supported.
To configure PIM in a VRF, run the following commands. First, add the VRFs and associate them with switch
ports:
Then add the PIM configuration to FRR, review and commit the changes:
These commands create the following configuration in the /etc/network/interfaces file and the
/etc/frr/frr.conf file:
auto purple
iface purple
vrf-table auto
auto blue
iface blue
vrf-table auto
auto swp1
iface swp1
vrf purple
auto swp49.1
iface swp49.1
vrf purple
auto swp2
iface swp2
vrf blue
auto swp49.2
iface swp49.2
vrf blue
...
vrf purple
ip pim rp 192.168.0.1 224.0.0.0/4
!
vrf blue
ip pim rp 192.168.0.1 224.0.0.0/4
!
int swp49.2
ip pim sm
Troubleshooting
1. Validate that the FHR can reach the RP. If the RP and FHR cannot communicate, the registration
process fails:
2. On the RP, use tcpdump to see if the PIM register packets are arriving:
3. If PIM registration packets are being received, verify that they are seen by PIM by issuing debug
pim packets from within FRRouting:
4. Repeat the process on the FHR to see if PIM register stop messages are being received on the FHR
and passed to the PIM process:
To troubleshoot this issue, if both PIM and IGMP are enabled, ensure that IGMPv3 joins are being sent by
the receiver:
3. If PIM is configured, verify that the RPF interface for the source matches the interface on which the
multicast traffic is received:
This is expected behavior. You can see the active source on the RP with the show ip pim upstream
command:
For Mellanox chipsets, refer to TCAM Resource Profiles for Mellanox Switches (see page 708).
Monitoring and Troubleshooting
Contents
This topic describes ...
Serial Console (see page 903)
Configure the Serial Console on ARM Switches (see page 903)
Configure the Serial Console on x86 Switches (see page 904)
Show General System Information (see page 905)
Diagnostics Using cl-support (see page 905)
Send Log Files to a syslog Server (see page 906)
NCLU (see page 906)
Log Technical Details (see page 906)
Local Logging (see page 907)
Enable Remote syslog (see page 908)
Write to syslog with Management VRF Enabled (see page 909)
Rate-limit syslog Messages (see page 909)
Harmless syslog Error: Failed to reset devices.list (see page 910)
Syslog Troubleshooting Tips (see page 910)
Next Steps (see page 913)
Serial Console
The serial console can be a useful tool for debugging issues, especially when you find yourself rebooting the
switch often or if you don’t have a reliable network connection.
The default serial console baud rate is 115200, which is the baud rate ONIE uses.
You must reboot the switch for the baud rate change to take effect.
Incorrect configuration settings in grub can cause the switch to be inaccessible via the console.
Grub changes should be carefully reviewed before implementation.
1. Edit /etc/default/grub. The two relevant lines in /etc/default/grub are as follows; replace
the 115200 value with a valid value specified above in the --speed variable in the first line and in
the console variable in the second line:
2. After you save your changes to the grub configuration, type the following at the command prompt:
cumulus@switch:~$ update-grub
3. If you plan on accessing your switch's BIOS over the serial console, you need to update the baud rate
in the switch BIOS. For more information, see this knowledge base article.
4. Reboot the switch.
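The two substitutions described in step 1 can be sketched as a small Python helper. The sample lines in SAMPLE are illustrative assumptions about what the two relevant lines in /etc/default/grub look like (the actual lines are not reproduced in this section), set_baud_rate is a hypothetical helper, and 9600 is only a placeholder; check it against the list of valid baud rates before using it.

```python
import re

# Assumed shape of the two relevant lines; the port, tty name and extra
# parameters are illustrative and differ between platforms.
SAMPLE = (
    'GRUB_SERIAL_COMMAND="serial --port=0x2f8 --speed=115200 --word=8 --parity=no --stop=1"\n'
    'GRUB_CMDLINE_LINUX="console=ttyS1,115200n8"\n'
)


def set_baud_rate(grub_text, baud):
    """Replace the baud rate in the --speed and console= settings."""
    # \g<1> keeps the captured prefix and avoids ambiguity with the digits that follow.
    grub_text = re.sub(r"(--speed=)\d+", rf"\g<1>{baud}", grub_text)
    grub_text = re.sub(r"(console=ttyS\d+,)\d+", rf"\g<1>{baud}", grub_text)
    return grub_text


print(set_baud_rate(SAMPLE, 9600))
```

The sketch only transforms text in memory. Writing the result back to /etc/default/grub, running update-grub and rebooting remain manual steps, which keeps a review of the change possible before it can lock you out of the console.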
For general information about the switch, run net show system, which gathers information about the
switch from a number of files in the system:
Args:
[reason]: Optional reason to give for invoking cl-support.
Saved into tarball's cmdline.args file.
Options:
-h: Print this usage statement
-s: Security sensitive collection
-t: User filename tag
-v: Verbose
-e MODULES: Enable modules. Comma separated module list (run with -e
help for module names)
-d MODULES: Disable modules. Comma separated module list (run with -d
help for module names)
NCLU
The remote syslog server can be configured on the switch using the following configuration:
cumulus@switch:~$ net add syslog host ipv4 192.168.0.254 port udp 514
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
This creates a file called /etc/rsyslog.d/11-remotesyslog.conf in the rsyslog directory. The file
has the following content:
NCLU cannot configure a remote syslog if management VRF is enabled on the switch. Refer to
Writing to syslog with Management VRF Enabled (see page 909) below.
There are applications in Cumulus Linux that could write directly to a log file without going through
rsyslog. These files are typically located in /var/log/.
All Cumulus Linux rules are stored in separate files in /etc/rsyslog.d/, which are called at the
end of the GLOBAL DIRECTIVES section of /etc/rsyslog.conf. As a result, the RULES
section at the end of rsyslog.conf is ignored because the messages have to be processed by
the rules in /etc/rsyslog.d and then dropped by the last line in /etc/rsyslog.d/99-syslog.conf.
Local Logging
Most logs within Cumulus Linux are sent through rsyslog, which then writes them to files in the
/var/log directory. There are default rules in the /etc/rsyslog.d/ directory that define where the logs are
written:
Rule  Purpose
10-rules.conf  Sets defaults for log messages, including log format and log rate limits.
15-crit.conf  Logs crit, alert or emerg log messages to /var/log/crit.log to ensure they are not rotated away rapidly.
20-clagd.conf  Logs clagd messages to /var/log/clagd.log for MLAG (see page 427).
22-linkstate.conf  Logs link state changes for all physical and logical network links to /var/log/linkstate.
30-ptmd.conf  Logs ptmd messages to /var/log/ptmd.log for Prescription Topology Manager (see page 348).
35-rdnbrd.conf  Logs rdnbrd messages to /var/log/rdnbrd.log for redistribute neighbor (see page 821).
40-netd.conf  Logs netd messages to /var/log/netd.log for NCLU (see page 88).
45-frr.conf  Logs routing protocol messages to /var/log/frr/frr.log. This includes BGP and OSPF log messages.
99-syslog.conf  All remaining processes that use rsyslog are sent to /var/log/syslog.
Log files that are rotated are compressed into an archive. Processes that do not use rsyslog write to their
own log files within the /var/log directory. For more information on specific log files, see Troubleshooting
Log Files (see page 941).
1. Create a file in /etc/rsyslog.d/. Make sure it starts with a number lower than 99 so that it
executes before log messages are dropped in, such as 20-clagd.conf or 25-switchd.conf.
Our example file is called /etc/rsyslog.d/11-remotesyslog.conf. Add content similar to the
following:
@192.168.1.2:514
This configuration sends log messages to a remote syslog server for the following processes:
clagd, switchd, ptmd, rdnbrd, netd and syslog. It follows the same syntax as the /var/log/syslog
file, where @ indicates UDP, 192.168.1.2 is the IP address of the syslog server, and 514 is
the UDP port.
The numbering of the files in /etc/rsyslog.d/ dictates how the rules are installed into
rsyslog.d. If you want to remotely log only the messages in /var/log/syslog, and not
those in /var/log/clagd.log or /var/log/switchd.log, for instance, then name
the file 98-remotesyslog.conf, because it sorts ahead of only the 99-syslog.conf file
that writes /var/log/syslog.
Do not use the imfile module with any file written by rsyslogd.
2. Restart rsyslog.
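The effect of the file number chosen in step 1 can be illustrated with a short Python sketch. It assumes that the rule files are processed in sorted name order, and it uses the default rule file names listed under Local Logging; rules_after is a hypothetical helper, not part of rsyslog.

```python
# Default rule files named in the Local Logging table.
DEFAULT_RULES = [
    "10-rules.conf", "15-crit.conf", "20-clagd.conf", "22-linkstate.conf",
    "30-ptmd.conf", "35-rdnbrd.conf", "40-netd.conf", "45-frr.conf",
    "99-syslog.conf",
]


def rules_after(existing, new_file):
    """Return the rule files that sort after new_file.

    Messages that these later files would otherwise consume are still visible
    to the rule in new_file, because new_file is processed first.
    """
    ordered = sorted(existing + [new_file])
    return ordered[ordered.index(new_file) + 1:]


# A low number sees nearly everything; 98 only precedes 99-syslog.conf.
print(rules_after(DEFAULT_RULES, "11-remotesyslog.conf"))
print(rules_after(DEFAULT_RULES, "98-remotesyslog.conf"))
```

Plain string sorting is enough here only because every file name starts with a two-digit number.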
For each syslog server, configure a unique action line. For example, to configure two syslog servers at
192.168.0.254 and 10.0.0.1:
module(load="imuxsock"
SysSock.RateLimit.Interval="2" SysSock.RateLimit.Burst="50")
The following test script shows an example of rate-limit output in Cumulus Linux ...
root@leaf1:mgmt-vrf:/home/cumulus# ./syslog.py
Sending 100 Messages...
DONE.
root@leaf1:mgmt-vrf:/home/cumulus# tail -n 60 /var/log/syslog
2017-02-22T19:59:50.043342+00:00 leaf1 syslog.py[22830]: Message Number:0
2017-02-22T19:59:50.043723+00:00 leaf1 syslog.py[22830]: Message Number:1
2017-02-22T19:59:50.043941+00:00 leaf1 syslog.py[22830]: Message Number:2
2017-02-22T19:59:50.044565+00:00 leaf1 syslog.py[22830]: Message Number:3
2017-02-22T19:59:50.044830+00:00 leaf1 syslog.py[22830]: Message Number:4
2017-02-22T19:59:50.045680+00:00 leaf1 syslog.py[22830]: Message Number:5
<...snip...>
2017-02-22T19:59:50.056727+00:00 leaf1 syslog.py[22830]: Message Number:45
2017-02-22T19:59:50.057599+00:00 leaf1 syslog.py[22830]: Message Number:46
2017-02-22T19:59:50.057741+00:00 leaf1 syslog.py[22830]: Message Number:47
2017-02-22T19:59:50.057936+00:00 leaf1 syslog.py[22830]: Message Number:48
2017-02-22T19:59:50.058125+00:00 leaf1 syslog.py[22830]: Message Number:49
2017-02-22T19:59:50.058324+00:00 leaf1 rsyslogd-2177: imuxsock[pid 22830]: begin to drop messages due to rate-limiting
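The output above is consistent with a limit of 50 messages per 2-second interval: messages 0 through 49 are logged and the rest are dropped. The following Python sketch models that behavior with a simple fixed-window counter. It is a simplified model for illustration, not the actual rsyslog implementation, and simulate_rate_limit is a hypothetical name.

```python
def simulate_rate_limit(timestamps, interval=2.0, burst=50):
    """Return (accepted, dropped) counts under a fixed-window limiter.

    At most `burst` messages are accepted in each window of `interval`
    seconds; a new window starts with the first message after the old one ends.
    """
    window_start = None
    count = 0
    accepted = dropped = 0
    for ts in timestamps:
        if window_start is None or ts - window_start >= interval:
            window_start = ts
            count = 0
        if count < burst:
            count += 1
            accepted += 1
        else:
            dropped += 1
    return accepted, dropped


# 100 messages sent within a few milliseconds, similar to the test script run.
stamps = [i * 0.0002 for i in range(100)]
print(simulate_rate_limit(stamps))
```

Spreading the same 100 messages over several intervals would let more of them through, which is why the limit mainly affects bursts.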
This message is harmless, and can be ignored. It is logged when systemd attempts to change cgroup
attributes that are read only. The upstream version of systemd has been modified to not log this message
by default.
The systemctl daemon-reload command is often issued when Debian packages are installed, so the
message may be seen multiple times when upgrading packages.
After correcting the invalid syntax, issuing the sudo rsyslogd -N1 command produces the following
output.
tcpdump
If a syslog server is not accessible to validate that syslog messages are being exported, you can use
tcpdump.
In the following example, a syslog server has been configured at 192.168.0.254 for UDP syslogs on port
514:
A simple way to generate syslog messages is to use sudo in another session, such as sudo date. Using
sudo generates an authpriv log.
To see the contents of the syslog file, use the tcpdump -X option:
0x0050: 3120 3b20 5057 443d 2f68 6f6d 652f 6375 1.;.PWD=/home/cu
0x0060: 6d75 6c75 7320 3b20 5553 4552 3d72 6f6f mulus.;.USER=roo
0x0070: 7420 3b20 434f 4d4d 414e 443d 2f62 696e t.;.COMMAND=/bin
0x0080: 2f64 6174 65 /date
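To read the hex column of the dump rather than the ASCII column, where tcpdump prints spaces as dots, the bytes can be decoded directly. The Python sketch below decodes the four rows shown above; decode_rows is a hypothetical helper name.

```python
# Hex columns copied from the dump rows at offsets 0x0050 through 0x0080.
HEX_ROWS = [
    "3120 3b20 5057 443d 2f68 6f6d 652f 6375",
    "6d75 6c75 7320 3b20 5553 4552 3d72 6f6f",
    "7420 3b20 434f 4d4d 414e 443d 2f62 696e",
    "2f64 6174 65",
]


def decode_rows(rows):
    """Join the hex groups, strip whitespace and decode the bytes as ASCII."""
    hex_string = "".join("".join(row.split()) for row in rows)
    return bytes.fromhex(hex_string).decode("ascii")


print(decode_rows(HEX_ROWS))
```

The decoded text is the tail of the sudo log entry, which shows that the authpriv message generated by sudo date reached the wire.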
Next Steps
The links below discuss more specific monitoring topics.
+----------------------------------------------------------------------------+
|*Cumulus Linux GNU/Linux                                                    |
| Advanced options for Cumulus Linux GNU/Linux                               |
| ONIE                                                                       |
|                                                                            |
+----------------------------------------------------------------------------+
2. Use the ^ and v arrow keys to select Advanced options for Cumulus Linux GNU/Linux. A menu
similar to the following should appear:
+----------------------------------------------------------------------------+
| Cumulus Linux GNU/Linux, with Linux 4.1.0-cl-1-amd64                       |
| Cumulus Linux GNU/Linux, with Linux 4.1.0-cl-1-amd64 (sysvinit)            |
|                                                                            |
+----------------------------------------------------------------------------+
6. Sync the /etc directory using btrfs, then reboot the system:
routes: 8192 <<<< if all routes are IPv6, or 65536 if all routes are IPv4
route mask limit 64
host_routes: 73728
ecmp_nhs: 16327
ecmp_nhs_per_route: 52
This translates to about 314 routes with ECMP nexthops, if every route has the maximum ECMP nexthops.
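The figure of about 314 follows from dividing the total number of ECMP nexthops by the maximum number of nexthops per route. A minimal Python check, using the two values from the listing above (max_full_ecmp_routes is a hypothetical helper name):

```python
def max_full_ecmp_routes(ecmp_nhs, ecmp_nhs_per_route):
    """Number of routes that fit if every route uses the maximum ECMP nexthops."""
    return ecmp_nhs / ecmp_nhs_per_route


ratio = max_full_ecmp_routes(16327, 52)
print(round(ratio, 2))  # just under 314
print(int(ratio))       # whole routes that fit completely
```

Routes that use fewer nexthops consume less of the ecmp_nhs pool, so the practical number of ECMP routes is usually higher than this worst case.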
To monitor the routes in Cumulus Linux hardware, use the cl-resource-query command. The results
vary between switches running on different chipsets.
The example below shows cl-resource-query results for a Broadcom Tomahawk switch:
The example below shows cl-resource-query results for a Broadcom Trident II switch:
Ingress ACL and Egress ACL entries show the counts in single-wide (not double-wide) entries. For
information about ACL entries, see Estimate the Number of ACL Rules (see page 153).
Contents
This topic describes ...
Monitor Hardware Using decode-syseeprom (see page 918)
Command Options (see page 918)
Related Commands (see page 919)
Monitor Hardware Using sensors (see page 919)
Monitor Switch Hardware Using SNMP (see page 920)
Monitor System Units Using smond (see page 920)
Keep the Switch Alive Using the Hardware Watchdog (see page 922)
Related Information (see page 922)
cumulus@switch:~$ decode-syseeprom
TlvInfo Header:
Id String: TlvInfo
Version: 1
Total Length: 114
TLV Name Code Len Value
-------------------- ---- --- -----
Product Name 0x21 4 4804
Part Number 0x22 14 R0596-F0009-00
Device Version 0x26 1 2
Serial Number 0x23 19 D1012023918PE000012
Manufacture Date 0x25 19 10/09/2013 20:39:02
Base MAC Address 0x24 6 00:E0:EC:25:7B:D0
MAC Addresses 0x2A 2 53
Vendor Name 0x2D 17 Penguin Computing
Label Revision 0x27 4 4804
Manufacture Country 0x2C 2 CN
CRC-32 0xFE 4 0x96543BC5
(checksum valid)
Command Options
Usage: /usr/cumulus/bin/decode-syseeprom [-a][-r][-s [args]][-t]
Option Description
-s  Sets the EEPROM content if the EEPROM is writable. args can be supplied on the command line in a comma-separated list of the form '<field>=<value>, ...'. ',' and '=' are illegal characters in field names and values. Fields that are not specified default to their current values. If args are supplied on the command line, they are written without confirmation. If args is empty, the values are prompted for interactively.
-t TARGET  Selects the target EEPROM (board, psu2, psu1) for the read or write operation; default is board.
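The '<field>=<value>, ...' format accepted by -s can be illustrated with a short parser. This Python sketch only mirrors the format rules stated above and is not the parser used by decode-syseeprom; parse_eeprom_args is a hypothetical name, and the field names in the example are taken from the TLV names in the sample output, which may not be the names the tool expects.

```python
def parse_eeprom_args(args):
    """Parse '<field>=<value>, ...' into a dict of field names to values.

    ',' separates entries and '=' separates a field from its value, so
    neither character can appear inside a field name or a value.
    """
    fields = {}
    for item in args.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep or not name.strip() or "=" in value:
            raise ValueError(f"malformed entry: {item!r}")
        fields[name.strip()] = value.strip()
    return fields


print(parse_eeprom_args("Product Name=4804, Device Version=2"))
```

Fields that are absent from the parsed result correspond to fields that keep their current values.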
Option Description
Related Commands
You can also use the dmidecode command to retrieve hardware configuration information that’s been
populated in the BIOS.
You can use apt-get to install the lshw program on the switch, which also retrieves hardware
configuration information.
cumulus@switch:~$ sensors
tmp75-i2c-6-48
Adapter: i2c-1-mux (chan_id 0)
temp1: +39.0 C (high = +75.0 C, hyst = +25.0 C)
tmp75-i2c-6-49
Adapter: i2c-1-mux (chan_id 0)
temp1: +35.5 C (high = +75.0 C, hyst = +25.0 C)
ltc4215-i2c-7-40
Adapter: i2c-1-mux (chan_id 1)
in1: +11.87 V
in2: +11.98 V
power1: 12.98 W
curr1: +1.09 A
max6651-i2c-8-48
Adapter: i2c-1-mux (chan_id 2)
fan1: 13320 RPM (div = 1)
fan2: 13560 RPM
Output from the sensors command varies depending upon the switch hardware you use, as
each platform ships with a different type and number of sensors.
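Because the output format differs between platforms, any script that reads it has to be adapted per switch. As a sketch, the following Python code extracts the temperature readings from the two tmp75 entries in the sample output above. The regular expression is an assumption based on that sample only, and parse_temperatures is a hypothetical helper.

```python
import re

# First two chip entries from the sample output.
SAMPLE = """tmp75-i2c-6-48
Adapter: i2c-1-mux (chan_id 0)
temp1: +39.0 C (high = +75.0 C, hyst = +25.0 C)
tmp75-i2c-6-49
Adapter: i2c-1-mux (chan_id 0)
temp1: +35.5 C (high = +75.0 C, hyst = +25.0 C)
"""

# Matches lines such as "temp1: +39.0 C (high = +75.0 C, ...". The degree sign
# is optional because it may or may not be present in the captured output.
TEMP_RE = re.compile(
    r"^(temp\d+):\s*([+-]?\d+(?:\.\d+)?)\s*°?C\s*\(high = ([+-]?\d+(?:\.\d+)?)"
)


def parse_temperatures(output):
    """Return a list of (chip, sensor, current, high) tuples."""
    readings = []
    chip = None
    for line in output.splitlines():
        line = line.strip()
        match = TEMP_RE.match(line)
        if match:
            readings.append(
                (chip, match.group(1), float(match.group(2)), float(match.group(3)))
            )
        elif line and ":" not in line:
            chip = line  # a chip name line such as tmp75-i2c-6-48
    return readings


for chip, sensor, current, high in parse_temperatures(SAMPLE):
    print(chip, sensor, current, "of", high)
```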
Option Description
-c, --config-file  Specify a config file; use - after -c to read the config file from stdin; by default, sensors references the configuration file in /etc/sensors.d/.
-s, --set  Executes set statements in the config file (root only); sensors -s is run once at boot time and applies all the settings to the boot drivers.
If [CHIP] is not specified in the command, all chip info will be printed. Example chip names include:
lm78-i2c-0-2d *-i2c-0-2d
lm78-i2c-0-* *-i2c-0-*
lm78-i2c-*-2d *-i2c-*-2d
lm78-i2c-*-* *-i2c-*-*
lm78-isa-0290 *-isa-0290
lm78-isa-* *-isa-*
lm78-*
When the switch is not powered on, smonctl shows the PSU status as BAD instead of POWERED
OFF or NOT DETECTED. This is a known limitation.
Some switch models lack the sensor for reading voltage information, so this data is not output
from the smonctl command.
For example, the Dell S4048 series has this sensor and displays power and voltage information:
Option Description
run_watchdog=1
To disable the watchdog, edit the /etc/watchdog.d/<your_platform> file and set run_watchdog to 0:
run_watchdog=0
You can modify the settings for the watchdog — like the timeout setting and scheduler priority — in its
configuration file, /etc/watchdog.conf.
Related Information
packages.debian.org/search?keywords=lshw
lm-sensors.org
Net-SNMP tutorials
Contents
This topic describes ...
Network Port LEDs (see page 923)
Status LEDs (see page 924)
Status LEDs
A set of status LEDs is typically located on one side of a network switch. The status LEDs provide a visual
indication of what is physically wrong with the network switch. Typical LEDs on the front panel are for PSUs
(power supply units), fans and system. Locator LEDs are also found on the front panel of a switch. For now,
the components that these LEDs represent are referred to simply as units.
Number of LEDs per unit — Each unit should have only 1 LED.
Location — All units should have their LEDs on the righthand side of the switch after the physical
ports.
Unit label — The label should be printed on the front panel directly above the LED.
Colors — The focus should be on giving a network operator a simple set of indications that provide
basic information about the unit. The following section has more information about the indications,
but colors are standardized on green and amber. These colors are universally found on all status
LEDs and should be easy to implement on future switches.
Defined LED — Every network switch must have LEDs for the following:
PSU
Fans
System LED
Locator LED
PSU LEDs — Each PSU must have its own LED. PSU faults are difficult to debug. If a network
operator knows which PSU is faulty, he or she can quickly check if it is powered up correctly and if
that fault persists, replace the PSU.
Fan LED — A network switch may have multiple fan trays (3 - 6). It is difficult to put an LED for each
fan tray on the front panel, given the limited real estate. Hence, the recommendation is one LED for
all fans.
System LED — A network switch must have a system LED that indicates the general state of a
switch. This state could be of hardware, software, or both. It is up to the individual switch NOS to
decide what this LED indicates. But the LED can have only the following indications:
Locator LED — The locator LED helps locate a particular switch in a data center full of switches.
Thus, it should have a different color and predefined location. It must be located at the top right
corner on the front panel of the switch and its color must be blue.
Locate a Switch
Cumulus Linux 3.3 and newer versions support the locator LED functionality for identifying a switch, by
blinking a single LED on a specified network port, on the following switches:
Celestica Seastone, Dell Z9100-ON, Edgecore AS7712-32X, Penguin Arctica 3200C, Quanta
QuantaMesh BMS T4048-IX2, Supermicro SSE-C3632S
To use the locator LED functionality, run:
In the example above, INTERFACE_NAME should be replaced with the name of the port, and TIME should
be replaced with the length of time, in seconds, that the port LED should blink.
This functionality is only supported on swp* ports, not eth* management interfaces.
Contents
This topic describes ...
If you change one of these settings on the fly, the new configuration applies only to those VNIs or
VLANs set up after the configuration changed; previously allocated counters remain as is.
#stats.vlan.show_internal_vlans = FALSE
Clear Statistics
Since ethtool is not supported for virtual devices, you cannot clear the statistics cache maintained by the
kernel. You can clear the hardware statistics via switchd:
ASIC Monitoring
Cumulus Linux provides an ASIC monitoring tool that collects and distributes data about the state of the
ASIC. The monitoring tool polls for data at specific intervals and takes certain actions so that you can quickly
identify and respond to problems, such as:
Microbursts that result in longer packet latency
Packet buffer congestion that might lead to packet drops
Network problems with a particular switch, port, or traffic class
Contents
This topic describes ...
What Type of Statistics Can You Collect? (see page 930)
Collecting Queue Lengths in Histograms (see page 930)
Configure ASIC Monitoring (see page 931)
Configuration Examples (see page 934)
Queue Length Histograms (see page 934)
Packet Drops Due to Errors (see page 935)
Queue Length (Histogram) with Collect Actions (see page 935)
Example Snapshot File (see page 936)
Example Log Message (see page 937)
ASIC Monitoring Settings (see page 937)
Bin 3: 4032:5567
Bin 4: 5568:7103
Bin 5: 7104:8639
Bin 6: 8640:10175
Bin 7: 10176:11711
Bin 8: 11712:13247
Bin 9: 13248:*
The following illustration demonstrates a histogram showing how many times the queue length for a port
was in the ranges specified by each bin. The example shows that the queue length was between 960 and
2495 bytes 125 times within one second.
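The bin ranges listed above are each 1536 bytes wide, and the 960 to 2495 byte range mentioned in the text fits the same pattern. The following Python sketch maps a queue length to a bin number on that basis. It assumes that bin 0 covers 0 to 959 bytes and that bin 2 covers 2496 to 4031 bytes, which are inferred from the pattern rather than shown in the listing; bin_for is a hypothetical helper.

```python
MIN_BOUNDARY = 960   # first byte of bin 1, from the range quoted in the text
BIN_WIDTH = 1536     # width of bins 3 through 8 in the listing
LAST_BIN = 9         # open-ended bin, shown as 13248:*


def bin_for(queue_length):
    """Return the histogram bin number for a queue length in bytes."""
    if queue_length < MIN_BOUNDARY:
        return 0  # assumed: everything below the minimum boundary
    return min((queue_length - MIN_BOUNDARY) // BIN_WIDTH + 1, LAST_BIN)


for length in (500, 960, 2495, 4032, 13247, 13248, 50000):
    print(length, "-> bin", bin_for(length))
```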
The following procedure describes how to monitor queue lengths using a histogram. The settings are
configured to collect data every second and write the results to a snapshot file. When the size of the queue
reaches 500 bytes, the system sends a message to the /var/log/syslog file.
To monitor queue lengths using a histogram:
2. At the end of the file, add the following line to specify the name of the histogram monitor (port
group). The example uses histogram_pg; however, you can use any name you choose. You must
use the same name with all histogram settings.
monitor.port_group_list = [histogram_pg]
3. Add the following line to specify the ports you want to monitor. The following example sets swp1
through swp50.
monitor.histogram_pg.port_set = swp1-swp50
4. Add the following line to set the data type to histogram. This is the data type for histogram
monitoring.
monitor.histogram_pg.stat_type = histogram
5. Add the following line to set the trigger type to timer. Currently, the only trigger type available is
timer.
monitor.histogram_pg.trigger_type = timer
6. Add the following line to set the frequency at which data collection starts. In the following example,
the frequency is set to one second.
monitor.histogram_pg.timer = 1s
7. Add the following line to set the actions you want to take when data is collected. In the following
example, the system writes the results of data collection to a snapshot file and sends a message to
the /var/log/syslog file.
monitor.histogram_pg.action_list = [snapshot,log]
8. Add the following line to specify a name and location for the snapshot file. In the following example,
the system writes the snapshot to a file called histogram_stats in the /var/lib/cumulus
directory and adds a suffix to the file name with the snapshot file count (see the following step).
monitor.histogram_pg.snapshot.file = /var/lib/cumulus/histogram_stats
9. Add the following line to set the number of snapshots that are taken before the system starts
overwriting the earliest snapshot files.
In the following example, because the snapshot file count is set to 64, the first snapshot file is named
histogram_stats_0 and the 64th snapshot is named histogram_stats_63. When the 65th
snapshot is taken, the original snapshot file (histogram_stats_0) is overwritten and the sequence
continues until histogram_stats_63 is written. Then, the sequence restarts.
monitor.histogram_pg.snapshot.file_count = 64
10. Add the following line to include a threshold, which determines when an action is taken. Setting a
threshold is optional. In the following example, when the size of the queue reaches 500 bytes, the
system sends a message to the /var/log/syslog file.
monitor.histogram_pg.log.queue_bytes = 500
11. Add the following lines to set the size, minimum boundary, and sampling time of the histogram.
Adding the histogram size and the minimum boundary size together produces the maximum
boundary size. These settings are used to represent the range of queue lengths per bin.
monitor.histogram_pg.histogram.minimum_bytes_boundary = 960
monitor.histogram_pg.histogram.histogram_size_bytes = 12288
monitor.histogram_pg.histogram.sample_time_ns = 1024
12. Save the file, then restart the asic-monitor service with the following command.
Restarting the asic-monitor service does not disrupt traffic or require you to restart
switchd. The service is enabled by default when you boot the switch and restarts when
you restart switchd.
Important
Overhead is involved in collecting the data, which uses both the CPU and SDK process and
can affect execution of switchd. Snapshots and logs can occupy a lot of disk space if you
do not limit their number.
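The relationship between the histogram settings in step 11 and the bin ranges listed earlier can be sketched in Python. This is a minimal sketch, not the asic-monitor implementation. It assumes the histogram is split into eight equal-width bins between the minimum and maximum boundaries, with bin 0 below the minimum boundary and a final open-ended bin; that layout is inferred from the Bin 3 through Bin 9 ranges shown above. The function name histogram_bins and the inner_bins parameter are hypothetical.

```python
def histogram_bins(minimum_bytes_boundary, histogram_size_bytes, inner_bins=8):
    """Return {bin_number: (low, high)}; high is None for the open-ended last bin.

    Assumption: inner_bins equal-width bins sit between the minimum boundary
    and the maximum boundary (minimum + histogram size).
    """
    width = histogram_size_bytes // inner_bins
    # Bin 0 (assumed) covers queue lengths below the minimum boundary.
    bins = {0: (0, minimum_bytes_boundary - 1)}
    for i in range(inner_bins):
        low = minimum_bytes_boundary + i * width
        bins[i + 1] = (low, low + width - 1)
    # The last bin starts at the maximum boundary and has no upper limit.
    bins[inner_bins + 1] = (minimum_bytes_boundary + histogram_size_bytes, None)
    return bins


# Values from the example configuration: 960 + 12288 = 13248 (maximum boundary).
for number, (low, high) in histogram_bins(960, 12288).items():
    print(f"Bin {number}: {low}:{'*' if high is None else high}")
```

With the example values, the computed ranges for bins 3 through 9 agree with the list shown earlier in this topic.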
To collect other data, such as all packets per port, buffer congestion, or packet drops due to error, follow
the procedure above but change the port group list setting to include the port group name you want to
use. For example, to monitor packet drops due to buffer congestion:
monitor.port_group_list = [buffers_pg]
monitor.buffers_pg.port_set = swp1-swp50
monitor.buffers_pg.stat_type = buffer
...
Certain settings in the procedure above (such as the histogram size, boundary size, and sampling time) only
apply to the histogram monitor. All ASIC monitor settings are described in ASIC Monitoring Settings (see
page 937).
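The snapshot file naming described in step 9 of the procedure can be illustrated with a short Python sketch. It assumes only what the text states: a numeric suffix starting at 0 is appended to the configured file name, and the sequence restarts after file_count snapshots. The function name snapshot_path is hypothetical.

```python
def snapshot_path(base, file_count, snapshot_number):
    """Return the file written by the Nth snapshot (snapshot_number is 1-based)."""
    index = (snapshot_number - 1) % file_count  # wraps back to 0 after file_count files
    return f"{base}_{index}"


base = "/var/lib/cumulus/histogram_stats"
for n in (1, 64, 65):
    print(n, snapshot_path(base, 64, n))
```

The 1st and 65th snapshots map to the same file, which is why older snapshots are overwritten once the file count is reached.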
Configuration Examples
Several configuration examples are provided below.
Queue Length Histograms
monitor.port_group_list = [histogram_pg]
monitor.histogram_pg.port_set = swp1-swp50
monitor.histogram_pg.stat_type = histogram
monitor.histogram_pg.cos_list = [0]
monitor.histogram_pg.trigger_type = timer
monitor.histogram_pg.timer = 1s
monitor.histogram_pg.action_list = [snapshot,log]
monitor.histogram_pg.snapshot.file = /var/lib/cumulus/histogram_stats
monitor.histogram_pg.snapshot.file_count = 64
monitor.histogram_pg.log.queue_bytes = 500
monitor.histogram_pg.histogram.minimum_bytes_boundary = 960
monitor.histogram_pg.histogram.histogram_size_bytes = 12288
monitor.histogram_pg.histogram.sample_time_ns = 1024
Packet Drops Due to Errors
monitor.port_group_list = [discards_pg]
monitor.discards_pg.port_set = swp1-swp50
monitor.discards_pg.stat_type = packet
monitor.discards_pg.action_list = [snapshot,log]
monitor.discards_pg.trigger_type = timer
monitor.discards_pg.timer = 2s
monitor.discards_pg.log.packet_error_drops = 100
monitor.discards_pg.snapshot.packet_error_drops = 100
monitor.discards_pg.snapshot.file = /var/lib/cumulus/discard_stats
monitor.discards_pg.snapshot.file_count = 16
Queue Length (Histogram) with Collect Actions
monitor.port_group_list = [histogram_pg,discards_pg]
monitor.histogram_pg.port_set = swp1-swp50
monitor.histogram_pg.stat_type = buffer
monitor.histogram_pg.cos_list = [0]
monitor.histogram_pg.trigger_type = timer
monitor.histogram_pg.timer = 1s
monitor.histogram_pg.action_list = [snapshot,collect,log]
monitor.histogram_pg.snapshot.file = /var/lib/cumulus/histogram_stats
monitor.histogram_pg.snapshot.file_count = 64
monitor.histogram_pg.histogram.minimum_bytes_boundary = 960
monitor.histogram_pg.histogram.histogram_size_bytes = 12288
monitor.histogram_pg.histogram.sample_time_ns = 1024
monitor.histogram_pg.log.queue_bytes = 500
monitor.histogram_pg.collect.queue_bytes = 500
monitor.histogram_pg.collect.port_group_list = [buffers_pg,all_packet_pg]
monitor.buffers_pg.port_set = swp1-swp50
monitor.buffers_pg.stat_type = buffer
monitor.buffers_pg.action_list = [snapshot]
monitor.buffers_pg.snapshot.file = /var/lib/cumulus/buffer_stats
monitor.buffers_pg.snapshot.file_count = 8
monitor.all_packet_pg.port_set = swp1-swp50
monitor.all_packet_pg.stat_type = packet_all
monitor.all_packet_pg.action_list = [snapshot]
monitor.all_packet_pg.snapshot.file = /var/lib/cumulus/all_packet_stats
monitor.all_packet_pg.snapshot.file_count = 8
monitor.discards_pg.port_set = swp1-swp50
monitor.discards_pg.stat_type = packet
monitor.discards_pg.action_list = [snapshot,log]
monitor.discards_pg.trigger_type = timer
monitor.discards_pg.timer = 2s
monitor.discards_pg.log.packet_error_drops = 100
monitor.discards_pg.snapshot.packet_error_drops = 100
monitor.discards_pg.snapshot.file = /var/lib/cumulus/discard_stats
monitor.discards_pg.snapshot.file_count = 16
Certain actions require additional settings. For example, if the snapshot action is specified, a
snapshot file is also required. If the log action is specified, a log threshold is also required. See
action_list (see page 938) for additional settings required for each action.
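The dependency between actions and their additional settings can be checked before you restart the service. The following Python sketch is illustrative only and is not part of Cumulus Linux. It assumes the settings are plain key = value lines as shown in the examples above, and it checks only the requirements named in this topic: a snapshot file for snapshot, a port group list for collect, and at least one log threshold for log. The function names are hypothetical.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def parse_list(value):
    """Turn '[a,b,c]' into ['a', 'b', 'c']."""
    return [item.strip() for item in value.strip("[]").split(",") if item.strip()]


def missing_action_settings(settings, group):
    """Return a list of problems for one port group; an empty list means none found."""
    prefix = f"monitor.{group}."
    actions = parse_list(settings.get(prefix + "action_list", ""))
    problems = []
    if "snapshot" in actions and prefix + "snapshot.file" not in settings:
        problems.append("snapshot action needs snapshot.file")
    if "collect" in actions and prefix + "collect.port_group_list" not in settings:
        problems.append("collect action needs collect.port_group_list")
    if "log" in actions and not any(k.startswith(prefix + "log.") for k in settings):
        problems.append("log action needs a log threshold")
    return problems


example = """
monitor.histogram_pg.action_list = [snapshot,collect,log]
monitor.histogram_pg.snapshot.file = /var/lib/cumulus/histogram_stats
monitor.histogram_pg.log.queue_bytes = 500
"""
print(missing_action_settings(parse_settings(example), "histogram_pg"))
```

In this example the collect action is listed without a collect.port_group_list setting, so the sketch reports that one problem.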
Setting Description
port_group_list Specifies the names of the monitors (port groups) you want to use to
collect data, such as discards_pg, histogram_pg, all_packet_pg,
buffers_pg. You can provide any name you want for the port group;
the names above are just examples. You must use the same name for
all the settings of a particular port group.
Example:
monitor.port_group_list = [histogram_pg,discards_pg,buffers_pg,all_packet_pg]
You must specify at least one port group. If the port group list
is empty, systemd shuts down the asic-monitor service.
<port_group_name>.port_set Specifies the range of ports monitored. You can specify GLOBs and
comma-separated lists; for example, swp1-swp4,swp8,swp10-swp50.
Example:
monitor.histogram_pg.port_set = swp1-swp50
<port_group_name>.stat_type Specifies the type of data that the port group collects.
For histograms, specify histogram. For example:
monitor.histogram_pg.stat_type = histogram
For packet drops due to errors, specify packet. For example:
monitor.discards_pg.stat_type = packet
<port_group_name>.cos_list For histogram monitoring, each CoS (Class of Service) value in the list
has its own histogram on each port. The global limit on the number of
histograms is an average of one histogram per port.
Example:
monitor.histogram_pg.cos_list = [0]
<port_group_name>.trigger_type Specifies the type of trigger that initiates data collection. Currently, the
only option is timer. At least one port group must have a timer
configured, otherwise no data is ever collected.
Example:
monitor.histogram_pg.trigger_type = timer
<port_group_name>.timer Specifies the frequency at which data is collected; for example, a setting
of 1s indicates that data is collected once per second. You can set the
timer to the following:
1 to 60 seconds: 1s, 2s, and so on up to 60s
1 to 60 minutes: 1m, 2m, and so on up to 60m
1 to 24 hours: 1h, 2h, and so on up to 24h
1 to 7 days: 1d, 2d and so on up to 7d
Example:
monitor.histogram_pg.timer = 4s
<port_group_name>.action_list Specifies one or more actions that occur when data is collected:
snapshot writes a snapshot of the data collection results to a
file. If you specify this action, you must also specify a snapshot
file (described below). You can also specify a threshold that
initiates the snapshot action, but this is not required. For
example:
monitor.histogram_pg.action_list = [snapshot]
monitor.histogram_pg.snapshot.file = /var/lib/cumulus/histogram_stats
collect gathers additional data. If you specify this action, you
must also specify the port groups for the additional data you
want to collect. For example:
monitor.histogram_pg.action_list = [collect]
monitor.histogram_pg.collect.port_group_list = [buffers_pg,all_packet_pg]
You can use all three of these actions in one monitoring step. For
example:
monitor.histogram_pg.action_list = [snapshot,collect,log]
Note: If an action appears in the action list but does not have the
required settings (such as a threshold for the log action), the ASIC
monitor stops and reports an error.
<port_group_name>.snapshot.file Specifies the name for the snapshot file. All snapshots use this name,
with a sequential number appended to it. See the snapshot.file_count setting.
Example:
monitor.histogram_pg.snapshot.file = /var/lib/cumulus/histogram_stats
<port_group_name>.snapshot.file_count Specifies the number of snapshots that can be created before the first
snapshot file is overwritten.
In the following example, because the snapshot file count is set to 64,
the first snapshot file is named histogram_stats_0 and the 64th
snapshot is named histogram_stats_63. When the 65th snapshot is
taken, the original snapshot file (histogram_stats_0) is overwritten
and the sequence restarts.
Example:
monitor.histogram_pg.snapshot.file_count = 64
While more snapshots provide you with more data, they can
occupy a lot of disk space on the switch.
<port_group_name>.<action>.packet_error_drops Specifies a threshold for the packet drops due to error monitor. This is
the number of packet drops due to error that initiates a specified action
(snapshot, log, collect).
Examples:
monitor.discards_pg.snapshot.packet_error_drops = 500
monitor.discards_pg.log.packet_error_drops = 500
monitor.discards_pg.collect.packet_error_drops = 500
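The value ranges listed for the <port_group_name>.timer setting in the table above can be checked with a small Python sketch. It assumes the value is always a whole number followed by a single unit letter (s, m, h, or d), as in the examples in this topic. The function name is_valid_timer is hypothetical and is not part of the asic-monitor service.

```python
import re

# Upper limits per unit, taken from the timer setting description:
# seconds and minutes up to 60, hours up to 24, days up to 7.
TIMER_LIMITS = {"s": 60, "m": 60, "h": 24, "d": 7}


def is_valid_timer(value):
    """Return True if value looks like 1s..60s, 1m..60m, 1h..24h, or 1d..7d."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if not match:
        return False
    amount = int(match.group(1))
    return 1 <= amount <= TIMER_LIMITS[match.group(2)]


for candidate in ("1s", "4s", "61s", "24h", "8d"):
    print(candidate, is_valid_timer(candidate))
```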
To create the cl-support archive file manually, run the cl-support command:
If the Cumulus Networks support team requests that you submit the output from cl-support to help with
the investigation of issues you might experience with Cumulus Linux and you need to include security-
sensitive information, such as the sudoers file, use the -s option:
/var/log/alternatives.log Information from update-alternatives is logged in this file.
/var/log/apt The apt utility can send logs here; for example, from
apt-get install and apt-get remove.
/var/log/audit/* Contains log information stored by the Linux audit daemon, auditd.
/var/log/autoprovision Logs output generated by running the zero touch provisioning (see
page 72) script.
/var/log/btmp This file contains information about failed login attempts. Use the
last command to view the btmp file. For example:
/var/log/dmesg Contains kernel ring buffer information. When the system boots up, it
prints a number of messages on the screen that display information
about the hardware devices that the kernel detects during the boot
process. These messages are available in the kernel ring buffer;
whenever a new message arrives, an old message is overwritten.
You can also view the content of this file using the dmesg command.
Note that Cumulus Linux does not write to this log file; but because
it's a standard file, Cumulus Linux creates it as a zero length file.
/var/log/faillog Contains failed user login attempts. Use the faillog command to
display the contents of this file.
Note that Cumulus Linux does not write to this log file; but because
it's a standard file, Cumulus Linux creates it as a zero length file.
/var/log/fsck/* The fsck utility is used to check and optionally repair one or more
Linux filesystems.
Note that Cumulus Linux does not write to this log file; but because
it's a standard file, Cumulus Linux creates it as a zero length file.
/var/log/lastlog Formats and prints the contents of the last login log file.
/var/log/news/* The news command keeps you informed of news concerning the
system.
Note that Cumulus Linux does not write to this log file; but because
it's a standard file, Cumulus Linux creates it as a zero length file.
example an md5 or mtu mismatch with OSPF.
/var/log/snapper.log Log file for snapshots (see page 57). These logs are valuable for the
snapshots you take on your switch.
/var/log/syslog The main system log, which logs everything except auth-related
messages. The primary log; it's easiest to grep this file to see what occurred
during a problem.
File Description
/etc/nologin nologin prevents unprivileged users from logging into the system.
/etc/alternatives update-alternatives creates, removes, maintains and displays information about the
symbolic links comprising the Debian alternatives system.
This is an alphabetical listing of the output from running ls -l on the /etc directory structure created by
cl-support. The green highlighted rows are the ones Cumulus Networks finds most important when
troubleshooting problems.
ca-certificates
chef This is an example of something that is not included by default. In this instance, cl-support
included the chef folder for some reason. This is not installed by default, but this tool could
have been installed or configured incorrectly, which is why it's included in the
cl-support output.
gss
image-release Contains the version of Cumulus Linux that was installed with the installer. This version
number does not change when you upgrade using apt-get. Useful for determining
baseline version.
lsb-release Shows the current version of Linux on the system. Run cat /etc/lsb-release for output.
This shows you the version of the operating system you are running; also compare
this to the output of onie-select.
netd.conf The NCLU (see page 88) configuration file. Contains the settings for which Linux commands
are operable under NCLU. Also contains the blacklist of infrequently used commands.
network Contains the network interface configuration for ifup and ifdown. The main configuration
file is under /etc/network/interfaces. This is where you configure L2 and L3 information
for all of your front panel ports (swp interfaces). Settings like MTU, link speed, IP address
information, and VLANs are all included here.
Cumulus Linux-specific folder for PTM (prescriptive topology manager).
resolv.conf Resolver configuration file, which is where DNS is set (domain, nameserver and search).
You need DNS to reach the Cumulus Linux repository.
securetty This file lists terminals into which the root user can log
in.
sudoers
timezone If this file exists, it is read and its contents are used as
the time zone name.
Contents
This topic describes ...
Enable Logging for Networking (see page 958)
Use ifquery to Validate and Debug Interface Configurations (see page 959)
Mako Template Errors (see page 960)
ifdown Cannot Find an Interface that Exists (see page 961)
Remove All References to a Child Interface (see page 961)
MTU Set on a Logical Interface Fails with Error: "Numerical result out of range" (see page 961)
iproute2 batch Command Failures (see page 962)
"RTNETLINK answers: Invalid argument" Error when Adding a Port to a Bridge (see page 962)
MLAG Peerlink Interface Drops Many Packets (see page 962)
#
#
# Exclude interfaces
EXCLUDE_INTERFACES=
Use ifquery --check to compare the current running state of an interface against its configuration in the
interfaces file. It returns exit code 0 if the running state matches the configuration, or 1 if it does not.
The line bond-xmit-hash-policy layer3+7 below fails because it should read bond-xmit-hash-policy layer3+4.
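If you want to act on the exit code of ifquery --check in a script, a thin wrapper is enough. The following Python sketch is an illustration under stated assumptions, not a Cumulus Linux tool: it assumes ifquery is on the PATH of the switch and that the exit code is 0 when the running state and the configuration agree. The function name and the injectable run parameter (which makes the sketch testable off-switch) are hypothetical.

```python
import subprocess


def interface_matches_config(name, run=subprocess.run):
    """Return True when 'ifquery --check <name>' exits with code 0.

    The run parameter defaults to subprocess.run; pass a stand-in
    to exercise the function without ifquery installed.
    """
    result = run(["ifquery", "--check", name], capture_output=True, text=True)
    return result.returncode == 0


# Example use on a switch (not executed here):
#   if not interface_matches_config("bond0"):
#       print("bond0 does not match /etc/network/interfaces")
```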
Use ifquery --running to print the running state of interfaces in the interfaces file format:
iface bond0
bond-slaves swp25 swp26
address 14.0.0.9/30
address 2001:ded:beef:2::1/64
ifquery --syntax-help provides help on all possible attributes supported in the interfaces file. For
complete syntax on the interfaces file, see man interfaces and man ifupdown-addons-interfaces.
You can use ifquery --print-savedstate to check the ifupdown2 state database. ifdown works
only on interfaces present in this state database.
# ssim2 added
auto swp45
iface swp45
auto swp46
iface swp46
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto bond1
iface bond1
bond-slaves swp2 swp1
auto bond3
iface bond3
bond-slaves swp8 swp6 swp7
auto br0
iface br0
bridge-ports swp3 swp5 bond1 swp4 bond3
bridge-pathcosts swp3=4 swp5=4 swp4=4
address 11.0.0.10/24
address 2001::10/64
Notice that bond1 is a member of br0. If bond1 is removed, you must remove the reference to it from the
br0 configuration. Otherwise, if you reload the configuration with ifreload -a, bond1 is still part of br0.
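Before you remove an interface, it can help to list every stanza that still refers to it. The following Python sketch scans text in the /etc/network/interfaces format shown above. It is a simplified illustration, not an ifupdown2 feature: it assumes one attribute per line, ignores comment lines, and treats values such as swp3=4 as a reference to swp3. The function name stanzas_referencing is hypothetical.

```python
def stanzas_referencing(interfaces_text, removed):
    """Return the names of iface stanzas whose attributes mention `removed`."""
    current = None
    hits = []
    for raw in interfaces_text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "iface":
            current = words[1]          # start of a new stanza
        elif words[0] == "auto":
            current = None              # 'auto' lines are not stanza attributes
        elif current is not None:
            # Strip '=value' suffixes such as the costs in bridge-pathcosts.
            names = [word.split("=")[0] for word in words[1:]]
            if removed in names and current not in hits:
                hits.append(current)
    return hits


config = """
auto bond1
iface bond1
    bond-slaves swp2 swp1
auto br0
iface br0
    bridge-ports swp3 swp5 bond1 swp4 bond3
    bridge-pathcosts swp3=4 swp5=4 swp4=4
"""
print(stanzas_referencing(config, "bond1"))
```

For the configuration above, the sketch reports br0 as the stanza that still references bond1.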
MTU Set on a Logical Interface Fails with Error: "Numerical result out of
range"
This error occurs when the MTU (see page 237) you are trying to set on an interface is higher than the MTU
of the lower interface or dependent interface. Linux expects the upper interface to have an MTU less than
or equal to the MTU on the lower interface.
In the example below, the swp1.100 VLAN interface is an upper interface to physical interface swp1. If you
want to change the MTU to 9000 on the VLAN interface, you must include the new MTU on the lower
interface swp1 as well.
auto swp1.100
iface swp1.100
mtu 9000
auto swp1
iface swp1
mtu 9000
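The rule that an upper interface must not have a larger MTU than its lower interface can be expressed as a simple check. The following Python sketch assumes you already have the MTU of each interface and a mapping from each upper interface to its lower interface; both dictionaries and the function name mtu_violations are hypothetical and are not produced by any Cumulus Linux command.

```python
def mtu_violations(mtus, lower_of):
    """Return (upper, lower) pairs where the upper MTU exceeds the lower MTU.

    mtus:     {interface name: MTU}
    lower_of: {upper interface: lower interface}
    """
    return [
        (upper, lower)
        for upper, lower in lower_of.items()
        if mtus[upper] > mtus[lower]
    ]


# swp1.100 is the upper (VLAN) interface of the physical interface swp1.
lower_of = {"swp1.100": "swp1"}
print(mtu_violations({"swp1": 1500, "swp1.100": 9000}, lower_of))  # upper MTU too large
print(mtu_violations({"swp1": 9000, "swp1.100": 9000}, lower_of))  # consistent
```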
error: failed to execute cmd 'ip -force -batch - [link set dev host2
master bridge
addr flush dev host2
link set dev host1 master bridge
addr flush dev host1
]'(RTNETLINK answers: Invalid argument
Command failed -:1)
warning: bridge configuration failed (missing ports)
Contents
This topic describes ...
Monitor Interface Status Using ethtool (see page 963)
View and Clear Interface Counters (see page 964)
Monitor Switch Port SFP/QSFP Hardware Information Using ethtool (see page 965)
HwIfOutOctets: 14866246
HwIfOutUcastPkts: 11791
HwIfOutMcastPkts: 136493
HwIfOutBcastPkts: 0
HwIfInDiscards: 0
HwIfInL3Drops: 0
HwIfInBufferDrops: 0
HwIfInAclDrops: 28
HwIfInDot3LengthErrors: 0
HwIfInErrors: 0
SoftInErrors: 0
SoftInDrops: 0
SoftInFrameErrors: 0
HwIfOutDiscards: 0
HwIfOutErrors: 0
HwIfOutQDrops: 0
HwIfOutNonQDrops: 0
SoftOutErrors: 0
SoftOutDrops: 0
SoftOutTxFifoFull: 0
HwIfOutQLen: 0
Option Description
-c Copies and clears statistics. It does not clear counters in the kernel or hardware.
The -c argument is applied per user ID by default. You can override it by using the
-t argument to save statistics to a different directory.
The -d argument is applied per user ID by default. You can override it by using the
-t argument to save statistics to a different directory.
Network Troubleshooting
Cumulus Linux contains a number of command line and analytical tools to help you troubleshoot issues
with your network.
Contents
This topic describes ...
Check Reachability Using ping (see page 968)
Print Route Trace Using traceroute (see page 969)
Manipulate the System ARP Cache (see page 969)
Generate Traffic Using mz (see page 970)
Create Counter ACL Rules (see page 971)
Configure SPAN and ERSPAN (see page 972)
When troubleshooting intermittent connectivity issues, it is helpful to send continuous pings to a host.
To send continuous pings to an IPv4 host:
where:
cumulus@switch:~$ arp -a
? (11.0.2.2) at 00:02:00:00:00:10 [ether] on swp3
? (11.0.3.2) at 00:02:00:00:00:01 [ether] on swp4
? (11.0.0.2) at 44:38:39:00:01:c1 [ether] on swp1
If you need to flush or remove an ARP entry for a specific interface, you can disable dynamic ARP learning:
[iptables]
-A FORWARD -p tcp --dport 80 -j ACCEPT
The -p option clears out all other rules, and the -i option is used to reinstall all the rules.
----------------------------------------------------------
| MAC_HEADER | IP_HEADER | GRE_HEADER | L2_Mirrored_Packet |
----------------------------------------------------------
Mirrored traffic is not guaranteed. If the MTP is congested, mirrored packets may be discarded.
SPAN and ERSPAN are configured via cl-acltool, the same utility for security ACL configuration (see
page 141). The match criteria for SPAN and ERSPAN is usually an interface; for more granular match terms,
use selective spanning (see page 978). The SPAN source interface can be a port, a subinterface or a bond
interface. Ingress traffic on interfaces can be matched, and on Mellanox Spectrum switches, egress traffic
can be matched. See the list of limitations (see page 972) below.
Cumulus Linux supports a maximum of 2 SPAN destinations. Multiple rules (SPAN sources) can point to the
same SPAN destination, although a given SPAN source cannot specify 2 SPAN destinations. The SPAN
destination (MTP) interface can be a physical port, a subinterface, or a bond interface. The SPAN/ERSPAN
action is independent of security ACL actions. If packets match both a security ACL rule and a SPAN rule,
both actions will be carried out.
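The two SPAN limits described above (at most two SPAN destinations, and one destination per SPAN source) can be checked against a planned rule set before you install it. The following Python sketch is illustrative only: it represents each rule as a hypothetical (source, destination) pair and does not parse cl-acltool rule files. The function name span_rule_problems is an assumption made for this example.

```python
def span_rule_problems(rules, max_destinations=2):
    """Check (source, destination) pairs against the SPAN limits in this topic."""
    problems = []
    destinations = sorted({dst for _, dst in rules})
    if len(destinations) > max_destinations:
        problems.append(
            f"{len(destinations)} SPAN destinations configured; "
            f"the maximum is {max_destinations}"
        )
    # Group destinations by source to find sources with more than one destination.
    by_source = {}
    for src, dst in rules:
        by_source.setdefault(src, set()).add(dst)
    for src in sorted(by_source):
        if len(by_source[src]) > 1:
            problems.append(f"source {src} points to more than one destination")
    return problems


# Two sources sharing one destination is allowed.
print(span_rule_problems([("swp1", "swp10"), ("swp2", "swp10")]))
# One source with two destinations is not.
print(span_rule_problems([("swp1", "swp10"), ("swp1", "swp11")]))
```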
Using cl-acltool with the --out-interface rule applies to transit traffic only; it does not
apply to traffic sourced from the switch.
Running the following command is incorrect and will remove all existing control-plane rules or
other installed rules and only install the rules defined in span.rules:
Using cl-acltool with the --out-interface rule applies to transit traffic only; it does not
apply to traffic sourced from the switch.
Configure ERSPAN
This section describes how to configure ERSPAN for all packets coming in from swp1 to 12.0.0.2.
3.
The src-ip option can be any IP address, whether it exists in the routing table or not. The dst-ip
option must be an IP address reachable via the routing table. The destination IP address must be
reachable from a front-panel port, and not the management port. Use ping or ip route get
<ip> to verify that the destination IP address is reachable. Setting the --ttl option is
recommended.
When using Wireshark to review the ERSPAN output, Wireshark may report the message
"Unknown version, please report or test to use fake ERSPAN preference", and the trace is
unreadable. To resolve this, go into the General preferences for Wireshark, then go to Protocols
> ERSPAN and check the Force to decode fake ERSPAN frame option.
Selective Spanning
SPAN/ERSPAN traffic rules can be configured to limit the traffic that is spanned, to reduce the volume of
copied data.
Cumulus Linux supports selective spanning for iptables only. ip6tables and ebtables are
not supported.
With ERSPAN, a maximum of two --src-ip --dst-ip pairs are supported. Exceeding this limit
produces an error when you install the rules with cl-acltool.
SPAN Examples
To mirror forwarded packets from all ports matching SIP 20.0.1.0 and DIP 20.0.1.2 to port swp1s1:
To mirror forwarded UDP packets received from port swp1s0, towards DIP 20.0.1.2 and destination
port 53:
ERSPAN Examples
To mirror forwarded packets from all ports matching SIP 20.0.1.0 and DIP 20.0.1.2:
To mirror forwarded UDP packets received from port swp1s0, towards DIP 20.0.1.2 and destination
port 53:
Related Information
en.wikipedia.org/wiki/Ping
www.tcpdump.org
en.wikipedia.org/wiki/Traceroute
Contents
This topic describes ...
net show Commands (see page 982)
Show Interfaces (see page 982)
Other Useful Features (see page 984)
Install netshow on a Linux Server (see page 984)
Show Interfaces
To show all available interfaces that are physically UP, run net show interface:
Whereas net show interface all displays every interface regardless of state:
You can get information about the switch itself by running net show system:
Debian and Red Hat packages will be available in the near future.
Contents
This topic describes ...
Install hsflowd (see page 984)
Configure sFlow (see page 985)
Configure sFlow via DNS-SD (see page 985)
Manually Configure /etc/hsflowd.conf (see page 986)
Configure sFlow Visualization Tools (see page 986)
Related Information (see page 986)
Install hsflowd
To download and install the hsflowd package, use apt-get:
Configure sFlow
You can configure hsflowd to send to the designated collectors via two methods:
DNS service discovery (DNS-SD)
Manually configuring /etc/hsflowd.conf
The above snippet instructs hsflowd to send sFlow data to collector1 on port 6343 and to collector2 on
port 6344. hsflowd will poll counters every 20 seconds and sample 1 out of every 2048 packets.
The maximum samples per second delivered from the hardware is limited to 16K. You can
configure the number of samples per second in the /etc/cumulus/datapath/traffic.conf
file, as shown below:
After the initial configuration is ready, bring up the sFlow daemon by running:
DNSSD = off
sampling.1G=2048
sampling.10G=4096
sampling.40G=8192
collector {
ip = 192.0.2.100
udpport = 6343
}
collector {
ip = 192.0.2.200
udpport = 6344
}
This configuration polls the counters every 20 seconds, samples 1 of every 2048 packets and sends this
information to a collector at 192.0.2.100 on port 6343 and to another collector at 192.0.2.200 on port
6344.
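To confirm which collectors a configuration targets, you can extract the collector blocks from the file. The following Python sketch is a simplified reader for the collector { ... } syntax shown above and is not a full hsflowd.conf parser. It assumes each block contains simple key = value lines without nested braces, and it assumes a default port of 6343 when udpport is omitted. The function name parse_collectors is hypothetical.

```python
import re


def parse_collectors(conf_text):
    """Return a list of (ip, udpport) tuples, one per collector block."""
    collectors = []
    for body in re.findall(r"collector\s*\{([^}]*)\}", conf_text):
        fields = dict(re.findall(r"(\w+)\s*=\s*(\S+)", body))
        # Assumption: fall back to port 6343 when udpport is not set.
        collectors.append((fields.get("ip"), int(fields.get("udpport", 6343))))
    return collectors


sample = """
collector {
  ip = 192.0.2.100
  udpport = 6343
}
collector {
  ip = 192.0.2.200
  udpport = 6344
}
"""
print(parse_collectors(sample))
```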
Some collectors require each source to transmit on a different port; others listen on only
one port. Refer to the documentation for your collector for more information.
Related Information
sFlow Collectors
sFlow Wikipedia page
Contents
This topic describes ...
History (see page 987)
Introduction to Simple Network Management Protocol (see page 988)
SNMP Managers (see page 988)
SNMP Agents (see page 988)
Management Information Base (MIB) (see page 988)
Getting Started (see page 990)
Configure SNMP (see page 990)
Configure SNMP with NCLU (see page 991)
Configure SNMP Manually (see page 997)
Start the SNMP Daemon (see page 1000)
Configure SNMP with Management VRF (used prior to Cumulus Linux 3.6) (see page 1001)
Set up the Custom Cumulus Networks MIBs (see page 1003)
Set the Community String (see page 1003)
Enable SNMP Support for FRRouting (see page 1004)
Enable the .1.3.6.1.2.1 Range (see page 1005)
Configure SNMPv3 (see page 1005)
Manually Configure SNMP Traps (Non-NCLU) (see page 1008)
Generate Event Notification Traps (see page 1008)
snmptrapd.conf (see page 1017)
Supported MIBs (see page 1018)
Pass Persist Scripts (see page 1022)
Troubleshooting (see page 1022)
History
SNMP is an IETF standards-based network management architecture and protocol that traces its roots back
to Carnegie-Mellon University in 1982. Since then, it has been modified by programmers at the University of
California. In 1995, this code was also made publicly available as the UCD project. After that, ucd-snmp was
extended by work done at the University of Liverpool as well as later in Denmark. In late 2000, the project
name changed to net-snmp and became a fully-fledged collaborative open source project. The version
used by Cumulus Networks is based on the latest net-snmp 5.7 branch with added custom MIBs and pass-
through and pass-persist scripts (see below (see page 1022) for more information on pass persist scripts).
SNMP Managers
An SNMP Network Management System (NMS) is a computer that is configured to poll SNMP agents (in this
case, Cumulus Linux switches and routers) to gather information and present it. This manager can be any
machine that can send query requests to SNMP agents with the correct credentials. The NMS can be a
large monitoring suite or as simple as a few scripts that collect and display data. The managers
generally poll the agents and the agents respond with the data. There are a variety of polling command-line
tools (snmpget, snmpgetnext, snmpwalk, snmpbulkget, snmpbulkwalk, and so on). SNMP agents can also
send unsolicited Traps/Inform messages to the SNMP Manager based on predefined criteria (like link
changes).
SNMP Agents
The SNMP agents (snmpd) running on the switches do the bulk of the work. They are responsible for
gathering information about the local system and storing it in a format that can be queried, updating an
internal database called the management information base, or MIB. The MIB is a standardized, hierarchical
structure that stores information that can be queried. Parts of the MIB tree are available and provided to
incoming requests originating from an NMS host that has authenticated with the correct credentials. You
can configure the Cumulus Linux switch with usernames and credentials to provide authenticated and
encrypted responses to NMS requests. The snmpd agent can also proxy requests and act as a master
agent to sub-agents running on other daemons (FRR, LLDP).
numbers and string and to also provide definitions for the various MIB Objects. For example, you can view
the sysLocation object in the system table with either a string of numbers 1.3.6.1.2.1.1.6 or the string
representation iso.org.dod.internet.mgmt.mib-2.system.sysLocation. You can view the definition with
the snmptranslate (1) command (found in the snmp Debian package).
snmptranslate Command
The section 1.3.6.1 or iso.org.dod.internet is the OID that defines internet resources. The 2 or
mgmt that follows is for a management subcategory. The 1 or mib-2 under that defines the MIB-2
specification. And finally, the 1 or system is the parent for a number of child objects (sysDescr, sysObjectID,
sysUpTime, sysContact, sysName, sysLocation, sysServices, and so on).
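To make the hierarchy concrete, the following minimal shell sketch walks a dotted OID one arc at a time and prints the label for each arc. The lookup table is hard-coded from the labels named above and covers only this one branch; the function name oid_to_names is hypothetical and is not part of Net-SNMP (use snmptranslate for real lookups).

```shell
#!/bin/sh
# Illustrative only: translate 1.3.6.1.2.1.1.6 (or a prefix of it) into labels.
# The number:label pairs below are copied from the text, one pair per arc.
oid_to_names() {
    table="1:iso 3:org 6:dod 1:internet 2:mgmt 1:mib-2 1:system 6:sysLocation"
    rest=$1
    out=""
    for entry in $table; do
        [ -n "$rest" ] || break            # the OID is a prefix of the branch
        arc=${rest%%.*}                    # next numeric arc
        if [ "$arc" != "${entry%%:*}" ]; then
            echo "unexpected arc: $arc" >&2
            return 1
        fi
        out="${out:+$out.}${entry#*:}"     # append the label for this arc
        case $rest in
            *.*) rest=${rest#*.} ;;        # drop the arc just consumed
            *)   rest="" ;;
        esac
    done
    if [ -n "$rest" ]; then
        echo "OID is deeper than the table" >&2
        return 1
    fi
    echo "$out"
}

oid_to_names 1.3.6.1.2.1.1.6   # prints iso.org.dod.internet.mgmt.mib-2.system.sysLocation
```

Passing a shorter prefix such as 1.3.6.1 prints only the labels for those arcs, which mirrors how each level of the tree is the parent of the next.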
cumulusnetworks.com 989
Getting Started
The simplest use case for using SNMP consists of creating a readonly community password and enabling a
listening address for the loopback address (this is the default listening-address provided). This allows for
testing functionality of snmpd before extending the listening addresses to IP addresses reachable from
outside the switch or router. This first sample configuration adds a listening address on the loopback
interface (this is not a change from the default so we get a message stating that the configuration has not
changed), sets a simple community password (SNMPv2) for testing, changes the system-name object in the
system table, commits the change, checks the status of snmpd, and gets the first MIB object in the system
table:
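The steps just described might look like the following sketch. The community string mynotsosecretpassword and the system name my-test-switch are placeholder values chosen for illustration, and the final query assumes the snmp client package is installed on the switch.

```shell
# Listen on the loopback only (this matches the default, so NCLU reports no change).
net add snmp-server listening-address localhost
# Placeholder SNMPv2c community string, for local testing only.
net add snmp-server readonly-community mynotsosecretpassword access any
# Placeholder value for the system-name object in the system table.
net add snmp-server system-name my-test-switch
net commit
# Confirm that snmpd is running.
systemctl status snmpd
# Fetch the first object under the system table (1.3.6.1.2.1.1).
snmpgetnext -v 2c -c mynotsosecretpassword localhost 1.3.6.1.2.1.1
```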
Configure SNMP
For external SNMP NMS systems to poll Cumulus Linux switches and routers, you must configure the SNMP
agent (snmpd) running on the switch with one or more IP addresses (with net add snmp-server
listening-address <ip>) on which the agent listens. You must configure these IP addresses on
interfaces that have link state UP. By default, the SNMP configuration has a listening address of localhost
(or 127.0.0.1), which allows the daemon to respond to SNMP requests originating on the switch itself. This is
a useful method of checking the configuration for SNMP without exposing the switch to attacks from the
outside. The only other required configuration is a readonly community password (configured with net
add snmp-server readonly-community <password> access <ip | any>), that allows polling of
the various MIB objects on the device itself. SNMPv3 is recommended since SNMPv2c (with a community
string) exposes the password in the GetRequest and GetResponse packets. SNMPv3 does not expose
the username passwords and has the option of encrypting the packet contents.
Cumulus Linux 3.4 and later releases support configuring SNMP with NCLU. While NCLU does not
provide functionality to configure every single snmpd feature, it is the recommended method of
configuring snmpd. You are not restricted to using NCLU for configuration and can edit the
/etc/snmp/snmpd.conf file and control snmpd with systemctl commands. For Cumulus Linux
versions earlier than 3.0, snmpd has a default configuration that listens to incoming requests on
all interfaces.
Cumulus Linux 3.6 and later releases support configuring VRFs for listening-addresses as well as
Trap/Inform support. If management VRF is enabled on the switch, this places the eth0 interface
in the management VRF. When configuring the listening-address for snmp-server, you must
specify an additional parameter to enable listening on the eth0 interface with the following
command:
cumulus@router1:~$ net add snmp-server listening-address 10.10.10.10 vrf mgmt
These additional parameters are described in detail below.
You must add a default community string for v1 or v2c environments or the snmpd daemon does
not respond to any requests. For security reasons, the default configuration configures snmpd to
listen to SNMP requests on the loopback interface so access to the switch is restricted to
requests originating from the switch itself. The only required commands for snmpd to function
are a listening-address and either a username or a readonly-community string.
Command Summary

net del all or net del snmp-server all
Removes all entries in the /etc/snmp/snmpd.conf file and replaces them with defaults. The defaults
remove all SNMPv3 usernames and readonly-communities, and configure a listening-address of localhost.

net add snmp-server listening-address (localhost|localhost-v6)
For security reasons, localhost (127.0.0.1) is the default listening address, so the SNMP agent
only responds to requests originating on the switch itself. You can also configure

net add snmp-server listening-address (all|all-v6)
Configures the snmpd agent to listen on all interfaces for either IPv4 or IPv6 UDP port 161 SNMP
requests. This command removes all other individual IP addresses configured.
Note: This command does not allow snmpd to cross VRF table boundaries. To listen on IP addresses in
different VRF tables, use multiple listening-address commands each with a VRF name, as shown below.

net add snmp-server listening-address IP_ADDRESS IP_ADDRESS ...
Sets snmpd to listen to a specific IPv4 or IPv6 address, or a group of addresses with space separated
values, for incoming SNMP queries. If VRF tables are used, be sure to specify an IP address with an
associated VRF name, as shown below. If you omit a VRF name, the default VRF is used.

net add snmp-server listening-address IP_ADDRESS vrf VRF_NAME
Sets snmpd to listen to a specific IPv4 or IPv6 address on an interface within a particular VRF. With
VRFs, identical IP addresses can exist in different VRF tables. This command restricts listening to a
particular IP address within a particular VRF. If the VRF name is not given, the default VRF is used.
Command Summary

net add snmp-server username [user name] (auth-none|auth-md5|auth-sha) <authentication password> [(encrypt-des|encrypt-aes) <encryption password>] (oid <OID>|view <view name>)
Creates an SNMPv3 username and the necessary credentials for access. You can restrict a user to a
particular OID tree or predefined view name if these are specified. If you specify auth-none, no
authentication password is required. Otherwise, an MD5 or SHA password is required for access to the
MIB objects. If specified, an encryption password is used to hide the contents of the request and
response packets.
net add snmp-server username testusernoauth auth-none
net add snmp-server username testuserauth auth-md5 myauthmd5password
net add snmp-server username testuserboth auth-md5 mynewmd5password encrypt-aes myencryptsecret
net add snmp-server username limiteduser1 auth-md5 md5password1 encrypt-aes myaessecret oid 1.3.6.1.2.1.1

net add snmp-server viewname [view name] (included | excluded) [OID or name]
Creates a view name that is used in readonly-community to restrict MIB tree exposure. By itself, this
view definition has no effect; however, when linked to an SNMPv3 username or community password, and a
host from a restricted subnet, any SNMP request with that username and password must have a source IP
address within the configured subnet.
Note: OID can be either a string of period separated decimal numbers or a unique text string that
identifies an SNMP MIB object. Some MIBs are not installed by default; you must install them either by
hand or with the latest Debian package called snmp-mibs-downloader. You can remove specific view name
entries with the delete command or with just a view name to remove all entries matching that view name.
You can define a specific view name multiple times and fine tune to provide or restrict access using
the included or excluded command to specify branches of certain MIB trees.
net add snmp-server (readonly-community | readonly-community-v6) [password] access (any | localhost | [network]) [(view [view name]) or (oid [oid or name])]
This command defines the password required for SNMP version 1 or 2c requests for GET or GETNEXT. By
default, this provides access to the full OID tree for such requests, regardless of where they were
sent from. There is no default password set, so snmpd does not respond to any requests that arrive.
Users often specify a source IP address token to restrict access to only that host or network. You can
specify a view name to restrict the subset of the OID tree.
Examples of readonly-community commands are shown below. The first command sets the read only
community string to simplepassword for SNMP requests, restricts requests to those sourced from hosts in
the 10.10.10.0/24 subnet, and restricts viewing to the mysystem view name defined with the viewname
command. The second example creates a read-only community password showitall that allows access to the
entire OID tree for requests originating from any source IP address.

net add snmp-server trap-destination (localhost | [ipaddress]) [vrf vrf name] community-password [password] [version [1 | 2c]]
For SNMP versions 1 and 2c, this command sets the SNMP Trap destination IP address. Multiple
destinations can exist, but you must set up at least one to enable SNMP Traps to be sent. Removing all
settings disables SNMP traps. The default version is 2c, unless otherwise configured. You must include
a VRF name with the IP address to force Traps to be sent in a non-default VRF table.
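As a sketch of the two readonly-community examples described above, the commands might look like this. The viewname line is an assumption added so that the mysystem view exists; the OID it includes (1.3.6.1.2.1.1, the system table) is only illustrative.

```shell
# Assumed definition of the mysystem view (illustrative OID: the system table).
net add snmp-server viewname mysystem included 1.3.6.1.2.1.1
# Community simplepassword, limited to the 10.10.10.0/24 subnet and the mysystem view.
net add snmp-server readonly-community simplepassword access 10.10.10.0/24 view mysystem
# Community showitall, full OID tree, any source address.
net add snmp-server readonly-community showitall access any
net commit
```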
Command Summary

net add snmp-server trap-destination (localhost | [ipaddress]) [vrf vrf name] username <v3 username> (auth-md5|auth-sha) <authentication password> [(encrypt-des|encrypt-aes) <encryption password>] engine-id <text> [inform]
For SNMPv3 Trap and Inform messages, this command configures the trap destination IP address (with an
optional VRF name). You must define the authentication type and password. The encryption type and
password are optional. You must specify the engine ID/user name pair. The inform keyword is used to
specify an Inform message where the SNMP agent waits for an acknowledgement.
For Traps, the engine ID/user name is for the CL switch sending the traps. This can be found at the end
of the /var/lib/snmp/snmpd.conf file labelled oldEngineID. Configure this same engine ID/user name
(with authentication and encryption passwords) for the Trap daemon receiving the trap to validate the
received Trap.
net add snmp-server trap-link-up [check-frequency [seconds]]
Enables notifications for interface link-up to be sent to SNMP Trap destinations.

net add snmp-server trap-link-down [check-frequency [seconds]]
Enables notifications for interface link-down to be sent to SNMP Trap destinations.

net add snmp-server trap-snmp-auth-failures
Enables SNMP Trap notifications to be sent for every SNMP authentication failure.

net add snmp-server trap-cpu-load-average one-minute [threshold] five-minute [5-min-threshold] fifteen-minute [15-min-threshold]
Enables a trap when the cpu-load-average exceeds the configured threshold. You can only use integers or
floating point numbers.
net add snmp-server trap-cpu-load-average one-minute 4.34 five-minute 2.32 fifteen-minute 6.5
Command Summary

net add snmp-server system-location [string]
Sets the system physical location for the node in the SNMPv2-MIB system table.

net add snmp-server system-contact [string]
Sets the identification of the contact person for this managed node, together with information on how
to contact this person.

net add snmp-server system-name [string]
Sets an administratively-assigned name for the managed node. By convention, this is the fully-qualified
domain name of the node.
The example commands below enable an SNMP agent to listen on all IPv4 addresses with a community
string password, set the trap destination host IP address, and create four types of SNMP traps.
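A sequence along these lines, built only from the command forms in the tables above, would match that description. The community string, trap password, destination address (192.0.2.10, a documentation address), and check frequencies are illustrative assumptions; the CPU load thresholds repeat the values from the table.

```shell
# Listen on all IPv4 addresses and set a placeholder community string.
net add snmp-server listening-address all
net add snmp-server readonly-community tempPassword access any
# Send SNMPv2c traps to an assumed receiver address.
net add snmp-server trap-destination 192.0.2.10 community-password trapPassword version 2c
# Four trap types: link up, link down, authentication failures, CPU load average.
net add snmp-server trap-link-up check-frequency 15
net add snmp-server trap-link-down check-frequency 10
net add snmp-server trap-snmp-auth-failures
net add snmp-server trap-cpu-load-average one-minute 4.34 five-minute 2.32 fifteen-minute 6.5
net commit
```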
Use caution when editing this file. The next time you use NCLU to update your SNMP configuration, if NCLU
is unable to correctly parse the syntax, some of the options might be overwritten.
Make sure you do not delete the snmpd.conf file; this can cause issues with the package manager the
next time you update Cumulus Linux.
The SNMP daemon, snmpd, uses the /etc/snmp/snmpd.conf configuration file for most of its
configuration. The syntax of the most important keywords is defined in the following table.
Syntax Meaning

agentaddress
Required. This command sets the protocol, IP address, and the port for snmpd to listen for incoming
requests. The IP address must exist on an interface that has link UP on the switch where snmpd is being
used. By default, this is set to udp:127.0.0.1:161, which means snmpd listens on the loopback interface
and only responds to requests (snmpwalk, snmpget, snmpgetnext) originating from the switch. A wildcard
setting of udp:161,udp6:161 forces snmpd to listen on all IPv4 and IPv6 interfaces for incoming SNMP
requests. You can configure multiple IP addresses as comma-separated values; for example,
udp:66.66.66.66:161,udp:77.77.77.77:161,udp6:[2001::1]:161. You can use multiple lines to define
listening addresses. To bind to a particular IP address within a particular VRF table, follow the IP
address with a % and the name of the VRF table (for example, 10.10.10.10%mgmt).

rocommunity
Required. This command defines the password that is required for SNMP version 1 or 2c requests for GET
or GETNEXT. By default, this provides access to the full OID tree for such requests, regardless of
where they were sent from. There is no default password set, so snmpd does not respond to any requests
that arrive. Specify a source IP address token to restrict access to only that host or network. Specify
a view name (as defined above) to restrict the subset of the OID tree.
Examples of rocommunity commands are shown below. The first command sets the read only community
string to simplepassword for SNMP requests sourced from the 10.10.10.0/24 subnet and restricts viewing
to the systemonly view name defined previously with the view command. The second example creates a
read-only community password that allows access to the entire OID tree from any source IP address.
rocommunity simplepassword 10.10.10.0/24 -V systemonly
rocommunity cumulustestpassword

view
This command defines a view name that specifies a subset of the overall OID tree. You can reference
this restricted view by name in the rocommunity command to link the view to a password that is used to
see this restricted OID subset. By default, the snmpd.conf file contains numerous views with the
systemonly view name.
Syntax Meaning

trapsink, trap2sink
This command defines the IP address of the notification (or trap) receiver for either SNMPv1 traps or
SNMPv2 traps. If you specify several sink directives, multiple copies of each notification (in the
appropriate formats) are generated. You must configure a trap server to receive and decode these trap
messages (for example, snmptrapd). You can configure the address of the trap receiver with a different
protocol and port, but this is most often left out. By default, UDP and the well-known port 162 are
used.

createuser snmptrapusernameX
iquerysecname snmptrapusernameX
rouser snmptrapusernameX

linkUpDownNotifications yes
This command enables link up and link down trap notifications, assuming the other trap configuration
settings are set. This command configures the Event MIB tables to monitor the ifTable for network
interfaces being taken up or down, and triggers a linkUp or linkDown notification as appropriate. This
is equivalent to the following configuration:

defaultMonitors yes
This command configures the Event MIB tables to monitor the various UCD-SNMP-MIB tables for problems
(as indicated by the appropriate xxErrFlag column objects) and send a trap. This assumes you have
downloaded the snmp-mibs-downloader Debian package and commented out mibs from the /etc/snmp/snmp.conf
file (#mibs). This command is exactly equivalent to the following configuration:
[Service]
Restart=always
RestartSec=60
After the service starts, you can use SNMP to manage various components on the switch.
Configure SNMP with Management VRF (used prior to Cumulus Linux 3.6)
When you configure Management VRF (see page 859), you need to be aware of the interface IP addresses
on which SNMP is listening. If you set listening-address to all, the snmpd daemon responds to incoming
requests on all interfaces that are in the default VRF. If you prefer to listen on a limited number of IP
addresses, Cumulus Networks recommends that you run only one instance of the snmpd daemon and
specify the VRF name along with the listening-address. You can configure IP addresses in different VRFs and
a single SNMP daemon listens on multiple IP addresses each with its own VRF. Because SNMP has native
VRF awareness, using systemctl commands to manage snmpd in different VRFs is no longer necessary.
SNMP configuration in NCLU is VRF aware so you can configure the snmpd daemon to listen to incoming
SNMP requests on a particular IP address within particular VRFs. Because interfaces in a particular VRF
(routing table) are not aware of interfaces in a different VRF, the snmpd daemon only responds to polling
requests and sends traps on the interfaces of the VRF on which it is configured.
When management VRF is configured, configure the listening-address with a VRF name as shown above.
This allows snmpd to receive and respond to SNMP polling requests on eth0.
Prior to CL 3.6, you could not configure a VRF name in the listening-address or the trap-destination
commands. To manually handle VRF functionality, you had to do the following:
1. Configure all the required SNMP settings with NCLU. Pay particular attention to the listening-address
configuration setting, which should contain one or more IP addresses that belong to interfaces
within a single VRF (if management VRF is configured, this is typically the IP address of eth0 ). You can
use IP addresses other than eth0, but the interfaces for these IP addresses must be in the same VRF
(typically the management VRF).
2. Commit the changes to start the snmpd daemon in the default VRF.
3. Manually stop the snmpd daemon from running in the default VRF.
4. Manually restart the snmpd daemon in the management VRF.
Prior to CL 3.6, more complex configurations may have been needed; for example, you can run
more than one snmpd daemon (one in each VRF designed to receive SNMP polling requests).
Cumulus Networks does not recommend this for memory and cpu resource reasons. However, if
this is required, you must use a separate configuration file with each instance of the snmpd
daemon. You can use a copy of the /etc/snmp/snmpd.conf file. When you use this file, start an
snmpd daemon with the following command:
To use management VRF, you need to configure the IP address of eth0 as the listening-address. In the
example below, eth0 IP address is 10.10.10.10. You can also add other snmp-server configurations, then
commit the changes.
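A minimal sketch of that configuration follows. The listening address comes from the example in the text; the community string is a placeholder assumption standing in for whatever other snmp-server settings you need.

```shell
# Listen on the eth0 address used in this example.
net add snmp-server listening-address 10.10.10.10
# Placeholder community string (assumption, for illustration only).
net add snmp-server readonly-community mynotsosecretpassword access any
# Review the staged changes, then apply them.
net pending
net commit
```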
This restarts the snmpd daemon in the default VRF. Then, to run snmpd in the correct VRF, stop the
daemon in the default VRF (or stop any other snmpd daemons that happen to be running), then restart
snmpd in the management VRF so that it can respond to requests on interfaces only in that VRF. Make sure
that only one instance of the snmpd daemon is running and that it is running in the desired VRF. Assuming
the Management VRF has been enabled, the following example shows how to stop snmpd and restart it in
the management VRF.
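The stop and restart sequence might look like the sketch below. The snmpd@mgmt.service unit name is taken from the status output in this section; the disable and enable steps are an assumption added so that the choice of VRF persists across reboots.

```shell
# Stop the instance that NCLU started in the default VRF.
sudo systemctl stop snmpd.service
sudo systemctl disable snmpd.service
# Start snmpd in the management VRF instead.
sudo systemctl start snmpd@mgmt.service
sudo systemctl enable snmpd@mgmt.service
# Verify that exactly one snmpd instance is running, in the mgmt VRF.
sudo systemctl status snmpd@mgmt.service
```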
ago
Main PID: 30880 (snmpd)
CGroup: /system.slice/system-snmpd.slice/snmpd@mgmt.service
30880 /usr/sbin/snmpd -y -LS 0-4 d -Lf /dev/null -u snmp -g snmp -I -smux -p /run/snmpd.pid -f
sysObjectID 1.3.6.1.4.1.40310
pass_persist .1.3.6.1.4.1.40310.1 /usr/share/snmp/resq_pp.py
pass_persist .1.3.6.1.4.1.40310.2 /usr/share/snmp/cl_drop_cntrs_pp.py
However, you need to copy several files to the NMS server for the custom Cumulus MIB to be recognized
on the NMS server.
/usr/share/snmp/mibs/Cumulus-Snmp-MIB.txt
/usr/share/snmp/mibs/Cumulus-Counters-MIB.txt
/usr/share/snmp/mibs/Cumulus-Resource-Query-MIB.txt
Keyword Meaning
default The default keyword allows connections from any system. The localhost keyword
allows requests only from the local host. A restricted source can either be a
specific hostname (or address), or a subnet, represented as IP/MASK (like
10.10.10.0/255.255.255.0), or IP/BITS (like 10.10.10.0/24), or the IPv6 equivalents.
systemonly The name of this particular SNMP view. This is a user-defined value.
3. Restart snmpd:
At this time, SNMP does not support monitoring BGP unnumbered neighbors.
If you plan on using the OSPFv2 MIB, provide access to 1.3.6.1.2.1.14; for the OSPFv3 MIB, provide
access to 1.3.6.1.2.1.191.
To enable SNMP support for FRRouting:
2. Update the SNMP configuration to enable FRRouting to respond to SNMP requests. Open the
/etc/snmp/snmpd.conf file in a text editor and verify that the following configuration exists:
agentxsocket /var/agentx/master
agentxperms 777 777 snmp snmp
master agentx
Make sure that the /var/agentx directory is world-readable and world-searchable (octal
mode 755).
To verify the configuration, run snmpwalk. For example, if you have a running OSPF configuration with
routes, you can check this OSPF-MIB first from the switch itself with:
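For example, a walk of the OSPFv2 MIB subtree (1.3.6.1.2.1.14, as noted above) from the switch itself could look like the following. The community string public is an assumption; substitute the readonly-community you configured.

```shell
# Walk the OSPFv2 MIB subtree on the local agent using SNMPv2c.
snmpwalk -v 2c -c public localhost 1.3.6.1.2.1.14
```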
This configuration grants access to a large number of MIBs, including all SNMPv2-MIB, which
might reveal more data than expected. In addition to being a security vulnerability, it might
consume more CPU resources.
To enable the .1.3.6.1.2.1 range, make sure the view name commands include the required MIB objects.
Configure SNMPv3
SNMPv3 is often used to enable authentication and encryption, as community strings in versions 1 and 2c
are sent in plaintext. SNMPv3 usernames are added to the /etc/snmp/snmpd.conf file, along with
plaintext authentication and encryption pass phrases.
The NCLU command structures for configuring SNMP user passwords are:
The example below defines five users, each with a different combination of authentication and
encryption:
After configuring user passwords and restarting the snmpd daemon, you can check user access with a
client.
The snmp Debian package contains snmpget, snmpwalk, and other programs that are useful for
checking daemon functionality from the switch itself or from another workstation.
The following commands check the access for each user defined above from the localhost:
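As a sketch, the checks below use three of the example users from the username command summary earlier in this section (testusernoauth, testuserauth, and testuserboth) and assume they have been created with the passwords shown there. Each command requests sysDescr.0 (1.3.6.1.2.1.1.1.0) from the local agent at a different security level.

```shell
# No authentication, no encryption.
snmpget -v 3 -u testusernoauth -l noAuthNoPriv localhost 1.3.6.1.2.1.1.1.0
# MD5 authentication, no encryption.
snmpget -v 3 -u testuserauth -l authNoPriv -a MD5 -A myauthmd5password localhost 1.3.6.1.2.1.1.1.0
# MD5 authentication with AES encryption.
snmpget -v 3 -u testuserboth -l authPriv -a MD5 -A mynewmd5password -x AES -X myencryptsecret localhost 1.3.6.1.2.1.1.1.0
```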
The following procedure shows a slightly more secure method of configuring SNMPv3 users without
creating cleartext passwords:
3. Use the net-snmp-config command to create two users, one with MD5 and DES, and the next
with SHA and AES.
The minimum password length is eight characters, and the arguments -a and -x have
different meanings in net-snmp-config than in snmpwalk.
This adds a createUser command in /var/lib/snmp/snmpd.conf. Do not edit this file by hand unless
you are removing usernames. It also adds the rwuser in /usr/share/snmp/snmpd.conf. You can edit
this file and restrict access to certain parts of the MIB by adding noauth, auth or priv to allow
unauthenticated access, require authentication, or to enforce use of encryption.
The snmpd daemon reads the information from the /var/lib/snmp/snmpd.conf file. The line is then
removed (eliminating the storage of the master password for that user) and replaced with the key that is
derived from it (using the EngineID). This key is a localized key, so that if it is stolen it cannot be used to
access other agents. To remove the two users userMD5withDES and userSHAwithAES, stop the snmpd
daemon and edit the /var/lib/snmp/snmpd.conf and /usr/share/snmp/snmpd.conf files. Remove
the lines containing the username, then restart the snmpd daemon as in step 3 above.
From a client, you access the MIB with the correct credentials. (The roles of -x, -a and -X and -A are
reversed on the client side as compared with the net-snmp-config command used above.)
createuser trapusername
iquerysecname trapusername
rouser trapusername
iquerysecname specifies the default SNMPv3 username to be used when making internal
queries to retrieve any necessary information — either for evaluating the monitored expression
or building a notification payload. These internal queries always use SNMPv3, even if normal
querying of the agent is done using SNMPv1 or SNMPv2c. Note that this user must also be
explicitly created via createUser and given appropriate access rights (with rouser, for example).
The iquerysecname directive is purely concerned with defining which user should be used, not
with actually setting this user up.
Although the traps are sent to an SNMPv2 receiver, the SNMPv3 user is still required. Starting
with Net-SNMP 5.3, snmptrapd no longer accepts all traps by default. snmptrapd must be
configured with authorized SNMPv1/v2c community strings and/or SNMPv3 users. Non-
authorized traps/informs are dropped. Refer to the snmptrapd.conf(5) manual page for details.
It is possible to define multiple trap receivers and to use the domain name instead of an IP
address in the trap2sink directive.
SNMPv3 TRAP/INFORM
The SNMP trap receiving daemon must have usernames, authentication passwords, and encryption
passwords created with its own EngineID. You must configure this trap server EngineID in the switch snmpd
daemon sending the trap and inform messages. You specify the level of authentication and encryption for
SNMPv3 trap and inform messages with -l (noAuthNoPriv, authNoPriv, or authPriv).
You can define multiple trap receivers and use the domain name instead of an IP address in the
trap2sink directive.
After you complete the configuration, restart the snmpd service to apply the changes:
snmptrap, snmpget, snmpwalk and snmpd itself must be able to bind to this address.
clientaddr [<transport-specifier>:]<transport-address>
specifies the source address to be used by command-line applications when sending SNMP requests.
See snmpcmd(1) for more information about the format of addresses.
This value is also used by snmpd when generating notifications.
EXPRESSION
There are three types of monitor expression supported
by the Event MIB - existence, boolean and threshold tests.
OID OP VALUE
OPTIONS
You can configure snmpd to monitor the operational status of an Entity MIB or Entity-Sensor MIB. You can
determine the operational status, given as a value of ok(1), unavailable(2) or nonoperational(3), by adding the
following example configuration to /etc/snmp/snmpd.conf and adjusting the values:
Using the entPhySensorOperStatus integer:
You can use the OID name if the snmp-mibs-downloader package is installed.
To get all sensor information, run snmpwalk on the entPhysicalName table. For example:
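A walk of the entPhysicalName column might look like this sketch. It uses the numeric OID for entPhysicalName in the ENTITY-MIB (1.3.6.1.2.1.47.1.1.1.1.7) so that it works without the MIB files installed; the community string public is an assumption.

```shell
# List the names of all physical entities (sensors, fans, ports, and so on).
snmpwalk -v 2c -c public localhost 1.3.6.1.2.1.47.1.1.1.1.7
```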
5. Open the /etc/snmp/snmp.conf file to verify that the mibs : line is commented out:
#
# As the snmp packages come without MIB files due to license reasons, loading
# of MIBs is disabled by default. If you added the MIBs you can reenable
# loading them by commenting out the following line.
#mibs :
6. Open the /etc/default/snmpd file to verify that the export MIBS= line is commented out:
7. After you confirm the configuration, remove or comment out the non-free repository in
/etc/apt/sources.list.
linkUpDownNotifications yes
The default frequency for checking link up/down is 60 seconds. You can change the default
frequency using the monitor directive directly instead of the linkUpDownNotifications
directive. See man snmpd.conf for details.
To monitor the sensors individually, first use the sensors command to determine which sensors are
available to be monitored on the platform.
CY8C3245-i2c-4-2e
Adapter: i2c-0-mux (chan_id 2)
fan5: 7006 RPM (min = 2500 RPM, max = 23000 RPM)
fan6: 6955 RPM (min = 2500 RPM, max = 23000 RPM)
fan7: 6799 RPM (min = 2500 RPM, max = 23000 RPM)
fan8: 6750 RPM (min = 2500 RPM, max = 23000 RPM)
temp1: +34.0 C (high = +68.0 C)
temp2: +28.0 C (high = +68.0 C)
temp3: +33.0 C (high = +68.0 C)
temp4: +31.0 C (high = +68.0 C)
temp5: +23.0 C (high = +68.0 C)
Configure a monitor command for the specific sensor using the -I option. The -I option indicates that
the monitored expression is applied to a single instance. In this example, there are five temperature
sensors available. Use the following directive to monitor only temperature sensor 3 at 5 minute intervals.
load 12 10 5
includeAllDisks 1%
monitor -r 60 -o dskPath -o DiskErrMsg "dskTable" diskErrorFlag !=0
authtrapenable 1
snmptrapd.conf
Use the Net-SNMP trap daemon to receive SNMP traps. The /etc/snmp/snmptrapd.conf file is used to
configure how incoming traps are processed. Starting with Net-SNMP release 5.3, you must specify who is
authorized to send traps and informs to the notification receiver (and what types of processing these are
allowed to trigger). You can specify three processing types:
log logs the details of the notification in a specified file, to standard output (or stderr), or
through syslog (or similar).
execute passes the details of the trap to a specified handler program, including embedded Perl.
net forwards the trap to another notification receiver.
Typically, this configuration is log,execute,net to cover any style of processing for a particular category of
notification. But it is possible (even desirable) to limit certain notification sources to selected processing
only.
authCommunity TYPES COMMUNITY [SOURCE [OID | -v VIEW ]] authorizes traps and SNMPv2c
INFORM requests with the specified community to trigger the types of processing listed. By default, this
allows any notification using this community to be processed. You can use the SOURCE field to specify that
the configuration only applies to notifications received from particular sources. For more information about
specific configuration options within the file, look at the snmptrapd.conf(5) man page with the following
command:
###############################################################################
#
# EXAMPLE-trap.conf:
# An example configuration file for configuring the Net-SNMP snmptrapd agent.
#
###############################################################################
#
# This file is intended to only be an example. If, however, you want
# to use it, it should be placed in /etc/snmp/snmptrapd.conf.
# When the snmptrapd agent starts up, this is where it will look for it.
#
# All lines beginning with a '#' are comments and are intended for you
# to read. All other lines are configuration commands for the agent.
#
# PLEASE: read the snmptrapd.conf(5) manual page as well!
#
# this is the default (port 162) and defines the listening
# protocol and address (e.g. udp:10.10.10.10)
snmpTrapdAddr localhost
#
# defines the actions and the community string
authCommunity log,execute,net public
Supported MIBs
Below are the MIBs supported by Cumulus Linux, as well as suggested uses for them. The overall Cumulus
Linux MIB is defined in the /usr/share/snmp/mibs/Cumulus-Snmp-MIB.txt file.
BGP4-MIB, OSPFv2-MIB, OSPFv3-MIB, RIPv2-MIB
You can enable FRRouting SNMP support to provide support for OSPF-MIB (RFC-1850), OSPFV3-MIB
(RFC-5643), and BGP4-MIB (RFC-1657). See the FRRouting section (see page 1004) above.

CUMULUS-COUNTERS-MIB
Discard counters: Cumulus Linux also includes its own counters MIB, defined in
/usr/share/snmp/mibs/Cumulus-Counters-MIB.txt. It has the OID .1.3.6.1.4.1.40310.2.

CUMULUS-POE-MIB
The Cumulus Networks custom Power over Ethernet (see page 202) PoE MIB defined in the
/usr/share/snmp/mibs/Cumulus-POE-MIB.txt file. For devices that provide PoE, this provides users with
the system wide power information in poeSystemValues as well as per interface PoeObjectsEntry values
for the poeObjectsTable. Most of this information comes from the poectl command. To enable this MIB,
uncomment the following line in /etc/snmp/snmpd.conf:

CUMULUS-RESOURCE-QUERY-MIB
Cumulus Linux includes its own resource utilization MIB, which is similar to using cl-resource-query.
This MIB monitors layer 3 entries by host, route, nexthops, ECMP groups, and layer 2 MAC/BPDU entries.
The MIB is defined in /usr/share/snmp/mibs/Cumulus-Resource-Query-MIB.txt and has the OID
.1.3.6.1.4.1.40310.1.

CUMULUS-SNMP-MIB
SNMP counters. For information on exposing CPU and memory information with SNMP, see this knowledge
base article.

ENTITY-MIB
From RFC 4133, the temperature sensors, fan sensors, power sensors, and ports are covered.

ENTITY-SENSOR-MIB
Physical sensor information (temperature, fan, and power supply) from RFC 3433.

IEEE8021-BRIDGE-MIB, IEEE8021-Q-BRIDGE-MIB
The dot1dBasePortEntry and dot1dBasePortIfIndex tables in the BRIDGE-MIB and dot1qBase,
dot1qFdbEntry, dot1qTpFdbEntry, dot1qTpFdbStatus, and dot1qVlanStaticName tables in the Q-BRIDGE-MIB
tables. You must uncomment the bridge_pp.py pass_persist script in /etc/snmp/snmpd.conf.

IF-MIB
Interface description, type, MTU, speed, MAC, admin, operation status, counters
The IF-MIB cache is disabled by default. To enable the counter to reflect traffic statistics, remove
the -y option from the SNMPDOPTS line in the /etc/default/snmpd file. The example below first shows the
original line, commented out, then the modified line without the -y option:
LLDP-MIB Layer 2 neighbor information from lldpd (you need to enable the SNMP subagent (see
page 384) in LLDP). You need to start lldpd with the -x option to enable connectivity
to snmpd (AgentX).
LM-SENSORS Fan speed, temperature sensor values, voltages. This is deprecated since the ENTITY-
MIB SENSOR MIB has been added.
NET-SNMP- See this knowledge base article on extending NET-SNMP in Cumulus Linux to include
EXTEND-MIB data from power supplies, fans, and temperature sensors.
SNMP-
TARGET-MIB
The ENTITY MIB does not show the chassis information in Cumulus Linux.
Troubleshooting
Use the following commands to troubleshoot potential SNMP issues:
Contents
This topic describes ...
Configure Cumulus Linux (see page 1023)
Configure Nutanix (see page 1025)
Switch Information Displayed on Nutanix Prism (see page 1028)
Troubleshooting (see page 1029)
Enable LLDP/CDP on VMware ESXi (Hypervisor on Nutanix) (see page 1029)
Troubleshoot Connections without LLDP or CDP (see page 1031)
1. SSH to the Cumulus Linux switch that needs to be configured, replacing [switch] below as
appropriate:
5. Restart snmpd:
Configure Nutanix
1. Log into the Nutanix Prism. Nutanix defaults to the Home menu, referred to as the Dashboard:
2. Click on the gear icon in the top right corner of the dashboard, and select Network Switch:
3. Click the +Add Switch Configuration button in the Network Switch Configuration pop up
window.
4. Fill out the Network Switch Configuration for the Top of Rack (ToR) switch configured for snmpd
in the previous section:
SNMP Community Name: public
The rest of the values were not touched for this demonstration. They are usually used with
SNMP v3.
5. Save the configuration. The switch is now present in the Network Switch Configuration
menu.
6. Close the pop up window to return to the dashboard.
7. Open the Hardware option from the Home dropdown menu:
9. Click the Switch button. Configured switches are shown in the table, as indicated in the screenshot
below, and can be selected in order to view interface statistics:
The switch has been added correctly when the interfaces connected to the Nutanix hosts are visible.
Troubleshooting
To help visualize the setup, the following diagram is provided:
kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003885
wahlnetwork.com/2012/07/17/utilizing-cdp-and-lldp-with-vsphere-networking/
For example, switch CDP on:
The both value means CDP is now running, and the lldp daemon on Cumulus Linux is capable of
'seeing' CDP devices.
2. After the next CDP interval, the Cumulus Linux box will pick up the interface via the lldp daemon:
PortDescr: vmnic2
-----------------------------------------------------------------
--------------
Nutanix Acropolis is an alternate hypervisor that Nutanix supports. Acropolis Hypervisor uses the yum
packaging system and can install normal Linux lldp daemons that operate just like the one on Cumulus
Linux. LLDP should be enabled for each interface on the host. Refer to
https://community.mellanox.com/docs/DOC-1522 for setup instructions.
1. Find the MAC address information in the Prism GUI, located in: Hardware > Table > Host > Host
NICs
2. Select a MAC address to troubleshoot (e.g. 0c:c4:7a:09:a2:43 represents vmnic0 which is tied to NX-
1050-A).
3. List out all the MAC addresses associated to the bridge:
Contents
This topic describes ...
Overview (see page 1033)
Trend Analysis Using Metrics (see page 1033)
Generate Alerts with Triggered Logging (see page 1034)
Log Formatting (see page 1034)
Hardware (see page 1034)
System Data (see page 1036)
CPU Idle Time (see page 1036)
Disk Usage (see page 1038)
Process Restart (see page 1038)
Layer 1 Protocols and Interfaces (see page 1039)
Layer 2 Protocols (see page 1046)
Layer 3 Protocols (see page 1048)
BGP (see page 1048)
OSPF (see page 1048)
Route and Host Entries (see page 1049)
Routing Logs (see page 1050)
Logging (see page 1050)
Protocols and Services (see page 1052)
Device Management (see page 1053)
Device Access Logs (see page 1053)
Device Super User Command Logs (see page 1053)
Overview
This document describes:
Metrics that you can poll from Cumulus Linux and use in trend analysis
Critical log messages that you can monitor for triggered alerts
Log Formatting
Most log files in Cumulus Linux use a standard presentation format. For example, consider this syslog
entry:
Hardware
The smond process provides monitoring functionality for various switch hardware elements. Minimum or
maximum values are output depending on the flags applied to the basic command. The hardware elements
and applicable commands and flags are listed in the table below.
Hardware Element    Monitoring Command/s                               Interval Poll
Temperature         cumulus@switch:~$ smonctl -j                       10 seconds
                    cumulus@switch:~$ smonctl -j -s TEMP[X]
Fan                 cumulus@switch:~$ smonctl -j                       10 seconds
                    cumulus@switch:~$ smonctl -j -s FAN[X]
PSU                 cumulus@switch:~$ smonctl -j                       10 seconds
                    cumulus@switch:~$ smonctl -j -s PSU[X]
                    cumulus@switch:~$ smonctl -j
                    cumulus@switch:~$ smonctl -j -s PSU[X]Fan[X]
                    cumulus@switch:~$ smonctl -j
                    cumulus@switch:~$ smonctl -j -s PSU[X]Temp[X]
Voltage             cumulus@switch:~$ smonctl -j                       10 seconds
                    cumulus@switch:~$ smonctl -j -s Volt[X]
                    cumulus@switch:~$ ledmgrd -d
                    cumulus@switch:~$ ledmgrd -j
Not all switch models include a sensor for monitoring power consumption and voltage. See this
note (see page 920) for details.
High temperature
    /usr/sbin/smond : : Temp1(Board Sensor near CPU): state changed from UNKNOWN to OK

Fan speed issues (/var/log/syslog)
    /usr/sbin/smond : : Fan1(Fan Tray 1, Fan 1): state changed from UNKNOWN to OK
    /usr/sbin/smond : : Fan2(Fan Tray 1, Fan 2): state changed from UNKNOWN to OK
    /usr/sbin/smond : : Fan3(Fan Tray 2, Fan 1): state changed from UNKNOWN to OK
    /usr/sbin/smond : : Fan4(Fan Tray 2, Fan 2): state changed from UNKNOWN to OK
    /usr/sbin/smond : : Fan5(Fan Tray 3, Fan 1): state changed from UNKNOWN to OK
    /usr/sbin/smond : : Fan6(Fan Tray 3, Fan 2): state changed from UNKNOWN to OK

PSU failure
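The smond entries above share one shape: a sensor name, a description in parentheses, and a state transition. The Python sketch below shows one way a log watcher might extract those fields. The function names are hypothetical, and treating every new state other than OK as alert-worthy is an assumption, since only the UNKNOWN and OK states appear in the examples.

```python
import re

# Matches the smond state-change entries shown in the hardware log examples.
SMOND_RE = re.compile(
    r"smond : : (?P<sensor>\w+)\((?P<description>[^)]*)\): "
    r"state changed from (?P<old>\w+) to (?P<new>\w+)"
)


def parse_smond(line):
    """Return sensor, description, old and new state, or None if no match."""
    match = SMOND_RE.search(line)
    return match.groupdict() if match else None


def needs_alert(event):
    """Assumption: any new state other than OK deserves attention."""
    return event["new"] != "OK"


event = parse_smond(
    "/usr/sbin/smond : : Fan1(Fan Tray 1, Fan 1): "
    "state changed from UNKNOWN to OK"
)
```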
System Data
Cumulus Linux includes a number of ways to monitor various aspects of system data. In addition, alerts are
issued in high risk situations.
Short bursts of high CPU can occur during switchd churn or routing protocol startup. Do not set
alerts for these short bursts.
High CPU (/var/log/syslog)
    sysmonitor: Critically high CPU use: 99%
    systemd[1]: Starting Monitor system resources (cpu, memory, disk)...
    systemd[1]: Started Monitor system resources (cpu, memory, disk).
    sysmonitor: High CPU use: 89%
    systemd[1]: Starting Monitor system resources (cpu, memory, disk)...
    systemd[1]: Started Monitor system resources (cpu, memory, disk).
    sysmonitor: CPU use no longer high: 77%
Cumulus Linux 3.0 and later monitors CPU, memory, and disk space via sysmonitor. The configurations
for the thresholds are stored in /etc/cumulus/sysmonitor.conf. More information is available with
man sysmonitor.
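Because short CPU bursts should not raise alerts, a log watcher can classify the sysmonitor messages shown above and alert only when the high state persists. The Python sketch below is one possible approach, not part of sysmonitor: the level names, the sustained_high helper, and the default of three consecutive samples are all assumptions for illustration.

```python
import re

# The three sysmonitor CPU messages that appear in the example log.
CPU_RE = re.compile(
    r"sysmonitor: (?P<text>Critically high CPU use|High CPU use|"
    r"CPU use no longer high): (?P<percent>\d+)%"
)

# Hypothetical level names chosen for this sketch.
LEVELS = {
    "Critically high CPU use": "critical",
    "High CPU use": "high",
    "CPU use no longer high": "cleared",
}


def classify_cpu(line):
    """Return (level, percent) for a sysmonitor CPU line, else None."""
    match = CPU_RE.search(line)
    if match is None:
        return None
    return LEVELS[match.group("text")], int(match.group("percent"))


def sustained_high(events, min_count=3):
    """True when the last min_count events were all high or critical."""
    recent = events[-min_count:]
    return len(recent) == min_count and all(
        level != "cleared" for level, _ in recent
    )
```

Feeding the classified events into sustained_high filters out a single spike followed by a "no longer high" message.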
Differences between Cumulus Linux 2.5 ESR and 3.0 and later:
In Cumulus Linux 2.5, CPU logs are created with each unique threshold:
User 70%
System 30%
Wait 20%
In Cumulus Linux 2.5, CPU and memory warnings are generated with jdoo. The configuration for the
thresholds is stored in /etc/jdoo/jdoorc.d/cl-utilities.rc.
Disk Usage
When monitoring disk utilization, you can exclude tmpfs from monitoring.
cumulus@switch:~$ /bin/df -x tmpfs
Process Restart
In Cumulus Linux 3.0 and later, systemd is responsible for monitoring and restarting processes.
cumulus@switch:~$ systemctl status
Changes from Cumulus Linux 2.5 ESR to 3.0 and later:
Cumulus Linux 2.5.2 through 2.5 ESR uses a forked version of monit called jdoo to monitor processes. If
the process fails, jdoo invokes init.d to restart the process.
cumulus@switch:~$ ps -aux
Link state
Link speed
Port state
Bond state
Interface counters are obtained from either querying the hardware or the Linux kernel. The two outputs
should align, but the Linux kernel aggregates the output from the hardware.
Interface counters (interval poll: 10 seconds)
    cumulus@switch:~$ cat /sys/class/net/[iface]/statistics/[stat_name]
    cumulus@switch:~$ net show counters json
    cumulus@switch:~$ cl-netstat -j
    cumulus@switch:~$ ethtool -S [iface]
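For trend analysis, the raw counter values matter less than how fast they change between polls. The Python sketch below reads one kernel counter from the /sys/class/net/[iface]/statistics/[stat_name] path listed above and turns two samples into a per-second rate. The function names are hypothetical, the 10 second default mirrors the poll interval in the table, and counter wrap or reset is not handled.

```python
from pathlib import Path


def read_counter(iface, stat_name, root="/sys/class/net"):
    """Read one kernel interface counter, for example rx_bytes on swp1."""
    path = Path(root) / iface / "statistics" / stat_name
    return int(path.read_text().strip())


def rate_per_second(previous, current, interval_seconds=10.0):
    """Average change per second between two samples of the same counter."""
    return (current - previous) / interval_seconds
```

A poller would call read_counter once per interval, keep the previous value, and record the result of rate_per_second as the metric.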
Link failure/Link flap (/var/log/switchd.log)
    switchd[5692]: nic.c:213 nic_set_carrier: swp17: setting kernel carrier: down
    switchd[5692]: netlink.c:291 libnl: swp1, family 0, ifi 20, oper down
    switchd[5692]: nic.c:213 nic_set_carrier: swp1: setting kernel carrier: up
    switchd[5692]: netlink.c:291 libnl: swp17, family 0, ifi 20, oper up
Unidirectional link (/var/log/switchd.log)
    ptmd[7146]: ptm_bfd.c:2471 Created new session 0x1 with peer 10.255.255.11 port swp1
    ptmd[7146]: ptm_bfd.c:2471 Created new session 0x2 with peer fe80::4638:39ff:fe00:5b port swp1
    ptmd[7146]: ptm_bfd.c:2471 Session 0x1 down to peer 10.255.255.11, Reason 8

Unidirectional link (/var/log/ptm.log)
    ptmd[7146]: ptm_bfd.c:2471 Detect timeout on session 0x1 with peer 10.255.255.11, in state 1
Bond Negotiation Working (/var/log/syslog)
    kernel: [85412.763193] bonding: bond0 is being created...
    kernel: [85412.770014] bond0: Enslaving swp2 as a backup interface with an up link
    kernel: [85412.775216] bond0: Enslaving swp1 as a backup interface with an up link
    kernel: [85412.797393] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
    kernel: [85412.799425] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

Bond Negotiation Failing (/var/log/syslog)
    kernel: [85412.763193] bonding: bond0 is being created...
    kernel: [85412.770014] bond0: Enslaving swp2 as a backup interface with an up link
    kernel: [85412.775216] bond0: Enslaving swp1 as a backup interface with an up link
    kernel: [85412.797393] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
MLAG peerlink negotiation Working (/var/log/syslog)
    lldpd[998]: error while receiving frame on swp50: Network is down
    lldpd[998]: error while receiving frame on swp49: Network is down

MLAG peerlink negotiation Failing (/var/log/syslog)
    lldpd[998]: error while receiving frame on swp50: Network is down
    lldpd[998]: error while receiving frame on swp49: Network is down
    kernel: [76174.262893] peerlink: Setting ad_actor_system to 44:38:39:00:00:11
    kernel: [76174.264205] 8021q: adding VLAN 0 to HW filter on device peerlink
    mstpd: one_clag_cmd: setting (1) peer link: peerlink
MLAG port negotiation Working (/var/log/syslog)
    kernel: [77419.112195] bonding: server01 is being created...
    lldpd[998]: error while receiving frame on swp1: Network is down
    kernel: [77419.122707] 8021q: adding VLAN 0 to HW filter on device swp1
    kernel: [77419.126408] server01: Enslaving swp1 as a backup interface with a down link
    kernel: [77419.177175] server01: Setting ad_actor_system to 44:38:39:ff:40:94
    kernel: [77419.190874] server01: Warning: No 802.3ad response from the link partner for any adapters in the bond
    kernel: [77419.191448] IPv6: ADDRCONF(NETDEV_UP): server01: link is not ready
    kernel: [77419.191452] 8021q: adding VLAN 0 to HW filter on device server01
    kernel: [77419.192060] server01: link status definitely up for interface swp1, 1000 Mbps full duplex

MLAG port negotiation Failing (/var/log/syslog)
    kernel: [79290.290999] bonding: server01 is being created...
    kernel: [79290.299645] 8021q: adding VLAN 0 to HW filter on device swp1
    kernel: [79290.301790] server01: Enslaving swp1 as a backup interface with a down link
    kernel: [79290.358294] server01: Setting ad_actor_system to 44:38:39:ff:40:94
    kernel: [79290.373590] server01: Warning: No 802.3ad response from the link partner for any adapters in the bond
    kernel: [79290.374024] IPv6: ADDRCONF(NETDEV_UP): server01: link is not ready
    kernel: [79290.374028] 8021q: adding VLAN 0 to HW filter on device server01
    kernel: [79290.375033] server01: link status definitely up for interface swp1, 1000 Mbps full duplex
    kernel: [79290.375037] server01: now running without any active interface!

MLAG port negotiation Flapping (/var/log/syslog)
    mstpd: one_clag_cmd: setting (0) mac 00:00:00:00:00:00 <server01, None>
    mstpd: one_clag_cmd: setting (1) mac 44:38:39:00:00:03 <server01, None>
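Link flaps show up in the switchd log as repeated nic_set_carrier entries for the same interface, as in the link failure example earlier in this table. The Python sketch below counts carrier transitions per interface so that a watcher can flag ports that change state too often. The function name is hypothetical, and what counts as "too often" is left to the caller.

```python
import re
from collections import Counter

# Matches the nic_set_carrier entries written by switchd.
CARRIER_RE = re.compile(
    r"nic_set_carrier: (?P<iface>\S+): "
    r"setting kernel carrier: (?P<state>up|down)"
)


def count_carrier_changes(lines):
    """Count carrier up/down transitions per interface."""
    changes = Counter()
    for line in lines:
        match = CARRIER_RE.search(line)
        if match:
            changes[match.group("iface")] += 1
    return changes


sample = [
    "switchd[5692]: nic.c:213 nic_set_carrier: swp17: setting kernel carrier: down",
    "switchd[5692]: netlink.c:291 libnl: swp1, family 0, ifi 20, oper down",
    "switchd[5692]: nic.c:213 nic_set_carrier: swp1: setting kernel carrier: up",
]
changes = count_carrier_changes(sample)
```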
Prescriptive Topology Manager (PTM) uses LLDP information to compare against a topology.dot file that
describes the network. It has built in alerting capabilities, so it is preferable to use PTM on box rather than
polling LLDP information regularly. The PTM code is available on the Cumulus Networks GitHub repository.
Additional PTM, BFD, and associated logs are documented in the code.
Cumulus Networks recommends that you track peering information through PTM. For more
information, refer to the Prescriptive Topology Manager documentation (see page 348).
Neighbor Element Monitoring Command/s Interval Poll
cumulus@switch:~$ lldpctl -f json
Layer 2 Protocols
Spanning tree is a protocol that prevents loops in a layer 2 infrastructure. In a stable network, the
spanning tree topology should converge and stay converged. Monitoring the Topology Change Notifications
(TCN) in STP helps identify when new BPDUs are received.
Spanning Tree Working (/var/log/syslog)
    kernel: [1653877.190724] device swp1 entered promiscuous mode
    kernel: [1653877.190796] device swp2 entered promiscuous mode
    mstpd: create_br: Add bridge bridge
    mstpd: clag_set_sys_mac_br: set bridge mac 00:00:00:00:00:00
    mstpd: create_if: Add iface swp1 as port#2 to bridge bridge
    mstpd: set_if_up: Port swp1 : up

Spanning Tree Blocking (/var/log/syslog)
    mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering blocking state(Designated)
    mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering learning state(Designated)
    mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering forwarding state(Designated)
    mstpd: MSTP_OUT_flush_all_fids: bridge:swp2:0 Flushing forwarding database
    mstpd: MSTP_OUT_flush_all_fids: bridge:swp2:0 Flushing forwarding database
    mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering blocking state(Alternate)
    mstpd: MSTP_OUT_flush_all_fids: bridge:swp2:0 Flushing forwarding database
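One way to watch for spanning tree instability is to follow the MSTP_OUT_set_state entries shown above and keep a per-port history of state changes. The Python sketch below is an illustration under that assumption; the function name is hypothetical, and it expects each log entry on a single line.

```python
import re

# Matches entries such as:
# mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering blocking state(Designated)
STATE_RE = re.compile(
    r"mstpd: MSTP_OUT_set_state: "
    r"(?P<bridge>[^:\s]+):(?P<port>[^:\s]+):(?P<instance>\d+) "
    r"entering (?P<state>\w+) state\((?P<role>\w+)\)"
)


def port_state_history(lines):
    """Map each port to the ordered list of (state, role) it entered."""
    history = {}
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            entry = (match.group("state"), match.group("role"))
            history.setdefault(match.group("port"), []).append(entry)
    return history
```

A port whose history keeps growing during steady-state operation is a candidate for closer inspection.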
Layer 3 Protocols
When FRRouting boots up for the first time, there is a different log file for each daemon that is activated. If
the log file is ever edited (for example, through vtysh or frr.conf), the integrated configuration sends all
logs to the same file.
To send FRRouting logs to syslog, apply the configuration log syslog in vtysh.
BGP
When monitoring BGP, check if BGP peers are operational. There is not much value in alerting on the
current operational state of the peer; monitoring the transition is more valuable, which you can do by
monitoring syslog.
Monitoring the routing table provides trending on the size of the infrastructure. This is especially useful
when integrated with host-based solutions (such as Routing on the Host) when the routes track with the
number of applications available.
BGP peer down (/var/log/syslog)
    bgpd[3000]: %NOTIFICATION: sent to neighbor swp1 4/0 (Hold Timer Expired) 0 bytes

BGP peer down (/var/log/frr/*.log)
    bgpd[3000]: %ADJCHANGE: neighbor swp1 Down BGP Notification send
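Since the transition is more useful than the current peer state, a log watcher can key on the %ADJCHANGE entry shown above. The Python sketch below extracts the neighbor, the new state, and any trailing reason text. The function name is hypothetical, and the Up alternative is an assumption, because only a Down entry appears in the example.

```python
import re

# Matches bgpd adjacency change entries such as the peer-down example.
ADJCHANGE_RE = re.compile(
    r"bgpd\[\d+\]: %ADJCHANGE: neighbor (?P<peer>\S+) "
    r"(?P<state>Up|Down)\s*(?P<reason>.*)"
)


def parse_adjchange(line):
    """Return (peer, state, reason) for an ADJCHANGE entry, else None."""
    match = ADJCHANGE_RE.search(line)
    if match is None:
        return None
    return match.group("peer"), match.group("state"), match.group("reason").strip()


result = parse_adjchange(
    "bgpd[3000]: %ADJCHANGE: neighbor swp1 Down BGP Notification send"
)
```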
OSPF
When monitoring OSPF, check if OSPF peers are operational. There is not much value in alerting on the
current operational state of the peer; monitoring the transition is more valuable, which you can do by
monitoring syslog.
Monitoring the routing table provides trending on the size of the infrastructure. This is especially useful
when integrated with host-based solutions (such as Routing on the Host) when the routes track with the
number of applications available.
cumulus@switch:~$ cl-resource-query
cumulus@switch:~$ cl-resource-query -k
cumulus@switch:~$ cl-resource-query
cumulus@switch:~$ cl-resource-query -k
Routing Logs
Routing protocol process crash (/var/log/syslog)
    frrouting[1824]: Starting FRRouting daemons (prio:10):. zebra. bgpd.
    bgpd[1847]: BGPd 1.0.0+cl3u7 starting: vty@2605, bgp@<all>:179
    zebra[1840]: client 12 says hello and bids fair to announce only bgp routes
    watchquagga[1853]: watchquagga 1.0.0+cl3u7 watching [zebra bgpd], mode [phased zebra restart]
    watchquagga[1853]: bgpd state -> up : connect succeeded
    watchquagga[1853]: bgpd state -> down : read returned EOF
    cumulus-core: Running cl-support for core files bgpd.3030.1470341944.core.core_helper
    core_check.sh[4992]: Please send /var/support/cl_support__spine01_20160804_201905.tar.xz to Cumulus support
    watchquagga[1853]: Forked background command [pid 6665]: /usr/sbin/service frr restart bgpd
    watchquagga[1853]: watchquagga 0.99.24+cl3u2 watching [zebra bgpd ospfd], mode [phased zebra restart]
    watchquagga[1853]: zebra state -> up : connect succeeded
    watchquagga[1853]: bgpd state -> up : connect succeeded
    watchquagga[1853]: Watchquagga: Notifying Systemd we are up and running
Logging
The table below describes the various log files.
syslog (/var/log/syslog)
    Catch all log file. Identifies memory leaks and CPU spikes.

Routing protocol (/var/log/frr/zebra.log, /var/log/frr/{protocol}.log, /var/log/frr/frr.log)
    The log file is configurable in FRRouting. When FRRouting first boots, it uses the non-integrated configuration so each routing protocol has its own log file. After booting up, FRRouting switches over to using the integrated configuration, so that all logs go to a single place.
    To edit the location of the log files, use the log file <location> command. By default, FRRouting logs are not sent to syslog. Use the log syslog <level> command to send logs through rsyslog and into /var/log/syslog.
    To write syslog debug messages to the log file, you must run the log syslog debug command to configure FRR with syslog severity 7 (debug); otherwise, when you issue a debug command such as, debug bgp neighbor-events, no output is sent to /var/log/frr/frr.log.
    However, when you manually define a log target with the log file /var/log/frr/debug.log command, FRR automatically defaults to severity 7 (debug) logging and the output is logged to /var/log/frr/frr.log.
cumulus@switch:~$ /usr/bin/ntpq -p
Device Management
User Authentication and Remote Login (/var/log/syslog)
    sshd[31830]: Accepted publickey for cumulus from 192.168.0.254 port 45582 ssh2: RSA 38:e6:3b:cc:04:ac:41:5e:c9:e3:93:9d:cc:9e:48:25
    sshd[31830]: pam_unix(sshd:session): session opened for user cumulus by (uid=0)

Executing commands using sudo (/var/log/syslog)
    sudo: cumulus : TTY=unknown ; PWD=/home/cumulus ; USER=root ; COMMAND=/tmp/script_9938.sh -v
    sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
    sudo: pam_unix(sudo:session): session closed for user root
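For auditing superuser activity, the sudo entry above carries the invoking user, working directory, target user, and full command in a fixed layout. The Python sketch below pulls those fields out of one entry. The function name is hypothetical, and the pattern assumes the entry is on a single line in the layout shown.

```python
import re

# Matches the sudo command entry; the pam_unix session lines do not match.
SUDO_RE = re.compile(
    r"sudo:\s+(?P<user>\S+) : TTY=(?P<tty>\S+) ; PWD=(?P<pwd>\S+) ; "
    r"USER=(?P<run_as>\S+) ; COMMAND=(?P<command>.+)"
)


def parse_sudo(line):
    """Return the fields of a sudo command entry, or None if no match."""
    match = SUDO_RE.search(line)
    return match.groupdict() if match else None


entry = parse_sudo(
    "sudo: cumulus : TTY=unknown ; PWD=/home/cumulus ; USER=root ; "
    "COMMAND=/tmp/script_9938.sh -v"
)
```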
Babel  HIGH  16777217  BABEL Memory Errors
    Babel has failed to allocate memory. The system is about to run out of memory.
    Find the process that is causing memory shortages and remediate that process. Restart FRR.

Babel  HIGH  16777218  BABEL Packet Error
    Babel has detected a packet encode/decode problem.
    Collect the relevant log files and report the issue for troubleshooting.

Babel  HIGH  16777219  BABEL Configuration Error
    Babel has detected a configuration error of some sort.
    Ensure that the configuration is correct.

Babel  HIGH  16777220  BABEL Route Error
    Babel has detected a routing error and is in an inconsistent state.
    Gather data to report the issue for troubleshooting. Restart FRR.

BGP  HIGH  33554433  BGP attribute flag is incorrect
    BGP attribute flag is set to the wrong value (Optional/Transitive/Partial).
    Determine the source of the attribute and determine why the attribute flag has been set incorrectly.

BGP  HIGH  33554434  BGP attribute length is incorrect
    BGP attribute length is incorrect.
    Determine the source of the attribute and determine why the attribute length has been set incorrectly.

BGP  HIGH  33554435  BGP attribute origin value invalid
    BGP attribute origin value is invalid.
    Determine the source of the attribute and determine why the origin attribute has been set incorrectly.

BGP  HIGH  33554436  BGP as path is invalid
    BGP AS path has been malformed.
    Determine the source of the update and determine why the AS path has been set incorrectly.

BGP  HIGH  33554439  BGP PMSI tunnel attribute type is invalid
    BGP update has invalid type for PMSI tunnel.
    Determine the source of the update and determine why the PMSI tunnel attribute type has been set incorrectly.

BGP  HIGH  33554440  BGP PMSI tunnel attribute length is invalid
    BGP update has invalid length for PMSI tunnel.
    Determine the source of the update and determine why the PMSI tunnel attribute length has been set incorrectly.

BGP  HIGH  33554443  BGP failed to delete peer structure
    BGP was unable to delete the peer structure when the address-family was removed.
    Determine if all expected peers are removed and restart FRR if not. This is most likely a bug.

BGP  HIGH  33554444  BGP failed to get table chunk memory
    BGP unable to get chunk memory for table manager.
    Ensure there is adequate memory on the device to support the table requirements.

BGP  HIGH  33554445  BGP received MACIP with invalid IP addr len
    BGP received MACIP with invalid IP address length from Zebra.
    Verify the MACIP entries inserted in Zebra are correct. This is most likely a bug.

BGP  HIGH  33554446  BGP received invalid label manager message
    BGP received an invalid label manager message from the label manager.
    Label manager sent an invalid message to BGP for the wrong protocol instance. This is most likely a bug.
BGP  HIGH  33554447  BGP unable to allocate memory for JSON output
    BGP attempted to generate JSON output and was unable to allocate the memory required.
    Ensure that the device has adequate memory to support the required functions.

BGP  HIGH  33554448  BGP update had attributes too long to send
    BGP attempted to send an update but the attributes were too long to fit.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554449  BGP update group creation failed
    BGP attempted to create an update group but was unable to do so.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554450  BGP error creating update packet
    BGP attempted to create an update packet but was unable to do so.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554451  BGP error receiving open packet
    BGP received an open from a peer that was invalid.
    Determine the sending peer and correct its invalid open packet.

BGP  HIGH  33554453  BGP error receiving from peer
    BGP received an update from a peer but the status was incorrect.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554454  BGP error receiving update packet
    BGP received an invalid update packet.
    Determine the source of the update and resolve the invalid update being sent.

BGP  HIGH  33554455  BGP error due to capability not enabled
    BGP attempted a function that did not have the capability enabled.
    Enable the capability if this functionality is desired.

BGP  HIGH  33554456  BGP error receiving notify message
    BGP unable to process the notification message.
    BGP notify received while in a stopped state. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554457  BGP error receiving keepalive packet
    BGP unable to process a keepalive packet.
    BGP keepalive received while in a stopped state. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554458  BGP error receiving route refresh message
    BGP unable to process route refresh message.
    BGP route refresh received while in a stopped state. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554460  BGP error with nexthop update
    BGP unable to process nexthop update.
    BGP received the nexthop update but the nexthop is not reachable in this BGP instance. Report the problem for troubleshooting.

BGP  HIGH  33554462  Multipath specified is invalid
    BGP was started with an invalid ECMP/multipath value.
    Correct the ECMP/multipath value supplied when starting the BGP daemon.
BGP  HIGH  33554465  BGP FSM issue
    BGP neighbor transition problem.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554466  BGP VNI creation issue
    BGP could not create a new VNI.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554467  BGP default instance missing
    BGP could not find default instance.
    Define a default instance of BGP since some feature requires its existence.

BGP  HIGH  33554468  BGP remote VTEP invalid
    BGP remote VTEP is invalid and cannot be used.
    Correct the remote VTEP configuration or resolve the source of the problem.

BGP  HIGH  33554470  BGP EVPN route delete error
    BGP attempted to delete an EVPN route and failed.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554471  BGP EVPN install/uninstall error
    BGP attempted to install or uninstall an EVPN prefix and failed.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554472  BGP EVPN route received with invalid contents
    BGP received an EVPN route with invalid contents.
    Determine the source of the EVPN route and resolve whatever is causing the invalid content.

BGP  HIGH  33554473  BGP EVPN route create error
    BGP attempted to create an EVPN route and failed.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554474  BGP EVPN ES entry create error
    BGP attempted to create an EVPN ES entry and failed.
    This is most likely a bug. If the problem persists, report it for troubleshooting.

BGP  HIGH  33554478  BGP Flowspec packet processing error
    The BGP flowspec subsystem has detected an error in the sending or receiving of a packet.
    Gather log files from both sides of the peering relationship and report the issue for troubleshooting.

BGP  HIGH  33554479  BGP Flowspec Installation/removal Error
    The BGP flowspec subsystem has detected that there was a failure for installation/removal/modification of Flowspec from the dataplane.
    Gather log files from the router and report the issue for troubleshooting. Restart FRR.

EIGRP  HIGH  50331649  EIGRP Packet Error
    EIGRP has a packet that does not correctly decode or encode.
    Gather log files from both sides of the neighbor relationship and report the issue for troubleshooting.
General  HIGH  100663297  Failure to raise or lower privileges
    FRR attempted to raise or lower its privileges and was unable to do so.
    Ensure that you are running FRR as the frr user and that the user has sufficient privileges to properly access root privileges.

General  HIGH  100663298  VRF Failure on Start
    Upon startup, FRR failed to properly initialize and start up the VRF subsystem.
    Ensure that there is sufficient memory to start processes, then restart FRR.

General  HIGH  100663299  Socket Error
    When attempting to access a socket, a system error occurred and FRR was unable to properly complete the request.
    Ensure that there are sufficient system resources available and ensure that the frr user has sufficient permissions to work.

General  HIGH  100663303  System Call Error
    FRR has detected an error from using a vital system call and has probably already exited.
    Ensure permissions are correct for FRR users and groups. Additionally, check that sufficient system resources are available.

General  HIGH  100663304  VTY Subsystem Error
    FRR has detected a problem with the specified configuration file.
    Ensure the configuration file exists and has the correct permissions for operations. Additionally, ensure that all config lines are correct as well.

General  HIGH  100663305  SNMP Subsystem Error
    FRR has detected a problem with the SNMP library it uses. A callback from this subsystem has indicated some error.
    Examine the callback message and ensure SNMP is properly set up and working.

General  HIGH  100663306  Interface Subsystem Error
    FRR has detected a problem with interface data from the kernel as it deviates from what we would expect to happen via normal netlink messaging.
    Open an issue with all relevant log files and restart FRR.

General  HIGH  100663307  NameSpace Subsystem Error
    FRR has detected a problem with namespace data from the kernel as it deviates from what we would expect to happen via normal kernel messaging.
    Open an issue with all relevant log files and restart FRR.

General  HIGH  100663308  Developmental Escape Error
    FRR has detected an issue where new development has not properly updated all code paths.
    Open an issue with all relevant log files.

General  HIGH  100663309  ZMQ Subsystem Error
    FRR has detected an issue with the ZeroMQ subsystem and ZeroMQ is not working properly now.
    Open an issue with all relevant log files and restart FRR.

General  HIGH  100663310  Feature or system unavailable
    FRR was not compiled with support for a particular feature or it is not available on the current platform.
    Recompile FRR with the feature enabled or find out what platforms support the feature.

General  HIGH  4043309071  IRDP message length mismatch
    The length encoded in the IP TLV does not match the length of the packet received.
    Notify a developer.
General  HIGH  4043309075  Netlink backend not available
    FRR was not compiled with support for Netlink. Any operations that require Netlink will fail.
    Recompile FRR with Netlink or install a package that supports this feature.

General  HIGH  4043309076  Protocol Buffers backend not available
    FRR was not compiled with support for protocol buffers. Any operations that require protobuf will fail.
    Recompile FRR with protobuf support or install a package that supports this feature.

General  HIGH  4043309087  Cannot set receive buffer size
    The socket receive buffer size could not be set in the kernel.
    Ignore this error.

General  HIGH  4043309089  Receive buffer overrun
    The kernel's buffer for a socket has been overrun, rendering the socket invalid.
    Zebra will restart itself. Notify a developer if this issue shows up frequently.

General  HIGH  4043309094  String could not be parsed as IP prefix
    There was an attempt to parse a string as an IPv4 or IPv6 prefix, but the string could not be parsed and this operation failed.
    Notify a developer.

General  HIGH  268435457  WATCHFRR Connection Error
    WATCHFRR has detected a connectivity issue with one of the FRR daemons.
    Ensure that FRR is still running. If it isn't, report the issue for troubleshooting.

OSPF  HIGH  134217731  OSPF Domain Corruption
    OSPF attempted to process a router LSA, but there was an advertising ID mismatch with the link ID.
    Check OSPF network database for a corrupted LSA. If the problem persists, shut down the OSPF domain and report the problem for troubleshooting.

OSPF  HIGH  134217734  OSPF SR hash node creation failed
    OSPF segment routing node creation failed.
    This is most likely a bug. If the problem persists, report it for troubleshooting.
PIM | HIGH | 184549377 | PIM MSDP Packet Error | PIM has received a packet from a peer that does not correctly decode. | Check the MSDP peer and ensure it is correctly working.
RIP | HIGH | 201326593 | RIP Packet Error | RIP has detected a packet encode/decode issue. | Gather log files from both sides and open an issue.
Zebra | HIGH | 4043309057 | Error reading response from label manager | Zebra could not read the ZAPI header from the label manager. | Wait for the error to resolve on its own. If it does not resolve, restart Zebra.
Zebra | HIGH | 4043309058 | Label manager could not find ZAPI client | Zebra was unable to find a ZAPI client matching the given protocol and instance number. | Ensure that clients that use the label manager are properly configured and running.
Zebra | HIGH | 4043309059 | Zebra could not relay label manager response | Zebra found the client and instance to relay the label manager response or request, but was unable to do so, possibly because the connection was closed. | Ensure that clients that use the label manager are properly configured and running.
Zebra | HIGH | 100663301 | ZAPI Error | The ZAPI subsystem has detected an encoding issue between Zebra and a client protocol. | Restart FRR.
Zebra | HIGH | 100663302 | ZAPI Error | The ZAPI subsystem has detected a socket error between Zebra and a client. | Restart FRR.
Zebra | HIGH | 4043309064 | Zebra label manager used all available labels | Zebra is unable to assign additional label chunks because it has exhausted its assigned label range. | Make the label range bigger and restart Zebra.
Zebra | HIGH | 4043309077 | Table manager used all available IDs | Zebra's table manager used up all IDs available to it and can't assign any more. | Reconfigure Zebra with a larger range of table IDs.
Zebra | HIGH | 4043309079 | Zebra did not free any table chunks | Zebra's table chunk cleanup procedure ran but no table chunks were released. | Ignore this error.
Zebra | HIGH | 4043309080 | Address family specifier unrecognized | Zebra attempted to process information from somewhere that included an address family specifier but did not recognize the provided specifier. | Ensure that your configuration is correct. If it is, notify a developer.
Zebra | HIGH | 4043309083 | Label manager unable to assign label chunk | Zebra's label manager was unable to assign a label chunk to client. | Ensure that Zebra has a sufficient label range available and that there is not a range collision.
Zebra | HIGH | 4043309085 | Table manager unable to assign table chunk | Zebra's table manager was unable to assign a table chunk to a client. | Ensure that Zebra has sufficient table ID range available and that there is not a range collision.
Zebra | HIGH | 4043309088 | Unknown Netlink message type | Zebra received a Netlink message with an unrecognized type field. | Verify that you are running the latest version of FRR to ensure kernel compatibility. If the problem persists, notify a developer.
Network Solutions
Contents
This topic describes ...
Layer 2 - Architecture (see page 1070)
Traditional Spanning Tree - Single Attached (see page 1070)
MLAG (see page 1072)
Layer 3 Architecture (see page 1074)
Single-attached Hosts (see page 1074)
Redistribute Neighbor (see page 1076)
Routing on the Host (see page 1077)
Routing on the VM (see page 1078)
Virtual Router (see page 1079)
Anycast with Manual Redistribution (see page 1080)
Network Virtualization (see page 1081)
Layer 2 - Architecture
auto eth1
iface eth1 inet manual
auto eth1.10
iface eth1.10 inet manual
auto eth2
iface eth2 inet manual
auto eth2.20
iface eth2.20 inet manual
auto br-10
iface br-10 inet manual
bridge-ports eth1.10 vnet0
auto br-20
iface br-20 inet manual
bridge-ports eth2.20 vnet1
MLAG
MLAG (see page 427) (multi-chassis link aggregation) is when both uplinks are utilized at the same time. VRR
gives the ability for both spines to act as gateways simultaneously for HA (high availability) and
active-active mode (see page 515) (both are being used at the same time).
Benefits
100% of links utilized
Caveats
More complicated (more moving parts)
More configuration
No interoperability between vendors
ISL (inter-switch link) required
Additional Comments
Can be done with either the traditional (see page 395) or VLAN-aware (see page 402) bridge driver depending
on overall STP needs
There are a few different solutions including Cisco VPC and Arista MLAG, but none of them interoperate and
are very vendor specific
Cumulus Networks Layer 2 HA validated design guide
Configurations
leaf01 Config
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports host-01 peerlink
bridge-vids 1-2000
bridge-stp on
auto bridge.10
iface bridge.10
address 172.16.1.2/24
address-virtual 44:38:39:00:00:10 172.16.1.1/24
auto peerlink
iface peerlink
bond-slaves glob swp49-50
auto peerlink.4094
iface peerlink.4094
address 169.254.1.2
clagd-enable yes
clagd-peer-ip 169.254.1.2
clagd-system-mac 44:38:39:FF:40:94
auto host-01
iface host-01
bond-slaves swp1
clag-id 1
{bond-defaults removed for brevity}
auto bond0
iface bond0 inet manual
bond-slaves eth0 eth1
{bond-defaults removed for brevity}
auto bond0.10
iface bond0.10 inet manual
auto vm-br10
iface vm-br10 inet manual
bridge-ports bond0.10 vnet0
Layer 3 Architecture
Single-attached Hosts
ip ospf area 0
leaf02 Config
/etc/network/interfaces
auto swp1
iface swp1
address 172.16.2.1/30
/etc/frr/frr.conf
router ospf
router-id 10.0.0.12
interface swp1
ip ospf area 0
auto eth1
iface eth1 inet static
address 172.16.1.2/30
up ip route add 0.0.0.0/0 nexthop via 172.16.1.1
auto eth1
iface eth1 inet static
address 172.16.2.2/30
up ip route add 0.0.0.0/0 nexthop via 172.16.2.1
No redundancy, uses single ToR as gateway.
Big Data validated design guide uses single attached ToR
Redistribute Neighbor
Equal cost route installed on server/host/hypervisor to both ToRs to load balance evenly.
Cumulus Networks blog post introducing redistribute neighbor
Certain hypervisors or host OSes might not support a routing application like FRRouting and will require a
virtual router on the hypervisor
No L2 adjacency between servers without VXLAN
The first hop is still the ToR, just like redistribute neighbor
A default route can be advertised by all leaf/ToRs for dynamic ECMP paths
Installing the Cumulus Linux FRRouting Package on an Ubuntu Server
Configuring FRRouting (see page 719)
Routing on the VM
The first hop is still the ToR, just like redistribute neighbor
Multiple ToRs (2+) can be used
Installing the Cumulus Linux FRRouting Package on an Ubuntu Server
Configuring FRRouting (see page 719)
Virtual Router
The gateway would be the vRouter, which has two routes out (two ToRs)
Multiple vRouters could be used
Installing the Cumulus Linux FRRouting Package on an Ubuntu Server
Configuring FRRouting (see page 719)
In contrast to routing on the host (preferred), this method allows a user to route to the host. The ToRs are
the gateway, as with redistribute neighbor, except because there is no daemon running, the networks must be
manually configured under the routing process. There is a potential to black hole unless a script is run to
remove the routes when the host no longer responds.
Benefits
Most benefits of routing on the host
No requirement for host to run routing
No requirement for redistribute neighbor
Caveats
Removing a subnet from one ToR and re-adding it to another (hence, network statements from your router
process) is a manual process
Network team and server team would have to be in sync, or server team controls the ToR, or automation is
being used whenever VM migration happens
Configurations
leaf01 Config
/etc/network/interfaces
auto swp1
iface swp1
address 172.16.1.1/30
/etc/frr/frr.conf
router ospf
router-id 10.0.0.11
interface swp1
ip ospf area 0
router ospf
router-id 10.0.0.12
interface swp1
ip ospf area 0
auto lo
iface lo inet loopback
auto lo:1
iface lo:1 inet static
address 172.16.1.2/32
up ip route add 0.0.0.0/0 nexthop via 172.16.1.1 dev eth0 onlink nexthop via 172.16.1.1 dev eth1 onlink
auto eth1
iface eth1 inet static
address 172.16.1.2/32
auto eth2
iface eth2 inet static
address 172.16.1.2/32
Network Virtualization
LNV with MLAG
auto vni-10
iface vni-10
vxlan-id 10
vxlan-local-tunnelip 10.0.0.11
auto br-10
iface br-10
bridge-ports swp1 vni-10
leaf02 Config
/etc/network/interfaces
auto lo
iface lo inet loopback
address 10.0.0.12/32
vxrd-src-ip 10.0.0.12
vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 36.0.0.11
auto vni-10
iface vni-10
vxlan-id 10
vxlan-local-tunnelip 10.0.0.12
auto br-10
iface br-10
bridge-ports swp1 vni-10
More Information
Contents
This topic describes ...
Reference Topology (see page 1083)
IP and MAC Addressing (see page 1084)
Build the Topology (see page 1085)
Virtual Appliance (see page 1085)
Hardware (see page 1085)
Demos (see page 1085)
Reference Topology
Reference Topology
The Cumulus Networks reference topology includes cabling (in DOT format for dual use with PTM (see page
348)), MAC addressing, IP addressing, switches and servers. This topology is blessed by the Professional
Services Team at Cumulus Networks to fit a majority of designs seen in the field.
edge01 192.168.0.51 A0:00:00:00:00:51 10g NICs (customer edge device, firewall, load balancer, etc.)
Virtual Appliance
You can build out the reference topology in hardware or using Cumulus VX (the free Cumulus Networks
virtual appliance). The Cumulus Reference Topology using Vagrant is essentially the reference topology built
out inside Vagrant with VirtualBox or KVM. The installation and setup instructions for bringing up the entire
reference topology on a laptop or server are on the cldemo-vagrant GitHub repo.
Hardware
Any switch from the hardware compatibility list is compatible with the topology as long as you follow the
interface count from the table above. Of course, in your own production environment, you don't have to
use exactly the same devices and cabling as outlined above.
Demos
You can find an up-to-date list of all the demos in the cldemo-vagrant GitHub repository, which is available
to anyone free of charge.
...
2. Create the /etc/apt/sources.list.d/docker.list file, add the following line in a text editor,
and save the file:
1. Install Docker:
1. Add docker as a new line at the bottom of /etc/vrf/systemd.conf, and save the file.
...
docker
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon --iptables=false --ip-masq=false --ip-forward=false
1. Enable the Docker management daemon so it starts when the switch boots:
Performance Notes
Keep in mind switches are not servers, in terms of the hardware that drives them. As such, you should be
mindful of the types of applications you want to run in containers on a Cumulus Linux switch. In general,
depending upon the configuration of the container, you can expect DHCP servers, custom scripts and other
lightweight services to run well. However, VPN, NAT and encryption-type services are CPU-intensive and
could lead to undesirable effects on critical applications. Use of any resource-intensive services should be
avoided and is not supported.
Contents
This topic describes ...
Configure the REST API (see page 1089)
Install and Configure the Cumulus Networks Modular Layer 2 Mechanism Driver (see page 1090)
Try OpenStack with Cumulus in the Cloud (see page 1090)
[ML2]
#local_bind = 10.40.10.122
#service_node = 10.40.10.1
2. Restart the REST API service for the configuration changes to take effect:
Additional REST API calls have been added to support the configuration of a bridge using the bridge name
instead of the network ID.
[ml2_cumulus]
switches="192.168.10.10,192.168.20.20"
The ML2 mechanism driver contains the following configurable parameters. You configure them in the
/etc/neutron/plugins/ml2/ml2_conf.ini file.
switches — The list of Cumulus Linux switches connected to the Neutron host. Specify a list of IP
addresses.
scheme — The scheme (for example, HTTP) for the base URL for the ML2 API.
protocol_port — The protocol port for the base URL for the ML2 API. The default value is 8000.
sync_time — A periodic time interval for polling the Cumulus Linux switch. The default value is 30
seconds.
spf_enable — Enables/disables SPF for the bridge. The default value is False.
new_bridge — Enables/disables VLAN-aware bridge mode (see page 402) for the bridge
configuration. The default value is False, so a traditional mode bridge is created.
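The parameters above can be collected into an [ml2_cumulus] section programmatically. The following is a minimal Python sketch, not part of the ML2 driver: the helper name render_ml2_cumulus is hypothetical, the defaults are the ones quoted in the list above, and "http" for scheme is an assumption because the list gives no default for it.

```python
import configparser
import io

# Defaults quoted from the parameter list above. "scheme" has no documented
# default there, so "http" is an assumption used only for illustration.
DEFAULTS = {
    "scheme": "http",
    "protocol_port": "8000",
    "sync_time": "30",
    "spf_enable": "False",
    "new_bridge": "False",
}


def render_ml2_cumulus(switches, **overrides):
    """Return the text of an [ml2_cumulus] section (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown ML2 parameters: {sorted(unknown)}")
    if not switches:
        raise ValueError("at least one switch IP address is required")

    config = configparser.ConfigParser(interpolation=None)
    # The guide's example wraps the comma-separated list in double quotes.
    section = {"switches": '"' + ",".join(switches) + '"'}
    section.update(DEFAULTS)
    section.update({key: str(value) for key, value in overrides.items()})
    config["ml2_cumulus"] = section

    buffer = io.StringIO()
    # Write key=value without spaces, matching the example in this section.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_ml2_cumulus(["192.168.10.10", "192.168.20.20"], new_bridge=True))
```

The output is text only; merging it into the existing ml2_conf.ini file and restarting the affected services is left to the operator.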
Anycast Architecture
Anycast relies on layer 3 equal cost multipath functionality to provide load sharing throughout the network.
Each server announces a route for a service. As the route is propagated through the network, each network
device sees the route as originating from multiple places. As an end user connects to the anycast IP, each
network device performs a hardware hash of the layer 3 and layer 4 headers to determine which path to
use.
Every packet in a flow from an end user has the same source and destination IP address as well as source
and destination port numbers. The hash performed by the network devices results in the same answer for
every packet, ensuring all packets in a flow are sent to the same destination.
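The per-flow behavior described above can be modeled in a few lines of Python. This is only a sketch: SHA-256 stands in for the hardware hash (the real hash function is not described here), and the client address and source ports are made up for the example.

```python
import hashlib


def flow_hash(src_ip, dst_ip, protocol, src_port, dst_port):
    """Deterministic stand-in for the hardware hash over the L3/L4 headers."""
    key = f"{protocol}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def pick_next_hop(flow, next_hops):
    """Select one of the equal-cost next hops for a flow (5-tuple)."""
    return next_hops[flow_hash(*flow) % len(next_hops)]


if __name__ == "__main__":
    next_hops = ["leaf01", "leaf02", "leaf03", "leaf04"]
    # Two flows to the anycast address that differ only in source port.
    blue = ("10.1.1.1", "172.16.255.66", "udp", 32768, 53)
    red = ("10.1.1.1", "172.16.255.66", "udp", 32769, 53)
    for name, flow in (("blue", blue), ("red", red)):
        # Every packet of a flow carries the same 5-tuple, so repeated
        # lookups always return the same next hop.
        choices = {pick_next_hop(flow, next_hops) for _ in range(5)}
        print(name, sorted(choices))
```

Each flow prints exactly one next hop. Whether two different flows land on different next hops depends on the hash values and is not guaranteed.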
In the following image, the client initiates two flows: the blue, dotted flow and the red dashed flow. Each
flow has the same source IP address (the client’s IP address), destination IP address (172.16.255.66) and
same destination port (depending on the service; for example, DNS is port 53). Each flow has a unique
source port generated by the client.
In this example, each flow hashes to different servers based on this source port, which you can see when
you run ip route show to the destination IP address:
On a Cumulus Linux switch, you can see the hardware hash with the cl-ecmpcalc command. In Figure 2,
two flows originate from a remote user destined to the anycast IP address. Each session has a different
source port. Using the cl-ecmpcalc command, you can see that the sessions were hashed to different
egress ports.
As previously described, every packet in a flow hashes to the same next hop. However, if that next hop is no
longer valid, the traffic flows to another anycast next hop instead. For example, in the image below, if leaf03
fails, traffic flows to a different anycast server; in this case, server04:
For stateless applications that rely on UDP, like DNS, this does not present a problem. However, for stateful
applications that rely on TCP, like HTTP, this breaks any existing traffic flows, such as a file download. If the
TCP three-way handshake was established on server03, after the failure, server04 would have no
connection built and would send a TCP reset message back to the client, restarting the session.
This is not to say that it is not possible to use TCP-based applications for anycast. However, TCP
applications in an anycast environment should have short-lived flows (measured in seconds or less) to
reduce the impact of network changes or failures.
Resilient Hashing
Resilient hashing (see page 817) provides a method to prevent failures from impacting the hash result of
unrelated flows. However, resilient hashing does not prevent rehashing when new next hops are added.
As previously mentioned, the hardware hashing function determines which path gets used for a given flow.
The simplified version of that hash is the combination of protocol, source IP address, destination IP
address, source layer 4 port and destination layer 4 port. The full hashing function includes not only these
fields but also the list of possible layer 3 next hop addresses. The hash result is passed through a modulo of
the number of next hop addresses. If the number of next hop addresses changes, through either addition
or removal of next hops, this changes the hash result for all traffic, including flows that are already
established.
Continuing with the example in Figure 3, leaf03 is in a failed state, so traffic is hashing to server04. This is a
result of the hash considering three possible next hop IPs (leaf01, leaf02, leaf04). When leaf03 is brought
back online, the number of possible next hop IPs grows to four. This changes the modulo value that is part
of the hashing function, which may result in traffic being sent to a different server, even if previously
unaffected by the change.
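The effect of the modulo can be illustrated with a short Python sketch. It is a simplified model, not the switch implementation: SHA-256 stands in for the hardware hash, and the 1000 TCP flows are synthetic. It counts how many flows change next hop when the next hop list grows from three entries to four.

```python
import hashlib


def flow_hash(flow):
    """Deterministic stand-in for the hardware hash of a 5-tuple."""
    return int.from_bytes(hashlib.sha256(repr(flow).encode()).digest()[:4], "big")


def modulo_next_hop(flow, next_hops):
    """Plain ECMP selection: hash modulo the number of next hops."""
    return next_hops[flow_hash(flow) % len(next_hops)]


def moved_flows(flows, before, after):
    """Return the flows whose next hop differs between two next hop lists."""
    return [
        flow for flow in flows
        if modulo_next_hop(flow, before) != modulo_next_hop(flow, after)
    ]


if __name__ == "__main__":
    flows = [("10.1.1.1", "172.16.255.66", "tcp", 40000 + i, 80) for i in range(1000)]
    three = ["leaf01", "leaf02", "leaf04"]            # leaf03 failed
    four = ["leaf01", "leaf02", "leaf03", "leaf04"]   # leaf03 restored
    moved = moved_flows(flows, three, four)
    to_leaf03 = [f for f in moved if modulo_next_hop(f, four) == "leaf03"]
    print(f"{len(moved)} of {len(flows)} flows changed next hop")
    print(f"{len(moved) - len(to_leaf03)} of those moved between leaves other than leaf03")
```

In this model, many of the flows that move do not end up on leaf03 at all, which is the "unrelated traffic" impact described above.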
As you can see below, leaf03 is in a failed state. The blue dotted flow uses leaf02 to reach server02.
As leaf03 is brought back into service, the hashing function on spine02 changes, impacting the blue dotted
flow:
Just as the addition of a device can impact unrelated traffic, the removal of a device can also impact
unrelated traffic, since again, the modulo of the hash function is changed. You can see this below, where
the blue dotted flow goes through leaf01 and the red dashed line goes through leaf04.
Now, leaf02 has failed. As a result, the modulo on spine02 has changed from four possible next hops to
only three next hops. In this example, the red dashed line has rehashed to leaf03:
To help solve this issue, resilient hashing can prevent traffic flows from shifting on unrelated failure
scenarios. With resilient hashing enabled, the failure of leaf02 does not impact either of the existing
flows, since they do not currently flow through leaf02:
Although resilient hashing can prevent rehashing on next hop failure, it cannot prevent rehashing on next
hop addition.
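One common way to get this behavior is a fixed-size bucket table: the hash selects a bucket, and when a next hop fails only the buckets that pointed at it are reassigned. The Python sketch below illustrates that idea under stated assumptions. The table size of 64, the round-robin fill, and the function names are choices made for this example and are not taken from the switch implementation. It covers next hop removal only.

```python
import hashlib

BUCKETS = 64  # arbitrary table size chosen for this sketch


def flow_hash(flow):
    """Deterministic stand-in for the hardware hash of a 5-tuple."""
    return int.from_bytes(hashlib.sha256(repr(flow).encode()).digest()[:4], "big")


def build_table(next_hops, buckets=BUCKETS):
    """Spread the next hops round-robin over a fixed number of buckets."""
    return [next_hops[i % len(next_hops)] for i in range(buckets)]


def remove_next_hop(table, failed):
    """Reassign only the buckets that pointed at the failed next hop."""
    survivors = sorted(set(table) - {failed})
    if not survivors:
        raise ValueError("no next hops left")
    new_table = list(table)
    replacement = 0
    for index, hop in enumerate(table):
        if hop == failed:
            new_table[index] = survivors[replacement % len(survivors)]
            replacement += 1
    return new_table


def lookup(flow, table):
    """The modulo is over the fixed table size, not the next hop count."""
    return table[flow_hash(flow) % len(table)]


if __name__ == "__main__":
    flows = [("10.1.1.1", "172.16.255.66", "tcp", 40000 + i, 80) for i in range(1000)]
    before = build_table(["leaf01", "leaf02", "leaf03", "leaf04"])
    after = remove_next_hop(before, "leaf02")
    moved = [f for f in flows if lookup(f, before) != lookup(f, after)]
    # Only flows that were using leaf02 are expected to move.
    print(all(lookup(f, before) == "leaf02" for f in moved))
```

Because the table size never changes, flows on the surviving next hops keep their buckets and therefore their paths.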
You can read more information on resilient hashing in the ECMP chapter (see page 812).
Conclusion
Anycast can provide a low cost, highly scalable implementation for services. However, the limitations
inherent in network-based ECMP make anycast challenging to integrate with some applications. An
anycast architecture is best suited for stateless applications or applications that are able to share session
state at the application layer.
Contents
This topic describes ...
Enable RDMA over Converged Ethernet with PFC (see page 1097)
Enable RDMA over Converged Ethernet with ECN (see page 1098)
Related Information (see page 1099)
On Mellanox switches, you can alternatively use NCLU to configure RoCE with PFC:
...
ecn_red.port_group_list = [ROCE_ECN]
pfc.ROCE_PFC.port_set = swp1
pfc.ROCE_PFC.cos_list = [1]
pfc.ROCE_PFC.xoff_size = 18000
pfc.ROCE_PFC.xon_delta = 18000
pfc.ROCE_PFC.tx_enable = true
pfc.ROCE_PFC.rx_enable = true
pfc.ROCE_PFC.port_buffer_bytes = 70000
ecn_red.ROCE_ECN.port_set = swp1
ecn_red.ROCE_ECN.cos_list = [0,1]
ecn_red.ROCE_ECN.min_threshold_bytes = 150000
ecn_red.ROCE_ECN.max_threshold_bytes = 1500000
ecn_red.ROCE_ECN.ecn_enable = true
ecn_red.ROCE_ECN.red_enable = true
ecn_red.ROCE_ECN.probability = 100
...
While link pause (see page 274) is another way to provide lossless Ethernet, PFC is the preferred
method. PFC allows more granular control by pausing the traffic flow for a given CoS group,
rather than the entire link.
On Mellanox switches, you can alternatively use NCLU to configure RoCE with ECN:
ecn_red.port_group_list = [ROCE_ECN]
ecn_red.ROCE_ECN.port_set = swp1
ecn_red.ROCE_ECN.cos_list = [0,1]
ecn_red.ROCE_ECN.min_threshold_bytes = 150000
ecn_red.ROCE_ECN.max_threshold_bytes = 1500000
ecn_red.ROCE_ECN.ecn_enable = true
ecn_red.ROCE_ECN.red_enable = true
ecn_red.ROCE_ECN.probability = 100
...
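Settings like the ones above are easy to mistype, so a quick sanity check before applying them can help. The Python sketch below parses key = value lines in the format shown and checks each ecn_red port group. The helper names are hypothetical, and the rule that min_threshold_bytes must be lower than max_threshold_bytes is an assumption based on how RED thresholds generally work, not a rule quoted from this guide.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks, comments and '...'."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == "...":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {line!r}")
        settings[key.strip()] = value.strip()
    return settings


def parse_list(value):
    """Turn '[0,1]' or '[ROCE_ECN]' into a list of strings."""
    inner = value.strip()[1:-1]
    return [item.strip() for item in inner.split(",") if item.strip()]


def check_ecn_groups(settings):
    """Return a list of problems found in the ecn_red port groups."""
    problems = []
    for group in parse_list(settings.get("ecn_red.port_group_list", "[]")):
        prefix = f"ecn_red.{group}."
        try:
            low = int(settings[prefix + "min_threshold_bytes"])
            high = int(settings[prefix + "max_threshold_bytes"])
        except KeyError as missing:
            problems.append(f"{group}: missing {missing.args[0]}")
            continue
        # Assumption: the minimum threshold should be below the maximum.
        if low >= high:
            problems.append(f"{group}: min_threshold_bytes must be below max_threshold_bytes")
    return problems


if __name__ == "__main__":
    sample = """
    ecn_red.port_group_list = [ROCE_ECN]
    ecn_red.ROCE_ECN.port_set = swp1
    ecn_red.ROCE_ECN.cos_list = [0,1]
    ecn_red.ROCE_ECN.min_threshold_bytes = 150000
    ecn_red.ROCE_ECN.max_threshold_bytes = 1500000
    """
    print(check_ecn_groups(parse_settings(sample)) or "no problems found")
```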
Related Information
RoCE introduction — roceinitiative.org
RoCEv2 congestion management — community.mellanox.com
Configuring RoCE over a DSCP-based lossless network with a Mellanox Spectrum switch
Index
Index
4
40G ports 260
logical limitations 260
8
802.1p 265
class of service 265
802.3ad link aggregation 459
A
ABRs 739
area border routers 739
access control lists 141
access ports 422
ACL policy files 157
ACL rules 270
ACLs 141, 144, 166
chains 144
QoS 166
active-active mode 465, 515
VRR 465
VXLAN 515
active listener ports 192
Algorithm Longest Prefix Match 705
routing 705
ALPM mode 705
routing 705
AOC cables 24
apt-get 64
area border routers 739
ABRs 739
arp cache 969
ASN 758
autonomous system number 758
auto-negotiation 235
autonomous system number 758
BGP 758
autoprovisioning 72
B
BFD 352, 810
Bidirectional Forwarding Detection 352
echo function 810
BGP 757, 760, 830
Border Gateway Protocol 757
ECMP 760
virtual routing and forwarding (VRF) 830
BGP peering relationships 778, 778
external 778
internal 778
bonds 387, 459
LACP Bypass 459
boot recovery 913
bpdufilter 370
and STP 370
BPDU guard 367
and STP 367
brctl 27
bridge assurance 370
and STP 370
bridges 395, 396, 396, 397, 402, 418, 422, 422
access ports 422
adding IP addresses 397
MAC addresses 396
MTU 395
trunk ports 422
untagged frames 418
VLAN-aware 396, 402
C
cable connectivity 24
cabling 348
Prescriptive Topology Manager 348
chain 144
cl-acltool 141, 271, 971
clagctl 446
class of service 265
cl-cfg 200, 928
cl-ecmpcalc 814
cl-license 23
cl-netstat 964
cl-ospf6 753
Clos topology 712
cl-resource-query 201, 914
cl-support 905
convergence 711
routing 711
Cumulus Linux 20, 31, 31, 34, 44, 484
installing 20, 34
reprovisioning 31
uninstalling 31
upgrading 44
VXLAN 484
cumulus user 113
D
DAC cables 24
daemons 191
datapath 265, 272, 274
link pause 274
priority flow control 272
datapath.conf 265
date 100
setting 100
deb 70
debugging 903
decode-syseeprom 918
differentiated services code point 265
dmidecode 919
dpkg 68
dpkg-reconfigure 99
DSCP 265
differentiated services code point 265
DSCP marking 270
dual-connected hosts 430
duplex interfaces 236
dynamic routing 354
and PTM 354
E
eBGP 759
external BGP 759
ebtables 141, 148
memory spaces 148
echo function 810, 810
BFD 810
PTM 810
ECMP 713, 751, 760, 820, 914
BGP 760
equal cost multi-pathing 713
monitoring 914
OSPF 751
resilient hashing 820
ECMP hashing 813, 817
resilient hashing 817
EGP 714
Exterior Gateway Protocol 714
equal cost multipath 813
ECMP hashing 813
equal cost multi-pathing 713
ECMP 713
ERSPAN 972
network troubleshooting 972
Ethernet management port 21
ethtool 263, 963
switch ports 263
external BGP 759
eBGP 759
F
fast convergence 776
BGP 776
First Hop Redundancy Protocol 465
VRR 465
FRRouting 354, 354, 713
and PTM 354, 354
dynamic routing 713
globs 229
Graphviz 348
H
hardware 917
monitoring 917
hardware compatibility list 18
hash distribution 388
HCL 18
head end replication 489
LNV 489
high availability 713
host entries 914
monitoring 914
hostname 22
hsflowd 984
hwclock 101
I
iBGP 759
internal BGP 759
ifdown 217
ifquery 221, 959
ifup 216
ifupdown 216
ifupdown2 227, 420, 958, 958, 958
excluding interfaces 958
logging 958
purging IP addresses 227
troubleshooting 958
VLAN tagging 420
IGMP snooping 452, 471
MLAG 452
IGP 714
Interior Gateway Protocol 714
image contents 32
installing 20
Cumulus Linux 20
interface counters 964
interface dependencies 220
interfaces 261
statistics 261
internal BGP 759
iBGP 759
ip6tables 141
IP addresses 227
purging 227
iproute2 962
failures 962
iptables 141
IPv4 routes 761
BGP 761
IPv6 routes 761
BGP 761
L
LACP 388, 428
MLAG 428
LACP Bypass 459
layer 3 access ports 27
configuring 27
leaf-spine topology 712
license 23
installing 23
lightweight network virtualization 487, 489, 490, 532
head end replication 489
service node replication 490
link aggregation 387
Link Layer Discovery Protocol 378
link-local IPv6 addresses 792
BGP 792
link pause 274
datapath 274
link-state advertisement 738
LLDP 378, 384
SNMP 384
lldpcli 379
lldpd 349, 378
LNV 487, 487, 489, 490, 532, 532
head end replication 489
service node replication 490
VXLAN 487, 532
load balancing 713
logging 907, 958, 958
ifupdown2 958
networking service 958
logging neighbor state changes 792
BGP 792
logical switch 428
longest prefix match 705
routing 705
loopback interface 28
configuring 28
LSA 738
link-state advertisement 738
LSDB 738
link-state database 738
lshw 919
M
MAC entries 914
monitoring 914
Mako templates 230, 960
debugging 960
mangle table 271
ACL rules 271
memory spaces 148
ebtables 148
MLAG 428, 447, 447, 448, 452, 456
backup link 448
IGMP snooping 452
peer link states 447
protodown state 447
STP 456
MLD snooping 471
monitoring 98, 903, 914, 922, 925, 963, 984, 987
hardware watchdog 922
Net-SNMP 987
network traffic 984
mstpctl 364, 424
MTU 237, 395, 962
bridges 395
failures 962
multi-Chassis Link Aggregation 428
MLAG 428
multiple bridges 417
mz 970
N
Netfilter 141
Net-SNMP 987
networking service 958
logging 958
network interfaces 216
ifupdown 216
network traffic 984
monitoring 984
network troubleshooting 980
tcpdump 980
network virtualization 477, 484, 660, 672
VMware NSX 660, 672
nonatomic updates 151
switchd 151
non-blocking networks 712
NTP 101
time 101
ntpd 101
O
ONIE 20, 32
rescue mode 32
onie-select 31
Open Network Install Environment 20
Open Shortest Path First Protocol 738, 753
OSPFv2 738
OSPFv3 753
open source contributions 18
OSPF 743, 750, 751, 751
ECMP 751
reconvergence 751
summary LSA 743
unnumbered interfaces 750
ospf6d.conf 754
OSPFv2 738
OSPFv3 753, 754
unnumbered interfaces 754
over-subscribed networks 712
P
packages 64
managing 64
packet buffering 265
datapath 265
packet queueing 265
datapath 265
packet scheduling 265
datapath 265
parent interfaces 223
password 113
default 113
passwords 21
peer groups 777
BGP 777
Per VLAN Spanning Tree 361
PVST 361
ping 968
policy.conf 159
port lists 229
port speeds 236
Prescriptive Topology Manager 348
priority flow control 272
datapath 272
priority groups 265
datapath 265
privileged commands 116
protocol tuning 711, 796
BGP 796
routing 711
protodown state 447
MLAG 447
PTM 348, 810
echo function 810
Prescriptive Topology Manager 348
ptmctl 355
ptmd 348
PTM scripts 350
PVRST 361
Rapid PVST 361
PVST 361
Per VLAN Spanning Tree 361
Q
QoS 166
ACLs 166
QSFP 965
Quagga 719
configuring 719
quality of service 265
querier 472
IGMP/MLD snooping 472
R
Rapid PVST 361
PVRST 361
read-only mode 795
BGP 795
recommended configuration 45
reconvergence 751
OSPF 751
repositories 69
other packages 69
rescue mode 32
resilient hashing 817, 820
ECMP 820
restart 201
switchd 201
root user 21, 113
route advertisements 758
BGP 758
route maps 705, 751, 795
BGP 705, 751, 795
route reflectors 759
BGP 759
routes 914
monitoring 914
routing protocols 710
RSTP 361
S
sensors command 919
serial console management 21
T
tcpdump 980
network troubleshooting 980
templates 230
time 100
setting 100
time zone 99
topology 348, 711
data center 348
traceroute 969
traffic.conf 265, 265
traffic distribution 388
traffic generator 970
mz 970
traffic marking 270
datapath 270
troubleshooting 903, 913, 980
single user mode 913
tcpdump 980
trunk ports 418, 422
tzdata 99
U
U-Boot 20, 903
unnumbered interfaces 750, 754
OSPF 750
OSPFv3 754
untagged frames 418
bridges 418
upgrading 44
Cumulus Linux 44
user accounts 113
cumulus 113
root 113
user commands 227
interfaces 227
V
virtual device counters 925, 928, 929
monitoring 925
poll interval 928
VLAN statistics 929
virtual routing and forwarding (VRF) 830, 833
BGP 830
table ID 833
visudo 115
VLAN 435, 925
statistics 925
switched virtual interface 435
VLAN-aware bridges 396, 402, 402
Spanning Tree Protocol 402
VLAN tagging 420, 420, 421
advanced example 421
basic example 420
VLAN translation 426
VTEP 477
vtysh 722
FRRouting CLI 722
VXLAN 477, 484, 487, 515, 532, 661, 673, 687, 925
active-active mode 515
LNV 487, 532
no controller 484
statistics 925
VMware NSX 661, 673, 687
W
watchdog 922
monitoring 922
Z
zebra 714
routing 714