Cumulus Linux 3.5.3 User Guide
Table of Contents
Introducing Cumulus Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
What's New in Cumulus Linux 3.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Open Source Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Hardware Compatibility List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Installation Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Managing Cumulus Linux Disk Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Installing a New Cumulus Linux Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Upgrading Cumulus Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
x86 vs ARM Switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Reprovisioning the System (Restart Installer) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Uninstalling All Images and Removing the Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Booting into Rescue Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Inspecting Image File Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Installing a New Cumulus Linux Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Understanding these Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Installing via a DHCP/Web Server Method with DHCP Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Installing via a DHCP/Web Server Method without DHCP Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Installing via a Web Server with no DHCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Installing via FTP or TFTP without a Web Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Installing via a Local File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Installing via USB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Installing a New Image when Cumulus Linux Is already Installed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Upgrading Cumulus Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Upgrades: Comparing the Network Device Worldview vs. the Linux Host Worldview . . . . . . . . . . . 45
Upgrading Cumulus Linux Devices: Strategies and Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Upgrading Cumulus Linux: Choosing between a Binary Install vs. Package Upgrade . . . . . . . . . . . 52
Rolling Back a Cumulus Linux Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
cumulusnetworks.com 2
System Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Network Command Line Utility - NCLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
What's New and Different in NCLU in Version 3.5? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Installing NCLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Configuring User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Restarting the netd Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Backing up the Configuration to a Single File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Advanced Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Setting Date and Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Setting the Time Zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Setting the Date and Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Setting Time Using NTP and NCLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Specifying the NTP Source Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
NTP Default Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Authentication, Authorization and Accounting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
SSH for Remote Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Using sudo to Delegate Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
LDAP Authentication and Authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
TACACS Plus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
RADIUS AAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Netfilter - ACLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Understanding Traffic Rules In Cumulus Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Installing and Managing ACL Rules with NCLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Installing and Managing ACL Rules with cl-acltool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Installing Packet Filtering (ACL) Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Specifying which Policy Files to Install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Hardware Limitations on Number of Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Supported Rule Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Common Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Example Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Useful Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Caveats and Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Default Cumulus Linux ACL Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Filtering Learned MAC Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Managing Application Daemons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Using systemd and the systemctl Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Identifying Active Listener Ports for IPv4 and IPv6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Identifying Daemons Currently Active or Stopped . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Identifying Essential Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
Configuring switchd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
The switchd File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Configuring switchd Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Restarting switchd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Power over Ethernet - PoE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
How It Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Configuring PoE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Troubleshooting PoE and PoE+ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Configuring a Global Proxy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Layer 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
Managing Static Routes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
Configuring a Gateway or Default Route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
Supported Route Table Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
Caveats and Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Introduction to Routing Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Defining Routing Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Configuring Routing Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Protocol Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
Network Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
Clos Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
Over-Subscribed and Non-Blocking Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
Containing the Failure Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
FRRouting Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
About zebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
Upgrading from Quagga to FRRouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
Configuring FRRouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
Configuring FRRouting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
Interface IP Addresses and VRFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
Using the FRRouting vtysh Modal CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
Reloading the FRRouting Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
Comparing NCLU and vtysh Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
Open Shortest Path First - OSPF - Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
Scalability and Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
Configuring OSPFv2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
Scaling Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927
©2018 Cumulus Networks. All rights reserved
CUMULUS, the Cumulus Logo, CUMULUS NETWORKS, and the Rocket Turtle Logo (the “Marks”) are trademarks and service marks of
Cumulus Networks, Inc. in the U.S. and other countries. You are not permitted to use the Marks without the prior written consent of
Cumulus Networks. The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus
Torvalds, owner of the mark on a worldwide basis. All other marks are used under fair use or license from their respective owners.
Introducing Cumulus Linux
Cumulus Linux 3.5 User Guide
16 02 March 2018
Cumulus Networks
Prerequisites
Prior intermediate Linux knowledge is assumed for this guide. You should be familiar with basic
text editing, Unix file permissions, and process monitoring. A variety of text editors are pre-
installed, including vi and nano.
You must have access to a Linux or UNIX shell. If you are running Windows, you should use a
Linux environment like Cygwin as your command line tool for interacting with Cumulus Linux.
If you're a networking engineer but are unfamiliar with Linux concepts, refer to this
reference guide for examples of the Cumulus Linux CLI and configuration options, and
their equivalent Cisco Nexus 3000 NX-OS commands and settings for comparison. You
can also watch a series of short videos introducing you to Linux in general and some
Cumulus Linux-specific concepts in particular.
Contents
This chapter covers ...
Installation (see page 19)
Upgrade to the Latest Version (see page 20)
Getting Started (see page 20)
Login Credentials (see page 20)
Serial Console Management (see page 20)
Wired Ethernet Management (see page 20)
Configuring the Hostname and Timezone (see page 21)
Verifying the System Time (see page 22)
Installing the License (see page 22)
Configuring Breakout Ports with Splitter Cables (see page 23)
Testing Cable Connectivity (see page 23)
Configuring Switch Ports (see page 24)
Layer 2 Port Configuration (see page 24)
Layer 3 Port Configuration (see page 25)
Configuring a Loopback Interface (see page 27)
Installation
To install Cumulus Linux, you use ONIE (Open Network Install Environment), an extension to the traditional
U-Boot software that allows for automatic discovery of a network installer image. This supports an
ecosystem model in which you procure a switch and load the operating system of your choice, such as
Cumulus Linux.
If Cumulus Linux 3.0.0 or later is already installed on your switch, and you need to upgrade the
software only, you can skip to Upgrading Cumulus Linux (see page 20) below.
The easiest way to install Cumulus Linux with ONIE is via local HTTP discovery:
1. If your host (like a laptop or server) is IPv6-enabled, make sure it is running a web server.
If the host is IPv4-enabled, make sure it is running DHCP as well as a web server.
2. Download the Cumulus Linux installation file to the root directory of the web server. Rename this file
onie-installer.
3. Connect your host via Ethernet cable to the management Ethernet port of the switch.
4. Power on the switch. The switch downloads the ONIE image installer and boots it. You can watch the
progress of the install in your terminal. After the installation finishes, the Cumulus Linux login prompt
appears in the terminal window.
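If the host has no web server yet, Python's built-in http.server module can stand in for one during a lab install. The sketch below is an illustration only and is not part of the Cumulus Linux tooling: the function names, the web root, and the port are assumptions, and the server binds IPv4 only, so it pairs with a DHCPv4 setup rather than IPv6 discovery.

```python
import functools
import http.server
import shutil
from pathlib import Path


def stage_installer(image: Path, web_root: Path) -> Path:
    """Copy the downloaded image into web_root under the name onie-installer."""
    web_root.mkdir(parents=True, exist_ok=True)
    target = web_root / "onie-installer"
    shutil.copyfile(image, target)
    return target


def make_server(web_root: Path, port: int = 80) -> http.server.ThreadingHTTPServer:
    """Build an HTTP server whose document root is web_root (IPv4 only)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(web_root)
    )
    # An empty host string listens on all IPv4 interfaces.
    return http.server.ThreadingHTTPServer(("", port), handler)
```

Calling stage_installer() and then make_server(web_root).serve_forever() serves the image from the root of the web server, as step 2 requires. Binding port 80 usually needs root privileges.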
These steps describe a flexible unattended installation method. You should not need a console
cable. A fresh install via ONIE using a local web server should generally complete in less than 10
minutes.
You have more options for installing Cumulus Linux with ONIE. Read Installing a New Cumulus
Linux Image (see page 32) to install Cumulus Linux using ONIE in the following ways:
DHCP/web server with and without DHCP options
Web server without DHCP
FTP or TFTP without a web server
Local file
USB
ONIE supports many other discovery mechanisms using USB (copy the installer to the root of the drive),
DHCPv6 and DHCPv4, and image copy methods including HTTP, FTP, and TFTP. For more information on
these discovery methods, refer to the ONIE documentation.
After installing Cumulus Linux, you are ready to:
Log in to Cumulus Linux on the switch.
Install the Cumulus Linux license.
Configure Cumulus Linux. This quick start guide provides instructions on configuring switch ports
and a loopback interface.
Getting Started
When you bring up Cumulus Linux for the first time, the management port makes a DHCPv4 request. To
determine the IP address of the switch, cross-reference the MAC address of the switch with your
DHCP server. The MAC address is typically printed on the side of the switch or on the box in which the unit
was shipped.
Login Credentials
The default installation includes one system account, root, with full system privileges, and one user account,
cumulus, with sudo privileges. The root account password is set to null by default (which prohibits login),
while the cumulus account is configured with this default password:
CumulusLinux!
In this quick start guide, you will use the cumulus account to configure Cumulus Linux.
For best security, you should change the default password (using the passwd command) before
you configure Cumulus Linux on the switch.
All accounts except root are permitted remote SSH login; sudo may be used to grant a non-root account
root-level access. Commands which change the system configuration require this elevated level of access.
For more information about sudo, read Using sudo to Delegate Privileges (see page 102).
Example IP Configuration
Set the static IP address with the interface address and interface gateway NCLU
commands. The resulting configuration in /etc/network/interfaces is:
auto eth0
iface eth0
    address 192.0.2.42/24
    gateway 192.0.2.1
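A common mistake with static addressing is a gateway that lies outside the interface's subnet. The following sketch uses only Python's ipaddress module to check this before rendering a stanza like the one above. Note that render_static_stanza is a hypothetical helper written for this illustration; it is not an NCLU or ifupdown2 API.

```python
import ipaddress


def render_static_stanza(ifname: str, address: str, gateway: str) -> str:
    """Return an interfaces stanza after checking the gateway is on-link."""
    iface = ipaddress.ip_interface(address)  # e.g. 192.0.2.42/24
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not inside {iface.network}")
    return "\n".join(
        [
            f"auto {ifname}",
            f"iface {ifname}",
            f"    address {iface.with_prefixlen}",
            f"    gateway {gw}",
        ]
    )
```

For the values in this example, render_static_stanza("eth0", "192.0.2.42/24", "192.0.2.1") returns the four-line stanza, while a gateway such as 198.51.100.1 raises ValueError.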
The command prompt in the terminal doesn't reflect the new hostname until you either log out of
the switch or start a new shell.
When you use this NCLU command to set the hostname, DHCP does not override the hostname
when you reboot the switch. However, if you disable the hostname setting with NCLU, DHCP does
override the hostname the next time you reboot the switch.
2. Follow the on-screen menu options to select the geographic area and region.
Programs that are already running (including logging) and users who are currently logged in do not see
timezone changes made in interactive mode. To apply the timezone to all services and
daemons, reboot the switch.
user@example.com|thequickbrownfoxjumpsoverthelazydog312
There are three ways to install the license onto the switch:
Copy it from a local server. Create a text file with the license and copy it to a server accessible from
the switch. On the switch, use the following command to transfer the file directly to the switch, then
install the license file:
Copy the file to an HTTP server (not HTTPS), then reference the URL when you run cl-license:
Copy and paste the license key into the cl-license command:
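Before pasting or copying a license, it can help to confirm that the string was not truncated or wrapped in transit. The sketch below assumes the license is a single line of the form email|key, as in the sample license string earlier in this section. The name parse_license is hypothetical, and the check is purely syntactic: it cannot tell whether the key is valid, which only cl-license can determine.

```python
from typing import Tuple


def parse_license(text: str) -> Tuple[str, str]:
    """Split a license string into (email, key), rejecting malformed input."""
    line = text.strip()
    email, sep, key = line.partition("|")
    # Require the separator, a plausible email part, a non-empty key,
    # and no embedded line breaks from a bad copy and paste.
    if not sep or "@" not in email or not key or "\n" in line:
        raise ValueError("expected a single line of the form email|key")
    return email, key
```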
You don't have to reboot the switch to activate the switch ports. Once you install the license,
restart the switchd service. All front panel ports become active and show up as swp1, swp2, and
so forth.
If a license is not installed on a Cumulus Linux switch, the switchd service does not start. Once
you install the license, start switchd as described above.
To administratively enable all physical ports, run the following command, where swp1-52 represents a
switch with switch ports numbered from swp1 to swp52:
To view link status, use net show interface all. The following examples show the output of ports in
"admin down", "down" and "up" modes:
Examples
Example One
In the following configuration example, the front panel port swp1 is placed into a bridge called
bridge. The NCLU commands are:
auto bridge
iface bridge
    bridge-ports swp1
    bridge-vlan-aware yes
Example Two
A range of ports can be added in one command. For example, add swp1 through swp10, swp12,
and swp14 through swp20 to bridge:
auto bridge
iface bridge
    bridge-ports swp1 swp2 swp3 swp4 swp5 swp6 swp7 swp8 swp9 swp10 swp12 swp14 swp15 swp16 swp17 swp18 swp19 swp20
    bridge-vlan-aware yes
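To see how a compact range maps to the explicit port list in the configuration above, the following sketch expands a range string into individual port names. The input format (comma-separated numbers and dash ranges after an optional swp prefix) and the name expand_ports are assumptions made for this illustration.

```python
from typing import List


def expand_ports(spec: str, prefix: str = "swp") -> List[str]:
    """Expand a spec such as 'swp1-10,12,14-20' into individual port names."""
    ports: List[str] = []
    for part in spec.split(","):
        part = part.strip()
        if part.startswith(prefix):
            part = part[len(prefix):]  # drop the leading 'swp'
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if end < start:
            raise ValueError(f"descending range: {part}")
        ports.extend(f"{prefix}{n}" for n in range(start, end + 1))
    return ports
```

Joining the result of expand_ports("swp1-10,12,14-20") with spaces gives the 18 ports listed on the bridge-ports line; swp11 and swp13 are absent because the spec skips them.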
A script is available to generate a configuration that places all physical ports in a single bridge.
auto swp1
iface swp1
    address 10.1.1.1/30

auto bridge
iface bridge
    bridge-vids 100
    bridge-vlan-aware yes

auto vlan100
iface vlan100
    address 192.168.10.1/24
    vlan-id 100
    vlan-raw-device bridge
To view the changes in the kernel, use the ip addr show command:
...
To see the status of the loopback interface (lo), use the net show interface lo command:
IP Details
-------------------------  --------------------
IP:                        127.0.0.1/8, ::1/128
IP Neighbor(ARP) Entries:  0
auto lo
iface lo inet loopback
    address 10.1.1.1/32
    address 172.16.2.1/24
Installation Management
A Cumulus Linux switch can have only one image of the operating system installed. This section discusses
installing new and updating existing Cumulus Linux disk images, and configuring those images with
additional applications (via packages) if desired.
Zero touch provisioning is a way to quickly deploy and configure new switches in a large-scale environment.
Contents
This chapter covers ...
Installing a New Cumulus Linux Image (see page 28)
Upgrading Cumulus Linux (see page 28)
x86 vs ARM Switches (see page 28)
Reprovisioning the System (Restart Installer) (see page 29)
Uninstalling All Images and Removing the Configuration (see page 29)
Booting into Rescue Mode (see page 30)
Inspecting Image File Contents (see page 31)
Related Information (see page 32)
cumulus@x86switch$ uname -m
x86_64
cumulus@ARMswitch$ uname -m
armv7l
You can also look up your hardware in the HCL (hardware compatibility list) to determine the processor
type.
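When a script has to pick the right image for many switches, the same check can be automated. The sketch below maps the machine string reported by uname -m (available in Python as platform.machine()) to a processor family. The function name cpu_family and the set of recognized strings are assumptions; only x86_64 and armv7l appear in the output above.

```python
import platform


def cpu_family(machine: str) -> str:
    """Classify a uname -m style machine string as 'x86' or 'ARM'."""
    name = machine.lower()
    if name in ("x86_64", "amd64"):
        return "x86"
    if name.startswith("arm"):
        return "ARM"
    raise ValueError(f"unrecognized machine type: {machine}")


def local_cpu_family() -> str:
    """Classify the machine this script is running on."""
    return cpu_family(platform.machine())
```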
If you change your mind, you can cancel a pending reinstall operation by using onie-select -c:
If you change your mind, you can cancel a pending uninstall operation by using onie-select -c:
If you change your mind, you can cancel a pending rescue boot operation by using onie-select -c:
You can also extract the contents of the image file by passing the extract option to the image file:
Finally, you can verify the contents of the image file by passing the verify option to the image file:
Related Information
Open Network Install Environment (ONIE) Home Page
ONIE is an open source project, equivalent to PXE on servers, that enables the installation of
network operating systems (NOS) on bare metal switches.
Contents
This chapter covers ...
Understanding these Examples (see page 33)
Installing via a DHCP/Web Server Method with DHCP Options (see page 34)
Installing via a DHCP/Web Server Method without DHCP Options (see page 34)
Installing via a Web Server with no DHCP (see page 35)
Installing from Cumulus Linux (see page 35)
Installing from ONIE (see page 35)
Installing via FTP or TFTP without a Web Server (see page 36)
Installing from Cumulus Linux (see page 36)
Installing from ONIE (see page 36)
Installing via a Local File (see page 36)
Installing from Cumulus Linux (see page 36)
Installing from ONIE (see page 37)
Installing via USB (see page 37)
Preparing for USB Installation (see page 37)
Instructions for x86 Platforms (see page 39)
Instructions for ARM Platforms (see page 41)
Installing a New Image when Cumulus Linux Is already Installed (see page 44)
Entering ONIE Mode from Cumulus Linux (see page 44)
1. The bare metal switch boots up and asks for an address (DHCP request).
2. The DHCP server acknowledges and responds with DHCP option 114 and the location of the
installation image.
3. ONIE downloads the Cumulus Linux binary, installs and reboots.
4. Success! You are now running Cumulus Linux.
The most common method is for you to send DHCP option 114 with the entire URL to the web
server (this could be the same system). However, there are many other ways to use DHCP even if
you don't have full control over DHCP. See the ONIE user guide for help.
dhcp-host=sw4,192.168.100.14,6c:64:1a:00:03:ba,set:sw4
dhcp-option=tag:sw4,114,"http://roz.rtplab.test/onie-installer-[PLATFORM]"
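When many switches need an entry like the one above, the two dnsmasq lines can be generated rather than typed. This is a minimal sketch, assuming a hypothetical helper named dnsmasq_onie_entry; the host name, IP address, MAC address and installer URL are whatever you pass in, and nothing here is validated.

```shell
#!/bin/sh
# Hypothetical helper: print the dhcp-host and dhcp-option lines that
# hand one switch the installer URL through DHCP option 114.
# Arguments: NAME IP MAC INSTALLER_URL
dnsmasq_onie_entry() {
    name=$1
    ip=$2
    mac=$3
    url=$4
    printf 'dhcp-host=%s,%s,%s,set:%s\n' "$name" "$ip" "$mac" "$name"
    printf 'dhcp-option=tag:%s,114,"%s"\n' "$name" "$url"
}

# Usage (appending to a dnsmasq configuration file of your choice):
#   dnsmasq_onie_entry sw4 192.168.100.14 6c:64:1a:00:03:ba \
#       "http://roz.rtplab.test/onie-installer-[PLATFORM]"
```

The tag name reuses the host name, which mirrors the example above and keeps each option scoped to a single switch.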
If you don't have a web server, you can use this free Apache example.
onie# onie-discovery-stop
3. Place the Cumulus Linux installer image in a directory on your web server.
4. Run the installer manually, since there are no DHCP options:
Make sure to back up (see page 44) any important configuration files that you may need to
restore the configuration of your switch after the installation finishes.
It is possible that you could severely damage your system with the following utilities, so
please use caution when performing the actions below!
a. Insert your flash drive into the USB port on the switch running Cumulus Linux and log in to
the switch.
b. Determine and note at which device your flash drive can be found by using output from
cat /proc/partitions and sudo fdisk -l [device]. For example, sudo fdisk -l /dev/sdb.
These instructions assume your USB drive is the /dev/sdb device, which is
typical if the USB stick was inserted after the machine was already booted.
However, if the USB stick was plugged in during the boot process, it is possible the
device could be /dev/sda. Make sure to modify the commands below to use the
proper device for your USB drive!
The parted utility should already be installed. However, if it is not, install it with:
sudo -E apt-get install parted
e. Format the partition to your filesystem of choice using ONE of the examples below:
f. To continue installing Cumulus Linux, mount the USB drive in order to move files to it.
3. Copy the image and license files over to the flash drive and rename the image file to:
onie-installer-x86_64, if installing on an x86 platform
onie-installer-arm, if installing on an ARM platform
You can also use any of the ONIE naming schemes mentioned here.
When you use a Mac or Windows computer to rename the installation file, the file extension
may still be present. Make sure to remove the file extension; otherwise, ONIE will not be
able to detect the file!
4. Insert the USB stick into the switch, then continue with the appropriate instructions below for your
x86 or ARM platform.
SSH sessions to the switch get dropped after this step. To complete the remaining
instructions, connect to the console of the switch. Cumulus Linux switches display their
boot process to the console, so you need to monitor the console specifically to complete
the next step.
2. Monitor the console and select the ONIE option from the first GRUB screen shown below.
3. Cumulus Linux on x86 uses GRUB chainloading to present a second GRUB menu specific to the
ONIE partition. No action is necessary in this menu to select the default option ONIE: Install OS.
4. At this point, the USB drive should be automatically recognized and mounted. The image file should
be located and automatic installation of Cumulus Linux should begin. Here is some sample output:
5. After installation completes, the switch automatically reboots into the newly installed instance of
Cumulus Linux.
6. Determine and note at which device your flash drive can be found by using output from
cat /proc/partitions and sudo fdisk -l [device]. For example, sudo fdisk -l /dev/sdb.
These instructions assume your USB drive is the /dev/sdb device, which is typical if the
USB stick was inserted after the machine was already booted. However, if the USB stick
was plugged in during the boot process, it is possible the device could be /dev/sda. Make
sure to modify the commands below to use the proper device for your USB drive!
10. Check that your license is installed with the cl-license command.
11. Reboot the switch to utilize the new license.
sudo reboot
SSH sessions to the switch get dropped after this step. To complete the remaining
instructions, connect to the console of the switch. Cumulus Linux switches display their
boot process to the console, so you need to monitor the console specifically to complete
the next step.
2. Interrupt the normal boot process before the countdown (shown below) completes. Press any key to
stop the autobooting.
3. A command prompt appears, so you can run commands. Execute the following command:
run onie_bootcmd
4. At this point the USB drive should be automatically recognized and mounted. The image file should
be located and automatic installation of Cumulus Linux should begin. Here is some sample output:
5. After installation completes, the switch automatically reboots into the newly installed instance of
Cumulus Linux.
6. Determine and note at which device your flash drive can be found by using output from
cat /proc/partitions and sudo fdisk -l [device]. For example, sudo fdisk -l /dev/sdb.
These instructions assume your USB drive is the /dev/sdb device, which is typical if the
USB stick was inserted after the machine was already booted. However, if the USB stick
was plugged in during the boot process, it is possible the device could be /dev/sda. Make
sure to modify the commands below to use the proper device for your USB drive!
10. Check that your license is installed with the cl-license command.
11. Reboot the switch to utilize the new license.
sudo reboot
ONIE Install Mode to attempt to automatically discover the image from a DHCP server:
Contents
This chapter covers ...
Upgrades: Comparing the Network Device Worldview vs. the Linux Host Worldview (see page 45)
Manual vs. Automated Configuration (see page 45)
Pre-deployment Testing of Production Environments (see page 45)
Locations of Configuration Data vs. Executables (see page 46)
Upgrade Procedure (see page 46)
Rollback Procedure (see page 46)
Upgrades: Comparing the Network Device Worldview vs. the Linux Host
Worldview
In contrast, a Linux host is cheap (or nearly free when using virtualization), so rigorous testing of
a release before deploying it is not constrained by budget. Most sysadmins extensively test
new releases in the complete application environment.
Upgrade Procedure
Both network admins and sysadmins generally plan upgrades only to gain new functionality or to get bug
fixes when the workarounds become too onerous. The goal is to reduce the number of upgrades as much
as possible.
The network device upgrade paradigm is to leave the configuration data in place, and replace the executable
files either all at once from a single binary image or in large chunks (subsystems). A full release upgrade
comes with risk due to unexpected behavior changes in subsystems where the admin did not anticipate or
need changes.
The Linux host upgrade paradigm is to independently upgrade a small list of packages while leaving most of
the OS untouched. Changing a small list of packages reduces the risk of unintended consequences. Usually
upgrades are a "forward only" paradigm, where the sysadmins generally plan to move to the latest code
within the same major release when needed. Every few years, when a new kernel train is released, a major
upgrade is planned. A major upgrade involves wiping and replacing the entire OS and migrating
configuration data.
Rollback Procedure
Even the most well planned and tested upgrades can result in unforeseen problems, and sometimes the
best solution to new problems is to roll back to the previous state.
Since network devices clearly separate data and executables, generally the process is to overwrite the new
release executable with the previously running executable. If the configuration was changed by the newer
release, then you either have to manually back out or repair the changes, or restore from an already
backed up configuration.
The Linux host scenario can be more complicated. There are three main approaches:
Back out individual packages: If the problematic package is identified, the sysadmin can downgrade
the affected package directly. In rare cases the configuration files may have to be restored from
backup, or edited to back out any changes that were automatically made by the upgrade package.
Flatten and rebuild: If the OS becomes unusable, you can use orchestration tools to reinstall the
previous OS release from scratch and then automatically rebuild the configuration.
Backup and restore: Another common strategy is to restore to a previous state via a backup
captured before the upgrade.
However, if an out-of-band network is not available for you to upgrade, you can use the dtach tool
instead to upgrade in band.
/etc/network/ | Network configuration files, most notably /etc/network/interfaces and /etc/network/interfaces.d/ | Layer 1 and Switch Port Attributes (see page 222) | N/A
/etc/cumulus/ports.conf | Breakout cable configuration file | Layer 1 and Switch Port Attributes#ConfiguringBreakoutPorts (see page ) | N/A; please read the guide on breakout cables
/etc/cumulus/switchd.conf | Switchd configuration | Configuring switchd (see page 188) | N/A; please read the guide on switchd configuration
/etc/lldpd.conf | Link Layer Discovery Protocol (LLDP) daemon configuration | Link Layer Discovery Protocol (see page 295) | packages.debian.org/wheezy/lldpd
/etc/nsswitch.conf | Name Service Switch (NSS) configuration file | TACACS Plus (see page 117) | N/A
If you are using the root user account, consider including /root/.
If you have custom user accounts, consider including /home/<username>/.
/etc/adjtime | System clock adjustment data. NTP manages this automatically. It is incorrect when the switch hardware is replaced. Do not copy.
/etc/bcm.d/ | Per-platform hardware configuration directory, created on first boot. Do not copy.
/etc/mlx/ | Per-platform hardware configuration directory, created on first boot. Do not copy.
/etc/blkid.tab.old | A previous partition table; it should not be modified manually. Do not copy.
/etc/cumulus/init |
/etc/default/hwclock | Platform hardware-specific file. Created during first boot. Do not copy.
/etc/sensors.d | Platform-specific sensor data. Created during first boot. Do not copy.
If you are upgrading from Cumulus Linux 2.y.z to Cumulus Linux 3.y.z, during the time when one
switch in the pair is on Cumulus Linux 3.y.z and the other switch in the pair is on Cumulus Linux 2.y.z,
a complete outage occurs on these switches and their associated network segments.
1. Upgrade Cumulus Linux on the switch already in the secondary role. This is the switch with the
higher clagd-priority value.
2. Set the switch in the secondary role into the primary role by setting its clagd-priority to a value
lower than the clagd-priority setting on the switch in the primary role.
For more information about setting the priority, see Understanding Switch Roles (see page 357).
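The role rule in the steps above (the secondary is the switch with the higher clagd-priority value) can be expressed as a small check. This is a minimal sketch, assuming a hypothetical helper named mlag_upgrade_first that receives the two switch names and priority values as arguments; it does not query clagd, and it treats equal priorities as an error because the steps above do not cover that case.

```shell
#!/bin/sh
# Hypothetical helper: given two switches and their clagd-priority
# values, print the switch to upgrade first. That is the switch in the
# secondary role, which has the higher clagd-priority value.
# Arguments: NAME_A PRIORITY_A NAME_B PRIORITY_B
mlag_upgrade_first() {
    if [ "$2" -gt "$4" ]; then
        echo "$1"
    elif [ "$4" -gt "$2" ]; then
        echo "$3"
    else
        # Equal priorities are outside the scope of this sketch.
        echo "priorities are equal; cannot decide from priority alone" >&2
        return 1
    fi
}

# Usage (switch names and values are examples only):
#   mlag_upgrade_first leaf01 1000 leaf02 2000
```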
Cons:
This method works only if you are upgrading to a later minor release (like 3.1.x to 3.2.y), or to a later
maintenance release from an earlier version of that minor release (for example, 2.5.2 to 2.5.5 or
3.0.0 to 3.0.1).
Rollback is quite difficult and tedious.
You can't choose the exact release version that you want to run.
When you upgrade, you upgrade all packages to the latest available version.
The upgrade process takes a while to complete, and various switch functions may be intermittently
available during the upgrade.
Some upgrade operations will terminate SSH sessions on the in-band (front panel) ports, leaving the
user unable to monitor the upgrade process. As a workaround, use the dtach tool.
Just like the binary install method, you may have to reboot after the upgrade, lengthening the
downtime.
After you upgrade, user names and group names created by packages may be different on different
switches, depending on the configuration and package installation history.
If services are stopped, a reboot may be required before those services can be started again.
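The version restriction in the first item above can be checked before you start. The following is a minimal sketch, assuming a hypothetical helper named can_apt_upgrade and version strings in exactly the MAJOR.MINOR.MAINTENANCE form used in this guide. It accepts a later minor release or a later maintenance release within the same major release, and rejects everything else.

```shell
#!/bin/sh
# Hypothetical helper: succeed if moving from version $1 to version $2
# stays within the same major release and moves forward (later minor
# release, or later maintenance release of the same minor release).
# Both arguments must look like MAJOR.MINOR.MAINTENANCE, e.g. 3.1.0.
can_apt_upgrade() {
    from=$1
    to=$2
    f_major=${from%%.*}; f_rest=${from#*.}
    f_minor=${f_rest%%.*}; f_maint=${f_rest#*.}
    t_major=${to%%.*}; t_rest=${to#*.}
    t_minor=${t_rest%%.*}; t_maint=${t_rest#*.}

    # A different major release requires a binary image install.
    [ "$f_major" -eq "$t_major" ] || return 1
    # A later minor release qualifies.
    if [ "$t_minor" -gt "$f_minor" ]; then
        return 0
    fi
    # Otherwise require the same minor and a later maintenance release.
    [ "$t_minor" -eq "$f_minor" ] && [ "$t_maint" -gt "$f_maint" ]
}

# Usage:
#   can_apt_upgrade 3.1.0 3.2.0 && echo "package upgrade is an option"
```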
4. Reboot the switch if the upgrade messages indicate that a system restart is required.
After you successfully upgrade Cumulus Linux, you may notice some results that you may
or may not have expected:
apt-get upgrade always updates the operating system to the most current version, so
if you are currently running Cumulus Linux 3.0.1 and run apt-get upgrade on that
switch, the packages get upgraded to the latest versions contained in the latest 3.y.z
release.
When you run cat /etc/image-release, the output still shows the version of
Cumulus Linux from the last binary install. So if you installed Cumulus Linux 3.1.0 as a full
image install and then upgraded to 3.2.0 using apt-get upgrade, the output from
/etc/image-release still shows: IMAGE_RELEASE=3.1.0. To see the current version of all
the Cumulus Linux packages running on the switch, use dpkg --list or dpkg -l.
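To read the recorded image version in a script, you can extract the IMAGE_RELEASE value directly. This is a minimal sketch, assuming a hypothetical helper named image_release and assuming the file stores the value on a line of the form IMAGE_RELEASE=value, as in the output quoted above. Remember that this value reflects the last binary install, not the package versions currently running.

```shell
#!/bin/sh
# Hypothetical helper: print the IMAGE_RELEASE value from a file in
# KEY=value format. The path defaults to /etc/image-release.
image_release() {
    file=${1:-/etc/image-release}
    sed -n 's/^IMAGE_RELEASE=//p' "$file"
}

# Usage on a switch:
#   echo "Last binary install: $(image_release)"
```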
Using the Config File Migration Script to Identify and Move Files to
Cumulus Linux 3.0 and Later
You can use the Config File Migration Script with the --backup option to create a backup archive of
configuration files in version 2.5, copy them off the box, then install them on the switch running the new
version. Follow the caveats in the previous section when migrating configuration files.
You cannot use the Config File Migration Script to upgrade from Cumulus Linux 3.0.0 to a later
version. Use apt-get instead, as documented in the release notes.
The following example excludes /etc/apt, /etc/passwd and /etc/shadow from the backup archive.
2. Install Cumulus Linux 3.0 or later onto the switch using ONIE (see page 32).
3. Reinstall the files from the config file archive to the newly installed switch.
# On the switch, copy the config file archive back from the server:
scp user@my_external_server:PATH/SWITCHNAME-config-archive-DATE_TIME.tar.gz .
# Untar the archive to the root of the box
sudo tar -C / -xvf SWITCHNAME-config-archive-DATE_TIME.tar.gz
Be aware that version 2.5.z configurations are not guaranteed to work in Cumulus Linux
3.0 or later. You should test the restoration and proper operation of the Cumulus Linux 2.5.z
configuration in Cumulus Linux 3.0 or later on a non-production switch or in a Cumulus
VX image, since every deployment is unique.
Using Snapshots
Cumulus Linux supports the ability to take snapshots of the complete file system as well as the ability to roll
back to a previous snapshot. Snapshots are performed automatically right before and after you upgrade
Cumulus Linux and right before and after you commit a switch configuration using NCLU (see page 82). In
addition, you can take a snapshot at any time. You can roll back the entire file system to a specific snapshot
or just retrieve specific files.
The primary snapshot components are:
btrfs — an underlying file system in Cumulus Linux, which supports snapshots.
snapper — a userspace utility to create and manage snapshots on demand as well as taking
snapshots automatically before and after running apt-get upgrade|install|remove|dist-upgrade.
You can use snapper to roll back to earlier snapshots, view existing snapshots, or delete
one or more snapshots.
NCLU (see page 82) — takes snapshots automatically before and after committing network
configurations. You can use NCLU to roll back to earlier snapshots, view existing snapshots, or
delete one or more snapshots.
Contents
This chapter covers ...
Installing the Snapshot Package (see page 58)
Taking and Managing Snapshots (see page 58)
Viewing Available Snapshots (see page 58)
Viewing Differences between Snapshots (see page 59)
Deleting Snapshots (see page 61)
Rolling Back to Earlier Snapshots (see page 62)
Configuring Automatic Time-based Snapshots (see page 62)
Caveats and Errata (see page 63)
root Partition Mounted Multiple Times (see page 63)
For more information about using snapper, run snapper --help or man snapper(8).
However, net show commit history only displays snapshots taken when you update your switch
configuration. It does not list any snapshots taken directly with snapper. To see all the snapshots on the
switch, run:
-control-plane
- acl ipv4 EXAMPLE1 inbound
-iface swp1
- acl ipv4 EXAMPLE1 inbound
You can view the diff for a single file by specifying the name in the command:
-control-plane
- acl ipv4 EXAMPLE1 inbound
-iface swp1
- acl ipv4 EXAMPLE1 inbound
For a higher level view, displaying the names of changed/added/deleted files only, run:
Deleting Snapshots
You can remove one or more snapshots using either NCLU or snapper.
Take care when deleting a snapshot, as you cannot restore it once it's been deleted.
Snapshot 0 is the running configuration. You can't roll back to it or delete it. However, you can
take a snapshot of it.
Snapshot 1 is the root file system.
The snapper utility preserves a number of snapshots, and automatically deletes older snapshots once the
limit is reached. It does this in two ways.
By default, snapper preserves 10 snapshots that are labeled important. A snapshot is labeled important if
it was created when you run apt-get. To change this number, run:
You should always make NUMBER_LIMIT_IMPORTANT an even number, since snapshots are
always taken in pairs, one before and one after an upgrade. This does not apply to
NUMBER_LIMIT, described next.
snapper also deletes unlabeled snapshots. The default number of snapshots snapper preserves is 5. To
change this number, run:
Also, you can prevent snapshots from being taken automatically before and after running apt-get
upgrade|install|remove|dist-upgrade. Edit /etc/cumulus/apt-snapshot.conf and set:
APT_SNAPSHOT_ENABLE=no
For any snapshot on the switch, you can use snapper to roll back to a specific snapshot. When running
snapper rollback, you must reboot the switch for the rollback to complete:
You can also revert to an earlier version of a specific file instead of rolling back the whole file system:
You can also copy the file directly from the snapshot directory:
cumulus@switch:~$ cp /.snapshots/32/snapshot/etc/cumulus/acl/policy.d/50_nclu_acl.rules /etc/cumulus/acl/policy.d/
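The copy command above relies on the /.snapshots/NUMBER/snapshot/ layout. The following is a minimal sketch, assuming a hypothetical helper named snapshot_path, that builds the location of a file inside a given snapshot from the snapshot number and the file's absolute path. It only builds the string; it does not check that the snapshot or the file exists.

```shell
#!/bin/sh
# Hypothetical helper: print where a file lives inside a snapshot,
# following the /.snapshots/NUMBER/snapshot/ layout shown above.
# Arguments: SNAPSHOT_NUMBER ABSOLUTE_PATH
snapshot_path() {
    case "$2" in
        /*) echo "/.snapshots/$1/snapshot$2" ;;
        *)  echo "path must be absolute: $2" >&2; return 1 ;;
    esac
}

# Usage:
#   cp "$(snapshot_path 32 /etc/cumulus/acl/policy.d/50_nclu_acl.rules)" \
#      /etc/cumulus/acl/policy.d/
```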
Directory | Reason
/var/log, /var/support | Log file and Cumulus support location. Excluded from snapshots to allow post-rollback analysis.
/opt, /var/opt | Third-party software is usually installed in /opt. Excluded to avoid reinstalling these applications after rollbacks.
/srv | Contains data for HTTP and FTP servers. Excluded to avoid server data loss on rollbacks.
/usr/local | Used when installing locally built software. Excluded to avoid reinstalling this software after rollbacks.
/var/lib/libvirt/images | The default directory for libvirt VM images. Excluded from the snapshot. Additionally, Copy-On-Write (COW) is disabled for this subvolume, as COW and VM image I/O access patterns do not work well together.
/boot/grub/i386-pc, /boot/grub/x86_64-efi, /boot/grub/arm-uboot | The GRUB kernel modules must stay in sync with the GRUB kernel installed in the master boot record or UEFI system partition.
If services are stopped, a reboot may be required before those services can be started again.
Contents
This chapter covers ...
Updating the Package Cache (see page 65)
Cumulus Networks recommends you use the -E option with sudo whenever you run any apt-
get command. This option preserves your environment variables — such as HTTP proxies —
before you install new packages or upgrade your distribution.
The search commands look for the search terms not only in the package name but also in other parts
of the package information. Consequently, they match more packages than you might expect.
Adding a Package
In order to add a new package, first ensure the package is not already installed in the system:
If the package is installed already, ensure it’s the version you need. If it’s an older version, then update the
package from the Cumulus Linux repository:
If the package is not already on the system, add it by running apt-get install. This retrieves the
package from the Cumulus Linux repository and installs it on your system together with any other packages
that this package might depend on.
For example, the following adds the package tcpreplay to the system:
For several packages, Cumulus Networks has added features or made bug fixes and these
packages must not be replaced with versions from other repositories. Cumulus Linux has been
configured to ensure that the packages from the Cumulus Linux repository are always preferred
over packages from other repositories.
If you want to install packages that are not in the Cumulus Linux repository, the procedure is the same as
above with one additional step.
Packages not part of the Cumulus Linux Repository have generally not been tested, and may not
be supported by Cumulus Linux support.
Installing packages outside of the Cumulus Linux repository requires the use of apt-get, but, depending
on the package, easy-install and other commands can also be used.
To install a new package, please complete the following steps:
1. First, ensure the package is not already installed in the system. Use the dpkg command:
2. If the package is installed already, ensure it's the version you need. If it's an older version, then
update the package from the Cumulus Linux repository:
3. If the package is not on the system, then most likely the package source location is also not in the
/etc/apt/sources.list file. If the source for the new package is not in sources.list, please
edit and add the appropriate source to the file. For example, add the following if you wanted a
package from the Debian repository that is not in the Cumulus Linux repository:
Otherwise, the repository may be listed in /etc/apt/sources.list but is commented out, as can
be the case with the early-access repository:
To uncomment the repository, remove the # at the start of the line, then save the file:
Related Information
Debian GNU/Linux FAQ, Ch 8 Package management tools
man pages for apt-get, dpkg, sources.list, apt_preferences
Contents
Zero Touch Provisioning Using a Local File (see page 71)
Zero Touch Provisioning Using USB (ZTP-USB) (see page 72)
Zero Touch Provisioning over DHCP (see page 73)
Triggering ZTP over DHCP (see page 73)
Configuring the DHCP Server (see page 73)
Detailed Look at HTTP Headers (see page 74)
Writing ZTP Scripts (see page 74)
Example ZTP Script (see page 75)
Testing and Debugging ZTP Scripts (see page 76)
Manually Using the ztp Command (see page 79)
Notes (see page 81)
cumulus-ztp-amd64-cel_pebble-rUNKNOWN
cumulus-ztp-amd64-cel_pebble
cumulus-ztp-cel_pebble
cumulus-ztp-amd64
cumulus-ztp
You can also trigger the ZTP process manually by running the ztp --run <URL> command, where the
URL is the path to the ZTP script.
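The list of file names above follows a most-specific-to-most-generic pattern. This is a minimal sketch, assuming a hypothetical helper named ztp_waterfall, that reproduces that example list from an architecture, vendor, model and revision. It is an illustration of the naming pattern only; the log sample later in this chapter shows a slightly different intermediate name, so check the names your switch actually searches for.

```shell
#!/bin/sh
# Hypothetical helper: print candidate ZTP script names, most specific
# first, following the pattern of the example list above.
# Arguments: ARCH VENDOR MODEL REVISION
ztp_waterfall() {
    arch=$1
    vendor=$2
    model=$3
    rev=$4
    echo "cumulus-ztp-${arch}-${vendor}_${model}-r${rev}"
    echo "cumulus-ztp-${arch}-${vendor}_${model}"
    echo "cumulus-ztp-${vendor}_${model}"
    echo "cumulus-ztp-${arch}"
    echo "cumulus-ztp"
}

# Usage:
#   ztp_waterfall amd64 cel pebble UNKNOWN
```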
This feature has been tested only with "thumb" drives, not an actual external large USB hard
drive.
If the ztp process did not discover a local script, it tries once to locate an inserted but unmounted USB
drive. If it discovers one, it begins the ZTP process.
Cumulus Linux supports the use of a FAT32, FAT16, or VFAT-formatted USB drive as an installation source
for ZTP scripts. You must plug in the USB stick before you power up the switch.
At minimum, the script should:
Install the Cumulus Linux operating system and license.
Copy over a basic configuration to the switch.
Restart the switch or the relevant services to get switchd up and running with that configuration.
Follow these steps to perform zero touch provisioning using USB:
1. Copy the Cumulus Linux license and installation image to the USB stick.
2. The ztp process searches the root filesystem of the newly mounted device for filenames matching
an ONIE-style waterfall (see the patterns and examples above), looking for the most specific name
first, and ending at the most generic.
3. The script's contents are parsed to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see
example scripts (see page 75)).
The USB device is mounted to a temporary directory under /tmp (for example,
/tmp/tmpigGgjf/). In order to reference files on the USB, use the environment variable
ZTP_USB_MOUNTPOINT to refer to the USB root partition.
1. The first time you boot Cumulus Linux, eth0 is configured for DHCP and makes a DHCP request.
2. The DHCP server offers a lease to the switch.
3. If option 239 is present in the response, the zero touch provisioning process itself will start.
4. The zero touch provisioning process requests the contents of the script from the URL, sending
additional HTTP headers (see page 74) containing details about the switch.
5. The script's contents are parsed to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see
example scripts (see page 75)).
6. If provisioning is necessary, then the script executes locally on the switch with root privileges.
7. The return code of the script gets examined. If it is 0, then the provisioning state is marked as
complete in the autoprovisioning configuration file.
Additionally, the hostname of the switch can be specified via the host-name option:
Remember to include the following line in any of the supported scripts which are expected to be
run via the autoprovisioning framework.
# CUMULUS-AUTOPROVISIONING
This line is required somewhere in the script file in order for execution to occur.
The script must contain the CUMULUS-AUTOPROVISIONING flag. This can be in a comment or remark and
does not need to be echoed or written to stdout.
The script can be written in any language currently supported by Cumulus Linux, such as:
Perl
Python
Ruby
Shell
The script must return an exit code of 0 upon success, as this triggers the autoprovisioning process to be
marked as complete in the autoprovisioning configuration file.
#!/bin/bash
function error() {
  echo -e "\e[0;33mERROR: The Zero Touch Provisioning script failed while running the command $BASH_COMMAND at line $BASH_LINENO.\e[0m" >&2
  exit 1
}
# CUMULUS-AUTOPROVISIONING
exit 0
Several ZTP example scripts are available in the Cumulus GitHub repository.
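Because a script without the CUMULUS-AUTOPROVISIONING flag is not executed, it can help to check for the flag before publishing a script. The following is a minimal sketch, assuming a hypothetical helper named has_ztp_marker; it is a local pre-flight check written for this guide, not a description of how ztp itself parses scripts.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the given script file
# contains the CUMULUS-AUTOPROVISIONING flag somewhere in its text.
has_ztp_marker() {
    grep -q 'CUMULUS-AUTOPROVISIONING' "$1"
}

# Usage (the file name is an example only):
#   has_ztp_marker ./my_ztp_script.sh || echo "flag is missing" >&2
```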
You can also run ztp -s to get more information about the current state of ZTP.
cumulus@switch:~$ ztp -s
ZTP INFO:
State enabled
Version 1.0
Result Script Failure
Date Tue May 10 22:42:09 2016 UTC
Method ZTP DHCP
URL http://192.0.2.1/demo.sh
If ZTP ran when the switch booted and not manually, you can run systemctl -l status ztp.service
and then journalctl -l -u ztp.service to see if any failures occurred:
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6000-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6000-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6000-01 systemd[1]: Unit ztp.service entered failed state.
cumulus@switch:~$
cumulus@switch:~$ sudo journalctl -l -u ztp.service --no-pager
-- Logs begin at Wed 2016-05-11 16:37:42 UTC, end at Wed 2016-05-11 16:40:39 UTC. --
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp: Sate Directory does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/run/ztp.lock: Lock File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp/ztp_state.log: State File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Looking for ZTP local Script
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6000_s1220-rUNKNOWN
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6000_s1220
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Looking for unmounted USB devices
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Parsing partitions
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
Instead of running journalctl, you can see the log history by running:
If you see that the issue is a script failure, you can modify the script and then run ztp manually using
ztp -v -r <URL/path to that script>, as above.
Enabling ztp means that ZTP attempts to run the next time the switch boots. However, if ZTP
already ran on a previous boot or if a manual configuration has been found, ZTP exits
without looking for any script.
ZTP checks for these manual configurations during bootup:
Password changes
Users and groups changes
Packages changes
Interfaces changes
The presence of an installed license
When the switch boots for the very first time, ZTP records the state of some important files
that are most likely to be modified after the switch is configured. If ZTP is still enabled
after a reboot, ZTP compares the recorded state to the current state of these files. If they do
not match, ZTP considers the switch already provisioned and exits. These files are
only erased after a reset.
To reset ztp to its original state, use the -R option. This removes the ztp directory and ztp runs the next
time the switch reboots.
To force provisioning to occur and ignore the status listed in the configuration file use the -r option:
Notes
During the development of a provisioning script, the switch may need to be rebooted.
You can use the Cumulus Linux onie-select -i command to cause the switch to reprovision
itself and install a network operating system again using ONIE.
System Configuration
The NCLU wrapper utility is called net. net can configure L2 and L3 features of the networking
stack, install ACLs and VXLANs, roll back and delete snapshots, and provide monitoring and
troubleshooting functionality for these features. You can use net to configure both
/etc/network/interfaces and /etc/frr/frr.conf, and to run show and clear commands related to
ifupdown2 and FRRouting.
Contents
This chapter covers ...
What's New and Different in NCLU in Version 3.5?
Installing NCLU
Getting Started
Tab Completion, Verification and Inline Help
Adding ? (Question Mark) Ability to NCLU
Built-In Examples
Configuring User Accounts
Editing the netd.conf File
Restarting the netd Service
Backing up the Configuration to a Single File
Advanced Configuration
Installing NCLU
If you upgraded Cumulus Linux from a version earlier than 3.2 instead of performing a full binary install, you
need to install the nclu package on your switch:
The nclu package installs a new bash completion script, and displays the following message
when it is manually installed:
Getting Started
NCLU uses the following workflow for staging and committing changes to Cumulus Linux:
1. Use the net add and net del commands to stage/remove configuration changes.
2. Use the net pending command to review staged changes.
3. Use net commit and net abort to commit/delete staged changes.
net commit applies the changes to the relevant configuration files, such as
/etc/network/interfaces, then runs the necessary follow-on commands to enable the
configuration, such as ifreload -a.
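The three-step workflow above can be sketched as a short script. This is an illustration rather than a required procedure: the run wrapper and the DRY_RUN variable are assumptions added here so the script prints the commands by default, and the interface name and MTU value are examples only.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 on a switch to actually run the net commands.
# (run and DRY_RUN are illustrative helpers, not part of NCLU.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

nclu_change() {
    run net add interface swp10 mtu 9216   # 1. stage a change
    run net pending                        # 2. review what is staged
    run net commit                         # 3. apply it (net abort discards it)
}

nclu_change
```

Reviewing the output of net pending before committing is the point of the staging model: nothing is written to the configuration files until net commit runs.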
Once you have a running configuration, you can review and update it using:
net show: A series of commands for viewing various parts of the network configuration, such as
net show configuration, net show commit history and net show bgp to view the
complete network configuration, a history of commits using NCLU and BGP status, respectively.
net clear: A way to clear net show counters, BGP and OSPF neighbor content, and more.
net rollback: Provides a mechanism to revert back to an earlier configuration.
net commit confirm: Requires the user to press Enter in order to commit changes via NCLU. If
you run net commit confirm but do not press Enter within 10 seconds, the commit is
automatically reverted and nothing changes.
net commit permanent: Retains the snapshot (see page 57) taken when committing the
change. Otherwise, the snapshots created from NCLU commands are cleaned up periodically via a
snapper cron job.
net commit delete: Deletes one or more snapshots created when committing changes with
NCLU.
net del all: Deletes all configurations and stops the IEEE 802.1X service.
This command does not remove management VRF (see page 717) configurations, as
NCLU does not interact with eth0 interfaces and management VRF at all.
<552-9216> :
cumulus@switch:~$ net add int swp10 mtu 9300
ERROR: Command not found
NCLU has a comprehensive help system built in to assist usage. In addition to the net man page, you can
use ? and help to display available commands:
Usage:
# net <COMMAND> [<ARGS>] [help]
#
# net is a command line utility for networking on Cumulus Linux switches.
#
# COMMANDS are listed below and have context specific arguments which can
# be explored by typing "<TAB>" or "help" anytime while using net.
#
# Use 'man net' for a more comprehensive overview.
net abort
net commit [verbose] [confirm] [description <wildcard>]
net commit delete (<number>|<number-range>)
net help [verbose]
net pending
net rollback (<number>|last)
net show commit (history|<number>|<number-range>|last)
net show rollback (<number>|last)
net show configuration [commands|files|acl|bgp|ospf|ospf6|interface <interface>]
Options:
# Help commands
help : context sensitive information; see section below
example : detailed examples of common workflows
# Configuration commands
add : add/modify configuration
del : remove configuration
# Status commands
show : show command output
clear : clear counters, BGP neighbors, etc
Uncomment the very last line in the .inputrc file so that the file changes from this:
to this:
Save the file and reconnect to the switch. The ? (question mark) ability will work on all subsequent sessions
on the switch.
cumulus@leaf01:~$ net
abort : abandon changes in the commit buffer
add : add/modify configuration
clear : clear counters, BGP neighbors, etc
commit : apply the commit buffer to the system
del : remove configuration
example : detailed examples of common workflows
help : Show this screen and exit
pending : show changes staged in the commit buffer
rollback : revert to a previous configuration state
show : show command output
When the question mark is typed, NCLU will autocomplete and show all available options, but the
question mark won't actually appear on the terminal. This is normal, expected behavior.
Built-In Examples
NCLU has a number of built-in examples to guide users through basic configuration setup:
Scenario
========
We are configuring switch1 and would like to configure the following
- configure switch1 as an L2 switch for host-11 and host-12
- enable vlans 10-20
- place host-11 in vlan 10
- place host-12 in vlan 20
- create an SVI interface for vlan 10
- create an SVI interface for vlan 20
- assign IP 10.0.0.1/24 to the SVI for vlan 10
- assign IP 20.0.0.1/24 to the SVI for vlan 20
- configure swp3 as a trunk for vlans 10, 11, 12 and 20
                swp3
     *switch1 --------- switch2
        /\
  swp1 /  \ swp2
      /    \
     /      \
 host-11   host-12
Verification
============
switch1# net show interface
switch1# net show bridge macs
You create user accounts with read-only permissions for NCLU by adding them to the netshow
group. A user in the netshow group can run NCLU net show commands, such as net show
interface or net show config, and certain general Linux commands, such as ls, cd or man,
but cannot run net add, net del or net commit commands.
You create user accounts with edit permissions for NCLU by adding them to the netedit group. A
user in the netedit group can run NCLU configuration commands, such as net add, net del or
net commit, in addition to NCLU net show commands.
The examples below demonstrate how to add a new user account or modify an existing user account called
myuser.
To add a new user account with NCLU show permissions:
You can use the adduser command for local user accounts only. You can use the addgroup
command for both local and remote user accounts. For a remote user account, you must use the
mapping username, such as tacacs3 or radius_user, not the TACACS (see page 117) or
RADIUS (see page 129) account name.
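To make the two permission levels concrete, here is a small sketch that classifies a user's NCLU access from a list of group names. The function name is hypothetical, and the check only considers membership in netshow and netedit; it ignores any additional users or groups that may be listed in /etc/netd.conf.

```shell
#!/bin/sh
# Classify NCLU access from a space-separated list of group names,
# such as the output of `id -nG <user>`.
# Sketch only: looks at netshow/netedit membership and nothing else.
nclu_access() {
    access=none
    for group in $1; do
        case $group in
            netedit) access=edit; break ;;   # edit implies show
            netshow) access=show ;;
        esac
    done
    printf '%s\n' "$access"
}

nclu_access "myuser netshow"
nclu_access "myuser sudo netedit"
```

On a switch, the group list could be supplied with nclu_access "$(id -nG myuser)".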
If the user tries to run commands that are not allowed, the following error displays:
To configure a new user group to use NCLU, add that group to the groups_with_edit and
groups_with_show lines in the /etc/netd.conf file.
Use caution giving groups edit permissions. For example, you don't want to give edit permissions
to the tacacs group (see page 121).
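As an illustration of adding a group to one of those lines, the sketch below appends a group name to a key's comma-separated value list. It works on text passed through a pipe rather than on the live file. The "key = value, value" layout of the sample lines is an assumption about /etc/netd.conf, and netops is a hypothetical group name; check the file on your switch before adapting this.

```shell
#!/bin/sh
# Append a group to a "key = a, b" style line, reading the file content on
# stdin and writing the result to stdout. The line layout is an assumption.
add_group() {
    key=$1 group=$2
    sed "/^$key[[:space:]]*=/ s/[[:space:]]*\$/, $group/"
}

# Sample input; "netops" is a hypothetical group name.
printf '%s\n' 'groups_with_edit = netedit' 'groups_with_show = netshow, netedit' |
    add_group groups_with_show netops
```

To grant edit rights as well, the same function can be applied a second time with groups_with_edit as the key. The netd service needs to be restarted before a change to the file takes effect.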
For example, to copy out the configuration of a leaf switch called leaf01, you would run something like the
following:
With the commands all stored in a single file, you can now copy this file to another ToR switch in your
network called leaf01 and apply the configuration by running:
Advanced Configuration
NCLU needs no initial configuration; it's ready to go in Cumulus Linux. However, if you need to modify its
configuration, you must manually update the /etc/netd.conf file. This file can be configured to allow
different permission levels for users to edit configurations and run show commands. It also contains a
blacklist that hides less frequently used terms from the tabbed autocomplete.
Sets the Linux groups with root edit privileges.
Contents
This chapter covers ...
Setting the Time Zone
Alternative: Use the Guided Wizard to Find and Apply a Time Zone
Setting the Date and Time
Setting Time Using NTP and NCLU
Specifying the NTP Source Interface
NTP Default Configuration
Related Information
Edit the file to add your desired time zone. A list of valid time zones can be found at the following link.
Use the following command to apply the new time zone immediately.
Alternative: Use the Guided Wizard to Find and Apply a Time Zone
To set the time zone, run dpkg-reconfigure tzdata as root:
Then navigate the menus to enable the time zone you want. The following example selects the US/Pacific
time zone:
Configuring tzdata
------------------
For more info see the Debian System Administrator’s Manual – Time.
If you need to reconfigure the current time zone, refer to the instructions above.
Then, to set the system clock according to the time zone configured:
These commands add the NTP server to the list of servers in /etc/ntp.conf:
To set the initial date and time via NTP before starting the ntpd daemon, use ntpd -q. This is the same as
ntpdate, which is being retired and is no longer available. See man ntp.conf(5) for details on configuring
ntpd using ntp.conf.
These commands create the following configuration snippet in the ntp.conf file:
...
# Specify interfaces
interface listen swp10
...
# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255
# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
# Specify interfaces, don't listen on switch ports
interface listen eth0
Related Information
Debian System Administrator’s Manual – Time
www.ntp.org
en.wikipedia.org/wiki/Network_Time_Protocol
wiki.debian.org/NTP
Contents
This chapter covers ...
Generate an SSH Key Pair
Related Information
cumulus@leaf01:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/cumulus/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cumulus/.ssh/id_rsa.
Your public key has been saved in /home/cumulus/.ssh/id_rsa.pub.
The key fingerprint is:
5a:b4:16:a0:f9:14:6b:51:f6:f6:c0:76:1a:35:2b:bb cumulus@leaf04
The key's randomart image is:
+---[RSA 2048]----+
| +.o o |
| o * o . o |
| o + o O o |
| + . = O |
| . S o . |
| + . |
| . E |
| |
| |
+-----------------+
2. Run the ssh-copy-id command, and follow the prompts, to copy the generated public key to the
desired location:
ssh-copy-id will not work if the username on the remote switch is different from the username on
the local switch. To work around this issue, use the scp command instead:
3. Connect to the remote switch to confirm the authentication keys are in place:
Related Information
Debian Documentation - Password-less logins with OpenSSH
Wikipedia - Secure Shell (SSH)
User Accounts
By default, Cumulus Linux has two user accounts: cumulus and root.
The cumulus account:
Default password is CumulusLinux!
Is a user account in the sudo group with sudo privileges
User can log in to the system via all the usual channels like console and SSH (see page 98)
Along with the cumulus group, has both show and edit rights for NCLU (see page 82)
The root account:
Password is disabled by default
Has the standard Linux root user access to everything on the switch
Disabled password prohibits login to the switch by SSH, telnet, FTP, and so forth
For best security, you should change the default password (using the passwd command) before you
configure Cumulus Linux on the switch.
You can add more user accounts as needed. Like the cumulus account, these accounts must use sudo to
execute privileged commands (see page 102), so be sure to include them in the sudo group.
To access the switch without a password, you must boot into single user mode (see page 767).
You can add and configure user accounts in Cumulus Linux with read-only or edit permissions for NCLU.
For more information, see Configuring User Accounts (see page 82).
3. A prompt appears, asking you to Enter file in which to save the key (/root/.ssh/id_rsa):. Press Enter to use
the root user's home directory, or else provide a different destination.
4. You are prompted to Enter passphrase (empty for no passphrase):. This is optional but it does provide
an extra layer of security.
5. The public key is now located in /root/.ssh/id_rsa.pub. The private key (identification) is now
located in /root/.ssh/id_rsa.
6. Copy the public key to the switch. SSH to the switch as the cumulus user, then run:
...
# Authentication:
LoginGraceTime 120
PermitRootLogin yes
StrictModes yes
...
Contents
This chapter covers ...
Using sudo
sudoers Examples
Related Information
Using sudo
sudo allows you to execute a command as superuser or another user as specified by the security policy.
See man sudo(8) for details.
The default security policy is sudoers, which is configured using /etc/sudoers. Use /etc/sudoers.d/ to
add to the default sudoers policy. See man sudoers(5) for details.
Use visudo only to edit the sudoers file; do not use another editor like vi or emacs. See man
visudo(8) for details.
Errors in the sudoers file can result in losing the ability to elevate privileges to root. You can fix
this issue only by power cycling the switch and booting into single user mode. Before modifying
sudoers, enable the root user by setting a password for the root user.
By default, users in the sudo group can use sudo to execute privileged commands. To add users to the
sudo group, use the useradd(8) or usermod(8) command. To see which users belong to the sudo
group, see /etc/group (man group(5)).
Any command can be run with sudo, including su. A password is required.
The example below shows how to use sudo as a non-privileged user cumulus to bring up an interface:
sudoers Examples
The following examples show how to grant as few privileges as necessary to a user or group of users to
allow them to perform the required task. Each example uses the system group noc; groups are
prefixed with %.
When executed by an unprivileged user, the example commands below must be prefixed with sudo.
Category | Privilege | Example Command | sudoers Entry
Monitoring | Switch port info | ethtool -m swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/ethtool
Monitoring | System diagnostics | cl-support | %noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/cl-support
Monitoring | | |
Image management | Install images | onie-select http://lab/install.bin | %noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/onie-select
Package management | Install packages | apt-get install vim | %noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get install *
Package management | Upgrading | apt-get upgrade | %noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get upgrade
Netfilter | List iptables rules | iptables -L | %noc ALL=(ALL) NOPASSWD:/sbin/iptables
Interfaces | Up any interface | ifup swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/ifup
Interfaces | Up/down only swp2 | ifup swp2 / ifdown swp2 |
Interfaces | Any IP address chg | ip addr {add|del} 192.0.2.1/30 dev swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/ip addr *
Ethernet bridging | Add bridges and ints | brctl addbr br0 / brctl addif br0 swp1 | %noc ALL=(ALL) NOPASSWD:/sbin/brctl addbr *,/sbin/brctl addif *
Troubleshooting | Restart switchd | systemctl restart switchd.service | %noc ALL=(ALL) NOPASSWD:/usr/sbin/service switchd *
Troubleshooting | Restart any service | systemctl cron switchd.service | %noc ALL=(ALL) NOPASSWD:/usr/sbin/service
Troubleshooting | Packet capture | tcpdump | %noc ALL=(ALL) NOPASSWD:/usr/sbin/tcpdump
L3 | Add static routes | ip route add 10.2.0.0/16 via 10.0.0.1 | %noc ALL=(ALL) NOPASSWD:/bin/ip route add *
L3 | Delete static routes | ip route del 10.2.0.0/16 via 10.0.0.1 | %noc ALL=(ALL) NOPASSWD:/bin/ip route del *
L3 | Any static route chg | ip route * | %noc ALL=(ALL) NOPASSWD:/bin/ip route *
L3 | Any iproute command | ip * |
L3 | Non-modal OSPF | cl-ospf area 0.0.0.1 range 10.0.0.0/24 | %noc ALL=(ALL) NOPASSWD:/usr/bin/cl-ospf
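The entries above all share one shape: a group, the host and run-as specification, the NOPASSWD tag, and a comma-separated command list. As a sketch of that shape, the hypothetical helper below assembles one entry from a group name and one or more commands. It performs no validation of its own.

```shell
#!/bin/sh
# Build one sudoers entry in the shape used by the examples:
#   %<group> ALL=(ALL) NOPASSWD:<command>[,<command>...]
# Hypothetical helper for illustration; it does no validation.
sudoers_rule() {
    group=$1
    shift
    cmds=$1
    shift
    for cmd in "$@"; do
        cmds="$cmds,$cmd"
    done
    printf '%%%s ALL=(ALL) NOPASSWD:%s\n' "$group" "$cmds"
}

sudoers_rule noc "/sbin/brctl addbr *" "/sbin/brctl addif *"
```

A generated entry belongs in a file under /etc/sudoers.d/. Checking that file with visudo -c -f <file> before relying on it helps avoid the lockout described earlier in this chapter.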
Related Information
sudo
Adding Yourself to sudoers
Contents
This chapter covers ...
Configuring LDAP Authentication
Installing libnss-ldapd
Configuring nslcd.conf
Connection
Search Function
Search Filters
Attribute Mapping
Example Configuration
Troubleshooting
Installing libnss-ldapd
The libpam-ldapd package depends on nslcd, so to install libnss-ldapd, libpam-ldapd and
ldap-utils, you must run:
This brings up an interactive prompt asking questions about the LDAP URI, search base distinguished name
(DN) and services that should have LDAP lookups enabled. This creates a very basic LDAP configuration,
using anonymous bind, and initiating the search for a user under the base DN specified.
Alternatively, these parameters can be pre-seeded using the debconf-utils. To use this
method, run apt-get install debconf-utils and create the pre-seeded parameters using
debconf-set-selections with the appropriate answers. Run debconf-show <pkg> to
check the settings. Here is an example of how to preseed answers to the installer questions using
debconf-set-selections.
Once the install is complete, the name service LDAP caching daemon (nslcd) will be running. This is the
service that handles all of the LDAP protocol interactions, and caches the information returned from the
LDAP server. In /etc/nsswitch.conf, ldap has been appended and is the secondary information
source for passwd, group and shadow. The local files (/etc/passwd, /etc/group and /etc/shadow) are
used first, as specified by the compat source.
You are strongly advised to keep compat as the first source in NSS for passwd, group and shadow.
This prevents you from getting locked out of the system.
Configuring nslcd.conf
You need to update the main configuration file (/etc/nslcd.conf) after installation to accommodate the
expected LDAP server settings. The nslcd.conf man page details all the available configuration options.
Some of the more important options are related to security and how the queries are handled.
Connection
The LDAP client starts a session by connecting to the LDAP server, by default, on TCP and UDP port 389, or
on port 636 for LDAPS. Depending on the configuration, this connection may be unauthenticated
(anonymous bind); otherwise, the client must provide a bind user and password. The variables used to
define the connection to the LDAP server are the URI and bind credentials.
The URI is mandatory, and specifies the LDAP server location using the FQDN or IP address. It also
designates whether to use ldap:// for clear text transport, or ldaps:// for SSL/TLS encrypted transport.
Optionally, an alternate port may also be specified in the URI. Typically, in production environments, it is
best to utilize the LDAPS protocol. Otherwise all communications are clear text and not secure.
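To make the effect of the URI scheme explicit, the following sketch reports the transport and default port that each scheme implies, using the port numbers given above. The function name is illustrative and ldap.example.com is a placeholder host; an explicit port in the URI is not parsed.

```shell
#!/bin/sh
# Report the transport and default port implied by an LDAP URI scheme.
# An explicit port in the URI (for example ldap://host:1389) is not parsed.
ldap_transport() {
    case $1 in
        ldaps://*) printf '%s\n' "SSL/TLS encrypted, default port 636" ;;
        ldap://*)  printf '%s\n' "clear text, default port 389" ;;
        *)         printf '%s\n' "unrecognized scheme"; return 1 ;;
    esac
}

# ldap.example.com is a placeholder host name.
ldap_transport ldaps://ldap.example.com
ldap_transport ldap://ldap.example.com
```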
After the connection to the server is complete, the BIND operation authenticates the session. The BIND
credentials are optional, and if not specified, an anonymous bind is assumed. This is typically not allowed in
most production environments. Configure authenticated (Simple) BIND by specifying the user (binddn) and
password (bindpw) in the configuration. Another option is to use SASL (Simple Authentication and Security
Layer) BIND, which provides authentication services using other mechanisms, like Kerberos. Contact your
LDAP server administrator for this information since it depends on the configuration of the LDAP server
and what credentials are created for the client device.
Search Function
When an LDAP client requests information about a resource, it must connect and bind to the server. Then
it performs one or more resource queries depending on what it is looking up. All search queries sent to the
LDAP server are created using the configured search base, filter, and the desired entry (uid=myuser) being
searched for. If the LDAP directory is large, this search may take a significant amount of time. It is a good
idea to define a more specific search base for the common maps (passwd and group).
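As a rough picture of how a lookup is composed, the sketch below combines a map filter with the entry being searched for into a single LDAP filter string. It illustrates the idea only and is not nslcd's actual code; the posixAccount filter is an assumed example value.

```shell
#!/bin/sh
# Combine a map filter with the entry being looked up into one LDAP filter.
# Sketch of how such a query is composed; not nslcd's actual code.
ldap_lookup_filter() {
    map_filter=$1 attr=$2 value=$3
    printf '(&%s(%s=%s))\n' "$map_filter" "$attr" "$value"
}

ldap_lookup_filter '(objectClass=posixAccount)' uid myuser
```

The resulting string can be tried against the directory with ldapsearch, for example ldapsearch -x -H <uri> -b <search base> '<filter>', to see what a narrower or wider search base returns.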
Search Filters
It is also common to use search filters to specify criteria used when searching for objects within the
directory. This is used to limit the search scope when authenticating users. The default filters applied are:
Attribute Mapping
The map configuration allows for overriding the attributes pushed from LDAP. To override an attribute for a
given map*, specify the attribute name and the new value. One example of how this is useful is ensuring
the shell is bash and the home directory is /home/cumulus:
*In LDAP, the map refers to one of the supported maps specified in the manpage for
nslcd.conf (such as passwd or group).
Example Configuration
Here is an example configuration using Cumulus Linux.
Troubleshooting
Once you enable debug mode, run the following command to test LDAP queries:
If LDAP is configured correctly, the following messages appear after you run the getent command:
In the output above, <passwd(all)> indicates that the entire directory structure was queried.
A specific user can be queried using the command:
You can replace myuser with any username on the switch. The following debug output indicates that user
myuser exists:
Notice how the <passwd="myuser"> shows that the specific myuser user was queried.
Common Problems
SSL/TLS
The FQDN of the LDAP server URI does not match the FQDN in the CA-signed server certificate
exactly.
nslcd cannot read the SSL certificate, and reports a "Permission denied" error in the debug
output during server connection negotiation. Check the permissions on each directory in the path
of the root SSL certificate. Ensure that it is readable by the nslcd user.
NSCD
If the nscd cache daemon is also enabled and you make some changes to the user from LDAP,
you may want to clear the cache using the commands:
The nscd package works with nslcd to cache name entries returned from the LDAP server. This
may cause authentication failures. To work around these issues:
1. Disable nscd by running:
LDAP
The search filter returns wrong results. Check for typos in the search filter. Use ldapsearch to test
your filter.
Optionally, configure the basic LDAP connection and search parameters in /etc/ldap/ldap.conf.
When a local username also exists in the LDAP database, the order of the information sources in
/etc/nsswitch.conf can be updated to query LDAP before the local user database. This is generally
not recommended. For example, the configuration below ensures that LDAP is queried before the
local database.
# /etc/nsswitch.conf
passwd: ldap compat
cumulus@switch:~$ id cumulus
uid=1000(cumulus) gid=1000(cumulus) groups=1000(cumulus),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev)
cumulus@switch:~$ id myuser
uid=1230(myuser) gid=3000(Development) groups=3000(Development),500(Employees),27(sudo)
Using getent
The getent command retrieves all records found via NSS for a given map. It can also get a specific entry
under that map. Tests can be done with the passwd, group, shadow or any other map configured in
/etc/nsswitch.conf. The output from this command is formatted according to the map requested. Thus, for
the passwd service, the structure of the output is the same as the entries in /etc/passwd. Likewise,
output for the group map has the same structure as /etc/group. In this example, looking up a specific user
in the passwd map, the user cumulus is locally defined in /etc/passwd, and myuser is only in LDAP.
In the next example, looking up a specific group in the group service, the group cumulus is locally defined in
/etc/group, and netadmin is on LDAP.
Running the command getent passwd or getent group without a specific request returns all local
and LDAP entries for the passwd and group maps, respectively.
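Because getent passwd output follows the /etc/passwd layout, each line can be split on colons. The sketch below labels the fields of one such line; the function name is illustrative, and the sample line uses example values rather than real getent output.

```shell
#!/bin/sh
# Split one passwd-format line (as printed by `getent passwd <user>`)
# into labelled fields. Reads lines from stdin.
show_passwd_entry() {
    while IFS=: read -r name _pw uid gid gecos home shell; do
        printf 'user=%s uid=%s gid=%s home=%s shell=%s\n' \
            "$name" "$uid" "$gid" "$home" "$shell"
    done
}

# Illustrative line; the values are examples, not real getent output.
printf '%s\n' 'myuser:x:1230:3000:My User:/home/myuser:/bin/bash' | show_passwd_entry
```

On a switch with LDAP configured, the same function can be fed directly: getent passwd myuser | show_passwd_entry.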
# extended LDIF
#
# LDAPv3
# base <dc=example,dc=com> with scope subtree
# filter: uid=myuser
# requesting: ALL
#
# myuser, people, example.com
dn: uid=myuser,ou=people,dc=example,dc=com
cn: My User
displayName: My User
gecos: myuser
gidNumber: 3000
givenName: My
homeDirectory: /home/myuser
initials: MU
loginShell: /bin/bash
mail: [email protected]
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: top
shadowExpire: -1
shadowFlag: 0
shadowMax: 999999
shadowMin: 8
shadowWarning: 7
sn: User
uid: myuser
uidNumber: 1234
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
LDAP Browsers
There are some GUI LDAP clients that help to work with LDAP servers. These are free tools to help
graphically show the structure of the LDAP database.
Apache Directory Studio
LDAPManager
Related Information
Debian - configuring LDAP authentication
Debian - configuring PAM to use LDAP
GitHub - Arthur de Jong nslcd.conf file
Debian backports
TACACS Plus
Cumulus Linux implements TACACS+ client AAA (Accounting, Authentication, and Authorization) in a
transparent way with minimal configuration. The client implements the TACACS+ protocol as described in
this IETF document. There is no need to create accounts or directories on the switch. Accounting records
are sent to all configured TACACS+ servers by default. Use of per-command authorization requires
additional setup on the switch.
Contents
This chapter covers ...
Supported Features
Installing the TACACS+ Client Packages
Configuring the TACACS+ Client
TACACS+ Authentication (login)
TACACS+ Accounting
Configuring NCLU for TACACS+ Users
TACACS+ Per-command Authorization
Command Options
NSS Plugin
TACACS Configuration Parameters
Removing the TACACS+ Client Packages
Troubleshooting TACACS+
Debugging Basic Server Connectivity or NSS Issues
Debugging Issues with Per-command Authorization
Debug Issues with Accounting Records
TACACS Component Software Descriptions
Limitations
TACACS+ Client Is only Supported through the Management Interface
Multiple TACACS+ Users
Issues with deluser Command
Supported Features
Authentication via PAM; includes login, ssh, sudo and su
Runs over the eth0 management interface
Ability to run in the management VRF (see page 717)
TACACS+ privilege 15 users can run any command with sudo via the /etc/sudoers.d/tacplus
file that is installed by the libtacplus-map1 package
Up to 7 TACACS+ servers
secret=tacacskey
server=192.168.0.30
Up to 7 TACACS+ servers are supported. Connections are made in the order in which they are listed in this
file. In most cases, no other parameters need to be changed. All parameters used by any of the packages
can be added to this file, and will affect all the TACACS+ client software. It is also possible to configure some
of the packages through individual configuration files. For example, the timeout value (see description
below) is set to 5 seconds by default for NSS lookups in /etc/tacplus_nss.conf, while other packages
use a value of 10 seconds, set in /etc/tacplus_servers.
When TACACS+ servers or secrets are added or removed, auditd must be restarted (with systemctl
restart auditd) or a signal must be sent (with killall -HUP audisp-tacplus) before audisp-
tacplus will reread the configuration to see the changed server list. Usually this is an issue only at first
change to the configuration.
At this point, the Cumulus Linux switch should be able to query the TACACS server.
This is the complete list of the TACACS+ client configuration files, and their use. The full list of TACACS+
parameters is in TACACS Configuration Parameters below.
Filename | Description
/etc/tacplus_servers | This is the primary file that requires configuration post-installation, and is used by all packages via include=/etc/tacplus_servers parameters in the other configuration files, when installed. Since this is usually the file with the shared secrets, it should not be world readable (should be Linux file mode 600).
/etc/nsswitch.conf | When the libnss_tacplus package is installed, this file is configured to enable tacplus lookups via libnss_tacplus. If this file is replaced by automation or other means, you will need to add tacplus as the first lookup method for the passwd database line.
/etc/tacplus_nss.conf | This file sets the basic parameters for libnss_tacplus. It includes a debug variable for debugging NSS lookups separately from other client packages.
/usr/share/pam-configs/tacplus | Configuration file for pam-auth-update to generate the files in the next row. These configurations are used at login, by su, and by ssh.
/etc/pam.d/common-* | The /etc/pam.d/common-* files are updated for tacplus authentication. The files are updated with pam-auth-update, when libpam-tacplus is installed or removed.
/etc/sudoers.d/tacplus | This file allows TACACS+ privilege level 15 users to run commands with sudo. It also includes an example (commented out) showing how to enable privilege level 15 TACACS users to use sudo without having to enter a password, and an example of how to enable all TACACS users to run specific commands via sudo. It should only be edited with the command: visudo -f /etc/sudoers.d/tacplus
/etc/audisp/audisp-tac_plus.conf | TACACS+ server configuration file for accounting. In general, no modifications are required. It may be useful to use this configuration file when you only want to debug TACACS+ accounting issues, not all TACACS+ users.
/etc/audit/rules.d/audisp-tacplus.rules | The auditd rules for TACACS+ accounting. The augenrules command uses all rules files to generate the rules file (below)
TACACS+ users at privilege levels other than 15 are not allowed to run sudo commands by
default, and are limited to commands that can be run with standard Linux user permissions.
TACACS+ Accounting
TACACS+ accounting is implemented with the audisp module, with an additional plugin for auditd/audisp.
The plugin maps the auid in the accounting record to a TACACS login, based on the auid and
sessionid. The audisp module requires libnss_tacplus, and uses the libtacplus_map.so library
interfaces as part of the modified libpam_tacplus package.
Communication with the TACACS+ servers is done via the libsimple-tacacct1 library, through dlopen().
A maximum of 240 bytes of command name and arguments are sent in the accounting record, due to the
TACACS+ field length limitation of 255 bytes.
All Linux commands result in an accounting record, including commands run as part of the login
process or as subprocesses of other commands. This can sometimes generate a large number
of accounting records.
The IP address and encryption key of the server should be configured in the /etc/tacplus_servers file.
Minimal configuration to auditd and audisp is necessary to enable the audit records necessary for
accounting. These records are installed as part of the package.
audisp-tacplus installs the audit rules for command accounting. Modifying the configuration files is not
usually necessary. However, when a management VRF (see page 717) is configured, the accounting
configuration does need special modification, because the auditd service starts prior to networking. It is
necessary to add the vrf parameter, and to signal the audisp-tacplus process to reread the configuration.
The example below shows that the management VRF is named mgmt. The vrf parameter can be placed in
either /etc/tacplus_servers or in /etc/audisp/audisp-tac_plus.conf.
vrf=mgmt
After editing the configuration file, notify the accounting process to reread it by sending the HUP signal:
killall -HUP audisp-tacplus.
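A minimal sketch of the edit described above, assuming the VRF is named mgmt as in the example. The add_vrf helper is hypothetical, and the fragment works on a scratch file rather than the live /etc/audisp/audisp-tac_plus.conf.

```shell
#!/bin/sh
# Sketch only: operates on a scratch file rather than the live configuration.
conf="${TMPDIR:-/tmp}/audisp-tac_plus.conf.example"
printf '%s\n' '# scratch accounting configuration' > "$conf"

# Hypothetical helper: append vrf=NAME only if no vrf= line is present,
# so running it again does not duplicate the parameter.
add_vrf() {
    grep -q '^vrf=' "$1" || printf 'vrf=%s\n' "$2" >> "$1"
}

add_vrf "$conf" mgmt
add_vrf "$conf" mgmt   # second call is a no-op
cat "$conf"

# On a switch, follow the edit with the signal described above:
#   sudo killall -HUP audisp-tacplus
```

Checking for an existing vrf= line first keeps the edit safe to repeat from automation.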
All sudo commands run by TACACS+ users generate accounting records against the original
TACACS+ login name.
For more information, refer to the audisp.8 and auditd.8 man pages.
...
...
TACACS_USER in the above output is actually the username of the account logged in via TACACS.
Do not add the tacacs group to the groups_with_edit line, as this is dangerous and can
potentially enable any user to log into the switch as the root user.
If the user/command combination is not authorized by the TACACS+ server, a message similar to the
following gets displayed:
Command Options
Option Description
-a The utility can be invoked with the -a option as many times as desired. For each command in
the -a list, a symbolic link is created from tacplus-auth to the relative portion of the
command name in the local bin subdirectory. These commands also need to be enabled on
the TACACS+ server. See the server documentation for how to do that. It is common to have
the server allow some options to a command, but not others.
-f Re-initializes the environment. If you need to start over, issue the -f option with the -i to
force the re-initialization; otherwise, repeated use of -i is ignored. As part of the initialization:
The user's shell is changed to /bin/rbash.
Any existing dot files are saved.
A limited environment is set up that does not allow general command execution, but
instead allows only commands from the user's local bin subdirectory.
As a full example, if you want to allow the user to be able to run the net and ip commands (potentially, if
the TACACS+ server authorizes), use the command:
After running this command, examining the tacacs0 directory should show something similar to the
following:
Other than shell built-ins, the only two commands the privilege level 0 TACACS users can run are the ip
and net commands.
If you mistakenly add potential commands with the -a option, you can remove the commands that you
don't want (the example below shows the net command):
Use the man command on the switch for more information on tacplus-auth and tacplus-restrict.
NSS Plugin
When used with pam_tacplus, TACACS+ authenticated users are able to log in without a local account on
the system via the NSS plugin that comes with the tacplus_nss package. The plugin uses the mapped
tacplus information if the user is not found in the local password file, provides the getpwnam() and
getpwuid() entry points, and uses the TACACS+ authentication functions.
The plugin asks the TACACS+ server if the user is known, and then for relevant attributes to determine the
user’s privilege level. When the libnss_tacplus package is installed, nsswitch.conf is modified to
set tacplus as the first lookup method for passwd. If the order is changed, lookups will return the local
accounts such as tacacs0.
If the user is not found, a mapped lookup is performed using the libtacplus_map.so exported
functions. The privilege level is appended to “tacacs”, and the lookup searches for the name in the local
password file. For example, privilege level 15 will search for the tacacs15 user. If the user is found, the
password structure is filled in with the user’s information.
If it is not found, the privilege level is decremented and checked again, until privilege level 0 (user tacacs0)
is reached. This allows use of only the two local users tacacs0 and tacacs15, if minimal configuration is
desired.
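The mapped lookup described above can be sketched in shell as follows. This is an illustration of the fallback order only, not the libnss_tacplus implementation: map_tacacs_user is a hypothetical helper, and the scratch password file with its UIDs, GIDs and comment fields is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of libnss_tacplus): mimic the mapped lookup
# by walking down from the given privilege level to 0.
map_tacacs_user() {
    priv="$1"
    passwd_file="$2"
    while [ "$priv" -ge 0 ]; do
        # The trailing colon keeps tacacs1 from matching tacacs15.
        if grep -q "^tacacs${priv}:" "$passwd_file"; then
            printf 'tacacs%s\n' "$priv"
            return 0
        fi
        priv=$((priv - 1))
    done
    return 1
}

# Scratch password file with only the two minimal mapping accounts
# (UID/GID values are placeholders).
pw="${TMPDIR:-/tmp}/passwd.example"
cat > "$pw" <<'EOF'
tacacs0:x:1001:1001:mapped user, privilege level 0:/home/tacacs0:/bin/bash
tacacs15:x:1016:1001:mapped user, privilege level 15:/home/tacacs15:/bin/bash
EOF

map_tacacs_user 15 "$pw"   # prints tacacs15
map_tacacs_user 7 "$pw"    # prints tacacs0
```

With only tacacs0 and tacacs15 defined, every privilege level below 15 falls through to tacacs0, which is the minimal configuration the text mentions.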
Configuration Option  Description
secret=STRING         Secret key used to encrypt/decrypt packets sent to/received from the server. Can be
                      specified more than once, and can be in any order with respect to the server=
                      parameter. When fewer secret= parameters are specified, the last secret given is
                      used for the remaining servers. This parameter should only be put into files such as
                      /etc/tacplus_servers that are not world readable.
server=HOSTNAME       Adds a TACACS+ server to the servers list. Servers will be queried in turn until a
server=IP_ADDR        match is found, or no servers remain in the list. Can be specified up to 7 times.
                      When the IP_ADDR form is used, it can be optionally followed by a port number,
                      preceded by a ":". The default port is 49.
                      When sending accounting records, the record is sent to all servers in the list if
                      acct_all=1, which is the default.
login=STRING          TACACS+ authentication service (pap, chap, or login). The default value is pap.
user_homedir=1        This is not enabled by default. When enabled, a separate home directory for each
                      TACACS+ user is created when the TACACS+ user first logs in. By default, the home
                      directory in the mapping accounts in /etc/passwd (/home/tacacs0 ... /home/tacacs15)
                      is used. If the home directory does not exist, it is created with the
                      mkhomedir_helper program, in the same manner as pam_mkhomedir.
                      This option is not honored for accounts with restricted shells when per-command
                      authorization is enabled.
timeout=SECS          Sets the timeout in seconds for connections to each TACACS+ server. The default is
                      10 seconds for all lookups except that NSS lookups use a 5 second timeout.
vrf=VRFNAME           If the management network is in a VRF, set this variable to the VRF name. This
                      would usually be "mgmt". When this variable is set, the connection to the TACACS+
                      accounting servers is made through the named VRF.
service               TACACS+ accounting and authorization service. Examples include shell, pap,
                      raccess, ppp, and slip.
To remove the TACACS+ client configuration files as well as the packages (recommended), use this
command:
Troubleshooting TACACS+
If TACACS does not appear to be working correctly, the following configuration files should be debugged by
adding the debug=1 parameter to one or more of these files:
/etc/tacplus_servers
/etc/tacplus_nss.conf
When this debugging is enabled, additional information is shown for the command authorization
conversation with the TACACS+ server:
To disable debugging:
If accounting records still are not being sent, add debug=1 to the file /etc/audisp/audisp-tac_plus.conf,
and then issue the command above to notify the plugin. Then have the TACACS+ user run a
command, and examine the end of /var/log/syslog for messages from the plugin. You can also check
the auditing log file /var/log/audit/audit.log to be sure the auditing records are being written. If
they are not, restart the audit daemon with:
Package Name                    Description
audisp-tacplus_1.0.0-1-cl3u3    This package uses auditing data from auditd to send accounting records to the
                                TACACS+ server and is started as part of auditd.
libnss-tacplus_1.0.1-cl3u3      Provides an interface between libc username lookups, the mapping functions,
                                and the TACACS+ server.
tacplus-auth-1.0.0-cl3u1        This package provides the ability to do per-command TACACS+ authorization, and
                                a setup utility tacplus-restrict to enable that. Per-command authorization is
                                not done by default.
libtacplus-map1_1.0.0-cl3u2     The mapping functionality between local and TACACS+ users on the server. Sets
                                the immutable sessionid and auditing UID to ensure the original user can be
                                tracked through multiple processes and privilege changes. Sets the auditing
                                loginuid as immutable if supported. Creates and maintains a status database in
                                /run/tacacs_client_map to manage and lookup mappings.
libsimple-tacacct1_1.0.0-cl3u2  Provides an interface for programs to send accounting records to the TACACS+
                                server. Used by audisp-tacplus.
libtac2-bin_1.4.0-cl3u2         Provides the “tacc” testing program and TACACS+ man page.
Limitations
The current algorithm returns the first name matching the UID from the mapping file; this could
be the first or second user that logged in.
To work around this issue, the switch’s audit log or the TACACS server accounting logs can be used to
determine which processes and files were created by each user.
For commands that do not execute other commands (for example, changes to configurations in an
editor, or actions with tools like clagctl and vtysh), no additional accounting is done.
Per-command authorization is not implemented in this release except at the most basic level
(commands are permitted or denied based on the standard Linux user permissions for the local
TACACS users, and only privilege level 15 users can run sudo commands by default).
The Linux auditd system does not always generate audit events for processes when terminated with a
signal (via the kill system call or internal errors such as SIGSEGV). As a result, processes that exit on a
signal that isn’t caught and handled may not generate a STOP accounting record.
However, the command does remove the home directory. The user can still log in on that account, but will
not have a valid home directory. This is a known upstream issue with the deluser command for all
non-local users.
The --remove-home option should only be used when the user_homedir=1 configuration command is
in use.
RADIUS AAA
Cumulus Networks offers add-on packages that provide the ability for RADIUS users to log in to Cumulus
Linux switches in a transparent way with minimal configuration. There is no need to create accounts or
directories on the switch. Authentication is handled via PAM, and includes login, ssh, sudo and su.
During installation, the PAM configuration is modified automatically via pam-auth-update (8), and the
NSS configuration file /etc/nsswitch.conf is modified to add the mapuser and mapuid plugins. If you
remove or purge the packages, these files are modified to remove the configuration for these plugins.
The server port number or name is optional. The system looks up the port in the /etc/services file.
However, you can override the ports in the /etc/pam_radius_auth.conf file.
The timeout defaults to 3 seconds; you only need to change this setting if the server is slow or latencies
are high.
The src_ip option only needs to be specified if you want to use a specific interface to reach the server. It
can be set to the hostname of the interface, or an IPv4 or IPv6 address. If specified, the timeout option
must also be specified.
The other field that you may need to set is the vrf-name field. This is normally set to mgmt if you are using
a management VRF (see page 717). If you are specifying the src_ip field for a server entry, and that
interface is in a VRF, you may also need to set it. At this time, you cannot specify more than one VRF. The
example below shows the vrf setting in the /etc/pam_radius_auth.conf file.
When first configuring the RADIUS client, it is a good idea to enable debugging. This can be done by
uncommenting the debug line in the configuration file. Debugging messages are written to syslog.
There are a number of PAM configuration keywords that you can set, but these do not normally need to be
configured. See the pam_radius_auth (8) man page for the variables that can be used. You can add
these either by editing:
The /etc/pam.d/common-auth lines for pam_radius_auth.so.
The /usr/share/pam-configs/radius file, then running pam-auth-update --package.
The latter method is recommended, because it is less likely to introduce errors.
The libpam-radius-auth package supplied with the Cumulus Linux RADIUS client is a newer version
than the one in Debian Jessie, and has added support for IPv6, the src_ip field described above, as well as
a number of bug fixes and minor features. Cumulus Linux further added VRF support, man pages
describing the PAM and RADIUS configuration, and the setting of the SUDO_PROMPT environment variable
to the login name as part of the mapping support described below.
...
...
Do not add the radius_users group to the groups_with_edit line, as this is dangerous and can
potentially enable any user to log into the switch as the root user.
If the user/command combination is not authorized by the RADIUS server, a message similar to the
following gets displayed:
radius_user:x:1017:1002:radius user:/home/radius_user:/bin/bash
then the matching line returned by running getent passwd dave would be:
and the home directory /home/dave would be created during the login process if it does not already exist,
and will be populated with the standard skeleton files by the mkhomedir_helper command.
The configuration file /etc/nss_mapuser.conf is used to configure the plugins. It does not normally
need to be changed. The nss_mapuser (5) man page fully describes the configuration file. Comments in
the file also describe the fields.
A flat file mapping is done based on the session number assigned during login, and it persists across su
and sudo. The mapping is removed at logout.
where dave is the login account name of a RADIUS user. If you want all RADIUS users to have this ability, you
can enable sudo access for all members of the radius_users group:
When the packages are removed, the plugins are removed from the /etc/nsswitch.conf file and from
the PAM files, respectively.
To remove all configuration files for these packages, run:
The RADIUS fixed account is not removed from /etc/passwd or /etc/group, and the home
directories are not removed either. They are left in case there are modifications to the account or
files in the home directories.
To remove the home directories of the RADIUS users, first get the list by running:
For all users listed other than the radius_user, run this command to remove the home directories:
where USERNAME is the account name (the home directory relative portion). This command gives the
following warning:
Because the user is not listed in the /etc/passwd file. After removing all the RADIUS users, run the
command to remove the fixed account (the account may have been changed in /etc/nss_mapuser.conf;
if so, use that account name instead of radius_user).
Limitations
Related Information
TACACS+ client (see page 117)
Cumulus Networks RADIUS demo on GitHub
Cumulus Networks TACACS demo on GitHub
Netfilter - ACLs
Netfilter is the packet filtering framework in Cumulus Linux as well as most other Linux distributions. There
are a number of tools available for configuring ACLs in Cumulus Linux, including:
iptables, ip6tables and ebtables are Linux userspace tools to administer filtering rules for
IPv4 packets, IPv6 packets and Ethernet frames (layer 2 using MAC addresses) respectively.
NCLU (see page 82), a Cumulus Linux-specific userspace tool for configuring custom ACLs.
cl-acltool, another Cumulus Linux-specific userspace tool to administer filtering rules and for
configuring the default ACLs.
NCLU and cl-acltool operate on various configuration files, and use iptables, ip6tables and
ebtables to install rules into the kernel. In addition to programming rules in the kernel, NCLU and
cl-acltool program rules in hardware for interfaces involving switch port interfaces, which iptables,
ip6tables and ebtables cannot do on their own.
In many instances, you can use NCLU to configure ACLs; however, cl-acltool must be used in
some cases. The examples below show when to use which tool.
If you need help getting started setting up ACLs, run net example acl to see a basic
configuration:
Contents
This chapter covers ...
Understanding Traffic Rules In Cumulus Linux (see page 137)
Understanding Chains (see page 137)
Understanding Chains
Netfilter describes the mechanism by which packets are classified and controlled in the Linux kernel.
Cumulus Linux uses the Netfilter framework to control the flow of traffic to, from and across the switch.
Netfilter does not require a separate software daemon to run because it is part of the Linux kernel itself.
Netfilter asserts policies at layers 2, 3 and 4 of the OSI model by inspecting packet and frame headers
based on a list of rules. Rules are defined using syntax provided by the iptables, ip6tables and
ebtables userspace applications.
The rules created by these programs inspect or operate on packets at several points in the life of the
packet through the system. These five points are known as chains and are shown here:
INPUT: Touches packets once they are determined to be destined for the local system but before
they are received by the control plane software
Understanding Tables
When building rules to affect the flow of traffic, the individual chains can be accessed by tables. Linux
provides three tables by default:
Filter: Classifies traffic or filters traffic
NAT: Applies Network Address Translation rules
Understanding Rules
Rules are the items that actually classify traffic to be acted upon. Rules are applied to chains, which are
attached to tables, similar to the graphic below.
Rules have several different components; the examples below highlight those different components.
Table: The first argument is the table. Notice that the second example does not specify a table; the
filter table is implied if a table is not specified.
Chain: The second argument is the chain. Each table supports several different chains. See
Understanding Tables above.
Matches: The third argument(s) are called the matches. You can specify multiple matches in a single
rule. However, the more matches you use in a rule, the more memory that rule consumes.
Jump: The jump specifies the target of the rule; that is, what action to take if the packet matches the
rule. If this option is omitted in a rule, then matching the rule will have no effect on the packet's fate,
but the counters on the rule will be incremented.
Target(s): The target can be a user-defined chain (other than the one this rule is in), one of the
special built-in targets that decides the fate of the packet immediately (like DROP), or an extended
target. See the Supported Rule Types and Common Usages (see page 154) section below for
examples of different targets.
ebtables
When rules are combined and put into one table, the order determines the relative priority of the rules;
iptables and ip6tables have the highest precedence and ebtables has the lowest.
The Linux packet forwarding construct is an overlay for how the silicon underneath processes packets; to
that end, here are some things to be aware of:
The order of operations for how rules are processed is not perfectly maintained when you compare
how iptables and the switch silicon process packets. The switch silicon reorders rules when
switchd writes to the ASIC, whereas traditional iptables executes the list of rules in order.
All rules are terminating. This means once a rule is matched, the action is carried out, and no more
rules are processed. The exception to this is when a SETCLASS rule is placed immediately before
another rule; this exists multiple times in the default ACL configuration.
In the example below, the SETCLASS action, applied with the --in-interface option, creates the
internal ASIC classification, and processing continues with the next rule, which does the rate-limiting
for the matched protocol:
If multiple contiguous rules with the same match criteria are applied to --in-interface,
only the first rule gets processed and then terminates processing. This would also be a
misconfiguration, because there is no reason to have duplicate rules with different
actions.
When processing traffic, rules affecting the FORWARD chain that specify an ingress interface are
performed prior to rules that match on an egress interface. As a workaround, rules that only affect
the egress interface can have an ingress interface wildcard (currently, only swp+ and bond+ are
supported as wildcard names; see below) that matches any interface applied so that you can
maintain order of operations with other input interface rules. Take the following rules, for example:
If you modify the rules like this, they are performed in order:
When using rules that do a mangle and a filter lookup for a packet, Cumulus Linux does them in
parallel and combines the action.
If a switch port is assigned to a bond, any egress rules must be assigned to the bond.
When using the OUTPUT chain, rules must be assigned to the source. For example, if a rule is
assigned to the switch port in the direction of traffic but the source is a bridge (VLAN), the traffic
won't be affected by the rule; the rule must be applied to the bridge instead.
If all transit traffic needs to have a rule applied, use the FORWARD chain, not the OUTPUT chain.
ebtables rules are put into either the IPv4 or IPv6 memory space depending on whether the rule
utilizes IPv4 or IPv6 to make a decision. Layer 2-only rules, which match the MAC address, are put
into the IPv4 memory space.
On Broadcom switches, the ingress INPUT chain rules match layer 2/layer 3 multicast packets
before multicast packet replication has occurred; hence, having a DROP rule affects all copies.
If you set an output flag with the INPUT chain, you will get an error. For example, running
cl-acltool -i on the following rule:
However, simply removing the -o option and interface would make it a valid rule.
To increase the number of ACL rules that may be configured, configure the switch to operate in nonatomic
mode.
Instead of reserving 50% of your TCAM space for atomic updates, incremental update instead uses the
available free space to write the new TCAM rules and swap over to the new rules once this is completed.
Cumulus Linux then deletes the old rules and frees up the original TCAM space. If there is insufficient free
space to complete this task, the original nonatomic update is performed, which interrupts traffic.
1. Updates are performed incrementally, one table at a time without stopping traffic.
2. Cumulus Linux checks if a table's rules have changed since the last time they were installed; if a table
does not have any changes, then it is not reinstalled.
3. If there are changes in a table, then the new rules are populated in new groups or slices in hardware,
then that table is switched over to the new groups or slices.
4. Finally, old resources for that table are freed. This process is repeated for each of the tables listed
above.
5. If there are not sufficient resources to hold both the new rule set and old rule set, then the regular
nonatomic mode is attempted. This will interrupt network traffic.
6. If the regular nonatomic update fails, Cumulus Linux reverts back to the previous rules.
1. Edit /etc/cumulus/switchd.conf.
2. Add the following line to the file:
acl.non_atomic_update_mode = TRUE
3. Restart switchd:
During regular non-incremental nonatomic updates, traffic is stopped first, and enabled after the
new configuration is written into the hardware completely.
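A minimal sketch of steps 1 and 2 above, assuming GNU sed is available. The set_nonatomic helper is hypothetical, and the fragment edits a scratch file instead of the live /etc/cumulus/switchd.conf; switchd still has to be restarted afterwards, as step 3 says.

```shell
#!/bin/sh
# Sketch only: edits a scratch file, not the live /etc/cumulus/switchd.conf.
conf="${TMPDIR:-/tmp}/switchd.conf.example"
: > "$conf"

# Hypothetical helper: set acl.non_atomic_update_mode = TRUE, replacing an
# existing active setting or appending the line if none is present.
# (A commented-out setting is not matched and is left untouched.)
set_nonatomic() {
    if grep -q '^acl\.non_atomic_update_mode' "$1"; then
        sed -i 's/^acl\.non_atomic_update_mode.*/acl.non_atomic_update_mode = TRUE/' "$1"
    else
        printf '%s\n' 'acl.non_atomic_update_mode = TRUE' >> "$1"
    fi
}

set_nonatomic "$conf"
set_nonatomic "$conf"   # idempotent: the file still holds a single line
cat "$conf"
```

Replacing rather than appending avoids leaving two conflicting settings in the file when the helper is run more than once.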
Appears to work, and the rule appears when you run cl-acltool -L:
TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
However, the rule is not synced to hardware when applied in this fashion, and running cl-acltool -i or
rebooting removes the rule without replacing it. To ensure all rules that can be in hardware are hardware
accelerated, place them in /etc/cumulus/acl/policy.conf and install them by running cl-acltool -i.
An entry with multiple comma-separated output interfaces is split into one rule for each output
interface (listed after --out-interface below). This entry splits into two rules:
An entry with both input and output comma-separated interfaces is split into one rule for each
combination of input and output interface (listed after --in-interface and --out-interface
below). This entry splits into four rules:
An entry with multiple L4 port ranges is split into one rule for each range (listed after --dports
below). For example, this entry splits into two rules:
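As a rough illustration of the splitting behavior described above (not how cl-acltool itself is implemented), the shell sketch below expands comma-separated input and output interface lists into one rule per combination. The expand_rule helper, the interface names, and the ACCEPT action are assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical helper: expand comma-separated interface lists into one
# rule per input/output combination.
expand_rule() {
    ins="$1"
    outs="$2"
    for i in $(printf '%s' "$ins" | tr ',' ' '); do
        for o in $(printf '%s' "$outs" | tr ',' ' '); do
            printf '%s\n' "-A FORWARD --in-interface $i --out-interface $o -j ACCEPT"
        done
    done
}

# Two input and two output interfaces yield four rules.
expand_rule swp1,swp2 swp3,swp4
```

Because each combination becomes its own rule, long interface lists multiply the number of rules that count toward the hardware limits.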
[ebtables]
-A FORWARD -i br0.100 -p IPv4 --ip-protocol icmp -j DROP
-A FORWARD -o br0.100 -p IPv4 --ip-protocol icmp -j ACCEPT
[iptables]
-A FORWARD -i br0.100 -p icmp -j DROP
-A FORWARD --out-interface br0.100 -p icmp -j ACCEPT
-A FORWARD --in-interface br0.100 -j POLICE --set-mode pkt --set-rate 1 --set-burst 1 --set-class 0
[ebtables]
-A FORWARD -i br0 -p IPv4 --ip-protocol icmp -j DROP
-A FORWARD -o br0 -p IPv4 --ip-protocol icmp -j ACCEPT
[iptables]
-A FORWARD -i br0 -p icmp -j DROP
-A FORWARD --out-interface br0 -p icmp -j ACCEPT
-A FORWARD --in-interface br0 -j POLICE --set-mode pkt --set-rate 1 --set-burst 1 --set-class 0
You would create this rule, called EXAMPLE1, using NCLU like this:
All options, such as the -j and -p, even FORWARD in the above rule, are added automatically when you
apply the rule to the control plane; NCLU figures it all out for you.
You can also set a priority value, which specifies the order in which the rules get executed, and the order in
which they appear in the rules file. Lower numbers are executed first. To add a new rule in the middle, first
run net show config acl, which displays the priority numbers. Otherwise, new rules get appended to
the end of the list of rules in the nclu_acl.conf and 50_nclu_acl.rules files.
If you need to hand edit a rule, don’t edit the 50_nclu_acl.rules file. Instead, edit the
nclu_acl.conf file.
After you add the rule, you need to apply it to an inbound or outbound interface using net add int acl;
swp1 is the inbound interface in our example:
After you commit your changes, you can verify the rule you created with NCLU by running net show
configuration acl:
interface swp1
acl ipv4 EXAMPLE1 inbound
Or you can see all of the rules installed by running cat on the 50_nclu_acl.rules file:
For INPUT and FORWARD rules, apply the rule to a control plane interface using net add control-
plane:
This deletes all rules from the 50_nclu_acl.rules file with that name. It also deletes the interfaces
referenced in the nclu_acl.conf file.
To examine the current state of chains and list all installed rules, run:
TABLE filter :
Chain INPUT (policy ACCEPT 90 packets, 14456 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- swp+ any 240.0.0.0/5 anywhere
0 0 DROP all -- swp+ any loopback/8 anywhere
0 0 DROP all -- swp+ any base-address.mcast.net/8 anywhere
0 0 DROP all -- swp+ any 255.255.255.255 anywhere ...
To list installed rules using native iptables, ip6tables and ebtables, use the -L option with the
respective commands:
If the install fails, ACL rules in the kernel and hardware are rolled back to the previous state. Errors from
programming rules in the kernel or ASIC are reported appropriately.
By default:
ACL policy files are located in /etc/cumulus/acl/policy.d/.
All *.rules files in this directory are included in /etc/cumulus/acl/policy.conf.
All files included in this policy.conf file are installed when the switch boots up.
The policy.conf file expects rules files to have a .rules suffix as part of the file name.
[iptables]
-A INPUT --in-interface swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD --in-interface swp1 -p tcp --dport 80 -j ACCEPT
[ip6tables]
-A INPUT --in-interface swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD --in-interface swp1 -p tcp --dport 80 -j ACCEPT
[ebtables]
-A INPUT -p IPv4 -j ACCEPT
-A FORWARD -p IPv4 -j ACCEPT
You can use wildcards or variables to specify chain and interface lists to ease administration of rules.
Interface Wildcards
Currently only swp+ and bond+ are supported as wildcard names. There may be kernel
restrictions in supporting more complex wildcards like swp1+.
INGRESS = swp+
INPUT_PORT_CHAIN = INPUT,FORWARD
[iptables]
-A $INPUT_PORT_CHAIN --in-interface $INGRESS -p tcp --dport 80 -j ACCEPT
[ip6tables]
-A $INPUT_PORT_CHAIN --in-interface $INGRESS -p tcp --dport 80 -j ACCEPT
[ebtables]
-A INPUT -p IPv4 -j ACCEPT
ACL rules for the system can be written into multiple files under the default /etc/cumulus/acl/policy.d/
directory. The ordering of rules during installation follows the sort order of the files based on their file
names.
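To preview the order in which rules files would be picked up, a sketch like the one below can help. It relies on the shell expanding a glob in sorted order as an approximation of the file-name ordering described above; the exact collation used at install time is not stated in this guide. The scratch directory and the notes.txt file are assumptions for the example.

```shell
#!/bin/sh
# Sketch only: uses a scratch directory in place of /etc/cumulus/acl/policy.d/.
dir="$(mktemp -d)"
touch "$dir/01sample_datapath.rules" "$dir/00sample_mgmt.rules" "$dir/notes.txt"

# The shell expands the glob in sorted order, so this loop visits the
# .rules files in file-name order and skips files without the suffix.
for f in "$dir"/*.rules; do
    basename "$f"
done
```

Numeric prefixes such as 00 and 01 make the intended order explicit, which is why the sample files use them.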
Use multiple files to stack rules. The example below shows two rules files separating rules for management
and datapath traffic:
cumulus@switch:~$ ls /etc/cumulus/acl/policy.d/
00sample_mgmt.rules 01sample_datapath.rules
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/00sample_mgmt.rules
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT
[iptables]
# protect the switch management
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 10.0.11.2 -d 10.0.12.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -d 10.0.16.8 -p udp -j DROP
[iptables]
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.5 -p icmp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.6 -d 192.0.2.4 -j DROP
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.2 -d 192.0.2.8 -j DROP
#
# This file is a master file for acl policy file inclusion
#
# Note: This is not a file where you list acl rules.
#
# This file can contain:
# - include lines with acl policy files
# example:
# include <filepath>
#
include /etc/cumulus/acl/policy.d/01_new.datapathacl
On Broadcom platforms, IPv6 egress rules are not supported. In certain cases the ebtables
egress rules can match switched IPv6 packets based on the etype field of the packet.
In the tables below, the default rules count toward the limits listed. The raw limits below assume only one
ingress and one egress table are present.
Ingress limit with default rules   256 (36 default)    256 (29 default)    768 (36 default)    768 (29 default)
Ingress limit with default rules   2048 (36 default)   3072 (29 default)   6144 (36 default)   6144 (29 default)
Ingress limit with default rules   512 (36 default)    768 (29 default)    1536 (36 default)   1536 (29 default)
Ingress limit with default rules   768 (36 default)    384 (29 default)    1792 (36 default)   896 (29 default)
Profile   Atomic Mode IPv4 Rules   Atomic Mode IPv6 Rules   Nonatomic Mode IPv4 Rules   Nonatomic Mode IPv6 Rules
To learn more about any of the options shown in the tables below, run iptables -h [name
of option]. The same help syntax works for options for ip6tables and ebtables.
802.1p (CoS)
Extended Ulog
Targets
log
Unique to Cumulus Linux:
span
erspan
police
tricolorpolice
setclass
Common Examples
Counters on POLICE ACL rules in iptables do not currently show the packets that are dropped
due to those rules.
Use the POLICE target with iptables. POLICE takes these arguments:
--set-class value: Sets the system internal class of service queue configuration to value.
--set-rate value: Specifies the maximum rate in kilobytes (KB) or packets.
--set-burst value: Specifies the number of packets or kilobytes (KB) allowed to arrive
sequentially.
--set-mode string: Sets the mode in KB (kilobytes) or pkt (packets) for rate and burst size.
For example, to rate limit the incoming traffic on swp1 to 400 packets/second with a burst of 100
packets/second and set the class of the queue for the policed traffic to 0, set this rule in your appropriate
.rules file:
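The rule can be assembled from the POLICE arguments listed above. The following Python sketch does so with a hypothetical helper, police_rule. The INPUT chain is an assumption, since the text does not name a chain, and the accepted mode strings are taken from the --set-mode description.

```python
def police_rule(chain, interface, mode, rate, burst, set_class):
    """Format an iptables rule line that uses the POLICE target.

    mode is "pkt" (packets) or "KB" (kilobytes), per the --set-mode description.
    """
    if mode not in ("pkt", "KB"):
        raise ValueError("mode must be 'pkt' or 'KB'")
    return (
        f"-A {chain} --in-interface {interface} -j POLICE "
        f"--set-mode {mode} --set-rate {rate} --set-burst {burst} "
        f"--set-class {set_class}"
    )


# 400 packets/second, burst of 100 packets, class 0 on swp1.
# The INPUT chain is assumed here, not stated in the text.
print("[iptables]")
print(police_rule("INPUT", "swp1", "pkt", 400, 100, 0))
```

The printed line uses the same argument syntax as the POLICE rules in the control plane example that follows.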
Here is another example of control plane ACL rules to lock down the switch. You specify them in
/etc/cumulus/acl/policy.d/00control_plane.rules:
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT
INNFWD_CHAIN = INPUT,FORWARD
MARTIAN_SOURCES_4 = "240.0.0.0/5,127.0.0.0/8,224.0.0.0/8,255.255.255.255/32"
MARTIAN_SOURCES_6 = "ff00::/8,::/128,::ffff:0.0.0.0/96,::1/128"
# Custom Policy Section
SSH_SOURCES_4 = "192.168.0.0/24"
NTP_SERVERS_4 = "192.168.0.1/32,192.168.0.4/32"
DNS_SERVERS_4 = "192.168.0.1/32,192.168.0.4/32"
SNMP_SERVERS_4 = "192.168.0.1/32"
[iptables]
-A $INNFWD_CHAIN --in-interface $INGRESS_INTF -s $MARTIAN_SOURCES_4 -j DROP
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p ospf -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bgp -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --sport bgp -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000 --set-class 7
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p icmp -j POLICE --set-mode pkt --set-rate 100 --set-burst 40 --set-class 2
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --dport bootps:bootpc -j POLICE --set-mode pkt --set-rate 100 --set-burst 100 --set-class 2
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bootps:bootpc -j POLICE --set-mode pkt --set-rate 100 --set-burst 100 --set-class 2
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p igmp -j POLICE --set-mode pkt --set-rate 300 --set-burst 100 --set-class 6
# Custom policy
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport 22 -s $SSH_SOURCES_4 -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --sport 123 -s $NTP_SERVERS_4 -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --sport 53 -s $DNS_SERVERS_4 -j ACCEPT
[iptables]
#Match and count the packets that match SSH traffic with DSCP EF
-A FORWARD -p tcp --dport 22 -m dscp --dscp 46 -j ACCEPT
#Match and count the packets in a port range with DSCP AF41
-A FORWARD -p tcp -s 10.0.0.17/32 --sport 10000:20000 -d 10.0.100.27/32 --dport 10000:20000 -m dscp --dscp 34 -j ACCEPT
# Send 100 packets with a small payload on host1 with a DSCP value of AF13 with a destination of host2:
INGRESS_INTF = swp20,swp21
[iptables]
-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --syn -j DROP
[ip6tables]
-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --syn -j DROP
The --syn flag in the above rule matches packets with the SYN bit set and the ACK, RST and FIN bits
cleared. It is equivalent to using --tcp-flags SYN,RST,ACK,FIN SYN. For example, the above rule could
be rewritten as:
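The matching logic behind this equivalence can be sketched in Python. The --tcp-flags option takes a mask (the flags to examine) and a comparison list (the flags that must be set). A packet matches when, among the examined flags, exactly the listed ones are set. The function names below are hypothetical and serve only to illustrate that behavior.

```python
def tcp_flags_match(packet_flags, mask, comp):
    """Mimic --tcp-flags <mask> <comp>.

    Only flags named in mask are examined; of those, exactly the flags
    named in comp must be set on the packet.
    """
    return (set(packet_flags) & set(mask)) == set(comp)


def is_syn(packet_flags):
    """Mimic --syn, that is, --tcp-flags SYN,RST,ACK,FIN SYN."""
    return tcp_flags_match(packet_flags, {"SYN", "RST", "ACK", "FIN"}, {"SYN"})


# Rule form derived from the text above (same chains and interface variable):
# -A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --tcp-flags SYN,RST,ACK,FIN SYN -j DROP
print(is_syn({"SYN"}))         # initial connection request
print(is_syn({"SYN", "ACK"}))  # reply to a connection request
```

Flags outside the mask, such as PSH or URG, are not examined, so they do not affect the result.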
Example Scenario
The following example scenario applies several different rules to show what is possible.
Following are the configurations for the two switches used in these examples. The configuration for each
switch appears in /etc/network/interfaces on that switch.
Switch 1 Configuration
...
/etc/network/interfaces
=======================
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto bond2
iface bond2
bond-slaves swp3 swp4
auto br-untagged
iface br-untagged
address 10.0.0.1/24
bridge_ports swp1 bond2
bridge_stp on
auto br-tag100
iface br-tag100
address 10.0.100.1/24
bridge_ports swp2.100 bond2.100
bridge_stp on
...
Switch 2 Configuration
...
/etc/network/interfaces
=======================
auto swp3
iface swp3
auto swp4
iface swp4
auto br-untagged
iface br-untagged
address 10.0.0.2/24
bridge_ports bond2
bridge_stp on
auto br-tag100
iface br-tag100
address 10.0.100.2/24
bridge_ports bond2.100
bridge_stp on
auto bond2
iface bond2
bond-slaves swp3 swp4
...
Egress Rule
The following rule blocks any TCP traffic with destination port 200 going from host1 or host2 through the
switch (corresponding to rule 1 in the diagram above).
Ingress Rule
The following rule blocks any UDP traffic with source port 200 going from host1 through the switch
(corresponding to rule 2 in the diagram above).
Input Rule
The following rule blocks any UDP traffic with source port 200 and destination port 50 going from host1 to
the switch (corresponding to rule 3 in the diagram above).
Output Rule
The following rule blocks any TCP traffic with source port 123 and destination port 123 going from Switch 1
to host2 (corresponding to rule 4 in the diagram above).
Combined Rules
The following rule blocks any TCP traffic with source port 123 and destination port 123 going from any
switch port egress or generated from Switch 1 to host1 or host2 (corresponding to rules 1 and 4 in the
diagram above).
[iptables]
-A FORWARD -o swp+ -p tcp --sport 123 --dport 123 -j DROP
-A OUTPUT -o swp+ -p tcp --sport 123 --dport 123 -j DROP
Useful Links
www.netfilter.org
Netfilter.org packet filtering how-to
acl.non_atomic_update_mode = TRUE
However, running cl-acltool -i or reboot removes them. To ensure all rules that can be in
hardware are hardware accelerated, place them in /etc/cumulus/acl/policy.conf and run
cl-acltool -i.
IP Tables
Destination IP:
Any
Set class is internal to the switch - it does not set any precedence bits.
IPv6 Tables
Set class is internal to the switch - it does not set any precedence bits.
EB Tables
Set class is internal to the switch. It does not set any precedence bits.
This feature is specific to switches on the Broadcom platform only; on Mellanox Spectrum
switches, the input port ACL does not have these issues when learning MAC addresses.
Create a configuration similar to the following, where you associate a port and VLAN with a given MAC
address, adding each one to the bridge:
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3
bridge-pvid 1
bridge-vids 100 200 300
bridge-vlan-aware yes
pre-up bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
pre-up bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
pre-up bridge fdb add 00:00:00:00:00:33 dev swp3 master static vlan 300
Alternatively, if you need to list many MAC addresses, you can run a script to create the same
configuration. For example, create a script called macs.txt and put in the bridge fdb add commands
for each MAC address you need to configure:
#!/bin/bash
bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
bridge fdb add 00:00:00:00:00:33 dev swp3 master static vlan 300
bridge fdb add 00:00:00:00:00:44 dev swp4 master static vlan 400
bridge fdb add 00:00:00:00:00:55 dev swp5 master static vlan 500
bridge fdb add 00:00:00:00:00:66 dev swp6 master static vlan 600
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto swp5
iface swp5
auto swp6
iface swp6
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4 swp5 swp6
bridge-pvid 1
bridge-vids 100 200 300
bridge-vlan-aware yes
pre-up bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
pre-up bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
Contents
This chapter covers ...
Using systemd and the systemctl Command
Understanding the systemctl Subcommands
Ensuring a Service Starts after Multiple Restarts
Keeping systemd Services from Hanging after Starting
Identifying Active Listener Ports for IPv4 and IPv6
Identifying Daemons Currently Active or Stopped
Identifying Essential Services
Unlike the service command in Debian Wheezy, the service name is written after the systemctl
subcommand, not before it.
To clear this error, run systemctl reset-failed switchd.service. If you know you are going to
restart frequently (multiple times within the StartLimitInterval), you can run the same command before you
issue the restart request. This also applies to stop followed by start.
cumulus@switch:~$ cl-service-summary
Service cron enabled active
Service ssh enabled active
Service syslog enabled active
Service neighmgrd enabled active
Service clagd enabled active
Service lldpd enabled active
Service mstpd enabled active
Service poed inactive
Service portwd inactive
Service ptmd enabled active
Service pwmd enabled active
Service smond enabled active
Service switchd enabled active
Service vxrd disabled inactive
Service vxsnd disabled inactive
Service bgpd disabled inactive
Service isisd disabled inactive
Service ospf6d disabled inactive
Service ospfd disabled inactive
Service rdnbrd disabled inactive
Service ripd disabled inactive
Service ripngd disabled inactive
Service zebra disabled inactive
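If you want to process this summary in a script, the output shown above is regular enough to parse line by line. The sketch below is one way to do it in Python. The function name parse_service_summary is hypothetical, and the sketch assumes the layout seen in the sample: the word Service, the service name, an optional enabled/disabled column, then the active state.

```python
def parse_service_summary(output):
    """Parse cl-service-summary style lines into a dictionary.

    Lines such as "Service poed inactive" have no enabled/disabled
    column; for those, "enabled" is reported as None.
    """
    services = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "Service":
            continue  # skip the shell prompt and anything unexpected
        name, states = fields[1], fields[2:]
        enabled = states[0] if len(states) == 2 else None
        services[name] = {"enabled": enabled, "active": states[-1]}
    return services


sample = """Service cron enabled active
Service poed inactive
Service vxrd disabled inactive"""

for name, state in parse_service_summary(sample).items():
    print(name, state)
```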
You can also run systemctl list-unit-files --type service to list all services on the switch and
see which ones are enabled:
auditd.service enabled
[email protected] disabled
bootlog.service enabled
bootlogd.service masked
bootlogs.service masked
bootmisc.service masked
checkfs.service masked
checkroot-bootclean.service masked
checkroot.service masked
clagd.service enabled
clcmd.service enabled
console-getty.service disabled
console-shell.service disabled
[email protected] static
cron.service enabled
cryptdisks-early.service masked
cryptdisks.service masked
cumulus-aclcheck.service static
cumulus-core.service static
cumulus-fastfailover.service enabled
cumulus-firstboot.service disabled
cumulus-platform.service enabled
cumulus-support.service static
dbus-org.freedesktop.hostname1.service static
dbus-org.freedesktop.locale1.service static
dbus-org.freedesktop.login1.service static
dbus-org.freedesktop.machine1.service static
dbus-org.freedesktop.timedate1.service static
dbus.service static
debian-fixup.service static
debug-shell.service disabled
decode-syseeprom.service static
dhcpd.service disabled
dhcpd6.service disabled
[email protected] disabled
[email protected] disabled
dhcrelay.service enabled
dhcrelay6.service disabled
[email protected] disabled
[email protected] disabled
dm-event.service disabled
dns-watcher.service disabled
dnsmasq.service enabled
emergency.service static
fuse.service masked
getty-static.service static
[email protected] enabled
halt-local.service static
halt.service masked
[email protected] static
hostname.service masked
hsflowd.service enabled
[email protected] enabled
hwclock-save.service enabled
hwclock.service masked
hwclockfirst.service masked
[email protected] static
initrd-cleanup.service static
initrd-parse-etc.service static
initrd-switch-root.service static
initrd-udevadm-cleanup-db.service static
killprocs.service masked
kmod-static-nodes.service static
kmod.service static
ledmgrd.service enabled
lldpd.service enabled
lm-sensors.service enabled
lvm2-activation-early.service enabled
lvm2-activation.service enabled
lvm2-lvmetad.service static
lvm2-monitor.service enabled
[email protected] static
lvm2.service disabled
module-init-tools.service static
motd.service masked
mountall-bootclean.service masked
mountall.service masked
mountdevsubfs.service masked
mountkernfs.service masked
mountnfs-bootclean.service masked
mountnfs.service masked
mstpd.service enabled
netd.service enabled
netq-agent.service disabled
networking.service enabled
ntp.service enabled
[email protected] disabled
openvswitch-vtep.service disabled
phy-ucode-update.service enabled
portwd.service enabled
procps.service static
ptmd.service enabled
pwmd.service enabled
frr.service enabled
quotaon.service static
rc-local.service static
rc.local.service static
rdnbrd.service disabled
reboot.service masked
rescue.service static
rmnologin.service masked
rsyslog.service enabled
screen-cleanup.service masked
sendsigs.service masked
[email protected] disabled
single.service masked
smond.service enabled
snmpd.service disabled
[email protected] disabled
snmptrapd.service disabled
[email protected] disabled
ssh.service enabled
[email protected] disabled
sshd.service enabled
stop-bootlogd-single.service masked
stop-bootlogd.service masked
stopssh.service enabled
sudo.service disabled
switchd-diag.service static
switchd.service enabled
syslog.service enabled
sysmonitor.service static
systemd-ask-password-console.service static
systemd-ask-password-wall.service static
[email protected] static
systemd-binfmt.service static
systemd-fsck-root.service static
[email protected] static
systemd-halt.service static
systemd-hibernate.service static
systemd-hostnamed.service static
systemd-hybrid-sleep.service static
systemd-initctl.service static
systemd-journal-flush.service static
systemd-journald.service static
systemd-kexec.service static
systemd-localed.service static
systemd-logind.service static
systemd-machined.service static
systemd-modules-load.service static
systemd-networkd-wait-online.service disabled
systemd-networkd.service disabled
[email protected] disabled
systemd-poweroff.service static
systemd-quotacheck.service static
systemd-random-seed.service static
systemd-readahead-collect.service disabled
systemd-readahead-done.service static
systemd-readahead-drop.service disabled
systemd-readahead-replay.service disabled
systemd-reboot.service static
systemd-remount-fs.service static
systemd-resolved.service disabled
[email protected] static
systemd-setup-dgram-qlen.service static
systemd-shutdownd.service static
systemd-suspend.service static
systemd-sysctl.service static
systemd-timedated.service static
systemd-timesyncd.service disabled
systemd-tmpfiles-clean.service static
systemd-tmpfiles-setup-dev.service static
systemd-tmpfiles-setup.service static
systemd-udev-settle.service static
systemd-udev-trigger.service static
systemd-udevd.service static
systemd-update-utmp-runlevel.service static
systemd-update-utmp.service static
systemd-user-sessions.service static
udev-finish.service static
udev.service static
umountfs.service masked
umountnfs.service masked
umountroot.service masked
update-ports.service enabled
urandom.service static
[email protected] static
uuidd.service static
vboxadd-service.service enabled
vboxadd-x11.service enabled
vboxadd.service enabled
vxrd.service disabled
vxsnd.service disabled
wd_keepalive.service enabled
x11-common.service masked
ztp-init.service enabled
ztp.service disabled
191 unit files listed.
networking.service
switchd.service
wd_keepalive.service
network-pre.target
bootlog.service
systemd-readahead-done.service
systemd-readahead-done.timer
systemd-update-utmp-runlevel.service
graphical.target
systemd-update-utmp-runlevel.service
Configuring switchd
switchd is the daemon at the heart of Cumulus Linux. It communicates between the switch and Cumulus
Linux, and all the applications running on Cumulus Linux.
The switchd configuration is stored in /etc/cumulus/switchd.conf.
Versions of Cumulus Linux prior to 2.1 stored the switchd configuration at
/etc/default/switchd.
Contents
This chapter covers ...
The switchd File System
Configuring switchd Parameters
Restarting switchd
| `-- route
| |-- count_0
| |-- count_1
| |-- count_total
| |-- count_v4
| |-- count_v6
| |-- mask_limit
| |-- max_0
| |-- max_1
| `-- max_total
`-- version
To modify the configuration, run cl-cfg -w. For example, to set the buffer utilization measurement
interval to 1 minute, run:
You can get some of this information by running cl-resource-query, though you cannot
update the switchd configuration with it.
Restarting switchd
Whenever you modify any switchd hardware configuration file (typically changing any *.conf file that
requires making a change to the switching hardware, like /etc/cumulus/datapath/traffic.conf),
you must restart switchd for the change to take effect:
You do not have to restart the switchd service when you update a network interface
configuration (that is, edit /etc/network/interfaces).
Restarting switchd causes all network ports to reset in addition to resetting the switch hardware
configuration.
Contents
This chapter covers ...
How It Works
About Link State and PoE State
Configuring PoE
poectl Arguments
Troubleshooting PoE and PoE+
Verify the Link Is Up
View LLDP Information Using lldpcli
View LLDP Information Using tcpdump
Logging poed Events in syslog
How It Works
PoE functionality is provided by the cumulus-poe package. When a powered device is connected to the
switch via an Ethernet cable:
If the available power is greater than the power required by the connected device, power is supplied
to the switch port, and the device powers on
If available power is less than the power required by the connected device and the switch port's
priority is less than the port priority set on all powered ports, power is not supplied to the port
If available power is less than the power required by the connected device and the switch port's
priority is greater than the priority of a currently powered port, power is removed from lower
priority port(s) and power is supplied to the port
If the total consumed power exceeds the configured power limit of the power source, low priority
ports are turned off. In the case of a tie, the port with the lower port number gets priority
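The decision rules above can be sketched as a small Python function. This is an illustrative model only, not the poed implementation. The names request_power and PRIORITY_RANK are hypothetical. The sketch assumes that only ports with a strictly lower priority can lose power, that equal available and required power is enough, and that among ports of equal priority the higher-numbered port is switched off first, in line with the tie rule.

```python
PRIORITY_RANK = {"low": 0, "high": 1, "critical": 2}


def port_number(port):
    """Extract N from a port name of the form swpN."""
    return int(port[len("swp"):])


def request_power(available, required, priority, powered):
    """Decide whether a newly connected device gets power.

    powered maps port name -> (priority, watts in use).
    Returns (granted, ports_to_power_off).
    """
    if available >= required:
        return True, []
    # Assumption: only strictly lower-priority ports may be preempted.
    # Lowest priority goes first; on a tie the higher port number goes first.
    candidates = sorted(
        (port for port, (prio, _) in powered.items()
         if PRIORITY_RANK[prio] < PRIORITY_RANK[priority]),
        key=lambda port: (PRIORITY_RANK[powered[port][0]], -port_number(port)),
    )
    freed, shed = available, []
    for port in candidates:
        if freed >= required:
            break
        shed.append(port)
        freed += powered[port][1]
    if freed >= required:
        return True, shed
    return False, []


powered = {"swp1": ("low", 7.0), "swp2": ("low", 7.0), "swp3": ("critical", 30.0)}
print(request_power(10.0, 15.0, "high", powered))
```

A request at low priority in the same situation is refused, because no powered port has a lower priority to give up.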
Power is available as follows:
920W x 750W
x 920W 750W
The AS4610-54P has an LED on the front panel to indicate PoE status:
Green: The poed daemon is running and no errors are detected
Yellow: One or more errors are detected or the poed daemon is not running
Configuring PoE
You use the poectl command utility to configure PoE on a switch that supports the feature. You can:
Enable or disable PoE for a given switch port
Set a switch port's PoE priority to one of three values: low, high or critical
The PoE configuration resides in /etc/cumulus/poe.conf. The file lists all the switch ports, whether PoE
is enabled for those ports and the priority for each port.
Sample poe.conf file ...
[enable]
swp1 = enable
swp2 = enable
swp3 = enable
swp4 = enable
swp5 = enable
swp6 = enable
swp7 = enable
swp8 = enable
swp9 = enable
swp10 = enable
swp11 = enable
swp12 = enable
swp13 = enable
swp14 = enable
swp15 = enable
swp16 = enable
swp17 = enable
swp18 = enable
swp19 = enable
swp20 = enable
swp21 = enable
swp22 = enable
swp23 = enable
swp24 = enable
swp25 = enable
swp26 = enable
swp27 = enable
swp28 = enable
swp29 = enable
swp30 = enable
swp31 = enable
swp32 = enable
swp33 = enable
swp34 = enable
swp35 = enable
swp36 = enable
swp37 = enable
swp38 = enable
swp39 = enable
swp40 = enable
swp41 = enable
swp42 = enable
swp43 = enable
swp44 = enable
swp45 = enable
swp46 = enable
swp47 = enable
swp48 = enable
[priority]
swp1 = low
swp2 = low
swp3 = low
swp4 = low
swp5 = low
swp6 = low
swp7 = low
swp8 = low
swp9 = low
swp10 = low
swp11 = low
swp12 = low
swp13 = low
swp14 = low
swp15 = low
swp16 = low
swp17 = low
swp18 = low
swp19 = low
swp20 = low
swp21 = low
swp22 = low
swp23 = low
swp24 = low
swp25 = low
swp26 = low
swp27 = low
swp28 = low
swp29 = low
swp30 = low
swp31 = low
swp32 = low
swp33 = low
swp34 = low
swp35 = low
swp36 = low
swp37 = low
swp38 = low
swp39 = low
swp40 = low
swp41 = low
swp42 = low
swp43 = low
swp44 = low
swp45 = low
swp46 = low
swp47 = low
swp48 = low
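Because the file uses an INI-style layout with [enable] and [priority] sections, it can be read with a standard INI parser. The sketch below uses Python's configparser on a shortened copy of the sample. Whether poed itself reads the file this way is not stated in this guide. The function name load_poe_settings is hypothetical, and treating any value other than enable as disabled is an assumption.

```python
import configparser

SAMPLE = """\
[enable]
swp1 = enable
swp2 = enable

[priority]
swp1 = low
swp2 = low
"""


def load_poe_settings(text):
    """Return {port: {"enabled": bool, "priority": str}} from poe.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        port: {
            # Assumption: any value other than "enable" means disabled.
            "enabled": parser["enable"][port] == "enable",
            # Fall back to the documented default priority, low.
            "priority": parser["priority"].get(port, "low"),
        }
        for port in parser["enable"]
    }


for port, settings in load_poe_settings(SAMPLE).items():
    print(port, settings)
```

To read the real file, pass the contents of /etc/cumulus/poe.conf to load_poe_settings instead of SAMPLE.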
By default, PoE and PoE+ are enabled on all Ethernet/1G switch ports, and these ports are set with a low
priority. Switch ports can have low, high or critical priority.
There is no additional configuration for PoE+.
To change the priority for one or more switch ports, run poectl -p swp# [low|high|critical]. For
example:
To display PoE information for a set of switch ports, run poectl -i [port_numbers]:
cumulus@switch:~$ poectl -s
System power:
Total: 730.0 W
Used: 11.0 W
Available: 719.0 W
Connected ports:
swp11, swp24, swp27, swp48
The set commands (priority, enable, disable) either succeed silently or display an error message if the
command fails.
poectl Arguments
The poectl command takes the following arguments:
Argument                             Description
-i, --port-info PORT_LIST            Returns detailed information for the specified ports. You can specify a range of ports. For example: -i swp1-swp5,swp10
-a, --all                            Returns PoE status and detailed information for all ports.
-p, --priority PORT_LIST PRIORITY    Sets priority for the specified ports: low, high, critical.
-r, --reset PORT_LIST                Performs a hardware reset on the specified ports. Use this if one or more ports are stuck in an error state. This does not reset any configuration settings for the specified ports.
--save                               Saves the current configuration. The saved configuration is automatically loaded on system boot.
1. In a terminal, create a new file in the /etc/profile.d/ directory. In the code example below, the
file is called proxy.sh, and is created using the text editor nano.
2. Add a line to the file to configure either an HTTP or an HTTPS proxy, or both:
HTTP proxy:
http_proxy=http://myproxy.domain.com:8080
export http_proxy
HTTPS proxy:
https_proxy=https://myproxy.domain.com:8080
export https_proxy
3. Create a file in the /etc/apt/apt.conf.d directory and add the following lines to the file for
acquiring the HTTP and HTTPS proxies; the example below uses http_proxy as the file name:
4. Add the proxy addresses to /etc/wgetrc; you may have to uncomment the http_proxy and
https_proxy lines:
https_proxy = https://myproxy.domain.com:8080
http_proxy = http://myproxy.domain.com:8080
...
5. Run the source command to execute the file in the current environment:
The proxy is now configured. The echo command can be used to confirm a proxy is set up correctly:
HTTP proxy:
HTTPS proxy:
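The same check can be made from a script. The Python sketch below reads the two environment variables exported in step 2. The function name proxy_settings is hypothetical, and the sketch only reports what is set in the environment. It does not test that the proxy is reachable.

```python
import os


def proxy_settings(environ=None):
    """Return the http_proxy and https_proxy values, or None where unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in ("http_proxy", "https_proxy")}


for name, value in proxy_settings().items():
    print(name, "=", value if value is not None else "(not set)")
```

Run it in a shell where the profile script has been sourced. A value of (not set) means the variable was not exported in that shell.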
Related Information
Setting up an apt package cache
HTTP API
Cumulus Linux 3.4+ implements an HTTP application programming interface to the OpenStack ML2 driver (see
page 915) and NCLU (see page 82). Rather than accessing a Cumulus Linux host with SSH, you can
interact with a server with an HTTP client, such as cURL, HTTPie or a web browser.
The HTTP API service is enabled by default on chassis hardware only. However, the associated
server is configured to only listen to traffic originating from within the chassis.
The service is not enabled by default on non-chassis hardware.
Contents
This chapter covers ...
Getting Started
Configuration
Enable External Traffic on a Chassis
IP and Port Settings
Security
Authentication
Transport Layer Security
cURL Examples
Getting Started
If you are upgrading from a version of Cumulus Linux earlier than 3.4.0, the supporting software
for the API may not be installed. Install the required software with the following command.
To enable the HTTP API service, run the following systemd command:
If you are running Cumulus Linux 3.5.0 only, you also need to enable the
restserver-gunicorn service:
Use the systemctl start and systemctl stop commands to start/stop the HTTP API service:
Configuration
There are two configuration files associated with the HTTP API services:
/etc/nginx/sites-available/nginx-restapi.conf
/etc/nginx/sites-available/nginx-restapi-chassis.conf
The first configuration file is used for non-chassis hardware; the second, for chassis hardware.
Generally, only the configuration file relevant to your hardware needs to be edited, as the associated
services determine the appropriate configuration file to use at run time.
If the configuration file is not valid, return to step 1; review any changes that were made, and correct
the errors.
5. Restart the daemons:
For more information on the listen directive, refer to the NGINX documentation.
Do not set the same listening port for internal and external chassis traffic.
Security
Authentication
The default configuration requires all HTTP requests from external sources (not internal switch traffic) to set
the HTTP Basic Authentication header.
The user and password should correspond to a user on the host switch.
Do not copy the cumulus.pem or cumulus.key files. After installation, edit the “ssl_certificate”
and “ssl_certificate_key” values in the configuration file for your hardware.
cURL Examples
This section contains several example cURL commands for sending HTTP requests to a non-chassis host.
The following settings are used for these examples:
Username: user
Password: pw
IP: 192.168.0.32
Port: 8080
Requests for NCLU require the Content-Type request header to be set to application/json.
cURL’s -k flag is necessary when the server uses a self-signed certificate. This is the default
configuration (see the Security section (see page 203)). To display the response headers, include the
-D flag in the command.
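The settings above translate to any HTTP client. As a sketch, the following Python code builds an equivalent request with the standard library, using the example username, password, IP address and port. The endpoint path /nclu/v1/rpc and the {"cmd": ...} body format are assumptions that do not appear in this section. Check them against your installation before use. The helper names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def build_nclu_request(host, port, user, password, command, path="/nclu/v1/rpc"):
    """Build a POST request with Basic Authentication and a JSON body.

    The default path and the {"cmd": ...} body shape are assumptions.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}:{port}{path}",
        data=json.dumps({"cmd": command}).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def insecure_context():
    """Counterpart of cURL's -k flag: skip certificate verification."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


request = build_nclu_request("192.168.0.32", 8080, "user", "pw", "show interface json")
# To send it against a switch with a self-signed certificate:
# urllib.request.urlopen(request, context=insecure_context())
```

Skipping certificate verification is only appropriate for the default self-signed certificate. Once you install your own certificate, use the default SSL context instead.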
By default, ifupdown is quiet; use the verbose option -v when you want to know what is going
on when bringing an interface down or up.
Contents
This chapter covers ...
Basic Commands
Using NCLU to Set the Admin State of an Interface
ifupdown2 Interface Classes
Bringing All auto Interfaces Up or Down
Configuring a Loopback Interface
ifupdown Behavior with Child Interfaces
ifupdown2 Interface Dependencies
ifup Handling of Upper (Parent) Interfaces
Configuring IP Addresses
Specifying IP Address Scope
Purging Existing IP Addresses on an Interface
Specifying User Commands
Sourcing Interface File Snippets
Using Globs for Port Lists
Using Templates
Commenting out Mako Templates
Adding Descriptions to Interfaces
Caveats and Errata
Related Information
Basic Commands
To bring up an interface or apply changes to an existing interface, run:
ifdown always deletes logical interfaces after bringing them down. Use the --admin-state
option if you only want to administratively bring the interface up or down.
To see the link and administrative state, use the ip link show command:
In this example, swp1 is administratively UP and the physical link is UP (LOWER_UP flag). More information
on interface administrative state and physical state can be found in this knowledge base article.
auto swp1
iface swp1
link-down yes
auto swp1
iface swp1
You can add other classes using the allow prefix. For example, if you have multiple interfaces used for
uplinks, you can make up a class called uplink:
auto swp1
allow-uplink swp1
iface swp1 inet static
address 10.1.1.1/31
auto swp2
allow-uplink swp2
iface swp2 inet static
address 10.1.1.3/31
This allows you to perform operations on only these interfaces using the --allow=uplink option, or still use
the -a option, since these interfaces are also in the auto class:
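To make the class membership concrete, the sketch below scans an interfaces configuration in Python and groups interface names by class. It is a simplified illustration and not the ifupdown2 parser. The function name interface_classes is hypothetical, and only auto and allow-<class> lines are considered.

```python
def interface_classes(config_text):
    """Map each class name to the interfaces that belong to it."""
    classes = {}
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        keyword, names = fields[0], fields[1:]
        if keyword == "auto":
            class_name = "auto"
        elif keyword.startswith("allow-"):
            class_name = keyword[len("allow-"):]
        else:
            continue  # iface, address and other lines carry no class
        classes.setdefault(class_name, []).extend(names)
    return classes


config = """auto swp1
allow-uplink swp1
iface swp1 inet static
address 10.1.1.1/31
auto swp2
allow-uplink swp2
iface swp2 inet static
address 10.1.1.3/31"""

print(interface_classes(config))
```

For the configuration above, swp1 and swp2 appear in both the auto class and the uplink class, which is why either selector reaches them.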
This feature is also useful if you are using Management VRF (see page 717): you can use the
special interface class called mgmt and put the management interface into that class.
The mgmt interface class is not supported if you are configuring Cumulus Linux using NCLU (see
page 82).
allow-mgmt eth0
iface eth0 inet dhcp
vrf mgmt
allow-mgmt mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
All ifupdown2 commands (ifup, ifdown, ifquery, ifreload) can take a class. Include the
--allow=<class> option when you run the command. For example, to reload the configuration for the
management interface described above, run:
To reload all network interfaces marked auto, use the ifreload command. It is equivalent to running
ifdown then ifup, the one difference being that ifreload skips any configurations that did not change:
For more information on the bridge in traditional mode vs the bridge in VLAN-aware mode, please read this
knowledge base article.
auto bond1
iface bond1
address 100.0.0.2/16
bond-slaves swp29 swp30
auto bond2
iface bond2
address 100.0.0.5/16
bond-slaves swp31 swp32
auto br2001
iface br2001
address 12.0.1.3/24
bridge-ports bond1.2001 bond2.2001
bridge-stp on
Using ifup --with-depends br2001 brings up all dependents of br2001: bond1.2001, bond2.2001,
bond1, bond2, swp29, swp30, swp31, swp32.
Similarly, specifying ifdown --with-depends br2001 brings down all dependents of br2001:
bond1.2001, bond2.2001, bond1, bond2, swp29, swp30, swp31, swp32.
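The dependency walk behind --with-depends can be pictured as a graph traversal. The Python sketch below is illustrative only and is not how ifupdown2 is implemented; the DEPENDS mapping is written by hand from the example configuration above, and the dependents function name is hypothetical.

```python
# Illustrative only: hand-written dependency map for the example above.
# Each key lists the lower interfaces it is built on.
DEPENDS = {
    "br2001": ["bond1.2001", "bond2.2001"],
    "bond1.2001": ["bond1"],
    "bond2.2001": ["bond2"],
    "bond1": ["swp29", "swp30"],
    "bond2": ["swp31", "swp32"],
}


def dependents(iface, depends=DEPENDS):
    """Return every interface below iface, breadth first, without repeats."""
    seen = []
    queue = list(depends.get(iface, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(depends.get(current, []))
    return seen


print(dependents("br2001"))
```

Each interface appears once in the result, even though a subinterface such as bond1.2001 is reachable both from the bridge and through its parent bond.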
As mentioned earlier, ifdown2 always deletes logical interfaces after bringing them down. Use
the --admin-state option if you only want to administratively bring the interface up or down. In
terms of the above example, ifdown br2001 deletes br2001.
To guide you through which interfaces will be brought down and up, use the --print-dependency
option to get the list of dependents.
Use ifquery --print-dependency=list -a to get the dependency list of all interfaces:
swp40 : None
swp25 : None
swp26 : None
swp29 : None
swp30 : None
swp31 : None
swp32 : None
You can use dot to render the graph on an external system where dot is installed.
auto br100
iface br100
bridge-ports bond1.100 bond2.100
auto bond1
iface bond1
bond-slaves swp1 swp2
If you run ifdown bond1, ifdown deletes bond1 and the VLAN interface on bond1 (bond1.100); it also
removes bond1 from the bridge br100. Next, when you run ifup bond1, it creates bond1 and the VLAN
interface on bond1 (bond1.100); it also executes ifup br100 to add the bond VLAN interface (bond1.100)
to the bridge br100.
As you can see above, implicitly bringing up the upper interface helps, but there can be cases where an
upper interface (like br100) is not in the right state, which can result in warnings. The warnings are mostly
harmless.
If you want to disable these warnings, you can disable the implicit upper interface handling by setting
skip_upperifaces=1 in /etc/network/ifupdown2/ifupdown2.conf.
With skip_upperifaces=1, you will have to explicitly execute ifup on the upper interfaces. In this case,
you will have to run ifup br100 after an ifup bond1 to add bond1 back to bridge br100.
Although specifying a subinterface like swp1.100 and then running ifup swp1.100 will also
result in the automatic creation of the swp1 interface in the kernel, Cumulus Networks
recommends you specify the parent interface swp1 as well. A parent interface is one where any
physical layer configuration can reside, such as link-speed 1000 or link-duplex full.
It's important to note that if you only create swp1.100 and not swp1, then you cannot run ifup
swp1 since you did not specify it.
Configuring IP Addresses
IP addresses are configured with the net add interface command.
You can specify both IPv4 and IPv6 addresses for the same interface.
auto swp1
iface swp1
address 12.0.0.1/30
address 12.0.0.2/30
address 2001:DB8::1/126
The address method and address family are added by NCLU when needed, specifically
when you are creating DHCP or loopback interfaces.
auto lo
iface lo inet loopback
auto swp2
iface swp2
address 35.21.30.5/30
address 3101:21:20::31/80
scope link
When you run ifreload -a on this configuration, ifupdown2 considers all IP addresses as global.
These commands create the following code snippet in the /etc/network/interfaces file:
auto swp6
iface swp6
post-up ip address add 71.21.21.20/32 dev swp6 scope site
These commands create the following configuration snippet in the /etc/network/interfaces file:
auto swp1
iface swp1
address-purge no
216 02 March 2018
Cumulus Networks
Purging existing addresses on interfaces with multiple iface stanzas is not supported. Doing so
can result in the configuration of multiple addresses for an interface after you change an
interface address and reload the configuration with ifreload -a. If this happens, you must
shut down and restart the interface with ifup and ifdown, or manually delete superfluous
addresses with ip address delete specify.ip.address.here/mask dev DEVICE. See
also the Caveats and Errata (see page 221) section below for some cautions about using multiple
iface stanzas for the same interface.
auto swp1
iface swp1
address 12.0.0.1/30
post-up /sbin/foo bar
Any valid command can be hooked in the sequencing of bringing an interface up or down, although
commands should be limited in scope to network-related commands associated with the particular
interface.
For example, it wouldn't make sense to install some Debian package on ifup of swp1, even though that is
technically possible. See man interfaces for more details.
If your post-up command also starts, restarts or reloads any systemd service, you must use the
--no-block option with systemctl. Otherwise, that service or even the switch itself may hang
after starting or restarting.
For example, to restart the dhcrelay service after bringing up VLAN 100, first run:
auto bridge
iface bridge
bridge-vids 100
bridge-vlan-aware yes
auto vlan100
iface vlan100
post-up systemctl --no-block restart dhcrelay.service
vlan-id 100
vlan-raw-device bridge
source /etc/network/interfaces.d/bond0
...
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto swp6
iface swp6
auto swp10
iface swp10
auto swp11
iface swp11
auto swp12
iface swp12
Using Templates
ifupdown2 supports Mako-style templates. The Mako template engine is run over the interfaces file
before parsing.
Use the template to declare cookie-cutter bridges in the interfaces file:
%for v in [11,12]:
auto vlan${v}
iface vlan${v}
address 10.20.${v}.3/24
bridge-ports glob swp19-20.${v}
bridge-stp on
%endfor
%for i in [1,12]:
auto swp${i}
iface swp${i}
address 10.20.${i}.3/24
%endfor
Regarding Mako syntax, use square brackets ([1,12]) to specify a list of individual numbers (in
this case, 1 and 12). Use range(1,12) to specify a range of interfaces.
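Because the expression on a %for line is ordinary Python, the difference between a list and a range can be checked in any Python interpreter. The following is a small sketch; the variable names are chosen for illustration only.

```python
# Plain Python, evaluated the same way as the expression on a Mako "%for" line.
INDIVIDUAL = [1, 12]           # exactly two values: 1 and 12
SPAN = list(range(1, 12))      # 1 through 11; the end value is excluded

print(["swp%d" % i for i in INDIVIDUAL])
print(["swp%d" % i for i in SPAN])
```

Note that range(1,12) stops at 11. To include swp12 in the range, use range(1,13).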
You can test your template and confirm it evaluates correctly by running mako-render
/etc/network/interfaces.
For more examples of configuring Mako templates, read this knowledge base article.
auto swp1
iface swp1
alias hypervisor_port_1
You can query the interface description using NCLU. The alias appears in the Name column after the actual
interface name:
Interface descriptions also appear in the SNMP OID (see page 773) IF-MIB::ifAlias.
source /etc/interfaces.d/speed_settings
auto swp1
iface swp1
address 10.0.14.2/24
As well as /etc/interfaces.d/speed_settings
auto swp1
iface swp1
link-speed 1000
link-duplex full
ifupdown2 correctly parses a configuration like this because the same attributes are not specified in
multiple iface stanzas.
And, as stated in the note above, you cannot purge existing addresses on interfaces with multiple iface
stanzas.
Related Information
Debian - Network Configuration
Linux Foundation - Bonds
Linux Foundation - Bridges
Linux Foundation - VLANs
man ifdown(8)
man ifquery(8)
man ifreload(8)
man ifup(8)
man ifupdown-addons-interfaces(5)
man interfaces(5)
Contents
This chapter covers ...
Interface Types (see page 223)
Interface Settings (see page 223)
Differences between Broadcom-based and Mellanox-based Switches (see page 223)
Enabling Auto-negotiation (see page 224)
Default Interface Configuration Settings (see page 224)
Port Speed and Duplexing (see page 234)
MTU (see page 235)
Setting a Policy for Global System MTU (see page 237)
Creating a Default Policy for Various Interface Settings (see page 237)
Configuring Breakout Ports (see page 238)
Removing a Breakout Port (see page 242)
Combining Four 10G Ports into One 40G Port (see page 242)
Interface Types
Cumulus Linux exposes network interfaces for several types of physical and logical devices:
lo, network loopback device
ethN, switch management port(s), for out of band management only
swpN, switch front panel ports
(optional) brN, bridges (IEEE 802.1Q VLANs)
(optional) bondN, bonds (IEEE 802.3ad link aggregation trunks, or port channels)
Interface Settings
Each physical network interface has a number of configurable settings:
Auto-negotiation
Duplex
Forward error correction
Link speed
MTU, or maximum transmission unit
Almost all of these settings are configured automatically for you, depending upon your switch ASIC,
although you must always set MTU manually.
You can only set MTU for logical interfaces. If you try to set auto-negotiation, duplex mode or link
speed for a logical interface, an unsupported error gets returned.
Ports are always automatically configured on a Mellanox-based switch, with one exception: the only setting
you need to configure is MTU (see page 235). You don't even need to enable auto-negotiation, as the Mellanox
firmware configures everything for you.
Enabling Auto-negotiation
To configure auto-negotiation for a Broadcom-based switch, set link-autoneg to on for all the switch
ports. For example, to enable auto-negotiation for swp1 through swp52:
Any time you enable auto-negotiation, Cumulus Linux restores the default configuration settings specified
in the table below.
By default on a Broadcom-based switch, auto-negotiation is disabled, except on 10000BASE-T and 1000BASE-T
switch ports, where it's required for links to work at all. For RJ-45 SFP adapters, you need to manually
configure the settings as described in the default settings table below.
If you disable it later or never enable it, then you have to configure the duplex, FEC and link speed settings
manually using NCLU (see page 82) — see the relevant sections below. The default speed if you disable
auto-negotiation depends on the type of connector used with the port. For example, a QSFP28 optic
defaults to 100G, while a QSFP+ optic defaults to 40G and SFP+ defaults to 10G.
You cannot, or should not, disable auto-negotiation for any type of copper cable, including:
10000BASE-T
10000DAC
40000DAC
100000DAC
However, 10/100/1000BASE-T RJ-45 SFP adapters do not work with auto-negotiation enabled. You
must manually configure these ports using the settings below (link-autoneg=off,
link-speed=1000|100|10, link-duplex=full|half).
Depending upon the connector used for a port, enabling auto-negotiation also enables forward error
correction (FEC), if the cable requires it (see the table below). FEC always adjusts for the
speed of the cable. However, you cannot disable FEC separately using NCLU (see page 82).
If the other side of the link is running a version of Cumulus Linux earlier than 3.2, depending upon
the interface type, auto-negotiation may not work on that switch. Cumulus Networks
recommends you use the default settings on this switch in this case.
For Mellanox-based switches, the Spectrum firmware decides on the best settings based on the port type
and connector type.
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 100

Speed/Type: 1000BASE-T on a 1G fixed copper port
Auto-negotiation: On
FEC: N/A
NCLU commands:
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 1000

Speed/Type: 1000BASE-T on a 10G fixed copper port
Auto-negotiation: On
FEC: N/A
NCLU commands:
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 1000

NCLU commands:
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 1000

Speed/Type: 10000BASE-T fixed copper port
Auto-negotiation: On
FEC: N/A
NCLU commands:
$ net add interface swp1 link speed 10000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 10000

Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 10000

Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 40000

Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 40000

Speed/Type: 100000BASE-CR4
Auto-negotiation: On
FEC: auto-negotiated
NCLU commands:
$ net add interface swp1 link speed 100000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 100000

Speed/Type: 100000BASE-SR4, 100000AOC
Auto-negotiation: Off
FEC: RS
NCLU commands:
$ net add interface swp1 link speed 100000
$ net add interface swp1 link autoneg off
$ net add interface swp1 link fec rs
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec rs

NCLU commands:
$ net add interface swp1 link fec off
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec off

Speed/Type: 25000BASE-CR
Auto-negotiation: On
FEC: auto-negotiated*
NCLU commands:
$ net add interface swp1 link speed 25000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg on
link-speed 25000

Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec baser

Configuration in /etc/network/interfaces:
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec off
auto swp1
iface swp1
link-speed 10000
1G 100 Mb
40G 10G*
100G 50G & 40G (with or without breakout port), 25G*, 10G*
*Requires the port to be converted into a breakout port. See below (see page 238).
MTU
Interface MTU (maximum transmission unit) applies to traffic traversing the management port, front
panel/switch ports, bridge, VLAN subinterfaces and bonds; in other words, both physical and logical interfaces.
MTU is the only interface setting that must be set manually.
In Cumulus Linux, ifupdown2 assigns 1500 as the default MTU setting. To change the setting, run:
Some switches may not support the same maximum MTU setting in hardware for both the
management interface (eth0) and the data plane ports.
auto bridge
iface bridge
bridge-ports bond1 bond2 bond3 bond4 peer5
bridge-vids 100-110
bridge-vlan-aware yes
In order for bridge to have an MTU of 9000, set the MTU for each of the member interfaces (bond1 to
bond4, and peer5), to 9000 at minimum.
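One way to sanity-check this rule before reloading the configuration is to compare the planned member MTUs against the target bridge MTU. The sketch below is a hypothetical helper, not a Cumulus Linux tool, and the MTU values in it are invented for illustration.

```python
def members_below(target_mtu, member_mtus):
    """Return the member interfaces whose MTU is lower than target_mtu."""
    return sorted(name for name, mtu in member_mtus.items() if mtu < target_mtu)


# Hypothetical MTU values for the bridge members in the example above.
MEMBERS = {"bond1": 9000, "bond2": 9000, "bond3": 9216, "bond4": 1500, "peer5": 9000}

print(members_below(9000, MEMBERS))   # ['bond4']
```

Any interface the helper reports needs its MTU raised before the bridge can carry 9000-byte frames.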
When configuring MTU for a bond, configure the MTU value directly under the bond interface; the
configured value is inherited by member links/slave interfaces. If you need a different MTU on the bond, set
it on the bond interface, as this ensures the slave interfaces pick it up. There is no need to specify MTU on
the slave interfaces.
VLAN interfaces inherit their MTU settings from their physical devices or their lower interface; for example,
swp1.100 inherits its MTU setting from swp1. Hence, specifying an MTU on swp1 ensures that swp1.100
inherits swp1's MTU setting.
If you are working with VXLANs (see page 396), the MTU for a virtual network interface (VNI) must be 50
bytes smaller than the MTU of the physical interfaces on the switch, as those 50 bytes are required for
various headers and other data. You should also consider setting the MTU much higher than the default
1500.
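The arithmetic is simple, but it is easy to get backwards. The following sketch computes the largest VNI MTU for a given physical MTU, using the 50-byte figure stated above; the function and constant names are illustrative.

```python
VXLAN_OVERHEAD = 50  # bytes reserved for encapsulation, per the text above


def max_vni_mtu(physical_mtu):
    """Largest MTU a VNI can use on top of the given physical MTU."""
    return physical_mtu - VXLAN_OVERHEAD


print(max_vni_mtu(9216))  # 9166
print(max_vni_mtu(1500))  # 1450
```

With the default physical MTU of 1500, the VNI is limited to 1450 bytes, which is why a higher physical MTU is worth considering.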
auto swp1
iface swp1
mtu 9000
You must take care to ensure there are no MTU mismatches in the conversation path.
MTU mismatches will result in dropped or truncated packets, degrading or blocking
network performance.
cat /etc/network/ifupdown2/policy.d/mtu.json
{
"address": {"defaults": { "mtu": "9216" }
}
}
If your platform does not support a high MTU on eth0, you can set a lower MTU with the following
command:
}
},
"address": {
"defaults": { "mtu": "9000" }
}
}
On Dell switches with Maverick ASICs, you configure breakout ports on the 100G uplink ports by
manually editing the /etc/cumulus/ports.conf file. You need to specify either 4x10 or 4x25
for the port speed. For example, on a Dell S4148F-ON switch, to break out swp26 into 4 25G
ports, you would modify the line starting with "26=" in ports.conf as follows:
...
# QSFP+ ports
#
# <port label 27-28> = [4x10G|40G]
27=disabled
28=disabled
# QSFP28 ports
#
# <port label 25-26, 29-30> = [4x10G|4x25G|2x50G|40G|50G|100G]
25=100G
26=4x25G
29=100G
30=100G
...
Then you need to configure the breakout ports in the /etc/network/interfaces file:
...
auto swp26s0
iface swp26s0
auto swp26s1
iface swp26s1
auto swp26s2
iface swp26s2
auto swp26s3
iface swp26s3
...
On Mellanox switches, you need to disable the next port (see below). In this example, you would
also run the following before committing the update:
...
auto swp3s0
iface swp3s0
auto swp3s1
iface swp3s1
auto swp3s2
iface swp3s2
auto swp3s3
iface swp3s3
...
When you commit your change configuring the breakout ports, switchd restarts to apply the
changes. The restart interrupts network services (see page 190).
/etc/cumulus/ports.conf varies across different hardware platforms. Check the current list
of supported platforms on the hardware compatibility list.
A snippet from the /etc/cumulus/ports.conf on a Dell S6000 switch (with a Trident II+ ASIC)
where swp3 is broken out as above looks like this:
Notice that you can break out any of the 100G ports into a variety of options: four 10G ports, four
25G ports or two 50G ports. Keep in mind that you cannot have more than 128 total logical ports
on a Broadcom switch.
The Mellanox SN2700, SN2700B, SN2410, and SN2410B switches all have a limit of
64 logical ports in total. However, if you want to break out to 4x25G or 4x10G, you must
configure the logical ports as follows:
You can only break out odd-numbered ports into 4 logical ports.
You must disable the next even-numbered port.
These restrictions do not apply to a 2x50G breakout configuration.
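The two rules above can be checked mechanically before you edit ports.conf. The sketch below is a hypothetical validator, not a Cumulus Linux utility; it assumes port values written in the same style as the Dell example earlier (such as 4x25G, 2x50G, 100G and disabled).

```python
def breakout_problems(ports):
    """Check 4-way breakouts against the two rules listed above.

    ports maps a port number to its ports.conf value, for example
    {11: "4x25G", 12: "disabled"}.
    """
    problems = []
    for number, value in sorted(ports.items()):
        if not value.startswith("4x"):
            continue  # 2x50G and unbroken ports are not restricted
        if number % 2 == 0:
            problems.append(
                "port %d: only odd-numbered ports can be split 4 ways" % number)
        elif ports.get(number + 1) != "disabled":
            problems.append(
                "port %d: port %d must be disabled" % (number, number + 1))
    return problems


print(breakout_problems({11: "4x25G", 12: "disabled"}))   # []
print(breakout_problems({11: "4x10G", 12: "100G", 14: "4x25G"}))
```

The second call reports two problems: port 12 is still enabled next to a 4-way breakout on port 11, and port 14 is even-numbered.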
...
11=4x
12=disabled
...
Here is an example showing how to configure breakout cables for the Mellanox Spectrum SN2700.
These commands create the following configuration snippet in the /etc/cumulus/ports.conf file:
# SFP+ ports
#
# <port label 1-48> = [10G|40G/4]
1=40G/4
2=40G/4
3=40G/4
4=40G/4
5=10G
# ports.conf --
#
# This file controls port aggregation and subdivision. For example, QSFP+
# ports are typically configurable as either one 40G interface or four
# 10G/1000/100 interfaces. This file sets the number of interfaces per port
# while /etc/network/interfaces and ethtool configure the link speed for each
# interface.
#
# You must restart switchd for changes to take effect.
#
# The DELL S6000 has:
# 32 QSFP ports numbered 1-32
# These ports are configurable as 40G, split into 4x10G ports or
# disabled.
#
# The X pipeline covers QSFP ports 1 through 16 and the Y pipeline
# covers QSFP ports 17 through 32.
#
# The Trident2 chip can only handle 52 logical ports per pipeline.
#
# This means 13 is the maximum number of 40G ports you can ungang
# per pipeline, with the remaining three 40G ports set to
# "disabled". The 13 40G ports become 52 unganged 10G ports, which
# totals 52 logical ports for that pipeline.
This means the maximum number of ports for this Dell S6000 is 104.
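The figures quoted in the ports.conf comments follow from a short calculation, reproduced here as a sketch. The constant names are illustrative; the numbers themselves come from the file above.

```python
LOGICAL_PORTS_PER_PIPELINE = 52   # Trident2 limit quoted in ports.conf
PIPELINES = 2                     # X (QSFP ports 1-16) and Y (QSFP ports 17-32)
QSFP_PER_PIPELINE = 16
LANES_PER_BREAKOUT = 4            # one 40G port becomes 4x10G

MAX_UNGANGED = LOGICAL_PORTS_PER_PIPELINE // LANES_PER_BREAKOUT
MUST_DISABLE = QSFP_PER_PIPELINE - MAX_UNGANGED
MAX_LOGICAL_PORTS = LOGICAL_PORTS_PER_PIPELINE * PIPELINES

print(MAX_UNGANGED, MUST_DISABLE, MAX_LOGICAL_PORTS)   # 13 3 104
```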
Mellanox SN2700 and SN2700B switches have a limit of 64 logical ports in total. However, the logical ports
must be configured in a specific way. See the note (see page 238) above.
Statistics
High-level interface statistics are available with the net show interface command:
Counters TX RX
---------- ---- ----
errors 0 0
unicast 0 0
broadcast 0 0
multicast 0 0
LLDP
------ ---- ---------------------------
swp1 ==== 44:38:39:00:00:03(server01)
Or using ethtool:
Related Information
Debian - Network Configuration
Linux Foundation - VLANs
Linux Foundation - Bridges
Linux Foundation - Bonds
You can configure Quality of Service (QoS) for switches on the Broadcom Helix4, Tomahawk,
Trident II+ and Trident II platforms, and the Mellanox Spectrum platform only.
Contents
This chapter covers ...
Commands (see page 248)
Example Configuration File (see page 248)
Configuring Traffic Marking through ACL Rules (see page 252)
Configuring Priority Flow Control (see page 253)
Understanding Port Groups (see page 255)
Configuring Link Pause (see page 256)
Configuring Cut-through Mode and Store and Forward Switching (see page 257)
Configuring Explicit Congestion Notification (see page 258)
Related Information (see page 259)
Commands
If you modify the configuration in the /etc/cumulus/datapath/traffic.conf file, you must restart
switchd (see page 190) for the changes to take effect:
On Mellanox Spectrum switches, packet priority remark must be enabled on the ingress port. A
packet received on a remark-enabled port is remarked according to the priority mapping
configured on the egress port. If packet priority remark is configured the same way on every port,
the default configuration example above is correct. However, per-port customized configurations
require two port groups: one for the ingress ports and one for the egress ports, as below:
remark.port_group_list = [ingress_remark_group, egress_remark_group]
remark.ingress_remark_group.packet_priority_remark_set = [dscp]
remark.ingress_remark_group.port_set = swp1-swp4,swp6
remark.egress_remark_group.port_set = swp10-swp20
remark.egress_remark_group.cos_0.priority_remark.dscp = [2]
remark.egress_remark_group.cos_1.priority_remark.dscp = [10]
remark.egress_remark_group.cos_2.priority_remark.dscp = [18]
remark.egress_remark_group.cos_3.priority_remark.dscp = [26]
remark.egress_remark_group.cos_4.priority_remark.dscp = [34]
remark.egress_remark_group.cos_5.priority_remark.dscp = [42]
remark.egress_remark_group.cos_6.priority_remark.dscp = [50]
remark.egress_remark_group.cos_7.priority_remark.dscp = [58]
[ebtables]
-A FORWARD -o swp5 -j setqos --set-cos 5
Option                   Description
--set-cos INT            Sets the datapath resource/queuing class value. Values are defined in IEEE_P802.1p.
--set-dscp value         Sets the DSCP field in packet header to a value, which can be either a decimal or hex value.
--set-dscp-class class   Sets the DSCP field in the packet header to the value represented by the DiffServ class value. This class can be EF, BE or any of the CSxx or AFxx classes.
[iptables]
-t mangle -A FORWARD --in-interface swp+ -p tcp --dport bgp -j SETQOS --set-dscp 10 --set-cos 5
[ip6tables]
-t mangle -A FORWARD --in-interface swp+ -j SETQOS --set-dscp 10
You can put the rule in either the mangle table or the default filter table; the mangle table and filter table
are put into separate TCAM slices in the hardware.
To put the rule in the mangle table, include -t mangle; to put the rule in the filter table, omit -t mangle.
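When choosing between --set-dscp and --set-dscp-class, it can help to know which decimal value a DiffServ class name stands for. The sketch below uses the standard DiffServ codepoint assignments (CSn is 8n, AFxy is 8x + 2y, EF is 46, BE is 0). It is a standalone helper with a hypothetical name; it says nothing about which class names the SETQOS target accepts.

```python
def dscp_for_class(name):
    """Return the decimal DSCP value for a standard DiffServ class name."""
    name = name.upper()
    if name == "BE":
        return 0
    if name == "EF":
        return 46
    if name.startswith("CS") and name[2:].isdigit() and 0 <= int(name[2:]) <= 7:
        return 8 * int(name[2:])
    if (name.startswith("AF") and len(name) == 4
            and name[2] in "1234" and name[3] in "123"):
        return 8 * int(name[2]) + 2 * int(name[3])
    raise ValueError("unknown DiffServ class: %s" % name)


print(dscp_for_class("AF11"))  # 10
print(dscp_for_class("EF"))    # 46
```

For instance, the value 10 used with --set-dscp in the rule above corresponds to class AF11.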
PFC is a layer 2 mechanism that prevents congestion by throttling packet transmission. When PFC is
enabled for received packets on a set of switch ports, the switch detects congestion in the ingress buffer of
the receiving port and signals the upstream switch to stop sending traffic. If the upstream switch has PFC
enabled for packet transmission on the designated priorities, it responds to the downstream switch and
stops sending those packets for a period of time.
PFC operates between two adjacent neighbor switches; it does not provide end-to-end flow control.
However, when an upstream neighbor throttles packet transmission, it could build up packet congestion
and propagate PFC frames further upstream: eventually the sending server could receive PFC frames and
stop sending traffic for a time.
The PFC mechanism can be enabled for individual switch priorities on specific switch ports for RX and/or TX
traffic. The switch port’s ingress buffer occupancy is used to measure congestion. If congestion is present,
the switch transmits flow control frames to the upstream switch. Packets with priority values that do not
have PFC configured are not counted during congestion detection; neither do they get throttled by the
upstream switch when it receives flow control frames.
PFC congestion detection is implemented on the switch using xoff and xon threshold values for the specific
ingress buffer which is used by the targeted switch priorities. When a packet enters the buffer and the
buffer occupancy is above the xoff threshold, the switch transmits an Ethernet PFC frame to the upstream
switch to signal packet transmission should stop. When the buffer occupancy drops below the xon
threshold, the switch sends another PFC frame upstream to signal that packet transmission can resume.
(PFC frames contain a quanta value to indicate a timeout value for the upstream switch: packet
transmission can resume after the timer has expired, or when a PFC frame with quanta == 0 is received
from the downstream switch.)
After the downstream switch has sent a PFC frame upstream, it continues to receive packets until the
upstream switch receives and responds to the PFC frame. The downstream ingress buffer must be large
enough to store those additional packets after the xoff threshold has been reached.
Before Cumulus Linux 3.1.1, PFC was designated as a lossless priority group. The lossless priority
group has been removed from Cumulus Linux.
Priority flow control is fully supported on both Broadcom and Mellanox switches.
PFC is disabled by default in Cumulus Linux. Enabling priority flow control (PFC) requires configuring the
following settings in /etc/cumulus/datapath/traffic.conf on the switch:
Specifying the name of the port group in pfc.port_group_list in brackets; for example,
pfc.port_group_list = [pfc_port_group].
Assigning a CoS value to the port group in pfc.pfc_port_group.cos_list setting. Note that
pfc_port_group is the name of a port group you specified above and is used throughout the following
settings.
Populating the port group with its member ports in pfc.pfc_port_group.port_set.
Setting the xoff byte limit in pfc.pfc_port_group.xoff_size. This is a threshold for the PFC
buffer; when this limit is reached, an xoff transition is initiated, signaling the upstream port to stop
sending traffic, during which time packets continue to arrive due to the latency of the
communication. The default is 10000 bytes.
Setting the xon delta limit in pfc.pfc_port_group.xon_delta. This is the number of bytes to
subtract from the xoff limit, which results in a second threshold at which the egress port resumes
sending traffic. After the xoff limit is reached and the upstream port stops sending traffic, the buffer
begins to drain. When the buffer reaches 8000 bytes (assuming default xoff and xon settings), the
egress port signals that it can start receiving traffic again. The default is 2000 bytes.
Enabling the egress port to signal the upstream port to stop sending traffic
(pfc.pfc_port_group.tx_enable). The default is true.
Enabling the egress port to receive notifications and act on them
(pfc.pfc_port_group.rx_enable). The default is true.
The switch priority value(s) are mapped to the specific ingress buffer for each targeted switch port.
Cumulus Linux looks at either the 802.1p bits or the IP layer DSCP bits depending on which is
configured in the traffic.conf file to map packets to internal switch priority values.
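The relationship between xoff_size and xon_delta can be sketched as a two-threshold switch. The code below is a simplified model for illustration, not the switch implementation: the function names are hypothetical, and treating "reached" as greater-than-or-equal (and less-than-or-equal for xon) is an assumption.

```python
def pfc_thresholds(xoff_size=10000, xon_delta=2000):
    """Return the (xoff, xon) buffer thresholds in bytes."""
    if xon_delta > xoff_size:
        raise ValueError("xon_delta cannot exceed xoff_size")
    return xoff_size, xoff_size - xon_delta


def next_state(paused, occupancy, xoff, xon):
    """Return True while the upstream sender should stay paused."""
    if occupancy >= xoff:
        return True       # xoff limit reached: signal the sender to stop
    if occupancy <= xon:
        return False      # buffer has drained to xon: sender may resume
    return paused         # between the thresholds: keep the current state


XOFF, XON = pfc_thresholds()
print(XOFF, XON)   # 10000 8000
```

Keeping the two thresholds apart prevents the port from flapping between paused and resumed when the buffer occupancy hovers near a single limit.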
The following configuration example shows PFC configured for ports swp1 through swp4 and swp6:
Adding the port_set, rx_enable, and tx_enable configuration lines for each port group.
You can specify the set of ports in a port group in comma-separated sequences of contiguous ports; you
can see which ports are contiguous in /var/lib/cumulus/porttab. The syntax supports:
A single port (swp1s0 or swp5)
A sequence of regular swp ports (swp2-swp5)
A sequence within a breakout swp port (swp6s0-swp6s3)
A sequence of regular and breakout ports, provided they are all in a contiguous range. For example:
...
swp2
swp3
swp4
swp5
swp6s0
swp6s1
swp6s2
swp6s3
swp7
...
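To see how a port_set value resolves against the port order, consider the following sketch. It is a hypothetical illustration, not the parser switchd uses; the PORTTAB list is a hand-written excerpt that mirrors the listing above rather than the real contents of /var/lib/cumulus/porttab.

```python
# Hypothetical excerpt of the port order, mirroring the listing above.
PORTTAB = ["swp2", "swp3", "swp4", "swp5",
           "swp6s0", "swp6s1", "swp6s2", "swp6s3", "swp7"]


def expand_port_set(port_set, porttab=PORTTAB):
    """Expand a value such as 'swp2-swp5,swp7' into individual port names."""
    ports = []
    for item in port_set.split(","):
        first, _, last = item.strip().partition("-")
        last = last or first          # a single port is a range of one
        start, end = porttab.index(first), porttab.index(last)
        if start > end:
            raise ValueError("range runs backwards: %s" % item)
        ports.extend(porttab[start:end + 1])
    return ports


print(expand_port_set("swp4-swp6s1,swp7"))
```

Here swp4-swp6s1 expands to swp4, swp5, swp6s0 and swp6s1 because those four entries sit next to each other in the port order.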
Restart switchd (see page 190) to allow the PFC configuration changes to take effect:
# -- set the xoff byte limit (buffer limit that triggers pause frames transmit to start)
# -- set the xon byte delta (buffer limit that triggers pause frames transmit to stop)
# link pause
link_pause.port_group_list = [pause_port_group]
link_pause.pause_port_group.port_set = swp1-swp4,swp6
link_pause.pause_port_group.port_buffer_bytes = 25000
link_pause.pause_port_group.xoff_size = 10000
link_pause.pause_port_group.xon_delta = 2000
link_pause.pause_port_group.rx_enable = true
link_pause.pause_port_group.tx_enable = true
Restart switchd (see page 190) to allow link pause configuration changes to take effect:
To work around this issue, disable link pause or disable cut-through mode in
/etc/cumulus/datapath/traffic.conf.
To disable link pause, comment out the link_pause* section in
/etc/cumulus/datapath/traffic.conf:
#link_pause.port_group_0.rx_enable = true
#link_pause.port_group_0.tx_enable = true
On Trident II switches only, if ECN is enabled on a specific queue, the ASIC also enables WRED
on the same queue. If the packet is ECT marked (the ECN bits are 01 or 10), the ECN mechanism
executes as described above. However, if it is entering an ECN-enabled queue but is not ECT
marked (the ECN bits are 00), then the WRED mechanism uses the same threshold and
probability values to decide whether to drop the packet. Packets entering a non-ECN-enabled
queue do not get marked or dropped due to ECN or WRED in any case.
ECN is implemented on the switch using minimum and maximum threshold values for the egress queue
length. When a packet enters the queue and the average queue length is between the minimum and
maximum threshold values, a configurable probability value will determine whether the packet will be
marked. If the average queue length is above the maximum threshold value, the packet is always marked.
The downstream switches with ECN enabled perform the same actions as the traffic is received. If the ECN
bits are set, they remain set. The only way to overwrite ECN bits is to enable it — that is, set the ECN bits to
11.
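The marking decision described above can be summarized in a few lines. This is a simplified model for illustration, not the ASIC behavior: the function name is hypothetical, expressing the probability as a percentage from 0 to 100 is an assumption, and so is leaving packets unmarked when the average queue length is below the minimum threshold.

```python
import random


def should_mark(avg_queue_len, min_threshold, max_threshold, probability,
                rng=random.random):
    """Decide whether to set the ECN bits to 11 for one arriving packet.

    probability is a percentage from 0 to 100 (an assumption of this sketch).
    """
    if avg_queue_len > max_threshold:
        return True                          # above the maximum: always mark
    if avg_queue_len >= min_threshold:
        return rng() * 100 < probability     # between thresholds: mark by chance
    return False                             # below the minimum: never mark


print(should_mark(50, 10, 40, 0))    # True
print(should_mark(5, 10, 40, 100))   # False
```

The rng parameter exists only so the random draw can be replaced with a fixed value when experimenting with the thresholds.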
ECN is supported on Broadcom Tomahawk, Trident II+ and Trident II, and Mellanox Spectrum switches only.
ECN is disabled by default in Cumulus Linux. You can enable ECN for individual switch priorities on specific
switch ports. ECN requires configuring the following settings in
/etc/cumulus/datapath/traffic.conf on the switch:
Specifying the name of the port group in ecn.port_group_list in brackets; for example,
ecn.port_group_list = [ecn_port_group].
Assigning a CoS value to the port group in ecn.ecn_port_group.cos_list. If the CoS value of a
packet matches the value of this setting, then ECN is applied. Note that ecn_port_group is the name
of a port group you specified above.
Populating the port group with its member ports (ecn.ecn_port_group.port_set), where
ecn_port_group is the name of the port group you specified above. Congestion is measured on the
egress port queue for the ports listed here, using the average queue length: if congestion is present,
a packet entering the queue may be marked to indicate that congestion was observed. Marking a
packet involves setting the 2 least significant bits in the IP header DiffServ (ToS) field to 11.
The switch priority value(s) are mapped to specific egress queues for the target switch ports.
The ecn.ecn_port_group.probability value indicates the probability of a packet being
marked if congestion is experienced.
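To make the bit manipulation concrete, the following Python sketch shows how the ECN bits sit in the ToS byte. The constant and function names are hypothetical and chosen for the example; only the bit values 00 (not ECN-capable) and 11 (congestion observed) come from the text above.

```python
ECN_MASK = 0b11  # the two least significant bits of the ToS (DiffServ) byte
NOT_ECT = 0b00   # sender is not ECN-capable
CE = 0b11        # congestion experienced


def ecn_bits(tos):
    """Return the two ECN bits of a ToS byte."""
    return tos & ECN_MASK


def is_ecn_capable(tos):
    """A packet whose ECN bits are 00 cannot be marked."""
    return ecn_bits(tos) != NOT_ECT


def mark_congestion(tos):
    """Set both ECN bits to 1 and leave the upper six (DSCP) bits untouched."""
    return tos | CE
```

For example, `mark_congestion(0xBA)` returns `0xBB`: the DSCP value in the upper six bits is unchanged and only the ECN field becomes 11.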
The following configuration example shows ECN configured for ports swp1 through swp4 and swp6:
Restart switchd (see page 190) to allow the ECN configuration changes to take effect:
Related Information
iptables-extensions man page
TCP packets with FIN, URG and PSH bits set and seq number == 0
TCP packets with both SYN and FIN bits set
TCP source PORT matches the destination PORT
UDP source PORT matches the destination PORT
First TCP fragment with partial TCP header
TCP header has fragment offset value of 1
ICMPv6 ping packets payload larger than programmed value of ICMP max size
ICMPv4 ping packets payload larger than programmed value of ICMP max size
Fragmented ICMP packet
IPv6 fragment lower than programmed minimum IPv6 packet size
This configuration option is only available for Broadcom Trident, Trident II, and Tomahawk
chipsets.
Cumulus Networks recommends enabling this feature when deploying a switch with the above-mentioned
ASICs, as hardware-based DDoS protection is disabled by default. Although Cumulus recommends
enabling all of the above criteria, you can enable them individually if desired.
dos.icmp_frag = true
dos.icmpv4_length = true
dos.icmpv6_length = true
dos.ipv6_min_frag = true
The dhcpd and dhcrelay services are disabled by default. After you finish configuring the DHCP
relays and servers, you need to start those services.
Contents
This chapter covers ...
Configuring IPv4 DHCP Relays (see page 262)
Using DHCP Option 82 (see page 263)
Configuring IPv6 DHCP Relays (see page 263)
Configuring Multiple DHCP Relays (see page 264)
Configuring a DHCP Relay with VRR (see page 264)
Configuring the DHCP Relay Service Manually (Advanced) (see page 266)
Troubleshooting the DHCP Relays (see page 266)
Looking at the Log on Switch where DHCP Relay Is Configured (see page 267)
You configure a DHCP relay on a per-VLAN basis, specifying the SVI, not the parent bridge — in
our example, you would specify vlan1 as the SVI for VLAN 1; do not specify the bridge named
bridge in this case.
As per RFC 3046, you can specify as many server IP addresses as can fit in 255 octets, specifying
each address only once.
After you've finished configuring the DHCP relay, restart then enable the dhcrelay service so the
configuration persists between reboots:
To see the status of the DHCP relay, use the systemctl status dhcrelay.service command:
CGroup: /system.slice/dhcrelay.service
1997 /usr/sbin/dhcrelay --nl -d -q -i vlan1 -i swp51 -i
swp52 172.16.1.102
After you've finished configuring the DHCP relay, save your changes, restart the dhcrelay6 service, then
enable the dhcrelay6 service so the configuration persists between reboots:
To see the status of the IPv6 DHCP relay, use the systemctl status dhcrelay6.service command:
1. As the sudo user, open /etc/vrf/systemd.conf in a text editor, and remove dhcrelay.
2. Run the following command to reload the systemd files:
3. Create a config file in /etc/default using the following format for each dhcrelay:
isc-dhcp-relay-<dhcp-name>. An example file can be seen below:
4. Run the following command to start a dhcrelay instance, replacing dhcp-name with the instance
name or number:
...
auto bridge
iface bridge
bridge-vids 500
bridge-vlan-aware yes
auto vlan500
iface vlan500
address 192.0.2.252/24
address-virtual 00:00:5e:00:01:01 192.0.2.254/24
vlan-id 500
vlan-raw-device bridge
auto vlan500-v0
iface vlan500-v0
For example:
You can run the journalctl command with the --since flag to specify a time period:
For the configurations used in this chapter, the DHCP server is a switch running Cumulus Linux; however,
the DHCP server can also be located on a dedicated server in your environment.
The dhcpd and dhcrelay services are disabled by default. After you finish configuring the DHCP
relays and servers, you need to start those services.
Contents
This chapter covers ...
Configuring DHCP Server on Cumulus Linux Switches (see page 268)
Configuring the IPv4 DHCP Server (see page 268)
Configuring the IPv6 DHCP Server (see page 269)
Troubleshooting the Log from a DHCP Server (see page 270)
default-lease-time 600;
max-lease-time 7200;
Just as you did with the DHCP relay scripts, edit the DHCP server configuration file so it can launch the
DHCP server when the system boots. Here is a sample configuration:
INTERFACES="swp1"
After you've finished configuring the DHCP server, enable the dhcpd service immediately:
default-lease-time 600;
max-lease-time 7200;
subnet6 2001:db8:100::/64 {
}
subnet6 2001:db8:1::/64 {
range6 2001:db8:1::100 2001:db8:1::200;
}
Just as you did with the DHCP relay scripts, edit the DHCP server configuration file so it can launch the
DHCP server when the system boots. Here is a sample configuration:
INTERFACES="swp1"
After you've finished configuring the DHCP server, enable the dhcpd6 service immediately:
802.1X Interfaces
The IEEE 802.1X protocol provides a method of authenticating a client (called a supplicant) over wired
media. It also provides access for individual MAC addresses on a switch (called the authenticator) after those
MAC addresses have been authenticated by an authentication server — typically a RADIUS (see page 129)
(Remote Authentication Dial In User Service, defined by RFC 2865) server.
A Cumulus Linux switch acts as an intermediary between the clients connected to the wired ports and the
authentication server, which is reachable over the existing network. EAPOL (Extensible Authentication
Protocol (EAP) over LAN — EtherType value of 0x888E, defined by RFC 3748) operates on top of the data
link layer; it is used by the switch to communicate with supplicants connected to the switch ports.
Cumulus Linux implements 802.1X through the Debian hostapd package, which has been modified to
provide the PAE (port access entity).
Contents
This chapter covers ...
Supported Features and Limitations (see page 271)
Installing the 802.1X Package (see page 272)
Configuring 802.1X Interfaces (see page 272)
Configuring Accounting and Authentication Ports (see page 274)
Configuring MAC Authentication Bypass (see page 274)
Configuring a Parking VLAN (see page 274)
Verifying Connections from Linux Supplicants (see page 275)
Configuring Dynamic VLAN Assignments (see page 275)
Troubleshooting (see page 276)
Advanced Troubleshooting (see page 278)
Configuring the RADIUS Server (see page 279)
Changing the interface dot1x, dot1x mab, or dot1x parking-vlan settings does not
reset existing authorized user ports.
This has been tested with only a few wpa_supplicant (Debian) and Windows 7 supplicants.
RADIUS authentication is supported with FreeRADIUS and Cisco ACS.
Supports simple login/password, PEAP/MSCHAPv2 (Windows 7) and EAP-TLS (Debian).
1. Create a simple interface bridge configuration on the switch and add the switch ports that are
members of the bridge. You can use glob syntax to add a range of interfaces. The MAB and parking
VLAN configurations require interfaces to be bridge access ports.
2. Configure the settings for the 802.1X RADIUS server, including its IP address and shared secret:
3. Enable 802.1X on interfaces, then review and commit the new configuration:
These commands create the following configuration snippet in the /etc/network/interfaces file:
...
auto swp1
iface swp1
bridge-learning off
auto swp2
iface swp2
bridge-learning off
auto swp3
iface swp3
bridge-learning off
auto swp4
iface swp4
bridge-learning off
...
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4
bridge-vlan-aware yes
Verify the 802.1X configuration, showing the configuration and its status:
You can specify the require option in the command so that VLAN attributes are required. If VLAN
attributes do not exist in the access response packet returned from the RADIUS server, the user is not
authorized and has no connectivity. If the RADIUS server returns VLAN attributes but the user has an
incorrect password, the user is placed in the parking VLAN (if you have configured parking VLAN).
The following example shows a typical RADIUS configuration (shown for FreeRADIUS, not typically configured
or run on the Cumulus Linux device) for a user with dynamic VLAN assignment:
To disable dynamic VLAN assignment, where VLAN attributes sent from the RADIUS server are ignored and
users are authenticated based on existing credentials:
Enabling or disabling dynamic VLAN assignment restarts hostapd, which forces existing,
authorized users to re-authenticate.
Troubleshooting
To check connectivity between two supplicants, ping one host from the other:
You can run net show dot1x with the following options for more data:
dot1xAuthSessionTime 182
dot1xAuthSessionUserName testing
dot1xPaePortProtocolVersion 2
last_eap_type_as 4 (MD5)
last_eap_type_sta 4 (MD5)
...
Advanced Troubleshooting
More advanced troubleshooting can be accomplished with the following commands.
You can increase the debug level in hostapd by copying over the hostapd service file, then adding -d, -dd
or -ddd to the ExecStart line in the hostapd.service file:
...
Once installed and configured, the FreeRADIUS server can serve Cumulus Linux running hostapd as a
RADIUS client.
For more information, see the FreeRADIUS documentation.
Layer 1 and 2
Contents
This chapter covers ...
Supported Modes (see page 281)
STP for a VLAN-aware Bridge (see page 282)
STP within a Traditional Mode Bridge (see page 282)
Viewing Bridge and STP Status/Logs (see page 282)
Using Linux to Check Spanning Tree Status (Advanced) (see page 285)
Customizing Spanning Tree Protocol (see page 286)
Spanning Tree Priority (see page 286)
PortAdminEdge/PortFast Mode (see page 287)
PortAutoEdge (see page 288)
BPDU Guard (see page 288)
Bridge Assurance (see page 291)
BPDU Filter (see page 291)
Storm Control (see page 292)
Configuring Other Spanning Tree Parameters (see page 292)
Caveats and Errata (see page 295)
Related Information (see page 295)
Supported Modes
The STP modes Cumulus Linux supports vary depending upon whether the traditional or VLAN-aware
bridge driver mode (see page 319) is in use.
Bridges configured in VLAN-aware (see page 325) mode operate only in RSTP mode. NCLU (see page 82),
the network command line utility for configuring Cumulus Linux, only supports bridges in VLAN-aware
mode.
For a bridge configured in traditional mode, PVST and PVRST are supported, with the default set to PVRST.
Each traditional bridge has its own separate STP instance.
Since you cannot use NCLU to configure a traditional mode bridge, you must configure it directly in the
/etc/network/interfaces file.
As of version 3.2.1, STP is enabled by default in Cumulus Linux. There is no need to specify
bridge-stp on for the bridge any more.
When connected to a switch that has a native VLAN configuration, the native VLAN must be
configured to be VLAN 1 only for maximum interoperability.
enabled no role Disabled
port id 8.004 state discarding
external port cost 305 admin external cost 0
internal port cost 305 admin internal cost 0
designated root 1.000.44:38:39:00:00:27 dsgn external cost 0
dsgn regional root 1.000.44:38:39:00:00:27 dsgn internal cost 0
designated bridge 1.000.44:38:39:00:00:27 designated port 8.004
admin edge port no auto edge port yes
oper edge port no topology change ack no
point-to-point yes admin point-to-point auto
restricted role no restricted TCN no
port hello time 2 disputed no
bpdu guard port no bpdu guard error no
network port no BA inconsistent no
Num TX BPDU 2 Num TX TCN 0
Num RX BPDU 0 Num RX TCN 0
Num Transition FWD 0 Num Transition BLK 2
bpdufilter port no
clag ISL no clag ISL Oper UP no
clag role primary clag dual conn mac 00:00:00:00:00:00
clag remote portID F.FFF clag system mac 44:38:39:FF:40:90
bridge:leaf01-02 CIST info
enabled yes role Designated
port id 8.003 state forwarding
external port cost 10000 admin external cost 0
internal port cost 10000 admin internal cost 0
designated root 1.000.44:38:39:FF:40:90 dsgn external cost 0
dsgn regional root 1.000.44:38:39:FF:40:90 dsgn internal cost 0
designated bridge 1.000.44:38:39:FF:40:90 designated port 8.003
admin edge port no auto edge port yes
oper edge port no topology change ack no
point-to-point yes admin point-to-point auto
restricted role no restricted TCN no
port hello time 2 disputed no
bpdu guard port no bpdu guard error no
network port no BA inconsistent no
Num TX BPDU 253558 Num TX TCN 2
Num RX BPDU 253373 Num RX TCN 4
Num Transition FWD 126675 Num Transition BLK 126694
bpdufilter port no
clag ISL no clag ISL Oper UP no
clag role primary clag dual conn mac 44:38:39:FF:40:94
mstpd is the preferred utility for interacting with STP on Cumulus Linux. brctl also provides
certain methods for configuring STP; however, they are not as complete as the tools offered in
mstpd and output from brctl can be misleading in some cases.
PortAdminEdge/PortFast Mode
PortAdminEdge is equivalent to the PortFast feature offered by other vendors. It enables or disables the
initial edge state of a port in a bridge.
All ports configured with PortAdminEdge bypass the listening and learning states to move immediately to
forwarding.
Using PortAdminEdge mode has the potential to cause loops if it is not accompanied by the
BPDU guard (see page 288) feature.
While it is common for edge ports to be configured as access ports for a simple end host, this is not
mandatory. In the data center, edge ports typically connect to servers, which may pass both tagged and
untagged traffic.
auto swp5
iface swp5
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto br2
iface br2 inet static
PortAutoEdge
PortAutoEdge is an enhancement to the standard PortAdminEdge (PortFast) mode, which allows for the
automatic detection of edge ports. PortAutoEdge enables and disables the auto transition to/from the edge
state of a port in a bridge.
Edge ports and access ports are not the same thing. Edge ports transition directly to the
forwarding state and skip the listening and learning stages. Upstream topology change
notifications are not generated when an edge port's link changes state. Access ports only forward
untagged traffic; however, there is no such restriction on edge ports, which can forward both
tagged and untagged traffic.
When a BPDU is received on a port configured with portautoedge, the port ceases to be in the edge port
state and transitions into a normal STP port.
When BPDUs are no longer received on the interface, the port becomes an edge port, and transitions
through the discarding and learning states before resuming forwarding.
To configure PortAutoEdge for an interface:
BPDU Guard
To protect the spanning tree topology from unauthorized switches affecting the forwarding path, you can
configure BPDU (Bridge Protocol Data Unit) guard. One very common example is when someone hooks up
a new switch to an access port off of a leaf switch. If this new switch is configured with a low priority, it could
become the new root switch and affect the forwarding path for the entire layer 2 topology.
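The risk is easier to see with a small sketch of root election. In standard spanning tree, the bridge with the numerically lowest bridge ID (priority first, then MAC address as the tie-breaker) becomes the root. The function name and the sample priorities and MAC addresses below are illustrative assumptions, not output from a switch.

```python
def elect_root(bridges):
    """Return the (priority, mac) tuple that wins the root election.

    bridges is an iterable of (priority, mac) tuples, with mac written
    as colon-separated hex. The lowest priority wins; the lowest MAC
    address breaks ties.
    """
    def bridge_id(bridge):
        priority, mac = bridge
        return (priority, bytes.fromhex(mac.replace(":", "")))

    return min(bridges, key=bridge_id)


# Two existing switches with the default priority of 32768.
existing = [(32768, "44:38:39:00:00:27"), (32768, "44:38:39:ff:40:90")]
# A newly attached switch configured with a low priority takes over as root.
rogue = (4096, "44:38:39:ff:ff:ff")
```

With only the `existing` bridges, the one with the lower MAC address is the root; adding `rogue` changes the result, which is the situation BPDU guard is meant to prevent.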
auto swp5
iface swp5
mstpctl-bpduguard yes
To determine whether BPDU guard is configured, or if a BPDU has been received, run:
The only way to recover a port that has been placed in the disabled state is to manually un-shut or bring up
the port with sudo ifup [port], as shown in the example below.
Bringing up the disabled port does not fix the problem if the configuration on the connected end-
station has not been rectified.
Bridge Assurance
On a point-to-point link where RSTP is running, if you want to detect unidirectional links and put the port in
a discarding state (in error), you can enable bridge assurance on the port by setting its port type to network.
The port remains in a bridge assurance inconsistent state until a BPDU is received from the peer. You
need to configure the port type network on both ends of the link for bridge assurance to
operate properly.
The default setting for bridge assurance is off. This means that there is no difference between disabling
bridge assurance on an interface and not configuring bridge assurance on an interface.
auto swp1
iface swp1
mstpctl-portnetwork yes
You can monitor logs for bridge assurance messages by doing the following:
BPDU Filter
You can enable bpdufilter on a switch port, which filters BPDUs in both directions. This effectively
disables STP on the port as no BPDUs are transiting.
Using BPDU filter inappropriately can cause layer 2 loops. Use this feature deliberately and with
extreme caution.
auto swp6
iface swp6
mstpctl-portbpdufilter yes
Storm Control
Storm control provides protection against excessive inbound BUM (broadcast, unknown unicast, multicast)
traffic on layer 2 switch port interfaces, which can cause poor network performance.
You configure storm control for each physical port by editing /etc/cumulus/switchd.conf. The
configuration persists across reboots and restarting switchd. If you change the storm control
configuration in this file after rebooting the switch, you must restart switchd (see page 190) to activate the
new configuration.
To enable broadcast and multicast storm control at 400 packets per second (pps) and 3000 pps,
respectively, for swp1 with /etc/cumulus/switchd.conf:
You configure these parameters using NCLU on the interfaces, not the bridge itself. Most of these
parameters are blacklisted in netd.conf in the ifupdown_blacklist; blacklisted parameters are
indicated with an asterisk (*). You can edit the blacklist (see page 91) to remove any of them.
mstpctl-maxage    maxage*
    Sets the bridge's maximum age to <max_age> seconds. The default is 20. The maximum age must
    meet the condition 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age.

mstpctl-ageing    ageing*
    Sets the Ethernet (MAC) address ageing time in <time> seconds for the bridge when the running
    version is STP, but not RSTP/MSTP. The default is 1800 seconds.

mstpctl-fdelay    fdelay*
    Sets the bridge's bridge forward delay to <time> seconds. The default is 15. The bridge forward
    delay must meet the condition 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age.

mstpctl-maxhops    maxhops*
    Sets the bridge's maximum hops to <max_hops>. The default value is 20.

mstpctl-forcevers    forcevers*
    Sets the bridge's force STP version to either RSTP/STP. MSTP is not supported currently. The
    default is RSTP.

mstpctl-treeprio    treeprio*
    Sets the bridge's tree priority to <priority> for an MSTI instance. The priority value is a
    number between 0 and 65535 and must be a multiple of 4096. The bridge with the lowest priority
    is elected the root bridge. The default is 32768.

mstpctl-treeportprio    treeportprio*
    Sets the priority of port <port> to <priority> for the MSTI instance. The priority value is a
    number between 0 and 240 and must be a multiple of 16. The default is 128.

mstpctl-hello    hello*
    Sets the bridge's bridge hello time to <time> seconds. The default is 2.

mstpctl-portpathcost    portpathcost*
    Sets the port cost of the port <port> in bridge <bridge> to <cost>. The default is 0.
    mstpd supports only long mode; that is, 32 bits for the path cost.

mstpctl-portadminedge    portadminedge
    Enables/disables the initial edge state of the port <port> in bridge <bridge>. The default is
    no.

mstpctl-portautoedge    portautoedge
    Enables/disables the auto transition to/from the edge state of the port <port> in bridge
    <bridge>. The default is yes.
    portautoedge is an enhancement to the standard PortAdminEdge (PortFast) mode, which allows for
    the automatic detection of edge ports.
    Edge ports and access ports are not the same thing. Edge ports transition directly to the
    forwarding state and skip the listening and learning stages. Upstream topology change
    notifications are not generated when an edge port's link changes state. Access ports only
    forward untagged traffic; however, there is no such restriction on edge ports, which can
    forward both tagged and untagged traffic.

mstpctl-portbpdufilter    portbpdufilter
    Enables/disables the BPDU filter functionality for a port <port> in bridge <bridge>. The
    default is no.

mstpctl-treeportcost    treeportcost*
    Sets the spanning tree port cost to a value from 0 to 255. The default is 0.
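Several of the parameters above carry numeric constraints. The Python sketch below checks candidate values against the conditions stated in the table before you commit them to a configuration; the function names are hypothetical, and the checks mirror only what the table says rather than the validation mstpd itself performs.

```python
def timers_are_valid(max_age, forward_delay):
    """Check 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age."""
    return 2 * (forward_delay - 1) >= max_age


def tree_priority_is_valid(priority):
    """Bridge priority: between 0 and 65535 and a multiple of 4096."""
    return 0 <= priority <= 65535 and priority % 4096 == 0


def port_priority_is_valid(priority):
    """Port priority: between 0 and 240 and a multiple of 16."""
    return 0 <= priority <= 240 and priority % 16 == 0
```

With the defaults from the table (maximum age 20, forward delay 15), `timers_are_valid(20, 15)` holds because 2 * (15 - 1) = 28 is at least 20. Lowering the forward delay to 10 while keeping the maximum age at 20 would violate the condition.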
Related Information
The source code for mstpd/mstpctl was written by Vitalii Demianets and is hosted at the URL below.
Sourceforge - mstpd project
Wikipedia - Spanning Tree Protocol
brctl(8)
bridge-utils-interfaces(5)
ifupdown-addons-interfaces(5)
mstpctl(8)
mstpctl-utils-interfaces(5)
For more details on the command line arguments and config options, please see man lldpd(8).
lldpd supports CDP (Cisco Discovery Protocol, v1 and v2). lldpd logs by default into
/var/log/daemon.log with an lldpd prefix.
lldpcli is the CLI tool to query the lldpd daemon for neighbors, statistics and other running
configuration information. See man lldpcli(8) for details.
Contents
This chapter covers ...
Configuring LLDP (see page 296)
Example lldpcli Commands (see page 296)
Enabling the SNMP Subagent in LLDP (see page 301)
Caveats and Errata (see page 301)
Related Information (see page 301)
Configuring LLDP
You configure lldpd settings in /etc/lldpd.conf or /etc/lldpd.d/.
Here is an example persistent configuration:
Capability: Router, on
Port:
PortID: ifname swp1
PortDescr: swp1
-------------------------------------------------------------------------------
Interface: swp4, via: LLDP, RID: 11, Time: 0 day, 17:08:27
Chassis:
ChassisID: mac 00:01:00:00:0a:00
SysName: MSP-2
SysDescr: Cumulus Linux version 3.0.0 running on QEMU
Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.10
MgmtIP: fe80::201:ff:fe00:a00
Capability: Bridge, off
Capability: Router, on
Port:
PortID: ifname swp2
PortDescr: swp2
-------------------------------------------------------------------------------
Interface: swp49s1, via: LLDP, RID: 9, Time: 0 day, 16:55:00
Chassis:
ChassisID: mac 00:01:00:00:0c:00
SysName: TORC-1-2
SysDescr: Cumulus Linux version 3.0.0 running on QEMU
Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.12
MgmtIP: fe80::201:ff:fe00:c00
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname swp6
PortDescr: swp6
-------------------------------------------------------------------------------
Interface: swp49s0, via: LLDP, RID: 9, Time: 0 day, 16:55:00
Chassis:
ChassisID: mac 00:01:00:00:0c:00
SysName: TORC-1-2
SysDescr: Cumulus Linux version 3.0.0 running on QEMU
Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.12
MgmtIP: fe80::201:ff:fe00:c00
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname swp5
PortDescr: swp5
-------------------------------------------------------------------------------
---------------------------------------------------------------------
Summary of stats:
Transmitted: 648186
Received: 437557
Discarded: 0
Unrecognized: 0
Ageout: 10
Inserted: 38
Deleted: 10
A runtime configuration does not persist when you reboot the switch — all changes are lost.
The active interface list always overrides the inactive interface list.
Related Information
GitHub - lldpd project
Wikipedia - Link Layer Discovery Protocol
Contents
This chapter covers ...
Supported Features (see page 302)
Configuring PTM (see page 303)
Basic Topology Example (see page 303)
ptmd Scripts (see page 304)
Configuration Parameters (see page 304)
Host-only Parameters (see page 304)
Global Parameters (see page 305)
Per-port Parameters (see page 305)
Templates (see page 306)
Supported BFD and LLDP Parameters (see page 306)
Bidirectional Forwarding Detection (BFD) (see page 307)
Checking Link State with FRRouting (see page 308)
Using ptmd Service Commands (see page 308)
Using ptmctl Commands (see page 309)
ptmctl Examples (see page 309)
ptmctl Error Outputs (see page 311)
Caveats and Errata (see page 312)
Related Information (see page 312)
Supported Features
Topology verification using LLDP. ptmd creates a client connection to the LLDP daemon, lldpd, and
retrieves the neighbor relationship between the nodes/ports in the network and compares them
against the prescribed topology specified in the topology.dot file.
Only physical interfaces, like swp1 or eth0, are currently supported. Cumulus Linux does not
support specifying virtual interfaces like bonds or subinterfaces like eth0.200 in the topology file.
Forwarding path failure detection using Bidirectional Forwarding Detection (BFD); however, demand
mode is not supported. For more information on how BFD operates in Cumulus Linux, read the
Bidirectional Forwarding Detection - BFD (see page 669) chapter and read man ptmd(8).
Integration with FRRouting (PTM to FRRouting notification).
Client management: ptmd creates an abstract named socket /var/run/ptmd.socket on startup.
Other applications can connect to this socket to receive notifications and send commands.
Event notifications: see Scripts below.
User configuration via a topology.dot file; see below (see page 303).
Configuring PTM
ptmd verifies the physical network topology against a DOT-specified network graph file,
/etc/ptm.d/topology.dot.
This file must be present or else ptmd will not start. You can specify an alternate file using the -c
option.
PTM performs its LLDP neighbor check using the PortID ifname TLV information. Previously, it
used the PortID port description TLV information.
graph G {
"spine1":"swp1" -- "leaf1":"swp1";
"spine1":"swp2" -- "leaf2":"swp1";
"spine2":"swp1" -- "leaf1":"swp2";
"spine2":"swp2" -- "leaf2":"swp2";
"leaf1":"swp3" -- "leaf2":"swp3";
"leaf1":"swp4" -- "leaf2":"swp4";
"leaf1":"swp5s0" -- "server1":"eth1";
"leaf2":"swp5s0" -- "server2":"eth1";
}
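To see what such a file prescribes for one switch, the following Python sketch extracts the expected neighbor of each local port from the graph text. It is a simplified illustration, not how ptmd parses the file: the regular expression handles only quoted "host":"port" pairs joined by --, and the names EDGE_RE, TOPOLOGY and expected_neighbors are invented for the example.

```python
import re

# Matches edges of the form "hostA":"portA" -- "hostB":"portB"
EDGE_RE = re.compile(r'"([^"]+)":"([^"]+)"\s*--\s*"([^"]+)":"([^"]+)"')

TOPOLOGY = '''
graph G {
    "spine1":"swp1" -- "leaf1":"swp1";
    "spine1":"swp2" -- "leaf2":"swp1";
    "spine2":"swp1" -- "leaf1":"swp2";
    "spine2":"swp2" -- "leaf2":"swp2";
    "leaf1":"swp3" -- "leaf2":"swp3";
    "leaf1":"swp4" -- "leaf2":"swp4";
    "leaf1":"swp5s0" -- "server1":"eth1";
    "leaf2":"swp5s0" -- "server2":"eth1";
}
'''


def expected_neighbors(dot_text, hostname):
    """Map each local port of hostname to its prescribed (neighbor, port)."""
    expected = {}
    for a_host, a_port, b_host, b_port in EDGE_RE.findall(dot_text):
        if a_host == hostname:
            expected[a_port] = (b_host, b_port)
        if b_host == hostname:
            expected[b_port] = (a_host, a_port)
    return expected
```

Calling `expected_neighbors(TOPOLOGY, "leaf1")` yields one entry per cabled port on leaf1, which is the information that gets compared against what LLDP reports on each interface.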
ptmd Scripts
ptmd executes scripts at /etc/ptm.d/if-topo-pass and /etc/ptm.d/if-topo-fail for each
interface that goes through a change, running if-topo-pass when an LLDP or BFD check passes and
running if-topo-fail when the check fails. The scripts receive an argument string that is the result of
the ptmctl command, described in the ptmd commands section below (see page 308).
You should modify these default scripts as needed.
Configuration Parameters
You can configure ptmd parameters in the topology file. The parameters are classified as host-only, global,
per-port/node and templates.
Host-only Parameters
Host-only parameters apply to the entire host on which PTM is running. You can include the hostnametype
host-only parameter, which specifies whether PTM should use only the host name (hostname) or the
fully-qualified domain name (fqdn) while looking for the self-node in the graph file. For example, in the graph
file below, PTM will ignore the FQDN and only look for switch04, since that is the host name of the switch it's
running on:
It’s a good idea to always wrap the hostname in double quotes, like "www.example.com".
Otherwise, ptmd can fail if you specify a fully-qualified domain name as the hostname and do not
wrap it in double quotes.
Further, to avoid errors when starting the ptmd process, make sure that /etc/hosts and
/etc/hostname both reflect the hostname you are using in the topology.dot file.
graph G {
hostnametype="hostname"
BFD="upMinTx=150,requiredMinRx=250"
"cumulus":"swp44" -- "switch04.cumulusnetworks.com":"swp20"
"cumulus":"swp46" -- "switch04.cumulusnetworks.com":"swp22"
}
However, in this next example, PTM will compare using the FQDN and look for
switch05.cumulusnetworks.com, which is the FQDN of the switch it’s running on:
graph G {
hostnametype="fqdn"
"cumulus":"swp44" -- "switch05.cumulusnetworks.com":"swp20"
"cumulus":"swp46" -- "switch05.cumulusnetworks.com":"swp22"
}
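The difference between the two settings can be sketched as follows. This is an assumption-laden illustration of the behavior described above, not ptmd's actual matching code; the function name and its arguments are hypothetical.

```python
def is_self_node(node_name, local_fqdn, hostnametype):
    """Decide whether a node in the graph file refers to the local switch.

    node_name    -- the node as written in topology.dot
    local_fqdn   -- the fully-qualified domain name of the local switch
    hostnametype -- "hostname" or "fqdn", as set in the graph file
    """
    if hostnametype == "fqdn":
        # Compare the complete fully-qualified domain name.
        return node_name == local_fqdn
    # Assumption: in "hostname" mode only the first label is compared
    # and everything after the first dot is ignored.
    return node_name.split(".")[0] == local_fqdn.split(".")[0]
```

With hostnametype set to hostname, the graph node switch04.cumulusnetworks.com matches any local name whose first label is switch04. With fqdn, the complete names must be identical.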
Global Parameters
Global parameters apply to every port listed in the topology file. There are two global parameters: LLDP and
BFD. LLDP is enabled by default; if no keyword is present, default values are used for all ports. However,
BFD is disabled if no keyword is present, unless there is a per-port override configured. For example:
graph G {
LLDP=""
BFD="upMinTx=150,requiredMinRx=250,afi=both"
"cumulus":"swp44" -- "qct-ly2-04":"swp20"
"cumulus":"swp46" -- "qct-ly2-04":"swp22"
}
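The value of a keyword such as BFD is a comma-separated list of name=value pairs. A minimal Python sketch of splitting such a string into a dictionary is shown below; parse_params is a hypothetical helper for illustration, and converting purely numeric values to integers is a convenience chosen for the example.

```python
def parse_params(param_string):
    """Split 'name=value,name=value' into a dict.

    Values made up only of digits become integers; an empty string
    (as in LLDP="") yields an empty dict, meaning defaults apply.
    """
    params = {}
    for item in param_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        params[key.strip()] = int(value) if value.isdigit() else value.strip()
    return params
```

For the global BFD line in the example above, this returns `{"upMinTx": 150, "requiredMinRx": 250, "afi": "both"}`.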
Per-port Parameters
Per-port parameters provide finer-grained control at the port level. These parameters override any global or
compiled defaults. For example:
graph G {
LLDP=""
BFD="upMinTx=300,requiredMinRx=100"
Templates
Templates provide flexibility in choosing different parameter combinations and applying them to a given
port. A template instructs ptmd to reference a named parameter string instead of a default one. There are
two parameter strings ptmd supports:
bfdtmpl, which specifies a custom parameter tuple for BFD.
lldptmpl, which specifies a custom parameter tuple for LLDP.
For example:
graph G {
LLDP=""
BFD="upMinTx=300,requiredMinRx=100"
BFD1="upMinTx=200,requiredMinRx=200"
BFD2="upMinTx=100,requiredMinRx=300"
LLDP1="match_type=ifname"
LLDP2="match_type=portdescr"
"cumulus":"swp44" -- "qct-ly2-04":"swp20" [BFD="bfdtmpl=BFD1", LLDP="lldptmpl=LLDP1"]
"cumulus":"swp46" -- "qct-ly2-04":"swp22" [BFD="bfdtmpl=BFD2", LLDP="lldptmpl=LLDP2"]
"cumulus":"swp46" -- "qct-ly2-04":"swp22"
}
In this template, LLDP1 and LLDP2 are templates for LLDP parameters while BFD1 and BFD2 are templates
for BFD parameters.
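Template lookup can be pictured as one level of indirection: a port attribute such as bfdtmpl=BFD1 names a graph-level string, and that string supplies the parameters. The Python sketch below illustrates the idea under that assumption; the helper names and the dictionary of graph attributes are invented for the example and do not reflect ptmd internals.

```python
def split_pairs(param_string):
    """Turn 'a=1,b=2' into {'a': '1', 'b': '2'} (values stay strings)."""
    pairs = (item.strip() for item in param_string.split(","))
    return dict(item.split("=", 1) for item in pairs if item)


def resolve_port_params(port_value, graph_attrs, tmpl_key):
    """Return the parameters that apply to one port.

    port_value  -- the per-port string, e.g. 'bfdtmpl=BFD1'
    graph_attrs -- graph-level strings keyed by name, e.g. {'BFD1': '...'}
    tmpl_key    -- 'bfdtmpl' or 'lldptmpl'
    """
    params = split_pairs(port_value)
    if tmpl_key in params:
        # The port references a named template: use that string instead.
        return split_pairs(graph_attrs[params[tmpl_key]])
    return params


graph_attrs = {
    "BFD1": "upMinTx=200,requiredMinRx=200",
    "BFD2": "upMinTx=100,requiredMinRx=300",
    "LLDP1": "match_type=ifname",
    "LLDP2": "match_type=portdescr",
}
```

Here `resolve_port_params("bfdtmpl=BFD2", graph_attrs, "bfdtmpl")` picks up the BFD2 values, while a port that lists parameters directly (for example `detectMult=4`) is returned as written.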
graph G {
"cumulus-1":"swp44" -- "cumulus-2":"swp20" [BFD="upMinTx=300,requiredMinRx=100,afi=v6"]
"cumulus-1":"swp46" -- "cumulus-2":"swp22" [BFD="detectMult=4"]
}
graph G {
"cumulus-1":"swp44" -- "cumulus-2":"swp20" [LLDP="match_hostname=fqdn"]
"cumulus-1":"swp46" -- "cumulus-2":"swp22" [LLDP="match_type=portdescr"]
}
When you specify match_hostname=fqdn, ptmd will match the entire FQDN, like
cumulus-2.domain.com in the example below. If you do not specify anything for match_hostname, ptmd will
match based on hostname only, like cumulus-3 below, and ignore the rest of the domain name:
graph G {
"cumulus-1":"swp44" -- "cumulus-2.domain.com":"swp20" [LLDP="match_hostname=fqdn"]
"cumulus-1":"swp46" -- "cumulus-3":"swp22" [LLDP="match_type=portdescr"]
}
You only need to do this to check link state; you don't need to enable PTM to determine BFD
status.
The check is enabled by default. Every interface has an implied ptm-enable line in the configuration
stanza in the interfaces file.
To disable the checks, delete the ptm-enable parameter from the interface. For example:
With PTM enabled on an interface, the zebra daemon connects to ptmd over a Unix socket. Any time there
is a change of status for an interface, ptmd sends notifications to zebra. Zebra maintains a ptm-status
flag per interface and evaluates routing adjacency based on this flag. To check the per-interface
ptm-status:
ptmctl Examples
The examples below contain the following keywords in the output of the cbl status column, which are
described here:
cbl status   Definition
keyword
pass         The interface is defined in the topology file, LLDP information is received on the
             interface, and the LLDP information for the interface matches the information in the
             topology file.
fail         The interface is defined in the topology file, LLDP information is received on the
             interface, and the LLDP information for the interface does not match the information
             in the topology file.
N/A          The interface is defined in the topology file, but no LLDP information is received on
             the interface. The interface may be down or disconnected, or the neighbor is not
             sending LLDP packets.
The "N/A" and "fail" statuses may indicate a wiring problem to investigate.
The "N/A" status is not shown when using the -l option with ptmctl. If you specify the -l
option, ptmctl displays only those interfaces that are receiving LLDP information.
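The three keywords amount to a simple decision, sketched below in Python. The function and its
arguments are hypothetical and only mirror the definitions in the table; they are not part of ptmctl
or ptmd.

```python
def cbl_status(expected_neighbor, lldp_neighbor):
    """Classify an interface that is defined in the topology file.

    expected_neighbor -- (host, port) taken from the topology file
    lldp_neighbor     -- (host, port) learned through LLDP, or None if no
                         LLDP information is received on the interface
    """
    if lldp_neighbor is None:
        return "N/A"
    return "pass" if lldp_neighbor == expected_neighbor else "fail"


print(cbl_status(("h1", "swp1"), ("h1", "swp1")))
print(cbl_status(("h1", "swp1"), ("h2", "swp1")))
print(cbl_status(("h1", "swp1"), None))
```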
-------------------------------------------------------------
port  cbl     BFD     BFD       BFD    BFD
      status  status  peer      local  type
-------------------------------------------------------------
swp1  pass    pass    11.0.0.2  N/A    singlehop
swp2  pass    N/A     N/A       N/A    N/A
swp3  pass    N/A     N/A       N/A    N/A
--------------------------------------------------------------------------------------
port   cbl     exp      act      sysname  portID  portDescr  match   last    BFD   BFD
       status  nbr      nbr                                  on      upd     Type  state
--------------------------------------------------------------------------------------
swp45  pass    h1:swp1  h1:swp1  h1       swp1    swp1       IfName  5m: 5s  N/A   N/A
swp46  fail    h2:swp1  h2:swp1  h2       swp1    swp1       IfName  5m: 5s  N/A   N/A
To return information on active BFD sessions ptmd is tracking, use the -b option:
----------------------------------------------------------
port peer state local type diag
----------------------------------------------------------
swp1 11.0.0.2 Up N/A singlehop N/A
N/A 12.12.12.1 Up 12.12.12.4 multihop N/A
To return LLDP information, use the -l option. It returns only the active neighbors currently being tracked
by ptmd.
---------------------------------------------
To return detailed information on active BFD sessions ptmd is tracking, use the -b and -d options (results
are for an IPv6-connected peer):
----------------------------------------------------------------------------------------
port  peer                 state  local  type       diag  det   tx_timeout  rx_timeout
                                                          mult
----------------------------------------------------------------------------------------
swp1  fe80::202:ff:fe00:1  Up     N/A    singlehop  N/A   3     300         900
swp1  3101:abc:bcad::2     Up     N/A    singlehop  N/A   3     300         900
#continuation of output
---------------------------------------------------------------------
echo echo max rx_ctrl tx_ctrl rx_echo tx_echo
tx_timeout rx_timeout hop_cnt
---------------------------------------------------------------------
0 0 N/A 187172 185986 0 0
0 0 N/A 501 533 0 0
Unsupported command
For example:
If you encounter errors with the topology.dot file, you can use dot (included in the Graphviz
package) to validate the syntax of the topology file.
By simply opening the topology file with Graphviz, you can ensure that it is readable and that the
file format is correct.
If you edit the topology.dot file from a Windows system, be sure to double-check the file
formatting; there may be extra characters that keep the graph from working correctly.
Related Information
Bidirectional Forwarding Detection (BFD)
Graphviz
312 02 March 2018
Cumulus Networks
LLDP on Wikipedia
PTMd GitHub repo
Contents
This chapter covers ...
Hash Distribution (see page 313)
Creating a Bond (see page 314)
Configuration Options (see page 314)
Enabling balance-xor Mode (see page 316)
Example Configuration: Bonding 4 Slaves (see page 317)
Caveats and Errata (see page 319)
Related Information (see page 319)
Hash Distribution
Egress traffic through a bond is distributed to a slave based on a packet hash calculation, providing load
balancing over the slaves; many conversation flows are distributed over all available slaves to load balance
the total traffic. Traffic for a single conversation flow always hashes to the same slave.
The hash calculation uses packet header data to pick which slave to transmit the packet to:
For IP traffic, IP header source and destination fields are used in the calculation.
For IP + TCP/UDP traffic, source and destination ports are included in the hash calculation.
In a failover event, the hash calculation is adjusted to steer traffic over available slaves.
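The behavior described above can be illustrated with a small Python sketch. It is only a model:
CRC32 stands in for the real hash function, and the function name and the sample addresses are
assumptions, not the algorithm used by the bonding driver or the switch hardware.

```python
import zlib


def pick_slave(src_ip, dst_ip, src_port, dst_port, active_slaves):
    """Pick an egress slave for one flow from the list of active slaves."""
    if not active_slaves:
        raise ValueError("bond has no active slaves")
    # Build the hash key from the header fields named above.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    # CRC32 is only a stand-in for the real hash function.
    return active_slaves[zlib.crc32(key) % len(active_slaves)]


slaves = ["swp1", "swp2", "swp3", "swp4"]
flow = ("10.0.0.5", "10.0.1.9", 49152, 443)
first = pick_slave(*flow, slaves)
# The same flow always maps to the same slave.
assert first == pick_slave(*flow, slaves)
# After a failover, the flow is hashed over the remaining slaves only.
remaining = [s for s in slaves if s != first]
print(first, pick_slave(*flow, remaining))
```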
Creating a Bond
Bonds can be created and configured using the Network Command Line Utility (NCLU (see page 82)).
Follow the steps below to create a new bond:
Configuration Options
The configuration options, and their default values, are listed in the table below.
Each bond configuration option, except for bond slaves, is set to the recommended value by
default in Cumulus Linux. Configure an option only if you need a different setting. For
more information on configuration values, refer to the Related Information section below.
Option                 Description                                                      Default
bond miimon            Defines how often the link state of each slave is inspected      100
                       for failures.
bond downdelay         Specifies the time, in milliseconds, to wait before disabling    0
                       a slave after a link failure has been detected. This option is
                       only valid for the miimon link monitor. The downdelay value
                       should be a multiple of the miimon value; if not, it will be
                       rounded down to the nearest multiple.
bond updelay           Specifies the time, in milliseconds, to wait before enabling     0
                       a slave after a link recovery has been detected. This option
                       is only valid for the miimon link monitor. The updelay value
                       should be a multiple of the miimon value; if not, it will be
                       rounded down to the nearest multiple.
bond xmit-hash-policy  Hash method used to select the slave for a given packet.         layer3+4
bond lacp-rate         Sets the rate to ask the link partner to transmit LACP control   1
                       packets.
                       You can set the LACP rate to slow using NCLU (see page 82):
bond min-links         Defines the minimum number of links that must be active before   1
                       the bond is put into service.
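The rounding rule for downdelay and updelay is plain integer arithmetic. The Python sketch below
shows it; the function name is hypothetical.

```python
def effective_delay(delay_ms, miimon_ms):
    """Round a downdelay or updelay value down to a multiple of miimon."""
    if miimon_ms <= 0:
        raise ValueError("miimon must be a positive number of milliseconds")
    return (delay_ms // miimon_ms) * miimon_ms


print(effective_delay(250, 100))  # 200
print(effective_delay(300, 100))  # 300
```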
You should use balance-xor mode only if you cannot use LACP for some reason, as LACP can
detect mismatched link attributes among bond members and can even detect misconnections.
In order to change the mode of an existing bond, you must first delete the bond, then recreate it with the
balance-xor mode. Assuming the bond doesn't exist on the host, you can configure it as follows:
auto bond1
iface bond1
bond-mode balance-xor
bond-slaves swp3 swp4
Bond Details
--------------- -------------
Bond Mode: Balance-XOR
Load Balancing: Layer3+4
Minimum Links: 1
In CLAG: CLAG Inactive
UP swp3(P) 10G 0 0 0 0
UP swp4(P) 10G 0 0 0 0
LLDP
------- ---- ------------
swp3(P) ==== swp1(p1c1h1)
swp4(P) ==== swp2(p1c1h1)
Routing
-------
Interface bond1 is up, line protocol is up
Link ups: 3 last: 2017/04/26 21:00:38.26
Link downs: 2 last: 2017/04/26 20:59:56.78
PTM status: disabled
vrf: Default-IP-Routing-Table
index 31 metric 0 mtu 1500
flags: <UP,BROADCAST,RUNNING,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:12
inet6 fe80::202:ff:fe00:12/64
Interface Type Other
auto bond0
iface bond0
address 10.0.0.1/30
bond-slaves swp1 swp2 swp3 swp4
If the bond is going to become part of a bridge, you don't need to specify an
IP address.
When networking is started on the switch, bond0 is created as MASTER and interfaces swp1-swp4 come
up in SLAVE mode, as seen in the ip link show command:
...
All slave interfaces within a bond have the same MAC address as the bond. Typically, the first
slave added to the bond donates its MAC address as the bond MAC address, while the other
slaves’ MAC addresses are set to the bond MAC address.
The bond MAC address is used as source MAC address for all traffic leaving the bond, and
provides a single destination MAC address to address traffic to the bond.
Related Information
Linux Foundation - Bonding
802.3ad (Accessible writeup)
Wikipedia - Link aggregation
Bridge members can be individual physical interfaces, bonds or logical interfaces that traverse an
802.1Q VLAN trunk.
Cumulus Networks recommends using VLAN-aware mode (see page 325) bridges, rather than
traditional mode bridges. The bridge driver in Cumulus Linux is capable of VLAN filtering, which
allows for configurations that are similar to incumbent network devices. Cumulus Linux still
supports Ethernet bridges in traditional mode.
For a comparison of traditional and VLAN-aware modes, read this knowledge base article.
Cumulus Linux does not put all ports into a bridge by default.
You can configure both VLAN-aware and traditional mode bridges on the same network in
Cumulus Linux; however you cannot have more than one VLAN-aware bridge on a given switch.
Contents
This chapter covers ...
Creating a VLAN-aware Bridge (see page 320)
Creating a Traditional Mode Bridge (see page 320)
Configuring Bridge MAC Addresses (see page 320)
MAC Address Ageing (see page 321)
Configuring an SVI (Switch VLAN Interface) (see page 321)
Keeping the SVI in an UP State (see page 323)
Caveats and Errata (see page 324)
Related Information (see page 324)
The following example output shows a MAC address table for the bridge:
...
auto bridge
iface bridge
bridge-ageing 600
...
When an interface is added to a bridge, it ceases to function as a router interface, and the IP
address on the interface, if any, becomes unreachable.
These commands create the following SVI configuration in the /etc/network/interfaces file:
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-vids 10
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.100.100.1/24
vlan-id 10
vlan-raw-device bridge
Notice the vlan-raw-device keyword, which NCLU includes automatically. NCLU uses this
keyword to associate the SVI with the VLAN-aware bridge.
Alternatively, you can use the bridge.VLAN-ID naming convention for the SVI. The following example
configuration, which you can create manually in the /etc/network/interfaces file, functions
identically to the above configuration:
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-vids 10
bridge-vlan-aware yes
auto bridge.10
iface bridge.10
address 10.100.100.1/24
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports swp3
bridge-vids 100
bridge-pvid 1
...
With this configuration, when swp3 is down, the SVI is also down:
1. Create a dummy interface, and add it to the bridge configuration. You do this by editing the
/etc/network/interfaces file and adding the dummy interface stanza before the bridge stanza:
auto dummy
iface dummy
link-type dummy
auto bridge
iface bridge
...
2. Continue editing the interfaces file. Add the dummy interface to the bridge-ports line in the
bridge configuration:
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports swp3 dummy
bridge-vids 100
bridge-pvid 1
Now, even when swp3 is down, both the dummy interface and the bridge remain up:
Related Information
Linux Foundation - Bridges
Linux Foundation - VLANs
Linux Journal - Linux as an Ethernet Bridge
You can configure both VLAN-aware and traditional mode bridges on the same network in
Cumulus Linux; however you should not have more than one VLAN-aware bridge on a given
switch.
Contents
This chapter covers ...
Configuring a VLAN-aware Bridge (see page 326)
Example Configurations (see page 327)
VLAN Filtering/VLAN Pruning (see page 327)
Untagged/Access Ports (see page 328)
Dropping Untagged Frames (see page 329)
VLAN Layer 3 Addressing — Switch Virtual Interfaces and Other VLAN Attributes (see page
330)
Configuring ARP Timers (see page 331)
Configuring Multiple Ports in a Range (see page 331)
Access Ports and Pruned VLANs (see page 332)
Large Bond Set Configuration (see page 333)
VXLANs with VLAN-aware Bridges (see page 334)
Configuring a Static MAC Address Entry (see page 335)
Caveats and Errata (see page 336)
Spanning Tree Protocol (STP) (see page 336)
IGMP Snooping (see page 336)
Reserved VLAN Range (see page 336)
VLAN Translation (see page 336)
...
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
...
For a definitive list of bridge attributes, run ifquery --syntax-help and look for the entries under
bridge, bridgevlan and mstpctl.
The bridge-pvid 1 is implied by default. You do not have to specify bridge-pvid for a bridge
or a port; in this case, the VLAN is untagged. Specifying it explicitly does not change the
configuration, but it makes the file easier for other users to read.
The following configurations are identical to each other and the configuration above:
Do not try to bridge the management port, eth0, with any switch ports (like swp0, swp1 and so
forth). For example, if you created a bridge with eth0 and swp1, it will not work properly and may
disrupt access to the management interface.
Example Configurations
...
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
auto swp3
iface swp3
bridge-vids 200
Untagged/Access Ports
Access ports ignore all tagged packets. In the configuration below, swp1 and swp2 are configured as
access ports, and all untagged traffic on them goes to VLAN 100:
...
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
auto swp1
iface swp1
bridge-access 100
auto swp2
iface swp2
bridge-access 100
...
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 100 200
bridge-vlan-aware yes
This command creates the following configuration snippet in the /etc/network/interfaces file. Note
the bridge-allow-untagged configuration is under swp2:
...
auto swp1
iface swp1
auto swp2
iface swp2
bridge-allow-untagged no
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 100 200
bridge-vlan-aware yes
...
When you check VLAN membership for that port, it shows that there is no untagged VLAN.
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 100 200
bridge-vlan-aware yes
auto vlan100
iface vlan100
address 192.168.10.1/24
address 2001:db8::1/32
vlan-id 100
vlan-raw-device bridge
In the above configuration, if your switch is configured for multicast routing, you do not need to
specify bridge-igmp-querier-src, as there is no need for a static IGMP querier configuration
on the switch. Otherwise, the static IGMP querier configuration helps to probe the hosts to
refresh their IGMP reports.
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 ... swp51 swp52
bridge-vids 310 700 707 712 850 910
bridge-vlan-aware yes
...
# ports swp3-swp48 are trunk ports which inherit vlans from the 'bridge'
# ie vlans 310,700,707,712,850,910
#
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 ... swp51 swp52
bridge-vids 310 700 707 712 850 910
bridge-vlan-aware yes
auto swp1
iface swp1
bridge-access 310
mstpctl-bpduguard yes
mstpctl-portadminedge yes
# The following port is the trunk uplink and inherits all vlans
# from 'bridge'; bridge assurance is enabled using 'portnetwork' attribute
auto swp49
iface swp49
mstpctl-portnetwork yes
mstpctl-portpathcost 10
# The following port is the trunk uplink and inherits all vlans
# from 'bridge'; bridge assurance is enabled using 'portnetwork' attribute
auto swp50
iface swp50
mstpctl-portnetwork yes
mstpctl-portpathcost 0
...
...
#
# vlan-aware bridge with bonds example
#
# uplink1, peerlink and downlink are bond interfaces.
# 'bridge' is a vlan aware bridge with ports uplink1, peerlink
# and downlink (swp2-20).
#
# native vlan is by default 1
#
# 'bridge-vids' attribute is used to declare vlans.
# 'bridge-pvid' attribute is used to specify native vlans if other than 1
# 'bridge-access' attribute is used to declare access port
#
auto lo
iface lo
auto eth0
iface eth0 inet dhcp
# bond interface
auto uplink1
iface uplink1
bond-slaves swp32
bridge-vids 2000-2079
# bond interface
auto peerlink
iface peerlink
bond-slaves swp30 swp31
bridge-vids 2000-2079 4094
# bond interface
auto downlink
iface downlink
bond-slaves swp1
bridge-vids 2000-2079
#
# Declare vlans for all swp ports
# swp2-20 get vlans from 2004 to 2022.
# The below uses mako templates to generate iface sections
# with vlans for swp ports
#
%for port, vlanid in zip(range(2, 21), range(2004, 2023)) :
auto swp${port}
iface swp${port}
bridge-vids ${vlanid}
%endfor
#
# vlan-aware bridge
#
auto bridge
iface bridge
bridge-ports uplink1 peerlink downlink swp1 swp2 swp49 swp50
bridge-vlan-aware yes
...
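The Mako loop in the configuration above is built on Python's range() and zip(). Because range()
excludes its stop value, the bounds need care when the intent is to cover swp2 through swp20 with
VLANs 2004 through 2022. The sketch below shows the pairing; the helper name is hypothetical.

```python
def port_vlan_pairs(first_port, last_port, first_vlan):
    """Pair ports first_port..last_port (inclusive) with consecutive VLAN IDs."""
    count = last_port - first_port + 1
    # range() excludes its stop value, hence the +1 on the last port.
    ports = range(first_port, last_port + 1)
    vlans = range(first_vlan, first_vlan + count)
    return list(zip(ports, vlans))


pairs = port_vlan_pairs(2, 20, 2004)
print(pairs[0], pairs[-1], len(pairs))  # (2, 2004) (20, 2022) 19
# For comparison, range(2, 20) stops one port short:
print(list(range(2, 20))[-1])  # 19
```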
The configuration example below shows the differences between a VXLAN configured for traditional bridge
mode and one configured for VLAN-aware mode. The configurations use head end replication (HER), along
with the VLAN-aware bridge to map VLANs to VNIs.
The current tested scale limit for Cumulus Linux 3.2 is 512 VNIs.
...
auto lo
iface lo inet loopback
address 10.35.0.10/32
auto bridge
iface bridge
bridge-ports uplink regex vni.*
bridge-pvid 1
bridge-vids 1-100
bridge-vlan-aware yes
auto vni-10000
iface vni-10000
alias CUSTOMER X VLAN 10
bridge-access 10
vxlan-id 10000
vxlan-local-tunnelip 10.35.0.10
vxlan-remoteip 10.35.0.34
...
IGMP Snooping
IGMP snooping and group membership are supported on a per-VLAN basis, though the IGMP snooping
configuration (including enable/disable and mrouter ports) is defined on a per-bridge port basis.
resv_vlan_range
While restarting switchd, all running ports will flap, and forwarding will be interrupted.
VLAN Translation
A bridge in VLAN-aware mode cannot have VLAN translation enabled for it. Only traditional mode bridges
can utilize VLAN translation.
Contents
This chapter covers ...
Creating a Traditional Mode Bridge (see page 337)
Using Trunks in Traditional Bridge Mode (see page 339)
Trunk Example (see page 340)
VLAN Tagging Examples (see page 341)
Configuring ARP Timers (see page 341)
auto my_bridge
iface my_bridge
bridge-ports bond0 swp5 swp6
bridge-ageing 150
bridge-stp on
Option         Description                                                     Default
bridge-ports   List of logical and physical ports belonging to the logical     N/A
               bridge.
bridge-ageing  Maximum amount of time before a MAC address learned on the      1800 seconds
               bridge expires from the bridge MAC cache.
bridge-stp                                                                     off
Do not try to bridge the management port, eth0, with any switch ports (like swp0, swp1,
and so forth). For example, if you created a bridge with eth0 and swp1, it will not work.
You can configure multiple bridges in order to logically divide a switch into multiple layer 2
domains. This allows hosts to communicate with other hosts in the same domain, while
separating them from hosts in other domains.
The diagram below shows a multiple bridge configuration, where host-1 and host-2 are
connected to bridge-A, while host-3 and host-4 are connected to bridge-B. This means that:
host-1 and host-2 can communicate with each other.
host-3 and host-4 can communicate with each other.
host-1 and host-2 cannot communicate with host-3 and host-4.
auto bridge-A
iface bridge-A
bridge-ports swp1 swp2
bridge-stp on
auto bridge-B
iface bridge-B
bridge-ports swp3 swp4
bridge-stp on
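The isolation between the two bridges can be modeled with a few lines of Python. This is only a
sketch of the layer 2 reachability rule; the dictionary mirrors the bridge-ports lines above, and the
function name is hypothetical.

```python
# Bridge membership taken from the bridge-ports lines above.
bridges = {
    "bridge-A": {"swp1", "swp2"},
    "bridge-B": {"swp3", "swp4"},
}


def same_l2_domain(port_a, port_b, bridges):
    """Return True if both ports are members of the same bridge."""
    return any(port_a in members and port_b in members
               for members in bridges.values())


print(same_l2_domain("swp1", "swp2", bridges))  # True
print(same_l2_domain("swp1", "swp3", bridges))  # False
```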
A switch receiving a tagged frame on a trunk port places that frame in the VLAN identified by the
802.1Q tag.
A bridge in traditional mode has no concept of trunks, just tagged or untagged frames. With a trunk of 200
VLANs, there would need to be 199 bridges, each containing a tagged physical interface, and one bridge
containing the native untagged VLAN. See the examples below for more information.
The interaction of tagged and untagged frames on the same trunk often leads to undesired and
unexpected behavior. A switch that uses VLAN 1 for the native VLAN may send frames to a switch
that uses VLAN 2 for the native VLAN, thus merging those two VLANs and their spanning tree
state.
Trunk Example
To create the above example, add the following configuration to the /etc/network/interfaces file:
auto br-VLAN100
iface br-VLAN100
bridge-ports swp1.100 swp2.100
bridge-stp on
auto br-VLAN200
iface br-VLAN200
bridge-ports swp1.200 swp2.200
bridge-stp on
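Because traditional mode needs one bridge per VLAN, the stanzas become repetitive for larger trunks.
The Python sketch below generates stanzas in the same shape as the example above. The function name
and the indentation of the generated text are assumptions, and the output is meant to be reviewed
before it is placed in the /etc/network/interfaces file.

```python
def trunk_bridge_stanzas(ports, vlan_ids):
    """Build one traditional mode bridge stanza per tagged VLAN."""
    stanzas = []
    for vid in vlan_ids:
        # Each bridge contains the VLAN subinterface of every trunk port.
        members = " ".join(f"{port}.{vid}" for port in ports)
        stanzas.append(
            f"auto br-VLAN{vid}\n"
            f"iface br-VLAN{vid}\n"
            f"    bridge-ports {members}\n"
            f"    bridge-stp on\n"
        )
    return "\n".join(stanzas)


print(trunk_bridge_stanzas(["swp1", "swp2"], [100, 200]))
```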
VLAN Tagging
This article shows two examples of VLAN tagging, one basic and one more advanced. They both
demonstrate the streamlined interface configuration from ifupdown2.
Contents
This chapter covers ...
VLAN Tagging, a Basic Example (see page 341)
VLAN Tagging, an Advanced Example (see page 342)
VLAN Translation (see page 347)
host1 connects to swp1 with both untagged frames and with 802.1Q frames tagged for vlan100.
host2 connects to swp2 with 802.1Q frames tagged for vlan120 and vlan130.
To configure the above example, edit the /etc/network/interfaces file and add a configuration like
the following:
auto swp1
iface swp1
auto swp1.100
iface swp1.100
auto swp2
iface swp2
auto swp2.120
iface swp2.120
auto swp2.130
iface swp2.130
host1 connects to bridge br-untagged with bare Ethernet frames and to bridge br-tag100 with 802.1q
frames tagged for vlan100.
host2 connects to bridge br-tag100 with 802.1q frames tagged for vlan100 and to bridge br-vlan120
with 802.1q frames tagged for vlan120.
host3 connects to bridge br-vlan120 with 802.1q frames tagged for vlan120 and to bridge v130 with
802.1q frames tagged for vlan130.
bond2 carries tagged and untagged frames in this example.
Although not explicitly designated, the bridge member ports function as 802.1Q access ports and trunk
ports. In the example above, comparing Cumulus Linux with a traditional Cisco device:
swp1 is equivalent to a trunk port with untagged and vlan100.
swp2 is equivalent to a trunk port with vlan100 and vlan120.
swp3 is equivalent to a trunk port with vlan120 and vlan130.
bond2 is equivalent to an EtherChannel in trunk mode with untagged, vlan100, vlan120, and vlan130.
Bridges br-untagged, br-tag100, br-vlan120, and v130 are equivalent to SVIs (switched virtual
interfaces).
To create the above configuration, edit the /etc/network/interfaces file and add a configuration like
the following:
auto swp1.100
iface swp1.100
auto swp2.100
iface swp2.100
auto swp2.120
iface swp2.120
auto swp3.120
iface swp3.120
auto swp3.130
iface swp3.130
auto bond2
iface bond2
bond-slaves glob swp4-7
auto br-untagged
iface br-untagged
address 10.0.0.1/24
bridge-ports swp1 bond2
bridge-stp on
auto br-tag100
iface br-tag100
address 10.0.100.1/24
bridge-ports swp1.100 swp2.100 bond2.100
bridge-stp on
auto br-vlan120
iface br-vlan120
address 10.0.120.1/24
bridge-ports swp2.120 swp3.120 bond2.120
bridge-stp on
auto v130
iface v130
address 10.0.130.1/24
bridge-ports swp3.130 bond2.130
bridge-stp on
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
To verify:
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 3
Number of ports: 4
Actor Key: 33
Partner Key: 33
Partner Mac Address: 44:38:39:00:32:cf
Aggregator ID: 3
Slave queue ID: 0
A single bridge cannot contain multiple subinterfaces of the same port as members. Attempting
to apply such a configuration will result in an error:
VLAN Translation
By default, Cumulus Linux does not allow VLAN subinterfaces associated with different VLAN IDs to be part
of the same bridge. Base interfaces are not explicitly associated with any VLAN IDs and are exempt from
this restriction.
In some cases, it may be useful to relax this restriction. For example, two servers may be connected to the
switch using VLAN trunks, but the VLAN numbering provisioned on the two servers is not consistent. You
can choose to bridge two VLAN subinterfaces with different VLAN IDs from the servers. You do this by
enabling the sysctl net.bridge.bridge-allow-multiple-vlans. Packets entering a bridge from a
member VLAN subinterface egress another member VLAN subinterface with the VLAN ID translated.
A bridge in VLAN-aware mode (see page 325) cannot have VLAN translation enabled for it; only
bridges configured in traditional mode (see page 337) can utilize VLAN translation.
If the sysctl is enabled and you want to disable it, run the above example, setting the sysctl
net.bridge.bridge-allow-multiple-vlans to 0.
Once the sysctl is enabled, ports with different VLAN IDs can be added to the same bridge. In the
following example, packets entering the bridge br-mix from swp10.100 will be bridged to swp11.200 with
the VLAN ID translated from 100 to 200:
MLAG or CLAG?
The Cumulus Linux implementation of MLAG is referred to by other vendors as CLAG, MC-LAG or
VPC. You will even see references to CLAG in Cumulus Linux, including the management daemon,
named clagd, and other options in the code, such as clag-id, which exist for historical
purposes. But rest assured that the Cumulus Linux implementation is truly a multi-chassis link
aggregation protocol, so we call it MLAG.
Dual-connected devices can create LACP bonds that contain links to each physical switch. Thus, active-
active links from the dual-connected devices are supported even though they are connected to two
different physical switches.
A basic setup looks like this:
The two switches, S1 and S2, known as peer switches, cooperate so that they appear as a single device to
host H1's bond. H1 distributes traffic between the two links to S1 and S2 in any manner that you configure
on the host. Similarly, traffic inbound to H1 can traverse S1 or S2 and arrive at H1.
Contents
This chapter covers ...
MLAG Requirements (see page 350)
LACP and Dual-Connectedness (see page 351)
MLAG Requirements
MLAG has these requirements:
There must be a direct connection between the two peer switches implementing MLAG (S1 and S2).
This is typically a bond for increased reliability and bandwidth.
There must be only two peer switches in one MLAG configuration, but you can have multiple
configurations in a network for switch-to-switch MLAG (see below).
The peer switches implementing MLAG must be running Cumulus Linux version 2.5 or later.
You must specify a unique clag-id for every dual-connected bond on each peer switch; the value
must be between 1 and 65535 and must be the same on both peer switches in order for the bond
to be considered dual-connected.
The dual-connected devices (servers or switches) can use LACP (IEEE 802.3ad/802.1ax) to form the
bond (see page 313). In this case, the peer switches must also use LACP.
If for some reason you cannot use LACP, you can also use balance-xor mode (see page
316) to dual-connect host-facing bonds in an MLAG environment. If you do, you must still
configure the clag-id parameter on the MLAG bonds, and it must be the same
on both MLAG switches. Otherwise, the bonds are treated by the MLAG switch pair as if
they were single-connected.
More elaborate configurations are also possible. The number of links between the host and the switches
can be greater than two, and does not have to be symmetrical:
Additionally, since S1 and S2 appear as a single switch to other bonding devices, pairs of MLAG switches
can also be connected to each other in a switch-to-switch MLAG setup:
In this case, L1 and L2 are also MLAG peer switches, and thus present a two-port bond from a single logical
system to S1 and S2. S1 and S2 do the same as far as L1 and L2 are concerned. For a switch-to-switch
MLAG configuration, each switch pair must have a unique system MAC address. In the above example,
switches L1 and L2 each have the same system MAC address configured. Switch pair S1 and S2 each have
the same system MAC address configured; however, it is a different system MAC address than the one used
by the switch pair L1 and L2.
Typically, Link Aggregation Control Protocol (LACP), the IEEE standard protocol for managing bonds, is used
for verifying dual-connectedness. LACP runs on the dual-connected device and on each of the peer
switches. On the dual-connected device, the only configuration requirement is to create a bond that will be
managed by LACP.
However, if for some reason you cannot use LACP in your environment, you can configure the bonds in
balance-xor mode (see page 316). When using balance-xor mode to dual-connect host-facing bonds in an
MLAG environment, the clag-id parameter must be configured on the MLAG bonds and must be the
same on both MLAG switches. Otherwise, the bonds are treated by the MLAG switch pair as if they were
single-connected. In short, dual-connectedness is determined solely by matching clag-id values, and
any misconnection will not be detected.
On each of the peer switches, the links connected to the dual-connected host or switch must be placed in
the bond. This is true even if the links are a single port on each peer switch, where each port is placed into
a bond, as shown below:
All of the dual-connected bonds on the peer switches have their system ID set to the MLAG system ID.
Therefore, from the point of view of the hosts, each of the links in its bond is connected to the same
system, and so the host will use both links.
Each peer switch periodically makes a list of the LACP partner MAC addresses for all of its bonds and
sends that list to its peer (using the clagd service; see below). The LACP partner MAC address is the MAC
address of the system at the other end of a bond, which in the figure above would be hosts H1, H2 and H3.
When a switch receives this list from its peer, it compares the list to the LACP partner MAC addresses on
its own bonds. If a match is found and the clag-id values for the two bonds also match, then that bond is
a dual-connected bond. You can also find the LACP partner MAC address by running net show bridge macs,
or by looking in the /sys/class/net/<bondname>/bonding/ad_partner_mac sysfs file for each
bond.
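The comparison described above can be sketched in Python as follows. This is a simplified model, not
the clagd implementation: the function name is hypothetical, and the bond names, clag-id values and
MAC addresses are made-up sample data.

```python
def dual_connected_bonds(local_bonds, peer_bonds):
    """Return the local bonds that look dual-connected.

    Both arguments map a bond name to a (clag_id, lacp_partner_mac) tuple.
    """
    peer_pairs = {(clag_id, mac.lower()) for clag_id, mac in peer_bonds.values()}
    return sorted(
        name
        for name, (clag_id, mac) in local_bonds.items()
        if (clag_id, mac.lower()) in peer_pairs
    )


# Hypothetical data: bond2 has the same partner MAC on both switches,
# but its clag-id differs, so it is not treated as dual-connected.
s1 = {"bond1": (1, "00:02:00:00:00:01"), "bond2": (2, "00:02:00:00:00:02")}
s2 = {"bond1": (1, "00:02:00:00:00:01"), "bond2": (3, "00:02:00:00:00:02")}
print(dual_connected_bonds(s1, s2))  # ['bond1']
```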
Configuring MLAG
Configuring MLAG involves:
On the dual-connected devices, creating a bond that uses LACP.
On each peer switch, configuring the interfaces, including bonds, VLANs, bridges and peer links.
Port configuration: For example, VLAN membership, MTU (see page 374) and bonding
parameters.
Bridge configuration: For example, spanning tree parameters or bridge properties.
Static address entries: For example, static FDB entries and static IGMP entries.
QoS configuration: For example, ACL entries.
You can verify the configuration of VLAN membership using the net show clag verify-
vlans command.
You do not need to add VLAN 4094 to the bridge VLAN list.
The above commands produce the following configuration in the /etc/network/interfaces
file:
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
address 169.254.1.1/30
clagd-peer-ip 169.254.1.2
clagd-backup-ip 192.0.2.50
clagd-sys-mac 44:38:39:FF:40:94
To enable MLAG, peerlink must be added to a traditional or VLAN-aware bridge. The commands
below add peerlink to a VLAN-aware bridge:
auto bridge
iface bridge
bridge-ports peerlink
bridge-vlan-aware yes
Keep in mind that if you change the MLAG configuration by editing the interfaces
file, the changes take effect when you bring the peer link interface up with ifup. Do not
use systemctl restart clagd.service to apply the new configuration.
By default, the role is determined by comparing the MAC addresses of the two sides of the peering link; the
switch with the lower MAC address assumes the primary role. You can override this by setting the
clagd-priority option for the peer link:
The switch with the lower priority value is given the primary role; the default value is 32768, and the range
is 0 to 65535. Read the clagd(8) and clagctl(8) man pages for more information.
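As a rough model of the election rules above, the following Python sketch picks the primary switch from a pair. It assumes that the MAC address acts as the tie-breaker when both priorities are equal, which matches the default case (both at 32768) but is otherwise an assumption of this example; the function name and the sample values are hypothetical.

```python
DEFAULT_PRIORITY = 32768


def elect_primary(switch_a, switch_b):
    """Return the name of the switch that takes the primary role.

    Each switch is a (name, mac, priority) tuple. The lower priority
    value wins; with equal priorities the lower MAC address wins
    (tie-breaking by MAC is an assumption of this sketch).
    """
    def key(switch):
        _, mac, priority = switch
        if not 0 <= priority <= 65535:
            raise ValueError(f"priority out of range: {priority}")
        return (priority, int(mac.replace(":", ""), 16))

    return min(switch_a, switch_b, key=key)[0]


# Default priorities: the lower MAC address decides.
print(elect_primary(("leaf01", "44:38:39:00:00:10", DEFAULT_PRIORITY),
                    ("leaf02", "44:38:39:00:00:20", DEFAULT_PRIORITY)))
# A lower priority value on leaf02 overrides the MAC comparison.
print(elect_primary(("leaf01", "44:38:39:00:00:10", DEFAULT_PRIORITY),
                    ("leaf02", "44:38:39:00:00:20", 1000)))
```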
When the clagd service exits during a switch reboot, or the service is stopped on the primary switch, the
peer switch in the secondary role becomes the primary.
However, if the primary switch goes down for any reason without stopping the clagd service, or if the peer
link goes down, the secondary switch does not change its role. If the peer switch is determined to be
not alive, the switch in the secondary role rolls back the LACP system ID to the bond interface MAC
address instead of the clagd-sys-mac, and the switch in the primary role uses the clagd-sys-mac as the
LACP system ID on the bonds.
You configure these interfaces using NCLU (see page 82), so the bridges are in VLAN-aware mode (see
page 325). The bridges use these Cumulus Linux-specific keywords:
bridge-vids, which defines the allowed list of tagged 802.1q VLAN IDs for all bridge member
interfaces. You can specify non-contiguous ranges with a space-separated list, like
bridge-vids 100-200 300 400-500.
bridge-pvid, which defines the untagged VLAN ID for each port. This is commonly referred to as
the native VLAN.
The bridge configurations below indicate that each bond carries tagged frames on VLANs 10, 20, 30, 40, 50
and 100 to 200 (as specified by bridge-vids), but untagged frames on VLAN 1 (as specified by
bridge-pvid). Also, note how you configure the VLAN subinterfaces used for clagd communication
(peerlink.4094 in the sample configuration below). Finally, the host configurations for server01 through
server04 are not shown here. The configurations for each corresponding node are almost identical, except
for the IP addresses used for managing the clagd service.
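To make the bridge-vids syntax concrete, here is a small Python sketch that expands a space-separated list of VLAN IDs and ranges into individual IDs. It is an illustration only and is not how ifupdown2 parses the keyword; the function name is hypothetical and the 1 to 4094 bounds are an assumption based on the VLAN ID limit mentioned in this chapter.

```python
def expand_vids(spec):
    """Expand a bridge-vids style string such as '100-200 300' into a sorted list."""
    vids = set()
    for token in spec.split():
        first, _, last = token.partition("-")
        start = int(first)
        end = int(last) if last else start
        if not 1 <= start <= end <= 4094:
            raise ValueError(f"invalid VLAN range: {token}")
        vids.update(range(start, end + 1))
    return sorted(vids)


tagged = expand_vids("10 20 30 40 50 100-200")
print(len(tagged), tagged[:6])
```

The same helper accepts the non-contiguous form shown earlier, for example expand_vids("100-200 300 400-500").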
VLAN Precautions
At minimum, this VLAN subinterface should not be in your layer 2 domain, and you should give it
a very high VLAN ID (up to 4094). Read more about the range of VLAN IDs you can use (see page
).
The commands to create the configurations for both spines should look like the following. Note that the
clag-id and clagd-sys-mac must be the same for the corresponding bonds on spine01 and spine02:
spine01, spine02
These commands create the following configuration in the /etc/network/interfaces file:
# downlinks
Here is an example configuration for the switches leaf01 through leaf04. Note that the clag-id and
clagd-sys-mac must be the same for the corresponding bonds on leaf01 and leaf02 as well as leaf03
and leaf04:
leaf01

net add bgp ipv4 unicast neighbor fabric prefix-list dc-leaf-out out
net add bgp neighbor swp51-52 interface peer-group fabric
net add vlan 100 ip address 172.16.1.1/24
net add bgp ipv4 unicast network 172.16.1.1/24
net add clag peer sys-mac 44:38:39:FF:00:01 interface swp49-50 primary backup-ip 192.168.1.12
net add clag port bond server1 interface swp1 clag-id 1
net add clag port bond server2 interface swp2 clag-id 2
net add bond server1-2 bridge access 100
net add bond server1-2 stp portadminedge
net add bond server1-2 stp bpduguard

leaf02

net add bgp ipv4 unicast neighbor fabric prefix-list dc-leaf-out out
net add bgp neighbor swp51-52 interface peer-group fabric
net add vlan 100 ip address 172.16.1.2/24
net add bgp ipv4 unicast network 172.16.1.2/24
net add clag peer sys-mac 44:38:39:FF:00:01 interface swp49-50 secondary backup-ip 192.168.1.11
net add clag port bond server1 interface swp1 clag-id 1
net add clag port bond server2 interface swp2 clag-id 2
net add bond server1-2 bridge access 100
net add bond server1-2 stp portadminedge
net add bond server1-2 stp bpduguard

These commands create the following configuration in the /etc/network/interfaces file:
leaf03

net add clag peer sys-mac 44:38:39:FF:00:02 interface swp49-50 primary backup-ip 192.168.1.14
net add clag port bond server3 interface swp1 clag-id 3
net add clag port bond server4 interface swp2 clag-id 4
net add bond server3-4 bridge access 100
net add bond server3-4 stp portadminedge
net add bond server3-4 stp bpduguard

These commands create the following configuration in the /etc/network/interfaces file:

auto eth0
iface eth0 inet dhcp
auto swp1
iface swp1
auto swp2
iface swp2
# peerlink
auto swp49
iface swp49
post-up ip link set $IFACE promisc on # Only required on VX
auto swp50
iface swp50
...
auto server4
iface server4
bond-slaves swp2
...
auto vlan100
iface vlan100
address 172.16.1.3/24
vlan-id 100
vlan-raw-device bridge

leaf04

net add clag port bond server3 interface swp1 clag-id 3
net add clag port bond server4 interface swp2 clag-id 4
net add bond server3-4 bridge access 100
net add bond server3-4 stp portadminedge
net add bond server3-4 stp bpduguard

These commands create the following configuration in the /etc/network/interfaces file:

auto swp1
iface swp1
auto swp2
iface swp2
# peerlink
auto swp49
iface swp49
post-up ip link set $IFACE promisc on # Only required on VX
auto swp50
iface swp50
post-up ip link set $IFACE promisc on # Only required on VX
# uplinks
auto swp51
...
auto vlan100
iface vlan100
address 172.16.1.4/24
vlan-id 100
Use clagd-priority to set the role of the MLAG peer switch to primary or secondary. Each peer switch
in an MLAG pair must have the same clagd-sys-mac setting. Each clagd-sys-mac setting should be
unique to each MLAG pair in the network. For more details, refer to man clagd.
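The two clagd-sys-mac rules above lend themselves to a simple consistency check. The Python sketch below is only an illustration: the function name and the inventory layout (a mapping from an MLAG pair name to the two configured values) are assumptions, and the MAC addresses are the ones used in the leaf examples in this chapter.

```python
def check_sys_macs(pairs):
    """Return a list of problems found in the clagd-sys-mac settings.

    `pairs` maps an MLAG pair name to the (peer_a, peer_b) clagd-sys-mac
    values. Both peers must agree, and no two pairs may share a value.
    """
    problems = []
    seen = {}
    for pair, (mac_a, mac_b) in pairs.items():
        a, b = mac_a.lower(), mac_b.lower()
        if a != b:
            problems.append(f"{pair}: peers disagree ({mac_a} vs {mac_b})")
            continue
        if a in seen:
            problems.append(f"{pair}: {mac_a} is already used by {seen[a]}")
        else:
            seen[a] = pair
    return problems


inventory = {
    "leaf01-02": ("44:38:39:FF:00:01", "44:38:39:FF:00:01"),
    "leaf03-04": ("44:38:39:FF:00:02", "44:38:39:FF:00:02"),
}
print(check_sys_macs(inventory))
```

An empty list means every pair agrees internally and no value is reused across pairs.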
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
server1            server1            1         -                      -
server2            server2            2         -                      -
In order to configure MLAG with a traditional mode bridge, the peer link and all dual-connected links must
be configured as untagged/native (see page 337) ports on a bridge (note the absence of any VLANs in the
bridge-ports line and the lack of the bridge-vlan-aware parameter below):
auto br0
iface br0
For a deeper comparison of traditional versus VLAN-aware bridge modes, read this knowledge base article.
The backup IP address must be different from the peer link IP address (clagd-peer-ip).
It must be reachable by a route that doesn't use the peer link and it must be in the
same network namespace as the peer link IP address.
Cumulus Networks recommends you use the switch's management IP address for this
purpose.
You can also specify the backup UDP port. The port defaults to 5342, but you can configure it as
an argument in clagd-args using --backupPort <PORT>.
You can see the backup IP address if you run net show clag:
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
leaf03-04          leaf03-04          1034      -                      -
exit01-02          -                  2930      -                      -
leaf01-02          leaf01-02          1012      -                      -
...
auto swp52s0
iface swp52s0
address 192.0.2.1/24
vrf green
auto green
iface green
vrf-table auto
auto peer5.4000
iface peer5.4000
address 192.0.2.15/24
clagd-peer-ip 192.0.2.16
clagd-backup-ip 192.0.2.2 vrf green
clagd-sys-mac 44:38:39:01:01:01
...
You can verify this with net show clag status verbose:
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
bond4              bond4              4         -                      -
bond1              bond1              1         -                      -
bond2              bond2              2         -                      -
bond3              bond3              3         -                      -
...
Once bonds are identified as dual-connected, clagd sends more information to the peer switch for those
bonds. The MAC addresses (and VLANs) that have been dynamically learned on those ports are sent along
with the LACP partner MAC address for each bond. When a switch receives MAC address information from
its peer, it adds MAC address entries on the corresponding ports. As the switch learns and ages out MAC
addresses, it informs the peer switch of these changes to its MAC address table so that the peer can keep
its table synchronized. Periodically, at 45% of the bridge ageing time, a switch sends its entire MAC
address table to the peer, so that the peer switch can verify that its MAC address table is properly
synchronized.
The switch sends an update frequency value in the messages to its peer, which tells clagd how often the
peer will send these messages. You can configure a different frequency by adding --lacpPoll
<SECONDS> to clagd-args:
In this design, the spine switches route traffic between the server hosts in the layer 2 domains and the
core. The servers (host1 - host4) each have a layer 2 connection up to the spine layer where the default
gateway for the host subnets resides. However, since the spine switches as gateway devices communicate
at layer 3, you need to configure a protocol such as VRR (see page 383) (Virtual Router Redundancy)
between the spine switch pair to support active/active forwarding.
Then, to connect the spine switches to the core switches, you need to determine whether the routing is
static or dynamic. If it's dynamic, you must choose which protocol — OSPF (see page 617) or BGP (see page
633) — to use. When enabling a routing protocol in an MLAG environment it is also necessary to manage
the uplinks, because by default MLAG is not aware of layer 3 uplink interfaces. In the event of a peer link
failure MLAG does not remove static routes or bring down a BGP or OSPF adjacency unless a separate link
state daemon such as ifplugd is used.
Routing protocols over an MLAG bond should be configured to treat the peerlink as a less
desirable next hop for all destinations. SVI MAC addresses are not learned over the peerlink so
forwarding data plane traffic laterally to the peer switch results in flooding and high CPU
utilization.
...
...
Configuring MTU
auto bridge
iface bridge
bridge-ports peerlink uplink server01
auto peerlink
iface peerlink
mtu 9216
auto server01
iface server01
mtu 9216
auto uplink
iface uplink
mtu 9216
Likewise, to ensure the MTU 9216 path is respected through the spine switches above, also
change the MTU setting for bridge bridge by configuring mtu 9216 for each of the following
members of bridge bridge on both spine01 and spine02: leaf01-02, leaf03-04, exit01-02, peerlink.
auto bridge
iface bridge
bridge-ports leaf01-02 leaf03-04 exit01-02 peerlink
auto exit01-02
iface exit01-02
mtu 9216
auto leaf01-02
iface leaf01-02
mtu 9216
auto leaf03-04
iface leaf03-04
mtu 9216
auto peerlink
iface peerlink
mtu 9216
Scaling this example out to a full rack, when planning for link failures, you need only allocate enough
bandwidth to meet your site's strategy for handling failure scenarios. Imagine a full rack with 40 servers and
two switches in it. You may plan for, say, 4 to 6 servers to lose connectivity to a single switch and become
single connected before you respond to the event. So expanding upon our previous example, if you have
40 hosts each with 20G of bandwidth dual-connected to the MLAG pair, you might allocate 20G to 30G of
bandwidth to the peerlink, which accounts for half of the single-connected bandwidth (10G per host) for 4 to 6 hosts.
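The sizing rule in this example reduces to simple arithmetic, sketched below in Python. The function name is hypothetical, and the model assumes that each host's bond is split evenly across the two switches, so a single-connected host keeps half of its bond bandwidth and half of that is budgeted on the peerlink.

```python
def peerlink_bandwidth_gbps(single_connected_hosts, host_bond_gbps):
    """Estimate the peerlink bandwidth to reserve for a failure scenario.

    A host that loses one link keeps half of its bond bandwidth; the rule
    of thumb in this section budgets half of that remainder on the peerlink.
    """
    single_connected_gbps = host_bond_gbps / 2
    return single_connected_hosts * single_connected_gbps / 2


for hosts in (4, 6):
    print(hosts, "hosts:", peerlink_bandwidth_gbps(hosts, 20), "Gbps")
```

With 20G bonds this gives 20G for 4 single-connected hosts and 30G for 6, which matches the range quoted above.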
Use NCLU (see page 82) (net) commands for all spanning tree configurations, including bridge
priority, path cost and so forth. Do not use brctl commands for spanning tree, except for brctl
stp on/off, as changes are not reflected to mstpd and can create conflicts.
Troubleshooting MLAG
When you run ethtool -S on a peerlink interface, the drops are indicated by the HwIfInDiscards counter:
MLAG is disabled on chassis, including the Facebook Backpack and EdgeCore OMP-800.
LACP Bypass
On Cumulus Linux, LACP Bypass is a feature that allows a bond (see page 313) configured in 802.3ad mode
to become active and forward traffic even when there is no LACP partner. A typical use case for this feature
is to enable a host, without the capability to run LACP, to PXE boot while connected to a switch on a bond
configured in 802.3ad mode. Once the pre-boot process finishes and the host is capable of running LACP,
the normal 802.3ad link aggregation operation takes over.
Contents
This chapter covers ...
Understanding the LACP Bypass All-active Mode (see page 380)
LACP Bypass and MLAG Deployments (see page 380)
Configuring LACP Bypass (see page 380)
Traditional Bridge Mode Configuration (see page 382)
auto bond1
iface bond1
bond-lacp-bypass-allow yes
bond-slaves swp51s2 swp51s3
clag-id 1
mstpctl-bpduguard yes
...
auto bridge
iface bridge
bridge-ports bond1 bond2 bond3 bond4 peer5
bridge-vids 100-105
bridge-vlan-aware yes
You can check the status of the configuration by running net show interface <bond> on the bond
and its slave interfaces:
Bond Details
------------------ -------------------------
Bond Mode: LACP
Load Balancing: Layer3+4
Minimum Links: 1
In CLAG: CLAG Active
LACP Sys Priority:
LACP Rate: Fast Timeout
LACP Bypass: LACP Bypass Not Supported
Untagged
----------
1
LLDP
-------- ---- ------------------
swp51s2(P) ==== swp1(spine01)
swp51s3(P) ==== swp1(spine02)
Use the cat command to verify that LACP bypass is enabled on a bond and its slave interfaces:
auto bond1
iface bond1
bond-slaves swp3 swp4
bond-lacp-bypass-allow 1
auto br0
iface br0
bridge-ports bond1 bond2 bond3 bond4 peer5
mstpctl-bpduguard bond1=yes
A production implementation will have many more server hosts and network connections than
are shown here. However, this basic configuration provides a complete description of the
important aspects of the VRR setup.
As the bridges in each of the redundant routers are connected, they will each receive and reply to ARP
requests for the virtual router IP address.
Cumulus Networks recommends using MAC addresses from the reserved range when
configuring VRR.
The reserved MAC address range for VRR is the same as for the Virtual Router
Redundancy Protocol (VRRP), as they serve similar purposes.
Contents
This chapter covers ...
Configuring a VRR-enabled Network (see page 384)
Configuring the Routers (see page 384)
Configuring the Hosts (see page 386)
Example VRR Configuration with MLAG (see page 386)
For networks using MLAG, use bond interfaces. Otherwise, use switch port interfaces.
Multiple inter-peer links are typically bonded interfaces, in order to accommodate higher
bandwidth between the routers, and to offer link redundancy.
The VLAN interface must have unique IP addresses for both the physical (the address
option below) and virtual (the address-virtual option below) interfaces, as the unique
address is used when the switch initiates an ARP request.
NCLU Commands
cumulus@switch:~$ net add bridge
cumulus@switch:~$ net add vlan 500 ip address 192.0.2.252/24
cumulus@switch:~$ net add vlan 500 ip address-virtual 00:00:5e:00:01:01 192.0.2.254/24
cumulus@switch:~$ net add vlan 500 ipv6 address 2001:db8::1/32
cumulus@switch:~$ net add vlan 500 ipv6 address-virtual 00:00:5e:00:01:01 2001:db8::f/32
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
/etc/network/interfaces Snippet
auto bridge
iface bridge
bridge-vids 500
bridge-vlan-aware yes
auto vlan500
iface vlan500
address 192.0.2.252/24
address 2001:db8::1/32
These commands create the following configuration in /etc/network/interfaces:
Create a configuration like the following on an Ubuntu host:
ifplugd
ifplugd is an Ethernet link-state monitoring daemon that can execute user-specified scripts to configure
an Ethernet device when a cable is plugged in, or automatically unconfigure it when a cable is removed.
Follow the steps below to install and configure the ifplugd daemon.
Install ifplugd
1. Update the switch before installing the daemon:
Configure ifplugd
Once ifplugd is installed, two configuration files must be edited to set up ifplugd:
/etc/default/ifplugd
/etc/ifplugd/action.d/ifupdown
ifplugd is configured on both the primary and secondary MLAG (see page 348)
switches in this example.
INTERFACES="peerbond"
HOTPLUG_INTERFACES=""
ARGS="-q -f -u0 -d1 -w -I"
SUSPEND_ACTION="stop"
#!/bin/sh
set -e
case "$2" in
up)
When an IGMPv2 leave message is received, a group specific query is sent to identify if there are any other
hosts interested in that group, before the group is deleted.
An IGMP query message received on a port is used to identify the port that is connected to a router and is
interested in receiving multicast traffic.
MLD snooping processes MLD v1/v2 reports, queries and v1 done messages for IPv6 groups. If IGMP or
MLD snooping is disabled, multicast traffic gets flooded to all the bridge ports in the bridge. Similarly, in the
absence of receivers in a VLAN, multicast traffic would be flooded to all ports in the VLAN. The multicast
group IP address is mapped to a multicast MAC address and a forwarding entry is created with a list of
ports interested in receiving multicast traffic destined to that group.
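The mapping from a multicast group IP address to a multicast MAC address follows the standard Ethernet rules: an IPv4 group uses the 01:00:5e prefix plus the low 23 bits of the address, and an IPv6 group uses the 33:33 prefix plus the low 32 bits. The Python sketch below illustrates that mapping; the function name is hypothetical and the code is not taken from the bridge driver.

```python
import ipaddress


def multicast_mac(group):
    """Return the Ethernet multicast MAC address for an IPv4 or IPv6 group."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if addr.version == 4:
        low23 = int(addr) & 0x7FFFFF  # only the low 23 bits are carried over
        octets = [0x01, 0x00, 0x5E,
                  (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
    else:
        low32 = int(addr) & 0xFFFFFFFF  # low 32 bits of the IPv6 group
        octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{octet:02x}" for octet in octets)


print(multicast_mac("224.0.0.251"))
print(multicast_mac("ff02::1"))
```

Because only 23 bits of an IPv4 group are preserved, several groups share one MAC address: 224.0.0.1 and 224.128.0.1 both map to 01:00:5e:00:00:01.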
Contents
This chapter covers ...
Configuring IGMP/MLD Querier (see page 392)
Disable IGMP and MLD Snooping (see page 393)
Debugging IGMP/MLD Snooping (see page 394)
Related Information (see page 396)
For an explanation of the relevant parameters, see the ifupdown-addons-interfaces man page.
For a VLAN-aware bridge (see page 325), use a configuration like the following:
auto bridge.100
vlan bridge.100
bridge-igmp-querier-src 123.1.1.1
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3
bridge-vlan-aware yes
bridge-vids 100 200
bridge-pvid 1
bridge-mcquerier 1
For a VLAN-aware bridge, like bridge in the above example, to enable querier functionality for VLAN 100 in
the bridge, set bridge-mcquerier to 1 in the bridge stanza and set bridge-igmp-querier-src to
123.1.1.1 in the bridge.100 stanza.
You can specify a range of VLANs as well. For example:
auto bridge.[1-200]
vlan bridge.[1-200]
bridge-igmp-querier-src 123.1.1.1
For a bridge in traditional mode (see page 319), use a configuration like the following:
auto br0
iface br0
address 192.0.2.10/24
bridge-ports swp1 swp2 swp3
bridge-vlan-aware no
bridge-mcquerier 1
bridge-mcqifaddr 1
The commands above add the bridge-mcsnoop line to the following example bridge in /etc
/network/interfaces:
auto bridge
iface bridge
bridge-mcquerier 1
bridge-mcsnoop 0
bridge-ports swp1 swp2 swp3
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
 mc querier             0                       mc query ifaddr         0
 flags

swp1 (1)
 port id                8001                    state                   forwarding
 designated root        8000.7072cf8c272c       path cost               2
 designated bridge      8000.7072cf8c272c       message age timer       0.00
 designated port        8001                    forward delay timer     0.00
 designated cost        0                       hold timer              0.00
 mc router              1                       mc fast leave           0
 flags

swp2 (2)
 port id                8002                    state                   forwarding
 designated root        8000.7072cf8c272c       path cost               2
 designated bridge      8000.7072cf8c272c       message age timer       0.00
 designated port        8002                    forward delay timer     0.00
 designated cost        0                       hold timer              0.00
 mc router              1                       mc fast leave           0
 flags

swp3 (3)
 port id                8003                    state                   forwarding
 designated root        8000.7072cf8c272c       path cost               2
 designated bridge      8000.7072cf8c272c       message age timer       0.00
 designated port        8003                    forward delay timer     8.98
 designated cost        0                       hold timer              0.00
 mc router              1                       mc fast leave           0
 flags
To get the groups and bridge port state, use the bridge mdb show command. To display router ports
and group information use the bridge -d -s mdb show command:
Related Information
www.linuxfoundation.org/collaborate/workgroups/networking/bridge#Snooping
tools.ietf.org/html/rfc4541
en.wikipedia.org/wiki/IGMP_snooping
tools.ietf.org/rfc/rfc2236.txt
tools.ietf.org/html/rfc3376
tools.ietf.org/search/rfc2710
tools.ietf.org/html/rfc3810
Network Virtualization
Cumulus Linux supports these forms of network virtualization:
VXLAN (Virtual Extensible LAN) is a standard overlay protocol that abstracts logical virtual networks from the
physical network underneath. You can deploy simple and scalable layer 3 Clos architectures while
extending layer 2 segments over that layer 3 network.
VXLAN uses a VLAN-like encapsulation technique to encapsulate MAC-based layer 2 Ethernet frames within
layer 3 UDP packets. Each virtual network is a VXLAN logical L2 segment. VXLAN scales to 16 million
segments – a 24-bit VXLAN network identifier (VNI ID) in the VXLAN header – for multi-tenancy.
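The 16 million figure follows directly from the 24-bit identifier: 2^24 = 16,777,216 possible values. As an illustration, the Python sketch below packs and unpacks the 8-byte VXLAN header layout defined in RFC 7348 (an 8-bit flags field with the I bit set, 24 reserved bits, the 24-bit VNI and 8 reserved bits). The function names are hypothetical, and this is a sketch of the wire format only, not of how Cumulus Linux builds the header in hardware.

```python
import struct

VNI_BITS = 24
FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def vxlan_header(vni):
    """Return the 8-byte VXLAN header carrying the given VNI."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError(f"VNI {vni} does not fit in {VNI_BITS} bits")
    # First word: flags in the top byte, 24 reserved bits.
    # Second word: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def vni_from_header(header):
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _, second_word = struct.unpack("!II", header[:8])
    return second_word >> 8


print(2 ** VNI_BITS)
print(vxlan_header(10).hex())
```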
Hosts on a given virtual network are joined together through an overlay protocol that initiates and
terminates tunnels at the edge of the multi-tenant network, typically the hypervisor vSwitch or top of rack.
These edge points are the VXLAN tunnel end points (VTEP).
Cumulus Linux can initiate and terminate VTEPs in hardware and supports wire-rate VXLAN. VXLAN
provides an efficient hashing scheme across IP fabric during the encapsulation process; the source UDP
port is unique, with the hash based on L2-L4 information from the original frame. The UDP destination port
is the standard port 4789.
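To show why a hashed source port helps load balancing, the following Python sketch derives a UDP source port from fields of the original frame. It is purely illustrative: CRC32 stands in for whatever hash the switch ASIC actually uses, the function name and field list are assumptions, and the 49152 to 65535 range is the dynamic port range that RFC 7348 recommends rather than something stated in this guide.

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535  # dynamic/private port range


def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, proto, src_port, dst_port):
    """Map the L2-L4 fields of the inner frame to a UDP source port."""
    flow = "|".join(str(field) for field in
                    (src_mac, dst_mac, src_ip, dst_ip, proto, src_port, dst_port))
    digest = zlib.crc32(flow.encode("ascii"))  # stand-in hash, not the ASIC's
    return PORT_MIN + digest % (PORT_MAX - PORT_MIN + 1)


flow = ("00:00:10:00:00:0a", "00:00:10:00:00:0c",
        "192.0.2.10", "192.0.2.20", "tcp", 40000, 443)
print(vxlan_source_port(*flow))
```

Packets of the same inner flow always get the same source port, so they stay on one path, while different flows tend to spread across the equal-cost paths of the IP fabric.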
Cumulus Linux includes the native Linux VXLAN kernel support and integrates with controller-based overlay
solutions like VMware NSX (see page 566) and Midokura MidoNet (see page 535).
VXLAN is supported only on switches in the Cumulus Linux HCL using the Broadcom Tomahawk, Trident II+
and Trident II chipsets as well as the Mellanox Spectrum chipset.
VXLAN encapsulation over layer 3 subinterfaces (for example, swp3.111) is not supported.
Therefore, VXLAN uplinks should be only configured as layer 3 interfaces without any
subinterfaces (for example, swp3).
Furthermore, the VXLAN tunnel endpoints cannot share a common subnet; there must be at least
one layer 3 hop between the VXLAN source and destination.
Caveats/Errata
Useful Links
VXLAN - RFC 7348
ovsdb-server
Contents
This chapter covers ...
Requirements (see page 398)
Example Configuration (see page 399)
Configuring Static VXLAN Tunnels (see page 399)
Verifying the Configuration (see page 403)
Requirements
While they should be interoperable with other vendors, Cumulus Networks supports static VXLAN tunnels
only on switches in the Cumulus Linux HCL using the Broadcom Tomahawk, Trident II+ and Trident II
chipsets as well as the Mellanox Spectrum chipset.
For a basic VXLAN configuration, you should ensure that:
The VXLAN has a network identifier (VNI); do not use 0 or 16777215 as the VNI ID, as they are
reserved values under Cumulus Linux.
The VXLAN link and local interfaces are added to bridge to create the association between port,
VLAN and VXLAN instance.
Each bridge on the switch has only one VXLAN interface. Cumulus Linux does not support more
than one VXLAN link in a bridge.
The VXLAN registration daemon (vxrd) is not running. Static VXLAN tunnels do not interoperate
with LNV or EVPN. If vxrd is running, stop it with:
Example Configuration
The following topology is used in this chapter. Each IP address corresponds to the switch's loopback
address:
auto swp1
iface swp1
auto swp2
iface swp2
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.11
vxlan-remoteip 10.0.0.12
vxlan-remoteip 10.0.0.13
vxlan-remoteip 10.0.0.14
leaf02
auto swp2
iface swp2
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.12
vxlan-remoteip 10.0.0.11
vxlan-remoteip 10.0.0.13
vxlan-remoteip 10.0.0.14
leaf03
auto swp2
iface swp2
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.13
vxlan-remoteip 10.0.0.11
vxlan-remoteip 10.0.0.12
vxlan-remoteip 10.0.0.14
leaf04
auto swp2
iface swp2
auto bridge
iface bridge
bridge-ports vni-10
bridge-vids 10
bridge-vlan-aware yes
auto vni-10
iface vni-10
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.0.0.14
vxlan-remoteip 10.0.0.11
vxlan-remoteip 10.0.0.12
vxlan-remoteip 10.0.0.13
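The four configurations above follow one pattern: each leaf uses its own loopback as the local tunnel IP and lists every other leaf as a remote IP. As a convenience, that pattern could be generated with a short Python helper such as the sketch below. The function name and the indentation of the output are choices made for this example; the keywords and addresses come from the configurations above.

```python
def vni_stanza(local_ip, all_vteps, vni=10, vlan=10):
    """Render the vni-<id> stanza for one switch in a full mesh of static tunnels."""
    lines = [
        f"auto vni-{vni}",
        f"iface vni-{vni}",
        f"    bridge-access {vlan}",
        "    mstpctl-bpduguard yes",
        "    mstpctl-portbpdufilter yes",
        f"    vxlan-id {vni}",
        f"    vxlan-local-tunnelip {local_ip}",
    ]
    # Every other VTEP in the mesh becomes a remote IP.
    lines += [f"    vxlan-remoteip {ip}" for ip in all_vteps if ip != local_ip]
    return "\n".join(lines)


vteps = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]
print(vni_stanza("10.0.0.12", vteps))
```

Adding a fifth leaf then only means extending the vteps list and regenerating the stanza for every switch, which also shows why static tunnels become tedious as the fabric grows.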
Contents
This chapter covers ...
Requirements (see page 404)
Example VXLAN Configuration (see page 404)
Configuring the Static MAC Bindings VXLAN (see page 404)
Troubleshooting VXLANs in Cumulus Linux (see page 406)
Requirements
A VXLAN configuration requires a switch with a Broadcom Tomahawk, Trident II+ or Trident II chipset
running Cumulus Linux 2.0 or later, or a Mellanox Spectrum chipset running Cumulus Linux 3.2.0 or later.
For a basic VXLAN configuration, you should ensure that:
The VXLAN has a network identifier (VNI); do not use 0 or 16777215 as the VNI ID, as they are
reserved values under Cumulus Linux.
The VXLAN link and local interfaces are added to bridge to create the association between port,
VLAN and VXLAN instance.
Preconfiguring remote MAC addresses does not scale. A better solution is to use the Cumulus
Networks Lightweight Network Virtualization feature, or a controller-based option like Midokura
MidoNet and OpenStack or VMware NSX.
auto vtep1000
iface vtep1000
vxlan-id 1000
vxlan-local-tunnelip 172.10.1.1
auto bridge
iface bridge
bridge-ports swp1 swp2 vtep1000
bridge-vids 10
bridge-vlan-aware yes
post-up bridge fdb add 00:00:10:00:00:0C dev vtep1000 dst 172.20.1.1 vni 1000
auto vtep1000
iface vtep1000
vxlan-id 1000
vxlan-local-tunnelip 172.20.1.1
auto bridge
iface bridge
bridge-ports swp1 swp2 vtep1000
bridge-vlan-aware yes
post-up bridge fdb add 00:00:10:00:00:0A dev vtep1000 dst 172.10.1.1 vni 1000
post-up bridge fdb add 00:00:10:00:00:0B dev vtep1000 dst 172.10.1.1 vni 1000
LNV is a lightweight controller option. Please contact Cumulus Networks with your scale
requirements so we can make sure this is the right fit for you. There are also other controller
options that can work on Cumulus Linux.
When you are using LNV, you cannot use EVPN (see page 463) at the same time.
Contents
This chapter covers ...
Understanding LNV Concepts (see page 408)
Acquiring the Forwarding Database at the Service Node (see page 409)
MAC Learning and Flooding (see page 409)
Handling BUM Traffic (see page 409)
Requirements (see page 410)
Hardware Requirements (see page 410)
Configuration Requirements (see page 410)
Installing the LNV Packages (see page 411)
Sample LNV Configuration (see page 411)
Network Connectivity (see page 411)
Layer 3 IP Addressing (see page 412)
Layer 3 Fabric (see page 414)
Host Configuration (see page 416)
Configuring the VLAN to VXLAN Mapping (see page 417)
Verifying the VLAN to VXLAN Mapping (see page 419)
Enabling and Managing Service Node and Registration Daemons (see page 420)
Enabling the Service Node Daemon (see page 420)
The two switches running Cumulus Linux, called leaf1 and leaf2, each have a bridge configured. These two
bridges contain the physical switch port interfaces connecting to the servers as well as the logical VXLAN
interface associated with the bridge. By creating a logical VXLAN interface on both leaf switches, the
switches become VTEPs (virtual tunnel end points). The IP address associated with this VTEP is most
commonly configured as its loopback address — in the image above, the loopback address is 10.2.1.1 for
leaf1 and 10.2.1.2 for leaf2.
You cannot have both service node and head end replication configured simultaneously, as this
causes the BUM traffic to be duplicated — both the source VTEP and the service node sending
their own copy of each packet to every remote VTEP.
You only specify this parameter when head end replication is disabled. For the loopback,
the parameter is still named vxrd-svcnode-ip.
svcnode_ip = <>
enable_vxlan_listen = true
Requirements
Hardware Requirements
Switches with a Broadcom Tomahawk, Trident II+ or Trident II, or Mellanox Spectrum chipset
running Cumulus Linux 2.5.4 or later. Please refer to the Cumulus Networks hardware compatibility
list for a list of supported switch models.
Configuration Requirements
The VXLAN has an associated VXLAN Network Identifier (VNI), also interchangeably called a VXLAN
ID.
The VNI should not be 0 or 16777215, as these two numbers are reserved values under Cumulus
Linux.
The VXLAN link and physical interfaces are added to the bridge to create the association between
the port, VLAN and VXLAN instance.
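The VNI rule above is easy to check before you commit a configuration. The following Python sketch is a minimal illustration; the function name is hypothetical, and the upper bound simply reflects the 24-bit size of the VNI field.

```python
RESERVED_VNIS = {0, 16777215}  # reserved under Cumulus Linux, per the list above


def is_usable_vni(vni):
    """Return True if the VNI fits in 24 bits and is not a reserved value."""
    return 0 <= vni < 2 ** 24 and vni not in RESERVED_VNIS


for candidate in (0, 10, 1000, 16777215):
    print(candidate, is_usable_vni(candidate))
```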
Want to try out configuring LNV and don't have a Cumulus Linux switch? Check out Cumulus VX.
Network Connectivity
There must be full network connectivity before you can configure LNV. The layer 3 IP addressing
information as well as the OSPF configuration (/etc/frr/frr.conf) below is provided to make the LNV
example easier to understand.
OSPF is not a requirement for LNV; LNV just requires L3 connectivity. With Cumulus Linux this can
be achieved with static routes, OSPF or BGP.
Layer 3 IP Addressing
Here is the configuration for the IP addressing information used in this example.
spine1: spine2:
These commands create the following These commands create the following
configuration: configuration:
leaf1: leaf2:
These commands create the following These commands create the following
configuration: configuration:
Layer 3 Fabric
The service nodes and registration nodes must all be able to route to each other. The L3 fabric on
Cumulus Linux can be either BGP (see page 633) or OSPF (see page 617). In this example, OSPF is used to
demonstrate full reachability.
FRRouting configuration using OSPF:
spine1: spine2:
These commands create the following These commands create the following
configuration: configuration:
leaf1: leaf2:
These commands create the following These commands create the following
configuration: configuration:
Host Configuration
In this example, the servers are running Ubuntu 14.04. A trunk must be mapped from server1 and server2
to the respective switch. In Ubuntu, this is done with subinterfaces.
server1: server2:
On Ubuntu it is more reliable to use ifup and ifdown to bring the interfaces up and down individually,
rather than restarting networking entirely (that is, there is no equivalent to ifreload like there is in
Cumulus Linux):
leaf1:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo
iface lo
address 10.2.1.1/32
vxrd-src-ip 10.2.1.1
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.2.1.1
leaf2:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo
iface lo
address 10.2.1.2/32
vxrd-src-ip 10.2.1.2
bridge-access 10
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 10
vxlan-local-tunnelip 10.2.1.2
Why is vni-2000 not vni-20? For example, why not tie VLAN 20 to VNI 20, or why was 2000 used?
VXLANs and VLANs do not need to use the same number. The example uses different numbers on purpose to
highlight this fact. However, if you are using fewer than 4096 VLANs, there is no reason not to keep things
simple and match each VNI to its VLAN ID. It is completely up to you.
As with any logical interfaces on Linux, the name does not matter (other than a 15-character limit). To verify
the associated VNI for the logical name, use the ip -d link show command:
The vxlan id 10 indicates the VXLAN ID/VNI is indeed 10 as the logical name suggests.
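The same check can be scripted. The sketch below pulls the VNI out of text shaped like `ip -d link show` output and checks the 15-character name limit. The sample text is a trimmed, hypothetical excerpt (real output carries more fields), and the function names are illustrative only.

```python
import re

# Illustrative text only: a trimmed, hypothetical excerpt shaped like
# `ip -d link show vni-10` output. Real output contains more fields.
SAMPLE = """\
43: vni-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UNKNOWN
    link/ether 02:ec:ec:bd:7f:c6 brd ff:ff:ff:ff:ff:ff
    vxlan id 10 local 10.2.1.1 svcnode 10.2.1.3
"""

NAME_LIMIT = 15  # the 15-character interface name limit mentioned above


def vxlan_id(link_output: str):
    """Return the VNI that follows 'vxlan id', or None if it is absent."""
    match = re.search(r"\bvxlan id (\d+)", link_output)
    return int(match.group(1)) if match else None


def name_fits(ifname: str) -> bool:
    """Return True if the interface name is non-empty and within the limit."""
    return 0 < len(ifname) <= NAME_LIMIT


if __name__ == "__main__":
    print(vxlan_id(SAMPLE), name_fits("vni-2000"))
```

Comparing the parsed VNI with the number embedded in the logical name catches cases where the name and the vxlan-id have drifted apart.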
svcnode_ip = 10.2.1.3
svcnode_ip = 10.2.1.3
Enable, then restart the registration node daemon for the change to take effect:
loglevel: The log level, which can be DEBUG, INFO, WARNING, ERROR or CRITICAL (default: INFO).
logdest: The destination for log messages. It can be a file name, stdout or syslog (default: syslog).
logfilesize: Log file size in bytes. Used when logdest is a file name (default: 512000).
logbackupcount: Maximum number of log files stored on the disk. Used when logdest is a file name (default: 14).
pidfile: The PID file location for the vxrd daemon (default: /var/run/vxrd.pid).
udsfile: The file name for the Unix domain socket used for management (default: /var/run/vxrd.sock).
vxfld_port: The UDP port used for VXLAN control messages (default: 10001).
svcnode_ip: The address to which registration daemons send control messages for registration and/or BUM
packets for replication. This can also be configured under /etc/network/interfaces with the
vxrd-svcnode-ip keyword.
holdtime: Hold time (in seconds) for soft state, which is how long the service node waits before ageing out an
IP address for a VNI. The vxrd includes this in the register messages it sends to a vxsnd (default: 90 seconds).
src_ip: Local IP address to bind to for receiving control traffic from the service node daemon.
refresh_rate: Number of times to refresh within the hold time. The higher this number, the more lost UDP
refresh messages can be tolerated (default: 3).
config_check_rate: The number of seconds to poll the system for current VXLAN membership (default: 5 seconds).
head_rep: Enables self replication. Instead of using the service node to replicate BUM packets, replication is
done in hardware on the VTEP switch (default: true).
Use 1, yes, true or on for True for each relevant option. Use 0, no, false or off for False.
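The relationship between holdtime and refresh_rate can be made concrete with a little arithmetic. The sketch below assumes the refreshes are spread evenly across the hold time, which this guide does not state explicitly. The function name is hypothetical.

```python
def refresh_interval(holdtime: float = 90, refresh_rate: int = 3) -> float:
    """Seconds between refreshes, assuming they are evenly spaced.

    refresh_rate is the number of refreshes sent within one hold time.
    """
    if refresh_rate <= 0:
        raise ValueError("refresh_rate must be a positive integer")
    return holdtime / refresh_rate


if __name__ == "__main__":
    # With the defaults described above (90 second hold time, 3 refreshes).
    print(refresh_interval())
```

Under that assumption the defaults give one refresh every 30 seconds, and raising refresh_rate shortens the interval, which is why more lost refresh messages can be tolerated.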
For the example configuration, default values are used, except for the svcnode_ip field.
The address field is set to the loopback address of the switch running the vxsnd daemon.
svcnode_ip = 10.2.1.3
Enable, then restart the service node daemon for the change to take effect:
loglevel: The log level, which can be DEBUG, INFO, WARNING, ERROR or CRITICAL (default: INFO).
logdest: Destination for log messages. It can be a file name, stdout or syslog (default: syslog).
logfilesize: The log file size in bytes. Used when logdest is a file name (default: 512000).
logbackupcount: Maximum number of log files stored on disk. Used when logdest is a file name (default: 14).
pidfile: The PID file location for the vxsnd daemon (default: /var/run/vxsnd.pid).
udsfile: The file name for the Unix domain socket used for management (default: /var/run/vxsnd.sock).
vxfld_port: The UDP port used for VXLAN control messages (default: 10001).
svcnode_ip: The address to which registration daemons send control messages for registration and/or BUM
packets for replication (default: 0.0.0.0).
holdtime: Hold time (in seconds) for soft state. It is used when sending a register message to peers in
response to learning a <vni, addr> from a VXLAN data packet (default: 90).
src_ip: Local IP address to bind to for receiving inter-vxsnd control traffic (default: 0.0.0.0).
svcnode_peers: Space-separated list of IP addresses with which the vxsnd shares its state.
enable_vxlan_listen: When set to true, the service node listens for VXLAN data traffic (default: true).
install_svcnode_ip: When set to true, the snd_peer_address gets installed on the loopback interface. It gets
withdrawn when the vxsnd is not in service. If set to true, you must define the snd_peer_address
configuration variable (default: false).
age_check: Number of seconds to wait before checking the database to age out stale entries (default: 90 seconds).
Use 1, yes, true or on for True for each relevant option. Use 0, no, false or off for False.
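The boolean spellings listed above happen to match the ones accepted by Python's standard configparser module, so an INI-style file like this can be inspected with a few lines of code. This is a sketch under assumptions: the sample text is assembled from option names and values that appear elsewhere in this chapter, the function name is hypothetical, and nothing here claims that vxsnd itself parses its file this way.

```python
import configparser

# Sample text assembled from options shown elsewhere in this chapter.
SAMPLE_CONF = """\
[common]
svcnode_ip = 10.10.10.10
src_ip = 10.2.1.3

[vxsnd]
svcnode_peers = 10.2.1.4
enable_vxlan_listen = on
install_svcnode_ip = 0
"""


def load_vxsnd_options(text: str) -> dict:
    """Read a vxsnd.conf-style string and normalise a few options."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "svcnode_ip": parser.get("common", "svcnode_ip"),
        # The peers option is a space-separated list of addresses.
        "svcnode_peers": parser.get("vxsnd", "svcnode_peers", fallback="").split(),
        # getboolean accepts 1/yes/true/on and 0/no/false/off.
        "enable_vxlan_listen": parser.getboolean(
            "vxsnd", "enable_vxlan_listen", fallback=True
        ),
        "install_svcnode_ip": parser.getboolean(
            "vxsnd", "install_svcnode_ip", fallback=False
        ),
    }


if __name__ == "__main__":
    print(load_vxsnd_options(SAMPLE_CONF))
```

The fallback values mirror the defaults in the table above, so a commented-out option behaves as if it were set to its default.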
Use the vxrdctl peers command to see configured VNIs and all VTEPs (leaf switches) within the network
that have them configured.
cumulus@leaf1:~$ vxrdctl peers
VNI Peer Addrs
=== ==========
10 10.2.1.1, 10.2.1.2
30 10.2.1.1, 10.2.1.2
2000 10.2.1.1, 10.2.1.2
cumulus@leaf2:~$ vxrdctl peers
VNI Peer Addrs
=== ==========
10 10.2.1.1, 10.2.1.2
30 10.2.1.1, 10.2.1.2
2000 10.2.1.1, 10.2.1.2
The vxrdctl peers command shows the other VTEPs (leaf switches) and the VNIs associated with them only
when head end replication is enabled, that is, when the head_rep option is set to True. When head end
replication is disabled, replication is done by the service node and the command shows nothing.
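For scripted checks, the vxrdctl peers output shown above can be turned into a dictionary keyed by VNI. The parser below is a sketch that relies only on the layout visible in this guide: a header row, a row of equals signs, then one VNI per entry followed by comma-separated addresses that may wrap onto the next line. The function name is hypothetical.

```python
SAMPLE = """\
VNI Peer Addrs
=== ==========
10 10.2.1.1, 10.2.1.2
30 10.2.1.1, 10.2.1.2
2000 10.2.1.1, 10.2.1.2
"""


def parse_peers(output: str) -> dict:
    """Map each VNI to the list of peer addresses printed for it."""
    peers = {}
    current = None
    for line in output.splitlines():
        tokens = line.replace(",", " ").split()
        # Skip blank lines, the header row and the separator row.
        if not tokens or tokens[0] == "VNI" or set(tokens[0]) == {"="}:
            continue
        # A leading all-digit token starts a new VNI entry; an address
        # such as 10.2.1.2 contains dots, so it is treated as a continuation.
        if tokens[0].isdigit():
            current = int(tokens[0])
            peers[current] = []
            tokens = tokens[1:]
        if current is not None:
            peers[current].extend(tokens)
    return peers


if __name__ == "__main__":
    print(parse_peers(SAMPLE))
```

Comparing the parsed result from each leaf is one way to confirm that every VTEP sees the same peer list for a VNI.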
Total In Packets : 17
Total Out Octets : 552
Total Out Packets : 17
VNI : 2000
Network In Octets : 676
Network In Packets : 5
Network Out Octets : 1072
Network Out Packets : 8
Total In Octets : 2030
Total In Packets : 20
Total Out Octets : 2042
Total Out Packets : 30
VN Interface : vni: 2000, swp32s0.20
Total In Octets : 1354
Total In Packets : 15
Total Out Octets : 446
As mentioned above, SVIs (switch VLAN interfaces) are not supported when using VXLAN. That is,
there cannot be an IP address on the bridge that also contains a VXLAN.
10 10.10.10.1 10.10.10.2
30 10.10.30.1 10.10.30.2
The other VNIs were also tested and can be viewed in the expanded output below.
Test connectivity between VNI-2000 connected servers by pinging from server1:
90:e2:ba:55:f0:85 appears in the MAC address table, which indicates that connectivity is occurring between
leaf1 and server1.
spine1:
Add the 10.10.10.10/32 address to the loopback address:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.2.1.3/32
address 10.10.10.10/32
spine2:
Add the 10.10.10.10/32 address to the loopback address:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.2.1.4/32
address 10.10.10.10/32
spine1:
Use a text editor to edit the network configuration:
This sets the address on which the service node listens to VXLAN messages to the configured
anycast address and sets it to sync with spine2.
Enable, then restart the vxsnd daemon:
spine2:
Use a text editor to edit the network configuration:
This sets the address on which the service node listens to VXLAN messages to the configured
anycast address and sets it to sync with spine1.
Enable, then restart the vxsnd daemon:
leaf1:
Change the vxrd-svcnode-ip field to the anycast address:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.2.1.1
vxrd-svcnode-ip 10.10.10.10
Verify the new service node is configured:
The svcnode 10.10.10.10 means the interface has the correct service node configured.
Use the vxrdctl vxlans command to check the service node:
leaf2:
Change the vxrd-svcnode-ip field to the anycast address:
These commands create the following configuration in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.2.1.2
vxrd-svcnode-ip 10.10.10.10
Verify the new service node is configured:
The svcnode 10.10.10.10 means the interface has the correct service node configured.
Use the vxrdctl vxlans command to check the service node:
Testing Connectivity
Repeat the ping tests from the previous section. Here is the table again for reference:
10 10.10.10.1 10.10.10.2
30 10.10.30.1 10.10.30.2
To prevent this issue from occurring, you should specify an anycast IP address for the loopback interface in
both /etc/network/interfaces and vxsnd.conf. This way, in case vxsnd fails, you can withdraw the
IP address.
Related Information
tools.ietf.org/html/rfc7348
en.wikipedia.org/wiki/Anycast
Network virtualization chapter, Cumulus Linux user guide (see page 396)
Contents
This chapter covers ...
Terminology and Definitions
Configuring LNV Active-Active Mode
Active-Active VTEP Anycast IP Behavior
Failure Scenario Behaviors
Checking VXLAN Interface Configuration Consistency
Configuring the Anycast IP Address
Example VXLAN Active-Active Configuration
FRRouting Configuration
Layer 3 IP Addressing
FRRouting OSPF Configuration
Host Configuration
Enable the Registration Daemon
Configuring a VTEP
Enable the Service Node Daemon
Configuring the Service Node
Considerations for Virtual Topologies Using Cumulus VX
Node ID
Bonds with Vagrant
Troubleshooting with LNV Active-Active
Caveats and Errata
Related Information
Term: Definition
vxrd: VXLAN registration daemon. Runs on the switch that is mapping VLANs to VXLANs. The vxrd
daemon needs to be configured to register to a service node. This turns the switch into a VTEP.
VTEP: Virtual tunnel endpoint. This is an encapsulation and decapsulation point for VXLANs.
Spine: The aggregation switch for multiple leafs. Specifically used when a data center is using a Clos
network architecture. Read more about spine-leaf architecture in this white paper.
vxsnd: VXLAN service node daemon, which can be run to register multiple VTEPs.
exit leaf: A switch dedicated to peering the Clos network to an outside network. Also referred to as
border leafs, service leafs or edge leafs.
anycast: When an IP address is advertised from multiple locations. Allows multiple devices to share the
same IP and effectively load balance traffic across them. With LNV, anycast is used in 2 places:
RIOT: Broadcom feature for routing in and out of tunnels. Allows a VXLAN bridge to have a switch
VLAN interface associated with it, and traffic to exit a VXLAN into the layer 3 fabric. Also called
VXLAN Routing.
VXLAN Routing: Industry standard term for the ability to route in and out of a VXLAN. Equivalent to the
Broadcom RIOT feature.
MLAG: Refer to the MLAG chapter for more detailed configuration information.
Configurations for the demonstration are provided below.
OSPF or BGP: Refer to the OSPF chapter (see page 617) or the BGP chapter for more detailed
configuration information. Configurations for the demonstration are provided below.
LNV: Refer to the LNV chapter (see page 435) for more detailed configuration information.
Configurations for the demonstration are provided below.
STP: BPDU filter and BPDU guard should be enabled in the VXLAN interfaces if
STP (see page 435) is enabled in the bridge that is connected to the VXLAN.
Configurations for the demonstration are provided below.
1. When the switches boot up, ifupdown2 places all VXLAN interfaces in a PROTO_DOWN state. The
anycast addresses are not yet configured on the switches.
2. MLAG peering takes place, and the VXLAN interface consistency check between the switches
succeeds.
3. clagd (the daemon responsible for MLAG) adds the anycast address to the loopback interface. It
then changes the local IP address of the VXLAN interface from a unique address to the anycast virtual
IP address and puts the interface in an UP state.
Scenario: Behavior
The peer link goes down: The primary MLAG switch continues to keep all VXLAN interfaces up with the
anycast IP address while the secondary switch brings down all VXLAN interfaces and places them in a
PROTO_DOWN state. The secondary MLAG switch removes the anycast IP address from the loopback
interface and changes the local IP address of the VXLAN interface to the configured unique IP address.
One of the switches goes down: The other operational switch continues to use the anycast IP address.
clagd is stopped: All VXLAN interfaces are put in a PROTO_DOWN state. The anycast IP address is
removed from the loopback interface and the local IP addresses of the VXLAN interfaces are changed
from the anycast IP address to unique non-virtual IP addresses.
MLAG peering could not be established between the switches: clagd brings up all the VXLAN interfaces
after the reload timer expires with the configured anycast IP address. This allows the VXLAN interface to
be up and running on both switches even though peering is not established.
When the peer link goes down but the peer switch is up (i.e. the backup link is active): All VXLAN
interfaces are put into a PROTO_DOWN state on the secondary switch.
A configuration mismatch between the MLAG switches: The VXLAN interface is placed into a
PROTO_DOWN state on the secondary switch.
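The scenarios above can be summarised as a small lookup table, which is handy when writing monitoring checks. The sketch below encodes only the rows where the text states both the interface state and the local IP address. The scenario keys, role labels and function name are hypothetical labels chosen for this example, not clagd terminology.

```python
from typing import NamedTuple, Optional


class VtepState(NamedTuple):
    link_state: str  # "UP" or "PROTO_DOWN"
    local_ip: str    # "anycast" or "unique"


# Scenario and role labels are hypothetical names for rows of the table above.
# A role of None means the table does not distinguish primary from secondary.
EXPECTED = {
    ("peer_link_down", "primary"): VtepState("UP", "anycast"),
    ("peer_link_down", "secondary"): VtepState("PROTO_DOWN", "unique"),
    ("clagd_stopped", None): VtepState("PROTO_DOWN", "unique"),
    ("peering_not_established", None): VtepState("UP", "anycast"),
}


def expected_state(scenario: str, role: str) -> Optional[VtepState]:
    """Look up the documented state, preferring a role-specific row."""
    if (scenario, role) in EXPECTED:
        return EXPECTED[(scenario, role)]
    return EXPECTED.get((scenario, None))


if __name__ == "__main__":
    print(expected_state("peer_link_down", "secondary"))
```

Rows that describe only part of the outcome, such as the configuration mismatch case, are left out so that the table never asserts more than the guide does.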
auto lo
iface lo inet loopback
address 10.0.0.11/32
vxrd-src-ip 10.0.0.11
vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 10.10.10.20
auto lo
iface lo inet loopback
address 10.0.0.12/32
vxrd-src-ip 10.0.0.12
vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 10.10.10.20
Explanation of Variables
Variable: Explanation
vxrd-src-ip: The local IP address that vxrd binds to for receiving control traffic from the service node
daemon.
vxrd-svcnode-ip: The service node anycast IP address in the topology. In this demonstration, this is an
anycast IP address being shared by both spine switches.
clagd-vxlan-anycast-ip: The anycast address for the MLAG pair to share and bind to when MLAG is up and
running.
Note the configuration of the local IP address in the VXLAN interfaces below. They are configured with
individual IP addresses, which clagd changes to anycast upon MLAG peering.
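Because both switches in an MLAG pair must share the anycast address while keeping their own source address, a quick comparison of the two loopback stanzas can catch mistakes early. The sketch below uses the stanzas shown earlier in this section. The helper names are hypothetical, and the two rules it checks (shared clagd-vxlan-anycast-ip, distinct vxrd-src-ip) are this example's reading of the explanations above rather than the checks clagd itself performs.

```python
LEAF_A = """\
auto lo
iface lo inet loopback
    address 10.0.0.11/32
    vxrd-src-ip 10.0.0.11
    vxrd-svcnode-ip 10.10.10.10
    clagd-vxlan-anycast-ip 10.10.10.20
"""
# The peer's stanza differs only in its unique loopback address.
LEAF_B = LEAF_A.replace("10.0.0.11", "10.0.0.12")


def loopback_options(stanza: str) -> dict:
    """Collect 'keyword value' lines, skipping the auto and iface lines."""
    options = {}
    for line in stanza.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] not in ("auto", "iface"):
            options[parts[0]] = parts[1]
    return options


def pair_problems(stanza_a: str, stanza_b: str) -> list:
    """Return human-readable problems found between two MLAG peers."""
    a, b = loopback_options(stanza_a), loopback_options(stanza_b)
    problems = []
    if a.get("clagd-vxlan-anycast-ip") != b.get("clagd-vxlan-anycast-ip"):
        problems.append("clagd-vxlan-anycast-ip differs between the peers")
    if a.get("vxrd-src-ip") == b.get("vxrd-src-ip"):
        problems.append("vxrd-src-ip is not unique per switch")
    return problems


if __name__ == "__main__":
    print(pair_problems(LEAF_A, LEAF_B))
```

An empty list means the pair passes both checks. Any entry in the list names the option to look at first.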
FRRouting Configuration
The layer 3 fabric can be configured using BGP (see page 435) or OSPF (see page 435). The following
example uses BGP Unnumbered. The MLAG switch configuration for the topology above is shown below.
Layer 3 IP Addressing
The IP address configuration for this example:
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.21/32 address 10.0.0.22/32
address 10.10.10.10/32 address 10.10.10.10/32
# downlinks # downlinks
auto swp1 auto swp1
iface swp1 iface swp1
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.11/32 address 10.0.0.12/32
vxrd-src-ip 10.0.0.11 vxrd-src-ip 10.0.0.12
vxrd-svcnode-ip 10.10.10.10 vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 10.10.10.20 clagd-vxlan-anycast-ip 10.10.10.20
# peerlinks # peerlinks
auto swp49 auto swp49
iface swp49 iface swp49
bond-min-links 1 bond-min-links 1
bond-xmit-hash-policy layer3+4 bond-xmit-hash-policy layer3+4
# Downlinks # Downlinks
auto swp1 auto swp1
iface swp1 iface swp1
auto vlan20 auto vlan20
iface vlan20 iface vlan20
bridge-ports peerlink.20 bond0.20 vxlan20 bridge-ports peerlink.20 bond0.20 vxlan20
bridge-stp on bridge-stp on
mstpctl-portbpdufilter vxlan20=yes mstpctl-portbpdufilter vxlan20=yes
mstpctl-bpduguard vxlan20=yes mstpctl-bpduguard vxlan20=yes
#vxlan config #vxlan config
auto vxlan1 auto vxlan1
iface vxlan1 iface vxlan1
vxlan-id 1 vxlan-id 1
vxlan-local-tunnelip 10.0.0.11 vxlan-local-tunnelip 10.0.0.12
auto vxlan10 auto vxlan10
iface vxlan10 iface vxlan10
vxlan-id 10 vxlan-id 10
vxlan-local-tunnelip 10.0.0.11 vxlan-local-tunnelip 10.0.0.12
auto vxlan20 auto vxlan20
iface vxlan20 iface vxlan20
vxlan-id 20 vxlan-id 20
vxlan-local-tunnelip 10.0.0.11 vxlan-local-tunnelip 10.0.0.12
# uplinks # uplinks
auto swp51 auto swp51
iface swp51 iface swp51
auto swp52 auto swp52
iface swp52 iface swp52
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.13/32 address 10.0.0.14/32
vxrd-src-ip 10.0.0.13 vxrd-src-ip 10.0.0.14
vxrd-svcnode-ip 10.10.10.10 vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 10.10.10.30 clagd-vxlan-anycast-ip 10.10.10.30
# peerlinks # peerlinks
auto swp49 auto swp49
iface swp49 iface swp49
# Downlinks # Downlinks
auto swp1 auto swp1
iface swp1 iface swp1
mstpctl-portbpdufilter vxlan1=yes mstpctl-portbpdufilter vxlan1=yes
mstpctl-bpduguard vxlan1=yes mstpctl-bpduguard vxlan1=yes
# uplinks # uplinks
auto swp51 auto swp51
iface swp51 iface swp51
! !
interface swp1 interface swp1
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
interface swp2 interface swp2
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
interface swp3 interface swp3
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
interface swp4 interface swp4
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
interface swp29 interface swp29
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
interface swp30 interface swp30
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
router bgp 65020 router bgp 65020
bgp router-id 10.0.0.21 bgp router-id 10.0.0.22
network 10.0.0.21/32 network 10.0.0.22/32
network 10.10.10.10/32 network 10.10.10.10/32
bgp bestpath as-path bgp bestpath as-path
multipath-relax multipath-relax
bgp bestpath compare-routerid bgp bestpath compare-routerid
bgp default show-hostname bgp default show-hostname
neighbor FABRIC peer-group neighbor FABRIC peer-group
neighbor FABRIC remote-as neighbor FABRIC remote-as
external external
neighbor FABRIC description neighbor FABRIC description
Internal Fabric Network Internal Fabric Network
neighbor FABRIC neighbor FABRIC
advertisement-interval 0 advertisement-interval 0
neighbor FABRIC timers 1 3 neighbor FABRIC timers 1 3
! !
! !
interface swp51 interface swp51
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
interface swp52 interface swp52
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
ipv6 nd ra-interval 3 ipv6 nd ra-interval 3
! !
router bgp 65013 router bgp 65014
bgp router-id 10.0.0.13 bgp router-id 10.0.0.14
network 10.0.0.13/32 network 10.0.0.14/32
network 172.16.3.0/24 network 172.16.3.0/24
network 10.10.10.30/32 network 10.10.10.30/32
bgp bestpath as-path bgp bestpath as-path
multipath-relax multipath-relax
bgp bestpath compare-routerid bgp bestpath compare-routerid
bgp default show-hostname bgp default show-hostname
neighbor FABRIC peer-group neighbor FABRIC peer-group
neighbor FABRIC remote-as neighbor FABRIC remote-as
external external
neighbor FABRIC description neighbor FABRIC description
Internal Fabric Network Internal Fabric Network
neighbor FABRIC neighbor FABRIC
advertisement-interval 0 advertisement-interval 0
neighbor FABRIC timers 1 3 neighbor FABRIC timers 1 3
neighbor FABRIC timers neighbor FABRIC timers
connect 3 connect 3
neighbor FABRIC capability neighbor FABRIC capability
extended-nexthop extended-nexthop
neighbor FABRIC filter-list neighbor FABRIC filter-list
dc-leaf-out out dc-leaf-out out
neighbor swp51 interface neighbor swp51 interface
neighbor swp51 peer-group neighbor swp51 peer-group
FABRIC FABRIC
neighbor swp52 interface neighbor swp52 interface
neighbor swp52 peer-group neighbor swp52 peer-group
FABRIC FABRIC
! !
ip as-path access-list dc-leaf- ip as-path access-list dc-leaf-
out permit ^$ out permit ^$
! !
Host Configuration
In this example, the servers are running Ubuntu 14.04. A layer 2 bond must be mapped from server01 and
server03 to the respective switch. In Ubuntu, this is done with subinterfaces.
server01 server03
auto lo auto lo
iface lo inet loopback iface lo inet loopback
auto lo auto lo
iface lo inet static iface lo inet static
address 10.0.0.31/32 address 10.0.0.33/32
START=yes
Configuring a VTEP
The registration node was configured earlier in /etc/network/interfaces; no additional configuration
is typically needed. Alternatively, the configuration can be done in /etc/vxrd.conf, which has additional
configuration knobs available.
START=yes
On the service node whose loopback address is 10.0.0.21:
[common]
# Log level is one of DEBUG, INFO, WARNING, ERROR, CRITICAL
#loglevel = INFO
# Destination for log message. Can be a file name, 'stdout', or 'syslog'
#logdest = syslog
# log file size in bytes. Used when logdest is a file
#logfilesize = 512000
# maximum number of log files stored on disk. Used when logdest is a file
#logbackupcount = 14
# The file to write the pid. If using monit, this must match the one
# in the vxsnd.rc
#pidfile = /var/run/vxsnd.pid
# The file name for the unix domain socket used for mgmt.
#udsfile = /var/run/vxsnd.sock
# UDP port for vxfld control messages
#vxfld_port = 10001
# This is the address to which registration daemons send control messages for
# registration and/or BUM packets for replication
svcnode_ip = 10.10.10.10
# Holdtime (in seconds) for soft state. It is used when sending a
# register msg to peers in response to learning a <vni, addr> from a
# VXLAN data pkt
#holdtime = 90
# Local IP address to bind to for receiving inter-vxsnd control traffic
src_ip = 10.0.0.21
[vxsnd]
# Space separated list of IP addresses of vxsnd to share state with
svcnode_peers = 10.0.0.21 10.0.0.22
# When set to true, the service node will listen for vxlan data traffic
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#enable_vxlan_listen = true
On the service node whose loopback address is 10.0.0.22, the file is identical except for this line:
src_ip = 10.0.0.22
Node ID
vxrd requires a unique node_id for each individual switch. The node_id is derived from the MAC address of
the first interface. In certain virtual topologies, such as those built with Vagrant, both leaf switches within
an MLAG pair can generate the same node_id. When that happens, configure one of the node_ids
manually, or make sure the first interface always has a unique MAC address.
To verify the node_id that gets configured by your switch, use the vxrdctl get config command:
"udsfile": "/var/run/vxrd.sock",
"vxfld_port": 10001
}
[common]
node_id = 13
Ensure that each leaf has a separate node_id so that LNV can function correctly.
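Once the node_id of every leaf has been collected (for example from the vxrdctl get config output on each switch), duplicates are easy to flag. The sketch below is illustrative: the function name and the sample mapping are hypothetical, and it does not reproduce how vxrd derives the node_id from the MAC address.

```python
from collections import defaultdict


def duplicate_node_ids(node_ids: dict) -> dict:
    """Map each node_id used by more than one switch to the switch names."""
    by_id = defaultdict(list)
    for switch, node_id in node_ids.items():
        by_id[node_id].append(switch)
    return {nid: sorted(names) for nid, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical values gathered by hand from each leaf.
    collected = {"leaf01": 13, "leaf02": 13, "leaf03": 14}
    print(duplicate_node_ids(collected))
```

Any switch that appears in the result needs a manually configured node_id, as described above.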
auto swp49
iface swp49
#for vagrant so bonds work correctly
post-up ip link set $IFACE promisc on
auto swp50
iface swp50
#for vagrant so bonds work correctly
post-up ip link set $IFACE promisc on
For more information on using Cumulus VX and Vagrant, refer to the Cumulus VX documentation.
cumulus@leaf01$ clagctl
The peer is alive
Our Priority, ID, and Role: 32768 44:38:39:00:00:35 primary
Peer Priority, ID, and Role: 32768 44:38:39:00:00:36 secondary
Peer Interface and IP: peerlink.4094 169.254.1.2
VxLAN Anycast IP: 10.10.10.30
Backup IP: 10.0.0.14 (inactive)
Output: VXLAN Anycast IP: 10.10.10.30
Explanation: The anycast IP address being shared by the MLAG pair for VTEP termination is in use and is
10.10.10.30.
Output: Conflicts: - Proto-Down Reason: -
In the next example, the vxlan-id on VXLAN10 was changed to an incorrect value. When you run the clagctl
command, you see that VXLAN10 went down because this switch is the secondary switch and the peer
switch took control of the VXLAN. The reason code is vxlan-single, indicating that there is a vxlan-id
mismatch on VXLAN10.
cumulus@leaf02$ clagctl
The peer is alive
Peer Priority, ID, and Role: 32768 44:38:39:00:00:11 primary
Our Priority, ID, and Role: 32768 44:38:39:00:00:12 secondary
Peer Interface and IP: peerlink.4094 169.254.1.1
VxLAN Anycast IP: 10.10.10.20
Backup IP: 10.0.0.11 (inactive)
System MAC: 44:38:39:ff:40:94
CLAG Interfaces
Related Information
Network virtualization chapter, Cumulus Linux user guide (see page 396)
LNV is a lightweight controller option. Please contact Cumulus Networks with your scale
requirements and we can make sure this is the right fit for you. There are also other controller
options that can work on Cumulus Linux.
Contents
This chapter covers ...
Example LNV Configuration
Layer 3 IP Addressing
FRRouting Configuration
Host Configuration
Service Node Configuration
Related Information
Want to try out configuring LNV and don't have a Cumulus Linux switch? Check out Cumulus VX.
Feeling Overwhelmed? Come join a Cumulus Boot Camp and get instructor-led training!
Layer 3 IP Addressing
Here is the configuration for the IP addressing information used in this example:
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.2.1.3/32 address 10.2.1.4/32
address 10.10.10.10/32 address 10.10.10.10/32
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.2.1.1/32 address 10.2.1.2/32
vxrd-src-ip 10.2.1.1 vxrd-src-ip 10.2.1.2
vxrd-svcnode-ip 10.10.10.10 vxrd-svcnode-ip 10.10.10.10
FRRouting Configuration
The service nodes and registration nodes must all be able to route to each other. The L3 fabric on
Cumulus Linux can either be BGP (see page 633) or OSPF (see page 617). In this example, OSPF is used to
demonstrate full reachability.
Here is the FRRouting configuration using OSPF:
interface lo interface lo
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
interface swp49 interface swp49
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface swp50 interface swp50
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface lo interface lo
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
interface swp1s0 interface swp1s0
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface swp1s1 interface swp1s1
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface swp1s2 interface swp1s2
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
interface swp1s3 interface swp1s3
ip ospf network point-to- ip ospf network point-to-
point point
ip ospf area 0.0.0.0 ip ospf area 0.0.0.0
! !
! !
! !
! !
! !
router-id 10.2.1.1 router-id 10.2.1.2
router ospf router ospf
Host Configuration
In this example, the servers are running Ubuntu 14.04. A trunk must be mapped from server1 and server2
to the respective switch. In Ubuntu this is done with subinterfaces.
server1 server2
spine1:/etc/vxsnd.conf
[common]
# Log level is one of DEBUG, INFO, WARNING, ERROR, CRITICAL
#loglevel = INFO
# Destination for log message. Can be a file name, 'stdout', or 'syslog'
#logdest = syslog
# log file size in bytes. Used when logdest is a file
#logfilesize = 512000
# maximum number of log files stored on disk. Used when logdest is a file
#logbackupcount = 14
# The file to write the pid. If using monit, this must match the one
# in the vxsnd.rc
#pidfile = /var/run/vxsnd.pid
# The file name for the unix domain socket used for mgmt.
#udsfile = /var/run/vxsnd.sock
# UDP port for vxfld control messages
#vxfld_port = 10001
# This is the address to which registration daemons send control messages for
# registration and/or BUM packets for replication
svcnode_ip = 10.10.10.10
# Holdtime (in seconds) for soft state. It is used when sending a
# register msg to peers in response to learning a <vni, addr> from a
# VXLAN data pkt
#holdtime = 90
# Local IP address to bind to for receiving inter-vxsnd control traffic
src_ip = 10.2.1.3
[vxsnd]
# Space separated list of IP addresses of vxsnd to share state with
svcnode_peers = 10.2.1.4
# When set to true, the service node will listen for vxlan data traffic
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#enable_vxlan_listen = true
# When set to true, the svcnode_ip will be installed on the loopback
# interface, and it will be withdrawn when the vxsnd is no longer in
# service. If set to true, the svcnode_ip configuration
# variable must be defined.
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#install_svcnode_ip = false
# Seconds to wait before checking the database to age out stale entries
#age_check = 90
spine2:/etc/vxsnd.conf
Identical to the spine1 file except for these two settings:
src_ip = 10.2.1.4
svcnode_peers = 10.2.1.3
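Because each service node names the other as its peer, the two vxsnd.conf files above can drift out of step. The following is a minimal Python sketch of a cross-check on the three uncommented settings. The helper names load and check_pair are illustrative and are not part of vxsnd or Cumulus Linux; the embedded file contents are trimmed copies of the example values.

```python
import configparser

# Trimmed copies of the example files: only the uncommented settings.
SPINE1 = """
[common]
svcnode_ip = 10.10.10.10
src_ip = 10.2.1.3
[vxsnd]
svcnode_peers = 10.2.1.4
"""

SPINE2 = """
[common]
svcnode_ip = 10.10.10.10
src_ip = 10.2.1.4
[vxsnd]
svcnode_peers = 10.2.1.3
"""


def load(text):
    """Parse vxsnd.conf-style text and return the settings of interest."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "svcnode_ip": parser["common"]["svcnode_ip"],
        "src_ip": parser["common"]["src_ip"],
        # svcnode_peers is a space separated list
        "peers": parser["vxsnd"]["svcnode_peers"].split(),
    }


def check_pair(a, b):
    """Return a list of problems found between two service node configs."""
    problems = []
    if a["svcnode_ip"] != b["svcnode_ip"]:
        problems.append("svcnode_ip (anycast address) differs")
    if b["src_ip"] not in a["peers"]:
        problems.append("first node does not list the second node's src_ip as a peer")
    if a["src_ip"] not in b["peers"]:
        problems.append("second node does not list the first node's src_ip as a peer")
    return problems


if __name__ == "__main__":
    print(check_pair(load(SPINE1), load(SPINE2)) or "configs agree")
```

An empty list means the pair agrees on the anycast address and each node peers with the other's src_ip.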
Related Information
tools.ietf.org/html/rfc7348
en.wikipedia.org/wiki/Anycast
Detailed LNV Configuration Guide (see page 407)
Cumulus Networks Training
Network virtualization chapter, Cumulus Linux user guide (see page 396)
When you are using EVPN, you cannot use LNV at the same time.
Ethernet Virtual Private Network (EVPN) is a standards-based control plane for VXLAN (see page 396)
defined in RFC 7432 and draft-ietf-bess-evpn-overlay that allows for building and deploying VXLANs at scale.
It relies on multi-protocol BGP (MP-BGP) for exchanging information and is based on BGP-MPLS IP VPNs (
RFC 4364). Hence, it has provisions to enable not only bridging between end systems in the same layer 2
segment but also routing between different segments (subnets). There is also inherent support for multi-
tenancy. EVPN is often referred to as the means of implementing controller-less VXLAN.
Cumulus Linux fully supports EVPN as the control plane for VXLAN, including for both intra-subnet bridging
and inter-subnet routing. Key features include:
VNI membership exchange between VTEPs using EVPN type-3 (Inclusive multicast Ethernet tag)
routes.
Exchange of host MAC and IP addresses using EVPN type-2 (MAC/IP advertisement) routes.
Support for host/VM mobility (MAC and IP moves) through exchange of the MAC Mobility Extended
community.
Support for dual-attached hosts via VXLAN active-active mode (see page 435). MAC synchronization
between the peer switches is done using MLAG (see page 348).
Support for ARP/ND suppression, which provides VTEPs with the ability to suppress ARP flooding
over VXLAN tunnels.
Support for exchange of static (sticky) MAC addresses through EVPN.
Support for distributed symmetric routing between different subnets.
Support for distributed asymmetric routing between different subnets.
Support for centralized routing.
Support for prefix-based routing using EVPN type-5 routes (EVPN IP prefix route).
Support for layer 3 multi-tenancy.
EVPN address-family is supported with both eBGP and iBGP peering. If the underlay routing is provisioned
using eBGP, the same eBGP session can also be used to carry EVPN routes. For example, in a typical 2-tier
Clos network topology where the leaf switches are the VTEPs, if eBGP sessions are in use between the leaf
and spine switches for the underlay routing, the same sessions can be used to exchange EVPN routes; the
spine switches merely act as "route forwarders" and do not install any forwarding state as they are not
VTEPs. When EVPN routes are exchanged over iBGP peering, OSPF can be used as the IGP or the next hops
can also be resolved using iBGP.
You can provision and manage EVPN using NCLU (see page 82).
For Cumulus Linux 3.4 and later releases, the routing control plane (including EVPN) is installed as
part of the FRRouting (FRR) package. For more information about FRR, refer to the FRR Overview
(see page 600).
Contents
This chapter covers ...
Basic EVPN Configuration (see page 466)
Enabling EVPN between BGP Neighbors (see page 466)
1. Enable EVPN route exchange (that is, address-family L2VPN/EVPN) between BGP peers
2. Enable EVPN on the system to advertise VNIs and host reachability information (MAC addresses
learned on associated VLANs) to BGP peers
3. Disable MAC learning on VXLAN interfaces as EVPN is responsible for installing remote MACs
There are additional steps (configuration) to enable ARP/ND suppression, provision inter-subnet routing,
and so forth. These steps depend on the deployment scenario. Various other BGP parameters can also be
configured, if desired.
The command syntax bgp evpn is also permitted for backwards compatibility with prior versions
of Cumulus Linux, but the syntax bgp l2vpn evpn is recommended in order to standardize the
BGP address-family configuration to the AFI/SAFI format.
The above configuration does not result in BGP knowing about the local VNIs defined on the system and
advertising them to peers. This requires additional configuration, as described below (see page 467).
This configuration is only needed on leaf switches that are VTEPs. EVPN routes received from a
BGP peer are accepted, even without this explicit EVPN configuration. These routes are
maintained in the global EVPN routing table. However, they only become effective (that is,
imported into the per-VNI routing table and appropriate entries installed in the kernel) when the
VNI corresponding to the received route is locally known.
for the RD is a unique, internally generated number for a VNI. It solely has local significance; on remote
switches, its only role is for route disambiguation. This number is used instead of the VNI value itself
because this number has to be less than or equal to 65535. In the RT, the AS part is always encoded as a 2-
byte value to allow room for a large VNI. If the router has a 4-byte AS, only the lower 2 bytes are used. This
ensures a unique RT for different VNIs while having the same RT for the same VNI across routers in the
same AS.
For eBGP EVPN peering, the peers are in a different AS so using an automatic RT of "AS:VNI" does not work
for route import. Therefore, the import RT is treated as "*:VNI" to determine which received routes are
applicable to a particular VNI. This only applies when the import RT is auto-derived and not configured.
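The automatic route target rules described above can be sketched in a few lines of Python. This is an illustration of the stated behavior only (export RT of "AS:VNI" with the AS truncated to its lower 2 bytes, and an auto-derived import RT treated as "*:VNI"), not FRR's implementation; the function names are hypothetical.

```python
def auto_export_rt(asn, vni):
    """Build the auto-derived RT 'AS:VNI', keeping only the lower 2 bytes of the AS."""
    return f"{asn & 0xFFFF}:{vni}"


def auto_import_matches(route_rt, local_vni):
    """Treat the auto-derived import RT as '*:VNI': ignore the AS part of the received RT."""
    _, _, vni_part = route_rt.partition(":")
    return vni_part.isdigit() and int(vni_part) == local_vni


if __name__ == "__main__":
    # A 2-byte AS is used as-is; a 4-byte AS is truncated to its lower 16 bits.
    print(auto_export_rt(65011, 10100))
    print(auto_export_rt(4200000001, 10100))
    # A route exported by a peer in a different AS still matches on the VNI part.
    print(auto_import_matches("65012:10100", 10100))
```

With eBGP peering, two leaf switches in AS 65011 and AS 65012 export different RTs for VNI 10100, yet each imports the other's routes because only the VNI part is compared.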
These commands are per VNI and must be specified under address-family l2vpn evpn in
BGP.
If you delete the RD or RT later, it reverts back to its corresponding default value.
You can configure multiple RT values for import or export for a VNI. In addition, you can configure both the
import and export route targets with a single command by using route-target both:
cumulus@switch:~$ net add bgp evpn vni 10400 route-target import 100:400
cumulus@switch:~$ net add bgp evpn vni 10400 route-target import 100:500
cumulus@switch:~$ net add bgp evpn vni 10500 route-target both 65000:500
The above commands create the following configuration snippet in the /etc/frr/frr.conf file:
interface lo
ip ospf area 0.0.0.0
!
interface swp50
ip ospf area 0.0.0.0
ip ospf network point-to-point
interface swp51
ip ospf area 0.0.0.0
ip ospf network point-to-point
!
router bgp 65020
neighbor 10.1.1.2 remote-as internal
neighbor 10.1.1.3 remote-as internal
neighbor 10.1.1.4 remote-as internal
!
address-family l2vpn evpn
neighbor 10.1.1.2 activate
neighbor 10.1.1.3 activate
neighbor 10.1.1.4 activate
advertise-all-vni
exit-address-family
!
router ospf
ospf router-id 10.1.1.1
passive-interface lo
These commands create the following code snippet in the /etc/network/interfaces file:
auto vni200
iface vni200
bridge-access 200
bridge-learning off
vxlan-id 10200
vxlan-local-tunnelip 10.0.0.1
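When many VNIs are provisioned, a stanza like the one above is often templated. The following Python sketch renders the same attributes for a given VLAN, VNI and local tunnel IP. The function name and the vni<VLAN> naming convention are assumptions taken from this example, not an NCLU or ifupdown2 API.

```python
def vni_stanza(vlan, vni, local_tunnel_ip):
    """Render an /etc/network/interfaces stanza for a VXLAN interface with MAC learning off."""
    name = f"vni{vlan}"  # naming convention assumed from the example (vni200 for VLAN 200)
    lines = [
        f"auto {name}",
        f"iface {name}",
        f"    bridge-access {vlan}",
        "    bridge-learning off",  # EVPN installs remote MACs, so learning is disabled
        f"    vxlan-id {vni}",
        f"    vxlan-local-tunnelip {local_tunnel_ip}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(vni_stanza(200, 10200, "10.0.0.1"))
```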
For a bridge in traditional mode (see page 337), you must edit the bridge configuration in the
/etc/network/interfaces file in a text editor:
auto bridge1
iface bridge1
bridge-ports swp3.100 swp4.100 vni100
bridge-learning vni100=off
On switches with the Mellanox Spectrum chipset, ND suppression only functions with the
Spectrum A1 chip.
ARP and ND suppression are not enabled by default. You configure ARP/ND suppression on a VXLAN
interface. You also need to create an SVI for the neighbor entry.
To configure ARP or ND suppression, use NCLU (see page 82). Here's an example configuration using 2
VXLANs, 10100 and 10200, and 2 VLANs, 100 and 200:
auto bridge
iface bridge
bridge-ports vni100 vni200
bridge-stp on
bridge-vids 100 200
bridge-vlan-aware yes
auto vlan100
iface vlan100
ip6-forward off
ip-forward off
vlan-id 100
vlan-raw-device bridge
auto vlan200
iface vlan200
ip6-forward off
ip-forward off
vlan-id 200
vlan-raw-device bridge
auto vni100
iface vni100
bridge-access 100
bridge-arp-nd-suppress on
bridge-learning off
vxlan-id 10100
vxlan-local-tunnelip 110.0.0.1
auto vni200
iface vni200
bridge-learning off
bridge-access 200
bridge-arp-nd-suppress on
vxlan-id 10200
vxlan-local-tunnelip 110.0.0.1
For a bridge in traditional mode (see page 337), you must edit the bridge configuration in the
/etc/network/interfaces file in a text editor:
auto bridge1
iface bridge1
bridge-ports swp3.100 swp4.100 vni100
bridge-learning vni100=off
bridge-arp-nd-suppress vni100=on
ip6-forward off
ip-forward off
After you save your settings, reboot the switch to apply the new configuration.
Inter-subnet Routing
There are multiple models in EVPN for routing between different subnets (VLANs). These models arise due
to the following two main considerations:
Does every VTEP act as an L3 gateway and do routing, or do only specific VTEPs do routing?
Is routing done only at the ingress of the VXLAN tunnel or is it done at both the ingress and the
egress of the VXLAN tunnel?
These models are:
Centralized routing: Specific VTEPs act as designated L3 gateways and do routing between
subnets; other VTEPs just do bridging.
Distributed asymmetric routing: Every VTEP participates in routing, but all routing is done at the
ingress VTEP; the egress VTEP only does bridging.
Distributed symmetric routing: Every VTEP participates in routing and routing is done at both
the ingress VTEP and the egress VTEP.
Distributed routing — asymmetric or symmetric — is commonly deployed with the VTEPs configured with
an anycast IP/MAC address for each subnet. That is, each VTEP that has a particular subnet is configured with
the same IP/MAC for that subnet. Such a model facilitates easy host/VM mobility as there is no need to
change the host/VM configuration when it moves from one VTEP to another.
EVPN in Cumulus Linux supports all of the routing models listed above. The models are described further in
the following sections.
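The three models differ only in which VTEPs perform the routing lookup. As a compact summary, the Python sketch below records the bridge or route action taken at each hop for a packet going from a host in one subnet to a host in another. The table and function names are illustrative only.

```python
# Action taken at each VTEP along the path, per routing model.
FORWARDING_STEPS = {
    "centralized": [("ingress", "bridge"), ("gateway", "route"), ("egress", "bridge")],
    "asymmetric": [("ingress", "route"), ("egress", "bridge")],
    "symmetric": [("ingress", "route"), ("egress", "route")],
}


def routing_vteps(model):
    """Return the VTEP roles that perform routing in the given model."""
    return [vtep for vtep, action in FORWARDING_STEPS[model] if action == "route"]


if __name__ == "__main__":
    for model in FORWARDING_STEPS:
        print(model, routing_vteps(model))
```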
All routing happens in the context of a tenant VRF (virtual routing and forwarding (see page 693)). A VRF
instance is provisioned for each tenant, and the subnets of the tenant are associated with that VRF (the
corresponding SVI is attached to the VRF). Inter-subnet routing for each tenant occurs within the context of
that tenant's VRF, and is separate from the routing for other tenants.
When configuring VXLAN routing (see page 508), Cumulus Networks recommends enabling ARP
suppression on all VXLAN interfaces. Otherwise, when a locally attached host ARPs for the
gateway, it will receive multiple responses, one from each anycast gateway.
Centralized Routing
In centralized routing, a specific VTEP is configured to act as the default gateway for all the hosts in a
particular subnet throughout the EVPN fabric. It is common to provision a pair of VTEPs in active-active
mode as the default gateway, using an anycast IP/MAC address for each subnet. All subnets need to be
configured on such gateway VTEP(s). When a host in one subnet wants to communicate with a host in
another subnet, it addresses the packets to the gateway VTEP. The ingress VTEP (to which the source host
is attached) bridges the packets to the gateway VTEP over the corresponding VXLAN tunnel. The gateway
VTEP performs the routing to the destination host and post-routing, the packet gets bridged to the egress
VTEP (to which the destination host is attached). The egress VTEP then bridges the packet on to the
destination host.
Centralized routing can be deployed at the VNI level. Thus, you can configure the
advertise-default-gw command per VNI so that centralized routing is used for some
VNIs while distributed routing (described below) is used for other VNIs. This type of
configuration is not recommended unless the deployment requires it.
When centralized routing is in use, even if the source host and destination host are
attached to the same VTEP, the packets travel to the gateway VTEP to get routed and then
come back.
Asymmetric Routing
In distributed asymmetric routing, each VTEP acts as a layer 3 gateway, performing routing for its attached
hosts. The routing is called asymmetric because only the ingress VTEP performs routing; the egress VTEP
only performs bridging. Asymmetric routing is easy to deploy as it can be achieved with only host
routing and does not involve any interconnecting VNIs. However, each VTEP must be provisioned with all
VLANs/VNIs — the subnets between which communication can take place; this is required even if there are
no locally-attached hosts for a particular VLAN.
The only additional configuration required to implement asymmetric routing beyond the standard
configuration for a layer 2 VTEP described earlier is to ensure that each VTEP has all VLANs (and
corresponding VNIs) provisioned on it and the SVI for each such VLAN is configured with an
anycast IP/MAC address.
Symmetric Routing
In distributed symmetric routing, each VTEP acts as a layer 3 gateway, performing routing for its attached
hosts. This is the same as in asymmetric routing. The difference is that in symmetric routing, both the
ingress VTEP and egress VTEP route the packets. Thus, it can be compared to the traditional routing
behavior of routing to a next hop router. In the VXLAN encapsulated packet, the inner destination MAC
address is set to the router MAC address of the egress VTEP as an indication that the egress VTEP is the
next hop and also needs to perform routing. All routing happens in the context of a tenant (VRF). For a
packet received by the ingress VTEP from a locally attached host, the SVI interface corresponding to the
VLAN determines the VRF. For a packet received by the egress VTEP over VXLAN tunnel, the VNI in the
packet has to specify the VRF. For symmetric routing, this is a VNI corresponding to the tenant and is
different from either the source VNI or the destination VNI. This VNI is referred to as the L3-VNI or
interconnecting VNI; it has to be provisioned by the operator and is exchanged through the EVPN control
plane. In order to make the distinction clear, the regular VNI, which is used to map a VLAN, is referred to as
the L2-VNI.
L3-VNI
There is a one-to-one mapping between an L3-VNI and a tenant (VRF).
The VRF to L3-VNI mapping has to be consistent across all VTEPs. The L3-VNI has to be
provisioned by the operator.
In addition to the L3-VNI, the other parameter relevant to symmetric routing is the router MAC address.
This should be a MAC address of the VTEP so that it knows to route the packet after VXLAN decapsulation.
This parameter is automatically derived from the MAC address of the SVI corresponding to the L3-VNI. It is
also exchanged through the EVPN control plane.
For EVPN symmetric routing, the additional configuration required is as follows:
1. Configure a per-tenant VXLAN interface that specifies the L3-VNI for the tenant. This VXLAN interface
is part of the bridge; router MAC addresses of remote VTEPs are installed over this interface.
2. Configure an SVI (layer 3 interface) corresponding to the per-tenant VXLAN interface. This is attached
to the tenant's VRF. Remote host routes for symmetric routing are installed over this SVI.
3. Specify the mapping of VRF to L3-VNI. This configuration is for the BGP control plane.
The above commands create the following snippet in the /etc/network/interfaces file:
auto vni104001
iface vni104001
bridge-access 4001
bridge-arp-nd-suppress on
bridge-learning off
vxlan-id 104001
vxlan-local-tunnelip 10.0.0.11
auto bridge
iface bridge
bridge-ports vni104001
bridge-vlan-aware yes
auto vlan4001
iface vlan4001
vlan-id 4001
vlan-raw-device bridge
vrf blue
When two VTEPs are operating in VXLAN active-active mode and performing symmetric routing,
their router MAC needs to be configured corresponding to each L3-VNI in order to ensure both
VTEPs use the same MAC address. Do this by specifying the hwaddress (MAC address) for the
SVI corresponding to the L3-VNI, with the same address used on both switches in the MLAG pair.
It is recommended that you use the MLAG system MAC address for this purpose. The
corresponding snippet of configuration in the /etc/network/interfaces file looks like this:
auto vlan4001
iface vlan4001
hwaddress 44:39:39:FF:40:94
vlan-id 4001
vlan-raw-device bridge
vrf blue
vrf blue
vni 104001
!
1. Enable advertisement of EVPN prefix (type-5) routes. Refer to the next section (see page 478) for the
steps to do this.
2. Ensure that the routes corresponding to the connected subnets are known in the BGP VRF routing
table by injecting them using the network command or redistributing them using the
redistribute connected command.
This configuration is recommended only if the deployment is known to have silent hosts. It is also
recommended that this should be enabled on only one VTEP per subnet, or two for redundancy.
An earlier version of this chapter referred to the advertise-subnet command. This command
has been deprecated and should not be used.
When connecting to a WAN edge router to reach destinations outside the data center, it is highly
recommended that specific border/exit leaf switches be deployed to originate the type-5 routes.
On switches with the Mellanox Spectrum chipset, centralized routing, symmetric routing and
prefix-based routing only function with the Spectrum A1 chip.
If you are using a Broadcom Trident II+ switch as a border/exit leaf, review release note 766 for a
necessary workaround; the workaround only applies to Trident II+ switches, not Tomahawk or
Spectrum.
1. Configure a per-tenant VXLAN interface that specifies the L3-VNI for the tenant. This VXLAN interface
is part of the bridge; router MAC addresses of remote VTEPs are installed over this interface.
2. Configure an SVI (layer 3 interface) corresponding to the per-tenant VXLAN interface. This is attached
to the tenant's VRF. The remote prefix routes are installed over this SVI.
3. Specify the mapping of the VRF to L3-VNI. This configuration is for the BGP control plane.
cumulus@bl1:~$ net add bgp vrf vrf1 l2vpn evpn advertise ipv4 unicast
cumulus@bl1:~$ net pending
cumulus@bl1:~$ net commit
auto bridge
iface bridge
bridge-ports swp1 vni10101
bridge-vids 101
bridge-vlan-aware yes
post-up bridge fdb add 00:11:22:33:44:55 dev swp1 vlan 101 master static
For a bridge in traditional mode (see page 337), you must edit the bridge configuration in the
/etc/network/interfaces file in a text editor:
auto br101
iface br101
bridge-ports swp1.101 vni10101
bridge-learning vni10101=off
post-up bridge fdb add 00:11:22:33:44:55 dev swp1.101 master static
A sample output of bridge fdb show is depicted below. Points of interest in this output are:
swp3 and swp4 are access ports with VLAN ID 100. This is mapped to VXLAN interface vni100.
00:02:00:00:00:01 is a local host MAC learned on swp3.
The remote VTEPs which participate in VLAN ID 100 are 110.0.0.3, 110.0.0.4 and 110.0.0.2. This is
evident from the FDB entries with a MAC address of 00:00:00:00:00:00. These entries are used for
BUM traffic replication.
00:02:00:00:00:06 is a remote host MAC reachable over the VXLAN tunnel to 110.0.0.2.
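The all-zero MAC entries can also be picked out programmatically. The Python sketch below extracts the remote VTEPs for one VXLAN interface from bridge fdb show text. The sample lines are hypothetical and abbreviated, modeled on the usual "MAC dev IFACE dst IP ..." layout rather than copied from a real switch, and the function name is illustrative.

```python
ALL_ZERO_MAC = "00:00:00:00:00:00"

# Hypothetical, abbreviated sample; real output carries more entries and flags.
SAMPLE_FDB = """\
00:02:00:00:00:01 dev swp3 vlan 100 master bridge
00:00:00:00:00:00 dev vni100 dst 110.0.0.3 self permanent
00:00:00:00:00:00 dev vni100 dst 110.0.0.4 self permanent
00:00:00:00:00:00 dev vni100 dst 110.0.0.2 self permanent
00:02:00:00:00:06 dev vni100 dst 110.0.0.2 self
"""


def remote_vteps(fdb_text, vxlan_dev):
    """Return the sorted remote VTEP IPs used for BUM replication on vxlan_dev."""
    vteps = set()
    for line in fdb_text.splitlines():
        tokens = line.split()
        # Only the all-zero MAC entries on the requested device are of interest.
        if len(tokens) < 3 or tokens[0] != ALL_ZERO_MAC:
            continue
        if tokens[1:3] != ["dev", vxlan_dev] or "dst" not in tokens:
            continue
        dst_index = tokens.index("dst")
        if dst_index + 1 < len(tokens):
            vteps.add(tokens[dst_index + 1])
    return sorted(vteps)


if __name__ == "__main__":
    print(remote_vteps(SAMPLE_FDB, "vni100"))
```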
A sample output of ip neigh show is depicted below. Some interesting information from this output
includes:
50.1.1.11 is a locally-attached host on VLAN 100. It is shown twice because of the configuration of
the anycast IP/MAC on the switch.
50.1.1.42 is a remote host on VLAN 100 and 60.1.1.23 is a remote host on VLAN 200. The MAC
address of these hosts can be examined using the bridge fdb show command described earlier
to determine the VTEPs behind which these hosts are located.
You can examine the underlay routing, which determines how remote VTEPs are reached, by running the
net show route command. Here is some sample output from a leaf switch:
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/0] via 192.168.0.2, eth0, 1d03h41m
B>* 20.0.0.1/32 [20/0] via fe80::202:ff:fe00:29, swp1, 1d03h40m
B>* 20.0.0.2/32 [20/0] via fe80::202:ff:fe00:2d, swp2, 1d03h40m
C>* 110.0.0.1/32 is directly connected, lo, 1d03h41m
B>* 110.0.0.2/32 [20/0] via fe80::202:ff:fe00:29, swp1, 1d03h40m
* via fe80::202:ff:fe00:2d, swp2, 1d03h40m
B>* 110.0.0.3/32 [20/0] via fe80::202:ff:fe00:29, swp1, 1d03h40m
* via fe80::202:ff:fe00:2d, swp2, 1d03h40m
B>* 110.0.0.4/32 [20/0] via fe80::202:ff:fe00:29, swp1, 1d03h40m
* via fe80::202:ff:fe00:2d, swp2, 1d03h40m
C>* 192.168.0.0/24 is directly connected, eth0, 1d03h41m
You can view the MAC forwarding database on the switch by running the net show bridge macs
command:
You can examine the EVPN information for a specific VNI in detail. The following output shows the details
for the L2-VNI 10100 as well as for the L3-VNI 104001. For the L2-VNI, the remote VTEPs which have that
VNI are shown. For the L3-VNI, the router MAC and associated L2-VNIs are shown. The state of the L3-VNI
depends on the state of its associated VRF as well as the states of its underlying VXLAN interface and SVI.
110.0.0.4
110.0.0.3
Number of MACs (local and remote) known for this VNI: 8
Number of ARPs (IPv4 and IPv6, local and remote) known for this VNI: 12
Advertise-gw-macip: No
cumulus@leaf01:~$
cumulus@leaf01:~$ net show evpn vni 104001
VNI: 104001
Type: L3
Tenant VRF: vrf1
Local Vtep Ip: 110.0.0.1
Vxlan-Intf: vni4001
SVI-If: vlan4001
State: Up
Router MAC: 00:01:00:00:11:00
L2 VNIs: 10100 10200
cumulus@leaf01:~$
You can examine MAC addresses for all VNIs using net show evpn mac vni all.
You can examine the details for a specific MAC address or query all remote MAC addresses behind a
specific VTEP:
You can examine neighbor entries for all VNIs using net show evpn arp-cache vni all.
You can examine router MACs for all L3-VNIs using net show evpn rmac vni all.
You can examine gateway next hops for all L3-VNIs using net show evpn next-hops vni all.
You can query a specific next hop; the output displays the remote host and prefix routes via this next hop:
VRF vrf1:
K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 1d02h42m
C * 50.1.1.0/24 is directly connected, vlan100-v0, 1d02h42m
As you can see in the output above, the next hops for these routes are specified by EVPN to be onlink, or
reachable over the specified SVI. This is necessary because this interface is not required to have an IP
address, and even if it is configured with an IP address, the next hop would not be on the same subnet as it
is usually the remote VTEP's IP address, and hence, part of the underlay IP network.
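The reason for the onlink flag can be shown with a short subnet check. In the Python sketch below, a next hop needs onlink whenever the SVI has no address or the next hop falls outside the SVI's subnet. The SVI address 10.254.0.1/24 is a made-up value for illustration, and needs_onlink is a hypothetical helper, not a kernel or FRR function.

```python
import ipaddress


def needs_onlink(next_hop, svi_address=None):
    """Return True if next_hop is not directly covered by the SVI's own subnet."""
    if svi_address is None:
        # The SVI is not required to have an IP address at all.
        return True
    svi = ipaddress.ip_interface(svi_address)
    return ipaddress.ip_address(next_hop) not in svi.network


if __name__ == "__main__":
    # The next hop is a remote VTEP address in the underlay.
    print(needs_onlink("110.0.0.4"))
    print(needs_onlink("110.0.0.4", "10.254.0.1/24"))
```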
...
The routing table can be filtered based on EVPN route type. The available options are as shown below:
cumulus@leaf01:~$ net show bgp l2vpn evpn route rd 110.0.0.4:3 mac 00:02:00:00:00:10 ip 60.1.1.44
BGP routing table entry for 110.0.0.4:3:[2]:[0]:[0]:[48]:[00:02:00:00:00:10]:[32]:[60.1.1.44]
Paths: (2 available, best #2)
Advertised to non peer-group peers:
s1(swp1) s2(swp2)
Route [2]:[0]:[0]:[48]:[00:02:00:00:00:10]:[32]:[60.1.1.44] VNI 10200/104001
65100 65004
110.0.0.4 from s2(swp2) (20.0.0.2)
Origin IGP, localpref 100, valid, external
Extended Community: RT:65004:10200 RT:65004:104001 ET:8 Rmac:00:01:00:00:14:00
AddPath ID: RX 0, TX 97
Last update: Sun Dec 17 20:57:24 2017
Route [2]:[0]:[0]:[48]:[00:02:00:00:00:10]:[32]:[60.1.1.44] VNI 10200/104001
65100 65004
110.0.0.4 from s1(swp1) (20.0.0.1)
Origin IGP, localpref 100, valid, external, bestpath-from-AS 65100, best
Extended Community: RT:65004:10200 RT:65004:104001 ET:8 Rmac:00:01:00:00:14:00
AddPath ID: RX 0, TX 71
Last update: Sun Dec 17 20:57:23 2017
cumulus@leaf01:~$
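The bracketed route key in this output can be split into its fields when post-processing the text. The Python sketch below parses a type-2 key. The field names esi and eth_tag for the second and third brackets are an assumption based on the key layout FRR prints for type-2 routes in this release family; confirm them against the legend in your own output. The function name is hypothetical.

```python
import re

# [2]:[ESI]:[EthTag]:[MAClen]:[MAC] with an optional :[IPlen]:[IP] suffix (assumed layout).
TYPE2_KEY = re.compile(
    r"\[2\]:\[(?P<esi>[^\]]+)\]:\[(?P<eth_tag>\d+)\]"
    r":\[(?P<mac_len>\d+)\]:\[(?P<mac>[0-9A-Fa-f:]{17})\]"
    r"(?::\[(?P<ip_len>\d+)\]:\[(?P<ip>[^\]]+)\])?"
)


def parse_type2(key):
    """Split an EVPN type-2 route key into a dict; ip and ip_len are None for MAC-only routes."""
    match = TYPE2_KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"not a type-2 route key: {key!r}")
    fields = match.groupdict()
    for name in ("eth_tag", "mac_len", "ip_len"):
        if fields[name] is not None:
            fields[name] = int(fields[name])
    return fields


if __name__ == "__main__":
    print(parse_type2("[2]:[0]:[0]:[48]:[00:02:00:00:00:10]:[32]:[60.1.1.44]"))
```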
Only global VNIs are supported. So even though VNI values are exchanged in the type-2
and type-5 routes, the received values are not used when installing the routes into the
forwarding plane; the local configuration is used. You must ensure that the VLAN to VNI
mappings and the L3-VNI assignment for a tenant VRF are uniform throughout the
network.
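Because the locally configured mappings are what the forwarding plane uses, a mismatch between switches is worth catching before deployment. The Python sketch below compares per-switch mappings (VLAN to L2-VNI, or VRF to L3-VNI) and reports any key whose VNI differs. The switch names and input structure are hypothetical; gathering the mappings from real switches is left out.

```python
def find_mismatches(per_switch):
    """Return {key: {switch: vni}} for every VLAN or VRF whose VNI is not uniform."""
    seen = {}
    for switch, mapping in per_switch.items():
        for key, vni in mapping.items():
            seen.setdefault(key, {})[switch] = vni
    return {key: by_switch for key, by_switch in seen.items()
            if len(set(by_switch.values())) > 1}


if __name__ == "__main__":
    mappings = {
        "leaf01": {"vlan100": 10100, "vlan200": 10200, "vrf1": 104001},
        "leaf02": {"vlan100": 10100, "vlan200": 10201, "vrf1": 104001},
    }
    print(find_mismatches(mappings))
```

A key that is present on only one switch is not reported, since a VTEP does not need every VLAN; only conflicting values are flagged.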
If the remote host is dual attached, the next hop for the EVPN route is the anycast IP
address of the remote MLAG (see page 348) pair, when MLAG is active.
The following example shows a prefix (type-5) route. Such a route has only the L3-VNI and the route target
corresponding to this VNI. This route is learned via two paths, one through each spine switch.
To display the VNI routing table for all VNIs, run net show bgp evpn route vni all.
cumulus@switch:~$ net show bgp evpn route vni 10109 mac 00:02:22:22:22:02
BGP routing table entry for [2]:[0]:[0]:[48]:[00:02:22:22:22:02]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[00:02:22:22:22:02] VNI 10109
Local
6.0.0.184 from 0.0.0.0 (6.0.0.184)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
Extended Community: RT:650184:10109 ET:8 MM:3
AddPath ID: RX 0, TX 10350121
Last update: Tue Feb 14 18:40:37 2017
cumulus@switch:~$ net show bgp evpn route vni 10101 mac 00:02:00:00:00:01
BGP routing table entry for [2]:[0]:[0]:[48]:[00:02:00:00:00:01]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[00:02:00:00:00:01] VNI 10101
Local
6.0.0.18 from 0.0.0.0 (6.0.0.18)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
Extended Community: ET:8 RT:60176:10101 MM:0, sticky MAC
AddPath ID: RX 0, TX 46
Last update: Tue Apr 11 21:44:02 2017
Troubleshooting EVPN
The primary way to troubleshoot EVPN is by enabling FRR debug logs. The relevant debug options are:
debug zebra vxlan — which traces VNI addition and deletion (local and remote) as well as MAC
and neighbor addition and deletion (local and remote).
debug zebra kernel — which traces actual netlink messages exchanged with the kernel, which
includes everything, not just EVPN.
debug bgp updates — which traces BGP update exchanges, including all updates. Output is
extended to show EVPN specific information.
debug bgp zebra — which traces interactions between BGP and zebra for EVPN (and other)
routes.
Caveats
The following caveats apply to EVPN in this version of Cumulus Linux:
When EVPN is enabled on a switch (VTEP), all locally defined VNIs on that switch and other
information (such as MAC addresses) pertaining to them will be advertised to EVPN peers. There is
no provision to only announce certain VNIs.
In a VXLAN active-active (see page 435) configuration, ARPs may sometimes not be suppressed even
if ARP suppression is enabled. This is because the neighbor entries are not synced between the two
switches operating in active-active mode by a control plane. This has no impact on forwarding.
Symmetric routing and prefix-based routing are only supported for IPv4 hosts and IPv4 prefixes;
IPv6 routing is currently supported with asymmetric routing only.
Currently, only switches with the Mellanox Spectrum chipset or the Broadcom Tomahawk chipset
can be deployed as a border/exit leaf. If you want to use a Broadcom Trident II+ switch as a border/exit
leaf, read release note 766 for a necessary workaround; the workaround only applies to
Trident II+ switches, not Tomahawk or Spectrum.
The overlay (tenants) must be configured in one or more specific VRFs and kept separate from the underlay,
which resides in the default VRF. An L3-VNI mapping for the default VRF is not supported.
Example Configuration
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.11/32 address 10.0.0.12/32
auto eth0 auto eth0
iface eth0 inet dhcp iface eth0 inet dhcp
# uplinks # uplinks
auto swp51 auto swp51
iface swp51 iface swp51
auto swp52 auto swp52
iface swp52 iface swp52
auto bridge auto bridge
iface bridge iface bridge
bridge-ports swp1 vxlan10001 vxlan10100 vxlan10200 bridge-ports swp2 vxlan10001 vxlan10100 vxlan10200
bridge-vlan-aware yes bridge-vlan-aware yes
bridge-vids 1 100 200 bridge-vids 1 100 200
bridge-pvid 1 bridge-pvid 1
auto vxlan10001 auto vxlan10001
iface vxlan10001 iface vxlan10001
vxlan-id 10001 vxlan-id 10001
vxlan-local-tunnelip 10.0.0.11 vxlan-local-tunnelip 10.0.0.12
bridge-access 1 bridge-access 1
bridge-learning off bridge-learning off
auto vxlan10100 auto vxlan10100
iface vxlan10100 iface vxlan10100
vxlan-id 10100 vxlan-id 10100
vxlan-local-tunnelip 10.0.0.11 vxlan-local-tunnelip 10.0.0.12
bridge-access 100 bridge-access 100
bridge-learning off bridge-learning off
auto vxlan10200 auto vxlan10200
iface vxlan10200 iface vxlan10200
vxlan-id 10200 vxlan-id 10200
vxlan-local-tunnelip 10.0.0.11 vxlan-local-tunnelip 10.0.0.12
bridge-access 200 bridge-access 200
bridge-learning off bridge-learning off
! !
interface swp51 interface swp51
ipv6 nd ra-interval 10 ipv6 nd ra-interval 10
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
! !
interface swp52 interface swp52
ipv6 nd ra-interval 10 ipv6 nd ra-interval 10
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
! !
router bgp 65011 router bgp 65012
bgp router-id 10.0.0.11 bgp router-id 10.0.0.12
bgp bestpath as-path multipath-relax bgp bestpath as-path multipath-relax
neighbor fabric peer-group neighbor fabric peer-group
neighbor fabric remote-as external neighbor fabric remote-as external
neighbor fabric description Internal Fabric Network neighbor fabric description Internal Fabric Network
neighbor fabric capability extended-nexthop neighbor fabric capability extended-nexthop
neighbor swp51 interface peer-group fabric neighbor swp51 interface peer-group fabric
neighbor swp52 interface peer-group fabric neighbor swp52 interface peer-group fabric
! !
address-family ipv4 unicast address-family ipv4 unicast
network 10.0.0.11/32 network 10.0.0.12/32
neighbor fabric prefix-list neighbor fabric prefix-list
dc-leaf-in in dc-leaf-in in
neighbor fabric prefix-list neighbor fabric prefix-list
dc-leaf-out out dc-leaf-out out
exit-address-family exit-address-family
! !
! !
address-family ipv6 unicast address-family ipv6 unicast
neighbor fabric activate neighbor fabric activate
exit-address-family exit-address-family
! !
address-family l2vpn evpn address-family l2vpn evpn
neighbor fabric activate neighbor fabric activate
advertise-all-vni advertise-all-vni
exit-address-family exit-address-family
exit exit
! !
ip prefix-list dc-leaf-in seq ip prefix-list dc-leaf-in seq
10 permit 0.0.0.0/0 10 permit 0.0.0.0/0
ip prefix-list dc-leaf-in seq ip prefix-list dc-leaf-in seq
20 permit 10.0.0.0/24 le 32 20 permit 10.0.0.0/24 le 32
ip prefix-list dc-leaf-in seq ip prefix-list dc-leaf-in seq
500 deny any 500 deny any
cumulusnetworks.com 497
Cumulus Linux 3.5 User Guide
auto lo auto lo
iface lo inet loopback iface lo inet loopback
address 10.0.0.21/32 address 10.0.0.22/32
auto eth0 auto eth0
iface eth0 inet dhcp iface eth0 inet dhcp
# downlinks # downlinks
auto swp1 auto swp1
iface swp1 iface swp1
auto swp2 auto swp2
iface swp2 iface swp2
auto swp3 auto swp3
iface swp3 iface swp3
auto swp4 auto swp4
iface swp4 iface swp4
! !
interface swp1 interface swp1
ipv6 nd ra-interval 10 ipv6 nd ra-interval 10
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
! !
interface swp2 interface swp2
ipv6 nd ra-interval 10 ipv6 nd ra-interval 10
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
! !
interface swp3 interface swp3
ipv6 nd ra-interval 10 ipv6 nd ra-interval 10
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
! !
interface swp4 interface swp4
ipv6 nd ra-interval 10 ipv6 nd ra-interval 10
no ipv6 nd suppress-ra no ipv6 nd suppress-ra
! !
router bgp 65020 router bgp 65020
bgp router-id 10.0.0.21 bgp router-id 10.0.0.22
bgp bestpath as-path bgp bestpath as-path
multipath-relax multipath-relax
neighbor fabric peer-group neighbor fabric peer-group
neighbor fabric remote-as neighbor fabric remote-as
external external
# downlinks # downlinks
auto swp1 auto swp1
iface swp1 iface swp1
VXLAN Routing
VXLAN routing, sometimes referred to as inter-VXLAN routing, provides IP routing between VXLAN VNIs in
overlay networks. The routing of traffic is based on the inner header or the overlay tenant IP address.
VXLAN routing is supported on the following platforms:
Broadcom Tomahawk using an internal loopback on one or more switch ports
Broadcom Maverick and Trident II+
If you want to use VXLAN routing on a Trident II switch, you must use a hyperloop (see
page 527).
Mellanox Spectrum
Switches with the Mellanox Spectrum A0 ASIC can only operate in asymmetric mode (see
page 475).
Contents
This chapter covers ...
VXLAN Routing Data Plane and the Broadcom Tomahawk and Trident II+ Platforms
Trident II+
Tomahawk
Configuring VXLAN Routing
Configuring the Underlays
Configuring the Server-facing Downlinks
Configuring BGP EVPN
Configuring the VXLANs
Resulting Configurations
leaf01
leaf03
spine01
VXLAN Routing with Active-Active VTEPs
VXLAN with VRFs
Viewing VXLAN Routing Information
Troubleshooting VXLAN Routing
VXLAN Routing Data Plane and the Broadcom Tomahawk and Trident II+ Platforms
On switches with Broadcom ASICs, VXLAN routing is supported only on the Tomahawk and Trident II+
platforms. Below are some differences in how VXLAN routing works on these switches.
Trident II+
For Trident II+ switches, you can specify a VXLAN routing (RIOT, routing in and out of tunnels) profile in
the vxlan_routing_overlay.profile field in the
/usr/lib/python2.7/dist-packages/cumulus/__chip_config/bcm/datapath.conf file if you don't want to use the
default. This profile determines the maximum number of overlay next hops (adjacency entries). The profile
is one of the following:
default: 15% of the underlay next hops are set apart for overlay, up to a maximum of 8k next hops
mode-1: 25% of the underlay next hops are set apart for overlay
mode-2: 50% of the underlay next hops are set apart for overlay
mode-3: 80% of the underlay next hops are set apart for overlay
disable: disables VXLAN routing
The Trident II+ ASIC supports a maximum of 48k underlay next hops.
The maximum number of VXLAN SVI interfaces that can be allocated is 2k (2048) regardless of which profile
you specify.
If you want to disable VXLAN routing on a Trident II+ switch, set the vxlan_routing_overlay.profile
field to disable.
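The profile percentages above can be turned into approximate next hop budgets. The following is a minimal sketch, not a Cumulus Linux tool. It assumes that "48k" means 48 * 1024, that "8k" means 8 * 1024, and that each percentage applies to the 48k maximum and is rounded down; none of these assumptions is stated in this guide.

```python
# Sketch: overlay next hop budget per VXLAN routing profile on Trident II+.
# Assumptions (not from the guide): 48k = 48 * 1024, 8k = 8 * 1024, and each
# percentage is taken of the 48k underlay maximum and rounded down.
UNDERLAY_NEXT_HOPS = 48 * 1024
DEFAULT_CAP = 8 * 1024

# Percentage of underlay next hops set apart for the overlay, per profile.
PROFILE_PERCENT = {"default": 15, "mode-1": 25, "mode-2": 50, "mode-3": 80}


def overlay_next_hops(profile):
    """Return the number of next hops set apart for the overlay."""
    if profile == "disable":
        return 0  # VXLAN routing is disabled
    share = UNDERLAY_NEXT_HOPS * PROFILE_PERCENT[profile] // 100
    if profile == "default":
        # The default profile is capped at 8k next hops.
        share = min(share, DEFAULT_CAP)
    return share


if __name__ == "__main__":
    for name in ["default", "mode-1", "mode-2", "mode-3", "disable"]:
        print(name, overlay_next_hops(name))
```

Under these assumptions the default profile stays below its 8k cap, so the cap would only matter on a platform with a larger underlay table.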
Tomahawk
The Tomahawk ASIC does not support RIOT natively, so you must configure the switch ports for VXLAN
routing to use the internal loopback. The internal loopback facilitates the recirculation of packets through
the ingress pipeline to achieve VXLAN routing. One or more loopback switch ports can be bundled into a
loopback trunk based on the amount of bandwidth needed.
VXLAN routing using the internal loopback is supported only with VLAN-aware bridges (see page
325); you cannot use a bridge in traditional mode (see page 337).
To configure one or more switch ports for loopback mode, edit the /etc/cumulus/ports.conf file,
changing the port speed to loopback. In the example below, swp8 and swp9 are configured for loopback
mode:
...
7=4x10G
8=loopback
9=loopback
10=100G
...
After you save your changes to the ports.conf file, you must restart switchd (see page 190) for the
changes to take effect.
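To double-check which ports are set to loopback mode before restarting switchd, you could scan the file contents. The sketch below is a hypothetical helper, not part of Cumulus Linux. It assumes every port line has the form `<port>=<mode>`, as in the excerpt above, and that `#` starts a comment.

```python
from typing import List


def loopback_ports(ports_conf_text: str) -> List[int]:
    """Return the port numbers whose mode is 'loopback'."""
    ports = []
    for raw in ports_conf_text.splitlines():
        # Assumption: '#' introduces a comment; drop it and surrounding spaces.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not port=mode
        port, mode = (part.strip() for part in line.split("=", 1))
        if mode == "loopback" and port.isdigit():
            ports.append(int(port))
    return ports


# The excerpt from this section, without the elided lines.
SAMPLE = """\
7=4x10G
8=loopback
9=loopback
10=100G
"""

if __name__ == "__main__":
    print(loopback_ports(SAMPLE))
```

On a switch, the text would come from reading /etc/cumulus/ports.conf; the sample string keeps the sketch self-contained.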
When configuring VXLAN routing, Cumulus Networks recommends that you enable ARP
suppression (see page 471) on all VXLAN interfaces. Otherwise, when a locally-attached host ARPs
for the gateway, it will receive multiple responses, one from each anycast gateway.
512 02 March 2018
Cumulus Networks
4. Verify the loopback addresses are advertised and learned by all VTEPs (look for the line that starts
with B>*).
leaf01:
show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel,
> - selected route, * - FIB route
C>* 10.0.0.11/32 is directly connected, lo
B>* 10.0.0.13/32 [20/0] via fe80::eef4:bbff:fefc:bf36, swp51, 00:10:57
C * 10.100.0.0/24 is directly connected, vlan100-v0
C>* 10.100.0.0/24 is directly connected, vlan100
C * 10.200.0.0/24 is directly connected, vlan200-v0
C>* 10.200.0.0/24 is directly connected, vlan200
leaf03:
show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel,
> - selected route, * - FIB route
B>* 10.0.0.11/32 [20/0] via fe80::eef4:bbff:fefc:bf3e, swp51, 00:00:12
C>* 10.0.0.13/32 is directly connected, lo
C * 10.100.0.0/24 is directly connected, vlan100-v0
C>* 10.100.0.0/24 is directly connected, vlan100
C * 10.200.0.0/24 is directly connected, vlan200-v0
C>* 10.200.0.0/24 is directly connected, vlan200
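When checking many VTEPs, the search for lines that start with B>* can be scripted. The following is a minimal sketch that parses saved route output. The function name and the regular expression are illustrative, and the pattern is inferred only from the sample lines in this section.

```python
import re

# Matches a selected BGP route installed in the FIB, for example:
# B>* 10.0.0.13/32 [20/0] via fe80::eef4:bbff:fefc:bf36, swp51, 00:10:57
BGP_ROUTE = re.compile(
    r"^B>\*\s+(\S+)\s+\[\d+/\d+\]\s+via\s+([^,\s]+),\s+([^,\s]+)"
)


def bgp_selected_routes(output):
    """Map each selected BGP prefix to its (next hop, interface) pair."""
    routes = {}
    for line in output.splitlines():
        match = BGP_ROUTE.match(line.strip())
        if match:
            prefix, next_hop, interface = match.groups()
            routes[prefix] = (next_hop, interface)
    return routes


# Route lines taken from the leaf01 output in this section.
LEAF01 = """\
C>* 10.0.0.11/32 is directly connected, lo
B>* 10.0.0.13/32 [20/0] via fe80::eef4:bbff:fefc:bf36, swp51, 00:10:57
C * 10.100.0.0/24 is directly connected, vlan100-v0
C>* 10.100.0.0/24 is directly connected, vlan100
"""

if __name__ == "__main__":
    # leaf01 should have learned the loopback of leaf03 (10.0.0.13/32).
    print("10.0.0.13/32" in bgp_selected_routes(LEAF01))
```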
The real IP addresses assigned to each SVI must be unique per VTEP. The virtual address can be
reused as the anycast gateway.
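This addressing rule can be checked mechanically. The sketch below uses the SVI addresses from the leaf01 and leaf03 configurations in this chapter. The data structure and function names are illustrative only.

```python
from collections import Counter

# (real address, virtual address) per SVI, copied from the leaf01 and leaf03
# interface configurations in this chapter.
SVIS = {
    "leaf01": {
        "vlan100": ("10.100.0.2/24", "10.100.0.1/24"),
        "vlan200": ("10.200.0.2/24", "10.200.0.1/24"),
    },
    "leaf03": {
        "vlan100": ("10.100.0.4/24", "10.100.0.1/24"),
        "vlan200": ("10.200.0.4/24", "10.200.0.1/24"),
    },
}


def duplicate_real_addresses(svis):
    """Return real SVI addresses that appear on more than one SVI."""
    counts = Counter(
        real for vlans in svis.values() for real, _virtual in vlans.values()
    )
    return sorted(address for address, seen in counts.items() if seen > 1)


def anycast_gateways(svis):
    """Collect the virtual (anycast gateway) addresses used for each VLAN."""
    gateways = {}
    for vlans in svis.values():
        for vlan, (_real, virtual) in vlans.items():
            gateways.setdefault(vlan, set()).add(virtual)
    return gateways


if __name__ == "__main__":
    print(duplicate_real_addresses(SVIS))  # real addresses reused by mistake
    print(anycast_gateways(SVIS))          # one shared gateway per VLAN
```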
2. Disable bridge learning and enable ARP suppression for VXLAN routing, then review and commit
your changes.
leaf01:
    10.0.0.13 0 65020 65013 i
Route Distinguisher: 10.0.0.13:2
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:96:d8]
    10.0.0.13 0 65020 65013 i
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:96:d8]:[32]:[10.200.0.20]
    10.0.0.13 0 65020 65013 i
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:96:d8]:[128]:[fe80::92e2:baff:fe7e:96d8]
    10.0.0.13 0 65020 65013 i
*> [3]:[0]:[32]:[10.0.0.13]
    10.0.0.13 0 65020 65013 i
Route Distinguisher: 10.200.0.2:1
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:98:94]
    10.0.0.11 32768 i
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:98:94]:[32]:[10.100.0.10]
    10.0.0.11 32768 i
*> [3]:[0]:[32]:[10.0.0.11]
    10.0.0.11 32768 i
Route Distinguisher: 10.200.0.2:2
*> [3]:[0]:[32]:[10.0.0.11]
    10.0.0.11 32768 i
leaf03:
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:96:d8]:[128]:[fe80::92e2:baff:fe7e:96d8]
    10.0.0.13 32768 i
*> [3]:[0]:[32]:[10.0.0.13]
    10.0.0.13 32768 i
Route Distinguisher: 10.200.0.2:1
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:98:94]
    10.0.0.11 0 65020 65011 i
*> [2]:[0]:[0]:[48]:[90:e2:ba:7e:98:94]:[32]:[10.100.0.10]
    10.0.0.11 0 65020 65011 i
*> [3]:[0]:[32]:[10.0.0.11]
    10.0.0.11 0 65020 65011 i
Route Distinguisher: 10.200.0.2:2
*> [3]:[0]:[32]:[10.0.0.11]
    10.0.0.11 0 65020 65011 i
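The bracketed prefixes in the output above can be split into fields for scripted checks. The following is a rough sketch. The field positions are inferred from the sample output only: the first field is assumed to be the route type, type 2 entries are assumed to carry a MAC address in the fifth field and an optional IP address in the seventh, and type 3 entries are assumed to carry the originating VTEP address in the fourth. The function name is illustrative.

```python
def parse_evpn_prefix(prefix):
    """Split an EVPN prefix such as [3]:[0]:[32]:[10.0.0.11] into a dict."""
    # Drop the outer brackets, then split on the "]:[" separator. IPv6 and
    # MAC values contain ":" but never "]:[", so they stay intact.
    fields = prefix.strip()[1:-1].split("]:[")
    route_type = int(fields[0])
    if route_type == 2:
        # Assumed layout: type, two tag fields, MAC length, MAC,
        # then optionally IP length and IP.
        route = {"type": 2, "mac": fields[4]}
        if len(fields) > 6:
            route["ip"] = fields[6]
        return route
    if route_type == 3:
        # Assumed layout: type, tag, address length, originating VTEP address.
        return {"type": 3, "originator": fields[3]}
    raise ValueError("unhandled route type: %d" % route_type)


if __name__ == "__main__":
    print(parse_evpn_prefix("[2]:[0]:[0]:[48]:[90:e2:ba:7e:98:94]:[32]:[10.100.0.10]"))
    print(parse_evpn_prefix("[3]:[0]:[32]:[10.0.0.13]"))
```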
4. Verify the VXLAN entries are programmed into the bridge table.
Resulting Configurations
Following are the resulting interfaces and routing configurations for the three nodes you configured above:
leaf01, leaf03 and spine01.
leaf01
leaf01 /etc/network/interfaces
bridge-arp-nd-suppress on
bridge-learning off
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
mtu 9152
vxlan-id 10100
vxlan-local-tunnelip 10.0.0.11
auto VNI200
iface VNI200
bridge-access 200
bridge-arp-nd-suppress on
bridge-learning off
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
mtu 9152
vxlan-id 10200
vxlan-local-tunnelip 10.0.0.11
auto bridge
iface bridge
bridge-ports VNI100 VNI200 swp1
bridge-vids 100 200
bridge-vlan-aware yes
auto mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
auto vlan100
iface vlan100
address 10.100.0.2/24
address-virtual 00:00:00:00:00:1a 10.100.0.1/24
vlan-id 100
vlan-raw-device bridge
auto vlan200
iface vlan200
address 10.200.0.2/24
address-virtual 00:00:00:00:00:1b 10.200.0.1/24
vlan-id 200
vlan-raw-device bridge
leaf01 /etc/frr/frr.conf
interface swp51
ipv6 nd ra-interval 10
no ipv6 nd suppress-ra
!
router bgp 65011
bgp router-id 10.0.0.11
neighbor swp51 interface remote-as external
!
address-family ipv4 unicast
network 10.0.0.11/32
exit-address-family
!
address-family l2vpn evpn
neighbor swp51 activate
advertise-all-vni
exit-address-family
!
line vty
!
leaf03
leaf03 /etc/network/interfaces
mtu 9152
vxlan-id 10100
vxlan-local-tunnelip 10.0.0.13
auto VNI200
iface VNI200
bridge-access 200
bridge-arp-nd-suppress on
bridge-learning off
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
mtu 9152
vxlan-id 10200
vxlan-local-tunnelip 10.0.0.13
auto bridge
iface bridge
bridge-ports VNI100 VNI200 swp1
bridge-vids 100 200
bridge-vlan-aware yes
auto mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
auto vlan100
iface vlan100
address 10.100.0.4/24
address-virtual 00:00:00:00:00:1a 10.100.0.1/24
vlan-id 100
vlan-raw-device bridge
auto vlan200
iface vlan200
address 10.200.0.4/24
address-virtual 00:00:00:00:00:1b 10.200.0.1/24
vlan-id 200
vlan-raw-device bridge
leaf03 /etc/frr/frr.conf
spine01
spine01 /etc/network/interfaces
spine01 /etc/frr/frr.conf
cumulus@switch:~$ ip route
45.0.0.0/26 dev vlan1000 proto kernel scope link src 45.0.0.16
45.0.0.64/26 dev vlan1001 proto kernel scope link src 45.0.0.80
cumulus@switch:~$ ip neighbor
45.0.0.70 dev vlan1001 lladdr 00:02:00:00:00:0c STALE
45.0.0.72 dev vlan1001 lladdr 00:02:00:00:00:10 REACHABLE
45.0.0.5 dev vlan1000 lladdr 00:02:00:00:00:0a REACHABLE
VXLAN Hyperloop
This chapter covers configuring VXLAN gateways using a loopback cable (which we call a hyperloop) on non-
RIOT (VXLAN routing) capable ASICs running Cumulus Linux.
The Broadcom Trident II and Tomahawk ASICs have a limitation where a layer 2 bridge that contains a
VXLAN interface cannot also have an IP address assigned to it. This is an expected limitation of these ASICs,
caused by the ordering of the decapsulation: the VXLAN lookup happens before the IP address lookup, so a
packet that is decapsulated has already passed the portion of the ASIC that performs the IP address lookup.
Please contact your sales team if there is any confusion. Refer to the Cumulus Networks
Hardware Compatibility List to determine which ASIC is running on the switch.
This limitation does not exist in some ASICs. For example, the Trident II+ has the RIOT (Routing In/Out
of Tunnels) feature; see VXLAN Routing (see page 508) for more information.
Contents
This chapter covers ...
Requirements
Hyperloop Use Cases
Exiting a VXLAN with a Hyperloop
Packet Flow Diagram
Trident II and Tomahawk switchd Flag
VXLAN Hyperloop Troubleshooting Matrix
Are HER (Head End Replication) entries being programmed into the bridge fdb table?
If you are using an MLAG VTEP (dual attached), is it set up correctly?
Can you ping from host to host on the same VXLAN?
Is the SVI on a physical interface or on a traditional bridge?
Is the port plugged in where it is supposed to be plugged in?
Is the VRR MAC address unique per subnet?
Requirements
VXLAN hyperloop only works on an ASIC capable of encapsulating and decapsulating VXLAN traffic,
which includes:
Broadcom Tomahawk
Broadcom Trident II
Broadcom Trident II+
Mellanox Spectrum
VXLAN hyperloop is supported on Cumulus Linux 3.2.1 and later. Make sure to upgrade to the latest
version (see page 44) of Cumulus Linux.
If you are using EVPN (see page 463), you must be running FRRouting version eau8. Use dpkg -l
to check the FRRouting version:
If you are not running the right version of FRRouting for EVPN, follow these directions (see page 463)
to upgrade.
This limitation means a physical cable must be attached from one port on leaf1 to another port on leaf1.
One port is a layer 3 port while the other is a member of the bridge. The native VLAN (VLAN ID 1) must be
tagged for all traffic going to the hyperloop ports.
For example, following the configuration above, in order for a layer 3 address to be used as the gateway for
vni-10, you could configure the following on exit01:
auto lo
iface lo inet loopback
address 10.0.0.11/32
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports inside server01 server02 vni-10 vni-20 peerlink
bridge-vids 100 200
bridge-pvid 1 # sets native VLAN to 1, an unused VLAN
mstpctl-treeprio 8192
auto outside
iface outside
bond-slaves swp45 swp47
alias hyperloop outside
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
auto inside
iface inside
bond-slaves swp46 swp48
alias hyperloop inside
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
auto VLAN100GW
iface VLAN100GW
bridge_ports outside.100
address 172.16.100.2/24
alias VXLAN GW 100 Linux Bridge
address-virtual 44:38:39:FF:01:90 172.16.100.1/24
auto VLAN200GW
iface VLAN200GW
bridge_ports outside.200
address 172.16.200.2/24
alias VXLAN GW 200 Linux Bridge
address-virtual 44:38:39:FF:02:90 172.16.200.1/24
auto vni-10
iface vni-10
vxlan-id 10
vxlan-local-tunnelip 10.0.0.11
bridge-access 100
auto vni-20
iface vni-20
vxlan-id 20
vxlan-local-tunnelip 10.0.0.11
bridge-access 200
Restarting switchd is a disruptive change and affects data plane network traffic.
Are HER (Head End Replication) entries being programmed into the bridge fdb table?
Check for 00:00:00:00:00:00 entries for each VXLAN using bridge fdb show:
If you are not getting HER entries, there are some steps you can try:
Make sure you are using LNV (see page 407) OR EVPN. You cannot use both at the same time.
Make sure you are not trying to use any VNI/VXLAN values over 65535. For example, VXLAN 70000 is
not supported in Cumulus Linux.
Make sure you are not using the reserved VLAN range; by default it is 3000-3999. This range is
stored in the resv_vlan_range variable in the /etc/cumulus/switchd.conf file.
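The last two checks in this list lend themselves to a quick script. The sketch below is a hypothetical helper, not a Cumulus Linux tool. It takes a mapping of VNI to VLAN ID and reports values that break the limits described above, assuming the default reserved VLAN range of 3000-3999.

```python
MAX_VNI = 65535                     # VNIs above this value are not supported here
DEFAULT_RESERVED = (3000, 3999)     # default resv_vlan_range, inclusive


def her_config_problems(vni_to_vlan, reserved=DEFAULT_RESERVED):
    """Return a list of human-readable problems, empty if none are found."""
    low, high = reserved
    problems = []
    for vni, vlan in sorted(vni_to_vlan.items()):
        if vni > MAX_VNI:
            problems.append("VNI %d is above %d" % (vni, MAX_VNI))
        if low <= vlan <= high:
            problems.append(
                "VLAN %d (VNI %d) is in the reserved range %d-%d"
                % (vlan, vni, low, high)
            )
    return problems


if __name__ == "__main__":
    # VNI to VLAN mapping from the hyperloop example in this chapter.
    print(her_config_problems({10: 100, 20: 200}))
    # A mapping that violates both limits.
    print(her_config_problems({70000: 3500}))
```

If the reserved range has been changed in /etc/cumulus/switchd.conf, pass the new bounds as the `reserved` argument.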
If you cannot ping from server to server, the problem is not with the VXLAN gateway but with the
network itself. Fix this before you configure a VXLAN gateway, with or without a hyperloop.
Only proceed past this point if you can get server to server connectivity on the same VXLAN.
The SVI (switch virtual interface) IP address for a hyperloop MUST be on a traditional bridge.
Please follow the configuration guidelines above.
Notice above that swp53 and swp54 are joined by a loopback cable (hyperloop), so the switch is connected to itself.
Contents
This chapter covers ...
Getting Started
Caveats and Errata
Preparing for the MidoNet Integration
Enabling the openvswitch-vtep Package
Bootstrapping the OVSDB Server and VTEP
Automating with the Bootstrap Script
Manually Bootstrapping
Configuring MidoNet VTEP and Port Bindings
Using the MidoNet Manager GUI
Using the MidoNet CLI
Troubleshooting MidoNet and Cumulus VTEPs
Troubleshooting the Control Plane
Verifying VTEP and OVSDB Services
Verifying OVSDB-server Connections
Verifying the VXLAN Bridge and VTEP Interfaces
Datapath Troubleshooting
Verifying IP Reachability
MidoNet VXLAN Encapsulation
Inspecting the OVSDB
Using VTEP-CTL
Getting Started
Before you create VXLANs with MidoNet, make sure you have the following components:
A switch (L2 gateway) with a Tomahawk, Trident II+ or Trident II chipset running Cumulus Linux
OVSDB server (ovsdb-server), included in Cumulus Linux
VTEPd (ovs-vtepd), which is included in Cumulus Linux and supports VLAN-aware bridges (see page 325)
Integrating a VXLAN with MidoNet involves:
Preparing for the MidoNet integration
Bootstrapping the OVS and VTEP
Configuring the MidoNet VTEP binding
Verifying the VXLAN configuration
1. Edit the /etc/default/openvswitch-vtep file, changing the START option from no to yes. This
simple sed command does this, and creates a backup as well:
Since MidoNet does not have a controller, you need to use a dummy IP address (for example,
1.1.1.1) for the controller parameter in the bootstrap script. After the script completes, delete the
VTEP manager, since it is not needed and will otherwise fill the logs with inconsequential error
messages:
Manually Bootstrapping
If you don't use the bootstrap script, then you must initialize the OVS database instance manually, and
create the VTEP.
Perform the following commands in order (see the automated bootstrapping example above for values):
At this point, the switch is ready to connect to MidoNet. The rest of the configuration is performed in the
MidoNet Manager GUI, or using the MidoNet API.
The tunnel zone is a construct used to define the VXLAN source address used for the tunnel. This host's
address is used for the source of the VXLAN encapsulation, and traffic will transit into the routing domain
from this point. Thus, the host must have layer 3 reachability to the Cumulus Linux switch tunnel IP.
Next, add a host entry to the tunnel zone:
1. Click Add.
2. Select a host from the Host list.
3. Provide the tunnel source IP Address to use on the selected host.
4. Click Save.
The new VTEP appears in the list below. MidoNet then initiates a connection between the OpenStack
Controller and the Cumulus Linux switch. If the OVS client is successfully connected to the OVSDB server,
the VTEP entry should display the switch name and VXLAN tunnel IP address, which you specified during
the bootstrapping process.
1. Click Add.
2. In the Port Name list, select the port on the Cumulus Linux switch that you are using to connect to
the VXLAN segment.
3. Specify the VLAN ID (enter 0 for untagged).
4. In the Bridge list, select the MidoNet bridge that the instances (VMs) are using in OpenStack.
5. Click Save.
You should see the port binding displayed in the binding table under the VTEP.
Once the port is bound, this automatically configures a VXLAN bridge interface, and includes the VTEP
interface and the port bound to the bridge. Now the OpenStack instances (VMs) should be able to ping the
hosts connected to the bound port on the Cumulus switch. The Troubleshooting section below
demonstrates the verification of the VXLAN data and control planes.
root@os-controller:~# midonet-cli
midonet>
Now from the MidoNet CLI, the commands explained in this section perform the same operations depicted
in the previous section with the MidoNet Manager GUI.
2. The tunnel zone is a construct used to define the VXLAN source address used for the tunnel. This
host's address is used for the source of the VXLAN encapsulation, and traffic will transit into the
routing domain from this point. Thus, the host must have layer 3 reachability to the Cumulus Linux
switch tunnel IP.
First, get the list of available hosts connected to the Neutron network and the MidoNet
bridge.
Next, get a listing of all the interfaces.
Finally, add a host entry to the tunnel zone ID returned in the previous step, and specify
which interface address to use.
Repeat this procedure for each OpenStack host connected to the Neutron network and the MidoNet
bridge.
3. Create a VTEP and assign it to the tunnel zone ID returned in the previous step. The management IP
address (the destination address for the VXLAN/remote VTEP) and the port must be the same ones
you configured in the vtep-bootstrap script or the manual bootstrapping:
In this step, MidoNet initiates a connection between the OpenStack Controller and the Cumulus
Linux switch. If the OVS client is successfully connected to the OVSDB server, the returned values
should show the name and description matching the switch-name parameter specified in the
bootstrap process.
4. The VTEP binding uses the information provided to MidoNet from the OVSDB server, providing a list
of ports that the hardware VTEP can use for layer 2 attachment. This binding virtually connects the
physical interface to the overlay switch, and joins it to the Neutron bridged network.
First, get the UUID of the Neutron network behind the MidoNet bridge:
Next, create the VTEP binding, using the UUID and the switch port being bound to the VTEP on the
remote end. If there is no VLAN ID, set vlan to 0:
At this point, the VTEP should be connected, and the layer 2 overlay should be operational. From the
OpenStack instance (VM), you should be able to ping a physical server connected to the port bound to the
hardware switch VTEP.
If the connection fails, verify IP reachability from the host to the switch. If that succeeds, it is likely the
bootstrap process did not set up port 6632. Redo the bootstrapping procedures above.
Next, look at the bridging table for the VTEP and the forwarding entries. The bound interface and the VTEP
should be listed along with the MAC addresses of those interfaces. When the hosts attached to the bound
port send data, those MACs are learned, and entered into the bridging table, as well as the OVSDB.
Datapath Troubleshooting
If you have verified that the control plane is correct and you still cannot pass data between the OpenStack
instances and the physical nodes on the switch, there may be something wrong with the data plane. The
data plane consists of the actual VXLAN-encapsulated path between the switch and one of the OpenStack
nodes running the midolman service. These are typically the compute nodes, but can include the MidoNet
gateway nodes. If the OpenStack instances can ping the tenant router address but cannot ping the physical
device connected to the switch (or vice versa), then something is wrong in the data plane.
Verifying IP Reachability
First, there must be IP reachability between the encapsulating node, and the address you bootstrapped as
the tunnel IP on the switch. Verify the OpenStack host can ping the tunnel IP. If this doesn't work, check the
routing design, and fix the layer 3 problem first.
Using VTEP-CTL
These commands show you the information installed in the OVSDB. This database is structured using the
physical switch ID, with one or more logical switch IDs associated with it. The bootstrap process creates the
physical switch, and MidoNet creates the logical switch after the control session is established.
Logical_Binding_Stats table
_uuid bytes_from_local bytes_to_local packets_from_local
packets_to_local
------------------------------------ ---------------- --------------
------------------ ----------------
d2e378b4-61c1-4daf-9aec-a7fd352d3193 5782569 1658250 21687 14589
Logical_Router table
_uuid description name static_routes switch_binding
----- ----------- ---- ------------- --------------
Logical_Switch table
_uuid description name tunnel_key
------------------------------------ -----------
----------------------------------------- ----------
44d162dc-0372-4749-a802-5b153c7120ec "" "mn-6c9826da-6655-4fe3-a826-
4dcba6477d2d" 10006
Manager table
_uuid inactivity_probe is_connected max_backoff other_config status
target
----- ---------------- ------------ ----------- ------------ ------
------
Mcast_Macs_Local table
MAC _uuid ipaddr locator_set logical_switch
----------- ------------------------------------ ------
------------------------------------
------------------------------------
unknown-dst 25eaf29a-c540-46e3-8806-3892070a2de5 "" 7a4c000a-244e-
4b37-8f25-fd816c1a80dc 44d162dc-0372-4749-a802-5b153c7120ec
Mcast_Macs_Remote table
MAC _uuid ipaddr locator_set logical_switch
----------- ------------------------------------ ------
------------------------------------
------------------------------------
unknown-dst b122b897-5746-449e-83ba-fa571a64b374 "" 6c04d477-18d0-
41df-8d52-dc7b17845ebe 44d162dc-0372-4749-a802-5b153c7120ec
Physical_Locator table
_uuid dst_ip encapsulation_type
------------------------------------ -------------- ------------------
2fcf8b7e-e084-4bcb-b668-755ae7ac0bfb "10.111.0.182" "vxlan_over_ipv4"
3f78dbb0-9695-42ef-a31f-aaaf525147f1 "10.111.1.2" "vxlan_over_ipv4"
Physical_Locator_Set table
_uuid locators
------------------------------------
--------------------------------------
6c04d477-18d0-41df-8d52-dc7b17845ebe [2fcf8b7e-e084-4bcb-b668-
755ae7ac0bfb]
7a4c000a-244e-4b37-8f25-fd816c1a80dc [3f78dbb0-9695-42ef-a31f-
aaaf525147f1]
Physical_Port table
_uuid description name port_fault_status vlan_bindings vlan_stats
------------------------------------ ----------- ---------
----------------- ----------------------------------------
----------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
-------------------------------------------- -------------------
-------------- --------------------------------------
6d459554-0c75-4170-bb3d-117eb4ce1f4d "sw12" ["10.50.20.22"] "sw12"
[109a9911-d6c7-4142-b6c9-7c985506abb4, 124d1e01-a187-4427-819f-
21de66e76f13, 2a2d04fa-7190-41fe-8cee-318fcbafb2ea, 3001c904-b152-
4dc4-9d8e-718f24ffa439, 3943fb6a-0b49-4806-a014-2bcd4d469537,
4223559a-da1c-4c34-b8bf-bff7ced376ad, 439afb62-067e-4bbe-a0d9-
ee33a23d2a9c, 47cc66fb-ef8a-4a9b-a497-1844b89f7d32, 54f6c9df-01a1-
4d96-9dcf-3035a33ffb3e, 55b49814-b5c5-405e-8e9f-898f3df4f872,
5b15372b-89f0-4e14-a50b-b6c6f937d33d, 5be3a052-be0f-4258-94cb-
5e8be9afb896, 631b19bd-3022-4353-bb2d-f498b0c1cb17, 652c6cd1-0823-4585
-bb78-658e6ca2abfc, 684f99d5-426c-45c8-b964-211489f45599, 69585fff-436
0-4177-901d-8360ade5391b, 6bbccda8-d7e5-4b19-b978-4ec7f5b868e0,
7096abaf-eebf-4ee3-b0cc-276224bc3e71, 7cb681f4-2206-4c70-85b7-
23b60963cd21, 93b85c31-be38-4384-8b7a-9696764f9ba9, 9a7e42c4-228f-
4b55-b972-7c3b8352c27d, a44c5402-6218-4f09-bf1e-518f41a5546e,
a6f8a88a-3877-4f81-b9b4-d75394a09d2c, a9294152-2b32-4058-8796-23520
ffb7379, b26ce4dd-b771-4d7b-8647-41fa97aa40e3, b2b2cd14-662d-45a5-
87c1-277acbccdffd, bcfb2920-6676-494c-9dcb-b474123b7e59, bf38137d-
3a14-454e-8df0-9c56e4b4e640, bf69fcbb-36b3-4dbc-a90d-fc7412e57076,
c32a9ff9-fd11-4399-815f-806322f26ff5, c35f55f5-8ec6-4fed-bef4-
49801cd0934c, c5a88dd6-d931-4b2c-9baa-a0abfb9d41f5, c6876886-8386-4
e34-a307-931909fca58f, c85ed6cd-a7d4-4016-b3e9-34df592072eb, cf382ed6-
60d3-43f5-8586-81f4f0f2fb28, d9db91a6-1c10-4154-9269-84877faa79b4,
e00741f1-ba34-47c5-ae23-9269c5d1a871, e0ee993a-8383-4701-a766-
d425654dbb7f] [] ["10.111.1.2"] [062eaf89-9bd5-4132-8b6b-09db254325af]
Tunnel table
_uuid bfd_config_local bfd_config_remote bfd_params bfd_status local
remote
------------------------------------
-----------------------------------------------------------
----------------- ---------- ----------
------------------------------------
------------------------------------
062eaf89-9bd5-4132-8b6b-09db254325af {bfd_dst_ip="169.254.1.0",
bfd_dst_mac="00:23:20:00:00:01"} {} {} {} 3f78dbb0-9695-42ef-a31f-
aaaf525147f1 2fcf8b7e-e084-4bcb-b668-755ae7ac0bfb
Ucast_Macs_Local table
MAC _uuid ipaddr locator logical_switch
Contents
This chapter covers ...
Getting Started
Caveats and Errata
Bootstrapping the NSX-V Integration
Enabling the openvswitch-vtep Package
Using the Bootstrapping Script
Manually Bootstrapping the NSX-V Integration
Generating the Credentials Certificate
Getting Started
Before you integrate VXLANs with NSX-V, make sure you have the following components:
A switch (L2 gateway) with a Broadcom Tomahawk, Trident II+ or Trident II chipset, or a Mellanox
Spectrum chipset running Cumulus Linux
OVSDB server (ovsdb-server), included in Cumulus Linux
VTEPd (ovs-vtepd), which is included in Cumulus Linux and supports VLAN-aware bridges (see page 325)
Integrating a VXLAN with NSX-V involves:
Bootstrapping the NSX-V integration
Configuring the transport zone and segment ID
Configuring the logical layer
Verifying the VXLAN configuration
Do not use 0 or 16777215 as the VNI ID, as they are reserved values under Cumulus Linux.
For more information about NSX-V, see the VMware NSX User Guide, version 4.0.0 or later.
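The reserved-value rule above can be checked before a VNI is assigned. The following is a minimal sketch, not part of Cumulus Linux; the function name is_usable_vni is hypothetical, and the 24-bit upper bound is the standard width of the VXLAN VNI field.

```python
# Hypothetical helper: rejects the two VNI values reserved under Cumulus Linux.
RESERVED_VNIS = {0, 16777215}
VNI_MAX = 2**24 - 1  # the VXLAN VNI field is 24 bits wide


def is_usable_vni(vni: int) -> bool:
    """Return True if vni is inside the 24-bit range and not reserved."""
    return 0 <= vni <= VNI_MAX and vni not in RESERVED_VNIS


if __name__ == "__main__":
    for candidate in (0, 5000, 16777215):
        print(candidate, is_usable_vni(candidate))
```

A check like this fits naturally into automation that generates logical switch definitions.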
cumulus@switch:~$ vtep-bootstrap -h
usage: vtep-bootstrap [-h] [--controller_ip CONTROLLER_IP]
                      [--controller_port CONTROLLER_PORT] [--no_encryption]
                      [--credentials-path CREDENTIALS_PATH]
                      [--pki-path PKI_PATH]
                      switch_name tunnel_ip management_ip
positional arguments:
  switch_name           Switch name
  tunnel_ip             local VTEP IP address for tunnel termination (data plane)
  management_ip         local management interface IP address for OVSDB
                        conection (control plane)
optional arguments:
  -h, --help            show this help message and exit
  --controller_ip CONTROLLER_IP
  --controller_port CONTROLLER_PORT
  --no_encryption       clear text OVSDB connection
  --credentials-path CREDENTIALS_PATH
  --pki-path PKI_PATH
cumulusnetworks.com 555
Cumulus Linux 3.5 User Guide
().
Executed:
sign certificate
(vtep7-req.pem Wed May 3 01:15:24 UTC 2017
fingerprint bcc876b7bc8d1d596d1e78d3bde9337d2550f92e).
Executed:
define physical switch
().
Executed:
define NSX controller IP address in OVSDB
().
Executed:
define local tunnel IP address on the switch
().
Executed:
define management IP address on the switch
().
Executed:
restart a service
().
In the above example, the following information was passed to the vtep-bootstrap script:
--credentials-path /var/lib/openvswitch: the path where the certificate and key
pairs for authenticating with the NSX controller are stored.
vtep7: the ID for the VTEP (the switch name).
192.168.110.110: the IP address of the NSX controller.
172.16.20.157: the datapath IP address of the VTEP.
192.168.110.25: the IP address of the management interface on the switch.
These IP addresses are used throughout the rest of the examples below.
Creating switchca...
cumulus@switch:~$ sudo ovs-pki req+sign cumulus
cumulus@switch:~$ ls -l
total 12
-rw-r--r-- 1 root root 4028 Oct 23 05:32 cumulus-cert.pem
-rw------- 1 root root 1679 Oct 23 05:32 cumulus-privkey.pem
-rw-r--r-- 1 root root 3585 Oct 23 05:32 cumulus-req.pem
# Start ovsdb-server.
set ovsdb-server "$DB_FILE"
set "$@" -vANY:CONSOLE:EMER -vANY:SYSLOG:ERR -vANY:FILE:INFO
set "$@" --remote=punix:"$DB_SOCK"
set "$@" --remote=db:Global,managers
set "$@" --remote=ptcp:6633:$LOCALIP
set "$@" --private-key=/root/cumulus-privkey.pem
set "$@" --certificate=/root/cumulus-cert.pem
set "$@" --bootstrap-ca-cert=/root/controller.cacert
If files have been moved or regenerated, restart the OVSDB server and vtepd:
3. Define the NSX Controller Cluster IP address in OVSDB. This causes the OVSDB server to start
contacting the NSX controller:
4. Define the local IP address on the VTEP for VXLAN tunnel termination. First, find the physical switch
name as recorded in OVSDB:
Then set the tunnel source IP address of the VTEP. This is the datapath address of the VTEP, which is
typically an address on a loopback interface on the switch that is reachable from the underlying L3
network:
Once you finish generating the certificate, keep the terminal session active, as you need to paste the
certificate into NSX Manager when you configure the VTEP gateway.
1. In NSX Manager, add a new HW VTEP gateway. Click the Network & Security icon, Service
Definitions category, then the Hardware Devices tab. Under Hardware Devices, click +. The
Create Add Hardware Devices window appears.
cumulus@switch:~$ cd /var/lib/openvswitch
cumulus@switch:/var/lib/openvswitch$ ls
conf.db pki vtep7-cert.pem vtep7-privkey.pem vtep7-req.pem
cumulus@switch:/var/lib/openvswitch$ cat vtep7-cert.pem
Once communication is established between the switch and the controller, a controller.cacert file is
downloaded onto the switch.
Verify that the handshake between the controller and the switch is successful. In a terminal connected
to the switch, run this command:
target : "ssl:192.168.110.110:6640"
1. In the Installation category, click the Logical Network Preparation tab, then click the Segment
ID tab.
2. Click Edit and add the segment IDs (VNIDs) to be used. Here VNIs 5000-5999 are configured.
5. Select Unicast to choose the NSX-V Controller Cluster to handle the VXLAN control plane.
1. In NSX Manager, select the Logical Switches category. Click + to add a logical switch instance.
2. Hypervisors connected to the NSX controller for replication appear in the Available Objects list.
Select the required service nodes, then click the green arrow to move them to the Selected Objects
list.
1. In NSX Manager, add a new logical switch port. Click the Logical Switches category. Under Actions,
click Manage Hardware Bindings. The Manage Hardware Binding wizard appears.
Contents
This chapter covers ...
Getting Started (see page 567)
Caveats and Errata (see page 567)
Bootstrapping the NSX Integration (see page 567)
Enabling the openvswitch-vtep Package (see page 567)
Using the Bootstrapping Script (see page 568)
Manually Bootstrapping the NSX Integration (see page 569)
Generating the Credentials Certificate (see page 569)
Configuring the Switch as a VTEP Gateway (see page 571)
Configuring the Transport Layer (see page 573)
Configuring the Logical Layer (see page 574)
Defining Logical Switches (see page 574)
Defining Logical Switch Ports (see page 576)
Verifying the VXLAN Configuration (see page 578)
Getting Started
Before you integrate VXLANs with NSX, make sure you have the following components:
A switch (L2 gateway) with a Broadcom Tomahawk, Trident II+ or Trident II chipset, or a Mellanox
Spectrum chipset running Cumulus Linux
OVSDB server (ovsdb-server), included in Cumulus Linux
VTEPd (ovs-vtepd), which is included in Cumulus Linux and supports VLAN-aware bridges (see page 325)
Integrating a VXLAN with NSX involves:
Bootstrapping the NSX Integration
Configuring the Transport Layer
Configuring the Logical Layer
Verifying the VXLAN Configuration
Do not use 0 or 16777215 as the VNI ID, as they are reserved values under Cumulus Linux.
For more information about NSX, see the VMware NSX User Guide, version 4.0.0 or later.
In the above example, the following information was passed to the vtep-bootstrap script:
--credentials-path /var/lib/openvswitch: the path where the certificate and key
pairs for authenticating with the NSX controller are stored.
vtep7: the ID for the VTEP.
192.168.100.17: the IP address of the NSX controller.
172.16.20.157: the datapath IP address of the VTEP.
192.168.100.157: the IP address of the management interface on the switch.
These IP addresses are used throughout the rest of the examples below.
cumulus@switch:~$ ls -l
total 12
-rw-r--r-- 1 root root 4028 Oct 23 05:32 cumulus-cert.pem
-rw------- 1 root root 1679 Oct 23 05:32 cumulus-privkey.pem
-rw-r--r-- 1 root root 3585 Oct 23 05:32 cumulus-req.pem
# Start ovsdb-server.
set ovsdb-server "$DB_FILE"
set "$@" -vANY:CONSOLE:EMER -vANY:SYSLOG:ERR -vANY:FILE:INFO
set "$@" --remote=punix:"$DB_SOCK"
set "$@" --remote=db:Global,managers
set "$@" --remote=ptcp:6633:$LOCALIP
set "$@" --private-key=/root/cumulus-privkey.pem
set "$@" --certificate=/root/cumulus-cert.pem
set "$@" --bootstrap-ca-cert=/root/controller.cacert
If files have been moved or regenerated, restart the OVSDB server and vtepd:
3. Define the NSX controller cluster IP address in OVSDB. This causes the OVSDB server to start
contacting the NSX controller:
4. Define the local IP address on the VTEP for VXLAN tunnel termination. First, find the physical switch
name as recorded in OVSDB:
Then set the tunnel source IP address of the VTEP. This is the datapath address of the VTEP, which is
typically an address on a loopback interface on the switch that is reachable from the underlying L3
network:
Once you finish generating the certificate, keep the terminal session active, as you need to paste the
certificate into NSX Manager when you configure the VTEP gateway.
1. In NSX Manager, add a new gateway. Click the Network Components tab, then the Transport
Layer category. Under Transport Node, click Add, then select Manually Enter All Fields. The
Create Gateway wizard appears.
2. In the Create Gateway dialog, select Gateway for the Transport Node Type, then click Next.
3. In the Display Name field, give the gateway a name, then click Next.
4. Enable the VTEP service. Select the VTEP Enabled checkbox, then click Next.
5. From the terminal session connected to the switch where you generated the certificate, copy the
certificate and paste it into the Security Certificate text field. Copy only the bottom portion,
including the BEGIN CERTIFICATE and END CERTIFICATE lines. For example, copy all the
highlighted text in the terminal:
Once communication is established between the switch and the controller, a controller.cacert file is
downloaded onto the switch.
Verify that the handshake between the controller and the switch is successful. In a terminal connected
to the switch, run this command:
1. In NSX Manager, add a new gateway service. Click the Network Components tab, then the Services
category. Under Gateway Service, click Add. The Create Gateway Service wizard appears.
2. In the Create Gateway Service dialog, select VTEP L2 Gateway Service as the Gateway Service Type.
1. In NSX Manager, add a new logical switch. Click the Network Components tab, then the Logical
Layer category. Under Logical Switch, click Add. The Create Logical Switch wizard appears.
2. In the Display Name field, enter a name for the logical switch, then click Next.
4. Specify the transport zone bindings for the logical switch. Click Add Binding. The Create Transport
Zone Binding dialog appears.
5. In the Transport Type list, select VXLAN, then click OK to add the binding to the logical switch.
6. In the VNI field, assign the switch a VNI ID, then click OK.
Do not use 0 or 16777215 as the VNI ID, as they are reserved values under Cumulus Linux.
1. In NSX Manager, add a new logical switch port. Click the Network Components tab, then the
Logical Layer category. Under Logical Switch Port, click Add. The Create Logical Switch Port
wizard appears.
2. In the Logical Switch UUID list, select the logical switch you created above, then click Create.
3. In the Display Name field, give the port a name that indicates it is the port that connects the
gateway, then click Next.
4. In the Attachment Type list, select VTEP L2 Gateway.
5. In the VTEP L2 Gateway Service UUID list, choose the name of the gateway service you created
earlier.
6. In the VLAN list, you can optionally choose a VLAN if you wish to connect only traffic on a specific
VLAN of the physical network. Leave it blank to handle all traffic.
7. Click Save to save the logical switch port. Connectivity is established. Repeat this procedure for each
logical switch port you want to define.
or
VXLAN Scale
On Broadcom Trident II and Tomahawk (but not Trident II+ or Maverick) and Mellanox Spectrum switches
running Cumulus Linux, there is a limit on the number of VXLANs you can configure simultaneously. The
limit most often cited is 2000 VXLANs, but network architects usually want to know the exact limit for
their specific design.
The limit applies to physical-to-virtual mappings: a switch can hold 15000 mappings in hardware before you
encounter hash collisions. There is also an upper limit of around 3000 VLANs you can configure before you
hit the reserved range (Cumulus Linux uses 3000-3999 by default). Cumulus Networks typically quotes a soft
number because the math is unique to each customer's environment. An internal VLAN is consumed by
each layer 3 port, subinterface, traditional bridge (see page 337) and the VLAN-aware bridge (see page 325).
Thus, the number of configurable VXLANs is:
(total configurable 802.1q VLANs) - (reserved VLANs) - (physical or logical interfaces) =
4094 - 999 - eth0 - loopback = 3093 by default (without any other configuration)
The equation for the number of configurable VXLANs looks like this:
(number of trunks) * (VXLAN/VLANs per trunk) + (Linux logical and physical interfaces) = 15000
For example, on a 10G switch with 48 * 10G ports and 6 * 40G uplinks, you can solve for X, the
number of configurable VXLANs:
48 * X + (48 downlinks + 6 uplinks + 1 loopback + 1 eth0 + 1 bridge) = 15000
48 * X = 14943
X = 311 VXLANs
Similarly, this logic can be applied to a 32-port 100G switch where 16 ports have been broken out into
4 * 25 Gbps ports each, for a total of 64 * 25 Gbps ports:
64 * X + (64 downlinks + 16 uplinks + 1 loopback + 1 eth0 + 1 bridge) = 15000
64 * X = 14917
X = 233 VXLANs
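The two worked examples above can be reproduced with a short calculation. This is a minimal sketch that assumes the 15000-mapping hardware limit stated in this section and rounds down to a whole number of VXLANs; the function name max_vxlans_per_trunk is hypothetical.

```python
# Assumption: the hardware holds 15000 physical-to-virtual mappings,
# as stated in this section.
HW_MAPPING_LIMIT = 15000


def max_vxlans_per_trunk(trunks: int, other_interfaces: int,
                         limit: int = HW_MAPPING_LIMIT) -> int:
    """Solve trunks * X + other_interfaces = limit for X, rounded down."""
    return (limit - other_interfaces) // trunks


# 48 x 10G downlinks, 6 x 40G uplinks, loopback, eth0 and one bridge.
ten_gig = max_vxlans_per_trunk(48, 48 + 6 + 1 + 1 + 1)

# 64 x 25G downlinks, 16 uplinks, loopback, eth0 and one bridge.
hundred_gig = max_vxlans_per_trunk(64, 64 + 16 + 1 + 1 + 1)

if __name__ == "__main__":
    print(ten_gig, hundred_gig)
```

Substituting your own trunk and interface counts gives the per-trunk limit for a specific design.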
However, not all ports are trunks for all VXLANs (or at least not all the time). It is much more common for
subsets of ports to be used for different VXLANs. For example, a 10G switch (48 * 10G ports + 6 * 40G
uplinks) can have the following configuration:
Ports Trunks
swp31-48 X VXLAN/VLANs
Contents
This chapter covers ...
Removing the Early Access QinQ Metapackage (see page 581)
Configuring Single Tag Translation (see page 581)
Configuring the Public Cloud-facing Switch (see page 581)
Configuring the Customer-facing Edge Switch (see page 582)
Viewing the Configuration (see page 584)
You configure two switches: one at the service provider edge that faces the customer (the switch on the left
above), and one on the public cloud handoff edge (the righthand switch above).
A trunk port connected to the public cloud is the QinQ trunk, and packets are double tagged, where
the S-tag is for the customer and the C-tag is for the service.
To configure the public cloud-facing switch, run the following NCLU (see page 82) commands:
auto vni-1000
iface vni-1000
bridge-access 100
bridge-learning off
vxlan-id 1000
vxlan-local-tunnelip 10.0.0.1
auto vni-3000
iface vni-3000
bridge-access 200
bridge-learning off
vxlan-id 3000
vxlan-local-tunnelip 10.0.0.1
auto bridge
iface bridge
bridge-ports swp3 vni-1000 vni-3000
bridge-vids 100 200
bridge-vlan-aware yes
bridge-vlan-protocol 802.1ad
The service VLAN tags (C-tags) are preserved during VXLAN encapsulation.
To configure the customer-facing switch, run the following NCLU (see page 82) commands:
auto vni-1000
iface vni-1000
bridge-access 100
bridge-learning off
vxlan-id 1000
vxlan-local-tunnelip 10.0.0.1
auto vni-3000
iface vni-3000
bridge-access 200
bridge-learning off
vxlan-id 3000
vxlan-local-tunnelip 10.0.0.1
auto swp3
iface swp3
bridge-access 100
auto swp4
iface swp4
bridge-access 200
auto bridge
iface bridge
bridge-ports swp3 swp4 vni-1000 vni-3000
bridge-vids 100 200
bridge-vlan-aware yes
bridge-vlan-protocol 802.1ad
To verify that the bridge is configured for QinQ, run ip -d link show bridge and look for vlan_protocol
802.1ad in the output:
The configuration in Cumulus Linux uses the outer tag for the customer and the inner tag for the
service.
You configure a double-tagged interface by stacking the VLANs in the following manner: <port>.<outer tag>.
<inner tag>. For example, consider swp1.100.10: the outer tag is VLAN 100, which represents the customer,
and the inner tag is VLAN 10, which represents the service.
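The <port>.<outer tag>.<inner tag> naming convention can be illustrated by splitting an interface name into its parts. This is only a sketch for checking names in automation scripts; the helper parse_double_tagged is hypothetical and is not a Cumulus Linux tool.

```python
def parse_double_tagged(name: str) -> tuple:
    """Split '<port>.<outer tag>.<inner tag>' into (port, outer, inner).

    Raises ValueError if the name does not contain two numeric tags.
    """
    port, outer, inner = name.rsplit(".", 2)
    return port, int(outer), int(inner)


if __name__ == "__main__":
    port, outer, inner = parse_double_tagged("swp1.100.10")
    # The outer tag identifies the customer, the inner tag the service.
    print(f"port={port} customer VLAN={outer} service VLAN={inner}")
```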
The outer tag, or TPID (tag protocol identifier), requires the vlan_protocol to be specified. It can be
either 802.1Q or 802.1ad. If 802.1ad is used, it must be specified on the lower VLAN device, such as
swp3.100 in the example below.
Double tag translation only works with bridges in traditional mode (see page 337) (not VLAN-
aware). As such, you cannot use NCLU (see page 82) to configure it.
auto swp3.100
iface swp3.100
vlan_protocol 802.1ad
auto swp3.100.10
iface swp3.100.10
mstpctl-portbpdufilter yes
mstpctl-bpduguard yes
auto vni1000
iface vni1000
vxlan-local-tunnelip 10.0.0.1
mstpctl-portbpdufilter yes
mstpctl-bpduguard yes
vxlan-id 1000
auto custA-10-azr
iface custA-10-azr
bridge-ports swp3.100.10 vni1000
bridge-vlan-aware no
bridge-learning vni1000=off
You can check the configuration with the brctl show command:
You can try this out without the bridge being VXLAN-enabled. The configuration would look
something like this:
auto swp5.100.10
iface swp5.100.10
mstpctl-portbpdufilter yes
mstpctl-bpduguard yes
auto br10
iface br10
bridge-ports swp3.10 swp4 swp5.100.10
bridge-vlan-aware no
Feature Limitations
iptables match on double-tagged interfaces is not supported.
Single-tagged translation supports only VLAN-aware bridge mode (see page 325) with the bridge’s
VLAN 802.1ad protocol.
MLAG (see page 348) is only supported with single-tagged translation.
No layer 2 protocol (STP BPDU, LLDP) tunneling support.
Mixing 802.1Q and 802.1ad subinterfaces on the same switch port is not supported.
When using switches with Mellanox Spectrum ASICs in an MLAG pair:
The peerlink (peerlink.4094) between the MLAG pair should be configured for VLAN protocol
802.1ad.
The peerlink cannot be used as a backup datapath in the event that one of the MLAG peers
loses all uplinks.
For switches with the Spectrum ASIC (but not the Spectrum 2), when the bridge VLAN protocol is
802.1ad and is VXLAN-enabled, either:
All bridge ports are access ports, except for the MLAG peerlink.
All bridge ports are VLAN trunks.
This means the switch terminating the cloud provider connections (double-tagged) cannot have
local clients; these clients must be on a separate switch.
To work around this issue, create two VLANs as nested VLAN raw devices, one for the outer
tag and one for the inner tag. For example, you cannot create an interface called swp50s0.1001.101, since
its name is 16 characters long. Instead, create VLANs with IDs 1001 and 101 by editing
/etc/network/interfaces and adding a configuration like the following:
auto vlan1001
iface vlan1001
vlan-id 1001
vlan-raw-device swp50s0
vlan-protocol 802.1ad
auto vlan1001-101
iface vlan1001-101
vlan-id 101
vlan-raw-device vlan1001
auto bridge101
iface bridge101
bridge-ports vlan1001-101 vxlan1000101
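The name-length constraint behind this workaround can be checked ahead of time. The sketch below assumes the Linux kernel's IFNAMSIZ value of 16, which includes the terminating NUL byte and therefore leaves 15 usable characters; the helper fits_ifname is hypothetical.

```python
IFNAMSIZ = 16                  # Linux kernel constant, includes the trailing NUL
MAX_IFNAME_LEN = IFNAMSIZ - 1  # usable characters in an interface name


def fits_ifname(name: str) -> bool:
    """Return True if name is short enough to be a Linux interface name."""
    return len(name) <= MAX_IFNAME_LEN


if __name__ == "__main__":
    for candidate in ("swp50s0.1001.101", "vlan1001", "vlan1001-101"):
        print(candidate, len(candidate), fits_ifname(candidate))
```

The stacked name is one character too long, while both nested raw device names from the configuration above fit.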
Layer 3
Routing
This chapter discusses routing on switches running Cumulus Linux.
Contents
This chapter covers ...
Managing Static Routes (see page 589)
Static Multicast Routes (see page 590)
Static Routing via ip route (see page 590)
Applying a Route Map for Route Updates (see page 592)
Configuring a Gateway or Default Route (see page 592)
Supported Route Table Entries (see page 592)
Forwarding Table Profiles (see page 592)
TCAM Resource Profiles for Mellanox Switches (see page 593)
Number of Supported Route Entries, by Platform (see page 594)
Caveats and Errata (see page 596)
Don't Delete Routes via Linux Shell (see page 596)
Adding IPv6 Default Route with src Address on eth0 Fails without Adding Delay (see page 596)
Related Information (see page 597)
!
ip route 203.0.113.0/24 198.51.100.2
!
!
ip mroute 230.0.0.0/24
!
To view mroutes, open the FRRouting CLI, and run the following command:
auto swp3
iface swp3
address 198.51.100.1/24
post-up ip route add 203.0.113.0/24 via 198.51.100.2
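A post-up static route like the one above generally only makes sense when the next hop is reachable through the interface's own subnet. The following sketch uses Python's standard ipaddress module to check that for the example addresses; the function name gateway_is_on_link is hypothetical.

```python
import ipaddress


def gateway_is_on_link(interface_cidr: str, gateway: str) -> bool:
    """Return True if gateway lies inside the subnet configured on the interface."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(gateway) in network


if __name__ == "__main__":
    # swp3 is 198.51.100.1/24 and the route points at 198.51.100.2.
    print(gateway_is_on_link("198.51.100.1/24", "198.51.100.2"))
```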
If an IPv6 address is assigned to a DOWN interface, the associated route is still installed into the
routing table. The type of IPv6 address doesn't matter: link local, site local and global all exhibit
the same problem.
If the interface is bounced up and down, then the routes are no longer in the route table.
The ip route command allows manipulating the kernel routing table directly from the Linux shell. See
man ip(8) for details. FRRouting monitors the kernel routing table changes and updates its own routing
table accordingly.
To display the routing table:
Cumulus Linux defines these profiles as default, l2-heavy, v4-lpm-heavy and v6-lpm-heavy. Choose the profile
that best suits your network architecture and specify the profile name for the forwarding_table.profile
variable in the /etc/cumulus/datapath/traffic.conf file.
After you specify a different profile, restart switchd (see page 190) for the change to take effect. You can
see the forwarding table profile when you run cl-resource-query.
Broadcom ASICs other than Tomahawk and Trident II/Trident II+ support only the default profile.
# Valid profiles -
# default, ipmc-heavy, acl-heavy, ipmc-max
tcam_resource.profile = default
After you specify a different profile, restart switchd (see page 190) for the change to take effect.
When nonatomic updates (see page 141) are enabled (that is, acl.non_atomic_update_mode is set
to TRUE in the /etc/cumulus/switchd.conf file), the maximum numbers of mroute and ACL entries for
each profile are as follows:
When nonatomic updates (see page 141) are disabled (that is, acl.non_atomic_update_mode is set
to FALSE in the /etc/cumulus/switchd.conf file), the maximum numbers of mroute and ACL entries for
each profile are as follows:
The values in the following tables reflect results from our testing on the different platforms we
support, and may differ from published manufacturers' specifications provided about these
chipsets.
Profile       MAC Addresses  L3 Neighbors               Longest Prefix Match (LPM)
default       40k            32k (IPv4) and 16k (IPv6)  64k (IPv4) or 28k (IPv6-long)
l2-heavy      88k            48k (IPv4) and 40k (IPv6)  8k (IPv4) and 8k (IPv6-long)
v4-lpm-heavy  8k             8k (IPv4) and 16k (IPv6)   80k (IPv4) and 16k (IPv6-long)
v6-lpm-heavy  40k            8k (IPv4) and 40k (IPv6)   8k (IPv4) and 64k (IPv6-long)
For Broadcom switches, IPv4 and IPv6 entries are not carved in separate spaces so it is not
possible to define explicit numbers in the L3 Neighbors column of the tables shown above.
However, note that an IPv6 entry takes up twice the space of an IPv4 entry.
Adding IPv6 Default Route with src Address on eth0 Fails without Adding Delay
Attempting to install an IPv6 default route on eth0 with a source address fails at reboot or when running
ifup on eth0.
The first execution of ifup -dv returns this warning and does not install the route:
To avoid the need for the delay, omit the src parameter from the ip route add command. Without the src
parameter, the route is added correctly.
Related Information
Linux IP - ip route command
FRRouting docs - static route commands
Contents
This chapter covers ...
Defining Routing Protocols (see page 597)
Configuring Routing Protocols (see page 597)
Protocol Tuning (see page 598)
Cumulus Linux does not support running multiple instances of the same protocol on a router.
Distributed routing protocols compute reachability between end points by disseminating relevant
information and running a routing algorithm on this information to determine the routes to each end
station. To scale the amount of information that needs to be exchanged, routes are computed on address
prefixes rather than on every end point address.
The way they answer these questions affects the network design and thereby configuration. For example, in
a link-state protocol such as OSPF (see Open Shortest Path First (OSPF) Protocol (see page 617)) or IS-IS,
complete local information (links and attached address prefixes) about a node is disseminated to every
other node in the network. Since the state that a node has to keep grows rapidly in such a case, link-state
protocols typically limit the number of nodes that communicate this way. They allow for bigger networks to
be built by breaking up a network into a set of smaller subnetworks (which are called areas or levels), and
by advertising summarized information about an area to other areas.
Besides the two critical pieces of information mentioned above, protocols have other parameters that can
be configured. These are usually specific to each protocol.
Protocol Tuning
Most protocols provide certain tunable parameters that are specific to convergence during changes.
Wikipedia defines convergence as the “state of a set of routers that have the same topological information
about the network in which they operate”. It is imperative that the routers in a network have the same
topological state for the proper functioning of a network. Without this, traffic can be blackholed, and thus
not reach its destination. It is normal for different routers to have differing topological states during
changes, but this difference should vanish as the routers exchange information about the change and
recompute the forwarding paths. Different protocols converge at different speeds in the presence of
changes.
A key factor that governs how quickly a routing protocol converges is the time it takes to detect the change.
For example, how quickly can a routing protocol be expected to act when there is a link failure? Routing
protocols classify changes into two kinds: hard changes such as link failures, and soft changes such as a
peer dying silently. They are classified differently because protocols provide different mechanisms for
dealing with these failures.
It is important to configure the protocols to be notified immediately on link changes. This is also true when
a node goes down, causing all of its links to go down.
Even if a link doesn’t fail, a routing peer can crash. This usually causes that router to delete the routes it
has computed or, worse, makes that router impervious to changes in the network, so that it goes out of sync
with the other routers in the network because it no longer shares the same topological information as its
peers.
The most common way to detect a protocol peer dying is to detect the absence of a heartbeat. All routing
protocols send a heartbeat (or “hello”) packet periodically. When a node does not see a consecutive set of
these hello packets from a peer, it declares its peer dead and informs other routers in the network about
this. The period of each heartbeat and the number of heartbeats that need to be missed before a peer is
declared dead are two popular configurable parameters.
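The relationship between these two parameters and failure detection time can be sketched as a simple product. The values in the example (a 10-second hello and 4 missed hellos, matching the commonly cited OSPF defaults) are illustrative assumptions rather than recommendations, and the function name dead_interval is hypothetical.

```python
def dead_interval(hello_seconds: float, missed_hellos: int) -> float:
    """Worst-case time to declare a silent peer dead, in seconds."""
    return hello_seconds * missed_hellos


if __name__ == "__main__":
    # Illustrative values only: a 10-second hello and 4 missed hellos.
    print(dead_interval(10, 4))
    # Shorter hellos detect failures sooner but leave less headroom under load.
    print(dead_interval(1, 3))
```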
If you configure these timers very low, the network can quickly descend into instability under stressful
conditions, when a router cannot send heartbeats quickly enough because it is busy computing routing
state, or when there is so much traffic that the hellos get lost. Conversely, configuring these timers to very
high values also causes blackholing of communication because it takes much longer to detect peer failures.
Usually, the default values initialized within each protocol are good enough for most networks. Cumulus
Networks recommends you do not adjust these settings.
Network Topology
In computer networks, topology refers to the structure of interconnecting various nodes. Some commonly
used topologies in networks are star, hub and spoke, leaf and spine, and broadcast.
Contents
This chapter covers ...
Clos Topologies (see page 599)
Over-Subscribed and Non-Blocking Configurations (see page 599)
Containing the Failure Domain (see page 600)
Load Balancing (see page 600)
Clos Topologies
Clos (or fat tree) topologies are very popular in modern data centers. This topology is shown
in the figure below. It is also commonly referred to as leaf-spine topology. We shall use this topology
throughout the routing protocol guide.
This topology allows the building of networks of varying size using nodes of different port counts and/or by
increasing the tiers. The picture above is a three-tiered Clos network. We number the tiers from the bottom
to the top. Thus, in the picture, the lowermost layer is called tier 1 and the topmost tier is called tier 3.
The number of end stations (such as servers) that can be attached to such a network is determined by a
very simple mathematical formula.
In a 2-tier network, if each node is made up of m ports, then the total number of end stations that can be
connected is m^2/2. In more general terms, if tier-1 nodes are m-port nodes and tier-2 nodes are n-port
nodes, then the total number of end stations that can be connected is (m*n)/2. In a three tier network,
where tier-3 nodes are o-port nodes, the total number of end stations that can be connected is
(m*n*o)/2^(number of tiers-1).
Let’s consider some practical examples. In many data centers, it is typical to connect 40 servers to a top-of-
rack (ToR) switch. The ToRs are all connected via a set of spine switches. If a ToR switch has 64 ports, then
after hooking up 40 ports to the servers, the remaining 24 ports can be hooked up to 24 spine switches of
the same link speed or to a smaller number of higher link speed switches. For example, if the servers are all
hooked up as 10GE links, then the ToRs can connect to the spine switches via 40G links. So, instead of
connecting to 24 spine switches with 10G links, the ToRs can connect to 6 spine switches with each link
being 40G. If the spine switches are also 64-port switches, then the total number of end stations that can
be connected is 2560 (40*64) stations.
In a three tier network of 64-port switches, the total number of servers that can be connected is
(40*64*64)/2^(3-1) = 40960. As you can see, this kind of topology can serve quite a large network with
three tiers.
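The capacity formula above generalizes to any number of tiers: multiply the per-tier port counts and divide by 2^(number of tiers-1). The sketch below implements that formula as stated in this section; the function name clos_end_stations is hypothetical.

```python
from math import prod


def clos_end_stations(ports_per_tier: list) -> int:
    """Apply (product of per-tier port counts) / 2^(number of tiers - 1)."""
    tiers = len(ports_per_tier)
    return prod(ports_per_tier) // 2 ** (tiers - 1)


if __name__ == "__main__":
    # Two tiers of 64-port nodes: m^2/2 with m = 64.
    print(clos_end_stations([64, 64]))
    # The three-tier figure worked out above: (40*64*64)/2^(3-1).
    print(clos_end_stations([40, 64, 64]))
```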
Load Balancing
In a Clos network, traffic is load balanced across the multiple links using equal cost multi-pathing (ECMP).
Routing algorithms compute shortest paths between two end stations where shortest is typically the lowest
path cost. Each link is assigned a metric or cost. By default, a link’s cost is a function of the link speed. The
higher the link speed, the lower its cost. A 10G link has a higher cost than a 40G or 100G link, but a lower
cost than a 1G link. Thus, the link cost is a measure of its traffic carrying capacity.
In the modern data center, the links between tiers of the network are homogeneous; that is, they have the
same characteristics (same speed and therefore link cost) as the other links. As a result, the first hop router
can pick any of the spine switches to forward a packet to its destination (assuming that there is no link
failure between the spine and the destination switch). Most routing protocols recognize that there are
multiple equal-cost paths to a destination and enable any of them to be selected for a given traffic flow.
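One common way to keep every packet of a flow on the same equal-cost path is to hash the flow's identifying fields and use the result to index the list of next hops. The sketch below illustrates that idea only; it is an assumption for illustration and does not describe how Cumulus Linux or any particular switch ASIC computes its ECMP hash. The function name and spine names are hypothetical.

```python
import hashlib


def pick_next_hop(flow: tuple, next_hops: list) -> str:
    """Map a flow (for example, a 5-tuple) to one of several equal-cost next hops."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    spines = ["spine1", "spine2", "spine3", "spine4"]
    # (source IP, destination IP, protocol, source port, destination port)
    flow = ("10.1.1.10", "10.2.2.20", "tcp", 49152, 443)
    print(pick_next_hop(flow, spines))
```

Because the choice depends only on the flow fields, packets of one flow are not reordered across paths, while different flows spread across the spines.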
FRRouting Overview
Cumulus Linux uses FRRouting to provide the routing protocols for dynamic routing. FRRouting provides
many routing protocols, of which Cumulus Linux supports the following:
Open Shortest Path First (v2 (see page 617) and v3 (see page 631))
Border Gateway Protocol (see page 633)
Contents
This chapter covers ...
Architecture (see page 601)
About zebra (see page 601)
Related Information (see page 601)
Architecture
As shown in the figure above, the FRRouting suite consists of various protocol-specific daemons and a
protocol-independent daemon called zebra. Each of the protocol-specific daemons is responsible for
running the relevant protocol and building the routing table based on the information exchanged.
It is not uncommon to have more than one protocol daemon running at the same time. For example, at the
edge of an enterprise, protocols internal to an enterprise (called IGP for Interior Gateway Protocol) such as
OSPF (see page 617) or RIP run alongside the protocols that connect an enterprise to the rest of the world
(called EGP or Exterior Gateway Protocol) such as BGP (see page 633).
About zebra
zebra is the daemon that resolves the routes provided by multiple protocols (including static routes
specified by the user) and programs these routes in the Linux kernel via netlink. zebra does
more than this, of course. The FRRouting documentation defines zebra as the IP routing manager for
FRRouting that "provides kernel routing table updates, interface lookups, and redistribution of routes
between different routing protocols."
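When several protocols offer a route to the same prefix, a routing manager has to choose one of them. A common selection rule is administrative distance, where the lowest value wins. The sketch below assumes that rule and uses commonly cited default distances; the table `DISTANCE`, the function `select_route` and the sample routes are illustrative assumptions, not zebra's actual code.

```python
# Commonly cited default administrative distances (lower wins).
# Treated as assumptions for this illustration.
DISTANCE = {"connected": 0, "static": 1, "ebgp": 20, "ospf": 110, "ibgp": 200}


def select_route(candidates):
    """Pick the candidate route with the lowest administrative distance."""
    return min(candidates, key=lambda route: DISTANCE[route["protocol"]])


# Two hypothetical candidates for the same prefix.
candidates = [
    {"prefix": "10.1.1.0/24", "protocol": "ospf", "next_hop": "10.0.0.1"},
    {"prefix": "10.1.1.0/24", "protocol": "ebgp", "next_hop": "10.0.0.5"},
]
best = select_route(candidates)
print(best["protocol"], best["next_hop"])
```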
Related Information
frrouting.org
GitHub
These instructions only apply to upgrading to Cumulus Linux 3.4 or later from releases earlier
than 3.4. New image installations contain frr instead of quagga or quagga-compat. If you are
using any automation tools to configure your network and are installing a new Cumulus Linux
image, make sure your automation tools refer to FRR and not to Quagga.
If you are upgrading Cumulus Linux using apt-get upgrade, existing automation that
references Quagga continues to work until you upgrade to FRR. Once you perform the following
upgrade steps, your automation must reference FRR instead of Quagga.
Upgrading to Cumulus Linux 3.4 or later results in both quagga.service and frr.service
being present on the system, until quagga.service is removed. These services have been
configured to conflict with each other; starting one service automatically stops the other, as they
cannot run concurrently.
At the end of the apt-get upgrade process, the output shows details of the upgrade
process, regarding the Quagga to FRR switchover.
| and may be removed.                                                       |
|                                                                           |
| As part of this upgrade, please take note of the following information:   |
|                                                                           |
| - The location of your configuration files has changed to /etc/frr.       |
|   In order to enable a seamless transition, FRRouting will continue to    |
|   read all configuration files from /etc/quagga until the transition is   |
|   completed.                                                              |
|                                                                           |
| - In the interest of stability, action is required on your part to        |
|   complete the transition to FRRouting. For instructions on how to do     |
|   this, please refer to the Cumulus Linux documentation.                  |
+---------------------------------------------------------------------------+
Setting up quagga (1.0.0+cl3u14-1) ...
Processing triggers for libc-bin (2.19-18+deb8u10) ...
Creating post-apt snapshot... 245 done.
root@dell-s6000-16:/etc#
Once the upgrade process is completed, the switch is in the following state:
Cumulus 3.4 and later releases do not support or implement python-clcmd. While the package
remains, the related commands have been removed.
The vtysh.conf file should not be moved, as it is unlikely any configuration is in the file.
However, if there is necessary configuration in place, copy the contents into
/etc/frr/vtysh.conf.
2. Merge the current Quagga.conf file with the new frr.conf file. Keep the default configuration for
frr.conf in place, and add the additional configuration sections from Quagga.conf.
3. Enable the daemons needed for your installation in /etc/frr/daemons.
4. Manually update the log file locations to /var/log/frr or syslog.
5. Remove the compatibility package:
This step stops the Quagga compatibility mode, causing routing to go down.
This step deletes all Quagga configuration files. Please ensure you back up your
configuration.
Cumulus Networks does not recommend reinstalling the quagga and quagga-compat
packages once they have been removed. While they can be reinstalled to continue
migration iterations, limited testing has taken place, and configuration issues may occur.
Troubleshooting
If the systemctl -l status frr output shows an issue, edit the configuration files to correct it, and
repeat the process. If issues persist, you can return to Quagga compatibility mode for further testing:
Once further testing is complete, run the following commands to reset the FRR installation, and then repeat
the steps from the beginning of this section to upgrade to FRR:
Configuring FRRouting
This section provides an overview of configuring FRRouting, the routing software package that provides a
suite of routing protocols so you can configure routing on your switch.
Contents
This chapter covers ...
Configuring FRRouting (see page 606)
Enabling and Starting FRRouting (see page 607)
Understanding Integrated Configurations (see page 607)
Restoring the Default Configuration (see page 608)
Interface IP Addresses and VRFs (see page 609)
Using the FRRouting vtysh Modal CLI (see page 609)
Reloading the FRRouting Configuration (see page 614)
Debugging (see page 614)
Related Information (see page 614)
Configuring FRRouting
FRRouting does not start by default in Cumulus Linux. Before you run FRRouting, make sure you have
enabled all the relevant daemons that you intend to use (zebra, bgpd, ospfd, ospf6d or pimd) in the
/etc/frr/daemons file.
Cumulus Networks has not tested RIP, RIPv6, IS-IS and Babel.
The zebra daemon must always be enabled. The others you can enable according to how you plan to
route your network — using BGP (see page 633) for example, instead of OSPF (see page 617).
Before you start FRRouting, you need to enable the corresponding daemons. Edit the /etc/frr/daemons
file and set to yes each daemon you are enabling. For example, to enable BGP, set both zebra and bgpd to
yes:
zebra=yes
bgpd=yes
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no
babeld=no
pimd=no
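The rule that zebra must be enabled whenever another daemon is enabled can be checked before starting the service. The sketch below parses text in the `name=yes|no` form shown above; the function names `enabled_daemons` and `check`, and the sample text, are hypothetical helpers for illustration and are not part of Cumulus Linux or FRRouting.

```python
def enabled_daemons(text):
    """Return the set of daemon names set to 'yes' in daemons-file text."""
    enabled = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        # Only the first token counts, so extra options on the line are ignored.
        tokens = value.split()
        if tokens and tokens[0].lower() == "yes":
            enabled.add(name)
    return enabled


def check(text):
    """Raise ValueError if a daemon is enabled while zebra is not."""
    enabled = enabled_daemons(text)
    if enabled and "zebra" not in enabled:
        raise ValueError("zebra must be enabled whenever another daemon is")
    return enabled


SAMPLE = "zebra=yes\nbgpd=yes\nospfd=no\n"
print(sorted(check(SAMPLE)))
```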
All the routing protocol daemons (bgpd, ospfd, ospf6d, ripd, ripngd, isisd and pimd) are
dependent on zebra. When you start frr, systemd determines whether zebra is running; if
zebra is not running, it starts zebra, then starts the dependent service, such as bgpd.
In general, if you restart a service, its dependent services also get restarted. For example, running
systemctl restart frr.service restarts any of the routing protocol daemons that are
enabled and running.
For more information on the systemctl command and changing the state of daemons, read
Managing Application Daemons (see page 180).
If you disable the integrated configuration mode, FRRouting saves the configuration for each daemon
in a separate file. At a minimum for a daemon to start, that daemon must be enabled and its
daemon-specific configuration file must be present, even if that file is empty.
When the integrated configuration mode is disabled, the output looks like this:
2. Remove /etc/frr/frr.conf:
3. Restart FRRouting:
If for some reason you disabled service integrated-vtysh-config, then you should
remove all the configuration files (such as zebra.conf or ospf6d.conf) instead of frr.conf
in step 2 above.
switch#
vtysh provides a Cisco-like modal CLI, and many of the commands are similar to Cisco IOS commands. By
modal CLI, we mean that there are different modes to the CLI, and certain commands are only available
within a specific mode. Configuration is available with the configure terminal command, which is
invoked thus:
The prompt displays the mode the CLI is in. For example, when the interface-specific commands are
invoked, the prompt changes to:
When the routing protocol specific commands are invoked, the prompt changes to:
At any level, "?" displays the list of available top-level commands at that level:
switch(config-if)# ?
  bandwidth    Set bandwidth informational parameter
  description  Interface specific description
  end          End current mode and change to enable mode
  exit         Exit current mode and down to previous mode
  ip           IP Information
  ipv6         IPv6 Information
  isis         IS-IS commands
  link-detect  Enable link detection on interface
  list         Print command list
  mpls-te      MPLS-TE specific commands
  multicast    Set multicast flag to interface
  no           Negate a command or set its defaults
  ptm-enable   Enable neighbor check with specified topology
  quit         Exit current mode and down to previous mode
  shutdown     Shutdown the selected interface
?-based completion is also available to see the parameters that a command takes:
switch(config-if)# bandwidth ?
  <1-10000000>  Bandwidth in kilobits
switch(config-if)# ip ?
  address  Set the IP address of an interface
  irdp     Alter ICMP Router discovery preference this interface
  ospf     OSPF interface commands
  rip      Routing Information Protocol
  router   IP router interface commands
Displaying state can be done at any level, including the top level. For example, to see the routing table as
seen by zebra, you use:
Running single commands with vtysh is possible using the -c option of vtysh:
Notice that the commands also take a partial command name (for example, sh ip route above) as long
as the partial command name is not aliased:
A command or feature can be disabled in FRRouting by prepending the command with no. For example:
The current state of the configuration can be viewed using the show running-config command:
Current configuration:
!
username cumulus nopassword
!
service integrated-vtysh-config
!
vrf mgmt
!
interface lo
link-detect
!
interface swp1
ipv6 nd ra-interval 10
link-detect
!
interface swp2
ipv6 nd ra-interval 10
link-detect
!
interface swp3
ipv6 nd ra-interval 10
link-detect
!
interface swp4
ipv6 nd ra-interval 10
link-detect
!
interface swp29
ipv6 nd ra-interval 10
link-detect
!
interface swp30
ipv6 nd ra-interval 10
link-detect
!
interface swp31
link-detect
!
interface swp32
link-detect
!
interface vagrant
link-detect
!
interface eth0 vrf mgmt
ipv6 nd suppress-ra
link-detect
!
interface mgmt vrf mgmt
link-detect
!
router bgp 65020
bgp router-id 10.0.0.21
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor fabric peer-group
neighbor fabric remote-as external
neighbor fabric description Internal Fabric Network
neighbor fabric capability extended-nexthop
neighbor swp1 interface peer-group fabric
neighbor swp2 interface peer-group fabric
neighbor swp3 interface peer-group fabric
neighbor swp4 interface peer-group fabric
neighbor swp29 interface peer-group fabric
neighbor swp30 interface peer-group fabric
!
address-family ipv4 unicast
network 10.0.0.21/32
neighbor fabric activate
neighbor fabric prefix-list dc-spine in
neighbor fabric prefix-list dc-spine out
exit-address-family
!
ip prefix-list dc-spine seq 10 permit 0.0.0.0/0
ip prefix-list dc-spine seq 20 permit 10.0.0.0/24 le 32
ip prefix-list dc-spine seq 30 permit 172.16.1.0/24
ip prefix-list dc-spine seq 40 permit 172.16.2.0/24
ip prefix-list dc-spine seq 50 permit 172.16.3.0/24
ip prefix-list dc-spine seq 60 permit 172.16.4.0/24
If you attempt to configure a routing protocol that has not been started, vtysh silently ignores
those commands.
Alternatively, if you do not want to use a modal CLI to configure FRRouting, you can use a suite of Cumulus
Linux-specific commands (see page 615) instead.
FRRouting reload only applies to an integrated service configuration, where your FRRouting
configuration is stored in a single frr.conf file instead of one configuration file per FRRouting
daemon (like zebra or bgpd).
Examine the running configuration and verify that it matches the config in /etc/frr/frr.conf:
Debugging
If the running configuration is not what you expected, please submit a support request and supply the
following information:
The current running configuration (run net show configuration and output the contents to a
file)
The contents of /etc/frr/frr.conf
The contents of /var/log/frr/frr-reload.log
614 02 March 2018
Cumulus Networks
Related Information
frrouting.org/user-guide/BGP.html#BGP
frrouting.org/user-guide/IPv6-Support.html#IPv6-Support
frrouting.org/user-guide/Zebra.html#Zebra
Cumulus Linux command:
cumulus@switch:~$ net add bgp autonomous-system 65002
vtysh command:
switch(config)# router bgp 65002

Cumulus Linux command:
cumulus@switch:~$ net add bgp neighbor 14.0.0.22
vtysh command:
switch(config-router)# neighbor 14.0.0.22

Cumulus Linux command:
cumulus@switch:~$ net add routing route 155.1.2.20/24 bridge 45
vtysh command:
switch(config)# ip route 155.1.2.20/24 bridge 45

Cumulus Linux command:
cumulus@switch:~$ net add interface swp3 ipv6 address 3002:2123:1234:1abc::21/64
vtysh commands:
switch(config)# int swp3
switch(config-if)# ipv6 address 3002:2123:1234:1abc::21/64

Cumulus Linux command:
cumulus@switch:~$ net add interface swp3 ospf6 priority 120
vtysh commands:
switch(config)# int swp3
switch(config-if)# ip ospf6 priority 120

Cumulus Linux command:
cumulus@switch:~$ net add ospf6 timers throttle spf 40 50 60
vtysh commands:
switch(config)# router ospf6
switch(config-ospf6)# timer throttle spf 40 50 60
flooding. Flooding is done hop-by-hop. OSPF ensures reliability by using link state acknowledgement
packets. The set of LSAs in a router’s memory is termed the link-state database (LSDB), a representation of the
network graph. Thus, OSPF ensures a consistent view of the LSDB on each node in the network in a distributed
fashion (eventual consistency model); this is key to the protocol’s correctness.
Contents
This chapter covers ...
Scalability and Areas (see page 618)
Configuring OSPFv2 (see page 619)
Enabling the OSPF and Zebra Daemons (see page 619)
Configuring OSPF (see page 619)
Defining (Custom) OSPF Parameters on the Interfaces (see page 621)
OSPF SPF Timer Defaults (see page 621)
Configure MD5 Authentication for OSPF Neighbors (see page 621)
Scaling Tips (see page 622)
Summarization (see page 622)
Stub Areas (see page 623)
Running Multiple ospfd Instances (see page 624)
Unnumbered Interfaces (see page 628)
Applying a Route Map for Route Updates (see page 629)
ECMP (see page 629)
Topology Changes and OSPF Reconvergence (see page 630)
Example Configurations (see page 630)
Debugging OSPF (see page 630)
Related Information (see page 631)
Here are some points to note about areas and OSPF behavior:
Routers that have links to multiple areas are called area border routers (ABR). For example, routers
R3, R4, R5, R6 are ABRs in the diagram. An ABR performs a set of specialized tasks, such as SPF
computation per area and summarization of routes across areas.
Most of the LSAs have an area-level flooding scope. These include router LSA, network LSA, and
summary LSA.
In the diagram, we reused the same non-zero area address. This is fine since the area address is
only a scoping parameter provided to all routers within that area. It has no meaning outside the
area. Thus, in the cases where ABRs do not connect to multiple non-zero areas, the same area
address can be used, thus reducing the operational headache of coming up with area addresses.
Configuring OSPFv2
Configuring OSPF involves the following tasks:
Enabling the OSPF daemon
Enabling OSPF
Defining (Custom) OSPF parameters on the interfaces
Configuring OSPF
As discussed in Introduction to Routing Protocols (see page 597), there are three steps to the configuration:
There are two ways to achieve (2) and (3) in FRRouting OSPF:
1. The network statement under router ospf does both. The statement is specified with an IP
subnet prefix and an area address. All the interfaces on the router whose IP address matches the
network subnet are put into the specified area. The OSPF process then starts bringing up peering
adjacencies on those interfaces. It also advertises the interface IP addresses, formatted into LSAs
(of various types), to the neighbors for proper reachability.
The subnets can be as coarse as needed to cover the largest number of interfaces on the router that
should run OSPF.
There may be interfaces where it is undesirable to bring up an OSPF adjacency. For example, in a data
center topology, the host-facing interfaces need not run OSPF; however, the corresponding IP
addresses should still be advertised to neighbors. This can be achieved using the
passive-interface construct:
Or use the passive-interface default command to make all interfaces passive, then
selectively remove certain interfaces from the passive set to bring up protocol adjacency:
2. Explicitly enable OSPF for each interface by configuring it under the interface configuration mode:
If OSPF adjacency bringup is not desired, you should configure the corresponding interfaces as
passive as explained above.
This model of configuration is required for unnumbered interfaces as discussed later in this guide.
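The matching rule of the network statement described in item 1 can be sketched as follows. This is an illustration of the subnet match only; the function `interfaces_in_area`, the interface addresses and the choice to stop at the first matching statement are assumptions made for this example.

```python
import ipaddress


def interfaces_in_area(interfaces, network_statements):
    """Assign each interface to the area of the first matching network statement."""
    assignment = {}
    for name, address in interfaces.items():
        ip = ipaddress.ip_interface(address).ip
        for prefix, area in network_statements:
            if ip in ipaddress.ip_network(prefix):
                assignment[name] = area
                break  # this sketch stops at the first match
    return assignment


# Hypothetical interface addresses.
interfaces = {
    "swp1": "10.0.0.1/30",
    "swp2": "10.0.1.1/30",
    "eth0": "192.168.0.15/24",
}
# One coarse statement covering both fabric links, in the form (prefix, area).
statements = [("10.0.0.0/16", "0.0.0.0")]
print(interfaces_in_area(interfaces, statements))
```

In this example a single coarse prefix places swp1 and swp2 in area 0.0.0.0, while eth0 matches no statement and is left out of OSPF.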
For achieving step (3) alone, the FRRouting configuration provides another method: redistribution. For
example:
Redistribution, however, unnecessarily loads the database with type-5 LSAs and should be limited to
generating real external prefixes (for example, prefixes learned from BGP). In general, it is a good
practice to generate local prefixes using network and/or passive-interface statements.
In the example command above, KEYID represents the key used to create the message digest. It is a value
between 1 and 255 and must be consistent across all routers on a link.
KEY represents the actual message digest key, and is associated with the given KEYID. This value has a
maximum length of 16 characters; longer strings get truncated.
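The two constraints above (the KEYID range and the 16-character limit on KEY) can be expressed as a small validation step. The function `normalize_md5_key` is a hypothetical helper written for this illustration; it is not part of Cumulus Linux or FRRouting.

```python
MAX_KEY_LENGTH = 16  # longer keys are truncated, as described above


def normalize_md5_key(key_id, key):
    """Validate the key ID and return the key as it would be used after truncation."""
    if not 1 <= key_id <= 255:
        raise ValueError("KEYID must be between 1 and 255")
    return key_id, key[:MAX_KEY_LENGTH]


print(normalize_md5_key(1, "thisisthekey"))
print(normalize_md5_key(7, "a-key-longer-than-sixteen-characters"))
```

Checking the length up front is useful because two routers configured with different long keys that share the same first 16 characters would end up using the same effective key.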
Existing MD5 authentication hashes can be removed with the net del interface
<interface> ospf message-digest-key <1-255> md5 <text> command.
Scaling Tips
Here are some tips for how to scale out OSPF.
Summarization
By default, an ABR creates a summary (type-3) LSA for each route in an area and advertises it in adjacent
areas. Prefix range configuration optimizes this behavior by creating and advertising one summary LSA for
multiple routes.
To configure a range:
Summarize in the direction to the backbone. The backbone receives summarized routes and
injects them to other areas already summarized.
Summarization can cause non-optimal forwarding of packets during failures. Here is an example
scenario:
As shown in the diagram, the ABRs in the right non-zero area summarize the host prefixes as 10.1.0.0/16.
The metric for the summary route is the maximum of the metrics of the intra-area routes that the summary
route covers. When the link between R5 and R10 fails, the path from R5 to 10.1.2.0/24 becomes
R5-R9-R6-R10, so the metric for 10.1.2.0/24 rises at R5 and R5 advertises a worse metric for the summary
route. As a result, other backbone routers shift traffic destined to 10.1.0.0/16 towards R6. This breaks
ECMP and under-utilizes network capacity for traffic destined to 10.1.1.0/24.
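The effect of the maximum rule on the summary metric can be worked through with a few numbers. The metric values and the function `summary_metric` below are assumptions for illustration; only the prefixes and the maximum rule come from the scenario above.

```python
import ipaddress


def summary_metric(summary, intra_area_routes):
    """Return the largest metric among the routes covered by the summary prefix."""
    summary_net = ipaddress.ip_network(summary)
    covered = [
        metric
        for prefix, metric in intra_area_routes.items()
        if ipaddress.ip_network(prefix).subnet_of(summary_net)
    ]
    return max(covered) if covered else None


# Hypothetical metrics seen at R5 before and after the R5-R10 link fails.
before = {"10.1.1.0/24": 20, "10.1.2.0/24": 20}
after = {"10.1.1.0/24": 20, "10.1.2.0/24": 40}

print(summary_metric("10.1.0.0/16", before))
print(summary_metric("10.1.0.0/16", after))
```

Although the metric for 10.1.1.0/24 is unchanged at R5, the summary metric rises with the worst covered route, which is why traffic for 10.1.1.0/24 also shifts away from R5.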
Stub Areas
Nodes in an area receive and store intra-area routing information and summarized information about
other areas from the ABRs. In particular, a good summarization practice about inter-area routes through
prefix range configuration helps scale the routers and keeps the network stable.
Then there are external routes. External routes are the routes redistributed into OSPF from another
protocol. They have an AS-wide flooding scope. In many cases, external link states make up a large
percentage of the LSDB.
Stub areas alleviate this scaling problem. A stub area is an area that does not receive external route
advertisements.
To configure a stub area:
Stub areas still receive information about networks that belong to other areas of the same OSPF domain.
If summarization is not configured (or is not comprehensive), this information can be
overwhelming for the nodes. Totally stubby areas address this issue. Routers in totally stubby areas keep
in their LSDB only the information about routing within their area, plus the default route.
To configure a totally stubby area:
Type                  Behavior
Normal non-zero area  LSA types 1, 2, 3, 4 area-scoped, type 5 externals, inter-area routes summarized
Totally stubby area   LSA types 1, 2 area-scoped, default summary, no type 3, 4, 5 LSA types allowed
1. Edit /etc/frr/daemons and add ospfd_instances="instance1 instance2 ..." to the ospfd line,
specifying an instance ID for each separate instance. For example, the following configuration has
OSPF enabled with 2 ospfd instances, 11 and 22:
hostname zebra
log file /var/log/frr/zebra.log
username cumulus nopassword
service integrated-vtysh-config
interface eth0
ipv6 nd suppress-ra
link-detect
interface lo
link-detect
interface swp1
ip ospf 11 area 0.0.0.0
link-detect
interface swp2
ip ospf 22 area 0.0.0.0
link-detect
interface swp45
link-detect
interface swp46
link-detect
interface swp47
link-detect
interface swp48
link-detect
interface swp49
link-detect
interface swp50
link-detect
interface swp51
link-detect
interface swp52
link-detect
interface vagrant
link-detect
router ospf 11
ospf router-id 1.1.1.1
router ospf 22
ospf router-id 1.1.1.1
ip forwarding
ipv6 forwarding
line vty
end
Caveats
You can use the redistribute ospf option in your frr.conf file to route between the
instances. Specify the instance ID for the other OSPF instance. For example:
...
!
router ospf 11
ospf router-id 1.1.1.1
!
router ospf 22
ospf router-id 1.1.1.1
redistribute ospf 11
!
...
If you disabled the integrated (see page 607) FRRouting configuration, you must create a separate
ospfd configuration file for each instance. The ospfd.conf file must include the instance ID in
the file name. Continuing with our example, you would create /etc/frr/ospfd-11.conf and
/etc/frr/ospfd-22.conf.
interface swp48
link-detect
!
interface swp49
link-detect
!
interface swp50
link-detect
!
interface swp51
link-detect
!
interface swp52
link-detect
!
interface vagrant
link-detect
!
router ospf 11
ospf router-id 1.1.1.1
!
router ospf 22
ospf router-id 1.1.1.1
!
ip forwarding
ipv6 forwarding
!
line vty
!
Unnumbered Interfaces
Unnumbered interfaces are interfaces without unique IP addresses. In OSPFv2, configuring unnumbered
interfaces reduces the links between routers into pure topological elements, which dramatically simplifies
network configuration and reconfiguration. In addition, the routing database contains only the real
networks, so the memory footprint is reduced and SPF is faster.
Unless the Ethernet media is intended to be used as a LAN with multiple connected routers, we
recommend configuring the interface as point-to-point. This has the additional advantage of a
simplified adjacency state machine; there is no need for DR/BDR election and LSA reflection. See
RFC 5309 for a more detailed discussion.
To configure an unnumbered interface, take the IP address of another interface (called the anchor) and use
that as the IP address of the unnumbered interface:
auto lo
iface lo inet loopback
    address 192.0.2.1/32

auto swp1
iface swp1
    address 192.0.2.1/32

auto swp2
iface swp2
    address 192.0.2.1/32
ECMP
During SPF computation for an area, if OSPF finds multiple paths with equal cost (metric), all those paths
are used for forwarding. For example, in the reference topology diagram, R8 uses both R3 and R4 as next
hops to reach a destination attached to R9.
For maintenance events, operators typically raise the OSPF administrative weight of the link(s) to ensure
that all traffic is diverted from the link or the node (referred to as costing out). The speed of reconvergence
does not matter. Changing the OSPF cost does cause LSAs to be reissued, but the links remain in service
during the SPF computation process of all routers in the network.
For failure events, traffic may be lost during reconvergence; that is, until SPF on all nodes computes an
alternative path around the failed link or node to each of the destinations. The reconvergence time depends
on layer 1 failure detection capabilities and, in the worst case, on the OSPF DeadInterval timer.
Example Configurations
Example configuration for event 1, using vtysh:
Debugging OSPF
OperState lists all the commands to view the operational state of OSPF.
The three most important states while troubleshooting the protocol are:
1. Neighbors, with net show ospf neighbor. This is the starting point to debug neighbor states
(also see tcpdump below).
2. Database, with net show ospf database. This is the starting point to verify that the LSDB is, in
fact, synchronized across all routers in the network. For example, sweeping through the output of
show ip ospf database router taken from all routers in an area shows whether the topology
graph building process is complete; that is, whether every node has seen all the other nodes in the area.
3. Routes, with net show route ospf. This is the outcome of SPF computation that gets
downloaded to the forwarding table, and is the starting point to debug, for example, why an OSPF
route is not being forwarded correctly.
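The database check in item 2 amounts to comparing, for every router in an area, the set of router LSAs it holds against the full set of routers. The sketch below assumes the router IDs have already been collected from each router's output into a dictionary; the data and the function `missing_router_lsas` are hypothetical.

```python
def missing_router_lsas(lsdb_by_router):
    """For each router, list the router IDs whose router LSA it has not seen."""
    all_routers = set(lsdb_by_router)
    report = {}
    for router, seen in lsdb_by_router.items():
        missing = all_routers - set(seen)
        if missing:
            report[router] = sorted(missing)
    return report


# Router ID -> router IDs found in that router's LSDB (hypothetical data).
lsdb = {
    "1.1.1.1": {"1.1.1.1", "2.2.2.2", "3.3.3.3"},
    "2.2.2.2": {"1.1.1.1", "2.2.2.2", "3.3.3.3"},
    "3.3.3.3": {"1.1.1.1", "3.3.3.3"},
}
print(missing_router_lsas(lsdb))
```

An empty result means every node has seen all the other nodes in the area; any entry points to the router where troubleshooting should start.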
Related Information
Bidirectional forwarding detection (see page 669) (BFD) and OSPF
en.wikipedia.org/wiki/Open_Shortest_Path_First
frrouting.org/user-guide/OSPFv2.html#OSPFv2
Perlman, Radia (1999). Interconnections: Bridges, Routers, Switches, and Internetworking Protocols
(2 ed.). Addison-Wesley.
Moy, John T. OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley.
IETF has defined extensions to OSPFv3 to support multiple address families (that is, both IPv6 and
IPv4). frr (see page 600) does not support it yet.
Contents
This chapter covers ...
Configuring OSPFv3 (see page 632)
Unnumbered Interfaces (see page 632)
Debugging OSPF (see page 632)
Related Information (see page 633)
Configuring OSPFv3
Configuring OSPFv3 involves the following tasks:
1. Enabling the zebra and ospf6d daemons, as described in Configuring FRRouting (see page 606), then
starting the FRRouting service:
Unnumbered Interfaces
Unlike OSPFv2, OSPFv3 intrinsically supports unnumbered interfaces. Forwarding to the next hop router is
done entirely using IPv6 link-local addresses. Therefore, you are not required to configure any global IPv6
address on the interfaces between routers.
Debugging OSPF
See Debugging OSPF (see page 630) for OSPFv2 for the troubleshooting discussion. The equivalent
commands are:
Another helpful command is net show ospf6 spf tree. It dumps the node topology as computed by
SPF to help visualize the network view.
Related Information
Bidirectional forwarding detection (see page 669) (BFD) and OSPF
en.wikipedia.org/wiki/Open_Shortest_Path_First
frrouting.org/user-guide/OSPFv3.html#OSPFv3
Contents
This chapter covers ...
Autonomous System Number (ASN) (see page 634)
eBGP and iBGP (see page 635)
Route Reflectors (see page 635)
Configuring Clusters (see page 636)
ECMP with BGP (see page 637)
Maximum Paths (see page 637)
BGP for Both IPv4 and IPv6 (see page 637)
Configuring BGP (see page 637)
Using BGP Unnumbered Interfaces (see page 638)
BGP and Extended Next-hop Encoding (see page 639)
Configuring BGP Unnumbered Interfaces (see page 639)
Managing Unnumbered Interfaces (see page 640)
How traceroute Interacts with BGP Unnumbered Interfaces (see page 642)
Advanced: Understanding How Next-hop Fields Are Set (see page 642)
Limitations (see page 643)
BGP add-path (see page 644)
BGP add-path RX (see page 644)
BGP add-path TX (see page 645)
Fast Convergence Design Considerations (see page 647)
Specifying the Interface Name in the neighbor Command (see page 647)
Using Peer Groups to Simplify Configuration (see page 648)
Configuring BGP Dynamic Neighbors (see page 648)
Configuring BGP Peering Relationships across Switches (see page 649)
Configuring MD5-enabled BGP Neighbors (see page 651)
Manually Configuring an MD5-enabled BGP Neighbor (see page 652)
Configuring eBGP Multihop (see page 654)
Configuring BGP TTL Security (see page 655)
Configuration Tips (see page 657)
BGP Advertisement Best Practices (see page 657)
Utilizing Multiple Routing Tables and Forwarding (see page 658)
Using BGP Community Lists (see page 658)
Additional Default Settings (see page 659)
Configuring BGP Neighbor maximum-prefixes (see page 659)
Troubleshooting BGP (see page 659)
Debugging Tip: Logging Neighbor State Changes (see page 662)
Troubleshooting Link-local Addresses (see page 662)
Enabling Read-only Mode (see page 665)
Applying a Route Map for Route Updates (see page 665)
Filtering Routes from BGP into Zebra (see page 665)
Filtering Routes from Zebra into the Linux Kernel (see page 666)
Protocol Tuning (see page 666)
Converging Quickly On Link Failures (see page 666)
Converging Quickly On Soft Failures (see page 666)
Reconnecting Quickly (see page 667)
Advertisement Interval (see page 667)
Caveats and Errata (see page 668)
ttl-security Issue (see page 668)
BGP Dynamic Capabilities not Supported (see page 668)
Related Information (see page 668)
Route Reflectors
Route reflectors are quite easy to understand in a Clos topology. In a two-tier Clos network, the leaf (or tier
1) switches are the only ones connected to end stations. Consequently, the spines
themselves do not have any routes to announce; they are merely reflecting the routes announced by one
leaf to the other leaves. Therefore, the spine switches function as route reflectors while the leaf switches
serve as route reflector clients.
In a three-tier network, the tier 2 nodes (or mid-tier spines) act as both route reflector servers and route
reflector clients. They act as route reflectors because they announce the routes learned from the tier 1
nodes to other tier 1 nodes and to tier 3 nodes. They also act as route reflector clients to the tier 3 nodes,
receiving routes learned from other tier 2 nodes. Tier 3 nodes act only as route reflectors.
In the following illustration, tier 2 node 2.1 is acting as a route reflector server, announcing the routes
between tier 1 nodes 1.1 and 1.2 to tier 1 node 1.3. It is also a route reflector client, learning the routes
between tier 2 nodes 2.2 and 2.3 from the tier 3 node, 3.1.
Configuring Clusters
A cluster consists of route reflectors (RRs) and their clients and is used in iBGP environments where
multiple sets of route reflectors and their clients are configured. Configuring a unique ID per cluster (on the
route reflector server and clients) prevents looping as a route reflector does not accept routes from
another that has the same cluster ID. Additionally, because all route reflectors in the cluster recognize
updates from peers in the same cluster, they do not install routes from a route reflector in the same
cluster; this reduces the number of updates that need to be stored in BGP routing tables.
To configure a cluster ID on a route reflector, run the net add bgp cluster-id
(<ipv4>|<1-4294967295>) command. You can enter the cluster ID as an IP address or as a 32-bit quantity.
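The two accepted notations describe the same 32-bit value. The sketch below converts between them using Python's standard `ipaddress` module; the function `cluster_id_forms` and the sample ID are illustrative assumptions.

```python
import ipaddress


def cluster_id_forms(value):
    """Return the (dotted, integer) forms of a cluster ID given in either form."""
    text = str(value)
    # A string of digits is treated as the 32-bit integer form.
    addr = ipaddress.IPv4Address(int(text) if text.isdigit() else text)
    return str(addr), int(addr)


print(cluster_id_forms("10.0.0.1"))
print(cluster_id_forms(167772161))
```

Both calls describe the same cluster ID, so route reflectors configured with either notation belong to the same cluster.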
The following example configures a cluster ID on a route reflector in IP address format:
Maximum Paths
In Cumulus Linux, the BGP maximum-paths setting is enabled by default, so multiple routes are already
installed. The default setting is 64 paths.
Configuring BGP
A basic BGP configuration looks like the following. However, the rest of this chapter discusses how to
configure various other features, from unnumbered interfaces to route maps.
1. Enable the BGP and Zebra daemons, zebra and bgpd, then enable the FRRouting service and start
it, as described in Configuring FRRouting (see page 606).
2. Identify the BGP node by assigning an ASN and router-id:
3.
Specifying the IP address of the peer allows BGP to set up a TCP socket with this peer, but BGP does not
distribute any prefixes to the peer unless it is explicitly told to do so with the activate command:
As you can see, activate has to be specified for each address family that is being announced by
the BGP session.
4. Specify some properties of the BGP session:
The peer is specified as a client on node switchRR, the route reflector.
BGP unnumbered interfaces are particularly useful in deployments where IPv4 prefixes are advertised
through BGP over a section without any IPv4 address configuration on links. As a result, the routing entries
are also IPv4 for destination lookup and have IPv6 next-hops for forwarding purposes.
It is assumed that the IPv6 implementation on the peering device will use the MAC address as the
interface ID when assigning the IPv6 link-local address, as suggested by RFC 4291.
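The MAC-based derivation that this assumption refers to can be sketched in a few lines of Python. This is an illustration of the modified EUI-64 procedure from RFC 4291, not code from Cumulus Linux; the function name is hypothetical and the MAC address is only a sample value:

```python
import ipaddress


def link_local_from_mac(mac: str) -> str:
    """Derive the fe80::/64 link-local address from a 48-bit MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # invert the universal/local bit
    # Insert ff:fe between the two halves of the MAC to form the interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return str(ipaddress.IPv6Address(prefix + interface_id))


print(link_local_from_mac("44:38:39:00:00:5c"))  # fe80::4638:39ff:fe00:5c
```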
Notice above, for an unnumbered configuration, you can use a single command to configure a neighbor
and attach it to a peer group (see page 648) (making sure to substitute for the interface and peer group
below):
The following commands show how the IPv4 link-local address 169.254.0.1 is used to install the route and
static neighbor entry to facilitate proper forwarding without having to install an IPv4 prefix with IPv6 next-
hop in the kernel:
You can use this iproute2 command to display more neighbor information:
cumulus@switch:~$ ip neighbor
192.168.0.254 dev eth0 lladdr 44:38:39:00:00:5f REACHABLE
169.254.0.1 dev swp52 lladdr 44:38:39:00:00:2b PERMANENT
169.254.0.1 dev swp51 lladdr 44:38:39:00:00:5c PERMANENT
If this address were a link-local IPv6 address, it would get reset so that the link-local IPv6
address of the eBGP peer is not passed along to an iBGP peer, which is most likely
on a different link.
route-map and/or the peer configuration can change the above behavior. For example, route-
map can set the global IPv6 next-hop or the peer configuration can set it to self — which is relevant
for iBGP peers. The route map or peer configuration can also set the next-hop to unchanged, which
ensures the source IPv6 global next-hop is passed around — which is relevant for eBGP peers.
Whenever two next-hops are being sent, the link-local next-hop (the second value of the two) is the
link-local IPv6 address on the peering interface unless it is due to nh-local-unchanged or
route-map has set the link-local next-hop.
Network administrators cannot set martian values for IPv6 next-hops in route-map. Also, global
and link-local next-hops are validated to ensure they match the respective address types.
In a received update, a martian check is imposed for the IPv6 global next-hop. If the check fails, it
gets treated as an implicit withdraw.
If two next-hops are received in an update and the second next-hop is not a link-local address, it
gets ignored and the update is treated as if only one next-hop was received.
Whenever two next-hops are received in an update, the second next-hop is used to install the route
into zebra. As per the previous point, it is already assured that this is a link-local IPv6 address.
Currently, this is assumed to be reachable and is not registered with NHT.
When route-map specifies the next-hop as peer-address, the global IPv6 next-hop as well as the
link-local IPv6 next-hop (if it's being sent) is set to the peering address. If the peering is on a link-local
address, the former could be the link-local address on the peering interface, unless there is a global
IPv6 address present on this interface.
The above rules imply that there are scenarios where a generated update has two IPv6 next-hops, and both
of them are the IPv6 link-local address of the peering interface on the local system. If you are peering with a
switch or router that is not running Cumulus Linux and expects the first next-hop to be a global IPv6
address, a route map can be used on the sender to specify a global IPv6 address. This conforms with the
recommendations in the Internet draft draft-kato-bgp-ipv6-link-local-00.txt, "BGP4+ Peering Using IPv6 Link-
local Address".
Limitations
Interface-based peering with separate IPv4 and IPv6 sessions is not supported.
ENHE is sent for IPv6 link-local peerings only.
If an IPv4 /30 or /31 IP address is assigned to the interface, IPv4 peering is used over IPv6 link-local
peering.
If the default router lifetime in the generated IPv6 router advertisements (RA) is set to 0, the receiving
FRRouting instance drops the RA if it is on a Cumulus Linux 2.5.z switch. To work around this issue,
either:
Explicitly configure the switch to advertise a router lifetime of 0, unless a value is specifically
set by the operator — with the assumption that the host is running Cumulus Linux 3.y.z
version of FRRouting. When hosts see an IPv6 RA with a router lifetime of 0, they do not
make that router a default router.
Use the sysctl on the host — net.ipv6.conf.all.accept_ra_defrtr. However, this
requires applying this setting on all hosts, which might mean many hosts, especially if
FRRouting is run on the hosts.
BGP add-path
BGP add-path RX
BGP add-path RX allows BGP to receive multiple paths for the same prefix. A path identifier is used so that
additional paths do not override previously advertised paths. No additional configuration is required for
BGP add-path RX.
To view the existing capabilities, run net show bgp neighbor. The existing capabilities are listed in the
subsection Add Path, below Neighbor capabilities:
...
The example output above shows that additional BGP paths can be sent and received (TX and RX are
advertised). It also shows that the BGP neighbor, fe80::4638:39ff:fe00:5c, supports both.
To view the current additional paths, run net show bgp <network>. The example output shows an
additional path that has been added by the TX node for receiving. Each path has a unique AddPath ID.
BGP add-path TX
AddPath TX allows BGP to advertise more than just the bestpath for a prefix. Consider the following
topology:
        r8
        |
        |
r1 ----    ---- r6
r2 ---- r7 ---- r5
       |  |
       |  |
      r3  r4
In this topology:
r1 and r2 are in AS 100
r3 and r4 are in AS 300
r5 and r6 are in AS 500
r7 is in AS 700
r8 is in AS 800
r7 learns 1.1.1.1/32 from r1, r2, r3, r4, r5, and r6. Among these, r7 picks the path from r1 as the
bestpath for 1.1.1.1/32.
The example below configures the r7 session to advertise the bestpath learned from each AS. In this case,
this means a path from AS 100, a path from AS 300, and a path from AS 500. The net show bgp 1.1.1.1/32
output from r7 includes "bestpath-from-AS 100" so you can see what the bestpath is from each AS:
700 300
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external
Community: 3:3
AddPath ID: RX 4, TX 3
Last update: Thu Jun 2 00:57:14 2016
700 500
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external, bestpath-from-AS
700, best
Community: 5:5
AddPath ID: RX 6, TX 2
Last update: Thu Jun 2 00:57:14 2016
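The per-AS selection described above can be modeled with a short Python sketch. It is a simplified illustration, not FRRouting code: the rank field is a made-up stand-in for the full BGP best-path comparison, and the data structure and function name are hypothetical.

```python
# Hypothetical view of the paths r7 learned for 1.1.1.1/32. "rank" is an
# invented preference value (lower is better) that stands in for the real
# BGP best-path algorithm, which compares many attributes.
LEARNED_PATHS = [
    {"peer": "r1", "asn": 100, "rank": 1},
    {"peer": "r2", "asn": 100, "rank": 2},
    {"peer": "r3", "asn": 300, "rank": 3},
    {"peer": "r4", "asn": 300, "rank": 4},
    {"peer": "r5", "asn": 500, "rank": 5},
    {"peer": "r6", "asn": 500, "rank": 6},
]


def bestpath_per_as(paths):
    """Keep only the most preferred path from each neighboring AS."""
    best = {}
    for path in paths:
        current = best.get(path["asn"])
        if current is None or path["rank"] < current["rank"]:
            best[path["asn"]] = path
    return best


for asn, path in sorted(bestpath_per_as(LEARNED_PATHS).items()):
    print(f"AS {asn}: advertise the path learned from {path['peer']}")
```

With these sample ranks, one path per AS survives (three in total), which is the set that would be advertised instead of the single overall bestpath.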
The example below shows the results if r7 is configured to advertise all paths to r8:
AddPath ID: RX 2, TX 4
Last update: Thu Jun 2 00:57:14 2016
700 300
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external
Community: 3:3
AddPath ID: RX 4, TX 3
Last update: Thu Jun 2 00:57:14 2016
700 500
10.7.8.1 from r7(10.7.8.1) (10.0.0.7)
Origin IGP, localpref 100, valid, external, bestpath-from-AS
700, best
Community: 5:5
AddPath ID: RX 6, TX 2
Last update: Thu Jun 2 00:57:14 2016
You create the above configuration with the following NCLU commands:
By default, Cumulus Linux sends IPv6 neighbor discovery router advertisements. Cumulus
Networks recommends you adjust the interval of the router advertisement to a shorter value
(net add interface <interface> ipv6 nd ra-interval <interval>) to address
scenarios when nodes come up and miss router advertisement processing to relay the neighbor’s
link-local address to BGP. The interval is measured in seconds and defaults to 10 seconds.
BGP peer-group restrictions have been replaced with update-groups, which dynamically examine
all peers and group them if they have the same outbound policy.
You can limit the number of dynamic peers by specifying that limit in the bgp listen limit command
(the default value is 100):
To connect to the same AS using the neighbor command, modify your configuration similar to the
following:
To connect to a different AS using the neighbor command, modify your configuration similar to the
following:
To connect to the same AS using the peer-group command, modify your configuration similar to
the following:
To connect to a different AS using the peer-group command, modify your configuration similar to
the following:
switch1
switch2
Cumulus Networks
3. Confirm the configuration has been implemented with the net show bgp summary command:
6. Confirm the configuration has been implemented with the net show bgp summary command:
swp3 4 0 0 0 0 0 0
never Idle
swp4 4 0 0 0 0 0 0
never Idle
Total number of neighbors 4
1. To establish a connection between two eBGP peers that are not directly connected:
2. Confirm the configuration with the net show bgp neighbor <ip> command:
Configuration Tips
In Cumulus Linux 3.0 and later, BGP and static routing (IPv4 and IPv6) are supported within a VRF
context. For more information, refer to Virtual Routing and Forwarding - VRF (see page 693).
When the neighbor receives the prefix, it examines the community value and takes action accordingly, such
as permitting or denying the community member in the routing policy.
Here is an example of a standard community list filter:
You can apply the community list to a route map to define the routing policy:
Troubleshooting BGP
The most common starting point for troubleshooting BGP is to view a summary of the connected neighbors
and some information about these connections. A sample output of this command is as follows:
You can determine whether the sessions above are iBGP or eBGP sessions by looking at the
ASNs.
A more detailed breakdown of a specific neighbor can be obtained using
net show bgp neighbor <neighbor>:
To see the details of a specific route such as from whom it was received, to whom it was sent, and so forth,
use the net show bgp <ip address/prefix> command:
This shows that the routing table prefix seen by BGP is 10.0.0.11/32, that this route was advertised to two
neighbors, and that it was not heard by any neighbors.
IPv6 router advertisements (RAs) are automatically enabled on an interface with IPv6 addresses, so
the step no ipv6 nd suppress-ra is no longer needed for BGP unnumbered.
Instead of the IPv6 address, the peering interface name is displayed in the show ip bgp summary
command and wherever else applicable:
Most of the show commands can take the interface name instead of the IP address, if that level of
specificity is needed:
Protocol Tuning
See Caveats and Errata below for information regarding ttl-security hops.
Here is an example:
The following display snippet shows that the default values have been modified for this neighbor:
Reconnecting Quickly
A BGP process attempts to connect to a peer after a failure (or on startup) every connect-time seconds.
By default, this is 10 seconds. To modify this value, use:
This command must be specified for each neighbor; peer-group does not support this option in FRRouting.
Advertisement Interval
BGP by default chooses stability over fast convergence. This is very useful when routing for the Internet. For
example, unlike link-state protocols, BGP typically waits for a duration of advertisement-interval
seconds between sending consecutive updates to a neighbor. This ensures that route flaps from an unstable
neighbor are not propagated throughout the network. By default, this is set to 0 seconds for both
eBGP and iBGP sessions, which allows for very fast convergence. You can modify this as follows:
Route Refresh: 0 0
Capability: 0 0
Total: 2383 2380
Minimum time between advertisement runs is 5 seconds
...
See this IETF draft for more details on the use of this value.
ttl-security Issue
Enabling ttl-security does not cause the hardware to be programmed with the relevant information.
This means that frames will come up to the CPU and be dropped there. It is recommended that you use the
net add acl command to explicitly add the relevant entry to hardware.
For example, you can configure a file, like /etc/cumulus/acl/policy.d/01control_plane_bgp.rules,
with a rule like this for TTL:
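INGRESS_INTF = swp1
INGRESS_CHAIN = INPUT, FORWARD
[iptables]
# Police BGP packets that arrive on the ingress interface with a TTL of 255
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bgp -m ttl --ttl 255 -j POLICE --set-mode pkt --set-rate 2000 --set-burst 1000
# Drop all other BGP packets that arrive on the ingress interface
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p tcp --dport bgp -j DROP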
For more information about ACLs, see Netfilter (ACLs) (see page 134).
Related Information
Bidirectional forwarding detection (see page 669) (BFD) and BGP
Wikipedia entry for BGP (includes list of useful RFCs)
frrouting.org/user-guide/BGP.html#BGP
IETF draft discussing BGP use within data centers
Contents
This chapter covers ...
Using BFD Multihop Routed Paths (see page 669)
BFD Parameters (see page 669)
Configuring BFD (see page 670)
BFD in BGP (see page 670)
BFD in OSPF (see page 671)
OSPF Show Commands (see page 672)
Scripts (see page 673)
Echo Function (see page 674)
About the Echo Packet (see page 674)
Transmitting and Receiving Echo Packets (see page 674)
Using Echo Function Parameters (see page 674)
Troubleshooting BFD (see page 675)
BFD Parameters
You can configure the following BFD parameters for both IPv4 and IPv6 sessions:
The required minimum interval between the received BFD control packets.
The minimum interval for transmitting BFD control packets.
The detection time multiplier.
Configuring BFD
You configure BFD in one of two ways: by specifying the configuration in the PTM topology.dot file (see
page 301), or using FRRouting (see page 600). However, the topology file has some limitations:
The topology.dot file supports creating BFD IPv4 and IPv6 single hop sessions only; you cannot
specify IPv4 or IPv6 multihop sessions in the topology file.
The topology file supports BFD sessions for only link-local IPv6 peers; BFD sessions for global IPv6
peers discovered on the link will not be created.
You cannot specify BFD multihop sessions in the topology.dot file since you cannot specify the
source and destination IP address pairs in that file. Use FRRouting (see page 606) to configure
multihop sessions.
The FRRouting CLI can track IPv4 and IPv6 peer connectivity — both single hop and multihop, and both link-
local IPv6 peers and global IPv6 peers — using BFD sessions without needing the topology.dot file. Use
FRRouting to register multihop peers with PTM and BFD as well as for monitoring the connectivity to the
remote BGP multihop peer. FRRouting can dynamically register and unregister both IPv4 and IPv6 peers
with BFD when the BFD-enabled peer connectivity is established or de-established, respectively. Also, you
can configure BFD parameters for each BGP or OSPF peer using FRRouting.
The BFD parameter configured in the topology file is given higher precedence over the client-
configured BFD parameters for a BFD session that has been created by both topology file and
client (FRRouting).
BFD requires an IP address for any interface on which it is configured. The neighbor IP address
for a single hop BFD session must be in the ARP table before BFD can start sending control
packets.
BFD in BGP
For FRRouting when using BGP, neighbors are registered and de-registered with PTM (see page 301)
dynamically when you enable BFD in BGP using net add bgp neighbor <neighbor|IP|interface> bfd.
For example:
You configure BFD for a peer group in the same way as for individual neighbors.
These commands add the neighbor SPINE bfd line below the last address family configuration in the
/etc/frr/frr.conf file:
...
...
The configuration above applies the default BFD values: a detection multiplier of 3, a minimum RX interval
of 300ms, and a minimum TX interval of 300ms.
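As a rough illustration of what these defaults mean for failure detection, the sketch below computes the asynchronous-mode detection time defined in RFC 5880. It assumes that the value 3 is the detection time multiplier and that both peers use the same 300ms intervals; the function and parameter names are hypothetical.

```python
def detection_time_ms(remote_detect_mult: int,
                      local_required_min_rx_ms: int,
                      remote_desired_min_tx_ms: int) -> int:
    """Asynchronous-mode detection time, following RFC 5880.

    The remote peer's multiplier is applied to the agreed transmit interval,
    which is the greater of the local required minimum RX interval and the
    remote desired minimum TX interval.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Defaults from the text, assumed identical on both peers.
print(detection_time_ms(3, 300, 300))  # 900
```

Under these assumptions, a peer is declared down after roughly 900ms without a control packet.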
To see neighbor information in BGP, including BFD status, run net show bgp neighbor <interface>.
To change the BFD values to something other than the defaults, BFD parameters can be configured for
each BGP neighbor. For example:
BFD in OSPF
For FRRouting using OSPF, neighbors are registered and de-registered dynamically with PTM (see page 301)
when you enable or disable BFD in OSPF. A neighbor is registered with BFD when two-way adjacency is
established and deregistered when adjacency goes down if BFD is enabled on the interface. The BFD
configuration is per interface and any IPv4 and IPv6 neighbors discovered on that interface inherit the
configuration.
These commands create the following configuration snippet in the /etc/frr/frr.conf file:
interface swp1
Scripts
ptmd executes the script /etc/ptm.d/bfd-sess-down when a BFD session goes down and the script
/etc/ptm.d/bfd-sess-up when a BFD session comes up.
You should modify these default scripts as needed.
Echo Function
Cumulus Linux supports the echo function for IPv4 single hops only, and with the asynchronous operating
mode only (Cumulus Linux does not support demand mode).
You use the echo function primarily to test the forwarding path on a remote system. To enable the echo
function, set echoSupport to 1 in the topology file.
Once the echo packets are looped by the remote system, the BFD control packets can be sent at a much
lower rate. You configure this lower rate by setting the slowMinTx parameter in the topology file to a
non-zero value in milliseconds.
You can use more aggressive detection times for echo packets since the round-trip time is reduced
because they are accessing the forwarding path. You configure the detection interval by setting the
echoMinRx parameter in the topology file to a non-zero value in milliseconds; the minimum setting is 50
milliseconds. Once configured, BFD control packets advertise this required minimum echo Rx interval.
This indicates to the peer that the local system can loop back the echo packets. Echo packets are
transmitted if the peer supports receiving echo packets.
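[Figure: BFD echo packet format, with column headings 0, 1, 2 and 3; the fields include Version, Length and My Discriminator.]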
Where:
Version is the version of the BFD echo packet.
Length is the length of the BFD echo packet.
My Discriminator is a non-zero value that uniquely identifies a BFD session on the transmitting
side. When the originating node receives the packet after being looped back by the receiving
system, this value uniquely identifies the BFD session.
674 02 March 2018
echoSupport: Enables and disables echo mode. Set to 1 to enable the echo function. It defaults to
0 (disable).
echoMinRx: The minimum interval between echo packets the local system is capable of receiving.
This is advertised in the BFD control packet. When the echo function is enabled, it defaults to 50. If
you disable the echo function, this parameter is automatically set to 0, which indicates the port or
the node cannot process or receive echo packets.
slowMinTx: The minimum interval between transmitting BFD control packets when the echo
packets are being exchanged.
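The interaction between echoSupport and echoMinRx described above can be summarized in a small Python sketch. It only models the defaults and limits stated in this section; it is not ptmd code, and the function and constant names are hypothetical.

```python
from typing import Optional

ECHO_MIN_RX_FLOOR_MS = 50  # minimum (and default) echoMinRx when echo is enabled


def effective_echo_min_rx(echo_support: int,
                          echo_min_rx: Optional[int] = None) -> int:
    """Return the echoMinRx value (in milliseconds) that would be advertised."""
    if echo_support == 0:
        # Echo disabled: the parameter is forced to 0, which signals that
        # the port or node cannot process or receive echo packets.
        return 0
    if echo_min_rx is None:
        return ECHO_MIN_RX_FLOOR_MS
    if echo_min_rx < ECHO_MIN_RX_FLOOR_MS:
        raise ValueError("echoMinRx must be at least 50 milliseconds")
    return echo_min_rx


print(effective_echo_min_rx(1))       # 50
print(effective_echo_min_rx(1, 100))  # 100
print(effective_echo_min_rx(0, 100))  # 0
```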
Troubleshooting BFD
You can use the following commands to view information about active BFD sessions.
To return information on active BFD sessions, use the net show bfd sessions command:
----------------------------------------------------------
port peer state local type diag
----------------------------------------------------------
swp1 11.0.0.2 Up N/A singlehop N/A
N/A 12.12.12.1 Up 12.12.12.4 multihop N/A
To return more detailed information on active BFD sessions, use the net show bfd sessions detail
command (results are for an IPv6-connected peer):
--------------------------------------------------------------------------------------
port  peer                 state  local  type       diag  det   tx_timeout  rx_timeout
                                                          mult
--------------------------------------------------------------------------------------
swp1  fe80::202:ff:fe00:1  Up     N/A    singlehop  N/A   3     300         900
swp1  3101:abc:bcad::2     Up     N/A    singlehop  N/A   3     300         900
#continuation of output
---------------------------------------------------------------------
echo        echo        max      rx_ctrl  tx_ctrl  rx_echo  tx_echo
tx_timeout  rx_timeout  hop_cnt
---------------------------------------------------------------------
0           0           N/A      187172   185986   0        0
0           0           N/A      501      533      0        0
Contents
This chapter covers ...
Understanding Equal Cost Routing (see page 676)
Understanding ECMP Hashing (see page 676)
Using cl-ecmpcalc to Determine the Hash Result (see page 677)
cl-ecmpcalc Limitations (see page 678)
ECMP Hash Buckets (see page 678)
Resilient Hashing (see page 680)
Resilient Hash Buckets (see page 681)
Removing Next Hops (see page 681)
Adding Next Hops (see page 683)
Configuring Resilient Hashing (see page 683)
As of Cumulus Linux 3.0, the BGP maximum-paths setting is enabled, so multiple routes are
installed by default. See the ECMP section (see page 637) of the BGP chapter for more
information.
To prevent out of order packets, ECMP hashing is done on a per-flow basis, which means that all packets
with the same source and destination IP addresses and the same source and destination ports always hash
to the same next hop. ECMP hashing does not keep a record of flow states.
ECMP hashing does not keep a record of packets that have hashed to each next hop and does not
guarantee that traffic sent to each next hop is equal.
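The per-flow behavior can be illustrated with a minimal Python sketch. It is a conceptual model only: CRC32 is used here as a stand-in hash, whereas the real hash is computed by the switch ASIC, and the function name and sample values are hypothetical.

```python
import zlib


def pick_next_hop(src_ip, dst_ip, src_port, dst_port, next_hops):
    """Map a flow to a next hop using a stateless hash of its header fields."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    # No per-flow state is stored: the same fields always produce the same index.
    return next_hops[zlib.crc32(key) % len(next_hops)]


next_hops = ["swp51", "swp52"]
first = pick_next_hop("10.1.0.101", "10.2.0.5", 40000, 179, next_hops)
second = pick_next_hop("10.1.0.101", "10.2.0.5", 40000, 179, next_hops)
print(first == second)  # True
```

Because the choice depends only on the packet headers, a few heavy flows can land on the same next hop, which is why equal distribution is not guaranteed.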
cl-ecmpcalc Limitations
cl-ecmpcalc can only take input interfaces that can be converted to a single physical port in the port tab
file, like the physical switch ports (swp). Virtual interfaces like bridges, bonds, and subinterfaces are not
supported.
cl-ecmpcalc is supported only on switches with the Spectrum, Tomahawk, Trident II+ and Trident II
chipsets.
A new next hop is added and a new hash bucket is created. As a result, the hash and hash bucket
assignment changed, causing the existing flows to be sent to different next hops.
A next hop fails and the next hop and hash bucket are removed. The remaining next hops may be
reassigned.
In most cases, the modification of hash buckets has no impact on traffic flows as traffic is being forwarded to a
single end host. In deployments where multiple end hosts are using the same IP address (anycast), resilient
hashing must be used.
Resilient Hashing
In Cumulus Linux, when a next hop fails or is removed from an ECMP pool, the hashing or hash bucket
assignment can change. For deployments where there is a need for flows to always use the same next hop,
like TCP anycast deployments, this can create session failures.
The ECMP hash performed with resilient hashing is exactly the same as the default hashing mode. Only the
method in which next hops are assigned to hash buckets differs.
Resilient hashing supports both IPv4 and IPv6 routes.
Resilient hashing is not enabled by default. See below for steps on configuring it.
Resilient hashing prevents disruptions when next hops are removed. It does not prevent
disruption when next hops are added.
Resilient hashing is supported only on switches with the Broadcom Tomahawk, Trident II+ and
Trident II as well as Mellanox Spectrum chipsets. You can run net show system to determine
the chipset.
With 12 buckets assigned and four next hops, instead of reducing the number of buckets — which would
impact flows to known good hosts — the remaining next hops replace the failed next hop.
After the failed next hop is removed, the remaining next hops are installed as replacements. This prevents
impact to any flows that hash to working next hops.
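The replacement behavior can be sketched in Python as follows. This is a conceptual model of the 12-bucket, four-next-hop example, not the hardware implementation; the round-robin choice of replacement is an assumption made for illustration, and the names are hypothetical.

```python
def remove_next_hop(buckets, failed):
    """Replace buckets that point at the failed next hop; leave the rest untouched."""
    remaining = []
    for hop in buckets:
        if hop != failed and hop not in remaining:
            remaining.append(hop)
    if not remaining:
        raise ValueError("no next hops left")
    result = []
    replaced = 0
    for hop in buckets:
        if hop == failed:
            # Assumption: replacements are handed out round-robin.
            result.append(remaining[replaced % len(remaining)])
            replaced += 1
        else:
            result.append(hop)
    return result


buckets = ["A", "B", "C", "D"] * 3  # 12 buckets, four next hops
after = remove_next_hop(buckets, "C")
unchanged = all(new == old for old, new in zip(buckets, after) if old != "C")
print(len(after), unchanged)  # 12 True
```

The bucket count stays at 12, and every bucket that pointed at a working next hop keeps its assignment, so only flows that used the failed next hop move.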
As a result, some flows may hash to new next hops, which can impact anycast deployments.
An ECMP route counts as a single route with multiple next hops. The following example is
considered to be a single ECMP route:
All ECMP routes must use the same number of buckets (the number of buckets cannot be configured per
ECMP route).
The number of buckets can be configured as 64, 128, 256, 512 or 1024; the default is 128:
Number of Hash Buckets    Number of Supported ECMP Routes
64                        1024
128                       512
256                       256
512                       128
1024                      64
A larger number of ECMP buckets reduces the impact of adding new next hops to an ECMP route.
However, the system supports fewer ECMP routes. If the maximum number of ECMP routes have been
installed, new ECMP routes log an error and are not installed.
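The trade-off in the table above can be checked with a few lines of Python. The observation that every row multiplies out to the same total is derived from the listed values only; it is not a documented hardware limit, and the names below are hypothetical.

```python
# Buckets per ECMP route -> number of supported ECMP routes (values from the table).
BUCKETS_TO_ROUTES = {64: 1024, 128: 512, 256: 256, 512: 128, 1024: 64}


def total_buckets(buckets_per_route: int) -> int:
    """Total hash buckets consumed when the maximum number of ECMP routes is installed."""
    return buckets_per_route * BUCKETS_TO_ROUTES[buckets_per_route]


for buckets in sorted(BUCKETS_TO_ROUTES):
    print(buckets, BUCKETS_TO_ROUTES[buckets], total_buckets(buckets))
```

Each row yields 65536, so doubling the buckets per route halves the number of ECMP routes that fit.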
To enable resilient hashing, edit /etc/cumulus/datapath/traffic.conf:
Redistribute Neighbor
Redistribute neighbor provides a mechanism for IP subnets to span racks without forcing the end hosts to
run a routing protocol.
The fundamental premise behind redistribute neighbor is to announce individual host /32 routes in the
routed fabric. Other hosts on the fabric can then use this new path to access the hosts in the fabric. If
multiple equal-cost paths (ECMP) are available, traffic can load balance across the available paths natively.
The challenge is to accurately compile and update this list of reachable hosts or neighbors. Luckily, existing
commonly-deployed protocols are available to solve this problem. Hosts use ARP to resolve MAC addresses
when sending to an IPv4 address. A host then builds an ARP cache table of known MAC address:IPv4
tuples as it receives or responds to ARP requests.
In the case of a leaf switch, where the default gateway is deployed for hosts within the rack, the ARP cache
table contains a list of all hosts that have ARP'd for their default gateway. In many scenarios, this table
contains all the layer 3 information that's needed. This is where redistribute neighbor comes in, as it is a
mechanism of formatting and syncing this table into the routing protocol.
Contents
This chapter covers ...
Availability (see page 685)
Target Use Cases and Best Practices (see page 685)
How It Works (see page 686)
Configuration Steps (see page 686)
Configuring the Leaf(s) (see page 686)
Configuring the Host(s) (see page 689)
Known Limitations (see page 690)
TCAM Route Scale (see page 690)
Possible Uneven Traffic Distribution (see page 690)
Silent Hosts Never Receive Traffic (see page 690)
Support for IPv4 Only (see page 690)
VRFs Are not Supported (see page 690)
Only 1024 Interfaces Supported (see page 690)
Troubleshooting (see page 690)
Verification (see page 693)
Availability
Redistribute neighbor is distributed as python-rdnbrd.
How It Works
Redistribute neighbor works as follows:
1. The leaf/ToR switches learn about connected hosts when the host sends an ARP request or ARP
reply.
2. An entry for the host is added to the kernel neighbor table of each leaf switch.
3. The redistribute neighbor daemon, rdnbrd, monitors the kernel neighbor table and creates a /32
route for each neighbor entry. This /32 route is created in kernel table 10.
4. FRRouting is configured to import routes from kernel table 10.
5. A route-map is used to control which routes from table 10 are imported.
6. In FRRouting these routes are imported as table routes.
7. BGP, OSPF and so forth are then configured to redistribute the table 10 routes.
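Steps 2 and 3 above can be modeled with a short Python sketch. It is not the rdnbrd implementation: it simply parses lines in the iproute2 neighbor format and emits a description of one /32 route per resolved IPv4 entry. The function name, the sample addresses and the rule of skipping entries without a link-layer address are all assumptions made for illustration.

```python
def host_routes(neighbor_lines, table=10):
    """Build one /32 route description per resolved IPv4 neighbor entry."""
    routes = []
    for line in neighbor_lines:
        fields = line.split()
        # Expected layout: <ip> dev <interface> lladdr <mac> <state>
        if len(fields) < 5 or fields[1] != "dev" or "lladdr" not in fields:
            continue  # unresolved or malformed entry
        ip_address, interface = fields[0], fields[2]
        if ":" in ip_address:
            continue  # skip IPv6 entries
        routes.append(f"{ip_address}/32 dev {interface} table {table}")
    return routes


sample = [
    "10.1.0.101 dev swp1 lladdr 44:38:39:00:00:03 REACHABLE",
    "10.1.0.102 dev swp2 lladdr 44:38:39:00:00:05 REACHABLE",
    "10.1.0.103 dev swp2 FAILED",
]
for route in host_routes(sample):
    print(route)
```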
Configuration Steps
The following configuration steps are based on the reference topology set forth by Cumulus Networks.
Here is a diagram of the topology:
1. Configure the host facing ports, using the same IP address on both host-facing interfaces as well as a
/32 prefix. In this case, swp1 and swp2 are configured as they are the ports facing server01 and
server02:
auto lo
iface lo inet loopback
address 10.0.0.11/32
auto swp1
iface swp1
address 10.0.0.11/32
auto swp2
iface swp2
address 10.0.0.11/32
4. Configure routing:
a. Define a route-map that matches on the host-facing interfaces:
c. Redistribute the imported table routes into the appropriate routing protocol.
BGP:
OSPF:
auto lo:1
iface lo:1
address 10.1.0.101/32
auto eth1
iface eth1
address 10.1.0.101/32
post-up for i in {1..3}; do arping -q -c 1 -w 0 -i eth1 10.0.0.11; sleep 1; done
post-up ip route add 0.0.0.0/0 nexthop via 10.0.0.11 dev eth1 onlink nexthop via 10.0.0.12 dev eth2 onlink || true
auto eth2
iface eth2
address 10.1.0.101/32
post-up for i in {1..3}; do arping -q -c 1 -w 0 -i eth2 10.0.0.12; sleep 1; done
post-up ip route add 0.0.0.0/0 nexthop via 10.0.0.11 dev eth1 onlink nexthop via 10.0.0.12 dev eth2 onlink || true
Installing ifplugd
Additionally, install and use ifplugd (see page 389). ifplugd modifies the behavior of the Linux routing
table when an interface undergoes a link transition (carrier up/down). The Linux kernel by default leaves
routes up even when the physical interface is unavailable (NO-CARRIER).
After you install ifplugd, edit /etc/default/ifplugd as follows, where eth1 and eth2 are the interface
names that your host uses to connect to the leaves.
Known Limitations
Troubleshooting
How do I determine if rdnbrd (the redistribute neighbor daemon) is running?
Use systemd to check:
# If a host does not send an ARP reply for holdtime consider the host down
holdtime = 3
unicast_arp_requests = True
cumulus@leaf01:~$ sudo systemctl restart rdnbrd.service
Read more information on Linux route tables, or you can read the Ubuntu man pages for ip route.
How do I determine that the /32 redistribute neighbor routes are being advertised to my
neighbor?
For BGP, check the advertised routes to the neighbor.
Verification
The following workflow can be used to verify that the kernel routing table is being correctly populated, and
that routes are being correctly imported/advertised:
1. Verify that ARP neighbor entries are being populated into kernel routing table 10.
Both the > and * should be present so that table 10 routes are installed as preferred into the routing
table. If the routes are not being installed, verify the following:
The imported distance of the locally imported kernel routes using the ip import 10
distance X command, where X is not less than the administrative distance of the routing
protocol. If the distance is too low, routes learned from the protocol may overwrite the locally
imported routes.
The routes are in the kernel routing table.
3. Confirm that routes are in the BGP/OSPF database and being advertised.
The primary use cases for VRF in a data center are similar to VLANs at layer 2: using common physical
infrastructure to carry multiple isolated traffic streams for multi-tenant environments, where these streams
are allowed to cross over only at configured boundary points, typically firewalls or IDS. You can also use it to
burst traffic from private clouds to enterprise networks where the burst point is at layer 3. Or you can use it
in an OpenStack deployment.
VRF is fully supported in the Linux kernel, so it has the following characteristics:
The VRF is presented as a layer 3 master network device with its own associated routing table.
The layer 3 interfaces (VLAN interfaces, bonds, switch virtual interfaces/SVIs) associated with the VRF
are enslaved to that VRF; IP rules direct FIB (forwarding information base) lookups to the routing
table for the VRF device.
The VRF device can have its own IP address, known as a VRF-local loopback.
Applications can use existing interfaces to operate in a VRF context by binding sockets to the VRF
device or passing the ifindex using cmsg. By default, applications on the switch run against the
default VRF. Services started by systemd run in the default VRF unless the VRF instance is used. If
management VRF (see page 717) is enabled, logins to the switch default to the management VRF,
so that users do not have to specify the management VRF for each command.
Listen sockets used by services are VRF-global by default unless the application is configured to use
a more limited scope — for example, read about services in the management VRF (see page 719).
Connected sockets (like TCP) are then bound to the VRF domain in which the connection originates.
The kernel provides a sysctl that allows a single instance to accept connections over all VRFs. For
TCP, connected sockets are bound to the VRF on which the first packet was received. This sysctl is
enabled for Cumulus Linux.
Connected and local routes are placed in appropriate VRF tables.
Neighbor entries continue to be per-interface, and you can view all entries associated with the VRF
device.
A VRF does not map to its own network namespace; however, you can nest VRFs in a network
namespace.
You can use existing Linux tools to interact with it, such as tcpdump.
Cumulus Linux supports up to 64 VRFs on a switch.
You configure VRF by associating each subset of interfaces to a VRF routing table, and configuring an
instance of the routing protocol — BGP or OSPFv2 — for each routing table.
Contents
This chapter covers ...
Configuring VRF
Specifying a Table ID
Configure Route Leaking
Bringing a VRF Up after Downing It with ifdown
Using the vrf Command
Services in VRFs
FRRouting Operation in a VRF
Example BGP and OSPF Configurations
Example Commands to Show VRF Data
Showing VRF Data Using NCLU Commands
Showing VRF Data Using FRRouting Commands
Showing VRF Data Using ip Commands
Using BGP Unnumbered Interfaces with VRF
Using DHCP with VRF
Caveats for DHCP with VRF
Example Configuration
Using ping or traceroute
Caveats and Errata
Configuring VRF
Each routing table is called a VRF table, and has its own table ID. You configure VRF using NCLU (see page
82), then place the layer 3 interface in the VRF. You can have a maximum of 64 VRFs on a switch.
When you configure a VRF, you follow a similar process to other network interfaces. Keep in mind the
following for a VRF table:
It can have an IP address, which acts as a loopback interface for the VRF.
Associated rules are added automatically.
You can also add a default route to avoid skipping across tables when the kernel forwards the
packet.
Names for VRF tables can be up to 15 characters. However, you cannot use the name mgmt, as this
name can only be used for management VRF (see page 717).
To configure a VRF, run:
These commands result in the following VRF configuration in the /etc/network/interfaces file:
auto red
iface red
vrf-table auto
auto swp1
iface swp1
vrf red
Specifying a Table ID
Instead of having Cumulus Linux assign a table ID for the VRF table, you can specify your own table ID in the
configuration. The table ID to name mapping is saved in /etc/iproute2/rt_tables.d/ for name-
based references. So instead of using the auto option above, specify the table ID like this:
If you do specify a table ID, it must be in the range 1001 to 1255, which is reserved in Cumulus
Linux for VRF table IDs.
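The naming and table ID constraints described above can be checked before a configuration is applied. The following Python sketch is only an illustration: the function names check_vrf and render_stanza are hypothetical, and the rendered stanza assumes that a numeric ID simply takes the place of auto on the vrf-table line.

```python
VRF_TABLE_ID_MIN = 1001   # lower bound of the range reserved for VRF table IDs
VRF_TABLE_ID_MAX = 1255   # upper bound of the range reserved for VRF table IDs
MAX_VRF_NAME_LEN = 15
RESERVED_NAME = "mgmt"    # only usable for the management VRF


def check_vrf(name, table="auto"):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not name or len(name) > MAX_VRF_NAME_LEN:
        problems.append("name must be 1 to 15 characters")
    if name == RESERVED_NAME:
        problems.append("mgmt is reserved for the management VRF")
    if table != "auto":
        in_range = isinstance(table, int) and VRF_TABLE_ID_MIN <= table <= VRF_TABLE_ID_MAX
        if not in_range:
            problems.append("table ID must be 'auto' or an integer from 1001 to 1255")
    return problems


def render_stanza(name, table="auto"):
    """Render an /etc/network/interfaces style stanza for a data plane VRF (assumed layout)."""
    problems = check_vrf(name, table)
    if problems:
        raise ValueError("; ".join(problems))
    return "auto {0}\niface {0}\n    vrf-table {1}\n".format(name, table)


print(render_stanza("red", 1016))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue with a planned VRF at once.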
If you are trying to leak a static route in a VRF, the operation is handled in software and is not
hardware accelerated, so there is a noticeable performance impact.
VRF Table
---------------- -----
red 1016
To return a list of processes and PIDs associated with a specific VRF table, run vrf task list <vrf-
name>. For example:
VRF: red
-----------------------
dhclient 2508
sshd 2659
bash 2681
su 2702
bash 2720
vrf 2829
To determine which VRF table is associated with a particular PID, run vrf task identify <pid>. For
example:
red
You should manage long-running services with systemd using the service@vrf notation; for example,
systemctl start ntp@mgmt. systemd-based services are stopped when a VRF is deleted and started
when the VRF is created; for example, when you restart networking or run an ifdown/ifup sequence.
Services in VRFs
For services that need to run against a specific VRF, Cumulus Linux uses systemd instances, where the
instance is the VRF. In general, you start a service within a VRF like this:
For example, you can run the NTP service in the blue VRF using:
In most cases, the instance running in the default VRF needs to be stopped before a VRF instance can start.
This is because the instance running in the default VRF owns the port across all VRFs; that is, it is VRF
global. As mentioned above, systemd-based services are stopped when the VRF is deleted and started when
the VRF is created; for example, when you restart networking or run an ifdown/ifup sequence. The
management VRF chapter (see page 719) details how to do this.
In Cumulus Linux, the following services work with VRF instances:
There are cases where systemd instances do not work; you must use a service-specific
configuration option instead. For example, you can configure rsyslogd to send messages to
remote systems over a VRF:
VRFs are provisioned using NCLU. VRFs can be pre-provisioned in FRRouting too, but they become active
only when configured with NCLU.
You pre-provision a VRF in FRRouting by running the command vrf vrf-name.
A BGP instance corresponding to a VRF can be pre-provisioned by configuring net add bgp vrf
<VRF> autonomous-system <ASN>. Under this context, all existing BGP parameters can be
configured: neighbors, peer-groups, address-family configuration, redistribution, and so forth.
An OSPFv2 instance can be configured using the net add ospf vrf <VRF> command; as with
BGP, all OSPFv2 parameters can be configured.
Static routes (IPv4 and IPv6) can be provisioned in a VRF by specifying the VRF along with the static
route configuration. For example, ip route prefix dev vrf vrf-name. The VRF has to exist
for this configuration to be accepted — either already defined through /etc/network
/interfaces or pre-provisioned in FRRouting. If you want to leak a static route in a VRF, see the
note above (see page 697).
However, to show BGP IPv6 routes in the VRF, you need to use vtysh, the FRRouting CLI:
VRF vrf1012:
O>* 6.0.0.1/32 [110/210] via 200.254.2.10, swp2s0.2, 00:13:30
* via 200.254.2.14, swp2s1.2, 00:13:30
* via 200.254.2.18, swp2s2.2, 00:13:30
O>* 6.0.0.2/32 [110/210] via 200.254.2.10, swp2s0.2, 00:13:30
* via 200.254.2.14, swp2s1.2, 00:13:30
* via 200.254.2.18, swp2s2.2, 00:13:30
O>* 9.9.12.5/32 [110/20] via 200.254.2.10, swp2s0.2, 00:13:29
* via 200.254.2.14, swp2s1.2, 00:13:29
* via 200.254.2.18, swp2s2.2, 00:13:29
Show VRFs configured in BGP, including the default. A non-zero ID indicates a VRF that has actually been
provisioned; that is, defined in /etc/network/interfaces:
inet6 fe80::202:ff:fe00:a/64
ND advertised reachable time is 0 milliseconds
ND advertised retransmit interval is 0 milliseconds
ND router advertisements are sent every 600 seconds
ND router advertisements lifetime tracks ra-interval
ND router advertisement default router preference is medium
Hosts use stateless autoconfig for addresses.
switch# exit
cumulus@switch:~$
switch# exit
cumulus@switch:~$
To see the routing table for each VRF, use the show ip route vrf all command. The OSPF route is
denoted in the row that starts with O:
To see a list of links associated with a particular VRF table, run ip link list <vrf-name>. For
example:
VRF: red
--------------------
swp1.10@swp1 UP 6c:64:1a:00:5a:0c <BROADCAST,MULTICAST,UP,LOWER_UP>
swp2.10@swp2 UP 6c:64:1a:00:5a:0d <BROADCAST,MULTICAST,UP,LOWER_UP>
To see a list of routes associated with a particular VRF table, run ip route list <vrf-name>. For
example:
VRF: red
--------------------
unreachable default metric 8192
10.1.1.0/24 via 10.10.1.2 dev swp2.10
10.1.2.0/24 via 10.99.1.2 dev swp1.10
broadcast 10.10.1.0 dev swp2.10 proto kernel scope link src 10.10.1.1
10.10.1.0/28 dev swp2.10 proto kernel scope link src 10.10.1.1
local 10.10.1.1 dev swp2.10 proto kernel scope host src 10.10.1.1
broadcast 10.10.1.15 dev swp2.10 proto kernel scope link src 10.10.1.1
broadcast 10.99.1.0 dev swp1.10 proto kernel scope link src 10.99.1.1
10.99.1.0/30 dev swp1.10 proto kernel scope link src 10.99.1.1
local 10.99.1.1 dev swp1.10 proto kernel scope host src 10.99.1.1
broadcast 10.99.1.3 dev swp1.10 proto kernel scope link src 10.99.1.1
You can also show routes in a VRF using ip [-6] route show vrf <name>. This command
omits local and broadcast routes, which can clutter the output.
1. Configure BGP unnumbered. The BGP unnumbered configuration is the same as for a non-VRF setup,
but is applied under the VRF context (router bgp asn vrf <vrf-name>).
auto swp1
iface swp1
link-autoneg on
link-speed 10000
vrf vrf1
auto bridge
iface bridge
bridge-ports vlan101
bridge-vids 101
bridge-vlan-aware yes
auto vlan101
iface vlan101
address 20.1.6.1/24
address 2001:20:1:6::1/80
vlan-id 101
vlan-raw-device bridge
auto vrf1
iface vrf1
address 6.1.0.6/32
address 2001:6:1::6/128
vrf-table auto
!
router bgp 65001 vrf vrf1
no bgp default ipv4-unicast
bgp bestpath as-path multipath-relax
To enable the service at boot time you should also run systemctl enable <service>@<vrf-name>.
To continue with the previous example:
In addition, you need to create a separate default file in /etc/default for every instance of a DHCP
server and/or relay in a non-default VRF; this is where you set the server and relay options. To run multiple
instances of any of these services, you need a separate file for each instance. The files must be named as
follows:
isc-dhcp-server-<vrf-name>
isc-dhcp-server6-<vrf-name>
isc-dhcp-relay-<vrf-name>
isc-dhcp-relay6-<vrf-name>
See the example configuration below for more details.
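To make the naming rule concrete, the following Python sketch lists the default file paths that the rule above implies for a given VRF. The helper name default_files is hypothetical; the base names and the /etc/default directory are taken from the list above.

```python
import posixpath

DEFAULTS_DIR = "/etc/default"
# Base names from the list above; the VRF name is appended after a hyphen.
DHCP_BASENAMES = (
    "isc-dhcp-server",
    "isc-dhcp-server6",
    "isc-dhcp-relay",
    "isc-dhcp-relay6",
)


def default_files(vrf_name):
    """Return the per-VRF default file paths for each DHCP server and relay variant."""
    return [
        posixpath.join(DEFAULTS_DIR, "{}-{}".format(base, vrf_name))
        for base in DHCP_BASENAMES
    ]


for path in default_files("red"):
    print(path)
```

Only the files for the services you actually run in that VRF need to exist; the sketch simply enumerates all four candidates.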
Example Configuration
In the following example, there is one IPv4 network with a VRF named red and one IPv6 network with a VRF
named blue.
The IPv4 DHCP server/relay network looks like this:
The IPv6 DHCP server/relay network looks like this:
You can create this configuration using the vrf command (see above (see page 698) for more details):
Or:
Management VRF
Management VRF is a subset of VRF (see page 693) (virtual routing tables and forwarding) and provides a
separation between the out-of-band management network and the in-band data plane network. For all
VRFs, the main routing table is the default table for all of the data plane switch ports. With management
VRF, a second table, mgmt, is used for routing through the Ethernet ports of the switch. The mgmt name is
special cased to identify the management VRF from a data plane VRF. FIB rules are installed for DNS servers
because this is the typical deployment case.
Cumulus Linux only supports eth0 as the management interface, or eth1, depending on the switch
platform. The Ethernet ports are software-only parts that are not hardware accelerated by switchd. VLAN
subinterfaces, bonds, bridges, and the front panel switch ports are not supported as management
interfaces.
When management VRF is enabled, logins to the switch are set into the management VRF context. IPv4 and
IPv6 networking applications (for example, Ansible, Chef, and apt-get) run by an administrator
communicate out the management network by default. This default context does not impact services run
through systemd and the systemctl command, and does not impact commands examining the state of
the switch, such as the ip command to list links, neighbors, or routes.
The management VRF configurations in this chapter contain a localhost loopback IP address
(127.0.0.1/8). Adding the loopback address to the L3 domain of the management VRF prevents
issues with applications that expect the loopback IP address to exist in the VRF, such as NTP.
Contents
This chapter covers ...
Enabling Management VRF
Bringing the Management VRF Up after Downing It with ifdown
Running Services within the Management VRF
Enabling Polling with snmpd in a Management VRF
Enabling hsflowd
Using ping or traceroute
The management VRF must be named mgmt to differentiate from a data plane VRF.
The NCLU commands above create the following snippets in the /etc/network/interfaces file:
...
auto eth0
iface eth0 inet dhcp
vrf mgmt
...
auto mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
...
When you commit the change to add the management VRF, all connections over eth0 are
dropped. This can impact any automation that might be running, such as Ansible or Puppet
scripts.
Running ifreload -a disconnects the session for any interface configured as auto.
Some applications can work across all VRFs. The kernel provides a sysctl that allows a single instance to
accept connections over all VRFs. For TCP, connected sockets are bound to the VRF on which the first
packet is received. This sysctl is enabled for Cumulus Linux.
To enable a service to run in the management VRF, do the following. These steps use the NTP service, but
you can use any of the services listed above, except for dhcrelay (discussed here (see page 264)) and
hsflowd (discussed below (see page 721)).
1. Configure the management VRF as described in the Enabling Management VRF section above (see
page 717).
2. If NTP is running, stop the service:
After you enable ntp@mgmt, you can verify that NTP peers are active:
Enabling hsflowd
If you are using sFlow to monitor traffic in the management VRF, you need to complete the following steps
to enable sFlow.
1. Add the hsflowd process to the systemd configuration file in /etc/vrf. Edit the /etc/vrf
/systemd.conf file with a text editor.
3. Disable hsflowd to ensure it does not start in the default VRF if the system is rebooted:
Or:
1. Copy the original service file to its new name and store the file in /etc/systemd/system.
2. If there is a User directive, comment it out. If it exists, you can find it under [Service].
[Unit]
Description=Example
Documentation=https://www.example.io/
[Service]
#User=username
ExecStart=/usr/local/bin/myservice agent -data-dir=/tmp/myservice -bind=192.168.0.11
[Install]
WantedBy=multi-user.target
3. Modify the ExecStart line to /usr/bin/vrf exec mgmt /sbin/runuser -u USER -- COMMAND.
For example, to have the cumulus user run the foocommand:
[Unit]
Description=Example
Documentation=https://www.example.io/
[Service]
#User=username
ExecStart=/usr/bin/vrf task exec mgmt /sbin/runuser -u cumulus -- foocommand
[Install]
WantedBy=multi-user.target
^O
^X
cumulus@switch:~$
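Steps 2 and 3 above amount to a small text transformation on the copied unit file. The following Python sketch shows one way that transformation could look; it is an illustration only, the function name wrap_for_vrf is hypothetical, and it uses the vrf task exec form that appears in the example unit file above.

```python
def wrap_for_vrf(unit_text, vrf, user):
    """Comment out User= and wrap ExecStart= so the command runs in the given VRF."""
    out = []
    in_service = False
    for line in unit_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [Service] section.
            in_service = stripped == "[Service]"
        if in_service and stripped.startswith("User="):
            line = "#" + stripped
        elif in_service and stripped.startswith("ExecStart="):
            command = stripped[len("ExecStart="):]
            line = "ExecStart=/usr/bin/vrf task exec {} /sbin/runuser -u {} -- {}".format(
                vrf, user, command
            )
        out.append(line)
    return "\n".join(out)


original = "\n".join([
    "[Unit]",
    "Description=Example",
    "[Service]",
    "User=cumulus",
    "ExecStart=foocommand",
    "[Install]",
    "WantedBy=multi-user.target",
])

print(wrap_for_vrf(original, "mgmt", "cumulus"))
```

The sketch only rewrites text; copying the file into /etc/systemd/system and reloading systemd remain manual steps.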
This also creates a route on the neighbor device to the management network through the data
plane, which might not be desired.
Cumulus Networks recommends you always use route maps to control the advertised networks
redistributed by the redistribute connected command. For example, you can specify a route map to
redistribute routes in this way (for both BGP and OSPF):
These commands produce the following configuration snippet in the /etc/frr/frr.conf file:
<routing protocol>
redistribute connected route-map REDISTRIBUTE-CONNECTED
To get the route for any VRF, run the following command:
The management VRF interface class is not supported if you are configuring Cumulus Linux using
NCLU (see page 82).
You configure the management interface in the /etc/network/interfaces file. In the example below,
the management interface, eth0 and the management VRF stanzas are added to the mgmt interface class:
auto lo
iface lo inet loopback
allow-mgmt eth0
iface eth0 inet dhcp
vrf mgmt
allow-mgmt mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto
When you run ifupdown2 commands against the interfaces in the mgmt class, include --allow=mgmt
with the commands. For example, to see which interfaces are in the mgmt interface class, run:
You can still bring the management interface up and down using ifup eth0 and ifdown eth0.
nameserver 192.0.2.1
nameserver 198.51.100.31 # vrf mgmt
nameserver 203.0.113.13 # vrf mgmt
Nameservers configured through DHCP are updated automatically. Statically configured nameservers
(configured in the /etc/resolv.conf file) only get updated when you run ifreload -a.
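The nameserver lines above tag some entries with a trailing # vrf mgmt comment. As a small illustration, the following Python sketch reads lines in that format and reports which VRF, if any, each nameserver is tagged with. The function name parse_nameservers is hypothetical, and the sketch assumes the comment always has the form # vrf <name> as in the lines above.

```python
def parse_nameservers(text):
    """Return (address, vrf) pairs; vrf is None when the line carries no VRF tag."""
    servers = []
    for line in text.splitlines():
        entry, _, comment = line.partition("#")
        fields = entry.split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue  # skip blank lines, comments, and other resolv.conf keywords
        words = comment.split()
        vrf = words[1] if len(words) >= 2 and words[0] == "vrf" else None
        servers.append((fields[1], vrf))
    return servers


sample = """nameserver 192.0.2.1
nameserver 198.51.100.31 # vrf mgmt
nameserver 203.0.113.13 # vrf mgmt
"""

for address, vrf in parse_nameservers(sample):
    print(address, vrf or "(no VRF tag)")
```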
Because DNS lookups are forced out of the management interface using FIB rules, this might
affect data plane ports if overlapping addresses are used. For example, when the DNS server IP
address is learned over the management VRF, a FIB rule is created for that IP address. When
DHCP relay is configured for the same IP address, a DHCP discover packet received on the front
panel port is forwarded out of the management interface (eth0) even though a route is present
out the front-panel port.
Management VRF has replaced the management namespace functionality in Cumulus Linux. The
management namespace feature (used with the cl-ns-mgmt utility) has been deprecated, and
the cl-ns-mgmt command has been removed.
Contents
This chapter covers ...
PIM Overview
PIM Messages
PIM Neighbors
PIM Sparse Mode (PIM-SM)
Any-source Multicast Routing
PIM Null-Register
PIM and ECMP
Configuring PIM
Configuring PIM Using FRRouting
Example Configurations
Source Specific Multicast Mode (SSM)
IP Multicast Boundaries
Multicast Source Discovery Protocol (MSDP)
Verifying PIM
Source Starts First
Receiver Joins First
PIM in a VRF
BFD for PIM Neighbors
PIM Overview
Network Element: Description
First Hop Router (FHR): The FHR is the router attached to the source. The FHR is responsible for the
PIM register process.
Last Hop Router (LHR): The LHR is the last router in the path, attached to an interested multicast
receiver. There is a single LHR for each network subnet with an interested receiver, however multicast
groups can have multiple LHRs throughout the network.
Rendezvous Point (RP): The RP allows for the discovery of multicast sources and multicast receivers.
The RP is responsible for sending PIM Register Stop messages to FHRs. The PIM RP address must be
globally routable.
Do not use a spine switch as an RP. If you're running BGP (see page 633) on a spine switch and it's
configured for allow-as in origin, BGP does not accept routes learned through other spines that did
not originate on the spine itself. The RP must route to a multicast source. During a single failure
scenario, this would not be possible if the RP were on the spine. This also applies to Multicast
Source Discovery Protocol (MSDP, see below (see page 727)).
PIM Shared Tree (RP Tree) or (*,G) Tree: The Shared Tree is the multicast tree rooted at the RP. When
receivers wish to join a multicast group, join messages are sent along the shared tree towards the RP.
PIM Shortest Path Tree (SPT) or (S,G) Tree: The SPT is the multicast tree rooted at the multicast
source for a given group. Each multicast source will have a unique SPT. The SPT may match the RP Tree,
but this is not a requirement. The SPT represents the most efficient way to send multicast traffic
from a source to the interested receivers.
Outgoing Interface (OIF): The outgoing interface indicates the interface a PIM or multicast packet
should be sent out on. OIFs are the interfaces towards the multicast receivers.
Incoming Interface (IIF): The incoming interface indicates the interface a multicast packet should be
received on. IIFs can be the interfaces towards the source or towards the RP.
Reverse Path Forwarding Interface (RPF Interface): Reverse path forwarding interface is the path used
to reach the RP or source. There must be a valid PIM neighbor to determine the RPF unless directly
connected to source.
Multicast Route (mroute): A multicast route indicates the multicast source and multicast group as
well as associated OIFs, IIFs, and RPF information.
Star-G mroute (*,G): The (*,G) mroute represents the RP Tree. The * is a wildcard indicating any
multicast source. The G is the multicast group. An example (*,G) would be (*, 239.1.2.9).
S-G mroute (S,G): This is the mroute representing the source entry. The S is the multicast source IP.
The G is the multicast group. An example (S,G) would be (10.1.1.1, 239.1.2.9).
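To make the mroute terminology concrete, the following Python sketch models (*,G) and (S,G) entries with their IIF and OIFs. It is a simplified illustration, not how FRRouting stores state: the class name Mroute, the select helper, and the interface names are hypothetical, and the rule that an exact (S,G) entry is preferred over the (*,G) entry for the same group is stated here as an assumption about typical PIM-SM forwarding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mroute:
    group: str                      # G, the multicast group
    source: Optional[str] = None    # S, or None for the * wildcard
    iif: Optional[str] = None       # incoming interface (towards the source or RP)
    oifs: List[str] = field(default_factory=list)  # outgoing interfaces (towards receivers)

    def key(self):
        return "({}, {})".format(self.source or "*", self.group)

    def matches(self, source, group):
        return self.group == group and self.source in (None, source)


def select(mroutes, source, group):
    """Pick the entry for a packet, preferring (S,G) over (*,G) (assumed behavior)."""
    candidates = [m for m in mroutes if m.matches(source, group)]
    candidates.sort(key=lambda m: m.source is None)  # (S,G) entries sort first
    return candidates[0] if candidates else None


# Interface names below are hypothetical.
star_g = Mroute(group="239.1.2.9", iif="swp51", oifs=["swp1"])
s_g = Mroute(group="239.1.2.9", source="10.1.1.1", iif="swp52", oifs=["swp1"])

print(star_g.key())
print(s_g.key())
print(select([star_g, s_g], "10.1.1.1", "239.1.2.9").key())
```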
PIM Messages
PIM Message: Description
PIM Hello: PIM hellos announce the presence of a multicast router on a segment. PIM hellos are sent
every 30 seconds by default.
PIM Join/Prune (J/P): PIM J/P messages indicate the groups that a multicast router would like to
receive or no longer receive. Often PIM join/prune messages are described as distinct message types,
but are actually a single PIM message with a list of groups to join and a second list of groups to
leave. PIM J/P messages can be to join or prune from the SPT or RP trees (also called (*,G) joins or
(S,G) joins).
PIM join/prune messages are sent to PIM neighbors on individual interfaces. Join/prune messages are
never unicast.
This PIM join/prune is for group 239.1.1.9, with 1 join and 0 prunes for the group. Join/prunes for
multiple groups can exist in a single packet.
PIM Register: PIM register messages are unicast packets sent from a FHR destined to the RP to
advertise a multicast group. The FHR fully encapsulates the original multicast packet in a PIM
register message. The RP is responsible for decapsulating the PIM register message and forwarding it
along the (*,G) tree towards the receivers.
PIM Null Register: PIM null register is a special type of PIM register message where the
"Null-Register" flag is set within the packet. Null register messages are used for an FHR to signal to
an RP that a source is still sending multicast traffic. Unlike normal PIM register messages, null
register messages do not encapsulate the original data packet.
PIM Register Stop: PIM register stop messages are sent by an RP to the FHR to indicate that PIM
register messages should no longer be sent.
IGMP Membership Report (IGMP Join): IGMP membership reports are sent by multicast receivers to tell
multicast routers of their interest in a specific multicast group. IGMP join messages trigger PIM *,G
joins. IGMP version 2 queries are sent to the all hosts multicast address, 224.0.0.1. IGMP version 2
reports (joins) are sent to the group's multicast address. IGMP version 3 messages are sent to an IGMP
v3 specific multicast address, 224.0.0.22.
IGMP Leave: IGMP leaves tell a multicast router that a multicast receiver no longer wants the
multicast group. IGMP leave messages trigger PIM *,G prunes.
PIM Neighbors
When PIM is configured on an interface, PIM Hello messages are sent to the link local multicast group
224.0.0.13. Any other router configured with PIM on the segment that hears the PIM Hello messages will
build a PIM neighbor with the sending device.
PIM neighbors are stateless. No confirmation of neighbor relationship is exchanged between PIM
endpoints.
This behavior is in contrast to PIM Dense Mode (PIM-DM), where traffic is flooded, and the
network must be periodically notified that the receiver wishes to stop receiving the multicast
stream.
PIM-SM has three configuration options: Any-source Multicast (ASM), Bi-directional Multicast (BiDir), and
Source Specific Multicast (SSM):
Any-source Multicast (ASM) is the traditional and most commonly deployed PIM implementation.
ASM relies on rendezvous points to connect multicast senders and receivers that then dynamically
determine the shortest path through the network between source and receiver, to efficiently send
multicast traffic.
Bidirectional PIM (BiDir) forwards all traffic through the multicast rendezvous point (RP), rather than
tracking multicast source IPs, allowing for greater scale, while resulting in inefficient forwarding of
network traffic.
Source Specific Multicast (SSM) requires multicast receivers to know exactly which source they wish
to receive multicast traffic from, rather than relying on multicast rendezvous points. SSM requires
the use of IGMPv3 on the multicast clients.
Cumulus Linux only supports ASM and SSM. PIM BiDir is not currently supported.
This creates a (*,G) mroute, with an OIF of the interface on which the IGMP Membership Report was
received and an IIF of the RPF interface for the RP.
The LHR generates a PIM (*,G) join message, and sends it from the interface towards the RP. Each multicast
router between the LHR and the RP will build a (*,G) mroute with the OIF being the interface on which the
PIM join message was received and an Incoming Interface of the reverse path forwarding interface for the
RP.
When the RP receives the (*,G) Join message, it will not send any additional PIM join messages.
The RP will maintain a (*,G) state as long as the receiver wishes to receive the multicast group.
Unlike multicast receivers, multicast sources do not send IGMP (or PIM) messages to the FHR. A
multicast source begins sending and the FHR will receive the traffic and build both a (*,G) and an
(S,G) mroute. The FHR will then begin the PIM register process.
The configured prefix-list can be viewed with the net show mroute command:
In the example above, 235.0.0.0 has been configured for SPT switchover, identified by pimreg.
PIM register messages are sourced from the interface that received the multicast traffic and are
destined to the RP address. The PIM register is not sourced from the interface towards the RP.
PIM Null-Register
In order to notify the RP that multicast traffic is still flowing when the RP has no receiver, or if the RP is not
on the SPT tree, the FHR periodically sends PIM null register messages. The FHR sends a PIM register with
the Null-Register flag set, but without any data. This special PIM register notifies the RP that a multicast
source is still sending, should any new receivers come online.
After receiving a PIM Null-Register, the RP immediately sends a PIM register stop to acknowledge the
reception of the PIM null register message.
The ip pim ecmp rebalance command recalculates all stream paths in the event of a loss of path over
one of the ECMP paths. Without this command, only the streams that were using the path that was lost are
moved to alternate ECMP paths. Rebalance does not affect existing groups.
The show ip pim nexthop command provides a way to review which nexthop is selected for a specific
source/group:
Configuring PIM
To configure PIM using NCLU:
PIM must be enabled on all interfaces facing multicast sources or multicast receivers, as
well as on the interface where the RP address is configured.
2. Optional: Run the following command to enable IGMP (either version 2 or 3) on the interfaces with
hosts attached. IGMP version 3 is the default, so you only need to specify the version if you want to
use IGMP version 2:
Unless you are using PIM SSM, each PIM-SM enabled device must configure a static RP to a
group mapping, and all PIM-SM enabled devices must have the same RP to group mapping
configuration.
IP PIM RP group ranges can overlap. Cumulus Linux performs a longest prefix match (LPM)
to determine the RP. For example:
In this example, if the group is 224.10.2.5, the RP that gets selected is 192.168.0.2. If the
group is 224.10.15, the RP that gets selected is 192.168.0.1.
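The longest prefix match described above can be sketched in Python with the standard ipaddress module. This is an illustration only: the two group ranges in RP_MAPPINGS are assumptions chosen so that the result agrees with the RP addresses mentioned above, and the function name select_rp is hypothetical.

```python
import ipaddress

# Assumed overlapping group ranges; the RP addresses are the ones named in the text.
RP_MAPPINGS = {
    "224.10.0.0/16": "192.168.0.1",
    "224.10.2.0/24": "192.168.0.2",
}


def select_rp(group, mappings=RP_MAPPINGS):
    """Return the RP whose group range matches with the longest prefix, or None."""
    addr = ipaddress.ip_address(group)
    best = None  # (network, rp) for the most specific match seen so far
    for prefix, rp in mappings.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, rp)
    return best[1] if best else None


print(select_rp("224.10.2.5"))   # matches both ranges; the /24 is more specific
print(select_rp("224.10.1.5"))   # matches only the /16
```

A group outside every configured range returns None, which corresponds to having no RP mapping for that group.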
zebra=yes
pimd=yes
4. In a terminal, run the vtysh command to start the FRRouting CLI on the switch.
PIM must be enabled on all interfaces facing multicast sources or multicast receivers, as
well as on the interface where the RP address is configured.
6. Optional: Run the following commands to enable IGMP (either version 2 or 3) on the interfaces with
hosts attached. IGMP version 3 is the default, so you only need to specify the version if you want to
use IGMP version 2:
Each PIM-SM enabled device must configure a static RP to a group mapping, and all PIM-
SM enabled devices must have the same RP to group mapping configuration.
IP PIM RP group ranges can overlap. Cumulus Linux performs a longest prefix match (LPM)
to determine the RP. For example:
Example Configurations
Complete Multicast Network Configuration Example
The following is an example configuration:
RP Configuration
interface swp2
description interface to LHR
ip ospf area 0.0.0.0
ip ospf network point-to-point
ip pim sm
!
router ospf
ospf router-id 192.168.0.1
!
line vty
!
end
FHR Configuration
LHR Configuration
PIM considers 232.0.0.0/8 the default range if the ssm range is not configured. If this default is
overridden with a prefix-list, all ranges that should be considered must be in the prefix-list.
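The effect of overriding the default SSM range can be illustrated with a short Python sketch. The function name is_ssm_group and the override prefix 238.0.0.0/8 are hypothetical; the default range is the one stated above.

```python
import ipaddress

DEFAULT_SSM_RANGE = ("232.0.0.0/8",)


def is_ssm_group(group, ssm_prefixes=None):
    """Check a group against the configured SSM prefixes, or the default range if none are given."""
    ranges = ssm_prefixes if ssm_prefixes else DEFAULT_SSM_RANGE
    addr = ipaddress.ip_address(group)
    return any(addr in ipaddress.ip_network(prefix) for prefix in ranges)


# With no override, only 232.0.0.0/8 counts as SSM.
print(is_ssm_group("232.1.1.1"))
print(is_ssm_group("239.1.1.1"))

# With an override (hypothetical prefix), the default range is no longer implied.
print(is_ssm_group("232.1.1.1", ["238.0.0.0/8"]))
```

The last call shows why an overriding prefix-list has to list every range that should be treated as SSM, including 232.0.0.0/8 if it is still wanted.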
IP Multicast Boundaries
Multicast boundaries enable network administrators to limit the distribution of multicast traffic by setting
boundaries with the goal of pushing multicast to a subset of the network.
With such boundaries in place, any incoming IGMP or PIM joins are dropped or accepted based upon the
prefix-list specified. The boundary is implemented by applying an IP multicast boundary OIL (outgoing
interface list) on an interface.
To configure the boundary, use NCLU:
Cumulus Linux MSDP support is primarily for anycast-RP configuration, rather than multiple
multicast domains. Each MSDP peer must be configured in a full mesh, as SA messages are not
received and re-forwarded.
The following steps demonstrate configuring a Cumulus switch to use MSDP:
1. Add an anycast IP address to the loopback interface for each RP in the domain:
2. On every multicast switch, configure the group to RP mapping using the anycast address:
3. Configure the MSDP mesh group for all active RPs (the following example uses 3 RPs):
The mesh group should include all RPs in the domain as members, with a unique address
as the source. This configuration results in MSDP peerings between all RPs.
4. Pick the local loopback address as the source of the MSDP control packets:
If the network is unnumbered and uses unnumbered BGP as the IGP, avoid using the anycast IP
address for establishing unicast or multicast peerings. For PIM-SM, ensure that the unique
address is used as the PIM hello source by setting the source:
Verifying PIM
The following outputs are based on the Cumulus Reference Topology with cldemo-pim.
Use the net show mroute command (or show ip mroute in FRR) to review detailed output for the FHR:
On the RP, no mroute state is created, but the net show pim upstream output includes the S,G:
As a receiver joins the group, the mroute output interface on the FHR transitions from "none" to the RPF
interface of the RP:
On the RP:
PIM in a VRF
VRFs (see page 693) divide the routing table on a per-tenant basis, ultimately providing for separate layer 3
networks over a single layer 3 infrastructure. With a VRF, each tenant has its own virtualized layer 3
network, so IP addresses can overlap between tenants.
PIM in a VRF enables PIM trees and multicast data traffic to run inside a layer 3 virtualized network, with a
separate tree per domain or tenant. Each VRF has its own multicast tree with its own RP(s), sources, and so
forth. Thus you can have one tenant per corporate division, client or product, for example.
VRFs on different switches typically connect or are peered over subinterfaces, where each subinterface is in
its own VRF, provided MP-BGP VPN is not enabled or supported.
To configure PIM in a VRF, run the following commands. First, add the VRFs and associate them with switch
ports:
Then add the PIM configuration to FRR, review and commit the changes:
These commands create the following configuration in the /etc/network/interfaces file and the
/etc/frr/frr.conf file:
auto purple
iface purple
vrf-table auto
auto blue
iface blue
vrf-table auto
auto swp1
iface swp1
vrf purple
auto swp49.1
iface swp49.1
vrf purple
auto swp2
iface swp2
vrf blue
auto swp49.2
iface swp49.2
vrf blue
...
vrf purple
ip pim rp 192.168.0.1 224.0.0.0/4
!
vrf blue
ip pim rp 192.168.0.1 224.0.0.0/4
int swp49.2
ip pim sm
Troubleshooting PIM
1. Validate that the FHR can reach the RP. If the RP and FHR cannot communicate, the registration
process fails:
2. On the RP, use tcpdump to see if the PIM register packets are arriving:
3. If PIM registration packets are being received, verify that they are seen by PIM by issuing debug
pim packets from within FRRouting:
4. Repeat the process on the FHR to see if PIM register stop messages are being received on the FHR
and passed to the PIM process:
To troubleshoot this issue, if both PIM and IGMP are enabled, ensure that IGMPv3 joins are being sent by
the receiver:
754 02 March 2018
Cumulus Networks
3. If PIM is configured, verify that the RPF interface for the source matches the interface the multicast
traffic is received on:
This is expected behavior. The active source can be seen on the RP with show ip pim upstream:
For Mellanox chipsets, please refer to TCAM Resource Profiles for Mellanox Switches (see page 593).
Monitoring and Troubleshooting
Contents
This chapter covers ...
Using the Serial Console (see page 757)
Configuring the Serial Console on ARM Switches (see page 757)
Configuring the Serial Console on x86 Switches (see page 758)
Getting General System Information (see page 759)
Diagnostics Using cl-support (see page 759)
Sending Log Files to a syslog Server (see page 760)
Using NCLU (see page 760)
Logging Technical Details (see page 760)
Local Logging (see page 761)
Enabling Remote syslog (see page 762)
Writing to syslog with Management VRF Enabled (see page 763)
Rate-limiting syslog Messages (see page 763)
Harmless syslog Error: Failed to reset devices.list (see page 764)
Syslog Troubleshooting Tips (see page 764)
Next Steps (see page 767)
You must reboot the switch for the baud rate change to take effect.
Incorrect configuration settings in grub can cause the switch to be inaccessible via the console.
Grub changes should be carefully reviewed before implementation.
1. Edit /etc/default/grub. The two relevant lines in /etc/default/grub are as follows; replace
the 115200 value with a valid value specified above in the --speed variable in the first line and in
the console variable in the second line:
2. After you save your changes to the grub configuration, type the following at the command prompt:
cumulus@switch:~$ update-grub
3. If you plan on accessing your switch's BIOS over the serial console, you need to update the baud rate
in the switch BIOS. For more information, see this knowledge base article.
4. Reboot the switch.
For general information about the switch, run net show system, which gathers information about the
switch from a number of files in the system:
Args:
[reason]: Optional reason to give for invoking cl-support.
Saved into tarball's cmdline.args file.
Options:
-h: Print this usage statement
-s: Security sensitive collection
-t: User filename tag
-v: Verbose
-e MODULES: Enable modules. Comma separated module list (run with -e
help for module names)
-d MODULES: Disable modules. Comma separated module list (run with -d
help for module names)
Using NCLU
The remote syslog server can be configured on the switch using the following configuration:
cumulus@switch:~$ net add syslog host ipv4 192.168.0.254 port udp 514
This creates a file called /etc/rsyslog.d/11-remotesyslog.conf in the rsyslog directory. The file
has the following content:
NCLU cannot configure a remote syslog if management VRF is enabled on the switch. To do so,
please refer to the section Writing to syslog with Management VRF Enabled (see page 763) below.
There are applications in Cumulus Linux that could write directly to a log file without going through
rsyslog. These files are typically located in /var/log/.
All Cumulus Linux rules are stored in separate files in /etc/rsyslog.d/, which are called at the
end of the GLOBAL DIRECTIVES section of /etc/rsyslog.conf. As a result, the RULES
section at the end of rsyslog.conf is ignored because the messages have to be processed by
the rules in /etc/rsyslog.d and then dropped by the last line in /etc/rsyslog.d/99-syslog.conf.
Local Logging
Most logs within Cumulus Linux are sent through rsyslog, which then writes them to files in the
/var/log directory. There are default rules in the /etc/rsyslog.d/ directory that define where the logs are
written:
Rule               Purpose
10-rules.conf      Sets defaults for log messages, including log format and log rate limits.
15-crit.conf       Logs crit, alert or emerg log messages to /var/log/crit.log to ensure they are not
                   rotated away rapidly.
20-clagd.conf      Logs clagd messages to /var/log/clagd.log for MLAG (see page 348).
22-linkstate.conf  Logs link state changes for all physical and logical network links to /var/log/linkstate
30-ptmd.conf       Logs ptmd messages to /var/log/ptmd.log for Prescription Topology Manager (see
                   page 301).
35-rdnbrd.conf     Logs rdnbrd messages to /var/log/rdnbrd.log for redistribute neighbor (see page
                   684).
40-netd.conf       Logs netd messages to /var/log/netd.log for NCLU (see page 82).
45-frr.conf        Logs routing protocol messages to /var/log/frr/frr.log. This includes BGP and OSPF log
                   messages.
99-syslog.conf     All remaining processes that use rsyslog are sent to /var/log/syslog.
Log files that are rotated are compressed into an archive. Processes that do not use rsyslog write to their
own log files within the /var/log directory. For more information on specific log files, see Troubleshooting
Log Files (see page 788).
1. Create a file in /etc/rsyslog.d/. Make sure its name starts with a number lower than 99, such as
20-clagd.conf or 25-switchd.conf, so that it executes before log messages are dropped.
Our example file is called /etc/rsyslog.d/11-remotesyslog.conf. Add content similar to the
following:
@192.168.1.2:514
This configuration sends log messages to a remote syslog server for the following processes:
clagd, switchd, ptmd, rdnbrd, netd and syslog. It follows the same syntax as the /var/log/syslog
file, where @ indicates UDP, 192.168.1.2 is the IP address of the syslog server, and 514 is
the UDP port.
The numbering of the files in /etc/rsyslog.d/ dictates the order in which the rules are
installed into rsyslog. If you want to remotely log only the messages in /var/log/syslog, and
not those in /var/log/clagd.log or /var/log/switchd.log, for instance, then name
the file 98-remotesyslog.conf, since it sorts ahead of only the /var/log/syslog file,
99-syslog.conf.
Do not use the imfile module with any file written by rsyslogd.
2. Restart rsyslog.
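The forwarding line used in step 1 (@192.168.1.2:514) can be broken into its parts with a short Python sketch. The helper name parse_forward_target is hypothetical. The sketch handles only the IPv4 host form shown above. It relies on two rsyslog conventions that this guide does not state: a double @@ prefix selects TCP, and 514 is the default port when none is given.

```python
def parse_forward_target(action):
    """Split an rsyslog forwarding action such as '@192.168.1.2:514'.

    Returns a (protocol, host, port) tuple.
    """
    if action.startswith("@@"):
        # rsyslog convention: '@@' forwards over TCP.
        protocol, rest = "tcp", action[2:]
    elif action.startswith("@"):
        # A single '@' forwards over UDP, as in the example above.
        protocol, rest = "udp", action[1:]
    else:
        raise ValueError("not a forwarding action: %r" % action)
    host, _, port = rest.partition(":")
    # Assumption: fall back to the standard syslog port when none is given.
    return protocol, host, int(port) if port else 514

print(parse_forward_target("@192.168.1.2:514"))
```

A check like this can be useful before restarting rsyslog, since a mistyped target otherwise only shows up as missing messages on the remote server.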
For each syslog server, configure a unique action line. For example, to configure two syslog servers at
192.168.0.254 and 10.0.0.1:
module(load="imuxsock"
SysSock.RateLimit.Interval="2" SysSock.RateLimit.Burst="50")
The following test script shows an example of rate-limit output in Cumulus Linux ...
DONE.
root@leaf1:mgmt-vrf:/home/cumulus# tail -n 60 /var/log/syslog
2017-02-22T19:59:50.043342+00:00 leaf1 syslog.py[22830]: Message Number:0
2017-02-22T19:59:50.043723+00:00 leaf1 syslog.py[22830]: Message Number:1
2017-02-22T19:59:50.043941+00:00 leaf1 syslog.py[22830]: Message Number:2
2017-02-22T19:59:50.044565+00:00 leaf1 syslog.py[22830]: Message Number:3
2017-02-22T19:59:50.044830+00:00 leaf1 syslog.py[22830]: Message Number:4
2017-02-22T19:59:50.045680+00:00 leaf1 syslog.py[22830]: Message Number:5
<...snip...>
2017-02-22T19:59:50.056727+00:00 leaf1 syslog.py[22830]: Message Number:45
2017-02-22T19:59:50.057599+00:00 leaf1 syslog.py[22830]: Message Number:46
2017-02-22T19:59:50.057741+00:00 leaf1 syslog.py[22830]: Message Number:47
2017-02-22T19:59:50.057936+00:00 leaf1 syslog.py[22830]: Message Number:48
2017-02-22T19:59:50.058125+00:00 leaf1 syslog.py[22830]: Message Number:49
2017-02-22T19:59:50.058324+00:00 leaf1 rsyslogd-2177: imuxsock[pid 22830]: begin to drop messages due to rate-limiting
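The output above is consistent with a limit of 50 messages per 2-second interval: messages 0 through 49 are logged and the rest are dropped. The following Python sketch is a simplified model of that behavior, not the rsyslog implementation. The class name IntervalRateLimiter and the timestamps are hypothetical. The real imuxsock module applies its limit per sending process, which this model ignores.

```python
class IntervalRateLimiter:
    """Allow at most `burst` messages in each `interval`-second window."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.count = 0

    def allow(self, now):
        # Start a new window on the first message or once the interval has elapsed.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.burst

# Same values as SysSock.RateLimit.Interval="2" and SysSock.RateLimit.Burst="50".
limiter = IntervalRateLimiter(interval=2, burst=50)

# Simulate 60 messages arriving one millisecond apart.
accepted = [n for n in range(60) if limiter.allow(now=0.001 * n)]
print(len(accepted), accepted[-1])  # 50 messages accepted; the last one is number 49
```

Once the interval has elapsed, a new window starts and messages are accepted again.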
This message is harmless, and can be ignored. It is logged when systemd attempts to change cgroup
attributes that are read only. The upstream version of systemd has been modified to not log this message
by default.
The systemctl daemon-reload command is often issued when Debian packages are installed, so the
message may be seen multiple times when upgrading packages.
After correcting the invalid syntax, issuing the sudo rsyslogd -N1 command produces the following
output.
Using tcpdump
If a syslog server is not accessible to validate that syslog messages are being exported, you can use
tcpdump.
In the following example, a syslog server has been configured at 192.168.0.254 for UDP syslogs on port
514:
A simple way to generate syslog messages is to use sudo in another session, such as sudo date. Using
sudo generates an authpriv log.
To see the contents of the syslog file, use the tcpdump -X option:
0x0030: 6c65 6166 3031 2073 7564 6f3a 2020 6375 leaf01.sudo:..cu
0x0040: 6d75 6c75 7320 3a20 5454 593d 7074 732f mulus.:.TTY=pts/
0x0050: 3120 3b20 5057 443d 2f68 6f6d 652f 6375 1.;.PWD=/home/cu
0x0060: 6d75 6c75 7320 3b20 5553 4552 3d72 6f6f mulus.;.USER=roo
0x0070: 7420 3b20 434f 4d4d 414e 443d 2f62 696e t.;.COMMAND=/bin
0x0080: 2f64 6174 65 /date
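The ASCII column in tcpdump -X output replaces spaces and other non-printable bytes with dots, so it can help to decode the hex column directly. The following Python sketch decodes the bytes shown above (offsets 0x0030 through 0x0080). The variable names are illustrative only.

```python
# Hex column copied from the tcpdump -X output above, without offsets or ASCII column.
hex_dump = """
6c65 6166 3031 2073 7564 6f3a 2020 6375
6d75 6c75 7320 3a20 5454 593d 7074 732f
3120 3b20 5057 443d 2f68 6f6d 652f 6375
6d75 6c75 7320 3b20 5553 4552 3d72 6f6f
7420 3b20 434f 4d4d 414e 443d 2f62 696e
2f64 6174 65
"""

# Remove all whitespace, then convert the hex digits to bytes and decode as ASCII.
payload = bytes.fromhex("".join(hex_dump.split()))
text = payload.decode("ascii")
print(text)
```

The decoded text is the sudo log entry for the date command, confirming that the authpriv message was exported.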
Next Steps
The links below discuss more specific monitoring topics.
+----------------------------------------------------------------------------+
|*Cumulus Linux GNU/Linux                                                    |
| Advanced options for Cumulus Linux GNU/Linux                               |
| ONIE                                                                       |
|                                                                            |
+----------------------------------------------------------------------------+
2. Use the ^ and v arrow keys to select Advanced options for Cumulus Linux GNU/Linux. A menu
similar to the following should appear:
+----------------------------------------------------------------------------+
| Cumulus Linux GNU/Linux, with Linux 4.1.0-cl-1-amd64                       |
|                                                                            |
+----------------------------------------------------------------------------+
6. Sync the /etc directory using btrfs, then reboot the system:
routes: 8092 <<<< if all routes are IPv6, or 16384 if all routes are IPv4
long mask routes 2048 <<<< these are routes with a mask longer than the route mask limit
route mask limit 64
host_routes: 8192
ecmp_nhs: 16346
ecmp_nhs_per_route: 52
This translates to about 314 routes with ECMP next hops, if every route has the maximum ECMP NHs.
You can monitor this in Cumulus Linux with the cl-resource-query command. Results vary between
switches running on different chipsets.
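The figure of about 314 routes follows from dividing the total number of ECMP next hops by the maximum number of next hops per route. A minimal Python check, using the ecmp_nhs and ecmp_nhs_per_route values listed above (the variable max_ecmp_routes is introduced here for illustration):

```python
# Values taken from the resource limits listed above.
ecmp_nhs = 16346            # total ECMP next hops available
ecmp_nhs_per_route = 52     # maximum ECMP next hops per route

# Integer division: a partially filled route does not count.
max_ecmp_routes = ecmp_nhs // ecmp_nhs_per_route
print(max_ecmp_routes)  # 314
```

Routes that use fewer than the maximum number of next hops consume less of this pool, so the limit is higher in practice for such routes.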
cl-resource-query results for a Mellanox Spectrum switch:
Contents
This chapter covers ...
Monitoring Hardware Using decode-syseeprom (see page 770)
Command Options (see page 771)
Related Commands (see page 771)
Monitoring Hardware Using sensors (see page 771)
Command Options (see page 772)
Monitoring Switch Hardware Using SNMP (see page 773)
Monitoring System Units Using smond (see page 773)
Command Options (see page 774)
Keeping the Switch Alive Using the Hardware Watchdog (see page 774)
Related Information (see page 775)
cumulus@switch:~$ decode-syseeprom
TlvInfo Header:
Id String: TlvInfo
Version: 1
Total Length: 114
TLV Name             Code Len Value
-------------------- ---- --- -----
Product Name         0x21   4 4804
Part Number          0x22  14 R0596-F0009-00
Device Version       0x26   1 2
Serial Number        0x23  19 D1012023918PE000012
Command Options
Usage: /usr/cumulus/bin/decode-syseeprom [-a][-r][-s [args]][-t]
Option     Description
-s         Sets the EEPROM content if the EEPROM is writable. args can be supplied on the command line in
           a comma-separated list of the form '<field>=<value>, ...'. ',' and '=' are illegal
           characters in field names and values. Fields that are not specified default to their current
           values. If args are supplied on the command line, they are written without confirmation. If
           args is empty, you are prompted for the values interactively.
-t TARGET  Selects the target EEPROM (board, psu2, psu1) for the read or write operation; default is
           board.
Related Commands
You can also use the dmidecode command to retrieve hardware configuration information that’s been
populated in the BIOS.
You can use apt-get to install the lshw program on the switch, which also retrieves hardware
configuration information.
cumulus@switch:~$ sensors
tmp75-i2c-6-48
Adapter: i2c-1-mux (chan_id 0)
temp1: +39.0 C (high = +75.0 C, hyst = +25.0 C)
tmp75-i2c-6-49
Adapter: i2c-1-mux (chan_id 0)
temp1: +35.5 C (high = +75.0 C, hyst = +25.0 C)
ltc4215-i2c-7-40
Adapter: i2c-1-mux (chan_id 1)
in1: +11.87 V
in2: +11.98 V
power1: 12.98 W
curr1: +1.09 A
max6651-i2c-8-48
Adapter: i2c-1-mux (chan_id 2)
fan1: 13320 RPM (div = 1)
fan2: 13560 RPM
Output from the sensors command varies depending upon the switch hardware you use, as
each platform ships with a different type and number of sensors.
Command Options
Usage: sensors [OPTION]... [CHIP]...
Option             Description
-c, --config-file  Specify a config file; use - after -c to read the config file from stdin; by default,
                   sensors references the configuration file in /etc/sensors.d/.
-s, --set          Executes set statements in the config file (root only); sensors -s is run once at boot
                   time and applies all the settings to the boot drivers.
If [CHIP] is not specified in the command, all chip info will be printed. Example chip names include:
lm78-i2c-0-2d *-i2c-0-2d
lm78-i2c-0-* *-i2c-0-*
lm78-i2c-*-2d *-i2c-*-2d
lm78-i2c-*-* *-i2c-*-*
lm78-isa-0290 *-isa-0290
lm78-isa-* *-isa-*
lm78-*
Some switch models lack the sensor for reading voltage information, so this data is not output
from the smonctl command.
For example, the Dell S4048 series has this sensor and displays power and voltage information:
Command Options
Usage: smonctl [OPTION]... [CHIP]...
Option Description
run_watchdog=1
To disable the watchdog, edit the /etc/watchdog.d/<your_platform> file and set run_watchdog to 0:
run_watchdog=0
You can modify the settings for the watchdog — like the timeout setting and scheduler priority — in its
configuration file, /etc/watchdog.conf.
Related Information
packages.debian.org/search?keywords=lshw
lm-sensors.org
Net-SNMP tutorials
Contents
This chapter covers ...
Network Port LEDs (see page 775)
Status LEDs (see page 776)
Locate a Switch (see page 777)
Beaconing provides a way for a network operator to identify a particular link. The
administrator can beacon that port from a remote location so the network operator has
visual indication for that port.
Fault can also be considered a form of beaconing, or vice versa. Both try to draw the attention of
the network operator to the port, so they are signaled the same way.
Blinking amber implies a blink rate of 33 ms. Slow blinking amber indicates a blink rate of
500 ms, with a 50% on/off duty cycle. In other words, a slow blinking amber LED is amber for
500 ms and then off for 500 ms.
Status LEDs
A set of status LEDs is typically located on one side of a network switch. The status LEDs provide a visual
indication of what is physically wrong with the network switch. Typical LEDs on the front panel are for PSUs
(power supply units), fans and system. Locator LEDs are also found on the front panel of a switch. For now,
the components that these LEDs represent are referred to simply as units.
Number of LEDs per unit — Each unit should have only 1 LED.
Location — All units should have their LEDs on the righthand side of the switch after the physical
ports.
Unit label — The label should be printed on the front panel directly above the LED.
Colors — The focus should be on giving a network operator a simple set of indications that provide
basic information about the unit. The following section has more information about the indications,
but colors are standardized on green and amber. These colors are universally found on all status
LEDs and should be easy to implement on future switches.
Defined LED — Every network switch must have LEDs for the following:
PSU
Fans
System LED
Locator LED
PSU LEDs — Each PSU must have its own LED. PSU faults are difficult to debug. If a network
operator knows which PSU is faulty, he or she can quickly check if it is powered up correctly and if
that fault persists, replace the PSU.
Fan LED — A network switch may have multiple fan trays (3 - 6). It is difficult to put an LED for each
fan tray on the front panel, given the limited real estate. Hence, the recommendation is one LED for
all fans.
System LED — A network switch must have a system LED that indicates the general state of a
switch. This state could be of hardware, software, or both. It is up to the individual switch NOS to
decide what this LED indicates. But the LED can have only the following indications:
Locator LED — The locator LED helps locate a particular switch in a data center full of switches.
Thus, it should have a different color and predefined location. It must be located at the top right
corner on the front panel of the switch and its color must be blue.
Locate a Switch
Cumulus Linux 3.3 and newer versions support the locator LED functionality for identifying a switch, by
blinking a single LED on a specified network port, on the following switches:
Celestica Seastone, Dell Z9100-ON, Edgecore AS7712-32X, Penguin Arctica 3200C, Quanta
QuantaMesh BMS T4048-IX2, Supermicro SSE-C3632S
To use the locator LED functionality, run:
In the example above, INTERFACE_NAME should be replaced with the name of the port, and TIME should
be replaced with the length of time, in seconds, that the port LED should blink.
This functionality is only supported on swp* ports, not eth* management interfaces.
Contents
This chapter covers ...
Sample VXLAN Statistics (see page 778)
Sample VLAN Statistics (see page 779)
For VLANs Using the non-VLAN-aware Bridge Driver (see page 779)
For VLANs Using the VLAN-aware Bridge Driver (see page 780)
Configuring the Counters in switchd (see page 780)
Configuring the Poll Interval (see page 781)
Configuring Internal VLAN Statistics (see page 781)
Clearing Statistics (see page 781)
Caveats and Errata (see page 781)
swp2s2.6
swp2s3.6
vxln16757104
stats.vxlan.member, which controls the statistics available for each local/access port in a VXLAN
bridge. Its value defaults to BRIEF.
If you change one of these settings on the fly, the new configuration applies only to those VNIs or
VLANs set up after the configuration changed; previously allocated counters remain as is.
#stats.vlan.show_internal_vlans = FALSE
Clearing Statistics
Since ethtool is not supported for virtual devices, you cannot clear the statistics cache maintained by the
kernel. You can clear the hardware statistics via switchd:
When checking the virtual counters for the bridge, the TX count is the number of packets destined
to the CPU before any hardware policers take effect. For example, if 500 broadcast packets are sent
into the bridge, the CPU is also sent 500 packets. These 500 packets are policed by the default ACLs
in Cumulus Linux, so the CPU might receive fewer than the 500 packets if the incoming packet rate
is too high. The TX counter for the bridge should be equal to 500*(number of ports in the bridge -
incoming port + CPU port) or just 500 * number of ports in the bridge.
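The TX counter arithmetic above can be written out as a small Python sketch. The function name expected_bridge_tx is hypothetical, and the sketch assumes that every port in the bridge floods the broadcast traffic, as in the example.

```python
def expected_bridge_tx(packets_in, ports_in_bridge):
    """Expected bridge TX count for broadcast packets received on one port.

    Each packet is flooded to every port except the incoming one,
    and one additional copy is destined to the CPU.
    """
    incoming_port = 1
    cpu_port = 1
    return packets_in * (ports_in_bridge - incoming_port + cpu_port)

# 500 broadcast packets sent into a bridge with 4 ports (hypothetical port count).
print(expected_bridge_tx(500, 4))  # 2000, the same as 500 * 4
```

The incoming port and the CPU port cancel out, which is why the result reduces to the packet count multiplied by the number of ports in the bridge.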
You cannot use ethtool -S for virtual devices. This is because the counters available via netdev
are sufficient to display the vlan/vxlan counters currently supported in the hardware (only rx/tx
packets/bytes are supported currently).
Buffer Monitoring
Monitoring packet buffers (see page 247) and their utilization is vital for proper traffic management on a
network. It is quite useful for:
Identifying microbursts that result in longer packet latency
Giving early warning signs of packet buffer congestion that could lead to packet drops
Quickly identifying a network problem with a particular switch, port or traffic class
You can use buffer utilization monitoring to quickly filter out non-problematic switches so you can focus on
the ones causing trouble on the network.
Monitoring involves a set of configurable triggers that, when triggered, can lead to any or all of the
following three actions:
Log actions, which involves writing to syslog
Snapshot actions, which involves writing to a file detailing the current state
Collect actions, where the switch can collect more information
The monitoring is managed by the asic-monitor service, which is in turn managed by systemd.
Contents
This chapter covers ...
Understanding Histograms (see page 782)
Configuring Buffer Monitoring (see page 783)
Restarting the asic-monitor Service (see page 785)
Understanding Triggers (see page 786)
Understanding Monitoring Actions (see page 786)
Caveats and Errata (see page 786)
Understanding Histograms
The Mellanox Spectrum ASIC provides a mechanism to measure and report egress queue lengths in
histograms. You can configure the ASIC to measure up to 64 egress queues. Each queue is reported
through a histogram with 10 bins, where each bin represents a range of queue lengths.
You configure the histogram with a minimum size boundary (Min) and a histogram size, using the
monitor.histogram_pg.histogram.minimum_bytes_boundary and monitor.histogram_pg.histogram.bin_size_bytes
settings, which are described in the table below.
You then derive the maximum size boundary (Max) by adding the Min and the histogram size.
The 10 bins are numbered 0 through 9. Bin 0 represents queue lengths up to the Min specified, including
queue length 0.
Bin 9 represents queue lengths of Max and above.
Bins 1 through 8 represent equal-sized ranges between the Min and Max. The size of each range is determined
by dividing the histogram size by 8.
For example, consider the following histogram queue length ranges, in bytes:
Min = 960
Histogram size = 12288
Max = 13248
Range size = 1536
Bin 0: 0:959
Bin 1: 960:2495
Bin 2: 2496:4031
Bin 3: 4032:5567
Bin 4: 5568:7103
Bin 5: 7104:8639
Bin 6: 8640:10175
Bin 7: 10176:11711
Bin 8: 11712:13247
Bin 9: 13248:*
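The bin boundaries in this example can be reproduced with a short Python sketch. It only models the arithmetic described above. The function names histogram_bins and bin_for are hypothetical and are not part of the asic-monitor service.

```python
def histogram_bins(min_boundary, histogram_size):
    """Return 10 (low, high) byte ranges; high is None for the open-ended last bin."""
    max_boundary = min_boundary + histogram_size
    range_size = histogram_size // 8          # bins 1 through 8 are equal-sized
    bins = [(0, min_boundary - 1)]            # bin 0: below the minimum boundary
    for i in range(8):
        low = min_boundary + i * range_size
        bins.append((low, low + range_size - 1))
    bins.append((max_boundary, None))         # bin 9: the maximum boundary and above
    return bins

def bin_for(queue_length, min_boundary, histogram_size):
    """Return the bin number (0 through 9) that a queue length falls into."""
    if queue_length < min_boundary:
        return 0
    if queue_length >= min_boundary + histogram_size:
        return 9
    return 1 + (queue_length - min_boundary) // (histogram_size // 8)

# Values from the example above: Min = 960, histogram size = 12288.
for number, (low, high) in enumerate(histogram_bins(960, 12288)):
    print("Bin %d: %d:%s" % (number, low, "*" if high is None else high))
```

With these inputs, the printed ranges match the list above, from Bin 0: 0:959 through Bin 9: 13248:*.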
When using the snapshot action, all of this information is captured in the file specified by the
monitor.histogram_pg.snapshot.file setting.
Setting
    Description
monitor.port_group_list
    A user-defined list of all the port groups in the monitor file. The configuration file contains the
    following port group names as examples:
        all_packet_pg
        buffers_pg
        discards_pg
        histogram_pg
    You must specify at least one port group. If the port group list is empty, then systemd shuts down
    the asic-monitor service.
monitor.histogram_pg.port_set
    The range of ports for which histograms are configured. This setting can take GLOBs and
    comma-separated lists, like swp1-swp4,swp8,swp10-swp50.
monitor.histogram_pg.stat_type
    Each port group monitors one kind of hardware state, in this case, a histogram.
monitor.histogram_pg.cos_list
    Each CoS (Class of Service) value in the list has its own histogram on each port.
monitor.histogram_pg.trigger_type
    The type of trigger that can initiate state collection. The only valid option is timer. This setting
    is optional.
    If no port group has its trigger type set to timer, the asic-monitor service exits without an error.
monitor.histogram_pg.timer
    The frequency at which the histogram triggers; for example, a setting of 1s indicates it executes
    once per second.
    The timer can be set to:
        1 to 60 seconds, so 1s, 2s, and so on up to 60s
        1 to 60 minutes, so 1m, 2m, and so on up to 60m
        1 to 24 hours, so 1h, 2h, and so on up to 24h
        1 to 7 days, so 1d, 2d and so on up to 7d
monitor.histogram_pg.snapshot.file
    The prefix file name for the snapshot file. All snapshots use this name, with a sequential number
    appended to it. For example, /var/lib/cumulus/histogram_stats_0.
monitor.histogram_pg.snapshot.file_count
    The number of snapshots that can be created before the first snapshot file is overwritten. While
    more snapshots can provide you with more data, they can occupy a lot of disk space on the switch.
    See Caveats and Errata (see page 786) below.
monitor.histogram_pg.histogram.minimum_bytes_boundary
    The minimum boundary size for the histogram in bytes. On a Mellanox switch, this number must be a
    multiple of 96.
    Adding this number to the size of the histogram produces the maximum boundary size.
monitor.histogram_pg.log.queue_bytes
    The length of the queue in bytes before the log action writes a message to syslog.
monitor.histogram_pg.collect.queue_bytes
    During state collection, when this queue length (measured in bytes) is reached, the collect action
    initiates another state collection.
monitor.histogram_pg.collect.port_group_list
    The port groups that get triggered by the histogram_pg collect action.
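The allowed values for the monitor.histogram_pg.timer setting (1s to 60s, 1m to 60m, 1h to 24h, 1d to 7d) can be checked before editing the configuration file. The following Python sketch is a hypothetical validator written for illustration. It is not part of Cumulus Linux, and it assumes that the value is a whole number followed directly by a single unit letter.

```python
import re

# Upper limit for each unit, from the ranges listed in the table above.
LIMITS = {"s": 60, "m": 60, "h": 24, "d": 7}
SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_timer(value):
    """Validate a timer value such as '1s' or '7d' and return it in seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if not match:
        raise ValueError("unrecognized timer format: %r" % value)
    count, unit = int(match.group(1)), match.group(2)
    if not 1 <= count <= LIMITS[unit]:
        raise ValueError("timer %r is outside the range 1 to %d%s"
                         % (value, LIMITS[unit], unit))
    return count * SECONDS[unit]

print(parse_timer("1s"))   # 1
print(parse_timer("24h"))  # 86400
```

Values such as 61s or 8d raise a ValueError in this sketch, which mirrors the ranges listed in the table.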
The configuration is stored in the /etc/cumulus/datapath/monitor.conf file. You edit the settings in
the file directly with a text editor. There is no default configuration. Here is a sample configuration:
monitor.port_group_list = [discards_pg,histogram_pg,all_packet_pg,buffers_pg]
The service is enabled by default when you boot the switch and is restarted whenever you restart switchd.
Understanding Triggers
During state collection, the monitoring service may respond to a threshold being crossed, which triggers a
monitoring action.
At this time, the only type of trigger that initiates state collection is a timer. The timer is the frequency at
which the histogram triggers and reads the ASIC state.
When a monitoring statistic meets a configured threshold, it can trigger an action. Triggers can include:
Queue length, as measured by a histogram
Packet drops due to packet buffer congestion
Packet drops due to errors
If no trigger is configured for a monitoring action, the action happens unconditionally and always occurs.
A snapshot action takes a snapshot of the current state that was collected and writes it out to a file. You
specify the prefix for the snapshot file name (including the path, like /var/lib/cumulus/ for example)
and the number of snapshots that can be taken before the system starts overwriting the earliest
snapshot files. For example, if the snapshot file is called /var/lib/cumulus/snapshot and the
snapshot file count is set to 64, then the first snapshot file is named snapshot_0 and the 64th snapshot is
named snapshot_63. When the 65th snapshot is taken, the original snapshot file, /var/lib/cumulus/snapshot_0,
is overwritten, and subsequent snapshots overwrite the remaining files in sequence.
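The snapshot file naming and overwrite order described above amounts to a modulo operation on the snapshot number. The following Python sketch illustrates it. The function name snapshot_path is hypothetical, and the snapshot number n is counted from 1, as in the text.

```python
def snapshot_path(prefix, file_count, n):
    """Return the file written by the n-th snapshot (n starts at 1)."""
    # File suffixes run from 0 to file_count - 1 and then wrap around.
    return "%s_%d" % (prefix, (n - 1) % file_count)

prefix = "/var/lib/cumulus/snapshot"
print(snapshot_path(prefix, 64, 1))   # first snapshot
print(snapshot_path(prefix, 64, 64))  # 64th snapshot
print(snapshot_path(prefix, 64, 65))  # 65th snapshot overwrites the first file
```

The same calculation gives the number of files that exist at any time, which is never more than the configured snapshot file count.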
To create the cl-support archive file manually, run the cl-support command:
If the Cumulus Networks support team requests that you submit the output from cl-support to help with
the investigation of issues you might experience with Cumulus Linux and you need to include security-
sensitive information, such as the sudoers file, use the -s option:
/var/log/alternatives.log  Information from update-alternatives is logged in this file.
/var/log/apt               The apt utility can send log information here; for example, from
                           apt-get install and apt-get remove.
/var/log/audit/*           Contains log information stored by the Linux audit daemon, auditd.
/var/log/autoprovision     Logs output generated by running the zero touch provisioning (see
                           page 71) script.
/var/log/btmp              This file contains information about failed login attempts. Use the
                           last command to view the btmp file. For example:
/var/log/dmesg             Contains kernel ring buffer information. When the system boots up, it
                           prints a number of messages on the screen that display information
                           about the hardware devices that the kernel detects during the boot
                           process. These messages are available in the kernel ring buffer and
                           whenever a new message arrives, the old message gets overwritten.
                           You can also view the content of this file using the dmesg command.
                           Note that Cumulus Linux does not write to this log file; but because
                           it's a standard file, Cumulus Linux creates it as a zero length file.
/var/log/faillog           Contains failed user login attempts. Use the faillog command to
                           display the contents of this file.
                           Note that Cumulus Linux does not write to this log file; but because
                           it's a standard file, Cumulus Linux creates it as a zero length file.
/var/log/fsck/*            The fsck utility is used to check and optionally repair one or more
                           Linux filesystems.
                           Note that Cumulus Linux does not write to this log file; but because
                           it's a standard file, Cumulus Linux creates it as a zero length file.
/var/log/lastlog           Formats and prints the contents of the last login log file.
/var/log/news/*            The news command keeps you informed of news concerning the
                           system.
                           Note that Cumulus Linux does not write to this log file; but because
                           it's a standard file, Cumulus Linux creates it as a zero length file.
example an md5 or mtu mismatch with OSPF.
/var/log/snapper.log Log file for snapshots (see page 57). These logs are valuable for the
snapshots you take on your switch.
/var/log/syslog The main system log, which logs everything except auth-related messages.
The primary log; it's easiest to grep this file to see what occurred during a problem.
File Description
/etc/nologin nologin prevents unprivileged users from logging into the system.
/etc/alternatives update-alternatives creates, removes, maintains and displays information about the
symbolic links comprising the Debian alternatives system.
This is the alphabetical listing of the output from running ls -l on the /etc directory structure created by
cl-support. The green highlighted rows are the ones Cumulus Networks finds most important when
troubleshooting problems.
ca-certificates
chef This is an example of something that is not included by default. In this instance,
cl-support included the chef folder for some reason. This is not installed by default, but this
tool could have been installed or configured incorrectly, which is why it's included in the
cl-support output.
gss
image-release Contains the version of Cumulus Linux that was installed with the installer.
This version number does not change when you upgrade using apt-get. Useful for determining
baseline version.
lsb-release Shows the current version of Linux on the system. Run cat /etc/lsb-release for
output. This shows you the version of the operating system you are running; also compare
this to the output of onie-select.
netd.conf The NCLU (see page 82) configuration file. Contains the settings for which Linux
commands are operable under NCLU. Also contains the blacklist of infrequently used
commands.
network Contains the network interface configuration for ifup and ifdown. The main
configuration file is under /etc/network/interfaces. This is where you configure L2 and
L3 information for all of your front panel ports (swp interfaces). Settings like MTU, link
speed, IP address information, and VLANs are all included here.
Cumulus Linux-specific
folder for PTM (prescriptive
topology manager).
resolv.conf Resolver configuration file, which is where DNS is set (domain, nameserver and
search). You need DNS to reach the Cumulus Linux repository.
securetty This file lists terminals into which the root user can log
in.
sudoers
timezone If this file exists, it is read and its contents are used as
the time zone name.
Contents
This chapter covers ...
Enabling Logging for Networking (see page 804)
Using ifquery to Validate and Debug Interface Configurations (see page 805)
Debugging Mako Template Errors (see page 806)
ifdown Cannot Find an Interface that Exists (see page 807)
Removing All References to a Child Interface (see page 807)
MTU Set on a Logical Interface Fails with Error: "Numerical result out of range" (see page 808)
Interpreting iproute2 batch Command Failures (see page 808)
Understanding the "RTNETLINK answers: Invalid argument" Error when Adding a Port to a Bridge
(see page 808)
MLAG Peerlink Interface Drops Many Packets (see page 809)
# Exclude interfaces
EXCLUDE_INTERFACES=
Use ifquery --check to check the current running state of an interface against its configuration in the
interfaces file. It returns exit code 0 if the running state matches the configuration and 1 if it does
not. The line bond-xmit-hash-policy layer3+7 below fails because it should read
bond-xmit-hash-policy layer3+4.
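Because ifquery --check reports its result through the exit code, the check is easy to script. The following is a minimal sketch, assuming that exit code 0 means the running state matches the interfaces file; report_check is a hypothetical helper, not part of ifupdown2, and bond0 is only an example interface name.

```shell
#!/bin/sh
# Sketch: turn the exit status of a check command into a readable verdict.
# report_check is a hypothetical helper, not part of ifupdown2.
report_check() {
    # "$@" is the command to run, for example: ifquery --check bond0
    if "$@" >/dev/null 2>&1; then
        echo "PASS: running state matches the interfaces file"
    else
        echo "FAIL: running state differs from the interfaces file"
    fi
}

# On a switch you might call it like this (bond0 is only an example name):
# report_check ifquery --check bond0
```

The helper only looks at the exit status, so the detailed output of ifquery is discarded; run the command directly when you need to see which attribute failed.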
Use ifquery --running to print the running state of interfaces in the interfaces file format:
auto bond0
iface bond0
bond-slaves swp25 swp26
address 14.0.0.9/30
address 2001:ded:beef:2::1/64
ifquery --syntax-help provides help on all possible attributes supported in the interfaces file. For
complete syntax on the interfaces file, see man interfaces and man ifupdown-addons-interfaces.
You can use ifquery --print-savedstate to check the ifupdown2 state database. ifdown works
only on interfaces present in this state database.
# ssim2 added
auto swp45
iface swp45
auto swp46
iface swp46
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
auto bond1
iface bond1
bond-slaves swp2 swp1
auto bond3
iface bond3
bond-slaves swp8 swp6 swp7
auto br0
iface br0
bridge-ports swp3 swp5 bond1 swp4 bond3
bridge-pathcosts swp3=4 swp5=4 swp4=4
address 11.0.0.10/24
address 2001::10/64
Notice that bond1 is a member of br0. If bond1 is removed, you must remove the reference to it from the
br0 configuration. Otherwise, if you reload the configuration with ifreload -a, bond1 is still part of br0.
MTU Set on a Logical Interface Fails with Error: "Numerical result out of range"
This error occurs when the MTU (see page 235) you are trying to set on an interface is higher than the MTU
of the lower interface or dependent interface. Linux expects the upper interface to have an MTU less than
or equal to the MTU on the lower interface.
In the example below, the swp1.100 VLAN interface is an upper interface to physical interface swp1. If you
want to change the MTU to 9000 on the VLAN interface, you must include the new MTU on the lower
interface swp1 as well.
auto swp1.100
iface swp1.100
mtu 9000
auto swp1
iface swp1
mtu 9000
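Before raising the MTU on an upper interface, it can help to read the current MTU of the lower interface. The following is a minimal sketch, assuming the standard Linux sysfs layout (/sys/class/net/IFACE/mtu); can_set_mtu is a hypothetical helper, and its optional third argument exists only so the check can be pointed at a different directory.

```shell
#!/bin/sh
# Sketch: check whether a new MTU for an upper interface fits within the
# current MTU of its lower interface.
# can_set_mtu is a hypothetical helper, not a Cumulus Linux command.
can_set_mtu() {
    new_mtu=$1 lower=$2 root=${3:-/sys/class/net}
    # Read the lower interface's current MTU; return 2 if it cannot be read
    lower_mtu=$(cat "$root/$lower/mtu") || return 2
    if [ "$new_mtu" -le "$lower_mtu" ]; then
        echo "ok: $new_mtu fits within $lower (mtu $lower_mtu)"
    else
        echo "raise $lower first: $new_mtu exceeds its mtu of $lower_mtu"
        return 1
    fi
}

# Example for the configuration above (run on the switch):
# can_set_mtu 9000 swp1
```

A non-zero return means the lower interface (swp1 in the example above) needs the larger MTU before the VLAN interface can accept it.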
error: failed to execute cmd 'ip -force -batch - [link set dev host2
master bridge
addr flush dev host2
link set dev host1 master bridge
addr flush dev host1
]'(RTNETLINK answers: Invalid argument
Command failed -:1)
warning: bridge configuration failed (missing ports)
Contents
This chapter covers ...
Monitoring Interface Status Using ethtool (see page 809)
Viewing and Clearing Interface Counters (see page 810)
Monitoring Switch Port SFP/QSFP Hardware Information Using ethtool (see page 811)
Option Description
-c Copies and clears statistics. It does not clear counters in the kernel or hardware.
The -c argument is applied per user ID by default. You can override it by using the
-t argument to save statistics to a different directory.
The -d argument is applied per user ID by default. You can override it by using the
-t argument to save statistics to a different directory.
Network Troubleshooting
Cumulus Linux contains a number of command line and analytical tools to help you troubleshoot issues
with your network.
Contents
This chapter covers ...
Checking Reachability Using ping (see page 814)
Printing Route Trace Using traceroute (see page 814)
Manipulating the System ARP Cache (see page 815)
cumulus@switch:~$ arp -a
? (11.0.2.2) at 00:02:00:00:00:10 [ether] on swp3
? (11.0.3.2) at 00:02:00:00:00:01 [ether] on swp4
? (11.0.0.2) at 44:38:39:00:01:c1 [ether] on swp1
If you need to flush or remove an ARP entry for a specific interface, you can disable dynamic ARP learning:
[iptables]
-A FORWARD -p tcp --dport 80 -j ACCEPT
The -p option clears out all other rules, and the -i option is used to reinstall all the rules.
----------------------------------------------------------
| MAC_HEADER | IP_HEADER | GRE_HEADER | L2_Mirrored_Packet |
----------------------------------------------------------
Mirrored traffic is not guaranteed. If the MTP is congested, mirrored packets may be discarded.
SPAN and ERSPAN are configured via cl-acltool, the same utility for security ACL configuration (see
page 134). The match criteria for SPAN and ERSPAN is usually an interface; for more granular match terms,
use selective spanning (see page 823). The SPAN source interface can be a port, a subinterface or a bond
interface. Both ingress and egress traffic on interfaces can be matched.
Cumulus Linux supports a maximum of 2 SPAN destinations. Multiple rules (SPAN sources) can point to the
same SPAN destination, although a given SPAN source cannot specify 2 SPAN destinations. The SPAN
destination (MTP) interface can be a physical port, a subinterface, or a bond interface. The SPAN/ERSPAN
action is independent of security ACL actions. If packets match both a security ACL rule and a SPAN rule,
both actions will be carried out.
To configure SPAN or ERSPAN on a Tomahawk-based switch, you must enable
non-atomic update mode (see page 166).
Mellanox switches reject SPAN ACL rules for an output interface that is a subinterface.
Using cl-acltool with the --out-interface rule applies to transit traffic only; it does not
apply to traffic sourced from the switch.
Running the following command is incorrect and will remove all existing control-plane rules or
other installed rules and only install the rules defined in span.rules:
Configuring ERSPAN
This section describes how to configure ERSPAN for all packets coming in from swp1 to 12.0.0.2.
The src-ip option can be any IP address, whether it exists in the routing table or not. The dst-ip
option must be an IP address reachable via the routing table. The destination IP address must be
reachable from a front-panel port, and not the management port. Use ping or ip route get
<ip> to verify that the destination IP address is reachable. Setting the --ttl option is
recommended.
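The options named above fit together in a single rule line. The following is a minimal sketch that writes such a rule to a scratch file for review; write_erspan_rule is a hypothetical helper, the ERSPAN target with --in-interface is assumed from the cl-acltool rule format, and the source IP 12.0.0.1 and TTL 64 are illustrative values only.

```shell
#!/bin/sh
# Sketch: write an ERSPAN rule for cl-acltool into a rules file.
# write_erspan_rule is a hypothetical helper; 12.0.0.1 as the source IP and a
# TTL of 64 are illustrative values, not taken from the text above.
write_erspan_rule() {
    file=$1 in_if=$2 src_ip=$3 dst_ip=$4 ttl=$5
    cat > "$file" <<EOF
[iptables]
-A FORWARD --in-interface $in_if -j ERSPAN --src-ip $src_ip --dst-ip $dst_ip --ttl $ttl
EOF
}

# Write to a scratch file first and review it before copying it into the
# ACL policy directory on the switch.
scratch="${TMPDIR:-/tmp}/erspan.rules"
write_erspan_rule "$scratch" swp1 12.0.0.1 12.0.0.2 64
cat "$scratch"
```

Generating the file separately from installing it keeps the risky step explicit: rules only take effect once they are installed with cl-acltool.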
When using Wireshark to review the ERSPAN output, Wireshark may report the message
"Unknown version, please report or test to use fake ERSPAN preference", and the trace is
unreadable. To resolve this, go into the General preferences for Wireshark, then go to Protocols
> ERSPAN and check the Force to decode fake ERSPAN frame option.
Selective Spanning
SPAN/ERSPAN traffic rules can be configured to limit the traffic that is spanned, to reduce the volume of
copied data.
Cumulus Linux supports selective spanning for iptables only. ip6tables and ebtables are
not supported.
With ERSPAN, a maximum of two --src-ip --dst-ip pairs are supported. Exceeding this limit
produces an error when you install the rules with cl-acltool.
SPAN Examples
To mirror forwarded packets from all ports matching SIP 20.0.1.0 and DIP 20.0.1.2 to port swp1s1:
To mirror forwarded UDP packets received from port swp1s0, towards DIP 20.0.1.2 and destination
port 53:
ERSPAN Examples
To mirror forwarded packets from all ports matching SIP 20.0.1.0 and DIP 20.0.1.2:
To mirror forwarded UDP packets received from port swp1s0, towards DIP 20.0.1.2 and destination
port 53:
Related Information
www.perihel.at/sec/mz/mzguide.html
en.wikipedia.org/wiki/Ping
www.tcpdump.org
en.wikipedia.org/wiki/Traceroute
Contents
This chapter covers ...
Using net show Commands (see page 827)
Showing Interfaces (see page 827)
Other Useful Features (see page 829)
Installing netshow on a Linux Server (see page 829)
Showing Interfaces
To show all available interfaces that are physically UP, run net show interface:
Whereas net show interface all displays every interface regardless of state:
You can get information about the switch itself by running net show system:
Debian and Red Hat packages will be available in the near future.
Contents
This chapter covers ...
Installing hsflowd (see page 830)
Configuring sFlow (see page 830)
Configuring sFlow via DNS-SD (see page 830)
Installing hsflowd
To download and install the hsflowd package, use apt-get:
Configuring sFlow
You can configure hsflowd to send to the designated collectors via two methods:
DNS service discovery (DNS-SD)
Manually configuring /etc/hsflowd.conf
The above snippet instructs hsflowd to send sFlow data to collector1 on port 6343 and to collector2 on
port 6344. hsflowd will poll counters every 20 seconds and sample 1 out of every 2048 packets.
The maximum samples per second delivered from the hardware is limited to 16K. You can
configure the number of samples per second in the /etc/cumulus/datapath/traffic.conf
file, as shown below:
After the initial configuration is ready, bring up the sFlow daemon by running:
DNSSD = off
sampling.1G=2048
sampling.10G=4096
sampling.40G=8192
collector {
ip = 192.0.2.100
udpport = 6343
}
collector {
ip = 192.0.2.200
udpport = 6344
}
This configuration polls the counters every 20 seconds, samples 1 of every 2048 packets and sends this
information to a collector at 192.0.2.100 on port 6343 and to another collector at 192.0.2.200 on port
6344.
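To get a feel for what a 1-in-N sampling rate means in practice, you can estimate the sample rate from a packet rate. The following is a minimal sketch using plain integer arithmetic; expected_samples is a hypothetical helper, and the packet rate of 1,000,000 packets per second is an illustrative figure.

```shell
#!/bin/sh
# Sketch: estimate sFlow samples per second for 1-in-N packet sampling.
# expected_samples is a hypothetical helper; integer division rounds down.
expected_samples() {
    pps=$1 rate=$2
    echo $(( pps / rate ))
}

# 1,000,000 packets per second sampled 1-in-2048:
expected_samples 1000000 2048
# prints 488
```

Comparing the estimate against the hardware sample limit indicates whether a sampling rate is aggressive enough for the expected traffic.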
Some collectors require each source to transmit on a different port; others may listen on only
one port. Refer to the documentation for your collector for more information.
Related Information
sFlow Collectors
sFlow Wikipedia page
SNMP Monitoring
Cumulus Linux utilizes the open source Net-SNMP agent snmpd, v5.7.3, which provides support for most of
the common industry-wide MIBs, including interface counters and TCP/UDP IP stack data.
Contents
This chapter covers ...
Introduction to SNMP (Simple Network Management Protocol) (see page 833)
Configuring Ports for SNMP to Listen for Requests (see page 833)
Quick Start Guide (see page 833)
Starting the SNMP Daemon (see page 836)
Configuring SNMP (see page 836)
Configuring SNMP with NCLU (see page 837)
Configuring SNMP Manually (see page 839)
Configuring SNMP with Management VRF (see page 839)
Configuring the agentAddress (see page 841)
Setting up the Custom Cumulus Networks MIBs (see page 842)
Setting the Community String (see page 842)
Enabling SNMP Support for FRRouting (see page 843)
Enabling the .1.3.6.1.2.1 Range (see page 845)
Configuring SNMPv3 (see page 846)
snmpwalk a Switch from Another Linux Device (see page 849)
Troubleshooting Tips Table for snmpwalks (see page 850)
SNMP Traps (see page 851)
Generating Event Notification Traps (see page 851)
snmptrapd.conf (see page 860)
Supported MIBs (see page 861)
About Pass Persist Scripts (see page 864)
Troubleshooting (see page 865)
Syntax Meaning
agentAddress Required. This command sets the protocol, IP address, and the port for
snmpd to listen on for incoming requests. The IP address must exist on
an interface that has link UP on the switch where snmpd is being used. By
default, this is set to udp:127.0.0.1:161, which means snmpd listens on the
loopback interface and so only responds to requests (snmpwalk, snmpget,
snmpgetnext) originating from the switch. A wildcard setting of udp:161,
udp6:161 forces snmpd to listen on all IPv4 and IPv6 interfaces for
incoming SNMP requests. Multiple IP addresses can be configured as
comma-separated values, as in udp:66.66.66.66:161,udp:77.77.77.77:161,
udp6:[2001::1]:161.
rocommunity Required. This command defines the password that is required for SNMP
version 1 or 2c requests for GET or GETNEXT. By default, this provides
access to the full OID tree for such requests, regardless of from where
they were sent. There is no default password set, so that snmpd does not
respond to any requests that arrive. Users often specify a source IP
address token to restrict access to only that host or network given. Users
also specify a view name (as defined above) to restrict the subset of the
OID tree.
Some examples of rocommunity commands are shown below. The first
command sets the read only community string to "simplepassword" for
SNMP requests sourced from the 10.10.10.0/24 subnet and restricts
viewing to the systemonly view name defined previously with the view
command. The second example simply creates a read-only community
password that allows access to the entire OID tree from any source IP
address.
rocommunity cumulustestpassword
view This command defines a view name that specifies a subset of the overall
OID tree. This restricted view can then be referenced by name in the
rocommunity command to link the view to a password that is used to see
this restricted OID subset. By default, the snmpd.conf file contains
numerous views with the systemonly view name:
trapsink, trap2sink This command defines the IP address of the notification (or trap) receiver
for either SNMPv1 traps or SNMPv2 traps. If several sink directives are
specified, multiple copies of each notification (in the appropriate formats)
are generated. Note that a trap server must be configured to receive and
decode these trap messages (for example, snmptrapd). The address of
the trap receiver can be configured with a different protocol and port but
this is most often left out. The default is UDP on the well-known port 162.
createUser snmptrapusernameX
iquerySecName snmptrapusernameX
rouser snmptrapusernameX
linkUpDownNotifications yes This command enables link up and link down trap notifications,
assuming the other trap configuration settings are set. This command configures
the Event MIB tables to monitor the ifTable for network interfaces being
taken up or down, and triggering a linkUp or linkDown notification as
appropriate. This is exactly equivalent to the following configuration:
defaultMonitors yes This command configures the Event MIB tables to monitor the various
UCD-SNMP-MIB tables for problems (as indicated by the appropriate
xxErrFlag column objects) and send a trap. This assumes the user has
downloaded the snmp-mibs-downloader Debian package and
commented out "mibs" in /etc/snmp/snmp.conf (as in: "#mibs"). This
command is exactly equivalent to the following configuration:
[Service]
Restart=always
RestartSec=60
Once the service is started, SNMP can be used to manage various components on the Cumulus Linux
switch.
Configuring SNMP
Cumulus Linux ships with a production-usable default snmpd.conf file. This section covers a few
basic configuration options in snmpd.conf. For more information about configuring this file,
refer to the snmpd.conf man page.
Cumulus Linux 3.4 and later releases support configuring SNMP with NCLU.
The default snmpd.conf file does not include all supported MIBs or OIDs that can be exposed.
Customers must at least change the default community string for v1 or v2c environments or the
snmpd daemon will not respond to any requests.
Command Summary
net del all or net del snmp-server all
    Removes all entries in the /etc/snmp/snmpd.conf file and replaces them with defaults.
    The defaults remove all SNMPv3 usernames and readonly-communities, and configure a
    listening-address of localhost.
net add snmp-server listening-address localhost
    For security reasons, the listening address is set to localhost (127.0.0.1) by default.
    This means that the SNMP agent will only respond to requests originating on the switch
    itself. One or more IP addresses can be deleted.
net add snmp-server listening-address all
    Configures the snmpd agent to listen on all interfaces for UDP port 161 SNMP requests.
net add snmp-server listening-address IP_ADDRESS IP_ADDRESS ...
    Sets the SNMP agent snmpd to listen to a specific IPv4 or IPv6 address, or a group of
    addresses with space separated values, for incoming SNMP queries.
net add snmp-server viewname [view name] (included | excluded) [OID or name]
    Creates a view to restrict MIB tree exposure. By itself, this view definition has no
    effect; however, when linked to an SNMPv3 username or community password, and a host
    from a restricted subnet, any SNMP request with that username/password must have a
    source IP address within the configured subnet.
    Note that OID can be either a string of period separated decimal numbers or a unique
    text string that identifies an SNMP MIB object. Some MIBs are not installed by default
    and must be installed either by hand or with the
net add snmp-server (readonly-community | readonly-community-v6) [password] access (any | localhost | [network]) [(view [view name]) or [oid [oid or name])
    Defines the community password, and which parts of the OID tree to apply the password
    to for incoming SNMP requests.
net add snmp-server trap-destination (localhost | [ipaddress]) community-password [password] [version [1 | 2c]]
    Sets the SNMP Trap destination IP address. Multiple destinations can exist, but at
    least one must be set to enable SNMP Traps to be sent. Removing all settings will
    disable SNMP traps. The default version is 2c, unless otherwise configured.
net add snmp-server trap-link-up [check-frequency [seconds]]
    Enables notifications for interface link-up to be sent to SNMP Trap destinations.
net add snmp-server trap-link-down [check-frequency [seconds]]
    Enables notifications for interface link-down to be sent to SNMP Trap destinations.
net add snmp-server trap-snmp-auth-failures
    Enables SNMP Trap notifications to be sent for every SNMP authentication failure.
net add snmp-server trap-cpu-load-average one-minute [threshold] five-minute [5-min-threshold] fifteen-minute [15-min-threshold]
    Enables a trap when the cpu-load-average exceeds the configured threshold. Only
    integers or floating point numbers can be used.
net add snmp-server system-location [string]
    Sets the system physical location for the node in the SNMPv2-MIB system table.
net add snmp-server system-contact [string]
    Sets the identification of the contact person for this managed node, together with
    information on how to contact this person.
net add snmp-server system-name [string]
    Sets an administratively-assigned name for the managed node. By convention, this is
    the node's fully-qualified domain name.
The example commands below enable an SNMP agent to listen on all IP addresses with a community string
password, set the trap destination host IP address, and create four types of SNMP traps:
SNMP configuration in NCLU is not VRF aware so the snmpd daemon is always started in the default VRF.
Because interfaces in a particular VRF (routing table) are not aware of interfaces in a different VRF, the
snmpd daemon only responds to polling requests and sends traps on the interfaces of the VRF on which it
is running.
When management VRF is configured, most users will want to start the snmpd daemon in the management
VRF to receive and respond to SNMP polling requests on eth0. Follow these guidelines:
1. Configure all the required SNMP settings with NCLU. Pay particular attention to the listening-address
configuration setting, which should contain one or more IP addresses that belong to interfaces
within a single VRF (if management VRF is configured, this is typically the IP address of eth0). You can
use IP addresses other than eth0, but the interfaces for these IP addresses must be in the same VRF
(typically the management VRF).
2. Commit the changes to start the snmpd daemon in the default VRF.
3. Manually stop the snmpd daemon from running in the default VRF.
4. Manually restart the snmpd daemon in the management VRF.
To use management VRF, you need to configure the IP address of eth0 as the listening-address. In the
example below, eth0 IP address is 10.10.10.10. You can also add other snmp-server configurations, then
commit the changes.
This restarts the snmpd daemon in the default VRF. Then, to run snmpd in the correct VRF, stop the
daemon in the default VRF (or stop any other snmpd daemons that happen to be running), then restart
snmpd in the management VRF so that it can respond to requests on interfaces only in that VRF. Make sure
that only one instance of the snmpd daemon is running and that it is running in the desired VRF. Assuming
the Management VRF has been enabled, the following example shows how to stop snmpd and restart it in
the management VRF.
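The stop and restart steps can be sketched as a short script. This is a minimal sketch under stated assumptions: the management VRF is named mgmt and snmpd is started in it through a systemd template unit called snmpd@mgmt.service; confirm both on your switch before running anything. The run and move_snmpd_to_mgmt_vrf helpers are hypothetical, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: move snmpd from the default VRF to the management VRF.
# Assumptions: the management VRF is named "mgmt" and is served by a systemd
# template unit called snmpd@mgmt.service; confirm both on your switch.
# DRY_RUN defaults to 1, so the commands are printed rather than executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

move_snmpd_to_mgmt_vrf() {
    # Stop the instance in the default VRF and keep it from starting at boot
    run sudo systemctl stop snmpd.service
    run sudo systemctl disable snmpd.service
    # Start the instance in the management VRF and enable it at boot
    run sudo systemctl start snmpd@mgmt.service
    run sudo systemctl enable snmpd@mgmt.service
}

move_snmpd_to_mgmt_vrf
```

Afterwards, verify that exactly one snmpd process is running and that it is in the intended VRF.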
1. Open the /etc/snmp/snmpd.conf file in a text editor, and edit the following line:
agentAddress udp:127.0.0.1:161
You can only specify one agentAddress line. If you want to listen on multiple IP addresses,
use comma-separated addresses, like this:
agentAddress 10.10.10.10,44.44.44.44,127.0.0.1
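If the listening addresses come from a list or an inventory, the single comma-separated line can be generated instead of typed. The following is a minimal sketch; agent_address_line is a hypothetical helper, not a Net-SNMP tool, and it only prints the line rather than editing snmpd.conf.

```shell
#!/bin/sh
# Sketch: build a single agentAddress line from a list of listening addresses.
# agent_address_line is a hypothetical helper, not a Net-SNMP tool.
agent_address_line() {
    line=""
    for addr in "$@"; do
        # Append with a comma separator after the first entry
        line="${line:+$line,}$addr"
    done
    echo "agentAddress $line"
}

agent_address_line 10.10.10.10 44.44.44.44 127.0.0.1
# prints: agentAddress 10.10.10.10,44.44.44.44,127.0.0.1
```

Each argument may also carry a protocol and port (for example udp:10.10.10.10:161), since the helper passes the values through unchanged.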
However, several files need to be copied to the server, in order for the custom Cumulus MIB to be
recognized on the destination NMS server.
/usr/share/snmp/mibs/Cumulus-Snmp-MIB.txt
/usr/share/snmp/mibs/Cumulus-Counters-MIB.txt
/usr/share/snmp/mibs/Cumulus-Resource-Query-MIB.txt
Keyword Meaning
default The default keyword allows connections from any system. The localhost keyword
allows requests only from the local host. A restricted source can either be a
specific hostname (or address), or a subnet, represented as IP/MASK (like
10.10.10.0/255.255.255.0), or IP/BITS (like 10.10.10.0/24), or the IPv6 equivalents.
systemonly The name of this particular SNMP view. This is a user-defined value.
3. Restart snmpd:
At this time, SNMP does not support monitoring BGP unnumbered neighbors.
Similarly, if you plan on using the OSPFv2 MIB, you need to expose .1.3.6.1.2.1.14 in the
/etc/snmp/snmpd.conf file, and expose .1.3.6.1.2.1.191 for the OSPFv3 MIB.
To enable SNMP support for FRRouting, do the following:
2. Update the SNMP configuration to enable FRRouting to respond to SNMP requests. Open the
/etc/snmp/snmpd.conf file in a text editor, and add the following lines:
agentAddress udp:161
rocommunity public default
3. If you intend to use SNMP with the following MIBs, uncomment the corresponding parts of
snmpd.conf:
For the BGP4 MIB, uncomment the view systemonly included .1.3.6.1.2.1.15 line
below.
For the OSPF MIB, uncomment the view systemonly included .1.3.6.1.2.1.14 line
below.
For the OSPFv3 MIB, uncomment the view systemonly included .1.3.6.1.2.1.191
line below.
4. After you save the snmpd.conf file, create a file called /etc/snmp/frr.conf that contains the
following line:
agentXSocket /run/agentx/master
5. After you save this file, restart the snmpd and FRRouting services for these changes to take effect
and to reload the FRRouting daemons with AgentX access:
To verify the configuration, run snmpwalk. For example, if you have a running OSPF configuration with
routes, you can check this OSPF-MIB first from the switch itself with:
This configuration grants access to a large number of MIBs, including all MIB2 MIBs, which could
reveal more data than expected. In addition to being a security vulnerability, it could consume
more CPU resources.
#################################################################
##############
#
# ACCESS CONTROL
#
# system
view systemonly included .1.3.6.1.2.1
# frrouting ospf6
view systemonly included .1.3.6.1.3.102
# lldpd (Note: lldpd must be restarted with the -x option
3. Restart snmpd:
Configuring SNMPv3
SNMPv3 is often used to enable authentication and encryption, as community strings in versions 1 and 2c
are sent in plaintext. SNMPv3 usernames are added to the /etc/snmp/snmpd.conf file, along with
plaintext authentication and encryption pass phrases.
The NCLU command structures for configuring SNMP user passwords are:
An example is shown below, defining five users, each with a different combination of authentication and
encryption:
After configuring user passwords and restarting the snmpd daemon, the user access can be checked with a
client.
The snmp Debian package contains snmpget, snmpwalk, and other programs that are useful for
checking daemon functionality from the switch itself or from another workstation.
The following commands check the access for each user defined above from the localhost:
A slightly more secure method of configuring SNMPv3 users without creating cleartext passwords is the
following:
3. Use the net-snmp-config command to create two users, one with MD5 and DES, and the next
with SHA and AES.
The minimum password length is 8 characters and the arguments -a and -x to
net-snmp-config have different meanings than they do for snmpwalk.
This adds a createUser command in /var/lib/snmp/snmpd.conf. Do not edit this file by hand,
unless you are removing usernames. It also adds the rwuser in /usr/share/snmp/snmpd.conf. You
may want to edit this file and restrict access to certain parts of the MIB by adding noauth, auth or priv to
allow unauthenticated access, require authentication or to enforce use of encryption, respectively.
The snmpd daemon reads the information from the /var/lib/snmp/snmpd.conf file and then the line is
removed (eliminating the storage of the master password for that user) and replaced with the key that is
derived from it (using the EngineID). This key is a localized key, so that if it is stolen it cannot be used to
access other agents. To remove the two users userMD5withDES and userSHAwithAES, stop the snmpd
daemon, edit the files /var/lib/snmp/snmpd.conf and /usr/share/snmp/snmpd.conf, and remove the
lines containing the username. Then restart the snmpd daemon as in step 3 above.
From a client, you would access the MIB with the correct credentials. (Again, note that the roles of -x, -a
and -X and -A are reversed on the client side as compared with the net-snmp-config command used
above.)
snmpwalk does not show enterprise MIBs by default (from the 1.3.6.1.4.1 tree); these need to be
explicitly named.
For this demonstration, another switch running Cumulus Linux within the network is used.
4. Many SNMP clients (snmpwalk, snmpget and snmpgetnext) as well as the SNMP agent (snmpd)
can benefit from having MIBs installed.
Enabling monitoring for traps with defaultMonitors and monitor (when referring to
OIDs by name) requires MIBs to be installed on the switch.
#
# As the snmp packages come without MIB files due to license
reasons, loading
6. Perform an snmpwalk on the switch. The switch running snmpd in the demonstration is using IP
address 192.168.0.111. It is possible to snmpwalk the switch from itself. Run the following
command, which helps distinguish an SNMP problem from a networking problem.
Any information gathered here should verify that snmpd is running correctly on the Cumulus Linux side,
reducing locations where a problem may reside.
switch2 (another Cumulus Linux switch in the network)
snmpd is serving information correctly and network reachability works between switch
and switch2. The problem resides somewhere else. For example, Prism cannot reach switch,
or there is a Prism misconfiguration.
Network connectivity is not able to grab information? Is there an iptables rule
blocking? Is the snmpwalk being run correctly?
Nutanix Prism CLI (see page 866) (SSH to the cluster IP address)
snmpd is serving information correctly and network reachability works between switch
and the Nutanix Appliance. The problem resides somewhere else. For example, the GUI
might be misconfigured.
Is the right community name being used in the GUI? Is snmp v2c being used?
SNMP Traps
createUser trapusername
iquerySecName trapusername
rouser trapusername
iquerySecName specifies the default SNMPv3 username to be used when making internal
queries to retrieve any necessary information — either for evaluating the monitored expression
or building a notification payload. These internal queries always use SNMPv3, even if normal
querying of the agent is done using SNMPv1 or SNMPv2c. Note that this user must also be
explicitly created via createUser and given appropriate access rights, for rouser, for example.
The iquerySecName directive is purely concerned with defining which user should be used, not
with actually setting this user up.
Although the traps are sent to an SNMPv2c receiver, the SNMPv3 user is still required. Starting
with Net-SNMP 5.3, snmptrapd no longer accepts all traps by default. snmptrapd must be
configured with authorized SNMPv1/v2c community strings and/or SNMPv3 users.
Non-authorized traps/informs will be dropped. Please refer to the snmptrapd.conf(5) manual page for
details.
It is possible to define multiple trap receivers and to use the domain name instead of an IP
address in the trap2sink directive.
SNMPv3 TRAP/INFORM
The SNMP trap receiving daemon must have usernames, authentication passwords, and encryption
passwords created with its own EngineID. You must configure this trap server EngineID in the switch snmpd
daemon sending the trap and inform messages. You specify the level of authentication and encryption for
SNMPv3 trap and inform messages with -l (noAuthNoPriv, authNoPriv, or authPriv).
You can define multiple trap receivers and use the domain name instead of an IP address in the
trap2sink directive.
After you complete the configuration, restart the snmpd service to apply the changes:
snmptrap, snmpget, snmpwalk and snmpd itself must be able to bind to this address.
clientaddr [<transport-specifier>:]<transport-address>
specifies the source address to be used by command-line applications when
sending SNMP requests. See snmpcmd(1) for more information about the
format of addresses.
This value is also used by snmpd when generating notifications.
EXPRESSION
There are three types of monitor expression supported
by the Event MIB - existence, boolean and threshold tests.
OID OP VALUE
OPTIONS
snmpd can be configured to monitor the operational status of an Entity MIB or Entity-Sensor MIB. The
operational status, given as a value of ok(1), unavailable(2) or nonoperational(3), can be determined by
adding the following example configuration to /etc/snmp/snmpd.conf, and adjusting the values:
Using the entPhySensorOperStatus integer:
To get all sensor information, run snmpwalk on the entPhysicalName table. For example:
5. Open the /etc/snmp/snmp.conf file to verify that the mibs : line is commented out:
#
# As the snmp packages come without MIB files due to license reasons, loading
# of MIBs is disabled by default. If you added the MIBs you can reenable
# loading them by commenting out the following line.
#mibs :
6. Open the /etc/default/snmpd file to verify that the export MIBS= line is commented out:
7. Once the configuration has been confirmed, remove or comment out the non-free repository in
/etc/apt/sources.list.
linkUpDownNotifications yes
The default frequency for checking link up/down is 60 seconds. The default frequency can be
changed using the monitor directive directly instead of the linkUpDownNotifications
directive. See man snmpd.conf for details.
trap is generated. The -o lmTempSensorsDevice option instructs SNMP to also include the
lmTempSensorsDevice MIB in the generated trap. The default frequency for the monitor directive is 600
seconds. You can change the default frequency with the -r option:
Alternatively, temperature sensors may be monitored individually. To monitor the sensors individually, first
use the sensors command to determine which sensors are available to be monitored on the platform.
CY8C3245-i2c-4-2e
Adapter: i2c-0-mux (chan_id 2)
fan5: 7006 RPM (min = 2500 RPM, max = 23000 RPM)
fan6: 6955 RPM (min = 2500 RPM, max = 23000 RPM)
fan7: 6799 RPM (min = 2500 RPM, max = 23000 RPM)
fan8: 6750 RPM (min = 2500 RPM, max = 23000 RPM)
temp1: +34.0 C (high = +68.0 C)
temp2: +28.0 C (high = +68.0 C)
temp3: +33.0 C (high = +68.0 C)
temp4: +31.0 C (high = +68.0 C)
temp5: +23.0 C (high = +68.0 C)
Configure a monitor command for the specific sensor using the -I option. The -I option indicates that
the monitored expression is applied to a single instance. In this example, there are five temperature
sensors available. The following monitor directive can be used to monitor only temperature sensor three at
five minute intervals.
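Before choosing which sensor instance to monitor, it can help to see how close each temperature reading is to its high limit. The following is a minimal Python sketch, not part of Cumulus Linux, that parses text in the format of the sensors output shown above. The function names and the margin_c parameter are hypothetical and exist only for illustration.

```python
import re

# Matches lines such as "temp3: +33.0 C (high = +68.0 C)".
# The degree sign is optional because some terminals omit it.
TEMP_LINE = re.compile(
    r"^(temp\d+):\s*\+?(-?\d+(?:\.\d+)?)\s*°?C\s*"
    r"\(high\s*=\s*\+?(-?\d+(?:\.\d+)?)\s*°?C\)"
)


def parse_temperatures(sensors_output):
    """Return {sensor_name: (current_c, high_c)} for each temperature line."""
    readings = {}
    for line in sensors_output.splitlines():
        match = TEMP_LINE.match(line.strip())
        if match:
            name, current, high = match.groups()
            readings[name] = (float(current), float(high))
    return readings


def sensors_near_limit(readings, margin_c=10.0):
    """List sensors whose reading is within margin_c degrees of the high limit."""
    return sorted(
        name for name, (current, high) in readings.items()
        if high - current <= margin_c
    )


# Sample text copied from the sensors output above (fan line is ignored).
SAMPLE = """\
fan5: 7006 RPM (min = 2500 RPM, max = 23000 RPM)
temp1: +34.0 C (high = +68.0 C)
temp3: +33.0 C (high = +68.0 C)
"""

if __name__ == "__main__":
    print(parse_temperatures(SAMPLE))
```

In practice the text would come from running the sensors command on the switch; the sample string here only stands in for that output.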
load 12 10 5
includeAllDisks 1%
monitor -r 60 -o dskPath -o DiskErrMsg "dskTable" diskErrorFlag !=0
authtrapenable 1
snmptrapd.conf
To receive SNMP traps, you can use the Net-SNMP trap daemon on the switch. The configuration file,
/etc/snmp/snmptrapd.conf, defines how incoming traps are processed. Starting with
Net-SNMP release 5.3, you must explicitly specify who is authorized to send traps and informs to the
notification receiver (and what types of processing these are allowed to trigger). There are currently three
types of processing that can be specified:
log: Logs the details of the notification, in a specified file, to standard output (or stderr), or via syslog
(or similar).
execute: Passes the details of the trap to a specified handler program, including embedded Perl.
net: Forwards the trap to another notification receiver.
Most commonly, this configuration is log,execute,net, which covers any style of processing for a particular
category of notification. However, it is possible (even desirable) to limit certain notification sources to selected
processing only.
authCommunity TYPES COMMUNITY [SOURCE [OID | -v VIEW ]] authorizes traps and SNMPv2c
INFORM requests with the specified community to trigger the types of processing listed. By default, this
allows any notification using this community to be processed. Use the SOURCE field to specify that
the configuration should only apply to notifications received from particular sources. For more information
about specific configuration options within the file, look at the snmptrapd.conf(5) man page with the
following command:
###############################################################################
#
# EXAMPLE-trap.conf:
# An example configuration file for configuring the Net-SNMP snmptrapd agent.
#
###############################################################################
#
# This file is intended to only be an example. If, however, you want
# to use it, it should be placed in /etc/snmp/snmptrapd.conf.
# When the snmptrapd agent starts up, this is where it will look for it.
#
# All lines beginning with a '#' are comments and are intended for you
# to read. All other lines are configuration commands for the agent.
#
# PLEASE: read the snmptrapd.conf(5) manual page as well!
#
# this is the default (port 162) and defines the listening
# protocol and address (e.g. udp:10.10.10.10)
snmpTrapdAddr localhost
#
# defines the actions and the community string
authCommunity log,execute,net public
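To audit which communities are authorized and for which processing types, the authCommunity lines can be extracted from the file. The following is a minimal Python sketch under stated assumptions: it handles only the TYPES, COMMUNITY and optional SOURCE fields described above, and the function name is hypothetical. It is not a full snmptrapd.conf parser.

```python
def parse_auth_communities(conf_text):
    """Return a list of (processing_types, community, source) tuples.

    Only authCommunity directives are considered; comments and every
    other directive are skipped. source is None when not specified.
    """
    entries = []
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0].lower() != "authcommunity" or len(fields) < 3:
            continue
        processing_types = fields[1].split(",")
        community = fields[2]
        source = fields[3] if len(fields) > 3 else None
        entries.append((processing_types, community, source))
    return entries


# The two active lines from the example configuration above.
SAMPLE_CONF = """\
# this is the default (port 162)
snmpTrapdAddr localhost
authCommunity log,execute,net public
"""

if __name__ == "__main__":
    for types, community, source in parse_auth_communities(SAMPLE_CONF):
        print(community, types, source)
```

A community that allows all three processing types from any source, as in the example file, is the broadest setting; listing the entries makes it easier to spot where a narrower set would do.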
Supported MIBs
Below are the MIBs supported by Cumulus Linux, as well as suggested uses for them. The overall Cumulus
Linux MIB is defined in /usr/share/snmp/mibs/Cumulus-Snmp-MIB.txt.
BRIDGE and Q-BRIDGE
The dot1dBasePortEntry and dot1dBasePortIfIndex tables in the BRIDGE-MIB, and the dot1qBase, dot1qFdbEntry, dot1qTpFdbEntry, dot1qTpFdbStatus, and dot1qVlanStaticName tables in the Q-BRIDGE-MIB. You must uncomment the bridge_pp.py pass_persist script in /etc/snmp/snmpd.conf.

BGP4, OSPF, OSPFV3, RIPv2
FRRouting SNMP support may be enabled to provide support for OSPF-MIB (RFC-1850), OSPFV3-MIB (RFC-5643), and BGP4-MIB (RFC-4273). To enable this support, see the FRRouting section (see page 843) above.

CUMULUS-COUNTERS-MIB
Discard counters: Cumulus Linux also includes its own counters MIB, defined in /usr/share/snmp/mibs/Cumulus-Counters-MIB.txt. It has the OID .1.3.6.1.4.1.40310.2.

CUMULUS-RESOURCE-QUERY-MIB
Cumulus Linux includes its own resource utilization MIB, which is similar to using cl-resource-query. It monitors L3 entries by host, route, nexthops, ECMP groups and L2 MAC/BPDU entries. The MIB is defined in /usr/share/snmp/mibs/Cumulus-Resource-Query-MIB.txt, and has the OID .1.3.6.1.4.1.40310.1.

CUMULUS-POE-MIB
The Cumulus Networks custom Power over Ethernet (see page 191) PoE MIB, defined in /usr/share/snmp/mibs/Cumulus-POE-MIB.txt. For devices that provide PoE, this provides users with the system wide power information in poeSystemValues as well as per interface PoeObjectsEntry values for the poeObjectsTable. Most of this information comes from the poectl command. This MIB is enabled by uncommenting the following line in /etc/snmp/snmpd.conf:

ENTITY
From RFC 4133, the temperature sensors, fan sensors, power sensors, and ports are covered.

ENTITY-SENSOR
Physical sensor information (temperature, fan, and power supply) from RFC 3433.

IF-MIB
Interface description, type, MTU, speed, MAC, admin, operation status, counters.
The IF-MIB cache is disabled by default. To enable the counter to reflect traffic statistics, remove the -y option from the SNMPDOPTS line in the /etc/default/snmpd file. The example below first shows the original line, commented out, then the modified line without the -y option:

LLDP
L2 neighbor info from lldpd (note, you need to enable the SNMP subagent (see page 301) in LLDP). lldpd needs to be started with the -x option to enable connectivity to snmpd (AgentX).

LM-SENSORS MIB
Fan speed, temperature sensor values, voltages. This is deprecated since the ENTITY-SENSOR MIB has been added.

NET-SNMP-EXTEND-MIB
See this knowledge base article on extending NET-SNMP in Cumulus Linux to include data from power supplies, fans and temperature sensors.

SNMP-TARGET

SNMPv2
SNMP counters. For information on exposing CPU and memory information via SNMP, see this knowledge base article.
The ENTITY MIB does not currently show the chassis information in Cumulus Linux.
Troubleshooting
The following commands can be used to troubleshoot potential SNMP issues:
Contents
This chapter covers ...
Configuring Cumulus Linux (see page 866)
Configuring Nutanix (see page 867)
Switch Information Displayed on Nutanix Prism (see page 870)
Troubleshooting a Nutanix Node (see page 871)
Enabling LLDP/CDP on VMware ESXi (Hypervisor on Nutanix) (see page 872)
Enabling LLDP/CDP on Nutanix Acropolis (Hypervisor on Nutanix Acropolis) (see page 873)
Troubleshooting Connections without LLDP or CDP (see page 873)
Community
5. Restart snmpd:
Configuring Nutanix
1. Log into the Nutanix Prism. Nutanix defaults to the Home menu, referred to as the Dashboard:
2. Click the gear icon in the top right corner of the dashboard, and select Network Switch:
3. Click the +Add Switch Configuration button in the Network Switch Configuration pop-up
window.
4. Fill out the Network Switch Configuration for the Top of Rack (ToR) switch configured for snmpd
in the previous section:
Host IP Addresses or Host Names: 192.168.0.171,192.168.0.172,192.168.0.173,192.168.0.174
IP addresses of Nutanix hosts connected to that particular ToR switch.
The rest of the values were not touched for this demonstration. They are usually used with
SNMP v3.
5. Save the configuration. The switch is now present in the Network Switch Configuration
menu.
6. Close the pop-up window to return to the dashboard.
The switch has been added correctly when the interfaces connected to the Nutanix hosts are visible.
The value both means that CDP is now running, and the lldp daemon on Cumulus Linux is capable of
'seeing' CDP devices.
2. After the next CDP interval, the Cumulus Linux box picks up the interface via the lldp daemon:
2. Select a MAC address to troubleshoot (e.g. 0c:c4:7a:09:a2:43 represents vmnic0 which is tied to NX-
1050-A).
3. List out all the MAC addresses associated to the bridge:
Contents
This chapter covers ...
Overview (see page 876)
Trend Analysis Using Metrics (see page 876)
Generating Alerts with Triggered Logging (see page 876)
Log Formatting (see page 876)
Hardware (see page 877)
System Data (see page 879)
CPU Idle Time (see page 879)
Disk Usage (see page 880)
Process Restart (see page 881)
Layer 1 Protocols and Interfaces (see page 881)
Layer 2 Protocols (see page 888)
Layer 3 Protocols (see page 890)
BGP (see page 890)
OSPF (see page 891)
Route and Host Entries (see page 891)
Routing Logs (see page 892)
Logging (see page 893)
Overview
This document describes:
Metrics that you can poll from Cumulus Linux and use in trend analysis
Critical log messages that you can monitor for triggered alerts
Log Formatting
Most log files in Cumulus Linux use a standard presentation format. For example, consider this syslog
entry:
Hardware
The smond process provides monitoring functionality for various switch hardware elements. Minimum or
maximum values are output depending on the flags applied to the basic command. The hardware elements
and applicable commands and flags are listed in the table below.
Temperature 10 seconds
cumulus@switch:~$ smonctl -j
cumulus@switch:~$ smonctl -j -s TEMP[X]
Fan 10 seconds
cumulus@switch:~$ smonctl -j
cumulus@switch:~$ smonctl -j -s FAN[X]
PSU 10 seconds
cumulus@switch:~$ smonctl -j
cumulus@switch:~$ smonctl -j -s PSU[X]
cumulus@switch:~$ smonctl -j
cumulus@switch:~$ smonctl -j -s PSU[X]Fan[X]
cumulus@switch:~$ smonctl -j
cumulus@switch:~$ smonctl -j -s PSU[X]Temp[X]
Voltage 10 seconds
cumulus@switch:~$ smonctl -j
cumulus@switch:~$ smonctl -j -s Volt[X]
cumulus@switch:~$ ledmgrd -d
cumulus@switch:~$ ledmgrd -j
Not all switch models include a sensor for monitoring power consumption and voltage. See this
note (see page 773) for details.
High temperature
Log location: /var/log/syslog
/usr/sbin/smond : : Temp1(Board Sensor near CPU): state changed from UNKNOWN to OK
/usr/sbin/smond : : Temp2(Board Sensor Near Virtual Switch): state changed from UNKNOWN to OK
/usr/sbin/smond : : Temp3(Board Sensor at Front Left Corner): state changed from UNKNOWN to OK
/usr/sbin/smond : : Temp4(Board Sensor at Front Right Corner): state changed from UNKNOWN to OK
/usr/sbin/smond : : Temp5(Board Sensor near Fan): state changed from UNKNOWN to OK

Fan speed issues
Log location: /var/log/syslog
/usr/sbin/smond : : Fan1(Fan Tray 1, Fan 1): state changed from UNKNOWN to OK
/usr/sbin/smond : : Fan2(Fan Tray 1, Fan 2): state changed from UNKNOWN to OK
/usr/sbin/smond : : Fan3(Fan Tray 2, Fan 1): state changed from UNKNOWN to OK
/usr/sbin/smond : : Fan4(Fan Tray 2, Fan 2): state changed from UNKNOWN to OK
/usr/sbin/smond : : Fan5(Fan Tray 3, Fan 1): state changed from UNKNOWN to OK
/usr/sbin/smond : : Fan6(Fan Tray 3, Fan 2): state changed from UNKNOWN to OK
PSU failure
System Data
Cumulus Linux includes a number of ways to monitor various aspects of system data. In addition, alerts are
issued in high risk situations.
High CPU
Log location: /var/log/syslog
sysmonitor: Critically high CPU use: 99%
systemd[1]: Starting Monitor system resources (cpu, memory, disk)...
systemd[1]: Started Monitor system resources (cpu, memory, disk).
sysmonitor: High CPU use: 89%
systemd[1]: Starting Monitor system resources (cpu, memory, disk)...
Cumulus Linux 3.0 and later monitors CPU, memory, and disk space via sysmonitor. The configurations
for the thresholds are stored in /etc/cumulus/sysmonitor.conf. More information is available with
man sysmonitor.
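The sysmonitor messages in /var/log/syslog can be turned into alerts by matching them with a regular expression. The following is a minimal Python sketch that assumes the two message forms shown in this chapter ("Critically high CPU use" and "High CPU use"). The severity labels "critical" and "warning" are hypothetical names chosen for this example, not sysmonitor terminology.

```python
import re

# Matches the two sysmonitor CPU message forms shown in this chapter.
CPU_ALERT = re.compile(r"sysmonitor: (Critically high|High) CPU use: (\d+)%")


def cpu_alerts(lines):
    """Return a list of (severity, percent) tuples found in the log lines."""
    alerts = []
    for line in lines:
        match = CPU_ALERT.search(line)
        if match:
            severity = "critical" if match.group(1) == "Critically high" else "warning"
            alerts.append((severity, int(match.group(2))))
    return alerts


# Sample lines copied from the log excerpt in this chapter.
SAMPLE_LINES = [
    "sysmonitor: Critically high CPU use: 99%",
    "systemd[1]: Starting Monitor system resources (cpu, memory, disk)...",
    "sysmonitor: High CPU use: 89%",
]

if __name__ == "__main__":
    for severity, percent in cpu_alerts(SAMPLE_LINES):
        print(severity, percent)
```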
Differences between Cumulus Linux 2.5 ESR and 3.0 and later:
High CPU
Log location: /var/log/syslog
jdoo[2803]: 'localhost' cpu system usage of 41.1% matches resource limit [cpu system usage>30.0%]
jdoo[4727]: 'localhost' sysloadavg(15min) of 111.0 matches resource limit [sysloadavg(15min)>110.0]
In Cumulus Linux 2.5, CPU logs are created with each unique threshold:
User 70%
System 30%
Wait 20%
In Cumulus Linux 2.5, CPU and memory warnings are generated with jdoo. The configuration for the
thresholds is stored in /etc/jdoo/jdoorc.d/cl-utilities.rc.
Disk Usage
When monitoring disk utilization, you can exclude tmpfs from monitoring.
cumulus@switch:~$ /bin/df -x tmpfs
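The output of the df command can be checked against a usage threshold. The following is a minimal Python sketch that assumes the standard six-column df layout (Filesystem, 1K-blocks, Used, Available, Use%, Mounted on). The function name, the threshold, and the sample rows are hypothetical and only illustrate the parsing.

```python
def filesystems_over(df_output, threshold_pct):
    """Return [(mount_point, used_pct)] for filesystems at or above the threshold."""
    over = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expect at least six columns with a percentage in the fifth.
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        used_pct = int(fields[4].rstrip("%"))
        if used_pct >= threshold_pct:
            mount_point = " ".join(fields[5:])  # tolerate spaces in the path
            over.append((mount_point, used_pct))
    return over


# Hypothetical sample in the standard df layout; not taken from a real switch.
SAMPLE_DF = """\
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda4        6049280 5500000    549280  91% /
/dev/sda2         126931    2000    124931   2% /boot
"""

if __name__ == "__main__":
    print(filesystems_over(SAMPLE_DF, 90))
```

Long device names can cause df to wrap a row onto two lines; running df with the POSIX -P option keeps each filesystem on one line and makes this kind of parsing more predictable.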
Process Restart
In Cumulus Linux 3.0 and later, systemd is responsible for monitoring and restarting processes.
cumulus@switch:~$ systemctl status
Changes from Cumulus Linux 2.5 ESR to 3.0 and later:
Cumulus Linux 2.5.2 through 2.5 ESR uses a forked version of monit called jdoo to monitor processes. If
the process fails, jdoo invokes init.d to restart the process.
cumulus@switch:~$ ps -aux
Link state
Link speed
Port state
Bond state
Interface counters are obtained from either querying the hardware or the Linux kernel. The two outputs
should align, but the Linux kernel aggregates the output from the hardware.
Interface counters (10 seconds)
cumulus@switch:~$ cat /sys/class/net/[iface]/statistics/[stat_name]
cumulus@switch:~$ net show counters json
cumulus@switch:~$ cl-netstat -j
cumulus@switch:~$ ethtool -S [iface]
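The kernel counters under /sys/class/net/[iface]/statistics/ are plain integers, so a poller can read them directly and compute the change between two polls. The following is a minimal Python sketch of that idea. The function names and the sysfs_root parameter (added so the code can be pointed at a test directory) are hypothetical.

```python
import os


def read_counter(iface, stat_name, sysfs_root="/sys/class/net"):
    """Read one kernel interface counter, for example rx_bytes, as an integer."""
    path = os.path.join(sysfs_root, iface, "statistics", stat_name)
    with open(path) as handle:
        return int(handle.read().strip())


def counter_delta(previous, current):
    """Return the per-counter difference between two polls.

    Both arguments are dicts of {stat_name: value}; only counters
    present in both polls are reported.
    """
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous
    }


if __name__ == "__main__":
    # "lo" exists on most Linux systems; on a switch, use a name such as swp1.
    print(read_counter("lo", "rx_bytes"))
```

Polling at the 10 second interval listed above and storing the deltas rather than the raw values gives a rate that is easier to trend.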
Link failure/Link flap
Log location: /var/log/switchd.log
switchd[5692]: nic.c:213 nic_set_carrier: swp17: setting kernel carrier: down
switchd[5692]: netlink.c:291 libnl: swp1, family 0, ifi 20, oper down
switchd[5692]: nic.c:213 nic_set_carrier: swp1: setting kernel carrier: up
switchd[5692]: netlink.c:291 libnl: swp17, family 0, ifi 20, oper up

Unidirectional link
Log locations: /var/log/switchd.log, /var/log/ptm.log
ptmd[7146]: ptm_bfd.c:2471 Created new session 0x1 with peer 10.255.255.11 port swp1
ptmd[7146]: ptm_bfd.c:2471 Created new session 0x2 with peer fe80::4638:39ff:fe00:5b port swp1
ptmd[7146]: ptm_bfd.c:2471 Session 0x1 down to peer 10.255.255.11, Reason 8
ptmd[7146]: ptm_bfd.c:2471 Detect timeout on session 0x1 with peer 10.255.255.11, in state 1
Bond Negotiation Working
Log location: /var/log/syslog
kernel: [85412.763193] bonding: bond0 is being created...
kernel: [85412.770014] bond0: Enslaving swp2 as a backup interface with an up link
kernel: [85412.775216] bond0: Enslaving swp1 as a backup interface with an up link
kernel: [85412.797393] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
kernel: [85412.799425] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

Bond Negotiation Failing
Log location: /var/log/syslog
kernel: [85412.763193] bonding: bond0 is being created...
MLAG peerlink negotiation Working
Log location: /var/log/syslog
lldpd[998]: error while receiving frame on swp50: Network is down
lldpd[998]: error while receiving frame on swp49: Network is down
kernel: [76174.262893] peerlink: Setting ad_actor_system to 44:38:39:00:00:11
kernel: [76174.264205] 8021q: adding VLAN 0 to HW filter on device peerlink
mstpd: one_clag_cmd: setting (1) peer link: peerlink
mstpd: one_clag_cmd: setting (1) clag state: up
mstpd: one_clag_cmd: setting system-mac 44:38:39:ff:40:94
mstpd: one_clag_cmd: setting clag-role secondary

MLAG peerlink negotiation Failing
Log location: /var/log/syslog
lldpd[998]: error while receiving frame on swp50: Network is down
lldpd[998]: error while receiving frame on swp49: Network is down
kernel: [76174.262893] peerlink: Setting ad_actor_system to 44:38:39:00:00:11
kernel: [76174.264205] 8021q: adding VLAN 0 to HW filter on device peerlink
mstpd: one_clag_cmd: setting (1) peer link: peerlink
mstpd: one_clag_cmd: setting (1) clag state: down
mstpd: one_clag_cmd: setting system-mac 44:38:39:ff:40:94
mstpd: one_clag_cmd: setting clag-role secondary

MLAG port negotiation Working
Log location: /var/log/syslog
kernel: [77419.112195] bonding: server01 is being created...
lldpd[998]: error while receiving frame on swp1: Network is down
kernel: [77419.122707] 8021q: adding VLAN 0 to HW filter on device swp1
kernel: [77419.126408] server01: Enslaving swp1 as a backup interface with a down link
MLAG port negotiation Failing
Log location: /var/log/syslog
kernel: [79290.290999] bonding: server01 is being created...
kernel: [79290.299645] 8021q: adding VLAN 0 to HW filter on device swp1
kernel: [79290.301790] server01: Enslaving swp1 as a backup interface with a down link
kernel: [79290.358294] server01: Setting ad_actor_system to 44:38:39:ff:40:94
kernel: [79290.373590] server01: Warning: No 802.3ad response from the link partner for any adapters in the bond
kernel: [79290.374024] IPv6: ADDRCONF(NETDEV_UP): server01: link is not ready
kernel: [79290.374028] 8021q: adding VLAN 0 to HW filter on device server01

MLAG port negotiation Flapping
Log location: /var/log/syslog
mstpd: one_clag_cmd: setting (0) mac 00:00:00:00:00:00 <server01, None>
mstpd: one_clag_cmd: setting (1) mac 44:38:39:00:00:03 <server01, None>
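The switchd carrier messages listed earlier for link failure and link flap can be counted per interface to tell a single failure from a flapping link. The following is a minimal Python sketch that assumes the nic_set_carrier message format shown in this chapter, with each log entry on one line. The function names and the min_down_events threshold are hypothetical.

```python
import re
from collections import Counter

# Matches "nic_set_carrier: swp17: setting kernel carrier: down" (or "up").
CARRIER = re.compile(r"nic_set_carrier:\s*(\S+): setting kernel carrier: (up|down)")


def carrier_transitions(lines):
    """Count carrier events, keyed by (interface, state)."""
    counts = Counter()
    for line in lines:
        match = CARRIER.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def flapping_interfaces(lines, min_down_events=2):
    """Return interfaces that went down at least min_down_events times."""
    counts = carrier_transitions(lines)
    return sorted(
        iface
        for (iface, state), seen in counts.items()
        if state == "down" and seen >= min_down_events
    )


# Sample lines in the switchd.log format shown in this chapter.
SAMPLE_LOG = [
    "switchd[5692]: nic.c:213 nic_set_carrier: swp17: setting kernel carrier: down",
    "switchd[5692]: nic.c:213 nic_set_carrier: swp17: setting kernel carrier: up",
    "switchd[5692]: nic.c:213 nic_set_carrier: swp17: setting kernel carrier: down",
    "switchd[5692]: nic.c:213 nic_set_carrier: swp1: setting kernel carrier: up",
]

if __name__ == "__main__":
    print(flapping_interfaces(SAMPLE_LOG))
```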
Prescriptive Topology Manager (PTM) uses LLDP information to compare against a topology.dot file that
describes the network. It has built in alerting capabilities, so it is preferable to use PTM on box rather than
polling LLDP information regularly. The PTM code is available on the Cumulus Networks GitHub repository.
Additional PTM, BFD, and associated logs are documented in the code.
Cumulus Networks recommends that you track peering information through PTM. For more
information, refer to the Prescriptive Topology Manager documentation (see page 301).
cumulus@switch:~$ lldpctl -f json
Layer 2 Protocols
Spanning tree is a protocol that prevents loops in a layer 2 infrastructure. In a stable state, the spanning
tree topology should remain converged. Monitoring the Topology Change Notifications (TCN) in STP helps
identify when new BPDUs are received.
Spanning Tree Working
Log location: /var/log/syslog
kernel: [1653877.190724] device swp1 entered promiscuous mode
kernel: [1653877.190796] device swp2 entered promiscuous mode
mstpd: create_br: Add bridge bridge
mstpd: clag_set_sys_mac_br: set bridge mac 00:00:00:00:00:00
mstpd: create_if: Add iface swp1 as port#2 to bridge bridge
mstpd: set_if_up: Port swp1 : up
mstpd: create_if: Add iface swp2 as port#1 to bridge bridge
mstpd: set_if_up: Port swp2 : up
mstpd: set_br_up: Set bridge bridge up
mstpd: MSTP_OUT_set_state: bridge:swp1:0 entering blocking state(Disabled)
mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering blocking state(Disabled)
mstpd: MSTP_OUT_flush_all_fids: bridge:swp1:0 Flushing forwarding database
mstpd: MSTP_OUT_flush_all_fids: bridge:swp2:0 Flushing forwarding database
mstpd: MSTP_OUT_set_state: bridge:swp1:0 entering learning state(Designated)
mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering learning state(Designated)
sudo: pam_unix(sudo:session): session closed for user root
mstpd: MSTP_OUT_set_state: bridge:swp1:0 entering forwarding state(Designated)
mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering forwarding state(Designated)
mstpd: MSTP_OUT_flush_all_fids: bridge:swp2:0 Flushing forwarding database
mstpd: MSTP_OUT_flush_all_fids: bridge:swp1:0 Flushing forwarding database

Spanning Tree Blocking
Log location: /var/log/syslog
mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering blocking state(Designated)
mstpd: MSTP_OUT_set_state: bridge:swp2:0 entering learning state(Designated)
Layer 3 Protocols
When FRRouting boots up for the first time, there is a different log file for each daemon that is activated. If
the log file is ever edited (for example, through vtysh or frr.conf), the integrated configuration sends all
logs to the same file.
To send FRRouting logs to syslog, apply the configuration log syslog in vtysh.
BGP
When monitoring BGP, check if BGP peers are operational. There is not much value in alerting on the
current operational state of the peer; monitoring the transition is more valuable, which you can do by
monitoring syslog.
Monitoring the routing table provides trending on the size of the infrastructure. This is especially useful
when integrated with host-based solutions (such as Routing on the Host) when the routes track with the
number of applications available.
BGP peer down
Log locations: /var/log/syslog, /var/log/frr/*.log
bgpd[3000]: %NOTIFICATION: sent to neighbor swp1 4/0 (Hold Timer Expired) 0 bytes
bgpd[3000]: %ADJCHANGE: neighbor swp1 Down BGP Notification send
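Because the transition matters more than the current state, an alerting script can watch for %ADJCHANGE entries rather than polling peer status. The following is a minimal Python sketch that assumes the bgpd message format shown above. The "Up" form of the message is an assumption, since only the "Down" form appears in this chapter, and the function name is hypothetical.

```python
import re

# Matches "bgpd[3000]: %ADJCHANGE: neighbor swp1 Down ...".
# The "Up" alternative is an assumption; only "Down" is shown in this chapter.
ADJCHANGE = re.compile(r"bgpd\[\d+\]: %ADJCHANGE: neighbor (\S+) (Up|Down)\b")


def bgp_transitions(lines):
    """Return a list of (neighbor, state) tuples in log order."""
    transitions = []
    for line in lines:
        match = ADJCHANGE.search(line)
        if match:
            transitions.append((match.group(1), match.group(2)))
    return transitions


# Sample lines copied from the log excerpt above.
SAMPLE_BGP_LOG = [
    "bgpd[3000]: %NOTIFICATION: sent to neighbor swp1 4/0 (Hold Timer Expired) 0 bytes",
    "bgpd[3000]: %ADJCHANGE: neighbor swp1 Down BGP Notification send",
]

if __name__ == "__main__":
    for neighbor, state in bgp_transitions(SAMPLE_BGP_LOG):
        print(neighbor, state)
```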
OSPF
When monitoring OSPF, check if OSPF peers are operational. There is not much value in alerting on the
current operational state of the peer; monitoring the transition is more valuable, which you can do by
monitoring syslog.
Monitoring the routing table provides trending on the size of the infrastructure. This is especially useful
when integrated with host-based solutions (such as Routing on the Host) when the routes track with the
number of applications available.
cumulus@switch:~$ cl-resource-query
cumulus@switch:~$ cl-resource-query -k
cumulus@switch:~$ cl-resource-query
cumulus@switch:~$ cl-resource-query -k
Routing Logs
Routing protocol process crash
Log location: /var/log/syslog
frrouting[1824]: Starting FRRouting daemons (prio:10):. zebra. bgpd.
bgpd[1847]: BGPd 1.0.0+cl3u7 starting: vty@2605, bgp@<all>:179
zebra[1840]: client 12 says hello and bids fair to announce only bgp routes
watchquagga[1853]: watchquagga 1.0.0+cl3u7 watching [zebra bgpd], mode [phased zebra restart]
watchquagga[1853]: bgpd state -> up : connect succeeded
watchquagga[1853]: bgpd state -> down : read returned EOF
cumulus-core: Running cl-support for core files bgpd.3030.1470341944.core.core_helper
core_check.sh[4992]: Please send /var/support/cl_support__spine01_20160804_201905.tar.xz to Cumulus support
watchquagga[1853]: Forked background command [pid 6665]: /usr/sbin/service frr restart bgpd
watchquagga[1853]: watchquagga 0.99.24+cl3u2 watching [zebra bgpd ospfd], mode [phased zebra restart]
watchquagga[1853]: zebra state -> up : connect succeeded
Logging
The table below describes the various log files.
syslog
Catch all log file. Identifies memory leaks and CPU spikes.
Log location: /var/log/syslog
emo
n.
log
Routing protocol
The log file is configurable in FRRouting. When FRRouting first boots, it uses the non-integrated configuration so each routing protocol has its own log file. After booting up, FRRouting switches over to using the integrated configuration, so that all logs go to a single place.
To edit the location of the log files, use the log file <location> command. By default, FRRouting logs are not sent to syslog. Use the log syslog <level> command to send logs through rsyslog and into /var/log/syslog.
Log locations: /var/log/frr/zebra.log, /var/log/frr/{protocol}.log, /var/log/frr/frr.log
NTP
Run the following command to confirm that the NTP process is working correctly and that the switch clock
is in sync with NTP:
cumulus@switch:~$ /usr/bin/ntpq -p
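In the ntpq -p peer listing, the peer currently selected for synchronization is marked with a leading asterisk, and the ninth column holds the offset in milliseconds. The following is a minimal Python sketch that extracts that peer from the text output. The function name is hypothetical, and the sample uses documentation-range addresses rather than output from a real switch.

```python
def selected_peer(ntpq_output):
    """Return (remote, offset_ms) for the peer marked with '*', or None.

    A return value of None means no peer is currently selected, so the
    clock should not be considered synchronized.
    """
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            # Columns: remote refid st t when poll reach delay offset jitter
            fields = line[1:].split()
            if len(fields) >= 10:
                return fields[0], float(fields[8])
    return None


# Hypothetical sample in the ntpq -p layout.
SAMPLE_NTPQ = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.1       .GPS.            1 u   35   64  377    1.234    0.567   0.089
+192.0.2.2       192.0.2.1        2 u   12   64  377    2.345   -0.210   0.120
"""

if __name__ == "__main__":
    print(selected_peer(SAMPLE_NTPQ))
```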
Device Management
User Authentication and Remote Login
Log location: /var/log/syslog
sshd[31830]: Accepted publickey for cumulus from 192.168.0.254 port 45582 ssh2: RSA 38:e6:3b:cc:04:ac:41:5e:c9:e3:93:9d:cc:9e:48:25
sshd[31830]: pam_unix(sshd:session): session opened for user cumulus by (uid=0)

Executing commands using sudo
Log location: /var/log/syslog
sudo: cumulus : TTY=unknown ; PWD=/home/cumulus ; USER=root ; COMMAND=/tmp/script_9938.sh -v
sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
sudo: pam_unix(sudo:session): session closed for user root
Logs
Network Solutions
Contents
This chapter covers ...
Layer 2 - Architecture (see page 897)
Traditional Spanning Tree - Single Attached (see page 897)
MLAG (see page 899)
Layer 3 Architecture (see page 901)
Single-attached Hosts (see page 901)
Redistribute Neighbor (see page 903)
Routing on the Host (see page 904)
Routing on the VM (see page 905)
Virtual Router (see page 906)
Anycast with Manual Redistribution (see page 907)
Network Virtualization (see page 909)
LNV with MLAG (see page 909)
Layer 2 - Architecture
auto eth1
iface eth1 inet manual
auto eth1.10
iface eth1.10 inet manual
auto eth2
iface eth2 inet manual
auto eth2.20
iface eth2.20 inet manual
auto br-10
iface br-10 inet manual
bridge-ports eth1.10 vnet0
auto br-20
iface br-20 inet manual
bridge-ports eth2.20 vnet1
MLAG
MLAG (see page 348) (multi-chassis link aggregation) is when both uplinks are utilized at the same time. VRR gives the ability for both
spines to act as gateways simultaneously for HA (high availability) and active-active mode (see page 435) (both are being used at the same
time).
Benefits
100% of links utilized
Caveats
More complicated (more moving parts)
More configuration
No interoperability between vendors
ISL (inter-switch link) required
Additional Comments
Can be done with either the traditional (see page 319) or VLAN-aware (see page 325) bridge driver depending on overall STP needs
There are a few different solutions including Cisco VPC and Arista MLAG, but none of them interoperate and are very vendor specific
Cumulus Networks Layer 2 HA validated design guide
Configurations
leaf01 Config
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports host-01 peerlink
bridge-vids 1-2000
bridge-stp on
auto bridge.10
iface bridge.10
address 172.16.1.2/24
address-virtual 44:38:39:00:00:10 172.16.1.1/24
auto peerlink
iface peerlink
bond-slaves glob swp49-50
auto peerlink.4094
iface peerlink.4094
address 169.254.1.2
clagd-enable yes
clagd-peer-ip 169.254.1.2
clagd-system-mac 44:38:39:FF:40:94
auto host-01
iface host-01
bond-slaves swp1
clag-id 1
{bond-defaults removed for brevity}
auto bond0
iface bond0 inet manual
bond-slaves eth0 eth1
{bond-defaults removed for brevity}
auto bond0.10
iface bond0.10 inet manual
auto vm-br10
iface vm-br10 inet manual
bridge-ports bond0.10 vnet0
Layer 3 Architecture
Single-attached Hosts
ip ospf area 0
leaf02 Config
/etc/network/interfaces
auto swp1
iface swp1
address 172.16.2.1/30
/etc/frr/frr.conf
router ospf
router-id 10.0.0.12
interface swp1
ip ospf area 0
auto eth1
iface eth1 inet static
address 172.16.1.2/30
up ip route add 0.0.0.0/0 nexthop via 172.16.1.1
auto eth1
iface eth1 inet static
address 172.16.2.2/30
up ip route add 0.0.0.0/0 nexthop via 172.16.2.1
No redundancy, uses single ToR as gateway.
Big Data validated design guide uses single attached ToR
Redistribute Neighbor
Equal cost route installed on server/host/hypervisor to both ToRs to load balance evenly.
Cumulus Networks blog post introducing redistribute neighbor
Certain hypervisors or host OSes might not support a routing application like FRRouting and will require a virtual router on the hypervisor
No L2 adjacency between servers without VXLAN
The first hop is still the ToR, just like redistribute neighbor
A default route can be advertised by all leaf/ToRs for dynamic ECMP paths
Routing on the Host: An Introduction
Installing the Cumulus Linux FRRouting Package on an Ubuntu Server
Configuring FRRouting (see page 606)
Routing on the VM
Caveats
All VMs must be capable of routing
Scale considerations might need to be taken into account: instead of one routing process, there are as many as there are VMs
No L2 adjacency between servers without VXLAN
The first hop is still the ToR, just like redistribute neighbor
Multiple ToRs (2+) can be used
Routing on the host: An Introduction
Installing the Cumulus Linux FRRouting Package on an Ubuntu Server
Configuring FRRouting (see page 606)
Virtual Router
In contrast to routing on the host (preferred), this method allows a user to route to the host. The ToRs are the gateway, as with redistribute neighbor, except that, because there is no routing daemon running on the host, the networks must be manually configured under the routing process. There is a potential to black hole traffic unless a script is run to remove the routes when the host no longer responds.
Benefits
Most benefits of routing on the host
No requirement for host to run routing
No requirement for redistribute neighbor
Caveats
Removing a subnet from one ToR and re-adding it to another (hence, network statements from your router process) is a manual process
Network team and server team would have to be in sync, or server team controls the ToR, or automation is being used whenever VM migration happens
Configurations
leaf01 Config
/etc/network/interfaces
auto swp1
iface swp1
address 172.16.1.1/30
/etc/frr/frr.conf
router ospf
router-id 10.0.0.11
interface swp1
ip ospf area 0
leaf02 Config
/etc/network/interfaces
auto lo
iface lo inet loopback
auto lo:1
iface lo:1 inet static
address 172.16.1.2/32
up ip route add 0.0.0.0/0 nexthop via 172.16.1.1 dev eth0 onlink nexthop via 172.16.1.1 dev eth1 onlink
auto eth1
iface eth1 inet static
address 172.16.1.2/32
auto eth2
iface eth2 inet static
address 172.16.1.2/32
Network Virtualization
auto vni-10
iface vni-10
vxlan-id 10
vxlan-local-tunnelip 10.0.0.11
auto br-10
iface br-10
bridge-ports swp1 vni-10
leaf02 Config
/etc/network/interfaces
auto lo
iface lo inet loopback
address 10.0.0.12/32
vxrd-src-ip 10.0.0.12
vxrd-svcnode-ip 10.10.10.10
clagd-vxlan-anycast-ip 36.0.0.11
auto vni-10
iface vni-10
vxlan-id 10
vxlan-local-tunnelip 10.0.0.12
auto br-10
iface br-10
bridge-ports swp1 vni-10
More Information
Contents
This chapter covers ...
Reference Topology (see page 911)
IP and MAC Addressing (see page 911)
Reference Topology
The Cumulus Networks reference topology includes cabling (in DOT format for dual use with PTM (see page
301)), MAC addressing, IP addressing, switches and servers. This topology is endorsed by the Professional
Services Team at Cumulus Networks and fits a majority of designs seen in the field.
edge01 192.168.0.51 A0:00:00:00:00:51 10g NICs (customer edge device, firewall, load balancer, etc.)
Virtual Appliance
You can build out the reference topology in hardware or using Cumulus VX (the free Cumulus Networks
virtual appliance). The Cumulus Reference Topology using Vagrant is essentially the reference topology built
out inside Vagrant with VirtualBox or KVM. The installation and setup instructions for bringing up the entire
reference topology on a laptop or server are on the cldemo-vagrant GitHub repo.
Hardware
Any switch from the hardware compatibility list is compatible with the topology as long as you follow the
interface count from the table above. Of course, in your own production environment, you don't have to
use exactly the same devices and cabling as outlined above.
Demos
You can find an up-to-date list of all the demos in the cldemo-vagrant GitHub repository, which is available
to anyone free of charge.
...
2. Create the /etc/apt/sources.list.d/docker.list file, add the following line in a text editor,
and save the file:
...
docker
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon --iptables=false --ip-masq=false --ip-forward=false
Performance Notes
Keep in mind switches are not servers, in terms of the hardware that drives them. As such, you should be
mindful of the types of applications you want to run in containers on a Cumulus Linux switch. In general,
depending upon the configuration of the container, you can expect DHCP servers, custom scripts and other
lightweight services to run well. However, VPN, NAT and encryption-type services are CPU-intensive and
could lead to undesirable effects on critical applications. Use of any resource-intensive services should be
avoided and is not supported.
Contents
Configuring the REST API (see page 916)
Installing and Configuring the Cumulus Networks Modular Layer 2 Mechanism Driver (see page 917)
Demo (see page 917)
[ML2]
#local_bind = 10.40.10.122
#service_node = 10.40.10.1
2. Restart the REST API service for the configuration changes to take effect:
916 02 March 2018
Cumulus Networks
Additional REST API calls have been added to support the configuration of bridge using the bridge name
instead of network ID.
[ml2_cumulus]
switches="192.168.10.10,192.168.20.20"
The ML2 mechanism driver contains the following configurable parameters. You configure them in the
/etc/neutron/plugins/ml2/ml2_conf.ini file.
switches: The list of Cumulus Linux switches connected to the Neutron host. Specify a list of IP addresses.
scheme: The scheme (for example, HTTP) for the base URL for the ML2 API.
protocol_port: The protocol port for the base URL for the ML2 API. The default value is 8000.
sync_time: A periodic time interval for polling the Cumulus Linux switch. The default value is 30 seconds.
spf_enable: Enables/disables SPF for the bridge. The default value is False.
new_bridge: Enables/disables VLAN-aware bridge mode (see page 325) for the bridge configuration. The default value is False, so a traditional mode bridge is created.
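To illustrate how the parameters above fit together, the following is a minimal Python sketch that parses an [ml2_cumulus] section and derives one base URL per switch. It is not the driver's own code: the sample values for scheme, sync_time, spf_enable and new_bridge, the fallback scheme of http, and the helper names load_ml2_cumulus and base_urls are assumptions made for the example. Only the defaults for protocol_port, sync_time, spf_enable and new_bridge come from the list above.

```python
import configparser

# Hypothetical sample; only the switches line appears in this guide.
SAMPLE = """
[ml2_cumulus]
switches="192.168.10.10,192.168.20.20"
scheme=http
protocol_port=8000
sync_time=30
spf_enable=False
new_bridge=False
"""


def load_ml2_cumulus(text):
    """Parse the [ml2_cumulus] section, applying the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["ml2_cumulus"]
    # The value may be wrapped in double quotes, as in the example above.
    raw = section.get("switches", fallback="").strip('"')
    switches = [ip.strip() for ip in raw.split(",") if ip.strip()]
    return {
        "switches": switches,
        # No default scheme is documented; "http" is an assumption.
        "scheme": section.get("scheme", fallback="http"),
        "protocol_port": section.getint("protocol_port", fallback=8000),
        "sync_time": section.getint("sync_time", fallback=30),
        "spf_enable": section.getboolean("spf_enable", fallback=False),
        "new_bridge": section.getboolean("new_bridge", fallback=False),
    }


def base_urls(cfg):
    """Build scheme://ip:port for every configured switch."""
    return [
        f"{cfg['scheme']}://{ip}:{cfg['protocol_port']}"
        for ip in cfg["switches"]
    ]


cfg = load_ml2_cumulus(SAMPLE)
urls = base_urls(cfg)
```

The sketch stops at the base URL on purpose, because the REST API paths are not listed in this section.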
Demo
A demo involving OpenStack with Cumulus Linux is available in the Cumulus Networks knowledge base. It
demonstrates dynamic provisioning of VLANs using a virtual simulation of two Cumulus VX leaf switches
and two CentOS 7 (RDO Project) servers; collectively they comprise an OpenStack environment.
Anycast Architecture
Anycast relies on layer 3 equal cost multipath functionality to provide load sharing throughout the network.
Each server announces a route for a service. As the route is propagated through the network, each network
device sees the route as originating from multiple places. As an end user connects to the anycast IP, each
network device performs a hardware hash of the layer 3 and layer 4 headers to determine which path to
use.
Every packet in a flow from an end user has the same source and destination IP address as well as source
and destination port numbers. The hash performed by the network devices results in the same answer for
every packet, ensuring all packets in a flow are sent to the same destination.
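The per-flow behavior described above can be sketched in Python. This is only a model: the real hash is computed in the switch hardware and is not documented here, so SHA-256 stands in for it, and the client address 10.1.1.50, the source port, and the next hop names are hypothetical.

```python
import hashlib


def flow_hash(protocol, src_ip, dst_ip, src_port, dst_port):
    """Stand-in for the hardware hash over the layer 3 and layer 4 headers."""
    key = f"{protocol}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_next_hop(flow, next_hops):
    """Select a next hop by taking the hash modulo the number of next hops."""
    return next_hops[flow_hash(*flow) % len(next_hops)]


NEXT_HOPS = ["leaf01", "leaf02", "leaf03", "leaf04"]

# One DNS flow to the anycast address; every packet carries the same 5-tuple.
flow = ("udp", "10.1.1.50", "172.16.255.66", 32768, 53)

# Hashing the same 5-tuple repeatedly always yields the same next hop.
choices = {pick_next_hop(flow, NEXT_HOPS) for _ in range(1000)}
```

Because the inputs to the hash never change within a flow, the set of choices contains exactly one next hop, which is what keeps all packets of a flow on the same path.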
In the following image, the client initiates two flows: the blue, dotted flow and the red dashed flow. Each
flow has the same source IP address (the client’s IP address), destination IP address (172.16.255.66) and
same destination port (depending on the service; for example, DNS is port 53). Each flow has a unique
source port generated by the client.
In this example, each flow hashes to different servers based on this source port, which you can see when
you run ip route show to the destination IP address:
On a Cumulus Linux switch, you can see the hardware hash with the cl-ecmpcalc command. In Figure 2,
two flows originate from a remote user destined to the anycast IP address. Each session has a different
source port. Using the cl-ecmpcalc command, you can see that the sessions were hashed to different
egress ports.
As previously described, every packet in a flow hashes to the same next hop. However, if that next hop is no
longer valid, the traffic flows to another anycast next hop instead. For example, in the image below, if leaf03
fails, traffic flows to a different anycast server; in this case, server04:
For stateless applications that rely on UDP, like DNS, this does not present a problem. However, for stateful
applications that rely on TCP, like HTTP, this breaks any existing traffic flows, such as a file download. If the
TCP three-way handshake was established on server03, after the failure, server04 would have no
connection built and would send a TCP reset message back to the client, restarting the session.
This is not to say that it is not possible to use TCP-based applications for anycast. However, TCP
applications in an anycast environment should have short-lived flows (measured in seconds or less) to
reduce the impact of network changes or failures.
Resilient Hashing
Resilient hashing (see page 680) provides a method to prevent failures from impacting the hash result of
unrelated flows. However, resilient hashing does not prevent rehashing when new next hops are added.
As previously mentioned, the hardware hashing function determines which path gets used for a given flow.
The simplified version of that hash is the combination of protocol, source IP address, destination IP
address, source layer 4 port and destination layer 4 port. The full hashing function includes not only these
fields but also the list of possible layer 3 next hop addresses. The hash result is passed through a modulo of
the number of next hop addresses. If the number of next hop addresses changes, through either addition
or removal of next hops, this changes the result for all traffic, including flows that are already
established.
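The effect of the modulo can be seen in a short Python sketch. As before, SHA-256 is only a stand-in for the hardware hash, and the client address and port range are hypothetical. The sketch counts flows that move between surviving next hops when the next hop list grows from three entries to four.

```python
import hashlib


def flow_hash(flow):
    """Stand-in for the hardware hash over the 5-tuple."""
    key = "|".join(str(field) for field in flow).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick(flow, next_hops):
    """Plain (non-resilient) selection: hash modulo the number of next hops."""
    return next_hops[flow_hash(flow) % len(next_hops)]


# 1000 hypothetical HTTP flows that differ only in source port.
flows = [("tcp", "10.1.1.50", "172.16.255.66", 40000 + i, 80) for i in range(1000)]

three = ["leaf01", "leaf02", "leaf04"]            # leaf03 failed
four = ["leaf01", "leaf02", "leaf03", "leaf04"]   # leaf03 restored

# Flows whose next hop changed at all when leaf03 came back.
changed = [f for f in flows if pick(f, three) != pick(f, four)]

# Flows that changed next hop without even landing on leaf03, that is,
# flows shuffled between next hops that never failed.
moved = [f for f in changed if pick(f, four) != "leaf03"]
```

In this model a large share of the flows ends up in moved, which illustrates why a change in the number of next hops affects traffic that has nothing to do with the device that changed state.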
Continuing with the example in Figure 3, leaf03 is in a failed state, so traffic is hashing to server04. This is a
result of the hash considering three possible next hop IPs (leaf01, leaf02, leaf04). When leaf03 is brought
back online, the number of possible next hop IPs grows to four. This changes the modulo value that is part
of the hashing function, which may result in traffic being sent to a different server, even if previously
unaffected by the change.
As you can see below, leaf03 is in a failed state. The blue dotted flow uses leaf02 to reach server02.
As leaf03 is brought back into service, the hashing function on spine02 changes, impacting the blue dotted
flow:
Just as the addition of a device can impact unrelated traffic, the removal of a device can also impact
unrelated traffic, since again, the modulo of the hash function is changed. You can see this below, where
the blue dotted flow goes through leaf01 and the red dashed line goes through leaf04.
Now, leaf02 has failed. As a result, the modulo on spine02 has changed from four possible next hops to
only three next hops. In this example, the red dashed line has rehashed to leaf03:
To help solve this issue, resilient hashing can prevent traffic flows from shifting on unrelated failure
scenarios. With resilient hashing enabled, the failure of leaf02 does not impact either of the existing flows, since
neither currently flows through leaf02:
Although resilient hashing can prevent rehashing on next hop failure, it cannot prevent rehashing on next
hop addition.
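One way to picture this behavior is a fixed-size table of hash buckets, where a failure rewrites only the buckets that pointed at the failed next hop. The Python sketch below follows that model. The bucket count of 64, the function names, and the use of SHA-256 are assumptions for illustration and are not taken from this guide or from the switch implementation.

```python
import hashlib

NUM_BUCKETS = 64  # hypothetical table size


def flow_hash(flow):
    """Stand-in for the hardware hash over the 5-tuple."""
    key = "|".join(str(field) for field in flow).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def build_buckets(next_hops, num_buckets=NUM_BUCKETS):
    """Fill a fixed-size bucket table with next hops in round-robin order."""
    return [next_hops[i % len(next_hops)] for i in range(num_buckets)]


def remove_next_hop(buckets, failed, survivors):
    """Reassign only the buckets that pointed at the failed next hop."""
    updated = list(buckets)
    j = 0
    for i, hop in enumerate(updated):
        if hop == failed:
            updated[i] = survivors[j % len(survivors)]
            j += 1
    return updated


def pick(flow, buckets):
    """The modulo is over the fixed bucket count, not the next hop count."""
    return buckets[flow_hash(flow) % len(buckets)]


hops = ["leaf01", "leaf02", "leaf03", "leaf04"]
before = build_buckets(hops)
after = remove_next_hop(before, "leaf02", ["leaf01", "leaf03", "leaf04"])

flows = [("tcp", "10.1.1.50", "172.16.255.66", 40000 + i, 80) for i in range(1000)]

# Flows that were not using leaf02 keep their next hop after the failure.
unaffected = [f for f in flows if pick(f, before) != "leaf02"]
stable = all(pick(f, before) == pick(f, after) for f in unaffected)
```

In the same model, adding a next hop means taking buckets away from next hops that are still in service, which is consistent with the statement that rehashing on addition cannot be prevented.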
You can read more information on resilient hashing in the ECMP chapter (see page 676).
Conclusion
Anycast can provide a low cost, highly scalable implementation for services. However, the limitations
inherent in network-based ECMP make anycast challenging to integrate with some applications. An
anycast architecture is best suited for stateless applications or applications that are able to share session
state at the application layer.
Contents
This chapter covers ...
Enabling RDMA over Converged Ethernet with PFC (see page 924)
Enabling RDMA over Converged Ethernet with ECN (see page 925)
Related Information (see page 926)
On Mellanox switches, you can alternatively use NCLU to configure RoCE with PFC:
...
ecn_red.port_group_list = [ROCE_ECN]
pfc.ROCE_PFC.port_set = swp1
pfc.ROCE_PFC.cos_list = [1]
pfc.ROCE_PFC.xoff_size = 18000
pfc.ROCE_PFC.xon_delta = 18000
pfc.ROCE_PFC.tx_enable = true
pfc.ROCE_PFC.rx_enable = true
pfc.ROCE_PFC.port_buffer_bytes = 70000
ecn_red.ROCE_ECN.port_set = swp1
ecn_red.ROCE_ECN.cos_list = [0,1]
ecn_red.ROCE_ECN.min_threshold_bytes = 150000
ecn_red.ROCE_ECN.max_threshold_bytes = 1500000
ecn_red.ROCE_ECN.ecn_enable = true
ecn_red.ROCE_ECN.red_enable = true
ecn_red.ROCE_ECN.probability = 100
...
While link pause (see page 256) is another way to provide lossless Ethernet, PFC is the preferred
method. PFC allows more granular control by pausing the traffic flow for a given CoS group,
rather than the entire link.
On Mellanox switches, you can alternatively use NCLU to configure RoCE with ECN:
ecn_red.port_group_list = [ROCE_ECN]
ecn_red.ROCE_ECN.port_set = swp1
ecn_red.ROCE_ECN.cos_list = [0,1]
ecn_red.ROCE_ECN.min_threshold_bytes = 150000
ecn_red.ROCE_ECN.max_threshold_bytes = 1500000
ecn_red.ROCE_ECN.ecn_enable = true
ecn_red.ROCE_ECN.red_enable = true
ecn_red.ROCE_ECN.probability = 100
...
Related Information
RoCE introduction — roceinitiative.org
RoCEv2 congestion management — community.mellanox.com
Configuring RoCE over a DSCP-based lossless network with a Mellanox Spectrum switch
Index
4
40G ports 243
logical limitations 243
8
802.1p 247
class of service 247
802.3ad link aggregation 380
A
ABRs 619
area border routers 619
access control lists 134
access ports 343
ACL policy files 149
ACL rules 252
ACLs 134, 137, 156
chains 137
QoS 156
active-active mode 386, 435
VRR 386
VXLAN 435
active listener ports 182
Algorithm Longest Prefix Match 592
routing 592
ALPM mode 592
routing 592
AOC cables 23
apt-get 65
area border routers 619
ABRs 619
arp cache 815
ASN 635
autonomous system number 635
auto-negotiation 224
autonomous system number 635
BGP 635
autoprovisioning 71
B
BFD 306, 674
Bidirectional Forwarding Detection 306
echo function 674
BGP 633, 637, 693
Border Gateway Protocol 633
ECMP 637
virtual routing and forwarding (VRF) 693
BGP peering relationships 649, 649
external 649
internal 649
bonds 313, 380
LACP Bypass 380
boot recovery 767
bpdufilter 291
and STP 291
BPDU guard 288
and STP 288
brctl 25
bridge assurance 291
and STP 291
bridges 319, 320, 320, 321, 325, 339, 343, 343
access ports 343
adding IP addresses 321
MAC addresses 320
MTU 319
trunk ports 343
untagged frames 339
VLAN-aware 320, 325
C
cable connectivity 23
cabling 301
Prescriptive Topology Manager 301
chain 137
cl-acltool 134, 253, 816
clagctl 367
class of service 247
cl-cfg 190, 780
cl-ecmpcalc 677
cl-license 22
cl-netstat 810
cl-ospf6 632
Clos topology 599
cl-resource-query 190, 768
cl-support 759
convergence 598
routing 598
Cumulus Linux 19, 20, 29, 29, 32, 403
installing 19, 32
reprovisioning 29
uninstalling 29
upgrading 20
VXLAN 403
cumulus user 100
D
DAC cables 23
daemons 180
datapath 247, 254, 256
link pause 256
priority flow control 254
datapath.conf 247
date 94
setting 94
deb 69
debugging 757
decode-syseeprom 770
differentiated services code point 247
dmidecode 771
dpkg 67
dpkg-reconfigure 93
DSCP 247
differentiated services code point 247
DSCP marking 253
dual-connected hosts 351
duplex interfaces 234
dynamic routing 308
and PTM 308
E
eBGP 635
external BGP 635
ebtables 134, 141
memory spaces 141
echo function 674, 674
BFD 674
PTM 674
ECMP 600, 629, 637, 683, 1
BGP 637
equal cost multi-pathing 600
monitoring 1
OSPF 629
resilient hashing 683
ECMP hashing 677, 680
resilient hashing 680
EGP 601
Exterior Gateway Protocol 601
equal cost multipath 677
ECMP hashing 677
equal cost multi-pathing 600
ECMP 600
ERSPAN 817
network troubleshooting 817
Ethernet management port 20
ethtool 245, 809
switch ports 245
external BGP 635
eBGP 635
F
fast convergence 647
BGP 647
First Hop Redundancy Protocol 386
VRR 386
FRRouting 308, 308, 600
and PTM 308, 308
dynamic routing 600
G
globs 218
Graphviz 301
H
hardware 770
monitoring 770
hardware compatibility list 17
hash distribution 313
HCL 17
head end replication 409
LNV 409
high availability 600
host entries 768
monitoring 768
hostname 21
hsflowd 830
hwclock 94
I
iBGP 635
internal BGP 635
ifdown 207
ifquery 211, 805
ifup 206
ifupdown 206
ifupdown2 216, 341, 804, 804, 804
excluding interfaces 804
logging 804
purging IP addresses 216
troubleshooting 804
VLAN tagging 341
IGMP snooping 373, 391
MLAG 373
IGP 601
Interior Gateway Protocol 601
image contents 31
installing 19
Cumulus Linux 19
interface counters 810
interface dependencies 210
interfaces 222, 244
statistics 244
internal BGP 635
iBGP 635
ip6tables 134
IP addresses 216
purging 216
iproute2 808
failures 808
iptables 134
IPv4 routes 637
BGP 637
IPv6 routes 637
BGP 637
L
LACP 313, 349
MLAG 349
LACP Bypass 380
layer 3 access ports 25
configuring 25
LDAP 109
leaf-spine topology 599
license 22
installing 22
lightweight network virtualization 407, 409, 410, 456
head end replication 409
service node replication 410
link aggregation 313
Link Layer Discovery Protocol 295
link-local IPv6 addresses 662
BGP 662
link pause 256
datapath 256
link-state advertisement 617
LLDP 295, 301
SNMP 301
lldpcli 296
lldpd 295, 303
LNV 407, 407, 409, 410, 456, 456
head end replication 409
service node replication 410
VXLAN 407, 456
load balancing 600
M
MAC entries 768
monitoring 768
Mako templates 219, 806
debugging 806
mangle table 253
ACL rules 253
memory spaces 141
ebtables 141
MLAG 349, 368, 368, 368, 373, 374, 377
backup link 368
IGMP snooping 373
MTU 374
peer link states 368
protodown state 368
STP 377
MLD snooping 392
monitoring 92, 757, 768, 774, 778, 809, 829, 832
hardware watchdog 774
Net-SNMP 832
network traffic 829
mstpctl 285, 345
MTU 235, 319, 374, 808
bridges 319
failures 808
MLAG 374
Multi-Chassis Link Aggregation 349
MLAG 349
multiple bridges 338
mz 815
traffic generator 815
N
name service switch 108
Netfilter 134
Net-SNMP 832
networking service 804
logging 804
network interfaces 206, 222
ifupdown 206
network traffic 829
monitoring 829
network troubleshooting 825
tcpdump 825
network virtualization 397, 403, 553, 566
VMware NSX 553, 566
nonatomic updates 143
switchd 143
non-blocking networks 600
NSS 108
name service switch 108
NTP 95
time 95
ntpd 95
O
ONIE 19, 30
rescue mode 30
onie-select 29
Open Network Install Environment 19
Open Shortest Path First Protocol 617, 631
OSPFv2 617
OSPFv3 631
open source contributions 17
OSPF 622, 628, 629, 630
ECMP 629
reconvergence 630
summary LSA 622
P
packages 64
managing 64
packet buffering 247
datapath 247
packet queueing 247
datapath 247
packet scheduling 247
datapath 247
PAM 108
pluggable authentication modules 108
parent interfaces 213
password 100
default 100
passwords 20
peer groups 648
BGP 648
Per VLAN Spanning Tree 282
PVST 282
ping 814
pluggable authentication modules 108
policy.conf 151
port lists 218
port speeds 234
Prescriptive Topology Manager 301
priority flow control 254
datapath 254
priority groups 247
datapath 247
privileged commands 103
protocol tuning 598, 666
BGP 666
routing 598
protodown state 368
MLAG 368
PTM 301, 674
Q
QoS 156
ACLs 156
QSFP 811
Quagga 606
configuring 606
quality of service 247
querier 392
IGMP/MLD snooping 392
R
Rapid PVST 282
PVRST 282
read-only mode 665
BGP 665
recommended configuration 48
reconvergence 630
OSPF 630
repositories 69
other packages 69
rescue mode 30
resilient hashing 680, 683
ECMP 683
restart 190
switchd 190
root user 20, 100
route advertisements 635
BGP 635
route maps 592, 629, 665
BGP 592, 629, 665
route reflectors 635
BGP 635
routes 768
monitoring 768
routing protocols 597
RSTP 282
S
sensors command 771
serial console management 20
service node replication 410
LNV 410
services 180
sFlow 829
sFlow visualization tools 832
SFP 245, 811
switch ports 245
single user mode 767
smonctl 773
smond 773
snmpd 832
sources.list 69
SPAN 817
network troubleshooting 817
spanning tree parameters 292
Spanning Tree Protocol 281, 325
STP 281
VLAN-aware bridges 325
static routing 591
with ip route 591
storm control 292
STP 292
STP 281, 291, 292, 377
and bridge assurance 291
MLAG 377
Spanning Tree Protocol 281
storm control 292
stub areas 623
OSPF 623
sudo 100, 102
sudoers 102, 103
examples 103
summary LSA 622
OSPF 622
T
tcpdump 825
network troubleshooting 825
templates 219
time 94
setting 94
time zone 93
topology 301, 598
data center 301
traceroute 814
traffic.conf 247, 247
traffic distribution 313
traffic generator 815
mz 815
traffic marking 252
datapath 252
troubleshooting 757, 767, 825
single user mode 767
tcpdump 825
trunk ports 339, 343
tzdata 93
U
U-Boot 19, 757
unnumbered interfaces 628, 632
OSPF 628
OSPFv3 632
untagged frames 339
bridges 339
upgrading 20
Cumulus Linux 20
user accounts 100
cumulus 100
root 100
user authentication 108
user commands 217
interfaces 217
V
virtual device counters 778, 781, 781
monitoring 778
poll interval 781
VLAN statistics 781
virtual routing and forwarding (VRF) 693, 696
BGP 693
table ID 696
visudo 102
VLAN 356, 778
statistics 778
switched virtual interface 356
VLAN-aware bridges 320, 325, 325
Spanning Tree Protocol 325
VLAN tagging 341, 341, 342
advanced example 342
basic example 341
VLAN translation 347
VTEP 397, 554, 567
vtysh 609
FRRouting CLI 609
VXLAN 397, 403, 407, 435, 456, 554, 567, 778
active-active mode 435
LNV 407, 456
no controller 403
statistics 778
W
watchdog 774
monitoring 774
Z
zebra 601
routing 601
zero touch provisioning 71, 72
USB 72
ZTP 71