First of all, I’d like to say that spanning tree is a Good Thing. It saves you from loops, which will completely take down your network. But it has to be set up right to work right. I can’t tell you how many times a desperate client with a badly broken network has called me and I’ve said, “Sounds like a spanning tree problem.”
There are many ways that spanning tree can go wrong. In this article, I’ve put together a few of the themes that keep coming up.
1. Not setting up spanning tree at all
I already said that spanning tree is good. But a lot of switch vendors turn it off by default for some reason. So you might have to turn on the protocol right away.
People sometimes turn off spanning tree on purpose. Most of the time, the reason is that the original 802.1D Spanning Tree Protocol (STP) has a long wait between when a port becomes electrically active and when it starts to pass traffic. This wait, typically 30 seconds or more with the default timers (15 seconds each in the listening and learning states), is long enough that DHCP can give up trying to get an IP address for the new device.
Turning off spanning tree on the switch makes that problem go away, but it is not the right answer.
The right solution is to set up a Cisco switch feature called PortFast. You configure the command “spanning-tree portfast” on all the ports that connect to end devices like workstations. (Most switch manufacturers have a similar feature.) The port then skips the waiting period and starts forwarding right away, and DHCP works as it should.
It’s important to make sure that this command is only set up on ports that connect to end devices. Ports that connect to other switches need to exchange spanning tree information and go through the normal waiting period, or you risk creating a loop.
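As a rough illustration of keeping PortFast limited to edge ports, the sketch below renders configuration lines from a small port inventory. The interface names and the role labels are made up for the example. Only the “spanning-tree portfast” command itself comes from the discussion above.

```python
# Hypothetical port inventory: interface name -> what is plugged into it.
PORTS = {
    "GigabitEthernet0/1": "workstation",
    "GigabitEthernet0/2": "printer",
    "GigabitEthernet0/24": "switch",  # uplink to another switch
}

# Assumed labels for end devices; anything else is treated as a switch link.
EDGE_ROLES = {"workstation", "printer", "server"}


def portfast_config(ports):
    """Return config lines that enable PortFast on edge ports only."""
    lines = []
    for name, role in sorted(ports.items()):
        if role in EDGE_ROLES:
            lines.append(f"interface {name}")
            lines.append(" spanning-tree portfast")
    return lines


if __name__ == "__main__":
    print("\n".join(portfast_config(PORTS)))
```

The uplink on GigabitEthernet0/24 is skipped on purpose, so it still takes part in spanning tree normally.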
2. Letting your network decide which root bridge to use
As the name suggests, spanning tree gets rid of loops in your network by making a logical tree structure between the switches. The switch that becomes the tree’s root is called the “root bridge.” Then, each of the other switches figures out the best way to reach the root bridge.
If there is more than one path, spanning tree chooses the best one on each switch and puts the other ports into a blocking state. So, there’s only one way for any two devices on the network to talk to each other, even if it’s a long way around.
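To make the idea concrete, here is a simplified sketch that takes a root bridge as given, finds each switch’s lowest-cost path to it, and treats every link not on one of those paths as blocked. The switch names and link costs are hypothetical, and the sketch assumes at most one link between any pair of switches. Real spanning tree breaks ties using bridge and port IDs, which this example does not model.

```python
import heapq

# Hypothetical topology: (switch_a, switch_b, link_cost). The triangle is one loop.
LINKS = [
    ("core", "dist1", 4),
    ("core", "dist2", 4),
    ("dist1", "dist2", 19),
]


def spanning_tree(links, root):
    """Return (forwarding, blocked) sets of links for the given root switch."""
    neighbors = {}
    for a, b, cost in links:
        neighbors.setdefault(a, []).append((b, cost))
        neighbors.setdefault(b, []).append((a, cost))

    best = {root: 0}   # lowest known path cost to the root
    parent = {}        # switch -> upstream neighbor toward the root
    queue = [(0, root)]
    while queue:
        cost, switch = heapq.heappop(queue)
        if cost > best[switch]:
            continue   # stale queue entry
        for peer, link_cost in neighbors[switch]:
            new_cost = cost + link_cost
            if new_cost < best.get(peer, float("inf")):
                best[peer] = new_cost
                parent[peer] = switch
                heapq.heappush(queue, (new_cost, peer))

    forwarding = {frozenset((child, up)) for child, up in parent.items()}
    all_links = {frozenset((a, b)) for a, b, _ in links}
    return forwarding, all_links - forwarding
```

With "core" as the root, both distribution switches reach it directly and the dist1 to dist2 link ends up blocked, which removes the loop.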
Each switch in a spanning tree has something called a bridge priority. The root bridge is the switch with the lowest priority. If two switches have the same priority, the switch with the lower bridge ID number wins. Most of the time, the ID number comes from a MAC address on the switch.
The problem is that every switch has the same priority setting by default (32768). So, if you don’t set a switch’s bridge priority value to something better (lower), the network will choose a root for you. Then Murphy’s Law comes into play. The root bridge that results could be a small edge switch with slow uplinks and few backplane resources.
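The election can be sketched in a few lines. The switch names and MAC addresses below are invented for illustration. The comparison itself, lowest priority first and then lowest MAC address, follows the rule described above.

```python
# Hypothetical switches: name -> (bridge priority, MAC address).
SWITCHES = {
    "core-1": (32768, "00:1a:2b:00:00:30"),
    "core-2": (32768, "00:1a:2b:00:00:20"),
    "edge-closet": (32768, "00:0c:11:00:00:01"),  # happens to have the lowest MAC
}


def elect_root(switches):
    """Pick the switch with the lowest (priority, MAC) pair."""
    def bridge_id(name):
        priority, mac = switches[name]
        return (priority, int(mac.replace(":", ""), 16))
    return min(switches, key=bridge_id)


if __name__ == "__main__":
    print(elect_root(SWITCHES))                      # all defaults
    tuned = dict(SWITCHES, **{"core-1": (4096, "00:1a:2b:00:00:30")})
    print(elect_root(tuned))                         # core-1 priority lowered
```

With every switch at the default priority, the edge switch wins on its MAC address alone. Lowering the priority on core-1 makes it the root no matter what the MAC addresses are.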
A bad choice of root bridge can make things even worse because it can make the network less stable. Spanning tree recovers pretty quickly if a connectivity problem takes some random switch off the network. But if the root bridge goes down, or if the failure means that some switches can no longer reach the root bridge, this is a major change to the topology. A new root bridge has to be chosen. During this time, the whole network will freeze, and no packets can be sent.
I always say that the core switch should be made the root bridge. I like to choose a backup root bridge as well. If there are two redundant core switches, one will be the root bridge and the other will be my backup.
Set the bridge priority on the main root bridge to the lowest non-zero value, which is 4096, and set the bridge priority on the backup root bridge to the next value up, which is 8192. Why do they look so strange? Well, that’s a long story that we can’t tell here, but the lower 12 bits of the priority field are used for something else, so priorities can only be set in steps of 4096.
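For the curious, the arithmetic behind those values can be sketched as follows. This assumes the usual layout of the 16-bit field: the top 4 bits hold the configurable priority and the low 12 bits hold an extended system ID, which on per-VLAN implementations is typically the VLAN number. The function names are my own.

```python
STEP = 4096  # 2**12: the low 12 bits are not available for priority


def valid_priorities():
    """All configurable bridge priority values, from best (lowest) to worst."""
    return [n * STEP for n in range(16)]


def priority_field(priority, system_id_ext):
    """Combine a configured priority with a 12-bit extended system ID."""
    if priority not in valid_priorities():
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= system_id_ext < STEP:
        raise ValueError("extended system ID must fit in 12 bits")
    return priority | system_id_ext


if __name__ == "__main__":
    print(valid_priorities())
    print(priority_field(4096, 10))  # priority 4096 combined with ID 10
```

The default of 32768 is simply the midpoint of the sixteen possible values.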
3. Using legacy 802.1D
802.1D is the name of the first open standard for spanning tree. It’s one of the earliest standards in the IEEE 802 series, which includes all the rules for Ethernet, Wi-Fi, and a lot of other protocols. Even though it’s old, it still works well, and you can find this kind of spanning tree on almost every switch. Any switch that doesn’t support 802.1D should only be used in small, isolated environments and should never be connected to any other switches.
But since 802.1D, there have been several important improvements to spanning tree. With these improvements, a network can recover from a broken link in a second or two. They also scale to bigger networks and allow different spanning tree topologies and root bridges for different VLANs. So, using them makes a lot of sense.
The protocol that most modern Cisco switches use by default is called Per-VLAN RSTP, where RSTP stands for Rapid Spanning Tree Protocol. It automatically runs a separate spanning tree domain with a separate root bridge on each VLAN. In practice, though, it is common for the same switch to be the root bridge for all or most of the VLANs.
Most likely, you’ll find the rapid feature, or RSTP, to be the most useful. This lets the network get back up and running after most failures in about 1 to 2 seconds. MST, which stands for Multiple Spanning Tree, is similar to RSTP. The main difference is that you can set up groups of VLANs that are all part of the same tree structure with a single root bridge. But Per-VLAN RSTP is easier to set up, so I suggest using it most of the time. Also, I’ve had trouble getting MST implementations from different switch vendors to work together.
4. Mixing spanning tree types
From what was said about 802.1D, RSTP, and MST in the last section, it should be pretty clear that mixing them could get messy. The RSTP and MST protocols include rules for how to deal with this mixing. In general, it means putting groups of switches that run different types of spanning tree in their own zones in the network. The result is rarely the most efficient path between devices.
Mixing spanning tree types is only a good idea if you want to use older equipment that doesn’t support the newer protocols. As time goes on, there should be fewer and fewer of these legacy devices, and the number of places where it makes sense to mix the protocols should go down.
I think you should choose one, preferably RSTP or MST, and use it consistently on all of your switches.
5. Using MST with pruned trunks
Because MST allows a single spanning tree structure to support multiple VLANs, you need to be very careful about your inter-switch trunks.
I once had a client whose network was big and complicated, with a lot of switches and a lot of VLANs. They had MST going. For ease of use, they only set up one MST instance, which meant that all VLANs were controlled by the same root bridge.
The problem for this client started when they decided for security reasons that certain VLANs should only be on certain switches. All of this makes sense. So, they took those VLANs off the main trunks between switches and added new trunks that were only for these secure VLANs. Everything went wrong.
MST thought that all VLANs were part of the same tree, so it used that assumption to decide which trunks to block and which to forward. But since some VLANs were only on some trunks and other VLANs were on the other trunks, blocking a trunk meant that only some of the VLANs could go through. When the other trunk was blocked, only the other set of VLANs could go through. For the VLANs that were blocked, there was no way to get to the root bridge.
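A toy model shows why this breaks. The trunk names and VLAN numbers below are hypothetical. The one assumption that matters is the one from the story: a single MST instance, so blocking a trunk blocks it for every VLAN at once.

```python
# Hypothetical pair of trunks between two switches, each pruned to a VLAN subset.
TRUNKS = {
    "trunk-main": {10, 20, 30},
    "trunk-secure": {99},
}


def stranded_vlans(trunks, blocked_trunk):
    """VLANs with no forwarding trunk left once one trunk is blocked.

    With a single MST instance, blocking applies to every VLAN on the trunk.
    """
    all_vlans = set().union(*trunks.values())
    still_carried = set()
    for name, vlans in trunks.items():
        if name != blocked_trunk:
            still_carried |= vlans
    return all_vlans - still_carried


if __name__ == "__main__":
    for trunk in TRUNKS:
        print(trunk, "blocked, stranded VLANs:", sorted(stranded_vlans(TRUNKS, trunk)))
```

Whichever trunk gets blocked, some set of VLANs is cut off. If both trunks carried every VLAN, the function would return an empty set either way.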
So, if you want to use MST, you need to make sure that all VLANs are passed on all trunks, or you need to carefully and manually create different MST instances for each group of VLANs with different topological needs. In other words, you need to analyze your VLAN layout carefully and plan the network accordingly. Or you could run Per-VLAN RSTP, which is the easy way out.
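One possible starting point for that planning is to group VLANs by the exact set of trunks that carry them, and give each group its own MST instance. The sketch below does only that grouping. The trunk names, VLAN numbers, and instance numbering are all assumptions for the example, and a real design would also need to consider root bridge placement for each instance.

```python
from collections import defaultdict

# Hypothetical trunks and the VLANs allowed on each.
EXAMPLE_TRUNKS = {
    "trunk-a": {10, 20, 99},
    "trunk-b": {10, 20},
    "trunk-c": {99},
}


def plan_mst_instances(trunks):
    """Group VLANs that are allowed on exactly the same set of trunks."""
    carried_on = defaultdict(set)   # vlan -> trunks that carry it
    for trunk, vlans in trunks.items():
        for vlan in vlans:
            carried_on[vlan].add(trunk)

    groups = defaultdict(list)      # set of trunks -> VLANs sharing that set
    for vlan, trunk_set in carried_on.items():
        groups[frozenset(trunk_set)].append(vlan)

    # Number the instances from 1, ordered by the lowest VLAN in each group.
    ordered = sorted(sorted(vlans) for vlans in groups.values())
    return {index: vlans for index, vlans in enumerate(ordered, start=1)}


if __name__ == "__main__":
    print(plan_mst_instances(EXAMPLE_TRUNKS))
```

Here VLANs 10 and 20 share the same trunks and land in one instance, while VLAN 99 follows a different set of trunks and gets its own.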