Perfect multicast storm

This is a story of a well-meaning default causing more problems than a randomly picked value would. Read it if you are running a BEA WebLogic cluster on a switched network, especially with a Cisco switch. Read it even if you do not run a WLS cluster, but are interested in IP multicast voodoo.

WebLogic Server is a combination of many technologies. Quite a number of these technologies used to require a dedicated professional to configure and maintain. I am talking about such subsystems as SSL, transactional JDBC and - case in point - multicast. Back then, the user interfaces and procedures were arcane, and nobody was surprised that a specialist was required to make sure things were working.

Nowadays, people seem to believe that the default settings for a subsystem they are not familiar with will fulfil their need - whatever that need may be. And most of the time, the defaults do seem to work, especially when tested in an environment that does not match the production setup in throughput, domain, or network layout (e.g. cluster). However, when the code is promoted to production, the unexpected happens.

For this example, let’s talk about multicasting. By itself, multicasting is fairly heavy voodoo, and applications that require multicast are usually set up and configured by gurus/vendors who ensure that all the i’s are dotted and t’s are crossed.

But BEA WebLogic ships with and uses a multicast implementation as part of its clustering technology. As it is used in a limited scenario (single network, no switch cross-over, etc.), multicast administration is suddenly reduced to a simple set of properties: primarily the multicast address/port (the others are Send Delay, TTL and buffer size). Look at the configuration page: all values are preset to defaults. In fact, you don’t even need to see that page when you create your cluster; you have to open it deliberately if anything needs to be changed. One way or another, the values are usually left at the defaults, even for very large deployments.

Now, WebLogic’s default multicast address is 237.0.0.1. If you cringed at this point, do not continue reading. You know way too much about IP, MAC addresses and multicast special cases. You are a guru.

For the rest of us, here is an explanation of why this default is a bad one.

If you read the multicast RFCs - and there are many of them (3170, 2236, 1112, 2365, etc.) - you will find that not all multicast addresses are equal. There is a magic address, 224.0.0.1, on which all IP hosts will listen, and this includes gateways (which covers Cisco routers and switches).

Now, 224.0.0.1 is NOT 237.0.0.1, so there should be no issue. And on the IP layer, there is none. The problem happens below the IP layer. OSI Layer 2 switches (of which a Cisco switch is an example) do not actually look at IP addresses directly. Instead they listen for the multicast MAC address. There is a direct mapping between a multicast IP address and a multicast MAC address, but it is not unique: only the low 23 bits of the IP address are copied into the MAC address, so the 5 remaining variable bits are lost. In fact, there can be 32 different IP addresses corresponding to the same MAC address (as described in this book excerpt, section 1.6.3).
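
To make the mapping concrete, here is a minimal Python sketch of it. The function name multicast_mac is my own invention for illustration; the rule it implements (fixed prefix 01-00-5e, a zero bit, then the low 23 bits of the IP address) is the standard one from RFC 1112.

```python
import ipaddress

def multicast_mac(ip):
    """Return the Ethernet MAC address that an IPv4 multicast address maps to."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    # Only the low 23 bits of the IP address survive the mapping.
    low23 = int(addr) & 0x7FFFFF
    # The fixed prefix 01-00-5e plus a zero bit fills the upper 25 bits.
    mac = 0x01005E000000 | low23
    # Format the 48-bit value as six hyphen-separated hex octets.
    return "-".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

for ip in ("224.0.0.1", "237.0.0.1"):
    print(ip, "->", multicast_mac(ip))
```

Both addresses come out as 01-00-5e-00-00-01, which is the whole problem in one line.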

So, to follow the specification, a Cisco switch has to listen for the multicast address 224.0.0.1, which is MAC address 01-00-5e-00-00-01. But that is the MAC address for 237.0.0.1 as well. Therefore, every time a packet for 237.0.0.1 passes by, the Cisco switch firmware triggers a match.
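
If you are curious who else shares that MAC address, the mapping can be run in reverse. This is only a sketch (the name addresses_sharing_mac is hypothetical): it keeps the low 23 bits of a given address and walks through all 32 values of the 5 bits that the mapping throws away.

```python
import ipaddress

def addresses_sharing_mac(ip):
    """List the 32 IPv4 multicast addresses that map to the same MAC as ip."""
    low23 = int(ipaddress.IPv4Address(ip)) & 0x7FFFFF
    shared = []
    for dropped_bits in range(32):  # the 5 bits ignored by the MAC mapping
        # 0xE0000000 is 224.0.0.0, the fixed "1110" multicast prefix.
        value = 0xE0000000 | (dropped_bits << 23) | low23
        shared.append(str(ipaddress.IPv4Address(value)))
    return shared

aliases = addresses_sharing_mac("224.0.0.1")
print(len(aliases), "addresses share this MAC")
print("237.0.0.1" in aliases)
```

The list runs from 224.0.0.1 and 224.128.0.1 up to 239.0.0.1 and 239.128.0.1, with 237.0.0.1 sitting right in the middle of it.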

Once the match is triggered, the software part of the switch examines the packet’s full IP address and figures out that it was not 224.0.0.1, but just one of the other 31 IPs sharing the same MAC address. The packet is discarded and nobody is hurt.

Or so it seems. It turns out that these interrupts are relatively expensive. When a packet is supposed to be sent to a different network, this cost is part of doing business. But when it happens every time for fairly frequent internal cross-server chit-chat (e.g. JNDI tree updates), the switch gets interrupted too often. And a switch that is interrupted unexpectedly often may be sluggish to respond to legitimate requests.

This, under high enough load, can cause unexpected network timeouts that are nearly impossible to trace back to their cause. After all, how often do you check your switch’s CPU utilisation when your DNS misbehaves?

So, what’s the solution? Easy! Switch to a different multicast address that does not map to the same MAC address as 224.0.0.1. WebLogic will do exactly that in the next release.

The proposed new address is 239.192.0.0. If this makes you cringe, please let BEA know NOW! 🙂
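
For anyone who wants to sanity-check a candidate address before putting it into a cluster configuration, here is a rough sketch. It assumes, as a simplification of my own, that the addresses to stay away from are the ones whose MAC address coincides with that of some 224.0.0.x address (the reserved local network block that includes 224.0.0.1); what your particular switch actually traps may differ. The function name collides_with_reserved_block is made up for this post.

```python
import ipaddress

def collides_with_reserved_block(ip):
    """True if ip shares its multicast MAC with some address in 224.0.0.0/24."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    # Addresses 224.0.0.0 to 224.0.0.255 occupy low-23-bit values 0 to 255,
    # so any address whose low 23 bits fall in that range shares a MAC with one of them.
    return (int(addr) & 0x7FFFFF) < 256

for candidate in ("237.0.0.1", "239.192.0.0"):
    verdict = "collides" if collides_with_reserved_block(candidate) else "looks clear"
    print(candidate, verdict)
```

Under that assumption the old default collides and the proposed 239.192.0.0 does not, since it maps to 01-00-5e-40-00-00.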

BlogicBlogger Over and Out