Majority Node Set Clustering
*Update: This article is specific to Windows 2003 MNS. Windows 2008 MNS behavior is slightly different so my comments below may not apply to 2008 clusters*
When I was first introduced MNS, I hated this little feature of Windows 2003. In my opinion, MNS has created some confusion in the marketplace as it has been positioned (incorrectly) by some as “the” solution for geographically dispersed clustering. I’ve seen many posts over the years in the newsgroups from folks that have setup their MNS clusters and now want to know how to make their cluster work without shared storage. News flash: MNS does not mean that you not need shared storage for your cluster. MNS only means that the quorum disk no longer requires a physical drive resource for the quorum and the same rules still apply for the rest of your clustered resources. Want to have a print spool or DTC resource? Sorry, you still will need a physical disk resource in your cluster. If you’ve got an application that does not require shared data, then maybe MNS is the solution for you but most cluster applications will have a shared disk requirement.
It is my belief that no one in their right might would use MNS in a HA environment that is geographically dispersed…unless you plan to span 3 sites. Why would I say this? Well, if the goal of HA in your environment is to maintain uptime, why would you introduce a “feature” that will guarantee a total cluster outage if half of the cluster is suddenly unavailable? When you look at geographically dispersed clustering, you’re typically looking for a solution that can help you survive a total site outage…why else would you spend the time/money on a geo-cluster? With MNS, chances are high that your whole cluster is going down in all site disaster scenarios. Let’s take a look at some of these scenarios with MNS:
Scenario 1 - Primary site has 2 nodes and DR site has 2 nodes. If you lose either site, you will lose the entire cluster since no site can ever have a majority in this situation. This will only happen to you once and the lesson learned here is that we never want to have an even number of nodes with MNS.
Scenario 2 – Primary site has 3 nodes and DR site has 2 nodes. Your cluster can now survive the outage of the DR site, but cluster will not survive an outage of the primary site…which sort of defeats the whole purpose of having a DR site in the first place.
Scenario 3 - Primary site has 2 nodes and DR site has 3 nodes. Your cluster can now survive the outage of the Primary site, but now the cluster will not survive an outage of the DR site…which again seems to defeat the whole purpose of having a DR site when your DR site causes an outage of your production cluster applications.
Some will argue that in each of these scenarios, you can MANUALLY get your cluster up and running if you use the FORCEQUORUM procedure…which I do not deny. At least you do have some capability to get a somewhat working cluster up in these DR scenarios, even if it is a manual solution. There’s another HUGE gotcha here that is not often talked about or documented well. After you’ve started your cluster using /forcequorum, when the other nodes come back online, these nodes CANNOT join back into the cluster. In order to get your cluster back up and running again, you need to TAKE DOWN THE ENTIRE CLUSTER and then start all of the cluster nodes normally with no flags. Of course you can plan this downtime but no one ever wants to hear that their whole cluster has to go offline. This seems to defeat the entire purpose of HA when you are requiring a cluster shutdown to recover from your recovery procedure.
Based on the above scenarios, you might start to see why I would make the claim that no sane cluster admin would use MNS in their environment. If you substitute a shared disk quorum in any of these scenarios, the cluster would survive the outage as long as any one node survives. Also, a total cluster outage is not required to get the other nodes back into the cluster.
I think I’ve done enough bashing of MNS so let’s start to look at some of the good points. One of the major distance limitations of geographically dispersed clustering is Microsoft’s requirement that the quorum disk is only supported using synchronous replication for the quorum disk. From KB 280743, “The quorum disk must be replicated in real-time, synchronous mode across all sites.” This limits the possible distance for your geo-cluster solution based on the replication technology you are using…with EMC’s SRDF, this limit is approximately 200km for SRDF/S. Well if you use a MNS quorum instead of a disk quorum, you are no longer limited by the requirement of synchronous replication technology. With MNS, your only limit is the network latency requirement, and even this has some flexibility now with the introduction of hotfix 921181. So this is one key reason why you might consider using MNS over a shared disk quorum resource. If you are looking for an extended distance geo-cluster solution, MNS is the only way to go.
Another new feature introduced in hotfix 921181 is the ability to use a File Share Witness (FSW) node with 2-node MNS clusters. This allows you to have a file share anywhere in the network (doesn’t need to be on the same subnet!) and this node will be used as the decision maker when one of the nodes fails. You could even setup this FSW as a clustered file share resource in a separate cluster giving another level of protection to this decision maker. The downside to FSW is that it currently only works in 2-node clusters. In Longhorn, this will change but today the feature only works in 2-node clusters. If you add a third node to your cluster, the cluster will ignore the FSW settings. Another minor downside is that the FSW share does not contain a full copy of the CLUSDB so you could not restore a cluster registry hive using the data from the FSW. The FSW is only used to help make the decisions during quorum arbitration and does not contain the clusdb info.
Another place where MNS might be the better option would be the geographically dispersed cluster that spans three sites. A three site synchronous disk replication of the quorum disk will prove to be a challenge for any vendor. Based on the quorum disk’s synchronous requirement, you’re also going to need 3 sites that are all within 200km of each other which may also prove to be a challenge. With three sites, I typically would picture having two Primary sites replicating data to a single DR site. Each of the primary sites would have replication running between themselves and the DR site, but no replication between the primary sites. This sort of configuration would only be possible with a MNS quorum. In this three site scenario, you could survive the failure of any single site and keep the other two up and running.
So overall, I don’t hate MNS nearly as much as I used to. I can see that it has its place, and can see some benefits in specific scenarios.