Pages

Wednesday, 14 March 2012

Understanding the concept of SQL Server Failover Cluster

Before getting hands on experience in SQL server on failover cluster , I used to read many article about failover clustering. Unfortunately I was not able to digest the concept till I wet my hands.I will keep this point in my mind while explaining the failover cluster in this post.

What is a windows Cluster ? A cluster is a group of independent computer systems, referred to as nodes, working together as a unified computing resource. When we say unified computing , it does not mean that a single application can access the resources (cpu/memory) of multiple computer at any point of time. A cluster provides a single name for clients to use it services.There are two type of cluster:

Network Load Balancing Cluster : A Network Load Balancing Cluster (NLB) distribute the load (request from the client)  to the different nodes which are part of the cluster based on predefined rules.The  client application has to communicate to a single cluster IP address (virtual server)  and client does not know which node in the cluster exactly serving its request.Network Load Balancing cluster helps to enhance the the availability and scalability of  application. We are not going to talk much about this on this post.

Fail over Cluster:  Failover cluster is a collection of servers that by working together increase the availability of applications and services that run on the cluster. It can be  described as a technology that automatically allows one physical server to take over the tasks and responsibilities of another physical server that has failed.A failover cluster provides high availability for cluster aware application like SQL server ,exchange server,etc. The major difference between NLB cluster and failover cluster is, failover cluster will not help to improve the scalability of the application. It can only support high availability.

There are many misunderstanding among people about the advantage of having SQL server on a failover cluster environment. Below table will give a clear picture about the capability of SQL server on cluster environment.


To understand the failover cluster it is important to familiar with the terminologies. Let us have look on the terminologies used in the windows clustering.

Server Node: Physical node with operating system that support windows clustering. Each server node should have minimum two network card for public and private network.Minimum one local hard disk is required for OS and other application binaries.

Private Storage: Local disks are referred as private storage. There will be minimum of one private disk for OS and SQL binaries.Server node can have a additional local disk for bigger page file.
Shared Disk Array: Each server needs to be attached to the shared external storage.In non-clustered SQL server instance, databases (system/user/resource) are store on locally attached disk storage but in clustered SQL server instances databases are store data on a shared disk array.That mean all the server nodes in the cluster setup are physically connected to the disk array.This shared storage configuration allow application to failover between server in the cluster.

Quorum Disk: Quorum is the cluster's configuration file.This file (quorum.log) resides in the the quorum disk (one disk from shared disk array).Quorum is the main interpreter between all nodes. It stores latest cluster configuration and resource data. This helps the other nodes to take ownership when one node goes down.
In real life, it is like emergency contact number and other medical information in your wallet which help others to take action in case of emergency.

Public Network and Private Network: Each server needs at least two network cards. Typically, one is the public network and the other is a private network between the nodes. The private network is used for internal communication of cluster.This is referred as heart beat. Public network is used to connect the external world or intranet.

Heart Beat: Heart beat is health check mechanism in cluster A single UDP packet sent between nodes in the cluster via the private network to confirm that  nodes are still online. By default cluster service will wait five seconds(one heart beat sent in every second)  before considering a cluster node to be unreachable.

Looks Alive check:Looks alive check is a basic resource health check to verify that the service(SQL service in our context) is running properly.To perform this , cluster service queries the windows service control manager to check the status of the service.By default looks alive check will happen in every five seconds.

Is Alive check: An exhaustive check to verify that a resource is running properly. If this check fails, the resource is moved offline and the failover process is triggered. During the Is alive check  the cluster service connects to the SQL server instance and execute select @@SERVERNAME.It will check only the SQL server instance availability and does not check the availability of user databases.

Failover: Clusters service monitor the health of the nodes in the cluster setup and the resources in the cluster. In the event of a server/resource failure, the cluster service  re-starts the failed server's workload on one of the remaining servers based on possible owners settings. The process of detecting failures and restarting the application on another server in the cluster is known as failover .


Virtual Server:Virtual server consist of a network name and IP address to which the client connect.This helps the client to connect the service  which hosted in the cluster environment with out knowing the details of the server node on which the service is running.In simple word , virtual server helps the client application to connect the SQL server  instance with out changing the connection string in case of failover of SQL instance.

Possible Owners :Possible owners are the server nodes on which cluster group (in our context SQL instance) can failover

Preferred Owner: Preferred owner is the server node which best suited to running an application or group.

Cluster Setup: Below schema diagram will give a clear picture about the windows cluster setup.























This picture depicts  two  node cluster setup.At this stage Node A and Node B will have only OS and windows cluster service.One of the disk from cluster disk will be designated as the quorum disk. This setup basically done by windows administrators and storage specialists.On top of this,  DBA's install  and configure the SQL instances. We are not going talk about the SQL server installation on cluster environment on this post.

To make it more clear, let us describe our cluster environment as given below.

















Let us look into the pictorial representation current setup.
Fig 1





















From the Fig-1, we can see that INST1 and INST2 are installed on both physical node , but at any point of time one instance will be online only in one node.At this stage INST1 is running on physical node Pnode-A and INST2 is running on node Pnode-B.The cluster disk D and E will be visible in the node Pnode-A and F and G will be visible in the node Pnode-B.The request from the App1 will be routed through the virtual server A and landed in Pnode-A where the INST1 is running.In the same way the request from the App2 will be routed through the virtual Server B and landed in Pnode-B where the INST2 is running.

Let us see what will happen if there is a hardware/network failure in Pnode-A. Fig 2 depicts the failover of  scenario of INST1 to Pnode-B.

Fig 2





















Now the INST1 went offline in Pnode-A and came online Pnode-B.The connection from the App1 will be routed through the same virtual Server A and landed in Pnode-B.All four cluster disk will be visible in Pnode-B and only C (Local disk) will be accessible from Pnode-A. Note that even after the failover of INST1,App1 will be connecting to the virtual server A.It helps to make the failover transparent to the application.

Thank you for reading this article. Please feel free to pass your comments to improve the quality of the post. To follow this blog visit my Facebook Page





39 comments:

  1. Nice post Nelson, keep it up!

    ReplyDelete
  2. that is a nice post on one of my topic of interest. There is some excellent database related stuff here also - distributed database system, Normalization, codd's rules etc. explained so nicely : crazy4db.blogspot.in

    ReplyDelete
  3. Your definition can lead to misinterpretation.

    A cluster is a group of independent computer systems, referred to as nodes, working together as a unified computing resource

    "Working together as a unified computing resource" can lead to the interpretation or assumption that, given a two-node cluster, both physical servers will work together as a unified computing resource. Another way to say this would be that programs would have access to processors and RAM on both servers.

    This is NOT the case.

    Programs and services run on one and only one server on the cluster. If the cluster node is inaccessible, the programs/services running on the inaccessible node will be spun up on another cluster node. At no time will a service or program run across multiple nodes in the cluster.

    ReplyDelete
    Replies
    1. I agree with you and made slight modification in the statement

      Delete
  4. not a bad cover of clustering but you should note the following

    vserver-a 10.23.23.30
    vserver-b 10.23.23.31

    in the above you have not mentioned the values used for the Windows cluster.
    Whilst a failure in the public network will initiate a failover, a failure in the heartbeat network will not
    regards
    Perry

    ReplyDelete
  5. "All four cluster disk will be visible in Pnode-B and only C (Local disk) will be accessible from Pnode-A" --> This should be reflected in the graph.

    Good article!

    ReplyDelete
    Replies
    1. Thank you for the article. It describes things in very simple terms and it's a very good introductory article. I'm very new at this. Could you clarify exactly where the Virtual Servers are defined/set-up and how a virtual server may go down?

      Thank you.

      Delete
  6. What are the advantages of using clustering instead of availability groups?

    ReplyDelete
  7. Took me time to read all the comments, but I really enjoyed the article. It proved to be Very helpful to me and I am sure to all the commenters here! Its always nice when you can not only be informed, but also entertained! Im sure you had fun writing this article.
    Valuable information and excellent design you got here! I would like to thank you for sharing your thoughts and time into the stuff you post!!
    http://www.sqlservermasters.com/

    ReplyDelete
  8. Such a nice article. Easy enough to understand.

    ReplyDelete
  9. Very informative..keep up the good work.
    http://udayarumilli.com/

    ReplyDelete
  10. Its really good to know about that the procedure and the other key features mentioned here are quite considerable and most necessary as well, would be so far better idea to have these information to run the system in a systematic manner.

    Service Management Software

    ReplyDelete
  11. Thank you for the article. Its really good, but i need some troubleshooting steps related to this articles. If for example nodeA goes down, With in few minutes noteA and noteB will be available in noteB. But my Query is how we will get back this NoteA to the original position and what are the basic steps that we need to be taken to get back the NodeA to the original position.

    ReplyDelete
  12. Really appreciate the way, information is shared.

    Sheeraz

    ReplyDelete
  13. Thanks for article..its really helpful...especially for beginners

    Bipin

    ReplyDelete
  14. Thanks a lot to you for the article. Its really good, and helpful for me to understand the basic concept regarding Clustering.

    ReplyDelete
  15. Please tell me that youre heading to keep this up! Its so great and so important. I cant wait to read a lot more from you. I just feel like you know so substantially and know how to make people listen to what you might have to say. This blog is just too cool to become missed. Terrific stuff, genuinely. Please, PLEASE keep it up!

    ReplyDelete
  16. The diagram is misleading and "The cluster disk D and E will be visible in the node Pnode-A and F and G will be visible in the node Pnode-B" is wrong.
    Every node in the cluster can equally access shared disk storage. In Failover Cluster environment only one node is
    active at a time and is connected to shared disk storage.

    ReplyDelete
  17. Hey there I am so glad I found your site, I really found you by accident, while I was researching on Aol for something else, Anyhow I am here now and would just like to say thanks for a tremendous post and a all round thrilling blog (I also love the theme/design), I don’t have time to read through it all at the minute but I have book-marked it and also added in your RSS feeds, so when I have time I will be back to read a lot more, Please do keep up the fantastic work.

    ReplyDelete
  18. This is one technology that I would love to be able to use for myself. It’s definitely a cut above the rest and I can’t wait until my provider has it. Your insight was what I needed. Thanks

    ReplyDelete
  19. This is one technology that I would love to be able to use for myself. It’s definitely a cut above the rest and I can’t wait until my provider has it. Your insight was what I needed. Thanks

    ReplyDelete
  20. am speechless. It is a unbelievable weblog and very partaking too. Great work! That's probably not a lot coming from an beginner blogger like me, but it's all I may assume after having fun with your posts. Nice grammar and vocabulary. Not like different blogs. You actually know what you are speaking about too. So much that you made me want to learn more. Your blog has turn into a stepping stone for me, my fellow blogger. Thank you for the detailed journey. I actually enjoyed the 6 posts that I have learned so far.

    ReplyDelete
  21. http://www.lgnetworksinc.com/it-consulting/microsoft-exchange-server-support-and-consulting/

    ReplyDelete
  22. Please tell me that youre heading to keep this up! Its so great and so important. I cant wait to read a lot more from you. I just feel like you know so substantially and know how to make people listen to what you might have to say. This blog is just too cool to become missed. Terrific stuff, genuinely. Please, PLEASE keep it up!

    ReplyDelete
  23. Please tell me that youre heading to keep this up! Its so great and so important. I cant wait to read a lot more from you. I just feel like you know so substantially and know how to make people listen to what you might have to say. This blog is just too cool to become missed. Terrific stuff, genuinely. Please, PLEASE keep it up!

    ReplyDelete
  24. Please tell me that youre heading to keep this up! Its so great and so important. I cant wait to read a lot more from you. I just feel like you know so substantially and know how to make people listen to what you might have to say. This blog is just too cool to become missed. Terrific stuff, genuinely. Please, PLEASE keep it up!

    ReplyDelete
  25. Please tell me that youre heading to keep this up! Its so great and so important. I cant wait to read a lot more from you. I just feel like you know so substantially and know how to make people listen to what you might have to say. This blog is just too cool to become missed. Terrific stuff, genuinely. Please, PLEASE keep it up!

    ReplyDelete
  26. Hi the information on this blog is just amazing it keeps me coming back time and time again ,personally i met my wife using this site so i couldnt like it any more i have done my best to promote this blog as i know that others need to read this thing ,Thanks for all your effort spent in making this fabulous resource ! ok,nice one Jake

    ReplyDelete
  27. This design is incredible! most certainly know how to keep a reader entertained. Between your wit and your videos, I was almost moved to start my own blog (well, almost...HaHa!) Great job. I really enjoyed what you had to say, and more than that, how you presented it. Too cool!

    ReplyDelete
  28. By far essentially the most succinct and also up-to-date information I found about this topic matter. Certain pleased that I stumbled upon your write-up by opportunity. I will likely be signing up on your rss feed so as I'll receive the newest posts. Enjoy everything right here.

    ReplyDelete
  29. hello m8 the information on this site is just incredible it keeps me coming back time and time again ,personally i met my wife using this site so i couldnt love it any more i have done my best to promote this site as i feel that others need to see this thang ,thankyou for all the time spent in making this fabulous site ! ok,nice one Billy

    ReplyDelete
  30. By Reading this anyone can understand easily about 2 node Cluster. Many Thanks. You can also mention the Network IP /Network name in fig 1

    ReplyDelete
  31. In SQL cluster environment, when we create sql jobs, do we need to create job on both nodes individually?

    ReplyDelete
    Replies
    1. No we will create it in only once (in an instance level not on node). Job is not part of the node it is part of the instance.

      Delete
  32. Can you let me know how many IPs are required for cluster configuration , Minimum is 5 is there any max limit , if so for what all, can you explain me in detail regarding that.

    ReplyDelete
    Replies
    1. For a two node cluster service , you need minimum 5 IPs. 2 for SQL instances (Logical IP address), two for physical nodes and one for cluster.You may need more than 5, if you are planning install more than two instance of SQL server in a two node cluster. Hope this answer to your question

      Delete