System-Level Fault Tolerance (Clustering/Network Load Balancing) - Examining Windows Server 2003 Clustering Technologies
Windows Server 2003 provides two clustering technologies, both included on the Enterprise and Datacenter server platforms. Clustering is the grouping of independent server nodes that are accessed and viewed on the network as a single system. When an application runs on a cluster, the end user can connect to a single cluster node to do their work, or each request can be handled by multiple nodes in the cluster. Where data is read-only, the client may request data and receive it from all the nodes in the cluster, improving overall performance and response time.
The first clustering technology Windows Server 2003 provides is Cluster Service, also known as Microsoft Cluster Service (MSCS). Cluster Service provides system fault tolerance through a process called failover. When a system fails or is unable to respond to client requests, the clustered services are taken offline and moved from the failed server to another available server, where they are brought online and begin responding to existing and new connections and requests. Cluster Service is best used to provide fault tolerance for file, print, enterprise messaging, and database servers.
The second Windows Server 2003 clustering technology is network load balancing (NLB), which is best suited to providing fault tolerance for front-end Web applications and Web sites, Terminal servers, VPN servers, and streaming media servers. NLB provides fault tolerance by having each server in the cluster individually run the network services or applications, removing any single points of failure. Certain applications (Terminal Services, for example) require a client to connect to the same server during the entire session, while clients viewing Web sites can request pages from any node in the cluster during a visit.
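The difference between session-bound and any-node traffic described above can be sketched in a few lines of Python. This is only an illustration, not NLB's actual distribution algorithm: the function name `pick_node`, the node names, and the use of a SHA-256 hash are all assumptions made for the example. Hashing the client address alone keeps a client on one node, while adding the source port lets each connection land on a different node.

```python
import hashlib


def pick_node(nodes, client_ip, client_port=None):
    """Choose a node for a client connection (hypothetical helper).

    With only the IP address as the key, every connection from that
    client maps to the same node. Including the source port in the key
    lets separate connections from one client spread across the nodes.
    """
    key = client_ip if client_port is None else f"{client_ip}:{client_port}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and map it to a node.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical three-node cluster.
NODES = ["web1", "web2", "web3"]
```

Calling `pick_node(NODES, "10.0.0.5")` returns the same node every time, which is the behavior a Terminal Services session needs. Calling it with a varying `client_port` lets successive connections from one client fall on different nodes, as Web page requests can.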
How client/server communication is divided and balanced across the servers depends on the application's needs.
Note: Microsoft does not support running both MSCS and NLB on the same computer because of potential hardware sharing conflicts between the two technologies.
Reviewing Cluster Terminology
Before you can design and implement MSCS and NLB clusters, you must understand certain clustering terminology. The following list describes key terms associated with Windows Server 2003 clustering:
Cluster: A cluster is a group of independent servers that are accessed and viewed on the network as a single system.
Node: A node is an independent server that is a member of a cluster.
Cluster resource: A cluster resource is a network application or service defined and managed by the cluster application. Some examples of cluster resources are network names, IP addresses, logical disks, and file shares.
Cluster resource group: Cluster resources are contained within a cluster in a logical set called a cluster resource group or, more commonly, a cluster group. Cluster groups are the units of failover within the cluster. When a cluster resource fails and cannot be restarted automatically, the entire cluster group is taken offline and failed over to another available cluster node.
Cluster virtual server: A cluster virtual server is a cluster resource group that contains a network name and IP address resource. Virtual server resources are accessed through Domain Name System (DNS) or NetBIOS name resolution, or directly by IP address. The name and IP address remain the same regardless of which cluster node the virtual server is running on.
Cluster heartbeat: The cluster heartbeat is the communication maintained between individual cluster nodes to determine node status. Typically, heartbeat communication between nodes must take no longer than 500 milliseconds, or the nodes may conclude that there is a failure and begin cluster group failovers.
Cluster quorum disk: The cluster quorum disk maintains the definitive cluster configuration data. MSCS uses a quorum disk or disks and requires continuous access to the cluster configuration data stored there. The quorum contains configuration data defining which server nodes actively participate in the cluster, what applications and services are defined in the cluster, and the current states of the resources and the individual nodes. This data is used to determine whether a particular resource group or groups need to be failed over to an available cluster node in the event of a failure on an active node. If a cluster node loses access to the quorum, the Cluster Service will fail on that node. In a typical MSCS cluster, the quorum resource is located on a shared storage device.
Local quorum resource: Like the quorum resource, the local quorum contains the cluster configuration data. Unlike the standard quorum device, which is usually housed on a shared disk, the local quorum is kept on a node's local disk. The local quorum resource was created for single-node cluster configurations, commonly used for cluster application development and testing.
Majority Node Set (MNS) resource: The MNS resource is the quorum resource used for a Majority Node Set cluster. The MNS resource maintains consistent configuration data across all the nodes in the cluster. If the MNS quorum is lost, it can be recovered by "forcing the quorum" on a remaining cluster node. Refer to the Windows Server 2003 online help and look for the topic "Forcing the Quorum in a Majority Node Set Cluster."
Generic cluster resource: Generic cluster resources were created to define cluster-unaware applications within a cluster group. This makes it possible to fail the resource over to another node in the cluster when the active node fails. This resource is not monitored by the cluster application; therefore, application failure does not result in a restart or failover.
Generic cluster resources include the generic application, generic script, and generic service resources. For more information on these resources, refer to the Windows Server 2003 Help and Support tool and search for "generic cluster resources."
Cluster-aware application: A cluster-aware application provides a mechanism by which the Cluster Service can test the application's availability to determine whether it is functioning as desired. When a cluster-aware application fails, the cluster can stop and restart the application on the same node and, if necessary, move it to another available node where it can be restarted.
Cluster-unaware application: A cluster-unaware application can run on a cluster, but the application itself is not monitored by the Cluster Service. This means that the cluster can fail over the application only when another resource in the cluster group fails. If the application stops responding, the cluster is not aware of it and therefore cannot restart it. Keep in mind that there are other ways to manage cluster-unaware applications outside the cluster, and in some cases these approaches may be the only option. For more information on how to install and configure generic applications, refer to the Windows Server 2003 Help and Support tool and search for "generic application resource type."
Failover: Failover is the process of a cluster group moving from the current active node to another available node in the cluster. Failover occurs when a server becomes unavailable or when a resource in the cluster group fails and cannot recover within the failure threshold.
Failback: Failback is the process of a cluster group moving back to a preferred node after the preferred node resumes cluster membership. Failback must be configured within a cluster group for this to happen. The cluster group must have a preferred node defined and a failback threshold configured.
A preferred node is the node you want your cluster group to run on during regular cluster operation. When a group fails back, the cluster performs the same operation as a failover, but it is triggered by a server rejoining or resuming cluster operation instead of by a server or resource failure.
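The failover and failback definitions above can be modeled with a small Python sketch. It is a toy model under stated assumptions, not the Cluster Service API: the `ClusterGroup` and `Cluster` classes, their fields, and the rule of failing over to the first available node are hypothetical, and the failback threshold mentioned in the definition is reduced to a simple on/off flag.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClusterGroup:
    """A cluster group: the unit that moves between nodes (hypothetical model)."""
    name: str
    owner: Optional[str]                  # node hosting the group, None if offline
    preferred_node: Optional[str] = None  # node the group should normally run on
    allow_failback: bool = False          # stand-in for a configured failback policy


@dataclass
class Cluster:
    nodes: Dict[str, bool]                # node name -> True while the node is up
    groups: List[ClusterGroup]

    def node_failed(self, node: str) -> None:
        """Failover: move every group owned by the failed node elsewhere."""
        self.nodes[node] = False
        survivors = [name for name, up in self.nodes.items() if up]
        for group in self.groups:
            if group.owner == node:
                # Assumption: pick the first available node; the group goes
                # offline (owner None) if no node is left.
                group.owner = survivors[0] if survivors else None

    def node_rejoined(self, node: str) -> None:
        """Failback: return groups to a rejoining preferred node, if allowed."""
        self.nodes[node] = True
        for group in self.groups:
            if group.allow_failback and group.preferred_node == node:
                group.owner = node
```

In this model a group with `allow_failback=False` stays wherever failover left it, even after its preferred node rejoins the cluster.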
Note: Plan carefully when considering failback. For more information, refer to the "Configuring Failover and Failback" section later in this chapter.
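Two of the terms defined in the list above, the cluster heartbeat and the Majority Node Set quorum, lend themselves to a short Python sketch. The 500-millisecond limit comes from the heartbeat definition; everything else is an assumption for illustration. The function names are hypothetical, and the majority rule (more than half of the configured nodes must be up and communicating) is general background on Majority Node Set clusters rather than something stated in this excerpt.

```python
HEARTBEAT_LIMIT_MS = 500  # limit quoted in the cluster heartbeat definition


def responsive_nodes(heartbeat_ms):
    """Return the sorted names of nodes whose heartbeat met the limit.

    heartbeat_ms maps a node name to the measured heartbeat delay in
    milliseconds, or to None when no heartbeat arrived at all.
    """
    return sorted(
        node
        for node, delay in heartbeat_ms.items()
        if delay is not None and delay <= HEARTBEAT_LIMIT_MS
    )


def has_majority(total_nodes, reachable_nodes):
    """True when more than half of the configured nodes are reachable."""
    return reachable_nodes >= total_nodes // 2 + 1


# Hypothetical readings for a five-node cluster.
readings = {"node1": 12, "node2": 730, "node3": None, "node4": 95, "node5": 480}
alive = responsive_nodes(readings)
quorum_ok = has_majority(len(readings), len(alive))
```

Here two of the five nodes miss the limit, but the three that remain still form a majority. In a four-node cluster, losing two nodes would leave exactly half, which is not a majority.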
This chapter is from Microsoft Windows Server 2003 Unleashed, by Rand Morimoto, et al. (Sams Publishing, 2004, ISBN: 0672326671). Check it out at your favorite bookstore today.