System-Level Fault Tolerance (Clustering/Network Load Balancing)
Windows Server 2003 provides several methods of improving system- or server-level fault tolerance by using a few of the services included in the Enterprise and Datacenter platforms. This chapter covers system-level fault tolerance using Windows Server 2003 network load balancing (NLB) and the Microsoft Cluster Service (MSCS). (From Microsoft Windows Server 2003 Unleashed, second edition, by Rand Morimoto, et al. Sams Publishing, 2004, ISBN: 0672326671.)
In many of today's business environments, using computer applications and networking services has become critical in conducting day-to-day business functions efficiently. The word downtime has become taboo in situations in which an unstable application or a failed server can greatly impact employee productivity or cost organizations money. Deploying fault-tolerant servers to provide reliable access to critical applications, user data, and networking services is required when unexpected downtime is unacceptable.
Windows Server 2003 provides several methods of improving system- or server-level fault tolerance by using a few of the services included in the Enterprise and Datacenter platforms. Chapter 30, "File System Fault Tolerance (DFS)," discussed file-level fault tolerance, including the Distributed File System (DFS) and volume shadow copies. This chapter covers system-level fault tolerance using Windows Server 2003 network load balancing (NLB) and the Microsoft Cluster Service (MSCS). These built-in clustering technologies provide load-balancing and failover capabilities that can be used to increase fault tolerance for many different types of applications and network services. Each of these clustering technologies is different in many ways. Choosing the correct type of clustering depends on the applications and services that will be hosted on the cluster.
Windows Server 2003 technologies such as NLB and MSCS improve fault tolerance for applications and network services, but before these technologies can be leveraged effectively, basic server stability best practices must be put in place.
This chapter focuses on the policies and procedures needed to create an environment that supports a fault-tolerant network. It also contains the step-by-step procedures needed to make servers more reliable by implementing NLB and MSCS.
Building Fault-Tolerant Systems
Building fault-tolerant computing systems consists of carefully planning and configuring server hardware and software, network devices, and power sources. Purchasing quality server and network hardware is a good start to building a fault-tolerant system, but the proper configuration of this hardware is equally important. Also, providing this equipment with stable line power that is backed up by a battery or generator adds fault tolerance to the network. Last but not least, proper tuning of server operating systems helps enhance availability of network services such as file shares, print servers, network applications, and authentication servers.
Using Uninterruptible Power Supplies
Routing line power to server and network devices through uninterruptible power supplies (UPSs) conditions the incoming power by removing voltage spikes and holding line voltage steady, and it also provides battery backup power. When line power fails, the UPS switches to battery mode, which should provide enough time to shut down the server or network device without risk of damaging hardware or corrupting data. UPS manufacturers commonly provide software that can send network notifications, run scripts, or even gracefully shut down servers when power thresholds are met. In addition, most computer and network hardware manufacturers offer device configurations with redundant power supplies designed to keep the system powered up if a single power supply fails.
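The threshold behavior described above can be sketched roughly as follows. This is only an illustration of the decision logic, not any vendor's software: the `UpsStatus` fields, the `decide_action` function, and the default threshold values are all hypothetical, and a real deployment would read these values from the UPS manufacturer's own tools.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of a UPS; field names are assumptions."""
    on_battery: bool        # True when line power has failed
    charge_percent: float   # remaining battery charge, 0-100
    runtime_minutes: float  # estimated minutes of battery runtime left


def decide_action(status, min_charge_percent=50.0, min_runtime_minutes=10.0):
    """Return "none", "notify", or "shutdown" for a UPS snapshot.

    The thresholds are illustrative defaults, not recommendations.
    """
    if not status.on_battery:
        # Line power is present, so nothing needs to happen.
        return "none"
    if (status.charge_percent <= min_charge_percent
            or status.runtime_minutes <= min_runtime_minutes):
        # A power threshold has been met: shut down gracefully while
        # there is still enough battery to finish the shutdown.
        return "shutdown"
    # On battery but above the thresholds: alert administrators only.
    return "notify"


if __name__ == "__main__":
    print(decide_action(UpsStatus(on_battery=True, charge_percent=80.0,
                                  runtime_minutes=25.0)))
```

Separating the decision from the action (sending a notification, running a script, starting a shutdown) keeps the threshold policy easy to test without touching real hardware.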
Many system administrators discover which critical devices are not connected to a UPS only during a power outage, and the race begins to shut down non-critical devices and shift their battery power to critical ones. To avoid these situations, administrators need to perform regular inspections of critical hardware devices in server rooms and network closets to ensure that all necessary servers, network routers, switches, hubs, and firewalls are backed by battery power. When power to a server fails, its battery may provide only a few minutes for users to save data and close connections. The network must remain available during that window so that users can do so and reduce the chance of data corruption.
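A regular inspection like this can be supported by a simple inventory check. The sketch below assumes a hand-maintained inventory in which each device records whether it is critical and whether it is connected to a UPS. The dictionary keys, the device names, and the `find_unprotected` function are hypothetical and exist only for illustration.

```python
def find_unprotected(devices):
    """Return the sorted names of critical devices that are not on a UPS.

    `devices` is a list of dicts with the assumed keys
    "name", "critical", and "on_ups".
    """
    return sorted(
        device["name"]
        for device in devices
        if device["critical"] and not device["on_ups"]
    )


if __name__ == "__main__":
    # Example inventory; all names and values are made up.
    inventory = [
        {"name": "file-server-1", "critical": True, "on_ups": True},
        {"name": "core-switch", "critical": True, "on_ups": False},
        {"name": "firewall", "critical": True, "on_ups": False},
        {"name": "lab-printer", "critical": False, "on_ups": False},
    ]
    for name in find_unprotected(inventory):
        print(f"Critical device without battery backup: {name}")
```

Running a check of this kind after every hardware change helps surface unprotected routers, switches, and firewalls before an outage does.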