System-Level Fault Tolerance (Clustering/Network Load Balancing) - Restoring a Single-Node Cluster When the Cluster Service Fails

When Cluster Service on a single node fails and will not start, it is usually a sign of corruption in the local cluster database file, CLUSDB. In the interest of time, an administrator can replace the CLUSDB file with the latest CHKxxx.tmp file from the quorum disk's MSCS directory. To replace the CLUSDB file, follow these steps:

1. Log on to the cluster node using an account that has the right to back up the system. (Any Local Administrator, Domain Administrator, or Cluster Service account has the necessary permissions to complete the operation.)
2. Open Cluster Administrator on an available cluster node. Then check that all cluster groups are running properly to verify that the Cluster Service problem affects only a single node.
3. If only one node is experiencing Cluster Service startup problems, log on to the server console of that node and click Start, All Programs, Administrative Tools, Services.
4. In the Services applet, locate Cluster Service and double-click it.
5. On the General tab of the property page for Cluster Service, set Startup Type to Disabled. Click OK to save changes.
6. Reboot the server to release any file locks on the CLUSDB file.
7. When the server completes the reboot process, log on with a Cluster Administrator account.
8. Click Start, Run.
9. Connect to the cluster quorum disk by using the UNC path \\<clustername>\<quorum_drive_letter>$. For example, in a cluster named cluster1 whose quorum disk uses drive letter Q, use the path \\cluster1\Q$.
10. Double-click the MSCS directory.
11. Choose View, Details in the Explorer window.
12. Locate the file named CHKxxx.tmp with the latest time stamp, similar to the one shown in Figure 31.17.
13. Right-click the file and choose Copy. Then close the Explorer window.
14. Click Start, Run.

Figure 31.17 Choosing a backup set for restoral.

15. Type in the full path to the cluster directory and click OK. The default path is C:\windows\cluster, where C is the system drive and windows is the %SystemRoot% directory.
16. Locate the CLUSDB file, right-click it, and choose Rename.
17. Rename the file to CLUSDB.old and press Enter to save. If the file cannot be renamed, make sure the Cluster Service Startup Type is set to Disabled, reboot the server, and then try again.
18. Choose Edit, Paste in the Explorer window. The CHKxxx.tmp file should now be copied into the C:\windows\cluster directory.
19. Locate the CHKxxx.tmp file, right-click it, and choose Rename.
20. Rename the file to CLUSDB and press Enter to save. If the file cannot be renamed, make sure the Cluster Service Startup Type is set to Disabled, reboot the server, and then try again.
21. Close the Explorer window.
22. Click Start, All Programs, Administrative Tools, Services.
23. In the Services applet, locate Cluster Service and double-click it.
24. On the General tab of Cluster Service's property page, change Startup Type to Automatic. Click OK to save your changes.
25. Right-click Cluster Service and choose Start.
26. When Cluster Service starts, move the appropriate group or groups to the recovered node to test failover functionality.
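The file-handling part of this procedure (pick the newest checkpoint, rename CLUSDB to CLUSDB.old, put the checkpoint in place as CLUSDB) can be sketched in Python. This is only an illustration, not a tool from the book: the function names newest_checkpoint and replace_clusdb are hypothetical, and the sketch assumes Cluster Service has already been disabled and the node rebooted so that CLUSDB is not locked.

```python
import shutil
from pathlib import Path


def newest_checkpoint(mscs_dir: Path) -> Path:
    """Return the CHK*.tmp file with the latest modification time."""
    candidates = [p for p in mscs_dir.glob("CHK*.tmp") if p.is_file()]
    if not candidates:
        raise FileNotFoundError(f"no CHK*.tmp files found in {mscs_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def replace_clusdb(mscs_dir: Path, cluster_dir: Path) -> Path:
    """Rename CLUSDB to CLUSDB.old, then copy the newest checkpoint in as CLUSDB.

    Returns the checkpoint file that was used.
    """
    checkpoint = newest_checkpoint(mscs_dir)
    clusdb = cluster_dir / "CLUSDB"
    backup = cluster_dir / "CLUSDB.old"
    # Refuse to overwrite an earlier CLUSDB.old; it may be the only good copy.
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; move it aside first")
    if clusdb.exists():
        clusdb.rename(backup)
    # copy2 preserves the checkpoint's time stamp on the new CLUSDB file.
    shutil.copy2(checkpoint, clusdb)
    return checkpoint
```

With the example names from the text, a call might look like replace_clusdb(Path(r"\\cluster1\Q$\MSCS"), Path(r"C:\windows\cluster")). The checkpoint is copied rather than moved, so the original stays on the quorum disk, which matches the Copy and Paste steps above.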
If this process does not restore operational status to Cluster Service, restore the system state from a previous backup by following these steps:

1. Click Start, All Programs, Accessories, System Tools, Backup.
2. If this is the first time you've run Backup, it will open in Wizard mode. Choose to run it in Advanced mode by clicking the Advanced Mode hyperlink.
3. After you change to Advanced mode, the window should look like the one in Figure 31.13. Click the Restore Wizard (Advanced) button to start the Restore Wizard.
4. Click Next on the Restore Wizard Welcome screen to continue.
5. On the What to Restore page, select the appropriate cataloged backup media, expand the catalog selection, and check System State, as shown in Figure 31.18. Click Next to continue.

Figure 31.18 Choosing to restore the system state.

If the correct tape or file backup media does not appear in this window, cancel the restore process. Then, from the Restore Wizard page, locate and catalog the appropriate media and return to the restore process from step 1.

Note - Refer to Chapter 33 for information on how to catalog tape and file backup media.

6. On the Completing the Restore Wizard page, click Finish to start the restore.
7. When the process is complete, review the log for detailed information and click Close when finished.
8. Reboot the restored cluster node as prompted.
9. When Cluster Service starts, move the appropriate group or groups to the recovered node to test failover functionality.
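Before moving groups to the recovered node, it can help to confirm that Cluster Service really is running. The following Python sketch parses the output of the Windows sc query command to do that. It rests on assumptions not stated in the text: that the service's short name is clussvc and that sc prints a line of the form "STATE : 4 RUNNING". The function names are hypothetical.

```python
import re
import subprocess

SERVICE_NAME = "clussvc"  # assumed short name of Cluster Service


def parse_service_state(sc_output: str) -> str:
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    if match is None:
        raise ValueError("no STATE line found in sc output")
    return match.group(1)


def cluster_service_running() -> bool:
    """Run `sc query` on the local node and report whether the state is RUNNING."""
    result = subprocess.run(
        ["sc", "query", SERVICE_NAME],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_service_state(result.stdout) == "RUNNING"
```

Keeping the parsing separate from the subprocess call means the parser can be exercised with captured output on any machine, while cluster_service_running is only meaningful on the Windows node itself.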
This chapter is from Microsoft Windows Server 2003 Unleashed, by Rand Morimoto, et al. (Sams Publishing, 2004, ISBN: 0672326671). Check it out at your favorite bookstore today.