Solutions with Practical SQL DBA : Windows Server 2008 Cluster: Understanding the Cluster Log Behavior

Wednesday, 17 October 2012


In any system, logs are essential for troubleshooting. In a Windows cluster, the cluster log acts like a black box: it records detailed information about what happened around a cluster failure. This is also the information we need when raising a case with Microsoft. In this post, let us discuss only the cluster log behavior and how to generate the cluster log. In a coming post, we will discuss how to decipher the cluster log.

When you troubleshoot a cluster issue, in most cases the cluster log will point you to the root cause. You can correlate its entries with the Windows event log for further analysis. In Windows Server 2008, the cluster log is captured through the new eventing and diagnostic mechanism called ETW (Event Tracing for Windows). You can see this trace session in the Reliability and Performance Monitor under "Data Collector Sets\Event Trace Sessions". Below is a snapshot of the same.

The log files generated by ETW are stored in the folder %windir%\system32\winevt\logs.

Each time the server is rebooted, a new file is created (for example, ClusterLog.etl.002), and logging continues in that file until the next reboot. Up to three log files are kept, so after the third consecutive reboot you will start losing old log entries. Below is a snapshot of the ETL files in the folder.

For example, in the case above, after two consecutive restarts on 29th September, we lost all the logs up to 12th September. Had there been one more reboot on 29th September, we might have lost all the logs up to 29th September. The ETL file name is incremented each time, with a .00X suffix appended. Once the maximum number of log files (3) is reached, the first (oldest) file is overwritten. At any point in time, only one log file is actively used.
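To make the rotation rule concrete, here is a minimal Python sketch, assuming only the behavior described above: each boot starts logging into a new file, and at most three files are kept. The function name simulate_reboots and the boot labels are hypothetical; this models the retention rule only, not how Windows actually manages the ClusterLog.etl files on disk.

```python
from collections import deque

# Hypothetical model of the reboot-driven rotation described in this post:
# at most MAX_FILES ETL files are kept; each boot starts a new file and,
# once the limit is reached, the oldest file is discarded.
MAX_FILES = 3  # from the post: up to 3 log files are kept


def simulate_reboots(boot_labels, max_files=MAX_FILES):
    """Return the boot sessions whose log file still exists after all boots.

    Each element of boot_labels stands for one boot (the initial start or a
    reboot); every boot begins logging into a fresh file.
    """
    # A deque with maxlen silently drops the oldest item when a new one is
    # appended to a full deque, which mirrors "overwrite the oldest file".
    files = deque(maxlen=max_files)
    for label in boot_labels:
        files.append(label)
    return list(files)  # the last element is the actively used file
```

With four boots, only the last three sessions survive: simulate_reboots(["boot 1", "boot 2", "boot 3", "boot 4"]) keeps boots 2 to 4, which is why a burst of reboots on a single day can push out weeks of older history.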

The default log file size is 100 MB for each ETL file. Once a file reaches the 100 MB limit, entries are deleted from the beginning of the file to make room for new logging. In our case, the active log is ClusterLog.etl.003, which was created on 29th September 2012 at 1:41 AM and has already reached the 100 MB limit. From now on, the entries at the beginning of ClusterLog.etl.003 will be deleted to make room for new ones.
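The truncation behavior is essentially a size-capped circular buffer. The following Python sketch models one ETL file under that assumption; the class CircularLogFile and the tiny byte limit used below are hypothetical and only illustrate why the oldest entries disappear first. The real files use the 100 MB default mentioned above, and the ETW on-disk format is not modeled.

```python
from collections import deque


class CircularLogFile:
    """Hypothetical model of one size-capped log file.

    When a new entry would push the file past its limit, entries are
    dropped from the beginning (oldest first) until the new entry fits.
    """

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.entries = deque()
        self.size = 0  # current size in bytes

    def write(self, entry):
        data_len = len(entry.encode("utf-8"))
        if data_len > self.limit_bytes:
            raise ValueError("entry is larger than the whole file limit")
        # Make room by discarding the oldest entries at the beginning.
        while self.size + data_len > self.limit_bytes:
            oldest = self.entries.popleft()
            self.size -= len(oldest.encode("utf-8"))
        self.entries.append(entry)
        self.size += data_len


# Illustrative only: a 10-byte "file" instead of 100 MB.
log = CircularLogFile(limit_bytes=10)
for line in ["aaaa", "bbbb", "cccc"]:
    log.write(line)
```

After the three writes, only "bbbb" and "cccc" remain: the first entry was pushed out, which is the same effect, at a much smaller scale, as ClusterLog.etl.003 losing its earliest entries.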

Generating Cluster Log

Now the cluster log is spread across three ETL files. The easiest way to read them is to run the cluster log command from the command line. The syntax is:

cluster log /g

This merges the three ETL files on each node and creates an output file named cluster.log, which is stored in the %windir%\Cluster\Reports directory on each node of the cluster.

Another useful switch for the cluster log command is /Span:<minutes>, which generates the log only for the last <minutes> minutes. For example, cluster log /g /Span:15 covers only the last 15 minutes. This helps us troubleshoot recent issues quickly.
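If you script log collection, a small helper can assemble the command line and the expected output location. This is a hedged Python sketch: build_cluster_log_command and report_path are hypothetical helpers, the only switches used are the /g and /Span:<minutes> options described in this post, and the C:\Windows fallback for %windir% is an assumption.

```python
import ntpath  # Windows-style path joining, usable on any platform
import os


def build_cluster_log_command(span_minutes=None):
    """Build the argument list for 'cluster log /g', optionally with /Span."""
    args = ["cluster", "log", "/g"]
    if span_minutes is not None:
        if span_minutes <= 0:
            raise ValueError("span_minutes must be a positive number of minutes")
        # Limit the generated log to the last span_minutes minutes.
        args.append(f"/Span:{span_minutes}")
    return args


def report_path(windir=None):
    """Return the expected cluster.log path under %windir%\\Cluster\\Reports.

    Assumption: falls back to C:\\Windows when %windir% is not set.
    """
    windir = windir or os.environ.get("windir", r"C:\Windows")
    return ntpath.join(windir, "Cluster", "Reports", "cluster.log")
```

On a cluster node, from an elevated session, you could then run subprocess.run(build_cluster_log_command(span_minutes=15), check=True) and open the file returned by report_path() on that node.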

Missing entries in the cluster log

When you generate the log using the cluster log command, you might notice a gap in the output file, that is, no entries for some period in between. This happens because an ETL file is truncated once it reaches the 100 MB limit. In our example, ClusterLog.etl.003, created on 29th September 2012 at 1:41 AM, has already reached the 100 MB limit and has probably truncated some data at its beginning. So when you merge the ETL files using the cluster log command, you may notice a gap after 29th September 2012 1:41 AM, possibly lasting hours, days, or even weeks.

In the output file, you might see entries from 12th September 5:21 AM (the beginning of that data may also have been truncated when its file reached 100 MB) up to 29th September 2012 1:41 AM. These entries come from ClusterLog.etl and ClusterLog.etl.002. After that, you might notice a gap of a few hours or days, because the data at the beginning of ClusterLog.etl.003 was truncated to make room for current log entries.
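To spot such gaps without scrolling through the whole cluster.log, you can scan the timestamps and flag large jumps. This Python sketch assumes each line carries a prefix of the form <pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm (assumed to match the cluster.log files generated on Windows Server 2008; verify against your own file) and that the lines are in chronological order. The function find_gaps, the one-hour threshold, and the sample lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: "<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm ..."
TIMESTAMP_RE = re.compile(r"::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})")


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Return (before, after) timestamp pairs separated by more than threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        current = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Made-up sample lines, for illustration only.
sample = [
    "000012d4.000016a8::2012/09/20-10:00:00.000 INFO  first entry",
    "000012d4.000016a8::2012/09/20-10:05:00.000 INFO  second entry",
    "a continuation line without a timestamp",
    "000012d4.000016a8::2012/09/22-08:00:00.000 INFO  entry after the gap",
]
gaps = find_gaps(sample)
```

Here the only reported gap runs from 20th September 10:05 to 22nd September 08:00. Because cluster log /g writes a separate cluster.log on each node, run the scan against each node's file separately.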