AlwaysOn Availability Group Disk Failure But No Failover - by Dave Hughes

Status: Won't Fix

Due to several factors the product team decided to focus its efforts on other items.

A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

ID: 772887
Status: Closed
Type: Bug
Repros: 5
Opened: 11/30/2012 1:24:17 AM
Access Restriction: Public


The scenario is as follows:

2 x physical servers (Site 1: Node 1 + Node 2) with local C and D drives for the operating system and applications
2 x Hyper-V v2 virtual servers in a DR site (Site 2: Node 3 + Node 4, different subnet)
5 x iSCSI LUNs: data (H), logs (L), tempdb data (T), tempdb log (U), backup (Z)

I believe that native Windows drivers are used for the networking and iSCSI.
A single cluster spans all 4 nodes.

SQL Server binaries installed to D
SQL Server root data (master db etc.) installed to H:/Program Files /Microsoft SQL Server...

Cluster node voting set to Node 1 + Node 2 + File Share Witness
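
The voting configuration above can be checked with simple quorum arithmetic. The sketch below is illustrative only: the `VOTES` table and the `has_quorum` helper are hypothetical names, and it assumes a plain majority-of-votes rule with no dynamic quorum adjustment.

```python
# Hypothetical model of the vote assignment described above.
# Assumes a simple majority rule; all names are illustrative only.
VOTES = {
    "Node 1": 1,
    "Node 2": 1,
    "File Share Witness": 1,
    "Node 3": 0,  # DR site, no vote
    "Node 4": 0,  # DR site, no vote
}


def has_quorum(online):
    """Return True if the online members hold a strict majority of all votes."""
    total = sum(VOTES.values())
    held = sum(VOTES[name] for name in online)
    return held > total // 2


# Losing the whole DR site does not affect quorum.
print(has_quorum({"Node 1", "Node 2", "File Share Witness"}))  # True
# One Site 1 node plus the witness still holds 2 of 3 votes.
print(has_quorum({"Node 2", "File Share Witness", "Node 3", "Node 4"}))  # True
# A single voter on its own does not hold a majority.
print(has_quorum({"Node 1", "Node 3", "Node 4"}))  # False
```

In the test described below, Node 1 itself stays online and only its iSCSI adapters are disabled, so under this model the cluster keeps all three votes.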

Availability Group set to Sync + Failover on Node 1 + Node 2, Async on Node 3 + Node 4

All is currently healthy, clustering is reporting zero errors, data movement is running, Node 1 (physical) is the active server.

To simulate a SAN failure, the network adapters to the iSCSI target were disabled on the active physical server (Node 1). At this point the iSCSI drives disappeared from Windows Explorer as expected; however, the Availability Group DID NOT fail over to the second node.

Running sp_server_diagnostics (on what I assume to be a dying server) reported no errors - all components reported 'clean' (possibly except for events, which may have been 'unknown'). Trying to connect to the databases threw errors as expected.

So my question is: why is sp_server_diagnostics reporting a healthy state when ALL the iSCSI drives have 'failed', including those holding master, tempdb and the user databases? Is this scenario not likely to be one of the most common errors that Availability Groups (and I guess Failover Cluster Instances as well - they also use sp_server_diagnostics) should protect against?

I am aware that per-database errors are not currently included in sp_server_diagnostics, but even so, a complete loss of the databases, especially the system databases, should have flagged something!

Am I missing something?
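
One way to reason about the behavior described above is to model which sp_server_diagnostics components can count toward an automatic failover. The sketch below is an illustration under stated assumptions, not a statement of how the servers in this report were configured: it assumes the flexible failover policy documented for availability groups, in which only the system, resource and query_processing components are evaluated (at failure-condition levels 3, 4 and 5 respectively, with 3 as the default), while io_subsystem and events are reported for diagnostics only. The `would_trigger_failover` function and the sample states are hypothetical.

```python
# Illustrative model only; the mapping is an assumption drawn from the
# documented flexible failover policy, not from the servers in this report.
# Value = minimum FAILURE_CONDITION_LEVEL at which an 'error' state in that
# component is treated as a failover condition. None = diagnostic only.
COMPONENT_MIN_LEVEL = {
    "system": 3,
    "resource": 4,
    "query_processing": 5,
    "io_subsystem": None,
    "events": None,
}


def would_trigger_failover(states, failure_condition_level=3):
    """Return the components whose 'error' state counts at the given level."""
    triggers = []
    for component, state in states.items():
        min_level = COMPONENT_MIN_LEVEL.get(component)
        if (
            state == "error"
            and min_level is not None
            and failure_condition_level >= min_level
        ):
            triggers.append(component)
    return sorted(triggers)


# Hypothetical states resembling the report: everything 'clean'.
observed = {
    "system": "clean",
    "resource": "clean",
    "query_processing": "clean",
    "io_subsystem": "clean",
    "events": "unknown",
}
print(would_trigger_failover(observed))  # []

# Even a hypothetical io_subsystem error would not count in this model.
observed["io_subsystem"] = "error"
print(would_trigger_failover(observed, failure_condition_level=5))  # []
```

Under these assumptions, the loss of the data volumes would lead to a failover only indirectly, for example if the instance itself stopped or became unresponsive.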