AlwaysOn Availability Group Disk Failure But No Failover by Dave Hughes

Active

Type: Bug
ID: 772887
Opened: 11/30/2012 1:24:17 AM
Access Restriction: Public
Workarounds: 0
Users who can reproduce this bug: 1

The scenario is as follows:

2 x Physical Servers (Site 1: Node 1 + Node 2) with local C & D drive for Operating System and Applications
2 x Hyper-V v2 Virtual Servers in a DR site (Site 2: Node 3 + Node 4, different subnet)
5 x iSCSI LUNs for data (H), logs (L), tempdb data (T), tempdb log (U), and backup (Z).

I believe that native Windows drivers are used for the networking and iSCSI.
A single cluster spans all 4 nodes.

SQL Server binaries installed to D
SQL Server root data (master db etc.) installed to H:/Program Files/Microsoft SQL Server...

Cluster node voting set to Node 1 + Node 2 + File Share Witness
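
The voting arithmetic for this layout can be sketched as follows. This is a minimal illustration, assuming a simple majority rule (more than half of the configured votes must be online) and assuming the DR nodes carry no vote. The voter names are illustrative labels, not cluster object names.

```python
# Voters as described in the report: two physical nodes plus a file share witness.
# Node 3 and Node 4 (the DR site) are assumed to carry no vote.
VOTERS = frozenset({"Node 1", "Node 2", "File Share Witness"})


def has_quorum(online, voters=VOTERS):
    """Return True if a strict majority of the configured voters is online."""
    votes_online = len(set(online) & set(voters))
    return votes_online > len(voters) // 2


if __name__ == "__main__":
    # Node 1 loses its storage but stays up: all three voters remain online.
    print(has_quorum({"Node 1", "Node 2", "File Share Witness"}))
    # Node 1 goes down completely: Node 2 and the witness hold 2 of 3 votes.
    print(has_quorum({"Node 2", "File Share Witness"}))
    # Only Node 2 is left: 1 of 3 votes.
    print(has_quorum({"Node 2"}))
```

Under these assumptions, losing the iSCSI storage on Node 1 does not change the vote count at all, because the node itself stays online.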

Availability Group set to Sync + Failover on Node 1 + Node 2, Async on Node 3 + Node 4
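
The replica layout can be written down as a small data model to make the automatic failover partner explicit. This is only a sketch of the configuration as described: the `Replica` class and the `automatic_failover_targets` helper are hypothetical names introduced for illustration, not part of any SQL Server API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    synchronous: bool         # True = synchronous commit, False = asynchronous commit
    automatic_failover: bool  # True = configured as an automatic failover partner


# Layout taken from the report; the names are illustrative labels.
REPLICAS = [
    Replica("Node 1", synchronous=True, automatic_failover=True),
    Replica("Node 2", synchronous=True, automatic_failover=True),
    Replica("Node 3", synchronous=False, automatic_failover=False),
    Replica("Node 4", synchronous=False, automatic_failover=False),
]


def automatic_failover_targets(primary, replicas=REPLICAS):
    """List the replicas that could take over automatically from `primary`."""
    return [
        r.name
        for r in replicas
        if r.name != primary and r.synchronous and r.automatic_failover
    ]


if __name__ == "__main__":
    print(automatic_failover_targets("Node 1"))
```

With Node 1 as the primary, Node 2 is the only automatic failover target in this model.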

All is currently healthy: clustering is reporting zero errors, data movement is running, and Node 1 (physical) is the active server.

To simulate a SAN failure, the network adapters to the iSCSI target were disabled on the active physical server. At this point the additional drives disappeared from Windows Explorer as expected; however, the Availability Group DID NOT fail over to the second node.

Running sp_server_diagnostics (on what I assume to be a dying server) reported no errors - all components reported 'clean' (possibly except for events, which may have been 'unknown'). Trying to connect to the databases threw errors as expected.
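
The observation above can be expressed as a small check over the procedure's result set. This is a minimal sketch, assuming the rows have already been fetched as (component_name, state_desc) pairs. The sample rows only mirror the description in this report; they are not captured output, and `components_in_state` is a hypothetical helper.

```python
# Illustrative rows only: they mirror the description in the report
# (everything 'clean', 'events' possibly 'unknown') and are NOT captured output.
SAMPLE_ROWS = [
    ("system", "clean"),
    ("resource", "clean"),
    ("query_processing", "clean"),
    ("io_subsystem", "clean"),
    ("events", "unknown"),
]


def components_in_state(rows, states=("warning", "error")):
    """Return the names of components whose state_desc is one of `states`."""
    wanted = {s.lower() for s in states}
    return [name for name, state_desc in rows if state_desc.lower() in wanted]


if __name__ == "__main__":
    unhealthy = components_in_state(SAMPLE_ROWS)
    print(unhealthy or "no component reports warning or error")
```

With rows like these, nothing is flagged, which matches the behaviour described: the storage is gone, yet no component reports 'warning' or 'error'.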

So my question is: why is sp_server_diagnostics reporting a healthy state when ALL the iSCSI drives have 'failed', including those holding master, tempdb and the user databases? Is this scenario not likely to be one of the most common errors that Availability Groups (and I guess Failover Cluster Instances as well - they also use sp_server_diagnostics) should protect against?

I am aware that per-database errors are not currently included in sp_server_diagnostics, but even so, a complete loss of the databases, especially the system databases, should have flagged something!

Am I missing something?

Details

Product Language: English
Version: SQL Server 2012 - Enterprise Edition
Category: SQL Engine
Operating System: Windows Server 2012 Standard
Operating System Language: English

Steps to Reproduce

1) Provision 4 Windows Server 2012 servers (although 2 may be sufficient?)
2) Attach iSCSI LUNs for the SQL Server system databases and user databases.
3) Configure clustering across the servers (including voting etc.)
4) Install SQL Server using iSCSI for system databases, data, logs, tempdb etc.
5) Provision an Availability Group with a single database (Sync + Auto failover between nodes 1 and 2)
6) Disable the network adapters to the iSCSI targets on the active physical server.
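
After step 6, the loss of the iSCSI volumes can be confirmed from a script rather than from Windows Explorer. The sketch below is one possible check, assuming the drive letters from the scenario and assuming that a disconnected volume makes its root directory unreachable. `missing_drives` is a hypothetical helper.

```python
import os

# Drive letters from the scenario: data, logs, tempdb data, tempdb log, backup.
ISCSI_DRIVES = ["H", "L", "T", "U", "Z"]


def missing_drives(letters=ISCSI_DRIVES, exists=os.path.isdir):
    """Return the drive letters whose root directory is not reachable.

    `exists` is injectable so the check can be exercised without the real volumes.
    """
    return [letter for letter in letters if not exists(f"{letter}:\\")]


if __name__ == "__main__":
    gone = missing_drives()
    print("missing:", ", ".join(gone) if gone else "none")
```

Run on the active node before and after disabling the adapters, this would be expected to list none of the drives first and all five afterwards.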

Actual Results

No Availability Group failover to the second node.

sp_server_diagnostics returns 'clean' or 'unknown' for all subsystems.

Expected Results

The Availability Group fails over to the second node.

sp_server_diagnostics returns 'error'

Platform: X64
Virtualization: Other (e.g. VMware, specify in Description)
File Attachments: 0