SQL Server 2012 AlwaysOn Availability Groups Dashboard stops updating after thread exhaustion - by Nick_Craver

Status: Won't Fix

Due to several factors the product team decided to focus its efforts on other items. A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


ID: 779206
Status: Closed
Type: Bug
Repros: 3
Opened: 2/14/2013 6:09:20 AM
Access Restriction: Public

Description

Once the primary server runs out of threads for AlwaysOn replication to function, the HADR service stops, and so do the dashboard updates. The underlying data in what appear to be all of the HADR DMVs stops being updated, and the dashboard continues to show the status from the time of the failure.
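
As a rough way to spot the condition before it bites, the query below compares the configured worker limit with the workers currently allocated. It is only a sketch: it assumes the exhaustion described here is of SQL Server worker threads (the report only says "threads"), and it has not been verified against this specific failure.

```sql
-- Sketch: compare the worker limit with current usage on this instance.
-- Assumption: the "threads" in this report are SQL Server worker threads.
SELECT
    (SELECT max_workers_count FROM sys.dm_os_sys_info) AS max_workers_count,
    SUM(s.current_workers_count) AS current_workers,   -- workers allocated right now
    SUM(s.active_workers_count)  AS active_workers,    -- workers assigned to a task
    SUM(s.work_queue_count)      AS queued_tasks       -- tasks waiting for a worker
FROM sys.dm_os_schedulers AS s
WHERE s.status = 'VISIBLE ONLINE';                     -- schedulers that run user work
```

A current_workers value close to max_workers_count together with a non-zero queued_tasks would suggest the instance is at or near its thread limit.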

The other replicas report that they are disconnected, while the primary reports that everything is connected and fine. This means we cannot trust the dashboard for status; we have to poll each node's DMVs directly (via external software).
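
The kind of per-node check described above might look like the following, run separately against each replica. This is a minimal sketch rather than the query the reporter actually uses; the choice of columns is an assumption about what is useful to compare.

```sql
-- Sketch: run on EACH node and compare the results across nodes.
-- A secondary returns a row only for its own (local) replica, so the
-- primary's view of a replica can be checked against that replica's own view.
SELECT
    ag.name                      AS ag_name,
    ar.replica_server_name,
    ars.is_local,
    ars.role_desc,
    ars.connected_state_desc,
    ars.synchronization_health_desc,
    ars.last_connect_error_description
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id
JOIN sys.availability_groups AS ag
    ON ag.group_id = ars.group_id
ORDER BY ag.name, ar.replica_server_name;
```

If the primary lists a secondary as CONNECTED while that secondary's own row says DISCONNECTED, the primary's data is likely stale in the way this report describes.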

The only way we've found to kick start replication again without a SQL restart is to alter a replica's properties, for example:

ALTER AVAILABILITY GROUP [MyAGName] MODIFY REPLICA ON N'SQL01' WITH (SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [MyAGName] MODIFY REPLICA ON N'SQL01' WITH (SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL));

Note: this restarts replication as a whole for all AGs. Just toggling a replica property like this is enough to trigger a replication restart.
Posted by Brent Ozar on 2/14/2013 at 6:14 AM
When this happens, log_send_queue_size in the DMV sys.dm_hadr_database_replica_states shows 0. When the value cannot be accurately determined, it should return NULL instead of 0. That way we don't break the DMV output, but we can still use it for alerting.
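
Until the DMV behaves that way, one hedged approach is to read log_send_queue_size alongside the timestamp columns, so that a 0 can be judged against how recently log was actually sent. This is a sketch only; whether the timestamps also freeze during this failure has not been verified here.

```sql
-- Sketch: show the send queue next to the last-activity timestamps.
-- A queue of 0 with an old last_sent_time on a busy database is suspect.
SELECT
    DB_NAME(drs.database_id)     AS database_name,
    ar.replica_server_name,
    drs.is_local,
    drs.synchronization_state_desc,
    drs.log_send_queue_size,     -- KB of log not yet sent to the secondary
    drs.last_sent_time,
    drs.last_hardened_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
ORDER BY database_name, ar.replica_server_name;
```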