Once the primary server runs out of worker threads for AlwaysOn replication, the HADR service stops, and so do the dashboard updates. The underlying data in what appears to be all of the HADR DMVs stops being updated, so the dashboard keeps showing the status from the time of the failure.
The other replicas report that they are disconnected, while the primary reports that everything is connected and healthy. This means we cannot trust the dashboard for status; we have to poll each node's DMVs directly (via external software).
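As a rough sketch of what such a poll might look like, the query below can be run against each node in turn and the results compared by the external tool. The column aliases (polled_node, ag_name) are only illustrative, and a secondary may return a row only for its own local replica, so treat this as a starting point rather than a complete monitoring query.

```sql
-- Run on every node; each instance reports its own view of the AG.
SELECT
    @@SERVERNAME                        AS polled_node,  -- illustrative alias
    ag.name                             AS ag_name,      -- illustrative alias
    ar.replica_server_name,
    ars.is_local,
    ars.role_desc,
    ars.operational_state_desc,
    ars.connected_state_desc,
    ars.synchronization_health_desc,
    ars.last_connect_error_number,
    ars.last_connect_error_description,
    ars.last_connect_error_timestamp
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id
JOIN sys.availability_groups AS ag
    ON ag.group_id = ars.group_id
ORDER BY ag.name, ar.replica_server_name;
```

If the primary shows a secondary as CONNECTED while that secondary's own row says DISCONNECTED, the primary's view is likely the stale one described above.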
The only way we've found to kick-start replication again without restarting SQL Server is to alter a replica's properties, for example:
ALTER AVAILABILITY GROUP [MyAGName] MODIFY REPLICA ON N'SQL01' WITH (SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [MyAGName] MODIFY REPLICA ON N'SQL01' WITH (SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL));
Note: this restarts replication as a whole for all AGs; simply toggling a replica property is enough to trigger the restart.
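To check whether the instance is close to running out of worker threads in the first place, something like the following may help. It is a minimal sketch that compares the configured worker limit with what the schedulers currently report; the aliases are illustrative, and on a badly starved instance you may need the dedicated admin connection just to run it.

```sql
-- Compare the instance-wide worker limit with current scheduler usage.
SELECT
    (SELECT max_workers_count FROM sys.dm_os_sys_info) AS max_workers,      -- configured ceiling
    SUM(s.current_workers_count)                       AS current_workers,  -- workers that exist now
    SUM(s.active_workers_count)                        AS active_workers,   -- workers assigned to tasks
    SUM(s.work_queue_count)                            AS queued_tasks      -- tasks waiting for a worker
FROM sys.dm_os_schedulers AS s
WHERE s.status = 'VISIBLE ONLINE';
```

A current_workers value at or near max_workers, together with a growing queued_tasks count, would be consistent with the thread exhaustion described at the top.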