We are running a mainly OLTP workload on SQL Server 2012 SP1 CU4 on Windows Server 2008 R2 SP1, on a two-node, single-instance FCI. Each node is an HP DL980 Gen 7 with eight Xeon E7-4870 processors and 2 TB of RAM. We have HT disabled, with MAXDOP set to 6 and cost threshold for parallelism set to 100.
We recently decided to roll out an AlwaysOn AG with one asynchronous replica on a second two-node, single-instance cluster with identical hardware. To prepare for that, we deployed all of the OS hotfixes for Windows Server 2008 R2 listed here. We also installed SQL Server 2012 SP1 CU7 in order to get the hotfix for an AlwaysOn AG memory leak that was first included in SQL Server 2012 SP1 CU6 (FIX: A memory leak occurs when you enable AlwaysOn Availability Groups or SQL Server failover cluster in Microsoft SQL Server 2012).
We installed CU7 on a Friday and set up the AG on Saturday. Starting Sunday evening (Monday in Asia), CPU usage rose from the 30-40% range to the 90-95% range. We saw the same symptoms even during non-peak periods. We suspected that CU7 was responsible for the change, and also that the AlwaysOn AG was adding significant CPU utilization.
We tried various troubleshooting methods and paused the AG for a few hours, but nothing helped, so we rolled back CU7, which brought CPU usage back to normal. We also went to our PreProd environment and compared the CPU metrics before and after CU7, and we can clearly see that CPU usage spiked after the CU7 upgrade. We can consistently reproduce this in a few of our environments.
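For reference, the before/after comparison can be sketched roughly as below. This is a minimal illustration only: the sample values, the 20-point threshold, and the function name `cpu_regression` are hypothetical, not our actual measurements or tooling.

```python
from statistics import mean


def cpu_regression(before, after, threshold_pct=20.0):
    """Compare two lists of CPU utilization samples (in percent).

    Returns (mean_before, mean_after, regressed), where regressed is True
    when the mean rose by at least threshold_pct percentage points.
    The threshold is an arbitrary, hypothetical cutoff.
    """
    avg_before = mean(before)
    avg_after = mean(after)
    regressed = (avg_after - avg_before) >= threshold_pct
    return avg_before, avg_after, regressed


# Hypothetical samples, chosen only to mirror the ranges described above.
before_cu7 = [32, 35, 38, 40, 30]
after_cu7 = [90, 92, 95, 91, 93]

avg_before, avg_after, regressed = cpu_regression(before_cu7, after_cu7)
print(f"before: {avg_before:.1f}%  after: {avg_after:.1f}%  regressed: {regressed}")
```

Feeding it samples collected over the same time window and workload on each build would keep the comparison fair.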
1. Have you heard of any issues like this from other customers, with the hotfixes between SP1 CU4 and SP1 CU7?
2. How should we proceed in order to apply the AG-related hotfixes, especially KB2877100?
We have opened a support case related to this (114031011250466), if you want to check it out.