Home Dashboard Directory Help
Search

SQL Server IO handling mechanism can be severely affected by high CPU usage by Ramesh Meyyappan (www.SQLWorkshops.com)


Status: 

Closed
 as Fixed Help for as Fixed


107
2
Sign in
to vote
Type: Bug
ID: 744650
Opened: 5/29/2012 7:21:49 AM
Access Restriction: Public
0
Workaround(s)
view
8
User(s) can reproduce this bug

Description

You might notice very high IO response time within SQL Server when CPU intensive queries are running on the system while there are no real IO issues at the Hardware or Operating System level. As a result the system might be very slow.

To reproduce the behavior, create the following database and table:
use master
drop database SQLWorkshops
go
use master
create database SQLWorkshops
go
use SQLWorkshops
set nocount on
create table tab71 (c1 int primary key clustered, c2 int, c3 nchar(2000))
begin tran
declare @i int
set @i = 1
while @i <= 200000
begin
insert into tab71 values (@i, @i, 'SQLWorkshops.com')
set @i = @i + 1
end
commit tran
go

Execute the following 500 row insert batch, it will be fast.
use SQLWorkshops
declare @num_of_writes_before bigint, @num_of_bytes_written_before bigint, @io_stall_write_ms_before bigint
declare @num_of_writes_after bigint, @num_of_bytes_written_after bigint, @io_stall_write_ms_after bigint
declare @date_before datetime, @date_after datetime
set nocount on
drop table tab7
create table tab7 (c1 int primary key clustered, c2 int, c3 char(2000))
checkpoint
dbcc sqlperf('sys.dm_os_wait_stats', clear)
select @num_of_writes_before = num_of_writes, @num_of_bytes_written_before = num_of_bytes_written, @io_stall_write_ms_before = io_stall_write_ms
     from sys.dm_io_virtual_file_stats(db_id(), 2) vfs
set statistics time off
set @date_before = getdate()
declare @i int
set @i = 1
while @i <= 500
begin
insert into tab7 values (@i, @i, 'a')
set @i = @i + 1
end
set @date_after = getdate()
set statistics time off
select @num_of_writes_after = num_of_writes, @num_of_bytes_written_after = num_of_bytes_written, @io_stall_write_ms_after = io_stall_write_ms
     from sys.dm_io_virtual_file_stats(db_id(), 2) vfs
select @num_of_writes_after - @num_of_writes_before as num_of_writes, @num_of_bytes_written_after - @num_of_bytes_written_before as num_of_bytes_written, @io_stall_write_ms_after - @io_stall_write_ms_before as io_stall_write_ms, (@num_of_bytes_written_after - @num_of_bytes_written_before) / (@num_of_writes_after - @num_of_writes_before) as WriteSizeBytes, (@io_stall_write_ms_after - @io_stall_write_ms_before) * 1.0 / (@num_of_writes_after - @num_of_writes_before) as AvgStallMS
select * from sys.dm_os_wait_stats
     where wait_type = 'WRITELOG'
select datediff(ms, @date_before, @date_after) as Duration
go

Now execute the CPU intensive query to spin all CPUs at 100% and then execute the above 500 row insert batch concurrently, it will be very, very slow.
use SQLWorkshops
while 1=1
select * from tab71
     where c3 like '%SQLIO.COM%'
option (maxdop 0)
go

You will notice the inserts are very, very slow now, SQL Server reports very high IO times both in sys.dm_io_virtual_file_stats and sys.dm_os_wait_stats. In addition you will notice a large number of WAITELOG waits since log records are written by LOG WRITER and hence very high signal_wait_time_ms leading to more query delays. However, Performance Monitor Counter, PhysicalDisk, Avg. Disk sec/Write will report very low latency times.

Such delayed IO handling also occurs to read operations with artificially very high PAGEIOLATCH_SH wait time (with number of PAGEIOLATCH_SH waits remaining the same). This problem will manifest more and more as customers start using SSD based storage for SQL Server, since they drive the CPU usage to the limits with faster IOs.
Details
Sign in to post a comment.
Posted by warCHE on 8/19/2013 at 4:28 PM
So this is marked as fixed but there's no comment regarding resolution. Anyone can comment on that??
Posted by Data Realized on 7/30/2012 at 5:04 PM
I've reproduced this and I'm also experiencing this problem - in production - on a DL980 with 64 processors. This environment uses tempDB aggressively which is on a FusionIO card. I noticed it the for the first time when I was compressing some objects while other activity was occurring (update stats on a different database) and a backup on yet another db.

I can't throw more hardware at it... I suppose I could remove the fusionIO card... but that's not the best option.

Microsoft, can you please provide an update / ETA for resolution?
Posted by HANSTOFTE on 7/22/2012 at 6:31 AM
Thanks Ramesh

I checked it on SQL 2012. And the high cpu wait manifest into long IO waits there to, and that’s miss leading. Since the cause actually is CPU starvation. No CPU to process the IO. The SQLOS should have preemptive some CPU resource to finish the IO.

Hans
Posted by Marco Carozzi on 6/25/2012 at 2:32 PM
Hello Gents

in my testing the issue looks remarkable with cpu at top which, can be somehow understandable (yet with the server remaining responsive under other interactions), however as Randy introduced it is quite sensible even when not all the cores are at their top, and this looks also to happen when the cores dealing with the two operations have nothing to share.
I tested on slightly powerfull machine (32 cores, HP XP24k, EMC clarion) and the result was the same, particulary affecting on VM. Looking backward I am almost certain I can link it to real customer stand still in IO, which had happened a few before Ramesh notified me this; there was a large cpu usage, but not much IO, still the IO was slow and I recall now all those strange WRITELOG waits in my Sommarskog's beta_lockinfo.
I look forward MS to report on this soon with possible great news in term of boosting up MSSQL even more.
Thanks everybody

         Marco
Posted by Microsoft on 6/19/2012 at 3:11 PM
Hi,

Thanks for reporting it. We're looking at this problem and will report back as soon as we have more information.

Thanks,
Fabricio Voznika
SQL Server Engine team
Posted by Randy Rabin on 6/5/2012 at 2:24 PM
I was able to repro on a VMWare-based server running Win2008 R2 with SQL2008 R2 SP1, with 2 CPUs and then again after bumping up to 4 CPUs.

Setting CPU and I/O affinity masks to separate the different workloads onto different CPUs had no effect (I was hoping it would). Even with one CPU allocated to processing and 3 CPUs to I/O, there was still a big difference in query performance.

Randy
Posted by Titus63 on 5/30/2012 at 5:12 AM
On a virtual Windows 2008 R2 database server run on Hyper-V I got the following results (without and while running the intensive query in parallel:

num_of_writes num_of_bytes_written io_stall_write_ms WriteSizeBytes AvgStallMS
500    1354752    344    2709    0.68800000000000000
500    1357312    48520    2714    97.04000000000000000

wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms
WRITELOG    500    547    16    7
WRITELOG    500    48778    218    40

Duration
590
48820

Kind regards!
Walter
Posted by AJB69 on 5/30/2012 at 3:19 AM
Very Interesting issue. Thanks Ramesh
Posted by harald bach-gansmo on 5/29/2012 at 11:59 PM
Hi,

I carried out a successfull repro on a HP DL 580 G7 server with 128Gb of RAM & 2*10cores CPU. Very interesting this.

Thanks Ramesh

kind regards
harald.
Sign in to post a workaround.