Home Dashboard Directory Help
Search

Killed/Rollback by donaldvc


Status: 

Closed
 as Not Reproducible Help for as Not Reproducible


18
1
Sign in
to vote
Type: Suggestion
ID: 433703
Opened: 4/20/2009 3:42:28 AM
Access Restriction: Public
0
Workaround(s)
view

Description


This is a generic request (and a generic proposed solution): using the KILL command to end processes that appear in a hung state (for example, backups or CHECKDB) occasionally leaves the process in a killed/rollback state which can only be terminated by restarting the SQL Server service - not very desirable in high availability applications. The problem is intermittent but also widely reported by users of SQL Server 2000, 2005 and 2008. It seems to be confined to parallel processes running on Enterprise SKUs.
Details
Sign in to post a comment.
Posted by Luiz Mercante on 11/2/2012 at 10:36 AM
Just helping with some additional information.

There were a job runing a stored procedure that backup all databases to a storage UNC path. The storage has been crashed and the backup job was running forever.

I try to kill this process but still killed/rollback with 66% completed and increasing time remaining.
Posted by Jeff Moden on 6/21/2012 at 10:01 AM
Dear Microsoft

Gosh folks... If "Customer Support" has a workaround for this very common problem, you should share it with us here instead of just closing this to clear your books. You should also provide a retorfit fix this problem on all versions because it looks like it affects many versions including the current ones.

Is calling "Customer Support" without getting charged actually an option for 2000 or 2005 anymore?

--Jeff Moden
Posted by Microsoft on 10/26/2010 at 5:52 PM
Hello, this defect is being closed with a recommendation to contact Customer Support for further investigation, and to provide the information suggested in this blog article: http://blogs.msdn.com/psssql/archive/2008/09/12/sql-server-2000-2005-2008-recovery-rollback-taking-longer-than-expected.aspx
Thanks
Guy
Posted by Microsoft on 10/18/2010 at 11:00 AM
Hello, please refer to the update dated 10/16/2009. Determine if the thread is doing work. Note down the circumstances, generate a dump and contact Customer Support.

Thanks
Guy Bowerman
SQLOS Team
Posted by Quynh_Quinn_Nguyen on 9/22/2010 at 10:01 PM
I'm experiencing the same exact issue for the log backup. What is the resolution? I'm running SQL2008 SP1 + CU5.
Posted by Fabricio Voznika [MSFT] on 10/16/2009 at 5:55 PM
Some large operations may take a long time to rollback. You can use database_transaction_log_bytes_reserved column of sys.dm_tran_database_transactions to determine if the thread is actually progressing versus being hung. Reserved space will decrease as the transaction rollback. If you see that it’s not progressing after about an hour then it’s probably hung. In this case, please follow the instructions in this blog post to generate a dump and contact Customer Support: http://blogs.msdn.com/psssql/archive/2008/09/12/sql-server-2000-2005-2008-recovery-rollback-taking-longer-than-expected.aspx

Thanks,
Fabricio Voznika
SQLOS Team
Posted by SueMH on 9/7/2009 at 1:26 AM
We have exerience another two issues of the same this week on two seperate clusters . It would be good to see Microsoft take an active part in this discussion as bouncing an instance is not good practice to kill a process. We have in the past left this for days and it does not resolve itself it requires an instance restart / falover to resolve.

Customer experience degraded due to the hanging process
Custmer experience degraded due to down time to restart

SQL Server 2005 x64 Service Pack 3 on Windows 2003 R2 SP2
(7 insatance 2 node cluster)
Posted by GeneralSQL on 8/5/2009 at 11:46 AM
I too have a process that is trying to roll back. I was running a re-index that was blocking other processes, but was not moving on either the cputime or diskio numbers. I am waiting for it to stop, but new connections cannot find the tables. I get timeout errors when I try to open the properties of the database. I too echo what is stated by others that there needs to be a way to kill these processes without stopping the entire server service.
Posted by Carl Landry on 6/8/2009 at 8:52 AM
I am now at three run-away "KILLED/ROLLBACK" processes on my production system. This happens when the agent launches a stored procedure that opens a cursor on a remote server. This command execute every 5 minutes but, for the past 4 weeks, three of them would stay blocked. When this happens, the agents starts consuming more and more memory.

Stop/start of MSDTC do not help.
Stop/start of the agent will release the memory and cause the PID to become "KILLED/ROLLBACK".
The agent was the Windows process ("hostprocess" column from sys.sysprocesses) associated with the SQL PID. After the restart, this Windows process is obviously gone.
Setting the related database to single-user or offline (which show close any connection to it) don't work as the alter database command is blocked by the spinning SPIDs.

The oldest PID is like this for over 3 weeks.

Using Process Monitor, each of them appear to be in a loop reading the following registry keys:
- Open HKU\SQL Server User SID\
- Open HKU\SQL Server User SID\Software\Policies\Microsoft\Control Panel\Desktop (Not found)
- Open HKU\SQL Server User SID\Control Panel\Desktop
- Query Value of HKU\SQL Server User SID\Control Panel\Desktop\MultiUILanguageId (Not found)

This causes the SQL PID to use the resources of a single CPU, increasing the number of CPU time but not I/O and memory. No wait or transaction open.

I am now required to look for a window to restart the SQL services as everything fails to effectively kill these threads. The system has 4 CPUs... leaving only one for the real work!
Posted by GeLaPa on 6/5/2009 at 3:33 PM
I have a process in this hung killed\rollback state for over 24 hours on our SQL Server 2005 Enterprise 64-bit edition clustered instance. The process is/was an online reindex command on a relatively small index that usually takes 1-4 minutes to complete.

As others have noted, I cannot create any log or full backups while the process is rolling back because any backup process is blocked by the CHECKPOINT process which in turn is waiting for the rollback to complete. The completion rate of the rollback is static at 0% (used KILL <spid> WITH STATUSONLY). Also the hung process is blocking itself and the KPIDs of these processes do not exist on the database server (used select * from sys.sysprocesses where spid=<spid> and then checked Task Manager for the PIDs on the db server). Further, our transaction log file is continuing to be filled up as we cannot generate any log backups.

We are going to restart our SQL Server service during an emergency maintenance period after EOB. This is unacceptable for a production system where high availability is required. I echo the comments of the others that KILL should effectively and actually kill processes.
Posted by Cate Donoghue on 6/3/2009 at 8:21 AM
Enterprise, clustered system here. I currently have 3 of these hung on my server. The longest one is 6 days old. This is a high availability situation and restarting sql server service is not practical. There should be an override that you can kill a hung thread that is in this state.
Posted by donaldvc on 5/8/2009 at 4:18 AM
2 more occurrences this week of the killed/rollback issue on a different server (again SQL Server 2005 SP3, Windows 2003 R2 SP2). We have several hundred SQL Server instances to manage, so are perhaps more likely to see low-level bugs in SQL Server than many other DBAs, but this is still an annoyingly high incidence. It would be interesting to see how many others have experienced this issue.
Posted by donaldvc on 4/27/2009 at 12:34 AM
Still running - SPID 65: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.
96 hours so far ... This is not an unusual occurrence.
Posted by donaldvc on 4/24/2009 at 1:13 AM
Quick follow-up - the affected servers are all running SQL Server 2005 x64 Service Pack 3 on Windows 2003 R2 SP2.
Posted by donaldvc on 4/24/2009 at 1:09 AM
We have experienced this killed/rollback problem with checkdb and backup processes but currently have an issue with a relatively simple data collection job, pulling back system monitoring data from dmvs. The process is executed remotely against the affected server through an SSIS package. Currently, the process has been in a killed/rollback state for 21 hours with a cpu time of 16 and diskio of 3.
Posted by Burtonrw on 4/21/2009 at 11:51 PM
I've experienced this when a SQL backup to UNC path across the network has been affected by a networking failure that has left the backup hung, then kill leaves it in kill/rollback indefinitely until SQL service is restarted. While in kill rollback it prevents any further database or log backups of that database. On service restart there is no incomplete transacation to roll back
Posted by Microsoft on 4/21/2009 at 8:51 AM
Dear customer,
if you are experiencing scenarios where the killed/rollback process seems hanging for more than 48 hours, that seems like a defect that needs to be investigated and fixed. We would like to hear more specifics about such scenarios. can you please send us the relevant details.

thank you,
Madhan
SQLOS Team
Posted by donaldvc on 4/21/2009 at 1:02 AM
I wouldn't like this suggestion to be seen as positively recommending a service restart, but I suspect many DBAs will have experienced a killed/rollback process still hung after more than 48 hours. The point is that a service restart will terminate the 'hung' process and restore the server to a consistent state very quickly, even though it of course involves rolling forward or back transactions in the recovery phase. Restarting services to achieve this end is drastic and should not be necessary if the KILL command worked effectively.
Posted by Michael J. Swart on 4/20/2009 at 6:35 AM
woah!
The processes aren't necessarily hung (i.e. waiting for an event that will never come). They're busy. These processes that are rolling back are trying to get the server into a consistent state. By restarting the SQL Server service, I believe you're essentially postponing that rollback (or roll forward) work to the recovery step that runs before the database becomes available.
Sign in to post a workaround.