OpenQuery() to Linked Server hangs, leaving SPID with open tran that cannot be killed, then templog.ldf grows without limit. Requires SQL Server restart on production servers. - by fdc2005

Status : 

  Won't Fix<br /><br />
		Due to several factors the product team decided to focus its efforts on other items.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


69
1
Sign in
to vote
ID 187192 Comments
Status Closed Workarounds
Type Bug Repros 33
Opened 8/27/2006 9:40:55 AM
Access Restriction Public

Description

We have a customer using SQL Server 2000 that is experiencing an intermittent problem with OPENQUERY() to an Oracle linked server:

- The application requests data from the Oracle linked server once a minute, using OPENQUERY() to get data as the source for an insert into a #temp table.
- Each OPENQUERY() is quick, and should return in less than 10 seconds.
- Occasionally, an OPENQUERY() will hang. A second job that runs every 5 minutes will detect this (by searching for the most recent logged download) and issue a 'stop job' request for the once-a-minute job that pulls data.
- Even though the job is cancelled, the hanging OPENQUERY() leaves an SPID with open transactions (against tempdb) that cannot be killed using KILL. Attempting to kill the SPID with KILL does not work, even if we wait 24 hours or longer.
- Because there are open transactions in tempdb, the log file cannot be truncated beyond this point, and templog.ldf grows without limit. 
- The only solution we have found is to stop/restart SQL Server. Otherwise the drive containing templog eventually fills up and everything stops working.

This problem used to be relatively rare, occurring only once or twice a year. But now it has started occurring once every few weeks. If it happens on a test server it is not such a big deal. 

But it also happens on production servers that support a manufacturing site, and restarting SQL Server on these boxes is a big problem.

I posted a question about this issue to an MSDN forum (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=622876&SiteID=1), but it has been two weeks now and no one from Microsoft has replied, other than to move the posting from one forum to another.

I am quite frustrated, because it appears this is not a new problem, but I do not see any response from Microsoft on how to solve this issue. For example, a quick search turns up these related articles with never any resolution from Microsoft that I can see:

Oracle: http://groups.google.com/group/microsoft.public.sqlserver.odbc/browse_thread/thread/4c6a57ddf22aa5a9/3893e142c56f2306%233893e142c56f2306

ASE: http://groups.google.com/group/microsoft.public.sqlserver.connect/browse_thread/thread/7a2df55eaa3c63fe/f7562f954e3726d7%23f7562f954e3726d7

Sybase: http://groups.google.com/group/microsoft.public.sqlserver.odbc/browse_thread/thread/e1270ce025dfc4ad/52cb71b66c31a33b

In general: http://groups.google.com/groups?q=openquery+to+linked+server+hangs&start=0&scoring=d&hl=en&lr=&

Here is data from the test server where this problem most recently occurred:

Product version: 8.00.760
Product level: SP3
Edition: Standard Edition
Version: Microsoft SQL Server  2000 - 8.00.760 (Intel X86)   Dec 17 2002
14:22:05   Copyright (c) 1988-2003 Microsoft Corporation  Standard Edition
on Windows NT 5.0 (Build 2195: Service Pack 4) 

Here is the data for the hung process (prior to any manual KILL attempt) from sysprocesses:

spid:  51
kpid:  1876
blocked:  0
waittype:  0x0042
waittime:  135452422
lastwaittype:  OLEDB
waitresource:  _name of Oracle linked server_
dbid:  9
uid:  0
cpu:  188
physical_io:  0
memusage:  12
login_time:  2006-08-07  18:53:04
last_batch:  2006-08-07  18:53:04
ecid:  0
open_tran:  2
status:  runnable                      
sid:  …
hostname:  …
program_name:  SQLAgent - TSQL JobStep ...                                                       
hostprocess:  1500
cmd:  INSERT          
nt_domain:  NT AUTHORITY                                                                                                                    
nt_username:  SYSTEM                                                                                                                          
net_address:  000CF1C7A509
net_library:  TCP/IP      
loginame:  NT AUTHORITY\SYSTEM                                                                                                             
context_info:  …
sql_handle:  …
stmt_start:  0
stmt_end:  -1


Bottom line: I think this is a bug that needs to be fixed -- telling customers that they must stop/restart SQL Server on production servers at random times due to SQL Sever's inability to kill a hung process is unacceptable.

Thanks!
-Frank.
Sign in to post a comment.
Posted by SDanny Smith on 7/20/2016 at 3:06 AM
I'm running SQL 2014 Server Standard (12.0.4213) with 34 link servers to Advantage 11.0 databases. Currently I have on one of my production servers 20 queries that I can't clear. All of the hung connections are to ADS servers (using OpenQuery). I've tried killing the queries thru the Activity monitor, killing the connections on ADS but no luck. The only way I have is to clear the connections is to restart SQL server. The hangs will reoccur in a day, a week and I've had the hangs occur immediately after a SQL reboot.
Posted by tefthy on 3/7/2016 at 11:53 AM
Similar issue with the one described above (SQL Server connected to an Oracle Server, using OPENQUERY in order to bypass another (hard-core) issue that the SQL Server "Prefers" to bring all rows locally and select some of them itself (SQL Server is Unicode, Oracle is not, although Collation Compatible has been set to true). Very-Very old issues that had to be corrected by now :(((
Geiger1's solution worked for me as well. Since it took me some time to decide which connection to close, I decided to share a few things that might help others with similar issues: "Remote Address" is very important (I had to monitor it for quite sometime while other similar connections were working). "State" has to be "ESTABLISHED" (there is some short but to the point Help on the TCPView). Pay attention to pick "Close Connection" and not to "End Process". I think it is also important not to have any Sent and/or Received counters on the suspect line. Obviously the Protocol should be "TCP" and in my case the Process was "sqlservr.exe".
A big Thank-You to Geiger1 !!!
Posted by Honza P on 1/6/2016 at 9:29 AM
Geiger1's solution helps.

I am facing this problem as well. The Linked Oracle Server happens to crash from time to time. When this happens every pending openquery stays open forever. (I tried keeping it open for 1 month.)

It looks like Microsoft doesn't provide any solution for that. (Status of this Issue is "Closed as won't fixed")
The suggested solution to use "remote query timeout" doesn't help in this case. It looks like the SQL Server keeps waiting for some sort response but it will obviously never come.

Geiger1 pointed to a workaround earlier in this conversation. It is possible to manually close the pending TCP connection. SQL Server will detect that and will finally end the openquery. It's not an ideal solution but it's the best one I know. The other one is to restart the entire SQL Server.
Posted by Geiger1 on 3/3/2015 at 11:02 AM
You will see something like this if the server your Linked Server points to does not play nice.    I found my solution and wrote about it (link below). Microsoft should really harden and expand their Linked Servers but have not enhanced the functionality in years.
http://www.jaygeiger.com/index.php/2015/03/03/how-to-kill-a-frozen-linked-sql-server-connection/
Posted by btvdan on 7/10/2012 at 12:30 PM
Same exact scenario for me... I have a linked server to Intersystems Cache. I was testing some updated extration queries (select * into ...) and the query hung. I tried killing it and now I have the "KILLED/ROLLBACK" statuses. The linked server doesn't work.

Trying out the TCP View suggestion, but I'm not sure how to match up connections displayed in that tool to my SQL Server processes.
Posted by gdc_j on 7/9/2012 at 1:42 PM
I have a 32b SQL Server 2008 SP2 running on Windows 2003R2. We see the same intermittant issue. We have linked server connections to OSISoft PI system and Oracle. At this point I'm not sure which one of the linked servers is causing the issue but my gut feeling is PI. I see the same symptoms that the job is stuck running, waiting on an external resource yet I can't see any connection in Oracle nor do we see an orphan on the OSI side. Killing the job does not work since it stays forever in a "rollback" state. The only way to get rid of the job I've found so far is to restart the instance (not a good solution for a production instance). NOT GOOD!
Posted by pdougenik on 4/17/2012 at 9:02 AM
Amazing that nearly 6yrs after this issue was entered I'm adding a comment saying that this bug is still around, but I am.

We have a 64b SQL Server 2008 SP1 (v10.0.2789.0) installation running on Windows Server 2003 in our production environment. We are connecting to a Sybase ASE 15.0.3 installation via the Sybase 64b ODBC driver ver 03.51. Last month we had a SQL Server query that uses the linked server get stuck, Sybase had no connection showing but the SPID on SQL was stuck in an OLEDB wait. The SPID could not be killed, we had to cycle the SQL service. A few weeks later we had the same thing happen, then again later that day. Using TCPView there were NO connections between the SQL Server and Sybase server. We decided to reboot the box. Is there any update from Microsoft on this issue? Have you been able to reproduce it in any of your labs?
Posted by Stephen Malley on 3/7/2012 at 7:31 AM
We had one occurrence of this associated with replication and another [SPID 7s] that may have been replication related. All these *may* be related to swapped out tasks too. Please note that this is occurring on SQL 2008 R2.
Posted by Stephen Malley on 3/7/2012 at 7:30 AM
We had one occurrence of this associated with replication and another [SPID 7s] that may have been replication related. All these *may* be related to swapped out tasks too.
Posted by Seb71 on 9/1/2009 at 11:45 AM
Restarting MSDTC didn't work for me with a MySQL linked server.
Posted by McDebil on 4/21/2009 at 12:18 AM
I receive similar error with Linked Server to Informix.
Posted by Microsoft on 2/1/2007 at 5:26 PM
Have you tried setting a timeout option for remote queries?

In SQL Server 2000, you can set the server configuration option 'remote query timeout' using sp_configure or the server properties in Enterprise Manager. The timeout value is set in seconds. When this option is set, SQL Server will set the DBPROP_COMMANDTIMEOUT rowset property in the provider. If the provider supports this property, then the query should fail with a timeout error message once the timeout value is reached. This should cause the local SQL Server to automatically abort/rollback the local transaction.

A similar property can be set for a given linked server using 'query timeout' option with sp_serveroption.
Posted by Randy in Marin on 12/20/2006 at 5:56 PM
I have seen processes that get hung and can't be killed. The problem was with using a linked server to a SQL Server in another domain when DTC was not setup correctly. Restarting the DTC enabled the process to die. Not all queries invoked a distributed transaction and experienced the problem related to the invalid DTC setup.
Posted by Unregistered User on 8/29/2006 at 11:06 PM
I receive a similar problem in that when my Linked Server connection from SQL Server to Sybase receives a Sybase error (e.g. results pending) the Sybase connection is left hanging after the SQL Server connection is closed.
It seems to be that over time, even successful Sybase connections are left open after the SQL Server connection has been closed.