We have a customer using SQL Server 2000 that is experiencing an intermittent problem with OPENQUERY() to an Oracle linked server:
- The application requests data from the Oracle linked server once a minute, using OPENQUERY() to get data as the source for an insert into a #temp table.
- Each OPENQUERY() is quick, and should return in less than 10 seconds.
- Occasionally, an OPENQUERY() will hang. A second job that runs every 5 minutes will detect this (by searching for the most recent logged download) and issue a 'stop job' request for the once-a-minute job that pulls data.
- Even though the job is cancelled, the hanging OPENQUERY() leaves an SPID with open transactions (against tempdb) that cannot be killed using KILL. Attempting to kill the SPID with KILL does not work, even if we wait 24 hours or longer.
- Because there are open transactions in tempdb, the log file cannot be truncated beyond this point, and templog.ldf grows without limit.
- The only solution we have found is to stop/restart SQL Server. Otherwise the drive containing templog eventually fills up and everything stops working.
This problem used to be relatively rare, occurring only once or twice a year. But now it has started occurring once every few weeks. If it happens on a test server it is not such a big deal.
But it also happens on production servers that support a manufacturing site, and restarting SQL Server on these boxes is a big problem.
I posted a question about this issue to an MSDN forum (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=622876&SiteID=1), but it has been two weeks now and no one from Microsoft has replied, other than to move the posting from one forum to another.
I am quite frustrated, because it appears this is not a new problem, but I do not see any response from Microsoft on how to solve this issue. For example, a quick search turns up these related articles with never any resolution from Microsoft that I can see:
In general: http://groups.google.com/groups?q=openquery+to+linked+server+hangs&start=0&scoring=d&hl=en&lr=&
Here is data from the test server where this problem most recently occurred:
Product version: 8.00.760
Product level: SP3
Edition: Standard Edition
Version: Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Standard Edition
on Windows NT 5.0 (Build 2195: Service Pack 4)
Here is the data for the hung process (prior to any manual KILL attempt) from sysprocesses:
waitresource: _name of Oracle linked server_
login_time: 2006-08-07 18:53:04
last_batch: 2006-08-07 18:53:04
program_name: SQLAgent - TSQL JobStep ...
nt_domain: NT AUTHORITY
loginame: NT AUTHORITY\SYSTEM
Bottom line: I think this is a bug that needs to be fixed -- telling customers that they must stop/restart SQL Server on production servers at random times due to SQL Sever's inability to kill a hung process is unacceptable.