Home Dashboard Directory Help
Search

SQL Server 2008 Periodically Does Not Accept Connections by Glenn A. Berry


Status: 

Closed
 as Not Reproducible Help for as Not Reproducible


35
0
Sign in
to vote
Type: Bug
ID: 468478
Opened: 6/19/2009 10:14:25 AM
Access Restriction: Public
3
Workaround(s)
view
22
User(s) can reproduce this bug

Description

This is happening on a couple of my SQL 2008 servers in Production, about once a week. We see lots of these two errors in the SQL Error Log:

The client was unable to reuse a session with SPID 127, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
Error: 18056, Severity: 20, State: 23.

The server stops accepting connections from ADO.NET clients and from SSMS. Restarting the SQL Server Service clears it up every time. Both of these servers are on x64 Windows Server 2008, and they are Enterprise Edition, Build 2710

Microsoft SQL Server 2008 (SP1) - 10.0.2710.0 (X64) Apr 7 2009 20:36:04 Copyright (c) 1988-2008 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.0 <X64> (Build 6001: Service Pack 1)


There was a KB and hotfix for what seems like the exact same issue in SQL 2005. I don't know if that fix got rolled into SQL 2008.

SQL Server 2005 KB
http://support.microsoft.com/kb/937745
Details
Sign in to post a comment.
Posted by Nizamettin Özpolat on 4/6/2012 at 4:53 AM
I have the same issue. SQL Server version is 2008 SP2. User connection count is about 6000. And applications run disconnectedly. If lots of people have the same problem, microsoft must reproduce and solve it.
Posted by debshutts on 3/1/2012 at 7:56 AM
I am experiencing this exact same issue on a Clustered SQL Server 2008 installation. The server, however, is NOT experiencing high CPU. In fact the CPU is less than 5%. This is a HA Production server, we really need a resolution to this issue. Any thoughts?
Posted by Laurentiu Cristofor [MSFT] on 2/17/2012 at 3:29 PM
FWIW, these are the errors I was seeing in the scenario we were investigating:

2012-02-10 19:56:36.92 spid2421    Error: 18056, Severity: 20, State: 29.
2012-02-10 19:56:36.92 spid2421    The client was unable to reuse a session with SPID 2421, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

2012-02-10 19:56:37.29 Logon     Error: 18456, Severity: 14, State: 46.
2012-02-10 19:56:37.29 Logon     Login failed for user 'X\X'. Reason: Failed to open the database configured in the login object while revalidating the login on the connection. [CLIENT: X.X.X.X]

2012-02-10 19:56:38.05 spid4002    Error: 18056, Severity: 20, State: 46.
2012-02-10 19:56:38.05 spid4002    The client was unable to reuse a session with SPID 4002, which had been reset for connection pooling. The failure ID is 46. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
Posted by Laurentiu Cristofor [MSFT] on 2/17/2012 at 3:25 PM
Do you have any procedures or jobs that call sp_grantlogin, by any chance?

Our team has recently investigated an issue where 18056 and 18456 errors were showing in logs (2008 R2) and we have finally correlated those issues with calls made to sp_grantlogin. Using CREATE LOGIN instead seems to avoid the problem in our scenario.
Posted by PaulBarbin on 2/9/2012 at 7:56 AM
Still seeing this with 2008 SP2 CU5! This was supposed to be addressed in CU5 and did reduce the number of 18456 errors. But we have seen the 18056 error steadily since the update. Last night we received hundreds of them of them on one server and a similar number on the other server (in the same cluster).

"Error: 18056, Severity: 20, State: 29.
The client was unable to reuse a session with SPID xxx, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message."

Btw, there were no messages logged just prior to these that seem to correlate.
Posted by mbourgon on 12/29/2011 at 8:56 AM
Still seeing this type of thing with R2 SP1.
Posted by Mohan Kumar - SQLServerExperts.com on 5/18/2011 at 3:02 PM
Microsoft has released CU for this issue: http://support.microsoft.com/kb/2543687
Posted by Robert L Davis on 10/28/2010 at 10:09 AM
I have several servers in Production and nonproduction experiencing this problem. They are all hosting SharePoint 2010. Some are hosting PerformancePoint as well. They are all internal MSFT applicaitons if you'd like to get a look at the actual server.

What I see is many instances of the State 29 error followed by a prolonged period of time where nothing at all is logged in the SQL log. At some point during the period where nothing is getting logged, SharePoint reports one or more errors connecting to the server due to login timeout.
Posted by Ikeadale on 9/8/2010 at 3:22 AM
Hi,

We are also getting this issue on Windows server 2008 Standard x64, SQL Server 2008 SP1. What was the outcome of the CSS??
Posted by Yanni Smith on 9/2/2010 at 3:55 PM
My server is on SQL 2008 Enterprise x64 - SP1, CU9 and we experienced this issue three times so far. Twice yesterday and once today. Same symptom and error message. SQL refuse connections from our service layer and created a massive timeout and made our site unresponsive.    I can submit the error log, trace and mini-dump if needed.    Please resolve this ASAP
Posted by AISADMIN on 5/28/2010 at 6:22 AM
Actually i've noticed something interesting meanwhile. Let me explain and see if it helps in resolution.

This happens with the datalist or listview control and custom data paging. But it works fine with GridView :(

In addition i tested with Telerik controls (RadGrid and RadListView).

Means I am using the VERY SAME code in the NeedDataSource event:



' for the RadListView
Me.RadListView1.DataSource = ClassName.GetCatalogProducts(Me.txtSearch.Text.Trim, RadListView1.CurrentPageIndex + 1, 30)
Me.RadListView1.VirtualItemCount = ClassName.CatalogProductsCount(Me.txtSearch.Text.Trim)

' for the RadGrid
Me.RadGrid1.DataSource = ClassName.GetCatalogProducts(Me.txtSearch.Text.Trim, RadListView1.CurrentPageIndex + 1, 30)
Me.RadGrid1.VirtualItemCount = ClassName.CatalogProductsCount(Me.txtSearch.Text.Trim)





And it works perfectly fine for the RadGrid. It loads data and renders very fast.
However RadListView works only up to 3-4 pager clicks. It works also very fast when you go to page #2, #3 and finally when you click the next page (does not matter the page order) it gets locked up .

Means it is not about the code that i use as obviously it works flawless (with RadGrid).

I tried with standard listview control and custom paging . It behaves the very same. It only allows you click up to 3 different pages. Then it hangs to the end of the session (connection timeout i guess).

Is this sort of Bug?

Thanks
Posted by volatiless on 5/26/2010 at 11:10 AM
Same shit 5 times a day.

2010-05-26 15:35:58.39 spid57     The client was unable to reuse a session with SPID 57, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2010-05-26 15:35:58.39 spid59     Error: 18056, Severity: 20, State: 29.
2010-05-26 15:35:58.39 spid59     The client was unable to reuse a session with SPID 59, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2010-05-26 15:35:58.39 spid58     Error: 18056, Severity: 20, State: 29.
2010-05-26 15:35:58.39 spid58     The client was unable to reuse a session with SPID 58, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2010-05-26 15:35:58.40 spid54     Error: 18056, Severity: 20, State: 29.
2010-05-26 15:35:58.40 spid54     The client was unable to reuse a session with SPID 54, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
2010-05-26 15:35:58.40 spid55     Error: 18056, Severity: 20, State: 29.
2010-05-26 15:35:58.40 spid55     The client was unable to reuse a session with SPID 55, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.


Where i can get my money back?
Posted by Pawel Potasinski, MSFT on 5/5/2010 at 4:12 AM
My customer also suffers the problem. Configuration: Windows 2003 Server (x64), SQL Server 2008 Enterprise Edition x64 (10.0.2757). If the issue reocurs they are going to open the CSS case too.
Posted by Glenn A. Berry on 3/18/2010 at 8:28 AM
Please also see Connect 540092 for more information about this issue, which continues with SQL Server 2008 SP1 CU6, running on Windows Server 2008 R2. We have an open CSS ticket for this, and we have supplied minidump files captured during one of these incidents to CSS.
Posted by DBA24hrs Inc on 3/16/2010 at 6:50 AM
Hello, I have been troubleshooting this issue with a client this week and we applied SP1 CU6 as a fix to the issue, but it doesn't seem to have resolved it. Is there any update on this from Microsoft or has this thread gone quiet.
Posted by PMattson on 2/3/2010 at 2:07 AM
I received this error today. (Error: 18056, Severity 20, State: 29)

Server was in a low-stress situation, but otherwise has been operating well for weeks. SQL 2008 SP1 with CU6 x64 running on top of Windows 2008 R2.

Actually I receved two Error: 18056 with State: 29 in a row (each for a different SPID). Error log also says "The client was unable to reuse a session with SPID xxx, which had been reset for connection pooling)

Immediately following the 2 18056 errors was a single "Error: 10982, Severity: 16, State: 1".

After that occured I was no longer able to connect to the SQL Server and a restart was required to get it accepting connections again.

Any further info??
Posted by Grant Fritchey on 1/12/2010 at 7:30 AM
We're experiencing this from one application on one server with State 29, but there's no apparent corresponding overload on the server. Since others are seeing this, would it be possible to get more information on the possible causes or what CSS might be looking for?
Posted by Matthew Hugill on 1/5/2010 at 8:29 PM
I have had this problem the last few days on my SQL 2008 server with the same error as above. Is it possible to maybe see the fix from the CSS support case?
Posted by Microsoft on 7/24/2009 at 12:59 PM
Dear customer,
We have resolved this bug since we understand that you have opened a CSS support case. If you need any help with opening a support case, please let us know.
Posted by Microsoft on 7/8/2009 at 5:56 PM
Dear customer,
We exchanged e-mails with Kevin Cox who was on-site at your location, and who indicated that you would be opening a CSS support case for this issue. Can you confirm whether you have been able to open a CSS support case?

Posted by Glenn A. Berry on 7/1/2009 at 2:43 AM
I have also attached a filtered minidump, zipped up.
Posted by Glenn A. Berry on 7/1/2009 at 2:21 AM
This issue is happening more frequently, about once every 24 hours. I have applied SP1 CU2 (Build 2714) with no improvement. I have captured a filtered minidump, and I have attached the SQL Error logs and default trace files.
Posted by Microsoft on 6/30/2009 at 6:03 PM
Dear customer,
After looking into your problem, I suspect that this is going to be a thorny, stress-related issue, and I think you would get the quickest and best service by opening a case with Microsoft Customer Support Services. They will be able to engage more fully with you than we are able to through Connect. Here are my findings so far based on the information you have provided:

From the error message Error: 18056, Severity: 20, State: 23, the SQL Server process believes it is in the process of shutting down; do you know if these messages appeared before you initiated the restart of the SQL Server process to fix the issue where you were unable to connect? When the SQL Server process is shutting down, this causes all connections to fail, unless the connection is from a user that has the privilege to connect during shutdown. If these messages only appeared after the service restart was initiated, that would explain those particular error messages and they would then not be very interesting.

The other error message, with State 29, is more interesting and harder to deal with; it can indicate that a timeout occurred on the server side while processing login, quite different from a client connection timeout. This can happen when the server is under heavy stress and is unable to find a worker thread to process the incoming login attempt. The hotfix for SQL Server 2005 that you referenced, KB 937745, fixed one cause of long-running login processes which was a frequent root cause of this error and state, but this error and state can still occur when the server's worker threads are all taken by other running processes. If you were already receiving the other error (state 23) before these ones (state 29) started popping up, then one possibility for why the server was unable to process the login in a timely manner is that, if the server is in shutdown state, connections will be ended, and if your applications aggressively attempt to reconnect all at the same time, they may overload the server with reconnect requests, stressing it to the point of being unable to process further logins, leading to this error. That is just a theory, though, and only a possible cause if the state 23 error occurred first; it would definitely require more information to verify and to troubleshoot further, but in that case, the first thing to inspect would be: why was the server shutting down? Assuming that the state 23 errors are a red herring and that state 29 happened first, then this is likely to require investigation of in what way the server is under stress, etc.

If this problem is as regularly-occurring as you indicate, then I would recommend you collect a memory dump of the SQL Server process the next time when it is not accepting connections. This can be done with the SQLDumper utility that installs with SQL Server as per the process described here, with one difference for SQL Server 2008: the utility is now installed at "SQLServerInstallDrive:\Program Files\Microsoft SQL Server\100\Shared" (rather than 90\Shared): http://support.microsoft.com/kb/917825. Of the dump file types that are listed in that article, a "Filtered dump" should contain enough information to proceed with analysis without being an unmanageably large file; it should also compress well with any zip utility.

So, in brief, here are the things that would be helpful to proceed, whenever they are available:
1) The complete ERRORLOG file from the SQL Server instance experiencing the problem
2) A filtered memory dump of the sqlservr.exe process taken when the problem is occurring.
And I really do recommend that you open a case with Microsoft CSS for this issue, since I do not think this is going to be an easy case to resolve.
Posted by Microsoft on 6/24/2009 at 10:39 AM
Hi,

The issue described in KB 937745 has been fixed in SQL Server 2008 so this is likely not the problem. Could you please provide the full error message and any error entries from the server's error log so that we can try to identify the problem?

Thanks.
Posted by AaronBertrand on 6/20/2009 at 7:51 AM
I've heard of at least one other person getting the exact same issue. I am betting it is something going awry in the native client rather than the engine itself, but who knows... maybe they just forgot to port the 2005 fix to 2008 branches...
Sign in to post a workaround.
Posted by Kevin Kunz on 9/25/2012 at 12:56 PM
Finally fixed in SP2 CU1: http://support.microsoft.com/kb/2289254
Posted by Laurentiu Cristofor [MSFT] on 2/17/2012 at 3:32 PM
If you have code calling sp_grantlogin - replace it with code calling CREATE LOGIN.
Posted by Andy Ansryan on 9/11/2010 at 7:38 AM
Since you are running SQL 2008. Turn on data collections and try to figure out the Query which is taking the most CPU ms/sec at the time of the error. This is caused from having an high traffic OLTP system and a long running query (could be from a sql job) running at the same time. If you eliminated all the queyr's to be under 200 CPU(ms) used /sec you will avoid the SQL engine bug. Or else you need to wait for SP2 to come out.
File Name Submitted By Submitted On File Size  
SQLErrorLogs.zip (restricted) 7/1/2009 -
SQLDefaultTraceFiles.zip (restricted) 7/1/2009 -
SQL Filtered Minidump.zip (restricted) 7/1/2009 -