.NET Framework 3.5 SP1 breaks type verification by David A Nelson


Status: Closed as Fixed


Type: Bug
ID: 361539
Opened: 8/12/2008 3:28:17 PM
Access Restriction: Public
Workaround(s): 2
User(s) who can reproduce this bug: 22

Description

Our production application which works with .NET 3.5 installed no longer works after installing .NET 3.5 SP1. It throws a type load exception. I have distilled the problem to its essence and the result is in the steps to reproduce. This (legal) C# code compiles correctly, but the resulting assembly will no longer pass PEVerify once SP1 is installed. My suspicion, based on seeing what works and what doesn't, is that the problem is related to generic type unification.

I would like to emphasize again that this is not a theoretical exercise. This is a distillation of our production application which runs fine prior to installing .NET 3.5 SP1. We cannot install SP1 on our users' machines until this is resolved.
Details
Posted by Microsoft on 6/3/2009 at 2:06 PM
Hello,

This issue has been fixed. The fix will be part of the next .NET Framework 4.0 release (incl. 4.0 Beta2).
We have also released a QFE patch for 3.5 SP1, downloadable here:
http://code.msdn.microsoft.com/KB970510
https://connect.microsoft.com/VisualStudio/Downloads/DownloadDetails.aspx?DownloadID=19242

Thank you very much for your patience,
-Karel Zikmund
Developer on CLR team.
Posted by Microsoft on 4/24/2009 at 12:24 PM
To update on this item, we currently have a fix candidate that is being evaluated and tested both internally and externally. A public fix will be available very soon.

Thank you,
Andrew Dai
Posted by gatt327 on 4/23/2009 at 5:50 AM
Today we received a trial fix from Microsoft PSS to check whether it works for us.
Our problem is gone. We will test more thoroughly, but at first glance it looks OK.
Posted by gatt327 on 3/12/2009 at 9:05 AM
Yes, we encountered the same problem. We sent it to Microsoft PSS two months ago and were informed on 9-Mar-09 that we can expect a first version of a fix within two months. It is taking this long because the problem is complicated.
Note: Microsoft found that there are scenarios where you can hit the same problem in .NET 3.5 without SP1 (a different use of generics, not the same example). So it is wise to add a 'PEVerify' step to your build procedures to make certain that your code is not affected.
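
A PEVerify build step could be sketched roughly as follows. This is only an illustration, not a tested build task: the class name PeVerifyStep and its methods are hypothetical, the path to peverify.exe depends on which SDK is installed, and the sketch assumes that a zero exit code means the assembly verified cleanly.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs PEVerify against a freshly built assembly.
public static class PeVerifyStep
{
    // Builds the process description; the assembly path is quoted in case it contains spaces.
    public static ProcessStartInfo BuildStartInfo(string peVerifyPath, string assemblyPath)
    {
        ProcessStartInfo info = new ProcessStartInfo();
        info.FileName = peVerifyPath;                 // assumption: caller supplies the SDK path
        info.Arguments = "\"" + assemblyPath + "\"";
        info.UseShellExecute = false;
        info.RedirectStandardOutput = true;           // capture the verifier's report
        return info;
    }

    // Returns true when PEVerify exits with code 0 (assumed here to mean "verified").
    public static bool Verify(string peVerifyPath, string assemblyPath, out string report)
    {
        using (Process process = Process.Start(BuildStartInfo(peVerifyPath, assemblyPath)))
        {
            report = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }
}
```

A build script would call PeVerifyStep.Verify after compilation and fail the build, printing the report, when it returns false. The step is only meaningful on a machine that has the runtime version you intend to deploy against.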
Posted by Microsoft on 2/27/2009 at 1:01 PM
Sorry for the long delay between updates on this issue. We have been working on it, but our initial analysis and attempts to fix this issue in a robust way have taken longer than we hoped. We have been making progress however, and are working on fixes for both .NET Framework 3.5 SP1 and .NET Framework 4.0. As we've mentioned before, these are delicate areas of our codebase and we do not want to rush the fixes and potentially regress other functionality. So unfortunately we can't make any promises on timeframes. When we have a fix for 3.5 SP1, we'll update this bug.

Regards,
--Michael Downen
Posted by Pavel Minaev [MSFT] on 2/12/2009 at 1:07 PM
> Although I explicitly instructed my assemblies to use the .NET 2.0 framework.

This has nothing to do with what assemblies are used; the problem is with the 3.5 SP1 _runtime_. If you have SP1, it replaces the old runtime, and all .NET 2.0+ apps run under it; there's no way to opt out of it and use the pre-SP1 one.

Posted by ridkun on 2/5/2009 at 2:28 AM
It has already broken my existing deployment. Fortunately, I was able to change my class to work around it.
I hope it gets fixed soon.
Posted by Sentient on 1/7/2009 at 10:01 AM
I have the same issue: my ClickOnce deployment suddenly stopped working. On startup I got the error "because it attempts to implement a class as an interface". I uninstalled .NET 3.5 SP1 and it works again. Although I explicitly instructed my assemblies to use the .NET 2.0 framework.

There is something fishy going on.
Posted by Łukasz Świątkowski on 12/22/2008 at 11:28 AM
I took a closer look at it, and I admit that you're right: the code is valid and should execute without any errors.
Posted by David A Nelson on 12/18/2008 at 7:58 AM
I have just about given up on actually getting a response from anyone, but I thought I should try one more time. Is anyone actually working on getting this fixed, and if not could you at least tell us that you're not going to fix it?
Posted by David A Nelson on 12/11/2008 at 8:02 AM
LukeSw,

It is not "abuse" if it is considered valid by the C# specification, as has already been discussed.
Posted by Łukasz Świątkowski on 12/11/2008 at 7:28 AM
On second thought, it should display the error at this line:
public class DoesntWork<T> : Works<T, T>
Posted by Łukasz Świątkowski on 12/11/2008 at 7:22 AM
There should be a fix for the compiler, so it would display an error:
'Works<T,U>' cannot implement both 'Works<T>' and 'IBlank<U>' because they may unify for some type parameter substitutions
The new version of the CLR and its generic type unification now work perfectly, and the many examples of code which once worked and now do not are, IMHO, examples of "generic abuse".
Posted by David A Nelson on 10/31/2008 at 10:25 AM
It has been over a month since I have heard anything about this issue. Has there been any progress? Is a fix going to be included in the upcoming GDR? I need to know if my organization is ever going to be able to install SP1 in order to do future project planning.
Posted by David A Nelson on 9/29/2008 at 8:06 AM
To update those who are interested, I have been communicating by email with Jon, as well as Mike Downen, PM Lead for the CLR. The following points have come up in our conversation:

1) This bug is the result of a significant rewrite of the type loading and verification code in the CLR for SP 1. The goal of this rewrite was to improve startup time for client applications.

2) Prior to the release of SP 1, the CLR team was aware that the rewrite had resulted in certain problems around generic type unification, but they "underestimated the customer impact" and chose to release the new CLR code anyway.

3) As of 9/5, Mike estimated that it would take "a few more weeks" to complete the investigation into "how we would re-enable this scenario and if/how we can port that to a 3.5 SP1 hotfix versus re-enabling it in our next major version."

4) According to recent blog posts (http://www.hanselman.com/blog/UpdateOnNETFramework35SP1AndWindowsUpdate.aspx), there are currently no plans to include a fix for this issue in the upcoming SP 1 patch to be released prior to pushing SP 1 to Windows Update.

I would strongly encourage anyone who has been affected by this issue to email Jon (jlangdon_at_microsoft.com) and Mike (mdownen_at_microsoft.com) with information about the impact that this issue has had, the type of applications affected, and the deployment scenarios for those applications, so that the CLR team can appropriately judge the customer impact of getting this issue fixed.
Posted by Tom Clement on 9/26/2008 at 3:24 PM
We have this problem in production code. If any existing customer installs .NET 3.5 SP1, our application will fail to run. Given these definitions:

class Test
interface ITestList : IList<Test>
class TestList<T> : List<T>, ITestList        //it's this interface that causes the problem
class TestList2 : TestList<Test>                 // this class fails

Any attempt to instantiate TestList2 will cause an immediate crash with a failure to load the containing module.

Microsoft, are there any plans in the works to address this problem?
Tom Clement
Posted by David A Nelson on 9/2/2008 at 8:16 AM
Alex,

In your example casting to IBlank<int> produces Works<int, int>, which is in fact the closest implementation to the actual type of the instance.
Posted by AlexFeinman on 9/1/2008 at 1:18 PM
With regard to Jon's comment about inheritance, I wanted to point out the following scenario:
    public interface IBlank<T>
    {
        void Method();
    }
    public class Works<T> : IBlank<T>
    {
        #region IBlank<T> Members

        public void Method()
        {
            Console.WriteLine("In Works<{0}>.Method", typeof(T).Name);
        }

        #endregion
    }
    public class Works<T, U> : Works<T>, IBlank<U>
    {
        #region IBlank<U> Members

        void IBlank<U>.Method()
        {
            Console.WriteLine("In Works<{0},{1}>.Method", typeof(T).Name, typeof(U).Name);
        }

        #endregion
    }


            Works<int, int> wIntInt = new Works<int, int>();
            wIntInt.Method();
            ((IBlank<int>)wIntInt).Method();

produces
In Works<Int32>.Method
In Works<Int32,Int32>.Method

So casting to IBlank<int> produces the furthest implementation, not the closest one, in the interface hierarchy.
Posted by John Saunders on 8/28/2008 at 12:33 PM
Has anyone verified whether this was broken in the beta? We've been running the beta, and if we've been running with this problem and didn't notice it, then it should be just as safe (or unsafe) for us to move to the RTM (as far as this particular issue).
Posted by Pavel Minaev [MSFT] on 8/22/2008 at 1:06 AM
Jon:
> If T and U are both an int and you cast an instance of DoesntWork<int> to IBlank<int>, which implementation of IBlank<int> would you expect to get?

Clearly, the one closest to me in the inheritance chain (precisely as it works with non-generic interfaces). Isn't it obvious?

Since C# does not have MI, there's no problem with inheriting two implementations of the same interface from two unrelated classes, where the true ambiguity lies.
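
The non-generic behavior referred to here is interface re-implementation, which can be illustrated with a minimal sketch. All type names below (IGreeter, BaseGreeter, DerivedGreeter, ReimplementationDemo) are hypothetical and are not taken from the bug report.

```csharp
using System;

public interface IGreeter
{
    string Greet();
}

// The base class implements the interface through a public method.
public class BaseGreeter : IGreeter
{
    public string Greet() { return "BaseGreeter"; }
}

// The derived class lists IGreeter again and re-implements it explicitly.
public class DerivedGreeter : BaseGreeter, IGreeter
{
    string IGreeter.Greet() { return "DerivedGreeter"; }
}

public static class ReimplementationDemo
{
    // A call through the interface binds to the most derived implementation.
    public static string ViaInterface()
    {
        IGreeter greeter = new DerivedGreeter();
        return greeter.Greet();
    }

    // A call through the class still reaches the inherited public method,
    // because the explicit implementation is not a public member of DerivedGreeter.
    public static string ViaClass()
    {
        return new DerivedGreeter().Greet();
    }
}
```

Here ViaInterface returns "DerivedGreeter" and ViaClass returns "BaseGreeter": the implementation closest to the instance's actual type wins for interface calls, with no ambiguity.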
Posted by David A Nelson on 8/21/2008 at 6:03 PM
Jon,

To the best of my understanding, applying the interface implementation test from section 13.4.2 of the C# specification to your example does not result in any generic type unification, and therefore does not result in any ambiguity. Perhaps there is an even more complicated situation that I am not thinking of, but I don't see a problem here. If you send me an email at david AT commongenius DOT com, I can elaborate.
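
As a sketch of the point under discussion, the repro hierarchy can be given a member so that the chosen implementation becomes observable. The Describe method and the UnificationDemo class are additions for illustration only; the repro types in this report are empty. The expected results follow the C# rules for interface re-implementation on a runtime that loads these types correctly. According to this report, an unpatched 3.5 SP1 runtime fails to load the type instead.

```csharp
using System;

public interface IBlank<T>
{
    string Describe(); // hypothetical member, added so the result can be observed
}

public class Works<T> : IBlank<T>
{
    public string Describe() { return "Works<" + typeof(T).Name + ">"; }
}

// The uniqueness check looks only at interfaces listed on this declaration
// (just IBlank<U>), so nothing here can unify. Listing both IBlank<T> and
// IBlank<U> on a single class would instead be rejected with compiler error CS0695.
public class Works<T, U> : Works<T>, IBlank<U>
{
    string IBlank<U>.Describe()
    {
        return "Works<" + typeof(T).Name + "," + typeof(U).Name + ">";
    }
}

public class DoesntWork<T> : Works<T, T>
{
}

public static class UnificationDemo
{
    // With T = U = int, Works<int, int> re-implements IBlank<int>.
    public static string ViaInterface()
    {
        IBlank<int> blank = new DoesntWork<int>();
        return blank.Describe();
    }

    // The public method inherited from Works<T> remains reachable through the class.
    public static string ViaClass()
    {
        return new DoesntWork<int>().Describe();
    }
}
```

On a runtime without the bug, ViaInterface returns "Works<Int32,Int32>" and ViaClass returns "Works<Int32>", which matches the output AlexFeinman reports for Works<int, int>.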
Posted by Microsoft on 8/21/2008 at 5:16 PM
David,

Thanks so much for the additional details as to your particular usage. It's been helpful. Clearly, in your case, it worked before and it doesn't now and we appreciate the issue that's causing you. When trying to determine the right way to address this issue we're exploring the scope beyond your particular repro and are finding cases where the type resolution could be ambiguous. In fact, even in the case you wrote below,

public interface IBlank<T>
{
}
public class Works<T> : IBlank<T>
{
}
public class Works<T, U> : Works<T>, IBlank<U>
{
}
public class DoesntWork<T> : Works<T, T>
{
}

If T and U are both an int and you cast an instance of DoesntWork<int> to IBlank<int>, which implementation of IBlank<int> would you expect to get?

We're giving this issue our full attention and sincerely appreciate your input on the matter. Also, if you wouldn't mind sending me your e-mail address, I'd love to follow-up with you offline to get some more details about the scope of impact this issue is having on you and/or your customers.

Regards,
Jon
Posted by David A Nelson on 8/19/2008 at 8:43 AM
Jon,

Our production code is somewhat more complicated than the repro case I have provided. I am attaching a small project which contains all of the relevant type definitions, minus implementation and a few internal interfaces which don't contribute to the hierarchy.

The code in question is on the critical path of our data access layer; with SP1 installed, the application throws a TypeLoadException within moments of launching, and is completely non-functional. I have not yet fully investigated the possibility of redesigning the DAL to remove this pattern; I don't believe it would be possible without affecting higher levels of the application and probably requiring significant restructuring. I would prefer not to have to do that; for now we are simply not allowing SP1 to be installed.

Can you clarify what you mean by "potentially ambiguous"? I don't want to sound naive, but this code works fine on the 2.0 CLR prior to installing 3.5 SP1.

Please let me know if I can help in any other way.

David
Posted by Microsoft on 8/18/2008 at 7:56 PM
Thank you for the simplified repro. It has helped us narrow our investigation greatly. However, you mentioned that your repro is distilled from your production code. In order to make sure we're scoping this accurately it would be good to understand how similar the repro is to your production case(s), with respect to inheritance/implementation chains and number/type of type parameters, or if you're doing something more complex. We want to make sure we understand not just the simple case but at least your case as well since "folding" this type of code from a compiler standpoint could potentially be ambiguous depending on the constructions.

Also, can you help us understand the severity of the impact to your users? Is your application functional at all in light of this issue?

Thanks,
Jon
Posted by Pavel Minaev [MSFT] on 8/17/2008 at 4:10 AM
Well, it's a serious one - it breaks a perfectly valid use case for generics, it does so in a subtle manner (no compiler error), and it can potentially break a lot of existing generic-reliant code. Personally, I wouldn't be surprised if we have something like that in our codebase as well, just haven't found it... so it is certainly a highly critical issue, and it seems to be widely recognized - hopefully, enough so for MS to take swift action and release a hotfix.
Posted by Microsoft on 8/16/2008 at 9:36 AM
I wanted to follow up and let you know that the CLR team is looking into this issue. We greatly appreciate your bringing this to our attention and will provide status updates as the investigation progresses.

Kind Regards,
Jon Langdon
Program Manager
Common Language Runtime
Posted by David A Nelson on 8/15/2008 at 7:19 AM
Agreed, probably not related. Plus one is throwing a TypeLoadException and one an ExecutionEngineException.

Although I do believe that is the first time I have seen a Connect bug get so many votes in just two days.
Posted by Pavel Minaev [MSFT] on 8/15/2008 at 5:20 AM
I don't see the relation between those two. This one is clearly related to generic interface inheritance, and possible duplication of interfaces in the interface chain that may occur as the result of that. The other one does not involve inheritance at all.
Posted by Cine on 8/15/2008 at 2:29 AM
361606 is probably closely related to this issue
Posted by Pavel Minaev [MSFT] on 8/14/2008 at 12:03 AM
I'm one of those who validated this, and no, no NGen here either. Just took the sample code, compiled it from command line, and fed the .dll to peverify. It reproduces perfectly.
Posted by David A Nelson on 8/13/2008 at 11:08 PM
I have independently verified the bug on multiple machines now, all Windows XP Pro SP2 with all .NET versions from 1.1 through 3.5 installed. The assembly compiled from the repro code passes PEVerify. After installing 3.5 SP1, PEVerify fails.
Posted by David A Nelson on 8/13/2008 at 9:57 PM
Additional information:

Uninstalling SP1 did not immediately solve the problem. I had to uninstall .NET 3.5, then 3.0, and finally 2.0, and then reinstall them all (except for 3.5 SP1) before the code below would pass PEVerify.

Also, in answer to questions from Scott Hanselman:

"Did your app get ngen'd? Perhaps you'd need to remove the gen'ed native image? No recompile right? Just running?"

No part of our application was ngen'd. There was no recompile, just running the application after installing SP1 resulted in the TypeLoadException. And again, I was able to reproduce the problem without our application, using just the code below. When I ran PEVerify on my machine with SP1 installed on the assembly built from that code, it failed. When I copied that assembly to our network and a coworker without SP1 ran PEVerify, it passed.

Let me know if you need any other information from me.
Posted by Microsoft on 8/12/2008 at 9:33 PM
We were able to reproduce the issue you are seeing. We are escalating this bug to the product unit who works on that specific feature area. The product team will review this issue and make a decision on whether they will fix it or not for the next release.
Workarounds
Posted by gatt327 on 5/7/2009 at 3:39 AM
Today we tested the official fix from Microsoft.
It works for us.
Ask for: KB Article Number 970510.
Posted by gatt327 on 4/23/2009 at 6:34 AM
Today we received a trial fix from Microsoft PSS to check whether it works for us.
Our problem is gone. We will test more thoroughly, but at first glance it looks OK.