Struct methods should be inlined by Rüdiger Klaehn

Active

Type: Suggestion
ID: 93858
Opened: 8/18/2004 3:30:59 AM
Access Restriction: Public
Duplicates: 94789
The current JIT compiler does not inline methods that take a struct as a parameter. This makes many methods that could benefit from inlining very slow. For example:

public static Point operator + (Point a,Point b)
{
    return new Point(a.X+b.X,a.Y+b.Y);
}

would be a very good candidate for inlining and subsequent optimizations. But since Point is a value type, this method will never be inlined by the current JIT compiler. This makes all geometric operations, such as drawing, orders of magnitude slower than they could be.
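
To make the comparison concrete, here is a minimal sketch of the two forms being contrasted: a loop that sums points through the overloaded operator, and the same loop with the addition written out by hand. The `Point` struct with `double` fields and the `PointSums` helpers are hypothetical, written only for illustration; they are not taken from the attached test program.

```csharp
// Hypothetical 2D point; the report does not show the actual Point definition.
public struct Point
{
    public readonly double X;
    public readonly double Y;

    public Point(double x, double y) { X = x; Y = y; }

    // Small operator with struct parameters: the kind of method the report
    // says the JIT of the time would not inline.
    public static Point operator +(Point a, Point b)
    {
        return new Point(a.X + b.X, a.Y + b.Y);
    }
}

public static class PointSums
{
    // Calls operator + on every iteration.
    public static Point SumWithOperator(Point[] points)
    {
        Point sum = new Point(0, 0);
        for (int i = 0; i < points.Length; i++)
            sum = sum + points[i];
        return sum;
    }

    // Same arithmetic with the operator body inlined by hand.
    public static Point SumManuallyInlined(Point[] points)
    {
        double x = 0, y = 0;
        for (int i = 0; i < points.Length; i++)
        {
            x += points[i].X;
            y += points[i].Y;
        }
        return new Point(x, y);
    }
}
```

Both methods return the same result; timing them against each other over a large array (for example with `System.Diagnostics.Stopwatch`) is one way to see whether the JIT in use inlines the operator.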
Details
Product Language
English
Version
Visual Studio 2005
Category
Performance
Operating System
Windows XP Professional
Operating System Language
US English
Proposed Solution
The JIT compiler should inline methods that have struct parameters. This is very important: in some benchmarks I ran, manually inlining methods that take struct parameters improved performance by a factor of 5. However, manually inlining simple methods should not be necessary in the year 2004.

I know that C++ does inlining at the IL level. But going back to C++ is not a viable option either.
Benefits
Faster Development
Improved Performance
Other Benefits
Faster Development
File Attachments
1 attachments
Posted by Microsoft on 8/20/2004 at 9:10 PM
Thank you for your suggestion. We have triaged this issue and are now tracking it in our primary planning and scheduling database for consideration in a future release.
Posted by Microsoft on 8/23/2004 at 2:41 PM
see devdiv schedule link under links tab
Posted by Microsoft on 8/25/2004 at 4:15 PM
We are unable to address this issue at this time but I have added this suggestion to be considered in the next version of the product. We appreciate your submitting this report.
Posted by Wout de Zeeuw on 12/29/2006 at 11:52 AM
Does moving to C++/CLI provide a workaround for this? (I've been experimenting with it a little bit, but I have a hard time detecting whether inlining is taking place or not.) Moving over about 20 or 30 structs would be doable.

What's the current status of this bug report by the way?

Wout
Posted by Wout de Zeeuw on 12/29/2006 at 3:34 PM
Strange, I just tested C++/CLI with a static inline method, and don't see a bit of performance difference...

Need to dig deeper I guess...

Wout
Posted by Kevin Frei on 7/16/2007 at 9:01 AM
FYI - we're currently prototyping a fix for this issue. It should be available in the next full release of the CLR.
Posted by Kardax on 7/26/2007 at 11:55 AM
That's great to hear, Kevin :) Will it be included in Beta 2 or RTM?
Posted by Rüdiger Klaehn on 7/21/2008 at 11:57 AM
I just tried my little test program with the .NET 3.5 SP1 beta release. The good news is that this issue has been fixed on x86. The bad news is that it is still NOT fixed on x64.

Here is the output of the program when I run it as x86:

C:\Users\rudi\Desktop>InliningDemo_x86.exe
Mandelbrot struct inlining benchmark
        Width=1000, Height=1000, MaxIter=256
        Min=(-0,5+-1i), Max=(2+1i)
        CLR Version=2.0.50727.3031
        ARCH=x86
        CPU ID=Intel64 Family 6 Model 15 Stepping 11, GenuineIntel
        #CORES=4
Time with manual inlining=00:00:00.5410000
Time with using complex=00:00:00.5250000
Factor=0,970425138632163

Here is the output of the program when I run it on x64:

C:\Users\rudi\Desktop>InliningDemo_anycpu.exe
Mandelbrot struct inlining benchmark
        Width=1000, Height=1000, MaxIter=256
        Min=(-0,5+-1i), Max=(2+1i)
        CLR Version=2.0.50727.3031
        ARCH=AMD64
        CPU ID=Intel64 Family 6 Model 15 Stepping 11, GenuineIntel
        #CORES=4
Time with manual inlining=00:00:00.7070000
Time with using complex=00:00:01.6890000
Factor=2,38896746817539

So the program actually runs three times as fast in x86 mode as in x64 mode. Microsoft: Please fix this ASAP!
Posted by Matt Hope on 8/20/2008 at 1:24 AM
May I second the request for this feature to make it into the x64 JIT?
Also on reading http://blogs.msdn.com/clrcodegeneration/archive/2007/11/02/how-are-value-types-implemented-in-the-32-bit-clr-what-has-been-done-to-improve-their-performance.aspx I wasn't clear whether the inlining (as opposed to decomposition/enregistering) had the same limitations on Sequential/<= 4 fields of primitive/reference only.
Posted by Rüdiger Klaehn on 5/21/2009 at 6:02 AM
Still not fixed in the .NET framework 4.0 beta 1:

This is after restricting the program to x86 using corflags /32BIT+:
Mandelbrot struct inlining benchmark
        Width=1000, Height=1000, MaxIter=256
        Min=(-0,5+-1i), Max=(2+1i)
        CLR Version=4.0.20506.1
        ARCH=x86
        CPU ID=Intel64 Family 6 Model 15 Stepping 11, GenuineIntel
        #CORES=4
Time with manual inlining=00:00:00.5440312
Time with using complex=00:00:00.5050288
Factor=0,928308523481742

This is on x64:
Mandelbrot struct inlining benchmark
        Width=1000, Height=1000, MaxIter=256
        Min=(-0,5+-1i), Max=(2+1i)
        CLR Version=4.0.20506.1
        ARCH=AMD64
        CPU ID=Intel64 Family 6 Model 15 Stepping 11, GenuineIntel
        #CORES=4
Time with manual inlining=00:00:00.6090348
Time with using complex=00:00:01.5000858
Factor=2,46305432793003

As you can see, the performance on x64 is a little bit better than with 3.5 SP1, but still a factor of three worse than on x86!

Nice to see that performance has such a high priority at Microsoft...
Posted by Microsoft on 8/7/2009 at 9:52 AM
There are several improvements to the 64-bit JIT in .NET Framework 4 that result in better code quality – better inlining and better tail calls are the primary ones. Some of these improvements are documented in our team blog here: http://blogs.msdn.com/clrcodegeneration/.

In this particular case, the poor performance on 64-bit is more due to redundant copies when operating on the complex numbers than the lack of inlining. The x86 JIT does a struct promotion optimization where it promotes the fields of structs to "field local variables" allowing it to eliminate redundant copy operations (it does copy propagation on these field locals) – essentially what you’re doing manually in MandelbrotIteration4. The 64-bit JIT is a different compiler and doesn't have support for struct promotion. (In general, the 2 JITs do different sets of optimizations - something we're trying to reconcile over time. We spent our effort in CLR 4 on removing various limitations in the 64-bit inliner, primarily around generics, and (unfortunately) not on struct promotion).
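
A rough sketch of the difference described here, under stated assumptions: the `Complex` struct and both iteration methods below are hypothetical reconstructions, since the thread does not show the source of the attached test program or of `MandelbrotIteration4`. The first method works on whole struct values; the second keeps the real and imaginary parts in separate local variables, which is approximately the form that struct promotion lets a JIT reach on its own.

```csharp
// Hypothetical complex-number struct, assumed for illustration only.
public struct Complex
{
    public readonly double Re;
    public readonly double Im;

    public Complex(double re, double im) { Re = re; Im = im; }

    public static Complex operator +(Complex a, Complex b)
    {
        return new Complex(a.Re + b.Re, a.Im + b.Im);
    }

    public static Complex operator *(Complex a, Complex b)
    {
        return new Complex(a.Re * b.Re - a.Im * b.Im,
                           a.Re * b.Im + a.Im * b.Re);
    }

    public double NormSquared { get { return Re * Re + Im * Im; } }
}

public static class Mandelbrot
{
    // Struct version: every operator call passes and returns whole structs,
    // so a JIT without struct promotion may copy them repeatedly.
    public static int IterateWithStruct(Complex c, int maxIter)
    {
        Complex z = new Complex(0, 0);
        int i = 0;
        while (i < maxIter && z.NormSquared <= 4.0)
        {
            z = z * z + c;
            i++;
        }
        return i;
    }

    // Hand-promoted version: each field lives in its own local variable,
    // so no struct copies remain.
    public static int IterateWithFieldLocals(double cRe, double cIm, int maxIter)
    {
        double zRe = 0, zIm = 0;
        int i = 0;
        while (i < maxIter && zRe * zRe + zIm * zIm <= 4.0)
        {
            double newRe = zRe * zRe - zIm * zIm + cRe;
            zIm = zRe * zIm + zIm * zRe + cIm;
            zRe = newRe;
            i++;
        }
        return i;
    }
}
```

The two methods perform the same arithmetic in the same order, so they should return the same iteration count for the same input; only the amount of struct copying differs.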

If you have further questions or concerns, please feel free to write to me directly at <MyFirstName>B at. microsoft dot. com.

Regards,
Surupa Biswas
CLR Code Generation Team
Posted by Rüdiger Klaehn on 10/22/2009 at 1:23 AM
Re: Surupa Biswas

I am aware that the main performance issue on the x64 JIT is the redundant copies. But this is not surprising. Usually, the optimizer works on basic blocks, and inlining functions enables the optimizer to copy the called function into the basic block of the calling function and thus "see" the possible optimizations. So inlining is always just the first step of an optimization.

Why do you have two different JITs anyway? The Java virtual machine uses the same code base for a dozen different architectures, and it performs much more sophisticated optimizations (e.g. virtual method inlining, stack allocation).

Will the inlining problems of the x64 JIT be fixed in 4.0 SP1?
Posted by 小陈 xcVista on 12/5/2009 at 11:22 PM
Is it possible to use an attribute to force the JIT to inline?
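
For reference: in the CLR versions discussed in this thread, `MethodImplOptions.NoInlining` could only prevent inlining. .NET Framework 4.5 later added `MethodImplOptions.AggressiveInlining`, which asks the JIT to inline a method where possible but does not guarantee it. A minimal sketch, using a hypothetical `Vec2` struct that does not appear anywhere in this report:

```csharp
using System.Runtime.CompilerServices;

// Hypothetical struct used only to show where the attribute goes.
public struct Vec2
{
    public readonly double X;
    public readonly double Y;

    public Vec2(double x, double y) { X = x; Y = y; }

    // A request, not a guarantee: the JIT may still decline to inline.
    // Requires .NET Framework 4.5 or later.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Vec2 operator +(Vec2 a, Vec2 b)
    {
        return new Vec2(a.X + b.X, a.Y + b.Y);
    }

    // The opposite request, useful as a baseline when benchmarking.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vec2 AddNoInline(Vec2 a, Vec2 b)
    {
        return new Vec2(a.X + b.X, a.Y + b.Y);
    }
}
```

Comparing the timings of the two methods in a tight loop gives a rough indication of whether the hint was honored on a given runtime and architecture.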