Search

.tail call is much slower than standad call instruction in IL by kskalski

Closed
as External Help for as External

14
Sign in to vote
0
Sign in to vote
Sign in
to vote
Type: Bug
ID: 98236
Opened: 1/22/2005 1:28:57 PM
Access Restriction: Public
0
Workaround(s)
1
User(s) can reproduce this bug

.tail call IL instruction is much slower than normal call, which is very
suprising.
I expect that keeping stack low (actually almost zero size) should increase
performance, not destroy it.
The need for tail calls is common in functional languages like F#, SML.NET,
Nemerle, where recursion and function calls are widely used instead of loops and
gotos / jumps.
Because of this issue existance of tail call in CLR instruction set is almost
unexplainable - it only allows reducing stack overflow problems when using
recursion heavily. This is of course also a problem (as stated in
http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=7996eaa1-351d-40de-aaff-542a7cebabcb)
but with this extra cost of performance loose, it is not really a solution.

I suspect this has something to do with security / stack crawling, but I expect
that with zero-size stack it should be even easier... Could you comment on this?
Details (expand)
Product Language
English
Version
Visual C# 2005 Express Beta 1
Category
CLR
Subcategory
 
Operating System
Windows 2000 Server
Steps to Reproduce
1. Compile both programs:
ilasm.exe tail.il
ilasm.exe notail.il

2. Run them:
time ./tail.exe
time ./notail.exe
Actual Results
$ time ./tail.exe
10001 -27310030

real 0m33.820s
user 0m0.015s
sys 0m0.031s


$ time ./notail.exe
10001 -27310030

real 0m20.263s
user 0m0.015s
sys 0m0.015s
Expected Results
I expect the relative performance to be at least as good as on mono platform,
where tail.exe is 20% faster than notail.exe.
File Attachments
1 attachments
Sign in to post a comment.
Posted by kskalski on 1/22/2005 at 1:46 PM
Just for easier understanding
of hwat is happeing in attached programs, here is the original Nemerle source
code for the testcase:

module Test {

mutable i : int = 3;
mutable count : int = 0;

f : int;
f () : int { 4 }

foo () : void {
    ++count;
    unchecked {
     i = i * (i + i * 34) / 5 + 1;
    }
    goo ();
}

goo () : void {
    unchecked {
     mutable x = 7;
     x *= 3;
     i = i * (i + i * 34) / 7 + 1 + x;
    }
    unless (count > 10000)
     foo ();
}

public Main() : void
{
    for (mutable k = 0; k < 10000; ++k) {
     count = 0;
     foo ();
    }
    _ = f();
    Nemerle.IO.printf ("%d %d\n", count, i);
}
}
Posted by Microsoft on 1/25/2005 at 2:18 PM
This is a quick explanation on why the tail call is slower:
Say foo does a tail call to bar, this is the sequence for a tailcall on x86:
    - foo pushes all the outgoing arguments for bar, just like for any other call
    - then it pushes a DWORD which indictes the number of incoming arguments, the number of locals, and the number of outgoing arugments.
    - It loads the address of bar into a register
    - It calls a helper function called JIT_TailCall.

JIT_TailCall uses the last pushed DWORD to unwind the stack frame of foo, slide the outgoing arugments over the incoming arguments, move the return address appropriately, and then jumps to bar.
For interface calls which are dispatches through a special mechanism known as stub dispatch, there is even more work to be done.

Tail calls are slow because of this. We currently do all this work in a helper function instead of jitted code as jitted code needs to obey some constraints which allow stack-unwinding. The stack-unwinding logic cannot deal with issues like moving around the return address.

Tail calls will be faster on 64bit as the arguments or most methods fit in registers (and so the return address does not need to be moved around).

The CLR Team.
Posted by Microsoft on 11/3/2005 at 2:46 PM
http://blogs.msdn.com/shrib/archive/2005/01/25/360370.aspx has more details about why tailcall is slow in the CLR.