
Anything similar for higher-level languages (C# or the like)?



The Java HotSpot VM can still optimize this case, if the virtual call leads to only a few classes most of the time. Several virtual methods can be inlined, but of course there's still an extra step compared to static dispatch: the class of the current object has to be compared against those of the inlined methods. If no matching method is inlined, control is passed back to the VM.
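In C#-flavored pseudocode, the compiled call site conceptually ends up as something like this (a sketch with made-up names, not actual HotSpot output):

  using System;

  abstract class Shape { public abstract double Area(); }

  sealed class Circle : Shape
  {
      public double R;
      public override double Area() { return Math.PI * R * R; }
  }

  static class Dispatch
  {
      // If profiling shows the receiver is almost always a Circle:
      static double AreaGuarded(Shape s)
      {
          Circle c = s as Circle;         // guard: check the object's class
          if (c != null)
              return Math.PI * c.R * c.R; // inlined body of Circle.Area
          return s.Area();                // miss: hand control back to the VM
      }
  }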


A fascinating article on this type of optimisation:

http://www.azulsystems.com/blog/cliff/2010-04-08-inline-cach...


Interesting read; I didn't know that making fields final in Java does nothing for performance. Any idea what those "Generic Popular Frameworks" are? I put my money on Hibernate.


This is less of a problem for systems with JIT compilation (including Java, C#, etc). They can recompile the code at runtime, which allows some nice tricks for virtual calls. They can turn a virtual call into a regular call with inline caching (http://en.wikipedia.org/wiki/Inline_caching), or can even compile a specialized version of the code for a given type and inline the entire virtual function.
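On the CLR you can get a similar per-type specialization effect by hand with generics: a generic method instantiated over a struct gets its own compiled body, so an interface call on the type parameter can be dispatched directly. A rough sketch (hypothetical names):

  interface IAdder { int Add(int a, int b); }

  struct FastAdder : IAdder
  {
      public int Add(int a, int b) { return a + b; }
  }

  static class Specialized
  {
      // Each value-type T gets its own compiled code, so this constrained
      // interface call can become a direct (and inlineable) call:
      public static int Sum<T>(T adder, int n) where T : IAdder
      {
          int acc = 0;
          for (int i = 0; i < n; i++)
              acc = adder.Add(acc, i);
          return acc;
      }
      // e.g. Specialized.Sum(new FastAdder(), 1000)
  }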


The .NET and Java VMs are in fact able to do some inlining on virtual/interface calls and do other sorts of smart dispatch. So the cost of a virtual method in .NET and Java is not necessarily equivalent to the cost in C++.


I tried a simple example[1] in C# on .NET 4.5, both 32- and 64-bit, just looping and calling an Add method. Adding the virtual keyword increased runtimes by more than 200%. JVMs might do this, but the CLR's codegen doesn't.
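A minimal version of that kind of test looks roughly like this (simplified; the real code is in the pastebin):

  using System;
  using System.Diagnostics;

  class Adder
  {
      // Toggle between this and "public virtual int Add..." to compare:
      public int Add(int a, int b) { return a + b; }
  }

  class Program
  {
      static void Main()
      {
          Adder adder = new Adder();
          Stopwatch sw = Stopwatch.StartNew();
          int acc = 0;
          for (int i = 0; i < 1000000000; i++)
              acc = adder.Add(acc, i);
          sw.Stop();
          Console.WriteLine("{0} ms (acc={1})", sw.ElapsedMilliseconds, acc);
      }
  }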

An old blog post by one of the CLR engineers[2] states:

"We don't inline across virtual calls. The reason for not doing this is that we don't know the final target of the call. We could potentially do better here (for example, if 99% of calls end up in the same target, you can generate code that does a check on the method table of the object the virtual call is going to execute on, if it's not the 99% case, you do a call, else you just execute the inlined code), but unlike the J language, most of the calls in the primary languages we support, are not virtual, so we're not forced to be so aggressive about optimizing this case."

I guess things haven't changed. My testing with the CLR indicates that for best performance, you should make sure your IL is already inlined. The CLR does much better with huge function bodies.

1: http://pastebin.com/98c7Bt7f

2: http://blogs.msdn.com/b/davidnotario/archive/2004/11/01/2503...


The CLR's inlining for virtual calls is constrained specifically to interfaces, not to all uses of 'virtual', IIRC. (Interfaces are still very much virtual calls, they just don't use the 'virtual' keyword.)

See [1] for an example where the CLR fully inlines a virtual call (through an interface, specifically).

The call is most definitely virtual (or dynamic if you prefer that term), not statically-dispatched. It just happens to be performed through an interface. I suspect the CLR optimizes this because interfaces are incredibly common (IEnumerable, etc.)

[1] http://msdn.microsoft.com/en-us/library/ms973852.aspx


I get even worse results using an interface for the Add function. That is, if I do "new X() as InterfaceType" then call the function, the performance is 5x worse than if I don't cast to the interface. This is in a tight loop doing an add.
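Roughly this shape of comparison (a simplified sketch):

  interface InterfaceType { int Add(int a, int b); }

  class X : InterfaceType
  {
      public int Add(int a, int b) { return a + b; }
  }

  class Compare
  {
      static void Main()
      {
          InterfaceType viaInterface = new X() as InterfaceType;
          X direct = new X();
          int a = 0, b = 0;
          for (int i = 0; i < 100000000; i++)
              a = viaInterface.Add(a, i);  // interface dispatch: ~5x slower
          for (int i = 0; i < 100000000; i++)
              b = direct.Add(b, i);        // concrete receiver: can be inlined
          System.Console.WriteLine(a + b); // keep the results live
      }
  }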

Do you have any actual examples of an interface call getting inlined? This post[1] from Eric Gunnerson, dated 2004 (later than the MSDN article you referenced), says:

"all the compiler knows is that it has an IProcessor reference, which could be pointing to any instance of an type that implementes IProcessor. There is therefore no way for the JIT to inline this call - the fact that there is a level of indirection in interfaces prevents it. That's the source of the slowdown."

He goes on to say that Sun does do something since Java makes everything virtual, and the CLR could do it in theory, but doesn't.

I skimmed through the linked article you provided but didn't find any mention of inlining interface method calls. On the excellent performance of virtual/interface calls, it says:

"the combination of caching the virtual method and interface method dispatch mechanisms (the method table and interface map pointers and entries) and spectacularly provident branch prediction enables the processor to do an unrealistically effective job"

1: http://blogs.msdn.com/b/ericgu/archive/2004/03/19/92911.aspx...


I've seen interface calls get inlined on the modern CLR when looking at the disassembly. But I don't understand why you would have expected the interface call to be as fast as a normal non-virtual call. The interface call always needs a type check before the inlined call body in case of polymorphism; it can't be as fast as a normal call.

Or are you saying the interface call is 5x slower than a virtual call? That definitely isn't right.


I expect an inlined interface or virtual call to be the same as an inlined non-virtual call. But the CLR (4.5, Windows 7 x64, with either 32- or 64-bit codegen) won't emit an inlined virtual/interface call for int Add(int, int), so it's slower.

In my simple program doing a loop, calling an Add function on an interface, it is definitely making a function call each time. It unrolls 4 times, and loads the function pointer once per iteration - I'd have thought it would only load it once overall. The loop is 89 bytes. There is no conditional inside the loop to check for the type.[1]

If I change it to not use the interface (don't cast to the interface type), it's unrolled and inlined. Loop is 34 bytes.[2]

It's the same on 32-bit, except there's no unrolling. The non-virtual loop body is 2 instructions (inc, add). The interface has a push, 3 movs and a call. The virtual one requires two extra movs (to load the function pointer - with an interface the address is embedded as a literal).

Shrug. Maybe it still doesn't work with value types? I started it without VS then broke in with the debugger to get the disassembly.

The loop is doing "y = x.Add(y, i)" where y is a local.

Edit: Aha! Using an interface method (not virtual) and strings, I was able to get inlining. I guess the CLR is still weak in dealing with value types.

1: Start of the loop using an interface:

  lea r8d,[rdi-1]
  mov rbx,qword ptr [FFEEFE60h]
  mov edx,eax
  mov rcx,rsi ; rsi is the object pointer
  lea r11,[FFEEFE60h] ; I am embarrassed to admit I don't know what r11 is doing
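  ; (best guess: r11 carries the indirection-cell address for the CLR's
  ; virtual stub dispatch, so the stub can find and patch the cell)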
  call rbx ; just does lea eax,[rdx+r8], ret
  ; similarly 3 more times then loop
2: Without using the interface, the loop body:

  lea eax,[r8-1] ; r8's the counter
  add ecx,eax   
  lea edx,[rcx+r8]
  lea ecx,[rdx+rax]
  lea eax,[r8+2]
  add ecx,eax
  ; then loop
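The int vs. string variants from the Edit look roughly like this (simplified sketch, made-up names):

  interface IAddInt { int Add(int a, int b); }          // stayed an out-of-line call
  interface IAddStr { string Add(string a, string b); } // this shape got inlined

  class IntAdder : IAddInt
  {
      public int Add(int a, int b) { return a + b; }
  }

  class StrAdder : IAddStr
  {
      public string Add(string a, string b) { return a + b; }
  }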


Nice work digging in and figuring out how to trigger it. I fiddled around some earlier and wasn't able to reproduce the behavior I saw before, so I gave up. :) You are correct that the CLR does a poor job optimizing value types, and I probably made the same mistake (i.e. used a struct).


I think there is usually something else to optimize before this becomes a problem. And if it becomes a problem you need a low-level language anyway?


But in cases of needless virtual calls (doesn't Java default to virtual for some strange reason?) it may be a quick and easy win.

Additionally, it's not always so easy to drop to a low-level language. If your architecture is enormous and complicated, it might be totally unfeasible to change languages for hot parts.


In Java, all methods are virtual. You can often achieve a similar effect to non-virtual methods by declaring them final to prevent them being overridden in subclasses, but the same rules about which method is called apply. The reason is to simplify the language (in comparison to C++): the rules about which method is called are much simpler and easier to remember.


Simpler? The only case where it matters is when a subclass has shadowed a non-virtual method. "Simpler" would be simply disallowing shadowing.


Not having to think about the question "should I make this method virtual?" makes the Java language simpler, yes.

In the vast majority of cases where a virtual function call is actually monomorphic or bimorphic at runtime, the JVM JIT can observe that and potentially inline the method (with an if statement in the bimorphic case). It puts guards around the inlined method and deoptimizes in the event that a newly loaded class renders the optimization incorrect.
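A sketch of the bimorphic case in C# syntax (made-up names; a real JVM does this on compiled code and deoptimizes if the guards fail):

  using System;

  abstract class Shape { public abstract double Area(); }
  sealed class Circle : Shape { public double R; public override double Area() { return Math.PI * R * R; } }
  sealed class Square : Shape { public double S; public override double Area() { return S * S; } }

  static class Bimorphic
  {
      static double AreaAt(Shape s)
      {
          Circle c = s as Circle;
          if (c != null) return Math.PI * c.R * c.R; // inlined target 1
          Square q = s as Square;
          if (q != null) return q.S * q.S;           // inlined target 2
          return s.Area(); // a real JIT would deoptimize/recompile here
      }
  }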


It just flips the question to "should I make this method final?" So at best it's a wash. In the majority of cases a method shouldn't be virtual, so users need to mark it final to accurately represent their design.


Well, the answer to "should I make x final" is always "yes" in Java, so you don't really have to think too hard about that either. :-) `final` really should have been the default setting for all methods, classes, and local variables, but unfortunately inheritance was still in vogue when Java was created, and the benefits of immutable values weren't as well understood or appreciated either (or maybe they just figured the C & C++ programmers they were trying to win over would hate it).


Also, depending on the JVM and JIT settings, those virtual methods get treated as non-virtual over time.


There you go - something else to fix before other optimizations: "your architecture is enormous and complicated".



