More details: Vtable stubs are assembled on the fly, but there are only about a hundred of them. There is one for each distinct vtable slot (offset within the klassOop of the second indirection). For a monomorphic call site, the caller loads the expected class (as a klassOop) into a standard (non-argument) register, and optimistically jumps to the expected target method. All such target methods are compiled with a special "verified entry point" (VEP) which includes an instruction to compare the incoming klassOop against the receiver's _klass field. If the check succeeds, control falls through to the normal "unverified entry point" (UEP) of the compiled method. If the check fails, control branches out of the (wrong) method into a stub which performs an up-call to the JVM, to figure out what's wrong, and (usually) relink the call site.
Since most call sites are monomorphic, they can be completed in a single branch with a parallel receiver type check. Here is a generic instruction trace of this simple case:
    callSite:
        set #expectedKlass, CHECK
        call #expectedMethod.VEP
    ---
    expectedMethod.VEP:
        cmp (RCVR + #klass), CHECK
        jump,ne wrongMethod
    compiledEntry:
        ...
Any method that can be a target of such a call has two entry points, the VEP ("verified entry point") and the UEP ("unverified entry point"). The former does a two-instruction type check and then falls into the latter.
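As a rough illustration of the trace above, the following Python sketch models the monomorphic case. It is a toy model under stated assumptions, not HotSpot code: the names Klass, Receiver, CompiledMethod, checking_entry and make_monomorphic_call_site are hypothetical, the CHECK register is modeled as an ordinary argument, and the up-call to the JVM is just a callback.

```python
class Klass:
    """Toy stand-in for a klassOop; only its identity matters here."""
    def __init__(self, name):
        self.name = name


class Receiver:
    """Toy object whose header carries a klass pointer (the _klass field)."""
    def __init__(self, klass):
        self.klass = klass


class CompiledMethod:
    """Compiled code with a checking entry that falls into the method body."""
    def __init__(self, body):
        self.body = body

    def checking_entry(self, rcvr, check, wrong_method):
        # cmp (RCVR + #klass), CHECK ; jump,ne wrongMethod
        if rcvr.klass is not check:
            return wrong_method(rcvr)
        # Check passed: fall through into the compiled body.
        return self.body(rcvr)


def make_monomorphic_call_site(expected_klass, expected_method, wrong_method):
    """Build a call site bound to one expected klass and one target method."""
    def call_site(rcvr):
        # set #expectedKlass, CHECK ; call the method's checking entry
        return expected_method.checking_entry(rcvr, expected_klass, wrong_method)
    return call_site


# Hypothetical example data.
string_klass = Klass("String")
integer_klass = Klass("Integer")
to_string = CompiledMethod(lambda rcvr: "compiled body for " + rcvr.klass.name)
site = make_monomorphic_call_site(
    string_klass, to_string, lambda rcvr: "up-call for " + rcvr.klass.name)

print(site(Receiver(string_klass)))   # compiled body for String
print(site(Receiver(integer_klass)))  # up-call for Integer
```

Note that the check compares the receiver's klass with the value supplied by the caller, so the same compiled method can serve call sites that expect different (sub)classes.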
Here is a generic instruction trace of a polymorphic virtual call. It is the equivalent of the C++ virtual function dispatch.
    callSite:
        set #garbage, CHECK
        call #vtableStub[vtableSlot]
    ---
    vtableStub[vtableSlot]:
        load (RCVR + #klass), TEM
        load (TEM + #(vtable + vtableSlot)), METHOD
        load (METHOD + #compiledEntry), TEM
        jump TEM
    ---
    compiledEntry:
        ...
In all, that is three memory references and two nonlocal jumps.
Note that the intermediate "vtable stub" is customized to the vtable offset (which is usually in the 200-300 range, given that the header of a klassOop is around 200 bytes, or somewhat more on LP64). The vtable stub is not customized to the klass.
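The stub dispatch can be sketched in Python as follows. This is only a model under assumptions: objects, klasses and methods are plain dictionaries, the helper load counts simulated memory references, and vtable_stub is a hypothetical factory that builds one stub per slot on first use. None of these names come from HotSpot.

```python
loads = 0       # number of simulated memory references so far
_stubs = {}     # one stub per vtable slot, shared by all klasses


def load(obj, field):
    """Model a single memory load and count it."""
    global loads
    loads += 1
    return obj[field]


def vtable_stub(slot):
    """Return the stub for this slot, assembling it on first use."""
    if slot not in _stubs:
        def stub(rcvr):
            # load (RCVR + #klass), TEM
            klass = load(rcvr, "klass")
            # load (TEM + #(vtable + vtableSlot)), METHOD
            # The vtable is embedded in the klass, so this is one load
            # at a fixed offset; the plain dict lookup is not counted.
            method = load(klass["vtable"], slot)
            # load (METHOD + #compiledEntry), TEM
            entry = load(method, "compiled_entry")
            # jump TEM
            return entry(rcvr)
        _stubs[slot] = stub
    return _stubs[slot]


def method(name):
    """Hypothetical method record whose compiled entry returns its name."""
    return {"compiled_entry": lambda rcvr: name}


circle = {"vtable": [method("Circle.area")]}
square = {"vtable": [method("Square.area")]}

stub = vtable_stub(0)
results = [stub({"klass": circle}), stub({"klass": square})]
print(results)  # ['Circle.area', 'Square.area']
print(loads)    # 6
```

The same stub object serves both klasses, which mirrors the point that the stub is customized to the slot and not to the klass; each call costs three counted loads.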
The final memory reference in this instruction trace could be removed by storing a pointer to each method's UEP in vtable (and itable) entries, as C++ does. This has not been done because it would then be difficult for the interpreter to use the vtables. (An earlier version of the system had two-word vtables, to display both kinds of entry points on an equal footing, but this led to complexities and, worst of all, race conditions.) It is an interesting problem to try to invert the preference the current design gives to the interpreter; the interpreted entry point would have to be hidden somewhere at a fixed offset from the method's compiled entry point, but not close enough to disturb the tuning of the VEP.
Call sites of this polymorphic form are rarely generated directly by the JIT; instead, they are generated to call a bootstrapping linkage routine, which then sets them (if all goes well) to the monomorphic state described above. When the second receiver type is encountered, the linkage code (at the wrongMethod jump target above) is called to transition the call site to its polymorphic state. The state changes are summarized in the leading comments of src/share/vm/code/compiledIC.hpp.
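The transitions described here can be modeled with a small Python state machine. This is a simplified sketch, not the logic in compiledIC.hpp: the state labels "unlinked", "monomorphic" and "polymorphic" are chosen for this example to follow the text above, and the classes Klass, Obj and CallSite are hypothetical.

```python
class Klass:
    """Toy klass with a vtable of callables."""
    def __init__(self, vtable):
        self.vtable = vtable


class Obj:
    """Toy receiver carrying a klass pointer."""
    def __init__(self, klass):
        self.klass = klass


class CallSite:
    """Call site that starts unlinked, then goes monomorphic, then polymorphic."""
    def __init__(self, slot):
        self.slot = slot
        self.state = "unlinked"
        self.expected_klass = None
        self.target = None

    def call(self, rcvr):
        if self.state == "monomorphic":
            if rcvr.klass is self.expected_klass:
                return self.target(rcvr)       # fast path: check passed
            return self._relink(rcvr)          # wrongMethod: second type seen
        if self.state == "polymorphic":
            # Stands in for the vtable stub: dispatch through the slot.
            return rcvr.klass.vtable[self.slot](rcvr)
        return self._relink(rcvr)              # first call: bootstrap linkage

    def _relink(self, rcvr):
        method = rcvr.klass.vtable[self.slot]
        if self.state == "unlinked":
            self.state = "monomorphic"
            self.expected_klass = rcvr.klass
            self.target = method
        else:
            self.state = "polymorphic"
            self.expected_klass = None
            self.target = None
        return method(rcvr)


a = Obj(Klass([lambda rcvr: "A.m"]))
b = Obj(Klass([lambda rcvr: "B.m"]))
site = CallSite(0)

trace = []
for rcvr in (a, a, b, a):
    result = site.call(rcvr)
    trace.append((result, site.state))
print(trace)
```

In this model the site stays monomorphic while it keeps seeing the first receiver type, and the first receiver of a different type moves it permanently to the polymorphic state.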