The copying algorithm scans the class metadata using MetaspaceClosure, starting from the set of roots provided by StaticArchiveBuilder::iterate_roots() or DynamicArchiveBuilder::iterate_roots(). The copying is done in the following steps:
Step 1. Determine Iteration Order
When dumping the static archive, we want the contents to be deterministic (see JDK-8241071). That is, if you run "java -Xshare:dump" twice, you should get two classes.jsa files that are bit-for-bit identical.
See comments in ArchiveBuilder::gather_klasses_and_symbols() for more detail.
Step 2. Categorize Read-only and Read-write Objects
This step is implemented by ArchiveBuilder::gather_source_objs(), which iterates the metadata with ArchiveBuilder::iterate_sorted_roots(). All objects that are eligible for copying are entered (by reference) into ArchiveBuilder::_rw_src_objs or ArchiveBuilder::_ro_src_objs, depending on whether they are read-write or read-only.
There is a pointer at 0x108 (&foo->barPtrA, offset == 0x08)
There is a pointer at 0x118 (&foo->barPtrB, offset == 0x18)
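The bookkeeping in this step can be sketched as below. This is a simplified illustration, not the HotSpot code: the `Foo`/`Bar` layout, `SourceObjInfo`, `SourceObjList`, and `Gatherer` are hypothetical names invented here, and embedded pointer locations are kept as a plain list of byte offsets rather than whatever representation ArchiveBuilder actually uses. Which objects count as read-only is also decided by the caller in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical layout chosen so that, on a 64-bit build, the two pointer
// fields sit at offsets 0x08 and 0x18 as in the example above.
struct Bar { intptr_t value; };
struct Foo { intptr_t a; Bar* barPtrA; intptr_t b; Bar* barPtrB; };

// What we remember about one source object (hypothetical type).
struct SourceObjInfo {
  const void* addr;                 // the object is entered by reference
  size_t size;                      // remembered so Step 3 needs no re-scan
  std::vector<size_t> ptr_offsets;  // byte offsets of embedded pointers
};

// One list per region (hypothetical type).
struct SourceObjList {
  std::vector<SourceObjInfo> objs;
  size_t total_bytes = 0;

  void append(const void* addr, size_t size, std::vector<size_t> ptr_offsets) {
    assert(addr != nullptr && size > 0);
    objs.push_back({addr, size, std::move(ptr_offsets)});
    total_bytes += size;
  }
};

// Sorts each visited object into the read-write or read-only list.
struct Gatherer {
  SourceObjList rw;
  SourceObjList ro;

  void gather(const void* addr, size_t size, bool read_only,
              std::vector<size_t> ptr_offsets) {
    (read_only ? ro : rw).append(addr, size, std::move(ptr_offsets));
  }
};
```

With this in place, a `Foo` would be entered as `g.gather(&foo, sizeof(Foo), false, {offsetof(Foo, barPtrA), offsetof(Foo, barPtrB)})`, after which the later steps can work from the lists alone.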
Step 3. Copy Source Objects into Output Buffer
This is done by ArchiveBuilder::dump_rw_region() (and ArchiveBuilder::dump_ro_region()). The copying is fairly straightforward: all the objects that should be copied into the RW region are already stored in the array inside _rw_src_objs, along with their sizes. So we just linearly allocate the copies in ArchiveBuilder::_rw_region, and copy the contents of the source objects to their copies using memcpy(). See ArchiveBuilder::make_shallow_copy() for details.
With the above example, if _rw_region starts at 0x400, the copies will look like this:
source foo @ 0x100 -> copy of foo @ 0x400
source bar1 @ 0x200 -> copy of bar1 @ 0x428
source bar2 @ 0x208 -> copy of bar2 @ 0x420
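A minimal sketch of the linear allocation and shallow copy follows. It is not the ArchiveBuilder code: `SrcObj`, `DumpRegion`, and `make_shallow_copies` are hypothetical stand-ins, and the 8-byte allocation alignment is an assumption made for the illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// A source object as remembered in Step 2: address plus size (hypothetical).
struct SrcObj {
  const void* addr;
  size_t size;
};

// A fixed-capacity output region with a bump pointer (hypothetical).
class DumpRegion {
  std::vector<char> _buf;
  size_t _top = 0;
public:
  explicit DumpRegion(size_t capacity) : _buf(capacity) {}

  // Hand out the next 'size' bytes, rounded up to 8 (assumed alignment).
  char* allocate(size_t size) {
    size_t aligned = (size + 7) & ~static_cast<size_t>(7);
    if (_top + aligned > _buf.size()) {
      throw std::runtime_error("dump region exhausted");
    }
    char* p = _buf.data() + _top;
    _top += aligned;
    return p;
  }

  char* base() { return _buf.data(); }
  size_t used() const { return _top; }
};

// Copy every source object into the region, in list order, and return a
// source -> copy table for the pointer relocation step. The copies are
// shallow: embedded pointers still hold the source addresses afterwards.
std::unordered_map<const void*, char*>
make_shallow_copies(const std::vector<SrcObj>& srcs, DumpRegion& region) {
  std::unordered_map<const void*, char*> copies;
  for (const SrcObj& s : srcs) {
    char* dst = region.allocate(s.size);
    std::memcpy(dst, s.addr, s.size);
    copies[s.addr] = dst;
  }
  return copies;
}
```

The placement of each copy depends only on the order of the list and the sizes before it. For instance, a 0x20-byte foo followed by an 8-byte bar2 and then an 8-byte bar1 lands at offsets 0x00, 0x20, and 0x28 from the region base.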
Step 4. Relocate Embedded Pointers
During this step, we update the pointers embedded in the copies. See ArchiveBuilder::relocate_embedded_pointers() for details. Here's an illustration of how it works.
When we update the embedded pointers, we also use ArchivePtrMarker::mark_pointer() to mark the location of all the embedded pointers. This information is used for relocating the entire archive. E.g., if we want to relocate the output buffer from 0x400 to 0x500, an embedded pointer whose value is 0x428 must be updated to 0x528. See VM_PopulateDumpSharedSpace::relocate_to_requested_base_address() for more info.
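The role of the pointer marks during whole-archive relocation can be sketched as follows. This is an illustration under simplifying assumptions, not the ArchivePtrMarker implementation: the buffer is modeled as an array of words with a nominal base address, and `PtrMarker`, `Buffer`, `set_embedded_pointer`, and `relocate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per word of the output buffer; a set bit means "this word holds
// a pointer into the buffer" (hypothetical stand-in for the pointer map).
class PtrMarker {
  std::vector<bool> _bits;
public:
  explicit PtrMarker(size_t words) : _bits(words, false) {}
  void mark(size_t word_index) { _bits[word_index] = true; }
  bool is_marked(size_t word_index) const { return _bits[word_index]; }
  size_t size() const { return _bits.size(); }
};

// The output buffer, modeled as words plus the address it is meant to
// occupy (hypothetical).
struct Buffer {
  uintptr_t base;
  std::vector<uintptr_t> words;
};

// Store the address of the copy into a pointer slot and record the slot.
void set_embedded_pointer(Buffer& buf, PtrMarker& marker,
                          uintptr_t slot_addr, uintptr_t target) {
  size_t idx = (slot_addr - buf.base) / sizeof(uintptr_t);
  assert(idx < buf.words.size());
  buf.words[idx] = target;
  marker.mark(idx);
}

// Move the whole buffer to new_base: only marked, non-null words are
// adjusted, so ordinary integers that look like addresses stay untouched.
void relocate(Buffer& buf, const PtrMarker& marker, uintptr_t new_base) {
  uintptr_t delta = new_base - buf.base;  // unsigned wrap handles moves down
  for (size_t i = 0; i < marker.size(); i++) {
    if (marker.is_marked(i) && buf.words[i] != 0) {
      buf.words[i] += delta;
    }
  }
  buf.base = new_base;
}
```

With a buffer based at 0x400 and a marked slot holding 0x428, calling `relocate(buf, marker, 0x500)` rewrites that slot to 0x528, matching the example in the text, while an unmarked word that happens to contain 0x428 is left alone.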
In the previous implementation of static dump, we used MetaspaceClosure three times:
- copy the RW objects
- copy the RO objects
- relocate embedded pointers
However, MetaspaceClosure is slow. In the new implementation, we use MetaspaceClosure only twice (in Steps 1 and 2). During Step 2, we remember the size of all the source objects, as well as the location of the embedded pointers. This eliminates the use of MetaspaceClosure in the subsequent steps for making copies and relocating embedded pointers. The resulting code is faster, and also easier to understand (no need to think about recursion during copying, etc).
The previous implementation of dynamic dump used MetaspaceClosure even more. As a result, it gets a more pronounced speedup from the new implementation.
Here are the elapsed times of the following test cases (which archive more than 20000 classes), using a fastdebug build:
|Old||42.655 sec||67.014 sec|
|New||37.027 sec||34.974 sec|