Problem
Although PR #2792 reduces peak memory used by checkpointing by reusing ledger state, peak memory during checkpoint serialization can be reduced by over 35GB more.
Updates #1744
Proposed Solution
Replace the largest data structure used for checkpoint serialization and process subtries instead of the entire trie. Also use preallocation where feasible.
Optionally, add a flag to specify the number of levels to use. Specifying 4 levels produces 16 subtries (2^4), which is a reasonable default for impactful memory savings and faster serialization.
This proposed change also makes it easier to serialize data in parallel, but that is outside the scope of this issue.
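A minimal sketch of the subtrie split described above. The `Node`, `buildComplete`, and `subtrieRoots` names are illustrative, not flow-go's actual types: the idea is to descend a fixed number of levels from the root, collect the 2^level subtrie roots (with the next-level slice preallocated to its final capacity), and then serialize one subtrie at a time instead of encoding the whole trie in a single pass.

```go
package main

import "fmt"

// Node is a simplified binary trie node. The real ledger trie node
// carries hashes and payloads; this sketch keeps only what is
// needed to show the subtrie split.
type Node struct {
	Left, Right *Node
	Payload     string
}

// buildComplete builds a complete binary trie of the given depth,
// labeling each leaf with its root-to-leaf path (test scaffolding,
// not part of the proposed change).
func buildComplete(depth int, path string) *Node {
	if depth == 0 {
		return &Node{Payload: path}
	}
	return &Node{
		Left:  buildComplete(depth-1, path+"0"),
		Right: buildComplete(depth-1, path+"1"),
	}
}

// subtrieRoots descends `level` levels from root and returns the
// 2^level subtrie roots. Nil entries are kept so each subtrie stays
// at a stable index. The next-level slice is preallocated to its
// exact final capacity, matching the "use preallocations" point.
func subtrieRoots(root *Node, level int) []*Node {
	roots := []*Node{root}
	for i := 0; i < level; i++ {
		next := make([]*Node, 0, 2*len(roots))
		for _, n := range roots {
			if n == nil {
				next = append(next, nil, nil)
				continue
			}
			next = append(next, n.Left, n.Right)
		}
		roots = next
	}
	return roots
}

func main() {
	root := buildComplete(4, "")
	subtries := subtrieRoots(root, 4)
	fmt.Println(len(subtries)) // 16 subtries at level 4

	// Serialize (and release) one subtrie at a time instead of
	// holding the whole trie's encoding in memory at once.
	for _, st := range subtries {
		if st != nil {
			_ = st // a serializeSubtrie(st) call would go here
		}
	}
}
```

Because each subtrie is encoded and released independently, peak memory is bounded by the largest subtrie rather than the full trie, and the per-subtrie loop is also a natural unit for future parallelization.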
Preliminary Results Using Levels=4 (16 Subtries)
Using the August 12 mainnet checkpoint file with Go 1.18.5:
- -37GB peak RAM (`top` command), -23GB RAM (go bench B/op)
- -19.6 million (-50%) allocs/op in serialization phase
- -2.7 minutes duration
Before: 625746 ms, 88320868048 B/op, 39291999 allocs/op
After:  461937 ms, 64978613264 B/op, 19671410 allocs/op
No benchstat comparisons yet (n≥5 runs needed) due to benchmark duration and memory requirements (requires the big benchnet-dev-004 server).
EDIT: added more details after reading PR #3050 review comments.