Description
Problem
On execution nodes, Linux eventually holds 425+ GB RAM in its file cache (shown as buff/cache in top). This is caused by Linux automatically caching files we read or write.
Among other problems, Grafana's EN Memory Usage chart (and other tools) doesn't exclude the Linux page cache, which obscures Go's memory usage patterns. E.g. Grafana doesn't show that operational memory dropped by 250+ GB after PR #1944.
UPDATE: As of June 15 (mainnet18 spork) checkpoint files are 98GB.
Reading and writing large (98GB) files can cause Linux to keep them in the file cache even after the program exits. For example, checkpointing includes:
- reading 98 GB from old checkpoint file
- writing 98 GB to a new checkpoint file
This 196GB growth in the file cache after each checkpointing is cumulative, and Linux can end up automatically caching 3-5 checkpoint files in memory.
Updates epic #1744
The Proposed Solution
- Avoid clearing out the entire file system cache.
- Drop the new checkpoint file (that was created) from the cache
- Drop the old checkpoint file (that was read) from the cache
- Optionally, also do this for WAL files
Proof of concept
On benchnet (using 53GB files), checkpoint creation begins with the OS file cache at around 2 GB. Once checkpoint file loading and creation begin, OS cache use can peak at about 106GB and then stays at about 105GB after the benchmark program exits.
- Run checkpoint.00003464 creation benchmark. OS file cache will be around 105 GB after benchmark program exits.
- Run `dd if=checkpoint.00003464 iflag=nocache count=0` (these params won't modify files). OS file cache will immediately drop by the checkpoint file size (around 53GB).
Outside of benchnet, @zhangchiqing confirmed using the dd command on 3 checkpoint files also reduced the memory used by OS cache by the combined file sizes.
Caveats
- This is primarily aimed at having Grafana, etc. show expected memory use (to avoid hunting for nonexistent leaks, etc.)
- May need to look into special considerations when running inside a container.
