
[Execution State] Linux page cache is holding 425+ GB RAM, so make it drop checkpoint files to free up 294-394GB RAM #2261

@fxamacker


Problem

On execution nodes, Linux eventually holds 425+ GB RAM in its file cache (shown as buff/cache in top). This is caused by Linux automatically caching files we read or write.

Among other problems, Grafana's EN Memory Usage chart (and other tools) doesn't exclude the Linux page cache, which obscures Go's memory usage patterns. E.g. Grafana doesn't show that operational memory dropped by 250+ GB from PR #1944.

UPDATE: As of June 15 (mainnet18 spork) checkpoint files are 98GB.

Reading and writing large (98GB) files can cause Linux to cache them even after the program exits. For example, checkpointing includes:

  • reading 98 GB from old checkpoint file
  • writing 98 GB to a new checkpoint file

This 196 GB of file cache growth per checkpoint is cumulative, and Linux can end up automatically caching 3-5 checkpoint files in memory.

Updates epic #1744

The Proposed Solution

  • Avoid clearing out the entire file system cache.
  • Drop the new checkpoint file (that was created) from the cache
  • Drop the old checkpoint file (that was read) from the cache
  • Optionally, also do this for WAL files

Proof of concept

On benchnet (using 53 GB files), checkpoint creation began with the OS file cache at around 2 GB. Once checkpoint file loading and creation begins, OS cache use can peak at about 106 GB and then stay at about 105 GB after the benchmark program exits.


  1. Run checkpoint.00003464 creation benchmark.
    OS file cache will be around 105 GB after benchmark program exits.

  2. Run dd if=checkpoint.00003464 iflag=nocache count=0 (these params won't modify files).
    OS file cache will immediately drop by the checkpoint file size (around 53GB).

Outside of benchnet, @zhangchiqing confirmed using the dd command on 3 checkpoint files also reduced the memory used by OS cache by the combined file sizes.

Caveats

  • This is primarily aimed at having Grafana, etc. show expected memory use (to avoid hunting for nonexistent leaks, etc.)
  • May need to look into special considerations when running inside a container.
