Details
Description
SPARK-21501 changed the spark shuffle index service to be based on memory instead of the number of files.
Unfortunately, there's a problem with the calculation which is based on size information provided by `ShuffleIndexInformation`.
It is based purely on the file size of the cached file on disk.
We're running in OOMs with very small index files (byte size ~16 bytes) but the overhead of the ShuffleIndexInformation around this is much larger (e.g. 184 bytes, see screenshot). We need to take this into account and should probably add a fixed overhead of somewhere between 152 and 180 bytes according to my tests. I'm not 100% sure what the correct number is and it'll also depend on the architecture etc. so we can't be exact anyway.
If we do that we can maybe get rid of the size field in ShuffleIndexInformation to save a few more bytes per entry.
In effect this means that for small files we use up about 70-100 times as much memory as we intend to. Our NodeManagers OOM with 4GB and more of indexShuffleCache.
Attachments
Attachments
Issue Links
- links to