Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
ghx-label-10
Description
IcebergDeleteNode currently uses an ordered int64_t array for each data file to hold the deleted positions. This can consume significant amount of memory when there are lots of deleted records.
E.g. 100 Million delete records consume 800 MiB memory.
RoaringBitmap is a highly compressed and highly efficient data structure to store bitmaps:
https://arxiv.org/pdf/1603.06549
https://github.com/RoaringBitmap/CRoaring
We could use it to store the deleted file positions instead of the sorted arrays, as
- it consumes significantly less memory
- makes the code simpler
- might have perf benefits