Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-13109

Use RoaringBitmap in IcebergDeleteNode

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • Impala 4.5.0
    • None
    • ghx-label-10

    Description

      IcebergDeleteNode currently uses an ordered int64_t array for each data file to hold the deleted positions. This can consume significant amount of memory when there are lots of deleted records.
      E.g. 100 Million delete records consume 800 MiB memory.

      RoaringBitmap is a highly compressed and highly efficient data structure to store bitmaps:
      https://arxiv.org/pdf/1603.06549
      https://github.com/RoaringBitmap/CRoaring

      We could use it to store the deleted file positions instead of the sorted arrays, as

      • it consumes significantly less memory
      • makes the code simpler
      • might have perf benefits

      Attachments

        Activity

          People

            boroknagyz Zoltán Borók-Nagy
            boroknagyz Zoltán Borók-Nagy
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: