  1. Kudu
  2. KUDU-3153

Use full DRS size when considering rowsets to compact

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: compaction, tserver
    • Labels: None

    Description

      We sometimes see puzzling behavior in the rowset layout diagram: the quartiles indicate well-compacted rowsets (around 32MB), while the compaction policy dump shows every rowset as heavily undersized (around 10MB).

      Looking through what's used where, a snippet from the patch for KUDU-2701 indicates that the policy considers only the sizes of the base data and REDO files, excluding the PK index and bloom filters:

      It's not totally clear to me why just base data and REDOs are used, ...

      After some spelunking, it seems like the usage of base data + redo file size stems from a time when DiskRowSet didn't have an interface to get the full size of the DRS, as seen in an older version of RowSetInfo and the corresponding version of diskrowset.h.

      We should probably consider using the full size of the DRSs. I suspect that would give us more useful estimates of the efficacy of a compaction, especially in the context of a "small rowset" compaction (see KUDU-1400).
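
      To make the difference between the two size measures concrete, here is a minimal sketch. It is not Kudu's actual code: the struct RowSetSizes and the functions PolicySize and FullSize are hypothetical names invented for illustration, and the sketch only covers the components named in this issue (base data, REDOs, PK index, bloom filters).

```cpp
#include <cstdint>

// Hypothetical per-rowset on-disk size breakdown, in bytes.
// These fields are illustrative only; they are not Kudu's DiskRowSet interface,
// and any on-disk components not named in this issue are left out.
struct RowSetSizes {
  std::uint64_t base_data;
  std::uint64_t redo_deltas;
  std::uint64_t pk_index;
  std::uint64_t bloom_filters;
};

// Size as the issue describes the compaction policy seeing it:
// base data plus REDO files only.
inline std::uint64_t PolicySize(const RowSetSizes& s) {
  return s.base_data + s.redo_deltas;
}

// Size that also counts the PK index and bloom filters, i.e. the components
// the issue says the policy currently excludes.
inline std::uint64_t FullSize(const RowSetSizes& s) {
  return s.base_data + s.redo_deltas + s.pk_index + s.bloom_filters;
}
```

      Whenever the PK index or bloom filters are non-empty, FullSize returns more than PolicySize for the same rowset, so a policy based on PolicySize will treat that rowset as smaller than its full on-disk footprint. How large the gap is in practice depends on the data and is not something this sketch tries to estimate.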

          People

            Assignee: Unassigned
            Reporter: Andrew Wong (awong)