Kudu / KUDU-1567

Short default for log retention increases write amplification

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 1.1.0
    • Component/s: perf, tserver
    • Labels: None

    Description

      Currently, the maintenance manager prioritizes flushes over compactions whenever the flush operations are retaining WAL segments. The goal is to keep the amount of in-memory data from growing so large that restarts become extremely slow. However, this has a somewhat unintuitive negative effect on performance:

      • with the default of retaining just two segments, flushes become highly prioritized once the MRS holds only ~128MB of data, regardless of the "flush_threshold_mb" configuration
      • this creates many overlapping rowsets in random-write applications
      • because flushes are prioritized over compactions, compactions rarely run
      • the frequent flushes, combined with the low priority of compactions, mean that after a few days of constant inserts we often end up with average "bloom lookups per op" metrics of 50-100, which is quite slow even if the blooms fit in cache.
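
The interaction described above can be illustrated with a toy model. This is only a sketch and not Kudu code: the function names, the one-operation-per-tick scheduler, and the 64 MB segment size (inferred from "two segments" being roughly 128MB) are all assumptions made for illustration. It compares a scheduler that lets WAL retention force a flush against one that waits for the flush threshold, and counts the overlapping rowsets left behind.

```python
# Toy model only: names, numbers, and scheduling rules are illustrative
# assumptions, not the real maintenance manager.
SEGMENT_MB = 64             # assumed WAL segment size (2 segments ~ 128 MB)
MIN_SEGMENTS_RETAINED = 2   # the default discussed in this issue


def flush_is_prioritized(mrs_mb: int) -> bool:
    """True once the MRS anchors at least the minimum retained WAL."""
    return mrs_mb >= SEGMENT_MB * MIN_SEGMENTS_RETAINED


def simulate(ticks: int, insert_mb_per_tick: int,
             flush_threshold_mb: int, wal_priority: bool) -> int:
    """Run one maintenance op per tick; return the overlapping rowset count."""
    mrs_mb = 0
    rowsets = 0
    for _ in range(ticks):
        mrs_mb += insert_mb_per_tick
        if wal_priority and flush_is_prioritized(mrs_mb):
            # WAL retention forces a flush, ignoring the flush threshold.
            rowsets += 1
            mrs_mb = 0
        elif mrs_mb >= flush_threshold_mb:
            # Ordinary threshold-driven flush.
            rowsets += 1
            mrs_mb = 0
        elif rowsets > 1:
            # Slot is free: a compaction merges two overlapping rowsets.
            rowsets -= 1
    return rowsets


with_priority = simulate(64, 128, 1024, wal_priority=True)
without_priority = simulate(64, 128, 1024, wal_priority=False)
print(with_priority, without_priority)
```

In this model, with 128 MB inserted per tick, the WAL-prioritized scheduler flushes on every tick and never reaches a compaction, so rowsets accumulate one per tick. The threshold-driven scheduler flushes once every eight ticks and uses the idle ticks to compact. Since a random insert generally has to consult the bloom filter of every rowset whose key range overlaps it, the rowset count is a rough stand-in for the "bloom lookups per op" metric.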

            People

              Assignee: Todd Lipcon (tlipcon)
              Reporter: Todd Lipcon (tlipcon)