Kudu / KUDU-2618

Factor the amount of data into time-based flush decisions

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      Purely time-based flushing can produce many small rowsets when the insert rate is so low that hardly any data accumulates before each flush.

      On the other hand, to crib an example from Todd in the KUDU-1400 design doc:

      if you configure your TS to allow 100G of heap, and insert 30G of data spread across 30 tablets (1G each tablet being lower than the default size-based threshold), would you want it to ever flush to disk? or just sit there in RAM? The restart could be relatively slow if it never flushed, and also scans of MRS are slower than DRS.

      As Todd goes on to say:

      That said, we could probably make the "time-based flush" somehow related to the amount of data, so that we wait a long time to flush if it's only 10kb, but still flush relatively quickly if it's many MB.

      We should tune time-based flush so that, on average, it waits less time when the pending data is enough for one or more "full-sized" diskrowsets than when it amounts to less than a full diskrowset.
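
One possible reading of this proposal is a wait time that shrinks as the MemRowSet grows, bottoming out once there is enough data for a full diskrowset. The sketch below is only an illustration of that idea, not Kudu code: the function names, the linear interpolation, and every default value (a 32 MiB "full-sized" diskrowset, a 60 second minimum wait, a one hour maximum wait) are assumptions chosen for the example.

```python
# Hypothetical sketch of a size-aware, time-based flush policy.
# None of these names or defaults come from Kudu; they are placeholders.

MIB = 1024 * 1024


def flush_wait_secs(mrs_bytes,
                    full_drs_bytes=32 * MIB,   # assumed "full-sized" diskrowset
                    min_wait_secs=60.0,        # assumed wait once a full DRS is available
                    max_wait_secs=3600.0):     # assumed wait for a nearly empty MRS
    """Return how long to wait before a time-based flush of mrs_bytes of data.

    The wait falls linearly from max_wait_secs (empty MRS) to min_wait_secs
    (one full diskrowset or more).
    """
    if mrs_bytes < 0:
        raise ValueError("mrs_bytes must be non-negative")
    fraction = min(1.0, mrs_bytes / full_drs_bytes)
    return max_wait_secs - (max_wait_secs - min_wait_secs) * fraction


def should_flush(mrs_bytes, secs_since_first_write, **policy):
    """True if the MRS holds data and has waited at least its size-dependent time."""
    if mrs_bytes == 0:
        return False  # nothing to flush, however old the MRS is
    return secs_since_first_write >= flush_wait_secs(mrs_bytes, **policy)


if __name__ == "__main__":
    for size in (10 * 1024, 1 * MIB, 16 * MIB, 64 * MIB):
        print(size, flush_wait_secs(size))
```

With these placeholder values, 10 KiB of data waits close to the full hour, while anything at or above the assumed full diskrowset size is flushed after the minimum wait. A logarithmic or stepped curve would fit the description equally well; the linear form is used here only because it is the simplest to reason about.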

            People

            • Assignee:
              Unassigned
            • Reporter:
              William Berkeley (wdberkeley)
            • Votes:
              1
            • Watchers:
              2
