Kudu / KUDU-2618

Factor the amount of data into time-based flush decisions



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Target Version/s:


      Purely time-based flushing can cause a small-rowset problem: when the insert rate is so low that hardly any data accumulates between flushes, each flush writes out a tiny rowset.
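      To make the problem concrete, here is a minimal sketch assuming a steady insert rate and a fixed flush interval: each time-based flush writes a rowset of roughly rate × interval bytes, and the number of rowsets grows with the number of flushes regardless of how little data each one holds. The 100 bytes/s rate and 120-second interval are illustrative assumptions, not values taken from this issue, and the helper names are hypothetical.

```python
# Illustrative only: assumes a steady insert rate and a fixed
# time-based flush interval (both values are hypothetical).

def rowset_size_bytes(insert_rate_bytes_per_sec: float, flush_interval_sec: float) -> float:
    """Approximate amount of data written by one time-based flush."""
    return insert_rate_bytes_per_sec * flush_interval_sec


def rowsets_per_day(flush_interval_sec: float) -> int:
    """Flushes (and therefore new rowsets) per day at a fixed interval,
    assuming some data arrives in every interval."""
    return int(24 * 60 * 60 // flush_interval_sec)


# A slow trickle of 100 bytes/s, flushed every 120 s.
size = rowset_size_bytes(100, 120)
count = rowsets_per_day(120)
print(f"{size:.0f} bytes per rowset, {count} rowsets per day")
```

      Under these assumptions a single tablet accumulates hundreds of rowsets of about 12 KB each per day, which is exactly the kind of small-rowset buildup that compaction then has to clean up.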

      On the other hand, to crib an example from Todd in the KUDU-1400 design doc:

      if you configure your TS to allow 100G of heap, and insert 30G of data spread across 30 tablets (1G each tablet being lower than the default size-based threshold), would you want it to ever flush to disk? or just sit there in RAM? The restart could be relatively slow if it never flushed, and also scans of MRS are slower than DRS.

      As Todd goes on to say:

      That said, we could probably make the "time-based flush" somehow related to the amount of data, so that we wait a long time to flush if it's only 10kb, but still flush relatively quickly if it's many MB.

      We should tune time-based flushing so that, on average, it waits less time before flushing when there is enough data for one or more "full-sized" DiskRowSets than when there is less data than a single full DiskRowSet.
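      One way the proposal could look is sketched below. It is a hypothetical policy, not Kudu's actual flush or maintenance-manager API: the class `FlushPolicy`, its parameters `min_wait_sec`, `max_wait_sec`, and `target_rowset_bytes`, and the 32 MiB "full-sized" DiskRowSet are all assumptions for illustration. The wait before a time-based flush shrinks linearly from `max_wait_sec` (almost nothing buffered) to `min_wait_sec` (at least one full rowset's worth buffered), so 10 KB sits for a long time while many MB flush quickly.

```python
from dataclasses import dataclass


@dataclass
class FlushPolicy:
    """Hypothetical data-aware time-based flush policy (not Kudu's API).

    min_wait_sec: wait once at least one full DiskRowSet's worth is buffered.
    max_wait_sec: wait when almost nothing is buffered.
    target_rowset_bytes: assumed size of a "full-sized" DiskRowSet.
    """
    min_wait_sec: float
    max_wait_sec: float
    target_rowset_bytes: int

    def wait_sec(self, buffered_bytes: int) -> float:
        # Fraction of a full rowset currently buffered, capped at 1.0.
        fill = min(1.0, buffered_bytes / self.target_rowset_bytes)
        # Shrink the wait linearly from max_wait_sec toward min_wait_sec.
        return self.max_wait_sec - fill * (self.max_wait_sec - self.min_wait_sec)

    def should_flush(self, buffered_bytes: int, elapsed_sec: float) -> bool:
        # Nothing to flush: time alone never triggers a flush.
        if buffered_bytes <= 0:
            return False
        return elapsed_sec >= self.wait_sec(buffered_bytes)


policy = FlushPolicy(min_wait_sec=120, max_wait_sec=3600,
                     target_rowset_bytes=32 * 1024 * 1024)
print(policy.should_flush(10 * 1024, elapsed_sec=600))         # 10 KB: keeps waiting
print(policy.should_flush(64 * 1024 * 1024, elapsed_sec=600))  # two full rowsets: flush
```

      Linear interpolation is only one choice; a step function (long wait below one full rowset, short wait at or above it) would match the last paragraph more literally, while a smooth curve avoids a sudden jump in flush behavior right at the threshold.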




            • Assignee: William Berkeley (wdberkeley)
            • Votes: 1
            • Watchers: 2


              • Created: