Kudu / KUDU-2693

Buffer DiskRowSet flushes to more efficiently write many columns

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: fs, tablet
    • Labels:
      None

      Description

      While looking at a trace of some MemRowSet (MRS) flushes on a table with 280 columns, we observed about 695 fdatasync() calls during the course of the flush.

      One possible way to reduce the number of fsync calls would be to flush to memory buffers first, determine the ideal on-disk layout for the flushed blocks (possibly striped across one log block container per data disk), and then write the data out to the containers, potentially in parallel. This would require reserving some memory buffer space per maintenance manager thread, possibly 64MB, since the DiskRowSet (DRS) roll size is 32MB.
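
      The idea can be illustrated with a minimal sketch. This is not Kudu code and does not use the LogBlockManager API; the function names, the round-robin placement policy, and the use of one plain file per "container" are all assumptions made for illustration. It only shows the shape of the proposal: buffer every block in memory, decide the layout, then write each container in parallel and sync it once, so the number of syncs tracks the number of containers rather than the number of columns.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# fdatasync() is not available on every platform; fall back to fsync().
_sync = getattr(os, "fdatasync", os.fsync)


def plan_layout(blocks, containers):
    """Assign in-memory blocks to containers.

    Round-robin striping is a hypothetical policy chosen for the sketch.
    """
    layout = {path: [] for path in containers}
    for i, block in enumerate(blocks):
        layout[containers[i % len(containers)]].append(block)
    return layout


def write_container(path, blocks):
    """Append all blocks destined for one container, then sync once.

    Returns the number of sync calls issued (always 1 here).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for block in blocks:
            view = memoryview(block)
            while len(view) > 0:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
        _sync(fd)
    finally:
        os.close(fd)
    return 1


def buffered_flush(blocks, containers):
    """Write already-buffered blocks to containers in parallel.

    Returns the total number of sync calls issued.
    """
    layout = plan_layout(blocks, containers)
    with ThreadPoolExecutor(max_workers=len(containers)) as pool:
        futures = [
            pool.submit(write_container, path, assigned)
            for path, assigned in layout.items()
            if assigned
        ]
        return sum(f.result() for f in futures)
```

      In this sketch, flushing 280 buffered blocks across three container files issues three sync calls. Whether a real implementation could get that low depends on details the sketch ignores, such as block metadata and container durability requirements.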

      According to Todd, we could probably do all of this inside LogBlockManager, for example by adding a new flag to CreateBlockOptions that indicates whether to buffer writes.

            People

            • Assignee:
              tlipcon Todd Lipcon
              Reporter:
              mpercy Mike Percy