Kudu / KUDU-2844

Avoid copying strings from dictionary or plain-encoded blocks


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: cfile, perf
    • Labels: None

    Description

      When scanning a plain or dictionary-encoded binary column, we currently loop over each entry and copy the string into the destination RowBlock's arena. In TPC-H Q1, the scanner threads spend a significant percentage of CPU time on this copying. The copies also increase the CPU cache footprint, which likely degrades performance in downstream operations such as predicate evaluation, merging, and result serialization.
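To make the cost concrete, here is a minimal sketch of the copy-per-cell pattern described above. It is not Kudu code: `ToyArena` and `ToyDictBlock` are hypothetical stand-ins for Kudu's Arena and a decoded dictionary block, `std::string_view` stands in for Kudu's `Slice`, and the arena does one heap allocation per copy only to keep the sketch short. The point is that every row pays a `memcpy` of its string, even when many rows share the same dictionary entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an arena: owns copies of cell data.
// (A real arena bump-allocates from large chunks; one allocation per
// copy is used here only for brevity.)
class ToyArena {
 public:
  // Copies `src` into arena-owned memory and returns a view of the copy.
  std::string_view Copy(std::string_view src) {
    auto buf = std::make_unique<char[]>(src.size());
    if (!src.empty()) std::memcpy(buf.get(), src.data(), src.size());
    const char* data = buf.get();
    chunks_.push_back(std::move(buf));
    bytes_copied_ += src.size();
    return std::string_view(data, src.size());
  }
  std::size_t bytes_copied() const { return bytes_copied_; }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
  std::size_t bytes_copied_ = 0;
};

// Hypothetical decoded dictionary block: distinct values plus one code per row.
struct ToyDictBlock {
  std::vector<std::string> dict;
  std::vector<uint32_t> codes;
};

// Copy-based decode: each row's string is copied into the arena,
// so repeated values are copied again for every row that uses them.
std::vector<std::string_view> DecodeByCopy(const ToyDictBlock& block,
                                           ToyArena* arena) {
  std::vector<std::string_view> cells;
  cells.reserve(block.codes.size());
  for (uint32_t code : block.codes) {
    assert(code < block.dict.size());
    cells.push_back(arena->Copy(block.dict[code]));
  }
  return cells;
}
```

With a two-entry dictionary and five rows, the arena ends up holding five separate copies totalling 8 bytes, although only 3 distinct bytes exist in the dictionary.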

      Instead, we could "attach" the dictionary block (with ref-counting) to the RowBlock and refer directly to the dictionary entries from the RowBlock. When the RowBlock is eventually reset, we can drop the reference. This should be safe because we never mutate indirect data in place.
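The proposed alternative can be sketched as follows, again under stated assumptions rather than as Kudu's actual implementation. `ToyDictBlock` and `ToyRowBlock` are hypothetical types, `std::shared_ptr` plays the role of the ref-count, and `std::string_view` stands in for `Slice`. Cells point straight at dictionary entries, the RowBlock holds a reference that keeps the block alive, and `Reset()` drops both. As the description notes, this relies on the referenced data never being mutated in place.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical immutable decoded dictionary block, shared via ref-counting.
struct ToyDictBlock {
  std::vector<std::string> dict;  // distinct values; never mutated after decode
  std::vector<uint32_t> codes;    // one dictionary code per row
};

// Hypothetical stand-in for a RowBlock holding one binary column.
class ToyRowBlock {
 public:
  // Keeps `block` alive for as long as this RowBlock references its data.
  void Attach(std::shared_ptr<const ToyDictBlock> block) {
    attached_.push_back(std::move(block));
  }
  std::vector<std::string_view>& cells() { return cells_; }

  // Drops the cells first, then the references that kept their memory alive.
  void Reset() {
    cells_.clear();
    attached_.clear();
  }

 private:
  std::vector<std::string_view> cells_;
  std::vector<std::shared_ptr<const ToyDictBlock>> attached_;
};

// Zero-copy decode: no string bytes are copied; each cell is a view
// into the shared dictionary block.
void DecodeByReference(const std::shared_ptr<const ToyDictBlock>& block,
                       ToyRowBlock* rb) {
  rb->Attach(block);
  for (uint32_t code : block->codes) {
    assert(code < block->dict.size());
    rb->cells().emplace_back(block->dict[code]);
  }
}
```

The per-row work drops to writing one pointer/length pair, and the reference count guarantees that the dictionary outlives every RowBlock that points into it.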


People

    • Assignee: Todd Lipcon (tlipcon)
    • Reporter: Todd Lipcon (tlipcon)
    • Votes: 0
    • Watchers: 5
