Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-9792

Split Kudu scan ranges into smaller chunks for greater paralellelism

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Backend

    Description

      We currently use one thread to scan each tablet, which may underparallelise queries in many cases. Kudu added an API in KUDU-2437 and KUDU-2670 to split tokens at a finer granularity.

      See
      https://github.com/apache/kudu/commit/22a6faa44364dec3a171ec79c15b814ad9277d8f#diff-a4afa9dba99c7612b2cb9176134ff2b0

      The major downside is that the planner has to do an extra RPC to a tserver for each tablet being scanned in order to figure out key range splits. Maybe we can tie this to mt_dop >= 2, or use some heuristics to avoid these RPCs for smaller tables.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            bikramjeet.vig Bikramjeet Vig
            tarmstrong Tim Armstrong

            Dates

              Created:
              Updated:

              Slack

                Issue deployment