IMPALA-7171

Add docs for Kudu insert partitioning/sorting


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 2.13.0, Impala 3.1.0
    • Component/s: Docs

    Description

      On the page http://impala.apache.org/docs/build3x/html/topics/impala_kudu.html, at the end of the section "Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE, UPSERT)", we should add text like:

      Starting with Impala 2.9, Impala automatically adds a partition step and a sort step to INSERTs before sending the rows to Kudu. Because Kudu itself partitions and sorts rows on write, pre-partitioning and pre-sorting takes some of the load off Kudu and helps ensure that large INSERTs complete without timing out, but it may slow down the end-to-end performance of the INSERT. Starting with Impala 2.10, the hints "/* +noshuffle,noclustered */" may be used to turn off this pre-partitioning and sorting. Additionally, because sorting may consume a lot of memory, users should consider setting the "mem_limit" query option for these queries.
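
      A minimal sketch of how the two options in the proposed text might look in an impala-shell session. The table names (kudu_events, staging_events) and the 2g limit are hypothetical placeholders chosen for illustration, not values taken from this issue:

      ```sql
      -- Cap memory for the session, since the sort step added before
      -- a Kudu INSERT may consume a lot of memory. 2g is an arbitrary
      -- example value.
      SET MEM_LIMIT=2g;

      -- Default behavior (Impala 2.9 and later): rows are partitioned
      -- and sorted before being sent to Kudu.
      INSERT INTO kudu_events SELECT * FROM staging_events;

      -- Impala 2.10 and later: the hints turn off the pre-partitioning
      -- and pre-sorting steps.
      INSERT INTO kudu_events /* +noshuffle,noclustered */
      SELECT * FROM staging_events;
      ```

      Whether the hinted form is faster depends on the workload: it skips the extra work in Impala, but leaves all of the partitioning and sorting to Kudu.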

          People

            Assignee: Alexandra Rodoni (arodoni)
            Reporter: Thomas Tauber-Marshall (twmarshall)
            Votes: 0
            Watchers: 3
