  1. IMPALA
  2. IMPALA-5004 Switch to sorting node for large TopN queries
  3. IMPALA-7836

Impala 3.1 Doc: New query option 'topn_bytes_limit' for TopN to Sort conversion


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Docs, Frontend
    • Labels: None

    Description

      IMPALA-5004 adds a new query-level option, 'topn_bytes_limit', that we should document. The change in IMPALA-5004 works by estimating the amount of memory required to run a TopN operator. The estimate is based on the size of the individual tuples that the TopN operator must process and on the sum of the limit and offset in the query. TopN operators do not spill to disk, so they have to keep their entire working set in memory.
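
      As a rough illustration of the estimate described above, the following minimal sketch multiplies a per-tuple size by the number of rows the operator must retain (limit plus offset). The function name and parameters are hypothetical and are not taken from the Impala code base; the planner's actual formula may include additional terms.

```python
def estimate_topn_bytes(tuple_bytes: int, limit: int, offset: int = 0) -> int:
    """Hypothetical working-set estimate for a TopN operator.

    Assumption: the estimate is simply (limit + offset) rows times the
    size of one tuple. The real planner may compute this differently.
    """
    if tuple_bytes < 0 or limit < 0 or offset < 0:
        raise ValueError("tuple_bytes, limit and offset must be non-negative")
    return tuple_bytes * (limit + offset)


# Example: 64-byte tuples, LIMIT 1000000 OFFSET 500000.
example_estimate = estimate_topn_bytes(64, 1_000_000, 500_000)
print(example_estimate)  # 96000000
```

      Under this assumption, a large offset raises the estimate just as much as a large limit does, because both add to the number of rows that must be held.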

      If the estimated size of the TopN operator's working set exceeds the 'topn_bytes_limit' threshold, the TopN operator is replaced with a Sort operator. The Sort operator can spill to disk, but it processes all of the data (the limit and offset have no effect on how much data it sorts). Switching to Sort might therefore incur a performance penalty, but it requires less memory.
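
      The replacement rule described above can be sketched as a single comparison. This is a minimal illustration, not the planner's code: the function name is hypothetical, the strict "greater than" comparison is an assumption based on the word "exceeds", and any special values that might disable the conversion are not covered here.

```python
def choose_operator(estimated_bytes: int, topn_bytes_limit: int) -> str:
    """Hypothetical planner decision between TopN and Sort.

    Assumption: Sort is chosen only when the estimate strictly exceeds
    the threshold; otherwise the in-memory TopN operator is kept.
    """
    if estimated_bytes > topn_bytes_limit:
        return "SORT"  # can spill to disk, but sorts all input rows
    return "TOPN"      # stays in memory, retains only limit + offset rows


# Illustrative threshold of 512 MiB, chosen only for this example.
threshold = 512 * 1024 * 1024
print(choose_operator(96_000_000, threshold))     # TOPN
print(choose_operator(2_000_000_000, threshold))  # SORT
```

      The trade-off is the one the description names: the Sort path bounds memory use by spilling, at the cost of sorting every input row rather than only the rows the query returns.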

          People

            arodoni Alexandra Rodoni
            stakiar Sahil Takiar