Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-12120

Set appropriate output writer parallelism when using new processing cost planner

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • Impala 4.3.0
    • Frontend
    • None
    • ghx-label-2

    Description

      The new processing cost based planner changes (IMPALA-11604, IMPALA-12091) will impact output writer parallelism for insert queries, with the potential for more small files if the processing cost based planning results in too many writer fragments.  This could further exacerbate a problem that was introduced with mt_dop (see IMPALA-8125). 

      There are 2 cases to consider:

      1. Unpartitioned inserts where the output writer is in the same fragment as the scan.  In this case the output parallelism will be determined by the scan parallelism which may increase (vs mt_dop) with the changes in IMPALA-12091.
      2. Partitioned inserts where the output writer fragment typically consists of a sort followed by the writer, and the parallelism under IMPALA-11604 is driven by the estimated sort cost.  Again we have the potential to overparallelize resulting in too many small files.

      The MAX_FS_WRITERS query option (IMPALA-8125) can help mitigate this but we should have better default behavior even when MAX_FS_WRITERS isn't set.  The default output writer parallelism with no query options set should avoid creating excessive writer parallelism for both partitioned and unpartitioned inserts.  We could also consider always including an exchange (even in the unpartitioned case) to decouple the writer from the scan parallelism.

      Attachments

        Issue Links

          Activity

            People

              rizaon Riza Suminto
              drorke David Rorke
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: