Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1968

Bad plan choices due to incorrect number of estimated hosts.

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 1.4, Impala 1.4.1, Impala 2.0, Impala 2.0.1, Impala 2.1, Impala 2.1.1, Impala 2.2
    • Impala 2.3.0
    • None

    Description

      The number of hosts a plan node is estimated to run on during plan generation and the actual number of hosts the query will run on could be different in the following scenarios:
      1. Queries accessing tables on non-HDFS remote storage, e.g., S3 or Isilon
      2. Queries with aggressive partition pruning
      3. Large clusters relative to the size of the tables scanned

      This discrepancy between the number of estimated hosts and the actual number of hosts can lead to suboptimal plan choices, in particular, a bad join strategy (broadcast vs. partitioned).

      To determine whether a query is running slow due such a bad planning decision, you can examine the "hosts" field in the query plan from the profile, and contrast it with the actual hosts from the execution summary.

      Attachments

        Activity

          People

            dhecht Daniel Hecht
            alex.behm Alexander Behm
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: