
Add query option to control join strategy when tables have no stats

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 2.9.0
    • Component/s: Frontend
    • Labels:
      None

      Description

      IMPALA-5120 changed the join strategy from broadcast to shuffle when tables have no stats. Users may have come to rely on the previous broadcast behavior. Adding a query option to specify the behavior lowers that risk, because it allows them to revert to the previous behavior.

      Query option proposal:

      default_join_distribution_mode = [ broadcast | shuffle ] 
      

      Ideally, the default would be shuffle, but in the spirit of preserving existing behavior it will stay broadcast. We should re-evaluate this choice in a compatibility-breaking release.

          Activity

          Alexander Behm (alex.behm) added a comment:

          commit ecda49f3e3001e23bebd6bdfaa1c612716df4bf1
          Author: Alex Behm <alex.behm@cloudera.com>
          Date: Thu Jun 1 18:39:43 2017 -0700

          IMPALA-5381: Adds DEFAULT_JOIN_DISTRIBUTION_MODE query option.

          Adds a new query option DEFAULT_JOIN_DISTRIBUTION_MODE to
          control which join distribution mode is chosen when the join
          inputs have an unknown cardinality (e.g., missing stats) or when
          the expected costs of the different strategies are equal.

          Values for DEFAULT_JOIN_DISTRIBUTION_MODE: [BROADCAST, SHUFFLE]
          Default: BROADCAST

          Note that this change effectively undoes IMPALA-5120.

          Testing:

          • Added new planner tests
          • Core/hdfs run passed

          Change-Id: Ibd34442f422129d53bef5493fc9cbe7375a0765c
          Reviewed-on: http://gerrit.cloudera.org:8080/7059
          Reviewed-by: Alex Behm <alex.behm@cloudera.com>
          Tested-by: Impala Public Jenkins
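
The fallback rule described in the commit message can be sketched as follows. This is an illustration only, not Impala's planner code: the function name, the use of None for an unknown cost, and the single-number cost comparison are all assumptions made for the example. It shows only that the option value is consulted when a cost is unknown or when both strategies cost the same.

```python
from enum import Enum
from typing import Optional


class DistributionMode(Enum):
    BROADCAST = "BROADCAST"
    SHUFFLE = "SHUFFLE"


def choose_join_distribution(
    broadcast_cost: Optional[float],
    shuffle_cost: Optional[float],
    default_mode: DistributionMode = DistributionMode.BROADCAST,
) -> DistributionMode:
    """Hypothetical helper: pick a join distribution mode.

    None models an unknown cost, e.g. a join input with missing stats.
    default_mode stands in for DEFAULT_JOIN_DISTRIBUTION_MODE.
    """
    # Unknown cardinality: no meaningful comparison, use the option value.
    if broadcast_cost is None or shuffle_cost is None:
        return default_mode
    # Equal expected costs: the option value breaks the tie.
    if broadcast_cost == shuffle_cost:
        return default_mode
    # Otherwise the cheaper strategy wins regardless of the option.
    if broadcast_cost < shuffle_cost:
        return DistributionMode.BROADCAST
    return DistributionMode.SHUFFLE
```

In this sketch the option has no effect once both costs are known and differ. In Impala itself, the option would be changed per session with a statement such as SET DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE; before running the query.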


            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              grahn Greg Rahn