Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-8214

Bad plan in load_nested.py

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Infrastructure
    • Labels:
      None
    • Epic Color:
      ghx-label-8

      Description

      The plan for the below SQL, which is executed without stats, has the larger input on the build side of the join and does a broadcast join, which is very suboptimal. This causes high memory consumption when loading larger scale factors, and generally makes the loading process slower than necessary. We should flip the join and make it a shuffle join.

      https://github.com/apache/impala/blob/d481cd4/testdata/bin/load_nested.py#L123

      
            tmp_customer_sql = r"""
                SELECT
                  c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment,
                  c_comment,
                  GROUP_CONCAT(
                    CONCAT(
                      CAST(o_orderkey AS STRING), '\003',
                      CAST(o_orderstatus AS STRING), '\003',
                      CAST(o_totalprice AS STRING), '\003',
                      CAST(o_orderdate AS STRING), '\003',
                      CAST(o_orderpriority AS STRING), '\003',
                      CAST(o_clerk AS STRING), '\003',
                      CAST(o_shippriority AS STRING), '\003',
                      CAST(o_comment AS STRING), '\003',
                      CAST(lineitems_string AS STRING)
                    ), '\002'
                  ) orders_string
                FROM {source_db}.customer
                LEFT JOIN tmp_orders_string ON c_custkey = o_custkey
                WHERE c_custkey % {chunks} = {chunk_idx}
                GROUP BY 1, 2, 3, 4, 5, 6, 7, 8""".format(**sql_params)
      

        Attachments

          Activity

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              tarmstrong Tim Armstrong
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: