IMPALA-1820

TPCH-Q20 memory requirement drastically increased after the partition patch


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2
    • Fix Version/s: Impala 2.2
    • Component/s: None

    Description

      The partition patch (http://github.mtv.cloudera.com/CDH/Impala/commit/b8528bc64a21716b15fb6d0fbca888d2915d7b42) increases the minimum memory required for TPCH-Q20 from ~125 MB to ~3100 MB.
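      The failure can be reproduced by lowering the per-query memory limit before running the query. Below is a minimal reproduction sketch that drives impala-shell from Python; the impalad host, database name, and query file are placeholders, not values taken from this report:

      #!/usr/bin/env python
      """Reproduction sketch: run the query pasted below under several
      per-query memory limits via impala-shell.  Host, database name, and
      query file are assumptions; adjust them for your cluster."""
      import subprocess
      import tempfile

      # Hypothetical file holding the TPCH-Q20 text from this report.
      QUERY = open("tpch-q20.sql").read().rstrip().rstrip(";")

      def run_with_mem_limit(mem_limit):
          """Execute the query with SET MEM_LIMIT applied first."""
          with tempfile.NamedTemporaryFile(mode="w", suffix=".sql", delete=False) as f:
              f.write("SET MEM_LIMIT=%s;\n%s;\n" % (mem_limit, QUERY))
              script = f.name
          # -i: impalad to connect to, -d: database holding the TPC-H tables.
          return subprocess.call(["impala-shell", "-i", "localhost", "-d", "tpch", "-f", script])

      if __name__ == "__main__":
          # ~125m was enough before the partition patch; ~3100m is needed after it.
          for limit in ("3100m", "1000m", "125m"):
              status = "succeeded" if run_with_mem_limit(limit) == 0 else "failed"
              print("%s -> %s" % (limit, status))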

      Error in Impala shell with a 1000 MB memory limit:

      Query: select c_name,
      c_custkey,
      o_orderkey,
      o_orderdate,
      o_totalprice,
      sum(l_quantity)
      from
      customer,
      orders,
      lineitem
      where
      o_orderkey in (
      select
      l_orderkey
      from
      lineitem
      group by
      l_orderkey
      having
      sum(l_quantity) > 300
      )
      and c_custkey = o_custkey
      and o_orderkey = l_orderkey
      group by
      c_name,
      c_custkey,
      o_orderkey,
      o_orderdate,
      o_totalprice
      order by
      o_totalprice desc,
      o_orderdate
      limit 100
      
      WARNINGS: Memory limit exceeded
      Cannot perform aggregation at hash aggregation node with id 4. The input data was partitioned the maximum number of 4 times. This could mean there is significant skew in the data or the memory limit is set too low.
      
      Backend 6:Memory Limit Exceeded
      Query(65439766566cd9ba:ccc44e5991539184) Limit: Limit=1000.00 MB Consumption=918.83 MB
        Fragment 65439766566cd9ba:ccc44e5991539185: Consumption=8.00 KB
          EXCHANGE_NODE (id=17): Consumption=0
          DataStreamRecvr: Consumption=0
        Block Manager: Limit=800.00 MB Consumption=796.57 MB
        Fragment 65439766566cd9ba:ccc44e5991539186: Consumption=3.14 MB
          SORT_NODE (id=9): Consumption=4.00 KB
          AGGREGATION_NODE (id=16): Consumption=3.12 MB
          EXCHANGE_NODE (id=15): Consumption=0
          DataStreamRecvr: Consumption=0
          DataStreamSender: Consumption=4.00 KB
        Fragment 65439766566cd9ba:ccc44e5991539189: Consumption=326.57 MB
          AGGREGATION_NODE (id=8): Consumption=3.12 MB
          HASH_JOIN_NODE (id=7): Consumption=24.00 KB
          HASH_JOIN_NODE (id=6): Consumption=22.02 MB
          HASH_JOIN_NODE (id=5): Consumption=266.06 MB
          EXCHANGE_NODE (id=10): Consumption=0
          DataStreamRecvr: Consumption=28.95 MB
          EXCHANGE_NODE (id=11): Consumption=0
          DataStreamRecvr: Consumption=4.00 KB
          EXCHANGE_NODE (id=12): Consumption=0
          DataStreamRecvr: Consumption=0
          AGGREGATION_NODE (id=14): Consumption=6.37 MB
          EXCHANGE_NODE (id=13): Consumption=0
          DataStreamRecvr: Consumption=4.00 KB
          DataStreamSender: Consumption=12.00 KB
        Fragment 65439766566cd9ba:ccc44e599153918c: Consumption=8.82 MB
          AGGREGATION_NODE (id=4): Consumption=8.75 MB
          HDFS_SCAN_NODE (id=3): Consumption=0
          DataStreamSender: Consumption=68.00 KB
        Fragment 65439766566cd9ba:ccc44e5991539192: Consumption=44.29 MB
          HDFS_SCAN_NODE (id=2): Consumption=44.18 MB
          DataStreamSender: Consumption=69.27 KB
      
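      For context on the "partitioned the maximum number of 4 times" message: a partitioned hash aggregation that runs out of memory spills a partition and re-partitions it on further hash bits, giving up after a fixed number of levels. The sketch below illustrates that general idea only; it is not Impala's implementation, and the fanout value is an arbitrary assumption (only the 4-level cap mirrors the message above).

      """Generic sketch of spill-and-repartition in a partitioned hash
      aggregation; NOT Impala's code."""

      MAX_PARTITION_DEPTH = 4   # the cap mentioned in the error message
      FANOUT = 16               # partitions per repartitioning step (assumption)

      class MemLimitExceeded(Exception):
          pass

      def aggregate(rows, budget, depth=0):
          """Sum values per key while keeping at most `budget` groups in memory;
          repartition on further hash bits whenever that budget is exceeded."""
          groups = {}
          for key, value in rows:
              groups[key] = groups.get(key, 0) + value
              if len(groups) > budget:
                  break                      # does not fit: spill and repartition
          else:
              return groups                  # everything fit in memory

          if depth == MAX_PARTITION_DEPTH:
              raise MemLimitExceeded(
                  "input partitioned the maximum number of %d times" % MAX_PARTITION_DEPTH)

          # Split the input into FANOUT partitions using hash bits not consumed
          # by earlier levels, then aggregate each partition independently.
          partitions = [[] for _ in range(FANOUT)]
          for key, value in rows:
              partitions[(hash(key) >> (4 * depth)) % FANOUT].append((key, value))

          result = {}
          for part in partitions:
              result.update(aggregate(part, budget, depth + 1))
          return result

      if __name__ == "__main__":
          rows = [(k % 1000, 1) for k in range(100000)]
          print(len(aggregate(rows, budget=64)))   # 1000 groups, built via one repartition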

      Also, the memory required for TPCH-Q18 increased from ~800 MB to ~3400 MB. The full report can be seen here: http://sandbox.jenkins.cloudera.com/job/Low-Memory-Comparison/5/artifact/result.txt (env1 is before the patch, env2 is after the patch).
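      Such per-query minimum figures can be measured by searching for the smallest MEM_LIMIT at which the query still succeeds. The sketch below shows one hypothetical way to do that with a binary search, assuming success is monotone in the limit; this is an assumption about methodology, not necessarily how the Low-Memory-Comparison job computes its numbers.

      def min_mem_mb(succeeds, low_mb=16, high_mb=8192):
          """Return the smallest limit (in MB) for which `succeeds(limit_mb)`,
          a callable that runs the query under that limit, returns True."""
          if not succeeds(high_mb):
              raise ValueError("query fails even with %d MB" % high_mb)
          while low_mb < high_mb:
              mid = (low_mb + high_mb) // 2
              if succeeds(mid):
                  high_mb = mid        # mid is enough; try lower
              else:
                  low_mb = mid + 1     # mid is too small
          return low_mb

      if __name__ == "__main__":
          # Stand-in for actually running the query: pretend it needs ~3100 MB.
          print(min_mem_mb(lambda mb: mb >= 3100))   # -> 3100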

          People

            Assignee: Ippokratis Pandis (ippokratis)
            Reporter: Taras Bobrovytsky (tarasbob)