Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 2.2
None
Description
The partition patch (http://github.mtv.cloudera.com/CDH/Impala/commit/b8528bc64a21716b15fb6d0fbca888d2915d7b42) increases the minimum memory required for TPCH-Q20 from ~125m to ~3100m.
Error in Impala shell with 1000m limit:
Query: select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
  from customer, orders, lineitem
  where o_orderkey in (
      select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
  group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
  order by o_totalprice desc, o_orderdate
  limit 100

WARNINGS: Memory limit exceeded
Cannot perform aggregation at hash aggregation node with id 4. The input data was partitioned the maximum number of 4 times. This could mean there is significant skew in the data or the memory limit is set too low.

Backend 6:Memory Limit Exceeded
Query(65439766566cd9ba:ccc44e5991539184) Limit: Limit=1000.00 MB Consumption=918.83 MB
  Fragment 65439766566cd9ba:ccc44e5991539185: Consumption=8.00 KB
    EXCHANGE_NODE (id=17): Consumption=0
      DataStreamRecvr: Consumption=0
  Block Manager: Limit=800.00 MB Consumption=796.57 MB
  Fragment 65439766566cd9ba:ccc44e5991539186: Consumption=3.14 MB
    SORT_NODE (id=9): Consumption=4.00 KB
    AGGREGATION_NODE (id=16): Consumption=3.12 MB
    EXCHANGE_NODE (id=15): Consumption=0
      DataStreamRecvr: Consumption=0
    DataStreamSender: Consumption=4.00 KB
  Fragment 65439766566cd9ba:ccc44e5991539189: Consumption=326.57 MB
    AGGREGATION_NODE (id=8): Consumption=3.12 MB
    HASH_JOIN_NODE (id=7): Consumption=24.00 KB
    HASH_JOIN_NODE (id=6): Consumption=22.02 MB
    HASH_JOIN_NODE (id=5): Consumption=266.06 MB
    EXCHANGE_NODE (id=10): Consumption=0
      DataStreamRecvr: Consumption=28.95 MB
    EXCHANGE_NODE (id=11): Consumption=0
      DataStreamRecvr: Consumption=4.00 KB
    EXCHANGE_NODE (id=12): Consumption=0
      DataStreamRecvr: Consumption=0
    AGGREGATION_NODE (id=14): Consumption=6.37 MB
    EXCHANGE_NODE (id=13): Consumption=0
      DataStreamRecvr: Consumption=4.00 KB
    DataStreamSender: Consumption=12.00 KB
  Fragment 65439766566cd9ba:ccc44e599153918c: Consumption=8.82 MB
    AGGREGATION_NODE (id=4): Consumption=8.75 MB
    HDFS_SCAN_NODE (id=3): Consumption=0
    DataStreamSender: Consumption=68.00 KB
  Fragment 65439766566cd9ba:ccc44e5991539192: Consumption=44.29 MB
    HDFS_SCAN_NODE (id=2): Consumption=44.18 MB
    DataStreamSender: Consumption=69.27 KB
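To see at a glance which trackers dominate a dump like the one in this error, the lines can be parsed and ranked by consumption. The following is a minimal sketch, not part of Impala: the helper name `parse_consumption`, the regular expression, and the assumption that KB/MB/GB are binary units (1 MB = 1024^2 bytes) are all illustrative. The sample lines are copied from the error output in this report.

```python
import re

# A few tracker lines copied from the error output in this report.
DUMP = """\
Block Manager: Limit=800.00 MB Consumption=796.57 MB
HASH_JOIN_NODE (id=5): Consumption=266.06 MB
HASH_JOIN_NODE (id=6): Consumption=22.02 MB
AGGREGATION_NODE (id=4): Consumption=8.75 MB
HDFS_SCAN_NODE (id=2): Consumption=44.18 MB
DataStreamSender: Consumption=69.27 KB
EXCHANGE_NODE (id=17): Consumption=0
"""

# Assumption: units in the dump are binary (1 KB = 1024 bytes).
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# Tracker name, then ": ", then optional fields, then "Consumption=<number> [unit]".
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?):\s.*?Consumption=(?P<value>[\d.]+)"
    r"(?:\s*(?P<unit>[KMG]?B))?\s*$"
)


def parse_consumption(dump):
    """Return (tracker name, bytes) pairs sorted by consumption, largest first."""
    rows = []
    for line in dump.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not tracker entries
        unit = match.group("unit") or "B"  # a bare "0" carries no unit
        size = float(match.group("value")) * UNIT_BYTES[unit]
        rows.append((match.group("name"), size))
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = parse_consumption(DUMP)
for name, size in ranked[:3]:
    print(f"{name}: {size / 1024**2:.2f} MB")
```

On the sample lines, the Block Manager ranks first and the hash join with id 5 second, which matches the figures in the dump: the Block Manager is at 796.57 MB of its 800.00 MB limit.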
The memory required for TPCH-Q18 also increased, from ~800m to ~3400m. The full report is available here: http://sandbox.jenkins.cloudera.com/job/Low-Memory-Comparison/5/artifact/result.txt (env1 is before the patch, env2 is after the patch)
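For scale, the two regressions reported in this ticket can be expressed as growth factors. This is a small illustrative calculation using only the approximate figures quoted above; the dictionary and function names are hypothetical.

```python
# Approximate minimum memory in MB (before patch, after patch), as quoted in this ticket.
REPORTED_MB = {
    "TPCH-Q20": (125, 3100),
    "TPCH-Q18": (800, 3400),
}


def growth_factor(before_mb, after_mb):
    """Return how many times larger the requirement is after the patch."""
    return after_mb / before_mb


for query, (before_mb, after_mb) in REPORTED_MB.items():
    print(f"{query}: {before_mb}m -> {after_mb}m ({growth_factor(before_mb, after_mb):.2f}x)")
```

By these figures, the requirement grows roughly 25x for TPCH-Q20 and a little over 4x for TPCH-Q18.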