[IMPALA-2535] PAGG fails to acquire buffers despite sufficient memory limit - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: Impala 2.3.0
Fix Version/s: Impala 2.5.0, Impala 2.3.2
Component/s: None
Labels:
- resource-management

Target Version: Impala 2.5.0, Impala 2.3.2

Description

I hit a "Memory limit exceeded" error running TPC-H Q18 on the 100-scale text dataset with an 8000mb memory limit.

The same query completed with a much lower memory limit of 1800mb.

An initial look suggests that the join nodes temporarily reserved most or all of the blocks. The conditions needed for this are probably that other nodes have partitions large enough to consume almost all of the available blocks with only a single pinned partition per node. If these partitions are pinned before another node (in this query, the aggregation with id=14) gets its initial reservation, the observed behaviour could result.
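
To make the suspected ordering problem concrete, here is a toy Python model. It is a sketch, not Impala's actual block manager code: it assumes a fixed pool of equally sized blocks and operators that each need a minimum number of blocks before they can run. The operator names echo the plan node ids in the dump below, but the block counts are made-up illustrative numbers, and `simulate` and its parameters are hypothetical.

```python
from typing import List, Tuple

# (operator name, minimum blocks it needs to run, blocks it would pin if free).
# Hypothetical operators and numbers, for illustration only.
Operator = Tuple[str, int, int]


def simulate(pool_blocks: int, operators: List[Operator],
             reserve_minimums_first: bool) -> List[str]:
    """Return the names of operators that could not get their minimum blocks.

    If reserve_minimums_first is False, operators run in list order and each
    one greedily pins as many blocks as it wants (up to what is free) before
    the next operator asks for anything. If True, every operator first takes
    its minimum, and only then do operators grow into the remaining blocks.
    Growth is always capped at the free blocks, as if the operator spilled
    rather than pinning more.
    """
    free = pool_blocks
    failed: List[str] = []

    if not reserve_minimums_first:
        for name, min_blocks, wanted in operators:
            if min_blocks > free:
                failed.append(name)
                continue
            # min_blocks <= free and wanted >= min_blocks, so this grant
            # always covers the operator's minimum.
            free -= min(wanted, free)
        return failed

    # Phase 1: initial (minimum) reservations for every operator.
    for name, min_blocks, _ in operators:
        if min_blocks > free:
            failed.append(name)
        else:
            free -= min_blocks
    # Phase 2: grow beyond the minimum with whatever is left.
    for name, min_blocks, wanted in operators:
        if name in failed:
            continue
        free -= min(wanted - min_blocks, free)
    return failed


if __name__ == "__main__":
    ops = [("join_5", 2, 500), ("join_6", 2, 290), ("agg_14", 16, 16)]
    print(simulate(800, ops, reserve_minimums_first=False))   # ['agg_14']
    print(simulate(800, ops, reserve_minimums_first=True))    # []
    print(simulate(1000, ops, reserve_minimums_first=False))  # []
```

In the greedy ordering the two joins leave only 10 of 800 blocks, so the aggregation cannot get its 16. Reserving every operator's minimum first, or (in this toy model) a somewhat larger pool, avoids the failure. The model only illustrates the ordering sensitivity; it says nothing about how the fix in Impala 2.3.2 / 2.5.0 was implemented.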

use tpch100;
set mem_limit=8000mb;
select
  c_name,
  c_custkey,
  o_orderkey,
  o_orderdate,
  o_totalprice,
  sum(l_quantity)
from
  customer,
  orders,
  lineitem
where
  o_orderkey in (
    select
      l_orderkey
    from
      lineitem
    group by
      l_orderkey
    having
      sum(l_quantity) > 300
  )
  and c_custkey = o_custkey
  and o_orderkey = l_orderkey
group by
  c_name,
  c_custkey,
  o_orderkey,
  o_orderdate,
  o_totalprice
order by
  o_totalprice desc,
  o_orderdate
limit 100;

The query fails with:

Not enough memory to get the minimum required buffers for aggregation with id=14.

I1011 13:36:17.719748 31287 plan-fragment-executor.cc:303] Open(): instance_id=4a434673989dd50b:78d43f783784dcb2
I1011 13:41:20.710088 31190 status.cc:45] Memory limit exceeded
    @     0x7efd80e386eb  impala::Status::Status()
    @     0x7efd80e38405  impala::Status::MemLimitExceeded()
    @     0x7efd80a9f110  impala::PartitionedAggregationNode::Partition::Spill()
    @     0x7efd80aa209d  impala::PartitionedAggregationNode::SpillPartition()
    @     0x7efcf78f3ab6  (unknown)
I1011 13:41:22.637529 31190 data-stream-mgr.cc:128] DeregisterRecvr(): fragment_instance_id=4a434673989dd50b:78d43f783784dca6, node=13
I1011 13:41:22.637567 31190 data-stream-recvr.cc:233] cancelled stream: fragment_instance_id_=4a434673989dd50b:78d43f783784dca6 node_id=13
I1011 13:41:22.703086 31139 status.cc:45] Memory limit exceeded
    @     0x7efd80e386eb  impala::Status::Status()
    @     0x7efd80e38405  impala::Status::MemLimitExceeded()
    @     0x7efd7ef87939  impala::RuntimeState::SetMemLimitExceeded()
    @     0x7efd7ef661f3  impala::PlanFragmentExecutor::UpdateStatus()
    @     0x7efd7ef64193  impala::PlanFragmentExecutor::Open()
    @     0x7efd7e63bcc9  impala::FragmentMgr::FragmentExecState::Exec()
    @     0x7efd7e660725  impala::FragmentMgr::FragmentExecThread()
    @     0x7efd7e669102  boost::_mfi::mf1<>::operator()()
    @     0x7efd7e668e73  boost::_bi::list2<>::operator()<>()
    @     0x7efd7e668514  boost::_bi::bind_t<>::operator()()
    @     0x7efd7e667a11  boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7efd7ef506c7  boost::function0<>::operator()()
    @     0x7efd7d6a53c9  impala::Thread::SuperviseThread()
    @     0x7efd7d6aec97  boost::_bi::list4<>::operator()<>()
    @     0x7efd7d6aebb8  boost::_bi::bind_t<>::operator()()
    @     0x7efd7d6aeb6c  boost::detail::thread_data<>::run()
    @     0x7efd7cab609a  (unknown)
    @     0x7efd7bf586aa  start_thread
    @     0x7efd7a11eeed  (unknown)

I1011 13:41:22.703302 31139 runtime-state.cc:229] Error from query 4a434673989dd50b:78d43f783784dca1: Memory Limit Exceeded
Query(4a434673989dd50b:78d43f783784dca1) Limit: Limit=7.81 GB Consumption=5.89 GB
  Fragment 4a434673989dd50b:78d43f783784dca3: Consumption=3.26 MB
    SORT_NODE (id=9): Consumption=4.00 KB
    AGGREGATION_NODE (id=16): Consumption=3.25 MB
    EXCHANGE_NODE (id=15): Consumption=0
    DataStreamRecvr: Consumption=0
    DataStreamSender: Consumption=1.59 KB
  Block Manager: Limit=6.25 GB Consumption=5.75 GB
  Fragment 4a434673989dd50b:78d43f783784dca6: Consumption=4.07 GB
    AGGREGATION_NODE (id=8): Consumption=3.25 MB
    HASH_JOIN_NODE (id=7): Consumption=24.00 KB
    HASH_JOIN_NODE (id=6): Consumption=1.26 GB
    HASH_JOIN_NODE (id=5): Consumption=2.78 GB
    EXCHANGE_NODE (id=10): Consumption=0
    DataStreamRecvr: Consumption=28.83 MB
    EXCHANGE_NODE (id=11): Consumption=0
    DataStreamRecvr: Consumption=0
    EXCHANGE_NODE (id=12): Consumption=0
    DataStreamRecvr: Consumption=0
    AGGREGATION_NODE (id=14): Consumption=0
    EXCHANGE_NODE (id=13): Consumption=0
    DataStreamSender: Consumption=4.78 KB
  Fragment 4a434673989dd50b:78d43f783784dcab: Consumption=18.23 MB
    AGGREGATION_NODE (id=4): Consumption=18.19 MB
    HDFS_SCAN_NODE (id=3): Consumption=0
    DataStreamSender: Consumption=40.00 KB
  Fragment 4a434673989dd50b:78d43f783784dcb2: Consumption=74.09 MB
    HDFS_SCAN_NODE (id=2): Consumption=73.98 MB
    DataStreamSender: Consumption=75.98 KB
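
The figures in the dump can be cross-checked with a few lines of arithmetic. This is a sketch: the inputs are the rounded values printed above, and the helper `headroom` is hypothetical. It shows that the 8000mb limit prints as 7.81 GB, that the Block Manager limit is consistent with 80% of the query limit, and how much headroom remained at both levels when the aggregation with id=14 failed.

```python
# Back-of-the-envelope check of the memory-limit dump.
# All GB figures are copied from the dump, which rounds to two decimals.

MB_PER_GB = 1024  # the dump's "7.81 GB" matches 8000 / 1024


def headroom(limit_gb: float, consumption_gb: float) -> float:
    """Unused capacity under a limit, rounded to 2 decimals like the dump."""
    return round(limit_gb - consumption_gb, 2)


query_limit_gb = 8000 / MB_PER_GB  # set mem_limit=8000mb -> 7.8125
block_mgr_limit_gb = 6.25

# Ratio of the Block Manager limit to the query limit.
block_mgr_fraction = round(block_mgr_limit_gb / query_limit_gb, 2)

query_headroom_gb = headroom(7.81, 5.89)      # query-level limit vs. consumption
block_mgr_headroom_gb = headroom(6.25, 5.75)  # Block Manager limit vs. consumption

# HASH_JOIN_NODE id=5 and id=6 in fragment ...dca6 (4.07 GB in total).
join_nodes_gb = round(2.78 + 1.26, 2)

if __name__ == "__main__":
    print(query_limit_gb, block_mgr_fraction)        # 7.8125 0.8
    print(query_headroom_gb, block_mgr_headroom_gb)  # 1.92 0.5
    print(join_nodes_gb)                             # 4.04
```

The two hash joins hold about 4.04 GB of their fragment's 4.07 GB, while AGGREGATION_NODE (id=14) shows zero consumption, even though neither the query limit nor the Block Manager limit was fully used.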

Workaround
The problem only occurs for specific memory limit values: increasing or decreasing the memory limit will avoid the issue in most cases.
