IMPALA-2535: PAGG fails to acquire buffers despite sufficient memory limit



    Description

      I ran into a memory limit exceeded error running TPC-H Q18 on 100-scale text data with an 8000mb memory limit.

      The query was able to complete with a much lower memory limit: 1800mb.

      An initial look suggests that the join nodes had temporarily pinned most or all of the available block manager buffers. The likely precondition is that other nodes have partitions large enough that they consume almost all of the available blocks even with only a single partition pinned per node. If those partitions are pinned before another node gets its initial reservation, the observed behaviour results.
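
      As a rough illustration of the suspected interleaving (this is not Impala's actual BufferedBlockMgr API; the pool, counts, and names below are purely hypothetical), a late-arriving node can fail to obtain its minimum reservation once earlier nodes have pinned nearly all of the shared blocks:

      #include <cstdio>

      // Hypothetical shared pool of fixed-size buffers standing in for the
      // block manager's budget; all numbers here are illustrative only.
      struct SharedBlockPool {
        int total_blocks;
        int pinned_blocks;

        // An operator tries to pin 'n' blocks; fails if the pool is exhausted.
        bool TryPin(int n) {
          if (pinned_blocks + n > total_blocks) return false;
          pinned_blocks += n;
          return true;
        }
      };

      int main() {
        SharedBlockPool pool{100, 0};   // pretend the query has 100 blocks in total

        // Two join nodes each pin one very large partition before the
        // aggregation claims its minimum reservation.
        bool join1 = pool.TryPin(55);
        bool join2 = pool.TryPin(40);

        // The aggregation now asks for its minimum required buffers and fails,
        // even though the overall memory limit looked generous up front.
        const int kMinAggBuffers = 16;  // hypothetical minimum reservation
        bool agg = pool.TryPin(kMinAggBuffers);

        std::printf("join1=%d join2=%d agg=%d (pinned %d/%d blocks)\n",
                    join1, join2, agg, pool.pinned_blocks, pool.total_blocks);
        return 0;
      }

      With a different memory limit the number of blocks available to the query changes, which is consistent with the workaround noted at the end of this report.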

      use tpch100;
      set mem_limit=8000mb;
      select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
      from
        customer,
        orders,
        lineitem
      where
        o_orderkey in (
          select
            l_orderkey
          from
            lineitem
          group by
            l_orderkey
          having
            sum(l_quantity) > 300
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
      group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
      order by
        o_totalprice desc,
        o_orderdate
      limit 100;
      
      Not enough memory to get the minimum required buffers for aggregation with id=14.
      
      I1011 13:36:17.719748 31287 plan-fragment-executor.cc:303] Open(): instance_id=4a434673989dd50b:78d43f783784dcb2
      I1011 13:41:20.710088 31190 status.cc:45] Memory limit exceeded
          @     0x7efd80e386eb  impala::Status::Status()
          @     0x7efd80e38405  impala::Status::MemLimitExceeded()
          @     0x7efd80a9f110  impala::PartitionedAggregationNode::Partition::Spill()
          @     0x7efd80aa209d  impala::PartitionedAggregationNode::SpillPartition()
          @     0x7efcf78f3ab6  (unknown)
      I1011 13:41:22.637529 31190 data-stream-mgr.cc:128] DeregisterRecvr(): fragment_instance_id=4a434673989dd50b:78d43f783784dca6, node=13
      I1011 13:41:22.637567 31190 data-stream-recvr.cc:233] cancelled stream: fragment_instance_id_=4a434673989dd50b:78d43f783784dca6 node_id=13
      I1011 13:41:22.703086 31139 status.cc:45] Memory limit exceeded
          @     0x7efd80e386eb  impala::Status::Status()
          @     0x7efd80e38405  impala::Status::MemLimitExceeded()
          @     0x7efd7ef87939  impala::RuntimeState::SetMemLimitExceeded()
          @     0x7efd7ef661f3  impala::PlanFragmentExecutor::UpdateStatus()
          @     0x7efd7ef64193  impala::PlanFragmentExecutor::Open()
          @     0x7efd7e63bcc9  impala::FragmentMgr::FragmentExecState::Exec()
          @     0x7efd7e660725  impala::FragmentMgr::FragmentExecThread()
          @     0x7efd7e669102  boost::_mfi::mf1<>::operator()()
          @     0x7efd7e668e73  boost::_bi::list2<>::operator()<>()
          @     0x7efd7e668514  boost::_bi::bind_t<>::operator()()
          @     0x7efd7e667a11  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @     0x7efd7ef506c7  boost::function0<>::operator()()
          @     0x7efd7d6a53c9  impala::Thread::SuperviseThread()
          @     0x7efd7d6aec97  boost::_bi::list4<>::operator()<>()
          @     0x7efd7d6aebb8  boost::_bi::bind_t<>::operator()()
          @     0x7efd7d6aeb6c  boost::detail::thread_data<>::run()
          @     0x7efd7cab609a  (unknown)
          @     0x7efd7bf586aa  start_thread
          @     0x7efd7a11eeed  (unknown)
      
      I1011 13:41:22.703302 31139 runtime-state.cc:229] Error from query 4a434673989dd50b:78d43f783784dca1: Memory Limit Exceeded
      Query(4a434673989dd50b:78d43f783784dca1) Limit: Limit=7.81 GB Consumption=5.89 GB
        Fragment 4a434673989dd50b:78d43f783784dca3: Consumption=3.26 MB
          SORT_NODE (id=9): Consumption=4.00 KB
          AGGREGATION_NODE (id=16): Consumption=3.25 MB
          EXCHANGE_NODE (id=15): Consumption=0
          DataStreamRecvr: Consumption=0
          DataStreamSender: Consumption=1.59 KB
        Block Manager: Limit=6.25 GB Consumption=5.75 GB
        Fragment 4a434673989dd50b:78d43f783784dca6: Consumption=4.07 GB
          AGGREGATION_NODE (id=8): Consumption=3.25 MB
          HASH_JOIN_NODE (id=7): Consumption=24.00 KB
          HASH_JOIN_NODE (id=6): Consumption=1.26 GB
          HASH_JOIN_NODE (id=5): Consumption=2.78 GB
          EXCHANGE_NODE (id=10): Consumption=0
          DataStreamRecvr: Consumption=28.83 MB
          EXCHANGE_NODE (id=11): Consumption=0
          DataStreamRecvr: Consumption=0
          EXCHANGE_NODE (id=12): Consumption=0
          DataStreamRecvr: Consumption=0
          AGGREGATION_NODE (id=14): Consumption=0
          EXCHANGE_NODE (id=13): Consumption=0
          DataStreamSender: Consumption=4.78 KB
        Fragment 4a434673989dd50b:78d43f783784dcab: Consumption=18.23 MB
          AGGREGATION_NODE (id=4): Consumption=18.19 MB
          HDFS_SCAN_NODE (id=3): Consumption=0
          DataStreamSender: Consumption=40.00 KB
        Fragment 4a434673989dd50b:78d43f783784dcb2: Consumption=74.09 MB
          HDFS_SCAN_NODE (id=2): Consumption=73.98 MB
          DataStreamSender: Consumption=75.98 KB
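
      For concreteness, the two hash joins alone report about 4 GB of consumption, roughly 70% of the block manager's 5.75 GB if that consumption is mostly pinned block manager memory (an assumption on my part), leaving only about 0.5 GB of headroom under the 6.25 GB block manager limit when AGGREGATION_NODE (id=14) asks for its minimum buffers:

      #include <cstdio>

      int main() {
        // Figures copied from the memory dump above (GB, as logged).
        const double block_mgr_limit = 6.25;
        const double block_mgr_used  = 5.75;
        const double join_id5 = 2.78;   // HASH_JOIN_NODE (id=5)
        const double join_id6 = 1.26;   // HASH_JOIN_NODE (id=6)

        double headroom = block_mgr_limit - block_mgr_used;          // ~0.50 GB
        double join_share = (join_id5 + join_id6) / block_mgr_used;  // ~0.70

        std::printf("block manager headroom: %.2f GB\n", headroom);
        std::printf("share held by the two joins: %.0f%%\n", join_share * 100);
        return 0;
      }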
      

      Workaround
      The problem only occurs for specific memory limit values; increasing or decreasing the memory limit avoids the issue in most cases (for example, the query completes with a limit of 1800mb, as noted above).

          People

            Assignee: Tim Armstrong (tarmstrong)
            Reporter: Tim Armstrong (tarmstrong)
