IMPALA-3304

Aggregations do not always spill when exprs use large amount of memory

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.5.0, Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Backend
      Workaround: You may be able to successfully run the query if you set the query option max_block_mgr_memory (Impala <= 2.9.x) or buffer_pool_limit (Impala >= 2.10.0) to a value significantly less than the query's mem_limit.
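
      As a rough illustration of the workaround, the sketch below builds the SET statement for a given Impala version. The helper name workaround_statement, the 0.5 fraction, and the "m" size suffix are assumptions made for this example; the issue only says the value should be "significantly less" than mem_limit.

```python
def workaround_statement(impala_version, mem_limit_mb, fraction=0.5):
    """Build a SET statement that caps spillable-buffer memory below mem_limit.

    impala_version: tuple such as (2, 6, 0).
    mem_limit_mb:   the query's mem_limit in megabytes.
    fraction:       share of mem_limit to allow (hypothetical default; the
                    issue does not recommend a specific value).
    """
    # Per the workaround text: max_block_mgr_memory applies to Impala <= 2.9.x,
    # buffer_pool_limit to Impala >= 2.10.0.
    if tuple(impala_version[:2]) >= (2, 10):
        option = "buffer_pool_limit"
    else:
        option = "max_block_mgr_memory"
    limit_mb = int(mem_limit_mb * fraction)
    # Assumption: the option accepts the same size syntax as mem_limit.
    return f"set {option}={limit_mb}m;"


if __name__ == "__main__":
    print(workaround_statement((2, 6, 0), 1024))
    print(workaround_statement((2, 10, 0), 1024))
```

      The resulting statement would be issued in impala-shell before running the query, in the same way as the set mem_limit=1g; line in the reproduction.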

    Description

      This was reported by a user, and I was able to reproduce locally on a TPC-DS 100 data set.

      Large aggregations with avg() hit OOM before spilling, so spill-to-disk is not effective. The memory breakdown below shows that a large portion of the memory is consumed by exprs (351.94 MB and 64.06 MB in the two aggregation nodes) rather than by spillable buffers.

      [tarmstrong-box.ca.cloudera.com:21000] > use tpcds_100_parquet;
      Query: use tpcds_100_parquet
      [tarmstrong-box.ca.cloudera.com:21000] > set mem_limit=1g;
      MEM_LIMIT set to 1g
      [tarmstrong-box.ca.cloudera.com:21000] > select ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, avg(ss_list_price) from store_sales group by 1, 2, 3, 4;
      Query: select ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, avg(ss_list_price) from store_sales group by 1, 2, 3, 4
      WARNINGS: 
      Memory limit exceeded
      
      
      
      Memory Limit Exceeded
      Query(5c4cbadeabe25055:daf8e57b54f81ab8) Limit: memory limit exceeded. Limit=1.00 GB Consumption=1.00 GB
        Fragment 5c4cbadeabe25055:daf8e57b54f81ab9: Consumption=8.00 KB
          EXCHANGE_NODE (id=4): Consumption=0
          DataStreamRecvr: Consumption=0
        Block Manager: Limit=819.20 MB Consumption=576.00 MB
        Fragment 5c4cbadeabe25055:daf8e57b54f81aba: Consumption=798.25 MB
          AGGREGATION_NODE (id=3): Consumption=794.06 MB
            Exprs: Consumption=351.94 MB
          EXCHANGE_NODE (id=2): Consumption=0
          DataStreamRecvr: Consumption=4.18 MB
          DataStreamSender: Consumption=3.20 KB
        Fragment 5c4cbadeabe25055:daf8e57b54f81abd: Consumption=227.26 MB
          AGGREGATION_NODE (id=1): Consumption=217.07 MB
            Exprs: Consumption=64.06 MB
          HDFS_SCAN_NODE (id=0): Consumption=10.08 MB
          DataStreamSender: Consumption=50.84 KB
      

      Log excerpt:

      I0405 14:10:20.968636 18059 status.cc:45] Memory limit exceeded
          @          0x10fc428  impala::Status::Status()
          @          0x10fc198  impala::Status::MemLimitExceeded()
          @          0x12a8d2c  impala::RuntimeState::SetMemLimitExceeded()
          @          0x12a91b1  impala::RuntimeState::CheckQueryState()
          @          0x1906edc  impala::DataStreamSender::Send()
          @          0x1897079  impala::PlanFragmentExecutor::OpenInternal()
          @          0x1896579  impala::PlanFragmentExecutor::Open()
          @          0x146e37e  impala::FragmentMgr::FragmentExecState::Exec()
          @          0x1465b83  impala::FragmentMgr::FragmentThread()
          @          0x14696e0  boost::_mfi::mf1<>::operator()()
          @          0x146949d  boost::_bi::list2<>::operator()<>()
          @          0x1468dc7  boost::_bi::bind_t<>::operator()()
          @          0x1468730  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @          0x127db54  boost::function0<>::operator()()
          @          0x152aa4b  impala::Thread::SuperviseThread()
          @          0x153217c  boost::_bi::list4<>::operator()<>()
          @          0x15320bf  boost::_bi::bind_t<>::operator()()
          @          0x1532082  boost::detail::thread_data<>::run()
          @          0x194d05a  thread_proxy
          @     0x7fb9730486aa  start_thread
          @     0x7fb970420e9d  (unknown)
      I0405 14:10:20.969120 18059 runtime-state.cc:225] Error from query 5c4cbadeabe25055:daf8e57b54f81ab8: Memory Limit Exceeded
      Query(5c4cbadeabe25055:daf8e57b54f81ab8) Limit: memory limit exceeded. Limit=1.00 GB Consumption=1.00 GB
        Fragment 5c4cbadeabe25055:daf8e57b54f81ab9: Consumption=8.00 KB
          EXCHANGE_NODE (id=4): Consumption=0
          DataStreamRecvr: Consumption=0
        Block Manager: Limit=819.20 MB Consumption=576.00 MB
        Fragment 5c4cbadeabe25055:daf8e57b54f81aba: Consumption=798.25 MB
          AGGREGATION_NODE (id=3): Consumption=794.06 MB
            Exprs: Consumption=351.94 MB
          EXCHANGE_NODE (id=2): Consumption=0
          DataStreamRecvr: Consumption=4.18 MB
          DataStreamSender: Consumption=3.20 KB
        Fragment 5c4cbadeabe25055:daf8e57b54f81abd: Consumption=227.26 MB
          AGGREGATION_NODE (id=1): Consumption=217.07 MB
            Exprs: Consumption=64.06 MB
          HDFS_SCAN_NODE (id=0): Consumption=10.08 MB
          DataStreamSender: Consumption=50.84 KB
      

      We should be able to reclaim the expr memory by spilling, but block mgr memory pressure has not kicked in because the block mgr has not yet hit its limit.

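      We'd expect to see this problem in queries where non-spillable data is > 20% of overall memory consumption. The block mgr limit here is 80% of the query's mem_limit (819.20 MB of 1.00 GB), so once non-spillable memory exceeds the remaining 20%, the query limit is reached before the block mgr limit and spilling is never triggered.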
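
      The arithmetic behind the 20% figure can be checked against the numbers in the memory dump above. This is only a sketch: the function name spill_analysis is made up for illustration, and it treats everything outside the Block Manager's consumption as non-spillable, which is a simplifying assumption.

```python
def spill_analysis(mem_limit_mb, block_mgr_limit_mb, block_mgr_used_mb, total_used_mb):
    """Compare non-spillable memory with the headroom left outside the block mgr."""
    # Assumption: anything not tracked by the Block Manager cannot be spilled.
    non_spillable_mb = total_used_mb - block_mgr_used_mb
    # Memory the query limit leaves over once the block mgr limit is reserved.
    headroom_mb = mem_limit_mb - block_mgr_limit_mb
    return {
        "block_mgr_share_of_limit": block_mgr_limit_mb / mem_limit_mb,
        "non_spillable_mb": non_spillable_mb,
        "non_spillable_share": non_spillable_mb / total_used_mb,
        "headroom_mb": headroom_mb,
        # The block mgr only spills under its own memory pressure.
        "block_mgr_under_limit": block_mgr_used_mb < block_mgr_limit_mb,
        # OOM precedes spilling when non-spillable memory outgrows the headroom.
        "oom_before_spill": non_spillable_mb > headroom_mb,
    }


if __name__ == "__main__":
    # Values copied from the dump: Limit=1.00 GB, Consumption=1.00 GB,
    # Block Manager: Limit=819.20 MB, Consumption=576.00 MB.
    result = spill_analysis(1024.0, 819.20, 576.00, 1024.0)
    for key, value in result.items():
        print(f"{key}: {value}")
```

      With the values from the dump, non-spillable memory comes to 448 MB (about 44% of consumption) against roughly 205 MB of headroom, while the block mgr is still 243 MB short of its own limit.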

            People

              Assignee: Unassigned
              Reporter: Tim Armstrong (tarmstrong)