IMPALA-5577

Memory leak when looping a select, CTAS, and daemon crashes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      While trying to reproduce IMPALA-5558 I hit a "Memory limit exceeded" error.

      I0625 04:53:07.747248  7944 status.cc:55] Memory limit exceeded: Error occurred on backend lv-desktop:22000 by fragment 6c459f163a70ea25:7765587500000002
      Memory left in process limit: -350618.00 B
      Process: memory limit exceeded. Limit=8.35 GB Total=8.35 GB Peak=8.38 GB
        RequestPool=default-pool: Total=85.35 MB Peak=181.51 MB
          Query(6c459f163a70ea25:7765587500000000): Total=85.35 MB Peak=85.61 MB
            Fragment 6c459f163a70ea25:7765587500000002: Total=85.35 MB Peak=85.61 MB
              HDFS_SCAN_NODE (id=0): Total=85.29 MB Peak=85.55 MB
              HdfsTableSink: Total=50.00 KB Peak=50.00 KB
              CodeGen: Total=415.00 B Peak=49.00 KB
            Block Manager: Limit=6.68 GB Total=0 Peak=0
        Untracked Memory: Total=8.27 GB
          @     0x7fe3f28adf1e  impala::Status::Status()
          @     0x7fe3f28adc5e  impala::Status::MemLimitExceeded()
          @     0x7fe3f2035a4a  impala::MemTracker::MemLimitExceeded()
          @     0x7fe3f2076b24  impala::RuntimeState::SetMemLimitExceeded()
          @     0x7fe3f2076e4a  impala::RuntimeState::CheckQueryState()
          @     0x7fe3f170ccaf  impala::ExecNode::QueryMaintenance()
          @     0x7fe3f174e577  impala::HdfsScanNode::GetNextInternal()
          @     0x7fe3f174e222  impala::HdfsScanNode::GetNext()
          @     0x7fe3f201ffde  impala::FragmentInstanceState::ExecInternal()
          @     0x7fe3f201d949  impala::FragmentInstanceState::Exec()
          @     0x7fe3f20472f6  impala::QueryState::ExecFInstance()
          @     0x7fe3f2054efa  boost::_mfi::mf1<>::operator()()
          @     0x7fe3f20542dd  boost::_bi::list2<>::operator()<>()
          @     0x7fe3f2053935  boost::_bi::bind_t<>::operator()()
          @     0x7fe3f2052596  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @     0x7fe3f2519a3c  boost::function0<>::operator()()
          @     0x7fe3f25170cf  impala::Thread::SuperviseThread()
          @     0x7fe3f25205c8  boost::_bi::list4<>::operator()<>()
          @     0x7fe3f252050b  boost::_bi::bind_t<>::operator()()
          @     0x7fe3f25204ce  boost::detail::thread_data<>::run()
          @           0x87d2aa  thread_proxy
          @     0x7fe3ec52e184  start_thread
          @     0x7fe3ec25bbed  clone
      I0625 04:53:07.747314  7944 runtime-state.cc:194] Error from query 6c459f163a70ea25:7765587500000000: Memory limit exceeded: Error occurred on backend lv-desktop:22000 by fragment 6c459f163a70ea25:7765587500000002
      Memory left in process limit: -350618.00 B
      Process: memory limit exceeded. Limit=8.35 GB Total=8.35 GB Peak=8.38 GB
        RequestPool=default-pool: Total=85.35 MB Peak=181.51 MB
          Query(6c459f163a70ea25:7765587500000000): Total=85.35 MB Peak=85.61 MB
            Fragment 6c459f163a70ea25:7765587500000002: Total=85.35 MB Peak=85.61 MB
              HDFS_SCAN_NODE (id=0): Total=85.29 MB Peak=85.55 MB
              HdfsTableSink: Total=50.00 KB Peak=50.00 KB
              CodeGen: Total=415.00 B Peak=49.00 KB
            Block Manager: Limit=6.68 GB Total=0 Peak=0
        Untracked Memory: Total=8.27 GB
      

      My test does the following in a loop:

      • Run 4 select queries to warm up the client caches
      • Restart the second node of the local minicluster
      • Run a CTAS query and make sure it succeeded

      The script to run this is here: https://gist.github.com/lekv/0093bf133d2c61267af0f910348da124

      The CTAS query in the last step hit the "Memory limit exceeded" error. Note that the tracked queries account for only ~85 MB while the process shows Untracked Memory: Total=8.27 GB, so it looks like there is a leak somewhere outside the query MemTrackers.
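      As a rough sketch, the loop can be expressed as follows. The restart helper, iteration count, and CTAS statement are placeholders, not the exact gist contents; the authoritative script is the gist linked above. The sketch defaults to dry-run and only records the commands it would execute:

```shell
#!/bin/bash
# Sketch of the repro loop described above. The restart helper and the
# CTAS statement are assumptions; see the linked gist for the real script.
set -u

DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to run against a real minicluster
LOG=()

run() {
  LOG+=("$*")
  if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

for i in $(seq 1 "${ITERATIONS:-100}"); do
  # 1. Warm up the client connection caches (four selects; see warmup.sh).
  run ./warmup.sh
  # 2. Restart the second node of the local minicluster (placeholder name).
  run restart_second_impalad
  # 3. Run a CTAS and fail fast if it did not succeed.
  run impala-shell.sh -i localhost:21000 -q \
    "create table ctas_leak_$i as select * from functional.alltypes" ||
    { echo "CTAS failed on iteration $i" >&2; exit 1; }
done
```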

      Edit:
      The script uses warmup.sh to create connections in the client cache:

      #!/bin/bash
      # Run the union query twice against each of the two coordinators to
      # populate their client connection caches.
      impala-shell.sh -i localhost:21000 -f union50.sql
      impala-shell.sh -i localhost:21000 -f union50.sql
      impala-shell.sh -i localhost:21001 -f union50.sql
      impala-shell.sh -i localhost:21001 -f union50.sql
      

      union50.sql can be found here and unions the same query 50 times.
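      The linked file is not reproduced here, but a file of that shape can be generated as follows. The inner SELECT is a placeholder (the real file may union a different query, and may use UNION rather than UNION ALL):

```shell
#!/bin/bash
# Illustrative generator for a union50.sql-style file: the same SELECT
# unioned 50 times. The inner query is a placeholder, not the real one.
QUERY="select count(*) from functional.alltypes"
{
  echo "$QUERY"
  for i in $(seq 2 50); do
    echo "union all $QUERY"
  done
  echo ";"
} > union50_generated.sql
```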

      Attachments

        1. pprof-growth.txt
          23 kB
          Lars Volker
        2. pprof-growth-off.pdf
          14 kB
          Lars Volker


            People

              Assignee: Sailesh Mukil (sailesh)
              Reporter: Lars Volker (lv)
              Votes: 0
              Watchers: 5
