IMPALA-2102: cgroup not removed when cleaning up fragment after cancellation


    Description

      With RM enabled, when a query is cancelled, the cgroup to which the fragment was assigned may not be cleaned up properly. This may be related to IMPALA-2060, but it can be fixed for a single fragment on its own before tackling IMPALA-2060, which is a larger task.

      While running TPC-DS q46 on a cluster with RM enabled, the following error occurred while cleaning up the cgroup, and the cgroup was never actually removed.

      I0625 19:36:07.322680 27849 status.cc:111] Failed to drop CGroup at path /var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala. boost::filesystem::remove: Device or resource busy: "/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala"
          @           0xf48ea9  impala::Status::Status()
          @          0x138cda6  impala::CgroupsMgr::DropCgroup()
          @          0x138e9c3  impala::CgroupsMgr::UnregisterFragment()
          @          0x1545c87  impala::PlanFragmentExecutor::Close()
          @          0x153f9eb  impala::PlanFragmentExecutor::~PlanFragmentExecutor()
          @          0x151f23f  boost::checked_delete<>()
          @          0x151c859  boost::scoped_ptr<>::~scoped_ptr()
          @          0x150d4d6  impala::Coordinator::~Coordinator()
          @          0x12fb2fc  boost::checked_delete<>()
          @          0x12f8725  boost::scoped_ptr<>::~scoped_ptr()
          @          0x12e9032  impala::ImpalaServer::QueryExecState::~QueryExecState()
          @          0x12a3f80  boost::checked_delete<>()
          @          0x12b41ae  boost::detail::sp_counted_impl_p<>::dispose()
          @           0xf09a34  boost::detail::sp_counted_base::release()
          @           0xf09aad  boost::detail::shared_count::~shared_count()
          @          0x1283ba8  boost::shared_ptr<>::~shared_ptr()
          @          0x126dd51  impala::ImpalaServer::UnregisterQuery()
          @          0x12e095b  impala::ImpalaServer::close()
          @          0x148e93c  beeswax::BeeswaxServiceProcessor::process_close()
          @          0x14892a0  beeswax::BeeswaxServiceProcessor::dispatchCall()
          @          0x1470e5b  impala::ImpalaServiceProcessor::dispatchCall()
          @          0x127955c  apache::thrift::TDispatchProcessor::process()
          @          0x1fbeb79  apache::thrift::server::TThreadPoolServer::Task::run()
          @          0x1faae9f  apache::thrift::concurrency::ThreadManager::Task::run()
          @          0x1facde4  apache::thrift::concurrency::ThreadManager::Worker::run()
          @          0x11a8633  impala::ThriftThread::RunRunnable()
          @          0x11a9de3  boost::_mfi::mf2<>::operator()()
          @          0x11a9c3c  boost::_bi::list3<>::operator()<>()
          @          0x11a99b9  boost::_bi::bind_t<>::operator()()
          @          0x11a98c7  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @          0x11df763  boost::function0<>::operator()()
          @          0x13f9f28  impala::Thread::SuperviseThread()
      

      In the stack above and the attached log this happened on the coordinator, but it looks like it can happen on remote fragments as well; it seems to occur on the fragment that hit the error.
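
      For context, under cgroups v1 a cgroup directory can only be removed with rmdir(2) once no threads remain attached to it; otherwise rmdir fails with EBUSY ("Device or resource busy", which boost::filesystem::remove surfaces as shown above). A plausible explanation, though not confirmed by the log, is that one or more fragment execution threads (possibly the thread running Close() itself) are still attached to the query's cgroup when CgroupsMgr::DropCgroup() attempts the removal. The sketch below is a minimal standalone illustration of that failure mode and one way around it: drain any leftover threads back into the parent cgroup (here, hadoop-yarn) before retrying the rmdir. DrainCgroupTasks, DropCgroupDir, and the retry/backoff policy are hypothetical names and choices for illustration, not Impala's actual CgroupsMgr API.

      // Minimal sketch (assumed cgroups v1 semantics; not Impala's CgroupsMgr code).
      // rmdir(2) on a cgroup directory fails with EBUSY while any thread is still
      // attached; moving the leftover tids into the parent cgroup first lets the
      // removal succeed.
      #include <cerrno>
      #include <cstring>
      #include <fstream>
      #include <iostream>
      #include <string>

      #include <unistd.h>

      // Hypothetical helper: re-attach every thread listed in <cgroup>/tasks to the
      // parent cgroup. Each tid is flushed individually because the kernel expects
      // one tid per write(2) to a tasks file.
      static bool DrainCgroupTasks(const std::string& cgroup_path,
                                   const std::string& parent_path) {
        std::ifstream tasks(cgroup_path + "/tasks");
        std::ofstream parent_tasks(parent_path + "/tasks");
        if (!tasks.is_open() || !parent_tasks.is_open()) return false;
        std::string tid;
        while (std::getline(tasks, tid)) {
          parent_tasks << tid << std::endl;  // endl forces one write per tid
          if (!parent_tasks) return false;
        }
        return true;
      }

      // Hypothetical replacement for a bare remove(): on EBUSY, drain stragglers
      // and retry with a short backoff instead of giving up on the first attempt.
      static bool DropCgroupDir(const std::string& cgroup_path,
                                const std::string& parent_path) {
        for (int attempt = 0; attempt < 5; ++attempt) {
          if (rmdir(cgroup_path.c_str()) == 0) return true;
          if (errno != EBUSY) {
            std::cerr << "rmdir failed: " << std::strerror(errno) << std::endl;
            return false;
          }
          if (!DrainCgroupTasks(cgroup_path, parent_path)) return false;
          usleep(1000 * (attempt + 1));  // brief backoff before retrying
        }
        return false;
      }

      Even with a retry, the real fix likely has to ensure the fragment's own execution threads have left the cgroup (or exited) before teardown runs, which is why this interacts with the cancellation path and with IMPALA-2060.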

      Attachments

        impalad.cgroup-cleanup-bug.log (246 kB, uploaded by Matthew Jacobs)


          People

            Assignee: Matthew Jacobs (mjacobs)
            Reporter: Matthew Jacobs (mjacobs)