IMPALA-2102

cgroup not removed when cleaning up fragment after cancellation


Details

    Description

      With RM enabled, when a query is cancelled, the cgroup to which the fragment was assigned may not be cleaned up properly. This may be related to IMPALA-2060, but it can be fixed for a single fragment on its own before tackling IMPALA-2060, which is a larger task.

      While running TPC-DS q46 on a cluster with RM enabled, the following error occurred during cgroup cleanup, and the cgroup was never actually removed.

      I0625 19:36:07.322680 27849 status.cc:111] Failed to drop CGroup at path /var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala. boost::filesystem::remove: Device or resource busy: "/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala"
          @           0xf48ea9  impala::Status::Status()
          @          0x138cda6  impala::CgroupsMgr::DropCgroup()
          @          0x138e9c3  impala::CgroupsMgr::UnregisterFragment()
          @          0x1545c87  impala::PlanFragmentExecutor::Close()
          @          0x153f9eb  impala::PlanFragmentExecutor::~PlanFragmentExecutor()
          @          0x151f23f  boost::checked_delete<>()
          @          0x151c859  boost::scoped_ptr<>::~scoped_ptr()
          @          0x150d4d6  impala::Coordinator::~Coordinator()
          @          0x12fb2fc  boost::checked_delete<>()
          @          0x12f8725  boost::scoped_ptr<>::~scoped_ptr()
          @          0x12e9032  impala::ImpalaServer::QueryExecState::~QueryExecState()
          @          0x12a3f80  boost::checked_delete<>()
          @          0x12b41ae  boost::detail::sp_counted_impl_p<>::dispose()
          @           0xf09a34  boost::detail::sp_counted_base::release()
          @           0xf09aad  boost::detail::shared_count::~shared_count()
          @          0x1283ba8  boost::shared_ptr<>::~shared_ptr()
          @          0x126dd51  impala::ImpalaServer::UnregisterQuery()
          @          0x12e095b  impala::ImpalaServer::close()
          @          0x148e93c  beeswax::BeeswaxServiceProcessor::process_close()
          @          0x14892a0  beeswax::BeeswaxServiceProcessor::dispatchCall()
          @          0x1470e5b  impala::ImpalaServiceProcessor::dispatchCall()
          @          0x127955c  apache::thrift::TDispatchProcessor::process()
          @          0x1fbeb79  apache::thrift::server::TThreadPoolServer::Task::run()
          @          0x1faae9f  apache::thrift::concurrency::ThreadManager::Task::run()
          @          0x1facde4  apache::thrift::concurrency::ThreadManager::Worker::run()
          @          0x11a8633  impala::ThriftThread::RunRunnable()
          @          0x11a9de3  boost::_mfi::mf2<>::operator()()
          @          0x11a9c3c  boost::_bi::list3<>::operator()<>()
          @          0x11a99b9  boost::_bi::bind_t<>::operator()()
          @          0x11a98c7  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @          0x11df763  boost::function0<>::operator()()
          @          0x13f9f28  impala::Thread::SuperviseThread()
      

      In the stack above and in the attached log, this happened on the coordinator, but it appears it can also happen on remote fragments; it seems to occur on the fragment that had the error.
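
      The log above shows boost::filesystem::remove failing with "Device or resource busy" (EBUSY). On Linux cgroup v1 hierarchies, rmdir on a cgroup directory fails with EBUSY while tasks are still attached to it or child cgroups exist. Whether that is the cause here is not established in this report. As a minimal sketch of one possible mitigation, and not the fix adopted for this issue, the removal could be retried for a bounded number of attempts. DropCgroupWithRetry and its parameters are hypothetical names and are not part of Impala's CgroupsMgr.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <unistd.h>

// Hypothetical helper (an assumption for illustration, not Impala's API).
// Tries to remove the cgroup directory at `path`, retrying only while the
// kernel reports EBUSY. Returns 0 on success (or if the directory is already
// gone), otherwise the errno value of the last failed attempt.
// `rmdir_fn` is injectable so the retry logic can be exercised without a
// real cgroup mount; it must behave like POSIX rmdir (return -1, set errno).
int DropCgroupWithRetry(
    const std::string& path, int max_attempts,
    std::chrono::milliseconds delay,
    const std::function<int(const char*)>& rmdir_fn = ::rmdir) {
  int last_errno = EINVAL;  // Returned if max_attempts <= 0.
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (rmdir_fn(path.c_str()) == 0) return 0;
    last_errno = errno;  // Capture before any other call can overwrite it.
    if (last_errno == ENOENT) return 0;           // Already removed.
    if (last_errno != EBUSY) return last_errno;   // Retrying will not help.
    if (attempt + 1 < max_attempts) std::this_thread::sleep_for(delay);
  }
  return last_errno;  // Still busy after all attempts.
}
```

      A caller might use DropCgroupWithRetry(path, 5, std::chrono::milliseconds(100)) and log the returned errno if it is non-zero. Retrying only masks the race; if threads of the fragment are still attached to the cgroup when Close() runs, they would still need to exit or be moved out first.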


          People

            mjacobs Matthew Jacobs
            mjacobs Matthew Jacobs