Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Not A Bug
Impala 2.2
Description
With RM enabled, when a query is cancelled, the cgroup to which the fragment was assigned may not be cleaned up properly. This may be related to IMPALA-2060, but it can be fixed for a single fragment independently, before IMPALA-2060 (a larger task) is addressed.
While running TPC-DS q46 on a cluster with RM enabled, the following error occurred during cgroup cleanup, and the cgroup was never removed.
I0625 19:36:07.322680 27849 status.cc:111] Failed to drop CGroup at path /var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala. boost::filesystem::remove: Device or resource busy: "/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala"
    @ 0xf48ea9 impala::Status::Status()
    @ 0x138cda6 impala::CgroupsMgr::DropCgroup()
    @ 0x138e9c3 impala::CgroupsMgr::UnregisterFragment()
    @ 0x1545c87 impala::PlanFragmentExecutor::Close()
    @ 0x153f9eb impala::PlanFragmentExecutor::~PlanFragmentExecutor()
    @ 0x151f23f boost::checked_delete<>()
    @ 0x151c859 boost::scoped_ptr<>::~scoped_ptr()
    @ 0x150d4d6 impala::Coordinator::~Coordinator()
    @ 0x12fb2fc boost::checked_delete<>()
    @ 0x12f8725 boost::scoped_ptr<>::~scoped_ptr()
    @ 0x12e9032 impala::ImpalaServer::QueryExecState::~QueryExecState()
    @ 0x12a3f80 boost::checked_delete<>()
    @ 0x12b41ae boost::detail::sp_counted_impl_p<>::dispose()
    @ 0xf09a34 boost::detail::sp_counted_base::release()
    @ 0xf09aad boost::detail::shared_count::~shared_count()
    @ 0x1283ba8 boost::shared_ptr<>::~shared_ptr()
    @ 0x126dd51 impala::ImpalaServer::UnregisterQuery()
    @ 0x12e095b impala::ImpalaServer::close()
    @ 0x148e93c beeswax::BeeswaxServiceProcessor::process_close()
    @ 0x14892a0 beeswax::BeeswaxServiceProcessor::dispatchCall()
    @ 0x1470e5b impala::ImpalaServiceProcessor::dispatchCall()
    @ 0x127955c apache::thrift::TDispatchProcessor::process()
    @ 0x1fbeb79 apache::thrift::server::TThreadPoolServer::Task::run()
    @ 0x1faae9f apache::thrift::concurrency::ThreadManager::Task::run()
    @ 0x1facde4 apache::thrift::concurrency::ThreadManager::Worker::run()
    @ 0x11a8633 impala::ThriftThread::RunRunnable()
    @ 0x11a9de3 boost::_mfi::mf2<>::operator()()
    @ 0x11a9c3c boost::_bi::list3<>::operator()<>()
    @ 0x11a99b9 boost::_bi::bind_t<>::operator()()
    @ 0x11a98c7 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @ 0x11df763 boost::function0<>::operator()()
    @ 0x13f9f28 impala::Thread::SuperviseThread()
In the stack above and in the attached log, this happened on the coordinator, but it appears that it can also happen on remote fragments; it seems to occur on the fragment that had the error.
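For anyone trying to reproduce this: on cgroup v1, rmdir of a cgroup directory fails with "Device or resource busy" (EBUSY) while threads are still attached to it or while it has child cgroups. The following is only a diagnostic sketch, not Impala's code. It assumes a cgroup v1 hierarchy with a `tasks` file in each cgroup directory, and the function names `attached_task_ids` and `try_drop_cgroup` are hypothetical. It lists the thread IDs still attached before attempting the removal, which may help show whether a fragment thread was still running when the cgroup was dropped.

```python
import errno
import os


def attached_task_ids(cgroup_dir):
    """Return the thread IDs listed in the cgroup's v1 `tasks` file.

    Returns an empty list if the file does not exist.
    """
    tasks_path = os.path.join(cgroup_dir, "tasks")
    try:
        with open(tasks_path) as f:
            return [int(line) for line in f if line.strip()]
    except FileNotFoundError:
        return []


def try_drop_cgroup(cgroup_dir):
    """Try to remove a cgroup directory; return (removed, reason).

    Hypothetical helper: it refuses to call rmdir while tasks are still
    attached, because that is a case in which cgroup v1 reports EBUSY.
    """
    tids = attached_task_ids(cgroup_dir)
    if tids:
        return False, "tasks still attached: %s" % tids
    try:
        os.rmdir(cgroup_dir)
    except OSError as e:
        # EBUSY: child cgroups or a task attached after the check above.
        # ENOTEMPTY: the path is an ordinary, non-empty directory.
        if e.errno in (errno.EBUSY, errno.ENOTEMPTY):
            return False, os.strerror(e.errno)
        raise
    return True, "removed"
```

Calling `try_drop_cgroup` with the path from the log line above would report which thread IDs, if any, are still in the cgroup. The check and the rmdir are not atomic, so a thread attached in between can still produce EBUSY.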