IMPALA-2102: cgroup not removed when cleaning up fragment after cancellation


    Description

      With RM enabled, when a query is cancelled, the cgroup to which the fragment was assigned may not be cleaned up properly. This may be related to IMPALA-2060, but it can be fixed for a single fragment on its own before tackling IMPALA-2060, which is a larger task.

      While running TPC-DS q46 on a cluster with RM enabled, the following error occurred while cleaning up the cgroup, and the cgroup was never actually removed.

      I0625 19:36:07.322680 27849 status.cc:111] Failed to drop CGroup at path /var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala. boost::filesystem::remove: Device or resource busy: "/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn/474bbf17afcf9e1c_1fee8d4816b7c9e_impala"
          @           0xf48ea9  impala::Status::Status()
          @          0x138cda6  impala::CgroupsMgr::DropCgroup()
          @          0x138e9c3  impala::CgroupsMgr::UnregisterFragment()
          @          0x1545c87  impala::PlanFragmentExecutor::Close()
          @          0x153f9eb  impala::PlanFragmentExecutor::~PlanFragmentExecutor()
          @          0x151f23f  boost::checked_delete<>()
          @          0x151c859  boost::scoped_ptr<>::~scoped_ptr()
          @          0x150d4d6  impala::Coordinator::~Coordinator()
          @          0x12fb2fc  boost::checked_delete<>()
          @          0x12f8725  boost::scoped_ptr<>::~scoped_ptr()
          @          0x12e9032  impala::ImpalaServer::QueryExecState::~QueryExecState()
          @          0x12a3f80  boost::checked_delete<>()
          @          0x12b41ae  boost::detail::sp_counted_impl_p<>::dispose()
          @           0xf09a34  boost::detail::sp_counted_base::release()
          @           0xf09aad  boost::detail::shared_count::~shared_count()
          @          0x1283ba8  boost::shared_ptr<>::~shared_ptr()
          @          0x126dd51  impala::ImpalaServer::UnregisterQuery()
          @          0x12e095b  impala::ImpalaServer::close()
          @          0x148e93c  beeswax::BeeswaxServiceProcessor::process_close()
          @          0x14892a0  beeswax::BeeswaxServiceProcessor::dispatchCall()
          @          0x1470e5b  impala::ImpalaServiceProcessor::dispatchCall()
          @          0x127955c  apache::thrift::TDispatchProcessor::process()
          @          0x1fbeb79  apache::thrift::server::TThreadPoolServer::Task::run()
          @          0x1faae9f  apache::thrift::concurrency::ThreadManager::Task::run()
          @          0x1facde4  apache::thrift::concurrency::ThreadManager::Worker::run()
          @          0x11a8633  impala::ThriftThread::RunRunnable()
          @          0x11a9de3  boost::_mfi::mf2<>::operator()()
          @          0x11a9c3c  boost::_bi::list3<>::operator()<>()
          @          0x11a99b9  boost::_bi::bind_t<>::operator()()
          @          0x11a98c7  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @          0x11df763  boost::function0<>::operator()()
          @          0x13f9f28  impala::Thread::SuperviseThread()
      

      In the stack above and the attached log this happened on the coordinator, but it looks like it can happen on remote fragments as well; it seems to occur on the fragment that hit the error.
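
      For context, under cgroups v1 a cgroup directory can only be removed with rmdir(2) once no threads remain attached to it; otherwise rmdir fails with EBUSY ("Device or resource busy", which boost::filesystem::remove surfaces as shown above). A plausible explanation, though not confirmed by the log, is that one or more fragment execution threads (possibly the thread running Close() itself) are still attached to the query's cgroup when CgroupsMgr::DropCgroup() attempts the removal. The sketch below is a minimal standalone illustration of that failure mode and one way around it: drain any leftover threads back into the parent cgroup (here, hadoop-yarn) before retrying the rmdir. DrainCgroupTasks, DropCgroupDir, and the retry/backoff policy are hypothetical names and choices for illustration, not Impala's actual CgroupsMgr API.

      // Minimal sketch (assumed cgroups v1 semantics; not Impala's CgroupsMgr code).
      // rmdir(2) on a cgroup directory fails with EBUSY while any thread is still
      // attached; moving the leftover tids into the parent cgroup first lets the
      // removal succeed.
      #include <cerrno>
      #include <cstring>
      #include <fstream>
      #include <iostream>
      #include <string>

      #include <unistd.h>

      // Hypothetical helper: re-attach every thread listed in <cgroup>/tasks to the
      // parent cgroup. Each tid is flushed individually because the kernel expects
      // one tid per write(2) to a tasks file.
      static bool DrainCgroupTasks(const std::string& cgroup_path,
                                   const std::string& parent_path) {
        std::ifstream tasks(cgroup_path + "/tasks");
        std::ofstream parent_tasks(parent_path + "/tasks");
        if (!tasks.is_open() || !parent_tasks.is_open()) return false;
        std::string tid;
        while (std::getline(tasks, tid)) {
          parent_tasks << tid << std::endl;  // endl forces one write per tid
          if (!parent_tasks) return false;
        }
        return true;
      }

      // Hypothetical replacement for a bare remove(): on EBUSY, drain stragglers
      // and retry with a short backoff instead of giving up on the first attempt.
      static bool DropCgroupDir(const std::string& cgroup_path,
                                const std::string& parent_path) {
        for (int attempt = 0; attempt < 5; ++attempt) {
          if (rmdir(cgroup_path.c_str()) == 0) return true;
          if (errno != EBUSY) {
            std::cerr << "rmdir failed: " << std::strerror(errno) << std::endl;
            return false;
          }
          if (!DrainCgroupTasks(cgroup_path, parent_path)) return false;
          usleep(1000 * (attempt + 1));  // brief backoff before retrying
        }
        return false;
      }

      Even with a retry, the real fix likely has to ensure the fragment's own execution threads have left the cgroup (or exited) before teardown runs, which is why this interacts with the cancellation path and with IMPALA-2060.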

      Attachments

        impalad.cgroup-cleanup-bug.log (246 kB, uploaded by Matthew Jacobs)


          People

            Assignee: Matthew Jacobs (mjacobs)
            Reporter: Matthew Jacobs (mjacobs)