Details
-
Bug
-
Status: Open
-
Minor
-
Resolution: Unresolved
Description
Hi, when they encounter an error, both `get_json_object()` and `DecimalOperators::IntToDecimalVal` raise a warning.
Due to their stateless nature, the resulting warning flood can easily overwhelm the cluster's processing capacity.
To be specific, we have observed these bottlenecks:
- Exchange Receiver: the default value for `rpc_max_message_size` is 50MB. The flood of warning messages carried by ReportExecStatusPB may exceed that limit, causing status reports to be sent without profiles. Even if the report message stays under the limit, the bandwidth consumption is non-trivial.
- Storage: like IMPALA-5256, flooding warnings produce huge log files, since `stdout/stderr` are not redirected when glog rolls logs. As a result, we repeatedly had to clear log files and restart executors.
- Coordinator: runtime profiles are serialized to Thrift and stored in the Coordinator's memory. The warning flood makes `Untracked Memory` rise rapidly. I took a heap profile (with pprof) and found that most of the memory was used by RuntimeProfile and strings.
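To give a rough sense of scale for the Exchange Receiver bottleneck, here is a back-of-the-envelope sketch. The 200-byte average warning size is purely an assumption for illustration (we did not measure it), 50MB is taken as 50 * 1024 * 1024 bytes, and the names `WarningsThatFit`, `kRpcMaxMessageSize` and `kAssumedWarningBytes` are hypothetical, not Impala identifiers.

```cpp
#include <cstdint>

// Hypothetical helper: how many warnings of a given average size fit
// under a message-size limit. Returns 0 for a non-positive size.
constexpr std::int64_t WarningsThatFit(std::int64_t limit_bytes,
                                       std::int64_t avg_warning_bytes) {
  return avg_warning_bytes > 0 ? limit_bytes / avg_warning_bytes : 0;
}

// 50MB limit as stated above, interpreted as 50 * 1024 * 1024 bytes.
constexpr std::int64_t kRpcMaxMessageSize = 50LL * 1024 * 1024;

// ASSUMPTION: average serialized size of one warning message.
constexpr std::int64_t kAssumedWarningBytes = 200;

// Under these assumptions, 262144 warnings fill the limit.
static_assert(WarningsThatFit(kRpcMaxMessageSize, kAssumedWarningBytes) == 262144,
              "52428800 / 200");
```

Under that assumption, a query that fails on a few hundred thousand rows in one fragment would already be enough to reach the limit if each failing row emits its own warning.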
Preliminary solution:
We suffered a lot from this problem and have come up with a preliminary solution.
- Mute `AddWarning()` by default, which is the most straightforward fix.
- Introduce a query option to re-enable the warnings when needed.
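The idea in the two points above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual patch: `QueryOptionsSketch`, `enable_expr_warnings` and `WarningSink` are hypothetical names made up for this example and do not correspond to Impala's real classes or query options.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-query options (not Impala's real struct).
struct QueryOptionsSketch {
  bool enable_expr_warnings = false;  // assumed option name; off by default
};

// Hypothetical sink sitting behind an AddWarning()-style call.
class WarningSink {
 public:
  explicit WarningSink(const QueryOptionsSketch& opts)
      : enabled_(opts.enable_expr_warnings) {}

  // Returns true if the message was stored, false if it was muted.
  bool AddWarning(std::string msg) {
    if (!enabled_) {
      ++muted_count_;  // only a counter is kept, so memory stays constant
      return false;
    }
    warnings_.push_back(std::move(msg));
    return true;
  }

  std::size_t muted_count() const { return muted_count_; }
  const std::vector<std::string>& warnings() const { return warnings_; }

 private:
  bool enabled_;
  std::size_t muted_count_ = 0;
  std::vector<std::string> warnings_;
};

// Usage sketch: with the option off, a flood of warnings stores nothing.
inline void UsageExample() {
  WarningSink sink{QueryOptionsSketch{}};
  for (int i = 0; i < 1000; ++i) sink.AddWarning("could not parse JSON");
  assert(sink.warnings().empty());
  assert(sink.muted_count() == 1000);
}
```

Keeping a muted counter is an extra design choice in this sketch, so that the fact that warnings occurred is not lost entirely. Dropping the messages outright would match the description above just as well.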
Testing:
With warning messages muted, we found that the burden on Coordinator nodes was greatly alleviated and heap profiles were no longer dominated by RuntimeProfile.
Update
We encountered a similar crash with a `get_json_object()` query: each time the query is submitted, the Coordinator crashes.
Log:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000002c64dca, pid=3633220, tid=0x00007eff73308700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 )
# Problematic frame:
# C [impalad+0x2864dca] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x13a
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/10376-impala-IMPALAD/hs_err_pid3633220.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
d. The connection had 2 associated session(s).
I0427 13:43:03.907536 3853145 status.cc:126] Couldn't serialize thrift object: std::bad_alloc
    @ 0xbf4ef9
    @ 0x1352d5f
    @ 0x1352eaf
    @ 0x11986de
    @ 0x122516c
    @ 0x1225515
    @ 0x137ee36
    @ 0x13801a0
    @ 0x139682f
    @ 0x139915a
    @ 0x1399784
    @ 0x7f34791e0e24
    @ 0x7f3475dd835c
StackTrace:
Stack: [0x00007eff72b08000,0x00007eff73309000], sp=0x00007eff733006b0, free space=8161k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [impalad+0x2864dca] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x13a
C [impalad+0x286519f] tcmalloc::ThreadCache::Scavenge()+0x3f
C [impalad+0x29a211a] operator delete(void*)+0x32a
C [impalad+0xae94d9] impala::TRuntimeProfileNode::~TRuntimeProfileNode()+0x289
C [impalad+0xae4987] impala::TRuntimeProfileTree::~TRuntimeProfileTree()+0x47
C [impalad+0xf5280a] impala::RuntimeProfile::Compress(std::vector<unsigned char, std::allocator<unsigned char> >*) const+0x3aa
C [impalad+0xf52eb0] impala::RuntimeProfile::SerializeToArchiveString(std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*) const+0x40
C [impalad+0xd986df] impala::ImpalaServer::GetRuntimeProfileOutput(impala::TUniqueId const&, std::string const&, impala::TRuntimeProfileFormat::type, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::TRuntimeProfileTree*, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*)+0x5bf
C [impalad+0xe2516d] impala::ImpalaHttpHandler::QueryProfileHelper(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*, impala::TRuntimeProfileFormat::type)+0x4ed
C [impalad+0xe25516] impala::ImpalaHttpHandler::QueryProfileEncodedHandler(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*)+0x16
C [impalad+0xf7ee37] impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::ContentType*)+0x177
C [impalad+0xf801a1] impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*)+0x951
C [impalad+0xf96830] kudu::StringGauge::~StringGauge()+0x100