Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 4.0.0, Impala 3.4.0, Impala 3.4.1, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2
Description
Status ImpalaServer::GetQueryRecord(const TUniqueId& query_id,
    QueryLogIndex::const_iterator* query_record) {
  lock_guard<mutex> l(query_log_lock_);
  *query_record = query_log_index_.find(query_id);
  ...
  return Status::OK();
}
This can leave the caller holding an invalid iterator. Although the function holds query_log_lock_ while it runs, the query_record it hands back is no longer protected by query_log_lock_ once the function returns. If the corresponding record is removed from query_log_index_ at that moment, query_record becomes an invalid iterator.
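One way to avoid handing out an iterator that outlives the lock is to return shared ownership of the record itself. The following is only a minimal sketch of that idea, not the fix applied in Impala: QueryRecord, QueryLog, Add, Evict and Get are hypothetical names, and the record is reduced to two strings.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a completed-query record (assumption, not Impala's type).
struct QueryRecord {
  std::string query_id;
  std::string profile;
};

// Hypothetical, simplified query log (assumption, not Impala's class).
class QueryLog {
 public:
  // Inserts or replaces the record for query_id.
  void Add(const std::string& query_id, const std::string& profile) {
    std::shared_ptr<const QueryRecord> rec =
        std::make_shared<QueryRecord>(QueryRecord{query_id, profile});
    std::lock_guard<std::mutex> l(lock_);
    index_[query_id] = std::move(rec);
  }

  // Removes the record from the index. Callers that already hold a
  // shared_ptr to it keep the record alive until they release it.
  void Evict(const std::string& query_id) {
    std::lock_guard<std::mutex> l(lock_);
    index_.erase(query_id);
  }

  // Returns shared ownership of the record, or nullptr if it is absent.
  // No iterator into index_ escapes the critical section.
  std::shared_ptr<const QueryRecord> Get(const std::string& query_id) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = index_.find(query_id);
    if (it == index_.end()) return nullptr;
    return it->second;
  }

 private:
  mutable std::mutex lock_;
  std::unordered_map<std::string, std::shared_ptr<const QueryRecord>> index_;
};
```

With this shape, a caller that obtained a record through Get() can keep reading it even if another thread calls Evict() for the same query ID right afterwards, because the eviction only drops the index's reference. The trade-off is one reference-count update per lookup.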
There is a very small probability that this issue may cause impalad to crash:
Stack: [0x00007f5be789f000,0x00007f5be809f000], sp=0x00007f5be8099a00, free space=8170k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [impalad+0x118b08d] impala::ImpalaServer::GetRuntimeProfileOutput(impala::TUniqueId const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, impala::TRuntimeProfileFormat::type, impala::ImpalaServer::RuntimeProfileOutput*)+0x1dd
C [impalad+0x1167213] impala::ImpalaHttpHandler::QueryProfileHelper(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*, impala::TRuntimeProfileFormat::type)+0x53d
C [impalad+0x116774a] impala::ImpalaHttpHandler::QueryProfileTextHandler(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*)+0x16
C [impalad+0x115bd0d] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > apache::thrift::to_string<std::vector<int, std::allocator<int> > >(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)+0x141
C [impalad+0x14d158b] impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::ContentType*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x177
C [impalad+0x14d5423] impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*)+0x1c71
C [impalad+0x14d5d52] impala::Webserver::BeginRequestCallbackStatic(sq_connection*)+0x20
C [impalad+0x14e5a87] impala::ScanRangesPB::~ScanRangesPB()+0x111
C [impalad+0x14e7d28] impala::ScanRangeParamsPB::_InternalSerialize(unsigned char*, google::protobuf::io::EpsCopyOutputStream*) const+0x13e
C [impalad+0x14e83ec] impala::PlanFragmentInstanceCtxPB::_InternalSerialize(unsigned char*, google::protobuf::io::EpsCopyOutputStream*) const+0x3ce
To trigger this issue, I used a script that repeatedly fetches the profile of the oldest record listed on the queries page:
#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup
import time

root = "http://localhost:25000/"
queries = root + "queries"
profile = root + "query_profile_plain_text?query_id="

while True:
    # Fetch the queries page and collect every "Details" link on it.
    response = requests.get(queries)
    soup = BeautifulSoup(response.content, "html.parser")
    details_links = soup.find_all("a", text="Details")
    # The last link on the page belongs to the oldest record.
    last_details_link = details_links[-1]
    details_url = last_details_link["href"]
    # The query ID is the last 33 characters of the link.
    query_id = details_url[-33:]
    # Request the plain-text profile and print the start of the response.
    response = requests.get(profile + query_id)
    content = response.content[0:44]
    print(content)
At the same time, I continuously executed select 1 using another script. To increase the probability of hitting the race, I added a small delay after the GetQueryRecord() call. With that in place, the crash was easy to trigger.