Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
Description
In the HdfsScanner, a RowBatch takes ownership of some io_buffers once the scanner has completed some I/O reads:
// We need to pass the row batch to the scan node if there is too much memory attached,
// which can happen if the query is very selective. We need to release memory even
// if no rows passed predicates.
if (batch_->AtCapacity() || context_->num_completed_io_buffers() > 0) {
  context_->ReleaseCompletedResources(batch_, /* done */ false);
}
When the row batch is reset, the io_buffers are freed or returned to the mem_pool:
if (!FLAGS_disable_mem_pools && free_buffers_[idx].size() < FLAGS_max_free_io_buffers) {
  free_buffers_[idx].push_back(buffer);
  if (ImpaladMetrics::IO_MGR_NUM_UNUSED_BUFFERS != NULL) {
    ImpaladMetrics::IO_MGR_NUM_UNUSED_BUFFERS->Increment(1L);
  }
} else {
  process_mem_tracker_->Release(buffer_size);
  num_allocated_buffers_.Add(-1);
  delete[] buffer;
  if (ImpaladMetrics::IO_MGR_NUM_BUFFERS != NULL) {
    ImpaladMetrics::IO_MGR_NUM_BUFFERS->Increment(-1L);
  }
  if (ImpaladMetrics::IO_MGR_TOTAL_BYTES != NULL) {
    ImpaladMetrics::IO_MGR_TOTAL_BYTES->Increment(-buffer_size);
  }
}
Here is the bug: an io_buffer owned by row batch A may also be referenced by row batch B, which is returned by the next ScanNode::GetNext. By the time row batch B is read, the io_buffer may already have been released because row batch A has been reset. For example:
1. In the scanner, row batch A is produced and takes ownership of io_buffer O1.
2. Row batch A is consumed, and io_buffer O1 is released.
3. In the scanner, row batch B is produced, but some tuples in row batch B (for example, string tuples) point into io_buffer O1. Especially when row batch A hits AtCapacity(), the io_buffer is very likely referenced by more than just row batch A.
4. When row batch B is consumed, some tuples contain garbage data, because io_buffer O1 has already been released.
This bug is easy to reproduce with the startup option "disable_mem_pools=true", because in that case the io_buffers are actually freed instead of being returned to the mem_pool.
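The following standalone sketch (hypothetical names, not the real DiskIoMgr code) illustrates why disable_mem_pools=true surfaces the bug so readily: with pooling, a returned I/O buffer normally sits on a free list and its bytes linger, so a stale tuple pointer tends to produce silently wrong results; with pools disabled the buffer is delete[]'d immediately, so the stale read becomes a hard use-after-free that crashes or trips ASAN.

// Illustrative sketch only; names are hypothetical, not Impala's classes.
#include <memory>
#include <vector>

struct BufferRecycler {
  bool disable_mem_pools = false;
  std::vector<std::unique_ptr<char[]>> free_list;

  void ReturnBuffer(std::unique_ptr<char[]> buffer) {
    if (!disable_mem_pools) {
      free_list.push_back(std::move(buffer));  // bytes remain readable by luck
    } else {
      buffer.reset();  // freed immediately; any stale pointer now dangles
    }
  }
};

int main() {
  BufferRecycler recycler;
  recycler.disable_mem_pools = true;  // mirrors the startup flag
  recycler.ReturnBuffer(std::make_unique<char[]>(8 * 1024 * 1024));
  return 0;
}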
Attachments
Issue Links
is duplicated by: IMPALA-4715 Aggregate functions on nested type sometimes returns wrong results (Resolved)
Activity
git status
# On branch cdh5-2.7.0_5.9.0
${IMPALA_HOME}/bin/start-impala-cluster.py --build_type=release -s 3 --impalad_args '--disable_mem_pools=true' --log_level 1
impala-shell.sh
[localhost.localdomain:21000] > select distinct p_type from bugtest where p_type='yydbyydb';
Query submitted at: 2016-11-29 00:25:27 (Coordinator: http://10-10-212-60:25000)
Query progress can be monitored at: http://10-10-212-60:25000/query_plan?query_id=4846650025a1f96b:a3c5552100000000
+------------+
| p_type     |
+------------+
| yydbyydb   |
|            |
+------------+
Fetched 2 row(s) in 7.64s
[localhost.localdomain:21000] >
Did you see a crash or incorrect results? I wasn't able to reproduce any problems when running with disable_mem_pools=true. I tried a few different file formats.
I don't understand based on your description why row batch B would reference memory from an IO buffer that was attached earlier. If your scenario is possible, the bug is that row batch B is referencing io_buffer O1.
We have a wiki page that describes the memory transfer model here: https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches
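Restating the contract from that page as a minimal sketch (hypothetical bookkeeping, not Impala's actual classes): every buffer a batch's tuples reference must be attached to that batch, or to something guaranteed to outlive it, before the batch is handed to its consumer. Under that contract, the scenario in the description would be a producer-side violation: row batch B's tuples reference O1 while only row batch A owns it.

#include <cassert>
#include <cstdio>
#include <set>

// Hypothetical bookkeeping: which buffers a batch's tuples point into versus
// which buffers the batch actually owns.
struct BatchOwnership {
  std::set<const void*> referenced;  // buffers tuples point into
  std::set<const void*> attached;    // buffers this batch owns

  // The hand-off invariant implied by the memory transfer model, assuming no
  // longer-lived owner (e.g. a later batch) is guaranteed for these buffers.
  bool HandOffIsSafe() const {
    for (const void* buf : referenced) {
      if (attached.count(buf) == 0) return false;  // "B referencing O1" lands here
    }
    return true;
  }
};

int main() {
  char io_buffer_o1[64];
  BatchOwnership batch_b;
  batch_b.referenced.insert(io_buffer_o1);  // tuples point into O1 ...
  // ... but O1 was attached to batch A, not B.
  std::printf("hand-off safe: %d\n", batch_b.HandOffIsSafe());  // prints 0
  return 0;
}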
I wasn't able to reproduce the bug locally or see a memory management error in the code. It's not clear from the description whether there was a crash or other symptoms.
Feel free to reopen if you have more info.
I can reproduce the bug in my test environment, but the test data is too large (about ~10 GB) to provide. But I have already done some debugging and tried to track down the cause of the bug, as follows:
1. Row batch A is initialized in HdfsParquetScanner::StartNewParquetRowBatch.
2. In HdfsParquetScanner::AssembleRows, the scanner reads one data page into io_buffer O1 and fills scratch_batch_.
3. In TransferScratchTuples, tuple rows are transferred from scratch_batch_ to row batch A. In this case, row batch A becomes full and is returned, but some tuple rows are still left in scratch_batch_.
4. In CommitRows, because row batch A is full, io_buffer O1 is attached to row batch A in ReleaseCompletedResources.
5. Row batch B is initialized in HdfsParquetScanner::StartNewParquetRowBatch.
6. In HdfsParquetScanner::GetNextInternal, the remaining tuple rows in scratch_batch_ are added to row batch B, but these tuple rows reference memory in io_buffer O1.
7. Row batch A is consumed, and io_buffer O1 is released.
8. Row batch B now contains invalid data.
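To make the window in these steps concrete, here is a minimal sketch (hypothetical types, not the real HdfsParquetScanner/RowBatch code): the hazard is that the page's I/O buffer is attached to output batch A at step 4 while leftover scratch rows that still point into it are later copied into output batch B at step 6.

#include <cstddef>
#include <memory>
#include <string_view>
#include <vector>

struct OutputBatch {
  std::vector<std::string_view> rows;               // point into I/O buffers
  std::vector<std::unique_ptr<char[]>> attached;    // buffers this batch owns
  std::size_t capacity = 2;
  bool AtCapacity() const { return rows.size() >= capacity; }
  void Reset() { rows.clear(); attached.clear(); }  // frees attached buffers
};

int main() {
  // Step 2: one data page read into I/O buffer O1; scratch rows point into it.
  auto o1 = std::make_unique<char[]>(16);
  std::vector<std::string_view> scratch = {
      {o1.get(), 4}, {o1.get() + 4, 4}, {o1.get() + 8, 4}};

  // Step 3: transfer rows until output batch A is full; one row is left over.
  OutputBatch a;
  std::size_t i = 0;
  while (i < scratch.size() && !a.AtCapacity()) a.rows.push_back(scratch[i++]);

  // Step 4: A is full, so O1 is attached to A (ReleaseCompletedResources).
  a.attached.push_back(std::move(o1));

  // Step 6: the leftover scratch row goes into batch B, still pointing at O1.
  OutputBatch b;
  b.rows.push_back(scratch[i]);

  // Step 7: A is consumed and reset; O1 is freed while B still references it.
  a.Reset();

  // Step 8: reading b.rows[0] is now a use-after-free (see the ASAN report below).
  return 0;
}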
Thanks for the more detailed explanation. Can you please tell me what symptoms you saw so I can know whether I reproduced it - did you see incorrect results, a crash, or something else?
I think this may require non-dictionary-encoded string columns to trigger, which may explain why we haven't seen it.
This looks like a real bug - the root cause is that resources can be attached to an output batch while still referenced by rows in scratch_batch_.
I was able to reproduce under ASAN with --disable_mem_pools=true. This only seems to be reproducible with plain-encoded, uncompressed strings. I reproduced it more quickly by hacking the Parquet writer to always write plain-encoded data, but the writer will automatically switch to plain encoding when writing distinct UUID values.
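For what it's worth, a hedged sketch of why the encoding matters (hypothetical helper names; the real decoders live in the Parquet column readers): with uncompressed plain encoding the decoded string slot can alias the data page, i.e. the I/O buffer itself, while dictionary-encoded values alias dictionary storage and compressed pages are decompressed into a separate scanner-owned buffer, so only the plain/uncompressed path leaves tuples pointing into a buffer that can be released out from under them.

#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Plain-encoded BYTE_ARRAY value on an uncompressed page: a 4-byte little-
// endian length prefix followed by the bytes. The returned view aliases the
// page buffer.
std::string_view DecodePlainString(const uint8_t* page_pos) {
  uint32_t len;
  std::memcpy(&len, page_pos, sizeof(len));
  return {reinterpret_cast<const char*>(page_pos) + sizeof(len), len};
}

// Dictionary-encoded value: the returned view aliases dictionary storage,
// whose lifetime is managed separately from the data page's I/O buffer.
std::string_view DecodeDictString(const std::vector<std::string>& dict,
                                  uint32_t index) {
  return dict[index];
}

int main() {
  const uint8_t page[] = {8, 0, 0, 0, 'y', 'y', 'd', 'b', 'y', 'y', 'd', 'b'};
  std::vector<std::string> dict = {"yydbyydb"};
  std::string_view from_page = DecodePlainString(page);  // dangles if page is freed
  std::string_view from_dict = DecodeDictString(dict, 0);
  return from_page.size() == from_dict.size() ? 0 : 1;
}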
start-impala-cluster.py --impalad_args="--disable_mem_pools=true"
[tarmstrong-box.ca.cloudera.com:21000] > set compression_codec=none;
COMPRESSION_CODEC set to none
[tarmstrong-box.ca.cloudera.com:21000] > create table big_uuids stored as parquet as select uuid() from tpch_20_parquet.lineitem;
Query: create table big_uuids stored as parquet as select uuid() from tpch_20_parquet.lineitem
Query submitted at: 2016-12-07 13:08:37 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=b74cc52a10d50c79:a367b9600000000
+---------------------------+
| summary                   |
+---------------------------+
| Inserted 119994608 row(s) |
+---------------------------+
Fetched 1 row(s) in 69.20s
[tarmstrong-box.ca.cloudera.com:21000] > select ndv(_c0) from big_uuids where substring(_c0, 1, 2) != 'a1' limit 10;
Query: select ndv(_c0) from big_uuids where substring(_c0, 1, 2) != 'a1' limit 10
Query submitted at: 2016-12-07 13:10:54 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=204990812e8ba7ab:83b7aa3d00000000
WARNINGS: Cancelled due to unreachable impalad(s): tarmstrong-box:22002
=================================================================
==10656==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f15c0782354 at pc 0x000000f9cbf9 bp 0x7f162dec3240 sp 0x7f162dec29f0
READ of size 2 at 0x7f15c0782354 thread T316
    #0 0xf9cbf8 in __interceptor_strncmp /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:253
    #1 0x7f1641b70244 (<unknown module>)
    #2 0x17e5a79 in impala::HdfsParquetScanner::TransferScratchTuples(impala::RowBatch*) /tmp/be/src/exec/hdfs-parquet-scanner.cc:642:25
    #3 0x17e3ced in impala::HdfsParquetScanner::GetNextInternal(impala::RowBatch*) /tmp/be/src/exec/hdfs-parquet-scanner.cc:389:29
    #4 0x17e36c4 in impala::HdfsParquetScanner::ProcessSplit() /tmp/be/src/exec/hdfs-parquet-scanner.cc:351:31
    #5 0x1775375 in impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, std::allocator<impala::FilterContext> > const&, impala::DiskIoMgr::ScanRange*) /tmp/be/src/exec/hdfs-scan-node.cc:539:12
    #6 0x1774541 in impala::HdfsScanNode::ScannerThread() /tmp/be/src/exec/hdfs-scan-node.cc:430:16
    #7 0x177e457 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::HdfsScanNode>, boost::_bi::list1<boost::_bi::value<impala::HdfsScanNode*> > >::operator()() /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20:16
    #8 0x12eb4e2 in boost::function0<void>::operator()() const /opt/Impala-Toolchain/boost-1.57.0/include/boost/function/function_template.hpp:766:14
    #9 0x16b21f5 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) /tmp/be/src/util/thread.cc:317:3
    #10 0x16bafda in void boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457:9
    #11 0x16bae67 in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20:16
    #12 0x1cfe579 in thread_proxy (/home/tarmstrong/Impala/incubator-impala/be/build/debug/service/impalad+0x1cfe579)
    #13 0x7f18a3920709 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7709)
    #14 0x7f18a344082c in clone /build/glibc-Qz8a69/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109

0x7f15c0782354 is located 8313684 bytes inside of 8388608-byte region [0x7f15bff94800,0x7f15c0794800)
freed by thread T310 here:
    #0 0x1025350 in operator delete[](void*) /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_new_delete.cc:98
    #1 0x13b3d5f in impala::DiskIoMgr::FreeBufferMemory(impala::DiskIoMgr::BufferDescriptor*) /tmp/be/src/runtime/disk-io-mgr.cc:806:7
    #2 0x13ae80c in impala::DiskIoMgr::ReturnBuffer(impala::DiskIoMgr::BufferDescriptor*) /tmp/be/src/runtime/disk-io-mgr.cc:664:7
    #3 0x13ae641 in impala::DiskIoMgr::BufferDescriptor::Return() /tmp/be/src/runtime/disk-io-mgr.cc:259:3
    #4 0x13eb4d6 in impala::RowBatch::Reset() /tmp/be/src/runtime/row-batch.cc:314:5
    #5 0x18852dc in impala::PartitionedAggregationNode::Open(impala::RuntimeState*) /tmp/be/src/exec/partitioned-aggregation-node.cc:354:5
    #6 0x1c86f02 in impala::PlanFragmentExecutor::OpenInternal() /tmp/be/src/runtime/plan-fragment-executor.cc:329:31
    #7 0x1c86872 in impala::PlanFragmentExecutor::Open() /tmp/be/src/runtime/plan-fragment-executor.cc:301:14
    #8 0x158c3a5 in impala::FragmentMgr::FragmentExecState::Exec() /tmp/be/src/service/fragment-exec-state.cc:53:9
    #9 0x157fd75 in impala::FragmentMgr::FragmentThread(impala::TUniqueId) /tmp/be/src/service/fragment-mgr.cc:86:3
    #10 0x15853fd in boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>::operator()(impala::FragmentMgr*, impala::TUniqueId) const /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165:16
    #11 0x1585257 in void boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> >::operator()<boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>&, boost::_bi::list0&, int) /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313:9
    #12 0x1585107 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >::operator()() /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20:16
    #13 0x12eb4e2 in boost::function0<void>::operator()() const /opt/Impala-Toolchain/boost-1.57.0/include/boost/function/function_template.hpp:766:14
    #14 0x16b21f5 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) /tmp/be/src/util/thread.cc:317:3
    #15 0x16bafda in void boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457:9
    #16 0x16bae67 in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() /opt/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20:16
    #17 0x1cfe579 in thread_proxy (/home/tarmstrong/Impala/incubator-impala/be/build/debug/service/impalad+0x1cfe579)

previously allocated by thread T226 here:
    #0 0x1024d50 in operator new[](unsigned long) /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.8.0.src-p1/projects/compiler-rt/lib/asan/asan_new_delete.cc:64
tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ gvim be/src/exec/hdfs-parquet-scanner.cc
I also hit this DCHECK while experimenting with some interesting row batch/I/O buffer size combinations.
F1207 14:18:50.471175 20525 hdfs-parquet-scanner.cc:257] Check failed: scratch_batch_->mem_pool()->total_allocated_bytes() == 0 (17408 vs. 0)
*** Check failure stack trace: ***
    @ 0x2ae7c36 google::DumpStackTraceAndExit()
    @ 0x2ae10ad google::LogMessage::Fail()
    @ 0x2ae39d6 google::LogMessage::SendToLog()
    @ 0x2ae0bcd google::LogMessage::Flush()
    @ 0x2ae447e google::LogMessageFatal::~LogMessageFatal()
    @ 0x17e30b8 impala::HdfsParquetScanner::Close()
    @ 0x17a92be impala::HdfsScanNodeMt::Close()
    @ 0x173ceb7 impala::ExecNode::Close()
    @ 0x188c449 impala::PartitionedAggregationNode::Close()
    @ 0x1c89453 impala::PlanFragmentExecutor::Close()
    @ 0x158c400 impala::FragmentMgr::FragmentExecState::Exec()
    @ 0x157fd76 impala::FragmentMgr::FragmentThread()
    @ 0x15853fe boost::_mfi::mf1<>::operator()()
    @ 0x1585258 boost::_bi::list2<>::operator()<>()
    @ 0x1585108 boost::_bi::bind_t<>::operator()()
    @ 0x12eb4e3 boost::function0<>::operator()()
    @ 0x16b21f6 impala::Thread::SuperviseThread()
    @ 0x16bafdb boost::_bi::list4<>::operator()<>()
    @ 0x16bae68 boost::_bi::bind_t<>::operator()()
    @ 0x1cfe5ba thread_proxy
    @ 0x7fa856a2070a start_thread
    @ 0x7fa85654082d (unknown)
Picked up JAVA_TOOL_OPTIONS: -agentlib:jdwp=transport=dt_socket,address=30000,server=y,suspend=n
IMPALA-4539: fix bug when scratch batch references I/O buffers
The bug occurs when scanning uncompressed plain-encoded Parquet
string data.
Testing:
I could manually reproduce it reliably with ASAN and
--disable_mem_pools=true using the following steps:
- The repro is more predictable when there is no row batch queue.
set mt_dop=1;
- A unique string column should switch to plain encoding
set compression_codec=none;
create table big_uuids stored as parquet as select uuid() from tpch_20_parquet.lineitem;
- The repro requires that some rows are filtered out, so that we end
- up in a state where the output batch is full before all rows are
- copied from the scratch batch
select ndv(_c0) from big_uuids where substring(_c0, 1, 2) != 'a1' limit 10;
After the fix it no longer reproduces.
I do not yet have a practical test case that triggers the bug on a
normal ASAN setup. I will continue to try to create one.
Change-Id: Ic27e7251e0f633cb694b506f6eb62beed6e66ad9
Reviewed-on: http://gerrit.cloudera.org:8080/5406
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
—
M be/src/exec/hdfs-parquet-scanner.cc
1 file changed, 10 insertions, 8 deletions
Approvals:
Impala Public Jenkins: Verified
Tim Armstrong: Looks good to me, approved
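Without restating the patch itself (the real change is in the Gerrit link above and may differ in detail), the shape of one way to close the window Tim describes can be sketched as follows, with hypothetical names: do not hand completed I/O buffers to an output batch while the scratch batch still holds rows that may point into them; defer the transfer until the scratch batch is fully drained, so the buffers travel with the last batch that references them.

// Illustrative only; hypothetical types, not the actual IMPALA-4539 patch.
#include <cstddef>
#include <memory>
#include <vector>

struct IoBuffer {};
struct OutputBatch { std::vector<std::unique_ptr<IoBuffer>> attached; };

struct ScratchBatchSketch {
  std::size_t num_rows = 0;
  std::size_t next_row_to_transfer = 0;
  bool FullyDrained() const { return next_row_to_transfer == num_rows; }
};

void MaybeReleaseCompletedResources(
    const ScratchBatchSketch& scratch, OutputBatch* out,
    std::vector<std::unique_ptr<IoBuffer>>* completed_io) {
  // Defer the transfer while any scratch row may still reference these buffers.
  if (!scratch.FullyDrained()) return;
  for (auto& buf : *completed_io) out->attached.push_back(std::move(buf));
  completed_io->clear();
}

int main() {
  std::vector<std::unique_ptr<IoBuffer>> completed_io;
  completed_io.push_back(std::make_unique<IoBuffer>());
  ScratchBatchSketch scratch{/*num_rows=*/3, /*next_row_to_transfer=*/2};
  OutputBatch out;
  MaybeReleaseCompletedResources(scratch, &out, &completed_io);
  // Nothing transferred yet: one scratch row still references the buffers.
  return out.attached.empty() ? 0 : 1;
}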
In addition to Tim's instructions above, I've found that lowering the batch_size really helps with reproducing the issue, even with scan nodes that have a row batch queue. For example, a batch_size of 100 reproduced the issue reliably for me.
What release version/git hash did you see this on? Do you have more information about what query you ran to reproduce the problem and the symptoms of the problem?