Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
- None
- Labels: ghx-label-8
Description
We saw a crash in a query that aggregates the string partition column of an Avro table with MT_DOP set to 4. The query is quite simple:
create external table date_str_avro (v int) partitioned by (date_str string) stored as avro;
-- Import the files attached in this JIRA, then repeat the following query.
-- It will crash within 10 runs.
set MT_DOP=2;
select count(*), date_str from date_str_avro group by date_str;
Reproducing the crash requires a specific data set; the files and import steps are given below.
The crash also reproduces with codegen disabled ("set disable_codegen=1"). The stack trace is:
Crash reason: SIGSEGV /SEGV_MAPERR
Crash address: 0x0
Process uptime: not available
Thread 512 (crashed)
 0 impalad!impala::HashTableCtx::Hash(void const*, int, unsigned int) const [sse-util.h : 227 + 0x2]
 1 impalad!impala::HashTableCtx::HashVariableLenRow(unsigned char const*, unsigned char const*) const [hash-table.cc : 306 + 0x8]
 2 impalad!impala::HashTableCtx::HashRow(unsigned char const*, unsigned char const*) const [hash-table.cc : 255 + 0x5]
 3 impalad!void impala::GroupingAggregator::EvalAndHashPrefetchGroup<false>(impala::RowBatch*, int, impala::TPrefetchMode::type, impala::HashTableCtx*) [hash-table.inline.h : 39 + 0xe]
 4 impalad!impala::GroupingAggregator::AddBatchStreamingImpl(int, bool, impala::TPrefetchMode::type, impala::RowBatch*, impala::RowBatch*, impala::HashTableCtx*, int*) [grouping-aggregator-ir.cc : 185 + 0x1c]
 5 impalad!impala::GroupingAggregator::AddBatchStreaming(impala::RuntimeState*, impala::RowBatch*, impala::RowBatch*, bool*) [grouping-aggregator.cc : 520 + 0x2d]
 6 impalad!impala::StreamingAggregationNode::GetRowsStreaming(impala::RuntimeState*, impala::RowBatch*) [streaming-aggregation-node.cc : 120 + 0x3]
 7 impalad!impala::StreamingAggregationNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*) [streaming-aggregation-node.cc : 77 + 0x19]
 8 impalad!impala::FragmentInstanceState::ExecInternal() [fragment-instance-state.cc : 446 + 0x3]
 9 impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc : 104 + 0xb]
10 impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) [query-state.cc : 950 + 0x19]
11 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 763 + 0x3]
12 impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() [bind.hpp : 531 + 0x3]
13 impalad!thread_proxy + 0x67
14 libpthread.so.0 + 0x76ba
15 libc.so.6 + 0x1074dd
This is reproduced on commit 2733d039a of the master branch.
Reproducing the bug requires the following conditions:
- Partitioned Avro table
- MT_DOP is set to a value larger than 1
- The query performs follow-up processing (e.g. GROUP BY or JOIN) on the partition values or on the default values of fields missing from the files.
- The number of files (blocks) is larger than the number of impalads, so multiple scan fragment instances run on one impalad.
- Some scan node instances finish earlier than others, e.g. when there are both small files and large files.
Steps to import the attached Avro data files:
$ tar zxf date_str_avro.tar.gz
$ hdfs dfs -put date_str_avro/* hdfs_location_of_table_dir
impala-shell> alter table date_str_avro recover partitions;
RCA
This is a bug introduced by IMPALA-9655.
Each Avro file requires at least two scan ranges. The initial range reads the file header and initializes the template tuple; the initial scanner then issues follow-up scan ranges to read the file content. The memory of the template tuple is transferred to the ScanNode. Note that partition values are materialized into the template tuple.
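To illustrate this ownership, below is a minimal, hypothetical C++ sketch (StringSlot, MemPoolLike, TemplateTuple and MaterializeTemplate are simplified stand-ins, not Impala's actual classes): a string slot in the template tuple stores only a pointer and a length, and the bytes behind the pointer live in a pool owned by one scan node instance, so the slot is only valid while that instance is alive.

// Hypothetical sketch of the ownership described above. The names are
// illustrative stand-ins, not Impala's real classes.
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Variable-length slot in a tuple: pointer + length, no ownership of the bytes.
struct StringSlot { const char* ptr = nullptr; size_t len = 0; };

// Per-scan-node pool that owns variable-length memory; it is freed when the
// owning scan node instance closes.
class MemPoolLike {
 public:
  const char* Copy(const std::string& s) {
    auto buf = std::make_unique<char[]>(s.size());
    std::memcpy(buf.get(), s.data(), s.size());
    storage_.push_back(std::move(buf));
    return storage_.back().get();
  }
 private:
  std::vector<std::unique_ptr<char[]>> storage_;
};

// Template tuple materialized by the header scan range: partition values (and
// Avro defaults for missing fields) are written here once per file and then
// copied into every row produced for that file.
struct TemplateTuple { StringSlot date_str; };

TemplateTuple MaterializeTemplate(MemPoolLike* node_pool, const std::string& part_value) {
  TemplateTuple t;
  t.date_str.ptr = node_pool->Copy(part_value);  // bytes owned by the node's pool
  t.date_str.len = part_value.size();
  return t;
}

int main() {
  MemPoolLike scan_node_pool;
  TemplateTuple t = MaterializeTemplate(&scan_node_pool, "2022-01-01");
  // Every output row for this file shares t.date_str.ptr; the pointer is only
  // valid while scan_node_pool (i.e. the owning scan node instance) is alive.
  return t.date_str.len == 10 ? 0 : 1;
}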
After IMPALA-9655, the ranges of a file can be scheduled to different ScanNode instances when MT_DOP > 1. In the following sequence there is an illegal "heap-use-after-free" memory access, which can cause a crash; a simplified simulation of the sequence is sketched after the timeline.
t0:
Scanner of ScanNode-1 reads header of a large avro file.
Scanner of ScanNode-2 reads header of a small avro file.
Varlen memory of the template_tuple is transferred to the corresponding ScanNode.
t1:
Scanner of ScanNode-1 reads content of the small avro file.
Scanner of ScanNode-2 reads content of the large avro file.
The scanner reuses the template_tuple created by the header scanner [1], so the RowBatches produced by ScanNode-2 actually reference memory owned by ScanNode-1.
t2:
ScanNode-1 finishes first and closes (assuming no more files to read).
The downstream consumer of ScanNode-2 will crash when it accesses the partition string values.
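To make the t0-t2 sequence concrete, here is a hedged, self-contained C++ simulation of the failure (again, every name is hypothetical; this is not Impala code, only an illustration of the ownership mistake it describes): ScanNode-1's pool owns the partition string for the large file, ScanNode-2's output rows reference it, and once ScanNode-1 closes and frees its pool the downstream aggregation is left hashing freed memory.

// Hypothetical stand-alone simulation of the t0-t2 sequence above. None of
// these names are Impala's real APIs; the point is only the ownership bug:
// a row batch referencing varlen memory owned by a scan node instance that
// closes earlier.
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

struct StringValue { const char* ptr; size_t len; };

// Simplified per-instance memory pool: freed when the instance closes.
struct ScanNodeInstance {
  std::vector<char*> pool;

  // The header scanner materializes a partition value into this instance's pool.
  StringValue MaterializePartitionValue(const std::string& v) {
    char* buf = new char[v.size()];
    std::memcpy(buf, v.data(), v.size());
    pool.push_back(buf);
    return {buf, v.size()};
  }

  void Close() {  // the instance finishes and frees everything it owns
    for (char* b : pool) delete[] b;
    pool.clear();
  }
};

int main() {
  ScanNodeInstance node1, node2;

  // t0: node1's header scanner reads the large file's header and materializes
  // the template tuple; the varlen bytes are owned by node1's pool.
  StringValue date_str = node1.MaterializePartitionValue("2022-01-01");

  // t1: node2 scans the content ranges of the large file and reuses that
  // template tuple, so its output rows reference node1's memory.
  std::vector<StringValue> node2_output_rows = {date_str};

  // t2: node1 has no more ranges, closes, and frees its pool.
  node1.Close();

  // The downstream consumer of node2 (e.g. the grouping aggregator) would now
  // hash a dangling pointer: heap-use-after-free, possibly SIGSEGV.
  const StringValue& row = node2_output_rows[0];
  std::printf("row references %zu freed bytes at %p\n", row.len,
              static_cast<const void*>(row.ptr));
  // (Actually reading row.ptr here would be undefined behavior.)
  node2.Close();
  return 0;
}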
Attachments
Issue Links
- is caused by: IMPALA-9655 Dynamic intra-node load balancing for HDFS scans (Resolved)
- links to