[IMPALA-12233] Partitioned hash join with a limit can hang when using mt_dop>0 - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: Impala 4.3.0
Fix Version/s: Impala 4.3.0
Component/s: Backend
Labels: None
Target Version: Impala 4.3.0, Impala 4.1.3

Description

After encountering a hung query on an Impala cluster, we were able to reproduce it in the Impala developer environment with these steps:

use tpcds;
set mt_dop=2;
select ss_cdemo_sk from store_sales where ss_sold_date_sk = (select max(ss_sold_date_sk) from store_sales) group by ss_cdemo_sk limit 1;

The problem reproduces with limit values up to 183; at limit 184 and higher it does not reproduce.

Stack traces show a thread waiting on a cyclic barrier:

 0  libpthread.so.0!__pthread_cond_wait + 0x216
 1  impalad!impala::CyclicBarrier::Wait<impala::PhjBuilder::DoneProbingHashPartitions(const int64_t*, impala::BufferPool::ClientHandle*, impala::RuntimeProfile*, std::deque<std::unique_ptr<impala::PhjBuilderPartition> >*, impala::RowBatch*)::<lambda()> > [condition-variable.h : 49 + 0xc]
 2  impalad!impala::PhjBuilder::DoneProbingHashPartitions(long const*, impala::BufferPool::ClientHandle*, impala::RuntimeProfile*, std::deque<std::unique_ptr<impala::PhjBuilderPartition, std::default_delete<impala::PhjBuilderPartition> >, std::allocator<std::unique_ptr<impala::PhjBuilderPartition, std::default_delete<impala::PhjBuilderPartition> > > >*, impala::RowBatch*) [partitioned-hash-join-builder.cc : 766 + 0x25]
 3  impalad!impala::PartitionedHashJoinNode::DoneProbing(impala::RuntimeState*, impala::RowBatch*) [partitioned-hash-join-node.cc : 1189 + 0x28]
 4  impalad!impala::PartitionedHashJoinNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*) [partitioned-hash-join-node.cc : 599 + 0x15]
 5  impalad!impala::StreamingAggregationNode::GetRowsStreaming(impala::RuntimeState*, impala::RowBatch*) [streaming-aggregation-node.cc : 115 + 0x14]
 6  impalad!impala::StreamingAggregationNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*) [streaming-aggregation-node.cc : 77 + 0x15]
 7  impalad!impala::FragmentInstanceState::ExecInternal() [fragment-instance-state.cc : 446 + 0x15]
 8  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc : 104 + 0xf]
 9  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) [query-state.cc : 956 + 0xf]

After adding debug logging around the locations that go through that cyclic barrier, we see that on one impalad the barrier expects two threads but only one arrives:

I0621 18:28:19.926551 210363 partitioned-hash-join-builder.cc:766] 2a4787b28425372d:ac6bd96200000004] DoneProbingHashPartitions: num_probe_threads_=2
I0621 18:28:19.927855 210362 streaming-aggregation-node.cc:136] 2a4787b28425372d:ac6bd96200000003] the number of rows (93) returned from the streaming aggregation node has exceeded the limit of 1
I0621 18:28:19.928887 210362 query-state.cc:958] 2a4787b28425372d:ac6bd96200000003] Instance completed. instance_id=2a4787b28425372d:ac6bd96200000003 #in-flight=4 status=OK

Other instances that don't have a stuck thread see both threads arrive:

I0621 18:28:19.926223 210358 partitioned-hash-join-builder.cc:766] 2a4787b28425372d:ac6bd96200000005] DoneProbingHashPartitions: num_probe_threads_=2
I0621 18:28:19.926326 210359 partitioned-hash-join-builder.cc:766] 2a4787b28425372d:ac6bd96200000006] DoneProbingHashPartitions: num_probe_threads_=2

So, there must be a codepath that skips going through the cyclic barrier.
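
The failure mode suspected above can be illustrated with a minimal sketch. It is not Impala code: it uses Python's `threading.Barrier` as a stand-in for `impala::CyclicBarrier`, and the function name `run_probe_threads`, the `early_exit_ids` parameter, and the timeout are hypothetical, introduced only for illustration. The timeout exists so the demonstration terminates; in the reported hang the waiting thread has no such escape.

```python
import threading


def run_probe_threads(num_probe_threads, early_exit_ids, timeout_s=0.5):
    """Simulate probe threads that are all expected at one barrier.

    Threads listed in early_exit_ids return without reaching the barrier,
    standing in for a code path that finishes early (for example, because
    a limit was already satisfied). Returns {thread_id: outcome}.
    """
    # The barrier expects every probe thread, like num_probe_threads_=2 in the logs.
    barrier = threading.Barrier(num_probe_threads)
    outcomes = {}
    outcomes_lock = threading.Lock()

    def probe(thread_id):
        if thread_id in early_exit_ids:
            # Hypothetical early-exit path: never calls barrier.wait().
            result = "skipped"
        else:
            try:
                barrier.wait(timeout=timeout_s)
                result = "passed"
            except threading.BrokenBarrierError:
                # Raised when the wait times out; without the timeout
                # this thread would block forever.
                result = "stuck"
        with outcomes_lock:
            outcomes[thread_id] = result

    threads = [
        threading.Thread(target=probe, args=(i,))
        for i in range(num_probe_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


if __name__ == "__main__":
    print(run_probe_threads(2, early_exit_ids=set()))
    print(run_probe_threads(2, early_exit_ids={1}))
```

When both threads reach the barrier, both pass. When one thread takes the early-exit path, the remaining thread waits for a party that never arrives, which matches the single `DoneProbingHashPartitions` log line seen on the stuck impalad.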

People

Assignee: Gergely Fürnstáhl

Reporter: Joe McDonnell

Votes: 0

Watchers: 6

Dates

Created: 22/Jun/23 04:00

Updated: 21/Jul/23 08:22

Resolved: 20/Jul/23 20:18