[IMPALA-6685] Improve profile in KrpcDataStreamRecvr and KrpcDataStreamSender - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 3.0, Impala 2.12.0
Fix Version/s: Impala 3.0, Impala 2.12.0
Component/s: Distributed Exec
Labels:
- observability

Description

The existing profiles in KrpcDataStreamRecvr and KrpcDataStreamSender made it hard to diagnose slow queries such as those shown in IMPALA-6657. In particular, there are times when the receiver's profile shows a lot of time spent waiting for row batches to arrive while the sender's profile also shows a lot of time spent waiting for responses to the TransmitData() RPC.

A few improvements can be made to make the problem slightly easier to diagnose:

- track the number of deferred row batches over time in KrpcDataStreamRecvr
- track the number of bytes dequeued over time in KrpcDataStreamRecvr
- track the amount of time row batches spend in the deferred queue
- track the number of bytes sent from KrpcDataStreamSender over time
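
The counters listed above could be modelled roughly as follows. This is only a minimal sketch in Python for illustration, not Impala's C++ implementation: the class names `ReceiverStats` and `SenderStats`, their methods, and the caller-supplied `now` timestamps are all hypothetical.

```python
from collections import deque


class ReceiverStats:
    """Hypothetical receiver-side counters; not Impala's actual classes."""

    def __init__(self):
        self.deferred_arrivals = deque()   # arrival time of each deferred batch
        self.total_deferred_time = 0.0     # summed time batches spent deferred
        self.bytes_dequeued = 0            # running total taken by the consumer
        self.deferred_series = []          # (time, deferred batch count) samples
        self.bytes_dequeued_series = []    # (time, bytes dequeued so far) samples

    def defer(self, now):
        """A batch could not be enqueued and was placed in the deferred queue."""
        self.deferred_arrivals.append(now)

    def undefer(self, now):
        """The oldest deferred batch left the deferred queue."""
        self.total_deferred_time += now - self.deferred_arrivals.popleft()

    def dequeue(self, num_bytes):
        """The consumer dequeued a row batch of num_bytes."""
        self.bytes_dequeued += num_bytes

    def sample(self, now):
        """Record one point of each time series; call at a fixed interval."""
        self.deferred_series.append((now, len(self.deferred_arrivals)))
        self.bytes_dequeued_series.append((now, self.bytes_dequeued))


class SenderStats:
    """Hypothetical sender-side counter for bytes sent over time."""

    def __init__(self):
        self.bytes_sent = 0                # running total of bytes sent
        self.bytes_sent_series = []        # (time, bytes sent so far) samples

    def sent(self, num_bytes):
        """The sender transmitted num_bytes."""
        self.bytes_sent += num_bytes

    def sample(self, now):
        """Record one point of the bytes-sent time series."""
        self.bytes_sent_series.append((now, self.bytes_sent))
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real implementation would presumably read a monotonic clock and sample on a periodic timer.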

The above items help identify cases in which one fragment instance containing an exchange node is slow for a period of time (e.g. the parent of the exchange node spills heavily), causing all senders to that fragment instance to block waiting for responses. Because all senders are blocked waiting for their previous RPCs to complete, they produce no more rows, and all other fragment instances are starved, leading to the high wait time shown in their receivers' profiles. The time series counter for the number of deferred row batches in a receiver helps identify such cases.
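
As a rough illustration of how such a time series might be read, the sketch below flags receivers whose deferred row batch count stays above zero for several consecutive samples. The heuristic, the function names, the `min_run` threshold, and the instance names are assumptions made for this example and are not part of Impala.

```python
from typing import Dict, List


def longest_nonzero_run(samples: List[int]) -> int:
    """Length of the longest streak of consecutive samples greater than zero."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > 0 else 0
        longest = max(longest, current)
    return longest


def find_backlogged_receivers(
    deferred_by_instance: Dict[str, List[int]], min_run: int = 3
) -> List[str]:
    """Return instances whose deferred batch count stayed nonzero for at
    least min_run consecutive samples (hypothetical heuristic)."""
    return sorted(
        name
        for name, samples in deferred_by_instance.items()
        if longest_nonzero_run(samples) >= min_run
    )


# Made-up sample data: only instance-b shows a sustained backlog.
series = {
    "instance-a": [0, 0, 1, 0, 0],
    "instance-b": [0, 4, 7, 9, 6, 0],
    "instance-c": [0, 0, 0],
}
backlogged = find_backlogged_receivers(series)
```

A receiver flagged this way would be the first place to look for a slow consumer, since the senders blocked on it are the likely cause of starvation elsewhere.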

Issue Links

breaks

IMPALA-7449 TotalNetworkThroughput in KrpcDataStreamSender is broken

Resolved

duplicates

IMPALA-515 add a time-series for bytes sent in the datastreamsender node

Resolved

is related to

IMPALA-6692 When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster

Reopened

IMPALA-10137 Network Debugging / Supportability Improvements

Open

IMPALA-5473 Make diagnosing network issues easier

Open

relates to

IMPALA-10241 Impala Doc: RPC troubleshooting guide

Open

People

Assignee: Michael Ho

Reporter: Michael Ho

Votes: 0

Watchers: 2

Dates

Created: 16/Mar/18 06:45

Updated: 14/Oct/20 19:39

Resolved: 29/Mar/18 06:52