[IMPALA-10811] RPC to submit query getting stuck for AWS NLB forever. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 4.1.0
Component/s: None
Labels: None
Target Version: Impala 4.1.0

Description

The initial RPC that submits a query and fetches the query handle can take a long time to return, because planning and submission may execute catalog operations such as Rename or Alter Table Recover Partitions, which are slow on tables with many partitions (https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92). Attached is the profile of one such DDL query (with a few fields hidden).

These RPCs are:

1. Beeswax:

https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57

2. HS2:

https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462

One side effect of such a long-running RPC is that clients such as impala-shell that connect through an AWS NLB can get stuck forever. The reason is that the NLB tracks connections and closes them after 350 seconds of idle time, and this timeout cannot be configured. After closing the connection, however, the NLB doesn't send a TCP RST to the client. Only when the client next tries to send data does the NLB reply with a TCP RST to indicate that the connection is no longer alive. Documentation is here: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout. Hence an impala-shell that is waiting for the RPC to return stays stuck indefinitely.

Hence, we may need to evaluate techniques for the RPCs to return the query handle early and execute the later parts of the RPC asynchronously in a different thread, without blocking the RPC. That way, clients can get the query handle and poll it for state and results.
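
The submit-then-poll pattern proposed here can be illustrated with a minimal Python sketch. It is not Impala's implementation: `QueryService`, `QueryState`, `submit`, `get_state`, `fetch`, and `wait_for` are hypothetical names chosen for illustration, and the "RPC" is a plain method call. The point it shows is that `submit` returns a handle right away while the slow work runs in a separate thread, so the client only ever issues short calls.

```python
import threading
import time
import uuid
from enum import Enum


class QueryState(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"


class QueryService:
    """Hypothetical stand-in for the server side of the submit RPC."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queries = {}  # handle -> state, result, and error of one query

    def submit(self, work):
        """Register the query, start the slow part in a thread, return at once."""
        handle = uuid.uuid4().hex
        with self._lock:
            self._queries[handle] = {
                "state": QueryState.RUNNING, "result": None, "error": None}
        threading.Thread(
            target=self._run, args=(handle, work), daemon=True).start()
        return handle

    def _run(self, handle, work):
        # The slow part (for example a long catalog operation) runs here,
        # outside the submit call.
        try:
            update = {"state": QueryState.FINISHED, "result": work()}
        except Exception as exc:
            update = {"state": QueryState.ERROR, "error": str(exc)}
        with self._lock:
            self._queries[handle].update(update)

    def get_state(self, handle):
        with self._lock:
            return self._queries[handle]["state"]

    def fetch(self, handle):
        with self._lock:
            return self._queries[handle]["result"]


def wait_for(service, handle, poll_interval_s=0.05, timeout_s=5.0):
    """Client side: poll with short calls until the query leaves RUNNING."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = service.get_state(handle)
        if state is not QueryState.RUNNING:
            return state
        time.sleep(poll_interval_s)
    raise TimeoutError("query %s still running after %ss" % (handle, timeout_s))
```

In this sketch every poll is a short request, so as long as the client's poll interval is well below the load balancer's idle timeout, the connection does not sit silent while the slow operation is in progress.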

Attachments

profile+(13).txt (3 kB, 20/Jul/21 15:00, Amogh Margoor)

Issue Links

causes

- IMPALA-10989 TSAN data race during data loading (Resolved)
- IMPALA-12711 DDL/DML errors are not shown in impalad logs (Resolved)

is cloned by

- IMPALA-10812 [DOCS] RPC to submit query getting stuck for AWS NLB forever. (Open)

relates to

- IMPALA-2568 ExecuteStatement RPC (and beeswax query() RPC) should not block (Open)

People

Assignee: Qifan Chen

Reporter: Amogh Margoor

Votes: 0

Watchers: 6

Dates

Created: 20/Jul/21 14:19

Updated: 15/Jan/24 05:52

Resolved: 25/Oct/21 13:18