Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
We have seen trivial GetPartialCatalogObject RPCs hang on the coordinator side, e.g. IMPALA-11409. Because metadata fetches are piggybacked in local-catalog mode (see IMPALA-7534 or the comments in CatalogdMetaProvider#loadWithCaching()), a hanging RPC for shared metadata (e.g. the db list, or the table list of a db) can block other queries.
We have also seen thrift RPCs hang in IMPALA-3575. GetPartialCatalogObject RPCs are read-only requests, so they can be cleanly retried. We should consider using a dedicated catalogd client cache for GetPartialCatalogObject requests and setting an appropriate timeout on its sockets.
The current catalogd client cache:
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
The related flags:
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167
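To make the proposal concrete, here is a minimal sketch of the retry half of the idea: once the socket has a timeout, a read-only request that times out can simply be reissued. This is not Impala code. RpcTimeout and RetryReadOnlyRpc are hypothetical names used only for illustration, and the sketch assumes the RPC layer reports a socket timeout as an exception.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for a socket timeout reported by the RPC layer.
struct RpcTimeout : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Calls a read-only (idempotent) RPC, retrying only on timeout.
// Makes up to max_attempts calls (always at least one), sleeping `backoff`
// between attempts, and rethrows the last timeout if every attempt fails.
// Any other exception propagates immediately.
template <typename Resp>
Resp RetryReadOnlyRpc(const std::function<Resp()>& rpc, int max_attempts,
                      std::chrono::milliseconds backoff) {
  for (int attempt = 1;; ++attempt) {
    try {
      return rpc();
    } catch (const RpcTimeout&) {
      if (attempt >= max_attempts) throw;  // out of attempts: surface the error
      std::this_thread::sleep_for(backoff);
    }
  }
}
```

Retrying is only safe here because the request does not modify catalog state. The same wrapper should not be applied to RPCs that write.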
CC wzhou
Issue Links
- relates to
- IMPALA-7534 Handle invalidation races in CatalogdMetaProvider cache (Resolved)
- IMPALA-3575 Impala should retry backend connection request and apply a send timeout (Resolved)
- IMPALA-11409 Skip UpdateCatalogMetrics if another thead is on-going in it (Resolved)