IMPALA-12699

Coordinator should retry GetPartialCatalogObject request and apply a recv timeout

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.4.0
    • Component/s: Catalog
    • Labels: None

    Description

      We have seen trivial GetPartialCatalogObject RPCs hang on the coordinator side, e.g. IMPALA-11409. Because metadata fetches in local-catalog mode are piggybacked (see IMPALA-7534 or the comments in CatalogdMetaProvider#loadWithCaching()), a hanging RPC for shared metadata (e.g. the db list, or the table list of a db) can block other queries.
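
      To illustrate why one hanging fetch can stall unrelated queries, here is a minimal sketch of a piggyback scheme in which concurrent requesters of the same key share a single in-flight load. It is not Impala's implementation (the real logic lives in CatalogdMetaProvider); the class InflightLoads and its methods Join and Complete are hypothetical names invented for this example.

```cpp
#include <chrono>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a metadata cache that shares in-flight loads.
// It only models the piggyback behaviour; it is not Impala code.
class InflightLoads {
 public:
  // Returns the shared future for `key`. Sets *owner to true when the caller
  // is the first requester and is therefore responsible for issuing the load.
  std::shared_future<std::string> Join(const std::string& key, bool* owner) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = inflight_.find(key);
    if (it != inflight_.end()) {
      *owner = false;  // Piggyback on the load that is already in flight.
      return it->second.future;
    }
    Entry entry;
    entry.future = entry.promise.get_future().share();
    auto inserted = inflight_.emplace(key, std::move(entry));
    *owner = true;
    return inserted.first->second.future;
  }

  // Called by the owner once its load returns; wakes every piggybacked waiter.
  void Complete(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = inflight_.find(key);
    if (it == inflight_.end()) return;
    it->second.promise.set_value(value);
    inflight_.erase(it);
  }

 private:
  struct Entry {
    std::promise<std::string> promise;
    std::shared_future<std::string> future;
  };
  std::mutex mu_;
  std::map<std::string, Entry> inflight_;
};
```

      In this sketch, if the owner's load never returns, Complete is never called and every caller that joined the same key waits indefinitely, unless it bounds its wait with something like wait_for.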

      We have also seen thrift RPCs hang in IMPALA-3575. GetPartialCatalogObject RPCs are read-only requests, so they can be retried cleanly. We should consider using a dedicated catalogd client cache for GetPartialCatalogObject requests and setting an appropriate timeout on its socket.
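
      The proposal above can be sketched roughly as follows, assuming the socket's recv timeout surfaces as an exception that the caller can catch. RecvTimeout, RetryPolicy and CallWithRetry are hypothetical names for illustration only; they are not Impala flags or classes, and the eventual fix may be structured differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical error type standing in for a socket recv timeout.
struct RecvTimeout : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical retry policy; the field names are illustrative only.
struct RetryPolicy {
  int max_attempts;
  std::chrono::milliseconds retry_interval;
};

// Invokes `rpc` until it succeeds or the attempts are exhausted. Retrying is
// only safe here because the request is read-only, so a repeated call cannot
// apply a change twice. Returns std::nullopt if every attempt timed out.
template <typename Response>
std::optional<Response> CallWithRetry(const std::function<Response()>& rpc,
                                      const RetryPolicy& policy,
                                      int* attempts_made = nullptr) {
  assert(policy.max_attempts > 0);
  for (int attempt = 1; attempt <= policy.max_attempts; ++attempt) {
    if (attempts_made != nullptr) *attempts_made = attempt;
    try {
      return rpc();
    } catch (const RecvTimeout&) {
      // Back off before the next attempt, but not after the last one.
      if (attempt < policy.max_attempts) {
        std::this_thread::sleep_for(policy.retry_interval);
      }
    }
  }
  return std::nullopt;
}
```

      A caller would wrap the actual request in the `rpc` callable, e.g. CallWithRetry<std::string>(fetch, RetryPolicy{3, std::chrono::milliseconds(100)}). Errors other than the timeout propagate unchanged in this sketch, which keeps the retry narrowly scoped to the hang case described in this issue.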

      The current catalogd client cache:
      https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
      The related flags:
      https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167

      CC wzhou

            People

              Assignee: stigahuang Quanlong Huang
              Reporter: stigahuang Quanlong Huang