The call trace for executing a REFRESH statement in Catalogd is
In CatalogServiceCatalog#reloadTable(), the Table object may be stale if a concurrent reset, i.e. INVALIDATE METADATA, is running. CatalogServiceCatalog#reloadTable() will then return the thrift object of a stale Table. That object can't be found in either the catalog cache or the topicUpdateLog_, so waitForSyncDdlVersion will eventually hang or run out of attempts.
Here is an example. Let's say table1 is an unpartitioned table and is already loaded. Two queries, "Refresh table1" and "Invalidate metadata", are running concurrently.
Thread-1 (Refresh table1):
- Gets the Table object in CatalogServiceCatalog#execResetMetadata and goes into reloadTable. The catalog version of table1 is 50.
- Waits for both the version lock and the table lock here: https://github.com/apache/impala/blob/a1588e44980c648cb7f9263cbd0409abfbaeacf7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2023
Thread-2 (Invalidate Metadata):
- Holds the version lock and replaces the whole catalog cache with a new one. This makes all existing catalog objects stale. The catalog version of table1 in the new cache is now 90.
- Releases the version lock.
Thread-1 (Refresh table1):
- Gets the version lock and the table lock.
- Gets a new catalog version, let's say 100, then releases the version lock.
- Loads the metadata into the stale Table object, bumping its catalog version from 50 to 100.
- Returns the thrift object of the updated stale object from reloadTable.
- Goes into waitForSyncDdlVersion and waits until an update of table1 is sent with a version >= 100.
However, table1 in the catalog cache has version 90. Unless there is another update on this table, Thread-1 will hang or run out of attempts while waiting for the expected update.
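
The interleaving above can be replayed single-threaded with a small model. This is only a sketch under stated assumptions, not Impala code: StaleReloadSketch, its nested Table class, and the methods advanceTo, addTable, reset and canSatisfyWait are hypothetical stand-ins, and locking is left out because the steps are run in the order the two threads would take them. canSatisfyWait approximates the condition waitForSyncDdlVersion is waiting on as "the cached object's version is >= the returned version".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the catalog (not Impala code).
class StaleReloadSketch {
    static final class Table {
        final String name;
        long catalogVersion;

        Table(String name, long catalogVersion) {
            this.name = name;
            this.catalogVersion = catalogVersion;
        }
    }

    private long catalogVersion;                        // global version counter
    private Map<String, Table> cache = new HashMap<>(); // the catalog cache

    StaleReloadSketch(long initialVersion) {
        this.catalogVersion = initialVersion;
    }

    // Hypothetical helper: pretend unrelated operations consumed versions.
    void advanceTo(long version) {
        catalogVersion = Math.max(catalogVersion, version);
    }

    Table addTable(String name) {
        Table t = new Table(name, ++catalogVersion);
        cache.put(name, t);
        return t;
    }

    Table getTable(String name) {
        return cache.get(name);
    }

    // Models INVALIDATE METADATA: every cached object is replaced by a new one,
    // so references obtained earlier become stale.
    void reset() {
        Map<String, Table> fresh = new HashMap<>();
        for (String name : cache.keySet()) {
            fresh.put(name, new Table(name, ++catalogVersion));
        }
        cache = fresh;
    }

    // Models the problematic reload: it bumps the version of whatever object
    // it was handed, without checking that the object is still in the cache.
    Table reloadTable(Table tbl) {
        tbl.catalogVersion = ++catalogVersion;
        return tbl;
    }

    // Approximates what the waiter needs: the object that will actually be
    // sent (the cached one) must reach the version of the returned object.
    boolean canSatisfyWait(Table returned) {
        Table cached = cache.get(returned.name);
        return cached != null && cached.catalogVersion >= returned.catalogVersion;
    }

    public static void main(String[] args) {
        StaleReloadSketch catalog = new StaleReloadSketch(49);
        catalog.addTable("table1");                  // table1 at version 50
        Table stale = catalog.getTable("table1");    // Thread-1 gets the Table object
        catalog.advanceTo(89);
        catalog.reset();                             // Thread-2: new table1 at version 90
        catalog.advanceTo(99);
        Table returned = catalog.reloadTable(stale); // Thread-1: stale object now at 100
        System.out.println("returned version=" + returned.catalogVersion
            + ", cached version=" + catalog.getTable("table1").catalogVersion
            + ", wait can be satisfied=" + catalog.canSatisfyWait(returned));
    }
}
```

In this model the returned object is at version 100 while the cached table1 stays at 90, so the wait condition cannot become true without another update to table1.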