IMPALA-9532

Functions can disappear when a concurrent invalidate metadata is running



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Catalog


      The global invalidate metadata takes a write lock on versionLock_. However, the locking protocol for DDLs releases versionLock_ as soon as the table-level lock is acquired. This allows a concurrent invalidate metadata to run while the DDL operation is still in progress, which can lead to race conditions. One example, shown below, causes functions to disappear from the catalog until invalidate metadata is issued again.
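      The locking pattern described above can be illustrated with a minimal, self-contained sketch. It is not Impala code: the class and method names are hypothetical, and it assumes the DDL path takes the write lock on the version lock before acquiring the table lock. It only shows that once the version lock is released early, another thread can take the write lock while the table lock is still held.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the catalog locking protocol; not Impala's classes.
final class LockProtocolSketch {

    // Returns true if a concurrent "invalidate" thread could take the version
    // write lock while the "DDL" thread is still holding the table lock.
    static boolean invalidateCanRunDuringDdl(boolean releaseVersionLockEarly) {
        ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock(true);
        ReentrantLock tableLock = new ReentrantLock();
        AtomicBoolean acquired = new AtomicBoolean(false);

        // DDL path (assumed order): version lock first, then the table lock.
        versionLock.writeLock().lock();
        tableLock.lock();
        if (releaseVersionLockEarly) {
            // Early release, as described in this issue.
            versionLock.writeLock().unlock();
        }
        try {
            // The concurrent invalidate only needs the version write lock.
            Thread invalidate = new Thread(() -> {
                if (versionLock.writeLock().tryLock()) {
                    acquired.set(true);
                    versionLock.writeLock().unlock();
                }
            });
            invalidate.start();
            invalidate.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            tableLock.unlock();
            if (!releaseVersionLockEarly) {
                versionLock.writeLock().unlock();
            }
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("early release:  " + invalidateCanRunDuringDdl(true));
        System.out.println("held until end: " + invalidateCanRunDuringDdl(false));
    }
}
```

      Holding the version lock for the whole DDL would close the window in this sketch, but whether that is an acceptable fix for the catalog is a separate question that the sketch does not address.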

      The following sequence of events reproduces this race condition:

      [localhost:21000] default> create function default.f() returns int location '/test-warehouse/libTestUdfs.so' symbol='NoArgs';
      Query: create function default.f() returns int location '/test-warehouse/libTestUdfs.so' symbol='NoArgs'
      | summary                    |
      | Function has been created. |
      Fetched 1 row(s) in 10.26s
      --> Session 2 invokes invalidate metadata concurrently
      [localhost:21001] default> invalidate metadata;
      Query: invalidate metadata
      Query submitted at: 2020-03-18 15:04:25 (Coordinator: http://vihang-Precision-21575:25001)
      Query progress can be monitored at: http://<redacted>/query_plan?query_id=d3463484ff635684:620fbfef00000000
      Fetched 0 row(s) in 4.30s
      --> In session 1, drop function reports that the function does not exist, but show functions still lists it.
      [localhost:21000] default> drop function f();
      Query: drop function f()
      ERROR: CatalogException: Function: f() does not exist.
      [localhost:21000] default> show functions;
      Query: show functions
      | return type | signature | binary type | is persistent |
      | INT         | f()       | NATIVE      | true          |
      Fetched 1 row(s) in 0.01s
      [localhost:21000] default> 
      --> Session 2 never sees the function f:
      [localhost:21001] default> show functions;
      Query: show functions
      Fetched 0 row(s) in 0.00s

      While the create function statement executes in CatalogOpExecutor, it calls alterDatabase on HMS to persist the new db parameters here: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1409

      Note that versionLock_ has already been released by line 1409. Meanwhile, a concurrent invalidate metadata fetches the db params from HMS here https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1326 and these override the parameters of the newly created Db object. This effectively removes the function from the parameters, because the alterDatabase call from the first operation has not yet been committed in HMS.

      All subsequent show functions and drop function commands then return inconsistent results. I was able to reproduce this race condition by adding a sleep statement just before the alterDatabase call in the createFunction method.
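      The interleaving can be modelled with a small deterministic simulation. This is only a sketch under stated assumptions: the two maps are stand-ins for the db parameters in HMS and in the catalog's Db object, the parameter key is made up, latches replace the sleep so the ordering is fixed, and the DDL is assumed to register the function in memory before it reaches alterDatabase. None of the names below come from the Impala code base.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the race; the maps stand in for HMS and catalog state.
final class FunctionRaceSketch {
    static final String FN_KEY = "fn:default.f()"; // made-up parameter key

    // Runs the interleaving once and returns {hmsHasFn, catalogHasFn}.
    static boolean[] run() {
        Map<String, String> hmsDbParams = new ConcurrentHashMap<>();
        Map<String, String> catalogDbParams = new ConcurrentHashMap<>();
        CountDownLatch ddlPaused = new CountDownLatch(1);
        CountDownLatch invalidateDone = new CountDownLatch(1);

        Thread ddl = new Thread(() -> {
            // Assumption: the function is registered in memory first.
            catalogDbParams.put(FN_KEY, "NoArgs");
            ddlPaused.countDown();
            // Stands in for the sleep placed just before alterDatabase.
            await(invalidateDone);
            // Late commit of the db parameters to "HMS".
            hmsDbParams.put(FN_KEY, "NoArgs");
        });

        Thread invalidate = new Thread(() -> {
            await(ddlPaused);
            // Rebuild the catalog view from "HMS", which does not have f yet.
            catalogDbParams.clear();
            catalogDbParams.putAll(hmsDbParams);
            invalidateDone.countDown();
        });

        ddl.start();
        invalidate.start();
        join(ddl);
        join(invalidate);
        return new boolean[] {
            hmsDbParams.containsKey(FN_KEY), catalogDbParams.containsKey(FN_KEY)
        };
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("HMS has f: " + r[0] + ", catalog has f: " + r[1]);
    }
}
```

      In this model the function ends up persisted in the HMS stand-in but missing from the catalog stand-in, which matches the observation that a second invalidate metadata makes the function visible again.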

      Note: The code links above are based on commit hash 7dd13f72784514a59f82c9a7a5e2250503dbfaf0




            • Assignee:
              vihangk1 Vihang Karajgaonkar