IMPALA / IMPALA-11503

Dropping files of Iceberg table in HiveCatalog will cause DROP TABLE to fail


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 4.1.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: ghx-label-9

    Description

      When the files of an Iceberg table are dropped, a subsequent DROP TABLE results in an error, while the table still shows up in SHOW TABLES.
      Here are the steps to repro:

      1) Run from Impala-shell

      DROP DATABASE IF EXISTS `drop_incomplete_table2` CASCADE;
      CREATE DATABASE `drop_incomplete_table2`;
      CREATE TABLE drop_incomplete_table2.iceberg_tbl (i int) stored as iceberg;
      INSERT INTO drop_incomplete_table2.iceberg_tbl VALUES (1), (2), (3); 

      2) Drop the folder of the table with hdfs dfs

      hdfs dfs -rm -r hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl 

      3) Try to drop the table from Impala-shell

      DROP TABLE drop_incomplete_table2.iceberg_tbl;
      

      This results in the following error:

      ERROR: NotFoundException: Failed to open input stream for file: hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
      CAUSED BY: FileNotFoundException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
          at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
          at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
          at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)CAUSED BY: RemoteException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
          at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
          at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
          at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894) 

      Meanwhile, the table is still present in the SHOW TABLES output, even after an INVALIDATE METADATA.

      Note: it's important for the repro to execute some SQL against the newly created table so that Impala loads it. In this case I used an INSERT INTO, but e.g. an ALTER TABLE would also work. Apparently, when the table is "incomplete" (the state right after running CREATE TABLE) the drop works fine, but not once the table is loaded.
      The suspicious part of the code is in StmtMetadataLoader.loadTables() and getMissingTables(), where there is a distinction between loaded and incomplete tables.
      https://github.com/apache/impala/blob/2f74e956aa10db5af6a7cdc47e2ad42f63d5030f/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java#L196
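
      The suspected behavior can be illustrated with a minimal, self-contained sketch. Note that the class and method names below are hypothetical stand-ins, not the actual Impala internals: the point is that dropping a loaded Iceberg table re-reads metadata.json from storage, while an incomplete table is dropped without touching any files.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the suspected failure mode (names are NOT real
// Impala classes): DROP TABLE on a *loaded* Iceberg table opens its
// metadata.json, while an *incomplete* table (the state right after
// CREATE TABLE) is dropped without any filesystem access.
public class DropIncompleteSketch {
    interface Table { String name(); }

    // Stand-in for a table whose metadata was never loaded.
    record IncompleteTable(String name) implements Table {}

    // Stand-in for a loaded table that remembers its metadata location.
    record LoadedIcebergTable(String name, String metadataJson) implements Table {}

    // Stand-in for HDFS; empty after "hdfs dfs -rm -r" removed the folder.
    static final Set<String> FILESYSTEM = new HashSet<>();

    static void drop(Table t) {
        if (t instanceof LoadedIcebergTable lt) {
            // Loaded path: metadata.json must be readable before the drop.
            if (!FILESYSTEM.contains(lt.metadataJson())) {
                throw new IllegalStateException(
                    "NotFoundException: Failed to open input stream for file: "
                        + lt.metadataJson());
            }
        }
        // Incomplete path: nothing to read, the drop succeeds.
    }

    public static void main(String[] args) {
        drop(new IncompleteTable("iceberg_tbl"));  // succeeds
        try {
            drop(new LoadedIcebergTable("iceberg_tbl",
                "/test-warehouse/drop_incomplete_table2.db/iceberg_tbl/"
                    + "metadata/00001-x.metadata.json"));
            System.out.println("dropped");
        } catch (IllegalStateException e) {
            System.out.println("DROP failed: " + e.getMessage());
        }
    }
}
```

      Under this model, the incomplete table drops cleanly while the loaded one fails with a NotFoundException, matching the repro above.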

       

      Note 2: the issue is quite similar to https://issues.apache.org/jira/browse/IMPALA-11502, but here the repro steps and the error are somewhat different.

      Attachments

        Issue Links


      People

        Assignee: Unassigned
        Reporter: Gabor Kaszab (gaborkaszab)

