Hadoop HDFS / HDFS-14479

Closing of HDFS file handle can be quite slow if file was deleted under the client

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      In IMPALA-7176 we saw that hdfsClose() could sometimes take upwards of a second if the directory containing the file was deleted from underneath the writer. The error looks like this:

      Error(2): No such file or directory
      Root cause: RemoteException: File does not exist: /test-warehouse/functional_parquet.db/alltypesinsert/_impala_insert_staging/3f4e3729014920fd_6348d0a700000000/.3f4e3729014920fd-6348d0a700000006_1116334753_dir/year=2009/month=90/3f4e3729014920fd-6348d0a700000006_1538863142_data.0.parq (inode 111416) [Lease.  Holder: DFSClient_NONMAPREDUCE_1345999117_1, pending creates: 237]
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2782)
              at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
              at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2661)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
              at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
      

      A second doesn't sound so bad, but it really adds up if you have a significant number of files open (e.g. when inserting into a partitioned Hive table): just cleaning up the open file handles ties up a thread for a long time. It would be helpful for us if this were faster.
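
To put a rough number on "adds up", here is a back-of-the-envelope sketch. It assumes a single thread closes the handles one after another and that every close takes the slow path at about one second; both are illustrative assumptions, not measurements. The handle count of 237 is borrowed from the "pending creates" figure in the lease message above, and the function name is hypothetical. Under those assumptions the cleanup comes to 237 seconds, a little under four minutes of thread time.

```python
def estimate_cleanup_seconds(open_handles: int, per_close_seconds: float) -> float:
    """Time one thread spends closing handles strictly one after another.

    Hypothetical helper for illustration only; it is not part of HDFS,
    libhdfs, or Impala.
    """
    if open_handles < 0 or per_close_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return open_handles * per_close_seconds


if __name__ == "__main__":
    pending_creates = 237      # taken from the lease message in the error above
    slow_close_seconds = 1.0   # assumption: every close hits the ~1 s slow path
    total = estimate_cleanup_seconds(pending_creates, slow_close_seconds)
    print(f"{pending_creates} handles at {slow_close_seconds}s each: {total} seconds")
```

In practice only some closes may be slow, so this is closer to an upper bound for the scenario described than a prediction.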

            People

            • Assignee: Unassigned
            • Reporter: Tim Armstrong (tarmstrong)