HDFS-6592: Using Fluentd to write data into Hadoop HDFS over WebHDFS throws AlreadyBeingCreatedException


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      We use Fluentd to collect log data and append it to files in HDFS over WebHDFS.
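      For context, here is a minimal Java sketch of what each Fluentd flush amounts to on the HDFS side (the NameNode host is taken from the cluster configuration below; the path and the create-or-append decision are illustrative assumptions, not the plugin's exact code):

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class WebHdfsAppend {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // webhdfs:// sends the request through the NameNode's HTTP port
              // (50070 by default on Hadoop 2.x).
              FileSystem fs = FileSystem.get(
                      URI.create("webhdfs://namenode1:50070"), conf);

              Path logFile = new Path("/logs/access.log");  // hypothetical path
              // Create the file on first use, append on every later flush.
              try (FSDataOutputStream out = fs.exists(logFile)
                      ? fs.append(logFile)
                      : fs.create(logFile)) {
                  out.write("one buffered chunk of log data\n".getBytes("UTF-8"));
              }
          }
      }

      Each flush is an independent append, so a writer that never finishes closing the file blocks all later flushes, which is the pattern described below.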

      The cluster configuration:

      NameNode: namenode1 (hostname)
      Secondary NameNode: namenode2
      3 DataNodes: datanode1, datanode2, datanode3
      Replication factor: 3

      Every few days, we suffer the following exceptions:

      Exception on namenode1:

      2014-06-22 09:54:41,892 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to create file [file_nameXXXXX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]
      2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_nameXXXXX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]

      Exception on datanode1:

      2014-06-22 09:54:45,771 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
      2014-06-22 09:54:45,813 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [file_nameXXXXX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2441)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

      From the logs, we infer the following flow for the exception:

      1. The NameNode updates the pipeline with just one DataNode:

      namenode1 log: 2014-06-22 09:54:16,604 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline
      (block=BP-1611177164-datanode1-1399894698024:blk_1074496235_1935947, newGenerationStamp=1935951, newLength=98839816, newNodes=[datanode1:50010],
      clientName=DFSClient_NONMAPREDUCE_349196146_2027206)
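
      How far the pipeline is allowed to shrink is controlled by the client-side replace-datanode-on-failure settings. A hedged sketch of pinning the policy to ALWAYS (these are standard HDFS client properties; whether this avoids our failure is untested):

      import org.apache.hadoop.conf.Configuration;

      public class PipelineRecoveryConf {
          public static Configuration strictPipelineConf() {
              Configuration conf = new Configuration();
              // Replace failed DataNodes during pipeline recovery instead of
              // continuing with whichever nodes survive.
              conf.setBoolean(
                  "dfs.client.block.write.replace-datanode-on-failure.enable", true);
              // ALWAYS adds a replacement whenever a node drops out; the DEFAULT
              // policy only does so under certain pipeline-size conditions.
              conf.set(
                  "dfs.client.block.write.replace-datanode-on-failure.policy", "ALWAYS");
              return conf;
          }
      }

      Note that with only three DataNodes and replication factor 3 there may be no spare node to add, so ALWAYS can itself fail the write; it is a trade-off rather than a fix.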

      2. The DFSClient on datanode1 throws an exception while closing the file (the last block does not have enough replicas) and keeps retrying:

      datanode1 log: 2014-06-22 09:54:26,569 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file_name retrying...

      3. Subsequent data collected by Fluentd triggers another DFSClient to append to the same file:

      namenode1 log: 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE)
      cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to create file [file_name] for [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client
      [datanode1], because this file is already being created by [DFSClient_NONMAPREDUCE_349196146_2027206] on [datanode1]
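
      This matches HDFS's single-writer semantics: the lease on a file belongs to one DFSClient at a time. A minimal sketch that reproduces the same rejection (host, port, and path are illustrative):

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class SingleWriterDemo {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Path file = new Path("/logs/access.log");  // hypothetical path

              // The first client opens the file for append and holds the lease.
              FileSystem fs1 = FileSystem.newInstance(
                      URI.create("hdfs://namenode1:8020"), conf);
              FSDataOutputStream out1 = fs1.append(file);

              // A second client (what the next Fluentd flush amounts to) tries
              // to append while the lease is still held: the NameNode rejects
              // it with AlreadyBeingCreatedException, as in the logs above.
              FileSystem fs2 = FileSystem.newInstance(
                      URI.create("hdfs://namenode1:8020"), conf);
              FSDataOutputStream out2 = fs2.append(file);  // throws here
          }
      }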

      4. The subsequent DFSClient triggers lease recovery every LEASE_SOFTLIMIT_PERIOD (60 seconds by default):

      namenode1 log: 2014-06-22 09:58:34,722 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover [Lease. Holder:
      DFSClient_NONMAPREDUCE_349196146_2027206, pendingcreates: 1], src=file_name client DFSClient_NONMAPREDUCE_349196146_2027206

      5. Lease recovery fails because the committed block is not yet minimally replicated:

      namenode1 log:

      2014-06-22 09:58:34,722 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE)
      cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file file_name. Committed
      blocks are waiting to be minimally replicated. Try again later.

      datanode1 log:

      2014-06-22 09:56:15,149 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE)
      cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): DIR* NameSystem.internalReleaseLease: Failed to release
      lease for file file_name. Committed blocks are waiting to be minimally replicated. Try again later.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3722)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2429)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

      6. The LEASE_HARDLIMIT_PERIOD (one hour by default) expires:

      namenode1 log: WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: failed to create file file_name for DFSClient on client datanode1 because pendingCreates is non-null but no leases found.

      We found that HDFS-4504 may be related to our issue; however, this still needs to be verified.
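
      As a possible mitigation until that is settled, the client could explicitly ask the NameNode to recover the stale lease before re-opening the file for append, instead of waiting out the hard limit. A hedged sketch using DistributedFileSystem.recoverLease (host, retry count, and sleep interval are assumptions, not tested against this cluster):

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hdfs.DistributedFileSystem;

      public class LeaseRecoveryWorkaround {
          // Ask the NameNode to revoke the stale lease and wait until the file
          // is closed, rather than waiting out LEASE_HARDLIMIT_PERIOD.
          public static boolean recoverLease(Path file) throws Exception {
              Configuration conf = new Configuration();
              DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(
                      URI.create("hdfs://namenode1:8020"), conf);
              for (int i = 0; i < 10; i++) {        // bounded retry, arbitrary cap
                  if (dfs.recoverLease(file)) {     // true once the file is closed
                      return true;
                  }
                  Thread.sleep(4000);               // block recovery is asynchronous
              }
              return false;
          }
      }

      Newer Hadoop releases (2.7 and later) also ship an equivalent command-line tool, hdfs debug recoverLease -path <file> -retries <n>.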

People

    • Assignee: Unassigned
    • Reporter: jack (huasanyelao)
    • Votes: 0
    • Watchers: 4
