Hadoop Common / HADOOP-3657

HDFS writes get stuck trying to recoverBlock

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.18.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

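      A few reduce tasks got stuck in a sort500 job. In the thread dump below, the main thread is BLOCKED in writeChunk waiting for the java.util.LinkedList monitor <0xe905e8f8>; the DataStreamer thread holds that monitor while it waits for the recoverBlock RPC to return: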

      "main" prio=10 tid=0x0805b800 nid=0x1951 waiting for monitor entry [0xf7e6d000..0xf7e6e1f8]
         java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2485)
        - waiting to lock <0xe905e8f8> (a java.util.LinkedList)
        - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
        at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
        - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
        - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
        - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:181)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1014)
        - locked <0xe90889e8> (a org.apache.hadoop.io.SequenceFile$Writer)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:70)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:298)
        at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:316)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2157)
      
      "DataStreamer for file /rw/out/_temporary/_attempt_200806261801_0006_r_000712_0/part-00712 block blk_-3923696991063961587_9628" daemon prio=10 tid=0x08413c00 nid=0x367a in Object.wait() [0xd00e4000..0xd00e4f20]
         java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.ipc.Client.call(Client.java:701)
        - locked <0xf167d540> (a org.apache.hadoop.ipc.Client$Call)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy2.recoverBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2186)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1737)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1891)
        - locked <0xe905e8f8> (a java.util.LinkedList)
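
The pattern visible in the dump (one thread holding a monitor while blocked in a call, a second thread blocked on that same monitor) can be illustrated with a minimal, self-contained sketch. This is not the DFSClient code: the class name `LockHeldAcrossRpcDemo`, the method `observeWriterState`, the thread names, and the `rpcReply` latch standing in for the recoverBlock reply are all assumptions made for illustration.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

public class LockHeldAcrossRpcDemo {

    /**
     * Starts a "streamer" thread that holds the queue monitor while waiting
     * on a simulated RPC reply, then a "writer" thread that needs the same
     * monitor. Returns the writer's state observed while the reply is pending.
     */
    public static Thread.State observeWriterState() {
        final LinkedList<byte[]> dataQueue = new LinkedList<>();
        final CountDownLatch streamerHoldsLock = new CountDownLatch(1);
        // Hypothetical stand-in for the RPC reply; it only arrives when we release it.
        final CountDownLatch rpcReply = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            synchronized (dataQueue) {
                streamerHoldsLock.countDown();
                try {
                    rpcReply.await(); // blocks with no timeout while holding the monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "DataStreamer-sketch");

        Thread writer = new Thread(() -> {
            synchronized (dataQueue) { // cannot enter until the streamer leaves
                dataQueue.add(new byte[0]);
            }
        }, "writer-sketch");

        try {
            streamer.start();
            streamerHoldsLock.await();
            writer.start();

            // Poll (bounded to about 5 seconds) until the writer reaches the monitor.
            Thread.State state = writer.getState();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = writer.getState();
            }

            // Deliver the simulated reply so both threads can finish.
            rpcReply.countDown();
            streamer.join();
            writer.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("writer state while reply is pending: " + observeWriterState());
    }
}
```

In this sketch the writer stays blocked for exactly as long as the simulated reply is withheld, which mirrors why the main thread above cannot make progress until recoverBlock returns. It says nothing about why the RPC itself did not return.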
      

              People

              • Assignee: Raghu Angadi (rangadi)
              • Reporter: Arun Murthy (acmurthy)
              • Votes: 0
              • Watchers: 2
