Hadoop HDFS / HDFS-3655

Datanode recoverRbw could hang sometimes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.22.0, 1.0.3, 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      This bug seems to apply to 0.22 and Hadoop 2.0. I will upload the initial fix done by my colleague Xiaobo Peng shortly (a logistics issue is being worked out so that he can upload patches himself later).

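      recoverRbw tries to kill the old writer thread, but it does so while holding the lock (the FSDataset monitor object) that the old writer thread is waiting on (for example, in the call to data.getTmpInputStreams). The recovering thread then waits in Thread.join() for a writer that cannot proceed until that lock is released, so both threads hang, as in the trace below.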

      "DataXceiver for client /10.110.3.43:40193 [Receiving block blk_-3037542385914640638_57111747 client=DFSClient_attempt_201206021424_0001_m_000401_0]" daemon prio=10 tid=0x00007facf8111800 nid=0x6b64 in Object.wait() [0x00007facd1ddb000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      at java.lang.Thread.join(Thread.java:1186)
      - locked <0x00000007856c1200> (a org.apache.hadoop.util.Daemon)
      at java.lang.Thread.join(Thread.java:1239)
      at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:158)
      at org.apache.hadoop.hdfs.server.datanode.FSDataset.recoverRbw(FSDataset.java:1347)
      - locked <0x00000007838398c0> (a org.apache.hadoop.hdfs.server.datanode.FSDataset)
      at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:119)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlockInternal(DataXceiver.java:391)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:327)
      at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:405)
      at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:344)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
      at java.lang.Thread.run(Thread.java:662)
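
      The hang in the trace above can be modeled outside HDFS. The following is a minimal Java 8+ sketch, not code from the attached patches: DATASET_LOCK is a hypothetical stand-in for the FSDataset monitor, and the "old-writer" thread stands in for the previous DataXceiver. The recovering thread joins the writer while holding the lock, as recoverRbw does in the trace; a timed join(long) replaces the unbounded join() so the demo reports the hang instead of blocking forever. A second, equally hypothetical variant waits for the writer before taking the lock.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Minimal model of the recoverRbw hang. All names are hypothetical:
 * DATASET_LOCK stands in for the FSDataset monitor and the "old-writer"
 * thread stands in for the previous DataXceiver.
 */
public class JoinUnderLockDemo {

    // Stand-in for the FSDataset object that recoverRbw synchronizes on.
    private static final Object DATASET_LOCK = new Object();

    /** Starts an "old writer" that, once released, needs DATASET_LOCK to finish. */
    private static Thread startOldWriter(CountDownLatch go) {
        Thread writer = new Thread(() -> {
            try {
                go.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (DATASET_LOCK) {
                // Stand-in for a call such as data.getTmpInputStreams(),
                // which must enter the FSDataset monitor before returning.
            }
        }, "old-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }

    /** Waits up to timeoutMs for t to exit; returns true if it did. */
    private static boolean joinWithTimeout(Thread t, long timeoutMs) {
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    /** Buggy shape: join the old writer while still holding the lock. */
    static boolean joinWhileHoldingLock(long timeoutMs) {
        CountDownLatch go = new CountDownLatch(1);
        Thread writer = startOldWriter(go);
        synchronized (DATASET_LOCK) {
            go.countDown(); // once it wakes, the writer blocks on DATASET_LOCK
            // Thread.join waits on the writer's Thread object, not on
            // DATASET_LOCK, so the lock stays held for the whole wait.
            return joinWithTimeout(writer, timeoutMs);
        }
    }

    /** Alternative shape: wait for the old writer first, then take the lock. */
    static boolean joinBeforeTakingLock(long timeoutMs) {
        CountDownLatch go = new CountDownLatch(1);
        Thread writer = startOldWriter(go);
        go.countDown();
        boolean stopped = joinWithTimeout(writer, timeoutMs);
        synchronized (DATASET_LOCK) {
            // Recovery work would continue here, after re-validating state.
        }
        return stopped;
    }

    public static void main(String[] args) {
        System.out.println("join while holding lock, writer stopped: "
                + joinWhileHoldingLock(500));
        System.out.println("join before taking lock, writer stopped: "
                + joinBeforeTakingLock(5_000));
    }
}
```

      On a normal run the first call reports that the writer did not stop and the second reports that it did. Joining outside the lock is only the simplest way to break the cycle; real recovery code that drops the FSDataset lock would have to re-check the replica's state after re-acquiring it, since another thread could change it in between.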

    Attachments

      1. HDFS-3655-0.22-use-join-instead-of-wait.patch (16 kB, Xiaobo Peng)
      2. HDFS-3655-0.22.patch (11 kB, Ming Ma)

    People

      Assignee: Unassigned
      Reporter: Ming Ma
      Votes: 0
      Watchers: 16
