Hadoop HDFS / HDFS-88

Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    Description

      We've seen this hang only rarely, but when it happens it locks up the application. We've seen it at least in 0.18.x and 0.19.x (we don't have much experience with 0.20.x HDFS yet).

      Here we're doing a SequenceFile#append:

      "IPC Server handler 9 on 60020" daemon prio=10 tid=0x00007fef1c3f0400 nid=0x7470 waiting for monitor entry [0x0000000042d18000..0x0000000042d189f0]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2486)
      	- waiting to lock <0x00007fef38ecc138> (a java.util.LinkedList)
      	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
      	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
      	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
      	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
      	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
      	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
      	at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
      	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
      	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
      	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
      	at java.io.DataOutputStream.write(DataOutputStream.java:107)
      	- locked <0x00007fef38e09fc0> (a org.apache.hadoop.fs.FSDataOutputStream)
      	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
      	- locked <0x00007fef38e09f30> (a org.apache.hadoop.io.SequenceFile$Writer)
      	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:980)
      	- locked <0x00007fef38e09f30> (a org.apache.hadoop.io.SequenceFile$Writer)
      	at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:461)
      	at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:421)
      	- locked <0x00007fef29ad9588> (a java.lang.Integer)
      	at org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1676)
      	at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1439)
      	at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1378)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1184)
      	at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:616)
      	at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:622)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
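
      For reference, the append path in the trace above corresponds to the usual SequenceFile.Writer usage, roughly as sketched below. The key/value types and the path are made up for illustration (HLog uses its own types), but every append goes through FSDataOutputStream into DFSOutputStream.writeChunk, which is where the handler thread blocks.

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.SequenceFile;
      import org.apache.hadoop.io.Text;

      public class AppendSketch {
        public static void main(String[] args) throws IOException {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          // Hypothetical log path and key/value types, for illustration only.
          SequenceFile.Writer writer = SequenceFile.createWriter(
              fs, conf, new Path("/tmp/hlog.dat.example"),
              LongWritable.class, Text.class);
          try {
            for (long i = 0; i < 1000; i++) {
              // Each append eventually reaches DFSOutputStream.writeChunk,
              // which synchronizes on the client-side dataQueue (the lock
              // the IPC handler thread above is waiting for).
              writer.append(new LongWritable(i), new Text("edit-" + i));
            }
          } finally {
            writer.close();
          }
        }
      }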
      

      The DataStreamer that is supposed to be servicing the above writeChunk is stuck here:

      "DataStreamer for file /hbase/log_72.34.249.212_1225407466779_60020/hlog.dat.1227075571390 block blk_-7436808403424765554_553837" daemon prio=10 tid=0x0000000001c84c00 nid=0x7125 in Object.wait() [0x00000000409b3000..0x00000000409b3d70]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:502)
      	at org.apache.hadoop.ipc.Client.call(Client.java:709)
      	- locked <0x00007fef39520bb8> (a org.apache.hadoop.ipc.Client$Call)
      	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
      	at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
      	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
      	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
      	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
      	at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
      	at org.apache.hadoop.dfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:139)
      	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2185)
      	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
      	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
      	- locked <0x00007fef38ecc138> (a java.util.LinkedList)
      

      The writeChunk call is trying to synchronize on dataQueue.

      The dataQueue lock is held by DataStreamer#run, which is down in processDatanodeError trying to recover from a problem with a block.

      Another example of the hang, with some more detail, can be found over in HBASE-667.
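
      To make the interaction concrete, here is a stripped-down sketch of the two sides (this is not the actual DFSClient code; the names just mirror the trace). The handler thread blocks waiting for the dataQueue monitor in writeChunk, while the DataStreamer thread holds that monitor and is itself parked inside a blocking RPC in processDatanodeError, so the monitor is never released and the append never completes.

      import java.util.LinkedList;

      // Simplified illustration of the hang; not real DFSClient code.
      public class DataQueueHangSketch {
        private final LinkedList<byte[]> dataQueue = new LinkedList<byte[]>();

        // Analogue of DFSOutputStream.writeChunk: enqueue a packet under the
        // dataQueue monitor. The IPC handler thread in the first trace is
        // blocked on this synchronized entry.
        void writeChunk(byte[] packet) {
          synchronized (dataQueue) {
            dataQueue.addLast(packet);
            dataQueue.notifyAll();
          }
        }

        // Analogue of DataStreamer.run handling a datanode error: it takes the
        // dataQueue monitor and then calls into processDatanodeError, which in
        // the real client makes a blocking RPC (RPC.waitForProxy to a datanode)
        // while still holding the monitor. A very long sleep stands in for the
        // stuck RPC here.
        void dataStreamerLoop() throws InterruptedException {
          synchronized (dataQueue) {
            Thread.sleep(Long.MAX_VALUE);  // stand-in for the blocked RPC call
          }
        }
      }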

People

    Assignee: Unassigned
    Reporter: stack (Michael Stack)