Hadoop Common / HADOOP-1955

Corrupted block replication retries forever


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.14.1
    • Fix Version/s: 0.14.2, 0.15.0
    • Component/s: None
    • Labels: None

    Description

      When replicating a corrupted block, the receiving side rejects the block due to a checksum error. The namenode keeps retrying (with the same source datanode).
      Fsck shows those blocks as under-replicated.

      [Namenode log]

       
      2007-09-27 02:00:05,273 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 99.2.99.111
      ...
      2007-09-27 02:01:02,618 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.37:9999
      2007-09-27 02:10:03,843 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-5925066143536023890
      2007-09-27 02:10:08,248 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.35:9999
      2007-09-27 02:20:03,848 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-5925066143536023890
      2007-09-27 02:20:08,646 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate blk_-5925066143536023890 to datanode(s) 99.9.99.19:9999
      (repeats)
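
      The pattern above is the heart of the bug: the pending transfer times out after roughly ten minutes and the block is simply re-queued with the same corrupt replica as the source, so no retry can ever succeed. A minimal Java sketch of that kind of scheduling loop (hypothetical class and method names, not the actual FSNamesystem/PendingReplicationMonitor code) shows why it cannot terminate unless the corrupt replica is skipped or invalidated:

      import java.util.*;

      // Illustrative sketch only (hypothetical names): a re-replication scheduler
      // that re-selects a source from the replica list without remembering that a
      // previous transfer from that replica already failed a checksum.
      class NaiveReplicationScheduler {
          // blockId -> datanodes currently holding a replica (possibly corrupt)
          private final Map<Long, List<String>> replicas = new HashMap<>();

          void addReplica(long blockId, String datanode) {
              replicas.computeIfAbsent(blockId, k -> new ArrayList<>()).add(datanode);
          }

          // Called initially, and again every time the pending transfer
          // times out (~10 minutes in the log above).
          void scheduleReplication(long blockId, String target) {
              // Always picks the first replica; a corrupt replica is never skipped
              // or invalidated, so every retry uses the same bad source.
              String source = replicas.get(blockId).get(0);
              System.out.printf("ask %s to replicate blk_%d to %s%n",
                                source, blockId, target);
          }
      }

      Breaking the loop requires the namenode either to invalidate the replica that keeps failing checksums or to pick a different source; until then fsck keeps reporting the block as under-replicated.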
      

      [Datanode (sender) 99.9.99.11 log]

       
      2007-09-27 02:01:04,493 INFO org.apache.hadoop.dfs.DataNode: Starting thread to transfer block blk_-5925066143536023890 to [Lorg.apache.hadoop.dfs.DatanodeInfo;@e58187
      2007-09-27 02:01:05,153 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer blk_-5925066143536023890 to 74.6.128.37:50010 got java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.dfs.DataNode.sendBlock(DataNode.java:1231)
        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1280)
        at java.lang.Thread.run(Thread.java:619)
      (repeats)
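
      The "Connection reset" on the sending side is a consequence of the receiver aborting the transfer: once the receiving datanode hits the checksum mismatch (see its log below) and closes its socket, the sender's next write fails. A small self-contained demonstration of that generic TCP behaviour (illustrative only, not Hadoop code; the exact exception message can vary by platform):

      import java.io.*;
      import java.net.*;

      // Illustrative only: a receiver that aborts mid-transfer surfaces as a
      // SocketException on the sender's write, as in the datanode log above.
      public class ResetDemo {
          public static void main(String[] args) throws Exception {
              try (ServerSocket server = new ServerSocket(0)) {
                  Thread receiver = new Thread(() -> {
                      try (Socket s = server.accept()) {
                          s.getInputStream().read(new byte[1024]); // read a little, then
                          // pretend the checksum check failed and abort: closing with
                          // unread data pending makes TCP reset the connection.
                      } catch (IOException ignored) { }
                  });
                  receiver.start();

                  try (Socket sender = new Socket("localhost", server.getLocalPort());
                       OutputStream out = new BufferedOutputStream(sender.getOutputStream())) {
                      byte[] chunk = new byte[64 * 1024];
                      for (int i = 0; i < 1000; i++) {  // keep writing until the peer is gone
                          out.write(chunk);
                          out.flush();
                      }
                  } catch (SocketException e) {
                      // Typically "Connection reset" or "Broken pipe", matching the log.
                      System.out.println("sender got: " + e.getMessage());
                  }
                  receiver.join();
              }
          }
      }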
      

      [Datanode (one of the receivers) 99.9.99.37 log]

       
      2007-09-27 02:01:05,150 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Unexpected checksum mismatch while writing blk_-5925066143536023890 from /74.6.128.33:57605
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:902)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
        at java.lang.Thread.run(Thread.java:619)
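
      The receiver rejects the block because the data coming from the corrupt replica no longer matches the checksums shipped alongside it, so writeBlock throws the IOException above and tears the connection down, which in turn produces the sender-side "Connection reset". A minimal sketch of per-chunk checksum verification using java.util.zip.CRC32 (assumed wire format and names, not the real DataNode protocol or checksum scheme):

      import java.io.*;
      import java.util.zip.CRC32;

      // Illustrative chunk-checksum verification: the sender ships a CRC32 per data
      // chunk; the receiver recomputes it and rejects the block on the first mismatch,
      // which is the kind of failure the error above reports.
      class ChunkVerifier {
          // Assumed wire format for this sketch: [int length][data][long crc],
          // with length == 0 marking end of block. Not the real DataNode protocol.
          static void receiveBlock(DataInputStream in, OutputStream blockFile)
                  throws IOException {
              CRC32 crc = new CRC32();
              int len;
              while ((len = in.readInt()) > 0) {
                  byte[] chunk = new byte[len];
                  in.readFully(chunk);
                  long expected = in.readLong();
                  crc.reset();
                  crc.update(chunk, 0, len);
                  if (crc.getValue() != expected) {
                      // A corrupt source replica fails here, so the block is rejected
                      // and the socket is closed, resetting the sender's connection.
                      throw new IOException("Unexpected checksum mismatch");
                  }
                  blockFile.write(chunk, 0, len);
              }
          }
      }

      Note that the check happens only on the receiving end: the sender never learns that its own copy is bad, and the namenode just sees another timed-out transfer and retries.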
      

      Attachments

        1. HADOOP-1955.patch (11 kB), Raghu Angadi
        2. HADOOP-1955.patch (10 kB), Raghu Angadi
        3. HADOOP-1955.patch (10 kB), Raghu Angadi
        4. HADOOP-1955.patch (1 kB), Raghu Angadi
        5. HADOOP-1955-branch14.patch (11 kB), Raghu Angadi
        6. HADOOP-1955-branch14.patch (11 kB), Raghu Angadi
        7. HADOOP-1955-branch14.patch (11 kB), Raghu Angadi


            People

              Assignee: Raghu Angadi (rangadi)
              Reporter: Koji Noguchi (knoguchi)
              Votes: 0
              Watchers: 0

              Dates

                Created:
                Updated:
                Resolved: