Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-3035

Data nodes should inform the name-node about block crc errors.

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.16.0
    • 0.18.0
    • None
    • None
    • Incompatible change
    • Changed protocol for transferring blocks between data nodes to report corrupt blocks to data node for re-replication from a good replica.

    Description

      Currently if a crc error occurs when data-node replicates a block to another node it throws an exception, and continues.

          [junit] 2008-03-17 19:46:11,855 INFO  dfs.DataNode (DataNode.java:transferBlocks(811)) - 127.0.0.1:3730 Starting thread to transfer block blk_-1962819020391742554 to 127.0.0.1:3740
          [junit] 2008-03-17 19:46:11,855 INFO  dfs.DataNode (DataNode.java:writeBlock(1067)) - Receiving block blk_-1962819020391742554 src: /127.0.0.1:3791 dest: /127.0.0.1:3740
          [junit] 2008-03-17 19:46:11,855 INFO  dfs.DataNode (DataNode.java:receiveBlock(2504)) - Exception in receiveBlock for block blk_-1962819020391742554 java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
          [junit] 2008-03-17 19:46:11,871 INFO  dfs.DataNode (DataNode.java:run(2626)) - 127.0.0.1:3730:Transmitted block blk_-1962819020391742554 to /127.0.0.1:3740
          [junit] 2008-03-17 19:46:11,871 INFO  dfs.DataNode (DataNode.java:writeBlock(1192)) - writeBlock blk_-1962819020391742554 received exception java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
          [junit] 2008-03-17 19:46:11,871 ERROR dfs.DataNode (DataNode.java:run(979)) - 127.0.0.1:3740:DataXceiver: java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
          [junit]     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2246)
          [junit]     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2416)
          [junit]     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2474)
          [junit]     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1173)
          [junit]     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:956)
          [junit]     at java.lang.Thread.run(Thread.java:595)
      

      The data-node should report the error to the name-node so that the corrupted replica could be removed and replicated.

      Attachments

        1. HADOOP-3035-3.patch
          12 kB
          Lohit Vijaya Renu
        2. HADOOP-3035-2.patch
          10 kB
          Lohit Vijaya Renu
        3. HADOOP-3035-1.patch
          10 kB
          Lohit Vijaya Renu

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            lohit Lohit Vijaya Renu
            shv Konstantin Shvachko
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment