Hadoop Common / HADOOP-3035

Data nodes should inform the name-node about block crc errors.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      Changed protocol for transferring blocks between data nodes to report corrupt blocks to data node for re-replication from a good replica.

      Description

      Currently, if a CRC error occurs when a data-node replicates a block to another node, it throws an exception and continues.

          [junit] 2008-03-17 19:46:11,855 INFO  dfs.DataNode (DataNode.java:transferBlocks(811)) - 127.0.0.1:3730 Starting thread to transfer block blk_-1962819020391742554 to 127.0.0.1:3740
          [junit] 2008-03-17 19:46:11,855 INFO  dfs.DataNode (DataNode.java:writeBlock(1067)) - Receiving block blk_-1962819020391742554 src: /127.0.0.1:3791 dest: /127.0.0.1:3740
          [junit] 2008-03-17 19:46:11,855 INFO  dfs.DataNode (DataNode.java:receiveBlock(2504)) - Exception in receiveBlock for block blk_-1962819020391742554 java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
          [junit] 2008-03-17 19:46:11,871 INFO  dfs.DataNode (DataNode.java:run(2626)) - 127.0.0.1:3730:Transmitted block blk_-1962819020391742554 to /127.0.0.1:3740
          [junit] 2008-03-17 19:46:11,871 INFO  dfs.DataNode (DataNode.java:writeBlock(1192)) - writeBlock blk_-1962819020391742554 received exception java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
          [junit] 2008-03-17 19:46:11,871 ERROR dfs.DataNode (DataNode.java:run(979)) - 127.0.0.1:3740:DataXceiver: java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
          [junit]     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2246)
          [junit]     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2416)
          [junit]     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2474)
          [junit]     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1173)
          [junit]     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:956)
          [junit]     at java.lang.Thread.run(Thread.java:595)
      

      The data-node should report the error to the name-node so that the corrupted replica can be removed and the block re-replicated.
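To make the requested behavior concrete, here is a minimal sketch of a receiver that reports a checksum mismatch before the exception propagates. It is an illustration only: `ChecksumReportSketch`, `BadBlockReporter`, and `receiveChunk` below are hypothetical names, not the DataNode code from the stack trace, and CRC-32 from the JDK is assumed as the checksum.

```java
import java.io.IOException;
import java.util.zip.CRC32;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
final class ChecksumReportSketch {

    /** Stand-in for whatever channel a data-node uses to notify the name-node. */
    interface BadBlockReporter {
        void reportBadBlock(String blockName, String sourceNode);
    }

    /** Computes a CRC-32 over one chunk of block data. */
    static long crcOf(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return crc.getValue();
    }

    /**
     * Verifies a received chunk. On a mismatch, the source replica is reported
     * first, and only then is the exception thrown, so the failure is not
     * merely logged and forgotten.
     */
    static void receiveChunk(String blockName, String sourceNode, byte[] chunk,
                             long expectedCrc, BadBlockReporter reporter)
            throws IOException {
        if (crcOf(chunk) != expectedCrc) {
            reporter.reportBadBlock(blockName, sourceNode);
            throw new IOException("Unexpected checksum mismatch while writing "
                    + blockName + " from " + sourceNode);
        }
    }
}
```

A caller would pass a reporter that forwards the block and source node to the name-node; in this sketch it can be any lambda, which also makes the behavior easy to unit test.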

      1. HADOOP-3035-3.patch
        12 kB
        Lohit Vijayarenu
      2. HADOOP-3035-2.patch
        10 kB
        Lohit Vijayarenu
      3. HADOOP-3035-1.patch
        10 kB
        Lohit Vijayarenu

          Activity

          Konstantin Shvachko created issue -
          dhruba borthakur made changes -
          Field Original Value New Value
          Link This issue relates to HADOOP-3314 [ HADOOP-3314 ]
          Lohit Vijayarenu added a comment -

          This patch fixes the problem

          • With OP_WRITE_BLOCK, we also send a boolean.
          • If this boolean is true, we also send the Client DatanodeInfo along with the Client name string.
          • The receiving datanode uses this DatanodeInfo to report bad blocks to the namenode.
          • Added a test case that creates a file with a replication factor of 1, corrupts it, and requests a replication factor of 2. Upon replication, the receiving node detects the corruption and reports the block as bad.
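The header change in the bullets above can be sketched roughly as follows. This is not the actual wire format from the patch: the field order, the use of a plain string in place of DatanodeInfo, and the names `WriteBlockHeaderSketch`, `Header`, and `sourceNode` are all assumptions made for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch of the header layout; not the actual Hadoop wire format. */
final class WriteBlockHeaderSketch {

    /** Parsed header fields. sourceNode is null when the flag was false. */
    static final class Header {
        final String clientName;
        final String sourceNode;

        Header(String clientName, String sourceNode) {
            this.clientName = clientName;
            this.sourceNode = sourceNode;
        }
    }

    /** Writes the client name, then a boolean, then the optional node info. */
    static void write(DataOutputStream out, String clientName, String sourceNode)
            throws IOException {
        out.writeUTF(clientName);
        boolean hasSource = sourceNode != null;
        out.writeBoolean(hasSource);
        if (hasSource) {
            // Assumption: a string stands in for the serialized DatanodeInfo.
            out.writeUTF(sourceNode);
        }
        out.flush();
    }

    /** Reads the fields back in the same order they were written. */
    static Header read(DataInputStream in) throws IOException {
        String clientName = in.readUTF();
        String sourceNode = in.readBoolean() ? in.readUTF() : null;
        return new Header(clientName, sourceNode);
    }
}
```

The boolean lets a receiver tell apart a write that carries node information from one that does not, but both ends must agree on the new layout, which is consistent with the issue being flagged as an incompatible change.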
          Lohit Vijayarenu made changes -
          Attachment HADOOP-3035-1.patch [ 12382062 ]
          Lohit Vijayarenu added a comment -

          Updated patch with different variable names

          Lohit Vijayarenu made changes -
          Attachment HADOOP-3035-2.patch [ 12382066 ]
          Lohit Vijayarenu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note During block transfers between datanodes, the receiving datanode, now can report corrupt replicas received from src node to the namenode
          Hadoop Flags [Incompatible change]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382066/HADOOP-3035-2.patch
          against trunk revision 656270.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/console

          This message is automatically generated.

          Chris Douglas made changes -
          Assignee lohit vijayarenu [ lohit ]
          Raghu Angadi added a comment -

          +1. A few minor comments :

          • Unit test: our normal approach is to wait in a loop, sleeping for a short time (500 milliseconds) in each iteration. That way the test normally finishes faster and can still handle unexpected (and unavoidable) platform-related delays.
          • The test does not belong in TestDatanodeBlockScanner.
          • You could log before invoking reportBadBlocks().
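The wait-in-a-loop approach from the first comment might look like the sketch below. `PollingWaitSketch` and `waitFor` are hypothetical names, not helpers from the Hadoop test code; the condition would be whatever the test needs to observe, such as the block being reported as bad.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper; not part of the Hadoop test utilities. */
final class PollingWaitSketch {

    /**
     * Polls the condition, sleeping intervalMillis between checks, until it
     * holds or timeoutMillis has elapsed. Returns true if the condition was
     * met and false on timeout.
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMillis,
                           long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the condition
            }
            Thread.sleep(intervalMillis);
        }
        return true;
    }
}
```

With an interval of 500 ms and a generous timeout, a test returns as soon as the condition is observed instead of always paying for one long fixed sleep, while slow platforms still get the full timeout.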
          Lohit Vijayarenu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Lohit Vijayarenu added a comment -

          Attached patch changes as suggested by Raghu.

          Lohit Vijayarenu made changes -
          Attachment HADOOP-3035-3.patch [ 12382506 ]
          Lohit Vijayarenu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Lohit Vijayarenu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Lohit Vijayarenu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382506/HADOOP-3035-3.patch
          against trunk revision 659005.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          I just committed this. Thanks Lohit!

          Robert Chansler made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.18.0 [ 12312972 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #500 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/500/ )
          Robert Chansler made changes -
          Release Note During block transfers between datanodes, the receiving datanode, now can report corrupt replicas received from src node to the namenode Changed protocol for transferring blocks between data nodes to report corrupt blocks to data node for re-replication from a good replica.
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Transition                    Time In Source Status  Execution Times  Last Executer     Last Execution Date
          Patch Available -> Open       7d 7h 14m              2                Lohit Vijayarenu  22/May/08 02:43
          Open -> Patch Available       57d 15h 25m            3                Lohit Vijayarenu  22/May/08 02:43
          Patch Available -> Resolved   22h 27m                1                Robert Chansler   23/May/08 01:10
          Resolved -> Closed            91d 19h 39m            1                Nigel Daley       22/Aug/08 20:50

            People

            • Assignee:
              Lohit Vijayarenu
              Reporter:
              Konstantin Shvachko
            • Votes:
              0
              Watchers:
              0
