Hadoop Common
  1. Hadoop Common
  2. HADOOP-3033

Datanode fails write to DFS file with exception message "Trying to change block file offset"

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.1
    • Fix Version/s: 0.16.2
    • Component/s: None
    • Labels:
      None

      Description

      A write to a DFS block failed with the lastdatanode in the pipeline reporting this error:

      Receiving block blk_-7279084187433655573 src: /xx.xx.xx.xx:xx dest: /xx.xx.xx.xx:50010
      Changing block file offset of block blk_-7279084187433655573 from 9043968 to 9043968 meta file offset to 70663
      Changing block file offset of block blk_-7279084187433655573 from 111935488 to 112001024 meta file offset to 875015
      Exception in receiveBlock for block blk_-7279084187433655573 java.io.IOException: Trying to change block file offset of block blk_-7279084187433655573 to 112001024 but actual size of file is 111935488
      PacketResponder 0 for block blk_-7279084187433655573 Interrupted.
      PacketResponder 0 for block blk_-7279084187433655573 terminating
      writeBlock blk_-7279084187433655573 received exception java.io.IOException: Trying to change block file offset of block blk_-7279084187433655573 to 112001024 but actual size of file is 111935488
      DataXceiver: java.io.IOException: Trying to change block file offset of block blk_-7279084187433655573 to 112001024 but actual size of file is 111935488

      1. badnode2.patch
        5 kB
        dhruba borthakur
      2. badnode.patch
        2 kB
        dhruba borthakur

        Issue Links

          Activity

          Hide
          dhruba borthakur added a comment -

          Here is a patch that changes this portion of code in the datanode to look like what is in trunk. Investigation is still on whether this patch actually solves the problem or not.

          Show
          dhruba borthakur added a comment - Here is a patch that changes this portion of code in the datanode to look like what is in trunk. Investigation is still on whether this patch actually solves the problem or not.
          Hide
          Tsz Wo Nicholas Sze added a comment -
          • DFSBufferedOutputStream.out and DFSBufferedOutputStream.flush() do not change anything since DFSBufferedOutputStream.out is never used and DFSBufferedOutputStream.flush() is already synchronized in super.flush().
          • putting DFSBufferedOutputStream.count inside a synchronized method should be useful since count may be being updated in another synchronized method.
          Show
          Tsz Wo Nicholas Sze added a comment - DFSBufferedOutputStream.out and DFSBufferedOutputStream.flush() do not change anything since DFSBufferedOutputStream.out is never used and DFSBufferedOutputStream.flush() is already synchronized in super.flush(). putting DFSBufferedOutputStream.count inside a synchronized method should be useful since count may be being updated in another synchronized method.
          Hide
          Raghu Angadi added a comment - - edited

          I think we should still explain how that can lead to what we saw :

          One particular case I looked at shows one datanode does not write 64k of data (or overwrites last 64k):

          The last (third) data node in the pipeline failed with :

          2008-03-17 20:38:01,928 INFO org.apache.hadoop.dfs.DataNode: Changing block file offset of block blk_7114623733442731588 from 85983232 to 86048768 meta file offset to 672263
          2008-03-17 20:38:01,928 INFO org.apache.hadoop.dfs.DataNode: Exception in receiveBlock for block blk_7114623733442731588 java.io.IOException: Trying to change block file offset of block blk_7114623733442731588 to 86048768 but actual size of file is 85983232
          

          The client retried with remaining DNs and succeded.

          Say 'x' == 85983232.

          Block file in tmp dir on bad datanode is x bytes long and meta data file is 672263 bytes long.

          Data from failed datanode and a good datanode for this block shows that data till x-64k matches on both. 64k at at x-64k on bad datanode matches 64k at x at the good datanode. The meta data file data matches on both side. So this shows that the bad datanode either some how did not write the last but one packet or overwrote it with the last packet. Each packet has 64k of real data.

          Show
          Raghu Angadi added a comment - - edited I think we should still explain how that can lead to what we saw : One particular case I looked at shows one datanode does not write 64k of data (or overwrites last 64k): The last (third) data node in the pipeline failed with : 2008-03-17 20:38:01,928 INFO org.apache.hadoop.dfs.DataNode: Changing block file offset of block blk_7114623733442731588 from 85983232 to 86048768 meta file offset to 672263 2008-03-17 20:38:01,928 INFO org.apache.hadoop.dfs.DataNode: Exception in receiveBlock for block blk_7114623733442731588 java.io.IOException: Trying to change block file offset of block blk_7114623733442731588 to 86048768 but actual size of file is 85983232 The client retried with remaining DNs and succeded. Say 'x' == 85983232. Block file in tmp dir on bad datanode is x bytes long and meta data file is 672263 bytes long. Data from failed datanode and a good datanode for this block shows that data till x-64k matches on both. 64k at at x-64k on bad datanode matches 64k at x at the good datanode. The meta data file data matches on both side. So this shows that the bad datanode either some how did not write the last but one packet or overwrote it with the last packet. Each packet has 64k of real data.
          Hide
          Raghu Angadi added a comment -

          Earlier in the same log file for another block :

          2008-03-17 20:03:06,945 INFO org.apache.hadoop.dfs.DataNode: Changing block file offset of block blk_-8611433653592759099 from 84869120 to 84934656 meta file offset to 663559
          2008-03-17 20:03:08,128 INFO org.apache.hadoop.dfs.DataNode: Changing block file offset of block blk_-8611433653592759099 from 114294784 to114360320 meta file offset to 893447
          2008-03-17 20:03:09,057 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-8611433653592759099 of size 134217728 from /a.b.c.79
          

          This is strange too. The BlockReceiver thread calls out.write() then the above 'Changing ...' message is logged. But the check inside setChannelPosition() and it did not result in an exception. So something changed the channels position in the mean time (during this, there were no writes to 'out').

          Show
          Raghu Angadi added a comment - Earlier in the same log file for another block : 2008-03-17 20:03:06,945 INFO org.apache.hadoop.dfs.DataNode: Changing block file offset of block blk_-8611433653592759099 from 84869120 to 84934656 meta file offset to 663559 2008-03-17 20:03:08,128 INFO org.apache.hadoop.dfs.DataNode: Changing block file offset of block blk_-8611433653592759099 from 114294784 to114360320 meta file offset to 893447 2008-03-17 20:03:09,057 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-8611433653592759099 of size 134217728 from /a.b.c.79 This is strange too. The BlockReceiver thread calls out.write() then the above 'Changing ...' message is logged. But the check inside setChannelPosition() and it did not result in an exception. So something changed the channels position in the mean time (during this, there were no writes to 'out').
          Hide
          dhruba borthakur added a comment -

          I think I have a possible explanation of this behaviour... it is a race between a flush and setChannelPosition. The BufferedOutputStream.flush and BufferedOutputStream.write are mutually exclusive, but there is no guarantee that there is any mutual exclusion between a BufferedOutputStream.flush and setChannelPosition.

          The BlockReceiver thread must be in the executing a getChannelPosition/setChannelPosition while processing the current packet. However, the previous packet that was sitting in the buffer of the BufferedOutputSteam is being concurrently flushed by the PacketResponder thread.

          Show
          dhruba borthakur added a comment - I think I have a possible explanation of this behaviour... it is a race between a flush and setChannelPosition. The BufferedOutputStream.flush and BufferedOutputStream.write are mutually exclusive, but there is no guarantee that there is any mutual exclusion between a BufferedOutputStream.flush and setChannelPosition. The BlockReceiver thread must be in the executing a getChannelPosition/setChannelPosition while processing the current packet. However, the previous packet that was sitting in the buffer of the BufferedOutputSteam is being concurrently flushed by the PacketResponder thread.
          Hide
          Raghu Angadi added a comment -

          I don't think it explains above logs. There is a out.flush() right before setChannelPosition(). So there is no data and count is zero before setChannelPostion(). setChannelPosition() and out.write() are called by the same thread, right?

          Show
          Raghu Angadi added a comment - I don't think it explains above logs. There is a out.flush() right before setChannelPosition(). So there is no data and count is zero before setChannelPostion(). setChannelPosition() and out.write() are called by the same thread, right?
          Hide
          dhruba borthakur added a comment -

          I think the value of count fetched by the BlockResponder thread is stale. Especially because 'count' is not a volatile and no locking is enforced while accesing this variable. This value is probably being fetched from the CPU's cache and since no locking is enforced, it is getting a stale value.

          Show
          dhruba borthakur added a comment - I think the value of count fetched by the BlockResponder thread is stale. Especially because 'count' is not a volatile and no locking is enforced while accesing this variable. This value is probably being fetched from the CPU's cache and since no locking is enforced, it is getting a stale value.
          Hide
          Raghu Angadi added a comment -

          ok, I still don't see how it explains the error. I was trying to see a sequence of events that lead to this particular error... such a sequence would clarify better I think.

          As I see it, irrespective of value of 'count', 'out' is flushed completely before calling setChannelPosition(). I will chat with you to see how it happens.

          Show
          Raghu Angadi added a comment - ok, I still don't see how it explains the error. I was trying to see a sequence of events that lead to this particular error... such a sequence would clarify better I think. As I see it, irrespective of value of 'count', 'out' is flushed completely before calling setChannelPosition(). I will chat with you to see how it happens.
          Hide
          dhruba borthakur added a comment -

          t is possible that the output stream created using RandomAccessFile caches data in some underlying layer. Here is a patch that incorprates Raghu's tip that it is safe to make the Blockreceiver thread invoke the flush too. This patch works!

          Show
          dhruba borthakur added a comment - t is possible that the output stream created using RandomAccessFile caches data in some underlying layer. Here is a patch that incorprates Raghu's tip that it is safe to make the Blockreceiver thread invoke the flush too. This patch works!
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 codes look good

          Some minor comments:

          • It might be better to use ackQueue lock instead of PacketResponder.this for ackQueue related synchronization.
          • Should check LOG.isDebugEnabled() before printing debug messages. Then the message will not be constructed (i.e. better performance) if debug is not enabled.
          Show
          Tsz Wo Nicholas Sze added a comment - +1 codes look good Some minor comments: It might be better to use ackQueue lock instead of PacketResponder.this for ackQueue related synchronization. Should check LOG.isDebugEnabled() before printing debug messages. Then the message will not be constructed (i.e. better performance) if debug is not enabled.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378104/badnode2.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378104/badnode2.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1990/console This message is automatically generated.
          Hide
          dhruba borthakur added a comment -

          Incorporated Nicholas' review comments about logging debug log messages.
          I just committed this.

          Show
          dhruba borthakur added a comment - Incorporated Nicholas' review comments about logging debug log messages. I just committed this.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #433 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/433/ )

            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development