Hadoop Common
  1. Hadoop Common
  2. HADOOP-3678

Avoid spurious "DataXceiver: java.io.IOException: Connection reset by peer" errors in DataNode log

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Avoid spurious exceptions logged at DataNode when clients read from DFS.

      Description

      When a client reads data using read(), it closes the sockets after it is done. Often it might not read till the end of a block. The datanode on the other side keeps writing data until the client connection is closed or end of the block is reached. If the client does not read till the end of the block, Datanode writes an error message and stack trace to the datanode log. It should not. This is not an error and it just pollutes the log and confuses the user.

      1. HADOOP-3678.patch
        3 kB
        Raghu Angadi
      2. HADOOP-3678.patch
        3 kB
        Raghu Angadi
      3. HADOOP-3678-branch-17.patch
        1 kB
        Raghu Angadi

        Issue Links

          Activity

          Hide
          Raghu Angadi added a comment -

          Suggested patch. It does a minor code reorg in sendChunks().

          Note that while using transferTo(), we can not easily distinguish if the error occurred while reading from the disk or while writing to the client, though most likely its while writing to the client. One hack is to check the exception message, but that is usually not encouraged.

          Show
          Raghu Angadi added a comment - Suggested patch. It does a minor code reorg in sendChunks(). Note that while using transferTo(), we can not easily distinguish if the error occurred while reading from the disk or while writing to the client, though most likely its while writing to the client. One hack is to check the exception message, but that is usually not encouraged.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Patch looks good. Some minor comments:

          • It is better to compare Class object instead of String.
          • Could you add some message saying that the new SocketException is caused by the IOException?
          Show
          Tsz Wo Nicholas Sze added a comment - Patch looks good. Some minor comments: It is better to compare Class object instead of String. Could you add some message saying that the new SocketException is caused by the IOException?
          Hide
          Raghu Angadi added a comment -

          Thats better. Thanks Nicholas.

          Attached patch incorporates the changes.

          Show
          Raghu Angadi added a comment - Thats better. Thanks Nicholas. Attached patch incorporates the changes.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1

          Show
          Tsz Wo Nicholas Sze added a comment - +1
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385136/HADOOP-3678.patch
          against trunk revision 673445.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12385136/HADOOP-3678.patch against trunk revision 673445. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2783/console This message is automatically generated.
          Hide
          Raghu Angadi added a comment -

          Regd unit tests : This is a minor fix to avoid a spurious error. There is no functionality change.

          Show
          Raghu Angadi added a comment - Regd unit tests : This is a minor fix to avoid a spurious error. There is no functionality change.
          Hide
          Raghu Angadi added a comment -

          I just committed this.

          Show
          Raghu Angadi added a comment - I just committed this.
          Hide
          Rong-En Fan added a comment -

          Can we backport this into 0.17, too?

          Show
          Rong-En Fan added a comment - Can we backport this into 0.17, too?
          Hide
          Luo Ning added a comment -

          need a patch for 0.17.1 release too.

          Show
          Luo Ning added a comment - need a patch for 0.17.1 release too.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #537 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/537/ )
          Hide
          Raghu Angadi added a comment -

          I will attach a patch for 0.17.

          Show
          Raghu Angadi added a comment - I will attach a patch for 0.17.
          Hide
          Raghu Angadi added a comment -

          Patch for 0.17 attached.

          Show
          Raghu Angadi added a comment - Patch for 0.17 attached.
          Hide
          Rong-En Fan added a comment -

          can we fix this in 0.17.2, too?

          Show
          Rong-En Fan added a comment - can we fix this in 0.17.2, too?
          Hide
          Raghu Angadi added a comment -

          +1 for 0.17.2.

          Show
          Raghu Angadi added a comment - +1 for 0.17.2.
          Hide
          Raghu Angadi added a comment -

          Reopening it to recommend it for 0.17.2.

          Show
          Raghu Angadi added a comment - Reopening it to recommend it for 0.17.2.
          Hide
          Raghu Angadi added a comment -

          I just committed this to 0.17.2 and moved CHANGES.txt entry on the other lines.

          Show
          Raghu Angadi added a comment - I just committed this to 0.17.2 and moved CHANGES.txt entry on the other lines.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )
          Hide
          Cosmin Lehene added a comment -

          I'm getting this with Hadoop 0.18.2. Reopening

          2008-12-10 02:00:17,059 WARN org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.72.7.153:50010, storageID=DS-594776053-127.0.0.1-50010-1223411294140, infoPort=50075, ipcPort=50020):Got exception w
          hile serving blk_698267671609892267_112619 to /10.72.7.153:
          java.io.IOException: Connection reset by peer
          at sun.nio.ch.FileDispatcher.write0(Native Method)
          at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
          at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
          at sun.nio.ch.IOUtil.write(IOUtil.java:75)
          at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
          at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
          at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
          at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
          at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
          at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
          at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
          at java.io.DataOutputStream.flush(DataOutputStream.java:106)
          at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:2019)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1140)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1068)
          at java.lang.Thread.run(Thread.java:619)

          2008-12-10 02:00:17,060 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.72.7.153:50010, storageID=DS-594776053-127.0.0.1-50010-1223411294140, infoPort=50075, ipcPort=50020):DataXceiver: j
          ava.io.IOException: Connection reset by peer
          at sun.nio.ch.FileDispatcher.write0(Native Method)
          at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
          at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
          at sun.nio.ch.IOUtil.write(IOUtil.java:75)
          at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
          at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
          at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
          at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
          at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
          at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
          at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
          at java.io.DataOutputStream.flush(DataOutputStream.java:106)
          at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:2019)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1140)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1068)
          at java.lang.Thread.run(Thread.java:619)

          Show
          Cosmin Lehene added a comment - I'm getting this with Hadoop 0.18.2. Reopening 2008-12-10 02:00:17,059 WARN org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.72.7.153:50010, storageID=DS-594776053-127.0.0.1-50010-1223411294140, infoPort=50075, ipcPort=50020):Got exception w hile serving blk_698267671609892267_112619 to /10.72.7.153: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104) at sun.nio.ch.IOUtil.write(IOUtil.java:75) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at java.io.DataOutputStream.flush(DataOutputStream.java:106) at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:2019) at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1140) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1068) at java.lang.Thread.run(Thread.java:619) 2008-12-10 02:00:17,060 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.72.7.153:50010, storageID=DS-594776053-127.0.0.1-50010-1223411294140, infoPort=50075, ipcPort=50020):DataXceiver: j ava.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104) at sun.nio.ch.IOUtil.write(IOUtil.java:75) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at java.io.DataOutputStream.flush(DataOutputStream.java:106) at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:2019) at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1140) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1068) at java.lang.Thread.run(Thread.java:619)
          Hide
          Raghu Angadi added a comment -

          Aah.. missed the last 4 bytes after sending the block data. will have a fix. I don't think this should be a blocker but better to be in 0.18.3.

          Show
          Raghu Angadi added a comment - Aah.. missed the last 4 bytes after sending the block data. will have a fix. I don't think this should be a blocker but better to be in 0.18.3.
          Hide
          Raghu Angadi added a comment -

          Cosmin, I filed HADOOP-4862 to fix the log you are seeing.

          Show
          Raghu Angadi added a comment - Cosmin, I filed HADOOP-4862 to fix the log you are seeing.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )

            People

            • Assignee:
              Raghu Angadi
              Reporter:
              Raghu Angadi
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development