Key: HADOOP-2971
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Raghu Angadi
Reporter: Raghu Angadi
Votes: 0
Watchers: 1

Hadoop Common

SocketTimeoutException in unit tests

Created: 07/Mar/08 09:42 PM   Updated: 21/May/08 08:05 PM
Component/s: io
Affects Version/s: 0.17.0
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
HADOOP-2971.patch, 3 kB, 2008-03-10 05:24 PM, Raghu Angadi (licensed for inclusion in ASF works)
HADOOP-2971.patch, 3 kB, 2008-03-08 01:32 AM, Raghu Angadi (licensed for inclusion in ASF works)

Resolution Date: 11/Mar/08 03:51 AM


Description

TestJobStatusPersistency failed and contained DataNode stacktraces similar to the following :

2008-03-07 21:27:00,410 ERROR dfs.DataNode (DataNode.java:run(976)) - 127.0.0.1:57790:DataXceiver: java.net.SocketTimeoutException: 0 millis timeout while waiting for Unknown Addr (local: /127.0.0.1:57790) to be ready for read
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:188)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:135)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:121)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2434)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1170)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:953)
        at java.lang.Thread.run(Thread.java:619)

This is most likely related to HADOOP-2346. The error is strange: socket.getRemoteSocketAddress() returned null, implying this socket is not connected yet. But we have already read a few bytes from it!



Raghu Angadi added a comment - 08/Mar/08 12:55 AM
I am willing to bet that, for random reasons, Java select() returns 0 irrespective of the timeout. So we need to keep track of how long we have waited. Oddly, when the test passes, there are no instances of these errors, but when the test fails, there are a lot of them.

Raghu Angadi added a comment - 08/Mar/08 01:32 AM - edited
I thought I could avoid calling System.currentTimeMillis() while waiting and depend on select(). Tough luck.

The attached patch polls in a loop until the timeout passes. It also removes a large block for setting "channeStr"; we use channel.toString() instead.
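
For illustration, the idea of tracking the elapsed time around each select() call, rather than trusting a return value of 0 to mean "timed out", could be sketched roughly as below. This is a minimal sketch and not the attached patch: the class name TimedSelect and the method waitForReady are hypothetical, and the caller is assumed to have already put the channel into non-blocking mode.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper for illustration; not the code from HADOOP-2971.patch.
final class TimedSelect {

    /**
     * Waits until the (non-blocking) channel is ready for one of the given ops.
     * Returns the ready ops, or 0 once the full timeout has really elapsed.
     */
    static int waitForReady(SelectableChannel channel, int ops, long timeoutMillis)
            throws IOException {
        try (Selector selector = Selector.open()) {
            SelectionKey key = channel.register(selector, ops);
            long remaining = timeoutMillis;
            try {
                while (remaining > 0) {
                    long start = System.currentTimeMillis();
                    int ready = selector.select(remaining);
                    if (ready > 0) {
                        return key.readyOps();
                    }
                    // An interrupt makes select() return at once; do not spin on it.
                    if (Thread.currentThread().isInterrupted()) {
                        throw new InterruptedIOException("interrupted while waiting");
                    }
                    // select() returned 0, possibly early: charge only the time actually spent.
                    remaining -= System.currentTimeMillis() - start;
                }
                return 0;
            } finally {
                key.cancel();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        // Nothing has been written yet, so this call waits out the full timeout.
        System.out.println("empty pipe: "
                + waitForReady(pipe.source(), SelectionKey.OP_READ, 200));
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
        // A byte is now pending, so the source should be reported as readable.
        System.out.println("after write: "
                + waitForReady(pipe.source(), SelectionKey.OP_READ, 200));
    }
}
```

A caller would raise SocketTimeoutException only when this method returns 0. The sketch uses System.currentTimeMillis() because that is what the comment above mentions; System.nanoTime() would be a reasonable alternative where wall-clock adjustments are a concern.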


Raghu Angadi added a comment - 10/Mar/08 05:24 PM
Minor modification to the patch.

Tsz Wo (Nicholas), SZE added a comment - 10/Mar/08 05:58 PM
+1, the code looks good.

Raghu Angadi added a comment - 10/Mar/08 06:06 PM
Thanks Nicholas.

Raghu Angadi added a comment - 10/Mar/08 10:11 PM
Still waiting for Hudson blessings. Please apply this patch if you are seeing random unit test failures before investigating further.

Hadoop QA added a comment - 11/Mar/08 03:15 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377535/HADOOP-2971.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/console

This message is automatically generated.


Raghu Angadi added a comment - 11/Mar/08 03:50 AM
This fixes some of the sporadically failing tests.

Regarding the failed core tests, I can't trace them to the bug here, but this patch might improve them anyway.


Raghu Angadi added a comment - 11/Mar/08 03:51 AM
I just committed this.

Hudson added a comment - 12/Mar/08 12:18 PM