Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
In some specific circumstances, a call to org.apache.hadoop.hdfs.DistributedFileSystem.open() with an unreachable URI never times out and blocks forever.
The specific circumstances are (a minimal reproduction sketch follows the nslookup output):
1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points to a valid IP address, but no NameNode service is running there.
2) The hostname resolves to at least two IP addresses. See the nslookup output below:
/proj/quickbox$ nslookup share.example.com
Server: 127.0.1.1
Address: 127.0.1.1#53

share.example.com canonical name = internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 192.168.1.223
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 192.168.1.65
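
For reference, the hang can be reproduced with a sketch like the following. This is a minimal illustration using the hostname, port, and path from above; any unreachable hostname with multiple A records should behave the same way:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenHangRepro {
    public static void main(String[] args) throws Exception {
        // share.example.com resolves to two addresses, neither of which
        // runs a NameNode (see the nslookup output above).
        Path path = new Path("hdfs://share.example.com:8020/someDir/someFile.txt");
        FileSystem fs = path.getFileSystem(new Configuration());
        // Expected: an exception once maxRetriesOnSocketTimeouts (45) is exhausted.
        // Observed: this call never returns.
        try (FSDataInputStream in = fs.open(path)) {
            in.read();
        }
    }
}
{code}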
In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() sometimes returns true (even though the address did not actually change; see img. 1), and the timeoutFailures counter is reset to 0 (see img. 2). As a result, maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is repeated forever.
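
The interaction described above behaves roughly like this simplified, standalone sketch. Field and method names are paraphrased from org.apache.hadoop.ipc.Client.Connection; this is not the actual Hadoop source:

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Simplified sketch of the retry bookkeeping described above.
public class RetryLoopSketch {
    private static final int MAX_RETRIES_ON_SOCKET_TIMEOUTS = 45;

    private InetSocketAddress server =
            new InetSocketAddress("share.example.com", 8020);
    private int timeoutFailures = 0;

    /** Re-resolves the hostname and reports whether the address "changed". */
    private boolean updateAddress() throws UnknownHostException {
        InetSocketAddress resolved = new InetSocketAddress(
                InetAddress.getByName(server.getHostName()), server.getPort());
        // With two A records behind the name, successive resolutions can
        // alternate between 192.168.1.223 and 192.168.1.65, so this check
        // reports a change even though the set of addresses is the same.
        if (!resolved.equals(server)) {
            server = resolved;
            return true;
        }
        return false;
    }

    /** Called on each connect timeout. */
    void onSocketTimeout() throws UnknownHostException {
        if (updateAddress()) {
            timeoutFailures = 0; // reset: the retry budget never runs out
        }
        if (++timeoutFailures > MAX_RETRIES_ON_SOCKET_TIMEOUTS) {
            throw new RuntimeException("retries exhausted"); // never reached here
        }
        // otherwise the connection attempt is repeated
    }
}
{code}

With round-robin DNS the resolved address flips between the two records often enough that timeoutFailures keeps being reset, which matches the endless retry observed.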