Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.7.3
None
None
HDP 2.6.5 and HDP 2.6.2
HotSpot 8u192 and 8u92
Linux Redhat 3.10.0-862.14.4.el7.x86_64
Description
When authentication is enabled, there is no keep-alive on http(s) connections.
That's because the JDK Http(s)URLConnection explicitly closes the connection after the HTTP 401 response that negotiates the authentication.
This leads to poor performance, especially when encryption is on, since every new connection needs a new TLS handshake.
To see the issue, simply strace and compare the number of connections between the hdfs implementation and curl:
$ strace -T -tt -f hdfs dfs -ls swebhdfs://dtltstap009.fr.world.socgen:50470/user 2>&1 | grep "sin_port=htons(50470)"
[pid 92879] 15:11:47.019865 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000157>
[pid 92879] 15:11:47.182110 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
[pid 92879] 15:11:47.387073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000167>
[pid 92879] 15:11:47.429716 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
[pid 93116] 15:11:47.528073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000110>
[pid 93116] 15:11:47.566947 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
=> 6 connect
$ strace -T -tt -f curl --negotiate -u: -v https://dtltstap009.fr.world.socgen:50470/webhdfs/v1/user/?op=GETFILESTATUS 2>&1 | grep "sin_port=htons(50470)"
15:10:53.671358 connect(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000118>
15:10:53.683513 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
15:10:53.869482 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
15:10:53.869576 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000008>
[bash-4.2.46][j:0|h:4961|?:0][2019-06-21 15:10:53][dtlprd05@nazare:~/test-hdfs]
=> only one connect
In addition, even without encryption, too many connections are used:
$ strace -T -tt -f hdfs dfs -ls webhdfs://dtltstap009.fr.world.socgen:50070/user 2>&1 | grep "sin_port=htons(50070)"
[pid 99569] 15:13:13.838257 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000119>
[pid 99569] 15:13:13.904255 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
[pid 99635] 15:13:14.201236 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
=> 3 connect
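Since the grep above also matches getpeername() lines, counting by eye is error-prone. Below is a minimal sketch of a helper that counts only the connect() calls to a given port in saved strace output. The class and method names (StraceConnectCounter, countConnects) are hypothetical and not part of Hadoop or strace. It counts "<unfinished ...>" connect lines as connects, as the totals above do.

```java
import java.util.Arrays;

public class StraceConnectCounter {

    /**
     * Counts the strace lines that are connect() calls to the given port.
     * getpeername() lines and "<... connect resumed>" lines are not counted.
     */
    static long countConnects(String straceOutput, int port) {
        String portMarker = "sin_port=htons(" + port + ")";
        return Arrays.stream(straceOutput.split("\\R"))
                // The syscall name follows a timestamp, or starts the line.
                .filter(line -> line.contains(" connect(") || line.startsWith("connect("))
                .filter(line -> line.contains(portMarker))
                .count();
    }

    public static void main(String[] args) {
        // Two lines abbreviated from the curl trace: one connect, one getpeername.
        String sample = String.join("\n",
                "15:10:53.671358 connect(3, {sa_family=AF_INET, sin_port=htons(50470)}, 16) = -1 EINPROGRESS",
                "15:10:53.683513 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470)}, [16]) = 0");
        System.out.println(countConnects(sample, 50470) + " connect() call(s)");
    }
}
```

Feeding it the full traces should make the hdfs and curl runs directly comparable, assuming the strace output format shown above.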
Looking at the JDK code (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java), the 401 handling does the following:
serverAuthentication = getServerAuthentication(srvHdr);
currentServerCredentials = serverAuthentication;
if (serverAuthentication != null) {
    disconnectWeb();
    redirects++; // don't let things loop ad nauseum
    setCookieHeader();
    continue;
}
disconnectWeb() closes the connection, so it cannot be reused through keep-alive.
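To illustrate what losing keep-alive costs, here is a minimal sketch that counts how many client sockets a local test server sees for two sequential requests. It involves no Kerberos or TLS. It uses the public HttpURLConnection.disconnect() as a stand-in for the internal disconnectWeb(), and treating the two as similar in effect is an assumption. The names KeepAliveDemo and connectionsUsed are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeepAliveDemo {

    /** Sends two GETs and returns the number of distinct client ports the server saw. */
    static int connectionsUsed(boolean disconnectBetweenRequests) throws Exception {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
        server.createContext("/", exchange -> {
            // Each new TCP connection arrives from a different client port.
            clientPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            for (int i = 0; i < 2; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                // Reading to EOF and closing lets the JDK cache the socket for reuse.
                try (InputStream in = conn.getInputStream()) {
                    while (in.read() != -1) { }
                }
                if (disconnectBetweenRequests) {
                    conn.disconnect(); // may close the idle cached socket
                }
            }
        } finally {
            server.stop(0);
        }
        return clientPorts.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with keep-alive: " + connectionsUsed(false) + " connection(s)");
        System.out.println("with disconnect: " + connectionsUsed(true) + " connection(s)");
    }
}
```

On a typical JDK the first case should reuse a single connection and the second should open a new one per request, though the exact counts depend on the JDK's keep-alive settings. With TLS, each extra connection would also pay for a new handshake.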
Finally, we have some webhdfs commands that get stuck, for reasons we cannot explain, in sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375):
-) for hdfs dfs commands with the swebhdfs scheme
-) for some Tez jobs that use the same implementation for the shuffle service when encryption is on
All other services (typically RPC) are working fine on the cluster.
It really seems that Http(s)URLConnection causes some issues that Netty or HttpClient don't have.
Regards,