Hadoop HDFS / HDFS-14594

Replace all Http(s)URLConnection

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.3
    • Fix Version/s: None
    • Component/s: webhdfs
    • Labels: None
    • Environment: HDP 2.6.5 and HDP 2.6.2

      HotSpot 8u192 and 8u92

      Linux Redhat 3.10.0-862.14.4.el7.x86_64

    Description

      When authentication is enabled, http(s) connections are not kept alive.

      This is because the JDK Http(s)URLConnection explicitly closes the connection after the HTTP 401 response that negotiates the authentication.

      This leads to poor performance, especially when encryption is on, since every new connection then also requires a new TLS handshake.

      To see the issue, run strace and compare the number of connections opened by the hdfs implementation with the number opened by curl:

      $ strace -T -tt -f hdfs dfs -ls swebhdfs://dtltstap009.fr.world.socgen:50470/user 2>&1 | grep "sin_port=htons(50470)" 
      [pid 92879] 15:11:47.019865 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000157>
      [pid 92879] 15:11:47.182110 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
      [pid 92879] 15:11:47.387073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000167>
      [pid 92879] 15:11:47.429716 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
      [pid 93116] 15:11:47.528073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000110>
      [pid 93116] 15:11:47.566947 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
      
      => 6 connect calls
      $ strace -T -tt -f curl --negotiate -u: -v https://dtltstap009.fr.world.socgen:50470/webhdfs/v1/user/?op=GETFILESTATUS 2>&1 | grep "sin_port=htons(50470)" 
      15:10:53.671358 connect(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000118>
      15:10:53.683513 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
      15:10:53.869482 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
      15:10:53.869576 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000008>
      [bash-4.2.46][j:0|h:4961|?:0][2019-06-21 15:10:53][dtlprd05@nazare:~/test-hdfs]
      
      => only one connect call

       

      In addition, even without encryption, too many connections are used:

      $ strace -T -tt -f hdfs dfs -ls webhdfs://dtltstap009.fr.world.socgen:50070/user 2>&1 | grep "sin_port=htons(50070)" 
      [pid 99569] 15:13:13.838257 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000119>
      [pid 99569] 15:13:13.904255 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
      [pid 99635] 15:13:14.201236 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
      
      => 3 connect calls
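
      The counts above were read off the strace output by hand. As a convenience, here is a minimal sketch of a helper that counts them from the same output. The class and method names (StraceConnectCounter, countConnects) are hypothetical and not part of Hadoop or the JDK. Unlike the grep used above, it ignores getpeername lines and counts only connect calls to the given port, whether the line is complete or marked "<unfinished ...>".

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: counts connect(2) calls to one port in strace output.
final class StraceConnectCounter {

    /** Returns the number of lines that are connect() calls targeting the given port. */
    static long countConnects(List<String> straceLines, int port) {
        String portToken = "sin_port=htons(" + port + ")";
        return straceLines.stream()
                // Match "connect(" as a syscall name, at line start or after the timestamp.
                .filter(line -> line.startsWith("connect(") || line.contains(" connect("))
                .filter(line -> line.contains(portToken))
                .count();
    }

    /** Reads strace output from stdin; the port is the first argument. */
    public static void main(String[] args) throws Exception {
        int port = Integer.parseInt(args[0]);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            System.out.println("=> " + countConnects(lines, port) + " connect calls");
        }
    }
}
```

      One possible use is to pipe the strace output straight into it, for example: strace -T -tt -f hdfs dfs -ls swebhdfs://... 2>&1 | java StraceConnectCounter.java 50470 (single-file source launch assumes JDK 11 or later).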

       

      Looking at the JDK code (https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java):

      serverAuthentication = getServerAuthentication(srvHdr);
      currentServerCredentials = serverAuthentication;
      
      if (serverAuthentication != null) {
          disconnectWeb();
          redirects++; // don't let things loop ad nauseum
          setCookieHeader();
          continue;
      }

      disconnectWeb() closes the connection, so it is not reused for keep-alive.
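
      To observe this without a Kerberos setup, the following is a minimal sketch under stated assumptions, not the webhdfs code path. It starts a local JDK HttpServer that answers 401 until an Authorization header is present, issues a few sequential requests with HttpURLConnection, and counts the distinct client ports the server saw as an approximation of the number of TCP connections. It uses Basic authentication as a stand-in for Negotiate, and the names AuthKeepAliveProbe and countConnections, the realm and the credentials are all hypothetical. The result depends on the JDK version, so no expected count is given.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical probe: how many TCP connections does HttpURLConnection open
// when the server challenges with a 401 first? (Basic auth stands in for Negotiate.)
final class AuthKeepAliveProbe {

    /** Sends `requests` sequential GETs and returns the number of distinct client ports seen. */
    static int countConnections(int requests) throws Exception {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        HttpServer server = HttpServer.create(new InetSocketAddress(loopback, 0), 0);
        server.createContext("/", exchange -> {
            // Each new TCP connection normally arrives from a new ephemeral port.
            clientPorts.add(exchange.getRemoteAddress().getPort());
            if (exchange.getRequestHeaders().getFirst("Authorization") == null) {
                exchange.getResponseHeaders().add("WWW-Authenticate", "Basic realm=\"probe\"");
                exchange.sendResponseHeaders(401, -1); // challenge, no body
            } else {
                byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
            exchange.close();
        });
        server.start();
        // Made-up credentials; the server only checks that the header is present.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication("user", "secret".toCharArray());
            }
        });
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                try (InputStream in = conn.getInputStream()) {
                    while (in.read() != -1) {
                        // Drain the body so the socket is eligible for keep-alive reuse.
                    }
                }
            }
        } finally {
            server.stop(0);
            Authenticator.setDefault(null);
        }
        return clientPorts.size();
    }

    public static void main(String[] args) throws Exception {
        int requests = 5;
        System.out.println(requests + " requests used "
                + countConnections(requests) + " connection(s)");
    }
}
```

      If the count is higher than one for a run that includes a 401 challenge, that is consistent with the disconnectWeb() behaviour quoted above. Running the same probe under strace gives the connect-level view used earlier in this description.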

      Finally, some webhdfs commands get stuck, for reasons we have not yet explained, in sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375):

      -) hdfs dfs commands using the swebhdfs scheme

      -) some Tez jobs, which use the same implementation for the shuffle service when encryption is on

      All other services (typically RPC) are working fine on the cluster.

      It really seems that Http(s)URLConnection causes issues that Netty and HttpClient do not have.

      Regards,

       

       

       


          People

            Assignee: Unassigned
            Reporter: Sebastien Barnoud