Hadoop Common > HADOOP-12653

Use SO_REUSEADDR to avoid getting "Address already in use" when using kerberos and attempting to bind to any port on the local IP address

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: net
    • Labels:
      None
    • Target Version/s:

      Description

      Client.java can get "Address already in use" when using kerberos and attempting to bind to any port on the local IP address. It appears to be caused by the host running out of ports in the ephemeral range.

      1. HADOOP-12653.001.patch
        0.8 kB
        Colin P. McCabe

        Activity

Colin P. McCabe added a comment -

        The code that's having the problem is here:

        /*
         * Bind the socket to the host specified in the principal name of the
         * client, to ensure Server matching address of the client connection
         * to host name in principal passed.
         */
        UserGroupInformation ticket = remoteId.getTicket();
        if (ticket != null && ticket.hasKerberosCredentials()) {
          KerberosInfo krbInfo = 
            remoteId.getProtocol().getAnnotation(KerberosInfo.class);
          if (krbInfo != null && krbInfo.clientPrincipal() != null) {
            String host = 
              SecurityUtil.getHostFromPrincipal(remoteId.getTicket().getUserName());
            
            // If host name is a valid local address then bind socket to it
            InetAddress localAddr = NetUtils.getLocalInetAddress(host);
            if (localAddr != null) {
    if (localAddr != null) {
      this.socket.bind(new InetSocketAddress(localAddr, 0));  // <=== HERE
    }
  }
}

        You can see that this is binding to port 0, so the usual explanations for getting "address already in use" are not relevant here.

        There is a discussion here: https://idea.popcount.org/2014-04-03-bind-before-connect/

        It's kind of a confusing issue, but it boils down to:

        • Every TCP connection is identified by a unique 4-tuple of (src ip, src port, dst ip, dst port).
        • Calling bind-then-connect imposes restrictions on what the src port can be that simply calling connect does not. Specifically, bind has to choose a port without knowing what the dst ip and dst port will be, meaning it has to be more conservative to ensure global uniqueness.

        I think using SO_REUSEADDR can help here. It's a bit confusing since that also opens us up to getting EADDRNOTAVAIL. If I'm reading this right, though, that error code would only happen in the rare case where two threads happened to get into the critical section between bind and connect at the same time AND choose the same source port. We could either retry in that case or ignore it and rely on higher-level retry mechanisms to kick in.
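        The idea boils down to setting SO_REUSEADDR on the socket before the bind call shown above. A minimal standalone sketch of that pattern (the class name and setup here are mine for illustration, not code from Client.java):

        ```java
        import java.net.InetAddress;
        import java.net.InetSocketAddress;
        import java.net.Socket;

        public class ReuseAddrBind {
          // Bind-before-connect with SO_REUSEADDR, mirroring the pattern
          // discussed above. Note the option must be set *before* bind()
          // for it to have any effect on port selection.
          static Socket boundSocket(InetAddress localAddr) throws Exception {
            Socket socket = new Socket();
            socket.setReuseAddress(true);  // relax bind()'s uniqueness check
            // Port 0: let the kernel pick an ephemeral source port.
            socket.bind(new InetSocketAddress(localAddr, 0));
            return socket;
          }

          public static void main(String[] args) throws Exception {
            Socket s = boundSocket(InetAddress.getLoopbackAddress());
            System.out.println("bound to port " + s.getLocalPort()
                + ", SO_REUSEADDR=" + s.getReuseAddress());
            s.close();
          }
        }
        ```

        After this, connect() proceeds as before; the EADDRNOTAVAIL case mentioned above would surface from the connect call, where a retry could be wrapped around it.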

        Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 7m 26s trunk passed
        +1 compile 8m 12s trunk passed with JDK v1.8.0_66
        +1 compile 9m 7s trunk passed with JDK v1.7.0_91
        +1 checkstyle 0m 15s trunk passed
        +1 mvnsite 1m 1s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 1m 51s trunk passed
        +1 javadoc 0m 54s trunk passed with JDK v1.8.0_66
        +1 javadoc 1m 5s trunk passed with JDK v1.7.0_91
        +1 mvninstall 1m 34s the patch passed
        +1 compile 8m 7s the patch passed with JDK v1.8.0_66
        +1 javac 8m 7s the patch passed
        +1 compile 8m 56s the patch passed with JDK v1.7.0_91
        +1 javac 8m 56s the patch passed
        +1 checkstyle 0m 15s the patch passed
        +1 mvnsite 1m 3s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 2s the patch passed
        +1 javadoc 0m 52s the patch passed with JDK v1.8.0_66
        +1 javadoc 1m 5s the patch passed with JDK v1.7.0_91
        -1 unit 18m 58s hadoop-common in the patch failed with JDK v1.8.0_66.
        -1 unit 6m 54s hadoop-common in the patch failed with JDK v1.7.0_91.
        +1 asflicense 0m 20s Patch does not generate ASF License warnings.
        81m 30s



        Reason Tests
        JDK v1.8.0_66 Timed out junit tests org.apache.hadoop.http.TestHttpServerLifecycle
        JDK v1.7.0_91 Failed junit tests hadoop.ipc.TestIPC
          hadoop.metrics2.impl.TestGangliaMetrics



        Subsystem Report/Notes
        Docker Image: yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12778357/HADOOP-12653.001.patch
        JIRA Issue HADOOP-12653
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux ecd690b1ea5b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 2f623fb
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8268/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8268/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_91.txt
        unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8268/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8268/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_91.txt
        JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8268/testReport/
        modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
        Max memory used 75MB
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8268/console

        This message was automatically generated.

        Steve Loughran added a comment -

        +1

        Colin P. McCabe added a comment -

        Committed to 2.9. Thanks, Steve Loughran.

        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #9096 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9096/)
        HADOOP-12653. Use SO_REUSEADDR to avoid getting "Address already in use" (cmccabe: rev 30c7dfd8ba87fe1b455ad6c05c0a6cd6486f55b7)

        • hadoop-common-project/hadoop-common/CHANGES.txt
        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java

          People

          • Assignee:
            cmccabe Colin P. McCabe
            Reporter:
            cmccabe Colin P. McCabe
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development