Hadoop Common / HADOOP-2924

HOD tries to bring up the task tracker on a port that is already in CLOSE_WAIT state

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: contrib/hod
    • Labels:
      None

      Description

      While bringing up the task tracker on random ports, HOD does not check whether a port is in CLOSE_WAIT state. As a result, starting the task tracker fails with an address bind error on that port. We can avoid this error by checking for CLOSE_WAIT state on the port before starting the task tracker.

      1. HADOOP-2924.1
        8 kB
        Vinod Kumar Vavilapalli
      2. HADOOP-2924
        3 kB
        Vinod Kumar Vavilapalli

        Activity

        Aroop Maliakkal created issue -
        Hemanth Yamijala added a comment -

        This could also happen if the host is using a port as the source port for a connection.

        Hemanth Yamijala made changes -
        Field Original Value New Value
        Priority Major [ 3 ] Critical [ 2 ]
        Issue Type Improvement [ 4 ] Bug [ 1 ]
        Assignee Vinod Kumar Vavilapalli [ vinodkv ]
        Fix Version/s 0.17.0 [ 12312913 ]
        Affects Version/s 0.16.0 [ 12312740 ]
        Vinod Kumar Vavilapalli added a comment -

        This happens because of the way HOD currently looks for a free port: it tries to connect to the port and, if it gets an exception, treats the port as free. This check fails for ports in use as source ports and for ports in CLOSE_WAIT state, as noted above.

        This patch changes the implementation to test for a free port by binding to it and ensuring that we don't get a bind exception. Tested this behaviour. Also carried out tests to make sure that ports probed this way and found to be free are still free and usable by the time the Hadoop daemons actually bind to them.

        Marked some unused methods as UNUSED.
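
        The difference between the two probing strategies described in this comment can be illustrated with a minimal Python 3 sketch. It is not the code from the attached patch: the function names `is_free_by_connect` and `is_free_by_bind`, the loopback address, and the one-second timeout are all assumptions made for illustration. A socket that is bound but not listening stands in for a port that is occupied without accepting connections (for example, one used as a source port).

```python
import socket


def is_free_by_connect(port, host="127.0.0.1"):
    """Connect-based probe (hypothetical name): a failed connect means 'free'."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(1.0)
    try:
        s.connect((host, port))
    except socket.error:
        # Nothing accepted the connection, so the port only *looks* free.
        return True
    else:
        return False
    finally:
        s.close()


def is_free_by_bind(port, host="127.0.0.1"):
    """Bind-based probe (hypothetical name): free only if bind succeeds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except socket.error:
        return False
    else:
        return True
    finally:
        s.close()


# Occupy a port without listening on it, then ask both probes about it.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))          # let the OS pick any unused port
held_port = holder.getsockname()[1]

connect_says_free = is_free_by_connect(held_port)  # misjudges the port
bind_says_free = is_free_by_bind(held_port)        # detects it is taken
holder.close()
```

        In this sketch the connect-based probe reports the held port as free, while the bind-based probe reports it as in use, which is the failure mode discussed above.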

        Vinod Kumar Vavilapalli made changes -
        Attachment HADOOP-2924 [ 12377502 ]
        Hemanth Yamijala added a comment -

        A couple of comments:

        • We should print the error message from socket.error. This will help us debug problems better, given how central the port detection method we are changing is.
        • We should also decrement the value of retries when we get socket.error. The current code does not do this either, but I think it is clearly intended.
        • Recommend removing the unused methods. I've checked that they are indeed unused.
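
        The first two suggestions above can be sketched as a retry loop in Python 3. This is only an illustration under stated assumptions, not the HOD implementation: the name `find_free_port`, its parameters (`low`, `high`, `retries`, `host`, `log`), and the message format are hypothetical.

```python
import random
import socket


def find_free_port(low, high, retries=30, host="127.0.0.1", log=print):
    """Return a port in [low, high] that could be bound, or None.

    Hypothetical helper: every failed bind is logged with the text of
    the socket.error and consumes one retry.
    """
    while retries > 0:
        port = random.randint(low, high)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return port
        except socket.error as err:
            # Report why the bind failed, then use up one retry.
            log("port %d not available: %s" % (port, err))
            retries -= 1
        finally:
            # Release the probe socket so the real daemon can bind later.
            s.close()
    return None
```

        Decrementing `retries` on every failure guarantees the loop terminates even when no port in the range is available, and the logged error text distinguishes "address already in use" from other bind failures such as insufficient permissions.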
        Vinod Kumar Vavilapalli added a comment -

        Included the above changes. Also increased the default retries from 30 to 900 (which takes around 6.2 secs in the worst case, when none of the 900 ports tried is available).

        Vinod Kumar Vavilapalli made changes -
        Attachment HADOOP-2924.1 [ 12377678 ]
        Hemanth Yamijala added a comment -

        +1 for the changes.

        Going to mark this Patch Available so that it runs through Hudson.

        Hemanth Yamijala added a comment -

        Running through Hudson.

        Hemanth Yamijala made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12377678/HADOOP-2924.1
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1962/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1962/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1962/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1962/console

        This message is automatically generated.

        Devaraj Das added a comment -

        I just committed this. Thanks, Vinod!

        Devaraj Das made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-trunk #431 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/431/ )
        Nigel Daley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Aroop Maliakkal
          • Votes:
            0
            Watchers:
            0
