Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1222

[Mumak] We should not include nodes with numeric ips in cluster topology.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: contrib/mumak
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Rumen infers cluster topology by parsing input split locations from job history logs. Due to HDFS-778, a cluster node may appear both as a numeric ip or as a host name in job history logs. We should exclude nodes appeared as numeric ips in cluster toplogy when we run mumak until a solution is found so that numeric ips would never appear in input split locations.

      1. mapreduce-1222-20091210.patch
        10 kB
        Hong Tang
      2. IPv6-predicate.patch
        5 kB
        Dick King
      3. mapreduce-1222-20091121.patch
        10 kB
        Hong Tang
      4. mapreduce-1222-20091119.patch
        10 kB
        Hong Tang

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #196 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/)

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #196 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/ )
        Hide
        Chris Douglas added a comment -

        +1

        I committed this. Thanks, Hong!

        Show
        Chris Douglas added a comment - +1 I committed this. Thanks, Hong!
        Hide
        Hong Tang added a comment -

        The failed contrib test is well known MAPREDUCE-1124. The release audit warning is due to the inclusion of two json files that cannot have license heading included.

        Show
        Hong Tang added a comment - The failed contrib test is well known MAPREDUCE-1124 . The release audit warning is due to the inclusion of two json files that cannot have license heading included.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12427681/mapreduce-1222-20091210.patch
        against trunk revision 889496.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings).

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427681/mapreduce-1222-20091210.patch against trunk revision 889496. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings). +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/319/console This message is automatically generated.
        Hide
        Hong Tang added a comment -

        New patch that excludes well-formatted ipv4 and ipv6 addresses. For ipv4 addresses, optimally allow each period preceded with double backslashes.

        Show
        Hong Tang added a comment - New patch that excludes well-formatted ipv4 and ipv6 addresses. For ipv4 addresses, optimally allow each period preceded with double backslashes.
        Hide
        Hong Tang added a comment -

        Ok, I did a bit research, and (with some help from Hairong) found out that the numeric ip string is obtained by NN when a DN registers itself with NN through o.a.h.ipc.Server.getRemoteIAddress(), which in turn calls InetAddress.getHostAddress() to get the string representation of the ip address. For an Inet6Address, the format would always be 8 hexadecimal numbers (in the range from 0 to ffff) separated by ":" (each number may be represented by 1 to 4 hexadecimal characters).

        So for this jira, I'd like to just have a simple regex to recognize this format instead of arbitrary ipv6 representations.

        Show
        Hong Tang added a comment - Ok, I did a bit research, and (with some help from Hairong) found out that the numeric ip string is obtained by NN when a DN registers itself with NN through o.a.h.ipc.Server.getRemoteIAddress(), which in turn calls InetAddress.getHostAddress() to get the string representation of the ip address. For an Inet6Address, the format would always be 8 hexadecimal numbers (in the range from 0 to ffff) separated by ":" (each number may be represented by 1 to 4 hexadecimal characters). So for this jira, I'd like to just have a simple regex to recognize this format instead of arbitrary ipv6 representations.
        Hide
        Hong Tang added a comment -

        @dick, thanks for the comments. I am combing through the jdk code too and struggling to find the any explicit API. Finally, I found out that there are calls to directly translate ipv4/6 addresses from string literal form ot Inet

        {4|6}

        Address objects, but they are not in JDK, instead they are in sun.net.util.IPAddressUtil - which makes it unusable (unless we mandate everybody use sun jvm).

        What bothers me for re-implementing RFC 2372 (IPv6 address architecture) is that it is a complicated scheme and we probably need a suite of unit-tests to guarantee our implementation is correct - which sounds to me to be way beyond the scope of this jira (which should not even exist if HDFS-778 is fixed).

        Show
        Hong Tang added a comment - @dick, thanks for the comments. I am combing through the jdk code too and struggling to find the any explicit API. Finally, I found out that there are calls to directly translate ipv4/6 addresses from string literal form ot Inet {4|6} Address objects, but they are not in JDK, instead they are in sun.net.util.IPAddressUtil - which makes it unusable (unless we mandate everybody use sun jvm). What bothers me for re-implementing RFC 2372 (IPv6 address architecture) is that it is a complicated scheme and we probably need a suite of unit-tests to guarantee our implementation is correct - which sounds to me to be way beyond the scope of this jira (which should not even exist if HDFS-778 is fixed).
        Hide
        Dick King added a comment -

        After I wrote my comment of 24/Nov/09 07:59 PM , I looked at the Java API because I came to wonder whether unescaping and using the Java API could be made to work by itself. I did look for alternatives before I created my big regular expression.

        The big problem is that Java doesn't really present any API that distinguishes numeric IP addresses from symbolic addresses. Although InetAddress.getByName(String) must have some means of parsing an IPV4 and IPV6 literal numeric address, this functionality is not presented to java.net.* users. InetAddress.getByName(String) will parse either a numeric address or a symbolic name and produce indistinguishable results. That piece of the API does not give us a means to distinguish the two. I was unable to find any other API that did make the distinction.

        The formats of numeric literal IPV4 and IPV6 internet addresses are fixed in RFCs and are extremely unlikely to be changed in the foreseeable future. We are therefore not exposed to any non-future-proofing. The only exposure we have is a possible future IPV8, but the ICANN is doing its best to make that unnecessary for a very long time.

        Considering that Apache already owns this regular expression we should consider using it.

        I considered the simpler approach of considering any address that contains a colon character to be a numeric IPV6 address, but colons are used as other punctuation, ie., separation between IP address and port number. That solution felt to me to be too brittle and accident-prone, and doesn't solve the IPV8 problem. There is a continuum of IPV6 solutions ranging from "look for a colon" to the correct regular expression you see here, and no principled way to decide where to stop.

        Show
        Dick King added a comment - After I wrote my comment of 24/Nov/09 07:59 PM , I looked at the Java API because I came to wonder whether unescaping and using the Java API could be made to work by itself. I did look for alternatives before I created my big regular expression. The big problem is that Java doesn't really present any API that distinguishes numeric IP addresses from symbolic addresses. Although InetAddress.getByName(String) must have some means of parsing an IPV4 and IPV6 literal numeric address, this functionality is not presented to java.net.* users. InetAddress.getByName(String) will parse either a numeric address or a symbolic name and produce indistinguishable results. That piece of the API does not give us a means to distinguish the two. I was unable to find any other API that did make the distinction. The formats of numeric literal IPV4 and IPV6 internet addresses are fixed in RFCs and are extremely unlikely to be changed in the foreseeable future. We are therefore not exposed to any non-future-proofing. The only exposure we have is a possible future IPV8, but the ICANN is doing its best to make that unnecessary for a very long time. Considering that Apache already owns this regular expression we should consider using it. I considered the simpler approach of considering any address that contains a colon character to be a numeric IPV6 address, but colons are used as other punctuation, ie., separation between IP address and port number. That solution felt to me to be too brittle and accident-prone, and doesn't solve the IPV8 problem. There is a continuum of IPV6 solutions ranging from "look for a colon" to the correct regular expression you see here, and no principled way to decide where to stop.
        Hide
        Hong Tang added a comment -

        @dick, thanks for the help. The solution is more complex than I would like to have in solving the trivial problem this jira represents.

        I suggest we go for a less efficient, but more direct and thus easier-to-maintain solution: unescaping the dots and rely on java's IPv4 or IPv6 parsing code.

        Show
        Hong Tang added a comment - @dick, thanks for the help. The solution is more complex than I would like to have in solving the trivial problem this jira represents. I suggest we go for a less efficient, but more direct and thus easier-to-maintain solution: unescaping the dots and rely on java's IPv4 or IPv6 parsing code.
        Hide
        Dick King added a comment -

        This is a supplement to the previous patch to expand the regexp to meet the requirements of RFC 4291 .

        The regular expression and the finite state machine that supports it is longer than it needs to be – about 2.6K, and probably about 500-1000 nodes in the FSM. I did it this way to avoid as much backtracking as possible. It seems to take about 10 microseconds to handle an IPV6 full address, a bit more for an address with elisions, a bit less for an IPV4 address, and about 5 microseconds for a non-numeric address.

        I'll open the regular expression for comment before I fold it in with the patch and with other changes I've suggested in the main patch.

        Show
        Dick King added a comment - This is a supplement to the previous patch to expand the regexp to meet the requirements of RFC 4291 . The regular expression and the finite state machine that supports it is longer than it needs to be – about 2.6K, and probably about 500-1000 nodes in the FSM. I did it this way to avoid as much backtracking as possible. It seems to take about 10 microseconds to handle an IPV6 full address, a bit more for an address with elisions, a bit less for an IPV4 address, and about 5 microseconds for a non-numeric address. I'll open the regular expression for comment before I fold it in with the patch and with other changes I've suggested in the main patch.
        Hide
        Dick King added a comment -

        We have to use a regular expression rather than any API that java might provide for parsing IP addresses, because the punctuation characters in IP addresses sometimes are – and sometimes are not – escaped.

        Show
        Dick King added a comment - We have to use a regular expression rather than any API that java might provide for parsing IP addresses, because the punctuation characters in IP addresses sometimes are – and sometimes are not – escaped.
        Hide
        Allen Wittenauer added a comment -

        Yes, you should worry about IPv6.

        Show
        Allen Wittenauer added a comment - Yes, you should worry about IPv6.
        Hide
        Dick King added a comment -

        1: Should we worry about IPV6 numeric addresses? I'll write the regex if you want.

        2: Proposed line 249 of SimulatorEngine.java reads:

        // ips and as host names. We remove them from the parsed network topology

        and should read

        // ips [as opposed to host names]. We remove numeric ips from the parsed network topology

        The former wording implies that we remove both forms of a doubled address.

        3: Do we mean to make isIPAddress part of the package local API?

        3a: If not, testing isIPAddress should be accomplished by making the addresses that superficially look like IPs but aren't part of the test case in topo-with-numeric-ips.json .

        Show
        Dick King added a comment - 1: Should we worry about IPV6 numeric addresses? I'll write the regex if you want. 2: Proposed line 249 of SimulatorEngine.java reads: // ips and as host names. We remove them from the parsed network topology and should read // ips [as opposed to host names] . We remove numeric ips from the parsed network topology The former wording implies that we remove both forms of a doubled address. 3: Do we mean to make isIPAddress part of the package local API? 3a: If not, testing isIPAddress should be accomplished by making the addresses that superficially look like IPs but aren't part of the test case in topo-with-numeric-ips.json .
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425721/mapreduce-1222-20091121.patch
        against trunk revision 882790.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12425721/mapreduce-1222-20091121.patch against trunk revision 882790. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/257/console This message is automatically generated.
        Hide
        Hong Tang added a comment -

        Added a boolean configuration parameter "mumak.topology.filter-numeric-ips" to enable/disable numeric ip filtering.

        Show
        Hong Tang added a comment - Added a boolean configuration parameter "mumak.topology.filter-numeric-ips" to enable/disable numeric ip filtering.
        Hide
        Hong Tang added a comment -

        The release audit warning is due to the inclusion of json traces (where I cannot add license header).

        The failed contrib test is from gridmix3, and is unrelated to this patch.

        Show
        Hong Tang added a comment - The release audit warning is due to the inclusion of json traces (where I cannot add license header). The failed contrib test is from gridmix3, and is unrelated to this patch.
        Hide
        Hong Tang added a comment -

        What happens if there is no hostname? i.e., the cluster is completely done by IP address?

        The attached patch assumes that task trackers advertise themselves by host names. To avoid overly complicating the patch, I'd probably just add an option to allow users to enable/disable node exclusion based on numeric ips.

        Show
        Hong Tang added a comment - What happens if there is no hostname? i.e., the cluster is completely done by IP address? The attached patch assumes that task trackers advertise themselves by host names. To avoid overly complicating the patch, I'd probably just add an option to allow users to enable/disable node exclusion based on numeric ips.
        Hide
        Allen Wittenauer added a comment -

        What happens if there is no hostname? i.e., the cluster is completely done by IP address?

        Show
        Allen Wittenauer added a comment - What happens if there is no hostname? i.e., the cluster is completely done by IP address?
        Hide
        Hong Tang added a comment -

        Rather than ignore, shouldn't you just normalized all of the nodes to be hostnames? [The fact that hadoop uses IPs internally at all is a mistake IMO.]

        This would have to be done by modifying the API of ClusterStory in rumen. I'd prefer to wait until we know whether HDFS-778 is fixable or not.

        Show
        Hong Tang added a comment - Rather than ignore, shouldn't you just normalized all of the nodes to be hostnames? [The fact that hadoop uses IPs internally at all is a mistake IMO.] This would have to be done by modifying the API of ClusterStory in rumen. I'd prefer to wait until we know whether HDFS-778 is fixable or not.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425489/mapreduce-1222-20091119.patch
        against trunk revision 881673.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings).

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12425489/mapreduce-1222-20091119.patch against trunk revision 881673. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings). +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/254/console This message is automatically generated.
        Hide
        Allen Wittenauer added a comment -

        Rather than ignore, shouldn't you just normalized all of the nodes to be hostnames? [The fact that hadoop uses IPs internally at all is a mistake IMO.]

        Show
        Allen Wittenauer added a comment - Rather than ignore, shouldn't you just normalized all of the nodes to be hostnames? [The fact that hadoop uses IPs internally at all is a mistake IMO.]
        Hide
        Hong Tang added a comment -

        BTW, we should still maintain the mapping between static ips to rack-switch-ips, and feed mapping into the static dns-to-swtich mapping.

        Show
        Hong Tang added a comment - BTW, we should still maintain the mapping between static ips to rack-switch-ips, and feed mapping into the static dns-to-swtich mapping.

          People

          • Assignee:
            Hong Tang
            Reporter:
            Hong Tang
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development