Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1740

NPE in getMatchingLevelForNodes when node locations are variable depth

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.3
    • Fix Version/s: 1.1.0, 2.0.0-alpha
    • Component/s: jobtracker
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      In getMatchingLevelForNodes, we assume that both nodes have the same "depth" (ie number of path components). If the user provides a topology script that assigns one node a path like /foo/bar/baz and another node a path like /foo/blah, this function will throw an NPE.

      I'm not sure if there are other places where we assume that all node locations have a constant number of paths. If so we should check the output of the topology script aggressively to be sure this is the case. Otherwise I think we simply need to add && n2 != null to the while loop

      1. mapreduce-1740.txt
        6 kB
        Todd Lipcon
      2. MAPREDUCE-1740.patch
        7 kB
        Ahmed Radwan
      3. MAPREDUCE-1740_branch-1.0.patch
        7 kB
        Ahmed Radwan
      4. MAPREDUCE-1740_trunk.patch
        6 kB
        Ahmed Radwan

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Attached patch fixes the NPE, though I haven't had a chance to test on an actual misconfigured cluster.

          Is this the right approach or should we just do a stricter job of verifying the results of the topology mapping script to ensure that all hosts are the same level? Does anyone have a legitimate use case for locality of varying depths?

          Show
          Todd Lipcon added a comment - Attached patch fixes the NPE, though I haven't had a chance to test on an actual misconfigured cluster. Is this the right approach or should we just do a stricter job of verifying the results of the topology mapping script to ensure that all hosts are the same level? Does anyone have a legitimate use case for locality of varying depths?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12454248/mapreduce-1740.txt
          against trunk revision 1075216.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/82//testReport/
          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/82//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12454248/mapreduce-1740.txt against trunk revision 1075216. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/82//testReport/ Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/82//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12454248/mapreduce-1740.txt
          against trunk revision 1079072.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/128//testReport/
          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/128//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12454248/mapreduce-1740.txt against trunk revision 1079072. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/128//testReport/ Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/128//console This message is automatically generated.
          Hide
          Arun C Murthy added a comment -

          Sorry to come in late, the patch has gone stale. Can you please rebase? Thanks.

          Given this is not an issue with MRv2 should we still commit this? I'm happy to, but not sure it's useful. Thanks.

          Show
          Arun C Murthy added a comment - Sorry to come in late, the patch has gone stale. Can you please rebase? Thanks. Given this is not an issue with MRv2 should we still commit this? I'm happy to, but not sure it's useful. Thanks.
          Hide
          Harsh J added a comment -

          This should be fixed for 1.x as it is pretty serious, and can bring the JT scheduler down on its knees.

          One strace:

          INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8021, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus, false, false, true, ...) from xyzzy:abcdef: error: java.io.IOException: java.lang.NullPointerException
          java.io.IOException: java.lang.NullPointerException
          	at org.apache.hadoop.mapred.JobInProgress.getMatchingLevelForNodes(JobInProgress.java:1676)
          	at org.apache.hadoop.mapred.JobInProgress.getLocalityLevel(JobInProgress.java:3354)
          	at org.apache.hadoop.mapred.JobInProgress.addRunningTaskToTIP(JobInProgress.java:1752)
          	at org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1341)
          	at org.apache.hadoop.mapred.JobSchedulable.assignTask(JobSchedulable.java:142)
          	at org.apache.hadoop.mapred.PoolSchedulable.assignTask(PoolSchedulable.java:160)
          	at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:440)
          	at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3226)
          	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
          	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
          	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:396)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
          	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
          
          Show
          Harsh J added a comment - This should be fixed for 1.x as it is pretty serious, and can bring the JT scheduler down on its knees. One strace: INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8021, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus, false , false , true , ...) from xyzzy:abcdef: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobInProgress.getMatchingLevelForNodes(JobInProgress.java:1676) at org.apache.hadoop.mapred.JobInProgress.getLocalityLevel(JobInProgress.java:3354) at org.apache.hadoop.mapred.JobInProgress.addRunningTaskToTIP(JobInProgress.java:1752) at org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1341) at org.apache.hadoop.mapred.JobSchedulable.assignTask(JobSchedulable.java:142) at org.apache.hadoop.mapred.PoolSchedulable.assignTask(PoolSchedulable.java:160) at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:440) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3226) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
          Hide
          Ahmed Radwan added a comment -

          An updated patch for branch-1.0. Please review.

          Show
          Ahmed Radwan added a comment - An updated patch for branch-1.0. Please review.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517192/MAPREDUCE-1740.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2010//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517192/MAPREDUCE-1740.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2010//console This message is automatically generated.
          Hide
          Steve Loughran added a comment -

          Looks OK on a quick review; will need to test a bit more.

          (trunk only) AbstractDNSToSwitchMapping() has a getSwitchMap() method which can return the map of host->switch (or null in the base class), which could be used for more assertions. This class isn't in branch1, so these asserts will have to be left out there.

          It also has dumpTopology() method to convert the map to a string, which can be used for diagnostics in logs or assertions.

          style issues

          1. please follow hadoop project spacing rules esp. in if () conditions
          2. use LOG over System.out even in tests
          3. some of the static fields used in the tests could be made final
          Show
          Steve Loughran added a comment - Looks OK on a quick review; will need to test a bit more. (trunk only) AbstractDNSToSwitchMapping() has a getSwitchMap() method which can return the map of host->switch (or null in the base class), which could be used for more assertions. This class isn't in branch1, so these asserts will have to be left out there. It also has dumpTopology() method to convert the map to a string, which can be used for diagnostics in logs or assertions. style issues please follow hadoop project spacing rules esp. in if () conditions use LOG over System.out even in tests some of the static fields used in the tests could be made final
          Hide
          Todd Lipcon added a comment -

          Steve: not sure I follow what you mean about making use of getSwitchMap – which assert are you suggesting this be used for?

          Show
          Todd Lipcon added a comment - Steve: not sure I follow what you mean about making use of getSwitchMap – which assert are you suggesting this be used for?
          Hide
          Steve Loughran added a comment -

          Todd -it makes it possible to view the entire map of host -> switch. If the tests can avoid that, its better because then they will all work on 1.x too.

          Show
          Steve Loughran added a comment - Todd -it makes it possible to view the entire map of host -> switch. If the tests can avoid that, its better because then they will all work on 1.x too.
          Hide
          Ahmed Radwan added a comment -

          Thanks Steve and Todd for the review! I am attaching an updated patch for branch-1.0 incorporating Steve's style comments. The patch mainly is to avoid the NPE that we are seeing with some users when the compared nodes are at different depth.

          Show
          Ahmed Radwan added a comment - Thanks Steve and Todd for the review! I am attaching an updated patch for branch-1.0 incorporating Steve's style comments. The patch mainly is to avoid the NPE that we are seeing with some users when the compared nodes are at different depth.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517623/MAPREDUCE-1740_branch-1.0.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2019//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517623/MAPREDUCE-1740_branch-1.0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2019//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          Hi Ahmed. Can you attach a patch for trunk? We need to fix this in trunk as well, right? We can commit the 1.0 patch at the same time.

          Show
          Todd Lipcon added a comment - Hi Ahmed. Can you attach a patch for trunk? We need to fix this in trunk as well, right? We can commit the 1.0 patch at the same time.
          Hide
          Ahmed Radwan added a comment -

          Thanks Todd! Attached an updated version of the patch for trunk.

          Show
          Ahmed Radwan added a comment - Thanks Todd! Attached an updated version of the patch for trunk.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518506/MAPREDUCE-1740_trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.mapred.TestWritableJobConf

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2059//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2059//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518506/MAPREDUCE-1740_trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.mapred.TestWritableJobConf +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2059//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2059//console This message is automatically generated.
          Hide
          Ahmed Radwan added a comment -

          The TestWritableJobConf failure doesn't look related to this patch.

          Show
          Ahmed Radwan added a comment - The TestWritableJobConf failure doesn't look related to this patch.
          Hide
          Alejandro Abdelnur added a comment -

          +1

          Show
          Alejandro Abdelnur added a comment - +1
          Hide
          Alejandro Abdelnur added a comment -

          Thanks Ahmed. Committed to trunk and branch-1

          Show
          Alejandro Abdelnur added a comment - Thanks Ahmed. Committed to trunk and branch-1
          Hide
          Arun C Murthy added a comment -

          This is nice to fix. However, we should look at HDFS too. Doing it in isolation in MR isn't sufficient.

          Show
          Arun C Murthy added a comment - This is nice to fix. However, we should look at HDFS too. Doing it in isolation in MR isn't sufficient.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1907 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1907/)
          MAPREDUCE-1740. NPE in getMatchingLevelForNodes when node locations are variable depth (ahmed via tucu) (Revision 1303076)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303076
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobInProgress.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapred/TestJobInProgress.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1907 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1907/ ) MAPREDUCE-1740 . NPE in getMatchingLevelForNodes when node locations are variable depth (ahmed via tucu) (Revision 1303076) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303076 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobInProgress.java /hadoop/common/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapred/TestJobInProgress.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1981 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1981/)
          MAPREDUCE-1740. NPE in getMatchingLevelForNodes when node locations are variable depth (ahmed via tucu) (Revision 1303076)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303076
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobInProgress.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapred/TestJobInProgress.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #1981 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1981/ ) MAPREDUCE-1740 . NPE in getMatchingLevelForNodes when node locations are variable depth (ahmed via tucu) (Revision 1303076) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303076 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobInProgress.java /hadoop/common/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapred/TestJobInProgress.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1025 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1025/)
          MAPREDUCE-1740. NPE in getMatchingLevelForNodes when node locations are variable depth (ahmed via tucu) (Revision 1303076)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303076
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobInProgress.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapred/TestJobInProgress.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1025 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1025/ ) MAPREDUCE-1740 . NPE in getMatchingLevelForNodes when node locations are variable depth (ahmed via tucu) (Revision 1303076) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303076 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobInProgress.java /hadoop/common/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapred/TestJobInProgress.java
          Hide
          Matt Foley added a comment -

          made FixedVersions compatible with commits

          Show
          Matt Foley added a comment - made FixedVersions compatible with commits
          Hide
          Matt Foley added a comment -

          Closed upon release of Hadoop-1.1.0.

          Show
          Matt Foley added a comment - Closed upon release of Hadoop-1.1.0.

            People

            • Assignee:
              Ahmed Radwan
              Reporter:
              Todd Lipcon
            • Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development