Hadoop Map/Reduce / MAPREDUCE-2

ArrayOutOfIndex error in KeyFieldBasedPartitioner on empty key

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      KeyFieldBasedPartitioner threw an ArrayIndexOutOfBoundsException when passed an empty key. With this patch, an empty key hashes to 0.

      Description

      When using KeyFieldBasedPartitioner, if a record does not contain the specified field, endChar equals array.length. Indexing the array at that offset throws an ArrayIndexOutOfBoundsException, and the record is lost.
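The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual KeyFieldBasedPartitioner source: the class name and the methods `hashRange`, `unguardedHash`, and `guardedHash` are hypothetical, and the hash is assumed to run over an inclusive byte range. It only illustrates how an end offset equal to the array length overruns the array, and how returning 0 for an empty key (as the release note describes) avoids it.

```java
import java.nio.charset.StandardCharsets;

class EmptyKeyHashSketch {
    // Hypothetical stand-in for a field hash over the inclusive range [start, end].
    static int hashRange(byte[] b, int start, int end, int seed) {
        int h = seed;
        for (int i = start; i <= end; i++) {
            h = 31 * h + b[i]; // throws if end reaches b.length
        }
        return h;
    }

    // Unguarded variant: trusts the computed end offset.
    static int unguardedHash(byte[] keyBytes, int start, int end) {
        return hashRange(keyBytes, start, end, 0);
    }

    // Guarded variant: an empty key hashes to 0, as in the release note.
    static int guardedHash(byte[] keyBytes, int start, int end) {
        if (keyBytes.length == 0) {
            return 0;
        }
        return hashRange(keyBytes, start, end, 0);
    }

    public static void main(String[] args) {
        byte[] empty = "".getBytes(StandardCharsets.UTF_8);
        try {
            // The end offset equals the array length (0), so index 0 is out of range.
            unguardedHash(empty, 0, empty.length);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded: " + e.getClass().getSimpleName());
        }
        System.out.println("guarded: " + guardedHash(empty, 0, empty.length));
    }
}
```

In a real job the uncaught exception would surface inside the partitioner call, which is how the record ends up lost.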

      1. MAPREDUCE-2-v1.1-branch-0.20.patch
        3 kB
        Amar Kamat
      2. MAPREDUCE-2-v1.1.patch
        3 kB
        Amar Kamat
      3. MAPREDUCE-2-v1.0.patch
        3 kB
        Amar Kamat
      4. HADOOP-6052-v1.1.patch
        3 kB
        Amar Kamat
      5. HADOOP-6052-v1.0-branch0.20.patch
        3 kB
        Amar Kamat
      6. HADOOP-6052-v1.0.patch
        3 kB
        Amar Kamat

          Activity

          Amar Kamat added a comment -

          Attaching a fix. Incorporated Jothi's comments from HADOOP-5779. Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant test now.

          Amar Kamat added a comment -

          Following tests failed.

          Name | Type | Result | Resolution
          org.apache.hadoop.mapred.TestReduceFetch | FAILED | Rerun also failed | HADOOP-6029
          org.apache.hadoop.mapred.TestRunningTaskLimits | FAILED | Rerun passed | ?
          org.apache.hadoop.mapred.TestTaskLimits | FAILED (timeout) | Rerun also failed | HADOOP-5993/HADOOP-6061

          Looking at TestRunningTaskLimits, I see the following code

          
              JobConf jobConf = createWaitJobConf(mr, "job1", 20, 20);
              jobConf.setRunningMapLimit(5);
              jobConf.setRunningReduceLimit(3);
              
              // Submit the job
              RunningJob rJob = (new JobClient(jobConf)).submitJob(jobConf);
              
              // Wait 20 seconds for it to start up
              UtilsForTests.waitFor(20000);
              
              // Check the number of running tasks
              JobTracker jobTracker = mr.getJobTrackerRunner().getJobTracker();
              JobInProgress jip = jobTracker.getJob(rJob.getID());
              assertEquals(5, jip.runningMaps());
              assertEquals(3, jip.runningReduces());
          

          I don't think waiting a fixed 20 secs is a good thing to do. When I look at the logs, only one reducer had been scheduled.
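One common alternative to a fixed sleep is to poll for the expected state up to a deadline. The sketch below is only an illustration of that idea: `waitUntil` is a hypothetical helper, not a method of Hadoop's UtilsForTests, and the counter stands in for a value such as the number of running reduces.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class PollingWaitSketch {
    // Hypothetical helper: polls the condition every pollMs milliseconds
    // until it holds or timeoutMs has elapsed. Returns whether it held.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a scheduler that starts tasks at its own pace.
        AtomicInteger runningReduces = new AtomicInteger(0);
        Thread scheduler = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    return;
                }
                runningReduces.incrementAndGet();
            }
        });
        scheduler.start();

        // Wait up to 5 seconds for the limit to be reached, checking every 20 ms.
        boolean reached = waitUntil(() -> runningReduces.get() == 3, 5000, 20);
        scheduler.join();
        System.out.println("reached limit: " + reached);
    }
}
```

With this shape the test returns as soon as the condition holds and fails only when the deadline passes, so a slow scheduler does not produce a spurious assertion failure.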

          Contrib tests passed except

          Name | Type | Result | Resolution
          org.apache.hadoop.streaming.TestStreamingExitStatus | FAILED | Known issue | HADOOP-5906
          org.apache.hadoop.streaming.TestStreamingStderr | FAILED (timeout) | Known issue | HADOOP-6062
          org.apache.hadoop.mapred.TestCapacitySchedulerConf | FAILED | Second run passed after deleting capacity-scheduler.xml from conf | ?
          Amar Kamat added a comment -

          Opened HADOOP-6065 to address the failure of TestRunningTaskLimits.

          Amar Kamat added a comment -

          Attaching a patch for branch 0.20

          Devaraj Das added a comment -

          Sorry for commenting so late on this one - the check for (startChar < 0) should happen before endChar is evaluated, no? If startChar < 0, the endChar evaluation is redundant.
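The ordering suggested here can be sketched as follows. This is an illustrative, self-contained example rather than the patch itself: the class, the tab separator, and the helpers `startOffset`, `endOffset`, and `hashFields` are all hypothetical. The counter exists only to show that the end offset is never computed for a missing field.

```java
import java.nio.charset.StandardCharsets;

class FieldOffsetOrderSketch {
    static int endOffsetCalls = 0; // counts how often the end offset is evaluated

    // Start index of the 1-based field, or -1 when the key has no such field.
    static int startOffset(byte[] key, int field, byte sep) {
        if (key.length == 0) {
            return -1;
        }
        int current = 1;
        int i = 0;
        while (current < field) {
            while (i < key.length && key[i] != sep) {
                i++;
            }
            if (i == key.length) {
                return -1; // ran out of separators before reaching the field
            }
            i++; // skip the separator
            current++;
        }
        return i;
    }

    // Inclusive end index of the field that begins at start.
    static int endOffset(byte[] key, int start, byte sep) {
        endOffsetCalls++;
        int i = start;
        while (i < key.length && key[i] != sep) {
            i++;
        }
        return i - 1;
    }

    static int hashFields(String keyText, int[] fields) {
        byte[] key = keyText.getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (int field : fields) {
            int start = startOffset(key, field, (byte) '\t');
            if (start < 0) {
                continue; // field missing: skip before evaluating the end offset
            }
            int end = endOffset(key, start, (byte) '\t');
            for (int i = start; i <= end; i++) {
                hash = 31 * hash + key[i];
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hashFields("", new int[] {1, 2}));     // prints 0
        System.out.println(hashFields("a\tb", new int[] {2, 3})); // prints 98
        System.out.println(endOffsetCalls);                       // prints 1
    }
}
```

Only the second field of "a\tb" exists among the requested fields, so the end offset is evaluated once across both calls.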

          Amar Kamat added a comment -

          Opened HADOOP-6075 to address the TestTaskTrackerMemoryManager failure.

          Amar Kamat added a comment -

          Attaching a new patch incorporating Devaraj's comments. Running test-patch. Waiting for HADOOP-6076.

          Amar Kamat added a comment -

          Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant test now.

          Amar Kamat added a comment -

          Attaching a patch for mapreduce. The patch applies cleanly on my box.

          Amar Kamat added a comment -

          Attaching a new patch which moves the testcase to lib/TestKeyFieldBasedPartitioner.

          Amar Kamat added a comment -

          Attaching a patch for branch-20.

          Sharad Agarwal added a comment -

          I committed this. Thanks Amar!

          Sharad Agarwal added a comment -

          Committed to mapreduce trunk and 0.20 branch.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #15 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/)


            People

            • Assignee:
              Amar Kamat
              Reporter:
              Amar Kamat
            • Votes:
              0
              Watchers:
              0
