Hadoop Common
HADOOP-5850

map/reduce doesn't run jobs with 0 maps

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently, the framework ignores jobs that have 0 maps. This is incorrect. Many pipelines need the job to run (if nothing else, to create the output directory!) so that subsequent jobs don't fail. Effectively, there will be no map tasks and the reduce tasks should immediately set up the Reducer and RecordWriter and then call close on both since there are no inputs to the reduce. I believe it should just work if we remove the check...
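The behaviour described above can be sketched schematically. The types below are hypothetical stand-ins, not Hadoop's actual Reducer/RecordWriter API: with no map outputs, the reduce loop body never runs, but the output writer is still opened and closed, so an empty part file (and the output directory) still get created.

```java
import java.util.Collections;
import java.util.Iterator;

// Schematic sketch of a reduce with zero inputs (illustrative, not Hadoop code).
public class ZeroInputReduce {

    interface RecordWriter {               // hypothetical stand-in for Hadoop's RecordWriter
        void write(String key, String value);
        void close();
    }

    static int runReduce(Iterator<String> input, RecordWriter writer) {
        int records = 0;
        while (input.hasNext()) {          // never entered when there are no map outputs
            writer.write("key", input.next());
            records++;
        }
        writer.close();                    // still called: finalizes the (empty) part file
        return records;
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        RecordWriter writer = new RecordWriter() {
            public void write(String k, String v) { }
            public void close() { closed[0] = true; }
        };
        int n = runReduce(Collections.<String>emptyIterator(), writer);
        System.out.println("records=" + n + " closed=" + closed[0]);
    }
}
```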

      1. HADOOP-5850-20090522-branch-20-final.txt
        20 kB
        Vinod Kumar Vavilapalli
      2. HADOOP-5850-20090522.txt
        21 kB
        Vinod Kumar Vavilapalli
      3. screenshot-1.jpg
        174 kB
        Ramya Sunil
      4. HADOOP-5850-20090520-svn-branch-20.v2.txt
        18 kB
        Vinod Kumar Vavilapalli
      5. HADOOP-5850-20090520-svn.1.txt
        18 kB
        Vinod Kumar Vavilapalli
      6. HADOOP-5850-20090519.1.txt
        14 kB
        Vinod Kumar Vavilapalli
      7. HADOOP-5850-20090519.txt
        14 kB
        Vinod Kumar Vavilapalli

        Activity

        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]
        Vinod Kumar Vavilapalli added a comment -

        [...] Will address this in a new JIRA.

        HADOOP-5908

        Vinod Kumar Vavilapalli added a comment -

        This situation occurs if the scheduler is invoked and calls job.obtainNewMapTask() while the job-cleanup task of this job is still running. Discussed this with Devaraj, who concurs that obtainNewMapTask()/obtainNewReduceTask() should return immediately, doing nothing, when job-cleanup is running. Will address this in a new JIRA.
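The proposed guard can be sketched as follows. The names are illustrative only (the actual fix was deferred to a separate JIRA): while the job-cleanup task is running, task-assignment calls return immediately without handing out work.

```java
// Illustrative sketch of the early-return guard described above; not Hadoop's
// actual JobInProgress code.
public class SchedulingGuard {
    private boolean cleanupRunning;

    void setCleanupRunning(boolean running) { cleanupRunning = running; }

    String obtainNewMapTask() {
        if (cleanupRunning) {
            return null;                 // do nothing: the job is already winding down
        }
        return "map-task";               // placeholder for real task assignment
    }

    public static void main(String[] args) {
        SchedulingGuard job = new SchedulingGuard();
        System.out.println(job.obtainNewMapTask());
        job.setCleanupRunning(true);
        System.out.println(job.obtainNewMapTask());
    }
}
```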

        Ramya Sunil added a comment -

        With the above fix, when a job (writing to DFS) with 0 maps and >0 reduces is submitted, the cluster hangs completely.
        The TT logs show "INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to <jt> with responseId 'ID'" endlessly, and the JT throws java.io.IOException: java.lang.ArithmeticException forever.
        Below is the stacktrace:

         
        2009-05-25 08:13:00,124 INFO org.apache.hadoop.ipc.Server: IPC Server handler 37 on <portno>, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@14d128c, false, false, true, 3231) from <ip>:<port> error: java.io.IOException:
         java.lang.ArithmeticException: / by zero
        java.io.IOException: java.lang.ArithmeticException: / by zero
                at org.apache.hadoop.mapred.ResourceEstimator.getEstimatedMapOutputSize(ResourceEstimator.java:85)
                at org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:1729)
                at org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:978)
                at org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:572)
                at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:418)
                at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:498)
                at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:277)
                at org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:977)
                at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2605)
                at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        

        In such a case, all jobs hang indefinitely without progressing and the cluster is completely down.
        This problem is resolved only when the no-map job is killed. Once that job is killed, the cluster comes back up and the other jobs proceed smoothly.
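The stack trace points at ResourceEstimator.getEstimatedMapOutputSize. A plausible reconstruction of the failure mode (hypothetical names; this is not the actual Hadoop code or the actual fix) is an output-size estimate computed as a ratio over completed maps, which divides by zero for a zero-map job:

```java
// Hypothetical reconstruction of the "/ by zero" seen above, with an
// illustrative guard; not Hadoop's actual ResourceEstimator.
public class OutputSizeEstimator {
    long totalMapOutputBytes;   // 0 for the jobs discussed here
    int completedMaps;          // also 0: nothing has run

    long naiveEstimate() {
        // Throws ArithmeticException when completedMaps == 0
        return totalMapOutputBytes / completedMaps;
    }

    long guardedEstimate() {
        if (completedMaps == 0) {
            return 0;           // nothing to estimate for a zero-map job
        }
        return totalMapOutputBytes / completedMaps;
    }

    public static void main(String[] args) {
        OutputSizeEstimator e = new OutputSizeEstimator();
        try {
            e.naiveEstimate();
        } catch (ArithmeticException ex) {
            System.out.println("naive: " + ex.getMessage());
        }
        System.out.println("guarded: " + e.guardedEstimate());
    }
}
```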

        Devaraj Das made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Devaraj Das added a comment -

        I just committed this. Thanks, Vinod!

        Vinod Kumar Vavilapalli made changes -
        Vinod Kumar Vavilapalli added a comment -

        Patch for branch-20.

        Vinod Kumar Vavilapalli added a comment -

        ant test-patch and run-test-mapred targets passed with the patch.

        Amareshwari Sriramadasu added a comment -

        Changes look fine to me.

        Vinod Kumar Vavilapalli made changes -
        Attachment HADOOP-5850-20090522.txt [ 12408793 ]
        Vinod Kumar Vavilapalli added a comment -

        Attaching patch incorporating the review comments and fixing the issues.

        Vinod Kumar Vavilapalli made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Vinod Kumar Vavilapalli added a comment -

        Thank you Ramya for finding out a problem with the patch! I am working on fixing this and uploading a new patch.

        Ramya Sunil made changes -
        Attachment screenshot-1.jpg [ 12408771 ]
        Ramya Sunil added a comment -

        Had offline discussion with Nigel. Attaching a screenshot for the Exception thrown in taskdetails.jsp page.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12408651/HADOOP-5850-20090520-svn-branch-20.v2.txt
        against trunk revision 777330.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 11 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/374/console

        This message is automatically generated.

        Ramya Sunil added a comment -

        The patch seems to introduce a new error. The taskdetails.jsp page for job setup and cleanup tasks throws a NullPointerException. This is the case for every job's setup and cleanup tasks. Below is the exception seen on the UI:
        java.lang.NullPointerException
        at org.apache.hadoop.mapred.TaskInProgress.getSplitNodes(TaskInProgress.java:1034)
        at org.apache.hadoop.mapred.taskdetails_jsp._jspService(taskdetails_jsp.java:288)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:4

        Amareshwari Sriramadasu added a comment -

        Setting map progress and reduce progress to 1.0f should not be done in the updateTaskStatus() method; it can be done when setup completes, in completedTask().
        Catching NullPointerException in the testcase doesn't seem correct; just throw it out if there is any.
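The first review point above can be sketched as follows (a minimal illustration with hypothetical names, not Hadoop's actual JobInProgress code): the one-time progress transition belongs in the completion handler, not in the per-heartbeat status-update path.

```java
// Illustrative sketch of the review suggestion; names are hypothetical.
public class ProgressUpdate {
    float mapProgress;            // starts at 0.0f
    int numMaps;                  // zero for the jobs discussed in this issue

    void updateTaskStatus() {
        // per-heartbeat bookkeeping only; no special-case progress setting here
    }

    void completedTask() {
        if (numMaps == 0) {
            mapProgress = 1.0f;   // one-time transition at completion
        }
    }

    public static void main(String[] args) {
        ProgressUpdate job = new ProgressUpdate();
        job.updateTaskStatus();   // leaves progress untouched
        job.completedTask();
        System.out.println(job.mapProgress);
    }
}
```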

        Vinod Kumar Vavilapalli made changes -
        Vinod Kumar Vavilapalli added a comment -

        Patch for branch 0.20.

        Vinod Kumar Vavilapalli added a comment -

        Forgot to summarize. With this patch,

        • Jobs with zero maps will still run job setup and cleanup tasks.
        • If the number of reduces is non-zero, the reduces run, leaving behind the corresponding number of empty part files in the output directory.
        • If the number of reduces is also zero, an empty output directory is left behind.
        • The map progress (and reduce progress, if the number of reduces is zero) is set to 1.0 once the job cleanup task finishes.
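The summary above can be illustrated with a small sketch of the expected output layout. The part-file naming follows the classic part-NNNNN convention; the helper itself is hypothetical, not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the output layout for a zero-map job, per the summary above.
public class ZeroMapOutputs {
    static List<String> expectedOutputs(int numReduces) {
        List<String> parts = new ArrayList<String>();
        for (int i = 0; i < numReduces; i++) {
            parts.add(String.format("part-%05d", i));  // zero-length part file
        }
        return parts;  // empty list: only the empty output directory remains
    }

    public static void main(String[] args) {
        System.out.println(expectedOutputs(2));   // [part-00000, part-00001]
        System.out.println(expectedOutputs(0));   // []
    }
}
```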
        Vinod Kumar Vavilapalli made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Vinod Kumar Vavilapalli made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Vinod Kumar Vavilapalli made changes -
        Attachment HADOOP-5850-20090520-svn.1.txt [ 12408613 ]
        Vinod Kumar Vavilapalli added a comment -

        Attaching the final patch. The previous patch had some problems because of which TestSetupAndCleanupFailure failed.

        This patch fixes those problems. It passed ant test-patch and the core and contrib tests, except for the following, which failed or timed out even without this patch and are unrelated to its changes.

        Failed:

        • org.apache.hadoop.streaming.TestMultipleCachefiles
        • org.apache.hadoop.streaming.TestStreamingBadRecords
        • org.apache.hadoop.streaming.TestSymLink

        Timedout:

        • org.apache.hadoop.mapred.TestJobInProgressListener FAILED (timeout)
        • org.apache.hadoop.mapred.TestQueueCapacities FAILED (timeout)
        Tom White added a comment -

        You should be able to use LazyOutputFormat if you wanted no part files to be produced.
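For reference, a configuration sketch of Tom's suggestion (a fragment only, not runnable standalone; it assumes the Hadoop jars on the classpath). LazyOutputFormat wraps the real output format so a part file is only created when the first record is written, meaning reduces that write nothing leave no part files behind.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.LazyOutputFormat;

public class LazyOutputExample {
    // Wrap the real output format; part files are created lazily on first write.
    public static void configure(JobConf conf) {
        LazyOutputFormat.setOutputFormatClass(conf, TextOutputFormat.class);
    }
}
```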

        Vinod Kumar Vavilapalli added a comment -

        Should the output directories be empty or should they produce zero-length part files?

        The job will run the N reduces that the user intends, and they should produce zero-length part files. The patch does the same.

        Milind Bhandarkar added a comment -

        Should the output directories be empty or should they produce zero-length part files?

        Vinod Kumar Vavilapalli added a comment -

        ant test-patch results:

             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
             [exec]
             [exec]
             [exec]
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]     Finished build.
             [exec] ======================================================================
             [exec] ======================================================================
        
        Vinod Kumar Vavilapalli made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Vinod Kumar Vavilapalli made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Vinod Kumar Vavilapalli made changes -
        Attachment HADOOP-5850-20090519.1.txt [ 12408469 ]
        Vinod Kumar Vavilapalli added a comment -

        Attaching a new patch with the suggested changes. The test now fails without the core changes and succeeds with them.

        Jothi Padmanabhan added a comment -

        The changes to JobInProgress look fine.
        Could you add an assertion to the test case to verify the presence of empty output directories as well? That should fail without the patch and pass with it.

        Vinod Kumar Vavilapalli made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Vinod Kumar Vavilapalli made changes -
        Attachment HADOOP-5850-20090519.txt [ 12408448 ]
        Vinod Kumar Vavilapalli added a comment -

        Attaching a patch to fix this issue.

        • With this patch, jobs with 0 maps or no input still run the JobSetUp task, any number of reduces (which do nothing), and the JobCleanUp task.
        • Removed the 0-splits check in JobInProgress.initTasks() and added checks so that the cleanup task doesn't launch before the setup task when the number of splits is zero.
        • Renamed TestEmptyJobWithDFS to TestEmptyJob, removed the HDFS dependence to speed up the test, and added checks verifying the number of map and reduce tasks run for an empty job.
        Vinod Kumar Vavilapalli made changes -
        Field Original Value New Value
        Assignee Vinod K V [ vinodkv ]
        Owen O'Malley created issue -

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Owen O'Malley
          • Votes: 1
          • Watchers: 8
