Hadoop Map/Reduce / MAPREDUCE-2384

The job submitter should make sure to validate jobs before creation of necessary files

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 3.0.0
    • Component/s: job submission, test
    • Labels:
      None
    • Tags:
      test

      Description

      In 0.20.x/1.x, 0.21 and 0.22, the MapReduce job submitter writes some job-necessary files to the JobTracker's filesystem before checking the output specs or other job validation items. This work is unnecessary: a job that fails validation should be rejected before any files are transferred.

      This has since been silently fixed in the rewrite of the MRApp (called MRv2) in the MAPREDUCE-279 code drop, which has now replaced the older MR (now called MRv1). However, a test case is still worth adding to prevent a regression.
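The fix amounts to an ordering change in the submitter: run validation, such as the output-spec check, before anything is copied to the JobTracker's filesystem. The block below is a minimal, self-contained sketch of that ordering, not the actual patch. `FailFastSubmitter`, `checkOutputSpecs` and `stageJobFiles` are hypothetical names, and a local `Files.exists` check stands in for what Hadoop does through `OutputFormat.checkOutputSpecs`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical fail-fast submitter. The class and method names are
// illustrative only; they are not Hadoop's JobSubmitter API.
class FailFastSubmitter {

    // Records the files "staged", so a test can verify nothing was written
    // when validation fails.
    final List<String> stagedFiles = new ArrayList<>();

    // Validation step: the job's output directory must not already exist.
    void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    // Side-effecting step: stands in for copying job files (jar, splits,
    // configuration) to the staging area. This sketch only records names.
    void stageJobFiles(Path stagingDir, List<String> files) {
        for (String f : files) {
            stagedFiles.add(stagingDir.resolve(f).toString());
        }
    }

    // Validate first, then stage: a rejected job leaves nothing behind.
    void submit(Path outputDir, Path stagingDir, List<String> files) throws IOException {
        checkOutputSpecs(outputDir);      // fail early, before any transfer
        stageJobFiles(stagingDir, files); // reached only if validation passed
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("out"); // output dir that already exists
        Path staging = Path.of("staging");
        FailFastSubmitter submitter = new FailFastSubmitter();
        try {
            submitter.submit(existing, staging, List.of("job.jar", "job.xml"));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("staged files: " + submitter.stagedFiles.size());
        Files.delete(existing); // clean up the temporary directory
    }
}
```

A regression test of the kind the description asks for could submit a job whose output directory already exists and assert that the staged-file list stays empty.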

      Original description below:

      When I read the source code of MapReduce in Hadoop 0.21.0, I was sometimes confused by how it responds to errors. For example:
      1. JobSubmitter checking the output of each job. MapReduce requires that a job's output must not already exist, to avoid accidentally overwriting it. In my opinion, MR should verify the output at the point where the client submits the job. Instead, it first copies the related files to the specified target and only then performs the verification.
      2. JobTracker. Once a job has been submitted to the JobTracker, the JT first creates a JobInProgress object, which is very "huge". Only after that does the JT verify the job queue authority and memory requirements.

      Normally, client input is verified first and an error is returned immediately if anything is wrong; the regular logic runs only once all the inputs have passed.
      The current ordering does not make sense to me. Is this only my personal opinion? I hope someone can help explain the details. Thanks!
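The same fail-fast idea applies to the JobTracker case in point 2: cheap admission checks (queue authorization, memory limits) can run before the large per-job object is built. A minimal sketch, assuming a simple in-memory map from queue to allowed users and a single per-job memory cap; `AdmissionSketch`, `JobState` and `submit` are hypothetical names, not JobTracker APIs.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical admission sketch; names are illustrative, not JobTracker APIs.
class AdmissionSketch {

    // Stands in for the large per-job object (JobInProgress in MRv1).
    static final class JobState {
        final String user;
        final String queue;
        final long memoryMb;

        JobState(String user, String queue, long memoryMb) {
            this.user = user;
            this.queue = queue;
            this.memoryMb = memoryMb;
        }
    }

    private final Map<String, Set<String>> queueAcls; // queue -> users allowed to submit
    private final long maxMemoryMb;                   // assumed per-job memory cap
    int jobStatesCreated = 0;                         // counts expensive constructions

    AdmissionSketch(Map<String, Set<String>> queueAcls, long maxMemoryMb) {
        this.queueAcls = queueAcls;
        this.maxMemoryMb = maxMemoryMb;
    }

    JobState submit(String user, String queue, long memoryMb) {
        // 1. Cheap checks first; reject immediately on failure.
        Set<String> allowed = queueAcls.get(queue);
        if (allowed == null || !allowed.contains(user)) {
            throw new SecurityException("User " + user + " may not submit to queue " + queue);
        }
        if (memoryMb > maxMemoryMb) {
            throw new IllegalArgumentException(
                "Requested " + memoryMb + " MB exceeds the " + maxMemoryMb + " MB limit");
        }
        // 2. Only an admitted job pays for building the expensive state.
        jobStatesCreated++;
        return new JobState(user, queue, memoryMb);
    }

    public static void main(String[] args) {
        AdmissionSketch tracker = new AdmissionSketch(Map.of("default", Set.of("alice")), 2048);
        try {
            tracker.submit("bob", "default", 1024);   // not in the queue ACL
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        try {
            tracker.submit("alice", "default", 4096); // over the memory cap
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        tracker.submit("alice", "default", 1024);     // admitted
        System.out.println("job states created: " + tracker.jobStatesCreated);
    }
}
```

Here the two rejected submissions never construct a `JobState`, so only the admitted job is counted.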

      1. MAPREDUCE-2384.r1.diff
        0.9 kB
        Harsh J
      2. MAPREDUCE-2384.r2.diff
        3 kB
        Harsh J
      3. MAPREDUCE-2384.r3.diff
        4 kB
        Harsh J
      4. MAPREDUCE-2384.r4.diff
        3 kB
        Harsh J

        Issue Links

          • relates to MAPREDUCE-3154
          • is duplicated by MAPREDUCE-432

          Activity

          Harsh J made changes -
          Link This issue relates to MAPREDUCE-3154 [ MAPREDUCE-3154 ]
          Harsh J made changes -
          Component/s test [ 12312904 ]
          Harsh J made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Target Version/s 3.0.0 [ 12320355 ]
          Fix Version/s 3.0.0 [ 12320355 ]
          Resolution Fixed [ 1 ]
          Tags test
          Harsh J made changes -
          Release Note Submitter should fail on errors early, before transferring files.
          Harsh J made changes -
          Summary Can MR make error response Immediately? The job submitter should make sure to validate jobs before creation of necessary files
          Description (changed from the original description to the current text in the Description section above)
          Harsh J made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Target Version/s 3.0.0 [ 12320355 ]
          Fix Version/s 0.23.0 [ 12315570 ]
          Harsh J made changes -
          Attachment MAPREDUCE-2384.r4.diff [ 12526630 ]
          Harsh J made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Fix Version/s 0.23.0 [ 12315570 ]
          Fix Version/s 0.24.0 [ 12317654 ]
          Harsh J made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Arun C Murthy made changes -
          Fix Version/s 0.24.0 [ 12317654 ]
          Fix Version/s 0.23.0 [ 12315570 ]
          Arun C Murthy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Harsh J made changes -
          Attachment MAPREDUCE-2384.r3.diff [ 12492888 ]
          Harsh J made changes -
          Attachment MAPREDUCE-2384.r2.diff [ 12487573 ]
          Harsh J made changes -
          Attachment MAPREDUCE-2384.r2.diff [ 12487572 ]
          Harsh J made changes -
          Attachment MAPREDUCE-2384.r2.diff [ 12487572 ]
          Harsh J made changes -
          Link This issue is duplicated by MAPREDUCE-432 [ MAPREDUCE-432 ]
          Harsh J made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Submitter should fail on errors early, before transferring files.
          Fix Version/s 0.23.0 [ 12315570 ]
          Harsh J made changes -
          Attachment MAPREDUCE-2384.r1.diff [ 12480002 ]
          Harsh J made changes -
          Field Original Value New Value
          Assignee Harsh J Chouraria [ qwertymaniac ]
          Denny Ye created issue -

            People

            • Assignee:
              Harsh J
              Reporter:
              Denny Ye
            • Votes:
              1
              Watchers:
              4
