Hadoop Map/Reduce: MAPREDUCE-1711

Gridmix should provide an option to submit jobs to the same queues as specified in the trace.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: contrib/gridmix
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Gridmix should provide an option to submit jobs to the same queues as specified in the trace.
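
      For context, a minimal sketch of the idea, not the committed patch: when Gridmix builds a synthetic job from a Rumen trace entry, the queue recorded in the trace can be copied into that job's own configuration before submission. The JobStory.getQueueName() accessor and the job-naming scheme shown below are assumptions for illustration only.

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.tools.rumen.JobStory;

        // Sketch only: build a synthetic job for one trace entry and carry over its queue.
        public class QueueFromTraceSketch {
          public static Job newJobFromTrace(Configuration conf, JobStory story, int seq)
              throws IOException {
            // Job takes a private copy of 'conf', so the setting below stays job-local.
            Job job = new Job(conf, "GRIDMIX" + seq);
            String queue = story.getQueueName();  // queue recorded in the trace (assumed accessor)
            if (queue != null) {
              job.getConfiguration().set("mapred.job.queue.name", queue);
            }
            return job;
          }
        }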

      Attachments

      1. MR-1711-Yhadoop-20-crossPort-2.patch
        186 kB
        rahul k singh
      2. MR-1711-Yhadoop-20-crossPort-1.patch
        186 kB
        rahul k singh
      3. diff-rumen.patch
        11 kB
        Hong Tang
      4. diff-gridmix.patch
        48 kB
        Hong Tang
      5. MR-1711-Yhadoop-20-crossPort.patch
        171 kB
        rahul k singh
      6. MR-1711-yhadoop-20-1xx-7.patch
        11 kB
        rahul k singh
      7. MR-1711-yhadoop-20-1xx-6.patch
        11 kB
        rahul k singh
      8. MR-1711-yhadoop-20-1xx-5.patch
        12 kB
        rahul k singh
      9. MR-1711-yhadoop-20-1xx-4.patch
        12 kB
        rahul k singh
      10. MR-1711-yhadoop-20-1xx-3.patch
        12 kB
        rahul k singh
      11. MR-1711-yhadoop-20-1xx-2.patch
        11 kB
        rahul k singh
      12. MR-1711-yhadoop-20-1xx.patch
        10 kB
        rahul k singh
      13. ASF.LICENSE.NOT.GRANTED--mr-1711-yhadoop-20.1xx-20100416.patch
        4 kB
        Hong Tang

          Activity

          Chris Douglas added a comment -

          Fixed in MAPREDUCE-1840

          Hong Tang added a comment -

          • Jobs aren't submitted concurrently; JobFactories have a single thread which submits them one after another.

          Yes, I think you are right. There are no correctness issues here. But it would be better to set job-specific conf settings through "job.getConfiguration().set...".

          • (Task.Counter.SPLIT_RAW_BYTES) isn't there in 20.10, so I reverted it to the current state.

          I see.
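
          To make the distinction concrete, here is an illustrative sketch (not Gridmix code): Job takes a private copy of the Configuration it is constructed with, so settings made through job.getConfiguration() stay local to that job, whereas set(...) on the shared Configuration instance would be seen by every job created from it afterwards.

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.mapreduce.Job;

            // Illustrative sketch of per-job vs. shared configuration settings.
            public class PerJobConfSketch {
              public static void main(String[] args) throws Exception {
                Configuration shared = new Configuration();

                // The Job constructor copies 'shared' into the job's own configuration.
                Job job = new Job(shared, "GRIDMIX0");

                // Job-specific setting: lands only in this job's copy of the conf.
                job.getConfiguration().set("mapred.job.queue.name", "q1");

                // The shared instance is untouched, so later jobs are unaffected.
                System.out.println(shared.get("mapred.job.queue.name"));  // prints null
              }
            }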

          rahul k singh added a comment -
          • In the first GridmixJob constructor, you are setting the UGI in the shared instance of the Configuration object. This would interfere with jobs submitted concurrently.

          Jobs aren't submitted concurrently; JobFactories have a single thread which submits them one after another.

          • TestGridmixSubmission, I saw some changes not synced (see diff-gridmix.patch).

          (Task.Counter.SPLIT_RAW_BYTES) isn't there in 20.10, so I reverted it to the current state.

          Hong Tang added a comment -

          I applied your patch to yhadoop 20.10 and compared it with 20.1xx. I will attach the diff files after this.

          Some comments:

          • src/contrib/build.xml contains changes related to hdfsproxy.
          • In the first GridmixJob constructor, you are setting the UGI in the shared instance of Configuration object. This would interfere with jobs submitted concurrently.
          • In JobSubmitter.java, we should keep the same logging messages.
          • Please remove the GridmixSplit class.
          • EchoUserResolver is not synced with 20.1xx branch (see diff-gridmix.patch).
          • TestGridmixSubmission, I saw some changes not synced (see diff-gridmix.patch).
          • Some changes in Rumen do not seem to be synced (see diff-rumen.patch).

          Also, a general comment: please try to minimize whitespace changes. Otherwise, it will make future cross-porting harder and increase the reviewing burden.

          rahul k singh added a comment -

          Attaching the patch for cross-porting features from 20.1xx to 20.10.

          rahul k singh added a comment -

          Implemented the above suggestions.

          Hong Tang added a comment -

          • Removed the mapping altogether, as we are already storing the JobStory in the submitted list; it is being used in verify.

          That is a good change. Thanks. Two more minor comments:

          • There is another occurrence where you do string replacement of GRIDMIX with MOCKJOB; shouldn't that be replaced too?
          • One minor thing: any reason why you changed the number of TTs in MiniMRCluster from 3 to 6? I am concerned that with the default 2/2 map/reduce configuration, you would end up with up to 24 tasks (6 TTs × 4 slots) running simultaneously on a single machine.

          rahul k singh added a comment -

          Implemented all the suggestions.

          Removed the mapping altogether, as we are already storing the JobStory in the submitted list; it is being used in verify.

          rahul k singh added a comment -

          Small issue with the patch; will upload a new patch.

          Hong Tang added a comment -

          There still seem to be some bugs in the code.

          • MockJob.getQueueName() should not rely on seq to get the queue name. Consider the case where we call oneMockJob.getQueueName() after a few more mock jobs have been created after oneMockJob.
          • The mapping from job name to queue name in DebugJobProducer should not be static.
          • In the following code, it is a bad idea to hardwire the knowledge of how Gridmix constructs names. Also, the sequencing of Gridmix jobs could be different from that of mock jobs. Can't you just do job.getConfiguration().get(GridmixJob.ORIGNAME)? (A rewritten fragment along these lines is sketched after the quoted code.)
            +        } else {
            +          assertEquals(
            +            " Improper queue for  " + job.getJobName() + " ",
            +            job.getConfiguration().get("mapred.job.queue.name"),
            +            DebugJobProducer.getQueueName(
            +              job.getJobName().replace(
            +                "GRIDMIX", "MOCKJOB")));
            +        }
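
          A hedged sketch of that suggestion, rewriting the quoted fragment so the expected queue is looked up via the original job name stored under GridmixJob.ORIGNAME instead of string-replacing the Gridmix job name; the surrounding test structure and the existing DebugJobProducer lookup are kept as-is from the fragment above:

            } else {
              // Recover the originating mock job's name from the submitted job's
              // configuration rather than rewriting "GRIDMIX" to "MOCKJOB".
              final String origName = job.getConfiguration().get(GridmixJob.ORIGNAME);
              assertEquals(
                  " Improper queue for " + job.getJobName() + " ",
                  job.getConfiguration().get("mapred.job.queue.name"),
                  DebugJobProducer.getQueueName(origName));
            }
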
            
          rahul k singh added a comment -

          Implemented the changes suggested by Hong.

          Hong Tang added a comment -
          +        assertEquals(" Improper queue for  " + job.getJobName() + " " ,
          +          job.getConfiguration().get("mapred.job.queue.name"),
          +                     "q" + (Integer.valueOf(job.getJobName().substring(
          +                       job.getJobName().length() - 1))+1));
          

          This code is fragile; why not use GridmixJob.getJobSeqId(job)?
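
          A sketch of the rewrite being suggested, using the job sequence id instead of parsing the trailing digit out of the job name; the "q" + (seq + 1) mapping simply mirrors the quoted fragment and is an assumption about how the test names its queues:

            assertEquals(
                " Improper queue for " + job.getJobName() + " ",
                job.getConfiguration().get("mapred.job.queue.name"),
                "q" + (GridmixJob.getJobSeqId(job) + 1));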

          Hong Tang added a comment -
          • We should generate data only once.
          • We may want to increase the number of jobs submitted to more than 3. To avoid lengthening the unit test run time too much, we should consider running sleep jobs, or processing fewer bytes in that specific case.
          • There is another problem I missed: JobQueueTaskScheduler does not really understand the notion of queues, while we may not be able to have the unit test depend on the capacity scheduler. This issue is probably outside the scope of this JIRA, though. (A minimal queue-declaration sketch follows this list.)
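
          For reference, an illustrative sketch of the queue declarations a test cluster's JobTracker configuration would need before jobs can be submitted to q1/q2; the property is the 0.20-era mapred.queue.names, and whether the default JobQueueTaskScheduler does anything meaningful with the queues beyond accepting the submission is exactly the concern above:

            import org.apache.hadoop.mapred.JobConf;

            // Illustrative only: declare the queues a queue-aware test would submit to.
            public class QueueConfSketch {
              public static JobConf withTestQueues() {
                JobConf conf = new JobConf();
                // Queues known to the JobTracker (the stock value is just "default").
                conf.set("mapred.queue.names", "default,q1,q2");
                return conf;
              }
            }
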
          rahul k singh added a comment -
          • The test case only submits two jobs into two queues (q1 and q2). Can we increase the number of jobs to something between 5 and 10 (pick one that allows the test to finish in a reasonable amount of time, like 5 minutes)? And can we randomly decide which queue each job is submitted to (1 or 2)?

          Implemented the first suggestion; now there are 3 jobs submitting to 2 queues.

          • Need a unit test case for replay submission with the default queue.
            The code path for setting the queue is already covered in STRESS.

          Hong Tang added a comment -

          Two quick comments:

          • The test case only submits two jobs into two queues (q1 and q2). Can we increase the number of jobs to something between 5 and 10 (pick one that allows the test to finish in a reasonable amount of time, like 5 minutes)? And can we randomly decide which queue each job is submitted to (1 or 2)?
          • Need a unit test case for replay submission with the default queue.

          rahul k singh added a comment -

          Changed TestGridmixSubmission to add checks for the queue.

          Hong Tang added a comment -

          Preliminary patch for the yhadoop 20.1xx branch.


            People

            • Assignee: rahul k singh
            • Reporter: Hong Tang
            • Votes: 0
            • Watchers: 1
