Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1695

capacity scheduler is not included in findbugs/javadoc targets

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: capacity-sched
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Capacity Scheduler is not included in findbugs/javadoc targets.

      1. ASF.LICENSE.NOT.GRANTED--MAPREDUCE-1695.patch
        1 kB
        Hong Tang
      2. ASF.LICENSE.NOT.GRANTED--mr1695-hadoop-findbugs-report-1.html
        35 kB
        Hong Tang
      3. ASF.LICENSE.NOT.GRANTED--mr1695-hadoop-findbugs-report-2.html
        16 kB
        Hong Tang
      4. ASF.LICENSE.NOT.GRANTED--MAPREDUCE-1695-2.patch
        14 kB
        Hong Tang
      5. ASF.LICENSE.NOT.GRANTED--MAPREDUCE-1695-3.patch
        15 kB
        Hong Tang
      6. mapreduce-1695-20100420.patch
        15 kB
        Hong Tang
      7. mapreduce-1695-20100420-2.patch
        15 kB
        Hong Tang
      8. mapreduce-1695-20100421.patch
        19 kB
        Hong Tang
      9. mapreduce-1695-20100420-2.patch
        15 kB
        Hong Tang
      10. mapreduce-1695-20100422.patch
        20 kB
        Hong Tang

        Issue Links

          Activity

          Hide
          Tom White added a comment -

          As noted in MAPREDUCE-1625, since the capacity scheduler lives in the mapred package, including it in the user javadoc is potentially confusing to users, since it is not a user-facing API. Perhaps it should just be added to the javadoc-dev target.

          Show
          Tom White added a comment - As noted in MAPREDUCE-1625 , since the capacity scheduler lives in the mapred package, including it in the user javadoc is potentially confusing to users, since it is not a user-facing API. Perhaps it should just be added to the javadoc-dev target.
          Hide
          Hong Tang added a comment -

          Good point.

          Show
          Hong Tang added a comment - Good point.
          Hide
          Hong Tang added a comment -

          Trivial patch that adds capacity-scheduler for the following targets: all-jars, findbugs, javadoc-dev.

          Show
          Hong Tang added a comment - Trivial patch that adds capacity-scheduler for the following targets: all-jars, findbugs, javadoc-dev.
          Hide
          Hong Tang added a comment -

          No test included, as it only changes build.xml.

          Show
          Hong Tang added a comment - No test included, as it only changes build.xml.
          Hide
          Hong Tang added a comment -

          I raw findbugs locally (version 1.3.8) and found quite a few bugs related to capacity scheduler. File mr1695-hadoop-findbugs-report-1.html is the findbugs report.

          Show
          Hong Tang added a comment - I raw findbugs locally (version 1.3.8) and found quite a few bugs related to capacity scheduler. File mr1695-hadoop-findbugs-report-1.html is the findbugs report.
          Hide
          Hong Tang added a comment -

          Attached the wrong file in my previous attempt. There are 22 findbugs warnings, 5 of which are existent in current trunk.

          Show
          Hong Tang added a comment - Attached the wrong file in my previous attempt. There are 22 findbugs warnings, 5 of which are existent in current trunk.
          Hide
          Hong Tang added a comment -

          I fixed most of them. There are 8 findbugs warnings now. 5 existing ones, three about unused variables.

          Show
          Hong Tang added a comment - I fixed most of them. There are 8 findbugs warnings now. 5 existing ones, three about unused variables.
          Hide
          Hong Tang added a comment -

          New patch that includes both the ant build target updates and the findbugs warning fixes.

          Show
          Hong Tang added a comment - New patch that includes both the ant build target updates and the findbugs warning fixes.
          Hide
          Hong Tang added a comment -

          New patch that also eliminates the three warnings wrt unused variables.

          Show
          Hong Tang added a comment - New patch that also eliminates the three warnings wrt unused variables.
          Hide
          Hong Tang added a comment -

          resubmit for hudson.

          Show
          Hong Tang added a comment - resubmit for hudson.
          Hide
          Hong Tang added a comment -

          Hudson seems to have problem commenting on jira now. The following is what I manually copied from Hudson test-patch output. Link: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/106/console.

               [exec] 
               [exec] +1 overall.  Here are the results of testing the latest attachment 
               [exec]   http://issues.apache.org/jira/secure/attachment/12441668/MAPREDUCE-1695-3.patch
               [exec]   against trunk revision 933441.
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec] 
               [exec]     +1 core tests.  The patch passed core unit tests.
               [exec] 
               [exec]     +1 contrib tests.  The patch passed contrib unit tests.
          
          Show
          Hong Tang added a comment - Hudson seems to have problem commenting on jira now. The following is what I manually copied from Hudson test-patch output. Link: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/106/console . [exec] [exec] +1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12441668/MAPREDUCE-1695-3.patch [exec] against trunk revision 933441. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests.
          Hide
          Hong Tang added a comment -

          I hereby license the attachment "MAPREDUCE-1695-3.patch" to ASF.

          Show
          Hong Tang added a comment - I hereby license the attachment " MAPREDUCE-1695 -3.patch" to ASF.
          Hide
          Hemanth Yamijala added a comment -

          Is it fair to assume that most projects would want to be included in findbugs target and javadoc-dev target by default ? Is there some way by which we can make this the default behavior then ?

          Show
          Hemanth Yamijala added a comment - Is it fair to assume that most projects would want to be included in findbugs target and javadoc-dev target by default ? Is there some way by which we can make this the default behavior then ?
          Hide
          Hemanth Yamijala added a comment -

          Is there some way by which we can make this the default behavior then ?

          To clarify, I mean make it the default behavior without the project needing to do anything.

          Show
          Hemanth Yamijala added a comment - Is there some way by which we can make this the default behavior then ? To clarify, I mean make it the default behavior without the project needing to do anything.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12441668/MAPREDUCE-1695-3.patch
          against trunk revision 933441.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441668/MAPREDUCE-1695-3.patch against trunk revision 933441. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/357/console This message is automatically generated.
          Hide
          Hong Tang added a comment -

          Is it fair to assume that most projects would want to be included in findbugs target and javadoc-dev target by default ?

          I'd say YES to this question. But I suggest we separate this to a different jira because it would affect all contrib projects (and possibly an inevitable discussion on "contrib projects" vs "subprojects"), and the solution may need some more deliberation.

          Show
          Hong Tang added a comment - Is it fair to assume that most projects would want to be included in findbugs target and javadoc-dev target by default ? I'd say YES to this question. But I suggest we separate this to a different jira because it would affect all contrib projects (and possibly an inevitable discussion on "contrib projects" vs "subprojects"), and the solution may need some more deliberation.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12441668/MAPREDUCE-1695-3.patch
          against trunk revision 933441.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441668/MAPREDUCE-1695-3.patch against trunk revision 933441. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/console This message is automatically generated.
          Hide
          Hemanth Yamijala added a comment -

          Hong, I agree that changing the default behavior to include other contrib projects into findbugs etc automatically should be a separate JIRA.

          Show
          Hemanth Yamijala added a comment - Hong, I agree that changing the default behavior to include other contrib projects into findbugs etc automatically should be a separate JIRA.
          Hide
          Hemanth Yamijala added a comment -

          I started looking at this patch. A few observations:

          • The synchronization added around obtainNewTask seems unnecessary as all code paths are locked on the scheduler. But as such, I don't see any harm with the changes made here in the patch.
          • getClusterCapacity seems unused. Should we remove it ?
          • There seem to be some constructor variables that are not needed. They are named 'ignored'. I suppose we can remove them ?
          • Similarly, I suppose we can also remove the MapTaskDataView and ReduceTaskDataView constructors ?
          • This is not really a comment on this patch, but I am noting it here to get some comments from other folks:
            There seems to be a bug in the user limit handling code that is fixed by this patch. Before this patch, TaskSchedulingContext.updateNoOfSlotsOccupiedByUser on a container queue was only adding entries when the user's name was already added to its data structure. However, since this map begins empty, it would never be updated with a user's statistics from a child queue. While the bug is being fixed in this patch, I think this is a serious enough bug to warrant a test case. If my understanding is correct, we should file a JIRA to add a test case to verify this.
          Show
          Hemanth Yamijala added a comment - I started looking at this patch. A few observations: The synchronization added around obtainNewTask seems unnecessary as all code paths are locked on the scheduler. But as such, I don't see any harm with the changes made here in the patch. getClusterCapacity seems unused. Should we remove it ? There seem to be some constructor variables that are not needed. They are named 'ignored'. I suppose we can remove them ? Similarly, I suppose we can also remove the MapTaskDataView and ReduceTaskDataView constructors ? This is not really a comment on this patch, but I am noting it here to get some comments from other folks: There seems to be a bug in the user limit handling code that is fixed by this patch. Before this patch, TaskSchedulingContext.updateNoOfSlotsOccupiedByUser on a container queue was only adding entries when the user's name was already added to its data structure. However, since this map begins empty, it would never be updated with a user's statistics from a child queue. While the bug is being fixed in this patch, I think this is a serious enough bug to warrant a test case. If my understanding is correct, we should file a JIRA to add a test case to verify this.
          Hide
          Hong Tang added a comment -

          Hemanth, thanks for reviewing the patch!

          The synchronization added around obtainNewTask seems unnecessary as all code paths are locked on the scheduler.

          Possibly true. My changes were only motivated by the findbugs report which could produce false positives.

          getClusterCapacity seems unused. Should we remove it ?

          It seems to be an interface method and could be usable to support adaptive scheduling decisions based on load pressure in the future. But I am not familiar with Capacity Scheduler enough, so either way is fine with me.

          There seem to be some constructor variables that are not needed. They are named 'ignored'. I suppose we can remove them ?

          Similarly, I suppose we can also remove the MapTaskDataView and ReduceTaskDataView constructors ?

          Sure. Only reason I kept it there is in case these classes are constructed through reflection and removing a parameter would lead to constructor-not-found exception.

          This is not really a comment on this patch, but I am noting it here to get some comments from other folks: There seems to be a bug in the user limit handling code that is fixed by this patch. Before this patch, TaskSchedulingContext.updateNoOfSlotsOccupiedByUser on a container queue was only adding entries when the user's name was already added to its data structure. However, since this map begins empty, it would never be updated with a user's statistics from a child queue. While the bug is being fixed in this patch, I think this is a serious enough bug to warrant a test case. If my understanding is correct, we should file a JIRA to add a test case to verify this.

          Yes, I talked with Arun about this, and I was very puzzled by this code that seems to be doing nothing useful. I will remove the fix in this patch (the else statement) and regenerate the patch.

          Show
          Hong Tang added a comment - Hemanth, thanks for reviewing the patch! The synchronization added around obtainNewTask seems unnecessary as all code paths are locked on the scheduler. Possibly true. My changes were only motivated by the findbugs report which could produce false positives. getClusterCapacity seems unused. Should we remove it ? It seems to be an interface method and could be usable to support adaptive scheduling decisions based on load pressure in the future. But I am not familiar with Capacity Scheduler enough, so either way is fine with me. There seem to be some constructor variables that are not needed. They are named 'ignored'. I suppose we can remove them ? Similarly, I suppose we can also remove the MapTaskDataView and ReduceTaskDataView constructors ? Sure. Only reason I kept it there is in case these classes are constructed through reflection and removing a parameter would lead to constructor-not-found exception. This is not really a comment on this patch, but I am noting it here to get some comments from other folks: There seems to be a bug in the user limit handling code that is fixed by this patch. Before this patch, TaskSchedulingContext.updateNoOfSlotsOccupiedByUser on a container queue was only adding entries when the user's name was already added to its data structure. However, since this map begins empty, it would never be updated with a user's statistics from a child queue. While the bug is being fixed in this patch, I think this is a serious enough bug to warrant a test case. If my understanding is correct, we should file a JIRA to add a test case to verify this. Yes, I talked with Arun about this, and I was very puzzled by this code that seems to be doing nothing useful. I will remove the fix in this patch (the else statement) and regenerate the patch.
          Hide
          Hong Tang added a comment -

          If my understanding is correct, we should file a JIRA to add a test case to verify this.

          Filed MAPREDUCE-1715.

          Show
          Hong Tang added a comment - If my understanding is correct, we should file a JIRA to add a test case to verify this. Filed MAPREDUCE-1715 .
          Hide
          Hong Tang added a comment -

          New patch removed the fix.

          Show
          Hong Tang added a comment - New patch removed the fix.
          Hide
          Hemanth Yamijala added a comment -

          Only reason I kept it there is in case these classes are constructed through reflection and removing a parameter would lead to constructor-not-found exception.

          Hong, I don't think this is the case for the classes in question - namely QueueInfo and TaskSchedulingContext. IMO, it would make the code more clear removing them.

          I agree leaving the other methods untouched, because they don't hurt or confuse in any way, and as you point out might help in future.

          +1 also for fixing MAPREDUCE-1715 separately.

          Show
          Hemanth Yamijala added a comment - Only reason I kept it there is in case these classes are constructed through reflection and removing a parameter would lead to constructor-not-found exception. Hong, I don't think this is the case for the classes in question - namely QueueInfo and TaskSchedulingContext. IMO, it would make the code more clear removing them. I agree leaving the other methods untouched, because they don't hurt or confuse in any way, and as you point out might help in future. +1 also for fixing MAPREDUCE-1715 separately.
          Hide
          Hong Tang added a comment -

          Hong, I don't think this is the case for the classes in question - namely QueueInfo and TaskSchedulingContext. IMO, it would make the code more clear removing them.

          Sure. I will update the patch to remove the "ignore" parameters.

          Show
          Hong Tang added a comment - Hong, I don't think this is the case for the classes in question - namely QueueInfo and TaskSchedulingContext. IMO, it would make the code more clear removing them. Sure. I will update the patch to remove the "ignore" parameters.
          Hide
          Hong Tang added a comment -

          new patch addressed hemanth's comments.

          Show
          Hong Tang added a comment - new patch addressed hemanth's comments.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442335/mapreduce-1695-20100420-2.patch
          against trunk revision 936042.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442335/mapreduce-1695-20100420-2.patch against trunk revision 936042. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/359/console This message is automatically generated.
          Hide
          Hemanth Yamijala added a comment -

          +1 for the changes.

          -1 core tests. The patch failed core unit tests.

          org.apache.hadoop.mapred.TestNodeRefresh.testMRRefreshRecommissioning was reported as failed with the error "Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit."

          This error has been seen on and off on Hudson builds. However,

          • This patch does nothing to change Map/Reduce code base.
          • The testcase ran locally when I ran it with the patch.

          Based on this, I think this is good to go.

          Show
          Hemanth Yamijala added a comment - +1 for the changes. -1 core tests. The patch failed core unit tests. org.apache.hadoop.mapred.TestNodeRefresh.testMRRefreshRecommissioning was reported as failed with the error "Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit." This error has been seen on and off on Hudson builds. However, This patch does nothing to change Map/Reduce code base. The testcase ran locally when I ran it with the patch. Based on this, I think this is good to go.
          Hide
          Hemanth Yamijala added a comment -

          I will hold commit for just a while. There is one point I wanted to bring to attention, and would appreciate a second look:

          The patch introduces a locking order of queue manager, and then scheduler. This is in the refreshQueues call in CapacitySchedulerQueueRefresher. There is a locking of the scheduler and then the queue manager, when the scheduler calls setSchedulerInfo in createDisplayInfo. At the face of it, this seems like a cyclic lock order with a potential to create a deadlock.

          On closer look, I think the two code paths cannot be active at the same time. The call to refreshQueues can happen only when a refreshQueues command is fired, and right now, that can happen only via an RPC call to the JobTracker. The call to setSchedulerInfo happens only at startup time, and this happens before the InterTrackerServer is started.

          So, the patch seems safe still. But does it seem too close to be comfortable ?

          Show
          Hemanth Yamijala added a comment - I will hold commit for just a while. There is one point I wanted to bring to attention, and would appreciate a second look: The patch introduces a locking order of queue manager, and then scheduler. This is in the refreshQueues call in CapacitySchedulerQueueRefresher. There is a locking of the scheduler and then the queue manager, when the scheduler calls setSchedulerInfo in createDisplayInfo. At the face of it, this seems like a cyclic lock order with a potential to create a deadlock. On closer look, I think the two code paths cannot be active at the same time. The call to refreshQueues can happen only when a refreshQueues command is fired, and right now, that can happen only via an RPC call to the JobTracker. The call to setSchedulerInfo happens only at startup time, and this happens before the InterTrackerServer is started. So, the patch seems safe still. But does it seem too close to be comfortable ?
          Hide
          Hong Tang added a comment -

          So, the patch seems safe still. But does it seem too close to be comfortable ?

          No. I would never feel comfortable knowing that the correctness of the synchronization structure is hanging on a thin thread of reasoning.

          I dig a bit deeper in the code, it seems task scheduler has the knowledge of queue manager but not vice versa (queue manager only knows queue refresher). So I suppose the right order of locking should be "task scheduler" -> "queue manager". To get that correct, it seems we can fix it by having JobTracker to lock task scheduler before it calls anything in queue manager that could call back to task scheduler - for now there is only one such call site which is refreshQueues().

          Show
          Hong Tang added a comment - So, the patch seems safe still. But does it seem too close to be comfortable ? No. I would never feel comfortable knowing that the correctness of the synchronization structure is hanging on a thin thread of reasoning. I dig a bit deeper in the code, it seems task scheduler has the knowledge of queue manager but not vice versa (queue manager only knows queue refresher). So I suppose the right order of locking should be "task scheduler" -> "queue manager". To get that correct, it seems we can fix it by having JobTracker to lock task scheduler before it calls anything in queue manager that could call back to task scheduler - for now there is only one such call site which is refreshQueues().
          Hide
          Hong Tang added a comment -

          New patch that corrected the order of the synchronization between task scheduler and queue manager. Findbugs still warns about the inconsistent synchronization and I have to exclude them in findbugsExcludeFile.xml.

          Show
          Hong Tang added a comment - New patch that corrected the order of the synchronization between task scheduler and queue manager. Findbugs still warns about the inconsistent synchronization and I have to exclude them in findbugsExcludeFile.xml.
          Hide
          Hemanth Yamijala added a comment -

          Hong, this approach introduces a coarse locking of the scheduler. It is imaginable that schedulers can implement lazy refreshing options - for e.g. let the current scheduling operation complete before picking up the refreshed configuration. For such cases, the current approach is imposing a locking constraint that circumvents any such optimization. (In fact, digging further, I actually see we are already locking the scheduler in initializeQueues when we want to switch the queue hierarchy - hence we already have the inverse locking issue even without this patch - but for the same reason I explained before, it probably still works).

          That said, I am not fully convinced this is a bad idea. On the one hand, the simplicity of this approach is appealing. On the other, the effects of a coarse locking have bitten us hard before, as you are well aware. So, I am just a little wary.

          Thoughts ? Opinion from others ?

          Show
          Hemanth Yamijala added a comment - Hong, this approach introduces a coarse locking of the scheduler. It is imaginable that schedulers can implement lazy refreshing options - for e.g. let the current scheduling operation complete before picking up the refreshed configuration. For such cases, the current approach is imposing a locking constraint that circumvents any such optimization. (In fact, digging further, I actually see we are already locking the scheduler in initializeQueues when we want to switch the queue hierarchy - hence we already have the inverse locking issue even without this patch - but for the same reason I explained before, it probably still works). That said, I am not fully convinced this is a bad idea. On the one hand, the simplicity of this approach is appealing. On the other, the effects of a coarse locking have bitten us hard before, as you are well aware. So, I am just a little wary. Thoughts ? Opinion from others ?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442406/mapreduce-1695-20100421.patch
          against trunk revision 936166.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442406/mapreduce-1695-20100421.patch against trunk revision 936166. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/124/console This message is automatically generated.
          Hide
          Hong Tang added a comment -

          It is imaginable that schedulers can implement lazy refreshing options - for e.g. let the current scheduling operation complete before picking up the refreshed configuration.

          To implement such optimizations, it would probably need a bigger surgery to the synchronization structure (e.g. explicitly using conditional variables to signal a good timing when the queue could be refreshed). On the other hand, does anybody know how frequently would the queue structure be refreshed? If not very frequent, the main overhead of the implementation is to tie up one thread waiting for the task scheduler lock, and I'd tend to think it is acceptable.

          Show
          Hong Tang added a comment - It is imaginable that schedulers can implement lazy refreshing options - for e.g. let the current scheduling operation complete before picking up the refreshed configuration. To implement such optimizations, it would probably need a bigger surgery to the synchronization structure (e.g. explicitly using conditional variables to signal a good timing when the queue could be refreshed). On the other hand, does anybody know how frequently would the queue structure be refreshed? If not very frequent, the main overhead of the implementation is to tie up one thread waiting for the task scheduler lock, and I'd tend to think it is acceptable.
          Hide
          Hemanth Yamijala added a comment -

          On the other hand, does anybody know how frequently would the queue structure be refreshed?

          Very infrequently, compared to the other competing operations like heartbeats.

          If not very frequent, the main overhead of the implementation is to tie up one thread waiting for the task scheduler lock, and I'd tend to think it is acceptable.

          Typical implementations of scheduling lock the task scheduler (i.e. the assignTasks call in the scheduler). Hence, when the queue refresh is triggered, all heartbeats will be locked. At least that is the case with the capacity scheduler.

          Still, given how infrequent this is going to be (and also a typical queueRefresh operation is very fast) I think I am fine with this approach. If there are no other objections, let us go ahead. Makes sense ?

          That said, looking at the latest changes, I did not quite follow why you introduced a new static refreshQueues method. The callers of this (apart from the non-static method in JobTracker) are in test cases. All these tests have a scheduler instance. We could lock it and then call the QueueManager API to be consistent. So it seems that the static method is an unnecessary indirection. Am I missing something ?

          Findbugs still warns about the inconsistent synchronization and I have to exclude them in findbugsExcludeFile.xml.

          I suppose this is because FindBugs does not realize the taskScheduler instance on which we are locking in the refreshQueues is the same instance which is locked in the other usages of this variable. So, this seems a valid reason to add to the exclude file. Do you think it makes sense to document this rationale in the excludes file so there is context ? BTW, thanks for updating the API documentation of the refreshQueues contract in TaskScheduler. It is very useful.

          Show
          Hemanth Yamijala added a comment - On the other hand, does anybody know how frequently would the queue structure be refreshed? Very infrequently, compared to the other competing operations like heartbeats. If not very frequent, the main overhead of the implementation is to tie up one thread waiting for the task scheduler lock, and I'd tend to think it is acceptable. Typical implementations of scheduling lock the task scheduler (i.e. the assignTasks call in the scheduler). Hence, when the queue refresh is triggered, all heartbeats will be locked. At least that is the case with the capacity scheduler. Still, given how infrequent this is going to be (and also a typical queueRefresh operation is very fast) I think I am fine with this approach. If there are no other objections, let us go ahead. Makes sense ? That said, looking at the latest changes, I did not quite follow why you introduced a new static refreshQueues method. The callers of this (apart from the non-static method in JobTracker) are in test cases. All these tests have a scheduler instance. We could lock it and then call the QueueManager API to be consistent. So it seems that the static method is an unnecessary indirection. Am I missing something ? Findbugs still warns about the inconsistent synchronization and I have to exclude them in findbugsExcludeFile.xml. I suppose this is because FindBugs does not realize the taskScheduler instance on which we are locking in the refreshQueues is the same instance which is locked in the other usages of this variable. So, this seems a valid reason to add to the exclude file. Do you think it makes sense to document this rationale in the excludes file so there is context ? BTW, thanks for updating the API documentation of the refreshQueues contract in TaskScheduler. It is very useful.
          Hide
          Hong Tang added a comment -

          That said, looking at the latest changes, I did not quite follow why you introduced a new static refreshQueues method.

          Probably a matter of taste. I hate to replicate the pattern of "synchronized (taskscheduler)

          { queueManager.refreshQueues(...); }

          " and thus introduced this helper method. But since all other usages are in one test case TestRefreshOfQueues, I suppose I can move this method into that class instead to avoid having the test case depend on a non-public method in JobTracker.

          Do you think it makes sense to document this rationale in the excludes file so there is context ?

          Sure, I can add the comments in the excludes file to that effect.

          BTW, thanks for the prompt and detailed reviews.

          Show
          Hong Tang added a comment - That said, looking at the latest changes, I did not quite follow why you introduced a new static refreshQueues method. Probably a matter of taste. I hate to replicate the pattern of "synchronized (taskscheduler) { queueManager.refreshQueues(...); } " and thus introduced this helper method. But since all other usages are in one test case TestRefreshOfQueues, I suppose I can move this method into that class instead to avoid having the test case depend on a non-public method in JobTracker. Do you think it makes sense to document this rationale in the excludes file so there is context ? Sure, I can add the comments in the excludes file to that effect. BTW, thanks for the prompt and detailed reviews.
          Hide
          Hemanth Yamijala added a comment -

          But since all other usages are in one test case TestRefreshOfQueues, I suppose I can move this method into that class instead to avoid having the test case depend on a non-public method in JobTracker.

          +1. This sounds right to me.

          Show
          Hemanth Yamijala added a comment - But since all other usages are in one test case TestRefreshOfQueues, I suppose I can move this method into that class instead to avoid having the test case depend on a non-public method in JobTracker. +1. This sounds right to me.
          Hide
          Hong Tang added a comment -

          New patch that moved the helper method to TestRefreshOfQueues, and added comments in findbugs exclusion file.

          Show
          Hong Tang added a comment - New patch that moved the helper method to TestRefreshOfQueues, and added comments in findbugs exclusion file.
          Hide
          Hemanth Yamijala added a comment -

          Hong, it seems like you attached an old patch. Can you please check ?

          Show
          Hemanth Yamijala added a comment - Hong, it seems like you attached an old patch. Can you please check ?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442522/mapreduce-1695-20100420-2.patch
          against trunk revision 936166.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442522/mapreduce-1695-20100420-2.patch against trunk revision 936166. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/126/console This message is automatically generated.
          Hide
          Hong Tang added a comment -

          My bad. Attaching the correct patch.

          Show
          Hong Tang added a comment - My bad. Attaching the correct patch.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442564/mapreduce-1695-20100422.patch
          against trunk revision 936166.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442564/mapreduce-1695-20100422.patch against trunk revision 936166. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/128/console This message is automatically generated.
          Hide
          Hemanth Yamijala added a comment -

          +1. Looks good to me. This is ready to go.

          Show
          Hemanth Yamijala added a comment - +1. Looks good to me. This is ready to go.
          Hide
          Hemanth Yamijala added a comment -

          I just committed this. Thanks, Hong !

          Show
          Hemanth Yamijala added a comment - I just committed this. Thanks, Hong !

            People

            • Assignee:
              Hong Tang
              Reporter:
              Hong Tang
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development