Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1376

Support for varied user submission in Gridmix

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: contrib/gridmix
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Gridmix currently submits all synthetic jobs as the client user. It should be possible to map users in the trace to a set of users appropriate for the target cluster.

      1. 1376-2-yhadoop-security.patch
        117 kB
        rahul k singh
      2. 1376-3-yhadoop20.100.patch
        58 kB
        rahul k singh
      3. 1376-4-yhadoop20.100.patch
        125 kB
        rahul k singh
      4. 1376-5-yhadoop20-100.patch
        124 kB
        rahul k singh
      5. 1376-yhadoop-security.patch
        118 kB
        rahul k singh
      6. M1376-0.patch
        34 kB
        Chris Douglas
      7. M1376-1.patch
        34 kB
        Chris Douglas
      8. M1376-2.patch
        35 kB
        Chris Douglas
      9. M1376-3.patch
        35 kB
        Chris Douglas
      10. M1376-4.patch
        36 kB
        Chris Douglas

        Issue Links

          Activity

          Hide
          Chris Douglas added a comment -

          Patch:

          • Adds a pluggable mechanism for resolving a user in the trace to another (defaults to Submitter, the current behavior)
          • Fixes a minor bug in data generation that could write zero-length segments
          • Includes resolver that consistently assigns users in the trace to a queue of target users
          Show
          Chris Douglas added a comment - Patch: Adds a pluggable mechanism for resolving a user in the trace to another (defaults to Submitter, the current behavior) Fixes a minor bug in data generation that could write zero-length segments Includes resolver that consistently assigns users in the trace to a queue of target users
          Hide
          Hong Tang added a comment -

          Nice feature to have. This would be useful for evaluating fairness or potential starvation problems of schedulers like CapacityScheduler.

          Show
          Hong Tang added a comment - Nice feature to have. This would be useful for evaluating fairness or potential starvation problems of schedulers like CapacityScheduler.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12430343/M1376-0.patch
          against trunk revision 899501.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430343/M1376-0.patch against trunk revision 899501. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/271/console This message is automatically generated.
          Hide
          Chris Douglas added a comment -

          Removed unused field.

          Show
          Chris Douglas added a comment - Removed unused field.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12430363/M1376-1.patch
          against trunk revision 899501.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430363/M1376-1.patch against trunk revision 899501. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/389/console This message is automatically generated.
          Hide
          Hong Tang added a comment -

          Patch looks good mostly. A few comments:

          • Please open a jira to enhance rumen to extract UGI mapping information from job traces.
          • There is no API documentation of the interface methods in UserResolver.
          • There seems to be a typo in UserResolver javadoc "Maps users to a set of users on the test cluster." => "Maps users to a set of groups on the test cluster."
          • The interface of UserResolver.setTargetUsers() requires the userspec to be stored in a file. How about changing it to "setTargetUsers(List<UserGroupInformation>, Configuration conf)"? Also currently EchoUserResolver/SubmitterUserResolver ignores this info, maybe we can have the method to return a boolean to indicate whether the method call takes effect or not? This would also require UserResolver.parseUserList moved to GridMix.run().
          • Newly added conf parameters are not updated in GridMix.printUsage() method.
          • Any reason we split GridMix.run() to GridMix.run() and GridMix.start() now?
          Show
          Hong Tang added a comment - Patch looks good mostly. A few comments: Please open a jira to enhance rumen to extract UGI mapping information from job traces. There is no API documentation of the interface methods in UserResolver. There seems to be a typo in UserResolver javadoc "Maps users to a set of users on the test cluster." => "Maps users to a set of groups on the test cluster." The interface of UserResolver.setTargetUsers() requires the userspec to be stored in a file. How about changing it to "setTargetUsers(List<UserGroupInformation>, Configuration conf)"? Also currently EchoUserResolver/SubmitterUserResolver ignores this info, maybe we can have the method to return a boolean to indicate whether the method call takes effect or not? This would also require UserResolver.parseUserList moved to GridMix.run(). Newly added conf parameters are not updated in GridMix.printUsage() method. Any reason we split GridMix.run() to GridMix.run() and GridMix.start() now?
          Hide
          Chris Douglas added a comment -

          Please open a jira to enhance rumen to extract UGI mapping information from job traces.

          Filed MAPREDUCE-1384

          There is no API documentation of the interface methods in UserResolver.

          Newly added conf parameters are not updated in GridMix.printUsage() method.

          There seems to be a typo in UserResolver javadoc "Maps users to a set of users on the test cluster." => "Maps users to a set of groups on the test cluster."

          That wasn't a typo, but it's clearly not a helpful comment. Replaced with "Maps users in the trace to a set of valid target users on the test cluster." Also added interface and option documentation.

          The interface of UserResolver.setTargetUsers() requires the userspec to be stored in a file. How about changing it to "setTargetUsers(List<UserGroupInformation>, Configuration conf)"? [snip] This would also require UserResolver.parseUserList moved to GridMix.run()

          This was the first interface I tried, but a UserResolver is really just getting an (optional) URI argument and a conf. How/whether it builds a user list from that resource may be specific to the UserResolver implementation. Since -users is a command-line param, I gave it a Path type and assume that will be sufficient. I left the parsing of the default file format in the base class, so I agree that it's likely the only implementation, but I left it as a member function so it's easy to override.

          Also currently EchoUserResolver/SubmitterUserResolver ignores this info, maybe we can have the method to return a boolean to indicate whether the method call takes effect or not?

          The driver would probably ignore that feedback, wouldn't it?

          Any reason we split GridMix.run() to GridMix.run() and GridMix.start() now?

          The command-line parsing was becoming less trivial, but breaking it into a method called by run would have been uglier. Either way is fine w/ me; any preference?

          Show
          Chris Douglas added a comment - Please open a jira to enhance rumen to extract UGI mapping information from job traces. Filed MAPREDUCE-1384 There is no API documentation of the interface methods in UserResolver. Newly added conf parameters are not updated in GridMix.printUsage() method. There seems to be a typo in UserResolver javadoc "Maps users to a set of users on the test cluster." => "Maps users to a set of groups on the test cluster." That wasn't a typo, but it's clearly not a helpful comment. Replaced with "Maps users in the trace to a set of valid target users on the test cluster." Also added interface and option documentation. The interface of UserResolver.setTargetUsers() requires the userspec to be stored in a file. How about changing it to "setTargetUsers(List<UserGroupInformation>, Configuration conf)"? [snip] This would also require UserResolver.parseUserList moved to GridMix.run() This was the first interface I tried, but a UserResolver is really just getting an (optional) URI argument and a conf. How/whether it builds a user list from that resource may be specific to the UserResolver implementation. Since -users is a command-line param, I gave it a Path type and assume that will be sufficient. I left the parsing of the default file format in the base class, so I agree that it's likely the only implementation, but I left it as a member function so it's easy to override. Also currently EchoUserResolver/SubmitterUserResolver ignores this info, maybe we can have the method to return a boolean to indicate whether the method call takes effect or not? The driver would probably ignore that feedback, wouldn't it? Any reason we split GridMix.run() to GridMix.run() and GridMix.start() now? The command-line parsing was becoming less trivial, but breaking it into a method called by run would have been uglier. Either way is fine w/ me; any preference?
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12430483/M1376-2.patch
          against trunk revision 899844.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430483/M1376-2.patch against trunk revision 899844. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/274/console This message is automatically generated.
          Hide
          Hong Tang added a comment -

          The added set of publicly visible configuration parameters, such as GRIDMIX_USR_RSV, are still not documented in printUsage() method.

          How/whether it builds a user list from that resource may be specific to the UserResolver implementation.

          I'd suggest we decouple the logic of obtaining the list of target users from resolving source users to target users. It also seems reasonable to me to only support getting the list of target users from a file with a well-known format for now. We can extend it later if we want to support specifying users through command line or from LDAP, etc (as you can see, some of such extensions may not be easily doable through a URI-based interface).

          The driver would probably ignore that feedback, wouldn't it?

          There could be a couple of possible enhancements based such feedback: One is to issue an INFO/WARN log by Gridmix3 in the case when the resolver ignores the targeted user list. The other is to do extra runtime check that a user returned by the resolver is actually valid before submitting the job to the job tracker for plugins that do not honor the targeted set of users. We do not have to implement such enhancements in this patch, but I think it is a good idea to make the interface not hide any capability of a particular resolver.

          Other than these, the new patch addressed the remaining of my previous comments. Thanks.

          Show
          Hong Tang added a comment - The added set of publicly visible configuration parameters, such as GRIDMIX_USR_RSV, are still not documented in printUsage() method. How/whether it builds a user list from that resource may be specific to the UserResolver implementation. I'd suggest we decouple the logic of obtaining the list of target users from resolving source users to target users. It also seems reasonable to me to only support getting the list of target users from a file with a well-known format for now. We can extend it later if we want to support specifying users through command line or from LDAP, etc (as you can see, some of such extensions may not be easily doable through a URI-based interface). The driver would probably ignore that feedback, wouldn't it? There could be a couple of possible enhancements based such feedback: One is to issue an INFO/WARN log by Gridmix3 in the case when the resolver ignores the targeted user list. The other is to do extra runtime check that a user returned by the resolver is actually valid before submitting the job to the job tracker for plugins that do not honor the targeted set of users. We do not have to implement such enhancements in this patch, but I think it is a good idea to make the interface not hide any capability of a particular resolver. Other than these, the new patch addressed the remaining of my previous comments. Thanks.
          Hide
          Chris Douglas added a comment -

          The added set of publicly visible configuration parameters, such as GRIDMIX_USR_RSV, are still not documented in printUsage() method.

          Sorry; I'd included this change, but must not have regenerated the patch.

          I'd suggest we decouple the logic of obtaining the list of target users from resolving source users to target users.

          I agree that the strategy for obtaining the list of users is a reasonable abstraction, but why would this be part of the driver and not the user resolution types? I've changed the Path type to a URI to be more general, but emphasize that it's only there for usability. Any of the alternative specs you propose- including the command-line- can be implemented using Configuration and GenericOptionsParser.

          It seems most of our disagreement would go away if I just removed the -users parameter and all resolution was done through Configuration.

          There could be a couple of possible enhancements based such feedback: One is to issue an INFO/WARN log by Gridmix3 in the case when the resolver ignores the targeted user list. [...]

          Whether the resource is used is implicit in the contract of the particular UserResolver isn't it? I added the warning, anyway.


          It's worth pointing out that (the now PA) HADOOP-6299 completely invalidates this patch. Many of the classes will no longer exist after its imminent commit.

          Show
          Chris Douglas added a comment - The added set of publicly visible configuration parameters, such as GRIDMIX_USR_RSV, are still not documented in printUsage() method. Sorry; I'd included this change, but must not have regenerated the patch. I'd suggest we decouple the logic of obtaining the list of target users from resolving source users to target users. I agree that the strategy for obtaining the list of users is a reasonable abstraction, but why would this be part of the driver and not the user resolution types? I've changed the Path type to a URI to be more general, but emphasize that it's only there for usability. Any of the alternative specs you propose- including the command-line- can be implemented using Configuration and GenericOptionsParser. It seems most of our disagreement would go away if I just removed the -users parameter and all resolution was done through Configuration. There could be a couple of possible enhancements based such feedback: One is to issue an INFO/WARN log by Gridmix3 in the case when the resolver ignores the targeted user list. [...] Whether the resource is used is implicit in the contract of the particular UserResolver isn't it? I added the warning, anyway. It's worth pointing out that (the now PA) HADOOP-6299 completely invalidates this patch. Many of the classes will no longer exist after its imminent commit.
          Hide
          Hong Tang added a comment -

          I agree that the strategy for obtaining the list of users is a reasonable abstraction, but why would this be part of the driver and not the user resolution types?

          My thoughts is that the knowledge of the list of targeted users is tied to a specific cluster where we run gridmix and thus should be independent from which specific resolver we use. But on the other hand, the interface I suggested is also less flexible that it prevents us from passing down more info to the resolver in its initialization (e.g. what if we want to do weighted random among the set of targeted users). So I am leaning toward your suggestion of keeping the URI-based interface but making a note that commonly, the URI should point to a file following the simple format parsable by UserResolver.parseUserList (would you please add such comments in the javadoc for UserResolver.setTargetUsers?).

          It seems most of our disagreement would go away if I just removed the -users parameter and all resolution was done through Configuration.

          I actually do not like the idea of completely hiding the abstraction of targeted users. Would you agree?

          So in summary, I am fine with the latest patch with only the minor suggestion of adding a comment to emphasize the common format convention of userspec.

          Show
          Hong Tang added a comment - I agree that the strategy for obtaining the list of users is a reasonable abstraction, but why would this be part of the driver and not the user resolution types? My thoughts is that the knowledge of the list of targeted users is tied to a specific cluster where we run gridmix and thus should be independent from which specific resolver we use. But on the other hand, the interface I suggested is also less flexible that it prevents us from passing down more info to the resolver in its initialization (e.g. what if we want to do weighted random among the set of targeted users). So I am leaning toward your suggestion of keeping the URI-based interface but making a note that commonly, the URI should point to a file following the simple format parsable by UserResolver.parseUserList (would you please add such comments in the javadoc for UserResolver.setTargetUsers?). It seems most of our disagreement would go away if I just removed the -users parameter and all resolution was done through Configuration. I actually do not like the idea of completely hiding the abstraction of targeted users. Would you agree? So in summary, I am fine with the latest patch with only the minor suggestion of adding a comment to emphasize the common format convention of userspec.
          Hide
          Chris Douglas added a comment -

          making a note that commonly, the URI should point to a file following the simple format parsable by UserResolver.parseUserList (would you please add such comments in the javadoc for UserResolver.setTargetUsers?).

          Sounds good; done.

          I actually do not like the idea of completely hiding the abstraction of targeted users. Would you agree?

          Yes, I agree. It's more of a first-class feature than most of the other configurable parameters.

          Show
          Chris Douglas added a comment - making a note that commonly, the URI should point to a file following the simple format parsable by UserResolver.parseUserList (would you please add such comments in the javadoc for UserResolver.setTargetUsers?). Sounds good; done. I actually do not like the idea of completely hiding the abstraction of targeted users. Would you agree? Yes, I agree. It's more of a first-class feature than most of the other configurable parameters.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12431174/M1376-4.patch
          against trunk revision 902272.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12431174/M1376-4.patch against trunk revision 902272. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/283/console This message is automatically generated.
          Hide
          Chris Douglas added a comment -

          The failing test, TestTaskFail.testWithDFS is not related.

          Show
          Chris Douglas added a comment - The failing test, TestTaskFail.testWithDFS is not related.
          Hide
          Chris Douglas added a comment -

          The strategy effecting this is invalid after HADOOP-6299

          Show
          Chris Douglas added a comment - The strategy effecting this is invalid after HADOOP-6299
          Hide
          rahul k singh added a comment -

          Attaching the yahoo hadoop security patch for this feature , this also includes changes for required for 1408

          Show
          rahul k singh added a comment - Attaching the yahoo hadoop security patch for this feature , this also includes changes for required for 1408
          Hide
          rahul k singh added a comment -

          Implemented the changes suggested by chris.

          In the earlier path RoundRobinUserResolver was defining its own GroupMappingService , that part is removed now.

          Show
          rahul k singh added a comment - Implemented the changes suggested by chris. In the earlier path RoundRobinUserResolver was defining its own GroupMappingService , that part is removed now.
          Hide
          Chris Douglas added a comment -

          The attached patch, even with security disabled, did not work for me using RoundRobinUserResolver (user from trace replaced with <traceuser>, user from userlist replaced with <targetuser>):

          10/03/12 20:20:52 WARN gridmix.JobSubmitter: Failed to submit GRIDMIX00106 as <targetuser> via <traceuser>
          org.apache.hadoop.ipc.RemoteException: User: <traceuser> is not allowed to impersonate <targetuser>
                  at org.apache.hadoop.ipc.Client.call(Client.java:873)
                  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:222)
                  at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
                  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:360)
                  at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:443)
                  at org.apache.hadoop.mapred.JobClient.init(JobClient.java:437)
                  at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:422)
                  at org.apache.hadoop.mapreduce.Job$1.run(Job.java:477)
                  at java.security.AccessController.doPrivileged(Native Method)
                  at javax.security.auth.Subject.doAs(Subject.java:396)
                  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:766)
                  at org.apache.hadoop.mapreduce.Job.connect(Job.java:475)
                  at org.apache.hadoop.mapreduce.Job.submit(Job.java:464)
                  at org.apache.hadoop.mapred.gridmix.GridmixJob.call(GridmixJob.java:230)
                  at org.apache.hadoop.mapred.gridmix.JobSubmitter$SubmitTask.run(JobSubmitter.java:119)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:619)
          

          Other feedback:

          • The following should be removed:
            • This method is not used:
              +  private void throwException(Throwable exception) throws Throwable{
              +    throw exception;
              +  }
              
            • GridmixJob contains a few lines with empty statements (see doAs blocks)
            • GridmixTestUtils is not a useful abstraction. Its functionality should remain in/be added to TestGridmixSubmission
          • Consider Collections.emptyList() instead of creating and returning new, empty collections (e.g. EchoUserResolver)
          • EchoUserResolver only needs to extend ShellBasedUnixGroupsMapping for the unit test, right? The group mapping isn't important for the type. A separate group mapping class in test would be appropriate (presumably one already exists)
          • RoundRobinUserResolver::parseUserList should be protected so subclasses may override it
          • Since UserResolver can remain an abstract class (no need to extend any groups mapping), parseUserList can remain there.
          • The default policy should be REPLAY, not STRESS
          • JobSubmitter should not re-resolve the resolved UGI before calling buildSplits.
          • It is not sufficient to fix the failure above, but the job is not submitted in a doAs block.
          Show
          Chris Douglas added a comment - The attached patch, even with security disabled, did not work for me using RoundRobinUserResolver (user from trace replaced with <traceuser> , user from userlist replaced with <targetuser> ): 10/03/12 20:20:52 WARN gridmix.JobSubmitter: Failed to submit GRIDMIX00106 as <targetuser> via <traceuser> org.apache.hadoop.ipc.RemoteException: User: <traceuser> is not allowed to impersonate <targetuser> at org.apache.hadoop.ipc.Client.call(Client.java:873) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:222) at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:360) at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:443) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:437) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:422) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:477) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:766) at org.apache.hadoop.mapreduce.Job.connect(Job.java:475) at org.apache.hadoop.mapreduce.Job.submit(Job.java:464) at org.apache.hadoop.mapred.gridmix.GridmixJob.call(GridmixJob.java:230) at org.apache.hadoop.mapred.gridmix.JobSubmitter$SubmitTask.run(JobSubmitter.java:119) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Other feedback: The following should be removed: This method is not used: + private void throwException(Throwable exception) throws Throwable{ + throw exception; + } GridmixJob contains a few lines with empty statements (see doAs blocks) GridmixTestUtils is not a useful abstraction. Its functionality should remain in/be added to TestGridmixSubmission Consider Collections.emptyList() instead of creating and returning new, empty collections (e.g. EchoUserResolver ) EchoUserResolver only needs to extend ShellBasedUnixGroupsMapping for the unit test, right? The group mapping isn't important for the type. A separate group mapping class in test would be appropriate (presumably one already exists) RoundRobinUserResolver::parseUserList should be protected so subclasses may override it Since UserResolver can remain an abstract class (no need to extend any groups mapping), parseUserList can remain there. The default policy should be REPLAY , not STRESS JobSubmitter should not re-resolve the resolved UGI before calling buildSplits . It is not sufficient to fix the failure above, but the job is not submitted in a doAs block.
          Hide
          rahul k singh added a comment -

          Implemented the comments

          Ran the fix on 150 node cluster . It works fine.

          Show
          rahul k singh added a comment - Implemented the comments Ran the fix on 150 node cluster . It works fine.
          Hide
          rahul k singh added a comment -

          Attached the correct patch

          Show
          rahul k singh added a comment - Attached the correct patch
          Hide
          rahul k singh added a comment -

          Changes w.r.t to Testcases.

          Show
          rahul k singh added a comment - Changes w.r.t to Testcases.
          Hide
          rahul k singh added a comment -
          1. GridmixTestUtils is not a useful abstraction. Its functionality should remain in/be added to TestGridmixSubmission
            I feel , it is clean . I have added more code for creation of staging directories in it so I thought TestGridmixSubmission was getting really big with some reusable components . I think we might use those components again in future. Also It is was taking more time get it back to TestGridmixSubmission , I understand that we could do it later as well but given the urgency and as there is no harm in pulling some reusable code I am leaving it as it is.
          1. Since UserResolver can remain an abstract class (no need to extend any groups mapping), parseUserList can remain there.
            parseUserList is only being used by RoundRobinUserResolve to iam leaving the interface as it is.
          Show
          rahul k singh added a comment - GridmixTestUtils is not a useful abstraction. Its functionality should remain in/be added to TestGridmixSubmission I feel , it is clean . I have added more code for creation of staging directories in it so I thought TestGridmixSubmission was getting really big with some reusable components . I think we might use those components again in future. Also It is was taking more time get it back to TestGridmixSubmission , I understand that we could do it later as well but given the urgency and as there is no harm in pulling some reusable code I am leaving it as it is. Since UserResolver can remain an abstract class (no need to extend any groups mapping), parseUserList can remain there. parseUserList is only being used by RoundRobinUserResolve to iam leaving the interface as it is.
          Hide
          Chris Douglas added a comment -

          I feel , it is clean . I have added more code for creation of staging directories in it so I thought TestGridmixSubmission was getting really big with some reusable components . I think we might use those components again in future.

          Code written with *Util classes tends to be difficult to read and the tenuous dependencies between tests created through these classes are avoidable. Concerning its content, other than boilerplate MiniMRCluster code (which must be called in the same, boilerplate way from the test) and the one-line changePermission, there's only createHomeAndStagingDirectory, which calls back to the TestGridmixSubmission LOG field, anyway. But OK.

          parseUserList is only being used by RoundRobinUserResolve to iam leaving the interface as it is.

          Abstract classes are generally easier to evolve than interfaces- and creating new UserResolvers without rewriting the parser or depending on RoundRobinUserResolver is a pleasant property- but OK. Making parseUserList protected would at least allow subclasses of RRUR to share the parsing code.

          • There is some commented-out code in TestGridmixSubmission:
            +//    GridmixTestUtils.createHomeAndStagingDirectory((JobConf)conf);
          • There is a javadoc error since parseUserList is moved:
            +   * listing target users. The format of this file is defined by {@link
            +   * #parseUserList}.
          • Many javadoc comments seem to include </p> at random
          • Many of the exception and log messages start with spaces, e.g. " Could not run job submitted ", which is not conventional. Please don't do this. There are also many spelling errors in the messages, e.g. LOG.error(" EXCEPTOIN in availableSlots ", e);
          • Should jobs be submitted with ACLs so that the statistics can be viewed by the submitting user? This would allow statistics, etc. to be collected without requiring a doAs block, right?
          • GridmixJob::call can declare the exceptions it will throw instead of catching everything and returning null. This is true of all the PrivilegedExceptionAction uses, including the doAs calling Gridmix::runJob, which can have a signature narrower than Exception
          • TestGridmixSubmission::doSubmission should take a GridmixJobSubmissionPolicy as an argument, rather than setting a static and calling the method from each submission type.

          +1 Overall. None of these need to block commit, but please attend to them as part of either the Apache patch or concurrently with related work.

          Show
          Chris Douglas added a comment - I feel , it is clean . I have added more code for creation of staging directories in it so I thought TestGridmixSubmission was getting really big with some reusable components . I think we might use those components again in future. Code written with *Util classes tends to be difficult to read and the tenuous dependencies between tests created through these classes are avoidable. Concerning its content, other than boilerplate MiniMRCluster code (which must be called in the same, boilerplate way from the test) and the one-line changePermission , there's only createHomeAndStagingDirectory , which calls back to the TestGridmixSubmission LOG field, anyway. But OK. parseUserList is only being used by RoundRobinUserResolve to iam leaving the interface as it is. Abstract classes are generally easier to evolve than interfaces- and creating new UserResolvers without rewriting the parser or depending on RoundRobinUserResolver is a pleasant property- but OK. Making parseUserList protected would at least allow subclasses of RRUR to share the parsing code. There is some commented-out code in TestGridmixSubmission : +// GridmixTestUtils.createHomeAndStagingDirectory((JobConf)conf); There is a javadoc error since parseUserList is moved: + * listing target users. The format of this file is defined by {@link + * #parseUserList}. Many javadoc comments seem to include </p> at random Many of the exception and log messages start with spaces, e.g. " Could not run job submitted " , which is not conventional. Please don't do this. There are also many spelling errors in the messages, e.g. LOG.error(" EXCEPTOIN in availableSlots ", e); Should jobs be submitted with ACLs so that the statistics can be viewed by the submitting user? This would allow statistics, etc. to be collected without requiring a doAs block, right? GridmixJob::call can declare the exceptions it will throw instead of catching everything and returning null. This is true of all the PrivilegedExceptionAction uses, including the doAs calling Gridmix::runJob , which can have a signature narrower than Exception TestGridmixSubmission::doSubmission should take a GridmixJobSubmissionPolicy as an argument, rather than setting a static and calling the method from each submission type. +1 Overall. None of these need to block commit, but please attend to them as part of either the Apache patch or concurrently with related work.
          Hide
          Chris Douglas added a comment -

          Fixed in MAPREDUCE-1840

          Show
          Chris Douglas added a comment - Fixed in MAPREDUCE-1840

            People

            • Assignee:
              Chris Douglas
              Reporter:
              Chris Douglas
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development