Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-13817

Add a finite shell command timeout to ShellBasedUnixGroupsMapping

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      A new introduced configuration key "hadoop.security.groups.shell.command.timeout" allows applying a finite wait timeout over the 'id' commands launched by the ShellBasedUnixGroupsMapping plugin. Values specified can be in any valid time duration units: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#getTimeDuration-java.lang.String-long-java.util.concurrent.TimeUnit-

      Value defaults to 0, indicating infinite wait (preserving existing behaviour).
      Show
      A new introduced configuration key "hadoop.security.groups.shell.command.timeout" allows applying a finite wait timeout over the 'id' commands launched by the ShellBasedUnixGroupsMapping plugin. Values specified can be in any valid time duration units: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#getTimeDuration-java.lang.String-long-java.util.concurrent.TimeUnit- Value defaults to 0, indicating infinite wait (preserving existing behaviour).

      Description

      The ShellBasedUnixGroupsMapping run various id commands via the ShellCommandExecutor modules without a timeout set (its set to 0, which implies infinite).

      If this command hangs for a long time on the OS end due to an unresponsive groups backend or other reasons, it also blocks the handlers that use it on the NameNode (or other services that use this class). That inadvertently causes odd timeout troubles on the client end where its forced to retry (only to likely run into such hangs again with every attempt until at least one command returns).

      It would be helpful to have a finite command timeout after which we may give up on the command and return the result equivalent of no groups found.

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11301 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11301/)
          HADOOP-13817. Add a finite shell command timeout to (harsh: rev e8694deb6ad180449f8ce6c1c8b4f84873c0587a)

          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java
          • (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
          • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java
          • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestShellBasedUnixGroupsMapping.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11301 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11301/ ) HADOOP-13817 . Add a finite shell command timeout to (harsh: rev e8694deb6ad180449f8ce6c1c8b4f84873c0587a) (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestGroupsCaching.java (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestShellBasedUnixGroupsMapping.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac closed the pull request at:

          https://github.com/apache/hadoop/pull/161

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac closed the pull request at: https://github.com/apache/hadoop/pull/161
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on the issue:

          https://github.com/apache/hadoop/pull/161

          Done via e8694deb6ad180449f8ce6c1c8b4f84873c0587a.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on the issue: https://github.com/apache/hadoop/pull/161 Done via e8694deb6ad180449f8ce6c1c8b4f84873c0587a.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Test failures are not reproducible in my local tree.

          +1 based on my review and Xiaoyu Yao's review on GitHub.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Test failures are not reproducible in my local tree. +1 based on my review and Xiaoyu Yao 's review on GitHub.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 12m 37s trunk passed
          +1 compile 12m 7s trunk passed
          +1 checkstyle 0m 35s trunk passed
          +1 mvnsite 1m 2s trunk passed
          +1 mvneclipse 0m 18s trunk passed
          +1 findbugs 1m 25s trunk passed
          +1 javadoc 0m 48s trunk passed
          +1 mvninstall 0m 37s the patch passed
          +1 compile 11m 13s the patch passed
          +1 javac 11m 13s the patch passed
          +1 checkstyle 0m 34s the patch passed
          +1 mvnsite 0m 58s the patch passed
          +1 mvneclipse 0m 18s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 1m 31s the patch passed
          +1 javadoc 0m 47s the patch passed
          -1 unit 8m 11s hadoop-common in the patch failed.
          +1 asflicense 0m 31s The patch does not generate ASF License warnings.
          55m 48s



          Reason Tests
          Failed junit tests hadoop.security.TestKDiag



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-13817
          GITHUB PR https://github.com/apache/hadoop/pull/161
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 93f53f4231a1 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e60c654
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/11709/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11709/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11709/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 21s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 12m 37s trunk passed +1 compile 12m 7s trunk passed +1 checkstyle 0m 35s trunk passed +1 mvnsite 1m 2s trunk passed +1 mvneclipse 0m 18s trunk passed +1 findbugs 1m 25s trunk passed +1 javadoc 0m 48s trunk passed +1 mvninstall 0m 37s the patch passed +1 compile 11m 13s the patch passed +1 javac 11m 13s the patch passed +1 checkstyle 0m 34s the patch passed +1 mvnsite 0m 58s the patch passed +1 mvneclipse 0m 18s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 1m 31s the patch passed +1 javadoc 0m 47s the patch passed -1 unit 8m 11s hadoop-common in the patch failed. +1 asflicense 0m 31s The patch does not generate ASF License warnings. 55m 48s Reason Tests Failed junit tests hadoop.security.TestKDiag Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-13817 GITHUB PR https://github.com/apache/hadoop/pull/161 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 93f53f4231a1 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / e60c654 Default Java 1.8.0_121 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/11709/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11709/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11709/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r102943834

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -133,8 +177,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); }

          catch (PartialGroupNameException pge) {

          • LOG.warn("unable to return groups for user " + user, pge);
          • return new LinkedList<>();
            + LOG.warn("unable to return groups for user {}", user, pge);
            + return EMPTY_GROUPS;
            + }
            + } catch (IOException ioe) {
            + // If its a shell executor timeout, indicate so in the message
            + // but treat the result as empty instead of throwing it up,
            + // similar to how partial resolution failures are handled above
            + if (executor.isTimedOut()) {
            + LOG.warn(
            + "Unable to return groups for user '{}' as shell group lookup " +
            + "command '{}' ran longer than the configured timeout limit of " +
            + "{} seconds.",
            + user,
            + Arrays.asList(executor.getExecString()),
              • End diff –

          Thank you again @jojochuang. I've added a new change here addressing your comments. I've also uploaded the full patch directly to JIRA to trigger a build test.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r102943834 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -133,8 +177,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); } catch (PartialGroupNameException pge) { LOG.warn("unable to return groups for user " + user, pge); return new LinkedList<>(); + LOG.warn("unable to return groups for user {}", user, pge); + return EMPTY_GROUPS; + } + } catch (IOException ioe) { + // If its a shell executor timeout, indicate so in the message + // but treat the result as empty instead of throwing it up, + // similar to how partial resolution failures are handled above + if (executor.isTimedOut()) { + LOG.warn( + "Unable to return groups for user '{}' as shell group lookup " + + "command '{}' ran longer than the configured timeout limit of " + + "{} seconds.", + user, + Arrays.asList(executor.getExecString()), End diff – Thank you again @jojochuang. I've added a new change here addressing your comments. I've also uploaded the full patch directly to JIRA to trigger a build test.
          Hide
          qwertymaniac Harsh J added a comment -

          Uploading direct patches to trigger Jenkins build.

          Show
          qwertymaniac Harsh J added a comment - Uploading direct patches to trigger Jenkins build.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jojochuang commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r102665699

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -133,8 +177,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); }

          catch (PartialGroupNameException pge) {

          • LOG.warn("unable to return groups for user " + user, pge);
          • return new LinkedList<>();
            + LOG.warn("unable to return groups for user {}", user, pge);
            + return EMPTY_GROUPS;
            + }
            + } catch (IOException ioe) {
            + // If its a shell executor timeout, indicate so in the message
            + // but treat the result as empty instead of throwing it up,
            + // similar to how partial resolution failures are handled above
            + if (executor.isTimedOut()) {
            + LOG.warn(
            + "Unable to return groups for user '{}' as shell group lookup " +
            + "command '{}' ran longer than the configured timeout limit of " +
            + "{} seconds.",
            + user,
            + Arrays.asList(executor.getExecString()),
              • End diff –

          I am +1 pending this and Jenkins precommit build. Somehow Jenkins is never run for your patches.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jojochuang commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r102665699 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -133,8 +177,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); } catch (PartialGroupNameException pge) { LOG.warn("unable to return groups for user " + user, pge); return new LinkedList<>(); + LOG.warn("unable to return groups for user {}", user, pge); + return EMPTY_GROUPS; + } + } catch (IOException ioe) { + // If its a shell executor timeout, indicate so in the message + // but treat the result as empty instead of throwing it up, + // similar to how partial resolution failures are handled above + if (executor.isTimedOut()) { + LOG.warn( + "Unable to return groups for user '{}' as shell group lookup " + + "command '{}' ran longer than the configured timeout limit of " + + "{} seconds.", + user, + Arrays.asList(executor.getExecString()), End diff – I am +1 pending this and Jenkins precommit build. Somehow Jenkins is never run for your patches.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jojochuang commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r102665241

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -133,8 +177,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); }

          catch (PartialGroupNameException pge) {

          • LOG.warn("unable to return groups for user " + user, pge);
          • return new LinkedList<>();
            + LOG.warn("unable to return groups for user {}", user, pge);
            + return EMPTY_GROUPS;
            + }
            + } catch (IOException ioe) {
            + // If its a shell executor timeout, indicate so in the message
            + // but treat the result as empty instead of throwing it up,
            + // similar to how partial resolution failures are handled above
            + if (executor.isTimedOut()) {
            + LOG.warn(
            + "Unable to return groups for user '{}' as shell group lookup " +
            + "command '{}' ran longer than the configured timeout limit of " +
            + "{} seconds.",
            + user,
            + Arrays.asList(executor.getExecString()),
              • End diff –

          Reviewed the patch again, and I think it's almost ready.
          Here's one nit regarding the warning message here: it would print something like "Unable to return groups for user 'foobarnonexistinguser' as shell group lookup command '[sleep, 2]'". But this can be confusing as the command is not run with brackets and comas.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jojochuang commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r102665241 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -133,8 +177,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); } catch (PartialGroupNameException pge) { LOG.warn("unable to return groups for user " + user, pge); return new LinkedList<>(); + LOG.warn("unable to return groups for user {}", user, pge); + return EMPTY_GROUPS; + } + } catch (IOException ioe) { + // If its a shell executor timeout, indicate so in the message + // but treat the result as empty instead of throwing it up, + // similar to how partial resolution failures are handled above + if (executor.isTimedOut()) { + LOG.warn( + "Unable to return groups for user '{}' as shell group lookup " + + "command '{}' ran longer than the configured timeout limit of " + + "{} seconds.", + user, + Arrays.asList(executor.getExecString()), End diff – Reviewed the patch again, and I think it's almost ready. Here's one nit regarding the warning message here: it would print something like "Unable to return groups for user 'foobarnonexistinguser' as shell group lookup command ' [sleep, 2] '". But this can be confusing as the command is not run with brackets and comas.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r101673820

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -225,8 +287,16 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { throw new PartialGroupNameException("failed to get group id list for " + "user '" + userName + "'", ece); }

          catch (IOException ioe) {

          • throw new PartialGroupNameException("can't execute the shell command to"
          • + " get the list of group id for user '" + userName + "'", ioe);
            + String message =
            + "Can't execute the shell command to " +
            + "get the list of group id for user '" + userName + "'";
            + if (exec2.isTimedOut()) {
            + message +=
            + " because of the command taking longer than " +
            + "the configured timeout: " + timeout + " seconds";
            + throw new PartialGroupNameException(message);
              • End diff –

          Thanks, addressed in new commit. I just felt the timeout exception may look weird, but I've dropped the line so we can be consistent in exposing the exception at all times.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r101673820 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -225,8 +287,16 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { throw new PartialGroupNameException("failed to get group id list for " + "user '" + userName + "'", ece); } catch (IOException ioe) { throw new PartialGroupNameException("can't execute the shell command to" + " get the list of group id for user '" + userName + "'", ioe); + String message = + "Can't execute the shell command to " + + "get the list of group id for user '" + userName + "'"; + if (exec2.isTimedOut()) { + message += + " because of the command taking longer than " + + "the configured timeout: " + timeout + " seconds"; + throw new PartialGroupNameException(message); End diff – Thanks, addressed in new commit. I just felt the timeout exception may look weird, but I've dropped the line so we can be consistent in exposing the exception at all times.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r101673744

          — Diff: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestShellBasedUnixGroupsMapping.java —
          @@ -22,19 +22,32 @@

          import org.apache.commons.logging.Log;
          import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.fs.CommonConfigurationKeys;
          +import org.apache.hadoop.test.GenericTestUtils;
          +import org.apache.hadoop.util.ReflectionUtils;
          +import org.apache.hadoop.util.Shell;
          import org.apache.hadoop.util.Shell.ExitCodeException;
          import org.apache.hadoop.util.Shell.ShellCommandExecutor;
          import org.junit.Test;
          +
          import static org.junit.Assert.*;
          import static org.mockito.Mockito.doNothing;
          import static org.mockito.Mockito.doThrow;
          import static org.mockito.Mockito.mock;
          import static org.mockito.Mockito.when;

          public class TestShellBasedUnixGroupsMapping {

          • private static final Log LOG =
            + private static final Log TESTLOG =
            LogFactory.getLog(TestShellBasedUnixGroupsMapping.class);

          + private final GenericTestUtils.LogCapturer shellMappingLog =
          + GenericTestUtils.LogCapturer.captureLogs(
          + ShellBasedUnixGroupsMapping.LOG);
          +
          + private static final boolean WINDOWS =
          + (Shell.osType == Shell.OSType.OS_TYPE_WIN);
          — End diff –

          Thank you, addressed in new commit.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r101673744 — Diff: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestShellBasedUnixGroupsMapping.java — @@ -22,19 +22,32 @@ import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.CommonConfigurationKeys; +import org.apache.hadoop.test.GenericTestUtils; +import org.apache.hadoop.util.ReflectionUtils; +import org.apache.hadoop.util.Shell; import org.apache.hadoop.util.Shell.ExitCodeException; import org.apache.hadoop.util.Shell.ShellCommandExecutor; import org.junit.Test; + import static org.junit.Assert.*; import static org.mockito.Mockito.doNothing; import static org.mockito.Mockito.doThrow; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; public class TestShellBasedUnixGroupsMapping { private static final Log LOG = + private static final Log TESTLOG = LogFactory.getLog(TestShellBasedUnixGroupsMapping.class); + private final GenericTestUtils.LogCapturer shellMappingLog = + GenericTestUtils.LogCapturer.captureLogs( + ShellBasedUnixGroupsMapping.LOG); + + private static final boolean WINDOWS = + (Shell.osType == Shell.OSType.OS_TYPE_WIN); — End diff – Thank you, addressed in new commit.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Okay I did a quick review and there were only two nits after Xiaoyu's numerous reviews.
          By the way, Yetus does not support precommit check if the patch is a github pull request, right? I am not sure how to make that happen... Maybe attach a patch into this jira would trigger that.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Okay I did a quick review and there were only two nits after Xiaoyu's numerous reviews. By the way, Yetus does not support precommit check if the patch is a github pull request, right? I am not sure how to make that happen... Maybe attach a patch into this jira would trigger that.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jojochuang commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r101592507

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -225,8 +287,16 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { throw new PartialGroupNameException("failed to get group id list for " + "user '" + userName + "'", ece); }

          catch (IOException ioe) {

          • throw new PartialGroupNameException("can't execute the shell command to"
          • + " get the list of group id for user '" + userName + "'", ioe);
            + String message =
            + "Can't execute the shell command to " +
            + "get the list of group id for user '" + userName + "'";
            + if (exec2.isTimedOut()) {
            + message +=
            + " because of the command taking longer than " +
            + "the configured timeout: " + timeout + " seconds";
            + throw new PartialGroupNameException(message);
              • End diff –

          Maybe this line is not needed if it will throw the exception anyway? The only difference is that it will not carry the original exception

          Show
          githubbot ASF GitHub Bot added a comment - Github user jojochuang commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r101592507 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -225,8 +287,16 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { throw new PartialGroupNameException("failed to get group id list for " + "user '" + userName + "'", ece); } catch (IOException ioe) { throw new PartialGroupNameException("can't execute the shell command to" + " get the list of group id for user '" + userName + "'", ioe); + String message = + "Can't execute the shell command to " + + "get the list of group id for user '" + userName + "'"; + if (exec2.isTimedOut()) { + message += + " because of the command taking longer than " + + "the configured timeout: " + timeout + " seconds"; + throw new PartialGroupNameException(message); End diff – Maybe this line is not needed if it will throw the exception anyway? The only difference is that it will not carry the original exception
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jojochuang commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r101592163

          — Diff: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestShellBasedUnixGroupsMapping.java —
          @@ -22,19 +22,32 @@

          import org.apache.commons.logging.Log;
          import org.apache.commons.logging.LogFactory;
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.fs.CommonConfigurationKeys;
          +import org.apache.hadoop.test.GenericTestUtils;
          +import org.apache.hadoop.util.ReflectionUtils;
          +import org.apache.hadoop.util.Shell;
          import org.apache.hadoop.util.Shell.ExitCodeException;
          import org.apache.hadoop.util.Shell.ShellCommandExecutor;
          import org.junit.Test;
          +
          import static org.junit.Assert.*;
          import static org.mockito.Mockito.doNothing;
          import static org.mockito.Mockito.doThrow;
          import static org.mockito.Mockito.mock;
          import static org.mockito.Mockito.when;

          public class TestShellBasedUnixGroupsMapping {

          • private static final Log LOG =
            + private static final Log TESTLOG =
            LogFactory.getLog(TestShellBasedUnixGroupsMapping.class);

          + private final GenericTestUtils.LogCapturer shellMappingLog =
          + GenericTestUtils.LogCapturer.captureLogs(
          + ShellBasedUnixGroupsMapping.LOG);
          +
          + private static final boolean WINDOWS =
          + (Shell.osType == Shell.OSType.OS_TYPE_WIN);
          — End diff –

          I think you can just use Shell.WINDOWS instead.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jojochuang commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r101592163 — Diff: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestShellBasedUnixGroupsMapping.java — @@ -22,19 +22,32 @@ import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.CommonConfigurationKeys; +import org.apache.hadoop.test.GenericTestUtils; +import org.apache.hadoop.util.ReflectionUtils; +import org.apache.hadoop.util.Shell; import org.apache.hadoop.util.Shell.ExitCodeException; import org.apache.hadoop.util.Shell.ShellCommandExecutor; import org.junit.Test; + import static org.junit.Assert.*; import static org.mockito.Mockito.doNothing; import static org.mockito.Mockito.doThrow; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; public class TestShellBasedUnixGroupsMapping { private static final Log LOG = + private static final Log TESTLOG = LogFactory.getLog(TestShellBasedUnixGroupsMapping.class); + private final GenericTestUtils.LogCapturer shellMappingLog = + GenericTestUtils.LogCapturer.captureLogs( + ShellBasedUnixGroupsMapping.LOG); + + private static final boolean WINDOWS = + (Shell.osType == Shell.OSType.OS_TYPE_WIN); — End diff – I think you can just use Shell.WINDOWS instead.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Hey Harsh J this is a big patch, but sure I'll review it today.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Hey Harsh J this is a big patch, but sure I'll review it today.
          Hide
          qwertymaniac Harsh J added a comment -

          Wei-Chiu Chuang - Could you help in reviewing this? Current diff is at https://github.com/apache/hadoop/pull/161

          Show
          qwertymaniac Harsh J added a comment - Wei-Chiu Chuang - Could you help in reviewing this? Current diff is at https://github.com/apache/hadoop/pull/161
          Hide
          qwertymaniac Harsh J added a comment -

          Xiaoyu Yao / Yongjun Zhang - Ping. Hoping you have some time for reviewing this one?

          Show
          qwertymaniac Harsh J added a comment - Xiaoyu Yao / Yongjun Zhang - Ping. Hoping you have some time for reviewing this one?
          Hide
          qwertymaniac Harsh J added a comment -

          Xiaoyu Yao - Thanks for the reviews so far. Could you take a look at the current patch-set?

          Yongjun Zhang - Could you help take a look at this one?

          Show
          qwertymaniac Harsh J added a comment - Xiaoyu Yao - Thanks for the reviews so far. Could you take a look at the current patch-set? Yongjun Zhang - Could you help take a look at this one?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r88366053

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -51,14 +52,16 @@
          LoggerFactory.getLogger(ShellBasedUnixGroupsMapping.class);

          private long timeout = 0L;
          + private final List<String> emptyGroupsList = new LinkedList<>();
          — End diff –

          Done in the latest commit.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r88366053 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -51,14 +52,16 @@ LoggerFactory.getLogger(ShellBasedUnixGroupsMapping.class); private long timeout = 0L; + private final List<String> emptyGroupsList = new LinkedList<>(); — End diff – Done in the latest commit.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r88366026

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java —
          @@ -517,15 +517,15 @@

          • <a href=" {@docRoot}

            /../hadoop-project-dist/hadoop-common/core-default.xml">

          • core-default.xml</a>
            */
          • public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT =
          • "hadoop.security.groups.shell.groups.command.timeout";
            + public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT_SECS =
            + "hadoop.security.groups.shell.command.timeout.secs";
              • End diff –

          I'd just felt it would be clearer to have '.secs' in it indicating the minimum unit type despite the parser. I've removed that in the latest commit and added some examples into its doc-text in the xml.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r88366026 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java — @@ -517,15 +517,15 @@ <a href=" {@docRoot} /../hadoop-project-dist/hadoop-common/core-default.xml"> core-default.xml</a> */ public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT = "hadoop.security.groups.shell.groups.command.timeout"; + public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT_SECS = + "hadoop.security.groups.shell.command.timeout.secs"; End diff – I'd just felt it would be clearer to have '.secs' in it indicating the minimum unit type despite the parser. I've removed that in the latest commit and added some examples into its doc-text in the xml.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user xiaoyuyao commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r88084092

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -51,14 +52,16 @@
          LoggerFactory.getLogger(ShellBasedUnixGroupsMapping.class);

          private long timeout = 0L;
          + private final List<String> emptyGroupsList = new LinkedList<>();
          — End diff –

          This can be static.

          Show
          githubbot ASF GitHub Bot added a comment - Github user xiaoyuyao commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r88084092 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -51,14 +52,16 @@ LoggerFactory.getLogger(ShellBasedUnixGroupsMapping.class); private long timeout = 0L; + private final List<String> emptyGroupsList = new LinkedList<>(); — End diff – This can be static.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user xiaoyuyao commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r88082752

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java —
          @@ -517,15 +517,15 @@

          • <a href=" {@docRoot}

            /../hadoop-project-dist/hadoop-common/core-default.xml">

          • core-default.xml</a>
            */
          • public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT =
          • "hadoop.security.groups.shell.groups.command.timeout";
            + public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT_SECS =
            + "hadoop.security.groups.shell.command.timeout.secs";
              • End diff –

          I think we should keep "hadoop.security.groups.shell.groups.command.timeout". Sorry I was not clear about this when suggesting Configuration#getTimeDuration. It allows admin to specify the time values with suffix like 10s, 1m, 2h. So the ."secs" won't be needed here.

          Show
          githubbot ASF GitHub Bot added a comment - Github user xiaoyuyao commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r88082752 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java — @@ -517,15 +517,15 @@ <a href=" {@docRoot} /../hadoop-project-dist/hadoop-common/core-default.xml"> core-default.xml</a> */ public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT = "hadoop.security.groups.shell.groups.command.timeout"; + public static final String HADOOP_SECURITY_GROUP_SHELL_COMMAND_TIMEOUT_SECS = + "hadoop.security.groups.shell.command.timeout.secs"; End diff – I think we should keep "hadoop.security.groups.shell.groups.command.timeout". Sorry I was not clear about this when suggesting Configuration#getTimeDuration. It allows admin to specify the time values with suffix like 10s, 1m, 2h. So the ."secs" won't be needed here.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r88010241

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -133,8 +171,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); }

          catch (PartialGroupNameException pge) {

          • LOG.warn("unable to return groups for user " + user, pge);
            + LOG.warn("unable to return groups for user {}", user, pge);
            + return new LinkedList<>();
              • End diff –

          Done in added commit.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r88010241 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -133,8 +171,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); } catch (PartialGroupNameException pge) { LOG.warn("unable to return groups for user " + user, pge); + LOG.warn("unable to return groups for user {}", user, pge); + return new LinkedList<>(); End diff – Done in added commit.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user QwertyManiac commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r88010210

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -37,11 +43,23 @@
          */
          @InterfaceAudience.LimitedPrivate(

          {"HDFS", "MapReduce"}

          )
          @InterfaceStability.Evolving
          -public class ShellBasedUnixGroupsMapping
          +public class ShellBasedUnixGroupsMapping extends Configured
          implements GroupMappingServiceProvider {

          • private static final Log LOG =
          • LogFactory.getLog(ShellBasedUnixGroupsMapping.class);
            +
            + @VisibleForTesting
            + protected static final Logger LOG =
            + LoggerFactory.getLogger(ShellBasedUnixGroupsMapping.class);
            +
            + private long timeout = 0L;
            +
            + @Override
            + public void setConf(Configuration conf) {
            + super.setConf(conf);
            + timeout = conf.getLong(
              • End diff –

          Done in added commit.

          Show
          githubbot ASF GitHub Bot added a comment - Github user QwertyManiac commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r88010210 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -37,11 +43,23 @@ */ @InterfaceAudience.LimitedPrivate( {"HDFS", "MapReduce"} ) @InterfaceStability.Evolving -public class ShellBasedUnixGroupsMapping +public class ShellBasedUnixGroupsMapping extends Configured implements GroupMappingServiceProvider { private static final Log LOG = LogFactory.getLog(ShellBasedUnixGroupsMapping.class); + + @VisibleForTesting + protected static final Logger LOG = + LoggerFactory.getLogger(ShellBasedUnixGroupsMapping.class); + + private long timeout = 0L; + + @Override + public void setConf(Configuration conf) { + super.setConf(conf); + timeout = conf.getLong( End diff – Done in added commit.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user xiaoyuyao commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r87855020

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -133,8 +171,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); }

          catch (PartialGroupNameException pge) {

          • LOG.warn("unable to return groups for user " + user, pge);
            + LOG.warn("unable to return groups for user {}", user, pge);
            + return new LinkedList<>();
            + }
            + } catch (IOException ioe) {
            + // If its a shell executor timeout, indicate so in the message
            + // but treat the result as empty instead of throwing it up,
            + // similar to how partial resolution failures are handled above
            + if (executor.isTimedOut()) {
            + LOG.warn(
            + "Unable to return groups for user '{}' as shell group lookup " +
            + "command '{}' ran longer than the configured timeout limit of " +
            + "{} seconds.",
            + user,
            + Arrays.asList(executor.getExecString()),
            + timeout
            + );
            return new LinkedList<>();
              • End diff –

          flyweight pattern applies here as well.

          Show
          githubbot ASF GitHub Bot added a comment - Github user xiaoyuyao commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r87855020 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -133,8 +171,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); } catch (PartialGroupNameException pge) { LOG.warn("unable to return groups for user " + user, pge); + LOG.warn("unable to return groups for user {}", user, pge); + return new LinkedList<>(); + } + } catch (IOException ioe) { + // If its a shell executor timeout, indicate so in the message + // but treat the result as empty instead of throwing it up, + // similar to how partial resolution failures are handled above + if (executor.isTimedOut()) { + LOG.warn( + "Unable to return groups for user '{}' as shell group lookup " + + "command '{}' ran longer than the configured timeout limit of " + + "{} seconds.", + user, + Arrays.asList(executor.getExecString()), + timeout + ); return new LinkedList<>(); End diff – flyweight pattern applies here as well.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user xiaoyuyao commented on a diff in the pull request:

          https://github.com/apache/hadoop/pull/161#discussion_r87854835

          — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java —
          @@ -133,8 +171,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName)

          { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); }

          catch (PartialGroupNameException pge) {

          • LOG.warn("unable to return groups for user " + user, pge);
            + LOG.warn("unable to return groups for user {}", user, pge);
            + return new LinkedList<>();
              • End diff –

          Can we use flyweight pattern to minimize memory usage for the empty LinkList?

          Show
          githubbot ASF GitHub Bot added a comment - Github user xiaoyuyao commented on a diff in the pull request: https://github.com/apache/hadoop/pull/161#discussion_r87854835 — Diff: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java — @@ -133,8 +171,26 @@ protected ShellCommandExecutor createGroupIDExecutor(String userName) { groups = resolvePartialGroupNames(user, e.getMessage(), executor.getOutput()); } catch (PartialGroupNameException pge) { LOG.warn("unable to return groups for user " + user, pge); + LOG.warn("unable to return groups for user {}", user, pge); + return new LinkedList<>(); End diff – Can we use flyweight pattern to minimize memory usage for the empty LinkList?
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user QwertyManiac opened a pull request:

          https://github.com/apache/hadoop/pull/161

          HADOOP-13817. Add a finite shell command timeout to ShellBasedUnixGroupsMapping.

          • Tests required log capture so the LOG variable was elevated in visibility, which required changes in a few test methods unrelated to just this change.
          • Timeout is applied to both, the regular groups and the partial group listing check commands.
          • Default is left to 0 to not break current behaviour (i.e. to wait forever until groups are resolvable).

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/QwertyManiac/hadoop HADOOP-13817

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/hadoop/pull/161.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #161


          commit 6707438a05d46ac57b230d4e9ade59a1a935c9c3
          Author: Harsh J <harsh@cloudera.com>
          Date: 2016-11-14T10:29:58Z

          HADOOP-13817. Add a finite shell command timeout to ShellBasedUnixGroupsMapping.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user QwertyManiac opened a pull request: https://github.com/apache/hadoop/pull/161 HADOOP-13817 . Add a finite shell command timeout to ShellBasedUnixGroupsMapping. Tests required log capture so the LOG variable was elevated in visibility, which required changes in a few test methods unrelated to just this change. Timeout is applied to both, the regular groups and the partial group listing check commands. Default is left to 0 to not break current behaviour (i.e. to wait forever until groups are resolvable). You can merge this pull request into a Git repository by running: $ git pull https://github.com/QwertyManiac/hadoop HADOOP-13817 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/161.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #161 commit 6707438a05d46ac57b230d4e9ade59a1a935c9c3 Author: Harsh J <harsh@cloudera.com> Date: 2016-11-14T10:29:58Z HADOOP-13817 . Add a finite shell command timeout to ShellBasedUnixGroupsMapping.

            People

            • Assignee:
              qwertymaniac Harsh J
              Reporter:
              qwertymaniac Harsh J
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development