Hadoop Common
  1. Hadoop Common
  2. HADOOP-10522

JniBasedUnixGroupMapping mishandles errors

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The mishandling of errors in the jni user-to-groups mapping modules can cause segmentation faults in subsequent calls. Here are the bugs:

      1) If hadoop_user_info_fetch() returns an error code that is not ENOENT, the error may not be handled at all. This bug was found by Chris Nauroth.

      2) In hadoop_user_info_fetch() and hadoop_group_info_fetch(), the global errno is directly used. This is not thread-safe and could be the cause of some failures that disappeared after enabling the big lookup lock.

      3) In the above methods, there is no limit on retries.

        Issue Links

          Activity

          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12640897/hadoop-10522.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3817//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3817//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640897/hadoop-10522.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3817//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3817//console This message is automatically generated.
          Hide
          Chris Nauroth added a comment -

          Hi, Kihwal Lee. Thank you for putting this patch together.

          The changes look good to me for handling the other error codes in hadoop_user_info_fetch and limiting retries. I didn't understand the part about errno though. According to POSIX, mutation of errno on one thread is not visible in other threads:

          http://www.unix.org/whitepapers/reentrant.html

          The Linux man page specifically says that it's in thread-local storage:

          http://linux.die.net/man/3/errno

          Additionally, the POSIX docs say that both getgrgid_r and getpwnam_r are supposed to be thread-safe:

          http://pubs.opengroup.org/onlinepubs/9699919799/functions/getgrgid.html

          http://pubs.opengroup.org/onlinepubs/009695399/functions/getpwnam.html

          Have you observed something different in practice?

          Show
          Chris Nauroth added a comment - Hi, Kihwal Lee . Thank you for putting this patch together. The changes look good to me for handling the other error codes in hadoop_user_info_fetch and limiting retries. I didn't understand the part about errno though. According to POSIX, mutation of errno on one thread is not visible in other threads: http://www.unix.org/whitepapers/reentrant.html The Linux man page specifically says that it's in thread-local storage: http://linux.die.net/man/3/errno Additionally, the POSIX docs say that both getgrgid_r and getpwnam_r are supposed to be thread-safe: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getgrgid.html http://pubs.opengroup.org/onlinepubs/009695399/functions/getpwnam.html Have you observed something different in practice?
          Hide
          Kihwal Lee added a comment -

          Chris Nauroth: You are right. errno should be thread-safe on most modern platforms. I think it is still safer to use the return value than errno whenever possible. Buggy code could make decision based on errno set by previous sys call.

          Show
          Kihwal Lee added a comment - Chris Nauroth : You are right. errno should be thread-safe on most modern platforms. I think it is still safer to use the return value than errno whenever possible. Buggy code could make decision based on errno set by previous sys call.
          Hide
          Daryn Sharp added a comment -

          +1 Looks good to me. Since the *_r functions return errno, I think that's the safest value to use.

          Show
          Daryn Sharp added a comment - +1 Looks good to me. Since the *_r functions return errno, I think that's the safest value to use.
          Hide
          Colin P. McCabe added a comment -
          +    // The following call returns errno. Reading the global errno wihtout
          +    // locking is not thread-safe.
          

          As Chris Nauroth mentioned, this is wrong. Please remove

          +    pwd = NULL;
          +    ret = getpwnam_r(username, &uinfo->pwd, uinfo->buf,
                                    uinfo->buf_sz, &pwd);
          -    } while ((!pwd) && (errno == EINTR));
          

          Unfortunately, this is wrong too.

          getgrgrid_r does not set errno (or at least is not documented to do so by POSIX). Instead, it returns the error number directly. Here's the man page on my system:

                 On  success,  getgrnam_r() and getgrgid_r() return zero, and set *result to grp.  If no matching group record was found, these functions return 0 and store NULL in *result.  In case of error, an error number is returned, and NULL is
                 stored in *result.
          

          Notice that errno is not mentioned.

             ret = hadoop_user_info_fetch(uinfo, username);
          -  if (ret == ENOENT) {
          -    jgroups = (*env)->NewObjectArray(env, 0, g_string_clazz, NULL);
          +  if (ret) {
          +    if (ret == ENOENT) {
          +      jgroups = (*env)->NewObjectArray(env, 0, g_string_clazz, NULL);
          +    } else { // handle other errors
          +      char buf[128];
          +      snprintf(buf, sizeof(buf), "getgrouplist: error looking up user. %d (%s)",
          +               ret, terror(ret));
          +      THROW(env, "java/lang/RuntimeException", buf);
          +    }
          

          Try (*env)->Throw(env, newRuntimeException("getgrouplist: error looking up user. %d (%s)", ret, terror(ret))) here instead.

          +  for (i = 0, ret = 0; i < MAX_USER_LOOKUP_TRIES; i++) {
          

          This is wrong. Just because we get EINTR 5 times doesn't mean we should fail. Maybe we're just handling a lot of signals (the JVM sends signals internally). Also, why are we increasing the buffer size when we get EINTR? We should only increase the buffer size when we get ERANGE. I think the better way to do this would be to keep the old loop, but break out if we got an ERANGE and the buffer size was above some fixed amount. However, this still seems strange to me. Clearly the underlying library is buggy, if it keeps telling us ERANGE forever. Is there a particular bug we are trying to work around in this patch?

          Show
          Colin P. McCabe added a comment - + // The following call returns errno. Reading the global errno wihtout + // locking is not thread-safe. As Chris Nauroth mentioned, this is wrong. Please remove + pwd = NULL; + ret = getpwnam_r(username, &uinfo->pwd, uinfo->buf, uinfo->buf_sz, &pwd); - } while ((!pwd) && (errno == EINTR)); Unfortunately, this is wrong too. getgrgrid_r does not set errno (or at least is not documented to do so by POSIX). Instead, it returns the error number directly. Here's the man page on my system: On success, getgrnam_r() and getgrgid_r() return zero, and set *result to grp. If no matching group record was found, these functions return 0 and store NULL in *result. In case of error, an error number is returned, and NULL is stored in *result. Notice that errno is not mentioned. ret = hadoop_user_info_fetch(uinfo, username); - if (ret == ENOENT) { - jgroups = (*env)->NewObjectArray(env, 0, g_string_clazz, NULL); + if (ret) { + if (ret == ENOENT) { + jgroups = (*env)->NewObjectArray(env, 0, g_string_clazz, NULL); + } else { // handle other errors + char buf[128]; + snprintf(buf, sizeof(buf), "getgrouplist: error looking up user. %d (%s)" , + ret, terror(ret)); + THROW(env, "java/lang/RuntimeException" , buf); + } Try (*env)->Throw(env, newRuntimeException("getgrouplist: error looking up user. %d (%s)", ret, terror(ret))) here instead. + for (i = 0, ret = 0; i < MAX_USER_LOOKUP_TRIES; i++) { This is wrong. Just because we get EINTR 5 times doesn't mean we should fail. Maybe we're just handling a lot of signals (the JVM sends signals internally). Also, why are we increasing the buffer size when we get EINTR ? We should only increase the buffer size when we get ERANGE . I think the better way to do this would be to keep the old loop, but break out if we got an ERANGE and the buffer size was above some fixed amount. However, this still seems strange to me. Clearly the underlying library is buggy, if it keeps telling us ERANGE forever. Is there a particular bug we are trying to work around in this patch?
          Hide
          Colin P. McCabe added a comment -

          Let's hold off on committing this until we fix these issues.

          Show
          Colin P. McCabe added a comment - Let's hold off on committing this until we fix these issues.
          Hide
          Kihwal Lee added a comment -

          Thanks for the reviews, Chris and Daryn. I've committed this to trunk, branch-2 and branch-2.4.

          Show
          Kihwal Lee added a comment - Thanks for the reviews, Chris and Daryn. I've committed this to trunk, branch-2 and branch-2.4.
          Hide
          Colin P. McCabe added a comment -

          Kihwal Lee, it seems like you missed my review. You probably didn't hit refresh soon enough. Do you mind if I back this out so we can fix some of the issues with this patch?

          Show
          Colin P. McCabe added a comment - Kihwal Lee , it seems like you missed my review. You probably didn't hit refresh soon enough. Do you mind if I back this out so we can fix some of the issues with this patch?
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5544 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5544/)
          HADOOP-10522. JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #5544 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5544/ ) HADOOP-10522 . JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c
          Hide
          Kihwal Lee added a comment - - edited

          Thanks for the feedback, Colin P. McCabe

          getgrgrid_r does not set errno (or at least is not documented to do so by POSIX).

          The patch does not use errno. It explicitly use the return value. The code before my patch used errno.

          Try (*env)->Throw(env, newRuntimeException("getgrouplist: error looking up user. %d (%s)", ret, terror(ret))) here instead.

          I don't think it is correct to throw a RuntimeException when a user is not found (ENOENT).

          Just because we get EINTR 5 times doesn't mean we should fail.

          Probably. I will fix that.

          Also, why are we increasing the buffer size when we get EINTR?

          We don't.

          Another bug in my patch is to return EIO when user/group is not found. I will fix it and enable retries for EINTR forever in a separate jira. I will file one and attach a patch.

          Show
          Kihwal Lee added a comment - - edited Thanks for the feedback, Colin P. McCabe getgrgrid_r does not set errno (or at least is not documented to do so by POSIX). The patch does not use errno. It explicitly use the return value. The code before my patch used errno. Try (*env)->Throw(env, newRuntimeException("getgrouplist: error looking up user. %d (%s)", ret, terror(ret))) here instead. I don't think it is correct to throw a RuntimeException when a user is not found (ENOENT). Just because we get EINTR 5 times doesn't mean we should fail. Probably. I will fix that. Also, why are we increasing the buffer size when we get EINTR? We don't. Another bug in my patch is to return EIO when user/group is not found. I will fix it and enable retries for EINTR forever in a separate jira. I will file one and attach a patch.
          Hide
          Kihwal Lee added a comment -

          Filed HADOOP-10527 and attached a patch.

          Show
          Kihwal Lee added a comment - Filed HADOOP-10527 and attached a patch.
          Hide
          Colin P. McCabe added a comment -

          Thanks for filing the follow-up, Kihwal. I'll move my responses there.

          Show
          Colin P. McCabe added a comment - Thanks for filing the follow-up, Kihwal. I'll move my responses there.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #548 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/548/)
          HADOOP-10522. JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #548 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/548/ ) HADOOP-10522 . JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1740 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1740/)
          HADOOP-10522. JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1740 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1740/ ) HADOOP-10522 . JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1765 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1765/)
          HADOOP-10522. JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1765 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1765/ ) HADOOP-10522 . JniBasedUnixGroupMapping mishandles errors. Contributed by Kihwal Lee. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588949 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_group_info.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c

            People

            • Assignee:
              Kihwal Lee
              Reporter:
              Kihwal Lee
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development