Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: libhdfs
    • Labels: None

      Description

      libhdfs does not consistently handle exceptions. Sometimes we don't free the memory associated with them (memory leak). Sometimes we invoke JNI functions that are not supposed to be invoked when an exception is active.

      Running a libhdfs test program with -Xcheck:jni shows the latter problem clearly:

      WARNING in native method: JNI call made with exception pending
      WARNING in native method: JNI call made with exception pending
      WARNING in native method: JNI call made with exception pending
      WARNING in native method: JNI call made with exception pending
      WARNING in native method: JNI call made with exception pending
      Exception in thread "main" java.io.IOException: ...
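
      A minimal sketch (not from this issue) of enabling the same checking when creating a JVM directly from C; libhdfs itself picks the flag up from the LIBHDFS_OPTS environment variable, as the comments below note:

          #include <jni.h>

          int main(void)
          {
              JavaVM *jvm;
              JNIEnv *env;
              JavaVMOption options[1];
              JavaVMInitArgs vm_args;

              options[0].optionString = "-Xcheck:jni"; /* extra JNI checking */
              vm_args.version = JNI_VERSION_1_6;
              vm_args.nOptions = 1;
              vm_args.options = options;
              vm_args.ignoreUnrecognized = JNI_FALSE;

              if (JNI_CreateJavaVM(&jvm, (void **)&env, &vm_args) != JNI_OK)
                  return 1;
              /* ... make JNI calls; violations are reported on stderr ... */
              (*jvm)->DestroyJavaVM(jvm);
              return 0;
          }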
      
      Attachments

      1. HDFS-3579.004.patch
        138 kB
        Colin Patrick McCabe
      2. HDFS-3579.005.patch
        137 kB
        Colin Patrick McCabe
      3. HDFS-3579.006.patch
        139 kB
        Colin Patrick McCabe

          Activity

          Colin Patrick McCabe added a comment -

          This change fixes exception handling in libhdfs.

          It establishes the convention that we clear pending exceptions as soon as they are raised. We never assume that the caller of a function will perform this cleanup; the function itself must do it. This matters because while an exception is pending in a thread, most JNI calls cannot be made (DeleteLocalRef and a few others are the exceptions). The caller of the function is responsible for freeing the local reference to the exception, if one was returned.
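
          A minimal sketch of that convention (the helper name is illustrative, not necessarily the one the patch uses):

              #include <jni.h>

              /* Clear the pending exception immediately, returning a local
               * reference that the caller must free with DeleteLocalRef. */
              static jthrowable getPendingExceptionAndClear(JNIEnv *env)
              {
                  jthrowable jthr = (*env)->ExceptionOccurred(env);
                  if (!jthr)
                      return NULL;
                  (*env)->ExceptionClear(env); /* any JNI call is legal again */
                  return jthr;
              }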

          This clears up the warnings about JNI calls being made while exceptions are pending. It also fixes many, many cases where we create a local reference to an exception and then never dispose of it, one source of memory leaks in long-running libhdfs users such as fuse_dfs.

          The patch also fixes some errno problems that could leave errno unset, or set to something incorrect. Most functions can clobber the value of errno, so when a libhdfs function reports an error through errno, the assignment to errno should be literally the last statement executed.
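
          A hedged sketch of that errno discipline (the function and parameter names here are illustrative):

              #include <errno.h>
              #include <jni.h>

              /* Cleanup first, since it may clobber errno; make the errno
               * assignment the very last statement before returning. */
              static int finishWithError(JNIEnv *env, jobject jLocal, int err)
              {
                  if (jLocal)
                      (*env)->DeleteLocalRef(env, jLocal); /* may clobber errno */
                  if (err) {
                      errno = err; /* set last, so nothing can overwrite it */
                      return -1;
                  }
                  return 0;
              }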

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12537225/HDFS-3579.004.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2868//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          rebase on trunk

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12537283/HDFS-3579.005.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2874//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2874//console

          This message is automatically generated.

          Andy Isaacson added a comment -
          ...
          +    jvalue jVal;
          +
          +    jthrowable jthr = classNameOfObject(exc, env, &className);
          +    if (jthr) {
          

          The blank line should come after the declaration of jthr.

          +        fprintf(stderr, "PrintExceptionAndFree: error determining class name "
          +            "of exception!");
          

          Add a \n to this printf. If it were me, I would replace all the surprised "!"s with "."; but I am not going to insist on unexcitifying the error messages.

          +        fprintf(stderr, " error: (no exception)");
          

          Another missing \n.

          -        constructNewObjectOfClass(env, NULL, "org/apache/hadoop/fs/Path",
          +    jthr = constructNewObjectOfClass(env, &jPath, "org/apache/hadoop/fs/Path",
                                             "(Ljava/lang/String;)V", jPathString);
          

          Indent the continuation line to match the opening (.

          +    jthr = newCStr(env, jRet, val);
          +    if (jthr)
                   goto done;
           done:
          

          Having a "goto done" on the line before "done:" is a bit confusing; I'd just drop it.

          --- hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
          ...
          +void destroyLocalReference(JNIEnv *env, jobject jObject);
          

          Since this is an external-linkage C API, I think we need to use a reserved namespace prefix to avoid breaking outside code. So perhaps make this hdfsDestroyLocalReference? This is a large change, and this is newly exposed code, so that rename can be done in a separate JIRA.

          An alternate fix, if we only support a .so and don't ship a .a, is to use a linker script to control symbol visibility.

          But one way or another we need to make sure we have sanitary symbol table usage.
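
          Two hedged sketches of what that could look like; neither declaration below is from the patch:

              #include <jni.h>

              /* Option 1: reserve a prefix for everything the library exports. */
              void hdfsDestroyLocalReference(JNIEnv *env, jobject jObject);

              /* Option 2 (GCC/Clang): hide the symbol from the dynamic symbol
               * table entirely, if nothing outside the .so needs to call it. */
              __attribute__((visibility("hidden")))
              void destroyLocalReference(JNIEnv *env, jobject jObject);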

          +    if (jthr) {
          +        ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
          +            "hdfsDisconnect: org.apache.hadoop.fs.FileSystem::close");
          

          I don't think :: is right for Java; I think it should be "org.apache.hadoop.fs.FileSystem#close".

          Please check all the printExceptionAndFree calls and verify that they are consistent; I see some with and some without the org.class.path. I don't care which standard, but I think "cFuncName: org.class.path#method" is the right string unless there's a better idea.
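
          For example, the hdfsDisconnect call quoted above would then read:

              ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
                  "hdfsDisconnect: org.apache.hadoop.fs.FileSystem#close");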

               //bufferSize
               if (!bufferSize) {
          

          Please delete this useless comment. (Not your code, but you're editing right here, let's just fix it.)

               }  else if ((flags & O_WRONLY) && (flags & O_APPEND)) {
          +        // WRITE/APPEND?
          +       jthr = invokeMethod(env, &jVal, INSTANCE, jFS, HADOOP_FS,
          

          There's a bunch of funky whitespace here; let's fix it. "} else" has an extra space, and I think the "// WRITE" is indented one space too far.

          +    file = calloc(1, sizeof(struct hdfsFile_internal));
          

          I find file = calloc(1, sizeof *file); easier to convince myself of, but I don't know whether we have a local style guideline on this point.

          +    if ((flags & O_WRONLY) == 0) {
          

          This is wrong per HDFS-3710. (Many occurrences of this pattern.) No need to fix the existing ones outside of the code you're touching, but please don't add more.

          more to come in next comment...

          Colin Patrick McCabe added a comment -
          • rebase on trunk
          • fix whitespace
          • NOPRINT_EXC_ILLEGAL_ARGUMENT should be 0x10 (see the sketch below)
          • get rid of "typedef jvalue RetVal"; it provides no value beyond using jvalue directly
          • use # instead of :: to describe Java methods
          • don't restate the long names of Java classes in error messages; the exceptions themselves contain that information, so it's just visual clutter
          • newRuntimeError is declared in exception.h, so it doesn't need to also be declared in jni_helper.h
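
          For context, 0x10 is the natural value if the NOPRINT flags are single bits. A hedged sketch of such a layout; only NOPRINT_EXC_ILLEGAL_ARGUMENT's value comes from the list above, and the other names and values are purely illustrative:

              #define PRINT_EXC_ALL                    0x00 /* print everything */
              #define NOPRINT_EXC_FILE_NOT_FOUND       0x01
              #define NOPRINT_EXC_ACCESS_CONTROL       0x02
              #define NOPRINT_EXC_UNRESOLVED_LINK      0x04
              #define NOPRINT_EXC_PARENT_NOT_DIRECTORY 0x08
              #define NOPRINT_EXC_ILLEGAL_ARGUMENT     0x10 /* fifth flag bit */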
          Colin Patrick McCabe added a comment -

          Thanks for the review, Andy. I'm looking forward to the next comment.

          [O_WRONLY discussion]

          I held off on the O_WRONLY fixes, since we have HDFS-3710 open for that. What I've done here is just move the code, not add any new (mis)uses.

          [linker script discussion]

          Yeah, HDFS-3742 is open for this.

          I tried to fix as much whitespace as I could; I'm sure there's still funky stuff lingering somewhere. We can always circle back on that later, though, as long as this patch moves things in the right direction.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538632/HDFS-3579.006.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2932//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2932//console

          This message is automatically generated.

          Andy Isaacson added a comment -
          @@ -870,15 +812,12 @@ int hdfsCloseFile(hdfsFS fs, hdfsFile file)
          -    //Parameters
          -    jobject jStream = (jobject)(file ? file->file : NULL);
          ...
          +    (*env)->DeleteGlobalRef(env, file->file);
               free(file);
          -    (*env)->DeleteGlobalRef(env, jStream);
          

          Let's preserve the interface that hdfsFile file can be NULL without causing a SEGV. Just toss in an if (file == NULL) return -1; near the top.

          +    jthr = invokeMethod(env, &jVal, INSTANCE, jInputStream, HADOOP_ISTRM,
                                          "read", "([B)I", jbRarray);
          ...
          +    if (jVal.i < 0) {
          +        // EOF
          +        destroyLocalReference(env, jbRarray);
          +        return 0;
          +    } else if (jVal.i == 0) {
          +        destroyLocalReference(env, jbRarray);
          +        errno = EINTR;
          +        return -1;
          

          Is this correct? FSDataInputStream#read returns -1 on EOF and 0 on EINTR? That's special. I see docs for the -1 case, but I don't see where the 0 could come from.

           tSize hdfsPread(hdfsFS fs, hdfsFile f, tOffset position,
                           void* buffer, tSize length)
           {
          -    // JAVA EQUIVALENT:
          -    //  byte [] bR = new byte[length];
          -    //  fis.read(pos, bR, 0, length);
          

          I find these JAVA EQUIVALENT comments to be very helpful; could we keep them around? So long as they're accurate, I mean. If they're misleading, then deleting is correct.

          +    if (jthr) {
          +        errno = printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
          +            "hdfsTell: org.apache.hadoop.fs.%s::getPos",
          +            ((f->type == INPUT) ? "FSDataInputStream" :
          +                                 "FSDataOutputStream"));
          

          Please use interface here rather than recapitulating its ternary.

          more review to come ... two-thirds done now ...

          Colin Patrick McCabe added a comment -

          I find these JAVA EQUIVALENT comments to be very helpful, could we keep them around? so long as they're accurate, I mean. If they're misleading then deleting is correct.

          ok.

          Please use interface here rather than recapitulating its ternary.

          It was done to avoid printing out "org/apache/hadoop/fs/FSDataInputStream" etc., since, as you commented above, it's nicer to print something shorter. I don't have a strong feeling about it either way, but I suspect it's easier just to redo the ternary here.

          Is this correct? FSDataInputStream#read returns -1 on EOF and 0 on EINTR? That's special. I see docs for the -1 case, but I don't see anywhere that the 0 could come from?

          The standard Java convention is that -1 means EOF, and 0 is just a short read. hdfsRead, on the other hand, follows the UNIX convention. See HADOOP-1582 for more details.

          Let's preserve the interface that hdfsFile file can be NULL without causing SEGV. Just toss in a if (file == NULL) return -1; near the top.

          The whole thing is kind of messy. Passing a NULL pointer to hdfsClose is a user error, yet we check for it for some reason.

          There's no way to actually get an hdfsFile whose file->type is UNINITIALIZED. Every hdfsOpen path either returns NULL or returns a file of type INPUT or OUTPUT. There's no way to close a file that hasn't been opened, either. Similarly, there's no way to get a file where file->file is NULL.

          The original code didn't check for file->file being NULL either (look at it carefully, you'll see what I mean).

          tl;dr: I didn't change the behavior here. But someone should eventually.

          Andy Isaacson added a comment -

          There's no way to actually get an hdfsFile which has file->type UNINITIALIZED.

          That's not what I was asking for. I want us to deal with file == NULL; it is a simple user error, and there's no need to SEGV on it.

          If file->file == NULL or file->type == UNINITIALIZED the data structures are corrupt and we can SEGV all we want.

          Colin Patrick McCabe added a comment -

          I want us to deal with file == NULL

          The latest patch has this in hdfsClose:

              //Sanity check
              if (!file || file->type == UNINITIALIZED) {
                  errno = EBADF;
                  return -1;
              }
          

          So it should set errno = EBADF and return -1, just as before.

          Aaron T. Myers added a comment -

          I've taken a look at the patch and it looks good to me. I agree that this is some good cleanup to do. Thanks a lot, Andy, for your very thorough review.

          One question before I commit this patch, though: can you please describe what sort of testing you did to verify this change? Are the warnings about pending exceptions now gone from the logs? Were you able to ensure that memory is no longer leaked when exceptions are thrown?

          Colin Patrick McCabe added a comment -

          Are the warnings about pending exceptions now gone from the logs?

          Yes. Running before-and-after tests with LIBHDFS_OPTS="-Xcheck:jni -Xcheck:nabounds" confirms that the messages about "JNI call made with exception pending" are gone after the patch. The test I ran was test_libhdfs_threaded.

          I also ran test_fuse_dfs and verified that it passed successfully. That test also exercises libhdfs, albeit in a slightly different way.

          We need a longer-running test that exercises more failure conditions to fully establish that all memory leaks are fixed. I think writing such a test is a little bit out of scope for this JIRA, but it's definitely something we should do in the future.

          Aaron T. Myers added a comment -

          We need a longer running test that exercises more failure conditions to fully establish that all memory leaks are fixed. I think writing such a test is a little bit of out scope for this JIRA, but it's definitely something we should do in the future.

          Definitely agree that writing such a test is well out of scope for this JIRA, but would it be possible to, for example, run test_fuse_dfs with valgrind? (No need to do that for this JIRA. This is good cleanup regardless, and we can fix any other memory leaks found in a different JIRA.)

          Yes. Running a before and after with LIBHDFS_OPTS="-Xcheck:jni -Xcheck:nabounds" confirms that the messages about "JNI call made with exception pending" are gone after the patch. The test I ran was test_libhdfs_threaded.
          I also ran test_fuse_dfs and verified that it passed successfully. That test also exercises libhdfs, albeit in a slightly different way.

          Cool, thanks for doing that.

          +1, I'll go ahead and commit this patch.

          Colin Patrick McCabe added a comment -

          Thanks, atm. I have tried running valgrind on fuse_dfs in the past. It doesn't really work; I get tons of false positives. I think there's a general problem with running valgrind on JNI code. I did try adding more and more stuff to the "exclude lists," but it didn't seem to help. Maybe someone more knowledgeable on this topic can come up with a workaround.

          I'm also unsure whether valgrind can identify leaks of memory managed by the JVM. I suspect the answer is "no," which would mean that the local reference leaks fixed by this patch would have been invisible to valgrind anyway. As far as I know, valgrind only deals with memory allocated via malloc, although I'm happy to be corrected if someone has more info.

          Aaron T. Myers added a comment -

          I have tried running valgrind on fuse_dfs in the past. It doesn't really work-- <snip>

          Got it, thanks for the explanation.

          I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Colin. Fixes like this are yeoman's work. Thanks for doing it.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2621 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2621/)
          Add two new files missed by last commit of HDFS-3579. (Revision 1370017)
          HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick McCabe. (Revision 1370015)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h

          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2556 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2556/)
          Add two new files missed by last commit of HDFS-3579. (Revision 1370017)
          HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick McCabe. (Revision 1370015)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h

          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2575 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2575/)
          Add two new files missed by last commit of HDFS-3579. (Revision 1370017)
          HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick McCabe. (Revision 1370015)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h

          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1128 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1128/)
          Add two new files missed by last commit of HDFS-3579. (Revision 1370017)
          HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick McCabe. (Revision 1370015)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h

          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c

            People

            • Assignee: Colin Patrick McCabe
            • Reporter: Colin Patrick McCabe
            • Votes: 0
            • Watchers: 9
