Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.19.0, 0.20.0
    • Fix Version/s: 0.18.3, 0.19.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      fix memory leak of user/group information in fuse-dfs

      Description

      I am running a process that needs to crawl a tree structure containing ~10K images, copy the images to the local disk, process these images, and copy them back to HDFS.

      My problem is the following: after about 10 hours of processing, the processes crash with a std::bad_alloc exception (I use Hadoop Pipes to run existing software). When running fuse_dfs in debug mode, I get an OutOfMemoryError indicating that there is no more room in the heap.

      While the process is running, top and ps show fuse_dfs using an increasing amount of memory until some limit is reached. At that point the memory usage oscillates, which I suppose is due to the use of virtual memory.

      This leads me to the conclusion that there is some memory leak in fuse_dfs, since the only other programs running are Hadoop and the existing software, both thoroughly tested in the past.

      My knowledge of memory leak tracking is rather limited, so I will need some guidance to get more insight into this issue.

      Thank you

      1. TEST-TestFuseDFS.txt-4635
        36 kB
        Pete Wyckoff
      2. patch-hadoop4635.test_19
        61 kB
        Pete Wyckoff
      3. patch-hadoop4635.test_18
        32 kB
        Pete Wyckoff
      4. HADOOP-4635.txt.0.19
        2 kB
        Pete Wyckoff
      5. HADOOP-4635.txt.0.18
        3 kB
        Pete Wyckoff

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-trunk #677 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/677/)
          HADOOP-4635. Fix a memory leak in fuse dfs. (pete wyckoff via mahadev)

          Mahadev konar added a comment -

          I just committed this. Thanks pete.

          Pete Wyckoff added a comment -

          The failure in Hudson is not due to this patch, which touches fuse-dfs C code only.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12394690/HADOOP-4635.txt.0.19
          against trunk revision 720930.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3657/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3657/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3657/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3657/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          +1. Fix looks good.

          Pete Wyckoff added a comment -

          0.19 patch

          Pete Wyckoff added a comment -

          0.18 patch.

          Pete Wyckoff added a comment -

          output of ant test-contrib -Dtestcase=TestFuseDFS -Dlibhdfs=1 -Dfusedfs=1

          BUILD SUCCESSFUL

          Pete Wyckoff added a comment -

          [exec] +1 overall.

          [exec] +1 @author. The patch does not contain any @author tags.

          [exec] +1 tests included. The patch appears to include 3 new or modified tests.

          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

          [exec] +1 javac

          (test-patch output on the 0.19 patch)

          Pete Wyckoff added a comment -

          [exec] +1 overall.

          [exec] +1 @author. The patch does not contain any @author tags.

          [exec] +1 tests included. The patch appears to include 3 new or modified tests.

          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

          [exec] +1 javac

          (test-patch output on the 0.18 patch)

          Pete Wyckoff added a comment -

          Interesting. I am using the latest version of the code, but against 0.17.1, so I also have permissions off.

          We should definitely plug the num_groups hole that you mentioned, though.

          Marc-Olivier Fleury added a comment -

          I tried going back to version 0.18.3, and it corrected the leak. I don't have time to evaluate more precisely what caused the leak, but I can still provide these details:

          I am now using fuse_dfs without permissions (-Dlibhdfs.noperms=1). This was actually needed to compile against 0.18.3.

          So the leak I was experiencing may come from the getGroups code. However, considering that you also use the latest version of fuse-dfs and do not experience any leak, the problem might also very well come from libhdfs rather than from fuse-dfs.

          I hope that my feedback may be of some use someday...

          Thanks for your help.

          Pete Wyckoff added a comment -

          On my 0.17.1 cluster I don't see mine leaking, but I do see that because of the Java GC the memory use is hard to peg. I mounted with -ordbuffer=67108864 and then ran head part-00000 > /dev/null 1000 times on a file; the memory sometimes climbs as high as 1 GB, but then comes down and stays at about 550 MB.
          I also tried the following code and saw about the same behavior as with fuse. I also tried using a 64 MB buffer in fuse but short-circuiting the code that does such big reads to DFS, and the memory never grew much.

          It may just be Java. Maybe you could set LIBHDFS_OPTS to the options that make the GC write a log, but I don't know whether it will show how much memory is actually in use by the JVM.

          
          import org.apache.hadoop.fs.*;
          import org.apache.hadoop.conf.*;
          import org.apache.hadoop.dfs.*;

          public class test {
            public static void main(String args[]) {
              try {
                int size = 256 * 1024 * 1024;
                byte buf[] = new byte[size];
                FileSystem fs = FileSystem.get(new Configuration());
                for (int i = 0; i < 1000; i++) {
                  FSDataInputStream fsi = fs.open(new Path("/some/path/to/part-00000"));
                  fsi.readFully(0, buf);
                  System.err.println(i);
                  fsi.close();
                }
                Thread.sleep(60 * 1000);
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          }
          
          
          Pete Wyckoff added a comment -

          Did you include your fix for the off-by-one problem you found in the grouplist?

          Also, it would be helpful to know whether it still leaks with -Dlibhdfs.noperms=1 - at least that would rule out the group list code.

          I'm going to watch our production environment for leaks, but I have never heard of one there; note that we are running 0.17.1 in production, though with the latest fuse-dfs.

          – pete

          Marc-Olivier Fleury added a comment -

          I am using the latest patch provided in issue HADOOP-4616, which is supposed to fix the leaks, and I have bad news: there are still leaks.

          While searching through the code, I noticed that some of the functions used as fuse_operations are not static. Could that be a problem?

          In any case, fixing this does not remove the leak; I am running a version using static functions, and the leak is still there.

          I am running out of clues, so any help would be appreciated.

          Thanks!

          Pete Wyckoff added a comment -

          Excellent catch - that could definitely have caused lots of problems!

          Marc-Olivier Fleury added a comment -

          Yes, after posting my last comment I noticed exactly the problem you mentioned.

          I see two ways of correcting this: either increment the number of groups, or free one more item than the number of groups. It merely depends on the intended meaning of num_groups.

          I chose the second way to solve it, adding

          free(groups[i])

          after the for loop (where i equals num_groups), to mirror the allocation code.

          I also changed

          groupnames = (char**)malloc(sizeof(char*) * (*num_groups) + 1);

          to

          groupnames = (char**)malloc(sizeof(char*) * (*num_groups + 1));

          since sizeof(char*) != 1, the original reserves only one extra byte rather than one extra pointer slot.

          Thanks for the details.

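          (For reference, a minimal sketch of the second approach described above; illustrative only, not the committed patch.)

          /* Illustrative sketch, not the committed patch: assumes getGroups stores
             the user name one slot past numgroups, as described in this thread. */
          static void freeGroups(char **groups, int numgroups) {
            if (groups == NULL) {
              return;
            }
            int i;
            for (i = 0; i < numgroups; i++) {
              free(groups[i]);
            }
            free(groups[numgroups]);   /* the appended user entry */
            free(groups);
          }
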
          Pete Wyckoff added a comment -

          It's this code in getGroups that is causing the (a?) leak:

            groupnames = (char**)malloc(sizeof(char*)* (*num_groups) + 1);   /* allocating num_groups + 1 */
            assert(groupnames);
            int i;
            for (i=0; i < *num_groups; i++)  {
              groupnames[i] = getGroup(grouplist[i]);
              if (groupnames[i] == NULL) {
                fprintf(stderr, "error could not lookup group %d\n",(int)grouplist[i]);
              }
            }
            free(grouplist);
            assert(user != NULL);
            groupnames[i] = user;      /* setting position beyond num_groups - never released */
          

          The last groupnames[i] = user entry is never released, because the freeGroups code is:

          static void freeGroups(char **groups, int numgroups) {
            if (groups == NULL) {
              return;
            }
            int i;
            for (i = 0; i < numgroups; i++) {
              free(groups[i]);
            }
            free(groups);
          }
          

          Also note that Hadoop never sees the last group either. I think we need a

            *num_groups = *num_groups + 1;
          

          in getGroups, after setting groupnames[i] = user, to fix both the leak and Hadoop not seeing the last group.

          – pete

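          (For reference, a minimal sketch combining the num_groups increment proposed above with the corrected allocation size noted by Marc-Olivier; illustrative only, not the committed patch.)

          /* Illustrative sketch, not the committed patch: reserve num_groups + 1
             pointer slots, append the user, and count it so that freeGroups
             releases it and Hadoop sees it. */
          groupnames = (char**)malloc(sizeof(char*) * (*num_groups + 1));
          assert(groupnames);
          int i;
          for (i = 0; i < *num_groups; i++) {
            groupnames[i] = getGroup(grouplist[i]);
            if (groupnames[i] == NULL) {
              fprintf(stderr, "error could not lookup group %d\n", (int)grouplist[i]);
            }
          }
          free(grouplist);
          assert(user != NULL);
          groupnames[i] = user;
          *num_groups = *num_groups + 1;   /* the appended user entry is now freed by freeGroups */
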
          Marc-Olivier Fleury added a comment -

          Well, I think the leak you mentioned that happens on each hdfsConnect should definitely be fixed, and if this is the right place to report that issue, we can use it to do so.

          I am still not sure where exactly the leak happens... is it the doConnect function that does not correctly clean up what it allocates, or is it hdfsConnectAsUser that has the problem?

          Looking at the code of doConnect, it seems that everything is freed (there are calls to freeGroups and free(user)). Was this the leak you were mentioning? Or is there an issue with getGroups/freeGroups?

          Thanks for your help, I really need to fix this problem...

          Pete Wyckoff added a comment -

          HADOOP-4616 fixes the memory leak from pthread_mutex_destroy not being called on write file handles. This would seem to be the only known leak in 0.18.2.

          For 0.19/trunk, there are some other small leaks associated with doing opens as a specific user and looking up their groups; an off-by-one error would seem to prevent one of the group char * entries from being freed. Should we address that in this JIRA?

          Marc-Olivier Fleury added a comment -

          Great, I see that you have some good ideas about which parts of the code could cause issues.

          I am using the latest version from svn, 0.20.0 I think (I had to upgrade to be able to use the write functionality).

          I spent some time looking at the code and noticed a strange little quirk (fuse_dfs.c:608): the '+1' in the malloc. I don't get why it is there; it must be a relic. It is not important, since it will be freed anyway, still...

          Anyway, I am chasing the leaks right now, and I am happy to see that some of them are already located. Is the hdfsConnectAsUser leak difficult to fix? I will take a look and try to fix it, but if you have any insight, please let me know!

          Pete Wyckoff added a comment -

          There is one memory leak in 0.18.2: for every file opened in write mode, it never calls pthread_mutex_destroy for a mutex. This would probably be a few tens of bytes per file.

          For 0.19 and 0.20, it also leaks on each hdfsConnect, which happens on file opens, chmod, mvdir, ... That leaks a char * holding the username.

          There's also a bug open for fuse-dfs leaking FileSystem handles, but that is a single handle per unique user/group combination doing operations, so it should be very small and not worrisome, since it is O(# of users) and the number of users is small.

          I'm glad you opened this and we looked at it.

          This is 0.18.2, right?

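          (For reference, a minimal sketch of the open/release pairing described above; the structure and function names are illustrative, not the actual fuse-dfs code.)

          #include <pthread.h>
          #include <stdlib.h>

          /* Hypothetical write-mode file handle; names are illustrative only. */
          typedef struct {
            pthread_mutex_t mutex;   /* guards buffered writes on this handle */
            /* ... write buffer, hdfs file handle, etc. ... */
          } write_handle;

          static write_handle *handle_open(void) {
            write_handle *fh = (write_handle *)calloc(1, sizeof(*fh));
            if (fh == NULL) return NULL;
            pthread_mutex_init(&fh->mutex, NULL);
            return fh;
          }

          static void handle_release(write_handle *fh) {
            if (fh == NULL) return;
            /* The missing call in 0.18.2: every pthread_mutex_init needs a matching destroy. */
            pthread_mutex_destroy(&fh->mutex);
            free(fh);
          }
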
          Marc-Olivier Fleury added a comment - edited

          It seems that there really is a leak...

          Using top, I am monitoring the amount of memory used by fuse_dfs on one of the machines.

          The percentage went from 6.8% to 8.1% in 2-3 hours. I will continue tracking the memory usage to get some more insight.

          Marc-Olivier Fleury added a comment -

          After some further testing, I noticed that the amount of memory used by fuse_dfs is about 70 MB (it varies from one machine to another).

          I tried to use vmstat to see if a lot of paging was involved, and that does not seem to be the case.

          I might just have reported a nonexistent bug; I apologize if that is the case, and I am in any case ready to do as many tests as needed to get a better understanding of the situation.


            People

            • Assignee:
              Pete Wyckoff
              Reporter:
              Marc-Olivier Fleury
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:

                Development