Hadoop Common: HADOOP-2576

Namenode performance degradation over time

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels:
      None

      Description

      We have a cluster running the same applications again and again with a high turnover of files.

      The performance of these applications seems to be correlated with the lifetime of the namenode:
      After starting the namenode, the applications need increasingly more time to complete, taking about 50% longer after 1 week.

      During that time the namenode average cpu usage increases from typically 10% to 30%, memory usage nearly doubles (although the average amount of data on dfs stays the same), and the average load factor increases by a factor of 2-3 (although not significantly high, <2).

      When looking at the namenode and datanode logs, I see a lot of requests from the namenode asking datanodes to delete blocks that are not in their blockMap, repeatedly for the same blocks.
      When I counted the blocks the namenode asked to be deleted, I saw a marked increase with the lifetime of the namenode (a factor of 2-3 after 1 week).

      This makes me wonder whether the namenode does not purge the list of invalid blocks from non-existing blocks.

      But independently, the namenode has a degradation issue.

      Attachments

      1. HADOOP-2576.patch
        4 kB
        Raghu Angadi
      2. HADOOP-2576.patch
        4 kB
        Raghu Angadi
      3. HADOOP-2576.patch
        4 kB
        Raghu Angadi
      4. HADOOP-2576.patch
        4 kB
        Raghu Angadi
      5. HADOOP-2576.patch
        2 kB
        Raghu Angadi

        Activity

        dhruba borthakur added a comment -

        There is a command that dumps namenode internal data structures to a log file. When this problem occurs, can you please run it as

        bin/hadoop dfsadmin -metasave "filename".

        The specified filename will be created in the namenode's log directory. This file will list blocks that are waiting to be replicated as well as blocks waiting to be deleted. Using this tool we can determine if the namenode is not purging the list of blocks to be invalidated.

        Christian Kunz added a comment -

        I have 2 block reports now, one generated 1.5 days after namenode startup, and one 4.5 days after. The build process did not yet slow down to a large extent, but the block reports already indicate some leak:

        The first block report lists about 20,000 blocks to delete from 14 nodes
        the 2nd one about 140,000 blocks to delete from 10 nodes.

        I checked the first block of the first node in the datanode log files: there were about 40 futile attempts to delete that block (not found in blockMap).

        Raghu Angadi added a comment -

        What is the heartbeat interval on this cluster? Is it large, say 1 min? The invalidateSet that contains the blocks to delete for each datanode at the namenode is actually an array; each block could be present multiple times in this array.
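        The array-versus-set distinction described above can be sketched as follows. This is an illustrative toy, not the actual FSNamesystem code; the class and method names are made up.

        ```java
        import java.util.ArrayList;
        import java.util.List;
        import java.util.Set;
        import java.util.TreeSet;

        // Illustrative sketch (not the real FSNamesystem code): if the per-datanode
        // invalidate collection is a list, re-queueing the same block on every pass
        // accumulates duplicates; a Set keeps exactly one entry per block.
        public class InvalidateSketch {
            // Re-add the same block id `passes` times to a list and report its size.
            static int listEntries(String blockId, int passes) {
                List<String> invalidates = new ArrayList<>();
                for (int i = 0; i < passes; i++) {
                    invalidates.add(blockId); // duplicates pile up
                }
                return invalidates.size();
            }

            // Same re-adds against a set: add() is idempotent, so one entry remains.
            static int setEntries(String blockId, int passes) {
                Set<String> invalidates = new TreeSet<>();
                for (int i = 0; i < passes; i++) {
                    invalidates.add(blockId);
                }
                return invalidates.size();
            }
        }
        ```

        With 40 re-adds, as in the futile delete attempts seen in the datanode logs, the list holds 40 entries while the set holds 1.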

        Christian Kunz added a comment -

        Heartbeat is indeed 1 minute.

        On the other hand, of the 143437 blocks in the 2nd block report listing blocks to be deleted on 10 datanodes, about 127650 are unique (including the one I checked with 40 failed delete attempts), about 15400 appear twice, and fewer than 400 appear 3 or more times.
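        The multiplicity counting described above (unique vs. double vs. 3-or-more entries) amounts to a simple tally over the block ids in the dump. The sketch below assumes the ids have already been extracted from the metasave output; the parsing step and the names are illustrative.

        ```java
        import java.util.HashMap;
        import java.util.Map;

        // Illustrative tally: count how many block ids appear once, twice, or three
        // or more times in a list of to-be-deleted entries (e.g. extracted from a
        // `dfsadmin -metasave` dump; the extraction step is not shown).
        public class BlockTally {
            // Returns {unique, twice, threeOrMore}.
            static int[] multiplicities(String[] blockIds) {
                Map<String, Integer> counts = new HashMap<>();
                for (String id : blockIds) {
                    counts.merge(id, 1, Integer::sum);
                }
                int unique = 0, twice = 0, threePlus = 0;
                for (int c : counts.values()) {
                    if (c == 1) unique++;
                    else if (c == 2) twice++;
                    else threePlus++;
                }
                return new int[] { unique, twice, threePlus };
            }
        }
        ```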

        Raghu Angadi added a comment -

        Thanks. Maybe with access to the logs, this could be investigated better.

        Stu Hood added a comment -

        For comparison, our cluster also runs jobs frequently with new files across 8 nodes, and we haven't experienced this issue with Hadoop 0.15.0. The cluster has been up for 2 months now.

        Raghu Angadi added a comment -

        Thanks Christian, I have access to the logs. The cluster seems to be running an old version of trunk; can you get the svn revision? Also, the Namenode was recently restarted.

        Looks like there is another linked list attached to each datanode. metasave prints only the "recent invalidates". A loop in the Namenode moves the invalidated blocks from recent invalidates to the datanode list, so it is possible for a block to exist many more times in this list. This is most probably the reason.

        I think it is better to relieve the Namenode from throttling the deletion of blocks. In cases like this there seems to be quite a bit of penalty on Namenode memory, the most precious resource for HDFS. The Namenode could just ask the Datanode to delete anything it wants deleted, and the Datanode could throttle it; I think that would be more scalable. This would also remove the code related to management of throttling.

        Raghu Angadi added a comment -

        But for now, just changing the datanode list to a Set might be good enough.

        Raghu Angadi added a comment -

        > For comparison, our cluster also runs job frequently with new files across 8 nodes, and we haven't experienced this issue with Hadoop 0.15.0. The cluster has been up for 2 months now.

        I think a combination of a large heartbeat interval and bursts of deletions triggers this. What is dfs.heartbeat.interval set to?

        Raghu Angadi added a comment -

        Could you try the attached patch?

        You might get a conflict in FSNamesystem.java since it changes so often, but it is only a one-line change there.

        dhruba borthakur added a comment -

        I agree with Raghu that the throttle to delete blocks from a datanode could be done by the Datanode. Currently, the namenode does this throttling. See HADOOP-774 for more discussion on this topic.

        Christian Kunz added a comment -

        I applied Raghu's patch, restarted the nameserver, and will monitor its performance.

        Raghu Angadi added a comment -

        Another issue that contributes to this problem and is made worse by a large heartbeat:

        1. For each heartbeat, the Namenode asks a datanode to delete blocks only if the datanode does not need to transfer any blocks (for replication, rebalancing, etc.). I don't see any reason why the datanode cannot do both. If a datanode is asked to transfer one block each minute, it never deletes anything.
        2. Similarly, 'computeDatanodeWork()' does the same: it moves blocks from recentInvalidates to the datanode's list only if no transfers are scheduled on that node.
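        A minimal sketch of the starvation pattern in point 1, under the assumption that per-heartbeat work is either-or; the names are made up for illustration and do not reflect the actual Namenode code.

        ```java
        // Sketch of the starvation described in point 1 (illustrative names only):
        // if each heartbeat schedules transfers OR deletions but never both, a node
        // with a steady trickle of transfer work never gets a deletion command.
        public class HeartbeatWorkSketch {
            // Either-or policy: transfers always win, so deletions can starve.
            static String exclusivePolicy(int pendingTransfers, int pendingDeletes) {
                if (pendingTransfers > 0) return "TRANSFER";
                return pendingDeletes > 0 ? "DELETE" : "NONE";
            }

            // Combined policy: the same heartbeat can carry both kinds of work.
            static String combinedPolicy(int pendingTransfers, int pendingDeletes) {
                if (pendingTransfers > 0 && pendingDeletes > 0) return "BOTH";
                if (pendingTransfers > 0) return "TRANSFER";
                return pendingDeletes > 0 ? "DELETE" : "NONE";
            }
        }
        ```

        Under the exclusive policy, a node with one pending transfer and a thousand pending deletes still only transfers; the combined policy lets the same heartbeat do both.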
        Raghu Angadi added a comment -


        Does anyone know why invalidated blocks are maintained in two places: FSNamesystem.recentInvalidates and DatanodeDescriptor.invalidatedSet? I am not sure if there is a problem if we just remove recentInvalidates.

        Raghu Angadi added a comment - - edited

        So far the cluster looks ok.

        Filed three follow-up jiras: HADOOP-2694, HADOOP-2695, and HADOOP-2696.

        Note that the eventual patch for this jira will make the limit (on the number of blocks the Namenode asks a datanode to delete) proportional to the heartbeat interval, maybe something like max(100, 20*interval_in_sec). The test patch did not include this since we wanted to confirm the underlying problem.

        Raghu Angadi added a comment -

        The namenode is symptom-free so far (~ one day).

        Attached the patch intended for inclusion. The only difference from the previous patch is that the limit on the number of blocks removed in each heartbeat is changed to Max(100, 20*heartbeat) instead of 100.
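        The Max(100, 20*heartbeat) limit can be sketched as below. The class and method names are illustrative; the interval is assumed to be stored in milliseconds and is converted to seconds before multiplying, since multiplying the raw millisecond value would inflate the limit 1000-fold (the slip Dhruba points out in the next comment).

        ```java
        // Sketch of the proposed limit: proportional to the heartbeat interval, with
        // a floor of 100 blocks per heartbeat. The interval is assumed to be stored
        // in milliseconds, so convert to seconds first.
        public class InvalidateLimitSketch {
            static final int BLOCK_INVALIDATE_CHUNK = 100; // assumed default floor

            static int blockInvalidateLimit(long heartbeatIntervalMs) {
                long intervalSec = heartbeatIntervalMs / 1000;
                return (int) Math.max(BLOCK_INVALIDATE_CHUNK, 20 * intervalSec);
            }
        }
        ```

        With these assumptions, a 3-second heartbeat keeps the floor of 100, while a 60-second heartbeat (as on the affected cluster) allows 1200 blocks per heartbeat.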

        dhruba borthakur added a comment -

        FSNamesystem.blockInvalidateLimit should be initialized to FSConstants.BLOCK_INVALIDATE_CHUNK.

        The heartbeatInterval is in milliseconds. Do you really want the blockInvalidateLimit to be set to 20 times the number of milliseconds in a heartbeat interval? That seems too high.

        Raghu Angadi added a comment -

        Thanks Dhruba,

        Updated patch has both suggested changes.

        > The heartbeatInterval is in milliseconds. Do you really want the blockInvalidateLimit to be set to 20 times the number of milliseconds in a heartbeat interval? That seems to high.
        oops! good that you caught it. It

        dhruba borthakur added a comment -

        +1. code looks good.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12373997/HADOOP-2576.patch
        against trunk revision 614721.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1677/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1677/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1677/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1677/console

        This message is automatically generated.

        Raghu Angadi added a comment - - edited

        The earlier patch did add a findbugs warning, but newPatchFindbugsWarnings.html shows the wrong one.

        Note that this patch does not actually change any locking. It replaced a constant with an int that is initialized once in the constructor. Fortunately there are some indirect ways of letting findbugs know.

        Updated patch attached.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12374076/HADOOP-2576.patch
        against trunk revision 614721.

        @author +1. The patch does not contain any @author tags.

        javadoc -1. The javadoc tool appears to have generated messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to cause Findbugs to fail.

        core tests -1. The patch failed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1681/testReport/
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1681/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1681/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        Looks like trunk was briefly in a broken state, and the above build was affected by it; basic compilation failed.

        Nigel Daley added a comment -

        I just ran these checks against http://issues.apache.org/jira/secure/attachment/12374076/HADOOP-2576.patch

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        Raghu, if you get "ant test" to pass, then please commit.

        Raghu Angadi added a comment -

        With all the juggling, I ended up introducing a whitespace change in the last patch. Now cleaned up.

        Raghu Angadi added a comment -

        The difference between the patch Nigel used and the latest patch is just the removal of the following hunk:

        @@ -311,7 +314,7 @@
            * Initializes some of the members from configuration
            */
           private void setConfigurationParameters(Configuration conf) 
        -                                          throws IOException {
        +                                                       throws IOException {
             fsNamesystemObject = this;
             try {
               fsOwner = UnixUserGroupInformation.login(conf);
        
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12374104/HADOOP-2576.patch
        against trunk revision 614721.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1684/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1684/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1684/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1684/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        I think the test failure is a real bug, but it is totally unrelated to the patch. It happened on the client side while writing/closing a file.

        dhruba borthakur added a comment -

        +1 on committing the patch. The test failure is not related to this patch.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12374104/HADOOP-2576.patch
        against trunk revision 614721.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1688/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1688/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1688/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1688/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        I just committed this.

        Hudson added a comment -

        Integrated in Hadoop-trunk #381 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/381/ )

          People

          • Assignee: Raghu Angadi
          • Reporter: Christian Kunz
          • Votes: 0
          • Watchers: 2

          Dates

          • Created:
          • Updated:
          • Resolved: