Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-7503

Namenode restart after large deletions can cause slow processReport (due to logging)

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      If a large directory is deleted and namenode is immediately restarted, there are a lot of blocks that do not belong to any file. This results in a log:

      2014-11-08 03:11:45,584 INFO BlockStateChange (BlockManager.java:processReport(1901)) - BLOCK* processReport: blk_1074250282_509532 on 172.31.44.17:1019 size 6 does not belong to any file.
      

      This log is printed within FSNamsystem lock. This can cause namenode to take long time in coming out of safemode.

      One solution is to downgrade the logging level.

      1. HDFS-7503.branch-1.02.patch
        3 kB
        Arpit Agarwal
      2. HDFS-7503.branch-1.patch
        0.8 kB
        Arpit Agarwal
      3. HDFS-7503.trunk.01.patch
        0.9 kB
        Arpit Agarwal
      4. HDFS-7503.trunk.02.patch
        3 kB
        Arpit Agarwal

        Issue Links

          Activity

          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12686132/HDFS-7503.trunk.01.patch
          against trunk revision 03867eb.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javac. The applied patch generated 1247 javac compiler warnings (more than the trunk's current 1215 warnings).

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs-nfs:

          org.apache.hadoop.hdfs.nfs.TTests
          org.apache.hadoop.hdfs.nfs.nfs3.TeTests
          org.apache.hadoop.hdfs.nfs.nfs3.TestClientAccessTests
          org.apache.hadoop.hdfs.nfs.nfs3.TestNfs3HTests
          org.apache.hadoop.hdfs.nfs.nfs3.TestRpcPrTests
          org.apache.hadoop.hdfs.nfs.nfs3.TTests
          org.apache.hadoop.hdfs.nfs.nfs3.TestOpenFilTests

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8983//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8983//artifact/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8983//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686132/HDFS-7503.trunk.01.patch against trunk revision 03867eb. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javac . The applied patch generated 1247 javac compiler warnings (more than the trunk's current 1215 warnings). +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.hdfs.nfs.TTests org.apache.hadoop.hdfs.nfs.nfs3.TeTests org.apache.hadoop.hdfs.nfs.nfs3.TestClientAccessTests org.apache.hadoop.hdfs.nfs.nfs3.TestNfs3HTests org.apache.hadoop.hdfs.nfs.nfs3.TestRpcPrTests org.apache.hadoop.hdfs.nfs.nfs3.TTests org.apache.hadoop.hdfs.nfs.nfs3.TestOpenFilTests Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8983//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8983//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8983//console This message is automatically generated.
          Hide
          sureshms Suresh Srinivas added a comment -

          Arpit Agarwal, can you please describe the approach you are taking to solve this issue?

          Show
          sureshms Suresh Srinivas added a comment - Arpit Agarwal , can you please describe the approach you are taking to solve this issue?
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Hi Suresh Srinivas, sure. The fix is pretty trivial. I just downgraded the logging level from info to trace (likely too noisy even for debug).

          Show
          arpitagarwal Arpit Agarwal added a comment - Hi Suresh Srinivas , sure. The fix is pretty trivial. I just downgraded the logging level from info to trace (likely too noisy even for debug).
          Hide
          sureshms Suresh Srinivas added a comment -

          One concern I had was if this information is useful for debugging. During startup, namenode delays block deletion which is the place where this debug log was useful, albeit verbose. Given that, I am +1 on this patch.

          Show
          sureshms Suresh Srinivas added a comment - One concern I had was if this information is useful for debugging. During startup, namenode delays block deletion which is the place where this debug log was useful, albeit verbose. Given that, I am +1 on this patch.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thanks for the review Suresh.

          One concern I had was if this information is useful for debugging.

          Here is an alternate patch for trunk that logs invalidated blocks outside theNameSystem write lock.

          Separately perhaps it is time to use async appenders for blockLog/blockStateChangeLog.

          Show
          arpitagarwal Arpit Agarwal added a comment - Thanks for the review Suresh. One concern I had was if this information is useful for debugging. Here is an alternate patch for trunk that logs invalidated blocks outside theNameSystem write lock. Separately perhaps it is time to use async appenders for blockLog/blockStateChangeLog.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12686171/HDFS-7503.trunk.02.patch
          against trunk revision 2ed90a5.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 40 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8987//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8987//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8987//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686171/HDFS-7503.trunk.02.patch against trunk revision 2ed90a5. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 40 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8987//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8987//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8987//console This message is automatically generated.
          Hide
          cnauroth Chris Nauroth added a comment -

          +1 for patch v2. I also prefer that we preserve the current log level and defer it until after releasing the lock.

          I don't think new tests are required for this patch. The Findbugs warnings are unrelated and documented elsewhere. The test failure is unrelated, and I couldn't repro it locally.

          Thank you for the patch, Arpit.

          Show
          cnauroth Chris Nauroth added a comment - +1 for patch v2. I also prefer that we preserve the current log level and defer it until after releasing the lock. I don't think new tests are required for this patch. The Findbugs warnings are unrelated and documented elsewhere. The test failure is unrelated, and I couldn't repro it locally. Thank you for the patch, Arpit.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thank you Chris! I just realized the v1 patch had failed to wrap the logging call in isTraceEnabled.

          Attaching v2 patch for branch-1. I will hold off on committing the v2 trunk patch until tonight in case Suresh has any comments.

          Show
          arpitagarwal Arpit Agarwal added a comment - Thank you Chris! I just realized the v1 patch had failed to wrap the logging call in isTraceEnabled . Attaching v2 patch for branch-1. I will hold off on committing the v2 trunk patch until tonight in case Suresh has any comments.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12686384/HDFS-7503.branch-1.02.patch
          against trunk revision 59cb8b9.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8997//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686384/HDFS-7503.branch-1.02.patch against trunk revision 59cb8b9. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8997//console This message is automatically generated.
          Hide
          cnauroth Chris Nauroth added a comment -

          +1 for the branch-1 v2 patch too. Thanks again, Arpit.

          Show
          cnauroth Chris Nauroth added a comment - +1 for the branch-1 v2 patch too. Thanks again, Arpit.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #35 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/35/)
          HDFS-7503. Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #35 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/35/ ) HDFS-7503 . Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1969 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/)
          HDFS-7503. Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1969 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/ ) HDFS-7503 . Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #39 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/39/)
          HDFS-7503. Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #39 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/39/ ) HDFS-7503 . Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1989 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1989/)
          HDFS-7503. Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1989 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1989/ ) HDFS-7503 . Namenode restart after large deletions can cause slow processReport (Arpit Agarwal) (arp: rev 390642acf35f3d599271617d30ba26c2f6406fc1) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thanks for the reviews Suresh and Chris!

          I committed this to branch-1, trunk, branch-2 and branch-2.6.

          -1 tests included. The patch doesn't appear to include any new or modified tests.

          This is a logging change.

          Show
          arpitagarwal Arpit Agarwal added a comment - Thanks for the reviews Suresh and Chris! I committed this to branch-1, trunk, branch-2 and branch-2.6. -1 tests included. The patch doesn't appear to include any new or modified tests. This is a logging change.
          Hide
          sureshms Suresh Srinivas added a comment -

          A minor nit:

          +        blockLog.info("BLOCK* processReport: " + b + " on " + node
          +                          + " size " + b.getNumBytes()
          +                          + " does not belong to any file");
          

          We can print the repetitive node information and the information that block does not belong to any file outside the for loop.

          Show
          sureshms Suresh Srinivas added a comment - A minor nit: + blockLog.info( "BLOCK* processReport: " + b + " on " + node + + " size " + b.getNumBytes() + + " does not belong to any file" ); We can print the repetitive node information and the information that block does not belong to any file outside the for loop.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Hi Suresh, that may cause the output from multiple threads to get interleaved since we're not synchronized any more and make it difficult to parse.

          Show
          arpitagarwal Arpit Agarwal added a comment - Hi Suresh, that may cause the output from multiple threads to get interleaved since we're not synchronized any more and make it difficult to parse.

            People

            • Assignee:
              arpitagarwal Arpit Agarwal
              Reporter:
              arpitagarwal Arpit Agarwal
            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development