Hadoop Common / HADOOP-3002

HDFS should not remove blocks while in safemode.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.2
    • Component/s: None
    • Labels:
      None

      Description

      I noticed that data-nodes are removing blocks during a rather prolonged distributed upgrade when the name-node is in safe mode.
      This happened on my experimental cluster with accelerated block report rate.
      By definition in safe mode the name-node should not

      • accept client requests to change the namespace state, and
      • schedule block replications and/or block removal for the data-nodes.

      We don't want any unnecessary replications until all blocks are reported during startup.
      We also don't want to remove blocks if safe mode is entered manually.
      In heartbeat processing we explicitly verify that the name-node is in safe mode and do not return any block commands to the data-nodes.
      Block reports can also return block commands, which should be banned during safe mode.
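
The rule stated above can be illustrated with a minimal sketch. It is not Hadoop code: the class name `SafeModeGuardSketch`, its methods, and the string form of the commands are all hypothetical, chosen only to show both reply paths (heartbeat and block report) withholding block commands while safe mode is on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of a name-node that withholds block commands in safe mode. */
class SafeModeGuardSketch {
  private boolean safeMode = true;  // assumption: the model starts in safe mode
  private final List<String> pendingBlockCommands = new ArrayList<>();

  void setSafeMode(boolean on) { safeMode = on; }

  /** Queue a replication or deletion request (illustrative string form). */
  void schedule(String command) { pendingBlockCommands.add(command); }

  /** Reply to a heartbeat: no block commands while safe mode is on. */
  List<String> handleHeartbeat() { return drainUnlessSafeMode(); }

  /** Reply to a block report: the same rule applies. */
  List<String> handleBlockReport() { return drainUnlessSafeMode(); }

  private List<String> drainUnlessSafeMode() {
    if (safeMode) {
      return Collections.emptyList();  // queued work stays queued
    }
    List<String> out = new ArrayList<>(pendingBlockCommands);
    pendingBlockCommands.clear();
    return out;
  }
}
```

In this model, a deletion scheduled during safe mode is returned only by the first reply after `setSafeMode(false)`; nothing is lost, it is merely postponed.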

      1. DelBlocksInSafeMode.patch
        8 kB
        Konstantin Shvachko
      2. DelBlocksInSafeMode.patch
        11 kB
        Konstantin Shvachko
      3. DelBlocksInSafeMode-017.patch
        7 kB
        Konstantin Shvachko
      4. DelBlocksInSafeMode-018.patch
        9 kB
        Konstantin Shvachko
      5. DelBlocksInSafeMode-018.patch
        10 kB
        Konstantin Shvachko

          Activity

          Konstantin Shvachko created issue -
          Sameer Paranjpye made changes -
          Field Original Value New Value
          Fix Version/s 0.16.2 [ 12313051 ]
          Nigel Daley made changes -
          Priority Critical [ 2 ] Blocker [ 1 ]
          Sameer Paranjpye made changes -
          Priority Blocker [ 1 ] Major [ 3 ]
          Sameer Paranjpye made changes -
          Affects Version/s 0.16.0 [ 12312740 ]
          Fix Version/s 0.17.0 [ 12312913 ]
          Lohit Vijayarenu made changes -
          Fix Version/s 0.19.0 [ 12313211 ]
          Robert Chansler added a comment -

          If we fix this, 3677 can be demoted.

          Robert Chansler made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          Robert Chansler made changes -
          Assignee Konstantin Shvachko [ shv ]
          Konstantin Shvachko added a comment -

          This patch postpones removal of blocks until safe mode is off.
          The main cause of the deletions was that block report processing removed blocks that do not belong
          to any file directly, bypassing the regular mechanism that first adds invalid blocks to recentInvalidateSets
          and then schedules them for deletion via heartbeats.

          1. I changed block report processing to just place invalid blocks into recentInvalidateSets
            and not return any commands to data-nodes. This also optimizes processReport(), because it
            no longer scans the block report a second time looking for invalid blocks.
          2. I changed heartbeat processing because it never checked safe mode and would schedule
            replications or deletions if there were any in the pending lists.
            During startup the pending lists are empty, but in manual safe mode that may not be the case.
            So now the only commands allowed while safe mode is on are requests for block reports
            and distributed upgrade commands.
            It is not clear why some code in handleHeartbeat() is inside the synchronized section and some is not,
            so I placed everything inside.
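
The two-step flow described in this comment (queue invalid blocks during the block report, hand out deletions later via heartbeats) can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of the patch: only the name `recentInvalidateSets` comes from the comment, while the class `DeferredInvalidationSketch`, the method signatures, the `limit` parameter, and the use of plain `long` block ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model: block reports only queue invalid blocks; heartbeats delete them later. */
class DeferredInvalidationSketch {
  private final Set<Long> knownBlocks = new HashSet<>();  // blocks that belong to some file
  private final Map<String, Set<Long>> recentInvalidateSets = new HashMap<>();
  private boolean safeMode = true;  // assumption: the model starts in safe mode

  void addKnownBlock(long blockId) { knownBlocks.add(blockId); }

  void setSafeMode(boolean on) { safeMode = on; }

  /** Block report: remember blocks that belong to no file; return no commands. */
  void processReport(String dataNode, long[] reportedBlocks) {
    for (long blockId : reportedBlocks) {
      if (!knownBlocks.contains(blockId)) {
        recentInvalidateSets
            .computeIfAbsent(dataNode, k -> new TreeSet<>())
            .add(blockId);
      }
    }
  }

  /** Heartbeat: release at most `limit` queued deletions, and none in safe mode. */
  List<Long> handleHeartbeat(String dataNode, int limit) {
    List<Long> toDelete = new ArrayList<>();
    Set<Long> queued = recentInvalidateSets.get(dataNode);
    if (safeMode || queued == null) {
      return toDelete;
    }
    Iterator<Long> it = queued.iterator();
    while (it.hasNext() && toDelete.size() < limit) {
      toDelete.add(it.next());
      it.remove();
    }
    if (queued.isEmpty()) {
      recentInvalidateSets.remove(dataNode);
    }
    return toDelete;
  }
}
```

Because the report path only records block ids, a single pass over the report is enough, and no deletion can reach a data-node until a heartbeat arrives after safe mode has been turned off.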
          Konstantin Shvachko made changes -
          Attachment DelBlocksInSafeMode.patch [ 12385265 ]
          Konstantin Shvachko made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.17.0 [ 12312913 ]
          Fix Version/s 0.19.0 [ 12313211 ]
          Fix Version/s 0.18.0 [ 12312972 ]
          Nigel Daley made changes -
          Fix Version/s 0.17.0 [ 12312913 ]
          Fix Version/s 0.17.2 [ 12313296 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385265/DelBlocksInSafeMode.patch
          against trunk revision 674442.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2799/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2799/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2799/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2799/console

          This message is automatically generated.

          Konstantin Shvachko made changes -
          Attachment DelBlocksInSafeMode-018.patch [ 12385446 ]
          dhruba borthakur added a comment -

          Hi Konstantin, I opened HADOOP-3709 to document the lock-hierarchy violation in processing heartbeats. The goal is to not acquire the global FSNamesystem lock to process every heartbeat. Maybe the patch you provide in this issue already fixes HADOOP-3709.

          dhruba borthakur made changes -
          Link This issue relates to HADOOP-3709 [ HADOOP-3709 ]
          Konstantin Shvachko added a comment -

          I reverted the changes that had been committed. The global lock leads to a potential deadlock.

          Thanks, Dhruba. I overlooked the global lock, which we did not have before. It was introduced in 0.18 by HADOOP-1985.
          I'll submit another patch.

          Konstantin Shvachko made changes -
          Link This issue is related to HADOOP-3677 [ HADOOP-3677 ]
          Konstantin Shvachko made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Konstantin Shvachko added a comment -

          This is a new patch, which does not change heartbeat processing.
          The global lock issue will be taken care of by HADOOP-3620.

          Konstantin Shvachko made changes -
          Attachment DelBlocksInSafeMode.patch [ 12385454 ]
          Konstantin Shvachko made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385454/DelBlocksInSafeMode.patch
          against trunk revision 674932.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2816/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2816/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2816/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2816/console

          This message is automatically generated.

          Konstantin Shvachko made changes -
          Attachment DelBlocksInSafeMode-018.patch [ 12385546 ]
          Attachment DelBlocksInSafeMode-017.patch [ 12385547 ]
          Konstantin Shvachko added a comment -

          I just committed this.

          Konstantin Shvachko made changes -
          Fix Version/s 0.18.0 [ 12312972 ]
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue relates to HADOOP-3804 [ HADOOP-3804 ]
          Owen O'Malley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/)
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]

            People

            • Assignee:
              Konstantin Shvachko
              Reporter:
              Konstantin Shvachko
            • Votes:
              0
              Watchers:
              0