Hadoop HDFS
  HDFS-1726

query method for what kind of safe mode the Namenode is in

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.24.0
    • Component/s: namenode
    • Labels:
      None
    • Tags:
      safe mode, safemode, startup

      Description

      If we could differentiate "startup safemode" from other kinds of safemode, it would be easier to do startup optimizations like HDFS-1295. Looking at FSNamesystem, this can be queried, but not with a single query, and the semantics are not reliable under future changes. Also, the FSNamesystem code itself uses more than one way internally to test for manual safe mode.

      The proposal is to create a status field and query method in FSNamesystem with the enum values

      {NOT_IN_SAFEMODE, SAFEMODE_STARTUP, SAFEMODE_EXTENSION, SAFEMODE_MANUAL}

      If in the future we add automatic fallback to safe mode, we would add value SAFEMODE_AUTOMATIC.

      This change will make it easier to do startup optimizations, and will also allow making the safemode management code in FSNamesystem simpler and more consistent.
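A minimal sketch of what the proposed status field and query method might look like. The four enum values come from the description above, and the enum name SafeModeState appears in the patch comment further down. The holder class SafeModeTracker and its method names are assumptions made for illustration and are not taken from the FSNamesystem source or the attached patch.

```java
// Enum values as listed in the proposal.
enum SafeModeState {
    NOT_IN_SAFEMODE,
    SAFEMODE_STARTUP,
    SAFEMODE_EXTENSION,
    SAFEMODE_MANUAL
}

// Hypothetical stand-in for the part of FSNamesystem that would own the field.
class SafeModeTracker {
    // Assumption: a freshly started NameNode begins in startup safe mode.
    private SafeModeState state = SafeModeState.SAFEMODE_STARTUP;

    // Single query that replaces the several ad hoc checks mentioned above.
    synchronized SafeModeState getSafeModeState() {
        return state;
    }

    synchronized void setSafeModeState(SafeModeState newState) {
        state = newState;
    }

    // Convenience check derived from the single status field.
    synchronized boolean isInSafeMode() {
        return state != SafeModeState.NOT_IN_SAFEMODE;
    }
}
```

With a single field, a caller interested only in startup behavior could compare getSafeModeState() against SAFEMODE_STARTUP instead of combining several internal flags.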

          Activity

          Matt Foley created issue -
          Aaron T. Myers made changes -
          Field Original Value New Value
          Link This issue is related to HDFS-1594 [ HDFS-1594 ]
          Aaron T. Myers added a comment -

          HDFS-1594 could make use of the proposed SAFEMODE_AUTOMATIC status.

          Konstantin Boudnik made changes -
          Link This issue blocks HDFS-1594 [ HDFS-1594 ]
          Matt Foley added a comment -

          Besides creating SafeModeState and a way to query it, this patch fixes two minor bugs found during development:

          • When entering manual mode via the SafeModeInfo() ctor, the code set "reached" = -1. This caused an incorrect getTurnOffTip() result, because "reached == -1" is supposed to mean safe mode is off. The value is not needed for manual mode detection, which relies on "extension = Integer.MAX_VALUE". Changed to set it to 0, consistent with being in safe mode (regardless of reason).
          • safeMode.incrementSafeBlockCount() and safeMode.decrementSafeBlockCount() called checkMode() on every invocation, regardless of whether they changed anything that checkMode() depends on. Changed to call checkMode() only if they succeed in changing the blockSafe count.
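The second fix can be illustrated with a simplified model. This is a sketch, not the code from the patch: the class SafeBlockCounter, the checkModeCalls counter, and the exact replication conditions are assumptions chosen only to show checkMode() being called when, and only when, the blockSafe count changes.

```java
// Simplified, hypothetical model of the safe-block bookkeeping.
class SafeBlockCounter {
    private final int safeReplication; // minimum replication for a block to count as safe
    private int blockSafe = 0;         // number of blocks currently considered safe
    int checkModeCalls = 0;            // instrumentation for this sketch only

    SafeBlockCounter(int safeReplication) {
        this.safeReplication = safeReplication;
    }

    // Assumption: a block becomes safe when its replication reaches safeReplication.
    synchronized void incrementSafeBlockCount(int replication) {
        if (replication == safeReplication) {
            blockSafe++;
            checkMode(); // only reached when blockSafe actually changed
        }
    }

    // Assumption: a block stops being safe when replication drops just below the minimum.
    synchronized void decrementSafeBlockCount(int replication) {
        if (replication == safeReplication - 1) {
            blockSafe--;
            checkMode(); // only reached when blockSafe actually changed
        }
    }

    synchronized int getBlockSafe() {
        return blockSafe;
    }

    // Stand-in for the real check; here it just counts invocations.
    private void checkMode() {
        checkModeCalls++;
    }
}
```

In this model, calls that leave blockSafe unchanged (for example, a replication count already above the minimum) no longer trigger checkMode().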

          Finally, I found some inconsistencies in the use of "synchronized" and opened ticket HDFS-1790 to clarify them.

          Matt Foley made changes -
          Attachment SafeModeState_v2.patch [ 12474681 ]
          Arun C Murthy made changes -
          Fix Version/s 0.24.0 [ 12317653 ]
          Fix Version/s 0.23.0 [ 12315571 ]
          Steve Loughran added a comment -

          It'd be nice to also have text for end users, which could be set when safe mode is entered manually. This would let the ops teams add text like "down for rack maintenance back at 18:00" for the web UI and exception text.
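One way this suggestion could be sketched is below. Everything here is hypothetical: the class ManualSafeModeNotice, its methods, and the base message string are illustrative assumptions and do not correspond to existing Hadoop APIs or messages.

```java
// Hypothetical holder for an operator-supplied note attached to manual safe mode.
class ManualSafeModeNotice {
    private String operatorMessage; // null when no note has been set

    // Called when an operator enters safe mode manually; the message is optional.
    synchronized void enterManualSafeMode(String message) {
        operatorMessage = message;
    }

    synchronized void leaveSafeMode() {
        operatorMessage = null;
    }

    // Text that a web UI or an exception message could include.
    synchronized String describe() {
        String base = "Name node is in safe mode."; // placeholder wording
        if (operatorMessage == null || operatorMessage.isEmpty()) {
            return base;
        }
        return base + " " + operatorMessage;
    }
}
```

The note is cleared on leaving safe mode so that a stale maintenance message is not shown the next time safe mode is entered.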

          Gavin made changes -
          Link This issue blocks HDFS-1594 [ HDFS-1594 ]
          Gavin made changes -
          Link This issue is depended upon by HDFS-1594 [ HDFS-1594 ]

            People

            • Assignee:
              Matt Foley
              Reporter:
              Matt Foley
            • Votes:
              0
              Watchers:
              6
