Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-1790

review use of synchronized methods in FSNamesystem and SafeModeInfo

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.23.0
    • None
    • namenode
    • None

    Description

      While analyzing SafeMode semantics for HDFS-1726, I noticed the following usage of synchronized methods in FSNamesystem that raised questions:

      1. incVolumeFailure() is synchronized on the FSNamesystem instance (monitor lock), for unclear reasons. Is this a left-over from the conversion to read/write lock in FSNamesystem?

      2. isInSafeMode() and isPopulatingReplQueues() are also synchronized on the FSNamesystem instance (monitor lock), but the SafeModeInfo methods they call (safeMode.isInSafeMode() and safeMode.isPopulatingReplQueues() are synchronized on the SafeModeInfo instance. Is synchronizing on the FSNamesystem instance necessary? What does it add? If it is necessary, should it be using the read/write lock?

      3. In SafeModeInfo, these methods are synchronized on the SafeModeInfo instance:

      • isOn
      • isPopulatingReplQueues
      • leave
      • initialize
      • canInitializeReplQueues
      • canLeave
      • setBlock
      • incrementSafeBlockCount
      • decrementSafeBlockCount
      • setManual
        but these are not:
      • enter
      • needEnter
      • checkMode
      • isManual
      • isConsistent

      Regarding these:

      • isOn() asserts isConsistent(), but is otherwise an atomic read operation.
      • isPopulatingReplQueues() is an atomic read. Does it need synchronization?
      • leave() is complex, but shouldn't it be synchronized with enter(), which is also a write operation? Yet enter() is unsynchronized.
      • initializeReplQueues() calls blockManager.processMisReplicatedBlocks(), which can take minutes to run. By synchronizing on the SafeModeInfo instance, it prevents essentially all of the other safeMode methods from running for the duration! Is this desirable or needed?
      • canInitializeReplQueues() is a read-only operation, although it does compare two read values. needEnter() compares four read values, and is unsynchronized. Does either need synchronization?
      • canLeave() is a compound operation, it's good that it is synchronized.
      • checkMode() is a big compound read/write operation. Doesn't it need synchronization if the other methods do?
        and so on.

      Attachments

        Activity

          People

            Unassigned Unassigned
            mattf Matthew Foley
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: