Hadoop HDFS / HDFS-4832

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.7, 2.1.0-beta, 3.0.0-alpha1
    • Fix Version/s: 2.1.0-beta, 0.23.9
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: This change makes the NameNode keep its internal replication queues and DataNode state up to date while in manual safe mode, so that metrics and the UI present current information in safe mode. The behavior during start-up safe mode is unchanged.

    Description

      Courtesy Karri VRK Reddy!

      1. The Namenode lost datanodes, causing missing blocks.
      2. The Namenode was put in safe mode.
      3. The Datanodes were restarted on the dead nodes.
      4. Waited a long time for the NN UI to reflect the recovered blocks.
      5. Forced the NN out of safe mode, and suddenly there were no more missing blocks.

      I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a "lost" datanode. The opposite case also has problems (i.e. a Datanode failing while the NN is in safemode doesn't lead to a missing blocks message).
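
As a rough illustration of why lowering dfs.namenode.heartbeat.recheck-interval makes a killed DN show up as "lost" quickly: to my understanding, the NameNode treats a datanode as dead after 2 × recheck-interval (milliseconds) plus 10 × dfs.heartbeat.interval (seconds). The sketch below only works through that arithmetic. The class and method names are hypothetical, and the 3-second heartbeat interval is an assumption (the stock default), since the report does not say what value was in use.

```java
public class HeartbeatExpiry {
    /**
     * Dead-node timeout in milliseconds (hypothetical helper, not a Hadoop API).
     *
     * @param recheckIntervalMs    dfs.namenode.heartbeat.recheck-interval, in milliseconds
     * @param heartbeatIntervalSec dfs.heartbeat.interval, in seconds
     */
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    public static void main(String[] args) {
        // Stock defaults (300000 ms recheck, 3 s heartbeat): 630000 ms, i.e. 10.5 minutes.
        System.out.println(expireIntervalMs(300_000L, 3L));
        // recheck-interval set to 1, heartbeat assumed to be 3 s: 30002 ms, about 30 seconds.
        System.out.println(expireIntervalMs(1L, 3L));
    }
}
```

Under these assumptions, a killed DN would be declared dead after roughly 30 seconds instead of 10.5 minutes.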

      Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
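
The symptom can be mimicked with a toy model. This is only a sketch under stated assumptions, not the NameNode code or the attached patch: ToyNameNode, its methods, and the block and datanode names are all hypothetical. The model keeps block locations current at all times but, when updateInSafeMode is false, stops refreshing the reported missing-block count while in safe mode, which is the behavior described in the steps above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SafeModeModel {
    /** Toy stand-in for a NameNode; not the real Hadoop class. */
    static class ToyNameNode {
        private final Map<String, Set<String>> blockToLiveNodes = new HashMap<>();
        private final boolean updateInSafeMode;
        private boolean safeMode = false;
        private int reportedMissing = 0;

        ToyNameNode(boolean updateInSafeMode) {
            this.updateInSafeMode = updateInSafeMode;
        }

        /** A datanode reports that it holds a replica of the block. */
        void addReplica(String block, String datanode) {
            blockToLiveNodes.computeIfAbsent(block, b -> new HashSet<>()).add(datanode);
            refresh();
        }

        /** A datanode is declared dead; all of its replicas disappear. */
        void datanodeLost(String datanode) {
            for (Set<String> nodes : blockToLiveNodes.values()) {
                nodes.remove(datanode);
            }
            refresh();
        }

        void setSafeMode(boolean on) {
            safeMode = on;
            refresh();
        }

        /** Recount missing blocks, unless the count is frozen by safe mode. */
        private void refresh() {
            if (safeMode && !updateInSafeMode) {
                return; // reported count goes stale here
            }
            int missing = 0;
            for (Set<String> nodes : blockToLiveNodes.values()) {
                if (nodes.isEmpty()) {
                    missing++;
                }
            }
            reportedMissing = missing;
        }

        int reportedMissing() {
            return reportedMissing;
        }
    }

    /** Returns {count reported while in safe mode, count reported after leaving it}. */
    static int[] scenario(boolean updateInSafeMode) {
        ToyNameNode nn = new ToyNameNode(updateInSafeMode);
        nn.addReplica("blk_1", "dn1");
        nn.addReplica("blk_2", "dn1");
        nn.addReplica("blk_2", "dn2");
        nn.datanodeLost("dn1");         // step 1: blk_1 has no replicas left
        nn.setSafeMode(true);           // step 2
        nn.addReplica("blk_1", "dn1");  // step 3: the datanode rejoins
        int during = nn.reportedMissing();
        nn.setSafeMode(false);          // step 5
        int after = nn.reportedMissing();
        return new int[] {during, after};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(scenario(false))); // [1, 0]: stale while in safe mode
        System.out.println(Arrays.toString(scenario(true)));  // [0, 0]: count tracks the rejoin
    }
}
```

With the count frozen, an admin watching the model in safe mode would still see one missing block after the datanode has rejoined, and would only learn otherwise by leaving safe mode.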

      Attachments

        1. HDFS-4832.patch
          0.8 kB
          Ravi Prakash
        2. HDFS-4832.patch
          7 kB
          Ravi Prakash
        3. HDFS-4832.patch
          7 kB
          Ravi Prakash
        4. HDFS-4832.patch
          7 kB
          Ravi Prakash
        5. HDFS-4832.patch
          8 kB
          Ravi Prakash
        6. HDFS-4832.patch
          8 kB
          Ravi Prakash
        7. HDFS-4832.branch-0.23.patch
          7 kB
          Ravi Prakash

            People

              Assignee: Ravi Prakash (raviprak)
              Reporter: Ravi Prakash (raviprak)