Hadoop HDFS
HDFS-3599

Better expose when under-construction files are preventing DN decommission

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: datanode, namenode
    • Labels: None

      Description

      Filing on behalf of Konstantin Olchanski:

      I have been trying to decommission a data node, but the process
      stalled. I followed the correct instructions, observed my node
      listed in "Decommissioning Nodes", watched "Under Replicated Blocks"
      decrease, etc. But the count went down to "1" and the decommission
      process stalled. There was no visible activity anywhere and nothing
      was happening (perhaps something complained in a hidden log file
      somewhere, but I did not look).

      It turns out that I had some files stuck in "OPENFORWRITE" mode,
      as reported by "hdfs fsck / -openforwrite -files -blocks -locations -racks":

      /users/trinat/data/.fuse_hidden0000177e00000002 0 bytes, 0 block(s), OPENFORWRITE:  OK
      /users/trinat/data/.fuse_hidden0000178d00000003 0 bytes, 0 block(s), OPENFORWRITE:  OK
      /users/trinat/data/.fuse_hidden00001da300000004 0 bytes, 1 block(s), OPENFORWRITE:  OK
      0. BP-88378204-142.90.119.126-1340494203431:blk_6980480609696383665_20259{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[142.90.111.72:50010|RBW], ReplicaUnderConstruction[142.90.119.162:50010|RBW], ReplicaUnderConstruction[142.90.119.126:50010|RBW]]} len=0 repl=3 [/detfac/142.90.111.72:50010, /isac2/142.90.119.162:50010, /isac2/142.90.119.126:50010]
      

      After I deleted those files, the decommission process completed successfully.
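      The cleanup Konstantin describes can be scripted. A minimal sketch, assuming
      the fsck report has been captured to a file; the sample report below is
      hypothetical, modeled on the output above, and the filenames in it are
      illustrative only:

      ```shell
      # Sketch: find files stuck open-for-write from a saved fsck report.
      # Assumes the report was captured with something like:
      #   hdfs fsck / -openforwrite -files -blocks -locations -racks > fsck.out
      # Hypothetical sample report, modeled on the output above:
      cat > fsck.out <<'EOF'
      /users/trinat/data/.fuse_hidden0000177e00000002 0 bytes, 0 block(s), OPENFORWRITE:  OK
      /users/trinat/data/closed_file 42 bytes, 1 block(s):  OK
      EOF

      # List the paths still flagged open-for-write.
      awk '/OPENFORWRITE/ {print $1}' fsck.out
      # Each listed path could then be deleted (hdfs dfs -rm <path>) or, where
      # available, closed via "hdfs debug recoverLease -path <path>" instead of
      # being deleted outright.
      ```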

      Perhaps one can add some visible indication somewhere on the HDFS status web page
      that the decommission process is stalled and maybe report why it is stalled?

      Maybe the number of "OPENFORWRITE" files should be listed on the status page
      next to the "Number of Under-Replicated Blocks"? (Since I know that nobody is writing
      to my HDFS, the non-zero count would give me a clue that something is wrong).

        Activity

        Ming Ma added a comment - It seems https://issues.apache.org/jira/browse/HDFS-5579 has fixed it.
        Todd Lipcon created issue

          People

          • Assignee:
            Unassigned
          • Reporter:
            Todd Lipcon
          • Votes:
            0
          • Watchers:
            9
