Hadoop HDFS
HDFS-4754

Add an API in the namenode to mark a datanode as stale



    • Type: Improvement
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client, namenode


      HDFS has supported stale-datanode detection since HDFS-3703, based on a timeout that defaults to 30s.
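As a rough illustration of the heartbeat-based detection mentioned above, the check boils down to comparing the time since the last heartbeat against the configured interval. This is a hypothetical sketch, not actual HDFS code; the class and method names are made up, and only the 30s default comes from the issue text.

```java
// Illustrative sketch of heartbeat-based staleness detection.
// StaleCheck and isStale are hypothetical names, not HDFS internals.
public class StaleCheck {
    // Default stale interval: 30s, as described in HDFS-3703.
    static final long STALE_INTERVAL_MS = 30_000L;

    // A datanode is considered stale once no heartbeat has been
    // received for longer than the configured interval.
    static boolean isStale(long lastHeartbeatMs, long nowMs) {
        return nowMs - lastHeartbeatMs > STALE_INTERVAL_MS;
    }
}
```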

      There are two reasons to add an API to mark a node as stale even if the timeout is not yet reached:
      1) ZooKeeper can detect that a client is dead at any moment, so for HBase we sometimes start the recovery before the node is marked stale, even with reasonable settings (e.g. stale timeout: 20s; HBase ZK timeout: 30s).
      2) Some third parties can detect that a node is dead before the timeout, saving us the cost of retrying. An example of such hardware is Arista, presented by tsuna at http://tsunanet.net/~tsuna/fsf-hbase-meetup-april13.pdf and confirmed in HBASE-6290.

      As usual, even if the node is dead it can come back before the 10-minute limit. So I would propose to make the API time-bounded. The API would be

      namenode.markStale(String ipAddress, int port, long durationInMs);

      After durationInMs elapses, the namenode would again rely solely on heartbeats to decide whether the node is stale.
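The namenode side of the proposal could be sketched as a registry of explicit stale marks with expiry timestamps, consulted alongside the existing heartbeat check. Everything below is an assumption for illustration (the class, field, and helper names are invented; only the markStale signature and the expiry semantics come from the issue); the current time is passed in explicitly to keep the sketch deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of how the namenode could honor the proposed
// markStale(ipAddress, port, durationInMs) call. Not actual HDFS code.
public class StaleRegistry {
    // Maps "ip:port" -> time (ms) at which the manual stale mark expires.
    private final Map<String, Long> staleUntil = new ConcurrentHashMap<>();

    // The proposed API: force the node to be treated as stale for
    // durationInMs, starting from nowMs.
    public void markStale(String ipAddress, int port, long durationInMs, long nowMs) {
        staleUntil.put(ipAddress + ":" + port, nowMs + durationInMs);
    }

    // Consulted by the staleness logic: a node is stale if it was
    // explicitly marked and the mark has not yet expired. After expiry,
    // the namenode falls back to heartbeat-based detection alone.
    public boolean isMarkedStale(String ipAddress, int port, long nowMs) {
        Long until = staleUntil.get(ipAddress + ":" + port);
        return until != null && nowMs < until;
    }
}
```

Keeping the mark time-bounded means a wrongly marked (or recovered) node automatically reverts to normal heartbeat handling with no second RPC required.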


      If there are no objections, and if nobody on the HDFS dev team has time to work on it, I will give it a try for branches 2 & 3.


        1. 4754.v1.patch
          88 kB
          Nicolas Liochon
        2. 4754.v2.patch
          91 kB
          Nicolas Liochon
        3. 4754.v4.patch
          26 kB
          Nicolas Liochon
        4. 4754.v4.patch
          26 kB
          Nicolas Liochon




              Assignee: nkeywal Nicolas Liochon
              Reporter: nkeywal Nicolas Liochon
              Votes: 0
              Watchers: 13