Hadoop HDFS: HDFS-3703

Decrease the datanode failure detection time

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.3, 2.0.0-alpha, 3.0.0
    • Fix Version/s: 1.1.0, 2.0.3-alpha
    • Component/s: datanode, namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This jira adds a new DataNode state called "stale" at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout set by the configuration parameter "dfs.namenode.stale.datanode.interval", in seconds (default value is 30 seconds). When returning block locations for reads, the NameNode places stale DataNodes last, so a stale DataNode is the last target a client reads from.

      This feature is turned *off* by default. To turn it on, set the HDFS configuration "dfs.namenode.check.stale.datanode" to true.
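      The ordering rule in the release note can be illustrated with a small, self-contained sketch. It is not the HDFS implementation: the `Replica` record, `isStale` and `sortStaleLast` are hypothetical names, the strict `>` comparison at the interval boundary is an assumption, and times are kept in milliseconds internally for convenience. It requires Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of the "stale DataNode" idea: a node whose last heartbeat
 * is older than a configured interval is treated as stale, and stale replicas are
 * moved to the end of the location list returned for a read. Names are illustrative.
 */
public class StaleNodeOrdering {

    /** Minimal stand-in for a replica location (hypothetical type). */
    record Replica(String host, long lastHeartbeatMillis) {}

    /** Assumption: a node is stale when its last heartbeat is strictly older than the interval. */
    static boolean isStale(Replica r, long nowMillis, long staleIntervalMillis) {
        return nowMillis - r.lastHeartbeatMillis() > staleIntervalMillis;
    }

    /**
     * Returns a copy with non-stale nodes first and stale nodes last.
     * List.sort is stable, so any prior order (e.g. by network distance)
     * is preserved inside each group.
     */
    static List<Replica> sortStaleLast(List<Replica> locations, long nowMillis, long staleIntervalMillis) {
        List<Replica> sorted = new ArrayList<>(locations);
        // Boolean sorts false before true, so non-stale replicas come first.
        sorted.sort(Comparator.comparing((Replica r) -> isStale(r, nowMillis, staleIntervalMillis)));
        return sorted;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        long staleInterval = 30_000L; // 30 s, the default quoted in the release note
        List<Replica> locations = List.of(
                new Replica("dn1", now - 45_000L), // silent for 45 s, so stale
                new Replica("dn2", now - 2_000L),
                new Replica("dn3", now - 5_000L));
        for (Replica r : sortStaleLast(locations, now, staleInterval)) {
            System.out.println(r.host() + (isStale(r, now, staleInterval) ? " (stale)" : ""));
        }
    }
}
```

      Running `main` prints dn2, dn3 and then dn1 (stale). Stale nodes are only deprioritized, not dropped, so a read can still fall back to them if they hold the only reachable replica.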

      Description

      By default, if a box dies, the namenode marks its datanode as dead only after 10 minutes 30 seconds. In the meantime, the namenode still proposes this datanode for block writes and replica reads. The same happens if the datanode process crashes: there is no shutdown hook to tell the namenode we're not there anymore.
      It is especially an issue with HBase. The HBase regionserver timeout in production is often 30s. So with these configs, when a box dies, HBase starts to recover after 30s while, for 10 minutes, the namenode still considers the blocks on that box as available. Beyond the write errors, this triggers a lot of missed reads:
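      The 10 minutes 30 seconds figure can be reproduced with a short calculation. This sketch assumes the dead-node timeout is 2 × the heartbeat recheck interval + 10 × the heartbeat interval, with the stock defaults of a 5-minute recheck interval (`dfs.namenode.heartbeat.recheck-interval` in Hadoop 2.x) and a 3-second `dfs.heartbeat.interval`; the method name `heartbeatExpireMillis` is illustrative.

```java
/**
 * Sketch of the dead-node timeout arithmetic, assuming
 * expire = 2 * recheckInterval + 10 * heartbeatInterval.
 */
public class DeadNodeTimeout {

    static long heartbeatExpireMillis(long recheckIntervalMillis, long heartbeatIntervalSeconds) {
        return 2 * recheckIntervalMillis + 10 * 1000 * heartbeatIntervalSeconds;
    }

    public static void main(String[] args) {
        long recheckMillis = 5 * 60 * 1000L; // assumed default recheck interval: 300,000 ms
        long heartbeatSeconds = 3L;          // assumed default heartbeat interval: 3 s
        long expire = heartbeatExpireMillis(recheckMillis, heartbeatSeconds);
        long minutes = expire / 60_000;
        long seconds = (expire % 60_000) / 1000;
        System.out.println(expire + " ms = " + minutes + " min " + seconds + " s");
    }
}
```

      Under these assumptions the timeout is 630,000 ms, about 21 times the 30s HBase regionserver timeout mentioned below.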

      • during the recovery, HBase needs to read the blocks used on the dead box (the ones in the 'HBase Write-Ahead-Log')
      • after the recovery, reading these data blocks (the 'HBase region') will fail 33% of the time with the default number of replicas (3), slowing down data access, especially when the errors are socket timeouts (i.e. around 60s most of the time).
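      The 33% figure follows from simple arithmetic if one assumes that, with 3 replicas and 1 of them on the dead box, the client's first choice is equally likely to be any replica. The sketch below also applies the roughly 60s socket timeout from the text to estimate the average extra latency per read; the uniform-choice assumption and the method name are illustrative, not a description of how the namenode actually orders locations.

```java
/**
 * Rough arithmetic behind the "33% of the time" claim, assuming the first
 * replica a reader tries is chosen uniformly among all replicas.
 */
public class FailedFirstReadRate {

    /** Fraction of reads whose first choice is a dead replica. */
    static double firstChoiceFailureRate(int replication, int deadReplicas) {
        return (double) deadReplicas / replication;
    }

    public static void main(String[] args) {
        double rate = firstChoiceFailureRate(3, 1);
        System.out.printf("%.1f%% of reads try the dead replica first%n", rate * 100);
        // Assume each such read waits about 60 s for a socket timeout before falling back.
        double avgExtraSeconds = rate * 60.0;
        System.out.printf("average extra latency per read: %.0f s%n", avgExtraSeconds);
    }
}
```

      Placing the dead (stale) node last in the returned locations, as the release note describes, drives the first-choice failure rate toward zero as long as another replica is healthy.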

      Overall, it would be ideal if the HDFS failure-detection timeouts could be set below the HBase ones.
      As a side note, HBase relies on ZooKeeper to detect regionserver issues.

      1. HDFS-3703-branch-1.1-read-only.patch
        13 kB
        Jing Zhao
      2. HDFS-3703-branch-1.1-read-only.patch
        13 kB
        Jing Zhao
      3. HDFS-3703-trunk-read-only.patch
        22 kB
        Jing Zhao
      4. 3703-hadoop-1.0.txt
        15 kB
        Ted Yu
      5. HDFS-3703-trunk-read-only.patch
        22 kB
        Jing Zhao
      6. HDFS-3703-trunk-read-only.patch
        18 kB
        Jing Zhao
      7. HDFS-3703-trunk-read-only.patch
        18 kB
        Jing Zhao
      8. HDFS-3703-trunk-read-only.patch
        18 kB
        Jing Zhao
      9. HDFS-3703-trunk-read-only.patch
        16 kB
        Jing Zhao
      10. HDFS-3703-trunk-read-only.patch
        16 kB
        Jing Zhao
      11. HDFS-3703-trunk-with-write.patch
        18 kB
        Jing Zhao
      12. HDFS-3703-branch2.patch
        18 kB
        Nicolas Liochon
      13. HDFS-3703.patch
        16 kB
        Jing Zhao


          Activity

          Nicolas Liochon created issue -
          Nicolas Liochon made changes -
          Field Original Value New Value
          Link This issue relates to HBASE-5843 [ HBASE-5843 ]
          Suresh Srinivas made changes -
          Assignee Suresh Srinivas [ sureshms ]
          Jeff Hammerbacher made changes -
          Link This issue is related to HDFS-1599 [ HDFS-1599 ]
          Jing Zhao made changes -
          Attachment HDFS-3703.patch [ 12542700 ]
          Nicolas Liochon made changes -
          Attachment HDFS-3703-branch2.patch [ 12543887 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-with-write.patch [ 12544259 ]
          Nicolas Liochon made changes -
          Link This issue is related to HBASE-6751 [ HBASE-6751 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-read-only.patch [ 12544497 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-read-only.patch [ 12544505 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-read-only.patch [ 12544595 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-read-only.patch [ 12544677 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Suresh Srinivas made changes -
          Assignee Suresh Srinivas [ sureshms ] Jing Zhao [ jingzhao ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-read-only.patch [ 12544728 ]
          Ted Yu made changes -
          Fix Version/s 1.0.4 [ 12322463 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-read-only.patch [ 12544886 ]
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ted Yu made changes -
          Attachment 3703-hadoop-1.0.txt [ 12544896 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-trunk-read-only.patch [ 12544897 ]
          Jing Zhao made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 3.0.0 [ 12320356 ]
          Fix Version/s 1.0.4 [ 12322463 ]
          Suresh Srinivas made changes -
          Fix Version/s 3.0.0 [ 12320356 ]
          Suresh Srinivas made changes -
          Release Note This jira adds a new DataNode state called "stale" at the NameNode. DataNodes are marked as stale if it does not send heartbeat message to NameNode within the timeout configured using the configuration parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). NameNode picks a stale datanode as the last target to read from when returning block locations for reads.

          This feature is by default turned * off *. To turn on the feature, set the HDFS configuration "dfs.namenode.check.stale.datanode" to true.
          Suresh Srinivas made changes -
          Fix Version/s 2.0.3-alpha [ 12323274 ]
          Nicolas Liochon made changes -
          Link This issue relates to HDFS-3912 [ HDFS-3912 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-branch-1.1-read-only.patch [ 12545088 ]
          Jing Zhao made changes -
          Attachment HDFS-3703-branch-1.1-read-only.patch [ 12545098 ]
          Jing Zhao made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Tsz Wo Nicholas Sze made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 1.1.0 [ 12317959 ]
          Resolution Fixed [ 1 ]
          Suresh Srinivas made changes -
          Target Version/s 1.1.0 [ 12317959 ]
          Matt Foley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Harsh J made changes -
          Target Version/s 1.1.0 [ 12317959 ]
          Todd Lipcon made changes -
          Link This issue is related to HDFS-4350 [ HDFS-4350 ]
          Suresh Srinivas made changes -
          Fix Version/s 2.0.2-alpha [ 12322472 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Fix Version/s 2.0.3-alpha [ 12323274 ]
          Suresh Srinivas made changes -
          Fix Version/s 2.0.3-alpha [ 12323274 ]
          Fix Version/s 2.0.2-alpha [ 12322472 ]

            People

            • Assignee:
              Jing Zhao
              Reporter:
              Nicolas Liochon
            • Votes:
              0
              Watchers:
              29
