  1. Hadoop HDFS
  2. HDFS-13571

Deadnode detection


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.6.0, 3.0.2
    • Fix Version/s: 3.3.0
    • Component/s: hdfs-client
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: When a dead node blocks a DFSInputStream, dead node detection can find it and share this information with the other DFSInputStreams in the same DFSClient, so those DFSInputStreams will not read from the dead node and be blocked by it.

    Description

      Currently, information about dead datanodes is stored locally in each DFSInputStream, so it cannot be shared among the input streams of the same DFSClient. In our production environment, some datanodes die every day for various reasons. When this happens, the first input stream to be blocked detects the dead node but cannot share this information with the others in the same DFSClient, so the other input streams are still blocked by the dead node for some time, which can cause bad service latency.

      To eliminate this impact of dead datanodes, we designed a dead datanode detector, which detects dead datanodes in advance and shares this information among all the input streams in the same client. This improvement has been online for some months and works fine, so we have decided to port it to 3.0 (the versions used in our production environment are 2.4 and 2.6).

      I will do the porting work and upload the code later.
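
The sharing idea described above can be illustrated with a minimal sketch. This is not the code from the attached patch or the actual HDFS implementation: the class names `SharedDeadNodes` and `SketchInputStream`, and the use of plain strings as datanode identifiers, are assumptions made only for illustration. It covers only the sharing of dead node information between streams of one client, not the background probing that detects dead nodes in advance.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-client registry of datanodes believed to be dead. */
final class SharedDeadNodes {
    // Thread-safe set, since many streams of one client may touch it concurrently.
    private final Set<String> dead = ConcurrentHashMap.newKeySet();

    void markDead(String datanodeId) { dead.add(datanodeId); }

    void markAlive(String datanodeId) { dead.remove(datanodeId); }

    boolean isDead(String datanodeId) { return dead.contains(datanodeId); }
}

/** Hypothetical stand-in for an input stream that must pick a replica to read from. */
final class SketchInputStream {
    private final SharedDeadNodes shared;

    SketchInputStream(SharedDeadNodes shared) { this.shared = shared; }

    /** Returns the first replica not known to be dead, or null if all are dead. */
    String chooseReplica(List<String> replicas) {
        for (String replica : replicas) {
            if (!shared.isDead(replica)) {
                return replica;
            }
        }
        return null;
    }

    /** Called when a read from the given datanode fails or times out. */
    void reportFailure(String datanodeId) { shared.markDead(datanodeId); }
}
```

In this sketch, once one stream calls `reportFailure` for a datanode, every other stream built on the same `SharedDeadNodes` instance skips that datanode in `chooseReplica` instead of discovering the failure on its own.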

      Attachments

        1. DeadNodeDetectorDesign.pdf
          195 kB
          Lisheng Sun
        2. HDFS-13571-2.6.diff
          57 kB
          Gang Xie
        3. node status machine.png
          291 kB
          Lisheng Sun


          People

            Assignee: Lisheng Sun (leosun08)
            Reporter: Gang Xie (xiegang112)

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 3h
