Hadoop HDFS / HDFS-101

DFS write pipeline: DFSClient sometimes does not detect second datanode failure


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.20-append
    • Fix Version/s: 0.20.2, 0.20-append, 0.21.0
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When the first datanode's write to the second datanode fails or times out, DFSClient ends up marking the first datanode as the bad one and removes it from the pipeline. A similar problem exists on the DataNode as well; it is fixed in HADOOP-3339. From HADOOP-3339:

      "The main issue is that BlockReceiver thread (and DataStreamer in the case of DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty coarse control. We don't know what state the responder is in and interrupting has different effects depending on responder state. To fix this properly we need to redesign how we handle these interactions."
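      The misattribution comes down to which pipeline index the client blames. If the first datanode simply closes its socket, the client has no information beyond "index 0 went away". A minimal sketch of the alternative, assuming a hypothetical ack that carries one status per datanode and a first datanode that reports SUCCESS for itself and ERROR for its mirror before closing (the class, enum, and method names are illustrative, not taken from the HDFS-101 patches):

```java
import java.util.List;

public class PipelineErrorAttribution {

    /** Hypothetical per-datanode status; one entry per datanode in an ack. */
    public enum Status { SUCCESS, ERROR }

    /**
     * Picks the datanode to remove from the write pipeline.
     *
     * @param replies statuses ordered from the first datanode (closest to the
     *                client) downstream, or null if no ack could be read at all
     * @return index of the datanode to drop, or -1 if every reply is SUCCESS
     */
    public static int badDatanodeIndex(List<Status> replies) {
        // No ack at all: the client can only blame the node it talks to.
        // This is how a second-datanode failure gets pinned on the first one.
        if (replies == null || replies.isEmpty()) {
            return 0;
        }
        // Assumed protocol: a datanode that loses its mirror reports SUCCESS for
        // itself and ERROR for the mirror, so the first non-SUCCESS entry names
        // the node that actually failed.
        for (int i = 0; i < replies.size(); i++) {
            if (replies.get(i) != Status.SUCCESS) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // First datanode healthy, second failed: drop index 1, not index 0.
        System.out.println(badDatanodeIndex(List.of(Status.SUCCESS, Status.ERROR)));
        // First datanode closed its socket without any ack: index 0 is blamed.
        System.out.println(badDatanodeIndex(null));
    }
}
```

      The point of the sketch is that correct attribution needs an explicit report from the upstream datanode; interrupting the responder thread and closing the socket leaves the client with nothing but index 0.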

      When the first datanode closes its socket to DFSClient, DFSClient should read all the data left in the socket. Also, the DataNode's closing of the socket should not result in a TCP reset; otherwise, I think DFSClient will not be able to read the remaining data from the socket.
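      Closing a TCP socket while unread bytes remain in its receive buffer typically makes the kernel (Linux included) send a RST instead of a normal FIN, and the peer may then lose data it has not read yet. A minimal sketch of both sides, assuming plain java.net sockets: the datanode half-closes with shutdownOutput(), drains its input, then closes; the client reads to EOF before deciding what failed. The helpers closeGracefully, readAll, and demo, and the "mirror ERROR" payload, are hypothetical and not from the attached patches.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class GracefulClose {

    /** Datanode side: send what was written, signal EOF, drain, then close. */
    static void closeGracefully(Socket s, int drainTimeoutMs) throws IOException {
        try {
            s.shutdownOutput();              // pending data is sent, followed by FIN
            s.setSoTimeout(drainTimeoutMs);  // do not wait forever on a stuck peer
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard: unread bytes left at close() typically trigger a RST
            }
        } catch (SocketTimeoutException e) {
            // peer kept sending; stop draining and close anyway
        } finally {
            s.close();
        }
    }

    /** Client side: read until EOF so no byte sent before the close is lost. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Loopback demo: the "datanode" reports a failure, then closes cleanly. */
    public static String demo() {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread datanode = new Thread(() -> {
                try {
                    Socket s = server.accept();
                    s.getOutputStream().write("mirror ERROR".getBytes(StandardCharsets.UTF_8));
                    closeGracefully(s, 1000);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            datanode.start();
            String reply;
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.getOutputStream().write(new byte[1024]); // packet the datanode never processes
                reply = new String(readAll(client.getInputStream()), StandardCharsets.UTF_8);
            }
            datanode.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

      The drain step is what keeps the close from becoming a reset: the client's unprocessed packet bytes are consumed before close(), so the client sees the report followed by a normal EOF.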

      Attachments

        1. hdfs-101-branch-0.20-append-cdh3.txt (11 kB, Todd Lipcon)
        2. HDFS-101_20-append.patch (12 kB, Nicolas Spiegelberg)
        3. pipelineHeartbeat_yahoo.patch (4 kB, Hairong Kuang)
        4. pipelineHeartbeat.patch (4 kB, Hairong Kuang)
        5. detectDownDN3-0.20-yahoo.patch (10 kB, Hairong Kuang)
        6. detectDownDN3.patch (8 kB, Hairong Kuang)
        7. detectDownDN3-0.20.patch (9 kB, Hairong Kuang)
        8. detectDownDN2.patch (7 kB, Hairong Kuang)
        9. detectDownDN1-0.20.patch (9 kB, Hairong Kuang)
        10. hdfs-101.tar.gz (3 kB, Todd Lipcon)
        11. detectDownDN-0.20.patch (8 kB, Hairong Kuang)


          People

            Assignee: Hairong Kuang (hairong)
            Reporter: Raghu Angadi (rangadi)
            Votes: 0
            Watchers: 12
