Hadoop HDFS / HDFS-915

Hung DN stalls write pipeline for far longer than its timeout

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      After running kill -STOP on a datanode in the middle of a write pipeline, the client takes far longer to recover than it should. The ResponseProcessor times out after the correct interval, but it doesn't interrupt the DataStreamer, which does not appear to be subject to the same timeout. The client recovers only once the OS declares the TCP stream dead, which can take a very long time.

      I've seen this on 0.20.1; I haven't tried it on trunk or 0.21 yet.
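
The stall described above can be illustrated with a small standalone sketch. This is not DFSClient code and not the attached patch: the class name PipelineTimeoutSketch, the method writerRecoversAfterAckTimeout, and the idea of closing the socket from the timed-out side are all assumptions made for illustration. A loopback listener that never reads stands in for the hung datanode, a writer thread stands in for the DataStreamer, and the calling thread stands in for the ResponseProcessor whose ack timeout expires.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PipelineTimeoutSketch {

    /**
     * Returns true if the writer thread exits within graceMs after the
     * (simulated) ack timeout fires and the caller closes the socket.
     * All names here are hypothetical; this only models the situation.
     */
    public static boolean writerRecoversAfterAckTimeout(long ackTimeoutMs, long graceMs)
            throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Stand-in for a hung datanode: the listener never accepts or reads,
        // so the client's writes block once the kernel buffers fill up.
        try (ServerSocket hungPeer = new ServerSocket(0, 1, loopback)) {
            Socket sock = new Socket(loopback, hungPeer.getLocalPort());
            CountDownLatch writerExited = new CountDownLatch(1);

            // Stand-in for the DataStreamer: keeps pushing packets downstream.
            Thread streamer = new Thread(() -> {
                byte[] packet = new byte[64 * 1024];
                try {
                    OutputStream out = sock.getOutputStream();
                    while (true) {
                        out.write(packet); // blocks once the peer stops draining
                    }
                } catch (IOException e) {
                    // Expected here: the socket was closed under the writer.
                } finally {
                    writerExited.countDown();
                }
            }, "streamer-sketch");
            streamer.setDaemon(true);
            streamer.start();

            // Stand-in for the ResponseProcessor: no ack arrives in time.
            if (writerExited.await(ackTimeoutMs, TimeUnit.MILLISECONDS)) {
                sock.close();
                return false; // writer stopped for some unrelated reason
            }

            // On timeout, close the socket so the blocked write fails promptly
            // instead of waiting for the OS to give up on the TCP stream.
            sock.close();
            return writerExited.await(graceMs, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer recovered: " + writerRecoversAfterAckTimeout(500, 5000));
    }
}
```

The sketch closes the socket rather than calling Thread.interrupt() because a write on a plain java.net.Socket stream is generally not interruptible, whereas closing the socket makes a thread blocked on it throw a SocketException. Whether the real client should close the socket, interrupt an NIO channel, or apply a write timeout is a separate design question that this example does not settle.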

      Attachments

        1. hdfs-915-0.20.txt (2 kB, Todd Lipcon)
        2. local-dn.log (12 kB, Todd Lipcon)

            People

              Assignee: Todd Lipcon (tlipcon)
              Reporter: Todd Lipcon (tlipcon)
              Votes: 0
              Watchers: 8
