Hadoop HDFS / HDFS-916

Rewrite DFSOutputStream to use a single thread with NIO


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      The DFS write pipeline code has some really hairy multi-threaded synchronization. This has produced a lot of bugs (HDFS-101, HDFS-793, HDFS-915, tens of others), since it's very hard to understand the message passing, lock sharing, and interruption properties. The reason for the multiple threads is to be able to send and receive simultaneously. If it used nonblocking IO instead of multiple threads, I think the whole thing would be a lot less error-prone.

      I think we could do this in two halves: one half is the DFSOutputStream, the other is BlockReceiver. I opened this JIRA first because I think it's the simpler half (only one TCP connection to deal with, rather than both an upstream and a downstream one).
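
      To make the idea concrete, here is a rough sketch of what "one thread, one connection, send and receive at once" could look like with java.nio. It is not the DFSOutputStream code and does not use the real HDFS packet or ack format: the class name, the 4-byte sequence-number "packets", and the loopback echo peer standing in for a datanode are all assumptions made up for illustration. The echo peer runs in its own thread only because it plays the remote side; the client side is a single thread driven by a Selector.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration only; none of these names exist in HDFS.
class SingleThreadWriterSketch {

    // Sends `packets` 4-byte sequence numbers and collects one 4-byte ack per
    // packet, all from the calling thread, on one non-blocking channel.
    static int sendAndCollectAcks(SocketChannel ch, int packets) throws IOException {
        ch.configureBlocking(false);
        ByteBuffer out = ByteBuffer.allocate(4 * packets);
        for (int seq = 0; seq < packets; seq++) out.putInt(seq);
        out.flip();
        ByteBuffer in = ByteBuffer.allocate(4 * packets);
        int acked = 0;
        try (Selector selector = Selector.open()) {
            SelectionKey key =
                ch.register(selector, SelectionKey.OP_READ | SelectionKey.OP_WRITE);
            while (acked < packets) {
                selector.select();
                selector.selectedKeys().clear();
                if (key.isWritable() && out.hasRemaining()) {
                    ch.write(out); // may write only part of the buffer
                    // Once everything is sent, stop asking for write readiness.
                    if (!out.hasRemaining()) key.interestOps(SelectionKey.OP_READ);
                }
                if (key.isReadable()) {
                    if (ch.read(in) < 0) throw new IOException("peer closed before all acks");
                    in.flip();
                    while (in.remaining() >= 4) { // consume only complete acks
                        int ackSeq = in.getInt();
                        if (ackSeq != acked) throw new IOException("unexpected ack " + ackSeq);
                        acked++;
                    }
                    in.compact(); // keep any partial ack for the next read
                }
            }
        }
        return acked;
    }

    // Starts a loopback echo peer (assumed stand-in for a datanode that acks
    // each packet by echoing it) and runs the single-threaded client against it.
    static int demo(int packets) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            Thread peer = new Thread(() -> {
                try (SocketChannel s = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    while (s.read(buf) >= 0) {
                        buf.flip();
                        while (buf.hasRemaining()) s.write(buf);
                        buf.clear();
                    }
                } catch (IOException ignored) {
                    // Peer shutdown is expected when the client disconnects.
                }
            });
            peer.setDaemon(true);
            peer.start();
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                return sendAndCollectAcks(client, packets);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("acks received: " + demo(8));
    }
}
```

      In this sketch there are no locks, no wait/notify, and no thread interruption on the client side: the send state and the ack state live in local variables of one loop. Whether that carries over to the real pipeline (heartbeats, error recovery, flush semantics) is exactly the open question here.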

      Opinions? Am I crazy? I would like to see some agreement on the idea before I spend time writing code.

          People

            Assignee: Unassigned
            Reporter: Todd Lipcon (tlipcon)