  1. Hadoop HDFS
  2. HDFS-277

DFSClient writes: DataStreamer thread can be removed

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When a client writes data to DFS, DFSClient keeps two threads for each open file:

      • DataStreamer thread: writes the data to the DataNodes (as 64 KB packets).
      • ResponseProcessor thread: receives acks from the DataNodes and detects related errors.

      I think the DataStreamer's job can be done inside the user's write() call (i.e. on the user's thread), so in the normal case there would be one less thread per open file. When there is an error in the write pipeline, all the un-acked packets need to be resent. In that case, ResponseProcessor can create a temporary thread to send these packets.
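
A minimal sketch of this proposal, under stated assumptions: `InlineWriteSketch`, its `Pipeline` interface, and every method name below are hypothetical and are not taken from DFSClient. The user's `write()` sends each packet itself and records it as un-acked. A separate ack-processing thread is assumed to call `onAck()` and, on a pipeline error, `onPipelineError()`, which starts a short-lived thread to resend the un-acked packets to a replacement pipeline.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the proposal; none of these names come from DFSClient. */
class InlineWriteSketch {
    /** Stand-in for the DataNode write pipeline (an assumption). */
    interface Pipeline {
        void send(byte[] packet) throws IOException;
    }

    private final Deque<byte[]> unacked = new ArrayDeque<>(); // sent, not yet acked
    private Pipeline pipeline;
    private boolean recovering;   // true while a resend is pending or running
    private IOException failure;  // set if the resend itself failed

    InlineWriteSketch(Pipeline pipeline) {
        this.pipeline = pipeline;
    }

    /** Runs on the user's thread: the packet is sent here, with no DataStreamer. */
    synchronized void write(byte[] packet) throws IOException {
        try {
            while (recovering) {
                wait(); // let the resend finish first so packet order is kept
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting for recovery");
        }
        if (failure != null) {
            throw failure;
        }
        unacked.addLast(packet);
        pipeline.send(packet);
    }

    /** Called by the ack-processing thread when the oldest packet is acked. */
    synchronized void onAck() {
        unacked.pollFirst();
    }

    /** Called by the ack-processing thread on a pipeline error. */
    Thread onPipelineError(Pipeline replacement) {
        synchronized (this) {
            recovering = true; // block new writes right away
        }
        Thread t = new Thread(() -> resend(replacement), "temporary-resend");
        t.start();
        return t;
    }

    /** Body of the temporary thread: resends every un-acked packet in order. */
    private synchronized void resend(Pipeline replacement) {
        try {
            for (byte[] packet : unacked) {
                replacement.send(packet);
            }
            pipeline = replacement;
        } catch (IOException e) {
            failure = e;
        } finally {
            recovering = false;
            notifyAll();
        }
    }

    synchronized int unackedCount() {
        return unacked.size();
    }
}
```

For simplicity the sketch holds the object lock while sending, so acks and error handling wait until an in-progress `send` returns. A real client would likely need finer-grained locking and flow control on the number of un-acked packets.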

      In the future, the acks for multiple pipelines could be handled by a common thread (at least in the default case, where sockets are non-blocking).
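
One way a common ack thread could look is sketched below with a `java.nio` `Selector`. This is only an illustration: `SharedAckReaderSketch`, `register`, `pollOnce`, and `acks` are hypothetical names, and treating every received byte as one ack is a simplification and not the DataNode ack format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one thread polls the ack channels of many pipelines. */
class SharedAckReaderSketch {
    private final Selector selector;
    // Acks seen per pipeline id; each byte read counts as one ack (simplification).
    private final Map<String, Integer> ackCounts = new ConcurrentHashMap<>();

    SharedAckReaderSketch() throws IOException {
        selector = Selector.open();
    }

    /** Registers one pipeline's ack channel, switching it to non-blocking mode. */
    <C extends SelectableChannel & ReadableByteChannel> void register(
            String pipelineId, C channel) throws IOException {
        channel.configureBlocking(false);
        channel.register(selector, SelectionKey.OP_READ, pipelineId);
    }

    /** One iteration of the common thread's loop; returns the number of channels read. */
    int pollOnce(long timeoutMillis) throws IOException {
        int handled = 0;
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            ReadableByteChannel channel = (ReadableByteChannel) key.channel();
            ByteBuffer buf = ByteBuffer.allocate(64);
            int n = channel.read(buf);
            if (n < 0) {
                key.cancel(); // peer closed this pipeline's ack stream
            } else if (n > 0) {
                ackCounts.merge((String) key.attachment(), n, Integer::sum);
                handled++;
            }
        }
        return handled;
    }

    int acks(String pipelineId) {
        return ackCounts.getOrDefault(pipelineId, 0);
    }
}
```

A single thread would call `pollOnce` in a loop with a positive timeout. Both `SocketChannel` and `Pipe.SourceChannel` satisfy the bound on `register`, so the sketch can be exercised without a network.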

          People

            Assignee: Unassigned
            Reporter: Raghu Angadi (rangadi)