Hadoop Common / HADOOP-445

Parallel data/socket writing for DFSOutputStream


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.5.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Currently, DFS clients write each entire block to local disk before starting to transmit it to the datanode. Buffering the block on disk lets the client retry the block write if the datanode fails in the middle of a block transfer, but writing to disk first and only then to the datanode adds latency. Hopefully, the common case is that block transfers to datanodes succeed. This patch writes to the datanode and the disk in parallel; if the write to the datanode fails, it falls back to the current behavior.
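      The fallback logic described above can be illustrated with a minimal Java sketch. It is not taken from the attached patch or from the DFSOutputStream API: the class ParallelBlockWriter, the Supplier that opens a datanode stream, and the temporary backup file are all hypothetical. It tees each chunk to the local backup and to the datanode as the chunk arrives (interleaved on one thread rather than on separate threads), so network transfer overlaps with disk writing instead of waiting for the whole block. If the datanode write fails, the block is finished on disk and resent from the file, as in the current behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not the patch attached to this issue): tee each chunk of
 * a block to the datanode and to a local backup file as it arrives. If the
 * datanode write fails, keep filling the backup and resend the whole block from
 * disk on close, which is the pre-patch behavior.
 */
class ParallelBlockWriter implements AutoCloseable {
    private final Path backupFile;
    private final OutputStream backup;              // local copy, read back only on failure
    private final Supplier<OutputStream> connect;   // assumed: opens a fresh datanode stream
    private final OutputStream datanode;
    private boolean datanodeFailed = false;

    ParallelBlockWriter(Path backupFile, Supplier<OutputStream> connect) throws IOException {
        this.backupFile = backupFile;
        this.backup = Files.newOutputStream(backupFile);
        this.connect = connect;
        this.datanode = connect.get();
    }

    public void write(byte[] buf, int off, int len) throws IOException {
        backup.write(buf, off, len);                // always keep the retry copy
        if (!datanodeFailed) {
            try {
                datanode.write(buf, off, len);      // stream to the datanode immediately
            } catch (IOException e) {
                datanodeFailed = true;              // stop sending; finish the block on disk
            }
        }
    }

    @Override
    public void close() throws IOException {
        backup.close();
        try {
            if (!datanodeFailed) {
                try {
                    datanode.close();               // happy path: block already transferred
                } catch (IOException e) {
                    datanodeFailed = true;          // failure surfaced only at close
                }
            }
            if (datanodeFailed) {
                closeQuietly(datanode);
                // Fallback: resend the complete block from the local file.
                try (InputStream in = Files.newInputStream(backupFile);
                     OutputStream retry = connect.get()) {
                    in.transferTo(retry);
                }
            }
        } finally {
            Files.deleteIfExists(backupFile);
        }
    }

    private static void closeQuietly(OutputStream out) {
        try {
            out.close();
        } catch (IOException ignored) {
            // the stream already failed; nothing more to do
        }
    }
}
```

      On the success path the backup file is never read back, which is where the latency saving would come from. A failure still costs a full resend of the block from disk, so the sketch is only a win if, as the description hopes, most transfers succeed. InputStream.transferTo requires Java 9 or later.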

      In my tests transferring 237 MB and 946 MB datasets with -copyFromLocal, I'm seeing a 20-25% improvement in throughput.


            People

              Assignee: Sameer Paranjpye (sameerp)
              Reporter: Benjamin Reed (breed)
              Votes: 0
              Watchers: 1
