Hadoop HDFS / HDFS-11234

distcp performance is suboptimal for high bandwidth/high latency setups


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels:
      None

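    Description

    Because distcp uses TCP sockets whose buffer size is fixed at 128 KB, throughput is poor on setups that have very high bandwidth but also very high latency. TCP can have at most roughly one buffer's worth of unacknowledged data in flight, so once that much has been sent it stops sending until the ACKs come back. Per connection, throughput is therefore capped at about buffer size / round-trip time, no matter how fast the link is. If we stop setting the socket buffer size explicitly and let the Linux kernel manage (auto-tune) it, we should be able to get close to optimal throughput.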
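    A back-of-the-envelope check of the claim above, sketched in Java. It is illustrative only and not taken from Hadoop: the class `BdpCeiling`, its methods, the hypothetical `configureBuffers` helper, the 10 Gbit/s link and the 100 ms RTT are all assumed example values. It computes the window/RTT ceiling for a fixed 128 KB buffer and the bandwidth-delay product needed to fill the link, and shows a helper that leaves the buffer sizes unset (so the kernel can auto-tune them) unless a positive size is requested.

```java
import java.io.IOException;
import java.net.Socket;

/**
 * Illustrative sketch (not Hadoop code). A TCP sender can have at most one
 * window of unacknowledged data in flight per round trip, so per-connection
 * throughput <= window / RTT (the bandwidth-delay product limit).
 */
public class BdpCeiling {

    /** Fixed buffer size described in the issue: 128 KB. */
    public static final int FIXED_BUFFER_BYTES = 128 * 1024;

    /** Upper bound on per-connection throughput, in bits/second. */
    public static long maxThroughputBitsPerSec(long windowBytes, long rttMillis) {
        return windowBytes * 8L * 1000L / rttMillis;
    }

    /** Buffer (bytes) needed to keep a link full: ceil(bandwidth * RTT / 8). */
    public static long requiredBufferBytes(long bandwidthBitsPerSec, long rttMillis) {
        long bitMillis = bandwidthBitsPerSec * rttMillis;
        return (bitMillis + 7999L) / 8000L; // integer ceiling of bits / 8 / 1000
    }

    /**
     * Hypothetical helper: only pin the buffer sizes when a positive size is
     * requested; 0 means "leave unset" so the OS can auto-tune the window.
     * Call before connect() so a large receive window can be negotiated.
     */
    public static void configureBuffers(Socket socket, int bufferBytes) throws IOException {
        if (bufferBytes > 0) {
            socket.setSendBufferSize(bufferBytes);
            socket.setReceiveBufferSize(bufferBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        long rttMillis = 100;                  // assumed example RTT
        long linkBitsPerSec = 10_000_000_000L; // assumed example 10 Gbit/s link
        System.out.println("Ceiling with 128 KB buffer: "
                + maxThroughputBitsPerSec(FIXED_BUFFER_BYTES, rttMillis) + " bit/s");
        System.out.println("Buffer needed to fill link: "
                + requiredBufferBytes(linkBitsPerSec, rttMillis) + " bytes");
        try (Socket s = new Socket()) {
            configureBuffers(s, 0); // leave SO_SNDBUF/SO_RCVBUF unset
        }
    }
}
```

    With these assumed numbers, `main` prints a ceiling of 10485760 bit/s (about 10.5 Mbit/s) for the 128 KB buffer, against a bandwidth-delay product of 125000000 bytes (about 125 MB) to keep the link busy. On Linux, explicitly setting SO_SNDBUF/SO_RCVBUF locks the buffer size and disables autotuning for that socket, which is why the helper skips the calls entirely; auto-tuned buffers are still bounded by the `net.ipv4.tcp_wmem` and `net.ipv4.tcp_rmem` maximums.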

    People

    • Assignee: Suresh Bahuguna (subahugu)
    • Reporter: Suresh Bahuguna (subahugu)
    • Votes: 0
    • Watchers: 5
