Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-347

DFS read performance suboptimal when client co-located on nodes with data

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected.

      After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is due to the HDFS streaming protocol causing many more read I/O operations (iops) than necessary. Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) running on the same machine. The DataNode will satisfy the single disk block request by sending data back to the HDFS client in 64-KB chunks. In BlockSender.java, this is done in the sendChunk() method, relying on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write(). In either case, each chunk is read from the disk by issuing a separahitting te I/O operation for each chunk. The result is that the single request for a 64-MB block ends up the disk as over a thousand smaller requests for 64-KB each.

      Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also results in context switches each time network packets get sent (in this case, the 64-kb chunk turns into a large number of 1500 byte packet send operations). Thus we see a large number of context switches for each block send operation.

      I'd like to get some feedback on the best way to address this, but I think providing a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode, marking those that should be resident on the local host. Since the DataNode and DFSClient (probably) share the same hadoop configuration, the DFSClient should be able to find the files holding the block data, and it could directly open them and send data back to the client. This would avoid the context switches imposed by the network layer, and would allow for much larger read buffers than 64KB, which should reduce the number of iops imposed by each read block operation.

      1. a.patch
        413 kB
        Tsz Wo Nicholas Sze
      2. 2013-04-01-jenkins.patch
        424 kB
        Colin Patrick McCabe
      3. 2013.02.15.consolidated4.patch
        366 kB
        Colin Patrick McCabe
      4. 2013.01.31.consolidated2.patch
        402 kB
        Colin Patrick McCabe
      5. 2013.01.31.consolidated.patch
        402 kB
        Colin Patrick McCabe
      6. 2013.01.28.design.pdf
        72 kB
        Colin Patrick McCabe
      7. full.patch
        379 kB
        Colin Patrick McCabe
      8. hdfs-347-merge.txt
        357 kB
        Todd Lipcon
      9. hdfs-347-merge.txt
        349 kB
        Todd Lipcon
      10. hdfs-347-merge.txt
        346 kB
        Todd Lipcon
      11. HDFS-347.035.patch
        343 kB
        Colin Patrick McCabe
      12. HDFS-347.033.patch
        321 kB
        Colin Patrick McCabe
      13. HDFS-347.030.patch
        260 kB
        Colin Patrick McCabe
      14. HDFS-347.029.patch
        259 kB
        Colin Patrick McCabe
      15. HDFS-347.027.patch
        261 kB
        Colin Patrick McCabe
      16. HDFS-347.026.patch
        240 kB
        Colin Patrick McCabe
      17. HDFS-347.025.patch
        240 kB
        Colin Patrick McCabe
      18. HDFS-347.024.patch
        240 kB
        Colin Patrick McCabe
      19. HDFS-347.022.patch
        249 kB
        Colin Patrick McCabe
      20. HDFS-347.021.patch
        248 kB
        Colin Patrick McCabe
      21. HDFS-347.020.patch
        245 kB
        Colin Patrick McCabe
      22. HDFS-347.019.patch
        247 kB
        Colin Patrick McCabe
      23. HDFS-347.018.patch2
        246 kB
        Colin Patrick McCabe
      24. HDFS-347.018.clean.patch
        112 kB
        Colin Patrick McCabe
      25. HDFS-347.017.patch
        247 kB
        Colin Patrick McCabe
      26. HDFS-347.017.clean.patch
        113 kB
        Colin Patrick McCabe
      27. HDFS-347.016.patch
        250 kB
        Colin Patrick McCabe
      28. HDFS-347-016_cleaned.patch
        109 kB
        Colin Patrick McCabe
      29. HDFS-347-branch-20-append.txt
        27 kB
        ryan rawson
      30. BlockReaderLocal1.txt
        28 kB
        dhruba borthakur
      31. hdfs-347.png
        16 kB
        Todd Lipcon
      32. all.tsv
        12 kB
        Todd Lipcon
      33. hdfs-347.txt
        26 kB
        Todd Lipcon
      34. local-reads-doc
        14 kB
        Todd Lipcon
      35. HADOOP-4801.3.patch
        15 kB
        George Porter
      36. HADOOP-4801.2.patch
        15 kB
        George Porter
      37. HADOOP-4801.1.patch
        14 kB
        George Porter

        Issue Links

        1.
        Encapsulate arguments to BlockReaderFactory in a class Sub-task Resolved Colin Patrick McCabe
         
        2.
        Encapsulate connections to peers in Peer and PeerServer classes Sub-task Resolved Colin Patrick McCabe
         
        3.
        Create DomainSocket and DomainPeer and associated unit tests Sub-task Resolved Colin Patrick McCabe
         
        4.
        BlockReaderLocal should use passed file descriptors rather than paths Sub-task Resolved Colin Patrick McCabe
         
        5.
        DomainSocket should throw AsynchronousCloseException when appropriate Sub-task Resolved Colin Patrick McCabe
         
        6.
        Bypass UNIX domain socket unit tests when they cannot be run Sub-task Resolved Colin Patrick McCabe
         
        7.
        DFSInputStream#getBlockReader: last retries should ignore the cache Sub-task Resolved Colin Patrick McCabe
         
        8.
        Fix bug in DomainSocket path validation Sub-task Resolved Colin Patrick McCabe
         
        9.
        some small DomainSocket fixes: avoid findbugs warning, change log level, etc. Sub-task Resolved Colin Patrick McCabe
         
        10.
        change dfs.datanode.domain.socket.path to dfs.domain.socket.path Sub-task Resolved Colin Patrick McCabe
         
        11.
        HDFS-347: fix case where local reads get disabled incorrectly Sub-task Resolved Colin Patrick McCabe
         
        12.
        HDFS-347: increase default FileInputStreamCache size Sub-task Resolved Todd Lipcon
         
        13.
        make TestPeerCache not flaky Sub-task Resolved Colin Patrick McCabe
         
        14.
        TestDomainSocket fails when system umask is set to 0002 Sub-task Resolved Colin Patrick McCabe
         
        15.
        avoid annoying log message when dfs.domain.socket.path is not set Sub-task Resolved Colin Patrick McCabe
         
        16.
        Make a simple doc to describe the usage and design of the shortcircuit read feature Sub-task Resolved Colin Patrick McCabe
         
        17.
        DataNode: don't create domain socket unless we need it Sub-task Resolved Colin Patrick McCabe
         
        18.
        HDFS-347: style cleanups Sub-task Resolved Colin Patrick McCabe
         
        19.
        HDFS-347: DN should chmod socket path a+w Sub-task Resolved Colin Patrick McCabe
         
        20.
        DFSClient: don't create a domain socket unless we need it Sub-task Resolved Colin Patrick McCabe
         
        21.
        allow use of legacy blockreader Sub-task Resolved Colin Patrick McCabe
         
        22.
        fix various bugs in short circuit read Sub-task Resolved Colin Patrick McCabe
         

          Activity

            People

            • Assignee:
              Colin Patrick McCabe
              Reporter:
              George Porter
            • Votes:
              12 Vote for this issue
              Watchers:
              118 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development