Hadoop HDFS / HDFS-4817

Make HDFS advisory caching configurable on a per-file basis

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.2.0
    • Component/s: hdfs-client
    • Labels: None

    Description

      HADOOP-7753 and related JIRAs introduced several performance optimizations for the DataNode. One of them was readahead: when it is enabled, the DataNode starts reading the bytes it expects to need next from the block file before the client requests them. This helps hide the latency of rotational media and sends larger reads down to the device. Another optimization was "drop-behind," which removes file data from the Linux page cache once it is no longer needed.
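To make the two optimizations concrete, here is a minimal sketch of readahead and drop-behind expressed as POSIX advisory hints. It is not the DataNode's code: it assumes a Linux-like system where Python's os.posix_fadvise is available, and the function name read_with_hints and the CHUNK and READAHEAD sizes are hypothetical values chosen for illustration.

```python
import os

CHUNK = 64 * 1024        # bytes consumed per read (illustrative value)
READAHEAD = 4 * CHUNK    # how far ahead of the reader to hint (illustrative value)


def _advise(fd, offset, length, advice_name):
    """Send an advisory hint to the kernel; silently skip it if unsupported."""
    advice = getattr(os, advice_name, None)
    if not hasattr(os, "posix_fadvise") or advice is None:
        return
    try:
        os.posix_fadvise(fd, offset, length, advice)
    except OSError:
        # Hints are advisory only, so a failure must not break the read.
        pass


def read_with_hints(path, readahead=READAHEAD, drop_behind=True):
    """Read a file sequentially and return the number of bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            # Readahead: ask the kernel to start fetching the range that
            # follows the chunk we are about to read.
            if readahead > 0:
                _advise(fd, offset + CHUNK, readahead, "POSIX_FADV_WILLNEED")
            data = os.pread(fd, CHUNK, offset)
            if not data:
                break
            total += len(data)
            # Drop-behind: the range just consumed will not be read again,
            # so its pages can be evicted from the page cache.
            if drop_behind:
                _advise(fd, offset, len(data), "POSIX_FADV_DONTNEED")
            offset += len(data)
    finally:
        os.close(fd)
    return total
```

Calling read_with_hints(path, readahead=0, drop_behind=False) reads the same bytes with no hints at all, which is the behavior when both optimizations are switched off.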

      Enabling dfs.datanode.drop.cache.behind.writes and dfs.datanode.drop.cache.behind.reads can improve performance substantially on many MapReduce jobs. In our internal benchmarks, we have seen speedups of 40% on certain workloads. The reason is that if we know the block data will not be read again any time soon, keeping it out of the page cache leaves more memory for the other processes on the system. See HADOOP-7714 for more benchmarks.

      We would like to turn on these configurations on a per-file or per-client basis, rather than on the DataNode as a whole. This will allow more users to actually make use of them. It would also be good to add unit tests for the drop-cache code path, to ensure that it is functioning as we expect.
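One way to picture the per-file and per-client proposal is as a chain of optional overrides that fall back to the DataNode-wide setting. The sketch below is only a model of that idea: the CachingHints type, the resolve function, and the precedence order (per-file, then per-client, then DataNode default) are assumptions made for illustration and are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachingHints:
    """Advisory caching settings; None means 'no opinion at this level'."""
    drop_behind: Optional[bool] = None
    readahead_bytes: Optional[int] = None


def _first_set(*values):
    """Return the first value that is not None, or None if all are unset."""
    for value in values:
        if value is not None:
            return value
    return None


def resolve(per_file, per_client, datanode_default):
    """Combine three levels of hints, most specific level first (assumed order)."""
    return CachingHints(
        drop_behind=_first_set(
            per_file.drop_behind,
            per_client.drop_behind,
            datanode_default.drop_behind,
        ),
        readahead_bytes=_first_set(
            per_file.readahead_bytes,
            per_client.readahead_bytes,
            datanode_default.readahead_bytes,
        ),
    )
```

Comparing against None rather than testing truthiness matters here: an explicit False for drop-behind or 0 for readahead is a real choice by the user and has to override the default rather than be mistaken for "unset".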

      Attachments

        1. HDFS-4817.001.patch
          80 kB
          Colin McCabe
        2. HDFS-4817.002.patch
          85 kB
          Colin McCabe
        3. HDFS-4817.004.patch
          87 kB
          Colin McCabe
        4. HDFS-4817.006.patch
          88 kB
          Colin McCabe
        5. HDFS-4817.007.patch
          86 kB
          Colin McCabe
        6. HDFS-4817.008.patch
          88 kB
          Colin McCabe
        7. HDFS-4817.009.patch
          88 kB
          Colin McCabe
        8. HDFS-4817.010.patch
          90 kB
          Colin McCabe
        9. HDFS-4817-b2.1.001.patch
          91 kB
          Colin McCabe

            People

              Assignee: Colin McCabe (cmccabe)
              Reporter: Colin McCabe (cmccabe)
              Votes: 0
              Watchers: 18
