Hadoop HDFS / HDFS-15202

HDFS-client: boost ShortCircuit Cache


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.1, 3.4.0
    • Component/s: dfsclient
    • Labels: None
    • Environment: 4 nodes with Intel Xeon E5-2698 v4 @ 2.20GHz and 700 GB of memory.

      8 RegionServers (2 per host)

      8 tables with 64 regions each and 1.88 GB of data per region, roughly 900 GB in total

      Random reads from 800 threads via YCSB, plus a small share of updates (10% of the read volume)

    Description

      I want to propose a way to improve the read performance of the HDFS client. The idea: create several ShortCircuit cache instances instead of one.

      The key points:

      1. Create an array of caches (its size, clientShortCircuitNum, is set by dfs.client.short.circuit.num; see the pull requests below):

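      private ClientContext(String name, DfsClientConf conf, Configuration config) {
        ...
        // One independent ShortCircuitCache per slot; the number of slots
        // (clientShortCircuitNum) comes from dfs.client.short.circuit.num.
        this.shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
        for (int i = 0; i < this.clientShortCircuitNum; i++) {
          this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
        }
        ...
      }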

      2. Then distribute blocks across the caches:

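        public ShortCircuitCache getShortCircuitCache(long idx) {
          // floorMod keeps the index in [0, clientShortCircuitNum) even when
          // idx is negative; plain % would return a negative index there.
          return shortCircuitCache[(int) Math.floorMod(idx, (long) clientShortCircuitNum)];
        }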
      

      3. Select the cache at the call site by block ID:

      ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId());
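A minimal, self-contained sketch of the three steps above, using nothing from Hadoop: `SimpleLruCache` is a hypothetical stand-in for `ShortCircuitCache` (a small LRU map guarded by a single `ReentrantLock`), and `ShardedCacheSketch` and `shardFor` are illustrative names, not HDFS APIs. It shows the general lock-striping pattern the proposal relies on: with N caches, threads reading different blocks usually take different locks instead of all queuing on one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongFunction;

/**
 * Sketch (assumption, not HDFS code): N independent caches, each behind its
 * own lock, with the cache chosen from the block ID.
 */
class ShardedCacheSketch {

  /** Hypothetical single-lock LRU cache standing in for ShortCircuitCache. */
  static final class SimpleLruCache<V> {
    private final ReentrantLock lock = new ReentrantLock();
    private final LinkedHashMap<Long, V> map;

    SimpleLruCache(int capacity) {
      // accessOrder = true makes iteration order least-recently-used first,
      // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
      this.map = new LinkedHashMap<Long, V>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
          return size() > capacity;
        }
      };
    }

    /** Returns the cached value, loading it on a miss; serialized per shard. */
    V getOrLoad(long blockId, LongFunction<V> loader) {
      lock.lock();
      try {
        return map.computeIfAbsent(blockId, k -> loader.apply(k));
      } finally {
        lock.unlock();
      }
    }

    int entryCount() {
      lock.lock();
      try {
        return map.size();
      } finally {
        lock.unlock();
      }
    }
  }

  private final SimpleLruCache<String>[] shards;

  @SuppressWarnings("unchecked")
  ShardedCacheSketch(int numShards, int capacityPerShard) {
    // Step 1: an array of independent caches.
    shards = new SimpleLruCache[numShards];
    for (int i = 0; i < numShards; i++) {
      shards[i] = new SimpleLruCache<>(capacityPerShard);
    }
  }

  /** Step 2: pick a shard from the block ID (non-negative for any long). */
  SimpleLruCache<String> shardFor(long blockId) {
    return shards[(int) Math.floorMod(blockId, (long) shards.length)];
  }

  public static void main(String[] args) {
    ShardedCacheSketch cache = new ShardedCacheSketch(3, 1000);
    // Step 3: every lookup goes through the shard that owns the block ID.
    for (long id = 1; id <= 9; id++) {
      cache.shardFor(id).getOrLoad(id, b -> "replica-" + b);
    }
    for (int i = 0; i < 3; i++) {
      System.out.println("shard " + i + ": " + cache.shardFor(i).entryCount());
    }
  }
}
```

With block IDs 1 to 9 and three shards, each shard ends up holding three entries. The trade-off of this design is that each shard has its own capacity and eviction order, so a block can only ever occupy the shard its ID maps to.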
      

      Block IDs are spread evenly modulo clientShortCircuitNum (for example, their last digit is distributed evenly from 0 to 9), so all caches fill up approximately equally.
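To illustrate the distribution argument, the sketch below counts how many consecutive block IDs fall into each cache under the modulo rule. It assumes IDs are handed out sequentially; `FIRST_BLOCK_ID` is an arbitrary starting value chosen for illustration, and `BlockIdSpread` is a hypothetical helper. Consecutive integers cycle through every residue modulo N, so a run of k·N consecutive IDs splits exactly evenly, whatever the starting value.

```java
import java.util.Arrays;

/** Hypothetical helper: how sequential block IDs spread across N caches. */
class BlockIdSpread {
  // Assumption: an arbitrary starting ID; the result does not depend on it.
  static final long FIRST_BLOCK_ID = 1_073_741_825L;

  /** Same selection rule as getShortCircuitCache(idx), safe for negative IDs. */
  static int cacheIndex(long blockId, int numCaches) {
    return (int) Math.floorMod(blockId, (long) numCaches);
  }

  /** Counts how many of `count` consecutive block IDs land in each cache. */
  static long[] spread(long firstId, int count, int numCaches) {
    long[] perCache = new long[numCaches];
    for (long id = firstId; id < firstId + count; id++) {
      perCache[cacheIndex(id, numCaches)]++;
    }
    return perCache;
  }

  public static void main(String[] args) {
    // 3000 consecutive IDs over 3 caches: prints [1000, 1000, 1000]
    System.out.println(Arrays.toString(spread(FIRST_BLOCK_ID, 3000, 3)));
    // Plain % keeps the dividend's sign; floorMod does not. Prints "-1 vs 2"
    System.out.println((-7L % 3) + " vs " + cacheIndex(-7L, 3));
  }
}
```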

      This improves performance. The attached graphs show a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: throughput grows by ~30%, while CPU usage rises by about 15%.

      I hope this is of interest; I'm happy to explain any non-obvious details.

      Attachments

        1. cpu_SSC.png (39 kB, Danil Lipovoy)
        2. cpu_SSC2.png (39 kB, Danil Lipovoy)
        3. HDFS_CPU_full_cycle.png (86 kB, Danil Lipovoy)
        4. hdfs_cpu.png (76 kB, Danil Lipovoy)
        5. hdfs_reads.png (46 kB, Danil Lipovoy)
        6. hdfs_scc_3_test.png (33 kB, Danil Lipovoy)
        7. hdfs_scc_test_full-cycle.png (53 kB, Danil Lipovoy)
        8. HDFS-15202-Addendum-01.patch (1 kB, Ayush Saxena)
        9. locks.png (70 kB, Danil Lipovoy)
        10. requests_SSC.png (43 kB, Danil Lipovoy)


            People

              Assignee: Danil Lipovoy (pustota)
              Reporter: Danil Lipovoy (pustota)
              Votes: 0
              Watchers: 21
