  1. Hadoop HDFS
  2. HDFS-13762

Support non-volatile storage class memory (SCM) in HDFS cache directives


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: caching, datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Non-volatile storage class memory (SCM, also known as persistent memory) is now supported in the HDFS cache. To enable the SCM cache, configure one or more SCM volumes in the property "dfs.datanode.cache.pmem.dirs" in hdfs-site.xml. All HDFS cache directives remain unchanged.

      There are two implementations of the HDFS SCM cache: a pure Java implementation and a native PMDK-based implementation. The latter offers better performance for cache writes and cache reads. If the PMDK native libraries can be loaded, the PMDK-based implementation is used; otherwise the DataNode falls back to the pure Java implementation. To enable the PMDK-based implementation, install the PMDK library by following the official site http://pmem.io/, then build Hadoop with PMDK support as described in the "PMDK library build options" section of `BUILDING.txt` in the source code.

      If multiple SCM volumes are configured, a round-robin policy is used to select an available volume for caching a block. Consistent with the DRAM cache, the SCM cache has no cache eviction mechanism. When a DataNode receives a data read request from a client and the corresponding block is cached in SCM, the DataNode instantiates an InputStream with the block location path on SCM (pure Java implementation) or the cache address on SCM (PMDK-based implementation). Once the InputStream is created, the DataNode sends the cached data to the client. Please refer to the "Centralized Cache Management" guide for more details.
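
      The round-robin volume selection mentioned in the release note can be illustrated with a minimal sketch. This is not the code from the patch: the class and field names are hypothetical, the mount paths in the usage note are made up, and treating "available" as "has enough free bytes for the block" is an assumption made for illustration.

      ```java
      import java.util.List;

      /** Hypothetical sketch of round-robin selection; not the actual Hadoop class. */
      class RoundRobinVolumeSelector {

        /** Hypothetical stand-in for one configured SCM volume. */
        static final class Volume {
          final String path;
          long freeBytes;

          Volume(String path, long freeBytes) {
            this.path = path;
            this.freeBytes = freeBytes;
          }
        }

        private final List<Volume> volumes;
        private int next = 0; // index of the volume to try first on the next call

        RoundRobinVolumeSelector(List<Volume> volumes) {
          if (volumes.isEmpty()) {
            throw new IllegalArgumentException("at least one volume is required");
          }
          this.volumes = volumes;
        }

        /**
         * Tries each volume at most once, starting from the round-robin cursor.
         * Returns the first volume with enough free space (assumed meaning of
         * "available") and reserves the space, or null if no volume fits.
         */
        synchronized Volume select(long blockBytes) {
          for (int i = 0; i < volumes.size(); i++) {
            Volume candidate = volumes.get(next);
            next = (next + 1) % volumes.size();
            if (candidate.freeBytes >= blockBytes) {
              candidate.freeBytes -= blockBytes;
              return candidate;
            }
          }
          return null; // no eviction: the block is simply not cached
        }
      }
      ```

      With two hypothetical volumes, "/mnt/pmem0" and "/mnt/pmem1", successive calls alternate between them and skip a volume that is too full for the block. Returning null instead of evicting mirrors the release note's statement that the SCM cache has no eviction mechanism.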

      Description

      Non-volatile storage class memory is a type of memory that retains its data after a power failure or across power cycles. Non-volatile storage class memory devices usually offer access speeds close to those of memory DIMMs at a lower cost than memory. It is therefore often used today as a supplement to memory to hold long-term persistent data, such as data in a cache.

      Currently, HDFS has a read-only cache backed by the OS page cache and a RAMDISK-based lazy-write cache. Non-volatile memory suits both of these functions.

      This Jira aims to enable storage class memory in the read cache first. Although storage class memory is non-volatile, we do not currently make use of its persistence, so that the behavior stays the same as the current read-only cache.


        Attachments

        1. HDFS_Persistent_Memory_Cache_Perf_Results.pdf
          334 kB
          Feilong He
        2. SCMCacheDesign-2019-07-16.pdf
          308 kB
          Feilong He
        3. SCMCacheDesign-2019-07-12.pdf
          245 kB
          Feilong He
        4. SCMCacheTestPlan-2019-3-27.pdf
          172 kB
          Feilong He
        5. SCMCacheDesign-2019-3-26.pdf
          242 kB
          Feilong He
        6. HDFS-13762.008.patch
          91 kB
          Feilong He
        7. HDFS-13762.007.patch
          87 kB
          Wei Zhou
        8. HDFS-13762.006.patch
          86 kB
          Wei Zhou
        9. HDFS-13762.005.patch
          81 kB
          Wei Zhou
        10. HDFS-13762.004.patch
          81 kB
          Wei Zhou
        11. HDFS-13762.003.patch
          80 kB
          Wei Zhou
        12. HDFS-13762.002.patch
          78 kB
          Wei Zhou
        13. HDFS-13762.001.patch
          79 kB
          Wei Zhou
        14. SCMCacheDesign-2018-11-08.pdf
          825 kB
          Sammi Chen
        15. SCMCacheTestPlan.pdf
          332 kB
          Sammi Chen
        16. HDFS-13762.000.patch
          79 kB
          Sammi Chen

              People

              • Assignee:
                PhiloHe Feilong He
                Reporter:
                Sammi Sammi Chen
              • Votes:
                0
                Watchers:
                30
