Hadoop HDFS / HDFS-4949

Centralized cache management in HDFS

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: datanode, namenode
    • Labels:
      None

      Description

      HDFS currently has no support for managing or exposing in-memory caches at DataNodes. This makes it harder for higher-level application frameworks such as Hive, Pig, and Impala to use cluster memory effectively, because they cannot explicitly cache important datasets or schedule their tasks for memory locality.
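      The idea of scheduling tasks for memory locality can be sketched as a small placement rule. This is a hypothetical model, not Hadoop code: `ReplicaInfo` stands in for the replica hosts and cached hosts a scheduler would read from HDFS block locations (released Hadoop exposes cached replicas through `BlockLocation#getCachedHosts()`), and the preference order below (cached replica, then on-disk replica) is an assumed policy.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of memory-local task placement; not Hadoop code.
 * ReplicaInfo stands in for the replica hosts and cached hosts that a
 * scheduler would read from HDFS block locations.
 */
public class MemoryLocalPlacement {

    /** Replica hosts for one block, plus the subset holding it in DataNode memory. */
    static final class ReplicaInfo {
        final List<String> hosts;
        final Set<String> cachedHosts;

        ReplicaInfo(List<String> hosts, Set<String> cachedHosts) {
            this.hosts = hosts;
            this.cachedHosts = cachedHosts;
        }
    }

    /**
     * Assumed policy: prefer an available host with a cached replica, then any
     * available host with an on-disk replica; return null if neither exists
     * (the task would read the block remotely).
     */
    static String pickHost(ReplicaInfo block, Set<String> availableHosts) {
        for (String host : block.hosts) {
            if (block.cachedHosts.contains(host) && availableHosts.contains(host)) {
                return host; // memory-local
            }
        }
        for (String host : block.hosts) {
            if (availableHosts.contains(host)) {
                return host; // disk-local fallback
            }
        }
        return null; // no replica on an available host
    }

    public static void main(String[] args) {
        ReplicaInfo block = new ReplicaInfo(List.of("dn1", "dn2", "dn3"), Set.of("dn2"));
        System.out.println(pickHost(block, Set.of("dn1", "dn2"))); // dn2 (cached)
        System.out.println(pickHost(block, Set.of("dn1", "dn3"))); // dn1 (on disk)
        System.out.println(pickHost(block, Set.of("dn4")));        // null
    }
}
```

      The two-pass loop keeps the rule explicit: memory locality is only a preference, so a task still falls back to ordinary disk locality when no cached replica sits on a free host.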

      1. caching-design-doc-2013-07-02.pdf
        270 kB
        Andrew Wang
      2. caching-design-doc-2013-08-09.pdf
        305 kB
        Andrew Wang
      3. caching-design-doc-2013-10-24.pdf
        312 kB
        Colin Patrick McCabe
      4. caching-testplan.pdf
        99 kB
        Stephen Chu
      5. hdfs-4949-branch-2.patch
        698 kB
        Andrew Wang
      6. HDFS-4949-consolidated.patch
        503 kB
        Andrew Wang

        Sub-Tasks

        1. Add JNI mlock support (Sub-task, Resolved, Andrew Wang)
        2. Propagate cache status information from the DataNode to the NameNode (Sub-task, Resolved, Andrew Wang)
        3. Add DataNode support for mlock and munlock (Sub-task, Resolved, Andrew Wang)
        4. Add cacheRequest/uncacheRequest support to NameNode (Sub-task, Resolved, Colin Patrick McCabe)
        5. NameNode should invoke DataNode APIs to coordinate caching (Sub-task, Resolved, Andrew Wang)
        6. add RPCs for creating and manipulating cache pools (Sub-task, Resolved, Colin Patrick McCabe)
        7. add command-line support for manipulating cache pools (Sub-task, Resolved, Colin Patrick McCabe)
        8. Add cache status information to datanode heartbeat (Sub-task, Resolved, Andrew Wang)
        9. add command-line support for manipulating cache directives (Sub-task, Resolved, Colin Patrick McCabe)
        10. miscellaneous cache pool RPC fixes (Sub-task, Resolved, Colin Patrick McCabe)
        11. prettier dfsadmin -listCachePools output (Sub-task, Resolved, Colin Patrick McCabe)
        12. revisit zero-copy API in FSDataInputStream to make it more intuitive (Sub-task, Resolved, Colin Patrick McCabe)
        13. Automatically cache new data added to a cached path (Sub-task, Resolved, Colin Patrick McCabe)
        14. Persist CacheManager state in the edit log (Sub-task, Resolved, Andrew Wang)
        15. Support for federated cache pools (Sub-task, Resolved, Andrew Wang)
        16. caching PB cleanups (Sub-task, Resolved, Colin Patrick McCabe)
        17. Move cache pool related CLI commands to CacheAdmin (Sub-task, Resolved, Andrew Wang)
        18. NameNodeRpcServer must not send back DNA_FINALIZE in reply to a cache report (Sub-task, Resolved, Colin Patrick McCabe)
        19. NativeIO: consolidate getrlimit into NativeIO#getMemlockLimit (Sub-task, Resolved, Colin Patrick McCabe)
        20. Fix some failing unit tests on HDFS-4949 branch (Sub-task, Resolved, Andrew Wang)
        21. separate PathBasedCacheEntry and PathBasedCacheDirectiveWithId (Sub-task, Resolved, Colin Patrick McCabe)
        22. Refactor PathBasedCache* methods to use a Path rather than a String (Sub-task, Resolved, Chris Nauroth)
        23. Change PathBasedCacheDirective APIs to be a single value rather than batch (Sub-task, Resolved, Andrew Wang)
        24. Add requesting user's name to PathBasedCacheEntry (Sub-task, Resolved, Andrew Wang)
        25. Expose if a block replica is cached in getFileBlockLocations (Sub-task, Resolved, Andrew Wang)
        26. Fix failing caching unit tests (Sub-task, Resolved, Andrew Wang)
        27. Do not expose CachePool type in AddCachePoolOp (Sub-task, Resolved, Colin Patrick McCabe)
        28. Add datanode caching metrics (Sub-task, Resolved, Andrew Wang)
        29. add modifyDirective to cacheAdmin (Sub-task, Resolved, Colin Patrick McCabe)
        30. Fix error message when dfs.datanode.max.locked.memory is improperly configured (Sub-task, Resolved, Colin Patrick McCabe)
        31. DNA_CACHE and DNA_UNCACHE should be by blockId only (Sub-task, Resolved, Colin Patrick McCabe)
        32. Add replication field to PathBasedCacheDirective (Sub-task, Resolved, Colin Patrick McCabe)
        33. Allow LightWeightGSet#Iterator to remove elements (Sub-task, Resolved, Colin Patrick McCabe)
        34. recaching improvements (Sub-task, Resolved, Colin Patrick McCabe)
        35. In CacheReport, don't send genstamp and length on the wire (Sub-task, Resolved, Colin Patrick McCabe)
        36. fix broken caching unit tests (Sub-task, Resolved, Andrew Wang)
        37. Loading fsimage fails to find cache pools during namenode startup. (Sub-task, Resolved, Chris Nauroth)
        38. Concurrent clients that add a cache directive on the same path may prematurely uncache from each other. (Sub-task, Resolved, Chris Nauroth)
        39. fix race conditions in DN caching and uncaching (Sub-task, Resolved, Colin Patrick McCabe)
        40. Add feature documentation for datanode caching. (Sub-task, Resolved, Colin Patrick McCabe)
        41. Caching RPCs are AtMostOnce, but do not persist client ID and call ID to edit log. (Sub-task, Resolved, Chris Nauroth)
        42. Resolve regressions in Windows compatibility on HDFS-4949 branch. (Sub-task, Resolved, Chris Nauroth)
        43. Fix possible RetryCache hang for caching RPC handlers in FSNamesystem (Sub-task, Resolved, Andrew Wang)
        44. Fixup test-patch.sh warnings on HDFS-4949 branch (Sub-task, Resolved, Andrew Wang)
        45. Support TTL on CacheDirectives (Sub-task, Resolved, Andrew Wang)
        46. support cachepool-based limit management in path-based caching (Sub-task, Resolved, Andrew Wang)
        47. better API for getting the cached blocks locations (Sub-task, Resolved, Andrew Wang)
        48. Add byte and file statistics to PathBasedCacheEntry (Sub-task, Resolved, Colin Patrick McCabe)
        49. Consistent naming of user-visible caching classes and methods (Sub-task, Resolved, Colin Patrick McCabe)
        50. add command-line support for modifyDirective (Sub-task, Resolved, Colin Patrick McCabe)
        51. Consider maximum DN memory, stale status when scheduling recaching (Sub-task, Resolved, Colin Patrick McCabe)
        52. TestPathBasedCacheRequests#testReplicationFactor is flaky (Sub-task, Resolved, Andrew Wang)
        53. improve CacheManipulator interface to allow better unit testing (Sub-task, Resolved, Colin Patrick McCabe)
        54. loading cache path directives from edit log doesn't update nextEntryId (Sub-task, Resolved, Colin Patrick McCabe)
        55. skip checksums when reading a cached block via non-local reads (Sub-task, Resolved, Colin Patrick McCabe)
        56. fix narrow race condition in TestPathBasedCacheRequests (Sub-task, Resolved, Colin Patrick McCabe)
        57. Rename "path.based" caching configuration options (Sub-task, Resolved, Andrew Wang)
        58. add some more NameNode cache statistics, cache pool stats (Sub-task, Resolved, Colin Patrick McCabe)
        59. Refactor tests in TestCacheDirectives (Sub-task, Resolved, Andrew Wang)
        60. CacheAdmin help should match against non-dashed commands (Sub-task, Resolved, Andrew Wang)
        61. Namenode loops caching and uncaching when data should be uncached (Sub-task, Resolved, Andrew Wang)
        62. Hook up cache directive and pool usage statistics (Sub-task, Resolved, Andrew Wang)
        63. allow BlockReaderLocal to switch between checksumming and not (Sub-task, Closed, Colin Patrick McCabe)
        64. Enforce a max TTL per cache pool (Sub-task, Resolved, Andrew Wang)
        65. Remove dfs.namenode.caching.enabled and improve CRM locking (Sub-task, Resolved, Colin Patrick McCabe)
        66. The CacheManager throws a NPE in the DataNode logs when processing cache reports that refer to a block not known to the BlockManager (Sub-task, Resolved, Colin Patrick McCabe)
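        Several of these sub-tasks add pool-level controls: per-pool byte limits (46), a maximum TTL per pool (64), and a replication field on directives (32). A toy model of the admission checks they imply is sketched below. Everything in it is hypothetical: `CachePoolModel` and `Directive` are not HDFS classes, the bytes a directive needs are assumed to be file size times requested cached replicas, and a directive is assumed to be rejected outright when its TTL exceeds the pool maximum or when it would push the pool past its limit. (The released `hdfs cacheadmin` tool exposes the corresponding pool settings as `-limit` and `-maxTtl`.)

```java
import java.util.Optional;

/**
 * Toy model of per-pool admission checks for cache directives.
 * CachePoolModel and Directive are hypothetical, not HDFS classes.
 */
public class CachePoolModel {

    /** A request to cache one path with a number of cached replicas and a TTL. */
    static final class Directive {
        final String path;
        final long fileBytes;
        final short replication;
        final long ttlMs;

        Directive(String path, long fileBytes, short replication, long ttlMs) {
            this.path = path;
            this.fileBytes = fileBytes;
            this.replication = replication;
            this.ttlMs = ttlMs;
        }

        /** Assumption: bytes pinned cluster-wide = file size x cached replicas. */
        long bytesNeeded() {
            return fileBytes * replication;
        }
    }

    private final String name;
    private final long limitBytes;
    private final long maxTtlMs;
    private long bytesReserved = 0;

    CachePoolModel(String name, long limitBytes, long maxTtlMs) {
        this.name = name;
        this.limitBytes = limitBytes;
        this.maxTtlMs = maxTtlMs;
    }

    /** Returns empty if admitted, otherwise the reason the directive was rejected. */
    Optional<String> addDirective(Directive d) {
        if (d.ttlMs > maxTtlMs) {
            return Optional.of(d.path + ": TTL " + d.ttlMs + " ms exceeds max TTL " + maxTtlMs + " ms");
        }
        long remaining = limitBytes - bytesReserved; // subtract first to avoid overflow
        if (d.bytesNeeded() > remaining) {
            return Optional.of(d.path + ": needs " + d.bytesNeeded() + " bytes, pool "
                    + name + " has " + remaining + " left");
        }
        bytesReserved += d.bytesNeeded();
        return Optional.empty();
    }

    long bytesReserved() {
        return bytesReserved;
    }

    public static void main(String[] args) {
        long hourMs = 60L * 60 * 1000;
        CachePoolModel pool = new CachePoolModel("hive", 1000, 24 * hourMs);
        Directive[] requests = {
            new Directive("/warehouse/dim", 300, (short) 2, hourMs),    // 600 bytes: fits
            new Directive("/warehouse/fact", 300, (short) 2, hourMs),   // 600 more: over limit
            new Directive("/tmp/scratch", 10, (short) 1, 48 * hourMs), // TTL above pool max
        };
        for (Directive d : requests) {
            System.out.println(pool.addDirective(d).orElse(d.path + ": admitted"));
        }
        System.out.println("reserved bytes: " + pool.bytesReserved()); // 600
    }
}
```

        Rejecting at admission time, rather than admitting and caching only part of a request, keeps the pool's reserved total an upper bound on what its directives can pin in this model.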
         

          Activity

          Andrew Wang created issue -
          Andrew Wang made changes -
          Field Original Value New Value
          Attachment caching-design-doc-2013-07-02.pdf [ 12590505 ]
          Eli Collins made changes -
          Assignee Andrew Wang [ andrew.wang ]
          Sanjay Radia made changes -
          Link This issue relates to HDFS-2832 [ HDFS-2832 ]
          Colin Patrick McCabe made changes -
          Link This issue is related to HDFS-4952 [ HDFS-4952 ]
          Colin Patrick McCabe made changes -
          Link This issue is related to HDFS-4952 [ HDFS-4952 ]
          Colin Patrick McCabe made changes -
          Link This issue is related to HDFS-4953 [ HDFS-4953 ]
          Arun C Murthy made changes -
          Affects Version/s 2.3.0 [ 12324588 ]
          Affects Version/s 2.2.0 [ 12324630 ]
          Andrew Wang made changes -
          Attachment caching-design-doc-2013-08-09.pdf [ 12597176 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5195 [ HDFS-5195 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5197 [ HDFS-5197 ]
          Colin Patrick McCabe made changes -
          Link This issue relates to HDFS-5202 [ HDFS-5202 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5203 [ HDFS-5203 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5266 [ HDFS-5266 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5269 [ HDFS-5269 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5313 [ HDFS-5313 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5373 [ HDFS-5373 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5385 [ HDFS-5385 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5388 [ HDFS-5388 ]
          Stephen Chu made changes -
          Attachment caching-testplan.pdf [ 12609978 ]
          Andrew Wang made changes -
          Attachment HDFS-4949-consolidated.patch [ 12610166 ]
          Andrew Wang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Colin Patrick McCabe made changes -
          Attachment caching-design-doc-2013-10-24.pdf [ 12610186 ]
          Andrew Wang made changes -
          Attachment HDFS-4949-consolidated.patch [ 12610166 ]
          Andrew Wang made changes -
          Attachment HDFS-4949-consolidated.patch [ 12610221 ]
          Arun C Murthy made changes -
          Link This issue is related to YARN-1488 [ YARN-1488 ]
          Andrew Wang made changes -
          Attachment hdfs-4949-branch-2.patch [ 12624211 ]
          Andrew Wang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 2.4.0 [ 12324588 ]
          Resolution Fixed [ 1 ]
          Arun C Murthy made changes -
          Affects Version/s 2.3.0 [ 12325255 ]
          Affects Version/s 2.4.0 [ 12324588 ]
          Fix Version/s 2.3.0 [ 12325255 ]
          Fix Version/s 2.4.0 [ 12324588 ]
          Gopal V made changes -
          Link This issue relates to HIVE-6347 [ HIVE-6347 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Andrew Wang
              Reporter:
              Andrew Wang
            • Votes:
              0
              Watchers:
              96

              Dates

              • Created:
                Updated:
                Resolved:
