  1. Hadoop HDFS
  2. HDFS-4949 Centralized cache management in HDFS
  3. HDFS-5096

Automatically cache new data added to a cached path

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: HDFS-4949
    • Component/s: datanode, namenode
    • Labels: None

    Description

      For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command.

      One example is new data appended to a cached file. It would be nice to re-cache the last block at its new, appended length, and to cache any new blocks added to the file.

      Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached.

      In both cases, this automatic caching would happen only after the file is closed, i.e., once its block replicas are finalized.
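
      The behavior requested above can be sketched as a toy model. This is only an illustration of the idea, not the code in the attached patches: the class `CacheModel`, its methods, the tiny `BLOCK_SIZE`, and the rule that a cached directory covers only its direct children are all assumptions made for the example.

```python
import math

BLOCK_SIZE = 128  # hypothetical block size in bytes, tiny for illustration


class CacheModel:
    """Toy model: cached paths are rescanned whenever a file is closed."""

    def __init__(self):
        self.directives = set()  # paths the user asked to cache
        self.files = {}          # path -> {"length": int, "closed": bool}
        self.cached = {}         # (path, block index) -> cached length

    def add_directive(self, path):
        self.directives.add(path.rstrip("/") or "/")
        self.rescan()

    def append(self, path, nbytes):
        # Appending reopens the file; its new data is not cacheable yet.
        f = self.files.setdefault(path, {"length": 0, "closed": False})
        f["length"] += nbytes
        f["closed"] = False

    def close(self, path):
        # Closing finalizes the blocks, so this is the point to re-cache.
        self.files[path]["closed"] = True
        self.rescan()

    def _covered(self, path):
        # Assumption: a directive matches the file itself or its parent dir.
        parent = path.rsplit("/", 1)[0] or "/"
        return path in self.directives or parent in self.directives

    def rescan(self):
        for path, f in self.files.items():
            if not f["closed"] or not self._covered(path):
                continue
            for index in range(math.ceil(f["length"] / BLOCK_SIZE)):
                remaining = f["length"] - index * BLOCK_SIZE
                self.cached[(path, index)] = min(BLOCK_SIZE, remaining)
```

      In this sketch no new caching request is needed after the initial `add_directive` call: a file dropped into a cached directory is picked up when it is closed, and an append followed by a close re-caches the last block at its new length and adds any new blocks.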

      Attachments

        1. HDFS-5096-caching.014.patch
          204 kB
          Colin McCabe
        2. HDFS-5096-caching.002.patch
          41 kB
          Colin McCabe
        3. HDFS-5096-caching.012.patch
          200 kB
          Colin McCabe
        4. HDFS-5096-caching.011.patch
          199 kB
          Colin McCabe
        5. HDFS-5096-caching.010.patch
          199 kB
          Colin McCabe
        6. HDFS-5096-caching.009.patch
          193 kB
          Colin McCabe
        7. HDFS-5096-caching.006.patch
          200 kB
          Colin McCabe
        8. HDFS-5096-caching.005.patch
          205 kB
          Colin McCabe

            People

              Assignee: Colin McCabe (cmccabe)
              Reporter: Andrew Wang (andrew.wang)