Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: hdfs-client, namenode
    • Labels:
      None
    • Tags:
      ttl

      Description

      In production environments, a common scenario is that we want to back up files on HDFS for a limited time and then have them deleted automatically. For example, we keep only 1 day of logs on local disk because of limited disk space, but we need about 1 month of logs to debug program bugs, so we store all logs on HDFS and delete those older than 1 month. This is a typical use case for an HDFS TTL (time to live), so we propose that HDFS support TTL.

      The details of the proposal are as follows:
      1. HDFS supports setting a TTL on a specified file or directory.
      2. If a TTL is set on a file, the file is deleted automatically after the TTL expires.
      3. If a TTL is set on a directory, its child files and directories are deleted automatically after the TTL expires.
      4. A TTL set on a child file or directory overrides the TTL of its parent directory.
      5. A global configuration option controls whether deleted files and directories go to the trash.
      6. A global configuration option controls whether a directory with a TTL is itself deleted once the TTL mechanism has emptied it.
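
The override rule in point 4 and the expiry check in points 2 and 3 can be illustrated with a small sketch. This is not the design from the attached PDFs and not an HDFS API: the names `Entry`, `effective_ttl`, and `expired_entries` are hypothetical, the TTL table is a plain in-memory dictionary, and the sketch assumes that a file's age is measured from its last modification time, which the proposal text does not specify.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical file record: a path and its modification time (epoch seconds)."""
    path: str
    mtime: float


def effective_ttl(path: str, ttls: Dict[str, float]) -> Optional[float]:
    """Return the TTL in seconds that applies to `path`, or None if no TTL is set.

    The nearest setting wins: the path's own TTL first, then each ancestor
    directory from the closest one upward (point 4 of the proposal).
    """
    current = PurePosixPath(path)
    while True:
        ttl = ttls.get(str(current))
        if ttl is not None:
            return ttl
        if current == current.parent:  # reached the root, nothing left to check
            return None
        current = current.parent


def expired_entries(entries: List[Entry], ttls: Dict[str, float], now: float) -> List[str]:
    """List paths whose age (now - mtime, an assumption) exceeds their effective TTL."""
    expired = []
    for entry in entries:
        ttl = effective_ttl(entry.path, ttls)
        if ttl is not None and now - entry.mtime > ttl:
            expired.append(entry.path)
    return expired


DAY = 86400
NOW = 1000 * DAY  # arbitrary reference time for the example

# /logs keeps files for 30 days; /logs/audit overrides that with 365 days.
TTLS = {"/logs": 30 * DAY, "/logs/audit": 365 * DAY}
ENTRIES = [
    Entry("/logs/app/old.log", mtime=NOW - 31 * DAY),
    Entry("/logs/app/new.log", mtime=NOW - 1 * DAY),
    Entry("/logs/audit/old.log", mtime=NOW - 31 * DAY),
]

print(expired_entries(ENTRIES, TTLS, NOW))  # ['/logs/app/old.log']
```

In this example only `/logs/app/old.log` is reported: it inherits the 30-day TTL from `/logs`, while the equally old file under `/logs/audit` is covered by the longer TTL set on its closer ancestor. Whether an expired path is moved to the trash or removed outright, and whether an emptied directory is removed as well, would be decided by the two global options in points 5 and 6, which this sketch leaves out.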

      1. HDFS-TTL-Design.pdf
        106 kB
        Zesheng Wu
      2. HDFS-TTL-Design -2.pdf
        115 kB
        Zesheng Wu
      3. HDFS-TTL-Design-3.pdf
        122 kB
        Zesheng Wu

        Issue Links

          Activity

          Zesheng Wu created issue -
          Zesheng Wu made changes -
          Field: Description. Change: the typo "theses files" was corrected to "these files"; the text is otherwise identical to the Description above.
          Zesheng Wu made changes -
          Assignee Zesheng Wu [ wuzesheng ]
          Zesheng Wu made changes -
          Attachment HDFS-TTL-Design.pdf [ 12648936 ]
          Zesheng Wu made changes -
          Attachment HDFS-TTL-Design -2.pdf [ 12649736 ]
          Zesheng Wu made changes -
          Attachment HDFS-TTL-Design-3.pdf [ 12651958 ]
          Allen Wittenauer made changes -
          Link This issue duplicates HADOOP-2892 [ HADOOP-2892 ]
          Allen Wittenauer made changes -
          Link This issue duplicates HDFS-205 [ HDFS-205 ]
          Allen Wittenauer made changes -
          Link This issue is related to HDFS-268 [ HDFS-268 ]
          Allen Wittenauer made changes -
          Link This issue relates to HDFS-7044 [ HDFS-7044 ]

            People

            • Assignee:
              Zesheng Wu
              Reporter:
              Zesheng Wu
            • Votes:
              2
              Watchers:
              28

              Dates

              • Created:
                Updated:
