  1. Hadoop Common
  2. HADOOP-15619 Über-JIRA: S3Guard Phase IV: Hadoop 3.3 features
  3. HADOOP-16279

S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: fs/s3
    • Labels: None

    Description

      HADOOP-15621 implemented TTL for authoritative directory listings and added ExpirableMetadata. Because DDBPathMetadata extends PathMetadata, which in turn extends ExpirableMetadata, every metadata entry in DynamoDB (DDB) is capable of expiring, but the expiry logic for individual entries has not been implemented yet.
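The inheritance chain described above can be illustrated with a minimal sketch. The class names below (ExpirableEntry, PathEntry, DdbPathEntry), the fields, and the isExpired(ttl, now) signature are hypothetical stand-ins chosen for illustration; they are not the actual Hadoop classes, and the real ExpirableMetadata may differ in detail.

```java
// Hypothetical, simplified stand-ins for the classes named in the description.
// None of these are the real Hadoop types; they only show the shape of the idea.
abstract class ExpirableEntry {
    // Time of the last update, in milliseconds since the epoch (assumed unit).
    private long lastUpdated;

    long getLastUpdated() {
        return lastUpdated;
    }

    void setLastUpdated(long lastUpdated) {
        this.lastUpdated = lastUpdated;
    }

    // An entry counts as expired once its age reaches the TTL.
    boolean isExpired(long ttlMillis, long nowMillis) {
        return lastUpdated + ttlMillis <= nowMillis;
    }
}

// Stand-in for PathMetadata: a path plus a tombstone (deleted) flag.
class PathEntry extends ExpirableEntry {
    final String path;
    final boolean deleted;

    PathEntry(String path, boolean deleted) {
        this.path = path;
        this.deleted = deleted;
    }
}

// Stand-in for DDBPathMetadata: inherits the expiry check without extra code.
class DdbPathEntry extends PathEntry {
    DdbPathEntry(String path, boolean deleted) {
        super(path, deleted);
    }
}
```

Because the expiry check lives in the base class, regular entries and tombstones share it, which fits the proposal of a single TTL setting for both.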

      To complete this feature the following should be done:

      • Add new tests for metadata entry and tombstone expiry to ITestS3GuardTtl
      • Implement metadata entry and tombstone expiry

      I would like to start a discussion on whether we need separate expiry times for entries and tombstones. I am +1 on not using separate settings, so there would be only one config name and value for both.


      Notes:

      • In HADOOP-13649 the metadata TTL was implemented in LocalMetadataStore, using an existing feature of Guava's cache implementation. Expiry is set with fs.s3a.s3guard.local.ttl.
      • LocalMetadataStore's TTL and this TTL are different. The former relies on the Guava cache's internal expiry mechanism for its entries; the latter is an S3AFileSystem-level solution in S3Guard, a layer above all metadata stores.
      • This is also not the same as DDB's TTL feature, and does not use it. We need different behavior from what DDB promises: cleaning up once a day with a background job is not usable for this feature, although it can be used as a general cleanup solution, separately and independently from S3Guard.
      • Use the same TTL for entries and authoritative directory listings.
      • All entries can expire. When an entry has expired, the metadata returned from the MetadataStore (MS) will be null.
      • Add two new methods, pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix), to the MetadataStore interface. These methods delete all expired metadata from the MS.
      • Use the last_updated field in the MS for both file metadata and authoritative directory expiry.
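The behavior in these notes can be sketched with a small in-memory example. Only the method names pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix) and the last_updated field come from the notes; the class TtlStoreSketch, the getWithTtl helper, the int return values, the injected clock, and the TreeMap backing store are all assumptions made for illustration, not the actual MetadataStore API.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory sketch; not the real S3Guard MetadataStore.
final class TtlStoreSketch {
    // One record per path; tombstones and regular entries share last_updated.
    static final class Entry {
        final boolean tombstone;
        final long lastUpdated;

        Entry(boolean tombstone, long lastUpdated) {
            this.tombstone = tombstone;
            this.lastUpdated = lastUpdated;
        }
    }

    private final Map<String, Entry> entries = new TreeMap<>();
    private final long ttlMillis;     // single TTL for entries and tombstones
    private final LongSupplier clock; // injected so tests can control time

    TtlStoreSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(String key, boolean tombstone) {
        entries.put(key, new Entry(tombstone, clock.getAsLong()));
    }

    private boolean isExpired(Entry e) {
        return e.lastUpdated + ttlMillis <= clock.getAsLong();
    }

    // Models the layer above the store: an expired entry is reported as null
    // even though it may still be physically present until pruned.
    Entry getWithTtl(String key) {
        Entry e = entries.get(key);
        return (e == null || isExpired(e)) ? null : e;
    }

    // Deletes every expired entry; returns the number removed (assumed return type).
    int pruneExpiredTtl() {
        return pruneExpiredTtl("");
    }

    // Deletes expired entries whose key starts with keyPrefix.
    int pruneExpiredTtl(String keyPrefix) {
        int removed = 0;
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Entry> me = it.next();
            if (me.getKey().startsWith(keyPrefix) && isExpired(me.getValue())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

Separating the read-time check (getWithTtl) from physical deletion (pruneExpiredTtl) mirrors the distinction drawn in the notes: expiry takes effect immediately on reads, while pruning is a cleanup step that can run whenever convenient.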

            People

              Assignee: Gabor Bota (gabor.bota)
              Reporter: Gabor Bota (gabor.bota)