  1. Apache Ozone
  2. HDDS-8342

AWS S3 Lifecycle Configurations


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: OM, S3

    Description

      I needed a retention solution in my cluster (delete keys under specific paths after some time). The idea is very similar to the Expiration part of AWS S3 Lifecycle configurations.
      https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html

      I made a design and have already implemented most of it, and I would like to contribute it back to the Apache Ozone community.

      Here is what is included:

      1. Users can create, remove and fetch the lifecycle configuration of a specific S3 bucket.
      2. Lifecycle configurations are executed periodically.
      3. Depending on its rules, a lifecycle configuration can trigger different actions, or even multiple actions.
      4. At the moment, only expiration is supported (keys get deleted).
      5. Lifecycle configurations support all buckets, not only S3 buckets.

       

      Design

       

      Components

      1. A lifecycle configuration (stored in the DB) consists of volumeName, bucketName and a list of rules.
        • A rule contains a prefix (string), an Expiration and an optional Filter.
        • An Expiration contains either days (integer) or a Date (long).
        • A Filter contains a prefix (string).
      2. The S3G bucket endpoint needs a few updates to accept /?lifecycle requests.
      3. ClientProtocol and all of its implementers provide get, list, delete and create operations for lifecycle configurations.
      4. RetentionManager runs periodically.
        • It fetches the list of lifecycle configurations with the help of OM.
        • It executes each lifecycle configuration against its bucket.
        • Lifecycle configurations run in parallel (each one against a different bucket).
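
The components above can be illustrated with a minimal sketch of the data model. This is not the code from the implementation: the class names `ExpirationRule` and `LifecycleConfiguration`, the folding of the rule prefix and the Filter prefix into one field, and the reading of Date (long) as epoch milliseconds are all assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical rule: a key prefix plus an expiration given as days or as an absolute date. */
final class ExpirationRule {
    final String prefix;         // keys starting with this prefix are candidates
    final Integer days;          // expire this many days after creation, or null
    final Long dateEpochMillis;  // assumed epoch millis; expire once this instant has passed, or null

    ExpirationRule(String prefix, Integer days, Long dateEpochMillis) {
        // The description says Expiration holds either days or a date, so require exactly one.
        if ((days == null) == (dateEpochMillis == null)) {
            throw new IllegalArgumentException("set exactly one of days or date");
        }
        if (days != null && days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        this.prefix = prefix == null ? "" : prefix;
        this.days = days;
        this.dateEpochMillis = dateEpochMillis;
    }

    /** True when the key matches the prefix and its expiration point is not after 'now'. */
    boolean isExpired(String keyName, Instant created, Instant now) {
        if (!keyName.startsWith(prefix)) {
            return false;
        }
        if (days != null) {
            return !created.plus(Duration.ofDays(days)).isAfter(now);
        }
        return !Instant.ofEpochMilli(dateEpochMillis).isAfter(now);
    }
}

/** Hypothetical configuration: one bucket and its list of rules. */
final class LifecycleConfiguration {
    final String volumeName;
    final String bucketName;
    final List<ExpirationRule> rules;

    LifecycleConfiguration(String volumeName, String bucketName, List<ExpirationRule> rules) {
        this.volumeName = volumeName;
        this.bucketName = bucketName;
        this.rules = Collections.unmodifiableList(new ArrayList<>(rules));
    }

    /** A key is eligible for deletion as soon as any rule says it has expired. */
    boolean shouldExpire(String keyName, Instant created, Instant now) {
        for (ExpirationRule rule : rules) {
            if (rule.isExpired(keyName, created, now)) {
                return true;
            }
        }
        return false;
    }
}
```

With a rule such as `new ExpirationRule("logs/", 30, null)`, a key named `logs/a.txt` created on 1 January becomes eligible from 31 January onwards, while keys outside `logs/` are never touched by that rule.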

      Flow

      1. Users PUT/GET/DELETE lifecycle configurations via S3Gateway.
      2. The lifecycle configuration details are sent to a handler to be processed.
      3. The lifecycle configurations are saved to, or fetched from, the DB.
      4. RetentionManager runs periodically in the leader OM to execute these lifecycle configurations.
      5. RetentionManager issues deletions for eligible keys.
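
Steps 4 and 5 of the flow can be sketched roughly as follows. This is a simplified illustration, not the actual RetentionManager: `RetentionSketch`, its in-memory maps standing in for the OM tables, the `/volume/bucket` map keys and the pool size of 4 are all hypothetical, and the leader check is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class RetentionSketch {
    /** Hypothetical rule: delete keys under 'prefix' once they are at least 'maxAge' old. */
    static final class Rule {
        final String prefix;
        final Duration maxAge;
        Rule(String prefix, Duration maxAge) { this.prefix = prefix; this.maxAge = maxAge; }
    }

    // Stand-ins for OM state: bucket path -> rules, and bucket path -> (key name -> creation time).
    final Map<String, List<Rule>> lifecycleTable = new ConcurrentHashMap<>();
    final Map<String, Map<String, Instant>> keyTable = new ConcurrentHashMap<>();

    void addRule(String bucket, Rule rule) {
        lifecycleTable.computeIfAbsent(bucket, b -> new CopyOnWriteArrayList<>()).add(rule);
    }

    void putKey(String bucket, String key, Instant created) {
        keyTable.computeIfAbsent(bucket, b -> new ConcurrentHashMap<>()).put(key, created);
    }

    /** One retention pass: each configured bucket is handled by its own task, in parallel. */
    int runOnce(Instant now) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Map.Entry<String, List<Rule>> e : lifecycleTable.entrySet()) {
                Callable<Integer> task = () -> expireBucket(e.getKey(), e.getValue(), now);
                results.add(pool.submit(task));
            }
            int deleted = 0;
            for (Future<Integer> f : results) {
                deleted += f.get();
            }
            return deleted;
        } catch (Exception ex) {
            throw new IllegalStateException("retention pass failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    /** Deletes every key in the bucket that matches at least one rule; returns the count. */
    private int expireBucket(String bucket, List<Rule> rules, Instant now) {
        Map<String, Instant> keys = keyTable.get(bucket);
        if (keys == null) {
            return 0;
        }
        int deleted = 0;
        Iterator<Map.Entry<String, Instant>> it = keys.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> key = it.next();
            for (Rule rule : rules) {
                boolean expired = !key.getValue().plus(rule.maxAge).isAfter(now);
                if (key.getKey().startsWith(rule.prefix) && expired) {
                    it.remove();
                    deleted++;
                    break;
                }
            }
        }
        return deleted;
    }

    /** Periodic execution; a real service would also check that this OM is the leader. */
    ScheduledExecutorService start(Duration period) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runOnce(Instant.now());
            } catch (RuntimeException ex) {
                // Swallow so the next tick still runs; a real service would log this.
            }
        }, 0, period.toMillis(), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Buckets without an entry in `lifecycleTable` are never visited, which mirrors the intent that only buckets with an attached lifecycle configuration are processed.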

       

      Not a complete solution

      The solution still lacks some useful features, for example:

      • The filter doesn't support `AND` yet.
      • Only expiration is supported.
      • There is no CLI to manage lifecycle configurations for all buckets (at the moment S3G is the only supported entry point).

      These kinds of features can be added in the future.

       

       

      I made some design decisions that should be discussed before contributing (current design):

      Lifecycle configurations are stored in their own column family in the DB instead of being a field in OmBucketInfo.

      I preferred the lifecycle configuration to have its own table for two reasons:

      1. There is no need to modify the OmBucketInfo table.
      2. Given how the RetentionManager works, it only has to query the buckets that have an attached lifecycle configuration. If the lifecycle configuration were a field in OmBucketInfo, it would have to query all buckets and filter out the ones that have a LifecycleConfiguration.

      If the other way is preferred, I will get rid of LifecycleConfigurationsManager and the new codec.

       

      To summarize this:

       

      | A new table for lifecycle configurations | A new field in OmBucketInfo |
      |---|---|
      | A new table | Existing table |
      | Efficient query | Less efficient |
      | A new manager (lifecycle manager) | No need |
      | A new codec | No need |
      | No need to alter existing design | Need to update the existing design |
      | Need to update Bucket Deletion. Delete the linked lifecycle configuration when the bucket is deleted. | No need for updates |
      | | Needs updates to create, get, list and delete lifecycle configuration in the BucketManager. |
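
The query-efficiency and bucket-deletion rows of this comparison can be made concrete with a small sketch. Everything here is hypothetical: `LifecycleStorageSketch`, the `BucketInfo` stand-in for OmBucketInfo, the use of sorted in-memory maps in place of DB tables, and the `/volume/bucket` key format are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LifecycleStorageSketch {
    /** Stand-in for OmBucketInfo; 'lifecycle' is only used by option B and may be null. */
    static final class BucketInfo {
        final String lifecycle;
        BucketInfo(String lifecycle) { this.lifecycle = lifecycle; }
    }

    // Sorted maps stand in for DB tables.
    final Map<String, BucketInfo> bucketTable = new TreeMap<>();
    final Map<String, String> lifecycleTable = new TreeMap<>(); // option A only

    /** Assumed key format shared by both tables. */
    static String dbKey(String volume, String bucket) {
        return "/" + volume + "/" + bucket;
    }

    /** Option A: only buckets that have a configuration are ever visited. */
    List<String> listViaDedicatedTable() {
        return new ArrayList<>(lifecycleTable.keySet());
    }

    /** Option B: every bucket is visited and then filtered on the optional field. */
    List<String> listViaBucketScan() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BucketInfo> e : bucketTable.entrySet()) {
            if (e.getValue().lifecycle != null) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Option A needs the second remove; with option B the field goes away with the bucket. */
    void deleteBucket(String volume, String bucket) {
        String key = dbKey(volume, bucket);
        bucketTable.remove(key);
        lifecycleTable.remove(key);
    }
}
```

With many buckets and few lifecycle configurations, `listViaDedicatedTable` touches only the configured entries, while `listViaBucketScan` walks the whole bucket table on every retention pass.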

       

       

      Plan for contribution

      The implementation is too large to review as a single change. I believe it needs to be split into a few merge requests for easier review. Here is my suggested breakdown:

      1. Basic building blocks (lifecycle configuration, rule, expiration, ...) and the related table (if needed).
      2. New ClientProtocol and OzoneManager operations (create, get, list, delete) for lifecycle configurations, including the protobuf messages.
      3. S3G endpoint updates.
      4. The retention manager.
      5. All of the above are merged into a new branch (let's call it X).
      6. Branch X is then merged into master.

       

      Please feel free to review the design and ask for more clarifications if needed.

       

          People

            mohanad Mohanad Elsafty