Details
- New Feature
- Status: Open
- Major
- Resolution: Unresolved
Description
I needed a retention solution in my cluster (delete keys under specific paths after some time). The idea is very similar to the Expiration part of AWS S3 Lifecycle configurations:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html
I made a design and have already implemented most of it, and I would like to contribute it back to the Apache Ozone community.
Here is what is included:
- User should be able to create/remove/fetch lifecycle configurations for a specific S3 bucket.
- The lifecycle configurations will be executed periodically.
- Depending on the rules of the lifecycle configuration there could be different actions or even multiple actions.
- At the moment only expiration is supported (keys get deleted).
- Lifecycle configurations support all buckets, not only S3 buckets.
Design
Components
- A lifecycle configuration (stored in the DB) consists of volumeName, bucketName and a list of rules.
  - A rule contains a prefix (string), an Expiration and an optional Filter.
  - An Expiration contains either days (integer) or a Date (long).
  - A Filter contains a prefix (string).
- The S3G bucket endpoint needs a few updates to accept the ?lifecycle query parameter.
- ClientProtocol and all its implementers provide get, list, delete and create operations for lifecycle configurations.
- RetentionManager runs periodically.
  - Fetches the list of lifecycle configurations with the help of OM.
  - Executes each lifecycle configuration on its bucket.
  - Lifecycle configurations run in parallel (each one against a different bucket).
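To make the component list above concrete, here is a minimal sketch of the data model and the expiration check. All class and method names are hypothetical and may differ from the actual patch. Two points are assumptions rather than part of the design text: days are counted from the key's creation time, and when a Filter is present its prefix is used instead of the rule-level prefix.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

// All names below are hypothetical, not the patch's actual API.

/** Holds either a number of days or an absolute date (epoch millis), never both. */
final class Expiration {
    private final Integer days;     // expire this many days after key creation (assumption)
    private final Long dateMillis;  // or expire once this instant has passed

    private Expiration(Integer days, Long dateMillis) {
        this.days = days;
        this.dateMillis = dateMillis;
    }

    static Expiration afterDays(int days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        return new Expiration(days, null);
    }

    static Expiration onDate(long epochMillis) {
        return new Expiration(null, epochMillis);
    }

    boolean isExpired(Instant created, Instant now) {
        Instant deadline = (days != null)
            ? created.plus(Duration.ofDays(days))
            : Instant.ofEpochMilli(dateMillis);
        return !now.isBefore(deadline);
    }
}

final class Filter {
    final String prefix;

    Filter(String prefix) {
        this.prefix = prefix;
    }
}

final class Rule {
    final String prefix;
    final Expiration expiration;
    final Optional<Filter> filter;

    Rule(String prefix, Expiration expiration, Optional<Filter> filter) {
        this.prefix = prefix;
        this.expiration = expiration;
        this.filter = filter;
    }

    /** Assumption: the filter prefix, when present, replaces the rule-level prefix. */
    boolean matches(String keyName) {
        String effective = filter.map(f -> f.prefix).orElse(prefix);
        return keyName.startsWith(effective);
    }

    boolean shouldExpire(String keyName, Instant created, Instant now) {
        return matches(keyName) && expiration.isExpired(created, now);
    }
}

final class LifecycleConfiguration {
    final String volumeName;
    final String bucketName;
    final List<Rule> rules;

    LifecycleConfiguration(String volumeName, String bucketName, List<Rule> rules) {
        this.volumeName = volumeName;
        this.bucketName = bucketName;
        this.rules = rules;
    }

    /** A key is eligible for deletion as soon as any rule says it should expire. */
    boolean shouldExpire(String keyName, Instant created, Instant now) {
        return rules.stream().anyMatch(r -> r.shouldExpire(keyName, created, now));
    }
}
```

With this shape, a rule such as `new Rule("logs/", Expiration.afterDays(30), Optional.empty())` marks `logs/a.txt` as eligible once 30 days have passed since its creation and leaves keys outside `logs/` untouched.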
Flow
- Users PUT/GET/DELETE lifecycle configurations via S3Gateway.
- The lifecycle configuration details are passed to a handler for processing.
- The lifecycle configurations are saved to or fetched from the DB.
- RetentionManager runs periodically in the leader OM to execute these lifecycle configurations.
- RetentionManager issues deletions for eligible keys.
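The last two steps of the flow can be sketched as a single retention pass. This is an illustration only: `BucketPolicy`, `KeyDeleter` and `RetentionPass` are hypothetical stand-ins for the lifecycle configuration, the OM delete call and the RetentionManager, and the fixed-size thread pool is an assumption about how the per-bucket parallelism could be realized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for one bucket's lifecycle configuration. */
interface BucketPolicy {
    String bucket();

    /** Keys in the bucket that are currently eligible for expiration. */
    List<String> eligibleKeys();
}

/** Hypothetical stand-in for the OM call that deletes a key. */
interface KeyDeleter {
    void delete(String bucket, String key);
}

final class RetentionPass {
    private final BooleanSupplier isLeader;
    private final Supplier<List<BucketPolicy>> policies;
    private final KeyDeleter deleter;
    private final int threads;

    RetentionPass(BooleanSupplier isLeader, Supplier<List<BucketPolicy>> policies,
                  KeyDeleter deleter, int threads) {
        this.isLeader = isLeader;
        this.policies = policies;
        this.deleter = deleter;
        this.threads = threads;
    }

    /** Runs one pass and returns the number of deletions issued (0 on a non-leader). */
    int runOnce() {
        if (!isLeader.getAsBoolean()) {
            return 0;  // only the leader OM executes lifecycle configurations
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per lifecycle configuration, so buckets are processed in parallel.
            List<Future<Integer>> pending = new ArrayList<>();
            for (BucketPolicy policy : policies.get()) {
                pending.add(pool.submit(() -> expireBucket(policy)));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("retention pass interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("retention pass failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    private int expireBucket(BucketPolicy policy) {
        int issued = 0;
        for (String key : policy.eligibleKeys()) {
            deleter.delete(policy.bucket(), key);
            issued++;
        }
        return issued;
    }
}
```

A periodic driver, for example `ScheduledExecutorService.scheduleWithFixedDelay`, could call `runOnce()` at the configured interval. The leadership check is made on every pass so that a node that loses leadership stops issuing deletions from the next pass onward.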
Not a complete solution
The solution lacks some interesting features, for example:
- The filter doesn't support `AND` yet.
- Only expiration is supported.
- A CLI to manage lifecycle configurations for all the buckets (at the moment S3G is the only supported entry point).
But these kinds of features can be added in the future.
I made some decisions that must be discussed before contributing (current design):
Lifecycle configurations are stored in their own column family in the DB instead of being a field in OmBucketInfo.
I preferred the lifecycle configuration to have its own table for two reasons:
- No need to modify the OmBucketInfo table.
- The way the RetentionManager works: with its own table, it queries only the buckets that have an attached lifecycle configuration. If the lifecycle were a field in OmBucketInfo, it would have to query all the buckets and filter out the ones without a LifecycleConfiguration.
If the other way is preferred, then I will get rid of LifecycleConfigurationsManager & the new codec.
To summarize this:
A new table for lifecycle configurations | A new field in OmBucketInfo |
---|---|
A new table | Existing table |
Efficient query | Less efficient |
A new manager (lifecycle manager) | No need |
A new codec | No need |
No need to alter existing design | Need to update the existing design |
Need to update Bucket Deletion. Delete the linked lifecycle configuration when the bucket is deleted. | No need for updates |
 | Needs updates to create, get, list and delete lifecycle configuration in the BucketManager. |
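The query and deletion rows of the comparison can be illustrated with two in-memory stand-ins. This is only a sketch: the class names are hypothetical, sorted maps play the role of DB tables keyed by "/volume/bucket", and plain strings stand in for the serialized bucket info and lifecycle configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Option A (hypothetical): lifecycle configurations live in their own table. */
final class DedicatedTableStore {
    final Map<String, String> bucketTable = new TreeMap<>();     // key -> bucket info
    final Map<String, String> lifecycleTable = new TreeMap<>();  // key -> lifecycle config

    /** Every row in the lifecycle table is a hit, so no filtering is needed. */
    List<String> bucketsWithLifecycle() {
        return new ArrayList<>(lifecycleTable.keySet());
    }

    /** Bucket deletion must also remove the linked lifecycle configuration. */
    void deleteBucket(String key) {
        bucketTable.remove(key);
        lifecycleTable.remove(key);
    }
}

/** Option B (hypothetical): the lifecycle configuration is a nullable field of the bucket. */
final class EmbeddedFieldStore {
    final Map<String, String> bucketTable = new TreeMap<>();  // key -> lifecycle config or null

    /** Every bucket is visited and the ones without a configuration are skipped. */
    List<String> bucketsWithLifecycle() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : bucketTable.entrySet()) {
            if (e.getValue() != null) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Deleting the bucket removes the embedded configuration with it. */
    void deleteBucket(String key) {
        bucketTable.remove(key);
    }
}
```

The cost of option B grows with the total number of buckets, while option A only touches buckets that actually have a configuration, at the price of keeping two tables consistent on deletion.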
Plan for contribution
The implementation is too large to review in one piece. I believe it needs to be split into a few merge requests for better review. Here is my suggested breakdown:
- Basic building blocks (lifecycle configuration, rule, expiration, ...) and the related table (if needed).
- ClientProtocol & OzoneManager new operations (create, get, list, delete) for lifecycle configurations (protobuf messages as well).
- S3G endpoint updates.
- The retention manager.
- All of them to be merged into a new branch (let's call it X).
- Then merge branch X into master.
Please feel free to review the design and ask for more clarifications if needed.
Attachments
Issue Links
- is part of HDDS-1186 Ozone S3 gateway (phase III) (Open)
- is related to HDDS-1932 Add support for object expiration in the s3 api (Open)
- relates to HDDS-10435 Support S3 object tags for existing requests (Resolved)