Details
- Type: New Feature
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
Currently, the base types in Atlas do not include AWS data lake objects. It would be nice to add typedefs for AWS data lake objects (buckets and pseudo-directories) and for lineage processes that move data from another source (e.g., a Kafka topic) to the data lake. For example:
- AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in an S3 bucket. For example, for an object with key “myWork/Development/Projects1.xls”, “myWork/Development” is the pseudo-directory. It supports:
  - array of Avro schemas that are associated with the data in the pseudo-directory (based on the Avro schema extensions outlined in ATLAS-2694)
  - the type of data it contains, e.g., avro, json, unstructured
  - time of creation
- AWSS3BucketLifeCycleRule type represents a rule specifying either a transition of the data in a bucket to a storageClass after a specific time interval, or its expiration. For example, transition to GLACIER after 60 days, or expire (i.e., be deleted) after 90 days. It supports:
  - ruleType (e.g., transition or expiration)
  - time interval in days before the rule is executed
  - storageClass to which the data is transitioned (null if ruleType is expiration)
- AWSTag type represents a tag-value pair created by the user and associated with an AWS object. It supports:
  - tag
  - value
- AWSCloudWatchMetric type represents a storage or request metric that is monitored by AWS CloudWatch and can be configured for a bucket. It supports:
  - metricName, e.g., “AllRequests”, “GetRequests”, “TotalRequestLatency”, “BucketSizeBytes”
  - scope: null if the metric covers the entire bucket; otherwise, the prefixes/tags that filter or limit the monitoring of the metric
- AWSS3Bucket type represents a bucket in an S3 instance. It supports:
  - array of AWSS3PseudoDir objects that are associated with objects stored in the bucket
  - AWS region
  - IsEncrypted (boolean)
  - encryptionType, e.g., AES-256
  - S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, PutObject
  - time of creation
  - array of AWSS3BucketLifeCycleRule objects that are associated with the bucket
  - array of AWSCloudWatchMetric objects that are associated with the bucket or its tags or prefixes
  - array of AWSTag objects that are associated with the bucket
- Generic dataset2Dataset process represents movement of data from one dataset to another. It supports:
  - array of transforms performed by the process
  - map of tag/value pairs representing configurationParameters of the process
  - inputs and outputs, which are arrays of dataset objects, e.g., a Kafka topic and an S3 pseudo-directory
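As a rough illustration of the proposal above, the following Python sketch assembles typedef-style dictionaries for the listed types, shaped like Atlas typedef JSON (`entityDefs` with `attributeDefs`). It is not the model that was eventually committed. The type names come from this description; several attribute names (`avroSchemas`, `dataType`, `createTime`, `intervalDays`, `region`, `pseudoDirectories`, `lifeCycleRules`, `cloudWatchMetrics`, `awsTags`, `transforms`), the chosen superTypes, and the helper functions are assumptions made for the example.

```python
import json


def attr(name, type_name, cardinality="SINGLE"):
    """One attribute definition, shaped like an Atlas typedef attributeDef."""
    return {
        "name": name,
        "typeName": type_name,
        "cardinality": cardinality,
        "isOptional": True,
        "isUnique": False,
        "isIndexable": False,
    }


def entity_def(name, super_types, attrs):
    """One entity type definition; the superTypes used below are assumptions."""
    return {
        "name": name,
        "superTypes": super_types,
        "typeVersion": "1.0",
        "attributeDefs": attrs,
    }


def pseudo_dir_of(key):
    """Return the pseudo-directory prefix of an S3 object key ('' if none)."""
    return key.rpartition("/")[0]


TYPEDEFS = {
    "entityDefs": [
        entity_def("AWSS3PseudoDir", ["DataSet"], [
            # array<string> is a stand-in for the Avro schema type of ATLAS-2694
            attr("avroSchemas", "array<string>", "SET"),
            attr("dataType", "string"),  # e.g. avro, json, unstructured
            attr("createTime", "date"),
        ]),
        entity_def("AWSS3BucketLifeCycleRule", [], [
            attr("ruleType", "string"),      # transition or expiration
            attr("intervalDays", "int"),     # days before the rule is executed
            attr("storageClass", "string"),  # null when ruleType is expiration
        ]),
        entity_def("AWSTag", [], [attr("tag", "string"), attr("value", "string")]),
        entity_def("AWSCloudWatchMetric", [], [
            attr("metricName", "string"),
            attr("scope", "string"),  # null if the metric covers the whole bucket
        ]),
        entity_def("AWSS3Bucket", ["DataSet"], [
            attr("pseudoDirectories", "array<AWSS3PseudoDir>", "SET"),
            attr("region", "string"),
            attr("IsEncrypted", "boolean"),
            attr("encryptionType", "string"),
            attr("S3AccessPolicy", "string"),  # JSON document kept as a string
            attr("createTime", "date"),
            attr("lifeCycleRules", "array<AWSS3BucketLifeCycleRule>", "SET"),
            attr("cloudWatchMetrics", "array<AWSCloudWatchMetric>", "SET"),
            attr("awsTags", "array<AWSTag>", "SET"),
        ]),
        # inputs and outputs are inherited from Atlas's built-in Process type
        entity_def("dataset2Dataset", ["Process"], [
            attr("transforms", "array<string>", "LIST"),
            attr("configurationParameters", "map<string,string>"),
        ]),
    ]
}

# The two lifecycle rules used as examples in the description.
LIFECYCLE_EXAMPLES = [
    {"ruleType": "transition", "intervalDays": 60, "storageClass": "GLACIER"},
    {"ruleType": "expiration", "intervalDays": 90, "storageClass": None},
]

if __name__ == "__main__":
    print(json.dumps(TYPEDEFS, indent=2))
    print(pseudo_dir_of("myWork/Development/Projects1.xls"))
```

Serializing `TYPEDEFS` with `json.dumps` gives a payload in the general shape of an Atlas typedef file; whether an Atlas server would accept it unchanged has not been verified here. Modeling the rule, tag, and metric types as entities rather than structs is likewise only one possible choice.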
Attachments
Issue Links
- relates to ATLAS-2889: S3 object tag import hook (Open)