[HADOOP-16355] ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Abandoned
Affects Version/s: None
Fix Version/s: None
Component/s: fs/s3
Labels: None
Target Version/s: 3.4.0

Description

When S3Guard was proposed, there were several valid reasons to choose DynamoDB as its default backend store: 0) seamless integration into the AWS ecosystem, e.g. via the client library; 1) it is a managed web service with zero operational cost, high availability and virtually unlimited scalability; 2) it is performant, with single-digit millisecond latency; 3) it had been proven by Netflix's S3mper (no longer actively maintained) and EMRFS (closed source and restricted in usage). Because MetadataStore is pluggable, it can also be implemented on other backend stores, besides the existing null and local in-memory ones, without changing semantics.

Here we propose ZookeeperMetadataStore, which uses Zookeeper as the S3Guard backend store. The main motivation is to provide a new MetadataStore option that:

is easy to integrate, as Zookeeper is already widely used in the Hadoop community
offers acceptable performance, as both the client and the Zookeeper ensemble are usually "local" to a Hadoop cluster (ZK/HBase/Hive, etc.)
removes the dependency on DynamoDB
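
To make the idea more concrete, here is a minimal, hypothetical Java sketch of one way S3Guard entries could be laid out in Zookeeper: each S3A path maps to a znode under a common root, so a directory listing becomes a lookup of a znode's children. The root `/s3guard`, the class `ZnodePathSketch` and its methods are illustrative assumptions, not part of Hadoop or of this proposal, and an in-memory `TreeMap` stands in for a real Zookeeper ensemble.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ZnodePathSketch {
    // Hypothetical root znode under which all S3Guard metadata would live.
    static final String ROOT = "/s3guard";

    // In-memory stand-in for the znode tree: znode path -> serialized entry.
    // Unlike real Zookeeper, it does not require parent znodes to exist.
    private final Map<String, String> znodes = new TreeMap<>();

    // Map an S3A URI such as s3a://bucket/dir/file to /s3guard/bucket/dir/file.
    static String toZnodePath(URI s3aUri) {
        String bucket = s3aUri.getAuthority();
        String path = s3aUri.getPath();
        if (path == null || path.isEmpty() || path.equals("/")) {
            return ROOT + "/" + bucket; // bucket root
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1); // drop trailing slash
        }
        return ROOT + "/" + bucket + path;
    }

    // Store (or overwrite) the metadata entry for a path.
    void put(URI s3aUri, String entry) {
        znodes.put(toZnodePath(s3aUri), entry);
    }

    // Return the names of the direct children of a directory,
    // analogous to listing the children of its znode.
    List<String> listChildren(URI dirUri) {
        String prefix = toZnodePath(dirUri) + "/";
        List<String> children = new ArrayList<>();
        for (String key : znodes.keySet()) {
            // Keep only keys exactly one level below the directory.
            if (key.startsWith(prefix) && key.indexOf('/', prefix.length()) < 0) {
                children.add(key.substring(prefix.length()));
            }
        }
        return children;
    }

    public static void main(String[] args) {
        ZnodePathSketch store = new ZnodePathSketch();
        store.put(URI.create("s3a://bucket/dir"), "dir");
        store.put(URI.create("s3a://bucket/dir/a"), "file");
        store.put(URI.create("s3a://bucket/dir/sub"), "dir");
        store.put(URI.create("s3a://bucket/dir/sub/c"), "file");
        System.out.println(toZnodePath(URI.create("s3a://bucket/dir/a"))); // /s3guard/bucket/dir/a
        System.out.println(store.listChildren(URI.create("s3a://bucket/dir"))); // [a, sub]
    }
}
```

A real implementation would instead call the Zookeeper client's `getChildren` and create parent znodes explicitly, since Zookeeper requires a znode's parent to exist before the znode itself can be created.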

Obviously, not every use case will prefer this over the default DynamoDB store. For example, ZK might not scale well if there are dozens of S3 buckets, each with millions of objects. Our use case targets HBase storing HFiles on S3 instead of HDFS. A complete solution for HBase on S3 needs both HBOSS (see HBASE-22149), to restore the atomicity of metadata operations such as rename, and S3Guard, for consistent listing of and access to object store bucket metadata. We would like to use Zookeeper as the backend store for both.

People

Assignee: Unassigned

Reporter: Mingliang Liu

Votes: 1

Watchers: 5

Dates

Created: 07/Jun/19 18:29

Updated: 25/Jan/21 18:54

Resolved: 25/Jan/21 18:54