[ATLAS-488] Atlas service scalability - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Requirement: We are looking for ways to run the Atlas service as multiple instances in cloud deployments, and this requirement is crucial to the adoption of Atlas.

On the dev discussion forum, I had earlier (Jan 25th) discussed the requirement to run multiple instances of the Atlas service/runtime for horizontal scaling. Per the answers we got from Hemanth, Atlas currently has a limitation that only one instance can be active at any given time, for a couple of reasons. The reasons cited were:
1) For performance reasons, the typesystem is cached in memory. Type lookup is needed for query evaluation, instance serialization/deserialization, etc.

2) Titan's locking for HBase transactions causes performance degradation.
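A minimal sketch of why reason 1 rules out multiple active instances, assuming each Atlas process keeps its own in-memory map from type name to definition. The class, method, and type names are hypothetical and not Atlas's actual API; the point is only that a type registered through one instance is invisible to another instance's cache.

```python
from typing import Dict, Optional


class AtlasInstance:
    """Hypothetical stand-in for one Atlas process with a local type cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Per-process, in-memory cache of type name -> type definition.
        self._type_cache: Dict[str, dict] = {}

    def register_type(self, type_name: str, definition: dict) -> None:
        # Only this process's cache is updated; peer instances are not notified.
        self._type_cache[type_name] = definition

    def lookup_type(self, type_name: str) -> Optional[dict]:
        # Query evaluation and (de)serialization would rely on this lookup.
        return self._type_cache.get(type_name)


a = AtlasInstance("atlas-1")
b = AtlasInstance("atlas-2")
a.register_type("hive_table", {"attributes": ["name", "owner"]})

print(a.lookup_type("hive_table") is not None)  # True
print(b.lookup_type("hive_table") is not None)  # False: b's cache is stale
```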

Of the above reasons, #1 can be mitigated in our scenarios. The types we create are essentially design-time application data models (metamodels), and they do not change at runtime. So if I pre-register the types with Atlas before my cluster is up, I expect #1 not to be an issue: however many Atlas instances come up, they would all have the same state, which would guarantee API correctness.
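A minimal sketch of this mitigation, assuming the metamodel is a fixed set of type definitions shipped with the deployment. The type names and the `FrozenTypeCache` class are hypothetical. Every instance loads the same definitions at startup and rejects type changes at runtime, so no instance's cache can drift from the others.

```python
from types import MappingProxyType
from typing import Dict, Mapping

# Hypothetical design-time metamodel, registered before any instance serves requests.
PREREGISTERED_TYPES: Dict[str, dict] = {
    "app_dataset": {"attributes": ["name", "schema"]},
    "app_pipeline": {"attributes": ["name", "inputs", "outputs"]},
}


class FrozenTypeCache:
    """Hypothetical type cache populated once at startup, then read-only."""

    def __init__(self, definitions: Mapping[str, dict]) -> None:
        # Copy, then expose a read-only view so the set of types cannot change.
        self._types = MappingProxyType(dict(definitions))

    def lookup_type(self, type_name: str) -> dict:
        return self._types[type_name]

    def register_type(self, type_name: str, definition: dict) -> None:
        # Runtime type changes would reintroduce divergence between instances.
        raise RuntimeError("types are pre-registered; runtime changes are not allowed")


# Every instance bootstraps from the same definitions, so all caches agree.
instances = [FrozenTypeCache(PREREGISTERED_TYPES) for _ in range(3)]
print(all(c.lookup_type("app_dataset") == PREREGISTERED_TYPES["app_dataset"]
          for c in instances))  # True
```

The guarantee only holds as long as the metamodel really is fixed; any later type change would need a coordinated restart (or a shared cache) to keep instances consistent.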

I am not sure how we can avoid #2. I do not understand the persistence mapping of metadata to the graph (Atlas), or of the graph to HBase (Titan), well enough to judge whether #2 is really a lock-contention problem for us. As a REST API, Atlas does not inherently offer its end users any "transaction"-like facility for running a set of API calls. Given that, we are likely to implement some kind of optimistic locking strategy to avoid data inconsistencies.
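A minimal sketch of such an optimistic locking strategy, assuming each entity carries a version number that clients read and send back with their updates. The store, the `guid` keys, and the version scheme are hypothetical; this does not claim Atlas's REST API exposes versions this way. A write succeeds only if the entity is still at the version the client read; otherwise the client must re-read and retry.

```python
import threading
from typing import Dict, Tuple


class ConcurrentModificationError(Exception):
    """Raised when an entity changed after the caller last read it."""


class VersionedStore:
    """Hypothetical entity store that keeps a version number per entity.

    The lock only emulates the atomic compare-and-set the real backend would
    have to provide; it is held for one call, never across a client's
    read-modify-write cycle.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, dict]] = {}
        self._lock = threading.Lock()

    def read(self, guid: str) -> Tuple[int, dict]:
        with self._lock:
            version, attrs = self._data.get(guid, (0, {}))
            return version, dict(attrs)

    def update(self, guid: str, expected_version: int, attrs: dict) -> int:
        with self._lock:
            current, _ = self._data.get(guid, (0, {}))
            if current != expected_version:
                # Someone else wrote first: the caller must re-read and retry.
                raise ConcurrentModificationError(
                    f"{guid}: expected v{expected_version}, found v{current}"
                )
            self._data[guid] = (current + 1, dict(attrs))
            return current + 1


store = VersionedStore()
version, _ = store.read("entity-1")                      # two clients read v0
store.update("entity-1", version, {"owner": "alice"})    # first writer wins -> v1
try:
    store.update("entity-1", version, {"owner": "bob"})  # stale v0 is rejected
except ConcurrentModificationError as err:
    print("retry needed:", err)  # retry needed: entity-1: expected v0, found v1
```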

Are there any other reasons you can think of why multiple Atlas runtime instances cannot talk to a single HBase cluster for horizontal scaling?

Issue Links

relates to: ATLAS-510 High availability of Atlas (Resolved)

Sub-Tasks

1. Refactor local type-system cache with cache provider interface (Resolved; venkata madugundu)
2. Redis-based implementation of type cache provider (Resolved; Dave Kantor)
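The cache provider idea behind these sub-tasks can be sketched as follows. All names here, including the `atlas:type:` key prefix, are hypothetical and are not the classes from the actual patches; a plain dict stands in for the shared store so the example runs without a Redis server. Swapping the local provider for a shared one lets every instance see the same type definitions.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, MutableMapping, Optional


class TypeCacheProvider(ABC):
    """Hypothetical pluggable type cache; callers depend only on this interface."""

    @abstractmethod
    def put(self, type_name: str, definition: dict) -> None: ...

    @abstractmethod
    def get(self, type_name: str) -> Optional[dict]: ...


class LocalTypeCacheProvider(TypeCacheProvider):
    """Per-process cache: fast, but each instance sees only its own writes."""

    def __init__(self) -> None:
        self._cache: Dict[str, dict] = {}

    def put(self, type_name: str, definition: dict) -> None:
        self._cache[type_name] = definition

    def get(self, type_name: str) -> Optional[dict]:
        return self._cache.get(type_name)


class SharedTypeCacheProvider(TypeCacheProvider):
    """Cache backed by an external key-value store shared by all instances.

    `backend` is any str -> str mapping; a dict stands in here for a
    networked store such as Redis.
    """

    def __init__(self, backend: MutableMapping[str, str]) -> None:
        self._backend = backend

    def put(self, type_name: str, definition: dict) -> None:
        # Serialize so the value can live outside this process.
        self._backend[f"atlas:type:{type_name}"] = json.dumps(definition)

    def get(self, type_name: str) -> Optional[dict]:
        raw = self._backend.get(f"atlas:type:{type_name}")
        return None if raw is None else json.loads(raw)


shared_store: Dict[str, str] = {}
instance_1 = SharedTypeCacheProvider(shared_store)
instance_2 = SharedTypeCacheProvider(shared_store)
instance_1.put("hive_table", {"attributes": ["name", "owner"]})
print(instance_2.get("hive_table"))  # {'attributes': ['name', 'owner']}
```

Keeping the interface narrow means query evaluation and (de)serialization code need not know which backend is in use; the likely trade-off is an extra network round trip per type lookup with a shared backend, unless it is fronted by a local read cache.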

People

Assignee: Unassigned

Reporter: venkata madugundu

Votes: 1

Watchers: 8

Dates

Created: 16/Feb/16 14:27

Updated: 19/Nov/20 11:10