Currently, when distributed work is run using the Kudu client, the driver/coordinator client needs to open the table to request its current metadata and locations before it can distribute the work to tasks/executors on remote nodes. When reading data, ScanTokens are often used to distribute the work; when writing data, perhaps only the table name is required.
The problem is that each parallel task then also needs to open the table to request its metadata. Using Spark as an example, this happens when deserializing the scan tokens in KuduRDD (here) or when writing rows using the KuduContext (here). The result is a large burst of metadata requests hitting the leader Kudu master all at once. Because the leader master is a single server and the requests can't be served by the follower masters, this effectively limits the number of parallel tasks that can run in a large Kudu deployment. Even if the follower masters could service the requests, scalability would still be limited in very large clusters, given that most deployments have only 3-5 masters.
Adding a metadata token, similar to a scan token, would allow the single driver to fetch all of the metadata required by the parallel tasks in one place. The tokens could then be serialized and passed to each task in the same fashion as scan tokens.
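The flow above can be sketched as follows. This is a minimal, hypothetical illustration, not the Kudu client API: `MetadataToken`, `build_token`, `open_table_from_token`, and the fake master client are all invented names, and `pickle` stands in for whatever serialization format (e.g. protobuf, as scan tokens use) the real implementation would choose.

```python
import pickle  # stand-in for the client's real serialization format


class MetadataToken:
    """Hypothetical token carrying everything a task needs to operate on a
    table without contacting the master: name, schema, tablet locations."""

    def __init__(self, table_name, schema, tablet_locations):
        self.table_name = table_name
        self.schema = schema
        self.tablet_locations = tablet_locations

    def serialize(self):
        return pickle.dumps(self)

    @staticmethod
    def deserialize(data):
        return pickle.loads(data)


# Driver side: a single metadata request to the master, then fan out.
def build_token(master_client, table_name):
    meta = master_client.open_table(table_name)  # one master round trip
    return MetadataToken(table_name, meta["schema"], meta["locations"]).serialize()


# Task side: rebuild the table state from the token, with no master request.
def open_table_from_token(token_bytes):
    return MetadataToken.deserialize(token_bytes)


# Fake master client for illustration only (the real client talks RPC).
class FakeMasterClient:
    def open_table(self, name):
        return {"schema": ["key", "value"], "locations": ["tserver-1", "tserver-2"]}


token_bytes = build_token(FakeMasterClient(), "metrics")
task_table = open_table_from_token(token_bytes)
```

The key property is that the master is contacted once, on the driver, regardless of how many tasks later deserialize the token.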
Of course, in the pessimistic case, something may change between the generation of the token and the start of the task. In that case the task would need to send a request to the master for updated metadata. However, that scenario should be rare and is unlikely to result in all of the requests arriving at the same time.
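The fallback path might look like the sketch below: the task optimistically uses the token's metadata and refreshes from the master only when a server rejects the request as stale. All names here (`StaleMetadataError`, `run_task`, the fake clients) are hypothetical and exist only to illustrate the retry shape, not any real Kudu API.

```python
class StaleMetadataError(Exception):
    """Signals that a server rejected a request because the task's cached
    metadata/locations are out of date (the rare, pessimistic case)."""


def run_task(token_meta, tserver_client, master_client):
    """Optimistically use the token's metadata; fall back to a single
    master refresh only if a server reports the metadata as stale."""
    try:
        return tserver_client.scan(token_meta)
    except StaleMetadataError:
        # Rare path: refetch current metadata, then retry once.
        fresh_meta = master_client.open_table(token_meta["table"])
        return tserver_client.scan(fresh_meta)


# Fakes for illustration: the first scan attempt fails as stale,
# the retry with refreshed metadata succeeds.
class FakeTabletServer:
    def __init__(self):
        self.calls = 0

    def scan(self, meta):
        self.calls += 1
        if meta.get("stale"):
            raise StaleMetadataError()
        return ["row1", "row2"]


class FakeMaster:
    def open_table(self, name):
        return {"table": name, "stale": False}


tserver = FakeTabletServer()
rows = run_task({"table": "metrics", "stale": True}, tserver, FakeMaster())
```

Because only the tasks that actually hit stale metadata contact the master, and they do so at different points in their execution, the refresh requests are spread out rather than arriving as a single burst.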