[KUDU-3036] RPC size multiplication for DDL operations might hit maximum RPC size limit - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.11.1
Fix Version/s: 1.12.0
Component/s: master, rpc
Labels:
- operability
- scalability

Code Review:
https://gerrit.cloudera.org/#/c/14999/

Description

When a table uses a multi-tier partitioning scheme with a large number of partitions, an AlterTable request that affects many partitions/tablets turns into a much larger UpdateConsensus RPC when the leader master pushes the corresponding update on the system tablet to the follower masters.

I did some testing for this use case. With an AlterTable RPC adding new range partitions, I observed the following:

With range x 2 hash partitions and an incoming AlterTable RPC request of 37070 bytes, the corresponding UpdateConsensus RPC is 274278 bytes (~ 7x multiplication factor).
With range x 10 hash partitions and an incoming AlterTable RPC request of 37070 bytes, the corresponding UpdateConsensus RPC that the leader master sends to push the update on the system tablet to followers is 1365438 bytes (~ 36x multiplication factor).
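
The multiplication factors can be recomputed from the measured sizes. The following is a minimal Python sketch of that arithmetic only. The function and variable names are illustrative and are not part of Kudu.

```python
def amplification(request_bytes: int, consensus_bytes: int) -> float:
    """Ratio of the UpdateConsensus RPC size to the AlterTable RPC size."""
    return consensus_bytes / request_bytes


# Sizes reported in this issue, in bytes.
ALTER_TABLE_BYTES = 37070
measurements = {
    "range x 2 hash": 274278,
    "range x 10 hash": 1365438,
}

for scheme, consensus_bytes in measurements.items():
    factor = amplification(ALTER_TABLE_BYTES, consensus_bytes)
    print(f"{scheme}: {factor:.1f}x")
# prints:
# range x 2 hash: 7.4x
# range x 10 hash: 36.8x
```

The computed ratios of roughly 7.4 and 36.8 are in line with the approximate factors quoted in the observations.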

With that, it's easy to hit the maximum RPC size limit (controlled via the --rpc_max_message_size flag) in larger Kudu clusters. If that happens, Kudu masters enter a continuous leader re-election cycle, since follower masters don't receive any Raft heartbeats from their leader: the heartbeats are rejected at the lower RPC layer due to the maximum RPC size limit.
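
To get a rough feel for when a request would cross the limit, the sketch below estimates the UpdateConsensus size from an AlterTable request size and a multiplication factor. It rests on two assumptions that do not come from this issue: the limit is taken to be 50 MiB (check the actual --rpc_max_message_size value on your masters), and the factor is assumed to stay constant as the request grows. The helper names are hypothetical.

```python
# Assumption: 50 MiB limit. Verify against the --rpc_max_message_size
# value actually configured on the masters.
ASSUMED_RPC_MAX_MESSAGE_SIZE = 50 * 1024 * 1024


def exceeds_limit(request_bytes: int, factor: float,
                  limit: int = ASSUMED_RPC_MAX_MESSAGE_SIZE) -> bool:
    """True if the estimated UpdateConsensus size is above the limit.

    Assumes the UpdateConsensus size is request_bytes * factor, i.e. the
    multiplication factor does not change with the request size.
    """
    return request_bytes * factor > limit


def max_safe_request_bytes(factor: float,
                           limit: int = ASSUMED_RPC_MAX_MESSAGE_SIZE) -> int:
    """Largest request size whose estimated UpdateConsensus fits the limit."""
    return int(limit // factor)


# The 37070-byte request measured in this issue stays under the assumed limit
# even at a ~36.8x factor; a hypothetical 2 MB request would not.
print(exceeds_limit(37070, 36.8))      # False
print(exceeds_limit(2_000_000, 36.8))  # True
```

This is only an estimate: the real size depends on how many tablets the request touches and on the partitioning scheme, as the measurements above show.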

Issue Links

is related to: KUDU-3016 Catalog manager: don't lump together all updates from one tablet report (Resolved)

People

Assignee: Alexey Serbin

Reporter: Alexey Serbin

Votes: 0

Watchers: 3

Dates

Created: 08/Jan/20 20:48

Updated: 14/Jan/20 18:18

Resolved: 14/Jan/20 18:18