Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
Copying tablets from an old cluster to a new cluster with the command kudu local_replica copy_from_remote is a resource-intensive operation. As the following picture shows, memory usage reaches 75%, the network is almost fully saturated (the overall network bandwidth is 2Gb/s), and disk read throughput is very high (the overall disk bandwidth is 200MB/s).
If the data size is very large, the copy runs for a long time, and other services may be impacted or become unavailable. It is therefore better to limit the tablet copying speed to keep the system stable. The goal is to balance the tablet copying speed against the impact on other services.
Because copy_from_remote mainly downloads data from the remote cluster and writes it to the local file system, limiting the download speed is an effective way to control resource consumption. Several algorithms can implement a rate limiter. This patch uses the token bucket algorithm implemented by the Facebook Folly library: https://github.com/facebook/folly/blob/main/folly/TokenBucket.h
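To illustrate the idea behind a token bucket, here is a minimal, self-contained sketch. It is not Folly's TokenBucket API and not the code in this patch: the class name SimpleTokenBucket, its methods, and the explicit now_sec argument are all hypothetical, chosen so that the behavior is easy to follow. Tokens (for example, bytes) are refilled at a fixed rate up to a burst cap, and a download is allowed only when enough tokens are available.

```cpp
#include <algorithm>

// Hypothetical, simplified token bucket for illustration only.
// This is NOT folly::TokenBucket and NOT the implementation in the patch.
class SimpleTokenBucket {
 public:
  // rate: tokens added per second; burst: maximum tokens the bucket can hold.
  SimpleTokenBucket(double rate, double burst)
      : rate_(rate), burst_(burst), tokens_(burst), last_sec_(0.0) {}

  // Tries to take `amount` tokens at time `now_sec`.
  // Returns true and deducts the tokens if enough are available.
  bool TryConsume(double amount, double now_sec) {
    Refill(now_sec);
    if (amount > tokens_) {
      return false;
    }
    tokens_ -= amount;
    return true;
  }

  // Returns how many seconds the caller would have to wait, starting at
  // `now_sec`, before `amount` tokens become available (0 if available now).
  double SecondsUntilAvailable(double amount, double now_sec) {
    Refill(now_sec);
    if (amount <= tokens_) {
      return 0.0;
    }
    return (amount - tokens_) / rate_;
  }

 private:
  // Adds the tokens earned since the last call, capped at the burst size.
  void Refill(double now_sec) {
    if (now_sec > last_sec_) {
      tokens_ = std::min(burst_, tokens_ + (now_sec - last_sec_) * rate_);
      last_sec_ = now_sec;
    }
  }

  double rate_;
  double burst_;
  double tokens_;
  double last_sec_;
};
```

In a download loop, a caller would ask the bucket for as many tokens as the bytes it is about to fetch and sleep for SecondsUntilAvailable when the request is refused. The burst cap bounds how far the transfer can exceed the configured rate after an idle period.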
Performance Tests
1. Data size:
TABLE test_1
on disk size: 13263880213
live row count: 66433035
2. Test Case:
case 1:
kudu local_replica copy_from_remote xxx_tablet_ids src_tserver_addr:7050 -fs_data_dirs=/test/data_dir -fs_wal_dir=/test/wal_dir -tablet_copy_download_threads_nums_per_session=4 -num_threads=4
case 2:
kudu local_replica copy_from_remote xxx_tablet_ids src_tserver_addr:7050 -fs_data_dirs=/test/data_dir -fs_wal_dir=/test/wal_dir -tablet_copy_download_threads_nums_per_session=4 -num_threads=4 -enable_network_speed_limit=true -limit_network_speed=25
3. Results:
3.1 The usage of CPU
Left is case 1, right is case 2. As we can see, using the speed limit feature reduces CPU consumption.
3.2 Load of CPU
Left is case 1, right is case 2. As we can see, using the speed limit feature reduces CPU load.
3.3 Network bandwidth
Left is case 1, right is case 2. As we can see, using the speed limit feature holds the network throughput to nearly 25MB/s.
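The trade-off of the limit is a longer copy. As a rough sketch, the helper below estimates a lower bound on the copy duration for a given limit. The function name EstimateCopySeconds is hypothetical, and the estimate assumes that the limit is expressed in MB/s with 1 MB = 1,000,000 bytes, that the transferred volume equals the reported on-disk size, and that the network limit is the only bottleneck. None of these assumptions is confirmed by the test results above.

```cpp
#include <cstdint>

// Hypothetical helper for illustration: lower bound on the copy time in
// seconds, assuming the transfer runs steadily at the configured limit and
// that 1 MB = 1,000,000 bytes.
double EstimateCopySeconds(std::uint64_t total_bytes, double limit_mb_per_sec) {
  const double bytes_per_sec = limit_mb_per_sec * 1000.0 * 1000.0;
  return static_cast<double>(total_bytes) / bytes_per_sec;
}
```

Under these assumptions, copying the 13263880213 bytes of test_1 at 25MB/s would take at least about 530 seconds, which is just under 9 minutes.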