[HBASE-26178] Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.5.0, 3.0.0-alpha-2
Component/s: Balancer, Performance
Labels:
None

Hadoop Flags:

Reviewed

Description

With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance.

The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included.

It is time to replace some of the data structure for better time complexity including:

int[][] regionsPerServer; // serverIndex -> region list

int[][] regionsPerHost; // hostIndex -> list of regions
int[][] regionsPerRack; // rackIndex -> region list

// serverIndex -> sorted list of regions by primary region index
ArrayList<HashSet<Integer>> primariesOfRegionsPerServer;

// hostIndex -> sorted list of regions by primary region index
int[][] primariesOfRegionsPerHost;

// rackIndex -> sorted list of regions by primary region index
int[][] primariesOfRegionsPerRack;

Areas of algorithm improvement include:

O(n ) to O(1) time to lookup or update per server/host/rack for every move test iteration.(n = number of regions per server/host/rack).
O(n ) to O(1) time for reserse lookup of region index from primary index.
Recomputation of primaryRegionCountSkewCostFunction reduced from O(n ) to O(1)

Attachments

Issue Links

is a child of

HBASE-25697 StochasticBalancer improvement for large scale clusters

Open

is related to

HBASE-24643 Replace Cluster#primariesOfRegionsPerServer from int array to treemap

Open

links to

Design Doc

GitHub Pull Request #3575

GitHub Pull Request #3682

Activity

People

Assignee:: Clara Xiong

Reporter:: Clara Xiong

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 07/Aug/21 00:16

Updated:: 26/Sep/21 18:24

Resolved:: 25/Sep/21 13:50