Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
In production, we have seen some critical large tables that handle the majority of the load, so table skew is becoming more important. With the updated table skew cost function, the balancer finally handles the distribution of large tables on large clusters. We should increase its weight from 35 to 500, a level comparable to the region count skew weight. We could even go further and replace region count skew with table skew, since the latter works the same way and also accounts for the distribution of regions per node.
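To illustrate why the weight matters, here is a minimal sketch. It assumes the balancer scores a candidate plan as a weighted average of normalized cost-function values (each in [0, 1]). This is a simplification of StochasticLoadBalancer, which combines many more cost functions. The class, method, and map key names are hypothetical; only the weights 35 and 500 come from the text above. In HBase these weights are, to our understanding, read from `hbase.master.balancer.stochastic.tableSkewCost` and `hbase.master.balancer.stochastic.regionCountCost`. The sketch compares the share of total weight held by table skew before and after the change, for a plan that balances region counts but leaves one large table skewed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: each cost function yields a normalized cost in [0, 1],
// and the balancer minimizes the weighted average of those costs.
// Only two cost functions are modeled here; the real balancer has more.
public class WeightShare {

    // Weighted average of normalized costs: sum(w_i * c_i) / sum(w_i).
    static double totalCost(Map<String, Double> weights, Map<String, Double> costs) {
        double weighted = 0.0;
        double sumWeights = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            weighted += e.getValue() * costs.getOrDefault(e.getKey(), 0.0);
            sumWeights += e.getValue();
        }
        return sumWeights == 0.0 ? 0.0 : weighted / sumWeights;
    }

    // Fraction of the total weight held by a single cost function.
    static double share(Map<String, Double> weights, String name) {
        double sum = 0.0;
        for (double w : weights.values()) {
            sum += w;
        }
        return sum == 0.0 ? 0.0 : weights.getOrDefault(name, 0.0) / sum;
    }

    public static void main(String[] args) {
        Map<String, Double> before = new LinkedHashMap<>();
        before.put("regionCountSkew", 500.0);
        before.put("tableSkew", 35.0);

        Map<String, Double> after = new LinkedHashMap<>(before);
        after.put("tableSkew", 500.0);

        // A plan with perfectly balanced region counts but one badly skewed table.
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("regionCountSkew", 0.0);
        costs.put("tableSkew", 0.8);

        System.out.printf("table skew share before: %.3f%n", share(before, "tableSkew"));
        System.out.printf("table skew share after:  %.3f%n", share(after, "tableSkew"));
        System.out.printf("total cost before: %.3f%n", totalCost(before, costs));
        System.out.printf("total cost after:  %.3f%n", totalCost(after, costs));
    }
}
```

Under these assumptions, table skew holds about 6.5% of the total weight at 35 and 50% at 500. The same skewed plan therefore scores roughly 0.052 before the change and 0.4 after, which makes it far less likely to be accepted.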
Another weight we found helpful to increase is that of the store file size cost function. Ideally, if the normalizer worked perfectly, we would not need to worry about it, since region count skew would already account for it. In practice, however, it often does not. Store file distribution needs to be given more weight to compensate. We tested changing this weight from 5 to 200, and it works fine.
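The scenario above can be sketched as follows: region counts are already balanced, but one server hosts an oversized region that the normalizer never split. This sketch rests on several assumptions. The skew metric (sum of absolute deviations from the mean, scaled by the worst case where one server holds all data) is an illustrative choice and not necessarily the formula used by HBase's store file cost function. The class and method names are hypothetical, as are the sizes. Only the weights 5, 200, and 500 come from the text. To our understanding, the store file weight corresponds to `hbase.master.balancer.stochastic.storefileSizeCost`.

```java
// Hypothetical sketch: a normalized skew cost over per-server store file sizes,
// and its weighted contribution next to a region count cost that is already 0.
public class StoreFileSkew {

    // Sum of absolute deviations from the mean, divided by the worst case
    // (all data on a single server), so the result lies in [0, 1].
    static double skewCost(double[] sizes) {
        int n = sizes.length;
        double total = 0.0;
        for (double s : sizes) {
            total += s;
        }
        if (n <= 1 || total == 0.0) {
            return 0.0;
        }
        double mean = total / n;
        double deviation = 0.0;
        for (double s : sizes) {
            deviation += Math.abs(s - mean);
        }
        // Worst case: one server holds everything -> 2 * total * (n - 1) / n.
        double worst = 2.0 * total * (n - 1) / n;
        return deviation / worst;
    }

    // Contribution of one weighted cost to a weighted average in which the
    // only other cost function (region count skew) currently evaluates to 0.
    static double contribution(double weight, double cost, double otherWeight) {
        return weight * cost / (weight + otherWeight);
    }

    public static void main(String[] args) {
        // Equal region counts per server, but one server hosts an oversized region.
        double[] storeFileGb = {400, 100, 100, 100};
        double cost = skewCost(storeFileGb);
        double regionCountWeight = 500;
        System.out.printf("store file skew cost: %.3f%n", cost);
        System.out.printf("weight 5:   %.4f%n", contribution(5, cost, regionCountWeight));
        System.out.printf("weight 200: %.4f%n", contribution(200, cost, regionCountWeight));
    }
}
```

In this toy model, the same store file imbalance contributes about 0.004 to the total cost at weight 5 and about 0.12 at weight 200. The lower value is small enough that the balancer would effectively ignore the imbalance.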