The currently implemented range repartition algorithm has two major problems:
- It assumes that the data distribution is uniform, so it is fragile when the data is skewed.
- Given floating-point values, it ignores the digits to the right of the decimal point, so it is difficult to guess the proper partition number.
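
The two problems can be reproduced with a small model. This is a hypothetical, simplified sketch of an equal-width range splitter, not the actual implementation; the names `uniform_boundaries` and `assign` and the sample data are assumptions made only for illustration.

```python
from collections import Counter

def uniform_boundaries(lo, hi, num_partitions):
    """Split [lo, hi] into equal-width ranges, as a uniform assumption would."""
    width = (hi - lo) / num_partitions
    return [lo + width * i for i in range(1, num_partitions)]

def assign(value, boundaries):
    """Return the index of the first range whose upper bound exceeds value."""
    for i, bound in enumerate(boundaries):
        if value < bound:
            return i
    return len(boundaries)

# Problem 1: skewed input. Most values are small, a few are large.
skewed = [1] * 90 + [50] * 5 + [100] * 5
bounds = uniform_boundaries(min(skewed), max(skewed), 4)
sizes = Counter(assign(v, bounds) for v in skewed)
# Nearly all rows land in the first range; one range stays empty.

# Problem 2: floating-point input whose integer parts are all equal.
floats = [0.1, 0.25, 0.5, 0.75, 0.9]
truncated = {int(v) for v in floats}
# After dropping the fractional part, every value looks the same,
# so no useful range boundaries can be derived from it.
```
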
One solution to these problems is to use a histogram. With a histogram, we can figure out the data distribution and handle floating-point values properly.
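
As a rough illustration of the idea, the sketch below builds an equal-width histogram over the raw floating-point values and then picks range boundaries so that each partition receives a similar number of rows. It is a minimal sketch under stated assumptions, not the proposed implementation: the function names, the bucket count, and the sample data are all hypothetical.

```python
from bisect import bisect_right

def build_histogram(values, num_buckets):
    """Count values into equal-width buckets over [min, max].

    Fractional parts are kept, so values such as 0.1 and 0.9 can fall
    into different buckets.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    if width == 0:
        width = 1.0  # all values identical; any positive width works
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    edges = [lo + width * i for i in range(num_buckets + 1)]
    return edges, counts

def partition_boundaries(edges, counts, num_partitions):
    """Pick bucket edges so that partitions hold roughly equal row counts."""
    target = sum(counts) / num_partitions
    boundaries, running = [], 0
    # The last bucket is skipped because its upper edge is the maximum value.
    for i, count in enumerate(counts[:-1]):
        running += count
        # Close a partition once its cumulative share of rows is reached.
        if (len(boundaries) < num_partitions - 1
                and running >= target * (len(boundaries) + 1)):
            boundaries.append(edges[i + 1])
    return boundaries

def assign(value, boundaries):
    """Return the partition index for a value, given sorted boundaries."""
    return bisect_right(boundaries, value)

# Skewed floating-point data: 90 values in [0, 1), 10 values in [1, 10].
values = [i / 90 for i in range(90)] + [float(i) for i in range(1, 11)]
edges, counts = build_histogram(values, num_buckets=100)
boundaries = partition_boundaries(edges, counts, num_partitions=4)

sizes = [0] * (len(boundaries) + 1)
for v in values:
    sizes[assign(v, boundaries)] += 1
```

With this sample data the boundaries fall inside the dense region below 1.0, so the four partitions come out roughly balanced. A single bucket that holds more rows than one partition's share still cannot be split, so a finer histogram may be needed for heavily skewed data.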