Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 1.26.0
Description
The previous implementation of RelMdUtil#numDistinctVals used the approximation ln(1 + x) ~= x, which holds when x is small.
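For reference, assume the estimate is the standard expected number of distinct values when n = numSelected values are drawn with replacement from N = domainSize equally likely values. The approximation then enters with x = -1/N:

```latex
\begin{aligned}
D(N, n) &= N\left(1 - \left(1 - \tfrac{1}{N}\right)^{n}\right)
         = N\left(1 - e^{\,n \ln(1 - 1/N)}\right) \\
        &\approx N\left(1 - e^{-n/N}\right)
         \qquad \text{using } \ln(1 + x) \approx x,\ x = -\tfrac{1}{N}.
\end{aligned}
```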
However, CALCITE-4132 removed this approximation to make the result more accurate. As a result, the function now returns incorrect results for large inputs because of floating-point precision loss: for example, when domainSize = 1e18 and numSelected = 1e10, the result is 0.
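The precision loss can be reproduced with a minimal sketch. The two methods below are assumed reconstructions of the estimate's shape after and before CALCITE-4132, not Calcite's actual RelMdUtil source, and the class and method names are hypothetical. The key step is that 1.0 - 1.0 / 1e18 rounds to exactly 1.0 in double arithmetic, so the logarithm, and with it the whole estimate, becomes 0.

```java
// Reproduction sketch. Both formulas are ASSUMED reconstructions of the estimate
// (not Calcite's RelMdUtil source); the class and method names are hypothetical.
final class NumDistinctValsRepro {

  // Assumed CALCITE-4132 form: N * (1 - exp(n * ln(1 - 1/N))).
  static double exactFormula(double domainSize, double numSelected) {
    // For domainSize = 1e18, 1.0 / domainSize is 1e-18, far below half the spacing
    // between doubles just under 1.0 (about 5.6e-17), so 1.0 - 1e-18 rounds to
    // exactly 1.0, Math.log returns 0 and the estimate collapses to 0.
    double expo = numSelected * Math.log(1.0 - 1.0 / domainSize);
    return (1.0 - Math.exp(expo)) * domainSize;
  }

  // Assumed pre-CALCITE-4132 form: ln(1 - 1/N) replaced by -1/N.
  static double approxFormula(double domainSize, double numSelected) {
    double expo = -numSelected / domainSize;
    return (1.0 - Math.exp(expo)) * domainSize;
  }

  public static void main(String[] args) {
    double domainSize = 1e18;
    double numSelected = 1e10;
    System.out.println(1.0 - 1.0 / domainSize == 1.0);          // true
    System.out.println(exactFormula(domainSize, numSelected));   // 0.0
    System.out.println(approxFormula(domainSize, numSelected));  // close to 1e10
  }
}
```

With these inputs almost every selected value should be distinct, so the expected answer is just under 1e10 (about numSelected - numSelected^2 / (2 * domainSize) = 1e10 - 50).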
I suggest treating small and large inputs differently: for small inputs, use the new, more precise formula; for large inputs, use the old, approximated one.
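A minimal sketch of this suggestion follows. The names HybridNumDistinctVals, numDistinctVals and LARGE_DOMAIN_THRESHOLD are hypothetical, the threshold value 1e8 is an illustrative assumption (an order-of-magnitude crossover between the two error sources, not a value taken from Calcite), and the guard for domainSize <= 1 is an assumed addition that avoids taking the log of a non-positive number.

```java
// Sketch of the suggested fix: precise formula for small domains, the old
// ln(1 + x) ~= x approximation for large ones. All names and the threshold
// value are HYPOTHETICAL, not Calcite's implementation.
final class HybridNumDistinctVals {

  // Hypothetical cutoff. Rounding 1.0 - 1.0 / N to a double gives ln(1 - 1/N) a
  // relative error of roughly 5e-17 * N, while ln(1 - 1/N) ~= -1/N has a relative
  // error of roughly 1 / (2 * N); the two are comparable near N = 1e8.
  static final double LARGE_DOMAIN_THRESHOLD = 1e8;

  static double numDistinctVals(double domainSize, double numSelected) {
    if (domainSize <= 0 || numSelected <= 0) {
      return 0;
    }
    if (domainSize <= 1) {
      // Assumed guard: for domainSize in (0, 1], 1.0 - 1.0 / domainSize <= 0 and
      // Math.log would return NaN or -Infinity (compare FLINK-21946).
      return Math.min(domainSize, numSelected);
    }
    double expo;
    if (domainSize < LARGE_DOMAIN_THRESHOLD) {
      // Small input: the more precise formula, n * ln(1 - 1/N).
      expo = numSelected * Math.log(1.0 - 1.0 / domainSize);
    } else {
      // Large input: ln(1 + x) ~= x with x = -1/N, giving -n/N.
      expo = -numSelected / domainSize;
    }
    double res = (1.0 - Math.exp(expo)) * domainSize;
    // The estimate can exceed neither the domain size nor the number of selected values.
    return Math.max(0.0, Math.min(res, Math.min(domainSize, numSelected)));
  }

  public static void main(String[] args) {
    System.out.println(numDistinctVals(1e18, 1e10)); // large domain: close to 1e10, not 0
    System.out.println(numDistinctVals(100, 50));    // small domain: about 39.5
    System.out.println(numDistinctVals(0.5, 10));    // guarded: 0.5, not NaN
  }
}
```

A threshold keeps each regime on the formula that is more accurate there, at the cost of a tunable constant. A different technique, not the one proposed in this issue, avoids the threshold entirely: Java's Math.log1p and Math.expm1 compute ln(1 + x) and e^x - 1 accurately for tiny x, so -domainSize * Math.expm1(numSelected * Math.log1p(-1.0 / domainSize)) stays precise for both small and large domains.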
Issue Links
- causes
  - FLINK-19780 FlinkRelMdDistinctRowCount#getDistinctRowCount(Calc) will always return 0 when number of rows are large (Closed)
  - FLINK-21946 FlinkRelMdUtil.numDistinctVals produces exceptional Double.NaN result when domainSize is in range(0,1) (Closed)
- is caused by
  - CALCITE-4132 Estimate the number of distinct values more accurately (Closed)
- is duplicated by
  - CALCITE-5431 Method RelMdUtil$numDistinctVals() wrongly return zero if the method input domainSize is a very large value (Resolved)
- relates to
  - CALCITE-5431 Method RelMdUtil$numDistinctVals() wrongly return zero if the method input domainSize is a very large value (Resolved)
- links to