  1. Calcite
  2. CALCITE-4351

RelMdUtil#numDistinctVals always returns 0 for large inputs


Details

    Description

      The previous implementation of RelMdUtil#numDistinctVals used the approximation ln(1 + x) ~= x, which holds when x is small.

      However, CALCITE-4132 removed this approximation to make the result more accurate. As a side effect, the function now returns an incorrect result for large inputs because of floating-point precision loss. For example, when domainSize = 1e18 and numSelected = 1e10, the result is 0.
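
The precision loss can be reproduced in isolation. The sketch below is not the Calcite source; it assumes the post-CALCITE-4132 computation has the form domainSize * (1 - (1 - 1/domainSize)^numSelected), and the class and method names are hypothetical.

```java
final class DistinctValsPrecisionDemo {

  /**
   * Exact form D * (1 - (1 - 1/D)^n).
   * Assumption: this mirrors the formula used after CALCITE-4132.
   */
  static double exact(double domainSize, double numSelected) {
    return (1.0 - Math.pow(1.0 - 1.0 / domainSize, numSelected)) * domainSize;
  }

  public static void main(String[] args) {
    double domainSize = 1e18;
    double numSelected = 1e10;

    // 1/domainSize = 1e-18 is far below the spacing of doubles just
    // under 1.0 (2^-53, about 1.1e-16), so the subtraction rounds to 1.0.
    System.out.println(1.0 - 1.0 / domainSize == 1.0); // prints "true"

    // pow(1.0, n) is 1.0, so the whole expression collapses to zero.
    System.out.println(exact(domainSize, numSelected)); // prints "0.0"
  }
}
```

For a small domain such as domainSize = 100 the same formula behaves well, which is why the problem only shows up for large inputs.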

      I suggest treating small and large inputs differently: use the new, more precise formula for small inputs and the old approximation for large inputs.
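
A minimal sketch of this suggestion follows. It is an illustration, not the actual patch: the class name, the method names, and especially the cutoff LARGE_DOMAIN are assumptions made here, and the old approximation is assumed to be domainSize * (1 - exp(-numSelected / domainSize)).

```java
final class NumDistinctValsSketch {

  // Hypothetical cutoff between "small" and "large" domains;
  // the issue itself does not specify a value.
  static final double LARGE_DOMAIN = 1e9;

  /** Precise form: D * (1 - (1 - 1/D)^n). Loses precision when 1/D is tiny. */
  static double exact(double domainSize, double numSelected) {
    return (1.0 - Math.pow(1.0 - 1.0 / domainSize, numSelected)) * domainSize;
  }

  /** Approximate form, using ln(1 + x) ~= x so that (1 - 1/D)^n ~= exp(-n/D). */
  static double approximate(double domainSize, double numSelected) {
    return (1.0 - Math.exp(-numSelected / domainSize)) * domainSize;
  }

  /** Picks the precise form for small domains, the approximation for large ones. */
  static double numDistinctVals(double domainSize, double numSelected) {
    if (domainSize <= 0) {
      return 0;
    }
    return domainSize < LARGE_DOMAIN
        ? exact(domainSize, numSelected)
        : approximate(domainSize, numSelected);
  }

  public static void main(String[] args) {
    // Large input from the issue: no longer 0, close to numSelected.
    System.out.println(numDistinctVals(1e18, 1e10));
    // Small input: still computed with the precise formula.
    System.out.println(numDistinctVals(100, 10));
  }
}
```

With this split, the example from the description (domainSize = 1e18, numSelected = 1e10) yields a value close to 1e10 instead of 0, while small inputs keep the accuracy gained in CALCITE-4132.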

          People

            fan_li_ya Liya Fan
            TsReaper Caizhi Weng
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m