IMPALA-10445

The ability to adjust NDV's precision with a query option


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Frontend
    • Labels: None

    Description

      Since IMPALA-2658, we can trade memory for more accurate NDV estimation. This is appealing because tests show an error rate within 0.1% with no dramatic rise in resource usage (#registers is 2 << 18). Users may have fewer complaints about computation precision in the future.

      However, the road to applying high-precision NDV in a production environment is bumpy.

      1) We would have to rewrite the SQL of a large number of historical workloads, which is time-consuming and error-prone.

      2) Cluster users, i.e. the people who write the SQL, are reluctant to lower their expectations. It would be more convenient if cluster admins could adjust the precision for each Admission Control queue according to the cluster's resource usage (rough world).

      Proposal:

      Add a new query option, DEFAULT_NDV_SCALE, to change the default precision setting for NDV().

      Implementation:

      1. Add a query option in the FE (frontend).
      2. If the option is set, use the matching NDV(<expr>, <scale>) function instead of NDV(). 
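
The rewrite rule in step 2 can be sketched as follows. This is only an illustration, not the actual frontend code: the function names, the representation of a call as a name plus a list of argument strings, and the assumed valid scale range of 1 to 10 are all hypothetical. The sketch also assumes that an explicit NDV(<expr>, <scale>) call keeps the scale the query author wrote.

```python
from typing import List, Optional

# Assumed valid range for the scale argument; not stated in this issue.
MIN_SCALE, MAX_SCALE = 1, 10


def apply_default_ndv_scale(func_name: str, args: List[str],
                            default_ndv_scale: Optional[int]) -> List[str]:
    """Return the argument list to use for a function call.

    A single-argument NDV(<expr>) becomes NDV(<expr>, <scale>) when the
    option value default_ndv_scale is set. Every other call, including an
    explicit NDV(<expr>, <scale>), is returned unchanged.
    """
    if default_ndv_scale is None:
        # Option not set: leave the call as written.
        return list(args)
    if not MIN_SCALE <= default_ndv_scale <= MAX_SCALE:
        raise ValueError(f"scale must be in [{MIN_SCALE}, {MAX_SCALE}]")
    if func_name.lower() == "ndv" and len(args) == 1:
        return [args[0], str(default_ndv_scale)]
    return list(args)


def render(func_name: str, args: List[str]) -> str:
    """Format a call as text, for display only."""
    return f"{func_name}({', '.join(args)})"


# With the option set to 5, NDV(user_id) is planned as NDV(user_id, 5).
print(render("NDV", apply_default_ndv_scale("NDV", ["user_id"], 5)))
```

With a rule like this, historical queries that call NDV(<expr>) pick up the configured scale without any change to their SQL text, which addresses point 1) above.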

       

       

          People

            fifteencai Fifteen