IMPALA-7602

Definition of NDV differs between planner and stats mechanism

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      See IMPALA-7310, which says that the Impala NDV function is implemented as "number of non-null distinct values." IMPALA-7310 also says that the stats gathering mechanism uses the same definition.

      Down in the comments of that issue, we point to ExprNdvTest, which shows that the planner itself treats NULL as a distinct value when working with constant expressions.

      In the case described in IMPALA-7310, this means that a column containing only nulls has NDV=0 if stats are used, but NDV=1 if constants are used.
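
      The discrepancy can be illustrated with a small Python sketch. This is not Impala code: `ndv_non_null` and `ndv_null_as_value` are hypothetical stand-ins. The first models the "number of non-null distinct values" definition attributed to the NDV function and the stats mechanism. The second models the behavior that ExprNdvTest shows for constant expressions.

```python
def ndv_non_null(values):
    """Count distinct values, ignoring NULL (modeled here as None)."""
    return len({v for v in values if v is not None})


def ndv_null_as_value(values):
    """Count distinct values, treating NULL (None) as one more distinct value."""
    return len(set(values))


# A column that contains only nulls, as in the IMPALA-7310 case.
all_nulls = [None, None, None]

print(ndv_non_null(all_nulls))       # 0
print(ndv_null_as_value(all_nulls))  # 1
```

      The same column yields two different NDV values depending on which definition is applied.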

      This is a minor point, but it would be good to use a single definition everywhere. That way, if we use the "number of non-null distinct values" rule, the "adjusted NDV" is always one more than the "raw" NDV. As it is now, we can't be sure when to add the null adjustment, because we don't know whether the null is already included.
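
      As a rough illustration of why a single definition matters, the following Python sketch applies the "one more than raw" adjustment to both raw values from the all-null column. The helper `adjusted_ndv` is hypothetical and only models the rule stated above. It is not the planner's actual logic.

```python
def adjusted_ndv(raw_ndv):
    """Add one for NULL, assuming raw_ndv counts only non-null values."""
    return raw_ndv + 1


raw_from_stats = 0      # all-null column under the non-null definition
raw_from_constants = 1  # same column, with NULL already counted as a value

print(adjusted_ndv(raw_from_stats))      # 1
print(adjusted_ndv(raw_from_constants))  # 2: NULL has been counted twice
```

      The adjustment is only safe when the caller knows the raw NDV excludes NULL, which is the guarantee a single definition would provide.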


          People

            Assignee: Unassigned
            Reporter: Paul Rogers