IMPALA-7603

Incorrect NDV expression for col1 mathop col2


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      Consider the ExprNdvTest test case. The code contains tests for the CASE expression. Add tests for simple arithmetic expressions:

          verifyNdv("id + 2", 7300);
          verifyNdv("id * 2", 7300);
      

      The above suggests that the NDV of a column op const is

      max(NDV(column), NDV(const)) =
      max(NDV(column), 1) = NDV(column)
      

      This is good and as expected.
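
      As a minimal sketch of the rule these results imply (this is not Impala's actual implementation; the class and method names are hypothetical), the estimate for an expression can be taken as the largest NDV among its operands, with a constant contributing an NDV of 1:

```java
public class MaxNdvSketch {
    // Hypothetical helper: estimate the NDV of an expression as the
    // maximum of its operands' NDVs.
    static long maxNdv(long... operandNdvs) {
        long result = 1; // a constant has exactly one distinct value
        for (long ndv : operandNdvs) {
            result = Math.max(result, ndv);
        }
        return result;
    }

    public static void main(String[] args) {
        long idNdv = 7300;  // NDV of "id", as implied by the tests above
        long constNdv = 1;  // NDV of the literal 2
        System.out.println(maxNdv(idNdv, constNdv)); // prints 7300
    }
}
```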

      Now try two columns:

          verifyNdv("id + int_col", 7300);
          verifyNdv("id * int_col", 7300);
      

      This is not expected. Though the two columns are from the same table, they are not correlated: there is no reason to believe that the value of "id" determines the value of "int_col" in the general case. (Perhaps the table is the Cartesian product of the two fields.)

      In this case, the calculation should be:

      NDV(a op b) = NDV(a) * NDV(b)
      

      Some back-off may be needed to account for overlapping results (different input pairs that produce the same value). I could not readily find a reference for these calculations.
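
      To make the two estimates concrete, here is a minimal sketch contrasting the max-based result the tests currently show with the product-based estimate proposed above. It is not Impala's implementation: the class, the method names, the sample numbers, and the row-count cap (one possible form of the back-off mentioned above) are all assumptions made for illustration. It also assumes both NDVs are known and non-negative.

```java
public class ProductNdvSketch {
    // Estimate implied by the current test results: the larger operand NDV wins.
    static long maxNdv(long ndvA, long ndvB) {
        return Math.max(ndvA, ndvB);
    }

    // Proposed estimate for uncorrelated operands: NDV(a) * NDV(b).
    // Saturates at Long.MAX_VALUE instead of overflowing.
    static long productNdv(long ndvA, long ndvB) {
        try {
            return Math.multiplyExact(ndvA, ndvB);
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;
        }
    }

    // Assumed back-off: an expression evaluated over a table's rows cannot
    // yield more distinct values than the table has rows.
    static long cappedProductNdv(long ndvA, long ndvB, long rowCount) {
        return Math.min(productNdv(ndvA, ndvB), rowCount);
    }

    public static void main(String[] args) {
        long ndvA = 100;      // hypothetical NDV of column a
        long ndvB = 10;       // hypothetical NDV of column b
        long rowCount = 500;  // hypothetical table cardinality

        System.out.println(maxNdv(ndvA, ndvB));                     // prints 100
        System.out.println(productNdv(ndvA, ndvB));                 // prints 1000
        System.out.println(cappedProductNdv(ndvA, ndvB, rowCount)); // prints 500
    }
}
```

      With these made-up numbers the max-based rule reports 100 distinct values, while the product rule reports 1000 before the cap and 500 after it.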


          People

            Assignee: Unassigned
            Reporter: Paul Rogers