IMPALA-12371

Add better cardinality estimation for Iceberg V2 tables with deletes

    Description

      IMPALA-11797 is about the generic case, i.e. better cardinality for all ANTI JOIN operators.

      For Iceberg V2 tables we can safely produce a better cardinality estimate, because when there is no filtering we can assume that every row on the RHS has a match on the LHS. Note, however, that the RHS might contain duplicate rows, see:
      https://github.com/apache/iceberg/blob/462a203e67dd42d111a7fd2d3a0090b5aeb80833/api/src/main/java/org/apache/iceberg/RowDelta.java#L132-L133

      So we can come up with something like this:
      Cardinality of DELETE operator = Cardinality(LHS) - (Cardinality(RHS) * selectivity of LHS)
      with a safety check in case the result becomes negative (which can happen due to duplicates in the RHS).
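
The proposed formula can be sketched as follows. This is only an illustration, not Impala planner code. The function name `estimate_delete_cardinality` and its parameters are hypothetical. It assumes that `lhs_card` is the estimated LHS row count after filtering, that `lhs_selectivity` is the fraction of LHS rows surviving those filters, and that clamping at zero is an acceptable form of the safety check.

```python
def estimate_delete_cardinality(lhs_card: int, rhs_card: int,
                                lhs_selectivity: float) -> int:
    """Hypothetical sketch of the estimate proposed in this ticket.

    lhs_card:        estimated rows produced by the LHS (after filtering)
    rhs_card:        estimated rows produced by the RHS (may contain duplicates)
    lhs_selectivity: fraction of LHS rows that survive filtering, in [0, 1]
    """
    if lhs_card < 0 or rhs_card < 0:
        raise ValueError("cardinalities must be non-negative")
    if not 0.0 <= lhs_selectivity <= 1.0:
        raise ValueError("selectivity must be in [0, 1]")

    # With no filtering every RHS row is assumed to match an LHS row; with
    # filtering, scale the RHS by the LHS selectivity to approximate how many
    # RHS rows still have a surviving match.
    matching_rhs_rows = round(rhs_card * lhs_selectivity)

    # Duplicates in the RHS can push the difference below zero, so clamp it.
    return max(0, lhs_card - matching_rhs_rows)


# No filtering: 200 RHS rows remove 200 of the 1000 LHS rows.
print(estimate_delete_cardinality(1000, 200, 1.0))
# Heavy duplication in the RHS: the raw difference is negative, so clamp to 0.
print(estimate_delete_cardinality(100, 5000, 0.1))
```

Clamping to zero is the simplest possible guard. A real implementation might prefer a different lower bound, which this ticket does not specify.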

          People

            boroknagyz Zoltán Borók-Nagy