IMPALA-7560

Better selectivity estimate for != (not equals) binary predicate

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.12.0, Impala 2.13.0
    • Fix Version/s: Impala 4.1.0, Impala 4.0.1
    • Component/s: Frontend
    • Labels: None

    Description

      Currently, we use the default selectivity estimate for any binary predicate whose operator is not EQ or NOT_DISTINCT.

          // Determine selectivity
          // TODO: Compute selectivity for nested predicates.
          // TODO: Improve estimation using histograms.
          Reference<SlotRef> slotRefRef = new Reference<SlotRef>();
          if ((op_ == Operator.EQ || op_ == Operator.NOT_DISTINCT)
              && isSingleColumnPredicate(slotRefRef, null)) {
            long distinctValues = slotRefRef.getRef().getNumDistinctValues();
            if (distinctValues > 0) {
              selectivity_ = 1.0 / distinctValues;
              selectivity_ = Math.max(0, Math.min(1, selectivity_));
            }
          }
      

      This can badly underestimate the cardinality. In the example below, the scan returns 20 rows but the planner estimates only 3:

      [localhost:21000] tpch> select * from nation where n_regionkey != 1;
      [localhost:21000] tpch> summary;
      +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
      | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail      |
      +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
      | 00:SCAN HDFS | 1      | 3.32ms   | 3.32ms   | 20    | 3          | 143.00 KB | 16.00 MB      | tpch.nation |
      +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
      [localhost:21000] tpch> 
      

      Ideally, we would invert the equality selectivity for !=. Here the equality selectivity is 1/5, so the != selectivity would be 4/5 (= 1 - 1/5), which gives a much better estimate.
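
      The proposed inversion can be sketched in a few lines of Java. This is an illustration only, not the patch that resolved this issue: the class and method names are hypothetical, the default selectivity of 0.1 is an assumption chosen to be consistent with the estimate of 3 rows shown above, the row count of 25 is the size of the standard TPC-H nation table, and NULL values are ignored.

```java
public final class NeSelectivitySketch {
    // Assumed fallback selectivity when no better estimate is available.
    public static final double DEFAULT_SELECTIVITY = 0.1;

    // Selectivity of "col = const": 1 / NDV, clamped to [0, 1].
    // Returns -1 when the number of distinct values is unknown.
    public static double eqSelectivity(long distinctValues) {
        if (distinctValues <= 0) return -1.0;
        return Math.max(0.0, Math.min(1.0, 1.0 / distinctValues));
    }

    // Selectivity of "col != const": the complement of the equality estimate.
    // Returns -1 when the number of distinct values is unknown.
    public static double neSelectivity(long distinctValues) {
        double eq = eqSelectivity(distinctValues);
        return eq < 0 ? -1.0 : 1.0 - eq;
    }

    // Estimated output rows; falls back to the default for unknown selectivity.
    public static long estimateRows(long tableRows, double selectivity) {
        double sel = selectivity < 0 ? DEFAULT_SELECTIVITY : selectivity;
        return Math.round(tableRows * sel);
    }

    public static void main(String[] args) {
        long nationRows = 25;   // TPC-H nation
        long regionKeyNdv = 5;  // implied by the 1/5 equality selectivity above

        // Current behavior for !=: default selectivity.
        System.out.println(estimateRows(nationRows, -1.0));                        // prints 3
        // Proposed behavior for !=: 1 - 1/NDV.
        System.out.println(estimateRows(nationRows, neSelectivity(regionKeyNdv))); // prints 20
    }
}
```

      With the inverted estimate the scan is predicted to return 20 rows, which matches the observed #Rows in the summary above. A fuller version would also have to account for NULLs, since rows where the column is NULL do not satisfy !=.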

            People

              Assignee: liuyao
              Reporter: Bharath Vissapragada (bharathv)
              Votes: 0
              Watchers: 7