IMPALA-7604

In AggregationNode.computeStats, handle cardinality overflow better

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 3.4.0
    • Component/s: Frontend
    • Labels: None

    Description

      Consider the cardinality overflow logic in AggregationNode.computeStats(). Current code:

          // if we ended up with an overflow, the estimate is certain to be wrong
          if (cardinality_ < 0) cardinality_ = -1;
      

      This code has a number of issues.

      • The check is done only after looping over all conjuncts. By then the value may have overflowed more than once and wrapped back to a non-negative number, in which case the check misses the overflow entirely. The check should be done after each multiplication.
      • Since we know that the number overflowed, a better estimate of the total count is Long.MAX_VALUE.
      • The code later checks for the -1 value and, if found, uses the cardinality of the first child. This is a worse estimate than using the max value, since the first child might have a low cardinality (the later children may be the ones that caused the overflow).
      • If we really do expect overflow, then we are dealing with very large numbers. Being accurate to the row is not needed. Better to use a double which can handle the large values.
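
      To illustrate the points above, here is a minimal sketch, not the actual Impala code. The class and method names are hypothetical, and it assumes the estimate is a product of non-negative per-step factors. It contrasts the current end-of-loop check with two alternatives: saturating at Long.MAX_VALUE after each multiplication, and accumulating in a double.

```java
final class CardinalityOverflowSketch {
  // Mirrors the current pattern: multiply everything, check the sign once at the end.
  // If the product wraps around to a non-negative value, the check does not fire.
  static long estimateUnchecked(long[] factors) {
    long cardinality = 1;
    for (long f : factors) {
      cardinality *= f;
    }
    if (cardinality < 0) cardinality = -1;
    return cardinality;
  }

  // Hypothetical helper: multiplies two non-negative values and saturates
  // at Long.MAX_VALUE instead of wrapping around.
  static long multiplySaturating(long a, long b) {
    try {
      return Math.multiplyExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  // Overflow is handled after each multiplication, so it cannot be missed.
  static long estimateSaturating(long[] factors) {
    long cardinality = 1;
    for (long f : factors) {
      cardinality = multiplySaturating(cardinality, f);
    }
    return cardinality;
  }

  // Accumulates in a double, which trades row-level precision for range,
  // then clamps when converting back to long.
  static long estimateWithDouble(long[] factors) {
    double cardinality = 1.0;
    for (long f : factors) {
      cardinality *= f;
    }
    return cardinality >= (double) Long.MAX_VALUE ? Long.MAX_VALUE : (long) cardinality;
  }

  public static void main(String[] args) {
    // 2^32 * 2^32 = 2^64, which wraps to 0 in a long, so the sign check misses it.
    long[] huge = {1L << 32, 1L << 32};
    System.out.println("unchecked:  " + estimateUnchecked(huge));
    System.out.println("saturating: " + estimateSaturating(huge));
    System.out.println("double:     " + estimateWithDouble(huge));
  }
}
```

      With inputs that overflow, both alternatives report Long.MAX_VALUE, which at least preserves the fact that the result is huge. Negative inputs (such as a -1 "unknown" marker) are outside what this sketch handles and would need to be filtered out first.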

      Since overflow probably seldom occurs, this is not an urgent issue. Still, if overflow does occur, the query is huge, and having at least some estimate of the hugeness is better than none. Also, it seems that this code probably evolved; this newbie is looking at it fresh and seeing that the accumulated fixes could be tidied up.

          People

            Assignee: Tim Armstrong (tarmstrong)
            Reporter: Paul Rogers (Paul.Rogers)