IMPALA-7528

Division by zero when computing cardinalities of many to many joins on NULL columns

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Frontend

      Description

      The following plan, in which the join cardinality estimate is saturated at the maximum 64-bit integer value:

      | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                     |
      | Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.94MB    |
      | 02:HASH JOIN [INNER JOIN, BROADCAST]                               |
      | |  hash predicates: b.code = a.code                                |
      | |  fk/pk conjuncts: none                                           |
      | |  runtime filters: RF000 <- a.code                                |
      | |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB |
      | |  tuple-ids=1,0 row-size=163B cardinality=9223372036854775807     |   <==== Estimation due to overflow.
      | |                                                                  |
      | |--03:EXCHANGE [BROADCAST]                                         |
      | |  |  mem-estimate=0B mem-reservation=0B                           |
      | |  |  tuple-ids=0 row-size=82B cardinality=823                     |
      | |  |                                                               |
      | |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                  |
      | |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B     |
      | |  00:SCAN HDFS [default.sample_07 a, RANDOM]                      |
      | |     partitions=1/1 files=1 size=44.98KB                          |
      | |     stats-rows=823 extrapolated-rows=disabled                    |
      | |     table stats: rows=823 size=44.98KB                           |
      | |     column stats: all                                            |
      | |     mem-estimate=32.00MB mem-reservation=0B                      |
      | |     tuple-ids=0 row-size=82B cardinality=823                     |
      | |                                                                  |
      | 01:SCAN HDFS [default.sample_08 b, RANDOM]                         |
      |    partitions=1/1 files=1 size=44.99KB                             |
      |    runtime filters: RF000 -> b.code                                |
      |    stats-rows=823 extrapolated-rows=disabled                       |
      |    table stats: rows=823 size=44.99KB                              |
      |    column stats: all                                               |
      |    mem-estimate=32.00MB mem-reservation=0B                         |
      |    tuple-ids=1 row-size=82B cardinality=823                        |
      +--------------------------------------------------------------------+
      

      is the result of both join columns having an NDV (number of distinct values) of 0.
      The cardinality computation at
      https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L368
      should handle this case more gracefully.
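
      To illustrate how a zero NDV can produce the cardinality shown in the plan, here is a minimal sketch. It assumes the estimate has the general form lhsCard * rhsCard / max(lhsNdv, rhsNdv) computed in floating point; the class and method names are hypothetical and are not Impala's actual code. The guard shown (returning -1 to mean "unknown") is only one possible way to handle the case, not necessarily the fix that was committed.

```java
public class JoinCardinalitySketch {

    // Hypothetical stand-in for a many-to-many join cardinality estimate.
    // With both NDVs equal to 0 the division yields +Infinity, and
    // Math.round(+Infinity) saturates to Long.MAX_VALUE.
    static long naiveEstimate(long lhsCard, long rhsCard, double lhsNdv, double rhsNdv) {
        double joinCard = (double) lhsCard * rhsCard / Math.max(lhsNdv, rhsNdv);
        return Math.round(joinCard);
    }

    // Same estimate with an explicit guard: when neither side has a usable
    // NDV, report -1 (assumed here to mean "unknown") instead of dividing by zero.
    static long guardedEstimate(long lhsCard, long rhsCard, double lhsNdv, double rhsNdv) {
        double maxNdv = Math.max(lhsNdv, rhsNdv);
        if (maxNdv <= 0) {
            return -1;
        }
        return Math.round((double) lhsCard * rhsCard / maxNdv);
    }

    public static void main(String[] args) {
        // Row counts taken from the plan above (823 rows on each side).
        System.out.println(naiveEstimate(823, 823, 0, 0));     // 9223372036854775807
        System.out.println(guardedEstimate(823, 823, 0, 0));   // -1
        System.out.println(guardedEstimate(823, 823, 823, 823)); // 823
    }
}
```

      The first call reproduces the cardinality=9223372036854775807 value from the plan: 823 * 823 divided by 0.0 is positive infinity, which rounds to the largest long.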

      IMPALA-7310 makes it a bit more likely that someone will run into this.

            People

            • Assignee: Bikramjeet Vig (bikramjeet.vig)
            • Reporter: Balazs Jeszenszky (jeszyb)