IMPALA-3238: Streaming pre-aggregation falls over with high cardinality aggs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.5.1
    • Component/s: Backend
    • Labels:
      None

      Description

      If the cardinality of a streaming pre-aggregation is quite large (e.g. ~1B distinct keys in a single fragment), I see the following behavior:

      • Each of the 16 partitions ends up with about 62M entries (1B / 16).
      • The target hash table load factor is 0.75, so 62M entries need roughly 83M buckets; 64M buckets is not enough, and the table wants to expand to the next power of two, 128M buckets (the sketch after this list walks through the arithmetic).
      • sizeof(Bucket) is 16 bytes, so a 128M-bucket array is 2GB.
      • BufferedBlockMgr::ConsumeMemory logs a warning and fails when trying to allocate the 2GB array (IMPALA-1619).
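
      A back-of-the-envelope sketch of that sizing arithmetic, using the constants from the list above (16 partitions, 0.75 load factor, 16-byte buckets); the helper names are illustrative, not Impala's actual code:

```cpp
// Sizing math for the per-partition hash table described above. The
// constants come from the bug description; names are illustrative only.
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Smallest power of two >= n (hash tables size their bucket arrays in
// powers of two).
uint64_t NextPowerOfTwo(uint64_t n) {
  uint64_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

int main() {
  const uint64_t kCardinality = 1000000000ULL;  // ~1B entries in the fragment
  const uint64_t kNumPartitions = 16;
  const double kMaxLoadFactor = 0.75;
  const uint64_t kBucketBytes = 16;  // sizeof(Bucket)

  uint64_t entries = kCardinality / kNumPartitions;  // ~62.5M per partition
  // 62.5M / 0.75 ~= 83.3M buckets needed, rounded up to a power of two.
  uint64_t buckets =
      NextPowerOfTwo(static_cast<uint64_t>(entries / kMaxLoadFactor));
  uint64_t bytes = buckets * kBucketBytes;  // 128M * 16 bytes = 2GB

  printf("entries per partition: %" PRIu64 "\n", entries);
  printf("buckets after expand:  %" PRIu64 "\n", buckets);
  printf("bucket array size:     %" PRIu64 " bytes (%" PRIu64 " GB)\n",
         bytes, bytes >> 30);
  return 0;
}
```

      Compiled with any C++11 compiler, this reproduces the 2GB figure from the description: 62.5M entries force a jump from 64M to 128M buckets, and the bucket array alone is then 2GB per partition.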

      On every row batch, the partitioned aggregation node tries again to expand the hash table, producing one log message per row batch. Each message also includes the output of GetStackTrace(), which is quite slow. This flood of logging makes the query effectively hang; a sketch of the pattern follows.
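
      For illustration, a minimal sketch of that retry pattern, with hypothetical names standing in for the real aggregation-node code:

```cpp
// Hypothetical sketch of the failure mode described above: expansion is
// retried for every row batch, and every failure logs a warning with a
// (slow) stack trace attached. None of these names are Impala's actual code.
#include <iostream>
#include <string>

// Stand-in for GetStackTrace(); the real function walks and symbolizes
// the stack, which is expensive to do once per row batch.
std::string GetStackTrace() { return "<stack trace>"; }

struct HashTable {
  // The 2GB bucket array allocation keeps failing, so expansion never
  // succeeds and the same attempt repeats forever.
  bool TryExpand() { return false; }
};

void ProcessBatch(HashTable* ht) {
  if (!ht->TryExpand()) {
    // One warning per row batch, each paying the stack-trace cost.
    std::cerr << "WARNING: failed to expand hash table: memory limit exceeded\n"
              << GetStackTrace() << "\n";
  }
  // ... aggregate the batch into the existing, overfull table ...
}

int main() {
  HashTable ht;
  // At ~1B rows and 1024 rows per batch, this loop runs roughly a million
  // times in the real query, so logging dominates and the query appears
  // to hang.
  for (int batch = 0; batch < 3; ++batch) ProcessBatch(&ht);
  return 0;
}
```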

    People

    • Assignee: tarmstrong (Tim Armstrong)
    • Reporter: tlipcon (Todd Lipcon)
