IMPALA-3238

Streaming pre-aggregation falls over with high cardinality aggs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.5.1
    • Component/s: Backend
    • Labels: None

    Description

      If the cardinality of a streaming pre-aggregation is quite large (e.g. ~1B distinct groups in a single fragment), I see the following behavior:

      • Each of the 16 partitions ends up with about 62M entries (1B / 16).
      • The target hash table load factor is 0.75, so 64M buckets (room for only 48M entries) is not enough for this many entries; the table wants to expand to 128M buckets.
      • sizeof(Bucket) is 16 bytes, so 128M buckets is a 2GB bucket array (see the sketch after this list).
      • BufferedBlockMgr::ConsumeMemory logs a warning and fails when trying to allocate the 2GB array (IMPALA-1619).
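
      A minimal sketch of the arithmetic above, assuming the constants in this report; the doubling growth policy and the 16-byte Bucket size come from the bullets, but the code itself is illustrative, not Impala's implementation:

        #include <cstdint>
        #include <iostream>

        int main() {
          const uint64_t kTotalEntries = 1000000000ULL;  // ~1B entries in one fragment
          const int kNumPartitions = 16;
          const double kMaxLoadFactor = 0.75;            // target hash table load factor
          const uint64_t kBucketBytes = 16;              // sizeof(Bucket) per this report

          const uint64_t entries = kTotalEntries / kNumPartitions;  // ~62M per partition

          // Double the bucket count until the entries fit under the load factor.
          uint64_t buckets = 1ULL << 20;
          while (buckets * kMaxLoadFactor < entries) buckets <<= 1;

          std::cout << "entries per partition: " << entries / 1000000 << "M\n";  // 62M
          std::cout << "buckets needed:        " << (buckets >> 20) << "M\n";    // 128M
          std::cout << "bucket array:          "
                    << ((buckets * kBucketBytes) >> 30) << "GB\n";               // 2GB
        }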

      On every subsequent row batch, the partitioned aggregation node retries the hash table expansion, producing one log message per row batch. Each message also includes GetStackTrace(), which is quite slow, so the flood of logging makes the query essentially hang. A sketch of this loop follows.
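
      The retry loop can be sketched as below; GetStackTrace, TryConsumeMemory, and ProcessBatch are hypothetical stand-ins for the real code paths, kept only to show why an unguarded per-batch warning that captures a stack trace effectively stalls the query:

        #include <iostream>
        #include <string>

        // Stand-in: capturing a stack trace is slow because every frame must be
        // walked and symbolized.
        std::string GetStackTrace() { return "  <many symbolized frames>"; }

        // Stand-in for BufferedBlockMgr::ConsumeMemory refusing the allocation;
        // once the 2GB request fails, it fails the same way on every retry.
        bool TryConsumeMemory(long long /*bytes*/) { return false; }

        // Called once per row batch by the aggregation node.
        void ProcessBatch(int batch_id) {
          const long long kBucketArrayBytes = 2LL << 30;  // the 2GB expansion
          if (!TryConsumeMemory(kBucketArrayBytes)) {
            // Emitted for *every* batch: one warning plus one slow stack trace,
            // which is the log flood described above.
            std::cerr << "WARNING batch " << batch_id << ": failed to allocate "
                      << kBucketArrayBytes << " bytes\n"
                      << GetStackTrace() << "\n";
          }
        }

        int main() {
          for (int i = 0; i < 3; ++i) ProcessBatch(i);  // real queries: millions of batches
        }

      Remembering that the expansion already failed (or rate-limiting the warning) would avoid paying the stack-trace cost once per batch.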


    People

    • Assignee: tarmstrong (Tim Armstrong)
    • Reporter: tlipcon (Todd Lipcon)
