IMPALA-2986: Aggregation spill loop gives up too early, leading to mem limit exceeded errors


    Description

      As noticed by Mostafa:

      When looking at the profile, I can't see why the query failed; the aggregations should have spilled more partitions and released memory. Also, the error produced seems wrong.

      Memory limit exceeded
      The memory limit is set too low to initialize spilling operator (id=1). The minimum required memory to spill this operator is 264.00 MB.
      
          Query Options (non default): MEM_LIMIT=2147483648,REQUEST_POOL=HighThrpt
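
      The 264.00 MB minimum quoted in the error is consistent with the 16-way partition fanout and the 8 MB block size visible in the profile below, assuming (hypothetically; this is a back-of-envelope reading, not taken from the Impala source) two pinned blocks per partition plus one scratch block:

      // Back-of-envelope check; the constants come from the profile, while the
      // "2 blocks per partition + 1" formula is an assumption for illustration.
      #include <cstdint>
      #include <cstdio>

      int main() {
        const int64_t kBlockSize = 8LL << 20;   // MaxBlockSize: 8.00 MB
        const int64_t kPartitions = 16;         // PartitionsCreated: 16 (id=3)
        const int64_t min_reservation = (kPartitions * 2 + 1) * kBlockSize;
        // Prints "min spill reservation: 264.00 MB", matching the error text.
        printf("min spill reservation: %.2f MB\n",
               min_reservation / (1024.0 * 1024));
        return 0;
      }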
      
      04:EXCHANGE [UNPARTITIONED]
      |  hosts=72 per-host-mem=unavailable
      |  tuple-ids=1 row-size=80B cardinality=2281794842
      |
      03:AGGREGATE [FINALIZE]
      |  output: count:merge(*)
      |  group by: o_orderkey, o_comment
      |  having: count(*) > 1000
      |  hosts=72 per-host-mem=188.18GB
      |  tuple-ids=1 row-size=80B cardinality=2281794842
      |
      02:EXCHANGE [HASH(o_orderkey,o_comment)]
      |  hosts=72 per-host-mem=0B
      |  tuple-ids=1 row-size=80B cardinality=2281794842
      |
      01:AGGREGATE
      |  output: count(*)
      |  group by: o_orderkey, o_comment
      |  hosts=72 per-host-mem=188.18GB
      |  tuple-ids=1 row-size=80B cardinality=2281794842
      |
      00:SCAN HDFS [tpch_10000_decimal_parquet.orders, RANDOM]
         partitions=366/2406 files=732 size=100.30GB
         table stats: 15000000000 rows total
         column stats: all
         hosts=72 per-host-mem=176.00MB
         tuple-ids=0 row-size=72B cardinality=2281794842
      Per Node Peak Memory Usage:
         e1406.halxg.cloudera.com:22000 (1.70 GB)
         e1411.halxg.cloudera.com:22000 (1.72 GB)
         e1208.halxg.cloudera.com:22000 (1.72 GB)
         e1203.halxg.cloudera.com:22000 (1.76 GB)
         e1216.halxg.cloudera.com:22000 (1.76 GB)
         e1413.halxg.cloudera.com:22000 (1.73 GB)
         e1103.halxg.cloudera.com:22000 (1.73 GB)
         e1207.halxg.cloudera.com:22000 (1.72 GB)
         BlockMgr:
               - BlockWritesOutstanding: 0 (0)
               - BlocksCreated: 504 (504)
               - BlocksRecycled: 234 (234)
               - BufferedPins: 5 (5)
               - BytesWritten: 2.01 GB (2155872256)
               - MaxBlockSize: 8.00 MB (8388608)
               - MemoryLimit: 1.60 GB (1717986944)
               - PeakMemoryUsage: 1.60 GB (1717829632)
         AGGREGATION_NODE (id=3):(Total: 449.474ms, non-child: 0.000ns, % non-child: 0.00%)
               - BuildTime: 12s636ms
               - GetNewBlockTime: 725.061ms
               - GetResultsTime: 0.000ns
               - HTResizeTime: 1s949ms
               - HashBuckets: 0 (0)
               - LargestPartitionPercent: 0 (0)
               - MaxPartitionLevel: 0 (0)
               - NumRepartitions: 0 (0)
               - PartitionsCreated: 16 (16)
               - PeakMemoryUsage: 1.09 GB (1175338755)
               - PinTime: 0.000ns
               - RowsRepartitioned: 0 (0)
               - RowsReturned: 0 (0)
               - RowsReturnedRate: 0
               - SpilledPartitions: 3 (3)
               - UnpinTime: 18.093ms
      
           AGGREGATION_NODE (id=1):(Total: 33s123ms, non-child: 28s910ms, % non-child: 87.28%)
               - BuildTime: 25s522ms
               - GetNewBlockTime: 2s712ms
               - GetResultsTime: 587.284ms
               - HTResizeTime: 1s568ms
               - HashBuckets: 22.28M (22284086)
               - LargestPartitionPercent: 6 (6)
               - MaxPartitionLevel: 0 (0) 
               - NumRepartitions: 1 (1)
               - PartitionsCreated: 41 (41)
               - PeakMemoryUsage: 1.35 GB (1445108843)
               - PinTime: 306.510ms
               - RowsRepartitioned: 2.03M (2033033)
               - RowsReturned: 10.67M (10666100)
               - RowsReturnedRate: 326.62 K/sec
               - SpilledPartitions: 11 (11)
               - UnpinTime: 1s067ms
      

      Note that not all partitions were spilled (for example, node id=3 reports only 3 of its 16 partitions spilled) and we had more memory available than the required reservation at the time of the error.
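
      To make the suspected failure mode concrete, here is a minimal sketch of the shape such a spill loop arguably should have (all names are hypothetical; this is not Impala's actual code): on an allocation failure, keep unpinning/spilling partitions until enough memory has been released, failing only once no unspilled partitions remain.

      #include <algorithm>
      #include <cstdint>
      #include <vector>

      struct Partition {
        int64_t pinned_bytes = 0;  // memory released if this partition spills
        bool spilled = false;
      };

      // Returns true if 'needed_bytes' could be released by spilling.
      bool SpillUntilFreed(std::vector<Partition>& partitions,
                           int64_t needed_bytes) {
        // Spill the largest partitions first to release memory fastest.
        std::sort(partitions.begin(), partitions.end(),
                  [](const Partition& a, const Partition& b) {
                    return a.pinned_bytes > b.pinned_bytes;
                  });
        int64_t freed = 0;
        for (Partition& p : partitions) {
          if (freed >= needed_bytes) break;
          if (p.spilled) continue;
          freed += p.pinned_bytes;  // unpinning writes blocks out and frees them
          p.pinned_bytes = 0;
          p.spilled = true;
        }
        // The behavior reported above is equivalent to returning false while
        // unspilled partitions still hold memory (giving up after too few
        // iterations), which then surfaces as the misleading "memory limit is
        // set too low to initialize spilling operator" error.
        return freed >= needed_bytes;
      }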

          People

            Assignee: Daniel Hecht (dhecht)
            Reporter: Daniel Hecht (dhecht)
