Uploaded image for project: 'IMPALA'
  2. IMPALA-4858

Provide better explanation for obscure Memory limit exceeded failures



    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 2.9.0
    • Impala 2.9.0
    • Backend


      Today I found two memory limit exceeded failures which are hard to root cause, it is not clear what was consuming the memory at that time and how much memory the query was requesting, how much was available etc..

      ExecPlanRequest rpc query_id=9c43ca499b1c64c1:a6a91d3900000000 instance_id=9c43ca499b1c64c1:a6a91d3900000068 failed: 
      Memory limit exceeded
      Query 9c43ca499b1c64c1:a6a91d3900000000 could not start because the backend Impala daemon is over its memory limit

      It is not clear which Impala daemon didn't have enough memory.

      And this is the full profile for the failed query

      Estimated Per-Host Requirements: Memory=10.00MB VCores=1
      WARNING: The following tables are missing relevant table and/or column statistics.
      |  output: count:merge(*)
      |  hosts=133 per-host-mem=unavailable
      |  tuple-ids=1 row-size=8B cardinality=1
      |  hosts=133 per-host-mem=unavailable
      |  tuple-ids=1 row-size=8B cardinality=1
      |  output: count(*)
      |  hosts=133 per-host-mem=10.00MB
      |  tuple-ids=1 row-size=8B cardinality=1
      00:SCAN HDFS [tpcds_100000_parquet.store_returns, RANDOM]
         partitions=2004/2004 files=7087 size=1.50TB
         table stats: unavailable
         column stats: all
         hosts=133 per-host-mem=0B
         tuple-ids=0 row-size=0B cardinality=unavailable
          Estimated Per-Host Mem: 10485760
          Estimated Per-Host VCores: 1
          Tables Missing Stats: tpcds_100000_parquet.store_returns
          Request Pool: root.systest
          Admission result: Admitted immediately
      Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
      03:AGGREGATE        1   0.000ns   0.000ns      0           1         0        -1.00 B  FINALIZE                       
      02:EXCHANGE         1   0.000ns   0.000ns      0           1         0        -1.00 B  UNPARTITIONED                  
      01:AGGREGATE      133   0.000ns   0.000ns      0           1         0       10.00 MB                                 
      00:SCAN HDFS      133   0.000ns   0.000ns      0          -1         0              0  tpcds_100000_parquet.store_... 
          Planner Timeline: 9.968ms
             - Analysis finished: 1.704ms (1.704ms)
             - Equivalence classes computed: 1.806ms (101.124us)
             - Single node plan created: 7.763ms (5.957ms)
             - Runtime filters computed: 7.794ms (30.470us)
             - Distributed plan created: 7.958ms (164.549us)
             - Lineage info computed: 8.001ms (42.703us)
             - Planning finished: 9.968ms (1.967ms)
          Query Timeline: 1s238ms
             - Query submitted: 415.243us (415.243us)
             - Planning finished: 122.247ms (121.832ms)
             - Submit for admission: 148.126ms (25.878ms)
             - Completed admission: 149.066ms (940.329us)
             - Ready to start 134 fragment instances: 158.642ms (9.576ms)
             - All 134 fragment instances started: 873.469ms (714.826ms)
             - Unregister query: 1s235ms (362.225ms)
           - ComputeScanRangeAssignmentTimer: 12.877ms
           - ClientFetchWaitTimer: 0.000ns
           - RowMaterializationTimer: 0.000ns

      The second is

       AnalysisException: Failed to evaluate expr: 100
      CAUSED BY: InternalException: Memory limit exceeded

      And this is the full profile

       Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE (build e0835699a2a2f2ce288ca306b3d7b020dc93a1e3)
          User: systest@foo
          Connected User: systest@foo
          Delegated User: 
          Network Address: ::ffff:
          Default Db: tpcds_30000_parquet
          Sql Statement: with customer_total_return as
       (select wr_returning_customer_sk as ctr_customer_sk
              ,ca_state as ctr_state, 
       	sum(wr_return_amt) as ctr_total_return
       from web_returns
       where wr_returned_date_sk = d_date_sk 
         and d_year =2002
         and wr_returning_addr_sk = ca_address_sk 
       group by wr_returning_customer_sk
        select  c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
       from customer_total_return ctr1
       where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
       			  from customer_total_return ctr2 
                        	  where ctr1.ctr_state = ctr2.ctr_state)
             and ca_address_sk = c_current_addr_sk
             and ca_state = 'VT'
             and ctr1.ctr_customer_sk = c_customer_sk
       order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
      limit 100
          Coordinator: va1026.foo:22000
          Query Timeline: 34.362ms
             - Query submitted: 24.311us (24.311us)
             - Unregister query: 33.617ms (33.592ms)


        Issue Links



              tarmstrong Tim Armstrong
              mmokhtar Mostafa Mokhtar
              0 Vote for this issue
              3 Start watching this issue