Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4858

Provide better explanation for obscure Memory limit exceeded failures

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 2.9.0
    • Impala 2.9.0
    • Backend

    Description

      Today I found two memory limit exceeded failures which are hard to root cause, it is not clear what was consuming the memory at that time and how much memory the query was requesting, how much was available etc..

      ExecPlanRequest rpc query_id=9c43ca499b1c64c1:a6a91d3900000000 instance_id=9c43ca499b1c64c1:a6a91d3900000068 failed: 
      Memory limit exceeded
      Query 9c43ca499b1c64c1:a6a91d3900000000 could not start because the backend Impala daemon is over its memory limit
      

      It is not clear which Impala daemon didn't have enough memory.

      And this is the full profile for the failed query

      ----------------
      Estimated Per-Host Requirements: Memory=10.00MB VCores=1
      WARNING: The following tables are missing relevant table and/or column statistics.
      tpcds_100000_parquet.store_returns
      
      PLAN-ROOT SINK
      |
      03:AGGREGATE [FINALIZE]
      |  output: count:merge(*)
      |  hosts=133 per-host-mem=unavailable
      |  tuple-ids=1 row-size=8B cardinality=1
      |
      02:EXCHANGE [UNPARTITIONED]
      |  hosts=133 per-host-mem=unavailable
      |  tuple-ids=1 row-size=8B cardinality=1
      |
      01:AGGREGATE
      |  output: count(*)
      |  hosts=133 per-host-mem=10.00MB
      |  tuple-ids=1 row-size=8B cardinality=1
      |
      00:SCAN HDFS [tpcds_100000_parquet.store_returns, RANDOM]
         partitions=2004/2004 files=7087 size=1.50TB
         table stats: unavailable
         column stats: all
         hosts=133 per-host-mem=0B
         tuple-ids=0 row-size=0B cardinality=unavailable
      ----------------
          Estimated Per-Host Mem: 10485760
          Estimated Per-Host VCores: 1
          Tables Missing Stats: tpcds_100000_parquet.store_returns
          Request Pool: root.systest
          Admission result: Admitted immediately
          ExecSummary: 
      Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
      ----------------------------------------------------------------------------------------------------------------------
      03:AGGREGATE        1   0.000ns   0.000ns      0           1         0        -1.00 B  FINALIZE                       
      02:EXCHANGE         1   0.000ns   0.000ns      0           1         0        -1.00 B  UNPARTITIONED                  
      01:AGGREGATE      133   0.000ns   0.000ns      0           1         0       10.00 MB                                 
      00:SCAN HDFS      133   0.000ns   0.000ns      0          -1         0              0  tpcds_100000_parquet.store_... 
          Planner Timeline: 9.968ms
             - Analysis finished: 1.704ms (1.704ms)
             - Equivalence classes computed: 1.806ms (101.124us)
             - Single node plan created: 7.763ms (5.957ms)
             - Runtime filters computed: 7.794ms (30.470us)
             - Distributed plan created: 7.958ms (164.549us)
             - Lineage info computed: 8.001ms (42.703us)
             - Planning finished: 9.968ms (1.967ms)
          Query Timeline: 1s238ms
             - Query submitted: 415.243us (415.243us)
             - Planning finished: 122.247ms (121.832ms)
             - Submit for admission: 148.126ms (25.878ms)
             - Completed admission: 149.066ms (940.329us)
             - Ready to start 134 fragment instances: 158.642ms (9.576ms)
             - All 134 fragment instances started: 873.469ms (714.826ms)
             - Unregister query: 1s235ms (362.225ms)
           - ComputeScanRangeAssignmentTimer: 12.877ms
        ImpalaServer:
           - ClientFetchWaitTimer: 0.000ns
           - RowMaterializationTimer: 0.000ns
      

      The second is

       AnalysisException: Failed to evaluate expr: 100
      CAUSED BY: InternalException: Memory limit exceeded
      

      And this is the full profile

       Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE (build e0835699a2a2f2ce288ca306b3d7b020dc93a1e3)
          User: systest@foo
          Connected User: systest@foo
          Delegated User: 
          Network Address: ::ffff:10.17.187.36:55556
          Default Db: tpcds_30000_parquet
          Sql Statement: with customer_total_return as
       (select wr_returning_customer_sk as ctr_customer_sk
              ,ca_state as ctr_state, 
       	sum(wr_return_amt) as ctr_total_return
       from web_returns
           ,date_dim
           ,customer_address
       where wr_returned_date_sk = d_date_sk 
         and d_year =2002
         and wr_returning_addr_sk = ca_address_sk 
       group by wr_returning_customer_sk
               ,ca_state)
        select  c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
             ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
             ,c_last_review_date,ctr_total_return
       from customer_total_return ctr1
           ,customer_address
           ,customer
       where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
       			  from customer_total_return ctr2 
                        	  where ctr1.ctr_state = ctr2.ctr_state)
             and ca_address_sk = c_current_addr_sk
             and ca_state = 'VT'
             and ctr1.ctr_customer_sk = c_customer_sk
       order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
                        ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
                        ,c_last_review_date,ctr_total_return
      limit 100
          Coordinator: va1026.foo:22000
          Query Timeline: 34.362ms
             - Query submitted: 24.311us (24.311us)
             - Unregister query: 33.617ms (33.592ms)
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            tarmstrong Tim Armstrong
            mmokhtar Mostafa Mokhtar
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment