Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: Impala 2.10.0
- Labels: ghx-label-4
Description
A user reported this error message:
ExecQueryFInstances rpc query_id=c94288312d6d4055:bbfa166500000000 failed: Failed to get minimum memory reservation of 26.69 MB on daemon hodor-030.edh.cloudera.com:22000 for query c94288312d6d4055:bbfa166500000000 because it would exceed an applicable memory limit. Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error. Memory usage: Process: Limit=96.00 GB Total=16.54 GB Peak=83.37 GB
It turns out that a query was using up a lot of reservation, but it wasn't immediately apparent that the process reservation was mostly allocated to that query.
Process: Limit=96.00 GB Total=12.20 GB Peak=83.37 GB
  Buffer Pool: Free Buffers: Total=208.00 MB
  Buffer Pool: Clean Pages: Total=7.19 GB
  Buffer Pool: Unused Reservation: Total=-76.79 GB
  Free Disk IO Buffers: Total=1.37 GB Peak=1.37 GB
  RequestPool=root.default: Total=76.81 GB Peak=77.56 GB
    Query(464a9afdbf2646cf:d9e2d41100000000): Reservation=76.80 GB ReservationLimit=76.80 GB OtherMemory=6.69 MB Total=76.81 GB Peak=76.93 GB
      Fragment 464a9afdbf2646cf:d9e2d4110000003f: Reservation=76.80 GB OtherMemory=6.69 MB Total=76.81 GB Peak=76.81 GB
        SELECT_NODE (id=3): Total=20.00 KB Peak=9.02 MB
          Exprs: Total=4.00 KB Peak=4.00 KB
        ANALYTIC_EVAL_NODE (id=2): Reservation=4.00 MB OtherMemory=6.64 MB Total=10.64 MB Peak=15.04 MB
          Exprs: Total=4.00 KB Peak=4.00 KB
        SORT_NODE (id=1): Reservation=76.79 GB OtherMemory=16.00 KB Total=76.79 GB Peak=76.80 GB
        EXCHANGE_NODE (id=4): Total=0 Peak=0
          DataStreamRecvr: Total=0 Peak=10.19 MB
        DataStreamSender (dst_id=5): Total=1.48 KB Peak=1.48 KB
        CodeGen: Total=1.68 KB Peak=710.00 KB
      Fragment 464a9afdbf2646cf:d9e2d41100000016: Reservation=0 OtherMemory=0 Total=0 Peak=389.69 MB
        HDFS_SCAN_NODE (id=0): Total=0 Peak=388.54 MB
        DataStreamSender (dst_id=4): Total=0 Peak=1.23 MB
        CodeGen: Total=0 Peak=49.00 KB
  Untracked Memory: Total=3.42 GB
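For illustration only, here is a minimal sketch of how a dump like the one above could be summarized so the dominant reservation holder is called out directly. This is not Impala's actual MemTracker code; QueryReservation and SummarizeReservations are hypothetical names, and the figures in main() are copied from the dump above.

```cpp
// Hypothetical sketch: given per-query buffer pool reservations, report the
// single largest holder and its share of all query reservations.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

struct QueryReservation {
  std::string query_id;    // e.g. "464a9afdbf2646cf:d9e2d41100000000"
  int64_t reserved_bytes;  // buffer pool reservation held by the query
};

// Builds a one-line summary naming the top reservation holder and its share
// of the total reservation held by all running queries.
std::string SummarizeReservations(std::vector<QueryReservation> queries) {
  if (queries.empty()) return "No queries are holding buffer pool reservations.";
  std::sort(queries.begin(), queries.end(),
            [](const QueryReservation& a, const QueryReservation& b) {
              return a.reserved_bytes > b.reserved_bytes;
            });
  int64_t total = 0;
  for (const auto& q : queries) total += q.reserved_bytes;
  const QueryReservation& top = queries.front();
  double gb = top.reserved_bytes / (1024.0 * 1024.0 * 1024.0);
  double pct = total == 0 ? 0.0 : 100.0 * top.reserved_bytes / total;
  char buf[256];
  std::snprintf(buf, sizeof(buf),
                "Top reservation holder: query %s with %.2f GB "
                "(%.0f%% of all query reservations).",
                top.query_id.c_str(), gb, pct);
  return buf;
}

int main() {
  // Single entry because the dump above shows one query holding 76.80 GB.
  std::vector<QueryReservation> queries = {
      {"464a9afdbf2646cf:d9e2d41100000000", int64_t(76.80 * (1LL << 30))},
  };
  std::cout << SummarizeReservations(queries) << std::endl;
  return 0;
}
```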
When a user or admin sees this problem, they want to know immediately:
- What resource is exhausted (i.e. the process-wide reservation)?
- Which query (or queries) is using it, and how do I kill it (i.e. what are the query IDs and coordinators of those queries)?
We should think through the error messages and diagnostics and improve them.
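As a starting point for that discussion, the sketch below shows what a more actionable message could contain: the exhausted resource, the queries holding the most of it, and the coordinator each query runs on. FormatOversubscribedError and OffendingQuery are illustrative names rather than existing Impala code, and the coordinator address used in main() is an assumption, not taken from the report.

```cpp
// Hypothetical sketch of an improved oversubscription error message that names
// the exhausted resource and lists the top consumers with their coordinators.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct OffendingQuery {
  std::string query_id;
  std::string coordinator;  // host:port of the coordinating impalad (assumed)
  double reserved_gb;
};

std::string FormatOversubscribedError(const std::string& resource,
                                      double limit_gb, double requested_mb,
                                      const std::vector<OffendingQuery>& top) {
  std::ostringstream msg;
  msg << "Failed to get " << requested_mb << " MB of " << resource
      << " (limit " << limit_gb << " GB). Queries holding the most "
      << resource << ":\n";
  for (const auto& q : top) {
    msg << "  " << q.query_id << ": " << q.reserved_gb << " GB, coordinator "
        << q.coordinator << "\n";
  }
  return msg.str();
}

int main() {
  // Reservation and request sizes mirror the description; the coordinator
  // host:port is purely illustrative.
  std::vector<OffendingQuery> top = {
      {"464a9afdbf2646cf:d9e2d41100000000",
       "hodor-030.edh.cloudera.com:25000", 76.80}};
  std::cout << FormatOversubscribedError("buffer pool reservation", 76.80,
                                         26.69, top);
  return 0;
}
```

Listing coordinators alongside query IDs would let an admin go straight to the daemon that can cancel the offending query instead of reverse-engineering it from per-node memory dumps.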
Attachments
Issue Links
- breaks
  - IMPALA-6362 Queries don't make progress due to what seems like a memory reservation deadlock while running the stress tests (Resolved)
- is duplicated by
  - IMPALA-5790 Failure to get reservation should print buffer pool limit (Resolved)
- relates to
  - IMPALA-5790 Failure to get reservation should print buffer pool limit (Resolved)