IMPALA / IMPALA-6222

Make it easier to root-cause "failed to get minimum memory reservation" error

Details

    Description

      A user reported this error message:

       ExecQueryFInstances rpc query_id=c94288312d6d4055:bbfa166500000000 failed: Failed to get minimum memory reservation of 26.69 MB on daemon hodor-030.edh.cloudera.com:22000 for query c94288312d6d4055:bbfa166500000000 because it would exceed an applicable memory limit. Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error. Memory usage:
      Process: Limit=96.00 GB Total=16.54 GB Peak=83.37 GB
      

      It turned out that a single query was holding most of the process-wide reservation, but this was not apparent from the error message, which reports only the process-level totals. The detailed memory dump shows the breakdown:

      Process: Limit=96.00 GB Total=12.20 GB Peak=83.37 GB
        Buffer Pool: Free Buffers: Total=208.00 MB
        Buffer Pool: Clean Pages: Total=7.19 GB
        Buffer Pool: Unused Reservation: Total=-76.79 GB
        Free Disk IO Buffers: Total=1.37 GB Peak=1.37 GB
        RequestPool=root.default: Total=76.81 GB Peak=77.56 GB
          Query(464a9afdbf2646cf:d9e2d41100000000): Reservation=76.80 GB ReservationLimit=76.80 GB OtherMemory=6.69 MB Total=76.81 GB Peak=76.93 GB
            Fragment 464a9afdbf2646cf:d9e2d4110000003f: Reservation=76.80 GB OtherMemory=6.69 MB Total=76.81 GB Peak=76.81 GB
              SELECT_NODE (id=3): Total=20.00 KB Peak=9.02 MB
                Exprs: Total=4.00 KB Peak=4.00 KB
              ANALYTIC_EVAL_NODE (id=2): Reservation=4.00 MB OtherMemory=6.64 MB Total=10.64 MB Peak=15.04 MB
                Exprs: Total=4.00 KB Peak=4.00 KB
              SORT_NODE (id=1): Reservation=76.79 GB OtherMemory=16.00 KB Total=76.79 GB Peak=76.80 GB
              EXCHANGE_NODE (id=4): Total=0 Peak=0
              DataStreamRecvr: Total=0 Peak=10.19 MB
              DataStreamSender (dst_id=5): Total=1.48 KB Peak=1.48 KB
              CodeGen: Total=1.68 KB Peak=710.00 KB
            Fragment 464a9afdbf2646cf:d9e2d41100000016: Reservation=0 OtherMemory=0 Total=0 Peak=389.69 MB
              HDFS_SCAN_NODE (id=0): Total=0 Peak=388.54 MB
              DataStreamSender (dst_id=4): Total=0 Peak=1.23 MB
              CodeGen: Total=0 Peak=49.00 KB
        Untracked Memory: Total=3.42 GB
      

      When a user or admin sees this problem, they want to know immediately:

      • Which resource is exhausted (here, the process-wide reservation)?
      • Which queries are using it, and how can they be killed (i.e. what are the query ids and coordinators of those queries)?

      We should think through the error messages and diagnostics and improve them.
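
      As a rough illustration of the diagnostic the two questions above ask for, the sketch below scans a memory dump in the format shown earlier and ranks queries by reservation. It is not Impala code: the function names (`to_bytes`, `top_reservations`) and the regular expressions are hypothetical, and it assumes the sizes in the dump use binary units (1 GB = 1024^3 bytes) and that query lines always look like `Query(<id>): Reservation=<size> ...`.

```python
import re

# Unit multipliers for sizes printed in the dump (binary units are an assumption).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Hypothetical patterns, written against the dump format shown in this issue.
QUERY_LINE = re.compile(
    r"Query\((?P<id>[0-9a-f]+:[0-9a-f]+)\):\s+"
    r"Reservation=(?P<val>[\d.]+)(?:\s*(?P<unit>[KMGT]?B))?"
)
PROCESS_LIMIT = re.compile(
    r"Process:\s+Limit=(?P<val>[\d.]+)\s*(?P<unit>[KMGT]?B)"
)


def to_bytes(val, unit):
    """Convert a printed size such as ('76.80', 'GB') to bytes."""
    # A bare value like 'Reservation=0' has no unit; treat it as bytes.
    return float(val) * UNITS[unit or "B"]


def top_reservations(dump):
    """Return (query_id, reservation_bytes, fraction_of_process_limit)
    tuples, largest reservation first. The fraction is None when the
    dump contains no process limit line."""
    m = PROCESS_LIMIT.search(dump)
    limit = to_bytes(m["val"], m["unit"]) if m else None
    rows = []
    for q in QUERY_LINE.finditer(dump):
        reserved = to_bytes(q["val"], q["unit"])
        rows.append((q["id"], reserved, reserved / limit if limit else None))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

      Applied to the dump above, this would list query 464a9afdbf2646cf:d9e2d41100000000 first, with 76.80 GB reserved, or 80% of the 96.00 GB process limit. The dump does not include the coordinator, so the sketch cannot answer that part of the question.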

            People

              bikramjeet.vig Bikramjeet Vig
              tarmstrong Tim Armstrong