Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3200 Replace BufferedBlockMgr with new buffer pool
  3. IMPALA-5158

Account for difference between process memory consumption and memory used by queries

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 2.9.0
    • Impala 2.10.0
    • Backend

    Description

      There is discrepancy between process wide memory usage and memory used by the query, in the example below the query is using 26.54 GB while the process is reporting 11.63 GB. We should make sure that all the memory is accounted for:

      • Used/unused reservations
      • Cached free buffers
      • Clean pages that haven't been evicted
      • True untracked memory (e.g. LLVM, etc)
      • Make the TCMalloc memory report less prominent, since it is less important now.

      Process memory usage

      Memory Usage
      Memory consumption / limit: 11.63 GB / 220.00 GB
      

      Breakdown

      Process: Limit=220.00 GB Total=11.63 GB Peak=183.69 GB
        Free Disk IO Buffers: Total=1.71 GB Peak=1.71 GB
        RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
        RequestPool=root.mmokhtar: Total=26.54 GB Peak=90.13 GB
          Query(4b403f9f6fe2ecf6:4f81291100000000): Limit=55.00 GB Total=26.54 GB Peak=44.26 GB
            Fragment 4b403f9f6fe2ecf6:4f81291100000000: BufferPoolUsed/Reservation=0/0 OtherMemory=84.80 KB Total=84.80 KB Peak=1.53 MB
              EXCHANGE_NODE (id=36): Total=4.00 KB Peak=4.00 KB
                Exprs: Total=4.00 KB Peak=4.00 KB
              DataStreamRecvr: Total=67.38 KB Peak=67.38 KB
              PLAN_ROOT_SINK: Total=0 Peak=0
              CodeGen: Total=5.42 KB Peak=1.52 MB
            Fragment 4b403f9f6fe2ecf6:4f81291100000045: BufferPoolUsed/Reservation=0/26.54 GB OtherMemory=243.20 KB Total=26.54 GB Peak=26.54 GB
              SORT_NODE (id=20): Total=56.00 KB Peak=56.00 KB
                Exprs: Total=4.00 KB Peak=4.00 KB
              HASH_JOIN_NODE (id=19): BufferPoolUsed/Reservation=6.62 GB/16.14 GB OtherMemory=56.25 KB Total=16.14 GB Peak=16.14 GB
                Exprs: Total=4.00 KB Peak=4.00 KB
                Hash Join Builder (join_node_id=19): Total=14.12 KB Peak=22.12 KB
              HASH_JOIN_NODE (id=18): BufferPoolUsed/Reservation=0/7.31 GB OtherMemory=66.25 KB Total=7.31 GB Peak=7.31 GB
                Exprs: Total=4.00 KB Peak=4.00 KB
                Hash Join Builder (join_node_id=18): Total=23.12 KB Peak=31.12 KB
              AGGREGATION_NODE (id=25): BufferPoolUsed/Reservation=0/3.08 GB OtherMemory=31.12 KB Total=3.08 GB Peak=3.08 GB
                Exprs: Total=8.00 KB Peak=8.00 KB
              EXCHANGE_NODE (id=24): Total=0 Peak=0
              EXCHANGE_NODE (id=30): Total=0 Peak=0
              DataStreamRecvr: Total=0 Peak=2.47 MB
              EXCHANGE_NODE (id=35): Total=0 Peak=0
              DataStreamRecvr: Total=0 Peak=22.96 MB
              DataStreamSender (dst_id=36): Total=1008.00 B Peak=1008.00 B
              CodeGen: Total=24.59 KB Peak=4.67 MB
      

      This was captured while running TPC-DS Q78

      with ws as
        (select d_year AS ws_sold_year, ws_item_sk,
          ws_bill_customer_sk ws_customer_sk,
          sum(ws_quantity) ws_qty,
          sum(ws_wholesale_cost) ws_wc,
          sum(ws_sales_price) ws_sp
         from web_sales
         left join web_returns on wr_order_number=ws_order_number and ws_item_sk=wr_item_sk
         join date_dim on ws_sold_date_sk = d_date_sk
         where wr_order_number is null
         group by d_year, ws_item_sk, ws_bill_customer_sk
         ),
      cs as
        (select d_year AS cs_sold_year, cs_item_sk,
          cs_bill_customer_sk cs_customer_sk,
          sum(cs_quantity) cs_qty,
          sum(cs_wholesale_cost) cs_wc,
          sum(cs_sales_price) cs_sp
         from catalog_sales
         left join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
         join date_dim on cs_sold_date_sk = d_date_sk
         where cr_order_number is null
         group by d_year, cs_item_sk, cs_bill_customer_sk
         ),
      ss as
        (select d_year AS ss_sold_year, ss_item_sk,
          ss_customer_sk,
          sum(ss_quantity) ss_qty,
          sum(ss_wholesale_cost) ss_wc,
          sum(ss_sales_price) ss_sp
         from store_sales
         left join store_returns on sr_ticket_number=ss_ticket_number and ss_item_sk=sr_item_sk
         join date_dim on ss_sold_date_sk = d_date_sk
         where sr_ticket_number is null
         group by d_year, ss_item_sk, ss_customer_sk
         )
       select 
      ss_sold_year, ss_item_sk, ss_customer_sk,
      round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2) ratio,
      ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price,
      coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty,
      coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost,
      coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price
      from ss
      left join ws on (ws_sold_year=ss_sold_year and ws_item_sk=ss_item_sk and ws_customer_sk=ss_customer_sk)
      left join cs on (cs_sold_year=ss_sold_year and cs_item_sk=cs_item_sk and cs_customer_sk=ss_customer_sk)
      where coalesce(ws_qty,0)>0 and coalesce(cs_qty, 0)>0 and ss_sold_year=2002
      order by 
        ss_sold_year, ss_item_sk, ss_customer_sk,
        ss_qty desc, ss_wc desc, ss_sp desc,
        other_chan_qty,
        other_chan_wholesale_cost,
        other_chan_sales_price,
        round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2)
      limit 100;
      

      Attachments

        1. tpcds_q78_memory_discrepancy.txt
          665 kB
          Mostafa Mokhtar
        2. Screen Shot 2017-04-04 at 10.11.12 AM.png
          487 kB
          Mostafa Mokhtar

        Issue Links

          Activity

            People

              tarmstrong Tim Armstrong
              mmokhtar Mostafa Mokhtar
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: