IMPALA-6596

Query failed with OOM on coordinator while remote fragments on other nodes continue to run


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.11.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:

      Description

      This is somewhat similar to IMPALA-2990.

      Query 

      set NUM_SCANNER_THREADS=2;
      
      set MAX_ROW_SIZE=4194304;
      
      with cte as (
        select group_concat(cast(fnv_hash(concat(o_comment, 'd')) as string)) as c1,
               group_concat(cast(fnv_hash(concat(o_comment, 'e')) as string)) as c2
        from orders
        where o_orderkey < 1200000000 and o_orderdate < "1993-01-01"
        union all
        select group_concat(cast(fnv_hash(concat(o_comment, 'd')) as string)) as c1,
               group_concat(cast(fnv_hash(concat(o_comment, 'e')) as string)) as c2
        from orders
        where o_orderkey < 1200000000 and o_orderdate < "1993-01-01"
      ),
      cte2 as (
        select c1, c2, s_suppkey from cte, supplier
      )
      select count(*)
      from cte2 t1, cte2 t2
      where t1.s_suppkey = t2.s_suppkey
      group by t1.c1, t1.c2, t2.c1, t2.c2
      having count(*) = 1
      
      

      The query failed on the coordinator node, which is also an executor, with:

      Status: Row of size 1.82 GB could not be materialized in plan node with id 14. Increase the max_row_size query option (currently 4.00 MB) to process larger rows.
      
      

      The coordinator log has many entries like:

      I0227 19:20:58.057637 62974 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d4156900000000
      I0227 19:20:58.152979 63129 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d4156900000000
      I0227 19:20:58.714336 63930 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d4156900000000
      I0227 19:20:58.718415 63095 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d4156900000000
      I0227 19:20:58.757306 63339 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d4156900000000
      I0227 19:20:58.762310 63406 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d4156900000000
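To gauge how many late status reports the coordinator received for the failed query, the log can be scanned for this message. The following is a minimal sketch, not part of the original report: the helper name `count_unknown_reports` is hypothetical, and the regular expression assumes the message text matches the entries quoted above.

```python
import re
from collections import Counter

# Matches the message text seen in the coordinator log entries above and
# captures the query ID that follows it.
PATTERN = re.compile(
    r"ReportExecStatus\(\): Received report for unknown query ID "
    r"\(probably closed or cancelled\): ([0-9a-f]+:[0-9a-f]+)"
)


def count_unknown_reports(log_text):
    """Return a Counter mapping query ID -> number of late status reports."""
    return Counter(PATTERN.findall(log_text))


# Two of the entries quoted in this report, used as sample input.
sample = (
    "I0227 19:20:58.057637 62974 impala-server.cc:1196] ReportExecStatus(): "
    "Received report for unknown query ID (probably closed or cancelled): "
    "9b439fc1ee1addb7:82d4156900000000\n"
    "I0227 19:20:58.152979 63129 impala-server.cc:1196] ReportExecStatus(): "
    "Received report for unknown query ID (probably closed or cancelled): "
    "9b439fc1ee1addb7:82d4156900000000\n"
)

counts = count_unknown_reports(sample)
for query_id, n in counts.items():
    print(query_id, n)
```

A count that keeps growing after the coordinator has returned the error would be consistent with remote fragments still running and reporting back.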
      
      

      From the memz tab on a different node:

      Memory Usage

      Memory consumption / limit: 142.87 GB / 100.00 GB

      Breakdown

      
      Process: memory limit exceeded. Limit=100.00 GB Total=141.05 GB Peak=144.25 GB
        Buffer Pool: Free Buffers: Total=0
        Buffer Pool: Clean Pages: Total=0
        Buffer Pool: Unused Reservation: Total=-118.00 MB
        TCMalloc Overhead: Total=1.68 GB
        RequestPool=root.default: Total=40.10 GB Peak=89.40 GB
          Query(9b439fc1ee1addb7:82d4156900000000): Reservation=122.00 MB ReservationLimit=80.00 GB OtherMemory=39.98 GB Total=40.10 GB Peak=40.10 GB
            Unclaimed reservations: Reservation=72.00 MB OtherMemory=0 Total=72.00 MB Peak=226.00 MB
            Fragment 9b439fc1ee1addb7:82d4156900000059: Reservation=0 OtherMemory=0 Total=0 Peak=632.88 KB
              AGGREGATION_NODE (id=29): Total=0 Peak=76.12 KB
              EXCHANGE_NODE (id=28): Reservation=0 OtherMemory=0 Total=0 Peak=0
                DataStreamRecvr: Total=0 Peak=0
              DataStreamSender (dst_id=30): Total=0 Peak=1.75 KB
              CodeGen: Total=0 Peak=547.00 KB
            Fragment 9b439fc1ee1addb7:82d4156900000050: Reservation=0 OtherMemory=0 Total=0 Peak=3.67 GB
              AGGREGATION_NODE (id=15): Total=0 Peak=76.12 KB
              HASH_JOIN_NODE (id=14): Reservation=0 OtherMemory=0 Total=0 Peak=1.85 GB
                Hash Join Builder (join_node_id=14): Total=0 Peak=13.12 KB
              EXCHANGE_NODE (id=26): Reservation=0 OtherMemory=0 Total=0 Peak=1.82 GB
                DataStreamRecvr: Total=0 Peak=1.82 GB
              EXCHANGE_NODE (id=27): Reservation=0 OtherMemory=0 Total=0 Peak=1.82 GB
                DataStreamRecvr: Total=0 Peak=1.82 GB
              DataStreamSender (dst_id=28): Total=0 Peak=15.75 KB
              CodeGen: Total=0 Peak=1.81 MB
            Fragment 9b439fc1ee1addb7:82d4156900000023: Reservation=26.00 MB OtherMemory=19.99 GB Total=20.02 GB Peak=20.02 GB
              Runtime Filter Bank: Reservation=2.00 MB ReservationLimit=2.00 MB OtherMemory=0 Total=2.00 MB Peak=2.00 MB
              NESTED_LOOP_JOIN_NODE (id=6): Total=3.63 GB Peak=3.63 GB
                Nested Loop Join Builder: Total=3.63 GB Peak=3.63 GB
              HDFS_SCAN_NODE (id=5): Reservation=24.00 MB OtherMemory=2.88 MB Total=26.88 MB Peak=26.88 MB
              EXCHANGE_NODE (id=20): Reservation=0 OtherMemory=0 Total=0 Peak=1.82 GB
                DataStreamRecvr: Total=0 Peak=1.82 GB
              DataStreamSender (dst_id=26): Total=16.36 GB Peak=16.36 GB
              CodeGen: Total=628.00 B Peak=222.50 KB
            Fragment 9b439fc1ee1addb7:82d4156900000007: Reservation=0 OtherMemory=0 Total=0 Peak=734.73 MB
              AGGREGATION_NODE (id=2): Total=0 Peak=516.29 MB
              HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=350.27 MB
              DataStreamSender (dst_id=16): Total=0 Peak=3.88 KB
              CodeGen: Total=0 Peak=839.50 KB
            Fragment 9b439fc1ee1addb7:82d4156900000014: Reservation=0 OtherMemory=0 Total=0 Peak=734.18 MB
              AGGREGATION_NODE (id=4): Total=0 Peak=516.33 MB
              HDFS_SCAN_NODE (id=3): Reservation=0 OtherMemory=0 Total=0 Peak=349.92 MB
              DataStreamSender (dst_id=18): Total=0 Peak=3.88 KB
              CodeGen: Total=0 Peak=839.50 KB
            Fragment 9b439fc1ee1addb7:82d4156900000047: Reservation=24.00 MB OtherMemory=19.99 GB Total=20.02 GB Peak=20.02 GB
              NESTED_LOOP_JOIN_NODE (id=13): Total=3.63 GB Peak=3.63 GB
                Nested Loop Join Builder: Total=3.63 GB Peak=3.63 GB
              HDFS_SCAN_NODE (id=12): Reservation=24.00 MB OtherMemory=2.88 MB Total=26.88 MB Peak=26.88 MB
              EXCHANGE_NODE (id=25): Reservation=0 OtherMemory=0 Total=0 Peak=1.82 GB
                DataStreamRecvr: Total=0 Peak=1.82 GB
              DataStreamSender (dst_id=27): Total=16.36 GB Peak=16.36 GB
              CodeGen: Total=234.00 B Peak=52.50 KB
            Fragment 9b439fc1ee1addb7:82d415690000002b: Reservation=0 OtherMemory=0 Total=0 Peak=734.73 MB
              AGGREGATION_NODE (id=9): Total=0 Peak=516.29 MB
              HDFS_SCAN_NODE (id=8): Reservation=0 OtherMemory=0 Total=0 Peak=350.24 MB
              DataStreamSender (dst_id=21): Total=0 Peak=3.88 KB
              CodeGen: Total=0 Peak=839.50 KB
            Fragment 9b439fc1ee1addb7:82d4156900000038: Reservation=0 OtherMemory=0 Total=0 Peak=734.73 MB
              AGGREGATION_NODE (id=11): Total=0 Peak=516.29 MB
              HDFS_SCAN_NODE (id=10): Reservation=0 OtherMemory=0 Total=0 Peak=350.52 MB
              DataStreamSender (dst_id=23): Total=0 Peak=3.88 KB
              CodeGen: Total=0 Peak=839.50 KB
        Untracked Memory: Total=99.38 GB
      
      

       

      It is strange that a consumption of 143.63 GB is reported when the system has only 128 GB of RAM.
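The figures in the breakdown can be cross-checked with a little arithmetic. The sketch below is an illustration rather than part of the original report. It assumes that "Untracked Memory" is the process total minus its other direct children, and that the displayed values are rounded to two decimals; all numbers are copied from the breakdown and hardware info in this report.

```python
GB = 1024 ** 3
MB = 1024 ** 2

# Direct children of the "Process" tracker other than "Untracked Memory".
tracked = {
    "Buffer Pool: Free Buffers": 0,
    "Buffer Pool: Clean Pages": 0,
    "Buffer Pool: Unused Reservation": -118.00 * MB,
    "TCMalloc Overhead": 1.68 * GB,
    "RequestPool=root.default": 40.10 * GB,
}

process_total = 141.05 * GB    # Total= on the "Process" line
process_limit = 100.00 * GB    # Limit= on the "Process" line
physical_memory = 126.00 * GB  # "Physical Memory" from the hardware info

# Assumption: untracked memory is whatever the children do not explain.
untracked = process_total - sum(tracked.values())

# The two nested-loop-join fragments plus unclaimed reservations should
# account for almost all of the query's tracked memory (40.10 GB reported).
query_tracked = 2 * 20.02 * GB + 72.00 * MB

print(f"untracked:            {untracked / GB:.2f} GB")
print(f"query (2 fragments):  {query_tracked / GB:.2f} GB")
print(f"over process limit:   {(process_total - process_limit) / GB:.2f} GB")
print(f"over physical memory: {(process_total - physical_memory) / GB:.2f} GB")
```

Under these assumptions the computed untracked figure agrees with the reported `Untracked Memory: Total=99.38 GB` to within rounding, so roughly 70% of the reported process total is memory that the query's trackers do not account for. The reported total also exceeds the physical memory by about 15 GB.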

      
      Hardware Info
      
      Cpu Info:
        Model: Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
        Cores: 24
        Max Possible Cores: 24
        L1 Cache: 32.00 KB (Line: 64.00 B)
        L2 Cache: 256.00 KB (Line: 64.00 B)
        L3 Cache: 15.00 MB (Line: 64.00 B)
        Hardware Supports: ssse3 sse4_1 sse4_2 popcnt avx pclmulqdq
        Numa Nodes: 2
        Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->1 | 7->1 | 8->1 | 9->1 | 10->1 | 11->1 | 12->0 | 13->0 | 14->0 | 15->0 | 16->0 | 17->0 | 18->1 | 19->1 | 20->1 | 21->1 | 22->1 | 23->1 |
      Physical Memory: 126.00 GB
      
      

        Attachments

        1. RowSizeFailureProfile.txt
          708 kB
          Mostafa Mokhtar

              People

              • Assignee: Unassigned
              • Reporter: Mostafa Mokhtar (mmokhtar)