  1. IMPALA
  2. IMPALA-473

Impala hits BE memory limit running COUNT(*) on SEQ/SNAP table with large number of clients

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 1.1
    • Fix Version/s: Impala 1.1.1
    • Component/s: None
    • Labels: None

    Description

      The query SELECT COUNT(*) FROM 300gbTable fails on a 17-node cluster when run with 500 concurrent clients (~30 clients per node). The failure appears to be related to the SEQ/SNAP file format (Snappy-compressed SequenceFile).
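      The workload described above (many clients issuing the same query at once) can be approximated with a small thread-based driver. This is only a sketch: run_concurrently and fake_run_query are hypothetical names, and fake_run_query is a stand-in that would need to be replaced with a real client call against an impalad; it is not the harness used to produce this report.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Query text taken from the report; the table name is the reporter's.
QUERY = "SELECT COUNT(*) FROM 300gbTable"


def run_concurrently(run_query, query, num_clients):
    """Issue `query` from `num_clients` threads at once.

    `run_query` is any callable that takes the query string and either
    returns a result or raises. Returns (results, errors).
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        futures = [pool.submit(run_query, query) for _ in range(num_clients)]
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                # Collect failures (e.g. "Memory limit exceeded") per client
                # instead of aborting the whole run on the first one.
                errors.append(str(exc))
    return results, errors


def fake_run_query(query):
    # Hypothetical placeholder: swap in a real Impala client call here.
    return 0


if __name__ == "__main__":
    results, errors = run_concurrently(fake_run_query, QUERY, num_clients=500)
    print(len(results), "succeeded;", len(errors), "failed")
```

      Collecting errors per client rather than stopping at the first exception makes it easier to see how many of the concurrent clients hit the memory limit in a given run.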

      The attached heap growth profile shows HdfsSequenceScanner::ReadCompressedBlock using ~44GB of memory, of which ~32GB is attributed to SnappyBlockDecompressor::ProcessBlock.
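      As a quick check of the figures quoted in this description, the arithmetic can be written out as below. The inputs are the approximate values from the report; the variable names are illustrative only, and nothing here is derived from the attached profile itself.

```python
# Approximate figures as quoted in the description.
NUM_CLIENTS = 500   # concurrent clients
NUM_NODES = 17      # cluster size
SCANNER_GB = 44     # HdfsSequenceScanner::ReadCompressedBlock (approx.)
SNAPPY_GB = 32      # SnappyBlockDecompressor::ProcessBlock (approx.)

clients_per_node = NUM_CLIENTS / NUM_NODES
snappy_share = SNAPPY_GB / SCANNER_GB

print(f"{clients_per_node:.1f} clients per node")          # 29.4
print(f"{snappy_share:.0%} of scanner memory in Snappy")   # 73%
```

      In other words, roughly three quarters of the memory attributed to the sequence scanner sits in Snappy block decompression.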

      14:21:48  Client Thread 148: Result:
      14:21:48    -> Avg Time: None, Std Dev: None
      14:21:48  
      14:21:52  Client Thread 178: <class 'tests.beeswax.impala_beeswax.ImpalaBeeswaxException'>:
      14:21:52   Query aborted:
      14:21:52  Backend 7:Memory limit exceeded
      14:21:52  Format error in record or block header at offset: 7937326
      14:21:52  First error while processing: hdfs://c2102.hal.cloudera.com:8020/user/impala/test-warehouse/300gb.main_seq_snap/000328_0 at offset: 8388608
      

      Attachments

        1. mem_leak.ps
          39 kB
          Lenni Kuff

          People

            Assignee: Skye Wanderman-Milne (skye)
            Reporter: Lenni Kuff (lskuff)