  1. IMPALA
  2. IMPALA-2249

Avoid allocating stringbuffer larger than 1GB in HdfsTextScanner::FillByteBufferCompressedFile()

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0, Impala 2.1, Impala 2.2
    • Fix Version/s: Impala 2.3.0, Impala 2.2.8
    • Component/s: None
    • Labels:
      None

      Description

      Due to IMPALA-1619, allocating a StringBuffer larger than 1GB can crash Impala.
      For certain compressed file formats, Impala must read the whole file before decompressing it. If the file is larger than 1GB, the scanner hits IMPALA-1619 and crashes.
      It would be better to check the file size in advance and, if a compressed file is larger than 1GB, fail the query with a proper warning instead of crashing.
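
      The proposed up-front check could look roughly like the sketch below. This is only an illustration of the idea, not the actual fix: `CheckResult`, `CheckCompressedFileSize`, and `kMaxCompressedFileBytes` are hypothetical names, and `CheckResult` merely stands in for whatever status type the scanner really returns.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical limit, taken from the 1GB figure in the description.
constexpr int64_t kMaxCompressedFileBytes = 1LL << 30;

// Hypothetical stand-in for a status/error type.
struct CheckResult {
  bool ok;
  std::string message;
};

// Rejects a compressed file that would have to be read in full and is
// larger than the limit, so the query fails cleanly instead of crashing.
CheckResult CheckCompressedFileSize(const std::string& path, int64_t file_size) {
  assert(file_size >= 0);
  if (file_size > kMaxCompressedFileBytes) {
    return {false,
            "Compressed file " + path + " is " + std::to_string(file_size) +
                " bytes, which exceeds the limit of " +
                std::to_string(kMaxCompressedFileBytes) +
                " bytes. Re-compress it into smaller files."};
  }
  return {true, ""};
}
```

      A caller would run this check before reading the file and return the message to the user when `ok` is false.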

      The crash happens after the memory allocation, most likely at:

      # Problematic frame:
      # C  [libc.so.6+0x89aab]  memcpy+0x15b
      
      #0 0x00000030b0e32625 in raise () from /lib64/libc.so.6
      #1 0x00000030b0e33e05 in abort () from /lib64/libc.so.6
      #2 0x00007fe070419a55 in os::abort(bool) () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #3 0x00007fe070599f87 in VMError::report_and_die() () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #4 0x00007fe07041e96f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #5 <signal handler called>
      #6 0x00000030b0e89a30 in memcpy () from /lib64/libc.so.6
      #7 0x00000000014573ef in impala::StringBuffer::GrowBuffer(int) ()
      #8 0x00000000014572eb in impala::StringBuffer::Append(char const*, int) ()
      #9 0x00000000014ce535 in impala::StringBuffer::Append(unsigned char const*, int) ()
      #10 0x00000000014cda5b in impala::ScannerContext::Stream::GetBytesInternal(long, unsigned char*, bool, long) ()
      #11 0x000000000143a71d in impala::ScannerContext::Stream::GetBytes(long, unsigned char*, long, impala::Status*, bool) ()
      #12 0x000000000145533e in impala::HdfsTextScanner::FillByteBufferCompressedFile(bool*) ()
      #13 0x00000000014542e4 in impala::HdfsTextScanner::FillByteBuffer(bool*, int) ()
      #14 0x0000000001453534 in impala::HdfsTextScanner::ProcessRange(int*, bool) ()
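
      The trace shows `GrowBuffer(int)` and `Append(..., int)`, so buffer sizes are carried in a 32-bit `int`. One plausible way a 1GB threshold arises with such sizes is sketched below. It assumes a capacity-doubling growth policy, which is an assumption made for illustration and is not confirmed by this report; `NextCapacityFitsInInt` is a hypothetical helper, not Impala code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Computes the next capacity in 64-bit arithmetic and reports whether it
// still fits in a 32-bit signed int. Assumes (hypothetically) that the
// buffer grows to max(2 * capacity, capacity + bytes_needed).
bool NextCapacityFitsInInt(int32_t current_capacity, int32_t bytes_needed,
                           int64_t* next_capacity) {
  assert(current_capacity >= 0 && bytes_needed >= 0);
  const int64_t doubled = 2 * static_cast<int64_t>(current_capacity);
  const int64_t required = static_cast<int64_t>(current_capacity) + bytes_needed;
  *next_capacity = std::max(doubled, required);
  return *next_capacity <= std::numeric_limits<int32_t>::max();
}
```

      Under that assumption, growing a buffer that already holds 1GB asks for 2^31 bytes, which is one more than the largest value a 32-bit signed `int` can hold.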
      

      Compression formats that can hit this issue are gzip, bz2, and snappy.
      The symptom: queries against a table with smaller files (< 1GB) run fine, but querying a table with a larger file crashes Impala, even for a simple query such as select count from tablewithlargefile limit 10.

      Workaround: re-compress the file into smaller files.

      PS: Streaming decompression for gzip text has been supported since Impala 2.1, so gzip files larger than 1GB work in 2.1 and later versions.

              People

              • Assignee:
                jyu@cloudera.com Juan Yu
                Reporter:
                jyu@cloudera.com Juan Yu
              • Votes:
                0
                Watchers:
                3