IMPALA-10005

Impala can't read Snappy compressed text files on S3 or ABFS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Frontend
    • Labels: None

    Description

      When reading Snappy-compressed text files from S3 or ABFS, a release build fails to decompress them:

      I0723 21:19:43.712909 229706 status.cc:128] Snappy: RawUncompress failed
          @           0xae26c9  impala::Status::Status()
          @          0x107635b  impala::SnappyDecompressor::ProcessBlock()
          @          0x11b1f2d  impala::HdfsTextScanner::FillByteBufferCompressedFile()
          @          0x11b23ef  impala::HdfsTextScanner::FillByteBuffer()
          @          0x11af96f  impala::HdfsTextScanner::FillByteBufferWrapper()
          @          0x11b096b  impala::HdfsTextScanner::ProcessRange()
          @          0x11b2b31  impala::HdfsTextScanner::GetNextInternal()
          @          0x118644b  impala::HdfsScanner::ProcessSplit()
          @          0x11774c2  impala::HdfsScanNode::ProcessSplit()
          @          0x1178805  impala::HdfsScanNode::ScannerThread()
          @          0x1100f31  impala::Thread::SuperviseThread()
          @          0x1101a79  boost::detail::thread_data<>::run()
          @          0x16a3449  thread_proxy
          @     0x7fc522befe24  start_thread
          @     0x7fc522919bac  __clone

      When using a debug build, Impala hits the following DCHECK:

      F0723 23:45:12.849973 249653 hdfs-text-scanner.cc:197] Check failed: stream_->file_desc()->file_compression != THdfsCompression::SNAPPY FE should have generated SNAPPY_BLOCKED instead.

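      That DCHECK explains the decompression failure: the frontend labeled the file THdfsCompression::SNAPPY instead of SNAPPY_BLOCKED, so the backend decodes it with impala::SnappyDecompressor (visible in the release-build stack trace above), which expects raw Snappy data rather than Hadoop's block-framed Snappy format.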
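
      To show why the wrong THdfsCompression breaks decoding, here is a minimal sketch that walks block-framed Snappy data. It assumes the layout written by Hadoop's BlockCompressorStream: a 4-byte big-endian uncompressed block length, then chunks that each carry a 4-byte big-endian compressed length followed by raw Snappy bytes. Each raw Snappy chunk starts with its own uncompressed length as a little-endian varint. The class and method names (SnappyBlockFraming, split, rawPreamble) are hypothetical and are not Impala's code. The sketch does not decompress anything; it only shows how the two formats disagree on where the data starts.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that splits Hadoop-style block-framed Snappy data
 * (assumed BlockCompressorStream layout) into its raw Snappy chunks.
 * It only walks the framing; it does not decompress.
 */
final class SnappyBlockFraming {
  /** One framed block: declared uncompressed size plus its raw Snappy chunks. */
  record Block(int uncompressedLength, List<byte[]> chunks) {}

  /** Reads a little-endian base-128 varint (the raw Snappy length preamble). */
  static long readVarint(byte[] b, int offset) {
    long result = 0;
    for (int i = offset, shift = 0; i < b.length && shift <= 28; i++, shift += 7) {
      int v = b[i] & 0xFF;
      result |= (long) (v & 0x7F) << shift;
      if ((v & 0x80) == 0) {
        return result;
      }
    }
    throw new IllegalArgumentException("truncated or oversized varint");
  }

  /** The uncompressed length a raw (unframed) Snappy decoder would read first. */
  static long rawPreamble(byte[] data) {
    return readVarint(data, 0);
  }

  /** Splits framed data into blocks; ByteBuffer reads big-endian ints by default. */
  static List<Block> split(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    List<Block> blocks = new ArrayList<>();
    while (buf.hasRemaining()) {
      int uncompressedLength = buf.getInt();    // 4-byte block header
      List<byte[]> chunks = new ArrayList<>();
      long covered = 0;
      while (covered < uncompressedLength) {
        byte[] chunk = new byte[buf.getInt()];  // 4-byte chunk length
        buf.get(chunk);
        covered += readVarint(chunk, 0);        // each raw chunk states its own size
        chunks.add(chunk);
      }
      blocks.add(new Block(uncompressedLength, chunks));
    }
    return blocks;
  }
}
```

      For the 13-byte input `00 00 00 03 | 00 00 00 05 | 03 08 61 62 63` (one block holding a valid raw Snappy encoding of "abc"), split() returns one block with one 5-byte chunk. rawPreamble() on the same bytes reads the leading 0x00 of the block header as an uncompressed length of 0, so a decoder that expects raw Snappy misinterprets the data from its first byte.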

      I reproduced this on master in my development environment by changing FileSystemUtil::supportsStorageIds() to always return false, as in the snippet below. This emulates the behavior of object stores like S3 and ABFS, which do not report storage IDs.

        /**
         * Returns true if the filesystem supports storage UUIDs in BlockLocation calls.
         */
        public static boolean supportsStorageIds(FileSystem fs) {
          // Repro hack: report no storage ID support, as on S3 and ABFS.
          return false;
        }

      This is specific to Snappy and does not appear to apply to other compression codecs.
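
      One hypothesis consistent with the DCHECK message: if the SNAPPY to SNAPPY_BLOCKED promotion for text files is applied on only one file-descriptor path, the path taken when supportsStorageIds() returns false would hand the backend plain SNAPPY. The sketch below keeps that decision in a single helper that every path can call. Codec, CodecChooser, fromFileName and forTextFile are hypothetical names, not Impala's frontend classes, and the extension list is an assumed, deliberately small subset.

```java
import java.util.Locale;

/** Hypothetical codec enum; names mirror the THdfsCompression values in the DCHECK. */
enum Codec { NONE, GZIP, SNAPPY, SNAPPY_BLOCKED }

/** Hypothetical helper, not Impala's frontend code. */
final class CodecChooser {
  /** Maps a file extension to a codec (assumed, deliberately small subset). */
  static Codec fromFileName(String fileName) {
    String lower = fileName.toLowerCase(Locale.ROOT);
    if (lower.endsWith(".snappy")) return Codec.SNAPPY;
    if (lower.endsWith(".gz")) return Codec.GZIP;
    return Codec.NONE;
  }

  /**
   * Codec for a text file. Hadoop's SnappyCodec writes block-framed output,
   * so SNAPPY is promoted to SNAPPY_BLOCKED; other codecs pass through.
   * Intended to be called from every file-descriptor path, with or
   * without storage IDs, so both paths agree.
   */
  static Codec forTextFile(String fileName) {
    Codec codec = fromFileName(fileName);
    return codec == Codec.SNAPPY ? Codec.SNAPPY_BLOCKED : codec;
  }
}
```

      In this sketch only SNAPPY is remapped, so a path that skipped forTextFile() would still produce correct results for other codecs. That pattern would fit the observation that the bug does not appear to affect other compression codecs.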

People

    • Assignee: Joe McDonnell (joemcdonnell)
    • Reporter: Joe McDonnell (joemcdonnell)
    • Votes: 0
    • Watchers: 2