Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 4.0.0
Description
Reading Snappy-compressed text files from S3 or ABFS on a release build fails during decompression:
I0723 21:19:43.712909 229706 status.cc:128] Snappy: RawUncompress failed
    @ 0xae26c9 impala::Status::Status()
    @ 0x107635b impala::SnappyDecompressor::ProcessBlock()
    @ 0x11b1f2d impala::HdfsTextScanner::FillByteBufferCompressedFile()
    @ 0x11b23ef impala::HdfsTextScanner::FillByteBuffer()
    @ 0x11af96f impala::HdfsTextScanner::FillByteBufferWrapper()
    @ 0x11b096b impala::HdfsTextScanner::ProcessRange()
    @ 0x11b2b31 impala::HdfsTextScanner::GetNextInternal()
    @ 0x118644b impala::HdfsScanner::ProcessSplit()
    @ 0x11774c2 impala::HdfsScanNode::ProcessSplit()
    @ 0x1178805 impala::HdfsScanNode::ScannerThread()
    @ 0x1100f31 impala::Thread::SuperviseThread()
    @ 0x1101a79 boost::detail::thread_data<>::run()
    @ 0x16a3449 thread_proxy
    @ 0x7fc522befe24 start_thread
    @ 0x7fc522919bac __clone
When using a debug build, Impala hits the following DCHECK:
F0723 23:45:12.849973 249653 hdfs-text-scanner.cc:197] Check failed: stream_->file_desc()->file_compression != THdfsCompression::SNAPPY FE should have generated SNAPPY_BLOCKED instead.
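That DCHECK explains the decompression failure: the text scanner is handed THdfsCompression::SNAPPY where the frontend should have generated SNAPPY_BLOCKED, so it is using the wrong THdfsCompression.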
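For context, here is a minimal sketch of why the two values are not interchangeable. It assumes SNAPPY_BLOCKED refers to Hadoop's block-framed Snappy layout (a 4-byte big-endian uncompressed length, then chunks that each start with a 4-byte big-endian compressed length), whereas a raw Snappy stream starts with a varint length preamble. The class and method names are hypothetical and are not part of Impala; the payload bytes are placeholders, not real Snappy data.

```java
import java.nio.ByteBuffer;

public class SnappyFramingSketch {
    /**
     * Hypothetical helper: reads the two leading headers of a block-framed
     * buffer. Returns {uncompressedBlockLength, firstChunkCompressedLength}.
     */
    public static int[] readBlockHeaders(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int uncompressedLen = buf.getInt();
        int firstChunkCompressedLen = buf.getInt();
        return new int[] {uncompressedLen, firstChunkCompressedLen};
    }

    /**
     * Hypothetical helper: decodes the little-endian varint that a raw Snappy
     * stream uses as its uncompressed-length preamble.
     */
    public static long readRawSnappyLength(byte[] data) {
        long result = 0;
        int shift = 0;
        for (int i = 0; i < data.length && shift <= 28; i++) {
            int b = data[i] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // high bit clear: last byte of the varint
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated or oversized varint");
    }

    public static void main(String[] args) {
        // Block-framed input: uncompressed length 11, one chunk of 3 bytes,
        // followed by 3 placeholder payload bytes.
        byte[] blockFramed = {0, 0, 0, 11, 0, 0, 0, 3, 1, 2, 3};

        int[] headers = readBlockHeaders(blockFramed);
        System.out.println("block uncompressed length: " + headers[0]);
        System.out.println("first chunk compressed length: " + headers[1]);

        // Interpreting the same bytes as a raw stream reads a different length.
        System.out.println("length seen by a raw reader: " + readRawSnappyLength(blockFramed));
    }
}
```

In this sketch the raw reader decodes a length that disagrees with the block header, because the leading byte of the big-endian header is 0x00. That is one plausible way a raw decompressor ends up rejecting block-framed input; it is not a trace of Impala's actual code path.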
I reproduced this on master in my dev env by changing FileSystemUtil::supportsStorageIds() to always return false. This emulates the behavior on object stores like S3 and ABFS.
/**
 * Returns true if the filesystem supports storage UUIDs in BlockLocation calls.
 */
public static boolean supportsStorageIds(FileSystem fs) {
  return false;
}
This is specific to Snappy and does not appear to apply to other compression codecs.