Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
Description
LZ4 has a built-in limit on the payload size it can compress (LZ4_MAX_INPUT_SIZE, defined as 0x7E000000 bytes in lz4.h). The limit can be checked indirectly via LZ4_compressBound(), which returns 0 when the payload is too big, but our code does not handle that return value.
As a result, oversized payloads are compressed to a bogus result. The bogus result even decompresses successfully, but not to the data that was originally compressed.
Relevant LZ4 code snippet:
https://github.com/lz4/lz4/blob/dev/lib/lz4.h#L153
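A minimal sketch of the missing check, under stated assumptions: `CompressBound` below is a local stand-in that mirrors the `LZ4_COMPRESSBOUND` macro from lz4.h so the example compiles without linking LZ4, and `CheckedMaxOutputLen` is a hypothetical helper name, not Impala's actual code or the fix that was committed. The point is only that a bound of 0 should be reported as an error instead of being used as a buffer size.

```cpp
#include <cstdint>
#include <limits>

// Mirrors LZ4_MAX_INPUT_SIZE from lz4.h (2,113,929,216 bytes).
constexpr int kLz4MaxInputSize = 0x7E000000;

// Local stand-in for LZ4_compressBound(): returns 0 when the input is too
// large (or negative), otherwise the worst-case compressed size.
constexpr int CompressBound(int input_size) {
  return static_cast<unsigned>(input_size) >
                 static_cast<unsigned>(kLz4MaxInputSize)
             ? 0
             : input_size + input_size / 255 + 16;
}

// Hypothetical helper: computes the maximum output length for an input of
// input_len bytes. Returns false, leaving *max_len untouched, when the
// payload exceeds what LZ4 can compress in one call.
bool CheckedMaxOutputLen(int64_t input_len, int64_t* max_len) {
  // LZ4's block API takes an int, so reject anything that does not fit.
  if (input_len < 0 || input_len > std::numeric_limits<int>::max()) {
    return false;
  }
  const int bound = CompressBound(static_cast<int>(input_len));
  // A bound of 0 never means "empty output" (even a 0-byte input has a
  // bound of 16); it means the payload is too big.
  if (bound == 0) return false;
  *max_len = bound;
  return true;
}
```

A caller would test the return value before allocating the output buffer and return an error status to the user when it is false, rather than continuing with a zero-length buffer.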
Reproduction:
Add the following test case to decompress-test.cc
TEST_F(DecompressorTest, LZ4Huge) {
  // Generate a big random payload.
  int payload_len = numeric_limits<int>::max();
  uint8_t* payload = new uint8_t[payload_len];
  for (int i = 0 ; i < payload_len; ++i) payload[i] = rand();

  scoped_ptr<Codec> compressor;
  EXPECT_OK(Codec::CreateCompressor(nullptr, true, impala::THdfsCompression::LZ4,
      &compressor));

  // The returned max_size is 0 because the payload is too big.
  int64_t max_size = compressor->MaxOutputLen(payload_len);

  // Compression succeeds!
  int64_t compressed_len = max_size;
  uint8_t* compressed = new uint8_t[max_size];
  EXPECT_OK(compressor->ProcessBlock(true, payload_len, payload, &compressed_len,
      &compressed));

  // Decompression succeeds!
  scoped_ptr<Codec> decompressor;
  EXPECT_OK(Codec::CreateDecompressor(nullptr, true, impala::THdfsCompression::LZ4,
      &decompressor));
  int64_t decompressed_len = compressed_len;
  uint8_t* decompressed = new uint8_t[compressed_len];
  EXPECT_OK(decompressor->ProcessBlock(true, compressed_len, compressed,
      &decompressed_len, &decompressed));

  // Assert fails. The uncompressed data is not the same as the original payload.
  ASSERT_EQ(memcmp(payload, decompressed, payload_len), 0);
}