IMPALA-5987

LZ4 Codec silently produces bogus compressed data for large inputs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Backend

      Description

      LZ4 has a built-in limit on the payload size it can compress (LZ4_MAX_INPUT_SIZE, 0x7E000000 = 2,113,929,216 bytes). Callers can check this limit indirectly via LZ4_compressBound(), which returns 0 when the payload is too large, but our code does not handle that 0 return value.

      As a result, large payloads are silently compressed into bogus output. That output even decompresses without error, but not to the data that was originally compressed.

      Relevant LZ4 code snippet:
      https://github.com/lz4/lz4/blob/dev/lib/lz4.h#L153
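The limit is plain arithmetic. As a sketch, assuming the definitions in the linked lz4.h: `LZ4_MAX_INPUT_SIZE` is 0x7E000000 (2,113,929,216) bytes, and `LZ4_COMPRESSBOUND(isize)` yields 0 above that limit, otherwise `isize + isize/255 + 16`. The names `kLz4MaxInputSize` and `Lz4CompressBound` below are hypothetical local stand-ins, not LZ4's own symbols, so the example compiles without linking LZ4. It shows why the reproduction's payload of `numeric_limits<int>::max()` bytes gets a bound of 0.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical local mirror of LZ4_MAX_INPUT_SIZE from lz4.h (not LZ4's symbol).
constexpr int64_t kLz4MaxInputSize = 0x7E000000;  // 2,113,929,216 bytes

// Hypothetical mirror of LZ4_COMPRESSBOUND(isize): the worst-case compressed
// size, or 0 when the input is too large for LZ4 to compress at all.
constexpr int64_t Lz4CompressBound(int64_t input_len) {
  // lz4.h compares after casting to unsigned, so negative sizes also map to 0.
  return (input_len < 0 || input_len > kLz4MaxInputSize)
             ? 0
             : input_len + input_len / 255 + 16;
}

// The reproduction uses INT_MAX bytes, which is above the limit...
static_assert(std::numeric_limits<int>::max() > kLz4MaxInputSize,
              "INT_MAX exceeds LZ4's maximum input size");
// ...so the "bound" is 0, a value that must be rejected, not used as a size.
static_assert(Lz4CompressBound(std::numeric_limits<int>::max()) == 0,
              "a bound of 0 means the input is too large");
```

For a 1 MiB input, `Lz4CompressBound(1 << 20)` evaluates to 1,052,704 bytes; for any input above the limit it evaluates to 0, which is the value the compressor has to treat as an error rather than as a buffer size.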

      Reproduction:
      Add the following test case to decompress-test.cc:

      TEST_F(DecompressorTest, LZ4Huge) {
        // Generate a big random payload (INT_MAX bytes, above LZ4's maximum input size).
        int payload_len = numeric_limits<int>::max();
        uint8_t* payload = new uint8_t[payload_len];
        for (int i = 0; i < payload_len; ++i) payload[i] = rand();
      
        scoped_ptr<Codec> compressor;
        EXPECT_OK(Codec::CreateCompressor(nullptr, true, impala::THdfsCompression::LZ4,
            &compressor));
      
        // The returned max_size is 0 because the payload is too big.
        int64_t max_size = compressor->MaxOutputLen(payload_len);
      
        // Compression reports success even though the output bound is 0.
        int64_t compressed_len = max_size;
        uint8_t* compressed = new uint8_t[max_size];
        EXPECT_OK(compressor->ProcessBlock(true, payload_len, payload,
            &compressed_len, &compressed));
      
        // Decompression also reports success. The output buffer is sized for the
        // original payload, because memcmp() below reads payload_len bytes from it.
        scoped_ptr<Codec> decompressor;
        EXPECT_OK(Codec::CreateDecompressor(nullptr, true, impala::THdfsCompression::LZ4,
            &decompressor));
        int64_t decompressed_len = payload_len;
        uint8_t* decompressed = new uint8_t[payload_len];
        EXPECT_OK(decompressor->ProcessBlock(true, compressed_len,
            compressed, &decompressed_len, &decompressed));
      
        // The assertion fails: the decompressed data differs from the original payload.
        ASSERT_EQ(memcmp(payload, decompressed, payload_len), 0);
      }
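A minimal sketch of the missing check, assuming the caller already has the value returned by `LZ4_compressBound()` and the size of its destination buffer. `Lz4Precheck` and `CheckLz4Bound` are hypothetical names, not part of Impala or LZ4, and this is not necessarily how the issue was fixed.

```cpp
#include <cstdint>

// Hypothetical outcome of the pre-compression check. Impala itself would
// surface these cases as an error Status rather than an enum.
enum class Lz4Precheck { kOk, kInputTooLarge, kOutputTooSmall };

// Hypothetical guard. `bound` is assumed to be the value returned by
// LZ4_compressBound(input_len); `output_capacity` is the size of the
// destination buffer the caller is about to hand to LZ4.
constexpr Lz4Precheck CheckLz4Bound(int64_t bound, int64_t output_capacity) {
  // LZ4_compressBound() returns 0 when the input exceeds LZ4's maximum input
  // size. Treat that as an error instead of as a 0-byte buffer size.
  if (bound <= 0) return Lz4Precheck::kInputTooLarge;
  // Never let LZ4 write into a buffer smaller than its worst-case output.
  if (output_capacity < bound) return Lz4Precheck::kOutputTooSmall;
  return Lz4Precheck::kOk;
}
```

In a compressor this would run before the actual LZ4 compression call, for example as `CheckLz4Bound(LZ4_compressBound(input_len), capacity)`, with anything other than `kOk` mapped to an error. Under such a guard, the `ProcessBlock()` call in the test above would return an error instead of reporting success.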
      

    People

    • Assignee: Zoltán Borók-Nagy (boroknagyz)
    • Reporter: Alexander Behm (alex.behm)
    • Votes: 0
    • Watchers: 3