Lucene - Core / LUCENE-5201

Compression issue on highly compressible inputs with LZ4.compressHC

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      LZ4.compressHC sometimes fails to compress highly compressible inputs when the start offset is > 0.

      1. LUCENE-5201.patch
        14 kB
        Adrien Grand

        Activity

        Adrien Grand added a comment -

        4.5 release -> bulk close

        ASF subversion and git services added a comment -

        Commit 1520269 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1520269 ]

        LUCENE-5201: Fixed compression bug in LZ4.compressHC.

        ASF subversion and git services added a comment -

        Commit 1520268 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1520268 ]

        LUCENE-5201: Fixed compression bug in LZ4.compressHC.

        Adrien Grand added a comment -

        This bug needed two conditions to appear:

        • the input needs to be highly compressible so that there are collisions in the chain table used for finding references backwards in the stream,
        • the start offset needs to be > 0.
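
To make the first condition concrete, here is a minimal sketch of a hash-plus-chain index of the kind LZ4-style high-compression matchers use to find earlier occurrences of a 4-byte sequence. It is not the Lucene code: the class name, `HASH_BITS`, the hash constant, and the `longestChain` helper are all assumptions made for illustration. It only shows why a highly compressible input makes almost every position collide in the same chain, and it indexes a slice that starts at a non-zero offset.

```java
import java.util.Arrays;

public class ChainTableSketch {
    // Hypothetical table size, chosen for illustration only.
    static final int HASH_BITS = 12;

    // Hash of the 4 bytes starting at position i (multiplicative hash).
    static int hash(byte[] buf, int i) {
        int v = (buf[i] & 0xFF)
            | (buf[i + 1] & 0xFF) << 8
            | (buf[i + 2] & 0xFF) << 16
            | (buf[i + 3] & 0xFF) << 24;
        return (v * -1640531535) >>> (32 - HASH_BITS);
    }

    // Indexes buf[off, off + len) and returns the length of the longest chain.
    static int longestChain(byte[] buf, int off, int len) {
        int[] head = new int[1 << HASH_BITS]; // most recent position per hash, -1 if none
        int[] prev = new int[len];            // prev[i - off]: earlier position with the same hash
        int[] chainLen = new int[len];        // chain length ending at position i
        Arrays.fill(head, -1);
        int longest = 0;
        for (int i = off; i + 4 <= off + len; i++) {
            int h = hash(buf, i);
            int p = head[h];
            prev[i - off] = p; // positions are absolute, table slots are relative to off
            chainLen[i - off] = (p == -1) ? 1 : chainLen[p - off] + 1;
            longest = Math.max(longest, chainLen[i - off]);
            head[h] = i;
        }
        return longest;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[200];
        Arrays.fill(buf, (byte) 'a'); // highly compressible: one repeated byte
        System.out.println(longestChain(buf, 7, 100));
    }
}
```

With a single repeated byte, every indexed position hashes to the same slot, so the chain grows with the input. Note that the sketch has to translate between absolute positions and offset-relative table slots, which is exactly the kind of bookkeeping that only gets exercised when the start offset is > 0.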

        CompressingStoredFieldsFormat has only called LZ4.compress(HC) with positive start offsets since LUCENE-5188, so this should not have affected people who were using CompressionMode.FAST_DECOMPRESSION. This seems to be confirmed by the fact that we never saw any test failure related to this until today, only a few minutes after I committed LUCENE-5188.

        I was able to write a test case that reproduces the bug and changed the existing tests so that they don't only test compression with a start offset of 0.
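
The shape of such a test can be sketched as a round trip over a highly compressible payload placed at a random non-zero offset inside a larger buffer. This is not the test from the patch: it uses `java.util.zip.Deflater` and `Inflater` as a stand-in compressor rather than guessing at Lucene's internal LZ4 signatures, and the class and method names (`OffsetRoundTripSketch`, `roundTrips`) are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class OffsetRoundTripSketch {

    // Compresses buf[off, off + len) with the stand-in compressor.
    static byte[] compress(byte[] buf, int off, int len) {
        Deflater deflater = new Deflater();
        deflater.setInput(buf, off, len);
        deflater.finish();
        byte[] out = new byte[len + 64];
        int n = 0;
        while (!deflater.finished()) {
            if (n == out.length) {
                out = Arrays.copyOf(out, out.length * 2); // grow if the estimate was too small
            }
            n += deflater.deflate(out, n, out.length - n);
        }
        deflater.end();
        return Arrays.copyOf(out, n);
    }

    // Decompresses exactly originalLength bytes or fails.
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLength];
        int n = 0;
        try {
            while (n < originalLength && !inflater.finished()) {
                int r = inflater.inflate(out, n, originalLength - n);
                if (r == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unusable stream
                }
                n += r;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt stream", e);
        } finally {
            inflater.end();
        }
        if (n != originalLength) {
            throw new IllegalStateException("short output: " + n);
        }
        return out;
    }

    // One randomized iteration: compressible payload at a start offset that is always > 0.
    static boolean roundTrips(long seed) {
        Random random = new Random(seed);
        int off = 1 + random.nextInt(64);
        int len = 1 + random.nextInt(1 << 16);
        byte[] buf = new byte[off + len + random.nextInt(64)];
        random.nextBytes(buf); // noise before and after the payload
        for (int i = off; i < off + len; i++) {
            buf[i] = (byte) random.nextInt(2); // only two distinct values: highly compressible
        }
        byte[] restored = decompress(compress(buf, off, len), len);
        return Arrays.equals(Arrays.copyOfRange(buf, off, off + len), restored);
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 100; seed++) {
            if (!roundTrips(seed)) {
                throw new AssertionError("round trip failed for seed " + seed);
            }
        }
        System.out.println("ok");
    }
}
```

Surrounding the payload with random bytes means a compressor that accidentally reads from position 0 instead of the start offset would produce output that fails the comparison.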

        Adrien Grand added a comment -

        A fix is already committed but I opened this issue on the suggestion of Uwe so that it has an entry in the changelog.


          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0
            Watchers:
            3
